Option to stop spider on errors #47
Comments
You can define a suite that will be executed periodically using The number of validation errors is included in Does it solve your problem? If not, we can try to figure out a better way to handle these two scenarios. |
Sorry, this seems like it overlaps a bit with #50 and I probably forgot about this ticket when making the other. That would probably work, but I think it would be nice to use existing channels
|
Closing as per #50 (comment) |
Sorry @Gallaecio 😓 I rushed and accidentally closed #50 - alright to reopen this? |
I think this is a very good candidate for the next release. I would like to suggest a new setting:
We can base our |
Maybe expression monitors can be used to achieve this, and we can use the pattern described in the periodic monitors documentation. I am not yet sure that a built-in monitor is the best solution, as we may want to close the spider based on different assertions. |
@rennerocha @raphapassini What about |
Maybe something like this, roughly?
|
If the same can be achieved with periodic monitors, as @rennerocha suggests, I agree with him that it may be better to use them instead. They give great flexibility, allowing complex expression evaluations and letting us react in different ways (not just closing the spider). We could extend the periodic monitors documentation page instead, maybe covering the specific case of stopping a spider on errors. |
I'm not against monitors per se but it's a significant extra burden on getting set up. I think the discussion from #50 is getting dragged in here since the proposed solutions are similar, so continuing down that path: Suppose you start with a spider and you have a rough schema - to get from that to errors in the log you need to
Figuring this out, if you're not well versed in Spidermon, requires looking at multiple documentation pages and copying bits and pieces from several examples. Getting any of the settings wrong (or missing a setting) can lead to Spidermon doing nothing. There are a lot of projects that don't need all of Spidermon's features right away, just validation, or just one small check or whatever, and all this setup is a fairly high hurdle. Compare that to writing a new pipeline to do validation with jsonschema directly and log the errors: one new dependency (jsonschema), one setting, 15 lines of code in an existing file (pipelines.py), and it's easier to verify operation because there are no layers of indirection between Scrapy and your own code in the pipeline. Making a new feature for every use case obviously isn't a good solution, but putting complete examples in the documentation for every use case doesn't seem great either and would lead to lots of difficult-to-update duplicate code there. Perhaps there are some basic use cases that could be made into 1-setting features, although that would make gradually increasing usage difficult. Or integrating more with Scrapy's error handling mechanisms? |
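For comparison, the plain-jsonschema pipeline described above might look roughly like the sketch below. This is only an illustration, not code from the thread: the schema, the class name `JsonSchemaValidationPipeline`, and the `error_count` attribute are all hypothetical, and it assumes `jsonschema` is installed.

```python
import logging

from jsonschema import Draft7Validator

logger = logging.getLogger(__name__)

# Hypothetical schema, for illustration only.
ITEM_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["title", "price"],
}


class JsonSchemaValidationPipeline:
    """Validate each item against ITEM_SCHEMA and log any errors."""

    def __init__(self):
        self.validator = Draft7Validator(ITEM_SCHEMA)
        self.error_count = 0  # hypothetical counter, handy for debugging

    def process_item(self, item, spider):
        # iter_errors yields one ValidationError per schema violation.
        for error in self.validator.iter_errors(dict(item)):
            self.error_count += 1
            location = "/".join(str(part) for part in error.path) or "<item>"
            logger.error("Validation error at %s: %s", location, error.message)
        # The item is passed on unchanged; this sketch only logs.
        return item
```

The "one setting" would then be the pipeline's entry in `ITEM_PIPELINES`, for example `{"myproject.pipelines.JsonSchemaValidationPipeline": 300}`, where the module path is an assumption about the project layout.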
@Gallaecio @rennerocha @raphapassini |
Well, I guess this may be a common-enough scenario to have its own option. Worth a pull request, at least; we can discuss further over an implementation proposal. |
#216 I have added a new setting in Spidermon, as proposed by @raphapassini above.
Note: Documentation has not been added so far; I have only implemented the solution. |
As @andrewbaxter said, it is not a good solution to create new features for every use case, and we need to consider that a Spidermon user is a developer who is able to create her/his own monitors with custom (and more complex) checks that match their scenarios. So we need to keep new settings and automatic validation as simple as possible. I didn't like the I believe that a more specific setting as So we add in our settings: Also, instead of adding this logic inside the pipeline, as @rosheen33 did, maybe it is a better idea to follow the same patterns of Closespider extension L39 that verify the number of errors only when a What do you all think? :-) |
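As a rough sketch of the extension-based approach suggested above (not the implementation from #216), something along these lines could work. The setting name `VALIDATION_MAX_ERRORS`, the class name, the close reason, and the choice of the `item_scraped` signal are all hypothetical, and the stats key is an assumption about where the validation error count is stored.

```python
# Assumed stats key; check which key your validation pipeline actually writes.
VALIDATION_ERRORS_STAT = "spidermon/validation/fields/errors"


class CloseOnValidationErrors:
    """Close the spider once the validation error count reaches a limit."""

    def __init__(self, crawler, max_errors):
        self.crawler = crawler
        self.max_errors = max_errors

    @classmethod
    def from_crawler(cls, crawler):
        # Imported here so the class itself can be exercised without Scrapy.
        from scrapy import signals
        from scrapy.exceptions import NotConfigured

        # Hypothetical setting name; 0 or unset disables the extension.
        max_errors = crawler.settings.getint("VALIDATION_MAX_ERRORS")
        if not max_errors:
            raise NotConfigured
        ext = cls(crawler, max_errors)
        # Check the counter each time an item makes it through the pipelines.
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        return ext

    def item_scraped(self, item, spider):
        errors = self.crawler.stats.get_value(VALIDATION_ERRORS_STAT, 0)
        if errors >= self.max_errors:
            # Same call that Scrapy's CloseSpider extension uses to stop a crawl.
            self.crawler.engine.close_spider(spider, "closespider_validation_errors")
```

Keeping the check in an extension rather than in the pipeline leaves validation and the stop condition separate. One caveat of this sketch: items dropped by a pipeline do not trigger `item_scraped`, so a real version might also need to listen to `item_dropped`.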
Just throwing my 2 cents here. |
|
I think it would be nice to have validation errors treated like other spider errors - increment the error counter, optionally stop the spider with an error condition if over an error count.
I'd primarily like to stop the spider so I made that the title, but having both would be nice.