Raise error when start_url found instead of start_urls. #4170
I would avoid including the start_url value in the response, as it may initially mislead the user to think the issue is in the value itself.
Also, the message says "'start_urls' list not found", which I'm not sure is accurate in all cases. Maybe it should say "not found or empty"? And maybe we should check if it's empty instead of if it has non-empty values (any), because make_requests_from_url should already fail if there is a start_urls list with a value that evaluates to False.
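To make the difference between the two checks concrete, here is a minimal sketch in plain Python. It does not touch Scrapy, and the list values and helper names are made up for illustration. An emptiness check is true only for a list with no items, while an any-based check is also true for a list that holds only falsy values such as an empty string.

```python
# Hypothetical values, for illustration only.
empty = []
only_falsy = [""]
valid = ["https://example.com"]


def is_empty(urls):
    """True only when the list has no items at all."""
    return not urls


def has_no_truthy_value(urls):
    """True when the list is empty or every item in it is falsy."""
    return not any(urls)


# One (is_empty, has_no_truthy_value) pair per sample list.
checks = [
    (is_empty(urls), has_no_truthy_value(urls))
    for urls in (empty, only_falsy, valid)
]
```

The two checks disagree only for the list of falsy values, which is the case the comment expects to fail later on anyway.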
I am having trouble with the tests, since it seems we would need to start a crawl to get the result from […]. Maybe just checking when the variable is set could be a good solution. I added a modified version of lines 102 to 107 in 494f38a.
Codecov Report
@@ Coverage Diff @@
## master #4170 +/- ##
==========================================
- Coverage 83.75% 83.74% -0.01%
==========================================
Files 165 165
Lines 9719 9721 +2
Branches 1445 1446 +1
==========================================
+ Hits 8140 8141 +1
Misses 1332 1332
- Partials 247 248 +1
Hey! There are real-world spiders that use a start_url option as a way to customize things (I found them using our internal code search). Is there a way to distinguish between intended and unintended usage? It is not good to show a warning for a perfectly valid use case.
I wonder if a better fix would be to have a warning in the default start_requests implementation: if start_urls is None, but start_url is present, then show a warning. There won't be a warning if the start_requests function is using […].
It could make sense to check real spiders which use the start_url attribute, to see how any change we make affects them.
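The idea of warning only from the default start_requests implementation can be sketched as follows. This is not Scrapy's code: BaseSpider, TypoSpider and CustomSpider are hypothetical stand-ins, plain URL strings replace request objects so the snippet runs without Scrapy, and the check treats both None and an empty list as "missing", which is an assumption beyond the comment above.

```python
import warnings


class BaseSpider:
    """Hypothetical stand-in for a spider base class; not Scrapy's code."""

    start_urls = None

    def start_requests(self):
        # The check lives only in the default implementation, so spiders
        # that override start_requests never reach it.
        if not self.start_urls and hasattr(self, "start_url"):
            warnings.warn(
                "'start_urls' not found or empty, but 'start_url' is set; "
                "did you miss an 's'?"
            )
        # Real code would build request objects; URLs keep this runnable.
        return list(self.start_urls or [])


class TypoSpider(BaseSpider):
    start_url = "https://example.com"  # probably meant start_urls


class CustomSpider(BaseSpider):
    start_url = "https://example.com"  # used on purpose

    def start_requests(self):
        return [self.start_url]
```

Calling start_requests on TypoSpider emits the warning, while CustomSpider stays silent because its own start_requests replaces the default one. That is how intended and unintended use of start_url would be told apart under this approach.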
You are right @kmike, that was one of my other options. Setting this warning in […]
Thanks @mabelvj!
Fixes #4133
Raise AttributeError when 'start_urls' is empty and 'start_url' is found. Added test.
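A test for the behaviour described in the commit message could look roughly like the sketch below. The Spider class here is a hypothetical minimal base class, not Scrapy's implementation, and the error text is an assumption; following the review comment, it leaves out the value of start_url.

```python
class Spider:
    """Hypothetical minimal base class; not Scrapy's implementation."""

    start_urls = ()

    def start_requests(self):
        # Fail early when start_urls is empty and the misspelled
        # attribute is present on the spider.
        if not self.start_urls and hasattr(self, "start_url"):
            raise AttributeError(
                "'start_urls' not found or empty, "
                "but found 'start_url' attribute instead"
            )
        return list(self.start_urls)


class MisnamedSpider(Spider):
    start_url = "https://example.com"  # hypothetical typo for start_urls


def start_url_raises():
    """Return True if the misnamed spider raises the expected error."""
    try:
        MisnamedSpider().start_requests()
    except AttributeError as exc:
        return "start_url" in str(exc)
    return False
```

A spider with neither attribute set still returns an empty list, so only the likely typo case is affected.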