Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise error when start_url found instead of start_urls. #4170

Merged
merged 2 commits into from Dec 5, 2019

Conversation

mabelvj
Copy link
Contributor

@mabelvj mabelvj commented Nov 18, 2019

Fixes #4133

Raise AttributeError error when empty 'start_urls' and 'start_url' found. Added test.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid including the start_url value in the response, as it may initially mislead the user to think the issue is in the value itself.

Also, the message says 'start_urls' list not found, which I’m not sure is accurate in all cases. Maybe it should say not found or empty? And maybe we should check if it’s empty instead of if it has non-empty values (any), because make_requests_from_url should already fail if there is a start_urls list with a value that evaluates to False.

scrapy/spiders/__init__.py Outdated Show resolved Hide resolved
scrapy/spiders/__init__.py Outdated Show resolved Hide resolved
@mabelvj
Copy link
Contributor Author

mabelvj commented Nov 20, 2019

I am having trouble with the tests since it seems we should start a crawling for getting the result from start_requests -> 96f7e97.

Maybe just checking when trying to set the variable could be a good solution. I added a modified version of __set_attr__ where we can for this condition and warn the user. This way also, if start_requests is redefined, this warning will keep. Similar warnings are used in other parts of the code

scrapy/scrapy/item.py

Lines 102 to 107 in 494f38a

def __setattr__(self, name, value):
if not name.startswith('_'):
raise AttributeError("Use item[%r] = %r to set field value" %
(name, value))
super(DictItem, self).__setattr__(name, value)

@codecov
Copy link

codecov bot commented Nov 20, 2019

Codecov Report

Merging #4170 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4170      +/-   ##
==========================================
- Coverage   83.75%   83.74%   -0.01%     
==========================================
  Files         165      165              
  Lines        9719     9721       +2     
  Branches     1445     1446       +1     
==========================================
+ Hits         8140     8141       +1     
  Misses       1332     1332              
- Partials      247      248       +1
Impacted Files Coverage Δ
scrapy/spiders/__init__.py 98.46% <100%> (+0.04%) ⬆️
scrapy/utils/trackref.py 83.78% <0%> (-2.71%) ⬇️

@mabelvj mabelvj force-pushed the 4133-handle-start_url branch 2 times, most recently from 2c6f076 to a86e69d Compare November 20, 2019 11:54
@kmike
Copy link
Member

kmike commented Nov 20, 2019

Hey! There are real-world spiders which use start_url option, as a way to customize things (found them using our internal code search). Is there a way to distinguish between intended and unintended usage? It is not good to show a warning for a perfectly valid use case.

I wonder if a better fix would be to have a warning in the default start_requests implementation: if start_urls is None, but start_url is present, then show a warning. There won't be a warning if start_requests function is using start_url attribute in this case, as it'd be overridden.

It could make sense to check real spiders which use start_request attribute, to see how any change we make affect them.

@mabelvj
Copy link
Contributor Author

mabelvj commented Nov 21, 2019

You are right @kmike, that was one of my other options. Setting this warning in start_requests will allow overriding it when using start_url.

…'start_urls' and 'start_url' found. Added test.
@kmike
Copy link
Member

kmike commented Dec 5, 2019

Thanks @mabelvj!

@kmike kmike merged commit 250da28 into scrapy:master Dec 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle it gracefully when start_url is used instead of start_urls
4 participants