Add note that "allowed_domains" should be a list of domains, not URLs #2250
Comments
What if, instead of simply documenting this, Scrapy detected this case and issued a warning? Even better, it could extract the domain from the URL and use that, while issuing a warning like:
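Something along these lines, perhaps (a rough sketch only; `clean_allowed_domains` is a hypothetical helper, not an existing Scrapy API, and the warning text is made up):

```python
import warnings
from urllib.parse import urlparse


def clean_allowed_domains(allowed_domains):
    """Warn about URLs in allowed_domains and fall back to their netloc."""
    cleaned = []
    for entry in allowed_domains:
        parsed = urlparse(entry)
        if parsed.scheme and parsed.netloc:
            # Entry looks like a full URL such as 'http://www.example.com'
            warnings.warn(
                "allowed_domains accepts only domains, not URLs; "
                "using %r instead of %r" % (parsed.netloc, entry)
            )
            cleaned.append(parsed.netloc)
        else:
            cleaned.append(entry)
    return cleaned
```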
+1 to issue a warning. I'm less sure about inferring the domain, for example for
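For instance, with something like this (an illustrative case), the netloc that urlparse returns is more than a bare domain:

```python
>>> from urllib.parse import urlparse
>>> # netloc keeps credentials and the port, which is arguably not a "domain"
>>> urlparse('http://user:pass@www.example.com:8080/path').netloc
'user:pass@www.example.com:8080'
```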
I think it would be reasonable to use whatever urlparse parses as netloc:
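Roughly like this (a sketch of the idea, assuming the standard library's urlparse):

```python
>>> from urllib.parse import urlparse
>>> urlparse('http://www.example.com/some/page').netloc
'www.example.com'
>>> # note: without a scheme there is no netloc, everything ends up in `path`
>>> urlparse('www.example.com/some/page').netloc
''
```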
Are you ok with having the new behavior as a separate issue?
Sure, inferring a domain from a URL in allowed_domains looks like a separate feature indeed. But I think warning users about URLs detected in allowed_domains belongs to the same issue as documenting it (this one). Makes sense?
It does
May I have this issue?
…L to the domain as per urlparse netloc (scrapy#2250)
Hi,
Sent in a PR for this
Hey, seeing as the last pull request hasn't been accepted for quite some time, mind if I submit my own? This would be my first one... |
[MRG+1] Issues a warning when user puts a URL into allowed_domains (#2250)
Fixed by #3011
(just logging the issue before I forget)
It may seem obvious from the name of the attribute that `allowed_domains` is about domain names, but it's not uncommon for Scrapy users to make the mistake of doing `allowed_domains = ['http://www.example.com']`.
I believe it is worth adding a note in http://doc.scrapy.org/en/latest/topics/spiders.html?#scrapy.spiders.Spider.allowed_domains
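For illustration, the intended usage looks something like this (ExampleSpider is just a made-up spider):

```python
import scrapy


class ExampleSpider(scrapy.Spider):
    name = 'example'
    # Wrong: a URL, not a domain
    # allowed_domains = ['http://www.example.com']

    # Right: bare domain names only
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        self.logger.info('Visited %s', response.url)
```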