Use Reppy instead of robotsparser #2669
Conflicts in scrapy/contrib/downloadermiddleware/robotstxt.py and setup.py resolved. Uses Reppy for the robots.txt middleware.
# with unicode input, non-ASCII encoded bytes decoding fails in Python2
policy = HeaderWithDefaultPolicy(default=1800, minimum=600)

rp=Robots.fetch(response.url, ttl_policy=policy)
I understand that this performs an HTTP request to get the robots.txt file. Can you check?
We do not want to re-fetch the file, nor make a synchronous network call (reppy uses python-requests, as far as I understand).
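To illustrate the concern, here is a minimal sketch of parsing a robots.txt body that has already been downloaded, with no second request and no blocking network call. It uses the standard library's `urllib.robotparser` as a stand-in, not reppy; the helper name `parser_from_body`, the user agent `mybot`, and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def parser_from_body(body: bytes, encoding: str = "utf-8") -> RobotFileParser:
    """Build a parser from robots.txt bytes that were already downloaded.

    Hypothetical helper for illustration; not part of Scrapy or reppy.
    """
    rp = RobotFileParser()
    # parse() takes an iterable of text lines and performs no network I/O.
    rp.parse(body.decode(encoding, errors="ignore").splitlines())
    return rp


# Stand-in for the body of a response the downloader already fetched.
body = b"User-agent: *\nDisallow: /private/\n"
rp = parser_from_body(body)

allowed_public = rp.can_fetch("mybot", "https://example.com/public/page")
allowed_private = rp.can_fetch("mybot", "https://example.com/private/page")
```

The point is only that the parser is fed the response body directly, so the middleware stays in control of how and when the file is fetched.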
@@ -8,3 +8,4 @@ six>=1.5.2
PyDispatcher>=2.0.5
service_identity
parsel>=1.1
reppy>=0.3.0
Is 0.3.0 the first Python 3 compatible release?
Yes, I ran the tests using py35
Thank you @Parth-Vader for stepping in!
It looks like either Cython or a C++ compiler is needed to build reppy now.
@redapple relevant discussion: seomoz/reppy#33
Version 0.3.3 seems to be Cython-free.
I think depending on an old and unsupported version of a library is not OK, and depending on a library which is hard to install is also not OK.
Agreed.
It looks like reppy is not (anymore) the right option for Scrapy's needs.
#3796 made it possible to configure which robots.txt parser to use, including a built-in adapter for Reppy.
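For anyone landing here later, a minimal settings sketch of selecting that adapter. It assumes a Scrapy release that includes the change from #3796 and still ships the Reppy adapter at this import path, and that reppy is installed separately; verify both against the installed version.

```python
# settings.py (sketch; the parser path is an assumption to check against
# the installed Scrapy release)
ROBOTSTXT_OBEY = True
ROBOTSTXT_PARSER = "scrapy.robotstxt.ReppyRobotParser"
```

Leaving `ROBOTSTXT_PARSER` unset keeps whatever parser the installed Scrapy uses by default.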
Made the changes using #949 for #754.