Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Reppy instead of robotsparser #2669

Closed
wants to merge 14 commits into from
Closed

Conversation

Parth-Vader
Copy link
Member

Made the changes using #949 for #754.

# with unicode input, non-ASCII encoded bytes decoding fails in Python2
policy = HeaderWithDefaultPolicy(default=1800, minimum=600)

rp=Robots.fetch(response.url, ttl_policy=policy)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this performs an HTTP request to get the robots.txt file. Can you check?
We do not want to re-fetch the file, nor use a synchronous networking call (reppy uses python-requests as far as I understand)

@@ -8,3 +8,4 @@ six>=1.5.2
PyDispatcher>=2.0.5
service_identity
parsel>=1.1
reppy>=0.3.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 0.3.0 the first Python 3 compatible release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I ran the tests using py35.

@redapple
Copy link
Contributor

Thank you @Parth-Vader for stepping in!

@redapple
Copy link
Contributor

It looks like either Cython or a C++ compiler is needed to build reppy now.
Can you try updating Travis config to add Cython?
I'm not sure we want to add another non-python dependency to Scrapy by default. Do they have a non-C++/less-speedy/Python-only fallback?

@kmike
Copy link
Member

kmike commented Mar 20, 2017

@redapple relevant discussion: seomoz/reppy#33

@Parth-Vader
Copy link
Member Author

Parth-Vader commented Mar 20, 2017

@kmike
Copy link
Member

kmike commented Mar 20, 2017

I think depending on old and unsupported version of a library is not OK, and depending on a library which is hard to install is also not OK.

@Parth-Vader
Copy link
Member Author

Agreed.
So, is Reppy the right option for the purpose?

@redapple
Copy link
Contributor

It looks like reppy is not (anymore) the right option for Scrapy's needs.

@Gallaecio
Copy link
Member

#3796 made it possible to configure which robots.txt parser to use, including a built-in adapter fopr Reppy.

@Gallaecio Gallaecio closed this Aug 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants