
Document Reppy Python version support #5226

Closed
Gallaecio opened this issue Aug 11, 2021 · 5 comments

Comments

@Gallaecio
Member

Gallaecio commented Aug 11, 2021

The optional dependency on reppy for one of the built-in robots.txt parsers is preventing us from running the extra-dependencies CI job with Python 3.9+. https://github.com/seomoz/reppy has not had a commit in ~1.5 years.

So I think we should deprecate the component.

If we don’t, we should document this limitation and schedule a deprecation for 1 year before Python 3.8 reaches end of life, i.e. in 9 months. Once we drop Python 3.8 support we will be forced to remove this component anyway, so giving a deprecation warning 1 year in advance is probably in the best interest of anyone using it.

@wRAR
Member

wRAR commented Aug 11, 2021

I would be fine with not running tests for it and documenting that it's "not really supported", but I don't think we do that anywhere? Otherwise I'm fine with deprecating and then removing it.

For context, it was first released in 1.8, together with all the other "new" robots.txt parsers, as a GSoC 2019 contribution. While it was requested in the initial GSoC issue #3656, it was attempted much earlier: #949

So I think as long as we have other supported parsers it's fine to remove this one.

@wRAR
Member

wRAR commented Aug 12, 2021

The parser comparison in our docs: https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.robotstxt

The performance comparison (linked from there): https://anubhavp28.github.io/gsoc-weekly-checkin-12/

So the main feature of Reppy is being much faster, which isn't really important for spiders that fetch 0 or 1 robots.txt files per run but may matter for e.g. broad crawls.

@Gallaecio
Member Author

Gallaecio commented Aug 12, 2021

We’ve decided to postpone deprecation until 1 year before Python 3.8 reaches end of life. If the issue remains by then, we will deprecate it, so that by the time Scrapy drops Python 3.8 support it also drops reppy support.

Right now we need to document the Python version requirement.

@umairnsr87
Contributor

Can I work on this issue, @Gallaecio?

@Gallaecio
Member Author

@umairnsr87 Yes, please! We need to update the documentation about Reppy to clearly indicate that it only works with Python 3.8 and earlier.
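As an illustration of what this limitation means for a project's configuration, here is a minimal `settings.py` sketch. It assumes Scrapy's `ROBOTSTXT_OBEY` and `ROBOTSTXT_PARSER` settings and the `scrapy.robotstxt.ReppyRobotParser` / `scrapy.robotstxt.ProtegoRobotParser` class paths available in Scrapy 2.x releases of this period; the Python 3.8 cut-off is taken from this issue, not from reppy's own metadata. It selects Reppy only where it is expected to work and falls back to Protego otherwise.

```python
import sys

# Respect robots.txt rules in this project.
ROBOTSTXT_OBEY = True

# Assumption (from this issue): reppy only works on Python 3.8 and earlier,
# so pick it only on those interpreters and use Protego everywhere else.
if sys.version_info < (3, 9):
    ROBOTSTXT_PARSER = "scrapy.robotstxt.ReppyRobotParser"
else:
    ROBOTSTXT_PARSER = "scrapy.robotstxt.ProtegoRobotParser"
```

Note that this only chooses the parser class; Reppy still has to be installed separately for the first branch to work.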

@Gallaecio Gallaecio changed the title Deprecate reppy support Document Reppy Python version support Aug 16, 2021
@wRAR wRAR closed this as completed Aug 22, 2021