Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide complete API documentation coverage of scrapy.linkextractors #4045

Merged
merged 9 commits into from Dec 19, 2019

Conversation

Gallaecio
Copy link
Member

I made the assumption that FilteringLinkExtractor was originally created for code sharing between the implementations based on lxml and sgmllib.

Here I’m marking FilteringLinkExtractor as deprecated, assuming that upon removal its code can go into LxmlLinkExtractor.

I’m also updating the documentation about link extractors to talk about LxmlLinkExtractor only, without even mentioning the possibility of alternative implementations.

@codecov
Copy link

codecov bot commented Sep 30, 2019

Codecov Report

Merging #4045 into master will decrease coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4045      +/-   ##
==========================================
- Coverage   83.85%   83.77%   -0.09%     
==========================================
  Files         165      165              
  Lines        9615     9634      +19     
  Branches     1440     1442       +2     
==========================================
+ Hits         8063     8071       +8     
- Misses       1304     1311       +7     
- Partials      248      252       +4
Impacted Files Coverage Δ
scrapy/linkextractors/lxmlhtml.py 92.3% <ø> (ø) ⬆️
scrapy/linkextractors/__init__.py 97.01% <100%> (+0.34%) ⬆️
scrapy/robotstxt.py 75.3% <0%> (-22.23%) ⬇️
scrapy/core/downloader/__init__.py 89.31% <0%> (-1.53%) ⬇️
scrapy/core/downloader/handlers/http11.py 92.99% <0%> (+0.08%) ⬆️
scrapy/utils/trackref.py 85.71% <0%> (+2.85%) ⬆️
scrapy/spiders/crawl.py 92.47% <0%> (+10.33%) ⬆️

@elacuesta
Copy link
Member

Looks good to me, except for the following:

I’m also updating the documentation about link extractors to talk about LxmlLinkExtractor only, without even mentioning the possibility of alternative implementations.

Personally I don't remember seeing a custom Link extractor implementation, but I'm not sure about removing that section. It doesn't hurt, it doesn't need any particular code to be handled and it's technically still supported.
Just a comment, as I said I haven't seen or written any custom extractor so my opinion on the subject is not that strong.

@Gallaecio
Copy link
Member Author

If we leave it, I would like to at least mention a few use cases, and show a code example. But I cannot think of any use case myself.

@Gallaecio
Copy link
Member Author

@kmike What do you think, should custom link extractors remain in the documentation or go?

@kmike
Copy link
Member

kmike commented Oct 18, 2019

I like the simplification of the docs, both the wording and content. +1 to deprecate / un-expose FilteringLinkExtractor. As I recall, it was needed when we had several link extractors, but now this code is all way to complex for what it is doing.

@Gallaecio Gallaecio changed the title Provide complete API documentation coverage of scrapy.linkextractors [WIP] Provide complete API documentation coverage of scrapy.linkextractors Oct 21, 2019
@Gallaecio Gallaecio changed the title [WIP] Provide complete API documentation coverage of scrapy.linkextractors Provide complete API documentation coverage of scrapy.linkextractors Oct 21, 2019
tests/test_linkextractors.py Outdated Show resolved Hide resolved
Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @Gallaecio! I left a small comment about a class name used in tests, but otherwise it looks ready to merge 👍

@kmike kmike merged commit fb3fb17 into scrapy:master Dec 19, 2019
@kmike
Copy link
Member

kmike commented Dec 19, 2019

Thanks @Gallaecio!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants