Provide complete API documentation coverage of scrapy.linkextractors #4045

Gallaecio · 2019-09-30T16:27:54Z

I made the assumption that FilteringLinkExtractor was originally created for code sharing between the implementations based on lxml and sgmllib.

Here I’m marking FilteringLinkExtractor as deprecated, assuming that upon removal its code can go into LxmlLinkExtractor.

I’m also updating the documentation about link extractors to talk about LxmlLinkExtractor only, without even mentioning the possibility of alternative implementations.
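
For readers unfamiliar with the pattern, the deprecation described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code from this pull request: both classes are simplified stand-ins that only share names with the Scrapy classes, the `allow` parameter and the `count_deprecation_warnings` helper are hypothetical, and the real change may differ in its details. The idea is that the shared base class warns when it is instantiated directly or through an unrelated subclass, while the supported subclass stays silent.

```python
import warnings


class FilteringLinkExtractor:
    """Simplified stand-in for the shared base class (not Scrapy's code)."""

    def __new__(cls, *args, **kwargs):
        # Warn unless the class being instantiated derives from the
        # supported extractor, so that its normal use stays silent.
        if not issubclass(cls, LxmlLinkExtractor):
            warnings.warn(
                "FilteringLinkExtractor is deprecated, "
                "use LxmlLinkExtractor instead",
                DeprecationWarning,
                stacklevel=2,
            )
        return super().__new__(cls)

    def __init__(self, allow=()):
        # Hypothetical parameter, kept only to show that arguments still work.
        self.allow = tuple(allow)


class LxmlLinkExtractor(FilteringLinkExtractor):
    """Simplified stand-in for the supported extractor."""


def count_deprecation_warnings(factory):
    """Call factory() and return how many DeprecationWarnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

Under these assumptions, `count_deprecation_warnings(FilteringLinkExtractor)` reports one warning and `count_deprecation_warnings(LxmlLinkExtractor)` reports none, which is what lets the base class code later move into the subclass without affecting existing users of the subclass.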

codecov · 2019-09-30T16:46:16Z

Codecov Report

Merging #4045 into master will decrease coverage by 0.08%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #4045      +/-   ##
==========================================
- Coverage   83.85%   83.77%   -0.09%     
==========================================
  Files         165      165              
  Lines        9615     9634      +19     
  Branches     1440     1442       +2     
==========================================
+ Hits         8063     8071       +8     
- Misses       1304     1311       +7     
- Partials      248      252       +4

| Impacted Files | Coverage Δ | |
|---|---|---|
| scrapy/linkextractors/lxmlhtml.py | `92.3% <ø> (ø)` | ⬆️ |
| scrapy/linkextractors/__init__.py | `97.01% <100%> (+0.34%)` | ⬆️ |
| scrapy/robotstxt.py | `75.3% <0%> (-22.23%)` | ⬇️ |
| scrapy/core/downloader/__init__.py | `89.31% <0%> (-1.53%)` | ⬇️ |
| scrapy/core/downloader/handlers/http11.py | `92.99% <0%> (+0.08%)` | ⬆️ |
| scrapy/utils/trackref.py | `85.71% <0%> (+2.85%)` | ⬆️ |
| scrapy/spiders/crawl.py | `92.47% <0%> (+10.33%)` | ⬆️ |

elacuesta · 2019-10-04T15:32:20Z

Looks good to me, except for the following:

> I’m also updating the documentation about link extractors to talk about LxmlLinkExtractor only, without even mentioning the possibility of alternative implementations.

Personally, I don't remember seeing a custom link extractor implementation, but I'm not sure about removing that section. It doesn't hurt, it doesn't require any special handling in the code, and it's technically still supported.
Just a comment; as I said, I haven't seen or written any custom extractor, so my opinion on the subject is not that strong.
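
For context, the section under discussion describes a custom link extractor as an object that exposes an `extract_links(response)` method returning a list of link objects. A minimal sketch of that interface, assuming nothing beyond the standard library, might look like the following. `Link` and `FakeResponse` are hypothetical stand-ins for `scrapy.link.Link` and a Scrapy response, and the plain-text use case is only an illustration, not one proposed in this thread.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for scrapy.link.Link (url and text only)."""
    url: str
    text: str = ""


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response: a URL and a text body."""
    url: str
    text: str


class PlainTextLinkExtractor:
    """Illustrative extractor that finds bare URLs in non-HTML bodies."""

    _url_re = re.compile(r"https?://[^\s<>\"']+")

    def extract_links(self, response):
        seen = set()
        links = []
        for match in self._url_re.finditer(response.text):
            # Drop trailing punctuation that usually belongs to the sentence.
            url = match.group(0).rstrip(".,;")
            if url not in seen:
                # Keep only the first occurrence of each URL, in order.
                seen.add(url)
                links.append(Link(url=url))
        return links
```

Any object with this method shape could be used wherever a link extractor is expected, which is why supporting custom extractors needs no dedicated code.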

Gallaecio · 2019-10-14T13:25:56Z

If we leave it, I would like to at least mention a few use cases, and show a code example. But I cannot think of any use case myself.

Gallaecio · 2019-10-18T12:08:53Z

@kmike What do you think, should custom link extractors remain in the documentation or go?

kmike · 2019-10-18T21:57:52Z

I like the simplification of the docs, both the wording and content. +1 to deprecate / un-expose FilteringLinkExtractor. As I recall, it was needed when we had several link extractors, but now this code is way too complex for what it is doing.

kmike

Looks good, thanks @Gallaecio! I left a small comment about a class name used in tests, but otherwise it looks ready to merge 👍

kmike · 2019-12-19T21:01:52Z

Thanks @Gallaecio!

Provide complete API documentation coverage of scrapy.linkextractors

7f4f98f

Gallaecio added the enhancement and docs labels Sep 30, 2019

kmike reviewed Oct 18, 2019

docs/topics/link-extractors.rst (outdated, resolved)

Gallaecio changed the title ~~Provide complete API documentation coverage of scrapy.linkextractors~~ [WIP] Provide complete API documentation coverage of scrapy.linkextractors Oct 21, 2019

Gallaecio commented Oct 21, 2019

scrapy/linkextractors/lxmlhtml.py (outdated, resolved)

constructor → __init__ method

0fbd1ff

Gallaecio changed the title ~~[WIP] Provide complete API documentation coverage of scrapy.linkextractors~~ Provide complete API documentation coverage of scrapy.linkextractors Oct 21, 2019

Gallaecio added 3 commits December 5, 2019 14:43

Merge branch 'master' into documentation-coverage

1fc2b14

Fix Flake8-reported issues

a4ef975

Merge branch 'master' into documentation-coverage

607815d

kmike reviewed Dec 17, 2019

tests/test_linkextractors.py (outdated, resolved)

kmike approved these changes Dec 17, 2019

Gallaecio added 4 commits December 18, 2019 12:08

Improve FilteringLinkExtractor.__new__

ee9881d

Use a better name for the LxmlLinkExtractor subclassing test

174769a

Merge remote-tracking branch 'origin/documentation-coverage' into doc…

1f689b5

…umentation-coverage

Revert "Improve FilteringLinkExtractor.__new__"

e22c0c2

This reverts commit ee9881d.

kmike merged commit fb3fb17 into scrapy:master Dec 19, 2019

Laerte mentioned this pull request Nov 21, 2022

Remove FilteringLinkExtractor #5720

Merged
