[MRG+1] Feature filteringlinkextractor restrict text #3635

@matthieucham (Contributor) commented Feb 21, 2019

This is the proposed implementation of issue #3622, with unit tests and documentation.

@codecov (bot) commented Feb 22, 2019

Codecov Report

Merging #3635 into master will decrease coverage by 2.9%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3635      +/-   ##
==========================================
- Coverage   84.52%   81.62%   -2.91%     
==========================================
  Files         167      167              
  Lines        9410     9413       +3     
  Branches     1397     1399       +2     
==========================================
- Hits         7954     7683     -271     
- Misses       1199     1466     +267     
- Partials      257      264       +7
Impacted Files                           Coverage Δ
scrapy/linkextractors/lxmlhtml.py        92.4% <ø> (ø) ⬆️
scrapy/linkextractors/__init__.py        96.66% <100%> (-3.34%) ⬇️
scrapy/linkextractors/sgml.py            0% <0%> (-96.81%) ⬇️
scrapy/linkextractors/regex.py           0% <0%> (-95.66%) ⬇️
scrapy/linkextractors/htmlparser.py      0% <0%> (-92.07%) ⬇️
scrapy/core/downloader/handlers/s3.py    62.9% <0%> (-32.26%) ⬇️
scrapy/extensions/statsmailer.py         0% <0%> (-30.44%) ⬇️
scrapy/utils/boto.py                     46.66% <0%> (-26.67%) ⬇️
scrapy/_monkeypatches.py                 42.85% <0%> (-14.29%) ⬇️
scrapy/link.py                           86.36% <0%> (-13.64%) ⬇️
... and 14 more

@codecov (bot) commented Feb 22, 2019

Codecov Report

Merging #3635 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3635      +/-   ##
==========================================
+ Coverage   84.52%   84.53%   +<.01%     
==========================================
  Files         167      167              
  Lines        9410     9413       +3     
  Branches     1397     1399       +2     
==========================================
+ Hits         7954     7957       +3     
  Misses       1199     1199              
  Partials      257      257
Impacted Files                           Coverage Δ
scrapy/linkextractors/lxmlhtml.py        92.4% <ø> (ø) ⬆️
scrapy/linkextractors/sgml.py            96.8% <ø> (ø) ⬆️
scrapy/linkextractors/__init__.py        100% <100%> (ø) ⬆️

Like the allow and deny arguments, restrict_text accepts a string, a regex, or an iterable of either. Links whose text matches none of the regexes are filtered out.
DOC restrict_text in LxmlLinkExtractor
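
The filtering rule described in the commit message above can be sketched in plain Python. This is only an illustration of the stated semantics, not the code merged in this PR: `Link` here is a hypothetical stand-in for Scrapy's link object, and `compile_patterns` and `filter_by_text` are made-up helper names. It also assumes that an unset or empty `restrict_text` lets every link through.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a link object; only the fields used below.
Link = namedtuple("Link", ["url", "text"])


def compile_patterns(restrict_text):
    """Normalize a string, a compiled regex, or an iterable of either
    into a list of compiled regexes."""
    if restrict_text is None:
        return []
    # A single string or a single compiled pattern becomes a one-item list.
    if isinstance(restrict_text, str) or hasattr(restrict_text, "search"):
        restrict_text = [restrict_text]
    return [re.compile(p) if isinstance(p, str) else p for p in restrict_text]


def filter_by_text(links, restrict_text=None):
    """Keep links whose text matches at least one pattern.

    Assumption: with no patterns given, nothing is filtered out.
    """
    patterns = compile_patterns(restrict_text)
    if not patterns:
        return list(links)
    return [
        link for link in links
        if any(p.search(link.text) for p in patterns)
    ]


links = [
    Link("http://example.com/page/2", "Next page"),
    Link("http://example.com/about", "About us"),
]
next_only = filter_by_text(links, r"Next")
```

In Scrapy itself the argument is passed to the link extractor constructor, along the lines of `LinkExtractor(restrict_text=r"Next")`; the helpers above exist only to make the matching rule concrete.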
@matthieucham force-pushed the feature-filteringlinkextractor-restrict-text branch from 2771d7d to e3b1525 on Feb 28, 2019
@matthieucham reopened this on Mar 1, 2019
@kmike changed the title from "Feature filteringlinkextractor restrict text" to "[MRG+1] Feature filteringlinkextractor restrict text" on Mar 14, 2019
@kmike (Member) commented Mar 14, 2019

Thanks @matthieucham! I like the feature, and the implementation looks good 👍

@Gallaecio merged commit 1f7413d into scrapy:master on Mar 15, 2019
3 checks passed
@matthieucham deleted the feature-filteringlinkextractor-restrict-text branch on Mar 27, 2019
@kmike added this to the v1.7 milestone on Jul 18, 2019