
Use safe_url_string in link extraction #4321

Merged
merged 3 commits into from Feb 19, 2020


Gallaecio (Member) commented Feb 7, 2020

Fixes #1403, fixes #998 (see the linked comment on #998), closes #1949

There are two changes to expected test output:

  • One test change adjusts expectations to match those reported in a comment on #1949, in line with the issue being fixed.
  • The other test change reflects link extractors percent-encoding whitespace again, something they stopped doing in 1.4.0. I don’t think this is a major issue; in fact, I find it odd that everything else in the path was quoted except whitespace, so this may even be a change for the better in terms of consistency.
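For illustration, a minimal sketch of what percent-encoding whitespace in a link path looks like, using only the Python standard library. `quote_path` is a hypothetical helper, not w3lib's `safe_url_string` and not Scrapy's code; the set of characters it keeps unescaped is an assumption chosen so that path structure and existing escapes survive.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_path(url: str) -> str:
    """Hypothetical helper: percent-encode unsafe path characters,
    including spaces, without double-encoding existing escapes."""
    parts = urlsplit(url)
    # "%" is kept safe so sequences like %20 are not re-encoded as %2520;
    # "/" and common sub-delimiters are kept so the path structure survives.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    # This sketch only touches the path; query and fragment pass through as-is.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(quote_path("http://example.com/some page/ü.html"))
# http://example.com/some%20page/%C3%BC.html
```

With whitespace quoted like every other unsafe character, the path of an extracted link is escaped consistently.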

scrapy/linkextractors/lxmlhtml.py (outdated, resolved)
canonicalize_url changes links in undesirable ways.
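For context on this review comment: w3lib's `canonicalize_url` normalizes URLs for comparison, and by default that includes sorting query arguments and removing the fragment. The sketch below imitates only those two steps with the standard library; `canonicalize_like` is a hypothetical name and this is not w3lib's implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonicalize_like(url: str) -> str:
    """Hypothetical approximation of two canonicalization steps:
    sort query arguments (by key, then value) and drop the fragment."""
    parts = urlsplit(url)
    # Sorting (key, value) tuples orders by key first, then by value.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # An empty fragment removes "#..." from the rebuilt URL.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


link = "http://example.com/do?c=3&b=5&a=1#section-2"
print(canonicalize_like(link))
# http://example.com/do?a=1&b=5&c=3
```

A link extractor returning this form would report a URL the page never contained, which matters for sites that give meaning to argument order or fragments; `safe_url_string`, by contrast, does not reorder query arguments or drop the fragment.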
Gallaecio changed the title from Use canonicalize_url in link extraction to Use safe_url_string in link extraction Feb 12, 2020

codecov bot commented Feb 12, 2020

Codecov Report

Merging #4321 into master will decrease coverage by 0.17%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4321      +/-   ##
==========================================
- Coverage   83.95%   83.77%   -0.18%     
==========================================
  Files         165      165              
  Lines        9893     9907      +14     
  Branches     1469     1472       +3     
==========================================
- Hits         8306     8300       -6     
- Misses       1334     1353      +19     
- Partials      253      254       +1     
Impacted Files Coverage Δ
scrapy/robotstxt.py 75.30% <0.00%> (-22.23%) ⬇️
scrapy/utils/defer.py 95.65% <0.00%> (-1.85%) ⬇️
scrapy/http/response/text.py 100.00% <0.00%> (ø) ⬆️
scrapy/http/response/__init__.py 94.11% <0.00%> (ø) ⬆️
scrapy/utils/signal.py 93.47% <0.00%> (+0.14%) ⬆️
scrapy/utils/project.py 75.43% <0.00%> (+0.43%) ⬆️

kmike requested a review from wRAR Feb 13, 2020
wRAR approved these changes Feb 19, 2020
wRAR merged commit 528b894 into scrapy:master Feb 19, 2020
2 checks passed
3 participants