Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Do not fail on canonicalizing URLs with wrong netlocs #2038

merged 1 commit into from Jul 8, 2016


Copy link

@redapple redapple commented Jun 8, 2016

Fixes #2010

Not quite sure what to do with non-ASCII chars in netlocs that do not convert well with IDNA rules.
Left the netloc as is for now.

Copy link

@codecov-io codecov-io commented Jun 8, 2016

Current coverage is 83.34%

Merging #2038 into master will increase coverage by <.01%

Powered by Codecov. Last updated by b7925e4...1aec520

@redapple redapple changed the title Do not fail on canonicalizing URLs with wrong netlocs [MRG] Do not fail on canonicalizing URLs with wrong netlocs Jun 14, 2016
@redapple redapple added this to the v1.1.1 milestone Jun 14, 2016
@kmike kmike merged commit b7553d9 into scrapy:master Jul 8, 2016
2 checks passed
2 checks passed
codecov/patch 100% of diff hit (target 100%)
continuous-integration/travis-ci/pr The Travis CI build passed
redapple added a commit that referenced this pull request Jul 8, 2016
[backport][1.1] Do not fail on canonicalizing URLs with wrong netlocs (PR #2038)
@redapple redapple deleted the redapple:canonicalize-idna-failures branch Jul 8, 2016
Copy link

@kmike kmike commented Jul 13, 2016

This PR fixes canonicalize_url function, but it doesn't fix an issue in practice because of scrapy/w3lib#62. It now fails later, when an extracted link is used to create scrapy.Request, as scrapy.Request constructor calls safe_download_url.

Copy link
Contributor Author

@redapple redapple commented Jul 14, 2016

Good point. Have you spotted other places where this could break still?
PR up: scrapy/w3lib#63

Copy link

@kmike kmike commented Jul 14, 2016

It seems your PR should fix the issue, there are no other uses of idna encoding in w3lib.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.