Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove tab and newlines in url.safe_url_string #133

Merged
merged 4 commits into from
Aug 8, 2019

Conversation

elacuesta
Copy link
Member

See scrapy/scrapy#3855

Forgive me if this is an overly naïve approach, I must admit I'm not overly familiar with the formalities of URL definition and handling.

@codecov
Copy link

codecov bot commented Jul 11, 2019

Codecov Report

Merging #133 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
+ Coverage    95.3%   95.32%   +0.01%     
==========================================
  Files           7        7              
  Lines         469      471       +2     
  Branches       93       93              
==========================================
+ Hits          447      449       +2     
  Misses         15       15              
  Partials        7        7
Impacted Files Coverage Δ
w3lib/url.py 97.93% <100%> (+0.02%) ⬆️

w3lib/url.py Outdated Show resolved Hide resolved
w3lib/url.py Outdated Show resolved Hide resolved
@elacuesta
Copy link
Member Author

@kmike Applied both suggestions, thank you!

@elacuesta elacuesta changed the title Remove tab and newlines from URLs Remove tab and newlines in url.safe_url_string Jul 11, 2019
tests/test_url.py Outdated Show resolved Hide resolved
@@ -34,9 +34,12 @@ def _quote_byte(error):

_safe_chars = RFC3986_RESERVED + RFC3986_UNRESERVED + EXTRA_SAFE_CHARS + b'%'

_ascii_tab_newline_re = re.compile(r'[\t\n\r]') # see https://infra.spec.whatwg.org/#ascii-tab-or-newline
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are linking to the ‘living standard’, which changes constantly, I think we should not link directly to it, but through the Wayback Machine. Both here and on the docstring below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wayback Machine is being blocked and unblocked in Russia, so I'm fine with whatwg link :)

Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @elacuesta!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants