Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Post#url and Page#url escape #1568

Merged
merged 3 commits into from Apr 2, 2014

Conversation

Projects
None yet
5 participants
@nitoyon
Copy link
Contributor

nitoyon commented Sep 22, 2013

Post#url escape

Post#url was escaped using CGI.escape.
When file name contains a space character, its url points to
non-existing URL.

For example, when we have a post named 2013-01-02-foo bar.md,
we expect its url to be /2013/01/02/foo%20bar.html,
but it was actually /2013/01/02/foo+bar.html.

We now use URI.escape.

Page#url escape

Post#url wasn't escaped at all.

For example, when we have a page named a#b.html,
we expect its url to be a%23b.html,
but it was actually a#b.html.

We now use URI.escape.

nitoyon added a commit to nitoyon/tech.nitoyon.com that referenced this pull request Sep 22, 2013

tags plugin: URI.escape fix
Delete quick fix for jekyll/jekyll#1211 and add quick fix for
jekyll/jekyll#1568.
@parkr

This comment has been minimized.

Copy link
Member

parkr commented Sep 29, 2013

Curious that we never handled this before. What sort of backwards-incompatibilities will this present?

/cc @benbalter @mattr-

@benbalter

This comment has been minimized.

Copy link
Contributor

benbalter commented Sep 30, 2013

It looks like this may fix a regression?

I'd think on most servers this would be handled transparently, so no backwards compatibility, no? (other than say url matching or something in liquid)

@mattr-

This comment has been minimized.

Copy link
Member

mattr- commented Oct 1, 2013

I believe this is safe and doesn't present any problems with backwards incompatibilities as we seem to be doing the unescaping at the proper points in the code.

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Oct 4, 2013

I'm 👍 on this, then.

@nitoyon

This comment has been minimized.

Copy link
Contributor Author

nitoyon commented Oct 21, 2013

Can you merge this pull request?

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Oct 21, 2013

@benbalter Curious that the test discussed in #7 is still passing.

@nitoyon I see in the tests you offer a change, rather than addition. I'd love to make sure all the other tests (simpler) remain the same and the added conditions are added rather as new tests.

@nitoyon

This comment has been minimized.

Copy link
Contributor Author

nitoyon commented Oct 23, 2013

@parkr OK. I'll fix it later. I won't modify existing tests.

@nitoyon

This comment has been minimized.

Copy link
Contributor Author

nitoyon commented Nov 5, 2013

@parkr I fixed tests and rebased.😄

@parkr parkr closed this Mar 17, 2014

@parkr parkr reopened this Mar 17, 2014

@parkr

This comment has been minimized.

Copy link
Member

parkr commented Mar 17, 2014

I would really rather see any URL-related things happen in the URL class instead of in Page or Post. Could you please update your PR accordingly? Thanks!

@nitoyon

This comment has been minimized.

Copy link
Contributor Author

nitoyon commented Mar 19, 2014

@parkr I moved URI.escape in Page or Post class to Jekyll::URL.

@@ -135,7 +135,7 @@ def relative_path
#
# Returns the destination file path String.
def destination(dest)
path = Jekyll.sanitized_path(dest, url)
path = Jekyll.sanitized_path(dest, URI.unescape(url))

This comment has been minimized.

@parkr

parkr Mar 20, 2014

Member

How will this break current sites?

This comment has been minimized.

@nitoyon

nitoyon Mar 21, 2014

Author Contributor

Won't break anything. URLs are now escaped properly and destination file path doesn't change.

@@ -260,7 +260,7 @@ def render(layouts, site_payload)
# Returns destination file path String.
def destination(dest)
# The url needs to be unescaped in order to preserve the correct filename
path = Jekyll.sanitized_path(dest, CGI.unescape(url))
path = Jekyll.sanitized_path(dest, URI.unescape(url))

This comment has been minimized.

@parkr

parkr Mar 20, 2014

Member

Also here – how does this modify current behaviour such that current links on Jekyll sites would break?

This comment has been minimized.

@nitoyon

nitoyon Mar 21, 2014

Author Contributor

I found a bug in my patch. We should escape ?, but URI.escape doesn't escape it. I'll fix later.

nitoyon added some commits Nov 5, 2013

Fix Post#url escape
Post#url was escaped using CGI.escape.
When file name contains a space character, its url points to
non-existing URL.

For example, when we have a post named '2014-01-02-foo bar.md',
we expect its url to be '/2014/01/02/foo%20bar.html',
but it was actually '/2014/01/02/foo+bar.html'.

We now define Jekyll::URL.escape_path and Jekyll::URL.unescape_path,
and use them to escape and unescape Post#url
Fix Page#url escape
Post#url wasn't escaped at all.

For example, when we have a page named 'a#b.html',
we expect its url to be 'a%23b.html',
but it was actually 'a#b.html'.

We now use Jekyll::URL.escape_path and Jekyll::URL.unescape_path.
@nitoyon

This comment has been minimized.

Copy link
Contributor Author

nitoyon commented Mar 21, 2014

In RFC 3986, URI path segment is defined as follows:

segment       = *pchar
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
               / "*" / "+" / "," / ";" / "="

CGI.escape converts (see lib/cgi/util.rb):

  • [^ a-zA-Z0-9_.-] to hex
  • white space to +

URI.escape converts (see lib/uri/common.rb):

  • [^;/?:@&=+$,\[\]\-_.!~*'()#{ALNUM}] to hex

Jekyll::URL.escape_path converts:

  • [^a-zA-Z\d\-._~!$&\'()*+,;=:@\/] to hex

Comparison

character CGI.escape URI.escape Jekyll::URL.escape_path path segment (RFC 3986)
a-zA-Z0-9_.- nop nop nop nop or hex
;:@&=+$,!~*'() hex nop nop nop or hex
/ hex nop nop hex (BUT, never apears in path segment)
?[] hex nop hex hex
white space + %20 %20 %20
others hex hex hex hex

With this PR, ;:@&=+$,!~*'() in Post#url is no longer escaped, which is allowed in RFC 3986.

parkr added a commit that referenced this pull request Apr 2, 2014

@parkr parkr merged commit 806f43c into jekyll:master Apr 2, 2014

1 check passed

default The Travis CI build passed
Details
@parkr

This comment has been minimized.

Copy link
Member

parkr commented Apr 2, 2014

@gregose a bit after the fact, but could you please give this a once-over? Your 👀 are appreciated for 🔒's sake. ❤️

parkr added a commit that referenced this pull request Apr 2, 2014

@nitoyon nitoyon deleted the nitoyon:url-escape branch Apr 3, 2014

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
You can’t perform that action at this time.