New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always normalize the result of the relative_url filter #6185

Merged
merged 4 commits into from Jul 1, 2017

Conversation

Projects
None yet
5 participants
@benbalter
Contributor

benbalter commented Jun 28, 2017

Since #6137, baseurl defaults to nil.

That means, for the first time, this code path is excercised (since we weren't checking to_s.empty? before. This has two implications.

First, URL's aren't normalized, which is an inconsistent behavior.

Second, if baseurl isn't set, in Jekyll >= 3.4.4, the relative_url filter will return the result of the page's url method by reference, which is @url, rather than a new string. This continue to be true even if you call to_s on the input or output (I suspect because @url is already a string, to_s has some sort of return thing if thing.is_a? String check).

Imagine the following scenario: You have a generator with the following:

class Generator < Jekyll::Generator
  def generate(site)
    site.pages.each do |page| 
      page.scan(/\[\]\(.*\)/).each |link| do
        do_something(link, page.url) 
      end
    end
  end

  def do_something(link, page_url)
    page_url << "#foo"
    Jekyll.logger.info "This Page's URL with '#foo' added: ", url
  end
end

Obviously you wouldn't do this exactly, but you can see how the behavior will be different depending on if baseurl is set or not. If it's set, the output would be as expected. If baseurl isn't set, do_something will be acting on the Page's @url instance variable, meaning every iteration will add an additional #foo to the URL (and affect the page's output when it goes to get written).

This PR removes the guard statement, and instead, always runs the output of relative_url through Addressable parse and normalize methods, ensuring we're always returning a new string, and other processes don't inadvertently mangle the Page's URL or output path.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Jun 28, 2017

Member

Can we test the raw behavior in the tests directly? It'd be great to be able to be able to ensure this doesn't happen again.

Member

parkr commented Jun 28, 2017

Can we test the raw behavior in the tests directly? It'd be great to be able to be able to ensure this doesn't happen again.

@benbalter

This comment has been minimized.

Show comment
Hide comment
@benbalter

benbalter Jun 28, 2017

Contributor

I added a test via 8de4728. The test fails on 3.4 (#6186), but does not fail on master or 3.5. On 3.5, we run the nil? check on santized_baseurl, which itself, calls site.config["baseurl"].to_s, meaning sanitized_baseurl will never be nil and the condition of the early return will never be satisfied (that's not to say it may be nil someday, so I added the test here as well, to prevent a regression).

Contributor

benbalter commented Jun 28, 2017

I added a test via 8de4728. The test fails on 3.4 (#6186), but does not fail on master or 3.5. On 3.5, we run the nil? check on santized_baseurl, which itself, calls site.config["baseurl"].to_s, meaning sanitized_baseurl will never be nil and the condition of the early return will never be satisfied (that's not to say it may be nil someday, so I added the test here as well, to prevent a regression).

Addressable::URI.parse(
ensure_leading_slash(sanitized_baseurl) + ensure_leading_slash(input.to_s)
parts.compact.map { |part| ensure_leading_slash(part.to_s) }.join

This comment has been minimized.

@ashmaroli

ashmaroli Jun 29, 2017

Member

can we have at least one mutating method here instead of two non-mutating methods?
i.e. either

parts.compact!.map { }

or

parts.compact.map! { }
@ashmaroli

ashmaroli Jun 29, 2017

Member

can we have at least one mutating method here instead of two non-mutating methods?
i.e. either

parts.compact!.map { }

or

parts.compact.map! { }

This comment has been minimized.

@parkr

parkr Jun 29, 2017

Member

Why?

@parkr

parkr Jun 29, 2017

Member

Why?

This comment has been minimized.

@ashmaroli

ashmaroli Jun 29, 2017

Member

Because parts.compact returns a copy of parts: parts'
and map will return another copy of parts': parts''

in effect loading memory with unnecessary objects (at each iteration in the example cited in this PR)

@ashmaroli

ashmaroli Jun 29, 2017

Member

Because parts.compact returns a copy of parts: parts'
and map will return another copy of parts': parts''

in effect loading memory with unnecessary objects (at each iteration in the example cited in this PR)

This comment has been minimized.

@benbalter

benbalter Jun 29, 2017

Contributor

Wouldn't it be GC'd when relative_url returns since it's no longer referenced?

@benbalter

benbalter Jun 29, 2017

Contributor

Wouldn't it be GC'd when relative_url returns since it's no longer referenced?

This comment has been minimized.

@ashmaroli

ashmaroli Jun 29, 2017

Member

Most probably.. Well, I guess, since its a local variable....... Oh well, nevermind...

@ashmaroli

ashmaroli Jun 29, 2017

Member

Most probably.. Well, I guess, since its a local variable....... Oh well, nevermind...

This comment has been minimized.

@parkr

parkr Jun 30, 2017

Member

If it becomes a big performance issue, we can fix it then. 😄

@parkr

parkr Jun 30, 2017

Member

If it becomes a big performance issue, we can fix it then. 😄

@parkr

parkr approved these changes Jul 1, 2017

Addressed by the PR author.

@parkr

This comment has been minimized.

Show comment
Hide comment
@parkr

parkr Jul 1, 2017

Member

@jekyllbot: merge +fix

Member

parkr commented Jul 1, 2017

@jekyllbot: merge +fix

@jekyllbot jekyllbot merged commit 9f78157 into master Jul 1, 2017

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jekyllbot jekyllbot deleted the normalize-relative-path-3-5 branch Jul 1, 2017

jekyllbot added a commit that referenced this pull request Jul 1, 2017

parkr added a commit that referenced this pull request Jul 1, 2017

@@ -23,9 +23,9 @@ def absolute_url(input)
# Returns a URL relative to the domain root as a String.
def relative_url(input)
return if input.nil?
return ensure_leading_slash(input.to_s) if sanitized_baseurl.nil?
parts = [sanitized_baseurl, input]

This comment has been minimized.

@parkr

parkr Jul 1, 2017

Member

This method,sanitized_baseurl, always returns a string, which means parts.compact is a no-op. This appears to invalidate our assumptions here?

@parkr

parkr Jul 1, 2017

Member

This method,sanitized_baseurl, always returns a string, which means parts.compact is a no-op. This appears to invalidate our assumptions here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment