New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP - Improve permalink generation for URLs with special characters #944

Merged
merged 7 commits into from Aug 28, 2013

Conversation

Projects
None yet
7 participants
@x3ro
Contributor

x3ro commented Apr 9, 2013

This PR is a WIP attempt to solve #782 馃槃 Since I'm not sure what the best possible way to proceed is, I'm posting it to get some feedback.

I've already refactored everything I found that is related to URL generation into the URL module, which is now included in Post and Page. Those two only specify what placeholders they'd like to have replaced by means of url_placeholders and the URL#url method does the rest. Note that in the current url_placeholders method there is still some URL escaping going on, which I'll still refactor).

In the current state, all tests still pass, and the next step would be to actually alter/improve the current URL generation, which, as described in #782, has a few shortcomings regarding special characters.

The general question is, how should the generated URLs look, and is it acceptable to make breaking changes in permalink generation. I've already played around a bit, and found Stringex, which seems to generate fairly useful output, but using it would break the current URL generation tests, since some of the output is simply different (for example, it converts everything to lowercase, always). A more obvious downside is the fact that it tries to do stuff like this though:

"10% off if you act now".to_url => "10-percent-off-if-you-act-now"

While awesome, this creates the need to specify a locale for permalink generation, which might one could consider overkill.

I'd really like to hear your opinions on this one.

Refactor URL processing/generation into separate module
This is done to prepare for improved permalink generation
for URLs containing special characters, as proposed in
issue #782
@x3ro

This comment has been minimized.

Contributor

x3ro commented May 29, 2013

I still think that this could be merged even if the decision on how to modify permalinks will not be made anytime soon. As it currently stands, this PR is basically refactoring, with the advantage of allowing own url_placeholders to be defined 馃槃

# Returns a sanitized String URL
def sanitize_url(in_url)
# Remove all double slashes

This comment has been minimized.

@kelvinst

kelvinst Jul 10, 2013

Just a self-opinion, some comments (especially those that only explain what the code does) are unnecessary, like this and the others below...

This comment has been minimized.

@mattr-

mattr- Jul 10, 2013

Member

Technically, you're right, they're unnecessary. However, we've chosen to
employ TomDoc for our internal documentation and these comments serve as
TomDoc formatting documentation.

On Wed, Jul 10, 2013 at 1:27 PM, Kelvin Raffael Stinghen <
notifications@github.com> wrote:

In lib/jekyll/url.rb:

  • def url
  •  @url ||= sanitize_url(permalink || generate_url)
    
  • end
  • Generate the URL by replacing all placeholders with their respective values

  • Returns the unsanitizied String URL

  • def generate_url
  •  url_placeholders.inject(template) { |result, token|
    
  •    result.gsub(/:#{token.first}/, token.last)
    
  •  }
    
  • end
  • Returns a sanitized String URL

  • def sanitize_url(in_url)
  •  # Remove all double slashes
    

Just a self-opinion, some comments (especially those that only explain
what the code does) are unnecessary like this and the others below...


Reply to this email directly or view it on GitHubhttps://github.com//pull/944/files#r5122795
.

This comment has been minimized.

@kelvinst

kelvinst Jul 10, 2013

OK, was just an opinion, just when I get a code with too much inline comments (comments like "now I'll do that" and "now that"), my first thought was "this code is as incomprehensible as well?"... But in this case is just a single case, it's just that I've seen worse things, and considered a bad practice...

This comment has been minimized.

@x3ro

x3ro Jul 10, 2013

Contributor

In my opinion, it depends on the complexity of the lines being explained. In this concrete case, I like how I can read through the code like this:

  1. Remove all double slashes
  2. Remove every URL segment that consists solely of dots
  3. Append a trailing slash to the URL if the unsanitized URL had one

It is, again in my opinion, way nicer than having to read this:

  1. url = in_url.gsub(/\/\//, "/")
  2. url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
  3. url += "/" if in_url =~ /\/$/

(especially for the second line)

This comment has been minimized.

@kelvinst

kelvinst Jul 11, 2013

Yep, nice... It's pretty more readable, I'm agreed

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

Maybe each of these could be split into separate descriptive methods and put together to achieve the sanitized URL? String sanitation is a notoriously ugly process and maybe breaking things out into separate private methods could aid in elucidating the process.

This comment has been minimized.

@kelvinst

kelvinst Jul 26, 2013

I has thought in this suggestion too, but, in my thoughts, each extracted method would be very tiny... But, thinking a little more now, I prefer three separeted tiny methods that will be used only one time than "inline comments"... Good suggestion (= Thanks @parkr

This comment has been minimized.

@maul-esel

maul-esel Jul 26, 2013

Contributor

This goes in a similar direction as the discussion in #1341, partly realized here. It's intended for paths, but it works for URLs as well (there's scarcely any difference between a relative URL and a relative path anyway).

@kelvinst

This comment has been minimized.

kelvinst commented Jul 10, 2013

So @x3ro, I'd take a look on your change, seems like it's only an refactor right? The changes doesn't really change the operation and doesn't change any special characters, right?

I'd like the refactors and it's good enough to merge for me...

@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 10, 2013

Yes, as it stands its only a refactor. It was done with improving permalinks in mind (see #782), but none of that is contained in this PR 馃槃

@kelvinst

This comment has been minimized.

kelvinst commented Jul 10, 2013

So @mattr- and @parkr, what do you think about merge this? I'd revised the code and don't detected any change to the bahavior too... maybe this will improve code climate a little more

@mattr-

This comment has been minimized.

Member

mattr- commented Jul 10, 2013

Can this be updated to the latest version of the master branch? We've made some changes in this area of the code lately and can't merge it without an update. Thanks!

@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 11, 2013

I'll see if I can get that done today :)

@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 11, 2013

Unfortunately, the current master has a couple of tests failing, and bisect points to bd0e45c as the offending commit (ping @parkr)

I've updated my local branch, but I'd rather have those failing tests fix so that I can make sure everything is okay on my side 馃槃

Edit: Hmm, that commit seems a little old, though, maybe it's the wrong one. Tests currently failing do so because of NameError: uninitialized constant Jekyll::Pager::Pathname, e.g.

      1) Error:
test: Pager should determine the pagination path. (TestPager):
NameError: uninitialized constant Jekyll::Pager::Pathname
@parkr

This comment has been minimized.

Member

parkr commented Jul 11, 2013

@x3ro On the latest master, all tests pass for me and seem to be fine on Travis. What version of Ruby are you using? Try adding require 'pathname' to lib/jekyll.rb under # stdlib and try again.

@maul-esel

This comment has been minimized.

Contributor

maul-esel commented Jul 11, 2013

So I'm not the only one. After another merge of master, this recently occured again on my side as well - miraculously fixed the next day. What sorcery is this?

@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 11, 2013

@parkr Fascinatingly enough, the problem went away after upgrading to 1.9.3p448... Should've tried that before bugging you guys ;)

Merge branch 'master' into permalink-special-characters
Conflicts:
	lib/jekyll/page.rb
	lib/jekyll/post.rb
@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 14, 2013

@mattr- Finally got this done! :shipit:

@mattr-

This comment has been minimized.

Member

mattr- commented Jul 15, 2013

LGTM. @parkr?

File.join(@dir, self.data['permalink'])
else
self.data['permalink']
end

This comment has been minimized.

@maul-esel

maul-esel Jul 23, 2013

Contributor

May I ask, what effect do these changes have?

This comment has been minimized.

@x3ro

x3ro Jul 23, 2013

Contributor

If only I knew. The logic was introduced by @parkr in 1f23bc4 and 0e82b4e - I just moved it away from its current location.

This comment has been minimized.

@maul-esel

maul-esel Jul 23, 2013

Contributor

Oh sorry, I didn't see that.

This comment has been minimized.

@parkr

parkr Jul 23, 2013

Member

In jekyll pre-v1.0, you could specify a permalink as relative to its dir. Pretty cool actually

# Generate the URL by replacing all placeholders with their respective values
#
# Returns the _unsanitizied_ String URL

This comment has been minimized.

@maul-esel

maul-esel Jul 23, 2013

Contributor

Little typo here.

@parkr

This comment has been minimized.

Member

parkr commented Jul 23, 2013

I think I'd prefer it if it weren't a module which was included, but rather a class that was instantiated with the various options and a resource.

@kelvinst

This comment has been minimized.

kelvinst commented Jul 23, 2013

Agreed, that way the code will less intrusive, cause doesn't need to remember to implement the url_placeholder method (this placeholders hash can be passed through parameters in the initializer of the instantiated object)

馃憤 to @parkr idea

@parkr

This comment has been minimized.

parkr commented on f5d0be9 Jul 26, 2013

I 鉂わ笍 this refactor. Thanks!

@url += "/" if url =~ /\/$/
@url.gsub!(/\A([^\/])/, '/\1')
@url
# See url.rb for an explanation

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

Maybe this comment could describe what it's providing for the URL class?

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

We're using TomDoc - please follow the guidelines there. :)

This comment has been minimized.

@x3ro

x3ro Jul 26, 2013

Contributor

Wouldn't it be better to keep the docs in the url.rb file in order to avoid duplication?

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

This method isn't in your url.rb though. If I come across this method, I want to know what it's about, and it's only ever used in #url, there isn't much for me to go from. I want to know what this method is doing and this comment says "go figure it out in this other file, best of luck"

def generate_url
@placeholders.inject(@template) { |result, token|
result.gsub(/:#{token.first}/, token.last)
}

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

We generally prefer do ... end for blocks on multiple lines. :)

# the placeholder ":year" with the current year.
# :permalink - If supplied, no URL will be generated from the
# template. Instead, the given permalink will be
# used as URL.

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

馃憤 AWESOME comment.

#
# URL.new({
# :template => /:categories/:title.html",
# :placeholders => {:categories => "ruby", :title => "something"}

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

Should these Symbol keys be String keys?

:placeholders => { "categories" => "ruby", "title" => "something" }

This comment has been minimized.

@x3ro

x3ro Jul 26, 2013

Contributor

Hmm... I used symbols since it lined up nicely with the syntax used in the template string, but you're right that the url_placeholders methods use strings. Is there an advantage of using strings over symbols?

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

The only difference I can think of is saving memory and CPU cycles to convert. In Ruby < 2.0, it's just easier and more efficient to use Strings.

This comment has been minimized.

@jpiasetz

jpiasetz Jul 29, 2013

Contributor

@parkr really? I thought the opposite was true. Strings are fat and symbols are fast. This in 1.9.3 is almost 5 times faster with symbols.

require 'benchmark'

str = Benchmark.measure do
  10_000_000.times do
    "test"
  end
end.total

sym = Benchmark.measure do
  10_000_000.times do
    :test
  end
end.total

puts "String: " + str.to_s
puts "Symbol: " + sym.to_s
puts

This comment has been minimized.

@parkr

parkr Jul 29, 2013

Member

Ok, go for symbols!

This comment has been minimized.

@x3ro

x3ro Jul 29, 2013

Contributor

I will. Thanks for the clarification @jpiasetz :)

This comment has been minimized.

@kelvinst

kelvinst Jul 29, 2013

I had the same question @jpiasetz, thank you to take a time and test this

In lib/jekyll/url.rb:

@@ -0,0 +1,67 @@
+# Public: Methods that generate a URL for a resource such as a Post or a Page.
+#
+# Examples
+#
+# URL.new({
+# :template => /:categories/:title.html",
+# :placeholders => {:categories => "ruby", :title => "something"}
I will. Thanks for the clarification @jpiasetz :)


Reply to this email directly or view it on GitHub.

@parkr

This comment has been minimized.

Member

parkr commented Jul 26, 2013

Looks like a mixup between Strings and Symbols as keys for the placeholders Hash. :)

:placeholders => url_placeholders,
:permalink => permalink
}).to_s
end

This comment has been minimized.

@maul-esel

maul-esel Jul 26, 2013

Contributor

Downside of class vs. module is that this code is duplicated now in Page and Post. Any way to avoid this?

This comment has been minimized.

@parkr

parkr Jul 26, 2013

Member

Pass in the page/post as the arg and call the necessary methods? I'd prefer to keep the URL class as dumb as possible. I don't mind this duplication. Modules are good but in very small doses. :) I'd prefer to have the logic completely separated here.

This comment has been minimized.

@x3ro

x3ro Jul 26, 2013

Contributor

I agree with @parkr here I guess. I initially went with the Module approach, because that was how most of the Jekyll functionality seemed to be implemented.. The duplication in this case is relatively minor, and I think it's worth to put up with it, if the alternative is making the URL class aware of methods it has to call on some Post or Page object 馃槃

This comment has been minimized.

@maul-esel

maul-esel Jul 26, 2013

Contributor

Yeah, passing the post and page is not a good alternative. So the best way seems to leave it that way, at least for now.

This comment has been minimized.

@kelvinst

kelvinst Jul 26, 2013

Agreed with @parkr, to avoid black magic into URL, I think that's the best way...

Maybe, if you want to simplify, create a factory or builder class with a method for generate the URL can help, you can pass the page or the post as parameter for it and access the necessary methods, since the both has these three methods.

Buuuut, I didn't like the previous solution because nothing ensure us that the three methods really exists. Some days ago #1300 added a refactor that extracts an main class from Page and Post, with that changes, I think the factory was pretty more "secure".

@x3ro

This comment has been minimized.

Contributor

x3ro commented Jul 31, 2013

Updated. I didn't modify the way the URL class is invoked, though. @kelvinst suggested a factory, which is a great idea. However, I think that in this particular case it's overkill, since people trying to get the URL of a page or post will invoke the url method on these classes (at least I cannot think of a scenario where one wouldn't want to do that). For every other use case, the constructor would've to be used anyway 馃槃

@kelvinst

This comment has been minimized.

kelvinst commented Aug 1, 2013

Very nice @x3ro (:
I was thinking in maybe only a method factory, but this is good enough for me too.

@kelvinst

This comment has been minimized.

kelvinst commented Aug 1, 2013

And I agreed that a factory class is a big overkill for this.

@parkr

This comment has been minimized.

Member

parkr commented Aug 3, 2013

(Travis failure is unrelated.)

@parkr

This comment has been minimized.

Member

parkr commented Aug 3, 2013

This PR LGTM. @mattr-?

@x3ro

This comment has been minimized.

Contributor

x3ro commented Aug 23, 2013

馃崓 ?

@parkr

This comment has been minimized.

Member

parkr commented Aug 26, 2013

Just need to hear from @mattr-.

@mattr-

This comment has been minimized.

Member

mattr- commented Aug 27, 2013

i'll review it tomorrow. need to 馃挙 now.

@mattr-

This comment has been minimized.

Member

mattr- commented Aug 27, 2013

馃憤 :shipit:

parkr added a commit that referenced this pull request Aug 28, 2013

Merge pull request #944 from x3ro/permalink-special-characters
WIP - Improve permalink generation for URLs with special characters

@parkr parkr merged commit 0d890e4 into jekyll:master Aug 28, 2013

1 check failed

default The Travis CI build failed
Details

@x3ro x3ro deleted the x3ro:permalink-special-characters branch Oct 11, 2013

@jekyll jekyll locked and limited conversation to collaborators Feb 27, 2017

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.