WIP - Improve permalink generation for URLs with special characters #944

Merged
parkr merged 7 commits into jekyll:master from x3ro:permalink-special-characters

6 participants

@x3ro

This PR is a WIP attempt to solve #782 :smile: Since I'm not sure what the best possible way to proceed is, I'm posting it to get some feedback.

I've already refactored everything I found that is related to URL generation into the URL module, which is now included in Post and Page. Those two only specify which placeholders they'd like to have replaced by means of url_placeholders, and the URL#url method does the rest. (Note that the current url_placeholders method still does some URL escaping, which I still intend to refactor.)
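To make the division of labour concrete, here is a minimal sketch of the module-based approach described above. The names URLSketch and PageSketch are hypothetical stand-ins, not the actual Jekyll classes, and the sanitizing step is left out for brevity.

```ruby
# Hypothetical sketch of the module-based approach; URLSketch and
# PageSketch are illustrative names, not Jekyll classes.
module URLSketch
  # Memoized URL: an explicit permalink wins over the generated one.
  def url
    @url ||= permalink || generate_url
  end

  # Replace every ":name" placeholder in the template with its value.
  def generate_url
    url_placeholders.inject(template) do |result, (name, value)|
      result.gsub(/:#{name}/, value)
    end
  end
end

class PageSketch
  include URLSketch

  attr_reader :permalink

  def initialize(dir, basename, output_ext, permalink = nil)
    @dir = dir
    @basename = basename
    @output_ext = output_ext
    @permalink = permalink
  end

  def template
    "/:path/:basename:output_ext"
  end

  # The including class only declares which placeholders it wants replaced.
  def url_placeholders
    { "path" => @dir, "basename" => @basename, "output_ext" => @output_ext }
  end
end

puts PageSketch.new("docs", "about", ".html").url
puts PageSketch.new("docs", "about", ".html", "/custom/").url
```

The including class has to provide template, permalink and url_placeholders itself, which is the implicit contract that comes up later in the review.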

In the current state, all tests still pass, and the next step would be to actually alter/improve the current URL generation, which, as described in #782, has a few shortcomings regarding special characters.

The general question is, how should the generated URLs look, and is it acceptable to make breaking changes in permalink generation. I've already played around a bit, and found Stringex, which seems to generate fairly useful output, but using it would break the current URL generation tests, since some of the output is simply different (for example, it converts everything to lowercase, always). A more obvious downside is the fact that it tries to do stuff like this though:

"10% off if you act now".to_url => "10-percent-off-if-you-act-now"

While awesome, this creates the need to specify a locale for permalink generation, which one might consider overkill.

I'd really like to hear your opinions on this one.

@x3ro x3ro Refactor URL processing/generation into separate module
This is done to prepare for improved permalink generation
for URLs containing special characters, as proposed in
issue #782
cfcbe1f
@x3ro

I still think that this could be merged even if the decision on how to modify permalinks will not be made anytime soon. As it currently stands, this PR is basically a refactoring, with the advantage of allowing custom url_placeholders to be defined :smile:

@kelvinst kelvinst commented on the diff
lib/jekyll/url.rb
((27 lines not shown))
+ def url
+ @url ||= sanitize_url(permalink || generate_url)
+ end
+
+ # Generate the URL by replacing all placeholders with their respective values
+ #
+ # Returns the _unsanitizied_ String URL
+ def generate_url
+ url_placeholders.inject(template) { |result, token|
+ result.gsub(/:#{token.first}/, token.last)
+ }
+ end
+
+ # Returns a sanitized String URL
+ def sanitize_url(in_url)
+ # Remove all double slashes

Just my opinion: some comments (especially those that only explain what the code does) are unnecessary, like this one and the others below...

@mattr- Owner
mattr- added a note

OK, it was just an opinion. When I come across code with too many inline comments (comments like "now I'll do this" and "now that"), my first thought is "is this code just as incomprehensible?"... But here it's only a single case; it's just that I've seen worse things, and I consider it a bad practice...

@x3ro
x3ro added a note

In my opinion, it depends on the complexity of the lines being explained. In this concrete case, I like how I can read through the code like this:

  1. Remove all double slashes
  2. Remove every URL segment that consists solely of dots
  3. Append a trailing slash to the URL if the unsanitized URL had one

It is, again in my opinion, way nicer than having to read this:

  1. url = in_url.gsub(/\/\//, "/")
  2. url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
  3. url += "/" if in_url =~ /\/$/

(especially for the second line)

Yep, nice... It's much more readable, I agree.

@parkr Owner
parkr added a note

Maybe each of these could be split into separate descriptive methods and put together to achieve the sanitized URL? String sanitation is a notoriously ugly process and maybe breaking things out into separate private methods could aid in elucidating the process.

I had thought of this suggestion too but, to my mind, each extracted method would be very tiny... Thinking about it a little more now, though, I prefer three separate tiny methods that are each used only once over "inline comments"... Good suggestion (= Thanks @parkr
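For illustration, splitting the three steps quoted above into small named methods could look roughly like this. SanitizerSketch and its method names are hypothetical and do not appear in the PR; the regular expressions are the ones from the quoted lines.

```ruby
# Hypothetical sketch: the three sanitizing steps as tiny named methods.
# SanitizerSketch and its method names are illustrative, not from the PR.
module SanitizerSketch
  module_function

  def sanitize_url(in_url)
    url = remove_dot_segments(collapse_double_slashes(in_url))
    restore_trailing_slash(url, in_url)
  end

  # Step 1: remove all double slashes.
  def collapse_double_slashes(url)
    url.gsub(/\/\//, "/")
  end

  # Step 2: remove every segment that consists solely of dots.
  def remove_dot_segments(url)
    url.split("/").reject { |part| part =~ /^\.+$/ }.join("/")
  end

  # Step 3: append a trailing slash if the unsanitized URL had one.
  def restore_trailing_slash(url, in_url)
    in_url =~ /\/$/ ? url + "/" : url
  end
end

puts SanitizerSketch.sanitize_url("/blog//../2013/./post/")
```

With the method names carrying the explanation, the inline comments become optional rather than necessary.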

This goes in a similar direction as the discussion in #1341, partly realized here. It's intended for paths, but it works for URLs as well (there's scarcely any difference between a relative URL and a relative path anyway).

@kelvinst

So @x3ro, I took a look at your change. It seems like it's only a refactor, right? The changes don't really alter the behavior and don't touch any special characters, right?

I like the refactor, and it's good enough to merge for me...

@x3ro

Yes, as it stands it's only a refactor. It was done with improving permalinks in mind (see #782), but none of that is contained in this PR :smile:

@kelvinst

So @mattr- and @parkr, what do you think about merging this? I've reviewed the code and didn't detect any change to the behavior either... maybe this will improve code climate a little more

@mattr-
Owner

Can this be updated to the latest version of the master branch? We've made some changes in this area of the code lately and can't merge it without an update. Thanks!

@x3ro

I'll see if I can get that done today :)

@x3ro

Unfortunately, the current master has a couple of tests failing, and bisect points to bd0e45c as the offending commit (ping @parkr)

I've updated my local branch, but I'd rather have those failing tests fixed so that I can make sure everything is okay on my side :smile:

Edit: Hmm, that commit seems a little old, though, maybe it's the wrong one. Tests currently failing do so because of NameError: uninitialized constant Jekyll::Pager::Pathname, e.g.

      1) Error:
test: Pager should determine the pagination path. (TestPager):
NameError: uninitialized constant Jekyll::Pager::Pathname
@parkr
Owner

@x3ro On the latest master, all tests pass for me and seem to be fine on Travis. What version of Ruby are you using? Try adding require 'pathname' to lib/jekyll.rb under # stdlib and try again.

@maul-esel

So I'm not the only one. After another merge of master, this recently occurred again on my side as well - miraculously fixed the next day. What sorcery is this?

@x3ro

@parkr Fascinatingly enough, the problem went away after upgrading to 1.9.3p448... Should've tried that before bugging you guys ;)

@x3ro x3ro Merge branch 'master' into permalink-special-characters
Conflicts:
	lib/jekyll/page.rb
	lib/jekyll/post.rb
2ac98a7
@x3ro

@mattr- Finally got this done! :shipit:

@mattr-
Owner

LGTM. @parkr?

@maul-esel maul-esel commented on the diff
lib/jekyll/page.rb
@@ -37,7 +38,12 @@ def dir
#
# Returns the String permalink or nil if none has been set.
def permalink
- self.data && self.data['permalink']
+ return nil if self.data.nil? || self.data['permalink'].nil?
+ if site.config['relative_permalinks']
+ File.join(@dir, self.data['permalink'])
+ else
+ self.data['permalink']
+ end

May I ask, what effect do these changes have?

@x3ro
x3ro added a note

If only I knew. The logic was introduced by @parkr in 1f23bc4 and 0e82b4e - I just moved it away from its current location.

Oh sorry, I didn't see that.

@parkr Owner
parkr added a note

In jekyll pre-v1.0, you could specify a permalink as relative to its dir. Pretty cool actually
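As a rough illustration of what the relative_permalinks branch in the diff does, the sketch below pulls the logic into a standalone helper. resolve_permalink is a hypothetical name used only here; the real code reads these values from self.data, @dir and site.config.

```ruby
# Hypothetical sketch of the relative_permalinks branch shown in the diff.
# resolve_permalink is an illustrative helper, not a Jekyll method.
def resolve_permalink(dir, permalink, relative_permalinks)
  return nil if permalink.nil?
  # With relative permalinks enabled, the permalink is joined onto the
  # directory of the page; otherwise it is used exactly as written.
  relative_permalinks ? File.join(dir, permalink) : permalink
end

puts resolve_permalink("/docs", "about/", true)
puts resolve_permalink("/docs", "about/", false)
```

So with the option enabled, a permalink of about/ on a page in /docs resolves to /docs/about/, while without it the permalink stays about/ (and only gets its leading slash during sanitizing).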

lib/jekyll/url.rb
((18 lines not shown))
+#
+#
+
+module Jekyll
+ module URL
+
+ # The generated relative url of this page. e.g. /about.html.
+ #
+ # Returns the String url.
+ def url
+ @url ||= sanitize_url(permalink || generate_url)
+ end
+
+ # Generate the URL by replacing all placeholders with their respective values
+ #
+ # Returns the _unsanitizied_ String URL

Little typo here.

@parkr
Owner

I think I'd prefer it if it weren't a module which was included, but rather a class that was instantiated with the various options and a resource.

@kelvinst

Agreed, that way the code will be less intrusive, because you don't need to remember to implement the url_placeholders method (the placeholders hash can be passed as a parameter to the initializer of the instantiated object)

:+1: to @parkr's idea

@parkr
Owner

I :heart: this refactor. Thanks!

lib/jekyll/page.rb
((27 lines not shown))
- # sanitize url
- @url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
- @url += "/" if url =~ /\/$/
- @url.gsub!(/\A([^\/])/, '/\1')
- @url
+ # See url.rb for an explanation
@parkr Owner
parkr added a note

Maybe this comment could describe what it's providing for the URL class?

@parkr Owner
parkr added a note

We're using TomDoc - please follow the guidelines there. :)

@x3ro
x3ro added a note

Wouldn't it be better to keep the docs in the url.rb file in order to avoid duplication?

@parkr Owner
parkr added a note

This method isn't in your url.rb though. If I come across this method, I want to know what it's about, and it's only ever used in #url, there isn't much for me to go from. I want to know what this method is doing and this comment says "go figure it out in this other file, best of luck"
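A TomDoc-style comment for such a method might read roughly as follows. This is only a sketch: it assumes the method under discussion is url_placeholders, uses the Symbol keys the PR eventually settled on, and PageDocSketch is a hypothetical stand-in class.

```ruby
# Hypothetical sketch only; PageDocSketch is not a Jekyll class.
class PageDocSketch
  def initialize(dir, basename, output_ext)
    @dir = dir
    @basename = basename
    @output_ext = output_ext
  end

  # Internal: Build the placeholder values used to fill in the permalink
  # template of this page.
  #
  # Examples
  #
  #   PageDocSketch.new("/docs", "about", ".html").url_placeholders
  #   # => { :path => "/docs", :basename => "about", :output_ext => ".html" }
  #
  # Returns a Hash mapping placeholder names (Symbols) to their String
  # replacements.
  def url_placeholders
    { :path => @dir, :basename => @basename, :output_ext => @output_ext }
  end
end

p PageDocSketch.new("/docs", "about", ".html").url_placeholders
```

The comment describes the method where it is defined, so a reader does not have to open url.rb to learn what the hash is for.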

lib/jekyll/url.rb
((33 lines not shown))
+
+ # The generated relative URL of the resource
+ #
+ # Returns the String URL
+ def to_s
+ sanitize_url(@permalink || generate_url)
+ end
+
+ # Internal: Generate the URL by replacing all placeholders with their
+ # respective values
+ #
+ # Returns the _unsanitizied_ String URL
+ def generate_url
+ @placeholders.inject(@template) { |result, token|
+ result.gsub(/:#{token.first}/, token.last)
+ }
@parkr Owner
parkr added a note

We generally prefer do ... end for blocks on multiple lines. :)
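For comparison, here are the two block styles side by side on the same placeholder replacement, as a small standalone sketch. The template and placeholder values are made up for illustration.

```ruby
# Illustrative values only; not taken from Jekyll.
template = "/:x/:y"
placeholders = { :x => "foo", :y => "bar" }

# Multi-line block with braces, as in the reviewed snippet.
braces = placeholders.inject(template) { |result, token|
  result.gsub(/:#{token.first}/, token.last)
}

# The same block with do ... end, the style preferred for multi-line blocks.
do_end = placeholders.inject(template) do |result, token|
  result.gsub(/:#{token.first}/, token.last)
end

puts braces
puts do_end
```

Both forms produce /foo/bar here. The two are not always interchangeable, though: braces bind more tightly than do ... end, so in a call such as puts list.map do ... end the block is passed to puts rather than to map.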

@parkr parkr commented on the diff
lib/jekyll/url.rb
((8 lines not shown))
+# }).to_s
+#
+module Jekyll
+ class URL
+
+ # options - One of :permalink or :template must be supplied.
+ # :template - The String used as template for URL generation,
+ # for example "/:path/:basename:output_ext", where
+ # a placeholder is prefixed with a colon.
+ # :placeholders - A hash containing the placeholders which will be
+ # replaced when used inside the template. E.g.
+ # { "year" => Time.now.strftime("%Y") } would replace
+ # the placeholder ":year" with the current year.
+ # :permalink - If supplied, no URL will be generated from the
+ # template. Instead, the given permalink will be
+ # used as URL.
@parkr Owner
parkr added a note

:+1: AWESOME comment.

@parkr parkr commented on the diff
lib/jekyll/url.rb
@@ -0,0 +1,67 @@
+# Public: Methods that generate a URL for a resource such as a Post or a Page.
+#
+# Examples
+#
+# URL.new({
+# :template => "/:categories/:title.html",
+# :placeholders => {:categories => "ruby", :title => "something"}
@parkr Owner
parkr added a note

Should these Symbol keys be String keys?

:placeholders => { "categories" => "ruby", "title" => "something" }
@x3ro
x3ro added a note

Hmm... I used symbols since it lined up nicely with the syntax used in the template string, but you're right that the url_placeholders methods use strings. Is there an advantage of using strings over symbols?

@parkr Owner
parkr added a note

The only difference I can think of is saving memory and CPU cycles to convert. In Ruby < 2.0, it's just easier and more efficient to use Strings.

@parkr really? I thought the opposite was true. Strings are fat and symbols are fast. This in 1.9.3 is almost 5 times faster with symbols.

require 'benchmark'

str = Benchmark.measure do
  10_000_000.times do
    "test"
  end
end.total

sym = Benchmark.measure do
  10_000_000.times do
    :test
  end
end.total

puts "String: " + str.to_s
puts "Symbol: " + sym.to_s
puts
@parkr Owner
parkr added a note

Ok, go for symbols!

@x3ro
x3ro added a note

I will. Thanks for the clarification @jpiasetz :)

@parkr
Owner

Looks like a mixup between Strings and Symbols as keys for the placeholders Hash. :)
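A small sketch of why the key type matters less for the replacement itself than for consistency: the key is interpolated into the regular expression, so String and Symbol keys yield the same URL, but they are not interchangeable for direct Hash lookups. The fill lambda and the sample values are hypothetical.

```ruby
# Illustrative values only; fill is a hypothetical helper.
template = "/:categories/:title.html"

string_keys = { "categories" => "ruby", "title" => "something" }
symbol_keys = { :categories => "ruby", :title => "something" }

# Same replacement loop as generate_url; the key is interpolated into the
# regexp, so "title" and :title both become /:title/.
fill = lambda do |placeholders|
  placeholders.inject(template) do |result, token|
    result.gsub(/:#{token.first}/, token.last)
  end
end

puts fill.call(string_keys)
puts fill.call(symbol_keys)

# A direct lookup, however, distinguishes the two key types.
puts symbol_keys["title"].inspect
```

Both calls print /ruby/something.html, while the last line prints nil, which is why documentation, tests and callers should agree on one key type.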

@maul-esel maul-esel commented on the diff
lib/jekyll/post.rb
((19 lines not shown))
- "i_day" => date.strftime("%d").to_i.to_s,
- "i_month" => date.strftime("%m").to_i.to_s,
- "categories" => categories.map { |c| URI.escape(c.to_s) }.join('/'),
- "short_month" => date.strftime("%b"),
- "y_day" => date.strftime("%j"),
- "output_ext" => self.output_ext
- }.inject(template) { |result, token|
- result.gsub(/:#{Regexp.escape token.first}/, token.last)
- }.gsub(/\/\//, "/")
- end
+ @url ||= URL.new({
+ :template => template,
+ :placeholders => url_placeholders,
+ :permalink => permalink
+ }).to_s
+ end

Downside of class vs. module is that this code is duplicated now in Page and Post. Any way to avoid this?

@parkr Owner
parkr added a note

Pass in the page/post as the arg and call the necessary methods? I'd prefer to keep the URL class as dumb as possible. I don't mind this duplication. Modules are good but in very small doses. :) I'd prefer to have the logic completely separated here.

@x3ro
x3ro added a note

I agree with @parkr here, I guess. I initially went with the Module approach because that was how most of the Jekyll functionality seemed to be implemented. The duplication in this case is relatively minor, and I think it's worth putting up with if the alternative is making the URL class aware of methods it has to call on some Post or Page object :smile:

Yeah, passing the post and page is not a good alternative. So the best way seems to leave it that way, at least for now.

I agree with @parkr: to avoid black magic in URL, I think that's the best way...

Maybe, if you want to simplify, creating a factory or builder class with a method that generates the URL could help: you can pass the page or the post to it as a parameter and access the necessary methods, since both have these three methods.

Buuuut, I didn't like the previous solution because nothing ensures that the three methods really exist. Some days ago #1300 added a refactor that extracts a main class from Page and Post; with those changes, I think the factory would be much more "secure".
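Purely as an illustration of the factory idea, a sketch might look like the following. URLFactorySketch and ResourceSketch are hypothetical names, and the respond_to? check is one possible way to address the concern that nothing guarantees the three methods exist; none of this is part of the PR.

```ruby
# Hypothetical sketch; URLFactorySketch and ResourceSketch are illustrative.
module URLFactorySketch
  REQUIRED = [:template, :url_placeholders, :permalink]

  # Collect the options a URL object would need from a page or post,
  # failing early if the resource lacks one of the expected methods.
  def self.options_for(resource)
    missing = REQUIRED.reject { |name| resource.respond_to?(name) }
    unless missing.empty?
      raise ArgumentError, "resource is missing: #{missing.join(', ')}"
    end

    {
      :template     => resource.template,
      :placeholders => resource.url_placeholders,
      :permalink    => resource.permalink
    }
  end
end

# Stand-in for a Page or Post that exposes the three methods.
ResourceSketch = Struct.new(:template, :url_placeholders, :permalink)

page = ResourceSketch.new("/:x", { :x => "foo" }, nil)
p URLFactorySketch.options_for(page)
```

The trade-off is the one raised in the thread: the duplication in Page and Post goes away, but something outside those classes now has to know which methods to call on them.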

@x3ro

Updated. I didn't modify the way the URL class is invoked, though. @kelvinst suggested a factory, which is a great idea. However, I think that in this particular case it's overkill, since people trying to get the URL of a page or post will invoke the url method on these classes (at least I cannot think of a scenario where one wouldn't want to do that). For every other use case, the constructor would have to be used anyway :smile:

@kelvinst

Very nice @x3ro (:
I was thinking of maybe just a factory method, but this is good enough for me too.

@kelvinst

And I agree that a factory class is big overkill for this.

@parkr
Owner

(Travis failure is unrelated.)

@parkr
Owner

This PR LGTM. @mattr-?

@x3ro

:pineapple: ?

@parkr
Owner

Just need to hear from @mattr-.

@mattr-
Owner

i'll review it tomorrow. need to :zzz: now.

@mattr-
Owner

:+1: :shipit:

@parkr parkr merged commit 0d890e4 into jekyll:master
@x3ro x3ro deleted the x3ro:permalink-special-characters branch
@parkr parkr referenced this pull request in Shopify/liquid
Closed

Do not support Ruby 1.9 Hash syntax #289

Commits on Apr 9, 2013
  1. @x3ro

    Refactor URL processing/generation into separate module

    x3ro authored
    This is done to prepare for improved permalink generation
    for URLs containing special characters, as proposed in
    issue #782
Commits on Jul 14, 2013
  1. @x3ro

    Merge branch 'master' into permalink-special-characters

    x3ro authored
    Conflicts:
    	lib/jekyll/page.rb
    	lib/jekyll/post.rb
Commits on Jul 25, 2013
  1. @x3ro

    Merge branch 'master' of https://github.com/mojombo/jekyll into perma…

    x3ro authored
    …link-special-characters
  2. @x3ro
Commits on Jul 31, 2013
  1. @x3ro

    Use symbols for all placeholders

    x3ro authored
    See jekyll#944 (comment)
    for a discussion.
  2. @x3ro
  3. @x3ro
1  lib/jekyll.rb
@@ -33,6 +33,7 @@ def require_all(path)
require 'jekyll/configuration'
require 'jekyll/site'
require 'jekyll/convertible'
+require 'jekyll/url'
require 'jekyll/layout'
require 'jekyll/page'
require 'jekyll/post'
43 lib/jekyll/page.rb
@@ -37,7 +37,12 @@ def dir
#
# Returns the String permalink or nil if none has been set.
def permalink
- self.data && self.data['permalink']
+ return nil if self.data.nil? || self.data['permalink'].nil?
+ if site.config['relative_permalinks']
+ File.join(@dir, self.data['permalink'])
+ else
+ self.data['permalink']
+ end

end
# The template of the permalink.
@@ -61,29 +66,21 @@ def template
#
# Returns the String url.
def url
- return @url if @url
-
- url = if permalink
- if site.config['relative_permalinks']
- File.join(@dir, permalink)
- else
- permalink
- end
- else
- {
- "path" => @dir,
- "basename" => self.basename,
- "output_ext" => self.output_ext,
- }.inject(template) { |result, token|
- result.gsub(/:#{token.first}/, token.last)
- }.gsub(/\/\//, "/")
- end
+ @url ||= URL.new({
+ :template => template,
+ :placeholders => url_placeholders,
+ :permalink => permalink
+ }).to_s
+ end
- # sanitize url
- @url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
- @url += "/" if url =~ /\/$/
- @url.gsub!(/\A([^\/])/, '/\1')
- @url
+ # Returns a hash of URL placeholder names (as symbols) mapping to the
+ # desired placeholder replacements. For details see "url.rb"
+ def url_placeholders
+ {
+ :path => @dir,
+ :basename => self.basename,
+ :output_ext => self.output_ext
+ }
end
# Extract information from the page filename.
49 lib/jekyll/post.rb
@@ -195,36 +195,31 @@ def template
end
# The generated relative url of this post.
- # e.g. /2008/11/05/my-awesome-post.html
#
- # Returns the String URL.
+ # Returns the String url.
def url
- return @url if @url
-
- url = if permalink
- permalink
- else
- {
- "year" => date.strftime("%Y"),
- "month" => date.strftime("%m"),
- "day" => date.strftime("%d"),
- "title" => CGI.escape(slug),
- "i_day" => date.strftime("%d").to_i.to_s,
- "i_month" => date.strftime("%m").to_i.to_s,
- "categories" => categories.map { |c| URI.escape(c.to_s) }.join('/'),
- "short_month" => date.strftime("%b"),
- "y_day" => date.strftime("%j"),
- "output_ext" => self.output_ext
- }.inject(template) { |result, token|
- result.gsub(/:#{Regexp.escape token.first}/, token.last)
- }.gsub(/\/\//, "/")
- end
+ @url ||= URL.new({
+ :template => template,
+ :placeholders => url_placeholders,
+ :permalink => permalink
+ }).to_s
+ end

- # sanitize url
- @url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
- @url += "/" if url =~ /\/$/
- @url.gsub!(/\A([^\/])/, '/\1')
- @url
+ # Returns a hash of URL placeholder names (as symbols) mapping to the
+ # desired placeholder replacements. For details see "url.rb"
+ def url_placeholders
+ {
+ :year => date.strftime("%Y"),
+ :month => date.strftime("%m"),
+ :day => date.strftime("%d"),
+ :title => CGI.escape(slug),
+ :i_day => date.strftime("%d").to_i.to_s,
+ :i_month => date.strftime("%m").to_i.to_s,
+ :categories => (categories || []).map { |c| URI.escape(c.to_s) }.join('/'),
+ :short_month => date.strftime("%b"),
+ :y_day => date.strftime("%j"),
+ :output_ext => self.output_ext
+ }
end
# The UID for this post (useful in feeds).
67 lib/jekyll/url.rb
@@ -0,0 +1,67 @@
+# Public: Methods that generate a URL for a resource such as a Post or a Page.
+#
+# Examples
+#
+# URL.new({
+# :template => "/:categories/:title.html",
+# :placeholders => {:categories => "ruby", :title => "something"}
+# }).to_s
+#
+module Jekyll
+ class URL
+
+ # options - One of :permalink or :template must be supplied.
+ # :template - The String used as template for URL generation,
+ # for example "/:path/:basename:output_ext", where
+ # a placeholder is prefixed with a colon.
+ # :placeholders - A hash containing the placeholders which will be
+ # replaced when used inside the template. E.g.
+ # { "year" => Time.now.strftime("%Y") } would replace
+ # the placeholder ":year" with the current year.
+ # :permalink - If supplied, no URL will be generated from the
+ # template. Instead, the given permalink will be
+ # used as URL.
+ def initialize(options)
+ @template = options[:template]
+ @placeholders = options[:placeholders] || {}
+ @permalink = options[:permalink]
+
+ if (@template || @permalink).nil?
+ raise ArgumentError, "One of :template or :permalink must be supplied."
+ end
+ end
+
+ # The generated relative URL of the resource
+ #
+ # Returns the String URL
+ def to_s
+ sanitize_url(@permalink || generate_url)
+ end
+
+ # Internal: Generate the URL by replacing all placeholders with their
+ # respective values
+ #
+ # Returns the _unsanitizied_ String URL
+ def generate_url
+ @placeholders.inject(@template) do |result, token|
+ result.gsub(/:#{token.first}/, token.last)
+ end
+ end
+
+ # Returns a sanitized String URL
+ def sanitize_url(in_url)
+ # Remove all double slashes

+ url = in_url.gsub(/\/\//, "/")
+
+ # Remove every URL segment that consists solely of dots
+ url = url.split('/').reject{ |part| part =~ /^\.+$/ }.join('/')
+
+ # Append a trailing slash to the URL if the unsanitized URL had one
+ url += "/" if in_url =~ /\/$/
+
+ # Always add a leading slash
+ url.gsub!(/\A([^\/])/, '/\1')
+ url
+ end
+ end
+end
28 test/test_url.rb
@@ -0,0 +1,28 @@
+require 'helper'
+
+class TestURL < Test::Unit::TestCase
+ context "The URL class" do
+
+ should "throw an exception if neither permalink or template is specified" do
+ assert_raises ArgumentError do
+ URL.new(:placeholders => {})
+ end
+ end
+
+ should "replace placeholders in templates" do
+ assert_equal "/foo/bar", URL.new(
+ :template => "/:x/:y",
+ :placeholders => {:x => "foo", :y => "bar"}
+ ).to_s
+ end
+
+ should "return permalink if given" do
+ assert_equal "/le/perma/link", URL.new(
+ :template => "/:x/:y",
+ :placeholders => {:x => "foo", :y => "bar"},
+ :permalink => "/le/perma/link"
+ ).to_s
+ end
+
+ end
+end