
Flesh out the parameterize method to support non-ascii text and underscores.
1 parent 46bac29, commit 1ddde91303883b47f2215779cf45d7008377bd0d. @NZKoz committed Sep 11, 2008
Showing with 5 additions and 2 deletions.
  1. +1 −1 activesupport/lib/active_support/inflector.rb
  2. +4 −1 activesupport/test/inflector_test_cases.rb
2 activesupport/lib/active_support/inflector.rb
@@ -257,7 +257,7 @@ def demodulize(class_name_in_module)
# <%= link_to(@person.name, person_path %>
# # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
def parameterize(string, sep = '-')
- string.gsub(/[^a-z0-9]+/i, sep).downcase
+ string.chars.normalize(:kd).to_s.gsub(/[^\x00-\x7F]+/, '').gsub(/[^a-z0-9_\-]+/i, sep).downcase
end
# Create the name of a table like Rails does for models to table names. This method
5 activesupport/test/inflector_test_cases.rb
@@ -144,7 +144,10 @@ module InflectorTestCases
StringToParameterized = {
"Donald E. Knuth" => "donald-e-knuth",
- "Random text with *(bad)* characters" => "random-text-with-bad-characters"
+ "Random text with *(bad)* characters" => "random-text-with-bad-characters",
+ "Malmö" => "malmo",
+ "Garçons" => "garcons",
+ "Allow_Under_Scores" => "allow_under_scores"
}
UnderscoreToHuman = {
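
For readers on a current Ruby, the three steps in the new line (decompose, drop non-ASCII, replace unwanted runs) can be sketched as below. This is an illustration, not the committed code: `parameterize_sketch` is a hypothetical name, and it assumes Ruby 2.2+, where `String#unicode_normalize(:nfkd)` stands in for the ActiveSupport `chars.normalize(:kd)` call used in the diff.

```ruby
# Hypothetical stand-in for the patched Inflector.parameterize.
# Assumption: Ruby 2.2+ and UTF-8 input; unicode_normalize(:nfkd) replaces
# the ActiveSupport Multibyte normalize(:kd) call from the commit.
def parameterize_sketch(string, sep = '-')
  string
    .unicode_normalize(:nfkd)      # split "ö" into "o" plus a combining mark
    .gsub(/[^\x00-\x7F]+/, '')     # drop everything outside ASCII, including those marks
    .gsub(/[^a-z0-9_\-]+/i, sep)   # replace each remaining unwanted run with the separator
    .downcase
end

parameterize_sketch("Malmö")               # => "malmo"
parameterize_sketch("Garçons")             # => "garcons"
parameterize_sketch("Allow_Under_Scores")  # => "allow_under_scores"
```

The order matters: the non-ASCII strip has to run after decomposition, otherwise "ö" would be removed whole instead of being reduced to "o".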

8 comments on commit 1ddde91

@henrik

Nice. Shouldn’t the to_s go right after “string”, though?

@tarmo

to_s is to convert the Multibyte::Chars back to a string after normalization.

@henrik

tarmo: Ah, right. A to_s after “string” would make it more robust for input like nil or numbers, but that might not be desired.
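
A minimal sketch of what a leading `to_s` would buy, assuming a hypothetical `parameterize_coerced` helper. The normalization step is left out here to keep the focus on input coercion.

```ruby
# Hypothetical variant: coerce the input to a String before substituting.
# Not part of the commit; the Unicode normalization step is omitted for brevity.
def parameterize_coerced(value, sep = '-')
  value.to_s.gsub(/[^a-z0-9_\-]+/i, sep).downcase
end

parameterize_coerced(nil)          # => ""      (nil.to_s is the empty string)
parameterize_coerced(2008)         # => "2008"
parameterize_coerced("Some Text")  # => "some-text"
```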

@NZKoz
Ruby on Rails member

I’m not sure the nil safety is warranted. 99.999% of people will call this with String#parameterize, not Inflector.parameterize…

@tomstuart

This method should also collapse multiple occurrences of the separator ('foo---bar' => 'foo-bar') and strip leading/trailing occurrences ('-foo-bar-' => 'foo-bar').
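
The two clean-ups tomstuart describes, collapsing repeated separators and stripping them from both ends, could look roughly like this. `squeeze_and_strip` is a hypothetical helper meant to run on an already-parameterized string; it is not part of this commit.

```ruby
# Hypothetical post-processing step, not part of the commit.
def squeeze_and_strip(slug, sep = '-')
  escaped = Regexp.escape(sep)               # the separator may contain regexp metacharacters
  slug
    .gsub(/(?:#{escaped}){2,}/, sep)         # collapse runs of the separator into one
    .gsub(/\A#{escaped}|#{escaped}\z/, '')   # remove a leading or trailing separator
end

squeeze_and_strip("foo---bar")  # => "foo-bar"
squeeze_and_strip("-foo-bar-")  # => "foo-bar"
```

Collapsing first means at most one separator is left at each end, so a single anchored match per side is enough.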

@Manfred

A couple of considerations. When $KCODE isn’t set to UTF-8 in Ruby <= 1.8.6 this will break because normalize isn’t defined on String. Parameterizing non-ASCII strings results in a blank string: 'おはよ'.parameterize => ''. I know that none of the other inflector methods support non-ASCII characters, what’s the verdict on this?
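
The blank result follows from the non-ASCII strip: hiragana has no ASCII base characters to fall back on after decomposition. A small sketch, assuming Ruby 2.2+ and a hypothetical `ascii_only` helper that isolates the first two steps of the patched method:

```ruby
# Hypothetical helper isolating the decompose-then-strip steps.
# Assumption: Ruby 2.2+ (String#unicode_normalize) and UTF-8 input.
def ascii_only(string)
  string.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]+/, '')
end

ascii_only("Malmö")   # => "Malmo"  (the base letter survives, the diaeresis is dropped)
ascii_only("おはよ")   # => ""       (nothing in the string maps to ASCII)
```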

@henrik

I updated Slugalizer based on some of the code traded in the parameterize comments. The biggest change was that it now turns e.g. “foo@bar.com” into “foo-bar-com” instead of “foobarcom” – but it still squeezes multiple separators and removes leading/trailing separators, so " ! foo--dash@bar.com ! " becomes “foo-dash-bar-com”.

I think the current version of Slugalizer has no downsides compared to the current version of parameterize, but it also handles the stuff tomstuart mentioned. It also works with other $KCODEs than ‘u’, as far as I can tell.

While I do think it’s good to keep it lean, if this method should be present at all, it might as well be as good as it can be – at least as long as it’s just a matter of another short line or two of code.

Regarding the blank string, I think that’s perfectly reasonable. It would certainly be more useful if Japanese etc were transcribed, but I think then we’re firmly in plugin country (see Stringex).

@karmi

Thanks, NZKoz!

Also check this ticket
