
Flesh out the parameterize method to support non-ascii text and under…

1 parent 46bac29 commit 1ddde91303883b47f2215779cf45d7008377bd0d @NZKoz NZKoz committed Sep 11, 2008
Showing with 5 additions and 2 deletions.
  1. +1 −1 activesupport/lib/active_support/inflector.rb
  2. +4 −1 activesupport/test/inflector_test_cases.rb
2 activesupport/lib/active_support/inflector.rb
@@ -257,7 +257,7 @@ def demodulize(class_name_in_module)
# <%= link_to(, person_path %>
# # => <a href="/person/1-donald-e-knuth">Donald E. Knuth</a>
def parameterize(string, sep = '-')
- string.gsub(/[^a-z0-9]+/i, sep).downcase
+ string.chars.normalize(:kd).to_s.gsub(/[^\x00-\x7F]+/, '').gsub(/[^a-z0-9_\-]+/i, sep).downcase
# Create the name of a table like Rails does for models to table names. This method
5 activesupport/test/inflector_test_cases.rb
@@ -144,7 +144,10 @@ module InflectorTestCases
StringToParameterized = {
"Donald E. Knuth" => "donald-e-knuth",
- "Random text with *(bad)* characters" => "random-text-with-bad-characters"
+ "Random text with *(bad)* characters" => "random-text-with-bad-characters",
+ "Malmö" => "malmo",
+ "Garçons" => "garcons",
+ "Allow_Under_Scores" => "allow_under_scores"
UnderscoreToHuman = {
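
For readers without the 2008-era `ActiveSupport::Multibyte::Chars` API at hand, the new one-liner can be approximated in plain Ruby. This is only a sketch: `parameterize_sketch` is a hypothetical name, and `String#unicode_normalize(:nfkd)` (core Ruby 2.2+) is swapped in for `string.chars.normalize(:kd).to_s`. The two regular expressions are copied from the diff above.

```ruby
# Hypothetical stand-alone helper, NOT the ActiveSupport method itself.
# Assumption: String#unicode_normalize(:nfkd) stands in for
# string.chars.normalize(:kd).to_s from the commit.
def parameterize_sketch(string, sep = '-')
  string
    .unicode_normalize(:nfkd)     # split "ö" into "o" + a combining mark
    .gsub(/[^\x00-\x7F]+/, '')    # drop everything outside ASCII (the marks)
    .gsub(/[^a-z0-9_\-]+/i, sep)  # replace runs of other characters with sep
    .downcase
end

parameterize_sketch("Donald E. Knuth")     # => "donald-e-knuth"
parameterize_sketch("Malm\u00F6")          # "Malmö"   => "malmo"
parameterize_sketch("Gar\u00E7ons")        # "Garçons" => "garcons"
parameterize_sketch("Allow_Under_Scores")  # => "allow_under_scores"
```

The order matters: decomposition has to happen before the non-ASCII strip, otherwise accented letters would be removed whole instead of reduced to their base letter.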

8 comments on commit 1ddde91


Nice. Shouldn’t the to_s go right after “string”, though?


to_s is to convert the Multibyte::Chars back to a string after normalization.


tarmo: Ah, right. A to_s after “string” would make it more robust for input like nil or numbers, but that might not be desired.

Ruby on Rails member

I’m not sure the nil safety is warranted. 99.999% of people will call this with String#parameterize, not Inflector.parameterize…


This method should also collapse multiple occurrences of the separator (‘foo---bar’ => ‘foo-bar’) and strip leading/trailing occurrences (‘-foo-bar-’ => ‘foo-bar’).
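
A minimal sketch of that suggestion, written as a separate post-processing step rather than as a change to `parameterize` itself. The helper name `squeeze_separators` is hypothetical and does not appear in the commit.

```ruby
# Hypothetical helper illustrating the suggestion above; not part of Rails.
def squeeze_separators(slug, sep = '-')
  escaped = Regexp.escape(sep)             # make sep safe inside a regexp
  slug
    .gsub(/(?:#{escaped}){2,}/, sep)       # collapse runs of the separator
    .gsub(/\A#{escaped}|#{escaped}\z/, '') # trim a leading/trailing separator
end

squeeze_separators("foo---bar")   # => "foo-bar"
squeeze_separators("-foo-bar-")   # => "foo-bar"
squeeze_separators("__a__b__", "_")  # => "a_b"
```

Because runs are collapsed first, at most one separator can remain at either end, so trimming a single occurrence is enough.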


A couple of considerations. When $KCODE isn’t set to UTF-8 in Ruby <= 1.8.6 this will break because normalize isn’t defined on String. Parameterizing non-ASCII strings results in a blank string: ‘おはよ’.parameterize => ‘’. I know that none of the other inflector methods support non-ASCII characters, what’s the verdict on this?
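
The blank result follows from how compatibility decomposition works, which can be checked in plain Ruby. In this sketch `ascii_only` is a hypothetical helper, and `String#unicode_normalize(:nfkd)` (Ruby 2.2+) is assumed as a stand-in for the `normalize(:kd)` call in the commit.

```ruby
# Hypothetical helper: NFKD-decompose, then keep only ASCII characters.
# Assumption: unicode_normalize(:nfkd) approximates chars.normalize(:kd).
def ascii_only(text)
  text.unicode_normalize(:nfkd).gsub(/[^\x00-\x7F]+/, '')
end

# "ö" decomposes into an ASCII "o" plus U+0308, so the base letter survives.
"\u00F6".unicode_normalize(:nfkd).codepoints  # => [111, 776]
ascii_only("Malm\u00F6")                      # => "Malmo"

# The hiragana in "おはよ" have no ASCII base letters, so nothing is left.
ascii_only("\u304A\u306F\u3088")              # => ""
```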


I updated Slugalizer based on some of the code traded in the parameterize comments. The biggest change was that it now turns e.g. “” into “foo-bar-com” instead of “foobarcom” – but it still squeezes multiple separators and removes leading/trailing separators, so " ! foo— ! " becomes “foo-dash-bar-com”.

I think the current version of Slugalizer has no downsides compared to the current version of parameterize, but it also handles the stuff tomstuart mentioned. It also works with other $KCODEs than ‘u’, as far as I can tell.

While I do think it’s good to keep it lean, if this method should be present at all, it might as well be as good as it can be – at least as long as it’s just a matter of another short line or two of code.

Regarding the blank string, I think that’s perfectly reasonable. It would certainly be more useful if Japanese etc were transcribed, but I think then we’re firmly in plugin country (see Stringex).


Thanks, NZKoz!

Also check this ticket
