Inflector support for acronyms (Issue #1366) #1648

Merged
merged 3 commits into from Jun 24, 2011
View
1 activesupport/lib/active_support/core_ext/string/inflections.rb
@@ -1,5 +1,4 @@
require 'active_support/inflector/methods'
-require 'active_support/inflector/inflections'
require 'active_support/inflector/transliterate'
# String inflections define new methods on the String class to transform names for different purposes.
View
145 activesupport/lib/active_support/inflector/inflections.rb
@@ -20,10 +20,61 @@ def self.instance
@__instance__ ||= new
end
- attr_reader :plurals, :singulars, :uncountables, :humans
+ attr_reader :plurals, :singulars, :uncountables, :humans, :acronyms, :acronym_regex
def initialize
- @plurals, @singulars, @uncountables, @humans = [], [], [], []
+ @plurals, @singulars, @uncountables, @humans, @acronyms, @acronym_regex = [], [], [], [], {}, /(?=a)b/
+ end
+
+ # Specifies a new acronym. An acronym must be specified as it will appear in a camelized string. An underscore
+ # string that contains the acronym will retain the acronym when passed to `camelize`, `humanize`, or `titleize`.
+ # A camelized string that contains the acronym will maintain the acronym when titleized or humanized, and will
+ # convert the acronym into a non-delimited single lowercase word when passed to +underscore+.
+ #
+ # Examples:
+ # acronym 'HTML'
+ # titleize 'html' #=> 'HTML'
+ # camelize 'html' #=> 'HTML'
+ # underscore 'MyHTML' #=> 'my_html'
+ #
+ # The acronym, however, must occur as a delimited unit and not be part of another word for conversions to recognize it:
+ #
+ # acronym 'HTTP'
+ # camelize 'my_http_delimited' #=> 'MyHTTPDelimited'
+ # camelize 'https' #=> 'Https', not 'HTTPs'
+ # underscore 'HTTPS' #=> 'http_s', not 'https'
+ #
+ # acronym 'HTTPS'
+ # camelize 'https' #=> 'HTTPS'
+ # underscore 'HTTPS' #=> 'https'
+ #
+ # Note: Acronyms that are passed to `pluralize` will no longer be recognized, since the acronym will not occur as
+ # a delimited unit in the pluralized result. To work around this, you must specify the pluralized form as an
+ # acronym as well:
+ #
+ # acronym 'API'
+ # camelize(pluralize('api')) #=> 'Apis'
+ #
+ # acronym 'APIs'
+ # camelize(pluralize('api')) #=> 'APIs'
+ #
+ # `acronym` may be used to specify any word that contains an acronym or otherwise needs to maintain a non-standard
+ # capitalization. The only restriction is that the word must begin with a capital letter.
+ #
+ # Examples:
+ # acronym 'RESTful'
+ # underscore 'RESTful' #=> 'restful'
+ # underscore 'RESTfulController' #=> 'restful_controller'
+ # titleize 'RESTfulController' #=> 'RESTful Controller'
+ # camelize 'restful' #=> 'RESTful'
+ # camelize 'restful_controller' #=> 'RESTfulController'
+ #
+ # acronym 'McDonald'
+ # underscore 'McDonald' #=> 'mcdonald'
+ # camelize 'mcdonald' #=> 'McDonald'
+ def acronym(word)
+ @acronyms[word.downcase] = word
+ @acronym_regex = /#{@acronyms.values.join("|")}/

Why not make this a method ?

def acronym_regex
    /#{@acronyms.values.join("|")}/
end

You wouldn't have to redefine the variable for every acronym, making bootstrapping a bit faster.

@dasch
dasch added a note Jun 12, 2011

I think it's best to set the variable when an acronym is added. Otherwise, the regex may need to be recompiled each time underscore, camelize, etc. are called. Basically, the regex is cached in @acronym_regex.

@dlee
dlee added a note Jun 13, 2011

@dmathieu: It's like what @dasch says. I'm trading bootstrapping speed for runtime speed.

Yeah, but if you have 100 acronyms, you define it 100 times.
You could, when you add an acronym, define it to nil.

And have the acronym_regex method use @acronym_regex ||= /regex/

That way, you don't define it only when it's needed and as long as it doesn't change.

@dlee
dlee added a note Jun 13, 2011

Yeah, I thought of that, but I think resetting the regex variable 100 times at bootstrap is better than running the @acronym_regex ||= /regex/ conditional set every time acronym_regex() is called (which could be millions or billions of times).

I noticed a 70% slowdown when using the conditional set instead of the attr_reader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
end
# Specifies a new pluralization rule and its replacement. The rule can either be a string or a regular expression.
@@ -117,95 +168,5 @@ def inflections
Inflections.instance
end
end
-
- # Returns the plural form of the word in the string.
- #
- # Examples:
- # "post".pluralize # => "posts"
- # "octopus".pluralize # => "octopi"
- # "sheep".pluralize # => "sheep"
- # "words".pluralize # => "words"
- # "CamelOctopus".pluralize # => "CamelOctopi"
- def pluralize(word)
- result = word.to_s.dup
-
- if word.empty? || inflections.uncountables.include?(result.downcase)
- result
- else
- inflections.plurals.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
- result
- end
- end
-
- # The reverse of +pluralize+, returns the singular form of a word in a string.
- #
- # Examples:
- # "posts".singularize # => "post"
- # "octopi".singularize # => "octopus"
- # "sheep".singularize # => "sheep"
- # "word".singularize # => "word"
- # "CamelOctopi".singularize # => "CamelOctopus"
- def singularize(word)
- result = word.to_s.dup
-
- if inflections.uncountables.any? { |inflection| result =~ /\b(#{inflection})\Z/i }
- result
- else
- inflections.singulars.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
- result
- end
- end
-
- # Capitalizes the first word and turns underscores into spaces and strips a
- # trailing "_id", if any. Like +titleize+, this is meant for creating pretty output.
- #
- # Examples:
- # "employee_salary" # => "Employee salary"
- # "author_id" # => "Author"
- def humanize(lower_case_and_underscored_word)
- result = lower_case_and_underscored_word.to_s.dup
-
- inflections.humans.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
- result.gsub(/_id$/, "").gsub(/_/, " ").capitalize
- end
-
- # Capitalizes all the words and replaces some characters in the string to create
- # a nicer looking title. +titleize+ is meant for creating pretty output. It is not
- # used in the Rails internals.
- #
- # +titleize+ is also aliased as as +titlecase+.
- #
- # Examples:
- # "man from the boondocks".titleize # => "Man From The Boondocks"
- # "x-men: the last stand".titleize # => "X Men: The Last Stand"
- def titleize(word)
- humanize(underscore(word)).gsub(/\b('?[a-z])/) { $1.capitalize }
- end
-
- # Create the name of a table like Rails does for models to table names. This method
- # uses the +pluralize+ method on the last word in the string.
- #
- # Examples
- # "RawScaledScorer".tableize # => "raw_scaled_scorers"
- # "egg_and_ham".tableize # => "egg_and_hams"
- # "fancyCategory".tableize # => "fancy_categories"
- def tableize(class_name)
- pluralize(underscore(class_name))
- end
-
- # Create a class name from a plural table name like Rails does for table names to models.
- # Note that this returns a string and not a Class. (To convert to an actual class
- # follow +classify+ with +constantize+.)
- #
- # Examples:
- # "egg_and_hams".classify # => "EggAndHam"
- # "posts".classify # => "Post"
- #
- # Singular names are not handled correctly:
- # "business".classify # => "Busines"
- def classify(table_name)
- # strip out any leading schema name
- camelize(singularize(table_name.to_s.sub(/.*\./, '')))
- end
end
end
View
105 activesupport/lib/active_support/inflector/methods.rb
@@ -1,3 +1,5 @@
+require 'active_support/inflector/inflections'
+
module ActiveSupport
# The Inflector transforms words from singular to plural, class names to table names, modularized class names to ones without,
# and class names to foreign keys. The default inflections for pluralization, singularization, and uncountable words are kept
@@ -10,6 +12,44 @@ module ActiveSupport
module Inflector
extend self
+ # Returns the plural form of the word in the string.
+ #
+ # Examples:
+ # "post".pluralize # => "posts"
+ # "octopus".pluralize # => "octopi"
+ # "sheep".pluralize # => "sheep"
+ # "words".pluralize # => "words"
+ # "CamelOctopus".pluralize # => "CamelOctopi"
+ def pluralize(word)
+ result = word.to_s.dup
+
+ if word.empty? || inflections.uncountables.include?(result.downcase)
+ result
+ else
+ inflections.plurals.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
+ result
+ end
+ end
+
+ # The reverse of +pluralize+, returns the singular form of a word in a string.
+ #
+ # Examples:
+ # "posts".singularize # => "post"
+ # "octopi".singularize # => "octopus"
+ # "sheep".singularize # => "sheep"
+ # "word".singularize # => "word"
+ # "CamelOctopi".singularize # => "CamelOctopus"
+ def singularize(word)
+ result = word.to_s.dup
+
+ if inflections.uncountables.any? { |inflection| result =~ /\b(#{inflection})\Z/i }
+ result
+ else
+ inflections.singulars.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
+ result
+ end
+ end
+
# By default, +camelize+ converts strings to UpperCamelCase. If the argument to +camelize+
# is set to <tt>:lower</tt> then +camelize+ produces lowerCamelCase.
#
@@ -25,12 +65,14 @@ module Inflector
# though there are cases where that does not hold:
#
# "SSLError".underscore.camelize # => "SslError"
- def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
- if first_letter_in_uppercase
- lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
+ def camelize(term, uppercase_first_letter = true)
+ string = term.to_s
+ if uppercase_first_letter
+ string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
else
- lower_case_and_underscored_word.to_s[0].chr.downcase + camelize(lower_case_and_underscored_word)[1..-1]
+ string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
end
+ string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
# Makes an underscored, lowercase form from the expression in the string.
@@ -48,13 +90,66 @@ def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
def underscore(camel_cased_word)
word = camel_cased_word.to_s.dup
word.gsub!(/::/, '/')
- word.gsub!(/([A-Z]+)([A-Z][a-z])/,'\1_\2')
+ word.gsub!(/(?:([A-Za-z\d])|^)(#{inflections.acronym_regex})(?=\b|[^a-z])/) { "#{$1}#{$1 && '_'}#{$2.downcase}" }
+ word.gsub!(/([A-Z\d]+)([A-Z][a-z])/,'\1_\2')
word.gsub!(/([a-z\d])([A-Z])/,'\1_\2')
word.tr!("-", "_")
word.downcase!
word
end
+ # Capitalizes the first word and turns underscores into spaces and strips a
+ # trailing "_id", if any. Like +titleize+, this is meant for creating pretty output.
+ #
+ # Examples:
+ # "employee_salary" # => "Employee salary"
+ # "author_id" # => "Author"
+ def humanize(lower_case_and_underscored_word)
+ result = lower_case_and_underscored_word.to_s.dup
+ inflections.humans.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
+ result.gsub!(/_id$/, "")
+ result.gsub(/(_)?([a-z\d]*)/i) { "#{$1 && ' '}#{inflections.acronyms[$2] || $2.downcase}" }.gsub(/^\w/) { $&.upcase }
+ end
+
+ # Capitalizes all the words and replaces some characters in the string to create
+ # a nicer looking title. +titleize+ is meant for creating pretty output. It is not
+ # used in the Rails internals.
+ #
+ # +titleize+ is also aliased as as +titlecase+.
+ #
+ # Examples:
+ # "man from the boondocks".titleize # => "Man From The Boondocks"
+ # "x-men: the last stand".titleize # => "X Men: The Last Stand"
+ def titleize(word)
+ humanize(underscore(word)).gsub(/\b('?[a-z])/) { $1.capitalize }
+ end
+
+ # Create the name of a table like Rails does for models to table names. This method
+ # uses the +pluralize+ method on the last word in the string.
+ #
+ # Examples
+ # "RawScaledScorer".tableize # => "raw_scaled_scorers"
+ # "egg_and_ham".tableize # => "egg_and_hams"
+ # "fancyCategory".tableize # => "fancy_categories"
+ def tableize(class_name)
+ pluralize(underscore(class_name))
+ end
+
+ # Create a class name from a plural table name like Rails does for table names to models.
+ # Note that this returns a string and not a Class. (To convert to an actual class
+ # follow +classify+ with +constantize+.)
+ #
+ # Examples:
+ # "egg_and_hams".classify # => "EggAndHam"
+ # "posts".classify # => "Post"
+ #
+ # Singular names are not handled correctly:
+ # "business".classify # => "Busines"
+ def classify(table_name)
+ # strip out any leading schema name
+ camelize(singularize(table_name.to_s.sub(/.*\./, '')))
+ end
+
# Replaces underscores with dashes in the string.
#
# Example:
View
86 activesupport/test/inflector_test.rb
@@ -94,6 +94,88 @@ def test_camelize_with_lower_downcases_the_first_letter
assert_equal('capital', ActiveSupport::Inflector.camelize('Capital', false))
end
+ def test_camelize_with_underscores
+ assert_equal("CamelCase", ActiveSupport::Inflector.camelize('Camel_Case'))
+ end
+
+ def test_acronyms
+ ActiveSupport::Inflector.inflections do |inflect|
+ inflect.acronym("API")
+ inflect.acronym("HTML")
+ inflect.acronym("HTTP")
+ inflect.acronym("RESTful")
+ inflect.acronym("W3C")
+ inflect.acronym("PhD")
+ inflect.acronym("RoR")
+ inflect.acronym("SSL")
+ end
+
+ # camelize underscore humanize titleize
+ [
+ ["API", "api", "API", "API"],
+ ["APIController", "api_controller", "API controller", "API Controller"],
+ ["Nokogiri::HTML", "nokogiri/html", "Nokogiri/HTML", "Nokogiri/HTML"],
+ ["HTTPAPI", "http_api", "HTTP API", "HTTP API"],
+ ["HTTP::Get", "http/get", "HTTP/get", "HTTP/Get"],
+ ["SSLError", "ssl_error", "SSL error", "SSL Error"],
+ ["RESTful", "restful", "RESTful", "RESTful"],
+ ["RESTfulController", "restful_controller", "RESTful controller", "RESTful Controller"],
+ ["IHeartW3C", "i_heart_w3c", "I heart W3C", "I Heart W3C"],
+ ["PhDRequired", "phd_required", "PhD required", "PhD Required"],
+ ["IRoRU", "i_ror_u", "I RoR u", "I RoR U"],
+ ["RESTfulHTTPAPI", "restful_http_api", "RESTful HTTP API", "RESTful HTTP API"],
+
+ # misdirection
+ ["Capistrano", "capistrano", "Capistrano", "Capistrano"],
+ ["CapiController", "capi_controller", "Capi controller", "Capi Controller"],
+ ["HttpsApis", "https_apis", "Https apis", "Https Apis"],
+ ["Html5", "html5", "Html5", "Html5"],
+ ["Restfully", "restfully", "Restfully", "Restfully"],
+ ["RoRails", "ro_rails", "Ro rails", "Ro Rails"]
+ ].each do |camel, under, human, title|
+ assert_equal(camel, ActiveSupport::Inflector.camelize(under))
+ assert_equal(camel, ActiveSupport::Inflector.camelize(camel))
+ assert_equal(under, ActiveSupport::Inflector.underscore(under))
+ assert_equal(under, ActiveSupport::Inflector.underscore(camel))
+ assert_equal(title, ActiveSupport::Inflector.titleize(under))
+ assert_equal(title, ActiveSupport::Inflector.titleize(camel))
+ assert_equal(human, ActiveSupport::Inflector.humanize(under))
+ end
+ end
+
+ def test_acronym_override
+ ActiveSupport::Inflector.inflections do |inflect|
+ inflect.acronym("API")
+ inflect.acronym("LegacyApi")
+ end
+
+ assert_equal("LegacyApi", ActiveSupport::Inflector.camelize("legacyapi"))
+ assert_equal("LegacyAPI", ActiveSupport::Inflector.camelize("legacy_api"))
+ assert_equal("SomeLegacyApi", ActiveSupport::Inflector.camelize("some_legacyapi"))
+ assert_equal("Nonlegacyapi", ActiveSupport::Inflector.camelize("nonlegacyapi"))
+ end
+
+ def test_acronyms_camelize_lower
+ ActiveSupport::Inflector.inflections do |inflect|
+ inflect.acronym("API")
+ inflect.acronym("HTML")
+ end
+
+ assert_equal("htmlAPI", ActiveSupport::Inflector.camelize("html_api", false))
+ assert_equal("htmlAPI", ActiveSupport::Inflector.camelize("htmlAPI", false))
+ assert_equal("htmlAPI", ActiveSupport::Inflector.camelize("HTMLAPI", false))
+ end
+
+ def test_underscore_acronym_sequence
+ ActiveSupport::Inflector.inflections do |inflect|
+ inflect.acronym("API")
+ inflect.acronym("HTML5")
+ inflect.acronym("HTML")
+ end
+
+ assert_equal("html5_html_api", ActiveSupport::Inflector.underscore("HTML5HTMLAPI"))
+ end
+
def test_underscore
CamelToUnderscore.each do |camel, underscore|
assert_equal(underscore, ActiveSupport::Inflector.underscore(camel))
@@ -148,8 +230,8 @@ def test_parameterize_and_normalize
end
def test_parameterize_with_custom_separator
- StringToParameterized.each do |some_string, parameterized_string|
- assert_equal(parameterized_string.gsub('-', '_'), ActiveSupport::Inflector.parameterize(some_string, '_'))
+ StringToParameterizeWithUnderscore.each do |some_string, parameterized_string|
+ assert_equal(parameterized_string, ActiveSupport::Inflector.parameterize(some_string, '_'))
end
end
View
3 activesupport/test/inflector_test_cases.rb
@@ -168,6 +168,7 @@ module InflectorTestCases
StringToParameterizeWithNoSeparator = {
"Donald E. Knuth" => "donaldeknuth",
+ "With-some-dashes" => "withsomedashes",
"Random text with *(bad)* characters" => "randomtextwithbadcharacters",
"Trailing bad characters!@#" => "trailingbadcharacters",
"!@#Leading bad characters" => "leadingbadcharacters",
@@ -179,6 +180,8 @@ module InflectorTestCases
StringToParameterizeWithUnderscore = {
"Donald E. Knuth" => "donald_e_knuth",
"Random text with *(bad)* characters" => "random_text_with_bad_characters",
+ "With-some-dashes" => "with-some-dashes",
+ "Retain_underscore" => "retain_underscore",
"Trailing bad characters!@#" => "trailing_bad_characters",
"!@#Leading bad characters" => "leading_bad_characters",
"Squeeze separators" => "squeeze_separators",
View
10 railties/guides/source/active_support_core_extensions.textile
@@ -1460,7 +1460,15 @@ end
That may be handy to compute method names in a language that follows that convention, for example JavaScript.
-INFO: As a rule of thumb you can think of +camelize+ as the inverse of +underscore+, though there are cases where that does not hold: <tt>"SSLError".underscore.camelize</tt> gives back <tt>"SslError"</tt>.
+INFO: As a rule of thumb you can think of +camelize+ as the inverse of +underscore+, though there are cases where that does not hold: <tt>"SSLError".underscore.camelize</tt> gives back <tt>"SslError"</tt>. To support cases such as this, Active Support allows you to specify acronyms in +config/initializers/inflections.rb+:
+
+<ruby>
+ActiveSupport::Inflector.inflections do |inflect|
+ inflect.acronym 'SSL'
+end
+
+"SSLError".underscore.camelize #=> "SSLError"
+</ruby>
+camelize+ is aliased to +camelcase+.
View
4 railties/guides/source/initialization.textile
@@ -661,9 +661,9 @@ require 'active_support/inflections'
require 'active_support/core_ext/string/inflections'
</ruby>
-The +active_support/inflector/methods+ file has already been required by +active_support/autoload+ and so won't be loaded again here.
+The +active_support/inflector/methods+ file has already been required by +active_support/autoload+ and so won't be loaded again here. The +activesupport/lib/active_support/inflector/inflections.rb+ is required by +active_support/inflector/methods+.
-h4. +activesupport/lib/active_support/inflector/inflections.rb+
+h4. +active_support/inflections+
This file references the +ActiveSupport::Inflector+ constant which isn't loaded by this point. But there were autoloads set up in +activesupport/lib/active_support.rb+ which will load the file which loads this constant and so then it will be defined. Then this file defines pluralization and singularization rules for words in Rails. This is how Rails knows how to pluralize "tomato" to "tomatoes".