Inflector support for acronyms (Issue #1366) #1648

Merged
merged 3 commits into from Jun 24, 2011

@@ -1,5 +1,4 @@
require 'active_support/inflector/methods'
require 'active_support/inflector/inflections'
require 'active_support/inflector/transliterate'

# String inflections define new methods on the String class to transform names for different purposes.
145 changes: 53 additions & 92 deletions activesupport/lib/active_support/inflector/inflections.rb
@@ -20,10 +20,61 @@ def self.instance
@__instance__ ||= new
end

attr_reader :plurals, :singulars, :uncountables, :humans
attr_reader :plurals, :singulars, :uncountables, :humans, :acronyms, :acronym_regex

def initialize
@plurals, @singulars, @uncountables, @humans = [], [], [], []
@plurals, @singulars, @uncountables, @humans, @acronyms, @acronym_regex = [], [], [], [], {}, /(?=a)b/
end
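
A note on the initial value `/(?=a)b/` given to `@acronym_regex`: it appears to be chosen because it can never match, so interpolating it into another pattern before any acronym is registered is harmless. An empty alternation would instead match the empty string at every position. The sketch below illustrates the difference; the constant names are illustrative and not part of the PR.

```ruby
# The lookahead requires the next character to be "a", but the pattern then
# demands a literal "b" at that same position, so no input can satisfy it.
NEVER_MATCHES = /(?=a)b/

# What joining an empty acronym list with "|" would produce: an empty pattern,
# which matches the empty string at the start of any input.
EMPTY_PATTERN = Regexp.new([].join("|"))

SAMPLES = ["", "a", "b", "ab", "ba", "HTML", "my_html"].freeze

never_hits = SAMPLES.count { |s| s =~ NEVER_MATCHES }
empty_hits = SAMPLES.count { |s| s =~ EMPTY_PATTERN }

puts "never-matching pattern hits: #{never_hits} of #{SAMPLES.size}"
puts "empty pattern hits:          #{empty_hits} of #{SAMPLES.size}"
```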

# Specifies a new acronym. An acronym must be specified as it will appear in a camelized string. An underscore
# string that contains the acronym will retain the acronym when passed to `camelize`, `humanize`, or `titleize`.
# A camelized string that contains the acronym will maintain the acronym when titleized or humanized, and will
# convert the acronym into a non-delimited single lowercase word when passed to +underscore+.
#
# Examples:
# acronym 'HTML'
# titleize 'html' #=> 'HTML'
# camelize 'html' #=> 'HTML'
# underscore 'MyHTML' #=> 'my_html'
#
# The acronym, however, must occur as a delimited unit and not be part of another word for conversions to recognize it:
#
# acronym 'HTTP'
# camelize 'my_http_delimited' #=> 'MyHTTPDelimited'
# camelize 'https' #=> 'Https', not 'HTTPs'
# underscore 'HTTPS' #=> 'http_s', not 'https'
#
# acronym 'HTTPS'
# camelize 'https' #=> 'HTTPS'
# underscore 'HTTPS' #=> 'https'
#
# Note: Acronyms that are passed to `pluralize` will no longer be recognized, since the acronym will not occur as
# a delimited unit in the pluralized result. To work around this, you must specify the pluralized form as an
# acronym as well:
#
# acronym 'API'
# camelize(pluralize('api')) #=> 'Apis'
#
# acronym 'APIs'
# camelize(pluralize('api')) #=> 'APIs'
#
# `acronym` may be used to specify any word that contains an acronym or otherwise needs to maintain a non-standard
# capitalization. The only restriction is that the word must begin with a capital letter.
#
# Examples:
# acronym 'RESTful'
# underscore 'RESTful' #=> 'restful'
# underscore 'RESTfulController' #=> 'restful_controller'
# titleize 'RESTfulController' #=> 'RESTful Controller'
# camelize 'restful' #=> 'RESTful'
# camelize 'restful_controller' #=> 'RESTfulController'
#
# acronym 'McDonald'
# underscore 'McDonald' #=> 'mcdonald'
# camelize 'mcdonald' #=> 'McDonald'
def acronym(word)
@acronyms[word.downcase] = word
@acronym_regex = /#{@acronyms.values.join("|")}/
Contributor

Why not make this a method?

def acronym_regex
    /#{@acronyms.values.join("|")}/
end

You wouldn't have to redefine the variable for every acronym, making bootstrapping a bit faster.

Contributor

I think it's best to set the variable when an acronym is added. Otherwise, the regex may need to be recompiled each time underscore, camelize, etc. are called. Basically, the regex is cached in @acronym_regex.

Contributor Author

@dmathieu: It's like what @dasch says. I'm trading bootstrapping speed for runtime speed.

Contributor

Yeah, but if you have 100 acronyms, you rebuild it 100 times.
When you add an acronym, you could reset it to nil instead,

and have the acronym_regex method use @acronym_regex ||= /regex/.

That way, you build it only when it's needed, and it stays cached as long as the acronym list doesn't change.

Contributor Author

Yeah, I thought of that, but I think resetting the regex variable 100 times at bootstrap is better than running the @acronym_regex ||= /regex/ conditional set every time acronym_regex() is called (which could be millions or billions of times).

I noticed a 70% slowdown when using the conditional set instead of the attr_reader.
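
For readers following the thread, the two strategies being compared can be sketched side by side. This is a simplified illustration, not the PR's code: `EagerAcronyms`, `LazyAcronyms`, and `build` are hypothetical names, and only the regex handling is modeled.

```ruby
# Eager strategy (what the PR does): rebuild the regex on every registration,
# so each read is a plain attribute lookup.
class EagerAcronyms
  attr_reader :acronym_regex

  def initialize
    @acronyms = {}
    @acronym_regex = /(?=a)b/ # never matches until an acronym is added
  end

  def acronym(word)
    @acronyms[word.downcase] = word
    @acronym_regex = /#{@acronyms.values.join("|")}/
  end
end

# Lazy strategy (the reviewer's suggestion): invalidate on registration and
# rebuild on the first read, at the cost of a conditional on every read.
class LazyAcronyms
  def initialize
    @acronyms = {}
    @acronym_regex = nil
  end

  def acronym(word)
    @acronyms[word.downcase] = word
    @acronym_regex = nil
  end

  def acronym_regex
    @acronym_regex ||= @acronyms.empty? ? /(?=a)b/ : /#{@acronyms.values.join("|")}/
  end
end

# Hypothetical helper: register a list of words on a fresh registry.
def build(klass, words)
  registry = klass.new
  words.each { |word| registry.acronym(word) }
  registry
end

eager = build(EagerAcronyms, %w[HTML HTTP API])
lazy  = build(LazyAcronyms,  %w[HTML HTTP API])
puts eager.acronym_regex == lazy.acronym_regex
```

Both variants end up with the same pattern; they differ only in when the work happens. The 70% figure above is the author's own measurement and is not reproduced here, so timing both on a realistic workload would be the way to check it.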

end

# Specifies a new pluralization rule and its replacement. The rule can either be a string or a regular expression.
@@ -117,95 +168,5 @@ def inflections
Inflections.instance
end
end

# Returns the plural form of the word in the string.
#
# Examples:
# "post".pluralize # => "posts"
# "octopus".pluralize # => "octopi"
# "sheep".pluralize # => "sheep"
# "words".pluralize # => "words"
# "CamelOctopus".pluralize # => "CamelOctopi"
def pluralize(word)
result = word.to_s.dup

if word.empty? || inflections.uncountables.include?(result.downcase)
result
else
inflections.plurals.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result
end
end

# The reverse of +pluralize+, returns the singular form of a word in a string.
#
# Examples:
# "posts".singularize # => "post"
# "octopi".singularize # => "octopus"
# "sheep".singularize # => "sheep"
# "word".singularize # => "word"
# "CamelOctopi".singularize # => "CamelOctopus"
def singularize(word)
result = word.to_s.dup

if inflections.uncountables.any? { |inflection| result =~ /\b(#{inflection})\Z/i }
result
else
inflections.singulars.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result
end
end

# Capitalizes the first word and turns underscores into spaces and strips a
# trailing "_id", if any. Like +titleize+, this is meant for creating pretty output.
#
# Examples:
# "employee_salary" # => "Employee salary"
# "author_id" # => "Author"
def humanize(lower_case_and_underscored_word)
result = lower_case_and_underscored_word.to_s.dup

inflections.humans.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result.gsub(/_id$/, "").gsub(/_/, " ").capitalize
end

# Capitalizes all the words and replaces some characters in the string to create
# a nicer looking title. +titleize+ is meant for creating pretty output. It is not
# used in the Rails internals.
#
# +titleize+ is also aliased as +titlecase+.
#
# Examples:
# "man from the boondocks".titleize # => "Man From The Boondocks"
# "x-men: the last stand".titleize # => "X Men: The Last Stand"
def titleize(word)
humanize(underscore(word)).gsub(/\b('?[a-z])/) { $1.capitalize }
end

# Create the name of a table like Rails does for models to table names. This method
# uses the +pluralize+ method on the last word in the string.
#
# Examples
# "RawScaledScorer".tableize # => "raw_scaled_scorers"
# "egg_and_ham".tableize # => "egg_and_hams"
# "fancyCategory".tableize # => "fancy_categories"
def tableize(class_name)
pluralize(underscore(class_name))
end

# Create a class name from a plural table name like Rails does for table names to models.
# Note that this returns a string and not a Class. (To convert to an actual class
# follow +classify+ with +constantize+.)
#
# Examples:
# "egg_and_hams".classify # => "EggAndHam"
# "posts".classify # => "Post"
#
# Singular names are not handled correctly:
# "business".classify # => "Busines"
def classify(table_name)
# strip out any leading schema name
camelize(singularize(table_name.to_s.sub(/.*\./, '')))
end
end
end
105 changes: 100 additions & 5 deletions activesupport/lib/active_support/inflector/methods.rb
@@ -1,3 +1,5 @@
require 'active_support/inflector/inflections'

module ActiveSupport
# The Inflector transforms words from singular to plural, class names to table names, modularized class names to ones without,
# and class names to foreign keys. The default inflections for pluralization, singularization, and uncountable words are kept
@@ -10,6 +12,44 @@ module ActiveSupport
module Inflector
extend self

# Returns the plural form of the word in the string.
#
# Examples:
# "post".pluralize # => "posts"
# "octopus".pluralize # => "octopi"
# "sheep".pluralize # => "sheep"
# "words".pluralize # => "words"
# "CamelOctopus".pluralize # => "CamelOctopi"
def pluralize(word)
result = word.to_s.dup

if word.empty? || inflections.uncountables.include?(result.downcase)
result
else
inflections.plurals.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result
end
end

# The reverse of +pluralize+, returns the singular form of a word in a string.
#
# Examples:
# "posts".singularize # => "post"
# "octopi".singularize # => "octopus"
# "sheep".singularize # => "sheep"
# "word".singularize # => "word"
# "CamelOctopi".singularize # => "CamelOctopus"
def singularize(word)
result = word.to_s.dup

if inflections.uncountables.any? { |inflection| result =~ /\b(#{inflection})\Z/i }
result
else
inflections.singulars.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result
end
end

# By default, +camelize+ converts strings to UpperCamelCase. If the argument to +camelize+
# is set to <tt>:lower</tt> then +camelize+ produces lowerCamelCase.
#
@@ -25,12 +65,14 @@ module Inflector
# though there are cases where that does not hold:
#
# "SSLError".underscore.camelize # => "SslError"
def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
if first_letter_in_uppercase
lower_case_and_underscored_word.to_s.gsub(/\/(.?)/) { "::#{$1.upcase}" }.gsub(/(?:^|_)(.)/) { $1.upcase }
def camelize(term, uppercase_first_letter = true)
string = term.to_s
if uppercase_first_letter
string = string.sub(/^[a-z\d]*/) { inflections.acronyms[$&] || $&.capitalize }
else
lower_case_and_underscored_word.to_s[0].chr.downcase + camelize(lower_case_and_underscored_word)[1..-1]
string = string.sub(/^(?:#{inflections.acronym_regex}(?=\b|[A-Z_])|\w)/) { $&.downcase }
end
string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{inflections.acronyms[$2] || $2.capitalize}" }.gsub('/', '::')
end
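
To see how the new `camelize` treats registered acronyms, here is a self-contained sketch that mirrors the method body above. It is an illustration only: `CamelizeSketch` and its hard-coded `ACRONYMS` table are assumptions standing in for `inflections.acronyms` and `inflections.acronym_regex`.

```ruby
module CamelizeSketch
  # Stand-in for inflections.acronyms: lowercase key => canonical spelling.
  ACRONYMS = { "html" => "HTML", "http" => "HTTP", "api" => "API", "restful" => "RESTful" }.freeze
  # Stand-in for inflections.acronym_regex.
  ACRONYM_REGEX = /#{ACRONYMS.values.join("|")}/

  module_function

  def camelize(term, uppercase_first_letter = true)
    string = term.to_s
    if uppercase_first_letter
      # The leading word becomes the acronym if registered, else is capitalized.
      string = string.sub(/^[a-z\d]*/) { ACRONYMS[$&] || $&.capitalize }
    else
      # Lowercase a leading acronym as a whole, or just the first character.
      string = string.sub(/^(?:#{ACRONYM_REGEX}(?=\b|[A-Z_])|\w)/) { $&.downcase }
    end
    # Each "_word" or "/word" segment is looked up as a whole word.
    string.gsub(/(?:_|(\/))([a-z\d]*)/i) { "#{$1}#{ACRONYMS[$2] || $2.capitalize}" }.gsub("/", "::")
  end
end

puts CamelizeSketch.camelize("my_http_delimited")        # => MyHTTPDelimited
puts CamelizeSketch.camelize("https")                    # => Https
puts CamelizeSketch.camelize("apis")                     # => Apis
puts CamelizeSketch.camelize("my_http_delimited", false) # => myHTTPDelimited
```

Because the lookup is by whole segment, `"https"` and `"apis"` are not recognized unless `'HTTPS'` and `'APIs'` are registered as acronyms in their own right, which matches the caveat in the `acronym` documentation.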

# Makes an underscored, lowercase form from the expression in the string.
@@ -48,13 +90,66 @@ def camelize(lower_case_and_underscored_word, first_letter_in_uppercase = true)
def underscore(camel_cased_word)
word = camel_cased_word.to_s.dup
word.gsub!(/::/, '/')
word.gsub!(/([A-Z]+)([A-Z][a-z])/,'\1_\2')
word.gsub!(/(?:([A-Za-z\d])|^)(#{inflections.acronym_regex})(?=\b|[^a-z])/) { "#{$1}#{$1 && '_'}#{$2.downcase}" }
word.gsub!(/([A-Z\d]+)([A-Z][a-z])/,'\1_\2')
word.gsub!(/([a-z\d])([A-Z])/,'\1_\2')
word.tr!("-", "_")
word.downcase!
word
end
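
The added acronym pass in `underscore` can be tried in isolation with the sketch below. It copies the substitutions from the method above, but `UnderscoreSketch` and its fixed `ACRONYM_REGEX` are assumptions replacing `inflections.acronym_regex`.

```ruby
module UnderscoreSketch
  # Stand-in for inflections.acronym_regex with three registered acronyms.
  ACRONYM_REGEX = /HTML|HTTP|RESTful/

  module_function

  def underscore(camel_cased_word)
    word = camel_cased_word.to_s.dup
    word.gsub!(/::/, "/")
    # Acronym pass: downcase a registered acronym as one unit, prefixing "_"
    # only when it follows another letter or digit.
    word.gsub!(/(?:([A-Za-z\d])|^)(#{ACRONYM_REGEX})(?=\b|[^a-z])/) { "#{$1}#{$1 && '_'}#{$2.downcase}" }
    word.gsub!(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
    word.gsub!(/([a-z\d])([A-Z])/, '\1_\2')
    word.tr!("-", "_")
    word.downcase!
    word
  end
end

puts UnderscoreSketch.underscore("MyHTML")            # => my_html
puts UnderscoreSketch.underscore("HTTPS")             # => http_s
puts UnderscoreSketch.underscore("RESTfulController") # => restful_controller
```

The `"HTTPS"` case shows why the plural or extended form has to be registered separately: with only `HTTP` known, the trailing `S` is split off as its own word.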

# Capitalizes the first word and turns underscores into spaces and strips a
# trailing "_id", if any. Like +titleize+, this is meant for creating pretty output.
#
# Examples:
# "employee_salary" # => "Employee salary"
# "author_id" # => "Author"
def humanize(lower_case_and_underscored_word)
result = lower_case_and_underscored_word.to_s.dup
inflections.humans.each { |(rule, replacement)| break if result.gsub!(rule, replacement) }
result.gsub!(/_id$/, "")
result.gsub(/(_)?([a-z\d]*)/i) { "#{$1 && ' '}#{inflections.acronyms[$2] || $2.downcase}" }.gsub(/^\w/) { $&.upcase }
end
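
A similar sketch for the rewritten `humanize`, again under stated assumptions: `HumanizeSketch` and its `ACRONYMS` table are hypothetical stand-ins for `inflections.acronyms`, and the `inflections.humans` rules are left out for brevity.

```ruby
module HumanizeSketch
  # Stand-in for inflections.acronyms: lowercase key => canonical spelling.
  ACRONYMS = { "html" => "HTML", "api" => "API" }.freeze

  module_function

  def humanize(lower_case_and_underscored_word)
    result = lower_case_and_underscored_word.to_s.dup
    result.gsub!(/_id$/, "")
    # Turn each "_word" into " word", restoring registered acronyms and
    # downcasing everything else, then uppercase the first character.
    result.gsub(/(_)?([a-z\d]*)/i) { "#{$1 && ' '}#{ACRONYMS[$2] || $2.downcase}" }.gsub(/^\w/) { $&.upcase }
  end
end

puts HumanizeSketch.humanize("employee_salary") # => Employee salary
puts HumanizeSketch.humanize("author_id")       # => Author
puts HumanizeSketch.humanize("my_api_key")      # => My API key
```

Unlike the previous implementation, which called `capitalize` on the whole string, this version works word by word, which is what lets an acronym in the middle of the phrase keep its capitalization.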

# Capitalizes all the words and replaces some characters in the string to create
# a nicer looking title. +titleize+ is meant for creating pretty output. It is not
# used in the Rails internals.
#
# +titleize+ is also aliased as +titlecase+.
#
# Examples:
# "man from the boondocks".titleize # => "Man From The Boondocks"
# "x-men: the last stand".titleize # => "X Men: The Last Stand"
def titleize(word)
humanize(underscore(word)).gsub(/\b('?[a-z])/) { $1.capitalize }
end

# Create the name of a table like Rails does for models to table names. This method
# uses the +pluralize+ method on the last word in the string.
#
# Examples
# "RawScaledScorer".tableize # => "raw_scaled_scorers"
# "egg_and_ham".tableize # => "egg_and_hams"
# "fancyCategory".tableize # => "fancy_categories"
def tableize(class_name)
pluralize(underscore(class_name))
end

# Create a class name from a plural table name like Rails does for table names to models.
# Note that this returns a string and not a Class. (To convert to an actual class
# follow +classify+ with +constantize+.)
#
# Examples:
# "egg_and_hams".classify # => "EggAndHam"
# "posts".classify # => "Post"
#
# Singular names are not handled correctly:
# "business".classify # => "Busines"
def classify(table_name)
# strip out any leading schema name
camelize(singularize(table_name.to_s.sub(/.*\./, '')))
end

# Replaces underscores with dashes in the string.
#
# Example: