Skip to content

Loading…

Use squeeze instead of gsub in SquishNormalizer #53

Closed
wants to merge 2 commits into from

2 participants

@samleb

According to these benchmarks, using squeeze is an order of magnitude faster than gsub when dealing with squishing/squeezing spaces on almost all Ruby implementations, except on Rubinius where it doesn't make much of a difference.
Here is a patch that replaces gsub by squeeze in SquishNormalizer.normalize, with all tests green :)

@mnaberez mnaberez commented on an outdated diff
...attribute_normalizer/normalizers/squish_normalizer.rb
@@ -2,7 +2,7 @@ module AttributeNormalizer
module Normalizers
module SquishNormalizer
def self.normalize(value, options = {})
- value.is_a?(String) ? value.strip.gsub(/\s+/, ' ') : value
+ value.is_a?(String) ? value.strip.squeeze : value
irb> " hello\t world".strip.gsub /\s+/, ' '
=> "hello world"
irb> " hello\t world".strip.squeeze
=> "helo\t world"
@samleb
samleb added a note

You're making an excellent point: the behavior of this normalizer regarding tabs & newlines isn't specified (only tested with normal spaces in specs), nor clearly documented:

  • :squish Will strip leading and trailing whitespace and convert any consecutive spaces to one space each

I was personally using this normalizer on varchar columns (usually represented with a simple <input> field, thus not really facing the tab/newline problem), but as soon as I went using it on text columns (with <textarea>s), I realized it was dropping newlines, a behavior that didn't suit my needs (basic simple_format), and that I personally found surprising, and deceptive according to the documentation.

IMHO, either the documentation & specs have to be updated accordingly (which makes this normalizer almost useless for multilines attributes), or the implementation has to be modified to follow the documentation (which will not be backward compatible).

BTW, maybe another normalizer could be introduced for multiline attributes, to remove all control characters except for newlines:

config.normalizers[:remove_control_characters] = lambda do |value, options|
  value.try(:gsub, /[[:cntrl:]&&[^\n]]/, '')
end

Anyway, I'd be happy to help :)

Also:

irb> " hello\t world".strip.squeeze
=> "helo\t world"

Note the missing l.

@samleb
samleb added a note

Oh sorry about that I've totally missed the ' ' argument and thought the missing l was a typo.
I've corrected this in 748e2bd which is included in the PR.
I've also updated the original benchmarks to reflect this change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mnaberez

ActiveSupport provides a String#squish method. Currently AttributeNormalizer has an implementation like that in Rails 3. The ActiveSupport implementation changed in Rails 4. I assume AttributeNormalizer does not just call String#squish to avoid a dependency on ActiveSupport.

I am not in favor of changing this normalizer to use squeeze(' ') because I think it would be surprising for Rails users to have a normalizer called squish that behaves quite differently from String#squish. I do agree that the specs for this normalizer should be improved.

@samleb

Oh, sorry about that, I've never really came across this ActiveSupport String#squish method but I completely agree with you now that it'd be surprising indeed if the SquishNormalizer didn't behave like String#squish does.
Anyway I think that the WhitespaceNormalizer does a perfect job as dealing with repeating spaces (and removing carriage returns too BTW), so I guess that was the behavior I was looking for in actual applications, it's just that as it wasn't in 1.1.0, I thought SquishNormalizer could do the job.
Thanks again and sorry for the disturbance.

@samleb samleb closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Mar 9, 2014
  1. @samleb
Commits on Mar 12, 2014
  1. @samleb

    Fix missing argument in SquishNormalizer.normalize, update correspond…

    samleb committed
    …ing spec to avoid regression
Showing with 2 additions and 2 deletions.
  1. +1 −1 lib/attribute_normalizer/normalizers/squish_normalizer.rb
  2. +1 −1 spec/author_spec.rb
View
2 lib/attribute_normalizer/normalizers/squish_normalizer.rb
@@ -2,7 +2,7 @@ module AttributeNormalizer
module Normalizers
module SquishNormalizer
def self.normalize(value, options = {})
- value.is_a?(String) ? value.strip.gsub(/\s+/, ' ') : value
+ value.is_a?(String) ? value.strip.squeeze(' ') : value
end
end
end
View
2 spec/author_spec.rb
@@ -13,7 +13,7 @@
it { should normalize_attribute(:first_name).from(' ').to('') }
# :squish normalizer
- it { should normalize_attribute(:nickname).from(' this nickname ').to('this nickname') }
+ it { should normalize_attribute(:nickname).from(' hello world ').to('hello world') }
# :blank normalizer
it { should normalize_attribute(:last_name).from('').to(nil) }
Something went wrong with that request. Please try again.