Odd number parsing #10

veesahni · 2020-01-09T19:58:39Z

Environment:
MRI Ruby 2.6.5
Crass 1.05

Real excerpt from an email sent by Outlook:

p.5e1367490fa5f06927cafe55msonormal {
  mso-style-name:5e1367490fa5f06927cafe55msonormal;
}

Test code:

require 'crass'
require 'pp'

str =  <<END
p.5e1367490fa5f06927cafe55msonormal {
  mso-style-name:5e1367490fa5f06927cafe55msonormal;
}
END

PP.pp Crass.parse str

Output:
Can't paste on github since output is > 65KB. There are some giant numbers that result from attempting to convert "5e1367490fa5f06927cafe55msonormal" into a number.

Analysis:
It appears that Crass is trying to parse these names (class name, style name) into a number. This is ending up in a giant number that results in large amounts of memory usage in the generated parse tree. Is this a bug?

The text was updated successfully, but these errors were encountered:

rgrove · 2020-01-09T21:52:04Z

Fascinating!

Your analysis seems to be correct. It looks to me like Crass is actually parsing this CSS correctly according to the spec, which results in 5e1367490 being treated as a number token representing the number 5×10^1367490.

In this case the value of the number token shouldn't really matter since it ultimately won't be used as a number. But Crass doesn't impose an upper bound on numbers, and this is a ridiculously large one, which is what's causing the problem. Seems like the right thing to do here would probably be to clamp number values to a maximum value, but I (or someone) will need to do some research to see what, if anything, the available standards say about this in order to figure out what that number should be.

veesahni · 2020-01-09T23:55:35Z

Attempting to represent this in ruby:

2.6.5 :085 > 5e1367490
 => Infinity
2.6.5 :086 > 5e13674
 => Infinity
2.6.5 :087 > 5e1367
 => Infinity
2.6.5 :088 > 5e136
 => 5.0e+136
2.6.5 :089 > 5e136.class
 => Float
2.6.5 :090 > 5e136 == 5e136.to_i
 => true
2.6.5 :091 > 5e136.to_i
 => 50000000000000001642078124460246303949350628317980584775615671312937350344939399777200065781386370634197475239216121778932424531710574592

Note that 5e1367490.to_i raises an exception (since it's Infinity!). So ultimately this is a larger number than Ruby will easily let you set in the first place. It seems you're able to even generate this in Crass as a result of the formula that came from the CSS spec (tokenizer.rb/convert_string_to_number).

An exponent can be either a very large number or a very small number, so it's normally represented as a float (which does have limits). However, in this exact case, the giant number (5×10^1367490) is being handled as a Bignum internally and I can't find any reference to limits there except those imposed by system resources.

Suggestions for limits

Since Ruby would normally represent an exponent as a Float, we could use the MAX/MIN values provided by ruby-core / Float:

MAX
The largest possible integer in a double-precision floating point number.

Usually defaults to 1.7976931348623157e+308.

MAX_10_EXP
The largest positive exponent in a double-precision floating point where 10 raised to this power minus 1.

Usually defaults to 308.

MIN
The smallest positive normalized number in a double-precision floating point.

Usually defaults to 2.2250738585072014e-308.

If the platform supports denormalized numbers, there are numbers between zero and Float::MIN. 0.0.next_float returns the smallest positive floating point number including denormalized numbers.

MIN_10_EXP
The smallest negative exponent in a double-precision floating point where 10 raised to this power minus 1.

Usually defaults to -307.

Suggestion to convert to float

Currently, Crass represents 5e100000 as an Integer and 5e-100000 as a Rational.

If exponents are typically represents as floats, the principle of least surprise would suggest that they should probably be parsed into a ruby Float in the first place... It also appears to be the more memory efficient option:

2.6.5 :187 > ObjectSpace.memsize_of(1e305)
 => 40
2.6.5 :188 > ObjectSpace.memsize_of(1e305.to_i)
 => 168

rgrove · 2020-01-10T17:49:19Z

@veesahni Thanks for that excellent summary. I agree that using a Float sounds like the best option.

I may have time to work on this this weekend, but if you need this fixed sooner a PR would be welcome!

Number values are now limited to a maximum of `Float::MAX` and a minimum of negative `Float::MAX`. Internally, `Integer` is now used for numbers parsed as the "integer" type (as defined in the spec), while `Float` is used for numbers parsed as the "number" type. Fixes #10

rgrove · 2020-01-12T05:12:36Z

This is fixed in PR #11. @veesahni, would you mind trying that change out and verifying that it solves your problem?

veesahni · 2020-01-12T16:59:46Z

In real world samples, we give Crass hundreds/thousands such exponent looking names in a single call to parse. The giant numbers were causing RSS (resident set size) to balloon. Sending similar samples through the new code keeps RSS in check - so yes, this solves the problem!

Appreciate the quick turn around :)

rgrove added the bug label Jan 9, 2020

rgrove mentioned this issue Jan 12, 2020

Limit number values to a sensible range #11

Merged

veesahni closed this as completed Jan 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Odd number parsing #10

Odd number parsing #10

veesahni commented Jan 9, 2020 •

edited

Loading

rgrove commented Jan 9, 2020

veesahni commented Jan 9, 2020

rgrove commented Jan 10, 2020

rgrove commented Jan 12, 2020

veesahni commented Jan 12, 2020

Odd number parsing #10

Odd number parsing #10

Comments

veesahni commented Jan 9, 2020 • edited Loading

rgrove commented Jan 9, 2020

veesahni commented Jan 9, 2020

rgrove commented Jan 10, 2020

rgrove commented Jan 12, 2020

veesahni commented Jan 12, 2020

veesahni commented Jan 9, 2020 •

edited

Loading