Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Odd number parsing #10

Closed
veesahni opened this issue Jan 9, 2020 · 5 comments · Fixed by #11
Closed

Odd number parsing #10

veesahni opened this issue Jan 9, 2020 · 5 comments · Fixed by #11
Labels

Comments

@veesahni
Copy link

veesahni commented Jan 9, 2020

Environment:
MRI Ruby 2.6.5
Crass 1.05

Real excerpt from an email sent by Outlook:

p.5e1367490fa5f06927cafe55msonormal {
  mso-style-name:5e1367490fa5f06927cafe55msonormal;
}

Test code:

require 'crass'
require 'pp'

str =  <<END
p.5e1367490fa5f06927cafe55msonormal {
  mso-style-name:5e1367490fa5f06927cafe55msonormal;
}
END

PP.pp Crass.parse str

Output:
Can't paste on github since output is > 65KB. There are some giant numbers that result from attempting to convert "5e1367490fa5f06927cafe55msonormal" into a number.

Analysis:
It appears that Crass is trying to parse these names (class name, style name) into a number. This is ending up in a giant number that results in large amounts of memory usage in the generated parse tree. Is this a bug?

@rgrove
Copy link
Owner

rgrove commented Jan 9, 2020

Fascinating!

Your analysis seems to be correct. It looks to me like Crass is actually parsing this CSS correctly according to the spec, which results in 5e1367490 being treated as a number token representing the number 5×10^1367490.

In this case the value of the number token shouldn't really matter since it ultimately won't be used as a number. But Crass doesn't impose an upper bound on numbers, and this is a ridiculously large one, which is what's causing the problem. Seems like the right thing to do here would probably be to clamp number values to a maximum value, but I (or someone) will need to do some research to see what, if anything, the available standards say about this in order to figure out what that number should be.

@rgrove rgrove added the bug label Jan 9, 2020
@veesahni
Copy link
Author

veesahni commented Jan 9, 2020

Attempting to represent this in ruby:

2.6.5 :085 > 5e1367490
 => Infinity
2.6.5 :086 > 5e13674
 => Infinity
2.6.5 :087 > 5e1367
 => Infinity
2.6.5 :088 > 5e136
 => 5.0e+136
2.6.5 :089 > 5e136.class
 => Float
2.6.5 :090 > 5e136 == 5e136.to_i
 => true
2.6.5 :091 > 5e136.to_i
 => 50000000000000001642078124460246303949350628317980584775615671312937350344939399777200065781386370634197475239216121778932424531710574592

Note that 5e1367490.to_i raises an exception (since it's Infinity!). So ultimately this is a larger number than Ruby will easily let you set in the first place. It seems you're able to even generate this in Crass as a result of the formula that came from the CSS spec (tokenizer.rb/convert_string_to_number).

An exponent can be either a very large number or a very small number, so it's normally represented as a float (which does have limits). However, in this exact case, the giant number (5×10^1367490) is being handled as a Bignum internally and I can't find any reference to limits there except those imposed by system resources.

Suggestions for limits

Since Ruby would normally represent an exponent as a Float, we could use the MAX/MIN values provided by ruby-core / Float:

MAX
The largest possible integer in a double-precision floating point number.

Usually defaults to 1.7976931348623157e+308.

MAX_10_EXP
The largest positive exponent in a double-precision floating point where 10 raised to this power minus 1.

Usually defaults to 308.

MIN
The smallest positive normalized number in a double-precision floating point.

Usually defaults to 2.2250738585072014e-308.

If the platform supports denormalized numbers, there are numbers between zero and Float::MIN. 0.0.next_float returns the smallest positive floating point number including denormalized numbers.

MIN_10_EXP
The smallest negative exponent in a double-precision floating point where 10 raised to this power minus 1.

Usually defaults to -307.

Suggestion to convert to float

Currently, Crass represents 5e100000 as an Integer and 5e-100000 as a Rational.

If exponents are typically represents as floats, the principle of least surprise would suggest that they should probably be parsed into a ruby Float in the first place... It also appears to be the more memory efficient option:

2.6.5 :187 > ObjectSpace.memsize_of(1e305)
 => 40
2.6.5 :188 > ObjectSpace.memsize_of(1e305.to_i)
 => 168

@rgrove
Copy link
Owner

rgrove commented Jan 10, 2020

@veesahni Thanks for that excellent summary. I agree that using a Float sounds like the best option.

I may have time to work on this this weekend, but if you need this fixed sooner a PR would be welcome!

rgrove added a commit that referenced this issue Jan 12, 2020
Number values are now limited to a maximum of `Float::MAX` and a minimum
of negative `Float::MAX`.

Internally, `Integer` is now used for numbers parsed as the "integer"
type (as defined in the spec), while `Float` is used for numbers parsed
as the "number" type.

Fixes #10
@rgrove
Copy link
Owner

rgrove commented Jan 12, 2020

This is fixed in PR #11. @veesahni, would you mind trying that change out and verifying that it solves your problem?

@veesahni
Copy link
Author

In real world samples, we give Crass hundreds/thousands such exponent looking names in a single call to parse. The giant numbers were causing RSS (resident set size) to balloon. Sending similar samples through the new code keeps RSS in check - so yes, this solves the problem!

Appreciate the quick turn around :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants