Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Restructured node mapping so common cases are evaluated first, saving unnecessary comparisons. #93

merged 1 commit into from Dec 17, 2012


2 participants

nirvdrum commented Oct 13, 2012

The execution savings on this one weren't massive, but with this change, Regexp#=== and String#=== moved further down the hotspot list. Without #92 applied, Regexp#=== was the # 1 hotspot. With it applied, it moved down to # 3. With this change applied on top of that, it moves down to # 6. String#=== moves from # 11 to # 37.

I was testing this change out with two competing YAML files. One makes heavy use of serialized ruby/object and the other makes heavy use of ruby/struct. I gave ruby/struct preference, but it was rather arbitrary. I broke the tie by giving preference to VCR, which serializes a lot of structs, and seems to be a common use of Psych. I could go either way with it though.

Another interesting idea would be to implement the matches as a self-organizing list, which would allow the engine to adapt to the underlying document structure. Each match would have to be order-independent, but I think in this case they are (accounted for by moving Rational and Complex under the general ruby/object match).


tenderlove commented Oct 13, 2012

Rather than doing this, why don't we combine all of the when to 1 regexp? Then we could delegate to a method based on a hash lookup. e.g. this:

        when /^!ruby\/array:(.*)$/
          # ...
        when /^!ruby\/struct:?(.*)?$/
          # ...

could become this:

        when /^(!ruby\/(?:array|struct)):(.*)$/
          send REVIVE[$1], $2

I think most of the when clauses could be collapsed in that way. The resulting regexp would probably be ridiculous, but it would eventually result in one call to Regexp#===. wdyt?


nirvdrum commented Oct 13, 2012

I'm a fan of that idea. It's similar in spirit to the self-organizing list without a lot of the complexity. You just need to ensure that the matches delegate properly. E.g., '!ruby/object:Complex', '!ruby/object:Rational', and !ruby/object:Pathname all match the same general pattern but need very different bodies. The first two are easily looked up in a hash, the third one not so much. So, you'd need to special-case that or dynamically add closures for each custom class.

As you note, this does come at the cost of clarity. Documentation here would help a lot.


tenderlove commented Oct 13, 2012

I think we can handle the ruby/object case like so:

def complex match
  p :complex => match

def rational match
  p :rational => match

def object match
  p :object => [match, (match.string.split(':', 2)[1] || "Object")]

  '!ruby/object:Complex'  => :complex,
  '!ruby/object:Rational' => :rational,
  '!ruby/object'          => :object,

re = %r{
  ^!ruby/object:(?:Complex|Rational)$ |

].each do |str|
  match = re.match(str)
  send METHODS[match.to_s], match

Do you want to try this and see what your benchmarks yield?


nirvdrum commented Oct 15, 2012

So, the number of Regexp#=== calls drops off, but it's equally offset by the number of hash code calculations and hash lookups. It's almost an even trade-off. I'll play with a couple more ideas in this space, but as of now the two approaches seem to be a toss-up so it's just a matter of whichever is more aesthetically pleasing.


tenderlove commented Oct 16, 2012

Wow, really? I'm surprised that hash calculation on a string would be so expensive. Do you have the diffs (and a benchmark) around so I can poke at it?


nirvdrum commented Oct 16, 2012

I pushed what I had up here (still very much in-progress): https://github.com/nirvdrum/psych/tree/new_mapping_approach

I'm currently working with a YAML file the author of #84 emailed me. I attached the test script, YAML, and generated profile output here: https://gist.github.com/25b6c191482276f5624b

Note in the new.profile how far Kernel#hash and Hash#empty climbed.


tenderlove commented Oct 17, 2012

Ugh. I've come to about the same conclusion. I added a branch here with the hash lookups and delegate. I've also updated the benchmark with an iteration per second test, and also a null test here. So the fastest we can possibly get is the null parsing speed. Otherwise, we'll need to start looking at improving libyaml ourselves.

Anyway, I'm happy to merge this PR, or branch. Which do you like better?


tenderlove commented Nov 17, 2012



nirvdrum commented Nov 19, 2012

Sorry, I'm going to look at this tonight. With both JRuby 1.7.1 and MRI 1.9.3-p327 augmenting their hash implementations, I'd like to see if the hash variant performs any better now.


tenderlove commented Nov 19, 2012

@nirvdrum no problem! Thanks for taking a look. I appreciate it.


nirvdrum commented Dec 17, 2012

I couldn't find any material difference even with newer rubies. So, I guess it's just a stylistic thing. I find the current regexp match a bit easier to follow than a map to method names. Thus, my vote would be to merge this pull request over the other branch. But, I don't think the other branch is so hard to read that I'd oppose merging that either.


tenderlove commented Dec 17, 2012

I'll pull in yours because I think I like it better.

@tenderlove tenderlove merged commit 2ceb7b6 into ruby:master Dec 17, 2012

1 check failed

default The Travis build failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment