Retain Mash's original key with indifferent access #296

Closed
wants to merge 2 commits into
from

Conversation

Projects
None yet
2 participants
@michaelherold
Contributor

michaelherold commented Apr 19, 2015

This is an attempt at fixing the issue of Mash's keys being converted
to strings upon assignment. Since we want Mash to behave like a Hash,
this is undesireable; however, many gems downstream depend on the
indifferent access that is achieved through casting all keys to strings.

I had to touch a few tests that were looking specifically for a certain
type of key (i.e. relying on the fact that keys are cast). They should
now be semantically correct.

There is one interesting thing that pops out from this. Checking for
equality between Mashes (and Hashes with Mashes) now fails unless the
keys are equivalent. For example:

Hashie::Mash.new(p: 'test') == Hashie::Mash.new('p' => 'test') #=> false

I also added a test for issue #246, which is solved by this PR.

I believe that this should perform on par with the current
implementation. I'm going to work on a benchmark to double-check, since
a performance regression is less desireable than the fix that this
gives.

Fixes #196
Fixes #246

michaelherold added some commits Apr 19, 2015

Retain Mash's original key with indifferent access
This is an attempt at fixing the issue of Mash's keys being converted
to strings upon assignment. Since we want Mash to behave like a Hash,
this is undesireable; however, many gems downstream depend on the
indifferent access that is achieved through casting all keys to strings.

I had to touch a few tests that were looking specifically for a certain
type of key (i.e. relying on the fact that keys are cast). They should
now be semantically correct.

There is one interesting thing that pops out from this. Checking for
equality between Mashes (and Hashes with Mashes) now fails unless the
keys are equivalent. For example:

```ruby
Hashie::Mash.new(p: 'test') == Hashie::Mash.new('p' => 'test') #=> false
```

I also added a test for issue #246, which is solved by this PR.

I _believe_ that this should perform on par with the current
implementation. I'm going to work on a benchmark to double-check, since
a performance regression is less desireable than the fix that this
gives.

Fixes #196
Fixes #246
@michaelherold

This comment has been minimized.

Show comment
Hide comment
@michaelherold

michaelherold Apr 19, 2015

Contributor

Hmm. I guess I was wrong on the performance. Here's the benchmark I used:

require 'benchmark'
require 'hashie'

N = 100_000
test = Hashie::Mash.new

Benchmark.bmbm do |x|
  x.report('benchmark:') do
    N.times do |n|
      test["test#{n}"] = 'wahoo'
      test["test#{n}"] == 'wahoo'
    end
  end
end

The results on v3.4.1:

Rehearsal ----------------------------------------------
benchmark:   0.180000   0.020000   0.200000 (  0.193239)
------------------------------------- total: 0.200000sec

                 user     system      total        real
benchmark:   0.150000   0.000000   0.150000 (  0.159733)

And the results on the branch:

Rehearsal ----------------------------------------------
benchmark:   0.470000   0.000000   0.470000 (  0.474326)
------------------------------------- total: 0.470000sec

                 user     system      total        real
benchmark:   0.450000   0.000000   0.450000 (  0.452725)

So it's roughly a 3x slowdown, which is not good. Anyone else have some ideas?

Contributor

michaelherold commented Apr 19, 2015

Hmm. I guess I was wrong on the performance. Here's the benchmark I used:

require 'benchmark'
require 'hashie'

N = 100_000
test = Hashie::Mash.new

Benchmark.bmbm do |x|
  x.report('benchmark:') do
    N.times do |n|
      test["test#{n}"] = 'wahoo'
      test["test#{n}"] == 'wahoo'
    end
  end
end

The results on v3.4.1:

Rehearsal ----------------------------------------------
benchmark:   0.180000   0.020000   0.200000 (  0.193239)
------------------------------------- total: 0.200000sec

                 user     system      total        real
benchmark:   0.150000   0.000000   0.150000 (  0.159733)

And the results on the branch:

Rehearsal ----------------------------------------------
benchmark:   0.470000   0.000000   0.470000 (  0.474326)
------------------------------------- total: 0.470000sec

                 user     system      total        real
benchmark:   0.450000   0.000000   0.450000 (  0.452725)

So it's roughly a 3x slowdown, which is not good. Anyone else have some ideas?

@dblock

This comment has been minimized.

Show comment
Hide comment
@dblock

dblock Apr 20, 2015

Contributor

This is a nice attempt. I think you should update the CHANGELOG and we should seriously consider merging this, even though there's a performance hit.

Swap the order of comparison of is_a?(String) vs. is_a?(Symbol) in convert_key, is it better/worse?

Contributor

dblock commented Apr 20, 2015

This is a nice attempt. I think you should update the CHANGELOG and we should seriously consider merging this, even though there's a performance hit.

Swap the order of comparison of is_a?(String) vs. is_a?(Symbol) in convert_key, is it better/worse?

@michaelherold

This comment has been minimized.

Show comment
Hide comment
@michaelherold

michaelherold Apr 20, 2015

Contributor

Swap the order of comparison of is_a?(String) vs. is_a?(Symbol) in convert_key, is it better/worse?

Nope, it's in pretty much the same place performance-wise.

I can update the CHANGELOG so this is mergeable.

One thing about this implementation: it should only really be used on 2.2+, since it makes a lot of symbols that might not be used elsewhere in the system. Without symbol GC, this could be an issue for long-running processes like server applications.

Lastly, since we're making Mash behave this way, should I also copy the changes over to the IndifferentAccess extension? I didn't do it in the original PR because it was a proof-of-concept, but I do think they should behave the same. What does everyone think?

Contributor

michaelherold commented Apr 20, 2015

Swap the order of comparison of is_a?(String) vs. is_a?(Symbol) in convert_key, is it better/worse?

Nope, it's in pretty much the same place performance-wise.

I can update the CHANGELOG so this is mergeable.

One thing about this implementation: it should only really be used on 2.2+, since it makes a lot of symbols that might not be used elsewhere in the system. Without symbol GC, this could be an issue for long-running processes like server applications.

Lastly, since we're making Mash behave this way, should I also copy the changes over to the IndifferentAccess extension? I didn't do it in the original PR because it was a proof-of-concept, but I do think they should behave the same. What does everyone think?

@dblock

This comment has been minimized.

Show comment
Hide comment
@dblock

dblock Apr 20, 2015

Contributor

Hm, interesting. The symbol thing is a big deal, it's going to be a nasty surprise for people that use an older Ruby version. I didn't realize that and I wonder whether that alone makes it unmergeable - we cannot be leaking memory left and right!

You should move forward with the PR though, it might not be in 2015, but eventually we can merge it I am sure.

Contributor

dblock commented Apr 20, 2015

Hm, interesting. The symbol thing is a big deal, it's going to be a nasty surprise for people that use an older Ruby version. I didn't realize that and I wonder whether that alone makes it unmergeable - we cannot be leaking memory left and right!

You should move forward with the PR though, it might not be in 2015, but eventually we can merge it I am sure.

michaelherold added a commit to michaelherold/hashie that referenced this pull request Nov 21, 2015

Add an extension to maintain original Mash keys
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

michaelherold added a commit to michaelherold/hashie that referenced this pull request Feb 7, 2016

Add an extension to maintain original Mash keys
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

michaelherold added a commit to michaelherold/hashie that referenced this pull request Feb 22, 2017

Add an extension to maintain original Mash keys
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

michaelherold added a commit to michaelherold/hashie that referenced this pull request Feb 22, 2017

Add an extension to maintain original Mash keys
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

michaelherold added a commit to michaelherold/hashie that referenced this pull request Feb 22, 2017

Add an extension to maintain original Mash keys
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

michaelherold added a commit that referenced this pull request Feb 23, 2017

Add an extension to maintain original Mash keys (#326)
One of the behaviors of Mash that we see regularly surprise users is
that Mash stringifies any keys passed into it. This leads to unexpected
lack of synergy between Mash and its cousins (particularly Dash), since
the property DSLs do not handle indifferent key access.

This extension ensures that the original keys are kept inside the Mash's
data structure, at the expense of more costly logic for fetching
information indifferently. I have included a benchmark that compares the
two. The benchmark shows that when you are passing string keys into a
Mash, using this extension will actually be _faster_ than the default
implementation, but that the reverse is true when passing symbol keys.

In #296, I tried to do this universally for all Mashes, which slowed
down the fetching behavior for Mash significantly. I like this attempt
much better because it allows users to opt into the new behavior if they
want it, while still keeping the default implementation as-is.

Fixes #196 by giving the option of keeping the original structure of the
Mash when using it with Dash.

Fixes #246 by giving the option of opting into keeping the original
keys.

Closes #296 by giving a more flexible path forward that doesn't change
the semantics of the main Mash class.

@michaelherold michaelherold deleted the michaelherold:cannot-convert-mash-to-dash branch Feb 24, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment