Stop subclassing Hash #1231

Closed
myronmarston opened this Issue · 28 comments

6 participants

@myronmarston

On ruby 2.1, we started to get lots of deprecation warnings from our subclassing of Hash and then calling #reject on it:

https://travis-ci.org/rspec/rspec-core/jobs/15788366#L119

Here's the changelog entry describing this:

* Hash
  * incompatible changes:
    * Hash#reject will return plain Hash object in the future versions, that
      is the original object's subclass, instance variables, default value,
      and taintedness will be no longer copied, so now warnings are emitted
      when called with such Hash.

I fixed this in d94e9c8 by switching to select, but then I had to follow up with 83171b2 to keep things working on 1.8.7. I think subclassing Hash is a problematic practice and we should move away from it.
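For context, the incompatibility looks roughly like this in plain Ruby (illustrative only, not the actual rspec-core code; the 1.8.7 follow-up was presumably needed because Hash#select returns an array of pairs there instead of a hash):

class FilterableHash < Hash; end      # stand-in for the old rspec-core subclass

h = FilterableHash[:a, 1, :b, 2]
h.reject { |_k, v| v > 1 }   # 2.1 warns: future versions will return a plain Hash here
h.select { |_k, v| v <= 1 }  # => {:a=>1} on 1.9+, but [[:a, 1]] on 1.8.7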

@JonRowe
Owner

The other place we subclass Hash is in metadata; not subclassing there would let us solve the issue of exposing mutation methods which don't uninstall hooks ;)

@myronmarston
Owner

I've been thinking a bit about the metadata case. Wanted to share my thoughts while they are fresh.

Given that we present metadata as being a hash, I'd like to make it just a hash -- not a subclass of a hash, not a wrapper/proxy object delegating to a hash, but literally, just a hash. Currently the extra metadata logic does two primary things:

  • Exposes a bunch of ruby methods for calculating particular bits of the metadata.
  • Overrides [] and fetch so that it can lazily compute requested values, which allows it to appear to always have those values, while avoiding the expense of computing them if they are not used.

I consider the first of these to be undesirable. We need this logic, but exposing it directly on the metadata is problematic, IMO, because we don't intend those methods to be public but the metadata is intended to be a public object users can do pretty much anything with. I'd prefer if the metadata hash didn't have these methods.

The second is the trickier part. I don't think we can get the exact same behavior from a raw hash, but we can get close. We can use Hash#default_proc to hook in and lazily compute values when they are fetched via []. That doesn't work for fetch, sadly, but that's just how ruby hashes work. I think I'm ok with fetch raising an error if the requested key hasn't been computed yet as long as we can provide a deprecation warning in 2.99. We can also document/recommend that users use #[] to get values out of the metadata hash.
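To make the [] vs. fetch distinction concrete, here's a plain-Ruby illustration of the default_proc behavior described above (nothing rspec-specific):

h = Hash.new { |hash, key| hash[key] = "computed #{key}" }

h[:foo]        # => "computed foo" -- [] invokes the default_proc and memoizes the value
h.fetch(:foo)  # => "computed foo" -- fine once [] has stored it
h.fetch(:bar)  # raises KeyError   -- fetch never invokes the default_proc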

So, what I'm imagining is something like this:

metadata_hash = Hash.new(&MetadataComputer.new(self))

The computation logic will be pulled out into this separate MetadataComputer (or similar) class.
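For illustration, here's a rough sketch of what such a MetadataComputer might look like; the class, method, and key names below are placeholders rather than the eventual rspec-core implementation. The & in Hash.new(&MetadataComputer.new(self)) calls #to_proc, so the object becomes the hash's default_proc:

class MetadataComputer
  COMPUTED_KEYS = [:full_description, :location]  # illustrative keys only

  def initialize(example_or_group)
    @example_or_group = example_or_group
  end

  # Called by the hash for missing keys: compute the value, memoize it in the
  # hash, and return it. Unknown keys fall through and behave like a plain Hash.
  def to_proc
    lambda do |hash, key|
      hash[key] = compute(key) if COMPUTED_KEYS.include?(key)
    end
  end

  private

  def compute(key)
    # derive the value for `key` from @example_or_group; elided in this sketch
  end
end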

If we're going to do this, I think it needs to be before 3.0 RC; there's no way to migrate to this new kind of setup w/o breaking changes, as far as I can see. Given that, I consider this a priority (but not an absolute release blocker, if I hit some kind of wall in implementing this) for 3.0. I'm planning to work on this next, in fact.

Thoughts from others?

/cc @JonRowe @xaviershay @samphippen @soulcutter @alindeman

@JonRowe
Owner

I like this idea; it solves problems. But we need to make sure this works with rspec-rails, which relies on metadata to load the relevant example types and also manipulates metadata based on the relevant folders.

@JonRowe
Owner

#1089 is related to this

@myronmarston
Owner

and also manipulates metadata based on the relevant folders

I've always disliked this part of rspec-rails :(.

@JonRowe
Owner

Yes. I'd be ok with dropping that feature in 3.x, but it would cause a large amount of friction.

I generally dislike using rspec-rails, as the rails way doesn't fit with my testing philosophy and the rails core team doesn't play nice with others when it comes to the test helpers...

@myronmarston
Owner

We can always add a more first class API for rspec-rails to use if need be. Something like:

RSpec.configure do |rspec|
  rspec.define_inferred_metadata do |metadata|
    metadata[:type] = type_for(metadata[:path])
  end
end

Actually, that way seems better regardless of any changes to metadata implementation.
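For illustration, one way such a hypothetical type_for helper could work (the directory-to-type mapping mirrors rspec-rails' folder convention, but neither define_inferred_metadata nor type_for is a shipped API):

def type_for(path)
  case path
  when %r{spec/models}      then :model
  when %r{spec/controllers} then :controller
  when %r{spec/requests}    then :request
  end
end

type_for("./spec/models/widget_spec.rb") # => :model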

@alindeman
Collaborator

and also manipulates metadata based on the relevant folders

I've always disliked this part of rspec-rails :(.

Flying by here for now, though I'll follow more closely later. I might be going out on a limb here, but I personally don't think it's a good idea to change rspec-rails' folder-based behavior for 3.x ... I think rails developers are used to this sort of convention, and there's a huge amount of documentation around it. That said, if it's really holding rspec back, let's discuss more and see if there are some reasonable ways to ease the transition.

@JonRowe
Owner

Yah

@avdi

First gut reaction is "ick". I'm OK with Hash default procs that do really basic stuff that's consistent for all keys. But hooking up a bunch of special logic to handle "special" keys seems like a big POLS violation to me. I'm less surprised when methods do special stuff because that's what methods are for.

@avdi

To be a little more specific: let's say I'm fiddling around with RSpec metadata for the first time. I do:

metadata.keys

...and I see [:foo, :bar]. "Nuts", I say. "I was hoping for :baz".

Then I explore the documentation and/or source code a bit, and see references to :baz and I'm like "HEY! HOW COME YOU GET A :baz and I DON'T?!"

Then I see some docs that explicitly talk about :baz and I'm like "BULLSHIT THERE AIN'T NO SUCH THING". And then I'm like "why is my RSpec broken???"

Finally, after drinking a fifth of cheap schnapps I write a test that asserts:

expect(metadata[:baz]).to be

...just to express my angst. Giggling in a slightly deranged way, I hit "run".

And the test passes.

At first I can't believe it, and inspect the ingredients of the schnapps for hallucinogens. But when I run the test again, it still passes. With madness rising in my tormented brain, I inspect the contents of metadata[:baz] and discover exactly the information I was looking for all along.

With all the energy left in me, I screech out a curse upon the RSpec maintainers and all practitioners of dark "magic", and collapse, sobbing, on my keyboard.

The end.

@myronmarston
Owner

First gut reaction is "ick". I'm OK with Hash default procs that do really basic stuff that's consistent for all keys. But hooking up a bunch of special logic to handle "special" keys seems like a big POLS violation to me. I'm less surprised when methods do special stuff because that's what methods are for.

That's a completely valid point, but we're already basically doing this by overriding #[] and #fetch and doing the computation for special keys:

https://github.com/rspec/rspec-core/blob/a7c1d356dfa23391aefbfc9d20cb72116226ee04/lib/rspec/core/metadata.rb#L55-L93

If we're going to get away from having logic in the metadata hash for certain "special" keys, I think we have to compromise in some other way:

  • Eagerly compute these values and put them in the hash up front. We'd take a perf hit since many of these values aren't needed for every example, though.
  • Stop using a hash at all, and provide the metadata using some other kind of object. This would be a huge API change and I think a hash has many advantages we'd lose by moving to a different kind of object, though.

@avdi

The advantages of a hash for user-supplied metadata are obvious. One possible compromise: isolate "magic" properties to a nested hash, e.g. metadata[:rspec].

@samphippen
Collaborator

The advantages of a hash for user-supplied metadata are obvious. One possible compromise: isolate "magic" properties to a nested hash, e.g. metadata[:rspec].

:+1:

@xaviershay
Collaborator

While I'm sure it was done for good reason, a quick look at the lazily computed values doesn't show anything obviously slow. caller is the obvious one, but it looks like it has already been provided and we're just fetching it. Is this code actually still slow?

(I didn't see anything in https://github.com/rspec/rspec-core/tree/master/benchmarks )

Another option: if there is an appropriate trigger, we could just compute all of these (lazily) at the same time; that could be easier to reason about.

In general, I agree with the direction of not subclassing Hash.

@myronmarston
Owner

The advantages of a hash for user-supplied metadata are obvious. One possible compromise: isolate "magic" properties to a nested hash, e.g. metadata[:rspec].

Interesting idea. That would be a bigger breaking API change than I was looking for, but I'll consider it. If we do go that way, we may want metadata[:rspec] to return an object with attributes rather than a hash, since there are certain known properties rspec manages.

@myronmarston
Owner

(I didn't see anything in https://github.com/rspec/rspec-core/tree/master/benchmarks )

The benchmarks dir is a relatively recent thing we started a couple years ago, after the metadata stuff with lazy logic was already basically settled.

@avdi

You could leave a default_proc shim in place which spits out a warning and forwards from the metadata Hash to metadata[:rspec]. I know that's equally magical from the usage standpoint, but a) it's a deprecation path; b) if you're looking at docs or RSpec code you'll see metadata[:rspec] being used, not the magically forwarded properties; and c) it's a one-liner in the default proc, instead of being a bunch of special cases.

@avdi

E.g.

metadata = Hash.new {|h, k|  h[:rspec].public_send(k) }

@avdi

With warning:

metadata = Hash.new {|h, k|
  warn "Next time use metadata[:rspec].#{k}, doofus!"
  h[:rspec].public_send(k)
}

@myronmarston
Owner

You know, the more I think about this, the more I like the idea. The deprecations may not be necessary in 3.0 (but would be in 2.99), so in 3.0 the metadata hash could literally be as simple as user_metadata[:rspec] = ExampleMetadata.new(self) (or something similar). Very little magic.
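For example, a minimal sketch of how that could look, assuming an attribute-style ExampleMetadata as discussed above (the attribute names are illustrative, not the final design):

ExampleMetadata = Struct.new(:full_description, :location)  # illustrative attributes

user_metadata = { :fast => true }        # user-supplied pairs stay in the plain hash
user_metadata[:rspec] = ExampleMetadata.new(
  "Widget does something useful",        # full_description
  "./spec/widget_spec.rb:12"             # location
)

user_metadata[:rspec].full_description   # => "Widget does something useful"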

@avdi

I think it's also consistent with the Rack approach: the env Hash is a pure hash, but some of the standard entries in it point to things that are not pure hashes.

@myronmarston
Owner

I've started working on rewriting the metadata implementation. It's tricky to rewrite it given how much rspec depends on the metadata, but I'm making good progress.

What goes under the :rspec object and what stays in the hash? Clearly, user-defined k/v pairs go in the hash, and computed attributes (like full_description) should be exposed via :rspec...but there's a grey area in the middle. For example, there are a number of special metadata keys that have special meaning to RSpec (such as :order, :skip, :pending) but that the user explicitly sets (although RSpec may set these in some situations as well). There are other metadata attributes that rspec sets (such as :example_group_block) that aren't really computed, but that rspec stores in the metadata hash and the user never would. Which go where?

I'm leaning toward making :rspec contain anything that would never be set by a user. Keys like :skip and :pending would stay in the user hash since users can set those. Thoughts from others?

@xaviershay
Collaborator

That sounds about as good a method as we could come up with.

@avdi

That seems reasonable, if only because any other approach would mean more work for users who want to use those flags, and that would be bad.

@myronmarston

So after thinking this through some more, especially the filtering complexity I brought up in #1376, I'm leaning towards @xaviershay's suggestion above: precompute the computed entries rather than having the special :rspec entry. All of the options involved in making filtering on the :rspec properties work are suboptimal. Looking at the computed keys, I don't see any that should be particularly slow, and furthermore, most of the computed keys are going to be calculated eventually anyway.

@dchelimsky -- do you remember why you chose to make the metadata lazily computed? Did it make a noticeable perf difference or was it more just that you could make it lazy so why not? In the git history I noticed two commits about this:

...but they don't give much to go on.

@JonRowe
Owner

To be honest, I think I'd prefer precomputing entries.

@myronmarston

Closing in favor of #1391.
