
add ClassTag context bounds to OpenHashMap type params #5121

Closed

Conversation

performantdata
Contributor

I'm creating this PR now so that this API-breaking change makes it into 2.12.

@scala-jenkins scala-jenkins added this to the 2.12.0-M5 milestone Apr 23, 2016
The purpose is to support Array creation (in a subsequent commit), so
that the table entries require less memory.
@soc
Member

soc commented Apr 24, 2016

Is there a ticket/discussion associated with this?

@performantdata
Contributor Author

This is it.

@sjrd
Member

sjrd commented Apr 24, 2016

The purpose is to support Array creation (in a subsequent commit), so that the table entries require less memory.

Internally? Just create Array[AnyRef]s and be done with it. There is no need for ClassTags, here.

@performantdata
Contributor Author

performantdata commented Apr 24, 2016

Internally?

Yes.

Array[AnyRef]

Uses too much space. Array[Value] can be up to 32 times smaller if Value is Byte, and avoids one dereference.
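To illustrate the point being made here, a minimal sketch (class and field names are hypothetical, not the PR's code): with a ClassTag context bound, a generic class can allocate a primitive-backed array such as Array[Byte] (1 byte per element) instead of an Array[AnyRef] of boxed values (a reference per element plus a box object).

```scala
import scala.reflect.ClassTag

// Hypothetical sketch: the ClassTag context bound lets a generic class
// allocate a primitive array for V, instead of an Array[AnyRef] of boxes.
class Table[V: ClassTag](size: Int) {
  // When V = Byte, this is backed by a JVM byte[], not an Object[].
  val values: Array[V] = new Array[V](size)
}

val bytes = new Table[Byte](8)
println(bytes.values.getClass.getSimpleName)  // prints "byte[]"
```

Without the ClassTag, erasure would force the class to allocate an Array[AnyRef] and box every element.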

@soc
Member

soc commented Apr 24, 2016

But how is this class different from all the others? Shouldn't every collection class get this treatment if we follow this reasoning?

@sjrd
Member

sjrd commented Apr 24, 2016

Sounds like you have a very specific use case, for which it might be better to simply have your own specialized copy of this class.

I would insist on @soc's request for a more lengthy discussion of the need. Note that this change is not only binary incompatible. It is also source incompatible, and will break perfectly legitimate existing code. So to be even considered, it should be significantly argued for, at the least.

@performantdata
Contributor Author

But how is this class different from all the others?

Open addressing only allows one entry per slot in the hash table. So unlike chaining, there's no need to keep a linked list per slot, or even a reference (like it does now); just store the data in the hash table. In the JVM, that requires one array each for the keys, values, and slot states, plus the keys' hash values if we choose to keep those around.

This approach is currently used in AnyRefMap and LongMap, though those don't use context bounds because they're limited to certain key & value types. This PR prepares OpenHashMap to generalize that approach to any types.

Note that context bounds didn't exist when this class was introduced to Scala 2.7. Thus the weird original implementation.
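The parallel-array layout described above could be sketched like this (a hypothetical illustration under the assumptions of this PR, not the actual OpenHashMap code; no resizing or deletion, linear probing only):

```scala
import scala.reflect.ClassTag

// Illustrative open-addressing table: one entry per slot, stored as three
// parallel arrays instead of an array of entry objects. With primitive K/V,
// keys and values are primitive arrays thanks to the ClassTag bounds.
class FlatTable[K: ClassTag, V: ClassTag](capacity: Int) {
  private val keys   = new Array[K](capacity)
  private val values = new Array[V](capacity)
  private val states = new Array[Byte](capacity) // 0 = empty, 1 = occupied

  // Linear probing: walk until we find the key or an empty slot.
  // (Sketch only: assumes the table never fills up.)
  private def slot(k: K): Int = {
    var i = (k.hashCode & 0x7fffffff) % capacity
    while (states(i) == 1 && keys(i) != k) i = (i + 1) % capacity
    i
  }

  def put(k: K, v: V): Unit = {
    val i = slot(k)
    keys(i) = k; values(i) = v; states(i) = 1
  }

  def get(k: K): Option[V] = {
    val i = slot(k)
    if (states(i) == 1) Some(values(i)) else None
  }
}
```

With K = Long and V = Byte this stores 8 + 1 + 1 bytes per slot, versus a reference plus a boxed entry object per slot in a chained design.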

Shouldn't every collection class get this treatment if we follow this reasoning?

Ones that use open addressing, at least. A Set implementation, for instance, might make sense.

It is also source incompatible, and will break perfectly legitimate existing code.

I know. But I doubt that there are many people using this code, because of its poor performance compared to the alternatives: those above, mutable.HashMap, and Debox.

@Ichoran
Contributor

Ichoran commented Apr 25, 2016

If there aren't many people using this code, why don't we just leave it alone or deprecate it entirely, and suggest that people use Debox?

The problem with this PR is that it doesn't go far enough to really do the job: you don't have to store all the boxes, but now you have to box/unbox everything on the way in and out, which potentially creates even more GC churn. So you really need specialized methods for everything. But then it's incompatible with the rest of the collections library (any interfacing will require boxing/unboxing).

Unfortunately, this isn't something you can fix incrementally, at least not if your end result is going to look at all pretty. You need to start with a top-down design, figure out how specialization or something like it is going to appear, and implement it consistently.

@performantdata
Contributor Author

suggest that people use Debox

Debox doesn't support the Scala collections interfaces. I also think that it's great that Scala has an open addressing implementation. It would just be nice if it performed better.

The problem with this PR is that it doesn't go far enough to really do the job [...] now you have to box/unbox everything

This PR just modifies the constructor API; it doesn't affect boxing/unboxing of elements. That would come in a subsequent PR, with benchmarks and so on. I think you're conflating the two.

you have to box/unbox everything on the way in and out

Until the JIT compiler optimizes away the boxing.

The current implementation is boxing everything already, so that computational overhead already exists. The GC churn doesn't exist, but that's because the objects remain allocated; I doubt that that's better for performance.

@Ichoran
Contributor

Ichoran commented Apr 25, 2016

Benchmarks are more convincing than "doubts". Does this PR speed up operations? What is the read/write tradeoff like? I routinely find that a box/unbox cycle, despite formally being entirely elidable, is not preferable to simply unboxing. So my guess is that this change, if made successfully, might speed up creation but would slow down reading.

A low-memory creation-optimized map, if that's what it is, is pretty special-purpose.

Also, I don't see a path to making this change successfully. What do you think is going to happen when you do a map?

@performantdata
Contributor Author

Benchmarks are more convincing than "doubts".

I know. I've run plenty of benchmarks on this class, and intend to run more. If you'd like to help that cause, there's #5061, whose build Adriaan recently green-lit. I could use that in 2.12.x.

The existence of faster, more conventional implementations is prima facie evidence that this class could be improved.

Does this PR speed up operations?

As I just wrote, this PR is just an API change. The performance effects would be in a subsequent PR.

What do you think is going to happen when you do a map?

Could you be more specific? Are you concerned about the existence of a CanBuildFrom? or is this about performance?

@Ichoran
Contributor

Ichoran commented Apr 25, 2016

You haven't yet given a compelling argument for why OpenHashMap is the thing that should be fixed. So it's great that you're making this faster, and for any transparent change that can't possibly be a loss, but when you start making changes that break existing code we have to take a harder look at whether we're spending effort in the right place. (That is, even if the improvement is completely free, is the inconvenience to people who have to change somebody else's code written five years ago worth the gain in performance now?)

And I mean "does the change that this PR anticipates speed up operations". Of course this is just an interface change. But you're going somewhere with it, so we should be sure we want to end up there before we commit to it.

And yes, I'm worried about CBF. And also about other details of the implementation that necessitate this change. Do you plan to write nine different implementations (or eighty-one)? If not, how do you get fast array access? The JIT compiler is not particularly good at lifting dispatch out of tight loops. Adding a 9-way switch to every access isn't going to be terribly good for performance, at least not if this is used for more than one data type in the program (so the switch cannot be almost completely elided).

@performantdata
Contributor Author

how do you get fast array access?

OK, good question. I'll need to do some implementation & benchmarking to check that. Busy on a PR for SI-9522 right now.

@lrytz lrytz added the on-hold label May 2, 2016
@lrytz
Member

lrytz commented May 2, 2016

I read through the discussion again, and I agree this cannot be merged given the current state of knowledge / benchmark data / design proposals. So I'm closing for now.

@performantdata
Copy link
Contributor Author

performantdata commented May 20, 2016

Adding a 9-way switch to every access isn't going to be terribly good for performance, at least not if this is used for more than one data type in the program (so the switch cannot be almost completely elided).

Aaaannd... it's not. Specifically, I see one case of the ScalaRunTime.array_apply() switch JIT-compiled (for Int), with corresponding instanceof and compare instructions. They're only a couple of machine ops each, but still. Overall it's about twice as slow as the current implementation for smaller maps. But performance is constant with respect to map size, so above ~2M elements it performs better on write calls.
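For context, the dispatch being measured here looks roughly like the following (a simplified sketch of what scala.runtime.ScalaRunTime.array_apply does on a generic array read; the real method has more cases, including Unit and null handling):

```scala
// Simplified sketch of ScalaRunTime.array_apply-style dispatch: every
// generic array read must discover the array's runtime element type.
// Each case compiles to an instanceof test plus a branch.
def arrayApply(xs: AnyRef, idx: Int): Any = xs match {
  case x: Array[AnyRef]  => x(idx)
  case x: Array[Int]     => x(idx)
  case x: Array[Double]  => x(idx)
  case x: Array[Long]    => x(idx)
  case x: Array[Float]   => x(idx)
  case x: Array[Char]    => x(idx)
  case x: Array[Byte]    => x(idx)
  case x: Array[Short]   => x(idx)
  case x: Array[Boolean] => x(idx)
}
```

When only one element type ever flows through a given call site, the JIT can profile and collapse this to the single hot case; with several element types in flight, the instanceof chain stays in the compiled code, which matches the slowdown reported above.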

The JIT compiler is not particularly good at lifting dispatch out of tight loops.

It should be noted that I'm feeding the map calls into a JMH Blackhole, so presumably I'm not even giving the JIT a chance to do this in my benchmark.

@adriaanm adriaanm added 2.12 and removed 2.12 labels Oct 29, 2016