SI-8061 Do not mutate List in Types.scala #3252
In src/reflect/scala/reflect/internal/Types.scala, change the findMember method's algorithm to use the public List API. Prior to this change, the method built a list of matching members by appending matching symbols and mutating the tail. This change builds up the matching list by prepending instead, and reverses the result at the end.
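As a rough illustration of the prepend-then-reverse pattern described above, here is a minimal sketch. It is not the actual findMember code: `PrependThenReverse`, `collectMatching`, and the `matches` predicate are hypothetical names used only for this example.

```scala
object PrependThenReverse {
  // Hypothetical helper, not the real findMember: collects matching elements
  // using only the public, immutable List API.
  def collectMatching[A](candidates: List[A])(matches: A => Boolean): List[A] = {
    var acc: List[A] = Nil // the local reference is reassigned; no List cell is mutated
    var rest = candidates
    while (rest.nonEmpty) {
      val head = rest.head
      if (matches(head)) acc = head :: acc // O(1) prepend
      rest = rest.tail
    }
    acc.reverse // a single O(n) pass restores encounter order
  }
}
```

The only extra work compared with appending through a mutable tail is the final `reverse`.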
I'd like to see a few steps ahead in the de-varification process before reviewing this as What do you have in mind for |
This is a companion of #3233. Both are independent, but each is a necessary prerequisite of making List immutable. Tests for this change already seem to exist -- even very minor deviations of behavior result in failures of 'ant all.clean; ant build'.
#3223 covers serialization, not |
Regarding performance at this location in Types.scala: The performance should be identical other than the reverse at the end. The search appears to be at least O(n^2). The only additional cost is the reverse at the end, which is O(n). The constant factor on the O(n^2) part can be reduced by switching from a linked representation of members already found to an ArrayBuffer, which is significantly faster to traverse for all but the smallest of lists. This call site is a large, complicated method -- breaking it up into smaller private methods and looking at higher-performance intermediate data structures is where I would go to improve this. If there were a performance test or two to compare with, I'm positive I could equal or better what is already here and not require list mutation. If it is possible to change what newOverloaded( ... ) requires to avoid the list reverse, or accept an ArrayBuffer, then performance here will be easy to match or better.
Regarding ListBuffer and its performance if it no longer mutates List: The plan: Question: ArrayBuffer is a Builder[A, ArrayBuffer[A]]; is changing it to Builder[A, List[A]] even remotely acceptable? I assume not. What would be the favored way to create, essentially, two variants of ArrayBuffer, one that is a builder for itself and another that is a builder for List? I can think of several ways to do this, but don't know which is most acceptable. ArrayBuffer has significantly higher locality of reference than ListBuffer, and is significantly faster to traverse and copy into a new List. I expect it to be faster in several cases, and slower only when ListBuffer gets to use the internal private tail mutation on List and so avoids making any copy of the list. ListBuffer is a poor choice of data structure for a Builder[List] unless mutation is used to allow structural sharing of the builder result with the intermediate mutable state in the builder. If mutation is not allowed, ListBuffer needs to change the internal data structure it uses. Two ArrayBuffers (one for appending, one for prepending, stored reversed) should be faster than a linked implementation for most use cases. ListBuffer's documentation says that prepending and appending are O(1) and almost everything else is O(n). Two ArrayBuffers have lower per-element overhead and beat the current implementation for nearly all cases that require copying elements, since an arraycopy is extremely efficient. Once mutation is taken away, the optimal data structure for Builder[List], which only requires append, is an ArrayBuffer with an optimized toList implementation that traverses the array in reverse to build the list.
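To make the two-ArrayBuffer idea and the reverse-traversal toList concrete, here is a minimal sketch. `TwoBufferListBuilder` is a hypothetical class written for this illustration; it is not part of the Scala library and does not implement the Builder trait.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical class for illustration only; not part of the Scala library.
final class TwoBufferListBuilder[A] {
  // Prepended elements, stored in reverse logical order (the most recent prepend is last).
  private val front = new ArrayBuffer[A]
  // Appended elements, stored in logical order.
  private val back = new ArrayBuffer[A]

  def prepend(elem: A): this.type = { front += elem; this }
  def append(elem: A): this.type = { back += elem; this }

  // Builds the immutable List from its last element to its first,
  // so every step is an O(1) prepend and no List cell is ever mutated.
  def toList: List[A] = {
    var result: List[A] = Nil
    var i = back.length - 1
    while (i >= 0) { result = back(i) :: result; i -= 1 }
    var j = 0
    while (j < front.length) { result = front(j) :: result; j += 1 }
    result
  }
}
```

Both `prepend` and `append` are amortized O(1) array appends, and `toList` is one linear pass over each buffer. Whether this actually beats the current ListBuffer would still need the benchmarks discussed in this thread.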
It appears that ArrayBuffer (and IndexedSeqOptimized) do not have an optimal implementation of toList: they delegate to the default Builder[List], which is currently ListBuffer, rather than doing what is essentially IndexedSeqOptimized.foldRight(Nil)((elem, acc) => elem :: acc), organized to avoid the megamorphic dispatch in foldRight. I need to build up the performance tests for all of this. I could use some advice or suggestions on this front. How good of a test is compiling the compiler with a compiler that has these changes in it? What is the best way to do that now, after the modularization changes (or is the "soon to be legacy" message at http://docs.scala-lang.org/scala/ still accurate)? I'm starting with http://axel22.github.io/scalameter/ but perhaps there are other tools already integrated with the build for performance tests? Or perhaps someone has suggestions or examples on another fork for performance tests on the library or compiler. Thanks!
We've got some compiler benchmarking tools but they aren't documented well enough to release them. I can run your proposed changes through them. I would prefer to defer this work for 2.12 however, as we are a bit too late in the cycle to consider this for 2.11. We will probably branch for the 2.11 release in a month or so, which would mean we could then continue this work on master. |
Oh, and thanks for the detailed description of your vision for |
For 2.11 what I would want considered are the components of the above that affect compatibility. If (yes, this is a big and unlikely if) the remainder of the work did not affect binary compatibility and was merely a performance exercise, then some or all of it could be considered for 2.11.x if it was deemed safe. For that reason, I wanted to make small easy to review PR bits. #3233 breaks compatibility (assuming that serialization compatibility is at the same level as binary compatibility). Is it small and simple enough to get into 2.11? Changing one field from var to val is a win -- the JVM may be able to more aggressively cache the value in registers for a small performance win. This PR does not change any APIs, but is also a small local change that should be easy to digest. That leaves the following major tasks:
I think #3233 is worthwhile for 2.11. This PR, OTOH, exposes us to performance regressions, and we're not great at measuring/noticing them. I am working on a set of performance patches for 2.11.0-M8, and our benchmarking tools are slowly improving. |
BTW, an important case to optimize for is constructing a builder and never using it, as (currently) happens with |
For List (or more likely the appropriate 'Optimized' supertype), I'll want to avoid allocating any builder at all for most operations like map until it is unsafe to recurse deeper. For example, you can do non-tail-recursive recursion in map until you get 32 deep, then start using a Builder, and on the way back take the result of the builder and prepend what you spilled in the stack during recursion. This would make small lists faster than they are now. For this PR, it would not affect binary compatibility and we can validate the performance (or optimize it beyond what it is now) and apply it later. |
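The bounded-recursion map described above could look roughly like the sketch below. `BoundedRecursionMap` and `MaxDepth` are hypothetical names; the cutoff of 32 is simply the figure mentioned in the comment and has not been tuned or benchmarked. The fallback uses the existing ListBuffer purely for illustration.

```scala
import scala.collection.mutable.ListBuffer

object BoundedRecursionMap {
  // Assumed cutoff, taken from the comment above; not a measured value.
  private val MaxDepth = 32

  def map[A, B](xs: List[A])(f: A => B): List[B] = {
    def loop(rest: List[A], depth: Int): List[B] = rest match {
      case Nil => Nil
      case head :: tail if depth < MaxDepth =>
        // Apply f before recursing so side effects happen in list order;
        // the mapped head waits on the stack and is prepended on the way back.
        val mapped = f(head)
        mapped :: loop(tail, depth + 1)
      case _ =>
        // Too deep to keep recursing safely: finish the remainder with a builder.
        val builder = new ListBuffer[B]
        var cur = rest
        while (cur.nonEmpty) { builder += f(cur.head); cur = cur.tail }
        builder.toList
    }
    loop(xs, 0)
  }
}
```

Lists shorter than the cutoff never allocate a builder at all, which is the small-list benefit the comment is after.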
I'm going to close this for now. Let's take this up again post 2.11. |