
SI-8061 Do not mutate List in Types.scala #3252

Closed
wants to merge 1 commit

Conversation

scottcarey

In src/reflect/scala/reflect/internal/Types.scala, change the
findMember method's algorithm to use the public List API.
Prior to this change, the method built a list of matching members
by appending matching symbols and mutating the tail.
This change builds up the matching list by prepending instead,
and reverses the result at the end.
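
A minimal sketch of the new shape (illustrative names only; the real findMember code is considerably more involved):

```scala
// Illustrative sketch, not the actual findMember code: build the result by
// prepending through the public List API, then reverse once at the end.
def collectMatching[A](elems: List[A])(p: A => Boolean): List[A] = {
  var acc: List[A] = Nil
  for (e <- elems)
    if (p(e)) acc = e :: acc // O(1) prepend, no mutation of existing cons cells
  acc.reverse                // single O(n) pass restores encounter order
}

// e.g. collectMatching(List(1, 2, 3, 4, 5))(_ % 2 == 1) == List(1, 3, 5)
```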

@retronym
Member

I'd like to see a few steps ahead in the de-varification process before reviewing this, as findMember is a really hot method.

What do you have in mind for ListBuffer?

@scottcarey
Author

This is a companion of #3233. Both are independent, but each is a prerequisite for making List immutable.

Tests for this change already seem to exist -- even very minor deviations of behavior result in failures of 'ant all.clean; ant build'.

@retronym
Member

#3233 covers serialization, not ListBuffer. It's not much good getting all the prerequisite steps done (perhaps at some performance cost) if we aren't sure that we can get over the final hurdle.

@scottcarey
Author

Regarding performance at this location in Types.scala: the performance should be identical apart from the reverse at the end. The search appears to be at least O(n^2), and the reverse adds only O(n). The constant factor on the O(n^2) part can be reduced by switching the intermediate collection of members already found from a linked representation to an ArrayBuffer, which is significantly faster to traverse for all but the smallest of lists.

This call site is a large, complicated method -- breaking it up into smaller private methods and looking at higher-performance intermediate data structures is where I would go to improve this. If there were a performance test or two to compare against, I'm positive I could equal or better what is already here without requiring list mutation. If it is possible to change what newOverloaded( ... ) requires so as to avoid the list reverse, or to have it accept an ArrayBuffer, then performance here will be easy to match or beat.
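
A sketch of that ArrayBuffer variant (hypothetical helper, not the actual code at this call site):

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical shape only: accumulate matches with cheap appends and good
// locality of reference, then convert to a List once at the end.
def collectMatchingViaBuffer[A](elems: List[A])(p: A => Boolean): List[A] = {
  val buf = new ArrayBuffer[A]
  for (e <- elems)
    if (p(e)) buf += e
  buf.toList // one conversion at the end instead of per-element list surgery
}
```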

@scottcarey
Author

Regarding ListBuffer and its performance if it no longer mutates List:

The plan:
Either change ArrayBuffer to be a Builder[List], or make a variation of it that is, and change List.newBuilder to use it. Also change List's (or the appropriate supertype's) map, flatMap, foldRight, filter, etc. to avoid using the Builder unless the list proves to be larger than some threshold during any recursion that is not tail-recursive, and then fall back to the Builder[List] for the tail of the list beyond the threshold. Effectively, use the stack as temporary space for the first X elements and the heap after that. This should be fast. Note that foldRight is currently implemented as reverse.foldLeft -- spilling the values to an ArrayBuffer and scanning it backwards will be faster and use less temporary memory.
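
A hedged sketch of that idea for map (the name, threshold, and structure are illustrative assumptions, not actual standard-library code):

```scala
import scala.collection.mutable.ArrayBuffer

object StackThenHeapMap {
  private val Threshold = 32 // illustrative; a real cut-off would be tuned

  def mapList[A, B](xs: List[A])(f: A => B): List[B] = {
    def loop(rest: List[A], depth: Int): List[B] = rest match {
      case Nil => Nil
      case head :: tail if depth < Threshold =>
        f(head) :: loop(tail, depth + 1) // plain recursion: the stack is the temp space
      case _ =>
        // Past the threshold: map the remainder iteratively into an ArrayBuffer,
        // then build the result list by scanning the buffer in reverse.
        val buf = new ArrayBuffer[B]
        var cur = rest
        while (cur.nonEmpty) { buf += f(cur.head); cur = cur.tail }
        var acc: List[B] = Nil
        var i = buf.length - 1
        while (i >= 0) { acc = buf(i) :: acc; i -= 1 }
        acc
    }
    loop(xs, 0)
  }
}
```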

Question: ArrayBuffer is a Builder[A, ArrayBuffer[A]]; is changing it to Builder[A, List[A]] even remotely acceptable? I assume not. What would be the favored way to create, essentially, two variants of ArrayBuffer: one that is a builder for itself, and another that is a builder for List? I can think of several ways to do this, but don't know which is most acceptable.

ArrayBuffer has significantly higher locality of reference than ListBuffer, and is significantly faster to traverse and to copy into a new List. I expect it to be faster in several cases, and slower only where ListBuffer gets to use the internal private tail mutation on List and thereby avoids making any copy of the list.

ListBuffer is a poor choice of data structure for a Builder[List] unless mutation is used to allow structural sharing between the builder's result and its intermediate mutable state. If mutation is not allowed, ListBuffer needs to change the internal data structure it uses. Two ArrayBuffers (one for appending, one for prepending, kept reversed) should be faster than a linked implementation for most use cases. ListBuffer's documentation says that prepending and appending are O(1) and almost everything else is O(n); two ArrayBuffers have lower per-element overhead and beat the current implementation for nearly every case that requires copying an element, since an arraycopy is extremely efficient.
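
A hedged sketch of the two-ArrayBuffer idea (illustrative name and minimal API; not the real ListBuffer):

```scala
import scala.collection.mutable.ArrayBuffer

final class TwoBufferListBuilder[A] {
  private[this] val front = new ArrayBuffer[A] // prepended elements, stored in reverse order
  private[this] val back  = new ArrayBuffer[A] // appended elements, in order

  def prepend(elem: A): this.type = { front += elem; this } // amortized O(1)
  def append(elem: A): this.type  = { back += elem; this }  // amortized O(1)

  def toList: List[A] = {
    var acc: List[A] = Nil
    var i = back.length - 1
    while (i >= 0) { acc = back(i) :: acc; i -= 1 }            // back, scanned in reverse
    var j = 0
    while (j < front.length) { acc = front(j) :: acc; j += 1 } // front is already reversed
    acc
  }
}
```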

Once mutation is taken away, the optimal data structure for a Builder[List], which only requires append, is an ArrayBuffer with an optimized toList implementation that traverses the array in reverse to build the list. It appears that ArrayBuffer (and IndexedSeqOptimized) do not have an optimal implementation of toList, delegating to the default Builder[List] (currently ListBuffer) rather than doing, essentially, IndexedSeqOptimized.foldRight(Nil)((elem, acc) => elem :: acc), organized to avoid the megamorphic dispatch in foldRight.
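
A sketch of such a builder, assuming the pre-2.13 Builder API (+=/clear/result); the class name is made up for illustration and this is not a proposal for ArrayBuffer's actual signature:

```scala
import scala.collection.mutable.{ArrayBuffer, Builder}

// Array-backed Builder[A, List[A]]: appends go into an ArrayBuffer, and
// result() builds the List back-to-front with a reverse scan of the array.
final class ArrayBackedListBuilder[A] extends Builder[A, List[A]] {
  private[this] val buf = new ArrayBuffer[A]

  def +=(elem: A): this.type = { buf += elem; this }

  def clear(): Unit = buf.clear()

  def result(): List[A] = {
    var acc: List[A] = Nil
    var i = buf.length - 1
    while (i >= 0) { acc = buf(i) :: acc; i -= 1 } // reverse scan, no extra copy
    acc
  }
}
```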

I need to build up the performance tests for all of this. I could use some advice or suggestions on this front. How good of a test is compiling the compiler with a compiler that has these changes in it? What is the best way to do that now, after the modularization changes (or is the "soon to be legacy" message at http://docs.scala-lang.org/scala/ still accurate)?

I'm starting with http://axel22.github.io/scalameter/ but perhaps there are other tools already integrated with the build for performance tests? Or perhaps someone has suggestions or examples on another fork for performance tests on the library or compiler.
Some microbenchmarks will be useful, but macro benchmarks like compiling with the altered compiler are necessary too. Any other suggestions there?

Thanks!

@retronym
Member

We've got some compiler benchmarking tools but they aren't documented well enough to release them. I can run your proposed changes through them. I would prefer to defer this work for 2.12 however, as we are a bit too late in the cycle to consider this for 2.11. We will probably branch for the 2.11 release in a month or so, which would mean we could then continue this work on master.

@retronym
Member

Oh, and thanks for the detailed description of your vision for ::!

@scottcarey
Author

For 2.11, what I would want considered are the components of the above that affect compatibility. If (yes, this is a big and unlikely if) the remainder of the work did not affect binary compatibility and was merely a performance exercise, then some or all of it could be considered for 2.11.x if deemed safe.

For that reason, I wanted to make small, easy-to-review PRs.

#3233 breaks compatibility (assuming that serialization compatibility is held to the same standard as binary compatibility). Is it small and simple enough to get into 2.11? Changing one field from var to val is a win -- the JVM may be able to cache the value in registers more aggressively, for a small performance gain.

This PR does not change any APIs, but is also a small local change that should be easy to digest.

That leaves the following major tasks:

  • Modify IndexedSeqOptimized to have an optimized toList algorithm. This is a performance win that probably affects binary compatibility, but it is localized and small. Is this possible before 2.11?
  • Create a new Builder[A, List[A]] that does not use List mutation and performs well. ArrayBuffer is very close: the temporary memory used by an ArrayBuffer is about 1/4 the size of the final list itself, and it has nearly the best possible locality of reference. I can get this done rapidly, but the performance testing will lag.
  • Attempt to optimize List (or the appropriate supertype) to improve the performance of operations that traverse the list, by recursing on the stack for the first N elements and then shifting to the heap via the new array-based builder for the remainder. I do not know if this affects binary compatibility. If it does, this is the most difficult item to get done, validate for performance, and review, as it is significantly more work than any of the others.
  • Return the new Builder[A, List[A]] from List.newBuilder, activating it by default. I assume this sort of change is what one would do in a major release, although it should be binary compatible.
  • Modify ListBuffer to use a different internal data structure that does not mutate List (allowing List to become immutable), and optimize it for its documented purpose: very fast simultaneous append and prepend.

@retronym
Member

I think #3233 is worthwhile for 2.11.

This PR, OTOH, exposes us to performance regressions, and we're not great at measuring/noticing them. I am working on a set of performance patches for 2.11.0-M8, and our benchmarking tools are slowly improving.

@retronym
Member

BTW, an important case to optimize for is constructing a builder and never using it, as (currently) happens with Nil.map(f). So it might be worth avoiding allocating the internal buffer in your new builder until the first append.
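
A small sketch of that tweak applied to the array-backed builder sketched earlier (illustrative names, pre-2.13 Builder API): the backing buffer is only allocated on the first append, so a builder that is created and immediately discarded, as with Nil.map(f), allocates nothing beyond the builder itself.

```scala
import scala.collection.mutable.{ArrayBuffer, Builder}

final class LazyArrayBackedListBuilder[A] extends Builder[A, List[A]] {
  private[this] var buf: ArrayBuffer[A] = null // allocated lazily on first +=

  def +=(elem: A): this.type = {
    if (buf == null) buf = new ArrayBuffer[A]
    buf += elem
    this
  }

  def clear(): Unit = { buf = null }

  def result(): List[A] =
    if (buf == null) Nil // never appended: no buffer was ever allocated
    else {
      var acc: List[A] = Nil
      var i = buf.length - 1
      while (i >= 0) { acc = buf(i) :: acc; i -= 1 }
      acc
    }
}
```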

@scottcarey
Author

For List (or, more likely, the appropriate 'Optimized' supertype), I'll want to avoid allocating any builder at all for most operations like map until it is unsafe to recurse deeper. For example, you can recurse non-tail-recursively in map until you get 32 deep, then start using a Builder, and on the way back take the result of the builder and prepend what you spilled onto the stack during recursion. This would make small lists faster than they are now.

For this PR, it would not affect binary compatibility and we can validate the performance (or optimize it beyond what it is now) and apply it later.
With the limited time I have in the next week or so, I'll attempt related PRs that have the most potential to either improve performance or affect compatibility. I'll leave this one alone for now.

@ghost ghost assigned retronym Dec 13, 2013
@retronym
Member

retronym commented Jan 3, 2014

I'm going to close this for now. Let's take this up again post 2.11.
