SI-4147 Add an implementation of `mutable.TreeMap` #4504

Merged
merged 1 commit into from Jun 25, 2015

Member

ruippeixotog commented May 17, 2015

This commit contains an implementation of a mutable red-black tree with a focus on performance. It also contains a new mutable.TreeMap Scala collection that is backed by the aforementioned tree. The common generic factories and traits related to mutable sorted maps didn't exist yet, so this commit also adds them.

Regarding performance, TreeMap overrides (from MapLike and SortedMapLike) all of the most common methods for maps and also those whose default implementations are asymptotically worse than direct red-black tree algorithms (e.g. last, clear).

The rangeImpl method of TreeMap returns an instance of TreeMapView, an inner class of TreeMap. This view is backed by the same RedBlackTree.Tree instance, and therefore changes to the original map are reflected in the view and vice-versa. The semantics of mutating a view by adding and removing keys outside the view's range are the same as those of the current mutable.TreeSet. Less attention was given to the performance of views: in particular, getting the size of a TreeMapView is O(n) in the number of elements inside the view bounds. That can be improved in the future.
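A minimal sketch of the shared-tree behaviour described above, assuming a Scala 2.12 or 2.13 standard library where mutable.TreeMap is available. The object and method names (RangeViewDemo, run) are hypothetical and only serve the illustration; range is the public method that delegates to rangeImpl.

```scala
import scala.collection.mutable

object RangeViewDemo {
  /** Returns (view sees a key added to the map,
    *          map sees a key removed through the view,
    *          number of keys inside the view bounds). */
  def run(): (Boolean, Boolean, Int) = {
    val map  = mutable.TreeMap(1 -> "a", 5 -> "e", 9 -> "i")
    val view = map.range(3, 8)            // keys k with 3 <= k < 8

    map += (4 -> "d")                     // mutate the original map
    val viewSeesInsert = view.contains(4)

    view -= 5                             // mutate through the view
    val mapSeesRemoval = !map.contains(5)

    // size walks the elements inside the bounds, as noted in the description
    (viewSeesInsert, mapSeesRemoval, view.size)
  }
}
```

Both directions go through the same underlying tree, which is why no copying happens when the range is created.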

In a future commit, mutable.TreeSet can be changed to be backed by this red-black tree implementation.

I didn't add unit tests for this collection because I think writing extensive JUnit tests for all the methods would be far too inefficient. Instead, I created a separate project (available here) containing a suite of ScalaCheck properties for both TreeMap and TreeMapView and a (rather unscientific, but somewhat conclusive) performance comparison between immutable.TreeMap, mutable.TreeMap and java.util.TreeMap. A sample result from this last test:

Total unique elements to insert/get/delete: 1000000

######
Insertion
######
--- scala.collection.immutable.TreeMap ---
Took 1653 ms
--- scala.collection.mutable.TreeMap ---
Took 1264 ms
--- java.util.TreeMap ---
Took 965 ms

######
Search
######
--- scala.collection.immutable.TreeMap ---
Took 961 ms
--- scala.collection.mutable.TreeMap ---
Took 932 ms
--- java.util.TreeMap ---
Took 899 ms

######
Deletion
######
--- scala.collection.immutable.TreeMap ---
Took 2622 ms
--- scala.collection.mutable.TreeMap ---
Took 1072 ms
--- java.util.TreeMap ---
Took 1044 ms
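For readers who want to reproduce a comparison of this kind, here is a rough sketch of a timing harness. It is not the benchmark from the linked project: the names TreeMapTiming, timed and run are hypothetical, only two of the three maps are measured, and single-shot wall-clock timing on the JVM is as unscientific as the numbers above.

```scala
import scala.collection.mutable
import scala.util.Random

object TreeMapTiming {
  /** Runs `body` once and returns the elapsed wall-clock time in milliseconds. */
  def timed(body: => Unit): Long = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1000000L
  }

  /** Inserts, looks up and deletes `n` shuffled keys in each map. */
  def run(n: Int, seed: Long): Map[String, Long] = {
    val keys = new Random(seed).shuffle((0 until n).toVector)
    val sMap = mutable.TreeMap.empty[Int, Int]
    val jMap = new java.util.TreeMap[Int, Int]()

    val sInsert = timed(keys.foreach(k => sMap.put(k, k)))
    val jInsert = timed(keys.foreach(k => jMap.put(k, k)))
    val sSearch = timed(keys.foreach(k => sMap.get(k)))
    val jSearch = timed(keys.foreach(k => jMap.get(k)))
    val sDelete = timed(keys.foreach(k => sMap.remove(k)))
    val jDelete = timed(keys.foreach(k => jMap.remove(k)))

    // Sanity check: every inserted key was deleted again.
    require(sMap.isEmpty && jMap.isEmpty)

    Map(
      "insert mutable.TreeMap"   -> sInsert,
      "insert java.util.TreeMap" -> jInsert,
      "search mutable.TreeMap"   -> sSearch,
      "search java.util.TreeMap" -> jSearch,
      "delete mutable.TreeMap"   -> sDelete,
      "delete java.util.TreeMap" -> jDelete
    )
  }
}
```

Calling TreeMapTiming.run(1000000, 42L) would mirror the element count used above; a warm-up run beforehand is advisable because of JIT compilation.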

I understand if you want tests for this in the Scala repository nonetheless. If that's the case, please tell me what would be a sufficient test suite for this collection and I'll add it. Please also tell me if I forgot to do something, as it's my first "big" contribution to Scala :)

This should close SI-4147.

@scala-jenkins scala-jenkins added this to the 2.12.0-M2 milestone May 17, 2015

Member

retronym commented May 18, 2015

Could you please update the commit title to: "SI-4147 Add an implementation of mutable.TreeMap"

Member

retronym commented May 18, 2015

We are in the process of setting up property-based testing for the collections in https://github.com/scala/scala-collections-laws. Perhaps @Ichoran will take your test cases over to that repository.

However, ScalaCheck tests are supported under test/scalacheck. I'd suggest adding your tests there initially, and we can move them later if needed.

Member

retronym commented May 18, 2015

Further review, please, by @Ichoran, and, if he has a few moments to spare, @axel22.

Member

retronym commented May 18, 2015

  • Serialization. Add @SerialVersionUID annotations to the classes that will be serialized and add a test for stability of serialization in test/files/run/t8549.scala.

Member

retronym commented May 18, 2015

@ruippeixotog I forgot to say: welcome and thank you! This contribution looks to be of a really high standard already. Looking forward to having this data structure in our toolbox.

Member

ruippeixotog commented May 18, 2015

Hi @retronym, thanks for the comments! I'll do the changes you mentioned above (please read my comments about making TreeMap final, though).

There is no test/scalacheck folder in the project; did you mean test/files/scalacheck? Can you tell me if there is a way to run only those tests using Ant or SBT?

Member

retronym commented May 18, 2015

./test/partest --scalacheck or ./test/partest test/files/scalacheck/test.scala will run them. ant test.suite includes them, too.

@retronym retronym closed this May 18, 2015

@retronym retronym reopened this May 18, 2015

Member

ruippeixotog commented May 18, 2015

@retronym, I have just updated the pull request with the changes you mentioned :)

However, I didn't know exactly what to do in test/files/run/t8549.scala. I suppose I should manually add entries there for TreeMap, but I don't know how to run those scalac-hash commands, as I don't have those binaries in my PATH. Can you guide me on what to do?

    cmp = ord.compare(key, x.key)
    x = if (cmp < 0) x.left else x.right
  }
  if (cmp <= 0) y else successor(y)

@axel22

axel22 May 18, 2015

Member

If node == Node(1, null, null) and key == 1, then we have:

x = Node(1, null, null)
y = x
cmp = ord.compare(1, 1) // == 0
x = null

And we return y == Node(1, null, null).
Is this correct? Shouldn't we return the first node strictly greater than key, i.e. null in this case?

@axel22

axel22 May 18, 2015

Member

If "after" is inclusive, a comment on this method would help.

@ruippeixotog

ruippeixotog May 18, 2015

Member

Yes, minAfter and maxBefore follow the semantics of from and until in ranges: minAfter is inclusive and maxBefore is exclusive. I agree that the names may be misleading; I'll add a comment explaining that.

    cmp = ord.compare(key, x.key)
    x = if (cmp < 0) x.left else x.right
  }
  if (cmp > 0) y else predecessor(y)

@axel22

axel22 May 18, 2015

Member

If we have node == Node(1, null, null) and key == 1, then we return predecessor(y), which is null this time. It seems that before is not inclusive. Could we document that? It is not immediately obvious.

@ruippeixotog

ruippeixotog May 18, 2015

Member

Yes, I implemented those methods with from/until semantics because they are useful for range projections. I'll add a proper description for both methods.
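The from/until semantics discussed in this thread can be sketched on a plain binary search tree. This is not the code under review: the PR's version tracks the last visited node and calls successor/predecessor through parent pointers, whereas this simplified sketch remembers the best candidate seen so far. The Node class and the insert and build helpers are hypothetical.

```scala
object BoundLookup {
  // Hypothetical minimal node; the PR's Node also carries a value, a colour and a parent.
  final class Node(val key: Int, var left: Node, var right: Node)

  /** Plain (unbalanced) BST insertion, only used to build example trees. */
  def insert(root: Node, key: Int): Node =
    if (root == null) new Node(key, null, null)
    else {
      if (key < root.key) root.left = insert(root.left, key)
      else if (key > root.key) root.right = insert(root.right, key)
      root
    }

  def build(keys: Int*): Node = keys.foldLeft(null: Node)(insert)

  /** Smallest key >= `key`: inclusive, like the `from` bound of a range. */
  def minAfter(root: Node, key: Int): Option[Int] = {
    var x = root
    var best: Option[Int] = None
    while (x != null) {
      if (key <= x.key) { best = Some(x.key); x = x.left }
      else x = x.right
    }
    best
  }

  /** Largest key < `key`: exclusive, like the `until` bound of a range. */
  def maxBefore(root: Node, key: Int): Option[Int] = {
    var x = root
    var best: Option[Int] = None
    while (x != null) {
      if (x.key < key) { best = Some(x.key); x = x.right }
      else x = x.left
    }
    best
  }
}
```

On the single-node tree from the review comments, minAfter(build(1), 1) yields the node with key 1 while maxBefore(build(1), 1) yields nothing, matching the inclusive/exclusive asymmetry pointed out above.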

@ruippeixotog ruippeixotog changed the title from Add an implementation of `mutable.TreeMap` to SI-4147 Add an implementation of `mutable.TreeMap` May 18, 2015

private[this] def fixAfterInsert[A, B](tree: Tree[A, B], node: Node[A, B]): Unit = {
  var z = node
  while (isRed(z.parent)) {
    if (z.parent eq z.parent.parent.left) {

@axel22

axel22 May 18, 2015

Member

Is there a possibility of a NullPointerException here?
Looks like, if you call this from insert after adding a right child to the root node of the tree, i.e.:

root
  |
Node(1) // this is red, from the first insert
  |    \
null   Node(3)  <-- call fixAfterInsert here (this is z)

then z.parent.parent == null, no?

@ruippeixotog

ruippeixotog May 18, 2015

Member

Note that the root node is always black in a red-black tree. So if isRed(z.parent) holds, z.parent can't be the root node, and z.parent.parent can't be null.
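The argument above can be checked on a stripped-down model. The Node class and the grandparentForFixup helper below are hypothetical and only model the loop guard of fixAfterInsert, not the rotations.

```scala
object RootIsBlack {
  // Hypothetical node with just enough state to model the loop guard.
  final class Node(val key: Int, var red: Boolean, var parent: Node)

  // Null-safe colour test: a missing node counts as black.
  def isRed(node: Node): Boolean = (node ne null) && node.red

  /** The grandparent that the loop body would dereference,
    * or None when the guard isRed(z.parent) is false and the loop does not run. */
  def grandparentForFixup(z: Node): Option[Node] =
    if (isRed(z.parent)) Some(z.parent.parent) else None
}
```

With a black root Node(1) and a freshly inserted red child Node(3), the guard is false, so z.parent.parent is never touched. The guard can only be true for a node at depth two or more, where the grandparent exists.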

@axel22

axel22 May 18, 2015

Member

Ah, ok - I now saw line 204, where you ensure this.

Member

retronym commented May 19, 2015

scalac-hash is from https://github.com/retronym/libscala. It's just a shortcut to run old versions of Scala.

But in this case, just use ./build/quick/bin/scala[c]. The instructions in that test aren't really applicable to new data structures.

The intention of that test is to serialize the data structure, record the results (in the base64 encoded strings in the test case itself), and then check that we can continue to deserialize that in the future.
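The idea behind such a stability test can be sketched as follows. This is not the content of t8549.scala: the object and method names are hypothetical, and the sketch only shows the encode/decode round trip. A real stability test would paste the recorded Base64 string into the test source and check that later builds can still decode it.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Base64
import scala.collection.mutable

object SerializationStability {
  /** Java-serializes `obj` and returns the bytes as a Base64 string. */
  def serialize(obj: AnyRef): String = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    Base64.getEncoder.encodeToString(bytes.toByteArray)
  }

  /** Decodes a Base64 string produced by `serialize` back into an object. */
  def deserialize(encoded: String): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(Base64.getDecoder.decode(encoded)))
    try in.readObject() finally in.close()
  }

  /** True when the value survives a serialize/deserialize round trip unchanged. */
  def roundTrips(obj: AnyRef): Boolean = deserialize(serialize(obj)) == obj

  def demo(): Boolean = roundTrips(mutable.TreeMap(1 -> "a", 2 -> "b"))
}
```

Pinning a @SerialVersionUID on each serialized class keeps previously recorded strings readable across compatible changes to those classes.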

Member

ruippeixotog commented May 19, 2015

Roger! I just did the process and updated the pull request.

Contributor

Ichoran commented May 20, 2015

This looks quite promising, but I'm at a conference and recovering from food poisoning so I'm not going to get to it for at least a couple more days.

if (y.parent eq z) xParent = y
else {
  xParent = y.parent
  transplant(tree, y, y.right)

@axel22

axel22 May 28, 2015

Member

Do you require a fix here? Is it possible to violate the invariant that no two consecutive nodes are red?

@ruippeixotog

ruippeixotog May 28, 2015

Member

At this point in the method, it is indeed possible that a red node becomes the child of another red node - it is intended behavior. In this transplant y.right takes the place of y and that situation can happen if both y.parent and y.right are red (note that breaking that invariant can only happen if y is originally black, information that is stored in yIsRed).

At the end of delete, fixAfterDelete is called precisely to fix that. It receives the node that took the place of y (which is set as y.right in line 237) and its new parent, and does the necessary rotations in order to reestablish the tree invariants, going up the tree possibly until the root node to do that.

override protected[this] def newBuilder: Builder[(A, B), SortedMap[A, B]] = SortedMap.newBuilder[A, B]
override def empty: SortedMap[A, B] = SortedMap.empty

@axel22

axel22 May 28, 2015

Member

Add newline.

  override def ++[B1 >: B](xs: GenTraversableOnce[(A, B1)]): SortedMap[A, B1] =
    clone().asInstanceOf[SortedMap[A, B1]] ++= xs.seq
}

@axel22

axel22 May 28, 2015

Member

You might want to also instantiate an abstract class for the SortedMap trait - see example here:

https://github.com/scala/scala/blob/v2.11.5/src/library/scala/collection/mutable/Map.scala#L78

Member

axel22 commented May 28, 2015

.LGTM after comments are addressed.

Contributor

Ichoran commented May 30, 2015

Hmmm, got an NPE in collections tests in transformNodeNonNull. I'm trying to isolate it.

Contributor

Ichoran commented May 30, 2015

transform gives an NPE when transforming a single-element map because transformNode assumes that if node exists, node.left does also (when it passes it on to transformNodeNonNull).
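A sketch of the null-guarding pattern that avoids this kind of NPE, on a hypothetical simplified node type. It is not the fix that was committed, only an illustration of checking each child before handing it to a helper that assumes a non-null argument.

```scala
object TransformSketch {
  // Hypothetical node without colour or parent pointer.
  final class Node(val key: Int, var value: Int, var left: Node, var right: Node)

  /** Entry point: tolerates an empty tree. */
  def transform(root: Node, f: (Int, Int) => Int): Unit =
    if (root ne null) transformNonNull(root, f)

  /** Assumes `node` is non-null and checks each child before recursing. */
  private def transformNonNull(node: Node, f: (Int, Int) => Int): Unit = {
    if (node.left ne null) transformNonNull(node.left, f)
    node.value = f(node.key, node.value)
    if (node.right ne null) transformNonNull(node.right, f)
  }
}
```

A single-element tree has two null children, which is exactly the case that triggered the exception reported above.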

Contributor

Ichoran commented May 30, 2015

Once you fix the NPE (I commented on the line how to fix it), it passes all the collections-laws tests. But note that this isn't very good at exhaustively testing addition and subtraction from the library. That is done much better by test/junit/scala/collection/SetMapConsistencyTest.scala. If you could add TreeMap there also, that would increase our confidence. It does a fair bit of churning. I haven't tested the view yet because the coverage of views is not very good anywhere.

Contributor

Ichoran commented May 30, 2015

This should also close https://issues.scala-lang.org/browse/SI-6938 since that basically says "immutable RB trees use too much memory", and the most practical solution is to use a mutable variant. (If someone decides this uses too much memory also, that'd be a different bug anyway.)

Member

ruippeixotog commented May 30, 2015

I have just fixed the dumb error in transform and added the new map to SetMapConsistencyTest. The scalacheck specs also cover random insert/remove sequences, mostly on a more internal level and on mutating operations.

If you want, after this PR is accepted I can do a new one in which mutable.TreeSet starts using a RB.Tree[A, Null] as its underlying data structure. Shouldn't SI-6938 be closed only once that has happened?
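The idea of backing a set with a tree whose values are all null can be sketched with the public mutable.TreeMap API, assuming Scala 2.12 or later. The class name SortedSetSketch is hypothetical; the proposal above would use the internal RB.Tree[A, Null] directly rather than wrapping a TreeMap.

```scala
import scala.collection.mutable

object MapBackedSetSketch {
  /** A tiny sorted set that stores its elements as keys of a map with null values. */
  final class SortedSetSketch[A]()(implicit ord: Ordering[A]) {
    private val underlying = mutable.TreeMap.empty[A, Null]

    /** Returns true if the element was not present before. */
    def add(elem: A): Boolean = underlying.put(elem, null).isEmpty

    /** Returns true if the element was present and has been removed. */
    def remove(elem: A): Boolean = underlying.remove(elem).isDefined

    def contains(elem: A): Boolean = underlying.contains(elem)
    def iterator: Iterator[A] = underlying.keysIterator   // ascending key order
    def size: Int = underlying.size
  }
}
```

The value slot costs one reference per node but lets a set and a map share a single tree implementation.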

Contributor

Ichoran commented May 30, 2015

You're right regarding SI-6938; it's a little more than just having a RB tree available. One does have to use it! If you would like to look into whether it's easy to switch the wrapping in a source-compatible way, that would be great. If not, this is already a great contribution!

Member

ruippeixotog commented May 30, 2015

At first glance it should be easy to change the internals of mutable.TreeSet to use the mutable RB tree instead of the immutable one without breaking client code: the constructor that receives the tree, as well as all the internal methods and fields, are private. It most probably wouldn't be binary compatible though, and mutable.TreeSets serialized using old Scala versions wouldn't be deserializable after the change. If that's acceptable for 2.12.x, I can surely do that :)

I rebased this branch on origin/2.12.x as @retronym suggested, but it didn't seem to solve all build issues...

Member

ruippeixotog commented Jun 3, 2015

/rebuild

@adriaanm adriaanm referenced this pull request Jun 24, 2015

Merged

Merge 2.11.x to 2.12.x #4578

Member

adriaanm commented Jun 25, 2015

/rebuild

Member

adriaanm commented Jun 25, 2015

Ping @Ichoran. The build issues turned out to be a HotSpot bug (see #4578 (comment)). Thanks for your patience! We'll do our best to get this merged for M2.

Member

adriaanm commented Jun 25, 2015

Ping @axel22 -- were your comments addressed? (PS: please only start your comment with LGTM to signal the PR is ready for merge unconditionally)

@adriaanm adriaanm removed the reviewed label Jun 25, 2015

Member

axel22 commented Jun 25, 2015

Yes, they were addressed.

Member

axel22 commented Jun 25, 2015

LGTM

Member

adriaanm commented Jun 25, 2015

Thanks!

adriaanm added a commit that referenced this pull request Jun 25, 2015

Merge pull request #4504 from ruippeixotog/mutable-treemap
SI-4147 Add an implementation of `mutable.TreeMap`

@adriaanm adriaanm merged commit e45dfe1 into scala:2.12.x Jun 25, 2015

5 checks passed

cla @ruippeixotog signed the Scala CLA. Thanks!
integrate-ide [86] SUCCESS. Took 2 s.
validate-main [121] SUCCESS. Took 214 min.
validate-publish-core [122] SUCCESS. Took 46 s.
validate-test [89] SUCCESS. Took 72 min.

@ruippeixotog ruippeixotog deleted the ruippeixotog:mutable-treemap branch Nov 26, 2015

@lrytz lrytz added the release-notes label Jan 28, 2016

@adriaanm adriaanm added 2.12.0 and removed 2.12 labels Oct 29, 2016

@lrytz lrytz referenced this pull request Nov 1, 2016

Closed

notes on possible 2.12 release notes improvements #202

12 of 16 tasks complete