Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SI-4147 Add an implementation of mutable.TreeMap #4504

Merged
merged 1 commit into from
Jun 25, 2015

Conversation

ruippeixotog
Copy link
Contributor

This commit contains an implementation of a mutable red-black tree with focus on performance. It also contains a new mutable.TreeMap Scala collection that is backed by the aforementioned tree. The common generic factories and traits related to mutable sorted maps didn't exist yet, so this commit also adds them.

Regarding performance, TreeMap overrides (from MapLike and SortedMapLike) all of the most common methods for maps and also those whose default implementations are asymptotically worse than direct red-black tree algorithms (e.g. last, clear).

The rangeImpl method of TreeMap returns an instance of TreeMapView, an inner class of TreeMap. This view is backed by the same RedBlackTree.Tree instance, and therefore changes to the original map are reflected in the view and vice-versa. The semantics of mutating a view by adding and removing keys outside the view's range are the same of the current mutable.TreeSet. A bit less focus was given on the performance of views - in particular, getting the size of a TreeMapView is O(n) on the number of elements inside the view bounds. That can be improved in the future.

In a future commit, mutable.TreeSet can be changed to be backed by this red-black tree implementation.

I didn't add unit tests for this collection because I think writing extensive JUnit tests for all the methods would be way too ineffective. Instead, I created a separate project (available here) containing a suite of ScalaCheck properties for both TreeMap and TreeMapView and a (rather unscientific, but somewhat conclusive) performance comparison between immutable.TreeMap, mutable.TreeMap and java.util.TreeMap. A sample result from this last test:

Total unique elements to insert/get/delete: 1000000

######
Insertion
######
--- scala.collection.immutable.TreeMap ---
Took 1653 ms
--- scala.collection.mutable.TreeMap ---
Took 1264 ms
--- java.util.TreeMap ---
Took 965 ms

######
Search
######
--- scala.collection.immutable.TreeMap ---
Took 961 ms
--- scala.collection.mutable.TreeMap ---
Took 932 ms
--- java.util.TreeMap ---
Took 899 ms

######
Deletion
######
--- scala.collection.immutable.TreeMap ---
Took 2622 ms
--- scala.collection.mutable.TreeMap ---
Took 1072 ms
--- java.util.TreeMap ---
Took 1044 ms

I understand if you want tests for this in the Scala repository nonetheless. If that's the case, please tell me what would be a sufficient test suite for this collection and I'll add it. Please also tell me if I forgot to do something, as it's my first "big" contribution to Scala :)

This should close SI-4147.

@scala-jenkins scala-jenkins added this to the 2.12.0-M2 milestone May 17, 2015
* An object containing the red-black tree implementation used by mutable `TreeMaps`. For performance reasons, this
* implementation uses `null` references to represent leaves instead of a sentinel node. Currently, the internal nodes
* do not store their subtree size - only the tree object keeps track of their size. Therefore, while obtaining the size
* of the whole tree is O(1), knowing the number of entries inside a range is O(n) on the size of the range.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move the implementation comment to // comment block, the doc comment should be tailored for the user of the API, not the maintainer. It would be good to keep some notes about performance in the API docs, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is worth keeping notes in the API docs about performance? Saying that it is a red-black tree tells users the complexity of most operations, and the note about size performance seems really only internal, as I do not provide a public method for getting the size of a range (TreeMapView uses the size of its iterator for calculating the collection size).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's worthwhile to leave the complexity comments in the user API docs, and move everything else into a normal comment. Perhaps you could alternatively move the O(n) comment about the range size to the respective method.

@retronym
Copy link
Member

Could you please update the commit title to: "SI-4147 Add an implementation of mutable.TreeMap"

@retronym
Copy link
Member

We are in the process of setting up property-based testing for the collections to https://github.com/scala/scala-collections-laws. Perhaps @Ichoran will take your tests cases over to the repository.

However, Scalacheck tests are supported under test/scalacheck. I'd suggest to add your tests initially to there, and we can move later them if needed.

@retronym
Copy link
Member

Further review, please, by @Ichoran, and, if he has a few moments spare, @axel22.

@retronym
Copy link
Member

  • Serialization. Add a @serialversionuid's to the classes that will be serialized and add a test for stability of serialization in test/files/run/t8549.scala.

@retronym
Copy link
Member

@ruippeixotog I forgot to say: welcome and thank you! This contribution looks to be of a really high standard already, looking forwards to having this data structure in our toolbox.

@ruippeixotog
Copy link
Contributor Author

Hi @retronym, thanks for the comments! I'll do the changes you mentioned above (please read my comments about making TreeMap final, though).

There is no folder test/scalacheck in the project, did you mean test/files/scalacheck? Can you tell me if there is a way to run only those tests using Ant or SBT?

@retronym
Copy link
Member

./test/partest --scalacheck or ./test/partest test/files/scalacheck/test.scala will run them. ant test.suite includes them, too.

@ruippeixotog
Copy link
Contributor Author

@retronym, I have just updated the pull request with the changes you mentioned :)

However, I didn't know exactly what to do in test/files/run/t8549.scala. I suppose I should manually add entries there for TreeMap, but I don't know how to run those scalac-hash commands, as I don't have those binaries in my PATH. Can you guide me on what to do?

cmp = ord.compare(key, x.key)
x = if (cmp < 0) x.left else x.right
}
if (cmp <= 0) y else successor(y)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If node == Node(1, null, null) and key == 1, then we have:

x = Node(1, null, null)
y = x
cmp = ord.compare(1, 1) // == 0
x = null

And we return y == Node(1, null, null).
Is this correct? Shouldn't we return the first strictly greater node after key, i.e. null in this case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the after is inclusive, a comment for this method would help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, minAfter and maxBefore follow the semantics of from and until in ranges. I agree that it may be misleading, I'll add a comment explaining that.

@ruippeixotog ruippeixotog changed the title Add an implementation of mutable.TreeMap SI-4147 Add an implementation of mutable.TreeMap May 18, 2015
private[this] def fixAfterInsert[A, B](tree: Tree[A, B], node: Node[A, B]): Unit = {
var z = node
while (isRed(z.parent)) {
if (z.parent eq z.parent.parent.left) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility of a NullPointerException here?
Looks like, if you call this from insert after adding a right child to the root node of the tree, i.e.:

root
  |
Node(1) // this is red, from the first insert
  |    \
null   Node(3)  <-- call fixAfterInsert here (this is z)

then z.parent.parent == null, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the root node is always black in a red-black tree. Therefore, if isRed(z.parent), z.parent can't be the root node and therefore z.parent.parent can't be null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok - I now saw line 204, where you ensure this.

@retronym
Copy link
Member

scalac-hash is from https://github.com/retronym/libscala. It's just a shortcut to run old versions of Scala.

But in this case, just use ./build/quick/bin/scala[c]. The instructions in that test aren't really applicable to new datastructures.

The intention of that test is to serialize the datastructure, record the results (in the base64 encoded strings in the test case itself), and then check that we can continue to deserialize that in the future.

@ruippeixotog
Copy link
Contributor Author

Roger! I just did the process and updated the pull request.

@Ichoran
Copy link
Contributor

Ichoran commented May 20, 2015

This looks quite promising, but I'm at a conference and recovering from food poisoning so I'm not going to get to it for at least a couple more days.

/**
* Transplant the node `from` to the place of node `to`. This is done by making `from` a child of `to`'s parent
* instead of `from` and making `from`'s parent the new parent of `to`. The children of `from` are left unchanged.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be:

"This is done by setting `from` as a child of `to`'s previous parent, and setting `from`'s parent to the `to`'s previous parent. The children of `from` are left unchanged."

In the code, you never set the to.parent.

@Ichoran
Copy link
Contributor

Ichoran commented May 30, 2015

transform gives an NPE when transforming a single-element map because transformNode assumes that if node exists, node.left does also (when it passes it on to transformNodeNonNull).

def transform[A, B](tree: Tree[A, B], f: (A, B) => B): Unit = transformNode(tree.root, f)

private[this] def transformNode[A, B, U](node: Node[A, B], f: (A, B) => B): Unit =
if (node ne null) transformNodeNonNull(node.left, f)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be transformNodeNonNull(node, f)? The way it is gives a NPE and misses part of the tree.

@Ichoran
Copy link
Contributor

Ichoran commented May 30, 2015

Once you fix the NPE (I commented on the line how to fix it), it passes all the collections-laws tests. But note that this isn't very good at exhaustively testing addition and subtraction from the library. That is done much better by test/junit/scala/collection/SetMapConsistencyTest.scala. If you could add TreeMap there also, that would increase our confidence. It does a fair bit of churning. I haven't tested the view yet because the coverage of views is not very good anywhere.

@Ichoran
Copy link
Contributor

Ichoran commented May 30, 2015

This should also close https://issues.scala-lang.org/browse/SI-6938 since that basically says "immutable RB trees use too much memory", and the most practical solution is to use a mutable variant. (If someone decides this uses too much memory also, that'd be a different bug anyway.)

This commit contains an implementation of a mutable red-black tree with focus on performance. It also contains a new `mutable.TreeMap` Scala collection that is backed by the aforementioned tree. The common generic factories and traits related to mutable sorted maps didn't exist yet, so this commit also adds them.

Regarding performance, `TreeMap` overrides (from `MapLike` and `SortedMapLike`) all of the most common methods for maps and also those whose default implementations are asymptotically worse than direct red-black tree algorithms (e.g. `last`, `clear`).

The `rangeImpl` method of `TreeMap` returns an instance of `TreeMapView`, an inner class of `TreeMap`. This view is backed by the same `RedBlackTree.Tree` instance, and therefore changes to the original map are reflected in the view and vice-versa. The semantics of mutating a view by adding and removing keys outside the view's range are the same of the current `mutable.TreeSet`. A bit less focus was given on the performance of views - in particular, getting the `size` of a `TreeMapView` is O(n) on the number of elements inside the view bounds. That can be improved in the future.

In a future commit, `mutable.TreeSet` can be changed to be backed by this red-black tree implementation.
@ruippeixotog
Copy link
Contributor Author

I have just fixed the dumb error in transform and added the new map to SetMapConsistencyTest. The scalacheck specs also cover random insert/remove sequences, mostly on a more internal level and on mutating operations.

If you want, after this PR is accepted I can do a new one in which mutable.TreeSet starts using a RB.Tree[A, Null] as its underlying data structure - shouldn't SI-6938 be closed only once that happened?

@Ichoran
Copy link
Contributor

Ichoran commented May 30, 2015

You're right regarding SI-6938; it's a little more than just having a RB tree available. One does have to use it! If you would like to look into whether it's easy to switch the wrapping in a source-compatible way, that would be great. If not, this is already a great contribution!

@ruippeixotog
Copy link
Contributor Author

At a first glance it should be easy to change the internals of mutable.TreeSet to use the mutable RB tree instead of the immutable one without breaking client code - the constructor that receives the tree, as well as all the internal methods and fields, are private. It most probably wouldn't be binary compatible though, and mutable.TreeSets serialized using old Scala versions wouldn't be deserializable after the change. If that's acceptable for 2.12.x, I can surely do that :)

I rebased this branch on origin/2.12.x as @retronym suggested, but it didn't seem to solve all build issues...

@ruippeixotog
Copy link
Contributor Author

/rebuild

@adriaanm adriaanm mentioned this pull request Jun 24, 2015
@adriaanm
Copy link
Contributor

/rebuild

@adriaanm
Copy link
Contributor

Ping @Ichoran. The build issues turned out to be a hotspot bug (see #4578 (comment)). Thanks for your patience! We'll do our best to get this merged for M2.

@adriaanm
Copy link
Contributor

Ping @axel22 -- were your comments addressed? (PS: please only start your comment with LGTM to signal the PR is ready for merge unconditionally)

@adriaanm adriaanm removed the reviewed label Jun 25, 2015
@axel22
Copy link
Contributor

axel22 commented Jun 25, 2015

Yes, they were addressed.

@axel22
Copy link
Contributor

axel22 commented Jun 25, 2015

LGTM

@adriaanm
Copy link
Contributor

Thanks!

adriaanm added a commit that referenced this pull request Jun 25, 2015
SI-4147 Add an implementation of `mutable.TreeMap`
@adriaanm adriaanm merged commit e45dfe1 into scala:2.12.x Jun 25, 2015
@ruippeixotog ruippeixotog deleted the mutable-treemap branch November 26, 2015 22:32
@lrytz lrytz added the release-notes worth highlighting in next release notes label Jan 28, 2016
@adriaanm adriaanm added 2.12.0 and removed 2.12 labels Oct 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-notes worth highlighting in next release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants