-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SI-4147 Add an implementation of mutable.TreeMap
#4504
Conversation
* An object containing the red-black tree implementation used by mutable `TreeMaps`. For performance reasons, this | ||
* implementation uses `null` references to represent leaves instead of a sentinel node. Currently, the internal nodes | ||
* do not store their subtree size - only the tree object keeps track of their size. Therefore, while obtaining the size | ||
* of the whole tree is O(1), knowing the number of entries inside a range is O(n) on the size of the range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please move the implementation comment to //
comment block, the doc comment should be tailored for the user of the API, not the maintainer. It would be good to keep some notes about performance in the API docs, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth keeping notes in the API docs about performance? Saying that it is a red-black tree tells users the complexity of most operations, and the note about size performance seems really only internal, as I do not provide a public method for getting the size of a range (TreeMapView
uses the size of its iterator for calculating the collection size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's worthwhile to leave the complexity comments in the user API docs, and move everything else into a normal comment. Perhaps you could alternatively move the O(n) comment about the range size to the respective method.
Could you please update the commit title to: "SI-4147 Add an implementation of |
We are in the process of setting up property-based testing for the collections to https://github.com/scala/scala-collections-laws. Perhaps @Ichoran will take your tests cases over to the repository. However, Scalacheck tests are supported under |
|
@ruippeixotog I forgot to say: welcome and thank you! This contribution looks to be of a really high standard already, looking forwards to having this data structure in our toolbox. |
Hi @retronym, thanks for the comments! I'll do the changes you mentioned above (please read my comments about making There is no folder |
|
4c1ce60
to
8d2d0db
Compare
@retronym, I have just updated the pull request with the changes you mentioned :) However, I didn't know exactly what to do in |
cmp = ord.compare(key, x.key) | ||
x = if (cmp < 0) x.left else x.right | ||
} | ||
if (cmp <= 0) y else successor(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If node == Node(1, null, null)
and key == 1
, then we have:
x = Node(1, null, null)
y = x
cmp = ord.compare(1, 1) // == 0
x = null
And we return y == Node(1, null, null)
.
Is this correct? Shouldn't we return the first strictly greater node after key
, i.e. null
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the after
is inclusive, a comment for this method would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, minAfter
and maxBefore
follow the semantics of from
and until
in ranges. I agree that it may be misleading, I'll add a comment explaining that.
mutable.TreeMap
mutable.TreeMap
private[this] def fixAfterInsert[A, B](tree: Tree[A, B], node: Node[A, B]): Unit = { | ||
var z = node | ||
while (isRed(z.parent)) { | ||
if (z.parent eq z.parent.parent.left) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a possibility of a NullPointerException
here?
Looks like, if you call this from insert after adding a right child to the root node of the tree, i.e.:
root
|
Node(1) // this is red, from the first insert
| \
null Node(3) <-- call fixAfterInsert here (this is z)
then z.parent.parent == null
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the root node is always black in a red-black tree. Therefore, if isRed(z.parent)
, z.parent
can't be the root node and therefore z.parent.parent
can't be null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok - I now saw line 204, where you ensure this.
8d2d0db
to
97ead69
Compare
But in this case, just use The intention of that test is to serialize the datastructure, record the results (in the base64 encoded strings in the test case itself), and then check that we can continue to deserialize that in the future. |
97ead69
to
356ce1d
Compare
Roger! I just did the process and updated the pull request. |
This looks quite promising, but I'm at a conference and recovering from food poisoning so I'm not going to get to it for at least a couple more days. |
/** | ||
* Transplant the node `from` to the place of node `to`. This is done by making `from` a child of `to`'s parent | ||
* instead of `from` and making `from`'s parent the new parent of `to`. The children of `from` are left unchanged. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be:
"This is done by setting `from` as a child of `to`'s previous parent, and setting `from`'s parent to the `to`'s previous parent. The children of `from` are left unchanged."
In the code, you never set the to.parent
.
|
def transform[A, B](tree: Tree[A, B], f: (A, B) => B): Unit = transformNode(tree.root, f) | ||
|
||
private[this] def transformNode[A, B, U](node: Node[A, B], f: (A, B) => B): Unit = | ||
if (node ne null) transformNodeNonNull(node.left, f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be transformNodeNonNull(node, f)
? The way it is gives a NPE and misses part of the tree.
Once you fix the NPE (I commented on the line how to fix it), it passes all the collections-laws tests. But note that this isn't very good at exhaustively testing addition and subtraction from the library. That is done much better by test/junit/scala/collection/SetMapConsistencyTest.scala. If you could add TreeMap there also, that would increase our confidence. It does a fair bit of churning. I haven't tested the view yet because the coverage of views is not very good anywhere. |
This should also close https://issues.scala-lang.org/browse/SI-6938 since that basically says "immutable RB trees use too much memory", and the most practical solution is to use a mutable variant. (If someone decides this uses too much memory also, that'd be a different bug anyway.) |
This commit contains an implementation of a mutable red-black tree with focus on performance. It also contains a new `mutable.TreeMap` Scala collection that is backed by the aforementioned tree. The common generic factories and traits related to mutable sorted maps didn't exist yet, so this commit also adds them. Regarding performance, `TreeMap` overrides (from `MapLike` and `SortedMapLike`) all of the most common methods for maps and also those whose default implementations are asymptotically worse than direct red-black tree algorithms (e.g. `last`, `clear`). The `rangeImpl` method of `TreeMap` returns an instance of `TreeMapView`, an inner class of `TreeMap`. This view is backed by the same `RedBlackTree.Tree` instance, and therefore changes to the original map are reflected in the view and vice-versa. The semantics of mutating a view by adding and removing keys outside the view's range are the same of the current `mutable.TreeSet`. A bit less focus was given on the performance of views - in particular, getting the `size` of a `TreeMapView` is O(n) on the number of elements inside the view bounds. That can be improved in the future. In a future commit, `mutable.TreeSet` can be changed to be backed by this red-black tree implementation.
f1c2d2a
to
0e8b73b
Compare
I have just fixed the dumb error in If you want, after this PR is accepted I can do a new one in which |
You're right regarding SI-6938; it's a little more than just having a RB tree available. One does have to use it! If you would like to look into whether it's easy to switch the wrapping in a source-compatible way, that would be great. If not, this is already a great contribution! |
At a first glance it should be easy to change the internals of I rebased this branch on |
/rebuild |
/rebuild |
Ping @Ichoran. The build issues turned out to be a hotspot bug (see #4578 (comment)). Thanks for your patience! We'll do our best to get this merged for M2. |
Ping @axel22 -- were your comments addressed? (PS: please only start your comment with LGTM to signal the PR is ready for merge unconditionally) |
Yes, they were addressed. |
LGTM |
Thanks! |
SI-4147 Add an implementation of `mutable.TreeMap`
This commit contains an implementation of a mutable red-black tree with focus on performance. It also contains a new
mutable.TreeMap
Scala collection that is backed by the aforementioned tree. The common generic factories and traits related to mutable sorted maps didn't exist yet, so this commit also adds them.Regarding performance,
TreeMap
overrides (fromMapLike
andSortedMapLike
) all of the most common methods for maps and also those whose default implementations are asymptotically worse than direct red-black tree algorithms (e.g.last
,clear
).The
rangeImpl
method ofTreeMap
returns an instance ofTreeMapView
, an inner class ofTreeMap
. This view is backed by the sameRedBlackTree.Tree
instance, and therefore changes to the original map are reflected in the view and vice-versa. The semantics of mutating a view by adding and removing keys outside the view's range are the same of the currentmutable.TreeSet
. A bit less focus was given on the performance of views - in particular, getting thesize
of aTreeMapView
is O(n) on the number of elements inside the view bounds. That can be improved in the future.In a future commit,
mutable.TreeSet
can be changed to be backed by this red-black tree implementation.I didn't add unit tests for this collection because I think writing extensive JUnit tests for all the methods would be way too ineffective. Instead, I created a separate project (available here) containing a suite of ScalaCheck properties for both
TreeMap
andTreeMapView
and a (rather unscientific, but somewhat conclusive) performance comparison betweenimmutable.TreeMap
,mutable.TreeMap
andjava.util.TreeMap
. A sample result from this last test:I understand if you want tests for this in the Scala repository nonetheless. If that's the case, please tell me what would be a sufficient test suite for this collection and I'll add it. Please also tell me if I forgot to do something, as it's my first "big" contribution to Scala :)
This should close SI-4147.