New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance and behavior of ListMap and ListSet #5103

Merged
merged 3 commits into from May 17, 2016

Conversation

Projects
None yet
7 participants
@ruippeixotog
Member

ruippeixotog commented Apr 17, 2016

Makes the immutable ListMap and ListSet collections more alike one another, both in their semantics and in their performance.

In terms of semantics, makes the ListMap iterator return the elements in reverse order, as ListSet already does (improving its performance as a side-effect). While, as mentioned in SI-8985, ListMap and ListSet doesn't seem to make any guarantees in terms of iteration order, I believe users expect ListSet and ListMap to behave in the same way, particularly when they are implemented in the exact same way. The next method ListSet's iterator now throws an exception when called after the last element, a behavior consistent with all the other collections.

In terms of performance, ListMap is given a custom builder that avoids creation in O(N^2) time using a strategy similar to the one already applied in ListSet. ListSet's element removal method was not tail-recursive as the ListMap one, so a tail-recursive implementation was added.

@scala-jenkins scala-jenkins added this to the 2.12.0-M5 milestone Apr 17, 2016

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog Apr 17, 2016

Member

A Gist containing some naive benchmarks (together with the source code for them) is available here: https://gist.github.com/ruippeixotog/20f961055a94345ad24c2fd58079d26c

Member

ruippeixotog commented Apr 17, 2016

A Gist containing some naive benchmarks (together with the source code for them) is available here: https://gist.github.com/ruippeixotog/20f961055a94345ad24c2fd58079d26c

@@ -0,0 +1,16 @@
package scala.collection.immutable

This comment has been minimized.

@janekdb

janekdb Apr 19, 2016

Member

Add copyright header.

@janekdb

janekdb Apr 19, 2016

Member

Add copyright header.

This comment has been minimized.

@ruippeixotog

ruippeixotog Apr 19, 2016

Member

Most, if not all, JUnit tests lack a copyright header. I already wrote several tests for this repository before and the copyright never was a requirement... Are you sure it's mandatory? @Ichoran, can you tell what is the "official" policy in these cases?

@ruippeixotog

ruippeixotog Apr 19, 2016

Member

Most, if not all, JUnit tests lack a copyright header. I already wrote several tests for this repository before and the copyright never was a requirement... Are you sure it's mandatory? @Ichoran, can you tell what is the "official" policy in these cases?

This comment has been minimized.

@janekdb

janekdb Apr 20, 2016

Member

Confirming this was an assumption I had made. Please wait for the official policy.

@janekdb

janekdb Apr 20, 2016

Member

Confirming this was an assumption I had made. Please wait for the official policy.

@@ -0,0 +1,16 @@
package scala.collection.immutable

This comment has been minimized.

@janekdb

janekdb Apr 19, 2016

Member

Add copyright header.

@janekdb

janekdb Apr 19, 2016

Member

Add copyright header.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz Apr 29, 2016

Member

looks good otherwise. the build failure looks weird - let's just try again (amend your changes and git push -f).

Member

lrytz commented Apr 29, 2016

looks good otherwise. the build failure looks weird - let's just try again (amend your changes and git push -f).

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog Apr 29, 2016

Member

@lrytz thanks for the review, I just fixed both of the details you mentioned. The build failure didn't look like it was due to the modifications of this PR; I can try rebasing this into the current master if it doesn't work.

Member

ruippeixotog commented Apr 29, 2016

@lrytz thanks for the review, I just fixed both of the details you mentioned. The build failure didn't look like it was due to the modifications of this PR; I can try rebasing this into the current master if it doesn't work.

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog Apr 30, 2016

Member

I just rebased the branch on top of 2.12.x, but the error still occurs and Jenkins does not seem to be giving any helpful error messages. However, looking at the pull request list it seems that I'm the only one having this problem... Can anyone help me figuring out what's wrong with this PR?

Member

ruippeixotog commented Apr 30, 2016

I just rebased the branch on top of 2.12.x, but the error still occurs and Jenkins does not seem to be giving any helpful error messages. However, looking at the pull request list it seems that I'm the only one having this problem... Can anyone help me figuring out what's wrong with this PR?

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran Apr 30, 2016

Contributor

I have one major concern about performance. I wasn't able to spot why the test is failing, though.

Contributor

Ichoran commented Apr 30, 2016

I have one major concern about performance. I wasn't able to spot why the test is failing, though.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz May 2, 2016

Member

For the compilation failure, if I re-add .toList.reverseIterator to ListMap.iterator, compilation succeeds. This means that some part of the compiler (i guess the part checking forward references) depends on the current iteration order.

Honestly, when i started looking at this PR, I was very surprised to learn that "ListMap and ListSet doesn't seem to make any guarantees in terms of iteration (SI-8985)". I think this situation creates a lot of confusion, e.g., http://stackoverflow.com/questions/3481925/insertion-ordered-listset. Just like LinkedHashSet / LinkedHashMap for mutable collections, I think that ListSet / ListMap should guarantee that "all traversal methods of this class visit elements in the order they were inserted".

Member

lrytz commented May 2, 2016

For the compilation failure, if I re-add .toList.reverseIterator to ListMap.iterator, compilation succeeds. This means that some part of the compiler (i guess the part checking forward references) depends on the current iteration order.

Honestly, when i started looking at this PR, I was very surprised to learn that "ListMap and ListSet doesn't seem to make any guarantees in terms of iteration (SI-8985)". I think this situation creates a lot of confusion, e.g., http://stackoverflow.com/questions/3481925/insertion-ordered-listset. Just like LinkedHashSet / LinkedHashMap for mutable collections, I think that ListSet / ListMap should guarantee that "all traversal methods of this class visit elements in the order they were inserted".

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 7, 2016

Member

I was also surprised to see that ListMap and ListSet did not provide such guarantees - I always thought of them as an immutable alternative to LinkedHashMap and LinkedHashSet. If even the Scala compiler relies on their ordering, then I would rather make both ListMap and ListSet be insertion-ordered and document that properly in Scaladoc.

As you saw, iterating in the "correct" order requires the collection to be converted to a list and reversed, which has a performance hit. However, as @Ichoran mentioned, these collections are really only useful with a small number of elements, so I think it would be OK to do that.

Member

ruippeixotog commented May 7, 2016

I was also surprised to see that ListMap and ListSet did not provide such guarantees - I always thought of them as an immutable alternative to LinkedHashMap and LinkedHashSet. If even the Scala compiler relies on their ordering, then I would rather make both ListMap and ListSet be insertion-ordered and document that properly in Scaladoc.

As you saw, iterating in the "correct" order requires the collection to be converted to a list and reversed, which has a performance hit. However, as @Ichoran mentioned, these collections are really only useful with a small number of elements, so I think it would be OK to do that.

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran May 8, 2016

Contributor

The benchmarking you did isn't really reliable enough to do anything with. You should at least use Thyme; JMH would be better.

From this, it looks like you need to special-case 0-4. Having a 2x performance hit in the most commonly used cases doesn't justify the gain in the cases which the data structure isn't suited for anyway.

But I'd do some more robust benchmarking first.

Contributor

Ichoran commented May 8, 2016

The benchmarking you did isn't really reliable enough to do anything with. You should at least use Thyme; JMH would be better.

From this, it looks like you need to special-case 0-4. Having a 2x performance hit in the most commonly used cases doesn't justify the gain in the cases which the data structure isn't suited for anyway.

But I'd do some more robust benchmarking first.

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 8, 2016

Member

Thanks for the framework suggestions! I benchmarked again ListMap using JMH. I also benchmarked ListSet. The full results are available here: ListMap, ListSet.

If I did not anything wrong in the benchmarks, it seems the custom builder is significantly slower for every size equal to or below 6... Considering again that the collection should only be used for very small sizes due to its overall performance, I don't think it's worth it to keep the custom builder and write special cases for small sizes.

If you take a look the test results, it seems that the custom ListSet builder is also significantly slower than the default one (a mutable.SetBuilder). It is expected, as the two collections are pretty much equal. Should I remove ListSetBuilder then?

Member

ruippeixotog commented May 8, 2016

Thanks for the framework suggestions! I benchmarked again ListMap using JMH. I also benchmarked ListSet. The full results are available here: ListMap, ListSet.

If I did not anything wrong in the benchmarks, it seems the custom builder is significantly slower for every size equal to or below 6... Considering again that the collection should only be used for very small sizes due to its overall performance, I don't think it's worth it to keep the custom builder and write special cases for small sizes.

If you take a look the test results, it seems that the custom ListSet builder is also significantly slower than the default one (a mutable.SetBuilder). It is expected, as the two collections are pretty much equal. Should I remove ListSetBuilder then?

@Ichoran

This comment has been minimized.

Show comment
Hide comment
@Ichoran

Ichoran May 9, 2016

Contributor

My recommendation would be to remove ListSetBuilder if it's not an improvement.

Contributor

Ichoran commented May 9, 2016

My recommendation would be to remove ListSetBuilder if it's not an improvement.

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 9, 2016

Member

Yes, I agree it's better that way. I have just pushed the updated commit.

Member

ruippeixotog commented May 9, 2016

Yes, I agree it's better that way. I have just pushed the updated commit.

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz May 11, 2016

Member

OK, I agree the latest change looks good. A few small things to do:

  • can you specify the iteration order in the scaladoc, similar to LinkedHashMap/Set?
  • move the existing test case t3822.scala to the new junit test, so that they are all in the same place
  • also t7445 could be moved to junit. actually, ListSet probably has the same bug, it was only fixed for ListMap -- can you check? the fix for ListMap was in #3388.
  • method iterator first creates a List instance, then reverseIterator creates the reversed list instance. the following would only create one single list:
  def iterator: Iterator[(A,B)] = {
    def reverseList = {
      var self = this
      var res: List[(A, B)] = Nil
      while (!self.isEmpty) {
        res = (self.key, self.value) :: res
        self = self.next
      }
      res
    }
    reverseList.iterator
  }

in general, maybe do another pass over the two classes to bring the implementations closer together (rename ListMap.remove0 to removeInternal, replace ListSet.unchecked_outer by next, etc)

Member

lrytz commented May 11, 2016

OK, I agree the latest change looks good. A few small things to do:

  • can you specify the iteration order in the scaladoc, similar to LinkedHashMap/Set?
  • move the existing test case t3822.scala to the new junit test, so that they are all in the same place
  • also t7445 could be moved to junit. actually, ListSet probably has the same bug, it was only fixed for ListMap -- can you check? the fix for ListMap was in #3388.
  • method iterator first creates a List instance, then reverseIterator creates the reversed list instance. the following would only create one single list:
  def iterator: Iterator[(A,B)] = {
    def reverseList = {
      var self = this
      var res: List[(A, B)] = Nil
      while (!self.isEmpty) {
        res = (self.key, self.value) :: res
        self = self.next
      }
      res
    }
    reverseList.iterator
  }

in general, maybe do another pass over the two classes to bring the implementations closer together (rename ListMap.remove0 to removeInternal, replace ListSet.unchecked_outer by next, etc)

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 12, 2016

Member

Sure, I'll move t7445.scala code to a junit test and I'll improve the iterator as you suggested (adding two new tests for that). ListSet is indeed affected by SI-7445; I'll also fix that.

t3822 is a test that becomes ridiculously slow when the custom builder is taken off (though the Jenkins build seems to have made it past that test)... Do we still want to keep it? SI-3822 is a very old issue and it involves an use case that we concluded here that was inadequate for that collection. Regardless of that, all ListMap/ListSet methods are now tail recursive, so the bug is fixed anyway.

Regarding the Scaladoc and the general revamp I was also thinking of doing that, as it's way easier to fix bugs and add improvement to those collections if their code is as similar as it could be (after all, its internal organization and algorithms associated are exactly the same). However, I believe it would be better if I did that either in a separate pull request or in a separate commit; otherwise, the bugs and improvements present in this PR would become much more difficult to track in the commit history later, as they would be concealed by a lot of method renames and code moves. What do you think? Is there a problem if this PR has multiple commits?

Member

ruippeixotog commented May 12, 2016

Sure, I'll move t7445.scala code to a junit test and I'll improve the iterator as you suggested (adding two new tests for that). ListSet is indeed affected by SI-7445; I'll also fix that.

t3822 is a test that becomes ridiculously slow when the custom builder is taken off (though the Jenkins build seems to have made it past that test)... Do we still want to keep it? SI-3822 is a very old issue and it involves an use case that we concluded here that was inadequate for that collection. Regardless of that, all ListMap/ListSet methods are now tail recursive, so the bug is fixed anyway.

Regarding the Scaladoc and the general revamp I was also thinking of doing that, as it's way easier to fix bugs and add improvement to those collections if their code is as similar as it could be (after all, its internal organization and algorithms associated are exactly the same). However, I believe it would be better if I did that either in a separate pull request or in a separate commit; otherwise, the bugs and improvements present in this PR would become much more difficult to track in the commit history later, as they would be concealed by a lot of method renames and code moves. What do you think? Is there a problem if this PR has multiple commits?

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz May 13, 2016

Member

t3822 is a test that becomes ridiculously slow

Right, I agree we can get rid of it.

Is there a problem if this PR has multiple commits?

No, this is common practice in the scala repo. I agree that having separate commits for refactorings, scaladoc and bugfixes makes a lot of sense.

involves an use case that we concluded here that was inadequate for that collection

Can you make sure to include that in the Scaladoc cleanup? All operations are O(n) (right?), including the creation of a ListSet/ListMap using companion apply.

Member

lrytz commented May 13, 2016

t3822 is a test that becomes ridiculously slow

Right, I agree we can get rid of it.

Is there a problem if this PR has multiple commits?

No, this is common practice in the scala repo. I agree that having separate commits for refactorings, scaladoc and bugfixes makes a lot of sense.

involves an use case that we concluded here that was inadequate for that collection

Can you make sure to include that in the Scaladoc cleanup? All operations are O(n) (right?), including the creation of a ListSet/ListMap using companion apply.

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 14, 2016

Member

I have just added some new commits then. I organized the changes in:

  • The original commit, improving the performance of the collections and changing ListSet to provide insertion-order iteration;
  • Adding tests for both collections regarding some core operations. I found out that I had some bugs in the changes I did in the first commit and this makes sure no regressions happen now nor in the future;
  • Making the two collections similar by renaming and reordering methods, improving their Scaladoc and their code style. Regarding this I removed the Scaladoc of most methods, as the text inherited by the superclasses is in general well written. On the other hand, I improved the class and object-level Scaladoc by making explicit the traversal order guarantee and the time complexity of the operations;
  • Adding a SerialVersionUID annotation that was missing in ListSet and including ListSet in the serialization stability test in order to ensure that future changes to this collection do not break the serialization format (or at least alert that a break occurred).
Member

ruippeixotog commented May 14, 2016

I have just added some new commits then. I organized the changes in:

  • The original commit, improving the performance of the collections and changing ListSet to provide insertion-order iteration;
  • Adding tests for both collections regarding some core operations. I found out that I had some bugs in the changes I did in the first commit and this makes sure no regressions happen now nor in the future;
  • Making the two collections similar by renaming and reordering methods, improving their Scaladoc and their code style. Regarding this I removed the Scaladoc of most methods, as the text inherited by the superclasses is in general well written. On the other hand, I improved the class and object-level Scaladoc by making explicit the traversal order guarantee and the time complexity of the operations;
  • Adding a SerialVersionUID annotation that was missing in ListSet and including ListSet in the serialization stability test in order to ensure that future changes to this collection do not break the serialization format (or at least alert that a break occurred).
* $factoryInfo
*
* Note that each element insertion takes O(n) time, which means that creating a list set with
* n elements will take O(n^2^) time. This makes the builder suitable only for a small number of

This comment has been minimized.

@lrytz

lrytz May 17, 2016

Member

extra ^

@lrytz

lrytz May 17, 2016

Member

extra ^

This comment has been minimized.

@ruippeixotog

ruippeixotog May 17, 2016

Member

That is actually the superscript notation in Scaladoc: https://wiki.scala-lang.org/display/SW/Syntax#Syntax-Wiki-likeSyntaxinDocumentationSource. I checked the generated documentation and it appears correctly.

@ruippeixotog

ruippeixotog May 17, 2016

Member

That is actually the superscript notation in Scaladoc: https://wiki.scala-lang.org/display/SW/Syntax#Syntax-Wiki-likeSyntaxinDocumentationSource. I checked the generated documentation and it appears correctly.

This comment has been minimized.

@lrytz

lrytz May 17, 2016

Member
implicit def canBuildFrom[A, B]: CanBuildFrom[Coll, (A, B), ListMap[A, B]] =
new MapCanBuildFrom[A, B]
def empty[A, B]: ListMap[A, B] = EmptyListMap.asInstanceOf[ListMap[A, B]]

This comment has been minimized.

@lrytz

lrytz May 17, 2016

Member

could we remove the all the method overrides in the object EmptyListMap?

@lrytz

lrytz May 17, 2016

Member

could we remove the all the method overrides in the object EmptyListMap?

assertEquals(ListMap(1 -> 1, 2 -> 2, 3 -> 3), m - 4)
}
@Test

This comment has been minimized.

@lrytz

lrytz May 17, 2016

Member

you could squash this commit into the previous one

@lrytz

lrytz May 17, 2016

Member

you could squash this commit into the previous one

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz May 17, 2016

Member

other than these minor comments, looks good!

Member

lrytz commented May 17, 2016

other than these minor comments, looks good!

ruippeixotog added some commits Apr 17, 2016

Improve performance and behavior of ListMap and ListSet
Makes the immutable `ListMap` and `ListSet` collections more alike one another, both in their semantics and in their performance.

In terms of semantics, makes the `ListSet` iterator return the elements in their insertion order, as `ListMap` already does. While, as mentioned in SI-8985, `ListMap` and `ListSet` doesn't seem to make any guarantees in terms of iteration order, I believe users expect `ListSet` and `ListMap` to behave in the same way, particularly when they are implemented in the exact same way.

In terms of performance, `ListSet` has a custom builder that avoids creation in O(N^2) time. However, this significantly reduces its performance in the creation of small sets, as its requires the instantiation and usage of an auxilliary HashSet. As `ListMap` and `ListSet` are only suitable for small sizes do to their performance characteristics, the builder is removed, the default `SetBuilder` being used instead.
Make ListMap and ListSet implementations similar
ListSet and ListMap are two collections which share the exact same internal structure. This commit makes the two approaches as similar as possible by renaming and reordering internal methods, improving their Scaladoc and their code style. The Scaladoc of the classes and companion objects is also improved in order to alert users of the time complexity of the collections' operations.
@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 17, 2016

Member

I have just pushed the modifications you suggested :)

Member

ruippeixotog commented May 17, 2016

I have just pushed the modifications you suggested :)

@lrytz

This comment has been minimized.

Show comment
Hide comment
@lrytz

lrytz May 17, 2016

Member

LGTM! thanks for keeping up with the pedantic reviewer :)

Member

lrytz commented May 17, 2016

LGTM! thanks for keeping up with the pedantic reviewer :)

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 17, 2016

Member

/rebuild

Member

ruippeixotog commented May 17, 2016

/rebuild

@ruippeixotog

This comment has been minimized.

Show comment
Hide comment
@ruippeixotog

ruippeixotog May 17, 2016

Member

No problem at all, they were good suggestions :)

Member

ruippeixotog commented May 17, 2016

No problem at all, they were good suggestions :)

@lrytz lrytz merged commit bf35467 into scala:2.12.x May 17, 2016

6 checks passed

cla @ruippeixotog signed the Scala CLA. Thanks!
Details
combined All previous commits successful.
integrate-ide [1934] SUCCESS. Took 2 s.
Details
validate-main [2194] SUCCESS. Took 124 min.
Details
validate-publish-core [2168] SUCCESS. Took 1 min.
Details
validate-test [1905] SUCCESS. Took 122 min.
Details

@lrytz lrytz added the release-notes label May 17, 2016

@adriaanm adriaanm added 2.12 and removed 2.12 labels Oct 29, 2016

@lrytz lrytz referenced this pull request Nov 1, 2016

Closed

notes on possible 2.12 release notes improvements #202

12 of 16 tasks complete
protected def value: B = throw new NoSuchElementException("value of empty map")
protected def next: ListMap[A, B] = throw new NoSuchElementException("next of empty map")
override def stringPrefix = "ListMap"

This comment has been minimized.

@dwijnand

dwijnand Mar 8, 2017

Member

Btw this changed ListMap's toString between 2.11 and 2.12

@dwijnand

dwijnand Mar 8, 2017

Member

Btw this changed ListMap's toString between 2.11 and 2.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment