
Performance tweaks for Seq#diff and Seq#intersect #9365

Merged
merged 1 commit into scala:2.13.x on May 10, 2021

Conversation

martijnhoekstra
Contributor

@martijnhoekstra martijnhoekstra commented Dec 5, 2020

Special case diff on List by structurally sharing when possible (later reverted)

Benchmark results

Baseline

[info] # Run complete. Total time: 00:20:29
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.diff_first_half 0 avgt 20 40.875 ± 0.393 ns/op
[info] ListBenchmark.diff_first_half 1 avgt 20 65.350 ± 1.487 ns/op
[info] ListBenchmark.diff_first_half 10 avgt 20 592.473 ± 59.870 ns/op
[info] ListBenchmark.diff_first_half 100 avgt 20 6392.043 ± 136.179 ns/op
[info] ListBenchmark.diff_first_half 1000 avgt 20 69990.051 ± 1484.883 ns/op
[info] ListBenchmark.diff_identical 0 avgt 20 42.340 ± 2.373 ns/op
[info] ListBenchmark.diff_identical 1 avgt 20 89.342 ± 1.600 ns/op
[info] ListBenchmark.diff_identical 10 avgt 20 745.050 ± 28.070 ns/op
[info] ListBenchmark.diff_identical 100 avgt 20 8239.964 ± 326.827 ns/op
[info] ListBenchmark.diff_identical 1000 avgt 20 96104.255 ± 4346.676 ns/op
[info] ListBenchmark.diff_last_half 0 avgt 20 44.855 ± 5.776 ns/op
[info] ListBenchmark.diff_last_half 1 avgt 20 87.086 ± 2.926 ns/op
[info] ListBenchmark.diff_last_half 10 avgt 20 551.048 ± 9.701 ns/op
[info] ListBenchmark.diff_last_half 100 avgt 20 6052.273 ± 216.878 ns/op
[info] ListBenchmark.diff_last_half 1000 avgt 20 67614.880 ± 710.706 ns/op
[info] ListBenchmark.diff_notIncluded 0 avgt 20 71.985 ± 0.874 ns/op
[info] ListBenchmark.diff_notIncluded 1 avgt 20 85.513 ± 3.370 ns/op
[info] ListBenchmark.diff_notIncluded 10 avgt 20 429.533 ± 20.771 ns/op
[info] ListBenchmark.diff_notIncluded 100 avgt 20 1790.248 ± 40.322 ns/op
[info] ListBenchmark.diff_notIncluded 1000 avgt 20 18322.706 ± 555.524 ns/op
[info] ListBenchmark.diff_single_last 0 avgt 20 74.841 ± 4.999 ns/op
[info] ListBenchmark.diff_single_last 1 avgt 20 97.721 ± 5.290 ns/op
[info] ListBenchmark.diff_single_last 10 avgt 20 274.728 ± 13.519 ns/op
[info] ListBenchmark.diff_single_last 100 avgt 20 2324.095 ± 362.581 ns/op
[info] ListBenchmark.diff_single_last 1000 avgt 20 27409.479 ± 1529.480 ns/op
[info] ListBenchmark.diff_single_mid 0 avgt 20 69.905 ± 3.296 ns/op
[info] ListBenchmark.diff_single_mid 1 avgt 20 96.262 ± 1.628 ns/op
[info] ListBenchmark.diff_single_mid 10 avgt 20 288.531 ± 23.111 ns/op
[info] ListBenchmark.diff_single_mid 100 avgt 20 2725.829 ± 108.093 ns/op
[info] ListBenchmark.diff_single_mid 1000 avgt 20 26140.306 ± 312.435 ns/op

List through the new Seq implementation

[info] # Run complete. Total time: 00:20:27
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.diff_first_half 0 avgt 20 4.105 ± 0.028 ns/op
[info] ListBenchmark.diff_first_half 1 avgt 20 4.093 ± 0.029 ns/op
[info] ListBenchmark.diff_first_half 10 avgt 20 264.260 ± 10.194 ns/op
[info] ListBenchmark.diff_first_half 100 avgt 20 3364.628 ± 70.029 ns/op
[info] ListBenchmark.diff_first_half 1000 avgt 20 39072.612 ± 2671.788 ns/op
[info] ListBenchmark.diff_identical 0 avgt 20 3.887 ± 0.042 ns/op
[info] ListBenchmark.diff_identical 1 avgt 20 63.033 ± 1.916 ns/op
[info] ListBenchmark.diff_identical 10 avgt 20 368.556 ± 10.840 ns/op
[info] ListBenchmark.diff_identical 100 avgt 20 5861.276 ± 146.805 ns/op
[info] ListBenchmark.diff_identical 1000 avgt 20 69268.715 ± 1607.349 ns/op
[info] ListBenchmark.diff_last_half 0 avgt 20 4.266 ± 0.072 ns/op
[info] ListBenchmark.diff_last_half 1 avgt 20 65.870 ± 3.425 ns/op
[info] ListBenchmark.diff_last_half 10 avgt 20 272.576 ± 8.440 ns/op
[info] ListBenchmark.diff_last_half 100 avgt 20 3802.349 ± 81.459 ns/op
[info] ListBenchmark.diff_last_half 1000 avgt 20 52942.003 ± 9855.334 ns/op
[info] ListBenchmark.diff_notIncluded 0 avgt 20 5.940 ± 0.207 ns/op
[info] ListBenchmark.diff_notIncluded 1 avgt 20 65.802 ± 2.616 ns/op
[info] ListBenchmark.diff_notIncluded 10 avgt 20 194.225 ± 2.471 ns/op
[info] ListBenchmark.diff_notIncluded 100 avgt 20 1635.484 ± 43.555 ns/op
[info] ListBenchmark.diff_notIncluded 1000 avgt 20 15961.686 ± 342.366 ns/op
[info] ListBenchmark.diff_single_last 0 avgt 20 4.305 ± 0.036 ns/op
[info] ListBenchmark.diff_single_last 1 avgt 20 66.875 ± 0.930 ns/op
[info] ListBenchmark.diff_single_last 10 avgt 20 224.710 ± 10.345 ns/op
[info] ListBenchmark.diff_single_last 100 avgt 20 1723.869 ± 25.487 ns/op
[info] ListBenchmark.diff_single_last 1000 avgt 20 17669.288 ± 820.160 ns/op
[info] ListBenchmark.diff_single_mid 0 avgt 20 4.184 ± 0.036 ns/op
[info] ListBenchmark.diff_single_mid 1 avgt 20 68.890 ± 2.777 ns/op
[info] ListBenchmark.diff_single_mid 10 avgt 20 228.758 ± 5.395 ns/op
[info] ListBenchmark.diff_single_mid 100 avgt 20 1731.472 ± 22.302 ns/op
[info] ListBenchmark.diff_single_mid 1000 avgt 20 16487.683 ± 389.199 ns/op

New List:

[info] # Run complete. Total time: 00:20:26
[info] Benchmark (size) Mode Cnt Score Error Units
[info] ListBenchmark.diff_first_half 0 avgt 20 2.706 ± 0.107 ns/op
[info] ListBenchmark.diff_first_half 1 avgt 20 2.671 ± 0.027 ns/op
[info] ListBenchmark.diff_first_half 10 avgt 20 176.566 ± 12.005 ns/op
[info] ListBenchmark.diff_first_half 100 avgt 20 2724.257 ± 69.220 ns/op
[info] ListBenchmark.diff_first_half 1000 avgt 20 30617.560 ± 2649.887 ns/op
[info] ListBenchmark.diff_identical 0 avgt 20 2.489 ± 0.060 ns/op
[info] ListBenchmark.diff_identical 1 avgt 20 2.989 ± 0.084 ns/op
[info] ListBenchmark.diff_identical 10 avgt 20 359.982 ± 17.850 ns/op
[info] ListBenchmark.diff_identical 100 avgt 20 5488.378 ± 121.688 ns/op
[info] ListBenchmark.diff_identical 1000 avgt 20 68650.204 ± 3627.994 ns/op
[info] ListBenchmark.diff_last_half 0 avgt 20 2.762 ± 0.027 ns/op
[info] ListBenchmark.diff_last_half 1 avgt 20 3.540 ± 0.078 ns/op
[info] ListBenchmark.diff_last_half 10 avgt 20 290.114 ± 4.281 ns/op
[info] ListBenchmark.diff_last_half 100 avgt 20 3755.760 ± 108.688 ns/op
[info] ListBenchmark.diff_last_half 1000 avgt 20 45800.311 ± 878.631 ns/op
[info] ListBenchmark.diff_notIncluded 0 avgt 20 4.710 ± 0.171 ns/op
[info] ListBenchmark.diff_notIncluded 1 avgt 20 9.982 ± 0.182 ns/op
[info] ListBenchmark.diff_notIncluded 10 avgt 20 241.703 ± 6.063 ns/op
[info] ListBenchmark.diff_notIncluded 100 avgt 20 1977.344 ± 23.295 ns/op
[info] ListBenchmark.diff_notIncluded 1000 avgt 20 20022.380 ± 375.221 ns/op
[info] ListBenchmark.diff_single_last 0 avgt 20 2.944 ± 0.078 ns/op
[info] ListBenchmark.diff_single_last 1 avgt 20 4.286 ± 0.060 ns/op
[info] ListBenchmark.diff_single_last 10 avgt 20 268.102 ± 4.289 ns/op
[info] ListBenchmark.diff_single_last 100 avgt 20 2273.511 ± 70.090 ns/op
[info] ListBenchmark.diff_single_last 1000 avgt 20 21951.718 ± 226.428 ns/op
[info] ListBenchmark.diff_single_mid 0 avgt 20 2.842 ± 0.046 ns/op
[info] ListBenchmark.diff_single_mid 1 avgt 20 4.417 ± 0.063 ns/op
[info] ListBenchmark.diff_single_mid 10 avgt 20 162.779 ± 1.520 ns/op
[info] ListBenchmark.diff_single_mid 100 avgt 20 1207.562 ± 25.183 ns/op
[info] ListBenchmark.diff_single_mid 1000 avgt 20 11083.343 ± 129.916 ns/op

Diff becomes much faster (~20 times) for the special cases of length 0 and 1. For cases where the argument "runs out", structural sharing from that point onwards not only means using less memory, it also means a speedup proportional to the part not traversed -- traversing half the list before the other runs out means a speedup of about 50%.
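Roughly, the structural-sharing idea looks like the sketch below (illustrative names only, not the PR's exact code): build an occurrence count of the argument, walk the list, and once the counts are exhausted return the remaining tail as-is instead of rebuilding it.

```scala
import scala.collection.mutable

// Sketch of structural sharing in a List diff (not the actual stdlib source).
def diffSketch[A, B >: A](xs: List[A], that: collection.Seq[B]): List[A] =
  if (that.isEmpty) xs
  else {
    val occ = mutable.HashMap.empty[B, Int]
    for (y <- that) occ.updateWith(y) {
      case None    => Some(1)
      case Some(n) => Some(n + 1)
    }
    var kept = List.empty[A]   // kept elements, in reverse order
    var rest = xs
    while (rest.nonEmpty && occ.nonEmpty) {
      val h = rest.head
      var keep = false
      occ.updateWith(h) {
        case None    => keep = true; None  // not in `that`: keep the element
        case Some(1) => None               // last occurrence consumed: drop it
        case Some(n) => Some(n - 1)        // consume one occurrence: drop it
      }
      if (keep) kept = h :: kept
      rest = rest.tail
    }
    // Once `occ` is empty nothing more can be filtered out,
    // so the remaining tail is shared rather than copied.
    kept reverse_::: rest
  }
```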

Two benchmarks show an unexpected slowdown of about 10% for lists of 100 or 1000 items where the argument contains a single element not in the list (ListBenchmark.diff_notIncluded). I don't know what's up with that. I expected those to roughly converge with the case where the argument is a list consisting of only the last value, but those are 15%-20% faster in this PR. I ran the benchmarks on my laptop, so maybe it's some weird power-management artifact?

Also special case diff and intersect on Seq for the empty case on either side.
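For the Seq part, the empty-side special cases amount to something like the following sketch (hypothetical helper names; the real change lives in the standard library's SeqOps):

```scala
// Sketch only: skip building the occurrence map when either side is empty.
def diffSpecialCased[A](xs: Seq[A], that: Seq[A]): Seq[A] =
  if (xs.isEmpty || that.isEmpty) xs      // nothing to remove, or nothing left
  else xs.diff(that)                      // fall back to the general algorithm

def intersectSpecialCased[A](xs: Seq[A], that: Seq[A]): Seq[A] =
  if (xs.isEmpty || that.isEmpty) xs.take(0)  // intersection is necessarily empty
  else xs.intersect(that)
```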

@scala-jenkins scala-jenkins added this to the 2.13.5 milestone Dec 5, 2020
@martijnhoekstra martijnhoekstra added the library:collections PRs involving changes to the standard collection library label Dec 5, 2020
@martijnhoekstra martijnhoekstra marked this pull request as draft December 5, 2020 21:23

@martijnhoekstra martijnhoekstra marked this pull request as ready for review December 6, 2020 09:44
@lrytz lrytz requested a review from retronym December 7, 2020 10:27
@SethTisue
Member

@Ichoran interested in looking this one over?

Contributor

@NthPortal NthPortal left a comment


it looks correct to me, but I would like another review as well because I don't fully trust myself

@SethTisue SethTisue added the performance the need for speed. usually compiler performance, sometimes runtime performance. label Jan 15, 2021
None
}
case Some(1) => None
case Some(n) => Some(n - 1)
Member


So are these implementations trading off faster size=0/size=1 operations at the cost of these Some allocations? Because that doesn't sound performant. Perhaps I'm misunderstanding, but I thought I would ask.

Contributor Author

@martijnhoekstra martijnhoekstra Jan 15, 2021


This should at worst have the same number of allocations (when there are no shared elements), and in most cases fewer (potentially far fewer) allocations and fewer lookups.

In this PR, updateWith to decrement (n > 1) costs a hash lookup (to find the bucket), a Some allocation (to return the result), and another Some allocation (to hold the decremented result).

In the situation before, there was a hash lookup (to find the bucket), a Some allocation (to get the value to apply), then another hash lookup to find the bucket again, and then another default map allocation.

So the decrement costs an extra Some allocation, but saves a WithDefaultValue allocation and a hash lookup.

The other cases only look better: when the stored value is 1, we now skip the extra Some allocation and clear the bucket. For List, if that empties the map, we short-circuit and structurally share the remaining tail. That last part is unavailable on Seq, but there the remaining elements just do lookups on an empty HashMap, which, I just looked up, is actually no cheaper than a lookup in a non-empty HashMap, but it totally could be.

The last case is when the element is not in the underlying map. In the old code, that cost a lookup plus a Some allocation iff there was already something in the map; in the new code it also costs just the lookup.

The 0 and 1 special casing is only at the start, where this avoids building the map entirely and could be done independently of the code in this part.
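To make the trade-off concrete, here is a rough side-by-side of the two decrement strategies described above (illustrative only, not the exact before/after stdlib sources; the old variant is approximated with getOrElse rather than a withDefaultValue map):

```scala
import scala.collection.mutable

// New style: a single updateWith call per element, i.e. one hash lookup.
def decrementNew(occ: mutable.HashMap[Any, Int], y: Any): Option[Int] =
  occ.updateWith(y) {
    case None    => None          // element was never counted: map untouched
    case Some(1) => None          // last occurrence consumed: entry removed
    case Some(n) => Some(n - 1)   // one Some allocated for the new count
  }

// Old style: a read (with a default) followed by a separate write,
// i.e. two hash lookups per decremented element.
def decrementOld(occ: mutable.Map[Any, Int], y: Any): Unit = {
  val n = occ.getOrElse(y, 0)     // first lookup
  if (n > 0) occ(y) = n - 1       // second lookup plus the store
}
```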

@dwijnand
Member

@mkeskells you might find this interesting and perhaps even fancy reviewing it?

for (y <- sq) occ.updateWith(y) {
  case None => Some(1)
  case Some(n) => Some(n + 1)
}
Member

@joshlemer joshlemer Jan 15, 2021


Maybe we could avoid some allocations if we cache the Somes up to (for example) 10?

private object CachedSome {
  private[this] val cached: Array[Some[Int]] = Array.tabulate(10)(Some(_))
  def apply(i: Int): Some[Int] = if (0 <= i && i < 10) cached(i) else Some(i)
}

(Would have to measure it though)
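For instance, the counting loop from the excerpt above could then become something like this (hypothetical; CachedSome is the sketch from this comment, not stdlib code):

```scala
import scala.collection.mutable

def countOccurrences[B](sq: collection.Seq[B]): mutable.HashMap[B, Int] = {
  val occ = mutable.HashMap.empty[B, Int]
  for (y <- sq) occ.updateWith(y) {
    case None    => CachedSome(1)      // reuses a cached Some for small counts
    case Some(n) => CachedSome(n + 1)
  }
  occ
}
```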

Contributor Author


I don't trust my laptop with the benchmarks, or, honestly, want to dedicate it to running them (I don't think it's a coincidence that scala/scala-dev#753 was opened shortly after @NthPortal looked at this PR).

If this is worth it, it's plausible it's worth doing globally, on Some.apply, so that any map that keeps a count of things takes advantage of it (though at the cost of a probably correctly branch-predicted instanceOf check), which would be orthogonal to this PR.

On the gripping hand, maybe that's me internally rationalizing my way out of just writing and running the damn benchmarks.

@SethTisue
Member

@scala/collections opinions here? (and/or volunteers to do some benchmarking?)

@SethTisue SethTisue modified the milestones: 2.13.5, 2.13.6 Feb 9, 2021
@SethTisue SethTisue merged commit 76d189d into scala:2.13.x May 10, 2021
@SethTisue
Member

SethTisue commented May 10, 2021

thanks Martijn (and reviewers)!

@@ -614,6 +614,36 @@ sealed abstract class List[+A]
}
}

// Override for performance: traverse only as much as needed
// and share tail when nothing needs to be filtered out anymore
override def diff[B >: A](that: collection.Seq[B]): List[A] = {
Member


This override (I assume) causes a MiMa failure, see recent commit status in https://github.com/scala/scala/commits/2.13.x

Not sure why PR validation was green, maybe we're using a newer MiMa now. Also not sure who's right, the error could be a false positive.

Member


MiMa 0.8.0, the version this PR had, gives the same error.

On Martijn's branch, STARR was still 2.13.4, so I guess we should look for some 2.13.4 vs 2.13.5 difference?

Member


Ah, I bet the problem was the lack of #9459! cc @dwijnand

@SethTisue
Member

#9624 reverts the List#diff part, because bincompat; MiMa missed the problem because our build was misconfigured (which was fixed by #9459)

@SethTisue SethTisue changed the title Performance tweaks for List#diff, Seq#diff and Seq#intersect Performance tweaks for Seq#diff and Seq#intersect May 11, 2021
@som-snytt
Contributor

It seems there was a minor regression. It wasn't caught because Ichoran did not care to look this one over. ^

I see there was a bit of smoke around benchmarking and MiMa. I don't know if the collections correctness tests would have seen anything. This was one of those deeply buried lies, i.e., an asInstanceOf. paulp mentions in the commit how many array allocations it saves. Maybe the takeaway is that anything done in the name of performance should be kept in a special place, where the accumulated wisdom (or folly) can be consulted by future generations.

@martijnhoekstra
Contributor Author

@som-snytt I suspect the performance wins disappeared in #9624, where much of this PR was lost to MiMa's relentlessness.

@martijnhoekstra
Contributor Author

martijnhoekstra commented May 25, 2021

Is your regression in terms of performance or correctness, @som-snytt? Do you have some more details? Edit: I now see the other ticket.

@som-snytt
Contributor

som-snytt commented May 25, 2021

Yes, in the linked ticket: Array.empty[Double] intersect xs throws, because even an empty array knows its component type. The optimization on isEmpty uses ArraySeq.empty. Edit: I don't know if that partial partial reversion suffices for now. We need a surgical term for that. But not to worry, there were actually a few breaking changes that made Spark skip 2.13.6.
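For reference, the failing call being described (taking the report above at face value; not re-verified here):

```scala
val xs: Seq[Double] = Seq(1.0, 2.0)
// Reported to throw at runtime in the affected versions: per the comment, the
// isEmpty shortcut went through ArraySeq.empty, so the backing array no longer
// had Double as its component type when cast back to Array[Double].
Array.empty[Double] intersect xs
```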
