Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set should implement filter #6196

Closed
scabug opened this issue Aug 6, 2012 · 6 comments
Closed

Set should implement filter #6196

scabug opened this issue Aug 6, 2012 · 6 comments

Comments

@scabug
Copy link

@scabug scabug commented Aug 6, 2012

HashSet does not implement filter directly but defers to the generic builder-based implementation of TraversableLike. That leads to filter always creating a completely new set instead of reusing the old set where possible.

This does not only affect filter, but also methods like intersection (&) that use filter. People have a reasonable expectation that set-specific methods like intersection are as efficient as possible, but for the current implementation this is not the case.

A few examples where this matters:

val x=Set(1,2,3,4)
val y = x.filter(_ => true) 
/* this builds a completely new set, allocating a zillion temporary objects in the process. And the result does not share anything with the input even though it could share everything */
val z = x.filterNot(_ => false)
val w = x.intersect(x)

println(x eq y) // false, should be true
println(x eq z) // false, should be true
println(x eq w) // false, should be true

A solution for this problem would be to implement filter for HashSet and Set. I attached a patch that does that.

(Sorry for setting this as blocking, but the JIRA user interface on my browser (chrome on windows) did not allow setting it to anything else. Feel free to change this to something more appropriate.)

@scabug
Copy link
Author

@scabug scabug commented Aug 6, 2012

Imported From: https://issues.scala-lang.org/browse/SI-6196?orig=1
Reporter: @rklaehn
Affected Versions: 2.10.0-M6
Attachments:

@scabug
Copy link
Author

@scabug scabug commented Aug 9, 2012

@rklaehn said:
Pull request against 2.10.x:

scala/scala#1096

@scabug
Copy link
Author

@scabug scabug commented Aug 18, 2012

@rklaehn said:
I implemented a benchmark using google caliper for the filter method.

Run with

java -cp scala-library.jar:caliper.jar:. FilterBenchmark

The results on my development machine are as follows:

Builder-based implementation:

   size          benchmark           ns linear runtime
      1         FilterTrue        19.68 =
      1 FilterThreeQuarter         7.17 =
      1         FilterHalf         6.99 =
      1      FilterQuarter         7.03 =
      1        FilterFalse         6.99 =
    100         FilterTrue      9405.75 =
    100 FilterThreeQuarter      7145.98 =
    100         FilterHalf      4406.39 =
    100      FilterQuarter      1721.97 =
    100        FilterFalse       407.88 =
  10000         FilterTrue   1341440.25 =
  10000 FilterThreeQuarter   1025837.20 =
  10000         FilterHalf    725372.57 =
  10000      FilterQuarter    407338.86 =
  10000        FilterFalse     64437.12 =
1000000         FilterTrue 205794487.50 ==============================
1000000 FilterThreeQuarter 166166961.67 ========================
1000000         FilterHalf 108899815.71 ===============
1000000      FilterQuarter  61353241.38 ========
1000000        FilterFalse  15906104.10 ==

Specialized implementation of filter:

   size          benchmark          ns linear runtime
      1         FilterTrue       19.13 =
      1 FilterThreeQuarter        7.04 =
      1         FilterHalf        7.01 =
      1      FilterQuarter        6.87 =
      1        FilterFalse        6.95 =
    100         FilterTrue     1800.93 =
    100 FilterThreeQuarter     2929.80 =
    100         FilterHalf     3228.58 =
    100      FilterQuarter     1982.25 =
    100        FilterFalse     2179.38 =
  10000         FilterTrue   238035.87 =
  10000 FilterThreeQuarter   375221.19 =
  10000         FilterHalf   396188.11 =
  10000      FilterQuarter   367140.59 =
  10000        FilterFalse   196032.70 =
1000000         FilterTrue 37699210.38 =====================
1000000 FilterThreeQuarter 49403056.20 ============================
1000000         FilterHalf 51554796.54 ==============================
1000000      FilterQuarter 44450213.28 =========================
1000000        FilterFalse 34322142.71 ===================

As you can see, the specialized implementation is up to a factor of 5.5 faster, not to mention the lack of temporary object creation and more structural sharing.

The only case where the builder-based approach is faster is where the result is empty. But both implementations are quite fast in that case anyway.

@scabug
Copy link
Author

@scabug scabug commented Aug 20, 2012

@jsuereth said:
One thing this benchmark does not test is what happens when the call-site becomes megamorphic. You should run the test after warming up with other Set implementations to see what final performance is. My guess is we loose a bit when the JIT can't inline the call site.

@scabug
Copy link
Author

@scabug scabug commented Jul 10, 2013

@adriaanm said:
Unassigning and rescheduling to M6 as previous deadline was missed.

@scabug
Copy link
Author

@scabug scabug commented Jan 15, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants