optimize Vector concatenation (Vector ++ Vector) #4442

scabug · 2011-04-04T19:54:02Z

currently scala.collection.immutable.Vector.++ simply inherits its implementation from a supertrait (through the stubs added in r24227), so if you concatenate two vectors, it doesn't take advantage that that both are the same structure, but just treats the second vector like any traversable.

in theory this could be O(log n), not O(n), yes? Tiark seems to confirm it in this comment on #3724: "Unfortunately, for implementing a fast concat operation (we hope to do that for 2.8.1 or 2.9) heterogenous Arrays seem to be necessary (we'll be storing Int arrays next to the sub nodes). We might rethink this however, and try to stick to homogeneous Arrays."

I just thought there should be enhancement ticket on this as it would still be nice to have someday.

scabug · 2011-04-04T19:54:02Z

Imported From: https://issues.scala-lang.org/browse/SI-4442?orig=1
Reporter: @SethTisue
See #3724

scabug · 2011-04-04T20:43:14Z

@TiarkRompf said:
I've been having this on my todo list for quite a while now. The problem is that it requires pervasive changes to the current internal structure and those are hard to get right without slowing down other uses of Vector. Performance will most likely be amortized log n, not worst case. It's good to have a ticket for this, though.

scabug · 2011-09-20T21:37:13Z

@SethTisue said:
Is Phil working on this? (the latest Scala meeting notes seem to say so)

scabug · 2012-03-02T21:08:03Z

@TiarkRompf said (edited on Mar 2, 2012 9:10:30 PM UTC):
Here's a link to the paper:
http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf?version=1

Still need to resolve integration with the current Vector implementation. Maybe it's better to have a separate CatenableVector as a first step.

scabug · 2013-07-29T00:21:13Z

@pchiusano said:
Related to this, (1 to N).foldLeft(Vector[Int]())(_ :+ _) runs in linear time, but (1 to N).foldLeft(Vector[Int]())((acc,a) => acc ++ List(a)) is quadratic. This is extremely surprising (I spent several hours tracking down a performance bug in scalaz-stream that ended up being caused by this). Since Vector has a constant time snoc operation, it seems like the default implementation of ++ should be repeated calls to :+, which doesn't require any internal changes to Vector.

scabug · 2013-08-07T17:40:23Z

@SethTisue said:
Paul: I'm extremely glad you noticed this. I don't think anyone realized the situation was that bad! I opened a separate ticket on it at #7725.

scabug · 2014-02-10T00:37:35Z

@adriaanm said:
was this fixed along with SI-7725?

scabug · 2014-02-10T01:15:27Z

@Ichoran said:
This was not fixed--this is about preserving structural sharing across the entire tree, while the other was to at least start from the bigger of two collections to get at least some sharing.

scabug · 2014-02-13T21:28:48Z

@SethTisue said:
reassigning to "Backlog" on the grounds that the research on how it would even work hasn't been done yet

scabug · 2016-10-01T12:50:24Z

@Blaisorblade said:
Shouldn't RRB-Vector be a potential fix for this, according to the ICFP'15 paper describing it?

https://github.com/nicolasstucki/scala-rrb-vector
https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf
http://dl.acm.org/citation.cfm?id=2784739

joshlemer · 2018-09-27T17:05:53Z

These are very relevant!
paper describing a potentially new data structure for Vectors: https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf
implementation: https://github.com/nicolasstucki/scala-rrb-vector

V-Lamp · 2019-02-27T17:27:27Z

Is this addressed by any chance in the new collections in 2.13? 🙂

joshlemer · 2019-02-27T17:35:04Z

@V-Lamp the Vectors in 2.13 will not be the new data structure (RRB Vectors), but their concatenation may be improved enormously by scala/scala#7588 which ensures that during concatenation of vectors, the structure of the left Vector is used rather than copying it into a new Vector. This essentially means that Vector concatenation will be constant-time on the size of the left vector (but still O(n) where n = size of right vector).

szeiger · 2019-11-04T17:37:55Z

scala/scala#7588 was reverted due to #11636 but the new Vector implementation that I am working on for 2.14 also has this optimization (appending n elements to a Vector of size m is O(n + log(m+n))). I don't know if I'll find the time to implement it but the same is possible for prepending. The combination of the two would allow concatenation depending on the size of the shorter slice.

SethTisue · 2021-04-26T16:02:06Z

Note that Vector was completely redone for 2.13.2: scala/scala#8534

Has anyone studied what the current situation is w/r/t to this ticket?

SethTisue · 2021-04-26T16:06:02Z

I see that on scala/scala#8534 @szeiger wrote,

I did not get around to implementing an optimized concatenation where the right-hand side vector determines the alignment. Like most other algorithms for this data structure it is quite easy to visualize how it works and assure yourself that it is possible but actually implementing it correctly is much harder. This would allow concatenation of two vectors of sizes m and n in O(min(n, m) + log(n+m)) time.

So that sounds like we're still where we before: doing concatenation as efficiently as possible remains future work.

ansvonwa · 2022-08-19T09:05:42Z

For the record: I've implemented concatenation in O(min(n, m) + log(n+m)) time.
(If you're really lucky and the Vector structures are perfectly aligned, it's even only O(log(n+m)).)
It's not yet PR-ready, but I hope I'll be able to polish it in a month or so when I'm back from vacation.

closes scala/bug#4442

scabug added enhancement help wanted performance labels Apr 7, 2017

scabug added this to the Backlog milestone Apr 7, 2017

scabug assigned Ichoran Apr 7, 2017

This was referenced Apr 7, 2017

Vector concatenation is unreasonably slow (O(n^2), should be O(n)) #7725

Closed

Vector.patch and Vector.++ should not consume linear time for large Vectors #10001

Closed

S11001001 mentioned this issue Apr 28, 2017

Top-down divide-and-conquer-style traverse for Vector. scalaz/scalaz#1023

Merged

SethTisue added the library:collections label Feb 17, 2018

SethTisue unassigned Ichoran Feb 17, 2018

scala deleted a comment from scabug Sep 27, 2018

scala deleted a comment from scabug Apr 26, 2021

ansvonwa mentioned this issue Sep 25, 2022

Faster Vector concatenation scala/scala#10159

Merged

retronym closed this as completed in scala/scala#10159 Nov 30, 2022

retronym added a commit to scala/scala that referenced this issue Nov 30, 2022

Merge pull request #10159 from ansvonwa/vector-faster-prepend

6d7f884

closes scala/bug#4442

SethTisue modified the milestones: Backlog, 2.13.11 Nov 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

optimize Vector concatenation (Vector ++ Vector) #4442

optimize Vector concatenation (Vector ++ Vector) #4442

scabug commented Apr 4, 2011

scabug commented Apr 4, 2011

scabug commented Apr 4, 2011

scabug commented Sep 20, 2011

scabug commented Mar 2, 2012

scabug commented Jul 29, 2013 •

edited by xuwei-k

Loading

scabug commented Aug 7, 2013

scabug commented Feb 10, 2014

scabug commented Feb 10, 2014

scabug commented Feb 13, 2014

scabug commented Oct 1, 2016

joshlemer commented Sep 27, 2018

V-Lamp commented Feb 27, 2019

joshlemer commented Feb 27, 2019

szeiger commented Nov 4, 2019

SethTisue commented Apr 26, 2021

SethTisue commented Apr 26, 2021 •

edited

Loading

ansvonwa commented Aug 19, 2022

optimize Vector concatenation (Vector ++ Vector) #4442

optimize Vector concatenation (Vector ++ Vector) #4442

Comments

scabug commented Apr 4, 2011

scabug commented Apr 4, 2011

scabug commented Apr 4, 2011

scabug commented Sep 20, 2011

scabug commented Mar 2, 2012

scabug commented Jul 29, 2013 • edited by xuwei-k Loading

scabug commented Aug 7, 2013

scabug commented Feb 10, 2014

scabug commented Feb 10, 2014

scabug commented Feb 13, 2014

scabug commented Oct 1, 2016

joshlemer commented Sep 27, 2018

V-Lamp commented Feb 27, 2019

joshlemer commented Feb 27, 2019

szeiger commented Nov 4, 2019

SethTisue commented Apr 26, 2021

SethTisue commented Apr 26, 2021 • edited Loading

ansvonwa commented Aug 19, 2022

scabug commented Jul 29, 2013 •

edited by xuwei-k

Loading

SethTisue commented Apr 26, 2021 •

edited

Loading