optimize Vector concatenation (Vector ++ Vector) #4442

Open
scabug opened this Issue Apr 4, 2011 · 12 comments

scabug commented Apr 4, 2011

currently scala.collection.immutable.Vector.++ simply inherits its implementation from a supertrait (through the stubs added in r24227), so if you concatenate two vectors, it doesn't take advantage of the fact that both are the same structure, but just treats the second vector like any traversable.

in theory this could be O(log n), not O(n), yes? Tiark seems to confirm it in this comment on #3724: "Unfortunately, for implementing a fast concat operation (we hope to do that for 2.8.1 or 2.9) heterogenous Arrays seem to be necessary (we'll be storing Int arrays next to the sub nodes). We might rethink this however, and try to stick to homogeneous Arrays."

I just thought there should be an enhancement ticket on this, as it would still be nice to have someday.
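
To make the "treats the second vector like any traversable" point concrete, here is a minimal sketch of what a builder-based generic concatenation roughly amounts to. The object and method names (`GenericConcatSketch`, `genericConcat`) are made up for illustration, and this is an assumption about the general shape of the inherited path, not a copy of the library source.

```scala
object GenericConcatSketch {
  // Hypothetical helper: concatenates by pushing every element of both
  // operands through a fresh builder. Nothing from either input tree is
  // reused, so the cost is proportional to left.length + right.length.
  def genericConcat[A](left: Vector[A], right: Iterable[A]): Vector[A] = {
    val b = Vector.newBuilder[A]
    b ++= left   // copies all of the left operand
    b ++= right  // the right operand is just "some collection" here
    b.result()
  }
}
```

Because `right` is only seen as an `Iterable`, passing a `Vector` gains nothing over passing a `List`; a structure-aware concat would have to inspect both trees instead.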

scabug commented Apr 4, 2011

scabug commented Apr 4, 2011

@TiarkRompf said:
I've had this on my todo list for quite a while now. The problem is that it requires pervasive changes to the current internal structure, and those are hard to get right without slowing down other uses of Vector. Performance will most likely be amortized O(log n), not worst-case. It's good to have a ticket for this, though.

scabug commented Sep 20, 2011

@SethTisue said:
Is Phil working on this? (the latest Scala meeting notes seem to say so)

scabug commented Mar 2, 2012

@TiarkRompf said (edited on Mar 2, 2012 9:10:30 PM UTC):
Here's a link to the paper:
http://infoscience.epfl.ch/record/169879/files/RMTrees.pdf?version=1

Still need to resolve integration with the current Vector implementation. Maybe it's better to have a separate CatenableVector as a first step.

scabug commented Jul 29, 2013

@pchiusano said:
Related to this, (1 to N).foldLeft(Vector[Int]())(_ :+ _) runs in linear time, but (1 to N).foldLeft(Vector[Int]())((acc,a) => acc ++ List(a)) is quadratic. This is extremely surprising (I spent several hours tracking down a performance bug in scalaz-stream that ended up being caused by this). Since Vector has a constant time snoc operation, it seems like the default implementation of ++ should be repeated calls to :+, which doesn't require any internal changes to Vector.
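
A minimal sketch of the suggestion above, assuming nothing beyond the public `:+` operation. `AppendSketch`, `appendAll`, `buildBySnoc` and `buildByConcat` are hypothetical names introduced for illustration; this is not the fix that eventually went into the library.

```scala
object AppendSketch {
  // Hypothetical replacement for a generic ++ : append each element of xs
  // to acc with :+, so the cost depends on xs.size rather than acc.size.
  def appendAll[A](acc: Vector[A], xs: Iterable[A]): Vector[A] =
    xs.foldLeft(acc)(_ :+ _)

  // The linear-time loop from the comment.
  def buildBySnoc(n: Int): Vector[Int] =
    (1 to n).foldLeft(Vector.empty[Int])(_ :+ _)

  // The same loop written as concatenation of one-element lists,
  // routed through appendAll instead of the inherited ++.
  def buildByConcat(n: Int): Vector[Int] =
    (1 to n).foldLeft(Vector.empty[Int])((acc, a) => appendAll(acc, List(a)))
}
```

With this shape, both loops do one `:+` per element, so appending a one-element list no longer forces a rebuild of the whole accumulator.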

scabug commented Aug 7, 2013

@SethTisue said:
Paul: I'm extremely glad you noticed this. I don't think anyone realized the situation was that bad! I opened a separate ticket on it at #7725.

scabug commented Feb 10, 2014

@adriaanm said:
was this fixed along with SI-7725?

scabug commented Feb 10, 2014

@Ichoran said:
This was not fixed. This ticket is about preserving structural sharing across the entire tree, while the other was about starting from the bigger of the two collections to get at least some sharing.
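
For readers who have not followed the other ticket, the "start from the bigger of the two collections" idea can be sketched roughly as follows. This is an illustration under assumptions, using only the public `:+` and `+:` operations; `ConcatFromLarger` and `concat` are hypothetical names and this is not the actual SI-7725 patch.

```scala
object ConcatFromLarger {
  // Hypothetical sketch: keep the larger operand intact and add the
  // smaller operand's elements to it one at a time, so the work is
  // driven by the size of the smaller side.
  def concat[A](left: Vector[A], right: Vector[A]): Vector[A] =
    if (left.length < right.length)
      // Prepend left's elements onto right, last element first,
      // which preserves the original order.
      left.foldRight(right)((a, acc) => a +: acc)
    else
      // Append right's elements onto left in order.
      right.foldLeft(left)((acc, a) => acc :+ a)
}
```

Only the larger vector's structure can be reused this way; the smaller one is still walked element by element, which is why this ticket about sharing across both trees remained open.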

scabug commented Feb 13, 2014

@SethTisue said:
reassigning to "Backlog" on the grounds that the research on how it would even work hasn't been done yet

scabug commented Feb 13, 2014

@adriaanm said:
Thanks

scabug commented Oct 1, 2016

joshlemer commented Sep 27, 2018

These are very relevant!
paper describing a potentially new data structure for Vectors: https://infoscience.epfl.ch/record/213452/files/rrbvector.pdf
implementation: https://github.com/nicolasstucki/scala-rrb-vector
