ArraySeq (f.k.a. Spandex) #52

Closed · wants to merge 1 commit into master from odd:spandex

Conversation

@odd (Contributor) commented Mar 27, 2017

An ArraySeq is an (ostensibly) immutable array-backed collection designed to provide the following performance characteristics:

  • constant time head, last, tail, init, take, takeRight, takeWhile, drop, dropRight, dropWhile, slice* and reverse (* = depending on the auto-trimming mode in use)
  • amortised constant time prepended/prependedAll, appended/appendedAll and concat (depending on the complexity of java.lang.System.arraycopy)
  • efficient indexed access
  • linear time iteration (i.e. iterator and foreach) with low constant factors
  • reasonable memory usage (approximately double that of an array with the same number of elements)

Elements can be added to both the front and the rear of the underlying array (via prepended/prependedAll and appended/appendedAll/concat respectively), but never more than once for any given array slot. Any operation that targets an already occupied slot (such as appended, prepended or updated) instead stores the index/element pair in an overlay backed by a HashMap, thus avoiding a copy of the underlying array.
The overlay is only allowed to grow as far as the auto-trimming mode in use permits, after which it is merged with the underlying array on the next transformation operation.
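As a rough sketch of that idea (hypothetical names and a simplified model, not the PR's actual internals), an append first tries to claim the unused slot just past the current high index, and falls back to an overlay entry when the slot is already occupied:

import scala.collection.immutable.HashMap

// Simplified model of the scheme described above: `elems` is the shared
// backing array, `start`/`stop` delimit this instance's view of it, and
// `overlay` holds index -> element overrides for occupied slots.
final case class Model[A](elems: Array[AnyRef], start: Int, stop: Int,
                          overlay: HashMap[Int, A]) {
  def appended(a: A): Model[A] =
    if (tryClaimSlot(stop)) {            // slot never written: plain array store
      elems(physicalSlot(stop)) = a.asInstanceOf[AnyRef]
      copy(stop = stop + 1)
    } else                               // slot taken: record the pair in the overlay
      copy(stop = stop + 1, overlay = overlay.updated(stop, a))

  // Negative indices (produced by prepends) wrap to the rear of the array.
  private def physicalSlot(i: Int): Int =
    ((i % elems.length) + elems.length) % elems.length

  private def tryClaimSlot(i: Int): Boolean =
    ??? // CAS on the shared bounds; see the sketch after the next paragraph
}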

To guarantee that only a single thread can write to any given array slot, an atomic long is used to guard the low and high index assignments.
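A hedged sketch of such a guard (hypothetical code, packing both indices into one AtomicLong so that each slot is handed out to at most one writer):

import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: low (prepend side) and high (append side) packed
// into a single AtomicLong and updated with compare-and-set.
final class Bounds {
  private val packed = new AtomicLong(0L) // high in upper 32 bits, low in lower 32

  // Claim the append slot at `expectedHigh`; false means another writer got
  // there first and the caller must copy (or use the overlay) instead.
  def tryClaimHigh(expectedHigh: Int): Boolean = {
    val current = packed.get()
    val low     = current.toInt           // sign-preserving lower half
    val high    = (current >>> 32).toInt  // upper half
    high == expectedHigh &&
      packed.compareAndSet(current, ((high + 1).toLong << 32) | (low & 0xFFFFFFFFL))
  }
}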

When the underlying array is too small to fit a requested addition, a new array is allocated (its size depends on the auto-trimming mode in use).

To ensure that array slots which are no longer accessible (e.g. slots outside the range of an instance returned by a slice-based operation) are freed, trimming can be used. Trimming can either be performed manually (via the trim method) or automatically (by specifying an auto-trimming mode at creation time or via the withAutoTrimming method).

The automatic trimming can be customized via these three properties:

Property                Default  Description
minUsagePercentage      25       The minimum size/capacity percentage allowed before auto-trimming.
maxOverlaidPercentage   50       The maximum overlaySize/capacity percentage allowed before auto-trimming.
usePadding              true     Whether to pad the underlying array with spare capacity.

There exist three predefined auto-trimming modes:

  • AutoTrimming.Always uses auto-trimming for all transformation operations and never uses an overlay or padding. Safe but slow.
  • AutoTrimming.Default uses auto-trimming for all transformation operations where the size of the returned instance is at most 25% of the capacity of the underlay, allows an overlay of at most 50% of the capacity of the underlay, and uses padding. Somewhat leaky, but fast except for creating small slices.
  • AutoTrimming.Never never uses auto-trimming, allows the overlay to grow to the same size as the capacity of the underlay, and uses padding. Leaky but fast.
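For illustration, the configuration and the three modes could be modelled along these lines (a hedged sketch of how the defaults might map onto the properties, not the PR's actual definitions):

// Hypothetical shape of the auto-trimming configuration described above.
final case class AutoTrimming(
  minUsagePercentage: Int,    // trim when size/capacity falls below this
  maxOverlaidPercentage: Int, // trim when overlaySize/capacity exceeds this
  usePadding: Boolean)        // allocate spare capacity for cheap appends

object AutoTrimming {
  val Always  = AutoTrimming(minUsagePercentage = 100, maxOverlaidPercentage = 0,   usePadding = false)
  val Default = AutoTrimming(minUsagePercentage = 25,  maxOverlaidPercentage = 50,  usePadding = true)
  val Never   = AutoTrimming(minUsagePercentage = 0,   maxOverlaidPercentage = 100, usePadding = true)
}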

Examples

// expression                                              (array, start, stop)
ArraySeq()                                             ==> ([], 0, 0)
ArraySeq() :+ a                                        ==> ([a, _, _, _, _, _, _, _], 0, 1)
a +: ArraySeq()                                        ==> ([_, _, _, _, _, _, _, a], -1, 0)
ArraySeq() :++ Seq(a, b, c, d)                         ==> ([a, b, c, d, _, _, _, _], 0, 4)
Seq(a, b, c, d) ++: ArraySeq()                         ==> ([_, _, _, _, a, b, c, d], -4, 0)
Seq(a, b, c, d) ++: ArraySeq() :++ Seq(e, f, g, h)     ==> ([e, f, g, h, a, b, c, d], -4, 4)
ArraySeq(a, b, c, d, e, f, g, h)                       ==> ([a, b, c, d, e, f, g, h], 0, 8)
ArraySeq() :++ Seq(a, b, c, d, e, f, g, h)             ==> ([a, b, c, d, e, f, g, h], 0, 8)
Seq(a, b, c, d, e, f, g, h) ++: ArraySeq()             ==> ([a, b, c, d, e, f, g, h], -8, 0)
(Seq(a, b, c, d) ++: ArraySeq()).slice(1, 3)           ==> ([_, _, _, _, a, b, c, d], -3, -1)
(ArraySeq() :++ Seq(a, b, c, d)).slice(1, 3)           ==> ([a, b, c, d, _, _, _, _], 1, 3)
(Seq(a, b) ++: ArraySeq() :++ Seq(c, d)).slice(1, 3)   ==> ([c, d, _, _, _, _, a, b], -1, 1)
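Reading the triples above: start and stop are logical bounds, with negative values wrapping around to the rear of the backing array. A sketch of the logical-to-physical translation implied by the examples (my reading, not code from the PR):

// For a representation (array, start, stop): size == stop - start, and the
// element at logical position i lives at a wrapped physical slot.
def physicalSlot(array: Array[Any], start: Int, i: Int): Int =
  (((start + i) % array.length) + array.length) % array.length

// e.g. for ([_, _, _, _, a, b, c, d], -4, 0):
//   physicalSlot(array, -4, 0) == 4, the slot holding a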
@julienrf (Contributor) left a comment

That looks very interesting. Could you run the benchmarks with immutable.Array so that we can compare them?

Spandex.tabulate(length + bs.length) {
  case i if i < length => apply(i)
  case i => bs(i - length)
}

@julienrf (Contributor) commented Mar 28, 2017

You could probably have even better performance by using System.arraycopy here.

@odd (Contributor) commented Mar 28, 2017

Switched to using two Array.copy instead.
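For illustration, a concatenation done with two bulk copies instead of per-element tabulation might look like this (a hedged sketch, not the PR's exact code):

// Hypothetical sketch: concatenate two arrays with two bulk copies.
def concatArrays[A <: AnyRef](xs: Array[A], ys: Array[A]): Array[AnyRef] = {
  val result = new Array[AnyRef](xs.length + ys.length)
  Array.copy(xs, 0, result, 0, xs.length)         // copy this collection's elements
  Array.copy(ys, 0, result, xs.length, ys.length) // copy the argument's elements
  result
}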

override def ++[B >: Nothing](xs: IterableOnce[B]): Spandex[B] = {
  val builder = Spandex.newBuilder[B]
  builder ++= xs
  Spandex.fromIterable(builder.result)
}
@julienrf (Contributor) commented Mar 28, 2017

This is not the first time we face this situation: maybe our fromIterable method should be a fromIterableOnce, so that we avoid using the builder. Is there an issue in doing that?

@szeiger (Collaborator) commented Apr 4, 2017

It's a bit odd currently that Iterable is required if you want multiple traversals but also gives you all the extra methods. The only reason I can think of why we don't want Builder <: Iterable is that it shouldn't get those methods. OTOH, I can't imagine a useful fromIterable that requires multiple traversals, either.

@odd (Contributor) commented Mar 28, 2017

Do you mean running the benchmarks for immutable.Array at the same time or independently and comparing the results? (The immutable-arrays branch does not seem to be updated to the latest version of master so running both benchmarks at the same time seems difficult at the moment)

@DarkDimius (Member) commented Mar 28, 2017

I'd argue that you should compare against the current Vector instead.
I can't think of a non-synthetic situation where I'd prefer immutable.Array over the current Vector.

This is because I expect most of the operations that showcase immutability (drop, tail, filter) to be substantially better in Vector than in the proposed immutable array.
If I don't care about those operations, then I don't care about immutability and I would use Array instead. Between Vector and a mutable array I don't see a convincing performance argument for an immutable array.

@Ichoran (Contributor) commented Mar 28, 2017

Did you run the benchmarks? If yes, can you post the results here?

@odd (Contributor) commented Mar 28, 2017

@odd (Contributor) commented Mar 29, 2017

Here are the benchmark results (with Vector and scala.Array included).

The result part sorted by (benchmark, size, score, error):
time-benchmark-results.txt

The full log:
time-benchmark-full.txt

@jvican (Member) commented Mar 29, 2017

Are you making sure that contributors follow good practices while running benchmarks? Things like the CPU governor, CPU frequency, Turbo Boost, etc. can greatly affect the outcome of benchmarks.

I suggest that if these are not documented, you create a section in CONTRIBUTING that describes them. These benchmarks will probably need to be double-checked manually by a core maintainer of this repo.

@DarkDimius (Member) commented Mar 29, 2017

@jvican, TBH, I don't think we should require contributors to provide us with clean benchmarking results.
It's too much effort and may require a custom hardware setup.
I think a better strategy would be for one of us to do it once, after we have all the collections contributed.
In particular, I have a number of concerns about the current benchmarking suite, as it creates an unrealistic "everything is in cache" situation in most runs.

@jvican (Member) commented Mar 29, 2017

@DarkDimius yes, that makes sense. Automating all this infrastructure and being able to get reliable results from CI could also be an option. AFAIK, we have a dedicated machine for performance that we could tweak and use to get predictable benchmarks, in the same spirit as scala/scala-dev#338.

@julienrf (Contributor) commented Mar 29, 2017

> I have a number of concerns about the current benchmarking suite, as it creates an unrealistic "everything is in cache" situation in most runs.

I would be very happy to revise the benchmarking suite. Can you create an issue detailing all the points that you think should be changed?

@odd (Contributor) commented Mar 29, 2017

I have generated some graphs manually to better see the relative performance of the benchmarks I posted earlier:
time-benchmarks-results-charts.pdf

Here is the pivot table the charts are based on:
time-benchmarks-results-table.pdf

@odd (Contributor) commented Mar 30, 2017

I've added some alternative implementations in 6f60304 that use atomic values to guard the low and high primary indices during prepend and append (instead of using synchronized as is done in this pull request):

  1. Spandex_AtomicIntegers uses two AtomicInteger values to represent low and high.
  2. Spandex_AtomicIntegerArray uses a single AtomicIntegerArray with two slots for low and high.
  3. Spandex_AtomicLong uses a single AtomicLong to represent both low and high.
  4. Spandex_Synchronize is the same implementation as in this pull request and uses this.synchronized to guard both low and high.

I will try to run the benchmarks (also including the improvements suggested by @DarkDimius above) during the night and post the results here tomorrow.
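For reference, the two-AtomicInteger variant (the one eventually adopted, per a later comment) keeps the two ends independent, so a prepend claiming low never contends with an append claiming high. A hypothetical sketch:

import java.util.concurrent.atomic.AtomicInteger

// Hypothetical sketch of variant 1: one guard per end.
final class Bounds2(initialLow: Int, initialHigh: Int) {
  private val low  = new AtomicInteger(initialLow)
  private val high = new AtomicInteger(initialHigh)

  def tryClaimHigh(expected: Int): Boolean =
    high.compareAndSet(expected, expected + 1) // append side

  def tryClaimLow(expected: Int): Boolean =
    low.compareAndSet(expected, expected - 1)  // prepend side
}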

@odd (Contributor) commented Mar 31, 2017

The atomic implementations were not significantly different in my benchmark run, so I switched from the current synchronization-based implementation to the alternative using two atomic integers, since that is the simplest of the atomic variants (committed in e85b046).

@szeiger (Collaborator) commented Apr 4, 2017

I haven't looked at the implementation details yet so I don't know if this is feasible, but can a Spandex be created in a degenerate state (that would be rebalanced when needed) from an existing array without copying? We need immutable arrays for varargs. If Spandex is universally better we could keep them hidden as an implementation detail but if a Spandex can also be built with similarly low overhead, we wouldn't need immutable arrays at all.

@odd (Contributor) commented Apr 4, 2017

@szeiger I added a factory method that wraps a given array without copying (in commit 3a7ea7f).

@odd (Contributor) commented Apr 18, 2017

Updated benchmarking results (charts -f 1 -bm avgt -bs 1 -w 100ms -r 300ms -to 10s -t 1 -wi 7 -i 7)

Graphs:
time-benchmark-run-21-graphs.pdf

Table:
time-benchmark-run-21-table.pdf

Log:
time-benchmark-run-21.txt

@julienrf (Contributor) commented May 15, 2017

Hi @odd, thanks a lot for continuing on that (even in absence of feedback from us). I really think this contribution is interesting. We first focus on getting the right design for the collections, and then we’ll see (maybe in June) which collection implementations we put in the core…

@odd (Contributor) commented May 15, 2017

Hi @julienrf, no problem, it's only prudent to get the design sorted out before looking too much into different implementations and optimizations. Please let me know if there is anything I can do to help besides keeping this PR up to date with master.

@odd referenced this pull request May 23, 2017: Support for immutable arrays #87 (Merged)

@scottcarey commented Jun 16, 2017

I haven't looked at details, so please excuse me if this question is not relevant:

> Expansion occurs when the underlying array is full on the affected side

Is it possible for the array to 'wrap around' like a circular buffer, as done with mutable.ArrayDeque (#49)?

If so, expansion would only be needed after all slots are full, rather than either side, since the 'edge' would move from index 0 as necessary. Additionally, the 'biasing' of the origin from the center when re-sized would not be necessary, so the overall complexity might not increase.

This requires tracking more variables (as one needs the 'origin' index, as well as the start and end), which might break how the atomic integers are used to guard expansion. However, one possibility is to pack two atomic integers into a single atomic long in order to hold more information atomically.

@odd (Contributor) commented Jun 22, 2017

@scottcarey Thanks for the feedback and sorry for not answering sooner.

Having the indexes wrap around the buffer (i.e. making it a circular buffer) would indeed allow more elements to be inserted before buffer expansion is required, so it would improve memory efficiency. The potential downside is that the overall complexity would increase (as you point out), but also that it would require two array copy operations instead of one at buffer expansion time (if low > high). It would also affect linear iteration negatively: instead of one low-to-high pass over the array, it could result in two non-overlapping passes (low to end, then 0 to high). As for packing the low and high marks into a single atomic long, this was tried earlier but did not improve performance compared to using two atomic integers. Using a single atomic long does, however, couple expansion at one end to expansion at the other, so that of two simultaneous calls to append and prepend one would fail, which is not the case with two separate atomic integers.

This is mainly speculation on my part, so ideally an alternative version using circular buffers would be implemented and benchmarked against the current one 🤓 🥇

@odd force-pushed the odd:spandex branch from c4615c6 to f039066 on Jun 29, 2017

@NthPortal (Contributor) commented Jul 5, 2017

I noticed that the iterators returned by Spandex.iterator() do not do bounds checks.
I've written a basic fix here.

@odd (Contributor) commented Jul 6, 2017

Time benchmark graphs for the latest version (2a840d8) comparing ImmutableArray, Vector and Spandex: time-benchmarks-run-28-charts.pdf

@julienrf (Contributor) commented Jul 6, 2017

Thanks @odd :)

The numbers are very good for Spandex! BTW, where does the name come from?

@odd (Contributor) commented Jul 6, 2017

@@ -88,9 +113,38 @@ class ImmutableArrayBenchmark {
def map(bh: Blackhole): Unit = bh.consume(xs.map(x => x + 1))

@Benchmark
def reverse(bh: Blackhole): Any = bh.consume(xs.reverse)

@scottcarey commented Jul 6, 2017

The benchmarks are inconsistent. Some return Any, and some Unit. This could affect the JVM optimizations.

Ideally the same benchmark suite would be used for each of these as appropriate, only changing the underlying collection type each run. @Benchmark def reverse, etc., should not be written more than once.

Also, I am not sure why the form used here:

@Benchmark
def reverse(bh: Blackhole): Any = bh.consume(xs.reverse)

is preferred to

@Benchmark
def reverse(): C = xs.reverse

(where C is the type of xs, though Any might do)

My understanding of JMH is that any value returned by the benchmark method is effectively blackhole'd anyway. Obviously, in cases like foldLeft below we need the Blackhole to consume items that are not returned.
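A minimal sketch of the two styles side by side (a hypothetical benchmark class; JMH consumes returned values through its own blackhole):

import org.openjdk.jmh.annotations.{Benchmark, Scope, State}
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class ReverseBenchmark {
  val xs: Vector[Int] = Vector.tabulate(1000)(identity)

  // Returned values are consumed by JMH itself, so no explicit Blackhole is
  // needed when the benchmark produces a result...
  @Benchmark
  def reverseReturning: Vector[Int] = xs.reverse

  // ...whereas side-effecting traversals still need one.
  @Benchmark
  def foreachConsuming(bh: Blackhole): Unit = xs.foreach(x => bh.consume(x))
}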

Many of the things above are not specific to this PR, but IMO may be affecting the sanity of the benchmark results.
Thoughts?

@odd (Contributor) commented Jul 6, 2017

Yeah, one benchmark suite for each of the main abstractions (Seq, IndexedSeq, LinearSeq, Set, SortedSet, etc.) that could be extended once for each implementation would be super nice. I don't know whether this would negatively affect the quality of the results, though (the benchmarks would dispatch through the abstract classes instead of the implementation class directly; on the other hand, that is probably how the collections would be used in the wild).

@scottcarey commented Jul 6, 2017

It is curious that foreach_filter does worse relative to ImmutableArray when Spandex does better on the individual components (head, tail); maybe nonEmpty is the culprit?

In addition to benchmarks, it would be useful to have a chart of the heap used as size increases, per data structure. This probably won't be all that flattering for Vector either.

@odd (Contributor) commented Jul 6, 2017

Here is a chart for the memory benchmarks comparing ImmutableArray, Vector, scala.Array and Spandex:

[chart: memory footprint]

@odd (Contributor) commented Sep 21, 2017

All commits have now been squashed into a single one: 5186c1d.

@julienrf (Contributor) commented Sep 21, 2017

> When you say that I should squash this PR, do you mean that I should rebase this PR onto current master, folding all but the first commit into the first, or do you mean something else?

I see that you just did it, but for reference what I meant was to clean up the history a bit. If everything fits well into a single commit, that’s better.

@alexnixon commented Sep 21, 2017

This is interesting - always good to see thought going into collections (and benchmarks!), so thanks Odd!

As a disclaimer, my background is vanilla Scala so I'm coming to both this and ImmutableArray fresh.

Firstly, on ImmutableArray. It seems really dangerous to me that it lives in a package alongside all the other immutable collections yet has a completely different design - it's not immutable in the "persistent, functional data structures" sense like all the collections alongside it. They support cheap updates but this does not. This shows in several places, for example IndexedSeqOps which it implements:

  /** The rest of the collection without its `n` last elements. For
    * linear, immutable collections this should avoid making a copy. */
  override def dropRight(n: Int): C = fromSpecificIterable(view.dropRight(n))

Perhaps it would be better if it were named CopyOnWriteArray, and if it didn't share a hierarchy with collections with such different performance characteristics.

Then on this ArraySeq proposal, I find the lack of predictability of its performance worrying. For example if I write:

def wat(x: ArraySeq[Int]): ArraySeq[Int] = x :+ 42

I have no idea whether this is going to double my heap size [1] - it entirely depends on how the caller constructed the ArraySeq and whether it has already had its "cheap append" used up. Personally I would take no performance optimization over one which only kicks in if the wind's blowing east.

So in general I can see the utility of a copy-on-write array for when you really need the performance -
and perhaps it should have convenience methods like slices and O(1) reversal [2]. And I can see the benefit of Vector as a general-purpose collection which trades off performance [3] for cheap updates. However, I'm not yet convinced about the middle ground in which I think this collection lies.

I'm very open to being swayed by arguments though, so I would love to hear a response to this, particularly if there are any misunderstandings in the above.

[1] EDIT - I think this is subtly different from, say, the doubling you would see with a Java ArrayList, as that is amortized over all your add calls. And in fact, the ArraySeq example could end up using 3x the memory, because adding the extra element may require an entirely new array which is double the size of the original. Another difference is that with an ArrayList you would preallocate space if you care about ad-hoc allocations, which isn't possible with ArraySeq.

[2] though I also see good arguments against these features - if a user cares so much about performance that they're using a copy-on-write array, then maybe it's best to get them as close to the metal as possible. They might care about the extra memory used by metadata tagged onto the data structure, and they might care how the array is laid out in memory, for example if they want to cheaply wrap a ByteBuffer around it to send to some other data-reading API

[3] I would love to see improvements to Vector, particularly in the memory usage of small Vectors

@odd (Contributor) commented Sep 21, 2017

Hi @alexnixon and thanks for the feedback!

The example you provide for the lack of predictability is not specific to ArraySeq but common to most of the Seq implementations that exist in the strawman.

def wat(x: Seq[Int]): Seq[Int] = x :+ 42

The function above will lead to all (or, in the case of immutable.Vector, some) elements of x being copied into a new structure if the next position in x is unavailable. The different Seq implementations decide the availability of the next position roughly as follows:

  • immutable.List: the next position is never available since it is occupied by Nil (so always make a full copy of x)
  • immutable.ImmutableArray: the next position is never available since the underlying array is always full (so always make a full copy of x)
  • immutable.ArraySeq: the next position is available if the next position already holds an equal element (to the element being appended) or if the underlying array is not full (otherwise make a full copy of x)
  • immutable.Vector: the next position is never available (so always make a copy of the segment of x holding the next position and so on up the levels [segment size is 32, up to 5 levels deep])
  • mutable.ArrayBuffer: the next position is available if the underlying array is not full at the rear (otherwise make a full copy of x)
  • mutable.ListBuffer: the next position is available if toList has not been called on x (otherwise make a full copy of x)
@alexnixon commented Sep 21, 2017

Thanks for the quick reply!

I think I was unclear so let me rephrase: I cannot see another immutable collection for which, when adding an element in the most efficient way for that collection, you don't know whether it will use O(n) extra memory. Do you know of one? If so, which?

Even if there is one, I would still be happy provided the cost is amortized to constant. I have no problem with java's ArrayList for example.

But for ArraySeq this isn't amortized and this will lead to spooky action at a distance, for example:

var coll = (1 to 1000000).foldLeft(ArraySeq.empty[Int]) { case (acc, i) =>
  // Try uncommenting this
  // acc :+ 42
  acc :+ i
}

With one :+ operation, the :+ is constant time. With two :+ operations, the first is constant and the second linear, which makes this program quadratic altogether. Is there any other scala collection with performance characteristics like this?

It's a slightly contrived example but not hard to imagine it biting in the real world, particularly when the :+ calls aren't right next to each other.

A valid counterargument would be "we only make the guns, we can't stop people blowing off their toes". But there has to be a line somewhere, and I'm still concerned we're on the wrong side of it here.

@odd (Contributor) commented Sep 22, 2017

Repeated appends (and prepends) on the same instance do indeed lead to full copying for the implementations based on single arrays, such as ArraySeq and ImmutableArray (except for the first append/prepend, which you get for free with ArraySeq). For List the performance is even worse for appends (but prepends are free). Vector is, in this case, the best overall performer, since it has much less copying to do in both cases and for all repetitions (except for prepends compared to List, and the first append/prepend compared to ArraySeq). Please see the attached benchmarks below.

Do I understand you correctly in that it is the difference in complexity between the first and subsequent prepends/appends on ArraySeq that bothers you? Would you prefer that ArraySeq instead had the worst-case complexity all the time, like ImmutableArray, to gain predictability (at the cost of performance)? Or is it rather an argument against any immutable sequence implementation that lacks a consistently cheap way of doing repeated prepends or appends on the same instance?

Here are the benchmarks for the single and double append scenarios:
[charts: expand_singleappend, expand_doubleappend]

@oxbowlakes commented Sep 22, 2017

@julienrf referenced this pull request Sep 28, 2017: 3.10.2017 Meeting #254 (Closed)

@julienrf (Contributor) commented Oct 5, 2017

@odd How easily can your code be specialized for primitive types?

@szeiger Could you present your concerns about replacing ImmutableArray with ArraySeq?

@odd force-pushed the odd:spandex branch from 01b76b3 to 7067175 on Nov 7, 2017

@odd (Contributor) commented Nov 7, 2017

A while back, some issues were raised by @alexnixon and @oxbowlakes concerning the performance of repeated prepended/appended operations on ArraySeq (as manifested in the expand_foldAppendDouble benchmarks). The problem was that performance and memory use could be difficult to predict, leading to hard-to-find performance issues.

The best case occurred when the targeted underlying array slot was unoccupied (i.e. it had never been written to), in which case the prepended/appended operation was a simple array store (~constant). If, however, the targeted slot was already occupied (i.e. it had already been used to store an element), the accessible part of the underlying array first had to be copied and then updated with the new element (~linear in the size of the instance). The same copy-on-repeated-write problem also occurred for the updated operation (but for updated there was no constant best case, since the targeted array slot is always already occupied).

To solve this performance degradation on repeated writes, I have redesigned ArraySeq to provide a map-backed overlay in addition to the array-backed underlay. The overlay maps indices to elements and is used whenever the targeted array slot in the underlay is occupied. This allows the prepended/appended/updated operations to keep the same best case (a single array store) while improving the worst case to a single hash map store (~effectively constant).

All is not rosy, however. A danger with ArraySeq is the risk of memory leaks, since elements in the underlay can become hidden. A hidden element cannot be garbage collected even though it is not referenced from the ArraySeq instance directly, because it is still referenced from the associated underlay. Elements can become hidden in two different ways: by indexing or by shadowing.

Hiding by indexing occurs when a slice-based operation (such as tail, init, take or drop) returns an instance where only a subset of the indices available in the associated underlay is included. The elements in array slots outside the index subset are hidden, since they cannot be accessed via the returned ArraySeq instance (but are still referenced by the associated underlay). Hiding by indexing can also occur when an element is prepended or appended to an instance created with padding (i.e. having a larger capacity than size); the added element is hidden from the original instance since it is outside the original's index range.

Hiding by shadowing occurs when an element is added to the overlay, shadowing the element in the slot with the same index in the underlay.

To mitigate the risk of memory leaks from hidden elements, trimming should be used, either manually via the trim operation, or automatically via the withAutoTrimming operation.

The auto-trimming can be customized via these three properties:

  • minUsePercentage: the minimum percentage that the size of a returned instance may be of the capacity of the associated underlay.
  • maxOverlayPercentage: the maximum percentage that the size of the overlay may grow to, compared to the capacity of the associated underlay.
  • usePadding: if true, the underlay is given a capacity that is the smallest power of two greater than or equal to the size of the instance (see the sketch below).
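For illustration, the padded capacity described for usePadding (the smallest power of two greater than or equal to the size) can be computed like this (a sketch, not the PR's code):

// Smallest power of two >= n, for n >= 1, as described for usePadding.
def paddedCapacity(n: Int): Int =
  if (n <= 1) 1
  else Integer.highestOneBit(n - 1) << 1

// paddedCapacity(5) == 8, paddedCapacity(8) == 8, paddedCapacity(9) == 16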

There are three predefined auto-trimming modes:

  • AutoTrimming.Always: always trims the result of any transformation operation and never uses an overlay or padding. It performs like an immutable array which copies on every change. Safe but slow.
  • AutoTrimming.Default: trims the result of any transformation operation which returns an instance with a size of at most 25% of the underlay capacity. It allows the overlay to grow to 50% of the capacity of the underlay, and uses padding. Somewhat leaky, but fast except when creating small slices.
  • AutoTrimming.Never: never trims the result of a transformation operation and allows the overlay to grow to the same size as the underlay. It uses padding. Leaky but fast.

One way of using ArraySeq that tries to minimize the element-hiding problem without sacrificing too much performance is to use AutoTrimming.Always for all instances (to ensure safety) and use the withoutAutoTrimming[B >: A](f: ArraySeq[A] => ArraySeq[B]) method to wrap any performance-critical transformations. That way the transformations are performed without any trimming happening between the transformation steps; the resulting ArraySeq instance is instead trimmed once at the end, when the transformation steps are completed.
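A sketch of that usage pattern (hypothetical setup; it assumes a tabulate factory and follows the withoutAutoTrimming signature quoted above):

// Safe by default: every intermediate result would normally be trimmed.
val xs: ArraySeq[Int] =
  ArraySeq.tabulate(1000)(identity).withAutoTrimming(AutoTrimming.Always)

// Fast in the hot path: trimming is suppressed for the intermediate steps
// and applied once to the final result.
val ys: ArraySeq[Int] = xs.withoutAutoTrimming { seq =>
  seq.map(_ + 1).filter(_ % 2 == 0).take(100)
}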

The current default auto-trimming mode was chosen to provide a middle ground between safety and performance, but perhaps it would be better to make AutoTrimming.Always the default (which would make an ArraySeq behave like an ImmutableArray by default) and to promote the above usage pattern as a potential optimization technique?

@odd (Contributor) commented Nov 7, 2017

Here are the graphs for the latest benchmark results for ArraySeq with the three predefined auto-trimming modes compared to List, Vector and ImmutableArray:

[charts: access_init, access_last, access_random, access_slice, access_tail; create_apply, create_build, create_buildrange; expand_append, expand_appendall, expand_appendinit, expand_foldappend, expand_foldappenddouble, expand_foldappenddoubleforeach, expand_padto, expand_prepend, expand_prependall, expand_prependallappendall, expand_prependappend, expand_prependtail; extract_palindrome; transform_collect, transform_distinct, transform_distinctby, transform_filter, transform_flatmap, transform_groupby, transform_lazyzip, transform_map, transform_patch, transform_reverse, transform_span, transform_unzip, transform_updateforeach, transform_updatelast, transform_updaterandom, transform_zip, transform_zipmaptupled, transform_zipwithindex; traverse_foldleft, traverse_foldright, traverse_foreach, traverse_headtail, traverse_initlast, traverse_iterator]

@julienrf (Contributor) commented Nov 8, 2017

Wow, thanks @odd for the hard work!

In summary, it seems that the AutoTrimming.Default mode provides, in general, better performance than Vector, whereas the AutoTrimming.Always mode has, in general, worse performance than Vector. On the other hand, the AutoTrimming.Default mode can lead to memory leaks, whereas the AutoTrimming.Always mode cannot.

I’m afraid the result is a collection type that is very powerful but a bit hard to master. What do other people think? (@Ichoran? @szeiger?)

@odd What do you think about implementing macro benchmarks (as suggested in #292)? You can draw inspiration from those which are in ArrayBaselineBenchmark.scala or implement your own.

@julienrf added the blocker label on Jan 3, 2018

@julienrf added this to the 0.10.0 milestone on Jan 31, 2018

@julienrf modified the milestones: 0.10.0, 0.11.0 on Feb 19, 2018

@julienrf modified the milestones: 0.10.0, 0.11.0 on Mar 1, 2018

@szeiger removed this from the 0.11.0 milestone on Apr 19, 2018

@NthPortal (Contributor) commented Jul 24, 2018

What's the status of this?

@odd (Contributor) commented Jul 24, 2018

Hi @NthPortal, since the last commit pushed here, I have experimented with the following:

  1. Make it behave like ImmutableArray (which has been renamed to ArraySeq in 2.13) for all types T <: AnyRef (i.e. never reuse but always copy the underlying array) to avoid potentially leaking references to objects when shadowing elements (by update or windowing).
  2. Keep the current behavior of reusing the underlying array as often as possible for primitive types (since these cannot refer to other objects, and thus the memory occupied by shadowed elements is bounded by the size of the underlying array).
  3. Specialize all operations for primitive types to avoid as much boxing as possible.

The development work has moved to the odd/collection-benchmark repository (which contains a copy of the benchmarking suite in this repository, updated to target Scala 2.13.0-M4), but I haven't pushed in quite a while. I had some interesting conversations with @szeiger at Scala Days Berlin which made me a bit unsure of the future of specialization in Dotty (at least in its current form), so I haven't pressed on; and then it was summer, and the warmest one in ages at that... ☀️

@SethTisue added this to the backlog milestone on Aug 15, 2018

@SethTisue (Member) commented Nov 16, 2018

Closing all open pull requests in this repo. Perhaps this could move forward in scala/scala, where the code now lives.

@SethTisue closed this on Nov 16, 2018
