Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ListMap and ListSet iterator optimization #11752

Open
texasbruce opened this issue Sep 27, 2019 · 11 comments

Comments

@texasbruce
Copy link

commented Sep 27, 2019

Currently, for ListMap and ListSet, every call to iterator will build a whole List:

  def iterator: scala.collection.Iterator[A] = {
    var curr: ListSet[A] = this
    var res: List[A] = Nil
    while (!curr.isEmpty) {
      res = curr.elem :: res
      curr = curr.next
    }
    res.iterator
  }

which means a simple call to head:

listMap.head

will require building a whole list of all elements, and get the first item. This is disasterous for large ListSet and ListMap.

We should optimize iterator implementation to create custom Iterator object, rather than build a whole list and get the list iterator.

@texasbruce texasbruce changed the title ListMap and ListSet iterator method optimization ListMap and ListSet iterator optimization Sep 27, 2019
@Jasper-M

This comment has been minimized.

Copy link
Member

commented Sep 27, 2019

Looks like this also means that the doc which says "last and init are O(1)" is very wrong.

@texasbruce

This comment has been minimized.

Copy link
Author

commented Oct 1, 2019

@SethTisue I'll try to draft a PR.

@joshlemer

This comment has been minimized.

Copy link
Member

commented Oct 3, 2019

We should optimize iterator implementation to create custom Iterator object, rather than build a whole list and get the list iterator.

The problem is that ListSet and ListMap are stored in reverse-order, so to traverse the data in forward-order, you need to build some kind of buffer. An improvement might be to buffer the elements in an ArrayBuffer, but can't be avoided entirely.

@texasbruce

This comment has been minimized.

Copy link
Author

commented Oct 4, 2019

We should optimize iterator implementation to create custom Iterator object, rather than build a whole list and get the list iterator.

The problem is that ListSet and ListMap are stored in reverse-order, so to traverse the data in forward-order, you need to build some kind of buffer. An improvement might be to buffer the elements in an ArrayBuffer, but can't be avoided entirely.

We can change it to double linked list.

The current way is way too unoptimized and cannot even guarantee O(1) for head.

@joshlemer

This comment has been minimized.

Copy link
Member

commented Oct 4, 2019

If it's a double linked list though, that will increase the cost of insertions, because even new keys will require a complete rebuild. Though, TBH I am not really sure why someone would ever use a ListMap or ListSet, they are worse in virtually all circumstances than TreeMap or HashMap, in my opinion, the best option would be deprecation.

@plokhotnyuk

This comment has been minimized.

Copy link

commented Oct 4, 2019

@texasbruce

This comment has been minimized.

Copy link
Author

commented Oct 5, 2019

@joshlemer Why does it need a new rebuild for every insertion?

Each node only has reference to previous and next node, and the collection includes a head and tail reference. It should only need to update 1-2 nodes for every insertion/deletion.

It is useful in some situations. As its name indicates, it retains the order of insertion. We use a custom ListMap (due to this issue in std lib) as a coalescing queue for event processing.

@joshlemer

This comment has been minimized.

Copy link
Member

commented Oct 5, 2019

@joshlemer Why does it need a new rebuild for every insertion?

Each node only has reference to previous and next node, and the collection includes a head and tail reference. It should only need to update 1-2 nodes for every insertion/deletion.

If you have a DoubleListMap('a' -> 1, 'b' -> 2, 'c' -> 3) it will look like this:

('a', 1) <-> ('b', 2) <-> ('c', 3)

If you then want to insert 'd' -> 4, you would end up with

('a', 1) <-> ('b', 2) <-> ('c', 3) <-> ('d',4)

So not only does the ('d',4) node need to be created but also a new ('c',3) node needs to be created so that it can have a backpointer to the ('d',4) node, and because that node is changed, we must create a new ('b',2) node to point to that new 'c' node, and so on all the way to the front of the map.

it is useful in some situations. As its name indicates, it retains the order of insertion. We use a custom ListMap (due to this issue in std lib) as a coalescing queue for event processing.

Fair enough, but since 2.13, we have generic abstractions collection.SeqMap[K,V], immutable.SeqMap[K,V] and mutable.SeqMap[K, V] as well as 2 other better implementations of the immutable SeqMap (VectorMap and TreeSeqMap), and perhaps better implementations could be made which are even more performant implementations may be possible (such as the PersistentOrderedMap in kotlinx immutable collections (https://github.com/Kotlin/kotlinx.collections.immutable/blob/master/core/commonMain/src/implementations/persistentOrderedMap/PersistentOrderedMap.kt), which may be faster).

@texasbruce

This comment has been minimized.

Copy link
Author

commented Oct 5, 2019

@joshlemer That's correct for immutable ListMap. I just realized in 2.13 mutable ListMap became deprecated. We don't need to do that way for mutable ListMap, as we can just modify the reference. Do you know why is the deprecation?

We should probably un-deprecate mutable ListMap and implement using double linked list. We don't have any mutable SeqMap implementation that behaves the same as current ListMap. (The only implementation LinkedHashMap behaves slightly different)

@joshlemer

This comment has been minimized.

Copy link
Member

commented Oct 5, 2019

In what way to LinkedHashMap and ListMap behave differently? Is it that LinkedHashMap updates its keys in-place, whereas ListMap updates its keys by moving them to the back?

@joshlemer

This comment has been minimized.

Copy link
Member

commented Oct 5, 2019

if that's the case, then the same semantics could be implemented really simply, utilizing the LinkedHashMap, with much greater performance.

final class LinkedHashMapWithAppendingUpdates[K, V] private(lhm: LinkedHashMap[K, V]) extends SeqMap[K, V] {
  override def iterator: Iterator[(K, V)] = lhm.iterator
  override def subtractOne(elem: K): this.type = {
    lhm.subtractOne(elem)
    this
  }
  override def get(key: K): Option[V] = lhm.get(key)
  override def addOne(elem: (K, V)): this.type = {
    lhm.remove(elem._1)
    lhm.addOne(elem)
    this
  }
}

object LinkedHashMapWithAppendingUpdates extends MapFactory[LinkedHashMapWithAppendingUpdates] {
  override def empty[K, V]: mutable.LinkedHashMapWithAppendingUpdates[K, V] = 
    new mutable.LinkedHashMapWithAppendingUpdates[K, V](mutable.LinkedHashMap.empty)
  override def from[K, V](it: IterableOnce[(K, V)]): mutable.LinkedHashMapWithAppendingUpdates[K, V] = 
    Growable.from(empty[K, V], it)
  override def newBuilder[K, V]: mutable.Builder[(K, V), mutable.LinkedHashMapWithAppendingUpdates[K, V]] = 
    new GrowableBuilder(empty[K, V])
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.