This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Add lazyZip operation (formerly zipWith) #223

Merged
merged 1 commit into from
Oct 30, 2017

marcelocenerine
Contributor

@marcelocenerine marcelocenerine commented Sep 2, 2017

Changes made for #221:

  • Add zipWith method to collection.IterableOps and collection.Iterator
  • Introduce view ZipWith
  • Override zipWith method in ImmutableArray and LazyList to be consistent with the zip operation
  • Overload zipWith method in collection.ArrayOps and collection.SortedSet to be consistent with the zip operation
  • Introduce benchmarks for zipWith, zip and zipWithIndex (the latter two were missing)

It was not clear to me how the unit tests are organized. I wrote a bunch of them for the main collection types in both immutable and mutable packages to make sure my implementation works but did not include them in this PR.

Contributor

@julienrf julienrf left a comment

What’s the impact on performance if we implement zip in terms of zipWith, to avoid some code duplication?

@marcelocenerine
Contributor Author

marcelocenerine commented Sep 2, 2017

I was doing that at this very moment :D. It's gonna take some time until the benchmarks complete. I'll keep you posted

@EPronovost
Contributor

Oh well, I implemented this as well and literally wrote the exact same code. Nice work!
EPronovost@24049b5

@marcelocenerine
Contributor Author

marcelocenerine commented Sep 2, 2017

Note:

  • zip == existing implementation
  • zip2 == zip implemented in terms of zipWith

zip_vs_zip2

Benchmark               (size)  Mode  Cnt      Score      Error  Units
HashSetBenchmark.zip         0  avgt   12     19.750 ±    1.138  ns/op
HashSetBenchmark.zip         1  avgt   12     65.216 ±    3.590  ns/op
HashSetBenchmark.zip         2  avgt   12    150.916 ±    8.569  ns/op
HashSetBenchmark.zip         3  avgt   12    223.390 ±    3.237  ns/op
HashSetBenchmark.zip         4  avgt   12    321.306 ±    3.789  ns/op
HashSetBenchmark.zip         7  avgt   12    592.327 ±   17.823  ns/op
HashSetBenchmark.zip         8  avgt   12    682.168 ±   35.707  ns/op
HashSetBenchmark.zip        15  avgt   12   1280.184 ±   70.554  ns/op
HashSetBenchmark.zip        16  avgt   12   1393.507 ±   48.281  ns/op
HashSetBenchmark.zip        17  avgt   12   1496.845 ±   68.302  ns/op
HashSetBenchmark.zip        39  avgt   12   4762.469 ±  295.681  ns/op
HashSetBenchmark.zip       282  avgt   12  44191.198 ± 2168.640  ns/op

HashSetBenchmark.zip2        0  avgt   12     20.369 ±    0.590  ns/op
HashSetBenchmark.zip2        1  avgt   12     60.988 ±    1.465  ns/op
HashSetBenchmark.zip2        2  avgt   12    147.952 ±    5.945  ns/op
HashSetBenchmark.zip2        3  avgt   12    236.831 ±    4.997  ns/op
HashSetBenchmark.zip2        4  avgt   12    335.372 ±   14.477  ns/op
HashSetBenchmark.zip2        7  avgt   12    551.376 ±    7.048  ns/op
HashSetBenchmark.zip2        8  avgt   12    651.141 ±   20.254  ns/op
HashSetBenchmark.zip2       15  avgt   12   1251.230 ±   36.530  ns/op
HashSetBenchmark.zip2       16  avgt   12   1436.768 ±   38.667  ns/op
HashSetBenchmark.zip2       17  avgt   12   1532.003 ±   20.423  ns/op
HashSetBenchmark.zip2       39  avgt   12   4728.074 ±  174.185  ns/op
HashSetBenchmark.zip2      282  avgt   12  44729.714 ± 1118.149  ns/op

Benchmark               (size)  Mode  Cnt      Score      Error  Units
ListBenchmark.zip            0  avgt   12     92.196 ±    3.679  ns/op
ListBenchmark.zip            1  avgt   12    103.485 ±    1.883  ns/op
ListBenchmark.zip            2  avgt   12    162.533 ±   12.149  ns/op
ListBenchmark.zip            3  avgt   12    193.258 ±   14.573  ns/op
ListBenchmark.zip            4  avgt   12    208.044 ±    8.488  ns/op
ListBenchmark.zip            7  avgt   12    304.174 ±    8.090  ns/op
ListBenchmark.zip            8  avgt   12    276.759 ±    6.379  ns/op
ListBenchmark.zip           15  avgt   12    536.954 ±   14.958  ns/op
ListBenchmark.zip           16  avgt   12    421.065 ±   12.324  ns/op
ListBenchmark.zip           17  avgt   12    383.749 ±    7.729  ns/op
ListBenchmark.zip           39  avgt   12    809.668 ±   18.085  ns/op
ListBenchmark.zip          282  avgt   12   5699.023 ±  100.800  ns/op

ListBenchmark.zip2           0  avgt   12     90.923 ±    1.341  ns/op
ListBenchmark.zip2           1  avgt   12    103.967 ±    1.503  ns/op
ListBenchmark.zip2           2  avgt   12    130.143 ±    3.979  ns/op
ListBenchmark.zip2           3  avgt   12    153.424 ±    7.152  ns/op
ListBenchmark.zip2           4  avgt   12    178.389 ±    2.642  ns/op
ListBenchmark.zip2           7  avgt   12    246.142 ±    4.817  ns/op
ListBenchmark.zip2           8  avgt   12    276.040 ±    7.982  ns/op
ListBenchmark.zip2          15  avgt   12    429.587 ±   17.765  ns/op
ListBenchmark.zip2          16  avgt   12    373.032 ±    4.830  ns/op
ListBenchmark.zip2          17  avgt   12    388.918 ±    6.691  ns/op
ListBenchmark.zip2          39  avgt   12    983.104 ±   31.814  ns/op
ListBenchmark.zip2         282  avgt   12   5705.923 ±  104.434  ns/op

Benchmark               (size)  Mode  Cnt      Score      Error  Units
LazyListBenchmark.zip        0  avgt   12      3.744 ±    0.084  ns/op
LazyListBenchmark.zip        1  avgt   12     14.669 ±    0.444  ns/op
LazyListBenchmark.zip        2  avgt   12     15.487 ±    0.673  ns/op
LazyListBenchmark.zip        3  avgt   12     19.166 ±    3.590  ns/op
LazyListBenchmark.zip        4  avgt   12     17.731 ±    1.842  ns/op
LazyListBenchmark.zip        7  avgt   12     16.486 ±    2.188  ns/op
LazyListBenchmark.zip        8  avgt   12     16.813 ±    2.053  ns/op
LazyListBenchmark.zip       15  avgt   12     15.074 ±    0.599  ns/op
LazyListBenchmark.zip       16  avgt   12     15.743 ±    0.579  ns/op
LazyListBenchmark.zip       17  avgt   12     15.599 ±    0.696  ns/op
LazyListBenchmark.zip       39  avgt   12     17.200 ±    2.322  ns/op
LazyListBenchmark.zip      282  avgt   12     16.031 ±    1.493  ns/op

LazyListBenchmark.zip2       0  avgt   12      4.113 ±    0.097  ns/op
LazyListBenchmark.zip2       1  avgt   12     17.683 ±    1.195  ns/op
LazyListBenchmark.zip2       2  avgt   12     16.369 ±    0.315  ns/op
LazyListBenchmark.zip2       3  avgt   12     16.265 ±    0.257  ns/op
LazyListBenchmark.zip2       4  avgt   12     16.310 ±    0.445  ns/op
LazyListBenchmark.zip2       7  avgt   12     17.288 ±    0.496  ns/op
LazyListBenchmark.zip2       8  avgt   12     17.595 ±    0.359  ns/op
LazyListBenchmark.zip2      15  avgt   12     17.463 ±    0.899  ns/op
LazyListBenchmark.zip2      16  avgt   12     17.239 ±    0.703  ns/op
LazyListBenchmark.zip2      17  avgt   12     17.149 ±    0.268  ns/op
LazyListBenchmark.zip2      39  avgt   12     17.236 ±    0.349  ns/op
LazyListBenchmark.zip2     282  avgt   12     17.160 ±    0.734  ns/op

@julienrf, the results look pretty much the same. I have the changes on a separate branch: marcelocenerine/collection-strawman@master...marcelocenerine:zip_in_terms_of_zipWith. Shall I apply them to this PR?
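
For readers who want to see the shape of the zip2 variant, here is a minimal stand-alone sketch. It is not the code from the branch above: `ZipViaZipWith` and its free-standing `zipWith`/`zip` functions are hypothetical names that work on plain Scala 2.13 `Iterable`s and always build a `List`, purely to illustrate the idea.

```scala
object ZipViaZipWith {
  // Hypothetical stand-alone zipWith: walks both inputs in lockstep,
  // applies f to each pair of elements, and stops at the shorter input.
  def zipWith[A, B, R](xs: Iterable[A], ys: Iterable[B])(f: (A, B) => R): List[R] = {
    val left = xs.iterator
    val right = ys.iterator
    val out = List.newBuilder[R]
    while (left.hasNext && right.hasNext) out += f(left.next(), right.next())
    out.result()
  }

  // zip is zipWith with the tuple constructor as the combining function.
  def zip[A, B](xs: Iterable[A], ys: Iterable[B]): List[(A, B)] =
    zipWith(xs, ys)((_, _))
}
```

With this sketch, `ZipViaZipWith.zip(List(1, 2, 3), List("a", "b"))` yields `List((1, "a"), (2, "b"))`.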

@sjrd
Member

sjrd commented Sep 2, 2017

IMO zipWith is a terrible name for this operation. Naming-wise, it's basically equivalent to zip. Two reasonable names, IMO, would be mapWith (because you are mapping this collection together with another one) or zipMap (because, obviously, it's a zip followed by a map).

@marcelocenerine
Contributor Author

zipWith would be consistent with Haskell though

@sjrd
Member

sjrd commented Sep 2, 2017

In Haskell it makes more sense because the mapping function comes first (due to currying conventions), which means that zipWith (+) xs ys reads as "zip with operation + the lists xs and ys". It also makes sense because you can define let zip = zipWith (,) (basically; I'm not sure of my Haskell syntax).

In Scala, however, xs.zipWith(ys)(_ + _) reads as "zip xs with ys, and some operation". But that would also be the reading of xs.zip(ys).

Precedents in Scala that would match Haskell's way of thinking use By: sortBy, groupBy, etc. So I guess zipBy would satisfy the criterion of consistency with the rest of Scala, but I'm not sure it makes much sense as a reading on its own.

@Ichoran
Contributor

Ichoran commented Sep 2, 2017

I think we need to call it zipTo. The problem with zipWith is that it evokes zipWithIndex which suggests that zipWithIndex is a specialized case of zipWith, which it's not.

@smarter
Member

smarter commented Sep 3, 2017

The problem with zipWith is that it evokes zipWithIndex which suggests that zipWithIndex is a specialized case of zipWith, which it's not.

What do you mean by that? In Haskell zipWithIndex can be implemented as a special case of zip (which is a special case of zipWith):

zipWithIndex xs = zip xs [0..((length xs) - 1)]
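
The same observation can be sketched in Scala 2.13 against the standard library (not the strawman collections). `ZipWithIndexViaZip` is a hypothetical name; the sketch zips against an unbounded lazy sequence of indices and relies on `zip` stopping at the shorter side.

```scala
object ZipWithIndexViaZip {
  // LazyList.from(0) is 0, 1, 2, ... evaluated on demand, so only as many
  // indices are produced as there are elements in xs.
  def zipWithIndexViaZip[A](xs: List[A]): List[(A, Int)] =
    xs.zip(LazyList.from(0))
}
```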

@sjrd
Member

sjrd commented Sep 3, 2017

The With in zipWithIndex does not mean the same thing as the With in zipWith. The former introduces the rhs collection; the latter introduces the mapping function. At the end of the day, it's similar to my earlier argument: the reading of zipWith in Scala is such that you expect it to introduce the rhs collection, not the mapping function.

@ritschwumm

If you call this zipBy, what would you call a function like this? Or does that already exist under a different name?

trait Coll[A] {
  def zipBy[B](f:A=>B):Coll[(A,B)] = this map { it => (it, f(it)) }
}

@julienrf
Contributor

julienrf commented Sep 4, 2017

Thanks @marcelocenerine for running the benchmarks :) For me it makes sense to implement zip in terms of zipWith.

About the name… So far the proposals have been the following:

xs.zipTo(ys)(_ + _)
xs.zipBy(ys)(_ + _)
xs.zipMap(ys)(_ + _)
xs.zipWith(ys)(_ + _)
xs.mapWith(ys)(_ + _)

It’s unfortunate that the “with” doesn’t introduce the same thing as in Haskell. Therefore I don’t see a strong reason to follow the same naming convention. However the name zipWith is already established in other Scala libraries (e.g. cats, scalaz, akka-stream). Last but not least, the Scala standard library already has zipWith for Future (here).

@marcelocenerine
Contributor Author

Sounds good, @julienrf. I cherry-picked the commit and updated this PR

@@ -172,12 +172,13 @@ sealed abstract class LazyList[+A]
else prefix.lazyAppendAll(nonEmptyPrefix.tail.flatMap(f))
}

- override final def zip[B](xs: collection.Iterable[B]): LazyList[(A, B)] =
+ override final def zip[B](xs: collection.Iterable[B]): LazyList[(A, B)] = zipWith(xs)((_, _))

Contributor

Why do we need to override it here?

Contributor Author

good catch! After rewriting zip in terms of zipWith, there's no need to override zip here.

@@ -76,14 +76,16 @@ class ImmutableArray[+A] private[collection] (private val elements: scala.Array[
ImmutableArray.fromIterable(View.Concat(xs, toIterable))
}

- override def zip[B](xs: collection.Iterable[B]): ImmutableArray[(A, B)] =
+ override def zip[B](xs: collection.Iterable[B]): ImmutableArray[(A, B)] = zipWith(xs)((_, _))

Contributor

Why is this override necessary?

Contributor Author

same here

@julienrf
Contributor

julienrf commented Sep 4, 2017

Thanks @marcelocenerine. Could you add some tests too? (in the test/junit project)

@marcelocenerine
Contributor Author

@julienrf, my apologies for taking this long to reply to your review comments (I haven't had access to my laptop for a few days). I just pushed a new commit addressing them.

Regarding the method name: ReactiveX (see rxscala) is another library where zipWith is already established.

@marcelocenerine marcelocenerine changed the title from "#221 Add zipWith operation" to "Add zipWith operation" on Sep 11, 2017

Contributor

@julienrf julienrf left a comment

Thanks a lot @marcelocenerine! Overall that looks great, I’ve left a few comments, mainly about documentation or typos.

Could you also fix the merge conflicts?

@@ -835,9 +835,24 @@ trait IterableOps[+A, +CC[X], +C] extends Any {
* corresponding elements of this $coll and `that`. The length
* of the returned collection is the minimum of the lengths of this $coll and `that`.
*/
- def zip[B](xs: Iterable[B]): CC[(A @uncheckedVariance, B)] = fromIterable(View.Zip(toIterable, xs))
+ def zip[B](xs: Iterable[B]): CC[(A @uncheckedVariance, B)] = zipWith(xs)((_, _))

// sound bcs of VarianceNote

Contributor

This comment is really related to the @uncheckedVariance above, you should not insert a new line in between them.

Contributor Author

ooops....will do

* @param f The function to apply to each pair of elements
* @tparam B the type of the elements in the second half of the combined pairs
* @tparam R the type of the elements in the resulting collection
* @return a new collection of type `That` containing the results of applying the given function `f`

Contributor

We should not use That, that’s a leftover from the standard collections based on CanBuildFrom :)

We should just say “a new $Coll containing …”

Contributor

Could you also fix zip’s scaladoc at the same time?

Contributor Author

@marcelocenerine marcelocenerine Sep 13, 2017

that's good to know. I was in doubt about it. I guess you meant $Coll with lowercase c, right?

sure, I can ;)

Contributor

Yeah, $coll with lowercase c is better, you’re right.

* to each pair of corresponding elements of this $coll and `that`. The length
* of the returned collection is the minimum of the lengths of this $coll and `that`.
*/
def zipWith[B, R](xs: Iterable[B])(f: (A, B) => R): CC[R] = fromIterable(View.ZipWith(toIterable, xs, f))

Contributor

Could you rename the xs parameter to that, to match the scaladoc?

Contributor Author

will do

case class Zip[A, B](underlying: Iterable[A], other: Iterable[B]) extends View[(A, B)] {
def iterator() = underlying.iterator().zip(other)
/** A view that generalizes the zip operation by applying a function to each pair of elements
* in the underlying collection and another collection or iterator.

Contributor

We should remove “or iterator” since this is no longer the case.

Contributor Author

will do

val array: ArrayOps[Int] = Array(1, 2, 3)
val result: Array[(Int, String)] = array.zip(List("a", "b", "c"))

assertArrayEquals(Array((1, "a"), (2, "b"), (3, "c")).asInstanceOf[Array[AnyRef]], result.asInstanceOf[Array[AnyRef]])

Contributor

Doesn’t it work if you omit the asInstanceOf casts?

Contributor Author

no, it does not compile:

overloaded method value assertArrayEquals with alternatives:
[error]   (x$1: Array[Long],x$2: Array[Long])Unit <and>
[error]   (x$1: Array[Int],x$2: Array[Int])Unit <and>
[error]   (x$1: Array[Short],x$2: Array[Short])Unit <and>
[error]   (x$1: Array[Char],x$2: Array[Char])Unit <and>
[error]   (x$1: Array[Byte],x$2: Array[Byte])Unit <and>
[error]   (x$1: Array[Object],x$2: Array[Object])Unit
[error]  cannot be applied to (Array[(Int, String)], Array[(Int, String)])
[error]     assertArrayEquals(Array((1, "a"), (2, "b"), (3, "c")), result)
[error]     ^

Contributor

What if you replace that with assertTrue(Array((1, "a"), ...).equals(result))?

Contributor Author

@marcelocenerine marcelocenerine Sep 13, 2017

Oh yeah, I can do this but with sameElements so that these asserts become a bit shorter.
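
A small sketch of why sameElements is the safer choice here, written against the Scala 2.13 standard library rather than the strawman ArrayOps (`ArrayZipCheck` is a hypothetical name). On the JVM, arrays inherit `equals` from `java.lang.Object`, so `equals` compares references rather than contents.

```scala
object ArrayZipCheck {
  val result: Array[(Int, String)] = Array(1, 2, 3).zip(List("a", "b", "c"))
  val expected: Array[(Int, String)] = Array((1, "a"), (2, "b"), (3, "c"))

  // Reference comparison: two distinct arrays are never equal this way,
  // even when they hold the same elements.
  val sameReference: Boolean = expected.equals(result)

  // Element-by-element comparison, which is what the test needs.
  val sameContents: Boolean = expected.sameElements(result)
}
```

Here `sameReference` is false while `sameContents` is true, so an `assertTrue` built on `equals` would fail for a correct `zip`.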

@@ -17,4 +17,60 @@ class TreeSetTest {
assertEquals(set, set drop Int.MinValue)

Contributor

I don’t think we need these tests since TreeSet inherits the default implementation, which is already tested in IterableViewLikeTest.

Contributor Author

@marcelocenerine marcelocenerine Sep 13, 2017

TreeSet has overloaded zip and zipWith methods inherited from SortedSet. These tests are testing the overloaded methods.

Contributor

Oh yes, that’s right.

@odersky
Contributor

odersky commented Sep 13, 2017

My proposal would be to find a way to have any additional methods sit somewhere where we can easily decide where to put them in the end. There is a case for each individual method, but there is also a clear case against adding dozens of new methods. The trick is to balance one against the other. This general comment applies to all additions of new methods.

In the case of zipWith I am particularly sceptical because

  • we already had it in early versions of Scala and removed it
  • it can be expressed as (xs, ys).zipped.map, which was the primary reason we removed it.

@julienrf
Contributor

julienrf commented Sep 19, 2017

it can be expressed as (xs, ys).zipped.map, which was the primary reason we removed it.

Wow, today I learned about (xs, ys).zipped! Indeed, we can achieve exactly the same result as with zipWith, and we can even do more because Tuple2Zipped also has filter, flatMap and a few others.

Maybe it’s just me but I really think the discoverability of this (xs, ys).zipped thing is very low: if you have two collections the only chance to know about this method is to go to the Tuple2 scaladoc page: http://scala-lang.org/api/current/scala/Tuple2.html. But if you have three collections there is no way to know about Tuple3Zipped.Ops#zipped unless you know this class name because there is no Tuple3 entry in the stdlib scaladoc.

One way to make things more discoverable would be to make zipped an actual operation on the collections (instead of a decorator on tuples). We could maybe name it zipView or viewWith to emphasize the fact that it doesn’t consume the elements of the collections: xs viewWith ys.

@julienrf
Contributor

@marcelocenerine @Ichoran @szeiger Do you have an opinion about the viewWith operation described above? I discussed with @odersky and he told me that he thinks we should keep one way of doing things, (xs, ys).zipped, and simply improve its discoverability by defining a link to this method in zip and zipWithIndex scaladoc. I do think that it would be better to move to viewWith instead and deprecate zipped, because I think it would be even better to have a viewWith entry in the scaladoc of the collections and because decorators (which is the case of zipped) don’t play well with type inference.

@odersky
Contributor

odersky commented Sep 20, 2017

The problem with zipWith, viewWith etc is that

  • it is only binary
  • it can only do map, not flatMap, filter, or foreach.

zipped is a lot more flexible. I don't see a reason why we should give that up!

@julienrf
Contributor

julienrf commented Sep 20, 2017

No, sorry I wasn’t clear:

(xs viewWith ys).map(binaryFunction) == (xs, ys).zipped.map(binaryFunction)
(xs viewWith ys).forall(binaryPredicate) == (xs, ys).zipped.forall(binaryPredicate)
(xs viewWith ys viewWith zs).filter(ternaryPredicate) == (xs, ys, zs).zipped.filter(ternaryPredicate)

We would support flatMap, filter, etc. and we could chain another viewWith call to support arity 3. Thus, we would have the same flexibility, but with an increased discoverability and potentially better type error messages.
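
As a rough illustration of the operations listed above, the sketch below uses `lazyZip`, the name this operation has in the Scala 2.13 standard library, in place of the proposed viewWith. It assumes Scala 2.13 rather than the strawman collections of the time; `LazyZipOperations` and its values are hypothetical.

```scala
object LazyZipOperations {
  val xs = List(1, 2, 3)
  val ys = List(10, 20, 30)
  val zs = List(100, 200, 300)

  // map with a binary function; no intermediate collection of pairs is built.
  val sums: List[Int] = xs.lazyZip(ys).map(_ + _)

  // forall with a binary predicate.
  val allIncreasing: Boolean = xs.lazyZip(ys).forall(_ < _)

  // Chaining a second lazyZip gives arity 3; filter takes a ternary
  // predicate and returns the surviving elements as triples.
  val kept: List[(Int, Int, Int)] =
    xs.lazyZip(ys).lazyZip(zs).filter((x, y, z) => x + y + z > 200)
}
```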

@odersky
Contributor

odersky commented Sep 20, 2017

@julienrf Ah, I misunderstood indeed. But then there's still the problem that it's binary only. I have used zipped also for 3 and 4 tuples.

@Ichoran
Contributor

Ichoran commented Sep 20, 2017

I prefer staying with zipped because (1) it's already there, and (2) we can define it for as many arities as we want. If it produces a proper view, though, that would be a nice improvement.

@julienrf
Contributor

(1) it's already there

My point is that it’s hard to know that it exists and that a viewWith (or whatever its name) method would be easier to work with.

(2) we can define it for as many arities as we want

viewWith can do that too:

(xs1 viewWith xs2 viewWith xs3 viewWith xs4).filter((x1, x2, x3, x4) => x1 == x2 && x3 != x4)
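
The four-way chain above can be tried with `lazyZip` in the Scala 2.13 standard library, where chaining stops at arity 4. This is a sketch under that assumption; `LazyZipFourWay` and the sample data are hypothetical.

```scala
object LazyZipFourWay {
  val xs1 = List(1, 2, 3)
  val xs2 = List(1, 5, 3)
  val xs3 = List("a", "b", "c")
  val xs4 = List("a", "x", "y")

  // The predicate receives the four corresponding elements as separate
  // parameters; the result is a collection of 4-tuples.
  val kept: List[(Int, Int, String, String)] =
    xs1.lazyZip(xs2).lazyZip(xs3).lazyZip(xs4)
      .filter((x1, x2, x3, x4) => x1 == x2 && x3 != x4)
}
```

Only the third position satisfies both conditions with this data, so `kept` is `List((3, 3, "c", "y"))`.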

def transform_zipWithIndex(bh: Blackhole): Unit = bh.consume(xs.zipWithIndex)

@Benchmark
def transform_lazyZip(bh: Blackhole): Unit = bh.consume(xs.lazyZip(xs).map((_, _)))

Contributor

Could you also add more benchmarks to compare xs.lazyZip(ys).map(f) with xs.zip(ys).map(f.tupled)?

* {{{
* val xs = $Coll(1, 2, 3)
* val res = (xs lazyZip xs lazyZip xs lazyZip xs).map((a, b, c, d) => a + b + c + d)
* // res == $Col(4, 8, 12)

Contributor

Typo: $Coll

Contributor

Or maybe you should just use List here.

* @return a decorator `Tuple2Zipped` that allows strict operations to be performed on the lazily evaluated pairs
* or chained calls to `lazyZip`. Implicit conversion to `Iterable[(A, B)]` is also supported.
*/
def lazyZip[B, C2[X] <: Iterable[X]](that: C2[B]): Tuple2Zipped[A, C1[A], B, C2[B]] = new Tuple2Zipped(`this`, that)

Contributor

By convention we use the CC identifier for Collection type Constructors, while C is used for Collection types. BTW, I’m wondering, here, what’s the advantage of using a collection type constructor over a collection type?

Contributor Author

I’m wondering, here, what’s the advantage of using a collection type constructor over a collection type?

scala> import strawman.collection._
import strawman.collection._

scala> import strawman.collection.immutable._
import strawman.collection.immutable._

scala> val xs: List[Int] = List(1, 2, 3)
xs: strawman.collection.immutable.List[Int] = List(1, 2, 3)

scala> xs.lazyZip(xs)
<console>:19: error: value lazyZip is not a member of strawman.collection.immutable.List[Int]
       xs.lazyZip(xs)
          ^

Using collection type in TupleNZipped.lazyZip:

scala> xs.lazyZip(xs).lazyZip(xs)
<console>:19: error: inferred type arguments [Nothing,strawman.collection.immutable.List[Int]] do not conform to method lazyZip's type parameter bounds [B,C3 <: strawman.collection.Iterable[B]]
       xs.lazyZip(xs).lazyZip(xs)
                      ^
<console>:19: error: type mismatch;
 found   : strawman.collection.immutable.List[Int]
 required: C3
       xs.lazyZip(xs).lazyZip(xs)
                              ^

I got confused with this one as well. I started with the signature suggested in one of your last comments, but then I faced these errors. This was pretty much the reason I used implicit conversions in the first iteration. Am I missing anything?

Contributor

Hmm ok I see. Thanks for the explanation!

def next() = (elems1.next(), elems2.next())
}

def className = getClass.getName

Contributor

@julienrf julienrf Oct 16, 2017

I’m not sure the class name Tuple2Zipped is very useful. What do you think of implementing toString as follows?

override def toString = s"$coll1.lazyZip($coll2)"

Contributor Author

ok, will do

object Tuple2Zipped {
implicit def tuple2ZippedToIterable[El1, C1 <: Iterable[El1],
El2, C2 <: Iterable[El2]](zipped2: Tuple2Zipped[El1, C1, El2, C2]): Iterable[(El1, El2)] =
new View[(El1, El2)] { def iterator() = zipped2.iterator() }

Contributor

Maybe we should be more explicit and also put View[(El1, El2)] as the static return type?

Contributor Author

ok, will do

def tuple4Zipped_empty(): Unit = {
assertTrue(zipped3.lazyZip(List.empty).isEmpty)
}
}

Contributor

Can you also check that TreeSet(1, 2, 3).lazyZip(List(4, 5, 6)).map((x, y) => x + y) returns a TreeSet?

And that TreeMap(1 -> "foo", 2 -> "bar").lazyZip(List("baz", "bah")).map { case ((k, v), s) => k -> (v ++ s) } also returns a TreeMap?
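
For the TreeSet half of this request, a sketch of the check against the Scala 2.13 standard library (not the strawman test suite) could look as follows. `LazyZipSortedResult` is a hypothetical name, and the TreeMap case is left out because its behaviour is exactly what is being questioned here.

```scala
import scala.collection.immutable.TreeSet

object LazyZipSortedResult {
  // The type annotation makes the compiler verify that map on the zipped
  // value rebuilds a TreeSet, driven by the type of the first collection.
  val summed: TreeSet[Int] =
    TreeSet(1, 2, 3).lazyZip(List(4, 5, 6)).map((x, y) => x + y)
}
```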

Contributor Author

it does not work well with TreeMap :(

Contributor Author

The issue here doesn't only concern TreeMap but Maps in general. I guess the reason is the same as for having overloaded methods in Map (e.g. map, flatMap, concat).

Having a more specific implicit conversion in the Map companion object may address this:

implicit class LazyZipOps[K, V, CC1[K, V] <: Map[K, V]](`this`: CC1[K, V]) {
  def lazyZip[B, CC2[B] <: Iterable[B]](that: CC2[B]): Tuple2Zipped[(K, V), CC1[K, V], B, CC2[B]] = new Tuple2Zipped(`this`, that)
}

Contributor

Yes, that’s a good idea.

Also, I think we don’t need to keep track of the precise type of the zipped collections: we just need C1 (to drive the type of the collection resulting from a transformation operation applied to the zipped collections), but we never need C2, C3 or C4.

So, I think we could simplify things as follows:

// TupleZipped.scala
final class Tuple2Zipped[El1, El2, C1 <: Iterable[El1]](coll1: C1, coll2: Iterable[El2]) { … }
final class Tuple3Zipped[El1, El2, El3, C1 <: Iterable[El1]](coll1: C1, coll2: Iterable[El2], coll3: Iterable[El3]) { … }
// same for Tuple4Zipped

class LazyZipOps[A, C1 <: Iterable[A]](`this`: C1) {
  def lazyZip[B](that: Iterable[B]): Tuple2Zipped[A, B, C1] = new Tuple2Zipped[A, B, C1](`this`, that)
}

// Iterable.scala
object Iterable {
  implicit def toLazyZipOps[CC[X] <: Iterable[X], A](it: CC[A]): LazyZipOps[A, CC[A]] = new LazyZipOps(it)
}

// Map.scala
object Map {
  implicit def toLazyZipOps[CC[X, Y] <: Iterable[(X, Y)], K, V](it: CC[K, V]): LazyZipOps[(K, V), CC[K, V]] = new LazyZipOps(it)
}

Contributor Author

yeah, I was considering doing that too. The collection types were needed in previous iterations when filter was equivalent to (xs zip ys).filter(f.tupled).unzip, but now I can get rid of them

Contributor

@julienrf julienrf left a comment

That looks great, thanks! I’ve left some comments.

@julienrf
Contributor

Hey @marcelocenerine what’s your status on this work? This really looks great but needs a few tweaks. If you are busy with other things I can try to finish it. Just let me know what works best for you :)

@marcelocenerine
Contributor Author

Hi @julienrf , sorry for the delay. I was away last week and didn't find time to address all the comments yet.
I have almost everything sorted. The only things left are:

  • benchmarks to compare xs.lazyZip(ys).map(f) with xs.zip(ys).map(f.tupled)
  • make sure TreeMap(1 -> "foo", 2 -> "bar").lazyZip(List("baz", "bah")).map { case ((k, v), s) => k -> (v ++ s) } also returns a TreeMap

I left a comment regarding the latter. Please see what you think. I believe I can finish it and update the PR this evening. But if you need it earlier than that, please feel free to cherry pick my commit and work on top of that. I can push the changes I have made so far...

@julienrf
Contributor

OK, thanks for your detailed response! So, I will let you finish :)

@marcelocenerine
Contributor Author

@julienrf, all the review comments are now addressed. Many thanks for your suggestions.

I ended up renaming the TupleNZipped classes to LazyZipN. The naming in the Scala std library is likely influenced by the fact that TupleNZipped objects are created out of tuples, which is no longer the case in strawman. I think keeping the old name would be misleading, given that the operations available on the new classes don't create any tuples at all (except the implicit conversion to Iterable[(A, B)]). Maybe the scaladocs need some tweaks to make this clear. I'm not totally convinced that LazyZipN is a good name, so please let me know if you have a better suggestion or if you disagree with renaming them.

I'm running the benchmarks again and will post the results as soon as they complete.

Contributor

@julienrf julienrf left a comment

Awesome contribution @marcelocenerine, thanks a lot!

import scala.{Boolean, StringContext, Unit}
import scala.language.implicitConversions

final class LazyZipOps[A, C1 <: Iterable[A]] private[collection](`this`: C1) {

Contributor

We could even make it a value class (even though I bet the saving wouldn’t be significant at all)

Contributor Author

this is a very easy change, so I think it's worth doing regardless of how much we eventually save.

@marcelocenerine
Contributor Author

Please find attached the benchmark results comparing xs.lazyZip(ys).map(f) with xs.zip(ys).map(f.tupled)

@julienrf
Contributor

Thanks Marcelo.

The benchmark results are interesting. For example, for List, in the current collections xs.zip(ys).map(f) is faster than (xs, ys).zipped.map(f)… In the strawman that’s not the case, though (also, List operations are, in general, significantly slower in the strawman…). It is also interesting that, for ImmutableArray, if you have less than 1000 elements, xs.zip(ys).map(f) is faster than xs.lazyZip(ys).map(f).

@julienrf
Contributor

@marcelocenerine That looks great to me, could you please squash your commits into a single one so that I can merge the PR?

@marcelocenerine
Contributor Author

for ImmutableArray, if you have less than 1000 elements, xs.zip(ys).map(f) is faster than xs.lazyZip(ys).map(f)

I figured there was a problem there. The views returned by LazyZipN.map weren't overriding knownSize. Now xs.lazyZip(ys).map(f) is faster than xs.zip(ys).map(f) for ImmutableArray. Thanks for spotting that.

image
image

My commits are now squashed.


object LazyZip2 {
implicit def lazyZip2ToIterable[El1, El2](zipped2: LazyZip2[El1, El2, _]): View[(El1, El2)] =
new View[(El1, El2)] { def iterator() = zipped2.iterator() }

Contributor

You should also override knownSize here, then:

override def knownSize = zipped2.coll1.knownSize min zipped2.coll2.knownSize

And same for other arities.
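
To make the suggestion concrete, here is a hedged sketch of a zipped view that reports its size, written against the Scala 2.13 standard library, where `iterator` takes no parentheses (unlike the strawman `iterator()`). `ZippedPairsView` is a hypothetical stand-in, not the class from this PR.

```scala
import scala.collection.View

// Hypothetical stand-in for the view returned by the implicit conversion.
final class ZippedPairsView[A, B](coll1: Iterable[A], coll2: Iterable[B])
    extends View[(A, B)] {

  def iterator: Iterator[(A, B)] = coll1.iterator.zip(coll2.iterator)

  // knownSize is -1 when a collection cannot report its size cheaply.
  // Taking the minimum propagates that -1, so the view claims a size only
  // when both sides know theirs.
  override def knownSize: Int = coll1.knownSize min coll2.knownSize
}
```

A known size lets builders pre-allocate, which is plausibly the kind of effect behind the ImmutableArray numbers discussed above, though that link is an assumption rather than something this sketch measures.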

Contributor Author

oops. done

@julienrf
Contributor

Woohoo, thanks a lot Marcelo!

@julienrf julienrf merged commit 01c982a into scala:master Oct 30, 2017
@SethTisue SethTisue changed the title from "Add zipWith operation" to "Add lazyZip operation" on Aug 18, 2018
@SethTisue SethTisue changed the title from "Add lazyZip operation" to "Add lazyZip operation (formerly zipWith)" on Aug 18, 2018

8 participants