Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow passing arrays for repeated arguments #11024

Closed
allanrenucci opened this issue Jul 16, 2018 · 23 comments
Closed

Disallow passing arrays for repeated arguments #11024

allanrenucci opened this issue Jul 16, 2018 · 23 comments

Comments

@allanrenucci
Copy link

Seq will be an alias for immutable.Seq in 2.13. Therefore, you should not be able to pass an Array where a repeated argument is expected. Currently:

Welcome to Scala 2.13.0-M4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.

scala> import collection.{mutable, immutable}
import collection.{mutable, immutable}

scala> def foo(x: Int*) = 1
foo: (x: Int*)Int

scala> foo(immutable.Seq(1, 2): _*)
res0: Int = 1

scala> foo(mutable.Seq(1, 2): _*)
                      ^
       error: type mismatch;
        found   : Seq[Int] (in scala.collection.mutable)
        required: Seq[Int] (in scala.collection.immutable)

scala> foo(Array(1, 2): _*)
res2: Int = 1

This raises the question: What do we do for Java defined methods? Should we still allow passing an Array? E.g.

def foo(xs: Array[String]) = java.nio.file.Paths.get("Hello", xs: _*)

It would be nice to not have a special case for java defined methods, but this might incur a performance cost as you would need to wrap/copy the Array into a immutable.ArraySeq when the compiler will have to convert back to an Array anyway.

Note: sequence literals would not concerned by this change

@adriaanm
Copy link
Contributor

adriaanm commented Jul 16, 2018

@adriaanm adriaanm added this to the 2.13.0-M5 milestone Jul 16, 2018
@Ichoran
Copy link

Ichoran commented Jul 16, 2018

There is no particular reason that A* has to mean immutable.Seq[A]. It probably shouldn't, since that would prevent efficient passing of mutable collections to varargs, and efficiency is one of the main points of mutable collections.

@sjrd
Copy link
Member

sjrd commented Jul 16, 2018

Are you suggesting that varargs should be specified as scala.collection.Seq instead of Seq (which now means scala.collection.immutable.Seq)?

@Ichoran
Copy link

Ichoran commented Jul 16, 2018

@sjrd - Yes, because doing otherwise will clobber performance of varargs for mutable collections; the "workaround" involves users writing a whole set of new collection.mutable.Seq or other methods that don't require rebuilding before being passed in. For Array itself it's fine, as immutable.ArraySeq presumably has about the same overhead as mutable.WrappedArray. But for anything else it's an issue.

@xuwei-k
Copy link

xuwei-k commented Jul 17, 2018

Welcome to Scala 2.13.0-M4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172).
Type in expressions for evaluation. Or try :help.

scala> def foo(xs: Int*): Seq[Int] = xs
foo: (xs: Int*)Seq[Int]

scala> val array = Array(1, 2, 3)
array: Array[Int] = Array(1, 2, 3)

scala> val seq = foo(array : _*)
seq: Seq[Int] = ArraySeq(1, 2, 3)

scala> array(0) = 7

scala> seq // not immutable !!!
res1: Seq[Int] = ArraySeq(7, 2, 3)

😱

@dwijnand
Copy link
Member

ummutable

@Jasper-M
Copy link
Member

Jasper-M commented Jul 17, 2018

Why do we special case arrays and not just depend on the existing implicit conversions of Array[x] => Seq[x]? An Array is not a Seq...

Take String for another example:

scala> def foo(xs: Char*): Seq[Char] = xs
foo: (xs: Char*)Seq[Char]

scala> foo("asf" :_*) //print
   foo(((scala.Predef.wrapString("asf")): _*)) // : Seq[Char]

No special compiler magic required...

@SethTisue
Copy link
Member

There is no particular reason that A* has to mean immutable.Seq[A]

I think there is a goal to have the language spec refer only to scala.* types. (And in 2.13, of course, scala.Seq is an alias to scala.collection.immutable.Seq.)

@sjrd
Copy link
Member

sjrd commented Jul 17, 2018

I think there is a goal to have the language spec refer only to scala.* types.

Is there? What about scala.reflect.ClassTag, for example? Anyway, we could side-step this problem by introducing type VarArgs[+A] = scala.collection.Seq[A] in scala.

@lihaoyi
Copy link

lihaoyi commented Jul 18, 2018

Is there? What about scala.reflect.ClassTag, for example? Anyway, we could side-step this problem by introducing type VarArgs[+A] = scala.collection.Seq[A] in scala.

Removing the ability to pass in mutable sequences through varargs as read-only sequences is a non-trivial change with significant consequences performance of mutable sequences: forcing people to copy their collections at a time when they most want to avoid additional overhead.

Not to say we definitely shouldn't do it, but we definitely shouldn't do it "just because", especially if we can satisfy our requirements just by adding a trivial type alias.

@szeiger
Copy link
Member

szeiger commented Jul 18, 2018

Treating Java varargs as effectively immutable was a conscious decision, allowing to pass arrays from Scala code was not. Uncurry does the necessary adaptation but I must have missed how it gets through typer in the first place. I expected that it would use an implicit conversion, in this case the deprecated scala.Predef.copyArrayToImmutableIndexedSeq.

@allanrenucci
Copy link
Author

I'll quote @odersky's comment in scala/scala3#4787

On further discussion, I think it's preferable to have automatic conversions for varargs. We already have a conversion from Seq to Array when passing a Seq argument to a Java vararg. We should have a dual conversion from Array to Seq when passing an Array argument to a Scala vararg. I believe the Array-> Seq conversion should make a defensive copy of the array in order to prevent mutations leaking into the vararg.

@szeiger
Copy link
Member

szeiger commented Jul 19, 2018

@allanrenucci I agree. That was the intention with copyArrayToImmutableIndexedSeq. I'll try to figure out why it's not used.

@xuwei-k
Copy link

xuwei-k commented Jul 22, 2018

class A {
  def foo1(xs: Int*): Seq[Int] = xs
  val seq1 = foo1(Array(1, 2, 3) : _*)

  def foo2(xs: String*): Seq[String] = xs
  val seq2 = foo2(Array("a", "b", "c") : _*)
}

compile with -Xprint:jvm (latest 2.13.x) ↓

A.this.seq1 = A.this.foo1(scala.runtime.ScalaRunTime.wrapIntArray(Array[Int]{1, 2, 3}));
A.this.seq2 = A.this.foo2(scala.runtime.ScalaRunTime.wrapRefArray(Array[String]{"a", "b", "c"}.$asInstanceOf[Array[Object]]().$asInstance

Scala 2.13.x use scala.runtime.ScalaRunTime#wrap***Array methods. These methods are unsafe.

@xuwei-k
Copy link

xuwei-k commented Jul 23, 2018

BTW, unapplySeq should work scala.collection.Seq or only scala.collection.immutable.Seq ?
scala/scala#6962 (comment)

@szeiger
Copy link
Member

szeiger commented Jul 24, 2018

It looks like varargs do not really have an expected collection type. The relevant part of typer that handles _* arguments is:

      def typedTyped(tree: Typed) = {
        if (treeInfo isWildcardStarType tree.tpt)
          typedStarInPattern(tree, mode.onlySticky, pt)

And typedStarInPattern allows both Array and Seq. Changing it at this point would require changes to how overload resolution interacts with varargs because it is possible to have mixed overloads with Java and Scala varargs of the same base type in the same position.

I'll change the varargs adaptation in Uncurry intead.

@Ichoran
Copy link

Ichoran commented Jul 24, 2018

Do we have any evidence that this is an extant problem that is causing significant harm?

Performance regressions do cause harm. At the very least we should benchmark before/after to get an idea of how bad it is.

If this is "fixed" and as a consequence people end up writing a bunch of

  def foo(xs: collection.Seq[A]): Foo = /* actual code */
  def foo(xs: Array[A]): Foo = foo(ArraySeq.unsafeWrapArray(xs))
  def foo(xs: A*): Foo = foo(xs)

I don't think we've made things better.

@szeiger
Copy link
Member

szeiger commented Jul 24, 2018

Performance regressions do cause harm. At the very least we should benchmark before/after to get an idea of how bad it is.

We have the same situation in Predef.copyArrayToImmutableIndexedSeq which is deprecated so it prints a warning when you accidentally copy an array into an immutable.Seq. I'm adding a similar deprecation message for the automatic adaptation.

I don't understand the purpose of your code example. If the first method takes a collection.Seq there is no need for the second method. We have non-deprecated non-copying implicit conversions for this. I wouldn't define the third method in this way (i.e. with the same name but taking varargs instead of an Array or Seq) but it's certainly possible and it can call the first method directly without any copying or use of the second method.

Calling foo((xs: Array[A]): _*) will create a defensive copy and result in a deprecation warning but there is no need to do that because you can call the non-varargs version instead (foo((xs: Array[A]))). Passing individual values (i.e. without the use of : _*) is unaffected and will not copy the array.

@Ichoran
Copy link

Ichoran commented Jul 24, 2018

The second method was anticipating a more complicated scenario than makes sense to describe here; just ignore it!

So the workaround for arrays is to replace (xs: _*) for arrays with (immutable.ArraySeq.unsafeWrapArray(xs)), correct? And there is no workaround for collection.mutable.Seq (aside from hoping that the API
will be changed to have the pair of methods I wrote above)?

@szeiger
Copy link
Member

szeiger commented Jul 24, 2018

Right, if you want to support mutable Seqs and varargs you need two separate methods since M4. Defensive copying of explicit array values won't make this any harder. If you only have a varargs method you'll need to use unsafeWrapArray to avoid copying, same as for non-varargs uses of immutable.Seq.

@Ichoran
Copy link

Ichoran commented Jul 24, 2018

Guess I missed it on M4, then. Was that ever benchmarked?

@szeiger
Copy link
Member

szeiger commented Jul 24, 2018

I can't think of a good use case that could be benchmarked. The recommendation for migrating to 2.13 is to use collection.Seq instead of scala.Seq or varargs if you want to support mutable collections (and arrays). There is no performance penalty in this case.

If you keep using scala.Seq or varargs and thus end up with copies of arrays (and the associated deprecation warnings) the performance impact could be anywhere from negligible to severe depending on your code.

@Ichoran
Copy link

Ichoran commented Jul 24, 2018

The problem is that historically there was no reason to create both methods. But now if I write

val myThing = ThatOtherLibrary.create()
myThing.varArgMethod(xs: _*)

then I am the one who gets the deprecation warning, but ThatOtherLibrary is where the changes have to occur. There's no, "Warning, users of your library may find your API to have reduced performance as they will now have to copy mutable collections to pass them in to this method." when they go to port their library to 2.13.

So likely what will happen is they do a major version upgrade and then the users find that the API is broken for them (probably not right away, as performance testing usually only comes after people notice a problem). But ThatOtherLibrary can't maintain binary compatibility if they alter their API so they need another major version upgrade, after users notice, to fix the issue.

The general-purpose workaround is, I guess, (xs: _*) to (immutable.ArraySeq.unsafeWrapArray(xs.toArray): _*) in the case where xs is a mutable Seq, and either the same in the collection.Seq case or pattern-match out the immutables to pass directly.

It would be good to test this with something like

def sumLengths(xs: String*) = {
  var sum = 0
  val it = xs.iterator()
  while (it.hasNext) sum += it.next().length;
  sum
}

before and after. My quick and dirty experiments with going through toArray/WrappedArray on ArrayBuffer in 2.12 suggests that the slowdown is about 2-2.5x.

I'm sorry I didn't notice before, but this seems large enough to me to warrant further attention.

The effect of the change is to make what was previously mostly a syntactic choice to one that has substantial performance consequences.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants