
Arithmetic overflow possible in IterableOnceOps.copyToArray #12795

Open
noresttherein opened this issue May 29, 2023 · 8 comments

@noresttherein

Reproduction steps

Scala version: 2.13.10

In IterableOnceOps:

  def copyToArray[B >: A](xs: Array[B], start: Int, len: Int): Int = {
    val it = iterator
    var i = start
    val end = start + math.min(len, xs.length - start)
    while (i < end && it.hasNext) {
      xs(i) = it.next()
      i += 1
    }
    i - start
  }

Problem

There are actually two overflows here. The first, in xs.length - start, is arguably not an issue: a negative start causes a negative end, and nothing is copied (instead of the exception that would have been thrown had the overflow not happened). The second, in start + math.min(len, ...), however, lies on a valid path:

copyToArray(new Array[Hamster](Int.MaxValue - 100), Int.MaxValue - 50, Int.MaxValue)
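For the record, the wrapping is easy to see in isolation. Below is a minimal standalone sketch (the object and method names are made up for illustration, not library code) of the end computation quoted above, fed values that make the Int arithmetic wrap:

```scala
// Hypothetical standalone copy of the `end` computation in
// IterableOnceOps.copyToArray; names are illustrative only.
object EndOverflowSketch {
  def end(xsLength: Int, start: Int, len: Int): Int =
    start + math.min(len, xsLength - start)

  def main(args: Array[String]): Unit = {
    // `xs.length - start` wraps when start is a large negative number:
    println(Int.MaxValue - Int.MinValue)        // wraps to -1
    // Here both subexpressions wrap, and `end` comes out as a plausible
    // positive bound (100, the array length) even though start is hugely
    // negative -- the loop then indexes the array at a negative position:
    println(end(100, Int.MinValue + 10, 50))    // prints 100
  }
}
```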

The 'new' collection framework is much better at avoiding overflows than the old one, where almost everything overflowed, but if there were a competition I'm pretty sure I could find a good dozen more. Iterator.drop in complex iterator implementations is particularly vulnerable, which is a bit of a bummer, because iterators actually can easily be used for collections with more than Int.MaxValue elements.

I'd like to use this opportunity to climb my soap box and say again that I don't like the exact semantics of out-of-range indices in copyToArray. Essentially, they are closely tied to this particular implementation, and troublesome to reproduce to a tee for all possible data sets. A negative start is allowed only if the total number of elements that would be copied, as determined by len, this.size, and xs.length, is zero or less. I think it would be cleaner if it were either always rejected with an exception or always accepted.
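To illustrate how acceptance of a negative start depends on the other arguments (behavior of the 2.13 default implementation quoted above):

```scala
// Whether a negative start throws depends on how many elements
// would actually be copied (Scala 2.13 default implementation):
val xs = new Array[Int](3)
List(1, 2, 3).copyToArray(xs, -1, 0)  // returns 0: nothing to copy, -1 is accepted
List(1, 2, 3).copyToArray(xs, -1, 2)  // throws ArrayIndexOutOfBoundsException
```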

While I can see the value of permissive indices in slicing operations, where we are dealing with bounds on collection sizes, using them in the context of specific indices in a sequence (as in indexOfSlice and some other issues I reported):

  1. doesn't seem to me to follow the principle of least surprise,
  2. is inconsistent with mutable collections, in which all mutating operations (as far as I can tell) reject all cases of an index out of range (see remove, insertAll, etc.),
  3. is not particularly useful. By contrast, if copyToArray always accepted a negative start and simply ignored the first -start elements of the collection, that would actually be quite useful.
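A sketch of that alternative semantics (entirely hypothetical, not stdlib behavior; specialized to Int to keep it short): a negative start skips the first -start source elements instead of throwing:

```scala
// Hypothetical permissive variant (NOT the stdlib semantics): a negative
// start skips the first -start elements of the source.
object PermissiveCopySketch {
  def copyPermissive(src: Iterator[Int], xs: Array[Int], start: Int, len: Int): Int = {
    val it = if (start < 0) src.drop(-start) else src
    var i = math.max(start, 0)  // first array slot written
    var copied = 0
    while (copied < len && i < xs.length && it.hasNext) {
      xs(i) = it.next()
      i += 1
      copied += 1
    }
    copied
  }
}
```

For example, with a 3-element array, copyPermissive(Iterator(1, 2, 3, 4), xs, -2, 10) skips 1 and 2 and copies 3 and 4 into slots 0 and 1, returning 2.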

The worst offender by far, though, in my eyes, is patch. Its policy of clipping the index at which the patch starts can easily lead to quiet bugs, with no beneficial use cases that I can see. If, as I suggested above, the indices were instead left unmodified, and the part of the elems (patch) collection falling outside the patched (this) collection were ignored, it would be both more logical and more useful.
Ehm, sorry.

@lrytz
Member

lrytz commented May 30, 2023

Would it be worth building an inventory of where things are in terms of out-of-bounds handling (relevant operations, what is accepted, which exceptions for rejections)? And agree on a doc / spec that we can follow? @julienrf do you know if there is anything existing?

climb my soap box

nice, til

@julienrf

I don’t think we have such a spec, except the scala-collection-laws maybe…

@som-snytt

climb my soap box

not to be confused with soap opera.

I don't think a bug in arithmetic warrants an existential crisis in the API. It's perfectly natural to minimize bounds checking and only fail "on demand".

Permissive inputs are well-established for xs.drop(100) (as a paradigm).

I'm more sympathetic to the argument about patch, because it seems to me that it should be "trivial" to calculate "what will be replaced." slice doesn't work because patch takes a count, not an end index.

scala> xs
val res17: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> val p = List(42, 27)
val p: List[Int] = List(42, 27)

scala> xs.patch(-11, p, 5)
val res18: List[Int] = List(42, 27, 6, 7, 8, 9, 10)

scala> xs.slice(-11, -6)
val res19: List[Int] = List()

scala> xs.slice(-11, 1000).take(5)
val res20: List[Int] = List(1, 2, 3, 4, 5)

Then the definition of patch is really res20 (n elements of the maximal slice at i) and not res19.

The case is stronger for patchInPlace, which ought to have similar behavior (except in-place). If buf.update throws, then buf.patchInPlace should throw for the same bad index.

It's mutating operations that throw instead of silently truncating, such as buf.remove(1000).
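A quick illustration of that split (2.13 behavior, using ArrayBuffer for the mutable side):

```scala
import scala.collection.mutable.ArrayBuffer

// The mutating operation rejects the out-of-range index...
val buf = ArrayBuffer(1, 2, 3)
// buf.remove(1000)                    // throws IndexOutOfBoundsException

// ...while the immutable patch silently clips it:
List(1, 2, 3).patch(1000, List(9), 1)  // List(1, 2, 3, 9)
```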

Perhaps I'm wrong that patch and patchInPlace ought to have the same policy on the from parameter, just because the method names are similar. Or perhaps I'm wrong that patchInPlace should throw like update, if "the elements to replace" is defined in terms of slice.

If patch means "patch slice", then the hypothetical patchRange(from, to) is the form that would throw (as well as patchRangeInPlace).

It would be nice if these behaviors were summarized in the collections overview, of course.

@SethTisue SethTisue added this to the Backlog milestone Aug 8, 2023
@SethTisue
Member

@scala/collections

@Ichoran

Ichoran commented Aug 8, 2023

I think Julien is correct that in general we don't have specifications for what happens when various indices are out of bounds. Treating everything as undefined behavior is unwise, because people will test for the behavior, then assume that the experimentally-determined behavior is the intended behavior, and then be surprised when some other collection of the same type behaves differently.

I tried to put some of what I thought were sensible invariants into collections-laws. I don't think I was comprehensive, and anyway, that just helps people working on the standard library avoid unexpected behavior; it still doesn't help users understand what to expect.

The most important principle is, I think, to make all the collections have the same behavior, with an occasional exception if absolutely critical for array performance. I don't think copyToArray is one of these exceptions.

Logically, I think copyToArray(xs, start, len) should be identical to

val it = iterator
var l = len
var s = start
while (l > 0 && it.hasNext) {
  xs(s) = it.next()
  s += 1
  l -= 1
}
len - l

That is, I agree that the asymmetry between start and end when you're specifying the max length is a bad thing. However, the docs say that the copy ends when the array does, and it's behavior that people may count on, and the return Int helps you figure out whether you didn't transfer the data you thought you would.

So I'd rewrite it like this:

  def copyToArray[B >: A](xs: Array[B], start: Int, len: Int): Int = 
    if (len <= 0) 0
    else {
      var i = start
      val it = iterator
      val end = if (xs.length - len <= start) xs.length else start + len
      while (i < end && it.hasNext) {
        xs(i) = it.next()
        i += 1
      }
      i - start
    }

This works because len is positive (the guard above) and xs.length is non-negative, so the subtraction stays in bounds; adding len to both sides turns the inequality into xs.length <= start + len, so the if-statement is just an overflow-safe min.

If start is negative, it doesn't matter to the logic; the loop will index the array at a negative position and throw an exception, as it does now.

As a bonus, in the case where len is zero or negative, and getting the iterator is not a no-op, the code is a touch faster.

(Note: code untested.)
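Since the code above is untested, here is a quick sanity-check sketch against a standalone Int specialization of it (a hypothetical harness, not library code):

```scala
object CopyToArrayCheck {
  // Standalone Int specialization of the overflow-safe rewrite above.
  def copy(src: Iterable[Int], xs: Array[Int], start: Int, len: Int): Int =
    if (len <= 0) 0
    else {
      var i = start
      val it = src.iterator
      // len > 0 here, so xs.length - len cannot wrap; this is a safe min.
      val end = if (xs.length - len <= start) xs.length else start + len
      while (i < end && it.hasNext) {
        xs(i) = it.next()
        i += 1
      }
      i - start
    }

  def main(args: Array[String]): Unit = {
    val xs = new Array[Int](5)
    assert(copy(List(1, 2, 3), xs, 1, 10) == 3)            // source runs out first
    assert(copy(List(1, 2, 3, 4), xs, 3, 10) == 2)         // array runs out first
    assert(copy(List(1, 2, 3), xs, 0, Int.MaxValue) == 3)  // huge len, no overflow
    assert(copy(List(1, 2, 3), xs, 2, -1) == 0)            // non-positive len
  }
}
```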

@Ichoran

Ichoran commented Aug 8, 2023

Specifically regarding patch, I think the behavior was chosen to be identical to, but more efficient than, that which you would get from the most obvious collections operations.

In particular, xs.patch(n, ys, m) is equivalent to xs.take(n) ++ ys ++ xs.drop(n).drop(m). The double-drop is weird until you notice that the natural way to split xs is the splitAt method: val (xsL, xsR) = xs.splitAt(n); xsL ++ ys ++ xsR.drop(m).

So in terms of coherence with the "natural" way to do things with higher-order methods, I think patch does the right thing. patchInPlace should match its behavior. If it doesn't, I would consider that a bug.
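That equivalence is easy to spot-check over small inputs (a quick property sketch, assuming 2.13 List semantics):

```scala
object PatchLawCheck {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
    val ys = List(42, 27)
    // xs.patch(n, ys, m) == xs.take(n) ++ ys ++ xs.drop(n).drop(m),
    // including out-of-range and negative start indices:
    for (n <- -12 to 12; m <- 0 to 12)
      assert(xs.patch(n, ys, m) == xs.take(n) ++ ys ++ xs.drop(n).drop(m))
  }
}
```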

@noresttherein
Author

noresttherein commented Aug 13, 2023 via email

@som-snytt

som-snytt commented Aug 13, 2023

Parameters have been renamed and annotated with deprecatedName.

The most important principle is, I think, to make all the collections have the same behavior

If I write x(i) = 42, I would expect similar semantics whether x is a mutable collection or an array.

I have the biggest issue with Buffer.remove.

At the moment, after a couple of months, I don't have a sense of what the priority issue is, in general or in your view.

Recently, I was looking at "value discard" warnings and noticed that remove(i) returns a value but remove(i, 1) does not return a slice. (It's not an exact replacement for reasons of efficiency. There's extra work to check the range.) So I agree that "uniformity of expectation" could be improved.
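For reference, a small illustration of that asymmetry (2.13 ArrayBuffer):

```scala
import scala.collection.mutable.ArrayBuffer

val buf = ArrayBuffer(1, 2, 3)
val removed: Int = buf.remove(0)  // remove(i) returns the removed element: 1
buf.remove(0, 1)                  // remove(i, count) returns Unit, not a slice
// buf is now ArrayBuffer(3)
```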

On parameters, index is sometimes idx. But especially Shrinkable.subtractOne(elem) becomes Buffer.subtractOne(x). I wonder if anyone has written the Scalafix rule to enforce parameter names in overrides.
