Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scala.collection.immutable.HashMap.merge does not handle collisions correctly #5879

Closed
scabug opened this issue Jun 4, 2012 · 4 comments

Comments

@scabug
Copy link

commented Jun 4, 2012

The merge function of scala.collections.immutable.HashMap can result in an inconsistent HashMap when there is a hash code collision. The easiest way to get this behavior is to merge two maps and defining a collision function.

But this also occasionally happens when no collision function is provided!

    import collection.immutable.HashMap
    val a = HashMap(1 -> "1")
    val b = HashMap(1 -> "2")
    def collision(a: (Int, String), b: (Int, String)) = {
      // shouldn't this be called with reversed order?
      println(a)
      a
    }
    val r = a.merge(b, collision)
    println(r)
    println(r(1))

Output:

    (1,2)
    Map(1 -> 2)
    1

Explanation:

HashMap1 caches both the value and a key/value pair. As a result of the merge the value field is inconsistent to the kv field. When printing the map, the kv field is used, so the result is correct. When calling apply, the value field is used, and the result is wrong.

Expected behavior:

HashMap1.value and HashMap1.kv should never be able to become inconsistent. To fix this in HashMap1.updated0:

      if (hash == this.hash && key == this.key ) {
        val newKv = if (merger eq null) kv else merger(this.kv, kv)
        new HashMap1(key, hash, newKv ._2, newKv)
      } else {
        ...

But there might be other places where this issue (inconsistency between kv and value) occurs.

@scabug

This comment has been minimized.

Copy link
Author

commented Jun 4, 2012

Imported From: https://issues.scala-lang.org/browse/SI-5879?orig=1
Reporter: @rklaehn
Affected Versions: 2.9.2

@scabug

This comment has been minimized.

Copy link
Author

commented Aug 9, 2012

@rklaehn said:
There is a bug in the fix. Or at least the behavior of the merged method if the merger is null is pretty nondeterministic and thus useless.

Given two maps a and b, a.merged(b)(null) should only take elements from b if there is no collision, right? So it should be equivalent with b ++ a, where all elements in a replace elements in b. But it is not. See test case.

The reason is the following: if a merger is given, the number of times invert has been called on it keeps track of the number of times the order of the arguments has been reversed. It essentially acts as a flag that is negated every time the arguments are swapped, like e.g. in

    protected override def merge0[B1 >: B](that: HashMap[A, B1], level: Int, merger: Merger[A, B1]): HashMap[A, B1] = {
      that.updated0(key, hash, level, value, kv, if (merger ne null) merger.invert else null)
    }

But when the merger is null there is no way to keep track of the number of flips, so whether you get something from the first or the second map depends on the structure of the map (depth, number of collisions, etc).

Here is some code to reproduce the bug:

  val n = 10
  val keys = (0 until n)
  val sa = keys.map(key => key -> "a")
  val sb = keys.map(key => key -> "b")
  for (i <- 0 until n) {
    val a = HashMap(sa: _*).take(i)
    val b = HashMap(sb: _*).take(n-i)

    val r1 = a ++ b
    val r2 = b.merged(a)(null)

    if (r1 != r2) {
      println(a)
      println(b)
      println(r1)
      println(r2)
    }
    println(r1 == r2)
  }

I would suggest dropping the merger stuff and just having an additional boolean argument "flipped" that just keeps track of whether the order of the arguments was flipped. That way it works even if the merger is null.

I think that would be faster as well since it would not require lifting the function to some helper object.

@scabug

This comment has been minimized.

Copy link
Author

commented Aug 9, 2012

@rklaehn said:
See comment

@scabug

This comment has been minimized.

Copy link
Author

commented Aug 9, 2012

@rklaehn said:
Added a pull request. Not sure if this is the best solution, but it should be reasonably efficient. When the merge function is null, it gets lifted to a default merger that just takes the first argument.

scala/scala#1108

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.