Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Champ HashMap/Set filter/filterNot methods should be optimized to not perform any hashes or equality of keys #11143

Open
joshlemer opened this issue Sep 12, 2018 · 3 comments

Comments

@joshlemer
Copy link
Member

joshlemer commented Sep 12, 2018

there is currently a test for this for the OldHashMap implementation, found here. This requirement should probably be brought forward to the new hash collections if possible (Edit: it's definitely possible!)

@joshlemer joshlemer added this to the 2.13.0-RC1 milestone Sep 12, 2018
@joshlemer
Copy link
Member Author

Also, that test should be a JUnit or scalacheck test, not a partest I reckon.

@joshlemer joshlemer changed the title Champ HashMap filter/filterNot methods should be optimized to not perform any hashes Champ HashMap filter/filterNot methods should be optimized to not perform any hashes or equality of keys Sep 12, 2018
@joshlemer
Copy link
Member Author

Actually I'll post the test here for future records, and remove the old tests anyways

import scala.collection.immutable.{OldHashMap => HashMap}

object Test extends App {

  case class Collision(value: Int) extends Ordered[Collision] {
    def compare(that: Collision) = value compare that.value

    override def hashCode = value / 5
  }

  def testCorrectness[T: Ordering](n: Int, mkKey: Int => T): Unit = {
    val o = implicitly[Ordering[T]]
    val s = HashMap.empty[T, Unit] ++ (0 until n).map(x => mkKey(x) -> (()))
    for (i <- 0 until n) {
      val ki = mkKey(i)
      val a = s.filter(kv => o.lt(kv._1, ki))
      val b = s.filterNot(kv => o.lt(kv._1, ki))
      require(a.size == i && (0 until i).forall(i => a.contains(mkKey(i))))
      require(b.size == n - i && (i until n).forall(i => b.contains(mkKey(i))))
    }
  }

  // this tests the structural sharing of the new filter
  // I could not come up with a simple test that tests structural sharing when only parts are reused, but
  // at least this fails with the old and passes with the new implementation
  def testSharing(): Unit = {
    val s = HashMap.empty[Int, Unit] ++ (0 until 100).map(_ -> (()))
    require(s.filter(_ => true) eq s)
    require(s.filterNot(_ => false) eq s)
  }

  // this tests that neither hashCode nor equals are called during filter
  def testNoHashing(): Unit = {
    var hashCount = 0
    var equalsCount = 0
    case class HashCounter(value: Int) extends Ordered[HashCounter] {
      def compare(that: HashCounter) = value compare that.value

      override def hashCode = {
        hashCount += 1
        value
      }

      override def equals(that: Any) = {
        equalsCount += 1
        that match {
          case HashCounter(value) => this.value == value
          case _ => false
        }
      }
    }

    val s = HashMap.empty[HashCounter, Unit] ++ (0 until 100).map(k => HashCounter(k) -> (()))
    val hashCount0 = hashCount
    val equalsCount0 = equalsCount
    val t = s.filter(_._1 < HashCounter(50))
    require(hashCount == hashCount0)
    require(equalsCount == equalsCount0)
  }

  // this tests correctness of filter and filterNot for integer keys
  testCorrectness[Int](100, identity _)
  // this tests correctness of filter and filterNot for keys with lots of collisions
  // this is necessary because usually collisions are rare so the collision-related code is not thoroughly tested
  testCorrectness[Collision](100, Collision.apply _)
  testSharing()
  testNoHashing()
}

@joshlemer
Copy link
Member Author

Ditto for HashSet

import scala.collection.immutable.{OldHashSet => HashSet}

object Test extends App {

  case class Collision(value: Int) extends Ordered[Collision] {
    def compare(that:Collision) = value compare that.value

    override def hashCode = value / 5
  }

  def testCorrectness[T : Ordering](n: Int, mkKey: Int => T): Unit = {
    val o = implicitly[Ordering[T]]
    val s = HashSet.empty[T] ++ (0 until n).map(mkKey)
    for (i <- 0 until n) {
      val ki = mkKey(i)
      val a = s.filter(o.lt(_,ki))
      val b = s.filterNot(o.lt(_,ki))
      require(a.size == i && (0 until i).forall(i => a.contains(mkKey(i))))
      require(b.size == n - i && (i until n).forall(i => b.contains(mkKey(i))))
    }
  }

  // this tests the structural sharing of the new filter
  // I could not come up with a simple test that tests structural sharing when only parts are reused, but
  // at least this fails with the old and passes with the new implementation
  def testSharing(): Unit = {
    val s = HashSet.empty[Int] ++ (0 until 100)
    require(s.filter(_ => true) eq s)
    require(s.filterNot(_ => false) eq s)
  }

  // this tests that neither hashCode nor equals are called during filter
  def testNoHashing(): Unit = {
    var hashCount = 0
    var equalsCount = 0
    case class HashCounter(value:Int) extends Ordered[HashCounter] {
      def compare(that:HashCounter) = value compare that.value

      override def hashCode = {
        hashCount += 1
        value
      }

      override def equals(that:Any) = {
        equalsCount += 1
        that match {
          case HashCounter(value) => this.value == value
          case _ => false
        }
      }
    }

    val s = HashSet.empty[HashCounter] ++ (0 until 100).map(HashCounter)
    val hashCount0 = hashCount
    val equalsCount0 = equalsCount
    val t = s.filter(_<HashCounter(50))
    require(hashCount == hashCount0)
    require(equalsCount == equalsCount0)
  }

  // this tests correctness of filter and filterNot for integer keys
  testCorrectness[Int](100, identity _)
  // this tests correctness of filter and filterNot for keys with lots of collisions
  // this is necessary because usually collisions are rare so the collision-related code is not thoroughly tested
  testCorrectness[Collision](100, Collision.apply _)
  testSharing()
  testNoHashing()
}

@joshlemer joshlemer changed the title Champ HashMap filter/filterNot methods should be optimized to not perform any hashes or equality of keys Champ HashMap/Set filter/filterNot methods should be optimized to not perform any hashes or equality of keys Sep 13, 2018
@szeiger szeiger modified the milestones: 2.13.0-RC1, 2.13.1 Jan 10, 2019
@szeiger szeiger modified the milestones: 2.13.1, 2.13.2 Aug 22, 2019
@SethTisue SethTisue modified the milestones: 2.13.2, Backlog Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants