Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SI-9732 remove the use of Option from OpenHashMap #5074

Conversation

performantdata
Copy link
Contributor

@performantdata performantdata commented Apr 2, 2016

See the JIRA ticket for the performance graphs.

@scala-jenkins scala-jenkins added this to the 2.11.9 milestone Apr 2, 2016
/** The state of each of the slots in the hash table.
* One of `Empty`, `Deleted` or `Occupied`.
*/
private[this] var slotStates = new Array[Byte](actualInitialSize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try to change OpenEntry to have a Value + Boolean, rather than introducing this? On my architecture, I think you'd get that for free:

$ sbt
sbt> libraryDependencies += "org.openjdk.jol" % "jol-core" % "0.4"

sbt> consoleQuick
...
scala> println(VMSupport.vmDetails())
WARNING: Unable to attach Serviceability Agent: Error: Could not find or load main class org.openjdk.jol.util.HS_SA_Support

WARNING: VM details, e.g. object alignment, reference size, compressed references info will be guessed.

Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

scala> ClassLayout.parseClass(Class.forName("scala.collection.mutable.OpenHashMap$OpenEntry")).toPrintable()

"scala.collection.mutable.OpenHashMap.OpenEntry object internals:
 OFFSET  SIZE   TYPE DESCRIPTION                    VALUE
      0    12        (object header)                N/A
     12     4    int OpenEntry.hash                 N/A
     16     4 Object OpenEntry.key                  N/A
     20     4 Option OpenEntry.value                N/A
     24     4 Object OpenEntry.next                 N/A
     28     4        (loss due to the next object alignment)
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't try to add a Byte to OpenEntry. The thought crossed my mind, since that would preserve memory locality. But the direction that I'm headed is to eliminate the overhead of objects like OpenEntry, to support higher capacities than a chained hash map in a JVM. Because, if this class isn't good for that, then what is it good for?

I didn't realize that OpenEntry's inheritance from HashEntry was adding an extra word. I assumed that it wasn't adding overhead, since it wasn't adding feature. The response to that, though, is to remove that dependence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed HashEntry trait here, rather than open another PR. I need to retest the performance over the next couple evenings.

Copy link
Contributor Author

@performantdata performantdata May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the performance graphs in the JIRA ticket. It's gotten a bit better, and a bit worse.

What bothers me is why the Some isn't being elided in the getOrElse(0) call. It's sticks around also if I use the AbstractMap.getOrElse(i, 0). Building tests with -Yopt:l:classpath.

@SethTisue
Copy link
Member

if you merge the latest 2.11.x changes into this PR, it should pass validation

@performantdata performantdata force-pushed the topic/OpenHashMap-performance branch 4 times, most recently from 12df95a to 7258f5e Compare April 30, 2016 18:16
@lrytz
Copy link
Member

lrytz commented May 2, 2016

oh, I didn't see this PR before; my comment here #5124 (comment) is related.

Replacing the Option wrappers on values, with a Byte array of state
indicators, and removing the HashEntry trait,
- cuts memory usage by 10%, by not storing the Some instances, and
  another 10% by not storing the HashEntry field
- speeds put() & remove() for maps w/ >100k entries, by as much as
  50-100%, apparently by delaying memory effects

But it also
- slows put() by up to 10% for maps w/ 100 to 20k entries
- slows get() by up to 20% for maps constructed w/ put() & remove()

put() is slowed by adding reference to the separate Byte array.  get()
is slowed by adding a Some wrapper in the call; this could be avoided
in an optimized getOrElse().  Both of these are probably mitigated at
larger sizes by the reduced memory usage.

Some foreach() calls also were rewritten to while-loops for better code
generation.

This commit also modifies a test for a change in compiler mangling of
private member variable names.  Not sure why this wasn't caught in a
previous commit.
@performantdata performantdata force-pushed the topic/OpenHashMap-performance branch from f859c11 to e8c2222 Compare May 6, 2016 21:08
@performantdata
Copy link
Contributor Author

I'm going to rewrite this substantially, once #5183 is in.

@lrytz lrytz added the on-hold label Jun 1, 2016
@SethTisue
Copy link
Member

should we keep this open?

@lrytz
Copy link
Member

lrytz commented Aug 22, 2016

let's close it since there was no activity in a very long time. @performantdata feel free to re-open once you think it should be reviewed.

@lrytz lrytz closed this Aug 22, 2016
@SethTisue SethTisue removed the on-hold label Aug 22, 2016
@SethTisue SethTisue removed this from the 2.11.9 milestone Aug 22, 2016
@performantdata
Copy link
Contributor Author

There are more substantial changes that I want to do that require API changes, which requires a 2.13.x branch, since they missed 2.12.0-M5. More satisfying to cut to the chase. Was planning to treat issues like this as a backport, if anyone else cared.

@SethTisue
Copy link
Member

note that OpenHashMap is now deprecated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants