-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SI-9732 remove the use of Option from OpenHashMap #5074
SI-9732 remove the use of Option from OpenHashMap #5074
Conversation
/** The state of each of the slots in the hash table. | ||
* One of `Empty`, `Deleted` or `Occupied`. | ||
*/ | ||
private[this] var slotStates = new Array[Byte](actualInitialSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try to change OpenEntry
to have a Value
+ Boolean
, rather than introducing this? On my architecture, I think you'd get that for free:
$ sbt
sbt> libraryDependencies += "org.openjdk.jol" % "jol-core" % "0.4"
sbt> consoleQuick
...
scala> println(VMSupport.vmDetails())
WARNING: Unable to attach Serviceability Agent: Error: Could not find or load main class org.openjdk.jol.util.HS_SA_Support
WARNING: VM details, e.g. object alignment, reference size, compressed references info will be guessed.
Running 64-bit HotSpot VM.
Using compressed oop with 3-bit shift.
Using compressed klass with 3-bit shift.
Objects are 8 bytes aligned.
Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
scala> ClassLayout.parseClass(Class.forName("scala.collection.mutable.OpenHashMap$OpenEntry")).toPrintable()
"scala.collection.mutable.OpenHashMap.OpenEntry object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 12 (object header) N/A
12 4 int OpenEntry.hash N/A
16 4 Object OpenEntry.key N/A
20 4 Option OpenEntry.value N/A
24 4 Object OpenEntry.next N/A
28 4 (loss due to the next object alignment)
Instance size: 32 bytes (estimated, the sample instance is not available)
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't try to add a Byte
to OpenEntry
. The thought crossed my mind, since that would preserve memory locality. But the direction that I'm headed is to eliminate the overhead of objects like OpenEntry
, to support higher capacities than a chained hash map in a JVM. Because, if this class isn't good for that, then what is it good for?
I didn't realize that OpenEntry
's inheritance from HashEntry
was adding an extra word. I assumed that it wasn't adding overhead, since it wasn't adding feature. The response to that, though, is to remove that dependence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed HashEntry
trait here, rather than open another PR. I need to retest the performance over the next couple evenings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the performance graphs in the JIRA ticket. It's gotten a bit better, and a bit worse.
What bothers me is why the Some
isn't being elided in the getOrElse(0)
call. It's sticks around also if I use the AbstractMap.getOrElse(i, 0)
. Building tests with -Yopt:l:classpath
.
if you merge the latest 2.11.x changes into this PR, it should pass validation |
12df95a
to
7258f5e
Compare
oh, I didn't see this PR before; my comment here #5124 (comment) is related. |
5fa81e6
to
f859c11
Compare
Replacing the Option wrappers on values, with a Byte array of state indicators, and removing the HashEntry trait, - cuts memory usage by 10%, by not storing the Some instances, and another 10% by not storing the HashEntry field - speeds put() & remove() for maps w/ >100k entries, by as much as 50-100%, apparently by delaying memory effects But it also - slows put() by up to 10% for maps w/ 100 to 20k entries - slows get() by up to 20% for maps constructed w/ put() & remove() put() is slowed by adding reference to the separate Byte array. get() is slowed by adding a Some wrapper in the call; this could be avoided in an optimized getOrElse(). Both of these are probably mitigated at larger sizes by the reduced memory usage. Some foreach() calls also were rewritten to while-loops for better code generation. This commit also modifies a test for a change in compiler mangling of private member variable names. Not sure why this wasn't caught in a previous commit.
f859c11
to
e8c2222
Compare
I'm going to rewrite this substantially, once #5183 is in. |
should we keep this open? |
let's close it since there was no activity in a very long time. @performantdata feel free to re-open once you think it should be reviewed. |
There are more substantial changes that I want to do that require API changes, which requires a 2.13.x branch, since they missed 2.12.0-M5. More satisfying to cut to the chase. Was planning to treat issues like this as a backport, if anyone else cared. |
note that |
See the JIRA ticket for the performance graphs.