8186958: Need method to create pre-sized HashMap #7928
Conversation
I think making these functions static methods on Collections is slightly better than placing them in their own classes. The first step is to add such functions; the second step is to change some existing usages over to them.
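(For context, the API shape the PR eventually converges on is static factory methods on the map classes themselves rather than on Collections. The usage sketch below assumes the newHashMap / newLinkedHashMap / newWeakHashMap names that appear later in this thread and shipped in JDK 19; it is illustrative, not part of the PR discussion.)

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;

class PresizedMapsDemo {
    public static void main(String[] args) {
        // Each factory returns an empty map sized so that the expected
        // number of mappings can be added without triggering a resize.
        Map<String, Integer> hm  = HashMap.newHashMap(100);
        Map<String, Integer> lhm = LinkedHashMap.newLinkedHashMap(100);
        Map<String, Integer> whm = WeakHashMap.newWeakHashMap(100);

        // The error-prone idiom this replaces: the constructor argument is
        // a raw capacity, so callers had to divide by the load factor by hand.
        Map<String, Integer> old = new HashMap<>((int) (100 / 0.75f) + 1);

        System.out.println(hm.size() + " " + lhm.size() + " "
                + whm.size() + " " + old.size());
    }
}
```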
@XenoAmess The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
I ran the JMH benchmark locally and found far better performance using int calculations than double.
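(The benchmark source and its results table do not survive in this transcript. As a hedged illustration of what was presumably being compared, the floating-point and pure-integer forms of the capacity calculation, for HashMap's default load factor of 0.75, look something like this:)

```java
class CapacityMath {
    // Both compute the smallest capacity c with numMappings <= c * 0.75,
    // i.e. c = ceil(numMappings / 0.75) = ceil(numMappings * 4 / 3).

    // Floating-point form (the one the PR ultimately keeps):
    static int capacityFp(int numMappings) {
        return (int) Math.ceil(numMappings / 0.75);
    }

    // Pure-integer form: ceil(4n/3) via (4n + 2) / 3, using a long
    // intermediate so 4n cannot overflow for any non-negative int.
    static int capacityInt(int numMappings) {
        return (int) (((long) numMappings * 4 + 2) / 3);
    }
}
```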
I do have a local test to make sure the three functions I provided produce HashMaps of equal capacity, but I don't think it needs to be added to the JDK.
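(A hypothetical sketch of such an equal-capacity check: HashMap does not expose its capacity, so a white-box test has to read the internal table array reflectively, much as the WhiteBoxResizeTest mentioned later in this thread does. On modern JDKs this needs --add-opens java.base/java.util=ALL-UNNAMED; the helper below is illustrative only.)

```java
import java.lang.reflect.Field;
import java.util.HashMap;

class CapacityProbe {
    // Returns the length of the map's internal bucket table. The table is
    // allocated lazily, so callers must insert at least one entry first.
    static int tableLength(HashMap<?, ?> map) throws ReflectiveOperationException {
        Field table = HashMap.class.getDeclaredField("table");
        table.setAccessible(true);
        return ((Object[]) table.get(map)).length;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        HashMap<Integer, Integer> m = HashMap.newHashMap(100);
        m.put(0, 0); // force table allocation
        // 100 mappings / 0.75 load factor = 134, rounded up to 256 buckets.
        System.out.println(tableLength(m)); // expect 256
    }
}
```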
/csr
@ChrisHegarty has indicated that a compatibility and specification (CSR) request is needed for this pull request.
This is a very nice addition. In Elasticsearch we have such API points, which are tedious to get right and test.
Hi. Actually, I don't know how to create a CSR request; I have no account on your internal JIRA.
I'll sponsor this PR, and I can create a CSR as well.
So I think the functions look good now.
OK, finally got some time to look at this. Here's a rewrite of the spec words, at least for HashMap::newHashMap. If this settles down, I'll write the CSR for this and LHM and WHM.
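(The rewritten spec text itself was stripped from this transcript. A plausible reconstruction, following the points made below about the first-sentence fragment, the numMappings name, and "mappings" rather than "elements", might read roughly as follows; the method body is a sketch, not necessarily the actual implementation.)

```java
/**
 * Creates a new, empty HashMap suitable for the expected number of mappings.
 * The returned map uses the default load factor of 0.75, and its initial
 * capacity is generally large enough so that the expected number of mappings
 * can be added without resizing the map.
 *
 * @param numMappings the expected number of mappings
 * @param <K>         the type of keys
 * @param <V>         the type of mapped values
 * @return the newly created map
 * @throws IllegalArgumentException if numMappings is negative
 */
public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
    // Sketch: smallest capacity whose 0.75 threshold covers numMappings.
    // A negative argument propagates to the constructor, which throws IAE.
    return new HashMap<>((int) Math.ceil(numMappings / 0.75));
}
```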
The original wording was taken from CHM, which generally is a reasonable thing to do. Unfortunately, it's wrong. :-) "Table size" isn't accurate; HashMap uses "capacity" as the term for the number of buckets (== length of the internal table array). "Size" means "number of mappings," so its use of "table size" confuses these two concepts. The CHM wording also uses "elements," which applies to linear collections; the things inside a map are usually referred to as "mappings" or "entries." (I guess we should fix up CHM at some point too.) While "expectedSize" isn't inaccurate, it's not tied to the main text, so I've renamed it to numMappings.

There are a couple of other javadoc style rules operating here. The first sentence is generally a sentence fragment that is short and descriptive, as it will be pulled into a summary table. (It's often written as if it were a sentence that begins "This method..." but those words are elided.) Details are in the rest of the first paragraph. The text for ...

--

On performance and benchmarking: this is a distraction from the main point of this effort, which is to add an API that gives callers a correct and convenient way to create a properly sized HashMap. Any work spent on optimizing performance here is almost certainly wasted.

First, the benchmark: it's entirely unclear what this is measuring. It performs the operation 2^31 times, but it sends the result to a black hole so that the JIT doesn't eliminate the computation. One of the actual results is 0.170 ops/sec. This includes both the operation and the black hole, so we don't actually have any idea what that result represents. Maybe it's possible to infer some idea of the relative performance of the different operations by comparing the results. It's certainly plausible that the integer code is faster than the float or double code. But the benchmark doesn't tell us how long the actual computation takes.

Second, how significant is the cost of the computation? I'll assert that it's insignificant. The table length is computed once at HashMap creation time, and it's amortized over the addition of all the initial entries and subsequent queries and updates to the HashMap. Any of the computations (whether integer or floating-point) take a handful of nanoseconds, which will be swamped by the first hashCode computation that causes a cache miss.

Third: I'll stipulate that the integer computation is probably a few ns faster than the floating-point computation. But the computation is entirely non-obvious, and to make up for that there's a big comment that explains it all. It's not worth doing something that improves performance by an insignificant amount and also requires a big explanation. Finally, note that most of the callers are already doing a floating-point computation to compute the desired capacity, and it doesn't seem to be a problem.

Sorry, you probably spent a bunch of time on this already, but trying to optimize the performance here just isn't worthwhile. Let's please just stick with our good old floating-point computation.

--

There should be regression tests added for the three new methods. I haven't thought much about this. It might be possible to reuse some of the infrastructure in the WhiteBoxResizeTest we worked on previously.

--

I think it would be good to include updates to some of the use sites in this PR. It's often useful to put new APIs into practice; one usually learns something from the effort. Even though this is a really simple API, looking at use sites can be illuminating, to see how the code reads. This might affect the method name, for example. You don't need to update all of the use sites in the JDK, but it would be good to choose a reasonable sample: maybe the ones from a single package, or a handful (like java.lang or java.util). Maybe include Class::enumConstantDirectory. If that use site is updated, then maybe it will allow us to get rid of the ConstantDirectoryOptimalCapacity test that we problem-listed in the previous PR. A before/after sketch of such a use-site update follows this comment.
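(A hypothetical before/after at a use site of the kind suggested above; the index method and all names here are invented for illustration and assume the JDK 19 factory method.)

```java
import java.util.HashMap;
import java.util.Map;

class UseSiteExample {
    static Map<String, Integer> index(String[] keys) {
        // Before: the caller bakes the load-factor arithmetic in by hand:
        //   Map<String, Integer> m = new HashMap<>((int) (keys.length / 0.75f) + 1);
        // After: the sizing intent is explicit, and the arithmetic lives in HashMap.
        Map<String, Integer> m = HashMap.newHashMap(keys.length);
        for (int i = 0; i < keys.length; i++) {
            m.put(keys[i], i);
        }
        return m;
    }
}
```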
@stuart-marks code and javadocs done.
I learned something new about HashMap today...
I looked at java.security.cert and sun.security.* and that part LGTM.
That said, you need to check with @seanjmullan for the java.xml.crypto code. We try to keep the code in sync with the Apache code. As this is a new API, we probably can't push this kind of change to Apache as they need to support older releases.
Right, we generally try to avoid making too many changes to the implementation code in the java.xml.crypto module in order to stay consistent with Apache Santuario. They also would not accept this change, because it is a new API and they need to run on older releases. I haven't had time yet to understand this enhancement, but are the changes necessary for this part?
@seanjmullan No, they are just performance refinements. If you really want to keep it 100% in sync, I can migrate that part using the old JDK 1.8 API and make a mirror PR to https://github.com/apache/santuario-xml-security-java. Is this solution acceptable?
Yes, that would be preferred. Thanks!
Thanks @bradfordwetmore and @seanjmullan for looking at this, and @XenoAmess for following up quickly. To summarize, it sounds like the only issues are with the changes to two files in the java.xml.crypto module. In both cases it looks like the HashMap is likely being under-allocated, so the fix would be to inline the capacity computation, something like the sketch below.
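(The snippet referred to here was lost in this transcript. Given the surrounding discussion, the inlined computation presumably looked something like the following, which picks the same capacity HashMap.newHashMap would but uses only pre-JDK-19 API; the helper name is illustrative.)

```java
import java.util.HashMap;
import java.util.Map;

class InlineCapacity {
    static <K, V> Map<K, V> presized(int expectedSize) {
        // Smallest capacity whose 0.75 threshold is >= expectedSize,
        // so expectedSize entries can be added without a resize.
        return new HashMap<>((int) Math.ceil(expectedSize / 0.75));
    }
}
```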
So is there anything else we should do before calling the bot to integrate?
I'd like to see a confirmation from @seanjmullan to make sure the issues with Santuario are resolved satisfactorily. Other than that I think it's pretty well covered. I've already run these changes through our internal test system and they look fine. I'll do a final recheck and let you know.
I am fine with this being integrated. @XenoAmess already submitted a PR to the Santuario Project using the existing API.
OK, go ahead and integrate! |
/integrate
@XenoAmess Your change is now ready to be sponsored by a Committer.
/sponsor |
I've also written a release note for this change. Please review. |
Going to push as commit 87faa85.
Your commit was automatically rebased without conflicts.
@stuart-marks @XenoAmess Pushed as commit 87faa85. |
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7928/head:pull/7928
$ git checkout pull/7928
Update a local copy of the PR:
$ git checkout pull/7928
$ git pull https://git.openjdk.java.net/jdk pull/7928/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7928
View PR using the GUI difftool:
$ git pr show -t 7928
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7928.diff