Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8186958: Need method to create pre-sized HashMap #7928

Closed
wants to merge 27 commits into from

Conversation

XenoAmess
Copy link
Contributor

@XenoAmess XenoAmess commented Mar 23, 2022


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed
  • Change requires a CSR request to be approved

Issues

  • JDK-8186958: Need method to create pre-sized HashMap
  • JDK-8284377: Need method to create pre-sized HashMap (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7928/head:pull/7928
$ git checkout pull/7928

Update a local copy of the PR:
$ git checkout pull/7928
$ git pull https://git.openjdk.java.net/jdk pull/7928/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7928

View PR using the GUI difftool:
$ git pr show -t 7928

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7928.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 23, 2022

👋 Welcome back XenoAmess! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@XenoAmess
Copy link
Contributor Author

@stuart-marks

I think making these functions at Collections is slightly better than place them to their own classes.

The first step is to make such functions.

The second step is to change some usage to these functions.

@openjdk
Copy link

openjdk bot commented Mar 23, 2022

@XenoAmess The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Mar 23, 2022
@XenoAmess
Copy link
Contributor Author

XenoAmess commented Mar 23, 2022

I ran the jmh locally, and find it far better performance to use int calculations, than double.

Benchmark Mode Cnt Score Error Units
CalculateHashMapCapacityTestJMH.testCalculateHashMapCapacity1 thrpt 50 0.170 �� 0.001 ops/s
CalculateHashMapCapacityTestJMH.testCalculateHashMapCapacity2 thrpt 50 0.296 �� 0.001 ops/s
CalculateHashMapCapacityTestJMH.testCalculateHashMapCapacity3 thrpt 50 0.330 �� 0.001 ops/s

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 23, 2022
@XenoAmess
Copy link
Contributor Author

XenoAmess commented Mar 23, 2022

I do have a local test to make sure the 3 functions I provided can produce equal capacity HashMap, but I think it does not need to be added into jdk.

public class CalculateHashMapCapacityFunctionTest {

    /**
     * Calculate initial capacity for HashMap based classes, from expected size.
     *
     * @param expectedSize expected size
     * @return initial capacity for HashMap based classes.
     * @since 19
     */
    private static int calculateHashMapCapacity1(int expectedSize) {
        return (int) Math.ceil(expectedSize / 0.75);
    }

    /**
     * Calculate initial capacity for HashMap based classes, from expected size.
     *
     * @param expectedSize expected size
     * @return initial capacity for HashMap based classes.
     * @since 19
     */
    private static int calculateHashMapCapacity2(int expectedSize) {
        if (expectedSize >= 1610612736) {
            return Integer.MAX_VALUE;
        }
        return (expectedSize + (expectedSize + 2) / 3);
    }

    /**
     * Calculate initial capacity for HashMap based classes, from expected size.
     *
     * @param expectedSize expected size
     * @return initial capacity for HashMap based classes.
     * @since 19
     */
    private static int calculateHashMapCapacity3(int expectedSize) {
        if (expectedSize >= 805306368) {
            return (1 << 30);
        }
        return (expectedSize + (expectedSize + 2) / 3);
    }

    public static void main(String[] args) throws Exception {
        for (int i = Integer.MAX_VALUE; i >= 0; --i) {
            int length1 = getArrayLength(calculateHashMapCapacity1(i));
            int length2 = getArrayLength(calculateHashMapCapacity2(i));
            int length3 = getArrayLength(calculateHashMapCapacity3(i));
            if (length1 != length2 || length1 != length3) {
                throw new RuntimeException("wrong: " + i);
            }
        }
    }

    static final int MAXIMUM_CAPACITY = 1 << 30;

    public static int getArrayLength(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal initial capacity: " +
                    initialCapacity);
        }
        if (initialCapacity > MAXIMUM_CAPACITY) {
            initialCapacity = MAXIMUM_CAPACITY;
        }
        return tableSizeFor(initialCapacity);
    }

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

}



@ChrisHegarty
Copy link
Member

/csr

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Mar 26, 2022
@openjdk
Copy link

openjdk bot commented Mar 26, 2022

@ChrisHegarty has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@XenoAmess please create a CSR request for issue JDK-8186958. This pull request cannot be integrated until the CSR request is approved.

Copy link
Member

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very nice addition. In Elasticsearch we have such API points, which are tedious to get right and test.

src/java.base/share/classes/java/util/HashMap.java Outdated Show resolved Hide resolved
@XenoAmess
Copy link
Contributor Author

XenoAmess commented Mar 27, 2022

/csr

@ChrisHegarty @liach

Hi. actually I don't know how to create a CSR request. I have no account on your internal jira.
So I write an email to csr-discuss@openjdk.java.net , but no response.
If there be anything more I should do now, please just tell me.
Thanks a lot.

@stuart-marks
Copy link
Member

I'll sponsor this PR, and I can create a CSR as well.

@XenoAmess
Copy link
Contributor Author

XenoAmess commented Mar 27, 2022

So I myself think, the functions looks good now.
The first step, to make such functions, seems completed.
Should we go into the next step now, means to change some usage to these functions in java.base?

@stuart-marks
Copy link
Member

OK, finally got some time to look at this. Here's a rewrite of the spec words, at least for HashMap::newHashMap. If this settles down, I'll write the CSR for this and LHM and WHM.

/**
 * Creates a new, empty HashMap suitable for the expected number of mappings.
 * The returned map uses the default load factor of 0.75, and its initial capacity is
 * generally large enough so that the expected number of mappings can be added
 * without resizing the map.
 *
 * @param numMappings the expected number of mappings
 * @param <K>         the type of keys maintained by this map
 * @param <V>         the type of mapped values
 * @return the newly created map
 * @throws IllegalArgumentException if numMappings is negative
 * @since 19
 */

The original wording was taken from CHM, which generally is a reasonable thing to do. Unfortunately, it's wrong. :-) "Table size" isn't accurate; HashMap uses "capacity" as the term for the number of buckets (== length of the internal table array). "Size" means "number of mappings" so its use of "table size" confuses these two concepts. The CHM wording also uses "elements" which applies to linear collections; the things inside a map are usually referred to as "mappings" or "entries". (I guess we should fix up CHM at some point too.)

While "expectedSize" isn't inaccurate, it's not tied to the main text, so I've renamed it to numMappings.

There are a couple other javadoc style rules operating here. The first sentence is generally a sentence fragment that is short and descriptive, as it will be pulled into a summary table. (It's often written as if it were a sentence that begins "This method..." but those words are elided.) Details are in the rest of the first paragraph. The text for @param, @return, and @throws are generally also sentence fragments, and they have no initial capital letter nor trailing period.

--

On performance and benchmarking: this is a distraction from the main point of this effort, which is to add an API that allows callers a correct and convenient way to create a properly-sized HashMap. Any work spent on optimizing performance is almost certainly wasted.

First, the benchmark: it's entirely unclear what this is measuring. It's performing the operation 2^31 times, but it sends the result to a black hole so that the JIT doesn't eliminate the computation. One of the actual results is 0.170 ops/sec. This includes both the operation and the black hole, so we don't actually have any idea what that result represents. Maybe it's possible to infer some idea of relative performance of the different operations by comparing the results. It's certainly plausible that the integer code is faster than the float or double code. But the benchmark doesn't tell us how long the actual computation takes.

Second, how significant is the cost of the computation? I'll assert that it's insignificant. The table length is computed once at HashMap creation time, and it's amortized over the addition of all the initial entries and subsequent queries and updates to the HashMap. Any of the computations (whether integer or floating point) are a handful of nanoseconds. This will be swamped by the first hashCode computation that causes a cache miss.

Third: I'll stipulate that the integer computation is probably a few ns faster than the floating-point computation. But the computation is entirely non-obvious. To make up for it being non-obvious, there's a big comment that explains it all. It's not worth doing something that increases performance by an insignificant amount that also requires a big explanation.

Finally, note that most of the callers are already doing a floating-point computation to compute the desired capacity, and it doesn't seem to be a problem.

Sorry, you probably spent a bunch of time on this already, but trying to optimize the performance here just isn't worthwhile. Let's please just stick with our good old (int) Math.ceil(numMappings / 0.75).

--

There should be regression tests added for the three new methods. I haven't thought much about this. It might be possible to reuse some of the infrastructure in the WhiteBoxResizeTest we worked on previously.

--

I think it would be good to include updates to some of the use sites in this PR. It's often useful to put new APIs into practice. One usually learns something from the effort. Even though this is a really simple API, looking at use sites can illuminating, to see how the code reads. This might affect the method name, for example.

You don't need to update all of the use sites in the JDK, but it would be good to choose a reasonable sample. Maybe the ones from a single package, or a handful (like java.lang or java.util). Maybe include Class::enumConstantDirectory. If this use site is updated, then maybe it will allow us to get rid of that ConstantDirectoryOptimalCapacity test that we problem-listed in the previous PR.

@XenoAmess
Copy link
Contributor Author

@stuart-marks codes and javadocs done.
usages and tests are on the way.

Copy link
Contributor

@bradfordwetmore bradfordwetmore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned something new about HashMap today...

I looked at java.security.cert and sun.security.* and that part LGTM.

That said, you need to check with @seanjmullan for the java.xml.crypto code. We try to keep the code in sync with the Apache code. As this is a new API, we probably can't push this kind of change to Apache as they need to support older releases.

@seanjmullan
Copy link
Member

Right, we generally try to avoid making too many changes to the implementation code in the java.xml.crypto module in order to stay consistent with Apache Santuario. They also would not accept this change because it is a new API and they need to run on older releases. I haven't had time yet to understand this Enhancement, but are the changes necessary for this part?

@XenoAmess
Copy link
Contributor Author

XenoAmess commented Apr 14, 2022

Are the changes necessary for this part?

@seanjmullan no, they are just performance refinement.

If you really that wanna 100% sync ,

I can use the old 1.8 api to migrate that part, and make a mirror pr to that part of https://github.com/apache/santuario-xml-security-java

Is this solution acceptable then?

@seanjmullan
Copy link
Member

Are the changes necessary for this part?

@seanjmullan no, they are just performance refinement.

If you really that wanna 100% sync ,

I can use the old 1.8 api to migrate that part, and make a mirror pr to that part of https://github.com/apache/santuario-xml-security-java

Is this solution acceptable then?

Yes, that would be preferred. Thanks!

@stuart-marks
Copy link
Member

Thanks @bradfordwetmore and @seanjmullan for looking at this, and @XenoAmess for following up quickly.

To summarize, it sounds like the only issues are with the changes to two files in the java.xml.crypto area, as those need to be maintained in sync with Apache Santuario. Right?

In both cases it looks like the HashMap is likely being under-allocated, so the fix would be to inline to capacity computation, something like new HashMap<>((int) Math.ceil(length / 0.75)) I guess.

@XenoAmess
Copy link
Contributor Author

So is there any other things we should do before calling bot to integrate?

@stuart-marks
Copy link
Member

I'd like to see a confirmation from @seanjmullan to make sure the issues with Santuario are resolved satisfactorily. Other than that I think it's pretty well covered. I've already run these changes through our internal test system and they look fine. I'll do a final recheck and let you know.

@seanjmullan
Copy link
Member

I'd like to see a confirmation from @seanjmullan to make sure the issues with Santuario are resolved satisfactorily. Other than that I think it's pretty well covered. I've already run these changes through our internal test system and they look fine. I'll do a final recheck and let you know.

I am fine with this being integrated. @XenoAmess already submitted a PR to the Santuario Project using the existing HashMap constructor which I have approved. So the code will be in sync the next time we upgrade the JDK to a newer version of Santuario.

@stuart-marks
Copy link
Member

OK, go ahead and integrate!

@XenoAmess
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Apr 18, 2022
@openjdk
Copy link

openjdk bot commented Apr 18, 2022

@XenoAmess
Your change (at version 95e22f2) is now ready to be sponsored by a Committer.

@stuart-marks
Copy link
Member

/sponsor

@stuart-marks
Copy link
Member

I've also written a release note for this change. Please review.

https://bugs.openjdk.java.net/browse/JDK-8284975

@openjdk
Copy link

openjdk bot commented Apr 19, 2022

Going to push as commit 87faa85.
Since your change was applied there have been 81 commits pushed to the master branch:

  • 41fc078: 8284112: Minor cleanup could be done in javax.crypto
  • 897d6c0: 8282008: Incorrect handling of quoted arguments in ProcessBuilder
  • ffdeb32: 8284928: Add links from SourceVersion to specific JLS versions
  • d3d71ea: 8284922: Fix some doc-comment issues on methods with package access in JDK API
  • 6e36c45: 8284923: Update description of SourceVersion.RELEASE_18
  • c63fabe: 8284935: Improve debug in java.security.jgss
  • ef25e18: 8283870: jdeprscan --help causes an exception when the locale is ja, zh_CN or de
  • 21ea740: 8284699: Include all image types to the J2DBench.ColorConvertOpTests
  • e5041ae: 8144030: [macosx] test java/awt/Frame/ShapeNotSetSometimes/ShapeNotSetSometimes.java fails (again)
  • f5beafa: 8159599: [TEST_BUG] java/awt/Modal/ModalInternalFrameTest/ModalInternalFrameTest.java
  • ... and 71 more: https://git.openjdk.java.net/jdk/compare/34914f12bee75045e686b5bbe16ec24d116533d5...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 19, 2022
@openjdk openjdk bot closed this Apr 19, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Apr 19, 2022
@openjdk
Copy link

openjdk bot commented Apr 19, 2022

@stuart-marks @XenoAmess Pushed as commit 87faa85.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org i18n i18n-dev@openjdk.org integrated Pull request has been integrated net net-dev@openjdk.org nio nio-dev@openjdk.org security security-dev@openjdk.org