Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JDK-8255603: Memory/Performance regression after JDK-8210985 #937

Closed
wants to merge 2 commits into from

Conversation

@RealCLanger
Copy link
Contributor

@RealCLanger RealCLanger commented Oct 29, 2020

It seems that there exists a memory/performance regression that was introduced with JDK-8210985: Update the default SSL session cache size to 20480.

The idea to limit the maixmum SSL session cache size by itself is good. Unfortunately, as per the current implementation of sun.security.util.MemoryCache, it also modifies the initial size of the LinkedHashMap that is backing the cache to initialize with more than the maximum size.

I suggest to restore the original behavior that initializes with an initialCapacity of 1 in most cases. That was true when before JDK-8210985, the property javax.net.ssl.sessionCacheSize was not set at all.
In case it was set, the initial size would have been like now, (javax.net.ssl.sessionCacheSize / 0.75f) + 1, which still seems strange.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Testing

Linux x64 Linux x86 Windows x64 macOS x64
Build ✔️ (5/5 passed) ✔️ (2/2 passed) ✔️ (2/2 passed) ✔️ (2/2 passed)
Test (tier1) ✔️ (9/9 passed) ✔️ (6/6 passed) ✔️ (2/2 passed) ✔️ (4/4 passed)

Issue

  • JDK-8255603: Memory/Performance regression after JDK-8210985

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/937/head:pull/937
$ git checkout pull/937

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Oct 29, 2020

👋 Welcome back clanger! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Oct 29, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Oct 29, 2020

@RealCLanger The following label will be automatically applied to this pull request:

  • security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the security label Oct 29, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Oct 29, 2020

Webrevs

@XueleiFan
Copy link
Member

@XueleiFan XueleiFan commented Oct 29, 2020

Did you have a benchmark with various cache sizes (for example, from 1 to 10K) and various connections (for example from 1 to 10K) for those components (including TLS implementation) that use Cache?

@RealCLanger
Copy link
Contributor Author

@RealCLanger RealCLanger commented Oct 29, 2020

Did you have a benchmark with various cache sizes (for example, from 1 to 10K) and various connections (for example from 1 to 10K) for those components (including TLS implementation) that use Cache?

Nope, we've just seen the memory regression in a certain customer use case (lot's of meory retained by a finalizer) and confirmed it resolved after setting javax.net.ssl.sessionCacheSize to 0.

But I guess this change merits certain further benchmarking to get it right.

@simonis
Copy link
Member

@simonis simonis commented Oct 29, 2020

Congratulations! You beat me about half an hour with this issue :)

I was debugging this regression already the whole day and when I searched for JDK-8210985 in my mailbox I saw your post.

@XueleiFan
Copy link
Member

@XueleiFan XueleiFan commented Oct 29, 2020

Did you have a benchmark with various cache sizes (for example, from 1 to 10K) and various connections (for example from 1 to 10K) for those components (including TLS implementation) that use Cache?

Nope, we've just seen the memory regression in a certain customer use case (lot's of meory retained by a finalizer) and confirmed it resolved after setting javax.net.ssl.sessionCacheSize to 0.

Did you have this patch checked with the customer? I think the performance may be similar or improved comparing to set the cache size to 0.

But I guess this change merits certain further benchmarking to get it right.

It looks good to me, but we may be more confident with it if there is a benchmarking.

@RealCLanger
Copy link
Contributor Author

@RealCLanger RealCLanger commented Oct 29, 2020

We're currently rolling out a patch to our SAP JVM shipment (based on Oracle's JDK 8 licensee repository) with exactly this content. We will then check with the customer but I'd suspect his results will about the same as with -Djavax.net.ssl.sessionCacheSize=0.

If you require some benchmarking I guess it'll take me some more time.

In the end I doubt that we'll find a better default value than 1 for the cache size as it's hard to predict how full a cache will be in the average. Maybe one could spend a property for the initial size - but I also doubt that's worth the effort.

I also think that for the use case in StatusResponseManager's responseCache the influence of a different initial value is neglectable.

@simonis
Copy link
Member

@simonis simonis commented Oct 29, 2020

I can confirm that JDK-8210985 can cause massive regressions in memory consumption. Also, JDK-8253116
Performance regression observed post upgrade to 8u261
is clearly a duplicate for this.

I'm fine with setting the initial cache size to 1 as this restores the original behavior for javax.net.ssl.sessionCacheSize=0.

The other possibility would be to use the default size for LinkedHashMap but that's not easy if we want to set the accessOrder because the only constructor which takes the accessOrder also requires the specification of initialCapacitiy and loadFactor. Otherwise, LinkedHashMaps capacity defaults to HashMaps default capacity which is 16.

Setting the initialCapacity to 1 will initialize the hash map with one bucket (see initialization code in HashMap).

    /**
     * Returns a power of two size for the given target capacity.
     */
    static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

Benchmarking is probably hard because we don't know the average occupancy of the map.

So in the end I think 1 is a good solution for now.

@openjdk
Copy link

@openjdk openjdk bot commented Oct 29, 2020

@RealCLanger This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8255603: Memory/Performance regression after JDK-8210985

Reviewed-by: simonis, xuelei, aph

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the master branch:

  • e48016b: 8255565: [Vector API] Add missing format strings for extract instructs in x86.ad
  • 2c7fc85: 8254972: Fix pretouch chunk calculations
  • d128191: 8253473: Javadoc clean up in HttpHandler, HttpPrincipal, HttpContext, and HttpsConfigurator
  • 379ba80: 8255595: delay_to_keep_mmu passes wrong arguments to Monitor wait
  • 1a89d68: 8255285: Move JVMFlag origins into a new enum JVMFlagOrigin
  • 56eb5f5: 8255466: C2 crashes at ciObject::get_oop() const+0x0
  • 5782a2a: 8254975: lambda proxy fails to access a protected member inherited from a split package
  • d5138d1: 8255604: java/nio/channels/DatagramChannel/Connect.java fails with java.net.BindException: Cannot assign requested address: connect
  • 2a2fa13: 8255449: Improve the exception message of MethodHandles::permuteArguments
  • 2a50c3f: 8241495: Make more compiler related flags available on a per method level
  • ... and 1 more: https://git.openjdk.java.net/jdk/compare/9e5bbff51d5f2c305a3bac04d4daf3a9ed319d32...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Oct 29, 2020
@XueleiFan
Copy link
Member

@XueleiFan XueleiFan commented Oct 29, 2020

Benchmarking is probably hard because we don't know the average occupancy of the map.

I agreed. No matter what the default value is, it will not fit perfectly in all situations. The value 1 may be fit for small workload applications, but not good for big workload applications. Applications could use the size setting APIs for the tuning. For this update, I think the impact for various workload may be limited/acceptable, but I'm not very sure of it. Benchmarking data with various workload would help us for a better sense.

Copy link
Member

@XueleiFan XueleiFan left a comment

No regression test.

@simonis
Copy link
Member

@simonis simonis commented Oct 29, 2020

Benchmarking is probably hard because we don't know the average occupancy of the map.

I agreed. No matter what the default value is, it will not fit perfectly in all situations. The value 1 may be fit for small workload applications, but not good for big workload applications. Applications could use the size setting APIs for the tuning. For this update, I think the impact for various workload may be limited/acceptable, but I'm not very sure of it. Benchmarking data with various workload would help us for a better sense.

But we did run with 1 for quite a long time without somebody complaining :)

@XueleiFan
Copy link
Member

@XueleiFan XueleiFan commented Oct 29, 2020

Benchmarking is probably hard because we don't know the average occupancy of the map.

I agreed. No matter what the default value is, it will not fit perfectly in all situations. The value 1 may be fit for small workload applications, but not good for big workload applications. Applications could use the size setting APIs for the tuning. For this update, I think the impact for various workload may be limited/acceptable, but I'm not very sure of it. Benchmarking data with various workload would help us for a better sense.

But we did run with 1 for quite a long time without somebody complaining :)

Yes, I think it is a safe update and looks good to me. I believe the impact should be minimal. But normally, I would like to check with a test for sure. If no regression test, an explain with noreg tag may be needed. External testing, like a confirmation of no performance regression any longer in an existing application, is fine.

I don't want to block this integration, please go ahead if you are confident with it.

@RealCLanger
Copy link
Contributor Author

@RealCLanger RealCLanger commented Oct 29, 2020

I added the noreg-hard label to the JBS bug. I'll wait for any further input til monday, before integration.

@XueleiFan would be nice if you could approve then, too :)

Copy link
Member

@XueleiFan XueleiFan left a comment

I had a benchmarking with one client and one server for full handshaking, with JMH. I did not see throughput regression with this benchmarking, and the improvement is not visible although. The benchmarking itself may be limited as it is trying to use a large volume of connections.

@@ -265,8 +265,7 @@ public MemoryCache(boolean soft, int maxSize, int lifetime) {
else
this.queue = null;

int buckets = (int)(maxSize / LOAD_FACTOR) + 1;
cacheMap = new LinkedHashMap<>(buckets, LOAD_FACTOR, true);
cacheMap = new LinkedHashMap<>(1, LOAD_FACTOR, true);
Copy link
Contributor

@theRealAph theRealAph Oct 30, 2020

This looks right. The idea of scaling the initial cache size to the maximum size seems obviously to be wrong.

@RealCLanger
Copy link
Contributor Author

@RealCLanger RealCLanger commented Nov 1, 2020

/integrate

@openjdk openjdk bot closed this Nov 1, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Nov 1, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Nov 1, 2020

@RealCLanger Since your change was applied there have been 30 commits pushed to the master branch:

  • 518ff51: 8233569: [TESTBUG] JTextComponent test bug6361367.java fails on macos
  • 4c4b8f4: 8196302: javax/swing/JFileChooser/8041694/bug8041694.java
  • f61ce32: 8255576: (fs) Files.isHidden() throws ArrayIndexOutOfBoundsException (unix)
  • fe7672b: 8196099: javax/swing/text/CSSBorder/6796710/bug6796710.java fails
  • cacce84: 8169954: JFileChooser/8021253: java.lang.RuntimeException: Default button is not pressed
  • 7597cba: 8143021: [TEST_BUG] Test javax/swing/JColorChooser/Test6541987.java fails
  • 80380d5: 8255494: PKCS7 should use digest algorithm to verify the signature
  • 9d5c9cc: 8254309: appcds GCDuringDump.java failed - class must exist
  • 36c150b: 8255489: Unify the parsing of @lambda-proxy and @lambda-form-invokers tags in a classlist
  • 0f48603: 8214561: Use {@systemProperty} for definition of "java.util.prefs.PreferencesFactory" system property
  • ... and 20 more: https://git.openjdk.java.net/jdk/compare/9e5bbff51d5f2c305a3bac04d4daf3a9ed319d32...master

Your commit was automatically rebased without conflicts.

Pushed as commit 64feeab.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk-notifier bot referenced this issue Nov 1, 2020
@RealCLanger RealCLanger deleted the jdk-8255603 branch Nov 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
4 participants