8259886: Improve SSL session cache performance and scalability #2255
Conversation
Hi @djelinski, welcome to this OpenJDK project and thanks for contributing! We do not recognize you as a Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request. If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user djelinski" as summary for the issue. If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.
@djelinski The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.
/covered
Thank you! Please allow for a few business days to verify that your employer has signed the OCA. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!
Webrevs
Build change looks good, but I would like to hear from @cl4es too.
@djelinski This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 545 new commits pushed to the `master` branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@erikj79, @XueleiFan) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment.
Adding an
The impact could be beyond the JSSE implementation, and I will have a look as well.
If I get the patch right, the benchmark performance improvement is a trade-off between CPU and memory, achieved by keeping expired entries while putting a new entry in the cache. I'm not very sure of the performance impact on memory and GC collections. Would you mind adding two more types of benchmarks, getting entries and removing entries, for cases where 1/10, 1/4, 1/3 and 1/2 of the entries in the cache are expired? And increasing the size to some bigger scales, like 2M and 20M.
It looks like a spec update, as it may change the behavior of a few JDK components (TLS session cache, OCSP stapling response cache, cert store cache, certificate factory, etc.), because "expired entries are no longer guaranteed to be removed before live ones". I'm not very sure of the impact. I would suggest filing a CSR and having more eyes check the compatibility impact before moving forward.
Not exactly. The memory use is capped by the cache size. The patch is a trade-off between the cache's hit/miss ratio and CPU; we will get faster cache access at the cost of more frequent cache misses. All calls to

|K=1, exp=11|K=2, exp=12|K=3, exp=13|K=4, exp=14|

If we now add another item to the queue, it will overflow. Here's where the implementations behave differently, but the outcome is identical: the old one scans the entire list for expired entries; the new one improves performance by ending the search for expired entries after encountering the first non-expired entry (which is the first entry in the above example). The end result is the same in both cases: the oldest (least recently accessed) item is dropped:

|K=2, exp=12|K=3, exp=13|K=4, exp=14|K=5, exp=15|

Example 2: insertions at a moderate pace, with interleaved reads. Here the new implementation improves performance, but at a possible cost of wasting cache capacity on expired entries. Consider a cache with size=4, timeout=7, and the following sequence of events:

|K=3, exp=10|K=5, exp=12|K=7, exp=14|K=1, exp=8|
If we wait for the item with K=1 to expire and then add another item to the queue, it will overflow. Here's where the implementations behave differently, and the outcome is different: the old one scans the entire list for expired entries, finds the entry with K=1 and drops it; the new one gives up after the first non-expired entry (which is the first entry), and drops the first entry. So, when we perform:

The old implementation will get:

The new implementation will get:

Note that:
Example 3: insertions at a slow pace, where most items expire before the queue overflows. Here the new implementation improves memory consumption. Consider a cache with size=4, timeout=1, and the following sequence of events:

|K=1, exp=2 (expired)|K=3, exp=4 (expired)|K=5, exp=6 (expired)|K=7, exp=8|

The new implementation removes expired entries on every put, so after the last put only one entry is in the cache:

|K=7, exp=8|

After another put the old implementation will encounter a cache overflow and remove all expired items. Let me know if that helped.
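To make the two eviction strategies concrete, here is a minimal, self-contained sketch. This is a simplified model only, not the actual sun.security.util.MemoryCache code; the class and field names are invented:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of an LRU cache with expiring entries; illustrates the
// old ("scan the whole queue for expired entries") and new ("stop at the
// first non-expired entry") eviction strategies. Not the actual JDK code.
class LruExpiryCache<K, V> {
    private record Entry<V>(V value, long expiryTime) {}

    private final int capacity;
    private final boolean stopAtFirstLive;   // true = new strategy
    // accessOrder=true: iteration order = least recently used first
    private final LinkedHashMap<K, Entry<V>> map =
            new LinkedHashMap<>(16, 0.75f, true);

    LruExpiryCache(int capacity, boolean stopAtFirstLive) {
        this.capacity = capacity;
        this.stopAtFirstLive = stopAtFirstLive;
    }

    void put(K key, V value, long expiryTime, long now) {
        map.put(key, new Entry<>(value, expiryTime));
        if (map.size() > capacity) {
            expunge(now);
            // still over capacity: drop the least recently used entries
            Iterator<K> it = map.keySet().iterator();
            while (map.size() > capacity && it.hasNext()) {
                it.next();
                it.remove();
            }
        }
    }

    private void expunge(long now) {
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().expiryTime() <= now) {
                it.remove();            // expired: always drop
            } else if (stopAtFirstLive) {
                return;                 // new strategy: give up early
            }                           // old strategy: keep scanning
        }
    }

    V get(K key, long now) {
        Entry<V> e = map.get(key);      // also refreshes the LRU position
        return (e == null || e.expiryTime() <= now) ? null : e.value();
    }
}
```

Running Example 2 through this model reproduces the difference described above: with stopAtFirstLive=false the expired K=1 is dropped, while with stopAtFirstLive=true the scan stops at the live K=3 and the LRU eviction drops K=3 instead.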
Both these operations are constant-time, both before and after my changes. Do you expect to see some oddities here, or do we just want a benchmark that could be used to compare other implementations?
Can do. Do you think it makes sense to also benchmark the scenario where GC kicks in and collects soft references?
Of all uses of Cache, only

How do I file a CSR?

Also, what do you think about the changes done in
Added benchmarks for get & remove. Added tests for 5M cache size. Switched time units to nanoseconds. Results:
Thank you for the comment. The big picture is clearer to me now.
K=3 is not expired yet, but gets removed, while K=1 is kept. This behavior change may cause more overall performance hurt than the improvement to the cache put/get performance. For example, the caller may then need to fetch the value remotely. A full handshake or OCSP status fetch could counteract all the performance gained from the cache update.
I think the idea that put() removes expired items from the front of the queue is good. I was wondering if it is an option to have the get() method remove expired items until the first non-expired item, without scanning the full queue or changing the order of the queue. But there is still an issue: the SoftReference may have cleared an item that is still valid. In general, I think get() performance is more important than put() performance, as get() is called more frequently. So we should try to keep the cache small if possible.
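A rough sketch of that idea (hypothetical, with invented names, not code from the patch): the queue stays in insertion order, and get() trims expired entries from the head until it meets the first live one.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Hypothetical variant of the suggestion above: entries stay in insertion
// order (no LRU reordering), and get() drops expired entries from the head
// of the queue until it meets the first live one. Invented for illustration.
class HeadTrimmingCache<K, V> {
    private record Entry<V>(V value, long expiryTime) {}

    // default accessOrder=false: iteration order = insertion order
    private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>();

    void put(K key, V value, long expiryTime) {
        map.put(key, new Entry<>(value, expiryTime));
    }

    V get(K key, long now) {
        Iterator<Entry<V>> it = map.values().iterator();
        while (it.hasNext()) {
            if (it.next().expiryTime() <= now) {
                it.remove();        // head entry expired: drop it
            } else {
                break;              // first live entry: stop scanning
            }
        }
        Entry<V> e = map.get(key);  // plain lookup, queue order unchanged
        return (e == null || e.expiryTime() <= now) ? null : e.value();
    }
}
```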
In the update, the SoftReference.clear() calls get removed. I'm not sure of the impact of the objects now being enqueued. In theory, clearing could improve the memory use, so removing it could counteract the performance gain in some situations.
See above; it is a concern to me that the soft references are no longer cleared with this update.
Could you edit the bug: https://bugs.openjdk.java.net/browse/JDK-8259886? In the "More" drop-down menu, there is a "Create CSR" option. You can do it once we have an agreement about the solution and impact.
Thanks for your review! Some comments below.
Yes, but that's unlikely. Note that K=3 is before K=1 in the queue only because K=3 wasn't used since K=1 was last used. This means that either K=3 is used less frequently than K=1, or all cached items are in active use. In the former case we don't lose much by dropping K=3 (granted, there's nothing to offset that). In the latter case we are dealing with a full cache at all times, which means that most
If we do that, frequently used entries will be evicted at the same age as never-used ones. This means we will have to recompute (full handshake / fresh OCSP) both the frequently used and the infrequently used entries. It's better to recompute only the infrequently used ones and reuse the frequently used ones as long as possible; we will do less work that way.
I don't see the link; could you explain?
That's the best part: no objects ever get enqueued! We only called
I'd need an account on the bug tracker first.
So, how do we want to proceed here? Is the proposed solution acceptable? If not, what needs to change? If yes, what do I need to do next?
I may think about it differently. It may be hard to predict the future frequency of a cached item based on its past behavior. For the above case, I'm not sure that K=3 is used less frequently than K=1. Maybe, in the next few seconds, K=1 could be used more frequently. I would like a solution that follows the timeout specification: keep the newer items if possible.
See above. It may be possible to determine the frequency for some cases, but Cache is a general class and we may want to be more careful about whether we are really able to determine the frequency within the Cache implementation.
Link? Did you mean the link to the get() method? It is a method in the Cache class.
I need more time for this section.
Okay. No worries, I will help you once we reach an agreement about the update.
For me, it is a pretty good solution, but I have some concerns. I would appreciate it if you could read my comments and see if we can reach an agreement.
I agree that such a prediction might not be 100% accurate. But a quick Google search reveals many articles claiming that LRU caches offer better hit rates than FIFO, especially for in-memory caches.
That's a trivial change; all we need to do is change

I could keep LRU and add another linked list that would store items in the order of their expiration dates; then we could quickly scan that list for expired items. Note: the order of expiration dates is not necessarily the order of insertion, because 1)
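For illustration, the two-orderings idea could look like the sketch below. For brevity it substitutes a PriorityQueue for the second linked list, which gives O(log n) updates rather than the constant time an intrusive linked list would allow; all names are invented and this is not code from the patch:

```java
import java.util.LinkedHashMap;
import java.util.PriorityQueue;

// Illustrative sketch of keeping two orderings at once: LRU order for
// evicting live entries, plus a separate expiration order for finding
// expired entries quickly. Invented for illustration only.
class DualOrderCache<K, V> {
    private record Node<K, V>(K key, V value, long expiryTime) {}

    private final int capacity;
    // accessOrder=true: iteration order = least recently used first
    private final LinkedHashMap<K, Node<K, V>> lru =
            new LinkedHashMap<>(16, 0.75f, true);
    // min-heap ordered by expiry time; stale heap entries are ignored lazily
    private final PriorityQueue<Node<K, V>> byExpiry =
            new PriorityQueue<>((a, b) -> Long.compare(a.expiryTime(), b.expiryTime()));

    DualOrderCache(int capacity) { this.capacity = capacity; }

    void put(K key, V value, long expiryTime, long now) {
        Node<K, V> node = new Node<>(key, value, expiryTime);
        lru.put(key, node);
        byExpiry.add(node);
        dropExpired(now);
        while (lru.size() > capacity) {   // still full: evict the LRU head;
            K eldest = lru.keySet().iterator().next();
            lru.remove(eldest);           // its heap entry becomes stale
        }
    }

    private void dropExpired(long now) {
        // pop entries in expiry order until the first live one
        while (!byExpiry.isEmpty() && byExpiry.peek().expiryTime() <= now) {
            Node<K, V> n = byExpiry.poll();
            lru.remove(n.key(), n);       // no-op for stale heap entries
        }
    }
}
```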
Well, if removing all expired items before evicting live ones is a non-negotiable, implementing all operations in constant time is much easier with FIFO, where we only need to keep one item order.
tier1 and jdk_security tests passed; benchmark results show only minimal changes. I verified that none of the classes using
Actually there's a much easier solution to reduce the number of slow
Definitely, I think it is a good improvement. Actually, it is a surprise to me that the current code does not work this way. Sorry, I was/am on vacation, and the review could be delayed for a few days.
I reverted all the earlier Cache changes and added a new commit that caches the earliest expire time of all cached items. The observable behavior of the new code is identical to the original: items are removed from the cache at exactly the same time as before; we only skip scanning the cache when we know that there are no expired items inside. The performance is substantially improved. There can be at most

My reduced set of benchmarks produced the following values:
which is comparable to what was observed with the previous commits.
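As a rough illustration of the approach (a simplified sketch with invented names, not the actual MemoryCache source): remember the earliest expiry time of any cached entry and skip the expiry scan entirely until the clock reaches it.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// Simplified sketch of the final approach: cache the earliest expire time
// of all entries, and skip the expiry scan while nothing can have expired.
// Field and method names are invented for illustration.
class EarliestExpiryCache<K, V> {
    private record Entry<V>(V value, long expiryTime) {}

    private final LinkedHashMap<K, Entry<V>> map =
            new LinkedHashMap<>(16, 0.75f, true);
    // no entry can expire before this time, so earlier scans are no-ops
    private long nextExpirationCheck = Long.MAX_VALUE;

    void put(K key, V value, long expiryTime) {
        map.put(key, new Entry<>(value, expiryTime));
        nextExpirationCheck = Math.min(nextExpirationCheck, expiryTime);
    }

    void expungeExpired(long now) {
        if (now < nextExpirationCheck) {
            return;                      // fast path: nothing can be expired
        }
        long earliest = Long.MAX_VALUE;  // recompute while scanning
        Iterator<Entry<V>> it = map.values().iterator();
        while (it.hasNext()) {
            Entry<V> e = it.next();
            if (e.expiryTime() <= now) {
                it.remove();             // expired: drop it
            } else {
                earliest = Math.min(earliest, e.expiryTime());
            }
        }
        nextExpirationCheck = earliest;
    }
}
```

Entries are still removed at exactly the same times as before; the only change is that the scan becomes a single comparison whenever no entry can have expired yet.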
ping @XueleiFan, I'd appreciate another review.
I would also update the copyright year to 2021. Otherwise, looks good to me. Thank you!
/integrate
@djelinski Your change (at version …) is now ready to be sponsored by a Committer.
/sponsor |
@XueleiFan @djelinski Since your change was applied there have been 548 commits pushed to the `master` branch.
Your commit was automatically rebased without conflicts. Pushed as commit 18fc350. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
Under certain load, MemoryCache operations take a substantial fraction of the time needed to complete SSL handshakes. This series of patches improves performance characteristics of MemoryCache, at the cost of a functional change: expired entries are no longer guaranteed to be removed before live ones. Unused entries are still removed before used ones, and cache performance no longer depends on its capacity.
The first patch in the series contains a benchmark that can be run with `make test TEST="micro:CacheBench"`. Baseline results before any MemoryCache changes:
There's a nonlinear performance drop between 20480 and 204800 entries, probably attributable to CPU cache thrashing. Beyond 204800 entries the cache scales more linearly.
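For reference, a benchmark along these lines might look roughly like the following sketch. This is not the actual CacheBench source; the parameter values echo the sizes discussed in this thread, and since sun.security.util.Cache is JDK-internal, a standalone run would need --add-exports java.base/sun.security.util=ALL-UNNAMED.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import sun.security.util.Cache;

// Sketch of a put/get microbenchmark for sun.security.util.Cache, in the
// spirit of the CacheBench micro added by the first patch (not the actual
// benchmark source).
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class CacheBenchSketch {

    @Param({"20480", "204800", "5120000"})
    int size;

    @Param({"86400", "0"})   // cache lifetime in seconds; 0 = no timeout
    int timeout;

    Cache<Integer, Integer> cache;
    int index;

    @Setup
    public void setup() {
        cache = Cache.newSoftMemoryCache(size, timeout);
        for (int i = 0; i < size; i++) {
            cache.put(i, i);     // start from a full cache
        }
        index = 0;
    }

    @Benchmark
    public Integer putGet() {
        // overwrite-and-read, cycling through the key space
        index = (index + 1) % size;
        cache.put(index, index);
        return cache.get(index); // returned so JMH keeps the read alive
    }
}
```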
Benchmark results after the 2nd and 3rd patches are pretty similar, so I'll only copy one:
The third patch improves worst-case times on a mostly idle cache by scattering removal of expired entries over multiple `put` calls. It does not affect the performance of an overloaded cache.
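One plausible shape for such amortized cleanup, assuming a bounded scan per insertion (the constant and the structure are invented; the actual patch may do this differently):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;

// A guess at the "scattered removal" technique (not taken from the actual
// patch): bound the number of entries examined per put(), so cleanup cost
// is spread across insertions instead of paid all at once.
class AmortizedExpiryCache<K, V> {
    private record Entry<V>(V value, long expiryTime) {}

    private static final int EXPUNGE_PER_PUT = 4;   // invented tuning knob

    private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>();

    void put(K key, V value, long expiryTime, long now) {
        map.put(key, new Entry<>(value, expiryTime));
        Iterator<Entry<V>> it = map.values().iterator();
        for (int i = 0; i < EXPUNGE_PER_PUT && it.hasNext(); i++) {
            if (it.next().expiryTime() <= now) {
                it.remove();    // drop an expired entry from the front
            } else {
                break;          // stop at the first live entry
            }
        }
    }
}
```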
The 4th patch removes all code that clears cached values before handing them over to the GC. This comment stated that clearing values was supposed to be a GC performance optimization. It wasn't. Benchmark results after that commit:
I wasn't expecting that much of an improvement, and don't know how to explain it.
The 40 ns difference between the cache with and without a timeout can be attributed to 2 `System.currentTimeMillis()` calls; they were pretty slow on my VM.

Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/2255/head:pull/2255
$ git checkout pull/2255