Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8313678 - SymbolTable can leak Symbols during cleanup #15137

Closed
wants to merge 10 commits into from

Conversation

olivergillespie
Copy link
Contributor

@olivergillespie olivergillespie commented Aug 3, 2023

Fix leak in SymbolTable during cleanup.

  1. symbol1 inserted in bucket, refcount 1
  2. Decrement refcount for this symbol, refcount is now 0
  3. symbol2 inserted in same bucket
  4. during insertion, concurrentHashTable notices there is a dead entry in this bucket
    4a. This triggers delete_in_bucket, which triggers SymbolTableLookup.equals for symbol2
    4b. SymbolTableLookup.equals for symbol2 spuriously increments symbol2's refcount
  5. symbol2 is newly inserted with a refcount of 2, thus can never be cleaned up -> leak

The cleanup routine of concurrentHashTable uses the lookup function for its secondary effect of checking whether any particular value is dead or not. It does not actually care about the main purpose of the lookup function, i.e. checking if a particular value is the desired value.

The lookup function for SymbolTable, however, has the side effect of incrementing the Symbol refcount when the lookup succeeds. This is presumably with the assumption that a successful lookup will result in a newly held reference. concurrentHashTable's delete_in_bucket can break this assumption by completing a successful lookup without retaining the result (nor ever decrementing the refcount).

Thus, with a particular sequence of events (as shown in the new test case), a newly inserted Symbol can have a refcount of 2 instead of 1. Even if all legitimate references to this Symbol are removed, it will remain live, thus leaking.

The fix allows the caller of .equals to specify if they intend to use the value after the lookup, or not, allowing SymbolTable to only increment the refcount when appropriate.

I'd be happy to hear if anyone has any alternative suggestions for the fix, it seems a bit awkward/dangerous in general that SymbolTableLookup's equals increments the refcount. Also, any suggestions for implementing the test case without relying on the String.hashCode implementation would be great.

Thanks @shipilev for help debugging and fixing this :)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8313678: SymbolTable can leak Symbols during cleanup (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/15137/head:pull/15137
$ git checkout pull/15137

Update a local copy of the PR:
$ git checkout pull/15137
$ git pull https://git.openjdk.org/jdk.git pull/15137/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 15137

View PR using the GUI difftool:
$ git pr show -t 15137

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/15137.diff

Webrev

Link to Webrev Comment

The cleanup routine of concurrentHashTable uses the lookup
function for its secondary effect of checking whether any
particular value is dead or not. It does not actually care about
the main purpose of the lookup function, i.e. checking if a particular
value is the desired value.

The lookup function for SymbolTable, however, has the side effect of
incrementing the Symbol refcount when the lookup succeeds. This is
presumably with the assumption that a successful lookup will result
in a newly held reference. concurrentHashTable's delete_in_bucket
can break this assumption by completing a successful lookup without
retaining the result (nor ever decrementing the refcount).

Thus, with a particular sequence of events (as shown in the new
test case), a newly inserted Symbol can have a refcount of 2 instead
of 1. Even if all legitimate references to this Symbol are removed,
it will remain live, thus leaking.

The fix allows the caller of .equals to specify if they intend to
use the value after the lookup, or not, allowing SymbolTable to
only increment the refcount when appropriate.
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 3, 2023

👋 Welcome back ogillespie! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 3, 2023
@openjdk
Copy link

openjdk bot commented Aug 3, 2023

@olivergillespie The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Aug 3, 2023
@mlbridge
Copy link

mlbridge bot commented Aug 3, 2023

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at this right now, and seems reasonable, but trying to think of another way other than adding this is_used_after parameter.

@olivergillespie
Copy link
Contributor Author

Thanks :) I'll be interested to hear what you think - I'm open to any suggestions.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does equals bump the ref count? That seems a very strange thing to do as there is no "opposing" operation that would decrement the ref count again. ???

@olivergillespie
Copy link
Contributor Author

Thanks for looking, David. I'm not sure. I can only guess that it was intended to cover all the possible lookup cases, so that any successful (equals only increments refcount if it matches) lookup bumps the refcount, making it impossible to forget. And it's true that the majority of the successful equals calls correspond to lookups so the refcount needs to be incremented by someone.
The alternatives I see are:

  • increment after lookup in SymbolTable
  • let the symbol table caller increment (they already manage decrementing)

But really I don't know so I'm hoping someone else has insight.

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good analysis and find. I'm wondering if it would be cleaner for the LookupFunction class to have an is_dead() function to return whether the entry is dead, rather than calling equals. The call to equals is surprising in delete_in_bucket.

The reason that the Lookup class increments refcount is that it has the is_dead return, so needs to keep the entry alive at that point. Also, Symbol refcounts going to zero in a surprising manner is a source of frequent bugs. Moving that to the Get class (SymbolTableGet) might be too late - ie something may have discovered that the refcount went to zero before incrementing it.

SymbolTable::try_increment_refcount() will fail if the refcount is already zero and we don't want lookup to find that entry.

Can you see if adding a is_dead() function looks nicer here? The remove_entry() case doesn't matter because we're removing the entry anyway, I believe.

@openjdk openjdk bot removed the rfr Pull request is ready for review label Aug 9, 2023
Whitespace
@olivergillespie
Copy link
Contributor Author

Thanks Coleen, I think the dedicated is_dead function looks good. Let me know what you think.

Whitespace
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 9, 2023
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do think the is_dead() dedicated function looks like an improvement. The peek() call is correct for the WeakHandle entries. Thank you.

@openjdk
Copy link

openjdk bot commented Aug 9, 2023

@olivergillespie This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8313678: SymbolTable can leak Symbols during cleanup

Reviewed-by: coleenp, dholmes, shade

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 70 new commits pushed to the master branch:

  • f41c267: 8314045: ArithmeticException in GaloisCounterMode
  • 911d1db: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp
  • 6574dd7: 8314025: Remove JUnit-based test in java/lang/invoke from problem list
  • 207bd00: 8313756: [BACKOUT] 8308682: Enhance AES performance
  • 823f5b9: 8308850: Change JVM options with small ranges that get -Wconversion warnings to 32 bits
  • 5bfb82e: 8314119: G1: Fix -Wconversion warnings in G1CardSetInlinePtr::card_pos_for
  • 06aa3c5: 8314118: Update JMH devkit to 1.37
  • 4164693: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
  • 049b55f: 8314019: Add gc logging to jdk/jfr/event/gc/detailed/TestZAllocationStallEvent.java
  • a39ed10: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636
  • ... and 60 more: https://git.openjdk.org/jdk/compare/a1115a7a39438438ec247743718cdc1ec59823d6...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@coleenp, @dholmes-ora, @shipilev) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 9, 2023
Better to use the new is_dead function.
@olivergillespie
Copy link
Contributor Author

Thanks for the review Coleen. Actually @shipilev suggested that with the new is_dead function we might as well clean up the is_dead return value from .equals altogether, and I think it makes sense. Please take a look at the latest commit if you can. If you don't like it or it's not worth adding to this change, I'll revert it.

It's implemented wrong, and not sure it's worth doing anyway.
Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if it would be safe (ie, the dead test needed to be in equals) but appears safe because it's at the same time. It's a good suggestion and a nice cleanup as well.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this version. Yes, I think peeling off is_dead completely from equals is safe. The only place where it is used along with equals is ConcurrentHashTable::get_node, where we collect the "is_dead" info to trigger cleanup. The symbols that are dead stay dead, so if equals replied is_dead = false earlier for a Symbol, then it would report is_dead(sym) == false again later.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on @coleenp 's explanation of why the increment to the ref count is done, I expected to see it removed now that equals(Symbol* value) no longer cares about is_dead. I still find it very surprising that equals would have this side-effect of bumping the ref-count. At the very least there should be an explanatory comment as to why it is there and some guidance on when the caller should decrement again.

@olivergillespie
Copy link
Contributor Author

Thanks all.

@dholmes-ora :

Along with the is_dead return, Coleen also mentioned:

Moving [increment refcount] to the Get class (SymbolTableGet) might be too late - ie something may have discovered that the refcount went to zero before incrementing it.

Which I understand to mean it's safest to keep it in equals, if possible. That being said, as far as I can tell, LOOKUP_FUNC (SymbolTableLookup) and FOUND_FUNC (SymbolTableGet) do seem to always be called within the same critical section, with no is_dead cleanup in-between so maybe it would be okay with the current code to move it to FOUND_FUNC. But I suppose ensuring nobody relies on the is_dead value between those two functions in the future is tricky.

Equally I'm happy to add a comment, though I'm not sure I understand it perfectly so happy to take suggestions. How about something like:

When this returns true (i.e. the value matches this symbol, and this symbol is not dead), 
the symbol's reference count is incremented.
This is to ensure that as soon as a symbol is found in the table, it is kept alive for the 
caller - otherwise another thread could consider it dead and clean it up, before the caller 
has a chance to strongly reference it.
The caller is responsible for calling `Symbol::decrement_refcount` when finished with the symbol.

@coleenp
Copy link
Contributor

coleenp commented Aug 10, 2023

The comment seems fine. Bit verbose but that's ok. Don't move the refcount out of equals into the lookup function. When we implemented this, this was where we thought this needed to be for safety, and this code has not changed to change that.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@olivergillespie
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 11, 2023
@openjdk
Copy link

openjdk bot commented Aug 11, 2023

@olivergillespie
Your change (at version dd5ee5f) is now ready to be sponsored by a Committer.

@shipilev
Copy link
Member

@coleenp, @dholmes-ora -- want to run it via your testing systems before we integrate, or?

This passes tests on our side.

@coleenp
Copy link
Contributor

coleenp commented Aug 11, 2023

I could run tests over the weekend if you want. We have some closed tests that can be displeased when Symbol refcounts unexpectedly go to zero.

@shipilev
Copy link
Member

I could run tests over the weekend if you want. We have some closed tests that can be displeased when Symbol refcounts unexpectedly go to zero.

Yes, I think that would be helpful.

@coleenp
Copy link
Contributor

coleenp commented Aug 11, 2023

It was helpful. This test failed:

make test TEST=runtime/cds/appcds/dynamicArchive/DynamicSharedSymbols.java
java.lang.RuntimeException: '17 3: jdk/test/lib/apps ' missing from stdout/stderr

It may be that we decremented a refcount in this symbol during cleanup.

Edit: yes, I think that's the case, the output has: 17 2: jdk/test/lib/apps
If I comment out delete_in_bucket, then the failure is the same (ie the refcount wasn't incremented when cleaning that bucket).

This change should fix the test too.

The test assertion relied on the buggy behaviour.
Fix the desired ref count.
@openjdk openjdk bot removed the sponsor Pull request is ready to be sponsored label Aug 14, 2023
@olivergillespie
Copy link
Contributor Author

Thanks for spotting, updated the test and it now passes

Copy link
Contributor

@coleenp coleenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing the test, and good idea to run them on this change. All the other testing was clean for this change.

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I think we are good to go.

@olivergillespie
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 14, 2023
@openjdk
Copy link

openjdk bot commented Aug 14, 2023

@olivergillespie
Your change (at version 35f0539) is now ready to be sponsored by a Committer.

@shipilev
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Aug 14, 2023

Going to push as commit 4b2703a.
Since your change was applied there have been 70 commits pushed to the master branch:

  • f41c267: 8314045: ArithmeticException in GaloisCounterMode
  • 911d1db: 8314078: HotSpotConstantPool.lookupField() asserts due to field changes in ConstantPool.cpp
  • 6574dd7: 8314025: Remove JUnit-based test in java/lang/invoke from problem list
  • 207bd00: 8313756: [BACKOUT] 8308682: Enhance AES performance
  • 823f5b9: 8308850: Change JVM options with small ranges that get -Wconversion warnings to 32 bits
  • 5bfb82e: 8314119: G1: Fix -Wconversion warnings in G1CardSetInlinePtr::card_pos_for
  • 06aa3c5: 8314118: Update JMH devkit to 1.37
  • 4164693: 8313372: [JVMCI] Export vmIntrinsics::is_intrinsic_available results to JVMCI compilers.
  • 049b55f: 8314019: Add gc logging to jdk/jfr/event/gc/detailed/TestZAllocationStallEvent.java
  • a39ed10: 8314116: C2: assert(false) failed: malformed control flow after JDK-8305636
  • ... and 60 more: https://git.openjdk.org/jdk/compare/a1115a7a39438438ec247743718cdc1ec59823d6...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 14, 2023
@openjdk openjdk bot closed this Aug 14, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Aug 14, 2023
@openjdk
Copy link

openjdk bot commented Aug 14, 2023

@shipilev @olivergillespie Pushed as commit 4b2703a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@olivergillespie olivergillespie deleted the 8313678 branch August 17, 2023 10:40
@olivergillespie
Copy link
Contributor Author

/backport jdk17u

@openjdk
Copy link

openjdk bot commented Aug 18, 2023

@olivergillespie Could not automatically backport 4b2703ad to openjdk/jdk17u due to conflicts in the following files:

  • src/hotspot/share/classfile/dictionary.cpp
  • src/hotspot/share/classfile/stringTable.cpp
  • src/hotspot/share/classfile/symbolTable.cpp
  • src/hotspot/share/gc/g1/g1CardSet.cpp
  • src/hotspot/share/prims/resolvedMethodTable.cpp
  • src/hotspot/share/services/finalizerService.cpp
  • src/hotspot/share/utilities/concurrentHashTable.inline.hpp
  • test/hotspot/jtreg/runtime/cds/appcds/dynamicArchive/DynamicSharedSymbols.java

Please fetch the appropriate branch/commit and manually resolve these conflicts by using the following commands in your personal fork of openjdk/jdk17u. Note: these commands are just some suggestions and you can use other equivalent commands you know.

# Fetch the up-to-date version of the target branch
$ git fetch --no-tags https://git.openjdk.org/jdk17u.git master:master

# Check out the target branch and create your own branch to backport
$ git checkout master
$ git checkout -b olivergillespie-backport-4b2703ad

# Fetch the commit you want to backport
$ git fetch --no-tags https://git.openjdk.org/jdk.git 4b2703ad39f8160264eb30c797824cc93a6b56e2

# Backport the commit
$ git cherry-pick --no-commit 4b2703ad39f8160264eb30c797824cc93a6b56e2
# Resolve conflicts now

# Commit the files you have modified
$ git add files/with/resolved/conflicts
$ git commit -m 'Backport 4b2703ad39f8160264eb30c797824cc93a6b56e2'

Once you have resolved the conflicts as explained above continue with creating a pull request towards the openjdk/jdk17u with the title Backport 4b2703ad39f8160264eb30c797824cc93a6b56e2.

@olivergillespie
Copy link
Contributor Author

/backport jdk21u

@openjdk
Copy link

openjdk bot commented Aug 21, 2023

@olivergillespie the backport was successfully created on the branch olivergillespie-backport-4b2703ad in my personal fork of openjdk/jdk21u. To create a pull request with this backport targeting openjdk/jdk21u:master, just click the following link:

➡️ Create pull request

The title of the pull request is automatically filled in correctly and below you find a suggestion for the pull request body:

Hi all,

This pull request contains a backport of commit 4b2703ad from the openjdk/jdk repository.

The commit being backported was authored by Oli Gillespie on 14 Aug 2023 and was reviewed by Coleen Phillimore, David Holmes and Aleksey Shipilev.

Thanks!

If you need to update the source branch of the pull then run the following commands in a local clone of your personal fork of openjdk/jdk21u:

$ git fetch https://github.com/openjdk-bots/jdk21u.git olivergillespie-backport-4b2703ad:olivergillespie-backport-4b2703ad
$ git checkout olivergillespie-backport-4b2703ad
# make changes
$ git add paths/to/changed/files
$ git commit --message 'Describe additional changes made'
$ git push https://github.com/openjdk-bots/jdk21u.git olivergillespie-backport-4b2703ad

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
4 participants