@eastig
Member

@eastig eastig commented Aug 20, 2025

There is a race between JvmtiClassFileReconstituter::copy_bytecodes and InstanceKlass::link_class_impl. While InstanceKlass::link_class_impl is rewriting bytecodes, the rewritten flag is still false, so JvmtiClassFileReconstituter::copy_bytecodes does not restore them to the original ones. This results in invalid bytecode.

This PR links the class before the copy_bytecodes method is called.
The PR also adds a regression test.

Tested fastdebug and release builds: Linux x86_64 and arm64

  • The reproducer from JDK-8277444 passed.
  • The regression test passed.
  • Tier1 - tier3 passed.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8277444: Data race between JvmtiClassFileReconstituter::copy_bytecodes and class linking (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26863/head:pull/26863
$ git checkout pull/26863

Update a local copy of the PR:
$ git checkout pull/26863
$ git pull https://git.openjdk.org/jdk.git pull/26863/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 26863

View PR using the GUI difftool:
$ git pr show -t 26863

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26863.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Aug 20, 2025

👋 Welcome back eastigeevich! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Aug 20, 2025

@eastig This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8277444: Data race between JvmtiClassFileReconstituter::copy_bytecodes and class linking

Reviewed-by: dholmes, amenkov, coleenp

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 136 new commits pushed to the master branch.

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Aug 20, 2025

@eastig The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability (serviceability-dev@openjdk.org), hotspot (hotspot-dev@openjdk.org), and rfr (Pull request is ready for review) labels Aug 20, 2025
@eastig
Member Author

eastig commented Aug 20, 2025

Hi @coleenp,
Could you please take a look?

@mlbridge

mlbridge bot commented Aug 20, 2025

Webrevs

@dholmes-ora
Member

@eastig I am not sure about this one. Can you clarify please how you can be transforming a class that has not yet been linked? If this is possible then it seems to me we are missing a call to ensure linkage.

@eastig
Member Author

eastig commented Aug 21, 2025

> @eastig I am not sure about this one. Can you clarify please how you can be transforming a class that has not yet been linked? If this is possible then it seems to me we are missing a call to ensure linkage.

Hi @dholmes-ora,

I checked what was happening.

The reproducer from JDK-8277444 simplified:

  • Thread 1 does:
bigClass = Class.forName();
consumer.accept(bigClass); // Puts bigClass in QUEUE
final Object o = bigClass.getConstructor().newInstance();
System.out.println(o.hashCode());
  • Thread 2 does:
final Class<?> aClass = QUEUE.take();
Instrumentation.retransformClasses(aClass); 

Class.forName does not link bigClass, so an unlinked class is put in the queue. Linking starts only when Thread 1 starts using bigClass.
Thread 2 takes the unlinked bigClass from the queue. It can request retransformation before Thread 1 starts using the class, that is, before linking starts.

So the linking process and the retransforming process can run in parallel. There is no synchronization between them, so we get a race condition. bigClass is big enough to make the linking process run for a long time.
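The hand-off described above can be sketched as a self-contained program. This is only an illustration of the thread structure, not the reproducer from JDK-8277444: `HandOffSketch`, the nested `BigClass` and the `retransform` callback are hypothetical names, and the callback is a placeholder for `Instrumentation.retransformClasses`, which needs a Java agent and is therefore not called here.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

public class HandOffSketch {
    // Hypothetical stand-in for the reproducer's big class.
    static class BigClass {
        public BigClass() {}
    }

    static final BlockingQueue<Class<?>> QUEUE = new LinkedBlockingQueue<>();

    /**
     * Runs the two-thread hand-off once and returns the class Thread 2 received.
     * The retransform callback is an assumption: it marks the point where the
     * real reproducer calls Instrumentation.retransformClasses(aClass).
     */
    static Class<?> runOnce(Consumer<Class<?>> retransform) {
        AtomicReference<Class<?>> seen = new AtomicReference<>();
        Thread thread2 = new Thread(() -> {
            try {
                Class<?> aClass = QUEUE.take();
                retransform.accept(aClass); // may run before Thread 1 first uses the class
                seen.set(aClass);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        thread2.start();
        try {
            // Thread 1 (the calling thread): load without initializing, publish, then use.
            String name = HandOffSketch.class.getName() + "$BigClass";
            Class<?> bigClass = Class.forName(name, false, HandOffSketch.class.getClassLoader());
            QUEUE.put(bigClass);
            Object o = bigClass.getConstructor().newInstance(); // first use of the class
            System.out.println(o.hashCode());
            thread2.join();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Class<?> received = runOnce(k -> System.out.println("would retransform " + k.getName()));
        System.out.println("Thread 2 received " + received.getName());
    }
}
```

Nothing in this sketch orders the callback relative to the first use of the class in Thread 1, which is the window the race needs.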

I think Class.forName does not do linking intentionally, for performance reasons.

I hope I've understood everything correctly from the logs and sources.

@dholmes-ora
Member

@eastig Thank you very much for that detailed analysis. An interesting scenario. I still find it somewhat suspect that we can transform an unlinked class and wonder if we should instead ensure it is linked first, rather than trying to coordinate the two pieces of code via the use of the init_lock?

@eastig
Member Author

eastig commented Aug 22, 2025

@dholmes-ora

According to

they allow flexibility in an implementation of the linking process. If I am correct, we have a "lazy" linkage strategy in Hotspot. I don't think we want to change anything here.

copy_bytecodes is only used by JVMTI:

  • JvmtiEnv::RetransformClasses
  • JvmtiEnv::GetBytecodes

> I still find it somewhat suspect that we can transform an unlinked class and wonder if we should instead ensure it is linked first,

JvmtiEnv::RetransformClasses does not need a class to be linked. It needs the initial class file bytes, which are the bytes passed to ClassLoader.defineClass or RedefineClasses.
JvmtiEnv::GetBytecodes returns the bytecodes that implement the method. It's not clear from the JVMTI specification whether the returned bytecodes must be the initial class bytes. The current implementation returns the initial bytecodes, the same ones used in JvmtiEnv::RetransformClasses.

As we don't keep the initial bytecodes (do we?), copy_bytecodes cannot be called during linking, because linking changes bytecodes. It can only be called for a class in the unlinked or linked state. When copy_bytecodes is called for a linked class, it restores bytecodes to the initial ones. Linking cannot be done whilst copy_bytecodes is working.

If we use init_lock, copy_bytecodes will never see a class in the linking state, and linking will never see copy_bytecodes running. As linking is blocked while we are copying bytecodes, the linking time will increase. This might have a negative performance impact. Retransforming makes the original version of a method obsolete if a new version of the method is installed, so we might not notice the performance impact.
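The init_lock idea can be illustrated with a toy model. This is a sketch under stated assumptions, not HotSpot code: `ToyKlass`, `initLock` and the byte values `ORIGINAL` and `REWRITTEN` are hypothetical, and the `rewritten` flag is set only after rewriting finishes, as in the race described in this PR. Because both operations take the same lock, a copy sees either an untouched class or a fully rewritten one that it can restore.

```java
import java.util.Arrays;

public class LinkCopyLockSketch {
    static final byte ORIGINAL = 1;   // stands in for an original bytecode
    static final byte REWRITTEN = 2;  // stands in for a bytecode rewritten during linking

    static final class ToyKlass {
        private final Object initLock = new Object(); // hypothetical analogue of init_lock
        private final byte[] code;
        private boolean rewritten = false;

        ToyKlass(int size) {
            code = new byte[size];
            Arrays.fill(code, ORIGINAL);
        }

        // Analogue of linking: rewrites every bytecode, then sets the flag.
        void link() {
            synchronized (initLock) {
                if (rewritten) return;
                for (int i = 0; i < code.length; i++) code[i] = REWRITTEN;
                rewritten = true;
            }
        }

        // Analogue of copy_bytecodes: restores originals only if the flag is set.
        byte[] copyBytecodes() {
            synchronized (initLock) {
                byte[] copy = code.clone();
                if (rewritten) Arrays.fill(copy, ORIGINAL);
                return copy;
            }
        }
    }

    static boolean allOriginal(byte[] bytes) {
        for (byte b : bytes) if (b != ORIGINAL) return false;
        return true;
    }

    /** Races one link against one copy and reports whether the copy was clean. */
    static boolean runOnce(int size) {
        ToyKlass k = new ToyKlass(size);
        Thread linker = new Thread(k::link);
        linker.start();
        byte[] copy = k.copyBytecodes();
        try {
            linker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return allOriginal(copy);
    }

    public static void main(String[] args) {
        boolean ok = true;
        for (int i = 0; i < 200; i++) ok &= runOnce(1 << 16);
        System.out.println("every copy held only original bytecodes: " + ok);
    }
}
```

Removing the `synchronized` block from `copyBytecodes` in this toy would allow a copy that mixes `ORIGINAL` and `REWRITTEN` values while `rewritten` is still false, which mirrors the invalid bytecode described in the PR summary.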

What other options do we have which don't require many changes? We have two mutually exclusive processes.

@dean-long
Member

This seems like the correct fix. Forcing these APIs to link the class first would be a change in behavior and I assume it could cause linking-related exceptions to be thrown that the client might not expect.

@dholmes-ora
Member

> This seems like the correct fix. Forcing these APIs to link the class first would be a change in behavior and I assume it could cause linking-related exceptions to be thrown that the client might not expect.

@dean-long retransformClasses is specified to be able to throw LinkageErrors.

@dholmes-ora
Member

> Class.forName does not link bigClass.

Just to be clear, Class.forName(name, false, loader) would not initialize the class; the default form would initialize and thus link the class. The Class.forName spec does not state whether or not it actually performs full linking in the absence of initialization, though the HotSpot implementation does not. Note that the preparation phase of linking must have occurred to get the Class object.
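The difference in initialization can be observed with a short sketch. `ForNameInitSketch` and `Probe` are hypothetical names. The sketch observes initialization only, through a static initializer; whether the class has been linked is not visible from Java code. The one-argument `Class.forName(name)` is documented as equivalent to the three-argument form with `initialize` set to true and the caller's class loader, so the three-argument form is used for both cases.

```java
public class ForNameInitSketch {
    // Set by Probe's static initializer, which runs when Probe is initialized.
    static volatile boolean probeInitialized = false;

    static class Probe {
        static {
            probeInitialized = true;
        }
    }

    /** Calls Class.forName with the given flag and reports whether Probe's initializer has run. */
    static boolean initializedAfterForName(boolean initialize) {
        String name = ForNameInitSketch.class.getName() + "$Probe";
        try {
            Class.forName(name, initialize, ForNameInitSketch.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return probeInitialized;
    }

    public static void main(String[] args) {
        System.out.println("initialize=false: " + initializedAfterForName(false));
        System.out.println("initialize=true:  " + initializedAfterForName(true));
    }
}
```

Run as a fresh program, the first line reports false and the second reports true.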

The retransformation APIs are somewhat vague about the exact state of a class to be retransformed - the class must be "modifiable" but that seems to be treated as a static rather than dynamic property, i.e. primitives, arrays, hidden classes are not modifiable - any other type of class instance is.

Bottom line is that I don't think any of the specifications help us here, so we need to look at implementation practicalities.

The linking code uses the init_lock and assumes it cannot be interfered with in any way. I don't know either piece of code well enough to say whether we could devise a lock-free protocol that would handle the current case, or whether we could use a VM mutex around the critical part of the linking code.

I don't like JVM TI having to know anything about the init_lock and I don't like seeing another place where we acquire the init_lock via an ObjectLocker, as that does not play nicely with the work to avoid pinning for virtual threads.

I'd like to hear from others more knowledgeable than myself in this area, unfortunately @coleenp is on vacation and won't be back till next week.

Comment on lines 1000 to 1004
// Method bytecodes can be rewritten during linking.
// Whilst the linking process rewriting bytescodes,
// is_rewritten() returns false. So we won't restore the original bytecodes.
// We hold a lock to guarantee we are not getting bytecodes
// at the same time the linking process are rewriting them.
Member

Suggested change
- // Method bytecodes can be rewritten during linking.
- // Whilst the linking process rewriting bytescodes,
- // is_rewritten() returns false. So we won't restore the original bytecodes.
- // We hold a lock to guarantee we are not getting bytecodes
- // at the same time the linking process are rewriting them.
+ // We acquire the init_lock monitor to serialize with class linking so we are not getting
+ // bytecodes at the same time the linking process is rewriting them.

Member Author

Done

Member

@dholmes-ora dholmes-ora left a comment

I've heard from Coleen and she seems okay with this approach; and Patricio doesn't think it will have an impact on the virtual thread work - so that eases my concerns. I have some minor requested changes whilst we wait for a serviceability review.

Thanks

Comment on lines 1005 to 1006
Handle h_init_lock(Thread::current(), mh->method_holder()->init_lock());
ObjectLocker ol(h_init_lock, JavaThread::current());
Member

Suggested change
- Handle h_init_lock(Thread::current(), mh->method_holder()->init_lock());
- ObjectLocker ol(h_init_lock, JavaThread::current());
+ JavaThread* current = JavaThread::current();
+ Handle h_init_lock(current, mh->method_holder()->init_lock());
+ ObjectLocker ol(h_init_lock, current);

Member Author

Done

@dean-long
Member

> @dean-long retransformClasses is specified to be able to throw LinkageErrors.

I see that in java.lang.instrument, but the JVMTI spec says "In response to this call, no events other than the ClassFileLoadHook event will be sent." Normally during linking we send the ClassPrepare event.

@dholmes-ora
Member

>> @dean-long retransformClasses is specified to be able to throw LinkageErrors.

> I see that in java.lang.instrument, but the JVMTI spec says "In response to this call, no events other than the ClassFileLoadHook event will be sent." Normally during linking we send the ClassPrepare event.

To have a Class object it must have already undergone the "preparation" part of linking (it is done at the end of loading when we create the class mirror).

@eastig
Member Author

eastig commented Aug 26, 2025

@dholmes-ora Thank you for the review and the suggestions.

@dean-long
Member

(I realize this is a tangent, but maybe there is a separate bug here...)

> To have a Class object it must have already undergone the "preparation" part of linking (it is done at the end of loading when we create the class mirror).

If that's what "preparation" means, and Class.forName(name, false, loader) does not link the class, then posting the event here in InstanceKlass::link_class_impl() instead of earlier seems misplaced:

JvmtiExport::post_class_prepare(THREAD, this);

Class.forName(name, false, loader) would result in a prepared class but without the event being sent.

@alexmenkov alexmenkov left a comment

The fix looks good from a serviceability perspective.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Aug 26, 2025
@dholmes-ora
Member

> If that's what "preparation" means, and Class.forName(name, false, loader) does not link the class, then posting the event here in InstanceKlass::link_class_impl() instead of earlier seems misplaced

@dean-long it seems the JVM TI "prepare" event is somewhat misnamed. It states "At this point, class fields, methods, and implemented interfaces are available, and no code from the class has been executed." but that neither describes preparation, nor the rest of linking. You can only access fields and methods after a class has been initialized! But let's take this elsewhere if needed.

@eastig
Member Author

eastig commented Aug 27, 2025

Hi @dholmes-ora,
Are there any other action items/changes required?

@jaikiran
Member

MacOS test failures are known

The virtual thread test that is failing in this PR's GitHub actions job is java/lang/Thread/virtual/stress/GetStackTraceALotWhenBlocking#id0 It has this:

javatestOS=Mac OS X 14.7.6 (aarch64)
javatestVersion=6.0-ea+b24-2025-08-18-${BUILT_FROM_COMMIT}
jtregVersion=jtreg 7.5.2 dev 0
...
#section:main
----------messages:(10/424)----------
command: main GetStackTraceALotWhenBlocking 100000
reason: User specified action: run main/othervm/timeout=300 GetStackTraceALotWhenBlocking 100000 
started: Tue Aug 26 14:24:40 UTC 2025
Mode: othervm [/othervm specified]
Additional options from @modules: --add-modules jdk.management
Process id: 21219
Timeout information:
--- Timeout information end.
finished: Tue Aug 26 14:47:22 UTC 2025
elapsed time (seconds): 1361.83
----------configuration:(3/42)----------
Boot Layer
  add modules: jdk.management

----------System.out:(1190/56017)----------
2025-08-26T14:24:41.601528Z => 48 of 100000
...
2025-08-26T14:44:37.436452Z => 99703 of 100000
2025-08-26T14:44:38.451885Z => 99752 of 100000
2025-08-26T14:44:39.474953Z => 99787 of 100000
Timeout signalled after 1200 seconds
2025-08-26T14:44:40.477403Z => 99833 of 100000
2025-08-26T14:44:41.478976Z => 99886 of 100000
2025-08-26T14:44:42.487173Z => 99931 of 100000
2025-08-26T14:44:43.495950Z => 99975 of 100000
2025-08-26T14:44:44.055060Z => 100000 of 100000
2025-08-26T14:44:44.055248Z VirtualThread[#27]/runnable@ForkJoinPool-1-worker-3 => 3904532796 loops
2025-08-26T14:44:44.055381Z VirtualThread[#23]/runnable@ForkJoinPool-1-worker-1 => 3293339644 loops
----------System.err:(1/15)----------
STATUS:Passed.
...
test result: Error. Program `/Users/runner/work/jdk/jdk/bundles/jdk/jdk-26.jdk/Contents/Home/bin/java' timed out (timeout set to 1200000ms, elapsed time including timeout handling was 1361819ms).

It looks like it's taking a long time to complete and that's causing the test timeout. I recollect that this test failure was addressed some time back, so this appears to be a new occurrence. In any case, this failure doesn't look related to the changes in this PR because I see some other PRs having failed with this same issue. I'll check and file an issue later today/tomorrow (unless anyone else gets to it first).

@coleenp
Contributor

coleenp commented Aug 29, 2025

Does it fail with the patch? Sorry for the delay. @dholmes-ora and I have been discussing how all this works offline, but with time-zone differences, we won't have any agreement until next week. I wrote a more limited version of the patch I sent and am testing it now.

@coleenp
Contributor

coleenp commented Aug 29, 2025

Okay I ran the test case in the issue and I see why it wouldn't be reliable. I verified it with the new more minimalistic patch here: #26971

@eastig
Member Author

eastig commented Aug 29, 2025

> Okay I ran the test case in the issue and I see why it wouldn't be reliable. I verified it with the new more minimalistic patch here: #26971

Thank you for the minimalistic patch.

I now have a version of the jtreg test which fails more reliably.
The test always passes with the previous version of #26971.

BTW I have updated the title of JDK-8277444.
The issue is the data race between copy_bytecodes and class linking. So we must guarantee no data race when copy_bytecodes is used.

@eastig
Member Author

eastig commented Aug 29, 2025

I'll add the test to the PR soon.

@eastig eastig changed the title from "8277444: Race condition on Instrumentation.retransformClasses() and class linking" to "8277444: Data race between JvmtiClassFileReconstituter::copy_bytecodes and class linking" Aug 29, 2025
@eastig
Member Author

eastig commented Aug 29, 2025

@coleenp,

I have pulled your changes.
I added a guarantee check to copy_bytecodes. IMO it's better to be overcautious to prevent incorrect uses of copy_bytecodes. Because of that, I had to add link_class to GetBytecodes.

I don't use the macros because they rely on THREAD. It is a variable in your patch but it is usually used as a macro.
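The shape of this change can be sketched with a toy model: the copy routine refuses to run on an unlinked class, so every caller links first. This is not the HotSpot code. `ToyKlass`, `getBytecodes` and the byte values are hypothetical, and an `IllegalStateException` stands in for the guarantee check.

```java
import java.util.Arrays;

public class LinkBeforeCopySketch {
    static final byte ORIGINAL = 1;   // stands in for an original bytecode
    static final byte REWRITTEN = 2;  // stands in for a bytecode rewritten during linking

    static final class ToyKlass {
        private final byte[] code;
        private boolean linked = false;

        ToyKlass(int size) {
            code = new byte[size];
            Arrays.fill(code, ORIGINAL);
        }

        synchronized boolean isLinked() {
            return linked;
        }

        // Idempotent analogue of link_class: rewrites bytecodes once.
        synchronized void link() {
            if (linked) return;
            Arrays.fill(code, REWRITTEN);
            linked = true;
        }

        // Analogue of copy_bytecodes with a guarantee that the class is linked.
        synchronized byte[] copyBytecodes() {
            if (!linked) {
                throw new IllegalStateException("class must be linked before copying bytecodes");
            }
            byte[] copy = code.clone();
            for (int i = 0; i < copy.length; i++) {
                if (copy[i] == REWRITTEN) copy[i] = ORIGINAL; // map back to the original bytecode
            }
            return copy;
        }
    }

    // Analogue of a caller such as GetBytecodes: link first, then copy.
    static byte[] getBytecodes(ToyKlass k) {
        k.link();
        return k.copyBytecodes();
    }

    public static void main(String[] args) {
        ToyKlass k = new ToyKlass(4);
        System.out.println(Arrays.toString(getBytecodes(k)));
    }
}
```

Once callers link first, the copy routine never meets a class in the middle of linking, so it only has to handle the fully rewritten state.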

@jaikiran
Member

jaikiran commented Sep 2, 2025

> The virtual thread test that is failing in this PR's GitHub actions job is java/lang/Thread/virtual/stress/GetStackTraceALotWhenBlocking#id0
> It looks like it's taking a long time to complete and that's causing the test timeout. I recollect that this test failure was addressed some time back, so this appears to be a new occurrence. In any case, this failure doesn't look related to the changes in this PR because I see some other PRs having failed with this same issue. I'll check and file an issue later today/tomorrow (unless anyone else gets to it first).

I've filed https://bugs.openjdk.org/browse/JDK-8366669 to track this failure.

Contributor

@coleenp coleenp left a comment

I had a couple of minor comments but otherwise looks good. Is the test now reliable? Thank you for adding a test.

#include "prims/jvmtiClassFileReconstituter.hpp"
#include "runtime/handles.inline.hpp"
#include "runtime/signature.hpp"
#include "runtime/synchronizer.hpp"
Contributor

You don't need this include anymore.

Member Author

Removed.

if (current_thread->has_pending_exception()) {
current_thread->clear_pending_exception();
return JVMTI_ERROR_INVALID_CLASS;
}
Contributor

Can you use the pattern:

    JavaThread* THREAD = current_thread;
    ... link_class(THREAD);
    if (HAS_PENDING_EXCEPTION)
      etc.

Member Author

Switched to macros.

Contributor

@coleenp coleenp left a comment

This looks good now. Thank you for your patience while we found the right solution.

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 3, 2025
Member

@dholmes-ora dholmes-ora left a comment

Generally looks good. There are a few minor nits/typos.

I'm not sure about the test, in particular the attempt to calculate MIN_LINK_TIME_MS. It is very hard to see/know that you will actually induce the desired race. But as the test doesn't actually have any explicit failure conditions, it at least won't generate false reports.

Thanks

InstanceKlass* ik = InstanceKlass::cast(klass);
if (ik->get_cached_class_file_bytes() == nullptr) {
// Link the class to avoid races with the rewriter. This will call the verifier also
// on the class. Linking is done already below in VM_RedefineClasses below, but we need
Member

Suggested change
- // on the class. Linking is done already below in VM_RedefineClasses below, but we need
+ // on the class. Linking is also done in VM_RedefineClasses below, but we need

There are two "below"s

* This test puts the linking process in one thread and the retransforming process
* in another thread. The test uses Class.forName("BigClass", false, classLoader)
* which does not link the class. When the class is used, the linking process starts.
* In another thread retransforming of the class is happening,
Member

Suggested change
- * In another thread retransforming of the class is happening,
+ * In another thread retransforming of the class is happening.

* in another thread. The test uses Class.forName("BigClass", false, classLoader)
* which does not link the class. When the class is used, the linking process starts.
* In another thread retransforming of the class is happening,
* We generate a class with big methods. A number of methods and thier size are
Member

Suggested change
- * We generate a class with big methods. A number of methods and thier size are
+ * We generate a class with big methods. A number of methods and their size are

* which does not link the class. When the class is used, the linking process starts.
* In another thread retransforming of the class is happening,
* We generate a class with big methods. A number of methods and thier size are
* chosen to make the linking and retransforming processes running concurrently.
Member

Suggested change
- * chosen to make the linking and retransforming processes running concurrently.
+ * chosen to make the linking and retransforming processes run concurrently.


private static final Object LOCK = new Object();
private static final int COUNTER_INC_COUNT = 2000; // A number of 'c+=1;' statements in methods of a class.
private static final int MIN_LINK_TIME_MS = 60; // This time is chosen to be big enough the linking and retransforming processes are running in parallel.
Member

Suggested change
- private static final int MIN_LINK_TIME_MS = 60; // This time is chosen to be big enough the linking and retransforming processes are running in parallel.
+ private static final int MIN_LINK_TIME_MS = 60; // Large enough so linking and retransforming run in parallel.

return InMemoryJavaCompiler.compile(className, classSrc);
}

// We calculate a number of methods the linking time to exceed MIN_LINK_TIME_MS.
Member

I can't quite parse this sentence.

@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) label Sep 5, 2025
@eastig
Member Author

eastig commented Sep 5, 2025

@dholmes-ora, Thank you for the review.

> I'm not sure about the test, in particular the attempt to calculate MIN_LINK_TIME_MS. It is very hard to see/know that you will actually induce the desired race.

Making a reliable regression test was a problem.
MIN_LINK_TIME_MS is based on the reproducer from JDK-8277444 and my experiments on x86 and arm64 hosts.
Maybe the time is not the best. As you wrote, there is no way to know whether the race has happened.
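For readers without the test source at hand, here is a rough sketch of how source for a class with big methods might be generated. It is not the test's actual generator, which is not shown in this thread. `generateSource`, the `methodN` naming and the `countOccurrences` helper are assumptions, and the sketch does not model how the number of methods is chosen from MIN_LINK_TIME_MS.

```java
public class BigClassSourceSketch {
    /**
     * Builds Java source for a class with methodCount methods, each containing
     * incCount 'c+=1;' statements (the role COUNTER_INC_COUNT plays in the test).
     */
    static String generateSource(String className, int methodCount, int incCount) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        sb.append("    private int c;\n");
        for (int m = 0; m < methodCount; m++) {
            sb.append("    public int method").append(m).append("() {\n");
            for (int i = 0; i < incCount; i++) {
                sb.append("        c+=1;\n");
            }
            sb.append("        return c;\n");
            sb.append("    }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Counts non-overlapping occurrences of needle in text. */
    static int countOccurrences(String text, String needle) {
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    public static void main(String[] args) {
        String src = generateSource("BigClass", 3, 5);
        System.out.println(src);
        System.out.println("increments: " + countOccurrences(src, "c+=1;"));
    }
}
```

More or larger methods give linking more bytecode to process, which is the lever the test relies on to widen the race window.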

@coleenp
Contributor

coleenp commented Sep 5, 2025

If the test becomes problematic because of timing, we could have another change to make it /manual.

Member

@dholmes-ora dholmes-ora left a comment

Nothing further from me. If there are issues with the test we will address them as needed.

Thanks

@openjdk openjdk bot added the ready (Pull request is ready to be integrated) label Sep 9, 2025
@eastig
Member Author

eastig commented Sep 10, 2025

Thank you, everyone, for reviewing.

@eastig
Member Author

eastig commented Sep 10, 2025

/integrate

@openjdk

openjdk bot commented Sep 10, 2025

Going to push as commit 46ae1ee.
Since your change was applied there have been 167 commits pushed to the master branch.

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated (Pull request has been integrated) label Sep 10, 2025
@openjdk openjdk bot closed this Sep 10, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Sep 10, 2025
@openjdk

openjdk bot commented Sep 10, 2025

@eastig Pushed as commit 46ae1ee.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot (hotspot-dev@openjdk.org), integrated (Pull request has been integrated), serviceability (serviceability-dev@openjdk.org)

6 participants