Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8243455: Many SA tests can fail due to trying to get the stack trace of an active method #2700

Closed
wants to merge 4 commits into from

Conversation

plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 23, 2021

Many tests run LingeredApp, get its stack with jstack, and then look for LingeredApp.main in the output, which should appear in the stack trace of the main thread. Usually this works fine since the main thread is in a loop, and usually blocked in Thread.sleep(). However, during the short period of time it wakes up from sleep, the thread's stack might not be walk-able, in which case the LingeredApp.main frame will not appear, and the tests looking for it will fail.

The fix is to introduce a new thread called SteadyStateThread that has a method called steadyState() that blocks indefinitely trying to synchronize on a locked object. All code that used to look for LingeredApp.main in the output now instead looks for LingeredApp.steadyState, which should always be there. I'm open to suggestions for a better name than "SteadyState" if you have one.

There are a few special cases with the tests that were modified that are described below:

Regarding the removal of LingeredAppWithTrivialMain.java, it was originally added to fix JDK-8229118, which was an issue with the ClhsdbFindPC test failing on aarch64 when doing the -Xcomp run. It expected LingeredApp.main() to be compiled in that case, and for the PC of the frame to be in an NMethod. However, on aarch64 an uncommon-trap was causing it to be de-optimized, so the PC was in the interpreter instead, causing the test to fail. The fix was to instead rely on finding a trivial LingeredAppWithTrivialMain.main() frame in the stack trace since it would always be compiled. With the changes to instead have (and rely on) the SteadyStateThread and the presence of LingeredApp.steadyState(), the need for LingeredAppWithTrivialMain goes away. LingeredApp.steadyState() will always be compiled, even on aarch64.

Regarding DebugdConnectTest.java, it is listed in the CR as an affected test but no specified fix has been provided.
None is needed. The issue was that it looked for LingeredApp in the jstack output, and it used to be the only place it would find it was in the main thread's stack trace, which sometimes could not be retrieved. Now it can also be found in the SteadyStateThread's stack trace, which can always be retrieved.

Regarding HeapDumpTest.java, it used to check for LingeredAppWithExtendedChars.main and now it checks for LingeredApp.steadyState. It still is run with LingeredAppWithExtendedChars. The only thing special about that class is that it contains an extended unicode character used to reproduce JDK-8028623, so it will still do that, even though it now checks for LingeredApp.steadyState.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8243455: Many SA tests can fail due to trying to get the stack trace of an active method

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2700/head:pull/2700
$ git checkout pull/2700

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 23, 2021

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 23, 2021

@plummercj The following labels will be automatically applied to this pull request:

  • core-libs
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability core-libs labels Feb 23, 2021
@plummercj
Copy link
Contributor Author

plummercj commented Feb 24, 2021

/label remove core-libs

@openjdk openjdk bot removed the core-libs label Feb 24, 2021
@openjdk
Copy link

openjdk bot commented Feb 24, 2021

@plummercj
The core-libs label was successfully removed.

@plummercj plummercj marked this pull request as ready for review Mar 3, 2021
@openjdk openjdk bot added the rfr label Mar 3, 2021
@mlbridge
Copy link

mlbridge bot commented Mar 3, 2021

Webrevs

lmesnik
lmesnik approved these changes Mar 3, 2021
@mlbridge
Copy link

mlbridge bot commented Mar 3, 2021

Mailing list message from David Holmes on serviceability-dev:

Hi Chris,

On 4/03/2021 6:09 am, Chris Plummer wrote:

Many tests run `LingeredApp`, get its stack with `jstack`, and then look for `LingeredApp.main` in the output, which should appear in the stack trace of the main thread. Usually this works fine since the main thread is in a loop, and usually blocked in `Thread.sleep()`. However, during the short period of time it wakes up from sleep, the thread's stack might not be walk-able, in which case the `LingeredApp.main` frame will not appear, and the tests looking for it will fail.

Doesn't jstack force all threads to be walkable via a safepoint or
handshake?

David
-----

@plummercj
Copy link
Contributor Author

plummercj commented Mar 3, 2021

Doesn't jstack force all threads to be walkable via a safepoint or
handshake?

Nope. If it did it couldn't debug core files or a hung process. That's why SA has so much code to safe guard against walking off the deep end when doing things like stack walking and heap scanning.

BTW, you might be thinking of the Attach API version of jstack, which does safepoint. These tests all involve the SA version of jstack.

@plummercj
Copy link
Contributor Author

plummercj commented Mar 9, 2021

Ping! I could use one more reviewer. Thanks!

// Do another short sleep so we can get into the synchronized block,
// although this probably is not necessary to guarantee that the
// stack trace is readable.
Thread.sleep(100);
Copy link
Member

@YaSuenag YaSuenag Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wait to change the state of steadyStateThread to BLOCKED? It is more robustness than Thread.sleep(100).

Copy link
Contributor Author

@plummercj plummercj Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you do that?

Copy link
Member

@YaSuenag YaSuenag Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, can we write following code?

while (steadyStateThread.getState() != Thread.State.BLOCKED) {
    Thread.onSpinWait();
}

Copy link
Contributor Author

@plummercj plummercj Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's easy enough. What about the loop before it. Do you prefer 100ms sleeps or onSpinWait() for it also.

Copy link
Member

@YaSuenag YaSuenag Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer onSpinWait() if anything because it expects short suspend, and we can expect CPU friendly code.

Copy link
Member

@YaSuenag YaSuenag Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an option, we can remove steadyStateReached and rewrite the code as following:

        steadyStateThread.setName("SteadyStateThread");
        steadyStateThread.start();

        while (steadyStateThread.getState() != Thread.State.BLOCKED) {
            Thread.onSpinWait();
        }

Copy link
Contributor Author

@plummercj plummercj Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of that, but was worried that during thread startup maybe there might be a spurious short lived period where the thread was BLOCKED. Probably not possible, but I didn't want to have to go about proving as much.

@openjdk
Copy link

openjdk bot commented Mar 9, 2021

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8243455: Many SA tests can fail due to trying to get the stack trace of an active method

Reviewed-by: lmesnik, ysuenaga, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 213 new commits pushed to the master branch:

  • d0c1aec: 8263140: Japanese chars garble in console window in HSDB
  • 70342e8: 8262520: Add SA Command Line Debugger support to connect to debug server
  • e5ce97b: 8263206: assert(*error_msg != '\0') failed: Must have error_message while parsing -XX:CompileCommand=unknown
  • 3212f80: 8261937: LambdaForClassInBaseArchive: SimpleApp$$Lambda$1 missing
  • 2218e72: 8262486: Merge trivial JDWP agent changes from the loom repo to the jdk repo
  • 86fac95: 8263142: Delete unused entry points in libawt/libawt_xawt/libawt_headless
  • b7f0b3f: 8252173: Use handles instead of jobjects in modules.cpp
  • a6e34b3: 8262829: Native crash in Win32PrintServiceLookup.getAllPrinterNames()
  • fbe40e8: 8252399: Update mapMulti documentation to use type test pattern instead of instanceof once JEP 375 exits preview
  • 0f2402d: 8263190: Update java.io, java.math, and java.text to use instanceof pattern variable
  • ... and 203 more: https://git.openjdk.java.net/jdk/compare/539c80bfda7eb8b8ffdb0c1bff58ac226d4bb236...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Mar 9, 2021
lmesnik
lmesnik approved these changes Mar 9, 2021
@sspitsyn
Copy link
Contributor

sspitsyn commented Mar 10, 2021

Chris,

test/lib/jdk/test/lib/apps/LingeredApp.java

It is better to rename startSteadStateThread to startSteadyStateThread, otherwise it seems to be a typo.

  •    try {
    
  •        // Sleep until the thread has started running.
    
  •        while (!steadyStateReached) {
    
  •            Thread.sleep(100);
    
  •        }
    
  •        // Do another short sleep so we can get into the synchronized block,
    
  •        // although this probably is not necessary to guarantee that the
    
  •        // stack trace is readable.
    
  •        Thread.sleep(100);
    
  •    } catch (InterruptedException e) {
    
  •    }
    

It is better to remove the second sleep as it is not really needed.

Another concern is that the stack trace of this thread is going to be trivial, not sure if it is useful.
But it can be okay though if it serves trivial purposes. :)
It feels like something better is needed in general.

-Serguei

@plummercj
Copy link
Contributor Author

plummercj commented Mar 10, 2021

@sspitsyn

It is better to rename startSteadStateThread to startSteadyStateThread, otherwise it seems to be a typo.

I'll fix that.

It is better to remove the second sleep as it is not really needed.

You were looking at the first version. It was updated last night.

Another concern is that the stack trace of this thread is going to be trivial, not sure if it is useful.
But it can be okay though if it serves trivial purposes. :)
It feels like something better is needed in general.

It's no less trivial than the main thread that tests used to look for, except that previously once in a great while the thread woke up from sleep (and that usually resulted in a test failure because the stack trace could not be produced). However, I also have a PR 2720 to add a stack trace stress test.

@plummercj
Copy link
Contributor Author

plummercj commented Mar 12, 2021

/integrate

@openjdk openjdk bot closed this Mar 12, 2021
@openjdk openjdk bot added integrated and removed ready labels Mar 12, 2021
@openjdk openjdk bot removed the rfr label Mar 12, 2021
@openjdk
Copy link

openjdk bot commented Mar 12, 2021

@plummercj Since your change was applied there have been 260 commits pushed to the master branch:

  • e834f99: 8263412: ClassFileInstaller can't be used by classes outside of default package
  • bf9b5fa: 8263501: compiler/oracle/TestInvalidCompileCommand.java fails with release VMs
  • 0c8350e: 8263460: DynamicArchiveRelocationTest.java fails in product VM
  • b2f7c58: 8263055: hsdb Command Line Debugger does not properly direct output for some commands
  • ecfa712: 8263326: Remove ReceiverTypeData check from serviceability/sa/TestPrintMdo.java
  • b932a62: 8263470: Consolidate copies of getClassBytes in various tests
  • 0ea48d9: 8194129: Regression automated Test '/open/test/jdk/java/awt/Window/ShapedAndTranslucentWindows/TranslucentChoice.java' fails
  • 4b5c664: 8178348: left_n_bits(0) invokes undefined behavior
  • 0b10c6b: 8263017: Read barriers are missing in nmethod printing code
  • a6e056f: 8263125: During deoptimization vectors should reassign scalarized payload after all objects are reallocated.
  • ... and 250 more: https://git.openjdk.java.net/jdk/compare/539c80bfda7eb8b8ffdb0c1bff58ac226d4bb236...master

Your commit was automatically rebased without conflicts.

Pushed as commit 43524cc.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@plummercj plummercj deleted the 8243455-steadystate branch Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated serviceability
4 participants