Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8292201: serviceability/sa/ClhsdbThreadContext.java fails with "'Thread "Common-Cleaner"' missing from stdout/stderr" #10090

Closed
wants to merge 5 commits into from

Conversation

plummercj
Copy link
Contributor

@plummercj plummercj commented Aug 30, 2022

While dumping all registers (and doing a findpc on each), the following exception occurred for r8:

 r8: 0x000000750e4fdffc
Error: java.lang.ArrayIndexOutOfBoundsException: Index 4099 out of bounds for length 4096
java.lang.ArrayIndexOutOfBoundsException: Index 4099 out of bounds for length 4096
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.Page.getLong(Page.java:182)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.PageCache.getLong(PageCache.java:100)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readCInteger(DebuggerBase.java:364)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgDebuggerLocal.readAddress(WindbgDebuggerLocal.java:312)
at jdk.hotspot.agent/sun.jvm.hotspot.debugger.windbg.WindbgAddress.getAddressAt(WindbgAddress.java:71)
at jdk.hotspot.agent/sun.jvm.hotspot.utilities.PointerFinder.find(PointerFinder.java:58)
at jdk.hotspot.agent/sun.jvm.hotspot.runtime.JavaThread.printThreadContextOn(JavaThread.java:493)

This exception is the result of using PointerFinder ("findpc") on invalid address that starts on the last 32-bit word of a page. "page" in this case is referring to a page in SA's PageCache, which are always 4k in size. Note findpc is frequently done using an invalid address. In this test case it is being called on each x64 register, which could contain anything. findpc relies on getting some sort of AddressException when this happens, and will then say the address points to something that is unknown. However, in the case of the address pointing to the last 32-bot word of a page, it results in an ArrayIndexOutOfBoundsException when the page cache tries to read past the end of the page. This is happening in Page.getLong(), which you can see in the stack trace.

The main culprit here is some weakening of the alignment checks being done. The alignment check should have resulted in an UnalignedAddressException long before we ever got to Page.getLong(). However, we have the following override, which is allowing the unaligned address to pass the alignment check.

           public void checkAlignment(long address, long alignment) {
             // Need to override default checkAlignment because we need to
             // relax alignment constraints on Bsd/x86
             if ( (address % alignment != 0)
                &&(alignment != 8 || address % 4 != 0)) {
                throw new UnalignedAddressException(
                        "Trying to read at address: "
                      + addressValueToString(address)
                      + " with alignment: " + alignment,
                        address);
             }
           }
        };

This allows a pointer to a 64-bit value to only be 32-bit aligned. But there's more to it than that. I modified ClhsdbFindPC.java to fetch a tid (a thread address), and did a findpc on the address OR'd with 0xffc. findpc uses PointerFinder. This forced a read of a 64-bit value that starts in the last 32-bits of a page. It passed on linux-x64 but failed on windowx-x64 in the same manner described in the description of this CR. The difference between the two implementations is that windows relies on the default implementation of DebuggerBase.readCInteger() whereas linux has an override. DebuggerBase.readCInteger() does the following:

  public long readCInteger(long address, long numBytes, boolean isUnsigned) {
    utils.checkAlignment(address, numBytes);
    if (useFastAccessors) {
      if (isUnsigned) {
        switch((int) numBytes) {
        case 1: return cache.getByte(address) & 0xFF;
        case 2: return cache.getShort(address, bigEndian) & 0xFFFF;
        case 4: return cache.getInt(address, bigEndian) & 0xFFFFFFFFL;
        case 8: return cache.getLong(address, bigEndian);
...

There is an alignment check here, but it is the "relaxed" override shown above, which allows 64-bit addresses on 32-bit boundaries. The override in LinuxDebuggerLocal is:

    /** Need to override this to relax alignment checks on x86. */
    public long readCInteger(long address, long numBytes, boolean isUnsigned)
        throws UnmappedAddressException, UnalignedAddressException {
        // Only slightly relaxed semantics -- this is a hack, but is
        // necessary on x86 where it seems the compiler is
        // putting some global 64-bit data on 32-bit boundaries
        if (numBytes == 8) {
            utils.checkAlignment(address, 4);
        } else {
            utils.checkAlignment(address, numBytes);
        }
        byte[] data = readBytes(address, numBytes);
        return utils.dataToCInteger(data, isUnsigned);

Although there is a relaxed alignment check here also, the code that reads from the address does not assume all the bytes are on the same page. It uses readBytes() intead of cache.getLong(), so it won't run into the PageCache issue with reading a 64-bit value that starts in the last 32-bit word of a page.

I think the introduction of these relaxed alignment checks has a muddled history, probably made more muddled by ports cloning code that maybe wasn't necessary. Probably also initial fixes (the relaxed alignment check) seemed to work at first, but later the PageCache issue was discovered, leading to readBytes() workaround in the LinuxDebuggerLocal.readCInteger() override, but was not also done on other ports (so we got this CR for windows).

For 64-bit support it's clear this easing of the 64-bit alignment is not needed, and getting rid of it would result in the proper UnalignedAddressException being thrown. The question is whether it is still needed for 32-bit x86, and if so is it needed on all ports.

I can't test linux-x86, so I can't tell if it still allows 64-bit values on 32-bit aligned addresses, so for now I'll assume it does. So the approach being taken is whenever an address of a 64-bit type points to the last 32-bit word of a page, use readBytes() to get the 64-bit value one byte at a time. It still uses the page cache in the end, but it doesn't try to get all 8 bytes from the same page. Note for 64-bit systems, the conditoin will never arise because of the removal of the relaxed alignment check. Instead there will be an UnalignedAddressException at an early point when the alignment check is made.

One windfall of this change is now we always read 64-bit values from the page cache in a way that is much more efficient than reading a byte at a time. This has resulted in about a 25% performance improvement in a test I have that does a heap dump.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8292201: serviceability/sa/ClhsdbThreadContext.java fails with "'Thread "Common-Cleaner"' missing from stdout/stderr"

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/10090/head:pull/10090
$ git checkout pull/10090

Update a local copy of the PR:
$ git checkout pull/10090
$ git pull https://git.openjdk.org/jdk pull/10090/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 10090

View PR using the GUI difftool:
$ git pr show -t 10090

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/10090.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 30, 2022

👋 Welcome back cjplummer! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8292201 8292201: serviceability/sa/ClhsdbThreadContext.java fails with "'Thread "Common-Cleaner"' missing from stdout/stderr" Aug 30, 2022
@openjdk
Copy link

openjdk bot commented Aug 31, 2022

@plummercj this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout 8292201_sa_alignment
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Aug 31, 2022
@openjdk
Copy link

openjdk bot commented Aug 31, 2022

@plummercj The following labels will be automatically applied to this pull request:

  • core-libs
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Aug 31, 2022
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 1, 2022
@mlbridge
Copy link

mlbridge bot commented Sep 1, 2022

Webrevs

@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Sep 1, 2022
@@ -43,27 +43,24 @@
public class RemoteDebuggerClient extends DebuggerBase implements JVMDebugger {
private RemoteDebugger remoteDebugger;
private RemoteThreadFactory threadFactory;
private boolean unalignedAccessesOkay = false;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After getting rid of the readCInteger and readJLong overrides below, unalignedAccessesOkay is no longer needed. The DebuggerBase superclass understands the alignment requirements.

@AlanBateman
Copy link
Contributor

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Sep 1, 2022
@openjdk
Copy link

openjdk bot commented Sep 1, 2022

@AlanBateman
The core-libs label was successfully removed.

Comment on lines +241 to +242
long pageMask = ~(pageSize - 1);
if ((address & pageMask) != ((address + 4) & pageMask)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit over-complicated.
Maybe something like
long pageMask = pageSize - 1; if ((address & pageMask) > (pageSize - 8)) { return false; }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tinkered with this code quite a bit before I had something that both worked and I liked. I don't know that your approach is actually any simpler, and at the moment I still prefer mine, but that might just be because I'm more familiar with it. However, it does seem that there should be a more elegant way of doing this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code works only for the exact case - address of 64bit value points to the last 32bit of the page.

My idea is to use more generic approach - check that the whole value is in the page, i.e. this is special case of more common

boolean canUsePageCacheForRead(long address, int bits) {
   long pageMask = pageSize - 1;
   return ((address & pageMask) <= (pageSize - bits/8));
}

But I have no strong objections, feel free to keep your version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code works only for the exact case - address of 64bit value points to the last 32bit of the page.
That's the only case we need to worry about. Any others would have already failed the alignment check.

@openjdk
Copy link

openjdk bot commented Sep 1, 2022

@plummercj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8292201: serviceability/sa/ClhsdbThreadContext.java fails with "'Thread "Common-Cleaner"' missing from stdout/stderr"

Reviewed-by: amenkov, sspitsyn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 29 new commits pushed to the master branch:

  • ac05bc8: 8293293: Move archive heap loading code out of heapShared.cpp
  • e1e6732: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if
  • 5757e21: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out
  • 3993a1f: 8292067: Convert test/sun/management/jmxremote/bootstrap shell tests to java version
  • 83a3408: 8293315: Add back logging for Placeholders
  • b6477fd: 8293288: bootcycle build failure after JDK-8173605
  • 0c6094e: 8293188: x86_64: Introduce stubGenerator_x86_64.hpp
  • 2baeebb: 8293006: sun/tools/jhsdb/JStackStressTest.java fails with "UnalignedAddressException: 8baadbabe"
  • da99e3e: 8289400: Improve com/sun/jdi/TestScaffold error reporting
  • 77e21c5: 8290529: C2: assert(BoolTest(btest).is_canonical()) failure
  • ... and 19 more: https://git.openjdk.org/jdk/compare/07616de00c3e1c305852fcc44df8dadafd0dbf3f...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 1, 2022
if (address % 4 == 0) {
return;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indent above has to be 2.

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay to me.
Thanks,
Serguei

@plummercj
Copy link
Contributor Author

Thanks Alex and Serguei!

/integrate

@openjdk
Copy link

openjdk bot commented Sep 3, 2022

Going to push as commit 767262e.
Since your change was applied there have been 30 commits pushed to the master branch:

  • a366e82: 7113208: Incorrect javadoc on java.net.DatagramPacket.setLength()
  • ac05bc8: 8293293: Move archive heap loading code out of heapShared.cpp
  • e1e6732: 8293319: [C2 cleanup] Remove unused other_path arg in Parse::adjust_map_after_if
  • 5757e21: 8292385: assert(ctrl == kit.control()) failed: Control flow was added although the intrinsic bailed out
  • 3993a1f: 8292067: Convert test/sun/management/jmxremote/bootstrap shell tests to java version
  • 83a3408: 8293315: Add back logging for Placeholders
  • b6477fd: 8293288: bootcycle build failure after JDK-8173605
  • 0c6094e: 8293188: x86_64: Introduce stubGenerator_x86_64.hpp
  • 2baeebb: 8293006: sun/tools/jhsdb/JStackStressTest.java fails with "UnalignedAddressException: 8baadbabe"
  • da99e3e: 8289400: Improve com/sun/jdi/TestScaffold error reporting
  • ... and 20 more: https://git.openjdk.org/jdk/compare/07616de00c3e1c305852fcc44df8dadafd0dbf3f...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 3, 2022
@openjdk openjdk bot closed this Sep 3, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 3, 2022
@openjdk
Copy link

openjdk bot commented Sep 3, 2022

@plummercj Pushed as commit 767262e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@plummercj plummercj deleted the 8292201_sa_alignment branch October 11, 2022 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
4 participants