Skip to content

8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected #20684

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

alexmenkov
Copy link

@alexmenkov alexmenkov commented Aug 22, 2024

On Windows SA agent gets a class vtable from symbols, exported from jvm.dll (it exports symbols like "??_7" + type + "@@6b@").
But symbol lookup function first requests WinDbg about the symbol.
Sometimes WinDbg routine IDebugSymbols::GetOffsetByName() returns offset for both class and class pointer types. Returned offsets correspond to symbols like "jvm!class_name::`vftable'".
The behavior is intermittent, I was not able to find what is the reason.
The fix adds workaround for the case - if GetOffsetByName succeeded, we check if corresponding symbol contains requested one.
So it returns expected offset for non-vtable symbols like "MaxJNILocalCapacity" (GetOffsetByName returns offset for "jvm!MaxJNILocalCapacity"), but returns 0 for vtlb lookup.

Additionally added check for results of IDebugSymbols::SetImagePath/SetSymbolPath

Testing: tier1,tier2,hs-tier5-svc


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20684/head:pull/20684
$ git checkout pull/20684

Update a local copy of the PR:
$ git checkout pull/20684
$ git pull https://git.openjdk.org/jdk.git pull/20684/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20684

View PR using the GUI difftool:
$ git pr show -t 20684

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20684.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 22, 2024

👋 Welcome back amenkov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 22, 2024

@alexmenkov This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected

Reviewed-by: cjplummer, kevinw, dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 131 new commits pushed to the master branch:

  • 03ba37e: 8339169: Remove NaiveHuffman coder
  • a136a85: 8338768: Introduce runtime lookup to check for static builds
  • 9d7d85a: 8339298: Remove unused function declaration poll_for_safepoint
  • 92aafb4: 8338934: vmTestbase/nsk/jvmti/FieldWatch/TestDescription.java tests timeout intermittently
  • 392bdd5: 8339248: RISC-V: Remove li64 macro assembler routine and related code
  • 4f071ce: 8311163: Parallel: Improve large object handling during evacuation
  • b840b13: 8338882: Clarify matching order of MessageFormat subformat factory styles
  • 25e03b5: 8339115: Rename TypeKind enum constants to follow code style
  • fef1ef7: 6426678: (spec) File.createTempFile(prefix, suffix, dir) needs clarification for illegal symbols in suffix
  • a528c4b: 8339156: Use more fine-granular clang unused warnings
  • ... and 121 more: https://git.openjdk.org/jdk/compare/55851a312baaea5af14c04fb1b436313fe0deac7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk
Copy link

openjdk bot commented Aug 22, 2024

@alexmenkov The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Aug 22, 2024
@alexmenkov alexmenkov changed the title windbg workaround JDK-8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected Aug 22, 2024
@alexmenkov alexmenkov changed the title JDK-8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected 8311993: Test serviceability/sa/UniqueVtableTest.java failed: duplicate vtables detected Aug 22, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 22, 2024
@mlbridge
Copy link

mlbridge bot commented Aug 22, 2024

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an expert on this code by any means, but this seems a reasonable way to tackle the problem.

Thanks.

// when requested for decorated "class" or "class*" (i.e. "??_7class@@6B@"/"??_7class*@@6B@").
// As a workaround check if returned symbol contains requested symbol.
ULONG64 disp = 0L;
char buf[512];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/512/SYMBOL_BUFSIZE/ ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, fixed

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 26, 2024
@plummercj
Copy link
Contributor

Why do we rely on GetOffsetByName() when we can already lookup directly from the dll, and this is in fact the fallback already in place when GetOffsetByName() fails?

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Aug 26, 2024
@alexmenkov
Copy link
Author

Why do we rely on GetOffsetByName() when we can already lookup directly from the dll, and this is in fact the fallback already in place when GetOffsetByName() fails?

I'm not expert in the area, but as far as I see this is caused by SA agent design - vtable for a class is considered as usual symbol and standard symbol lookup routine is used (on Windows it's GetOffsetByName and search through DLL exported symbols as a fallback).

@plummercj
Copy link
Contributor

I found this:

    // should we parse DLL symbol table in Java code or use
    // Windbg's native lookup facility? By default, we use
    // native lookup so that we can take advantage of '.pdb'
    // files, if available.
    useNativeLookup = true;
    String str = System.getProperty("sun.jvm.hotspot.debugger.windbg.disableNativeLookup");
    if (str != null) {
      useNativeLookup = false;
    }

I'm not sure what is meant by "take advantage of '.pbp' files". Is it perhaps a more reliable or complete database of symbols, or perhaps it is faster? In any case, I'd be interested in seeing if all our tests still pass when useNativeLookup is false.

@alexmenkov
Copy link
Author

I found this:

    // should we parse DLL symbol table in Java code or use
    // Windbg's native lookup facility? By default, we use
    // native lookup so that we can take advantage of '.pdb'
    // files, if available.
    useNativeLookup = true;
    String str = System.getProperty("sun.jvm.hotspot.debugger.windbg.disableNativeLookup");
    if (str != null) {
      useNativeLookup = false;
    }

I'm not sure what is meant by "take advantage of '.pbp' files". Is it perhaps a more reliable or complete database of symbols, or perhaps it is faster? In any case, I'd be interested in seeing if all our tests still pass when useNativeLookup is false.

As far as I understand .pdb files contain symbol information.
I run tier1,tier2,hs-tier5-svc with "-Dsun.jvm.hotspot.debugger.windbg.disableNativeLookup=yes" on windows-x64 and windows-x64-debug
Got 110 failures; all failures on debug build, most of them with NPE:

Error: java.lang.NullPointerException: Cannot invoke "sun.jvm.hotspot.debugger.Address.getJBooleanAt(long)" because "address" is null
java.lang.NullPointerException: Cannot invoke "sun.jvm.hotspot.debugger.Address.getJBooleanAt(long)" because "address" is null
	at jdk.hotspot.agent/sun.jvm.hotspot.runtime.VM.isSharingEnabled(VM.java:940)
	at jdk.hotspot.agent/sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:240)
        ...

@plummercj
Copy link
Contributor

  public boolean isSharingEnabled() {
    if (sharingEnabled == null) {
        Address address = VM.getVM().getDebugger().lookup(null, "UseSharedSpaces");
        if (address == null && getOS().equals("win32")) {
            // On Win32 symbols are prefixed with the dll name. So look for
            // UseSharedSpaces as a symbol in jvm.dll.
            address = VM.getVM().getDebugger().lookup(null, "jvm!UseSharedSpaces");
        }
        sharingEnabled = address.getJBooleanAt(0);
    }
    return sharingEnabled.booleanValue();
  }

I'm not sure why this is failing. Based on the existing code and comments, it seems at some point SA worked without relying on the windbg native symbol support. Code like isSharingEnabled() probably got added afterwards and was never tested with it disabled.

@sspitsyn
Copy link
Contributor

I'm not sure why this is failing. Based on the existing code and comments, it seems at some point SA worked without relying on the windbg native symbol support. Code like isSharingEnabled() probably got added afterwards and was never tested with it disabled.

It seems the isSharingEnabled() has a bug and does not always work with the optionDsun.jvm.hotspot.debugger.windbg.disableNativeLookup=yes".
So, there are two choices:

  • use this work around from Alex (with a potential overhead because of extra calls to GetOffsetByName())
  • fix the isSharingEnabled() related bug and use GetOffsetByName() instead of native lookup

I wonder if the call toGetOffsetByName() can be done with some additional check, so it is not always called.

ULONG64 disp = 0L;
char buf[SYMBOL_BUFSIZE];
memset(buf, 0, sizeof(buf));
if (ptrIDebugSymbols->GetNameByOffset(offset, buf, sizeof(buf), 0, &disp) == S_OK) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Extra space before S_OK.

@plummercj
Copy link
Contributor

Alex's changes neither introduce nor fix the isSharingEnabled() bug. It was just discovered out of an investigation to better understand how SA is doing symbol lookups on Windows. Basically we are trying to understand why it tries both windbg and dll lookups. Currently it won't work with just dll lookups, as the isSharingEnabled() failures show, but if that is the case how can dll lookups be a reliable fallback when windbg lookups fail.

@alexmenkov
Copy link
Author

alexmenkov commented Aug 28, 2024

  public boolean isSharingEnabled() {
    if (sharingEnabled == null) {
        Address address = VM.getVM().getDebugger().lookup(null, "UseSharedSpaces");
        if (address == null && getOS().equals("win32")) {
            // On Win32 symbols are prefixed with the dll name. So look for
            // UseSharedSpaces as a symbol in jvm.dll.
            address = VM.getVM().getDebugger().lookup(null, "jvm!UseSharedSpaces");
        }
        sharingEnabled = address.getJBooleanAt(0);
    }
    return sharingEnabled.booleanValue();
  }

I'm not sure why this is failing. Based on the existing code and comments, it seems at some point SA worked without relying on the windbg native symbol support. Code like isSharingEnabled() probably got added afterwards and was never tested with it disabled.

"UseSharedSpaces" is exported from jvm.dll, but the issue here on Windows SymbolLookup searches for decorated symbols (i.e. "??_7UseSharedSpaces@@6b@") and the symbol is exported undecorated.
From globalDefinition.hpp (note extern "C"):

extern "C" {
// Make sure UseSharedSpaces is accessible to the serviceability agent.
extern JNIEXPORT jboolean UseSharedSpaces;
}

Update: usually constants are available for SA by using VMStruct, I'm not sure why "UseSharedSpaces" is exported this way

@plummercj
Copy link
Contributor

What would happen if it was not extern "C". Would the windbg lookup succeed in that case?

@plummercj
Copy link
Contributor

Actually I'm a bit confused on this. The extern "C" is making it so the dll lookup fails, but is it helping the windbg lookup to succeed? If yes, why did it ever get added if it was always first succeeding with the dll lookup.

@plummercj
Copy link
Contributor

Update: usually constants are available for SA by using VMStruct, I'm not sure why "UseSharedSpaces" is exported this way

Probably the author just didn't know about the VMStruct approach.

@alexmenkov
Copy link
Author

What would happen if it was not extern "C". Would the windbg lookup succeed in that case?

without extern "C" the variable is exported as "?UseSharedSpaces@@3ea". So it wouldn't work with DLL lookup neither windbg.

@alexmenkov
Copy link
Author

Anyway failures with disableNativeLookup and UseSharedSpaces is different issue.
I can file JBS issue, but to me disableNativeLookup looks like development property.

@plummercj
Copy link
Contributor

Where do you see runtime flags in vmstructs?

@alexmenkov
Copy link
Author

Where do you see runtime flags in vmstructs?

I do not see :) I thought about constants, but they are different

@plummercj
Copy link
Contributor

I can't find any other place where we directly lookup a global symbol like this. I did find this other reference to the lookup() API:

  public String findSymbol(String symbol) {
    Address addr = lookup(null, symbol);
    if (addr == null && getOS().equals("win32")) {
      // On win32 symbols are prefixed with the dll name. Do the user
      // a favor and see if this is a symbol in jvm.dll or java.dll.
      addr = lookup(null, "jvm!" + symbol);
      if (addr == null) {
        addr = lookup(null, "java!" + symbol);
      }
    }
    ...

So it seems isSharingEnabled() could be using this instead, although that is not solving any of the problems we are seeing with non-native lookups. clhsdb findsym uses this findSymbol() API.

        new Command("findsym", "findsym name", false) {
            public void doit(Tokens t) {
                if (t.countTokens() != 1) {
                    usage();
                } else {
                    String result = VM.getVM().getDebugger().findSymbol(t.nextToken());
                    out.println(result == null ? "Symbol not found" : result);
                }
            }
        },

We have one test that does a findsym of MaxJNILocalCapacity (ClhsdbFindPC.java). I'm guessing this would fail if native lookups were disabled. Note findsym was added less than 3 years ago.

Another thing I noticed is that isSharingEnabled() used to be the following:

public boolean isSharingEnabled() {
if (sharingEnabled == null) {
Flag flag = getCommandLineFlag("UseSharedSpaces");
sharingEnabled = (flag == null)? Boolean.FALSE :
(flag.getBool()? Boolean.TRUE: Boolean.FALSE);
}
return sharingEnabled.booleanValue();
}

So it used to use getCommandLineFlag(). It was modified to use lookup() as part of https://bugs.openjdk.org/browse/JDK-8277481, which changed UseSharedSpaces from a command line flag to just a regular global. This change was done 3 years ago. getCommandLineFlag() seems to get the flag from VMStructs, which references the following:

JVMFlag* JVMFlag::flags = flagTable;
size_t JVMFlag::numFlags = (sizeof(flagTable) / sizeof(JVMFlag));

So there is no dll or windbg symbol lookup here, just access to the VMStructs database.

It seems then that theoretically the dll lookup should never be needed, but due to the bug this PR is fixing, it is needed as a fallback when windbg lookup fails to do the right thing. I wonder if there are also other types of symbols we lookup that would only be found with the dll lookup.

Copy link
Contributor

@plummercj plummercj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 29, 2024
@alexmenkov
Copy link
Author

We have one test that does a findsym of MaxJNILocalCapacity (ClhsdbFindPC.java). I'm guessing this would fail if native lookups were disabled. Note findsym was added less than 3 years ago.

The test fails on isSharingEnabled, but I suppose it will fail to find MaxJNILocalCapacity too

It seems then that theoretically the dll lookup should never be needed, but due to the bug this PR is fixing, it is needed as a fallback when windbg lookup fails to do the right thing. I wonder if there are also other types of symbols we lookup that would only be found with the dll lookup.

As far as I understand dll lookup is needed to get vtables for classes (and I suppose it's the only case).
I don't think it's "right thing" to return offset of "jvm!class_name::vftable'" for decorated symbols (and especially for pointers). Maybe vtable lookup can be implemented by searching for "class_name::vftable'", then dll lookup (and all this exported symbols) can be dropped

Copy link
Contributor

@kevinjwalls kevinjwalls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it odd that we lookup symbols including the *, e.g.
Duplicate vtable: 0x00007ffd233633d8:

  • CompiledMethod (extends CodeBlob)
  • CompiledMethod* (extends null)

And iterating over agent.getTypeDataBase().getTypes(); we see e.g. jbyte, jbyte*, jbyte** which looks odd and unnecessary.

We know what a pointer is, so not needing to lookup up "jbyte*" might be good also.
But that might be more change than we want to do right now, so this double check looks good.

@alexmenkov
Copy link
Author

/integrate

@openjdk
Copy link

openjdk bot commented Sep 3, 2024

Going to push as commit a7120e2.
Since your change was applied there have been 152 commits pushed to the master branch:

  • 5ebdf2d: 8338708: Don't create/destroy debug agent cmdQueueLock for each connection
  • 130ac13: 8337265: Test static-libs build in GitHub Actions
  • cfec3ac: 8337317: serviceability/jvmti tests failed with FATAL ERROR in native method: Failed during the GetClassSignature call
  • 0d593cd: 8339419: [s390x] Problemlist compiler/c2/irTests/TestIfMinMax.java
  • c3adcb8: 8338916: Build warnings about overriding recipe for jvm-ldflags.txt
  • 66945e5: 8339336: Fix build system whitespace to adhere to coding conventions
  • ad40a12: 8339214: Remove misleading CodeBuilder.loadConstant(Opcode, ConstantDesc)
  • 4ca2c20: 8338740: java/net/httpclient/HttpsTunnelAuthTest.java fails with java.io.IOException: HTTP/1.1 header parser received no bytes
  • e0c46d5: 8325397: sun/java2d/Disposer/TestDisposerRace.java fails in linux-aarch64
  • b94c3de: 8339401: Optimize ClassFile load and store instructions
  • ... and 142 more: https://git.openjdk.org/jdk/compare/55851a312baaea5af14c04fb1b436313fe0deac7...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 3, 2024
@openjdk openjdk bot closed this Sep 3, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 3, 2024
@openjdk
Copy link

openjdk bot commented Sep 3, 2024

@alexmenkov Pushed as commit a7120e2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@alexmenkov alexmenkov deleted the sa_vtable branch September 3, 2024 21:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

5 participants