-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8317510: Change Windows debug symbol files naming to avoid losing info when an executable and a library share the same name #16039
Conversation
…fo when an executable and a library share the same name
👋 Welcome back fthevenet! A progress list of the required criteria for merging this PR into |
The following build scenarios where tested on Windows, and the symbols name manually verified for correctness:
For sanity check, I also ran the following on Linux (where we're not expecting to see any differences):
|
@fthevenet The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
I tried to build this internally and hit this error:
Looks like the gtest build is failing. |
@@ -81,13 +81,11 @@ endif | |||
ifneq ($(CMDS_DIR), ) | |||
DEPS += $(call FindFiles, $(CMDS_DIR)) | |||
ifeq ($(call isTargetOs, windows)+$(SHIP_DEBUG_SYMBOLS), true+public) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ifeq ($(call isTargetOs, windows)+$(SHIP_DEBUG_SYMBOLS), true+public) | |
ifeq ($(call isCompiler, microsoft)+$(SHIP_DEBUG_SYMBOLS), true+public) |
stripped pdbs (and pdbs in general) are only ever used with the VS toolchain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the point of this change, but it is not directly related to the issue addressed here (i.e. this condition wasn't introduced in this PR.).
Should it be included in the PR anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, hold on for a while
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @fthevenet, such a change should be separate from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll retract my suggestion then, and include this change after this is integrated on my own
@@ -81,13 +81,11 @@ endif | |||
ifneq ($(CMDS_DIR), ) | |||
DEPS += $(call FindFiles, $(CMDS_DIR)) | |||
ifeq ($(call isTargetOs, windows)+$(SHIP_DEBUG_SYMBOLS), true+public) | |||
# For public debug symbols on Windows, we have to use stripped pdbs, rename them | |||
# and filter out a few launcher pdbs where there's a lib that goes by the same name | |||
# For public debug symbols on Windows, we have to use stripped pdbs and rename them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# For public debug symbols on Windows, we have to use stripped pdbs and rename them | |
# For public debug symbols on VS, we have to use stripped pdbs and rename them |
I have added a basic test that verifies that symbols can be resolved by the internal jdk tooling that makes use of them, even after the name change for the .pdb files. |
Adding a test seems nice, but I will have to defer to Hotspot people for reviewing the validity, placement and tier/group inclusion of the new test. My interpretation is that with the current placement and TEST.groups it will be part of tier1. |
Yes, of course. |
OK, so unsurprisingly I was indeed wrong... The test ran as part or tier1 in the GHA checks - and failed. |
They will run in tier1. Test looks good. Cheers, Thomas |
Looks like symbol resolution of the crash address failed:
looking at the output of the crashing process, it just prints out a PC, but not the expected frame:
where we expected to see a function name. |
According to the GHA workflow, tier1 test are run with a jdk build using the following
The documentation isn't much help in advising what the default is in this case, but if it is equivalent to |
Verified that the test passes in our internal build and test environment (which runs hotspot tests in tier1 on debug builds). |
I would expect the default to be to include pdbs on Windows ( |
The default for all platforms is none on a static build, and external otherwise jdk/make/autoconf/jdk-options.m4 Line 262 in 053f456
|
P.S. I suggest changing the comment in the new test that states "where the function names are available even with no symbols". This is not very accurate, the reason it works is because the symbols are inside libjvm.so itself, unlike with Microsoft Visual C which places all debug information (including symbol names) into the pdb files entirely. (For instance, if you compile the Windows Java VM with gcc, the symbol names would also be inside the VM itself, and not in the debug files) |
From what I can tell, the test fails because of an unrelated issue with the "test-prebuilt" target used to run the test by GHA. According to the logs for the failed test, path to the symbol folder "/d/a/jdk/jdk/bundles/symbols/jdk-22/fastdebug" is passed explicitly via the env variable
But looking into the hs_err report, we see the path for the symbols folder is not considered by the symbol engine:
That would still imply that the pdb files were stripped from the jdk bundle that is uploaded by the build task, or they would be picked by the symbol engine from there. Right now, I don't know if this is indeed the case, and if so caused by the renaming of pdb files in this PR. |
I must confess that I'm not very well versed in the github actions configuration and execution model, nor our implementation. In our internal build-and-test system, we provide |
If it isn't too much of a bother, could you please verify that the path you pass via SYMBOLS_IMAGE_DIR is indeed listed in the symbol engine search path in the resulting hs_err produced as part of the test? Command line: hs_err: Also please note that if pdb files are present in the folder that you pass via JDK_IMAGE_DIR, the symbol engine will likely pick them up from there even if the path set in SYMBOLS_IMAGE_DIR is empty or incorrect (and therefore the test will pass). |
All that to say that it looks like there might be a bug in Apologies if I'm just shooting in the dark here, but at a cursory glance, I notice that in RunTestsPrebuilt.gmk, SYMBOLS_IMAGE_DIR isn't checked for validity like JDK_IMAGE_DIR or TEST_IMAGE_DIR are; do you know why that is? Although from what I understand, it wouldn't transform the path, it would have caught the problem, wouldn't it? |
We aren't checking SYMBOLS_IMAGE_DIR because we consider it optional. Tests are expected to be able to run without it, you just get less information on crashes. _NT_SYMBOL_PATH is setup with Windows style paths based on SYMBOLS_IMAGE_DIR here: Line 63 in 80232b7
In our run-test-prebuilt scenario, our JDK_IMAGE_DIR does not have any pdb files. The _NT_SYMBOLS_PATH setup was introduced to provide pdb files for symbol lookup in hs_err files, so I'm fairly certain that part works. |
I see, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code itself looks good. We just need to figure out why it is breaking on GHA.
@fthevenet This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 10 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@magicus, @erikj79) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
Once integrated, this is going to spam everyone on GitHub with Windows test failures until the issue is fixed separately :/ |
I don't think @fthevenet plans on checking in a broken test. He will figure this out or disable the test. |
Indeed, I see no reason to rush the integration for this before we've resolved the failing test. I can confirm that outside of the context of GHA, the Meanwhile, I have instrumented the GHA test workflow as @magicus suggested and waiting for the results. |
@fthevenet Did the instrumentation give anything? |
@magicus As a matter of fact, it did yield some interesting resutls, but then I got quite caught up with the release earlier this week and this and didn't find the time to update this thread. The listing of the folder that is passed as a parameter to the test run shows it contains the pdb files with the expected names (e.g. jvm.dll.pdb). You can see an example of that in the "Check symbols" step in the run below: I also added a print out of the values for SYMBOL_PATH and _NT_SYMBOL_PATH when they are first assigned (see fthevenet@5655ca7#diff-041bf69ea79b333b9ce99c1f879e398d698538530a35c361500b72631f059233R70), but to my surprise I could not see those in the test run logs from GHA, while are indeed printed when I run the test locally. One notable difference I noticed, is that I run all my local tests using cygwin, while GHA uses MSYS2; could this explain anything? |
Yes, msys2/cygwin are sufficiently different that it could possibly explain this. |
The root cause for this test failing turns out to be indeed related to MSYS2 ; namely checks made to determine whether or not we're running on Windows return "false", so all Windows specific code is ignored. I opened https://bugs.openjdk.org/browse/JDK-8318669 to make the logic that auto detects the target OS when running pre-built test in "RunTestsPrebuilt.gmk" aware of MSYS2, but I am now wondering if it might not be best to just set OPENJDK_TARGET_OS explicitly to "windows" in the command line that launch the tests? |
I finally opted to address the underlying issue by patching RunTestsPrebuilt.gmk, rather than GHA; #16343. As for this PR, I see two possible ways forward; one is to remove the test and integrate the change without it as part of the current PR, and add the test back in a follow up once the RunTestsPrebuilt patch is integrated. I like the first one better; a few more steps but overall less fussy. I'm also open to another solution. |
+1 Vote for first option. |
You don't have to rebase, in fact, you should not rebase an open PR. Just merge from master, once the msys fix is in. Your fix for JDK-8318669 was simple; just fix the "else ifeq" as Erik suggests, and you're good to go. So just delay pushing this a bit more -- you don't have to move it to Draft either. I think that is preferable, since it will keep the test with this PR, where it belongs. |
Test now passes in GHA. |
/integrate |
@fthevenet |
/sponsor |
Going to push as commit d96f38b.
Your commit was automatically rebased without conflicts. |
@erikj79 @fthevenet Pushed as commit d96f38b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Aw, I wanted to sponsor this one in particular :( Oh well, it's great that it's integrated now! |
When building OpenJDK on Windows using "--with-native-debug-info=external", the resulting debug symbols are saved in files located in the same folder as the corresponding executable or library and named by swapping the extension ".exe" or ".dll" for a ".pdb" one (or "diz" if option "--with-native-debug-info=zipped" is used), which means that in the event of an exe and a dll file sharing the same target folder and file name (e.g.
bin\java.exe
andbin\java.dll
), we have to choose whether symbols inbin\java.pdb
will refer to the exe or the dll; we can't have both.This PR addresses this issue by adopting a different naming strategy for the resulting symbol files where we keep the full name of every file - including its
dll
orexe
extension) and then add the appropriate.pdb
,.map
or.diz
extension .For instance,
jvm.dll
symbols are no longer calledjvm.pdb
but insteadjvm.dll.pdb
. Similarly, it is nowjvm.dll.diz
when using zipped symbols, and "jvm.dll.stripped.pdb" for stripped symbols (i.e. when "--with-external-symbols-in-bundles=public" is used).The PR also removes the existing filtering for java.pdb, jimage.pdb and jpackage.pdb used to guaranty the dll symbols were bundled over the ones from the exe, since we no longer need that.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16039/head:pull/16039
$ git checkout pull/16039
Update a local copy of the PR:
$ git checkout pull/16039
$ git pull https://git.openjdk.org/jdk.git pull/16039/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 16039
View PR using the GUI difftool:
$ git pr show -t 16039
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16039.diff
Webrev
Link to Webrev Comment