Conversation

@fandreuz
Contributor

@fandreuz fandreuz commented Sep 14, 2025

The problem seems to be in read_lib_segments (ps_core.c), this check is too harsh:

if ((existing_map->memsz != page_size) &&
    (existing_map->fd != lib_fd) &&
    (ROUNDUP(existing_map->memsz, page_size) != ROUNDUP(lib_php->p_memsz, page_size))) {

In my run, existing_map->memsz = 0xe24000, while the rhs in L425 is 0xe23000. According to the NT_FILE entry, this segment of libjvm.so has file offset 0x67f000. It seems that the linker aligned it down to the page size (0x1000). The offset of the same segment according to readelf -l libjvm.so is 0x67fc80. The difference (0xc80) should be added to p_memsz before rounding up, which gives the 0xe24000 that we see in the core dump.
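
The arithmetic in this comment can be checked with a small C sketch. `round_up` is only a stand-in for the `ROUNDUP` macro used in ps_core.c (its exact definition is assumed here), and `mapped_size` is a hypothetical helper, not something in the JDK sources. The constants are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ROUNDUP in ps_core.c; the real macro's definition is assumed. */
static uint64_t round_up(uint64_t x, uint64_t align) {
    assert(align != 0);
    return ((x + align - 1) / align) * align;
}

/* Hypothetical helper: size of the mapping once the segment's file offset
 * has been aligned down to a page boundary. */
static uint64_t mapped_size(uint64_t p_offset, uint64_t p_memsz,
                            uint64_t page_size) {
    uint64_t in_page = p_offset % page_size;  /* 0xc80 for offset 0x67fc80 */
    return round_up(p_memsz + in_page, page_size);
}
```

With these definitions, `round_up(0xe225c0, 0x1000)` gives 0xe23000 (what the check computes), while `mapped_size(0x67fc80, 0xe225c0, 0x1000)` gives 0xe24000 (what the core contains).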

I added some files to the ticket for context.

Passes tier1 and tier2.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8367609: serviceability/sa/ClhsdbPmap.java fails when built with Clang (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27274/head:pull/27274
$ git checkout pull/27274

Update a local copy of the PR:
$ git checkout pull/27274
$ git pull https://git.openjdk.org/jdk.git pull/27274/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27274

View PR using the GUI difftool:
$ git pr show -t 27274

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27274.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Sep 14, 2025

👋 Welcome back fandreuzzi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Sep 14, 2025

@fandreuz This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8367609: serviceability/sa/ClhsdbPmap.java fails when built with Clang

Reviewed-by: kevinw, cjplummer

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 49 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kevinjwalls, @plummercj) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Sep 14, 2025

@fandreuz The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the serviceability (serviceability-dev@openjdk.org) and rfr (Pull request is ready for review) labels Sep 14, 2025

@kevinjwalls
Contributor

Hi - Do you still have the same core? Can you attach a "readelf -a " output from it in the jbs issue with the other (helpful) files? Would like to see how things get mapped.

It's interesting to me at the moment that in a random build of mine with gcc, libjvm has e.g.

                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000022a5a50 0x00000000022a5a50  R E    0x1000

..so loadable text is just part of the mapping at the base address, no offset.

A core of that contains:

  LOAD           0x000000003431b000 0x00007f1305a15000 0x0000000000000000
                 0x00000000022a6000 0x00000000022a6000  R E    0x1000

But your clang build has the text with some offset and vaddr:
https://bugs.openjdk.org/secure/attachment/116147/libjvm_sections.txt

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x000000000067fc80 0x0000000000680c80 0x0000000000680c80
                 0x0000000000e225c0 0x0000000000e225c0  R E    0x1000

..so does that appear as a distinct PH in the core?

@fandreuz
Contributor Author

fandreuz commented Sep 17, 2025

Hi @kevinjwalls, I don't have the same core. I ran the test another time; you can find the new files attached to the ticket (those starting with 2).

This is the corresponding entry in the new core:

  LOAD           0x0000000006b23000 0x00007fa9ff881000 0x0000000000000000
                 0x0000000000e23000 0x0000000000e23000  R E    0x1000

@kevinjwalls
Contributor

Thanks - does that still have the same problem?
(Do you have the jtreg log from this one, to confirm I was looking at the right address here...)

libjvm second seg:
core:

Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000006b23000 0x00007fa9ff881000 0x0000000000000000
                 0x0000000000e23000 0x0000000000e23000  R E    0x1000

libjvm binary:

LOAD           0x0000000000680180 0x0000000000681180 0x0000000000681180
               0x0000000000e226c0 0x0000000000e226c0  R E    0x1000

These figures look like they work, for that check which has been working for gcc builds:

ROUNDUP(existing_map->memsz, page_size) != ROUNDUP(lib_php->p_memsz, page_size)

left: core file memsize: 0x0000000000e23000
right: lib mem size 0x0000000000e226c0 rounds to 0xe23000

Was core 1 different? It looks like libjvm was slightly smaller in that run (mem size was 0xe225c0 and the core contained a 0xe24000 size mapping, which was the problem).

@fandreuz
Contributor Author

Hi @kevinjwalls, I think that might be my fault. I reran the test and noticed that the failure this time is not in libjvm.so, but in libjimage.so:

hsdb> ERROR: address conflict @ 0x7fa9ff0e4440 (existing map size = 102400, size = 97328, flags = 5)

% info proc mappings
[...]
0x00007fa9ff0e4000 0x00007fa9ff0fd000 0x19000            0x8000             /home/fandreuz/code/jdk/build/clang/images/jdk/lib/libjimage.so 

I think this happens by chance. I added the files from the new round to the ticket.

@kevinjwalls
Contributor

> failure this time is not in libjvm.so, but in libjimage.so:

Thanks, looking again - do you still have the files? libjimage_sections.txt has the Section Headers, not the Program Headers (readelf -a is fine 8-) )

@fandreuz
Contributor Author

fandreuz commented Sep 19, 2025

Hi @kevinjwalls, I just uploaded 3libjimage_all.txt, thanks again for looking into this PR

@kevinjwalls
Contributor

kevinjwalls commented Sep 23, 2025

Thanks for the additional info.

If I ignore what's said so far and start again, I see the following...  (anyone should feel free to correct, we aren't in this
area every day!...)

hsdb> ERROR: address conflict @ 0x7fa9ff0e4440 (existing map size = 102400, size = 97328, flags = 5)

          print_error("address conflict @ 0x%lx (existing map size = %ld, size = %ld, flags = %d)\n",
                        target_vaddr, existing_map->memsz, lib_php->p_memsz, lib_php->p_flags);

core has no mapping at exactly 0x7fa9ff0e4440 but has:

  Start                 End               Page Offset
  0x00007fa9ff0db000  0x00007fa9ff0e4000  0x0000000000000000 /home/fandreuz/code/jdk/build/clang/images/jdk/lib/libjimage.so  (0x9000 size) 0x841c rounded up
  0x00007fa9ff0e4000  0x00007fa9ff0fd000  0x0000000000000008 /home/fandreuz/code/jdk/build/clang/images/jdk/lib/libjimage.so (0x19000 size) the existing map size from the error
  

The error is about a mapping of size 0x17c30 that should go at 0x00007fa9ff0e4000.
That is the second LOAD phdr from libjimage.

The check which has been working for gcc builds:
ROUNDUP(existing_map->memsz, page_size) != ROUNDUP(lib_php->p_memsz, page_size)

102400 	= 0x19000 is mem size, page size aligned. The core has a mapping of this size.
97328	= 0x17c30 libjimage has this, which rounds up to only 0x18000

libjimage is providing too little data? 
But target vaddr 0x7fa9ff0e4440 is offset into the actual segment 0x00007fa9ff0e4000 by  0x440  (1088 bytes)

0x17c30 + 0x440 = 0x18070 which rounds up to the wanted 0x19000


src/jdk.hotspot.agent/linux/native/libsaproc/ps_core.c

399       uintptr_t target_vaddr = lib_php->p_vaddr + lib_base;
400       map_info *existing_map = core_lookup(ph, target_vaddr);

(Existing code expects target_vaddr and existing_map->vaddr to be exactly equal, not to see target_vaddr being
anything other than 0x1000 aligned.)

Maybe we:
check if target_vaddr > existing_map->vaddr and add any difference to the library mem size we compare?


429             (ROUNDUP(existing_map->memsz, page_size) != ROUNDUP(lib_php->p_memsz, page_size))) {

+        uint64_t lib_memsz = lib_php->p_memsz; // check type
+        if (target_vaddr > existing_map->vaddr) {
+            lib_memsz += (target_vaddr - existing_map->vaddr);
+        }
+        lib_memsz = ROUNDUP(lib_memsz, page_size);
+
         if ((existing_map->memsz != page_size) &&
             (existing_map->fd != lib_fd) &&
-            (ROUNDUP(existing_map->memsz, page_size) != ROUNDUP(lib_php->p_memsz, page_size))) {
+            (ROUNDUP(existing_map->memsz, page_size) != lib_memsz)) {


This is kind of similar to what you had, but maybe the target_vaddr varies and becomes offset even more
than pagesize at some point?

Curious if that works for clang builds.  In my builds it doesn't change anything, we don't hit
target_vaddr > existing_map->vaddr
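
For experimenting with the numbers from this thread, the adjusted check can be sketched as a standalone function. `struct map_sketch`, `round_up` and `sizes_conflict` are hypothetical names that do not exist in ps_core.c, and the struct only carries the fields the condition reads. Only the three-part condition mirrors the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the map_info fields the check reads (assumption). */
struct map_sketch {
    int      fd;
    uint64_t vaddr;
    uint64_t memsz;
};

/* Stand-in for ROUNDUP; the real macro's definition is assumed. */
static uint64_t round_up(uint64_t x, uint64_t align) {
    assert(align != 0);
    return ((x + align - 1) / align) * align;
}

/* Returns true when the sizes disagree, i.e. when the
 * "address conflict" error would be reported. */
static bool sizes_conflict(const struct map_sketch *existing, int lib_fd,
                           uint64_t target_vaddr, uint64_t p_memsz,
                           uint64_t page_size) {
    uint64_t lib_memsz = p_memsz;
    /* Account for the PH starting at some offset into the core mapping. */
    if (target_vaddr > existing->vaddr) {
        lib_memsz += target_vaddr - existing->vaddr;
    }
    lib_memsz = round_up(lib_memsz, page_size);

    return existing->memsz != page_size &&
           existing->fd != lib_fd &&
           round_up(existing->memsz, page_size) != lib_memsz;
}
```

Assuming the mapping's fd differs from lib_fd, the libjimage values (mapping at 0x7fa9ff0e4000 of size 0x19000, target_vaddr 0x7fa9ff0e4440, p_memsz 0x17c30) make the function return false. Passing the mapping start as target_vaddr instead reproduces the original check and returns true.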

@fandreuz
Contributor Author

Hi @kevinjwalls, what you say makes sense to me. I tried the updated check in my environment and it fixes the problem just as well as what I initially proposed in this PR.

I updated the PR too, the new check looks cleaner than what I had.

Contributor

@kevinjwalls kevinjwalls left a comment

OK glad it works. 8-)

Maybe add a comment above line 423 to say something like:
// Account for the PH being at some vaddr offset from mapping in core file.

We could have left the ROUNDUP on line 427 for the comparison on line 431, to make it more like the original. I think either way is good.

Good for me, let's check that Chris, who also looked at it, agrees.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 29, 2025
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 29, 2025
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 29, 2025
Comment on lines +423 to +428
// Account for the PH being at some vaddr offset from mapping in core file.
uint64_t lib_memsz = lib_php->p_memsz;
if (target_vaddr > existing_map->vaddr) {
    lib_memsz += target_vaddr - existing_map->vaddr;
}

Contributor

I think this change looks good, but just want to make sure I'm understanding it properly. Kevin commented the following:

hsdb> ERROR: address conflict @ 0x7fa9ff0e4440 (existing map size = 102400, size = 97328, flags = 5)

          print_error("address conflict @ 0x%lx (existing map size = %ld, size = %ld, flags = %d)\n",
                        target_vaddr, existing_map->memsz, lib_php->p_memsz, lib_php->p_flags);

core has no mapping at exactly 0x7fa9ff0e4440 but has:

  Start                 End               Page Offset
  0x00007fa9ff0db000  0x00007fa9ff0e4000  0x0000000000000000 /home/fandreuz/code/jdk/build/clang/images/jdk/lib/libjimage.so  (0x9000 size) 0x841c rounded up
  0x00007fa9ff0e4000  0x00007fa9ff0fd000  0x0000000000000008 /home/fandreuz/code/jdk/build/clang/images/jdk/lib/libjimage.so (0x19000 size) the existing map size from the error

So the problem here is that we were expecting 0x00007fa9ff0e4000 but got 0x7fa9ff0e4440. Basically target_vaddr is at an unexpected offset from existing_map->vaddr. The fix is to add this offset to lib_memsz so the error is no longer triggered. Do we understand why this offset is happening?

Contributor

> Do we understand why this offset is happening?

It looks like the binary is just built differently: clang, the linker, or some combination of the two arranges things differently from what we have seen with gcc.

Maybe it's enough to say there doesn't have to be a 1:1 mapping from program headers to segments in the process.
We can have a core file segment that contains more than one program header from the library (that 0x19000 size segment at the libjimage base address).
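
To illustrate the point, here is a minimal sketch of a lookup that returns the mapping containing an address rather than the one starting at it. This is not the actual `core_lookup` implementation; `struct mapping`, `libjimage_maps` and `find_containing` are hypothetical, and the table only holds the two libjimage mappings listed earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mapping {
    uint64_t start;
    uint64_t size;
};

/* The two libjimage mappings reported for the core in this thread. */
static const struct mapping libjimage_maps[] = {
    { 0x7fa9ff0db000, 0x9000 },
    { 0x7fa9ff0e4000, 0x19000 },
};

/* Hypothetical lookup: returns the mapping that contains addr, or NULL. */
static const struct mapping *find_containing(const struct mapping *maps,
                                             size_t n, uint64_t addr) {
    assert(maps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (addr >= maps[i].start && addr - maps[i].start < maps[i].size) {
            return &maps[i];
        }
    }
    return NULL;
}
```

Looking up 0x7fa9ff0e4440 in this table returns the second entry, whose start is 0x440 below the program header's target address.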

So, new question:

we must have been through this loop already, in the libjimage example with the first LOAD PH of size 0x000000000000841c at the library base address.

Did that pass the same test?
No it can't, the mapping is 0x19000 and size 0x841c does not round up to that.
I think we must ignore the first LOAD PH from the clang build as we check:

408       } else if (lib_php->p_flags != existing_map->flags) {

..and the first LOAD PH from https://bugs.openjdk.org/secure/attachment/116209/3libjimage_all.txt
..is a Read only, not execute.

This should be OK in this case:

https://bugs.openjdk.org/secure/attachment/116199/3core_all.txt
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000141000 0x00007fa9ff0e4000 0x0000000000000000
                 0x0000000000019000 0x0000000000019000  R E    0x1000

FileSiz and MemSiz both being 0x19000 must mean the core contains everything; we don't need either of these from the library.
But there is no harm in continuing past the problematic check and populating existing_map from the library.
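
The flags comparison mentioned above can be sketched as follows. The bit values are the standard ELF segment permission bits (PF_X = 1, PF_W = 2, PF_R = 4); the `SK_` names and both helper functions are hypothetical and only illustrate why a read-only program header does not match an "R E" mapping (flags = 5 in the error message).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ELF segment permission bits, renamed to avoid clashing with <elf.h>. */
enum { SK_PF_X = 0x1, SK_PF_W = 0x2, SK_PF_R = 0x4 };

/* Tests a single permission bit. */
static bool has_flag(uint32_t flags, uint32_t bit) {
    assert(bit == SK_PF_X || bit == SK_PF_W || bit == SK_PF_R);
    return (flags & bit) != 0;
}

/* Mirrors the shape of the comparison quoted above:
 * differing flags mean this program header is not size-checked. */
static bool skipped_by_flags_check(uint32_t p_flags, uint32_t map_flags) {
    return p_flags != map_flags;
}
```

Here `skipped_by_flags_check(SK_PF_R, SK_PF_R | SK_PF_X)` is true for the read-only first LOAD PH, while a PH with flags 5 compared against the same mapping is not skipped.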

Contributor

Ok. That's mostly making sense to me. I'm still a bit fuzzy on it, but that's ok. Has testing been done to make sure this doesn't break anything with gcc?

Contributor Author

> Has testing been done to make sure this doesn't break anything with gcc?

Yes, I didn't see any problem while testing the change with gcc.

Contributor

@plummercj plummercj left a comment

Looks good.

@fandreuz
Contributor Author

fandreuz commented Oct 1, 2025

Thanks for the review @kevinjwalls and @plummercj !

@fandreuz
Contributor Author

fandreuz commented Oct 1, 2025

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Oct 1, 2025
@openjdk

openjdk bot commented Oct 1, 2025

@fandreuz
Your change (at version fe9adaf) is now ready to be sponsored by a Committer.

@kevinjwalls
Contributor

/sponsor

@openjdk

openjdk bot commented Oct 2, 2025

Going to push as commit 8be1616.
Since your change was applied there have been 53 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 2, 2025
@openjdk openjdk bot closed this Oct 2, 2025
@openjdk openjdk bot removed the ready (Pull request is ready to be integrated), rfr (Pull request is ready for review) and sponsor (Pull request is ready to be sponsored) labels Oct 2, 2025
@openjdk

openjdk bot commented Oct 2, 2025

@kevinjwalls @fandreuz Pushed as commit 8be1616.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

integrated (Pull request has been integrated), serviceability (serviceability-dev@openjdk.org)

3 participants