Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8261710: SA DSO objects have sizes that are too large #2563

Closed
wants to merge 7 commits into from

Conversation

YaSuenag
Copy link
Member

@YaSuenag YaSuenag commented Feb 14, 2021

This PR relates to JDK-8261702 ( #2562 )
When SA creates a DSO object, which is used to represent a shared object file (.so), it initializes the "size" to be the size of the shared object file. This usually results in the size being too big. This can cause SA to get confused about whether or not an address is in the shared object. SA should instead set the DSO's size to the amount of the file that is actually mapped for executable code.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8261710: SA DSO objects have sizes that are too large

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2563/head:pull/2563
$ git checkout pull/2563

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Feb 14, 2021

👋 Welcome back ysuenaga! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Feb 14, 2021

@YaSuenag The following label will be automatically applied to this pull request:

  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@YaSuenag YaSuenag marked this pull request as ready for review Feb 14, 2021
@openjdk openjdk bot added the rfr label Feb 14, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Feb 14, 2021

Webrevs

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 15, 2021

These changes look good, but it would be nice to fix OSX too. Fortunately it should be easier. As part of the fix for JDK-8247515, I added a lib_info->memsz field. I think it is correct and is what you need. I needed it in order to properly scan .dylibs for symbols without scanning too far. It seems to be working. Unfortunately we don't really have a good tests for this, but you could look at the before and after for ClhsdbPmap test to get an idea of if it is working. If you don't have an OSX machine to try, I can try out your changes for you.

BTW, I have no idea if Windows is getting the size right, but I guess we'll just have to assume it is:

    env->CallVoidMethod(obj, addLoadObject_ID, strName, (jlong) params[u].Size,
                        (jlong) params[u].Base);

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 16, 2021

@plummercj Thanks for the review!

These changes look good, but it would be nice to fix OSX too. Fortunately it should be easier. As part of the fix for JDK-8247515, I added a lib_info->memsz field. I think it is correct and is what you need.

AFAICS lib_info->memsz is not set to loaded size, it seems to relates to the offset of the symbol in the binary.

newlib->symtab = build_symtab(newlib->fd, &newlib->memsz);

uintptr_t offset = lentry.n_value; // offset of the symbol code/data in the file

Can I believe lib_info->memsz for this purpose?
I'm not familiar of Mach-O, and I don't have Mac, so I can't evaluate it.

I needed it in order to properly scan .dylibs for symbols without scanning too far. It seems to be working. Unfortunately we don't really have a good tests for this, but you could look at the before and after for ClhsdbPmap test to get an idea of if it is working. If you don't have an OSX machine to try, I can try out your changes for you.

In process attach on Linux, we can test the change with /proc//maps, but in other case, we might not test it.
In coredump on Linux, it is difficult because we cannot get memory map from other source (excepts the core).

BTW, I have no idea if Windows is getting the size right, but I guess we'll just have to assume it is:

    env->CallVoidMethod(obj, addLoadObject_ID, strName, (jlong) params[u].Size,
                        (jlong) params[u].Base);

According to Microsoft Docs, it sounds good.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 16, 2021

https://bugs.openjdk.java.net/browse/JDK-8250750 has some interesting details that I forgot about. Note this is referencing the state of the code before fixing JDK-8250750, which changed how lib->memsz is calculated.

lib->memsz comes from the size of the LC_SEGMENT_64 that the library was discovered in. However, the library can actually span multiple segments. In this case of the vtable address, the address was in the segment that follows the initial LC_SEGMENT_64. Because of this lib->memsz is too small, resulting in symbol lookups being restricted to addresses that are in the initial segment.

The simplest approach to fixing this seems to be locating the largest offset found in the symbol table, round that up to a page boundary, and use it as lib->memsz. I've implemented this and it seems to be working.

So it's not perfect, but I think it's good enough for our needs and better than using the file size and ending up with a size that is too big. I think the main way that this could fail is if you use findpc on an address that is beyond the last symbol (if it is even possible for there to be memory mapped at an address after the last symbol). In this case findpc will just say it doesn't know where the address is rather than saying it is in the dso + some very large offset. For symbols in the dso, they will always be found since lib->memsz now covers all symbols.

Also, since I never did figure out how to determine the size of a mach-o symbol, it's possible that the last symbol extends beyond the page boundary. If that is the case and you use findpc on and address that is part of the symbol but beyond the page boundary, findpc won't find it and say it doesn't know what the address is. So again, not perfect, but this issue can only happen with the last symbol in the dso, and only if the symbol crosses a page boundary, and only if the searched for address is after the page boundary. If you search for the start of the symbol, you'll still find it.

Having said all that, I think maybe we can get the correct size by taking the address of the highest symbol and then looking up the map_info that it is in. Then it's just a matter of adding map->vaddr + map->memsz. But this assumes that there are no more maps for the dso after the one that has the last symbol, so maybe still not perfect. I'm not sure any of this extra work is worth it. When I dove into the mach-o support last year to fix some issues, it ended up being a huge rat hole that took way more of my time then I had expected. It was very badly broken, far worse then I first thought, and many things about mach-o core files were a mystery, and still remain so. For example, see https://bugs.openjdk.java.net/browse/JDK-8249779. We never did figure out how to create a link map.

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 16, 2021

Thanks for explanation!
I understood Marc-O is complex, and to use lib->memsz is reasonable as you said.

I updated patch for macOS. I cannot test it, but all of tier 1 servicerbility tests were passed on GitHub Actions. Could you review again?

@dcubed-ojdk
Copy link
Member

@dcubed-ojdk dcubed-ojdk commented Feb 16, 2021

While I was characterizing:

JDK-8261844 serviceability/sa/ClhsdbFindPC.java#id1 failed with "'In code in NMethod for LingeredAppWithTrivialMain.main' missing from stdout/stderr"

I determined that I could easily reproduce serviceability/sa/ClhsdbFindPC.java failures
on my Ubuntu 16.04 test machine. I've applied this version of this PR at @plummercj's
request:

$ git log HEAD^!
commit c9a3472 (HEAD -> pull/2563)
Author: Yasumasa Suenaga suenaga@oss.nttdata.com
Date: Tue Feb 16 18:05:43 2021 +0900

Remove unnecessary code

and retested serviceability/sa/ClhsdbFindPC.java:

$ do_java_test serviceability/sa/ClhsdbFindPC.java 2>&1 | tee -a do_java_test.8261710.log
INFO: GNUMAKE=make
INFO: GNUMAKE version is: GNU Make 4.1

INFO: JTREG options:
INFO: JOBS=16
INFO: TEST_MODE=othervm
INFO: EXTRA_PROBLEM_LISTS=ProblemList-extra.txt
INFO: VM_OPTIONS=
INFO: test_val=serviceability/sa/ClhsdbFindPC.java
Test Config: linux-x86_64-normal-server-release
INFO: TIMEOUT_FACTOR=4
Done testing
Test Run linux-x86_64-normal-server-release time: 0.55 minutes.

TEST                                              TOTAL  PASS  FAIL ERROR
jtreg:open/test/hotspot/jtreg/serviceability/sa/ClhsdbFindPC.java
>>                                                       4     3     1     0 <<

1 failure(s) found in log=do_java_test.linux-x86_64-normal-server-release.log

TEST: serviceability/sa/ClhsdbFindPC.java#id3
ERROR: Failed to instantiate timeout handler: jdk.test.failurehandler.jtreg.GatherProcessInfoTimeoutHandler: file does not exist

Test Config: linux-x86_64-normal-server-fastdebug
INFO: TIMEOUT_FACTOR=6
Done testing
Test Run linux-x86_64-normal-server-fastdebug time: 0.87 minutes.

TEST                                              TOTAL  PASS  FAIL ERROR
jtreg:open/test/hotspot/jtreg/serviceability/sa/ClhsdbFindPC.java
                                                      4     4     0     0

Test Config: linux-x86_64-normal-server-slowdebug
INFO: TIMEOUT_FACTOR=12
Done testing
Test Run linux-x86_64-normal-server-slowdebug time: 2.97 minutes.

TEST                                              TOTAL  PASS  FAIL ERROR
jtreg:open/test/hotspot/jtreg/serviceability/sa/ClhsdbFindPC.java
>>                                                       4     3     1     0 <<

1 failure(s) found in log=do_java_test.linux-x86_64-normal-server-slowdebug.log

TEST: serviceability/sa/ClhsdbFindPC.java#id3
LOG: build/linux-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa_ClhsdbFindPC_java/serviceability/sa/ClhsdbFindPC_id3.jtr
Saving build/linux-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa_ClhsdbFindPC_java/serviceability/sa/ClhsdbFindPC_id3.jtr as /work/shared/bug_hunt/8261844_for_jdk17.git/test_failures.2021-02-16-171036/ClhsdbFindPC_id3.jtr.slowdebug
Saving /work/shared/bug_hunt/8261844_for_jdk17.git/build/linux-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa_ClhsdbFindPC_java/serviceability/sa/ClhsdbFindPC_id3/hs_err_pid16470.log as /work/shared/bug_hunt/8261844_for_jdk17.git/test_failures.2021-02-16-171036/hs_err_pid16470.log
Moving /work/shared/bug_hunt/8261844_for_jdk17.git/build/linux-x86_64-normal-server-slowdebug/test-support/jtreg_open_test_hotspot_jtreg_serviceability_sa_ClhsdbFindPC_java/serviceability/sa/ClhsdbFindPC_id3/core to /work/shared/bug_hunt/8261844_for_jdk17.git/test_failures.2021-02-16-171036/core.16470

Total test time: 4.40 minutes.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 16, 2021

I updated patch for macOS. I cannot test it, but all of tier 1 servicerbility tests were passed on GitHub Actions. Could you review again?

The OSX changes look good and passed my testing. However as Dan indicated above, it seems with your changes in place the original problem is still being reproduced. Note the original problem never reproduced with the testing I've been doing, so my testing doesn't prove that you fixed the issue. However, Dan is running on a local host that does reproduce the issue pretty reliably, and seems to still reproduce it after your fixes.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 16, 2021

Dan included the logs of the failures in the CR. This is what they show:

  + findpc 0x00002b77f084d116
Address 0x00002b77f084d116: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b116
...
java.lang.RuntimeException: Test ERROR java.lang.RuntimeException: 'In interpreter codelet' missing from stdout/stderr 

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 16, 2021

I should add that the failures Dan is seeing are with #id3, which is no -Xcomp, but with a core file instead of a live process. With -Xcomp this part of the test is not run, so possibly this is just an issue with the dso size calculation for core files, and works correctly with a live process.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 17, 2021

If you run ClhsdbPmap.java, you can see pmap output for both core and live processes. The sizes of the maps are very large for both of them, and actually a bit bigger with the live process. Here's the output for a live process:

0x000014755360c000	4048K	/usr/lib64/libnss_sss.so.2
0x0000147553815000	4012K	/usr/lib64/libnss_files-2.17.so
0x0000147560208000	4064K	/usr/lib64/libm-2.17.so
0x000014756050a000	3032K	/usr/lib64/librt-2.17.so
0x0000147560712000	32892K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
0x0000147562731000	4924K	/usr/lib64/libc-2.17.so
0x0000147562aff000	3076K	/usr/lib64/libdl-2.17.so
0x0000147562d03000	3060K	/usr/lib64/libpthread-2.17.so
0x0000147562f1f000	2948K	/usr/lib64/libz.so.1.2.7
0x0000147563135000	2860K	/usr/lib64/ld-2.17.so
0x0000147563164000	92K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
0x000014756317b000	80K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
0x00001475631e0000	156K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
0x0000147563207000	128K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
0x000014756332c000	68K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
0x0000563c950bf000	16K	/scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java

/usr/lib64/libnss_files-2.17.so is the one that turned up in the test failure. It's only a 68k file but has a 4064k map. It's second in the list. I'm not sure if this is the order we would always see on linux systems. My assumption was it was the library at the highest address that was causing the problem, and that the inteprerter was located right after it, but that might not be the case.

The address in the interpreter that we are doing findpc on turned up at libnss_files.so.2 + 0x21b116, or at an offset of 2200k. I added a "pmap" command to ClhsdbFindPC, and from my test runs the interpreter seemed to alway be just before the first library. However, maybe on some systems it is intermixed with the libraries.

@openjdk
Copy link

@openjdk openjdk bot commented Feb 17, 2021

@YaSuenag this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8261710
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict label Feb 17, 2021
@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 17, 2021

I pushed new change to use ELF_PHDR.p_filesz instead of p_memsz. It almost works fine, but it is not perfect solution.
For example, let's consider for libnss_sss (provided by Fedora 33) - /proc/<PID>/maps shows libnss as following. There are 5 segments.

7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     /usr/lib64/libnss_sss.so.2
7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     /usr/lib64/libnss_sss.so.2

However I could see only 4 segments in libnss_sss.so when I ran readelf -l /usr/lib64/libnss_sss.so.2:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000001468 0x0000000000001468  R      0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000006931 0x0000000000006931  R E    0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000001110 0x0000000000001110  R      0x1000
  LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
                 0x000000000000044c 0x0000000000000658  RW     0x1000

Linux kernel seems to separate final segment (0xbc78) into RO and RW segments when it attempts to load shared library. (but I'm not sure)

I think we need to refactor handling shared libraries in other ways.

For live process, we can use /proc/<PID>/maps.
For coredump, we can use NT_FILE in note section in corefile, It has valid segments as below.

$ readelf -n core
  :
    0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
        /usr/lib64/libnss_sss.so.2
    0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
        /usr/lib64/libnss_sss.so.2
    0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
        /usr/lib64/libnss_sss.so.2
    0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
        /usr/lib64/libnss_sss.so.2
    0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
        /usr/lib64/libnss_sss.so.2

But they makes big change to SA.
As an option, we can integrate this change at first, then we will refactor them.
What do you think?
(I want to resolve this problem with smaller fix if I can of course, so another solutions are welcome)

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 17, 2021

Linux kernel seems to separate final segment (0xbc78) into RO and RW segments when it attempts to load shared library. (but I'm not sure)

I know I've seen this brought up before, if not with SA then with hotspot itself. Maybe it was CDS or NMT. Or maybe it has something to do with how hotspot generates core dumps (I seem to recall there are some bits you can set somewhere that determine what does or does not get dumped to the core).

In any case, I think the main issue it causes for you is that your rounding up the size of the last (4th) segment may not enough. I think in most cases you would need to round it up to a page boundary, and then add another page to it. Consider a page as 4k and let's say the segment size is 2k, but it is split into two 1k segments. Each of those segment would take one page, so a total of 8k. But if you round 2k up to a page boundary you only get to 4k. You need to add another page to that. I think the only time you don't want to add an extra page to the size is if one of the segment's size was already page aligned. However, you have know way of knowing if that will be the case, unless you can determine the RO and RW sizes of the segment.

But they makes big change to SA.
As an option, we can integrate this change at first, then we will refactor them.
What do you think?

Whatever works for you. I'm actually not too concerned about getting this right, because with my PointerFinder workaround I don't think this issue with the map sizes has much impact on SA. Probably the only place it will show up is with SA pmap output.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 17, 2021

@YaSuenag https://bugs.openjdk.java.net/browse/JDK-8250826 is the bug I was thinking of that sounds like the RO/RW issue you were talking about.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 17, 2021

@YaSuenag I asked Dan to run a modified ClhsdbFindPC that also issues a clhsdb pmap command so we can see what the maps look like, and compare them to the address being looked up. This is before your latest fix, so the the sizes are still too big, but that's ok for this analysis. First, this is the findpc command that was suppose to show the address in the interpreter:

hsdb> + findpc 0x00002ab36ca942b6
Address 0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6

And here's the pmap output . I had to manually sort by address, and I also added the location of the interpreter address being looked up.

0x00005652c8fd0000	16K	<jdkdir>/jdk/bin/java
0x00002ab3692ae000	3400K	/lib64/ld-linux-x86-64.so.2
0x00002ab3692e0000	12K	<jdkdir>/test/hotspot/jtreg/native/libLingeredApp.so
0x00002ab3692ed000	84K	<jdkdir>/jdk/bin/../lib/libjli.so
0x00002ab369406000	144K	<jdkdir>/jdk/lib/libjimage.so
0x00002ab36942a000	200K	<jdkdir>/jdk/lib/libjava.so
0x00002ab3694bc000	88K	<jdkdir>/jdk/lib/libnio.so
0x00002ab3694d6000	3240K	/lib/x86_64-linux-gnu/libz.so.1
0x00002ab3696f0000	3136K	/lib/x86_64-linux-gnu/libpthread.so.0
0x00002ab36990d000	3020K	/lib/x86_64-linux-gnu/libdl.so.2
0x00002ab369b11000	5052K	/lib/x86_64-linux-gnu/libc.so.6
0x00002ab369edb000	31100K	<jdkdir>/jdk/lib/server/libjvm.so
0x00002ab36bd3a000	2840K	/lib/x86_64-linux-gnu/librt.so.1
0x00002ab36bf42000	4856K	/lib/x86_64-linux-gnu/libm.so.6
0x00002ab36c24b000	3796K	/lib/x86_64-linux-gnu/libnss_compat.so.2
0x00002ab36c454000	3760K	/lib/x86_64-linux-gnu/libnsl.so.1
0x00002ab36c66d000	3660K	/lib/x86_64-linux-gnu/libnss_nis.so.2
0x00002ab36c879000	3612K	/lib/x86_64-linux-gnu/libnss_files.so.2
0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6
0x00002ab38fc08000	112K	<jdkdir>/jdk/lib/libnet.so
0x00002ab38fc55000	3756K	/usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00002ab3bc000000	4096K	/lib/x86_64-linux-gnu/libgcc_s.so.1

There appears to be a very large gap between libnss_files.so.2 and libnet.so (about 590mb) so I assume a lot of hotspot memory allocations are located in this area, including the interpreter.

Copy link
Contributor

@sspitsyn sspitsyn left a comment

Hi Yasumasa,

Just some comments.

src/jdk.hotspot.agent/linux/native/libsaproc/libproc_impl.c

+static inline uintptr_t align_down(uintptr_t ptr, size_t size) {
+  return (ptr & ~(size - 1));
+}
+
+static inline uintptr_t align_up(uintptr_t ptr, size_t size) {
+  return ((ptr + size - 1) & ~(size - 1));
+}

The name of 'size' parameter is confusing. Should it be renamed to something like page_size of psize?

+  lib->end = (uintptr_t)-1L;
. . .
+      if ((lib->end == (uintptr_t)-1L) || (lib->end < aligned_end)) {
+        lib->end = aligned_end;
       }

The condition (lib->end == (uintptr_t)-1L) is a subset of (lib->end < aligned_end), and so, can be removed. The same is true for the condition:

+        if ((lib->exec_end == (uintptr_t)-1L) || (lib->exec_end < aligned_end)) {
+          lib->exec_end = aligned_end;
+        }

+ print_debug("%s [%d] 0x%lx-0x%lx: base = 0x%lx, vaddr = 0x%lx, memsz = 0x%lx, filesz = 0x%lx\n", lib->name, cnt, aligned_start, aligned_end, lib->base, ph->p_vaddr, ph->p_memsz, ph->p_filesz);

It is better to split this long line.

Thanks,
Serguei

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 18, 2021

@YaSuenag https://bugs.openjdk.java.net/browse/JDK-8250826 is the bug I was thinking of that sounds like the RO/RW issue you were talking about.

It is similar, but it's different from this issue because JDK-8250826 is caused by mprotect() call against memory segment in ELF binary.

In any case, I think the main issue it causes for you is that your rounding up the size of the last (4th) segment may not enough. I think in most cases you would need to round it up to a page boundary, and then add another page to it.

Hmm... it might be page-boundary problem as you said, but I don't have any ideas where we can collect the information about that excepting note section in the core.
My latest patch shows following debug message on the console. It shows all PT_LOAD segments have been handled correctly.

libsaproc DEBUG: /lib64/libnss_sss.so.2 [0] 0x7f0ba6ec5000-0x7f0ba6ec7000: base = 0x7f0ba6ec5000, vaddr = 0x0, memsz = 0x1468, filesz = 0x1468
libsaproc DEBUG: /lib64/libnss_sss.so.2 [1] 0x7f0ba6ec7000-0x7f0ba6ece000: base = 0x7f0ba6ec5000, vaddr = 0x2000, memsz = 0x6931, filesz = 0x6931
libsaproc DEBUG: /lib64/libnss_sss.so.2 [2] 0x7f0ba6ece000-0x7f0ba6ed0000: base = 0x7f0ba6ec5000, vaddr = 0x9000, memsz = 0x1110, filesz = 0x1110
libsaproc DEBUG: /lib64/libnss_sss.so.2 [3] 0x7f0ba6ed0000-0x7f0ba6ed1000: base = 0x7f0ba6ec5000, vaddr = 0xbc78, memsz = 0x658, filesz = 0x44c

I'm actually not too concerned about getting this right, because with my PointerFinder workaround I don't think this issue with the map sizes has much impact on SA. Probably the only place it will show up is with SA pmap output.

Ok, I will move forward this fix.

@openjdk openjdk bot removed the merge-conflict label Feb 18, 2021
@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 18, 2021

Thanks @sspitsyn for the comment!
I fixed almost of your comment in new commit.

The condition (lib->end == (uintptr_t)-1L) is a subset of (lib->end < aligned_end), and so, can be removed. The same is true for the condition:

lib->end is declared as unsigned (uintptr_t), so we can't use (lib->end < aligned_end) when lib->end is set to -1.

@plummercj
Copy link
Contributor

@plummercj plummercj commented Feb 19, 2021

BTW, one way to test these changes might be to do a clhsdb pmap and then do a findpc on the last address of each library's map (should find a symbol) and the first address after the map (should not find a symbol unless that happens to be the start of the next mapped library). I'm not sure if this will always work. It's possible that no symbol covers the very end of the map, especially due to page alignment. Maybe try the first address of the last page then. I'm not necessarily suggesting you write a test that does this, but maybe just do a bit of experimenting as a sanity check.

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 20, 2021

BTW, one way to test these changes might be to do a clhsdb pmap and then do a findpc on the last address of each library's map (should find a symbol) and the first address after the map (should not find a symbol unless that happens to be the start of the next mapped library). I'm not sure if this will always work. It's possible that no symbol covers the very end of the map, especially due to page alignment. Maybe try the first address of the last page then.

I thought similar tests, but I don't want to do so because it might fail as you said.
As an option, we can add the test to to compare address range of shared libraries between clhsdb pmap and /proc/<PID>/maps. But I can add it for Linux only, so I'm not sure it is worth to add.

@sspitsyn
Copy link
Contributor

@sspitsyn sspitsyn commented Feb 20, 2021

Thank you for the update.

lib->end is declared as unsigned (uintptr_t), so we can't use (lib->end < aligned_end) when lib->end is set to -1.
Could we use the 0L instead of -1L? What's wrong with 0L?

In general, I'm okay with your fix though.

@openjdk
Copy link

@openjdk openjdk bot commented Feb 20, 2021

@YaSuenag This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8261710: SA DSO objects have sizes that are too large

Reviewed-by: sspitsyn, cjplummer

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 127 new commits pushed to the master branch:

  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • 65a245e: 8262329: Fix JFR parser exception messages
  • a4c2496: 8259535: ECDSA SignatureValue do not always have the specified length
  • 2515c42: 8262332: serviceability/sa/ClhsdbJhisto.java fails with Test ERROR java.lang.RuntimeException: 'ParselTongue' missing from stdout/stderr
  • 07061fc: 8256417: Exclude TestJFRWithJMX test from running with PodMan
  • c9e9189: 8262074: Consolidate the default value of MetaspaceSize
  • 05c11bc: 8262426: Change TRAPS to Thread* for find_constrained_instance_or_array_klass()
  • d06d6f5: 8262402: Make CATCH macro assert not fatal
  • 47a0842: 8241403: JavaThread::get_thread_name() should be ThreadSMR-aware
  • ... and 117 more: https://git.openjdk.java.net/jdk/compare/ea5bf45c6f1bdcc7d671d9c7258ed98280637331...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Feb 20, 2021
@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Feb 20, 2021

@sspitsyn Thanks for the review!

lib->end is declared as unsigned (uintptr_t), so we can't use (lib->end < aligned_end) when lib->end is set to -1.

Could we use the 0L instead of -1L? What's wrong with 0L?

I prefer to use -1 instead of 0 for invalid value because we can't distinguish NULL or invalid value if we use 0.

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Mar 1, 2021

PING: Can I get second reviewer?

@plummercj
Copy link
Contributor

@plummercj plummercj commented Mar 1, 2021

PING: Can I get second reviewer?

Sorry, thought I had already done that but I guess not.

@YaSuenag
Copy link
Member Author

@YaSuenag YaSuenag commented Mar 2, 2021

/integrate

@openjdk openjdk bot closed this Mar 2, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Mar 2, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Mar 2, 2021

@YaSuenag Since your change was applied there have been 147 commits pushed to the master branch:

  • fdd1093: 8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base
  • f5ab7f6: 8262472: Buffer overflow in UNICODE::as_utf8 for zero length output buffer
  • 6635d7a: 8261670: Add javadoc for the XML processing limits
  • 85b774a: 8255859: Incorrect comments in log.hpp
  • c3eb80e: 8262500: HostName entry in VM.info should be a new line
  • 9f0f0c9: 8260933: runtime/cds/serviceability/ReplaceCriticalClassesForSubgraphs.java fails without CompactStrings
  • d339832: 8257414: Drag n Drop target area is wrong on high DPI systems
  • 353416f: 8262509: JSSE Server should check the legacy version in TLSv1.3 ClientHello
  • 642f45f: 8261839: Error creating runtime package on macos without mac-package-identifier
  • 682e120: 8262497: Delete unused utility methods in ICC_Profile class
  • ... and 137 more: https://git.openjdk.java.net/jdk/compare/ea5bf45c6f1bdcc7d671d9c7258ed98280637331...master

Your commit was automatically rebased without conflicts.

Pushed as commit 3b350ad.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@YaSuenag YaSuenag deleted the JDK-8261710 branch Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrated serviceability
4 participants