Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8262188: Add test to verify trace page sizes logging on Linux #2696

Closed
wants to merge 4 commits into from

Conversation

kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Feb 23, 2021

Please review this new test that checks the output from -Xlog:pagesize and compares it to what's really used according to /proc/self/smaps.

We already have a few tests that check the output from -Xlog:pagesize, the difference is that those tests expect some page size given the parameters used. This new test instead check if the values printed out are correct with regards to what is recorded in /proc/self/maps.

I wrote the test to see if all our logging was correct, and found those two issues that are now fixed:
JDK-8261029: Code heap page sizes not traced correctly using os::trace_page_sizes
JDK-8261230: GC tracing of page sizes are wrong in a few places

The test now passes and it will be a good way to prevent incorrect page size logging to be added going forward.

Testing
The new test has been run through mach5 without any problems and also manually on configurations with explicit large pages enabled.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8262188: Add test to verify trace page sizes logging on Linux

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2696/head:pull/2696
$ git checkout pull/2696

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 23, 2021

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 23, 2021

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime label Feb 23, 2021
@kstefanj kstefanj marked this pull request as ready for review Feb 23, 2021
@openjdk openjdk bot added the rfr label Feb 23, 2021
@mlbridge
Copy link

mlbridge bot commented Feb 23, 2021

Webrevs

Copy link
Member

@tstuefe tstuefe left a comment

Hi Stefan,

looks good. Minor points below.

Cheers, Thomas

* @requires os.family == "linux"
* @requires os.arch=="amd64" | os.arch=="x86_64"
* @requires vm.gc != "Z"
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseLargePages -XX:LargePageSizeInBytes=1g TestTracePageSizes
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also test with -XX:LargePageSizeInBytes=2M?

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, 2m might not always be the default so I guess it makes sense to explicitly test both.

* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache -XX:+UseLargePages TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache -XX:+UseTransparentHugePages TestTracePageSizes
*/
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do you use multiple @run, when multiple @test?

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be using this wrong, but the way I use it is that for each @test I can have a set of @requires and then I do a set of runs under these requirements.

Sometimes specifying explicit tests for the different GCs are considered to expensive but in this case I think it is fine.

* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC -XX:+UseLargePages TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC -XX:+UseTransparentHugePages TestTracePageSizes
*/
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you don't explicitly test UseSHM. Because the number of tests would be too large?

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't explicitly test HugeTLBFS either, but yes that is what's being tested when just setting +UseLargePages =) Would you prefer if I had one @run for -XX:+UseHugeTLBFS and one for -XX:+UseSHM.

Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine for now; hopefully we'll get rid of UseSHM.

// will generate a match.
private static void parseSmaps() throws Exception {
Pattern smapsPattern = Pattern.compile(RangeWithPageSize.smapsPatternString, Pattern.DOTALL);
Scanner smapsScanner = new Scanner(new File("/proc/self/smaps"));
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would that work if mappings are concurrently modified? (I guess it has to otherwise one could not read reliably from /proc at all)

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say for sure, not sure exactly how the file is consumed by the Scanner. The segments and the fields in those segments that we really care about should not change and therefore I think we should be fine.

Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent
output can be achieved only in the single read call).
This typically manifests when doing partial reads of these files while the
memory map is being modified.  Despite the races, we do provide the following
guarantees:

1) The mapped addresses never go backwards, which implies no two
   regions will ever overlap.
2) If there is something at a given vaddr during the entirety of the
   life of the smaps/maps walk, there will be some output for it.

So, if Scanner reads the file in one go its okay.

I'd say lets see how this goes then. If this test fails intermittently, we know where to look.

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a very basic test in jshell and created a Scanner on a file that I then updated between calls to nextLine(). The output from the scanner did not include my additions, so at least for a small file is seems like the contents is read in one go. But that doesn't say that is always the case.

Bu I agree, we know where to look if we see intermittent failures.

// transparent huge pages. The pattern is not greedy and will
// match as little as possible so each "segment" in the file
// will generate a match.
private static void parseSmaps() throws Exception {
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code may be easier to read if the pattern were defined up here.

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I moved it around and ended up leaving it below, but now with the comment it certainly make sense to have it here.

Pattern traceLinePattern = Pattern.compile(traceLinePatternString);

// The test needs to be run with page size logging printed to ps-$pid.log.
Scanner fileScanner = new Scanner(new File("./ps-" + ProcessHandle.current().pid() + ".log"));
Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we scan the log file while it is being written by ourselves? This seems slightly unsafe. Are we sure it has been completely flushed at this point? I'm mainly concerned about things like running this test in a current dir on NFS or somesuch.

An alternative would be to let the testee write the log file and print out a copy of /proc/self/smaps to stdout, then to scan and compare in a separate process.

But if you think this is safe enough, I'm fine with this too.

Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really while it is being written, since all the output from -Xlog:pagesize happens before the java test is executed and from what I can tell the log-file should be properly flushed.

As you say doing the scan and compare in a separate process afterwards would make any concurrent modification issue and flush issue go away. But it would also slightly complicate the test and make it depend on the test library. For debugging it is very nice to be able to run the test standalone and that is a bit harder when depending on the test lib.

Until proven unsafe I would prefer to leave it as is.

Copy link
Member

@tstuefe tstuefe Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say so too. If that pattern works, I may use it in other places too.

Copy link
Contributor Author

@kstefanj kstefanj left a comment

Thanks Thomas for looking at this and highlighting some potential problems.

I think we should be fine, but I will see if I can trigger a failure caused by any of your concerns :)

* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache -XX:+UseLargePages TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:-SegmentedCodeCache -XX:+UseTransparentHugePages TestTracePageSizes
*/
Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be using this wrong, but the way I use it is that for each @test I can have a set of @requires and then I do a set of runs under these requirements.

Sometimes specifying explicit tests for the different GCs are considered to expensive but in this case I think it is fine.

* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC -XX:+UseLargePages TestTracePageSizes
* @run main/othervm -XX:+AlwaysPreTouch -Xlog:pagesize:ps-%p.log -XX:+UseSerialGC -XX:+UseTransparentHugePages TestTracePageSizes
*/
Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't explicitly test HugeTLBFS either, but yes that is what's being tested when just setting +UseLargePages =) Would you prefer if I had one @run for -XX:+UseHugeTLBFS and one for -XX:+UseSHM.

// transparent huge pages. The pattern is not greedy and will
// match as little as possible so each "segment" in the file
// will generate a match.
private static void parseSmaps() throws Exception {
Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I moved it around and ended up leaving it below, but now with the comment it certainly make sense to have it here.

// will generate a match.
private static void parseSmaps() throws Exception {
Pattern smapsPattern = Pattern.compile(RangeWithPageSize.smapsPatternString, Pattern.DOTALL);
Scanner smapsScanner = new Scanner(new File("/proc/self/smaps"));
Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say for sure, not sure exactly how the file is consumed by the Scanner. The segments and the fields in those segments that we really care about should not change and therefore I think we should be fine.

Pattern traceLinePattern = Pattern.compile(traceLinePatternString);

// The test needs to be run with page size logging printed to ps-$pid.log.
Scanner fileScanner = new Scanner(new File("./ps-" + ProcessHandle.current().pid() + ".log"));
Copy link
Contributor Author

@kstefanj kstefanj Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really while it is being written, since all the output from -Xlog:pagesize happens before the java test is executed and from what I can tell the log-file should be properly flushed.

As you say doing the scan and compare in a separate process afterwards would make any concurrent modification issue and flush issue go away. But it would also slightly complicate the test and make it depend on the test library. For debugging it is very nice to be able to run the test standalone and that is a bit harder when depending on the test lib.

Until proven unsafe I would prefer to leave it as is.

Copy link
Member

@tstuefe tstuefe left a comment

I'm fine with this test. It will come in useful.

I leave it up to you to check the reading of proc/self/smaps more robust. It would be okay for me to use this test as it is and if it starts failing intermittently to modify it.

@openjdk
Copy link

openjdk bot commented Feb 24, 2021

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8262188: Add test to verify trace page sizes logging on Linux

Reviewed-by: stuefe, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 64 new commits pushed to the master branch:

  • 702ca62: 8262185: G1: Prune collection set candidates early
  • 8bc8542: 8262195: Harden tests that use the HostsFileNameService (jdk.net.hosts.file property)
  • 20c93b3: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
  • ddd550a: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • 65a245e: 8262329: Fix JFR parser exception messages
  • a4c2496: 8259535: ECDSA SignatureValue do not always have the specified length
  • 2515c42: 8262332: serviceability/sa/ClhsdbJhisto.java fails with Test ERROR java.lang.RuntimeException: 'ParselTongue' missing from stdout/stderr
  • 07061fc: 8256417: Exclude TestJFRWithJMX test from running with PodMan
  • ... and 54 more: https://git.openjdk.java.net/jdk/compare/29c72631ee3f4f1f88a36f25b933642ea9659681...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Feb 24, 2021
Updated with an additional run with setting page size = 2m and moved the regexp-string to where it is used.
Copy link
Contributor

@tschatzl tschatzl left a comment

I think this is good. There are two typos in there, please fix before pushing.

Scanner fileScanner = new Scanner(new File("./ps-" + ProcessHandle.current().pid() + ".log"));
while (fileScanner.hasNextLine()) {
String line = fileScanner.nextLine();
if (line.matches(traceLinePatternString)){
Copy link
Contributor

@tschatzl tschatzl Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space before the brace


public RangeWithPageSize(String start, String end, String pageSize, String vmFlags) {
this.start = Long.parseUnsignedLong(start, 16);
this.end = Long.parseUnsignedLong(end, 16);;
Copy link
Contributor

@tschatzl tschatzl Mar 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two ";" at the end

@kstefanj
Copy link
Contributor Author

kstefanj commented Mar 1, 2021

Thanks for the reviews Thomas and Thomas =)

@kstefanj
Copy link
Contributor Author

kstefanj commented Mar 1, 2021

/integrate

@openjdk openjdk bot closed this Mar 1, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Mar 1, 2021
@openjdk
Copy link

openjdk bot commented Mar 1, 2021

@kstefanj Since your change was applied there have been 65 commits pushed to the master branch:

  • 0a7fff4: 8261636: The test mapping in hugetlbfs_sanity_check should consider LargePageSizeInBytes
  • 702ca62: 8262185: G1: Prune collection set candidates early
  • 8bc8542: 8262195: Harden tests that use the HostsFileNameService (jdk.net.hosts.file property)
  • 20c93b3: 8261914: IfNode::fold_compares_helper faces non-canonicalized bool when running JRuby JSON workload
  • ddd550a: 8261308: C2: assert(inner->is_valid_counted_loop(T_INT) && inner->is_strip_mined()) failed: OuterStripMinedLoop should have been removed
  • 03d888f: 8261804: Remove field _processing_is_mt, calculate it instead
  • 6800ba4: 8257500: Drawing MultiResolutionImage with ImageObserver "leaks" memory
  • 65a245e: 8262329: Fix JFR parser exception messages
  • a4c2496: 8259535: ECDSA SignatureValue do not always have the specified length
  • 2515c42: 8262332: serviceability/sa/ClhsdbJhisto.java fails with Test ERROR java.lang.RuntimeException: 'ParselTongue' missing from stdout/stderr
  • ... and 55 more: https://git.openjdk.java.net/jdk/compare/29c72631ee3f4f1f88a36f25b933642ea9659681...master

Your commit was automatically rebased without conflicts.

Pushed as commit fe8e370.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime integrated
3 participants