Skip to content

8292269: Replace FileMapInfo::fail_continue() with Unified Logging #12419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

matias9927
Copy link
Contributor

@matias9927 matias9927 commented Feb 3, 2023

The current logging method FileMapInfo::fail_continue() method reports the reason when the CDS archive cannot be mapped and has variable behavior depending on the -Xshare mode. Logging calls are divided primarily between the "warning" and "info" channels, where the "info" channel was used to reduce the number of printed logs for expected and normal failures.

Logging will now shift toward Unified Logging, so fail_continue() will be replaced with log_info(cds) and log_warning(cds). Some genuine failures will be logged to the warning channel for better visibility. Upon actual failures, the VM will exit at MetaspaceShared::initialize_runtime_shared_and_meta_spaces() unless it is otherwise necessary. Relevant tests are updated to accommodate this change. Verified with tier1-4 tests


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8292269: Replace FileMapInfo::fail_continue() with Unified Logging

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12419/head:pull/12419
$ git checkout pull/12419

Update a local copy of the PR:
$ git checkout pull/12419
$ git pull https://git.openjdk.org/jdk pull/12419/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12419

View PR using the GUI difftool:
$ git pr show -t 12419

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12419.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 3, 2023

👋 Welcome back matsaave! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Feb 3, 2023

@matias9927 The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Feb 3, 2023
@matias9927 matias9927 marked this pull request as ready for review February 4, 2023 02:59
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 4, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 4, 2023

va_end(ap);
}

void FileMapInfo::fail_continue(LogLevelType level, const char *msg, ...) {
void FileMapInfo::fail_continue_nowarn(const char *msg, ...) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to name a method by what it does do rather than what it doesn't.

But I am finding the naming here problematic anyway as we do not always continue - the logic is basically possibly_fail_else_log_and_continue. Maybe the simplest thing here is to only have the variant that takes the LogLevel and so every callsite is explicit about whether it logs at warning or info level rather than try to accommodate warn/info into the name somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the fail_continue is not the ideal name for what this is trying to accomplish: log a message and possibly fail.

That being said, I felt it was appropriate to abstract away the need to manage LogLevel as only Info and Warning are used with fail_continue. If someone does want to use any of the other channels, like debug, error, etc, they can still use the fail_continue variant that excepts a LogLevel, but they wouldn't have to worry about it otherwise. I think the binary option of warning vs no warning can help to keep it simple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case I return to "It is better to name a method by what it does do rather than what it doesn't." So fail_continue and fail_continue_warn?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the methods can be renamed to

FileMapInfo::notify_failure(LogLevelType level, const char *msg, ...);
FileMapInfo::notify_failure(const char *msg, ...); // use warning channel

In most cases we use the warning channel, so that should be the default.

The naming of notify_failure() follows existing functions such as VM_GC_Operation::notify_gc_begin() to indicate that something has happened and some action will happen as a result.

In our case, it's reasonable to expect that some logging may happen as a result of the failure, and the VM may even exit.

By the way, the VM exits only if -Xshare:on is specified in the command-line. As specified in https://docs.oracle.com/en/java/javase/17/vm/class-data-sharing.html, -Xshare:on is indented for testing only and should not be used in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be in line with @dholmes-ora's suggestion, would we use the names notify_failure and notify_failure_warn ? That seems a bit redundant and ambiguous to me, so I propose notify_failure for the warning channel and log_failure for the info channel and other possible levels.

Copy link
Member

@iklam iklam Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, after thinking about this more, I think we should rewrite all the calls to FileMapInfo::fail_continue() with log_xxx() calls. This will bring the CDS logging code up-to-date and we can get rid of the convoluted fail_continue() function.

fail_continue() was ancient code from the days when Unified Logging wasn't available. It currently does three things:

  • (a) If PrintSharedArchiveAndExit or RequireSharedSpaces are enabled, always log the message so it's visible to the user
  • (b) If RequireSharedSpaces is enabled, exit the VM immediately
  • Otherwise, log the message to the cds channel.

As mentioned in JDK-8292269, we want to log some of the cds failure messages to the info channel, to avoid warning messages when the failure is expected (i.e., when -XX:-UseCompressedClassPointers is specified). My proposal is change all the fail_continue() calls to log_info(cds) or log_warning(cds)

The reason for (a) is: PrintSharedArchiveAndExit is for diagnosing problems with the CDS archive; RequireSharedSpaces, which is driven by the -Xshare:on command-line flag, which should be used only in the regression test suite (see here). In both these cases, we can simply force the cds logging level to info inside arguments.cpp (using LogConfiguration::configure_stdout()).

For (b), we don't have to exit immediately when the error is detected. We can exit the VM in MetaspaceShared::initialize_runtime_shared_and_meta_spaces() when result != MAP_ARCHIVE_SUCCESS.

Copy link
Member

@iklam iklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, apart from the comment.

Are any of the tests affected? Some tests may be sensitive to whether the log is from warning or info.

@@ -2591,7 +2553,7 @@ GrowableArray<const char*>* FileMapInfo::_non_existent_class_paths = nullptr;
// Open the shared archive file, read and validate the header
// information (version, boot classpath, etc.). If initialization
// fails, shared spaces are disabled and the file is closed. [See
// fail_continue.]
// log_warning(cds).]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [See ...] part of the comment should be removed.

@openjdk
Copy link

openjdk bot commented Feb 15, 2023

@matias9927 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8292269: Replace FileMapInfo::fail_continue() with Unified Logging

Reviewed-by: iklam, dholmes, ccheung

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 116 new commits pushed to the master branch:

  • ae29054: 8303175: (fs) Deprecate com.sun.nio.file.SensitivityWatchEventModifier for removal
  • a04b104: 8303413: (fs) Ignore polling interval sensitivity modifiers in PollingWatchService
  • 9944314: 8303587: Remove VMOutOfMemoryError001 test from the problem list after 8303198
  • a50dc67: 8303527: update for deprecated sprintf for jdk.hotspot.agent
  • 40c5edf: 8303181: Memory leak in ClassLoaderExt::setup_app_search_path
  • 29ee7c3: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java
  • e3016c1: 8303472: Display name for region TR
  • ae797c6: 8301117: Remove old_size param from ResizeableResourceHashtable::resize()
  • 5085bd5: 8297936: Use reachabilityFence to manage liveness in ClassUnload tests
  • 379f206: 8303198: System and Runtime.exit() resilience to logging errors
  • ... and 106 more: https://git.openjdk.org/jdk/compare/07e976ac26fe3ff6a94713013114dc38c95950b8...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@iklam, @dholmes-ora, @calvinccheung) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 15, 2023
@iklam
Copy link
Member

iklam commented Feb 15, 2023

The PR description and the JBS issue should also be updated to reflect the latest design.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second Ioi's request to update the JBS issue etc to reflect the conversion from fail_continue to use of log_warning.

Otherwise this seems fine. I will assume tests pass with these changes.

Thanks.

@@ -1009,7 +1009,7 @@ void MetaspaceShared::initialize_runtime_shared_and_meta_spaces() {
// The base archive cannot be mapped. We cannot dump the dynamic shared archive.
AutoCreateSharedArchive = false;
DynamicDumpSharedSpaces = false;
FileMapInfo::fail_continue("Unable to map shared spaces");
log_info(cds)("Unable to map shared spaces");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to exit the VM if RequireSharedSpaces is true (this replaces the logic in FileMapInfo::fail_continue). Otherwise, "java -Xshare:on" will continue execution even if the archive(s) fails to map.

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update looks good.

@openjdk
Copy link

openjdk bot commented Feb 17, 2023

@matias9927 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout cdsWarning_8292269
git fetch https://git.openjdk.org/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Feb 17, 2023
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Feb 18, 2023
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unapproving as this latest set of changes is more than just minor tweaks.

Not at all clear why these latest changes were necessary. The code seems messier now.

Why do the tests now explicitly need -Xshare:auto?

Comment on lines 2922 to 2924
if (UseSharedSpaces && RequireSharedSpaces) {
LogConfiguration::configure_stdout(LogLevel::Info, true, LOG_TAGS(cds));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?


if (static_mapinfo != nullptr) {
log_info(cds)("Core region alignment: " SIZE_FORMAT, static_mapinfo->core_region_alignment());
dynamic_mapinfo = open_dynamic_archive();

dynamic_mapinfo = open_dynamic_archive(has_failed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has_failed seems unused in relation to the open_dynamic_archive call.

@@ -974,7 +980,7 @@ FileMapInfo* MetaspaceShared::open_static_archive() {
return mapinfo;
}

FileMapInfo* MetaspaceShared::open_dynamic_archive() {
FileMapInfo* MetaspaceShared::open_dynamic_archive(bool &has_failed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the null return suffice for failure checking, possibly in combination with a flag check?

@iklam
Copy link
Member

iklam commented Feb 20, 2023

Why do the tests now explicitly need -Xshare:auto?

That was my suggestion to Matias, but I realized there's a better way to do it. My new suggestion is here:

Background: there are two kinds of CDS mapping errors: soft and hard.

  • You can have a soft error if a command-line VM option is incompatible with the CDS archive. For example, the default CDS archive is dumped with -XX:+CompactStrings, so it cannot be loaded if you run java -XX:-CompactStrings ...
  • You can have a hard error if there's something wrong with the archive. E.g., it's been corrupted and CRC check fails

Before this PR, both soft and hard errors are logged to stderr when -Xshare:on is specified.

In the PR, we have chosen to log the soft errors to the info channel. The reason is to avoid excessive warning messages.

Some of the CDS tests check for the soft errors. Before this PR, these tests would find the expected messages in stderr. After this PR, we need to explicitly add -Xlog:cds to the command-line (See the above commit).

@dholmes-ora
Copy link
Member

In the PR, we have chosen to log the soft errors to the info channel.

Okay but we now seem to have diverged somewhat from the topic of "Log more CDS failure messages in the warning channel". Maybe this should be returned to draft whilst these details are fleshed out.

@matias9927 matias9927 marked this pull request as draft February 20, 2023 17:41
@openjdk openjdk bot removed the rfr Pull request is ready for review label Feb 20, 2023
@iklam
Copy link
Member

iklam commented Feb 22, 2023

In the PR, we have chosen to log the soft errors to the info channel.

Okay but we now seem to have diverged somewhat from the topic of "Log more CDS failure messages in the warning channel". Maybe this should be returned to draft whilst these details are fleshed out.

I updated the JBS bug report to match the latest implementation.

@matias9927 matias9927 changed the title 8292269: Log more CDS failure messages in the warning channel 8292269: Replace FileMapInfo::fail_continue() with Unified Logging Feb 22, 2023
@matias9927 matias9927 marked this pull request as ready for review February 22, 2023 22:07
@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 22, 2023
@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 27, 2023
Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks.

A couple of minor queries on the tests.

One nit with the exit strategy - partly pre-existing - it is not clear why vm_exit_during_initialization is used in some places and fail_stop in others. It seems to me that fail_stop may no longer be required given the code doesn't use the tty anyway. But consistency would be nice.

Comment on lines 951 to +953
vm_exit_during_initialization("Unable to use shared archive.");
} else if (RequireSharedSpaces) {
FileMapInfo::fail_stop("Unable to map shared spaces");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks odd to use fail_stop here and vm_exit_during_initialization just before it. ??

@@ -253,7 +253,6 @@ private static void doTest(String baseArchiveName, String topArchiveName) throws
"-XX:+RecordDynamicDumpInfo",
"--patch-module",
"foo.bar=xxx",
"-Xshare:auto",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove this?

@@ -64,6 +64,7 @@ static void test() throws Exception {
"-XX:+WhiteBoxAPI",
use_whitebox_jar,
"-Xlog:cds=debug",
"-Xshare:auto",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this?

@matias9927
Copy link
Contributor Author

One test was failing due to an unexpected output. This PR moves the vm exit to a later stage for most cases, but in order to preserve the pre-change behavior, an early exit was added for a specific case.

Copy link
Member

@calvinccheung calvinccheung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good. Just one nit.

@matias9927
Copy link
Contributor Author

Thank you for the reviews and feedback @calvinccheung , @dholmes-ora , and @iklam !
/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@matias9927
Your change (at version 8d7c1d4) is now ready to be sponsored by a Committer.

@calvinccheung
Copy link
Member

/sponsor

@openjdk
Copy link

openjdk bot commented Mar 3, 2023

Going to push as commit cd4b88d.
Since your change was applied there have been 116 commits pushed to the master branch:

  • ae29054: 8303175: (fs) Deprecate com.sun.nio.file.SensitivityWatchEventModifier for removal
  • a04b104: 8303413: (fs) Ignore polling interval sensitivity modifiers in PollingWatchService
  • 9944314: 8303587: Remove VMOutOfMemoryError001 test from the problem list after 8303198
  • a50dc67: 8303527: update for deprecated sprintf for jdk.hotspot.agent
  • 40c5edf: 8303181: Memory leak in ClassLoaderExt::setup_app_search_path
  • 29ee7c3: 8303523: Cleanup problem listing of nsk/jvmti/AttachOnDemand/attach002a/TestDescription.java
  • e3016c1: 8303472: Display name for region TR
  • ae797c6: 8301117: Remove old_size param from ResizeableResourceHashtable::resize()
  • 5085bd5: 8297936: Use reachabilityFence to manage liveness in ClassUnload tests
  • 379f206: 8303198: System and Runtime.exit() resilience to logging errors
  • ... and 106 more: https://git.openjdk.org/jdk/compare/07e976ac26fe3ff6a94713013114dc38c95950b8...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 3, 2023
@openjdk openjdk bot closed this Mar 3, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 3, 2023
@openjdk
Copy link

openjdk bot commented Mar 3, 2023

@calvinccheung @matias9927 Pushed as commit cd4b88d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants