8286030: Avoid JVM crash when containers share the same /tmp dir #9406

Conversation

iklam
Member

@iklam iklam commented Jul 7, 2022

Some Kubernetes setups share the /tmp directory across multiple containers. On rare occasions, the JVM may crash while writing to /tmp/hsperfdata_<user>/<pid> if a process in a separate container writes to the same file (the two processes happen to have the same namespaced pid).

This patch avoids the crash by using flock() to allow only one of these processes to write to the file. All other competing processes that fail to grab the lock will give up the file and run with PerfMemory disabled. We will try to enable PerfMemory for the failed processes in a follow-up RFE: JDK-8289883
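
As a rough illustration of the approach described above, the following sketch takes a non-blocking exclusive flock() right after opening the file and gives up if another process already holds the lock. It is not the code from the patch: the helper name `try_open_exclusive` and the file mode are assumptions made for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper (not from the patch): open `path`, creating it if
// needed, and try to take an exclusive, non-blocking flock() on it.
// Returns the locked fd on success. Returns -1 if the file cannot be opened
// or if the lock is already held through another open file description; in
// the latter case errno is EWOULDBLOCK.
int try_open_exclusive(const char* path) {
  assert(path != nullptr);
  int fd = ::open(path, O_RDWR | O_CREAT, 0600);
  if (fd == -1) {
    return -1;
  }
  if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {
    int saved = errno;  // close() may overwrite errno
    ::close(fd);
    errno = saved;
    return -1;          // someone else owns the file; do not write to it
  }
  return fd;            // the lock stays held until this fd is closed
}
```

A caller that gets -1 with errno == EWOULDBLOCK would, in the scheme described here, carry on with PerfMemory disabled rather than write to a file that another process owns.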

Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of using flock().

I kept the use of kill() for stale file detection to be compatible with older JVMs.
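
For readers unfamiliar with the kill()-based check mentioned above, a minimal sketch follows. The helper name `looks_stale` is hypothetical, and the actual HotSpot code may differ in detail. Note that a pid belonging to a process in a different pid namespace also looks stale to this check, which is why the check alone is not enough when /tmp is shared.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not from the patch): returns true if no process with
// this pid is visible in the caller's pid namespace.
bool looks_stale(pid_t pid) {
  assert(pid > 0);
  // Signal 0 performs only the existence and permission checks; nothing is
  // actually delivered to the target process.
  if (::kill(pid, 0) == 0) {
    return false;         // the process exists and we may signal it
  }
  // EPERM means the process exists but belongs to someone else.
  // ESRCH means there is no such process in this namespace.
  return errno == ESRCH;
}
```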

I also took the opportunity to clean up the comments and remove dead code. The old code was using "shared memory resources" which sounds unclear and odd. I changed the terminology to say "shared memory file" instead.

(Note: this is a less ambitious revision of an earlier, withdrawn PR, #9226)


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8286030: Avoid JVM crash when containers share the same /tmp dir

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9406/head:pull/9406
$ git checkout pull/9406

Update a local copy of the PR:
$ git checkout pull/9406
$ git pull https://git.openjdk.org/jdk pull/9406/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 9406

View PR using the GUI difftool:
$ git pr show -t 9406

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9406.diff

@bridgekeeper

bridgekeeper bot commented Jul 7, 2022

👋 Welcome back iklam! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 7, 2022

@iklam The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Jul 7, 2022
@iklam
Member Author

iklam commented Jul 7, 2022

/label add serviceability

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Jul 7, 2022
@openjdk

openjdk bot commented Jul 7, 2022

@iklam
The serviceability label was successfully added.

@iklam iklam marked this pull request as ready for review July 7, 2022 06:39
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 7, 2022
@mlbridge

mlbridge bot commented Jul 7, 2022

Member

@tstuefe tstuefe left a comment

Hi Ioi,

not a full review, just a time-limited glance. Will take a closer look later.

Cheers, Thomas

src/hotspot/os/posix/perfMemory_posix.cpp Outdated Show resolved Hide resolved
Member

@tstuefe tstuefe left a comment

Just nits and questions. I like the test (did not know we have Docker support for tests in jtreg).


-   pid_t pid = filename_to_pid(entry->d_name);
+   const char* filename = entry->d_name;
+   pid_t pid = filename_to_pid(filename);
Member

Pre-existing. An error value of -1 would be somewhat cleaner since strictly speaking pid 0 is a valid PID. (Feel free to ignore this comment)

Member Author

I think I'll leave it for now to minimize the behavior changes of this PR. I read somewhere that pid 0 is the scheduler

https://unix.stackexchange.com/questions/83322/which-process-has-pid-0

So no actual JVM process will create a stale file with name "0". If such a file exists for some other reason we would remove it, which would be consistent with the new comment "any other files found in this directory may be removed".

if (fd == OS_ERR) {
// Something wrong happened. Ignore the error and don't try to remove the
// file.
errno = 0;
Member

Can we have a debug or trace log line here?

Member Author

Fixed

unlink(filename);
}

#if defined(LINUX)
Member

Indentation

Member Author

Fixed

#if defined(LINUX)
// Hold the lock until here to prevent other JVMs from using this file
// while we are in the middle of deleting it.
::close(fd);
Member

I don't understand this comment, it seems to contradict what you are doing. We are closing the only fd referring to this lock file, right? So the lock should get unlocked here too? If we want to keep the lock open, shouldn't we avoid closing the fd?

Member Author

I want to prevent the following race condition. Let's assume this process has PID 10 and there's another process (in a different pid namespace) with PID 20. Both processes see a file named "20".

  1. No one holds a lock on this file.
  2. Process 20 successfully locks the file in cleanup_sharedmem_files().
  3. Process 20 gives up the lock.
  4. Process 20 decides it can delete the file (PID 20 matches its own PID).
  5. This process successfully locks the file in cleanup_sharedmem_files().
  6. This process gives up the lock.
  7. This process decides it can delete the file (PID 20 does not exist in my pid namespace)
  8. Process 20 deletes the file. Creates a new version of this file. Successfully locks the new file.
  9. This process deletes the new version of this file (by name).

By holding the lock between steps 4 and 8, we can guarantee that if a process can successfully lock the file in create_sharedmem_file(), this file will never be unintentionally deleted.

I changed the comment slightly:

    // Hold the lock until here to prevent other JVMs from using this file
-   // while we are in the middle of deleting it.
+   // while we were in the middle of deleting it.
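
The ordering argued for in the race description above can be sketched as follows: take the lock, unlink the file by name while still holding the lock, and only then close the descriptor, which is what releases the lock. This is an illustration only; `remove_if_unlocked` is a hypothetical helper, not a function from the patch, and the staleness check that decides whether the file should be removed at all is left out.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical helper (not from the patch): remove `path` only if we can
// lock it, and keep the lock until after unlink() has completed.
// Returns true if the file was removed.
bool remove_if_unlocked(const char* path) {
  assert(path != nullptr);
  int fd = ::open(path, O_RDONLY);
  if (fd == -1) {
    return false;           // cannot open; leave the file alone
  }
  if (::flock(fd, LOCK_EX | LOCK_NB) != 0) {
    ::close(fd);            // another process is using the file
    return false;
  }
  bool removed = (::unlink(path) == 0);
  // Close only now: releasing the lock before unlink() would reopen the
  // window in which another process locks the file and then loses it.
  ::close(fd);
  return removed;
}
```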

Member

Makes sense, thanks for explaining, and the past tense in the comment helps.

log_warning(perf, memops)("Cannot use file %s/%s because %s", dirname, filename,
(errno == EWOULDBLOCK) ?
"it is locked by another process" :
"flock() failed");
Member

strerror would be helpful here, or at least errno

Member Author

I added just the errno. Example:

[0.003s][warning][perf,memops] Cannot use file /tmp/hsperfdata_root/1 because it is locked by another process (errno = 11)

If we print the os::strerror() it would look like this:

... because it is locked by another process (errno = 11, Operation would block) 

which seems too verbose and could be confusing.
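
A sketch of how a message in this shape could be assembled is shown below. It does not use HotSpot's logging macros; `format_lock_warning` is a hypothetical helper, and the format string is reconstructed from the example output above rather than copied from the patch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>

// Hypothetical helper (not from the patch): build the warning text for a
// failed flock(), appending the raw errno value but not its strerror() text.
std::string format_lock_warning(const char* dirname, const char* filename, int err) {
  assert(dirname != nullptr && filename != nullptr);
  const char* reason = (err == EWOULDBLOCK)
      ? "it is locked by another process"
      : "flock() failed";
  char buf[512];
  std::snprintf(buf, sizeof(buf), "Cannot use file %s/%s because %s (errno = %d)",
                dirname, filename, reason, err);
  return std::string(buf);
}
```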

Member

@tstuefe tstuefe left a comment

Looks good!

@openjdk

openjdk bot commented Jul 12, 2022

@iklam This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8286030: Avoid JVM crash when containers share the same /tmp dir

Reviewed-by: stuefe, sgehwolf

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 12, 2022
@iklam
Member Author

iklam commented Jul 14, 2022

@jerboaa could you take a look? Thanks.

if (file1.equals(file2)) {
// This should be the common case -- the first started process in a container should
// have pid==1.
// One of the two contains must fail to create the hsperf file.
Contributor

s/contains/containers/

Contributor

@jerboaa jerboaa left a comment

LGTM. My manual tests of this work as expected as well.

$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Successfully opened
[0.001s][debug][perf,memops] PerfMemory created: address = 0x00007fac290dd000, size = 32768
Hello!
$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][debug][perf,memops] flock for stale file check failed for /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][warning][perf,memops] Cannot use file /tmp/hsperfdata_sgehwolf/1 because it is locked by another process (errno = 11)
[0.001s][debug  ][perf,memops] PerfMemory created: address = 0x00007fc60bc79000, size = 32768
Hello!

@iklam
Member Author

iklam commented Jul 18, 2022

Thanks to @tstuefe and @jerboaa for the review.
/integrate

@openjdk

openjdk bot commented Jul 18, 2022

Going to push as commit 84f2314.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jul 18, 2022
@openjdk openjdk bot closed this Jul 18, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 18, 2022
@openjdk

openjdk bot commented Jul 18, 2022

@iklam Pushed as commit 84f2314.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org