-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8286030: Avoid JVM crash when containers share the same /tmp dir #9406
8286030: Avoid JVM crash when containers share the same /tmp dir #9406
Conversation
…he shared mem file
👋 Welcome back iklam! A progress list of the required criteria for merging this PR into |
/label add serviceability |
@iklam |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ioi,
not a full review, just a time-limited glance. Will take a closer look later.
Cheers, Thomas
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just nits and questions. I like the test (did not know we have Docker support for tests in jtreg).
|
||
pid_t pid = filename_to_pid(entry->d_name); | ||
const char* filename = entry->d_name; | ||
pid_t pid = filename_to_pid(filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-existing. An error value of -1 would be somewhat cleaner since strictly speaking pid 0 is a valid PID. (Feel free to ignore this comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll leave it for now to minimize the behavior changes of this PR. I read somewhere that pid 0 is the scheduler
https://unix.stackexchange.com/questions/83322/which-process-has-pid-0
So no actual JVM process will create a stale file with name "0". If such a file exists for some other reason we would remove it, which be consistent with the new comment "any other files found in this directory may be removed".
if (fd == OS_ERR) { | ||
// Something wrong happened. Ignore the error and don't try to remove the | ||
// file. | ||
errno = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a debug or trace log line here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
unlink(filename); | ||
} | ||
|
||
#if defined(LINUX) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
#if defined(LINUX) | ||
// Hold the lock until here to prevent other JVMs from using this file | ||
// while we are in the middle of deleting it. | ||
::close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this comment, it seems to contradict what you are doing. We are closing the only fd referring to this lock file, right? So the lock should get unlocked here too? If we want to keep the lock open, shouldn't we avoid closing the fd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to prevent the following race condition. Let's assume this process has PID 10 and there's another process (in a different pid namespace) with PID 20. Both process see a file named "20".
- No one holds a lock on this file.
- Process 20 successfully locks the file in cleanup_sharedmem_files().
- Process 20 gives up the lock.
- Process 20 decides it can delete the file (PID 20 matches its own PID).
- This process successfully locks the file in cleanup_sharedmem_files().
- This process gives up the lock
- This process decides it can delete the file (PID 20 does not exist in my pid namespace)
- Process 20 deletes the file. Creates a new version of this file. Successfully locks the new file.
- This process deletes the new version of this file (by name).
By holding the lock between steps 4 and 8, we can guaranteed that if a process can successfully lock the file in create_sharedmem_file(), this file will never be unintentionally deleted.
I changed the comment slightly:
// Hold the lock until here to prevent other JVMs from using this file
- // while we are in the middle of deleting it.
+ // while we were in the middle of deleting it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for explaining, and the past tense in the comment helps.
log_warning(perf, memops)("Cannot use file %s/%s because %s", dirname, filename, | ||
(errno == EWOULDBLOCK) ? | ||
"it is locked by another process" : | ||
"flock() failed"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strerror would be helpful here, or at least errno
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added just the errno. Example:
[0.003s][warning][perf,memops] Cannot use file /tmp/hsperfdata_root/1
because it is locked by another process (errno = 11)
If we print the os::strerror()
it would look like this:
... because it is locked by another process (errno = 11, Operation would block)
which seems too verbose and could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
#if defined(LINUX) | ||
// Hold the lock until here to prevent other JVMs from using this file | ||
// while we are in the middle of deleting it. | ||
::close(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks for explaining, and the past tense in the comment helps.
@iklam This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
@jerboaa could you take a look? Thanks. |
if (file1.equals(file2)) { | ||
// This should be the common case -- the first started process in a container should | ||
// have pid==1. | ||
// One of the two contains must fail to create the hsperf file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/contains/containers/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. My manual tests of this work as expected as well.
$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Successfully opened
[0.001s][debug][perf,memops] PerfMemory created: address = 0x00007fac290dd000, size = 32768
Hello!
$ podman run --rm -ti --userns=keep-id -u $(id -u) -v $(pwd)/shared-tmp:/tmp:z -v /disk/openjdk/upstream-sources/git/jdk-jdk/build/linux-x86_64-server-release/images/jdk:/opt/jdk:z -v $(pwd)/test:/opt/test:z fedora:36 /opt/jdk/bin/java -Xlog:perf+memops=debug -cp /opt/test HelloWait
[0.001s][debug][perf,memops] PerfDataMemorySize = 32768, os::vm_allocation_granularity = 4096, adjusted size = 32768
[0.001s][debug][perf,memops] flock for stale file check failed for /tmp/hsperfdata_sgehwolf/1
[0.001s][info ][perf,memops] Trying to open /tmp/hsperfdata_sgehwolf/1
[0.001s][warning][perf,memops] Cannot use file /tmp/hsperfdata_sgehwolf/1 because it is locked by another process (errno = 11)
[0.001s][debug ][perf,memops] PerfMemory created: address = 0x00007fc60bc79000, size = 32768
Hello!
…avoid-jvm-crash-when-containers-share-tmp-dir
Going to push as commit 84f2314. |
Some Kubernetes setups share the /tmp directory across multiple containers. On rare occasions, the JVM may crash when it tries to write to
/tmp/hsperfdata_<user>/<pid>
when a process in a separate container decides to do the same thing (because they happen to have the same namespaced pid).This patch avoids the crash by using
flock()
to allow only one of these processes to write to the file. All other competing processes that fail to grab the lock will give up the file and run with PerfMemory disabled. We will try to enable PerfMemory for the failed processes in a follow-up RFE: JDK-8289883Thanks to Vitaly Davidovich and Nico Williams for coming up with the idea of using
flock()
.I kept the use of
kill()
for stale file detection to be compatible with older JVMs.I also took the opportunity to clean up the comments and remove dead code. The old code was using "shared memory resources" which sounds unclear and odd. I changed the terminology to say "shared memory file" instead.
(Note: this is a less ambitious revision of an earlier, withdrawn PR, #9226)
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/9406/head:pull/9406
$ git checkout pull/9406
Update a local copy of the PR:
$ git checkout pull/9406
$ git pull https://git.openjdk.org/jdk pull/9406/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 9406
View PR using the GUI difftool:
$ git pr show -t 9406
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/9406.diff