Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114) #21289

Closed
wants to merge 5 commits into from

Conversation

jerboaa
Copy link
Contributor

@jerboaa jerboaa commented Oct 1, 2024

The change of JDK-8327114 also increased test coverage. In particular, the TestJcmdWithSideCar.java test got enhanced to cover these cases (prior to JDK-8327114 only case 1 was tested):

  1. Shared volumes between attachee and attacher and shared pid namespace
  2. Shared volumes between attachee and attacher and shared pid namespace, both running with elevated privileges
  3. Shared pid namespace between attachee and attacher only
  4. Shared pid namespace between attachee and attacher, both running with elevated privileges

The OpenJDK attach code is able to handle cases 1 through 3 which pass, but the last case, 4, hasn't been implemented yet when running as regular user and directing the container runtime to map the container user to that user as well. Thus, the test fails. For now I propose to disable the 4th test case. It can get re-enabled once the product code got updated to account for this case (tracked in https://bugs.openjdk.org/browse/JDK-8341349).

Thoughts? Could somebody please run this through Oracle's test system in order to see if this fixes the issue? Thank you!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114) (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21289/head:pull/21289
$ git checkout pull/21289

Update a local copy of the PR:
$ git checkout pull/21289
$ git pull https://git.openjdk.org/jdk.git pull/21289/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21289

View PR using the GUI difftool:
$ git pr show -t 21289

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21289.diff

Webrev

Link to Webrev Comment

Before:

Passed: containers/docker/TestJcmdWithSideCar.java
  build: 2.08 seconds
  compile: 2.068 seconds
  build: 4.842 seconds
  compile: 4.841 seconds
  driver: 70.776 seconds
Test results: passed: 1

After:

Passed: containers/docker/TestJcmdWithSideCar.java
  build: 2.169 seconds
  compile: 2.157 seconds
  build: 4.964 seconds
  compile: 4.963 seconds
  driver: 22.928 seconds
Test results: passed: 1
@bridgekeeper
Copy link

bridgekeeper bot commented Oct 1, 2024

👋 Welcome back sgehwolf! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 1, 2024

@jerboaa This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114)

Reviewed-by: kevinw

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 39 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 1, 2024
@openjdk
Copy link

openjdk bot commented Oct 1, 2024

@jerboaa The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 1, 2024
@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 1, 2024

/label add serviceability

@openjdk openjdk bot added the serviceability serviceability-dev@openjdk.org label Oct 1, 2024
@openjdk
Copy link

openjdk bot commented Oct 1, 2024

@jerboaa
The serviceability label was successfully added.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 1, 2024

PTAL @slovdahl Thanks!

@mlbridge
Copy link

mlbridge bot commented Oct 1, 2024

Webrevs

@kevinjwalls
Copy link
Contributor

Thanks for clarifying the different testing combinations and the pointer to the missing one for follow up in JDK-8341349.
I can check our testing with this change...

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 1, 2024

Thanks, Kevin.

@slovdahl
Copy link
Contributor

slovdahl commented Oct 2, 2024

PTAL @slovdahl Thanks!

@jerboaa Thanks a lot for the thorough investigation here!

I was also able to reproduce the failure locally on my Ubuntu 24.04 using podman (that runs containers as a regular user):

sudo apt install podman
make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=podman"

...

STDERR:
java.lang.RuntimeException: ACCESS_TMP_VIA_PROC_ROOT: Could not find specified process
	at TestJcmdWithSideCar.testCase01(TestJcmdWithSideCar.java:138)
	at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:111)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1576)

...

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
>>                                                       1     0     1     0 <<
==============================

Whereas with docker (that runs containers as root by default) the test passes:

make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker"

==============================
Test summary
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR   
   jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
                                                         1     1     0     0   
==============================

With podman and the changes in this PR, the test works again.

While at it, I've discovered that the EventGeneratorLoop running container always runs for a fixed time: 30 seconds. That's irrespective of the attacher containers being done. Therefore, two iterations of the loop spawning this container running the fixed set of time will at least run 60 seconds.

Thanks for trying to improve this too, it was quite annoying to have to wait so long for each test run.. 😄

However, when I'm running the test with docker with the changes from this PR (make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker") I get a new failure: docker-failure.log. If I revert the test runtime optimization (and keep the disabling part) the test works with docker again.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 2, 2024

However, when I'm running the test with docker with the changes from this PR (make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker") I get a new failure: docker-failure.log. If I revert the test runtime optimization (and keep the disabling part) the test works with docker again.

Thanks. Fails with:

[main-container-process] docker: Error response from daemon: Conflict. The container name "/test-container-main" is already in use by container "258e8ebbbac586ffcc6a7703bcb977bd7b4c203f4badb2ab08e1651ee546c1d6". You have to remove (or rename) that container to be able to reuse that name.
[main-container-process] See 'docker run --help'.
java.lang.RuntimeException: Timed out while waiting for main() to start
	at TestJcmdWithSideCar$MainContainer.waitForMainMethodStart(TestJcmdWithSideCar.java:285)
	at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:110)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1576)

We'll have to do docker rm -i /test-container-main before we start a new one. I'll fix that later today.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 2, 2024

We'll have to do docker rm -i /test-container-main before we start a new one. I'll fix that later today.

Updated in ef7abf2 If that doesn't work, I'll remove the test run time optimization.

@slovdahl
Copy link
Contributor

slovdahl commented Oct 2, 2024

Right, this turned into a rabbit hole of its own.. 😁

make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker" is still failing:

Full child process STDOUT was saved to docker-stdout-139602.log
[2024-10-02T17:02:06.397293068Z] Waiting for completion for process 139602
[2024-10-02T17:02:06.397362093Z] Waiting for completion finished for process 139602
[COMMAND]
docker ps 
[2024-10-02T17:02:07.771881444Z] Gathering output for process 139723
[ELAPSED: 19 ms]
[STDERR]

[STDOUT]
CONTAINER ID   IMAGE                                                              COMMAND                  CREATED         STATUS         PORTS     NAMES
1db392c08e7c   jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd   "/jdk/bin/java -XX:+…"   5 seconds ago   Up 5 seconds             test-container-main

Full child process STDOUT was saved to docker-stdout-139723.log
[2024-10-02T17:02:07.790740367Z] Waiting for completion for process 139723
[2024-10-02T17:02:07.790915466Z] Waiting for completion finished for process 139723
[COMMAND]
docker rm test-container-main 
[2024-10-02T17:02:07.792750591Z] Gathering output for process 139734
[ELAPSED: 15 ms]
[STDERR]
Error response from daemon: You cannot remove a running container 1db392c08e7cbac3a256597c2ee693fdbbeab3a814cb229c83ac38c0c805f42d. Stop the container before attempting removal or force remove

I think the underlying problem is that killing the process from the Java test is not properly propagated to the EventGeneratorLoop process running as a Docker container. Among others, I found https://www.kaggle.com/code/residentmario/best-practices-for-propagating-signals-on-docker talking about it. So, in the Docker case the container continues running for the whole 30 seconds even though we called p.destroy(). docker rm for the next test case will fail because the container is still running. So, I started out by trying this:

diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception {
             OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");

..only to be met by:

[main-container-process] docker: Error response from daemon: Conflict. The container name "/test-container-main" is already in use by container "f2754e97e6eecd6b0538e8a8c3d7c12214d16f9f32a9e2fe313c1d9f3258732c". You have to remove (or rename) that container to be able to reuse that name.
[main-container-process] See 'docker run --help'.
java.lang.RuntimeException: Timed out while waiting for main() to start
	at TestJcmdWithSideCar$MainContainer.waitForMainMethodStart(TestJcmdWithSideCar.java:293)
	at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:110)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
	at java.base/java.lang.reflect.Method.invoke(Method.java:573)
	at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
	at java.base/java.lang.Thread.run(Thread.java:1576)

Because we start containers with --rm, they seem to be asynchronously removed after stop has finished. Finally, if we wait a little bit after the stop command, make test TEST="jtreg:test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java" JTREG="JAVA_OPTIONS=-Djdk.test.container.command=docker" works again.

diff --git test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
index 5132f14d788..4726284cff1 100644
--- test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
+++ test/hotspot/jtreg/containers/docker/TestJcmdWithSideCar.java
@@ -251,8 +251,9 @@ public Process start(final boolean elevated) throws Exception {
             OutputAnalyzer out = DockerTestUtils.execute(Container.ENGINE_COMMAND, "ps")
                 .shouldHaveExitValue(0);
             if (out.contains(MAIN_CONTAINER_NAME)) {
-                DockerTestUtils.execute(Container.ENGINE_COMMAND, "rm", MAIN_CONTAINER_NAME)
+                DockerTestUtils.execute(Container.ENGINE_COMMAND, "stop", "--time=1", "--signal=SIGKILL", MAIN_CONTAINER_NAME)
                    .shouldHaveExitValue(0);
+                Thread.sleep(1_000L);
             }
             // start "main" container (the observee)
             DockerRunOptions opts = commonDockerOpts("EventGeneratorLoop");

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 2, 2024

Right, this turned into a rabbit hole of its own.. 😁

Thanks. I'll remove this part from the patch as it's orthogonal to the actual issue and would just like to get the test passing again.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 2, 2024

Right, this turned into a rabbit hole of its own.. 😁

Thanks. I'll remove this part from the patch as it's orthogonal to the actual issue and would just like to get the test passing again.

Filed https://bugs.openjdk.org/browse/JDK-8341436 to track this separate issue.

Copy link
Contributor

@slovdahl slovdahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@slovdahl
Copy link
Contributor

slovdahl commented Oct 2, 2024

Filed https://bugs.openjdk.org/browse/JDK-8341436 to track this separate issue.

Do you intend to look into this yourself? If not, I would be happy to pick it up.

@slovdahl
Copy link
Contributor

slovdahl commented Oct 2, 2024

It can get re-enabled once the product code got updated to account for this case (tracked in https://bugs.openjdk.org/browse/JDK-8341349).

I spent some time thinking about this, and I'm not sure if it can be solved?

The test case that fails with Podman is ACCESS_TMP_VIA_PROC_ROOT. That is, we try to attach to another JVM by accessing the target JVM's root filesystem through /proc/[pid]/root. But for processes with elevated privileges /proc/[pid]/root can only be read by root. That is why it works with the default setup of Docker but not Podman. Or am I missing something?

@slovdahl
Copy link
Contributor

slovdahl commented Oct 3, 2024

Filed https://bugs.openjdk.org/browse/JDK-8341436 to track this separate issue.

Do you intend to look into this yourself? If not, I would be happy to pick it up.

It finally occurred to me why p.destroy() doesn't seem to have any effect: the process the Java test sees is just a short-lived one that calls the Docker daemon. That is, it is not a reference to the container process itself and it cannot stop the container. That needs to go through a docker stop call.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 3, 2024

Filed https://bugs.openjdk.org/browse/JDK-8341436 to track this separate issue.

Do you intend to look into this yourself? If not, I would be happy to pick it up.

Please go ahead and have a go at this.

@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 3, 2024

I can check our testing with this change...

@kevinjwalls Any update on this? Thanks!

@kevinjwalls
Copy link
Contributor

Good timing, I was just writing:

Thanks, looks good. Good to delay the additional changes.
Would be great if you can change the bug and PR title to something like:
8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114)
(Multiple "test fails after..." bugs are confusing to me at least!)

With the test as it stands in the repo currently, I am seeing another failure. I don't get this myself with the change in this PR, but that may just be luck.

It's on Linux x64, with TMP_MOUNTED_INTO_SIDECAR, where the command:

docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --cap-add=NET_BIND_SERVICE --user=10668:10668 --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l

...gets no output, where a good run would show:

[STDOUT]
1 EventGeneratorLoop 120
24 jdk.jcmd/sun.tools.jcmd.JCmd -l

e.g.

[main-container-process] MAIN_METHOD_STARTED, argument is 120
Attach strategy TMP_MOUNTED_INTO_SIDECAR
[COMMAND]
docker run --tty=true --rm --cap-add=SYS_PTRACE --sig-proxy=true --pid=container:test-container-main --cap-add=NET_BIND_SERVICE --user=10668:10668 --volumes-from test-container-main jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd /jdk/bin/jcmd -l
[2024-10-03T04:30:35.273416534Z] Gathering output for process 12125
[ELAPSED: 1068 ms]
[STDERR]

[STDOUT]

Full child process STDOUT was saved to docker-stdout-12125.log
[2024-10-03T04:30:36.341378706Z] Waiting for completion for process 12125
[2024-10-03T04:30:36.341534140Z] Waiting for completion finished for process 12125
[COMMAND]
docker rmi --force jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd
[2024-10-03T04:30:36.349399928Z] Gathering output for process 12260
[ELAPSED: 27 ms]
[STDERR]

[STDOUT]
Untagged: jdk-internal:test-containers-docker-TestJcmdWithSideCar-jfr-jcmd

Full child process STDOUT was saved to docker-stdout-12260.log
----------System.err:(16/748)----------
stdout: [];
stderr: []
exitValue = 0

java.lang.RuntimeException: 'sun.tools.jcmd.JCmd' missing from stdout/stderr
at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:253)
at TestJcmdWithSideCar.testCase01(TestJcmdWithSideCar.java:135)
at TestJcmdWithSideCar.main(TestJcmdWithSideCar.java:111)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
at java.base/java.lang.reflect.Method.invoke(Method.java:573)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333)
at java.base/java.lang.Thread.run(Thread.java:1576)

@jerboaa jerboaa changed the title 8341310: Test containers/docker/TestJcmdWithSideCar.java fails after JDK-8327114 8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114) Oct 3, 2024
@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 3, 2024

@kevinjwalls

Thanks, looks good. Good to delay the additional changes. Would be great if you can change the bug and PR title to something like: 8341310: Test TestJcmdWithSideCar.java should skip ACCESS_TMP_VIA_PROC_ROOT (after JDK-8327114) (Multiple "test fails after..." bugs are confusing to me at least!)

OK. Done.

With the test as it stands in the repo currently, I am seeing another failure. I don't get this myself with the change in this PR, but that may just be luck.

I'm confused. Are you saying that after this patch in revision 24a4303 there are still failures in Oracle's CI?

@kevinjwalls
Copy link
Contributor

Sorry, was saying that the other failure type, ("'sun.tools.jcmd.JCmd' missing") is happening now in CI. I have not reproduced it with the change in this PR.

But, as that "other" failure is happening with "sidecar", skipping ACCESS_TMP_VIA_PROC_ROOT should not fix it? So maybe there is e.g. a chance/timing problem.

Either way, this change as it stands is still good, I think we should get it in to minimise the problems. I can monitor and see if the "JCmd missing" problem happens again.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 3, 2024
@jerboaa
Copy link
Contributor Author

jerboaa commented Oct 3, 2024

Thanks!

/integrate

@openjdk
Copy link

openjdk bot commented Oct 3, 2024

Going to push as commit 21f8ccf.
Since your change was applied there have been 39 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 3, 2024
@openjdk openjdk bot closed this Oct 3, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 3, 2024
@openjdk
Copy link

openjdk bot commented Oct 3, 2024

@jerboaa Pushed as commit 21f8ccf.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@kevinjwalls
Copy link
Contributor

This change is working, having the intended effect. 8-)
I logged https://bugs.openjdk.org/browse/JDK-8341518 to follow up the separate intermittent failure we are seeing:
java.lang.RuntimeException: 'sun.tools.jcmd.JCmd' missing from stdout/stderr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
Development

Successfully merging this pull request may close these issues.

3 participants