Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GR-51479] Implement cgroup support in native code. #8989

Merged
merged 8 commits into from
Jul 6, 2024

Conversation

graalvmbot
Copy link
Collaborator

@graalvmbot graalvmbot commented May 29, 2024

On Linux, Hotspot contains two CGroups implementations: a native implementation written in C++, that is provides container support for low-level VM components such as the GC, and an internal Java API, that utilized higher-level functionality such as JFR.

Although, it might seem redundant to maintain two separate implementations at first glance, there are valid reasons for this approach. The Java API is more feature rich to serve the needs of Java-level monitoring tools. On the other hand, the C++ implementation only supports the bits needed by the VM, before the Java runtime startup is completed.

Up unit now, SubstrateVM used the Java-based CGroups implementation for its low-level container support. This poses the problem that during initialization of this API, we might already need to run the GC, which depends on information provided by the Java code that is currently initialized. To break this cycle, hard-coded (incorrect) values are used in the early initialization phase. See for example the changes in Target_jdk_internal_misc_VM.java.

Since working with incorrect values is problematic, SubstrateVM switches to an approach similar to Hotspot, i.e., having two CGroup implementations. A native one for low-level components and the Java API that comes with the JDK. To do so, we copy the C++ implementation from Hotspot into our code base and adopt it to our needs. The result is the libsvm_container library with a narrow native interface (substratevm/src/com.oracle.svm.native.libcontainer/src/hotspot/svm/svm_container.hpp). The library is self-contained and has no dependency on SubstrateVM, nor on the JDK. Thus, there is no need to keep it in sync with the JDK version we are running on. That said, we still plan to integrate upstream changes from time to time. See substratevm/src/com.oracle.svm.native.libcontainer/README.md for more details.

There is another advantage to moving SubstrateVM away from the Java implementation. The Java API uses java.util.regex for parsing certain CGroup information, which means that every native image would include the regex engine from the JDK, which might add up to 7MB to the total image size. To avoid this, the LabsJDK, which we distribute with GraalVM, contains an OpenJDK changes that removes the regex dependency See for example the CGroup related changes in the following diff:
graalvm/labs-openjdk@jdk-23+23...23+23-jvmci-b01
Note that the change is only an image size optimization. Native image happily runs with a stock JDK.

With the libsvm_container, there is no more need for this change, which is a requirement for switching to stock JDKs for development. See #8699 for more details.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label May 29, 2024
@zapster zapster self-assigned this May 29, 2024
@zapster
Copy link
Member

zapster commented May 29, 2024

@jerboaa @zakkak please have a look. If I'm informed correctly, you discussed this already with @christianhaeubl some time ago so there should be no big surprises. That said, please let me know about any concerns.

@jerboaa
Copy link
Collaborator

jerboaa commented May 29, 2024

@zapster Thanks for the ping. I'm sorry to say, but I have several concerns with this approach:

  • It adds ~20k lines of code which is to a large extent a fork of hotspot code which is bound to bit-rot. We've been there, but with the Java implementation of OpenJDK's cgroup code.
  • There is no clear indication of a test strategy. What ensures that this changed code works correctly and keeps working correctly? Ad-hoc testing doesn't seem to be sufficient. Certainly not what's mentioned in README.md of the new library.
  • In an earlier version of this, the GraalVM code had a fork of the Java classes for container support, but was dropped in favour of the OpenJDK implementation. This was a good move as it meant bug-for-bug equality for cgroups code between a Java version and a native-image compiled version. What was the reason to use the OpenJDK code then and why is that reason no longer valid?
  • The separate fork of the OpenJDK code was a source of bugs, since it didn't get updated as frequently as it should have been. Most of the time it was very hard to establish whether or not the same bug existed in native-image or not. Once one established that the bug existed, it would need the same fix (already done in OpenJDK). OpenJDK itself has - as you point out - two implementations. With this, there is again a third.
  • The imported code from hotspot is already out of date as of today. E.g. JDK-8302744 and related clean-ups that are going to land soon.
  • While it has been briefly mentioned to me by @christianhaeubl that this code ought to be implemented in native code, it wasn't discussed more broadly how that should be done. I'm a bit surprised to see this, I'm afraid. TBH, a clean-room implementation would likely be a better fit over an import of that size (including some of the dependencies that hotspot code has). How do you establish a resync? A bug in the os class in hotspot is found? A bug in the container detection code? Where do you draw the line for synchronization of code bases? This looks like it's going to be a frankenstein fork from day one.
  • Overall, the cost for maintaining this fork needs to be considered. The development and testing time needed for re-importing need to be considered and weighed thoroughly with the benefits that this brings. Has this been done and long-term maintenance considered? Speaking of benefits. It's not clear what those are? Is it really the size benefit of the RegEx subsystem? Any non-trivial application would drag in that dependency anyway. FWIW, we've shipped without that LabsJDK patch in Mandrel for many releases and nobody noticed any size increase in Quarkus. What exactly is the problem with the initialization trick that was implemented to break the cycle with the Java implementation? You mention "working with incorrect values is problematic". Could you explain the precise problem, please? Maybe that problem could be fixed without introducing 20k lines of forked OpenJDK code. I see the that this new code also needs careful initialization handling.
  • What does this mean for portability of generated native images? Does the library have a dependency of the libc version of the build host?

Perhaps there is a chance that you could reconsider?

@zapster
Copy link
Member

zapster commented Jun 3, 2024

@jerboaa thanks for the feedback and thanks for raising you concerns. Let me try to address them:

  • It adds ~20k lines of code which is to a large extent a fork of hotspot code which is bound to bit-rot. We've been there, but with the Java implementation of OpenJDK's cgroup code.

The difference to the Java implementation is that this one leaves the hotspot code mainly untouched. It only adds the #ifdef NATIVE_IMAGE guards and a few of the general functionality is custom. We implemented this with future upstream updates in mind. The process will be simple: copy over all of the hotspot code and review the diff, which is mainly reapplying the ifdefs and some "cosmetics" like copyright headers. Also, we plan to do regular updates, at least once per release.

  • There is no clear indication of a test strategy. What ensures that this changed code works correctly and keeps working correctly? Ad-hoc testing doesn't seem to be sufficient. Certainly not what's mentioned in README.md of the new library.

It has the same test coverage as the previous approach, so no regression in that sense. But I agree that testing can be improved. We plan to do so in follow-up changes. (We already have a list of things we want to improve after the first iteration lands.) One fun fact, most of our internal testing runs in containers (and I assume Github actions as well), so at least there is some confidence that this works reasonably well. 🙂

  • In an earlier version of this, the GraalVM code had a fork of the Java classes for container support, but was dropped in favour of the OpenJDK implementation. This was a good move as it meant bug-for-bug equality for cgroups code between a Java version and a native-image compiled version. What was the reason to use the OpenJDK code then and why is that reason no longer valid?

I assume you are referring to the state of 23.0. The Java classes where used for "low-level" C Group support, which on hotspot done via the native implementation. However, that has the same problem as our current (i.e. since 23.1) approach has: we cannot execute Java code at the time where we need this support.

  • The separate fork of the OpenJDK code was a source of bugs, since it didn't get updated as frequently as it should have been. Most of the time it was very hard to establish whether or not the same bug existed in native-image or not. Once one established that the bug existed, it would need the same fix (already done in OpenJDK).

The key point of this change is that we are as close to hotspot as we can. Hotspot has a native implementation and a Java implementation. We copy the native implementation with minor, reapplyable changes and use the Java implementation just as is. Also, we currently have a local JDK changes as well, which we want to get rid of because 1) it is another source of problems and maintenance overhead and 2) it is another differentiator between what "we" (as in GraalVM) run compared to what our collaborators do if they use their own JDK without our modification.

OpenJDK itself has - as you point out - two implementations. With this, there is again a third.

I'd rather say we are removing the need for our third one (the JDK change) and use the parts from hotspot as they are meant to be used: the native implementation for low-level support and the Java implementation for Java.

  • The imported code from hotspot is already out of date as of today. E.g. JDK-8302744 and related clean-ups that are going to land soon.

As I pointed out, we will keep the copied code updated. And we will resync soon once we get the initial version in.

  • While it has been briefly mentioned to me by @christianhaeubl that this code ought to be implemented in native code, it wasn't discussed more broadly how that should be done. I'm a bit surprised to see this, I'm afraid. TBH, a clean-room implementation would likely be a better fit over an import of that size (including some of the dependencies that hotspot code has). How do you establish a resync? A bug in the os class in hotspot is found? A bug in the container detection code? Where do you draw the line for synchronization of code bases? This looks like it's going to be a frankenstein fork from day one.

Regarding clean-room implementation: ideally, the implementation in hotspot would be hotspot independent and shipped as a library that native image can just use. But that probably hard or impossible to pull off. Having our own implementation seems to have all the problems you are concerned about with this approach plus that we need to figure out all problems ourselves. That is not realistic maintenance wise.

  • Overall, the cost for maintaining this fork needs to be considered. The development and testing time needed for re-importing need to be considered and weighed thoroughly with the benefits that this brings. Has this been done and long-term maintenance considered? Speaking of benefits. It's not clear what those are? Is it really the size benefit of the RegEx subsystem? Any non-trivial application would drag in that dependency anyway. FWIW, we've shipped without that LabsJDK patch in Mandrel for many releases and nobody noticed any size increase in Quarkus. What exactly is the problem with the initialization trick that was implemented to break the cycle with the Java implementation? You mention "working with incorrect values is problematic". Could you explain the precise problem, please? Maybe that problem could be fixed without introducing 20k lines of forked OpenJDK code. I see the that this new code also needs careful initialization handling.
  • re "long-term maintenance considered": yes, see above
  • size benefits: not every application need regex. while some use cases this will not be a problem, for others it is
  • initialization trick: it is just incorrect. e.g. the GC can work with wrong heap sizes and crash because it over-allocates beyond the container limits.
  • re "Maybe that problem could be fixed without introducing 20k lines of forked OpenJDK code.": the major point is, we cannot execute Java. We are open for all alternative suggestions.
  • re initialization handling: that is our top priority for the follow-up changes. We will change the initialization strategy to be more eagerly, but this requires careful benchmarking which we want to do only after we have the basic functionality up and running.
  • What does this mean for portability of generated native images? Does the library have a dependency of the libc version of the build host?

No changes. It is not different to other static libraries that are linked into native images, such as libchelper.

Perhaps there is a chance that you could reconsider?

As I tried to outline, the main problem we want to solve is that we cannot execute Java code when we first need C Groups support. To me it seems that there is no way around a native implementation. The native implementation in hotspot is well maintained and can be used by native image, with only "minor changes" (with that I mean mostly remove code that is not needed). It seem realistic that we can keep this up to date without too much maintenance overhead.

From your remarks I sense that your main concerns is keeping our copy updated. So in general, we have a lot of places that we need to keep in sync with the JDK, so we have experience with and means to do that. But I understand your concerns. Kind of as a proof-of-concept, I'll prepare a PR which resyncs the current hotspot code. (Don't expect anything the next two weeks as it is JDK branch-off time and holiday season. Don't read this as a sign that syncing is so complex. 😉 )

Regarding update frequency, we plan to update at least before every release, but of course we can import more often if need be. Another possibility is to add @BasedOnJDKFile annotations so that we at least track the source files and have all the information to detect changes.

@jerboaa
Copy link
Collaborator

jerboaa commented Jul 4, 2024

Regarding clean-room implementation: ideally, the implementation in hotspot would be hotspot independent and shipped as a library that native image can just use.

Have you considered using http://libcg.sourceforge.net?

@zapster
Copy link
Member

zapster commented Jul 5, 2024

Have you considered using http://libcg.sourceforge.net?

TBH, no. In general, external libraries need to be thoroughly reviewed and approved (compatible licenses, security reviews, etc.) and we need to track their vulnerabilities. Worst case, if the project gets unmaintained we need to keep maintaining it ourselves. That seems overly risky. Compared to that, the Hotspot code is under our control, maintained by JPG and known to have exactly the behavior that we want.

@zapster
Copy link
Member

zapster commented Jul 5, 2024

Kind of as a proof-of-concept, I'll prepare a PR which resyncs the current hotspot code. (Don't expect anything the next two weeks as it is JDK branch-off time and holiday season. Don't read this as a sign that syncing is so complex. 😉 )

Here we go #9250 (still work in progress).

@jerboaa
Copy link
Collaborator

jerboaa commented Jul 5, 2024

@jerboaa thanks for the feedback and thanks for raising you concerns. Let me try to address them:

  • It adds ~20k lines of code which is to a large extent a fork of hotspot code which is bound to bit-rot. We've been there, but with the Java implementation of OpenJDK's cgroup code.

The difference to the Java implementation is that this one leaves the hotspot code mainly untouched. It only adds the #ifdef NATIVE_IMAGE guards and a few of the general functionality is custom. We implemented this with future upstream updates in mind. The process will be simple: copy over all of the hotspot code and review the diff, which is mainly reapplying the ifdefs and some "cosmetics" like copyright headers. Also, we plan to do regular updates, at least once per release.

Sure, but since those #ifdefs and some other slight modifications are there, it's no longer the code that's being tested with the hotspot container tests and gtests. I think we have to agree to disagree that this is essentially a 3'rd implementation.

  • There is no clear indication of a test strategy. What ensures that this changed code works correctly and keeps working correctly? Ad-hoc testing doesn't seem to be sufficient. Certainly not what's mentioned in README.md of the new library.

It has the same test coverage as the previous approach, so no regression in that sense. But I agree that testing can be improved. We plan to do so in follow-up changes. (We already have a list of things we want to improve after the first iteration lands.) One fun fact, most of our internal testing runs in containers (and I assume Github actions as well), so at least there is some confidence that this works reasonably well. 🙂

I'm not sure it does. Using the JDK code verbatim at least has some guarantee that the JDK container tests cover issues in that code. That's not exactly the same thing with this.

  • In an earlier version of this, the GraalVM code had a fork of the Java classes for container support, but was dropped in favour of the OpenJDK implementation. This was a good move as it meant bug-for-bug equality for cgroups code between a Java version and a native-image compiled version. What was the reason to use the OpenJDK code then and why is that reason no longer valid?

I assume you are referring to the state of 23.0. The Java classes where used for "low-level" C Group support, which on hotspot done via the native implementation. However, that has the same problem as our current (i.e. since 23.1) approach has: we cannot execute Java code at the time where we need this support.

Noted. I was merely pointing out that we need to think about a second class of bugs that are theoretically possible using a detached tree, like this libsvm_container.a library.

  • The separate fork of the OpenJDK code was a source of bugs, since it didn't get updated as frequently as it should have been. Most of the time it was very hard to establish whether or not the same bug existed in native-image or not. Once one established that the bug existed, it would need the same fix (already done in OpenJDK).

The key point of this change is that we are as close to hotspot as we can. Hotspot has a native implementation and a Java implementation. We copy the native implementation with minor, reapplyable changes and use the Java implementation just as is.

Also, we currently have a local JDK changes as well, which we want to get rid of because 1) it is another source of problems and maintenance overhead and 2) it is another differentiator between what "we" (as in GraalVM) run compared to what our collaborators do if they use their own JDK without our modification.

I'll point out that this very same issue will pop up once the Oracle GraalVM team stops producing community builds based on JDK 25 and the rest of the community needs to maintain this. It doesn't really matter whether the skew is due to JDK version in use or external container library in use.

OpenJDK itself has - as you point out - two implementations. With this, there is again a third.

I'd rather say we are removing the need for our third one (the JDK change) and use the parts from hotspot as they are meant to be used: the native implementation for low-level support and the Java implementation for Java.

Well, the JDK Metrics code is still used. It's just no longer used for the low level code. In my book this is a 3rd implementation (due to ifdefs and other slight modifications).

  • The imported code from hotspot is already out of date as of today. E.g. JDK-8302744 and related clean-ups that are going to land soon.

As I pointed out, we will keep the copied code updated. And we will resync soon once we get the initial version in.

OK. We'll essentially need to do this for every bug fixed in the hotspot code.

  • While it has been briefly mentioned to me by @christianhaeubl that this code ought to be implemented in native code, it wasn't discussed more broadly how that should be done. I'm a bit surprised to see this, I'm afraid. TBH, a clean-room implementation would likely be a better fit over an import of that size (including some of the dependencies that hotspot code has). How do you establish a resync? A bug in the os class in hotspot is found? A bug in the container detection code? Where do you draw the line for synchronization of code bases? This looks like it's going to be a frankenstein fork from day one.

Regarding clean-room implementation: ideally, the implementation in hotspot would be hotspot independent and shipped as a library that native image can just use. But that probably hard or impossible to pull off. Having our own implementation seems to have all the problems you are concerned about with this approach plus that we need to figure out all problems ourselves. That is not realistic maintenance wise.

Fair enough.

  • Overall, the cost for maintaining this fork needs to be considered. The development and testing time needed for re-importing need to be considered and weighed thoroughly with the benefits that this brings. Has this been done and long-term maintenance considered? Speaking of benefits. It's not clear what those are? Is it really the size benefit of the RegEx subsystem? Any non-trivial application would drag in that dependency anyway. FWIW, we've shipped without that LabsJDK patch in Mandrel for many releases and nobody noticed any size increase in Quarkus. What exactly is the problem with the initialization trick that was implemented to break the cycle with the Java implementation? You mention "working with incorrect values is problematic". Could you explain the precise problem, please? Maybe that problem could be fixed without introducing 20k lines of forked OpenJDK code. I see the that this new code also needs careful initialization handling.
* re "long-term maintenance considered": yes, see above

It seems still be underestimated. Importing code != correct code.

* size benefits: not every application need regex. while some use cases this will not be a problem, for others it is

Sure. I'm seeing native image in the context of Quarkus. We've never seen that issue there. YMMV of course.

* initialization trick: it is just incorrect. e.g. the GC can work with wrong heap sizes and crash because it over-allocates beyond the container limits.

OK.

* re "Maybe that problem could be fixed without introducing 20k lines of forked OpenJDK code.": the major point is, we cannot execute Java. We are open for all alternative suggestions.

OK.

* re initialization handling: that is our top priority for the follow-up changes. We will change the initialization strategy to be more eagerly, but this requires careful benchmarking which we want to do only after we have the basic functionality up and running.
  • What does this mean for portability of generated native images? Does the library have a dependency of the libc version of the build host?

No changes. It is not different to other static libraries that are linked into native images, such as libchelper.

Thanks.

Perhaps there is a chance that you could reconsider?

As I tried to outline, the main problem we want to solve is that we cannot execute Java code when we first need C Groups support. To me it seems that there is no way around a native implementation. The native implementation in hotspot is well maintained and can be used by native image, with only "minor changes" (with that I mean mostly remove code that is not needed). It seem realistic that we can keep this up to date without too much maintenance overhead.

From your remarks I sense that your main concerns is keeping our copy updated. So in general, we have a lot of places that we need to keep in sync with the JDK, so we have experience with and means to do that. But I understand your concerns. Kind of as a proof-of-concept, I'll prepare a PR which resyncs the current hotspot code. (Don't expect anything the next two weeks as it is JDK branch-off time and holiday season. Don't read this as a sign that syncing is so complex. 😉 )

Regarding update frequency, we plan to update at least before every release, but of course we can import more often if need be. Another possibility is to add @BasedOnJDKFile annotations so that we at least track the source files and have all the information to detect changes.

My main concern is that this implementation will collect bugs which aren't present in OpenJDK and we'll have to fix bugs in the svm_container library as well. I'll accept that this seems to be the way the Oracle GraalVM team wants to go.

@graalvmbot graalvmbot closed this Jul 6, 2024
@graalvmbot graalvmbot merged commit 10223db into master Jul 6, 2024
12 checks passed
@graalvmbot graalvmbot deleted the chaeubl/GR-51479 branch July 6, 2024 09:53
Karm pushed a commit to graalvm/mandrel-packaging that referenced this pull request Jul 9, 2024
zakkak pushed a commit to zakkak/mandrel-packaging that referenced this pull request Jul 18, 2024
Closes: graalvm/mandrel#765

Depends on this upstream PR to be merged:
oracle/graal#8989

(cherry picked from commit d6ba9ca)
@zapster
Copy link
Member

zapster commented Aug 5, 2024

This was merged a few weeks ago and with #9250 in place, keeping the code up to date with the JDK worked out smoothly so far.

That said, we still think that the ideal solution would be to refactor the native cgroups implementation in the JDK code base into a separate, Hotspot independent library, so that it could be used by native image (or other projects) directly. However, we do not have the resources to pull this off at the moment. For the time being, we will continue following our current approach of copying and syncing the implementation into our own code base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants