
Conversation

@jan-elastic (Contributor):

fixes: #126535

@jan-elastic jan-elastic requested a review from a team as a code owner May 23, 2025 06:40
@jan-elastic jan-elastic requested a review from valeriy42 May 23, 2025 06:40
@elasticsearchmachine elasticsearchmachine added the needs:triage (Requires assignment of a team area label) and v9.1.0 labels May 23, 2025
```java
try {
    // Pre-existing reflection-based lookup of the max direct memory size (likely broken since Java 9).
    directMemoryMax = (Long) vmClass.getMethod("maxDirectMemory").invoke(null);
} catch (Exception t) {
    // ignore, fall through to the fallback below
}
try {
```
@jan-elastic (Contributor Author):

I think the reflection code above to obtain the max direct memory size hasn't worked since Java 9. I'm not 100% sure, so to be (overly) cautious, I didn't remove it.

In case it fails (which may be always), use the Java args to obtain the max direct memory size. This should always be set by the JvmErgonomics class.
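A minimal sketch of that fallback as described here, scanning the raw JVM input arguments for -XX:MaxDirectMemorySize (illustrative only; the class name and the parseBytes helper are hypothetical, not the PR's actual code):

```java
import java.lang.management.ManagementFactory;

public class DirectMemoryFromArgs {
    public static void main(String[] args) {
        // Scan the raw JVM input arguments for an explicit direct-memory cap.
        long directMemoryMax = ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
            .filter(a -> a.startsWith("-XX:MaxDirectMemorySize="))
            .mapToLong(a -> parseBytes(a.substring("-XX:MaxDirectMemorySize=".length())))
            .findFirst()
            .orElse(0L); // 0 = not set explicitly
        System.out.println("MaxDirectMemorySize from args: " + directMemoryMax + " bytes");
    }

    // Hypothetical helper: parse HotSpot-style size strings such as "4g", "512m", "64k".
    private static long parseBytes(String value) {
        char suffix = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier = switch (suffix) {
            case 'g' -> 1L << 30;
            case 'm' -> 1L << 20;
            case 'k' -> 1L << 10;
            default -> 1L;
        };
        String digits = Character.isDigit(suffix) ? value : value.substring(0, value.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }
}
```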

Member:

I don't think looking at the raw arguments is right. HotSpot has a way to get the HotSpot args; see HotSpotDiagnosticMXBean below this, where we already grab several other options.
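For reference, a minimal sketch of that approach, assuming the option is read via the standard com.sun.management.HotSpotDiagnosticMXBean (variable and class names are illustrative, not the PR's actual code):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DirectMemoryFromHotSpot {
    public static void main(String[] args) {
        // Ask HotSpot for the effective option value instead of parsing raw args.
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        long maxDirect = Long.parseLong(hotspot.getVMOption("MaxDirectMemorySize").getValue());
        // Note: 0 means "no explicit cap"; HotSpot then defaults the limit to roughly the max heap size.
        System.out.println("MaxDirectMemorySize: " + maxDirect + " bytes");
    }
}
```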

@jan-elastic (Contributor Author):

thanks, that makes sense. fixed!

```diff
- addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(Runtime.getRuntime().maxMemory()));
+ addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(JvmInfo.jvmInfo().getMem().getTotalMax().getBytes()));
```
@jan-elastic (Contributor Author):

For the JVM size, now use all memory that may be used by Java (so: heap, direct, and non-heap).

This should lead to less memory being available for ML, and fewer OOMs.
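To illustrate the accounting (a sketch only, not Elasticsearch's actual JvmInfo implementation; the direct-memory value is a placeholder):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class JvmTotalMaxSketch {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long heapMax = mem.getHeapMemoryUsage().getMax();                    // -Xmx; what Runtime.maxMemory() reported before
        long nonHeapMax = Math.max(mem.getNonHeapMemoryUsage().getMax(), 0); // -1 when unbounded
        long directMax = 512L << 20;                                         // placeholder; see MaxDirectMemorySize above
        // Report heap + direct + non-heap as the JVM's footprint, so ML subtracts all of it.
        long totalMax = heapMax + directMax + nonHeapMax;
        System.out.println("JVM size attribute: " + totalMax + " bytes");
    }
}
```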

@jan-elastic (Contributor Author):

If this leaves too little memory for ML, we should explicitly reduce the direct memory size on ML nodes by setting the Java arg -XX:MaxDirectMemorySize to a smaller value.
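For example, a hypothetical jvm.options.d entry for ML nodes (the file name and the 256m value are illustrative, not a recommendation from this PR):

```
# config/jvm.options.d/ml.options (hypothetical)
# Cap direct memory so more of the node's RAM stays available to ML native processes.
-XX:MaxDirectMemorySize=256m
```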

@jan-elastic jan-elastic added the >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), auto-backport (Automatically create backport pull requests when merged), v8.19.0, v9.0.3, v8.17.8, and v8.18.3 labels, and removed the needs:triage (Requires assignment of a team area label) label May 23, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator):

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic marked this pull request as draft May 23, 2025 14:01
@valeriy42 (Contributor) left a comment:

Looks reasonable to me. Can you please test with these changes that the ML free-tier nodes with 4 GB can still run the ELSER and e5-small models?

@jan-elastic jan-elastic added the cloud-deploy (Publish cloud docker image for Cloud-First-Testing) label May 26, 2025
@jan-elastic jan-elastic force-pushed the ml-memory-nonheap-direct branch from c2c9a9a to 7f65acd May 26, 2025 12:13
@jan-elastic jan-elastic requested a review from rjernst May 26, 2025 12:14
@jan-elastic jan-elastic force-pushed the ml-memory-nonheap-direct branch from 7f65acd to 1b43b32 May 28, 2025 14:58
@jan-elastic jan-elastic closed this Jun 3, 2025