
Conversation

@jan-elastic (Contributor):

fixes: #126535

@jan-elastic jan-elastic requested a review from a team as a code owner May 23, 2025 06:40
@jan-elastic jan-elastic requested a review from valeriy42 May 23, 2025 06:40
@elasticsearchmachine elasticsearchmachine added the needs:triage (Requires assignment of a team area label) and v9.1.0 labels May 23, 2025
```java
try {
    // Pre-existing reflection-based lookup of the max direct memory size (likely broken since Java 9).
    directMemoryMax = (Long) vmClass.getMethod("maxDirectMemory").invoke(null);
} catch (Exception t) {
    // ignore, fall through to the fallback below
}
try {
```
@jan-elastic (Contributor Author):

I think the reflection code above to obtain the max direct memory size hasn't worked since Java 9. I'm not 100% sure, so to be (overly) cautious, I didn't remove it.

In case it fails (which may be always), use the Java args to obtain the max direct memory size. This should always be set by the JvmErgonomics class.
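A minimal sketch of that fallback as described here, scanning the raw JVM input arguments for -XX:MaxDirectMemorySize (illustrative only; the class name and the parseBytes helper are hypothetical, not the PR's actual code):

```java
import java.lang.management.ManagementFactory;

public class DirectMemoryFromArgs {
    public static void main(String[] args) {
        // Scan the raw JVM input arguments for an explicit direct-memory cap.
        long directMemoryMax = ManagementFactory.getRuntimeMXBean().getInputArguments().stream()
            .filter(a -> a.startsWith("-XX:MaxDirectMemorySize="))
            .mapToLong(a -> parseBytes(a.substring("-XX:MaxDirectMemorySize=".length())))
            .findFirst()
            .orElse(0L); // 0 = not set explicitly
        System.out.println("MaxDirectMemorySize from args: " + directMemoryMax + " bytes");
    }

    // Hypothetical helper: parse HotSpot-style size strings such as "4g", "512m", "64k".
    private static long parseBytes(String value) {
        char suffix = Character.toLowerCase(value.charAt(value.length() - 1));
        long multiplier = switch (suffix) {
            case 'g' -> 1L << 30;
            case 'm' -> 1L << 20;
            case 'k' -> 1L << 10;
            default -> 1L;
        };
        String digits = Character.isDigit(suffix) ? value : value.substring(0, value.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }
}
```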

Member:

I don't think looking at the raw arguments is right. HotSpot has a way to get the HotSpot args; see HotSpotDiagnosticMXBean below this, where we already grab several other options.
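For reference, a minimal sketch of that approach, assuming the option is read via the standard com.sun.management.HotSpotDiagnosticMXBean (variable and class names are illustrative, not the PR's actual code):

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DirectMemoryFromHotSpot {
    public static void main(String[] args) {
        // Ask HotSpot for the effective option value instead of parsing raw args.
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        long maxDirect = Long.parseLong(hotspot.getVMOption("MaxDirectMemorySize").getValue());
        // Note: 0 means "no explicit cap"; HotSpot then defaults the limit to roughly the max heap size.
        System.out.println("MaxDirectMemorySize: " + maxDirect + " bytes");
    }
}
```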

@jan-elastic (Contributor Author):

thanks, that makes sense. fixed!

```diff
- addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(Runtime.getRuntime().maxMemory()));
+ addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(JvmInfo.jvmInfo().getMem().getTotalMax().getBytes()));
```
@jan-elastic (Contributor Author):

For the JVM size, now use all memory that may be used by Java (so: heap, direct, and non-heap).

This should lead to less memory being available for ML, and fewer OOMs.
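To illustrate the accounting (a sketch only, not Elasticsearch's actual JvmInfo implementation; the direct-memory value is a placeholder):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class JvmTotalMaxSketch {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        long heapMax = mem.getHeapMemoryUsage().getMax();                    // -Xmx; what Runtime.maxMemory() reported before
        long nonHeapMax = Math.max(mem.getNonHeapMemoryUsage().getMax(), 0); // -1 when unbounded
        long directMax = 512L << 20;                                         // placeholder; see MaxDirectMemorySize above
        // Report heap + direct + non-heap as the JVM's footprint, so ML subtracts all of it.
        long totalMax = heapMax + directMax + nonHeapMax;
        System.out.println("JVM size attribute: " + totalMax + " bytes");
    }
}
```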

@jan-elastic (Contributor Author):

If this leaves too little memory for ML, we should explicitly reduce the direct memory size on ML nodes by setting the Java arg -XX:MaxDirectMemorySize to a smaller value.
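For example, a hypothetical jvm.options.d entry for ML nodes (the file name and the 256m value are illustrative, not a recommendation from this PR):

```
# config/jvm.options.d/ml.options (hypothetical)
# Cap direct memory so more of the node's RAM stays available to ML native processes.
-XX:MaxDirectMemorySize=256m
```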

@jan-elastic jan-elastic added the >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), auto-backport (Automatically create backport pull requests when merged), v8.19.0, v9.0.3, v8.17.8, and v8.18.3 labels, and removed the needs:triage (Requires assignment of a team area label) label May 23, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator):

Hi @jan-elastic, I've created a changelog YAML for you.

@jan-elastic jan-elastic marked this pull request as draft May 23, 2025 14:01
@valeriy42 (Contributor) left a comment:

Looks reasonable to me. Can you please test with these changes that the ML free-tier nodes with 4 GB can still run the ELSER and e5-small models?

@jan-elastic jan-elastic added the cloud-deploy (Publish cloud docker image for Cloud-First-Testing) label May 26, 2025
@jan-elastic jan-elastic force-pushed the ml-memory-nonheap-direct branch from c2c9a9a to 7f65acd May 26, 2025 12:13
@jan-elastic jan-elastic requested a review from rjernst May 26, 2025 12:14
@jan-elastic jan-elastic force-pushed the ml-memory-nonheap-direct branch from 7f65acd to 1b43b32 May 28, 2025 14:58
@jan-elastic jan-elastic closed this Jun 3, 2025