-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Include direct memory and non-heap memory in ML memory calculations. #128346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| directMemoryMax = (Long) vmClass.getMethod("maxDirectMemory").invoke(null); | ||
| } catch (Exception t) { | ||
| // ignore | ||
| try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reflection code above to obtain the max direct memory size doesn't work since Java 9. I'm not 100% sure, so to be (overly) cautious, I didn't remove it.
In case it fails (which may be always), use the Java args the obtain the max direct memory size. This should be always set by the JvmErgonomics class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think looking at the raw arguments is right. Hotspot has a way to get the hotspot args, see HotSpotDiagnosticMXBean below this, we already grab several other options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, that makes sense. fixed!
| ); | ||
| addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(Runtime.getRuntime().maxMemory())); | ||
|
|
||
| addMlNodeAttribute(additionalSettings, jvmSizeAttrName, Long.toString(JvmInfo.jvmInfo().getMem().getTotalMax().getBytes())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the JVM size, now use all memory that it may used by Java (so: heap, direct, and non-heap).
This should lead to less memory available for ML, and less OOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this leads to not enough memory for ML, we should explicitly reduce the direct memory size on ML nodes, by setting the Java arg -XX:MaxDirectMemorySize to some smaller value.
|
Pinging @elastic/ml-core (Team:ML) |
|
Hi @jan-elastic, I've created a changelog YAML for you. |
valeriy42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me. Can you please test with this changes that the ML free tier nodes with 4GB still can run ELSER and e5-small models?
c2c9a9a to
7f65acd
Compare
7f65acd to
1b43b32
Compare
fixes: #126535