Fixed resources used/wasted computation for spark jobs - (Depends on Custom SHS - Requires peakJvmUsedMemory metric) #287
Conversation
val totalExecutorTaskTimeMillis = totalExecutorTaskTimeMillisOf(data)

val resourcesAllocatedForUse =
  aggregateresourcesAllocatedForUse(executorInstances, executorMemoryBytes, applicationDurationMillis)
Allocated resources should take care of the dynamic allocation.
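One way to read this review comment: under dynamic allocation, multiplying `executorInstances` by the full application duration overstates the allocation, since executors come and go. A minimal sketch of the per-executor alternative, where `ExecutorSummarySketch` and `totalDurationMillis` are hypothetical stand-ins for the real executor summary type, not the PR's actual identifiers:

```scala
// Hypothetical sketch: account for dynamic allocation by summing each
// executor's own active time instead of assuming all executors live for
// the whole application. Units are byte-milliseconds.
final case class ExecutorSummarySketch(totalDurationMillis: Long)

def resourcesAllocatedForUse(
    executorSummaries: Seq[ExecutorSummarySketch],
    executorMemoryBytes: Long): BigInt =
  executorSummaries
    .map(s => BigInt(executorMemoryBytes) * BigInt(s.totalDurationMillis))
    .sum
```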
var sumResourceUsage: BigInt = 0
executorSummaries.foreach(
  executorSummary => {
    var memUsed: Long = executorSummary.peakJvmUsedMemory.getOrElse(JVM_USED_MEMORY, 0) // + MemoryFormatUtils.stringToBytes("300M")
We might want to add a buffer on top of the peak memory.
The variable `resourcesActuallyUsedWithBuffer` (line 64) is the resource usage with the buffer included, and that value is what is used when calculating wasted resources.
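The buffer idea discussed above can be sketched as follows; the constant mirrors the commented-out `300M` value in the diff, and the helper name is hypothetical rather than the PR's actual identifier:

```scala
// Hypothetical sketch: pad peak JVM used memory with a fixed safety buffer
// (300 MB here, matching the commented-out value in the diff) before using
// it to compute wasted resources.
val BufferBytes: Long = 300L * 1024 * 1024

def peakMemoryWithBuffer(peakJvmUsedMemoryBytes: Long): Long =
  peakJvmUsedMemoryBytes + BufferBytes
```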
Force-pushed from 7f7717f to fc9b866
I think it would be good if you could provide a brief description of this PR. It helps with the review.
@shkhrgpt: Done
Peak JVM Used Memory is not part of the upstream public Spark Release. Otherwise this looks good.
Force-pushed from b81a2b0 to 9dfcc1e
…Custom SHS - Requires peakJvmUsedMemory metric) (#287)
Fixed the calculation of resources used/wasted. Resources used is now calculated by summing the resources actually used by each executor, instead of assuming each executor uses all of the memory allocated to it. Resources allocated for use is calculated by multiplying each executor's active time by the memory allocated per executor, then summing across executors.
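The description above can be sketched end to end as follows. This is an illustrative reading of the PR description under assumed names (`ExecSummary`, `peakJvmUsedMemoryBytes`, `totalDurationMillis`), not the PR's actual implementation:

```scala
// Hypothetical sketch of the corrected computation, in byte-milliseconds:
// used      = sum over executors of (peak memory actually used * active time)
// allocated = sum over executors of (memory allocated per executor * active time)
// wasted    = allocated - used
final case class ExecSummary(peakJvmUsedMemoryBytes: Long, totalDurationMillis: Long)

def resourcesUsed(execs: Seq[ExecSummary]): BigInt =
  execs.map(e => BigInt(e.peakJvmUsedMemoryBytes) * BigInt(e.totalDurationMillis)).sum

def resourcesAllocated(execs: Seq[ExecSummary], executorMemoryBytes: Long): BigInt =
  execs.map(e => BigInt(executorMemoryBytes) * BigInt(e.totalDurationMillis)).sum

def resourcesWasted(execs: Seq[ExecSummary], executorMemoryBytes: Long): BigInt =
  resourcesAllocated(execs, executorMemoryBytes) - resourcesUsed(execs)
```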