Peak JVM used memory heuristic - (Depends on Custom SHS - Requires peakJvmUsedMemory metric) #283
Conversation
Force-pushed from c8e15e5 to b310982
 * A heuristic based on peak JVM used memory for the spark executors and drivers
 *
 */
Extra newline.
if(evaluator.severityExecutor.getValue > Severity.LOW.getValue) {
  new HeuristicResultDetails("Note", "The allocated memory for the executor (in " + SPARK_EXECUTOR_MEMORY +") is much more than the peak JVM used memory by executors.")
  new HeuristicResultDetails("Reasonable size for executor memory", ((1+BUFFER_PERCENT/100)*evaluator.maxExecutorPeakJvmUsedMemory).toString)
Will this evaluate properly, with integer division? How about 100.0?
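The reviewer's concern is real: in Scala, `BUFFER_PERCENT / 100` is integer division when `BUFFER_PERCENT` is an `Int`, so any buffer below 100% rounds to zero. A minimal sketch illustrating the pitfall and the fix (`BufferDemo` and the 25% value are illustrative assumptions, not the PR's exact constants):

```scala
object BufferDemo {
  val BUFFER_PERCENT: Int = 25  // assumed buffer percentage for illustration

  // Integer division: 25 / 100 == 0, so the buffer silently disappears.
  def withIntBuffer(peak: Long): Long = (1 + BUFFER_PERCENT / 100) * peak

  // Promote to Double first so the 25% buffer is actually applied.
  def withDoubleBuffer(peak: Long): Long =
    ((1 + BUFFER_PERCENT.toDouble / 100.0) * peak).toLong

  def main(args: Array[String]): Unit = {
    val peak = 4L * 1024 * 1024 * 1024  // e.g. 4 GB peak JVM used memory
    println(withIntBuffer(peak))    // 4294967296 -- no buffer added
    println(withDoubleBuffer(peak)) // 5368709120 -- peak plus 25%
  }
}
```

This is why the later revision of the diff uses `BUFFER_PERCENT.toDouble/100.0`.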
)
result
}
Extra newline.
*@
<p>This is a heuristic for peak JVM used memory.</p>
<h4>Executor Max Peak JVM Used Memory</h4>
<p>This is to analyse whether the executor memory is set to a good value. To avoid wasted memory, its checked that the peak JVM used memory is reasonably close to the allocated executor memory, (spark.executor.memory) -- if it is much smaller, then executor memory should be reduced.</p>
For "its checked that the peak JVM used memory", how about "it checks if the peak JVM used memory"
new HeuristicResultDetails("Note", "The allocated memory for the driver (in " + SPARK_DRIVER_MEMORY + ") is much more than the peak JVM used memory by the driver.")
}

if(evaluator.severitySkew.getValue > Severity.LOW.getValue) {
Looking at some Spark applications, it looks like tasks can often be assigned to a fraction of the executors (based on node locality). It may not make sense to check for skew at the executor level. We can look for skew (data, peak execution memory and execution time) in tasks instead. We can also check how many executors have tasks assigned, and how imbalanced the assignment is -- we can recommend using fewer executors (and perhaps more cores), and/or locality. These would be separate heuristics, and we can discuss at the next meeting, or set up a separate meeting to discuss.
Thanks for making the changes. For now, can you disable the skew check? Everything else looks good.
Hi Edwina,
Thank you for taking the time to review. :)
I will disable the skew test and discuss adding it at the task level with you.
Regards.
In app/com/linkedin/drelephant/spark/heuristics/JvmUsedMemoryHeuristic.scala:
+
+ var resultDetails = Seq(
+ new HeuristicResultDetails("Max peak JVM used memory", evaluator.maxExecutorPeakJvmUsedMemory.toString),
+ new HeuristicResultDetails("Median peak JVM used memory", evaluator.medianPeakJvmUsedMemory.toString)
+ )
+
+ if(evaluator.severityExecutor.getValue > Severity.LOW.getValue) {
+ new HeuristicResultDetails("Note", "The allocated memory for the executor (in " + SPARK_EXECUTOR_MEMORY +") is much more than the peak JVM used memory by executors.")
+ new HeuristicResultDetails("Reasonable size for executor memory", ((1+BUFFER_PERCENT.toDouble/100.0)*evaluator.maxExecutorPeakJvmUsedMemory).toString)
+ }
+
+ if(evaluator.severityDriver.getValue > Severity.LOW.getValue) {
+ new HeuristicResultDetails("Note", "The allocated memory for the driver (in " + SPARK_DRIVER_MEMORY + ") is much more than the peak JVM used memory by the driver.")
+ }
+
+ if(evaluator.severitySkew.getValue > Severity.LOW.getValue) {
Thanks for making the change. Task-level skew would be done with the stage metrics, so you can remove the lines, but commented out is fine too.
Sure! Will remove the lines. 👍
app-conf/HeuristicConf.xml
Outdated
@@ -193,6 +193,12 @@
    <classname>com.linkedin.drelephant.spark.heuristics.StagesHeuristic</classname>
    <viewname>views.html.help.spark.helpStagesHeuristic</viewname>
  </heuristic>
  <heuristic>
    <applicationtype>spark</applicationtype>
    <heuristicname>Spark JVM Used Memory</heuristicname>
JVM Used Memory
  _.peakJvmUsedMemory.getOrElse(JVM_USED_MEMORY, 0).asInstanceOf[Number].longValue
}.max

val DEFAULT_MAX_EXECUTOR_PEAK_JVM_USED_MEMORY_THRESHOLDS =
Configure thresholds in HeuristicConf
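Following the reviewer's suggestion, the thresholds could live in HeuristicConf.xml rather than in code. A hedged sketch, extending the diff shown earlier in this thread (the `<viewname>` and the `<params>` key name are illustrative assumptions; the actual keys depend on how the heuristic reads its params):

```xml
<heuristic>
  <applicationtype>spark</applicationtype>
  <heuristicname>Spark JVM Used Memory</heuristicname>
  <classname>com.linkedin.drelephant.spark.heuristics.JvmUsedMemoryHeuristic</classname>
  <viewname>views.html.help.spark.helpJvmUsedMemoryHeuristic</viewname>
  <params>
    <!-- Illustrative severity thresholds; names are assumptions, not from the PR. -->
    <executor_peak_jvm_memory_threshold>1.25,1.5,2,3</executor_peak_jvm_memory_threshold>
  </params>
</heuristic>
```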
val executorList : Seq[ExecutorSummary] = executorSummaries.filterNot(_.id.equals("driver"))
val sparkExecutorMemory : Long = (appConfigurationProperties.get(SPARK_EXECUTOR_MEMORY).map(MemoryFormatUtils.stringToBytes)).getOrElse(0L)
val sparkDriverMemory : Long = appConfigurationProperties.get(SPARK_DRIVER_MEMORY).map(MemoryFormatUtils.stringToBytes).getOrElse(0L)
val medianPeakJvmUsedMemory: Long = if (executorList.isEmpty) 0L else executorList.map {
I don't see this being used anywhere? Mention it along with the Heuristic Details
val sparkDriverMemory : Long = appConfigurationProperties.get(SPARK_DRIVER_MEMORY).map(MemoryFormatUtils.stringToBytes).getOrElse(0L)
val medianPeakJvmUsedMemory: Long = if (executorList.isEmpty) 0L else executorList.map {
  _.peakJvmUsedMemory.getOrElse(JVM_USED_MEMORY, 0L).asInstanceOf[Number].longValue
}.sortWith(_ < _).drop(executorList.size/2).head
Median computation is not accurate. Take a look at this code. http://rosettacode.org/wiki/Averages/Median#Scala
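The code under review takes `.drop(size/2).head` of the sorted list, which is the upper of the two middle elements for even-sized lists, not the median. A minimal sketch of an accurate median in the style of the Rosetta Code page the reviewer links to (`MedianDemo` is an illustrative name, not from the PR):

```scala
object MedianDemo {
  /** Median of a non-empty sequence: middle element for odd sizes,
    * mean of the two middle elements for even sizes. */
  def median(values: Seq[Long]): Long = {
    require(values.nonEmpty, "median of empty sequence")
    val sorted = values.sorted
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid)
    else (sorted(mid - 1) + sorted(mid)) / 2  // integer mean of the two middle values
  }

  def main(args: Array[String]): Unit = {
    println(median(Seq(5L, 1L, 3L)))     // 3
    println(median(Seq(4L, 1L, 3L, 2L))) // 2 (mean of 2 and 3, integer division)
  }
}
```

For memory values in bytes, the integer rounding in the even case is negligible, but the two-middle-element average is what distinguishes this from the `.drop(size/2).head` version.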
val JVM_USED_MEMORY = "jvmUsedMemory"
val SPARK_EXECUTOR_MEMORY = "spark.executor.memory"
val SPARK_DRIVER_MEMORY = "spark.driver.memory"
val reservedMemory : Long = 314572800
Add a comment on the value. Use 300 * FileUtils.ONE_MB or 300 * 1024 * 1024 for clarity.
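A sketch of what the reviewer is asking for: derive the magic number from named constants so a reader can see it is 300 MB (the object and constant names here are illustrative, not the PR's code):

```scala
object ReservedMemory {
  val ONE_MB: Long = 1024L * 1024

  // Spark reserves 300 MB of heap for internal/system use;
  // spelling it out as 300 * ONE_MB documents the intent.
  val reservedMemory: Long = 300 * ONE_MB  // == 314572800
}
```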
…akJvmUsedMemory metric) (linkedin#283)
…uires peakJvmUsedMemory metric) (linkedin#283)" (linkedin#317) This reverts commit 6b2f7e8.
To determine whether the executor memory is set to a good value, the average (or median) and maximum peak JVM used memory across executors are examined. Information about each executor's peak JVM used memory will be available from the executors REST API, in maxPeakJvmUsedMemory.peakJvmUsedMemory. To avoid wasted memory, the heuristic checks that the peak JVM used memory is reasonably close to the allocated executor memory (spark.executor.memory); if it is much smaller, then executor memory can be reduced.
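The check described above can be sketched as a small function: allocated memory counts as wasted when it exceeds the peak JVM used memory plus a buffer. This is a minimal sketch, assuming a 25% buffer and illustrative names (`PeakJvmMemorySketch`, `wastedMemory`), not the PR's exact implementation:

```scala
object PeakJvmMemorySketch {
  val BUFFER_PERCENT = 25  // assumed headroom over peak usage

  /** Recommended executor memory: peak JVM used memory plus the buffer. */
  def recommendedExecutorMemory(maxPeakJvmUsedMemory: Long): Long =
    ((1 + BUFFER_PERCENT.toDouble / 100.0) * maxPeakJvmUsedMemory).toLong

  /** True when the allocated executor memory exceeds peak usage by more
    * than the buffer, i.e. spark.executor.memory could be reduced. */
  def wastedMemory(sparkExecutorMemory: Long, maxPeakJvmUsedMemory: Long): Boolean =
    sparkExecutorMemory > recommendedExecutorMemory(maxPeakJvmUsedMemory)
}
```

For example, an 8 GB `spark.executor.memory` with a 4 GB peak would be flagged, while a 5 GB allocation (within the 25% buffer) would not.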