MaxTaskMem 0.0 KB allways #21

kretes opened this Issue Sep 18, 2018 · 4 comments



kretes commented Sep 18, 2018

Hello there! Thanks for the tool, it looks promising. It's nice that it works on historical event data!

I tried that and got a 'Max memory which an executor could have taken = 0.0 KB', and 0.0 KB everywhere in the 'MaxTaskMem' column, which is obviously wrong since the job ran and used some memory.
The other metrics look at least sane.

I wonder what the cause could be?

My ultimate goal is to find out how much I can lower the memory footprint of a job without getting an OOM; this 'MaxTaskMem' field looks like a good fit for that, right?


iamrohit commented Sep 18, 2018

Thanks @kretes for trying this out. We have seen this issue but never debugged it. I took a look. It looks like JsonProtocol.scala, the main class responsible for converting events to JSON (which gets logged into event log files), is not handling the peak execution memory metric (part of TaskMetrics). We can try to get this patched into open source, but that will take its own sweet time.

For now, the only useful way to get this value is to run sparklens as part of your application. It is possible to configure sparklens not to run simulations. In this mode, it will save a sparklens output file (typically a few MBs in size). You can use this output file just like event log history files and get the same output (with simulation) at any later point in time.
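A minimal spark-submit sketch of that setup. The listener class and `spark.sparklens.*` property names below are taken from the sparklens README; the package version, job class, jar name, and data directory are placeholders, so treat the whole thing as an assumption to check against your sparklens version:

```shell
# Attach sparklens as a listener to the live job, disable in-app
# reporting/simulation, and write a sparklens data file that can be
# replayed later the way event-log files are.
spark-submit \
  --packages qubole:sparklens:0.2.0-s_2.11 \
  --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
  --conf spark.sparklens.reporting.disabled=true \
  --conf spark.sparklens.data.dir=/tmp/sparklens \
  --class com.example.MyJob myjob.jar
```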

Thanks again for sharing your findings.



kretes commented Sep 20, 2018

Thanks @iamrohit - it works when run together with the diagnosed job.

Still, I wonder how I should interpret MaxTaskMem.
In the code, sparklens does the following:

val maxTaskMemory = sts.taskPeakMemoryUsage.take(executorCores.toInt).sum // this could be at different times?

And if it is a sum of as many peakMemoryUsages as there are executorCores, then it is not a 'task' metric, since one task runs on one core, right?

I got a result of 16 GB in one stage, and given that the executor has that much memory in total, it is not all available to a single task if several tasks are running concurrently on the same executor.



iamrohit commented Sep 20, 2018

MaxTaskMem is a very wrong name! Let me first take this as an action item.

What is MaxTaskMem?
Executor memory is divided into user memory and Spark memory. The Spark memory is further divided into native (off-heap) and on-heap, and also into storage and execution. For every task, Spark reports the maximum (peak) execution memory used by that task. The goal of MaxTaskMem is to figure out the worst-case execution memory requirement per stage for any executor. We compute this number by sorting all tasks of a stage by peakExecutionMemory and picking the top N values, where N is the number of cores per executor. Essentially: if all the largest tasks of a stage end up running on a single executor, how much execution memory will they need?
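The computation described above can be sketched as a standalone snippet. This is not the sparklens source; the object and method names are made up for illustration, and peak values are assumed to be in bytes (here, small made-up numbers):

```scala
// Hypothetical sketch of the MaxTaskMem computation for one stage:
// worst-case execution memory if the stage's N largest tasks all
// land on the same executor (N = cores per executor).
object MaxTaskMemSketch {
  def maxTaskMem(taskPeaks: Seq[Long], coresPerExecutor: Int): Long =
    // Sort the per-task peakExecutionMemory values descending,
    // keep the N largest, and sum them.
    taskPeaks.sortBy(-_).take(coresPerExecutor).sum

  def main(args: Array[String]): Unit = {
    val peaks = Seq(512L, 4096L, 1024L, 2048L, 256L) // invented values
    println(maxTaskMem(peaks, coresPerExecutor = 2)) // 4096 + 2048 = 6144
  }
}
```

This also shows why the name is misleading: the result bounds an executor's concurrent demand, not any single task's.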

Usually, by default, Spark allocates around 60–70% of memory for execution+storage. This means that if we pick the max value of MaxTaskMem across all stages and, say, double it, we are certain that in no stage can any possible combination of N tasks (where N = number of cores per executor) cause the executor to run out of execution memory. Note that we are not accounting for user memory, so this doesn't really mean no OOM; that still remains a possibility. It probably means no spill, though. But then again, spill is mostly harmless: it may slow things down, but it will not kill the executor.
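A back-of-the-envelope version of that "double it" heuristic, with invented numbers (the 0.6 usable fraction is only a rough stand-in for Spark's default execution+storage share; this is not a sparklens API):

```scala
// Rough sizing sketch with made-up numbers, not sparklens code.
object SizingSketch {
  // The "double it" rule of thumb: executor memory = 2 * worst MaxTaskMem.
  def suggestedExecutorMem(maxTaskMemGb: Double): Double = 2 * maxTaskMemGb

  def main(args: Array[String]): Unit = {
    val maxTaskMemGb   = 6.0  // worst MaxTaskMem across all stages (invented)
    val executorGb     = suggestedExecutorMem(maxTaskMemGb) // 12.0 GB
    val usableFraction = 0.6  // rough default share for execution+storage
    // Doubling covers the ~60% usable fraction with margin (2 > 1/0.6 ≈ 1.67),
    // so the execution+storage pool still exceeds the worst-case need:
    println(executorGb * usableFraction >= maxTaskMemGb) // true (7.2 >= 6.0)
  }
}
```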

Please take a look here for some more information.

I will probably be raising a pull request for a different feature which might be useful for memory management in Spark. We call it GC-aware task scheduling. Essentially, instead of right-sizing the memory per executor, this feature changes the number of active cores per executor dynamically, making memory available for huge tasks at the expense of some loss of concurrency. Hopefully this will settle the question of memory tuning in Spark once and for all.



kretes commented Sep 21, 2018


Thanks for the thorough explanation.

I would then call this metric 'WorstCaseExecutorMemUsage'. And this is actually what I am looking for, although in my case the worst case couldn't have happened at all, since the value was way more than the executor memory.
I will be looking at the result of this metric in a few more jobs.

The feature you are describing sounds like a good thing to do! Looking forward to it!
