MaxTaskMem 0.0 KB always #21
Hello there! Thanks for the tool, it looks promising. It's nice that it works on historical event data!
I tried it and got 'Max memory which an executor could have taken = 0.0 KB', with 0.0 KB everywhere in the 'MaxTaskMem' column, which is obviously wrong since the job ran and used some memory.
I wonder what the cause could be?
My ultimate goal is to find out how much I can lower the memory footprint of a job without getting an OOM, and this 'MaxTaskMem' field looks like a good fit for that, right?
Thanks @kretes for trying this out. We have seen this issue but never debugged it. I took a look: JsonProtocol.scala, the main class responsible for converting events to JSON (which gets logged into event log files), does not handle the peak execution memory metric (part of TaskMetrics). We can try to get this patched in open source, but that will take its own sweet time.
For now, the only reliable way to get this value is to run Sparklens as part of your application. It is possible to configure Sparklens not to run simulations. In this mode, it will save a Sparklens output file (typically a few MBs in size). You can use this output file just like an event log history file and get the same output (with simulation) at any later point in time.
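For reference, a launch sketch along these lines (package coordinates and flag names are from my recollection of the Sparklens README, so please double-check them there; the application class and jar are placeholders):

```shell
spark-submit \
  --packages qubole:sparklens:0.3.2-s_2.11 \
  --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
  --conf spark.sparklens.reporting.disabled=true \
  --conf spark.sparklens.data.dir=/tmp/sparklens \
  --class com.example.MyApp my-app.jar
```

With reporting disabled, Sparklens only writes its data file during the run; you can feed that file back to Sparklens later to get the full report with simulations.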
Thanks again for sharing your findings.
Thanks @iamrohit - it works when run together with the diagnosed job.
Still, I wonder how I should interpret MaxTaskMem.
If it is a sum of as many peakMemoryUsage values as there are executor cores, then it is not really a 'task' metric, since one task runs on one core, right?
I got a result of 16 GB in one stage, and while the executor has that much memory, it is not all available to a single task if several of them are running concurrently on the same executor.
MaxTaskMem is a very wrong name! Let me take that as a first action item.
What is MaxTaskMem?
Usually, by default, Spark allocates around 60-70% of memory for execution + storage. This means that if we pick the max value of MaxTaskMem across all stages, and say double it, we can be certain that in no stage can any combination of N tasks (where N = number of cores per executor) cause the executor to run out of execution memory. Note that we are not accounting for user memory, so this doesn't really mean no OOM; that remains a possibility. It probably does mean no spill. But then, spill is mostly harmless: it may slow things down, but it will not kill the executor.
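To make the sizing rule above concrete, here is a rough sketch of the arithmetic (the function name, the 0.6 fraction, and the 2x safety factor are my own illustrative choices, not Sparklens code):

```python
def required_executor_memory_gb(max_task_mem_gb, exec_fraction=0.6,
                                safety_factor=2.0):
    """Estimate executor memory from the largest MaxTaskMem seen
    across all stages (which is already the worst-case sum over
    all cores of one executor).

    exec_fraction is the share of executor memory Spark reserves
    for execution + storage (roughly 60% by default)."""
    # The execution+storage region must hold the doubled worst case...
    needed_exec_region_gb = max_task_mem_gb * safety_factor
    # ...and that region is only a fraction of total executor memory.
    return needed_exec_region_gb / exec_fraction

# Example: a stage peaked at MaxTaskMem = 16 GB
print(round(required_executor_memory_gb(16.0), 1))  # → 53.3
```

The point of the sketch is only the direction of the calculation: MaxTaskMem bounds the execution region, and the executor total must be larger still because execution + storage is only a fraction of it.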
Please take a look here for more information: https://www.qubole.com/blog/an-introduction-to-apache-spark-optimization-in-qubole/
I will probably be raising a pull request for a different feature which might be useful for memory management in Spark. We call it GC-aware task scheduling. Essentially, instead of right-sizing the memory per executor, this feature changes the number of active cores per executor dynamically, making memory available for huge tasks at the expense of some loss of concurrency. Hopefully this will settle the question of memory tuning in Spark once and for all.
Thanks for the thorough explanation.
I would then call this metric 'WorstCaseExecutorMemUsage'. And this is actually what I am looking for, although in my case the worst case didn't happen at all, since it was way more than the executor memory.
The feature you are describing sounds like a good thing to do! Looking forward to it!