Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes MapReduce aggregator and heuristic to correctly handle task data when sampling is enabled #222

Merged
merged 2 commits into from
Mar 28, 2017

Conversation

shkhrgpt
Copy link
Contributor

@shkhrgpt shkhrgpt commented Mar 8, 2017

Also updates all the heuristic unit tests to take non-sampled tasks into account.

…a when sampling is enabled (linkedin#33)

* Fixes MapReduce aggregator and heuristic to correctly handle task data when sampling is enabled.

* Adds a comment
@shkhrgpt
Copy link
Contributor Author

shkhrgpt commented Mar 8, 2017

@akshayrai @shankar37

Please take a look.

@@ -106,6 +106,9 @@ private void compute(MapReduceTaskData[] taskDatas, long containerSize, long ide
}

for (MapReduceTaskData taskData: taskDatas) {
if (!taskData.isTimeAndCounterDataPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this always true ? Can you elaborate on what exactly is the issue you are trying to fix ? I remember you mentioning something in the meetup but dont recollect exactly what the issue was.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's not always true. If sampling is enabled, then it's only going to be true for sampled tasks. In MapReduceFetcherHadoop2 class, MapReduceTaskData instance is created for all the tasks, but time and counter data is only stored for sampled tasks. Please look at the logic here.
The unit test in this change will fail because NullPointerException without this check.

import java.util.Properties;
import java.util.concurrent.TimeUnit;

import com.linkedin.drelephant.analysis.Heuristic;
import com.linkedin.drelephant.analysis.HeuristicResult;
import com.linkedin.drelephant.analysis.Severity;


public class JobQueueLimitHeuristic implements Heuristic<MapReduceApplicationData> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akshayrai Is this heuristic used at linkedin ? I dont see it enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. We don't use this. As far as I remember, this heuristic was created to warn jobs which were hitting close to the default queue time out limit of 15mins.

@@ -106,6 +106,9 @@ private void compute(MapReduceTaskData[] taskDatas, long containerSize, long ide
}

for (MapReduceTaskData taskData: taskDatas) {
if (!taskData.isTimeAndCounterDataPresent()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to the function saying that the aggregatemetrics are not expected to be accurate when sampling is enabled so that it's clear ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @shankar37 for the review.
Added a comment. PTAL.

@akshayrai
Copy link
Contributor

+1

@akshayrai akshayrai merged commit 5a98701 into linkedin:master Mar 28, 2017
skakker pushed a commit to skakker/dr-elephant that referenced this pull request Dec 14, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants