New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix leak of failed/aborted queries in coordinator #11843

Merged
merged 4 commits into from Nov 6, 2018

Conversation

Projects
None yet
4 participants
@dain
Contributor

dain commented Nov 2, 2018

This only addresses some of the issues in #11844

// when stage transitions to a done state, check if all tasks have final status information
if (newState.isDone() && doneTasks.containsAll(allTasks)) {
stateMachine.setAllTasksFinal();
synchronized (this) {

This comment has been minimized.

@nezihyigitbasi

nezihyigitbasi Nov 2, 2018

Contributor

Is a race possible between the StageTaskListener::stateChanged() method and the stage state change listener declared here? e.g., this stage state change listener declared here gets called before the task is added to the doneTasks set in the task state listener below, and we miss that here.

This comment has been minimized.

@dain

dain Nov 2, 2018

Contributor

I believe the race occurs when a thread is in scheduleTask when this callback is fired. Specifically, the thread being between allTasks.add(taskId); and task.abort();.

@nezihyigitbasi

This comment has been minimized.

Contributor

nezihyigitbasi commented Nov 2, 2018

I will appreciate if someone can clearly describe the sequence of events that can result in a leak.

@dain

This comment has been minimized.

Contributor

dain commented Nov 2, 2018

@nezihyigitbasi I wrote a test that reproduces and I think documents the issue.

I believe the core issue is, that while a thread is adding a new task, right after it adds the new task to "allTasks", the "final info ready" event listener fires, and the if statement sees that the "newly added task" isn't done yet so it doesn't enter the if block. Then the thread adding the tasks, see the stage is aborted and aborts the new task. The code adding the new task is already synchronized, and requires the call back to get the lock before checking if the query is done. I also, add a lock around the task state change update to ensure that it's changes are visible.

@dain dain changed the title from [WIP] Fix leak of failed/aborted queries in coordinator to Fix leak of failed/aborted queries in coordinator Nov 2, 2018

@raghavsethi

Remove unnecessary TaskInfo.isComplete() method LGTM

@raghavsethi

Fix warnings in StateMachine and TestStateMachine LGTM

@raghavsethi

This comment has been minimized.

Contributor

raghavsethi commented Nov 2, 2018

This is probably apparent to other people, but why is it important to 'Ensure StateMachine listeners are called back on a different thread'?

@dain

This comment has been minimized.

Contributor

dain commented Nov 2, 2018

This is probably apparent to other people, but why is it important to 'Ensure StateMachine listeners are called back on a different thread'?

It is one of the guarantees of the StateMachine class. It allows you to add listeners while holding a synchronized lock... without it you leak the lock to the callback method which can cause a deadlock.

@nezihyigitbasi

This comment has been minimized.

Contributor

nezihyigitbasi commented Nov 2, 2018

mvn checks are failing btw.

@dain

This comment has been minimized.

Contributor

dain commented Nov 3, 2018

@nezihyigitbasi all updated

@raghavsethi

This comment has been minimized.

Contributor

raghavsethi commented Nov 3, 2018

@nezihyigitbasi will let you approve since I haven't gotten a chance to see later commits yet.

@nezihyigitbasi

LGTM % typos in unit test.

}
@Test(timeOut = 2 * 60 * 1000)
public void testFinalSageInfo()

This comment has been minimized.

@nezihyigitbasi

nezihyigitbasi Nov 5, 2018

Contributor

type sage

}
}
private void testFinalSageInfoInternal()

This comment has been minimized.

@nezihyigitbasi

nezihyigitbasi Nov 5, 2018

Contributor

type sage

@nezihyigitbasi

This comment has been minimized.

Contributor

nezihyigitbasi commented Nov 5, 2018

@haozhun Please feel free to chime in if you have additional comments.

@dain dain closed this Nov 6, 2018

@dain dain deleted the dain:fix-query-leak branch Nov 6, 2018

@dain dain merged commit 4d4b0bb into prestodb:master Nov 6, 2018

0 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@dain dain referenced this pull request Nov 6, 2018

Merged

Cleanup execution changes #11862

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment