
Conversation

malvika2147

@malvika2147 malvika2147 commented Jun 5, 2019

Stack from ghstack:

Summary: When the Engine is destructed, it sends a highest-priority shutdown task to each worker thread.

Test Plan: Verified the change by temporarily having each thread print a message while shutting down. Running with REL_WITH_DEB_INFO=1 should print the shutdown message only once
(expected output: PYTORCH_API_USAGE worker shutting down).
Added a test for that.

Differential Revision: D15738856

@pytorchbot pytorchbot added the module: autograd Related to torch.autograd, and the autograd engine in general label Jun 5, 2019
@malvika2147 malvika2147 requested a review from ezyang June 5, 2019 22:59
@malvika2147 malvika2147 self-assigned this Jun 5, 2019
ezyang
ezyang previously requested changes Jun 6, 2019
Contributor

@ezyang ezyang left a comment

Nicely done. Looking forward to the tests!

@ezyang ezyang requested review from albanD, apaszke and colesbury June 6, 2019 15:00
albanD
albanD previously requested changes Jun 6, 2019
for (size_t i = 0; i < static_cast<size_t>(c10::DeviceType::COMPILE_TIME_MAX_DEVICE_TYPES); i++) {
auto* impl = c10::impl::device_guard_impl_registry[i].load();
if (impl && device < impl->deviceCount()) {
guards[i].reset_device(at::Device(static_cast<c10::DeviceType>(i), device));
Collaborator

What was this code for? Why is it safe to remove it now?

Author

This was for setting the CUDA device. I have changed it to set the device directly, without using guards.

FunctionTask task = queue->pop();
if (task.isShutdownTask_) {
C10_LOG_API_USAGE_ONCE("worker shutting down");
break;
Collaborator

When using reentrant autograd, the thread_main function is called again, with a specific graph_task. So the shutdown task could be caught by a thread_main from the reentrant autograd and not the original one that keeps the thread alive.
Or do you assume that no backward is currently running when the engine is destroyed? If so, this should be checked in some way.

Author

@malvika2147 malvika2147 Jun 6, 2019

We expect this to work only when no backward is running while the engine is destroyed. This might not always be the case though and it needs a separate fix. Could you clarify what needs to be checked?

Collaborator

I guess checking that all ready queues are empty would ensure that no backward is currently running.

Author

My previous comment wasn't entirely correct. I think it's okay for a backward to be running if it's not from a reentrant autograd, so the queue may not be empty. Should we only process the shutdown task in those cases?

Collaborator

What should be the behavior if the Engine is destroyed while a backward is running? Wait for it to finish? Or interrupt it?
I guess that will change the priority of the task you add: minimal if you want to wait for it to finish, and maximal otherwise.
In both cases, I would say that another shutdown task should be queued if !graph_task: inside reentrant autograd, (optionally) stop the current processing and notify the other thread_main that it should exit.
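
To make that hand-off concrete, here is a tiny standalone sketch (the std::deque and the boolean flag are illustrative stand-ins for the engine's ReadyQueue and the reentrant-call distinction, not the real types):

    #include <deque>

    struct Task {
      bool isShutdown = false;
    };

    // `reentrant` stands in for a nested thread_main call made by reentrant
    // autograd, as opposed to the worker's outermost loop.
    void thread_main(std::deque<Task>& queue, bool reentrant) {
      while (!queue.empty()) {
        Task t = queue.front();
        queue.pop_front();
        if (t.isShutdown) {
          if (reentrant) {
            // The nested call caught the shutdown: put it back so the outer
            // thread_main (which keeps the thread alive) also sees it and exits.
            queue.push_back(t);
            return;
          }
          break;  // outermost call: the worker thread can now exit
        }
        // ... execute t ...
      }
    }

    int main() {
      std::deque<Task> queue;
      queue.push_back(Task{true});
      thread_main(queue, /*reentrant=*/true);   // re-queues the shutdown task
      thread_main(queue, /*reentrant=*/false);  // outer call exits
    }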

Contributor

@albanD My hope was to pretend this problem didn't exist :)

I think the correct thing to do is to stop executing work as soon as possible, throwing exceptions if necessary to speed things up. We just have to be careful to catch these exceptions properly and then swallow them (don't report errors to users here). If you queue another task on the queue, you'll wait for the task which made the reentrant backwards call to finish executing the rest of its stuff before shutting down; if we are actually going to fix this edge case, let's fix it properly :)
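
As a rough, standalone illustration of that idea (not something this PR implements): work could periodically check a shutdown flag, throw a dedicated exception, and the worker loop would catch and swallow it rather than surface it to the user. All names below are hypothetical:

    #include <atomic>
    #include <exception>

    std::atomic<bool> engine_shutting_down{false};

    // Hypothetical exception type used only to unwind workers during shutdown.
    struct WorkerShutdown : std::exception {};

    void run_one_task() {
      // Long-running work checks the flag and bails out early.
      if (engine_shutting_down.load()) {
        throw WorkerShutdown();
      }
      // ... do the actual work ...
    }

    void worker_loop() {
      for (;;) {
        try {
          run_one_task();
        } catch (const WorkerShutdown&) {
          // Expected during teardown: swallow it, never report it as a user error.
          break;
        }
      }
    }

    int main() {
      engine_shutting_down = true;
      worker_loop();
    }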

Collaborator

In that case what happens is that the "parent" thread_main is stuck in its evaluate_function.
The "child" thread_main is executing stuff.
So in the reentrant case, the "child" will always get the shutdown task first.
That will interrupt it and force it to exit early.
Now the "parent" exits its evaluate_function and will start running other stuff.
If another shutdown task was not queued, it will try to finish its backward and the thread will never be destroyed.

Maybe preventing destruction when a backward is running is too much? Or could be done as future work if it is actually needed.
Checking that all queues are empty before queuing the shutdown task seems ok to me, and we could raise an error saying that the Engine cannot be destroyed while it's running stuff. That would be simpler.
@ezyang does that sound ok to you?

Contributor

Yes, SGTM. Actually I wouldn't raise an error, because I am not sure if we run destructors when you Ctrl-C a PyTorch process (I don't know exactly what Python's signal handler does...)
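
A standalone sketch of the shape being agreed on here: check, best-effort, that nothing is queued, then push a shutdown task to every ready queue so each worker's loop exits. Everything below uses only the standard library; Task, ReadyQueue, Engine, and the logging line are stand-ins for the engine's real types, not the actual implementation:

    #include <condition_variable>
    #include <deque>
    #include <iostream>
    #include <memory>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct Task {
      bool isShutdown = false;
    };

    class ReadyQueue {
     public:
      void push(Task t) {
        {
          std::lock_guard<std::mutex> lock(mutex_);
          queue_.push_back(t);
        }
        cv_.notify_one();
      }
      Task pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return !queue_.empty(); });
        Task t = queue_.front();
        queue_.pop_front();
        return t;
      }
      bool empty() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.empty();
      }
     private:
      std::mutex mutex_;
      std::condition_variable cv_;
      std::deque<Task> queue_;
    };

    struct Engine {
      std::vector<std::unique_ptr<ReadyQueue>> ready_queues_;
      std::vector<std::thread> workers_;

      explicit Engine(int num_workers) {
        for (int i = 0; i < num_workers; ++i) {
          ready_queues_.emplace_back(new ReadyQueue());
          ReadyQueue* q = ready_queues_.back().get();
          workers_.emplace_back([q] {
            for (;;) {
              Task t = q->pop();
              if (t.isShutdown) {
                std::cerr << "worker shutting down\n";  // stands in for C10_LOG_API_USAGE_ONCE
                break;
              }
              // ... run the task ...
            }
          });
        }
      }

      ~Engine() {
        // Best-effort check that no backward is running (all queues drained).
        bool idle = true;
        for (auto& q : ready_queues_) {
          if (!q->empty()) { idle = false; break; }
        }
        if (idle) {
          for (auto& q : ready_queues_) {
            q->push(Task{/*isShutdown=*/true});
          }
          for (auto& w : workers_) {
            w.join();  // threads are no longer leaked on exit
          }
        } else {
          for (auto& w : workers_) {
            w.detach();  // fall back to the old behaviour and leak the threads
          }
        }
      }
    };

    int main() {
      Engine engine(2);
    }  // destructor shuts both workers down

The join/detach handling above exists only to make the model self-contained; the fragments quoted further down show the actual check-and-push shape used in the PR.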

Contributor

I'm pretty sure Python catches Ctrl-C, turns it into a KeyboardInterrupt and then terminates the program "cleanly" (in the sense that the destructors will always run). I have no idea which thread will be responsible for the cleanup though...

Contributor

It wouldn't surprise me if all of this handling happened on the main thread.

mal added 4 commits June 6, 2019 16:14
Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
@ezyang
Contributor

ezyang commented Jun 7, 2019

Don't forget to dismiss reviews when you want to get a new review ;)

@ezyang
Contributor

ezyang commented Jun 7, 2019 via email

Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
@malvika2147 malvika2147 dismissed albanD’s stale review June 7, 2019 18:07

made changes

Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
env=env)
return pipes.communicate()[1].decode('ascii')

@unittest.skipIf(IS_WINDOWS, "Skip for windows")
Contributor

Prefer to mention exactly why you are skipping on Windows (in this case, it doesn't work; it might be worth filing a bug for it and mentioning it here).

stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
env=env)
return pipes.communicate()[1].decode('ascii')
Contributor

Since this got copy-pasted from test_logging.py, maybe it should get moved to the common test code? That way, we don't have to copy paste. It would also be a good opportunity to name the function appropriately to mention that it's all about PYTORCH_API_USAGE_STDERR.

Engine::~Engine() {
bool noBackward = true;
for (auto& queue: ready_queues_) {
std::lock_guard<std::mutex> lock(queue->mutex_);
Contributor

Taking out a lock on a mutex in a destructor makes me nervous for a few reasons:

  1. Taking out a lock could throw an exception. Throwing exceptions in a destructor is bad juju, because you won't actually get an exception; you'll just forcibly terminate the program.
  2. It could lead to a deadlock at program shutdown (no one's idea of a happy time). I think in this case it's not possible, because the scope in which we acquire mutex_ is not very large and we don't ever take out other locks.

It seems to me that all we need is a "best effort" test to see if we're running a backwards or not. Perhaps a single, global atomic counter on Engine itself would be good enough? Increment it when you start a backwards, decrement it when you finish (probably want to do this with RAII to make sure exceptions are handled correctly). Then look at that counter to decide if a backwards is running or not.
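
A minimal standalone sketch of that counter-plus-RAII idea (the names are illustrative, not part of the PR):

    #include <atomic>
    #include <cassert>

    // Global count of backward passes currently executing.
    std::atomic<int> backwards_in_flight{0};

    // RAII guard: increments on entry and decrements on every exit path,
    // including when an exception propagates out of the backward pass.
    struct BackwardScope {
      BackwardScope()  { backwards_in_flight.fetch_add(1, std::memory_order_relaxed); }
      ~BackwardScope() { backwards_in_flight.fetch_sub(1, std::memory_order_relaxed); }
    };

    void execute_backward() {
      BackwardScope scope;
      // ... run the backward pass; may throw ...
    }

    // Best-effort check the destructor can make without taking any lock.
    bool safe_to_shut_down() {
      return backwards_in_flight.load(std::memory_order_relaxed) == 0;
    }

    int main() {
      execute_backward();
      assert(safe_to_shut_down());
    }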

Contributor

Why do we take the lock at every loop iteration instead of acquiring it once before the loop?

Contributor

Also, are we sure that the destruction is guaranteed not to run on any of the worker threads? Like, what happens if one of them calls os.exit? Wouldn't that lead to case 2?

Contributor

The lock has to be taken out on each iteration of the loop, because we don't have a global lock; it's a per-ready-queue lock, and so there isn't "one" lock to take out.

Also, are we sure that the destruction is guaranteed not to run on any of the worker threads? Like, what happens if one of them calls os.exit?

I think you're right. If a worker thread calls std::exit, that thread will handle destructing static objects, and we'd have to detect that case in the destructor.

This is all a bit moot right now, because this PR got disabled again when we landed the fix for infinite recursion. But if we do revive this patch it would be good to get this part right too.
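
One way to detect that case (purely a sketch, not something this PR does) is a thread_local flag that each worker sets when it starts, which the destructor can consult:

    #include <thread>

    // Set to true at the top of each worker's thread_main.
    thread_local bool is_engine_worker = false;

    void thread_main() {
      is_engine_worker = true;
      // ... worker loop ...
    }

    struct Engine {
      ~Engine() {
        if (is_engine_worker) {
          // Static destruction is running on one of our own workers
          // (e.g. std::exit was called from that thread). Waiting for the
          // workers here could deadlock, so skip the shutdown handshake.
          return;
        }
        // ... normal shutdown path: check queues, push shutdown tasks ...
      }
    };

    int main() {
      Engine engine;
      std::thread worker(thread_main);
      worker.join();
    }  // Engine destructor runs on the main thread here, so the flag is false.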

Contributor

@ezyang ezyang left a comment

Most of the comments are nits, but let's try not to take out a lock in the destructor.

Contributor

@ezyang ezyang left a comment

@malvika2147 convinced me that we can't get rid of the lock, because we have to push messages onto the queues and those require locking. So I rescind that comment; only nits left.

Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
@pytorchbot pytorchbot added the module: tests Issues related to tests (not the torch.testing module) label Jun 7, 2019
Don't leak threads on exit

gh-metadata: pytorch pytorch 21438 gh/mal2147/2/head
@zou3519 zou3519 deleted the gh/mal2147/2/head branch June 10, 2019 16:16
@facebook-github-bot
Contributor

This pull request has been merged in f308b07.


queue->pushShutdownTask();
}
}
// Otherwise threads are leaked
Contributor

What was the issue with leaking threads? This is only a best-effort measure anyway, right?

