Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unified management of thread local variables #28520

Open
albanD opened this issue Oct 23, 2019 · 12 comments
Open

Unified management of thread local variables #28520

albanD opened this issue Oct 23, 2019 · 12 comments
Labels
feature A request for a proper, new feature. high priority module: internals Related to internal abstractions in c10 and ATen module: multithreading Related to issues that occur when running on multiple CPU threads module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@albanD
Copy link
Collaborator

albanD commented Oct 23, 2019

We should add a centralized thread local handling (in c10).
This would have two main advantages:

  • Make the code cleaner as all thread local elements can easily be identified and worked with.
  • Allow improvement for example in at::parallel_for where it will be able to handle any Tensor operation within the loop.

(Setting high priority only to be discussed for triage)

cc @ezyang @gchanan @zou3519 @jerryzh168 @ssnl @albanD @gqchen @VitalyFedyunin @ngimel @mruberry

@albanD albanD added feature A request for a proper, new feature. triage review high priority labels Oct 23, 2019
@gchanan gchanan added triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module module: multithreading Related to issues that occur when running on multiple CPU threads module: performance Issues related to performance, either of kernel code or framework glue module: autograd Related to torch.autograd, and the autograd engine in general module: internals Related to internal abstractions in c10 and ATen and removed triage review module: autograd Related to torch.autograd, and the autograd engine in general labels Oct 28, 2019
@v0dro
Copy link
Contributor

v0dro commented Dec 14, 2019

@albanD can you provide some samples of what exactly you want solved with this issue and approaches that you might have in mind?

@albanD
Copy link
Collaborator Author

albanD commented Dec 16, 2019

This is discussed in #28370 for example.
The problem is that both our no grad logic and no variable type logic are thread local. And that leads to unsafe behavior in the current codebase: #28980
Beyond fixing the current behavior, we want to actually allow the use of Tensors in such constructs to enable use cases like this one for example.

@zdevito
Copy link
Contributor

zdevito commented Mar 10, 2020

We probably want something like:

unique_ptr<SavedThreadLocalState> saveThreadLocalState();
restoreThreadLocalState(const SavedThreadLocalState&);

#34360 is an instance that needs this centralized state.

@ilia-cher
Copy link
Contributor

ilia-cher commented Mar 10, 2020

it seems we already basically have this, but in a form of ThreadLocalDebugInfoBase and DebugInfoGuard, the name though is misleading - it doesn't have to be debug info; we can extend this as we want and we also already take care of fork and autograd threads. I'd suggest:

  1. refactor DebugInfoGuard into ThreadLocalStateGuard (and so on)
  2. there's been a discussion on passing extra bits of info along model execution and allowing a python interface (context manager) to do it from python
  3. no_grad context manager may as well be rewritten to use this thread local (+propagated across threads) struct; at the moment it simply uses https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/core/grad_mode.cpp#L13
  4. I think we may find more cases of thread local vars that can be included within this struct
  5. add propagation to the threads that execute at::parallel_for (straightforward, similarly to how we pass it into 'fork')

@albanD
Copy link
Collaborator Author

albanD commented Mar 10, 2020

For 4. are the thread_local associated to the dispatcher already there? Otherwise, these ones should be added as well.

@ngimel
Copy link
Collaborator

ngimel commented Mar 11, 2020

No, dispatcher keys are not there yet.

@peterbell10
Copy link
Collaborator

It looks like ThreadLocalDebugInfo only supports one struct of values for the whole program. I think it should be possible to have a ThreadLocal<T> class that automatically stores everything in one global context e.g.

ThreadLocal<bool> GradMode_enabled = true;

Thoughts?

@ezyang
Copy link
Contributor

ezyang commented Mar 30, 2020

@peterbell10 I don't think we have any requirement for extensible thread local state, in which case hard coding the struct is simpler and more efficient. Do you have a use case in mind?

@peterbell10
Copy link
Collaborator

The original PR that added ThreadLocalDebugInfo (#22365) uses it for a test case. #35523 seems to support that by holding a linked list of thread local states with a DebugInfoKind key value.

@ilia-cher
Copy link
Contributor

#35523 addresses this, it cleans up our internal interfaces and propagates thread locals (e.g. grad_mode, dispatch key) as well as thread local debug info which is set by the user

@ilia-cher
Copy link
Contributor

@peterbell10 I don't think we have any requirement for extensible thread local state, in which case hard coding the struct is simpler and more efficient. Do you have a use case in mind?

@ezyang we already have a use case for this in prod, e.g. passing user_id from the user down to logging operator observers

@mattip
Copy link
Collaborator

mattip commented May 21, 2020

#35523 addresses this

gh-35523 is merged. @ilia-cher will you be continuing to work on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A request for a proper, new feature. high priority module: internals Related to internal abstractions in c10 and ATen module: multithreading Related to issues that occur when running on multiple CPU threads module: performance Issues related to performance, either of kernel code or framework glue triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

No branches or pull requests

9 participants