[FR] torch context with default device & dtype #27878
Comments
after one framework finally got rid of a global context after 3 years, are you saying the competing framework should add one? :P
:D a different kind of context
this sounds like a promising API -- there's probably some implementation complexity behind it, but we'll remove "needs research" as it looks acceptable.
Unclear if this issue is still actually active, but it was pointed at recently as if it were, and I'd like to point out that #2 above would require some magic to make work properly.
@elistevens Yeah I think you are right. #2 doesn't work normally, unfortunately.
Bumping priority due to activity on this issue, the parent issue, and also other issues.
Wouldn't a context manager be better suited than a global context?
@malfet Why not have the object do both? It is not always desirable to wrap the entire program in a context manager.
Removing "high priority" label since this feature request still need some work in defining the user facing API and how it should work with existing code. |
Context manager doesn't give you desirable behavior, as it will also affect library code that allocates tensors. A lot of the resistance against a global "set default device" API is because it will make it difficult for library authors to write code in a way that will work no matter what the default device is. A module-like torch context object bypasses this problem as the device defaulting is lexical. This reminds me, though, that in the proposal above, Module creation isn't done using the torch context explicitly; some amount of dynamic scoping seems necessary there. So this proposal, unfortunately, isn't complete.
Oh, actually, module creation is done using the torch context, because you say ...
@ezyang Yeah... that could work. But in the usual case where one has a separate module containing the module definition, it seems hard to make the context device/dtype configurable (because that separate module just uses the regular `torch.nn`).
You have to solve the problem that we don't actually support direct on-device Module creation. Supposing you add some explicit API for doing this, then you can just write some Python magic to sniff for nn modules and partially apply them with the device arguments. You still avoid dynamic scoping in this case.
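A minimal sketch of what that could look like, assuming layers accept the `device=` factory kwarg (available since PyTorch 1.9); `TorchCtx` and `_CtxNN` are invented names for illustration, not a real or proposed API:

```python
import functools

import torch
import torch.nn as nn


class _CtxNN:
    """Re-exports torch.nn, partially applying `device` to Module constructors."""

    def __init__(self, device):
        self._device = device

    def __getattr__(self, name):
        obj = getattr(nn, name)
        if isinstance(obj, type) and issubclass(obj, nn.Module):
            # Naive: assumes the layer's constructor accepts a `device` kwarg.
            return functools.partial(obj, device=self._device)
        return obj


class TorchCtx:
    """Module-like object whose factory functions default to a device/dtype."""

    _FACTORIES = {"tensor", "zeros", "ones", "empty", "randn"}

    def __init__(self, device, dtype=None):
        self.device = torch.device(device)
        self.dtype = dtype
        self.nn = _CtxNN(self.device)

    def __getattr__(self, name):
        fn = getattr(torch, name)
        if name in self._FACTORIES:
            return functools.partial(fn, device=self.device, dtype=self.dtype)
        return fn


ctx = TorchCtx("cuda" if torch.cuda.is_available() else "cpu")
x = ctx.zeros(2, 3)            # allocated on the context's device
layer = ctx.nn.Linear(3, 4)    # parameters created directly on that device
print(x.device, layer.weight.device)
```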
Hmm I don't think I understand. What I am thinking is a scenario like the following:
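The snippet that followed is not preserved here; a plausible reconstruction of the kind of scenario being described (file names and the model are illustrative):

```python
# model.py -- a reusable model definition that only knows the plain torch namespace
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(16, 4)   # plain nn.Linear: unaware of any torch_ctx

# train.py -- wants the parameters on a chosen device/dtype, but MyModel above
# uses the regular nn.Linear, so a purely lexical torch_ctx cannot reach it.
model = MyModel()
print(model.layer.weight.device)        # cpu, regardless of any context in train.py
```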
Oh, maybe this comment of yours is referring to something like ...
Python modules are singletons, so it's a little unclear what ... would mean. I guess that's the yak that needed shaving, and the same applies to ...
In my opinion, something like this would be helpful. My code uses a lot of tensor constructors like ... @ezyang Could those libraries not use their own nested context?
Bumping up this thread. We (PyTorch/XLA) have recently been testing large models on TPU clusters. One issue we run into is that, by default, model initialization happens on CPU; with large models it is really easy to OOM the host memory. If we could specify the default device with a context (so we can init model weights on TPU directly), that would be really helpful.
check out #82296 (comment)
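For reference, a minimal sketch using the device-defaulting APIs available in recent PyTorch releases (assuming PyTorch ≥ 2.0, where `torch.set_default_device` exists and device objects work as context managers), which avoids materializing large weights on the host:

```python
# Assumes PyTorch >= 2.0.
import torch
import torch.nn as nn

with torch.device("meta"):        # build module structure without allocating real storage
    big = nn.Linear(50_000, 50_000)
print(big.weight.device)          # meta

torch.set_default_device("cpu")   # process-wide default for factory functions
x = torch.zeros(3)                # created on the current default device
print(x.device)                   # cpu
```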
Fixes #82296 Fixes #27878 Fixes #260. Open to bikeshedding the module name. Open to bikeshedding if any of the identifiers should get reexported from the torch top level. Open to bikeshedding the global setter API name / location. Tests coming later. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Pull Request resolved: #91525
🚀 Feature
There has long been a need to set a default device. This FR proposes an API that is compatible with 3rd-party libs.
Motivation
There has been a lot of discussion around a default device flag in PyTorch (#7535). Yet implementing such an API has been mostly blocked by the concern that 3rd-party libraries may assume that tensors are created on CPU.
A similar dilemma has also been seen in the Python `multiprocessing` library, where multiple start methods can be used (triggering bugs like librosa/librosa#747). They came up with the API `multiprocessing.get_context(xxx)`, which returns a context object with the same set of functions as the `multiprocessing` module, but associated with a different start method, enabling patterns like `mp = mp.get_context('spawn')`.
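For reference, the standard-library pattern this FR is modeled on, shown as a small runnable example:

```python
# multiprocessing.get_context() returns a module-like context bound to a start method.
import multiprocessing

def work(q):
    q.put("hello from child")

if __name__ == "__main__":
    mp = multiprocessing.get_context("spawn")  # same API as the module, different start method
    q = mp.Queue()
    p = mp.Process(target=work, args=(q,))
    p.start()
    print(q.get())
    p.join()
```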
In addition to tensor creation, for many users (including me) it is extremely verbose to write `.to(device)` for every tensor yielded from the data loader. It would be very handy if this API handled the moving automatically as well.
Pitch
- `torch_ctx.utils.data.DataLoader` s.t. the yielded samples contain tensors moved to the device of `torch_ctx`.
- `torch_ctx.load(xxx)` works the same way as with `map_location` (sketched below).
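A minimal runnable sketch of the behavior these two items ask for, built from existing PyTorch APIs (the `torch_ctx` object itself remains hypothetical; `device` stands in for the device such a context would carry):

```python
import io

import torch
from torch.utils.data import DataLoader, TensorDataset, default_collate  # default_collate: PyTorch >= 1.11

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A DataLoader whose yielded batches already live on the context's device.
dataset = TensorDataset(torch.randn(100, 16), torch.randint(0, 4, (100,)))
loader = DataLoader(
    dataset, batch_size=32,
    collate_fn=lambda batch: tuple(t.to(device) for t in default_collate(batch)),
)
for xb, yb in loader:
    assert xb.device.type == device.type and yb.device.type == device.type

# Loading a checkpoint onto the context's device, i.e. what map_location does today.
buf = io.BytesIO()
torch.save({"w": torch.randn(4)}, buf)
buf.seek(0)
state = torch.load(buf, map_location=device)
print(state["w"].device)
```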
Concerns
Such features always have the potential issue of making things too "frictionless", obscure, and harder to debug. But I think this one is not too bad.
cc @ezyang @gchanan @zou3519 @bdhirsh @heitorschueroff @ngimel @ejguan