[QUESTION] LossModule's functional parameters duplicate weights in memory #1769
Comments
I am confused by this deepcopy of the weights. It seems that after just one gradient update, the […]
Thanks for posting this!
Not really, since we always call the stored module with a functional call! I hope that clarifies things!
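(An illustrative aside, not part of the original reply: "functional call" here means the forward pass receives its weights from an external container rather than from the module's own attributes, roughly in the spirit of `torch.func.functional_call`. The exact mechanism TorchRL uses may differ.)

```python
import torch
from torch import nn

# Rough sketch of a functional call: the parameters live in a separate dict,
# and the stored module is only a "shell" that describes the computation.
module = nn.Linear(4, 3)
params = {name: p.detach().clone() for name, p in module.named_parameters()}

x = torch.randn(2, 4)
out = torch.func.functional_call(module, params, (x,))  # uses `params`, not the module's own weights
print(out.shape)  # torch.Size([2, 3])
```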
Thanks, @vmoens! That clarification about the context manager is helpful 🙏 However, can you confirm whether my understanding is correct that, as a result of the […]? At least when I instantiate a […]
No, a tensor on the "meta" device has no content, so it has (approximately) zero memory footprint. If the memory increases by a factor of 2x there must be an issue somewhere; this isn't the intended behaviour (it's a bug).
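(An illustrative sketch of the "meta" device behavior described above; not from the original thread.)

```python
import torch
from torch import nn

# A "meta" tensor stores only shape/dtype metadata, so even a huge tensor
# allocates essentially no memory for its elements.
t = torch.empty(10_000, 10_000, device="meta")
print(t.is_meta, t.shape)  # True torch.Size([10000, 10000])

# Moving a module to "meta" likewise strips its parameters of real storage.
net = nn.Linear(4, 3).to("meta")
print(all(p.is_meta for p in net.parameters()))  # True
```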
Ok, that's what I thought as well re: "meta" device behavior. I can very clearly see the memory footprint double when the […]
Nope, I will have a look and push a patch!
Do you have any way to check that the memory doubles?

```python
from torchrl.modules import MLP, QValueActor
from torchrl.data import OneHotDiscreteTensorSpec
from torchrl.objectives import DQNLoss

n_obs, n_act = 4, 3
value_net = MLP(in_features=n_obs, out_features=n_act)
spec = OneHotDiscreteTensorSpec(n_act)
actor = QValueActor(value_net, in_keys=["observation"], action_space=spec)
loss = DQNLoss(actor, action_space=spec)
list(loss.value_network.parameters())
```
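(One crude way to check, assuming the snippet above has been run; the helper below is my own sketch, not TorchRL API. It counts only materialized tensors, since parameters left on the "meta" device occupy no memory.)

```python
def tensor_bytes(module):
    # Sum the byte sizes of materialized parameters and buffers; tensors on
    # the "meta" device have no storage and are skipped.
    tensors = list(module.parameters()) + list(module.buffers())
    return sum(t.numel() * t.element_size() for t in tensors if not t.is_meta)

print(tensor_bytes(actor))  # footprint of the original actor
print(tensor_bytes(loss))   # footprint held by the loss module
```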
Are you sure the memory footprint isn't twice as big because of the target parameters?
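(For context, one way to inspect those, assuming attribute names of the form `<network>_params` / `target_<network>_params`; these may vary between TorchRL versions.)

```python
# Attribute names are an assumption and may differ between versions.
print(loss.value_network_params)         # functional parameters used by the loss
print(loss.target_value_network_params)  # delayed/target copy, present when delay_value=True
```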
I was unable to reproduce this with some simplified code (e.g., the example scripts), and have determined that my full code contains some other, non-parameter tensor attributes on my […]
Got it. We could design an ad-hoc strategy to avoid deep-copying the non-parameter, non-buffer tensors, but I'm not sure that this is what users will want (I can imagine that some users will want them to be copied, others not). If you feel this deepcopy is causing trouble in your use case, I'd be happy to look at an adequate solution.
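(To make the failure mode concrete, here is a minimal sketch of the behavior being discussed, not code from the thread: any tensor stored as a plain attribute on the wrapped module is duplicated by `deepcopy` along with the parameters.)

```python
import copy
import torch
from torch import nn

net = nn.Linear(4, 3)
net.big_table = torch.zeros(2_000, 2_000)  # plain (non-parameter, non-buffer) tensor attribute

clone = copy.deepcopy(net)
# The plain attribute is duplicated together with the module, doubling its footprint:
print(clone.big_table.data_ptr() != net.big_table.data_ptr())  # True
```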
The `LossModule.convert_to_functional(...)` method creates a deep copy of the parameters. If I understand correctly, this leads to the parameters being duplicated in memory and a larger memory footprint than necessary. Is my understanding correct? If so, why is this necessary? Is there any way for the `LossModule` to simply contain a single reference to the weights of, e.g., its actor and critic `TorchModule`s? This can be seen in the following line:
rl/torchrl/objectives/common.py, line 290 at commit 80c63ad
This is the specific snippet of code containing the deep copy that this question pertains to:
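(The snippet itself did not survive the copy-paste; as a stand-in, the pattern being asked about looks roughly like the sketch below, which is not the actual source of `convert_to_functional`. It reuses the `actor` from the reproduction snippet earlier in the thread.)

```python
import copy
from tensordict import TensorDict

# Rough sketch only: the wrapped module is deep-copied, and its weights are then
# pulled out into a separate, functional parameter structure.
module = copy.deepcopy(actor)  # <- the deepcopy the question is about
params = TensorDict.from_module(module, as_module=True)
```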