RuntimeError: Placeholder storage has not been allocated on MPS device! #90440
I don't think this is a bug in PyTorch! You haven't allocated your torch.zeros to the device in the forward pass. If you do that, it runs, at least for me.
That’s indeed correct. We see these errors when not all of the tensors are mapped to the device. There were also some bugs in the LSTM layer that were fixed in the 2.0 release. I would recommend @collindbell try the latest release with macOS 13.3.
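For reference, a minimal sketch of what "mapping everything to the device" looks like in practice (the layer sizes and shapes here are made up, not from the original report):

    import torch
    import torch.nn as nn

    # Pick MPS when it is available, otherwise fall back to the CPU.
    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    lstm = nn.LSTM(input_size=16, hidden_size=32, num_layers=2, batch_first=True).to(device)
    fc = nn.Linear(32, 4).to(device)
    x = torch.randn(8, 10, 16, device=device)  # inputs on the same device as the parameters

    out, _ = lstm(x)            # the default zero hidden state is created on x's device
    logits = fc(out[:, -1, :])  # no device mismatch, so no "Placeholder storage" error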
Dear Kulin,
Thank you! He needs to normalise his training loss too, but ...
The error that is of most interest to me right now is the
*"aten::empty.memory_format"* that seems to arise whenever I switch a model
that works perfectly well (but slowly) on the CPU (Intel i9, 8-core;
running *whisper*) to the "mps" device (*AMD Radeon Pro 5500M*). It throws
an error somewhere reported as lying around lines 1143-45 of the *module.py*
buried deep in the anaconda3/lib/ .../site-packages/ storage, but there is
no reference to this "*aten*" process anywhere in that file. My supposition
is that I have done something stupid/ignorant in innocence, or there is
some kind of hardware (memory) limitation, but nobody seems to have an
answer. It may be to do with *SparseMPS*, but I frankly doubt it.
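Incidentally, I gather that when an operator genuinely isn't implemented for MPS, PyTorch can be asked to fall back to the CPU for just that operator via an environment variable; whether that applies to this particular aten::empty error is only my assumption, but it seems a cheap thing to try:

    # Set before torch initialises MPS -- simplest in the shell that launches the script:
    #   PYTORCH_ENABLE_MPS_FALLBACK=1 python your_script.py
    # or at the very top of the script, before importing torch:
    import os
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    import torch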
So I started reading and wonder whether this hypothesis has any value: this
error arises for me running whisper, and I am loading the models
programmatically, but when I am using the "mps" device it will try to load
them *using the mps*, and if there is some element of quantization involved
in this, that will create an error because *quantization is supposed to be
done on the CPU*. The fact that I am trying to run these models on a 16GB
machine running macOS Ventura 13.3 makes me wonder whether there is some kind of "*automatic
quantization optimization*" going on that tries to make the models smaller
because I don't have enough RAM.
I hesitated to post this because I don't know enough about quantization, or
indeed anything else, but is it possible that this is the issue and that it
is somehow registering as a different error couched in terms of
"aten::...." because the code at module.py 1143-45 looks more like a
quantization process than some sort of backprop differentiation?
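One way I can think of to test the hypothesis (a sketch only, where model stands for the already-loaded whisper model, whatever it is called in my code) is to look at the parameter dtypes and devices, since any quantization or stray CPU tensor ought to show up there:

    from collections import Counter

    # `model` stands for the already-loaded whisper model (the name is illustrative).
    # Automatic quantization would show up as non-float dtypes (e.g. torch.qint8),
    # and anything left behind on the CPU would show up as 'cpu' in the device counts.
    print(Counter(p.dtype for p in model.parameters()))
    print(Counter(p.device.type for p in model.parameters()))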
If this has enough legs to merit being posted as an issue, I'll happily do
so, but my default assumption is that I am an idiot!
Apologies if this just confirms my stupidity!
Best,
John
…On Wed, Apr 12, 2023 at 2:20 PM Kulin Seth ***@***.***> wrote:
I don't think this is a bug in PyTorch! You haven't allocated your
torch.zeros to the device in the forward pass. If you do that, it runs, at
least for me.

    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[:, -1, :])
        return out
I fixed it by specifying the device when creating each network.
I ran into a similar issue and got the same error message "RuntimeError: Placeholder storage has not been allocated on MPS device!" when using an LSTM model. All tensors and the model were correctly mapped to the device in the code. However, my code worked fine when I updated torch to version 2.2.1 (I was using version 2.0.1). I have an M1 Mac and was using the "mps" device. Before I updated torch, I tried running on the "cpu" device, and the output tensor from the forward pass contained NaN. I didn't look into it, since the issue is fixed in the latest versions of torch :)
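For anyone comparing setups, a quick environment check along these lines can rule out an old torch build or a missing MPS backend before chasing placement bugs:

    import torch

    print(torch.__version__)                  # e.g. the 2.0.x releases had known MPS/LSTM bugs
    print(torch.backends.mps.is_available())  # True when MPS can actually be used on this machine
    print(torch.backends.mps.is_built())      # True when this torch build includes MPS support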
🐛 Describe the bug
I get an error every time I attempt to use MPS to train a model on my M1 Mac. The error occurs at the first training step (so the first call of model(x)). MRE: I receive the following traceback:
Versions
Also note, if relevant, that I'm running macOS 13.0. I have also tried this on the 1.13 stable release, with the same issue.
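The failure is of roughly this shape (an illustrative sketch rather than my actual MRE; the tiny model is a stand-in):

    import torch
    import torch.nn as nn

    device = torch.device("mps")

    model = nn.Linear(4, 2).to(device)  # parameters live on the MPS device
    x = torch.randn(8, 4)               # ...but this input was left on the CPU

    out = model(x)  # RuntimeError: Placeholder storage has not been allocated on MPS device!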
cc @kulinseth @albanD @malfet @DenisVieriu97 @razarmehr @abhudev