RuntimeError: Unknown device when trying to run AlbertForMaskedLM on colab tpu #1909
Comments
From a quick peek, looks like the
AFAICT in this code there are a lot of CPU-default tensor creations/manipulations and Python scalar ops.
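For context (a minimal sketch, not code from the thread): "CPU-default tensor creation" means calling a factory function without a `device` argument, which places the tensor on the CPU and forces host/device transfers when it later meets XLA tensors. The fix is to pass the target device explicitly. The CPU device below is a stand-in so the sketch runs anywhere; on a TPU it would come from `torch_xla.core.xla_model.xla_device()`.

```python
import torch

# Stand-in for torch_xla.core.xla_model.xla_device()
device = torch.device("cpu")

# CPU-default creation: no device argument, the tensor lands on the CPU.
mask_cpu = torch.zeros(1, 8)

# Device-aware creation: the tensor is materialized on the target device,
# so no host-to-device copy is needed later.
mask_dev = torch.zeros(1, 8, device=device)

# Python scalar ops like .item() force a device-to-host sync on XLA,
# so keep them out of hot loops.
total = mask_dev.sum()   # stays a device tensor, no sync
value = total.item()     # explicit sync point, use sparingly
```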
Hi! I'm not sure how to do this through colab -- do I navigate to the directory containing
Hey @goggoloid I got the above running in this Colab notebook. Basically there were other CPU-default tensors, so we needed to call:
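The call itself did not survive extraction, but the general pattern (a sketch using a small stand-in module instead of Albert) is to move both the model and every input tensor to the XLA device; a tensor left behind on the CPU is exactly what produces errors like the `RuntimeError: Unknown device` in the title.

```python
import torch
import torch.nn as nn

# On a Colab TPU this would be:
#   import torch_xla.core.xla_model as xm
#   dev = xm.xla_device()
dev = torch.device("cpu")  # stand-in so the sketch runs anywhere

model = nn.Linear(16, 4).to(dev)        # moves parameters and buffers
inputs = torch.randn(1, 16).to(dev)     # move every input tensor too

# Any tensor still on the CPU (e.g. a buffer created inside the model
# without a device argument) will clash with XLA tensors at this point.
output = model(inputs)
```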
Also, Albert currently uses
FYI, we also have a TPU GLUE runner in Hugging Face Transformers now too: https://github.com/huggingface/transformers/blob/fix-jit-tpu/examples/run_tpu_glue.py
We'll add one for LM finetuning soon too.
@jysohn23 That is not the optimal fix. For models where the parameters logic is a bit tricky, I think there is an
Thanks everyone!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Dear @dlibenzi @jysohn23 @goggoloid @pietern @ezyang, I was trying to convert this PyTorch GPU code to PyTorch/XLA: https://www.kaggle.com/khyeh0719/pytorch-efficientnet-baseline-train-amp-aug My updated dataloader, train_one_epoch and valid_one_epoch code looks like this:
Then I designed the train_model() function like this:
Then, when I try to start the training process using this code block:
I get this error:
I need help with two things, please:
xm.xrt_world_size() prints 1, and when I call xm.get_xla_supported_devices() I get: ['xla:1', 'xla:2', 'xla:3', 'xla:4', 'xla:5', 'xla:6', 'xla:7', 'xla:8']
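The converted code blocks did not survive extraction, so here is a rough, device-agnostic sketch of the kind of training loop being converted (the CPU device is a stand-in; on a TPU you would get the device from `xm.xla_device()`, wrap the loader in `torch_xla.distributed.parallel_loader.ParallelLoader`, and replace `optimizer.step()` with `xm.optimizer_step(optimizer)`):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train_one_epoch(model, loader, optimizer, loss_fn, device):
    model.train()
    running = 0.0
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)  # move each batch to the device
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        # On TPU: xm.optimizer_step(optimizer), which also marks the XLA step.
        optimizer.step()
        running += loss.item()  # .item() syncs; acceptable once per batch
    return running / len(loader)

# CPU stand-in; on TPU: device = xm.xla_device()
device = torch.device("cpu")
model = nn.Linear(8, 2).to(device)
data = TensorDataset(torch.randn(32, 8), torch.randint(0, 2, (32,)))
loader = DataLoader(data, batch_size=8)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
avg_loss = train_one_epoch(model, loader, opt, nn.CrossEntropyLoss(), device)
```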
Hi,
I am running the following code on Colab, taken from the example here: https://huggingface.co/transformers/model_doc/albert.html#albertformaskedlm
I haven't done anything to the example code except move input_ids and model onto the TPU device using .to(dev). It seems everything is moved to the TPU without a problem, as when I inspect data I get the following output:
tensor([[ 2, 10975, 15, 51, 1952, 25, 10901, 3]], device='xla:1')
However, when I run this code I get the following error:
Anyone know what's going on?