Proper way to install now on Colab/Linux, also "squeeze" gradient not implemented? #528
Sorry about that. Please look at these for reference for the Colab (all at commit c79202d):

- xla/test/test_train_mnist_tensor.py, line 84

We have two ways of operation:

- xla/test/test_train_mnist_tensor.py, lines 91 and 145
- xla/test/test_train_imagenet.py, line 87

The latter uses the PT JIT, and quite a few operations are not supported.
We will have more frequent Cloud TPU (and Colab) TF releases in the future, so catching up with latest PT/XLA developments will be easier.
To use that Colab, you need to use the "old" JIT version, but you may find missing operators when trying to add new models. Yes, MKL needs to be pinned until the PT side fixes the source code to stop using the deprecated APIs (see kokoro/ubuntu/common.sh, line 65 at commit c79202d).
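A minimal sketch of what that pinning looks like when installing the Colab wheel. The exact MKL version to pin is whatever kokoro/ubuntu/common.sh specifies; `2019.0` below is only a placeholder, not the confirmed value:

```shell
# Pin MKL to an older release BEFORE installing the torch wheel;
# newer MKL drops the deprecated APIs that the current PT source still calls.
# NOTE: "2019.0" is a placeholder -- use the version from kokoro/ubuntu/common.sh.
pip install "mkl==2019.0"
pip install http://storage.googleapis.com/pytorch-tpu-releases/tf-1.13/torch-1.0.0a0+1d94a2b-cp36-cp36m-linux_x86_64.whl
```

If MKL is installed after the wheel (or left unpinned), `import torch` fails at load time with an MKL symbol error, which matches the behavior described above.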
That is a GCC ICE (internal compiler error), unfortunately; see the build notes in https://github.com/pytorch/xla/blob/master/README.md. The Bazel thing is just a warning; we get it as well.
Closing as it should all work now; feel free to reopen if you have follow-up questions.
I'm aware this project is still under active development and not everything is ready yet, but I'd like some quick feedback on where things stand. I've dug through the code and tried a couple of different things.
I first saw the Colab code (https://github.com/pytorch/xla/blob/master/contrib/colab/PyTorch_TPU_XRT_1_13.ipynb) and tried it on Colab. I saw some discrepancy between the code there and code elsewhere. I'm guessing the "train" method is no longer wrapped under XlaModel?
More importantly, the code there (MNIST) works, but as soon as I try it on my own model (a Transformer, BERT), it throws an error:
This is odd, since I then looked through this repo and found squeeze.cpp implemented under xla/torch_xla/csrc/ops/.
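For context on the op in question: squeeze removes size-1 dimensions from a tensor, and BERT-style models hit it frequently via broadcastable attention-mask shapes. A pure-Python sketch of the shape rule a squeeze lowering (forward or gradient) has to implement — this does not use torch_xla itself, it only mirrors `torch.squeeze` semantics:

```python
def squeeze_shape(shape, dim=None):
    """Return the shape produced by squeeze: drop size-1 dimensions.

    If dim is given, only that dimension is dropped, and only when its
    size is 1 (mirroring torch.squeeze: squeezing a non-1 dim is a no-op).
    """
    if dim is None:
        return tuple(s for s in shape if s != 1)
    dim = dim % len(shape)  # normalize negative indices
    if shape[dim] != 1:
        return tuple(shape)  # no-op when the dim is not size 1
    return tuple(s for i, s in enumerate(shape) if i != dim)

# BERT-style attention masks often carry size-1 broadcast dims:
print(squeeze_shape((1, 1, 128, 128)))    # -> (128, 128)
print(squeeze_shape((8, 1, 128), dim=1))  # -> (8, 128)
```

The gradient of squeeze is just the inverse reshape (unsqueeze back to the original shape), which is why a missing gradient lowering shows up only once backward() runs on a model that squeezes.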
So I thought maybe the Colab wheels (http://storage.googleapis.com/pytorch-tpu-releases/tf-1.13/torch-1.0.0a0+1d94a2b-cp36-cp36m-linux_x86_64.whl) are outdated, and I looked under http://storage.googleapis.com/pytorch-tpu-releases/ and found a lot of newer releases, but they target Python 3.5 and I couldn't install them on Colab.
So I then tried installing them on a Linux machine (Google Cloud TPU). Interestingly, I also had to pin MKL to an older version, as in kokoro/ubuntu/common.sh; otherwise importing torch would throw an MKL error. This time, however, I can't even get the Colab code to work. It throws this message:
I'm guessing this is saying I should install from source?
So finally, I tried installing from source following the directions in README.md. I got PyTorch to compile, but got the following message when trying to compile xla:
I also get this Bazel warning; I'm not sure whether it's what's causing the problem:
I'm also a little confused, since the Python build script (setup.py) doesn't seem to use the kokoro/ubuntu/common.sh script. How do you actually build now? Any advice is appreciated. Thank you so much.
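For reference, the from-source sequence being attempted here, paraphrased from the README.md mentioned above. Treat this as an outline, not a verified recipe — repository layout, env vars, and exact steps in that README may have changed since this issue:

```shell
# Outline of the source build described in pytorch/xla's README.md.
# PyTorch must be built and installed first; xla builds as an extension on top of it.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
python setup.py install          # build and install PyTorch itself

git clone --recursive https://github.com/pytorch/xla
cd xla
python setup.py install          # then build the XLA extension (invokes Bazel)
```

Note that setup.py drives the build directly; kokoro/ubuntu/common.sh is CI environment setup (e.g. the MKL pin), which is why the two don't reference each other.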