updating the torch and torch_xla wheels in the colab notebook #572
Comments
Yes, the PIPs are for Colab, which has the proper CUDA libraries. What issue did you get compiling from source?
But even importing the CUDA 10 libraries seems to give the same error. So do you mean that I cannot use the pips for normal runs outside Colab? Is there an updated version of the pips available, and is this logged somewhere?
So does this mean I also need to install tf-nightly-gpu?
Are you planning to use Colab, or Cloud TPU? If you have gotten TF nightly whitelisting, it must be the latter, so I suggest you build from source for now.
Yes, I am using Cloud TPU. When I build from source, it installs perfectly, although I do have to disable CUDA, otherwise it gives a similar error to this. But after that, when I try to run the
and
Yes, you have to build with NO_CUDA=1 if you do not have a CUDA environment (this is described in the PT build-from-source document). Do you have a deeper stack frame for the above error?
And if I install the nightly GPU builds of tf, pytorch and torch_xla, I get this error while importing. No, I do have the CUDA drivers installed, but it still gave that error, so I used USE_CUDA=False.
I can try again (building from source) if there is no other option (i.e. no nightly pips to use). Is the current master version stable?
Until we have streamlined the PIP building, I suggest building from source. The current master is as stable as the older PIPs, but you get the new bits as well.
Also, just to verify: do I need to have TensorFlow installed before building pytorch and xla (I know that xla compiles it from source)? Also, does it work with TensorBoard?
Also, what Python version is recommended? Is 3.7 supported?
No, the PT/XLA repo carries the TF code as a submodule. But if you want to use TF standalone, then yes, of course you need it. TensorBoard? I am not sure PT produces model checkpoints which are compatible with the TF ones.
We use 3.6 and it is known to work.
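The version statement above can be captured in a small guard. This is a sketch only: the helper name is mine, and the thread confirms nothing beyond 3.6 being known to work.

```python
import sys

def python_supported(minimum=(3, 6)):
    """Return True if the running interpreter meets the minimum version.

    The thread only confirms Python 3.6 for PT/XLA; 3.7 support is an
    open question there, so this check is a lower bound, not a guarantee.
    """
    return sys.version_info[:2] >= minimum
```

A guard like `assert python_supported()` at import time fails early instead of producing an obscure build or runtime error later.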
What I meant was: having the TF binary installed separately would not interfere with the PT/XLA installation?
Also, I plan to install both repos in
No, you can have TF installed, and PT/XLA, and they will not interfere.
@asuhan said otherwise. Also, he advised me to install using COMPILE_PARALLEL=0.
Should I do NO_DISTRIBUTED=1 too?
COMPILE_PARALLEL=0 might be needed, but only if your PT/XLA build hangs. I do not set NO_DISTRIBUTED=1 and it works for me.
Yes, it hung for me as well earlier.
Then use COMPILE_PARALLEL=0.
As for TF, we build the TF lib statically, so we carry no dependency on libtensorflow.so.
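Putting the flags from this thread together, here is a hedged sketch of the environment for a CPU-only from-source build. The exact flag your checkout honors varies by PyTorch version (older trees read NO_CUDA, newer ones USE_CUDA), so treat these as assumptions to verify against the build docs, not verbatim instructions.

```shell
# Sketch: environment for a CPU-only PT/XLA source build.
# Flag names come from this thread; which ones apply depends on the
# PyTorch version you are building.
export USE_CUDA=0           # skip CUDA kernels on a VM without CUDA
export NO_CUDA=1            # older PT versions read this flag instead
export COMPILE_PARALLEL=0   # set only if the torch_xla build hangs
echo "USE_CUDA=$USE_CUDA NO_CUDA=$NO_CUDA COMPILE_PARALLEL=$COMPILE_PARALLEL"
```

With these exported, `python setup.py install` in the pytorch and xla checkouts would pick them up from the environment.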
Cool, thanks!
Closing the issue as it is resolved. Please feel free to reopen if you have follow-up questions.
Original issue description:
The pip libraries listed here seem to be outdated (also discussed with @asuhan on Slack and in #528). I am using the nightly builds of tf (1.14.1).
I get the following error when importing torch:
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory
Seems like these are compiled using CUDA libraries. Do you have the corresponding CPU versions?
Also is there an official page where you list the nightly builds of torch_xla?
I also tried building from source, but it didn't help either.
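The libcudart ImportError above is a dynamic-loader failure: the wheel was built against CUDA 10, but the runtime library is not on the search path. A quick way to check, without touching torch at all, is to ask the loader directly. This is an illustrative sketch; the helper name is mine.

```python
import ctypes.util

def find_cuda_runtime(stub="cudart"):
    """Return the resolved CUDA runtime library path, or None.

    A None result matches the ImportError in this issue: the pip wheel
    links against libcudart, but the loader cannot find it, so importing
    torch fails. A CPU-only build (USE_CUDA=0) avoids the dependency.
    """
    return ctypes.util.find_library(stub)

if __name__ == "__main__":
    path = find_cuda_runtime()
    if path is None:
        print("CUDA runtime not found; a CUDA-built wheel will fail to import")
    else:
        print("CUDA runtime found at:", path)
```

Running this on the affected VM before installing the wheels tells you up front whether the CUDA-built pips can work there.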