updating the torch and torch_xla wheels in the colab notebook #572

Closed
backpropper opened this issue Apr 6, 2019 · 24 comments

@backpropper

backpropper commented Apr 6, 2019

The pip wheels listed here seem to be outdated (also discussed with @asuhan on Slack and in #528). I am using the nightly builds of TF (1.14.1).

I get the following error when importing torch:
ImportError: libcudart.so.10.0: cannot open shared object file: No such file or directory

These wheels seem to be compiled against CUDA libraries. Do you have corresponding CPU-only versions?

Also, is there an official page listing the nightly builds of torch_xla?

I also tried building from source, but that didn't help either.
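
For anyone hitting the same ImportError: one way to confirm that an installed wheel was linked against CUDA, without importing it, is to inspect its shared libraries. This is just a sketch; the site-packages layout and library names below are assumptions and vary between torch versions.

# Locate the installed torch package without importing it (the import itself fails here)
TORCH_DIR="$(pip show torch | awk '/^Location:/ {print $2}')/torch"
# List any CUDA runtime dependencies baked into the extension libraries;
# if libcudart shows up, the wheel is a CUDA build rather than a CPU-only one.
ldd "$TORCH_DIR"/lib/*.so 2>/dev/null | grep -i cudart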

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

Yes, the PIPs are for Colab, which has proper CUDA libraries.

What issue did you get compiling from source?

@backpropper
Author

backpropper commented Apr 7, 2019

But even with the CUDA 10 libraries, importing torch gives the same error. So do you mean that I cannot use these pip wheels for normal runs outside Colab? Is there an updated version of the wheels available, and is that logged somewhere?

@backpropper
Author

So does this mean I also need to install tf-nightly-gpu?

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

Are you planning to use Colab, or Cloud TPU?

If you have gotten TF nightly whitelisting, it must be the latter, so I suggest you build from source for now.
You did not mention the error you got when building from source...

@backpropper
Author

Yes I am using the Cloud TPU.

When I build from source, it installs fine, although I do have to disable CUDA, otherwise it gives an error similar to this one. But after that, when I try to run test/test_train_mnist_tensor.py, I get

return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: Must not create a new variable from a variable, use its .data()

and test/test_operations.py gives a Segmentation Fault

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

Yes, you have to build with NO_CUDA=1 if you do not have a CUDA environment (this is described in the PT build-from-source document).

Do you have a deeper stack trace for the above error?
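
For reference, a CPU-only source build roughly follows the steps below. This is only a sketch: the checkout layout and flags (NO_CUDA, where the xla repo lives) are assumptions here, and the authoritative steps are the PT and PT/XLA build-from-source documents.

# Sketch of a CPU-only build of PyTorch plus torch_xla
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
NO_CUDA=1 python setup.py install      # build PyTorch without CUDA support

# build torch_xla against the PyTorch checkout above
git clone --recursive https://github.com/pytorch/xla
cd xla
python setup.py install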

@backpropper
Author

And if I install the nightly GPU builds of TF, PyTorch, and torch_xla, I get this error while importing torch_xla:
ImportError: .....python3.6/site-packages/_XLAC.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit15specializeUndefERNS0_5GraphE

No, I do have the CUDA drivers installed, but it still gave that error, so I used USE_CUDA=False.
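
An undefined-symbol error like this usually means the _XLAC extension was compiled against a different PyTorch build than the one installed. One quick check (a sketch; library names and paths are assumptions and differ across versions) is to look for the missing symbol in the installed libtorch:

# Does the installed libtorch export the symbol _XLAC is asking for?
TORCH_LIB="$(pip show torch | awk '/^Location:/ {print $2}')/torch/lib"
nm -D "$TORCH_LIB"/libtorch*.so* | grep specializeUndef \
    || echo "symbol not found: torch and torch_xla need to be rebuilt together"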

@backpropper
Author

I can try again (building from source) if there's no other option (i.e., no nightly pips to use). Is the current master stable?

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

Until we have streamlined the pip wheel building, I suggest building from source.
Since we use the PT C++ APIs, the PT/XLA code base is tightly coupled with the PT one, so the PT wheel and the PT/XLA one MUST be built together.

The current master is as stable as the older pip wheels, but you get the new bits as well.
We do not have a stable/development release process yet.

@backpropper
Author

Also, just to verify: do I need to have TensorFlow installed before building PyTorch and XLA (I know that XLA compiles it from source)? And does it work with TensorBoard?

@backpropper
Author

Also, what Python version is recommended? Is 3.7 supported?

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

No, the PT/XLA repo carries the TF code as a submodule.

But if you want to use TF standalone, then yes, you need it of course.
For PT/XLA only, you do not need to install anything TF-related.

TensorBoard? I am not sure PT produces model checkpoints that are compatible with the TF ones.

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

We use 3.6 and it is known to work.
I suggest 3.6 if you can choose.
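
For completeness, a Python 3.6 environment for the build might look like the following; conda is just one option here (an assumption, not a requirement), and any clean environment with a 3.6 interpreter should do.

# Example only: create and use a Python 3.6 environment for the source build
conda create -y -n pytorch-xla python=3.6
conda activate pytorch-xla
python --version   # should report 3.6.x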

@backpropper
Author

What I meant was: would having the TF binary installed separately interfere with the PT/XLA installation?

@backpropper
Author

Also, I plan to install both repos in develop mode, since xla is constantly being updated.

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

No, you can have TF installed, and PT/XLA, and they will not interfere.

@backpropper
Author

@asuhan said otherwise. He also advised me to install using COMPILE_PARALLEL=0.

@backpropper
Author

Should I set NO_DISTRIBUTED=1 too?

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

COMPILE_PARALLEL=0 might be needed, but only if your PT/XLA build hangs.
We have seen this happen on some machines; it might be 3.7-related.

I do not set NO_DISTRIBUTED=1 and it works for me.
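
So for a hanging build, the retry would look roughly like this (a sketch; COMPILE_PARALLEL is the flag mentioned in this thread and only affects the PT/XLA build):

# If the PT/XLA build hangs, retry it single-threaded
cd pytorch/xla
COMPILE_PARALLEL=0 python setup.py install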

@backpropper
Author

Yes, it hung for me as well earlier.

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

Then use COMPILE_PARALLEL=0

@dlibenzi
Collaborator

dlibenzi commented Apr 7, 2019

As far as TF goes, we build the TF lib statically, so we carry no dependency on libtensorflow.so:

(pytorch) dlibenzi@dlibenzi2:~/google-git/pytorch/xla$ ldd build/lib.linux-x86_64-3.6/torch_xla/lib/libxla_computation_client.so 
	linux-vdso.so.1 (0x00007ffc8ed8e000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f8f3c1a8000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f8f3bea4000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f8f3bc87000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f8f3ba7f000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f8f3b6fa000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f8f3b4e2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f8f3b143000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f8f475d8000)

@backpropper
Author

Cool, thanks!

@ailzhang
Contributor

ailzhang commented Sep 5, 2019

Closing this issue as resolved. Please feel free to reopen if you have follow-up questions.

@ailzhang ailzhang closed this as completed Sep 5, 2019