
Functions without XLA compilations #4511

Open
mfatih7 opened this issue Jan 25, 2023 · 50 comments · Fixed by #4537

@mfatih7 commented Jan 25, 2023

Hello

According to the metrics report printed after an XLA training session with

print(met.metrics_report())

I observe that the functions below cannot be processed on the TPU.

Counter: aten::_linalg_svd
  Value: 3276
Counter: aten::_local_scalar_dense
  Value: 16380
Counter: aten::_unique2
  Value: 3276
Counter: aten::bincount
  Value: 3276

What should I do?

@JackCaoG (Collaborator)

_local_scalar_dense is somewhat expected; I think it means you move tensors from the XLA device to the CPU too frequently. One possible cause is logging.

_linalg_svd I think we can support, I will add it to the op lowering queue.

unique and bincount would be very difficult to support, since the output shapes of these two ops are input-value dependent, which will likely result in frequent recompilation. Is there a way to avoid using these two ops in your model?
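The value-dependent output shapes mentioned here can be seen even on CPU; a minimal sketch using NumPy (which the thread already uses for np.unique):

```python
import numpy as np

# The output length of unique/bincount depends on the tensor *values*,
# not just its shape -- every new result shape would force XLA to
# recompile the graph.
a = np.array([0, 1, 1, 2])
b = np.array([0, 0, 0, 5])

print(np.unique(a).shape)    # (3,) -- three distinct values
print(np.unique(b).shape)    # (2,) -- same input shape, different output
print(np.bincount(a).shape)  # (3,) -- bins 0..max(a)
print(np.bincount(b).shape)  # (6,) -- bins 0..max(b)
```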

@mfatih7 (Author) commented Jan 25, 2023

Thanks @JackCaoG

I am not sure about the locations of these functions.
For unique, I found the np.unique function.
It is located in the dataset functions, so it must be operating on the CPU side, in my opinion.

For bincount, I could not find where it is actually called.
How can I find it?
Could it be in one of the torch functions I use to construct the models?

The _linalg_svd function is located in the model and I need its XLA implementation.
It is singular value decomposition, which is quite common in deep learning models.

@JackCaoG (Collaborator)

Maybe run your model with

XLA_SAVE_TENSORS_FMT='text' XLA_SAVE_TENSORS_FILE='/tmp/save1.ir'

then you should find the IR file annotated with the Python file and line of each op. Please check out https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#environment-variables

@mfatih7 (Author) commented Jan 26, 2023

@JackCaoG

I inserted the following lines into my scripts:

os.environ['XLA_SAVE_TENSORS_FMT'] = 'text'
os.environ['XLA_SAVE_TENSORS_FILE'] = '/tmp/save1.ir'

I cannot see the .ir file generated under /tmp.
Do we need more environment variables to be set?
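One common pitfall worth checking here (an assumption, not a confirmed diagnosis): the debug variables are typically read when torch_xla initializes, so they should be in the environment before the first torch_xla import:

```python
import os

# Set the debug variables *before* importing torch_xla; if torch_xla is
# imported first, the settings may never be picked up.
os.environ['XLA_SAVE_TENSORS_FMT'] = 'text'
os.environ['XLA_SAVE_TENSORS_FILE'] = '/tmp/save1.ir'

# import torch_xla  # only import after the variables are set
```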

@JackCaoG (Collaborator)

Hmm, no, those two should be enough. @wonjoolee95, can you follow up?

@mfatih7 (Author) commented Jan 26, 2023

Even if I create the folder manually, still no .ir file is generated.

@mfatih7 (Author) commented Jan 26, 2023

Apart from

XLA_SAVE_TENSORS_FMT='text' XLA_SAVE_TENSORS_FILE='/tmp/save1.ir'

the pt-xla-profiler messages are like below:

pt-xla-profiler: TransferFromServerTime too frequent: 25755 counts during 3183 steps
pt-xla-profiler: Op(s) not lowered: aten::_linalg_svd, aten::_unique2, aten::bincount,  Please open a GitHub issue with the above op lowering requests.

Does it mean the _unique2 and bincount functions are called internally by _linalg_svd?

When can we have _linalg_svd lowered in XLA, @JackCaoG?

@mfatih7 (Author) commented Jan 28, 2023

hi @JackCaoG and @wonjoolee95

I am now sure that the _unique2 and bincount functions are called outside of the _linalg_svd function.
Also, I have the .ir file now.
How can I interpret it?
It is more than 50k lines.
Would you like me to share it with you? Which part of it should I paste here?

@mfatih7 (Author) commented Jan 28, 2023

OK

As far as I understand, the .ir file (XLA_SAVE_TENSORS_FILE) points to the XLA recompilation positions in the program flow.
It is really useful.
Should I consider only the TensorsGraphInfo logs, or should I also understand the content of the lines in the IR graphs?

@JackCaoG (Collaborator)

In the IR graph, if you search for _unique2 and bincount, can you see them? If they are falling back, there is a chance they don't show up in the dump, since the graph breaks when it sees a fallback op.

As you said, you can use the TensorsGraphInfo log to see where graph breaks happen, which might help you figure out where the fallback occurs. Another thing you can do is bisect your training code and check whether

Op(s) not lowered: aten::_linalg_svd, aten::_unique2, aten::bincount,  Please open a GitHub issue with the above op lowering requests.

is printed, which can help you find the offending code.
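A hypothetical helper along these lines (the function name is illustrative, not part of torch_xla) could scan the text of met.metrics_report() for aten:: counters, i.e. ops that fell back to CPU:

```python
def fallback_ops(report: str) -> list:
    """Return the aten:: counter names found in a metrics report string."""
    ops = []
    for line in report.splitlines():
        line = line.strip()
        if line.startswith('Counter: aten::'):
            ops.append(line[len('Counter: '):])
    return ops

# Example with a report fragment like the one quoted earlier in the thread:
sample = """Counter: aten::_linalg_svd
  Value: 3276
Counter: aten::bincount
  Value: 3276"""
print(fallback_ops(sample))  # ['aten::_linalg_svd', 'aten::bincount']
```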

@wonjoolee95 (Collaborator)

For _linalg_svd, it's actually part of a bigger PR that we currently have open for enabling functionalization. Let me try to make a separate PR for it.

@mfatih7 (Author) commented Jan 30, 2023

Thank you very much @wonjoolee95 and @JackCaoG

Can you give a time estimate? When can I use _linalg_svd without XLA recompilations?
I managed to eliminate all other XLA recompilations in my scripts.
But I have to use _linalg_svd in my studies and it is very important for me.

Even if I run my scripts with _linalg_svd recompilations, could I run into more problems?

For example, here I have a problem, as you know.

Until now I have mainly used TPUs with a single core.
Before running my scripts on multiple TPU cores, I just want to be sure that everything is OK.

@wonjoolee95 (Collaborator)

I'll try to have the PR by today and merge it by tomorrow. Should anything come up, I'll keep you updated in this thread.

@mfatih7 (Author) commented Jan 30, 2023

Thank you very much

As soon as it is ready, I will run it and give you feedback.

@wonjoolee95 (Collaborator)

#4537 was just merged. @mfatih7, if you build pytorch/xla from source, can you rebase with master and give it another try? If you use the nightly image, hopefully these changes will land in tonight's nightly image.

wonjoolee95 reopened this on Feb 1, 2023
@mfatih7 (Author) commented Feb 1, 2023

Thank you @wonjoolee95

I am sticking to the instructions for single core here.

At the top of my notebook, I use

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl

Should I use something like the following?

# VERSION = "1.13"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
# !curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
# !python pytorch-xla-env-setup.py --version $VERSION
# import os 
# os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
# !echo $LD_LIBRARY_PATH

# !sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
# !sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
# !sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

# !ldconfig
# !ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

@wonjoolee95 (Collaborator)

The command (!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl) installs the 1.13 release wheel, which is not what you want. You should use the cell below to install the nightly version. Tomorrow's nightly image (i.e., 20230201) should have this change. Let me know if you run into any other problems.

@mfatih7 (Author) commented Feb 3, 2023

Hello @wonjoolee95

I think I am facing the same error here.

On COLAB, I am running

VERSION = "20230201"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION
import os 
os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
!echo $LD_LIBRARY_PATH

!sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
!sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
!sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

!ldconfig
!ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

and getting the error

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6034  100  6034    0     0  36131      0 --:--:-- --:--:-- --:--:-- 36131
Updating... This may take around 2 minutes.
Updating TPU runtime to pytorch-dev20230201 ...
WARNING: Skipping torch as it is not installed.
WARNING: Skipping torchvision as it is not installed.
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl
WARNING: Requirement 'torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.
WARNING: Requirement 'torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.
WARNING: Requirement 'torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.

@wonjoolee95
Copy link
Collaborator

It looks like our installation script at https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py was out of date. We've just updated it, so give it a try again. With that said, a couple of suggestions:

  1. For VERSION, try VERSION=20230203, as I've tested with this and it seems to be working.
  2. If installation still fails for some reason, let us know, but in the meanwhile, to unblock yourself, you can also try a manual installation in Colab like:

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

This command should install the nightly torch and torch_xla.

wonjoolee95 self-assigned this on Feb 3, 2023
@mfatih7 (Author) commented Feb 4, 2023

Thank you @wonjoolee95

With VERSION 20230203, the training procedure moves a bit forward, but now I get the error below.

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 221, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: undefined symbol: cblas_sgemm_pack_get_size

@mfatih7 (Author) commented Feb 4, 2023

@wonjoolee95

Your second suggestion

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

gives the error below:

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 220, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 180, in _load_global_deps
    raise err
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 175, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

@mfatih7 (Author) commented Feb 8, 2023

Hi @wonjoolee95

Any update? I have not tested _linalg_svd lowering yet.

@mfatih7 (Author) commented Mar 5, 2023

Hi @wonjoolee95

In COLAB, when I use

!pip install cloud-tpu-client==0.10 
!pip install torch==1.13.0 
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl

training runs, but without the _linalg_svd lowering.

When I use

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

or

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly-cp38-cp38-linux_x86_64.whl

I get the error

Traceback (most recent call last):
  File "runTrain_n_to_n_TPU_multi.py", line 1, in <module>
    import params
  File "/content/drive/MyDrive/00_featureMatching_01/params.py", line 2, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 187, in _load_global_deps
    raise err
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

So I cannot test the _linalg_svd lowering in my setup.
What should I do?

@mfatih7 (Author) commented Mar 5, 2023

OK, I read here

and updated the installation lines as follows:

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly-cp38-cp38-linux_x86_64.whl

# VERSION = "1.13"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version nightly
import os 
os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
!echo $LD_LIBRARY_PATH

!sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
!sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
!sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

!ldconfig
!ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

This time I get the error:

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: undefined symbol: cblas_sgemm_pack_get_size

What should I do?

@wonjoolee95 (Collaborator)

Sorry for the late reply. I did a little digging, and this looks similar to some past issues: pytorch/pytorch#18932 and pytorch/pytorch#10234. It seems to affect any attempt to install the torch nightly images in Colab. pytorch/pytorch#10234 suggests building from source, which should work but is not easily possible in Colab.

If I build from source locally, I can confirm that these nightly images work. @mfatih7, is building from source in a TPUVM a possible working option for you? This seems to be an issue only when installing the nightly images in Colab, so I'm just trying to see if there are other ways to unblock you.

@JackCaoG (Collaborator) commented Mar 7, 2023

The 2.0 release will be out in ~2 weeks; I think it should ship with wheels that have the fix.

@mfatih7 (Author) commented Mar 8, 2023

Hi @wonjoolee95

Following your suggestion, I am trying to run my scripts on Google Cloud.
To do so, I first run the scripts here on 2 separate TPU VMs.

This works fine:

cd /usr/share/
sudo git clone -b release/1.13 --recursive https://github.com/pytorch/pytorch 
cd pytorch/
sudo git clone -b r1.13 --recursive https://github.com/pytorch/xla.git
cd xla/
yes | sudo pip3 uninstall torch_xla
yes | sudo pip3 uninstall torch
yes | sudo pip3 uninstall torch_vision
sudo pip3 install torch==1.13.0
sudo pip3 install torchvision==0.14.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-1.13-cp38-cp38-linux_x86_64.whl
sudo rm -rf /usr/local/lib/python3.8/dist-packages/libtpu*
sudo pip3 install torch_xla[tpuvm]

However, when I modify the script as follows to run on the nightly wheels,

cd /usr/share/
sudo git clone -b release/1.13 --recursive https://github.com/pytorch/pytorch 
cd pytorch/
sudo git clone -b r1.13 --recursive https://github.com/pytorch/xla.git
cd xla/
yes | sudo pip3 uninstall torch_xla
yes | sudo pip3 uninstall torch
yes | sudo pip3 uninstall torch_vision
#sudo pip3 install torch==1.13.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly-cp38-cp38-linux_x86_64.whl
sudo pip3 install torchvision==0.14.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly-cp38-cp38-linux_x86_64.whl
sudo rm -rf /usr/local/lib/python3.8/dist-packages/libtpu*
sudo pip3 install torch_xla[tpuvm]

I get the error

  File "/usr/local/lib/python3.8/dist-packages/torch_xla/__init__.py", line 134, in <module>
    import _XLAC
ImportError: /usr/local/lib/python3.8/dist-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

What should I do?

If I can verify that the lowering works before the 2.0 release, that would be very good, @JackCaoG.

@mfatih7 (Author) commented Mar 12, 2023

Hi @JackCaoG and @wonjoolee95

I have not been able to verify the _linalg_svd lowering in my setup yet.

@mfatih7 (Author) commented Mar 22, 2023

Hi @JackCaoG and @wonjoolee95

Today I updated my Colab setup to work with torch==2.0.0 and torch_xla==2.0.

import torch
import torch_xla

print(torch.__version__)
print(torch_xla.__version__)

2.0.0+cu117
2.0

However, I get an error related to _linalg_svd:

Exception in device=TPU:4: index 8 is out of bounds for dimension 1 with size 8
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 334, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.9/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 328, in _start_fn
    fn(gindex, *args)
  File "/content/drive/MyDrive/00_featureMatching_01/train_network_n_to_n_TPU_multi.py", line 277, in train_network
    outputs, outputs_essential_matrix, outputs_geo_distance = model(inputs, inputs_metadata_device)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/drive/MyDrive/00_featureMatching_01/models/LTFGC.py", line 158, in forward
    out_regression = predict_essential_matrix_with_8_point_algorithm(
  File "/content/drive/MyDrive/00_featureMatching_01/models/models_functions.py", line 79, in predict_essential_matrix_with_8_point_algorithm
    out_regression = vh[:,8,:]

The size of vh is torch.Size([9, 8, 9]) on the TPU.
However, in my local setup, the size of vh is torch.Size([8, 9, 9]).

Here is the code from my setup.
This is the only place I use torch.linalg.svd:

  _, _, vh = torch.linalg.svd(A, full_matrices=False)  
  print(vh.size())  
  out_regression = vh[:,8,:]

I often asked about testing the lowering in my setup using nightly releases.
On both Colab and Cloud, I could not run the nightly releases.
Since I could not test your lowering, you could not get feedback and fix possible bugs.
Therefore we now have PyTorch/XLA 2.0 with the bug.

What should I do?
This is becoming urgent for me.
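For reference, the expected batched SVD shapes can be checked on CPU with NumPy; for A of shape (8, 9, 9) with full_matrices=False, the factors should be U (8, 9, 9), S (8, 9), Vh (8, 9, 9), whereas the TPU output reported above has the batch dimension of S and Vh scrambled:

```python
import numpy as np

# Reference shapes for a batched SVD on CPU.
A = np.arange(8 * 9 * 9, dtype=np.float64).reshape(8, 9, 9)
U, S, Vh = np.linalg.svd(A, full_matrices=False)

print(U.shape, S.shape, Vh.shape)  # (8, 9, 9) (8, 9) (8, 9, 9)

# The last row of Vh per batch element -- the slice the model takes with
# vh[:, 8, :] -- is well-defined with these shapes.
print(Vh[:, 8, :].shape)  # (8, 9)
```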

@wonjoolee95 (Collaborator)

Hi, in the code above:

  _, _, vh = torch.linalg.svd(A, full_matrices=False)  
  print(vh.size())  
  out_regression = vh[:,8,:]

Could you give me the value of the input tensor A? I can try to reproduce it on my side and look for a fix.

@mfatih7 (Author) commented Mar 22, 2023

Hello

Thank you for your answer.
Here is a code snippet to use with both GPU and TPU:

import numpy as np
import torch

### FOR GPU ###
DEVICE = torch.device('cuda')
###########################

### FOR TPU (uncomment) ###
# import torch_xla
# import torch_xla.core.xla_model as xm
# DEVICE = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
# A = A.to(torch.float32)

print('A ' + str(A.size()) + ' ' + str(A.dtype))
print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype))

print(vh)

Interestingly, this works for both GPU and TPU.
Here is its output:

A torch.Size([20, 9, 9]) torch.float64
T1 torch.Size([20, 9, 9]) torch.float64
T2 torch.Size([20, 9]) torch.float64
vh torch.Size([20, 9, 9]) torch.float64

Here is the notebook code that I use in COLAB

!pip install cloud-tpu-client==0.10
!pip install torch==2.0.0
!pip install torchvision==0.15.1
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp39-cp39-linux_x86_64.whl           

%cd /content/drive/MyDrive/
!python svd_error_test.py

I am also rerunning my training scripts.
However, they continuously give errors.
I can supply whatever is needed.
Do you want to run my scripts on Colab?

@mfatih7 (Author) commented Mar 22, 2023

I have similar prints in my training scripts.
Here is what I see when the training fails:

.
.
IndexError: index 8 is out of bounds for dimension 1 with size 8
A torch.Size([8, 9, 9]) torch.float32
V1 torch.Size([8, 9, 9]) torch.float32
V2 torch.Size([9]) torch.float32
vh torch.Size([9, 8, 9]) torch.float32
Exception in device=TPU:3: index 8 is out of bounds for dimension 1 with size 8
Traceback (most recent call last):
.
.

@mfatih7 (Author) commented Mar 22, 2023

Sorry, sorry.
I realized that I was not transferring the data to the device.
Give me half an hour.

@mfatih7 (Author) commented Mar 22, 2023

Hello @wonjoolee95

Just use the notebook code below

!pip install cloud-tpu-client==0.10
!pip install torch==2.0.0
!pip install torchvision==0.15.1
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp39-cp39-linux_x86_64.whl           

%cd /content/drive/MyDrive/
!python svd_error_test.py

to run the svd_error_test.py below.

import numpy as np
import torch

### FOR CPU #################
device = torch.device('cpu')
###########################

### FOR GPU #################
# device = torch.device('cuda')
###########################

### FOR TPU #################
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)
A = A.to(device)

print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
# print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype) + ' ' + str(T1.device))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype) + ' ' + str(T2.device))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype) + ' ' + str(vh.device))

# print(vh)

If you select CPU you get

A torch.Size([8, 9, 9]) torch.float32 cpu
T1 torch.Size([8, 9, 9]) torch.float32 cpu
T2 torch.Size([8, 9]) torch.float32 cpu
vh torch.Size([8, 9, 9]) torch.float32 cpu

If you select TPU you get

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

vh is definitely different on different devices.

There is no need to run my training scripts anymore.
But if needed, I am ready to share them.

@mfatih7 (Author) commented Mar 24, 2023

Hello @wonjoolee95

Do we have any progress?
Is there anything I can help with?

@mfatih7 (Author) commented Mar 31, 2023

Hello @wonjoolee95

Do we have any progress?

@wonjoolee95 (Collaborator)

Hey @mfatih7, apologies for the late reply -- I was able to reproduce this but unfortunately could not find any bandwidth to work on it. Would you be comfortable making the code edit yourself? This is our op lowering doc that describes how ops work in PyTorch/XLA (https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md), and the code for this op is implemented at https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3259. If not, I should be able to take some time next week to make the change.

@mfatih7 (Author) commented Apr 3, 2023

Hi @wonjoolee95

I think it is better for me to wait for your update and then test it on my setup immediately.
I am looking forward to it.

@mfatih7 (Author) commented Apr 14, 2023

Hi @wonjoolee95

Have you found any time to work on this issue?

@mfatih7 (Author) commented May 17, 2023

Hi @wonjoolee95, @JackCaoG

Do we have any update?

@wonjoolee95 (Collaborator)

I was finally able to find some time last week and have a local branch. Let me push a PR today.

@mfatih7 (Author) commented May 17, 2023

@wonjoolee95

Thank you for the answer.

Could you let me know whether you checked the scripts I supplied before?

Do you think I should check with my training pipeline?

@wonjoolee95 (Collaborator)

@mfatih7, I'm checking based on this code:

import numpy as np
import torch

### FOR CPU #################
device = torch.device('cpu')
###########################

### FOR GPU #################
# device = torch.device('cuda')
###########################

### FOR TPU #################
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)
A = A.to(device)

print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
# print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype) + ' ' + str(T1.device))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype) + ' ' + str(T2.device))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype) + ' ' + str(vh.device))

# print(vh)

I'll finish up and test against this to check that the returned values of torch.linalg.svd are the same. And just to verify, could you report the versions of torch and torch_xla you're using to test this? You can do:

import torch; import torch_xla
print(torch.__version__)
print(torch_xla.__version__)

One thing I noticed while working on this is that we actually have a C++ unit test (https://github.com/pytorch/xla/blob/master/test/cpp/test_aten_xla_tensor.cpp#L919) that compares XLA results to PyTorch results, so it's a bit odd that we're seeing this problem.
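A side note on comparing SVD results across backends: the factors U and Vh are only unique up to sign (and up to rotations within degenerate singular subspaces), so elementwise comparisons can be flaky. A hedged sketch of a more robust check is to compare reconstructions instead:

```python
import numpy as np

# Compare decompositions via reconstruction instead of comparing U/Vh
# elementwise, since the factors are only unique up to sign.
A = np.arange(2 * 3 * 3, dtype=np.float64).reshape(2, 3, 3)
U, S, Vh = np.linalg.svd(A, full_matrices=False)

# Rebuild A from the factors: U @ diag(S) @ Vh, batched over dim 0.
recon = U @ (S[..., None] * Vh)
print(np.allclose(recon, A))  # True
```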

Also, I'm cleaning up my dev env a bit; I'm seeing an error right now when I try to run torch.linalg.svd on my TPUVM:

>>> torch.linalg.svd(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: svd: LAPACK library not found in compilation

@mfatih7 (Author) commented May 20, 2023

@wonjoolee95

On Colab, after the initialization with

import os
assert os.environ['COLAB_TPU_ADDR']

!pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl --force-reinstall

I am getting the same output with the wrong dimensions:

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

The versions are

2.0.0+cu117
2.0.0.dev20230516+colab

@mfatih7 (Author) commented May 20, 2023

@wonjoolee95

Are you missing LAPACK package?

Are you sure that your unit test code does not have any errors?

@wonjoolee95 (Collaborator)

Thank you for noting the versions. My TPUVM dev env must have something messed up with installing/finding the LAPACK package while building PyTorch, as it's still giving me that error even though I've manually tried to install it multiple times. I'm deleting this env and creating a new one. Also, could you confirm whether you see this type of error for other inputs as well, or whether it's specific to the input shape in the example?

Regarding the unit tests, we compare the result of XLA ops with PyTorch. And we do this for linalg.svd as well (https://github.com/pytorch/xla/blob/master/test/cpp/test_aten_xla_tensor.cpp#L919).

@mfatih7 (Author) commented May 22, 2023

@wonjoolee95

I tried svd with pytorch and pytorch-xla using the test script for the inputs below.

a = np.ones( (8, 9, 9) )
a = np.ones( (20, 9, 9) )
a = np.ones( (11, 9, 9) )

With pytorch I get

A torch.Size([8, 9, 9]) torch.float32 cpu
T1 torch.Size([8, 9, 9]) torch.float32 cpu
T2 torch.Size([8, 9]) torch.float32 cpu
vh torch.Size([8, 9, 9]) torch.float32 cpu

A torch.Size([20, 9, 9]) torch.float32 cpu
T1 torch.Size([20, 9, 9]) torch.float32 cpu
T2 torch.Size([20, 9]) torch.float32 cpu
vh torch.Size([20, 9, 9]) torch.float32 cpu

A torch.Size([11, 9, 9]) torch.float32 cpu
T1 torch.Size([11, 9, 9]) torch.float32 cpu
T2 torch.Size([11, 9]) torch.float32 cpu
vh torch.Size([11, 9, 9]) torch.float32 cpu

With pytorch-xla I get

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

A torch.Size([20, 9, 9]) torch.float32 xla:1
T1 torch.Size([20, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 20, 9]) torch.float32 xla:1

A torch.Size([11, 9, 9]) torch.float32 xla:1
T1 torch.Size([11, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 11, 9]) torch.float32 xla:1

Since the output dimensions do not match, I did not check their content.
The platform is Colab with the previously listed software versions.

2.0.0+cu117
2.0.0.dev20230516+colab

@mfatih7 (Author) commented Aug 14, 2023

Hello @wonjoolee95 and @JackCaoG

Do you have any update on lowering torch.linalg.svd()?

best regards

@mfatih7 (Author) commented Aug 30, 2023

Hello @wonjoolee95 and @JackCaoG

I am still getting the same error for torch.linalg.svd() on TPU.

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1
2.0.1+cu118
2.0.0.dev20230516+colab

Before your update, the function ran slowly but produced mathematically correct results.
With the current version, I cannot use torch.linalg.svd() at all.

Would you happen to have any plans to update the function?


mfatih7 commented Sep 4, 2023

Ok, here is the whole story @JackCaoG, @wonjoolee95, and @mateuszlewko

To compute the singular value decomposition of a matrix in PyTorch, we have two alternatives, if not more: torch.linalg.svd and torch.svd.

Since torch.svd will be deprecated, I chose to use torch.linalg.svd.
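As an aside on the two APIs: torch.svd returns V while torch.linalg.svd returns Vh, the conjugate transpose of V (a plain transpose of the last two dimensions for real inputs), which is why the script converts with V.permute(0, 2, 1). A small NumPy sketch of the convention:

```python
import numpy as np

a = np.random.rand(2, 3, 4)

# np.linalg.svd, like torch.linalg.svd, returns Vh directly;
# torch.svd instead returns V, so Vh must be recovered by
# transposing the last two dimensions (V.permute(0, 2, 1) in torch).
u, s, vh = np.linalg.svd(a, full_matrices=False)
v = np.swapaxes(vh, -2, -1)   # torch.svd-style V, shape (2, 4, 3)
print(vh.shape, v.shape)      # (2, 3, 4) (2, 4, 3)
```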

At first, a lowering for the torch.linalg.svd function did not exist.

After it was lowered in PyTorch 2.0, I realized that the dimensions of the torch.linalg.svd result on XLA devices are wrong and inconsistent with the dimensions produced on CPU or GPU.

Then I waited a long time for the torch.linalg.svd lowering to be corrected.

After I noticed this update, I decided to test both torch.linalg.svd and torch.svd on CPU and XLA devices with the single script below.

!pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl

import numpy as np
import torch

device_list = []

### CPU ###
device_list.append( torch.device('cpu') )
###########################

### GPU (uncomment to enable) ###
# device_list.append( torch.device('cuda') )
###########################

### TPU ###
import torch_xla
import torch_xla.core.xla_model as xm
device_list.append( xm.xla_device() )
###########################

a = np.ones( (2, 3, 4) )

tmp = 0
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            a[i,j,k] += tmp
            tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)

functions = ['torch.linalg.svd', 'torch.svd']
U_list = []
S_list = []
Vh_list = []

print('-'*80)
for device in device_list:
    for function in functions:

        A = A.to(device)
    
        print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
        # print(A)
        
        if(function == 'torch.linalg.svd'):            
            U, S, Vh = torch.linalg.svd(A, full_matrices=False)
        elif(function == 'torch.svd'):
            U, S, V = torch.svd(A, some=True)
            Vh = V.permute(0, 2, 1)

        # Correction for torch.linalg.svd output on XLA devices
        if( device.type == 'xla' and function == 'torch.linalg.svd' ):
            Vh = Vh.permute(1, 2, 0)

        if( device.type != 'cpu' ):
            U = U.detach().cpu()
            S = S.detach().cpu()
            Vh = Vh.detach().cpu() 

        print('U ' + str(U.size()) + ' ' + str(U.dtype) + ' ' + str(U.device) + ' using ' + function )
        print('S ' + str(S.size()) + ' ' + str(S.dtype) + ' ' + str(S.device) + ' using ' + function )
        print('Vh ' + str(Vh.size()) + ' ' + str(Vh.dtype) + ' ' + str(Vh.device) + ' using ' + function )
        print('-'*80)

        U_list.append( U )
        S_list.append( S )
        Vh_list.append( Vh )
        
for res1_id, res1 in enumerate(U_list):
    for res2_id in range(res1_id+1, len(U_list)):
        print( str( torch.all(torch.eq(U_list[res1_id], U_list[res2_id])) ) )
        print( str( torch.all(torch.eq(S_list[res1_id], S_list[res2_id])) ) )
        print( str( torch.all(torch.eq(Vh_list[res1_id], Vh_list[res2_id])) ) )
print('-'*80)
for res1_id, res1 in enumerate(S_list):
    print(S_list[res1_id])
print('-'*80)
for res1_id, res1 in enumerate(Vh_list):
    print(Vh_list[res1_id])
print('-'*80)

print(torch.__version__)
print(torch_xla.__version__)

The output of the script is as follows:

--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 cpu
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.linalg.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.linalg.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.linalg.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 cpu
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 xla:1
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.linalg.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.linalg.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.linalg.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 xla:1
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.svd
--------------------------------------------------------------------------------
tensor(True)
tensor(True)
tensor(True)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(True)
tensor(True)
tensor(True)
--------------------------------------------------------------------------------
tensor([[2.5437e+01, 1.7226e+00, 2.8646e-07],
        [6.5189e+01, 6.7217e-01, 9.4831e-07]])
tensor([[2.5437e+01, 1.7226e+00, 2.8646e-07],
        [6.5189e+01, 6.7217e-01, 9.4831e-07]])
tensor([[2.5437e+01, 1.7226e+00, 4.4650e-07],
        [6.5189e+01, 6.7217e-01, 1.5007e-06]])
tensor([[2.5437e+01, 1.7226e+00, 4.4650e-07],
        [6.5189e+01, 6.7217e-01, 1.5007e-06]])
--------------------------------------------------------------------------------
tensor([[[-0.4036, -0.4647, -0.5259, -0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [-0.4506,  0.8329, -0.3140, -0.0683]],

        [[-0.4599, -0.4861, -0.5122, -0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [-0.5256,  0.5858,  0.4051, -0.4653]]])
tensor([[[-0.4036, -0.4647, -0.5259, -0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [-0.4506,  0.8329, -0.3140, -0.0683]],

        [[-0.4599, -0.4861, -0.5122, -0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [-0.5256,  0.5858,  0.4051, -0.4653]]])
tensor([[[ 0.4036,  0.4647,  0.5259,  0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [ 0.5400, -0.7883, -0.0434,  0.2917]],

        [[ 0.4599,  0.4861,  0.5122,  0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [ 0.5333, -0.6179, -0.3640,  0.4486]]])
tensor([[[ 0.4036,  0.4647,  0.5259,  0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [ 0.5400, -0.7883, -0.0434,  0.2917]],

        [[ 0.4599,  0.4861,  0.5122,  0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [ 0.5333, -0.6179, -0.3640,  0.4486]]])
--------------------------------------------------------------------------------
2.0.0+cu117
2.0.0.dev20230516+colab

I realized that although the dimensions of the torch.linalg.svd output are not correct, its content is not erroneous.
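A note on the tensor(False) results above: across devices, the exact-equality torch.eq check fails for two benign reasons: the near-zero singular values differ at the ~1e-6 level, and singular vectors are only determined up to a sign flip (visible in the Vh printouts). An approximate comparison of the singular values passes; a NumPy sketch using the values printed above:

```python
import numpy as np

# Singular values as printed above for CPU vs XLA
s_cpu = np.array([[2.5437e+01, 1.7226e+00, 2.8646e-07],
                  [6.5189e+01, 6.7217e-01, 9.4831e-07]])
s_xla = np.array([[2.5437e+01, 1.7226e+00, 4.4650e-07],
                  [6.5189e+01, 6.7217e-01, 1.5007e-06]])

assert not np.array_equal(s_cpu, s_xla)      # exact match fails...
assert np.allclose(s_cpu, s_xla, atol=1e-5)  # ...but they agree to ~1e-5
```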

When the output dimensions of torch.linalg.svd are corrected with

# Correction for torch.linalg.svd output on XLA devices
if( device.type == 'xla' and function == 'torch.linalg.svd' ):
    Vh = Vh.permute(1, 2, 0)

the torch.linalg.svd and torch.svd computations on both CPU and XLA devices all become consistent and correct.
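To see why permute(1, 2, 0) restores the expected layout, here is a NumPy stand-in (an illustration only: it assumes the XLA output is the correct tensor with its axes cycled, which is what the working fix implies):

```python
import numpy as np

# Correct Vh layout (batch, n, n), and the layout the XLA lowering
# returned, simulated here as the correct tensor with its axes cycled.
vh_correct = np.random.rand(8, 9, 9)
vh_xla = vh_correct.transpose(2, 0, 1)   # shape (9, 8, 9), as printed above

# The workaround: cycle the axes back to (batch, n, n)
vh_fixed = vh_xla.transpose(1, 2, 0)
assert vh_fixed.shape == (8, 9, 9)
assert np.array_equal(vh_fixed, vh_correct)
```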

To conclude, the torch.linalg.svd XLA lowering is still not correct.
However, it can be used once the output dimensions are permuted back.
The torch.svd implementations on both CPU and XLA are correct, but that function will be deprecated in upcoming releases.
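As a backend-agnostic check that the content is indeed correct, one can verify the reconstruction A ≈ U · diag(S) · Vh after moving the factors to the CPU. A minimal NumPy sketch of that check:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
u, s, vh = np.linalg.svd(a, full_matrices=False)

# A correct factorization reproduces A regardless of sign flips
# or tiny backend-dependent differences in the individual factors.
recon = u @ (s[..., None] * vh)
assert np.allclose(recon, a)
```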
