
Functions without XLA compilations #4511

Open
mfatih7 opened this issue Jan 25, 2023 · 50 comments · Fixed by #4537

@mfatih7 commented Jan 25, 2023

Hello

According to the metrics report printed after an XLA training session with

print(met.metrics_report())

I observe that the functions below cannot be processed on the TPU.

Counter: aten::_linalg_svd
  Value: 3276
Counter: aten::_local_scalar_dense
  Value: 16380
Counter: aten::_unique2
  Value: 3276
Counter: aten::bincount
  Value: 3276

What should I do?

@JackCaoG (Collaborator)

_local_scalar_dense is somewhat expected; I think it means you move tensors from the XLA device to the CPU too frequently. One possible cause is logging.

_linalg_svd I think we can support, I will add it to the op lowering queue.

unique and bincount would be very difficult to support, since the output shapes of these two ops are input-value dependent, which will likely result in frequent recompilation. Is there a way to avoid using these two ops in your model?
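The value-dependent output shapes mentioned here can be seen even on CPU; a minimal sketch using NumPy (which the thread already uses for np.unique):

```python
import numpy as np

# The output length of unique/bincount depends on the tensor *values*,
# not just its shape -- every new result shape would force XLA to
# recompile the graph.
a = np.array([0, 1, 1, 2])
b = np.array([0, 0, 0, 5])

print(np.unique(a).shape)    # (3,) -- three distinct values
print(np.unique(b).shape)    # (2,) -- same input shape, different output
print(np.bincount(a).shape)  # (3,) -- bins 0..max(a)
print(np.bincount(b).shape)  # (6,) -- bins 0..max(b)
```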

@mfatih7 (Author) commented Jan 25, 2023

Thanks @JackCaoG

I am not sure about the locations of these functions.
For unique, I found the np.unique function.
It is located in the dataset functions, so it must be operating on the CPU side, in my opinion.

For bincount, I could not find where it is actually called.
How can I find it?
Could it be in one of the torch functions I use to construct the models?

The _linalg_svd function is located in the model and I need its XLA implementation.
It is singular value decomposition, which is quite common in deep learning models.

@JackCaoG (Collaborator)

Maybe run your model with

XLA_SAVE_TENSORS_FMT='text' XLA_SAVE_TENSORS_FILE='/tmp/save1.ir'

then you should find the IR file annotated with the Python file and line of each op. Please check out https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#environment-variables

@mfatih7 (Author) commented Jan 26, 2023

@JackCaoG

I inserted the following lines into my scripts:

os.environ['XLA_SAVE_TENSORS_FMT'] = 'text'
os.environ['XLA_SAVE_TENSORS_FILE'] = '/tmp/save1.ir'

I cannot see the .ir file generated under /tmp.
Do we need more environment variables to be set?
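One common pitfall worth checking here (an assumption, not a confirmed diagnosis): the debug variables are typically read when torch_xla initializes, so they should be in the environment before the first torch_xla import:

```python
import os

# Set the debug variables *before* importing torch_xla; if torch_xla is
# imported first, the settings may never be picked up.
os.environ['XLA_SAVE_TENSORS_FMT'] = 'text'
os.environ['XLA_SAVE_TENSORS_FILE'] = '/tmp/save1.ir'

# import torch_xla  # only import after the variables are set
```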

@JackCaoG (Collaborator)

Hmm, no, those two should be enough. @wonjoolee95, can you follow up?

@mfatih7 (Author) commented Jan 26, 2023

Even if I create the folder manually, still no .ir file is generated.

@mfatih7 (Author) commented Jan 26, 2023

Apart from

XLA_SAVE_TENSORS_FMT='text' XLA_SAVE_TENSORS_FILE='/tmp/save1.ir'

the pt-xla-profiler messages are like below:

pt-xla-profiler: TransferFromServerTime too frequent: 25755 counts during 3183 steps
pt-xla-profiler: Op(s) not lowered: aten::_linalg_svd, aten::_unique2, aten::bincount,  Please open a GitHub issue with the above op lowering requests.

Does it mean the _unique2 and bincount functions are called internally by _linalg_svd?

When can we have _linalg_svd lowered in XLA, @JackCaoG?

@mfatih7 (Author) commented Jan 28, 2023

hi @JackCaoG and @wonjoolee95

I am now sure that the _unique2 and bincount functions are called outside of the _linalg_svd function.
Also, I have the .ir file now.
How can I interpret it?
It is more than 50k lines.
Would you like me to share it with you? Which part of it should I paste here?

@mfatih7 (Author) commented Jan 28, 2023

OK

As far as I understand, the .ir file (XLA_SAVE_TENSORS_FILE) points to the XLA recompilation positions in the program flow.
It is really useful.
Should I consider only the TensorsGraphInfo logs, or should I also understand the content of the lines in the IR graphs?

@JackCaoG (Collaborator)

In the IR graph, if you search for _unique2 and bincount, can you see them? If they are falling back, there is a chance they don't show up in the dump, since the graph breaks when it sees a fallback op.

As you said, you can use the TensorsGraphInfo log to see where graph breaks happen, which might help you figure out where the fallback occurs. Another thing you can do is bisect your training code and check whether

Op(s) not lowered: aten::_linalg_svd, aten::_unique2, aten::bincount,  Please open a GitHub issue with the above op lowering requests.

is printed, which can help you find the offending code.
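A hypothetical helper along these lines (the function name is illustrative, not part of torch_xla) could scan the text of met.metrics_report() for aten:: counters, i.e. ops that fell back to CPU:

```python
def fallback_ops(report: str) -> list:
    """Return the aten:: counter names found in a metrics report string."""
    ops = []
    for line in report.splitlines():
        line = line.strip()
        if line.startswith('Counter: aten::'):
            ops.append(line[len('Counter: '):])
    return ops

# Example with a report fragment like the one quoted earlier in the thread:
sample = """Counter: aten::_linalg_svd
  Value: 3276
Counter: aten::bincount
  Value: 3276"""
print(fallback_ops(sample))  # ['aten::_linalg_svd', 'aten::bincount']
```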

@wonjoolee95 (Collaborator)

For _linalg_svd, it's actually part of a bigger PR that we currently have open for enabling functionalization. Let me try to make a separate PR for it.

@mfatih7 (Author) commented Jan 30, 2023

Thank you very much @wonjoolee95 and @JackCaoG

Can you give a time estimate? When can I use _linalg_svd without XLA recompilations?
I managed to eliminate all other XLA recompilations in my scripts.
But I have to use _linalg_svd in my studies and it is very important for me.

Even if I run my scripts with _linalg_svd recompilations, could I run into more problems?

For example, here I have a problem, as you know.

Until now I have mainly used TPUs with a single core.
Before running my scripts on multiple TPU cores, I just want to be sure that everything is OK.

@wonjoolee95 (Collaborator)

I'll try to have the PR by today and merge it by tomorrow. Should anything come up, I'll keep you updated in this thread.

@mfatih7 (Author) commented Jan 30, 2023

Thank you very much

As soon as it is ready, I will run it and give you feedback.

@wonjoolee95 (Collaborator)

#4537 was just merged. @mfatih7, if you build pytorch/xla from source, can you rebase with master and give it another try? If you use the nightly image, hopefully these changes will land in tonight's nightly image.

wonjoolee95 reopened this on Feb 1, 2023
@mfatih7 (Author) commented Feb 1, 2023

Thank you @wonjoolee95

I am sticking to the instructions for single core here.

At the top of my notebook, I use

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl

Should I use something like the following?

# VERSION = "1.13"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
# !curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
# !python pytorch-xla-env-setup.py --version $VERSION
# import os 
# os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
# !echo $LD_LIBRARY_PATH

# !sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
# !sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
# !sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

# !ldconfig
# !ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

@wonjoolee95 (Collaborator)

The command (!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl) installs the 1.13 release wheel, which is not what you want. You should use the cell below to install the nightly version. Tomorrow's nightly image (i.e., 20230201) should have this change. Let me know if you run into any other problems.

@mfatih7 (Author) commented Feb 3, 2023

Hello @wonjoolee95

I think I am facing the same error here.

On COLAB, I am running

VERSION = "20230201"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version $VERSION
import os 
os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
!echo $LD_LIBRARY_PATH

!sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
!sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
!sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

!ldconfig
!ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

and getting the error

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  6034  100  6034    0     0  36131      0 --:--:-- --:--:-- --:--:-- 36131
Updating... This may take around 2 minutes.
Updating TPU runtime to pytorch-dev20230201 ...
WARNING: Skipping torch as it is not installed.
WARNING: Skipping torchvision as it is not installed.
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl
CommandException: No URLs matched: gs://tpu-pytorch/wheels/colab/torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl
WARNING: Requirement 'torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torch-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.
WARNING: Requirement 'torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torch_xla-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.
WARNING: Requirement 'torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl' looks like a filename, but the file does not exist
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
ERROR: torchvision-nightly+20230201-cp38-cp38m-linux_x86_64.whl is not a supported wheel on this platform.

@wonjoolee95
Copy link
Collaborator

It looks like our installation script at https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py was out of date. We've just updated it, so give it a try again. With that said, a couple of suggestions:

  1. For VERSION, try VERSION=20230203, as I've tested with this and it seems to be working.
  2. If installation still fails for some reason, let us know, but in the meanwhile, to unblock yourself, you can also try a manual installation in Colab like:

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

This command should install the nightly torch and torch_xla.

wonjoolee95 self-assigned this on Feb 3, 2023
@mfatih7 (Author) commented Feb 4, 2023

Thank you @wonjoolee95

With VERSION 20230203, the training procedure moves a bit forward, but now I get the error below.

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 221, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: undefined symbol: cblas_sgemm_pack_get_size

@mfatih7 (Author) commented Feb 4, 2023

@wonjoolee95

Your second suggestion

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

gives the error below:

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 220, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 180, in _load_global_deps
    raise err
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 175, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

@mfatih7 (Author) commented Feb 8, 2023

Hi @wonjoolee95

Any update? I have not tested _linalg_svd lowering yet.

@mfatih7 (Author) commented Mar 5, 2023

Hi @wonjoolee95

In COLAB, when I use

!pip install cloud-tpu-client==0.10 
!pip install torch==1.13.0 
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl

training runs, but without the _linalg_svd lowering.

When I use

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly+20230203-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly+20230203-cp38-cp38-linux_x86_64.whl

or

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly-cp38-cp38-linux_x86_64.whl

I get the error

Traceback (most recent call last):
  File "runTrain_n_to_n_TPU_multi.py", line 1, in <module>
    import params
  File "/content/drive/MyDrive/00_featureMatching_01/params.py", line 2, in <module>
    import torch
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 187, in _load_global_deps
    raise err
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

So I cannot test the _linalg_svd lowering in my setup.
What should I do?

@mfatih7 (Author) commented Mar 5, 2023

OK, I read here

and updated the installation lines as follows:

!pip install cloud-tpu-client==0.10
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch-nightly-cp38-cp38-linux_x86_64.whl
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-nightly-cp38-cp38-linux_x86_64.whl

# VERSION = "1.13"  #@param ["1.13", "nightly", "20220315"]  # or YYYYMMDD format
!curl https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py -o pytorch-xla-env-setup.py
!python pytorch-xla-env-setup.py --version nightly
import os 
os.environ['LD_LIBRARY_PATH']='/usr/local/lib'
!echo $LD_LIBRARY_PATH

!sudo ln -s /usr/local/lib/libmkl_intel_lp64.so /usr/local/lib/libmkl_intel_lp64.so.1
!sudo ln -s /usr/local/lib/libmkl_intel_thread.so /usr/local/lib/libmkl_intel_thread.so.1
!sudo ln -s /usr/local/lib/libmkl_core.so /usr/local/lib/libmkl_core.so.1

!ldconfig
!ldd /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch.so

This time I get the error:

File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_cpu.so: undefined symbol: cblas_sgemm_pack_get_size

What should I do?

@wonjoolee95 (Collaborator)

Sorry for the late reply. I did a little digging, and this looks similar to some past issues: pytorch/pytorch#18932 and pytorch/pytorch#10234. It seems to affect any attempt to install the torch nightly images in Colab. pytorch/pytorch#10234 suggests building from source, which should work but is not easily possible in Colab.

If I build from source locally, I can confirm that these nightly images work. @mfatih7, is building from source in a TPUVM a possible working option for you? This seems to be an issue only when installing the nightly images in Colab, so I'm just trying to see if there are other ways to unblock you.

@JackCaoG (Collaborator) commented Mar 7, 2023

The 2.0 release will be out in ~2 weeks; I think it should ship with wheels that have the fix.

@mfatih7 (Author) commented Mar 8, 2023

Hi @wonjoolee95

Following your suggestion, I am trying to run my scripts on Google Cloud.
To do so, I first run the scripts here on 2 separate TPU VMs.

This works fine:

cd /usr/share/
sudo git clone -b release/1.13 --recursive https://github.com/pytorch/pytorch 
cd pytorch/
sudo git clone -b r1.13 --recursive https://github.com/pytorch/xla.git
cd xla/
yes | sudo pip3 uninstall torch_xla
yes | sudo pip3 uninstall torch
yes | sudo pip3 uninstall torch_vision
sudo pip3 install torch==1.13.0
sudo pip3 install torchvision==0.14.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-1.13-cp38-cp38-linux_x86_64.whl
sudo rm -rf /usr/local/lib/python3.8/dist-packages/libtpu*
sudo pip3 install torch_xla[tpuvm]

However, when I modify the script as follows to run on the nightly wheels,

cd /usr/share/
sudo git clone -b release/1.13 --recursive https://github.com/pytorch/pytorch 
cd pytorch/
sudo git clone -b r1.13 --recursive https://github.com/pytorch/xla.git
cd xla/
yes | sudo pip3 uninstall torch_xla
yes | sudo pip3 uninstall torch
yes | sudo pip3 uninstall torch_vision
#sudo pip3 install torch==1.13.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-nightly-cp38-cp38-linux_x86_64.whl
sudo pip3 install torchvision==0.14.0
sudo pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-nightly-cp38-cp38-linux_x86_64.whl
sudo rm -rf /usr/local/lib/python3.8/dist-packages/libtpu*
sudo pip3 install torch_xla[tpuvm]

I get the error

  File "/usr/local/lib/python3.8/dist-packages/torch_xla/__init__.py", line 134, in <module>
    import _XLAC
ImportError: /usr/local/lib/python3.8/dist-packages/_XLAC.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK5torch8autograd4Node4nameB5cxx11Ev

What should I do?

If I can verify that the lowering works before the 2.0 release, that would be very good, @JackCaoG.

@mfatih7 (Author) commented Mar 12, 2023

Hi @JackCaoG and @wonjoolee95

I have not been able to verify the _linalg_svd lowering in my setup yet.

@mfatih7 (Author) commented Mar 22, 2023

Hi @JackCaoG and @wonjoolee95

Today I updated my Colab setup to work with torch==2.0.0 and torch_xla==2.0.

import torch
import torch_xla

print(torch.__version__)
print(torch_xla.__version__)

2.0.0+cu117
2.0

However, I get an error related to _linalg_svd:

Exception in device=TPU:4: index 8 is out of bounds for dimension 1 with size 8
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 334, in _mp_start_fn
    _start_fn(index, pf_cfg, fn, args)
  File "/usr/local/lib/python3.9/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 328, in _start_fn
    fn(gindex, *args)
  File "/content/drive/MyDrive/00_featureMatching_01/train_network_n_to_n_TPU_multi.py", line 277, in train_network
    outputs, outputs_essential_matrix, outputs_geo_distance = model(inputs, inputs_metadata_device)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/drive/MyDrive/00_featureMatching_01/models/LTFGC.py", line 158, in forward
    out_regression = predict_essential_matrix_with_8_point_algorithm(
  File "/content/drive/MyDrive/00_featureMatching_01/models/models_functions.py", line 79, in predict_essential_matrix_with_8_point_algorithm
    out_regression = vh[:,8,:]

The size of vh is torch.Size([9, 8, 9]) on the TPU.
However, in my local setup, the size of vh is torch.Size([8, 9, 9]).

Here is the code from my setup.
This is the only place I use torch.linalg.svd:

  _, _, vh = torch.linalg.svd(A, full_matrices=False)  
  print(vh.size())  
  out_regression = vh[:,8,:]

I often asked about testing the lowering in my setup using nightly releases.
On both Colab and Cloud, I could not run the nightly releases.
Since I could not test your lowering, you could not get feedback and fix possible bugs.
Therefore we now have PyTorch/XLA 2.0 with the bug.

What should I do?
This is becoming urgent for me.
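For reference, the expected batched SVD shapes can be checked on CPU with NumPy; for A of shape (8, 9, 9) with full_matrices=False, the factors should be U (8, 9, 9), S (8, 9), Vh (8, 9, 9), whereas the TPU output reported above has the batch dimension of S and Vh scrambled:

```python
import numpy as np

# Reference shapes for a batched SVD on CPU.
A = np.arange(8 * 9 * 9, dtype=np.float64).reshape(8, 9, 9)
U, S, Vh = np.linalg.svd(A, full_matrices=False)

print(U.shape, S.shape, Vh.shape)  # (8, 9, 9) (8, 9) (8, 9, 9)

# The last row of Vh per batch element -- the slice the model takes with
# vh[:, 8, :] -- is well-defined with these shapes.
print(Vh[:, 8, :].shape)  # (8, 9)
```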

@wonjoolee95 (Collaborator)

Hi, in the code above:

  _, _, vh = torch.linalg.svd(A, full_matrices=False)  
  print(vh.size())  
  out_regression = vh[:,8,:]

Could you give me the value of the input tensor A? I can try to reproduce it on my side and look for a fix.

@mfatih7 (Author) commented Mar 22, 2023

Hello

Thank you for your answer.
Here is a code snippet to use with both GPU and TPU:

import numpy as np
import torch

### FOR GPU ###
DEVICE = torch.device('cuda')
###########################

### FOR TPU (uncomment) ###
# import torch_xla
# import torch_xla.core.xla_model as xm
# DEVICE = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
# A = A.to(torch.float32)

print('A ' + str(A.size()) + ' ' + str(A.dtype))
print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype))

print(vh)

Interestingly, this works for both GPU and TPU.
Here is its output:

A torch.Size([20, 9, 9]) torch.float64
T1 torch.Size([20, 9, 9]) torch.float64
T2 torch.Size([20, 9]) torch.float64
vh torch.Size([20, 9, 9]) torch.float64

Here is the notebook code that I use in COLAB

!pip install cloud-tpu-client==0.10
!pip install torch==2.0.0
!pip install torchvision==0.15.1
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp39-cp39-linux_x86_64.whl           

%cd /content/drive/MyDrive/
!python svd_error_test.py

I am also rerunning my training scripts.
However, they continuously give errors.
I can supply whatever is needed.
Do you want to run my scripts on Colab?

@mfatih7 (Author) commented Mar 22, 2023

I have similar prints in my training scripts.
Here is what I see when the training fails:

.
.
IndexError: index 8 is out of bounds for dimension 1 with size 8
A torch.Size([8, 9, 9]) torch.float32
V1 torch.Size([8, 9, 9]) torch.float32
V2 torch.Size([9]) torch.float32
vh torch.Size([9, 8, 9]) torch.float32
Exception in device=TPU:3: index 8 is out of bounds for dimension 1 with size 8
Traceback (most recent call last):
.
.

@mfatih7 (Author) commented Mar 22, 2023

Sorry, sorry.
I realized that I was not transferring the data to the device.
Give me half an hour.

@mfatih7 (Author) commented Mar 22, 2023

Hello @wonjoolee95

Just use the notebook code below

!pip install cloud-tpu-client==0.10
!pip install torch==2.0.0
!pip install torchvision==0.15.1
!pip install https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp39-cp39-linux_x86_64.whl           

%cd /content/drive/MyDrive/
!python svd_error_test.py

to run the svd_error_test.py below.

import numpy as np
import torch

### FOR CPU #################
device = torch.device('cpu')
###########################

### FOR GPU #################
# device = torch.device('cuda')
###########################

### FOR TPU #################
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)
A = A.to(device)

print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
# print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype) + ' ' + str(T1.device))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype) + ' ' + str(T2.device))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype) + ' ' + str(vh.device))

# print(vh)

If you select CPU you get

A torch.Size([8, 9, 9]) torch.float32 cpu
T1 torch.Size([8, 9, 9]) torch.float32 cpu
T2 torch.Size([8, 9]) torch.float32 cpu
vh torch.Size([8, 9, 9]) torch.float32 cpu

If you select TPU you get

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

vh is definitely different on different devices.

There is no need to run my training scripts anymore.
But if needed, I am ready to share them.

@mfatih7 (Author) commented Mar 24, 2023

Hello @wonjoolee95

Do we have any progress?
Is there anything I can help with?

@mfatih7 (Author) commented Mar 31, 2023

Hello @wonjoolee95

Do we have any progress?

@wonjoolee95 (Collaborator)

Hey @mfatih7, apologies for the late reply -- I was able to reproduce this but unfortunately could not find any bandwidth to work on it. Would you be comfortable making the code edit yourself? This is our op lowering doc that describes how ops work in PyTorch/XLA (https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md), and the code for this op is implemented at https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3259. If not, I should be able to take some time next week to make the change.

@mfatih7 (Author) commented Apr 3, 2023

Hi @wonjoolee95

I think it is better for me to wait for your update and then test it on my setup immediately.
I am looking forward to it.

@mfatih7 (Author) commented Apr 14, 2023

Hi @wonjoolee95

Have you found any time to work on this issue?

@mfatih7 (Author) commented May 17, 2023

Hi @wonjoolee95, @JackCaoG

Do we have any update?

@wonjoolee95 (Collaborator)

I was finally able to find some time last week and have a local branch. Let me push a PR today.

@mfatih7 (Author) commented May 17, 2023

@wonjoolee95

Thank you for the answer.

Could you let me know whether you checked the scripts I supplied before?

Do you think I should check with my training pipeline?

@wonjoolee95 (Collaborator)

@mfatih7, I'm checking based on this code:

import numpy as np
import torch

### FOR CPU #################
device = torch.device('cpu')
###########################

### FOR GPU #################
# device = torch.device('cuda')
###########################

### FOR TPU #################
# import torch_xla
# import torch_xla.core.xla_model as xm
# device = xm.xla_device()
###########################

a = np.ones( (8, 9, 9) )

tmp = 0 
for i in range(a.shape[0]):
	for j in range(a.shape[1]):
		for k in range(a.shape[2]):
			a[i,j,k] += tmp
			tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)
A = A.to(device)

print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
# print(A)

T1, T2, vh = torch.linalg.svd(A, full_matrices=False)

print('T1 ' + str(T1.size()) + ' ' + str(T1.dtype) + ' ' + str(T1.device))
print('T2 ' + str(T2.size()) + ' ' + str(T2.dtype) + ' ' + str(T2.device))
print('vh ' + str(vh.size()) + ' ' + str(vh.dtype) + ' ' + str(vh.device))

# print(vh)

I'll finish up and test against this to check that the returned values of torch.linalg.svd are the same. And just to verify, could you report the versions of torch and torch_xla you're using to test this? You can do:

import torch; import torch_xla
print(torch.__version__)
print(torch_xla.__version__)

One thing I noticed while working on this is that we actually have a C++ unit test (https://github.com/pytorch/xla/blob/master/test/cpp/test_aten_xla_tensor.cpp#L919) that compares XLA results to PyTorch results, so it's a bit odd that we're seeing this problem.
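A side note on comparing SVD results across backends: the factors U and Vh are only unique up to sign (and up to rotations within degenerate singular subspaces), so elementwise comparisons can be flaky. A hedged sketch of a more robust check is to compare reconstructions instead:

```python
import numpy as np

# Compare decompositions via reconstruction instead of comparing U/Vh
# elementwise, since the factors are only unique up to sign.
A = np.arange(2 * 3 * 3, dtype=np.float64).reshape(2, 3, 3)
U, S, Vh = np.linalg.svd(A, full_matrices=False)

# Rebuild A from the factors: U @ diag(S) @ Vh, batched over dim 0.
recon = U @ (S[..., None] * Vh)
print(np.allclose(recon, A))  # True
```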

Also, I'm cleaning up my dev env a bit; I'm seeing an error right now when I try to run torch.linalg.svd on my TPUVM:

>>> torch.linalg.svd(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: svd: LAPACK library not found in compilation

@mfatih7 (Author) commented May 20, 2023

@wonjoolee95

On Colab, after the initialization with

import os
assert os.environ['COLAB_TPU_ADDR']

!pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl --force-reinstall

I am getting the same output with the wrong dimensions:

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

The versions are

2.0.0+cu117
2.0.0.dev20230516+colab

@mfatih7 (Author) commented May 20, 2023

@wonjoolee95

Are you missing LAPACK package?

Are you sure that your unit test code does not have any errors?

@wonjoolee95 (Collaborator)

Thank you for noting the versions. My TPUVM dev env must have something messed up with installing/finding the LAPACK package while building PyTorch, as it's still giving me that error even though I've manually tried to install it multiple times. I'm deleting this env and creating a new one. Also, could you confirm whether you see this type of error for other inputs as well, or whether it's specific to the input shape in the example?

Regarding the unit tests, we compare the result of XLA ops with PyTorch. And we do this for linalg.svd as well (https://github.com/pytorch/xla/blob/master/test/cpp/test_aten_xla_tensor.cpp#L919).

@mfatih7 (Author) commented May 22, 2023

@wonjoolee95

I tried svd with pytorch and pytorch-xla using the test script for the inputs below.

a = np.ones( (8, 9, 9) )
a = np.ones( (20, 9, 9) )
a = np.ones( (11, 9, 9) )

With pytorch I get

A torch.Size([8, 9, 9]) torch.float32 cpu
T1 torch.Size([8, 9, 9]) torch.float32 cpu
T2 torch.Size([8, 9]) torch.float32 cpu
vh torch.Size([8, 9, 9]) torch.float32 cpu

A torch.Size([20, 9, 9]) torch.float32 cpu
T1 torch.Size([20, 9, 9]) torch.float32 cpu
T2 torch.Size([20, 9]) torch.float32 cpu
vh torch.Size([20, 9, 9]) torch.float32 cpu

A torch.Size([11, 9, 9]) torch.float32 cpu
T1 torch.Size([11, 9, 9]) torch.float32 cpu
T2 torch.Size([11, 9]) torch.float32 cpu
vh torch.Size([11, 9, 9]) torch.float32 cpu

With pytorch-xla I get

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1

A torch.Size([20, 9, 9]) torch.float32 xla:1
T1 torch.Size([20, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 20, 9]) torch.float32 xla:1

A torch.Size([11, 9, 9]) torch.float32 xla:1
T1 torch.Size([11, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 11, 9]) torch.float32 xla:1

Since the output dimensions do not match, I did not check their content.
The platform is Colab with the previously listed software versions.

2.0.0+cu117
2.0.0.dev20230516+colab

@mfatih7 (Author) commented Aug 14, 2023

Hello @wonjoolee95 and @JackCaoG

Do you have any update on lowering torch.linalg.svd()?

best regards

@mfatih7 (Author) commented Aug 30, 2023

Hello @wonjoolee95 and @JackCaoG

I am still getting the same error for torch.linalg.svd() on TPU.

A torch.Size([8, 9, 9]) torch.float32 xla:1
T1 torch.Size([8, 9, 9]) torch.float32 xla:1
T2 torch.Size([9]) torch.float32 xla:1
vh torch.Size([9, 8, 9]) torch.float32 xla:1
2.0.1+cu118
2.0.0.dev20230516+colab

Before your update, the function ran slowly but produced mathematically correct results.
With the current version, I cannot use torch.linalg.svd() at all.

Would you happen to have any plans to update the function?


mfatih7 commented Sep 4, 2023

Ok, here is the whole story @JackCaoG, @wonjoolee95, and @mateuszlewko

To compute the singular value decomposition of a matrix in PyTorch, we have two alternatives, if not more: torch.linalg.svd and torch.svd.

Since torch.svd will be deprecated, I chose to use torch.linalg.svd.
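As an aside on the two APIs: torch.svd returns V while torch.linalg.svd returns Vh, the conjugate transpose of V (a plain transpose of the last two dimensions for real inputs), which is why the script converts with V.permute(0, 2, 1). A small NumPy sketch of the convention:

```python
import numpy as np

a = np.random.rand(2, 3, 4)

# np.linalg.svd, like torch.linalg.svd, returns Vh directly;
# torch.svd instead returns V, so Vh must be recovered by
# transposing the last two dimensions (V.permute(0, 2, 1) in torch).
u, s, vh = np.linalg.svd(a, full_matrices=False)
v = np.swapaxes(vh, -2, -1)   # torch.svd-style V, shape (2, 4, 3)
print(vh.shape, v.shape)      # (2, 3, 4) (2, 4, 3)
```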

At first, a lowering for the torch.linalg.svd function did not exist.

After it was lowered in PyTorch 2.0, I realized that the dimensions of the torch.linalg.svd result on XLA devices are wrong and inconsistent with the dimensions produced on CPU or GPU.

Then I waited a long time for the torch.linalg.svd lowering to be corrected.

After I noticed this update, I decided to test both torch.linalg.svd and torch.svd on CPU and XLA devices with the single script below.

!pip install cloud-tpu-client==0.10 torch==2.0.0 torchvision==0.15.1 https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-2.0-cp310-cp310-linux_x86_64.whl

import numpy as np
import torch

device_list = []

### CPU ###
device_list.append( torch.device('cpu') )
###########################

### GPU (uncomment to enable) ###
# device_list.append( torch.device('cuda') )
###########################

### TPU ###
import torch_xla
import torch_xla.core.xla_model as xm
device_list.append( xm.xla_device() )
###########################

a = np.ones( (2, 3, 4) )

tmp = 0
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            a[i,j,k] += tmp
            tmp += 1
            
A = torch.from_numpy(a)
A = A.to(torch.float32)

functions = ['torch.linalg.svd', 'torch.svd']
U_list = []
S_list = []
Vh_list = []

print('-'*80)
for device in device_list:
    for function in functions:

        A = A.to(device)
    
        print('A ' + str(A.size()) + ' ' + str(A.dtype) + ' ' + str(A.device))
        # print(A)
        
        if(function == 'torch.linalg.svd'):            
            U, S, Vh = torch.linalg.svd(A, full_matrices=False)
        elif(function == 'torch.svd'):
            U, S, V = torch.svd(A, some=True)
            Vh = V.permute(0, 2, 1)

        # Correction for torch.linalg.svd output on XLA devices
        if( device.type == 'xla' and function == 'torch.linalg.svd' ):
            Vh = Vh.permute(1, 2, 0)

        if( device.type != 'cpu' ):
            U = U.detach().cpu()
            S = S.detach().cpu()
            Vh = Vh.detach().cpu() 

        print('U ' + str(U.size()) + ' ' + str(U.dtype) + ' ' + str(U.device) + ' using ' + function )
        print('S ' + str(S.size()) + ' ' + str(S.dtype) + ' ' + str(S.device) + ' using ' + function )
        print('Vh ' + str(Vh.size()) + ' ' + str(Vh.dtype) + ' ' + str(Vh.device) + ' using ' + function )
        print('-'*80)

        U_list.append( U )
        S_list.append( S )
        Vh_list.append( Vh )
        
for res1_id, res1 in enumerate(U_list):
    for res2_id in range(res1_id+1, len(U_list)):
        print( str( torch.all(torch.eq(U_list[res1_id], U_list[res2_id])) ) )
        print( str( torch.all(torch.eq(S_list[res1_id], S_list[res2_id])) ) )
        print( str( torch.all(torch.eq(Vh_list[res1_id], Vh_list[res2_id])) ) )
print('-'*80)
for res1_id, res1 in enumerate(S_list):
    print(S_list[res1_id])
print('-'*80)
for res1_id, res1 in enumerate(Vh_list):
    print(Vh_list[res1_id])
print('-'*80)

print(torch.__version__)
print(torch_xla.__version__)

The output of the script is as follows:

--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 cpu
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.linalg.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.linalg.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.linalg.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 cpu
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 xla:1
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.linalg.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.linalg.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.linalg.svd
--------------------------------------------------------------------------------
A torch.Size([2, 3, 4]) torch.float32 xla:1
U torch.Size([2, 3, 3]) torch.float32 cpu using torch.svd
S torch.Size([2, 3]) torch.float32 cpu using torch.svd
Vh torch.Size([2, 3, 4]) torch.float32 cpu using torch.svd
--------------------------------------------------------------------------------
tensor(True)
tensor(True)
tensor(True)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(False)
tensor(True)
tensor(True)
tensor(True)
--------------------------------------------------------------------------------
tensor([[2.5437e+01, 1.7226e+00, 2.8646e-07],
        [6.5189e+01, 6.7217e-01, 9.4831e-07]])
tensor([[2.5437e+01, 1.7226e+00, 2.8646e-07],
        [6.5189e+01, 6.7217e-01, 9.4831e-07]])
tensor([[2.5437e+01, 1.7226e+00, 4.4650e-07],
        [6.5189e+01, 6.7217e-01, 1.5007e-06]])
tensor([[2.5437e+01, 1.7226e+00, 4.4650e-07],
        [6.5189e+01, 6.7217e-01, 1.5007e-06]])
--------------------------------------------------------------------------------
tensor([[[-0.4036, -0.4647, -0.5259, -0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [-0.4506,  0.8329, -0.3140, -0.0683]],

        [[-0.4599, -0.4861, -0.5122, -0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [-0.5256,  0.5858,  0.4051, -0.4653]]])
tensor([[[-0.4036, -0.4647, -0.5259, -0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [-0.4506,  0.8329, -0.3140, -0.0683]],

        [[-0.4599, -0.4861, -0.5122, -0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [-0.5256,  0.5858,  0.4051, -0.4653]]])
tensor([[[ 0.4036,  0.4647,  0.5259,  0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [ 0.5400, -0.7883, -0.0434,  0.2917]],

        [[ 0.4599,  0.4861,  0.5122,  0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [ 0.5333, -0.6179, -0.3640,  0.4486]]])
tensor([[[ 0.4036,  0.4647,  0.5259,  0.5870],
         [ 0.7329,  0.2898, -0.1532, -0.5962],
         [ 0.5400, -0.7883, -0.0434,  0.2917]],

        [[ 0.4599,  0.4861,  0.5122,  0.5384],
         [ 0.6989,  0.2525, -0.1940, -0.6404],
         [ 0.5333, -0.6179, -0.3640,  0.4486]]])
--------------------------------------------------------------------------------
2.0.0+cu117
2.0.0.dev20230516+colab

I realized that although the dimensions of the torch.linalg.svd output are not correct, its content is not erroneous.
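A note on the tensor(False) results above: across devices, the exact-equality torch.eq check fails for two benign reasons: the near-zero singular values differ at the ~1e-6 level, and singular vectors are only determined up to a sign flip (visible in the Vh printouts). An approximate comparison of the singular values passes; a NumPy sketch using the values printed above:

```python
import numpy as np

# Singular values as printed above for CPU vs XLA
s_cpu = np.array([[2.5437e+01, 1.7226e+00, 2.8646e-07],
                  [6.5189e+01, 6.7217e-01, 9.4831e-07]])
s_xla = np.array([[2.5437e+01, 1.7226e+00, 4.4650e-07],
                  [6.5189e+01, 6.7217e-01, 1.5007e-06]])

assert not np.array_equal(s_cpu, s_xla)      # exact match fails...
assert np.allclose(s_cpu, s_xla, atol=1e-5)  # ...but they agree to ~1e-5
```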

When the output dimensions of torch.linalg.svd are corrected with

# Correction for torch.linalg.svd output on XLA devices
if( device.type == 'xla' and function == 'torch.linalg.svd' ):
    Vh = Vh.permute(1, 2, 0)

the torch.linalg.svd and torch.svd computations on both CPU and XLA devices all become consistent and correct.
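To see why permute(1, 2, 0) restores the expected layout, here is a NumPy stand-in (an illustration only: it assumes the XLA output is the correct tensor with its axes cycled, which is what the working fix implies):

```python
import numpy as np

# Correct Vh layout (batch, n, n), and the layout the XLA lowering
# returned, simulated here as the correct tensor with its axes cycled.
vh_correct = np.random.rand(8, 9, 9)
vh_xla = vh_correct.transpose(2, 0, 1)   # shape (9, 8, 9), as printed above

# The workaround: cycle the axes back to (batch, n, n)
vh_fixed = vh_xla.transpose(1, 2, 0)
assert vh_fixed.shape == (8, 9, 9)
assert np.array_equal(vh_fixed, vh_correct)
```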

To conclude, the torch.linalg.svd XLA lowering is still not correct.
However, it can be used once the output dimensions are permuted back.
The torch.svd implementations on both CPU and XLA are correct, but that function will be deprecated in upcoming releases.
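As a backend-agnostic check that the content is indeed correct, one can verify the reconstruction A ≈ U · diag(S) · Vh after moving the factors to the CPU. A minimal NumPy sketch of that check:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
u, s, vh = np.linalg.svd(a, full_matrices=False)

# A correct factorization reproduces A regardless of sign flips
# or tiny backend-dependent differences in the individual factors.
recon = u @ (s[..., None] * vh)
assert np.allclose(recon, a)
```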
