Functions without XLA compilation #4511
Thanks @JackCaoG. I am not sure about the positions of the functions.
Maybe run your model with the debug environment variables enabled; then you should find the IR file annotated with the Python file and line. Please check out https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#environment-variables
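The exact variables were lost from the comment; a plausible setup, going by the environment variables documented in the linked TROUBLESHOOTING.md (the dump path here is only an example, not the one from the thread), would be:

```shell
# Annotate IR nodes with the Python source file/line that created them.
export XLA_IR_DEBUG=1
# Dump the IR graphs of executed computations to this file.
export XLA_SAVE_TENSORS_FILE=/tmp/xla_ir_dump.txt
# Then run the model as usual, e.g.:
# python train.py
```

Searching the dump for the op name should then point back at the offending Python line.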
I inserted the suggested lines into my scripts, but I cannot see the output file.
Hmm, no, those two should be enough. @wonjoolee95, can you follow up?
Even if I generate the folder manually, there is still no output file. Apart from that, the pt-xla-profiler messages are like the ones below.
Does it mean the ops are not compiled by XLA? When can we have the lowering?
Hi @JackCaoG and @wonjoolee95, I am now sure that the function is not lowered to XLA.
OK, as far as I understand.
In the IR graph, if you search for the op name, the Python source line is printed, which can help you find the offending code. As you said, you can use the debug environment variables for that.
Thank you very much @wonjoolee95 and @JackCaoG. Can you give a time prediction for when I can use it? Even if I run my scripts with the current setup, I have a problem here, as you know. Until now I have mainly used TPUs with a single core.
I'll try to have the PR by today and merge it by tomorrow. Should anything come up, I'll keep you updated in this thread.
Thank you very much. As soon as it is ready, I will run it and give you feedback.
Thank you @wonjoolee95. I am sticking to the instructions for a single core here. At the top of my notebook, I use the lines below. Should I use something like this instead?
The command in the instructions should be fine.
Hello @wonjoolee95, I think I am facing the same error here. On Colab, I am running the setup below and getting the error shown.
It looks like our installation script at https://raw.githubusercontent.com/pytorch/xla/master/contrib/scripts/env-setup.py was out-of-date. We've just updated it, so give it a try again. With that said, a couple of suggestions:
This script should install the nightly torch and torch-xla.
Thank you @wonjoolee95. With the VERSION set as you suggested, I ran it again. Your second suggestion gives the error below.
Hi @wonjoolee95, any update? I have not been able to test it yet.
Hi @wonjoolee95. In Colab, when I use the stable wheels, training runs without the error. When I use the nightly wheels, I get the error below, so I cannot test the lowering.
OK, I read the discussion here and updated the installation lines as follows; this time I get the error below. What should I do?
Sorry for the late reply. I did a little digging, and this looks similar to some past issues: pytorch/pytorch#18932 and pytorch/pytorch#10234. It seems to affect any attempt to install torch nightly images in Colab. pytorch/pytorch#10234 suggests building from source, which should work but is not easily possible in Colab. If I build from source locally, I can confirm that these nightly images work. @mfatih7, is building from source in a TPUVM a possible working option for you? This seems to be an issue only when we install the nightly images in Colab, so I'm just trying to see if there are other ways to unblock you.
The 2.0 release will be out in ~2 weeks; I think that should ship with wheels that have the fix.
Hi @wonjoolee95. Following your suggestion, I tried to run my scripts on Google Cloud. This works fine. However, when I modify the script as follows to run on the nightly wheels, I get the error below. What should I do? It would be very good to see that the lowering is successful before the 2.0 release. @JackCaoG
Hi @JackCaoG and @wonjoolee95, I have not been able to verify the lowering yet.
Hi @JackCaoG and @wonjoolee95. Today I updated my Colab setup to work with the new release. However, I have an error: the size of vh is wrong. Here is the code from my setup. I have asked often about testing the lowering in my setup using the nightly releases. What should I do?
Hi, in the code above, could you give me the value of the input tensor?
Hello, thank you for your answer. Interestingly, this works for both GPU and TPU. Here is the notebook code that I use in Colab. I am also rerunning my training scripts.
I have similar prints in my training scripts.

Sorry, sorry.
Hello @wonjoolee95, just use the notebook code here to run the svd_error_test.py below. If you select CPU, you get the first output; if you select TPU, you get the second. There is no need to run my training scripts anymore.
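The svd_error_test.py script itself did not survive the thread. As a hedged stand-in, a minimal shape check of the kind described can be written with NumPy, whose `numpy.linalg.svd` shares the `full_matrices` semantics of `torch.linalg.svd` (the input size here is illustrative, not the original):

```python
import numpy as np

# Illustrative input; the original script's tensor did not survive the thread.
a = np.random.rand(4, 6)

# Reduced SVD (full_matrices=False): with k = min(4, 6) = 4,
# u is (4, 4), s is (4,), vh is (4, 6).
u, s, vh = np.linalg.svd(a, full_matrices=False)
print(u.shape, s.shape, vh.shape)  # (4, 4) (4,) (4, 6)

# The factors should reconstruct the input.
assert np.allclose(u @ np.diag(s) @ vh, a)
```

On a backend with a correct lowering, the shapes and the reconstruction check should agree between CPU and the accelerator.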
Hello @wonjoolee95, do we have any progress?

Hello @wonjoolee95, do we have any progress?
Hey @mfatih7, apologies for the late reply -- I was able to reproduce this but unfortunately could not find any bandwidth to work on it. Would you be comfortable making the code edit? This is our op lowering doc that describes how ops work in PyTorch/XLA (https://github.com/pytorch/xla/blob/master/OP_LOWERING_GUIDE.md), and the code for this op is implemented at https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp#L3259. If not, I should be able to take some time next week to make the change.
Hi @wonjoolee95, I think it is better for me to wait for your update and then test it on my setup immediately.
Hi @wonjoolee95, have you found any time to work on this issue?

Do we have any update?
I was finally able to find some time last week and have a local branch. Let me push a PR by today.
Thank you for the answer. Could you let me know if you checked the scripts I supplied before? Do you think I should check with my training pipeline?
@mfatih7, I'm checking based on this set of code. I'll finish up and test against it to check the returned values. One thing I noticed while working on this is that we actually have a cpp unit test (https://github.com/pytorch/xla/blob/master/test/cpp/test_aten_xla_tensor.cpp#L919) that compares XLA results to PyTorch results, so it's a bit odd that we're seeing such a problem. I'm also cleaning up my dev env a bit; I'm seeing an error right now when I try to run torch.linalg.svd in my TPUVM.
On Colab, after initialization with the setup below, I am getting the same data with the wrong dimensions. The versions are listed below.
Are you missing the LAPACK package? Are you sure that your unit test code does not have any errors?
Thank you for noting the versions. My TPUVM dev env must have something messed up in installing/finding the LAPACK package while building PyTorch, as it still gives me that error even though I've manually tried to install it multiple times. I'm deleting this env and creating a new one. Also, could you confirm whether you see this type of error for other inputs as well, or whether it's specific to the input shape in the example? Regarding the unit tests, we compare the results of XLA ops with PyTorch, and we do this for this op as well.
I tried svd on PyTorch and PyTorch/XLA with the test script for the inputs below. With PyTorch I get the first set of output shapes; with PyTorch/XLA I get the second. Since the output dimensions do not match, I do not check their content.
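For reference, the shapes `torch.linalg.svd` documents for an m-by-n input, with k = min(m, n), are: U of (m, m), S of (k,), Vh of (n, n) when `full_matrices=True` (the default), and U of (m, k), S of (k,), Vh of (k, n) when `full_matrices=False`. A small helper encoding these documented rules (the helper name is mine, not from the thread):

```python
def expected_svd_shapes(m, n, full_matrices=True):
    """Documented (U, S, Vh) output shapes of torch.linalg.svd
    for an (m, n) input, with k = min(m, n)."""
    k = min(m, n)
    if full_matrices:
        return (m, m), (k,), (n, n)   # full SVD (the default)
    return (m, k), (k,), (k, n)       # reduced SVD

# For a 4x6 input:
print(expected_svd_shapes(4, 6))         # ((4, 4), (4,), (6, 6))
print(expected_svd_shapes(4, 6, False))  # ((4, 4), (4,), (4, 6))
```

Comparing a backend's actual output shapes against these rules makes a dimension mismatch like the one reported easy to pin down.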
Hello @wonjoolee95 and @JackCaoG, do you have any update on the lowering? Best regards.
Hello @wonjoolee95 and @JackCaoG, I am still getting the same error for the function. Before your update, I could run it slowly but mathematically correctly. Do you have any plans to update the function?
OK, here is the whole story, @JackCaoG, @wonjoolee95, and @mateuszlewko. To compute the singular value decomposition of a matrix in PyTorch, we have two alternatives, if not more. At first, the lowering was missing. After it was lowered within PyTorch 2.0, I realized that the dimensions of the outputs were wrong. Then I waited a long time for the correction. After I noticed this update, I decided to test both alternatives.
The output of the script is below. I realized that although the dimensions now look right for one alternative, the output dimensions of the other still do not match. To conclude, the problem is not fully resolved.
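On the two alternatives mentioned above: assuming they are `torch.svd` and `torch.linalg.svd` (the thread does not spell them out), one key difference is that `torch.svd` returns V while `torch.linalg.svd` returns Vh, its conjugate transpose. The relationship can be illustrated with NumPy, whose `numpy.linalg.svd` follows the `torch.linalg.svd` convention:

```python
import numpy as np

a = np.random.rand(5, 3)

# numpy.linalg.svd mirrors torch.linalg.svd: the third output is Vh (= V^H).
u, s, vh = np.linalg.svd(a, full_matrices=False)

# torch.svd, by contrast, returns V itself; the two are (conjugate)
# transposes of each other.
v = vh.conj().T

assert v.shape == (3, 3)
# Both conventions reconstruct the input: A = U diag(S) V^H.
assert np.allclose((u * s) @ vh, a)
```

So identical underlying factorizations can still surface with transposed third outputs depending on which API a backend implements.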
Hello,

`print(met.metrics_report())`

According to the metrics report printed after an XLA training session, I observe that the functions cannot be processed on the TPU. What should I do?
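The `met.metrics_report()` output flags exactly this situation: counters whose names start with `aten::` mark ops that fell back to CPU instead of being lowered to XLA. A small self-contained check over a sample report (the sample text and counter values below are illustrative, not the reporter's actual dump):

```python
import re

# Illustrative excerpt of a met.metrics_report() dump; "aten::" counters
# indicate ops executed on CPU via the fallback path.
report = """\
Counter: CreateXlaTensor
  Value: 120
Counter: aten::svd
  Value: 12
Counter: aten::nonzero
  Value: 3
"""

def cpu_fallback_ops(report_text):
    """Return the names of ops that ran through the aten:: CPU fallback."""
    return re.findall(r"Counter: (aten::\w+)", report_text)

print(cpu_fallback_ops(report))  # ['aten::svd', 'aten::nonzero']
```

An empty list from a real report means every op in the trace was compiled by XLA.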