
Roll-up: remaining TH functions #49421

Closed · 14 tasks done
ngimel opened this issue Dec 15, 2020 · 6 comments
Labels: better-engineering, module: linear algebra, module: porting, triaged

Comments

ngimel (Collaborator) commented Dec 15, 2020

Not to be confused with THC/THNN/THCUNN, where the list is bigger.
Based on native_functions.yaml:

Most of them are linear algebra and should be ported. `_multinomial_alias_setup` is likely not used. We should get rid of them; we can do it!

cc @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr
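For context, a hedged sketch of how such a list could be regenerated from native_functions.yaml; the file path and the `legacy::` dispatch naming reflect that era's conventions and are assumptions here, not details given in this issue:

```python
import yaml  # requires PyYAML

# Hypothetical scan: print every operator whose dispatch entry still
# points at a legacy TH kernel (historically prefixed with "legacy::").
with open("aten/src/ATen/native/native_functions.yaml") as f:
    for entry in yaml.safe_load(f):
        dispatch = entry.get("dispatch") or {}
        if any("legacy::" in str(kernel) for kernel in dispatch.values()):
            print(entry["func"])
```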

ngimel added the module: linear algebra and better-engineering labels Dec 15, 2020
mruberry added the module: porting label Dec 15, 2020
ngimel added the triaged label Dec 15, 2020
nikitaved (Collaborator) commented Jan 13, 2021

I would take index_fill_. Maybe it will fix #48230.
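For reference, a minimal sketch of the op being claimed (shapes and values here are illustrative):

```python
import torch

# index_fill_ fills, in place, the slices selected along `dim` by an index tensor.
x = torch.zeros(3, 4)
idx = torch.tensor([0, 2])
x.index_fill_(1, idx, 5.0)  # fill columns 0 and 2 with 5.0
print(x)
# tensor([[5., 0., 5., 0.],
#         [5., 0., 5., 0.],
#         [5., 0., 5., 0.]])
```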

Regarding gels, there is torch.linalg.lstsq on its way. What would be the best way to handle its port?

ngimel (Collaborator, Author) commented Jan 13, 2021

> Regarding gels, there is torch.linalg.lstsq on its way. What would be the best way to handle its port?

First port gels to ATen (similarly to how it's done for cholesky_inverse), then expose as torch.linalg.lstsq?
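For reference, a minimal sketch of the intended end state using the torch.linalg.lstsq API from #49093 (availability depends on the build):

```python
import torch

A = torch.randn(5, 3)
B = torch.randn(5, 2)
# Solves min ||A @ X - B|| in the least-squares sense; returns a named tuple.
X = torch.linalg.lstsq(A, B).solution  # shape (3, 2)
# At the least-squares solution, the residual is orthogonal to range(A):
print(torch.allclose(A.T @ (A @ X - B), torch.zeros(3, 2), atol=1e-4))
```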

nikitaved (Collaborator) commented Jan 13, 2021

torch.linalg.lstsq is ready (#49093). AFAIK, gels is only used by torch.lstsq. But calling torch.linalg from within torch is not a good idea, right? What about deprecation or rewrapping?

mruberry (Collaborator) commented:

> torch.linalg.lstsq is ready (#49093). AFAIK, gels is only used by torch.lstsq. But calling torch.linalg from within torch is not a good idea, right? What about deprecation or rewrapping?

Calling torch.linalg from the torch namespace seems fine. We should look at deprecating lstsq once torch.linalg.lstsq is implemented and has feature parity, and then we should consider it as part of our "doc deprecations" for linalg ahead of the 1.8 release.
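A hedged sketch of the rewrapping option discussed above: keep the old torch.lstsq surface but forward to the new implementation with a warning. The shim name and message are illustrative, not the deprecation code PyTorch actually shipped:

```python
import warnings
import torch

def lstsq_compat(input, A):
    # Illustrative shim only. Note the argument order differs:
    # legacy torch.lstsq(B, A) solves min ||A @ X - B||, while
    # torch.linalg.lstsq takes (A, B).
    warnings.warn(
        "torch.lstsq is deprecated; use torch.linalg.lstsq instead.",
        DeprecationWarning,
    )
    return torch.linalg.lstsq(A, input).solution
```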

nikitaved reopened this Jan 19, 2021
facebook-github-bot pushed a commit that referenced this issue Jan 25, 2021
Summary:
Now we can remove `_th_orgqr`!

Compared to the original TH-based `orgqr`, complex (#33152) and batched inputs are now supported.
CUDA support will be added in a follow-up PR.

Closes #24747

Ref. #49421, #42666

Pull Request resolved: #50502

Reviewed By: mrshenli

Differential Revision: D25953300

Pulled By: mruberry

fbshipit-source-id: f52a74e1c8f51b5e24f7b461430ca8fc96e4d149
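For reference, a minimal sketch of the ported op in the 2-D case (per the commit message, complex and batched inputs also work, with CUDA support to follow):

```python
import torch

A = torch.randn(5, 3)
a, tau = torch.geqrf(A)   # compact QR: Householder reflectors + scaling factors
Q = torch.orgqr(a, tau)   # materialize Q explicitly from the reflectors
# Q has orthonormal columns:
print(torch.allclose(Q.T @ Q, torch.eye(3), atol=1e-5))
```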
facebook-github-bot pushed a commit that referenced this issue Jan 29, 2021
Summary:
Now we can remove `_th_potri`!

Compared to the original TH-based `cholesky_inverse`, complex (#33152) and batched inputs (#7500) are now supported both on CPU and CUDA.

Closes #24685.
Closes #24543.

Ref. #49421, #42666

Pull Request resolved: #50269

Reviewed By: bdhirsh

Differential Revision: D26047548

Pulled By: anjali411

fbshipit-source-id: e4f191e39c684f241b7cb0f4b4c025de082cccef
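Similarly, a minimal sketch of the ported op; torch.linalg.cholesky is assumed available (older builds spell it torch.cholesky):

```python
import torch

# Invert a symmetric positive-definite matrix via its Cholesky factor.
A = torch.randn(3, 3)
A = A @ A.T + 3 * torch.eye(3)       # make A positive definite
L = torch.linalg.cholesky(A)         # lower-triangular factor
A_inv = torch.cholesky_inverse(L)    # consumes the factor, not A itself
print(torch.allclose(A_inv, torch.inverse(A), atol=1e-4))
```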
ngimel (Collaborator, Author) commented May 6, 2021

cc @heitorschueroff for _th_var, _th_std. There are implementations in ATen already, but, like all non-vectorized reductions, they are much slower, so an attempt to directly swap implementations caused severe perf regressions.
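A hedged sketch of how such a comparison might be timed with torch.utils.benchmark (not the exact script from gh-43858; the sizes mirror the tables below):

```python
import torch
from torch.utils import benchmark

for n in (8, 8_000, 800_000, 8_000_000):
    x = torch.randn(n)
    t = benchmark.Timer(
        stmt="torch.var(x)",
        globals={"torch": torch, "x": x},
        num_threads=1,
    )
    print(n, t.blocked_autorange())
```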

peterbell10 self-assigned this Jun 1, 2021
facebook-github-bot pushed a commit that referenced this issue Jun 3, 2021
Summary:
Ref #49421

This migrates `std`/`var`'s special-case all-reduction from TH to ATen. Using the benchmark from gh-43858 that was used to justify keeping the TH version, I find this PR has similar (slightly better) single-threaded performance. And unlike the TH version, this path is multi-threaded, so it is much faster for large tensors.

TH Results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       3.6   |       3.8    |     8.2  |      1.2
      80       |       3.7   |       3.8    |     8.4  |      1.2
      800      |       4.2   |       4.3    |     8.7  |      1.2
      8000     |       9.0   |       9.1    |    11.2  |      1.5
      80000    |      58.3   |      59.0    |    30.6  |      4.2
      800000   |     546.9   |     546.9    |   183.4  |     31.3
      8000000  |    5729.7   |    5701.0    |  6165.4  |    484.1
```

ATen results:
```
[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
1 threads: ---------------------------------------------------------
      8        |       4.0   |       4.0    |     8.7  |      1.2
      80       |       3.6   |       3.8    |     9.0  |      1.2
      800      |       4.1   |       4.3    |     8.9  |      1.2
      8000     |       8.9   |       9.2    |    10.6  |      1.5
      80000    |      57.0   |      57.4    |    28.8  |      4.3
      800000   |     526.9   |     526.9    |   178.3  |     30.2
      8000000  |    5568.1   |    5560.6    |  6042.1  |    453.2

[----------------------------- Index ------------------------------]
               |  torch_var  |  torch_var0  |  stdfn   |  torch_sum0
8 threads: ---------------------------------------------------------
      8        |      3.9    |      3.8     |     9.1  |      1.2
      80       |      3.8    |      3.9     |     8.8  |      1.2
      800      |      4.2    |      4.3     |     8.9  |      1.3
      8000     |      9.0    |      9.2     |    10.4  |      1.5
      80000    |     26.0    |     26.8     |    26.4  |      4.4
      800000   |     92.9    |     87.3     |    72.1  |     22.4
      8000000  |    793.5    |    791.8     |  5334.8  |    115.1
```

Pull Request resolved: #59258

Reviewed By: mruberry

Differential Revision: D28821216

Pulled By: ngimel

fbshipit-source-id: f35992c21f08a0a8878053680dc0ca7a8facd155
facebook-github-bot pushed a commit that referenced this issue Jun 4, 2021 (same #59258 message as above; Reviewed By: jbschlosser, Differential Revision: D28860353, Pulled By: ngimel, fbshipit-source-id: 80c9fe1db84dbc864eeb1a319076c7aaff0a04e5)
deniskokarev pushed two commits to deniskokarev/pytorch that referenced this issue Jun 9, 2021 (same pytorch#59258 message as above)
peterbell10 (Collaborator) commented:
LegacyFunctionsCPU is now gone, as of #58780.
