Foreach clamp_min clamp_max #91384

Closed · wants to merge 15 commits into from

Conversation

@milesial (Contributor) commented Dec 26, 2022

Adds _foreach_clamp_min and _foreach_clamp_max as binary ops, with scalar, scalarlist, and tensorlist support.

Timing example for _foreach_clamp_min_ on an RTX 3070 Ti, across lists of tensors with varying tensor count and per-tensor size (all times in microseconds):

CUDA:

[------------------ (tensors, scalar) -------------------]
                                   |  for loop  |  foreach
      10 tensors of size 4         |     29.0   |     10.2
      100 tensors of size 4        |    234.4   |     18.3
      1000 tensors of size 4       |   2194.1   |    113.5
      10000 tensors of size 4      |  21745.6   |   1144.5
      10 tensors of size 16        |     29.5   |     12.0
      100 tensors of size 16       |    256.9   |     19.9
      1000 tensors of size 16      |   2499.7   |    123.6
      10000 tensors of size 16     |  25022.2   |   1295.6
      10 tensors of size 256       |     32.8   |     11.2
      100 tensors of size 256      |    258.8   |     19.7
      1000 tensors of size 256     |   2509.2   |    123.7
      10000 tensors of size 256    |  25016.2   |   1295.4
      10 tensors of size 65536     |     32.9   |     18.7
      100 tensors of size 65536    |    327.1   |    150.3
      1000 tensors of size 65536   |   3051.3   |   1388.0
      10000 tensors of size 65536  |  30476.9   |  14021.5

[------------------ (tensors, tensors) ------------------]
                                   |  for loop  |  foreach
      10 tensors of size 4         |     26.8   |     17.3
      100 tensors of size 4        |    206.8   |     90.5
      1000 tensors of size 4       |   1993.0   |    828.9
      10000 tensors of size 4      |  19851.0   |   9063.3
      10 tensors of size 16        |     34.7   |     20.0
      100 tensors of size 16       |    232.2   |    102.1
      1000 tensors of size 16      |   2220.9   |    977.3
      10000 tensors of size 16     |  22644.5   |  10361.4
      10 tensors of size 256       |     30.5   |     19.7
      100 tensors of size 256      |    231.6   |    102.4
      1000 tensors of size 256     |   2251.9   |    978.7
      10000 tensors of size 256    |  22680.3   |  10405.8
      10 tensors of size 65536     |     30.6   |     34.4
      100 tensors of size 65536    |    315.1   |    223.6
      1000 tensors of size 65536   |   3252.1   |   2114.4
      10000 tensors of size 65536  |  30578.0   |  22826.3

CPU:

[------------------- (tensors, scalar) -------------------]
                                   |  for loop  |  foreach 
      10 tensors of size 4         |      13.0  |       9.6
      100 tensors of size 4        |      62.4  |      31.6
      1000 tensors of size 4       |     562.2  |     245.6
      10000 tensors of size 4      |    5552.2  |    2517.7
      10 tensors of size 16        |      14.9  |      11.3
      100 tensors of size 16       |      74.1  |      36.9
      1000 tensors of size 16      |     663.7  |     285.5
      10000 tensors of size 16     |    6765.2  |    2947.5
      10 tensors of size 256       |      15.2  |      11.8
      100 tensors of size 256      |      76.0  |      37.7
      1000 tensors of size 256     |     728.8  |     323.9
      10000 tensors of size 256    |    7274.4  |    3800.3
      10 tensors of size 65536     |     105.6  |     124.5
      100 tensors of size 65536    |     982.8  |     939.7
      1000 tensors of size 65536   |   14993.1  |   14579.2
      10000 tensors of size 65536  |  163091.0  |  151555.8

[------------------- (tensors, tensors) ------------------]
                                   |  for loop  |  foreach 
      10 tensors of size 4         |      11.8  |      10.5
      100 tensors of size 4        |      53.1  |      38.2
      1000 tensors of size 4       |     465.1  |     316.1
      10000 tensors of size 4      |    4616.9  |    3625.9
      10 tensors of size 16        |      13.5  |      12.3
      100 tensors of size 16       |      63.0  |      46.5
      1000 tensors of size 16      |     560.1  |     359.9
      10000 tensors of size 16     |    5586.8  |    3765.9
      10 tensors of size 256       |      15.2  |      13.7
      100 tensors of size 256      |      64.4  |      48.3
      1000 tensors of size 256     |     653.7  |     410.0
      10000 tensors of size 256    |    5916.6  |    3901.3
      10 tensors of size 65536     |     109.1  |     106.8
      100 tensors of size 65536    |    1128.9  |    1105.0
      1000 tensors of size 65536   |   16245.0  |   15950.8
      10000 tensors of size 65536  |  171111.3  |  163540.2
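
Numbers like these can be gathered with torch.utils.benchmark. Below is a minimal sketch of such a harness; the exact script is not part of this PR, so the loop structure, helper name for_loop_clamp_min_, and timing settings here are illustrative assumptions:

import torch
import torch.utils.benchmark as benchmark

def for_loop_clamp_min_(tensors, scalar):
    # Baseline: one clamp_min_ kernel launch per tensor.
    for t in tensors:
        t.clamp_min_(scalar)

results = []
for n in (10, 100, 1000, 10000):
    for size in (4, 16, 256, 65536):
        tensors = [torch.randn(size, device='cuda') for _ in range(n)]
        for stmt, desc in (
            ('for_loop_clamp_min_(tensors, 0.1)', 'for loop'),
            ('torch._foreach_clamp_min_(tensors, 0.1)', 'foreach'),
        ):
            # Timer handles CUDA synchronization when timing GPU work.
            results.append(benchmark.Timer(
                stmt=stmt,
                globals={'torch': torch, 'tensors': tensors,
                         'for_loop_clamp_min_': for_loop_clamp_min_},
                label='(tensors, scalar)',
                sub_label=f'{n} tensors of size {size}',
                description=desc,
            ).blocked_autorange(min_run_time=1))

benchmark.Compare(results).print()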

Example use:

import torch

tensors = [torch.randn(16, device='cuda') for _ in range(10)]

# Out-of-place variants return a new list of tensors:
out = torch._foreach_clamp_min(tensors, 0.1)                   # scalar
out = torch._foreach_clamp_min(tensors, [0.1] * len(tensors))  # scalarlist
out = torch._foreach_clamp_min(tensors, tensors)               # tensorlist

# In-place variants modify `tensors` directly:
torch._foreach_clamp_min_(tensors, 0.1)
torch._foreach_clamp_min_(tensors, [0.1] * len(tensors))
torch._foreach_clamp_min_(tensors, tensors)

Complex types are not supported.
This also changes the existing _foreach_minimum/_foreach_maximum ops to use the new implementation.
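
Since clamp_min computes an elementwise max against the bound (and clamp_max an elementwise min), the new ops should agree with the existing foreach ops. A quick sanity check, assuming the ops from this PR are available:

import torch

tensors = [torch.randn(16) for _ in range(4)]
# clamp_min(x, v) is max(x, v) elementwise, so these should match.
a = torch._foreach_clamp_min(tensors, 0.1)
b = torch._foreach_maximum(tensors, 0.1)
assert all(torch.equal(x, y) for x, y in zip(a, b))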

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @Guobing-Chen @chunyuan-w @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

@pytorch-bot pytorch-bot bot added the release notes: foreach_frontend release notes category label Dec 26, 2022
linux-foundation-easycla bot commented Dec 26, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

pytorch-bot bot commented Dec 26, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/91384

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit f7b450f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@milesial milesial marked this pull request as draft December 26, 2022 16:24
@vadimkantorov (Contributor) commented

@cpuhrsch With the proliferation of foreach methods, would it be worth adding TensorList (as accepted by the foreach methods) as some sort of NestedTensor, or as a companion first-order TensorList structure?

@milesial (Contributor, Author) commented

I'm not happy with the duplication of this piece:

template <typename T>
struct clamp_min {
    // max(a, b), propagating a NaN in the input a
    __device__ T operator()(const T& a, const T& b) const { return _isnan(a) or a > b ? a : b; }
};

template <typename T>
struct clamp_max {
    // min(a, b), propagating a NaN in the input a
    __device__ T operator()(const T& a, const T& b) const { return _isnan(a) or a < b ? a : b; }
};

Where can I put it so that I can include it in the three .cu files?

@milesial milesial marked this pull request as ready for review December 27, 2022 00:22
@ngimel (Collaborator) commented Dec 27, 2022

> Where can I put it so that I can include it in the three .cu files?

Just create a new header file if none of the existing headers seem appropriate. The aten/native/cuda folder already contains a few headers, so one more is not a problem.
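
For example, something along these lines could work; the header name and exact location are hypothetical, and at::_isnan comes from ATen/NumericUtils.h:

// aten/src/ATen/native/cuda/ForeachMinMaxFunctors.cuh (hypothetical name)
#pragma once

#include <ATen/NumericUtils.h>  // at::_isnan

namespace at {
namespace native {

// clamp_min(a, b) == max(a, b); a NaN input propagates, as in torch.clamp.
template <typename T>
struct clamp_min {
  __device__ T operator()(const T& a, const T& b) const {
    return at::_isnan(a) || a > b ? a : b;
  }
};

// clamp_max(a, b) == min(a, b); a NaN input propagates, as in torch.clamp.
template <typename T>
struct clamp_max {
  __device__ T operator()(const T& a, const T& b) const {
    return at::_isnan(a) || a < b ? a : b;
  }
};

} // namespace native
} // namespace at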

@bdhirsh bdhirsh added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Dec 27, 2022
@github-actions github-actions bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Dec 31, 2022
@zou3519 zou3519 removed their request for review January 3, 2023 16:04
@milesial (Contributor, Author) commented Jan 4, 2023

@ngimel, this is ready for final review.

In the process of fixing tests, I added bool support to the regular clamp forward pass on CUDA, and bool + float16 support on CPU. I also expanded the nan/inf test to cover all foreach binary ops.

The Windows and multiprocessing test failures are unrelated.
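
For reference, a small illustration of the expanded dtype coverage; this is hedged, assuming bool clamp lands as described, with bool ordered False < True:

import torch

t = torch.tensor([True, False])
# Hypothetical post-PR behavior: clamp_min against True saturates
# everything to True, clamp_max against False saturates to False.
print(torch.clamp_min(t, True))   # tensor([True, True])
print(torch.clamp_max(t, False))  # tensor([False, False])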

@milesial milesial requested review from ngimel and removed request for mruberry January 5, 2023 18:01
@milesial milesial requested a review from ngimel January 6, 2023 08:39
@ngimel (Collaborator) left a comment

This looks great @milesial, let's see what CI says.

@ngimel (Collaborator) commented Jan 6, 2023

Test failure looks unrelated

@ngimel (Collaborator) commented Jan 6, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 6, 2023
@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

@milesial (Contributor, Author) commented Jan 9, 2023

@ngimel merge failed :/

@ngimel (Collaborator) commented Jan 9, 2023

@pytorchbot merge -f "test failure flaky"

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@DanilBaibak (Contributor) commented

@pytorchbot revert -m "Break internal build" -c ghfirst

@pytorchmergebot (Collaborator) commented

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator) commented

Reverting PR 91384 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit 9d20d6d5ec5c0ac5ff00e4967f480f07ba0bb2bf returned non-zero exit code 1

Auto-merging aten/src/ATen/native/ForeachOpsKernels.cpp
CONFLICT (content): Merge conflict in aten/src/ATen/native/ForeachOpsKernels.cpp
Auto-merging aten/src/ATen/native/native_functions.yaml
Auto-merging test/test_foreach.py
CONFLICT (content): Merge conflict in test/test_foreach.py
Auto-merging torch/testing/_internal/common_methods_invocations.py
CONFLICT (content): Merge conflict in torch/testing/_internal/common_methods_invocations.py
error: could not revert 9d20d6d5ec... Foreach clamp_min clamp_max (#91384)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
Details for Dev Infra team: raised by workflow job.


Labels: ciflow/trunk, Merged, module: cpu, module: inductor, open source, release notes: foreach_frontend, triaged

7 participants