
Conversation

@facebook-github-bot (Contributor) commented Mar 29, 2022

🔗 Helpful links

❌ 4 New Failures

As of commit 4ea7f36 (more details on the Dr. CI page):

  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T04:32:05.2304202Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 999, in sample_inputs
2022-06-06T04:32:05.2304683Z     conj_samples = self.conjugate_sample_inputs(device, dtype, requires_grad, **kwargs)
2022-06-06T04:32:05.2305320Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 972, in conjugate_sample_inputs
2022-06-06T04:32:05.2305750Z     conj_samples = list(samples)
2022-06-06T04:32:05.2306222Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T04:32:05.2306599Z     response = gen.send(None)
2022-06-06T04:32:05.2307183Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T04:32:05.2307736Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T04:32:05.2308358Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T04:32:05.2308807Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T04:32:05.2309172Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T04:32:05.2309642Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T04:32:05.2310081Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T04:32:05.2310296Z 
2022-06-06T04:32:05.2310570Z ----------------------------------------------------------------------
2022-06-06T04:32:05.2310904Z Ran 1029 tests in 113.072s
2022-06-06T04:32:05.2311068Z 
2022-06-06T04:32:05.2311212Z FAILED (errors=1, skipped=914, expected failures=9)
2022-06-06T04:32:05.2311414Z 
2022-06-06T04:32:05.2311542Z Generating XML reports...
2022-06-06T04:32:05.3680872Z Generated XML report: test-reports/python-unittest/test_ops_gradients/TEST-TestGradientsCUDA-20220606043011.xml
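
The failure above ends in torch.linalg.svd inside the shared sample-input helper, and the log itself suggests the debugging knob: because CUDA kernel errors are reported asynchronously, rerunning with CUDA_LAUNCH_BLOCKING=1 makes the illegal access surface at the offending launch rather than at a later API call. A minimal sketch of applying that suggestion locally (the shapes, dtype, and explicit synchronize below are illustrative assumptions, not taken from the test suite):

```python
# Sketch only: make CUDA kernel errors synchronous, as the log suggests.
# CUDA_LAUNCH_BLOCKING must be set before the CUDA context is created,
# so export it before any CUDA work is done.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# The traceback ends in torch.linalg.svd on a small random CUDA batch,
# so exercise the same call directly (sizes here are illustrative).
x = torch.rand(2, 5, 5, dtype=torch.float64, device="cuda")
u, s, vh = torch.linalg.svd(x, full_matrices=False)
torch.cuda.synchronize()  # force any pending kernel error to be raised here
print(u.shape, s.shape, vh.shape)
```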

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu) (2/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:07:56.7520694Z   File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 156, in _lazy_call
2022-06-06T03:07:56.7521016Z     callable()
2022-06-06T03:07:56.7521437Z   File "/opt/conda/lib/python3.7/site-packages/torch/cuda/random.py", line 111, in cb
2022-06-06T03:07:56.7521804Z     default_generator.manual_seed(seed)
2022-06-06T03:07:56.7522158Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:07:56.7522611Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:07:56.7523059Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:07:56.7523273Z 
2022-06-06T03:07:56.7549863Z 
2022-06-06T03:07:56.7550411Z ======================================================================
2022-06-06T03:07:56.7551006Z FAIL [0.207s]: test_dtypes_cholesky_inverse_cuda (__main__.TestCommonCUDA)
2022-06-06T03:07:56.7552041Z ----------------------------------------------------------------------
2022-06-06T03:07:56.7552802Z Traceback (most recent call last):
2022-06-06T03:07:56.7554049Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 377, in instantiated_test
2022-06-06T03:07:56.7554690Z     result = test(self, **param_kwargs)
2022-06-06T03:07:56.7555225Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 821, in dep_fn
2022-06-06T03:07:56.7555614Z     return fn(slf, *args, **kwargs)
2022-06-06T03:07:56.7556115Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 821, in dep_fn
2022-06-06T03:07:56.7556493Z     return fn(slf, *args, **kwargs)
2022-06-06T03:07:56.7557007Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 786, in test_wrapper
2022-06-06T03:07:56.7557397Z     return test(*args, **kwargs)

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:09:13.7635867Z   File "test_decomp.py", line 348, in test_comprehensive
2022-06-06T03:09:13.7636212Z     self.do_cross_ref(device, dtype, op, run_all=True)
2022-06-06T03:09:13.7636552Z   File "test_decomp.py", line 451, in do_cross_ref
2022-06-06T03:09:13.7636857Z     for sample_input in samples:
2022-06-06T03:09:13.7637339Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T03:09:13.7637701Z     response = gen.send(None)
2022-06-06T03:09:13.7638285Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T03:09:13.7638839Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T03:09:13.7639471Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T03:09:13.7639933Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T03:09:13.7640313Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:09:13.7640767Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:09:13.7641282Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:09:13.7641502Z 
2022-06-06T03:09:13.7641782Z ----------------------------------------------------------------------
2022-06-06T03:09:13.7642119Z Ran 837 tests in 307.433s
2022-06-06T03:09:13.7642264Z 
2022-06-06T03:09:13.7642412Z FAILED (errors=10, expected failures=30)
2022-06-06T03:09:13.7642604Z 
2022-06-06T03:09:13.7642729Z Generating XML reports...
2022-06-06T03:09:13.8434409Z Generated XML report: test-reports/python-unittest/test_decomp/TEST-TestDecompCUDA-20220606030406.xml

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:07:51.2881457Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1168, in wrapper
2022-06-06T03:07:51.2881811Z     fn(*args, **kwargs)
2022-06-06T03:07:51.2882105Z   File "test_meta.py", line 908, in test_dispatch_meta
2022-06-06T03:07:51.2882413Z     for sample_input in samples:
2022-06-06T03:07:51.2882900Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T03:07:51.2883277Z     response = gen.send(None)
2022-06-06T03:07:51.2883864Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T03:07:51.2884416Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T03:07:51.2885041Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T03:07:51.2885487Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T03:07:51.2885861Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:07:51.2886335Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:07:51.2886763Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:07:51.2886979Z 
2022-06-06T03:07:51.2887247Z ----------------------------------------------------------------------
2022-06-06T03:07:51.2887574Z Ran 837 tests in 308.402s
2022-06-06T03:07:51.2887738Z 
2022-06-06T03:07:51.2887864Z FAILED (errors=10, expected failures=30)
2022-06-06T03:07:51.2888051Z 
2022-06-06T03:07:51.2888173Z Generating XML reports...
2022-06-06T03:07:51.3666840Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220606030242.xml
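
All four excerpts fail on the same path: sample_inputs_linalg_cholesky_inverse builds its inputs through random_well_conditioned_matrix in torch/testing/_internal/common_utils.py, which factorizes a random tensor with torch.linalg.svd on the GPU, and that SVD call is where the illegal memory access is raised. For orientation only, here is a rough sketch of the idea behind such a helper; the function name, singular-value band, and sizes are assumptions for illustration, not the actual implementation:

```python
import torch

def well_conditioned_like(*shape, dtype=torch.float64, device="cuda"):
    # Sketch of the idea behind random_well_conditioned_matrix: factorize a
    # random matrix with SVD and rebuild it with singular values confined to a
    # narrow band, so the condition number stays small.
    x = torch.rand(*shape, dtype=dtype, device=device)
    u, _, vh = torch.linalg.svd(x, full_matrices=False)  # the call that fails in CI
    s = torch.empty(*shape[:-2], shape[-1], dtype=dtype, device=device).uniform_(1.0, 2.0)
    return (u * s.unsqueeze(-2)) @ vh

# Mirrors the call shape in the tracebacks: a batch of two S-by-S matrices (S = 5 here).
m = well_conditioned_like(2, 5, 5)
print(torch.linalg.cond(m))  # at most ~2 given the singular-value band chosen above
```

Whether the SVD itself or earlier device-memory corruption triggers the error is not determined by these logs; the sketch only isolates the code path the four failures share.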

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

…sumexp and adjust grain size"

[ghstack-poisoned]
@github-actions (Contributor)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 30, 2022
…sumexp and adjust grain size"

[ghstack-poisoned]
@mingfeima mingfeima added the intel priority (matters to intel architecture from performance wise) and intel (This tag is for PR from Intel) labels and removed the Stale label Jun 6, 2022
@frank-wei (Contributor)

@frank-wei has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@frank-wei (Contributor)

Please check the failing test cases.

@yanbing-j yanbing-j removed the intel priority (matters to intel architecture from performance wise) label Jul 13, 2022
@github-actions (Contributor)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Sep 11, 2022
@facebook-github-bot (Contributor)

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhuhaozhe zhuhaozhe closed this Oct 20, 2022
@zhuhaozhe zhuhaozhe reopened this Oct 20, 2022
@github-actions github-actions bot closed this Nov 19, 2022
…sumexp and adjust grain size"

Differential Revision: [D37441441](https://our.internmc.facebook.com/intern/diff/D37441441)

[ghstack-poisoned]
@mingfeima mingfeima removed the Stale label Feb 3, 2023
@mingfeima mingfeima reopened this Feb 3, 2023
@pytorch-bot bot commented Feb 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74899

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 07bd9bf:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mingfeima added a commit that referenced this pull request Feb 3, 2023
ghstack-source-id: cfd02ed
Pull Request resolved: #74899
@github-actions github-actions bot added the module: cpu (CPU specific problem (e.g., perf, algorithm)) label Feb 3, 2023
@github-actions bot (Contributor) commented Feb 3, 2023

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@mingfeima mingfeima closed this Feb 3, 2023
@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/69/head branch June 8, 2023 18:00

Labels

cla signed · intel (This tag is for PR from Intel) · module: cpu (CPU specific problem (e.g., perf, algorithm)) · open source

Projects

Status: Done

6 participants