
Conversation

@facebook-github-bot (Contributor) commented Mar 29, 2022

🔗 Helpful links

❌ 4 New Failures

As of commit 4ea7f36 (more details on the Dr. CI page):

  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 4, 4, linux.4xlarge.nvidia.gpu) (1/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T04:32:05.2304202Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 999, in sample_inputs
2022-06-06T04:32:05.2304683Z     conj_samples = self.conjugate_sample_inputs(device, dtype, requires_grad, **kwargs)
2022-06-06T04:32:05.2305320Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 972, in conjugate_sample_inputs
2022-06-06T04:32:05.2305750Z     conj_samples = list(samples)
2022-06-06T04:32:05.2306222Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T04:32:05.2306599Z     response = gen.send(None)
2022-06-06T04:32:05.2307183Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T04:32:05.2307736Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T04:32:05.2308358Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T04:32:05.2308807Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T04:32:05.2309172Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T04:32:05.2309642Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T04:32:05.2310081Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T04:32:05.2310296Z 
2022-06-06T04:32:05.2310570Z ----------------------------------------------------------------------
2022-06-06T04:32:05.2310904Z Ran 1029 tests in 113.072s
2022-06-06T04:32:05.2311068Z 
2022-06-06T04:32:05.2311212Z FAILED (errors=1, skipped=914, expected failures=9)
2022-06-06T04:32:05.2311414Z 
2022-06-06T04:32:05.2311542Z Generating XML reports...
2022-06-06T04:32:05.3680872Z Generated XML report: test-reports/python-unittest/test_ops_gradients/TEST-TestGradientsCUDA-20220606043011.xml
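
The failure above ends in torch.linalg.svd inside the shared sample-input helper, and the log itself suggests the debugging knob: because CUDA kernel errors are reported asynchronously, rerunning with CUDA_LAUNCH_BLOCKING=1 makes the illegal access surface at the offending launch rather than at a later API call. A minimal sketch of applying that suggestion locally (the shapes, dtype, and explicit synchronize below are illustrative assumptions, not taken from the test suite):

```python
# Sketch only: make CUDA kernel errors synchronous, as the log suggests.
# CUDA_LAUNCH_BLOCKING must be set before the CUDA context is created,
# so export it before any CUDA work is done.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# The traceback ends in torch.linalg.svd on a small random CUDA batch,
# so exercise the same call directly (sizes here are illustrative).
x = torch.rand(2, 5, 5, dtype=torch.float64, device="cuda")
u, s, vh = torch.linalg.svd(x, full_matrices=False)
torch.cuda.synchronize()  # force any pending kernel error to be raised here
print(u.shape, s.shape, vh.shape)
```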

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 4, linux.4xlarge.nvidia.gpu) (2/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:07:56.7520694Z   File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 156, in _lazy_call
2022-06-06T03:07:56.7521016Z     callable()
2022-06-06T03:07:56.7521437Z   File "/opt/conda/lib/python3.7/site-packages/torch/cuda/random.py", line 111, in cb
2022-06-06T03:07:56.7521804Z     default_generator.manual_seed(seed)
2022-06-06T03:07:56.7522158Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:07:56.7522611Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:07:56.7523059Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:07:56.7523273Z 
2022-06-06T03:07:56.7549863Z 
2022-06-06T03:07:56.7550411Z ======================================================================
2022-06-06T03:07:56.7551006Z FAIL [0.207s]: test_dtypes_cholesky_inverse_cuda (__main__.TestCommonCUDA)
2022-06-06T03:07:56.7552041Z ----------------------------------------------------------------------
2022-06-06T03:07:56.7552802Z Traceback (most recent call last):
2022-06-06T03:07:56.7554049Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 377, in instantiated_test
2022-06-06T03:07:56.7554690Z     result = test(self, **param_kwargs)
2022-06-06T03:07:56.7555225Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 821, in dep_fn
2022-06-06T03:07:56.7555614Z     return fn(slf, *args, **kwargs)
2022-06-06T03:07:56.7556115Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 821, in dep_fn
2022-06-06T03:07:56.7556493Z     return fn(slf, *args, **kwargs)
2022-06-06T03:07:56.7557007Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 786, in test_wrapper
2022-06-06T03:07:56.7557397Z     return test(*args, **kwargs)

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 4, linux.4xlarge.nvidia.gpu) (3/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:09:13.7635867Z   File "test_decomp.py", line 348, in test_comprehensive
2022-06-06T03:09:13.7636212Z     self.do_cross_ref(device, dtype, op, run_all=True)
2022-06-06T03:09:13.7636552Z   File "test_decomp.py", line 451, in do_cross_ref
2022-06-06T03:09:13.7636857Z     for sample_input in samples:
2022-06-06T03:09:13.7637339Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T03:09:13.7637701Z     response = gen.send(None)
2022-06-06T03:09:13.7638285Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T03:09:13.7638839Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T03:09:13.7639471Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T03:09:13.7639933Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T03:09:13.7640313Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:09:13.7640767Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:09:13.7641282Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:09:13.7641502Z 
2022-06-06T03:09:13.7641782Z ----------------------------------------------------------------------
2022-06-06T03:09:13.7642119Z Ran 837 tests in 307.433s
2022-06-06T03:09:13.7642264Z 
2022-06-06T03:09:13.7642412Z FAILED (errors=10, expected failures=30)
2022-06-06T03:09:13.7642604Z 
2022-06-06T03:09:13.7642729Z Generating XML reports...
2022-06-06T03:09:13.8434409Z Generated XML report: test-reports/python-unittest/test_decomp/TEST-TestDecompCUDA-20220606030406.xml

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 3, 4, linux.4xlarge.nvidia.gpu) (4/4)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-06-06T03:07:51.2881457Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1168, in wrapper
2022-06-06T03:07:51.2881811Z     fn(*args, **kwargs)
2022-06-06T03:07:51.2882105Z   File "test_meta.py", line 908, in test_dispatch_meta
2022-06-06T03:07:51.2882413Z     for sample_input in samples:
2022-06-06T03:07:51.2882900Z   File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 43, in generator_context
2022-06-06T03:07:51.2883277Z     response = gen.send(None)
2022-06-06T03:07:51.2883864Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_methods_invocations.py", line 6454, in sample_inputs_linalg_cholesky_inverse
2022-06-06T03:07:51.2884416Z     batch_well_conditioned_matrices = random_well_conditioned_matrix(2, S, S, dtype=dtype, device=device)
2022-06-06T03:07:51.2885041Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 2659, in random_well_conditioned_matrix
2022-06-06T03:07:51.2885487Z     u, _, vh = torch.linalg.svd(x, full_matrices=False)
2022-06-06T03:07:51.2885861Z RuntimeError: CUDA error: an illegal memory access was encountered
2022-06-06T03:07:51.2886335Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
2022-06-06T03:07:51.2886763Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2022-06-06T03:07:51.2886979Z 
2022-06-06T03:07:51.2887247Z ----------------------------------------------------------------------
2022-06-06T03:07:51.2887574Z Ran 837 tests in 308.402s
2022-06-06T03:07:51.2887738Z 
2022-06-06T03:07:51.2887864Z FAILED (errors=10, expected failures=30)
2022-06-06T03:07:51.2888051Z 
2022-06-06T03:07:51.2888173Z Generating XML reports...
2022-06-06T03:07:51.3666840Z Generated XML report: test-reports/python-unittest/test_meta/TEST-TestMetaCUDA-20220606030242.xml
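
All four excerpts fail on the same path: sample_inputs_linalg_cholesky_inverse builds its inputs through random_well_conditioned_matrix in torch/testing/_internal/common_utils.py, which factorizes a random tensor with torch.linalg.svd on the GPU, and that SVD call is where the illegal memory access is raised. For orientation only, here is a rough sketch of the idea behind such a helper; the function name, singular-value band, and sizes are assumptions for illustration, not the actual implementation:

```python
import torch

def well_conditioned_like(*shape, dtype=torch.float64, device="cuda"):
    # Sketch of the idea behind random_well_conditioned_matrix: factorize a
    # random matrix with SVD and rebuild it with singular values confined to a
    # narrow band, so the condition number stays small.
    x = torch.rand(*shape, dtype=dtype, device=device)
    u, _, vh = torch.linalg.svd(x, full_matrices=False)  # the call that fails in CI
    s = torch.empty(*shape[:-2], shape[-1], dtype=dtype, device=device).uniform_(1.0, 2.0)
    return (u * s.unsqueeze(-2)) @ vh

# Mirrors the call shape in the tracebacks: a batch of two S-by-S matrices (S = 5 here).
m = well_conditioned_like(2, 5, 5)
print(torch.linalg.cond(m))  # at most ~2 given the singular-value band chosen above
```

Whether the SVD itself or earlier device-memory corruption triggers the error is not determined by these logs; the sketch only isolates the code path the four failures share.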

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

…sumexp and adjust grain size"

[ghstack-poisoned]
@github-actions (Contributor)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label May 30, 2022
…sumexp and adjust grain size"

[ghstack-poisoned]
@mingfeima mingfeima added the intel priority (matters to intel architecture from performance wise) and intel (This tag is for PR from Intel) labels and removed the Stale label Jun 6, 2022
@frank-wei (Contributor)

@frank-wei has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@frank-wei (Contributor)

Please check the failing test cases.

@yanbing-j yanbing-j removed the intel priority (matters to intel architecture from performance wise) label Jul 13, 2022
@github-actions (Contributor)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Sep 11, 2022
@facebook-github-bot (Contributor)

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@zhuhaozhe zhuhaozhe closed this Oct 20, 2022
@zhuhaozhe zhuhaozhe reopened this Oct 20, 2022
@github-actions github-actions bot closed this Nov 19, 2022
…sumexp and adjust grain size"

Differential Revision: [D37441441](https://our.internmc.facebook.com/intern/diff/D37441441)

[ghstack-poisoned]
@mingfeima mingfeima removed the Stale label Feb 3, 2023
@mingfeima mingfeima reopened this Feb 3, 2023
@pytorch-bot bot commented Feb 3, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/74899

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Failure

As of commit 07bd9bf:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

mingfeima added a commit that referenced this pull request Feb 3, 2023
ghstack-source-id: cfd02ed
Pull Request resolved: #74899
@github-actions github-actions bot added the module: cpu (CPU specific problem (e.g., perf, algorithm)) label Feb 3, 2023
@github-actions bot (Contributor) commented Feb 3, 2023

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@mingfeima mingfeima closed this Feb 3, 2023
@facebook-github-bot facebook-github-bot deleted the gh/mingfeima/69/head branch June 8, 2023 18:00

Labels

cla signed · intel (This tag is for PR from Intel) · module: cpu (CPU specific problem (e.g., perf, algorithm)) · open source

Projects

Status: Done

6 participants