pearu
Collaborator

@pearu pearu commented Feb 20, 2022

@pytorch-bot

pytorch-bot bot commented Feb 20, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/77ee5c7554b5d13fa23b2bb6c62cd24d703f68d2/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/scheduled 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Contributor

facebook-github-bot commented Feb 20, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 32173a2 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge) (1/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-02-25T17:21:14.1353105Z NameError: name 'parse_args' is not defined
2022-02-25T17:21:14.1261761Z 
2022-02-25T17:21:14.1321778Z  test/onnx/test_utility_funs.py::TestUtilityFuns_opset15.test_aten_fallthrough ✓ 43% ████▍
2022-02-25T17:21:14.1322168Z 
2022-02-25T17:21:14.1325789Z 
2022-02-25T17:21:14.1350287Z  test/onnx/test_utility_funs.py::TestUtilityFuns_opset15.test_autograd_onnx_fallthrough ✓ 43% ████▍
2022-02-25T17:21:14.1350931Z 
2022-02-25T17:21:14.1351297Z ―――――――――――― TestUtilityFuns_opset15.test_bad_symbolic_registration ――――――――――――
2022-02-25T17:21:14.1351717Z Traceback (most recent call last):
2022-02-25T17:21:14.1352199Z   File "/var/lib/jenkins/workspace/test/onnx/test_utility_funs.py", line 1277, in test_bad_symbolic_registration
2022-02-25T17:21:14.1352623Z     @parse_args("v")
2022-02-25T17:21:14.1353105Z NameError: name 'parse_args' is not defined
2022-02-25T17:21:14.1353325Z 
2022-02-25T17:21:14.1353332Z 
2022-02-25T17:21:14.1355904Z 
2022-02-25T17:21:14.1714796Z  test/onnx/test_utility_funs.py::TestUtilityFuns_opset15.test_bad_symbolic_registration ⨯ 43% ████▍
2022-02-25T17:21:14.1715209Z 
2022-02-25T17:21:14.1718885Z 
2022-02-25T17:21:14.2635825Z  test/onnx/test_utility_funs.py::TestUtilityFuns_opset15.test_constant_fold_add ✓ 43% ████▍
2022-02-25T17:21:14.2636085Z 
2022-02-25T17:21:14.2639874Z 
2022-02-25T17:21:14.2945440Z  test/onnx/test_utility_funs.py::TestUtilityFuns_opset15.test_constant_fold_concat ✓ 44% ████▍

See GitHub Actions build win-vs2019-cuda11.3-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (2/2)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-02-25T18:33:04.6087551Z test_add_done_ca...arg() takes 0 positional arguments but 1 was given
2022-02-25T18:33:04.6071384Z 
2022-02-25T18:33:04.6072354Z For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
2022-02-25T18:33:04.6073521Z   warnings.warn(errors.NumbaWarning(msg))
2022-02-25T18:33:04.6074376Z C:\Jenkins\Miniconda3\lib\site-packages\numba\cuda\envvars.py:17: NumbaWarning:
2022-02-25T18:33:04.6075791Z Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\nvvm\libdevice.
2022-02-25T18:33:04.6076741Z 
2022-02-25T18:33:04.6077688Z For more information about alternatives visit: ('https://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
2022-02-25T18:33:04.6078846Z   warnings.warn(errors.NumbaWarning(msg))
2022-02-25T18:33:04.6079353Z ok (0.787s)
2022-02-25T18:33:04.6079965Z   test_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.016s)
2022-02-25T18:33:04.6087551Z   test_add_done_callback_no_arg_error_is_ignored (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: TypeError: no_arg() takes 0 positional arguments but 1 was given
2022-02-25T18:33:04.6088711Z ok (0.000s)
2022-02-25T18:33:04.6107360Z   test_add_done_callback_simple (__main__.TestFuture) ... ok (0.000s)
2022-02-25T18:33:04.6173997Z   test_chained_then (__main__.TestFuture) ... ok (0.000s)
2022-02-25T18:33:04.7205985Z   test_collect_all (__main__.TestFuture) ... ok (0.114s)
2022-02-25T18:33:04.7218285Z   test_done (__main__.TestFuture) ... ok (0.000s)
2022-02-25T18:33:04.7238435Z   test_done_exception (__main__.TestFuture) ... ok (0.000s)
2022-02-25T18:33:04.7268613Z   test_interleaving_then_and_add_done_callback_maintains_callback_order (__main__.TestFuture) ... ok (0.000s)
2022-02-25T18:33:04.7284640Z   test_interleaving_then_and_add_done_callback_propagates_error (__main__.TestFuture) ... [E pybind_utils.h:201] Got the following error when running the callback: ValueError: Expected error
2022-02-25T18:33:04.7285800Z 
2022-02-25T18:33:04.7286099Z At:

1 failure not recognized by patterns:

Job Step Action
GitHub Actions linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) Unknown 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.


pearu added a commit that referenced this pull request Feb 20, 2022
…st failures

ghstack-source-id: 537dfdd
Pull Request resolved: #73155
@pearu pearu self-assigned this Feb 20, 2022
@pearu pearu added the labels module: sparse (Related to torch.sparse), open source, release notes: sparse (release notes category), and topic: not user facing (topic category) Feb 20, 2022
@pearu pearu linked an issue Feb 20, 2022 that may be closed by this pull request
@pearu pearu requested review from cpuhrsch and suo February 20, 2022 14:12
@suo suo removed their request for review February 21, 2022 14:45
@suo
Member

suo commented Feb 21, 2022

un-requesting myself as I will be out for the week

@cpuhrsch
Contributor

Bumping up the relative and absolute error limits is far from satisfying to resolve issues like this. Maybe we should look into seeding our tests so that we always run on the same data, or at least pick a seed at random and record it so we can create deterministically reproducible failures.
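
A minimal sketch of the "pick a seed at random and record it" idea, using only the standard library so it runs anywhere. The decorator name `with_recorded_seed` and the `seed=` replay argument are hypothetical and not part of PyTorch's test utilities; a real PyTorch test would presumably call `torch.manual_seed(seed)` where this sketch calls `random.seed(seed)`.

```python
import functools
import random


def with_recorded_seed(test_fn):
    """Run test_fn under a freshly drawn seed and report that seed on failure."""

    @functools.wraps(test_fn)
    def wrapper(*args, seed=None, **kwargs):
        # Draw a new seed unless the caller is replaying a recorded one.
        if seed is None:
            seed = random.SystemRandom().randrange(2**32)
        random.seed(seed)  # stand-in for torch.manual_seed(seed)
        try:
            return test_fn(*args, **kwargs)
        except AssertionError as exc:
            # Attach the seed so the failure can be replayed with seed=<value>.
            raise AssertionError(f"{exc} (reproduce with seed={seed})") from exc

    return wrapper


@with_recorded_seed
def check_sum_close():
    # Stand-in for a numerical comparison with a tight tolerance.
    values = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    assert abs(sum(values) - sum(reversed(values))) < 1e-9
    return values[0]
```

Calling `check_sum_close()` draws a new seed on every run, while `check_sum_close(seed=...)` replays the data of a recorded failure.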

@cpuhrsch
Contributor

@cpuhrsch has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@pearu
Collaborator Author

pearu commented Feb 23, 2022

> Bumping up the relative and absolute error limits is far from satisfying to resolve issues like this. Maybe we should look into seeding our tests so that we always run on the same data, or at least pick a seed at random and record it so we can create deterministically reproducible failures.

I agree these issues of flaky tests are annoying.

There is a wrapper_set_seed function, but its usage looks very verbose and developers may miss the need for it. So I think it would be more reliable to reset the seed semi-automatically, say, via a dtype check in the OpInfo.sample_inputs method, or to introduce a manual_seed attribute on OpInfo so that developers can specify the seed when defining OpInfo instances.

@mruberry, any advice on the method of resetting the random seed for a particular set of tests or avoiding seed-related interference between unrelated tests?
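
The `manual_seed` attribute proposal could be sketched roughly as follows. `OpInfoSketch` is a hypothetical, much-reduced stand-in and not the real `OpInfo` class; the real `OpInfo.sample_inputs` takes more arguments (device, dtype, and so on), and `manual_seed` is the proposed attribute, not an existing field. The standard-library `random` module stands in for torch's generator.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OpInfoSketch:
    """Hypothetical stand-in for an OpInfo entry with a per-op seed."""

    name: str
    sample_fn: Callable[[], List[float]]
    manual_seed: Optional[int] = None  # proposed attribute, assumed here

    def sample_inputs(self) -> List[float]:
        # Reset the generator before sampling so that earlier, unrelated
        # tests cannot influence which inputs this op is tested with.
        if self.manual_seed is not None:
            random.seed(self.manual_seed)  # stand-in for torch.manual_seed
        return self.sample_fn()


def make_values() -> List[float]:
    return [random.gauss(0.0, 1.0) for _ in range(4)]


seeded = OpInfoSketch("addmm", make_values, manual_seed=1234)
first = seeded.sample_inputs()
random.random()  # an unrelated test consuming random numbers in between
second = seeded.sample_inputs()  # same values as `first`
```

Because the seed is reset inside `sample_inputs`, the samples no longer depend on test ordering, which addresses the interference question without requiring a wrapper at every call site.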

@cpuhrsch
Contributor

@pearu - Merging PRs has been a bit hellish. Could you try rebasing this stack please?

….float16 test failures"

Fixes #73145


Differential Revision: [D34398935](https://our.internmc.facebook.com/intern/diff/D34398935)

[ghstack-poisoned]
@pearu
Collaborator Author

pearu commented Feb 23, 2022

@cpuhrsch The problem is that PR #72397 was backed out by 5dad19f.

This kind of workflow, where test flakiness that depends on the random seed pushes out reasonable PRs, can be frustrating to developers.

@cpuhrsch
Contributor

@pearu - I agree. We should have infrastructure that allows developers to opt into seeded tests. But we have to be careful not to encourage seed hacking. You can try a thousand seeds and it'll eventually succeed.

@pearu
Collaborator Author

pearu commented Feb 23, 2022

> You can try a thousand seeds and it'll eventually succeed.

Yeah, the solution should reverse this situation: one can try a thousand seeds and it'll eventually fail.

How about this: if one wants to specify the seed manually for testing, then they must specify, say, at least 3 different seed values.
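
The "at least 3 different seed values" rule might look like the sketch below. The decorator name `with_seeds` and the constant `MIN_SEEDS` are hypothetical; nothing like this is confirmed to exist in PyTorch's test suite, and `random.seed` again stands in for `torch.manual_seed`.

```python
import functools
import random

MIN_SEEDS = 3  # threshold suggested in the comment above


def with_seeds(*seeds):
    """Run the decorated test once per seed; refuse fewer than MIN_SEEDS distinct seeds."""
    distinct = len(set(seeds))
    if distinct < MIN_SEEDS:
        raise ValueError(f"need at least {MIN_SEEDS} distinct seeds, got {distinct}")

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for seed in seeds:
                random.seed(seed)  # stand-in for torch.manual_seed(seed)
                try:
                    test_fn(*args, **kwargs)
                except AssertionError as exc:
                    # Name the seed that broke, so the failure can be replayed.
                    raise AssertionError(f"failed for seed={seed}: {exc}") from exc

        return wrapper

    return decorator


runs = []


@with_seeds(0, 1, 2)
def check_mean_is_small():
    values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]
    runs.append(sum(values) / len(values))
    assert abs(runs[-1]) < 0.1
```

Requiring several seeds makes seed hacking more expensive: a test has to pass for every listed seed, so adding seeds can only make it fail more often, never less.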

@cpuhrsch
Contributor

@pearu - I was thinking we could use a fixed seed as a fallback when a test fails, and note down the seed and failing test to create an issue. That way we can investigate without blocking development. The fixed seed with very strict atol and rtol will still prevent gross mistakes, but we can subsequently investigate whether the flakiness is random or due to some underlying change.
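
One possible reading of this fallback scheme, as a sketch: run under a random seed first, and on failure record the seed and test name, then rerun under a fixed seed. The names `run_with_fallback`, `FALLBACK_SEED`, and `flaky_log` are hypothetical, the thread does not name a specific fixed seed, and appending to a list stands in for creating an issue.

```python
import random

FALLBACK_SEED = 0  # assumed value; the thread does not specify one
flaky_log = []     # stand-in for filing an issue with (test name, seed)


def run_with_fallback(test_fn, name, seed=None):
    """Return 'passed' or 'flaky'; re-raise if the fixed seed fails too."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    random.seed(seed)  # stand-in for torch.manual_seed(seed)
    try:
        test_fn()
        return "passed"
    except AssertionError:
        # Note down the seed and the failing test for later investigation.
        flaky_log.append((name, seed))
    # Rerun on the fixed seed; a failure here is treated as a real
    # regression and propagates to the caller.
    random.seed(FALLBACK_SEED)
    test_fn()
    return "flaky"
```

A "flaky" result does not block the run but leaves a record to investigate, while a failure under the fixed seed still fails the test.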

@cpuhrsch
Contributor

@cpuhrsch has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@cpuhrsch
Contributor

@pearu - could you try rebasing this again?

….float16 test failures"

Fixes #73145


Differential Revision: [D34398935](https://our.internmc.facebook.com/intern/diff/D34398935)

[ghstack-poisoned]
@cpuhrsch
Contributor

@cpuhrsch has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Feb 26, 2022
…st failures (#73155)

Summary:
Pull Request resolved: #73155

Fixes #73145

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34398935

Pulled By: cpuhrsch

fbshipit-source-id: b1e852f25b0888b37d9c9c1418ddf344ac8f0a04
pearu added a commit that referenced this pull request Feb 27, 2022
pearu added a commit that referenced this pull request Feb 27, 2022
@facebook-github-bot facebook-github-bot deleted the gh/pearu/37/head branch March 1, 2022 15:17
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
…st failures (#73155)

Summary:
Pull Request resolved: pytorch/pytorch#73155

Fixes #73145

Test Plan: Imported from OSS

Reviewed By: mikaylagawarecki

Differential Revision: D34398935

Pulled By: cpuhrsch

fbshipit-source-id: b1e852f25b0888b37d9c9c1418ddf344ac8f0a04
(cherry picked from commit d63c977fb39c7dcb3f3d083edc4b25cd2d6c2ec4)
Development

Successfully merging this pull request may close these issues.

DISABLED test_sparse_addmm_cpu_bfloat16 (__main__.TestSparseCPU)

4 participants