malfet (Contributor) commented May 6, 2022

Stack from ghstack:

- Use double quotes for Python string literals (per https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#strings )
- Do not post a comment if `merge_on_green` is called with the `dry_run` argument
- Dismantle pyramid of doom (and avoid infinite while) by specifying
  timeout as exit criteria
- Use `handle_exception` for revert workflow as well
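
For illustration only, the timeout and `dry_run` bullets above can be sketched as a flat polling loop. This is a minimal sketch, not the code from this PR: `merge_when_green`, `try_merge`, `post_comment`, `poll_seconds`, and the injectable `clock`/`sleep` parameters are hypothetical names, and treating `RuntimeError` as "not mergeable yet" is an assumption.

```python
import time
from typing import Callable, Optional


def merge_when_green(
    try_merge: Callable[[], None],
    post_comment: Callable[[str], None],
    timeout_seconds: float,
    poll_seconds: float = 60.0,
    dry_run: bool = False,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry try_merge until it succeeds or timeout_seconds elapse.

    Returns True on success and False on timeout.
    """
    start = clock()
    last_exception: Optional[Exception] = None
    elapsed_time = 0.0
    # The timeout is the loop's exit criterion, so there is no `while True`.
    while elapsed_time < timeout_seconds:
        try:
            try_merge()
            return True
        except RuntimeError as exc:
            # Assumed convention: RuntimeError means "not mergeable yet".
            last_exception = exc
            sleep(poll_seconds)
        elapsed_time = clock() - start
    message = f"Merge timed out after {timeout_seconds:.0f} seconds: {last_exception}"
    if not dry_run:
        # Hypothetical side effect; skipped entirely on dry runs.
        post_comment(message)
    return False
```

Passing `clock` and `sleep` as parameters is a design choice of this sketch so the loop can be exercised in tests without real waiting.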

[ghstack-poisoned]
facebook-github-bot (Contributor) commented May 6, 2022

❌ 6 New Failures

As of commit 4e4b181 (more details on the Dr. CI page):

  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (1/6)

Step: "Test"

2022-05-07T01:29:50.2112226Z FAIL [0.023s]: tes...iplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T01:29:50.2109350Z     return fn(self, *args, **kwargs)
2022-05-07T01:29:50.2109745Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 808, in wrapper
2022-05-07T01:29:50.2110079Z     fn(*args, **kwargs)
2022-05-07T01:29:50.2110591Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1150, in wrapper
2022-05-07T01:29:50.2110966Z     fn(*args, **kwargs)
2022-05-07T01:29:50.2111205Z   File "test_meta.py", line 893, in test_meta
2022-05-07T01:29:50.2111443Z     self.fail('expected failure, but succeeded')
2022-05-07T01:29:50.2111707Z AssertionError: expected failure, but succeeded
2022-05-07T01:29:50.2111858Z 
2022-05-07T01:29:50.2111961Z ======================================================================
2022-05-07T01:29:50.2112226Z FAIL [0.023s]: test_meta_nn_functional_triplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T01:29:50.2112556Z ----------------------------------------------------------------------
2022-05-07T01:29:50.2112823Z Traceback (most recent call last):
2022-05-07T01:29:50.2113233Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 376, in instantiated_test
2022-05-07T01:29:50.2113569Z     result = test(self, **param_kwargs)
2022-05-07T01:29:50.2113967Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 773, in test_wrapper
2022-05-07T01:29:50.2114312Z     return test(*args, **kwargs)
2022-05-07T01:29:50.2114694Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_device_type.py", line 966, in only_fn
2022-05-07T01:29:50.2115030Z     return fn(self, *args, **kwargs)
2022-05-07T01:29:50.2115387Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 808, in wrapper
2022-05-07T01:29:50.2115701Z     fn(*args, **kwargs)

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (default, 1, 2, linux.4xlarge.nvidia.gpu) (2/6)

Step: "Test"

2022-05-07T00:49:34.4393155Z FAIL [0.014s]: tes...let_margin_loss_cuda_uint8 (__main__.TestMetaCUDA)
2022-05-07T00:49:34.4389656Z     return fn(self, *args, **kwargs)
2022-05-07T00:49:34.4390145Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:49:34.4390529Z     fn(*args, **kwargs)
2022-05-07T00:49:34.4391028Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1150, in wrapper
2022-05-07T00:49:34.4391383Z     fn(*args, **kwargs)
2022-05-07T00:49:34.4391677Z   File "test_meta.py", line 893, in test_meta
2022-05-07T00:49:34.4392077Z     self.fail('expected failure, but succeeded')
2022-05-07T00:49:34.4392408Z AssertionError: expected failure, but succeeded
2022-05-07T00:49:34.4392618Z 
2022-05-07T00:49:34.4392762Z ======================================================================
2022-05-07T00:49:34.4393155Z FAIL [0.014s]: test_meta_nn_functional_triplet_margin_loss_cuda_uint8 (__main__.TestMetaCUDA)
2022-05-07T00:49:34.4393663Z ----------------------------------------------------------------------
2022-05-07T00:49:34.4394000Z Traceback (most recent call last):
2022-05-07T00:49:34.4394521Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1800, in wrapper
2022-05-07T00:49:34.4394914Z     method(*args, **kwargs)
2022-05-07T00:49:34.4395487Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-05-07T00:49:34.4395919Z     result = test(self, **param_kwargs)
2022-05-07T00:49:34.4396461Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 773, in test_wrapper
2022-05-07T00:49:34.4396864Z     return test(*args, **kwargs)
2022-05-07T00:49:34.4397360Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 966, in only_fn
2022-05-07T00:49:34.4397765Z     return fn(self, *args, **kwargs)

See GitHub Actions build pull / linux-bionic-py3.7-clang9 / test (default, 1, 2, linux.2xlarge) (3/6)

Step: "Test"

2022-05-07T00:14:58.8847166Z FAIL [0.009s]: tes...iplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:14:58.8844700Z     return fn(self, *args, **kwargs)
2022-05-07T00:14:58.8845069Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:14:58.8845333Z     fn(*args, **kwargs)
2022-05-07T00:14:58.8845690Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1150, in wrapper
2022-05-07T00:14:58.8845946Z     fn(*args, **kwargs)
2022-05-07T00:14:58.8846150Z   File "test_meta.py", line 893, in test_meta
2022-05-07T00:14:58.8846429Z     self.fail('expected failure, but succeeded')
2022-05-07T00:14:58.8846663Z AssertionError: expected failure, but succeeded
2022-05-07T00:14:58.8846804Z 
2022-05-07T00:14:58.8846897Z ======================================================================
2022-05-07T00:14:58.8847166Z FAIL [0.009s]: test_meta_nn_functional_triplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:14:58.8847523Z ----------------------------------------------------------------------
2022-05-07T00:14:58.8847776Z Traceback (most recent call last):
2022-05-07T00:14:58.8848176Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-05-07T00:14:58.8848475Z     result = test(self, **param_kwargs)
2022-05-07T00:14:58.8848859Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 773, in test_wrapper
2022-05-07T00:14:58.8849145Z     return test(*args, **kwargs)
2022-05-07T00:14:58.8849527Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 966, in only_fn
2022-05-07T00:14:58.8849797Z     return fn(self, *args, **kwargs)
2022-05-07T00:14:58.8850169Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:14:58.8850436Z     fn(*args, **kwargs)

See GitHub Actions build pull / linux-xenial-py3.7-gcc7 / test (default, 1, 2, linux.2xlarge) (4/6)

Step: "Test"

2022-05-07T00:27:43.0060637Z FAIL [0.009s]: tes...iplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:27:43.0057975Z     return fn(self, *args, **kwargs)
2022-05-07T00:27:43.0058413Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:27:43.0058693Z     fn(*args, **kwargs)
2022-05-07T00:27:43.0059075Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1150, in wrapper
2022-05-07T00:27:43.0059341Z     fn(*args, **kwargs)
2022-05-07T00:27:43.0059557Z   File "test_meta.py", line 893, in test_meta
2022-05-07T00:27:43.0059853Z     self.fail('expected failure, but succeeded')
2022-05-07T00:27:43.0060101Z AssertionError: expected failure, but succeeded
2022-05-07T00:27:43.0060254Z 
2022-05-07T00:27:43.0060352Z ======================================================================
2022-05-07T00:27:43.0060637Z FAIL [0.009s]: test_meta_nn_functional_triplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:27:43.0061031Z ----------------------------------------------------------------------
2022-05-07T00:27:43.0061289Z Traceback (most recent call last):
2022-05-07T00:27:43.0061720Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-05-07T00:27:43.0062038Z     result = test(self, **param_kwargs)
2022-05-07T00:27:43.0062442Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 773, in test_wrapper
2022-05-07T00:27:43.0062745Z     return test(*args, **kwargs)
2022-05-07T00:27:43.0063146Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 966, in only_fn
2022-05-07T00:27:43.0063445Z     return fn(self, *args, **kwargs)
2022-05-07T00:27:43.0063823Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:27:43.0064100Z     fn(*args, **kwargs)

See GitHub Actions build pull / linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu) (5/6)

Step: "Test"

2022-05-07T01:11:17.2974990Z FAIL [0.011s]: tes...let_margin_loss_cuda_uint8 (__main__.TestMetaCUDA)
2022-05-07T01:11:17.2971214Z     return fn(self, *args, **kwargs)
2022-05-07T01:11:17.2971767Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T01:11:17.2972154Z     fn(*args, **kwargs)
2022-05-07T01:11:17.2972701Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1150, in wrapper
2022-05-07T01:11:17.2973101Z     fn(*args, **kwargs)
2022-05-07T01:11:17.2973406Z   File "test_meta.py", line 893, in test_meta
2022-05-07T01:11:17.2973836Z     self.fail('expected failure, but succeeded')
2022-05-07T01:11:17.2974202Z AssertionError: expected failure, but succeeded
2022-05-07T01:11:17.2974436Z 
2022-05-07T01:11:17.2974598Z ======================================================================
2022-05-07T01:11:17.2974990Z FAIL [0.011s]: test_meta_nn_functional_triplet_margin_loss_cuda_uint8 (__main__.TestMetaCUDA)
2022-05-07T01:11:17.2975550Z ----------------------------------------------------------------------
2022-05-07T01:11:17.2975920Z Traceback (most recent call last):
2022-05-07T01:11:17.2976463Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1800, in wrapper
2022-05-07T01:11:17.2976898Z     method(*args, **kwargs)
2022-05-07T01:11:17.2977483Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-05-07T01:11:17.2977928Z     result = test(self, **param_kwargs)
2022-05-07T01:11:17.2978506Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 773, in test_wrapper
2022-05-07T01:11:17.2978931Z     return test(*args, **kwargs)
2022-05-07T01:11:17.2979480Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 966, in only_fn
2022-05-07T01:11:17.2979898Z     return fn(self, *args, **kwargs)

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (default, 1, 2, linux.2xlarge) (6/6)

Step: "Test"

2022-05-07T00:13:34.1724261Z FAIL [0.008s]: tes...iplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:13:34.1721770Z     return fn(self, *args, **kwargs)
2022-05-07T00:13:34.1722172Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:13:34.1722436Z     fn(*args, **kwargs)
2022-05-07T00:13:34.1722784Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1150, in wrapper
2022-05-07T00:13:34.1723047Z     fn(*args, **kwargs)
2022-05-07T00:13:34.1723249Z   File "test_meta.py", line 893, in test_meta
2022-05-07T00:13:34.1723527Z     self.fail('expected failure, but succeeded')
2022-05-07T00:13:34.1723760Z AssertionError: expected failure, but succeeded
2022-05-07T00:13:34.1723901Z 
2022-05-07T00:13:34.1723994Z ======================================================================
2022-05-07T00:13:34.1724261Z FAIL [0.008s]: test_meta_nn_functional_triplet_margin_loss_cpu_uint8 (__main__.TestMetaCPU)
2022-05-07T00:13:34.1724616Z ----------------------------------------------------------------------
2022-05-07T00:13:34.1724868Z Traceback (most recent call last):
2022-05-07T00:13:34.1725278Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-05-07T00:13:34.1725580Z     result = test(self, **param_kwargs)
2022-05-07T00:13:34.1725960Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 773, in test_wrapper
2022-05-07T00:13:34.1726243Z     return test(*args, **kwargs)
2022-05-07T00:13:34.1726617Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 966, in only_fn
2022-05-07T00:13:34.1726888Z     return fn(self, *args, **kwargs)
2022-05-07T00:13:34.1727255Z   File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 808, in wrapper
2022-05-07T00:13:34.1727516Z     fn(*args, **kwargs)

This comment was automatically generated by Dr. CI.

mehtanirav (Contributor) left a comment

LGTM. Some minor comments.

last_exception = ''
while True:
elapsed_time = 0.0
while elapsed_time < 400 * 60:

Curious why we have such a long (6+ hours) timeout. Does merging a PR also run additional CI jobs?

malfet (Contributor, Author) replied:

No idea; a 6h limit sounds wrong to me (and the job would probably be killed earlier).
@zengk95 care to explain?

Contributor reply:

I was thinking of the scenario where a person creates a PR, which triggers CI, and immediately afterwards writes "merge this". CI can potentially take 4-5 hours to run (once we make those tests mandatory), so this script spins for around 6 hours, which only gives us 1 or 2 hours of overhead. I chose 6 hours because I think that's the default GHA timeout.

Or the scenario where they retrigger CI by pushing another commit after an hour or so and still want to land it when green.
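
As a quick check of the numbers in this thread, the sketch below compares the loop's exit criterion with the GitHub Actions job limit. It assumes `elapsed_time` is measured in seconds and that the workflow does not override `timeout-minutes` (whose documented default is 360); the constant names are made up for this example.

```python
# Exit criterion from the reviewed snippet: `while elapsed_time < 400 * 60`.
LOOP_TIMEOUT_SECONDS = 400 * 60

# Default job limit in GitHub Actions when `timeout-minutes` is not set.
GHA_DEFAULT_JOB_TIMEOUT_MINUTES = 360

loop_minutes = LOOP_TIMEOUT_SECONDS // 60
hours, minutes = divmod(loop_minutes, 60)
overshoot_minutes = loop_minutes - GHA_DEFAULT_JOB_TIMEOUT_MINUTES

print(f"loop budget: {hours}h {minutes}m")
print(f"exceeds the default job limit by {overshoot_minutes} minutes")
```

Under those assumptions the loop budget is 6h 40m, which is 40 minutes longer than the default job limit, so the runner would stop the job before the loop's own timeout is reached.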

malfet (Contributor, Author) commented May 6, 2022

@pytorchbot merge on green this

facebook-github-bot deleted the gh/malfet/39/head branch May 10, 2022 14:16
facebook-github-bot pushed a commit that referenced this pull request May 13, 2022
Summary:
- Use double quotes for Python string literals (per https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#strings )
- Do not add label if merge_on_green is called with dry_run argument
- Dismantle pyramid of doom (and avoid infinite while) by specifying
  timeout as exit criteria
- Use `handle_exception` for revert workflow as well
- Do not double post comment if merge_on_green times out

Pull Request resolved: #77004

Approved by: https://github.com/mehtanirav

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/9f3a497c62bac3f0cc47fe12d30430b2281f0d00

Reviewed By: malfet

Differential Revision: D36250438

fbshipit-source-id: eb2165597b96889907e1a88bfa31fc7cdd72844e