
Conversation

@seemethere
Member

This was causing periodic jobs to not be run at all

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes #76863

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label May 4, 2022
@facebook-github-bot
Contributor

facebook-github-bot commented May 4, 2022


💊 CI failures summary and remediations

As of commit a0a82f7 (more details on the Dr. CI page):

  • 2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build periodic / linux-bionic-cuda11.6-py3.7-gcc7 / test (default, 2, 2, linux.4xlarge.nvidia.gpu) (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

2022-05-05T01:16:17.5152998Z AssertionError: can only test a child process
2022-05-05T01:16:17.4389796Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2022-05-05T01:16:17.4390230Z AssertionError: can only test a child process
2022-05-05T01:16:17.5139273Z Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f9f3aed0c20>
2022-05-05T01:16:17.5139783Z Traceback (most recent call last):
2022-05-05T01:16:17.5140482Z   File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1388, in __del__
2022-05-05T01:16:17.5145493Z     self._shutdown_workers()
2022-05-05T01:16:17.5146038Z   File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1371, in _shutdown_workers
2022-05-05T01:16:17.5151440Z     if w.is_alive():
2022-05-05T01:16:17.5151834Z   File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 151, in is_alive
2022-05-05T01:16:17.5152347Z     assert self._parent_pid == os.getpid(), 'can only test a child process'
2022-05-05T01:16:17.5152998Z AssertionError: can only test a child process
2022-05-05T01:16:20.2620143Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:20.2633490Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:20.2673345Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:23.2500755Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:23.2502143Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:23.2517798Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2022-05-05T01:16:28.8191981Z ok (11.517s)
2022-05-05T01:16:28.8214799Z   test_multiprocessing_iterdatapipe (__main__.TestDataLoaderPersistentWorkers) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/74498 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.002s)
2022-05-05T01:16:30.0623163Z   test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... ok (1.241s)
2022-05-05T01:16:30.0657650Z   test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
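For context on the traceback above: the "can only test a child process" assertion comes from CPython's multiprocessing module, which only allows the process that started a worker to query it with is_alive(). In the log, the DataLoader iterator's __del__ calls _shutdown_workers(), which calls is_alive() on worker handles from a process that did not start them. The sketch below is a hypothetical, minimal reproduction of that same assertion (it is not code from this PR or from the DataLoader); it assumes a Unix system where the 'fork' start method is available.

import multiprocessing as mp
import os
import time


def sleeper():
    # A worker that just stays alive briefly.
    time.sleep(2)


def probe(handle):
    # 'handle' was created by the parent process; this process only
    # inherited it via fork, so it is not the handle's parent.
    try:
        handle.is_alive()
    except AssertionError as exc:
        # Prints: AssertionError: can only test a child process
        print(f"pid {os.getpid()}: AssertionError: {exc}")


if __name__ == "__main__":
    mp.set_start_method("fork")  # fork lets the child inherit the handle
    worker = mp.Process(target=sleeper)
    worker.start()
    # A second child inherits the 'worker' handle and probes it from a pid
    # that is not the handle's parent, tripping the same assertion as the
    # __del__ -> _shutdown_workers -> is_alive() path shown in the log.
    other = mp.Process(target=probe, args=(worker,))
    other.start()
    other.join()
    worker.join()

Because the error is raised during interpreter shutdown of a worker, it shows up as "Exception ignored in: ... __del__" noise rather than a hard failure in the test itself.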

🕵️‍♀️ 1 failure not recognized by patterns:

The following CI failures may be due to changes from the PR
Job: GitHub Actions pull / linux-bionic-rocm5.1-py3.7 / test (default, 1, 2, linux.rocm.gpu)
Step: Unknown

This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@seemethere seemethere added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label May 4, 2022
@seemethere seemethere requested a review from a team May 4, 2022 23:53
@malfet
Contributor

malfet commented May 5, 2022

@pytorchbot merge this please

@github-actions
Contributor

github-actions bot commented May 5, 2022

Hey @seemethere.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 6, 2022
Summary:
This was causing periodic jobs to not be run at all

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Fixes #76863

Pull Request resolved: #76864
Approved by: https://github.com/malfet

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/32ae5840081ca1bde6a6acdc0b08c7f168620242

Reviewed By: malfet

Differential Revision: D36171124

fbshipit-source-id: 68c091a1ac8e69ca06508c798a5d6d8cf307984b
@github-actions github-actions bot deleted the seemethere/fix_periodic_builds branch February 16, 2024 01:52

Labels

ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR)
cla signed
module: rocm (AMD GPU support for Pytorch)


Development

Successfully merging this pull request may close these issues.

Periodic jobs have not been running, invalid workflow file

3 participants