@soulitzer soulitzer commented Aug 12, 2022

facebook-github-bot commented Aug 12, 2022

❌ 6 New Failures

As of commit 3572be1 (more details on the Dr. CI page):

  • 6/6 failures introduced in this PR

🕵️ 6 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 1, 2, windows.4xlarge) (1/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:05:38.1193932Z RuntimeError: test_ops failed!
2022-08-15T17:05:36.8524818Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:05:36.8525204Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:05:36.8525566Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:05:36.8622928Z     putenv(key, value)
2022-08-15T17:05:38.1186219Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:05:38.1186549Z Traceback (most recent call last):
2022-08-15T17:05:38.1187047Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:05:38.1190451Z     main()
2022-08-15T17:05:38.1190833Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:05:38.1193638Z     raise RuntimeError(err_message)
2022-08-15T17:05:38.1193932Z RuntimeError: test_ops failed!
2022-08-15T17:05:38.3372848Z 
2022-08-15T17:05:38.3373529Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-15T17:05:38.3375172Z 
2022-08-15T17:05:38.3375420Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-15T17:05:38.3430019Z ##[error]Process completed with exit code 1.
2022-08-15T17:05:38.3877141Z Prepare all required actions
2022-08-15T17:05:38.3877699Z Getting action download info
2022-08-15T17:05:38.6490653Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-15T17:05:38.8351078Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-15T17:05:38.8351355Z with:

See GitHub Actions build periodic / win-vs2019-cuda11.7-py3 / test (default, 1, 2, windows.8xlarge.nvidia.gpu) (2/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:13:26.9879596Z RuntimeError: test_ops failed!
2022-08-15T17:13:26.1680397Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:13:26.1680893Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:13:26.1681302Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:13:26.1807137Z     putenv(key, value)
2022-08-15T17:13:26.1807514Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:13:26.9869324Z Traceback (most recent call last):
2022-08-15T17:13:26.9869988Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:13:26.9874667Z     main()
2022-08-15T17:13:26.9875096Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:13:26.9879302Z     raise RuntimeError(err_message)
2022-08-15T17:13:26.9879596Z RuntimeError: test_ops failed!
2022-08-15T17:13:27.3171715Z 
2022-08-15T17:13:27.3172574Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-15T17:13:27.3175587Z 
2022-08-15T17:13:27.3176018Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-15T17:13:27.3251738Z ##[error]Process completed with exit code 1.
2022-08-15T17:13:27.3418280Z Prepare all required actions
2022-08-15T17:13:27.3418779Z Getting action download info
2022-08-15T17:13:27.5075373Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-15T17:13:27.8017294Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-15T17:13:27.8017630Z with:

See GitHub Actions build periodic / win-vs2019-cuda11.7-py3 / test (default, 2, 2, windows.8xlarge.nvidia.gpu) (3/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:13:24.3174729Z RuntimeError: test_ops_gradients failed!
2022-08-15T17:13:23.6768643Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:13:23.6769131Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:13:23.6769527Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:13:23.6894136Z     putenv(key, value)
2022-08-15T17:13:23.6894454Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:13:24.3164456Z Traceback (most recent call last):
2022-08-15T17:13:24.3165079Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:13:24.3170211Z     main()
2022-08-15T17:13:24.3170696Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:13:24.3174438Z     raise RuntimeError(err_message)
2022-08-15T17:13:24.3174729Z RuntimeError: test_ops_gradients failed!
2022-08-15T17:13:24.6390046Z 
2022-08-15T17:13:24.6390981Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-15T17:13:24.6394538Z 
2022-08-15T17:13:24.6395187Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-15T17:13:24.6460767Z ##[error]Process completed with exit code 1.
2022-08-15T17:13:24.6628641Z Prepare all required actions
2022-08-15T17:13:24.6629113Z Getting action download info
2022-08-15T17:13:24.8328828Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-15T17:13:25.0346845Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-15T17:13:25.0347125Z with:

See GitHub Actions build periodic / win-vs2019-cuda11.7-py3 / test (force_on_cpu, 1, 1, windows.4xlarge) (4/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:13:20.4329086Z RuntimeError: backends/xeon/test_launch failed!
2022-08-15T17:13:20.2161001Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:13:20.2161411Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:13:20.2161716Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:13:20.2258385Z     putenv(key, value)
2022-08-15T17:13:20.2258744Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:13:20.4321876Z Traceback (most recent call last):
2022-08-15T17:13:20.4322490Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:13:20.4325579Z     main()
2022-08-15T17:13:20.4326173Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:13:20.4328826Z     raise RuntimeError(err_message)
2022-08-15T17:13:20.4329086Z RuntimeError: backends/xeon/test_launch failed!
2022-08-15T17:13:20.6827193Z 
2022-08-15T17:13:20.6827745Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-15T17:13:20.6829851Z 
2022-08-15T17:13:20.6830375Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-15T17:13:20.6884156Z ##[error]Process completed with exit code 1.
2022-08-15T17:13:20.7018824Z Prepare all required actions
2022-08-15T17:13:20.7019346Z Getting action download info
2022-08-15T17:13:20.8619277Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-15T17:13:21.1270821Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-15T17:13:21.1271091Z with:

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (functorch, 1, 1, windows.4xlarge) (5/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:06:00.5609326Z RuntimeError: C:\a...\pytorch\functorch\test\test_compile_cache failed!
2022-08-15T17:06:00.3361195Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:06:00.3363539Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:06:00.3364230Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:06:00.3477860Z     putenv(key, value)
2022-08-15T17:06:00.3478132Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:06:00.5601809Z Traceback (most recent call last):
2022-08-15T17:06:00.5602323Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:06:00.5605529Z     main()
2022-08-15T17:06:00.5606574Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:06:00.5608814Z     raise RuntimeError(err_message)
2022-08-15T17:06:00.5609326Z RuntimeError: C:\actions-runner\_work\pytorch\pytorch\functorch\test\test_compile_cache failed!
2022-08-15T17:06:00.8154797Z 
2022-08-15T17:06:00.8155427Z (base) C:\actions-runner\_work\pytorch\pytorch\test>popd
2022-08-15T17:06:00.8159692Z 
2022-08-15T17:06:00.8159958Z (base) C:\actions-runner\_work\pytorch\pytorch>if ERRORLEVEL 1 goto fail 
2022-08-15T17:06:00.8162278Z 
2022-08-15T17:06:00.8162478Z (base) C:\actions-runner\_work\pytorch\pytorch>exit /b 1 
2022-08-15T17:06:00.8217370Z ##[error]Process completed with exit code 1.
2022-08-15T17:06:00.8692058Z Prepare all required actions
2022-08-15T17:06:00.8692675Z Getting action download info
2022-08-15T17:06:01.1247643Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)

See GitHub Actions build pull / win-vs2019-cpu-py3 / test (default, 2, 2, windows.4xlarge) (6/6)

Step: "Test" (full log | diagnosis details)

2022-08-15T17:07:09.8491209Z RuntimeError: test_ops_jit failed!
2022-08-15T17:07:09.4512346Z   File "C:\actions-runner\_work\pytorch\pytorch\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 634, in run_tests
2022-08-15T17:07:09.4512759Z     os.environ['DISABLED_TESTS_DICT'] = fp.read()
2022-08-15T17:07:09.4513102Z   File "C:\Jenkins\Miniconda3\lib\os.py", line 685, in __setitem__
2022-08-15T17:07:09.4613657Z     putenv(key, value)
2022-08-15T17:07:09.4614057Z ValueError: the environment variable is longer than 32767 characters
2022-08-15T17:07:09.8484297Z Traceback (most recent call last):
2022-08-15T17:07:09.8484784Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 990, in <module>
2022-08-15T17:07:09.8487863Z     main()
2022-08-15T17:07:09.8488217Z   File "C:\actions-runner\_work\pytorch\pytorch\test\run_test.py", line 968, in main
2022-08-15T17:07:09.8490944Z     raise RuntimeError(err_message)
2022-08-15T17:07:09.8491209Z RuntimeError: test_ops_jit failed!
2022-08-15T17:07:10.0273787Z 
2022-08-15T17:07:10.0274420Z (base) C:\actions-runner\_work\pytorch\pytorch\test>if ERRORLEVEL 1 goto fail 
2022-08-15T17:07:10.0276556Z 
2022-08-15T17:07:10.0276958Z (base) C:\actions-runner\_work\pytorch\pytorch\test>exit /b 1 
2022-08-15T17:07:10.0326017Z ##[error]Process completed with exit code 1.
2022-08-15T17:07:10.0464471Z Prepare all required actions
2022-08-15T17:07:10.0465317Z Getting action download info
2022-08-15T17:07:10.1919163Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-08-15T17:07:10.3791171Z ##[group]Run ./.github/actions/get-workflow-job-id
2022-08-15T17:07:10.3791394Z with:
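
Note: all six logs above stop at the same statement, `os.environ['DISABLED_TESTS_DICT'] = fp.read()`, with `ValueError: the environment variable is longer than 32767 characters`. A minimal sketch of one way to avoid that error is to export the file contents only when they fit and otherwise export a path to a file. This is not necessarily how PyTorch addressed it. The helper name `export_disabled_tests` and the variable `DISABLED_TESTS_FILE` are hypothetical and used only for illustration.

```python
import os
import tempfile

# Limit quoted in the ValueError above (Windows only).
WINDOWS_ENV_LIMIT = 32767


def export_disabled_tests(json_text, env=None, limit=WINDOWS_ENV_LIMIT):
    """Expose json_text to child processes without exceeding the limit.

    Hypothetical helper: a short payload is stored directly in
    DISABLED_TESTS_DICT; a long one is written to a temporary file and the
    file's path is stored in DISABLED_TESTS_FILE (a made-up variable name).
    Returns the name of the variable that was set.
    """
    env = os.environ if env is None else env
    key = "DISABLED_TESTS_DICT"
    # Conservatively count the variable name and the "=" sign as well.
    if len(key) + 1 + len(json_text) < limit:
        env[key] = json_text
        return key
    fd, path = tempfile.mkstemp(suffix=".json", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as fp:
        fp.write(json_text)
    env["DISABLED_TESTS_FILE"] = path
    return "DISABLED_TESTS_FILE"
```

A reader of the payload would then check `DISABLED_TESTS_DICT` first and fall back to opening the path in `DISABLED_TESTS_FILE`.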

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@soulitzer soulitzer requested a review from janeyx99 August 12, 2022 21:57
@soulitzer soulitzer added the ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR label Aug 12, 2022
soulitzer added a commit that referenced this pull request Aug 12, 2022
soulitzer added a commit that referenced this pull request Aug 15, 2022
@malfet malfet left a comment

This is sad, but if it unblocks us, then this is fine

@albanD albanD left a comment

Thanks!

@soulitzer
Contributor Author

@pytorchbot merge -f "Preexisting failures"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

Hey @soulitzer.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@soulitzer soulitzer added the topic: not user facing topic category label Aug 15, 2022
facebook-github-bot pushed a commit that referenced this pull request Aug 16, 2022

Summary:
Fixes #83335

Pull Request resolved: #83354
Approved by: https://github.com/malfet, https://github.com/albanD

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/43f950af201f8a39e5728a65e03cfcafec04585d

Reviewed By: atalman

Differential Revision: D38724725

Pulled By: soulitzer

fbshipit-source-id: 5511e0b0c8209ef01efda67691ed59c7ef8bc06a
pytorchmergebot pushed a commit that referenced this pull request Aug 19, 2022
… (#83704)

Now that pytorch/test-infra#529 exists, we can undo the custom sharding from #83354 for slow grad check

test plan: look at the logs to see if it sharded + look at the times to see that it's evenly distributed
Pull Request resolved: #83704
Approved by: https://github.com/huydhn
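
Note: the test plan above checks that shard times come out evenly distributed. As a rough illustration of what duration-based sharding can look like, here is a sketch of the greedy longest-first heuristic. It is an assumption for illustration only, not the logic in pytorch/test-infra#529 or in `run_test.py`; the function name `shard_by_duration` and the durations in the usage note are made up.

```python
import heapq


def shard_by_duration(durations, num_shards):
    """Greedily assign tests to shards, longest test first.

    durations maps a test name to its expected runtime in seconds.
    Returns a list of num_shards lists of test names.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Heap entries are (total_seconds, shard_index); the lightest shard pops first.
    heap = [(0.0, i) for i in range(num_shards)]
    shards = [[] for _ in range(num_shards)]
    # Sort by descending duration, breaking ties by name for determinism.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return shards
```

For example, with illustrative durations `{"test_ops": 60.0, "test_ops_gradients": 45.0, "test_ops_jit": 30.0, "test_compile_cache": 15.0}` and two shards, the sketch places `test_ops` and `test_compile_cache` on one shard and the other two on the second, 75 seconds each.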
@facebook-github-bot facebook-github-bot deleted the gh/soulitzer/123/head branch August 19, 2022 14:20
Labels

ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR cla signed Merged topic: not user facing topic category
