
wayi1 (Contributor) commented on Apr 5, 2021

Stack from ghstack:

Update `_powerSGD_comm_hook_wrapper` to expose only the two most critical hyperparameters, making this API clearer to future users (although the second hyperparameter, `start_powerSGD_iter`, is not in use yet).
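To illustrate the design, here is a minimal, hypothetical sketch of a wrapper that exposes only two knobs and leaves every other PowerSGD setting at the state object's defaults. All names below (`SketchPowerSGDState`, `sketch_powerSGD_comm_hook_wrapper`, `FakeDDPModel`, `noop_hook`) and all default values are illustrative assumptions, not PyTorch's actual code. The PR does not name the first hyperparameter, so treating it as `matrix_approximation_rank` is also an assumption.

```python
from dataclasses import dataclass
from functools import partial


# Hypothetical stand-in for a PowerSGD state object: it carries many knobs,
# but only two are meant to be set by typical callers.
@dataclass
class SketchPowerSGDState:
    process_group: object = None
    matrix_approximation_rank: int = 1   # assumed critical knob no. 1
    start_powerSGD_iter: int = 1_000     # critical knob no. 2; illustrative default
    use_error_feedback: bool = True      # advanced knobs keep their defaults
    warm_start: bool = True
    random_seed: int = 0


class FakeDDPModel:
    """Hypothetical stand-in for a DDP model; it only records the hook."""

    def __init__(self):
        self.registered = None

    def register_comm_hook(self, state, hook):
        self.registered = (state, hook)


def noop_hook(state, bucket):
    """Hypothetical comm hook that returns the bucket unchanged."""
    return bucket


def sketch_powerSGD_comm_hook_wrapper(
    comm_hook, model, process_group, matrix_approximation_rank,
    start_powerSGD_iter=1_000,
):
    """Build the state from two hyperparameters and register the hook.

    Every other field of SketchPowerSGDState falls back to its default.
    """
    state = SketchPowerSGDState(
        process_group=process_group,
        matrix_approximation_rank=matrix_approximation_rank,
        start_powerSGD_iter=start_powerSGD_iter,
    )
    model.register_comm_hook(state, comm_hook)
    return state


# Usage: a preset that pins the rank to 2; start_powerSGD_iter keeps its default.
rank2_preset = partial(
    sketch_powerSGD_comm_hook_wrapper,
    comm_hook=noop_hook,
    matrix_approximation_rank=2,
)
model = FakeDDPModel()
state = rank2_preset(model=model, process_group=None)
```

With this shape, a preset built via `functools.partial` can pin the rank while callers still see at most two tunables. In the sketch, `start_powerSGD_iter` is stored on the state but nothing reads it, mirroring the PR's note that it is not in use yet.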

Facebook: the PyTorch STL/Lightning team once tried to use this API.

Differential Revision: D27561734

…ose 2 most critical hyperparameters

Update `_powerSGD_comm_hook_wrapper` to expose only the two most critical hyperparameters, making this API clearer to future users (although the second hyperparameter, `start_powerSGD_iter`, is not in use yet).

Differential Revision: [D27561734](https://our.internmc.facebook.com/intern/diff/D27561734/)

[ghstack-poisoned]
facebook-github-bot (Contributor) commented on Apr 5, 2021

💊 CI failures summary and remediations

As of commit 851661a (more details on the Dr. CI page):


  • 5/9 failures possibly* introduced in this PR
    • 1/5 non-scanned failure(s)
  • 4/9 broken upstream at merge base 6831f13 since Apr 03

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/4)

Step: "Test"

Apr 05 05:35:29 AssertionError: False is not true
Apr 05 05:35:29 ----------------------------------------------------------------------
Apr 05 05:35:29 Traceback (most recent call last):
Apr 05 05:35:29   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 292, in instantiated_test
Apr 05 05:35:29     result = test_fn(self, *args)
Apr 05 05:35:29   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 266, in test_wrapper
Apr 05 05:35:29     return test(*args, **kwargs)
Apr 05 05:35:29   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 726, in only_fn
Apr 05 05:35:29     return fn(self, device, *args, **kwargs)
Apr 05 05:35:29   File "test_ops.py", line 59, in test_supported_dtypes
Apr 05 05:35:29     self.assertTrue(sample_input.dtype == dtype)
Apr 05 05:35:29 AssertionError: False is not true
Apr 05 05:35:29 
Apr 05 05:35:29 ----------------------------------------------------------------------
Apr 05 05:35:29 Ran 4170 tests in 882.701s
Apr 05 05:35:29 
Apr 05 05:35:29 FAILED (failures=1, skipped=499)
Apr 05 05:35:29 
Apr 05 05:35:29 Generating XML reports...
Apr 05 05:35:29 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestCommonCPU-20210405052046.xml
Apr 05 05:35:29 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestGradientsCPU-20210405052046.xml
Apr 05 05:35:30 Generated XML report: test-reports/dist-gloo/test_ops/TEST-TestOpInfoCPU-20210405052046.xml

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 (2/4)

Step: "Run tests"

Apr 05 06:17:16 AssertionError: False is not true
Apr 05 06:17:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 979, in wrapper
Apr 05 06:17:16     method(*args, **kwargs)
Apr 05 06:17:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 292, in instantiated_test
Apr 05 06:17:16     result = test_fn(self, *args)
Apr 05 06:17:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 266, in test_wrapper
Apr 05 06:17:16     return test(*args, **kwargs)
Apr 05 06:17:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 726, in only_fn
Apr 05 06:17:16     return fn(self, device, *args, **kwargs)
Apr 05 06:17:16   File "test_ops.py", line 59, in test_supported_dtypes
Apr 05 06:17:16     self.assertTrue(sample_input.dtype == dtype)
Apr 05 06:17:16 AssertionError: False is not true
Apr 05 06:17:16 
Apr 05 06:17:17 ----------------------------------------------------------------------
Apr 05 06:17:17 Ran 8364 tests in 2511.567s
Apr 05 06:17:17 
Apr 05 06:17:17 FAILED (failures=2, skipped=1161)
Apr 05 06:17:17 
Apr 05 06:17:17 Generating XML reports...
Apr 05 06:17:17 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCommonCPU-20210405053524.xml
Apr 05 06:17:17 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCommonCUDA-20210405053524.xml
Apr 05 06:17:17 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestGradientsCPU-20210405053524.xml

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (3/4)

Step: "Run tests"

Apr 05 06:16:08 AssertionError: False is not true
Apr 05 06:16:08 ----------------------------------------------------------------------
Apr 05 06:16:08 Traceback (most recent call last):
Apr 05 06:16:08   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_device_type.py", line 292, in instantiated_test
Apr 05 06:16:08     result = test_fn(self, *args)
Apr 05 06:16:08   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_device_type.py", line 266, in test_wrapper
Apr 05 06:16:08     return test(*args, **kwargs)
Apr 05 06:16:08   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_device_type.py", line 726, in only_fn
Apr 05 06:16:08     return fn(self, device, *args, **kwargs)
Apr 05 06:16:08   File "test_ops.py", line 59, in test_supported_dtypes
Apr 05 06:16:08     self.assertTrue(sample_input.dtype == dtype)
Apr 05 06:16:08 AssertionError: False is not true
Apr 05 06:16:08 
Apr 05 06:16:09 ----------------------------------------------------------------------
Apr 05 06:16:09 Ran 4170 tests in 1494.715s
Apr 05 06:16:09 
Apr 05 06:16:09 FAILED (failures=1, skipped=499)
Apr 05 06:16:09 
Apr 05 06:16:09 Generating XML reports...
Apr 05 06:16:09 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCommonCPU-20210405055113.xml
Apr 05 06:16:09 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestGradientsCPU-20210405055113.xml
Apr 05 06:16:09 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestOpInfoCPU-20210405055113.xml

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (4/4)

Step: "Run tests"

Apr 05 06:19:32 AssertionError: False is not true
Apr 05 06:19:32 ----------------------------------------------------------------------
Apr 05 06:19:32 Traceback (most recent call last):
Apr 05 06:19:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 292, in instantiated_test
Apr 05 06:19:32     result = test_fn(self, *args)
Apr 05 06:19:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 266, in test_wrapper
Apr 05 06:19:32     return test(*args, **kwargs)
Apr 05 06:19:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 726, in only_fn
Apr 05 06:19:32     return fn(self, device, *args, **kwargs)
Apr 05 06:19:32   File "test_ops.py", line 59, in test_supported_dtypes
Apr 05 06:19:32     self.assertTrue(sample_input.dtype == dtype)
Apr 05 06:19:32 AssertionError: False is not true
Apr 05 06:19:32 
Apr 05 06:19:33 ----------------------------------------------------------------------
Apr 05 06:19:33 Ran 4170 tests in 3030.159s
Apr 05 06:19:33 
Apr 05 06:19:33 FAILED (failures=1, skipped=499)
Apr 05 06:19:33 
Apr 05 06:19:33 Generating XML reports...
Apr 05 06:19:33 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestCommonCPU-20210405052902.xml
Apr 05 06:19:33 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestGradientsCPU-20210405052902.xml
Apr 05 06:19:33 Generated XML report: test-reports/python-unittest/test_ops/TEST-TestOpInfoCPU-20210405052902.xml

🚧 4 ongoing upstream failures:

These were probably caused by upstream breakages that are not fixed yet.


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

facebook-github-bot added the "oncall: distributed" and "cla signed" labels on Apr 5, 2021
wayi1 pushed a commit that referenced this pull request on Apr 5, 2021

ghstack-source-id: 125707743
Pull Request resolved: #55295
facebook-github-bot (Contributor) commented:

This pull request has been merged in 1b4bb36.

facebook-github-bot deleted the gh/SciPioneer/102/head branch on April 9, 2021, 14:16

Labels: cla signed, Merged, oncall: distributed
