
[WIP][FSDP] Option to keep gradients in lower precision #83310

Closed
wants to merge 1 commit

Conversation

@rohan-varma (Member) commented Aug 12, 2022

[ghstack-poisoned]
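
For context, a minimal usage sketch of what this option could look like at the FSDP API level. The `keep_low_precision_grads` flag name and the launch setup are assumptions for illustration and are not confirmed by this PR:

```python
# Hedged sketch: the flag name is assumed, not confirmed by this PR.
# Assumes launch via torchrun so the process-group env vars are set.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

mp = MixedPrecision(
    param_dtype=torch.float16,      # parameters cast to fp16 for compute
    reduce_dtype=torch.float16,     # gradients reduced in fp16
    keep_low_precision_grads=True,  # assumed flag: keep reduced grads in fp16
)

model = FSDP(torch.nn.Linear(1024, 1024).cuda(), mixed_precision=mp)
```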
@facebook-github-bot (Contributor) commented Aug 12, 2022


❌ 2 New Failures

As of commit dd76c26 (more details on the Dr. CI page):

  • 2/2 failures introduced in this PR

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build Lint / lintrunner (1/2)

Step: "Run lintrunner on all files" (full log | diagnosis details)

2022-08-12T02:35:16.2434097Z ##[error]Process completed with exit code 1.
2022-08-12T02:35:16.2398653Z         2976  |                        output.div_(self.gradient_postdivide_factor)
2022-08-12T02:35:16.2399254Z         2977  |
2022-08-12T02:35:16.2399640Z     >>> 2978  |                    print(f"rv: casting")
2022-08-12T02:35:16.2400111Z         2979  |                    self._cast_grad_to_param_dtype(output, param)
2022-08-12T02:35:16.2400471Z         2980  |
2022-08-12T02:35:16.2400943Z         2981  |                    # To support gradient accumulation outside `no_sync()`, we save
2022-08-12T02:35:16.2403216Z 
2022-08-12T02:35:16.2403346Z 
2022-08-12T02:35:16.2403972Z You can reproduce these results locally by using `lintrunner`.
2022-08-12T02:35:16.2404590Z See https://github.com/pytorch/pytorch/wiki/lintrunner for setup instructions.
2022-08-12T02:35:16.2434097Z ##[error]Process completed with exit code 1.
2022-08-12T02:35:16.2470759Z ##[group]Run # Use jq to massage the JSON lint output into GitHub Actions workflow commands.
2022-08-12T02:35:16.2471159Z # Use jq to massage the JSON lint output into GitHub Actions workflow commands.
2022-08-12T02:35:16.2471444Z jq --raw-output \
2022-08-12T02:35:16.2471833Z   '"::\(if .severity == "advice" or .severity == "disabled" then "warning" else .severity end) file=\(.path),line=\(.line),col=\(.char),title=\(.code) \(.name)::" + (.description | gsub("\\n"; "%0A"))' \
2022-08-12T02:35:16.2472180Z   lint.json
2022-08-12T02:35:16.2516728Z shell: /usr/bin/bash -e {0}
2022-08-12T02:35:16.2516929Z env:
2022-08-12T02:35:16.2517156Z   pythonLocation: /opt/hostedtoolcache/Python/3.8.13/x64
2022-08-12T02:35:16.2517455Z   LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.13/x64/lib
2022-08-12T02:35:16.2517691Z ##[endgroup]
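
The flagged region sits around `_cast_grad_to_param_dtype`, where FSDP casts the reduced gradient back to the parameter dtype. A self-contained illustration of the behavior the PR title describes, using plain tensors rather than FSDP internals (the option name is an assumption):

```python
import torch

# fp32 parameter alongside an fp16 gradient, as produced by an fp16 reduce-scatter
param = torch.zeros(1024, dtype=torch.float32)
reduced_grad = torch.randn(1024, dtype=torch.float16)

keep_low_precision_grads = True  # assumed option name, for illustration only

if keep_low_precision_grads:
    grad = reduced_grad                  # keep fp16: half the gradient memory
else:
    grad = reduced_grad.to(param.dtype)  # default: cast back to the param dtype (fp32)

print(grad.dtype)  # torch.float16 when the option is on
```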

See GitHub Actions build pull / linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge) (2/2)

Step: "Test" (full log | diagnosis details)

2022-08-12T02:45:25.4357070Z AssertionError: Torch not compiled with CUDA enabled
2022-08-12T02:45:25.4353733Z   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 745, in cuda
2022-08-12T02:45:25.4354169Z     return self._apply(lambda t: t.cuda(device))
2022-08-12T02:45:25.4354544Z   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 634, in _apply
2022-08-12T02:45:25.4354803Z     module._apply(fn)
2022-08-12T02:45:25.4355196Z   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 657, in _apply
2022-08-12T02:45:25.4355457Z     param_applied = fn(param)
2022-08-12T02:45:25.4355808Z   File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 745, in <lambda>
2022-08-12T02:45:25.4356086Z     return self._apply(lambda t: t.cuda(device))
2022-08-12T02:45:25.4356444Z   File "/opt/conda/lib/python3.7/site-packages/torch/cuda/__init__.py", line 215, in _lazy_init
2022-08-12T02:45:25.4356810Z     raise AssertionError("Torch not compiled with CUDA enabled")
2022-08-12T02:45:25.4357070Z AssertionError: Torch not compiled with CUDA enabled
2022-08-12T02:45:25.4357224Z 
2022-08-12T02:45:25.4357229Z 
2022-08-12T02:45:25.4357233Z 
2022-08-12T02:45:25.4357442Z ----------------------------------------------------------------------
2022-08-12T02:45:25.4357686Z Ran 46 tests in 75.589s
2022-08-12T02:45:25.4357797Z 
2022-08-12T02:45:25.4357906Z FAILED (errors=1, skipped=42, expected failures=3)
2022-08-12T02:45:25.4358049Z 
2022-08-12T02:45:25.4358119Z Generating XML reports...
2022-08-12T02:45:25.4397933Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionSharded-20220812024409.xml
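
This failure comes from calling `.cuda()` on a CPU-only build. The actual fix for this PR is not shown in the thread; a generic sketch of skipping a CUDA-only test on such builds (class and test names are hypothetical):

```python
import unittest
import torch

class TestFSDPMixedPrecisionCPUBuild(unittest.TestCase):
    # Skip instead of erroring when the build has no CUDA support; calling
    # .cuda() unconditionally is what raised the AssertionError above.
    @unittest.skipIf(not torch.cuda.is_available(), "requires a CUDA build")
    def test_keep_low_precision_grads(self):
        model = torch.nn.Linear(8, 8).cuda()
        self.assertTrue(next(model.parameters()).is_cuda)

if __name__ == "__main__":
    unittest.main()
```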

This comment was automatically generated by Dr. CI.

rohan-varma added a commit that referenced this pull request Aug 12, 2022
ghstack-source-id: 628cbf14b714e51dc1e47bf4b59dbd3f00a922f1
Pull Request resolved: #83310
@facebook-github-bot added the oncall: distributed label Aug 12, 2022
@rohan-varma changed the title Enable low prec grads → [FSDP] Option to keep gradients in lower precision Aug 12, 2022
@rohan-varma changed the title [FSDP] Option to keep gradients in lower precision → [WIP][FSDP] Option to keep gradients in lower precision Aug 12, 2022
@rohan-varma (Member, Author) commented:

#85062

@facebook-github-bot deleted the gh/rohan-varma/582/head branch June 8, 2023 18:35
Labels: cla signed, oncall: distributed