Implement _compute_intra_grad_corr_mean for gradient computation #1095

cyugao · 2022-12-02T22:49:48Z

What does this PR do?

Implement `_compute_intra_grad_corr_mean` and add tests for it.
This utility function can help when analyzing learning behavior.
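To make the idea concrete, here is a minimal, dependency-free sketch of one way a mean correlation between gradients could be computed. It is not the code from this PR: it assumes "correlation" means cosine similarity between flattened gradient vectors, averaged over all unordered pairs, and the names `cosine_similarity` and `mean_pairwise_grad_corr` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch only: a zero gradient
        # is treated as uncorrelated with everything.
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_grad_corr(grads: Sequence[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of gradients."""
    pairs = list(combinations(grads, 2))
    if not pairs:
        raise ValueError("need at least two gradients")
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Gradients pointing the same way vs. gradients at right angles.
aligned = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
orthogonal = [[1.0, 0.0], [0.0, 1.0]]
print(mean_pairwise_grad_corr(aligned))
print(mean_pairwise_grad_corr(orthogonal))
```

Under this assumed definition, a value near 1 would suggest the gradients mostly agree, while a value near 0 would suggest they are dominated by noise.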

Before submitting

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

.circleci/config.yml

fairscale/optim/adascale.py

min-xu-ai · 2022-12-03T18:19:13Z

fairscale/optim/adascale.py

@@ -449,6 +487,9 @@ def _final_callback(self) -> None:
            return

        # Since self._local_grad_sqr is FP32, sum shouldn't overflow.
+
+        # TODO: Hongbo says param.grad might be FP16 should do this before converting to FP32.


This TODO is a bit too sparse. Could you expand it and provide more context?

requirements-dev.txt

min-xu-ai

Thanks for working on this! I left some comments. If you need help with the CI errors, let me know. The code is in general very clean.

cyugao and others added 6 commits October 4, 2022 14:24

Fix gradient accumulation

d5844b4

Add ``is_scaled_loss`` flag to support both scaled / unscaled loss
Fix ``test_grad_accum`` and ``test_set_num_gradients_to_accumulate``

Add a method to scale grad for grad_accum using unscaled loss

816e128

- Revert the changes in `step` method
- Add a method `scale_grad_by_num_grads_to_accum` to handle gradient accumulation using unscaled loss more explicitly
- Add gradient tests
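The difference between accumulating with a scaled and an unscaled loss can be sketched on a toy problem. This is an illustration only and does not use the fairscale API: it assumes that rescaling for an unscaled loss amounts to dividing the accumulated gradient by the number of accumulation steps, which is presumably what `scale_grad_by_num_grads_to_accum` does. The helper names below are hypothetical.

```python
from typing import List


def grad(w: float, target: float) -> float:
    """Gradient of the toy loss 0.5 * (w - target) ** 2 with respect to w."""
    return w - target


def accumulate_scaled(w: float, targets: List[float]) -> float:
    """Each micro-batch loss is divided by N up front, so the sum is already a mean."""
    n = len(targets)
    acc = 0.0
    for t in targets:
        acc += grad(w, t) / n
    return acc


def accumulate_unscaled(w: float, targets: List[float]) -> float:
    """Losses are left unscaled; the summed gradient is divided by N once at the end."""
    acc = 0.0
    for t in targets:
        acc += grad(w, t)
    return acc / len(targets)  # the explicit rescaling step


targets = [0.0, 2.0, 4.0, 6.0]  # four micro-batches
print(accumulate_scaled(1.0, targets))
print(accumulate_unscaled(1.0, targets))
```

Both paths yield the same averaged gradient. Skipping the final division in the unscaled case would leave a gradient N times too large.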

Merge branch 'facebookresearch:main' into main

46ddd37

Merge branch 'facebookresearch:main' into main

b559272

Implement _compute_corr_mean_between_grads

7e3c924

Improve tests and comments

ae1d89d

cyugao requested a review from min-xu-ai December 2, 2022 22:49

facebook-github-bot added the CLA Signed label Dec 2, 2022

cyugao and others added 4 commits December 2, 2022 15:01

Use ubuntu-20.04 instead of latest

678d498

Use ubuntu-20.04 to fix the `arch x64 not found` issue. See [Version 3.10 with arch x64 not found (actions/setup-python#401)](actions/setup-python#401).

Switch flake8 from gitlab to github

69640d0

Flake8 was moved to GitHub. See the discussion at https://www.reddit.com/r/Python/comments/yvfww8/flake8_took_down_the_gitlab_repository_in_favor/

Fix scikit-learn package

17e3615

Update PyTorch versions

987f12b

cyugao force-pushed the main branch from 97942af to 987f12b December 3, 2022 01:18