Update test_cuda.py and test_torch.py optim tests to use OptimizerInfo and optim_db #123451

Open
3 of 7 tasks
janeyx99 opened this issue Apr 5, 2024 · 16 comments · May be fixed by #124563 or #125071
Labels
actionable better-engineering Relatively self-contained tasks for better engineering contributors good first issue module: optimizer Related to torch.optim triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@janeyx99
Contributor

janeyx99 commented Apr 5, 2024

🚀 The feature, motivation and pitch

Update: all tasks below currently have in-progress PRs!

In test_cuda.py and test_torch.py, we have the following optimizer tests that should be migrated to use the new OptimizerInfo infrastructure. (See any of the tests in TestOptimRenewed for how to apply OptimizerInfos.) The basic structure of an updated test will look like this:

    @optims([optim for optim in optim_db if ...], dtypes=[torch.float32])
    def test_name(self, device, dtype, optim_info):
        optim_cls = optim_info.optim_cls
        optim_inputs = optim_info.optim_inputs_func(device=device)
        for optim_input in optim_inputs:
            params = ...
            # processing, like updating grads

            optimizer = optim_cls(params, ..., **optim_input.kwargs)

            # the actual test
            ...
            self.assertEqual(...)
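
As a more concrete illustration, here is a minimal sketch of what a fully migrated test can look like. This is not from any actual PR: the test name and the optim_db filter are illustrative, and it assumes the imports TestOptimRenewed already uses (optims/optim_db from torch.testing._internal.common_optimizers):

```
    @optims(
        [o for o in optim_db if o.optim_cls not in (torch.optim.LBFGS, torch.optim.SparseAdam)],
        dtypes=[torch.float32],
    )
    def test_step_updates_params(self, device, dtype, optim_info):
        optim_cls = optim_info.optim_cls
        for optim_input in optim_info.optim_inputs_func(device=device):
            # one small parameter with a populated grad
            param = torch.rand(2, 3, device=device, dtype=dtype, requires_grad=True)
            param.grad = torch.rand_like(param)

            optimizer = optim_cls([param], **optim_input.kwargs)

            prev = param.detach().clone()
            optimizer.step()
            # the actual test: the step moved the parameter
            self.assertFalse(torch.equal(param, prev))
```

(LBFGS and SparseAdam are filtered out only because they need a closure and sparse gradients, respectively.)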

test_cuda.py:

  • test_grad_scaling_autocast_fused_optimizers will be addressed in #124904 ([optim] fix ut and sgd kernel) by @SandishKumarHN
  • test_graph_grad_scaling
  • test_graph_optims
  • test_graph_optims_with_explicitly_capturable_param_groups
  • test_graph_scaling_fused_optimizers

test_torch.py: #125538

  • tests that call _grad_scaling_autocast_test can be combined into one test
  • test_params_invalidated_with_grads_invalidated_between_unscale_and_step should be tested on more configurations (like SGD fused!)
  • there are many other tests that test just one optimizer config that we can leave for now.

Feel free to take up one or a subset of these tests to migrate: edit your username next to a checklist item to claim it, then open a PR!

Alternatives

No response

Additional context

No response

cc @vincentqb @jbschlosser @albanD @crcrpar

@janeyx99 janeyx99 added module: optimizer Related to torch.optim good first issue triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module better-engineering Relatively self-contained tasks for better engineering contributors actionable labels Apr 5, 2024
@rithikp06

I'd like to work on this

@janeyx99
Contributor Author

janeyx99 commented Apr 5, 2024

@rithikp06 feel free to claim a sub task (e.g., which tests are you migrating?) and open a PR!

@rithikp06

I'll work on the test_torch.py tests.

@ComposeC

ComposeC commented Apr 7, 2024

@janeyx99 I'd like to work on the test_grad_scaling_autocast_fused_optimizers function in test_cuda.py.

@rithikp06

I created a PR that addresses the test_torch.py part:
#123537

@FireACNC
Contributor

FireACNC commented Apr 8, 2024

I'd like to work on the test_graph_grad_scaling function in test_cuda.py.

@jayanthd04
Contributor

Hi, I'm a first-time contributor. I would like to work on test_graph_optims and test_graph_scaling_fused_optimizers. Could I get some pointers on where I can find more information about the OptimizerInfo infrastructure and TestOptimRenewed?

@dwang3851

I'd like to work on the test_graph_optims_with_explicitly_capturable_param_groups function in test_cuda.py.

@khlaifiabilel

Greetings 👋, this is very interesting!
I would like to work on test_graph_optims in test_cuda.py.

@janeyx99
Contributor Author

> Hi, I'm a first-time contributor. I would like to work on test_graph_optims and test_graph_scaling_fused_optimizers. Could I get some pointers on where I can find more information about the OptimizerInfo infrastructure and TestOptimRenewed?

TestOptimRenewed is a class full of tests that use the OptimizerInfo infrastructure: https://github.com/pytorch/pytorch/blob/main/test/test_optim.py#L44

The OptimizerInfo infrastructure is defined in common_optimizers.py; you would want to search for optim_db and OptimizerInfo to see what they are. I think looking at an existing test (any of the ones in TestOptimRenewed) would give a good high-level picture of how these are used.
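
For a quick way to explore the database, here is a hedged sketch (the names come from torch.testing._internal.common_optimizers; the "fused" filter is just an example, assuming supported_impls lists the available implementations):

```
from torch.testing._internal.common_optimizers import optim_db

# list the optimizers whose OptimizerInfo advertises a fused implementation
fused = [o for o in optim_db if "fused" in o.supported_impls]
print([o.optim_cls.__name__ for o in fused])
```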

@julian-8897

Hi there, I'm a first-time contributor here :). I would like to work on the tests that call _grad_scaling_autocast_test under test_torch.py.

@PetaBread545

Hello, I am also a first-time contributor. Seeing that #123537 already addresses the test_torch.py subtasks, I would like to work on the test_cuda.py subtasks.

@dwang3851

Hi, I'm a first-time contributor working on this issue. Is there any way to approve the GitHub workflows for me on the PR above? I was hoping to make sure those results match my local results.

@omastey

omastey commented Apr 22, 2024

Hi, I have begun working on this ticket. Are there any specs or documentation I could look at?

pytorchmergebot pushed a commit that referenced this issue Apr 23, 2024
Support fused_sgd_kernel for CPU.

## Bench result:
32 core/sockets ICX
Test Scripts:
https://gist.github.com/zhuhaozhe/688763e17e93e4c5e12f25f676ec90d9
https://gist.github.com/zhuhaozhe/ad9938694bc7fae8b66d376f4dffc6c9
```
Tensor Size: 262144, Num Tensor 4, Num Threads: 1
_single_tensor_sgd time: 0.2301 seconds
_fused_sgd time: 0.0925 seconds
Tensor Size: 4194304, Num Tensor 32, Num Threads: 32
_single_tensor_sgd time: 2.6195 seconds
_fused_sgd time: 1.7543 seconds
```
## Test Plan:
```
python test_optim.py -k test_fused_matches_forloop
python test_optim.py -k test_fused_large_tensor
python test_optim.py -k test_can_load_older_state_dict
python test_optim.py -k test_grad_scaling_autocast_fused_optimizers
python test_torch.py -k test_grad_scaling_autocast_fused
python test_torch.py -k test_params_invalidated_with_grads_invalidated_between_unscale_and_step
```
Looks like we already have some PRs under this issue #123451 to unify the UTs, so I did not modify the UTs in this PR.

Co-authored-by: Jane Xu <janeyx@meta.com>
Pull Request resolved: #123629
Approved by: https://github.com/jgong5, https://github.com/janeyx99
@Rbhu376264

Hello there,
Hope this message finds you well! I am a newbie to open-source contribution and would really like some advice on how to get started contributing to "test_graph_scaling_fused_optimizers". How can I get started on this? Any suggestions would be highly appreciated!

pytorchmergebot pushed a commit that referenced this issue Apr 25, 2024
…re (#123581)

This PR targets the issue mentioned in #123451, and solves the specific task of updating `test_graph_grad_scaling` in `test/test_cuda.py` to use the new OptimizerInfo infrastructure.

`test_graph_grad_scaling` is moved to a new `TestCase` class called `TestCudaOptims` in order to use `instantiate_device_type_tests`. The test content remained the same. `@onlyCUDA` is applied to the new test; the original use of the wrapper function is also changed to a `@parametrize` decorator for better style.

If we think that this migration is successful, we can delete the original test item under `TestCuda`. Currently it is left untouched to avoid any unexpected issues.

Local linter passed.
```
$ lintrunner test/test_cuda.py
ok No lint issues.
```

Local tests passed.
```
> python .\test\test_cuda.py -k test_graph_grad_scaling
Ran 7 tests in 0.458s
OK (skipped = 3)
```
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Pull Request resolved: #123581
Approved by: https://github.com/janeyx99
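
For anyone picking up the remaining test_cuda.py items, a minimal sketch of the device-generic pattern this PR describes (the utilities are the standard ones from torch.testing._internal; the test body is elided):

```
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    onlyCUDA,
)
from torch.testing._internal.common_utils import TestCase, run_tests

class TestCudaOptims(TestCase):
    @onlyCUDA
    def test_graph_grad_scaling(self, device):
        ...  # migrated test body goes here

# generates device-specific classes (e.g. TestCudaOptimsCUDA) that pass
# a concrete device string into each test
instantiate_device_type_tests(TestCudaOptims, globals())

if __name__ == "__main__":
    run_tests()
```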
alat-rights pushed a commit to alat-rights/pytorch that referenced this issue Apr 26, 2024
…re (pytorch#123581)
@omastey omastey linked a pull request Apr 26, 2024 that will close this issue
carmocca pushed a commit to carmocca/pytorch that referenced this issue Apr 29, 2024
Support fused_sgd_kernel for CPU. (pytorch#123629)
carmocca pushed a commit to carmocca/pytorch that referenced this issue Apr 29, 2024
…re (pytorch#123581)
andoorve pushed a commit to andoorve/pytorch that referenced this issue May 1, 2024
Support fused_sgd_kernel for CPU. (pytorch#123629)
andoorve pushed a commit to andoorve/pytorch that referenced this issue May 1, 2024
…re (pytorch#123581)
petrex pushed a commit to petrex/pytorch that referenced this issue May 3, 2024
Support fused_sgd_kernel for CPU. (pytorch#123629)
pytorch-bot bot pushed a commit that referenced this issue May 3, 2024
…re (#123581)
ZelboK pushed a commit to ZelboK/pytorch that referenced this issue May 19, 2024
…h#125538)

Fixes pytorch#123451 (only addresses test_torch.py cases)

This PR solves the specific task to update `test_grad_scaling_autocast` and `test_params_invalidated_with_grads_invalidated_between_unscale_and_step` in `test/test_torch.py` to use the new OptimizerInfo infrastructure.

I have combined tests that call `_grad_scaling_autocast_test` into one called `test_grad_scaling_autocast` and used `_get_optim_inputs_including_global_cliquey_kwargs` to avoid hard-coded configurations.

```
$ lintrunner test/test_cuda.py
ok No lint issues.
```

Pull Request resolved: pytorch#125538
Approved by: https://github.com/janeyx99
pytorchmergebot pushed a commit that referenced this issue May 20, 2024
… use new OptimizerInfo infrastructure (#125127)

This PR is meant to address issue #123451. More specifically, the `test_graph_optims` and `test_graph_scaling_fused_optimizers` functions in `test_cuda.py` have been updated so that they now use the new OptimizerInfo infrastructure.

Lintrunner passed:
```
$ lintrunner test/test_cuda.py
ok No lint issues.
```
Tests passed:
```
>python test_cuda.py -k test_graph_optims
Ran 19 tests in 7.463s

OK (skipped=9)

>python test_cuda.py -k test_graph_scaling_fused_optimizers
Ran 6 tests in 2.800s

OK (skipped=3)
```
Both functions have been moved to the newly created TestCase class `TestCudaOptims`. The test is mostly the same, except that the `@optims` decorator at the top of the function implicitly calls the test with each of the optimizers listed in the decorator, instead of explicitly iterating through the optimizers with a for loop.

I was unable to use `_get_optim_inputs_including_global_cliquey_kwargs` to get all kwargs for each of the optimizers, since some of the kwargs used in the original `test_graph_optims` function are not returned by the new OptimizerInfo infrastructure. More specifically, for the `torch.optim.rmsprop.RMSprop` optimizer, the following kwargs are never returned by `_get_optim_inputs_including_global_cliquey_kwargs`:
```
{'foreach': False, 'maximize': True, 'weight_decay': 0}
{'foreach': True, 'maximize': True, 'weight_decay': 0}
```
I ran into the same issue for `test_graph_scaling_fused_optimizers`: for the `torch.optim.adamw.AdamW` optimizer, whenever `optim_info.optim_inputs_func(device=device)` was called, the following kwarg was not returned:
```
{'amsgrad': True}
```

Due to this issue, I resorted to using a dictionary to store the kwargs for each of the optimizers. I am aware that this is less than ideal; I was wondering whether I should use the OptimizerInfo infrastructure to get all the kwargs regardless of the fact that it lacks some.

Pull Request resolved: #125127
Approved by: https://github.com/janeyx99
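
To make the stopgap described above concrete, here is a hedged sketch of the per-optimizer kwargs dictionary (the kwargs shown are the ones quoted in the commit message; the helper name is hypothetical):

```
import torch

# hand-maintained configs that the OptimizerInfo infrastructure does not return
EXTRA_KWARGS = {
    torch.optim.RMSprop: [
        {"foreach": False, "maximize": True, "weight_decay": 0},
        {"foreach": True, "maximize": True, "weight_decay": 0},
    ],
    torch.optim.AdamW: [{"amsgrad": True}],
}

def all_kwargs_for(optim_info, device):
    # union of what the OptimizerInfo yields and the extras above
    kwargs = [i.kwargs for i in optim_info.optim_inputs_func(device=device)]
    kwargs += EXTRA_KWARGS.get(optim_info.optim_cls, [])
    return kwargs
```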
@janeyx99
Contributor Author

not done yet

@janeyx99 janeyx99 reopened this May 20, 2024