
Conversation

@mlazos (Contributor) commented Feb 1, 2024

@mlazos requested a review from janeyx99 on February 1, 2024 22:05

pytorch-bot bot commented Feb 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118919

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b5e4cf5 with merge base e9b78f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99 (Contributor) left a comment

Thanks for starting to do this; we should be more formal with our approach to include:

  • all configs of each optimizer
  • the unsupported forloop optimizer configs
  • more clarity/maybe an invariant where "if the compiler can handle it, we test it and check the number of kernels, otherwise we just make sure it doesn't error". Today, this invariant isn't obvious so coverage isn't certain.

I left some chill comments on the test cases but I'm looking for a more overarching solution that would make the third point clear. One way I can see is having the current make_test pipe out configurations to "supported" and "unsupported" and then having the tests be made accordingly.
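
For concreteness, a rough sketch of that routing idea (make_test, SUPPORTED_CONFIGS, and the test bodies here are illustrative placeholders, not the PR's actual helpers):

def make_test(optim_cls, **kwargs):
    # Hypothetical registry of (optimizer, config) pairs the compiler handles.
    config_key = (optim_cls.__name__, tuple(sorted(kwargs.items())))

    if config_key in SUPPORTED_CONFIGS:
        def test_fn(self):
            # Compile, compare against eager, and assert the expected kernel count.
            ...
    else:
        def test_fn(self):
            # Only verify that compiling doesn't error and still matches eager.
            ...

    return test_fn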

@janeyx99 (Contributor) commented Feb 5, 2024

Leaving a first round of review as there's a merge conflict. Also, as mentioned in the review, it'd be good to use _get_optim_inputs_including_global_cliquey_kwargs over the optim_inputs_func to avoid needing to handle flagging of foreach, and to get all the configs at once.
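
Roughly, that helper returns every config (including the global flags such as foreach/fused/differentiable) in one pass, e.g.:

# Sketch only: the skip tuple below is an example, not the PR's final choice.
all_optim_inputs = _get_optim_inputs_including_global_cliquey_kwargs(
    device, dtype, optim_info, skip=("differentiable",)
)
for optim_input in all_optim_inputs:
    kwargs = optim_input.kwargs  # already covers the foreach/fused variants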

@mlazos requested a review from janeyx99 on February 8, 2024 05:01
@mlazos requested a review from janeyx99 on February 8, 2024 22:54
@janeyx99 (Contributor) left a comment

Two high-level questions:

I am not sure I like the story as much when there are caveats for the disabled configs being grouped as one. If we just lump them all into one, how do we differentiate disabled fused/forloop vs enabled foreach?

This leads to a more general question I’ve now been thinking about. Right now there are two tests, one that checks correctness and another that checks for correctness and kernel count. Theoretically, I’m imagining these can be the same test, which we can then pipe all the configs through. Maybe the default is to not check KERNEL COUNT unless it’s flagged in some way. I’m thinking this would make the test easier to maintain and allow it to be applied more generally across more configs—why do we not do this in the PR today?
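
(To illustrate the second question: the merged test could take an opt-in flag, something like the hypothetical parameters below, and skip the kernel-count assertion by default.)

# Illustrative only: check_kernel_count and expected_kernel_count are
# hypothetical parameters, not helpers that exist in the PR.
def _run_compiled_optimizer_test(self, optim_cls, kwargs, check_kernel_count=False,
                                 expected_kernel_count=None):
    ...  # run compiled vs. eager and assert params/state match
    if check_kernel_count:
        self.assertEqual(
            torch._inductor.metrics.generated_kernel_count,
            expected_kernel_count,
        )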

@mlazos (Contributor, Author) commented Feb 9, 2024

> Two high-level questions:
>
> I am not sure I like the story as much when there are caveats for the disabled configs being grouped as one. If we just lump them all into one, how do we differentiate disabled fused/forloop vs enabled foreach?
>
> This leads to a more general question I’ve now been thinking about. Right now there are two tests, one that checks correctness and another that checks for correctness and kernel count. Theoretically, I’m imagining these can be the same test, which we can then pipe all the configs through. Maybe the default is to not check KERNEL COUNT unless it’s flagged in some way. I’m thinking this would make the test easier to maintain and allow it to be applied more generally across more configs—why do we not do this in the PR today?

The categories are basically optimizers that are fully disabled, and optimizers where some configs are disabled (i.e. forloop disabled, etc.). If an optimizer is fully disabled, it does not have an entry in the KernelCounts dict and a disabled test is created for it, which checks that the kernel count is zero, that correctness matches eager, and nothing else. If only a certain config is disabled we have carveouts in the non-disabled test. I could possibly reconcile these, but I don't really think it's necessary because in the end I do want to treat disabled tests differently because SparseAdam and LBFGS have their own initializations. I could abstract out the initializations, but it would basically just save replicating the model code, which I don't find worth it, especially since it's pretty basic. Wdyt?
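
In code, the split being described is roughly the following (KERNEL_COUNTS and config_key are paraphrases, not the PR's exact identifiers):

# Paraphrased sketch of the current scheme.
expected = KERNEL_COUNTS.get((optim_cls, config_key))  # no entry => fully disabled

if expected is None:
    # Fully disabled optimizers (e.g. SparseAdam, LBFGS) get their own "disabled"
    # test: results must still match eager and no kernels should be generated.
    ...
else:
    # Everything else goes through the main test, with carveouts for configs
    # (e.g. forloop) that are disabled, and checks the expected kernel count.
    ...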

@janeyx99 (Contributor) commented Feb 9, 2024

> The categories are basically optimizers that are fully disabled, and optimizers where some configs are disabled (i.e. forloop disabled, etc.). If an optimizer is fully disabled, it does not have an entry in the KernelCounts dict and a disabled test is created for it, which checks that the kernel count is zero, that correctness matches eager, and nothing else. If only a certain config is disabled we have carveouts in the non-disabled test.

Does this mean that some disabled configs are tested as a part of the non-disabled test? If so, that is conceptually confusing and is not an intuitive divide of tests.

> I could possibly reconcile these, but I don't really think it's necessary because in the end I do want to treat disabled tests differently because SparseAdam and LBFGS have their own initializations. I could abstract out the initializations, but it would basically just save replicating the model code, which I don't find worth it, especially since it's pretty basic. Wdyt?

The consolidation of code is just one pro. The stronger pro (and what I find necessary) is to be able to have a test that just verifies "torch.compile on optimizers matches eager", whether or not the config is enabled. This test would stand for our story that torch.compile does not regress eager in terms of correctness, and it is the crucial thing I'm looking for. In the current scheme, it's not easy to tell that this is true for all optimizers.

At this point, I prefer having one solid test for "torch.compile on optimizers matches eager", leaving the kernel count ones alone. We should just go with using the optimizer db and TestParametrizer (see TestOptimRenewed for an example), since this is the officially recommended way to do testing across optimizer configs, with the minor con that it doesn't offer per-sample granularity. The test would look like the below; see https://github.com/pytorch/pytorch/pull/119288/files for an example of adding a new test plus the necessary skips:
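
(The sketch below assumes the usual imports for this kind of test, roughly: deepcopy from copy, LBFGS and SparseAdam from torch.optim, and optim_db, optims, and _get_optim_inputs_including_global_cliquey_kwargs from torch.testing._internal.common_optimizers.)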

@optims(optim_db, dtypes=[torch.float32])
def test_compiled_optim_matches_eager_correctness(self, device, dtype, optim_info):
    optim_cls = optim_info.optim_cls

    # Skip differentiable testing for now, see <insert tracking issue>
    all_optim_inputs = _get_optim_inputs_including_global_cliquey_kwargs(
        device, dtype, optim_info, skip=("differentiable",)
    )
    for optim_input in all_optim_inputs:
        kwargs = optim_input.kwargs
        torch._dynamo.reset()
        torch._inductor.metrics.reset()
        input = torch.ones([10, 10], device=device)
        model_eager = torch.nn.Sequential(
            *[torch.nn.Linear(10, 10, device=device) for _ in range(2)]
        )
        model_eager(input).sum().backward()
        model_compiled = deepcopy(model_eager)
        model_compiled(input).sum().backward()

        if optim_cls is SparseAdam:
            for param in model_eager.parameters():
                param.grad = param.grad.to_sparse()
            for param in model_compiled.parameters():
                param.grad = param.grad.to_sparse()

        opt_compiled = optim_cls(model_compiled.parameters(), **kwargs)
        opt_eager = optim_cls(model_eager.parameters(), **kwargs)

        if optim_cls is LBFGS:

            @torch.compile()
            def fn():
                def closure():
                    loss = model_compiled(input).sum()
                    loss.backward()
                    return loss

                opt_compiled.step(closure)

            def closure_eager():
                loss = model_eager(input).sum()
                loss.backward()
                return loss

            opt_eager.step(closure_eager)
            opt_eager.step(closure_eager)
        else:

            @torch.compile()
            def fn():
                opt_compiled.step()

            opt_eager.step()
            opt_eager.step()

        fn()
        fn()

        self.assertEqual(
            list(model_eager.parameters()),
            list(model_compiled.parameters()),
        )

        for p_eager, p_compiled in zip(
            model_eager.parameters(), model_compiled.parameters()
        ):
            self.assertEqual(
                opt_eager.state[p_eager],
                opt_compiled.state[p_compiled],
            )

        self.assertEqual(torch._inductor.metrics.generated_kernel_count, 0)

@mlazos added the ciflow/trunk label Feb 12, 2024
@mlazos requested a review from janeyx99 on February 12, 2024 10:19
@mlazos (Contributor, Author) commented Feb 13, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot (Collaborator)

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@mlazos (Contributor, Author) commented Feb 13, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@mlazos (Contributor, Author) commented Feb 14, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

Inline review thread on the diff:

optim_inputs_func=optim_inputs_func_adadelta,
optim_error_inputs_func=optim_error_inputs_func_adadelta,
supported_impls=("foreach", "differentiable"),
decorators=(
Contributor:

You definitely don't need all these!

Contributor Author:

I think I do; I only want to require CUDA on the tests that use CUDA.

Contributor:

That should already be true... do you have a HUD link showing what is failing?

Contributor:

You should be able to apply a skip on the instantiated cuda class def itself

Contributor Author:

Yeah, I'm going to submit a PR to use skipCUDAIf (Jane and I discussed offline).
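
For reference, a minimal runnable sketch of that direction (the class, test, and skip condition below are placeholders, not the follow-up PR's actual code; skipCUDAIf comes from torch.testing._internal.common_device_type):

import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    skipCUDAIf,
)
from torch.testing._internal.common_utils import TestCase, run_tests


class ExampleCompiledOptimizerTests(TestCase):
    # Placeholder gate: the real change would skip on whatever the CUDA-only
    # optimizer configs actually require.
    @skipCUDAIf(not torch.backends.cudnn.is_available(), "example CUDA-only gate")
    def test_example(self, device):
        self.assertTrue(torch.ones(1, device=device).item() == 1.0)


instantiate_device_type_tests(ExampleCompiledOptimizerTests, globals())

if __name__ == "__main__":
    run_tests()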


Labels: ciflow/trunk, Merged, module: inductor, topic: not user facing
