
Conversation

@mlazos (Contributor) commented Feb 1, 2024

@mlazos requested a review from janeyx99 on February 1, 2024 22:05

pytorch-bot bot commented Feb 1, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/118919

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b5e4cf5 with merge base e9b78f2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99 (Contributor) left a comment

Thanks for starting to do this; we should be more formal with our approach to include:

  • all configs of each optimizer
  • the unsupported forloop optimizer configs
  • more clarity/maybe an invariant where "if the compiler can handle it, we test it and check the number of kernels, otherwise we just make sure it doesn't error". Today, this invariant isn't obvious so coverage isn't certain.

I left some chill comments on the test cases but I'm looking for a more overarching solution that would make the third point clear. One way I can see is having the current make_test pipe out configurations to "supported" and "unsupported" and then having the tests be made accordingly.
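
For concreteness, a rough sketch of that routing idea (make_test, SUPPORTED_CONFIGS, and the test bodies here are illustrative placeholders, not the PR's actual helpers):

def make_test(optim_cls, **kwargs):
    # Hypothetical registry of (optimizer, config) pairs the compiler handles.
    config_key = (optim_cls.__name__, tuple(sorted(kwargs.items())))

    if config_key in SUPPORTED_CONFIGS:
        def test_fn(self):
            # Compile, compare against eager, and assert the expected kernel count.
            ...
    else:
        def test_fn(self):
            # Only verify that compiling doesn't error and still matches eager.
            ...

    return test_fn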

@janeyx99 (Contributor) commented Feb 5, 2024

Leaving a first round of review as there's a merge conflict. Also, as mentioned in the review, it'd be good to use _get_optim_inputs_including_global_cliquey_kwargs over the optim_inputs_func to avoid needing to handle flagging of foreach, and to get all the configs at once.
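
Roughly, that helper returns every config (including the global flags such as foreach/fused/differentiable) in one pass, e.g.:

# Sketch only: the skip tuple below is an example, not the PR's final choice.
all_optim_inputs = _get_optim_inputs_including_global_cliquey_kwargs(
    device, dtype, optim_info, skip=("differentiable",)
)
for optim_input in all_optim_inputs:
    kwargs = optim_input.kwargs  # already covers the foreach/fused variants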

@mlazos requested a review from janeyx99 on February 8, 2024 05:01
@mlazos requested a review from janeyx99 on February 8, 2024 22:54
@janeyx99 (Contributor) left a comment

Two high-level questions:

I am not sure I like the story as much when there are caveats for the disabled configs being grouped as one. If we just lump them all into one, how do we differentiate disabled fused/forloop vs enabled foreach?

This leads to a more general question I’ve now been thinking about. Right now there are two tests, one that checks correctness and another that checks for correctness and kernel count. Theoretically, I’m imagining these can be the same test, which we can then pipe all the configs through. Maybe the default is to not check KERNEL COUNT unless it’s flagged in some way. I’m thinking this would make the test easier to maintain and allow it to be applied more generally across more configs—why do we not do this in the PR today?
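
(To illustrate the second question: the merged test could take an opt-in flag, something like the hypothetical parameters below, and skip the kernel-count assertion by default.)

# Illustrative only: check_kernel_count and expected_kernel_count are
# hypothetical parameters, not helpers that exist in the PR.
def _run_compiled_optimizer_test(self, optim_cls, kwargs, check_kernel_count=False,
                                 expected_kernel_count=None):
    ...  # run compiled vs. eager and assert params/state match
    if check_kernel_count:
        self.assertEqual(
            torch._inductor.metrics.generated_kernel_count,
            expected_kernel_count,
        )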

@mlazos (Contributor, Author) commented Feb 9, 2024

> Two high-level questions:
>
> I am not sure I like the story as much when there are caveats for the disabled configs being grouped as one. If we just lump them all into one, how do we differentiate disabled fused/forloop vs enabled foreach?
>
> This leads to a more general question I’ve now been thinking about. Right now there are two tests, one that checks correctness and another that checks for correctness and kernel count. Theoretically, I’m imagining these can be the same test, which we can then pipe all the configs through. Maybe the default is to not check KERNEL COUNT unless it’s flagged in some way. I’m thinking this would make the test easier to maintain and allow it to be applied more generally across more configs—why do we not do this in the PR today?

The categories are basically optimizers that are fully disabled, and optimizers where some configs are disabled (i.e. forloop disabled, etc.). If an optimizer is fully disabled, it does not have an entry in the KernelCounts dict and a disabled test is created for it, which checks that the kernel count is zero, that correctness matches eager, and nothing else. If only a certain config is disabled we have carveouts in the non-disabled test. I could possibly reconcile these, but I don't really think it's necessary because in the end I do want to treat disabled tests differently because SparseAdam and LBFGS have their own initializations. I could abstract out the initializations, but it would basically just save replicating the model code, which I don't find worth it, especially since it's pretty basic. Wdyt?
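
In code, the split being described is roughly the following (KERNEL_COUNTS and config_key are paraphrases, not the PR's exact identifiers):

# Paraphrased sketch of the current scheme.
expected = KERNEL_COUNTS.get((optim_cls, config_key))  # no entry => fully disabled

if expected is None:
    # Fully disabled optimizers (e.g. SparseAdam, LBFGS) get their own "disabled"
    # test: results must still match eager and no kernels should be generated.
    ...
else:
    # Everything else goes through the main test, with carveouts for configs
    # (e.g. forloop) that are disabled, and checks the expected kernel count.
    ...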

@janeyx99 (Contributor) commented Feb 9, 2024

> The categories are basically optimizers that are fully disabled, and optimizers where some configs are disabled (i.e. forloop disabled, etc.). If an optimizer is fully disabled, it does not have an entry in the KernelCounts dict and a disabled test is created for it, which checks that the kernel count is zero, that correctness matches eager, and nothing else. If only a certain config is disabled we have carveouts in the non-disabled test.

Does this mean that some disabled configs are tested as a part of the non-disabled test? If so, that is conceptually confusing and is not an intuitive divide of tests.

> I could possibly reconcile these, but I don't really think it's necessary because in the end I do want to treat disabled tests differently because SparseAdam and LBFGS have their own initializations. I could abstract out the initializations, but it would basically just save replicating the model code, which I don't find worth it, especially since it's pretty basic. Wdyt?

The consolidation of code is just one pro. The stronger pro (and what I find necessary) is to be able to have a test that just verifies "torch.compile on optimizers matches eager", whether or not the config is enabled. This test would stand for our story that torch.compile does not regress eager in terms of correctness, and it is the crucial thing I'm looking for. In the current scheme, it's not easy to tell that this is true for all optimizers.

At this point, I prefer having one solid test for "torch.compile on optimizers matches eager", leaving the kernel count ones alone. We should just go with using the optimizer db and TestParametrizer (see TestOptimRenewed for an example), since this is the officially recommended way to do testing across optimizer configs, with the minor con that it doesn't offer per-sample granularity. The test would look like the below; see https://github.com/pytorch/pytorch/pull/119288/files for an example of adding a new test plus the necessary skips:
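
(The sketch below assumes the usual imports for this kind of test, roughly: deepcopy from copy, LBFGS and SparseAdam from torch.optim, and optim_db, optims, and _get_optim_inputs_including_global_cliquey_kwargs from torch.testing._internal.common_optimizers.)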

@optims(optim_db, dtypes=[torch.float32])
def test_compiled_optim_matches_eager_correctness(self, device, dtype, optim_info):
    optim_cls = optim_info.optim_cls

    # Skip differentiable testing for now, see <insert tracking issue>
    all_optim_inputs = _get_optim_inputs_including_global_cliquey_kwargs(
        device, dtype, optim_info, skip=("differentiable",)
    )
    for optim_input in all_optim_inputs:
        kwargs = optim_input.kwargs
        torch._dynamo.reset()
        torch._inductor.metrics.reset()
        input = torch.ones([10, 10], device=device)
        model_eager = torch.nn.Sequential(
            *[torch.nn.Linear(10, 10, device=device) for _ in range(2)]
        )
        model_eager(input).sum().backward()
        model_compiled = deepcopy(model_eager)
        model_compiled(input).sum().backward()

        if optim_cls is SparseAdam:
            for param in model_eager.parameters():
                param.grad = param.grad.to_sparse()
            for param in model_compiled.parameters():
                param.grad = param.grad.to_sparse()

        opt_compiled = optim_cls(model_compiled.parameters(), **kwargs)
        opt_eager = optim_cls(model_eager.parameters(), **kwargs)

        if optim_cls is LBFGS:

            @torch.compile()
            def fn():
                def closure():
                    loss = model_compiled(input).sum()
                    loss.backward()
                    return loss

                opt_compiled.step(closure)

            def closure_eager():
                loss = model_eager(input).sum()
                loss.backward()
                return loss

            opt_eager.step(closure_eager)
            opt_eager.step(closure_eager)
        else:

            @torch.compile()
            def fn():
                opt_compiled.step()

            opt_eager.step()
            opt_eager.step()

        fn()
        fn()

        self.assertEqual(
            list(model_eager.parameters()),
            list(model_compiled.parameters()),
        )

        for p_eager, p_compiled in zip(
            model_eager.parameters(), model_compiled.parameters()
        ):
            self.assertEqual(
                opt_eager.state[p_eager],
                opt_compiled.state[p_compiled],
            )

        self.assertEqual(torch._inductor.metrics.generated_kernel_count, 0)

@mlazos added the ciflow/trunk label Feb 12, 2024
@mlazos requested a review from janeyx99 on February 12, 2024 10:19
@mlazos (Contributor, Author) commented Feb 13, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot (Collaborator)

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@mlazos (Contributor, Author) commented Feb 13, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@mlazos (Contributor, Author) commented Feb 14, 2024

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: Check the merge workflow status here

Inline review thread on the diff:

optim_inputs_func=optim_inputs_func_adadelta,
optim_error_inputs_func=optim_error_inputs_func_adadelta,
supported_impls=("foreach", "differentiable"),
decorators=(
Contributor:

You definitely don't need all these!

Contributor Author:

I think I do; I only want to require CUDA on the tests that use CUDA.

Contributor:

That should already be true... do you have a HUD link showing what is failing?

Contributor:

You should be able to apply a skip on the instantiated cuda class def itself

Contributor Author:

Yeah, I'm going to submit a PR to use skipCUDAIf (Jane and I discussed offline).
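
For reference, a minimal runnable sketch of that direction (the class, test, and skip condition below are placeholders, not the follow-up PR's actual code; skipCUDAIf comes from torch.testing._internal.common_device_type):

import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    skipCUDAIf,
)
from torch.testing._internal.common_utils import TestCase, run_tests


class ExampleCompiledOptimizerTests(TestCase):
    # Placeholder gate: the real change would skip on whatever the CUDA-only
    # optimizer configs actually require.
    @skipCUDAIf(not torch.backends.cudnn.is_available(), "example CUDA-only gate")
    def test_example(self, device):
        self.assertTrue(torch.ones(1, device=device).item() == 1.0)


instantiate_device_type_tests(ExampleCompiledOptimizerTests, globals())

if __name__ == "__main__":
    run_tests()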


Labels: ciflow/trunk, Merged, module: inductor, topic: not user facing
