
Disable dynamo on some opt methods and differentiable optimizer tests #103066

Closed
wants to merge 5 commits

Conversation

@mlazos (Contributor) commented Jun 6, 2023

  • Disables dynamo on the differentiable optimizer tests
  • Disables dynamo on some test methods that expose a very rare dynamo edge case
  • Disables dynamo on the optimizer state export/save methods, since dynamo shouldn't trace those anyway (see the sketch below)

I have a draft PR to fix the two tests marked as skipped due to unsupported mutation of `step`.
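A minimal sketch of what the method-level disabling looks like, assuming `torch._dynamo.disable` is applied directly as a decorator (the optimizer subclass below is an illustrative placeholder, not the exact diff in this PR):

```python
import torch
import torch._dynamo


class ExampleOptimizer(torch.optim.SGD):  # placeholder subclass, for illustration only
    # Dynamo should not trace optimizer state export/save, so fall back to eager here.
    @torch._dynamo.disable
    def state_dict(self):
        return super().state_dict()

    @torch._dynamo.disable
    def load_state_dict(self, state_dict):
        return super().load_state_dict(state_dict)
```

When compiled code reaches one of these methods, dynamo inserts a graph break and runs the body eagerly instead of tracing it.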

cc @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @ipiszy

@mlazos mlazos requested review from malfet and janeyx99 June 6, 2023 06:45
@pytorch-bot (bot) commented Jun 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103066

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 65938d3:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99 (Contributor) left a comment


  1. what is the rare dynamo edge case? should an issue be filed or a big NOTES comment be written so people are aware why certain things are disabled?
  2. do we not trace state dict info for modules as well?

CI still seems red :(

Review comment on test/optim/test_optim.py (outdated, resolved)
@janeyx99 (Contributor) commented Jun 6, 2023

Is this PR intended to fix the flakiness?

@mlazos (Contributor, Author) commented Jun 6, 2023

> Is this PR intended to fix the flakiness?

Sorry told Nikita in chat

There are still two more issues (with Adadelta and RMSProp) that I'm working on a separate PR for.
I can add more comments.

Re the edge case: I can file an issue, but I've never seen it repro in real workloads and haven't been able to create a minimal repro. I've only seen it while tracing test code, so I don't think it's worth handling.

@mlazos (Contributor, Author) commented Jun 6, 2023

@janeyx99 this should be green; I added skips for Adadelta and RMSprop because I have a more involved fix for those that I'm cleaning up right now.
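For reference, a test-side skip of this kind is usually spelled with the `skipIfTorchDynamo` decorator from `torch.testing._internal.common_utils`; the test class and method names below are placeholders, not the actual hunk from this PR:

```python
from torch.testing._internal.common_utils import TestCase, run_tests, skipIfTorchDynamo


class ExampleOptimTests(TestCase):  # placeholder test class, for illustration only
    @skipIfTorchDynamo("Adadelta/RMSprop need a separate dynamo fix")
    def test_adadelta_step_sketch(self):
        ...  # a real test body would exercise the optimizer step here


if __name__ == "__main__":
    run_tests()
```

The decorator only takes effect when the suite runs with dynamo enabled, so the test should keep running in the regular eager CI jobs.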

@mlazos mlazos requested a review from janeyx99 June 6, 2023 17:57
@janeyx99 (Contributor) commented Jun 6, 2023

> Is this PR intended to fix the flakiness?
>
> Sorry told Nikita in chat

Could the PR description link the PR that introduced the flakiness so we have a coherent story for later search? Even if it doesn't fix it all the way, it would be good to know the impact of this specific PR.

> There are still two more issues (with Adadelta and RMSProp) that I'm working on a separate PR for. I can add more comments.

Yes pls.

> Re the edge case: I can file an issue, but I've never seen it repro in real workloads and haven't been able to create a minimal repro. I've only seen it while tracing test code, so I don't think it's worth handling.

Then a NOTE comment in the code would suffice.

@janeyx99 (Contributor) left a comment


Approving to unblock but please do the due diligence of explaining (in the PR description/code) which of these will be fixed in the future vs never fixed.

@mlazos (Contributor, Author) commented Jun 6, 2023

> Approving to unblock but please do the due diligence of explaining (in the PR description/code) which of these will be fixed in the future vs never fixed.

Yeah, I requested review since I added notes in the code; all good, ty.

@malfet (Contributor) commented Jun 6, 2023

Feel free to either rebase or ignore the CUDA-12.1 failure (the workflow was added after you created the PR).

Review comment on test/optim/test_optim.py (outdated, resolved)
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

Merge failed

Reason: Comment with id 1579492388 not found

Details for Dev Infra team. Raised by workflow job.

@mlazos (Contributor, Author) commented Jun 6, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

@pytorchmergebot (Collaborator)

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team. Raised by workflow job.

Failing merge rule: Core Maintainers

@mlazos (Contributor, Author) commented Jun 7, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.
