Check dynamo graph-breaks in CI #96346
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96346
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit dc4e562. This comment was automatically generated by Dr. CI and updates every 15 minutes.
.ci/pytorch/test.sh
Outdated
python benchmarks/dynamo/check_breaks.py --actual \
    "$TEST_REPORTS_DIR/inference_$suite$shard_id.csv" \
    --expected "benchmarks/dynamo/ci_expected_accuracy/inference_$suite$shard_id.csv"
Why not just put this in test_single_dynamo_benchmark? Also, a downside to having it be a separate script is if there is an unrelated failure, you don't get the graph break stats either. It is nice to report them all (I tell myself this is why we have a post-check script in the first place, so we can easily report all failed models and not just stop on the first one.)
> Why not just put this in test_single_dynamo_benchmark

I can try that. I was trying to only run the checker in the non-perf jobs but it looks like i can still do that from in there.

> a downside to having it be a separate script is if there is an unrelated failure, you don't get the graph break stats either

can you say more here? iiuc the job will still run all the models first (and that prints graph break stats model by model as it goes), then the csv is saved (so you can always access it) at that point, and finally the check runs, which is the last thing .... OH do you mean, run the 2 checks (train/infr) after running the sweeps for train/infr?

i'm also confused about "if there is an unrelated failure": if the bench runner itself bails out, i'm not making things worse am i?
So our batch scripts run with -e, which means that after the first command fails, the second one never gets run. So the antipattern here is: e.g. rexnet flakily fails accuracy, we stop running, and you don't realize there are ALSO graph break mismatches.
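The point about not stopping at the first failure can be sketched as below. This is a minimal illustration and not the PR's code: `run_all_checks` and the idea of a check returning a list of failure messages are hypothetical names and conventions chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a check returns a list of failure messages,
# and an empty list means the check passed.
Check = Callable[[], List[str]]


def run_all_checks(checks: Dict[str, Check]) -> Tuple[int, List[str]]:
    """Run every check, even if an earlier one fails, and collect all failures."""
    failures: List[str] = []
    for name, check in checks.items():
        try:
            failures.extend(f"{name}: {msg}" for msg in check())
        except Exception as exc:
            # A crashing check is recorded but must not hide the remaining checks.
            failures.append(f"{name}: crashed with {exc!r}")
    exit_code = 1 if failures else 0
    return exit_code, failures
```

With this shape, an accuracy failure and a graph-break mismatch both end up in the same report, and the process exit code is decided once at the end rather than by whichever command failed first under `-e`.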
alright, the thing i was missing is the bench runner keeps running and reports fails at the end, so i could integrate the graph break checks with the runner too. makes sense, i'll see if it's hard
@@ -0,0 +1,43 @@
dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
It's not obvious to me that we actually want all these columns here? calls_captured in particular feels likely to wobble, in ways that we ought not to care about?
i thought about this, and it seems easier to check in the exact csv that is easy to produce using our current flows.
and to your point about wobble, we just don't check the other fields- we check "graph_breaks"
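A minimal sketch of what "only check graph_breaks" could look like, assuming the CSV layout shown in the diff above and that `graph_breaks` always parses as an integer. The function names are hypothetical, not the names used in the PR's script, and the differing `calls_captured` value in the example is made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_graph_breaks(f: TextIO) -> Dict[str, int]:
    """Map model name -> graph_breaks, ignoring every other column."""
    return {row["name"]: int(row["graph_breaks"]) for row in csv.DictReader(f)}


def compare_graph_breaks(actual: Dict[str, int], expected: Dict[str, int]) -> List[str]:
    """Return one message per model whose graph_breaks differs from the baseline."""
    msgs = []
    for name, want in sorted(expected.items()):
        got = actual.get(name)
        if got is None:
            msgs.append(f"{name}: missing from actual results")
        elif got > want:
            msgs.append(f"{name}: regressed, graph_breaks {want} -> {got}")
        elif got < want:
            msgs.append(f"{name}: improved, graph_breaks {want} -> {got}")
    return msgs


HEADER = "dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks\n"
expected = load_graph_breaks(io.StringIO(HEADER + "cuda,AlbertForMaskedLM,1,pass,439,1,0,0\n"))
# Illustrative row: calls_captured wobbles (439 -> 452) but graph_breaks is unchanged.
actual = load_graph_breaks(io.StringIO(HEADER + "cuda,AlbertForMaskedLM,1,pass,452,1,0,0\n"))
print(compare_graph_breaks(actual, expected))  # → []
```

Because only the `name` and `graph_breaks` columns are read, changes in the other columns never produce a message.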
Maybe we should generate two csvs: one with detailed, and another that is ok to check in. It's suboptimal to put in extra stuff even if it is ignored because it means that you'll get lots of unrelated wobbling whenever you update the csvs.
i think the cleanest would be
1. the msg also tells you "run this command: pytorch/benchmarks/dynamo/update_expected.py --pr {PRNUM}"
2. there is some clever mapping so the script can download the right .zips, and extract the csvs (2 csv per zip currently)
3. it can also spit out newly stripped CSVs that only contain the cared-about columns
4. and 'git add' the changes so you don't have to
But i have to figure out (2); not sure if it is trivial or we have to do something to make it possible. CC @ZainRizvi any help here? (Best practices for (a) how to interact with artifact files from a script, and (b) any stable 'ENV' inside test jobs that i can use to build a URL to download artifacts, etc.)
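The column-stripping step from the list above could look roughly like this. It is a sketch only: `strip_columns` is a hypothetical helper, and the `KEEP` tuple is an assumption, since the thread only confirms that `graph_breaks` is the column actually compared.

```python
import csv
import io
from typing import Sequence, TextIO

# Assumption: which columns count as "cared-about" is a guess for illustration.
KEEP = ("dev", "name", "graph_breaks")


def strip_columns(src: TextIO, dst: TextIO, keep: Sequence[str] = KEEP) -> None:
    """Copy a benchmark CSV from src to dst, keeping only the listed columns."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst,
        fieldnames=list(keep),
        extrasaction="ignore",  # silently drop columns that are not in `keep`
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
```

Checking in only the stripped file means an update to the baseline touches a line only when a cared-about value changes.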
For (2), you need an action after the test runs finish (see #95675 as an example). I am ok with doing this as a followup PR.
re (2) I'd probably do something dumb like feed the script a HUD url, and then just parse out the hyperlinks to find the one I need lol
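A sketch of the "parse out the hyperlinks" idea using only the standard library. It assumes the page's HTML has already been fetched into a string and that artifact links appear as plain `<a href>` elements ending in `.zip`; neither is confirmed for HUD by this thread. `LinkCollector` and `find_artifact_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def find_artifact_links(html: str, suffix: str = ".zip") -> List[str]:
    """Return links whose URL path ends with `suffix` (query strings are ignored)."""
    parser = LinkCollector()
    parser.feed(html)
    return [link for link in parser.links if urlparse(link).path.endswith(suffix)]
```

If the page builds its links with JavaScript, this approach finds nothing and a real artifact API would be needed instead.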
msg += textwrap.dedent(
    f"""
    If this change is expected, you can update `{args.expected}` to reflect the new baseline.
    This can either be done manually, or by downloading artifacts from your PR CI job."
The name of the artifact in HUD would be useful here.
@@ -0,0 +1,43 @@
dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
cuda,AlbertForMaskedLM,1,pass,439,1,0,0
Something I did not realize until seeing this, is that we get a csv per shard. This means that I have to click through and download eight times to get all of them. Maybe we should have a script that downloads all the csvs from a job...
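Once the per-shard files are on disk, combining them is straightforward. The sketch below assumes every shard CSV shares the same header; `merge_shard_csvs` is a hypothetical helper, and downloading the files is out of scope here.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def merge_shard_csvs(shards: Iterable[TextIO]) -> List[Dict[str, str]]:
    """Concatenate the rows of several per-shard CSVs that share one header."""
    rows: List[Dict[str, str]] = []
    header = None
    for shard in shards:
        reader = csv.DictReader(shard)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"header mismatch: {reader.fieldnames} != {header}")
        rows.extend(reader)
    return rows
```

In practice the inputs would be open file handles for each shard's CSV, so a single call yields the full table for a job.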
yea, i am wanting to build more tooling for this too. i have some other ideas...
Thanks for doing this!
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire

[ghstack-poisoned]
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

ghstack-source-id: 2972319
Pull Request resolved: #96346

Refactor graph-break CI check

Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner.
if you are out of patience and just want to land something, holler and I'll probably approve you to move things along
i actually want to land it together with the WIP (next PR, which adds a script for downloading all the artifacts for the job) next week. Anyway i'm on PTO tomorrow so i won't rush this in. It's probably landable though; if you want it urgently, i can update the csvs monday and push it.
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

Refactor graph-break CI check

Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner.

csvs

[ghstack-poisoned]
ok @ezyang if you feel inclined (and this passes CI) you can stamp and mergebot, otherwise i can take care of it next week. I have gotten the updater tool (next PR) in an almost landable state so i am not too worried about the inconvenience of updating the CSVs in the interim.
By the way, for the differential between dynamic and non-dynamic, my preference is not to have a completely separate dynamic csv, and instead a hand-written list of "we have more graph breaks here".
what would the list contain, names of models and deltas compared to static? In any case I planned to land this as is and probably let you tackle the dynamic part. Or at least let me come back to it next week.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
For the most part, I'd expect most models to have the same number of graph breaks with/without dynamic, so it would be a list of one or two models that this is not true for.
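One way to read the "hand-written list" proposal is as a small table of per-model deltas applied on top of the static baseline. The sketch below assumes that reading (the thread leaves open whether the list holds deltas or just names); `expected_dynamic_breaks` and the example override table are hypothetical.

```python
from typing import Dict


def expected_dynamic_breaks(
    static_baseline: Dict[str, int], extra_breaks: Dict[str, int]
) -> Dict[str, int]:
    """Derive the dynamic-shapes baseline from the static one plus listed exceptions."""
    unknown = sorted(set(extra_breaks) - set(static_baseline))
    if unknown:
        # A stale entry in the hand-written list should fail loudly.
        raise KeyError(f"models not in static baseline: {unknown}")
    return {
        name: count + extra_breaks.get(name, 0)
        for name, count in static_baseline.items()
    }
```

This keeps a single checked-in CSV as the source of truth, with the dynamic expectations differing only for the handful of listed models.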
Merge failed
Reason: 1 jobs have failed, first few of them are: inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)
Details for Dev Infra team: Raised by workflow job
@pytorchbot merge
@wconstab I think there is a bug in the script as it's now failing on periodic https://hud.pytorch.org/pytorch/pytorch/commit/906a1952c676dcccc72684da8e385d98e4704f68. The error is:
This was missed because periodic wasn't included in this PR. It looks like a forward fix is needed? Or feel free to revert if you need time to look into this
I will shoot for a forward fix. Ping me if i'm taking too long.
- add graph-breaks baselines
- add check_graph_breaks script (message users on regress or improvement)
- hook up test.sh for existing accuracy job

Refactor graph-break CI check

Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner.

csvs

Pull Request resolved: pytorch/pytorch#96346
Approved by: https://github.com/ezyang
Stack from ghstack (oldest at bottom):
Refactor graph-break CI check
Take steps toward merging checker with existing check flow,
consider merging it all the way inside the bench runner.
csvs
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire