Check dynamo graph-breaks in CI #96346

wconstab · 2023-03-08T21:47:04Z

Stack from ghstack (oldest at bottom):

add graph-breaks baselines
add check_graph_breaks script (message users on regress or improvement)
hook up test.sh for existing accuracy job

Refactor graph-break CI check

Take steps toward merging checker with existing check flow,
consider merging it all the way inside the bench runner.

csvs

cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @desertfire

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job [ghstack-poisoned]

pytorch-bot · 2023-03-08T21:47:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96346

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit dc4e562:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ezyang · 2023-03-08T22:22:17Z

.ci/pytorch/test.sh

-
+    python benchmarks/dynamo/check_breaks.py --actual \
+      "$TEST_REPORTS_DIR/inference_$suite$shard_id.csv" \
+      --expected "benchmarks/dynamo/ci_expected_accuracy/inference_$suite$shard_id.csv"


Why not just put this in test_single_dynamo_benchmark? Also, a downside to having it be a separate script is if there is an unrelated failure, you don't get the graph break stats either. It is nice to report them all (I tell myself this is why we have a post-check script in the first place, so we can easily report all failed models and not just stop on the first one.)

Why not just put this in test_single_dynamo_benchmark

I can try that. I was trying to only run the checker in the non-perf jobs but it looks like i can still do that from in there.

a downside to having it be a separate script is if there is an unrelated failure, you don't get the graph break stats either

can you say more here? iiuc the job will still run all the models first (and that prints graph break stats model by model as it goes), then the csv is saved (so you can always access it) at that point, and finally the check runs, which is the last thing .... OH do you mean, run the 2 checks (train/infr) after running the sweeps for train/infr?

i'm also confused about if there is an unrelated failure -- if the bench runner itself bails out, i'm not making things worse am i?

So our batch scripts run with -e, which means after the first one fails the second one never gets run. So the antipattern here is, eg rexnet flaky fails accuracy, we stop running, and you don't realize there are ALSO graph break mismatches

alright, the thing i was missing is the bench runner keeps running and reports fails at the end, so i could integrate the graph break checks with the runner too. makes sense, ill see if its hard

ezyang · 2023-03-08T22:23:33Z

benchmarks/dynamo/ci_expected_accuracy/inference_huggingface.csv

@@ -0,0 +1,43 @@
+dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks


It's not obvious to me that we actually want all these columns here? calls_captured in particular feels likely to wobble, in ways that we ought not to care about?

i thought about this, and it seems easier to check in the exact csv that is easy to produce using our current flows.

and to your point about wobble, we just don't check the other fields- we check "graph_breaks"

Maybe we should generate two csvs: one with detailed, and another that is ok to check in. It's suboptimal to put in extra stuff even if it is ignored because it means that you'll get lots of unrelated wobbling whenever you update the csvs.

i think the cleanest would be

the msg also tells you "run this command: pytorch/benchmarks/dynamo/update_expected.py --pr {PRNUM}"

there is some clever mapping so the script can download the right .zips, and extract the csvs (2 csv per zip currently)

it can also spit out newly stripped CSVs that only contain the cared-about columns

and 'git add' the changes so you don't have to

But i have to figure out (2), not sure if it is trivial or we have to do something to make it possible. CC @ZainRizvi any help here? (best practices for (a) how to interact with artifact files from a script (b) any stable 'ENV' inside test jobs that i can use to build a URL to download artifacts etc

For (2), you need an action after the test runs finish (see #95675 as an example). I am ok with doing this as a followup PR.

re (2) I'd probably do something dumb like feed the script a HUD url, and then just parse out the hyperlinks to find the one I need lol

ezyang · 2023-03-08T22:24:03Z

benchmarks/dynamo/check_graph_breaks.py

+        msg += textwrap.dedent(
+            f"""
+        If this change is expected, you can update `{args.expected}` to reflect the new baseline.
+        This can either be done manually, or by downloading artifacts from your PR CI job."


what's the name of the artifact in HUD would be useful here

ezyang · 2023-03-08T22:25:17Z

benchmarks/dynamo/ci_expected_accuracy/inference_huggingface.csv

@@ -0,0 +1,43 @@
+dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
+cuda,AlbertForMaskedLM,1,pass,439,1,0,0


Something I did not realize until seeing this, is that we get a csv per shard. This means that I have to click through and download eight times to get all of them. Maybe we should have a script that downloads all the csvs from a job...

yea, i am wanting to build more tooling for this too. i have some other ideas...

ezyang · 2023-03-08T22:25:50Z

Thanks for doing this!

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job ghstack-source-id: 8f5f9e9 Pull Request resolved: #96346

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job ghstack-source-id: 2972319 Pull Request resolved: #96346 Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner.

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job ghstack-source-id: a334f1e Pull Request resolved: #96346 Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner.

ezyang · 2023-03-10T02:26:17Z

if you are out of patience and just want to land something, holler and I'll probably approve you to move things along

wconstab · 2023-03-10T04:32:34Z

if you are out of patience and just want to land something, holler and I'll probably approve you to move things along

i actually want to land it together with the WIP (next PR, which adds a script for downloading all the artifacts for the job) next week. Anyway i'm on PTO tmrw so i won't rush this in. It's probably landable though, if you're wanting it urgently, i can update the csvs monday and push it.

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx desertfire [ghstack-poisoned]

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner. csvs [ghstack-poisoned]

wconstab · 2023-03-10T15:36:08Z

ok @ezyang if you feel inclined (and this passes CI) you can stamp and mergebot, otherwise i can take care of it next week. I have gotten the updater tool (next PR) in an almost landable state so i am not too worried about the inconvenience of updating the CSVs in the interim.

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner. csvs [ghstack-poisoned]

ezyang · 2023-03-13T17:10:53Z

By the way, for the differential between dynamic and non-dynamic, my preference is not to have a completely separate dynamic csv, and instead a hand-written list of "we have more graph breaks here".

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner. csvs [ghstack-poisoned]

wconstab · 2023-03-13T18:57:38Z

By the way, for the differential between dynamic and non-dynamic, my preference is not to have a completely separate dynamic csv, and instead a hand-written list of "we have more graph breaks here".

what would the list contain, names of models and deltas compared to static?

In any case I planned to land this as is and probably let you tackle the dynamic part. Or at least let me come back to it next week.

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner. csvs [ghstack-poisoned]

wconstab · 2023-03-13T23:10:06Z

@pytorchbot merge

pytorchmergebot · 2023-03-13T23:12:04Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

ezyang · 2023-03-13T23:33:40Z

what would the list contain, names of models and deltas compared to static?

For the most part, I'd expect most models to have the same number of graph breaks with/without dynamic, so it would be a list of one or two models that this is not true for.

pytorchmergebot · 2023-03-14T00:38:21Z

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / cuda11.8-py3.10-gcc7-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

Details for Dev Infra team

Raised by workflow job

wconstab · 2023-03-14T03:07:12Z

@pytorchbot merge

pytorchmergebot · 2023-03-14T03:09:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

huydhn · 2023-03-14T20:05:55Z

@wconstab I think there is a bug in the script as it's now failing on periodic https://hud.pytorch.org/pytorch/pytorch/commit/906a1952c676dcccc72684da8e385d98e4704f68. The error is:

2023-03-14T17:51:28.9197764Z Traceback (most recent call last):
2023-03-14T17:51:28.9198275Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/check_graph_breaks.py", line 86, in <module>
2023-03-14T17:51:28.9198600Z     main()
2023-03-14T17:51:28.9198959Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/check_graph_breaks.py", line 79, in main
2023-03-14T17:51:28.9199435Z     failed, msg = check_graph_breaks(actual, expected, args.expected)
2023-03-14T17:51:28.9199870Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/check_graph_breaks.py", line 21, in check_graph_breaks
2023-03-14T17:51:28.9200225Z     expected_graph_breaks = get_field(expected_csv, model, "graph_breaks", typ=int)
2023-03-14T17:51:28.9200576Z   File "/var/lib/jenkins/workspace/benchmarks/dynamo/check_graph_breaks.py", line 10, in get_field
2023-03-14T17:51:28.9200892Z     return typ(csv.loc[csv["name"] == model_name][field])
2023-03-14T17:51:28.9201416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/pandas/core/series.py", line 206, in wrapper
2023-03-14T17:51:28.9201754Z     raise TypeError(f"cannot convert the series to {converter}")
2023-03-14T17:51:28.9202085Z TypeError: cannot convert the series to <class 'int'>

This was missed because periodic wasn't included in this PR. It looks like a forward fix is needed? Or feel free to revert if you need time to look into this

wconstab · 2023-03-14T21:47:04Z

This was missed because periodic wasn't included in this PR. It looks like a forward fix is needed? Or feel free to revert if you need time to look into this

I will shoot for a forward fix. Ping me if i'm taking too long.

wconstab · 2023-03-14T22:05:53Z

@huydhn hopefully, #96780 will fix this. I've tagged it with periodic label so lets see if it helps.

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job Refactor graph-break CI check Take steps toward merging checker with existing check flow, consider merging it all the way inside the bench runner. csvs Pull Request resolved: pytorch/pytorch#96346 Approved by: https://github.com/ezyang

Check dynamo graph-breaks in CI

d962a3c

- add graph-breaks baselines - add check_graph_breaks script (message users on regress or improvement) - hook up test.sh for existing accuracy job [ghstack-poisoned]

github-actions bot added ciflow/inductor module: dynamo labels Mar 8, 2023

wconstab requested review from desertfire, ezyang and malfet and removed request for ezyang and malfet March 8, 2023 22:00

wconstab added ciflow/trunk Trigger trunk jobs on your pull request topic: not user facing topic category labels Mar 8, 2023

ezyang reviewed Mar 8, 2023

View reviewed changes

wconstab mentioned this pull request Mar 9, 2023

Refactor graph-break CI check #96399

Closed

wconstab mentioned this pull request Mar 10, 2023

update_expected.py to parse artifacts and update graph break stats #96480

Closed

wconstab added 2 commits March 10, 2023 15:09

ezyang approved these changes Mar 10, 2023

View reviewed changes

pytorchmergebot added the Merged label Mar 14, 2023

pytorchmergebot closed this in 218eeac Mar 14, 2023

This was referenced Mar 14, 2023

WIP #96749

Closed

One line print for check_graph_breaks #96750

Closed

facebook-github-bot deleted the gh/wconstab/130/head branch June 8, 2023 19:13

		@@ -0,0 +1,43 @@
		dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks

		@@ -0,0 +1,43 @@
		dev,name,batch_size,accuracy,calls_captured,unique_graphs,graph_breaks,unique_graph_breaks
		cuda,AlbertForMaskedLM,1,pass,439,1,0,0

Check dynamo graph-breaks in CI #96346

Check dynamo graph-breaks in CI #96346

Uh oh!

Conversation

wconstab commented Mar 8, 2023 • edited by pytorch-bot bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot bot commented Mar 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96346

✅ No Failures

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ezyang commented Mar 8, 2023

Uh oh!

ezyang commented Mar 10, 2023

Uh oh!

wconstab commented Mar 10, 2023

Uh oh!

wconstab commented Mar 10, 2023

Uh oh!

ezyang commented Mar 13, 2023

Uh oh!

wconstab commented Mar 13, 2023

Uh oh!

wconstab commented Mar 13, 2023

Uh oh!

pytorchmergebot commented Mar 13, 2023

Merge started

Uh oh!

ezyang commented Mar 13, 2023

Uh oh!

pytorchmergebot commented Mar 14, 2023

Merge failed

Uh oh!

wconstab commented Mar 14, 2023

Uh oh!

pytorchmergebot commented Mar 14, 2023

Merge started

Uh oh!

huydhn commented Mar 14, 2023

Uh oh!

wconstab commented Mar 14, 2023

Uh oh!

wconstab commented Mar 14, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

wconstab commented Mar 8, 2023 •

edited by pytorch-bot bot

Loading

pytorch-bot bot commented Mar 8, 2023 •

edited

Loading