
Add function to port FX minified graph to HLO via StableHLO #109084

Closed
Wants to merge 1 commit.

Conversation


@awskila awskila commented Sep 12, 2023

If the `XLA_HLO_DEBUG` flag is enabled, generate a minified HLO graph when using the minifier. This function enables HLO minification support by porting the minified FX graph to StableHLO via the `save_torch_model_as_stablehlo` function.

This allows users to port the minified graph to compilers that are not compatible with the TorchDynamo/Inductor workflow and use XLA instead. The purpose of this PR is to help XLA users debug accuracy and compilation errors. It will also be helpful for the existing TorchDynamo/XLA workflow on the `torchxla_trace_once` backend.

Fixes #5461 in Torch XLA repo. CC @GleasonK @qihqi
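A minimal sketch of the gating described above (illustrative only, not the PR's actual code): the StableHLO dump happens only when `XLA_HLO_DEBUG` is set, and the export goes through `save_torch_model_as_stablehlo`. The helper name `dump_minified_hlo`, the exact truthiness check, and the `torch_xla.stablehlo` module path are assumptions for illustration.

```python
import os

def dump_minified_hlo(gm, example_inputs, output_dir="minified_hlo"):
    """Illustrative sketch: emit a StableHLO dump of a minified FX graph
    only when the XLA_HLO_DEBUG environment variable is enabled.

    The late import keeps this a no-op on installs without torch_xla;
    `dump_minified_hlo` itself is a hypothetical name, not PyTorch API.
    """
    if os.environ.get("XLA_HLO_DEBUG", "0") != "1":
        return None  # debugging disabled; do nothing
    # `save_torch_model_as_stablehlo` is the torch_xla helper named in
    # the PR description; the module path is assumed here.
    from torch_xla.stablehlo import save_torch_model_as_stablehlo
    save_torch_model_as_stablehlo(gm, example_inputs, output_dir)
    return output_dir
```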

@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Sep 12, 2023

pytorch-bot bot commented Sep 12, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109084

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 198acf8 with merge base c1a2f35:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Sep 12, 2023
@janeyx99
Contributor

Added @anijain2305 and @mlazos as reviewers as this is minifier related--feel free to reassign reviewers if you're not the right people.


awskila commented Sep 12, 2023

The two failures are unrelated to this PR. Both are due to an Android installation issue (maybe caused by a permissions issue or a read-only filesystem?):

ANDROID_HOME not a directory; did you install it under /opt/android/sdk?


awskila commented Sep 18, 2023

@janeyx99 Seems like @mlazos @anijain2305 may be busy at the moment. Is it possible to ping them, or add another reviewer? There's interest from my team (AWS SageMaker) and StableHLO to get this in as a debugging feature for XLA users.


awskila commented Sep 22, 2023

Is it possible to get a review on this PR? I have been waiting for about two weeks. Thanks!


awskila commented Sep 27, 2023

@anijain2305 @mlazos Can this PR be reviewed this week? We have interest from the AWS SageMaker team plus Google's StableHLO team. Thanks for your time!


@anijain2305 anijain2305 left a comment


I apologize for the delay. It lgtm!


awskila commented Sep 27, 2023

No worries @anijain2305. Thanks! Glad to see it get approved!


awskila commented Oct 2, 2023

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 2, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).



kit1980 commented Oct 2, 2023

@awskila @anijain2305 After this PR a test started to time out:

https://github.com/pytorch/pytorch/actions/runs/6384661915/job/17328135872

dynamo/test_dynamic_shapes.py::DynamicShapesExportTests::test_retracibility_dynamic_shapes <- test/dynamo/test_export.py Command took >30min, returning 124

I'm not sure if this PR is actually the cause of this, what do you think?


awskila commented Oct 2, 2023

Hmmm, I took a look at it, and I don't think this PR is the cause of the test failure. The main reason is that this function is only called when the XLA_HLO_DEBUG environment variable is set.

Per the docker container runtime arguments, XLA_HLO_DEBUG is not set.

-e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e PYTORCH_RETRY_TEST_CASES -e PYTORCH_OVERRIDE_FLAKY_SIGNAL -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_S3_KEY_PREFIX -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e DASHBOARD_TAG
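As a quick, illustrative sanity check (not part of the PR), the `-e` flags quoted above can be parsed mechanically to confirm that `XLA_HLO_DEBUG` is not among the variables forwarded into the container:

```python
# Illustrative check: parse the docker `-e VAR` flags from the CI job
# quoted above and confirm XLA_HLO_DEBUG is not forwarded.
docker_args = (
    "-e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY "
    "-e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER "
    "-e GITHUB_RUN_ATTEMPT -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION "
    "-e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS "
    "-e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e PYTORCH_RETRY_TEST_CASES "
    "-e PYTORCH_OVERRIDE_FLAKY_SIGNAL -e PR_LABELS -e MAX_JOBS=14 "
    "-e SCCACHE_BUCKET -e SCCACHE_S3_KEY_PREFIX -e XLA_CUDA "
    "-e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK "
    "-e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 "
    "-e HUGGING_FACE_HUB_TOKEN -e DASHBOARD_TAG"
)

tokens = docker_args.split()
# Every token that follows a `-e` names a forwarded environment variable,
# possibly with an inline value (e.g. MAX_JOBS=14), so split on "=".
forwarded = {
    tokens[i + 1].split("=")[0]
    for i, tok in enumerate(tokens)
    if tok == "-e"
}

print("XLA_HLO_DEBUG forwarded?", "XLA_HLO_DEBUG" in forwarded)
# → XLA_HLO_DEBUG forwarded? False
```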

Also, the FX minifier does not support dynamic shapes, and test_dynamic_shapes does not import the minifier.

Labels
ciflow/trunk — Trigger trunk jobs on your pull request
Merged
open source
release notes: fx — release notes category
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Development

Successfully merging this pull request may close these issues.

[RFC] Add HLO Minification API
6 participants