Skip to content

Conversation

@frgossen
Copy link
Collaborator

This reverts commit 7786d5d.

python benchmarks/experiment_runner.py --dynamo=openxla --dynamo=openxla_eval --dynamo=inductor --xla=PJRT --xla=None --test=eval --test=train --suite-name=torchbench --accelerator=cuda --filter=^alexnet$ --no-resume fails for openxla benchmarks.

@frgossen frgossen requested review from JackCaoG, vanbasten23, ysiraichi and zpcore and removed request for JackCaoG December 13, 2023 17:03
@zpcore
Copy link
Member

zpcore commented Dec 13, 2023

Sorry I didn't catch up with recent PRs. Do we already support automatically match dynamo backend openxla, openxla_eval with train and eval based on the cmdline in the description?

@frgossen
Copy link
Collaborator Author

We should support openxla_eval + eval, openxla + eval, and openxla + train
This issue is unrelated though. The flags are expanded to configs, so if the combination is not supported it should just skip it.
The failure I see is a segfault which did not happen before.

Copy link
Member

@zpcore zpcore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. LGTM! Can you also resolve the conflict?

This reverts commit 7786d5d.

`python benchmarks/experiment_runner.py --dynamo=openxla --dynamo=openxla_eval --dynamo=inductor --xla=PJRT --xla=None --test=eval --test=train --suite-name=torchbench --accelerator=cuda  --filter=^alexnet$  --no-resume`
fails for openxla benchmarks.
@frgossen frgossen merged commit 788e1b5 into master Dec 13, 2023
@frgossen frgossen deleted the frg-revert-6076 branch December 13, 2023 19:15
golechwierowicz added a commit that referenced this pull request Dec 13, 2023
I discovered some models from the suite do not have the precision set
so instead of failing the script we just log the case, and use the default
precision, as no additional machinery should run for the Inductor
anyway. Additionally I wrapped the exceptions with the
ValueError so the logging message will not pollute with info about
str not inheriting from Exception class.

ecg@, note that needs to be hooked "somewhere". Not sure where, as
there was a revert in #6134.
golechwierowicz added a commit that referenced this pull request Dec 13, 2023
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in pytorch#6134, but in general it can be done prior to moving the model to the device safely.
golechwierowicz added a commit that referenced this pull request Jan 12, 2024
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants