Skip to content

Conversation

@golechwierowicz
Copy link
Collaborator

@golechwierowicz golechwierowicz commented Dec 13, 2023

I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.

I discovered some models from the suite do not have the precision set
so instead of failing the script we just log the case, and use the default
precision, as no additional machinery should run for the Inductor
anyway. Additionally I wrapped the exceptions with the
ValueError so the logging message will not pollute with info about
str not inheriting from Exception class.

ecg@, note that needs to be hooked "somewhere". Not sure where, as
there was a revert in #6134.
@cota
Copy link
Collaborator

cota commented Dec 13, 2023

Thanks Greg -- I had missed that your change was reverted as part of another revert.

Please land this (including the PR description when merging, otherwise annoyingly we lose that info in the repo) and I'll try to re-land your reverted change.

Copy link
Collaborator

@frgossen frgossen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment.
Thanks you for finding these thinsg!

@golechwierowicz golechwierowicz merged commit 6b1344e into master Dec 13, 2023
@cota
Copy link
Collaborator

cota commented Dec 13, 2023

Please land this (including the PR description when merging, otherwise annoyingly we lose that info in the repo) and I'll try to re-land your reverted change.

Given that Jack is looking into this at the bridge level, I'll wait for him to finish that investigation since that's the right place to fix this.
Thanks again for identifying this issue Greg!

chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in pytorch#6134, but in general it can be done prior to moving the model to the device safely.
golechwierowicz added a commit that referenced this pull request Jan 12, 2024
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.
Comment on lines 228 to +230
elif precision == "amp":
raise f"AMP for PT/XLA:GPU is not implemented yet for torchbench models"
raise ValueError(
f"AMP for PT/XLA:GPU is not implemented yet for torchbench models")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miladm @frgossen @golechwierowicz Isn't AMP supported on PT/XLA:GPU?

bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
I discovered some models from the suite do not have the default precision set so instead of failing the script we just log the case, and do nothing, as no additional machinery should run for the Inductor anyway. Additionally I wrapped the exceptions with the ValueError so the logging message will not pollute with info about str not inheriting from Exception class.

@cota , note that needs to be hooked "somewhere". Not sure where, as there was a revert in #6134, but in general it can be done prior to moving the model to the device safely.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants