@jagadish-amd jagadish-amd commented Feb 14, 2025

Because the environment variable HIPBLASLT_ALLOW_TF32 can change at runtime, remove the static specifier from the allow_tf32 variable so that it is re-evaluated on each call and reflects the current value of HIPBLASLT_ALLOW_TF32.
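
To illustrate the behavior described above, here is a minimal Python analogy, not the actual Context.cpp code. A module-level cache stands in for a C++ function-local static that is initialized once; the function names, the cache variable, and the rule that only "1" enables the flag are assumptions made for this sketch.

```python
import os
from typing import Optional, Tuple

ENV_VAR = "HIPBLASLT_ALLOW_TF32"

# Module-level cache standing in for a C++ function-local static (hypothetical).
_cached_flag: Optional[bool] = None


def _read_flag() -> bool:
    # Assumption for this sketch: only the string "1" enables the flag.
    return os.environ.get(ENV_VAR) == "1"


def allow_tf32_cached() -> bool:
    """Evaluate the env var on the first call only, like a static local."""
    global _cached_flag
    if _cached_flag is None:
        _cached_flag = _read_flag()
    return _cached_flag


def allow_tf32_live() -> bool:
    """Re-read the env var on every call, like a non-static local."""
    return _read_flag()


def demo() -> Tuple[Tuple[bool, bool], Tuple[bool, bool]]:
    """Return (cached, live) readings before and after the env var changes."""
    global _cached_flag
    _cached_flag = None  # start from a clean cache
    saved = os.environ.get(ENV_VAR)
    try:
        os.environ[ENV_VAR] = "1"
        before = (allow_tf32_cached(), allow_tf32_live())
        os.environ[ENV_VAR] = "0"
        after = (allow_tf32_cached(), allow_tf32_live())
        return before, after
    finally:
        # Restore whatever the environment held before the demo.
        if saved is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = saved
```

Calling demo() returns ((True, True), (True, False)): the cached reader keeps reporting the initial value after the variable is changed, while the live reader follows it.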

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd

Since the env variable HIPBLASLT_ALLOW_TF32 can change, remove
const static type for allow_tf32 variable so that it captures the
current value of env variable HIPBLASLT_ALLOW_TF32.

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>

pytorch-bot bot commented Feb 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/147186

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 641ce99 with merge base 331d5cf (image):

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Feb 14, 2025
@jagadish-amd
Contributor Author

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Feb 14, 2025

jagadish-amd commented Feb 14, 2025

I am working on a PR to enable TF32 testing in test_nn for ROCm.
Turning TF32 on and off (https://github.com/pytorch/pytorch/blob/main/test/test_nn.py#L54) requires the env variable HIPBLASLT_ALLOW_TF32 to change accordingly. Without this PR, the allow_tf32 variable in Context.cpp always evaluates to whatever HIPBLASLT_ALLOW_TF32 was set to initially.
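
As a rough sketch of what toggling the variable around a test could look like, the context manager below sets HIPBLASLT_ALLOW_TF32 for the duration of a block and restores the previous state afterwards, deleting the variable if it was not set before. The helper name temp_env is hypothetical and is not taken from test_nn.py.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional


@contextmanager
def temp_env(name: str, value: Optional[str]) -> Iterator[None]:
    """Temporarily set (or, with value=None, unset) an environment variable."""
    previous = os.environ.get(name)
    try:
        if value is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = value
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is not None:
            os.environ[name] = previous
        else:
            os.environ.pop(name, None)
```

A test could then wrap its TF32-on and TF32-off sections in `with temp_env("HIPBLASLT_ALLOW_TF32", "1"):` and `with temp_env("HIPBLASLT_ALLOW_TF32", "0"):`. This only has an effect if the consumer re-reads the variable on each query, which is the point of this PR.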

cc @xw285cornell

@pruthvistony pruthvistony added ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/rocm Trigger "default" config CI on ROCm labels Feb 14, 2025
Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
@jagadish-amd jagadish-amd changed the title from "ROCm: Remove const static specifier for allow_tf32 variable." to "ROCm: Remove static specifier for allow_tf32 variable." Feb 14, 2025
@jagadish-amd jagadish-amd marked this pull request as ready for review February 14, 2025 20:41
@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Feb 25, 2025
Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
@jithunnair-amd jithunnair-amd added ciflow/rocm Trigger "default" config CI on ROCm and removed ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/rocm Trigger "default" config CI on ROCm labels Feb 25, 2025
else:
    del os.environ["HIPBLASLT_ALLOW_TF32"]

@unittest.skipIf(not TEST_WITH_ROCM, "not relevant for CUDA testing")

@naromero77amd naromero77amd Feb 25, 2025

Test only runs on MI300, so we need to do:

from torch.testing._internal.common_utils import runOnRocmArch, MI300_ARCH

Then we need to add the decorator below to the test.

@runOnRocmArch(MI300_ARCH)

Collaborator

Actually, since we don't execute any test, you can just ignore my last comment.

Collaborator

I disagree. The new test is not arch-specific, it only tests whether the env var controls the returned value. No kernels are run.
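
A test of that shape might look roughly like the sketch below: it only checks that a query function follows the environment variable and launches no GPU work. query_allow_tf32 is a hypothetical stand-in for whatever the real test calls, and the "1"-enables rule and the unset-means-disabled case are assumptions of this sketch, not behavior confirmed in this thread.

```python
import os
import unittest
from unittest import mock

ENV_VAR = "HIPBLASLT_ALLOW_TF32"


def query_allow_tf32() -> bool:
    # Hypothetical stand-in for the real query; assumes "1" enables the flag.
    return os.environ.get(ENV_VAR) == "1"


class EnvControlsFlagTest(unittest.TestCase):
    def test_flag_follows_env(self):
        # patch.dict restores os.environ when each block exits.
        with mock.patch.dict(os.environ, {ENV_VAR: "1"}):
            self.assertTrue(query_allow_tf32())
        with mock.patch.dict(os.environ, {ENV_VAR: "0"}):
            self.assertFalse(query_allow_tf32())

    def test_unset_env_disables_flag(self):
        with mock.patch.dict(os.environ):
            os.environ.pop(ENV_VAR, None)
            self.assertFalse(query_allow_tf32())


def run_suite() -> unittest.TestResult:
    """Run the two tests above and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(EnvControlsFlagTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because nothing here depends on the GPU architecture, a test like this needs no architecture-specific decorator.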

@jeffdaily
Collaborator

@pytorch merge -f "unrelated rocm ci failures"

@jeffdaily
Collaborator

@pytorchbot merge -f "unrelated rocm ci failures"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

aditew01 pushed a commit that referenced this pull request Feb 28, 2025
Because the environment variable HIPBLASLT_ALLOW_TF32 can change at runtime, remove the static specifier from the allow_tf32 variable so that it is re-evaluated on each call and reflects the current value of HIPBLASLT_ALLOW_TF32.

Pull Request resolved: #147186
Approved by: https://github.com/jeffdaily, https://github.com/naromero77amd

Labels

ciflow/rocm Trigger "default" config CI on ROCm Merged module: rocm AMD GPU support for Pytorch open source topic: not user facing topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

8 participants