Added AMD GPU Support to Zeus #57

Merged
merged 30 commits into from
May 2, 2024

Conversation

parthraut
Collaborator

Added AMD GPU support to Zeus. This involved adding method implementations for the AMDGPU class in zeus.device.gpu.

@jaywonchung
Member

Thanks for your work. Please resolve merge conflicts and make CI pass, and then request review.

@parthraut
Collaborator Author

@jaywonchung I fixed all merge conflicts and all tests are passing; ready for review.

@parthraut
Collaborator Author

@jaywonchung I went through and retested each method, and fixed any issues. It should be all correct now.

info = amdsmi.amdsmi_get_power_cap_info(self.handle)  # Returns in W
amdsmi.amdsmi_set_power_cap(
    self.handle, 0, cap=int(info["default_power_cap"] * 1e6)
)  # expects value in microwatts
Member

@jaywonchung jaywonchung May 2, 2024

[Screenshot]

Holy shit... ROCm 5.7 returns the value in microwatts, but 6.0 returns it in watts. Is everything based on 5.7? If so, let's keep it and figure out later how to make it work for 6.0.

Collaborator Author

image

This is what I see; I was going off the source code for ROCm 6.0, so it should be correct for ROCm 6.0.

As for PyTorch, it looks like it just got full support for ROCm 6.0 with the release of PyTorch 2.3 a week ago.

Should we stick to ROCm 6.0 then?

Member

Yep, if everything is consistent with 6.0, let's keep it that way!
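
For a later version that needs to handle both behaviors discussed in this thread, one possible approach is to normalize the reported default power cap to microwatts before passing it to the setter. The sketch below is only an illustration: `PowerUnit`, `reported_unit`, `to_microwatts`, and `default_cap_microwatts` are hypothetical names that are not part of Zeus or amdsmi, and the version-to-unit mapping is an assumption taken from the comments above (5.x reports microwatts, 6.x reports watts). It deliberately does not call amdsmi, so it runs without a GPU.

```python
from enum import Enum


class PowerUnit(Enum):
    """Unit of a reported power value; the enum value is microwatts per unit."""

    WATTS = 1_000_000
    MICROWATTS = 1


def reported_unit(rocm_major: int) -> PowerUnit:
    """Guess the unit of the reported default power cap.

    Assumption from the thread above: ROCm 5.x reports microwatts,
    ROCm 6.x reports watts. Verify against the installed version.
    """
    return PowerUnit.WATTS if rocm_major >= 6 else PowerUnit.MICROWATTS


def to_microwatts(value: float, unit: PowerUnit) -> int:
    """Convert a power value to integer microwatts.

    round() is used instead of int() so that floating-point error
    does not truncate the result downward.
    """
    return round(value * unit.value)


def default_cap_microwatts(info: dict, rocm_major: int) -> int:
    """Return the default power cap in microwatts.

    `info` is assumed to be a dict with a "default_power_cap" key,
    as in the snippet under review.
    """
    return to_microwatts(info["default_power_cap"], reported_unit(rocm_major))
```

With such a helper, the `cap=` argument in the reviewed snippet could be computed once and stay correct under either unit convention, as long as the version check itself is reliable.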

Member

@jaywonchung jaywonchung left a comment

LGTM, thank you for your wonderful work!

@jaywonchung jaywonchung merged commit cda3a3e into master May 2, 2024
1 check passed
@jaywonchung jaywonchung deleted the amd_support branch May 2, 2024 00:54