Testing cmake gpu backend hip_rocm build on machine with AMD gpu #457
Comments
@williamberman Thanks for your advice! After testing on my machine, the problem with the kfd not being accessible was solved. I also ran into two new problems:
@GYDmedwin Thank you for following up with this, super helpful!
Very good catch, I can follow up with a fix.
For this, I am hoping it is because we pushed a new Legion commit and your legion submodule needs to be updated. If you are testing from flexflow tip, you can confirm this by checking whether your Legion submodule is at the latest commit. Once again, this is super helpful reporting, and please keep following up with us about any build issues you have!
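As a rough sketch (assuming the submodule lives at `deps/legion`, which is an assumption rather than something confirmed in this thread), checking and refreshing a Git submodule typically looks like:

```bash
# Show whether the checked-out submodule commit matches the one the
# superproject expects (a leading "+" means it is out of sync).
git submodule status deps/legion

# Pull the submodule forward to the commit recorded in the superproject.
git submodule update --init --recursive
```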
@lockshaw I was thinking that this was an unnecessary use of it. We have a few options:
I'm not sure to what extent flexflow is used with non-CUDA backends. If HIP is used very frequently and needs to be a first-class citizen, we could go all the way to option 3. If HIP is not used frequently and we mainly just want to make sure we can build with it when necessary, we could do option 1. Option 2 is a middle ground. Let me know what you think, and please tag anyone else who has additional context.
Great, you are right, it now compiles successfully on the real machine. Thank you so much!
@GYDmedwin Thank you for reporting the build issues!
I'd say option 1 is probably what we want to do for now, unless @eddy16112 or @jiazhihao disagree. Obviously option 3 would be nice to reach in the long term, but I don't know if it's currently high-priority enough to put resources toward. (@williamberman, can you create an issue for option 3?)
@lockshaw Sorry, I missed this. Agreed! Will do.
Summary of the status of this by @williamberman:
Based on Will's description, it sounds like this issue still exists, so I will remove the "out of date" flag.
Follow-up from #392
This build has been tested on a non-AMD machine targeting ROCm. It has been reported that, at least when using Docker, the build fails. We should test this build directly on an AMD machine and potentially integrate it as part of CI.
The AMD GPUs that AWS provides do not support ROCm.
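Since the AMD GPUs on AWS cannot run ROCm, the test would have to run on a machine with a supported card. A minimal sketch of what that local test could look like (the `FF_GPU_BACKEND` and `ROCM_PATH` variable names and the `/opt/rocm` prefix are assumptions based only on the backend name in the title; check the repo's config scripts for the authoritative flags):

```bash
# Configure an out-of-source build targeting the hip_rocm backend, then build.
mkdir -p build && cd build
cmake .. \
  -DFF_GPU_BACKEND=hip_rocm \
  -DROCM_PATH=/opt/rocm
make -j"$(nproc)"
```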