Build for hip gpu backends #392
One more small thing: it would be really good to run shellcheck before merging this, since we're modifying bash files. We should probably add a CI workflow for this later too.
Good idea. #432
The CI is currently broken because the Python setup.py script reads the config.linux script as a string and attempts to parse the values set in it, so the config can't be treated as a regular shell script. We added some standard bash conventions to the config script so that it could have default values and read overrides from the environment, which breaks the ad hoc variable parsing. This will need to be fixed before merging.
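To make the failure mode concrete, here is a minimal sketch (not the project's actual setup.py code) of why text-based parsing breaks once the config uses a bash default-with-override expansion, and one possible alternative: letting bash evaluate the script and reporting the resulting values. The variable name `EXAMPLE_BACKEND`, the config contents, and both helper functions are hypothetical and exist only for illustration.

```python
import os
import re
import shutil
import subprocess
import tempfile

# Hypothetical config line using the bash "default unless overridden" idiom.
CONFIG_TEXT = "EXAMPLE_BACKEND=${EXAMPLE_BACKEND:-cuda}\n"


def parse_adhoc(text):
    """Naive NAME=value parsing that treats the shell script as plain text."""
    return dict(re.findall(r"^(\w+)=(.*)$", text, flags=re.MULTILINE))


def read_via_bash(path, names, env=None):
    """Source the script in bash, then print the requested variables."""
    script = (
        'source "$1"; shift; '
        'for n in "$@"; do printf "%s=%s\\n" "$n" "${!n}"; done'
    )
    out = subprocess.run(
        ["bash", "-c", script, "bash", path, *names],
        check=True, capture_output=True, text=True, env=env,
    ).stdout
    return dict(line.split("=", 1) for line in out.splitlines())


def demo():
    """Compare both approaches on the hypothetical config (requires bash)."""
    # The ad hoc parser sees the unexpanded expression, not a usable value.
    print(parse_adhoc(CONFIG_TEXT))
    if shutil.which("bash") is None:
        return
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(CONFIG_TEXT)
    try:
        env = dict(os.environ, EXAMPLE_BACKEND="hip_rocm")
        print(read_via_bash(f.name, ["EXAMPLE_BACKEND"], env))
    finally:
        os.unlink(f.name)
```

Calling `demo()` prints the raw expansion string from the ad hoc parser, followed by the value bash resolves when the override is set in the environment. The bash-based helper assumes values contain no newlines.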
Regarding the CI, if we cannot find an AMD GPU for the CI, we can test the code with
do not call sed to manually change config script clone submodules in docker job
…ead of parsing it adhoc
Hi @williamberman. I am trying to build the project using the method you gave, but I get an error saying that kfd cannot be accessed due to insufficient permissions. This may also be why the Legion compilation error is reported. My guess is that docker does not have sufficient permissions to access kfd during the compilation process. Can you give me some advice? Thank you.
Hey @GYDmedwin, I might need some additional information here :) I take it this is happening during the docker container build? What's the full error, and where does it occur in the build process? Some more console output would be helpful. It sounds like you're running on a machine with an actual AMD GPU, if I'm reading your message correctly. Fwiw, we're mainly merging this with the intention of targeting AMD, but we did not run it on an actual machine with an AMD GPU due to the particulars involved in finding one that supports ROCm.
@williamberman Thank you for your reply. Yes, I tested this on a machine with an actual AMD GPU (MI100), and the errors occur during the docker container build. The main error's details are as follows:
`[ 10%] Building CXX object deps/legion/runtime/CMakeFiles/RealmRuntime.dir/realm/numa/numasysif.cc.o`
`[ 38%] Building CXX object deps/legion/runtime/CMakeFiles/RealmRuntime.dir/realm/deppart/image_4_5.cc.o`
This error occurred in several places with the same content, so I only showed three. I guess this error caused the Legion compilation to fail, which in turn caused the FlexFlow compilation to fail. Another error message's details are as follows:
Ok great, thank you for the extra details! I opened an issue here: #457. I think your best immediate bet would be building with the Makefiles on your standard system (not in docker).
Issue: #345
Testing the build
Current status
hip_rocm builds e2e with a few changes to legion and flexflow source
Small source modifications for build
Misc small changes to the source to get the build working. These should be ok to merge.
Move tools to top level directory
We glob for files under src to get the source files for the flexflow
target. Moving tools to the top level directory prevents the tools
source files from accidentally being added to the flexflow target
source files.
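The effect of the move can be illustrated with a small sketch. This is a Python analogue of a recursive source glob, not the project's CMake code, and the file names and both helper functions are hypothetical.

```python
import tempfile
from pathlib import Path


def make_tree(root, files):
    """Create empty files at the given paths relative to root."""
    for rel in files:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()


def glob_sources(root):
    """Return every .cc file under root/src, relative to root, sorted."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in (root / "src").rglob("*.cc")
    )


def demo():
    """Show the globbed sources before and after moving tools out of src."""
    layouts = {
        "tools under src": ["src/runtime/model.cc", "src/tools/dot_tool.cc"],
        "tools at top level": ["src/runtime/model.cc", "tools/dot_tool.cc"],
    }
    for label, files in layouts.items():
        with tempfile.TemporaryDirectory() as root:
            make_tree(root, files)
            print(label, glob_sources(root))
```

With tools nested under src, the recursive glob also returns the tool's source file. With tools at the top level, only the library sources are matched.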
Change substitution_to_dot from cuda_add_executable to add_executable.
When building with hip_rocm, we don't have cuda available and shouldn't
need to build with it for substitution_to_dot as the target does
Remaining:
Misc
An additional note on the legion change: I don't know whether the const_cast that removes the volatile qualifier is sound in that context. I mainly added it to get legion compiling with the changed build config.