Enable CG headers on ROCm #1821
@jeffra Just to confirm, does the ROCm CI job use the latest ROCm 5.0 docker images with the patched hipify logic for JIT extensions?
Yes, it does. However, our AMD CI runners are currently experiencing issues. Are you able to test this on your side with this image? I previously saw Transformer build errors without the CG patches, so it should hopefully be quick to test.
Okay, I'll test with the |
@jeffra Does the CI job use a docker image which already has the hacked CG headers copied to whereas when I tested with the
@jithunnair-amd, that’s excellent! I’ll retest this later tonight outside our CI since it’s down right now. Sounds very promising though.
@rraminen We should also include the following in this PR:
Confirmed passing on my side, thank you for the quick fix @rraminen and @jithunnair-amd! I agree, let's remove the CG references in the dockerfile and csrc. Also, can you run our formatter? See: https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites
Hi @jithunnair-amd, I have added the suggested changes to this PR.
@jeffra I think we are done with changes from our end. I'm not sure if the AMD CI issues are resolved, so you can merge as appropriate.
This PR contains the following:
Reverts the workaround implemented for HIP Cooperative Groups.
Transformer kernels now use the HIP Cooperative Groups APIs.
The condition "__CUDA_ARCH__ >= 700 || defined(__HIP_PLATFORM_HCC__)" is added to enable the code on both CUDA and ROCm devices.
CC: @jithunnair-amd
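
For readers unfamiliar with the guard mentioned in the description, here is a minimal sketch of how such a condition selects a code path at compile time. The macro `EXAMPLE_CG_PATH_ENABLED` and the function `example_cg_path_enabled` are hypothetical names made up for illustration. They are not taken from the DeepSpeed sources, and the actual kernels guarded in this PR are not reproduced here.

```cpp
#include <cassert>

// Hypothetical macro name, used only for this illustration.
// An undefined __CUDA_ARCH__ evaluates to 0 inside #if, so the first clause
// is false in a plain host compile. The second clause is true whenever the
// compiler defines __HIP_PLATFORM_HCC__.
#if __CUDA_ARCH__ >= 700 || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_CG_PATH_ENABLED 1
#else
#define EXAMPLE_CG_PATH_ENABLED 0
#endif

// Reports whether this translation unit was compiled with the guarded path on.
constexpr bool example_cg_path_enabled()
{
    return EXAMPLE_CG_PATH_ENABLED != 0;
}
```

Compiled as ordinary host C++ with neither macro defined, `example_cg_path_enabled()` returns `false`. In a CUDA device pass targeting sm_70 or newer, or in a build where `__HIP_PLATFORM_HCC__` is defined, it would return `true`.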