Skip to content

Conversation

@rraminen
Copy link
Contributor

This PR contains the following:

  • Revert the workaround implemented for HIP Cooperative Groups.

  • Transformer kernels now use HIP Cooperative Groups APIs.

  • The condition " CUDA_ARCH >= 700 || defined(HIP_PLATFORM_HCC)" is added to enable the code on CUDA and ROCm devices

CC: @jithunnair-amd

@jithunnair-amd
Copy link
Contributor

@jeffra Just to confirm, does the ROCm CI job use the latest ROCm5.0 docker images with the patched hipify logic for JIT extensions?

@jeffra
Copy link
Collaborator

jeffra commented Mar 11, 2022

@jeffra Just to confirm, does the ROCm CI job use the latest ROCm5.0 docker images with the patched hipify logic for JIT extensions?

Yes it does. However our AMD CI runners are currently experiencing issues. Are you able to test this on your side with this image? I previously saw Transformer build errors without the CG patches, so hopefully quick to test.

@jithunnair-amd
Copy link
Contributor

Okay, I'll test with the rocm/pytorch:rocm5.0.1_ubuntu18.04_py3.7_pytorch_1.10.0 docker image and update with my findings.

@jithunnair-amd
Copy link
Contributor

@jeffra Does the CI job use a docker image which already has the hacked CG headers copied to /opt/rocm/? That might explain why the CI job shows errors such as:

/opt/conda/lib/python3.7/site-packages/deepspeed/ops/csrc/transformer/general_kernels.hip:16:9: error: no member named 'thread_block_tile' in namespace 'cooperative_groups'
    cg::thread_block_tile<TILE_DIM> g = cg::tiled_partition<TILE_DIM>(b);
    ~~~~^

whereas when I tested with the rocm/pytorch:rocm5.0.1_ubuntu18.04_py3.7_pytorch_1.10.0 docker image, without copying the hacked CG headers, I was able to

  1. build deepspeed successfully using pip install .[dev,1bit,autotuning] (as in the CI job), and
  2. run the unit test successfully using pytest tests/unit/test_cuda_forward.py -k test_forward_with_small_bsz[8-3-1024-512-16-3-False-False]
root@d2e5a4e9d3d7:/data/DeepSpeed# pytest tests/unit/test_cuda_forward.py -k test_forward_with_small_bsz[8-3-1024-512-16-3-False-False] 2>&1 | tee test.log
============================= test session starts ==============================
platform linux -- Python 3.7.11, pytest-7.0.1, pluggy-1.0.0
Using --randomly-seed=1431581554
rootdir: /data/DeepSpeed
plugins: hypothesis-4.53.2, forked-1.4.0, randomly-3.11.0, xdist-2.5.0
collected 40 items / 39 deselected / 1 selected

tests/unit/test_cuda_forward.py .                                        [100%]

=============================== warnings summary ===============================
tests/unit/test_cuda_forward.py:196
  /data/DeepSpeed/tests/unit/test_cuda_forward.py:196: PytestUnknownMarkWarning: Unknown pytest.mark.sequential - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.sequential

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 1 passed, 39 deselected, 1 warning in 35.04s =================

@jeffra
Copy link
Collaborator

jeffra commented Mar 11, 2022

@jithunnair-amd, that’s excellent! I’ll retest this later tonight outside our CI since it’s down right now. Sounds very promising though.

@jithunnair-amd
Copy link
Contributor

@rraminen We should also include the following in this PR:

  1. git rm the hacked CG headers in https://github.com/rraminen/DeepSpeed/tree/enable_ROCm_CG_headers/csrc/includes/patch/hip/hcc_detail, and
  2. update the Dockerfile.rocm to remove the copying of the hacked CG headers: https://github.com/rraminen/DeepSpeed/blob/enable_ROCm_CG_headers/docker/Dockerfile.rocm#L7

@jeffra
Copy link
Collaborator

jeffra commented Mar 11, 2022

Confirmed passing on my side, thank you for the quick fix @rraminen and @jithunnair-amd! I agree, let's remove the CG references in the dockerfile and csrc. Also, can you run our formatter? See: https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#prerequisites

@rraminen rraminen force-pushed the enable_ROCm_CG_headers branch from d5bd899 to 9da89a8 Compare March 11, 2022 16:38
@rraminen
Copy link
Contributor Author

Hi @jithunnair-amd, I have added the suggested changes to this PR.

  1. git rm the hacked CG headers patch file
  2. updated the Dockerfile.rocm to remove the copying of the hacked CG headers

@rraminen rraminen force-pushed the enable_ROCm_CG_headers branch from d5bfac0 to 138f687 Compare March 11, 2022 17:11
@jithunnair-amd
Copy link
Contributor

@jeffra I think we are done with changes from our end. I'm not sure if the AMD CI issues are resolved, so you can merge as appropriate.

@jeffra jeffra enabled auto-merge (squash) March 11, 2022 20:06
@jeffra jeffra disabled auto-merge March 11, 2022 20:06
@jeffra jeffra merged commit 7bcb4fa into deepspeedai:master Mar 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants