
ROCm support #252

Open
88Ocelot wants to merge 4 commits into main
Conversation

88Ocelot

Initial support for ROCm

88Ocelot changed the title from Feature/rocm to ROCm support on Dec 25, 2023
@olegklimov
Contributor

Oh I see, you wrote a Dockerfile!

We have no way to test it ourselves, because we have no AMD GPUs, but maybe we can set up the build process and someone can test it.

pip install deepspeed

@88Ocelot does this mean deepspeed will be installed at first launch, and there's currently no way to install it in the Dockerfile?

That's a super nice contribution, @88Ocelot!
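
For reference, a minimal sketch of how deepspeed could be baked into the image instead of being installed at first launch (just an assumption based on the PR's rocm.Dockerfile, untested on our side since we have no AMD hardware):

    # hypothetical addition to rocm.Dockerfile: install deepspeed at build time
    # so the first container start does not have to download and build it
    RUN pip install --no-cache-dir deepspeed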

olegklimov mentioned this pull request on Dec 31, 2023
        && python -m self_hosting_machinery.watchdog.docker_watchdog'
    image: refact_self_hosting_rocm
    build:
      dockerfile: rocm.Dockerfile

@takov751 Jan 1, 2024


    build:
+      context: .
      dockerfile: rocm.Dockerfile

This was the only issue I found with this build so far :D I am testing it right now, just waiting for the models to download.
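
For anyone else trying this, a minimal sketch of what the service looks like with the suggested context line applied, plus the usual ROCm device passthrough (the service name and the devices section are my assumption, they are not part of this PR):

    refact_self_hosted:
      image: refact_self_hosting_rocm
      build:
        context: .
        dockerfile: rocm.Dockerfile
      # ROCm containers normally need the kernel driver and DRI devices passed through
      devices:
        - /dev/kfd
        - /dev/dri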

After some building and testing I have encountered a big issue:

refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL STATUS loading model
refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL loading model local_files_only=1
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
refact_self_hosted_1  | -- 11 -- 1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 2. You are using pytorch without CUDA support.
refact_self_hosted_1  | -- 11 -- 3. CUDA and nvcc are not installed in your device.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL lm_head not been quantized, will be ignored when make_quant.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA extension not installed.

After some testing today I can say that sadly we need to wait a bit longer to make this happen. For example, flash_attention will probably only work from ROCm 5.7 once it gets a stable release. I saw that you tried some workarounds, but I believe they did not work due to ROCm library differences.

So far, even when it built and started, most of the time I just got a timeout error and the model was not loaded properly.
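
In case it helps narrow things down, one quick check (just an assumption on my side, not something from this PR) is whether the torch build inside the container is actually the ROCm/HIP one and can see the GPU:

    # run inside the container; torch.version.hip is only set on ROCm builds
    python -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"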
