
ROCm support #252

Open
88Ocelot wants to merge 4 commits into main
Conversation

88Ocelot

Initial support for ROCm

88Ocelot changed the title from Feature/rocm to ROCm support on Dec 25, 2023
@olegklimov
Contributor

Oh I see, you wrote a Dockerfile!

We have no way to test it ourselves, because we have no AMD GPUs, but maybe we can set up the build process and someone can test it.

pip install deepspeed

@88Ocelot does this mean deepspeed will be installed at first launch, and there's currently no way to install it in the Dockerfile?

That's a super nice contribution, @88Ocelot!
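
For reference, a minimal sketch of how deepspeed could be baked into the image instead of being installed at first launch (just an assumption based on the PR's rocm.Dockerfile, untested on our side since we have no AMD hardware):

    # hypothetical addition to rocm.Dockerfile: install deepspeed at build time
    # so the first container start does not have to download and build it
    RUN pip install --no-cache-dir deepspeed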

olegklimov mentioned this pull request on Dec 31, 2023
        && python -m self_hosting_machinery.watchdog.docker_watchdog'
    image: refact_self_hosting_rocm
    build:
      dockerfile: rocm.Dockerfile

@takov751 Jan 1, 2024


    build:
+      context: .
      dockerfile: rocm.Dockerfile

This was the only issue I found with this build so far :D I am testing it right now, just waiting for the models to download.
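
For anyone else trying this, a minimal sketch of what the service looks like with the suggested context line applied, plus the usual ROCm device passthrough (the service name and the devices section are my assumption, they are not part of this PR):

    refact_self_hosted:
      image: refact_self_hosting_rocm
      build:
        context: .
        dockerfile: rocm.Dockerfile
      # ROCm containers normally need the kernel driver and DRI devices passed through
      devices:
        - /dev/kfd
        - /dev/dri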

After some building and testing I have encountered a big issue:

refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL STATUS loading model
refact_self_hosted_1  | -- 11 -- 20240102 00:08:39 MODEL loading model local_files_only=1
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL Exllama kernel is not installed, reset disable_exllama to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:
refact_self_hosted_1  | -- 11 -- 1. You disabled CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
refact_self_hosted_1  | -- 11 -- 2. You are using pytorch without CUDA support.
refact_self_hosted_1  | -- 11 -- 3. CUDA and nvcc are not installed in your device.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL lm_head not been quantized, will be ignored when make_quant.
refact_self_hosted_1  | -- 11 -- 20240102 00:08:40 MODEL CUDA extension not installed.

After some testing today I can say that sadly we need to wait a bit longer to make this happen. For example, flash_attention will probably only work from ROCm 5.7 once it gets a stable release. I saw that you tried some workarounds, but I believe they did not work due to ROCm library differences.

So far, even when it built and started, most of the time I just got a timeout error and the model was not loaded properly.
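
In case it helps narrow things down, one quick check (just an assumption on my side, not something from this PR) is whether the torch build inside the container is actually the ROCm/HIP one and can see the GPU:

    # run inside the container; torch.version.hip is only set on ROCm builds
    python -c "import torch; print(torch.version.hip, torch.cuda.is_available(), torch.cuda.device_count())"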
