
Installing DeepSpeed on Windows - The Correct Procedure / Solution / Solved #4734

Closed
erew123 opened this issue Nov 26, 2023 · 17 comments
Labels: enhancement (New feature or request)
erew123 (Contributor) commented Nov 26, 2023

This is not a feature request; it is actually how you install DeepSpeed on Windows. I have asked Microsoft to update their GitHub documentation, but in case they don't, here is my post to them: microsoft/DeepSpeed#4729

Please note that DeepSpeed on Windows has limitations compared to the Linux version.
Also note that other steps may be needed to get some parts of DeepSpeed working fully on Windows.

Other than using the instructions above, you can also install the Nvidia CUDA Toolkit, create a new Python 3.9.18 environment, set your CUDA_HOME environment variable in that environment, and install someone else's pre-built wheel file. I actually have both a CUDA 11.8 and a CUDA 12.1 wheel for Python 3.9.18.
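
As a rough sketch of that pre-flight setup (assuming a fresh Python 3.9.18 environment is already active; the CUDA path below is an example, not a value from this thread):

import os
import sys

# Confirm the interpreter is the 3.9.x the wheel targets.
assert sys.version_info[:2] == (3, 9), f"Expected Python 3.9.x, got {sys.version}"

# CUDA_HOME must point at the installed CUDA Toolkit; example path only.
cuda_home = os.environ.get("CUDA_HOME")
assert cuda_home, r"Set CUDA_HOME, e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
assert os.path.isdir(cuda_home), f"CUDA_HOME points to a missing folder: {cuda_home}"
print("Environment looks OK - now pip install the matching DeepSpeed wheel.")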

@oobabooga

erew123 added the enhancement (New feature or request) label Nov 26, 2023
erew123 (Contributor, Author) commented Nov 26, 2023

FYI, I'm working on some code (which will need some looking at), but I can confirm DeepSpeed loads on Windows, based on the instructions above! I'm using CUDA 12.x, which is giving me issues, but I'm looking into that.

[screenshot: DeepSpeed loading on Windows]

Another user is on CUDA 11.8 and seems to have it working fine: #4712 (comment)

rktvr commented Nov 27, 2023

Does it give a noticeable speed increase for inference?

erew123 (Contributor, Author) commented Nov 27, 2023

@rktvr Yes, I've been working on some code and giving it a go. I need to do a little more testing before I can say all is good, so I will confirm in a day or two with a bit more info.

On TTS, it's about 3-4x faster, by my current estimate.

On Windows there are limitations: it runs on Python 3.9.18, you have to install CUDA Toolkit 11.8 or 12 (depending on your CUDA version), plus a couple of other bits.

On Linux you just have to install CUDA.

Also, you need to change the CUDA_HOME environment variable, which Text-Generation-WebUI has already set, and I'm not sure whether this could have any other impacts.

My advice is DON'T go installing it just yet! You may not see any benefit anyway, because DeepSpeed needs to be implemented in the code that calls the TTS engine (see the sketch below).
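
For context, roughly what "implemented in the code that calls the TTS engine" looks like with the Coqui TTS XTTS API (the paths are placeholders; treat this as an illustration, not the exact extension code):

# Rough sketch: turning on DeepSpeed inference from the code that calls the TTS engine.
# Uses the coqui-ai/TTS XTTS API; the paths below are placeholders.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
# use_deepspeed=True routes inference through DeepSpeed's optimized kernels;
# without it, the model runs as plain PyTorch and you see no speed-up.
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()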

erew123 (Contributor, Author) commented Nov 28, 2023

@rktvr Please have a look here: #4712 (comment). I've got it working with Coqui_tts, and there is an example screenshot in that link.

S95Sedan commented Dec 7, 2023

> (quoting @erew123's Nov 27 comment above)

Version 0.11.1 + CUDA 12.1 + the stock ooba Python (3.11) install works fine as well. I also had a look at the 0.12.4 version, but it seems to have far more issues, as well as the added deepspeed-kernel dependency, which can't be compiled properly either (ironically, it is broken as well).

Attached is the compiled .whl I'm using right now, which should be working. env_report.py has a minor fix for Windows in there as well.
deepspeed-0.11.1+e9503fe-cp311-cp311-win_amd64.rar.zip

[screenshot: ds_report output for the 0.11.1 wheel]
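
A quick sanity check after installing a wheel like this one (the expected version string is inferred from the wheel's filename, so treat it as an assumption):

import deepspeed

# The wheel's filename suggests this prints something like "0.11.1+e9503fe".
print(deepspeed.__version__)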

Edit 07-12-2023:
It took some time to find everything and document it, but below are all the changes made to compile 0.11.2:

/build_win.bat:
add (this skips building the Evoformer attention op, which fails to compile on Windows):

set DS_BUILD_EVOFORMER_ATTN=0

/csrc/quantization/pt_binding.cpp - lines 244-250 - change to:

    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // split the last dim across the GPUs in the node
    at::IntArrayRef sz(sz_vector);
    auto output = torch::empty(sz, output_options);

    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
    const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
    const int elems_per_out_group = elems_per_in_tensor / out_groups;

/csrc/transformer/inference/csrc/pt_binding.cpp
lines 541-542 - change to:

									 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
									  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

lines 550-551 - change to:

						 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
						  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

line 1581 - change to:

		at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);

/deepspeed/env_report.py
line 10 - add:

import psutil

lines 83-100 - change to:

def get_shm_size():
    try:
        # Windows has no /dev/shm, so use the user's temp directory
        # and report its disk usage as a stand-in for shared memory.
        temp_dir = os.getenv('TEMP') or os.getenv('TMP') or os.path.join(os.path.expanduser('~'), 'tmp')
        shm_stats = psutil.disk_usage(temp_dir)
        shm_size = shm_stats.total
        shm_hbytes = human_readable_size(shm_size)
        warn = []
        if shm_size < 512 * 1024**2:  # warn below 512 MiB
            warn.append(
                f" {YELLOW} [WARNING] Shared memory size might be too small, consider increasing it. {END}"
            )
            # Add additional warnings specific to your use case if needed.
        return shm_hbytes, warn
    except Exception as e:
        return "UNKNOWN", [f"Error getting shared memory size: {e}"]

rktvr commented Dec 7, 2023

> (quoting @erew123's Nov 27 comment and @S95Sedan's Dec 7 comment above)

Thank you for the .whl, it works perfectly!

Generating over a minute of audio only takes ~15 seconds now, which is a massive improvement.

erew123 (Contributor, Author) commented Dec 11, 2023

If anyone is interested in a pack of 40+ voice files, this link should be live for about 6 days: https://filebin.net/t97nd69ac7qm2rsf

Also, I've now fully released the updated Coqui TTS extension: https://github.com/erew123/alltalk_tts

If you want to try it :)

jepjoo commented Dec 13, 2023

So Python 3.9 and 3.11 seem to be working, according to the discussion above.

I'm on Python 3.10; is there a way to install DeepSpeed?

erew123 (Contributor, Author) commented Dec 13, 2023

Allegedly/supposedly/possibly, this is a pre-built wheel file for Python 3.10: https://huggingface.co/Jmica/audiobook_maker/tree/main

It's the one that says cp310.

I have not tried it myself, and @Wuzzooy tried it and said it didn't work for him: #4712 (comment)

(I've not yet tried the 3.11 @S95Sedan method, as I've been too deep in other things.)

elkay commented Dec 14, 2023

S95Sedan's build worked fine for me. The one I built myself (successfully) was throwing this error:

ERROR: deepspeed-0.8.3+unknown-cp39-cp39-win_amd64.whl is not a supported wheel on this platform.

Wuzzooy commented Dec 14, 2023

> S95Sedan's build worked fine for me. The one I built myself (successfully) was throwing this error:
> ERROR: deepspeed-0.8.3+unknown-cp39-cp39-win_amd64.whl is not a supported wheel on this platform.

If you tried to install it in a Python 3.11 environment, that's expected: a cp39 wheel can only be installed on Python 3.9.
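
If in doubt, you can list the wheel tags your interpreter accepts; this uses the packaging library, the same machinery pip relies on:

# Print the first few wheel tags this interpreter can install.
# A cp39 wheel needs a cp39-* tag here, which a Python 3.11 environment won't have.
from packaging.tags import sys_tags

for tag in list(sys_tags())[:5]:
    print(tag)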

erew123 (Contributor, Author) commented Dec 14, 2023

@S95Sedan Just tested out your DeepSpeed 0.11.2 wheel file, and it works great!! Really nice job on that!

Are you OK with me including a reference to you having done this, linking to the file, and adding documentation in AllTalk (I will credit you with figuring this out)?

Thanks

erew123 (Contributor, Author) commented Dec 14, 2023

@S95Sedan If it is OK with you, I have published your instructions here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#deepspeed-112-for-windows--python-311

Happy to remove them and solely link to you (if you prefer), or remove your name (if you prefer), etc.

If you're OK with me using your name, I'd like to amend the instructions within AllTalk and also reference you at the bottom in the "Thanks" area of the documentation.

Let me know! And thanks for the info on this! :)

S95Sedan commented Dec 14, 2023

@erew123 Yeah, no worries; feel free to grab whatever you need to make it as complete as possible.

Edit 1: Your docs mention 0.11.2 as well, but the version uploaded here is 0.11.1.
Edit 2: I've corrected the instructions for creating the .whl and uploaded the compiled ones here; grab whatever info or direct links you need from there.

erew123 (Contributor, Author) commented Dec 15, 2023

@S95Sedan Thanks so much! That is awesome! :) I'll make sure you are thanked, referenced, and linked in AllTalk. It will be in its next update.

daswer123 commented Dec 15, 2023

@S95Sedan Thank you so much for the instructions.
I built DeepSpeed 0.11.2 for Python 3.10.x and CUDA 11.8.

If anyone is interested, you can get the pre-built wheel here:
https://github.com/daswer123/resemble-enhance-windows/releases/tag/deepspeed

[screenshot: ds_report output]

erew123 (Contributor, Author) commented Dec 18, 2023

Closing this ticket off, as I think the issue is put to bed now. Thanks to everyone who got involved!

erew123 closed this as completed Dec 18, 2023