
Installing DeepSpeed on Windows - The Correct Procedure / Solution / Solved #4734

Closed
erew123 opened this issue Nov 26, 2023 · 17 comments
Labels: enhancement (New feature or request)
erew123 (Contributor) commented Nov 26, 2023

This is not a feature request; it is actually how you install DeepSpeed on Windows. I have asked Microsoft to update their GitHub documentation, but in case they don't, here is my post to them: microsoft/DeepSpeed#4729

Please note that DeepSpeed on Windows has limitations compared to the Linux version.
Also note that other steps may be needed to get some parts of DeepSpeed working fully on Windows.

Other than using the instructions above, you can also install the Nvidia CUDA Toolkit, create a new Python 3.9.18 environment, set your CUDA_HOME environment variable in that environment, and install someone else's pre-built wheel file. I actually have both a CUDA 11.8 and a CUDA 12.1 wheel for Python 3.9.18.
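
As a rough sketch of that pre-flight setup (assuming a fresh Python 3.9.18 environment is already active; the CUDA path below is an example, not a value from this thread):

import os
import sys

# Confirm the interpreter is the 3.9.x the wheel targets.
assert sys.version_info[:2] == (3, 9), f"Expected Python 3.9.x, got {sys.version}"

# CUDA_HOME must point at the installed CUDA Toolkit; example path only.
cuda_home = os.environ.get("CUDA_HOME")
assert cuda_home, r"Set CUDA_HOME, e.g. C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
assert os.path.isdir(cuda_home), f"CUDA_HOME points to a missing folder: {cuda_home}"
print("Environment looks OK - now pip install the matching DeepSpeed wheel.")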

@oobabooga

erew123 added the enhancement (New feature or request) label Nov 26, 2023
erew123 (Contributor, Author) commented Nov 26, 2023

FYI, I'm working on some code (which will need some looking at), but I can confirm DeepSpeed loads on Windows, based on the instructions above! I'm using CUDA 12.x, which is giving me issues, but I'm looking into that.

[screenshot: DeepSpeed loading on Windows]

Another user is on CUDA 11.8 and seems to have it working fine: #4712 (comment)

rktvr commented Nov 27, 2023

Does it give a noticeable speed increase for inference?

erew123 (Contributor, Author) commented Nov 27, 2023

@rktvr Yes, I've been working on some code and giving it a go. I need to do a little more testing before I can say all is good, so I will confirm in a day or two with a bit more info.

On TTS, it's about 3-4x faster, by my current estimate.

On Windows there are limitations: it runs on Python 3.9.18, you have to install CUDA Toolkit 11.8 or 12 (depending on your CUDA version), plus a couple of other bits.

On Linux you just have to install CUDA.

Also, you need to change the CUDA_HOME environment variable, which Text-Generation-WebUI has already set, and I'm not sure whether this could have any other impacts.

My advice is DON'T go installing it just yet! You may not see any benefit anyway, because DeepSpeed needs to be implemented in the code that calls the TTS engine (see the sketch below).
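
For context, roughly what "implemented in the code that calls the TTS engine" looks like with the Coqui TTS XTTS API (the paths are placeholders; treat this as an illustration, not the exact extension code):

# Rough sketch: turning on DeepSpeed inference from the code that calls the TTS engine.
# Uses the coqui-ai/TTS XTTS API; the paths below are placeholders.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
# use_deepspeed=True routes inference through DeepSpeed's optimized kernels;
# without it, the model runs as plain PyTorch and you see no speed-up.
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", use_deepspeed=True)
model.cuda()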

erew123 (Contributor, Author) commented Nov 28, 2023

@rktvr Please have a look here: #4712 (comment). I've got it working with Coqui_tts, and there is an example screenshot in that link.

S95Sedan commented Dec 7, 2023

> (quoting @erew123's Nov 27 comment above)

Version 0.11.1 + CUDA 12.1 + the stock ooba Python (3.11) install works fine as well. I also had a look at the 0.12.4 version, but it seems to have far more issues, as well as the added deepspeed-kernel dependency, which can't be compiled properly either (ironically, it is broken as well).

Attached is the compiled .whl I'm using right now, which should be working. env_report.py has a minor fix for Windows in there as well.
deepspeed-0.11.1+e9503fe-cp311-cp311-win_amd64.rar.zip

[screenshot: ds_report output for the 0.11.1 wheel]
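
A quick sanity check after installing a wheel like this one (the expected version string is inferred from the wheel's filename, so treat it as an assumption):

import deepspeed

# The wheel's filename suggests this prints something like "0.11.1+e9503fe".
print(deepspeed.__version__)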

Edit 07-12-2023:
It took some time to find everything and document it, but below are all the changes made to compile 0.11.2:

/build_win.bat:
add (this skips building the Evoformer attention op, which fails to compile on Windows):

set DS_BUILD_EVOFORMER_ATTN=0

/csrc/quantization/pt_binding.cpp - lines 244-250 - change to:

    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // split the last dim across the GPUs in the node
    at::IntArrayRef sz(sz_vector);
    auto output = torch::empty(sz, output_options);

    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
    const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
    const int elems_per_out_group = elems_per_in_tensor / out_groups;

/csrc/transformer/inference/csrc/pt_binding.cpp
lines 541-542 - change to:

									 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
									  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

lines 550-551 - change to:

						 {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
						  static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),

line 1581 - change to:

		at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);

/deepspeed/env_report.py
line 10 - add:

import psutil

lines 83-100 - change to:

def get_shm_size():
    try:
        # Windows has no /dev/shm, so use the user's temp directory
        # and report its disk usage as a stand-in for shared memory.
        temp_dir = os.getenv('TEMP') or os.getenv('TMP') or os.path.join(os.path.expanduser('~'), 'tmp')
        shm_stats = psutil.disk_usage(temp_dir)
        shm_size = shm_stats.total
        shm_hbytes = human_readable_size(shm_size)
        warn = []
        if shm_size < 512 * 1024**2:  # warn below 512 MiB
            warn.append(
                f" {YELLOW} [WARNING] Shared memory size might be too small, consider increasing it. {END}"
            )
            # Add additional warnings specific to your use case if needed.
        return shm_hbytes, warn
    except Exception as e:
        return "UNKNOWN", [f"Error getting shared memory size: {e}"]

rktvr commented Dec 7, 2023

> (quoting @erew123's Nov 27 comment and @S95Sedan's Dec 7 comment above)

Thank you for the .whl, it works perfectly!

Generating over a minute of audio only takes ~15 seconds now, which is a massive improvement.

erew123 (Contributor, Author) commented Dec 11, 2023

If anyone is interested in a pack of 40+ voice files, this link should be live for about 6 days: https://filebin.net/t97nd69ac7qm2rsf

Also, I've now fully released the updated Coqui TTS extension: https://github.com/erew123/alltalk_tts

If you want to try it :)

jepjoo commented Dec 13, 2023

So Python 3.9 and 3.11 seem to be working, according to the discussion above.

I'm on Python 3.10; is there a way to install DeepSpeed?

erew123 (Contributor, Author) commented Dec 13, 2023

Allegedly/supposedly/possibly, this is a pre-built wheel file for Python 3.10: https://huggingface.co/Jmica/audiobook_maker/tree/main

It's the one that says cp310.

I have not tried it myself, and @Wuzzooy tried it and said it didn't work for him: #4712 (comment)

(I've not yet tried the 3.11 @S95Sedan method, as I've been too deep in other things.)

elkay commented Dec 14, 2023

S95Sedan's build worked fine for me. The one I built myself (successfully) was throwing this error:

ERROR: deepspeed-0.8.3+unknown-cp39-cp39-win_amd64.whl is not a supported wheel on this platform.

Wuzzooy commented Dec 14, 2023

> S95Sedan's build worked fine for me. The one I built myself (successfully) was throwing this error:
> ERROR: deepspeed-0.8.3+unknown-cp39-cp39-win_amd64.whl is not a supported wheel on this platform.

If you tried to install it in a Python 3.11 environment, that's expected: a cp39 wheel can only be installed on Python 3.9.
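
If in doubt, you can list the wheel tags your interpreter accepts; this uses the packaging library, the same machinery pip relies on:

# Print the first few wheel tags this interpreter can install.
# A cp39 wheel needs a cp39-* tag here, which a Python 3.11 environment won't have.
from packaging.tags import sys_tags

for tag in list(sys_tags())[:5]:
    print(tag)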

erew123 (Contributor, Author) commented Dec 14, 2023

@S95Sedan Just tested out your DeepSpeed 0.11.2 wheel file, and it works great!! Really nice job on that!

Are you OK with me including a reference to you having done this, linking to the file, and adding documentation in AllTalk (I will credit you with figuring this out)?

Thanks

erew123 (Contributor, Author) commented Dec 14, 2023

@S95Sedan If it is OK with you, I have published your instructions here: https://github.com/erew123/alltalk_tts?tab=readme-ov-file#deepspeed-112-for-windows--python-311

Happy to remove them and solely link to you (if you prefer), or remove your name (if you prefer), etc.

If you're OK with me using your name, I'd like to amend the instructions within AllTalk and also reference you at the bottom in the "Thanks" area of the documentation.

Let me know! And thanks for the info on this! :)

S95Sedan commented Dec 14, 2023

@erew123 Yeah, no worries; feel free to grab whatever you need to make it as complete as possible.

Edit 1: Your docs mention 0.11.2 as well, but the version uploaded here is 0.11.1.
Edit 2: I've corrected the instructions for creating the .whl and uploaded the compiled ones here; grab whatever info or direct links you need from there.

erew123 (Contributor, Author) commented Dec 15, 2023

@S95Sedan Thanks so much! That is awesome! :) I'll make sure you are thanked, referenced, and linked in AllTalk. It will be in its next update.

daswer123 commented Dec 15, 2023

@S95Sedan Thank you so much for the instructions.
I built DeepSpeed 0.11.2 for Python 3.10.x and CUDA 11.8.

If anyone is interested, you can get the pre-built wheel here:
https://github.com/daswer123/resemble-enhance-windows/releases/tag/deepspeed

[screenshot: ds_report output]

erew123 (Contributor, Author) commented Dec 18, 2023

Closing this ticket off, as I think the issue is put to bed now. Thanks to everyone who got involved!

erew123 closed this as completed Dec 18, 2023