[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS? #2427
Comments
+1 DeepSpeed is nearly (if not entirely) impossible to install on Windows.
We hear you. Please try #2428.
Hi @n00mkrad and @d8ahazard, I wonder if you have any update on whether this PR solved the Windows installation issue?
Nope. Trying to run it in VS PowerShell:
Trying to run it in CMD:
Solved this by installing the Windows 10 SDK... but this is also precisely what I'm grumbling about. Even after getting it to compile, there's no /dist folder and no .whl file, despite the setup.py file clearly indicating this is what should happen. The .bat file is calling python setup.py bdist_wheel... yet we get a .egg-info file. If I edit the bat to call pip install setup.py, it gets really mad at me; can't find the error it throws ATM.

Like, within the app where I'm trying to use DeepSpeed, I can easily do a try: / import deepspeed to determine whether that dependency exists. Why can't the setup.py script do the same for ops that may be unavailable on Windoze?

Last: when I do finally jump through all the hoops and get setup.py to create something in the /build folder, I have to manually spoof the whl-info directory in order for accelerate to recognize it, and even then it refuses to load due to a missing module: "Distributed package doesn't have MPI built in. MPI is only included if you build PyTorch from source on a host that has MPI installed."
@tjruwase @RezaYazdaniAminabadi Hi
@d8ahazard, yes, DeepSpeed can work without MPI.
@tjruwase thanks ❤️ if we don't need MPI ...
Did Microsoft really consider Windows when developing this? When I start PyTorch, it forces linking a GPU with NCCL even though I'm training on CPU only. As we all know, NCCL can't fucking be used on Windows at all.
working with WSL 🎉
How did you resolve the error?
So it's still not working on Windows. WSL is not always an option depending on the use case.
@tjruwase I can't manage to run it on native Windows. 😭 And Ubuntu already comes with ...
@camenduru, can you share the log of the link error? Thanks!
@tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3 pytorch/pytorch#81642 (this one looks serious) 🥵
is this one necessary?
@camenduru for WSL2, is it passing pytest-3 tests/unit and the other tests? I got it compiled on WSL2 but it is failing almost every test due to NCCL issues. If you could provide details about your installation and whether you are passing the unit tests, that would be appreciated.
@Thomas-MMJ DeepSpeed was very slow with WSL2 and I deleted everything, sorry I can't help 😞 We need a working DeepSpeed on native Windows; maybe in 1 year, idk. Also, why are we putting a Linux KVM between the GPU and CPU? We'll lose ~5%, right?
I think the problem is that it is trying to build all the ops because of the following environment variable setting. Can you try setting that env var to zero?
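For anyone reading along, a minimal sketch of what that might look like in a Windows cmd session. The variable name DS_BUILD_OPS is an assumption, since the quoted setting was lost from the comment above:

```bat
REM Sketch: assuming the variable in question is DS_BUILD_OPS (DeepSpeed's flag
REM for pre-building all ops); with it set to 0, ops are built just-in-time
REM at runtime instead of being compiled up front during the install.
set DS_BUILD_OPS=0
pip install deepspeed
```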
Have you tried using ChatGPT-3 to solve it? One of the other requirements is Triton, and a Russian developer managed to build a working 2.0 version for Windows a couple of days ago; ChatGPT could likely find the other holes keeping it from building properly.
Well, if anyone feels like tinkering around with this, here's a .whl that installs DeepSpeed version 0.8.0 on Windows.
It'll throw c10d errors looking for NCCL (which is Linux-only) when turned on, but this is an issue with either accelerate or my computer, because I get the same error when trying to turn on any sort of distributed training at all on Windows. I don't know if I possess the coding knowledge to fix it, so I leave it up to y'all.
Oh, and it'll error out during accelerate config after you say no to the DeepSpeed JSON file question, but I got around this by replacing the accelerate config file on Windows with a config file I made in WSL (roughly as in the sketch below).
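For anyone trying the same workaround, a rough sketch of copying a WSL-generated config over the Windows one. The paths assume accelerate's default cache location and a distro named Ubuntu, and "your-wsl-user" is a placeholder:

```bat
REM Sketch: overwrite the Windows accelerate config with one generated inside WSL.
REM Both paths assume default locations and may differ on your machine.
copy \\wsl$\Ubuntu\home\your-wsl-user\.cache\huggingface\accelerate\default_config.yaml ^
     "%USERPROFILE%\.cache\huggingface\accelerate\default_config.yaml"
```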
I must point out that those wheel links redirect to ...
Wait, so DeepSpeed is a Microsoft project, and it can't be used on Windows?
Not without compiling it yourself, sacrificing three chickens to the dark lord Cthulhu, and playing "Hit me baby one more time" in reverse.
Oh no 😐 I was playing the wrong song.
So, on Windows 10, when I do:
When I set DS_BUILD_AIO=0, I get a bunch of "lscpu command is not available" messages. I suppose for now it's not getting any better with DS_BUILD_SPARSE_ATTN=0 either:
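A sketch of the combination being described, for anyone trying to reproduce it; the flag names come from this thread, everything else is assumption:

```bat
REM Sketch: disable the async I/O and sparse-attention op builds (the two flags
REM named above) in the same cmd session before attempting the install.
set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0
pip install deepspeed
```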
Same problem; there seems to be no way to solve it, but it works fine on Linux...
What if we all custom-build a branch supporting Windows? I'm honestly tired of so, so many things not being supported on Windows, not allowing me to work with certain packages. Unless we all keep bugging Microsoft about it, they won't really support it on Windows; not sure why, though. I can only assume it's something about backwards compatibility and trying to make it work on Win 95.
(Note: these steps are for the inference-only mode)
To install the generated .whl, just use:
Extra notes: about the replacement of the pt_binding.cpp file, all I did was change lines 531, 532, 539, and 540. New lines 539 and 540:
For anyone who just wants the final .whl to install using Python, here it is (no prayers needed):
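Since the exact install command was lost in formatting, here is a sketch of installing a locally built wheel; the filename below is a placeholder for whatever actually lands in the dist folder:

```bat
REM Sketch: install the wheel produced by the build. The filename is illustrative;
REM substitute the .whl that actually appears under .\dist after a successful build.
pip install dist\deepspeed-X.Y.Z-cp310-cp310-win_amd64.whl
```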
The wheels worked for me with PyTorch 1.13.1, CUDA 11.7, and Python 3.10.9. Thank you. Although, when running a command like
Windows tries to open deepspeed with an application and asks what app it should use to open it. But when importing and running DeepSpeed code in Python, it works.
Thank you for the method you provided, but it doesn't work for me with v0.9.2 (Win10 + Python 3.10 + VS2019). Could you please provide a solution or a .whl file for v0.9.2?
Does DeepSpeed training work with WSL2? I've been going around in circles and have heard 3 different things. I ran into my own errors while installing it on WSL2, but I don't know whether I should expect success with a few more hours of work or whether it's a hopeless cause. I'm also fine using a Docker container if that's what it takes; I just can't find a straightforward answer on whether training with DeepSpeed is reasonably expected to work on WSL2 at all.
Yeah, having the same problem. I thought that giving up and switching to WSL might solve it, but when running, it just fails with: "FAILED: custom_cuda_kernel.cuda.o".
DeepSpeed v0.11.1: patch release cloned from https://github.com/microsoft/DeepSpeed on 10-28-2023. Compiled for Windows, Torch 2.1.0 and CUDA 12.1. Shipped as a .rar because the .whl was slightly too big for github.com. Includes 4 fixes described here microsoft/DeepSpeed#2427 (comment) and 4 fixes in other places shown below. I know nothing about C++; I just asked ChatGPT to fix the errors.

diff --git a/build_win.bat b/build_win.bat
index ec8c8a36..f21d79cc 100644
--- a/build_win.bat
+++ b/build_win.bat
@@ -1,5 +1,10 @@
 @echo off
 
+REM begin-KAS
+set DS_BUILD_EVOFORMER_ATTN=0
+set DISTUTILS_USE_SDK=1
+REM end-KAS
+
 set DS_BUILD_AIO=0
 set DS_BUILD_SPARSE_ATTN=0
 
diff --git a/csrc/quantization/pt_binding.cpp b/csrc/quantization/pt_binding.cpp
index a4210897..12777603 100644
--- a/csrc/quantization/pt_binding.cpp
+++ b/csrc/quantization/pt_binding.cpp
@@ -241,11 +241,12 @@ std::vector<at::Tensor> quantized_reduction(at::Tensor& input_vals,
                                   .device(at::kCUDA)
                                   .requires_grad(false);
 
-    std::vector<long int> sz(input_vals.sizes().begin(), input_vals.sizes().end());
-    sz[sz.size() - 1] = sz.back() / devices_per_node;  // num of GPU per nodes
-    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
+    std::vector<int64_t> sz_vector(input_vals.sizes().begin(), input_vals.sizes().end());
+    sz_vector[sz_vector.size() - 1] = sz_vector.back() / devices_per_node;  // num of GPU per nodes
+    at::IntArrayRef sz(sz_vector);
     auto output = torch::empty(sz, output_options);
 
+    const int elems_per_in_tensor = at::numel(input_vals) / devices_per_node;
     const int elems_per_in_group = elems_per_in_tensor / (in_groups / devices_per_node);
     const int elems_per_out_group = elems_per_in_tensor / out_groups;
 
diff --git a/csrc/transformer/inference/csrc/pt_binding.cpp b/csrc/transformer/inference/csrc/pt_binding.cpp
index b7277d1e..a26eaa40 100644
--- a/csrc/transformer/inference/csrc/pt_binding.cpp
+++ b/csrc/transformer/inference/csrc/pt_binding.cpp
@@ -538,8 +538,8 @@ std::vector<at::Tensor> ds_softmax_context(at::Tensor& query_key_value,
     if (layer_id == num_layers - 1) InferenceContext::Instance().advance_tokens();
     auto prev_key = torch::from_blob(workspace + offset,
                                      {bsz, heads, all_tokens, k},
-                                     {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
-                                      k * InferenceContext::Instance().GetMaxTokenLength(),
+                                     {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
+                                      static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
                                       k,
                                       1},
                                      options);
@@ -547,8 +547,8 @@ std::vector<at::Tensor> ds_softmax_context(at::Tensor& query_key_value,
     auto prev_value =
         torch::from_blob(workspace + offset + value_offset,
                          {bsz, heads, all_tokens, k},
-                         {hidden_dim * InferenceContext::Instance().GetMaxTokenLength(),
-                          k * InferenceContext::Instance().GetMaxTokenLength(),
+                         {static_cast<unsigned>(hidden_dim * InferenceContext::Instance().GetMaxTokenLength()),
+                          static_cast<unsigned>(k * InferenceContext::Instance().GetMaxTokenLength()),
                           k,
                           1},
                          options);
@@ -1578,7 +1578,7 @@ std::vector<at::Tensor> ds_rms_mlp_gemm(at::Tensor& input,
     auto output = at::from_blob(output_ptr, input.sizes(), options);
     auto inp_norm = at::from_blob(inp_norm_ptr, input.sizes(), options);
     auto intermediate_gemm =
-        at::from_blob(intermediate_ptr, {input.size(0), input.size(1), mlp_1_out_neurons}, options);
+        at::from_blob(intermediate_ptr, {input.size(0), input.size(1), static_cast<int64_t>(mlp_1_out_neurons)}, options);
 
     auto act_func_type = static_cast<ActivationFuncType>(activation_type);
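A sketch of how one might apply the above rather than downloading the archive, assuming the diff is saved as windows-fixes.patch (an illustrative name) inside a DeepSpeed checkout:

```bat
REM Sketch: apply the patch above, then run the repo's Windows build script.
git apply windows-fixes.patch
build_win.bat
```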
I compiled DeepSpeed v0.11.1 for Windows, CUDA 12.1 [Python 3.10 + Torch 2.1.0+cu121]: pip install deepspeed-0.11.2+244040c1-cp310-cp310-win_amd64.whl. I had to use these settings:
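The list of settings was stripped from the comment above; a best-guess sketch based on the flags that appear elsewhere in this thread (treat anything beyond those four variables as an assumption):

```bat
REM Best guess at "these settings", reusing the flags from the patch earlier in
REM this thread; the original list was lost in formatting, so this is not a quote.
set DS_BUILD_EVOFORMER_ATTN=0
set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0
set DISTUTILS_USE_SDK=1
```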
This shit is still open!!!??? Bye bye, Microsoft.
Why the heck is this still open?
NOTE: Training will not work on Windows AT ALL, not even with WSL/WSL2, and not by running Linux in a Virtual Machine. Since my last post I learned some relevant things:
Fix #2427
Co-authored-by: Costin Eseanu <costineseanu@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Nice!
While "I get it"...I really don't get why this still doesn't even have BASIC Windows support.
It is published by Microsoft, right?
Compiling from source on Windoze doesn't actually seem to generate a .whl file that could be redistributed or anything (sketch of the expected step below).
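For context, a sketch of the build step being described and the output one would expect on a working setup:

```bat
REM Sketch: the standard wheel-building invocation; on a working setup this
REM should leave a redistributable .whl under .\dist, which is what fails to appear here.
python setup.py bdist_wheel
dir dist
```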
Pulling from pip throws any number of errors, from Adam not being supported because it requires 'lscpu', to just failing because libaio.so can't be found.
Meaning that, for the past several years, this M$-produced piece of software has been mostly useless on the OS they create.
This is one of the most annoying things about Python in general. "It's soooo cross-platform". Until you need a specific library, and realize it was really only ever developed for Linux users until someone threw a slug in the readme about how it MIGHT work with windows, but only if you do a hundred backflips while wearing a blue robe and sacrifice a chicken to Cthulhu.
Python does still support releasing different packages for different operating systems, right?
If that's still true, then it would be fantastic if someone out there could release a proper .whl to PyPI for us second-class Windoze users. I really don't feel like spending the next several hours trying to upgrade my instance of WSL2 to the right version that won't lose its mind if I try to use a specific amount of RAM...