Conda Environment Install Issue #95
Comments
Hi there! We're in the process of rewriting our installation scripts (which were previously only used within Docker containers) and hope to release a conda package shortly. These sorts of issues should be fixed at that point.
The installation scripts perform sudo -H pip installs, which install system-wide. I replaced those with normal pip installs and it installed into the current environment without problems.
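As a rough illustration of the change described above, one way to make sure pip targets the active environment is to invoke it through the environment's own interpreter rather than through sudo. This is only a sketch: the helper name pip_install_command is hypothetical, and requirements.txt is an assumed file name, not something taken from install.sh.

```python
import sys

def pip_install_command(*args, python=sys.executable):
    """Build a pip command bound to a specific interpreter (hypothetical helper)."""
    # Running pip as `python -m pip` uses that interpreter's environment,
    # so packages go into its site-packages. No sudo is involved.
    return [python, "-m", "pip", "install", *args]

# Example: install from an assumed requirements file into the active environment.
cmd = pip_install_command("-r", "requirements.txt")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the active conda environment is the one you want to install into.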
When I installed it, I made that change in the install script (for DeepSpeed, Apex, and the requirements). However, DeepSpeed still would not install to the right environment location. Looking at the installation a little more, this seemed more likely to be an issue with the wheel created for DeepSpeed in the install.sh file. I was able to get it working by forcing pip to install DeepSpeed into the correct location (the same location that Apex was correctly installed to).
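The comment above does not say which pip option was used to force the location. One possibility, shown here purely as an assumption, is pip's --target flag pointed at the environment's site-packages directory. The helper names below are hypothetical.

```python
import sys
import sysconfig

def env_site_packages():
    """Return the site-packages directory of the running interpreter."""
    return sysconfig.get_paths()["purelib"]

def forced_install_command(package, target=None):
    """Build a pip command that installs `package` into an explicit directory.

    Assumption: --target is used here for illustration; the original
    commenter may have used a different mechanism.
    """
    target = target or env_site_packages()
    return [sys.executable, "-m", "pip", "install", "--target", target, package]

print(forced_install_command("deepspeed"))
```

Note that --target bypasses pip's normal bookkeeping for the environment, so it is better suited to a workaround than to a permanent setup.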
We now have a conda package uploaded and we appreciate any feedback! We have versions compiled for conda install deepspeed cudatoolkit=10.1 -c deepspeed -c pytorch -c conda-forge
The repo's
Using the conda install, deepspeed shows up when I run
Hi @kleingeo, thanks for the report. I can see that on my end now as well. Not sure what happened... I'm looking into it. Interestingly, the
Yes, I remember having this problem a lot when trying to install deepspeed normally with the install.sh file. With a normal Python virtual env it works, but for some reason with Conda it consistently tries to install to another location. The only thing I found to work (when using conda) was to force pip to use the location where the install.sh file installs Apex.
@ShadenSmith, it is easier to install deepspeed via your conda command than via 'install.sh' (which is prone to fail). The deepspeed channel only has an early version of deepspeed: conda search -f deepspeed -c deepspeed. When do you plan to release a new conda version of deepspeed with ZeRO-2? Thanks
Hi @jdongca2003, I have some time to dedicate to DeepSpeed's conda infrastructure now that the v0.2 release is complete. I'm looking at improved packages (per the above bug report) and automating the package build process.
@ShadenSmith Thanks. I tested your conda deepspeed package on https://github.com/microsoft/DeepSpeedExamples/tree/master/cifar. But it worked well on Tesla P4. Probably deepspeed does not support older GPU architectures.
On V100, the same error with THCudaChecker happens!
Hi @jdongca2003, @ShadenSmith, could you please explain why this happens? Does deepspeed not support Tesla K80? Thanks.
commit 7dc1f95d69a0231b7e880913fb6efa74193971f2 Author: Guo Yejun <yejun.guo@intel.com> Date: Tue Oct 18 15:43:37 2022 +0800 pretain_gpt2.py: use get_accelerator().synchronize() (#25)
Hi, closing this issue as it is stale with respect to CUDA/Torch/DeepSpeed versions. However, we now provide an environment.yml, located at the root of our repo, for ease of building in conda!
Trying to get DeepSpeed installed for local use with a Conda environment, but it seems that DeepSpeed is not installing to the environment itself. After building the wheel, DeepSpeed is not installing into the proper Conda environment location. Apex is installing in the proper environment location. Unclear why DeepSpeed is not working but Apex is.
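To check whether a package such as deepspeed or apex actually landed inside the active environment, something like the following could be run from within that environment. It is a minimal sketch with hypothetical helper names; it only compares the module's file location against sys.prefix and does not diagnose why a wheel went elsewhere.

```python
import importlib.util
import sys
from pathlib import Path

def module_location(name):
    """Return the resolved file path of an importable module, or None."""
    spec = importlib.util.find_spec(name)
    # Built-in, frozen, and namespace modules have no usable file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin).resolve()

def is_in_current_env(name):
    """True if the module's file lives under the active environment prefix."""
    location = module_location(name)
    if location is None:
        return False
    return Path(sys.prefix).resolve() in location.parents

for pkg in ("deepspeed", "apex"):
    print(pkg, module_location(pkg), is_in_current_env(pkg))
```

If the printed path for a package points outside the conda environment directory, or is None, the install went somewhere other than the active environment.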