Question: Can `DeepSpeedCPUAdam` be used as a drop in replacement to `torch.optim.Adam`? #479

ofirzaf · 2020-10-21T12:30:40Z

Hi,

I want to use the DeepSpeedCPUAdam instead of torch.optim.Adam for reducing the RAM usage of my GPUs while training.
I was wondering if DeepSpeedCPUAdam can be just dropped in instead of torch.optim.Adam or additional steps are needed?
I tried to do exactly that and I got a segmentation fault

Thanks

The text was updated successfully, but these errors were encountered:

tjruwase · 2020-10-21T20:00:13Z

@ofirzaf Thanks for using DeepSpeed.

Yes, you can use DeepSpeedCPUAdam as drop-in replacement for torch.optim.Adam, that will give up to a 6X throughput boost. Let us know of any issues.

ofirzaf · 2020-10-22T01:26:12Z

I tried to make it work with mnist example pytorch published: https://pastebin.pl/view/embed/d3921ab0
When I try to run it I get:

Adam Optimizer #0 is created with AVX2 arithmetic capability.
group 0 param 0 = 288
Segmentation fault (core dumped)

I am running with pytorch 1.6, cuda 10.1, deepspeed installed from master and v0.3 using DS_BUILD_CPU_ADAM=1 install.sh
Also tried to use your docker image to check if the problem is in my installation but the image doesn't have cpu adam installed, got ModuleNotFoundError on cpu_adam_op

RezaYazdaniAminabadi · 2020-10-22T19:01:29Z

Hi @ofirzaf

It seems that CPUAdam is instantiated correctly. However, it has crashed when trying to access the parameters on CPU memory. I see in your code, that the model is converted in line 124, and can be sent over to the gpu memory if the use_cuda flag is set in line 98. CPUAdam requires the parameter (master copy in FP32) and the optimizer states such as variance and momentum reside on CPU memory. Could you please check if such requirement is met in your test?

Thanks,
Reza

ofirzaf · 2020-10-27T23:58:52Z

I initialize the optimizer while the model's parameters which I want to optimize are passed to the GPU when I move the entire model to the GPU. I am not sure how should I keep the parameters and other stats on CPU while still training on GPU.
Maybe I misunderstood and this optimizer should only be used when training using CPU only?

RezaYazdaniAminabadi · 2020-10-28T23:16:32Z

Hi @ofirzaf ,

In this case, you probably want to run your code through DeepSpeed and turning on the ZeRO-Offload feature. You can find more information on ZeRO-offload here: https://www.deepspeed.ai/tutorials/zero-offload/. Also, to get started with DeepSpeed, please check out this tutorial: https://www.deepspeed.ai/getting-started/
Thanks.
Reza

tjruwase · 2020-10-29T05:34:59Z

@ofirzaf, yes you are right, DeepSpeedCPUAdam is only used for running optimizer on CPU only. I apologize that my initial response was confusing because I did not clarify that while torch.optim.Adam works correctly on both GPU and CPU, DeepSpeedCPUAdam only works on CPU. So it is only on CPU execution that DeepSpeedCPUAdam is a drop in replacement for torch.optim.Adam.

tjruwase · 2020-11-24T06:57:07Z

Closing for lack of activity. Please reopen as needed.

peterukk · 2021-08-05T13:20:11Z

So DeepSpeedCPUAdam should work without CUDA?

  File "...python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 37, in installed_cuda_version
    assert cuda_home is not None, "CUDA_HOME does not exist, unable to compile CUDA op(s)"

tjruwase · 2021-08-05T13:48:31Z

@peterukk, DeepSpeedCPUAdam will not work without CUDA (in theory it could). The reason is that DeepSpeedCPUAdam has a mode of execution where it also copies the updated parameters back to GPU using CUDA kernels. Do you have scenario where you want to use DeepSpeedCPUAdam outside CUDA environment? Depending on your answer can you please open a Question or reopen this one as appropriate? Thanks.

peterukk · 2021-08-05T14:23:08Z

@tjruwase thanks, I opened a new issue

* INT4 weight only quantization * pre commit * fix UT * fix UT * fix UT * fix UT * fix UT * fix UT * fix UT * add zero3 test * quantize small weight first to prevent oom * fold quantization config into ds_config * Fix license & refactor ds_config & rebase master * fix UT

* INT4 weight only quantization (#479) * INT4 weight only quantization * pre commit * fix UT * fix UT * fix UT * fix UT * fix UT * fix UT * fix UT * add zero3 test * quantize small weight first to prevent oom * fold quantization config into ds_config * Fix license & refactor ds_config & rebase master * fix UT * Moving quantization into post_init_method and add int4 dequantization kernel (#522) * Add experimental int4 dequantize kernel * move quantiation into post_init_method * fix * Refactor: move int4 code to deepspeed/inference (#528) * Move int 4 code to deepspeed/inference * fix * fix * fix * zero++ tutorial PR (#3783) * [Fix] _conv_flops_compute when padding is a str and stride=1 (#3169) * fix conv_flops_compute when padding is a str when stride=1 * fix error * change type of paddings to tuple * fix padding calculation * apply formatting check --------- Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> * fix interpolate flops compute (#3782) * use `Flops Profiler` to test `model.generate()` (#2515) * Update profiler.py * pre-commit run --all-files * Delete .DS_Store * Delete .DS_Store * Delete .DS_Store --------- Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Cheng Li <pistasable@gmail.com> * revert PR #3611 (#3786) * bump to 0.9.6 * ZeRO++ chinese blog (#3793) * zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format * remove staging trigger (#3792) * DeepSpeed-Triton for Inference (#3748) Co-authored-by: Stephen Youn <styoun@microsoft.com> Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org> Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Ethan Doe <yidoe@microsoft.com> Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * ZeRO++ (#3784) Co-authored-by: HeyangQin <heyangqin@microsoft.com> Co-authored-by: GuanhuaWang <alexwgh333@gmail.com> Co-authored-by: cmikeh2 <connorholmes@microsoft.com> Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> * adding zero++ to navigation panel of deepspeed.ai (#3796) * Add ZeRO++ Japanese blog (#3797) * zeropp chinese blog * try better quality images * make title larger * even larger... * various fix * center captions * more fixes * fix format * add ZeRO++ Japanese blog * add links --------- Co-authored-by: HeyangQin <heyangqin@microsoft.com> Co-authored-by: Conglong Li <conglong.li@gmail.com> * Bug Fixes for autotuner and flops profiler (#1880) * fix autotuner when backward is not called * fix format --------- Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> * Missing strided copy for gated MLP (#3788) Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> * Requires grad checking. (#3789) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * bump to 0.10.0 * Fix Bug in transform.cu (#3534) * Bug fix * Fixed formatting error --------- Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> * bug fix: triton importing error (#3799) Co-authored-by: Stephen Youn <styoun@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * Fix dequant bug * Address PR feedback * Use super() __exit__ * Fix unit tests --------- Co-authored-by: Donglin Zhuang <donglinzhuang@outlook.com> Co-authored-by: Heyang Qin <heyangqin@microsoft.com> Co-authored-by: Bill Luo <50068224+zhiruiluo@users.noreply.github.com> Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Guorun <84232793+CaffreyR@users.noreply.github.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: stephen youn <13525892+stephen-youn@users.noreply.github.com> Co-authored-by: Stephen Youn <styoun@microsoft.com> Co-authored-by: Arash Bakhtiari <arash@bakhtiari.org> Co-authored-by: Ethan Doe <yidoe@microsoft.com> Co-authored-by: yidoe <68296935+yidoe@users.noreply.github.com> Co-authored-by: GuanhuaWang <alexwgh333@gmail.com> Co-authored-by: cmikeh2 <connorholmes@microsoft.com> Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com> Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com> Co-authored-by: Conglong Li <conglong.li@gmail.com> Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com> Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com> Co-authored-by: Ramya Ramineni <62723901+rraminen@users.noreply.github.com>

tjruwase closed this as completed Nov 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question: Can `DeepSpeedCPUAdam` be used as a drop in replacement to `torch.optim.Adam`? #479

Question: Can `DeepSpeedCPUAdam` be used as a drop in replacement to `torch.optim.Adam`? #479

ofirzaf commented Oct 21, 2020

tjruwase commented Oct 21, 2020

ofirzaf commented Oct 22, 2020

RezaYazdaniAminabadi commented Oct 22, 2020

ofirzaf commented Oct 27, 2020

RezaYazdaniAminabadi commented Oct 28, 2020

tjruwase commented Oct 29, 2020 •

edited

Loading

tjruwase commented Nov 24, 2020

peterukk commented Aug 5, 2021

tjruwase commented Aug 5, 2021

peterukk commented Aug 5, 2021

Question: Can DeepSpeedCPUAdam be used as a drop in replacement to torch.optim.Adam? #479

Question: Can DeepSpeedCPUAdam be used as a drop in replacement to torch.optim.Adam? #479

Comments

ofirzaf commented Oct 21, 2020

tjruwase commented Oct 21, 2020

ofirzaf commented Oct 22, 2020

RezaYazdaniAminabadi commented Oct 22, 2020

ofirzaf commented Oct 27, 2020

RezaYazdaniAminabadi commented Oct 28, 2020

tjruwase commented Oct 29, 2020 • edited Loading

tjruwase commented Nov 24, 2020

peterukk commented Aug 5, 2021

tjruwase commented Aug 5, 2021

peterukk commented Aug 5, 2021

Question: Can `DeepSpeedCPUAdam` be used as a drop in replacement to `torch.optim.Adam`? #479

Question: Can `DeepSpeedCPUAdam` be used as a drop in replacement to `torch.optim.Adam`? #479

tjruwase commented Oct 29, 2020 •

edited

Loading