
Allow args to be optional in deepspeed.initialize #825

Merged
7 commits merged from jeffra/args-local-rank into master on Mar 16, 2021

Conversation

jeffra (Collaborator) commented Mar 5, 2021

Before this PR, the args passed into deepspeed.initialize had to be an object with [get|set]attr support. This PR allows users to initialize DeepSpeed via deepspeed.initialize(model=model, config_params=config_dict). This assumes that local_rank is set as an environment variable (which is already done by the deepspeed/pytorch.distributed launchers and/or by deepspeed.init_distributed).

The above call is only possible with kwargs; positional arguments remain backwards compatible to support deepspeed.initialize(args, model). NOTE: this change does not require changes to existing code that uses deepspeed.initialize with args. Both call styles are sketched below.
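A minimal sketch of the two call styles, assuming a placeholder model and config dict (the argparse setup mirrors what the torch.distributed launcher passes; none of this is code from the PR itself):

```python
import argparse
import torch
import deepspeed

model = torch.nn.Linear(8, 8)           # placeholder model
config_dict = {"train_batch_size": 8}   # placeholder DeepSpeed config

# New kwargs-only style: no args object needed; LOCAL_RANK is read from
# the environment (set by the deepspeed/torch.distributed launchers or
# by deepspeed.init_distributed).
engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                               config_params=config_dict)

# Backwards-compatible positional style: args is any object with
# attribute access, e.g. an argparse.Namespace carrying local_rank.
parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()
engine, optimizer, _, _ = deepspeed.initialize(args, model,
                                               config_params=config_dict)
```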

This PR means Hugging Face and Lightning no longer have to pass a SimpleNamespace object to deepspeed.initialize, which was previously a hack to get DeepSpeed working.

For the Hugging Face integration, this means they can (optionally) remove the SimpleNamespace object used to carry the local rank, since deepspeed.init_distributed is already called prior to deepspeed.initialize, which ensures the local rank is set properly. https://github.com/huggingface/transformers/blob/256482ac9285c467fb97ca3b1b693a4de1d0ac60/src/transformers/integrations.py#L409-L414 /cc @stas00

For the Lightning integration, I don't believe it uses deepspeed.init_distributed for launching, but if it assumes the torch.distributed launcher then this is fine, see: https://github.com/pytorch/pytorch/blob/master/torch/distributed/launch.py#L253. In that case they can remove the SimpleNamespace here as well: https://github.com/PyTorchLightning/pytorch-lightning/blob/efda48faab666b78bb5f71bff4a0838f9b82dee5/pytorch_lightning/plugins/training_type/deepspeed.py#L243 /cc @SeanNaren

In either case, HF/Lightning could set (or assert) os.environ['LOCAL_RANK'] = str(self.local_rank) before calling deepspeed.initialize.
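A minimal sketch of that guard, assuming a hypothetical helper in the integration that receives the integration's own local_rank (the function name, model, and config are placeholders):

```python
import os

import deepspeed


def init_deepspeed(model, config_dict, local_rank):
    # Export the launcher's local rank before initialize(); environment
    # values must be strings.
    os.environ["LOCAL_RANK"] = str(local_rank)
    engine, optimizer, _, _ = deepspeed.initialize(model=model,
                                                   config_params=config_dict)
    return engine, optimizer
```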

jeffra changed the title from "only set args.local_rank if the attr exists already" to "Allow args to be optional in deepspeed.initialize" on Mar 5, 2021
stas00 (Collaborator) commented Mar 5, 2021

That is indeed much simpler. Thank you, @jeffra!

We will have to wait for your new release before implementing this on our side, so that we can require that version number.

But we probably need to wait for @cli99 to sort out the issue with the non-DeepSpeed optimizer/scheduler before the new release, if it's not too far off.

SeanNaren (Contributor) commented:

Nice changes @jeffra! Regarding Lightning not using deepspeed.init_distributed: I think it's because it uses auto_mpi discovery by default, which requires mpi4py to be installed, so we just do a torch.distributed init_process_group ourselves. I may be incorrect now, will check :)

jeffra (Collaborator, Author) commented Mar 16, 2021

> Nice changes @jeffra! Regarding Lightning not using deepspeed.init_distributed: I think it's because it uses auto_mpi discovery by default, which requires mpi4py to be installed, so we just do a torch.distributed init_process_group ourselves. I may be incorrect now, will check :)

Oh I see, interesting. deepspeed.init_distributed will only use the auto_mpi discovery code path (which requires mpi4py) if the training process(es) were launched without the deepspeed launcher or the torch distributed launcher.
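A hedged sketch of that distinction; the environment variables below are the ones the torch.distributed launcher exports, and the assert is illustrative rather than DeepSpeed's actual check:

```python
import os

import deepspeed

# If a launcher (deepspeed or torch.distributed) started this process,
# these variables are already set and the auto_mpi discovery path
# (and thus the mpi4py requirement) is never taken.
launcher_vars = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")
assert all(v in os.environ for v in launcher_vars), \
    "without launcher env vars, init_distributed falls back to MPI discovery"

deepspeed.init_distributed(dist_backend="nccl")
```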

Review thread on deepspeed/runtime/engine.py (resolved)
jeffra merged commit 871f304 into master on Mar 16, 2021
jeffra deleted the jeffra/args-local-rank branch on March 16, 2021
sdtblck added a commit to EleutherAI/DeeperSpeed that referenced this pull request Apr 6, 2021
* [WarmupDecayLR] fix log(0) & 1/log(1) bugs (microsoft#772)

* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>

* bump to v0.3.12

* Bug fix: Remove client optimizer param_group list item that does not have 'params' (microsoft#827)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [doc] pipeline doc typos/improvements (microsoft#659)

Admin merging for pure-doc PR that does not trigger build.

* Samyamr/inference hook fix (microsoft#851)

* Fix mis-aligned-grad

When a parameter is not divisible by the world size, the partitioned gradients are misaligned due to incorrect padding handling. This PR should fix that.

* Formatting fix

* Adding static_scale test back for Z3, and also changing hidden size to be not divisible by world_size

* also removing alignment from flat fp16 buffers

* Testing for hidden dim alignment

* inference hook fix

* Update stage3.py

* formatting

* [bug-fix] move params to gpu if offload params is turned off

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO Stage 2: Clear reduced gradients (microsoft#856)

* Ensure gradients of other partitions are cleared after reduction

* Remove redundant code

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [runner/launch] propagate the error (microsoft#854)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* docs: minor spelling tweaks (microsoft#858)

* Allow args to be optional in deepspeed.initialize (microsoft#825)

* Fix ZeRO3 save_checkpoint (microsoft#857)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Make config objects json serializable (microsoft#862)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump version 0.3.13

* 1-bit Adam v2 (microsoft#817)

Authors: @awan-10 @conglongli @samyam @jeffra

What's new:

NCCL-based implementation, which provides better performance and usability compared to the MPI-based implementation.
Add support for momentum masks for parameters with constant zero gradients during training.
Bug fixes (e.g., microsoft#813).

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (microsoft#594)

* NCCL based 1-bit Implementation + Refactor to add communication backends (microsoft#593)

* add nccl 1-bit optim.

* temporary commit to save stuff.

* Use dist collectives instead of mpi routines.

* remove old code for comm.

* Fix bugs. still does not work.

* modify to test the nccl side code path

* Initial gather impl. Works intra-node.

* Updates to comm. phase 2. nccl comm. passed the tests.

* refactor code to introduce nccl/mpi as backends for onebit adam.

* Refactor updates to test/engine.

* Fix compile/runtime errors.

* simplify support for nccl/mpi backends.

* Add missing file

* Add compression backend in constructor. Revert later.

* modify test with some perf counting.

* Implement a true non-blocking gather for nccl side.

* Revert "Add compression backend in constructor. Revert later."

This reverts commit df8c40d.

* improve the 1-bit adam test.

* Refactor comm. and compression backend in 1-bit adam.

* Fix the test.

* Fix runtime errors and typos in nccl backend

* fix mpi backend. modify tests.

* modify nccl perf test.

* fix mpi side errors.

* Add an mpi perf test

* Sync DSE.

* Remove old collectives file.

* Undo a typo.

* Graceful failure for torch versions that don't support nccl pt2pt.

* Revert "Merge branch 'master' into staging-1bit-nccl-v2"

This reverts commit 7840085, reversing
changes made to a6dba72.

* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""

This reverts commit 6dbdd98.

* comm optimization + 1-bit lamb

* Saving/debugging commit.

* finalizing 1-bit lamb

* finalizing 1-bit lamb

* add momentum mask and chkpt handling for 1-bit adam

* Cleanup and modify nccl test to be runnable with deepspeed launcher.

* Fix format.

* fix formatting again.

* make test runnable without mpi4py

* Add dist.alltoall and dist.allgather instead of custom functions.

* remove debug prints.

* formatting and renaming

* renaming

* renaming

* add unit test, fix existing tests

* skip unit test when torch < 1.8

* revert 1-bit lamb

* flatten momentum when dimension is more than 1

* add warning message for 1-bit adam under fp32

* improve version check

* add fp32 test

* 1-bit adam doc

* fix file name

* doc fix

* torch 1.8 is released

* doc fix

* fix tests

* update news

* add doc for momentum mask

* fix checkpoint handling, add unit test

* checkpoint handling doc

* doc final cleanup

* bump dates

* update tests

* url change

* doc fix

* fix test

* doc update

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* consistent checkpoint filenaming (microsoft#865)

* consistent checkpoint filenaming

* backward compatible rename

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* [doc] launcher (microsoft#868)

As discussed in microsoft#662 this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` cl arg in the correct place in the invocation script

Fixes: microsoft#662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [doc] pipeline (microsoft#888)

* [doc] pipeline

As @g-karthik flagged in microsoft#659 (comment) my previous correction PR had one sentence that said the wrong thing. So this PR attempts to rectify that. 

Thank you!

* tweak

* [debug utils] see_memory_usage fixes (microsoft#890)

* see_memory_usage fixes

* didn't expect pt-1.2

* fix the order of things

* fix the order of things

* full fp32 weights reconstruction for zero 2+3 (microsoft#892)

* save_fp16_model consolidated for zero3 (microsoft#893)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Fix zero stage2 cpu_offload when some model trainable parameters skipped in training (microsoft#861)

* Fix zero stage2 cpu_offload when some model trainable parameters are skipped in training, as in microsoft#707

When some trainable model parameters are skipped in training,
their backward hooks in self.create_reduce_and_remove_grad_hooks() will not run,
so they have no norm_for_param_grads.

* Trim space

* Trim space

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* update kramdown (microsoft#901)

security alert related to older kramdown version

* update backward api doc (microsoft#903)

* Bump kramdown from 2.3.0 to 2.3.1 in /docs (microsoft#905)

Bumps [kramdown](https://github.com/gettalong/kramdown) from 2.3.0 to 2.3.1.
- [Release notes](https://github.com/gettalong/kramdown/releases)
- [Changelog](https://github.com/gettalong/kramdown/blob/master/doc/news.page)
- [Commits](https://github.com/gettalong/kramdown/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* We're hiring! + integration posts

* [website] We're hiring! + integration posts

* [website] we're hiring!

* zero.Init() clarification (microsoft#880)

* zero.Init() clarification

clarify that if `model.half()` can't fit into gpu memory `zero.Init()` is a must.

this proposal is via @samyam's clarification shared elsewhere.

Thank you.

* style

* add clarity

* style

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* disable pipe test (microsoft#915)

This test has been giving us trouble for a bit, with nondeterministic failures; skipping for now so as not to break our CI. Need to revisit soon though.

* Add link to AML examples. (microsoft#916)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: brett koonce <koonce@gmail.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: hamlet <gvvvv@163.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sid <sidney.black@aleph-alpha.de>
sdtblck added a commit to EleutherAI/DeeperSpeed that referenced this pull request Apr 22, 2021
* test sparse self_attn fix

* [WarmupDecayLR] fix log(0) & 1/log(1) bugs (microsoft#772)

* fix log(0) & 1/log(1) bugs

* simplify

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>

* bump to v0.3.12

* Bug fix: Remove client optimizer param_group list item that does not have 'params' (microsoft#827)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [doc] pipeline doc typos/improvements (microsoft#659)

Admin merging for pure-doc PR that does not trigger build.

* Samyamr/inference hook fix (microsoft#851)

* Fix mis-aligned-grad

When a parameter is not divisible by the world size, the partitioned gradients are misaligned due to incorrect padding handling. This PR should fix that.

* Formatting fix

* Adding static_scale test back for Z3, and also changing hidden size to be not divisible by world_size

* also removing alignment from flat fp16 buffers

* Testing for hidden dim alignment

* inference hook fix

* Update stage3.py

* formatting

* [bug-fix] move params to gpu if offload params is turned off

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* ZeRO Stage 2: Clear reduced gradients (microsoft#856)

* Ensure gradients of other partitions are cleared after reduction

* Remove redundant code

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [runner/launch] propagate the error (microsoft#854)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* docs: minor spelling tweaks (microsoft#858)

* Allow args to be optional in deepspeed.initialize (microsoft#825)

* Fix ZeRO3 save_checkpoint (microsoft#857)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Make config objects json serializable (microsoft#862)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* bump version 0.3.13

* 1-bit Adam v2 (microsoft#817)

Authors: @awan-10 @conglongli @samyam @jeffra

What's new:

NCCL-based implementation, which provides better performance and usability compared to the MPI-based implementation.
Add support for momentum masks for parameters with constant zero gradients during training.
Bug fixes (e.g., microsoft#813).

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (microsoft#594)

* NCCL based 1-bit Implementation + Refactor to add communication backends (microsoft#593)

* add nccl 1-bit optim.

* temporary commit to save stuff.

* Use dist collectives instead of mpi routines.

* remove old code for comm.

* Fix bugs. still does not work.

* modify to test the nccl side code path

* Initial gather impl. Works intra-node.

* Updates to comm. phase 2. nccl comm. passed the tests.

* refactor code to introduce nccl/mpi as backends for onebit adam.

* Refactor updates to test/engine.

* Fix compile/runtime errors.

* simplify support for nccl/mpi backends.

* Add missing file

* Add compression backend in constructor. Revert later.

* modify test with some perf counting.

* Implement a true non-blocking gather for nccl side.

* Revert "Add compression backend in constructor. Revert later."

This reverts commit df8c40d.

* improve the 1-bit adam test.

* Refactor comm. and compression backend in 1-bit adam.

* Fix the test.

* Fix runtime errors and typos in nccl backend

* fix mpi backend. modify tests.

* modify nccl perf test.

* fix mpi side errors.

* Add an mpi perf test

* Sync DSE.

* Remove old collectives file.

* Undo a typo.

* Graceful failure for torch versions that don't support nccl pt2pt.

* Revert "Merge branch 'master' into staging-1bit-nccl-v2"

This reverts commit 7840085, reversing
changes made to a6dba72.

* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""

This reverts commit 6dbdd98.

* comm optimization + 1-bit lamb

* Saving/debugging commit.

* finalizing 1-bit lamb

* finalizing 1-bit lamb

* add momentum mask and chkpt handling for 1-bit adam

* Cleanup and modify nccl test to be runnable with deepspeed launcher.

* Fix format.

* fix formatting again.

* make test runnable without mpi4py

* Add dist.alltoall and dist.allgather instead of custom functions.

* remove debug prints.

* formatting and renaming

* renaming

* renaming

* add unit test, fix existing tests

* skip unit test when torch < 1.8

* revert 1-bit lamb

* flatten momentum when dimension is more than 1

* add warning message for 1-bit adam under fp32

* improve version check

* add fp32 test

* 1-bit adam doc

* fix file name

* doc fix

* torch 1.8 is released

* doc fix

* fix tests

* update news

* add doc for momentum mask

* fix checkpoint handling, add unit test

* checkpoint handling doc

* doc final cleanup

* bump dates

* update tests

* url change

* doc fix

* fix test

* doc update

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* consistent checkpoint filenaming (microsoft#865)

* consistent checkpoint filenaming

* backward compatible rename

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* [doc] launcher (microsoft#868)

As discussed in microsoft#662 this PR modifies the doc:
* explains what to use instead of CUDA_VISIBLE_DEVICES
* puts the `--hostfile` cl arg in the correct place in the invocation script

Fixes: microsoft#662

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [doc] pipeline (microsoft#888)

* [doc] pipeline

As @g-karthik flagged in microsoft#659 (comment) my previous correction PR had one sentence that said the wrong thing. So this PR attempts to rectify that. 

Thank you!

* tweak

* [debug utils] see_memory_usage fixes (microsoft#890)

* see_memory_usage fixes

* didn't expect pt-1.2

* fix the order of things

* fix the order of things

* full fp32 weights reconstruction for zero 2+3 (microsoft#892)

* save_fp16_model consolidated for zero3 (microsoft#893)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Fix zero stage2 cpu_offload when some model trainable parameters skipped in training (microsoft#861)

* Fix zero stage2 cpu_offload when some model trainable parameters are skipped in training, as in microsoft#707

When some trainable model parameters are skipped in training,
their backward hooks in self.create_reduce_and_remove_grad_hooks() will not run,
so they have no norm_for_param_grads.

* Trim space

* Trim space

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* mlperf attn initial commit

* update kramdown (microsoft#901)

security alert related to older kramdown version

* update backward api doc (microsoft#903)

* Bump kramdown from 2.3.0 to 2.3.1 in /docs (microsoft#905)

Bumps [kramdown](https://github.com/gettalong/kramdown) from 2.3.0 to 2.3.1.
- [Release notes](https://github.com/gettalong/kramdown/releases)
- [Changelog](https://github.com/gettalong/kramdown/blob/master/doc/news.page)
- [Commits](https://github.com/gettalong/kramdown/commits)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* We're hiring! + integration posts

* [website] We're hiring! + integration posts

* [website] we're hiring!

* zero.Init() clarification (microsoft#880)

* zero.Init() clarification

clarify that if `model.half()` can't fit into gpu memory `zero.Init()` is a must.

this proposal is via @samyam's clarification shared elsewhere.

Thank you.

* style

* add clarity

* style

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* disable pipe test (microsoft#915)

This test has been giving us trouble for a bit, with nondeterministic failures; skipping for now so as not to break our CI. Need to revisit soon though.

* Add link to AML examples. (microsoft#916)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* add inference_batch fn

* Add space in help string (microsoft#926)

* Fix for fragmented linear inputs in ZeRO 3 Linear layers where reshap… (microsoft#881)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [zero3] GatheredParameters can now handle a list of params (microsoft#884)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix cpu_adam memory leak on deepspeed re-use in the same process (microsoft#896)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [benchmarks] flatten/unflatten benchmarks (microsoft#919)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* improved readability + typos (microsoft#895)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* [zero doc] fix misspelled param (microsoft#878)

We really really really need those params to be validated...

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Samyamr/stage 3 skip modules without parameters (microsoft#867)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* docs (microsoft#909)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Supporting different hidden dimensions for transformer kernels-v2 (microsoft#934)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Pull changes from DeepSpeed

* Pull changes from DeepSpeed

* Pull changes from DeepSpeed

* Pull changes from DeepSpeed

* Pull changes from DeepSpeed

* Pull changes from DeepSpeed

* cleanup, reinstantiate sending of logits / layer_past

* cleanup, reinstantiate sending of logits / layer_past

* bump to 0.3.14

* add pypi badge

* Delete check of pdsh (microsoft#941)

* fix double linear override; spelling (microsoft#954)

* [config] turn exponential notation back on for config dump (microsoft#955)

* e-notation for large floats

* handle ints too

* readability

* handle bool

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* document how to override ~/.cache/torch_extensions (microsoft#959)

* [zero] faster flatten/unflatten (cpp version)  (microsoft#910)

* faster flatten/unflatten with apex

* switch to cpp flatten/unflatten

* style

* better comment

* missing import

* switch to build ops at run time

* fixes

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* update lr scheduler doc for doing per step or epoch update (microsoft#913)

* update lr scheduler doc for doing per step or epoch update

* work

* trigger build

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Fix ZeRO-3 UnboundLocalError (microsoft#968)

* Fix UnboundLocalError

* Get full partition size

* ZeRO-Infinity (microsoft#976)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>

* revert zero-inf change to launcher

* [docs] zero-inf updates

* bump to 0.3.15

* ZeRO-Infinity tutorial additions (microsoft#978)

* zinf tutorial

* more megatron integration docs

* [docs] add ZeRO-Inf news items

* refactor

* ZeRO-Infinity docs (microsoft#979)

* zinf tutorial

* more megatron integration docs

* ZInf + tiling docs

* [docs] zero-inf updates

* assert no Z2/Z3 with pipeline and fix some docs links (microsoft#980)

* add option to force multi-node launcher mode (microsoft#977)

* [ZeRO Infinity] Allow Init to take a dict for the deepspeed config  (microsoft#983)

* Add check to see if json file is already loaded

* Update doc

* Address review

* Remove doc comment

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* make bold+italic work without escaping _ (microsoft#775)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* remove debug prints: (microsoft#986)

* 1-bit LAMB optimizer (microsoft#970)

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Author: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069

Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Use odd shape tensor to represent parameter data in partitioned state (microsoft#981)

* use weird-shaped tensor to avoid silent failures when not registering external params

* fix typo

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

* Make reduce scatter optional for ZeRO-1 as workaround (microsoft#971)

* Make reduce scatter optional for ZeRO-1 as workaround

* Make allreduce default for ZeRO 1

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* Fix all Pipeline Module Parameters being sent to cuda:0 (microsoft#687)

* remove communicate overflow (already in utils.CheckOverflow)

Co-authored-by: sid <sidney.black@aleph-alpha.de>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: brett koonce <koonce@gmail.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: hamlet <gvvvv@163.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Takuya Makino <takuyamakino15@gmail.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Sean Naren <sean@grid.ai>