
Megatron tutorial #19

Closed
ShadenSmith opened this issue Feb 5, 2020 · 0 comments · Fixed by #30
Comments

@ShadenSmith
Contributor

We need to port the Megatron tutorial into docs/ and then update links to it in README.md, etc.

@ShadenSmith ShadenSmith added the documentation Improvements or additions to documentation label Feb 5, 2020
@ShadenSmith ShadenSmith self-assigned this Feb 5, 2020
@ShadenSmith ShadenSmith linked a pull request Feb 6, 2020 that will close this issue
@samyam samyam closed this as completed in #30 Feb 7, 2020
rraminen pushed a commit to rraminen/DeepSpeed that referenced this issue Apr 28, 2021
* change squad baseline to use new apex
rraminen pushed a commit to rraminen/DeepSpeed that referenced this issue Apr 28, 2021
* update bing_bert example to use sparse attention (microsoft#19)

* update bing_bert example to use sparse transformer

* Updated the BertSparseSelfAttention example based on the ST updates

* updated bing_bert example based on final updates for Sparse Attention; also added un/pad of Bert layer input

* updated based on Tunji's comment: added a separate script for SA

* fixed a typo

* added an exception when both the transformer kernel and SA are set together (see the sketch after this list).

* fixed an issue from the last PR: removed the self keyword from a function call, since the function was moved out of the class

Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: arashashari <arash.ashari@gmail.com>
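One of the commits above adds an exception when the DeepSpeed transformer kernel and sparse attention (SA) are enabled together. A minimal sketch of that kind of guard, assuming hypothetical argument names (deepspeed_transformer_kernel, deepspeed_sparse_attention) rather than the exact bing_bert flags:

```python
# Hypothetical guard (not the actual bing_bert code): the DeepSpeed transformer
# kernel and sparse attention are treated as mutually exclusive, so fail fast
# if a run requests both.
def validate_attention_flags(args):
    if args.deepspeed_transformer_kernel and args.deepspeed_sparse_attention:
        raise NotImplementedError(
            "deepspeed_transformer_kernel and deepspeed_sparse_attention "
            "cannot be enabled at the same time; choose one."
        )
```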
rraminen pushed a commit to rraminen/DeepSpeed that referenced this issue Jul 2, 2021
pengwa pushed a commit to pengwa/DeepSpeed that referenced this issue Oct 14, 2022
Liangliang-Ma pushed a commit to Liangliang-Ma/DeepSpeed that referenced this issue Nov 6, 2023
* Reuse hf_model list among tests to avoid slow loading

* try to debug test skip

* another attempt to print test failure

* another attempt

* more attempt to print skip reason

* revert changes that are temporary

* remove extra flag for pytest

* add a dummy test to test pytest

* test skip message

* put old test and temp test together to compare

* try to find out the reason skip message are not printed

* comment all skips

* check skip in common.py

* revert last commits

* shorten name to show skip message

* change test name

* expand number of columns to 120 when running pytest

* detect deepspeed installation

* add test code for environment

* change pytorch version 2.1.0==>2.0.1

* add py-cpuinfo as a dev requirement

* install py-cpuinfo manually

* Change COLUMNS to 140 to allow display of pytest skip message (see the sketch after this list)

* pin pytorch to 2.0.1

* add pip list before install deepspeed

* install cpuinfo before install deepspeed

* change workflow to work with pytorch 2.1

* add torch install to CI workflow

* install py-cpuinfo

* enforce autotp test on single socket instance

* enforce 2 ranks in cpu autotp tests

* enable tests that can only run on torch 2.1 or above

* make build faster

* remove -j make option

* add back skip for codegen

* check UT result

* update tutorial
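Several of the commits above widen the terminal width so that pytest prints full skip reasons in CI logs. A minimal sketch of that idea, assuming a hypothetical helper and test path (tests/unit); pytest sizes its output from the COLUMNS environment variable:

```python
# Hypothetical CI helper: run pytest with a wide terminal so that skip
# reasons in the -rs summary are not truncated at the default column width.
import os
import subprocess

def run_unit_tests(columns: int = 140) -> int:
    env = dict(os.environ, COLUMNS=str(columns))
    # -rs prints the reason for every skipped test in the summary report.
    return subprocess.call(["pytest", "-rs", "tests/unit"], env=env)
```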