forked from NVIDIA/Megatron-LM
Issues: microsoft/Megatron-DeepSpeed
Issues list
#389 Inquiry on Sequence Parallel Support for VocabParallelEmbedding (opened May 18, 2024 by qinxiangyujiayou)
#385 Sequence Parallel is incompatible with Rotary Positional Embedding (opened May 9, 2024 by anogkongda)
#381 Call for Conversion from Huggingface to Megads with MoE (opened Apr 24, 2024 by ControllableGeneration)
#358 Loss is increasing when fine-tuning from a Megatron-DeepSpeed pretrained checkpoint (opened Mar 5, 2024 by SefaZeng)
#357 Unreasonably low throughput on HGX-H100s [bug] (opened Mar 1, 2024 by GuanhuaWang)
#356 FileNotFoundError: [Errno 2] No such file or directory: 'dataset/index-cache/xxx_doc_idx.npy' [bug] (opened Mar 1, 2024 by GuanhuaWang)
#329 How to convert a DeepSpeed model to Megatron when pp=2, tp=2, nnode=2 (opened Jan 11, 2024 by lonelydancer)