
Pretrain and generate #43

Closed
king-menin opened this issue Sep 21, 2020 · 2 comments
Labels: stale (No activity in 60 days on issue or PR)

Comments

@king-menin

If I run pretraining with:

python -m torch.distributed.launch --nproc_per_node 16 pretrain_gpt2.py --model-parallel-size 16

and afterwards run generation:

python generate_samples.py

I get an error during initialization:

size mismatch for transformer.layers.15.attention.dense.weight: copying a param with shape torch.Size([1024, 64]) from checkpoint, the shape in current model is torch.Size([1024, 1024]).

Can I load the model on one GPU for generation after training it distributed across 16 GPUs with --model-parallel-size 16?
Thank you!
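
The mismatch is expected: with 16-way model parallelism, each rank saves only its partition of every parallel layer, so the attention output projection's input dimension is split 16 ways (1024 / 16 = 64), while generate_samples.py run without arguments builds the model at the default model-parallel size of 1 and expects the full [1024, 1024] weight. A minimal sketch of a matching generation launch, assuming generate_samples.py accepts the same distributed-launch setup and --model-parallel-size flag as pretrain_gpt2.py:

# one process per model-parallel rank, so each rank loads its own checkpoint partition
python -m torch.distributed.launch --nproc_per_node 16 generate_samples.py --model-parallel-size 16

Loading on a single GPU would first require merging the 16 checkpoint partitions into one; whether a merge utility ships with the repository depends on the Megatron-LM version, so check its tools before relying on that route.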

@github-actions

Marking as stale. No activity in 60 days. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale No activity in 60 days on issue or PR label Jul 10, 2023
@jon-barker jon-barker reopened this Jul 20, 2023
@github-actions github-actions bot removed the stale No activity in 60 days on issue or PR label Jul 21, 2023
@github-actions

Marking as stale. No activity in 60 days.

@github-actions github-actions bot added the stale No activity in 60 days on issue or PR label Sep 19, 2023