Generation with model parallel Megatron LM #2358

Closed
@rakeshchada

Description

❓ Questions and Help

What is your question?

Is there an example demonstrating how to generate using Megatron LM that was trained using model parallelism? The Megatron LM page shows how to run evaluation but there's no information on running generation.

What have you tried?

I tried running the command below but got an error.

Command:

fairseq-generate \
  $DATA_PATH \
  --path $MODEL_PATH \
  --task language_modeling \
  --gen-subset test \
  --max-sentences 8 \
  --criterion cross_entropy \
  --beam 1 \
  --sampling \
  --sampling-topp 0.9 \
  --temperature 0.01 \
  --prefix-size 200 \
  --distributed-world-size 8 \
  --results-path $RESULTS_PATH \
  --model-parallel-size 8;

Error:
/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCTensorScatterGather.cu:100: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [3,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.

After some debugging, I found that this line in the code caused the above error, but I'm unsure of the root cause. It's possible there are setup issues on my end (data, etc.), but an example of how to set up and run generation with a model-parallel Megatron LM would be great. Thank you.
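For context, here is a minimal pure-Python sketch (not fairseq or Megatron code; the sharding scheme, names, and sizes are all assumptions for illustration) of how a vocabulary-partitioned gather can trip an out-of-range assertion like the one above: with model parallelism each rank holds only a slice of the table, so gathering with a global token id that was never mapped into the local shard's range fails the same `indexValue < src.sizes[dim]` style bounds check the CUDA kernel asserts on.

```python
# Hypothetical sketch: simulate one rank's shard of a vocabulary-partitioned
# table and the bounds check a gather kernel performs. Names and sizes are
# illustrative, not taken from fairseq/Megatron.

VOCAB_SIZE = 16
WORLD_SIZE = 8
SHARD = VOCAB_SIZE // WORLD_SIZE  # each rank holds 2 rows


def local_gather(shard_rows, indices):
    """Gather rows from one rank's local shard, with an explicit bounds check
    analogous to the assertion the CUDA gather kernel raises on the device."""
    out = []
    for i in indices:
        if not (0 <= i < len(shard_rows)):
            raise IndexError(
                f"index {i} out of range for shard of size {len(shard_rows)}"
            )
        out.append(shard_rows[i])
    return out


rank = 0
shard = [f"row{rank * SHARD + r}" for r in range(SHARD)]

# Valid: indices already mapped into the local shard's range.
print(local_gather(shard, [0, 1]))  # ['row0', 'row1']

# Invalid: a global token id (e.g. 5) used without subtracting the shard
# offset trips the same kind of out-of-range assertion seen on the GPU,
# which is consistent with a data/setup mismatch (e.g. a dictionary built
# differently from the one the model was trained with).
try:
    local_gather(shard, [5])
except IndexError as e:
    print("assert-style failure:", e)
```

On the GPU the failed assertion surfaces as the opaque `THCTensorScatterGather.cu` message rather than a Python exception, which is why a vocabulary/dictionary mismatch between training and generation data can be hard to spot from the traceback alone.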
