System Info

I am using Trainer with PyTorch DDP on a single node with multiple GPUs. torch.distributed.init_process_group() is set up correctly. It seems that Trainer._get_train_sampler() does not use DistributedSampler but rather RandomSampler? Or could this be another issue I am missing? Any input appreciated! Thanks!
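For context, this is the classic manual DDP data-loading pattern I expected the Trainer to follow internally (a minimal sketch with a toy dataset standing in for the real one; run under torchrun):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

dist.init_process_group(backend="nccl")

# Toy dataset standing in for the real one.
train_dataset = TensorDataset(torch.arange(100).float())

# DistributedSampler splits the indices across ranks so each GPU
# sees a disjoint shard of the data.
sampler = DistributedSampler(train_dataset)
loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)

for epoch in range(3):
    sampler.set_epoch(epoch)  # reshuffle differently each epoch
    for (batch,) in loader:
        pass  # training step goes here
```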
Who can help?
No response
Information
The official example scripts
My own modified scripts
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
The error originates in the data collator. The same code works on a single GPU; a generic sketch of the setup is below.
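For reference, here is a generic sketch of the kind of setup involved (the model, dataset, and collator below are stand-ins, not my actual code), launched with torchrun --nproc_per_node=2 repro.py:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dataset = load_dataset("glue", "sst2", split="train[:1%]")
dataset = dataset.map(lambda ex: tokenizer(ex["sentence"], truncation=True), batched=True)

trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained("bert-base-uncased"),
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),  # error surfaces here under DDP
)
trainer.train()
```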
Expected behavior
I expect training to proceed in a distributed fashion across multiple GPUs using the Trainer API.
I'm also curious why DistributedSampler was removed from _get_train_sampler(); I remember older versions implemented it for the multi-GPU training case.
@yuyemin The Trainer now relies on Accelerate to handle data sampling, since it has a complete Accelerate integration. @gtanya89 can you post the error with the full traceback and a reproducer?
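Roughly, the Trainer now does the equivalent of the following (a simplified sketch of the Accelerate integration, not the exact internals; the toy dataset is a placeholder):

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

accelerator = Accelerator()  # picks up rank/world size from the launcher env

dataset = TensorDataset(torch.arange(100).float())

# A plain RandomSampler is enough: accelerator.prepare() wraps the
# DataLoader so each process transparently receives its own shard.
loader = DataLoader(dataset, batch_size=8, sampler=RandomSampler(dataset))
loader = accelerator.prepare(loader)

for (batch,) in loader:
    pass  # each rank sees a distinct slice of the data
```

So seeing RandomSampler in _get_train_sampler() is expected; the sharding happens when the dataloader is prepared by Accelerate rather than at the sampler level.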