
ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided [] #30769

Open

gtanya89 opened this issue May 12, 2024 · 4 comments

@gtanya89

System Info

Using Trainer with PyTorch DDP on a single node with multiple GPUs. torch.distributed.init_process_group() is set up correctly. It seems that Trainer's _get_train_sampler() does not use DistributedSampler but rather RandomSampler? Or is there another issue I'm missing? Any input appreciated! Thanks!

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The error originates in the data collator. The same code works on a single GPU.
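For context, a minimal sketch of the kind of setup described (the actual bert_dist.py is not shown in the issue; the model, dataset, and column names below are placeholders):

```python
# Hypothetical minimal sketch -- not the reporter's actual script.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

dataset = load_dataset("glue", "sst2", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

# Tokenize and drop the raw text columns. DataCollatorWithPadding expects
# each example to carry input_ids -- the "you provided []" ValueError is
# what tokenizer.pad raises when the features it receives contain none.
dataset = dataset.map(tokenize, batched=True, remove_columns=["sentence", "idx"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=8),
    train_dataset=dataset,
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```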

Expected behavior

I expect training to proceed in a distributed fashion across multiple GPUs using the Trainer API.

@amyeroberts
Collaborator

cc @pacman100 @muellerzr

@yuyemin

yuyemin commented May 13, 2024

I'm also curious why DistributedSampler was removed from _get_train_sampler(); as I remember, older versions used it for multi-GPU training.
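For reference, the standard PyTorch multi-GPU recipe that older Trainer versions approximated looks roughly like this (a sketch, assuming init_process_group has been called and train_dataset / num_epochs are defined; not the actual old Trainer code):

```python
# Sketch of the pre-Accelerate pattern: each rank gets a distinct shard
# of the dataset via DistributedSampler.
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

sampler = DistributedSampler(
    train_dataset,
    num_replicas=dist.get_world_size(),
    rank=dist.get_rank(),
    shuffle=True,
)
loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)

for epoch in range(num_epochs):
    sampler.set_epoch(epoch)  # makes shuffling differ across epochs
    for batch in loader:
        ...  # forward/backward on this rank's shard
```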

@muellerzr
Contributor

muellerzr commented May 13, 2024

It uses Accelerate's sampler for the data now, @yuyemin, since the Trainer has a complete Accelerate integration. Can you post the error with the full traceback and a reproducer, @gtanya89?
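Roughly speaking, Accelerate shards an ordinary DataLoader across processes at prepare() time, so no explicit DistributedSampler is needed. A minimal illustration (a sketch assuming train_dataset is defined, not the actual Trainer internals):

```python
# Sketch: Accelerate takes a plain DataLoader and, in prepare(), wraps it
# so each process sees a distinct subset of batches -- filling the role
# DistributedSampler used to play.
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator()
loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
loader = accelerator.prepare(loader)  # now sharded per process

for batch in loader:
    ...  # each rank receives different batches
```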

@gtanya89
Author

Here is the error:

[Screenshot of the traceback, ending in: ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided []]

I am running single-node multi-GPU training with this command:
torchrun --nnodes 1 --nproc_per_node 2 --rdzv-endpoint=localhost:port bert_dist.py

PyTorch: 2.x
CUDA: 12.2
