
Tensor mismatch when fine-tuning a SmolLM3 model #41129

@codefusser

Description


System Info

When performing a fine-tuning job with a per-device batch size of 4 and max steps of 1000, training fails with a tensor size mismatch; the only preceding hint in the logs is a tokenizer warning:

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None, 'pad_token_id': None}.


  0%|          | 0/500 [00:00<?, ?it/s]Traceback (most recent call last):

  File "/tmp/script.py", line 191, in <module>

    trainer.train()

  File "/root/.cache/uv/environments-v2/script-912247c0edd68a55/lib/python3.12/site-packages/transformers/trainer.py", line 2328, in train

    return inner_training_loop(

           ^^^^^^^^^^^^^^^^^^^^

  File "/root/.cache/uv/environments-v2/script-912247c0edd68a55/lib/python3.12/site-packages/transformers/trainer.py", line 2672, in _inner_training_loop

    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)

                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/.cache/uv/environments-v2/script-912247c0edd68a55/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 1189, in training_step

    return super().training_step(*args, **kwargs)

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/.cache/uv/environments-v2/script-912247c0edd68a55/lib/python3.12/site-packages/transformers/trainer.py", line 4009, in training_step

    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)

           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/root/.cache/uv/environments-v2/script-912247c0edd68a55/lib/python3.12/site-packages/trl/trainer/sft_trainer.py", line 1123, in compute_loss

    entropy = torch.sum(per_token_entropy * attention_mask) / attention_mask.sum()

                        ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~

RuntimeError: The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0


  0%|          | 0/500 [00:01<?, ?it/s]

Please, what might be wrong here?
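
For reference, the failing line from trl/trainer/sft_trainer.py can be reproduced in isolation with mismatched batch dimensions. This is only a sketch with made-up shapes to illustrate what the RuntimeError means, not the trainer's actual tensors:

import torch

# Hypothetical shapes: a per-token entropy tensor for a batch of 4 sequences...
per_token_entropy = torch.rand(4, 128)
# ...multiplied by an attention mask whose batch dimension is 8
attention_mask = torch.ones(8, 128)

# Same expression as the line shown in the traceback above; raises
# "The size of tensor a (4) must match the size of tensor b (8) at non-singleton dimension 0"
entropy = torch.sum(per_token_entropy * attention_mask) / attention_mask.sum()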

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

SFT fine-tuning on an a10-largex2 GPU setup.

Training parameters:

# Configure training
config = SFTConfig(
    output_dir="./smollm3-jobs-sft",
    per_device_train_batch_size=4,
    learning_rate=3e-5,
    max_steps=1000,
    logging_steps=50,
    save_steps=200,
    push_to_hub=True,
    hub_model_id="hubsnippetai/smollm3-jobs-sft"
)

# Train
trainer = SFTTrainer(
    model=model,
    train_dataset=processed_train,
    args=config,
)
trainer.train()
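
For context, the rest of the script loads the model, tokenizer, and dataset roughly like this; the checkpoint and dataset IDs below are placeholders, not the ones from the actual script:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

# Placeholder checkpoint and dataset IDs, for illustration only
model_id = "HuggingFaceTB/SmolLM3-3B"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# processed_train is assumed to be a datasets.Dataset in a format SFTTrainer accepts
# (e.g. a "text" or "messages" column)
processed_train = load_dataset("username/jobs-dataset", split="train")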

Expected behavior

After the dataset is loaded, training should run to completion without erroring out.
