Docs: fix Training step by removing tokenizer from trainer initialization #43733

Merged
stevhliu merged 2 commits into huggingface:main from nesjett:patch-1 on Feb 4, 2026

@nesjett nesjett commented Feb 4, 2026

Summary

This PR replaces the deprecated tokenizer parameter with processing_class in the Quicktour and language modeling documentation examples.

As of the v5.0.0 release, the tokenizer argument was removed from the Trainer constructor and replaced by processing_class. The documentation example currently fails with a TypeError, which prevents users from completing the introductory tutorial.

Changes

Location:

  • docs/source/en/quicktour.md
  • docs/source/en/tasks/language_modeling.md
  • docs/source/en/tasks/masked_language_modeling.md

Action: Replaced tokenizer=tokenizer with processing_class=tokenizer in the Trainer initialization.

Reasoning: The tokenizer argument has been replaced by processing_class, so passing it now raises:

TypeError: Trainer.__init__() got an unexpected keyword argument 'tokenizer'
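
The change can be sketched roughly as follows. To keep the example runnable without downloading a model, it uses `StubTrainer`, a hypothetical stand-in that only mirrors the keyword names discussed in this PR; it is not the real `transformers.Trainer`. The `build_trainer` helper and the placeholder string values are likewise assumptions made for illustration.

```python
# Hypothetical stand-in for transformers.Trainer (illustration only).
# It accepts `processing_class` and has no `tokenizer` parameter, which is
# the v5 behavior this PR describes.
class StubTrainer:
    def __init__(self, model=None, args=None, train_dataset=None,
                 eval_dataset=None, processing_class=None, data_collator=None):
        self.processing_class = processing_class


def build_trainer(trainer_cls, tokenizer, **common):
    """Create a trainer, passing the tokenizer as `processing_class`."""
    return trainer_cls(processing_class=tokenizer, **common)


# Placeholder values; a real script would pass a model and TrainingArguments.
common = {"model": "model", "args": "training_args"}

# Old call style from the docs: rejected with a TypeError.
try:
    StubTrainer(tokenizer="tok", **common)
    old_style_error = None
except TypeError as exc:
    old_style_error = str(exc)

# Updated call style from this PR: the tokenizer goes in `processing_class`.
trainer = build_trainer(StubTrainer, "tok", **common)
print(old_style_error)
```

With the real library, the same keyword rename would presumably apply by passing `transformers.Trainer` as `trainer_cls` together with an actual model, `TrainingArguments`, datasets, and tokenizer.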

Related Resources

Breaking Change Reference: Transformers v5.0.0 Release Notes

Affected Doc Page: Quicktour - Training Step

Checklist
[x] This PR fixes a typo or improves the docs.

cc @stevhliu @SunMarc

…tion

We already provide the tokenizer in the data_collator. Passing the tokenizer parameter to Trainer.__init__() produces the following error: TypeError: Trainer.__init__() got an unexpected keyword argument 'tokenizer'
Copilot AI review requested due to automatic review settings February 4, 2026 10:28

Copilot AI left a comment

Pull request overview

Updates the Quicktour training example to align with the current Trainer API by removing the deprecated/removed tokenizer argument, preventing a runtime TypeError during initialization.

Changes:

  • Remove tokenizer=tokenizer from the Trainer(...) initialization snippet in the Quicktour docs.

args=training_args,
train_dataset=dataset["train"],
eval_dataset=dataset["test"],
tokenizer=tokenizer,
Member

we changed it to processing_class

Contributor Author

Thanks for the heads-up. I forgot that an optional argument can still be required in some cases.

I pushed the change and provided a similar replacement in 2 more docs

@nesjett nesjett marked this pull request as draft February 4, 2026 15:31

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@nesjett nesjett marked this pull request as ready for review February 4, 2026 16:01
Member

@stevhliu stevhliu left a comment

thanks for updating!

@stevhliu stevhliu merged commit 452c179 into huggingface:main Feb 4, 2026
21 of 22 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@nesjett nesjett deleted the patch-1 branch February 4, 2026 16:49