Information
The problem arises in chapter:
- [ ] Introduction
- [ ] Text Classification
- [ ] Transformer Anatomy
- [x] Multilingual Named Entity Recognition
- [ ] Text Generation
- [ ] Summarization
- [ ] Question Answering
- [ ] Making Transformers Efficient in Production
- [ ] Dealing with Few to No Labels
- [ ] Training Transformers from Scratch
- [ ] Future Directions
Describe the bug
The error occurs if you run the example Colab code, push the fine-tuned model to your Hugging Face Hub, and then try to run it. When trying to run inference, it outputs the error: "Can't load tokenizer using from_pretrained, please update its configuration: tokenizers.AddedToken() got multiple values for keyword argument 'special'".
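For context, this is a minimal sketch of the call that triggers the error for me (assuming the model has already been fine-tuned and pushed to the Hub under my repo id; the exact behavior may depend on the installed transformers/tokenizers versions):

```python
from transformers import AutoTokenizer

# Loading the tokenizer that the example notebook pushed to the Hub fails for me with:
# "tokenizers.AddedToken() got multiple values for keyword argument 'special'"
tokenizer = AutoTokenizer.from_pretrained("shng2025/xlm-roberta-base-finetuned-panx-de")
```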
To Reproduce
Steps to reproduce the behavior:
- Run the example Colab code (https://colab.research.google.com/github/nlp-with-transformers/notebooks/blob/main/04_multilingual-ner.ipynb) up to the point where the fine-tuned model is pushed to the Hub.
- Once the model is trained and pushed, visit it on the Hugging Face Hub and try the hosted Inference API on the model page; the tokenizer error above appears.
- The same error occurs if you try to load the model from the Hub in code; it fails at the tokenizer stage, and more precisely, I believe, at "special_tokens_map.json" (see the inspection sketch at the end of this issue).
- However, this can be worked around if I instead pass the "mask_token" special token as an extra kwarg, as recommended by GPT-4:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Manually specify special tokens if the default configuration is problematic
special_tokens_dict = {
    "mask_token": {
        "content": "<mask>",
        "single_word": False,
        "lstrip": True,
        "rstrip": False,
        "normalized": True,
        "special": True,
        "__type": "AddedToken",
    }
}

tokenizer = AutoTokenizer.from_pretrained(
    "shng2025/xlm-roberta-base-finetuned-panx-de", use_fast=True, **special_tokens_dict
)
model = AutoModelForTokenClassification.from_pretrained("shng2025/xlm-roberta-base-finetuned-panx-de")
```

It seems the model can be imported if I just declare the "mask_token" explicitly, but I don't know what is causing this error in general.
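For reference, a quick way to sanity-check the workaround would be to run the loaded model through the token-classification pipeline (a sketch, not something I have verified end-to-end; the test sentence is just a German example, and the predicted labels depend on the fine-tuned config):

```python
from transformers import pipeline

# Quick NER sanity check with the manually loaded tokenizer and model
ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
print(ner("Jeff Dean ist ein Informatiker bei Google in Kalifornien"))
```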
Expected behavior
I was expecting that, once I fine-tuned the model by running the example code and pushed it to the Hub, the model could easily be run from the Inference API. You can also check my code in my personal notebook: https://colab.research.google.com/drive/1F5L_vL1o6WC3DxGWDF_g6ZPKTJ7dcmxR#scrollTo=orgQubxKVrNX
However, the same error occurred when I ran the example code directly, so I suspect it is caused by changes made to the library after the book was published. As mentioned, it is still runnable if I pass in "mask_token" as a **kwarg. But this is very strange, and I would love to know what is causing this error, as I am still learning.
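For anyone investigating, here is a rough sketch of how the special_tokens_map.json mentioned above could be inspected directly from the pushed repo (using huggingface_hub; the keys shown will depend on what the notebook actually uploaded):

```python
import json
from huggingface_hub import hf_hub_download

# Download and pretty-print the special_tokens_map.json from the pushed model repo
path = hf_hub_download(
    repo_id="shng2025/xlm-roberta-base-finetuned-panx-de",
    filename="special_tokens_map.json",
)
with open(path) as f:
    print(json.dumps(json.load(f), indent=2))
```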