
AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id' #31348

Open
1 of 4 tasks
rajanish4 opened this issue Jun 10, 2024 · 9 comments

Comments

@rajanish4

rajanish4 commented Jun 10, 2024

System Info

  • transformers version: 4.42.0.dev0
  • Platform: Windows-10-10.0.20348-SP0
  • Python version: 3.9.7
  • Huggingface_hub version: 0.23.3
  • Safetensors version: 0.4.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • PyTorch version (GPU?): 1.13.0 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA RTX A6000

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M", src_lang="ron_Latn", token=token)
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M", token=token)

article = "Şeful ONU spune că nu există o soluţie militară în Siria"
inputs = tokenizer(article, return_tensors="pt")
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30)
tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

Expected behavior

It should output translated text: UN-Chef sagt, es gibt keine militärische Lösung in Syrien

Complete error:

translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["deu_Latn"], max_length=30)
AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'

@ArthurZucker
Collaborator

Yes, we had a deprecation cycle and this attribute was removed 😉

@rajanish4
Author

Thanks, but then how can I provide the language code for translation?

@ArthurZucker
Collaborator

You should simply do tokenizer.encode("deu_Latn")[0]

@tokenizer-decode
Contributor

Then why does the doc say otherwise? This is v4.42.0.
I also don't understand how to use tokenizer.encode("deu_Latn")[0]. What's the keyword? Is this a positional argument? @ArthurZucker

@fe1ixxu

fe1ixxu commented Jul 2, 2024

It seems there is an error: whatever language code I give to the NLLB tokenizer, it always outputs the English token id. My version is v4.42.3 @ArthurZucker :

[screenshot: tokenizer.encode output]

@ShayekhBinIslam

ShayekhBinIslam commented Jul 2, 2024

I think tokenizer.encode("deu_Latn")[0] is the regular BOS token, and tokenizer.encode("deu_Latn")[1] is the expected token. @ArthurZucker
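[Editor's note] The observations above can be reproduced without downloading the model. The fast NLLB tokenizer wraps every input as `[src_lang_code, ...piece_ids..., </s>]`, so `encode("deu_Latn")[0]` returns the *source*-language id (eng_Latn by default), which matches the "always English" report. The toy vocabulary below is invented for illustration; the ids are not the real NLLB ids:

```python
# Toy stand-in for the NLLB vocabulary (invented ids, NOT the real ones).
MOCK_VOCAB = {"eng_Latn": 1001, "deu_Latn": 1002, "ron_Latn": 1003, "</s>": 2}

def mock_encode(text, src_lang="eng_Latn"):
    """Mimics NllbTokenizerFast.encode(): prepend src_lang code, append </s>."""
    pieces = [MOCK_VOCAB[p] for p in text.split() if p in MOCK_VOCAB]
    return [MOCK_VOCAB[src_lang]] + pieces + [MOCK_VOCAB["</s>"]]

def mock_convert_tokens_to_ids(token):
    """Mimics convert_tokens_to_ids(): a plain vocabulary lookup, no specials."""
    return MOCK_VOCAB[token]

print(mock_encode("deu_Latn"))                 # [1001, 1002, 2] -> index 0 is eng_Latn
print(mock_convert_tokens_to_ids("deu_Latn"))  # 1002, the id actually wanted
```

This is why indexing `[1]` "works" here while `[0]` silently returns the source-language code, and why a direct lookup is the safer fix.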

@ArthurZucker
Collaborator

Yes! You should use convert_tokens_to_ids rather than encode, sorry 😉

@tnitn

tnitn commented Jul 12, 2024

What worked for me is
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"), max_length=30)

@ArthurZucker
Collaborator

Yep, this is what we expect!

blackmesataiwan added a commit to blackmesataiwan/OneRingTranslator that referenced this issue Jul 19, 2024
Labels: none yet
Projects: none yet
Development: no branches or pull requests
6 participants