
Code Llama can't find tokenizer #973

Closed

silvanmelchior opened this issue Sep 2, 2023 · 12 comments

@silvanmelchior

System Info

TGI 1.0.3, running on Azure "STANDARD_ND40RS_V2"

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am running the following command:

docker run --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=... ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id "codellama/CodeLlama-34b-Instruct-hf"

The container starts, downloads the model, tries to start it, but then fails with the following message:

Tokenizer class "CodeLlamaTokenizer" does not exist or is not currently imported.

Expected behavior

The model starts up and serves requests

@osanseviero
Member

CodeLlamaTokenizer is not in a transformers release, so you need to install directly from source
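
For reference, a quick way to verify a source install (e.g. `pip install git+https://github.com/huggingface/transformers`) is a minimal load like this:

```python
# Sanity check: on a source build of transformers the CodeLlama tokenizer
# class resolves; on releases that predate CodeLlama support this raises the
# ValueError quoted in the issue.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(type(tok).__name__)  # e.g. CodeLlamaTokenizerFast
```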

@silvanmelchior
Author

Ok, thanks, so I can't use the provided containers directly (1.0.3), but have to build my own?

@abhinavkulkarni
Contributor

@silvanmelchior: please install transformers directly from source, as described in https://huggingface.co/blog/codellama#transformers.

@silvanmelchior
Author

Thanks for the reply. I am not sure I fully understand:

I use the provided docker container ghcr.io/huggingface/text-generation-inference:1.0.3, i.e. I just execute the docker run command from my issue description. I do not install any pip packages; it is really just a plain, empty system with Docker and CUDA support, on which I run the docker run command.

This worked with the Llama models, for example: I got an endpoint (in my case on port 8080) where I could get predictions from the model. However, it does not work with Code Llama, as the container does not even start because of the missing tokenizer.

So does this mean I cannot use the provided containers, but would somehow need to build my own, with the latest version of transformers included?

@Narsil
Collaborator

Narsil commented Sep 6, 2023

@ArthurZucker for visibility (this broke old transformers too, didn't it?)

@ArthurZucker

I don't think I understand the issue, since:

  • CodeLlama needs the latest release of transformers. TGI needs to take this into account, no?
  • You can still load some of the CodeLlama models with AutoTokenizer with a warning (in transformers), as we don't raise errors, so nothing breaks (see the sketch below)
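
A minimal sketch of that second point, assuming an older transformers release that predates CodeLlama support:

```python
# Loading the CodeLlama checkpoint with the plain Llama tokenizer class:
# transformers emits a class-mismatch warning ("The tokenizer class you load
# from this checkpoint is 'CodeLlamaTokenizer'. The class this function is
# called from is 'LlamaTokenizerFast'.") but does not raise an error.
from transformers import LlamaTokenizerFast

tok = LlamaTokenizerFast.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(tok("def fizzbuzz(n):").input_ids)
```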

@OlivierDehaene
Member

Is this still an issue? I'm running this command on a g5 just fine:

docker run --gpus all --shm-size 1g -v /data:/data -p 8080:80 ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id codellama/CodeLlama-34b-Instruct-hf --num-shard 4

@Narsil
Collaborator

Narsil commented Sep 6, 2023

“You can still load some of the CodeLlama models with AutoTokenizer with a warning (in transformers) as we don't raise errors, so no breaks”

Then old TGI works, as proved by Olivier.

@sarthak405

I am facing the same error with ghcr.io/huggingface/text-generation-inference:1.0.3.
I am using a fine-tuned CodeLlama 34b model, and the command I am running is:
docker run --gpus all --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=<Token> -p 8000:80 ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $model --num-shard 8 --max-input-length 30000 --max-batch-prefill-tokens 32000 --max-total-tokens 32000 --rope-scaling=dynamic --rope-factor=2.0

Below are the detailed logs:

```
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 40, in __init__
    tokenizer = LlamaTokenizer.from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 162, in get_spm_processor
    with open(self.vocab_file, "rb") as f:

TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 187, in get_model
    return FlashLlama(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 48, in __init__
    tokenizer = AutoTokenizer.from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 724, in from_pretrained
    raise ValueError(

ValueError: Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.
```

@sarthak405

@silvanmelchior were you able to resolve this at your end?

@OlivierDehaene
Member

@sarthak405, it seems your error is related to the vocab: `self.vocab_file` is None.
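
A minimal way to check for that, assuming a hypothetical repo id for the fine-tune (the slow LlamaTokenizer needs the SentencePiece tokenizer.model as its vocab file):

```python
# List the files in the fine-tuned repo; if tokenizer.model is missing,
# self.vocab_file ends up None and open() fails exactly as in the traceback.
from huggingface_hub import list_repo_files

files = list_repo_files("my-org/my-codellama-finetune")  # hypothetical repo id
print("tokenizer.model" in files)
```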

@sarthak405

Right @OlivierDehaene, it was indeed a missing tokenizer.model file. Thank you for the quick resolution.
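
For anyone hitting the same thing, a sketch of one possible fix, assuming the fine-tune kept the base model's vocabulary (the fine-tuned repo id is hypothetical):

```python
# Copy tokenizer.model from the base CodeLlama repo into the fine-tuned repo.
# Requires a token with write access (e.g. via `huggingface-cli login`).
from huggingface_hub import hf_hub_download, upload_file

local = hf_hub_download("codellama/CodeLlama-34b-Instruct-hf", "tokenizer.model")
upload_file(
    path_or_fileobj=local,
    path_in_repo="tokenizer.model",
    repo_id="my-org/my-codellama-finetune",  # hypothetical fine-tuned repo
)
```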
