
Code Llama can't find tokenizer #973

Closed

silvanmelchior opened this issue Sep 2, 2023 · 12 comments

@silvanmelchior

System Info

TGI 1.0.3, running on Azure "STANDARD_ND40RS_V2"

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I am running the following command:

docker run --gpus all --shm-size 1g -p 8080:80 -e HUGGING_FACE_HUB_TOKEN=... ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id "codellama/CodeLlama-34b-Instruct-hf"

The container starts, downloads the model, tries to start it, but then fails with the following message:

Tokenizer class "CodeLlamaTokenizer" does not exist or is not currently imported.

Expected behavior

The model starts up and serves requests

@osanseviero
Member

CodeLlamaTokenizer is not in a transformers release, so you need to install directly from source
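
For reference, a quick way to verify a source install (e.g. `pip install git+https://github.com/huggingface/transformers`) is a minimal load like this:

```python
# Sanity check: on a source build of transformers the CodeLlama tokenizer
# class resolves; on releases that predate CodeLlama support this raises the
# ValueError quoted in the issue.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(type(tok).__name__)  # e.g. CodeLlamaTokenizerFast
```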

@silvanmelchior
Author

Ok, thanks, so I can't use the provided containers directly (1.0.3), but have to build my own?

@abhinavkulkarni
Contributor

@silvanmelchior: please install transformers directly from source, as described in https://huggingface.co/blog/codellama#transformers.

@silvanmelchior
Author

Thanks for the reply. I am not sure I fully understand:

I use the provided docker container ghcr.io/huggingface/text-generation-inference:1.0.3, i.e. I just execute the docker run command from my issue description. I do not install any pip packages; it is really just a plain, empty system with Docker and CUDA support, on which I run the docker run command.

This worked with the Llama models, for example: I got an endpoint (in my case on port 8080) where I could get predictions from the model. However, it does not work with Code Llama, as the container does not even start because of the missing tokenizer.

So does this mean I cannot use the provided containers, but would somehow need to build my own, with the latest version of transformers included?

@Narsil
Collaborator

Narsil commented Sep 6, 2023

@ArthurZucker for visibility (this broke old transformers too, didn't it?)

@ArthurZucker

I don't think I understand the issue, since:

  • CodeLlama needs the latest release of transformers. TGI needs to take this into account, no?
  • You can still load some of the CodeLlama models with AutoTokenizer with a warning (in transformers), as we don't raise errors, so nothing breaks (see the sketch below)
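
A minimal sketch of that second point, assuming an older transformers release that predates CodeLlama support:

```python
# Loading the CodeLlama checkpoint with the plain Llama tokenizer class:
# transformers emits a class-mismatch warning ("The tokenizer class you load
# from this checkpoint is 'CodeLlamaTokenizer'. The class this function is
# called from is 'LlamaTokenizerFast'.") but does not raise an error.
from transformers import LlamaTokenizerFast

tok = LlamaTokenizerFast.from_pretrained("codellama/CodeLlama-34b-Instruct-hf")
print(tok("def fizzbuzz(n):").input_ids)
```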

@OlivierDehaene
Member

Is this still an issue? I'm running this command on a g5 just fine:

docker run --gpus all --shm-size 1g -v /data:/data -p 8080:80 ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id codellama/CodeLlama-34b-Instruct-hf --num-shard 4

@Narsil
Collaborator

Narsil commented Sep 6, 2023

“You can still load some of the CodeLlama models with AutoTokenizer with a warning (in transformers) as we don't raise errors, so no breaks”

Then old TGI works, as proved by Olivier.

@sarthak405

I am facing the same error with ghcr.io/huggingface/text-generation-inference:1.0.3.
I am using a fine-tuned CodeLlama 34b model, and the command I am running is:
docker run --gpus all --shm-size 1g -e HUGGING_FACE_HUB_TOKEN=<Token> -p 8000:80 ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id $model --num-shard 8 --max-input-length 30000 --max-batch-prefill-tokens 32000 --max-total-tokens 32000 --rope-scaling=dynamic --rope-factor=2.0

Below are the detailed logs:

```
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'CodeLlamaTokenizer'.
The class this function is called from is 'LlamaTokenizer'.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=True`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Traceback (most recent call last):

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 40, in __init__
    tokenizer = LlamaTokenizer.from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
    return cls._from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 156, in __init__
    self.sp_model = self.get_spm_processor()

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/llama/tokenization_llama.py", line 162, in get_spm_processor
    with open(self.vocab_file, "rb") as f:

TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 81, in serve
    server.serve(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 195, in serve
    asyncio.run(

  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)

  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 187, in get_model
    return FlashLlama(

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_llama.py", line 48, in __init__
    tokenizer = AutoTokenizer.from_pretrained(

  File "/opt/conda/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 724, in from_pretrained
    raise ValueError(

ValueError: Tokenizer class CodeLlamaTokenizer does not exist or is not currently imported.
```

@sarthak405

@silvanmelchior were you able to resolve this at your end?

@OlivierDehaene
Member

@sarthak405, it seems your error is related to the vocab: `self.vocab_file` is None.
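
A minimal way to check for that, assuming a hypothetical repo id for the fine-tune (the slow LlamaTokenizer needs the SentencePiece tokenizer.model as its vocab file):

```python
# List the files in the fine-tuned repo; if tokenizer.model is missing,
# self.vocab_file ends up None and open() fails exactly as in the traceback.
from huggingface_hub import list_repo_files

files = list_repo_files("my-org/my-codellama-finetune")  # hypothetical repo id
print("tokenizer.model" in files)
```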

@sarthak405

Right @OlivierDehaene, it was indeed a missing tokenizer.model file. Thank you for the quick resolution.
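
For anyone hitting the same thing, a sketch of one possible fix, assuming the fine-tune kept the base model's vocabulary (the fine-tuned repo id is hypothetical):

```python
# Copy tokenizer.model from the base CodeLlama repo into the fine-tuned repo.
# Requires a token with write access (e.g. via `huggingface-cli login`).
from huggingface_hub import hf_hub_download, upload_file

local = hf_hub_download("codellama/CodeLlama-34b-Instruct-hf", "tokenizer.model")
upload_file(
    path_or_fileobj=local,
    path_in_repo="tokenizer.model",
    repo_id="my-org/my-codellama-finetune",  # hypothetical fine-tuned repo
)
```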
