
Error while loading tokenizer #8

Open
mvish7 opened this issue Jan 3, 2024 · 1 comment

mvish7 commented Jan 3, 2024

Hello,

Thanks for making the code and models available. I was following the guide to set up the repo and run a CLI demo.

The command-line arguments look like this:

python video_chatgpt/chat.py --model-name weights/llava/llava-v1.5-7b --projection_path weights/projection/mm_projector_7b_1.5_336px.bin --use_asr --conv_mode pg-video-llava

The --model-name argument is the path to the folder whose contents are shown here, and the --projection_path argument is the path to the mm_projector_7b_1.5_336px.bin file.

I'm facing an error while loading the vocab_file; the resolved vocab_file is weights/llava/llava-v1.5-7b/tokenizer.model.
The error traceback is as follows:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/chat.py:362 in     │
│ <module>                                                                     │
│                                                                              │
│   359 │   │   )                                                              │
│   360 │   │   chat.interact()                                                │
│   361 │   else:                                                              │
│ ❱ 362 │   │   chat = VideoChatGPTInterface(                                  │
│   363 │   │   │   args_model_name=args.model_name,                           │
│   364 │   │   │   args_projection_path=args.projection_path,                 │
│   365 │   │   │   use_asr=args.use_asr,                                      │
│                                                                              │
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/chat.py:29 in      │
│ __init__                                                                     │
│                                                                              │
│    26 │   │   self.use_asr=use_asr                                           │
│    27 │   │   self.conv_mode = conv_mode                                     │
│    28 │   │                                                                  │
│ ❱  29 │   │   model, vision_tower, tokenizer, image_processor, video_token_l │
│    30 │   │   self.tokenizer = tokenizer                                     │
│    31 │   │   self.image_processor = image_processor                         │
│    32 │   │   self.vision_tower = vision_tower                               │
│                                                                              │
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/eval/model_utils.p │
│ y:101 in initialize_model                                                    │
│                                                                              │
│    98 │   model_name = os.path.expanduser(model_name)                        │
│    99 │                                                                      │
│   100 │   # Load tokenizer                                                   │
│ ❱ 101 │   tokenizer = AutoTokenizer.from_pretrained(model_name)              │
│   102 │                                                                      │
│   103 │   # Load model                                                       │
│   104 │   model = VideoChatGPTLlamaForCausalLM.from_pretrained(model_name, l │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/models/auto/tokenization_auto.py:682 in from_pretrained            │
│                                                                              │
│   679 │   │   │   │   raise ValueError(                                      │
│   680 │   │   │   │   │   f"Tokenizer class {tokenizer_class_candidate} does │
│   681 │   │   │   │   )                                                      │
│ ❱ 682 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_na │
│   683 │   │                                                                  │
│   684 │   │   # Otherwise we have to be creative.                            │
│   685 │   │   # if model is an encoder decoder, the encoder tokenizer class  │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_base.py:1805 in from_pretrained                 │
│                                                                              │
│   1802 │   │   │   else:                                                     │
│   1803 │   │   │   │   logger.info(f"loading file {file_path} from cache at  │
│   1804 │   │                                                                 │
│ ❱ 1805 │   │   return cls._from_pretrained(                                  │
│   1806 │   │   │   resolved_vocab_files,                                     │
│   1807 │   │   │   pretrained_model_name_or_path,                            │
│   1808 │   │   │   init_configuration,                                       │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_base.py:1959 in _from_pretrained                │
│                                                                              │
│   1956 │   │                                                                 │
│   1957 │   │   # Instantiate tokenizer.                                      │
│   1958 │   │   try:                                                          │
│ ❱ 1959 │   │   │   tokenizer = cls(*init_inputs, **init_kwargs)              │
│   1960 │   │   except OSError:                                               │
│   1961 │   │   │   raise OSError(                                            │
│   1962 │   │   │   │   "Unable to load vocabulary from file. "               │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/models/llama/tokenization_llama.py:71 in __init__                  │
│                                                                              │
│    68 │   │   self.add_eos_token = add_eos_token                             │
│    69 │   │   self.decode_with_prefix_space = decode_with_prefix_space       │
│    70 │   │   self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwa │
│ ❱  71 │   │   self.sp_model.Load(vocab_file)                                 │
│    72 │   │   self._no_prefix_space_tokens = None                            │
│    73 │   │                                                                  │
│    74 │   │   """ Initialisation"""                                          │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/sen │
│ tencepiece/__init__.py:905 in Load                                           │
│                                                                              │
│    902 │   │   raise RuntimeError('model_file and model_proto must be exclus │
│    903 │     if model_proto:                                                 │
│    904 │   │   return self.LoadFromSerializedProto(model_proto)              │
│ ❱  905 │     return self.LoadFromFile(model_file)                            │
│    906                                                                       │
│    907                                                                       │
│    908 # Register SentencePieceProcessor in _sentencepiece:                  │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/sen │
│ tencepiece/__init__.py:310 in LoadFromFile                                   │
│                                                                              │
│    307 │   │   return _sentencepiece.SentencePieceProcessor_serialized_model │
│    308 │                                                                     │
│    309 │   def LoadFromFile(self, arg):                                      │
│ ❱  310 │   │   return _sentencepiece.SentencePieceProcessor_LoadFromFile(sel │
│    311 │                                                                     │
│    312 │   def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha,  │
│    313 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsIds(sel │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) 
[model_proto->ParseFromArray(serialized.data(), serialized.size())] 

The versions of tokenizers and transformers are 0.13.3 and 4.28.0.dev0, respectively.
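
In case it helps with triage: this particular SentencePiece ParseFromArray failure usually means the bytes in tokenizer.model are not a valid serialized model, e.g. a truncated download or a Git LFS pointer file checked out without git-lfs. Here is a minimal sketch (my own, assuming only that sentencepiece is installed) that isolates the failing call on the resolved path:

import os
import sentencepiece as spm

vocab_file = "weights/llava/llava-v1.5-7b/tokenizer.model"

# A genuine LLaMA tokenizer.model is roughly 500 KB; a Git LFS pointer
# file is only ~130 bytes of ASCII text.
print("size:", os.path.getsize(vocab_file), "bytes")

# LFS pointer files begin with this header instead of protobuf bytes.
with open(vocab_file, "rb") as f:
    head = f.read(64)
print("looks like an LFS pointer:", head.startswith(b"version https://git-lfs"))

# The exact call that fails in the traceback above; an intact model loads cleanly.
sp = spm.SentencePieceProcessor()
sp.Load(vocab_file)
print("vocab size:", sp.GetPieceSize())

If the file turns out to be tiny or starts with the LFS header, re-fetching the weights with git lfs pull should resolve it.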

Could you help me solve this error?
Thanks,
Vishal

mvish7 (Author) commented Jan 8, 2024

I'm not able to download the checkpoint ram_swin_large_14m.pth from the link provided.

Does the error have anything to do with this checkpoint?
