
Error while loading tokenizer #8

Open
mvish7 opened this issue Jan 3, 2024 · 1 comment

mvish7 commented Jan 3, 2024

Hello,

Thanks for making the code and models available. I was following the guide to set up the repo and run a CLI demo.

The command-line arguments look like this:

python video_chatgpt/chat.py --model-name weights/llava/llava-v1.5-7b --projection_path weights/projection/mm_projector_7b_1.5_336px.bin --use_asr --conv_mode pg-video-llava

The --model-name argument is the path to the folder whose contents are shown here, and the --projection_path argument is the path to the mm_projector_7b_1.5_336px.bin file.

I'm facing an error while loading the vocab_file; the resolved vocab_file is weights/llava/llava-v1.5-7b/tokenizer.model.
The error traceback is as follows:

╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/chat.py:362 in     │
│ <module>                                                                     │
│                                                                              │
│   359 │   │   )                                                              │
│   360 │   │   chat.interact()                                                │
│   361 │   else:                                                              │
│ ❱ 362 │   │   chat = VideoChatGPTInterface(                                  │
│   363 │   │   │   args_model_name=args.model_name,                           │
│   364 │   │   │   args_projection_path=args.projection_path,                 │
│   365 │   │   │   use_asr=args.use_asr,                                      │
│                                                                              │
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/chat.py:29 in      │
│ __init__                                                                     │
│                                                                              │
│    26 │   │   self.use_asr=use_asr                                           │
│    27 │   │   self.conv_mode = conv_mode                                     │
│    28 │   │                                                                  │
│ ❱  29 │   │   model, vision_tower, tokenizer, image_processor, video_token_l │
│    30 │   │   self.tokenizer = tokenizer                                     │
│    31 │   │   self.image_processor = image_processor                         │
│    32 │   │   self.vision_tower = vision_tower                               │
│                                                                              │
│ /media/vishal/2TB_storage/repos/Video-LLaVA/video_chatgpt/eval/model_utils.p │
│ y:101 in initialize_model                                                    │
│                                                                              │
│    98 │   model_name = os.path.expanduser(model_name)                        │
│    99 │                                                                      │
│   100 │   # Load tokenizer                                                   │
│ ❱ 101 │   tokenizer = AutoTokenizer.from_pretrained(model_name)              │
│   102 │                                                                      │
│   103 │   # Load model                                                       │
│   104 │   model = VideoChatGPTLlamaForCausalLM.from_pretrained(model_name, l │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/models/auto/tokenization_auto.py:682 in from_pretrained            │
│                                                                              │
│   679 │   │   │   │   raise ValueError(                                      │
│   680 │   │   │   │   │   f"Tokenizer class {tokenizer_class_candidate} does │
│   681 │   │   │   │   )                                                      │
│ ❱ 682 │   │   │   return tokenizer_class.from_pretrained(pretrained_model_na │
│   683 │   │                                                                  │
│   684 │   │   # Otherwise we have to be creative.                            │
│   685 │   │   # if model is an encoder decoder, the encoder tokenizer class  │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_base.py:1805 in from_pretrained                 │
│                                                                              │
│   1802 │   │   │   else:                                                     │
│   1803 │   │   │   │   logger.info(f"loading file {file_path} from cache at  │
│   1804 │   │                                                                 │
│ ❱ 1805 │   │   return cls._from_pretrained(                                  │
│   1806 │   │   │   resolved_vocab_files,                                     │
│   1807 │   │   │   pretrained_model_name_or_path,                            │
│   1808 │   │   │   init_configuration,                                       │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/tokenization_utils_base.py:1959 in _from_pretrained                │
│                                                                              │
│   1956 │   │                                                                 │
│   1957 │   │   # Instantiate tokenizer.                                      │
│   1958 │   │   try:                                                          │
│ ❱ 1959 │   │   │   tokenizer = cls(*init_inputs, **init_kwargs)              │
│   1960 │   │   except OSError:                                               │
│   1961 │   │   │   raise OSError(                                            │
│   1962 │   │   │   │   "Unable to load vocabulary from file. "               │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/tra │
│ nsformers/models/llama/tokenization_llama.py:71 in __init__                  │
│                                                                              │
│    68 │   │   self.add_eos_token = add_eos_token                             │
│    69 │   │   self.decode_with_prefix_space = decode_with_prefix_space       │
│    70 │   │   self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwa │
│ ❱  71 │   │   self.sp_model.Load(vocab_file)                                 │
│    72 │   │   self._no_prefix_space_tokens = None                            │
│    73 │   │                                                                  │
│    74 │   │   """ Initialisation"""                                          │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/sen │
│ tencepiece/__init__.py:905 in Load                                           │
│                                                                              │
│    902 │   │   raise RuntimeError('model_file and model_proto must be exclus │
│    903 │     if model_proto:                                                 │
│    904 │   │   return self.LoadFromSerializedProto(model_proto)              │
│ ❱  905 │     return self.LoadFromFile(model_file)                            │
│    906                                                                       │
│    907                                                                       │
│    908 # Register SentencePieceProcessor in _sentencepiece:                  │
│                                                                              │
│ /home/vishal/miniconda3/envs/pg_video_llava/lib/python3.10/site-packages/sen │
│ tencepiece/__init__.py:310 in LoadFromFile                                   │
│                                                                              │
│    307 │   │   return _sentencepiece.SentencePieceProcessor_serialized_model │
│    308 │                                                                     │
│    309 │   def LoadFromFile(self, arg):                                      │
│ ❱  310 │   │   return _sentencepiece.SentencePieceProcessor_LoadFromFile(sel │
│    311 │                                                                     │
│    312 │   def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha,  │
│    313 │   │   return _sentencepiece.SentencePieceProcessor__EncodeAsIds(sel │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) 
[model_proto->ParseFromArray(serialized.data(), serialized.size())] 

The versions of tokenizers and transformers are 0.13.3 and 4.28.0.dev0, respectively.
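
In case it helps with triage: this particular SentencePiece ParseFromArray failure usually means the bytes in tokenizer.model are not a valid serialized model, e.g. a truncated download or a Git LFS pointer file checked out without git-lfs. Here is a minimal sketch (my own, assuming only that sentencepiece is installed) that isolates the failing call on the resolved path:

import os
import sentencepiece as spm

vocab_file = "weights/llava/llava-v1.5-7b/tokenizer.model"

# A genuine LLaMA tokenizer.model is roughly 500 KB; a Git LFS pointer
# file is only ~130 bytes of ASCII text.
print("size:", os.path.getsize(vocab_file), "bytes")

# LFS pointer files begin with this header instead of protobuf bytes.
with open(vocab_file, "rb") as f:
    head = f.read(64)
print("looks like an LFS pointer:", head.startswith(b"version https://git-lfs"))

# The exact call that fails in the traceback above; an intact model loads cleanly.
sp = spm.SentencePieceProcessor()
sp.Load(vocab_file)
print("vocab size:", sp.GetPieceSize())

If the file turns out to be tiny or starts with the LFS header, re-fetching the weights with git lfs pull should resolve it.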

Could you help me solve this error?
Thanks,
Vishal

mvish7 (Author) commented Jan 8, 2024

I'm not able to download the checkpoint ram_swin_large_14m.pth from the link provided.

Does the error have anything to do with this checkpoint?
