It looks like it's not able to find the vocabulary file. Make sure there is a vocab.txt file in the model directory for BERT. Otherwise, you can load the tokenizer directly with tokenizer = BertTokenizer(vocab_file="path to vocab") plus your other configuration arguments.
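A minimal sketch of the check that comment describes (standard library only; the `vocab.txt` filename is the one BERT tokenizers conventionally expect, and the helper name is hypothetical):

```python
import os
import tempfile

def find_bert_vocab(model_dir, vocab_name="vocab.txt"):
    """Return the path to the tokenizer vocabulary if it exists, else None.

    BertTokenizer.from_pretrained(model_dir) needs this file; when it is
    missing, the resolved path ends up as None and the later
    os.path.isfile(None) call raises the TypeError shown in the traceback.
    """
    candidate = os.path.join(model_dir, vocab_name)
    return candidate if os.path.isfile(candidate) else None

# Demo with a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    print(find_bert_vocab(d))              # None: from_pretrained would fail here
    open(os.path.join(d, "vocab.txt"), "w").close()
    print(find_bert_vocab(d) is not None)  # True: safe to load the tokenizer
```

Running this check before calling from_pretrained turns the opaque TypeError into an explicit "vocabulary file missing" signal.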
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🐛 Bug
TypeError Traceback (most recent call last)
in ()
3 from transformers import BertTokenizer, AdamW, BertForNextSentencePrediction
4
----> 5 tokenizer = BertTokenizer.from_pretrained('/content/drive/My Drive/Colab Notebooks/data/test/')
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in from_pretrained(cls, *inputs, **kwargs)
391
392 """
--> 393 return cls._from_pretrained(*inputs, **kwargs)
394
395 @classmethod
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_utils.py in _from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
542 # Instantiate tokenizer.
543 try:
--> 544 tokenizer = cls(*init_inputs, **init_kwargs)
545 except OSError:
546 raise OSError(
/usr/local/lib/python3.6/dist-packages/transformers/tokenization_bert.py in __init__(self, vocab_file, do_lower_case, do_basic_tokenize, never_split, unk_token, sep_token, pad_token, cls_token, mask_token, tokenize_chinese_chars, **kwargs)
186 self.max_len_sentences_pair = self.max_len - 3 # take into account special tokens
187
--> 188 if not os.path.isfile(vocab_file):
189 raise ValueError(
190 "Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained "
/usr/lib/python3.6/genericpath.py in isfile(path)
28 """Test whether a path is a regular file"""
29 try:
---> 30 st = os.stat(path)
31 except OSError:
32 return False
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
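The last frame pinpoints the cause: from_pretrained resolved no vocabulary file, so vocab_file is None, and os.stat rejects a None path. The failure can be reproduced in isolation (standard library only; the exact message text varies by Python version):

```python
import os

# Reproduce the failure in isolation: os.path.isfile delegates to os.stat,
# which rejects a None path with TypeError. genericpath.isfile only catches
# OSError, so the TypeError propagates, exactly as in the traceback above.
try:
    os.path.isfile(None)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```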
Information
I am trying to fine-tune a model that I built from scratch using transformers. When I try to load the tokenizer from the newly trained model, it raises a TypeError.
Model I am using (Bert, XLNet ...): Model is built from scratch using https://huggingface.co/blog/how-to-train
Language I am using the model on (English, Chinese ...): English
The problem arises when using:
The tasks I am working on are:
Expected behavior
Environment info
transformers version: