Exception: You're trying to run a Unigram model but you're file was trained with a different algorithm #9871

Closed
jiyanbio opened this issue Jan 28, 2021 · 3 comments

@jiyanbio
Environment info

  • transformers version: 4.2.2
  • Platform: Linux-3.10.107-1-tlinux2_kvm_guest-0049-x86_64-with-glibc2.10
  • Python version: 3.8.5
  • PyTorch version (GPU?): 1.7.1 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help

Information

Model I am using (Bert, XLNet ...): ALBERT (Rostlab/prot_albert)

The problem arises when using:

  • [x] the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Open https://github.com/agemagician/ProtTrans/blob/master/Embedding/PyTorch/Basic/ProtAlbert.ipynb
  2. Run the code 'tokenizer = AutoTokenizer.from_pretrained("Rostlab/prot_albert", do_lower_case=False)'
  3. The following error is raised:
    Downloading: 100%|█████████████████████████████████████████████████████████████████| 505/505 [00:00<00:00, 516kB/s]
    Downloading: 100%|██████████████████████████████████████████████████████████████| 238k/238k [00:03<00:00, 77.0kB/s]
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 385, in from_pretrained
        return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1768, in from_pretrained
        return cls._from_pretrained(
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1841, in _from_pretrained
        tokenizer = cls(*init_inputs, **init_kwargs)
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/models/albert/tokenization_albert_fast.py", line 136, in __init__
        super().__init__(
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 89, in __init__
        fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 659, in convert_slow_tokenizer
        return converter_class(transformer_tokenizer).converted()
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 349, in converted
        tokenizer = self.tokenizer(self.proto)
      File "/home/anaconda3/envs/prottrans/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 335, in tokenizer
        raise Exception(
    Exception: You're trying to run a Unigram model but you're file was trained with a different algorithm
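For context, the failing frame is the slow-to-fast conversion in convert_slow_tokenizer.py. The converter appears to read the algorithm recorded in the SentencePiece model file (its trainer_spec.model_type) and to refuse any type it cannot rebuild as a fast Unigram tokenizer. The snippet below is a simplified, hypothetical re-creation of that check, not the transformers source: the enum values mirror SentencePiece's ModelType (UNIGRAM=1, BPE=2, WORD=3, CHAR=4), while SpmModelType and check_convertible are illustrative names. Which types are actually accepted depends on the transformers version; this sketch assumes only Unigram, as the error message suggests.

```python
from enum import IntEnum


class SpmModelType(IntEnum):
    """Illustrative mirror of SentencePiece's TrainerSpec.ModelType values."""

    UNIGRAM = 1
    BPE = 2
    WORD = 3
    CHAR = 4


def check_convertible(model_type: int) -> SpmModelType:
    """Hypothetical helper: accept only Unigram models, as the error implies.

    Any other algorithm raises a plain Exception, like the one in the traceback.
    """
    kind = SpmModelType(model_type)  # ValueError for unknown integers
    if kind is not SpmModelType.UNIGRAM:
        raise Exception(
            "You're trying to run a Unigram model but your file was trained "
            f"with {kind.name}"
        )
    return kind


if __name__ == "__main__":
    print(check_convertible(1).name)  # UNIGRAM
    try:
        check_convertible(4)  # a CHAR model is rejected
    except Exception as err:
        print(err)
```

In this reading, the model file itself is fine for the slow tokenizer; only the automatic conversion to a fast tokenizer trips over the training algorithm.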

Expected behavior

@agemagician
Contributor

Use "AlbertTokenizer" rather than "AutoTokenizer"; this should solve your issue.
Please check the updated version of the notebook.
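A minimal sketch of that workaround, assuming transformers 4.x with the sentencepiece package installed (the slow ALBERT tokenizer depends on it). Loading AlbertTokenizer directly builds the slow, SentencePiece-backed tokenizer, so the slow-to-fast conversion that raised the exception is never attempted. The helper name load_prot_albert_tokenizer is hypothetical.

```python
def load_prot_albert_tokenizer(name: str = "Rostlab/prot_albert"):
    """Hypothetical helper: load the slow ALBERT tokenizer for ProtAlbert.

    Requires transformers and sentencepiece; files are downloaded on first call.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AlbertTokenizer

    # AlbertTokenizer is the slow (SentencePiece-based) class, so no
    # conversion to a fast Unigram tokenizer takes place here.
    return AlbertTokenizer.from_pretrained(name, do_lower_case=False)
```

Usage would be tokenizer = load_prot_albert_tokenizer(). Passing use_fast=False to AutoTokenizer.from_pretrained should likewise avoid the conversion, since it asks for the slow tokenizer class instead of the fast one.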

@github-actions
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@arkhan19
The prot_albert tokenizer is returning None. What changed?
