Can't load DeBERTa-v3 tokenizer #70

Closed
maiiabocharova opened this issue Nov 20, 2021 · 4 comments

Comments

@maiiabocharova commented Nov 20, 2021

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

This gives me an error:
ValueError: This tokenizer cannot be instantiated. Please make sure you have sentencepiece installed in order to use this tokenizer.
But sentencepiece is already installed.
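(A quick check for this situation, as a sketch: pip sometimes installs into a different environment than the one the notebook interpreter is using, so it is worth confirming the running interpreter can actually see the package.)

import importlib.util
# None here means the running interpreter cannot import sentencepiece,
# e.g. it was installed into another environment or the runtime has not
# been restarted since installation.
print(importlib.util.find_spec("sentencepiece"))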

I also tried:

!pip install deberta
from DeBERTa import deberta
vocab_path, vocab_type = deberta.load_vocab(pretrained_id='base-v3')
tokenizer = deberta.tokenizers[vocab_type](vocab_path)

This gives me:
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
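(That TypeError suggests load_vocab returned None for vocab_path, i.e. the vocabulary for that pretrained_id was not found or downloaded. A defensive sketch, assuming the DeBERTa package API used above, that fails with a clearer message:)

from DeBERTa import deberta

vocab_path, vocab_type = deberta.load_vocab(pretrained_id='base-v3')
# Guard against the opaque stat() TypeError by checking for None first.
if vocab_path is None:
    raise RuntimeError("load_vocab returned no vocab_path for 'base-v3'; "
                       "the vocabulary may have failed to download")
tokenizer = deberta.tokenizers[vocab_type](vocab_path)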

Please help: how can I use the tokenizer for deberta-v3-base?

@chrischowfy

from transformers import DebertaV2Tokenizer, DebertaV2Model
tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v3-base")
This works for me.
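(An equivalent route through AutoTokenizer, as a sketch: passing use_fast=False skips the fast-tokenizer path that raised the original error and selects the slow, sentencepiece-backed class directly.)

from transformers import AutoTokenizer
# use_fast=False falls back to the slow DebertaV2Tokenizer.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base", use_fast=False)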

@maiiabocharova (Author)

> from transformers import DebertaV2Tokenizer, DebertaV2Model
> tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v3-base")
> This works for me.

Thank you, I was able to initialize the tokenizer, but it then gives me an error when I pass text to it:
tokenizer("Some text")
TypeError: 'NoneType' object is not callable

@chrischowfy commented Nov 28, 2021

That's weird. Maybe the text you passed to the tokenizer wasn't processed properly.

@maiiabocharova (Author)

Hello, the issue was that I was using Colab and the tokenizer needed sentencepiece to be installed. The solution was to install sentencepiece and then restart the runtime. (I didn't restart it at first.)

Thank you for sharing the model!
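(For future readers, the complete fix as it would look in Colab, as a sketch; the runtime restart is the step that matters:)

# Cell 1: install the missing dependency.
!pip install sentencepiece
# ...then restart the runtime (Runtime -> Restart runtime) so the
# freshly installed package is picked up by the interpreter.

# Cell 2, after the restart: loading and calling the tokenizer now works.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
print(tokenizer("Some text"))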
