is_finetuned(self) returns True for facebook/wav2vec2-base #59

Open
nkaenzig-aifund opened this issue Aug 25, 2022 · 1 comment

@nkaenzig-aifund

SpeechRecognitionModel.is_finetuned(self) returns True for the base model facebook/wav2vec2-base, which is not fine-tuned.

How to reproduce:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("facebook/wav2vec2-base")

assert model.is_finetuned == False  # fails: is_finetuned is True for this base model

The issue seems to be that, for this model, Wav2Vec2Processor.from_pretrained() returns a processor with a PreTrainedTokenizer (screenshot omitted).

Therefore is_finetuned() yields True:
https://github.com/jonatasgrosman/huggingsound/blob/main/huggingsound/speech_recognition/model.py#L58-L78
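
A minimal sketch of the behaviour described above, assuming the check boils down to "a processor could be loaded". The class `ModelSketch` and the two loader functions are hypothetical stand-ins written for illustration; they are not the code behind the link and do not call the real library.

```python
class ModelSketch:
    """Hypothetical stand-in for SpeechRecognitionModel (not the library's code)."""

    def __init__(self, load_processor):
        # load_processor is a hypothetical callable standing in for
        # Wav2Vec2Processor.from_pretrained(model_path).
        try:
            self.processor = load_processor()
        except Exception:
            # Assumed behaviour: a failed load marks the model as not fine-tuned.
            self.processor = None

    @property
    def is_finetuned(self):
        # Assumed check: any loadable processor counts as "fine-tuned".
        return self.processor is not None


def failing_loader():
    # Mimics a hub repo without processor/tokenizer files.
    raise OSError("no processor files found")


def succeeding_loader():
    # Mimics a hub repo that ships processor/tokenizer files.
    return object()  # placeholder for a processor object


large_like = ModelSketch(failing_loader)
base_like = ModelSketch(succeeding_loader)
print(large_like.is_finetuned, base_like.is_finetuned)
```

Under this assumption, a base checkpoint that happens to ship loadable processor files is indistinguishable from a fine-tuned one, which matches the result reported here.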

Possibly this is not an issue with this repo, but with the files on the Hugging Face model hub:
https://huggingface.co/facebook/wav2vec2-base/tree/main

Is there a reason why this model has preprocessor_config.json and tokenizer_config.json files?

If you look at facebook/wav2vec2-large, it doesn't have these files, and therefore Wav2Vec2Processor.from_pretrained() won't return a processor/tokenizer.
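
The file comparison between the two hub repos can be sketched as a small check over a repo's file listing. The helper `processor_files_present` is hypothetical, and the two listings below are illustrative, built only from what this issue describes rather than copied from the hub. A real listing could presumably be fetched with `huggingface_hub.list_repo_files(repo_id)`, which is not called here so the example runs offline.

```python
# The two files this issue identifies as unexpected for a base model.
PROCESSOR_FILES = ("preprocessor_config.json", "tokenizer_config.json")


def processor_files_present(repo_files):
    """Return the processor-related files found in a repo file listing."""
    names = set(repo_files)
    return [name for name in PROCESSOR_FILES if name in names]


# Illustrative listings (assumptions, not the actual hub contents).
base_files = ["config.json", "preprocessor_config.json", "tokenizer_config.json"]
large_files = ["config.json"]

print(processor_files_present(base_files))
print(processor_files_present(large_files))
```

A non-empty result for a checkpoint that is known to be a base model would flag the situation reported here.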

@nkaenzig

Currently I'm using this as a workaround:

model = SpeechRecognitionModel("facebook/wav2vec2-base", device='cuda')
model.processor = None
model.token_set = None

assert model.is_finetuned == False
