is_finetuned(self) returns True for facebook/wav2vec2-base #59

nkaenzig-aifund · 2022-08-25T20:40:20Z

SpeechRecognitionModel.is_finetuned(self) returns True for the base model facebook/wav2vec2-base, which is not finetuned.

How to reproduce:

from huggingsound import SpeechRecognitionModel

model = SpeechRecognitionModel("facebook/wav2vec2-base")

assert model.is_finetuned == False

The issue seems to be that Wav2Vec2Processor.from_pretrained() does return a processor with a PreTrainedTokenizer for this model.

Therefore is_finetuned() yields True:
https://github.com/jonatasgrosman/huggingsound/blob/main/huggingsound/speech_recognition/model.py#L58-L78
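
To illustrate the suspected mechanism, here is a minimal sketch. It assumes is_finetuned only checks that a processor and token set were loaded, which is inferred from the behavior described in this thread and not copied from the linked source. SketchModel, loads_ok and loads_fail are hypothetical stand-ins, not huggingsound or transformers names.

```python
class SketchModel:
    """Hypothetical stand-in for SpeechRecognitionModel (not the real class)."""

    def __init__(self, load_processor):
        # load_processor is a hypothetical callable standing in for
        # Wav2Vec2Processor.from_pretrained(); it raises when loading fails.
        try:
            self.processor = load_processor()
            self.token_set = object()  # placeholder for the derived token set
        except Exception:
            self.processor = None
            self.token_set = None

    @property
    def is_finetuned(self):
        # Assumption: "finetuned" is inferred only from a processor having loaded.
        return self.processor is not None and self.token_set is not None


def loads_ok():
    # Pretend the hub repo ships processor/tokenizer config files.
    return object()


def loads_fail():
    # Pretend the hub repo has no processor/tokenizer config files.
    raise OSError("no tokenizer files")


base_with_configs = SketchModel(loads_ok)
base_without_configs = SketchModel(loads_fail)
print(base_with_configs.is_finetuned, base_without_configs.is_finetuned)
```

Under this assumption, any checkpoint whose processor loads is reported as finetuned, regardless of how its weights were trained.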

Possibly this is not an issue with this repo, but with the files on the Hugging Face model hub:
https://huggingface.co/facebook/wav2vec2-base/tree/main

Is there a reason why this model has preprocessor_config.json and tokenizer_config.json files?

If you look at facebook/wav2vec2-large, this one doesn't have these files and therefore Wav2Vec2Processor.from_pretrained() won't return a processor/tokenizer.
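
A rough way to compare repos on this point is to test a file listing for the two config files named above. This is only a sketch: ships_processor_files is a hypothetical helper, and the two listings are hand-written for illustration, not the actual contents of either repo. A real listing could be fetched with huggingface_hub.list_repo_files(), which needs network access.

```python
# The two files mentioned in this issue.
PROCESSOR_FILES = {"preprocessor_config.json", "tokenizer_config.json"}


def ships_processor_files(repo_files):
    """Return True if every file in PROCESSOR_FILES appears in the listing."""
    return PROCESSOR_FILES.issubset(set(repo_files))


# Illustrative listings only (assumed, not fetched from the hub).
listing_with_configs = [
    "config.json",
    "preprocessor_config.json",
    "tokenizer_config.json",
    "pytorch_model.bin",
]
listing_without_configs = ["config.json", "pytorch_model.bin"]

print(ships_processor_files(listing_with_configs))
print(ships_processor_files(listing_without_configs))
```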


nkaenzig · 2022-08-25T20:43:54Z

Currently I'm using this as a workaround:

model = SpeechRecognitionModel("facebook/wav2vec2-base", device='cuda')
model.processor = None
model.token_set = None

assert model.is_finetuned == False
