In [7]:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset


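# Use the GPU with half precision when available; otherwise fall back to CPU in float32.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Whisper checkpoint to load; the commented-out lines are alternative sizes.
# model_id = "openai/whisper-large-v3"
# model_id = "openai/whisper-small"
model_id = "openai/whisper-medium"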

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

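# The processor bundles the tokenizer and the audio feature extractor.
processor = AutoProcessor.from_pretrained(model_id)

# Long-form transcription in 30-second chunks, batched, with timestamps returned.
pipe = pipeline(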
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [3]:

# Sanity check on a long-form LibriSpeech sample.
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]

result = pipe(sample)
print(result["text"])

Due to a bug fix in https://github.com/huggingface/transformers/pull/28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`.


 Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel. Nor is Mr. Quilter's manner less interesting than his matter. He tells us that at this festive season of the year, with Christmas and roast beef looming before us, simile is drawn from eating and its results occur most readily to the mind. He has grave doubts whether Sir Frederick Layton's work is really Greek after all, and can discover in it but little of rocky Ithaca. Linnell's pictures are a sort of Up Guards and Atom paintings, and Mason's exquisite idles are as national as a Jingo poem. Mr. Birkut Foster's landscapes smile at one much in the same way that Mr. Karker used to flash his teeth. And Mr. John Collier gives his sitter a cheerful slap on the back before he says, like a shampoo or a Turkish bath, next man,
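
The warning above says that a multilingual Whisper checkpoint now detects the language before transcribing. For recordings known to be English, the language can be pinned at call time through the pipeline's `generate_kwargs`. A minimal sketch, assuming an automatic-speech-recognition pipeline like the `pipe` object built earlier; `transcribe_english` is a hypothetical helper name, not part of transformers.

```python
def transcribe_english(asr_pipe, audio):
    """Transcribe `audio` with the language fixed to English.

    `asr_pipe` is assumed to be an automatic-speech-recognition pipeline
    such as the `pipe` object created earlier. The helper name is
    hypothetical and exists only for this illustration.
    """
    # generate_kwargs is forwarded to the model's generate() call.
    # "task" may be "transcribe" (keep the spoken language) or "translate".
    return asr_pipe(
        audio,
        generate_kwargs={"language": "english", "task": "transcribe"},
    )
```

It would be called as `result = transcribe_english(pipe, sample)`; other pipeline arguments are left unchanged.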


In [9]:
result = pipe("data/english_course_wma/Wizard Inglês Modulo 3 - Conversation/Wizard-Lesson 1-01.wma")
result["text"]

Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.


" WIZARD Book 3 Conversation Lesson 1 Did the boy pay one dollar for the paper at the newsstand? Yes, he did. He paid one dollar for the paper at the newsstand. No, he didn't. He didn't pay one dollar for the paper at the newsstand. Did the secretary put the document on the table. No, she didn't. She didn't put the document on the table. Did you access the information from the computer? Yes, I did. I access the information from the computer. Did your mom do her Christmas shopping early this year? Yes, she did. She did her Christmas shopping early this year. Christmas shopping early this year. No, she didn't. She didn't do her Christmas shopping early this year. Did they eat Brazilian food at the restaurant. No, they didn't. They didn't eat Brazilian food at the restaurant. Is your sister doing her homework in her bedroom? Yes, she is. She is doing her homework in her bedroom. No, she isn't. She isn't doing her homework in her bedroom. Are the children playing in the snow? Yes, they are

In [10]:
result

{'text': " WIZARD Book 3 Conversation Lesson 1 Did the boy pay one dollar for the paper at the newsstand? Yes, he did. He paid one dollar for the paper at the newsstand. No, he didn't. He didn't pay one dollar for the paper at the newsstand. Did the secretary put the document on the table. No, she didn't. She didn't put the document on the table. Did you access the information from the computer? Yes, I did. I access the information from the computer. Did your mom do her Christmas shopping early this year? Yes, she did. She did her Christmas shopping early this year. Christmas shopping early this year. No, she didn't. She didn't do her Christmas shopping early this year. Did they eat Brazilian food at the restaurant. No, they didn't. They didn't eat Brazilian food at the restaurant. Is your sister doing her homework in her bedroom? Yes, she is. She is doing her homework in her bedroom. No, she isn't. She isn't doing her homework in her bedroom. Are the children playing in the snow? Yes,

In [14]:
for i, row in enumerate(result['chunks']):
    print(i, row['text'])

0  WIZARD Book 3 Conversation
1  Lesson 1
2  Did the boy pay one dollar for the paper at the newsstand?
3  Yes, he did. He paid one dollar for the paper at the newsstand.
4  No, he didn't. He didn't pay one dollar for the paper at the newsstand. Did the secretary put the document on the table.
5  No, she didn't. She didn't put the document on the table.
6  Did you access the information from the computer?
7  Yes, I did. I access the information from the computer.
8  Did your mom do her Christmas shopping early this year?
9  Yes, she did. She did her Christmas shopping early this year.
10  Christmas shopping early this year.
11  No, she didn't. She didn't do her Christmas shopping early this year.
12  Did they eat Brazilian food at the restaurant.
13  No, they didn't. They didn't eat Brazilian food at the restaurant.
14  Is your sister doing her homework in her bedroom?
15  Yes, she is. She is doing her homework in her bedroom.
16  No, she isn't. She isn't doing her homework in her bedr
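
Because the pipeline was created with `return_timestamps=True`, each entry in `result['chunks']` should also carry a `timestamp` pair of `(start, end)` in seconds. The sketch below prints the chunks with those times. It assumes the end time can be `None`, which is what the "did not predict an ending timestamp" warning suggests for the last chunk. The helper names `format_seconds` and `format_chunks` are hypothetical.

```python
def format_seconds(seconds):
    """Render seconds as MM:SS.s, or '??' when the value is missing."""
    if seconds is None:
        return "??"
    # Work in whole tenths of a second so rounding never yields "60.0" seconds.
    tenths = round(float(seconds) * 10)
    minutes, rem = divmod(tenths, 600)
    return f"{minutes:02d}:{rem / 10:04.1f}"


def format_chunks(chunks):
    """Return one '[start -> end] text' line per chunk."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{format_seconds(start)} -> {format_seconds(end)}] {text}")
    return lines
```

With the transcription above it would be used as `print("\n".join(format_chunks(result['chunks'])))`.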

In [15]:
for row in result['chunks']:
    print(row['text'])

 WIZARD Book 3 Conversation
 Lesson 1
 Did the boy pay one dollar for the paper at the newsstand?
 Yes, he did. He paid one dollar for the paper at the newsstand.
 No, he didn't. He didn't pay one dollar for the paper at the newsstand. Did the secretary put the document on the table.
 No, she didn't. She didn't put the document on the table.
 Did you access the information from the computer?
 Yes, I did. I access the information from the computer.
 Did your mom do her Christmas shopping early this year?
 Yes, she did. She did her Christmas shopping early this year.
 Christmas shopping early this year.
 No, she didn't. She didn't do her Christmas shopping early this year.
 Did they eat Brazilian food at the restaurant.
 No, they didn't. They didn't eat Brazilian food at the restaurant.
 Is your sister doing her homework in her bedroom?
 Yes, she is. She is doing her homework in her bedroom.
 No, she isn't. She isn't doing her homework in her bedroom.
 Are the children playing in the sno
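
In the listing above, one chunk (`Christmas shopping early this year.`) repeats the end of the chunk before it. Whether that comes from the model or from the recording itself is not clear from the output alone. If it turns out to be unwanted, a simple heuristic is to drop any chunk whose text is just the tail of the previous kept chunk. This is only a sketch of that heuristic; `drop_echo_chunks` is a hypothetical name and it may remove legitimate repetitions in drill-style audio.

```python
def drop_echo_chunks(chunks):
    """Drop chunks whose text merely repeats the end of the previous kept chunk."""
    kept = []
    for chunk in chunks:
        text = chunk["text"].strip()
        # Skip non-empty chunks that are a suffix of the last kept chunk.
        if kept and text and kept[-1]["text"].strip().endswith(text):
            continue
        kept.append(chunk)
    return kept
```

It would be applied as `cleaned = drop_echo_chunks(result['chunks'])` before printing or saving the transcript.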