# Notebook Activity: Exploring Automatic Speech Recognition with Whisper

Install the required libraries, or use an environment that already includes them.

In [None]:
!pip install --upgrade pip
!pip install --upgrade transformers datasets[audio] accelerate

In [1]:
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

Select a model (here, Whisper-tiny) and use it to transcribe a sample audio file. The code below runs on the CPU, so transcription may be slow, especially with larger models.

In [None]:
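model_id = "openai/whisper-tiny"  # or any other model ID from the Hugging Face Hub

# Load the model weights; low_cpu_mem_usage reduces peak RAM while loading
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, low_cpu_mem_usage=True, use_safetensors=True
)

# The processor bundles the tokenizer and the audio feature extractor
processor = AutoProcessor.from_pretrained(model_id)

# Wrap everything in a speech recognition pipeline (runs on CPU by default)
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor
)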
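The cell above builds the pipeline on the CPU. If a GPU is available, a minimal sketch like the one below could select it instead. The helper name `pick_device` is hypothetical and not part of the original notebook; it only assumes that PyTorch exposes `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "cuda:0" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment: fall back to CPU
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")

# In the notebook, the pipeline could then be rebuilt on that device:
# pipe = pipeline(
#     "automatic-speech-recognition",
#     model=model,
#     tokenizer=processor.tokenizer,
#     feature_extractor=processor.feature_extractor,
#     device=device,
# )
```

Passing `device` to `pipeline` moves the model to the chosen device, which is likely to matter most for the larger Whisper checkpoints.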

Load a sample from the LibriSpeech dataset. 

You can also use your own audio file!

In [None]:
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]  # first sample; index 0 is valid for any non-empty split
result = pipe(sample)
print(result["text"])
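To try your own recording, one possible approach is to check the WAV header first and then pass the file path to the pipeline. The sketch below uses only Python's standard `wave` module; the helper names `describe_wav` and `transcribe_file` and the file name `my_recording.wav` are hypothetical. It assumes the pipeline is given a path to a local file, which it decodes with ffmpeg, so ffmpeg must be installed.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_rate_hz, duration_s) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    return channels, rate, duration


def transcribe_file(asr, path):
    """Run an ASR callable (such as `pipe`) on a local file and return its text."""
    return asr(path)["text"]


# In the notebook (hypothetical file name):
# print(describe_wav("my_recording.wav"))
# print(transcribe_file(pipe, "my_recording.wav"))
```

Checking the header first makes it easier to spot a file that was renamed to `.wav` without actually being converted, since `wave.open` raises an error on non-WAV data.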

## Activity:
Report your results on the following:
* Try changing the size of the model. A table of available model sizes is on the Hugging Face model card: https://huggingface.co/openai/whisper-large-v3. Note that larger models may require a GPU to transcribe in a reasonable time.
    * How does the transcription change with the model size?
* Find another audio sample on Hugging Face and test transcription on it, or record your own audio and load it into your notebook (convert the recording to WAV format first).
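One way to put a number on how transcription changes with model size is the word error rate (WER): the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is a plain-Python illustration, not part of the original notebook. The function name `word_error_rate` is hypothetical, and it assumes you have a reference transcript (or are willing to treat the largest model's output as a stand-in). Text is only lowercased and split on whitespace, so punctuation differences count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    # Guard against an empty reference to avoid division by zero
    return prev[-1] / max(len(ref), 1)


# Hypothetical usage: compare the outputs of two model sizes
# wer_tiny = word_error_rate(reference_text, tiny_result["text"])
# wer_small = word_error_rate(reference_text, small_result["text"])
```

A lower WER means the transcript is closer to the reference, which gives a simple basis for reporting results across model sizes.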