# Speech Transcription on IPUs using Whisper - Inference

This notebook demonstrates speech transcription on the IPU using the [Whisper implementation in the Hugging Face Transformers library](https://huggingface.co/spaces/openai/whisper) alongside [Optimum Graphcore](https://github.com/huggingface/optimum-graphcore).

Whisper is a versatile speech recognition model that can transcribe speech and perform multilingual translation and recognition tasks.
It was trained on diverse datasets to give human-level speech recognition performance without the need for fine-tuning.

Optimum Graphcore is the interface between the Hugging Face Transformers library and [Graphcore IPUs](https://www.graphcore.ai/products/ipu).
It provides tools for loading and parallelizing models on IPUs, and for training and fine-tuning on all the tasks already supported by Transformers. It is compatible with the Hugging Face Hub and with every model available on it out of the box.

> **Hardware requirements:** The Whisper models `whisper-tiny`, `whisper-base` and `whisper-small` can run two replicas on the smallest IPU-POD4 machine. The most capable model, `whisper-large`, will need to use either an IPU-POD16 or a Bow Pod16 machine. Please contact Graphcore if you'd like assistance running model sizes that don't work in this simple example notebook.

[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)

## Environment setup

To run this notebook, you need an environment with the Poplar SDK installed and enabled. This is done by default on Paperspace. If you are not using Paperspace, refer to the [getting started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for a description of how to set this up.

We also need the Optimum Graphcore interface to the Hugging Face Transformers library, plus a few extra dependencies for handling audio.

To improve usability and support for future users, Graphcore would like to collect information about the applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext gc_logger` from any cell.

In [1]:
%pip install "optimum-graphcore==0.6.1"
%pip install soundfile==0.12.1 librosa==0.10.0.post2 tokenizers==0.12.1
# %pip install matplotlib
# %matplotlib inline
%pip install examples-utils[common]@git+https://github.com/graphcore/examples-utils@latest_stable
# %load_ext examples_utils.notebook_logging.gc_logger

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu:
Collecting optimum-graphcore==0.6.1
  Downloading optimum_graphcore-0.6.1-py3-none-any.whl (212 kB)
Collecting datasets
  Downloading datasets-2.13.1-py3-none-any.whl (486 kB)
Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Collecting diffusers[torch]==0.12.1
  Downloading diffusers-0.12.1-py3-none-any.whl (604 kB)
Collecting transformers==4.25.1
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
Collecting tokenizers
  Downloading tokenizers-0.13.

## Running Whisper on the IPU

We start by importing the required modules, some of which are needed to configure the IPU.


In [2]:
import os

# Enable verbose PopART and Poplar logging so that compilation progress is visible.
os.environ["POPART_LOG_LEVEL"] = "INFO"
os.environ["POPLAR_LOG_LEVEL"] = "INFO"

In [3]:
# Generic imports
from datasets import load_dataset
# import matplotlib
# import librosa
# import IPython
# import random

# IPU-specific imports
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

# HF-related imports
from transformers import WhisperProcessor, WhisperForConditionalGeneration

The Whisper model is available on Hugging Face in several sizes, from `whisper-tiny` with 39M parameters to `whisper-large` with 1550M parameters.
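
As a rough illustration of what these sizes mean in practice, the sketch below estimates the memory taken by the weights alone when they are stored in half precision (2 bytes per parameter). The parameter counts are the two quoted above; the helper name `fp16_weight_megabytes` is hypothetical, and the estimate ignores activations, code and any other buffers, so treat it as a lower bound rather than a sizing guide.

```python
BYTES_PER_FP16 = 2  # one half-precision value occupies two bytes

# Parameter counts quoted in the text above.
PARAM_COUNTS = {
    "whisper-tiny": 39_000_000,
    "whisper-large": 1_550_000_000,
}


def fp16_weight_megabytes(num_params: int) -> float:
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_FP16 / 1e6


for name, num_params in PARAM_COUNTS.items():
    print(f"{name}: ~{fp16_weight_megabytes(num_params):.0f} MB of weights")
# whisper-tiny: ~78 MB of weights
# whisper-large: ~3100 MB of weights
```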

We download `whisper-tiny` (here, the English-only `whisper-tiny.en` checkpoint), which we will run using two IPUs.
The [Whisper architecture](https://openai.com/research/whisper) is an encoder-decoder Transformer, with the audio split into 30-second chunks.
For simplicity, one IPU is used for the encoder part of the graph and another for the decoder part.
The `IPUConfig` object configures how the model is pipelined across the IPUs.
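
To make the 30-second chunking concrete, here is a minimal sketch of splitting a longer waveform into fixed-length windows and zero-padding the last one. It is an illustration only: the function name `split_into_chunks` is hypothetical, it uses plain Python lists rather than arrays, and it is not how the Transformers library performs this step internally. This notebook does not need it, because it passes a single short clip to the processor.

```python
from typing import List, Sequence

CHUNK_SECONDS = 30  # window length stated in the text above


def split_into_chunks(
    samples: Sequence[float],
    sample_rate: int,
    chunk_seconds: int = CHUNK_SECONDS,
) -> List[List[float]]:
    """Split a mono waveform into fixed-length chunks, zero-padding the last one."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad the final, shorter chunk with silence up to the full window length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


# 70 seconds of silence at 16 kHz gives two full windows plus one padded window.
waveform = [0.0] * (70 * 16_000)
chunks = split_into_chunks(waveform, sample_rate=16_000)
print(len(chunks), len(chunks[0]))  # 3 480000
```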

In [4]:
model_spec = "openai/whisper-tiny.en"

# Instantiate processor and model
processor = WhisperProcessor.from_pretrained(model_spec)
model = WhisperForConditionalGeneration.from_pretrained(model_spec)

# Adapt whisper-tiny to run on the IPU
ipu_config = IPUConfig(ipus_per_replica=2)
pipelined_model = to_pipelined(model, ipu_config)
pipelined_model = pipelined_model.parallelize(for_generation=True).half()

Now we can load the dataset and process an example audio file.
If precompiled models are not available, then the first run of the model triggers two graph compilations.
This means that our first test transcription could take a minute or two to run, but subsequent runs will be much faster.

In [5]:
# load the dataset and read an example sound file
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
test_sample = ds[2]
sample_rate = test_sample['audio']['sampling_rate']

Downloading and preparing dataset librispeech_asr_dummy/clean to /tmp/huggingface_caches/datasets/hf-internal-testing___librispeech_asr_dummy/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b...
Dataset librispeech_asr_dummy downloaded and prepared to /tmp/huggingface_caches/datasets/hf-internal-testing___librispeech_asr_dummy/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b. Subsequent calls will reuse this data.

In [6]:
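def transcribe(data, rate):
    """Transcribe a mono waveform `data` sampled at `rate` Hz and return the text."""
    # Convert the raw audio to input features, cast to half precision to match the model.
    input_features = processor(data, return_tensors="pt", sampling_rate=rate).input_features.half()

    # This triggers a compilation, unless a precompiled model is available.
    sample_output = pipelined_model.generate(input_features, max_length=448, min_length=3)
    # Decode the generated token IDs to text, dropping special tokens.
    transcription = processor.batch_decode(sample_output, skip_special_tokens=True)[0]
    return transcription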
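
Since the first call to `transcribe` may include graph compilation while later calls do not, one simple way to see the difference is to time each call. The helper below is a sketch using only the standard library; `timed` and `fake_transcribe` are hypothetical names, and the stand-in function is there only so the example runs without an IPU.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in for `transcribe` so that this sketch runs without an IPU.
def fake_transcribe(data, rate):
    return f"{len(data) / rate:.1f} seconds of audio"


text, seconds = timed(fake_transcribe, [0.0] * 32_000, 16_000)
print(text, f"({seconds:.4f} s)")

# On the IPU, the equivalent calls would be:
# first, t_first = timed(transcribe, test_sample["audio"]["array"], sample_rate)
# again, t_again = timed(transcribe, test_sample["audio"]["array"], sample_rate)
```

Comparing `t_first` with `t_again` would then show how much of the first run was spent compiling.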

In [None]:

test_transcription = transcribe(test_sample["audio"]["array"], sample_rate)

2023-06-29T16:14:57.103528Z popart:pattern 79.79 I: Pattern TiedGather 1
2023-06-29T16:14:57.103561Z popart:pattern 79.79 I: Pattern TiedGatherAccumulate 1
2023-06-29T16:14:57.148749Z popart:builder 79.79 I: Setting domain '' to opset version 11
2023-06-29T16:14:57.152617Z popart:builder 79.79 I: Setting domain 'ai.graphcore' to opset version 1
2023-06-29T16:14:57.810215Z popart:session 79.79 I: Popart version: 3.3.0+7857 (b88fd7c399)
2023-06-29T16:14:57.810235Z popart:session 79.79 I: Popart release githash: b67b751185
Graph compilation:   0%|          | 0/100 [00:00<?]
2023-06-29T16:14:57.813041Z popart:popart 79.79 I: Onnx Model Info ir_version:4, producer:., domain:"", model_version:0 num_opsets:2
2023-06-29T16:14:57.813050Z popart:popart 79.79 I: Onnx Model OpSet domain:"" version:11
2023-06-29T16:14:57.813053Z popart:popart 79.79 I: Onnx Model OpSet domain:"ai.graphcore" version:1
2023-06-29T16:14:57.813059Z popart:popart 79.79 I: Onnx Graph Info name:"BuilderGraph_0" num_nodes:35

In [None]:
print("Compilation succeeded, end of test.")
# Release the IPUs once we are done with the model.
pipelined_model.detachFromDevice()