# Speech Transcription on IPUs using Whisper - Inference

This notebook demonstrates speech transcription on the IPU using the [Whisper implementation in the Hugging Face Transformers library](https://huggingface.co/spaces/openai/whisper) alongside [Optimum Graphcore](https://github.com/huggingface/optimum-graphcore).

Whisper is a versatile speech recognition model that can transcribe speech as well as perform multilingual translation and recognition tasks.
It was trained on diverse datasets to give human-level speech recognition performance without the need for fine-tuning.

Optimum Graphcore is the interface between the Hugging Face Transformers library and [Graphcore IPUs](https://www.graphcore.ai/products/ipu).
It provides tools for parallelizing models and loading them onto IPUs, and for training and fine-tuning on all the tasks already supported by Transformers, while remaining compatible with the Hugging Face Hub and every model available on it out of the box.

> **Hardware requirements:** The Whisper models `whisper-tiny`, `whisper-base` and `whisper-small` can run two replicas on the smallest IPU-POD4 machine. The most capable model, `whisper-large`, will need to use either an IPU-POD16 or a Bow Pod16 machine. Please contact Graphcore if you'd like assistance running model sizes that don't work in this simple example notebook.

[![Join our Slack Community](https://img.shields.io/badge/Slack-Join%20Graphcore's%20Community-blue?style=flat-square&logo=slack)](https://www.graphcore.ai/join-community)

## Environment setup

To run this notebook, you need an environment with the Poplar SDK installed and enabled. This is done by default on Paperspace. If you are not using Paperspace, refer to the [getting started guide](https://docs.graphcore.ai/en/latest/getting-started.html#getting-started) for your system for a description of how to set this up.

We also need the Optimum Graphcore interface to the Hugging Face Transformers library, plus a few extra dependencies for handling audio.


In order to improve usability and support for future users, Graphcore would like to collect information about the applications and code being run in this notebook. The following information will be anonymised before being sent to Graphcore:

- User progression through the notebook
- Notebook details: number of cells, code being run and the output of the cells
- Environment details

You can disable logging at any time by running `%unload_ext gc_logger` from any cell.

In [2]:
!pip install ipython==7.16.3 ipykernel==5.5.6

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu:
Collecting ipython==7.16.3
  Using cached ipython-7.16.3-py3-none-any.whl (783 kB)
Collecting ipykernel==5.5.6
  Using cached ipykernel-5.5.6-py3-none-any.whl (121 kB)
Installing collected packages: ipython, ipykernel
  Attempting uninstall: ipython
    Found existing installation: ipython 8.12.2
    Uninstalling ipython-8.12.2:
      Successfully uninstalled ipython-8.12.2
  Attempting uninstall: ipykernel
    Found existing installation: ipykernel 6.23.3
    Uninstalling ipykernel-6.23.3:
      Successfully uninstalled ipykernel-6.23.3
Successfully installed ipykernel-5.5.6 ipython-7.16.3


In [3]:
!pip list

Package                  Version
------------------------ ------------
accelerate               0.20.3
agate                    1.7.1
agate-dbf                0.2.2
agate-excel              0.2.5
agate-sql                0.5.9
aiofiles                 22.1.0
aiohttp                  3.8.4
aiosignal                1.3.1
aiosqlite                0.19.0
anyio                    3.7.0
appdirs                  1.4.4
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
async-timeout            4.0.2
attrs                    23.1.0
audioread                3.0.0
awscli                   1.27.163
Babel                    2.12.1
backcall                 0.2.0
beautifulsoup4           4.12.2
bleach                   6.0.0
boto3                    1.26.163
botocore                 1.29.163
certifi                  2023.5.7
cffi                     1.15.1
charset-normalizer       3.1.0
cmake                    3.26.3
colorama

In [4]:
%pip install "optimum-graphcore==0.6.1"
%pip install soundfile==0.12.1 librosa==0.10.0.post2 tokenizers==0.12.1
# %pip install matplotlib
# %matplotlib inline
%pip install examples-utils[common]@git+https://github.com/graphcore/examples-utils@latest_stable
# %load_ext examples_utils.notebook_logging.gc_logger

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu:
Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu:
Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu:
Collecting examples-utils[common]@ git+https://github.com/graphcore/examples-utils@latest_stable
  Cloning https://github.com/graphcore/examples-utils (to revision latest_stable) to /tmp/pip-install-7ysp3daj/examples-utils_def49af3cf634b77bd06d6888f0fb65e
  Running command git clone --filter=blob:none --quiet https://github.com/graphcore/examples-utils /tmp/pip-install-7ysp3daj/examples-utils_def49af3cf634b77bd06d6888f0fb65e
  Running command git checkout -q 40c62e6646db8f9d60d1707a61204c95a15c7ccb
  Resolved https://github.com/graphcore/examples-utils to commit 40c62e6646db8f9d60d1707a61204c95a15c7ccb
  Prepari

## Running Whisper on the IPU

We start by importing the required modules, some of which are needed to configure the IPU.


In [5]:
import os

# Emit INFO-level PopART and Poplar logs so that graph compilation progress is visible.
os.environ["POPART_LOG_LEVEL"] = "INFO"
os.environ["POPLAR_LOG_LEVEL"] = "INFO"

In [6]:
# Generic imports
from datasets import load_dataset
# import matplotlib
# import librosa
# import IPython
# import random

# IPU-specific imports
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

# HF-related imports
from transformers import WhisperProcessor, WhisperForConditionalGeneration

The Whisper model is available on Hugging Face in several sizes, from `whisper-tiny` with 39M parameters to `whisper-large` with 1550M parameters.
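
To put these sizes in perspective, the sketch below estimates the memory taken by the weights alone once the model is cast to half precision, as this notebook does later with `.half()`. It assumes 2 bytes per parameter and uses only the two parameter counts quoted above. The helper name `fp16_weight_megabytes` is illustrative, and the estimate ignores activations, buffers and compiled code.

```python
# Parameter counts quoted in the text above (intermediate sizes omitted).
PARAMS = {"whisper-tiny": 39_000_000, "whisper-large": 1_550_000_000}

# Assumption: every parameter is stored as a 16-bit float.
BYTES_PER_FP16 = 2


def fp16_weight_megabytes(num_params: int) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_FP16 / 1e6


for name, num_params in PARAMS.items():
    print(f"{name}: ~{fp16_weight_megabytes(num_params):.0f} MB of fp16 weights")
```

Under these assumptions the weights come to roughly 78 MB for `whisper-tiny` and roughly 3100 MB for `whisper-large`, which gives a feel for why the larger model needs a bigger machine.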

We download `whisper-tiny.en`, the English-only variant of `whisper-tiny`, and run it on two IPUs.
The [Whisper architecture](https://openai.com/research/whisper) is an encoder-decoder Transformer, with the audio split into 30-second chunks.
For simplicity, one IPU is used for the encoder part of the graph and another for the decoder part.
The `IPUConfig` object helps to configure the model to be pipelined across the IPUs.
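
The 30-second chunking mentioned above can be illustrated with a short, library-free sketch. It is not how the `WhisperProcessor` is implemented; it only shows the idea of cutting a long waveform into fixed-length windows and zero-padding the final one. The function name `split_into_chunks` and the use of plain Python lists are assumptions made for illustration.

```python
from typing import List, Sequence

CHUNK_SECONDS = 30  # window length stated in the text above


def split_into_chunks(
    samples: Sequence[float], rate: int, chunk_seconds: int = CHUNK_SECONDS
) -> List[List[float]]:
    """Split a waveform into fixed-length windows, zero-padding the last one."""
    chunk_len = rate * chunk_seconds  # samples per window
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad the final, shorter window with silence so all windows match.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks
```

At a 16 kHz sample rate, for example, each window holds 30 × 16,000 = 480,000 samples, so a 70-second recording would yield three windows, the last of which is mostly padding.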

In [7]:
model_spec = "openai/whisper-tiny.en"

# Instantiate processor and model
processor = WhisperProcessor.from_pretrained(model_spec)
model = WhisperForConditionalGeneration.from_pretrained(model_spec)

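# Adapt whisper-tiny to run on the IPU: pipeline the model across two IPUs,
# prepare it for text generation, and cast the weights to half precision.
ipu_config = IPUConfig(ipus_per_replica=2)
pipelined_model = to_pipelined(model, ipu_config)
pipelined_model = pipelined_model.parallelize(for_generation=True).half()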
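
The hardware note at the top of this notebook says that two replicas fit on an IPU-POD4. With the two-IPU layout configured here (`ipus_per_replica=2`), that follows from simple division, sketched below. The helper `replicas_that_fit` is a hypothetical name and not part of Optimum Graphcore.

```python
def replicas_that_fit(ipus_available: int, ipus_per_replica: int) -> int:
    """Number of complete model replicas that fit on the available IPUs."""
    if ipus_per_replica <= 0:
        raise ValueError("ipus_per_replica must be positive")
    return ipus_available // ipus_per_replica


# An IPU-POD4 has four IPUs; the configuration above uses two per replica.
print(replicas_that_fit(ipus_available=4, ipus_per_replica=2))
```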

Now we can load the dataset and process an example audio file.
If precompiled models are not available, then the first run of the model triggers two graph compilations.
This means that our first test transcription could take a minute or two to run, but subsequent runs will be much faster.

In [8]:
# Load the dataset and read an example sound file
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
test_sample = ds[2]
sample_rate = test_sample["audio"]["sampling_rate"]

Found cached dataset librispeech_asr_dummy (/tmp/huggingface_caches/datasets/hf-internal-testing___librispeech_asr_dummy/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)


In [9]:
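def transcribe(data, rate):
    """Transcribe a single audio array sampled at `rate` Hz and return the text."""
    # Convert the raw waveform to input features in half precision, matching the model.
    input_features = processor(data, return_tensors="pt", sampling_rate=rate).input_features.half()

    # This triggers a compilation, unless a precompiled model is available.
    sample_output = pipelined_model.generate(input_features, max_length=448, min_length=3)
    # Decode the generated token IDs to text, dropping special tokens.
    transcription = processor.batch_decode(sample_output, skip_special_tokens=True)[0]
    return transcription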
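
Because the first call to `transcribe` may include graph compilation while later calls do not, it can be useful to time the calls and compare them. The helper below is a minimal sketch using only the standard library. The name `timed` is an assumption for illustration and is not part of this notebook's dependencies.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, calling `timed(transcribe, test_sample["audio"]["array"], sample_rate)` twice in a row should show a much shorter elapsed time on the second call, once the compiled graphs are available.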

In [None]:
test_transcription = transcribe(test_sample["audio"]["array"], sample_rate)

2023-06-30T13:38:56.603301Z popart:pattern 1225.1225 I: Pattern TiedGather 1
2023-06-30T13:38:56.603335Z popart:pattern 1225.1225 I: Pattern TiedGatherAccumulate 1
2023-06-30T13:38:56.645276Z popart:builder 1225.1225 I: Setting domain '' to opset version 11
2023-06-30T13:38:56.649238Z popart:builder 1225.1225 I: Setting domain 'ai.graphcore' to opset version 1
2023-06-30T13:38:57.292347Z popart:session 1225.1225 I: Popart version: 3.3.0+7857 (b88fd7c399)
2023-06-30T13:38:57.292367Z popart:session 1225.1225 I: Popart release githash: b67b751185
Graph compilation:   0%|          | 0/100 [00:00<?]2023-06-30T13:38:57.295811Z popart:popart 1225.1225 I: Onnx Model Info ir_version:4, producer:., domain:"", model_version:0 num_opsets:2
2023-06-30T13:38:57.295822Z popart:popart 1225.1225 I: Onnx Model OpSet domain:"" version:11
2023-06-30T13:38:57.295825Z popart:popart 1225.1225 I: Onnx Model OpSet domain:"ai.graphcore" version:1
2023-06-30T13:38:57.295831Z popart:popart 1225.1225 I: Onnx Graph

In [None]:
print("Compilation succeeded, end of test.")
# Detach the model from the IPUs to release them for other processes.
pipelined_model.detachFromDevice()

In [None]:
!pip list