# Running Automatic Speech Recognition (ASR) using a fine-tuned wav2vec 2.0 checkpoint on the IPU

This notebook demonstrates how to run wav2vec 2.0 inference with PyTorch on Graphcore IPUs. We use a `wav2vec2-base` model fine-tuned on LibriSpeech for a downstream CTC (Connectionist Temporal Classification) task.

We will show how to take a wav2vec 2.0 model written in PyTorch from the 🤗`transformers` library from Hugging Face and parallelize it using the 🤗`optimum-graphcore` library.

### Running on Paperspace

The Paperspace environment lets you run this notebook with no setup. To improve your experience, we preload datasets and pre-install packages. This can take a few minutes; if you experience errors immediately after starting a session, please try restarting the kernel before contacting support. If a problem persists or you want to give us feedback on the content of this notebook, please reach out to us through our community of developers using our [Slack channel](https://www.graphcore.ai/join-community) or raise a [GitHub issue](https://github.com/gradient-ai/Graphcore-HuggingFace/issues).


Requirements:
- Python packages installed with `python -m pip install -r requirements.txt`

In [None]:
%%bash
apt update
apt-get install libsndfile1 -y

In [None]:
%pip install -r requirements.txt

In [None]:
from examples_utils import notebook_logging
%load_ext gc_logger

### Graphcore Hugging Face models
Hugging Face provides convenient access to pre-trained transformer models. The partnership between Hugging Face and Graphcore allows us to run these models on the IPU.

Hugging Face models ported to the IPU can be found on the Graphcore organisation page on Hugging Face. 

### Utility imports
We start by importing the utilities that will be used later in the tutorial: 

In [None]:
import logging
from tqdm import tqdm
from dataclasses import dataclass, field
from pathlib import Path

import torch
import poptorch

from datasets import load_dataset
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from transformers import (
    AutoModelForCTC,
    Wav2Vec2Processor,
    HfArgumentParser,
)
from transformers.utils import check_min_version
from transformers.utils.versions import require_version

Values for machine size and cache directories can be configured through environment variables or directly in the notebook:

In [None]:
import os

pod_type = os.getenv("GRAPHCORE_POD_TYPE", "pod4")
executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "/tmp/exe_cache/") + "/wav2vec2_inference"
checkpoint_directory = Path(os.getenv("PERSISTENT_CHECKPOINT_DIR", "/tmp")) / "demo"

## Preparing the model

This notebook uses the model output from the fine-tuning notebook. If you have not run the fine-tuning notebook, or its output is not in `checkpoint_directory`, the cells below will fail to load the model.

The `base` model is small enough that the full inference model fits on a single IPU without any further memory optimisation, which keeps the IPU configuration straightforward. `num_device_iterations` (passed to `IPUConfig` as `inference_device_iterations`) controls how many iterations the IPU performs before returning to the host. With it set to 10, 10 utterances are sent to the IPU, processed, and returned to the host as a block of 10.
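As a rough sketch of the bookkeeping, the first dimension of the tensor the host supplies on each call is the per-iteration (micro) batch size multiplied by the number of device iterations, and by the replication factor, which is assumed to be 1 here. The helpers below are hypothetical and are not part of `optimum-graphcore` or PopTorch; they only illustrate the arithmetic in plain Python.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def host_batch_size(micro_batch_size: int, device_iterations: int, replication_factor: int = 1) -> int:
    """Number of samples the host must supply per call to the inference model."""
    return micro_batch_size * device_iterations * replication_factor


def split_into_blocks(items: Sequence[T], block_size: int) -> List[Sequence[T]]:
    """Group items into consecutive blocks of block_size (the last block may be shorter)."""
    return [items[i : i + block_size] for i in range(0, len(items), block_size)]


# One utterance per iteration and 10 iterations per call: 10 utterances per block.
n = host_batch_size(micro_batch_size=1, device_iterations=10)
utterance_ids = list(range(25))
blocks = split_into_blocks(utterance_ids, n)
print(n)  # 10
print([len(b) for b in blocks])  # [10, 10, 5]
```

Because the compiled input shape is static, a final short block like the one above would need to be padded to a full 10 rows before being sent to the IPU.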

We then create the pipelined version of the model with `to_pipelined`, which adapts the model for the IPU, and call `parallelize()` on it. Finally, we cast the model to half precision, put it in evaluation mode, and wrap it in a `poptorch.inferenceModel`.

In [None]:
processor = Wav2Vec2Processor.from_pretrained(checkpoint_directory)
model = AutoModelForCTC.from_pretrained(checkpoint_directory)

num_device_iterations = 10
ipu_config = IPUConfig(inference_device_iterations=num_device_iterations, executable_cache_dir=executable_cache_dir)
opts = ipu_config.to_options(for_inference=True)

ipu_model = to_pipelined(model, ipu_config)
ipu_model.parallelize()

inference_model = poptorch.inferenceModel(ipu_model.half().eval(), options=opts)

In [None]:
model.config

### Compilation

The sample batch is an example of what a batch looks like; effectively, it sets the static size of the model input. The first dimension is the product of the batch size and `num_device_iterations`; here the batch size is 1, so the first dimension is simply `num_device_iterations`. The second dimension is the maximum audio length in samples. We've set this to 400,000 samples, which is 25 seconds of audio at LibriSpeech's 16 kHz sampling rate.

The model is then compiled for this input size. If the input size changes later, the model will be recompiled.
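The relationship between `max_samples` and clip duration is plain arithmetic at a fixed sampling rate. A minimal sketch, assuming 16 kHz audio as in LibriSpeech; `samples_to_seconds` and `seconds_to_samples` are hypothetical helpers, not library functions.

```python
SAMPLE_RATE = 16_000  # LibriSpeech audio is sampled at 16 kHz


def samples_to_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds of a clip with num_samples samples."""
    return num_samples / sample_rate


def seconds_to_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples needed to hold a clip of the given duration."""
    return int(round(seconds * sample_rate))


max_samples = 400_000
print(samples_to_seconds(max_samples))  # 25.0
print(seconds_to_samples(20))  # 320000
```

A 20-second limit would therefore correspond to `max_samples = 320000`. Any clip longer than `max_samples` must be truncated or split before it can be copied into the fixed-shape input.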

In [None]:
max_samples = 400000
sample_batch = {"input_values": torch.zeros([num_device_iterations, max_samples], dtype=torch.half)}

inference_model.compile(**sample_batch)

### LibriSpeech inference

We will test the inference capabilities of the fine-tuned model on a portion of the LibriSpeech `test` split. First, download the dataset using the 🤗`datasets` library from Hugging Face.



In [None]:
ds = load_dataset("librispeech_asr", "clean", split="test")

### Create a batch

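Here we take the first `num_device_iterations` examples from the LibriSpeech test split, normalise each one with the processor, and copy it into its own row of a zero-filled tensor of shape `[num_device_iterations, max_samples]` to create a batch. Utterances shorter than `max_samples` are therefore zero-padded.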
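The zero-padding scheme can be sketched in plain Python, assuming each utterance has already been normalised by the processor. `pad_to_fixed_length` is a hypothetical helper used only for illustration; unlike the tensor assignment in the notebook cell, it makes the length check explicit and raises a clear error for clips longer than `max_samples`.

```python
from typing import List, Sequence


def pad_to_fixed_length(
    utterances: Sequence[Sequence[float]], num_rows: int, max_samples: int
) -> List[List[float]]:
    """Copy each utterance into a zero-filled row of length max_samples.

    Rows beyond len(utterances) stay all-zero, and utterances longer than
    max_samples are rejected rather than silently truncated.
    """
    if len(utterances) > num_rows:
        raise ValueError(f"got {len(utterances)} utterances for {num_rows} rows")
    batch = [[0.0] * max_samples for _ in range(num_rows)]
    for row, samples in zip(batch, utterances):
        if len(samples) > max_samples:
            raise ValueError(f"utterance of {len(samples)} samples exceeds {max_samples}")
        # Overwrite the start of the row; the tail keeps its zero padding.
        row[: len(samples)] = samples
    return batch


batch = pad_to_fixed_length([[0.5, -0.5], [1.0, 2.0, 3.0]], num_rows=3, max_samples=4)
print(batch)  # [[0.5, -0.5, 0.0, 0.0], [1.0, 2.0, 3.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
```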

In [None]:
x = torch.zeros([num_device_iterations, max_samples], dtype=torch.half)

for i in range(num_device_iterations):
    audio = ds[i]["audio"]
    # Normalise one utterance at a time (batch size 1)
    input_values = processor(
        audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt", padding="longest"
    ).input_values
    length = input_values.size(1)
    # Copy the utterance into row i; the rest of the row stays zero-padded
    x[i, :length] = input_values[0]

batch = {"input_values": x}

## Run inference

Running the model performs `num_device_iterations` iterations on the IPU before returning to the host, so the logits for all of the utterances in the batch are returned at once.

In [None]:
output = inference_model(**batch)

### Decode

For every frame of the output, we take the argmax of the logits; this is a greedy decoding strategy. The processor then converts the predicted token IDs back into text (for a CTC tokenizer this also collapses repeated tokens and removes blank padding tokens), and we display the transcripts.
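Greedy CTC decoding can be sketched framework-free as follows. It assumes, as is typical for wav2vec 2.0 CTC vocabularies, that the blank is the padding token (ID 0 here) and that `|` marks word boundaries. The tiny vocabulary and logits are made up for illustration, and `greedy_ctc_decode` is a hypothetical helper, not the processor's actual implementation.

```python
from itertools import groupby
from typing import Dict, Sequence


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int = 0,
    word_delimiter: str = "|",
) -> str:
    # 1. Pick the most likely token for every frame (argmax over the vocabulary).
    frame_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    # 2. Collapse runs of the same token into a single token.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 3. Drop the CTC blank token and map the word delimiter to a space.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(tokens).replace(word_delimiter, " ").strip()


vocab = {0: "<pad>", 1: "|", 2: "H", 3: "I"}
logits = [
    [0.1, 0.0, 0.9, 0.0],  # H
    [0.1, 0.0, 0.8, 0.1],  # H again, collapsed with the previous frame
    [0.9, 0.0, 0.0, 0.1],  # blank
    [0.0, 0.1, 0.0, 0.9],  # I
    [0.0, 0.9, 0.0, 0.1],  # word delimiter
    [0.0, 0.1, 0.0, 0.9],  # I
]
print(greedy_ctc_decode(logits, vocab))  # HI I
```

The blank token is what allows genuinely repeated letters to survive: the frame sequence `H, blank, H` decodes to `HH`, whereas `H, H` collapses to a single `H`.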

In [None]:
logits = output[0]
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)

In [None]:
transcription

### Optional: release IPUs in use

The IPython kernel has a lock on the IPUs used in running the model, preventing other users from using them. For example, if you wish to use other notebooks after working your way through this one, it may be necessary to manually run the following cell to release IPUs from use. This will happen by default if using the `Run All` option. More information on the topic can be found at [Managing IPU Resources](https://github.com/gradient-ai/Graphcore-HuggingFace/blob/main/useful-tips/managing_ipu_resources.ipynb).

In [None]:
if inference_model.isAttachedToDevice():
    inference_model.detachFromDevice()