# wav2vec 2.0 Inference on IPU

This notebook will demonstrate how to perform wav2vec 2.0 inference with PyTorch on the Graphcore IPU-POD16 system. We will use a `wav2vec2-base` model fine-tuned for a CTC downstream task using LibriSpeech.

We will show how to take a wav2vec 2.0 model written in PyTorch from the 🤗`transformers` library from Hugging Face and parallelize it easily using the 🤗`optimum-graphcore` library.

### Environment

Requirements:
- A Poplar SDK environment enabled (see the [Getting Started](https://docs.graphcore.ai/en/latest/getting-started.html) guide for your IPU system)
- Python packages installed with `python -m pip install -r requirements.txt`

In [1]:
%pip install -r requirements.txt

Looking in indexes: https://pypi.org/simple, https://pypi.python.org/simple/
Looking in links: https://download.pytorch.org/whl/torch_stable.html


Note: you may need to restart the kernel to use updated packages.


To run this Jupyter notebook on a remote IPU machine:
1. Enable a Poplar SDK environment (see the [Getting Started](https://docs.graphcore.ai/en/latest/getting-started.html) guide for your IPU system) and install the required packages with `python -m pip install -r requirements.txt`
2. In the same environment, install the Jupyter notebook server: `python -m pip install notebook`
3. Launch a Jupyter Server on a specific port: `jupyter-notebook --no-browser --port <port number>`
4. Connect via SSH to your remote machine, forwarding your chosen port:
`ssh -NL <port number>:localhost:<port number> <your username>@<remote machine>`

For more details about this process, or if you need troubleshooting, see our [guide on using IPUs from Jupyter notebooks](../../standard_tools/using_jupyter/README.md).

### Graphcore Hugging Face models
Hugging Face provides convenient access to pre-trained transformer models. The partnership between Hugging Face and Graphcore allows us to run these models on the IPU.

Hugging Face models ported to the IPU can be found on the Graphcore organisation page on Hugging Face. 

### Utility imports
We start by importing the utilities that will be used later in the tutorial: 

In [2]:
import logging
from tqdm import tqdm
from dataclasses import dataclass, field

import torch
import poptorch

from datasets import load_dataset
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from transformers import (
    AutoModelForCTC,
    Wav2Vec2Processor,
    HfArgumentParser,
)
from transformers.utils import check_min_version
from transformers.utils.versions import require_version



## Preparing the Model

This notebook uses the model produced by the fine-tuning notebook, which the cells below load from the `demo` directory. If you have not run the fine-tuning notebook and do not have its output directory, this notebook will not run.

Because inference requires no optimisation step, the full `base` model fits on a single IPU, which makes the IPU configuration very simple. `num_device_iterations` controls how many iterations the IPU performs before returning to the host. With this set to 10, 10 utterances are sent to the IPU, processed, and returned as a block of 10.

We then create the pipelined version of the model, which applies the changes needed to run on the IPU, and finally wrap it with `poptorch.inferenceModel`.

In [3]:
processor = Wav2Vec2Processor.from_pretrained("demo")
model = AutoModelForCTC.from_pretrained("demo")

num_device_iterations = 10
ipu_config = IPUConfig(inference_device_iterations=num_device_iterations)
opts = ipu_config.to_options(for_inference=True)

ipu_model = to_pipelined(model, ipu_config)
ipu_model.parallelize()

inference_model = poptorch.inferenceModel(ipu_model.half().eval(), options=opts)
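To make the role of `num_device_iterations` more concrete, the pure-Python sketch below mimics how a host batch is consumed one micro-batch per device iteration. The helper `split_into_device_iterations` is hypothetical and exists only for illustration; PopTorch does this bookkeeping internally, and the placeholder strings stand in for audio tensors.

```python
def split_into_device_iterations(host_batch, micro_batch_size, device_iterations):
    """Slice a host batch into one micro-batch per device iteration (illustration only)."""
    expected = micro_batch_size * device_iterations
    if len(host_batch) != expected:
        raise ValueError(f"expected {expected} items, got {len(host_batch)}")
    return [
        host_batch[i * micro_batch_size:(i + 1) * micro_batch_size]
        for i in range(device_iterations)
    ]


# Ten placeholder utterances, micro-batch size 1, ten device iterations.
utterances = [f"utterance_{i}" for i in range(10)]
micro_batches = split_into_device_iterations(
    utterances, micro_batch_size=1, device_iterations=10
)
print(len(micro_batches), micro_batches[3])
```

With a micro-batch size of 1, each device iteration therefore handles exactly one utterance, and the host sees all ten results together.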

In [4]:
model.config

Wav2Vec2Config {
  "_name_or_path": "./demo",
  "activation_dropout": 0.0,
  "adapter_kernel_size": 3,
  "adapter_stride": 2,
  "add_adapter": false,
  "apply_spec_augment": true,
  "architectures": [
    "PoptorchPipelinedWav2Vec2ForCTC"
  ],
  "attention_dropout": 0.1,
  "bos_token_id": 1,
  "classifier_proj_size": 256,
  "codevector_dim": 256,
  "contrastive_logits_temperature": 0.1,
  "conv_bias": false,
  "conv_dim": [
    512,
    512,
    512,
    512,
    512,
    512,
    512
  ],
  "conv_kernel": [
    10,
    3,
    3,
    3,
    3,
    2,
    2
  ],
  "conv_stride": [
    5,
    2,
    2,
    2,
    2,
    2,
    2
  ],
  "ctc_loss_reduction": "mean",
  "ctc_zero_infinity": false,
  "diversity_loss_weight": 0.1,
  "do_stable_layer_norm": false,
  "eos_token_id": 2,
  "feat_extract_activation": "gelu",
  "feat_extract_norm": "group",
  "feat_proj_dropout": 0.1,
  "feat_quantizer_dropout": 0.0,
  "final_dropout": 0.0,
  "freeze_feat_extract_train": true,
  "hidden_act": "gelu

### Compilation

The sample batch is an example of what a batch could look like. Effectively, we are setting the static size of the model input. The first dimension is the product of the `batch_size` and `num_device_iterations`; in this case the batch size is just 1, so it equals `num_device_iterations`. The second dimension is the maximum audio length in samples. We have set this to 400,000 samples, which is 25 seconds of audio at the 16 kHz sampling rate of LibriSpeech.

The model will then compile for this input size. If the size is changed later the model will recompile.

In [5]:
max_samples = 400000
sample_batch = {"input_values": torch.zeros([num_device_iterations, max_samples])}

inference_model.compile(**sample_batch)

Graph compilation: 100%|██████████| 100/100 [02:29<00:00]
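The static input size also fixes the number of output frames. The sketch below works through the arithmetic in plain Python, using the `conv_kernel` and `conv_stride` values printed in `model.config` above. It assumes a 16 kHz sampling rate and unpadded convolutions in the feature encoder; `conv_output_length` is a hypothetical helper written for this illustration.

```python
def conv_output_length(num_samples, kernels, strides):
    """Number of frames left after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        length = (length - kernel) // stride + 1
    return length


sampling_rate = 16_000  # assumption: LibriSpeech audio is sampled at 16 kHz
max_samples = 400_000
num_device_iterations = 10
batch_size = 1

conv_kernel = [10, 3, 3, 3, 3, 2, 2]  # copied from model.config
conv_stride = [5, 2, 2, 2, 2, 2, 2]   # copied from model.config

input_shape = (batch_size * num_device_iterations, max_samples)
max_seconds = max_samples / sampling_rate
num_frames = conv_output_length(max_samples, conv_kernel, conv_stride)

print(input_shape)  # (10, 400000)
print(max_seconds)  # 25.0
print(num_frames)   # 1249
```

Under these assumptions each utterance yields 1249 logit frames, roughly one frame every 20 ms.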


### LibriSpeech Inference

We will test the inference capabilities of the fine-tuned model on a portion of the LibriSpeech `test` split. First, download the dataset using the 🤗`datasets` library from Hugging Face.



In [6]:
ds = load_dataset("librispeech_asr", "clean", split="test")

Reusing dataset librispeech_asr (/home/thorinf/.cache/huggingface/datasets/librispeech_asr/clean/2.1.0/14c8bffddb861b4b3a4fcdff648a56980dbb808f3fc56f5a3d56b18ee88458eb)


### Create a Batch

Here we take examples from LibriSpeech test and place them into a `zeros` Tensor to create a batch.

In [7]:
x = torch.zeros([num_device_iterations, max_samples])

for i in range(num_device_iterations):
    audio = ds[i]["audio"]
    # Pass the sampling rate explicitly so that a mismatch with the
    # rate expected by the processor raises an error instead of passing silently.
    input_values = processor(
        audio["array"],
        sampling_rate=audio["sampling_rate"],
        return_tensors="pt",
        padding="longest",
    ).input_values  # Batch size 1
    length = input_values.size(1)
    x[i, :length] = input_values[0]

batch = {"input_values": x}
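The padding step can also be sketched without any tensor library. The example below right-pads toy utterances to a fixed length and rejects any utterance longer than the static size, which would not fit into the compiled input. `pad_to_length` and the toy values are hypothetical and only illustrate the idea.

```python
def pad_to_length(samples, target_length, pad_value=0.0):
    """Right-pad a list of samples to target_length; reject inputs that are too long."""
    if len(samples) > target_length:
        raise ValueError(
            f"utterance has {len(samples)} samples, "
            f"more than the static size {target_length}"
        )
    return list(samples) + [pad_value] * (target_length - len(samples))


# Toy utterances of different lengths, padded to a static length of 6.
toy_utterances = [[0.1, -0.2, 0.3], [0.5], [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]]
toy_batch = [pad_to_length(u, target_length=6) for u in toy_utterances]
print(toy_batch[1])
```

Checking the length up front gives a clearer error than a shape mismatch at assignment time; truncating long utterances would be an alternative, at the cost of losing audio.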

## Run Inference

Running the model will perform `num_device_iterations` iterations on the IPU before returning to the host. This means that the logits for all of the utterances are returned at once.

In [8]:
output = inference_model(**batch)

### Decode

The argmax of the logits is taken at every frame of the output; this is a "greedy decode" strategy. The processor then converts the predicted indices back into text, and the transcripts are printed.

In [9]:
logits = output[0]
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
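For intuition, greedy CTC decoding can be sketched in plain Python: take the argmax at each frame, collapse consecutive repeats, drop the blank token, and turn the word delimiter into a space. The toy vocabulary, the scores, and the `greedy_ctc_decode` helper are hypothetical; this is an outline of the technique, not the processor's actual implementation.

```python
def greedy_ctc_decode(frame_scores, vocab, blank_id=0, word_delimiter="|"):
    """Greedy CTC decode: per-frame argmax, collapse repeats, drop blanks."""
    # Per-frame argmax over the vocabulary dimension.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    # Collapse consecutive repeats, then remove blank tokens.
    collapsed = [
        tok for i, tok in enumerate(best_ids) if i == 0 or tok != best_ids[i - 1]
    ]
    chars = [vocab[tok] for tok in collapsed if tok != blank_id]
    return "".join(chars).replace(word_delimiter, " ").strip()


toy_vocab = ["<pad>", "|", "h", "i"]  # index 0 plays the role of the CTC blank
toy_scores = [
    [0.1, 0.0, 0.8, 0.1],    # h
    [0.1, 0.0, 0.8, 0.1],    # h (repeat, collapsed)
    [0.9, 0.0, 0.05, 0.05],  # blank
    [0.0, 0.0, 0.1, 0.9],    # i
    [0.0, 0.0, 0.1, 0.9],    # i (repeat, collapsed)
    [0.1, 0.8, 0.05, 0.05],  # word delimiter
    [0.0, 0.0, 0.9, 0.1],    # h
    [0.0, 0.0, 0.1, 0.9],    # i
]
print(greedy_ctc_decode(toy_scores, toy_vocab))  # hi hi
```

A blank between two identical characters keeps them distinct, which is how CTC represents double letters.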

In [10]:
transcription

['concord returned to its place amidst the tents',
 'the english foted to the french baskets of flowers of which they had made a plentifl provision to greet the arrival of the young princess the french in return invited the english to a supper which was to be given the next day',
 'congratulations were poured in upon the princess everywhere during her journey',
 'from the respect paid her on all sides she seemed like a queen and from the adoiration with which she was treated by two or three she appeared an object of worship the queenmother gave the french the most affectionate reception france was her native country and she had suffered too much unhappiness in england for england to have made her forgeve france',
 'she taught her daughter then by her own affection for it that love for a country where they had both been hospitably received and where a brilliant future opened for them',
 'the count had thrown himself back on his seat leaning his shoulders against the partition of the ten