<a href="https://colab.research.google.com/github/sanchit-gandhi/notebooks/blob/main/Distil_Whisper_Benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Distil-Whisper

Distil-Whisper is a distilled version of the Whisper model that is 6 times faster, roughly half the size (49% of the parameters), and performs within 1% WER on out-of-distribution evaluation sets.
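For readers unfamiliar with the metric: word error rate (WER) is the word-level edit distance between a reference transcript and the model's transcript, divided by the number of reference words. The snippet below is a minimal sketch of that definition. The function name and the toy sentences are illustrative assumptions, and real evaluations usually normalize punctuation and casing before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Toy example: one dropped word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {wer:.1%}")
```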

First, let's quickly recap the Whisper model. Whisper is a general-purpose speech recognition model proposed by OpenAI in the paper [*Robust Speech Recognition via Large-Scale Weak Supervision*](https://cdn.openai.com/papers/whisper.pdf).

The Whisper architecture is a Transformer-based encoder-decoder model. First, the encoder maps the input audio to encoder hidden-states in a single forward pass. The decoder then auto-regressively predicts text tokens, conditioned on both the previous tokens and the encoder hidden-states.

OpenAI's best Whisper checkpoint, named [Whisper large-v2](https://huggingface.co/openai/whisper-large-v2), has 32 encoder layers and 32 decoder layers. 32 layers is quite a lot! Let's visualize the model:

![img](https://huggingface.co/datasets/patrickvonplaten/images_distil/resolve/main/whiser_arch_old.png)

Here, $\mathbf{X}_{1:T}$ represents the speech input. It is mapped by the encoder (shown in green) through a single forward pass. The encoder outputs, i.e. the encoder hidden-states $\mathbf{H}_{1:M}$, are then used in the cross-attention layers in each decoder block.

Starting with a start-of-sequence token $y_0$, the decoder (shown in orange) auto-regressively generates the text tokens in the transcription. In the visualization above, there are 5 decoder forward passes, one for each $\mathbf{P}(y_i | \mathbf{y}_{0: i - 1}), \forall i$.

In practice, the decoder is run up to 128 times (depending on the length of the transcription), which means that there are many more forward passes through the decoder than through the encoder. The result is that the decoder is responsible for over **90% of the inference time** in Whisper.
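To make the asymmetry concrete, here is a toy sketch of the generation loop with stand-in functions that only count calls. Nothing here is Whisper code: `encode`, `decode_step`, the `EOS` id and the token rule are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter

EOS = -1  # hypothetical end-of-sequence token id
calls = Counter()


def encode(audio):
    """Stand-in for the encoder: called once per utterance."""
    calls["encoder"] += 1
    return [float(x) for x in audio]  # placeholder "hidden states"


def decode_step(prev_tokens, hidden_states):
    """Stand-in for one decoder forward pass: returns the next token."""
    calls["decoder"] += 1
    position = len(prev_tokens) - 1  # prev_tokens starts with the SOS token
    # Toy rule: emit one token per hidden state, then end the sequence.
    return position if position < len(hidden_states) else EOS


def transcribe(audio, max_new_tokens=128):
    hidden_states = encode(audio)  # single encoder pass
    tokens = [0]  # start-of-sequence token y_0
    for _ in range(max_new_tokens):
        next_token = decode_step(tokens, hidden_states)  # one pass per token
        if next_token == EOS:
            break
        tokens.append(next_token)
    return tokens[1:]


tokens = transcribe(audio=[0.1, 0.2, 0.3, 0.4, 0.5])
print(f"generated {len(tokens)} tokens with {dict(calls)}")
```

The encoder counter stays at one no matter how long the transcription is, while the decoder counter grows with every generated token (plus one final pass that emits the end-of-sequence token).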

This is the motivation behind Distil-Whisper: we make the decoder faster in order to speed up inference for the whole model. With this in mind, let's take a look at the Distil-Whisper architecture:

![img](https://huggingface.co/datasets/patrickvonplaten/images_distil/resolve/main/distil_arch_old.png)

Just two decoder layers! That means to generate a transcription of 128 tokens, Distil-Whisper needs to run only 256 decoder layer forward passes, while Whisper large-v2 has to run 4096 forward passes. Since the encoder is only run once, we copy the entire encoder and *freeze* it during training. This means Distil-Whisper inherits Whisper's robustness to different audio conditions.
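The layer counts quoted above follow from simple multiplication, spelled out in the short sketch below. The variable names are ours; the layer counts and the 128-token transcription length are the ones stated in the text.

```python
num_tokens = 128  # length of the example transcription
decoder_layers = {"Whisper large-v2": 32, "Distil-Whisper": 2}

# Every generated token requires one forward pass through each decoder layer.
layer_passes = {name: num_tokens * n for name, n in decoder_layers.items()}

for name, count in layer_passes.items():
    print(f"{name}: {count} decoder layer forward passes")

reduction = layer_passes["Whisper large-v2"] / layer_passes["Distil-Whisper"]
print(f"Reduction in decoder layer passes: {reduction:.0f}x")
```

Note that a 16x reduction in decoder layer passes does not translate into a 16x end-to-end speed-up: the encoder, the embedding and output projection, and per-step overheads are unchanged, which is consistent with the roughly 6x figure quoted for Distil-Whisper.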

## Benchmarking

Great, now that we've understood why Distil-Whisper should be faster in theory, let's see if it holds true in practice.

To begin with, we install `transformers`, `accelerate`, and `datasets`.

In this notebook, we use an A100 GPU, available through a Colab Pro subscription, as this is the device we used for benchmarking in the [Distil-Whisper paper](https://huggingface.co/papers/2311.00430). Other GPUs will most likely give different speed-ups, but they should be in the same ballpark:

In [None]:
!pip install --upgrade --quiet transformers accelerate datasets


In addition, we will make use of Flash Attention 2, as it saves
a lot of memory and speeds up the attention computation.

In [None]:
!pip install --quiet flash-attn --no-build-isolation



Next, let's load the dataset that we will use for benchmarking. We'll load a small dataset consisting of 73 samples from the [LibriSpeech ASR](https://huggingface.co/datasets/librispeech_asr) validation-clean split. This amounts to ~9MB of data, so it's very lightweight and quick to download:

In [None]:
from datasets import load_dataset

dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

We start by benchmarking [Whisper large-v2](https://huggingface.co/openai/whisper-large-v2) to get our baseline number. We'll load the model in `float16` precision and keep the loading time as short as possible by passing `low_cpu_mem_usage=True`. In addition, we make sure that the weights are loaded in [`safetensors`](https://github.com/huggingface/safetensors) format by passing `use_safetensors=True`:

In [None]:
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "openai/whisper-large-v2"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, use_flash_attention_2=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_id)

You are attempting to use Flash Attention 2.0 with a model initialized on CPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.



Great! For the benchmark, we will only measure the generation time (encoder + decoder), so let's write a short helper function that measures this step:

In [None]:
import time

def generate_with_time(model, inputs):
    start_time = time.time()
    outputs = model.generate(**inputs)
    generation_time = time.time() - start_time
    return outputs, generation_time

This function returns both the generated token ids and the time
it took to run the model.
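GPU kernels are launched asynchronously, so wall-clock timings are most trustworthy when the device is synchronized before the timer starts and before it stops. The helper below is a hedged variant of the timing function, not the one used to produce the numbers in this notebook. `timed_call` and its `sync` parameter are hypothetical names, and the helper is written against plain Python so it can be tried without a GPU.

```python
import time


def timed_call(fn, *args, sync=None, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds).

    `sync` is an optional zero-argument callable that blocks until pending
    device work has finished. It is invoked before the timer starts and
    again before the timer stops.
    """
    if sync is not None:
        sync()  # flush work queued before the measured call
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if sync is not None:
        sync()  # wait for the measured call to actually finish
    return result, time.perf_counter() - start


# CPU-only usage example with a stand-in workload.
total, seconds = timed_call(sum, range(1_000_000))
print(f"sum={total}, took {seconds:.4f}s")
```

With the objects defined in this notebook, one would presumably call it as `timed_call(model.generate, **inputs, sync=torch.cuda.synchronize)`.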

We now iterate over the audio samples and sum up the generation time.

In [None]:
from tqdm import tqdm

all_time = 0

for sample in tqdm(dataset):
  audio = sample["audio"]
  inputs = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt")
  inputs = inputs.to(device=device, dtype=torch_dtype)

  output, gen_time = generate_with_time(model, inputs)
  all_time += gen_time
  print(processor.batch_decode(output, skip_special_tokens=True))

print(all_time)

  1%|▏         | 1/73 [00:00<01:10,  1.02it/s]

[' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.']


  3%|▎         | 2/73 [00:01<01:00,  1.18it/s]

[" Nor is Mr. Quilter's manner less interesting than his matter."]


  4%|▍         | 3/73 [00:03<01:20,  1.15s/it]

[' He tells us that at this festive season of the year, with Christmas and roast beef looming before us, similes drawn from eating and its results occur most readily to the mind.']


  5%|▌         | 4/73 [00:04<01:24,  1.23s/it]

[" He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca."]


  7%|▋         | 5/73 [00:08<02:19,  2.05s/it]

[" Linnell's pictures are a sort of up-guards-and-addams paintings, and Mason's exquisite idylls are as national as a jingo poem. Mr. Burkett Foster's landscapes smile at one much in the same way that Mr. Carker used to flash his teeth. And Mr. John Collier gives his sitter a cheerful slap on the back, before he says, like a shampoo-er in a Turkish bath,"]


  8%|▊         | 6/73 [00:09<01:52,  1.68s/it]

[' It is obviously unnecessary for us to point out how luminous these criticisms are, how delicate in expression.']


 10%|▉         | 7/73 [00:09<01:33,  1.41s/it]

[' On the general principles of art, Mr. Quilter writes with equal lucidity.']


 11%|█         | 8/73 [00:11<01:24,  1.31s/it]

[' Painting, he tells us, is of a different quality to mathematics, and Finnish in art is adding more factor.']


 12%|█▏        | 9/73 [00:11<01:13,  1.15s/it]

[' As for etchings, they are of two kinds, British and foreign.']


 14%|█▎        | 10/73 [00:13<01:30,  1.44s/it]

[' He laments most bitterly the divorce that has been made between decorative art and what we usually call pictures, makes a customary appeal to the last judgment, and reminds us that in the great days of art Michelangelo was the furnishing upholsterer.']


 15%|█▌        | 11/73 [00:14<01:17,  1.26s/it]

[' near the fire, and the ornaments Fred brought home from India on the mantelboard.']


 16%|█▋        | 12/73 [00:16<01:24,  1.38s/it]

[' In fact, he is quite severe on Mr. Ruskin for not recognizing that a picture should denote the frailty of man, and remarks with pleasing courtesy and felicitous grace that many phases of feeling']


 18%|█▊        | 13/73 [00:17<01:09,  1.17s/it]

[' Only, unfortunately, his own work never does get good.']


 19%|█▉        | 14/73 [00:18<01:07,  1.14s/it]

[' Mr. Quilter has missed his chance, for he has failed even to make himself the tupper of painting.']


 21%|██        | 15/73 [00:18<00:56,  1.03it/s]

[' by Harry Quilter M.A.']


 22%|██▏       | 16/73 [00:20<01:04,  1.13s/it]

[' Because you were sleeping instead of conquering, the lovely rose princess has become a fiddle without a bow, while poor Shaggy sits there a cooing dove.']


 23%|██▎       | 17/73 [00:21<01:09,  1.24s/it]

[' He has gone, and gone for good," answered Polychrome, who had managed to squeeze into the room beside the dragon, and had witnessed the occurrences with much interest.']


 25%|██▍       | 18/73 [00:23<01:11,  1.29s/it]

[' I have remained a prisoner only because I wished to be one." And with this he stepped forward and burst the stout chains as easily as if they had been threads.']


 26%|██▌       | 19/73 [00:24<01:03,  1.17s/it]

[' The little girl had been asleep, but she heard the raps and opened the door.']


 27%|██▋       | 20/73 [00:24<00:54,  1.04s/it]

[' The king has fled the disgrace and your friends are asking for you.']


 29%|██▉       | 21/73 [00:25<00:52,  1.00s/it]

[' I begged Ruggedo long ago to send him away, but he would not do so.']


 30%|███       | 22/73 [00:26<00:48,  1.05it/s]

[' I also offered to help your brother to escape, but he would not go.']


 32%|███▏      | 23/73 [00:27<00:43,  1.15it/s]

[' He eats and sleeps very steadily, replied the new king.']


 33%|███▎      | 24/73 [00:27<00:40,  1.21it/s]

[' I hope he doesn\'t work too hard," said Shaggy.']


 34%|███▍      | 25/73 [00:28<00:34,  1.37it/s]

[" He doesn't work at all."]


 36%|███▌      | 26/73 [00:29<00:44,  1.05it/s]

[" In fact, there's nothing he can do in these dominions as well as our gnomes, whose numbers are so great that it worries us to keep them all busy."]


 37%|███▋      | 27/73 [00:30<00:37,  1.22it/s]

[' Not exactly, return calico.']


 38%|███▊      | 28/73 [00:30<00:32,  1.40it/s]

[' Where is my brother now?']


 40%|███▉      | 29/73 [00:31<00:30,  1.45it/s]

[' inquired Shaggy in the metal forest.']


 41%|████      | 30/73 [00:31<00:25,  1.66it/s]

[' Where is that?']


 42%|████▏     | 31/73 [00:32<00:31,  1.35it/s]

[' The metal forest is in the great domed cavern, the largest in all our dominions," replied Calico.']


 44%|████▍     | 32/73 [00:33<00:26,  1.55it/s]

[' Calico hesitated.']


 45%|████▌     | 33/73 [00:34<00:28,  1.40it/s]

[' However, if we look sharp, we may be able to discover one of these secret ways.']


 47%|████▋     | 34/73 [00:34<00:26,  1.45it/s]

[" Oh no, I'm quite sure he didn't."]


 48%|████▊     | 35/73 [00:35<00:25,  1.49it/s]

[" That's funny, remarked Betsy thoughtfully."]


 49%|████▉     | 36/73 [00:36<00:26,  1.42it/s]

[" I don't believe Anne knew any magic or she'd have worked it before."]


 51%|█████     | 37/73 [00:36<00:24,  1.48it/s]

[' I do not know, confessed Shaggy.']


 52%|█████▏    | 38/73 [00:37<00:21,  1.64it/s]

[' True," agreed Calico.']


 53%|█████▎    | 39/73 [00:38<00:27,  1.22it/s]

[' Calico went to the big gong and pounded on it just as Virgadu used to do, but no one answered the summons.']


 55%|█████▍    | 40/73 [00:40<00:39,  1.19s/it]

[" Having returned to the royal cavern, Calico first pounded the gong and then sat in the throne, wearing Ruggedo's discarded ruby crown and holding in his hand the scepter which Ruggedo had so often thrown at his head."]


 56%|█████▌    | 41/73 [00:41<00:32,  1.03s/it]

[' A man said to the universe, Sir, I exist.']


 58%|█████▊    | 42/73 [00:42<00:32,  1.05s/it]

[" Sweat covered Brionne's body, trickling into the tight loincloth that was the only garment he wore."]


 59%|█████▉    | 43/73 [00:44<00:36,  1.21s/it]

[' The cut on his chest still dripping blood, the ache of his overstrained eyes, even the soaring arena around him with thousands of spectators were trivialities not worth thinking about.']


 60%|██████    | 44/73 [00:44<00:31,  1.09s/it]

[' His instant panic was followed by a small sharp blow high on his chest.']


 62%|██████▏   | 45/73 [00:45<00:27,  1.00it/s]

[' One minute, a voice said, and a time buzzer sounded.']


 63%|██████▎   | 46/73 [00:46<00:26,  1.03it/s]

[' A minute is not a very large measure of time and his body needed every fraction of it.']


 64%|██████▍   | 47/73 [00:47<00:23,  1.10it/s]

[" The buzzer's whirr triggered his muscles into complete relaxation."]


 66%|██████▌   | 48/73 [00:48<00:21,  1.18it/s]

[' Only his heart and lungs worked on at a strong measured rate.']


 67%|██████▋   | 49/73 [00:48<00:19,  1.25it/s]

[' He was in reverie, sliding along the borders of consciousness.']


 68%|██████▊   | 50/73 [00:49<00:20,  1.11it/s]

[' The contestants in the twenties needed undisturbed rest. Therefore, nights in the dormitories were as quiet as death.']


 70%|██████▉   | 51/73 [00:51<00:22,  1.00s/it]

[' Particularly so on this last night, when only two of the little cubicles were occupied, the thousands of others standing with dark, empty doors.']


 71%|███████   | 52/73 [00:51<00:19,  1.10it/s]

[' The other voice snapped with a harsh urgency clearly used to command.']


 73%|███████▎  | 53/73 [00:52<00:19,  1.04it/s]

[" I'm here because the matter is of utmost importance, and Brand is the one I must see. Now stand aside!"]


 74%|███████▍  | 54/73 [00:53<00:14,  1.28it/s]

[' The twenties?']


 75%|███████▌  | 55/73 [00:54<00:16,  1.11it/s]

[" He must have drawn his gun, because the intruder said quickly, put that away, you're being a fool. Out."]


 77%|███████▋  | 56/73 [00:55<00:15,  1.12it/s]

[' There was silence then, and, still wondering, Brienne was once more asleep.']


 78%|███████▊  | 57/73 [00:55<00:11,  1.34it/s]

[' 10 seconds...']


 79%|███████▉  | 58/73 [00:56<00:10,  1.38it/s]

[' He asked the handler who was needing his aching muscles.']


 81%|████████  | 59/73 [00:57<00:11,  1.24it/s]

[' A red-haired mountain of a man with an apparently inexhaustible store of energy.']


 82%|████████▏ | 60/73 [00:58<00:10,  1.24it/s]

[' There could be little art in this last and final round of fencing.']


 84%|████████▎ | 61/73 [00:58<00:09,  1.30it/s]

[' Just thrust and parry and victory to the stronger.']


 85%|████████▍ | 62/73 [00:59<00:08,  1.33it/s]

[' Every man who entered the 20s had his own training tricks.']


 86%|████████▋ | 63/73 [01:00<00:08,  1.17it/s]

[' There appeared to be an immediate association with the death trauma as if the two were inextricably linked into one.']


 88%|████████▊ | 64/73 [01:01<00:08,  1.03it/s]

[' The strength that enables someone in a trance to hold his body stiff and unsupported except at two points, the head and heels.']


 89%|████████▉ | 65/73 [01:02<00:06,  1.20it/s]

[' This is physically impossible when conscious.']


 90%|█████████ | 66/73 [01:03<00:06,  1.09it/s]

[' Others had died before during the 20s, and death during the last round was, in some ways, easier than defeat.']


 92%|█████████▏| 67/73 [01:04<00:05,  1.07it/s]

[' Breathing deeply, Brianne softly spoke the auto-hypnotic phrases that triggered the process.']


 93%|█████████▎| 68/73 [01:05<00:04,  1.09it/s]

[' When the buzzer sounded, he pulled his foil from his second startled grasp and ran forward.']


 95%|█████████▍| 69/73 [01:06<00:03,  1.12it/s]

[' Irolde looked amazed at the sudden fury of the attack, then smiled.']


 96%|█████████▌| 70/73 [01:07<00:02,  1.11it/s]

[' He thought it was the last burst of energy. He knew how close they both were to exhaustion.']


 97%|█████████▋| 71/73 [01:08<00:01,  1.10it/s]

[" Breon saw something close to panic on his opponent's face when the man finally recognized his error."]


 99%|█████████▊| 72/73 [01:09<00:00,  1.05it/s]

[' A wave of despair rolled out from Irolg. Brienne sensed it and knew the fifth point was his.']


100%|██████████| 73/73 [01:09<00:00,  1.05it/s]

[" Then the powerful twist that's rested aside, in and under the guard."]
62.970786571502686





Alright! In total it took roughly 63 seconds to transcribe 73 audio samples.

Next, let's see how much time it takes with [Distil-Whisper](https://huggingface.co/distil-whisper/distil-large-v2):

In [None]:
model_id = "distil-whisper/distil-large-v2"

distil_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True, use_flash_attention_2=True
)
distil_model = distil_model.to(device)

You are attempting to use Flash Attention 2.0 with a model initialized on CPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.


We run the same benchmarking loop:

In [None]:
all_time = 0

for sample in tqdm(dataset):
  audio = sample["audio"]
  inputs = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt")
  inputs = inputs.to(device=device, dtype=torch_dtype)

  output, gen_time = generate_with_time(distil_model, inputs)
  all_time += gen_time
  print(processor.batch_decode(output, skip_special_tokens=True))

print(all_time)

  3%|▎         | 2/73 [00:00<00:13,  5.17it/s]

[' Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel.']
[" Nor is Mr. Quilter's manner less interesting than his matter."]


  4%|▍         | 3/73 [00:00<00:16,  4.20it/s]

[' He tells us that at this festive season of the year, with Christmas and roast beef looming before us, similes drawn from eating and its results occur most readily to the mind.']


  5%|▌         | 4/73 [00:00<00:16,  4.11it/s]

[" He has grave doubts whether Sir Frederick Leighton's work is really Greek after all, and can discover in it but little of Rocky Ithaca."]


  7%|▋         | 5/73 [00:01<00:22,  2.96it/s]

[" Lennel's pictures are a sort of upgards and atom paintings, and Mason's exquisite idles are as national as a jingo poem. Mr. Burkett Foster's landscapes smile at one much in the same way that Mr. Carker used to flash his teeth. And Mr. John Collier gives his sitter a cheerful slap on the back, before he says, like a shampooer and a Turkish bath, next man,"]


 10%|▉         | 7/73 [00:01<00:17,  3.83it/s]

[' It is obviously unnecessary for us to point out how luminous these criticisms are, how delicate in expression.']
[' On the general principles of art, Mr. Quilter writes with equal lucidity.']


 12%|█▏        | 9/73 [00:02<00:14,  4.28it/s]

[' Painting, he tells us, is of a different quality to mathematics, and finish in art is adding more factor.']
[' As for etchings, they are of two kinds, British and foreign.']


 15%|█▌        | 11/73 [00:02<00:15,  4.03it/s]

[' He laments most bitterly the divorce that has been made between decorative art and what we usually call pictures, makes a customary appeal to the last judgment and reminds us that in the great days of art Michelangelo was the furnishing upholsterer.']
[' near the fire, and the ornaments Fred brought home from India on the mental board.']


 18%|█▊        | 13/73 [00:03<00:14,  4.22it/s]

[' In fact, he is quite severe on Mr. Ruskin for not recognizing that a picture should denote the frailty of man, and remarks with pleasing courtesy and felicitous grace that many phases of feeling']
[' Only, unfortunately, his own work never does get good.']


 21%|██        | 15/73 [00:03<00:12,  4.48it/s]

[' Mr. Quilter has missed his chance, for he has failed even to make himself the tupor of painting.']
[' by Harry Quilter M.A.']


 22%|██▏       | 16/73 [00:04<00:13,  4.08it/s]

[' Because you were sleeping instead of conquering, the lovely rose princess has become a fiddle without a bow, while poor Shaggy sits there a cooing dove.']


 23%|██▎       | 17/73 [00:04<00:15,  3.73it/s]

[' He has gone, and gone for good, answered Polychrome, who had managed to squeeze into the room beside the dragon, and had witnessed the occurrences with much interest.']


 25%|██▍       | 18/73 [00:04<00:15,  3.51it/s]

[' I have remained a prisoner only because I wished to be one." And with this he stepped forward and burst the stout chains as easily as if they had been threads.']


 26%|██▌       | 19/73 [00:04<00:14,  3.74it/s]

[' The little girl had been asleep, but she heard the raps and opened the door.']


 27%|██▋       | 20/73 [00:05<00:13,  3.96it/s]

[' The king has fled a disgrace and your friends are asking for you.']


 29%|██▉       | 21/73 [00:05<00:12,  4.08it/s]

[' I begged Ruggedo long ago to send him away, but he would not do so.']


 32%|███▏      | 23/73 [00:05<00:11,  4.31it/s]

[' I also offered to help your brother to escape, but he would not go.']
[' He eats and sleeps very steadily, replied the new king.']


 34%|███▍      | 25/73 [00:06<00:10,  4.67it/s]

[' I hope he doesn\'t work too hard," said Shaggy.']
[" He doesn't work at all."]


 36%|███▌      | 26/73 [00:06<00:11,  4.08it/s]

[" In fact, there's nothing he can do in these dominions as well as our gnomes, whose numbers are so great that it worries us to keep them all busy."]


 38%|███▊      | 28/73 [00:06<00:09,  4.62it/s]

[" Not exactly, we've turned calico."]
[' Where is my brother now?']


 41%|████      | 30/73 [00:07<00:08,  4.95it/s]

[' inquired Shaggy in the metal forest.']
[' Where is that?']


 44%|████▍     | 32/73 [00:07<00:07,  5.14it/s]

[' The metal forest is in the great domed cavern, the largest in all our dominions," replied Calico.']
[' Calago hesitated.']


 47%|████▋     | 34/73 [00:08<00:07,  5.16it/s]

[' However, if we look sharp, we may be able to discover one of these secret ways.']
[" Oh no, I'm quite sure he didn't."]


 49%|████▉     | 36/73 [00:08<00:07,  5.18it/s]

[' That\'s funny," remarked Betsy thoughtfully.']
[" I don't believe Anne knew any magic, or she'd have worked it before."]


 52%|█████▏    | 38/73 [00:08<00:06,  5.68it/s]

[' I do not know, confessed Shaggy.']
[' True, agreed Calico.']


 53%|█████▎    | 39/73 [00:09<00:06,  4.96it/s]

[' Calico went to the big gong and pounded on it just as Virgado used to do, but no one answered the summons.']


 56%|█████▌    | 41/73 [00:09<00:07,  4.23it/s]

[" Having returned to the royal cavern, Calico first pounded the gong and then sat in the throne, wearing Ruggedo's discarded ruby crown and holding in his hand the scepter which Ruggedo had so often thrown at his head."]
[' A man said to the universe, Sir, I exist.']


 58%|█████▊    | 42/73 [00:09<00:07,  4.16it/s]

[" Sweat covered Brion's body, trickling into the tight loincloth that was the only garment he wore."]


 59%|█████▉    | 43/73 [00:10<00:08,  3.66it/s]

[' The cut on his chest, still dripping blood, the ache of his overstrained eyes. Even the soaring arena around him with thousands of spectators were trivialities not worth thinking about.']


 60%|██████    | 44/73 [00:10<00:07,  3.89it/s]

[' His instant panic was followed by a small sharp blow high on his chest.']


 62%|██████▏   | 45/73 [00:10<00:07,  3.93it/s]

[' One minute, a voice said, and a time buzzer sounded.']


 64%|██████▍   | 47/73 [00:11<00:06,  4.19it/s]

[' A minute is not a very large measure of time, and his body needed every fraction of it.']
[' The buzzer swore triggered his muscles into complete relaxation.']


 66%|██████▌   | 48/73 [00:11<00:05,  4.47it/s]

[' Only his heart and lungs worked on at a strong measured rate.']


 67%|██████▋   | 49/73 [00:11<00:05,  4.47it/s]

[' He was in reverie, sliding along the borders of consciousness.']


 68%|██████▊   | 50/73 [00:11<00:05,  4.22it/s]

[' The contestants in the 20s needed undisturbed rest. Therefore, nights in the dormitories were as quiet as death.']


 71%|███████   | 52/73 [00:12<00:04,  4.28it/s]

[' Particularly so, on this last night, when only two of the little cubicles were occupied, the thousands of others standing with dark, empty doors.']
[' The other voice snapped with a harsh urgency clearly used to command.']


 74%|███████▍  | 54/73 [00:12<00:03,  4.78it/s]

[" I'm here because the matter is of utmost importance, and brand is the one I must see. Now stand aside!"]
[' The 20s.']


 77%|███████▋  | 56/73 [00:13<00:03,  4.68it/s]

[" He must have drawn his gun because the intruder said quickly, Put that away, you're being a fool. Out!"]
[' There was silence then, and, still wondering, Brian was once more asleep.']


 79%|███████▉  | 58/73 [00:13<00:02,  5.20it/s]

[' 10 seconds.']
[' He asked the handler who is needing his aching muscles.']


 81%|████████  | 59/73 [00:13<00:02,  5.04it/s]

[' A red-haired mountain of a man with an apparently inexhaustible store of energy.']


 82%|████████▏ | 60/73 [00:13<00:02,  5.01it/s]

[' There could be little art in this last and final round of fencing.']


 84%|████████▎ | 61/73 [00:14<00:02,  4.99it/s]

[' Just thrust and parry and victory to the stronger.']


 85%|████████▍ | 62/73 [00:14<00:02,  4.95it/s]

[' Every man who entered the 20s had his own training tricks.']


 86%|████████▋ | 63/73 [00:14<00:02,  4.63it/s]

[' There appeared to be an immediate association with the death trauma as if the two were inextricably linked into one.']


 89%|████████▉ | 65/73 [00:14<00:01,  4.79it/s]

[' The strength that enables someone in a trance to hold his body stiff and unsupported, except at two points, the head and heels.']
[' This is physically impossible when conscious.']


 90%|█████████ | 66/73 [00:15<00:01,  4.58it/s]

[' Others had died before during the 20s, and death during the last round was, in some ways, easier than defeat.']


 92%|█████████▏| 67/73 [00:15<00:01,  4.52it/s]

[' Breathing deeply, Briann softly spoke the autohypnotic phrases that triggered the process.']


 93%|█████████▎| 68/73 [00:15<00:01,  4.57it/s]

[' When the buzzer sounded, he pulled his foil from his second startled grasp and ran forward.']


 95%|█████████▍| 69/73 [00:15<00:00,  4.61it/s]

[' I rolled to look amazed at the sudden fury of the attack, then smiled.']


 96%|█████████▌| 70/73 [00:16<00:00,  4.48it/s]

[' He thought it was the last burst of energy. He knew how close they both were to exhaustion.']


 97%|█████████▋| 71/73 [00:16<00:00,  4.51it/s]

[" Brian saw something close to panic on his opponent's face when the man finally recognized his error."]


 99%|█████████▊| 72/73 [00:16<00:00,  4.50it/s]

[' A wave of despair rolled out from Irog. Brian sensed it and knew the fifth point was his.']


100%|██████████| 73/73 [00:16<00:00,  4.36it/s]

[" Then the powerful twist that's rested aside, in and under the guard."]
10.020044803619385





Only 10 seconds, which amounts to a speed-up of roughly 6x!
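The speed-up can be computed directly from the two totals printed by the benchmarking loops. The sketch below simply plugs in the numbers from this run; the variable names are ours, and the figures will differ on other hardware or on repeated runs.

```python
whisper_seconds = 62.970786571502686  # total generation time, Whisper large-v2
distil_seconds = 10.020044803619385   # total generation time, Distil-Whisper
num_samples = 73

speedup = whisper_seconds / distil_seconds
print(f"Speed-up: {speedup:.2f}x")
print(f"Whisper large-v2: {whisper_seconds / num_samples:.3f}s per sample")
print(f"Distil-Whisper:   {distil_seconds / num_samples:.3f}s per sample")
```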

## Memory

In addition to being significantly faster, Distil-Whisper also has fewer parameters. Let's have a look at how many fewer exactly.

In [None]:
distil_model.num_parameters() / model.num_parameters() * 100

49.000047275167184

Distil-Whisper is 49% of the size of Whisper. Note that this ratio is much lower if we compare only the size of the decoders:

In [None]:
distil_model.model.decoder.num_parameters() / model.model.decoder.num_parameters() * 100


13.175161920253482

As expected, the decoder is much smaller. One might have guessed that the ratio would be even lower, around 2/32 (or 6%), but the decoder also has a very large word embedding matrix, which accounts for a lot of parameters and is the same size in both models.
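A back-of-the-envelope count shows where the extra parameters come from. The sketch below assumes the following configuration values for Whisper large-v2: hidden size 1280, feed-forward size 5120, vocabulary size 51865, and 448 decoder positions. It also assumes that the key projection in each attention block has no bias. These are assumptions on our part and can be checked against `model.config`; the function name is hypothetical.

```python
def decoder_params(num_layers, d_model=1280, ffn_dim=5120,
                   vocab_size=51865, max_positions=448):
    """Approximate parameter count of a Whisper-style decoder (assumed config)."""
    # q, k, v and output projections; biases on all but the key projection.
    attention = 4 * d_model * d_model + 3 * d_model
    # Two linear layers with biases.
    feed_forward = 2 * d_model * ffn_dim + ffn_dim + d_model
    # Three layer norms per block, each with a weight and a bias vector.
    layer_norms = 3 * 2 * d_model
    # Each block has self-attention and cross-attention.
    per_layer = 2 * attention + feed_forward + layer_norms
    # Token embeddings, positional embeddings and the final layer norm
    # do not depend on the number of layers.
    shared = vocab_size * d_model + max_positions * d_model + 2 * d_model
    return num_layers * per_layer + shared


whisper_decoder = decoder_params(num_layers=32)
distil_decoder = decoder_params(num_layers=2)
ratio = distil_decoder / whisper_decoder * 100
embedding_share = (51865 * 1280) / distil_decoder

print(f"Decoder size ratio: {ratio:.2f}%")
print(f"Token embeddings as a share of the distilled decoder: {embedding_share:.0%}")
```

Under these assumptions the ratio comes out at roughly 13%, in line with the measured value, and the token embeddings alone make up more than half of the two-layer decoder.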

## Next steps

Hopefully this notebook shed some light on the motivation behind Distil-Whisper! So far, we've measured Distil-Whisper mainly on GPU, but we are actively looking into collaborations to release code that effectively accelerates Distil-Whisper on CPU as well. Updates will be posted on the Distil-Whisper [repository](https://github.com/huggingface/distil-whisper).

Another key application of Distil-Whisper is *speculative decoding*. In speculative decoding, we can use Distil-Whisper as an *assistant model* to Whisper large-v2 to reach a speed-up of 2x without **any** loss in performance. More on that in a follow-up notebook that will come out soon!
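To give an intuition for why speculative decoding loses no accuracy, here is a toy sketch of the greedy variant: a cheap draft model proposes a few tokens, the main model checks them, and only tokens the main model would have produced itself are kept. The two "models" below are hypothetical stand-in functions, not Whisper or Distil-Whisper, and the sketch omits details of real implementations such as batched verification in a single forward pass. In `transformers`, to our understanding, the feature is exposed through the `assistant_model` argument of `generate`.

```python
def target_next(prefix):
    """Stand-in for the large model: deterministic next-token rule."""
    return (31 * sum(prefix) + len(prefix)) % 50


def draft_next(prefix):
    """Stand-in for the small model: agrees with the target most of the time."""
    token = target_next(prefix)
    return (token + 1) % 50 if len(prefix) % 5 == 0 else token


def greedy_decode(next_token, prompt, max_new_tokens):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(prompt, max_new_tokens, num_draft=4):
    tokens = list(prompt)
    limit = len(prompt) + max_new_tokens
    verification_rounds = 0
    while len(tokens) < limit:
        # 1. The draft model proposes up to num_draft tokens.
        draft = []
        for _ in range(min(num_draft, limit - len(tokens))):
            draft.append(draft_next(tokens + draft))
        # 2. The target model verifies the proposals. A real model scores
        #    all draft positions in one forward pass; here we loop.
        verification_rounds += 1
        accepted = []
        for proposed in draft:
            expected = target_next(tokens + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != proposed:
                break  # discard the rest of the draft after a mismatch
        tokens.extend(accepted)
    return tokens, verification_rounds


reference = greedy_decode(target_next, prompt=[0], max_new_tokens=20)
output, rounds = speculative_decode(prompt=[0], max_new_tokens=20)
print(f"identical output: {output == reference}, verification rounds: {rounds}")
```

Because every kept token is the one the target model would have chosen, the output matches plain greedy decoding exactly, while the number of target-model rounds is lower than the number of generated tokens whenever the draft model guesses well.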