Whisper-App

This repository contains the work I have done (and am still doing) to develop a web app for Speech-to-Text based on OpenAI Whisper.

Updates

08/12/2022: added a Notebook that explains the inner workings of match_layers.py
25/11/2022: clean separation between frontend and backend
24/11/2022: it is no longer necessary to modify the Whisper codebase to load the custom model

Features

You can load and use a custom model trained with HF Transformers
You can compare the transcription with the expected text by providing a CSV file with the columns (f_name, sentence)
You can run on a GPU, which is much faster
Supported models: medium and large (vanilla), and medium (custom)
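The README does not say how the comparison with the expected text is scored. The sketch below shows one common option: load the (f_name, sentence) CSV and compute the word error rate (WER) of a transcription. Several details are my own assumptions and not taken from the app: the CSV has a header row, the WER metric itself, the punctuation-stripping normalization, and the helper names.

```python
import csv
import io
from typing import Dict, List


def load_expected(csv_text: str) -> Dict[str, str]:
    """Map each audio file name to its expected sentence.

    Assumes a header row with the columns f_name,sentence (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["f_name"]: row["sentence"] for row in reader}


def normalize(text: str) -> List[str]:
    # Lower-case and drop punctuation so that "Hello," matches "hello"
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of an extra word
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

For example, `word_error_rate(expected["a.wav"], transcription)` would give 0.0 for a perfect transcription and grows with each wrong, missing or extra word.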

Utility

match_layers

One common use case is fine-tuning a Whisper model, for example to achieve higher accuracy on the language of a specialized domain.

The fine-tuning can be done with HF Transformers, following the approach described here.

In this case, the utility can be used to match the layers of the custom (fine-tuned) model with those of the Whisper model, and it shows how to load the custom model into the Whisper codebase.
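Below is a minimal sketch of the kind of name matching involved. It assumes the parameter naming used by the HF `WhisperForConditionalGeneration` model and by the `openai-whisper` package as of late 2022. The rule table and the helper names `hf_to_whisper_key` and `convert_state_dict` are my own illustration, not necessarily how match_layers.py works. The sketch only rewrites key names and passes tensors through unchanged.

```python
from typing import Dict, Optional

# Ordered (hf_fragment, whisper_fragment) substitutions (assumed naming).
# Order matters: specific names such as "self_attn_layer_norm" must be
# rewritten before the more general "self_attn." rule can touch them.
RENAME_RULES = [
    (".layers.", ".blocks."),
    ("self_attn_layer_norm", "attn_ln"),
    ("encoder_attn_layer_norm", "cross_attn_ln"),
    ("final_layer_norm", "mlp_ln"),
    ("encoder_attn.", "cross_attn."),
    ("self_attn.", "attn."),
    ("q_proj", "query"),
    ("k_proj", "key"),
    ("v_proj", "value"),
    ("out_proj", "out"),
    ("fc1", "mlp.0"),
    ("fc2", "mlp.2"),
    ("embed_positions.weight", "positional_embedding"),
    ("embed_tokens", "token_embedding"),
    ("encoder.layer_norm", "encoder.ln_post"),
    ("decoder.layer_norm", "decoder.ln"),
]


def hf_to_whisper_key(hf_key: str) -> Optional[str]:
    """Translate one HF parameter name into the assumed Whisper name."""
    # The HF output projection is tied to the token embedding,
    # so it has no separate counterpart in Whisper: skip it
    if hf_key.startswith("proj_out"):
        return None
    key = hf_key.removeprefix("model.")  # Python 3.9+
    for old, new in RENAME_RULES:
        key = key.replace(old, new)
    return key


def convert_state_dict(hf_state: Dict[str, object]) -> Dict[str, object]:
    """Rename every HF key, dropping the ones without a Whisper counterpart."""
    converted = {}
    for hf_key, value in hf_state.items():
        new_key = hf_to_whisper_key(hf_key)
        if new_key is not None:
            converted[new_key] = value
    return converted
```

With real models, the result could be passed to `load_state_dict()` of a model created by `whisper.load_model("medium")`. Loading with `strict=True` (the PyTorch default) raises an error that lists every unmatched key, which gives a quick check of the rules.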

You can find more information on this utility in the Wiki.

I have also added a Notebook that performs the matching and lets you explore, step by step, how it is done (for example, by inspecting the names of the matched layers).

Libraries used

Torch
Hugging Face Transformers
OpenAI Whisper
Streamlit
st-annotated-text
soundfile
tqdm
pickle
pandas
PIL

Environment

Based on Python 3.10.6
Can be rebuilt using the provided requirements.txt

Running on GPU

I have tested the code, and it works fine, on a VM with:

NVIDIA P100 GPU
Ubuntu 22.04-2022.11.06
Python 3.10

To run the code on GPU, you only need to set:

DEVICE = "cuda"

in the config file (config.py).
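If `DEVICE = "cuda"` is set on a machine without a usable GPU, loading the model will fail. The sketch below adds a small guard that falls back to the CPU. The helpers `cuda_available` and `resolve_device` are hypothetical and not part of this repo. The only PyTorch call used is `torch.cuda.is_available()`.

```python
import warnings


def cuda_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(requested: str) -> str:
    """Return the configured device, falling back to CPU when CUDA is missing."""
    if requested == "cuda" and not cuda_available():
        warnings.warn("DEVICE = 'cuda' but no CUDA device was found; using CPU")
        return "cpu"
    return requested
```

The resolved value could then be passed to `whisper.load_model("medium", device=resolve_device(DEVICE))`, so the same config.py works on both CPU-only and GPU VMs.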

As expected, running on GPU is much faster, especially with long files (> 60 s).

The table below reports the results of two tests, run with the GPU enabled and disabled:

Test	Audio duration (s)	Time on CPU (s)	Time on GPU (s)
1	129	55	11
2	255	110	19.8

In both tests the GPU is about 5 times faster (5.0x and about 5.6x).
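The figures in the table can be turned into a speed-up (CPU time / GPU time) and a real-time factor (RTF, processing time / audio duration, where lower is better). The short calculation below does this. The function name `summarize` is my own. The numbers are the ones reported in the table.

```python
# (audio duration, CPU time, GPU time) in seconds, from the table of test results
TESTS = {1: (129, 55, 11), 2: (255, 110, 19.8)}


def summarize(tests):
    """Return speed-up and real-time factors for each test."""
    results = {}
    for test_id, (audio_s, cpu_s, gpu_s) in tests.items():
        results[test_id] = {
            "speedup": cpu_s / gpu_s,     # how many times faster the GPU run is
            "rtf_cpu": cpu_s / audio_s,   # seconds of processing per second of audio
            "rtf_gpu": gpu_s / audio_s,
        }
    return results


for test_id, r in summarize(TESTS).items():
    print(f"test {test_id}: speed-up {r['speedup']:.1f}x, "
          f"RTF CPU {r['rtf_cpu']:.2f}, RTF GPU {r['rtf_gpu']:.3f}")
```

An RTF below 1 means the audio is transcribed faster than real time, which holds for both CPU and GPU in these tests.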

luigisaetta/whisper-app

Folders and files

Latest commit

History

Repository files navigation

Whisper-App

Updates

Features

Utility

Libraries used

Environment

Running on GPU

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages