Melvin ASR is an application serving REST and WebSocket endpoints for the transcription of audio files.
REST API: The API is based on HTTP requests and handles the transcription of files in an asynchronous workflow: users send an audio file in a first request and retrieve the transcription with a second request once the transcript is ready. See REST Documentation
WebSocket API: The API provides streaming capabilities. See WebSocket Documentation
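The asynchronous REST workflow described above amounts to "submit, then poll". The following is a minimal client-side sketch of the polling half only. The `wait_for_transcript` helper and the `fetch` callable are hypothetical names introduced for illustration; the actual routes and response fields are defined in the REST Documentation and are not assumed here.

```python
import time
from typing import Callable, Optional


def wait_for_transcript(
    fetch: Callable[[], Optional[dict]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict:
    """Call `fetch` repeatedly until it returns a transcript or the timeout expires.

    `fetch` is a hypothetical callable wrapping the second HTTP request.
    It should return None while the transcript is not ready yet.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch()
        if result is not None:
            # The transcript is ready, hand it back to the caller.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("transcript was not ready before the timeout")
        time.sleep(interval_s)
```

In a real client, the first request would upload the audio file, and `fetch` would issue the second request for that upload, returning `None` until the service reports a finished transcript.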
Before you begin, ensure you have installed the following tools:
- Python 3.10
- Docker & Docker Compose
- Visual Studio Code
- ffmpeg
- Clone this repository to your local machine:
  git clone https://github.com/shuffle-project/melvin-asr.git
- Build and run the app using Docker Compose from the root directory:
  docker-compose up
- Access the REST API at http://localhost:8393.
- Access the WebSocket API at http://localhost:8394. It is built on Python's websockets package.
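Once the stack is up, a quick way to confirm that both services are listening is to try a plain TCP connection to each port. This is a small sketch using only the standard library. The `port_is_open` helper is a name introduced here, and the check only verifies that something accepts connections on ports 8393 and 8394, not that the APIs respond correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the steps above.
    for name, port in (("REST", 8393), ("WebSocket", 8394)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} API on port {port}: {state}")
```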
Besides the local Docker Compose stack, there is an option to run both services directly on your local machine.
pip install -r ./requirements.txt
In a local development environment, the WebSocket and Flask APIs are started separately.
python app.py
To optimize ASR, several proofs of concept were carried out to find out which solutions work most efficiently. Take a look at the following pages:
- Parallel transcription utilizing one and multiple Whisper models
- Optimizing the streaming architecture
The service is configured in the config.yml and config.local.yml files. config.local.yml is used for local development, config.yml for Docker.
These files are read by the src/helper/config.py module, which provides the configuration to the service logic.
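The split between the two files can be pictured with a short sketch. This is not the code in src/helper/config.py. The `select_config_path` function and the `MELVIN_ENV` variable are hypothetical and only illustrate the idea of choosing one file per environment.

```python
import os
from pathlib import Path


def select_config_path(base_dir: Path, env: dict) -> Path:
    """Pick config.local.yml for local development and config.yml otherwise.

    MELVIN_ENV is a hypothetical variable used only for this illustration.
    """
    is_local = env.get("MELVIN_ENV", "local") == "local"
    return base_dir / ("config.local.yml" if is_local else "config.yml")


# Example: resolve the file for the current process environment.
config_path = select_config_path(Path.cwd(), dict(os.environ))
```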
The project uses Ruff for linting and formatting code and Pytest for unit tests. See Test Documentation
The project is delivered and deployed as a Docker container. Depending on whether a GPU or a CPU is used, different factors come into play. See Deployment Documentation
We maintain our code following trunk-based development. This means we work on feature branches and integrate them into a single trunk, the main branch. Please keep your side branches small and merge them back into main as soon as possible.
This project is licensed under the MIT License - see the LICENSE file for details.