This is a simple Flask API that accepts base64-encoded OGG audio data (other audio formats may also work), converts it to WAV (using librosa and soundfile) and then transcribes it using Vosk, returning the transcribed text.
The current code uses the Brazilian Portuguese (pt-BR) model; however, it can easily be switched to another Vosk model (https://alphacephei.com/vosk/models).
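The decode, convert and transcribe steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function names, the 16 kHz target sample rate, the 4000-frame chunk size and the model_path argument are all hypothetical. It uses librosa.load to decode the OGG bytes (via libsndfile, so OGG/Opus support depends on the installed libsndfile version), soundfile.write to produce 16-bit mono WAV in memory, and Vosk's Model and KaldiRecognizer to transcribe it. The audio libraries are imported inside the functions so the base64 helper works on its own.

```python
import base64
import io
import json
import wave


def decode_audio_payload(payload: dict) -> bytes:
    """Return the raw audio bytes from a {"data": "<base64>"} request body."""
    if "data" not in payload:
        raise KeyError("request JSON must contain a 'data' key")
    # validate=True raises binascii.Error on characters outside the base64 alphabet
    return base64.b64decode(payload["data"], validate=True)


def audio_to_wav(audio_bytes: bytes, sample_rate: int = 16000) -> io.BytesIO:
    """Decode OGG (or another libsndfile-readable format) into 16-bit mono WAV in memory."""
    import librosa  # lazy imports: only needed when audio is actually converted
    import soundfile as sf

    # librosa resamples to sample_rate and downmixes to mono (float32 samples)
    samples, sr = librosa.load(io.BytesIO(audio_bytes), sr=sample_rate, mono=True)
    wav_buffer = io.BytesIO()
    # Vosk expects 16-bit PCM, so write with the PCM_16 subtype
    sf.write(wav_buffer, samples, sr, format="WAV", subtype="PCM_16")
    wav_buffer.seek(0)
    return wav_buffer


def transcribe_wav(wav_buffer: io.BytesIO, model_path: str) -> str:
    """Run Vosk over a 16-bit mono WAV stream and return the recognized text."""
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)  # hypothetical path to an unpacked Vosk model directory
    parts = []
    with wave.open(wav_buffer, "rb") as wf:
        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            chunk = wf.readframes(4000)
            if not chunk:
                break
            # AcceptWaveform returns True when a segment is complete; keep its text
            if recognizer.AcceptWaveform(chunk):
                parts.append(json.loads(recognizer.Result()).get("text", ""))
        # FinalResult flushes whatever audio is left after the last complete segment
        parts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)


def transcribe_payload(payload: dict, model_path: str) -> str:
    """Full pipeline: base64 JSON -> in-memory WAV -> Vosk text."""
    return transcribe_wav(audio_to_wav(decode_audio_payload(payload)), model_path)
```

Loading a Vosk model is slow, so a real server would more likely load it once at startup and reuse it across requests rather than calling Model() per request as this sketch does.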
Install packages
pip install -r requirements
Go to the Flask API folder
cd ./flaskapp
Start the Flask server (http://localhost:5000)
flask run
Alternatively, instead of running the Flask server, use the Gunicorn WSGI HTTP server (listening on port 3800):
gunicorn -w 1 --bind 0.0.0.0:3800 wsgi
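Both commands load the app from a module in the flaskapp folder: flask run auto-detects a module named app or wsgi, and Gunicorn, given a bare module name such as wsgi, looks for a callable named application in it. A minimal sketch of an app exposing POST /transcribe is below. The factory name create_app, the injected transcribe callable and the 400 error responses are assumptions, not the project's code. Flask is imported inside the factory so the request-handling logic can be exercised without it.

```python
from typing import Any, Callable, Tuple


def handle_transcribe(payload: Any, transcribe: Callable[[dict], str]) -> Tuple[dict, int]:
    """Validate the request JSON and return (response_body, status_code)."""
    if not isinstance(payload, dict) or "data" not in payload:
        return {"error": "expected a JSON object with a 'data' key"}, 400
    try:
        text = transcribe(payload)
    except ValueError as exc:  # e.g. invalid base64 (binascii.Error subclasses ValueError)
        return {"error": str(exc)}, 400
    return {"data": text}, 200


def create_app(transcribe: Callable[[dict], str]):
    """Hypothetical application factory wiring handle_transcribe to a Flask route."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/transcribe", methods=["POST"])
    def transcribe_route():
        # silent=True yields None instead of raising on a missing or invalid JSON body
        body, status = handle_transcribe(request.get_json(silent=True), transcribe)
        return jsonify(body), status

    return app
```

In a wsgi.py, one would then add something like application = create_app(run_pipeline) at module level, where run_pipeline stands for a function performing the base64 to WAV to Vosk steps; with that name in place, both flask run and gunicorn ... wsgi can find the app.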
To create a Docker image, build it with:
docker build -t audiotranscriberapi .
Then run it, publishing the container's port 3800 to the host:
docker run -p 3800:3800 audiotranscriberapi
To test the API, it's recommended to use an API tool such as Postman.
On Headers:
Include the key Content-Type with the value application/json, since the base64 audio data is sent in JSON format.
In Body:
Create a JSON object whose data key holds the base64 audio data, for example:
{
"data": "BASE64DATA"
}
Finally, in the URL field, select the POST method and send the JSON to the following address: http://localhost:5000/transcribe (or http://localhost:3800/transcribe when using Gunicorn or Docker).
If successful, it returns a JSON response with status code 200 and the transcribed text in the data key.
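As an alternative to Postman, the same request can be sent from Python using only the standard library. This is a minimal sketch assuming the server listens at http://localhost:5000/transcribe (pass http://localhost:3800/transcribe for the Gunicorn or Docker setup); the function names are hypothetical. Building the request separately from sending it makes the headers and body easy to inspect.

```python
import base64
import json
import urllib.request

URL = "http://localhost:5000/transcribe"


def make_transcribe_request(audio: bytes, url: str = URL) -> urllib.request.Request:
    """Build a POST request whose JSON body is {"data": "<base64 audio>"}."""
    # b64encode returns bytes; decode to ASCII so the value fits in a JSON string
    body = json.dumps({"data": base64.b64encode(audio).decode("ascii")}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe_file(path: str, url: str = URL) -> str:
    """Send an audio file to the API and return the text from the response's data key."""
    with open(path, "rb") as f:
        request = make_transcribe_request(f.read(), url)
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["data"]
```

For example, print(transcribe_file("voice.ogg")) would print the transcription of a hypothetical voice.ogg file once the server is running.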