A simple and reliable REST API service for audio recognition
🎙️ Recognizes English speech with wav2vec2-large-xlsr-53-english
🧩 Accepts audio as files and as base64 strings
📄 Supports Swagger and Redoc documentation
💾 Logs hashes instead of sensitive information
🔥 Uses caching, queues and data validation
🔐 Uses JWT for authentication
🛡️ Protected against DoS
🏎️ GPU accelerated
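The idea of logging hashes instead of raw data can be shown with a minimal sketch. It assumes SHA-256 from Python's `hashlib`; the helpers `fingerprint` and `log_upload` and the logger name are hypothetical, and the service's actual hashing scheme is not documented here.

```python
import hashlib
import logging

logger = logging.getLogger("recognition")  # hypothetical logger name


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest so the raw payload never reaches the logs."""
    return hashlib.sha256(data).hexdigest()


def log_upload(audio: bytes) -> str:
    # Hypothetical helper: log only the digest and the size, never the audio itself.
    digest = fingerprint(audio)
    logger.info("received audio sha256=%s size=%d", digest, len(audio))
    return digest
```

Identical uploads produce identical digests, so a log line can still be correlated with a request without exposing its contents.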
Steps to run with Docker:
- Make sure you have Docker installed
- Download the xlsr-53 model and put it in the `./nn_model` folder
- Run `docker compose up --build` in the terminal
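Building the images and loading the model can take a while, so a script that waits for the API before sending requests may be handy. A minimal sketch using only the standard library, assuming the service listens on `http://localhost:8000` as in the client example in this README; `http_probe` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> Callable[[], bool]:
    """Build a probe that returns True once `url` answers with any HTTP response."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # A 401/403 still means the server is up and answering.
            return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            # Connection refused, reset or timed out: not ready yet.
            return False
    return probe


def wait_until_ready(
    probe: Callable[[], bool], timeout: float, interval: float = 1.0
) -> bool:
    """Call `probe` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until_ready(http_probe("http://localhost:8000/api/v1/"), timeout=120)` blocks for up to two minutes while the containers start and returns `False` if the API never answers.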
Steps to run with GPU acceleration:
- Make sure an NVIDIA graphics card is physically installed on your machine
- Run `docker compose -f .\docker-compose-gpu.yml up --build` in the terminal
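Before using the GPU compose file, it may help to confirm that the NVIDIA driver can see the card. A minimal sketch, assuming the driver's `nvidia-smi` tool is on `PATH`; the helper `nvidia_gpus` is hypothetical, and a GPU visible on the host does not by itself guarantee that Docker can use it.

```python
import shutil
import subprocess


def nvidia_gpus() -> list[str]:
    """List GPU names reported by nvidia-smi, or [] if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

An empty list suggests the driver is missing or not working, in which case the GPU compose file is unlikely to start correctly.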
Endpoints:
- `api/token/` - Obtain a JWT
- `api/v1/schema/swagger/` or `api/v1/schema/redoc/` - Documentation
- `api/v1/` - DRF browsable API
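The `api/token/` endpoint returns the access token under the `access` key, which is how the client example in this README reads it. A minimal sketch for inspecting that token on the client side, assuming it carries the standard JWT `exp` claim; the helper names are hypothetical. The signature is NOT verified, so this is only suitable for checking expiry, never for trusting the claims.

```python
import base64
import json
import time


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_left(token: str) -> float:
    """Seconds until the token's `exp` claim; negative if already expired."""
    return jwt_claims(token)["exp"] - time.time()


def auth_header(token: str) -> dict[str, str]:
    # Same header shape the client example sends with every request.
    return {"Authorization": f"Bearer {token}"}
```

A client can call `seconds_left` before each request and fetch a new token from `api/token/` when it gets close to zero.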
Steps for local development:
- Make sure you have Python 3.12 installed
- Make sure you have Docker installed
- Download the xlsr-53 model and put it in the `./nn_model` folder
- Run `python -m pip install -r .\src\requirements\dev.txt` in the terminal to install dependencies
- Write some new code
- Run `python manage.py migrate` in the terminal from the `src` folder to apply migrations
- Run `python manage.py createsuperuser` in the terminal from the `src` folder to create a user
- Run `python manage.py runserver` in the terminal from the `src` folder to start the Django dev server
- Run `python manage.py celery dev` in the terminal from the `src` folder to start the Celery dev server
- Run `python -m pytest .` in the terminal from the project root (`.`) to run tests
Change the values in the `./prod.env` file:
PostgreSQL
- `POSTGRES_HOST` - Host
- `POSTGRES_NAME` - Table prefixes
- `POSTGRES_PASSWORD` - Password
- `POSTGRES_USER` - User
- `POSTGRES_DB` - Database name
- `POSTGRES_PORT` - Port
Redis
- `REDIS_ADDRESS` - Address
- `REDIS_TIMEOUT` - Cache lifetime in seconds
RabbitMQ
- `RABBITMQ_ADDRESS` - Address, including the vhost
- `RABBITMQ_DEFAULT_USER` - User
- `RABBITMQ_DEFAULT_PASS` - Password
- `RABBITMQ_DEFAULT_VHOST` - VHost
Neural Network Model
- `NN_CONVERTER_FORMAT` - Format to which the audio will be converted
- `NN_CONVERTER_BITRATE` - Bitrate of the converted audio
- `NN_CONVERTER_MONO` - Whether to convert the audio to mono
- `NN_MODEL_PATH` - Path to the neural network model inside the Docker container
- `NN_MAX_LENGTH` - Maximum length of audio to be processed
- `NN_SAMPLE_RATE` - Sample rate of the audio fed into the neural network
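`NN_SAMPLE_RATE` and `NN_CONVERTER_MONO` describe the audio format the model receives. A minimal sketch of checking whether a WAV file already matches that target, assuming `NN_CONVERTER_MONO` holds a boolean-like string such as `true`; the helper names are hypothetical, and 16000 is only an illustrative default (wav2vec2 models are typically trained on 16 kHz audio), not the value from `./prod.env`.

```python
import os
import wave
from pathlib import Path


def needs_conversion(path: Path, sample_rate: int, mono: bool) -> bool:
    """True if the WAV at `path` differs from the target rate or channel layout."""
    with wave.open(str(path), "rb") as wav:
        wrong_rate = wav.getframerate() != sample_rate
        wrong_channels = mono and wav.getnchannels() != 1
    return wrong_rate or wrong_channels


def target_from_env() -> tuple[int, bool]:
    # Illustrative defaults only; the real values come from ./prod.env.
    rate = int(os.environ.get("NN_SAMPLE_RATE", "16000"))
    mono_flag = os.environ.get("NN_CONVERTER_MONO", "true").strip().lower()
    return rate, mono_flag in {"1", "true", "yes"}
```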
Django
- `DJANGO_SUPERUSER_USERNAME` - Username
- `DJANGO_SUPERUSER_PASSWORD` - Password
- `DJANGO_SUPERUSER_EMAIL` - Email
- `DJANGO_SECRET_KEY` - Secret key
- `DJANGO_ALLOWED_HOSTS` - Allowed hosts
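Since every variable above has to be set in `./prod.env`, a quick check for missing values can save a failed container start. A minimal sketch, assuming the file uses plain `KEY=VALUE` lines; the parser is deliberately simplified (no `export`, inline comments or multi-line values) and the helper names are hypothetical.

```python
from pathlib import Path

# Variable names listed in this README.
REQUIRED = {
    "POSTGRES_HOST", "POSTGRES_NAME", "POSTGRES_PASSWORD", "POSTGRES_USER",
    "POSTGRES_DB", "POSTGRES_PORT", "REDIS_ADDRESS", "REDIS_TIMEOUT",
    "RABBITMQ_ADDRESS", "RABBITMQ_DEFAULT_USER", "RABBITMQ_DEFAULT_PASS",
    "RABBITMQ_DEFAULT_VHOST", "NN_CONVERTER_FORMAT", "NN_CONVERTER_BITRATE",
    "NN_CONVERTER_MONO", "NN_MODEL_PATH", "NN_MAX_LENGTH", "NN_SAMPLE_RATE",
    "DJANGO_SUPERUSER_USERNAME", "DJANGO_SUPERUSER_PASSWORD",
    "DJANGO_SUPERUSER_EMAIL", "DJANGO_SECRET_KEY", "DJANGO_ALLOWED_HOSTS",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as KEY="value".
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(path: Path) -> set[str]:
    """Names from REQUIRED that are absent or empty in the env file at `path`."""
    values = parse_env(path.read_text(encoding="utf-8"))
    return {key for key in REQUIRED if not values.get(key)}
```

For example, `missing_keys(Path("prod.env"))` returns an empty set once every variable has a value.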
Example Python client that authenticates, uploads a file and polls for the result:

```python
from pathlib import Path
from time import sleep

from requests import Response, get, post


def authenticate(url: str, login: str, password: str) -> dict[str, str]:
    """Obtain a JWT access token and return ready-to-use auth headers."""
    response: Response = post(
        url, data={'username': login, 'password': password}
    )
    response.raise_for_status()  # fail loudly on wrong credentials
    data: dict = response.json()
    jwt: str = data.get('access', '')
    headers: dict[str, str] = {'Authorization': f'Bearer {jwt}'}
    return headers


def recognize(
    url: str, file: Path, wait: float, headers: dict[str, str],
) -> str:
    """Upload an audio file, then poll the returned link until the text is ready."""
    # Close the file handle as soon as the upload is done.
    with open(file, 'rb') as audio:
        response: Response = post(url, files={'file': audio}, headers=headers)
    response.raise_for_status()
    data: dict = response.json()
    link_to_recognized_text: str = data.get('link', '')
    while True:
        redirect: Response = get(link_to_recognized_text, headers=headers)
        redirect.raise_for_status()
        redirect_data: dict = redirect.json()
        if redirect_data.get('ready', False):
            return redirect_data.get('text', '')
        sleep(wait)  # not ready yet; poll again after `wait` seconds


auth_url: str = 'http://localhost:8000/api/token/'
recognition_url: str = 'http://localhost:8000/api/v1/file/'
login: str = 'admin'
password: str = 'admin'

headers = authenticate(auth_url, login, password)
file: Path = Path('tests', 'data', 'audio', 'audio.wav')
recognized_text: str = recognize(recognition_url, file, 0.2, headers)
print(recognized_text)
```
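The service also accepts audio as base64, which the client example above does not cover. A minimal sketch of preparing such a payload, assuming a JSON body with a single string field; the field name `file` and the helper `base64_payload` are hypothetical, so check the Swagger or Redoc schema for the actual base64 endpoint and field name.

```python
import base64
from pathlib import Path


def base64_payload(file: Path, field: str = "file") -> dict[str, str]:
    """Read an audio file and wrap its bytes as a base64 string in a JSON-ready dict.

    `field` is a hypothetical key name; the real one is in the API schema.
    """
    encoded = base64.b64encode(file.read_bytes()).decode("ascii")
    return {field: encoded}
```

The resulting dict can be sent with `post(url, json=payload, headers=headers)`, where `url` is the base64 endpoint from the documentation and `headers` come from `authenticate`.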