Machine Learning in Production

Run the Docker stack

Setup

To work as intended, the docker-compose stack requires some setup (a command sketch follows this list):

  • A docker network named www. Use the following command to create it:

    docker network create www
    
  • A Traefik service working on the www network.

    Traefik is a service capable of routing requests for web sub-domains to services built with Docker. We use it only for this purpose, although it can also perform other tasks.

    To create this service, check the file extra/docker-compose.traefik.yaml.

  • A .env file needs to be created first. This file is not included in the repository since it is server-dependent.

    Its content is the following:

    DOMAIN=<domain of the machine (used only for traefik labels)>
    
    CELERY_BROKER_URL=pyamqp://rabbitmq/
    CELERY_BACKEND_URL=redis://redis/
    CELERY_QUEUE=
    
    DATABASE_SCHEMA=mlpdb
    DATABASE_USER=mlp
    DATABASE_PASS=mlp
    DATABASE_HOST=database
    
    DATABASE_URL=postgresql://${DATABASE_USER}:${DATABASE_PASS}@${DATABASE_HOST}/${DATABASE_SCHEMA}
    
    GRAFANA_ADMIN_PASS=grafana
    

    Remember that these passwords are stored in plain text. This is not a secure solution.
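As a rough sketch of the whole setup, assuming extra/docker-compose.traefik.yaml defines a standalone Traefik stack attached to the www network (check the file before running it), the steps could look like this:

# create the shared network used by Traefik and the application stack
docker network create www

# start the Traefik reverse proxy (the exact invocation depends on the file content)
docker-compose -f extra/docker-compose.traefik.yaml up -d

# create the server-dependent .env file by hand, with the content shown above
nano .env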

Execute the stack

To launch the stack through Docker Compose, execute the following command from the root directory of this repository:

docker-compose up -d
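To check that the services started correctly, the standard Docker Compose commands can be used:

docker-compose ps
docker-compose logs -f

The whole stack can be stopped with docker-compose down.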

Generate data

This proof-of-concept software uses synthetic data generated by sampling some distributions. To generate these data, just run the following command; it will populate the /dataset folder with TSV (Tab-Separated Values) files.

python dataset_generator.py
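The exact file names depend on the generator, but the output can be quickly inspected with, for example:

ls dataset/
head -n 5 dataset/*.tsv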

Generate traffic

To simulate the use of the application by external users, the script traffic_generator.py can be used.

The basic command, using default parameters, is:

python traffic_generator.py

Some parameters can be used to control the behavior of the simulated users (a combined example follows this list):

  • --config <path> is the path to a configuration file. A configuration file is a .tsv (Tab-Separated Values) file that contains all the parameters for the UserData and UserLabeller behavior. See the files config/user.tsv and config/user_noise.tsv for examples.

  • -p is the number of parallel threads to run. Each thread will contact the application independently.

  • -d is the probability of having a response. If set to 1.0, there will always be a response; if set to 0.0, the user will never send a response.

  • To control the waiting time, use the -tmin and -tmax parameters. The values are expressed in seconds; for less than a second, use decimals (e.g. 100ms is written as 0.1).

    -tmin is the minimum amount of time to wait after a request to the application.

    -tmax is the maximum amount of time to wait after a request to the application. The wait is randomly chosen between the -tmin and -tmax values. Higher values mean a slower generation of new data. The bigger the difference between these two parameters, the higher the variance in the waiting time.
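As a purely illustrative example (the values are arbitrary), a run with four parallel users, an 80% response probability, and waiting times between 100ms and 2 seconds could be launched as:

python traffic_generator.py --config config/user.tsv -p 4 -d 0.8 -tmin 0.1 -tmax 2.0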

Development

To develop this application, a Python virtual environment is highly recommended. If a development machine with Docker is not available, it is possible to use the three requirements files to create a fully working environment:

  • requirements.api.txt contains all the packages for the API service,
  • requirements.worker.txt contains all the packages for the Celery worker service,
  • requirements.txt contains extra packages and utilities required by scripts or for the development.

To create a virtual environment using the venv module, use the following command:

python -m venv MLPenv

Then remember to activate the environment before launching the scripts:

source ./MLPenv/bin/activate
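Once the environment is active, install the dependencies from the requirements files; for example, for a full local setup (pick only the files relevant to your use case):

pip install -r requirements.txt -r requirements.api.txt -r requirements.worker.txt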

References

FastAPI and database interaction

Metrics with Prometheus

Grafana

Disclaimer

This software was built as a proof of concept and as support material for the course Machine Learning in Production.

It is not intended to be used in a real production system, although some state-of-the-art best practices have been followed in its implementation.