
MLOps pipeline for a classification problem

This repository contains the source code and configuration files for setting up an MLOps pipeline that can be automatically triggered to train and deploy the best model using AWS services.
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Roadmap
  5. Contributing
  6. License
  7. Contact
  8. Acknowledgments

About The Project


(back to top)

Built With

  • PyTorch
  • MLflow
  • Airflow
  • DVC
  • Docker
  • Python
  • AWS

(back to top)

Getting Started

Through this guide, I will explain how to use AWS services and a few supporting tools to create an MLOps pipeline that is triggered to train and deploy every time a commit whose message contains the term "Airflow" is pushed.
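For illustration only, the trigger condition boils down to a check like the one below; this helper is a hypothetical sketch, not the actual pipeline code:

      import subprocess

      def latest_commit_triggers_pipeline(keyword: str = "Airflow") -> bool:
          """Return True if the latest commit message contains the trigger keyword."""
          message = subprocess.run(
              ["git", "log", "-1", "--pretty=%B"],
              capture_output=True, text=True, check=True,
          ).stdout
          return keyword in message

      if __name__ == "__main__":
          print(latest_commit_triggers_pipeline())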

Classification problem

A resnet18 neural network is used for image classification. The dataset is placed in ./ml/data/train, ./ml/data/valid and ./ml/data/test, and the configuration files for training and testing are placed in ./ml/configs. I take the flower dataset as an example for this project.
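For reference, here is a minimal training-setup sketch in PyTorch, assuming the data directories follow torchvision's ImageFolder layout (one subfolder per flower class); the real training script and its config handling live under ./ml/ and may differ:

      import torch
      from torch import nn
      from torchvision import datasets, models, transforms

      # Basic preprocessing; the real values come from the configs in ./ml/configs.
      tfm = transforms.Compose([
          transforms.Resize((224, 224)),
          transforms.ToTensor(),
      ])

      # One ImageFolder per split, matching the directory layout described above.
      train_ds = datasets.ImageFolder("ml/data/train", transform=tfm)
      valid_ds = datasets.ImageFolder("ml/data/valid", transform=tfm)
      train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

      # resnet18 with its final layer resized to the number of flower classes.
      model = models.resnet18(weights=None)
      model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))

      criterion = nn.CrossEntropyLoss()
      optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)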

Code and Data versioning

DVC is used for data versioning and Git is used for code versioning. The data cache is stored in S3 remote storage.

  • Install DVC:

      pip install dvc
      pip install dvc-s3
  • Use DVC to track your data. Remember to set up your AWS credentials using ./server_setup/aws_config and copy all files in that folder to ~/.aws/. A Python sketch for reading the versioned data follows these commands.

      dvc add ml/data/* # DVC tracks the data and writes .dvc metadata files
      git add ml/data/*.dvc ml/data/.gitignore
      git commit -m "Add raw data" # Git tracks the metadata of the dataset
      dvc remote add dvc-flower-bucket s3://dvc-flower-bucket # S3 bucket created for storing the data cache
      dvc remote modify dvc-flower-bucket profile duongpd7
      dvc push -r dvc-flower-bucket
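Once the data has been pushed, a specific version of any tracked file can be read straight from the remote through DVC's Python API. A minimal sketch; the image path below is hypothetical, substitute any file tracked under ml/data/:

      import dvc.api

      path = "ml/data/train/daisy/0001.jpg"  # hypothetical example file

      # Resolve where this exact version of the file lives in the S3 cache.
      url = dvc.api.get_url(
          path,
          repo="https://github.com/phandaiduonghcb/mlops",
          rev="main",  # any branch, tag or commit tracked by Git
          remote="dvc-flower-bucket",
      )
      print(url)

      # Or stream the file contents directly without a full `dvc pull`.
      with dvc.api.open(path, repo=".", mode="rb") as f:
          data = f.read()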

Setup an EC2 server

Airflow and MLflow are installed on an EC2 server. The following ports are used on the EC2 instance:

  • 8080: Airflow webserver
  • 1234: MLflow tracking server
  • 4321: deployed model endpoint
  • 6000: used to trigger the deployment process when a model is put to S3
  • Create the working directories and prepare the folders for volume mounting. After creating the folders, copy the files from server_setup/:

      mkdir workspace # Stores logs, models and artifacts created by Airflow and MLflow
      mkdir deployment # Used to build and run the deployment Docker image
      cd workspace
      mkdir -p ./dags ./logs ./plugins ./config ./mlruns ./training_runs # Created manually to avoid permission issues
  • Now run the Airflow and MLflow servers using Docker Compose (a sketch of logging to the tracking server follows this list):

      echo -e "AIRFLOW_UID=$(id -u)" > .env
      docker compose up
  • Set up port 6000 to listen for requests sent by the Lambda function triggered by S3:

      apt install xinetd
      # After installing xinetd, copy server_setup/scripts/xinetd-deployment-trigger to /etc/xinetd.d/
      # and change its "server" path to the location of server_setup/scripts/trigger-deployment.sh on the EC2 instance.
      systemctl start xinetd
      systemctl enable xinetd
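Once both servers are up, training code launched by the Airflow DAG can log parameters, metrics and models to the MLflow tracking server on port 1234. A minimal sketch; the host placeholder and the experiment name are assumptions, not values from this repository:

      import mlflow

      # Placeholder host; use the public DNS or IP of the EC2 instance.
      mlflow.set_tracking_uri("http://<ec2-public-ip>:1234")
      mlflow.set_experiment("flower-classification")  # hypothetical experiment name

      with mlflow.start_run():
          mlflow.log_param("model", "resnet18")
          mlflow.log_metric("val_accuracy", 0.93)  # example value only
          # mlflow.pytorch.log_model(model, "model")  # log the trained network as an artifact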
    

Setup S3 - Lambda for triggering

S3 is configured to trigger a Lambda function whenever a model.zip file is put into the bucket. The Lambda function sends a request to the EC2 server on port 6000 to tell it to rebuild and deploy the uploaded model. The Lambda function should be created from a Docker image; its source is stored in deployment/lambda. The app.py file should be modified to point to the correct URL of the EC2 instance.
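For orientation, here is a minimal sketch of such a handler, assuming the EC2 address is passed through a hypothetical EC2_HOST environment variable and that simply connecting to port 6000 is enough for xinetd to run the deployment script; the actual deployment/lambda/app.py may look different:

      import os
      import socket

      EC2_HOST = os.environ.get("EC2_HOST", "<ec2-public-ip>")  # hypothetical env var

      def handler(event, context):
          # Each record describes the model.zip object that was just uploaded to S3.
          for record in event.get("Records", []):
              bucket = record["s3"]["bucket"]["name"]
              key = record["s3"]["object"]["key"]
              # Open a TCP connection to port 6000; xinetd runs trigger-deployment.sh for it.
              with socket.create_connection((EC2_HOST, 6000), timeout=10) as conn:
                  conn.sendall(f"{bucket}/{key}\n".encode())
          return {"statusCode": 200}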

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Contact

Phan Dai Duong - phandaiduonghcb@gmail.com

Project Link: https://github.com/phandaiduonghcb/mlops

(back to top)
