Skip to content

This repository provides a comprehensive ML infrastructure for CTR prediction, focusing on AWS services and offering practical learning experience for MLOps.

Notifications You must be signed in to change notification settings

SilkZwx/aws-mlops-handson

 
 

Repository files navigation

AWS MLOps Handson

This repository is designed to provide a comprehensive ML infrastructure for CTR (Click-Through Rate) prediction.
With a focus on AWS services, this repository offer practical learning experience for MLOps.

Key Features

Python Development Environment

We guide you through setting up a Python development environment that ensures code quality and maintainability.
This environment is carefully configured to enable efficient development practices and facilitate collaboration.

Train Pipeline

This repository includes the implementation of a training pipeline.
This pipeline covers the stages, including data preprocessing, model training, and evaluation.

Prediction Server

This repository provides an implementation of a prediction server that serves predictions based on your trained CTR prediction model.

AWS Deployment

To showcase industry-standard practices, this repository guide you in deploying the training pipeline and inference server on AWS.

AWS Infra Architecture

AWS Infra Architecture made by this repository.

ML Pipeline

ml_pipeline

Predict Server

predict_server

Requirements

Software Install (Mac)
pyenv brew install pyenv
Poetry curl -sSL https://install.python-poetry.org | python3 -
direnv brew install direnv
Terraform brew install terraform
Docker install via dmg
awscli curl "https://awscli.amazonaws.com/AWSCLIV2.pkg" -o "AWSCLIV2.pkg"

Setup

Install Python Dependencies

Use pyenv to install Python 3.9.0 environment

$ pyenv install 3.9.0
$ pyenv local 3.9.0

Use poetry to install library dependencies

$ poetry install

Configure environment variable

Use direnv to configure environment variable

$ cp .env.example .env
$ direnv allow .

Set your environment variable setting

AWS_REGION=
AWS_ACCOUNT_ID=
AWS_PROFILE=
AWS_BUCKET=
AWS_ALB_DNS=
USER_NAME=
VERSION=2023-05-11
MODEL=sgd_classifier_ctr_model

Create AWS Resources

move current directory to infra

$ cd infra

Use terraform to create aws resources.
Apply terraform

$ terraform init
$ terraform apply

Prepare train data

unzip train data

$ unzip train_data.zip

upload train data to S3

$ aws s3 mv train_data s3://$AWS_BUCKET

Code static analysis tool

Tool Usage
isort library import statement check
black format code style
flake8 code quality check
mypy static type checking
pysen manage static analysis tool

Usage

Build ML Pipeline

$ make build-ml

Run ML Pipeline

$ make run-ml

Build Predict API

$ make build-predictor

Run Predict API locally

$ make up

Shutdown Predict API locally

$ make down

Run formatter

$ make format

Run linter

$ make lint 

Run pytest

$ make test 

About

This repository provides a comprehensive ML infrastructure for CTR prediction, focusing on AWS services and offering practical learning experience for MLOps.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 43.1%
  • Jupyter Notebook 26.5%
  • HCL 25.3%
  • Makefile 4.5%
  • Dockerfile 0.6%