Repo of repeatable ML model pipeline development and deployment

Deploying Machine Learning Models

Repo of repeatable ML model pipeline from development through to deployment.

Projects

  1. Predict the sale price of houses using Lasso regression
  2. Differentiate weeds from crop seedlings using a convolutional neural network (CNN)

Machine Learning Pipeline: Overview

Basic Architecture Design

High-level overview of the architecture and tools used for each phase/environment.

Process

  1. Data Gathering
  2. Data Analysis
  3. Feature Engineering - deal with missing data, convert categorical variables (e.g. high cardinality, rare labels, strings), assess data distributions (e.g. skewed data), handle outliers (they can distort algorithms that are sensitive to them and contribute to overfitting), and scale features (some ML models are sensitive to feature scales - see NOTES below)
  4. Feature Selection - algorithms/procedures to identify the most predictive features and limit models to only those features that add value, provide the greatest interpretability, and are easier to implement in Production (reduced risk of data errors, reduced data redundancy (constant variables, quasi-constant variables, duplication, correlation), less code, smaller JSON messages, etc.)
  5. ML Model Building - [A] regression, classification, or clustering; [B] linear, tree, neural network, etc.; [C] supervised, unsupervised, or reinforcement learning (all choices depend on the business problem to be solved). Can often include Meta-Ensembling (a model of models)
  6. ML Model Assessment - evaluate uplift in business value in addition to metrics such as ROC-AUC, accuracy, MSE, RMSE, and MAE
  7. Model Deployment - includes Feature Engineering, Feature Selection, and ML Model components (not just the ML Model)
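
Step 7 is the one most easily overlooked: the artefact that ships has to carry the fitted feature engineering and feature selection along with the model. The sketch below illustrates that idea in plain Python. Every name in it (`FittedPipeline`, `area`, `rooms`, the coefficients) is hypothetical and not taken from this repo; in a scikit-learn project the same role is usually played by `sklearn.pipeline.Pipeline`.

```python
import json
from dataclasses import dataclass, asdict
from statistics import median


@dataclass
class FittedPipeline:
    """Everything needed at prediction time, not just the model weights."""
    selected_features: list   # result of feature selection
    fill_values: dict         # result of feature engineering (imputation)
    coefficients: dict        # result of model building
    intercept: float

    def predict(self, row):
        total = self.intercept
        for name in self.selected_features:
            value = row.get(name)
            if value is None:              # apply the same imputation as in training
                value = self.fill_values[name]
            total += self.coefficients[name] * value
        return total


def fit_fill_values(rows, features):
    """Learn per-feature medians from training rows, ignoring missing values."""
    return {
        f: median(r[f] for r in rows if r.get(f) is not None)
        for f in features
    }


# Hypothetical training rows and hand-picked coefficients, for illustration only.
train_rows = [
    {"area": 50.0, "rooms": 2},
    {"area": 80.0, "rooms": None},
    {"area": 110.0, "rooms": 4},
]
features = ["area", "rooms"]
pipeline = FittedPipeline(
    selected_features=features,
    fill_values=fit_fill_values(train_rows, features),
    coefficients={"area": 2.0, "rooms": 10.0},
    intercept=5.0,
)

# The whole pipeline round-trips through JSON, so it can be versioned and shipped as one unit.
restored = FittedPipeline(**json.loads(json.dumps(asdict(pipeline))))
prediction = restored.predict({"area": 100.0, "rooms": None})
```

Because the imputation values travel with the coefficients, a request with a missing `rooms` value is handled at prediction time exactly as it was during training.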

NOTES

ML models sensitive to feature scales:

  • Linear & Logistic Regression
  • Neural Networks
  • Support Vector Machines
  • K-Nearest Neighbours
  • K-Means Clustering
  • Linear Discriminant Analysis (LDA)
  • Principal Component Analysis (PCA)

Tree-based ML models insensitive to feature scales:

  • Classification & Regression Trees
  • Random Forests
  • Gradient Boosted Trees
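
To make the distinction above concrete, here is a minimal sketch with made-up numbers: a 1-nearest-neighbour prediction changes once the features are standardised, while a threshold rule of the kind used in a tree split does not care about the scale. The helper names (`nearest_label`, `standardise`) are illustrative only.

```python
import math
from statistics import mean, pstdev


def nearest_label(train, labels, query):
    """1-nearest-neighbour prediction using Euclidean distance."""
    distances = [math.dist(point, query) for point in train]
    return labels[distances.index(min(distances))]


def standardise(train, points):
    """Scale each feature to zero mean and unit variance using training statistics."""
    columns = list(zip(*train))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, mus, sigmas)) for p in points]


# Feature 0 spans 0..1, feature 1 spans 0..1000 (made-up values).
train = [(0.0, 1000.0), (1.0, 0.0)]
labels = ["a", "b"]
query = (0.1, 400.0)

# On raw data the large-range feature dominates the distance.
raw_prediction = nearest_label(train, labels, query)

# After standardisation both features contribute comparably.
scaled_prediction = nearest_label(
    standardise(train, train), labels, standardise(train, [query])[0]
)

# A threshold rule depends only on the ordering of values, so rescaling
# the feature and the threshold together leaves the decision unchanged.
split_raw = query[1] <= 500.0
split_rescaled = query[1] / 1000.0 <= 500.0 / 1000.0
```

Here `raw_prediction` is `"b"` and `scaled_prediction` is `"a"`, whereas the two split decisions agree.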

Feature Engineering: Overview

  • TODO - Add content
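
Until this section is written, the following sketch illustrates two of the steps named under Feature Engineering in the Process list: grouping rare labels and compressing a skewed variable. It is an assumption-laden example in plain Python, not the code used in this repo; the function names, the `"Rare"` token, and the data are invented for illustration.

```python
import math
from collections import Counter


def group_rare_labels(values, min_share=0.05, rare_token="Rare"):
    """Replace categories whose share of rows is below min_share with one token."""
    counts = Counter(values)
    total = len(values)
    frequent = {label for label, n in counts.items() if n / total >= min_share}
    return [v if v in frequent else rare_token for v in values]


def log_transform(values):
    """Compress a right-skewed, non-negative variable with log(1 + x)."""
    return [math.log1p(v) for v in values]


# Made-up categorical column: "Docks" appears in only 1 of 20 rows.
neighbourhoods = ["North"] * 12 + ["South"] * 7 + ["Docks"]
grouped = group_rare_labels(neighbourhoods, min_share=0.1)

# Made-up skewed numeric column: one value is far larger than the rest.
prices = [100_000, 120_000, 150_000, 2_000_000]
log_prices = log_transform(prices)
```

In production, the set of frequent labels would need to be learned on the training data and stored, so that unseen labels at prediction time can be mapped to the same rare token.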

Feature Selection: Overview

  • TODO - Add content
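
Until this section is written, the sketch below illustrates the redundancy checks named under Feature Selection in the Process list: dropping constant, quasi-constant, and duplicated features. It is a plain-Python illustration under assumed names and thresholds (`select_features`, `quasi_constant_share`), not the procedure used in this repo, and it does not cover correlation-based filtering.

```python
from collections import Counter


def select_features(columns, quasi_constant_share=0.95):
    """Return names of columns that are not (quasi-)constant and not duplicates.

    `columns` maps a feature name to its list of values.
    """
    kept, seen = [], set()
    for name, values in columns.items():
        # Share of rows taken by the single most frequent value.
        top_share = Counter(values).most_common(1)[0][1] / len(values)
        if top_share >= quasi_constant_share:   # constant or quasi-constant
            continue
        signature = tuple(values)
        if signature in seen:                   # exact duplicate of a kept column
            continue
        seen.add(signature)
        kept.append(name)
    return kept


# Made-up dataset with 20 rows.
columns = {
    "area": list(range(20)),
    "area_copy": list(range(20)),    # duplicate of "area"
    "country": ["UK"] * 20,          # constant
    "has_pool": [0] * 19 + [1],      # quasi-constant (95% zeros)
    "rooms": [2, 3] * 10,
}
selected = select_features(columns, quasi_constant_share=0.9)
```

With these inputs only `area` and `rooms` survive, which is the kind of reduction that leads to less code and smaller JSON messages in Production.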

Project Details

1. Predict the sale price of houses using Lasso regression

The goal of this project is to predict the sale prices of houses based on key features related to the property.
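
For readers unfamiliar with Lasso, the sketch below shows why it suits this kind of problem: the L1 penalty shrinks weak coefficients to exactly zero, so the model performs some feature selection on its own. This is a minimal coordinate-descent illustration on made-up data, assuming centred features and no intercept; it is not the model code in this repo, which would normally rely on a library implementation such as scikit-learn's `Lasso`.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + alpha * ||w||_1 without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Made-up, centred data: y depends strongly on feature 0 and weakly on feature 1.
X = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]
y = [3.0 * a + 0.2 * b for a, b in X]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

The strong coefficient is shrunk from 3.0 to about 2.5, and the weak one is set to exactly zero because its correlation with the residual (0.2) is below `alpha`.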

1.1. Prerequisites

1.2. Process

  • Set up GitHub repo for source code control (this repo).
  • Initial data gathering/analysis, feature engineering/selection, model building/evaluation done locally in Jupyter notebooks.
  • Move to IDE (VS Code) to convert the logic in the notebooks to a deployable/scalable model using .py files.
  • Wrap model in a Flask application with testing, versioning, and packaging for deployment and re-use.
  • Add CI/CD through the inclusion of Gemfury (ML model packaging) and CircleCI (automated testing and deployment).
  • Deploy to Production on Heroku (PaaS), without containers, automatically once the CI/CD pipeline runs successfully.
  • Dockerize the model and app for deployment to Production.
    • TODO: Deploy model as Docker container to Heroku - requires a plan upgrade on Heroku to allow for Docker Layer caching.
    • TODO: Deploy model as Docker container to AWS (IaaS).
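
The step of wrapping the model in a web application can be pictured with the framework-agnostic sketch below: validate the incoming JSON, call the model, and return the predictions together with the model version. The schema, version string, and stand-in model are all hypothetical; the real application in this repo may be structured differently. In a Flask view, the returned pair could be passed on as something like `jsonify(response), status`.

```python
import json

MODEL_VERSION = "0.1.0"               # hypothetical version string
REQUIRED_FIELDS = ("area", "rooms")   # hypothetical input schema


def predict_price(row):
    """Stand-in for the trained pipeline; coefficients are made up."""
    return 5.0 + 2.0 * row["area"] + 10.0 * row["rooms"]


def handle_predict(body):
    """Validate a JSON request body and return (response, HTTP status code)."""
    try:
        rows = json.loads(body)
    except json.JSONDecodeError:
        return {"errors": ["body is not valid JSON"]}, 400
    if not isinstance(rows, list):
        return {"errors": ["expected a JSON list of rows"]}, 400

    errors = []
    for index, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            value = row.get(field) if isinstance(row, dict) else None
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"row {index}: '{field}' must be a number")
    if errors:
        return {"errors": errors}, 400

    predictions = [predict_price(row) for row in rows]
    return {"predictions": predictions, "version": MODEL_VERSION}, 200


ok_response, ok_status = handle_predict('[{"area": 100, "rooms": 3}]')
bad_response, bad_status = handle_predict('[{"area": 100}]')
```

Returning the version with every prediction makes it possible to trace a result back to the packaged model that produced it, and keeping the handler free of framework code makes it straightforward to unit test in CI.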

2. Differentiate weeds from crop seedlings using a convolutional neural network (CNN)

The goal of this project is to correctly identify the weed type from a variety of weed and crop RGB images using a relatively large dataset (~2GB).

2.1. Prerequisites

2.2. Process

  • Initial data gathering/analysis, feature engineering/selection, model building/evaluation done locally in Jupyter notebooks.
  • TODO: Move to IDE (VS Code) to convert the logic in the notebooks to a deployable/scalable model using .py files.
  • TODO: Leverage the same Flask application created for Project #1 for testing, versioning, and packaging for deployment and re-use.
  • TODO: Leverage the same CI/CD pipeline created for Project #1 for version control, automated ML model packaging, and automated testing and deployment.
  • TODO: Deploy to Production.