# Overview

A machine learning model delivers value only once it is deployed to production. This guide walks through the entire process, from obtaining data to building, deploying, and assessing the business value of machine learning models.

# Data and Model Research

Before we can expose a machine learning model, we must do our due diligence: analyze the data and build the model. This research phase covers the following activities:
- [Data gathering](#Data-Gathering)
- [Data analysis](#Data-Analysis)
- [Feature engineering](#Feature-Engineering) (data pre-processing)
- [Feature selection](#Feature-Selection) (variable selection)
- [Model building](#Model-Building)
- [Model assessment](#Model-Assessment) (business uplift evaluation)

Typically, when we discuss deploying a model, we're referring to writing code and a deployment pipeline for these three stages:
- Feature engineering
- Feature selection
- Model building
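As a rough illustration of how these three stages fit together at prediction time, here is a minimal sketch in plain Python. The `Pipeline` class, the `fill_missing_age` and `linear_model` helpers, the feature names, and the weights are all hypothetical stand-ins, not part of any particular library or of this project's code.

```python
from typing import Callable, Dict, List, Optional

RawRow = Dict[str, Optional[float]]
Row = Dict[str, float]


class Pipeline:
    """Chains the three deployed stages behind a single predict() call."""

    def __init__(
        self,
        engineer: Callable[[RawRow], Row],
        selected_features: List[str],
        model: Callable[[List[float]], float],
    ) -> None:
        self.engineer = engineer
        self.selected_features = selected_features
        self.model = model

    def predict(self, raw_row: RawRow) -> float:
        # Stage 1: feature engineering on the raw record.
        engineered = self.engineer(raw_row)
        # Stage 2: keep only the features chosen during research.
        inputs = [engineered[name] for name in self.selected_features]
        # Stage 3: score the record with the trained model.
        return self.model(inputs)


# Hypothetical stand-ins for the real stages.
def fill_missing_age(row: RawRow) -> Row:
    out = dict(row)
    if out.get("age") is None:
        out["age"] = 40.0  # assumed value learned from training data
    return out


def linear_model(inputs: List[float]) -> float:
    weights = [0.5, 2.0]  # assumed coefficients for "age" and "income"
    return sum(w * x for w, x in zip(weights, inputs))


pipeline = Pipeline(fill_missing_age, ["age", "income"], linear_model)
prediction = pipeline.predict({"age": None, "income": 3.0, "unused": 9.0})
print(prediction)
```

Bundling the stages into one object means the production code applies exactly the same transformations that were used during research.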

# Data Gathering

Data gathering involves finding data sources and deciding how to make them available to data scientists. The key questions are:

- Where do we obtain the data?
- How frequently is it updated?
- Does the data provider offer support?
- Is the data source clean, complete, comprehensive, and unbiased?
- What is the cost of the data and does it fit into the budget?

# Data Analysis

- What is the data telling us?
- What are the variables?
- How are the variables related to one another?
- Which variables are we allowed to use (e.g. under regulations)?
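
A first pass at questions such as "What are the variables?" can be a per-variable summary. The sketch below uses only the Python standard library; the `rows` records and the `summarize` helper are made up for illustration.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000.0, "city": "Leeds"},
    {"age": None, "income": 61000.0, "city": "York"},
    {"age": 45, "income": None, "city": "Leeds"},
    {"age": 29, "income": 48000.0, "city": None},
]


def summarize(rows, column):
    """Return the missing count plus a type-appropriate summary of one variable."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    summary = {"missing": len(values) - len(present)}
    if all(isinstance(v, (int, float)) for v in present):
        # Numeric variable: report its range and mean.
        summary.update(min=min(present), max=max(present), mean=mean(present))
    else:
        # Categorical variable: report the distinct labels.
        summary["distinct"] = sorted(set(present))
    return summary


report = {column: summarize(rows, column) for column in rows[0]}
for column, summary in report.items():
    print(column, summary)
```

A report like this shows which variables are numeric or categorical and how much data is missing, which feeds directly into feature engineering.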

# Feature Engineering

The variables in our datasets can have a variety of problems, such as missing values or non-numeric categories. Feature engineering transforms the data before it is sent to an ML algorithm, for example by filling in missing values within a variable or by encoding categorical values and dates.
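
The transformations mentioned above can be sketched with the standard library alone. The records, column names, and the choice of median imputation and one-hot encoding are assumptions made for this example; other strategies may suit a given dataset better.

```python
from datetime import date
from statistics import median

# Hypothetical training records; None marks a missing value.
rows = [
    {"age": 34, "city": "Leeds", "signup": "2021-03-15"},
    {"age": None, "city": "York", "signup": "2020-11-02"},
    {"age": 46, "city": "Leeds", "signup": "2022-07-30"},
]

# Learn the transformation parameters from the training data.
age_median = median(r["age"] for r in rows if r["age"] is not None)
cities = sorted({r["city"] for r in rows})


def engineer(row):
    """Turn one raw record into a purely numeric feature dictionary."""
    features = {}
    # 1. Fill a missing value with the median learned above.
    features["age"] = row["age"] if row["age"] is not None else age_median
    # 2. One-hot encode the categorical variable.
    for city in cities:
        features[f"city_{city}"] = 1 if row["city"] == city else 0
    # 3. Split the date into numeric parts.
    signup = date.fromisoformat(row["signup"])
    features["signup_year"] = signup.year
    features["signup_month"] = signup.month
    return features


engineered = [engineer(r) for r in rows]
print(engineered[1])
```

The median and the list of categories are computed once from the training data and then reused, so new records in production are transformed in the same way.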

[More details here](./Feature-Engineering.ipynb)

# Feature Selection

Feature selection means determining which variables are the most predictive and building our model using only those.

We start with all of the variables in our dataset, then eliminate the features that contribute little to the model's predictive performance.
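
One simple way to do this elimination is a correlation filter: score every feature against the target and drop the weak ones. This is only one of many selection techniques, and the `pearson` and `select_features` helpers, the 0.5 threshold, and the data below are assumptions chosen for illustration. Note that linear correlation can miss non-linear relationships.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(columns, target, threshold=0.5):
    """Start from all columns and drop those weakly correlated with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return [name for name, score in scores.items() if score >= threshold]


# Hypothetical dataset: three candidate features and a target.
columns = {
    "rooms": [2, 3, 4, 5, 6],
    "constant": [1, 1, 1, 1, 1],
    "noise": [3, 1, 2, 1, 3],
}
price = [100, 150, 200, 250, 300]

selected = select_features(columns, price)
print(selected)
```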

[More details here](./Feature-Selection.ipynb)

# Model Building

Train several different ML algorithms, analyze their performance, and choose the ones that give the best results.

In this stage, we use statistical metrics to measure model performance. For example, we might use:
- Mean squared error (regression)
- Accuracy or area under the ROC curve, ROC-AUC (classification)
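
To make these metrics concrete, here is a minimal pure-Python sketch of each. The function names and sample values are illustrative; in practice an established library would normally provide these. The `roc_auc` helper uses the pairwise-comparison definition of ROC-AUC, which is fine for small samples but slow for large ones.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions (regression)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half. Quadratic in the number of samples.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Regression example.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# Classification example: scores are thresholded at 0.5 to get labels.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
acc = accuracy(labels, [1 if s >= 0.5 else 0 for s in scores])
auc = roc_auc(labels, scores)
print(mse, acc, auc)
```

Accuracy depends on the chosen threshold, while ROC-AUC only depends on how the scores rank positives against negatives.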

# Model Assessment

The entire point of creating a machine learning model is to provide business value, so we ultimately have to go back and measure the uplift in our business metrics.
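
As a small worked example, uplift can be expressed as the relative change of a business metric compared with a baseline. The conversion counts below are invented, and comparing a control group with a model-driven group is just one possible setup for this measurement.

```python
def relative_uplift(baseline, with_model):
    """Relative change of a business metric versus the baseline."""
    if baseline == 0:
        raise ValueError("baseline metric must be non-zero")
    return (with_model - baseline) / baseline


# Hypothetical numbers: conversions out of visitors for each group.
control_conversions, control_visitors = 200, 10_000
model_conversions, model_visitors = 230, 10_000

control_rate = control_conversions / control_visitors
model_rate = model_conversions / model_visitors
uplift = relative_uplift(control_rate, model_rate)
print(f"Conversion uplift: {uplift:.1%}")
```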

Topics to add to this section:
- Confusion Matrix
- Grid Search 
- K-fold Cross Validation
- XGBoost

# Model Deployment

# Production Environment

# References

- [Deployment of Machine Learning Models - Udemy](https://www.udemy.com/course/deployment-of-machine-learning-models)