## Predictive Modeling Project Template

> “Applied machine learning is an empirical skill. You cannot get better at it by reading books and articles. You have to practice. We present a six-step machine learning project template that you can use to jump-start your project in Python.”

> The best way to get started using Python for machine learning is to complete a project.

- It will force you to install and start the Python interpreter (at the very least).

- It will give you a bird’s-eye view of how to step through a small project.

- It will give you confidence, perhaps enough to go on to your own small projects.


## Use A Structured Step-By-Step Process

Any predictive modeling machine learning project can be broken down into six common tasks:

1. Define Problem.
2. Summarize Data.
3. Prepare Data.
4. Evaluate Algorithms.
5. Improve Results.
6. Present Results.

## Machine Learning Project Template in Python

<center>


![Machine learning project template](./images/ml_project_template.png)

</center>


### How To Use The Project Template

1. Create a new file for your project (e.g. `project_name.py`). <br><br>
2. Copy the project template. <br><br>
3. Paste it into your empty project file. <br><br>
4. Start to fill it in, using recipes from this book and others.
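If you want a plain-text starting point to paste, a bare skeleton might look like the following. It is a sketch, not a copy of the template pictured above: it simply turns the six template steps described in the next section into hypothetical stub functions (all names here are illustrative) that you replace, one at a time, with real code.

```python
# project_name.py: a skeleton following the six template steps.
# All function names are hypothetical placeholders; fill each one in as you go.


def prepare_problem():
    """1. Prepare Problem: load libraries and load the dataset (e.g. from CSV)."""
    dataset = None  # replace with your loading code, e.g. pandas.read_csv(...)
    return dataset


def summarize_data(dataset):
    """2. Summarize Data: descriptive statistics and data visualizations."""


def prepare_data(dataset):
    """3. Prepare Data: data cleaning, feature selection and data transforms."""
    return dataset


def evaluate_algorithms(dataset):
    """4. Evaluate Algorithms: split out a validation dataset, define test
    options and an evaluation metric, spot-check and compare algorithms."""
    shortlist = []  # names of the 3-to-5 best-performing algorithms
    return shortlist


def improve_accuracy(dataset, shortlist):
    """5. Improve Accuracy: algorithm tuning and ensembles."""
    best_model = None
    return best_model


def finalize_model(dataset, best_model):
    """6. Finalize Model: predict on the validation dataset, create a
    standalone model and save it to file for later use."""


def main():
    # Run the steps in order; in practice you will loop back between them.
    dataset = prepare_problem()
    summarize_data(dataset)
    dataset = prepare_data(dataset)
    shortlist = evaluate_algorithms(dataset)
    best_model = improve_accuracy(dataset, shortlist)
    finalize_model(dataset, best_model)


if __name__ == "__main__":
    main()
```

Keeping each step in its own function makes it easy to re-run a single stage while you cycle between steps.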

### Machine Learning Project Template Steps

#### Prepare Problem

This step is about loading everything you need to start working on your problem. This includes:

  - Python modules, classes and functions that you intend to use.

  - Loading your dataset from CSV.

This is also the place where you might need to make a reduced sample of your dataset if it is too large to work with. Ideally, your dataset should be small enough to build a model or create a visualization within a minute, or better still within 30 seconds. You can always scale up well-performing models later.
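A minimal sketch of this step, assuming pandas is your loading tool: the inline CSV text below is a hypothetical stand-in for your own file (in a real project you would pass a path such as `"your_data.csv"` to `pandas.read_csv`), and `frac=0.5` is an arbitrary example of how much data to keep when down-sampling.

```python
# Prepare Problem: load libraries, load the dataset, optionally down-sample it.
from io import StringIO

import pandas as pd

# Hypothetical inline CSV standing in for a real file on disk.
CSV_TEXT = """x1,x2,label
5.1,3.5,0
4.9,3.0,0
4.7,3.2,0
4.6,3.1,0
5.0,3.6,0
7.0,3.2,1
6.4,3.2,1
6.9,3.1,1
5.5,2.3,1
6.5,2.8,1
"""

# With a real file: dataset = pd.read_csv("your_data.csv")
dataset = pd.read_csv(StringIO(CSV_TEXT))
print(dataset.shape)

# Work on a reproducible random sample while iterating quickly;
# scale back up to the full dataset once a model performs well.
sample = dataset.sample(frac=0.5, random_state=7)
print(sample.shape)
```

Fixing `random_state` means the reduced sample is the same on every run, so results from one iteration can be compared with the next.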

#### Summarize Data

This step is about better understanding the data that you have available. This includes understanding your data using:  

- Descriptive statistics such as summaries.  
  
- Data visualizations such as plots with Matplotlib, ideally using convenience functions from Pandas. 

Take your time and use the results to prompt a lot of questions, assumptions and hypotheses that you can investigate later with specialized models.
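The sketch below shows one way to produce these summaries with pandas. The randomly generated columns `x1`, `x2` and `label` are hypothetical placeholders for your own data, and the non-interactive `Agg` backend is chosen only so the script runs without a display.

```python
# Summarize Data: descriptive statistics and pandas plotting helpers.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to see plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical placeholder data: two numeric inputs and a binary class label.
rng = np.random.default_rng(seed=1)
dataset = pd.DataFrame(
    {
        "x1": rng.normal(loc=0.0, scale=1.0, size=100),
        "x2": rng.uniform(low=0.0, high=10.0, size=100),
        "label": rng.integers(low=0, high=2, size=100),
    }
)

# Descriptive statistics
print(dataset.shape)                            # rows, columns
print(dataset.head())                           # first few rows
summary = dataset.describe()                    # count, mean, std, min, quartiles, max
print(summary)
class_counts = dataset.groupby("label").size()  # class distribution
print(class_counts)
correlations = dataset.corr(method="pearson")   # pairwise correlations
print(correlations)

# Data visualizations via pandas convenience functions (Matplotlib underneath)
dataset.hist()                                  # one histogram per column
pd.plotting.scatter_matrix(dataset)             # pairwise scatter plots
# Call plt.show() in an interactive session; here the figures are just closed.
plt.close("all")
```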

#### Prepare Data

This step is about preparing the data in such a way that it best exposes the structure of the problem and the relationships between your input attributes and the output variable. This includes tasks such as:

- Cleaning data by removing duplicates, marking missing values and even imputing missing values.<br><br>
- Feature selection, where redundant features may be removed and new features developed. <br><br>
- Data transforms, where attributes are scaled or redistributed to best expose the structure of the problem to the learning algorithms later on.

Start simple. Revisit this step often and cycle with the next step until you converge on a subset of algorithms and a presentation of the data that results in accurate or accurate-enough models to proceed.
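A small sketch of the cleaning and transform tasks using pandas and scikit-learn. Treating `0` as a missing-value marker is a hypothetical assumption about this toy data (for many real measurements zero is a valid value), and median imputation followed by standardization is just one reasonable default, not the only option.

```python
# Prepare Data: cleaning (duplicates, missing values) and a scaling transform.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one duplicate row and zeros used as "missing" markers.
dataset = pd.DataFrame(
    {
        "x1": [1.0, 2.0, 2.0, 0.0, 4.0, 5.0],
        "x2": [10.0, 20.0, 20.0, 30.0, 0.0, 50.0],
        "label": [0, 1, 1, 0, 1, 0],
    }
)

# Remove exact duplicate rows (the third row repeats the second).
dataset = dataset.drop_duplicates()

# Mark missing values: assume (hypothetically) that 0 means "not recorded".
X = dataset[["x1", "x2"]].replace(0.0, np.nan)
y = dataset["label"].to_numpy()

# Impute each missing value with its column median.
imputer = SimpleImputer(strategy="median")
X_imputed = imputer.fit_transform(X)

# Standardize each column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_imputed)
print(X_scaled)
```

In a real project, fit the imputer and scaler on the training data only (for example inside a scikit-learn `Pipeline`), so that statistics from the validation data do not leak into the model.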

#### Evaluate Algorithms

This step is about finding a subset of machine learning algorithms that are good at exploiting the structure of your data (e.g. those with better than average skill). This involves steps such as:

- Separating out a validation dataset to use for later confirmation of the skill of your developed model. <br><br>
- Defining test options using scikit-learn, such as cross-validation and the evaluation metric to use. <br><br>
- Spot-checking a suite of linear and nonlinear machine learning algorithms. <br><br>
- Comparing the estimated accuracy of algorithms.

On a given problem you will likely spend most of your time on this and the previous step until you converge on a set of 3-to-5 well-performing machine learning algorithms.
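The following sketch spot-checks a handful of linear and nonlinear classifiers with 10-fold cross-validation. It uses scikit-learn's bundled iris dataset purely as a stand-in for your own data; the 80/20 validation split, `random_state=7` and the particular list of algorithms are illustrative choices, not requirements.

```python
# Evaluate Algorithms: validation split, test harness, spot-check, compare.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset bundled with scikit-learn

# a) Hold back 20% of the data as a validation set for final confirmation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# b) Test options: 10-fold cross-validation scored by classification accuracy.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scoring = "accuracy"

# c) Spot-check a mix of linear and nonlinear algorithms with default settings.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
    "NB": GaussianNB(),
    "SVM": SVC(),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")

# d) Compare: rank algorithms by mean cross-validated accuracy.
ranking = sorted(results, key=lambda name: results[name].mean(), reverse=True)
print("Ranking:", ranking)
```

The validation arrays `X_val` and `y_val` are deliberately left untouched here; they are only used once a final model has been chosen.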

#### Improve Accuracy

Once you have a shortlist of machine learning algorithms, you need to get the most out of them. There are two different ways to improve the accuracy of your models:

- Use scikit-learn to search for the combination of parameters for each algorithm that yields the best results.

- Combine the predictions of multiple models into an ensemble prediction using ensemble techniques.

The line between this and the previous step can blur when a project becomes concrete. There may be a little algorithm tuning in the previous step. And in the case of ensembles, you may bring more than a shortlist of algorithms forward to combine their predictions.
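Both approaches can be sketched with scikit-learn as below: `GridSearchCV` for the parameter search and `VotingClassifier` as one example of an ensemble technique. The iris data, the parameter grid and the choice of ensemble members are hypothetical placeholders; in a real project you would run this on the training portion only, keeping the validation set aside.

```python
# Improve Accuracy: (a) algorithm tuning with a grid search, (b) an ensemble.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for your training data
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# a) Tuning: try every combination in the grid and keep the best by CV accuracy.
pipeline = make_pipeline(StandardScaler(), SVC())
param_grid = {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]}
grid = GridSearchCV(pipeline, param_grid, cv=kfold, scoring="accuracy")
grid.fit(X, y)
print("Best:", grid.best_score_, grid.best_params_)

# b) Ensemble: combine the predictions of several models by majority (hard) vote.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", grid.best_estimator_),  # cloned and refitted by VotingClassifier
        ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
    ],
    voting="hard",
)
ensemble_scores = cross_val_score(ensemble, X, y, cv=kfold, scoring="accuracy")
print(f"Ensemble: {ensemble_scores.mean():.3f} ({ensemble_scores.std():.3f})")
```

Wrapping the scaler and the model in one pipeline means the grid search refits the scaler inside each fold, which keeps the cross-validation estimate honest.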

#### Finalize Model

Once you have found a model that you believe can make accurate predictions on unseen data, you are ready to finalize it. Finalizing a model may involve sub-tasks such as:  

- Using an optimal model tuned by scikit-learn to make predictions on unseen data.  
- Creating a standalone model using the parameters tuned by scikit-learn.
- Saving an optimal model to file for later use. 
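A sketch of these sub-tasks, again using the iris data as a stand-in. The hyperparameters `C=1.0` and `kernel="rbf"` are hypothetical placeholders for whatever your tuning step selected, and the standard-library `pickle` module is used for saving (scikit-learn's documentation also describes `joblib` for model persistence).

```python
# Finalize Model: standalone model, predictions on unseen data, save and reload.
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=7, stratify=y
)

# Standalone model built from (placeholder) tuned parameters, fit on all training data.
final_model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
final_model.fit(X_train, y_train)

# Confirm skill on the held-back validation set (unseen during development).
val_predictions = final_model.predict(X_val)
val_accuracy = accuracy_score(y_val, val_predictions)
print(f"Validation accuracy: {val_accuracy:.3f}")

# Save the model to file and load it back for later use.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "final_model.pkl"
    with model_path.open("wb") as f:
        pickle.dump(final_model, f)
    with model_path.open("rb") as f:
        loaded_model = pickle.load(f)
    reloaded_predictions = loaded_model.predict(X_val)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.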
  
Once you make it this far you are ready to present results to stakeholders and/or deploy your model to start making predictions on unseen data.

### Tips For Using The Template Well

- **Fast First Pass.** Make a first pass through the project steps as fast as possible. This will give you confidence that you have all the parts that you need and a baseline from which to improve. <br><br>
- **Cycles.** The process is not linear but cyclic. You will loop between steps, and probably spend most of your time in tight loops between steps 3-4 or 3-4-5 until you achieve a level of accuracy that is sufficient or you run out of time. <br><br>
- **Attempt Every Step.** It is easy to skip steps, especially if you are not confident or familiar with the tasks of that step. Try to do something at each step in the process, even if it does not improve accuracy. You can always build upon it later. Don’t skip steps, just reduce their contribution. <br><br>
- **Ratchet Accuracy.** The goal of the project is model accuracy, and every step contributes towards this goal. Treat the changes you make as experiments: those that increase accuracy become the golden path of the process, and you reorganize the other steps around them. Accuracy is a ratchet that can only move in one direction (better, not worse). <br><br>
- **Adapt As Needed.** Modify the steps as needed on a project, especially as you become more experienced with the template. Blur the edges of tasks, such as steps 4-5, to best serve model accuracy.
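The **Fast First Pass** and **Ratchet Accuracy** tips can be made concrete with a small experiment log: start from a trivial baseline, then replace the current best only when an experiment scores higher. This is a hypothetical sketch; the `estimate` helper, the iris data and the experiment list are placeholders for your own setup.

```python
# A ratchet over experiments: the recorded best score can only go up.
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)  # stand-in dataset
kfold = KFold(n_splits=10, shuffle=True, random_state=7)


def estimate(model):
    """Hypothetical helper: mean 10-fold cross-validated accuracy."""
    return cross_val_score(model, X, y, cv=kfold, scoring="accuracy").mean()


# Fast first pass: a trivial baseline that always predicts the most frequent class.
best_name = "baseline"
best_score = estimate(DummyClassifier(strategy="most_frequent"))
history = [(best_name, best_score)]

# Each change is an experiment; keep it only if it beats the current best.
experiments = {
    "KNN": KNeighborsClassifier(),
    "scaled KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in experiments.items():
    score = estimate(model)
    history.append((name, score))
    if score > best_score:  # the ratchet: never accept a worse result
        best_name, best_score = name, score

print(f"Best so far: {best_name} ({best_score:.3f})")
```

Keeping the full `history` alongside the best result also makes it easy to see which experiments were tried and rejected.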