In [1]:
# - Applied machine learning is an empirical skill. 

# - You cannot get better at it by reading books and articles. 

# - You have to practice.

In [2]:
# 18.1 Practice Machine Learning With Projects

In [3]:
# - Working through machine learning problems end to end is critically important.

# - Working through a project forces you to think about how the model will be used, to challenge your 
# assumptions and to get good at all parts of a project, not just your favorite parts. 

# - The best way to practice predictive modeling machine learning projects is to use standardized datasets 
# from the UCI Machine Learning Repository. 

In [4]:
# 18.1.1 Use a Structured Step-By-Step Process

In [5]:
# - 1. Define Problem. 
# - 2. Summarize Data. 
# - 3. Prepare Data.
# - 4. Evaluate Algorithms. 
# - 5. Improve Results.
# - 6. Present Results.

In [6]:
# 18.2 Machine Learning Project Template in Python

In [7]:
# 18.2.1 Template Summary

In [8]:
# Python Project Template

# 1. Prepare Problem
    # a) Load libraries
    # b) Load dataset

# 2. Summarize Data
    # a) Descriptive statistics
    # b) Data visualizations

# 3. Prepare Data
    # a) Data Cleaning
    # b) Feature Selection
    # c) Data Transforms

# 4. Evaluate Algorithms
    # a) Split-out validation dataset
    # b) Test options and evaluation metric
    # c) Spot Check Algorithms
    # d) Compare Algorithms

# 5. Improve Accuracy
    # a) Algorithm Tuning
    # b) Ensembles

# 6. Finalize Model
    # a) Predictions on validation dataset
    # b) Create standalone model on entire training dataset
    # c) Save model for later use

In [9]:
# 18.2.2 How To Use The Project Template

In [10]:
# - Copy the template and paste it in your new project file or notebook and start filling in the relevant recipes.

In [11]:
# 18.3 Machine Learning Project Template Steps

In [12]:
# 18.3.1 Prepare Problem

In [13]:
# - Load libraries.

# - Load data.

# - Define global configurations.

# - You might need to make a reduced sample of your dataset if it is too large to work with. 
# Your dataset should be small enough to build a model or create a visualization within a minute, 
# ideally within 30 seconds. You can always scale up well-performing models later.
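
# - A minimal sketch of this step, under stated assumptions: the iris data bundled with scikit-learn stands in 
# for your own dataset, and SEED and MAX_ROWS are hypothetical configuration names chosen for illustration.

```python
# a) Load libraries
import pandas as pd
from sklearn.datasets import load_iris

# Global configuration (hypothetical names, not part of the template)
SEED = 7        # fixed seed so the sample is reproducible
MAX_ROWS = 100  # cap on the rows used during early, fast experiments

# b) Load dataset (iris is only a stand-in for your own data)
df = load_iris(as_frame=True).frame  # four feature columns plus a 'target' column

# Work on a reduced random sample if the dataset exceeds the cap
if len(df) > MAX_ROWS:
    df = df.sample(n=MAX_ROWS, random_state=SEED).reset_index(drop=True)

print(df.shape)
```

# - The cap is arbitrary here. In practice you would pick it by timing how long one model fit or one plot takes.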

In [14]:
# 18.3.2 Summarize Data

In [15]:
# - Understand your data using:
    # - Descriptive statistics such as summaries.
    # - Data visualizations such as plots with Matplotlib/seaborn, ideally using convenience functions from Pandas.
    
# - Take your time and use the results to prompt a lot of questions, assumptions and hypotheses that you can 
# investigate later with specialized models.
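
# - A minimal sketch of the descriptive-statistics half of this step, assuming the iris data bundled with 
# scikit-learn as a stand-in dataset. The choice of summaries is illustrative, not a fixed recipe.

```python
import pandas as pd
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame  # stand-in dataset with a 'target' class column

# a) Descriptive statistics
print(df.shape)                     # number of rows and columns
print(df.dtypes)                    # data type of each attribute
print(df.describe())                # count, mean, std, min, quartiles, max
print(df.groupby("target").size())  # class distribution
print(df.corr())                    # pairwise Pearson correlations
```

# - For the visualization half, Pandas offers DataFrame.hist() and pandas.plotting.scatter_matrix(), 
# both of which need Matplotlib installed.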

In [16]:
# 18.3.3 Prepare Data

In [17]:
# - This includes tasks such as:
    # - Cleaning data by removing duplicates, marking missing values and even imputing missing values.
    # - Feature selection where redundant features may be removed and new features developed.
    # - Data transforms where attributes are scaled or redistributed in order to best expose the structure of the problem later to 
    # learning algorithms.
    
# - Start simple. 

# - Revisit this step often and cycle with the next step until you converge on a subset of algorithms and a 
# presentation of the data that results in accurate or accurate-enough models to proceed.
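
# - A minimal sketch of cleaning and transforming, under stated assumptions: the tiny DataFrame is made up, 
# and a value of 0 in the hypothetical 'income' column is assumed to mean "missing". Feature selection is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Made-up data: one duplicated row, and 0.0 used as a placeholder for a missing income
df = pd.DataFrame({
    "age":    [25, 32, 32, 47, 51],
    "income": [40.0, 0.0, 0.0, 85.0, 62.0],
})

# a) Data cleaning: remove duplicates, then mark the placeholder zeros as missing
df = df.drop_duplicates().reset_index(drop=True)
df["income"] = df["income"].replace(0.0, np.nan)

# Impute the marked values with the column mean
imputed = SimpleImputer(strategy="mean").fit_transform(df)

# c) Data transform: standardize each attribute to zero mean and unit variance
scaled = StandardScaler().fit_transform(imputed)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

# - When these transforms feed a model evaluation, fit them on the training folds only (for example inside 
# a scikit-learn Pipeline) so that no information leaks from the held-out data.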

In [18]:
# 18.3.4 Evaluate Algorithms

In [19]:
# - This involves steps such as:
    # - Separating out a validation dataset to use for later confirmation of the skill of your developed model.
    # - Defining test options using scikit-learn such as cross-validation and the evaluation metric to use.
    # - Spot-checking a suite of linear and nonlinear machine learning algorithms.   
    # - Comparing the estimated accuracy of algorithms.

# - On a given problem you will likely spend most of your time on this and the previous step until you converge 
# on a set of three to five well-performing machine learning algorithms.
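
# - The four sub-steps above can be sketched as follows. This is an illustration under assumptions: iris is a 
# stand-in dataset, and the 80/20 split, 10 folds and the four algorithms are example choices, not requirements.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

SEED = 7
X, y = load_iris(return_X_y=True)

# a) Split out a validation dataset that is not touched until the end
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=SEED)

# b) Test options and evaluation metric
kfold = KFold(n_splits=10, shuffle=True, random_state=SEED)
scoring = "accuracy"

# c) Spot-check a mix of linear (LR, LDA) and nonlinear (KNN, CART) algorithms
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=SEED),
}

# d) Compare algorithms by their cross-validated accuracy on the training data
results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
    print(f"{name}: {results[name].mean():.3f} ({results[name].std():.3f})")
```

# - Reusing the same kfold object for every model means each algorithm is scored on identical splits, 
# which keeps the comparison fair.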

In [20]:
# 18.3.5 Improve Accuracy

In [21]:
# - There are two different ways to improve the accuracy of your models:
    # - Search for a combination of parameters for each algorithm using scikit-learn that yields the best results.
    # - Combine the prediction of multiple models into an ensemble prediction using ensemble techniques.

# - The line between this and the previous step can blur when a project becomes concrete. 

# - There may be a little algorithm tuning in the previous step. 

# - And in the case of ensembles, you may bring more than a shortlist of algorithms forward to combine their 
# predictions.
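
# - A minimal sketch of both routes, under assumptions: iris is a stand-in dataset, the grid of k values is 
# arbitrary, and a hard-voting ensemble is just one of several ensemble techniques available in scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

SEED = 7
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=SEED)
kfold = KFold(n_splits=10, shuffle=True, random_state=SEED)

# a) Algorithm tuning: grid search over the number of neighbors for KNN
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="accuracy", cv=kfold)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.best_score_, 3))

# b) Ensembles: combine the predictions of several models by majority vote
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(**grid.best_params_)),
        ("cart", DecisionTreeClassifier(random_state=SEED)),
    ],
    voting="hard",
)
scores = cross_val_score(ensemble, X_train, y_train, cv=kfold, scoring="accuracy")
print(round(scores.mean(), 3))
```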

In [22]:
# 18.3.6 Finalize Model

In [23]:
# - Finalizing a model may involve sub-tasks such as:
    # - Using an optimal model tuned by scikit-learn to make predictions on unseen data.   
    # - Creating a standalone model using the parameters tuned by scikit-learn.
    # - Saving an optimal model to file for later use.
    
# - Once you make it this far you are ready to present results to stakeholders and/or deploy your 
# model to start making predictions on unseen data.
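
# - A minimal sketch of finalizing, under assumptions: iris is a stand-in dataset, n_neighbors=5 stands in for 
# whatever parameters tuning selected, and the file name finalized_model.pkl is hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

SEED = 7
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, random_state=SEED)

# b) Create a standalone model on the training dataset with the chosen parameters
model = KNeighborsClassifier(n_neighbors=5)  # stand-in for tuned parameters
model.fit(X_train, y_train)

# a) Predictions on the validation dataset, which the model has not seen
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(round(val_accuracy, 3))

# c) Save the model for later use, then load it back to confirm it round-trips
path = os.path.join(tempfile.mkdtemp(), "finalized_model.pkl")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored = pickle.load(f)
```

# - A pickled model should be loaded with the same scikit-learn version that saved it, and only from 
# files you trust.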

In [24]:
# 18.4 Tips For Using The Template Well

In [25]:
# - Fast First Pass:
    # - Make a first pass through the project steps as fast as possible. 
    # - This will give you confidence that you have all the parts that you need and a baseline from which to improve.

# - Cycles: 
    # - The process is not linear but cyclic. 
    # - You will loop between steps, and probably spend most of your time in tight loops between steps 3-4 or 
    # 3-4-5 until you achieve a level of accuracy that is sufficient or you run out of time.

# - Attempt Every Step: 
    # - It is easy to skip steps, especially if you are not confident or familiar with the tasks of that step. 
    # - Try and do something at each step in the process, even if it does not improve accuracy. 
    # - You can always build upon it later. Don’t skip steps, just reduce their contribution.

# - Ratchet Accuracy:
    # - The goal of the project is model accuracy. 
    # - Every step contributes towards this goal. 
    # - Treat the changes that you make as experiments, keep those that increase accuracy as the golden path 
    # in the process, and reorganize other steps around them. 
    # - Accuracy is a ratchet that can only move in one direction (better, not worse).
        
# - Adapt As Needed:
    # - Modify the steps as you need on a project, especially as you become more experienced with the template. 
    # - Blur the edges of tasks, such as steps 4-5 to best serve model accuracy.