*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*

# Introduction to machine learning

![machine learning paradigm](figures/deep-learning-with-javascript.jpg)

*Image from Deep Learning with JavaScript*

# Act 1

![](figures/loan_tree.gif)

In [2]:
def decide(income, criminal_record, years_job, credit_payments):
    """Hand-written decision rules; return the decision as 0 or 1."""
    if income < 30000:
        if criminal_record:
            return 1
        else:
            return 0
    elif income <= 70000:
        if years_job < 1:
            return 0
        elif years_job <= 5:
            if credit_payments:
                return 1
            else:
                return 0
        else:
            return 1
    else:
        if criminal_record:
            return 0
        else:
            return 1

In [3]:
decide(income=20000, criminal_record=1, years_job=3, credit_payments=1)

1

In [4]:
import random
import pandas as pd

In [5]:
random.seed(333)
data = []
for _ in range(100):
    income = random.randint(0, 100000)
    criminal_record = random.randint(0, 1)
    years_job = random.randint(0, 10)
    credit_payments = random.randint(0, 1)
    decision = decide(income, criminal_record, years_job, credit_payments)
    data.append({'income': income, 'criminal_record': criminal_record,
                 'years_job': years_job, 'credit_payments': credit_payments,
                 'decision': decision})

In [6]:
df = pd.DataFrame(data)
df.head(20)

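| | income | criminal_record | years_job | credit_payments | decision |
|---|---|---|---|---|---|
| 0 | 72723 | 1 | 5 | 0 | 0 |
| 1 | 53005 | 0 | 4 | 1 | 1 |
| 2 | 55011 | 1 | 3 | 1 | 1 |
| 3 | 73126 | 0 | 6 | 0 | 1 |
| 4 | 44408 | 0 | 4 | 1 | 1 |
| 5 | 63087 | 1 | 10 | 1 | 1 |
| 6 | 30393 | 1 | 8 | 1 | 1 |
| 7 | 4727 | 1 | 2 | 1 | 1 |
| 8 | 9274 | 0 | 2 | 0 | 0 |
| 9 | 6801 | 1 | 2 | 1 | 1 |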


# Act 2

In [369]:
import pandas as pd
from sklearn import tree
from sklearn.tree import export_text

In [371]:
dtree = tree.DecisionTreeClassifier().fit(
    df[['income','criminal_record','years_job','credit_payments']], df['decision'])

In [372]:
print(export_text(dtree, feature_names=['income','criminal_record','years_job','credit_payments']))

|--- income <= 16421.50
|   |--- criminal_record <= 0.50
|   |   |--- class: 0
|   |--- criminal_record >  0.50
|   |   |--- class: 1
|--- income >  16421.50
|   |--- income <= 69501.50
|   |   |--- income <= 51679.50
|   |   |   |--- criminal_record <= 0.50
|   |   |   |   |--- credit_payments <= 0.50
|   |   |   |   |   |--- class: 0
|   |   |   |   |--- credit_payments >  0.50
|   |   |   |   |   |--- income <= 29602.50
|   |   |   |   |   |   |--- class: 0
|   |   |   |   |   |--- income >  29602.50
|   |   |   |   |   |   |--- years_job <= 0.50
|   |   |   |   |   |   |   |--- class: 0
|   |   |   |   |   |   |--- years_job >  0.50
|   |   |   |   |   |   |   |--- class: 1
|   |   |   |--- criminal_record >  0.50
|   |   |   |   |--- income <= 48033.00
|   |   |   |   |   |--- years_job <= 3.50
|   |   |   |   |   |   |--- income <= 26917.00
|   |   |   |   |   |   |   |--- class: 1
|   |   |   |   |   |   |--- income >  26917.00
|   |   |   |   |   |   |   |--- income <= 40043.00
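
The tree printed above was learned from only 100 labeled rows, so its thresholds need not match the hand-written rules of Act 1. One way to see how closely it follows them is to score it on fresh applicants labeled by the same rules. The following is a minimal sketch of that check, not part of the original notebook: the helper `make_applicants`, the held-out sample, and the seed `334` are assumptions introduced for illustration, and `decide` is repeated only so the cell runs on its own.

```python
import random

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

FEATURES = ['income', 'criminal_record', 'years_job', 'credit_payments']


def decide(income, criminal_record, years_job, credit_payments):
    """Hand-written rules from Act 1, same logic in compact form."""
    if income < 30000:
        return 1 if criminal_record else 0
    elif income <= 70000:
        if years_job < 1:
            return 0
        elif years_job <= 5:
            return 1 if credit_payments else 0
        else:
            return 1
    else:
        return 0 if criminal_record else 1


def make_applicants(n, seed):
    """Hypothetical helper: draw n random applicants and label them with decide()."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {
            'income': rng.randint(0, 100000),
            'criminal_record': rng.randint(0, 1),
            'years_job': rng.randint(0, 10),
            'credit_payments': rng.randint(0, 1),
        }
        row['decision'] = decide(**row)
        rows.append(row)
    return pd.DataFrame(rows)


train = make_applicants(100, seed=333)
held_out = make_applicants(1000, seed=334)  # assumed extra sample, never seen in training

dtree = DecisionTreeClassifier(random_state=0).fit(train[FEATURES], train['decision'])

# Fraction of rows on which the learned tree agrees with the hand-written rules.
train_accuracy = dtree.score(train[FEATURES], train['decision'])
held_out_accuracy = dtree.score(held_out[FEATURES], held_out['decision'])
print(f'training accuracy: {train_accuracy:.2f}')
print(f'held-out accuracy: {held_out_accuracy:.2f}')
```

An unrestricted tree can memorize its training rows, so the training score alone says little; the held-out score is the one that indicates how well the rules were recovered.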

# What Is Machine Learning?

*building models of data*

Fundamentally, machine learning involves building mathematical models to help understand data.
"Learning" enters the fray when we give these models *tunable parameters* that can be adapted to observed data; in this way the program can be considered to be "learning" from the data.
Once these models have been fit to previously seen data, they can be used to predict and understand aspects of newly observed data.
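
To make "tunable parameters" concrete, here is a minimal sketch with a single parameter. The data values, the model `y = w * x`, and the helper names `fit_slope` and `predict` are all assumptions made for illustration; they do not come from the text above.

```python
# Observed data: hypothetical (x, y) pairs that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def fit_slope(xs, ys):
    """Least-squares value of w for the one-parameter model y = w * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def predict(w, x):
    """Apply the fitted model to an input."""
    return w * x


w = fit_slope(xs, ys)             # "learning": adapt the parameter to the data
print(round(w, 3))
print(round(predict(w, 6.0), 2))  # prediction for a previously unseen x
```

The model family is fixed in advance; only the value of `w` is determined by the observations, which is the sense in which the program "learns".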

## Categories of Machine Learning

- *Supervised learning*: Models that can predict labels based on labeled training data

  - *Classification*: Models that predict labels as two or more discrete categories
  - *Regression*: Models that predict continuous labels
  
- *Unsupervised learning*: Models that identify structure in unlabeled data

  - *Clustering*: Models that detect and identify distinct groups in the data
  - *Dimensionality reduction*: Models that detect and identify lower-dimensional structure in higher-dimensional data
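
In scikit-learn, the split between the two families shows up in what `fit` receives: supervised estimators take features and labels, unsupervised ones take features only. The sketch below picks one representative estimator per category; the random data and variable names are assumptions for illustration, and other estimators would serve equally well.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(60, 3)                       # 60 samples, 3 features
y_class = (X[:, 0] > 0.5).astype(int)     # discrete labels
y_value = 2.0 * X[:, 0] + 1.0             # continuous labels

# Supervised: fit() sees both features and labels.
classifier = LogisticRegression().fit(X, y_class)
regressor = LinearRegression().fit(X, y_value)

# Unsupervised: fit() sees features only.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
reducer = PCA(n_components=2).fit(X)

print(classifier.predict(X[:3]))         # discrete class labels
print(regressor.predict(X[:3]))          # real-valued labels
print(clusterer.labels_[:3])             # group ids chosen by the model
print(reducer.transform(X[:3]).shape)    # (3, 2): three points, two dimensions
```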

### Classification: Predicting discrete labels

We will first take a look at a simple *classification* task, in which you are given a set of labeled points and want to use these to classify some unlabeled points.

Imagine that we have the data shown in this figure:

![](figures/05.01-classification-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-1)

To classify new points we first need a *model*. A simple choice, assumed here, is a straight line in the plane that separates the two groups; *training* the model means choosing the position and orientation of that line from the labeled points. The next figure shows such a fitted line:

![](figures/05.01-classification-2.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-2)

Now that this model has been trained, it can be generalized to new, unlabeled data.
In other words, we can take a new set of data, draw this model line through it, and assign labels to the new points based on this model.
This stage is usually called *prediction*. See the following figure:

![](figures/05.01-classification-3.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-3)
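
The train-then-predict cycle can be sketched in a few lines of plain Python. The example below uses a perceptron as a simple rule for learning a separating line; this is a swapped-in technique chosen for brevity, not necessarily the model behind the figures, and the data, `fit_perceptron`, and `predict` are hypothetical.

```python
import random

rng = random.Random(0)

# Hypothetical training set: two well-separated groups of 2-D points.
train = [((rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)), 0) for _ in range(20)]
train += [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(20)]


def predict(weights, bias, point):
    """Label a point by the side of the line w . x + b = 0 it falls on."""
    score = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if score > 0 else 0


def fit_perceptron(data, epochs=100):
    """Training: nudge the line whenever a labeled point is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for point, label in data:
            error = label - predict(weights, bias, point)
            if error:
                weights[0] += error * point[0]
                weights[1] += error * point[1]
                bias += error
                mistakes += 1
        if mistakes == 0:   # every training point is on the correct side
            break
    return weights, bias


weights, bias = fit_perceptron(train)

# Prediction: the new points carry no labels; the model supplies them.
new_points = [(-1.5, -2.5), (2.5, 1.0), (1.8, 2.2)]
print([predict(weights, bias, p) for p in new_points])
```

The fitted `weights` and `bias` play the role of the model line: once they are fixed, labeling a new point only requires checking which side of the line it lies on.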

### Regression: Predicting continuous labels

In contrast with the discrete labels of a classification algorithm, we will next look at a simple *regression* task in which the labels are continuous quantities.

Consider the data shown in the following figure, which consists of a set of points each with a continuous label:

![](figures/05.01-regression-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-1)

As with the classification example, we have two-dimensional data: that is, there are two features describing each data point.
The color of each point represents the continuous label for that point.

There are a number of possible regression models we might use for this type of data, but here we will use a simple linear regression to predict the labels.
This simple linear regression model assumes that if we treat the label as a third spatial dimension, we can fit a plane to the data.
This is a higher-dimensional generalization of the well-known problem of fitting a line to data with two coordinates.

We can visualize this setup as shown in the following figure:

![](figures/05.01-regression-2.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-2)

This plane of fit gives us what we need to predict labels for new points.
Visually, we find the results shown in the following figure:

![](figures/05.01-regression-4.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-4)
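
As a rough sketch of the plane fit described above, the following uses ordinary least squares from NumPy. The synthetic data and the coefficients used to generate it (`3.0`, `-2.0`, `1.0`) are assumptions for illustration and are unrelated to the data in the figures.

```python
import numpy as np

rng = np.random.RandomState(1)

# Hypothetical data: two features per point, plus a noisy continuous label.
X = rng.randn(200, 2)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + 0.1 * rng.randn(200)

# Fit the plane y = a*x1 + b*x2 + c by ordinary least squares.
A = np.column_stack([X, np.ones(len(X))])      # columns: x1, x2, intercept
(a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict continuous labels for new, unlabeled points.
X_new = np.array([[0.0, 0.0], [1.0, -1.0]])
y_new = a * X_new[:, 0] + b * X_new[:, 1] + c
print(np.round([a, b, c], 2))
print(np.round(y_new, 2))
```

The three fitted numbers `a`, `b`, and `c` fully describe the plane, so predicting a label for a new point is a single evaluation of the plane's equation.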

### Clustering: Inferring labels on unlabeled data

The classification and regression illustrations we just looked at are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data.
Unsupervised learning involves models that describe data without reference to any known labels.

One common case of unsupervised learning is "clustering," in which data is automatically assigned to some number of discrete groups.
For example, we might have some two-dimensional data like that shown in the following figure:

![](figures/05.01-clustering-1.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Clustering-Example-Figure-1)

By eye, it is clear that each of these points is part of a distinct group.
Given this input, a clustering model will use the intrinsic structure of the data to determine which points are related.
Using the very fast and intuitive *k*-means algorithm (see [In Depth: K-Means Clustering](05.11-K-Means.ipynb)), we find the clusters shown in the following figure:

![](figures/05.01-clustering-2.png)
[figure source in Appendix](06.00-Figure-Code.ipynb#Clustering-Example-Figure-2)
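
A minimal sketch of this step with scikit-learn's `KMeans` follows. The three synthetic groups, their centers, and the choice of `n_clusters=3` are assumptions for illustration; the figures above use their own data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(42)

# Hypothetical unlabeled data: three compact groups of 2-D points.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([center + 0.4 * rng.randn(50, 2) for center in centers])

# Only the features are passed to fit(); there is no label argument.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print(kmeans.labels_[:5])                     # cluster id assigned to each point
print(np.round(kmeans.cluster_centers_, 1))   # one center per discovered group
```

The cluster ids are arbitrary: the model decides which points belong together, but the number attached to each group carries no meaning of its own. Note also that the number of clusters is supplied by us, not inferred from the data.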