# CHAPTER 9 (PART 1)
# Machine Learning

In many ways, machine learning is the primary means by which data science manifests itself to the broader world. Machine learning is where the computational and algorithmic skills of data science meet the statistical thinking of data science, and the result is a collection of approaches to inference and data exploration that are not about effective theory so much as effective computation. 

Fundamentally, machine learning involves building mathematical models to help understand data. “Learning” enters the fray when we give these models tunable parameters that can be adapted to observed data; in this way the program can be considered to be “learning” from the data. Once these models have been fit to previously seen data, they can be used to predict and understand aspects of newly observed data. 

## 9.1 Introduction to Machine Learning

At the most fundamental level, machine learning can be categorized into two main types: supervised learning and unsupervised learning. 

*Supervised learning* involves somehow modeling the relationship between measured features of data and some label associated with the data; once this model is determined, it can be used to apply labels to new, unknown data. This is further subdivided into *classification* tasks and *regression* tasks: in classification, the labels are discrete categories, while in regression, the labels are continuous quantities. We will see examples of both types of supervised learning in the following section. 

*Unsupervised learning* involves modeling the features of a dataset without reference to any label, and is often described as “letting the dataset speak for itself.” These models include tasks such as *clustering* and *dimensionality reduction*. Clustering algorithms identify distinct groups of data, while dimensionality reduction algorithms search for more succinct representations of the data. We will see examples of both types of unsupervised learning in the following section. 

In addition, there are so-called *semi-supervised learning* methods, which fall somewhere between supervised learning and unsupervised learning. Semi-supervised learning methods are often useful when only incomplete labels are available. 


### 9.1.1 Classification: Predicting Discrete Labels

We will first take a look at a simple classification task, in which you are given a set of labeled points and want to use these to classify some unlabeled points. Imagine that we have the data shown in the figure below. Here we have two-dimensional data; that is, we have two features for each point, represented by the *(x,y)* positions of the points on the plane. In addition, we have one of two class labels for each point, here represented by the colors of the points. From these features and labels, we would like to create a model that will let us decide whether a new point should be labeled “blue” or “red.” 

There are a number of possible models for such a classification task, but here we will use an extremely simple one. We will make the assumption that the two groups can be separated by drawing a straight line through the plane between them, such that points on each side of the line fall in the same group. Here the model is a quantitative version of the statement “a straight line separates the classes,” while the model parameters are the particular numbers describing the location and orientation of that line for our data. The optimal values for these model parameters are learned from the data (this is the “learning” in machine learning), which is often called *training the model*. 

<img src="classification-1.png" img>
<center>Figure 9.1: A simple dataset for classification</center>

The figure below shows a visual representation of what the trained model looks like for this data.
<img src="classification2.png" img>
<center>Figure 9.2: A simple classification model</center>

Now that this model has been trained, it can be generalized to new, unlabeled data. In other words, we can take a new set of data, draw this model line through it, and assign labels to the new points based on this model. This stage is usually called *prediction*. 
<img src="classification-3.png" img>

<center>Figure 9.3: Applying a classification model to new data</center>
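The train-then-predict workflow described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the text does not say which algorithm finds the line, so the perceptron update rule is used here as one simple choice, and the data points, the `train_perceptron` and `predict` names, and the `epochs` and `lr` parameters are all assumptions made for this example.

```python
# Made-up training data: (x, y) positions with labels +1 ("blue") or -1 ("red").
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 0.5), (4.0, 4.5), (5.0, 4.0), (4.5, 5.5)]
labels = [-1, -1, -1, 1, 1, 1]


def train_perceptron(points, labels, epochs=1000, lr=0.1):
    """Learn (w1, w2, b) so that the sign of w1*x + w2*y + b matches each label."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            if label * (w1 * x + w2 * y + b) <= 0:  # wrong side of (or on) the line
                w1 += lr * label * x
                w2 += lr * label * y
                b += lr * label
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w1, w2, b


def predict(params, point):
    """Label a new point according to which side of the learned line it falls on."""
    w1, w2, b = params
    x, y = point
    return "blue" if w1 * x + w2 * y + b > 0 else "red"


line = train_perceptron(points, labels)
print(predict(line, (1.0, 1.2)))  # red
print(predict(line, (5.0, 5.0)))  # blue
```

The three numbers returned by `train_perceptron` are the model parameters: they fix the location and orientation of the separating line, and `predict` simply checks on which side of that line a new point lies.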

### 9.1.2 Regression: Predicting Continuous Labels

In contrast with the discrete labels of a classification algorithm, we will next look at a simple *regression* task in which the labels are continuous quantities.

Consider the data shown below, which consists of a set of points, each with a continuous label.

<img src="regression-1.png" img>
<center>Figure 9.4: A simple dataset for regression</center>

As with the classification example, we have two-dimensional data; that is, there are two features describing each data point. The color of each point represents the continuous label for that point. 

There are a number of possible regression models we might use for this type of data, but here we will use a simple linear regression to predict the labels. This simple linear regression model assumes that if we treat the label as a third spatial dimension, we can fit a plane to the data. This is a higher-dimensional generalization of the well-known problem of fitting a line to data with two coordinates. 
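Written out, the plane model might look like the following. The symbols are introduced here purely for illustration: $x_1$ and $x_2$ are the two features, $\hat{y}$ is the predicted label, and $a$, $b$, $c$ are the model parameters. The fitting criterion shown is ordinary least squares, which is a common choice but an assumption here, since the text does not name one.

$$
\hat{y} = a\,x_1 + b\,x_2 + c,
\qquad
(a, b, c) = \arg\min_{a,\,b,\,c} \sum_{i=1}^{N} \left( y_i - a\,x_{1,i} - b\,x_{2,i} - c \right)^2
$$

With a single feature the same expression reduces to the familiar line fit $\hat{y} = a\,x_1 + c$.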

We can visualize this setup as shown below.

<img src="regression-2.png" img>
<center>Figure 9.5: A three-dimensional view of the regression data</center>

Notice that the feature 1–feature 2 plane here is the same as in the two-dimensional plot from before; in this case, however, we have represented the labels by both color and three-dimensional axis position. From this view, it seems reasonable that fitting a plane through this three-dimensional data would allow us to predict the expected label for any set of input parameters. Returning to the two-dimensional projection, when we fit such a plane we get the result shown below.

<img src="regression-3.png" img>
<center>Figure 9.6: A representation of the regression model</center>


This plane of fit gives us what we need to predict labels for new points. Visually, we find the results shown below.

<img src="regression-4.png" img>
<center>Figure 9.7: Applying the regression model to new data</center>

As with the classification example, this may seem rather trivial in a low number of dimensions. But the power of these methods is that they can be straightforwardly applied and evaluated in the case of data with many, many features.
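As a rough sketch of the plane fit described in this section, the following plain-Python example fits a plane to two-feature data by least squares and then predicts the label of a new point. The data are made up (generated exactly from a known plane so the result is easy to check), the helper names `fit_plane` and `predict_label` are hypothetical, and solving the normal equations by Gauss-Jordan elimination is just one possible way to compute the fit.

```python
# Made-up training data: two features per point, with labels generated exactly
# from label = 2*f1 - 1*f2 + 0.5 so that the fitted plane is easy to check.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
labels = [2.0 * f1 - 1.0 * f2 + 0.5 for f1, f2 in features]


def fit_plane(features, labels):
    """Least-squares fit of label ~ a*f1 + b*f2 + c via the normal equations."""
    rows = [(f1, f2, 1.0) for f1, f2 in features]
    # Augmented 3x4 system [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, labels))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(3):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return tuple(aug[i][3] / aug[i][i] for i in range(3))


def predict_label(params, point):
    """Evaluate the fitted plane at a new (f1, f2) point."""
    a, b, c = params
    f1, f2 = point
    return a * f1 + b * f2 + c


plane = fit_plane(features, labels)
print(plane)                             # close to (2.0, -1.0, 0.5)
print(predict_label(plane, (3.0, 3.0)))  # close to 3.5
```

The same idea carries over to more features: each extra feature adds one coefficient to the model and one row and column to the system being solved.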

### 9.1.3 Clustering: Inferring Labels on Unlabeled Data

The classification and regression illustrations we just looked at are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data. Unsupervised learning involves models that describe data without reference to any known labels.

One common case of unsupervised learning is “clustering,” in which data is automatically assigned to some number of discrete groups. For example, we might have some two-dimensional data like that shown below.

<img src="clustering-1.png" img>
<center>Figure 9.8: Example data for clustering</center>


By eye, it is clear that each of these points is part of a distinct group. Given this input, a clustering model will use the intrinsic structure of the data to determine which points are related. Using the very fast and intuitive *k*-means algorithm (explained later), we find the clusters shown below.

<img src="clustering-2.png" img>
<center>Figure 9.9: Data labeled with a k-means clustering model</center>

*k*-means fits a model consisting of *k* cluster centers; the optimal centers are taken to be those that minimize the total distance from each point to its assigned center. Again, this might seem like a trivial exercise in two dimensions, but as our data becomes larger and more complex, such clustering algorithms can be employed to extract useful information from the dataset.
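To make the idea of "cluster centers" concrete, here is a minimal plain-Python sketch of the usual iterative approach to k-means: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. The data are made up, the `kmeans` function and its `iterations` parameter are hypothetical names for this example, and initializing with the first `k` points is a deliberately naive assumption.

```python
import math

# Made-up two-dimensional data forming three visually distinct groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
          (9.0, 1.0), (9.1, 1.2), (8.8, 0.9)]


def kmeans(points, k, iterations=100):
    """Alternate between assigning points to the nearest center and recentering."""
    centers = list(points[:k])  # naive initialization: the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(coords) / len(members) for coords in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assignments


centers, assignments = kmeans(points, k=3)
print(assignments)  # points in the same group share a cluster index
print(centers)
```

Note that no labels are supplied anywhere: the cluster indices come entirely from the positions of the points, and the particular index given to each group is arbitrary.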

### 9.1.4 Summary 

Here we have seen a few simple examples of some of the basic types of machine learning approaches. Needless to say, there are a number of important practical details that we have glossed over, but hopefully this section was enough to give you a basic idea of what types of problems machine learning approaches can solve. 

In short, we saw the following: 
- *Supervised learning*: Models that can predict labels based on labeled training data
  - *Classification*: Models that predict labels as two or more discrete categories
  - *Regression*: Models that predict continuous labels
- *Unsupervised learning*: Models that identify structure in unlabeled data
  - *Clustering*: Models that detect and identify distinct groups in the data
  - *Dimensionality reduction*: Models that detect and identify lower-dimensional structure in higher-dimensional data