# Supervised Learning with Python

Machine Learning is about data. Data is either labeled or not.

## Labeled Data

Labeled data is data in which every example comes with a known outcome, or label. The Iris Flower Dataset we looked at in the introduction is labeled: its predictors/features are the measurements of the petals and the sepals, and its label is the species of the flower.

Another example is the ImageNet dataset hosted at http://www.image-net.org/. This is a database of about 14 million images, each with a label.

### Structured Data
A structured dataset is one in which the meaning of the data does not change if you change the order of its features or examples. The Iris Flower Dataset is structured because the order in which petal width and petal length appear does not matter. How you arrange the fields does not change the information you are conveying. Here is the table again.

![structured data](assets/Iris/3.jpg)
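
As a minimal sketch of that idea, the snippet below stores one flower as a set of named fields in two different orders. The field names and values are illustrative assumptions rather than the exact layout of the table above.

```python
# One flower described as named fields (illustrative names and values).
record_a = {"petal_length": 1.4, "petal_width": 0.2, "species": "I. setosa"}

# The same flower with the fields listed in a different order.
record_b = {"species": "I. setosa", "petal_width": 0.2, "petal_length": 1.4}

# Dictionary equality ignores field order, so both records carry
# exactly the same information.
same_information = record_a == record_b
print(same_information)  # True
```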

### Unstructured Data
Unstructured data is data whose order cannot be changed without altering its meaning. Examples are images, sound files, and video files. Images are usually stored as 2-D or 3-D arrays (or tensors). A 2-D array looks like a table. However, if you had an image with 28 columns and 28 rows, you couldn't swap the 10th and 20th columns of the array without changing the meaning of the data.
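
To make this concrete, here is a small sketch that uses a made-up 4x4 "image" (a diagonal line) instead of a real 28x28 one. The helper `swap_columns` is a hypothetical function written for this illustration.

```python
# A tiny 4x4 "image": a bright diagonal line on a dark background.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]


def swap_columns(rows, i, j):
    """Return a copy of rows with columns i and j exchanged."""
    swapped = [list(row) for row in rows]
    for row in swapped:
        row[i], row[j] = row[j], row[i]
    return swapped


swapped_image = swap_columns(image, 1, 2)

# Draw the result: "#" for a bright pixel, "." for a dark one.
for row in swapped_image:
    print("".join("#" if pixel else "." for pixel in row))
# #...
# ..#.
# .#..
# ...#
```

The pixel values are all still there, but the picture no longer shows a straight diagonal line, so the meaning of the data has changed.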

# Labels

In supervised learning, there are two types of labels.

## Categorical Labels
In the example of ImageNet, every image has a label. In the Iris dataset, every example has a label, which is the species of the flower. These are both examples of categorical labels. In essence, the label can only take a value from a limited set of options.

While the labels make sense to us, they are usually stored in one of two formats for the computer to work with. The first format uses a numerical encoding. An example of this is:

<table>
    <tr>
        <th>Species</th><th>Encoding</th>
    </tr>
    <tr>
        <td>I. setosa</td><td>0</td>
    </tr>
    <tr>
        <td>I. versicolor</td><td>1</td>
    </tr>
    <tr>
        <td>I. virginica</td><td>2</td>
    </tr>
</table>
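
A numerical encoding like this can be sketched in a few lines of plain Python. The variable names and the sample `labels` list are assumptions made for the illustration.

```python
# The three species, in the order used by the table above.
species = ["I. setosa", "I. versicolor", "I. virginica"]

# Map each species name to its position in the list: 0, 1, 2.
encoding = {name: index for index, name in enumerate(species)}

# A few made-up labels, as they might appear in a dataset.
labels = ["I. versicolor", "I. setosa", "I. virginica", "I. setosa"]

# Replace every name with its number.
encoded = [encoding[name] for name in labels]
print(encoded)  # [1, 0, 2, 0]
```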
    

The second format uses a one-hot encoding. In this format, a column exists for each option. The column is set to 1 if the entry is for that species, and 0 otherwise. An example of this is:

<table>
    <tr>
        <th>Species</th><th>is_setosa</th><th>is_versicolor</th><th>is_virginica</th>
    </tr>
    <tr>
        <td>I. setosa</td><td>1</td><td>0</td><td>0</td>
    </tr>
    <tr>
        <td>I. versicolor</td><td>0</td><td>1</td><td>0</td>
    </tr>
    <tr>
        <td>I. virginica</td><td>0</td><td>0</td><td>1</td>
    </tr>
</table>

You might notice that the number in the first format tells you which column holds the 1 in the second format. The one-hot encoding format is useful because it simplifies vector multiplications. More importantly, it avoids suggesting relationships that do not exist. When you look at the numerical representation, you might be tempted to think that the species are ordered, and that one species is bigger than the preceding one. One-hot encoding treats every species as a separate option with no ordering between them.
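
The link between the two formats can be sketched as follows. The helper `one_hot` is a hypothetical function written for this illustration, not part of any particular library.

```python
def one_hot(index, num_classes):
    """Return a list of num_classes zeros with a single 1 at position index."""
    row = [0] * num_classes
    row[index] = 1
    return row


# Numerical encodings for I. setosa, I. versicolor, and I. virginica.
numeric_labels = [0, 1, 2]

# The numeric label is the position of the 1 in the one-hot row.
one_hot_labels = [one_hot(label, 3) for label in numeric_labels]
for row in one_hot_labels:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```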

The one-hot encoding is used both for providing input to prediction models and for reading their output.

During prediction, output is provided either as a value that is 0 or 1, or as a probability. The model computes a probability for each column and either returns the probabilities directly, so you can see them, or uses an `argmax()` to tell you which column has the highest probability.
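
Here is a minimal sketch of that last step. The probabilities are made up for the example, and `argmax` is written by hand here so the snippet has no dependencies; numerical libraries such as NumPy provide an equivalent function.

```python
class_names = ["I. setosa", "I. versicolor", "I. virginica"]

# Hypothetical model output: one probability per one-hot column.
probabilities = [0.1, 0.7, 0.2]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


predicted_index = argmax(probabilities)
predicted_class = class_names[predicted_index]
print(predicted_index, predicted_class)  # 1 I. versicolor
```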

When the output of your model is a categorical column, your model is said to perform **classification**.

## Numerical Labels

Let's look at the Iris dataset again. If we decided that we wanted to predict the petal length given the species and the three other measurements, our label would be a continuous number. This type of model is said to perform **regression**.
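
The sketch below shows how the same rows can serve either task, depending on which column you treat as the label. The field names, the sample values, and the helper `split_features_and_label` are assumptions made for this illustration.

```python
# A few illustrative rows: four measurements plus the species.
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "species": "I. setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7,
     "petal_width": 1.4, "species": "I. versicolor"},
]


def split_features_and_label(row, label_name):
    """Separate one column (the label) from the remaining columns (the features)."""
    features = {name: value for name, value in row.items() if name != label_name}
    return features, row[label_name]


# Classification: the label is a category.
_, class_label = split_features_and_label(rows[0], "species")

# Regression: the label is a continuous number, and the species
# becomes one of the features.
reg_features, reg_label = split_features_and_label(rows[0], "petal_length")

print(class_label)           # I. setosa
print(reg_label)             # 1.4
print(sorted(reg_features))  # ['petal_width', 'sepal_length', 'sepal_width', 'species']
```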

# Algorithms
Supervised learning algorithms fall into two classes: **classification** and **regression**.

## Classification
A classification problem is one in which you predict which class an example belongs to, or the probability of an event. Examples include predicting whether it is going to rain when given details like temperature and humidity, or predicting what class of animal you are dealing with when given a photo of the animal. The following algorithms are commonly used when building classification models.

### Logistic Regression
### Decision Trees
### Nearest Neighbors
### Random Forests

## Regression
A regression problem is one in which you predict an actual value, such as a temperature or humidity, when given certain features. The following algorithms are commonly used when building regression models.

### Linear Regression
### Lasso Regression
### Ridge Regression
### ElasticNet Regression
### Kernel Regression