# Framing

In **supervised machine learning**, a system learns how to combine inputs to
produce useful predictions on never-before-seen data.

## Labels

A **label** is the thing we're predicting - the $y$ variable in simple linear
regression. The label could be the future price of wheat, the kind of animal
shown in a picture, the meaning of an audio clip, or just about anything.

## Features

A **feature** is an input variable - the $x$ variable in simple linear
regression. A simple machine learning project might use a single feature, while
a more sophisticated machine learning project could use millions of features,
specified as $x_1, x_2, \ldots, x_N$.

For example, in an email spam detector, the features could include the
following:

- Words in the email text.
- The sender's address.
- The time of day the email was sent.
- Whether the email contains the phrase "one weird trick".
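
To make the list above concrete, here is a minimal sketch of how one raw email might be turned into those four features. The function name `extract_features`, the dictionary keys, and the sample email are all hypothetical and chosen only for illustration. A real spam detector would likely encode these values numerically.

```python
from datetime import datetime


def extract_features(sender: str, sent_at: datetime, body: str) -> dict:
    """Turn one raw email into a dictionary of features (hypothetical names)."""
    text = body.lower()
    return {
        "words": text.split(),          # words in the email text
        "sender": sender,               # the sender's address
        "hour_sent": sent_at.hour,      # time of day the email was sent
        "has_one_weird_trick": "one weird trick" in text,
    }


# A made-up email, used only to exercise the function.
features = extract_features(
    sender="deals@example.com",
    sent_at=datetime(2024, 1, 15, 3, 30),
    body="Lose weight with One Weird Trick",
)
print(features["hour_sent"], features["has_one_weird_trick"])
```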

## Examples

An **example** is a particular instance of data, $\vec{x}$. Examples fall into
two categories: labeled and unlabeled.

A **labeled example** includes both feature(s) and the label, such that
$\lbrace \text{features}, \text{label}\rbrace: (x, y)$. We use labeled examples to **train**
the model. In our spam detector example, the labeled examples would be
individual emails that users have explicitly marked as "spam" or "not spam".

An **unlabeled example** contains feature(s) without a label, such that
$\lbrace \text{features}, ?\rbrace: (x, ?)$. Once we've trained our model with labeled
examples, we use that model to predict the label on unlabeled examples. In the
spam detector, unlabeled examples are new emails that humans haven't yet
labeled.
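
The distinction between labeled and unlabeled examples can be sketched as a small data structure. The `Example` class and its field names are assumptions made for this illustration, with a missing label represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """One instance of data: features plus an optional label (hypothetical type)."""
    features: dict
    label: Optional[str] = None  # None means the example is unlabeled


# Made-up emails: two marked by users, one not yet labeled.
examples = [
    Example({"has_one_weird_trick": True, "hour_sent": 3}, label="spam"),
    Example({"has_one_weird_trick": False, "hour_sent": 14}, label="not spam"),
    Example({"has_one_weird_trick": True, "hour_sent": 2}),
]

# Labeled examples are used for training; unlabeled ones are what we predict on.
training_set = [e for e in examples if e.label is not None]
to_predict = [e for e in examples if e.label is None]
print(len(training_set), len(to_predict))
```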

## Models

A **model** defines the relationship between features and the label. For example,
our spam detection model might associate certain features strongly with "spam".
Let's highlight two phases of a model's life:

- **Training** means creating or **learning** the model. That is, you show the
  model labeled examples and enable the model to gradually learn the
  relationships between features and label.
- **Inference** means applying the trained model to unlabeled examples. That is,
  you use the trained model to make useful predictions, denoted $y'$.
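
The two phases can be sketched with simple linear regression on a single feature. This is only an illustration under stated assumptions: the model is $y' = wx + b$, training uses plain gradient descent on mean squared error, and the function names, learning rate, step count, and toy data are all made up for the example.

```python
def train(xs, ys, learning_rate=0.05, steps=2000):
    """Training: gradually learn w and b from labeled examples (x, y)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error on each labeled example under the current model.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """Inference: apply the trained model to an unlabeled example x."""
    w, b = model
    return w * x + b


# Toy labeled examples generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

model = train(xs, ys)           # training phase
y_prime = predict(model, 10.0)  # inference phase on an unseen x
print(round(y_prime, 3))
```

Because the toy labels follow $y = 2x + 1$ exactly, the learned parameters should approach $w = 2$ and $b = 1$, so the prediction for $x = 10$ should land very close to 21.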

## Regression vs. Classification

A **regression** model predicts continuous values. For example, regression
models might make predictions that answer questions like the following:

- What is the value of a house in California?
- What is the probability that a user will click on this ad?

A **classification** model predicts discrete values. For example, classification
models might make predictions that answer questions like the following:

- Is a given email message spam or not spam?
- Is this an image of a dog, a cat, or a hamster?
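
The difference shows up in the type of value each kind of model returns. The sketch below is illustrative only: it assumes a hypothetical raw model score, uses the logistic function to produce a continuous probability, and applies a threshold to produce a discrete class. Thresholding is one common way to get a discrete output, not the only one.

```python
import math


def predict_probability(score: float) -> float:
    """Regression-style output: a continuous value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_class(score: float, threshold: float = 0.5) -> str:
    """Classification-style output: one of a fixed set of discrete values."""
    return "spam" if predict_probability(score) >= threshold else "not spam"


# Hypothetical raw scores, as if produced by some trained model.
print(predict_probability(0.0))
print(predict_class(2.0), "/", predict_class(-2.0))
```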