# When Model Meets Data

There are three major components of a machine learning system: data, models, and learning. The main question of machine learning is "What do we mean by good models?" The word "model" has many subtleties, and it is also not entirely obvious how to define the word "good" objectively. One of the guiding principles of machine learning is that good models should perform well on unseen data. This requires us to define performance metrics, such as accuracy or distance from ground truth, and to figure out ways to do well under these metrics.

## Data as Vectors

We assume that our data can be read by a computer and represented adequately in a numerical format. We further assume that the data is tabular: each row of the table represents a particular instance or example, and each column represents a particular feature. Data also comes in many other forms, such as images or genomic data, but the tabular view is the one used here.

Non-numerical data, such as categorical text values, is typically converted to numbers, e.g. male = 1 and female = -1. Even numerical data that could be read directly into a machine learning algorithm should be carefully checked for units, scaling, and constraints. Without additional information, one should shift and scale all columns of the dataset so that each has an empirical mean of 0 and an empirical variance of 1.
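The shift-and-scale step can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `standardize_columns` and the toy age/salary table are assumptions made for the example, not part of the original notes. Each column value $v$ is mapped to $(v - \mu) / \sigma$, where $\mu$ and $\sigma$ are the empirical mean and standard deviation of that column.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Shift and scale each column to empirical mean 0 and variance 1."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # empirical (population) std
    # Guard against constant columns, whose standard deviation is 0.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy table: columns are (age in years, annual salary).
raw = [[25.0, 30000.0], [35.0, 52000.0], [45.0, 61000.0], [55.0, 75000.0]]
standardized = standardize_columns(raw)
```

After this transformation the two columns are on a comparable scale, even though the raw ages and salaries differ by several orders of magnitude.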

- $N$: Number of examples in the dataset
- $x_n$: The $n$th example in the dataset
- $D$: Number of features in the dataset

Recall that data is represented as vectors, which means that each example (each data point) is a $D$-dimensional vector.

Let us consider the problem of predicting annual salary from age. This is a supervised learning problem, where we have a label $y_n$ (the salary) associated with each example $x_n$ (the age). The label $y_n$ has various other names, including target, response variable, and annotation. A dataset is written as a set of example-label pairs: $\{ (x_1, y_1), \ldots, (x_n, y_n), \ldots, (x_N, y_N) \}$.
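As a small illustration of this notation, a dataset of example-label pairs can be held in a plain Python list. The ages and salaries below are made-up values, and the variable names are chosen only to mirror the symbols $N$, $D$, $x_n$, and $y_n$.

```python
# Hypothetical data: each example x_n is a 1-dimensional vector (age),
# and each label y_n is an annual salary.
dataset = [([27.0], 31000.0), ([36.0], 48000.0), ([52.0], 70000.0)]

N = len(dataset)        # number of examples
D = len(dataset[0][0])  # number of features per example

examples = [x for x, _ in dataset]  # x_1, ..., x_N
labels = [y for _, y in dataset]    # y_1, ..., y_N
```

Here every example is a $D$-dimensional vector with $D = 1$; adding more features per example would increase $D$ without changing $N$.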

Representing data as vectors $x_n$ allows us to use concepts from linear algebra. In many machine learning algorithms, we additionally need to be able to compare two vectors. Computing the similarity or distance between two examples allows us to formalize the intuition that examples with similar features should have similar labels. Comparing two vectors requires that we construct a geometry (see notebook 3), and it allows us to optimize the resulting learning problem.
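Two common ways to compare vectors are the Euclidean distance and the cosine similarity. The sketch below is one possible choice of geometry, written with the standard library only; the notes do not prescribe a particular distance, so treat these two functions as illustrative assumptions.

```python
import math


def euclidean_distance(a, b):
    """Length of the difference vector a - b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both must be non-zero)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


# Two hypothetical examples with D = 2 features each.
x1 = [0.0, 0.0]
x2 = [3.0, 4.0]
distance = euclidean_distance(x1, x2)
```

A small distance (or a cosine similarity close to 1) marks two examples as similar, which is the property a learning algorithm can exploit when it assumes that similar examples have similar labels.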

Since we have vector representations of data, we can manipulate data to find potentially better representations of it. We will discuss finding good representations in two ways: finding lower-dimensional approximations of the original feature vector, and using nonlinear higher-dimensional combinations of the original feature vector. An example of a low-dimensional approximation of the original data space is the one obtained by finding the principal components. Finding principal components is closely related to the concepts of eigenvalue decomposition and singular value decomposition. For the high-dimensional representation, we will see an explicit feature map $\phi(\cdot)$ that allows us to represent inputs $x_n$ using a higher-dimensional representation $\phi(x_n)$. The main motivation for higher-dimensional representations is that we can construct new features as nonlinear combinations of the original features, which in turn may make the learning problem easier.
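One concrete feature map is a polynomial expansion, which builds new features from products of the original ones. The sketch below is an assumed example of such a map, not the specific $\phi$ used later in the notes: for a 2-dimensional input $(x_1, x_2)$ and degree 2 it returns $(x_1, x_2, x_1^2, x_1 x_2, x_2^2)$.

```python
from itertools import combinations_with_replacement
from math import prod


def phi(x, degree=2):
    """Map x to all monomials of its entries with total degree 1..degree."""
    features = []
    for d in range(1, degree + 1):
        # Each index tuple selects the entries multiplied into one monomial.
        for idx in combinations_with_replacement(range(len(x)), d):
            features.append(prod(x[i] for i in idx))
    return features


x_n = [2.0, 3.0]        # hypothetical example with D = 2
phi_x_n = phi(x_n)      # 5-dimensional representation
```

A model that is linear in $\phi(x_n)$ is nonlinear in the original $x_n$, which is why such a map may make the learning problem easier.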

## Models as Functions