# Data preparation

## Structuring your experiment data

Always divide your data in two steps:

* Split your data into a _training set_ and a _test set_
* Split the training set further into a smaller _training set_ and a _validation set_

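While you develop and tune your model, always evaluate it on the **validation** set. Do not use the test set until you consider the model final. Otherwise your tuning decisions leak information about the test set, and its score stops being an honest estimate of how the model generalizes.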
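A minimal NumPy sketch of the two-step split. The function name `split_indices`, the 20% fractions and the seed are illustrative assumptions, not prescribed values. It shuffles before splitting, so it only suits data without a time ordering.

```python
import numpy as np

def split_indices(n_samples, test_frac=0.2, val_frac=0.2, seed=0):
    """Return shuffled (train, validation, test) index arrays.

    val_frac is taken from what remains after the test set is held out.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int((n_samples - n_test) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(1000)
```

Index arrays are returned instead of data so the same split can be applied to features and labels alike, e.g. `X[train_idx]` and `y[train_idx]`.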

## Considerations for splitting data

### Random shuffling

If your data set is ordered by class, shuffle it randomly before splitting. For example, if you have a data set of single digits (classes 0-9) with equally sized classes and you split it 80/20 without shuffling, your training set will contain no instances of classes 8-9.
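The digits example can be reproduced with a toy data set. This is only a sketch: the array names, the 10 samples per class and the seed are made up for illustration.

```python
import numpy as np

# Toy data set ordered by class: 10 samples for each digit 0-9.
y = np.repeat(np.arange(10), 10)
X = np.arange(100).reshape(100, 1)  # placeholder features, one row per sample

n_train = int(0.8 * len(y))

# Without shuffling, the last two classes never reach the training set.
unshuffled_train_classes = np.unique(y[:n_train])

# Shuffle features and labels with the SAME permutation, then split.
rng = np.random.default_rng(42)
perm = rng.permutation(len(y))
X_shuf, y_shuf = X[perm], y[perm]
X_train, y_train = X_shuf[:n_train], y_shuf[:n_train]
X_test, y_test = X_shuf[n_train:], y_shuf[n_train:]
```

Using one permutation for both arrays keeps every sample paired with its label. Shuffling `X` and `y` independently would silently scramble the labels.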

### Time series

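Do not shuffle time series data randomly. Shuffling destroys the progression of time and lets future observations leak into the training set. Split chronologically instead, so that the test set is more recent than the training set.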
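A sketch of a chronological split, assuming the observations are already sorted by time. The series itself (`timestamps`, `values`) is synthetic and only there to make the example runnable.

```python
import numpy as np

# Hypothetical series: 100 observations already sorted by time.
timestamps = np.arange(100)
values = np.sin(timestamps / 10.0)

# Chronological split: the oldest 80% trains, the most recent 20% tests.
split_at = int(len(values) * 0.8)
train_values, test_values = values[:split_at], values[split_at:]
train_times, test_times = timestamps[:split_at], timestamps[split_at:]
```

Every training timestamp precedes every test timestamp, so the model is always evaluated on data from its "future".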

### Data redundancy

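Remove duplicates. If you shuffle data that contains duplicates, copies of the same sample might end up in more than one of the training, validation and test sets, and you would then partly evaluate the model on data it was trained on.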
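One possible way to drop exact duplicate rows with NumPy before splitting. The data is invented, and this sketch only catches rows whose features are identical. Near-duplicates need a domain-specific check.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [1.0, 2.0],   # duplicate of row 0
              [5.0, 6.0]])
y = np.array([0, 1, 0, 1])

# np.unique sorts the rows; return_index gives the first occurrence of each
# unique row, so sorting those indices restores the original order.
_, first_idx = np.unique(X, axis=0, return_index=True)
keep = np.sort(first_idx)
X_dedup, y_dedup = X[keep], y[keep]
```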

## K-fold Cross-validation

If your data set is too small, splitting it might leave validation and test sets that are too small to give statistically reliable results. In this case, use the K-fold approach: split the data into _k_ folds, then for each fold train on the other _k_ - 1 folds and evaluate on the held-out one. Averaging the _k_ results produces a more solid estimate.

\begin{align}
CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{Loss}_{i}
\end{align}

* _k_ is the number of folds, typically 5 or 10
* _Loss_ is the score obtained on fold _i_, typically mean squared error or log-loss (a metric such as accuracy can be averaged the same way)
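The formula above can be sketched in plain NumPy. Everything named here (`k_fold_cv`, `fit_linear`, `mse`) is a hypothetical helper written for this example, and the least-squares model is only a stand-in for whatever model you are evaluating.

```python
import numpy as np

def k_fold_cv(X, y, k, fit, loss, seed=0):
    """Average the held-out loss over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    losses = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[train_idx], y[train_idx])   # train on k - 1 folds
        losses.append(loss(y[val_idx], predict(X[val_idx])))
    return float(np.mean(losses)), losses

def fit_linear(X_train, y_train):
    """Stand-in model: ordinary least squares with a bias column."""
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X_new: np.hstack([X_new, np.ones((len(X_new), 1))]) @ coef

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3.0 * X[:, 0] + 1.0   # noiseless line, so the fit should be near-exact
cv_score, fold_losses = k_fold_cv(X, y, k=5, fit=fit_linear, loss=mse)
```

The model is refit from scratch in every iteration, so no fold is ever evaluated by a model that saw it during training.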


# Data processing

## Vectorization

All your data must be expressed as tensors of numbers (either floating point or integer) before it is fed to the model.
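As a small illustration, nested Python lists and string labels can be turned into NumPy arrays like this. The records and label names are invented, and mapping labels to integer ids is just one possible encoding.

```python
import numpy as np

# Hypothetical raw records: two numeric features per sample, string labels.
raw_features = [[170, 65.5], [182, 80.1], [158, 52.3]]
raw_labels = ["cat", "dog", "cat"]

# Features become a 2D float tensor of shape (samples, features).
x = np.asarray(raw_features, dtype=np.float32)

# Labels become integer class ids; `classes` maps each id back to its name.
classes, y = np.unique(raw_labels, return_inverse=True)
```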

## Data normalization

Features should have values in similar ranges. Typically, you scale all input and output values to the range 0-1. As a recommendation, standardize each feature so that it has mean 0 and standard deviation 1:

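For a 2D NumPy float array `x` of shape `(samples, features)` (the in-place operations fail on integer arrays):
```
# Center each feature (column) at mean 0
x -= x.mean(axis=0)
# Scale each feature to standard deviation 1
x /= x.std(axis=0)
```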
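A slightly fuller sketch of the same standardization when there is a separate test set. It follows the common practice of computing the mean and standard deviation on the training data only and reusing them for the test data, so that no test information leaks into preprocessing. The random data is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
x_test = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

# Statistics come from the training data only.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)

x_train_norm = (x_train - mean) / std
x_test_norm = (x_test - mean) / std   # reuse the training statistics
```

If a feature is constant, `std` is 0 for that column and the division produces invalid values, so such columns need to be dropped or handled separately.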

# Handling overfitting

A model should _generalize_ well. After a certain number of training iterations, the model starts learning patterns that are particular to the training set and do not hold in general. This is _overfitting_. To deal with it:

* Get more training data
* Invest in regularization

## Regularization

Impose constraints on the network. These constraints make it more difficult for the network to learn particularities of the training set.

### Reduce the size of the network

Reduce the number of layers and the number of units per layer. Recommendation: start with few layers and parameters, then make your architecture gradually more complex until the validation results show diminishing returns.
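To get a feel for how quickly capacity grows, the number of trainable parameters of a stack of fully connected layers can be counted as below. The helper `dense_param_count` and both layer configurations are hypothetical, chosen only for illustration.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 784 inputs and 10 outputs in both cases, different hidden layers.
small = dense_param_count([784, 16, 10])         # 12730 parameters
large = dense_param_count([784, 512, 512, 10])   # 669706 parameters
```

The larger network has roughly 50 times more parameters with which to memorize the training set.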

### Weight regularization

Impose a penalty on large weights, forcing the weights across the network to take smaller, more regular values. This is done by adding a term to the loss function that grows with the size of the weights, for example the sum of their absolute values (L1) or of their squares (L2).
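A minimal sketch of an L2 penalty added to a mean squared error loss. The function `mse_with_l2` and the coefficient `l2=0.01` are assumptions for this example. Deep learning frameworks offer their own mechanisms for this.

```python
import numpy as np

def mse_with_l2(y_true, y_pred, weights, l2=0.01):
    """Mean squared error plus an L2 penalty on a list of weight arrays."""
    data_loss = np.mean((y_true - y_pred) ** 2)
    penalty = l2 * sum(np.sum(w ** 2) for w in weights)
    return float(data_loss + penalty)

y_true = np.array([1.0, 2.0])
y_pred = np.array([1.0, 2.0])            # perfect predictions, data loss is 0
small_w = [np.array([[0.1, -0.1]])]
large_w = [np.array([[10.0, -10.0]])]

loss_small = mse_with_l2(y_true, y_pred, small_w)
loss_large = mse_with_l2(y_true, y_pred, large_w)
```

Both weight sets fit the data equally well here, yet the larger weights yield the higher total loss, which is what pushes training towards small weights.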

### Dropout

During training, set a random fraction of the values in a layer's output vector to zero.
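A sketch of the idea in NumPy. It uses the "inverted dropout" variant, which also rescales the surviving values. That rescaling, the function name `dropout` and the rate of 0.5 are assumptions of this example, not something the notes above prescribe.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of the activations during training.

    Surviving values are scaled by 1 / (1 - rate) so the expected sum is
    unchanged; at inference time the input passes through untouched.
    """
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
layer_output = np.ones((4, 8))
dropped = dropout(layer_output, rate=0.5, rng=rng)
same = dropout(layer_output, rate=0.5, rng=rng, training=False)
```

With a rate of 0.5 every entry of `dropped` is either 0 (dropped) or 2 (kept and rescaled), while `same` is identical to `layer_output`.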