How do you prepare input data and targets before training?

Many data-preprocessing and feature-engineering techniques are domain-specific, but the following are common to all data domains:

### Vectorization

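Transforming the data to be processed (sound, images, text, etc.) into tensors of floating-point data (or, in specific cases, integers), because all inputs and targets of a neural network must be tensors.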
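One common way to vectorize text that has already been turned into integer token ids is multi-hot encoding. The sketch below assumes each sample is a list of integer ids smaller than a chosen `dimension`; the function name `vectorize_sequences` and the parameters are illustrative, not a fixed API.

```python
import numpy as np

def vectorize_sequences(sequences, dimension):
    """Multi-hot encode lists of integer token ids into a float32 matrix.

    Assumes every id in every sequence is in the range [0, dimension).
    """
    results = np.zeros((len(sequences), dimension), dtype="float32")
    for i, sequence in enumerate(sequences):
        # Set a 1.0 at every position whose token id appears in the sample.
        results[i, sequence] = 1.0
    return results

x = vectorize_sequences([[0, 3], [1, 1, 2]], dimension=5)
print(x.shape)  # (2, 5)
```

Each row is now a fixed-length float vector, regardless of how long the original sequence was, so the whole batch can be fed to a network as a single tensor.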

### Value Normalization

Rescaling or shifting data to a more convenient distribution (e.g. mapping 0 to 255 color values onto a 0 to 1 scale).

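In general, it isn't safe to feed a neural network data that takes relatively large values (for example, multi-digit integers, which are much larger than the network's initial weights) or heterogeneous data (for example, one feature in the range 0 to 1 and another in the range 100 to 200). Doing so can trigger large gradient updates that prevent the network from converging.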

To foster easier learning, follow these guidelines:
- *Take small values*: typically, most values should fall in the 0 to 1 range.
- *Be homogeneous*: all features should take values in roughly the same range.

Some stricter guidelines are common but not always necessary:
- Normalize each feature independently to have a mean of 0.
- Normalize each feature independently to have a standard deviation of 1.
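Both kinds of normalization above can be sketched in a few lines of NumPy. This assumes images arrive as `uint8` arrays and tabular data as 2-D float arrays of shape `(samples, features)`. Computing the mean and standard deviation on the training set only and reusing them on the test set is a common practice assumed here, not something stated above; the function names are hypothetical.

```python
import numpy as np

def rescale_pixels(images):
    """Map uint8 color values in [0, 255] to float32 values in [0, 1]."""
    return images.astype("float32") / 255.0

def standardize(train, test):
    """Give each feature mean 0 and std 1, using training-set statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Constant features have std 0; leave them unscaled to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (train - mean) / std, (test - mean) / std

train = np.array([[1.0, 100.0], [3.0, 300.0]])  # features on very different scales
test = np.array([[2.0, 200.0]])
train_n, test_n = standardize(train, test)
```

After `standardize`, both columns of `train_n` lie on the same scale even though the raw features differed by two orders of magnitude, which is exactly the "be homogeneous" guideline.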

### Handling Missing Values

In general, with neural networks it's safe to encode missing values as 0, as long as 0 isn't already a meaningful value. The network will learn from exposure to the data that 0 means missing data and will start ignoring it.
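A minimal sketch of the zero-filling rule, assuming missing entries arrive as `NaN` in a float array (an assumption; real data may mark them differently). The function name is hypothetical, and the check mirrors the caveat that 0 must not already be a meaningful value.

```python
import numpy as np

def fill_missing_with_zero(x):
    """Replace NaN entries (assumed to mean 'missing') with 0.

    Raises if 0 already occurs among the observed values, since the network
    could then no longer tell 'missing' apart from a real 0.
    """
    missing = np.isnan(x)
    observed = x[~missing]
    if np.any(observed == 0):
        raise ValueError("0 is already a real value here; choose another encoding")
    return np.where(missing, 0.0, x)

x = np.array([[1.5, np.nan], [np.nan, 2.0]])
filled = fill_missing_with_zero(x)  # [[1.5, 0.0], [0.0, 2.0]]
```

In practice you would run the check per feature; it is applied to the whole array here only to keep the sketch short.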

If you expect missing values in the test data but the network was not trained on any examples with missing data, you should go back and artificially create some training examples with missing data. Otherwise the network will never have learned to ignore missing values.
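One way to create such examples is to append copies of the training samples in which the features you expect to go missing are zeroed out at random. The scheme below (independent dropping with probability `drop_prob`) and all names and parameters are illustrative assumptions, not a prescribed recipe; zeroing follows the "missing = 0" convention above.

```python
import numpy as np

def add_missing_value_copies(x, y, feature_indices, n_copies=1, drop_prob=0.5, seed=0):
    """Append copies of (x, y) where some chosen features are set to 0.

    feature_indices: columns you expect may be missing at test time.
    Each chosen entry in each copy is dropped independently with drop_prob.
    """
    rng = np.random.default_rng(seed)
    xs, ys = [x], [y]
    for _ in range(n_copies):
        x_copy = x.copy()
        cols = x_copy[:, feature_indices]  # advanced indexing returns a copy
        mask = rng.random(cols.shape) < drop_prob
        cols[mask] = 0.0
        x_copy[:, feature_indices] = cols  # write the modified columns back
        xs.append(x_copy)
        ys.append(y)  # targets are unchanged for the degraded copies
    return np.concatenate(xs), np.concatenate(ys)

x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = np.array([0, 1])
x_aug, y_aug = add_missing_value_copies(x, y, feature_indices=[1], drop_prob=1.0)
```

With `drop_prob=1.0` every copy loses feature 1 entirely; a smaller value mixes complete and partial samples, which is closer to what the test data is likely to look like.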

### Feature Engineering

The process of using your own knowledge about the data and about the machine-learning algorithm at hand to make the algorithm work better by applying hardcoded (non-learned) transformations to the data before it goes into the model.

*Example*: Reading the time from an image of a clock face. Don't feed the raw image into the model. Instead, preprocess it to find the coordinates of the ends of the clock hands and feed those coordinates to the model.
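To show how far hardcoded transformations can go, here is a sketch that takes the clock example one step further: once the dial center and hand tips are known, converting them to angles makes the time readable with plain arithmetic and no learning at all. It assumes an upstream step has already located those points, uses math-style coordinates (y grows upward, unlike image rows), and all names are hypothetical.

```python
import math

def hand_angle(center, tip):
    """Clockwise angle in radians from the 12 o'clock direction, in [0, 2*pi).

    Assumes y grows upward (math convention, not image-row convention).
    """
    dx = tip[0] - center[0]
    dy = tip[1] - center[1]
    # atan2(dx, dy) measures from the +y axis, positive in the clockwise direction.
    return math.atan2(dx, dy) % (2 * math.pi)

def read_clock(center, hour_tip, minute_tip):
    """Return (hour, minute) on a 12-hour dial from hand-tip coordinates."""
    minute_units = hand_angle(center, minute_tip) / (2 * math.pi) * 60
    hour_units = hand_angle(center, hour_tip) / (2 * math.pi) * 12
    minute = round(minute_units) % 60
    # The hour hand sits at hour + minute/60, so remove that offset before rounding.
    hour = round(hour_units - minute / 60) % 12
    return (12 if hour == 0 else hour, minute)

a = 6.5 / 12 * 2 * math.pi  # hour hand halfway between 6 and 7
print(read_clock((0, 0), (math.sin(a), math.cos(a)), (0, -1)))  # (6, 30)
```

Using the minute reading to correct the hour hand avoids off-by-one errors near the top of the hour, when the hour hand is almost on the next number.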

- Good features allow you to solve problems more elegantly while using fewer resources.
- Good features let you solve a problem with far less data.