### 3.4.2 Preparing the data

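We can't feed lists of integers directly into a neural network (NN). The lists have different lengths, so they first have to be turned into tensors of a fixed shape.

Let's say our data is: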
```
samples = [
    [0,4,5],
    [1,2,3,4,5],
    [3,5,7,8]
]
```


**Option 1: use an Embedding layer**

Pad the lists so they all have the same length, turn them into an integer tensor of shape `(samples, word_indices)`, then use an **Embedding** layer as the first layer.

Let's say we pick a list length of 4. Our data becomes:
```
samples = [
    [0,4,5,-1],  # -1 is the padding value
    [1,2,3,4],   # the trailing 5 is dropped to make the length 4
    [3,5,7,8]    # already length 4, left unchanged
]
```
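
The padding and truncation step above can be sketched in plain NumPy. `pad_samples` is a hypothetical helper written for this note, not a Keras function; it pads at the end and drops extra items from the end, matching the example. The `-1` padding value is kept from the example for illustration only: an actual Embedding layer expects non-negative indices, so in practice one index (commonly 0) is reserved for padding.

```python
import numpy as np

def pad_samples(inputs, length, pad_value=-1):
    """Hypothetical helper: pad or truncate each list to `length` items."""
    # Start with a matrix filled entirely with the padding value.
    results = np.full((len(inputs), length), pad_value, dtype=np.int64)
    for i, s in enumerate(inputs):
        trimmed = s[:length]                 # drop items beyond `length`
        results[i, :len(trimmed)] = trimmed  # copy the kept items to the front
    return results

samples = [
    [0, 4, 5],
    [1, 2, 3, 4, 5],
    [3, 5, 7, 8],
]

padded = pad_samples(samples, length=4)
print(padded.shape)  # (3, 4)
print(padded)
```

The result is a single integer tensor of shape `(samples, word_indices)`, which is the form an Embedding layer takes as input.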


**Option 2: use a Dense layer**

One-hot encode each list into a vector of 0s and 1s of a fixed length. For example, if a sample is `[3, 5]` and the vector length is 10, the vector is all 0s except at indices 3 and 5, which are 1s. Then we can use a **Dense** layer as the first layer.

Our data becomes:
```
samples = [
    [1,0,0,0,1,1,0,0,0,0],  # [0,4,5]
    [0,1,1,1,1,1,0,0,0,0],  # [1,2,3,4,5]
    [0,0,0,1,0,1,0,1,1,0]   # [3,5,7,8]
]
```

```
# One-hot encoding example
import numpy as np

samples = [
    [0,4,5],
    [1,2,3,4,5],
    [3,5,7,8]
]

def vectorize_samples(inputs, dimension=10):
    # Start with an all-zero matrix of shape (len(inputs), dimension).
    results = np.zeros((len(inputs), dimension))
    for i, s in enumerate(inputs):
        # s is a list of indices, so results[i, s] = 1.0 behaves like
        # results[i, [0,4,5]] = 1.0 and sets only those columns to 1.0.
        # (To set every value in the row to 1.0, use results[i, :] = 1.0.)
        results[i, s] = 1.0
    return results
```

```
vectorize_samples(samples, 10)
```

Output:

```
array([[1., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
       [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 1., 0., 1., 1., 0.]])
```
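
To see why this format suits a Dense layer, here is a minimal NumPy sketch of what a dense layer computes on such input: a matrix product plus a bias, followed by an activation. The `dense_forward` name, the random weights, and the choice of 4 units are assumptions made for illustration; this is not Keras code.

```python
import numpy as np

def dense_forward(x, weights, bias):
    """Hypothetical stand-in for a dense layer: relu(x @ weights + bias)."""
    return np.maximum(x @ weights + bias, 0.0)

# One-hot encoded samples, as produced by vectorize_samples(samples, 10).
x = np.array([
    [1., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
    [0., 1., 1., 1., 1., 1., 0., 0., 0., 0.],
    [0., 0., 0., 1., 0., 1., 0., 1., 1., 0.],
])

rng = np.random.default_rng(0)
weights = rng.normal(size=(10, 4))  # 10 input features -> 4 units (arbitrary)
bias = np.zeros(4)

output = dense_forward(x, weights, bias)
print(output.shape)  # (3, 4)
```

Because every row contains only 0s and 1s, the matrix product simply sums the weight rows for the indices present in that sample. Every sample has the same width (10), so the whole batch fits in one float tensor.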