In [1]:
from sklearn.datasets import make_classification

# Tabular Data

Tabular data is the kind of data you see most often: data that fits cleanly in a table with a fixed number of rows and columns. In our example below, every column is numeric.

This is the one type of data that we will go over that is not necessarily suited to neural networks. Because it is so simple and so well studied, traditional ML can do quite well on it. 

That being said, it makes a nice springboard to begin the rest of the tutorial.

To make this data we will be using sklearn's `make_classification`, which generates a dummy classification dataset:

In [2]:
dataset = make_classification(n_samples=10_000, n_features=20, n_classes=2)
x, y = dataset

In [3]:
x, y

(array([[-1.47591055,  0.25345616,  0.6174182 , ...,  0.44527873,
         -2.02793885, -0.25553664],
        [ 1.19614338, -1.66752205, -1.60501694, ..., -0.1298167 ,
         -1.5453044 , -0.56323096],
        [ 1.136674  , -0.53942846, -0.97723932, ...,  0.68611902,
          0.9081234 ,  0.86679452],
        ...,
        [ 1.12859474,  0.62318725, -0.17071723, ..., -0.37103146,
         -2.11036497,  1.72595764],
        [-0.94219602,  0.31865075,  0.04442349, ...,  0.60564122,
         -1.12027859,  0.74158706],
        [ 1.00780519,  1.14463957, -0.50560505, ...,  0.31718227,
          0.38186864, -0.4792807 ]]), array([1, 1, 0, ..., 1, 1, 0]))

Because we have two classes, this is binary classification: we are predicting either a 0 or a 1 from these 20 features.
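If you want to convince yourself of that, a quick sanity check along these lines should do. This is only a sketch: `random_state` and the names `features`/`labels` are my additions (so the check is repeatable and does not clobber the notebook's `x` and `y`).

```python
import numpy as np
from sklearn.datasets import make_classification

# random_state is an assumption added here so the check is repeatable.
features, labels = make_classification(
    n_samples=10_000, n_features=20, n_classes=2, random_state=0)

print(features.shape)  # (rows, columns)
print(labels.shape)    # one label per row

# Count how many samples fall into each class.
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))
```

With the default settings the two classes should come out roughly balanced.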

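So now that we have the data, we can just throw it into a NN, right?

Well, not quite yet. A NN is trained with gradient descent, and each layer is basically a linear model followed by a non-linearity, so features on very different scales can make training unstable. We first need to scale all the inputs: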

In [4]:
from sklearn.preprocessing import StandardScaler

ss = StandardScaler()

standardized_x = ss.fit_transform(x)
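For intuition, here is roughly what `fit_transform` is doing, sketched in plain NumPy: subtract each column's mean and divide by its standard deviation. The `raw` array is made-up data on deliberately different scales, not the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a feature matrix: three columns on very different scales.
raw = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 200.0, 0.01], size=(1_000, 3))

# Per-column mean and population standard deviation (ddof=0).
mean = raw.mean(axis=0)
std = raw.std(axis=0)

standardized = (raw - mean) / std

print(standardized.mean(axis=0).round(6))  # each column's mean is now ~0
print(standardized.std(axis=0).round(6))   # each column's std is now ~1
```

After this every column lives on the same scale, no matter how wild the original units were.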

Perfect, now we can just throw it into a NN :) 

Yup, for this data there is not much else to do but build the NN.

In [7]:
import tensorflow as tf

# dropout probability
p = .1

We are going to be using Keras to build our NN. Because this is tabular data, each block of the NN can follow a fairly simple structure:

1. Standardize/Normalize
2. (Optional) Regularize/Dropout
3. Apply a Dense Layer

Let me talk about the first and the last.

Standardizing is important because of the way that NNs train by using gradient descent. If a particular layer's input is too big, then the gradients might be massive and the training process goes out of whack.

The dense layer is the core of the NN. It applies a linear transformation followed by a non-linear activation (ReLU in our case), and stacking such layers is what lets the NN approximate a very wide class of non-linear functions. Without the non-linearity, the whole stack would collapse into a single linear model.
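To make the dense layer a bit less magical, here is a minimal NumPy sketch of one: a matrix multiply, a bias, and a ReLU. The function name `dense_relu` and the random weights are just for illustration; Keras handles initialization and training of the real ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(inputs, weights, bias):
    """One dense layer: a linear map followed by a ReLU non-linearity."""
    return np.maximum(inputs @ weights + bias, 0.0)

batch = rng.normal(size=(4, 20))            # 4 rows, 20 features each
weights = rng.normal(size=(20, 100)) * 0.1  # maps 20 inputs to 100 units
bias = np.zeros(100)

hidden = dense_relu(batch, weights, bias)
print(hidden.shape)         # (4, 100)
print((hidden >= 0).all())  # True, since ReLU never outputs negatives
```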

Dropout is a simple way of regularizing NNs. The reason I put this as optional is that there is some debate on whether you need dropout in addition to batch normalization.

Ultimately you can experiment with the amount of dropout you need in your network, and if it's none, so be it.
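If you have not seen dropout before, the idea fits in a few lines. This is a sketch of the usual "inverted dropout" trick in NumPy, where the surviving activations are scaled up during training so the expected value stays the same. The helper name `dropout` and its arguments are my own, not a Keras API.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero a random fraction `rate` of activations and rescale the rest."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op.
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones((1_000, 100))
dropped = dropout(acts, rate=0.1, rng=rng)

print((dropped == 0).mean())  # fraction zeroed, close to the rate
print(dropped.mean())         # stays close to the original mean of 1.0
```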

---

So, all that being said, below is our first NN.

In [8]:
inputs = tf.keras.layers.Input((20,), name='numeric_inputs')

In [9]:
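# Note: from here on `x` is the Keras tensor flowing through the network,
# not the raw feature array from earlier (the scaled data lives in `standardized_x`).
x = tf.keras.layers.Dropout(p)(inputs)
x = tf.keras.layers.Dense(100, activation='relu')(x)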

x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
x = tf.keras.layers.Dense(20, activation='relu')(x)

x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
x = tf.keras.layers.Dense(10, activation='relu')(x)

x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(p)(x)
out = tf.keras.layers.Dense(1, activation='sigmoid', name='output')(x)

Now there are probably a couple of questions about the above:

* Why so many layers?
* Why so many neurons in each layer?

Well, a good rule of thumb is that your NN can have about as many params as you have data points. The NN above has roughly half as many (about 4,900 params against 10,000 samples), so we could probably increase the number of parameters.

As for the width vs the depth of the network, there have been a ton of results on either side of the aisle, and honestly I'm not sure what to tell you other than to experiment.
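You can check the parameter count by hand. A dense layer has one weight per input/output pair plus one bias per output, and a batch norm layer keeps four values per unit. The helper names below are mine; the layer sizes come straight from the network defined above.

```python
def dense_params(n_in, n_out):
    # One weight per (input, output) pair plus one bias per output.
    return n_in * n_out + n_out

def batchnorm_params(n_units):
    # gamma, beta, moving mean and moving variance for each unit.
    return 4 * n_units

layer_params = [
    dense_params(20, 100), batchnorm_params(100),
    dense_params(100, 20), batchnorm_params(20),
    dense_params(20, 10), batchnorm_params(10),
    dense_params(10, 1),
]
total = sum(layer_params)

print(layer_params)  # [2100, 400, 2020, 80, 210, 40, 11]
print(total)         # 4861
print(total / 10_000)  # params per data point
```

Note that half of the batch norm values (the moving statistics) are not trained by gradient descent, so the trainable count is a little lower.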

Some things you might want to keep in mind are:

* Skip connections seem to be pretty cool
* Alternating small and large layers might be a thing too
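In case skip connections are new to you: the idea is just to add a layer's input back onto its output. Here is a minimal NumPy sketch (in Keras you would typically do the addition with `tf.keras.layers.Add`). The function name `residual_block` and the random weights are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_block(inputs, weights, bias):
    """Dense + ReLU, with the original input added back on (the skip)."""
    transformed = np.maximum(inputs @ weights + bias, 0.0)
    return inputs + transformed

batch = rng.normal(size=(4, 20))
weights = rng.normal(size=(20, 20)) * 0.1  # square, so shapes match for the add
bias = np.zeros(20)

out = residual_block(batch, weights, bias)
print(out.shape)  # (4, 20)
```

One nice property: if the layer's weights are all zero, the block simply passes its input through unchanged, so the layer only has to learn a correction on top of the identity.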

In [10]:
model = tf.keras.models.Model(inputs=inputs, outputs=out)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
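A quick aside on the `binary_crossentropy` loss, sketched in NumPy so you can see what is being minimized. The `eps` clipping is my assumption to avoid `log(0)`; the exact handling inside Keras may differ.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean negative log-likelihood of the true labels under the predictions."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # keep log() finite
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
confident_right = np.array([0.9, 0.1, 0.8, 0.2])
confident_wrong = np.array([0.1, 0.9, 0.2, 0.8])

print(binary_crossentropy(y_true, confident_right))  # small loss
print(binary_crossentropy(y_true, confident_wrong))  # much larger loss
```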

In [11]:
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
numeric_inputs (InputLayer)  [(None, 20)]              0         
_________________________________________________________________
dropout (Dropout)            (None, 20)                0         
_________________________________________________________________
dense (Dense)                (None, 100)               2100      
_________________________________________________________________
batch_normalization_v2 (Batc (None, 100)               400       
_________________________________________________________________
dropout_1 (Dropout)          (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 20)                2020      
_________________________________________________________________
batch_normalization_v2_1 (Ba (None, 20)                80    

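As a final amendment to our data, I always like to use Keras's `fit_generator` function, so I will often make a generator to feed data to the NN instead of using the default fit function. (In TensorFlow 2.1 and later, `fit_generator` is deprecated and `model.fit` accepts generators directly.)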

In [12]:
import numpy as np

def bootstrap_sample_generator(batch_size):
    while True:
        batch_idx = np.random.choice(
            standardized_x.shape[0], batch_size)
        yield ({'numeric_inputs': standardized_x[batch_idx]}, 
               {'output': y[batch_idx]})
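One thing worth knowing about this generator: `np.random.choice` samples with replacement by default, so an "epoch" here does not visit every row exactly once. The sketch below shows roughly how much of the data one such epoch touches, next to a hypothetical shuffle-based alternative (`shuffled_batches` is my own helper, not part of the notebook).

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, batch_size = 10_000, 32
steps = n_rows // batch_size
print(steps)  # 312

# Bootstrap-style: one "epoch" worth of indices drawn with replacement.
bootstrap_idx = rng.choice(n_rows, size=steps * batch_size, replace=True)
print(np.unique(bootstrap_idx).size / n_rows)  # fraction of rows seen at least once

# Alternative: shuffle once per epoch so each row is used at most once.
def shuffled_batches(n_rows, batch_size, rng):
    order = rng.permutation(n_rows)
    for start in range(0, n_rows - batch_size + 1, batch_size):
        yield order[start:start + batch_size]

seen = np.concatenate(list(shuffled_batches(n_rows, batch_size, rng)))
print(np.unique(seen).size == seen.size)  # True, no row repeats within the epoch
```

Neither approach is wrong. Bootstrap sampling is simple and works fine over many epochs, while shuffling guarantees even coverage within each pass.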

In [13]:
batch_size = 32

model.fit_generator(
    bootstrap_sample_generator(batch_size),
    steps_per_epoch=10_000 // batch_size,
    epochs=5,
    max_queue_size=10,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x13b29ae80>