[View in Colaboratory](https://colab.research.google.com/github/DJCordhose/deep-learning-crash-course-notebooks/blob/master/U3-M5-nn-intro.ipynb)

# Introduction to Neural Networks with TensorFlow and Keras layers

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt

Populating the interactive namespace from numpy and matplotlib


In [3]:
import pandas as pd
print(pd.__version__)

ModuleNotFoundError: No module named 'pandas'

In [None]:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
print(tf.__version__)

In [None]:
# let's see what compute devices we have available, hopefully a GPU 
sess = tf.Session()
devices = sess.list_devices()
for d in devices:
    print(d.name)

In [None]:
# a small sanity check, does tf seem to work ok?
hello = tf.constant('Hello TF!')
print(sess.run(hello))

In [None]:
from tensorflow import keras
print(keras.__version__)

## Loading and preparing our data set for classification

In [None]:
df = pd.read_csv('./data/insurance-customers-1500.csv', sep=';')

In [None]:
df.head()

In [None]:
df.describe()

## First important concept: You train a machine with your data to make it learn the relationship between some input data and a certain label - this is called supervised learning

<img src='https://raw.githubusercontent.com/DJCordhose/deep-learning-crash-course-notebooks/master/img/encoding3.jpg'>

In [None]:
# we deliberately decide "group" is going to be our label, 
# it is often named lower case y
y = df['group']

In [None]:
# since 'group' is now the label we want to predict, 
# we need to remove it from the training data 
df.drop('group', axis='columns', inplace=True)

In [None]:
# input data is often named upper case X;
# the upper case indicates that it is a matrix in which each row is a vector
# (as_matrix() was removed in pandas 1.0, .values works in old and new versions)
X = df.values

## Neural Networks using TensorFlow and Keras layers
* neural networks consist of artificial neurons that you organize in layers
* each neuron is very simple, but in theory a single layer with enough of them can approximate any function
* in practice, we use 2 or 3 layers, as this has turned out to work well
* the more neurons and layers you use, the longer the network takes to train
* because training takes so long, cross validation and grid search are often no longer practical for finding suitable hyperparameters

## Neuron (aka node or unit)

A neuron takes a number of numerical inputs, multiplies each by a weight, sums up the weighted inputs, and adds a bias (a constant) to that sum. From this it creates a single numerical output. For one input (one dimension) this describes a line. For more dimensions it describes a hyperplane that can serve as a decision boundary. Typically, this output is then transformed by an activation function, which compresses it to a value between 0 and 1 (sigmoid) or between -1 and 1 (tanh), or sets all negative values to zero (relu).
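
The computation described above can be sketched in a few lines of plain Python. This is only an illustration, not how Keras implements it; the function names and all numeric values are made up for this example.

```python
import math

def sigmoid(z):
    # compresses any value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    # sets all negative values to zero
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=math.tanh):
    # weighted sum of all inputs plus the bias, passed through the activation
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)

# hypothetical inputs, weights and bias, chosen only for illustration
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 0.1]
bias = 0.2

# the weighted sum is 0.5 - 0.5 + 0.3 + 0.2 = 0.5 (up to floating point rounding)
out_tanh = neuron(inputs, weights, bias, math.tanh)
out_sigmoid = neuron(inputs, weights, bias, sigmoid)
out_relu = neuron(inputs, weights, bias, relu)
print(out_tanh, out_sigmoid, out_relu)
```

Training a network means finding values for `weights` and `bias` that make such outputs useful.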

It is not really important to understand the inner details of a neural network. In practice, how you configure neurons to form something more powerful is much more important. This, however, is still a very experimental domain, so there is no concise explanation or complete understanding of how they work.

<img src='https://raw.githubusercontent.com/DJCordhose/deep-learning-crash-course-notebooks/master/img/neuron.jpg'>

### We use a sequential model, which means data flows from input to output without junctions

In [None]:
model = keras.Sequential()

### We start with a single fully connected layer having 50 neurons
* we have three inputs
  * age 
  * speed
  * miles
* activation function is tanh
* why these parameters: chosen arbitrarily for now

In [None]:
from tensorflow.keras.layers import Dense

model.add(Dense(50, name='hidden1', activation='tanh', input_dim=3))

### The final layer just transforms to a likelihood for each of our 3 classes

In [None]:
num_categories = 3
model.add(Dense(num_categories, name='softmax', activation='softmax'))
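
To give an idea of what the softmax activation does with the three raw outputs of this layer, here is a minimal sketch in plain Python. The scores are hypothetical and the helper is written for illustration only; Keras uses its own implementation.

```python
import math

def softmax(scores):
    # subtracting the maximum avoids overflow in exp() and does not change the result
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    # every value ends up between 0 and 1, and all values sum to 1
    return [e / total for e in exps]

# hypothetical raw outputs of the final layer for one customer
scores = [2.0, 1.0, 0.1]
probabilities = softmax(scores)
print(probabilities)
```

The largest score always gets the largest share, so the order of the classes is preserved.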

### First, let us have a look at what the input and output of this model look like

* this model has not been trained, so do not expect the outputs to be reasonable
* we are only interested in the format of input and output
* note that there is a mismatch between prediction and our known truths in format
* we will fix this in the next step

In [None]:
input = X[0:10]

In [None]:
# combinations of customer data
input

In [None]:
# predicted output: likelihoods for groups
model.predict(input)

In [None]:
# true, known output
y[0:10]
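
One way to see the format mismatch: the model returns three likelihoods per customer, while the known truth is a single group. A small sketch, assuming the groups are encoded as the integers 0, 1 and 2 (the prediction values below are made up), reduces each prediction to the index of its most likely class:

```python
def predicted_class(probabilities):
    # index of the largest likelihood, i.e. the most likely class
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# hypothetical model outputs for two customers, one row per customer
predictions = [
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
classes = [predicted_class(row) for row in predictions]
print(classes)  # [1, 0]
```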

### These are the parameters of the model that need to be learned

In [None]:
model.summary()
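
The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per input for each neuron, plus one bias per neuron. A short sketch of that arithmetic (the helper name is made up for this example):

```python
def dense_params(n_inputs, n_neurons):
    # one weight per (input, neuron) pair plus one bias per neuron
    return n_inputs * n_neurons + n_neurons

hidden1 = dense_params(3, 50)        # 3 inputs into 50 neurons: 150 weights + 50 biases
softmax_layer = dense_params(50, 3)  # 50 inputs into 3 neurons: 150 weights + 3 biases
total = hidden1 + softmax_layer
print(hidden1, softmax_layer, total)  # 200 153 353
```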

### Bringing it all together
* _sparse_categorical_crossentropy_
  * _crossentropy_: the loss is the cross entropy between prediction and truth (https://en.wikipedia.org/wiki/Cross_entropy)
  * _categorical_: we are comparing categorical data
  * _sparse_: allows us to leave our labels as they are without explicitly turning them into a one-hot encoding
* _adam_: an optimizer for minimizing the loss that needs little manual tuning (http://cs231n.github.io/neural-networks-3/#ada)
  * adapts the step size for each parameter during training, so the default learning rate often works
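
To illustrate what _sparse_ means here, the following sketch computes the cross entropy for a single example twice: once from an integer label and once from the equivalent one-hot vector. It is a simplified illustration with made-up numbers; the Keras loss additionally averages over all examples in a batch.

```python
import math

def sparse_categorical_crossentropy(label, probabilities):
    # the label is the index of the true class
    return -math.log(probabilities[label])

def categorical_crossentropy(one_hot, probabilities):
    # the label is a vector with a 1 at the position of the true class
    return -sum(t * math.log(p) for t, p in zip(one_hot, probabilities))

# hypothetical prediction for one customer whose true class is 0
probabilities = [0.7, 0.2, 0.1]
sparse_loss = sparse_categorical_crossentropy(0, probabilities)
one_hot_loss = categorical_crossentropy([1, 0, 0], probabilities)
print(sparse_loss, one_hot_loss)
```

Both variants give the same loss; the sparse one just saves the conversion step. The loss grows as the likelihood assigned to the true class shrinks.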

In [None]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam')

# Caution: we have not trained our model yet; the parameters are still initialized randomly