<a href="https://colab.research.google.com/gist/nv-classes/b00007a21a27f4bad79f9f570212f7c1/copy-of-copy-of-t81_558_class_03_2_keras.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to TensorFlow and Keras
Adapted from a Jupyter notebook by [Jeff Heaton](https://sites.wustl.edu/jeffheaton/)

TensorFlow is an open-source software library for deep learning. TensorFlow was originally developed by the Google Brain team for Google's research and production purposes and later released under the Apache 2.0 open source license on November 9, 2015.

* [TensorFlow Homepage](https://www.tensorflow.org/)
* [TensorFlow GitHub](https://github.com/tensorflow/tensorflow)
* [TensorFlow Google Groups Support](https://groups.google.com/forum/#!forum/tensorflow)
* [TensorFlow Google Groups Developer Discussion](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss)
* [TensorFlow FAQ](https://www.tensorflow.org/resources/faq)

## Why TensorFlow

* Supported by Google
* _Works out of the box in Google Colab_
* Works well on Windows, Linux, and Mac
* Excellent GPU support
* Python-based

## Deep Learning Tools
TensorFlow is not the only game in town. The biggest competitor to TensorFlow/Keras is PyTorch.

* **[TensorFlow](https://www.tensorflow.org/)** - Google's deep learning API.  The focus of this class, along with Keras.

* **[PyTorch](https://pytorch.org/)** - PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. Facebook's AI Research lab primarily develops PyTorch.

* **[Keras](https://keras.io/)** - Acts as a higher-level frontend to TensorFlow (and now, in its latest incarnation, also to PyTorch and [JAX](https://docs.jax.dev/en/latest/)).

Generally, Keras requires significantly fewer lines of code to perform deep learning applications. However, if you are creating entirely new neural network structures in a research setting, direct use of PyTorch or TensorFlow can provide easier access to some of the low-level internals of deep learning.

## Using TensorFlow Directly

We will mostly focus on Keras (using TensorFlow as a backend), which allows us to specify the number of hidden layers and create the neural network easily. TensorFlow itself is a low-level mathematics API, similar to [NumPy](http://www.numpy.org/). However, unlike NumPy, TensorFlow is built for deep learning. TensorFlow can compile computations into compute graphs that run as highly efficient C++/[CUDA](https://en.wikipedia.org/wiki/CUDA) code on GPUs. In the next section, we will see a TensorFlow example that has nothing to do with neural networks.
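
As a small illustration of the NumPy comparison above, here is a minimal sketch, assuming TensorFlow 2.x with eager execution. The function name `scaled_sum` and the sample values are made up for this example and are not part of the original notebook. The `tf.function` decorator asks TensorFlow to trace the Python function into a compute graph the first time it is called.

```python
import numpy as np
import tensorflow as tf

# The same matrix product, first in NumPy and then in TensorFlow.
a_np = np.array([[1.0, 2.0], [3.0, 4.0]])
b_np = np.array([[5.0], [6.0]])
product_np = a_np @ b_np

a_tf = tf.constant(a_np)
b_tf = tf.constant(b_np)
product_tf = tf.matmul(a_tf, b_tf)

# tf.function traces this Python code into a graph on the first call.
@tf.function
def scaled_sum(x, scale):
    return tf.reduce_sum(x) * scale

total = scaled_sum(a_tf, tf.constant(2.0, dtype=tf.float64))

print(product_np.ravel())          # NumPy result
print(product_tf.numpy().ravel())  # TensorFlow result, converted back to NumPy
print(float(total))
```

Both products should agree, and calling `.numpy()` on a tensor is the usual way to hand a result back to NumPy code.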

### Mandelbrot in TensorFlow

Next, we examine an example where we use TensorFlow directly. The example shows that TensorFlow is not limited to neural networks and can also perform other mathematical computations. The code in the next section renders a [Mandelbrot set](https://en.wikipedia.org/wiki/Mandelbrot_set).

In [None]:
import tensorflow as tf
import numpy as np

import PIL.Image
from io import BytesIO
from IPython.display import Image, display

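def render(a):
  # Map iteration counts to cyclic RGB channels, shape (rows, columns, 3)
  a_cyclic = (a*0.3).reshape(list(a.shape)+[1])
  img = np.concatenate([10+20*np.cos(a_cyclic),
                        30+50*np.sin(a_cyclic),
                        155-80*np.cos(a_cyclic)], 2)
  # Points with the maximum count (treated as inside the set) are drawn black
  img[a==a.max()] = 0
  a = img
  a = np.uint8(np.clip(a, 0, 255))
  return PIL.Image.fromarray(a)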

# Define functions for Mandelbrot generation with TensorFlow routines
def mandelbrot_helper(grid_c, current_values, counts,cycles):

  for i in range(cycles):
    # One step of z = z*z + c for every grid point at once
    temp = current_values*current_values + grid_c
    not_diverged = tf.abs(temp) < 4
    current_values.assign(temp)
    # Add 1 to the count of each point that has not yet diverged
    counts.assign_add(tf.cast(not_diverged, tf.float32))

def mandelbrot(render_size,center,zoom,cycles):
  f = zoom/render_size[0]
  real_start = center[0]-(render_size[0]/2)*f
  real_end = real_start + render_size[0]*f
  imag_start = center[1]-(render_size[1]/2)*f
  imag_end = imag_start + render_size[1]*f

  real_range = tf.range(real_start,real_end,f,dtype=tf.float64)
  imag_range = tf.range(imag_start,imag_end,f,dtype=tf.float64)
  real, imag = tf.meshgrid(real_range,imag_range)
  grid_c = tf.constant(tf.complex(real, imag))
  current_values = tf.Variable(grid_c)
  counts = tf.Variable(tf.zeros_like(grid_c, tf.float32))

  mandelbrot_helper(grid_c, current_values,counts,cycles)
  return counts.numpy()

With the above code defined, we can now calculate and render a Mandelbrot plot.

In [None]:
counts = mandelbrot(
    #render_size=(3840,2160), # 4K
    #render_size=(1920,1080), # HD
    render_size=(640,480),
    center=(-0.5,0),
    zoom=4,
    cycles=200
)
img = render(counts)
print(img.size)
img

#img.save("test.png")

Mandelbrot rendering programs are both simple and infinitely complex at the same time. This view shows the entire Mandelbrot universe simultaneously, as a view completely zoomed out. However, if you zoom in on any non-black portion of the plot, you will find infinite hidden complexity.
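
To see what the vectorized TensorFlow code computes for each pixel, the following pure-Python sketch applies the same update rule (z = z*z + c, starting from z = c) and the same threshold of 4 to a single complex point. The helper name `escape_count` is hypothetical, and unlike the TensorFlow version it stops as soon as a point diverges.

```python
def escape_count(c, cycles=200, bound=4.0):
    """Count iterations of z -> z*z + c that stay below the bound."""
    z = c
    count = 0
    for _ in range(cycles):
        z = z * z + c
        if abs(z) >= bound:
            break  # the point has diverged; stop counting
        count += 1
    return count

# Points that never diverge reach the full cycle count (drawn black above).
for point in (0j, -1 + 0j, 1 + 0j, 0.3 + 0.5j):
    print(point, escape_count(point))
```

Points such as 0 and -1 never escape and therefore reach the maximum count, while a point such as 1 escapes after very few iterations.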

## Introduction to Keras

[Keras](https://keras.io/) is a layer on top of TensorFlow that makes it much easier to create neural networks. Rather than define every operation, you set the individual layers of the network with a much more high-level API. Unless you are researching entirely new structures of deep neural networks, it is unlikely that you need to program TensorFlow directly.  

## Simple TensorFlow Regression: MPG

This example shows how to encode an MPG dataset for regression and predict values. We will predict the miles per gallon (MPG) for a car based on the car's weight, cylinders, engine size, and other features.


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
import pandas as pd
import io
import os
import requests
import numpy as np
from sklearn import metrics

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/auto-mpg.csv",
    na_values=['NA', '?'])

cars = df['name']

# Handle missing value
df['horsepower'] = df['horsepower'].fillna(df['horsepower'].median())

# Pandas to Numpy
x_mpg = df[['cylinders', 'displacement', 'horsepower', 'weight',
       'acceleration', 'year', 'origin']].values
y_mpg = df['mpg'].values # regression

# Build the neural network
model = Sequential()
model.add(Dense(25, activation='relu')) # Hidden 1
model.add(Dense(10, activation='relu')) # Hidden 2
model.add(Dense(1)) # Output
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_mpg,y_mpg,epochs=100)

## Neural Network Hyperparameters

The neural network above contains four layers. The first layer is the input layer and does not show up explicitly. It is automatically generated from the input and provides one node (neuron) for every column in the data set (including dummy variables, if present).

There are also two hidden layers, with 25 and 10 neurons, respectively. You might be wondering how the programmer chose these numbers. Selecting a hidden neuron structure is one of the most common questions about neural networks. Unfortunately, there is no right answer. These are hyperparameters. They are settings that can affect neural network performance, yet there are no clearly defined means of setting them.

In general, more hidden neurons mean more capability to fit complex problems. However, too many neurons can lead to overfitting and lengthy training times. Too few can lead to underfitting the problem and will sacrifice accuracy. The number of layers is another hyperparameter. More layers allow the neural network to perform more of its own feature engineering and data preprocessing, but this also comes at the expense of training time and a higher risk of overfitting. Typically, neuron counts start larger near the input layer and shrink towards the output layer in a triangular fashion.
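
One way to get a feel for these choices is to count trainable parameters. The sketch below assumes the seven input columns of the MPG example and fully connected layers with one bias per neuron. `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units

# Input columns, hidden layer 1, hidden layer 2, output neuron.
layer_sizes = [7, 25, 10, 1]
param_counts = [dense_param_count(n_in, n_out)
                for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(param_counts)       # [200, 260, 11]
print(sum(param_counts))  # 471
```

Doubling the first hidden layer to 50 neurons would roughly double the total, which is one reason wider layers train more slowly and overfit more easily.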



## Regression Prediction

Next, we will perform actual predictions. The program assigns these predictions to the **pred_mpg** variable. These are all MPG predictions from the neural network. Notice that this is a 2D array. You can always see the dimensions of what Keras returns by printing out **pred_mpg.shape**. Neural networks can return multiple values, so the result is always an array. Here the neural network only returns one value per prediction (there are 398 cars, so 398 predictions). However, a 2D array is needed because the neural network has the potential of returning more than one value.

In [None]:
pred_mpg = model.predict(x_mpg)
print(f"Shape: {pred_mpg.shape}")
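
If a flat list of values is more convenient, the 2D result can be reduced to one dimension. The sketch below uses a small made-up array as a stand-in for the output of `model.predict`; the values are illustrative only.

```python
import numpy as np

# Stand-in for model.predict output: 5 rows, 1 value per row (made-up values).
pred = np.array([[18.2], [15.1], [30.4], [24.9], [21.0]])
print(pred.shape)  # (5, 1)

# Collapse the single output column into a 1D array.
flat = pred.flatten()  # pred[:, 0] selects the same values
print(flat.shape)  # (5,)
```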

We would like to see how good these predictions are.  We know the correct MPG for each car so we can measure how close the neural network was.

In [None]:
# Measure RMSE error.  RMSE is common for regression.
# scikit-learn expects the true values first, then the predictions.
score = np.sqrt(metrics.mean_squared_error(y_mpg,pred_mpg))
print(f"Final score (RMSE): {score}")
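
RMSE is the square root of the mean of the squared differences between predictions and true values. The following NumPy sketch spells that out on three made-up values. It also flattens the predictions first, because subtracting a `(n,)` array from a `(n, 1)` array would broadcast to an `(n, n)` matrix and silently give the wrong result.

```python
import numpy as np

y_true = np.array([20.0, 30.0, 40.0])
pred_2d = np.array([[22.0], [28.0], [41.0]])  # shape (3, 1), like Keras output

# Flatten first: (3, 1) minus (3,) would broadcast to a (3, 3) matrix.
errors = pred_2d.flatten() - y_true
rmse = np.sqrt(np.mean(errors ** 2))
print(rmse)  # approximately 1.732

# The broadcasting pitfall, shown for comparison.
print((pred_2d - y_true).shape)  # (3, 3)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is in the same units as the target (here, MPG).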

In [None]:
# Sample predictions
for i in range(0,200,20):
    # pred_mpg[i] is a one-element array, so take its single value
    print(f"{i+1}. Car name: {cars[i]}, MPG: {y_mpg[i]}, "
          + f"predicted MPG: {pred_mpg[i][0]}")

## Simple Classification: Iris dataset (ML "Hello World")

We want the network to report a classification as its percent confidence in each class rather than just the top label.

We previously saw how to train a neural network to predict the MPG of a car. We will now see how to predict a class, in this case the species of iris flower, from four measurements. The code to classify iris flowers is similar to the MPG code; however, there are two important differences:

* The output neuron count matches the number of classes (in the case of Iris, 3).
* The *Softmax activation function* is utilized by the output layer and the loss function we use is *cross entropy*.
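
The softmax function turns the raw scores of the output neurons into probabilities that are positive and sum to 1 for each row. The NumPy sketch below is a hand-written illustration with made-up scores, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1 per row."""
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Two rows of raw scores for three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs)
print(probs.sum(axis=1))  # each row sums to 1
```

The first row puts the most probability on the first class, while the second row, with equal scores, spreads probability evenly across all three.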

In [None]:
import pandas as pd
import io
import requests
import numpy as np
from sklearn import metrics
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping

df = pd.read_csv(
    "https://data.heatonresearch.com/data/t81-558/iris.csv",
    na_values=['NA', '?'])

# Convert to numpy - Classes/Labels
x_iris = df[['sepal_l', 'sepal_w', 'petal_l', 'petal_w']].values
dummies = pd.get_dummies(df['species']) # Classification
species = dummies.columns
y_iris = dummies.values


# Build neural network
model = Sequential()
model.add(Dense(50, activation='relu')) # Hidden 1
model.add(Dense(25, activation='relu')) # Hidden 2
model.add(Dense(y_iris.shape[1],activation='softmax')) # Output

model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x_iris,y_iris,epochs=100)

In [None]:
# Print the species (class labels) found:
print(species)
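
The `pd.get_dummies` call in the training cell converts each species label into a one-hot row, with one column per class. A minimal sketch with four made-up labels shows the idea. Passing `dtype=float` is a choice made for this example so that the output is numeric rather than boolean.

```python
import pandas as pd

labels = pd.Series(["Iris-setosa", "Iris-virginica",
                    "Iris-versicolor", "Iris-setosa"])

# One column per class, sorted by label; a 1.0 marks the row's class.
dummies = pd.get_dummies(labels, dtype=float)
print(list(dummies.columns))
print(dummies.values)
```

The column order is what links each output neuron to a species name, which is why the notebook keeps `dummies.columns` in the `species` variable.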

Now that we have a trained neural network, we would like to use it. Exactly like before, we will generate predictions. This time, instead of one value (MPG prediction), three values are generated for each of the 150 iris flowers. These correspond to the probability of belonging to the three types of iris (Iris-setosa, Iris-versicolor, and Iris-virginica).  

In [None]:
pred_iris = model.predict(x_iris)

print(f"Shape: {pred_iris.shape}")
print(species)

np.set_printoptions(formatter={'float': lambda x: format(x, '.3f')})
print(pred_iris[0:150:15])


Usually, the program considers the column with the highest prediction to be the prediction of the neural network.  It is easy to convert the predictions to the expected iris species.  The argmax function finds the index of the maximum prediction for each row.

In [None]:
predict_classes = np.argmax(pred_iris,axis=1)
expected_classes = np.argmax(y_iris,axis=1)
print(f"Predictions:\n {predict_classes}")
print(f"Expected:\n {expected_classes}")
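
The class indices can be mapped back to species names by indexing into the list of column labels. The sketch below uses three made-up probability rows and assumes the label order shown; in the notebook that order comes from the `species` variable.

```python
import numpy as np

# Assumed label order for this example.
species_names = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
probs = np.array([[0.97, 0.02, 0.01],
                  [0.05, 0.80, 0.15],
                  [0.01, 0.39, 0.60]])

# Index of the largest probability in each row.
idx = np.argmax(probs, axis=1)
names = [species_names[i] for i in idx]
print(idx)    # [0 1 2]
print(names)
```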

Accuracy is an easily understood metric.  It is essentially a test score.  For all of the iris predictions, what percent were correct?  The downside is it does not consider how confident the neural network was in each prediction.

In [None]:
from sklearn.metrics import accuracy_score

correct = accuracy_score(expected_classes,predict_classes)
print(f"Accuracy: {correct}")
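
To illustrate that accuracy ignores confidence, the sketch below compares two made-up sets of predictions that pick the same (correct) classes. Both score 100% accuracy, but the cross-entropy (log loss), the quantity minimized during training, is much lower for the confident set. The helper names `accuracy` and `log_loss` are written for this example and are not the scikit-learn functions.

```python
import numpy as np

def accuracy(y_onehot, probs):
    """Fraction of rows where the top prediction matches the true class."""
    return np.mean(np.argmax(y_onehot, axis=1) == np.argmax(probs, axis=1))

def log_loss(y_onehot, probs, eps=1e-12):
    """Mean categorical cross-entropy; lower is better."""
    clipped = np.clip(probs, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_onehot * np.log(clipped), axis=1))

y = np.array([[1, 0, 0], [0, 1, 0]])
confident = np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]])
hesitant = np.array([[0.4, 0.3, 0.3], [0.3, 0.4, 0.3]])

for name, p in (("confident", confident), ("hesitant", hesitant)):
    print(name, accuracy(y, p), log_loss(y, p))
```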