# Introduction to TensorFlow in Python

### Course Contents:
1. Introduction to TensorFlow
2. Linear models
3. Neural Networks
4. High Level APIs

# 1. Introduction to TensorFlow

### Chapter Contents:

1. Constants and variables
2. Basic operations
3. Advanced operations

## Constants and variables

### What is TensorFlow?

- Open-source library for graph-based numerical computation
    - Developed by the Google Brain Team
- Low- and high-level APIs
    - Addition, multiplication, differentiation
    - Machine learning models
- Important changes in TensorFlow 2.0
    - Eager execution by default
    - Model building with Keras and Estimators

### What is a tensor?

- Generalization of vectors and matrices
- Collection of numbers
- Specific shape

### Defining tensors in TensorFlow

In [1]:
#import warnings
#warnings.filterwarnings("ignore")

import tensorflow as tf

# 0D Tensor (a scalar, shape ())
d0 = tf.ones(())

# 1D Tensor
d1 = tf.ones((2,))

# 2D Tensor
d2 = tf.ones((2, 2))

# 3D Tensor
d3 = tf.ones((2, 2, 2))

print(f"d0:\n{d0.numpy()}\n")
print(f"d1:\n{d1.numpy()}\n")
print(f"d2:\n{d2.numpy()}\n")
print(f"d3:\n{d3.numpy()}")

d0:
1.0

d1:
[1. 1.]

d2:
[[1. 1.]
 [1. 1.]]

d3:
[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]


### Defining constants in TensorFlow

- A constant is the simplest category of tensor
    - Not trainable
    - Can have any dimension

In [2]:
from tensorflow import constant

# Define a 2x3 constant.
a = constant(3, shape=[2, 3])

# Define a 2x2 constant.
b = constant([1, 2, 3, 4], shape=[2, 2])

print(f"a:\n{a.numpy()}\n")
print(f"b:\n{b.numpy()}")

a:
[[3 3 3]
 [3 3 3]]

b:
[[1 2]
 [3 4]]


### Using convenience functions to define constants

|Operation|Example|Description|
|---------|-------|-----------|
|`tf.constant()`|`constant([1, 2, 3])`|Creates a constant tensor from a tensor-like object.|
|`tf.zeros()`|`zeros([2, 2])`|Creates a tensor with all elements set to zero.|
|`tf.zeros_like()`|`zeros_like(input_tensor)`|Creates a tensor of all zeros that has the same shape as the input.|
|`tf.ones()`|`ones([2, 2])`|Creates a tensor with all elements set to one (1).|
|`tf.ones_like()`|`ones_like(input_tensor)`|Creates a tensor of all ones that has the same shape as the input.|
|`tf.fill()`|`fill([3, 3], 7)`|Creates a tensor filled with a scalar value.|
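A short sketch exercising the functions from the table. The reference tensor `input_tensor` and its values are arbitrary choices for illustration; note that the `*_like` functions copy both the shape and the dtype of their input.

```python
import tensorflow as tf

# A reference tensor whose shape (and dtype) the *_like functions copy
input_tensor = tf.constant([[1, 2, 3], [4, 5, 6]])

z = tf.zeros([2, 2])                  # 2x2 of 0.0 (float32 by default)
z_like = tf.zeros_like(input_tensor)  # shape (2, 3), int32 zeros
o = tf.ones([2, 2])                   # 2x2 of 1.0
o_like = tf.ones_like(input_tensor)   # shape (2, 3), int32 ones
f = tf.fill([3, 3], 7)                # 3x3 tensor filled with the scalar 7

print(z_like.numpy())
print(f.numpy())
```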

### Defining and initializing variables

In [3]:
import tensorflow as tf

# Define a variable
a0 = tf.Variable([1, 2, 3, 4, 5, 6], dtype=tf.float32)
a1 = tf.Variable([1, 2, 3, 4, 5, 6], dtype=tf.int16)

# Define a constant
b = tf.constant(2, tf.float32)

# Compute their product
c0 = tf.multiply(a0, b)
c1 = a0 * b

print(f"a0: {a0.numpy()}")
print(f"a1: {a1.numpy()}")
print(f"b:  {b.numpy()}")
print(f"c0: {c0.numpy()}")
print(f"c1: {c1.numpy()}")

a0: [1. 2. 3. 4. 5. 6.]
a1: [1 2 3 4 5 6]
b:  2.0
c0: [ 2.  4.  6.  8. 10. 12.]
c1: [ 2.  4.  6.  8. 10. 12.]
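A minimal sketch of what separates a variable from a constant. A `tf.Variable` can be updated in place with `assign()` and `assign_add()`, and this is what allows an optimizer to change trainable parameters. A constant has to be replaced by a new tensor instead. The values used here are arbitrary.

```python
import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])

v.assign([4.0, 5.0, 6.0])      # overwrite the stored values in place
v.assign_add([1.0, 1.0, 1.0])  # in-place element-wise addition
print(v.numpy())               # [5. 6. 7.]

# Variables are trainable by default
print(v.trainable)             # True
```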


#### Exercise: Defining data as constants

In [4]:
import numpy as np
import pandas as pd
credit_numpy = np.array(pd.read_csv("data/uci_credit_card.csv")[["EDUCATION", "MARRIAGE", "AGE", "BILL_AMT1"]])

In [5]:
# Import constant from TensorFlow
from tensorflow import constant

# Convert the credit_numpy array into a tensorflow constant
credit_constant = constant(credit_numpy)

# Print constant datatype
print('\n The datatype is:', credit_constant.dtype)

# Print constant shape
print('\n The shape is:', credit_constant.shape)


 The datatype is: <dtype: 'float64'>

 The shape is: (30000, 4)


#### Exercise: Defining variables

In [6]:
from tensorflow import Variable

# Define the 1-dimensional variable A1
A1 = Variable([1, 2, 3, 4])

# Print the variable A1
print('\n A1: ', A1)

# Convert A1 to a numpy array and assign it to B1
B1 = A1.numpy()

# Print B1
print('\n B1: ', B1)


 A1:  <tf.Variable 'Variable:0' shape=(4,) dtype=int32, numpy=array([1, 2, 3, 4])>

 B1:  [1 2 3 4]


## Basic operations

### What is a TensorFlow operation?

![tensorflow_operation.png](attachment:tensorflow_operation.png)

- TensorFlow has a model of computation that revolves around the use of graphs.
- A TensorFlow graph contains edges and nodes, where edges are tensors and nodes are operations.

### Applying the addition operator

In [7]:
# Import constant and add from tensorflow
from tensorflow import constant, add

# Define single-element tensors (shape (1,); constant(1) would give a true 0-D scalar)
A0 = constant([1])
B0 = constant([2])

# Define 1-dimensional tensors
A1 = constant([1, 2])
B1 = constant([3, 4])

# Define 2-dimensional tensors
A2 = constant([[1, 2], [3, 4]])
B2 = constant([[5, 6], [7, 8]])

print(f"A0:\n{A0.numpy()}\n")
print(f"B0:\n{B0.numpy()}\n")
print(f"A1:\n{A1.numpy()}\n")
print(f"B1:\n{B1.numpy()}\n")
print(f"A2:\n{A2.numpy()}\n")
print(f"B2:\n{B2.numpy()}")

A0:
[1]

B0:
[2]

A1:
[1 2]

B1:
[3 4]

A2:
[[1 2]
 [3 4]]

B2:
[[5 6]
 [7 8]]


In [8]:
# Perform tensor addition with add()
C0 = add(A0, B0)
C1 = add(A1, B1)
C2 = add(A2, B2)

print(f"C0:\n{C0.numpy()}\n")
print(f"C1:\n{C1.numpy()}\n")
print(f"C2:\n{C2.numpy()}")

C0:
[3]

C1:
[4 6]

C2:
[[ 6  8]
 [10 12]]


### Performing tensor addition

- The `add()` operation performs **element-wise addition** with two tensors
- Element-wise addition requires both tensors to have the same shape (or shapes that are compatible under broadcasting, such as a matrix plus a scalar):
    - Scalar addition: $1 + 2 = 3$
    - Vector addition: $\begin{bmatrix}1 & 2\end{bmatrix} + \begin{bmatrix}3 & 4\end{bmatrix} = \begin{bmatrix}4 & 6\end{bmatrix}$
    - Matrix addition: $\begin{bmatrix}1 & 2 \\ 3 & 4\end{bmatrix} + \begin{bmatrix}5 & 6 \\ 7 & 8\end{bmatrix} = \begin{bmatrix}6 & 8 \\ 10 & 12\end{bmatrix}$
- The `+` operator is overloaded to call `add()`, so `A1 + B1` is equivalent to `add(A1, B1)`
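A small sketch of the operator overload and of broadcasting. It reuses the tensor values from the cells above. The scalar `constant(10)` is an arbitrary example of a broadcast operand.

```python
from tensorflow import constant, add

A1 = constant([1, 2])
B1 = constant([3, 4])

# `+` dispatches to add(), so both expressions produce the same tensor
C_op = A1 + B1
C_fn = add(A1, B1)
print(C_op.numpy(), C_fn.numpy())  # [4 6] [4 6]

# Broadcasting: a 0-D (scalar) tensor is added to every element of a matrix
A2 = constant([[1, 2], [3, 4]])
C_bc = A2 + constant(10)
print(C_bc.numpy())
```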

### How to perform multiplication in TensorFlow

- **Element-wise multiplication** performed using `multiply()` operation
    - The tensors multiplied must have the same shape (or broadcast-compatible shapes)
    - E.g. $[1, 2, 3]$ and $[3, 4, 5]$ or $[1, 2]$ and $[3, 4]$
- **Matrix multiplication** performed with `matmul()` operator
    - The `matmul(A, B)` operation multiplies A by B
    - Number of columns of A must equal the number of rows of B

### Applying the multiplication operators

In [9]:
# Import operators from tensorflow
from tensorflow import ones, matmul, multiply

# Define tensors
A0 = ones(1)
A31 = ones([3, 1])
A34 = ones([3, 4])
A43 = ones([4, 3])

- What types of operations are valid?
    - `multiply(A0, A0)`, `multiply(A31, A31)`, and `multiply(A34, A34)`
    - `matmul(A43, A34)`, but not `matmul(A43, A43)`
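The validity rules above can be checked directly. This sketch assumes that eager TensorFlow reports incompatible `matmul` shapes as an exception. Depending on the version this is typically `tf.errors.InvalidArgumentError`, so `ValueError` is caught as well to be safe.

```python
import tensorflow as tf
from tensorflow import ones, matmul, multiply

A34 = ones([3, 4])
A43 = ones([4, 3])

# Element-wise products keep the input shape
elementwise = multiply(A34, A34)
print(elementwise.shape)  # (3, 4)

# (4x3) @ (3x4): the inner dimensions match (3 == 3), so the result is 4x4
product = matmul(A43, A34)
print(product.shape)      # (4, 4)

# (4x3) @ (4x3): the inner dimensions (3 and 4) differ, so TensorFlow raises an error
try:
    matmul(A43, A43)
    invalid_raised = False
except (tf.errors.InvalidArgumentError, ValueError):
    invalid_raised = True
print(invalid_raised)     # True
```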

### Summing over tensor dimensions

- The `reduce_sum()` operator sums over the dimensions of a tensor
    - `reduce_sum(A)` sums over all dimensions of A
    - `reduce_sum(A, i)` sums over dimension i

In [10]:
# Import operations from tensorflow
from tensorflow import ones, reduce_sum

# Define 2x3x4 tensor of ones
A = ones([2, 3, 4])

# Sum over all dimensions
B = reduce_sum(A)

# Sum over dimensions 0, 1, and 2
B0 = reduce_sum(A, 0)
B1 = reduce_sum(A, 1)
B2 = reduce_sum(A, 2)

print(f"A:\n{A.numpy()}\n")
print(f"B:\n{B.numpy()}\n")
print(f"B0:\n{B0.numpy()}\n")
print(f"B1:\n{B1.numpy()}\n")
print(f"B2:\n{B2.numpy()}")

A:
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]

B:
24.0

B0:
[[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]

B1:
[[3. 3. 3. 3.]
 [3. 3. 3. 3.]]

B2:
[[4. 4. 4.]
 [4. 4. 4.]]
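With a tensor of ones, every partial sum looks alike. A small tensor with distinct values, chosen arbitrarily, makes it easier to see which dimension each call collapses:

```python
from tensorflow import constant, reduce_sum

A = constant([[1, 2, 3],
              [4, 5, 6]])

total = reduce_sum(A)       # 21: sum of every element
by_col = reduce_sum(A, 0)   # [5 7 9]: dimension 0 (the rows) is summed away
by_row = reduce_sum(A, 1)   # [ 6 15]: dimension 1 (the columns) is summed away

print(total.numpy(), by_col.numpy(), by_row.numpy())
```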


#### Exercise: Performing element-wise multiplication

In [11]:
from tensorflow import constant, ones_like, multiply

# Define tensors A1 and A23 as constants
A1 = constant([1, 2, 3, 4])
A23 = constant([[1, 2, 3], [1, 6, 4]])

# Define B1 and B23 to have the correct shape
B1 = ones_like(A1)
B23 = ones_like(A23)

# Perform element-wise multiplication
C1 = multiply(A1, B1)
C23 = multiply(A23, B23)

# Print the tensors C1 and C23
print('C1: {}'.format(C1.numpy()))
print('\n C23: {}'.format(C23.numpy()))

C1: [1 2 3 4]

 C23: [[1 2 3]
 [1 6 4]]


#### Exercise: Making predictions with matrix multiplication

In [12]:
from tensorflow import matmul, constant

# Define features, params, and bill as constants
features = constant([[2, 24], [2, 26], [2, 57], [1, 37]])
params = constant([[1000], [150]])
bill = constant([[3913], [2682], [8617], [64400]])

# Compute billpred using features and params
billpred = matmul(features, params)

# Compute and print the error
error = bill - billpred
print(error.numpy())

[[-1687]
 [-3218]
 [-1933]
 [57850]]


## Advanced operations

### Overview of advanced operations

- We have covered basic operations in TensorFlow
    - `add()`, `multiply()`, `matmul()`, and `reduce_sum()`
- In this lesson, we explore advanced operations
    - `gradient()`, `reshape()`, and `random()`
    
|Operation|Use|
|---------|---|
|`gradient()`|Computes the slope of a function at a point (called as `tape.gradient()` on a `tf.GradientTape`)|
|`reshape()`|Reshapes a tensor (e.g. 10x10 to 100x1)|
|`random()`|Populates a tensor with entries drawn from a probability distribution (e.g. `tf.random.uniform()`)|

### Finding the optimum

- In many problems, we will want to find the optimum of a function.
    - **Minimum**: Lowest value of a loss function.
    - **Maximum**: Highest value of an objective function.
- We can do this using the `gradient()` operation.
    - **Optimum**: Find a point where the gradient = 0.
    - **Minimum**: Change in gradient (second derivative) > 0
    - **Maximum**: Change in gradient (second derivative) < 0

### Gradients in TensorFlow

In [13]:
# Import tensorflow under the alias tf
import tensorflow as tf

# Define x
x = tf.Variable(-1.0)

# Define y within instance of GradientTape
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.multiply(x, x)
    
# Evaluate the gradient of y at x = -1
g = tape.gradient(y, x)
print(g.numpy())

-2.0
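To connect this to the optimum criteria listed earlier, you can nest two `GradientTape` instances. This gives both the gradient and the change in gradient (the second derivative). The helper `first_and_second_derivative` is a hypothetical name used only for this sketch.

```python
import tensorflow as tf

def first_and_second_derivative(x0):
    x = tf.Variable(x0)
    # The outer tape records how the first derivative itself depends on x
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            y = x * x
        dy_dx = inner.gradient(y, x)      # 2x
    d2y_dx2 = outer.gradient(dy_dx, x)    # 2
    return float(dy_dx), float(d2y_dx2)

# At x = 0 the gradient is 0 and the change in gradient is 2 > 0, so x = 0 is a minimum
print(first_and_second_derivative(0.0))
```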


### Images as tensors

- An operation that is particularly useful in image classification problems: reshaping
    - A grayscale image has a natural representation as a matrix of integer values between 0 and 255
    - While some algorithms exploit this shape, others require you to reshape matrices into vectors before using them as inputs.

### How to reshape a grayscale image

In [14]:
# Import tensorflow as alias tf
import tensorflow as tf

# Generate grayscale image (integer maxval is exclusive, so 256 allows values 0-255)
gray = tf.random.uniform([2, 2], maxval=256, dtype='int32')
print(f"Original:\n{gray.numpy()}\n")

# Reshape grayscale image
gray = tf.reshape(gray, [2*2, 1])
print(f"Reshaped:\n{gray.numpy()}")

Original:
[[189 102]
 [215 224]]

Reshaped:
[[189]
 [102]
 [215]
 [224]]


### How to reshape a color image

In [15]:
# Import tensorflow as alias tf
import tensorflow as tf

# Generate color image (integer maxval is exclusive, so 256 allows values 0-255)
color = tf.random.uniform([2, 2, 3], maxval=256, dtype='int32')
print(f"Original:\n{color.numpy()}\n")

# Reshape color image
color = tf.reshape(color, [2*2, 3])
print(f"Reshaped:\n{color.numpy()}")

Original:
[[[ 36 112 244]
  [ 25 242 228]]

 [[164  38  77]
  [207  93 126]]]

Reshaped:
[[ 36 112 244]
 [ 25 242 228]
 [164  38  77]
 [207  93 126]]
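A sketch of two conveniences of `tf.reshape`. First, `-1` lets TensorFlow infer one dimension from the total number of elements. Second, reshaping does not change the data, so it can be undone exactly. The image here is random, as in the cells above.

```python
import tensorflow as tf

# Random 2x2 color image with 3 channels, integer values in [0, 256)
color = tf.random.uniform([2, 2, 3], maxval=256, dtype='int32')

flat = tf.reshape(color, [-1])       # shape (12,): -1 is inferred as 2*2*3
pixels = tf.reshape(color, [-1, 3])  # shape (4, 3): one row per pixel

# Reshaping back recovers the original tensor element for element
restored = tf.reshape(flat, [2, 2, 3])
print(flat.shape, pixels.shape)
print(bool(tf.reduce_all(tf.equal(restored, color))))  # True
```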


#### Exercise: Optimizing with gradients

You are given a loss function, $y = x^2$, which you want to minimize. You can do this by computing the slope using the `GradientTape()` operation at different values of `x`. If the slope is positive, you can decrease the loss by lowering `x`. If it is negative, you can decrease it by increasing `x`. This is how gradient descent works.

In [16]:
from tensorflow import GradientTape, multiply, Variable

def compute_gradient(x0):
    # Define x as a variable with an initial value of x0
    x = Variable(x0)
    with GradientTape() as tape:
        tape.watch(x)
        # Define y using the multiply operation
        y = multiply(x, x)
    # Return the gradient of y with respect to x
    return tape.gradient(y, x).numpy()

# Compute and print gradients at x = -1, 1, and 0
print(compute_gradient(-1.0))
print(compute_gradient(1.0))
print(compute_gradient(0.0))

-2.0
2.0
0.0
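The rule from the exercise (move `x` against the sign of the slope) can be turned into a tiny gradient descent loop. The starting point `3.0` and the step size `learning_rate = 0.1` are assumptions for this sketch. With these values, each update multiplies `x` by 0.8, so `x` shrinks toward the minimum at 0.

```python
from tensorflow import GradientTape, Variable

x = Variable(3.0)      # arbitrary starting point
learning_rate = 0.1    # assumed step size

for step in range(50):
    with GradientTape() as tape:
        y = x * x                   # loss y = x^2
    slope = tape.gradient(y, x)     # dy/dx = 2x
    # Positive slope -> decrease x; negative slope -> increase x
    x.assign_sub(learning_rate * slope)

print(x.numpy())  # very close to 0, the minimum of y = x^2
```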


# 2. Linear models

### Chapter Contents:

1. Input data
2. Loss functions
3. Linear regression
4. Batch training

## Input data

- Numeric data
- Image data
- Text data

### Importing data for use in TensorFlow

- Data can be imported using `tensorflow`
    - Useful for managing complex pipelines
    - Not necessary for this chapter
- Simpler option used in this chapter
    - Import data using `pandas`
    - Convert data to `numpy` array
    - Use in `tensorflow` without modification

### How to import and convert data

- We will focus on data stored in csv format in this chapter
- Pandas also has methods for handling data in other formats
    - E.g. `read_json()`, `read_html()`, `read_excel()`

In [17]:
# Import numpy and pandas
import numpy as np
import pandas as pd

# Load data from csv
housing = pd.read_csv("data/kc_house_data.csv")

# Convert to numpy array
housing = np.array(housing)

### Parameters of read_csv()

|Parameter|Description|Default|
|---------|-----------|-------|
|`filepath_or_buffer`|Accepts a file path, a URL, or a file-like object.|(required)|
|`sep`|Delimiter between columns.|`,`|
|`delim_whitespace`|Whether to treat whitespace as the delimiter (deprecated in pandas 2.2 in favor of `sep='\s+'`).|`False`|
|`encoding`|Specifies encoding to be used if any.|`None`|
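A minimal sketch of overriding `sep`. It reads a small semicolon-delimited file held in memory with `io.StringIO`, which avoids depending on a file on disk. The column names mirror the KC dataset, but the rows are hypothetical.

```python
import io
import pandas as pd

# Hypothetical semicolon-delimited data held in memory
raw = io.StringIO("price;waterfront\n221900;0\n538000;1\n")

# sep overrides the default comma delimiter
df = pd.read_csv(raw, sep=";")
print(df.shape)  # (2, 2)
print(df.dtypes)
```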

### Setting the data type

In [18]:
## First approach: Using the array method from numpy

# Load KC dataset
housing = pd.read_csv("data/kc_house_data.csv")

# Convert the price column to float32
price = np.array(housing["price"], np.float32)

# Convert waterfront column to Boolean
waterfront = np.array(housing["waterfront"], bool)

print(price)
print(type(price))
print(waterfront)
print(type(waterfront))

[221900. 538000. 180000. ... 402101. 400000. 325000.]
<class 'numpy.ndarray'>
[False False False ... False False False]
<class 'numpy.ndarray'>


In [19]:
## Second approach: Using the cast operation from TensorFlow

# Load KC dataset
housing = pd.read_csv("data/kc_house_data.csv")

# Convert the price column to float32
price = tf.cast(housing["price"], tf.float32)

# Convert waterfront column to Boolean
waterfront = tf.cast(housing["waterfront"], tf.bool)

print(price)
print(type(price))
print(waterfront)
print(type(waterfront))

tf.Tensor([221900. 538000. 180000. ... 402101. 400000. 325000.], shape=(21613,), dtype=float32)
<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor([False False False ... False False False], shape=(21613,), dtype=bool)
<class 'tensorflow.python.framework.ops.EagerTensor'>


## Loss functions

### Introduction to loss functions

- Fundamental `tensorflow` operation
    - Used to train a model
    - Measure of model fit
- Higher value -> worse fit
    - Minimize the loss function

### Common loss functions in TensorFlow

- TensorFlow has operations for common loss functions
    - Mean squared error (MSE)
    - Mean absolute error (MAE)
    - Huber error
- Loss functions are accessible from the `tf.keras.losses` module
    - `tf.keras.losses.mse()`
    - `tf.keras.losses.mae()`
    - `tf.keras.losses.Huber()`

### Why do we care about loss functions?

![loss_functions.png](attachment:loss_functions.png)

- **MSE**
    - Strongly penalizes outliers
    - High (gradient) sensitivity near minimum
- **MAE**
    - Scales linearly with size of error
    - Low sensitivity near minimum
- **Huber**
    - Similar to MSE near minimum
    - Similar to MAE away from minimum


- For greater sensitivity near the minimum $\rightarrow$ MSE or Huber loss
- To minimize the impact of outliers $\rightarrow$ MAE or Huber loss
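A small comparison that shows these trade-offs on hypothetical data where one prediction is off by 10. The sketch uses the class-based losses `MeanSquaredError`, `MeanAbsoluteError` and `Huber` (with `delta=1.0`). Squaring the outlier's error (100) dominates the MSE. MAE grows only linearly. Huber is quadratic inside `delta` and linear outside it, so for this error it contributes 0.5 + (10 - 1) = 9.5.

```python
import tensorflow as tf

targets = tf.constant([1.0, 2.0, 3.0, 4.0])
# The last prediction is off by 10 and acts as an outlier (hypothetical values)
predictions = tf.constant([1.0, 2.0, 3.0, 14.0])

mse = tf.keras.losses.MeanSquaredError()(targets, predictions)
mae = tf.keras.losses.MeanAbsoluteError()(targets, predictions)
huber = tf.keras.losses.Huber(delta=1.0)(targets, predictions)

print(mse.numpy(), mae.numpy(), huber.numpy())  # 25.0 2.5 2.375
```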

### Defining a loss function

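```python
# Import TensorFlow under standard alias
import tensorflow as tf

# Hypothetical example data and parameters so the snippet runs on its own
features = tf.constant([1.0, 2.0, 3.0, 4.0])
targets = tf.constant([2.0, 4.0, 6.0, 8.0])
test_features = tf.constant([5.0, 6.0])
test_targets = tf.constant([10.0, 12.0])
intercept = tf.Variable(0.1)
slope = tf.Variable(0.1)

# Compute the MSE loss between targets and some predictions
predictions = intercept + features * slope
loss = tf.keras.losses.mse(targets, predictions)

# Define a linear regression model
def linear_regression(intercept, slope, features=features):
    return intercept + features * slope

# Define a loss function to compute the MSE
def loss_function(intercept, slope, targets=targets, features=features):
    # Compute the predictions for a linear model, passing the features through
    predictions = linear_regression(intercept, slope, features)

    # Return the loss
    return tf.keras.losses.mse(targets, predictions)

# Compute the loss for test data inputs
loss_function(intercept, slope, test_targets, test_features)

# Compute the loss for default data inputs
loss_function(intercept, slope)
```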

#### Exercise: Modifying the loss function

In [20]:
from tensorflow import Variable, float32, keras

features = Variable([1, 2, 3, 4, 5], dtype=float32)
targets = Variable([2, 4, 6, 8, 10], dtype=float32)

In [21]:
# Initialize a variable named scalar
scalar = Variable(1.0, dtype=float32)

# Define the model
def model(scalar, features = features):
    return scalar * features

# Define a loss function
def loss_function(scalar, features = features, targets = targets):
    # Compute the predicted values
    predictions = model(scalar, features)
    
    # Return the mean absolute error loss
    return keras.losses.mae(targets, predictions)

# Evaluate the loss function and print the loss
print(loss_function(scalar).numpy())

3.0


## Linear regression

### The linear regression model

- A linear regression model assumes a linear relationship:
    - $\text{price} = \text{intercept} + \text{size} \times \text{slope} + \text{error}$
- This is an example of a univariate regression.
    - There is only one feature, `size`.
- Multiple regression models have more than one feature.
    - E.g. `size` and `location`

### Linear regression in TensorFlow

In [22]:
# Define the targets and features
price = np.array(housing["price"], np.float32)
size = np.array(housing["sqft_living"], np.float32)

# Define the intercept and slope (pass dtype by keyword: the second positional argument of tf.Variable is `trainable`)
intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)

# Define a linear regression model
def linear_regression(intercept, slope, features = size):
    return intercept + features*slope

# Compute the predicted values and loss
def loss_function(intercept, slope, targets = price, features = size):
    predictions = linear_regression(intercept, slope)
    return tf.keras.losses.mse(targets, predictions)

# Define an optimization operation
opt = tf.keras.optimizers.Adam()

# Minimize the loss function and print the loss
for j in range(1000):
    opt.minimize(lambda: loss_function(intercept, slope), var_list=[intercept, slope])
    print(loss_function(intercept, slope))

tf.Tensor(426196570000.0, shape=(), dtype=float32)
tf.Tensor(426193880000.0, shape=(), dtype=float32)
tf.Tensor(426191100000.0, shape=(), dtype=float32)
tf.Tensor(426188370000.0, shape=(), dtype=float32)
tf.Tensor(426185700000.0, shape=(), dtype=float32)
tf.Tensor(426182930000.0, shape=(), dtype=float32)
tf.Tensor(426180200000.0, shape=(), dtype=float32)
tf.Tensor(426177500000.0, shape=(), dtype=float32)
tf.Tensor(426174800000.0, shape=(), dtype=float32)
tf.Tensor(426172060000.0, shape=(), dtype=float32)
tf.Tensor(426169340000.0, shape=(), dtype=float32)
tf.Tensor(426166650000.0, shape=(), dtype=float32)
tf.Tensor(426163930000.0, shape=(), dtype=float32)
tf.Tensor(426161180000.0, shape=(), dtype=float32)
tf.Tensor(426158520000.0, shape=(), dtype=float32)
tf.Tensor(426155740000.0, shape=(), dtype=float32)
tf.Tensor(426153050000.0, shape=(), dtype=float32)
tf.Tensor(426150300000.0, shape=(), dtype=float32)
tf.Tensor(426147580000.0, shape=(), dtype=float32)
tf.Tensor(426144900000.0, shape=(), dtype=float32)
...
tf.Tensor(425756100000.0, shape=(), dtype=float32)
tf.Tensor(425753380000.0, shape=(), dtype=float32)
tf.Tensor(425750630000.0, shape=(), dtype=float32)
tf.Tensor(425747940000.0, shape=(), dtype=float32)
tf.Tensor(425745250000.0, shape=(), dtype=float32)
tf.Tensor(425742470000.0, shape=(), dtype=float32)
tf.Tensor(425739800000.0, shape=(), dtype=float32)
tf.Tensor(425737130000.0, shape=(), dtype=float32)
tf.Tensor(425734340000.0, shape=(), dtype=float32)
tf.Tensor(425731650000.0, shape=(), dtype=float32)
tf.Tensor(425728930000.0, shape=(), dtype=float32)
tf.Tensor(425726180000.0, shape=(), dtype=float32)
tf.Tensor(425723500000.0, shape=(), dtype=float32)
tf.Tensor(425720800000.0, shape=(), dtype=float32)
tf.Tensor(425718050000.0, shape=(), dtype=float32)
tf.Tensor(425715300000.0, shape=(), dtype=float32)
tf.Tensor(425712580000.0, shape=(), dtype=float32)
tf.Tensor(425709900000.0, shape=(), dtype=float32)
tf.Tensor(425707180000.0, shape=(), dtype=float32)
tf.Tensor(425704420000.0, shape=(), dtype=float32)
...
tf.Tensor(425318700000.0, shape=(), dtype=float32)
tf.Tensor(425316020000.0, shape=(), dtype=float32)
tf.Tensor(425313300000.0, shape=(), dtype=float32)
tf.Tensor(425310550000.0, shape=(), dtype=float32)
tf.Tensor(425307870000.0, shape=(), dtype=float32)
tf.Tensor(425305150000.0, shape=(), dtype=float32)
tf.Tensor(425302430000.0, shape=(), dtype=float32)
tf.Tensor(425299740000.0, shape=(), dtype=float32)
tf.Tensor(425297050000.0, shape=(), dtype=float32)
tf.Tensor(425294330000.0, shape=(), dtype=float32)
tf.Tensor(425291550000.0, shape=(), dtype=float32)
tf.Tensor(425288860000.0, shape=(), dtype=float32)
tf.Tensor(425286170000.0, shape=(), dtype=float32)
tf.Tensor(425283450000.0, shape=(), dtype=float32)
tf.Tensor(425280770000.0, shape=(), dtype=float32)
tf.Tensor(425278000000.0, shape=(), dtype=float32)
tf.Tensor(425275300000.0, shape=(), dtype=float32)
tf.Tensor(425272600000.0, shape=(), dtype=float32)
tf.Tensor(425269900000.0, shape=(), dtype=float32)
tf.Tensor(425267130000.0, shape=(), dtype=float32)
...
tf.Tensor(424876340000.0, shape=(), dtype=float32)
tf.Tensor(424873560000.0, shape=(), dtype=float32)
tf.Tensor(424870900000.0, shape=(), dtype=float32)
tf.Tensor(424868220000.0, shape=(), dtype=float32)
tf.Tensor(424865530000.0, shape=(), dtype=float32)
tf.Tensor(424862740000.0, shape=(), dtype=float32)
tf.Tensor(424860060000.0, shape=(), dtype=float32)
tf.Tensor(424857370000.0, shape=(), dtype=float32)
tf.Tensor(424854700000.0, shape=(), dtype=float32)
tf.Tensor(424851900000.0, shape=(), dtype=float32)
tf.Tensor(424849240000.0, shape=(), dtype=float32)
tf.Tensor(424846500000.0, shape=(), dtype=float32)
tf.Tensor(424843770000.0, shape=(), dtype=float32)
tf.Tensor(424841100000.0, shape=(), dtype=float32)
tf.Tensor(424838370000.0, shape=(), dtype=float32)
tf.Tensor(424835680000.0, shape=(), dtype=float32)
tf.Tensor(424833000000.0, shape=(), dtype=float32)
tf.Tensor(424830240000.0, shape=(), dtype=float32)
tf.Tensor(424827520000.0, shape=(), dtype=float32)
tf.Tensor(424824800000.0, shape=(), dtype=float32)
...
tf.Tensor(424428930000.0, shape=(), dtype=float32)
tf.Tensor(424426240000.0, shape=(), dtype=float32)
tf.Tensor(424423520000.0, shape=(), dtype=float32)
tf.Tensor(424420800000.0, shape=(), dtype=float32)
tf.Tensor(424418100000.0, shape=(), dtype=float32)
tf.Tensor(424415360000.0, shape=(), dtype=float32)
tf.Tensor(424412680000.0, shape=(), dtype=float32)
tf.Tensor(424409960000.0, shape=(), dtype=float32)
tf.Tensor(424407270000.0, shape=(), dtype=float32)
tf.Tensor(424404550000.0, shape=(), dtype=float32)
tf.Tensor(424401800000.0, shape=(), dtype=float32)
tf.Tensor(424399100000.0, shape=(), dtype=float32)
tf.Tensor(424396400000.0, shape=(), dtype=float32)
tf.Tensor(424393670000.0, shape=(), dtype=float32)
tf.Tensor(424391000000.0, shape=(), dtype=float32)
tf.Tensor(424388300000.0, shape=(), dtype=float32)
tf.Tensor(424385500000.0, shape=(), dtype=float32)
tf.Tensor(424382820000.0, shape=(), dtype=float32)
tf.Tensor(424380140000.0, shape=(), dtype=float32)
tf.Tensor(424377450000.0, shape=(), dtype=float32)
...
tf.Tensor(423938600000.0, shape=(), dtype=float32)
tf.Tensor(423935900000.0, shape=(), dtype=float32)
tf.Tensor(423933180000.0, shape=(), dtype=float32)
tf.Tensor(423930430000.0, shape=(), dtype=float32)
tf.Tensor(423927780000.0, shape=(), dtype=float32)
tf.Tensor(423925020000.0, shape=(), dtype=float32)
tf.Tensor(423922300000.0, shape=(), dtype=float32)
tf.Tensor(423919600000.0, shape=(), dtype=float32)
tf.Tensor(423916860000.0, shape=(), dtype=float32)
tf.Tensor(423914200000.0, shape=(), dtype=float32)
tf.Tensor(423911460000.0, shape=(), dtype=float32)
tf.Tensor(423908770000.0, shape=(), dtype=float32)
tf.Tensor(423906080000.0, shape=(), dtype=float32)
tf.Tensor(423903360000.0, shape=(), dtype=float32)
tf.Tensor(423900640000.0, shape=(), dtype=float32)
tf.Tensor(423897960000.0, shape=(), dtype=float32)
tf.Tensor(423895270000.0, shape=(), dtype=float32)
tf.Tensor(423892550000.0, shape=(), dtype=float32)
tf.Tensor(423889800000.0, shape=(), dtype=float32)
tf.Tensor(423887100000.0, shape=(), dtype=float32)
...

In [23]:
# Print the trained parameters
print(intercept.numpy(), slope.numpy())

1.0991763 1.0991884
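The loss above falls slowly, because the KC prices and sizes are large and unscaled. This sketch checks the same model on synthetic, noise-free data generated from `price = 2 + 3 * size`, with `size` scaled to [0, 1]. It swaps `opt.minimize` for a hand-written gradient descent update with `GradientTape`. The step size `learning_rate = 0.5` is an assumption that is stable only because the feature is scaled.

```python
import numpy as np
import tensorflow as tf

# Synthetic data from price = 2 + 3 * size (hypothetical, noise-free)
size = tf.constant(np.linspace(0.0, 1.0, 50), dtype=tf.float32)
price = 2.0 + 3.0 * size

intercept = tf.Variable(0.0, dtype=tf.float32)
slope = tf.Variable(0.0, dtype=tf.float32)
learning_rate = 0.5  # assumed step size

for step in range(1000):
    with tf.GradientTape() as tape:
        predictions = intercept + size * slope
        loss = tf.reduce_mean(tf.square(price - predictions))  # MSE
    d_intercept, d_slope = tape.gradient(loss, [intercept, slope])
    # Plain gradient descent step on both parameters
    intercept.assign_sub(learning_rate * d_intercept)
    slope.assign_sub(learning_rate * d_slope)

print(intercept.numpy(), slope.numpy())  # approximately 2.0 and 3.0
```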


## Batch training

### What is batch training?

- Let's say the dataset is much larger and you want to perform the training on a GPU, which has only a small amount of memory.
- Since you can't fit the entire dataset in memory, you will instead divide it into batches and train on those batches sequentially.
- A single pass over all of the batches is called an **epoch** and the process itself is called **batch training**.
- It will be quite useful when you work with large image datasets.
- Beyond alleviating memory constraints, batch training will also allow you to update model weights and optimize parameters after each batch, rather than at the end of the epoch.

### The chunksize parameter

- `pd.read_csv()` allows us to load data in batches
    - Avoid loading entire dataset
    - `chunksize` parameter provides batch size

In [24]:
import pandas as pd
import numpy as np

# Load data in batches
for batch in pd.read_csv("data/kc_house_data.csv", chunksize=100):
    # Extract price column
    price = np.array(batch["price"], np.float32)
    
    # Extract size column
    size = np.array(batch["sqft_lot"], np.float32)
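To see what `chunksize` yields, this sketch builds a 250-row CSV in memory as a stand-in for a large file (the rows are hypothetical). It then records the size of each batch. The last batch simply holds whatever rows remain.

```python
import io
import numpy as np
import pandas as pd

# 250 rows of hypothetical data standing in for a large CSV file
csv_text = "price,sqft_lot\n" + "\n".join(f"{i * 1000},{i * 10}" for i in range(250))

batch_sizes = []
for batch in pd.read_csv(io.StringIO(csv_text), chunksize=100):
    price = np.array(batch["price"], np.float32)
    size = np.array(batch["sqft_lot"], np.float32)
    batch_sizes.append(len(price))

print(batch_sizes)  # [100, 100, 50]
```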

### Training a linear model in batches

In [25]:
# Import tensorflow, pandas, and numpy
import tensorflow as tf
import pandas as pd
import numpy as np

# Define trainable variables (dtype passed by keyword; the second positional argument is `trainable`)
intercept = tf.Variable(0.1, dtype=tf.float32)
slope = tf.Variable(0.1, dtype=tf.float32)

# Define the model
def linear_regression(intercept, slope, features):
    return intercept + features*slope

# Compute predicted values and return loss function
def loss_function(intercept, slope, targets, features):
    predictions = linear_regression(intercept, slope, features)
    return tf.keras.losses.mse(targets, predictions)

# Define optimization operation
opt = tf.keras.optimizers.Adam()

# Load the data in batches from pandas
for batch in pd.read_csv("data/kc_house_data.csv", chunksize=100):
    # Extract the target and feature columns
    price_batch = np.array(batch["price"], np.float32)
    size_batch = np.array(batch["sqft_lot"], np.float32)
    
    # Minimize the loss function
    opt.minimize(lambda: loss_function(intercept, slope, price_batch, size_batch), var_list=[intercept,slope])
    
# Print parameter values
print(intercept.numpy(), slope.numpy())

0.31781912 0.2983102


### Full sample versus batch training
|**Full Sample**|**Batch Training**|
|---------------|------------------|
|1. One update per epoch|1. Multiple updates per epoch|
|2. Accepts dataset without modification|2. Requires division of dataset|
|3. Limited by memory|3. Dataset size not limited by memory|

# 3. Neural Networks

### Chapter Contents:

1. Dense layers
2. Activation functions
3. Optimizers
4. Training a network in TensorFlow

## Dense layers

### What is a neural network?

- We get from a linear regression to a neural network by adding a **hidden layer**.
- We also typically pass the hidden layer output to an **activation function**.
- Finally, the output layer sums together the weighted outputs of the hidden layer to compute our prediction.
- This entire process of generating a prediction is referred to as **forward propagation**.

![neural_network.png](attachment:neural_network.png)

- In this chapter, we will construct neural networks with three types of layers:
    - An **input layer**,
    - some number of **hidden layers**,
    - and an **output layer**.   
- The input layer consists of our features.
- The output layer contains our prediction.
- Each hidden layer takes inputs from the previous layer, applies numerical weights to them, sums them together, and then applies an activation function.


- In the neural network graph, we have applied a particular type of hidden layer called a **dense layer**.
    - A **dense layer** applies weights to all nodes from the previous layer.

### A simple dense layer

In [26]:
import tensorflow as tf

# Define inputs (features)
inputs = tf.constant([[1, 35]], dtype=tf.float32)

# Define weights
weights = tf.Variable([[-0.05], [-0.01]])

# Define the bias
bias = tf.Variable([0.5])

# Multiply inputs (features) by the weights
product = tf.matmul(inputs, weights)

# Define dense layer
dense = tf.keras.activations.sigmoid(product+bias)
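As a sanity check, the low-level layer above should match a Keras `Dense` layer with one output node once that layer is given the same weights and bias. This sketch builds the layer for 2 input features and copies the values in with `set_weights` (kernel first, then bias). Here $-0.05 + 35 \times (-0.01) + 0.5 = 0.1$, and $\sigma(0.1) \approx 0.525$.

```python
import numpy as np
import tensorflow as tf

# Same inputs, weights, and bias as the cell above
inputs = tf.constant([[1, 35]], dtype=tf.float32)
w = np.array([[-0.05], [-0.01]], dtype=np.float32)
b = np.array([0.5], dtype=np.float32)

# Low-level: matrix multiplication, add the bias, apply sigmoid
low_level = tf.keras.activations.sigmoid(tf.matmul(inputs, w) + b)

# High-level: a Dense layer with one output node, using the same weights
layer = tf.keras.layers.Dense(1, activation='sigmoid')
layer.build((None, 2))
layer.set_weights([w, b])
high_level = layer(inputs)

print(low_level.numpy(), high_level.numpy())  # both approximately [[0.525]]
```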

### Define a complete model

```python
import tensorflow as tf

# Hypothetical input data: 3 examples with 4 features each
data = [[1.0, 2.0, 3.0, 4.0],
        [0.5, 1.5, 2.5, 3.5],
        [2.0, 1.0, 0.0, 1.0]]

# Define input (features) layer
inputs = tf.constant(data, tf.float32)

# Define first dense layer
dense1 = tf.keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = tf.keras.layers.Dense(5, activation='sigmoid')(dense1)

# Define output (predictions) layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)
```

### High-level versus low-level approach

- **High-level approach**
    - High-level API operations 
```python
dense = keras.layers.Dense(10, activation='sigmoid')
```

- **Low-level approach**
    - Linear-algebraic operations
```python
prod = matmul(inputs, weights)
dense = keras.activations.sigmoid(prod + bias)
```

#### Exercise: The linear algebra of dense layers

In [27]:
from tensorflow import Variable, ones, matmul, keras

borrower_features = np.array([[ 2.,  2., 43.]], dtype=np.float32)

In [28]:
# Initialize bias1
bias1 = Variable(1.0)

# Initialize weights1 as 3x2 variable of ones
weights1 = Variable(ones((3, 2)))

# Perform matrix multiplication of borrower_features and weights1
product1 = matmul(borrower_features, weights1)

# Apply sigmoid activation function to product1 + bias1
dense1 = keras.activations.sigmoid(product1 + bias1)

# Print shape of dense1
print("\n dense1's output shape: {}".format(dense1.shape))

# Initialize bias2 and weights2
bias2 = Variable(1.0)
weights2 = Variable(ones((2, 1)))

# Perform matrix multiplication of dense1 and weights2
product2 = matmul(dense1, weights2)

# Apply activation to product2 + bias2 and print the prediction
prediction = keras.activations.sigmoid(product2 + bias2)
print('\n prediction: {}'.format(prediction.numpy()[0,0]))
print('\n actual: 1')


 dense1's output shape: (1, 2)

 prediction: 0.9525741338729858

 actual: 1


#### Exercise: The low-level approach with multiple examples

In [29]:
from tensorflow import Variable, constant, matmul, keras

weights1 = Variable([[-0.6 ,  0.6 ],
                     [ 0.8 , -0.3 ],
                     [-0.09, -0.08]])

bias1 = Variable([0.1])

borrower_features = constant([[ 3.,  3., 23.],
                              [ 2.,  1., 24.],
                              [ 1.,  1., 49.],
                              [ 1.,  1., 49.],
                              [ 2.,  1., 29.]])

In [30]:
# Compute the product of borrower_features and weights1
products1 = matmul(borrower_features, weights1)

# Apply a sigmoid activation function to products1 + bias1
dense1 = keras.activations.sigmoid(products1 + bias1)

# Print the shapes of borrower_features, weights1, bias1, and dense1
print('\n shape of borrower_features: ', borrower_features.shape)
print('\n shape of weights1: ', weights1.shape)
print('\n shape of bias1: ', bias1.shape)
print('\n shape of dense1: ', dense1.shape)


 shape of borrower_features:  (5, 3)

 shape of weights1:  (3, 2)

 shape of bias1:  (1,)

 shape of dense1:  (5, 2)


#### Exercise: Using the dense layer operation

In [31]:
# Define the first dense layer
dense1 = keras.layers.Dense(7, activation='sigmoid')(borrower_features)

# Define a dense layer with 3 output nodes
dense2 = keras.layers.Dense(3, activation='sigmoid')(dense1)

# Define a dense layer with 1 output node
predictions = keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print the shapes of dense1, dense2, and predictions
print('\n shape of dense1: ', dense1.shape)
print('\n shape of dense2: ', dense2.shape)
print('\n shape of predictions: ', predictions.shape)


 shape of dense1:  (5, 7)

 shape of dense2:  (5, 3)

 shape of predictions:  (5, 1)


## Activation functions

### What is an activation function?

- Components of a typical hidden layer
    - **Linear**: Matrix multiplication
    - **Nonlinear**: Activation function

### A simple example

In [32]:
import numpy as np
import tensorflow as tf

# Define example borrower features
young, old = 0.3, 0.6
low_bill, high_bill = 0.1, 0.5

# Apply matrix multiplication step for all feature combinations
young_high = 1.0*young + 2.0*high_bill
young_low = 1.0*young + 2.0*low_bill
old_high = 1.0*old + 2.0*high_bill
old_low = 1.0*old + 2.0*low_bill

# Difference in default predictions for young
print(young_high - young_low)

# Difference in default predictions for old
print(old_high - old_low)


# Difference in default predictions for young
print(tf.keras.activations.sigmoid(young_high).numpy() - tf.keras.activations.sigmoid(young_low).numpy())

# Difference in default predictions for old
print(tf.keras.activations.sigmoid(old_high).numpy() - tf.keras.activations.sigmoid(old_low).numpy())

0.8
0.8
0.16337562
0.14204395
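The same comparison can be reproduced without TensorFlow by writing the sigmoid out with `math.exp`. The linear step gives both age groups an identical difference of 0.8. The sigmoid then shrinks the difference more for the older borrower, because their inputs sit further out on the flatter part of the curve.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Example borrower features, as in the cell above
young, old = 0.3, 0.6
low_bill, high_bill = 0.1, 0.5

# Linear step: weight 1.0 on age and 2.0 on the bill amount
young_high = 1.0 * young + 2.0 * high_bill
young_low = 1.0 * young + 2.0 * low_bill
old_high = 1.0 * old + 2.0 * high_bill
old_low = 1.0 * old + 2.0 * low_bill

# Nonlinear step: the sigmoid compresses differences more where the curve is flatter
young_diff = sigmoid(young_high) - sigmoid(young_low)
old_diff = sigmoid(old_high) - sigmoid(old_low)
print(round(young_diff, 4), round(old_diff, 4))  # 0.1634 0.142
```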


### The sigmoid activation function

- **Sigmoid activation function**
    - Binary classification
    - Low-level: `tf.keras.activations.sigmoid()`
    - High-level: `sigmoid`
    
![sigmoid_activation_function.png](attachment:sigmoid_activation_function.png)

### The relu activation function

- **ReLU activation function**
    - Hidden layers
    - Low-level: `tf.keras.activations.relu()`
    - High-level: `relu`
    
![relu_activation_function.png](attachment:relu_activation_function.png)

### The softmax activation function

- **Softmax activation function**
    - Output layer (>2 classes)
    - Low-level: `tf.keras.activations.softmax()`
    - High-level: `softmax`
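For reference, the three activations can be written out directly. This plain-Python sketch follows the standard definitions: sigmoid is `1 / (1 + e^(-z))`, ReLU is `max(0, z)`, and softmax is `e^(z_i) / sum_j e^(z_j)`. It is not TensorFlow's implementation, which works on whole tensors. Subtracting the maximum inside softmax is a common numerical-stability trick that leaves the result unchanged.

```python
import math
from typing import List

def sigmoid(z: float) -> float:
    """Squashes any real number into (0, 1); suited to binary outputs."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    """Passes positive values through and zeroes out negative ones."""
    return max(0.0, z)

def softmax(zs: List[float]) -> List[float]:
    """Turns a vector of scores into probabilities that sum to 1."""
    m = max(zs)  # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))               # 0.5
print(relu(-2.0), relu(3.0))      # 0.0 3.0
probs = softmax([1.0, 2.0, 3.0, 0.5])
print(round(sum(probs), 6))       # 1.0
```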

### Activation functions in neural networks

```python
import tensorflow as tf

# Define input layer
inputs = tf.constant(borrower_features, tf.float32)

# Define dense layer 1
dense1 = tf.keras.layers.Dense(16, activation='relu')(inputs)

# Define dense layer 2
dense2 = tf.keras.layers.Dense(8, activation='sigmoid')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(4, activation='softmax')(dense2)
```

#### Exercise: Binary classification problems

```python
# Construct input layer from features
inputs = constant(bill_amounts, dtype=float32)

# Define first dense layer
dense1 = keras.layers.Dense(3, activation='relu')(inputs)

# Define second dense layer
dense2 = keras.layers.Dense(2, activation='relu')(dense1)

# Define output layer
outputs = keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print error for first five examples
error = default[:5] - outputs.numpy()[:5]
print(error)
```

#### Exercise: Multiclass classification problems

```python
# Construct input layer from borrower features
inputs = constant(borrower_features, dtype=float32)

# Define first dense layer
dense1 = keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = keras.layers.Dense(8, activation='relu')(dense1)

# Define output layer
outputs = keras.layers.Dense(6, activation='softmax')(dense2)

# Print first five predictions
print(outputs.numpy()[:5])
```

## Optimizers

- Gradient descent
- Stochastic gradient descent (SGD)
- RMSprop
- Adam

### The gradient descent optimizer
- **Stochastic gradient descent (SGD) optimizer**
    - `tf.keras.optimizers.SGD()`
    - `learning_rate`
- Simple and easy to interpret
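At its core, gradient descent repeatedly moves each parameter a small step against its gradient: `w = w - learning_rate * dL/dw`. The sketch below applies that rule to a hypothetical one-parameter loss `L(w) = (w - 3)**2`, whose minimum is at `w = 3`. `tf.keras.optimizers.SGD` applies the same kind of step to every trainable variable, using gradients computed on a batch of data (hence "stochastic").

```python
def grad(w: float) -> float:
    """Derivative of the hypothetical loss L(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def sgd(w: float, learning_rate: float = 0.1, steps: int = 100) -> float:
    """Plain gradient descent: step against the gradient each iteration."""
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

w_final = sgd(0.0)
print(round(w_final, 4))  # 3.0
```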

### The RMSprop optimizer
- **Root mean squared (RMS) propagation optimizer**
    - Applies a different effective learning rate to each parameter
    - `tf.keras.optimizers.RMSprop()`
    - `learning_rate`
    - `momentum`
    - `rho` (decay rate of the running average of squared gradients)
- Allows for momentum to both build and decay
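RMSprop keeps a running average of squared gradients and divides each step by its square root, so parameters with consistently large gradients take smaller steps. The sketch below is modeled on the update rule documented for TensorFlow's `tf.raw_ops.ApplyRMSProp`, with `rho` as the averaging rate and an optional momentum term. It is a scalar illustration on the hypothetical loss `L(w) = (w - 3)**2`, not Keras's exact code path, and the learning rate and step count are arbitrary demo choices.

```python
import math

def grad(w: float) -> float:
    """Derivative of the hypothetical loss L(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def rmsprop(w: float, learning_rate: float = 0.01, rho: float = 0.9,
            momentum: float = 0.0, epsilon: float = 1e-7,
            steps: int = 1000) -> float:
    """Scalar RMSprop: scale each step by the root of a running mean of g**2."""
    ms = 0.0   # running average of squared gradients, decayed by rho
    mom = 0.0  # momentum buffer (stays 0 when momentum=0)
    for _ in range(steps):
        g = grad(w)
        ms = rho * ms + (1.0 - rho) * g * g
        mom = momentum * mom + learning_rate * g / math.sqrt(ms + epsilon)
        w = w - mom
    return w

w_plain = rmsprop(0.0)
w_momentum = rmsprop(0.0, momentum=0.9)
print(w_plain, w_momentum)  # both end up close to the minimum at w = 3
```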

### The Adam optimizer

- **Adaptive moment estimation (Adam) optimizer**
    - `tf.keras.optimizers.Adam()`
    - `learning_rate`
    - `beta_1`
- Performs well with default parameter values
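Adam combines two running averages and corrects both for starting at zero:

- a momentum-style average of gradients, controlled by `beta_1`;
- an RMSprop-style average of squared gradients, controlled by `beta_2`.

Below is a scalar sketch of the update from the original Adam paper (Kingma and Ba), on the hypothetical loss `L(w) = (w - 3)**2`. The defaults `beta_1=0.9` and `beta_2=0.999` match Keras's, but the learning rate and step count are arbitrary choices for the demo.

```python
import math

def grad(w: float) -> float:
    """Derivative of the hypothetical loss L(w) = (w - 3)**2."""
    return 2.0 * (w - 3.0)

def adam(w: float, learning_rate: float = 0.05, beta_1: float = 0.9,
         beta_2: float = 0.999, epsilon: float = 1e-7,
         steps: int = 2000) -> float:
    """Scalar Adam update with bias-corrected first and second moments."""
    m = 0.0  # first moment: running average of gradients
    v = 0.0  # second moment: running average of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta_1 * m + (1.0 - beta_1) * g
        v = beta_2 * v + (1.0 - beta_2) * g * g
        m_hat = m / (1.0 - beta_1 ** t)  # correct the bias toward 0
        v_hat = v / (1.0 - beta_2 ** t)
        w = w - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return w

w_adam = adam(0.0)
print(w_adam)  # ends up near the minimum at w = 3
```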

### A complete example

```python
import tensorflow as tf

# Define the model function
def model(bias, weights, features = borrower_features):
    product = tf.matmul(features, weights)
    return tf.keras.activations.sigmoid(product+bias)

# Compute the predicted values and loss
def loss_function(bias, weights, targets = default, features = borrower_features):
    predictions = model(bias, weights, features)
    return tf.keras.losses.binary_crossentropy(targets, predictions)

# Minimize the loss function with RMS propagation
opt = tf.keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.9)
opt.minimize(lambda: loss_function(bias, weights), var_list=[bias, weights])
```
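The loss minimized above is binary cross-entropy. For one example with label `y` in {0, 1} and predicted probability `p`, it is `-(y*log(p) + (1 - y)*log(1 - p))`. The plain-Python sketch below averages this over a list of examples. Clipping `p` away from 0 and 1 mirrors what Keras does to avoid `log(0)`, though the exact constant used here (`1e-7`) is an assumption.

```python
import math
from typing import Sequence

def binary_crossentropy(targets: Sequence[float], predictions: Sequence[float],
                        eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, predictions):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(targets)

print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
print(binary_crossentropy([1], [0.5]))          # equals log(2), about 0.6931
```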

#### Exercise: The dangers of local minima

```python
# Initialize x_1 and x_2
x_1 = Variable(6.0, dtype=float32)
x_2 = Variable(0.3, dtype=float32)

# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
```
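The exercise's `loss_function` is not shown here. To illustrate the same idea, the sketch below uses a hypothetical loss `f(x) = x**4 - 4*x**2 + x`. It has a global minimum near x = -1.47 and a shallower local minimum near x = 1.35, separated by a hump near x = 0.13. Plain gradient descent simply follows the slope, so where it ends up depends on where it starts.

```python
def f(x: float) -> float:
    """Hypothetical non-convex loss with two minima."""
    return x**4 - 4 * x**2 + x

def df(x: float) -> float:
    """Derivative of f."""
    return 4 * x**3 - 8 * x + 1

def gradient_descent(x: float, learning_rate: float = 0.01, steps: int = 1000) -> float:
    """Follow the negative gradient from the starting point x."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x

x_right = gradient_descent(2.0)    # starts right of the hump
x_left = gradient_descent(-1.0)    # starts left of the hump
print(round(x_right, 2), round(f(x_right), 2))  # stuck in the local minimum
print(round(x_left, 2), round(f(x_left), 2))    # reaches the global minimum (lower loss)
```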

#### Exercise: Avoiding local minima

```python
# Initialize x_1 and x_2
x_1 = Variable(0.05, dtype=float32)
x_2 = Variable(0.05, dtype=float32)

# Define the optimization operation for opt_1 and opt_2
opt_1 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)
opt_2 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.00)

for j in range(100):
    # Define the minimization operation for opt_1
    opt_1.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Define the minimization operation for opt_2
    opt_2.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
```

## Training a network in TensorFlow

### Random initializers

- Often need to initialize thousands of variables
    - `tf.ones()` may perform poorly
    - Tedious and difficult to initialize variables individually
- Alternatively, draw initial values from distribution
    - Normal
    - Uniform
    - Glorot initializer

### Initializing variables in TensorFlow

In [33]:
import tensorflow as tf

# Define 500x500 random variable
weights = tf.Variable(tf.random.normal([500, 500]))

# Define 500x500 truncated random normal variable
weights = tf.Variable(tf.random.truncated_normal([500, 500]))

# Define a dense layer with the default initializer
dense = tf.keras.layers.Dense(32, activation='relu')

# Define a dense layer with the zeros initializer
dense = tf.keras.layers.Dense(32, activation='relu', kernel_initializer='zeros')
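`tf.random.truncated_normal` differs from `tf.random.normal` in one way: values more than two standard deviations from the mean are dropped and re-drawn. This avoids a few very large initial weights. Below is a minimal rejection-sampling sketch of that rule in plain Python. The helper name is hypothetical, and TensorFlow's own sampler is implemented differently.

```python
import random
from typing import List

def truncated_normal(n: int, mean: float = 0.0, stddev: float = 1.0,
                     seed: int = 0) -> List[float]:
    """Hypothetical helper: normal samples, re-drawn if beyond 2 stddevs of the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:  # reject values in the tails
            samples.append(x)
    return samples

values = truncated_normal(10_000)
print(max(abs(v) for v in values) <= 2.0)  # True
```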

### Applying dropout

![dropout.png](attachment:dropout.png)

- A simple solution to the overfitting problem
- During training, randomly sets the outputs of a fraction of the nodes in a layer to zero, which effectively drops the weights connected to those nodes for that step
- This forces the network to develop more robust rules for classification, since it cannot rely on any particular node being active
- This tends to improve out-of-sample performance
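Keras's `Dropout` layer behaves differently in training and inference:

- During training, it zeroes each input with probability `rate` and scales the surviving values by `1 / (1 - rate)`, so the expected sum is unchanged.
- At inference time, it passes inputs through untouched.

A plain-Python sketch of that behavior (the function name and seed are illustrative only):

```python
import random
from typing import List

def dropout(values: List[float], rate: float, training: bool, seed: int = 0) -> List[float]:
    """Inverted dropout: zero with probability `rate`, rescale the rest by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return list(values)  # inference: identity
    rng = random.Random(seed)
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]

activations = [1.0] * 8
print(dropout(activations, rate=0.25, training=True))   # some 0.0s, the rest scaled to 1/0.75
print(dropout(activations, rate=0.25, training=False))  # unchanged
```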

### Implementing dropout in a network

```python
import numpy as np
import tensorflow as tf

# Define input data
inputs = np.array(borrower_features, np.float32)

# Define dense layer 1
dense1 = tf.keras.layers.Dense(32, activation='relu')(inputs)

# Define dense layer 2
dense2 = tf.keras.layers.Dense(16, activation='relu')(dense1)

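# Apply dropout operation (training=True is needed when calling the layer
# directly; inside model.fit() Keras sets the training flag automatically)
dropout1 = tf.keras.layers.Dropout(0.25)(dense2, training=True)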

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dropout1)
```

#### Exercise: Initialization in TensorFlow

```python
# Define the layer 1 weights
w1 = Variable(random.normal([23, 7]))

# Initialize the layer 1 bias
b1 = Variable(ones([7]))

# Define the layer 2 weights
w2 = Variable(random.normal([7, 1]))

# Define the layer 2 bias
b2 = Variable([0.0])
```

#### Exercise: Defining the model and loss function

```python
# Define the model
def model(w1, b1, w2, b2, features = borrower_features, training = False):
    # Apply relu activation functions to layer 1
    layer1 = keras.activations.relu(matmul(features, w1) + b1)
    # Apply dropout rate of 0.25 (only active when training=True)
    dropout = keras.layers.Dropout(0.25)(layer1, training=training)
    return keras.activations.sigmoid(matmul(dropout, w2) + b2)

# Define the loss function
def loss_function(w1, b1, w2, b2, features = borrower_features, targets = default):
    predictions = model(w1, b1, w2, b2, features, training=True)
    # Pass targets and predictions to the cross entropy loss
    return keras.losses.binary_crossentropy(targets, predictions)
```

#### Exercise: Training neural networks with TensorFlow

```python
# Train the model
for j in range(100):
    # Complete the optimizer
    opt.minimize(lambda: loss_function(w1, b1, w2, b2),
                 var_list=[w1, b1, w2, b2])

# Make predictions with model using test features
model_predictions = model(w1, b1, w2, b2, test_features)

# Construct the confusion matrix
confusion_matrix(test_targets, model_predictions)
```

# 4. High Level APIs

### Chapter Contents:

1. Defining neural networks with Keras
2. Training and validation with Keras
3. Training models with the Estimators API
4. Congratulations!

## Defining neural networks with Keras

- Keras sequential API
- Keras functional API

### Classifying sign language letters

- Throughout this chapter, we'll focus on using Keras to classify four letters from the **Sign Language MNIST dataset**.

### The sequential API

![sequential_api.png](attachment:sequential_api.png)

- Input layer
- Hidden layers
- Output layer
- Ordered in sequence

### Building a sequential model

In [34]:
# Import tensorflow
from tensorflow import keras

# Define a sequential model
model = keras.Sequential()

# Define first hidden layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(28*28,)))

# Define second hidden layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('adam', loss='categorical_crossentropy')

# Summarize the model
print(model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_5 (Dense)             (None, 16)                12560     
                                                                 
 dense_6 (Dense)             (None, 8)                 136       
                                                                 
 dense_7 (Dense)             (None, 4)                 36        
                                                                 
Total params: 12,732
Trainable params: 12,732
Non-trainable params: 0
_________________________________________________________________
None
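The `Param #` column follows from the layer sizes. A `Dense` layer with `n_in` inputs and `n_out` units has an `n_in * n_out` weight matrix plus `n_out` biases. A quick check against the summary above (the helper name is hypothetical):

```python
from typing import List

def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out

layer_sizes: List[int] = [28 * 28, 16, 8, 4]  # input, hidden 1, hidden 2, output
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [12560, 136, 36]
print(sum(counts))  # 12732
```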


### The functional API

![functional_api.png](attachment:functional_api.png)

- The functional API handles models that are not a single stack of layers, such as two models with separate inputs that are trained jointly to predict the same target.

### Using the functional API

In [35]:
# Import tensorflow
import tensorflow as tf

# Define model 1 input layer shape
model1_inputs = tf.keras.Input(shape=(28*28,))

# Define model 2 input layer shape
model2_inputs = tf.keras.Input(shape=(10,))

# Define layer 1 for model 1
model1_layer1 = tf.keras.layers.Dense(12, activation='relu')(model1_inputs)

# Define layer 2 for model 1
model1_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model1_layer1)

# Define layer 1 for model 2
model2_layer1 = tf.keras.layers.Dense(8, activation='relu')(model2_inputs)

# Define layer 2 for model 2
model2_layer2 = tf.keras.layers.Dense(4, activation='softmax')(model2_layer1)

# Merge model 1 and model 2
merged = tf.keras.layers.add([model1_layer2, model2_layer2])

# Define a functional model
model = tf.keras.Model(inputs=[model1_inputs, model2_inputs], outputs=merged)

# Compile the model
model.compile('adam', loss='categorical_crossentropy')

#### Exercise: The sequential model in Keras

In [36]:
from tensorflow import keras

# Define a Keras sequential model
model = keras.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the second dense layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Print the model architecture
print(model.summary())

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 16)                12560     
                                                                 
 dense_13 (Dense)            (None, 8)                 136       
                                                                 
 dense_14 (Dense)            (None, 4)                 36        
                                                                 
Total params: 12,732
Trainable params: 12,732
Non-trainable params: 0
_________________________________________________________________
None


#### Exercise: Compiling a sequential model

In [37]:
from tensorflow import keras

# Define a Keras sequential model
model = keras.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation='sigmoid', input_shape=(784,)))

# Apply dropout to the first layer's output
model.add(keras.layers.Dropout(0.25))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('adam', loss='categorical_crossentropy')

# Print a model summary
print(model.summary())

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_15 (Dense)            (None, 16)                12560     
                                                                 
 dropout (Dropout)           (None, 16)                0         
                                                                 
 dense_16 (Dense)            (None, 4)                 68        
                                                                 
Total params: 12,628
Trainable params: 12,628
Non-trainable params: 0
_________________________________________________________________
None


#### Exercise: Defining a multiple input model

In [38]:
from tensorflow import keras

# Define model 1 input layer shape
m1_inputs = keras.Input(shape=(28*28,))

# Define model 2 input layer shape
m2_inputs = keras.Input(shape=(28*28,))

# For model 1, pass the input layer to layer 1 and layer 1 to layer 2
m1_layer1 = keras.layers.Dense(12, activation='sigmoid')(m1_inputs)
m1_layer2 = keras.layers.Dense(4, activation='softmax')(m1_layer1)

# For model 2, pass the input layer to layer 1 and layer 1 to layer 2
m2_layer1 = keras.layers.Dense(12, activation='relu')(m2_inputs)
m2_layer2 = keras.layers.Dense(4, activation='softmax')(m2_layer1)

# Merge model outputs and define a functional model
merged = keras.layers.add([m1_layer2, m2_layer2])
model = keras.Model(inputs=[m1_inputs, m2_inputs], outputs=merged)

# Print a model summary
print(model.summary())

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_3 (InputLayer)           [(None, 784)]        0           []                               
                                                                                                  
 input_4 (InputLayer)           [(None, 784)]        0           []                               
                                                                                                  
 dense_17 (Dense)               (None, 12)           9420        ['input_3[0][0]']                
                                                                                                  
 dense_19 (Dense)               (None, 12)           9420        ['input_4[0][0]']                
                                                                                            

## Training and validation with Keras

### Overview of training and evaluation

1. Load and clean data
2. Define model
3. Train and validate model
4. Evaluate model

### How to train a model

```python
# Import tensorflow
import tensorflow as tf

# Define a sequential model
model = tf.keras.Sequential()

# Define the hidden layer
model.add(tf.keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the output layer
model.add(tf.keras.layers.Dense(4, activation='softmax'))

# Compile model
model.compile('adam', loss='categorical_crossentropy')

# Train model
model.fit(image_features, image_labels)
```

### The fit() operation

- Main arguments
    - `x`: features
    - `y`: labels
- Many optional arguments
    - `batch_size`
    - `epochs`
    - `validation_split`

### Batch size and epochs

- The number of examples in each batch is the **batch size**, which is 32 by default.
- The number of times you train on the full set of batches is called the number of **epochs**.
- Using multiple epochs allows the model to revisit the same batches, but with different model weights and possibly optimizer parameters, since they are updated after each batch.
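Batch size and epochs together determine how many weight updates happen. Each epoch takes `ceil(n_examples / batch_size)` steps, and the last batch is smaller when the division is not exact. A small worked example (the dataset size of 1,000 is hypothetical):

```python
import math

def steps_per_epoch(n_examples: int, batch_size: int = 32) -> int:
    """Number of batches (and so weight updates) in one pass over the data."""
    return math.ceil(n_examples / batch_size)

n, epochs = 1000, 5
steps = steps_per_epoch(n)
print(steps)                 # 32 (31 full batches of 32 plus one batch of 8)
print(steps * epochs)        # 160 updates in total
print(n - (steps - 1) * 32)  # 8 examples in the final batch
```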

### Performing validation

- `validation_split`: the fraction of the training data held out as a validation set (for example, `0.20` holds out 20%).

```python
# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)
```
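To see which samples end up in each part, the sketch below mimics the split on plain lists. According to the Keras `fit()` documentation, the validation data is taken from the last samples of the arrays, before shuffling. Data sorted by label or by time should therefore be shuffled first. The floor-based rounding of the split point is an assumption about how fractional sizes are handled.

```python
import math
from typing import List, Sequence, Tuple

def split_last(features: Sequence, labels: Sequence,
               validation_split: float) -> Tuple[List, List, List, List]:
    """Hold out the trailing `validation_split` fraction as validation data."""
    split_at = int(math.floor(len(features) * (1.0 - validation_split)))
    return (list(features[:split_at]), list(labels[:split_at]),
            list(features[split_at:]), list(labels[split_at:]))

x = list(range(10))
y = [i % 2 for i in x]
x_train, y_train, x_val, y_val = split_last(x, y, validation_split=0.20)
print(x_train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(x_val)    # [8, 9]
```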

### Changing the metric

```python
# Recompile the model with the accuracy metric
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model with validation split
model.fit(features, labels, epochs=10, validation_split=0.20)
```

### The evaluate() operation

```python
# Evaluate the test set
model.evaluate(test_features, test_labels)
```

#### Exercise: Training with Keras

In [39]:
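import numpy as np
import pandas as pd
from tensorflow import keras

# Read the data once: column 0 holds the letter label, the remaining columns the pixel values
sign_language = pd.read_csv("data/slmnist.csv", header=None)
sign_language_labels = np.array(pd.get_dummies(sign_language[0]), dtype=np.float32)
sign_language_features = np.array(sign_language.drop(0, axis=1), dtype=np.float64)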

In [40]:
# Define a sequential model
model = keras.Sequential()

# Define a hidden layer
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Compile the model
model.compile('SGD', loss='categorical_crossentropy')

# Complete the fitting operation
model.fit(sign_language_features, sign_language_labels, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x2800642c9d0>

#### Exercise: Metrics and validation with Keras

In [41]:
# Define sequential model
model = keras.Sequential()

# Define the first layer
model.add(keras.layers.Dense(32, activation='sigmoid', input_shape=(784,)))

# Define the output layer with a softmax activation
model.add(keras.layers.Dense(4, activation='softmax'))

# Set the optimizer, loss function, and metrics
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Add the number of epochs and the validation split
model.fit(sign_language_features, sign_language_labels, epochs=10, validation_split=0.10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x28006316ad0>

#### Exercise: Overfitting detection

In [42]:
# Define sequential model
model = keras.Sequential()

# Define the first layer
model.add(keras.layers.Dense(1024, activation='relu', input_shape=(784,)))

# Define the output layer with a softmax activation
model.add(keras.layers.Dense(4, activation='softmax'))

# Finish the model compilation
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), 
              loss='categorical_crossentropy', metrics=['accuracy'])

# Complete the model fit operation
model.fit(sign_language_features, sign_language_labels, epochs=50, validation_split=0.50)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


<keras.callbacks.History at 0x28006d03a90>

#### Exercise: Evaluating models

```python
# Evaluate the small model using the train data
small_train = small_model.evaluate(train_features, train_labels)

# Evaluate the small model using the test data
small_test = small_model.evaluate(test_features, test_labels)

# Evaluate the large model using the train data
large_train = large_model.evaluate(train_features, train_labels)

# Evaluate the large model using the test data
large_test = large_model.evaluate(test_features, test_labels)

# Print losses
print('\n Small - Train: {}, Test: {}'.format(small_train, small_test))
print('Large - Train: {}, Test: {}'.format(large_train, large_test))
```

## Training models with the Estimators API

### What is the Estimators API?

![tensorflow_apis.png](attachment:tensorflow_apis.png)

- High level submodule
- Less flexible
- Enforces best practices
- Faster deployment
- Many premade models

### Model specification and training

1. Define feature columns
2. Load and transform data
3. Define an estimator
4. Apply train operation

### Define feature columns

```python
# Import tensorflow under its standard alias
import tensorflow as tf

# Define a numeric feature column
size = tf.feature_column.numeric_column("size")

# Define a categorical feature column
rooms = tf.feature_column.categorical_column_with_vocabulary_list("rooms", ["1", "2", "3", "4", "5"])

# Create feature column list (DNN estimators need categorical columns
# wrapped, e.g. in an indicator column)
feature_list = [size, tf.feature_column.indicator_column(rooms)]

# Alternatively, define a matrix feature column
feature_list = [tf.feature_column.numeric_column('image', shape=(784,))]
```
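Conceptually, the column types above transform the data as follows:

- A numeric column passes a value through as a float.
- A categorical column with a vocabulary list maps each string to an index.
- An indicator column turns that index into a one-hot vector that a dense network can consume.

The plain-Python sketch below imitates this for the `size` and `rooms` columns. It illustrates the idea only; it is not how `tf.feature_column` is implemented, and `to_dense_row` is a hypothetical name.

```python
from typing import Dict, List

ROOMS_VOCAB = ["1", "2", "3", "4", "5"]

def to_dense_row(example: Dict[str, object]) -> List[float]:
    """Hypothetical helper: numeric 'size' followed by a one-hot 'rooms' vector."""
    row = [float(example["size"])]           # numeric column
    one_hot = [0.0] * len(ROOMS_VOCAB)
    rooms = str(example["rooms"])
    if rooms in ROOMS_VOCAB:                 # out-of-vocabulary values stay all zeros here
        one_hot[ROOMS_VOCAB.index(rooms)] = 1.0
    return row + one_hot

print(to_dense_row({"size": 1340, "rooms": "3"}))
# [1340.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```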

### Loading and transforming data

```python
# Define input data function
def input_fn():
    # Define feature dictionary
    features = {"size": [1340, 1690, 2720], "rooms": ["1", "3", "4"]}
    # Define labels
    labels = [221900, 538000, 180000]
    return features, labels
```

### Define and train a regression estimator

```python
# Define a deep neural network regression
model0 = tf.estimator.DNNRegressor(feature_columns=feature_list, hidden_units=[10, 6, 6, 3])

# Train the regression model
model0.train(input_fn, steps=20)
```

### Define and train a deep neural network

```python
# Define a deep neural network classifier
model1 = tf.estimator.DNNClassifier(feature_columns=feature_list, hidden_units=[32, 16, 8], n_classes=4)

# Train the classifier
model1.train(input_fn, steps=20)
```

#### Exercise: Preparing to train with Estimators

In [43]:
import numpy as np
import pandas as pd
from tensorflow import feature_column, estimator

housing = pd.read_csv("data/kc_house_data.csv")

In [44]:
# Define feature columns for bedrooms and bathrooms
bedrooms = feature_column.numeric_column("bedrooms")
bathrooms = feature_column.numeric_column("bathrooms")

# Define the list of feature columns
feature_list = [bedrooms, bathrooms]

def input_fn():
    # Define the labels
    labels = np.array(housing['price'])
    # Define the features
    features = {'bedrooms':np.array(housing['bedrooms']), 
                'bathrooms':np.array(housing['bathrooms'])}
    return features, labels

#### Exercise: Defining Estimators

In [45]:
# Define the model (DNNRegressor) and set the number of steps
model = estimator.DNNRegressor(feature_columns=feature_list, hidden_units=[2,2])
model.train(input_fn, steps=1)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\MELIHC~1\\AppData\\Local\\Temp\\tmpozp8n67a', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Use Variable.rea

<tensorflow_estimator.python.estimator.canned.dnn.DNNRegressorV2 at 0x280061bb8e0>

In [46]:
# Define the model (LinearRegressor) and set the number of steps
model = estimator.LinearRegressor(feature_columns=feature_list)
model.train(input_fn, steps=2)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'C:\\Users\\MELIHC~1\\AppData\\Local\\Temp\\tmp6yal0t_o', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
INFO:tensorflow:Calling model_fn.


  self.bias = self.add_variable(


INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into C:\Users\MELIHC~1\AppData\Local\Temp\tmp6yal0t_o\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 426471400000.0, step = 0
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 2...
INFO:tensorflow:Saving checkpoints for 2 into C:\Users\MELIHC~1\AppData\Local\Temp\tmp6yal0t_o\model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 2...
INFO:tensorflow:Loss for final step: 426469820000.0.


<tensorflow_estimator.python.estimator.canned.linear.LinearRegressorV2 at 0x280062fd030>

## Congratulations!

### What you learned

- **Chapter 1**
    - Low-level, basic, and advanced operations
    - Graph-based computation
    - Gradient computation and optimization
- **Chapter 2**
    - Data loading and transformation
    - Predefined and custom loss functions
    - Linear models and batch training
- **Chapter 3**
    - Dense neural network layers
    - Activation functions
    - Optimization algorithms
    - Training neural networks
- **Chapter 4**
    - Neural networks in Keras
    - Training and validation
    - The Estimators API

### TensorFlow extensions

- **TensorFlow Hub**
    - Pretrained models
    - Transfer learning
- **TensorFlow Probability**
    - More statistical distributions
    - Trainable distributions
    - Extended set of optimizers

### TensorFlow 2.0

- TensorFlow 2.0
    - Eager execution by default
    - Tighter `keras` integration
    - `Estimators`
    - `tf.function()`