# TensorFlow

* Open source library for numerical computation and machine learning; operations on tensors can be compiled into dataflow graphs
* Low and high level APIs
  * Addition, multiplication, differentiation
  * Machine learning models

References:

* [TensorFlow Website](https://www.tensorflow.org/)
* [Introduction to TensorFlow in Python](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python) (many of the examples here are from this course)

In [293]:
import tensorflow as tf
import numpy as np
import pandas as pd

## Defining tensors

A tensor is a generalization of vectors and matrices: a collection of numbers arranged in a particular shape. Imagine a loaf of bread cut into slices, with each slice cut into 9 pieces (a 3x3 grid). A single piece is a 0-dimensional tensor (a scalar). A row or column of pieces is a 1-dimensional tensor (a vector). A whole slice is a 2-dimensional tensor (a matrix). The whole loaf is a 3-dimensional tensor.

### 0d

Here is a "scalar" or "rank-0" tensor. A scalar contains a single value and has no "axes".

In [294]:
print(tf.constant(5).numpy())

5


### 1d

A "vector" or "rank-1" tensor is like a list of values. A vector has 1 axis:

In [295]:
print(tf.constant([2.0, 3.0, 4.0]).numpy())

[2. 3. 4.]


In [296]:
print(tf.constant([5]).numpy())

[5]


### 2d

A "matrix" or "rank-2" tensor has 2 axes:

In [297]:
print(tf.constant([[1, 2], [3, 4], [5, 6]], dtype=tf.float16).numpy())

[[1. 2.]
 [3. 4.]
 [5. 6.]]


### 3d

In [298]:
d3 = tf.ones((2, 2, 2))
print(d3.numpy())

[[[1. 1.]
  [1. 1.]]

 [[1. 1.]
  [1. 1.]]]
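
The loaf analogy can be made concrete with nested Python lists, where each level of nesting adds one axis. This is a plain-Python sketch rather than TensorFlow code; `shape_of` is a hypothetical helper written only for this illustration, the loaf is assumed to have 4 slices, and the nesting is assumed to be regular.

```python
def shape_of(nested):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

piece = 1                                                # rank 0: a single piece
row = [piece] * 3                                        # rank 1: a row of 3 pieces
bread_slice = [row[:] for _ in range(3)]                 # rank 2: a 3 x 3 slice
loaf = [[r[:] for r in bread_slice] for _ in range(4)]   # rank 3: 4 slices (assumed)

shapes = [shape_of(t) for t in (piece, row, bread_slice, loaf)]
ranks = [len(s) for s in shapes]
print(shapes)  # [(), (3,), (3, 3), (4, 3, 3)]
print(ranks)   # [0, 1, 2, 3]
```

The rank is simply the length of the shape, which is why a scalar has an empty shape.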


## Constants

* A constant is the simplest category of tensor
  * It's not trainable: its value can't be changed after creation, so an optimizer can't update it
  * Can have any dimension
* For a 2D tensor, `shape` is `[rows, columns]`

In [299]:
print(tf.constant(3, shape=[2, 3]).numpy())

[[3 3 3]
 [3 3 3]]


In [300]:
print(tf.constant([1, 2, 3, 4], shape=[2, 2]).numpy())

[[1 2]
 [3 4]]


### Zeros

In [301]:
print(tf.zeros([2, 2]).numpy())

[[0. 0.]
 [0. 0.]]


### Zeros Like

In [302]:
a = tf.constant(4, shape=[2, 3]).numpy()
print(tf.zeros_like(a).numpy())

[[0 0 0]
 [0 0 0]]


### Ones

In [303]:
# Lists work:
print(tf.ones([2, 2]).numpy())

# So do tuples:
print()
print(tf.ones((2, 2)).numpy())

[[1. 1.]
 [1. 1.]]

[[1. 1.]
 [1. 1.]]



### Ones Like

In [305]:
a = tf.constant(4, shape=[2, 3]).numpy()
print(tf.ones_like(a).numpy())

[[1 1 1]
 [1 1 1]]


### Fill

In [306]:
print(tf.fill([3, 3], 7).numpy())

[[7 7 7]
 [7 7 7]
 [7 7 7]]


### Creating a constant from a numpy array

In [307]:
arr = np.array([5, 6, 7])
b = tf.constant(arr)
print(b.dtype)
print(b.numpy())
print(b.shape)

<dtype: 'int64'>
[5 6 7]
(3,)


## Variables

* Unlike a constant, a variable's value can be changed
* Data type and shape are fixed

In [308]:
# dtype could also be like tf.float32
a0 = tf.Variable([1, 2, 3], dtype=tf.int16)
print("Variable:", a0.numpy())

b = tf.constant(2, tf.int16)
print("Constant:", b.numpy())

c1 = a0 * b
print("Multiplication method 1:", c1.numpy())
print("Multiplication method 2:", tf.multiply(a0, b).numpy())

# "Note that tensorflow 2 allows you to use data as either a numpy array
# or a tensorflow constant object. Using a constant will ensure that any
# operations performed with that object are done in tensorflow."
print("Multiplication method 3:", tf.multiply([1, 2, 3], [2]).numpy())
print("Multiplication method 4:", tf.multiply(np.array([1, 2, 3]), np.array([2])).numpy())

Variable: [1 2 3]
Constant: 2
Multiplication method 1: [2 4 6]
Multiplication method 2: [2 4 6]
Multiplication method 3: [2 4 6]
Multiplication method 4: [2 4 6]
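
The cell above reads the variable but never changes it. A minimal sketch of updating a variable in place, assuming eager execution in TensorFlow 2.x; the name `v` is only for illustration.

```python
import tensorflow as tf

v = tf.Variable([1, 2, 3], dtype=tf.int16)

# Replace all values in place; the dtype and shape stay the same
v.assign([10, 20, 30])

# Add to the current values in place
v.assign_add([1, 1, 1])
values = v.numpy().tolist()
print(values)  # [11, 21, 31]

# The shape is fixed, so assigning a differently shaped value is rejected
try:
    v.assign([1, 2])
    shape_error = False
except (ValueError, tf.errors.InvalidArgumentError):
    shape_error = True
print(shape_error)  # True
```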


## Operations

### Element wise operations

In [309]:
a = tf.constant([1])
b = tf.constant([2])
print(tf.add(a, b).numpy())

[3]


In [310]:
a = tf.constant([1, 5])
b = tf.constant([3, 10])
print(tf.add(a, b).numpy())

[ 4 15]


In [311]:
a = tf.constant([1, 5])
b = tf.constant([3, 10])
print(tf.multiply(a, b).numpy())

[ 3 50]


In [312]:
a = tf.constant([1, 2, 3, 4], shape=[2, 2])
b = tf.constant([5, 6, 7, 8], shape=[2, 2])

print("a:\n", a.numpy())
print("\nb:\n", b.numpy())
print("\na + b:\n", tf.add(a, b).numpy())
print("\na + b:\n", (a + b).numpy())

a:
 [[1 2]
 [3 4]]

b:
 [[5 6]
 [7 8]]

a + b:
 [[ 6  8]
 [10 12]]

a + b:
 [[ 6  8]
 [10 12]]


### Matrix Multiplication

* [Multiplying a Matrix by Another Matrix](https://www.mathsisfun.com/algebra/matrix-multiplying.html)
* To multiply matrix `a` by matrix `b`, `b` must have the same number of rows as `a` has columns, since each output entry is the dot product of a row of `a` with a column of `b`

In [313]:
a = tf.constant([1, 2, 3, 4], shape=[2, 2])
b = tf.constant([5, 6, 7, 8], shape=[2, 2])

print("a:\n", a.numpy())
print("\nb:\n", b.numpy())
print("\na * b:\n", tf.matmul(a, b).numpy())
print("\na * b:\n", (a @ b).numpy())

# 1 * 5 + 2 * 7 = 19
# 1 * 6 + 2 * 8 = 22
# 3 * 5 + 4 * 7 = 43
# 3 * 6 + 4 * 8 = 50

a:
 [[1 2]
 [3 4]]

b:
 [[5 6]
 [7 8]]

a * b:
 [[19 22]
 [43 50]]

a * b:
 [[19 22]
 [43 50]]


In [314]:
features = tf.constant([2, 3, 4, 5], shape=[2, 2])
coefficients = tf.constant([1000, 50], shape=[2, 1])
predictions = tf.matmul(features, coefficients)

print("features:\n", features.numpy())
print("\ncoefficients:\n", coefficients.numpy())
print("\npredictions:\n", predictions.numpy())

features:
 [[2 3]
 [4 5]]

coefficients:
 [[1000]
 [  50]]

predictions:
 [[2150]
 [4250]]
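
To make the row-times-column rule explicit, here is a plain-Python sketch of what `tf.matmul` computes for 2D inputs. The `matmul` function below is a hypothetical helper for illustration, not the TensorFlow implementation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("a must have as many columns as b has rows")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
predictions = matmul([[2, 3], [4, 5]], [[1000], [50]])
print(product)      # [[19, 22], [43, 50]]
print(predictions)  # [[2150], [4250]]

# A 2x2 matrix cannot be multiplied by a 1x2 matrix: 2 columns vs 1 row
try:
    matmul([[1, 2], [3, 4]], [[1000, 50]])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```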


### Summing over tensor dimensions

In [315]:
a = tf.ones([2, 3, 4])

# 24 because it's a 2 x 3 x 4 dimensional tensor with each element being 1
tf.reduce_sum(a).numpy()

24.0

In [316]:
# We reduce the size of the tensor by summing over one of its dimensions
print("\nDimension 0:\n", tf.reduce_sum(a, 0).numpy()) # A 3x4 tensor
print("\nDimension 1:\n", tf.reduce_sum(a, 1).numpy()) # A 2x4 tensor
print("\nDimension 2:\n", tf.reduce_sum(a, 2).numpy()) # A 2x3 tensor


Dimension 0:
 [[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]

Dimension 1:
 [[3. 3. 3. 3.]
 [3. 3. 3. 3.]]

Dimension 2:
 [[4. 4. 4.]
 [4. 4. 4.]]


In [317]:
a = tf.constant([3, 4, 5, 6], shape=[2, 2])
print("a:\n", a.numpy())

print("\nTotal sum:\n", tf.reduce_sum(a).numpy())

# Reducing along rows (axis=0) means we're squeezing from bottom and top so that
# the separate rows become one row. This gives us the column sums.
print("\nColumn sums:\n", tf.reduce_sum(a, 0).numpy())

# Reducing along columns (axis=1) means we're squeezing from left and right so that
# the separate columns become one column. This gives us the row sums.
print("\nRow sums:\n", tf.reduce_sum(a, 1).numpy())

a:
 [[3 4]
 [5 6]]

Total sum:
 18

Column sums:
 [ 8 10]

Row sums:
 [ 7 11]
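
The same reductions can be cross-checked in plain Python, which may help when the axis numbering feels backwards. This sketch uses nested lists instead of tensors.

```python
a = [[3, 4],
     [5, 6]]

# No axis: sum every element
total = sum(sum(row) for row in a)

# axis=0: collapse the rows, leaving one sum per column
column_sums = [sum(col) for col in zip(*a)]

# axis=1: collapse the columns, leaving one sum per row
row_sums = [sum(row) for row in a]

print(total)        # 18
print(column_sums)  # [8, 10]
print(row_sums)     # [7, 11]
```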


## Gradient

Computes the slope of a function at a point.

In [318]:
# We'll find the slope of y = x ** 2 at x = -1

# Define x
x = tf.Variable(-1.0)

# Define y within an instance of GradientTape
# This allows us to compute the rate of change of y with respect to x
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.multiply(x, x)

# Evaluate the gradient of y with respect to x at x = -1
g = tape.gradient(y, x)
print("Slope at {}: {}:".format(x.numpy(), g.numpy()))

# Recall from calc that the derivative for y = x ** 2 is 2x
# So the derivative at x = -1 is 2 * -1 = -2

Slope at -1.0: -2.0:
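
One way to sanity-check the result is a numerical approximation of the slope. The sketch below uses a central difference in plain Python; `numerical_slope` and the step size `h` are illustrative choices, and this is not how `GradientTape` works internally (it uses automatic differentiation).

```python
def f(x):
    return x * x

def numerical_slope(func, x, h=1e-5):
    """Approximate the derivative of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

slope = numerical_slope(f, -1.0)
print(slope)  # very close to -2.0, matching the analytic derivative 2x
```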


## Reshape

Reshapes a tensor (like a 10x10 into a 100x1)

In [319]:
a = tf.constant([1, 2, 3, 4], shape=[2, 2])
print("Original 2x2:\n", a.numpy())
print("\nReshaped into 4x1 using list:\n", tf.reshape(a, [4, 1]).numpy())
print("\nReshaped into a 4x1 using tuple:\n", tf.reshape(a, (4, 1)).numpy())

Original 2x2:
 [[1 2]
 [3 4]]

Reshaped into 4x1 using list:
 [[1]
 [2]
 [3]
 [4]]

Reshaped into a 4x1 using tuple:
 [[1]
 [2]
 [3]
 [4]]


## Random

Populates a tensor with entries drawn from a probability distribution

### Float between 0 and 1

In [320]:
a = tf.random.uniform([2, 2])
print(a.numpy())

[[0.8147571  0.4983344 ]
 [0.622983   0.22424698]]


### Float between 0 and n

In [321]:
a = tf.random.uniform([2, 2], maxval=10)
print(a.numpy())

[[6.0342765 3.837868 ]
 [5.8309007 3.4020329]]


### Integer between 0 and n

In [322]:
a = tf.random.uniform([2, 2], maxval=10, dtype="int32")
print(a.numpy())

[[5 5]
 [7 1]]


## Casting

### Integer to boolean

In [323]:
series = pd.Series([1, 0, 1, 1])
print(tf.cast(series, tf.bool).numpy())

[ True False  True  True]


### Integer to float

In [324]:
series = pd.Series([23, 45, 67])
print(tf.cast(series, tf.float32).numpy())

[23. 45. 67.]


## Loss Functions

In [325]:
# Note that if you pass lists of integers then TensorFlow will return an integer (12) not 12.5
targets = [10.0, 12.0]
predictions = [14.0, 9.0]
tf.keras.losses.mse(targets, predictions).numpy()

# Sum the squares of predictions minus targets: (14 - 10) ** 2 + (9 - 12) ** 2 = 16 + 9 = 25
# Divide by number of observations = 25 / 2 = 12.5

12.5

From [DataCamp example](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63343?ex=6):

In [326]:
features = tf.constant([1, 2, 3, 4, 5], tf.float32)
targets = tf.constant([2, 4, 6, 8, 10], tf.float32)
scalar = tf.Variable(1.0, dtype=tf.float32)

# Define the model
def model(scalar, features = features):
    return scalar * features

# Define a loss function
def loss_function(scalar, features = features, targets = targets):
    # Compute the predicted values
    predictions = model(scalar, features)
    
    # Return the mean absolute error loss
    return tf.keras.losses.mae(targets, predictions)

# Evaluate the loss function and print the loss
print(loss_function(scalar).numpy())

3.0
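
Both loss values above can be reproduced by hand. The following plain-Python sketch defines `mse` and `mae` as hypothetical helpers that mirror the arithmetic in the comments; they are not the Keras implementations.

```python
def mse(targets, predictions):
    """Mean squared error: average of squared differences."""
    return sum((p - t) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mae(targets, predictions):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - t) for t, p in zip(targets, predictions)) / len(targets)

# First example: (16 + 9) / 2
mse_value = mse([10.0, 12.0], [14.0, 9.0])

# Second example: the model is scalar * features with scalar = 1.0
features = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = [2.0, 4.0, 6.0, 8.0, 10.0]
scalar = 1.0
mae_value = mae(targets, [scalar * x for x in features])

print(mse_value)  # 12.5
print(mae_value)  # 3.0
```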


## Training a Linear Model

Modified example inspired by a [DataCamp exercise](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63343?ex=9):

### One feature

In [327]:
x = np.array(range(10))
y = 10 + x * 5

# Define a linear regression model
def linear_regression(intercept, slope, features = x):
    return intercept + slope * features

# Set loss_function() to take the variables as arguments
def loss_function(intercept, slope, features, targets):
    # Set the predicted values
    predictions = linear_regression(intercept, slope, features)
    
    # Return the mean squared error loss
    return tf.keras.losses.mse(targets, predictions)

print("Training...")

# Initialize an Adam optimizer
opt = tf.keras.optimizers.Adam(0.5)

# The optimizer will adjust the intercept and slope
intercept = tf.Variable(0, dtype=tf.float32)
slope = tf.Variable(0, dtype=tf.float32)

for i in range(301):
    # Apply minimize, pass the loss function, and supply the variables
    opt.minimize(lambda: loss_function(intercept, slope, x, y), var_list=[intercept, slope])

    if i % 50 == 0:
        current_loss = loss_function(intercept, slope, x, y).numpy()
        print("MSE after", i, "iterations:", current_loss)


print("\nEstimated intercept:", intercept.numpy())
print("Estimated slope: ", slope.numpy())

Training...
MSE after 0 iterations: 1052.125
MSE after 50 iterations: 3.633875
MSE after 100 iterations: 0.40788904
MSE after 150 iterations: 0.03056418
MSE after 200 iterations: 0.0010345107
MSE after 250 iterations: 1.1735907e-05
MSE after 300 iterations: 1.2565943e-08

Estimated intercept: 9.999792
Estimated slope:  5.0000343
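
To see what an optimizer is doing on each iteration, here is a sketch of the same fit using plain gradient descent written out by hand. It deliberately swaps Adam for the simplest possible update rule, so the learning rate (0.02) and iteration count (5000) are illustrative choices that differ from the cell above.

```python
xs = list(range(10))
ys = [10 + 5 * x for x in xs]
n = len(xs)

intercept, slope = 0.0, 0.0
learning_rate = 0.02  # assumed value; plain gradient descent diverges here if it is much larger

for step in range(5000):
    # Prediction errors for the current parameters
    errors = [intercept + slope * x - y for x, y in zip(xs, ys)]

    # Gradients of the mean squared error with respect to each parameter
    grad_intercept = 2 * sum(errors) / n
    grad_slope = 2 * sum(e * x for e, x in zip(errors, xs)) / n

    # Step against the gradient
    intercept -= learning_rate * grad_intercept
    slope -= learning_rate * grad_slope

final_mse = sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / n
print(intercept, slope, final_mse)  # close to 10, 5, and 0
```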


### Multiple features

In [328]:
x1 = np.arange(0, 10)
x2 = np.arange(30, 40)
actual_params = [10, 5, 25]
y = actual_params[0] + actual_params[1] * x1 + actual_params[2] * x2

print("x1:", x1)
print("x2:", x2)
print("actual params:", actual_params)
print("y:", y)

# Define a linear regression model
def linear_regression(params, feature1, feature2):
    return params[0] + params[1] * feature1 + params[2] * feature2

# Set loss_function() to take the variables as arguments
def loss_function(params, feature1, feature2, targets):
    # Set the predicted values
    predictions = linear_regression(params, feature1, feature2)
    
    # Return the mean squared error loss
    return tf.keras.losses.mse(targets, predictions)

print("\nTraining...")

# Initialize an Adam optimizer
opt = tf.keras.optimizers.Adam(0.5)

# The optimizer will adjust all three parameters (intercept and two slopes)
params = tf.Variable([0, 0, 0], dtype=tf.float32)

for i in range(2001):
    # Apply minimize, pass the loss function, and supply the variables
    opt.minimize(lambda: loss_function(params, x1, x2, y), var_list=[params])

    if i % 250 == 0:
        current_loss = loss_function(params, x1, x2, y).numpy()
        print("MSE after", i, "iterations:", current_loss)

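# Because x2 = x1 + 30, the two features are perfectly correlated, so many different
# parameter combinations fit the data equally well; the optimizer finds one of them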
print("\nEstimated params:", params.numpy())
print("\nPredictions:", linear_regression(params, x1, x2).numpy())

x1: [0 1 2 3 4 5 6 7 8 9]
x2: [30 31 32 33 34 35 36 37 38 39]
actual params: [10, 5, 25]
y: [ 760  790  820  850  880  910  940  970 1000 1030]

Training...
MSE after 0 iterations: 772563.25
MSE after 250 iterations: 762.1769
MSE after 500 iterations: 130.75668
MSE after 750 iterations: 11.037022
MSE after 1000 iterations: 0.4575019
MSE after 1250 iterations: 0.0088784555
MSE after 1500 iterations: 7.480383e-05
MSE after 1750 iterations: 3.0845405e-07
MSE after 2000 iterations: 3.837049e-08

Estimated params: [27.377342   5.5793157 24.420744 ]

Predictions: [ 759.99963  789.99976  819.99976  849.9998   879.9999   909.99994
  940.       970.00006 1000.0001  1030.0002 ]
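
The estimated parameters differ from the actual ones because, in this data, `x2` is always `x1 + 30`. Substituting that in collapses the model to a single line in `x1`, and both parameter sets collapse to (nearly) the same line. A plain-Python sketch of that check, where `collapse` is a hypothetical helper:

```python
actual = [10, 5, 25]
estimated = [27.377342, 5.5793157, 24.420744]  # copied from the output above

def collapse(params):
    """Rewrite p0 + p1*x1 + p2*(x1 + 30) as (intercept, slope) of a line in x1."""
    return params[0] + 30 * params[2], params[1] + params[2]

actual_line = collapse(actual)
estimated_line = collapse(estimated)

print(actual_line)     # (760, 30)
print(estimated_line)  # approximately (760, 30)
```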


## Creating a neural network using low level functions

Imagine 3 inputs, 2 hidden layer neurons, and 1 output neuron.

Inspired by [Introduction to TensorFlow example exercise](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63344?ex=2).

First, we feed the inputs to the hidden layer:

In [329]:
inputs = np.array([[2, 10, 43]], np.float32)
print("inputs:\n", inputs)

# We figure out the total net input to each hidden layer neuron, squash the 
# total net input using an activation function (here we use the sigmoid function), 
# then repeat the process with the output layer neurons.

# Initialize bias for the hidden layer
hidden_layer_bias = tf.Variable(1.0)
print("\nhidden_layer_bias:", hidden_layer_bias.numpy())

# These are the weights from the inputs to the hidden layer neurons
# It's 3 rows by 2 columns because there are 3 inputs feeding into 2 neurons
hidden_layer_weights = tf.Variable(tf.ones((3, 2)))
print("\nhidden_layer_weights:\n", hidden_layer_weights.numpy())

# Perform matrix multiplication of the inputs and the hidden layer weights
hidden_layer_inputs_times_weights = tf.matmul(inputs, hidden_layer_weights)
# 1 * 2 + 1 * 10 + 1 * 43 = 55
print("\nhidden_layer_inputs_times_weights:\n", hidden_layer_inputs_times_weights.numpy())

# Apply the sigmoid activation function to the weighted inputs plus the bias
# Because the value (56) is large, sigmoid saturates and the output is effectively 1
hidden_layer_outputs = tf.keras.activations.sigmoid(hidden_layer_inputs_times_weights + hidden_layer_bias)
print("\ndense1:\n", hidden_layer_outputs.numpy())

inputs:
 [[ 2. 10. 43.]]

hidden_layer_bias: 1.0

hidden_layer_weights:
 [[1. 1.]
 [1. 1.]
 [1. 1.]]

hidden_layer_inputs_times_weights:
 [[55. 55.]]

dense1:
 [[1. 1.]]


Then we feed the outputs of the hidden layer to the output layer:

In [330]:
output_layer_bias = tf.Variable(1.0)
output_layer_weights = tf.Variable(tf.ones((2, 1)))
output_layer_inputs_times_weights = tf.matmul(hidden_layer_outputs, output_layer_weights)
prediction = tf.keras.activations.sigmoid(output_layer_inputs_times_weights + output_layer_bias)

# "Our model produces predicted values in the interval between 0 and 1. For the example we considered, 
# the actual value was 1 and the predicted value was a probability between 0 and 1. This, of course, 
# is not meaningful, since we have not yet trained our model's parameters."
print('prediction: {:.3f}'.format(prediction.numpy()[0,0]))
print('actual: 1')

prediction: 0.953
actual: 1
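
The whole forward pass is small enough to verify by hand. This plain-Python sketch repeats it with the same inputs, all-ones weights, and biases of 1.0; the `sigmoid` helper is written out here for illustration.

```python
import math

def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))

inputs = [2.0, 10.0, 43.0]
hidden_weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]  # 3 inputs x 2 neurons
hidden_bias = 1.0
output_weights = [1.0, 1.0]                            # 2 neurons -> 1 output
output_bias = 1.0

# Each hidden neuron: sigmoid(2 + 10 + 43 + 1) = sigmoid(56), effectively 1
hidden_outputs = [
    sigmoid(sum(x * row[j] for x, row in zip(inputs, hidden_weights)) + hidden_bias)
    for j in range(2)
]

# Output neuron: sigmoid(1 + 1 + 1) = sigmoid(3)
prediction = sigmoid(
    sum(h * w for h, w in zip(hidden_outputs, output_weights)) + output_bias
)
print(round(prediction, 3))  # 0.953
```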


## Using the high level Keras API

Here we feed the inputs into the same 2-neuron hidden layer, then the 1-neuron output layer.
We could easily add other hidden layers as well, feeding in the output of prior hidden layers.

In [331]:
inputs = np.array([[2, 10, 43]], np.float32)
# Hidden layers typically use the Rectified Linear Unit (ReLU) activation function, which is max(0, value);
# we use sigmoid here to match the low level example
hidden_layer = tf.keras.layers.Dense(2, activation="sigmoid")(inputs)
# If this were a classification problem with 2+ output classes, we would use softmax here
# If we did, outputs could be interpreted as class probabilities in multiclass classification problems
# (each row's values would sum to 1, aka 100%)
prediction = tf.keras.layers.Dense(1, activation="sigmoid")(hidden_layer)

# The output changes each time because we're using an untrained model with randomly initialized parameters
print('prediction: {:.3f}'.format(prediction.numpy()[0,0]))

prediction: 0.520
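
The comments above mention ReLU and softmax without showing them. A plain-Python sketch of both, written as hypothetical helpers rather than the Keras implementations:

```python
import math

def relu(value):
    """Rectified Linear Unit: max(0, value)."""
    return max(0.0, value)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

relu_outputs = [relu(v) for v in [-2.0, 0.0, 3.5]]
probs = softmax([1.0, 2.0, 3.0])

print(relu_outputs)  # [0.0, 0.0, 3.5]
print(probs)         # three increasing probabilities
print(sum(probs))    # 1.0, up to floating point rounding
```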


## More low level training

Also via a [DataCamp exercise](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63344?ex=14):

In [332]:
%%script echo

# Define the model
def model(w1, b1, w2, b2, features = borrower_features):
    # Apply relu activation functions to layer 1
    layer1 = keras.activations.relu(matmul(features, w1) + b1)
    # Apply dropout
    dropout = keras.layers.Dropout(0.25)(layer1)
    return keras.activations.sigmoid(matmul(dropout, w2) + b2)

# Define the loss function
def loss_function(w1, b1, w2, b2, features = borrower_features, targets = default):
    predictions = model(w1, b1, w2, b2, features)
    # Pass targets and predictions to the cross entropy loss
    return keras.losses.binary_crossentropy(targets, predictions)

# Train the model
for j in range(100):
    # Complete the optimizer
    opt.minimize(lambda: loss_function(w1, b1, w2, b2), 
                 var_list=[w1, b1, w2, b2])

# Make predictions with model
model_predictions = model(w1, b1, w2, b2, test_features)

# Construct the confusion matrix
confusion_matrix(test_targets, model_predictions)




## Dropout function

Dropout helps prevent overfitting by randomly dropping the outputs of a fraction of the nodes in a layer during training (in effect ignoring the weights connected to those nodes for that step). This forces the network to develop more robust rules for classification and tends to improve out-of-sample performance.

In [333]:
%%script echo

# ... other layers

# Randomly drop the outputs of 25% of the nodes in dense2 during training
dropout1 = tf.keras.layers.Dropout(0.25)(dense2)

# ... then pass that to the output layer
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(dropout1)
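
As a rough illustration of what a dropout layer does to its inputs during training, here is a plain-Python sketch. It zeroes each activation with probability `rate` and scales the survivors by `1 / (1 - rate)`, which is the behavior the Keras `Dropout` layer documents. The `dropout` helper and the seed are assumptions for this example.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest to keep the expected sum."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 1000
dropped = dropout(activations, 0.25, rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)
print(fraction_zeroed)                     # roughly 0.25
print(sorted(set(dropped)))                # 0.0 and the scaled value 1 / 0.75
```

At inference time dropout is turned off, so no activations are zeroed or scaled.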




## Optimizers

* Stochastic Gradient Descent (SGD)
  * Improved version of gradient descent that is less likely to get stuck in local minima. For simple problems, SGD algorithms work well.
  * `keras.optimizers.SGD(learning_rate=0.01)`
  * `learning_rate` - typically between 0.5 and 0.001. Higher values take larger steps on each update.
  * Simpler and easier to interpret than more modern optimizers
* Root Mean Squared Propagation (RMSprop) optimizer
  * Applies different learning rates to each feature, which is useful for high dimensional problems
  * `keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)`
  * `learning_rate`
  * `momentum`
  * `decay` - setting a low value prevents momentum from accumulating over long periods of the training process
* Adaptive Moment (Adam) optimizer
  * Generally a good first choice
  * Can require roughly 10x as many iterations as SGD to achieve a similar loss
  * `beta_1` (written `beta1` in the course) - lowering it makes momentum decay faster
  * Performs well with default parameter values
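
To get a feel for why the learning rate matters, the sketch below minimizes `y = x ** 2` from `x = -1` using plain (non-stochastic) gradient descent with three different learning rates. The function name, step count, and the rates themselves are illustrative assumptions; none of the Keras optimizers are used here.

```python
def gradient_descent(learning_rate, start=-1.0, steps=50):
    """Repeatedly step against the gradient of y = x ** 2."""
    x = start
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x ** 2
        x = x - learning_rate * gradient
    return x

final_x = {lr: gradient_descent(lr) for lr in (0.1, 0.5, 1.1)}

for lr, x in final_x.items():
    print(lr, x)
# 0.1 creeps toward the minimum at 0
# 0.5 lands on the minimum in a single step for this particular function
# 1.1 overshoots further on every step and diverges
```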


## Random

### Draw numbers from a normal distribution

In [334]:
tf.random.normal([3, 3]).numpy()

array([[-0.43805295,  1.052406  , -0.26469442],
       [ 1.0663421 ,  1.1168493 ,  0.3420491 ],
       [-1.3589494 , -0.19850916, -0.09751613]], dtype=float32)

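### Draw numbers from a truncated normal distribution

Like a normal distribution, but values more than 2 standard deviations from the mean are discarded and redrawn, so very large and very small values never appear.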

In [335]:
tf.random.truncated_normal([3, 3]).numpy()

array([[-0.47778726,  0.17757209,  1.0410289 ],
       [-0.02693135,  0.16252817, -0.59477305],
       [-1.1060272 , -0.6403249 ,  0.8212007 ]], dtype=float32)
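
The difference from a plain normal draw can be sketched in pure Python with rejection sampling: draw from a normal distribution and redraw anything more than 2 standard deviations from the mean. The `truncated_normal` helper below is hypothetical and only mimics the documented behavior of `tf.random.truncated_normal`.

```python
import random

def truncated_normal(n, mean=0.0, stddev=1.0, rng=None):
    """Draw n normal samples, redrawing any that fall more than 2 stddev from the mean."""
    rng = rng or random.Random()
    samples = []
    while len(samples) < n:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2 * stddev:
            samples.append(value)
    return samples

samples = truncated_normal(1000, rng=random.Random(42))
print(min(samples), max(samples))  # both within [-2, 2]
```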

## Sequential API

via [DataCamp](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=2):

In [336]:
%%script echo

# Define a Keras sequential model
model = keras.Sequential()

# Define the first dense layer
# The input shape represents the number of inputs which in this example
# is 784 because we're analyzing a 28x28 image that's reshaped
model.add(keras.layers.Dense(16, activation='relu', input_shape=(784,)))

# Define the second dense layer
model.add(keras.layers.Dense(8, activation='relu'))

# Define the output layer
model.add(keras.layers.Dense(4, activation='softmax'))

# Print the model architecture
print(model.summary())




Then we compile it:

In [337]:
%%script echo

# Start from a new sequential model
model = keras.Sequential()

# Define the first dense layer
model.add(keras.layers.Dense(16, activation="sigmoid", input_shape=(784,)))

# Apply dropout to the first layer's output
model.add(keras.layers.Dropout(0.25))

# Define the output layer
model.add(keras.layers.Dense(4, activation="softmax"))

# Compile the model
# We could also set the `metrics` argument to ['accuracy'] since
# reporting on the loss isn't that useful.
# Optimizer could also be `SGD` or `RMSprop`
model.compile('adam', loss='categorical_crossentropy')

# Alternative syntax for passing arguments for the optimizer
# model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Print a model summary
print(model.summary())




## Training the model

via [DataCamp](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=5):

Only two required arguments for `fit`:

1. `features`
2. `labels`

But many optional including:

3. `batch_size` - the number of examples in each batch (32 by default). For example, imagine your training data consists of 128 samples. With the default batch size of 32, there would be 4 batches.
4. `epochs` - the number of times you train on the full set of batches. This allows the model to revisit the same batches but with different model weights and possibly optimizer parameters since they are optimized after each batch.
5. `validation_split` - selecting a value of 0.2 will put 20% of the data in the validation set. You can see how well the model performs on both the data it was trained on (the training set) and data it was not trained on (the validation set)
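
The arithmetic behind these arguments can be sketched in plain Python. The numbers reuse the 128-sample example; the assumptions that the training count is rounded down and that a final partial batch still counts as a batch are mine, so treat the exact counts as approximate.

```python
import math

n_samples = 128
batch_size = 32
epochs = 10
validation_split = 0.2

# Without a validation split, every sample is used for training
batches_without_split = math.ceil(n_samples / batch_size)

# With a validation split, part of the data is held out first
n_train = math.floor(n_samples * (1 - validation_split))  # assumed rounding
n_validation = n_samples - n_train
batches_per_epoch = math.ceil(n_train / batch_size)

# The weights are updated once per batch, in every epoch
total_updates = batches_per_epoch * epochs

print(batches_without_split)            # 4
print(n_train, n_validation)            # 102 26
print(batches_per_epoch, total_updates) # 4 40
```

A call using these options would look like `model.fit(image_features, image_labels, batch_size=32, epochs=10, validation_split=0.2)`.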


In [338]:
%%script echo

model.fit(image_features, image_labels)




## Evaluating on the test set

Provides you with further assurance that you have not overfitted.

"Notice that the gap between the test and train set losses is high for large_model, suggesting that overfitting may be an issue. Furthermore, both test and train set performance is better for large_model. This suggests that we may want to use large_model, but reduce the number of training epochs" [#](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=9)

In [339]:
%%script echo

model.evaluate(test_features, test_labels)




## Functional API

"In some cases, the sequential API will not be sufficiently flexible to accommodate your desired model architecture and you will need to use the functional API instead. If, for instance, you want to train two models with different architectures jointly, you will need to use the functional API to do this." [#](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=4)

In [340]:
%%script echo

# For model 1, pass the input layer to layer 1 and layer 1 to layer 2
m1_layer1 = keras.layers.Dense(12, activation='sigmoid')(m1_inputs)
m1_layer2 = keras.layers.Dense(4, activation='softmax')(m1_layer1)

# For model 2, pass the input layer to layer 1 and layer 1 to layer 2
m2_layer1 = keras.layers.Dense(12, activation='relu')(m2_inputs)
m2_layer2 = keras.layers.Dense(4, activation='softmax')(m2_layer1)

# Merge model outputs and define a functional model
merged = keras.layers.add([m1_layer2, m2_layer2])
model = keras.Model(inputs=[m1_inputs, m2_inputs], outputs=merged)

# Print a model summary
print(model.summary())
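
The snippet above does not show how `m1_inputs` and `m2_inputs` are created. A minimal runnable sketch, assuming each model takes 10 input features (a made-up size for illustration) and that TensorFlow 2.x is installed:

```python
import tensorflow as tf

keras = tf.keras

# Hypothetical input layers: 10 features per model
m1_inputs = keras.Input(shape=(10,))
m2_inputs = keras.Input(shape=(10,))

# Model 1: sigmoid hidden layer
m1_layer1 = keras.layers.Dense(12, activation="sigmoid")(m1_inputs)
m1_layer2 = keras.layers.Dense(4, activation="softmax")(m1_layer1)

# Model 2: relu hidden layer
m2_layer1 = keras.layers.Dense(12, activation="relu")(m2_inputs)
m2_layer2 = keras.layers.Dense(4, activation="softmax")(m2_layer1)

# Merge the two outputs by element-wise addition and build one model
merged = keras.layers.add([m1_layer2, m2_layer2])
model = keras.Model(inputs=[m1_inputs, m2_inputs], outputs=merged)

n_inputs = len(model.inputs)
output_shape = tuple(model.output_shape)
n_params = model.count_params()
print(n_inputs, output_shape, n_params)  # 2 (None, 4) 368
```

Each branch has 10 * 12 + 12 = 132 parameters in its hidden layer and 12 * 4 + 4 = 52 in its output layer, giving 368 in total.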




## Estimators

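via [DataCamp](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=11). Note that `tf.estimator` and `tf.feature_column` are deprecated in recent TensorFlow releases, so these snippets target older TensorFlow 2.x versions: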

In [341]:
%%script echo

# Define feature columns for bedrooms and bathrooms
bedrooms = feature_column.numeric_column("bedrooms")
bathrooms = feature_column.numeric_column("bathrooms")

# Define the list of feature columns
feature_list = [bedrooms, bathrooms]

def input_fn():
    # Define the labels
    labels = np.array(housing["price"])
    # Define the features
    features = {'bedrooms':np.array(housing['bedrooms']), 
                'bathrooms':np.array(housing["bathrooms"])}
    return features, labels




In [342]:
%%script echo

# Define the model and set the number of steps
model = estimator.DNNRegressor(feature_columns=feature_list, hidden_units=[2,2])
model.train(input_fn, steps=1)




"Note that you have other premade estimator options, such as BoostedTreesRegressor(), and can also create your own custom estimators."[#](https://campus.datacamp.com/courses/introduction-to-tensorflow-in-python/63345?ex=12)

In [343]:
%%script echo

# Define the model and set the number of steps
model = estimator.LinearRegressor(feature_columns=feature_list)
model.train(input_fn, steps=2)


