# Chapter Three. Neural Networks in TensorFlow

In this chapter, you'll learn how to predict credit card default using neural networks defined and trained in TensorFlow. You will define dense layers, apply activation functions, select an optimizer, and apply regularization to reduce overfitting. You will take advantage of TensorFlow's flexibility by using both low-level linear algebra and high-level Keras API operations to define and train models.

> **Topics:**
- 1. Dense layers
    - 1.1 The linear algebra of dense layers
    - 1.2 The low-level approach with multiple examples
    - 1.3 Using the dense layer operation
- 2. Activation functions
    - 2.1 Binary classification problems
    - 2.2 Multiclass classification problems
- 3. Optimizers
    - 3.1 The dangers of local minima
    - 3.2 Avoiding local minima
- 4. Training a network in TensorFlow
    - 4.1 Initialization in TensorFlow
    - 4.2 Training neural networks with TensorFlow

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import keras, Variable, ones, matmul

filepath = '../_datasets/'

## 1. Dense layers

### The linear regression model

![][01-NN_LR]

### What is a neural network?

![][02-NN]

![][03-NN]

### A trivial dense layer
```Python
import tensorflow as tf

# Define input data
inputs = tf.constant([[1.0, 35.0]])  # floats, so the dtype matches the float32 weights in matmul

# Define weights
weights = tf.Variable([[-0.05], [-0.01]])

# Multiply inputs by the weights
product = tf.matmul(inputs, weights)

# Define dense layer
dense = tf.keras.activations.sigmoid(product)

```
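The arithmetic behind this trivial layer is small enough to check by hand: the product is 1 * (-0.05) + 35 * (-0.01) = -0.40, and the sigmoid maps -0.40 to roughly 0.401. The sketch below reuses the input and weights above and, as an extra check, compares TensorFlow's result with the sigmoid formula evaluated in NumPy.

```python
import numpy as np
import tensorflow as tf

# Same input and weights as the trivial dense layer above
inputs = tf.constant([[1.0, 35.0]])
weights = tf.Variable([[-0.05], [-0.01]])

# Linear step: 1 * (-0.05) + 35 * (-0.01) = -0.40
product = tf.matmul(inputs, weights)

# Nonlinear step: sigmoid(-0.40) = 1 / (1 + exp(0.40))
dense = tf.keras.activations.sigmoid(product)

# Cross-check against the sigmoid formula computed directly in NumPy
expected = 1.0 / (1.0 + np.exp(0.40))
print(product.numpy()[0, 0], dense.numpy()[0, 0], expected)
```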

### Defining a complete model
```Python
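import numpy as np
import tensorflow as tf

# Hypothetical input data so the example runs: 5 examples with 3 features each
data = np.random.rand(5, 3)

# Define input layer
inputs = tf.constant(data, tf.float32)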

# Define first dense layer
dense1 = tf.keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = tf.keras.layers.Dense(5, activation='sigmoid')(dense1)

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dense2)
```

### High-level versus low-level approach
- High-level approach
    - High-level API operations
```Python    
dense = keras.layers.Dense(10, activation='sigmoid')
```

- Low-level approach
    - Linear-algebraic operations
```Python    
prod = matmul(inputs, weights)
dense = keras.activations.sigmoid(prod)
```
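To see that the two approaches compute the same thing, here is a minimal sketch with a hypothetical random 4x3 input. A Keras `Dense` layer stores its weights as `layer.kernel` and its bias as `layer.bias`, so applying `matmul` and the sigmoid to those variables by hand should reproduce the layer's output. Note that the high-level layer also adds a bias term by default, which the low-level snippet above leaves out.

```python
import numpy as np
import tensorflow as tf

# Hypothetical batch: 4 examples with 3 features each
rng = np.random.default_rng(0)
inputs = tf.constant(rng.random((4, 3)), tf.float32)

# High-level: Keras creates and stores the weights (kernel) and bias
layer = tf.keras.layers.Dense(10, activation='sigmoid')
high_level = layer(inputs)

# Low-level: reproduce the same computation from the layer's own variables
product = tf.matmul(inputs, layer.kernel) + layer.bias
low_level = tf.keras.activations.sigmoid(product)

print(np.max(np.abs(high_level.numpy() - low_level.numpy())))
```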


[01-NN_LR]:_Docs/01-NN_LR.png
[02-NN]:_Docs/02-NN.png
[03-NN]:_Docs/03-NN.png

### 1.1 The linear algebra of dense layers
There are two ways to define a dense layer in `tensorflow`. The first involves the use of low-level, linear algebraic operations. The second makes use of high-level `keras` operations. In this exercise, we will use the first method to construct the network shown in the image below.

![][04-3_2_1_network]

The input layer contains 3 features (education, marital status, and age), which are available as `borrower_features`. The hidden layer contains 2 nodes and the output layer contains a single node.

For each layer, you will take the previous layer as an input, initialize a set of weights, compute the product of the inputs and weights, and then apply an activation function. Note that `Variable()`, `ones()`, `matmul()`, and the `keras` module have been imported from `tensorflow`.

[04-3_2_1_network]:_Docs/04-3_2_1_network.png

In [2]:
borrower_features = np.array([[ 2.,  1., 24.]], dtype=np.float32)

In [3]:
# Initialize weights1 as 3x2 variable of ones
weights1 = Variable(ones((3, 2)))

# Perform matrix multiplication of borrower_features and weights1
product1 = matmul(borrower_features, weights1)

# Apply sigmoid activation function to product1
dense1 = keras.activations.sigmoid(product1)

# Print shape of dense1
print("\n dense1's output shape: {}".format(dense1.shape))

# Initialize weights2 as 2x1 variable of ones
weights2 = Variable(ones((2, 1)))

# Perform matrix multiplication of dense1 and weights2
product2 = matmul(dense1,weights2)

# Apply activation to product2 and print the prediction
prediction = keras.activations.sigmoid(product2)
print('\n prediction: {}'.format(prediction.numpy()[0,0]))
print('\n actual: 1')


 dense1's output shape: (1, 2)

 prediction: 0.8807970285415649

 actual: 1
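The printed prediction can be checked by hand. Because every weight is 1, each hidden node simply sums the three features, and the sigmoid of that sum, 27, differs from 1 by only about 1.9e-12, which rounds to exactly 1.0 in float32:

$$
\text{product1} = \begin{bmatrix} 2 & 1 & 24 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 27 & 27 \end{bmatrix},
\qquad
\text{dense1} = \sigma(\text{product1}) \approx \begin{bmatrix} 1 & 1 \end{bmatrix}
$$

$$
\text{product2} = \text{dense1} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \approx 2,
\qquad
\text{prediction} = \sigma(2) = \frac{1}{1 + e^{-2}} \approx 0.8808
$$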


### 1.2 The low-level approach with multiple examples
In this exercise, we'll build further intuition for the low-level approach by constructing the first dense hidden layer for the case where we have multiple examples. We'll assume the model is trained and the first-layer weights, `weights1`, are available. We'll then perform matrix multiplication of the `borrower_features` tensor by the `weights1` tensor. Recall that the `borrower_features` tensor includes education, marital status, and age. Finally, we'll apply the sigmoid function to the elements of `products1`, yielding `dense1`.

$$
\text{products1} = \begin{bmatrix}
    3 & 3 & 23 \\
    2 & 1 & 24 \\
    1 & 1 & 49 \\
    1 & 1 & 49 \\
    2 & 1 & 29
\end{bmatrix}
\begin{bmatrix}
    -0.6 & 0.6 \\
    0.8 & -0.3 \\
    -0.09 & -0.08
\end{bmatrix}
$$

In [4]:
borrower_features = np.array([[ 3.,  3., 23.],
                              [ 2.,  1., 24.],
                              [ 1.,  1., 49.],
                              [ 1.,  1., 49.],
                              [ 2.,  1., 29.]], dtype=np.float32)
weights1 = np.array([[-0.6 ,  0.6 ],
                     [ 0.8 , -0.3 ],
                     [-0.09, -0.08]], dtype=np.float32)

borrower_features = tf.constant(borrower_features)
weights1 = tf.constant(weights1)

In [5]:
# Compute the product of borrower_features and weights1
products1 = matmul(borrower_features, weights1)

# Apply a sigmoid activation function to products1
dense1 = keras.activations.sigmoid(products1)

# Print the shapes of borrower_features, weights1, and dense1
print('\n shape of borrower_features: ', borrower_features.shape)
print('\n shape of weights1: ', weights1.shape)
print('\n shape of dense1: ', dense1.shape)


 shape of borrower_features:  (5, 3)

 shape of weights1:  (3, 2)

 shape of dense1:  (5, 2)


Note that our input data, `borrower_features`, is 5x3 because it consists of 5 examples for 3 features. The shape of `weights1` is 3x2, as it was in the previous exercise, since it does not depend on the number of examples. Finally, `dense1` is 5x2, which means that we can multiply it by the following set of weights, `weights2`, which we defined to be 2x1 in the previous exercise.
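Carrying the computation one layer further shows why the shapes line up. A minimal sketch, reusing the 2x1 variable of ones from exercise 1.1 as an illustrative `weights2`: each of the 5 examples ends up with a single prediction, and the two identical borrowers (rows 3 and 4) receive identical predictions.

```python
import numpy as np
import tensorflow as tf

# Inputs and first-layer weights from the cells above
borrower_features = tf.constant([[3., 3., 23.],
                                 [2., 1., 24.],
                                 [1., 1., 49.],
                                 [1., 1., 49.],
                                 [2., 1., 29.]])
weights1 = tf.constant([[-0.6, 0.6],
                        [0.8, -0.3],
                        [-0.09, -0.08]])

# Illustrative second-layer weights: a 2x1 variable of ones, as in exercise 1.1
weights2 = tf.Variable(tf.ones((2, 1)))

# (5, 3) x (3, 2) -> (5, 2)
dense1 = tf.keras.activations.sigmoid(tf.matmul(borrower_features, weights1))

# (5, 2) x (2, 1) -> (5, 1): one prediction per example
predictions = tf.keras.activations.sigmoid(tf.matmul(dense1, weights2))
print(np.round(predictions.numpy(), 3))
```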

### 1.3 Using the dense layer operation
We've now seen how to define dense layers in `tensorflow` using linear algebra. In this exercise, we'll skip the linear algebra and let `keras` work out the details. This will allow us to construct the network below, which has 2 hidden layers and 10 features, using less code than we needed for the network with 1 hidden layer and 3 features.

![][05-10_7_3_1_network]

To construct this network, we'll need to define three dense layers, each of which takes the previous layer as an input, multiplies it by weights, and applies an activation function. Note that input data has been defined and is available as a 100x10 tensor: `borrower_features`. Additionally, the `keras.layers` module is available.

[05-10_7_3_1_network]:_Docs/05-10_7_3_1_network.png

In [6]:
borrower_features = np.array(pd.read_csv(filepath+'borrowed_features.csv', header=None), dtype=np.float32)
borrower_features = tf.constant(borrower_features)

In [7]:
# Define the first dense layer
dense1 = keras.layers.Dense(7, activation='sigmoid')(borrower_features)

# Define a dense layer with 3 output nodes
dense2 = keras.layers.Dense(3, activation='sigmoid')(dense1)

# Define a dense layer with 1 output node
predictions = keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print the shapes of dense1, dense2, and predictions
print('\n shape of dense1: ', dense1.shape)
print('\n shape of dense2: ', dense2.shape)
print('\n shape of predictions: ', predictions.shape)


 shape of dense1:  (100, 7)

 shape of dense2:  (100, 3)

 shape of predictions:  (100, 1)


With just three lines of code, one per layer, you were able to define 2 dense hidden layers and an output layer. This is the advantage of using high-level operations in `tensorflow`. Note that each layer has 100 rows because the input data contains 100 examples.
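One way to see what those three lines created is to keep references to the layer objects and count their parameters. This sketch uses a hypothetical random 100x10 input in place of `borrower_features`. Each dense layer holds an (inputs x outputs) kernel plus one bias per output node, so the network has 10*7 + 7 = 77, 7*3 + 3 = 24, and 3*1 + 1 = 4 parameters, 105 in total.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the 100x10 borrower_features tensor
rng = np.random.default_rng(0)
borrower_features = tf.constant(rng.random((100, 10)), tf.float32)

# Keep references to the layer objects so their variables can be inspected
layer1 = tf.keras.layers.Dense(7, activation='sigmoid')
layer2 = tf.keras.layers.Dense(3, activation='sigmoid')
layer3 = tf.keras.layers.Dense(1, activation='sigmoid')
predictions = layer3(layer2(layer1(borrower_features)))

# Each layer holds an (inputs x outputs) kernel plus one bias per output node
for layer in (layer1, layer2, layer3):
    print(tuple(layer.kernel.shape), tuple(layer.bias.shape), layer.count_params())

total = sum(layer.count_params() for layer in (layer1, layer2, layer3))
print('total parameters:', total)
```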

## 2. Activation functions

### What is an activation function?
- **Components of a typical hidden layer**
    - **Linear:** Matrix multiplication
    - **Nonlinear:** Activation function
    
### The sigmoid activation function
- **Sigmoid activation function**
    - Binary classification
    - Low-level: `tf.keras.activations.sigmoid()`
    - High-level: `sigmoid`
    
### The ReLU activation function
- **ReLU activation function**
    - Hidden layers
    - Low-level: `tf.keras.activations.relu()`
    - High-level: `relu`
    
### The softmax activation function
- **Softmax activation function**
    - Output layer (>2 classes)
    - Low-level: `tf.keras.activations.softmax()`
    - High-level: `softmax`
    
![][06-relu_sigmoid]
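As a quick illustration, the three activation functions can be applied to a hypothetical row of pre-activation values, shaped as a 1x3 matrix to mirror one example in a batch. ReLU zeroes the negative entry, the sigmoid maps each value independently into (0, 1) with sigmoid(0) = 0.5, and the softmax rescales the whole row into probabilities that sum to 1.

```python
import numpy as np
import tensorflow as tf

# A single row of hypothetical pre-activation values (the output of a matmul)
x = tf.constant([[-2.0, 0.0, 3.0]])

sigmoid_out = tf.keras.activations.sigmoid(x).numpy()  # each value squashed into (0, 1)
relu_out = tf.keras.activations.relu(x).numpy()        # negative values replaced by 0
softmax_out = tf.keras.activations.softmax(x).numpy()  # row turned into probabilities

print(sigmoid_out)
print(relu_out)
print(softmax_out, softmax_out.sum())
```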

### Activation functions in neural networks
```Python
import tensorflow as tf
# Define input layer
inputs = tf.constant(borrower_features, tf.float32)
# Define dense layer 1
dense1 = tf.keras.layers.Dense(16, activation='relu')(inputs)
# Define dense layer 2
dense2 = tf.keras.layers.Dense(8, activation='sigmoid')(dense1)
# Define output layer
outputs = tf.keras.layers.Dense(4, activation='softmax')(dense2)

```
[06-relu_sigmoid]:_Docs/06-relu_sigmoid.png

### 2.1 Binary classification problems
In this exercise, you will again make use of credit card data. The target variable, `default`, indicates whether a credit card holder defaults on her payment in the following period. Since there are only two options, default or not, this is a binary classification problem. While the dataset has many features, you will focus on just three: the sizes of the three most recent credit card bills. Finally, you will compute predictions from your untrained network, `outputs`, and compare them to the target variable, `default`.

The tensor of features has been loaded and is available as `bill_amounts`. Additionally, the `constant()`, `float32`, and `keras.layers.Dense()` operations are available.

In [8]:
df_uci_credit_card = pd.read_csv(filepath+'uci_credit_card.csv')
df_uci_credit_card.head()

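| | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default.payment.next.month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 20000.0 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
| 1 | 2 | 120000.0 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | ... | 3272.0 | 3455.0 | 3261.0 | 0.0 | 1000.0 | 1000.0 | 1000.0 | 0.0 | 2000.0 | 1 |
| 2 | 3 | 90000.0 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | ... | 14331.0 | 14948.0 | 15549.0 | 1518.0 | 1500.0 | 1000.0 | 1000.0 | 1000.0 | 5000.0 | 0 |
| 3 | 4 | 50000.0 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | ... | 28314.0 | 28959.0 | 29547.0 | 2000.0 | 2019.0 | 1200.0 | 1100.0 | 1069.0 | 1000.0 | 0 |
| 4 | 5 | 50000.0 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | ... | 20940.0 | 19146.0 | 19131.0 | 2000.0 | 36681.0 | 10000.0 | 9000.0 | 689.0 | 679.0 | 0 |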


In [9]:
default = df_uci_credit_card['default.payment.next.month'].values.reshape(-1,1)
bill_amounts = df_uci_credit_card[['BILL_AMT1','BILL_AMT2','BILL_AMT3']].values

In [10]:
# Construct input layer from features
inputs = tf.constant(bill_amounts, tf.float32)

# Define first dense layer
dense1 = keras.layers.Dense(3, activation='relu')(inputs)

# Define second dense layer
dense2 = keras.layers.Dense(2, activation='relu')(dense1)

# Define output layer
outputs = keras.layers.Dense(1, activation='sigmoid')(dense2)

# Print error for first five examples
error = default[:5] - outputs.numpy()[:5]
print(error)

[[1.]
 [1.]
 [0.]
 [0.]
 [0.]]


If you run the code several times, you'll notice that the errors change each time. This is because you're using an untrained model with randomly initialized parameters. Furthermore, the errors fall on the interval between -1 and 1 because `default` is a binary variable that takes on values of 0 and 1 and `outputs` is a probability between 0 and 1.

### 2.2 Multiclass classification problems
In this exercise, we expand beyond binary classification to cover multiclass problems. A multiclass problem has targets that can take on three or more values. In the credit card dataset, the education variable can take on 6 different values, each corresponding to a different level of education. We will use that as our target in this exercise and will also expand the feature set from 3 to 10 columns.

As in the previous problem, you will define an input layer, dense layers, and an output layer. You will also print the untrained model's predictions, which are probabilities assigned to the classes.

In [11]:
borrower_features = df_uci_credit_card[['BILL_AMT1','BILL_AMT2','BILL_AMT3',"BILL_AMT4","BILL_AMT5","PAY_AMT1","PAY_AMT2","PAY_AMT3","PAY_AMT4","PAY_AMT5"]].values

In [12]:
# Construct input layer from borrower features
inputs = tf.constant(borrower_features, tf.float32)

# Define first dense layer
dense1 = keras.layers.Dense(10, activation='sigmoid')(inputs)

# Define second dense layer
dense2 = keras.layers.Dense(8, activation='relu')(dense1)

# Define output layer
outputs = keras.layers.Dense(6, activation='softmax')(dense2)

# Print first five predictions
print(outputs.numpy()[:5])

[[0.03770198 0.13165766 0.22784594 0.18070209 0.16888447 0.25320792]
 [0.03395255 0.1424841  0.27360177 0.24699776 0.18191315 0.12105078]
 [0.0301488  0.13805786 0.27993178 0.245279   0.18558043 0.12100209]
 [0.03586568 0.12992041 0.26556113 0.20972696 0.14697164 0.21195407]
 [0.09635121 0.1684512  0.1920806  0.2175894  0.14326315 0.18226445]]
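Each row of the softmax output is a probability distribution over the 6 classes, so it should sum to 1, and `np.argmax` turns it into a predicted class index. The sketch below applies that to the five rows printed above, copied verbatim. Because the network is untrained and randomly initialized, these particular predictions carry no information and will change from run to run.

```python
import numpy as np

# The five rows of softmax probabilities printed above
probabilities = np.array([
    [0.03770198, 0.13165766, 0.22784594, 0.18070209, 0.16888447, 0.25320792],
    [0.03395255, 0.1424841, 0.27360177, 0.24699776, 0.18191315, 0.12105078],
    [0.0301488, 0.13805786, 0.27993178, 0.245279, 0.18558043, 0.12100209],
    [0.03586568, 0.12992041, 0.26556113, 0.20972696, 0.14697164, 0.21195407],
    [0.09635121, 0.1684512, 0.1920806, 0.2175894, 0.14326315, 0.18226445],
])

# Softmax outputs are probabilities over the 6 classes, so each row sums to 1
row_sums = probabilities.sum(axis=1)

# The predicted class is the index (0 to 5) of the largest probability in each row
predicted_class = np.argmax(probabilities, axis=1)
print(np.round(row_sums, 6), predicted_class)
```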


## 3. Optimizers

### How to find a minimum

![][07-local_minima_dots_4_10]

![][08-optimizers]

### The gradient descent optimizer
- **Stochastic gradient descent (SGD) optimizer**
    - `tf.keras.optimizers.SGD()`
    - `learning_rate`

### The RMSprop optimizer
- **Root mean squared (RMS) propagation optimizer**
    - Applies a different effective learning rate to each parameter
    - `tf.keras.optimizers.RMSprop()`
    - `learning_rate`
    - `rho` (decay rate of the moving average of squared gradients)
    - `momentum`

### The Adam optimizer
- **Adaptive moment estimation (Adam) optimizer**
    - `tf.keras.optimizers.Adam()`
    - `learning_rate`
    - `beta_1`
    - `beta_2`

### A complete example
```Python
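import tensorflow as tf

# Hypothetical data so the example runs: 4 borrowers, 3 features, binary targets
borrower_features = tf.random.uniform([4, 3])
default = tf.constant([[1.0], [0.0], [0.0], [1.0]])
weights = tf.Variable(tf.random.normal([3, 1]))

# Compute the predicted values and loss
def loss_function(weights):
    product = tf.matmul(borrower_features, weights)
    predictions = tf.keras.activations.sigmoid(product)
    return tf.keras.losses.binary_crossentropy(default, predictions)

# Minimize the loss function with Adam
opt = tf.keras.optimizers.Adam(learning_rate=0.1, beta_1=0.9, beta_2=0.8)
opt.minimize(lambda: loss_function(weights), var_list=[weights])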
```
[07-local_minima_dots_4_10]:_Docs/07-local_minima_dots_4_10.png
[08-optimizers]:_Docs/08-optimizers.png
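The `minimize()` call hides the gradient computation. As a minimal sketch, the same kind of Adam update can be written out with `tf.GradientTape` and `apply_gradients()`. The four-borrower dataset and the zero initial weights are hypothetical, and `tf.reduce_mean` turns the per-example cross-entropy into a scalar loss. With all weights at zero every prediction is 0.5, so the starting loss is ln 2, about 0.693.

```python
import tensorflow as tf

# Hypothetical, linearly separable data: 4 borrowers, 2 features, binary targets
borrower_features = tf.constant([[2.0, 0.0], [0.0, 2.0], [1.5, 0.2], [0.1, 1.8]])
default = tf.constant([[1.0], [0.0], [1.0], [0.0]])
weights = tf.Variable(tf.zeros([2, 1]))

def loss_function(weights):
    predictions = tf.keras.activations.sigmoid(tf.matmul(borrower_features, weights))
    # Average the per-example binary cross-entropy into a single scalar
    return tf.reduce_mean(tf.keras.losses.binary_crossentropy(default, predictions))

opt = tf.keras.optimizers.Adam(learning_rate=0.1, beta_1=0.9, beta_2=0.999)

initial_loss = float(loss_function(weights))
for step in range(100):
    # Record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:
        loss = loss_function(weights)
    gradients = tape.gradient(loss, [weights])
    # One Adam update of the weights
    opt.apply_gradients(zip(gradients, [weights]))
final_loss = float(loss_function(weights))
print(initial_loss, final_loss)
```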

### 3.1 The dangers of local minima
Consider the plot of the following loss function, `loss_function()`, which contains a global minimum, marked by the dot on the right, and several local minima, including the one marked by the dot on the left.

![][07-local_minima_dots_4_10]

In this exercise, you will try to find the global minimum of `loss_function()` using `keras.optimizers.SGD()`. You will do this twice, each time with a different initial value of the input to `loss_function()`. First, you will use `x_1`, which is a variable with an initial value of 6.0. Second, you will use `x_2`, which is a variable with an initial value of 0.3. Note that `loss_function()` has been defined and is available.

```Python
from tensorflow import Variable, float32, keras

# loss_function() is provided by the exercise environment

# Initialize x_1 and x_2
x_1 = Variable(6.0, dtype=float32)
x_2 = Variable(0.3, dtype=float32)

# Define the optimization operation
opt = keras.optimizers.SGD(learning_rate=0.01)

for j in range(100):
    # Perform minimization using the loss function and x_1
    opt.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Perform minimization using the loss function and x_2
    opt.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
# Output: 4.3801394 0.42052683
```

Notice that we used the same optimizer and loss function, but two different initial values. When we started at 6.0 with `x_1`, we found the global minimum at 4.38, marked by the dot on the right. When we started at 0.3, we stopped around 0.42 with `x_2`, the local minimum marked by a dot on the far left.
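The same effect shows up on any loss with more than one basin. As a minimal sketch, the function below is a hypothetical stand-in (not the exercise's `loss_function()`) with a local minimum near x = 1.13 and a global minimum near x = -1.30. Plain gradient descent with a fixed learning rate simply follows the slope, so it ends up in whichever basin it starts in.

```python
import tensorflow as tf

def toy_loss(x):
    # Hypothetical loss: local minimum near x = 1.13, global minimum near x = -1.30
    return x**4 - 3.0 * x**2 + x

def gradient_descent(x0, learning_rate=0.01, steps=500):
    # Plain gradient descent: repeatedly step against the gradient
    x = tf.Variable(x0, dtype=tf.float32)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = toy_loss(x)
        grad = tape.gradient(loss, x)
        x.assign_sub(learning_rate * grad)
    return float(x.numpy())

# Same loss, same step size, two different starting points
x_right = gradient_descent(2.0)
x_left = gradient_descent(-2.0)
print(x_right, x_left)
```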


### 3.2 Avoiding local minima
The previous problem showed how easy it is to get stuck in local minima. We had a simple optimization problem in one variable and gradient descent still failed to deliver the global minimum when we had to travel through local minima first. One way to avoid this problem is to use momentum, which allows the optimizer to break through local minima. We will again use the loss function from the previous problem, which has been defined and is available for you as `loss_function()`.

![][07-local_minima_dots_4_10]

Several optimizers in `tensorflow` have a momentum parameter, including `SGD` and `RMSprop`. You will make use of RMSprop in this exercise.

```Python
from tensorflow import Variable, float32, keras

# loss_function() is provided by the exercise environment

# Initialize x_1 and x_2
x_1 = Variable(0.05, dtype=float32)
x_2 = Variable(0.05, dtype=float32)

# Define the optimization operations: opt_1 with momentum, opt_2 without
opt_1 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)
opt_2 = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.00)

for j in range(100):
    # Minimize the loss with respect to x_1 using opt_1
    opt_1.minimize(lambda: loss_function(x_1), var_list=[x_1])
    # Minimize the loss with respect to x_2 using opt_2
    opt_2.minimize(lambda: loss_function(x_2), var_list=[x_2])

# Print x_1 and x_2 as numpy arrays
print(x_1.numpy(), x_2.numpy())
# Output: 4.3150263 0.4205261
```

Recall that the global minimum is approximately 4.38. Notice that `opt_1` built momentum, bringing `x_1` closer to the global minimum. By contrast, `opt_2`, which had a momentum parameter of 0.0, got stuck in the local minimum on the left.
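The effect of momentum can be reproduced on a hypothetical one-variable loss (not the exercise's `loss_function()`) with a local minimum near x = 1.13 and a global minimum near x = -1.30. The sketch implements the classical heavy-ball update, velocity = momentum * velocity - learning_rate * gradient followed by x = x + velocity, which is the form Keras documents for `SGD` with a `momentum` argument. RMSprop applies momentum to a rescaled gradient, so this illustrates the idea rather than the exact optimizer used above. Starting both runs at x = 2.0, plain gradient descent settles in the local minimum, while momentum 0.9 carries x over the barrier near x = 0.17 into the global minimum.

```python
import tensorflow as tf

def toy_loss(x):
    # Hypothetical loss: local minimum near x = 1.13, global minimum near x = -1.30
    return x**4 - 3.0 * x**2 + x

def descend(x0, momentum, learning_rate=0.01, steps=500):
    x = tf.Variable(x0, dtype=tf.float32)
    velocity = 0.0
    for _ in range(steps):
        with tf.GradientTape() as tape:
            loss = toy_loss(x)
        grad = float(tape.gradient(loss, x))
        # Heavy-ball update: the velocity accumulates a decaying sum of past gradients
        velocity = momentum * velocity - learning_rate * grad
        x.assign_add(velocity)
    return float(x.numpy())

without_momentum = descend(2.0, momentum=0.0)  # stops in the local minimum
with_momentum = descend(2.0, momentum=0.9)     # rolls through to the global minimum
print(without_momentum, with_momentum)
```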


[07-local_minima_dots_4_10]:_Docs/07-local_minima_dots_4_10.png

## 4. Training a network in TensorFlow

![][11-eggholder_function]

Find the global minimum!

- How can we select initial values for x and y, the two inputs of the eggholder function?
- What if we have a loss function that depends on hundreds of variables?

### Random initializers
- **Often need to initialize thousands of variables**
    - `tf.ones()` may perform poorly
    - Tedious and difficult to initialize variables individually
- **Alternatively, draw initial values from distribution**
    - Random normal
    - Uniform
    - Glorot initializer
    
### Initializing variables in TensorFlow
```Python
import tensorflow as tf

# Define 500x500 random normal variable
weights = tf.Variable(tf.random.normal([500, 500]))

# Define 500x500 truncated random normal variable
weights = tf.Variable(tf.random.truncated_normal([500, 500]))

# Define a dense layer with the default initializer
dense = tf.keras.layers.Dense(32, activation='relu')

# Define a dense layer with the zeros initializer
dense = tf.keras.layers.Dense(32, activation='relu', kernel_initializer='zeros')
```
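The Glorot (Xavier) uniform initializer, which is the default `kernel_initializer` for a Keras `Dense` layer, draws from a uniform distribution on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)). Similarly, `tf.random.truncated_normal` re-draws any value more than two standard deviations from the mean, which also makes its spread somewhat narrower than the nominal standard deviation of 1. The sketch below checks both bounds for a 500x500 weight matrix; the seed is an arbitrary choice.

```python
import math
import numpy as np
import tensorflow as tf

fan_in, fan_out = 500, 500

# Glorot uniform draws from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
limit = math.sqrt(6.0 / (fan_in + fan_out))
glorot = tf.keras.initializers.GlorotUniform(seed=0)
w_glorot = np.asarray(glorot(shape=(fan_in, fan_out)))

# Truncated normal re-draws values more than 2 standard deviations from the mean
w_trunc = tf.random.truncated_normal([fan_in, fan_out]).numpy()

print(round(limit, 4), np.abs(w_glorot).max(), np.abs(w_trunc).max(), w_trunc.std())
```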

### Neural networks and overfitting

![][09-overfitting]

- Overfitting is especially problematic for neural networks, which contain many parameters and are quite good at memorization.

### Applying dropout

- A simple solution to the overfitting problem is to use dropout, an operation that randomly drops nodes in a layer during the training process, as shown in the network on the right.
- This will force your network to develop more robust rules for classification, since it cannot rely on any particular nodes being passed to an activation function.

![][10-dropout]

### Implementing dropout in a network
```Python
import numpy as np
import tensorflow as tf

# Define input data
inputs = np.array(borrower_features, np.float32)

# Define dense layer 1
dense1 = tf.keras.layers.Dense(32, activation='relu')(inputs)

# Define dense layer 2
dense2 = tf.keras.layers.Dense(16, activation='relu')(dense1)

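# Apply dropout operation (training=True is needed for units to be dropped when the layer is called outside model.fit())
dropout1 = tf.keras.layers.Dropout(0.25)(dense2, training=True)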

# Define output layer
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(dropout1)

```
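A `Dropout` layer only drops units in training mode. When the layer is called directly on a tensor rather than inside `model.fit()`, it runs in inference mode by default and returns its input unchanged, so training mode has to be requested explicitly with `training=True`. The sketch below compares the two modes on a hypothetical tensor of ones. In training mode Keras also scales the surviving values by 1 / (1 - rate), so the expected value of each output matches its input.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)

# Hypothetical layer output: 10,000 ones, so dropped units are easy to spot
x = tf.ones((100, 100))
dropout = tf.keras.layers.Dropout(0.25)

# Inference mode: the layer passes its input through unchanged
inference_out = dropout(x, training=False).numpy()

# Training mode: about 25% of units are zeroed and the rest are scaled by 1 / (1 - 0.25)
training_out = dropout(x, training=True).numpy()

zero_fraction = np.mean(training_out == 0.0)
kept = training_out[training_out != 0.0]
print(zero_fraction, np.unique(kept))
```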

[09-overfitting]:_Docs/09-overfitting.png
[10-dropout]:_Docs/10-dropout.png
[11-eggholder_function]:_Docs/11-eggholder_function.png

### 4.1 Initialization in TensorFlow
A good initialization can reduce the amount of time needed to find the global minimum. In this exercise, we will initialize weights and biases for a neural network that will be used to predict credit card default decisions. To build intuition, we will use the low-level, linear algebraic approach, rather than making use of convenience functions and high-level `keras` operations. We will also expand the set of input features from 3 to 23. 

In [13]:
# Define the layer 1 weights
weights1 = tf.Variable(tf.random.normal([23, 7]))

# Initialize the layer 1 bias
bias1 = tf.Variable(tf.ones([7]))

# Define the layer 2 weights
weights2 = tf.Variable(tf.random.normal([7,1]))

# Define the layer 2 bias
bias2 = tf.Variable(0.0)
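Before training, it can help to push a few rows through these parameters and confirm the shapes. A minimal sketch, assuming five hypothetical rows of 23 random features in place of the real data and redefining the four variables so it runs on its own: the 7-element `bias1` is broadcast across each row of the 5x7 product, and the output layer yields one probability per borrower.

```python
import tensorflow as tf

tf.random.set_seed(1)

# Same shapes as the initialization above
weights1 = tf.Variable(tf.random.normal([23, 7]))
bias1 = tf.Variable(tf.ones([7]))
weights2 = tf.Variable(tf.random.normal([7, 1]))
bias2 = tf.Variable(0.0)

# Hypothetical input: 5 borrowers with 23 features scaled to [0, 1)
features = tf.random.uniform([5, 23])

# Hidden layer: (5, 23) x (23, 7) -> (5, 7); bias1 is broadcast across the 5 rows
layer1 = tf.nn.relu(tf.matmul(features, weights1) + bias1)

# Output layer: (5, 7) x (7, 1) -> (5, 1); one default probability per borrower
output = tf.nn.sigmoid(tf.matmul(layer1, weights2) + bias2)
print(layer1.shape, output.shape)
```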

### 4.2 Training neural networks with TensorFlow
In this exercise, you will train a neural network to predict whether a credit card holder will default. The features and targets you will use to train your network are available in the Python shell as `borrower_features` and `default`. You defined the weights and biases in the previous exercise.

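Note that the output layer (`layer2` in `loss_function()` below) is defined as $\sigma(\text{layer1} \cdot \text{weights2} + \text{bias2})$, where $\sigma$ is the sigmoid activation, `layer1` is a tensor of nodes for the first hidden dense layer (to which dropout is applied during training), `weights2` is a tensor of weights, and `bias2` is the bias tensor.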

The trainable variables are `weights1`, `bias1`, `weights2`, and `bias2`.

In [14]:
# defining label
default = df_uci_credit_card['default.payment.next.month'].values.reshape(-1,1).astype(np.float32)
default

array([[1.],
       [1.],
       [0.],
       ...,
       [1.],
       [1.],
       [1.]], dtype=float32)

In [15]:
# Selecting scaler
from sklearn import preprocessing
max_abs_scaler = preprocessing.MaxAbsScaler()

# Selecting features and applying scaler
borrower_features = df_uci_credit_card.drop(['ID','default.payment.next.month'], axis = 1).values.astype(np.float32)
borrower_features = max_abs_scaler.fit_transform(borrower_features)
borrower_features

array([[0.02      , 1.        , 0.33333334, ..., 0.        , 0.        ,
        0.        ],
       [0.12      , 1.        , 0.33333334, ..., 0.00161031, 0.        ,
        0.00378311],
       [0.09      , 1.        , 0.33333334, ..., 0.00161031, 0.00234451,
        0.00945777],
       ...,
       [0.03      , 0.5       , 0.33333334, ..., 0.00676329, 0.00468901,
        0.00586382],
       [0.08      , 0.5       , 0.5       , ..., 0.00310145, 0.12417444,
        0.00341236],
       [0.05      , 0.5       , 0.33333334, ..., 0.00161031, 0.00234451,
        0.00189155]], dtype=float32)

In [16]:
def loss_function(weights1, bias1, weights2, bias2, features, targets):
    # Apply relu activation functions to layer 1
    layer1 = tf.nn.relu(matmul(features, weights1) + bias1)
    # Apply dropout; training=True is needed for units to be dropped outside model.fit()
    dropout = keras.layers.Dropout(0.25)(layer1, training=True)
    layer2 = tf.nn.sigmoid(matmul(dropout, weights2) + bias2)
    # Pass targets and layer2 to the cross entropy loss
    return keras.losses.binary_crossentropy(targets, layer2)

opt = keras.optimizers.RMSprop(learning_rate=0.01, momentum=0.99)

for j in range(0, 30000, 2000):
    features, targets = borrower_features[j:j+2000, :], default[j:j+2000, :]
    # Take one optimization step on this mini-batch of 2,000 examples
    opt.minimize(lambda: loss_function(weights1, bias1, weights2, bias2, features, targets), var_list=[weights1, bias1, weights2, bias2])

print(weights1.numpy())


[[ 2.0169091e-01 -2.2284541e+00 -3.3028159e-01 -2.0048859e+00
  -4.3291447e-01 -2.2897229e+00  3.8875139e-01]
 [-7.7550590e-01 -4.8801219e-01  2.2942179e-01  5.3426832e-01
  -7.9427963e-01 -3.6275822e-01  1.4135907e+00]
 [ 6.5496072e-02 -2.2146351e+00 -3.9696997e-01 -1.6366566e+00
  -1.0747992e+00  7.1611100e-01  7.8944489e-04]
 [-3.7804073e-01 -1.7501838e+00 -1.1841477e-01  2.1301384e+00
   7.8164917e-01 -5.5147207e-01  5.3297257e-01]
 [-9.9887699e-02 -6.3926972e-02 -5.9422247e-02 -8.3798029e-02
   8.3038813e-01 -6.8809366e-01 -5.5373991e-01]
 [-3.4567182e+00 -8.4834415e-01 -2.7652435e+00  1.5756580e+00
   1.9418348e+00 -3.5993240e+00 -1.6360222e+00]
 [-2.9568233e+00 -1.6227802e+00 -8.5211337e-02 -1.8524590e-01
   5.5522573e-01 -8.8754028e-01 -2.5353632e+00]
 [-9.4716591e-01  5.1678133e-01 -9.5865762e-01  7.5114298e-01
   5.1255792e-01 -2.1275477e+00 -1.3426723e+00]
 [-1.4681213e+00 -8.4127659e-01 -1.3692361e+00 -9.9739438e-01
   1.1932820e+00 -9.7684968e-01 -1.5188956e+00]
 [-1.56961

One of the benefits of using `tensorflow` is that you have the option to customize models down to the level of linear algebra, as we've shown in the last two exercises. If you look at `weights1`, which was printed to the console, you can see that the objects we're working with are simply tensors. In the following chapter, we'll see how you can make use of high-level APIs to streamline the process for standard models that do not require customization.