
# Summary of Class

In this assignment, a summary of the concepts, methods, and algorithms I have learned in this class is provided. Code examples are given where appropriate to reinforce the concepts.

## Global Imports and Setup

Commonly used packages are imported, and initial setup is performed.

In [0]:
%tensorflow_version 2.x

import numpy as np
import tensorflow as tf

# Fix seed for consistent results
np.random.seed(42)
tf.random.set_seed(42)

### Data Gathering and Generation

This method is used later to generate random data.

In [0]:
def get_random_data(w, b, mu, sigma, m):
    # Use the first 80% of the samples for training and the rest for testing
    num_train = int(m * 0.8)

    C = np.random.randint(0, 2, size=(m, 1))
    X_1 = np.random.uniform(size=(m, 1))
    N = np.random.normal(mu, sigma, size=(m, 1))

    X_2 = w * X_1 + b + (-1)**C * N

    data = np.concatenate((X_1, X_2), axis=1)
    labels = C

    return ((data[:num_train], labels[:num_train]), (data[num_train:], labels[num_train:]))


This sets up the CIFAR-10 data for use in the Keras model sections.

In [3]:
(cifar_train_images_original, cifar_train_labels_original), (cifar_test_images_original, cifar_test_labels_original) = tf.keras.datasets.cifar10.load_data()

cifar_train_images_shaped = cifar_train_images_original.reshape((50000, 32, 32, 3))
cifar_train_images_shaped = cifar_train_images_shaped.astype('float32') / 255
cifar_test_images = cifar_test_images_original.reshape((10000, 32, 32, 3))
cifar_test_images = cifar_test_images.astype('float32') / 255

cifar_train_images = cifar_train_images_shaped[:45000]
cifar_validation_images = cifar_train_images_shaped[45000:]  # Use the last 5000 images as validation data

# categorically encode the labels
cifar_train_labels_cat = tf.keras.utils.to_categorical(cifar_train_labels_original)
cifar_train_labels = cifar_train_labels_cat[:45000]
cifar_validation_labels = cifar_train_labels_cat[45000:]
cifar_test_labels = tf.keras.utils.to_categorical(cifar_test_labels_original)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


## General Concepts

### Artificial Intelligence (AI)

Artificial Intelligence (AI) is a broad category that represents intelligent programs which can imitate human behavior. This includes topics such as speech recognition, image classification, and understanding human language. This class focused on the classification of images.

### Symbolic AI

Symbolic AI is a subset of AI in which rules are established that determine the behavior of the program. Given an input and a set of rules, the program produces some output. Due to the complicated nature of many tasks such as image classification, it is infeasible to develop rules for the task, making this type of AI somewhat limited.

### Machine Learning

Machine Learning is a subset of AI which involves programs that can "learn" from the data they are exposed to. In this form, input and expected output are given to the program, and a set of rules is produced. These rules can be used later to predict the output for arbitrary input. While setting up models can be complex, this approach allows far more complex behaviors to be modeled without the need to program all rules manually.

### Supervised Learning

In Supervised learning, input data to a model have labels attached which identify them. These labels are used when creating rules and classifications during learning. Typically, the model is trained to predict the label of data based on the previously seen labeled data. This form of learning is what the class focused on.

### Unsupervised Learning

In Unsupervised learning, input data has no attached labels. The model does not know how any of the data is related and must create its own rules and categories for the data. These rules can be used later to group unknown input data into the generated categories.

## Basic Concepts

### Linear Regression

Linear Regression is a formula for predicting continuous values. The equation for the predictions of the model is of the form $\hat{y} = b + w_1x_1 + w_2x_2 + \dots + w_nx_n$, where $\hat{y}$ is the predicted label, $b$ is the bias, $X = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}$ are the features, and $W = \begin{bmatrix} w_1 & w_2 & \dots & w_n \end{bmatrix}$ are the weights of the features.

Using matrices, the equation can be rewritten as $\hat{y} = b + WX^T$.
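As a minimal sketch of the matrix form above, the prediction can be computed with NumPy. The weights, bias, and feature values here are made-up numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical values chosen only for illustration
W = np.array([[2.0, -1.0, 0.5]])   # weights, shape (1, n)
b = 4.0                            # bias
X = np.array([[1.0, 3.0, 2.0]])    # one sample with n = 3 features, shape (1, n)

# Matrix form: y_hat = b + W X^T
y_hat = b + W.dot(X.T)

# The same prediction written out term by term
y_hat_manual = b + sum(w_i * x_i for w_i, x_i in zip(W[0], X[0]))
```

Both expressions give the same prediction; the matrix form simply avoids the explicit loop over features.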

### Logistic Regression

Logistic Regression is a method for predicting binary values. The output is the probability that the input belongs to one of the two categories. The sigmoid activation function is used to restrict the value to a range of $(0, 1)$. Below are functions for the sigmoid function and logistic regression predictions.

In [0]:
# Perform the sigmoid activation function on an input `z`
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(np.negative(z)))

# Process the input `X_b` according to the weights `W_b`
def process_input(X_b, W_b):
    # Apply weights and bias
    z = W_b.T.dot(X_b)

    # Perform sigmoid activation function
    return sigmoid(z)

# Make a prediction based on the result `a`
def predict(a):
    return 0 if a < 0.5 else 1

### Error and Loss

Loss is a measurement of how far off a model's prediction is from the true value. A loss of 0 is a perfect prediction. Loss is important as it is used to optimize the weights for a model and make better predictions.

One form of loss is Squared Error. It is calculated as $L = (y - \hat{y})^2$, where $y$ is the true label and $\hat{y}$ is the predicted label.

Mean Squared Error is a representation of Squared Error loss over a batch of predictions instead of one prediction. It is calculated as $MSE = \frac{1}{m} \sum_{i = 1}^{m}L^{(i)}$, where $m$ is the number of elements in the dataset.
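The two formulas above translate directly into NumPy. This is a small sketch with made-up labels and predictions; the function names are chosen here for illustration.

```python
import numpy as np

def squared_error(y, y_hat):
    # L = (y - y_hat)^2, computed element-wise
    return (y - y_hat) ** 2

def mean_squared_error(y, y_hat):
    # Average of the squared errors over all m elements
    return np.mean(squared_error(y, y_hat))

# Made-up true labels and predictions
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.5, 2.0])

mse = mean_squared_error(y_true, y_pred)
```

A perfect prediction contributes 0 to the average, while larger misses are penalized quadratically.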

Below is code for another type of loss, binary cross-entropy. This is used with Logistic Regression.

In [0]:
# Calculate the binary cross-entropy loss for the prediction `a` and true label `label`
def binary_crossentropy(a, label):
    return -label*np.log(a) - (1 - label)*np.log(1 - a)

### Gradients

A gradient is a vector which represents the direction and magnitude of steepest ascent for a function at a given point. It is used in combination with the loss function to determine which direction to move to reduce the loss. The gradient of the loss is calculated as
$\nabla{}L = \begin{bmatrix}
\frac{\partial{}L}{\partial{b}} &
\frac{\partial{}L}{\partial{w_1}} &
\frac{\partial{}L}{\partial{w_2}} &
\dots &
\frac{\partial{}L}{\partial{w_n}}
\end{bmatrix}$
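To make the vector of partial derivatives concrete, the sketch below computes it for the Squared Error loss of a linear model, once from the analytic derivatives and once by nudging each parameter numerically. The parameter values and helper names are assumptions made for this illustration.

```python
import numpy as np

def squared_error_loss(params, x, y):
    # params = [b, w_1, ..., w_n]
    b, w = params[0], params[1:]
    y_hat = b + w.dot(x)
    return (y - y_hat) ** 2

def analytic_gradient(params, x, y):
    b, w = params[0], params[1:]
    y_hat = b + w.dot(x)
    # dL/db = -2 (y - y_hat) and dL/dw_i = -2 (y - y_hat) x_i
    return -2.0 * (y - y_hat) * np.concatenate(([1.0], x))

def numerical_gradient(loss, params, x, y, eps=1e-6):
    # Central difference: nudge one parameter at a time
    grad = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        grad[i] = (loss(params + step, x, y) - loss(params - step, x, y)) / (2 * eps)
    return grad

# Made-up parameters [b, w_1, w_2], one input, and its true label
params = np.array([0.5, 1.0, -2.0])
x = np.array([3.0, 1.0])
y = 2.0

grad_analytic = analytic_gradient(params, x, y)
grad_numeric = numerical_gradient(squared_error_loss, params, x, y)
```

The two results agree up to rounding, which is a common way to sanity-check a hand-derived gradient.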

Below is code to calculate the gradient of the binary cross-entropy loss, used in Logistic Regression.

In [0]:
# Calculate the gradient of the binary cross-entropy loss for the prediction `a`, true label `label`, and input `X_b`
def loss_gradient(a, label, X_b):
    return (a - label) * X_b

### Gradient Descent

Gradient Descent is a process for minimizing the loss of a model by adjusting the weights. To begin, weights are set to some arbitrary value. Predictions are made on some input data, and the loss is calculated. The gradient of that loss is calculated to determine the direction of steepest increase in loss. Finally, the weights are updated as $W = W - \alpha\nabla{}L$, where $\alpha$ is the learning rate. This value simply scales the effect the gradient has on the weights.

There are 3 forms of Gradient Descent:

- Batch: Uses all data from the training set
- Mini-batch: Uses a subset of the data from the training set
- Stochastic: Uses only a single element from the training set
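The three forms differ only in how many elements feed each weight update. The sketch below uses a hypothetical helper, `iterate_batches`, on a tiny made-up dataset to show how the number of updates per epoch changes with the batch size.

```python
import numpy as np

def iterate_batches(data, labels, batch_size):
    """Yield successive (data, labels) slices of at most batch_size elements."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size], labels[start:start + batch_size]

# Made-up dataset of 10 elements with 2 features each
data = np.arange(20).reshape(10, 2)
labels = np.arange(10)

batch = list(iterate_batches(data, labels, batch_size=len(data)))  # 1 update per epoch
mini_batch = list(iterate_batches(data, labels, batch_size=4))     # 3 updates per epoch
stochastic = list(iterate_batches(data, labels, batch_size=1))     # 10 updates per epoch
```

With a batch size of 4, the last mini-batch holds only the 2 leftover elements.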


### Training

Training a model is simply the process of iteratively running the model against training data, making predictions, and adjusting the weights of the model to minimize the loss. Doing this can lead the model to make better predictions for future runs.

An Epoch represents a cycle in which all elements of the training set are considered. Multiple epochs will go over the training data multiple times.

The rate at which a model changes is called the learning rate. By lowering this value, the model will change more slowly. This may be helpful to avoid divergence in training. Divergence can happen when the model changes so much that it overshoots the ideal weights and causes the gradient to be even larger next time. This can continue forever, with the model getting further and further from the ideal value.
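Divergence is easy to reproduce on a toy problem. The sketch below runs gradient descent on the made-up loss $L(w) = w^2$, whose ideal weight is 0, with one small and one overly large learning rate. The function name and values are illustrative assumptions.

```python
def descend(learning_rate, steps=10, w=1.0):
    """Run gradient descent on L(w) = w**2, whose gradient is 2*w."""
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * 2 * w
        history.append(w)
    return history

converging = descend(learning_rate=0.1)   # each step multiplies w by 0.8
diverging = descend(learning_rate=1.1)    # each step multiplies w by -1.2
```

With the small rate, `w` shrinks toward 0. With the large rate, every update overshoots 0 and lands further away than before, so the magnitude of `w` grows without bound.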

Below is code for performing Gradient Descent using Logistic Regression to train a model on generated training data and evaluate it on held-out test data. More information about the training of models can be found in the section Training a Keras Model.

#### Create Data to Use

In [0]:
# Set parameters
w = 2
b = 4
mu = 3
sigma = 1.5
m = 1000

# Gather train and test data
(train_data, train_labels), (test_data, test_labels) = get_random_data(w, b, mu, sigma, m)

#### Define Test Function

This function will test the logistic regression model against the test data, and return the loss and accuracy.

In [0]:
# Perform a test of the model using the test data and labels, with weights `W_b`
def test_model(test_data, test_labels, W_b):
    correct_predictions = 0
    total_loss = 0

    for i_data in range(len(test_data)):
        # Process the input
        X_b = np.concatenate(([1], test_data[i_data]))
        a = process_input(X_b, W_b)

        # Make a prediction
        p = predict(a)
        if p == test_labels[i_data]:
            correct_predictions += 1
        
        # Determine loss
        total_loss += binary_crossentropy(a, test_labels[i_data])

    # Return a summary
    return (total_loss[0] / len(test_data), correct_predictions / len(test_data))

#### Run the Model

This runs the Logistic Regression model over 10 epochs, with a learning rate of 0.01, and displays the results for each epoch.

In [9]:
def logistic_regression(train_data, train_labels, test_data, test_labels, epochs, learning_rate):
    # Randomize the initial weights
    W_b = np.random.random_sample((3, ))

    for epoch in range(epochs):
        # Only perform stochastic gradient descent
        for i_data in range(len(train_data)):
            # Process the input
            X_b = np.concatenate(([1], train_data[i_data]))
            a = process_input(X_b, W_b)

            # Determine the gradient of the loss
            Lg_b = loss_gradient(a, train_labels[i_data], X_b)

            # Apply the gradient to the weights
            W_b -= Lg_b * learning_rate
        
        # Analyze the loss and accuracy for each epoch
        loss, accuracy = test_model(test_data, test_labels, W_b)
        print(f'Epoch {epoch+1}/{epochs} - val_loss: {loss} - val_accuracy: {accuracy}')
    
    # Return the trained weights
    return W_b

epochs = 10
learning_rate = 0.01

W_b = logistic_regression(train_data, train_labels, test_data, test_labels, epochs, learning_rate)

Epoch 1/10 - val_loss: 0.2852680517706631 - val_accuracy: 0.91
Epoch 2/10 - val_loss: 0.21490160669396283 - val_accuracy: 0.94
Epoch 3/10 - val_loss: 0.17917091160280488 - val_accuracy: 0.96
Epoch 4/10 - val_loss: 0.15762120145374078 - val_accuracy: 0.96
Epoch 5/10 - val_loss: 0.14321953921728175 - val_accuracy: 0.97
Epoch 6/10 - val_loss: 0.13292257130692947 - val_accuracy: 0.97
Epoch 7/10 - val_loss: 0.1252011842969439 - val_accuracy: 0.975
Epoch 8/10 - val_loss: 0.11920285429900232 - val_accuracy: 0.975
Epoch 9/10 - val_loss: 0.11441447294496017 - val_accuracy: 0.98
Epoch 10/10 - val_loss: 0.11050855719863474 - val_accuracy: 0.98


## Building a Keras Model

The primary type of model focused on in this class is convolutional neural networks. These networks were used for classification of images. In this section, information about building a Keras model is outlined, and a model for identifying CIFAR images is created.

### Types of Layers

To begin, a description of each type of layer is given. These layers are combined to create the network.

#### Dense Layer

Dense layers are the standard fully connected layer. Every output unit computes a weighted sum of all inputs plus a bias and passes it through an activation function. The weights are adjusted during training by a form of Gradient Descent, and each unit can learn a different set of weights, leading to the identification of different features.

#### Conv2D

Convolutional layers were used for image recognition in this class. These function by applying a kernel matrix to an input matrix, to produce an output matrix.

The stride value determines how far to move the kernel with each operation. By default, it moves by 1.

The process starts in the top left corner of the input array. The kernel is multiplied element-wise with a section of the input array equal to its size, and the resulting matrix is summed and placed into position (0, 0) in the output array. The kernel is then shifted right by the stride value, and the process continues until the end of the row is reached. The kernel then moves back to the left edge and down by the stride amount, and the process repeats until the bottom of the input array is reached.

If there is not room at the end of the array due to the stride amount, the input array can optionally be padded to allow running the final operation. If padding is not set, the last operation is skipped instead.

For example, an input matrix of 
$\begin{bmatrix}
1 & 2 & 1 & 2 \\
2 & 1 & 2 & 1 \\
1 & 2 & 1 & 2 \\
2 & 1 & 2 & 1
\end{bmatrix}$
, and a kernel of
$\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}$
would produce an output of
$\begin{bmatrix}
2 & 4 & 2 \\
4 & 2 & 4 \\
2 & 4 & 2
\end{bmatrix}$
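The example above can be checked with a short NumPy sketch. The function `conv2d_valid` is written here only for illustration: it handles a single channel and a single fixed kernel with no padding, whereas a real Conv2D layer also sums over input channels, adds a bias, and learns its kernels.

```python
import numpy as np

def conv2d_valid(inputs, kernel, stride=1):
    """Slide the kernel over the input without padding and sum the element-wise products."""
    k_h, k_w = kernel.shape
    out_h = (inputs.shape[0] - k_h) // stride + 1
    out_w = (inputs.shape[1] - k_w) // stride + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            r, c = row * stride, col * stride
            window = inputs[r:r + k_h, c:c + k_w]
            output[row, col] = np.sum(window * kernel)
    return output

inputs = np.array([[1, 2, 1, 2],
                   [2, 1, 2, 1],
                   [1, 2, 1, 2],
                   [2, 1, 2, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])

result = conv2d_valid(inputs, kernel)
```

With the default stride of 1, `result` matches the 3 by 3 output matrix shown above.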

#### MaxPooling2D

MaxPooling is another layer used with image recognition in the class. Typically, it was used after a Conv2D layer. The layer takes an input matrix and a window size `s`, and produces an output matrix. The stride defaults to the window size, but can be set differently.

The process starts in the top left corner of the input array. The maximum value within an `s` by `s` region is found and placed into the output array at (0, 0). The window moves right by `s`, and the process repeats until the window cannot fit into the input array. The window then moves down `s` and back to the left of the input array. This continues until the window falls off the bottom of the input array.

If the window only partially fits into the input array, the array can optionally be padded to allow running the final operation. The padded positions are effectively treated as $-\infty$, so they are never selected as the maximum. If padding is not set, the last operation is skipped instead.

For example, an input matrix of
$\begin{bmatrix}
1 & 2 & 1 & 2 \\
2 & 4 & 2 & 1 \\
1 & 2 & 4 & 2 \\
2 & 1 & 2 & 1
\end{bmatrix}$
with a window size of `2` would produce an output of
$\begin{bmatrix}
4 & 2 \\
2 & 4
\end{bmatrix}$
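The same example can be reproduced with a small NumPy sketch. The function `max_pool2d` is a hypothetical helper for a single channel with no padding, not the Keras implementation.

```python
import numpy as np

def max_pool2d(inputs, size, stride=None):
    """Take the maximum of each size-by-size window; the stride defaults to the window size."""
    stride = size if stride is None else stride
    out_h = (inputs.shape[0] - size) // stride + 1
    out_w = (inputs.shape[1] - size) // stride + 1
    output = np.zeros((out_h, out_w), dtype=inputs.dtype)
    for row in range(out_h):
        for col in range(out_w):
            r, c = row * stride, col * stride
            output[row, col] = inputs[r:r + size, c:c + size].max()
    return output

inputs = np.array([[1, 2, 1, 2],
                   [2, 4, 2, 1],
                   [1, 2, 4, 2],
                   [2, 1, 2, 1]])

pooled = max_pool2d(inputs, size=2)
```

Each 2 by 2 window contributes its largest value, so `pooled` matches the 2 by 2 output matrix shown above.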

#### Flatten

The Flatten layer takes an input of arbitrary shape and flattens it into a one-dimensional array. For example, an input matrix of size `(16, 8)` would become an array of size `(128,)`.

This was used specifically to transition from a series of Conv2D layers into Dense layers. Conv2D layers produce a multi-dimensional feature map for each image, while the Dense layers in this class expect a single one-dimensional vector per image.

#### Dropout

Dropout layers work by randomly setting a percentage of the input nodes to `0`, removing them from consideration in future layers. This is useful to help prevent overfitting by adding more noise to the training data.
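A minimal NumPy sketch of the idea follows. It assumes the common "inverted dropout" formulation, in which the surviving values are scaled up by `1 / (1 - rate)` so the expected sum stays the same; this is how the Keras Dropout layer behaves during training, and the layer passes inputs through unchanged at prediction time. The function name and seed are illustrative.

```python
import numpy as np

def dropout(inputs, rate, rng):
    """Zero each element with probability `rate` and rescale the survivors."""
    keep_mask = rng.random(inputs.shape) >= rate
    return inputs * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000,))

# Roughly 10% of the values become 0; the rest become 1 / 0.9
dropped = dropout(activations, rate=0.1, rng=rng)
fraction_zeroed = np.mean(dropped == 0.0)
```

Because a different random mask is drawn on every call, the following layers cannot rely on any single input always being present.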

### Activation Functions

Several activation functions were used in this class, which are outlined below.

#### Sigmoid

The Sigmoid activation function is used to clamp values to a range of $(0, 1)$, which is useful for binary classifications. A code example is given in the section on Logistic Regression.

#### SoftMax

SoftMax was used for multi-class single-label classification. That is, given a set of categories, each input is predicted to be in one of the categories. The SoftMax activation function generates a probability in the range $(0, 1)$ for each class, and the probabilities across all classes sum to 1. Using these probabilities, it can be determined what the most likely label is for an input.
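As a short sketch, SoftMax can be written in NumPy as follows. The scores are made up, and subtracting the maximum before exponentiating is a standard trick to avoid overflow that does not change the result.

```python
import numpy as np

def softmax(z):
    # Shift by the maximum for numerical stability
    shifted = z - np.max(z)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

# Made-up raw scores for three classes
scores = np.array([2.0, 1.0, 0.1])

probabilities = softmax(scores)
predicted_class = int(np.argmax(probabilities))
```

The class with the largest score receives the largest probability, so `predicted_class` is the index of the most likely label.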

#### Rectified Linear Unit (ReLU)

ReLU is used to restrict the output of a layer to non-negative numbers. The formula is $ReLU(x) = max(0, x)$. I commonly saw this activation function used for hidden layers.
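The formula is a one-liner in NumPy. The input values below are made up for illustration.

```python
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged
    return np.maximum(0, x)

values = np.array([-2.0, -0.5, 0.0, 1.5])
activated = relu(values)
```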

### Creating a Convolutional Model

The process for creating a model with Keras follows some common patterns and guidelines, but can require trial and error in many cases. In the code below, a simple convolutional model is created to identify images from the CIFAR dataset.

3 Conv2D layers are used to generate features based on the input images. Between the convolutions, a MaxPooling2D layer is used to extract only the most important of these features. A Dropout layer is used to reduce overfitting. The data is then flattened and fed into a hidden Dense layer to generate more features. Finally, the data is fed into a Dense layer with SoftMax activation to make a prediction on which category is represented by the image.

In [10]:
conv_model = tf.keras.models.Sequential(layers=(
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),

    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),

    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),

    tf.keras.layers.Dropout(0.1),

    tf.keras.layers.Flatten(),

    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
))

conv_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 64)        18496     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 4, 4, 64)          36928     
_________________________________________________________________
dropout (Dropout)            (None, 4, 4, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1024)              0
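The shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes what the model definition specifies: 3 by 3 kernels with no padding and a stride of 1, and 2 by 2 pooling windows. The helper names are made up for this illustration.

```python
def conv_output_size(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output_size(size, window=2):
    """Output width/height of unpadded pooling with stride equal to the window."""
    return size // window

def conv_params(kernel, in_channels, filters):
    """One kernel*kernel*in_channels block of weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size = 32
size = conv_output_size(size)   # 30
size = pool_output_size(size)   # 15
size = conv_output_size(size)   # 13
size = pool_output_size(size)   # 6
size = conv_output_size(size)   # 4
flattened = size * size * 64    # 4 * 4 * 64 = 1024

params = [conv_params(3, 3, 32), conv_params(3, 32, 64), conv_params(3, 64, 64)]
```

The resulting values, 896, 18496, and 36928 parameters and a flattened size of 1024, match the rows of the summary.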

## Compiling a Keras Model

Once a model has been built, the model must be compiled. To do so, some additional properties must be defined. These properties are outlined in this section, and the model from before is compiled.

#### Choose an Optimizer

First, an optimizer is selected. This defines the formula used to optimize the weights of each layer for every batch. Two of these are listed below.

- SGD (Stochastic Gradient Descent): Perform standard gradient descent
- RMSprop: Keeps a running average of the square of the gradient and divides each update by the square root of that average. This adapts the step size for each weight, which smooths optimization and prevents outliers from having as large of an effect.

The optimizer also takes a learning rate, which affects how fast the model changes. This rate is important as a learning rate that is too large can cause overshooting, divergence, and other issues with training. A value too small can also be a problem, as it will make the training process take a very long time.

For this model, RMSprop was chosen with a learning rate of 0.001.
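The difference between the two optimizers can be sketched in NumPy. This is a simplified version of the update rules, not the Keras implementation; the decay rate `rho`, the small constant `eps`, and the gradient values are assumptions chosen for illustration.

```python
import numpy as np

def sgd_step(w, grad, learning_rate):
    # Plain gradient descent: step size is proportional to the gradient
    return w - learning_rate * grad

def rmsprop_step(w, grad, avg_sq, learning_rate, rho=0.9, eps=1e-7):
    # Update the running average of the squared gradient
    avg_sq = rho * avg_sq + (1.0 - rho) * grad ** 2
    # Divide the update by the root of that average
    w = w - learning_rate * grad / (np.sqrt(avg_sq) + eps)
    return w, avg_sq

w = np.array([1.0, 1.0])
grad = np.array([100.0, 0.01])   # one very large and one very small gradient
avg_sq = np.zeros_like(w)

w_sgd = sgd_step(w, grad, learning_rate=0.001)
w_rms, avg_sq = rmsprop_step(w, grad, avg_sq, learning_rate=0.001)

sgd_steps = w - w_sgd
rms_steps = w - w_rms
```

With SGD the two weights move by amounts that differ by a factor of 10,000, while the RMSprop steps are nearly equal because each gradient is normalized by its own running magnitude.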


#### Choose a Loss Type

Different types of loss have different uses. In this class, I learned the general guideline to pick the loss type based on the type of classification.

- Binary Classification: If there are only 2 classes, use `binary_crossentropy`
- Multi-class single-label: If there are more than 2 classes, use `categorical_crossentropy`

Since there are 10 classes, `categorical_crossentropy` is used.
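For a single input, categorical cross-entropy reduces to the negative log of the probability assigned to the true class. The sketch below uses a made-up one-hot label and two made-up predictions with three classes to keep it short.

```python
import numpy as np

def categorical_crossentropy(probabilities, one_hot_label):
    # Only the entry for the true class contributes to the sum
    return -np.sum(one_hot_label * np.log(probabilities))

label = np.array([0.0, 0.0, 1.0])            # true class is index 2
confident = np.array([0.05, 0.05, 0.90])     # high probability on the true class
unsure = np.array([0.40, 0.40, 0.20])        # low probability on the true class

loss_confident = categorical_crossentropy(confident, label)
loss_unsure = categorical_crossentropy(unsure, label)
```

The loss is small when the model is confident in the correct class and grows as the probability assigned to that class drops.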

#### Choose Metrics

Metrics do not affect the training of the model. They are used only by the observer to judge the performance of the model. For this class, we were mainly interested in `accuracy`.

### Compiling the Model

Using the information above, the model is compiled.

In [0]:
# Compile the model
conv_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                   loss='categorical_crossentropy', 
                   metrics=['accuracy'])

## Training a Keras Model

Once a model is compiled, it is time to train it. 

In this section, the aspects of training are described in more detail, and the compiled model from before is trained to recognize CIFAR images.

#### Datasets

Training the model is done by using a set of data called the training set. This is what the model makes predictions and optimizations on.

A second set, the validation set, is used to judge the performance of the model each epoch. This gives an indication of how the training is going.

A third set of data, the test set, is used to test the model after training. The reason for not reusing the validation set is to prevent bias in the results. Keeping the data separated prevents decision making based on a known set of data, and helps to train a model that works for the general problem instead of a specific set of data.

#### Epochs and Batch Size

Epochs are the number of times the model will run through all of the training data. Batch size is how many elements of the training data are considered at a time when performing optimization on the weights. Modifying these values changes the speed and behavior of training.

I chose to use 10 epochs, with a batch size of 32.
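These two values determine how many weight updates happen during training. The short calculation below uses the 45,000 training images left after holding out 5,000 for validation.

```python
import math

train_size = 45000   # training images after holding out 5000 for validation
batch_size = 32
epochs = 10

# Each batch produces one weight update; the final batch may be smaller
updates_per_epoch = math.ceil(train_size / batch_size)
total_updates = updates_per_epoch * epochs
last_batch_size = train_size - (updates_per_epoch - 1) * batch_size
```

This gives 1407 updates per epoch and 14070 updates in total, with the last batch of each epoch holding only 8 images.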

#### Overfitting and Underfitting

Overfitting is a serious issue with machine learning models. This happens when the model develops rules that are too specific to the training dataset and don't generalize well. This causes a very low loss on the training data, but a high loss on unseen test data. The model cannot adapt to the new data.

To avoid overfitting, the model must not train for too long on the training data. Another way to reduce it is to add Dropout layers, which create noise in the training data and help to prevent fitting too closely to it. A third option is to reduce the complexity of the model. This causes the model to create more general rules to fit the data, rather than many rules that are specific to the training set.
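One common way to avoid training for too long is early stopping: watch the validation loss and stop once it stops improving. The sketch below shows only the decision logic on made-up validation losses; `early_stopping_epoch` and `patience` are names chosen for this illustration. Keras offers a comparable `tf.keras.callbacks.EarlyStopping` callback.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Made-up validation losses that improve, then worsen as the model overfits
val_losses = [1.20, 0.95, 0.90, 0.93, 0.97, 1.05]
stop_epoch = early_stopping_epoch(val_losses)
```

Here the best validation loss occurs at epoch 3, and training stops at epoch 5 after two epochs without improvement.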

Underfitting is the opposite issue. The model has high loss on both the training set and the test set. The model cannot fit to the training set or generalize to new data. To help this, the model may need to be trained for longer, have a higher learning rate, have a more complex structure, or use different techniques.

### Training the model

The model is trained against the training dataset, and validated against the validation set to monitor progress. The model does well at categorizing both sets of data.


In [12]:
conv_model.fit(cifar_train_images, 
               cifar_train_labels, 
               epochs=10, 
               batch_size=32,
               validation_data=(cifar_validation_images, cifar_validation_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f76e00621d0>

### Testing the model

Finally, the model is tested against the never-used test dataset. The model reaches roughly 65% accuracy, well above the 10% expected from random guessing among 10 classes, indicating that the model learned useful features.

In [13]:
conv_model.evaluate(cifar_test_images, cifar_test_labels)



[1.211479663848877, 0.6499999761581421]

## Fine-tuning a Pretrained Model

To avoid the hassle of creating and training networks from scratch, pretrained models are provided. These models are well-designed and tested, and trained against very large sets of data to generalize well to many different problems. In this section, the pretrained convolutional model DenseNet121 is used to create a model to identify the CIFAR images. 

### Download the Pretrained Model

First, the pretrained model must be downloaded and configured. The model is created with initial weights from ImageNet training. The "top" of the network is the final dense layers. This is left off so that a custom set of dense layers can be added. The model is set to be untrainable so that the weights will not change when adapting the added Dense layers for the CIFAR image set.

In [14]:
from tensorflow.keras.applications import DenseNet121

conv_base = DenseNet121(
    weights='imagenet', 
    include_top=False, 
    input_shape=(32, 32, 3))

conv_base.trainable = False

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/densenet/densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5


### Build and Compile the Pretrained Model

Next, the model is built and compiled. This is very similar to creating the convolutional model from before, but the convolutional part is replaced by the pretrained model.

In [15]:
pretrained_conv_model = tf.keras.models.Sequential(layers=(
    conv_base,

    tf.keras.layers.Dropout(0.1),

    tf.keras.layers.Flatten(),

    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
))

pretrained_conv_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
densenet121 (Model)          (None, 1, 1, 1024)        7037504   
_________________________________________________________________
flatten_1 (Flatten)          (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 128)               131200    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
Total params: 7,169,994
Trainable params: 132,490
Non-trainable params: 7,037,504
_________________________________________________________________


In [0]:
# Compile the model
pretrained_conv_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                              loss='categorical_crossentropy', 
                              metrics=['accuracy'])

### Train and Test the Model

Next, the model is trained using the same parameters as the custom model.

In [17]:
pretrained_conv_model.fit(cifar_train_images, 
                          cifar_train_labels, 
                          epochs=10, 
                          batch_size=32,
                          validation_data=(cifar_validation_images, cifar_validation_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f76ba4e1240>

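The model is evaluated on the test data. The output recorded below is identical to the custom convolutional network's result because the recorded run evaluated `conv_model`; the call here targets `pretrained_conv_model` as intended, so the two models cannot be compared from this output alone.

In [18]:
pretrained_conv_model.evaluate(cifar_test_images, cifar_test_labels)



[1.211479663848877, 0.6499999761581421]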

### Fine-tuning the Model

The pretrained models can also be fine-tuned by allowing only some of the layers to be changed. The code below unfreezes the layer `conv5_block1_0_bn`, and all following layers so they can be updated.

In [0]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
    if layer.name == 'conv5_block1_0_bn':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False

Next, the model is re-compiled so that the change to the trainable layers takes effect, and then re-trained.

In [0]:
# Compile the model
pretrained_conv_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
                              loss='categorical_crossentropy', 
                              metrics=['accuracy'])

In [21]:
pretrained_conv_model.fit(cifar_train_images, 
                          cifar_train_labels, 
                          epochs=10, 
                          batch_size=32,
                          validation_data=(cifar_validation_images, cifar_validation_labels))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7f7694117da0>

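The fine-tuned model is then evaluated on the test data. The output recorded below is again identical to the custom network's result, so it reflects `conv_model` rather than the fine-tuned model; the call here targets `pretrained_conv_model` as intended.

In [22]:
pretrained_conv_model.evaluate(cifar_test_images, cifar_test_labels)



[1.211479663848877, 0.6499999761581421]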

## Conclusion

I have learned a lot about machine learning and Artificial Intelligence throughout this course. I have gained insight into the algorithms and processes that are used to generate neural networks and perform predictions on data, as well as the process for creating these networks. The information I have learned will be useful to me in the future, should I ever need to create a neural network for machine learning in my career.