# **1. General Concepts**

### **What is Artificial Intelligence?**

To understand the influence and popularity of artificial intelligence, one must first look back in history. Since ancient times, people have tried to bring "life" to machines, giving them some form of independence and individuality. This manifested in "automatons," machines capable of performing a programmed set of instructions. Without a person's control, these machines were able to perform their given jobs (albeit only small, simple tasks).

An example of automatons: https://youtu.be/uzM32wVTbsY

There were also myths of statues and machines coming to life, acting of their own will and able to feel. Examples include the Greek stories of the Argonauts' encounter with the giant machine Talos, and of Pygmalion falling in love with the statue Galatea, which Aphrodite brought to life. The idea of giving human qualities to objects existed long before modern civilization.

With the advent of the digital computer came the field of artificial intelligence. To understand what "artificial intelligence" is, one must first think about what "intelligence" means in general. Consider the following two examples:

  > A computer and a young child being given the task of adding two numbers together.

  > The computer and a young child playing a game of checkers. 

In the first example, would one necessarily say that either shows signs of intelligence? For both, the answer would likely be "no" (although the answer may be more mixed for the child).

Now consider the second example. While both subjects are still performing a relatively simple task, many more factors and biases come into play, and the task requires far more decision making than the first. This is where artificial intelligence becomes involved: it allows computers to essentially "act" like humans in our decision-making abilities.

Now, by giving computers the ability to perform intensive decision-making operations, what benefit do we gain? What jobs can be done? Doing so allows computers to perform tasks ranging from image classification to stock prediction to driving cars. These are problems that we cannot practically program a computer to solve through a deterministic approach, as there are far too many variables and cases to account for. We are essentially trying to program computers to perform non-deterministic jobs, ones that rely on heuristics instead of absolute certainty.

Over the years, artificial intelligence has been studied and attempted in many different ways. Some of these approaches include computational neuroscience, symbol recognition, and statistical reasoning.

### **What is Machine Learning?**

Machine learning (ML) can be considered a subset of artificial intelligence (AI): all ML is AI, but not all AI is ML. Machine learning takes our idea of programming and completely revamps it. Instead of giving the computer some input along with a hand-written set of instructions, we rely on the computer to essentially "build" the program itself. Rather than programming the specific actions the computer must take to solve the problem, we program a means for it to interpret given data and choose an action based on it. Examples include decision trees and neural networks.

The ultimate goal of machine learning is what our ancestors dreamed of: a machine that acts on its own, without needing to follow a preprogrammed set of instructions. However, this raises the question of whether this is truly artificial intelligence. Despite being "self-taught," the machine relies on a large amount of data being fed to it, and the algorithms and overall model that allow it to learn were built by humans. So could one say that we merely programmed it to follow rules that allow it to learn, thereby still leaving the machine as merely...a machine? A cold shell incapable of anything other than following orders.


### **What is Deep Learning?**


Deep learning is a form of machine learning that relies on neural networks, a simulation of the neurons in our brain ("deep" refers to networks with many layers). The brain contains tens of billions of neurons, nerve cells that connect to each other through axons and dendrites. When a neuron "fires," it sends a signal along its axon to the dendrites of receiving neurons, which may in turn fire and repeat the process in a chain reaction. A neural network works in a similar fashion. Artificial neurons are arranged in layers and connected to each other, and each connecting edge has a specific weight. Each neuron computes a weighted sum of its inputs, applies a function (its activation function) to the result, and passes its output to the next layer.
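The weighted-sum-then-activate behavior can be sketched in a few lines of NumPy. This is a minimal illustration, not how any particular library implements it: the inputs, weights, and biases are made-up numbers, the helper name `dense_layer` is hypothetical, and ReLU (`max(z, 0)`) is assumed as the activation function. A layer of 2 neurons feeds a 1-neuron layer, showing one layer's output becoming the next layer's input.

```python
import numpy as np

def relu(z):
  # Activation: keep positive values, replace negative ones with 0.
  return np.maximum(z, 0.0)

def dense_layer(x, W, b):
  # x: inputs, shape (n_inputs,)
  # W: edge weights, one column per neuron, shape (n_inputs, n_neurons)
  # b: one bias per neuron, shape (n_neurons,)
  # Each neuron takes the weighted sum of its inputs, adds its bias,
  # and applies the activation function.
  return relu(x @ W + b)

# Hypothetical values: 3 inputs -> 2 hidden neurons -> 1 output neuron.
x = np.array([1.0, 2.0, 3.0])
W1 = np.array([[0.5, -1.0],
               [0.25, 0.5],
               [-0.5, 0.25]])
b1 = np.array([0.5, 0.0])
W2 = np.array([[1.0],
               [2.0]])
b2 = np.array([0.0])

hidden = dense_layer(x, W1, b1)       # [0.5 + 0.5 - 1.5 + 0.5, -1 + 1 + 0.75] -> [0.0, 0.75]
output = dense_layer(hidden, W2, b2)  # 0.0 * 1.0 + 0.75 * 2.0 -> [1.5]
print(hidden, output)
```

The first hidden neuron's weighted sum is exactly 0, so ReLU passes 0 on; only the second neuron contributes to the output.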

Through this process, we end up with a "self-taught" program. We feed in a large amount of labeled data, often thousands of examples or more, and let the program make guesses, measure how wrong they are, and adjust its weights until its guesses become accurate.


# **2. Basic Concepts**

### **Linear Regression**

Linear regression is one way we form relationships between data. Given a set of inputs and their respective outputs, the goal of linear regression is to find a line that best models the graphed data. The formula used for representing our line is of the form

$\hat{y} = w_0 + \sum_{i=1}^{n} w_i x_i$

This is a generalization of the line equation, $y = mx + b$, to $n$ inputs, where

$\hat{y}$ is the predicted output

$w_0$ is the bias (intercept) term. It plays the role of $b$ in the line equation, shifting the line vertically. It is often treated as the weight on a constant input $x_0 = 1$, so that it can be learned in the same way as the other weights.

$w_i$ is the set of weights that multiply our $x$ values, similar to $m$ in the line equation

$x_i$ is our set of inputs

The fitted line should model the training data fairly accurately. This means that if we give it an input that was not in our initial training dataset, the line should still give a reasonable estimate of the output.
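As a minimal sketch, the line can be fitted in NumPy for a single input feature. The data points are made up (roughly following $y = 2x + 1$), and this uses the closed-form least-squares solver `np.linalg.lstsq` rather than the gradient descent described below. A column of ones is placed next to the inputs so that the bias $w_0$ is learned as the weight on a constant input of 1.

```python
import numpy as np

# Hypothetical training data that roughly follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: a column of ones (for the bias w_0) next to the inputs.
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution: the weights minimizing the sum of squared errors.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
w0, w1 = w
print(f"y_hat = {w0:.2f} + {w1:.2f} * x")

# Predict an input that was not in the training data.
x_new = 5.0
y_new = w0 + w1 * x_new
print(f"prediction at x = {x_new}: {y_new:.2f}")
```

For this data the fit gives $w_0 = 1.04$ and $w_1 = 1.99$, so the prediction for the unseen input $x = 5$ is about 10.99.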



### **Logistic Regression**

Logistic regression is another way we model relationships in data. While linear regression fits a line so that the program can predict a continuous output for future inputs, logistic regression is used to predict discrete categories: it outputs a probability between 0 and 1, which can then be thresholded (for example, at 0.5) to assign a class. This is useful for classification problems, where each piece of data must be assigned to a class based on its characteristics.

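The formula for logistic regression applies the sigmoid function, $\frac{1}{1+e^{-z}}$, to a weighted sum of the inputs, $z = w_0 + \sum_{i=1}^{n} w_i x_i$, giving the probability $P$ that an input belongs to the positive class: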

$P=\frac{1}{1+e^{-(w_0 + \sum_{i=1}^{n}w_ix_i)}}$
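The formula can be evaluated directly. In the sketch below, the weights are hypothetical values (in practice they are learned from data), the helper name `predict_proba` is chosen for this example, and a threshold of 0.5, a common but not mandatory choice, turns the probability $P$ into a discrete class label.

```python
import numpy as np

def predict_proba(x, w0, w):
  # P = 1 / (1 + e^-(w0 + sum_i w_i * x_i))
  z = w0 + np.dot(w, x)
  return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters for two input features.
w0 = -1.0
w = np.array([2.0, -0.5])

for x in (np.array([1.0, 0.0]), np.array([0.0, 2.0])):
  p = predict_proba(x, w0, w)
  label = int(p >= 0.5)  # threshold the probability to get a class
  print(x, round(p, 3), label)
```

The first input gives a weighted sum of $1$ and $P \approx 0.731$ (class 1); the second gives $-2$ and $P \approx 0.119$ (class 0).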



### **Gradient**

The gradient of a function is the vector of its partial derivatives with respect to each input variable. It points in the direction of greatest increase, and its magnitude tells us how steep that increase is, or essentially "how steep is the slope at point x." At a local maximum or minimum, the gradient is zero.
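For example, for $f(x, y) = x^2 + 3y^2$ the gradient is $\nabla f = (2x, 6y)$: it grows as we move away from the origin and is zero at the minimum $(0, 0)$. The sketch below (the function is made up for illustration, and `numerical_gradient` is a hypothetical helper) estimates each partial derivative numerically with central finite differences, which can be compared against that exact result.

```python
import numpy as np

def f(v):
  # Example function f(x, y) = x^2 + 3y^2, which has its minimum at (0, 0).
  x, y = v
  return x ** 2 + 3 * y ** 2

def numerical_gradient(func, v, h=1e-5):
  # Approximate each partial derivative with a central difference:
  # (func(v + h * e_i) - func(v - h * e_i)) / (2 * h)
  v = np.asarray(v, dtype=float)
  grad = np.zeros_like(v)
  for i in range(v.size):
    step = np.zeros_like(v)
    step[i] = h
    grad[i] = (func(v + step) - func(v - step)) / (2 * h)
  return grad

print(numerical_gradient(f, [1.0, 2.0]))  # close to the exact gradient (2x, 6y) = (2, 12)
print(numerical_gradient(f, [0.0, 0.0]))  # zero at the minimum
```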

### **Gradient Descent**

To optimize our regression function, our goal is to minimize the error between our guess and the actual result. This involves trying different weight values to find the optimal set, which occurs at the minimum of the loss function, where its gradient is zero. Rather than searching blindly, we can use gradient descent. We begin with random starting weights, compute the model's outputs, and compare them with the true outputs using a loss function. We then compute the gradient of the loss with respect to the weights and move the weights a small step in the direction of the negative gradient, $w \leftarrow w - \alpha \nabla L(w)$, where $\alpha$ is the learning rate. Because each step is proportional to the gradient, the algorithm takes large strides while the weights are very inaccurate and smaller ones as it gets closer to the optimum. Repeating this, the weights converge toward a minimum of the loss (for a convex loss, such as mean squared error in linear regression, that minimum is the global one).
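The loop below is a minimal sketch of gradient descent for a one-input linear regression, assuming mean squared error as the loss, a fixed learning rate `alpha` (a name chosen here), and made-up data generated exactly from $y = 2x + 1$. It starts from zero rather than random weights so the result is reproducible.

```python
import numpy as np

# Hypothetical data generated exactly from y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

w0, w1 = 0.0, 0.0   # starting weights (random values would also work)
alpha = 0.05        # learning rate

for step in range(2000):
  y_hat = w0 + w1 * x
  error = y_hat - y
  # Gradient of the loss L = mean(error^2) with respect to w0 and w1.
  grad_w0 = 2 * np.mean(error)
  grad_w1 = 2 * np.mean(error * x)
  # Step in the direction of the negative gradient.
  w0 -= alpha * grad_w0
  w1 -= alpha * grad_w1

print(round(w0, 3), round(w1, 3))  # approaches 1.0 and 2.0
```

Because the update is `alpha` times the gradient, the first steps are large and later ones shrink as the gradient approaches zero near the minimum. If `alpha` were much larger (here, above roughly 0.15), the updates would overshoot and diverge instead.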

Three popular forms of gradient descent are batch, mini-batch, and stochastic gradient descent. Batch gradient descent uses the entire training set for each step, averaging the error over all inputs to calculate the gradient. Mini-batch gradient descent operates similarly, but on small subsets (mini-batches) of the data. Stochastic gradient descent randomly chooses a single example and uses it alone to compute the gradient and update the weights. Keras's `model.fit`, used later, performs mini-batch updates with a default batch size of 32.
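The three variants differ only in how many examples feed each update. In this sketch, which assumes a one-input linear regression with mean squared error and made-up data from $y = 2x + 1$, a hypothetical `batch_size` parameter selects the variant: the full dataset size gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent. The data is reshuffled every epoch.

```python
import numpy as np

def train(x, y, batch_size, alpha=0.03, epochs=1000, seed=0):
  # Fit y_hat = w0 + w1 * x by gradient descent on mean squared error.
  # batch_size == len(x): batch GD; batch_size == 1: stochastic GD;
  # anything in between: mini-batch GD.
  rng = np.random.default_rng(seed)
  w0, w1 = 0.0, 0.0
  for _ in range(epochs):
    order = rng.permutation(len(x))  # shuffle once per epoch
    for start in range(0, len(x), batch_size):
      idx = order[start:start + batch_size]
      error = (w0 + w1 * x[idx]) - y[idx]
      w0 -= alpha * 2 * np.mean(error)
      w1 -= alpha * 2 * np.mean(error * x[idx])
  return w0, w1

# Hypothetical data generated exactly from y = 2x + 1.
x = np.linspace(0.0, 4.0, 20)
y = 2 * x + 1

for name, bs in [("batch", len(x)), ("mini-batch", 5), ("stochastic", 1)]:
  w0, w1 = train(x, y, bs)
  print(f"{name:>10}: w0={w0:.3f}, w1={w1:.3f}")
```

All three should end close to $w_0 = 1$ and $w_1 = 2$. Smaller batches make more (noisier) updates per pass over the data; with noisy real data, stochastic updates keep jittering around the minimum unless the learning rate is reduced.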


# **3. Building a Model**


Building a model first involves preparing two sets of data (here, images) for the model to use. The first is a training set, used to actually train the model so it learns the values needed to fit the data. The second is a validation set, used during and after training to check how well the model performs on data it has not been trained on. A common choice is to split the initial data 80:20 between the two sets. (The code below uses the test set that ships with Fashion-MNIST as its validation data.)
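An 80:20 split can be done by shuffling the indices and cutting at 80%. This is a NumPy sketch; the function name `train_validation_split` and the random stand-in arrays are hypothetical, used only to show the mechanics.

```python
import numpy as np

def train_validation_split(images, labels, train_fraction=0.8, seed=0):
  # Shuffle the indices so both sets come from the same distribution,
  # then cut at train_fraction (0.8 gives an 80:20 split).
  rng = np.random.default_rng(seed)
  order = rng.permutation(len(images))
  cut = int(len(images) * train_fraction)
  train_idx, val_idx = order[:cut], order[cut:]
  return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Hypothetical stand-in data: 100 random 28x28 "images" with labels 0-9.
images = np.random.default_rng(1).random((100, 28, 28))
labels = np.arange(100) % 10

(train_x, train_y), (val_x, val_y) = train_validation_split(images, labels)
print(train_x.shape, val_x.shape)  # (80, 28, 28) (20, 28, 28)
```

Shuffling before splitting matters when the data is stored in some order (for example, sorted by class); otherwise the validation set might contain classes the model never trained on.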

Original source code for snippets: https://github.com/schneider128k/machine_learning_course/blob/master/fashion_items_classification_dense_layers_2.ipynb

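```python
# Assumes TensorFlow 2.x. The original Colab notebook selected it with the
# `%tensorflow_version 2.x` magic, which only works inside Colab.
import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Scale pixel values from 0-255 down to 0-1, which usually helps training.
train_images = train_images / 255.0
test_images = test_images / 255.0

print("Train Images Size:", train_images.shape)
print("Train Labels Size:", train_labels.shape)
print("Test Images Size:", test_images.shape)
print("Test Labels Size:", test_labels.shape)
```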

Now that we have loaded our dataset, it is time to build our model. This involves creating layers of neurons, each with its own activation function. At minimum we need an input and an output layer; the number and size of the hidden layers in between are design choices tuned to the problem.

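```python
def build():
  model = tf.keras.Sequential([
      # 28x28 image -> vector of 784 pixel values
      tf.keras.layers.Flatten(input_shape=(28, 28)),
      # hidden layer: 128 fully connected neurons with ReLU
      tf.keras.layers.Dense(128, activation=tf.nn.relu),
      # output layer: one probability per clothing class (10 classes)
      tf.keras.layers.Dense(10, activation=tf.nn.softmax)
  ])
  return model
```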

To understand some of the terminology in the code above:

*   Sequential: Initializes the model as a linear stack of layers, where each layer's output feeds the next
*   Flatten: Reshapes each input (here, a 28x28 image) into a 1D array of 784 values, keeping the batch dimension
*   Dense: A fully connected layer, in which every neuron receives every output of the previous layer; it has a given number of neurons and applies a given activation function
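To make the shapes concrete, here is a NumPy sketch of the arithmetic the `build()` model performs on a batch of images: Flatten, a 128-neuron ReLU Dense layer, and a 10-neuron softmax Dense layer. The weights are random placeholders rather than trained values, and this illustrates the math, not how Keras implements it internally.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((4, 28, 28))   # 4 hypothetical 28x28 images

# Flatten: (4, 28, 28) -> (4, 784); the batch dimension is kept.
flat = batch.reshape(len(batch), -1)

# Dense(128, relu): every one of the 784 inputs connects to all 128 neurons.
W1 = rng.normal(0, 0.05, (784, 128))
b1 = np.zeros(128)
hidden = np.maximum(flat @ W1 + b1, 0)

# Dense(10, softmax): 10 scores per image, turned into probabilities.
W2 = rng.normal(0, 0.05, (128, 10))
b2 = np.zeros(10)
scores = hidden @ W2 + b2
exp = np.exp(scores - scores.max(axis=1, keepdims=True))  # subtract max for stability
probs = exp / exp.sum(axis=1, keepdims=True)

print(flat.shape, hidden.shape, probs.shape)  # (4, 784) (4, 128) (4, 10)
print(probs.sum(axis=1))                      # each row sums to 1
```

Counting the arrays shows where a Dense layer's parameters come from: the first has 784 x 128 weights plus 128 biases (100,480 parameters), and the second has 128 x 10 + 10 = 1,290.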



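Notice that the two Dense layers use different activations. Each layer can use its own activation function, and each function produces a particular kind of output, so the choice depends on the layer's role and the type of problem. Here the hidden layer uses ReLU, while the output layer uses softmax to turn its 10 outputs into probabilities over the 10 clothing classes.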

Some specific types of activation functions are:

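* elu: Returns x if x > 0, alpha * (exp(x) - 1) if x <= 0
* relu: Returns max(x, 0) for each element of the tensor
* sigmoid: Returns 1 / (1 + exp(-x))
* linear: Returns the given input tensor unchanged
* softmax: Returns exp(x_i) / sum_j exp(x_j), turning a vector of scores into probabilities that sum to 1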
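These activations can be written directly in NumPy. This is a sketch for illustration; `alpha=1.0` for ELU is an assumed default, and softmax subtracts the maximum score first, a standard trick that avoids overflow without changing the result.

```python
import numpy as np

def elu(x, alpha=1.0):
  # x for x > 0, alpha * (exp(x) - 1) otherwise
  return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def relu(x):
  # max(x, 0), element-wise
  return np.maximum(x, 0)

def sigmoid(x):
  # 1 / (1 + exp(-x)), squashes values into (0, 1)
  return 1 / (1 + np.exp(-x))

def linear(x):
  # identity: returns the input unchanged
  return x

def softmax(x):
  # exp(x_i) / sum_j exp(x_j); subtracting the max avoids overflow
  e = np.exp(x - np.max(x))
  return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
for f in (elu, relu, sigmoid, linear, softmax):
  print(f.__name__, np.round(f(x), 3))
```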

For more information about activation functions:  https://keras.io/activations/



# **4. Compiling a Model**

Once we have created our model, it is time to compile it. In Keras, this involves passing in an optimizer and a loss function (and, optionally, metrics).



### **Optimizers**

An optimizer determines how the model's weights are updated based on the loss function's results, specifically the gradient of the loss with respect to the weights.

Some types of optimizer functions include:

*   Stochastic Gradient Descent
*   Adam
*   Adagrad

More types of optimizer functions (in keras): https://keras.io/optimizers/
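To show how update rules differ, the sketch below minimizes the made-up function $f(w) = (w - 3)^2$ with plain gradient descent and with Adam, both written out by hand in NumPy. The Adam loop follows the standard Adam update with its commonly used defaults ($\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$); the learning rates and step counts are arbitrary choices for this example. In practice you would pass an optimizer such as `'sgd'` or `'adam'` to `model.compile` instead of writing these loops.

```python
import numpy as np

def grad(w):
  # Derivative of f(w) = (w - 3)^2, which is minimized at w = 3.
  return 2 * (w - 3)

def sgd(w, lr=0.1, steps=100):
  for _ in range(steps):
    w -= lr * grad(w)                       # fixed multiple of the gradient
  return w

def adam(w, lr=0.1, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
  m, v = 0.0, 0.0
  for t in range(1, steps + 1):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g         # running mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2    # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)            # bias corrections for the
    v_hat = v / (1 - beta2 ** t)            # zero-initialized averages
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # step scaled per parameter
  return w

print(round(sgd(0.0), 4), round(adam(0.0), 4))  # both approach 3.0
```

Plain gradient descent's step size depends directly on the gradient's magnitude, while Adam divides by a running estimate of that magnitude, so its early steps are roughly `lr` in size regardless of how steep the function is.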

### **Loss Functions**

A loss function tells us how bad our network's guesses were. In training, we want this value minimized (however, fitting the training data too closely leads to overfitting, which is discussed further on).

Some types of loss functions include:

*   Mean squared error
*   Squared hinge
*   Categorical Cross-entropy
*   Binary Cross-entropy

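These functions compare the true label, y, with the model's prediction, a. The learning rate, $\alpha$, is not an input to the loss function; it is a hyperparameter of the optimizer that controls how large a step the weights take on each update, and therefore how slowly or quickly the model learns. The goal in choosing $\alpha$ is a value that is not too large (the updates may jump over the minimum we are trying to find) and not too small (the model will take a very long time to reach the minimum).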
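The effect of $\alpha$ is easy to see on a simple loss such as $L(w) = w^2$, whose gradient is $2w$ and whose minimum is at $w = 0$. This sketch (the three $\alpha$ values and the helper name `descend` are arbitrary choices) runs 20 gradient-descent steps from $w = 1$ with each learning rate.

```python
def descend(alpha, w=1.0, steps=20):
  # Gradient descent on L(w) = w^2, whose gradient is 2w.
  for _ in range(steps):
    w -= alpha * 2 * w
  return w

for alpha in (0.001, 0.1, 1.1):
  print(alpha, descend(alpha))
```

Each step multiplies $w$ by $(1 - 2\alpha)$: with $\alpha = 0.001$ the weight has barely moved after 20 steps (about 0.96), with $\alpha = 0.1$ it is close to the minimum (about 0.012), and with $\alpha = 1.1$ every step overshoots and the weight grows (to about 38) instead of converging.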

More types of loss functions (in Keras): https://keras.io/losses/

The following snippet shows example implementations of the "binary cross-entropy" and "mean squared error" loss functions, written in NumPy.

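```python
import numpy as np

def binary_cross_entropy(y, a, eps=1e-7):
  # y: true labels (0 or 1); a: predicted probabilities.
  # Clip a to avoid log(0), and use the natural log, as Keras does.
  a = np.clip(a, eps, 1 - eps)
  return np.mean(-(y * np.log(a)) - ((1 - y) * np.log(1 - a)))

def mean_squared_error(y, a):
  # Mean of the squared differences (some texts add a factor of 1/2
  # to simplify the derivative).
  return np.mean((a - y) ** 2)
```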

```python
def compile(model):
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
  return model
```

You can also pass in metrics, which are how your model's performance is judged. Metrics are computed and reported like the loss, but are not used to train the model.
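The `compile` function above uses sparse categorical cross-entropy. It computes the same value as categorical cross-entropy, but takes integer class labels (as Fashion-MNIST provides) instead of one-hot vectors. This NumPy sketch, with made-up predicted probabilities, shows that the two agree.

```python
import numpy as np

# Hypothetical softmax outputs for 2 images over 3 classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
int_labels = np.array([0, 2])        # integer labels ("sparse")
one_hot = np.eye(3)[int_labels]      # the same labels as one-hot vectors

# Categorical cross-entropy: -sum(y * log(p)) per example, then averaged.
categorical = np.mean(-np.sum(one_hot * np.log(probs), axis=1))

# Sparse version: pick out the probability of the true class directly.
sparse = np.mean(-np.log(probs[np.arange(len(int_labels)), int_labels]))

print(categorical, sparse)  # same value, about 0.434
```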

# **5. Training a Model**


Now that we have compiled the model, it is time to train it to fit our data. This involves running through the training data and performing gradient descent for a given number of epochs, where one epoch is one full pass over the training set. The goal in choosing the number of epochs is to fit the data well, but not too closely. We want to avoid the following issues:

**Overfitting:**

![Overfitting](https://miro.medium.com/max/1000/1*Di7rY6ALXtkhlmlcKRSCoA.png)

Overfitting occurs when the model learns the training data so closely, including its noise, that it essentially mimics the dataset's shape instead of capturing the underlying pattern. This is a problem because the model will not be ready to handle new data. It is the equivalent of showing a child a piece of code. Can they copy it down and create an exact copy? Yes. Can they write something else, or rewrite the code so that it works for a similar problem with a different structure? No.


**Underfitting:**

![Underfitting](https://miro.medium.com/max/1000/1*kZfqaD6hl9iYGYXkMwV-JA.png)


Underfitting occurs when the model essentially fails to learn. If we graphed our inputs against their true outputs, the model's predictions would not follow a similar shape. This can occur if the model is not complex enough to represent the function underlying the dataset, or if it has not been trained for enough epochs.

Source for images: https://towardsdatascience.com/overfitting-vs-underfitting-a-complete-example-d05dd7e19765
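The same effect can be reproduced numerically. The sketch below uses made-up noisy samples of a sine curve and polynomial degrees chosen only for illustration. It fits a too-simple model (degree 1), a reasonable one (degree 3), and a very flexible one (degree 9) with `np.polyfit`, then compares the error on the 10 training points with the error on 100 held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(x):
  # Hypothetical data: y = sin(2*pi*x) plus Gaussian noise.
  return np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

x_train = np.linspace(0, 1, 10)
y_train = noisy_sine(x_train)
x_test = rng.uniform(0, 1, 100)
y_test = noisy_sine(x_test)

def rmse(coeffs, x, y):
  # Root-mean-square error of the polynomial's predictions.
  return np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 3, 9):
  coeffs = np.polyfit(x_train, y_train, degree)
  print(f"degree {degree}: train RMSE {rmse(coeffs, x_train, y_train):.3f}, "
        f"test RMSE {rmse(coeffs, x_test, y_test):.3f}")
```

Training error can only go down as the degree rises (the degree-9 curve passes through all 10 training points), while the held-out error typically rises again for the most flexible fit: the degree-1 line underfits and the degree-9 curve overfits.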


```python
model = build()
model = compile(model)

history = model.fit(train_images, train_labels, epochs=25, validation_data=(test_images, test_labels))
```

# **6. Finetuning a pretrained model**


When facing a new problem that requires a model, one should first check whether a previously made model can be used, whether it is one's own or an open-source one. Even when problems cover different topics, the underlying structure and methodology, such as the features learned by an image model, can often be shared between models.

For example, one can obtain a premade convolutional neural network base and finetune it to work with new data. To start, obtain a pretrained model and freeze all of its layers so that its pretrained information is not lost. Next, design a classifier and attach it to the base. Once it is attached, train the classifier while watching for overfitting. Training only the newly added outermost layers, with the base frozen, helps prevent overfitting, because far fewer weights are being fit to the new (often small) dataset.

After one has created and trained the new model, it should be able to solve the initial problem with some degree of success. Depending on how satisfactory the results are, one may go back and modify parameters such as the number of epochs, the batch size, the number of neurons, and the learning rate. With pretrained models, one can also try unfreezing different layers and adding data augmentation to the training dataset. Data augmentation modifies selected data so that it differs in some way from its original state while still producing the same classification output. For example, in image classification one can resize images or change the color filter. The overall goal is to produce new pieces of data that are "similar, yet different."
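At its simplest, augmentation is a transformation of the input that leaves the label unchanged. The NumPy sketch below uses a made-up random array as the image and applies three such transformations by hand: a horizontal flip, a small shift, and a brightness change. The specific operations and amounts are arbitrary choices for illustration; Keras provides its own utilities for this.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((150, 150, 3))   # hypothetical RGB image, values in [0, 1]
label = 1                           # its class does not change below

flipped = image[:, ::-1, :]                  # mirror left-right
shifted = np.roll(image, shift=10, axis=1)   # move 10 pixels right (wraps around the edge)
brighter = np.clip(image * 1.2, 0.0, 1.0)    # raise brightness, stay within [0, 1]

augmented = [(flipped, label), (shifted, label), (brighter, label)]
for img, lbl in augmented:
  print(img.shape, lbl)   # same shape and label, different pixels
```

Each transformed copy keeps the original label, so a small dataset can be turned into a larger, more varied one.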

For original code source, as well as the ability to view a full example of finetuning: https://drive.google.com/file/d/10U3mokqzeJPUNWRYFNMK7ZgjHhlav3K5/view?usp=sharing

## **Example:**


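```python
# These imports target Keras 2 (bundled with TensorFlow 2.x before 2.16);
# newer Keras versions deprecate or remove some of these APIs, such as ImageDataGenerator.
from keras.applications import Xception
from keras import layers
from keras import models
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator
```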

Here, we chose Xception, a convolutional neural network base that Keras provides with weights pretrained on ImageNet.

When creating the Sequential model, we add the Xception base, then a Flatten layer, a Dense layer with ReLU activation, and a Dense output layer with sigmoid activation.

While this does not directly change the pretrained conv_base, it helps improve performance because we can add layers specific to our use case, whether we are doing binary classification or a different classification problem.

```python
conv_base = Xception(
    weights='imagenet',      # load weights pretrained on ImageNet
    include_top=False,       # drop Xception's own classifier; we attach our own
    input_shape=(150, 150, 3))

conv_base.summary()
```

Before adding any extra layers, freeze the convolutional base, since we do not want its pretrained weights to be modified yet.

```python
conv_base.trainable = False
```

Now we can add our layers on top of this base. After a Flatten layer converts the base's output into a vector, we add two Dense layers: a hidden layer of 256 neurons using the "relu" activation function, which feeds into a single-neuron output layer with a sigmoid activation function. The sigmoid output is a probability between 0 and 1, suited to binary classification.

```python
model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
```

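Before compiling and training the model, the following snippet shows how data augmentation works. For our training dataset, we rescale pixel values to [0, 1] and apply random rotations, shifts, shears, zooms, and horizontal flips before passing the images to our network; all images are resized to 150x150. The validation data is only rescaled, not augmented.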

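```python
# Training images: rescale to [0, 1] and apply random augmentations.
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,         # rotate by up to 40 degrees
    width_shift_range=0.2,     # shift horizontally by up to 20% of the width
    height_shift_range=0.2,    # shift vertically by up to 20% of the height
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'        # fill newly exposed pixels from the nearest pixel
)

# train_dir and validation_dir are not defined in this excerpt; each must point
# to a directory containing one subdirectory per class.
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),    # resize every image to 150x150
    batch_size=20,
    class_mode='binary')

# Validation images are only rescaled, never augmented.
validation_datagen = ImageDataGenerator(rescale=1./255)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')
```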

Now we can compile and train as normal.

```python
model.compile(
    loss='binary_crossentropy',
    optimizer=optimizers.RMSprop(learning_rate=2e-5),  # `lr` is deprecated
    metrics=['acc'])
```

```python
# model.fit accepts generators directly (fit_generator is deprecated in TensorFlow 2.x).
history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50
)
```

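After this first round of training, we can begin finetuning. To start, unfreeze some of the layers in the convolutional base: here, every layer from a chosen layer onward, while the earlier layers stay frozen.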

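```python
conv_base.trainable = True

# Unfreeze every layer from 'conv2d_4' onward; earlier layers stay frozen.
# Check the names printed by conv_base.summary() first: auto-generated names
# like 'conv2d_4' can vary between sessions, and if no layer matches,
# every layer stays frozen.
set_trainable = False
for layer in conv_base.layers:
  if layer.name == 'conv2d_4':
    set_trainable = True
  if set_trainable:
    layer.trainable = True
  else:
    layer.trainable = False
```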

From there, recompile the model (changes to `trainable` only take effect after compiling) with a lower learning rate, so that the pretrained weights are adjusted only slightly, and retrain it. Based on the model's results, one can go back and repeat this process.

```python
model.compile(
    loss='binary_crossentropy',
    optimizer=optimizers.RMSprop(learning_rate=1e-5),  # lower rate for finetuning
    metrics=['acc'])

history = model.fit(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)
```