# Dense & Deep Neural Networks

So far we have only covered simple parametric models, and in particular the single neuron model. Neural networks are a direct extension of these concepts: instead of a single neuron, a neural network contains multiple layers of interconnected neurons. Each hidden layer receives its input from the previous layer, and the output of the final layer is the output of the network as a whole. The design is loosely inspired by the human brain, and it is certainly a bit more realistic than a single neuron model!

Neural networks work for both regression and classification tasks. In particular, they are a great choice for data that isn't linearly separable (even when the decision boundary looks a little wonky :)), and with enough capacity they can fit almost any training set (although we may run into the problem of overfitting). Neural networks can also perform dimensionality reduction: they can compress high-dimensional inputs into a lower-dimensional representation that is easier to work with.

# How does a neural network actually work?
Each individual node acts as its own single neuron model, with its own weights, bias, and output. This time, however, we are dealing with an entire layer of nodes. For each neuron, the pre-activation $z$ is computed exactly as before: $z = b + w_1x_1 + w_2x_2 + \dots + w_nx_n$. The neuron then applies its activation function to $z$, and that scalar becomes one of the inputs to the next layer of neurons; passing signals forward like this is called the *feed-forward* phase. We repeat this process layer by layer until we reach the output. Passing a signal through these layers builds a highly composite function that approximates a target function $f$.
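A minimal NumPy sketch of the feed-forward phase, assuming sigmoid activations in every layer and a hypothetical 3-4-2 network with random weights (none of these choices come from the model used later). Each row of `W` holds one neuron's weights, so `W @ x + b` computes $z = b + w_1x_1 + \dots + w_nx_n$ for every neuron in the layer at once.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_forward(x, W, b, activation=sigmoid):
    """One layer: each neuron computes z = b + w . x, then applies the activation.

    x: (n_inputs,), W: (n_neurons, n_inputs), b: (n_neurons,)
    """
    z = W @ x + b
    return activation(z)

def feed_forward(x, layers):
    """Pass the signal through each (W, b) layer in turn;
    each layer's output becomes the next layer's input."""
    a = x
    for W, b in layers:
        a = dense_forward(a, W, b)
    return a

rng = np.random.default_rng(0)
# Hypothetical 3-4-2 network with random weights, for illustration only.
layers = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer: 4 neurons, 3 inputs each
    (rng.normal(size=(2, 4)), np.zeros(2)),  # output layer: 2 neurons, 4 inputs each
]
x = np.array([0.5, -1.0, 2.0])
out = feed_forward(x, layers)
print(out.shape)  # (2,)
```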

The Universal Approximation Theorem summarizes why this works. Informally, it states that a feed-forward network with at least one hidden layer of nonlinear units (e.g., sigmoid units) and a linear output layer can approximate any "reasonable" (for example, continuous) function on a closed, bounded input domain to arbitrary accuracy, given enough hidden units. Without this nonlinearity, the model would reduce to linear regression: a stack of linear layers is itself just one linear function, so it could not learn complex relationships between points. The sigmoid function is one option; other common activation functions include tanh, ReLU, and Maxout.
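To see why the nonlinearity matters, note that two linear layers compose into one: $W_2(W_1x + b_1) + b_2 = (W_2W_1)x + (W_2b_1 + b_2)$. The NumPy sketch below checks this numerically with hypothetical random weights, then defines the activation functions named above. The `maxout` helper is a simplified illustration that assumes its input length is a multiple of the group size `k`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)

# Two stacked layers with NO activation function (hypothetical random weights).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# The same map written as a single linear layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_linear_layer = W @ x + b
print(np.allclose(two_linear_layers, one_linear_layer))  # True

# Common nonlinear activations.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def maxout(z, k):
    """Split the pre-activations into groups of k and keep each group's maximum."""
    return z.reshape(-1, k).max(axis=1)
```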

The next big step in the neural network algorithm is training with *backpropagation*. Backpropagation is the method for computing the gradient of the cost function with respect to every weight and bias: errors from the output are sent backward through the network, layer by layer, using the chain rule. We then update each parameter by stepping against its gradient, scaled by a learning rate. This iteration continues until the cost stops decreasing meaningfully (ideally near a minimum), so this is just a large-scale version of how we trained our single neuron model!
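The sketch below trains a tiny network with backpropagation and gradient descent in NumPy. It assumes several choices that are not taken from this notebook: the XOR dataset, 4 sigmoid hidden units, a sigmoid output, a mean squared error cost, and a learning rate of 1.0. `d_out` is the error at the output layer; multiplying it by `W2.T` is the step that sends the error backward to the hidden layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a tiny dataset that is not linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
params = {
    "W1": rng.normal(size=(2, 4)), "b1": np.zeros(4),  # hidden layer, 4 units
    "W2": rng.normal(size=(4, 1)), "b2": np.zeros(1),  # output layer, 1 unit
}

def forward(params, X):
    h = sigmoid(X @ params["W1"] + params["b1"])
    out = sigmoid(h @ params["W2"] + params["b2"])
    return h, out

def loss(params, X, y):
    """Mean squared error cost."""
    _, out = forward(params, X)
    return np.mean((out - y) ** 2)

def backprop(params, X, y):
    """Gradients of the MSE cost w.r.t. every weight and bias, via the chain rule."""
    h, out = forward(params, X)
    n = X.shape[0]
    d_out = 2.0 * (out - y) / n * out * (1 - out)      # dC/dz at the output layer
    d_hidden = (d_out @ params["W2"].T) * h * (1 - h)  # error sent back to the hidden layer
    return {
        "W2": h.T @ d_out, "b2": d_out.sum(axis=0),
        "W1": X.T @ d_hidden, "b1": d_hidden.sum(axis=0),
    }

learning_rate = 1.0
initial_loss = loss(params, X, y)
for _ in range(5000):
    grads = backprop(params, X, y)
    for name in params:
        params[name] -= learning_rate * grads[name]  # gradient descent step
final_loss = loss(params, X, y)
print(initial_loss, final_loss)
```

How close the final loss gets to zero depends on the random initialization and the learning rate; the point here is the backward pass and the update rule.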


Neural networks are extremely powerful, but they also have their own limitations. For one, they are computationally expensive to train, and they have been criticized for requiring large amounts of training data before they are suitable for real-world use. Because they have so many parameters, they are also prone to overfitting and require careful hyperparameter tuning. Still, they remain a powerful tool for solving ML problems.
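One common guard against overfitting is dropout, which the model in the Application section uses via `tf.keras.layers.Dropout(0.2)`. Below is a minimal NumPy sketch of *inverted* dropout, the behavior the Keras documentation describes for that layer (during training, dropped units become 0 and survivors are scaled by `1 / (1 - rate)`). The function name and arguments are our own illustration, not the Keras API.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected activation is
    unchanged; at inference time, pass the activations through untouched."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(100_000)
dropped = dropout(h, rate=0.2, training=True, rng=rng)
print((dropped == 0).mean())  # roughly 0.2 of the units are zeroed
print(dropped.mean())         # roughly 1.0, the same as h.mean()
```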


Neural networks are used across many industries, including medical image classification, financial prediction, chemical compound identification, and many, many more. They are a huge part of supervised learning. In this notebook, we will use the TensorFlow package to train a small dense network that classifies handwritten digits from the MNIST dataset.

# Application

```python
import tensorflow as tf
print("TensorFlow version:", tf.__version__)
```

```
TensorFlow version: 2.13.0
```


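```python
# Load the MNIST handwritten-digit dataset: 28x28 grayscale images, labels 0-9
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from the integer range [0, 255] to floats in [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
```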

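```python
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),  # 28x28 image -> vector of 784 inputs
  tf.keras.layers.Dense(128, activation='relu'),  # hidden layer of 128 ReLU neurons
  tf.keras.layers.Dropout(0.2),                   # randomly zero 20% of hidden outputs during training
  tf.keras.layers.Dense(10)                       # one raw score (logit) per digit class, no activation
])
```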
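As a worked example of how these layers translate into weights, here is the count of trainable parameters. A `Dense` layer with $n$ inputs and $m$ units has an $n \times m$ weight matrix plus $m$ biases, while `Flatten` and `Dropout` have no trainable weights; `model.summary()` should report the same total.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs x n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

flatten_outputs = 28 * 28                    # Flatten turns a 28x28 image into 784 values
hidden = dense_params(flatten_outputs, 128)  # 784*128 + 128 = 100,480
output = dense_params(128, 10)               # 128*10 + 10 = 1,290
total = hidden + output                      # Flatten and Dropout add nothing
print(total)  # 101770
```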

```python
# Untrained model: raw scores (logits) for the first training image
predictions = model(x_train[:1]).numpy()
predictions
```

```
array([[ 1.1483834 , -0.50336653,  0.22717337,  0.15045643,  0.722004  ,
         0.12627143, -0.42363054,  0.21585673,  0.7473945 , -0.2706663 ]],
      dtype=float32)
```

```python
# Convert the logits into class probabilities
tf.nn.softmax(predictions).numpy()
```

```
array([[0.22302099, 0.04275627, 0.08877064, 0.08221509, 0.14560339,
        0.08025058, 0.04630509, 0.08777171, 0.14934768, 0.05395855]],
      dtype=float32)
```
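The softmax maps a vector of logits $z$ to probabilities $p_k = e^{z_k} / \sum_j e^{z_j}$. A minimal NumPy version is below; applied to the logits printed earlier, it should reproduce the `tf.nn.softmax` output up to float32 rounding. Subtracting the maximum logit first is a standard trick to avoid overflow in `exp` and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Logits copied from the untrained model's output above.
logits = np.array([[1.1483834, -0.50336653, 0.22717337, 0.15045643, 0.722004,
                    0.12627143, -0.42363054, 0.21585673, 0.7473945, -0.2706663]])
probs = softmax(logits)
print(probs.round(4))
print(probs.sum())
```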

```python
# Cross-entropy loss that takes integer labels and raw logits (not probabilities)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
```

```python
loss_fn(y_train[:1], predictions).numpy()
```

```
2.5226011
```
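For a single example with true class $y$, sparse categorical cross-entropy on logits is $-\log p_y = \log \sum_j e^{z_j} - z_y$. The sketch below recomputes the loss above by hand, assuming the first training label is 5 (it is in the standard Keras MNIST split, and that label is consistent with the printed value). For comparison, a model that assigns probability 1/10 to every class scores $-\ln(0.1) \approx 2.30$, so an untrained model's loss should land roughly in that neighborhood.

```python
import numpy as np

def cross_entropy_from_logits(label, logits):
    """-log(softmax(logits)[label]) for one example, computed with the
    log-sum-exp trick for numerical stability."""
    m = np.max(logits)
    log_norm = m + np.log(np.sum(np.exp(logits - m)))
    return log_norm - logits[label]

logits = np.array([1.1483834, -0.50336653, 0.22717337, 0.15045643, 0.722004,
                   0.12627143, -0.42363054, 0.21585673, 0.7473945, -0.2706663])
label = 5  # assumption: y_train[0] is 5 in the standard Keras MNIST split
ce = cross_entropy_from_logits(label, logits)
print(ce)
print(-np.log(1 / 10))  # loss of a uniform guess over 10 classes
```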

```python
# Adam optimizer, the loss defined above, and accuracy reported during training
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
```

```python
# Train for 5 passes (epochs) over the 60,000 training images
model.fit(x_train, y_train, epochs=5)
```

```
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

<keras.src.callbacks.History at 0x2ab03be9df0>
```
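`optimizer='adam'` selects Adam, a variant of gradient descent that keeps running averages of each parameter's gradient and squared gradient. The sketch below follows the update rule in Algorithm 1 of Kingma and Ba's Adam paper, with the same default `beta_1`, `beta_2`, and `epsilon` values as `tf.keras.optimizers.Adam` (Keras places `epsilon` slightly differently, so results will not match it exactly). The quadratic objective and the learning rate of 0.1 are hypothetical choices for illustration.

```python
import numpy as np

def adam_minimize(grad_fn, theta, steps, learning_rate=0.001,
                  beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Plain Adam update loop on parameters `theta`, given a gradient function."""
    m = np.zeros_like(theta)  # running mean of gradients (first moment)
    v = np.zeros_like(theta)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g ** 2
        m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta_2 ** t)
        theta = theta - learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta

# Hypothetical toy problem: minimize sum((theta - 3)^2), whose gradient is 2(theta - 3).
theta = adam_minimize(lambda th: 2 * (th - 3.0), np.zeros(2), steps=2000, learning_rate=0.1)
print(theta)  # close to 3 in both coordinates
```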

```python
# Loss and accuracy on the 10,000 held-out test images
model.evaluate(x_test,  y_test, verbose=2)
```

```
313/313 - 1s - loss: 0.0687 - accuracy: 0.9780 - 574ms/epoch - 2ms/step

[0.0687047615647316, 0.9779999852180481]
```
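The second number returned by `model.evaluate` is the test accuracy: about 97.8%, i.e. roughly 9,780 of the 10,000 test images classified correctly. Accuracy on logits is the fraction of examples whose highest-scoring class matches the label; because softmax preserves ordering, taking the argmax of the logits or of the probabilities gives the same prediction. A minimal NumPy sketch with hypothetical logits for four examples:

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    return np.mean(np.argmax(logits, axis=1) == labels)

# Hypothetical logits for 4 examples over 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.3, 0.2, 1.5],
                   [0.0, 3.0, 0.1],
                   [1.0, 0.9, 0.8]])
labels = np.array([0, 2, 1, 2])
print(accuracy_from_logits(logits, labels))  # 0.75 (the last example is misclassified)
```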