# Artificial Neural Network (ANN)

***Overview***

An artificial neural network (ANN) is a computational model inspired by the biological networks of neurons found in the brain. It consists of interconnected units ("neurons" or "nodes") that receive, process, and transmit information. ANNs can recognize intricate patterns, model complex relationships, and make predictions from data.

Connections between nodes are represented by "weights": floating-point values that signify the strength of each connection. The weights are adjusted during training as the model learns relevant patterns and optimizes its performance.

The flow of information through an ANN loosely mimics the way electrochemical signals propagate through brain tissue. It can be described as follows:

### Layered Structure ###

1. Input layer: receives raw data as input
2. Hidden layer(s): one or more intermediate layers that perform additional operations
3. Output layer: generates a final prediction
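As a rough illustration of this layered structure, the sketch below allocates the parameters that connect three layers. The layer sizes (4 inputs, 8 hidden neurons, 3 outputs) are hypothetical, chosen only because the Iris dataset has 4 features and 3 classes; the small random initialization and the per-layer bias vector (explained in the next section) are likewise assumptions, not a prescribed setup.

```python
import numpy as np

# Hypothetical architecture: input layer, one hidden layer, output layer.
layer_sizes = [4, 8, 3]

rng = np.random.default_rng(0)
params = []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # Each pair of adjacent layers is connected by a weight matrix
    # of shape (neurons in this layer, neurons in the previous layer).
    W = rng.normal(scale=0.1, size=(n_out, n_in))
    b = np.zeros(n_out)  # one bias value per neuron
    params.append((W, b))

weight_shapes = [W.shape for W, _ in params]
n_parameters = sum(W.size + b.size for W, b in params)
print(weight_shapes, n_parameters)
```

Note that the input layer itself has no parameters; only the connections into the hidden and output layers carry weights.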

**The Model**

The following describes the information processing that occurs in a single layer:

$$
O = f(W \cdot X + b)
$$

Where:

- *X*: the layer's input vector, whose dimensionality equals the number of neurons in the previous layer (or the number of input features, for the first layer)
- *W*: the weight matrix for the layer, whose values update as training occurs
- *b*: a vector of "bias" values that shift the activation function independently of the input, enabling it to better capture complex patterns and relationships in the data
- *f*: the activation function applied to the weighted sum of inputs plus bias
- *O*: the layer's output vector, which becomes the input to the next layer
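The single-layer formula above can be sketched in a few lines of NumPy. The function name `dense_forward`, the choice of ReLU for *f*, and all numeric values are illustrative assumptions rather than part of the original notebook.

```python
import numpy as np

def relu(z):
    """Example activation: negative values become 0."""
    return np.maximum(0.0, z)

def dense_forward(W, X, b, f):
    """Compute O = f(W . X + b) for a single layer."""
    return f(W @ X + b)

# Hypothetical layer with 3 inputs and 2 neurons.
W = np.array([[0.5, -1.0,  2.0],
              [1.5,  0.0, -0.5]])
X = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0])

O = dense_forward(W, X, b, relu)
print(O)  # [5. 0.]
```

The first neuron's weighted sum is 4.5 + 0.5 = 5.0, which ReLU passes through; the second is 0.0 - 1.0 = -1.0, which ReLU clips to 0.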

**Training Process**

During training, the network adjusts its weights and biases to minimize the error between its predictions and the target values, as measured by a loss function. The learning process typically uses **gradient descent** or one of its variants to update the weights:

$$
\text{New weight} = \text{Old weight} - \eta \cdot \frac{\partial L}{\partial W}
$$

Where $\eta$ is the learning rate, which controls the size of each update step.

The gradient of the loss function *L* with respect to the weight matrix *W*, typically computed by backpropagation, is written as

$$
\frac{\partial L}{\partial W}
$$
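To make the update rule concrete, here is a minimal sketch of gradient descent on a single layer. It assumes an identity activation and a squared-error loss $L = \tfrac{1}{2}\lVert W X + b - y \rVert^2$, for which the gradient with respect to *W* is the outer product of the residual and the input; the data, the target `y`, and the learning rate are hypothetical values chosen for illustration.

```python
import numpy as np

def predict(W, b, X):
    """Linear layer (identity activation assumed for simplicity)."""
    return W @ X + b

def loss(W, b, X, y):
    """Squared-error loss: 0.5 * ||W X + b - y||^2."""
    residual = predict(W, b, X) - y
    return 0.5 * float(residual @ residual)

def gradient_step(W, b, X, y, eta):
    """One update: new weight = old weight - eta * dL/dW."""
    residual = predict(W, b, X) - y
    grad_W = np.outer(residual, X)  # dL/dW for the loss above
    grad_b = residual               # dL/db for the loss above
    return W - eta * grad_W, b - eta * grad_b

X = np.array([1.0, 2.0, 3.0])   # single hypothetical input
y = np.array([1.0, -1.0])       # hypothetical target
W, b = np.zeros((2, 3)), np.zeros(2)
eta = 0.05                      # learning rate

loss_before = loss(W, b, X, y)
for _ in range(20):
    W, b = gradient_step(W, b, X, y, eta)
loss_after = loss(W, b, X, y)
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.2e}")
```

With these values each step shrinks the residual by a constant factor, so the loss falls steadily; a learning rate that is too large would instead make the updates overshoot and diverge.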

**Activation Function Selection**

An activation function is applied to the weighted sum of inputs plus bias at each layer, introducing nonlinearity to the model. Without it, a stack of layers would collapse into a single linear transformation, so the nonlinearity is what enables ANNs to model complex relationships.

Examples include: 

1. Sigmoid: outputs values between 0 and 1. 
2. Rectified linear unit (ReLU): popular in the hidden layers of deep networks, ReLU "rectifies" negative values to 0 while leaving positive values unchanged. 
3. Softmax: commonly applied in the output layer for multi-class classification problems. Outputs a normalized vector of probabilities between 0 and 1 whose sum equals 1. 
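The three activation functions listed above can be written in NumPy as follows. This is a minimal sketch: the function names and the sample input `z` are illustrative, and subtracting the maximum inside `softmax` is a common numerical-stability measure rather than part of the mathematical definition.

```python
import numpy as np

def sigmoid(z):
    """Squash each value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Set negative values to 0 and leave positive values unchanged."""
    return np.maximum(0.0, z)

def softmax(z):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # avoids overflow in exp; result is unchanged
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

z = np.array([-2.0, 0.0, 3.0])  # hypothetical pre-activation values
print(sigmoid(z))
print(relu(z))
print(softmax(z))
```

Sigmoid and ReLU act on each element independently, whereas softmax normalizes across the whole vector, which is why it suits the output layer of a multi-class classifier.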

***Implementation in Python***

We'll be setting up our model manually to demonstrate its inner mechanics. This approach helps us understand the fundamental concepts behind neural networks, such as how layers, weights, and activation functions work together to make predictions.

In practice, it is often more convenient to use one of several established libraries and frameworks for neural network implementation. These offer pre-built models with various architectures and optimization routines, which streamline model development and training by removing the need to write low-level code from scratch. In this notebook, we will use TensorFlow, an open-source framework developed by Google that provides a comprehensive ecosystem for building and deploying machine learning models. TensorFlow is widely used because it offers:

- Ease of Use: High-level APIs like Keras make it easy to build and train models.
- Performance: Optimized for both CPU and GPU, allowing for efficient training and inference.
- Flexibility: Supports a wide range of machine learning tasks, from simple linear regression to complex deep learning models.
- Community and Support: A large community of developers and extensive documentation.

Alternatives include: 

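- PyTorch: Developed by Facebook's AI Research lab (now part of Meta), PyTorch is known for its dynamic computation graph and ease of use, especially in research settings.
- Keras: Initially an independent project, Keras was later integrated into TensorFlow as `tf.keras`; since Keras 3 it again runs on multiple backends (TensorFlow, JAX, and PyTorch). It provides a high-level API for building neural networks.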
- MXNet: An open-source deep learning framework that is scalable and supports multiple languages.
- Caffe: Developed by the Berkeley Vision and Learning Center (BVLC), Caffe is known for its speed and modularity.
- Theano: One of the earliest deep learning libraries. Its original developers ended major development in 2017, and it is now mostly of educational interest.

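```python
import numpy as np                                     # array and matrix operations
import matplotlib.pyplot as plt                        # plotting
from sklearn.datasets import load_iris                 # Iris example dataset
from sklearn.model_selection import train_test_split   # train/test partitioning
from sklearn.preprocessing import StandardScaler       # feature standardization
```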