# Neural Networks

One of the earliest and simplest neural network models is the perceptron, introduced by Frank Rosenblatt in 1958. The perceptron was inspired by the human nervous system, and its working mechanism is very similar to that of logistic regression.

## Logistic Regression vs Perceptron

Logistic regression is a binary classification algorithm that applies a linear function followed by a logistic (sigmoid) activation function to estimate the probability that an input belongs to a particular class.

A perceptron is a basic building block of a neural network. It takes a set of input features, applies weights to them, and produces an output using a step function.

![image.png](attachment:878a8fbb-4407-4986-b5b5-62a11ce9ad5c.png)

The diagram above generalizes both the perceptron and logistic regression. In both cases, the weighted sum of the input features plus the bias term (denoted $z_i$ in the formulas below) is passed through an activation function to produce the predicted output. In this view, the only difference between the perceptron and logistic regression is the activation function.

For logistic regression:
$f(z_i) = \frac{1}{1 + e^{-z_i}}$

For Perceptron:
$f(z_i) = \begin{cases}
    1, & \text{if } z_i > 0 \\
    0, & \text{otherwise}
\end{cases}$
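
To make the comparison concrete, here is a minimal sketch of a single neuron in Python. The function names (`weighted_sum`, `sigmoid`, `step`) and the weights, bias, and input values are illustrative assumptions, not part of the original material. Both models share the same weighted sum and differ only in the activation applied to it.

```python
import math

def weighted_sum(x, w, b):
    """Compute z = w . x + b for a single example."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Logistic regression activation: maps z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    """Perceptron activation: 1 if z > 0, otherwise 0."""
    return 1 if z > 0 else 0

# Hypothetical weights, bias, and input, chosen only for illustration.
w = [0.5, -0.25]
b = 0.1
x = [2.0, 1.0]

z = weighted_sum(x, w, b)   # 0.5*2.0 + (-0.25)*1.0 + 0.1
p = sigmoid(z)              # logistic regression: estimated probability
y = step(z)                 # perceptron: hard 0/1 decision

print(f"z = {z:.2f}, sigmoid(z) = {p:.3f}, step(z) = {y}")
```

The sigmoid output can be read as a probability and thresholded afterwards, while the step function commits to a class label directly.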



## Multi-Layer Perceptron

A multi-layer perceptron (MLP), also known as a feedforward neural network, is an extension of the perceptron in which multiple layers of neurons are stacked together. It consists of an input layer, one or more hidden layers, and an output layer. Each neuron in a layer is connected to all neurons in the adjacent layers. The hidden layers enable the model to learn more complex representations of the input data.
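
The layer-by-layer computation can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions: the helper `mlp_forward`, the layer sizes, the randomly drawn weights, and the choice of ReLU for hidden layers with a sigmoid output are all hypothetical choices made for the example, not something prescribed by the text.

```python
import numpy as np

def relu(z):
    """Element-wise max(0, z)."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, params):
    """Forward pass through a fully connected network.

    params is a list of (W, b) pairs, one per layer after the input layer.
    Hidden layers use ReLU and the output layer uses sigmoid (assumed here).
    """
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)          # hidden layer: weighted sum + activation
    W_out, b_out = params[-1]
    return sigmoid(W_out @ a + b_out)  # output layer

# Hypothetical architecture: 3 inputs, one hidden layer of 4 neurons, 1 output.
rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]
params = [
    (rng.normal(size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = np.array([0.2, -1.0, 0.5])
y_hat = mlp_forward(x, params)
print(y_hat.shape)  # one output value for this input
```

Because every layer is a matrix-vector product, "each neuron connected to all neurons in the adjacent layer" corresponds to each weight matrix `W` having one row per neuron and one column per input to that layer.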

The architecture of an MLP can be represented as follows:

![image.png](attachment:f35bb77e-6c9e-4eb8-9b49-fcfca7da20bb.png)

```text
Input Layer     -> Receives the input features and passes them to the first hidden layer
Hidden Layer(s) -> Each neuron computes a weighted sum of its inputs plus a bias and applies an activation function
Output Layer    -> Applies the same process to the last hidden layer's outputs to generate the final output
```

## Activation Functions

An activation function introduces non-linearity into the network, enabling it to learn complex relationships between inputs and outputs. Commonly used activation functions include:

**Sigmoid:** Squashes the weighted sum into the range (0, 1).

$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$

**Tanh:** Similar to sigmoid, but squashes the weighted sum into the range (-1, 1).

$\text{Tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$

**ReLU (Rectified Linear Unit):** Sets negative values to 0 and keeps positive values as they are.

$\text{ReLU}(x) = \max(0, x)$

**Leaky ReLU:** Similar to ReLU, but scales negative inputs by a small slope (0.01 here) instead of setting them to 0.

$\text{Leaky ReLU}(x) = \max(0.01x, x)$
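
The four functions above translate directly into a few lines of Python. The sketch below uses only the standard library; the function names and the `slope` parameter are illustrative choices, with `slope=0.01` matching the formula given for Leaky ReLU.

```python
import math

def sigmoid(x):
    """Maps any real x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Maps any real x into (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    """Zero for negative inputs, identity for positive inputs."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs are scaled by a small slope (0 < slope < 1)."""
    return max(slope * x, x)

# Compare the functions on a negative, zero, and positive input.
for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  tanh={tanh(x):+.3f}  "
          f"relu={relu(x):.2f}  leaky_relu={leaky_relu(x):+.2f}")
```

Note that the direct `tanh` formula overflows for very large `|x|`; in practice `math.tanh` is the safer choice, and the explicit version is shown only to mirror the equation.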

## Perceptron

![image.png](attachment:5bdd0409-a713-4767-b6e4-6f8bb05c6390.png)

## Multi-Layer Perceptron (Neural Network)

![image.png](attachment:32ec1d25-903b-4308-9de0-a8a266b86928.png)