<a href="https://colab.research.google.com/github/rajesh-coventry/Foundational-Neural-Network-Perceptron-PyTorch/blob/master/01_Foundational_Neural_Network_(Perceptron).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Foundational Neural Network (Perceptron):**

The perceptron is a foundational concept in `Machine Learning` and `Artificial Intelligence`. It is the elementary unit of an `Artificial Neural Network` and one of the most basic components of Machine Learning and Deep Learning technologies.

## **What is Perceptron?**

- A perceptron is the `smallest element` of a neural network.

- The term refers both to a `single-layer neural network` and to the Machine Learning algorithm used to train it.

- It works as an `artificial neuron`: it combines its inputs into a weighted sum and uses the result to decide which of two classes the input data belongs to.

- A `perceptron network` is a group of such simple decision units that come together to express more complex decisions, known as the `neural network`.

> ![](https://www.analytixlabs.co.in/wp-content/uploads/2022/07/biological-neuron-analytix-labs.jpg)

- The human brain is a `complex network` of billions of interconnected cells known as `Neurons`. These cells process and transmit signals. Biological neurons respond to both `chemical and electrical signals` to create the `Biological Neural Network` $(BNN)$.

- The input and output signals can either be `excitatory` or `inhibitory`, meaning that they can either increase or decrease the potential of the neuron to fire.

- The structure of a biological neuron consists of a `Synapse`, `dendrites`, `Soma` or `the cell body`, and `axon`. All these components participate in the `neural processing` performed by `neurons`. `Synapse` connects an `axon` to another `neuron` and also processes the inputs. `Dendrites` receive the signals while the `Soma` sums up all the incoming signals. The transmission of signals to other neurons is carried by the `axon`. A `Biological Neural Network` slowly yet efficiently processes highly complex parallel inputs.

## **Artificial Neuron:**

- An `artificial neuron` is based on a model of `biological neurons` but it is a `mathematical function`.

- In the earliest models, the neuron takes inputs in the form of `binary values` i.e. `1` or `0`, meaning that they can either be `ON` or `OFF`. Perceptrons more generally accept real-valued inputs.

- The output of an `artificial neuron` is usually calculated by applying a `threshold function` to the sum of its `input values`.

- The `threshold function` can be either `linear` or `nonlinear`. A `linear threshold function` is a hard step applied to the sum: it produces an output of 1 if the sum of the input values is greater than or equal to a certain threshold, and an output of 0 otherwise. A `nonlinear threshold function`, such as the sigmoid, is smooth and can produce any output value between 0 and 1, depending on the inputs.

- An `Artificial Neural Network` $(ANN)$ is built from artificial neurons. In its simplest form it follows a `Feed-Forward` strategy: information flows in one direction through the nodes and stops only after reaching the output node. Such a network can be trained whether the data is `linear` or `nonlinear`, although a single neuron can only represent a linear decision boundary.
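The two kinds of threshold function described above can be compared with a minimal Python sketch. The names `step` and `sigmoid` and the default threshold of 0 are illustrative assumptions, and the sigmoid is used only as one common example of a smooth nonlinear function.

```python
import math


def step(z, threshold=0.0):
    """Hard threshold: 1 if z reaches the threshold, otherwise 0."""
    return 1 if z >= threshold else 0


def sigmoid(z):
    """Smooth squashing function; outputs lie strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Compare both functions on a few summed-input values.
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
```

The step function jumps straight from 0 to 1 at the threshold, while the sigmoid moves gradually between the two extremes.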

## **Biological Neural Network Vs Artificial Neural Network:**

The structure of `artificial neurons` is derived from `biological neurons` and the network is also formed on a similar principle but there are some differences between a `biological neural network` and an `artificial neural network`.

> ![](https://www.analytixlabs.co.in/wp-content/uploads/2022/07/Biological-Neural-Network-Vs-Artificial-Neural-Network.jpg)

## **Perceptron Vs Neuron:**

- The `perceptron` is a mathematical model of the `biological neuron`. It produces binary outputs from input values while taking into consideration `weights` and `threshold values`. Though created to imitate the working of `biological neurons`, the single perceptron has largely been superseded by multi-layer `artificial neural networks` trained with `backpropagation`. Perceptrons use a hard threshold (step) activation function to give a positive or negative output depending on whether the weighted sum reaches a specific value.

- A `neuron`, also known as a node in a `backpropagation artificial neural network`, produces graded values between 0 and 1. It is a generalization of the idea of the perceptron, as the neuron also sums weighted inputs. However, it does not produce a binary output but a graded value based on the proximity of the input to the desired value of 1. Because the node uses a sigmoidal output function, the results are biased towards the extreme values of 0 or 1. The graded values can be interpreted as the probability of the input’s category.

## **Components of a Perceptron:**

> ![](https://www.analytixlabs.co.in/wp-content/uploads/2022/07/04.jpg)

**Each perceptron comprises five different parts:**

1. **`Input Values`:** A set of values used to predict the output value. They are also described as the features of a dataset.

2. **`Weights`:** Each feature has an associated real-valued weight. It indicates the importance of that feature in predicting the final value.

3. **`Bias`:** The activation function is shifted towards the left or right using bias. You may understand it simply as the y-intercept in the line equation.

4. **`Summation Function`:** The summation function binds the weights and inputs together. It is a function to find their sum.

5. **`Activation Function`:** It introduces non-linearity in the perceptron model.

## **Why do we Need Weight and Bias?**

- `Weight` and `bias` are two important aspects of the perceptron model.

- These are `learnable parameters` and as the network gets trained it adjusts both parameters to achieve the desired values and the correct output.

> ![](https://www.analytixlabs.co.in/wp-content/uploads/2022/07/05.jpg)

- `Weights` are used to measure the `importance of each feature` in predicting output value.

- Features whose weights are close to zero have less significance in the prediction process than features whose weights are further from zero, known as `weights with a larger value`.

- Besides indicating predictive power through its magnitude, a weight can also be positive or negative.

- If the weight of a feature is positive then it has a direct relation with the target value, and if it is negative then it has an inverse relationship with the target value.

> ![](https://www.analytixlabs.co.in/wp-content/uploads/2022/07/06.jpg)

A `weight` in a neural network scales how strongly its input pushes the neuron towards triggering the `activation function`, whereas `bias` shifts the point at which the `activation function` triggers, so the neuron fires earlier or later for the same inputs.

It acts like an `intercept` in a linear equation. Simply stated, `Bias` is a constant used to adjust the output and help the model to provide the best fit output for the given data.
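As a small illustration of the roles of weight and bias, the sketch below uses a single-input neuron with a step activation. The helper `fires` and the sample values are hypothetical, chosen only to show how the bias moves the point at which the neuron switches on and how the sign of the weight reverses the relationship.

```python
def fires(x, w, b):
    """Return 1 if the weighted input plus bias reaches 0, otherwise 0."""
    return 1 if w * x + b >= 0 else 0


inputs = [0.0, 0.5, 1.0, 1.5, 2.0]

# Same positive weight, different biases: a more negative bias means the
# input has to be larger before the neuron fires.
by_bias = {b: [fires(x, 1.0, b) for x in inputs] for b in (0.0, -1.0, -2.0)}

# Negative weight: larger inputs now push the output towards 0 instead of 1.
negative_weight = [fires(x, -1.0, 1.0) for x in inputs]

for b, outputs in by_bias.items():
    print(f"w=+1.0, b={b:+.1f}: {outputs}")
print(f"w=-1.0, b=+1.0: {negative_weight}")
```

With the weight fixed at 1, the switch-on point moves from `x = 0` to `x = 1` to `x = 2` as the bias goes from 0 to -1 to -2, which is the intercept-like behaviour described above.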

---

A perceptron is a fundamental computational unit in machine learning and neural networks, representing one of the earliest and most important concepts in artificial intelligence.

## **What is a Perceptron?**

A perceptron is a linear binary classifier that takes multiple inputs, applies weights to them, sums them up, and produces a binary output (0 or 1, or -1 and +1) based on whether the weighted sum exceeds a certain threshold. It was invented by Frank Rosenblatt in 1957 and represents a mathematical model inspired by biological neurons.

The basic structure consists of:
- **`Input layer`**: Receives input features ($x_1$, $x_2$, ..., $x_n$)

- **`Weights`**: Each input has an associated weight ($w_1$, $w_2$, ..., $w_n$)

- **`Bias`**: An additional parameter $(b)$ that shifts the decision boundary

- **`Activation function`**: Typically a step function that produces binary output

- **`Output`**: A single binary classification result

**Mathematical Representation:**

**The perceptron's output is calculated as:**

```
y = f(∑(wᵢ × xᵢ) + b)
```

Where $f$ is the activation function, typically:

```
f(z) = 1 if z ≥ 0
f(z) = 0 if z < 0
```
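The formula above translates almost directly into code. The following is a minimal sketch in plain Python; the function name `perceptron_output` and the example weights and bias are assumptions made for illustration, not values taken from any trained model.

```python
def perceptron_output(x, w, b):
    """Compute y = f(sum(w_i * x_i) + b) with f as the step function."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return 1 if z >= 0 else 0                     # f(z): 1 if z >= 0, else 0


# Hypothetical parameters for a two-input perceptron.
w = [0.5, -0.4]
b = -0.1

y_a = perceptron_output([1.0, 0.0], w, b)  # z = 0.5 - 0.1 = 0.4, so output 1
y_b = perceptron_output([0.0, 1.0], w, b)  # z = -0.4 - 0.1 = -0.5, so output 0
print(y_a, y_b)
```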

## **Is Perceptron the Most Basic Building Block?**

Yes and no. The perceptron is historically the most basic form of an `artificial neuron` and serves as the conceptual foundation for neural networks. However, modern neural networks use more sophisticated units:

**As a building block:**
- The `perceptron` introduced the concept of `weighted inputs` and `thresholding`

- It established the framework for `learning through weight adjustment`

- Modern neurons in neural networks are essentially `enhanced perceptrons` with different activation functions

**Limitations as a building block:**
- Single perceptrons can only solve linearly separable problems

- Modern neural networks use neurons with continuous activation functions (`sigmoid`, `ReLU`, `tanh`)

- Deep networks require more sophisticated architectures than `simple perceptron stacking`

## **Perceptron Learning Rules:**

### **1. Perceptron Learning Rule (Original):**

This is the most fundamental learning algorithm:

**`Algorithm:`**

```
For each training example (x, target):
1. Calculate output: y = f(w·x + b), with f the step function defined above
2. Calculate error: error = target - y
3. Update weights: w = w + α × error × x
4. Update bias: b = b + α × error
```

Where $α$ $(alpha)$ is the learning rate.

**Key characteristics:**
- Guaranteed to converge for linearly separable data

- May not converge for non-linearly separable data

- Updates weights only when classification is incorrect
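The update steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions: targets are 0 or 1, the output uses the 0/1 step function, weights start at zero, and the names `train_perceptron`, `AND_DATA`, `lr`, and `max_epochs` are hypothetical choices for this example.

```python
def step(z):
    """Step activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_perceptron(data, lr=1.0, max_epochs=100):
    """Perceptron learning rule; stops after a full pass with no mistakes."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(w, b, x)  # 0 when correct, +1 or -1 otherwise
            if error != 0:
                mistakes += 1
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
        if mistakes == 0:  # every example classified correctly
            break
    return w, b


# AND gate: linearly separable, so training is expected to converge.
AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_DATA)
predictions = [predict(w, b, x) for x, _ in AND_DATA]
print(w, b, predictions)
```

Because the error is zero for correctly classified examples, the weights change only on mistakes, which matches the last characteristic listed above.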

### **2. Delta Rule (Widrow-Hoff Rule):**

An improved version that uses continuous error:

**`Algorithm:`**

```
For each training example (x, target):
1. Calculate net input: net = w·x + b
2. Calculate error: error = target - net
3. Update weights: w = w + α × error × x
4. Update bias: b = b + α × error
```

**Advantages:**
- Uses continuous error rather than binary error

- More stable convergence properties

- Forms the basis for backpropagation in neural networks
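For comparison with the perceptron rule, here is a minimal sketch of the delta rule in plain Python. The data set (points on the line `2x + 1`), the learning rate, the epoch count, and the names `train_delta_rule` and `LINE_DATA` are assumptions chosen for illustration. The only change from the perceptron rule is that the error is measured against the raw net input rather than the thresholded output.

```python
def train_delta_rule(data, lr=0.05, epochs=1000):
    """Widrow-Hoff updates: the error is target minus the un-thresholded net input."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in data:
            net = sum(wi * xi for wi, xi in zip(w, x)) + b
            error = target - net  # continuous error, not just -1, 0, or +1
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b


# Hypothetical data lying exactly on the line target = 2 * x + 1.
LINE_DATA = [([float(x)], 2.0 * x + 1.0) for x in range(4)]
w, b = train_delta_rule(LINE_DATA)
print(w, b)  # should approach a weight near 2 and a bias near 1
```

Since the error shrinks smoothly as the net input approaches the target, the updates become smaller near a good solution instead of jumping by a fixed amount.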

### **3. Pocket Algorithm:**

**Designed for non-linearly separable data:**

**Algorithm:**
- Maintains the best weight vector found so far

- Updates weights using perceptron rule

- Keeps track of the longest run of correct classifications

- `"Pockets"` the best weights when a better solution is found
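The pocket idea can be sketched as below. This is a simplified illustration, not a reference implementation: examples are drawn at random with a fixed seed, the "pocket" is replaced whenever the current weights achieve a longer run of consecutive correct classifications, and the names `pocket_train`, `count_errors`, `AND_DATA`, and `NOISY_DATA` are hypothetical.

```python
import random


def step(z):
    return 1 if z >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def count_errors(w, b, data):
    return sum(predict(w, b, x) != t for x, t in data)


def pocket_train(data, iterations=2000, lr=1.0, seed=0):
    """Perceptron updates plus a 'pocket' holding the weights with the longest correct run."""
    rng = random.Random(seed)
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    pocket_w, pocket_b = list(w), b
    run, best_run = 0, 0
    for _ in range(iterations):
        x, target = rng.choice(data)
        error = target - predict(w, b, x)
        if error == 0:
            run += 1
            if run > best_run:  # longer streak than anything seen so far
                best_run = run
                pocket_w, pocket_b = list(w), b
        else:  # mistake: apply the perceptron rule and restart the streak
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
            run = 0
    return pocket_w, pocket_b


AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
# Same data plus one contradictory label, so no line can classify every row.
NOISY_DATA = AND_DATA + [([1, 1], 0)]

and_w, and_b = pocket_train(AND_DATA)
noisy_w, noisy_b = pocket_train(NOISY_DATA)
print(count_errors(and_w, and_b, AND_DATA))
print(count_errors(noisy_w, noisy_b, NOISY_DATA))
```

On the noisy set the plain perceptron never stops updating, whereas the pocketed weights stay fixed until a better-performing candidate appears.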

## **Types of Perceptrons:**

### **1. Single-Layer Perceptron (Simple Perceptron):**

**Structure:**
- One input layer connected directly to one output layer

- No hidden layers

- Each output unit is an independent perceptron

**Capabilities:**
- Can only solve linearly separable problems

- **Examples:** `AND`, `OR`, `NOT` gates

- Cannot solve `XOR` problem

**Mathematical limitation:**
Can only learn decision boundaries that are hyperplanes in the input space.

### **2. Multi-Layer Perceptron (MLP):**

**Structure:**
- Input layer, one or more hidden layers, and output layer

- Each layer fully connected to the next

- Uses non-linear activation functions

**Capabilities:**
- Can solve non-linearly separable problems

- Universal function approximator (with sufficient hidden units)

- Can learn complex decision boundaries

**Key differences from single-layer:**
- Typically trained with backpropagation

- Uses continuous activation functions (`sigmoid`, `tanh`, `ReLU`)

- Can have multiple outputs

### **3. Voted Perceptron:**

**Concept:**
- Maintains multiple perceptrons from different stages of training

- Final decision based on weighted voting

- Each perceptron gets a vote proportional to how long it survived

**Advantages:**
- Better generalization than single perceptron

- Handles noisy data more effectively

### **4. Averaged Perceptron:**

**Concept:**
- Maintains running average of all weight vectors during training

- Final weights are the average of all intermediate weights

- Reduces overfitting to training data

**Benefits:**
- More stable than standard perceptron

- Better performance on test data

- Computationally efficient
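A minimal sketch of the averaged perceptron in plain Python is shown below. It assumes 0/1 targets, zero initial weights, a fixed number of passes, and that the running sum is updated after every example, whether or not the weights changed. The names `train_averaged_perceptron` and `AND_DATA` are illustrative.

```python
def step(z):
    return 1 if z >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_averaged_perceptron(data, lr=1.0, epochs=20):
    """Run the perceptron rule and return the average of all intermediate weights."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    sum_w, sum_b, steps = [0.0] * n, 0.0, 0
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)
            if error != 0:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
            # Accumulate the current weights after every example.
            sum_w = [si + wi for si, wi in zip(sum_w, w)]
            sum_b += b
            steps += 1
    return [si / steps for si in sum_w], sum_b / steps


AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
avg_w, avg_b = train_averaged_perceptron(AND_DATA)
avg_predictions = [predict(avg_w, avg_b, x) for x, _ in AND_DATA]
print(avg_w, avg_b, avg_predictions)
```

Only one extra running sum is stored, which is why averaging adds very little cost compared with keeping every intermediate weight vector.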

## **Detailed Learning Process:**

### **Training Phase:**
1. **`Initialize weights`**: Usually to small random values

2. **`Present training examples`**: One at a time or in batches

3. **`Calculate output`**: Using current weights

4. **`Compare with target`**: Calculate error

5. **`Update weights`**: Based on learning rule

6. **`Repeat`**: Until convergence or maximum iterations

### **Convergence Properties:**
- **`Perceptron Convergence Theorem`**: For linearly separable data, the perceptron learning algorithm will converge in finite steps

- **`Non-separable case`**: Algorithm may oscillate indefinitely

- **`Learning rate impact`**: Too high causes instability, too low causes slow convergence

## **Limitations and Solutions:**

### **Major Limitations:**

1. **`Linear separability`**: Cannot solve XOR and other non-linearly separable problems

2. **`Binary output`**: Limited to classification tasks

3. **`No probabilistic interpretation`**: Cannot provide confidence measures

### **Historical Solutions:**
1. **`Multi-layer networks`**: Overcome linear separability limitation

2. **`Continuous activation functions`**: Enable gradient-based learning

3. **`Ensemble methods`**: Combine multiple perceptrons for better performance

## **Modern Relevance:**

While basic perceptrons are rarely used alone in modern applications, their principles remain fundamental:

- **`Feature engineering`**: Understanding linear separability helps in feature design

- **`Ensemble methods`**: Perceptrons serve as weak learners in ensemble algorithms

- **`Online learning`**: Perceptron-style updates are used in streaming data scenarios

- **`Large-scale learning`**: Simple perceptrons are computationally efficient for big data

The perceptron's elegance lies in its simplicity and mathematical tractability, making it an excellent educational tool for understanding the foundations of machine learning and neural networks. Despite its limitations, the concepts it introduced—`weighted inputs`, `threshold activation`, and `iterative learning`—remain central to all modern neural network architectures.

----

## **Single Perceptron vs Human Neuron:**

**Single Perceptron:**
- A single perceptron is the most basic artificial unit that mimics aspects of biological neurons

- However, it's a **`highly simplified`** model of a human neuron

- Real biological neurons are vastly more complex with:
  - Thousands of synaptic connections
  - Complex temporal dynamics and spike patterns
  - Chemical neurotransmitter systems
  - Non-linear dendritic processing
  - Plasticity mechanisms far more sophisticated than simple weight updates

**`So while a single perceptron is inspired by neurons, it's more accurate to say it captures only the basic concept of "weighted inputs → threshold → output" rather than being equivalent to a human brain neuron.`**

## **Architecture Terminology Clarification:**

### **Single Perceptron Architecture:**  
- **`One perceptron = One artificial neuron`**
- This is indeed the most basic form
- Can only solve linearly separable problems

### **Multiple Perceptrons - Two Scenarios:**

**1. Multiple Independent Perceptrons (Parallel):**

```
Input → [Perceptron 1] → Output 1
Input → [Perceptron 2] → Output 2  
Input → [Perceptron 3] → Output 3
```

- Still considered single-layer architecture
- Each perceptron solves a separate binary classification
- Used for multi-class problems (one-vs-all approach)

**2. Multi-Layer Perceptron (Sequential Layers):**

```
Input → [Hidden Layer Perceptrons] → [Output Layer Perceptrons] → Output
```

- This creates a **neural network**
- Perceptrons are organized in layers
- Information flows through multiple processing stages

## **Multi-Layer Perceptron vs Neural Network:**

> **`Multi-Layer Perceptron (MLP) IS a Neural Network`**

**MLP characteristics:**
- Multiple layers of perceptrons/neurons

- Fully connected between adjacent layers

- Uses non-linear activation functions (not just step functions)

- Trained with backpropagation

**Why it's called both:**
- **Historically**: Called "Multi-Layer Perceptron" because it evolved from single perceptrons

- **Functionally**: It **is** a neural network - specifically a type of feedforward neural network

- **Modern usage**: People often use "neural network" and "MLP" interchangeably

**The Relationship:**

```
Neural Networks (Broad Category)
├── Feedforward Neural Networks
│   ├── Multi-Layer Perceptron (MLP)
│   └── Convolutional Neural Networks (CNN)
├── Recurrent Neural Networks (RNN)
├── Transformer Networks
└── Other architectures...
```

## **Key Differences in Detail:**

### **Single Perceptron:**
- **`Layers`**: Input → Output (no hidden layers)

- **`Capability`**: Linear decision boundaries only

- **`Problems it can solve`**: `AND`, `OR` gates

- **`Problems it cannot solve`**: `XOR`, complex patterns

### **Multi-Layer Perceptron (Neural Network):**
- **Layers**: Input → Hidden Layer(s) → Output
- **Capability**: Non-linear decision boundaries
- **Universal approximation**: Can theoretically approximate any continuous function
- **Problems it can solve**: XOR, image recognition, complex pattern matching

## **Modern Terminology:**

**What we call "Neural Networks" today typically refers to:**

- Multi-layer architectures (MLPs or more complex)

- Use of non-linear activation functions

- Gradient-based training (backpropagation)

- Multiple neurons per layer

**`Single perceptrons are rarely called "neural networks" in modern usage`** - they're considered the building blocks or historical predecessors.

## **Practical Example:**

**Single Perceptron solving `AND gate`:**
```
x1=0, x2=0 → output=0 ✓
x1=0, x2=1 → output=0 ✓  
x1=1, x2=0 → output=0 ✓
x1=1, x2=1 → output=1 ✓
```

**Single Perceptron trying `XOR gate`:**
```
x1=0, x2=0 → output=0 ✓
x1=0, x2=1 → output=1 ✓
x1=1, x2=0 → output=1 ✓
x1=1, x2=1 → output=0 ✗ (No single linear boundary satisfies all four rows)
```
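The XOR failure can be observed directly by running the perceptron learning rule on the four XOR rows and counting misclassifications after each pass. This is a sketch under assumptions: zero initial weights, a learning rate of 1, 50 passes, and the hypothetical names `errors_per_epoch` and `XOR_DATA`.

```python
def step(z):
    return 1 if z >= 0 else 0


def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


XOR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]


def errors_per_epoch(data, lr=1.0, epochs=50):
    """Train with the perceptron rule and record misclassified rows after each pass."""
    w, b = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        for x, target in data:
            error = target - predict(w, b, x)
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
        history.append(sum(predict(w, b, x) != t for x, t in data))
    return history


history = errors_per_epoch(XOR_DATA)
print(min(history))  # never reaches 0 because XOR is not linearly separable
```

However long training runs, at least one of the four rows stays misclassified, because no straight line separates the two XOR classes.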

**MLP solving `XOR gate`:**
- Requires at least one hidden layer with 2 neurons

- Can create the necessary non-linear decision boundary

- Successfully solves the problem
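One way to see why two hidden neurons are enough is to set the weights by hand instead of training them. In the sketch below, the hidden layer computes OR and NAND, and the output neuron computes AND of those two results. All weights, biases, and names (`neuron`, `xor_mlp`) are illustrative assumptions, and step activations are used for readability even though trained MLPs normally use smooth activations.

```python
def step(z):
    return 1 if z >= 0 else 0


def neuron(x, w, b):
    """A single perceptron: weighted sum plus bias, then step activation."""
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)


def xor_mlp(x1, x2):
    """Two-layer network with hand-picked weights that reproduces XOR."""
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)      # fires if at least one input is 1
    h_nand = neuron([x1, x2], [-1.0, -1.0], 1.5)   # fires unless both inputs are 1
    return neuron([h_or, h_nand], [1.0, 1.0], -1.5)  # fires only if both hidden units fire


xor_outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)
```

Each hidden neuron draws one straight line, and the output neuron keeps only the region between the two lines, which is the non-linear boundary a single perceptron cannot form.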

**Summary:**

- **Single perceptron**: Most basic artificial neuron, inspired by but much simpler than biological neurons

- **Multiple perceptrons in layers**: Creates a Multi-Layer Perceptron, which **is** a neural network

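- **MLP and Neural Network**: Every MLP is a neural network, and the two terms are often used interchangeably in informal contexts, although neural networks also include other architectures such as CNNs and RNNs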

- **The key transition**: From linear (single perceptron) to non-linear capabilities (MLP/neural network)

----