# **Neural Networks in Deep Learning** 

#### **What is a Neural Network?**

**`Neurons < Neural Networks < Deep Learning < Machine Learning < Artificial Intelligence < Algorithms < Mathematics`**

- **Mathematics**: The study of numbers, quantities, shapes, and patterns
- **Algorithms**: Automated instructions for solving problems
- **Artificial Intelligence**: Programs with the ability to mimic human behavior
- **Machine Learning**: Algorithms with the ability to learn from data without being explicitly programmed
- **Deep Learning**: Subset of machine learning in which artificial neural networks adapt and learn from vast amounts of data

> A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. 

- Neural networks can adapt to changing input, so the network can generate good results without the output criteria having to be redesigned.
- It is a `functional unit / building block` of deep learning.
- In essence, neural networks are a collection of interconnected `nodes`, or `neurons`, that work together to process complex data inputs and transmit signals to other neurons in the network.
- These networks are used in machine learning for pattern recognition and decision making, functioning in a way that is similar to the human brain's structure and operation.

![image.png](attachment:image.png)

#### **Components of a Neural Network**
1. **Neurons**: Basic units of neural networks, analogous to human brain neurons, which process and transmit information.
2. **Layers**: Composed of multiple layers of neurons: an input layer to receive the signal, one or more hidden layers to process the signal, and an output layer to deliver the result.
   - `Input Layer`: The first layer of the neural network, which takes in raw data.
   - `Hidden Layers`: Layers of neurons that process inputs and transmit signals to the next layer.
   - `Output Layer`: The final layer of neurons that produce the output of the neural network.
3. **Weights and Biases**: Connections between neurons have weights that adjust as learning proceeds, and each neuron may have a bias term. Together they determine the strength and direction of the influence one neuron has on another.
4. **Activation Functions**: Functions that decide whether, and how strongly, a neuron should be activated. They introduce non-linearity, which gives the network its ability to learn complex patterns.

![components.png](attachment:components.png)
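
The weights, bias, and activation function listed above combine into a single formula. As a sketch in notation introduced here for illustration ($x_i$ for the inputs, $w_i$ for the weights, $b$ for the bias, and $f$ for the activation function, none of which are taken from the figure), a neuron with $n$ inputs computes:

$$
z = \sum_{i=1}^{n} w_i x_i + b, \qquad a = f(z)
$$

Here $z$ is the weighted sum (sometimes called the pre-activation) and $a$ is the value the neuron passes on to the next layer.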

#### **Structure of Typical Neuron vs Artificial Neuron**
- **Typical Neuron**: The human brain is composed of approximately 86 billion neurons, each of which is connected to other neurons by synapses. Neurons are the basic building blocks of the nervous system, and they transmit information to other neurons, muscles, or gland cells.
- **Artificial Neuron**: The artificial neuron is a mathematical function conceived as a model of biological neurons. Artificial neurons are the basic units of an artificial neural network. They are simple processors which are intended to simulate the way a biological brain processes information.

![biological_neuron_vs_artificial_neuron.png](attachment:biological_neuron_vs_artificial_neuron.png)

| | Biological Neurons | Artificial Neurons |
|---|---|---|
| **Dendrites / Input** | Dendrites are the structures on the neuron that receive electrical messages from other neurons. | The inputs to an artificial neuron are the values it receives from previous neurons or from the initial data, each scaled by a weight. |
| **Nucleus (Cell Body) / Node** | The cell body contains the nucleus, which holds the genetic material in the form of chromosomes, and it integrates the signals arriving from the dendrites. | The node or body of an artificial neuron is where the weighted inputs are summed, a bias is added, and the result is passed through an activation function. |
| **Axon / Output** | The axon is a tube-like structure that propagates the electrical signal (action potential) from the neuron to other neurons. | The output of an artificial neuron is the result of the activation function applied to the summed inputs. It is passed on to other neurons in the network. |
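
The artificial-neuron column of the table can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `neuron_output` and all numeric values are made up for the example, and sigmoid is just one possible activation function.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only: three inputs with hand-picked weights.
inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

output = neuron_output(inputs, weights, bias)
print(round(output, 4))  # weighted sum is -0.4, so this prints about 0.4013
```

The `inputs` play the role of the dendrites, the sum and activation happen in the node, and `output` is what would travel along the "axon" to the next neurons.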

#### **History of Neural Networks:**

| Time Period | Developments in Neural Networks |
|---|---|
| 1940s | Early Concepts: Warren McCulloch and Walter Pitts introduced a computational model for neural networks, laying the foundational ideas. |
| 1950s | Perceptrons: Frank Rosenblatt developed the perceptron, an early neural network model for pattern recognition, marking a significant advancement. |
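| 1960s | XOR Problem and Early Backpropagation Ideas: Marvin Minsky and Seymour Papert's 1969 book *Perceptrons* showed that single-layer perceptrons cannot solve the XOR problem, illustrating the limitations of simple neural networks. Precursors of backpropagation also date from around this period, although the method was not yet widely applied to neural networks. |
| 1980s | Revival with Multi-layer Networks: The popularization of backpropagation (notably by Rumelhart, Hinton, and Williams in 1986) led to a resurgence in neural network research, with the development of multi-layer networks capable of solving more complex problems. |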
| 1990s | Practical Applications: Neural networks found applications in various fields, from finance to medicine, demonstrating their practical utility. |
| 2000s | Deep Learning and Big Data: The era of big data and improved computing power allowed for the development of deep learning networks, capable of processing complex data like images and speech. |
| 2010s and Beyond | Mainstream Adoption: Neural networks, especially deep learning models, became integral in technology, powering a wide range of AI applications from autonomous vehicles to advanced language processing tools. |

#### **Types of Neural Networks:**

| Type of Neural Network | Complexity | Typical Use Cases |
|---|---|---|
| Feedforward Neural Networks (FNN) | Simple | Basic classification and regression tasks, pattern recognition. |
| Convolutional Neural Networks (CNNs) | Moderate | Image recognition and processing, video analysis, image classification. |
| Recurrent Neural Networks (RNNs) | Moderate | Sequence modeling such as time series prediction, natural language processing, speech recognition. |
| Long Short-Term Memory Networks (LSTMs) | Moderate | Learning long-term dependencies in sequence data, language modeling, text generation, machine translation. |
| Gated Recurrent Units (GRUs) | Moderate | Similar to LSTMs, used for tasks that require modeling of sequential data like language translation, speech recognition, and time series prediction. |
| Radial Basis Function (RBF) Neural Networks | Moderate | Function approximation, time series prediction, classification in high-dimensional spaces. |
| Self-Organizing Maps (SOMs) | Moderate | Visualization of high-dimensional data, dimensionality reduction, clustering. |
| Deep Belief Networks (DBNs) | High | Image recognition, video recognition, motion capture data analysis. |
| Generative Adversarial Networks (GANs) | High | Generating new data samples (images, text, etc.), artistic creation, image super-resolution. |
| Autoencoders | High | Dimensionality reduction, feature learning, noise reduction, data generation. |
| Modular Neural Networks | High | Tasks requiring a combination of different networks, such as complex pattern recognition problems. |
| Neural Turing Machines (NTMs) | Very High | Enhancing neural networks with memory and attention mechanisms, complex problem-solving tasks. |
| Capsule Neural Networks | Very High | Improving the efficiency and accuracy of neural networks in tasks like image analysis and object recognition. |

#### **Architecture of Neural Networks:**

**1. Artificial Neural Network (ANN):**
- The simplest form of a neural network, consisting of an input layer, one or more hidden layers, and an output layer.
- The input layer receives the data, the hidden layers process the data, and the output layer produces the result.
- The layers are composed of neurons, and each neuron is connected to every neuron in the adjacent layers.
- The connections between neurons have weights that adjust as learning proceeds, and each neuron may have a bias term, both of which determine the strength and the direction of the influence one neuron has on another.
- The neurons in the hidden layers use an activation function to decide whether they should be activated or not, influencing the network's ability to learn complex patterns.

![ANN.png](attachment:ANN.png)
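
The layer-by-layer flow described above can be sketched as follows. This is a hedged, dependency-free illustration: `dense_layer`, `relu`, and every weight value are hypothetical and chosen by hand so the arithmetic is easy to follow. In practice a library such as NumPy or a deep learning framework would do this with matrix operations.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to 0."""
    return max(0.0, z)

def dense_layer(inputs, weights, biases, activation):
    """Fully connected layer: every output neuron sees every input.

    weights[j][i] is the weight from input i to output neuron j.
    """
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hand-picked parameters: 2 inputs -> 3 hidden neurons -> 1 output.
hidden_w = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
hidden_b = [0.0, -0.5, 0.0]
out_w = [[1.0, 2.0, -1.0]]
out_b = [0.5]

x = [1.0, 2.0]
hidden = dense_layer(x, hidden_w, hidden_b, relu)   # [0.0, 1.0, 3.0]
y = dense_layer(hidden, out_w, out_b, lambda z: z)  # [-0.5], linear output
print(hidden, y)
```

Note that the first hidden neuron outputs 0 because its weighted sum is negative: that is the activation function "deciding" the neuron should not fire for this input.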

**2. Convolutional Neural Network (CNN):**
- A type of neural network that is well-suited for image recognition and processing, video analysis, and image classification.
- CNNs are composed of multiple layers, including convolutional layers, pooling layers, and fully connected layers.
- The convolutional layers apply a series of filters to the input data, extracting features and creating feature maps.
- The pooling layers reduce the dimensionality of the feature maps, making the network more computationally efficient.
- The fully connected layers process the features and produce the final output, such as a classification or prediction.
- CNNs are designed to exploit the spatial structure of the input and to learn spatial hierarchies of features automatically, which makes them effective for images and other spatial data.
- They are widely used in computer vision tasks such as object recognition, image segmentation, and image generation, and have also been applied to other domains such as natural language processing and speech recognition.
- The architecture is inspired by the visual cortex of the human brain, which is specialized for processing visual information.
- CNNs have achieved state-of-the-art results in machine learning competitions and real-world applications. This success has driven much of the progress in computer vision, led to widespread adoption in industry and academia, and keeps them an active area of research.
- They are also combined with other types of neural networks, such as recurrent neural networks, to address more complex tasks.

![CNN.png](attachment:CNN.png)
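
To make the convolution and pooling steps concrete, here is a small sketch in plain Python. The helper names `conv2d` and `max_pool2d`, the toy image, and the hand-written edge filter are all assumptions made for this example. A trained CNN learns its filter values rather than having them written by hand, and real implementations add padding, strides, channels, and biases.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 sliding-window filter, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows or columns are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Toy 5x5 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, non-zero only at the edge
pooled = max_pool2d(feature_map)     # 2x2 summary of the feature map
print(feature_map[0], pooled)
```

The feature map is non-zero only in the column where the edge sits, and pooling shrinks the 4x4 map to 2x2 while keeping the strongest response in each window.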

**3. Recurrent Neural Network (RNN):**
- A type of neural network that is well-suited for sequence modeling, such as time series prediction, natural language processing, and speech recognition.
- Like other networks, an RNN has input, hidden, and output layers, but its hidden layer also feeds back into itself: the hidden state computed at one time step becomes an extra input at the next.
- The input layer receives the sequential data one step at a time, the hidden layer combines each new input with the previous hidden state, and the output layer produces the result.
- This recurrence lets RNNs learn temporal dependencies in the input data, making them effective for tasks involving sequences.
- RNNs have been widely used in natural language processing tasks such as language modeling, machine translation, and text generation, and have also been applied to speech recognition, music generation, and time series prediction.
- The recurrent design is loosely inspired by the way the brain carries context forward while processing sequential information.
- RNNs have achieved strong results in machine learning competitions and real-world applications, which led to their adoption in industry and academia, and they remain an active area of research.
- They are also combined with other types of neural networks, such as convolutional neural networks, to address more complex tasks.

![RNN.png](attachment:RNN.png)
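
The idea of a hidden state carried from one time step to the next can be sketched with a single recurrent unit. This is a deliberately tiny illustration: the scalar weights `w_xh`, `w_hh`, and `b_h`, their values, and the function names are hypothetical, and real RNNs use weight matrices and vector-valued states.

```python
import math

def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One time step: combine the new input with the previous hidden state."""
    return math.tanh(w_xh * x + w_hh * h_prev + b_h)

def run_rnn(sequence, w_xh=0.5, w_hh=0.8, b_h=0.0):
    """Feed a sequence through the unit, returning the hidden state at each step."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# A single "pulse" followed by silence.
states = run_rnn([1.0, 0.0, 0.0])
print([round(h, 3) for h in states])
```

Even though the second and third inputs are zero, the hidden state stays positive and fades gradually. That lingering value is the network's memory of the earlier input, which a feedforward network does not have.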

#### **Single-Layer vs Multi-Layer Neural Networks:**
Multilayered Networks have at least one hidden layer (all the layers between the input and output layers are hidden). A single-layer perceptron can only learn linear functions, but Multilayered Perceptrons can also learn non-linear functions.

- **Single Layer Neural Network**: A single-layer neural network consists of an input layer and an output layer, with no hidden layers in between. It is the simplest form of a neural network and works only for linearly separable data: it cannot learn non-linear functions or solve problems that require more complex decision boundaries. For this reason, single-layer networks are rarely sufficient on their own for real-world applications.
- **Multi Layer Neural Network**: A multi-layer neural network consists of an input layer, one or more hidden layers, and an output layer. With non-linear activation functions in the hidden layers, it can learn non-linear functions and model complex patterns in the data. Multi-layer networks are used for a wide range of applications, including classification, regression, and pattern recognition. They are widely used in industry and academia and continue to be an active area of research and development.

![single_layer_vs_multi_layer.png](attachment:single_layer_vs_multi_layer.png)

(a): Single-Layer NN, (b): Multi-Layer NN

| Aspect | Single-Layer Neural Network | Multi-Layer Neural Network |
|---|---|---|
| Complexity | Simple in structure, typically consisting of only an input layer and an output layer. | More complex, consisting of an input layer, one or more hidden layers, and an output layer. |
| Learning Capability | Can only learn linearly separable patterns. | Capable of learning non-linear and complex patterns due to the presence of hidden layers. |
| Activation Function | Typically a step (threshold) or linear activation function. | Uses non-linear activation functions like Sigmoid, ReLU, or Tanh, especially in hidden layers. |
| Applications | Suitable for simple tasks such as linear classification and regression. | Suited for a wide range of complex tasks like image and speech recognition, natural language processing, and complex pattern recognition. |
| Flexibility | Less flexible, with limited capacity to model complex relationships. | Highly flexible and capable of modeling a vast array of complex relationships in data. |
| Training Method | Often trained with simpler methods like perceptron learning rule. | Trained using sophisticated techniques like backpropagation and gradient descent. |
| Overfitting Risk | Lower risk of overfitting due to simpler model. | Higher risk of overfitting due to increased complexity, often requiring techniques like regularization and dropout to mitigate. |
| Examples | Simple perceptron. | Deep Neural Networks, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), etc. |
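
The XOR function is the classic illustration of this difference. The sketch below uses hand-set weights (chosen for the example, not learned) to show a two-layer network computing XOR, and then searches a small grid of weights for a single-layer solution. The grid search is only a demonstration, not a proof, but it agrees with the known result that XOR is not linearly separable.

```python
def step(z):
    """Threshold activation: fire (1) if the input is positive, else 0."""
    return 1 if z > 0 else 0

def single_layer(x1, x2, w1, w2, b):
    """A single perceptron: one weighted sum followed by a threshold."""
    return step(w1 * x1 + w2 * x2 + b)

def two_layer_xor(x1, x2):
    """Two hidden neurons (OR and AND) feeding one output neuron."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # "OR but not AND" is XOR

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

mlp_outputs = [two_layer_xor(a, b) for a, b in points]

# Try every weight/bias combination on a grid from -4.0 to 4.0 in steps of 0.5.
grid = [i / 2 for i in range(-8, 9)]
single_layer_solves_xor = any(
    [single_layer(a, b, w1, w2, bias) for a, b in points] == xor_targets
    for w1 in grid for w2 in grid for bias in grid
)

print(mlp_outputs, single_layer_solves_xor)
```

The hidden layer is what makes the difference: it re-describes the inputs as "at least one is on" and "both are on", and in that new representation XOR becomes a simple threshold decision.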

| Type of Neural Network | Description | Advantages | Disadvantages | Use Cases | Limitations |
|---|---|---|---|---|---|
| Feedforward Neural Network | Information flows in one direction (input to output), no loopbacks. A multilayer feedforward neural network includes weights (W), bias (b), and an activation function (f). | Simple to understand and implement. Good for pattern recognition. | Has no memory of earlier inputs, so it is poorly suited to temporal or sequential data. | Pattern recognition, classification, regression. | Cannot model temporal dependencies directly. |
| Recurrent Neural Network (RNN) | Processing units form a cycle, allowing output to influence itself. Useful for sequence processing (e.g., speech recognition, frame-by-frame video classification). The unrolling of an RNN in time demonstrates its structure. | Can handle temporal data. | Difficulty in training due to vanishing and exploding gradients. | Speech recognition, video classification. | Difficulty in learning long-range dependencies. |
| Radial Basis Function Neural Network | Utilized in classification, function approximation, and time series prediction. The hidden layer includes radial basis functions (e.g., Gaussian), each representing a cluster center. | Good at function approximation. | Requires a good selection of radial basis functions. | Classification, function approximation, time series prediction. | Sensitive to the choice of radial basis functions. |
| Kohonen Self-Organizing Neural Network | Organizes input data via unsupervised learning. Input nodes are fully connected to output nodes arranged in a 2D grid. There is no activation function; weights represent node attributes. For each input, the output node whose weights are closest in Euclidean distance is found, and the weights of that node and its neighbors are moved toward the input. | Good for data visualization and clustering. | Projects data onto a low-dimensional (usually 2D) grid, so some structure in the data can be lost. | Data visualization, clustering. | Limited to a low-dimensional output map. |
| Modular Neural Network | A large network broken into smaller, independent modules. Smaller networks perform specific tasks, and their outputs combine into the final output of the entire network. | Can handle complex tasks by breaking them into simpler tasks. | Requires careful design of modules. | Complex pattern recognition, large-scale tasks. | Requires careful design of modules. |
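
As one concrete example from the table, the weight update of a Kohonen self-organizing map can be sketched as below. This is a simplified assumption-laden version: the names `best_matching_unit` and `som_step`, the three starting nodes, and the learning rate are invented for the example, and only the winning node is updated, whereas a full SOM also moves the winner's grid neighbors by a smaller amount.

```python
import math

def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to x (Euclidean distance)."""
    distances = [math.dist(weights, x) for weights in nodes]
    return distances.index(min(distances))

def som_step(nodes, x, learning_rate=0.5):
    """Move the best-matching unit part of the way toward the input x.

    Simplification: the neighborhood update of a full SOM is omitted.
    """
    bmu = best_matching_unit(nodes, x)
    nodes[bmu] = [w + learning_rate * (xi - w) for w, xi in zip(nodes[bmu], x)]
    return bmu

# Three nodes with 2D weight vectors, and one input sample.
nodes = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
winner = som_step(nodes, [0.8, 0.6])
print(winner, nodes[winner])
```

Node 1 starts closest to the sample, so it wins and moves halfway toward it, ending near `[0.9, 0.8]`. Repeating this over many samples is what makes the map's nodes spread out to match the data.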

## **Multi-Layer Perceptron (MLP):**
> A Multilayer Perceptron (MLP) is a type of artificial neural network that consists of more than one layer of neurons. Unlike a single-layer perceptron, which can only learn linearly separable patterns, a multilayer perceptron can learn more complex, non-linear functions. This makes it a fundamental model in the field of deep learning and neural networks. 

MLPs are used for a wide range of applications, including classification, regression, and pattern recognition. They are among the most widely used types of neural network and have proven effective on a wide variety of problems.

##### **Structure of MLP:**
1. **Input layer**: The input layer receives the data and passes it to the hidden layers.
2. **Hidden layers**: The hidden layers process the data and learn to extract features from it.
3. **Output layer**: The output layer produces the final result, such as a classification or prediction.

##### **How Does an MLP Work?**

1. **Forward propagation:**
Signals travel from the input layer forward to the output layer. Each `neuron in the hidden and output layers` computes a weighted sum of its inputs plus a bias and passes it through an activation function, which introduces non-linear properties to the network.

2. **Backward propagation:**
After the output is generated, the MLP uses a method called backpropagation to update the weights of the neurons. During backpropagation, the error (difference between the predicted and actual output) is calculated and distributed back through the network, allowing the weights to be updated accordingly. 

3. **Learning Rate:**
This is a `key parameter` in training an MLP. It controls how much the weights are adjusted at each update step: a rate that is too high can make training unstable, while one that is too low makes learning slow.

4. **Activation Functions:**
Functions like `Sigmoid`, `ReLU` (Rectified Linear Unit), or `Tanh` (Hyperbolic Tangent) are used to introduce non-linear properties, allowing the MLP to learn more complex patterns. 

5. **Loss Function:**
The loss function measures how far the MLP's predictions are from the target values. Backpropagation computes the gradient of this loss with respect to each weight, and those gradients are used to update the weights of the neurons.

6. **Optimizer / Optimization Algorithms:**
Algorithms like `Gradient Descent`, `Stochastic Gradient Descent`, or `Adam` are used to optimize the weights of the neurons during training.

![forward_backward_propagation.png](attachment:forward_backward_propagation.png)
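
The six pieces above fit together in a short training loop. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function `train`, its default hyperparameters, the sigmoid activations, the squared-error loss, and the per-sample gradient descent update are all choices made for this example. The logical OR function is used as a toy dataset because it trains quickly; harder targets may need more epochs, more hidden units, or a different seed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(data, hidden_size=3, learning_rate=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer MLP; return the mean loss of each epoch."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(hidden_size)]
    b1 = [0.0] * hidden_size
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, target in data:
            # 1. Forward propagation: input -> hidden -> output.
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden_size)]
            y = sigmoid(sum(w2[j] * h[j] for j in range(hidden_size)) + b2)
            # 5. Loss function: squared error for this sample.
            epoch_loss += 0.5 * (y - target) ** 2
            # 2. Backward propagation: chain rule, output layer first.
            delta_out = (y - target) * y * (1.0 - y)
            delta_hidden = [delta_out * w2[j] * h[j] * (1.0 - h[j])
                            for j in range(hidden_size)]
            # 3 and 6. Gradient descent update, scaled by the learning rate.
            for j in range(hidden_size):
                w2[j] -= learning_rate * delta_out * h[j]
                b1[j] -= learning_rate * delta_hidden[j]
                for i in range(n_inputs):
                    w1[j][i] -= learning_rate * delta_hidden[j] * x[i]
            b2 -= learning_rate * delta_out
        losses.append(epoch_loss / len(data))
    return losses

# Toy dataset: the logical OR function.
OR_DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
losses = train(OR_DATA)
print(f"first epoch loss: {losses[0]:.4f}, last epoch loss: {losses[-1]:.4f}")
```

The loss printed for the last epoch should be well below the first one. Changing `learning_rate` is an easy way to see its effect: very small values slow the decrease, and very large values can make the loss jump around instead of settling.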

#### **Types of MLP:**
1. Based on `number of hidden layers`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Shallow Neural Networks | One Hidden Layer | Basic pattern recognition, simple classification and regression tasks |
| Deep Neural Networks | Multiple Hidden Layers | Complex pattern recognition, image and speech recognition, natural language processing |

2. Based on `Task`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Classification MLPs | Output a discrete label or class | Image classification, text categorization, medical diagnosis |
| Regression MLPs | Predict a continuous output | Real estate pricing, stock market forecasting, temperature prediction |
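
The two task types usually differ in the loss they are trained with. As a hedged illustration (the function names and sample values below are made up for the example), regression MLPs are commonly paired with mean squared error and binary classification MLPs with cross-entropy:

```python
import math

def mean_squared_error(targets, predictions):
    """Average squared difference; a common loss for regression."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def binary_cross_entropy(targets, probabilities, eps=1e-12):
    """Average negative log-likelihood; a common loss for binary classification."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)

# Regression: continuous targets and predictions.
mse = mean_squared_error([3.0, 5.0], [2.5, 5.5])       # 0.25

# Classification: labels are 0 or 1, predictions are probabilities.
bce_good = binary_cross_entropy([1, 0], [0.9, 0.2])    # confident and right
bce_bad = binary_cross_entropy([1, 0], [0.2, 0.9])     # confident and wrong
print(mse, round(bce_good, 4), round(bce_bad, 4))
```

Cross-entropy penalizes confident wrong answers much more heavily than confident right ones, which is why `bce_bad` comes out far larger than `bce_good`.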

3. Based on `Activation Function`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Sigmoid MLPs | Use sigmoid function in hidden layers | Early neural network applications, binary classification tasks (less common now due to vanishing gradient issues) |
| ReLU MLPs | Utilize rectified linear unit (ReLU) activation function | Modern deep learning tasks, including complex neural networks used in various fields |

![Vanishing-and-Exploding-Gradients.png](attachment:Vanishing-and-Exploding-Gradients.png)
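
The vanishing gradient issue mentioned in the table can be seen with simple arithmetic. During backpropagation the gradient is multiplied by one activation derivative per layer. The sketch below makes a strong simplifying assumption, ignoring the weights and using the same pre-activation value at every layer, to isolate the effect of the activation function. The helper names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of sigmoid; its maximum value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, z, depth):
    """Product of `depth` identical activation derivatives (weights ignored)."""
    return grad_fn(z) ** depth

# Best case for sigmoid (z = 0) versus an active ReLU unit (z > 0), 10 layers deep.
sigmoid_chain = chained_gradient(sigmoid_grad, 0.0, 10)  # 0.25 ** 10
relu_chain = chained_gradient(relu_grad, 1.0, 10)        # 1.0 ** 10
print(sigmoid_chain, relu_chain)
```

Even in sigmoid's best case the signal shrinks to roughly one millionth of its size after ten layers, while an active ReLU path passes it through unchanged. This is a large part of why ReLU replaced sigmoid in the hidden layers of deep networks.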

4. Based on `Network Topology`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|  
| Feedforward MLPs | Standard form, no cycles in connection | Most common use cases of neural networks, including both classification and regression |
| Recurrent Neural Networks | Loops in connections, allowing information persistence | Time series analysis, sequential data processing, language modeling, speech recognition |

![feedforward_rnn.png](attachment:feedforward_rnn.png)

5. Based on `Output Layer Function`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Softmax MLPs | Softmax function in the output layer for categorical probability distributions | Multi-class classification problems such as digit recognition, text classification |
| Linear Output MLPs | Linear activation function in the output layer | Regression tasks where the output is continuous |
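
The softmax output layer turns raw scores into a probability distribution over classes. A minimal sketch follows; the function name, the variable names, and the example scores are chosen for illustration, and subtracting the maximum score is a standard trick to avoid overflow that does not change the result.

```python
import math

def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Raw scores from a hypothetical 3-class output layer.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)
```

A linear output layer, by contrast, would return the raw scores unchanged, which is what a regression task needs since its outputs are not probabilities.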

6. Based on `Adaptation Mechanism`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Static MLPs | Constant architecture and neuron parameters after training | Tasks with consistent data patterns where adaptability is less crucial |
| Dynamic MLPs | Can adapt structure and neuron parameters based on input data or learning task | Environments with changing data patterns or tasks requiring ongoing learning |

7. Based on `Complexity of the Task`:

| Type of MLP | Characteristics | Typical Use Case |
|---|---|---|
| Basic MLPs | Simpler tasks, fewer layers and neurons | Straightforward applications like basic classification, entry-level neural network projects |
| Complex MLPs | Designed for complex tasks, many layers and neurons or specialized architectures | Advanced applications in AI like high-dimensional data analysis, complex pattern recognition, large-scale deep learning |

#### **Applications of MLP:**
MLPs are versatile and can be used for a wide range of applications, including:
- Classification problems, both binary and multi-class
- Regression problems
- Pattern recognition
- Time series prediction
- Image and speech recognition
- Natural language processing
- Anomaly detection
- Reinforcement learning
- Generative modeling
- Recommendation systems
- Robotics and control systems
- Financial forecasting
- Medical diagnosis
- Bioinformatics
- Game playing