# What is a Neural Network 

**Neural networks** are machine learning models loosely inspired by the way the human brain works. These models consist of interconnected nodes, or neurons, that process data and learn patterns, enabling tasks such as pattern recognition and decision-making.

Here, we will explore the fundamentals of neural networks, their architecture, how they work and their applications in various fields. Understanding neural networks is essential for anyone interested in the advancements of artificial intelligence.

# Understanding Neural Networks in Terms of Deep Learning

**Neural networks** are capable of learning and identifying patterns directly from data without pre-defined rules. These networks are built from several key components:

**Neurons:** The basic units that receive inputs. Each neuron is governed by a threshold and an activation function.

**Connections:** Links between neurons that carry information, regulated by weights and biases.

**Weights and Biases:** These parameters determine the strength and influence of connections.

**Propagation Functions:** Mechanisms that help process and transfer data across layers of neurons.

**Learning Rule:** The method that adjusts weights and biases over time to improve accuracy.

# The Three-Stage Learning Process of a Neural Network

Learning in neural networks follows a structured, three-stage process:

**Input Computation:** Data is fed into the network.

**Output Generation:** Based on the current parameters, the network generates an output.

**Iterative Refinement:** The network refines its output by adjusting weights and biases, gradually improving its performance on diverse tasks.

![Difference between a biological neuron and an artificial neuron](https://media.geeksforgeeks.org/wp-content/uploads/20241106171024318092/Artificial-Neural-Networks.webp)


# Importance of Neural Networks

Neural networks are important for identifying complex patterns, solving intricate challenges and adapting to dynamic environments. Their ability to learn from vast amounts of data is transformative, impacting technologies like natural language processing, self-driving vehicles and automated decision-making.

Neural networks streamline processes, increase efficiency and support decision-making across various industries. As a backbone of artificial intelligence, they continue to drive innovation, shaping the future of technology.

# Layers in neural network architecture 

**Input Layer:** This is where the network receives its input data. Each input neuron in the layer corresponds to a feature in the input data.

**Hidden Layers:** These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.

**Output Layer:** The final layer produces the output of the model. The format of these outputs varies depending on the specific task, such as classification or regression.

![Structure of neural network ](https://media.geeksforgeeks.org/wp-content/uploads/20231204175521/nn-ar.jpg)

# Working of Neural Networks

1. **Forward Propagation**
When data is input into the network, it passes through the network in the forward direction, from the input layer through the hidden layers to the output layer. This process is known as forward propagation. Here’s what happens during this phase:

1. **Linear Transformation:** Each neuron in a layer receives inputs which are multiplied by the weights associated with the connections. These products are summed together and a bias is added to the sum. This can be represented mathematically as:

$$
z = w_1x_1 + w_2x_2 + \dots + w_nx_n + b
$$




where:

w represents the weights

x represents the inputs

b is the bias

2. **Activation:** The result of the linear transformation (denoted as z) is then passed through an activation function. The activation function is crucial because it introduces non-linearity into the system, enabling the network to learn more complex patterns. Popular activation functions include ReLU, sigmoid and tanh.
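The two steps above (linear transformation, then activation) can be sketched for a single neuron in plain Python. The weights, inputs and bias below are hypothetical values chosen only for illustration; they do not come from any dataset in this article.

```python
import math

def linear_transform(weights, inputs, bias):
    """Compute z = w_1*x_1 + ... + w_n*x_n + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration.
weights = [0.5, -0.25, 0.75]
inputs = [1.0, 2.0, 3.0]
bias = 0.5

z = linear_transform(weights, inputs, bias)
print("z =", z)
print("ReLU(z) =", relu(z))
print("sigmoid(z) =", sigmoid(z))
print("tanh(z) =", math.tanh(z))
```

The same `z` is passed to each activation only to compare them side by side; a real neuron uses one activation function.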

2. **Backpropagation**

After forward propagation, the network evaluates its performance using a loss function which measures the difference between the actual output and the predicted output. The goal of training is to minimize this loss. This is where backpropagation comes into play:

**Loss Calculation:** The network calculates the loss which provides a measure of error in the predictions. The loss function could vary; common choices are mean squared error for regression tasks or cross-entropy loss for classification.

**Gradient Calculation:** The network computes the gradients of the loss function with respect to each weight and bias in the network. This involves applying the chain rule of calculus to find out how much each part of the output error can be attributed to each weight and bias.

**Weight Update:** Once the gradients are calculated, the weights and biases are updated using an optimization algorithm like stochastic gradient descent (SGD). The weights are adjusted in the opposite direction of the gradient to minimize the loss. The size of the step taken in each update is determined by the learning rate.
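As a rough sketch of these three steps, the code below performs one gradient descent update for a single sigmoid neuron. It assumes a squared-error loss `L = 0.5 * (a - y)^2` on one training example; the starting weights, the input and the learning rate are hypothetical values picked for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, bias, inputs):
    """Forward pass of one sigmoid neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def loss(weights, bias, inputs, target):
    """Loss calculation: squared error on a single example."""
    return 0.5 * (forward(weights, bias, inputs) - target) ** 2

def sgd_step(weights, bias, inputs, target, learning_rate):
    """Gradient calculation and weight update for one example."""
    a = forward(weights, bias, inputs)
    # Chain rule: dL/dz = dL/da * da/dz = (a - y) * a * (1 - a)
    dL_dz = (a - target) * a * (1.0 - a)
    # dz/dw_i = x_i and dz/db = 1
    grad_w = [dL_dz * x for x in inputs]
    grad_b = dL_dz
    # Move against the gradient, scaled by the learning rate.
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias

# Hypothetical starting point and training example.
weights, bias = [0.1, -0.2], 0.0
inputs, target = [1.0, 2.0], 1.0

loss_before = loss(weights, bias, inputs, target)
weights, bias = sgd_step(weights, bias, inputs, target, learning_rate=0.5)
loss_after = loss(weights, bias, inputs, target)
print("loss before:", loss_before)
print("loss after: ", loss_after)
```

With these values the loss on the example is lower after the update, which is the behavior the weight update step is meant to produce.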

3. **Iteration**

This process of forward propagation, loss calculation, backpropagation and weight update is repeated for many iterations over the dataset. Over time, this iterative process reduces the loss and the network's predictions become more accurate.

Through these steps, neural networks can adapt their parameters to better approximate the relationships in the data, thereby improving their performance on tasks such as classification, regression or any other predictive modeling.
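The full cycle can be sketched as a small training loop. This is a minimal illustration, not a production setup: it assumes a single sigmoid neuron, a binary cross-entropy loss, and a toy logical-OR dataset that is not part of this article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (logical OR), chosen only for illustration.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def mean_loss(weights, bias):
    """Mean binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(data)

weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.5
history = []

for epoch in range(1000):
    for x, y in data:
        p = predict(weights, bias, x)   # forward propagation
        error = p - y                   # dL/dz for sigmoid + cross-entropy
        weights = [w - learning_rate * error * xi
                   for w, xi in zip(weights, x)]
        bias -= learning_rate * error   # weight and bias update
    history.append(mean_loss(weights, bias))

print("loss after first epoch:", history[0])
print("loss after last epoch: ", history[-1])
for x, y in data:
    print(x, "->", round(predict(weights, bias, x), 3), "target", y)
```

The `history` list records the mean loss after each pass over the data, so the gradual reduction described above can be inspected directly.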


# Example of Email Classification


Let's consider a record of an email dataset:

| Email ID | Email Content             | Sender           | Subject Line      | Label |
|:--------:|:--------------------------|:-----------------|:-----------------|:-----:|
| 1        | "Get free gift cards now!"| spam@example.com | "Exclusive Offer"|   1   |


To classify this email, we will create a feature vector based on the presence of keywords such as "free", "win" and "offer".
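A keyword check of this kind can be sketched as follows. The function name `keyword_features` is hypothetical, and the choice to search both the email content and the subject line is an assumption made here so that "offer" (which appears only in the subject line) is found.

```python
import re

KEYWORDS = ["free", "win", "offer"]

def keyword_features(text, keywords=KEYWORDS):
    """Return 1 for each keyword that appears as a whole word in text, else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if keyword in words else 0 for keyword in keywords]

email_content = "Get free gift cards now!"
subject_line = "Exclusive Offer"

# Assumption: content and subject line are searched together.
features = keyword_features(email_content + " " + subject_line)
print(dict(zip(KEYWORDS, features)))
```

Matching whole words rather than substrings avoids counting "win" inside an unrelated word.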

The feature vector of the record can be presented as:

- "free": Present (1)
- "win": Absent (0)
- "offer": Present (1)

# How Neurons Process Data in a Neural Network

**Email Spam Classification using a Simple Neural Network**

We will illustrate how a neural network can be used to decide whether an email is spam or not spam.

**Step 1: Input Features (Input Layer)**

![](attachment:Screenshot_21-8-2025_212557_chatgpt.com.jpeg)

# Step 2: Hidden Layer Calculation

![](attachment:Screenshot_21-8-2025_212646_chatgpt.com.jpeg)

![](attachment:Screenshot_21-8-2025_212715_chatgpt.com.jpeg)

# Step 3: Output layer 

![](attachment:Screenshot_21-8-2025_212836_chatgpt.com.jpeg)

**Step 4: Final Classification**

The network’s output is 0.636 (probability that email is spam).

Rule: If probability > 0.5 → classify as spam (1).

Since 0.636 > 0.5:

👉 The email is classified as spam.
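The four steps can be sketched end to end in plain Python. The weights and biases below are hypothetical placeholders, NOT the values shown in the figures above, and the use of ReLU in the hidden layer is also an assumption, so the probability printed here will differ from 0.636. Only the decision rule (probability > 0.5 means spam) is taken directly from the text.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Decision rule: spam (1) if probability > threshold, else not spam (0)."""
    return 1 if probability > threshold else 0

# Step 1: feature vector for the example email ("free", "win", "offer").
x = [1, 0, 1]

# Hypothetical parameters, NOT the values from the figures above.
hidden_weights = [[0.5, -0.2, 0.3], [0.4, 0.1, -0.5]]
hidden_biases = [0.1, -0.2]
output_weights = [0.7, 0.2]
output_bias = 0.1

# Step 2: hidden layer (weighted sum plus bias, then ReLU).
hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(hidden_weights, hidden_biases)]

# Step 3: output layer (weighted sum plus bias, then sigmoid).
probability = sigmoid(
    sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Step 4: final classification.
print("hidden activations:", hidden)
print("spam probability:", probability)
print("label:", classify(probability))
print("label for the worked example's 0.636:", classify(0.636))
```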






# Feedforward Neural Network Diagram

![](<attachment:ChatGPT Image Aug 21, 2025, 10_51_37 AM.png>)

### Description
- **Input Layer:** Takes raw input features (e.g., numerical or categorical data).
- **Hidden Layers:** Intermediate layers with neurons applying activation functions to learn complex patterns.
- **Output Layer:** Produces the final result (e.g., classification, regression).
- **Connections:** Each neuron in one layer is fully connected to neurons in the next layer.
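The layer structure described above can be sketched as a list of fully connected layers applied in order. The helper names `dense` and `feedforward`, the 3-2-1 layer sizes and all parameter values are hypothetical, chosen only to illustrate the data flow.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every neuron in the layer."""
    return [activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]

def feedforward(inputs, layers):
    """Pass inputs through each (weights, biases, activation) layer in order."""
    activations = inputs
    for weights, biases, activation in layers:
        activations = dense(activations, weights, biases, activation)
    return activations

# Hypothetical 3-2-1 network: 3 inputs, 2 hidden neurons, 1 output neuron.
layers = [
    ([[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]], [0.0, 0.1], relu),  # hidden layer
    ([[0.6, -0.3]], [0.05], sigmoid),                          # output layer
]

output = feedforward([1.0, 0.5, -1.0], layers)
print("network output:", output)
```

Each row of a weight matrix holds the incoming weights of one neuron, so the number of rows sets the layer width and the row length must match the size of the previous layer.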

# Learning of a Neural Network

1. **Learning with Supervised Learning**

In supervised learning, a neural network learns from labeled input-output pairs provided by a teacher. The network generates outputs based on inputs and by comparing these outputs to the known desired outputs, an error signal is created. The network iteratively adjusts its parameters to minimize errors until it reaches an acceptable performance level.

2. **Learning with Unsupervised Learning**

Unsupervised learning involves data without labeled output variables. The primary goal is to understand the underlying structure of the input data (X). Unlike supervised learning, there is no instructor to guide the process. Instead, the focus is on modeling data patterns and relationships, with techniques like clustering and association commonly used.

3. **Learning with Reinforcement Learning**

Reinforcement learning enables a neural network to learn through interaction with its environment. The network receives feedback in the form of rewards or penalties, guiding it to find an optimal policy or strategy that maximizes cumulative rewards over time. This approach is widely used in applications like gaming and decision-making.















# Types of Neural Networks

Several types of neural networks are in common use. This section covers the feedforward network.

**Feedforward Networks**

A feedforward network is a simple artificial neural network architecture in which data moves from input to output in a single direction.

- No loops or feedback connections.
- Used for tasks like pattern recognition.

**Key components**

Neurons, weights and activation functions.

# Activation function 

While building a neural network, one key decision is selecting the Activation Function for both the hidden layer and the output layer. It is a mathematical function applied to the output of a neuron. It introduces non-linearity into the model, allowing the network to learn and represent complex patterns in the data. Without this non-linearity feature a neural network would behave like a linear regression model no matter how many layers it has.

An activation function decides whether, and how strongly, a neuron should be activated. It is applied to the weighted sum of the neuron's inputs plus a bias term. This helps the model make complex decisions and predictions by introducing non-linearities to the output of each neuron.


![Activation function ](attachment:image.png)

# Introducing Non-Linearity in Neural Network

Non-linearity means that the relationship between input and output is not a straight line. In simple terms the output does not change proportionally with the input. A common choice is the ReLU function defined as:

$$
\sigma(x) = \max(0, x)
$$

Imagine you want to classify apples and bananas based on their shape and color.

If we use a linear function it can only separate them using a straight line.

But real-world data is often more complex like overlapping colors, different lighting, etc.

By adding a non-linear activation function like ReLU, Sigmoid or Tanh the network can create curved decision boundaries to separate them correctly.
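The three activation functions named above can be written in a few lines of plain Python. The sample inputs are arbitrary, and the final check is one simple way to see that ReLU is not linear: a linear function `f` would satisfy `f(a + b) = f(a) + f(b)`.

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"x={x:+.1f}  relu={relu(x):.3f}  "
          f"sigmoid={sigmoid(x):.3f}  tanh={math.tanh(x):.3f}")

# A linear function f satisfies f(a + b) = f(a) + f(b); ReLU does not.
a, b = -1.0, 1.0
print("relu(a + b)       =", relu(a + b))
print("relu(a) + relu(b) =", relu(a) + relu(b))
```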

# Effect of Non-Linearity

The inclusion of the ReLU activation function σ allows a hidden neuron (denoted h_1 here) to introduce a non-linear decision boundary in the input space. This non-linearity enables the network to learn more complex patterns that are not possible with a purely linear model, such as:

- Modeling functions that are not linearly separable.
- Increasing the capacity of the network to form multiple decision boundaries based on the combination of weights and biases.
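XOR is a standard example of a function that is not linearly separable. The sketch below uses two ReLU hidden neurons with hand-picked weights (an illustrative construction, not weights learned by training) to compute it exactly.

```python
def relu(z):
    return max(0, z)

def xor_network(x1, x2):
    """Two ReLU hidden neurons with hand-picked weights that compute XOR."""
    h1 = relu(x1 + x2)        # hidden neuron 1: weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1)    # hidden neuron 2: weights (1, 1), bias -1
    return h1 - 2 * h2        # linear output layer: weights (1, -2), bias 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_network(x1, x2))
```

Without the `relu` calls the output would reduce to `-(x1 + x2) + 2`, a single linear function of the inputs, which cannot produce the XOR pattern.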

# Why is Non-Linearity Important in Neural Networks?

**Neural networks** consist of neurons that operate using weights, biases and activation functions.

In the learning process, these weights and biases are updated based on the error produced at the output, a process known as backpropagation. Activation functions enable backpropagation by providing gradients that are essential for updating the weights and biases.

Without non-linearity, even deep networks would be limited to solving only simple, linearly separable problems. Activation functions help neural networks to model highly complex data distributions and solve advanced deep learning tasks. Adding non-linear activation functions introduces flexibility and enables the network to learn more complex and abstract patterns from data.

# Types of Activation Functions in Deep Learning

**Linear Activation Function**

The linear activation function is a straight line defined by y = x. No matter how many layers the neural network contains, if they all use linear activation functions, the output is a linear combination of the input.

The output range spans (−∞, +∞).

The **linear activation** function is typically used in just one place: the output layer.

Using linear activation across all layers limits the network's ability to learn complex patterns.

Linear activation functions are useful for specific tasks but must be combined with non-linear functions to enhance the neural network’s learning and predictive capabilities.
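The claim that stacked linear layers stay linear can be checked numerically. The sketch below uses hypothetical parameters for two linear layers and shows that a single layer with weights `W2 · W1` and bias `W2 · b1 + b2` gives the same output.

```python
def linear_layer(weights, biases, inputs):
    """Fully connected layer with the linear activation y = x."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical parameters for two stacked linear layers.
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [0.5, 1.0]
W2, b2 = [[2.0, 1.0]], [-1.0]

# Equivalent single layer: W = W2 * W1 and b = W2 * b1 + b2.
W = matmul(W2, W1)
b = [sum(w * v for w, v in zip(row, b1)) + c for row, c in zip(W2, b2)]

x = [3.0, -2.0]
two_layers = linear_layer(W2, b2, linear_layer(W1, b1, x))
one_layer = linear_layer(W, b, x)
print("two linear layers:", two_layers)
print("one merged layer: ", one_layer)
```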

![Linear activation function ](attachment:image-2.png)

# Non-Linear Activation Functions

**Sigmoid Function**

Sigmoid is a mathematical function that maps any real-valued number into a value between 0 and 1. Its characteristic "S"-shaped curve makes it particularly useful in scenarios where we need to convert outputs into probabilities. This function is often called the logistic function.

Mathematically, sigmoid is represented as:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$


where,

x is the input value,

e is Euler's number (≈2.718)


![Sigmoid Function ](attachment:image-3.png)

Sigmoid function is used as an activation function in machine learning and neural networks for modeling binary classification problems, smoothing outputs, and introducing non-linearity into models.


In machine learning, x could be a weighted sum of inputs in a neural network neuron or a raw score in logistic regression. If the output value is close to 1, it indicates high confidence in one class and if the value is close to 0, it indicates high confidence in the other class.
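A minimal sigmoid implementation in plain Python is sketched below. The two-branch form is a common way to avoid overflow in `math.exp` for large negative inputs; it is one possible implementation, not the only one, and the sample scores are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow in exp() for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    e = math.exp(x)
    return e / (1.0 + e)

for score in [-1000.0, -2.0, 0.0, 2.0, 1000.0]:
    print(score, "->", sigmoid(score))
```

Mathematically the output never reaches exactly 0 or 1, but for extreme scores the floating-point result rounds to those values.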

# Properties of the Sigmoid Function

The **sigmoid function** has several key properties that make it a popular choice in machine learning and neural networks:

**Domain:** The domain of the sigmoid function is all real numbers. This means that you can input any real number into the sigmoid function, and it will produce a valid output.

**Asymptotes:** As x approaches positive infinity, σ(x) approaches 1. Conversely, as x approaches negative infinity, σ(x) approaches 0. The function never actually reaches 0 or 1, but gets arbitrarily close.

**Monotonicity:** The sigmoid function is monotonically increasing, meaning that as the input increases, the output also increases.


**Differentiability:** The sigmoid function is differentiable, which allows for the calculation of gradients during the training of machine learning models.

# Sigmoid Function in Backpropagation

The **sigmoid function** is:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

---

## 1️⃣ Properties of Sigmoid

1. **Non-linear:** Allows the network to model non-linear relationships.  
2. **Range between 0 and 1:** Useful for probabilities.  
3. **Differentiable everywhere:** Essential for backpropagation.  
4. **S-shaped curve:**  
   - Large negative inputs → output close to 0  
   - Inputs near 0 → output ~0.5  
   - Large positive inputs → output close to 1  

---

## 2️⃣ Sigmoid in Backpropagation

Backpropagation updates weights using gradient descent:

$$
\text{weights} \gets \text{weights} - \eta \cdot \frac{\partial L}{\partial w}
$$

- \(L\) = loss function  
- \(w\) = weight  
- \(\eta\) = learning rate  

The derivative of the activation function is used to compute \(\frac{\partial L}{\partial w}\). With sigmoid:

$$
\frac{\partial L}{\partial w} \propto \sigma'(x)
$$

So computing \(\sigma'(x)\) efficiently is crucial.

## 3️⃣ Derivative of Sigmoid

Start with:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

Define:

$$
u = 1 + e^{-x} \quad \Rightarrow \quad \sigma(x) = \frac{1}{u}
$$

### Step 1: Differentiate \(u\) w.r.t \(x\)

$$
\frac{du}{dx} = - e^{-x}
$$

### Step 2: Differentiate \(y = 1/u\) w.r.t \(u\)

$$
\frac{dy}{du} = -\frac{1}{u^2}
$$

### Step 3: Chain Rule

$$
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \left(-\frac{1}{u^2}\right) \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2}
$$

### Step 4: Simplify

![](attachment:Screenshot_21-8-2025_235340_chatgpt.com.jpeg)
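For reference, the simplification of the Step 3 result can also be written out in text form. This is the standard algebraic route to the compact derivative; whether it matches the screenshot line for line is an assumption. It uses the fact that \(1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}\):

$$
\frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
$$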





The above equation is the general form of the derivative of the sigmoid function. The image below shows the derivative of the sigmoid function graphically.
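The closed form \(\sigma'(x) = \sigma(x)(1 - \sigma(x))\) can be sanity-checked against a numerical derivative. The sketch below compares it with a central finite difference; the step size `h` and the sample points are arbitrary choices.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    """Closed form: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numerical_derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(x, sigmoid_derivative(x), numerical_derivative(sigmoid, x))
```

The closed form needs only the value of \(\sigma(x)\), which is already available from the forward pass, so no extra exponential has to be evaluated during backpropagation.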

![](attachment:image-4.png)

# Issue with Sigmoid Function in Backpropagation

One key issue with using the sigmoid function is the vanishing gradient problem. When updating weights and biases using gradient descent, if the gradients are too small, the updates to weights and biases become insignificant, slowing down or even stopping learning.


![](attachment:image-5.png)


## 4️⃣ Intuition Behind the Derivative

- Maximum derivative occurs at \(x = 0\) (\(\sigma(0) = 0.5\)), where \(\sigma'(0) = 0.25\) → learning is fastest.  
- As \(x \to \infty\) or \(x \to -\infty\), the derivative \(\sigma'(x) \to 0\).

That means the weights feeding a neuron with a very large positive or negative input barely change → vanishing gradient.


## 5️⃣ Vanishing Gradient Problem

- In deep networks, small derivatives at extreme values cause the gradient to vanish.  
- Gradients multiply across layers → learning slows or stops in earlier layers.  
- Modern networks often replace sigmoid with **ReLU** in hidden layers.
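The multiplication effect can be illustrated with a deliberately simplified model. The sketch below assumes that every layer sees the same pre-activation and that every weight is 1, so the backpropagated gradient is scaled only by the product of sigmoid derivatives; real networks also multiply by the weights, so this shows the trend rather than exact values.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def gradient_factor(pre_activation, num_layers):
    """Product of sigmoid derivatives along a chain of num_layers neurons.

    Simplifying assumption: every layer sees the same pre-activation and
    every weight is 1, so only the activation derivatives are multiplied.
    """
    return sigmoid_derivative(pre_activation) ** num_layers

for layers in [1, 5, 10, 20]:
    print(layers,
          "best case (x=0):", gradient_factor(0.0, layers),
          "saturated (x=4):", gradient_factor(4.0, layers))
```

Even in the best case, where each derivative is at its maximum of 0.25, the factor shrinks geometrically with depth, and saturated neurons make it shrink much faster.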

---

## 6️⃣ Summary Table

| Aspect | Sigmoid |
|--------|---------|
| Formula | σ(x)=1/(1+e^(−x)) |
| Output Range | 0–1 |
| Derivative | σ′(x)=σ(x)(1−σ(x)) |
| Advantage | Smooth, differentiable, models non-linearity |
| Disadvantage | Saturates at extremes → vanishing gradient |

---

## 7️⃣ Visualization

- **Sigmoid curve:** S-shaped  
- **Derivative curve:** bell-shaped, max at 0, approaches 0 at extremes  
- **Red regions near 0 or 1 output:** vanishing gradient



