> ### ***SUBMITTED BY: MALAIKA AHMED***

# ***🔴 Task 33: Neural Networks Basics (Perceptron, Activation Functions)***

Neural networks, inspired by the human brain, consist of interconnected layers of neurons. A perceptron, the simplest neural network, uses a weighted sum of inputs and an activation function to produce an output. Activation functions, like sigmoid and ReLU, introduce non-linearity, allowing the network to learn complex patterns. This task builds a simple neural network from scratch for a regression problem: the Mean Squared Error (MSE) measures the prediction error, while gradient descent adjusts the model's weights to minimize it. A basic neural network with one input layer, one hidden layer, and one output layer can perform regression using these principles.

***

 # <span style='color:Blue'>  ***Understanding Neural Networks*** </span>
 
At its core, a neural network is inspired by the structure and functionality of the human brain. It is composed of interconnected nodes, commonly referred to as neurons or artificial neurons. The strength of neural networks lies in their ability to learn from data. During training, the network adjusts its weights and biases to minimize the difference between predicted outputs and actual targets. The neurons are organized into layers, each serving a specific purpose:

### ***Input Layer:***
Neurons in this layer receive the initial data or features.

### ***Hidden Layers:***
These layers process the input data through weighted connections and activation functions.

### ***Output Layer:***
The final layer produces the network’s output, which could be a classification, regression, or other desired outcome.


![image.png](attachment:f8af0efe-dc73-448e-a081-6e6f3304ea14.png)



A neuron, also known as a node or perceptron, is a fundamental building block in a neural network. It’s a computational unit that takes one or more inputs, performs a weighted sum, applies an activation function, and produces an output. In a neural network, neurons are organized into layers, forming a complex network capable of learning and making predictions.

***

 # <span style='color:Blue'>  ***Anatomy of a Neuron*** </span>


### ***Inputs (x1​,x2​,…,xn​):***
Neurons receive input signals from other neurons or from the external environment. Each input is associated with a weight, representing the strength of the connection.
### ***Weights (w1​,w2​,…,wn​):***
Weights are parameters that the neural network learns during training. They determine the influence of each input on the neuron’s output. Larger weights mean a stronger impact.
### ***Weighted Sum (Z):***
The weighted sum (Z) is calculated by taking the dot product of the inputs and weights, then adding a bias:

$$Z = \sum_{i=1}^{n} w_i x_i + b$$
### ***Bias (b):***
The bias is an additional parameter that allows the neuron to adjust its output independently of the inputs. It acts as an offset, providing flexibility to the neuron.
### ***Activation Function (f):***
The weighted sum is passed through an activation function, introducing non-linearity to the neuron. Activation functions enable neural networks to learn complex relationships and patterns. Common activation functions include:
- Linear activation function
- Step function
- Sigmoid
- Softmax activation function
- Rectified Linear Unit (ReLU)

  The output of the neuron is the result of applying the activation function to the weighted sum: a=f(Z)

![image.png](attachment:8cc6f505-26e2-45ec-afab-df38ad326fa9.png)
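The pieces described above (inputs, weights, bias, weighted sum, activation) can be put together in a few lines. The following is a minimal sketch, not part of the original notebook: the helper name `neuron_output` and the example values for `x`, `w`, and `b` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b, activation):
    """Compute a = f(Z), where Z = sum_i(w_i * x_i) + b."""
    z = np.dot(w, x) + b
    return activation(z)

# Hypothetical example values (not taken from the admission dataset)
x = np.array([1.0, 2.0, 3.0])      # inputs x1, x2, x3
w = np.array([0.5, -0.25, 0.1])    # weights w1, w2, w3
b = 0.2                            # bias

z = np.dot(w, x) + b               # weighted sum Z
a = neuron_output(x, w, b, sigmoid)  # neuron output a = f(Z)
print("Z =", z, "a =", a)
```

Swapping `sigmoid` for another function changes only the last step, which is why the activation is passed in as an argument here.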

***

 # <span style='color:Blue'>  ***Activation Function and its Types*** </span>
#   <span style='color:Red'>  ***Activation Function*** </span>
- An activation function is a mathematical function applied to the output of each neuron in the network.
- It determines whether a neuron should be activated or not, based on the weighted sum of its input.
- Essentially, it introduces non-linearity into the model, allowing the network to learn from the error and make adjustments, which is essential for learning complex patterns.

# <span style='color:Red'>  ***Common Activation Functions*** </span>
 

## ***Linear Activation***

The linear activation function returns the input directly without changing it in any way. In other words, the output of the linear activation function is the same as the input.

The linear activation function has a linear graph, hence the name.

![image.png](attachment:4acbbb3d-d65b-453a-b6ae-e3287beb4adb.png)


![image.png](attachment:0b43b1e1-ee1e-4b47-958b-82a82b28c24d.png)

***


## ***Step Activation Function***
The step activation function, or binary step function, has only two possible outputs, 0 and 1. It outputs 0 when the input is negative, and 1 otherwise. It takes its name from the shape of its graph:
![image.png](attachment:4ddc5029-9c02-4887-be18-c2b421bf39e5.png)


![image.png](attachment:c5bc26bf-f343-4d94-9e87-5058150f7608.png)
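As a small illustration of the two functions described so far, the sketch below implements the linear and step activations with NumPy. The function names `linear` and `step` and the sample inputs are assumptions made for this example.

```python
import numpy as np

def linear(z):
    """Linear (identity) activation: returns the input unchanged."""
    return z

def step(z):
    """Binary step: 0 for negative inputs, 1 otherwise."""
    return np.where(z < 0, 0, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # hypothetical sample inputs
print(linear(z))
print(step(z))
```

Note that an input of exactly 0 maps to 1 here, following the definition given above.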

***

## ***Sigmoid Activation***
The sigmoid activation function outputs a number between 0 and 1 based on its input. Since it bounds its output to this range, it is also called a squashing function.
The sigmoid function is graphed like this:![image.png](attachment:6edfe933-a3b2-4c47-992e-87484d3d8b19.png)

The graph shows that the larger the input, the closer the output is to 1, and the smaller the input, the closer the output is to 0.

Mathematically, the sigmoid function is represented by σ or ∅, and is defined as:
![image.png](attachment:0af1f476-5087-495f-9b0d-6773de8d7fee.png)


![image.png](attachment:228a36c8-4048-44d6-8490-c59bcf0ce911.png)

The sigmoid function is used where the required output is the probability that an input belongs to a class. The sigmoid activation is also called the logistic activation because it is used in logistic regression, a popular classification algorithm.
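The squashing behaviour can be checked numerically. This is a minimal sketch using the same formula as the `sigmoid` function defined later in this notebook; the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: 1 / (1 + e^(-z)), always strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])  # arbitrary sample inputs
out = sigmoid(z)
print(out)  # large negative inputs approach 0, large positive inputs approach 1
```

An input of 0 gives exactly 0.5, which is the usual decision threshold when the output is read as a probability.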


***



## ***ReLU Activation:***

The ReLU (Rectified Linear Unit) is one of the most widely used activation functions in neural networks. It returns the input directly if it is positive, and 0 otherwise. In other words, it returns the maximum of 0 and its input.
Here is what its graph looks like: ![image.png](attachment:ebbd9cf3-a914-4099-b36b-582537e6b91f.png)


![image.png](attachment:4cb4777a-71e4-4bd4-9485-a20d5f98f73f.png)
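The "maximum of 0 and its input" definition translates directly into code. A minimal sketch, with arbitrary sample inputs:

```python
import numpy as np

def relu(z):
    """ReLU: element-wise maximum of 0 and the input."""
    return np.maximum(0, z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # arbitrary sample inputs
print(relu(z))  # negative values become 0, positive values pass through
```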


***


## ***Softmax Activation***

The Softmax activation function is widely used in neural networks performing multi-class classification. It is closely related to sigmoid: sigmoid handles the binary case, while softmax generalizes it to more than two classes. The softmax function returns the relative probabilities of all the classes, which sum to 1.
The mathematical equation for the softmax activation function is
![image.png](attachment:e5e398b6-3b38-4d67-8dc9-c2ec8e3b1804.png)


- The input of the softmax activation function is a k-dimensional vector, where k is the number of classes. This vector contains the values from the neurons of the output layer.
- The output of this function is also a k-dimensional vector containing the probabilities for each class.
- The probabilities returned by the softmax activation function sum up to 1.

![image.png](attachment:3305a6b3-4951-4799-af6a-34934415d706.png)
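The properties of softmax listed above (a k-dimensional vector in, a k-dimensional vector of probabilities out, summing to 1) can be verified with a short sketch. The scores below are hypothetical output-layer values for k = 3 classes; subtracting the maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D vector of k scores."""
    shifted = z - np.max(z)      # stability: avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical output-layer values, k = 3
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score receives the largest probability, and the outputs always add up to 1 regardless of the scale of the inputs.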


 # <span style='color:Black'>  ***Choosing the Right Activation Function*** </span>

The choice of activation function depends on the specific application and the nature of the data:

- For binary classification, the sigmoid function is often used in the output layer.

- For multi-class classification, the softmax function is preferred in the output layer.

- ReLU and its variants are commonly used in hidden layers due to their computational efficiency and performance.

***

# ***What is the Perceptron Trick?***
The Perceptron Trick is a technique used to adjust the weights of a perceptron algorithm in a supervised learning scenario. It aims to minimize errors in classification tasks by updating the weights iteratively based on misclassified instances.

# ***How does the Perceptron Trick work?***
At its core, the Perceptron Trick involves adjusting the weights of a perceptron to reduce classification errors. Here’s a step-by-step breakdown of how it works:

1. Initialization: Begin by initializing the weights of the perceptron algorithm. These weights are essentially the parameters that the model learns to adjust during training.

2. Input Data: Feed the input data into the perceptron and compute the output using the current weights.

3. Error Calculation: Compare the predicted output with the actual output. If the prediction is incorrect (i.e., a misclassification occurs), proceed to the next step.

4. Weight Adjustment: Apply the Perceptron Trick to update the weights. This involves adding or subtracting a fraction of the input vector from the current weights, depending on whether the prediction was too high or too low.

5. Repeat: Continue iterating through the dataset, updating the weights after each misclassification, until convergence or a predefined number of iterations.
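The five steps above can be sketched on a tiny, linearly separable problem. This is an illustrative example only: the function names `predict` and `train_perceptron`, the AND-gate data, and the choice of `learning_rate=1.0` (which keeps the arithmetic exact) are assumptions, not part of the original notebook.

```python
import numpy as np

def predict(X, w, b):
    """Step activation on the weighted sum: 1 if w.x + b >= 0, else 0."""
    return (X @ w + b >= 0).astype(int)

def train_perceptron(X, y, learning_rate=1.0, max_epochs=20):
    w = np.zeros(X.shape[1])   # Step 1: initialize weights (and bias)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            y_hat = int(np.dot(w, x_i) + b >= 0)   # Step 2: compute the output
            error = y_i - y_hat                    # Step 3: +1 too low, -1 too high
            if error != 0:                         # Step 4: update on misclassification
                w += learning_rate * error * x_i
                b += learning_rate * error
                mistakes += 1
        if mistakes == 0:                          # Step 5: stop once nothing is misclassified
            break
    return w, b

# AND gate: output is 1 only when both inputs are 1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("predictions:", predict(X, w, b))
```

The update only fires on misclassified points, which is what distinguishes this rule from the gradient-descent training used in the regression network below.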

***


 # <span style='color:Red'>  ***Making a simple Neural Network by Applying Perceptron Trick on Regression Task*** </span>




### ***Importing Necessary Libraries*** 


In [282]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [283]:
df=pd.read_csv("Admission_Predict_Ver1.1.csv")

In [284]:
df.drop(columns=['Serial No.'],inplace=True)

In [285]:
df.head()

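| | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|
| 0 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |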


In [286]:
X = df.iloc[:,0:-1]
y = df.iloc[:,-1]

In [287]:
X

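| | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research |
|---|---|---|---|---|---|---|---|
| 0 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 |
| 1 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 |
| 2 | 316 | 104 | 3 | 3.0 | 3.5 | 8.00 | 1 |
| 3 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 |
| 4 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 332 | 108 | 5 | 4.5 | 4.0 | 9.02 | 1 |
| 496 | 337 | 117 | 5 | 5.0 | 5.0 | 9.87 | 1 |
| 497 | 330 | 120 | 5 | 4.5 | 5.0 | 9.56 | 1 |
| 498 | 312 | 103 | 4 | 4.0 | 5.0 | 8.43 | 0 |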


In [288]:
y

0      0.92
1      0.76
2      0.72
3      0.80
4      0.65
       ... 
495    0.87
496    0.96
497    0.93
498    0.73
499    0.84
Name: Chance of Admit , Length: 500, dtype: float64

In [289]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1)

### ***Scaling Train and test data***

In [290]:
scaler = MinMaxScaler()


In [291]:
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
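To make the scaling step concrete, the sketch below reproduces by hand what min-max scaling with the default 0 to 1 range computes: each feature is shifted and divided using the minimum and maximum seen in the training data only. The toy arrays `train` and `test` are hypothetical and are not rows from the admission dataset.

```python
import numpy as np

# Hypothetical toy data: two features per row
train = np.array([[300.0, 7.0],
                  [320.0, 8.0],
                  [340.0, 9.0]])
test = np.array([[310.0, 9.5]])

# "Fit": learn per-feature min and max from the training data only
col_min = train.min(axis=0)
col_max = train.max(axis=0)

def min_max_scale(data):
    """Apply (x - min) / (max - min) using the training statistics."""
    return (data - col_min) / (col_max - col_min)

train_scaled = min_max_scale(train)  # every training column spans exactly [0, 1]
test_scaled = min_max_scale(test)    # test values may fall outside [0, 1]
print(train_scaled)
print(test_scaled)
```

Reusing the training minimum and maximum on the test set, as `scaler.transform(X_test)` does above, keeps information about the test data from leaking into preprocessing.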

In [292]:
X.shape

(500, 7)

In [293]:
df.shape

(500, 8)

### ***Defining Activation Function***




In [294]:
# Sigmoid function used as the activation in the hidden layer
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid function, used in backpropagation
def sigmoid_derivative(x):
    sigmoid_x = sigmoid(x)
    return sigmoid_x * (1 - sigmoid_x)
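One way to gain confidence in `sigmoid_derivative` is to compare it against a numerical (central finite-difference) estimate of the slope. The sketch below redefines both functions so it runs on its own; the step size `h` and the test points are arbitrary choices made for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    sigmoid_x = sigmoid(x)
    return sigmoid_x * (1 - sigmoid_x)

x = np.linspace(-5, 5, 11)   # arbitrary test points from -5 to 5
h = 1e-5                     # small step for the finite difference

# Central difference: (f(x + h) - f(x - h)) / (2h)
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
analytic = sigmoid_derivative(x)

max_abs_diff = np.max(np.abs(numeric - analytic))
print("largest disagreement:", max_abs_diff)
```

The two estimates should agree to many decimal places, and the derivative peaks at 0.25 when the input is 0.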


### ***Initializing Parameters (weights and biases)***

In [295]:
def initialize_parameters(input_size, hidden_size, output_size):
    np.random.seed(42)  # For reproducibility
    W1 = np.random.randn(input_size, hidden_size) * 0.01  # Weights for the hidden layer
    b1 = np.zeros((1, hidden_size))  # Biases for the hidden layer
    W2 = np.random.randn(hidden_size, output_size) * 0.01  # Weights for the output layer
    b2 = np.zeros((1, output_size))  # Biases for the output layer
    return W1, b1, W2, b2



### ***Initializing Forward Propagation Function***

In [296]:
def forward_propagation(X, W1, b1, W2, b2):

    Z1 = np.dot(X, W1) + b1  # Linear combination for hidden layer
    A1 = sigmoid(Z1)         # Activation for hidden layer
    Z2 = np.dot(A1, W2) + b2  # Linear combination for output layer
    A2 = Z2                   # Output layer (no activation for regression)
    return Z1, A1, Z2, A2



### ***Defining loss Function (MSE) for model evaluation*** 

In [297]:
def compute_loss(y_true, y_pred):
    # Flatten both to 1-D so the subtraction is element-wise
    y_true = np.array(y_true).flatten()
    y_pred = np.array(y_pred).flatten()
    return np.mean((y_true - y_pred) ** 2)
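A small worked example shows what the loss computes and why both arrays are flattened first. The target and prediction values below are hypothetical; `y_pred` is written as a column vector because that is the shape the network's output `A2` has.

```python
import numpy as np

def compute_loss(y_true, y_pred):
    y_true = np.array(y_true).flatten()
    y_pred = np.array(y_pred).flatten()
    return np.mean((y_true - y_pred) ** 2)

y_true = [0.9, 0.7, 0.5]          # hypothetical targets, shape (3,)
y_pred = [[0.8], [0.7], [0.8]]    # hypothetical predictions, shape (3, 1)

# Errors are 0.1, 0.0, -0.3, so MSE = (0.01 + 0.0 + 0.09) / 3
loss = compute_loss(y_true, y_pred)
print("MSE:", loss)

# Without flattening, (3,) minus (3, 1) broadcasts to a (3, 3) matrix,
# which would silently average over the wrong pairs.
broadcast_shape = (np.array(y_true) - np.array(y_pred)).shape
print("shape without flatten:", broadcast_shape)
```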


### ***Defining Backward Propagation Function***

In [298]:
def backward_propagation(X, y, Z1, A1, Z2, A2, W1, W2, b1, b2, learning_rate):
    m = X.shape[0]  # Number of training samples

    y = np.array(y).reshape(-1, 1)  # Column vector, matching the shape of A2

    # Computing gradients
    # The output layer is linear (A2 = Z2), so this is also the gradient w.r.t. Z2.
    # The constant factor 2 from the MSE derivative is left out; it only rescales the learning rate.
    dA2 = A2 - y  # Gradient of loss w.r.t. output layer
    dW2 = (1/m) * np.dot(A1.T, dA2)  # Gradient of weights for output layer
    db2 = (1/m) * np.sum(dA2, axis=0, keepdims=True)  # Gradient of bias for output layer

    # Already multiplied by the sigmoid derivative, so this is the gradient w.r.t. Z1
    dA1 = np.dot(dA2, W2.T) * sigmoid_derivative(Z1)  # Gradient of loss w.r.t. hidden layer
    dW1 = (1/m) * np.dot(X.T, dA1)  # Gradient of weights for hidden layer
    db1 = (1/m) * np.sum(dA1, axis=0, keepdims=True)  # Gradient of bias for hidden layer

    # Update weights and biases (in place)
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

    return W1, b1, W2, b2
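The gradient expressions in `backward_propagation` can be read as the chain rule applied to a half-MSE loss. The factor of one half is an assumption made here so that the formulas match the code exactly; with the plain MSE from `compute_loss`, every gradient picks up an extra factor of 2, which has the same effect as doubling the learning rate. In the notation below, $m$ is the number of samples, $\sigma$ is the sigmoid, $\odot$ is element-wise multiplication, $\eta$ is the learning rate, and $\delta_2$ and $\delta_1$ correspond to the code variables `dA2` and `dA1`.

$$
\begin{aligned}
Z_1 &= X W_1 + b_1, \qquad A_1 = \sigma(Z_1), \qquad A_2 = Z_2 = A_1 W_2 + b_2 \\
L &= \frac{1}{2m} \sum_{i=1}^{m} \left( A_2^{(i)} - y^{(i)} \right)^2 \\
\delta_2 &= A_2 - y \\
\frac{\partial L}{\partial W_2} &= \frac{1}{m} A_1^{\top} \delta_2, \qquad
\frac{\partial L}{\partial b_2} = \frac{1}{m} \sum_{i=1}^{m} \delta_2^{(i)} \\
\delta_1 &= \left( \delta_2 W_2^{\top} \right) \odot \sigma'(Z_1), \qquad \sigma'(Z_1) = \sigma(Z_1)\left(1 - \sigma(Z_1)\right) \\
\frac{\partial L}{\partial W_1} &= \frac{1}{m} X^{\top} \delta_1, \qquad
\frac{\partial L}{\partial b_1} = \frac{1}{m} \sum_{i=1}^{m} \delta_1^{(i)} \\
W_k &\leftarrow W_k - \eta \frac{\partial L}{\partial W_k}, \qquad
b_k \leftarrow b_k - \eta \frac{\partial L}{\partial b_k}, \qquad k \in \{1, 2\}
\end{aligned}
$$

Because the output layer has no activation, no derivative term appears in $\delta_2$; the sigmoid derivative enters only when the error is propagated back to the hidden layer.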


### ***Training Neural Network using gradient descent***

In [303]:
# Trains the neural network by performing forward and backward propagation and updating parameters.

def train_neural_network(X, y, hidden_size, learning_rate, epochs):
    input_size = X.shape[1]  # Number of input features, taken from the data passed in
    output_size = 1  # Since this is a regression task with one output
    W1, b1, W2, b2 = initialize_parameters(input_size, hidden_size, output_size)

    for epoch in range(epochs):  # Loop over epochs
        Z1, A1, Z2, A2 = forward_propagation(X, W1, b1, W2, b2)
        loss = compute_loss(y, A2)
        W1, b1, W2, b2 = backward_propagation(X, y, Z1, A1, Z2, A2, W1, W2, b1, b2, learning_rate)
        
        if epoch % 100 == 0: 
            print(f'Epoch {epoch}, Loss: {loss}')
    
    return W1, b1, W2, b2  


### ***Setting Training Parameters*** 

In [304]:
# Set the hyperparameters and train the network on the scaled training data.

hidden_size = 20  # Number of neurons in hidden layer
learning_rate = 0.001  # Learning rate for gradient descent
epochs = 2000  # Number of epochs for training

W1, b1, W2, b2 = train_neural_network(X_train_scaled, y_train, hidden_size, learning_rate, epochs)


Epoch 0, Loss: 0.5333182815318668
Epoch 100, Loss: 0.17454057297012518
Epoch 200, Loss: 0.06649198651212439
Epoch 300, Loss: 0.0340004798429249
Epoch 400, Loss: 0.02424368215204824
Epoch 500, Loss: 0.021314894812207434
Epoch 600, Loss: 0.020433961615393584
Epoch 700, Loss: 0.020166659782175866
Epoch 800, Loss: 0.020083121715297157
Epoch 900, Loss: 0.020054594459580032
Epoch 1000, Loss: 0.020042531739696848
Epoch 1100, Loss: 0.020035397234988137
Epoch 1200, Loss: 0.020029739154944926
Epoch 1300, Loss: 0.02002452489630198
Epoch 1400, Loss: 0.020019445606374767
Epoch 1500, Loss: 0.020014408915774266
Epoch 1600, Loss: 0.020009387196435583
Epoch 1700, Loss: 0.020004372178397952
Epoch 1800, Loss: 0.019999361378899353
Epoch 1900, Loss: 0.0199943540449863


### ***Doing Predictions***

In [None]:
# Prediction on test data
_, _, _, y_test_pred = forward_propagation(X_test_scaled, W1, b1, W2, b2)


### ***Model Evaluation***

In [314]:

mse_test = mean_squared_error(y_test, y_test_pred) 
r2_test = r2_score(y_test, y_test_pred)


print('Test MSE:', mse_test)
print('Test R^2:', r2_test)



Test MSE: 0.01922613892756929
Test R^2: 0.004394442153731748


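A test R^2 close to 0 means the model is doing little better than predicting the mean of the target, even though the MSE looks small. The evaluation results can be improved by adding more hidden layers, applying hyperparameter tuning techniques, and increasing the number of epochs.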
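To help interpret the R^2 value reported above, the sketch below computes R^2 by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper name `r2_manual` and the toy targets are assumptions for illustration; they are not the notebook's test data.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of always predicting the mean
    return 1 - ss_res / ss_tot

y_true = np.array([0.5, 0.6, 0.7, 0.8, 0.9])         # hypothetical targets

# Baseline: predict the mean of the targets for every sample
mean_pred = np.full_like(y_true, y_true.mean())

# A model whose predictions are within 0.01 of each target
close_pred = y_true + np.array([0.01, -0.01, 0.0, 0.01, -0.01])

r2_mean = r2_manual(y_true, mean_pred)
r2_close = r2_manual(y_true, close_pred)
print("R^2 of mean predictor:", r2_mean)
print("R^2 of close predictor:", r2_close)
```

The mean predictor scores an R^2 of 0 by construction, so a test R^2 near 0 places a model at roughly that baseline, while values near 1 indicate predictions that track the targets closely.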

***
