<hr>

## <b>Normal and Accelerated Gradient Descent in Logistic Regression </b>

<hr>


### Group - A9

### Team Members

- Diya Deepak (CB.SC.U4AIE23020)
- Ghanasree S (CB.SC.U4AIE23028)
- Neha Jacob (CB.SC.U4AIE23046)
- Sriranjana C (CB.SC.U4AIE23066)


### <b> OBJECTIVE </b>
- Take a machine learning problem (logistic regression)
- Optimize its cost function using gradient descent, with and without momentum

In [6]:
# Importing Necessary Libraries 
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

### <b> Problem Statement </b>

Predict student admissions based on their performance in the JEE and Amrita exams using logistic regression.

<hr>

- Loading the dataset 
- Separating the Features and Target variables

In [7]:
# Load the dataset
admission_data = pd.read_csv('Logistic_Regression_Dataset_with_Amrita_and_JEE_Scores.csv')
X = admission_data[['Amrita_Score', 'JEE_Score']].values     # Features / Independent variable
y = admission_data['Admitted'].values    # Target Variable

- Standardizing the feature values (zero mean, unit variance) so that gradient descent converges more smoothly
- Splitting the training and test sets

In [8]:
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

# Split the normalized data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)
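Fitting the scaler on the full dataset before splitting lets test-set statistics influence the training features. A minimal sketch of one alternative follows, assuming a small synthetic array in place of the admissions CSV. The data and the helper name `standardize_with_train_stats` are hypothetical. It computes the mean and standard deviation on the training rows only and reuses them for the test rows.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Scale both splits using statistics computed from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)      # population std (ddof=0), the convention StandardScaler uses
    std[std == 0] = 1.0            # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# Hypothetical scores, not taken from the admissions dataset
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 100, size=(10, 2))
X_tr, X_te = X_demo[:8], X_demo[8:]

X_tr_scaled, X_te_scaled = standardize_with_train_stats(X_tr, X_te)
print(X_tr_scaled.mean(axis=0))  # approximately 0 for each column
print(X_tr_scaled.std(axis=0))   # approximately 1 for each column
```

The test rows are generally not exactly zero-mean after this transform, which is expected: they are scaled as unseen data would be.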

### <b> LOGISTIC REGRESSION </b>

Logistic regression is a statistical method for binary classification problems, where the goal is to predict one of two possible outcomes.

- The model outputs a probability, which is mapped to one of two classes using a threshold (commonly 0.5)
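As a small illustration of the thresholding step, the sketch below maps a few hypothetical probabilities to class labels. The values are made up for the example, and the strict `>` comparison is an assumption chosen to match the prediction code used later in this notebook.

```python
import numpy as np

# Hypothetical predicted probabilities for four students
probabilities = np.array([0.10, 0.50, 0.51, 0.93])
threshold = 0.5

# Strictly greater than the threshold maps to class 1, otherwise class 0
labels = (probabilities > threshold).astype(int)
print(labels)  # [0 0 1 1]
```

A probability exactly equal to the threshold falls into class 0 under this convention.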



#### Sigmoid function 

- Maps any real-valued input into a range between 0 and 1
- Used in logistic regression to convert raw predictions into probabilities


$$
\sigma(z) = \frac{1}{1 + e^{-z}}
$$


In [9]:
# Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
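For very negative inputs, `np.exp(-z)` can overflow and trigger a runtime warning even though the final value is still correct. The sketch below shows one common way to avoid that by choosing the formula based on the sign of `z`. The name `stable_sigmoid` is hypothetical and the function assumes array input; it is an optional variant, not a replacement required by the rest of the notebook.

```python
import numpy as np

def stable_sigmoid(z):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)
    pos = z >= 0
    # For z >= 0, exp(-z) lies in (0, 1], so it cannot overflow
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    # For z < 0, rewrite sigma(z) as exp(z) / (1 + exp(z))
    exp_z = np.exp(z[~pos])
    out[~pos] = exp_z / (1.0 + exp_z)
    return out

values = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(values)  # close to 0, exactly 0.5, close to 1
```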

Logistic Regression with Gradient Descent (without momentum)

#### Cost function for Logistic Regression

$$
J(w, b) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]
$$

Where:
- $m$ is the number of training examples.
- $h_\theta(x^{(i)}) = \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$, with $z^{(i)} = w^\top x^{(i)} + b$, is the sigmoid function applied to the linear model; $\theta$ stands for the parameters $(w, b)$.
- $y^{(i)}$ is the true label for the $i$-th example.

In [10]:
# Defining the cost function 
def compute_cost(y_true, y_predicted):
    m = len(y_true)
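    # Clip predictions away from exactly 0 and 1 so np.log never receives 0
    y_predicted = np.clip(y_predicted, 1e-15, 1 - 1e-15)
    cost = - (1 / m) * np.sum(y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted))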
    return cost
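A quick sanity check on the cost: when every prediction is 0.5 (which is what zero-initialized weights and bias produce), the cost equals $\ln 2 \approx 0.6931$, the value printed at iteration 0 in the training logs. The sketch below reproduces that on hypothetical labels; `binary_cross_entropy` is an illustrative stand-in with the same formula as `compute_cost`.

```python
import numpy as np

def binary_cross_entropy(y_true, y_predicted):
    """Mean binary cross-entropy, same formula as the cost function above."""
    m = len(y_true)
    return -(1 / m) * np.sum(
        y_true * np.log(y_predicted) + (1 - y_true) * np.log(1 - y_predicted)
    )

y_demo = np.array([1, 0, 1, 1])                  # hypothetical labels
uninformed = np.full(4, 0.5)                     # what zero weights would predict
confident = np.array([0.9, 0.1, 0.8, 0.95])      # hypothetical good predictions

cost_uninformed = binary_cross_entropy(y_demo, uninformed)
cost_confident = binary_cross_entropy(y_demo, confident)
print(cost_uninformed)  # ln(2), about 0.6931
print(cost_confident)   # smaller, since the predictions agree with the labels
```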

In [11]:
# Logistic regression model with gradient descent 
def logistic_regression_gradient_descent(X, y, learning_rate=0.1, iters=1001):
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0

    for i in range(iters):
        linear_model = np.dot(X, weights) + bias
        y_predicted = sigmoid(linear_model)

        # Compute gradients of the cost with respect to weights and bias
        dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
        db = (1 / n_samples) * np.sum(y_predicted - y)

        # Update weights and bias
        weights -= learning_rate * dw
        bias -= learning_rate * db

        # Calculate and print the cost every 200 iterations
        if i % 200 == 0:
            cost = compute_cost(y, y_predicted)
            print(f"Iteration {i}: Cost = {cost}")
    
    return weights, bias



### Gradient Descent without momentum

$$
h_\theta(x) = \sigma(z) = \frac{1}{1 + e^{-z}}, \quad z = w^\top x + b
$$

The gradients for logistic regression are defined as:

- **Weight Gradient**:

$$
\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
$$

- **Bias Gradient**:

$$
\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right)
$$



The parameters are updated iteratively using the following equations:

$$
w_j = w_j - \alpha \frac{\partial J}{\partial w_j}, \quad b = b - \alpha \frac{\partial J}{\partial b}
$$

Where:
- $\alpha$ is the learning rate.
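One way to build confidence in the gradient formulas is to compare them against central finite differences of the cost. The sketch below does this on a small synthetic problem. The data, the starting parameters, and the helper names `cost` and `analytic_gradients` are all hypothetical and chosen only for the check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(w, b, X, y):
    """Mean binary cross-entropy for parameters (w, b)."""
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_gradients(w, b, X, y):
    """Gradients from the formulas above: mean of (h - y) * x_j and mean of (h - y)."""
    error = sigmoid(X @ w + b) - y
    return X.T @ error / len(y), np.mean(error)

# Hypothetical data and parameters
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = (rng.uniform(size=20) > 0.5).astype(float)
w_demo, b_demo = np.array([0.3, -0.2]), 0.1

dw, db = analytic_gradients(w_demo, b_demo, X_demo, y_demo)

# Central finite differences, one parameter at a time
eps = 1e-6
dw_numeric = np.zeros_like(w_demo)
for j in range(len(w_demo)):
    step = np.zeros_like(w_demo)
    step[j] = eps
    dw_numeric[j] = (cost(w_demo + step, b_demo, X_demo, y_demo)
                     - cost(w_demo - step, b_demo, X_demo, y_demo)) / (2 * eps)
db_numeric = (cost(w_demo, b_demo + eps, X_demo, y_demo)
              - cost(w_demo, b_demo - eps, X_demo, y_demo)) / (2 * eps)

print(np.max(np.abs(dw - dw_numeric)), abs(db - db_numeric))  # both should be tiny
```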

In [12]:
# Train the logistic regression model using gradient descent without momentum
weights, bias = logistic_regression_gradient_descent(X_train, y_train, learning_rate=0.2, iters=4001)

# Define prediction function
def predict(X, weights, bias, threshold=0.5):
    linear_model = np.dot(X, weights) + bias
    y_predicted = sigmoid(linear_model)
    return [1 if i > threshold else 0 for i in y_predicted]

# Predict on the test set
y_pred = predict(X_test, weights, bias)

# Calculate accuracy
accuracy = np.mean(y_pred == y_test)
print()
print("Model Accuracy without Momentum:", accuracy)


Iteration 0: Cost = 0.6931471805599452
Iteration 200: Cost = 0.23834728681623155
Iteration 400: Cost = 0.2224416408276094
Iteration 600: Cost = 0.21764164622079876
Iteration 800: Cost = 0.21565674681567007
Iteration 1000: Cost = 0.21471244751371274
Iteration 1200: Cost = 0.21422645211080185
Iteration 1400: Cost = 0.21396367328961816
Iteration 1600: Cost = 0.21381680759767505
Iteration 1800: Cost = 0.2137328025259781
Iteration 2000: Cost = 0.2136839447621258
Iteration 2200: Cost = 0.21365517821428845
Iteration 2400: Cost = 0.2136380852375762
Iteration 2600: Cost = 0.21362785822758681
Iteration 2800: Cost = 0.21362170697034435
Iteration 3000: Cost = 0.21361799222737166
Iteration 3200: Cost = 0.213615741927752
Iteration 3400: Cost = 0.21361437548228146
Iteration 3600: Cost = 0.21361354419659193
Iteration 3800: Cost = 0.213613037749399
Iteration 4000: Cost = 0.21361272885870017

Model Accuracy without Momentum: 0.915


In [34]:
# Function to predict admission based on user input
def predict_admission():
    # Get user input
    jee_score = float(input("Enter your JEE score: "))
    exam_score = float(input("Enter your Amrita exam score: "))

    # Check if both scores are less than 100
    if jee_score >= 100 or exam_score >= 100:
        print("Error: Both scores must be less than 100!")
        return
    
    # Print user input
    print(f"User Input - JEE Score: {jee_score}, Amrita Exam Score: {exam_score}")
    
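    # Normalize the user input using the same scaler.
    # The scaler was fitted on columns in the order [Amrita_Score, JEE_Score],
    # so the input must follow that order.
    user_input = np.array([[exam_score, jee_score]])
    user_input_normalized = scaler.transform(user_input)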
    
    # Make prediction
    admission_prediction = predict(user_input_normalized, weights, bias)
    
    # Print result
    if admission_prediction[0] == 1:
        print("Yayy..!! Admission is likely!")
    else:
        print("Oops..!! Admission is not likely.")
    
# Call the function to predict admission based on user input
predict_admission()

User Input - JEE Score: 23.0, Amrita Exam Score: 45.0
Oops..!! Admission is not likely.


<b> <hr> </b>
### <b> Gradient Descent with Momentum </b>

In [13]:
# Load the dataset
admission_data = pd.read_csv('Logistic_Regression_Dataset_with_Amrita_and_JEE_Scores.csv')
X = admission_data[['Amrita_Score', 'JEE_Score']].values     # Features / Independent variable
y = admission_data['Admitted'].values    # Target Variable

In [14]:
# Normalize the feature values
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)

# Split the normalized data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_normalized, y, test_size=0.2, random_state=42)


In [15]:
# Define the sigmoid function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


### Gradient Descent with Momentum

- In gradient descent with momentum, a velocity term accumulates past gradients, which helps accelerate convergence and smooth out oscillations in the optimization path.
- The update rules for the weights and bias use this velocity, an exponential moving average of the gradients, in place of the raw gradient.

- The momentum parameter is denoted by $\beta$, where $0 \leq \beta < 1$.
- This parameter controls how much of the previous velocity is retained at each step; $\beta = 0$ recovers plain gradient descent.

#### Velocity Update Rules

- **Velocity Update (for weights):**

$$
v_j = \beta v_j + (1 - \beta) \frac{\partial J}{\partial w_j}
$$

- **Velocity Update (for bias):**

$$
v_b = \beta v_b + (1 - \beta) \frac{\partial J}{\partial b}
$$

Where:
- $v_j$ is the velocity for weight $w_j$,
- $v_b$ is the velocity for bias $b$,
- $\beta$ is the momentum parameter.

#### Parameter Update Rules

- **Weight Update:**

$$
w_j = w_j - \alpha v_j
$$

- **Bias Update:**

$$
b = b - \alpha v_b
$$

Where:
- $\alpha$ is the learning rate.
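To see what the velocity update does on its own, the sketch below feeds it a constant gradient. The gradient value of 1.0 is a hypothetical input chosen for illustration. Starting from zero, the velocity after $t$ steps is $(1 - \beta^t)$ times the gradient, so it ramps up gradually and approaches the gradient itself.

```python
beta = 0.9
gradient = 1.0      # hypothetical constant gradient
velocity = 0.0      # velocities start at zero, as in the training code

history = []
for step in range(1, 51):
    # Exponential moving average of the gradient
    velocity = beta * velocity + (1 - beta) * gradient
    history.append(velocity)

# Compare with the closed form v_t = (1 - beta**t) * gradient
for t in (1, 10, 50):
    print(t, round(history[t - 1], 4), round((1 - beta ** t) * gradient, 4))
# velocities are roughly 0.1, 0.651 and 0.995 at steps 1, 10 and 50
```

With $\beta = 0.9$ the velocity needs a few dozen steps to catch up with the gradient, which is why early updates are smaller than in plain gradient descent at the same learning rate.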


In [14]:
# Logistic regression model with gradient descent and momentum
def logistic_regression_gradient_descent_with_momentum(X, y, learning_rate=0.2, iters=1001, beta=0.9):
    n_samples, n_features = X.shape
    weights = np.zeros(n_features)
    bias = 0

    # Initialize momentum terms
    V_dw = np.zeros(n_features)
    V_db = 0

    for i in range(iters):
        linear_model = np.dot(X, weights) + bias
        y_predicted = sigmoid(linear_model)

        # Compute gradients
        dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
        db = (1 / n_samples) * np.sum(y_predicted - y)

        # Update momentum terms (exponential moving average of the gradients)
        V_dw = beta * V_dw + (1 - beta) * dw
        V_db = beta * V_db + (1 - beta) * db

        # Update weights and bias with momentum
        weights -= learning_rate * V_dw
        bias -= learning_rate * V_db

        # Calculate and print the cost every 200 iterations
        if i % 200 == 0:
            cost = compute_cost(y, y_predicted)
            print(f"Iteration {i}: Cost = {cost}")

    return weights, bias


In [15]:
# Train the logistic regression model using gradient descent with momentum
weights, bias = logistic_regression_gradient_descent_with_momentum(X_train, y_train, learning_rate=0.2, iters=4001, beta=0.9)

# Define prediction function
def predict(X, weights, bias, threshold=0.5):
    linear_model = np.dot(X, weights) + bias
    y_predicted = sigmoid(linear_model)
    return [1 if i > threshold else 0 for i in y_predicted]

# Predict on the test set
y_pred = predict(X_test, weights, bias)

# Calculate accuracy
accuracy = np.mean(y_pred == y_test)
print()
print("Model Accuracy with Momentum:", accuracy)

Iteration 0: Cost = 0.6931471805599452
Iteration 200: Cost = 0.23570849354744824
Iteration 400: Cost = 0.2215539090463947
Iteration 600: Cost = 0.21724239138429027
Iteration 800: Cost = 0.2154525316791497
Iteration 1000: Cost = 0.2146003421555841
Iteration 1200: Cost = 0.21416218038023158
Iteration 1400: Cost = 0.21392576234997848
Iteration 1600: Cost = 0.21379400807644758
Iteration 1800: Cost = 0.2137189041605106
Iteration 2000: Cost = 0.21367539089311627
Iteration 2200: Cost = 0.2136498776191374
Iteration 2400: Cost = 0.21363478462871094
Iteration 2600: Cost = 0.21362579594078962
Iteration 2800: Cost = 0.21362041535702894
Iteration 3000: Cost = 0.21361718200209553
Iteration 3200: Cost = 0.2136152331630869
Iteration 3400: Cost = 0.21361405582966178
Iteration 3600: Cost = 0.21361334331038634
Iteration 3800: Cost = 0.2136129115000854
Iteration 4000: Cost = 0.21361264952843087

Model Accuracy with Momentum: 0.915
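The runs with and without momentum end at almost the same cost and the same accuracy. One plausible contributor is the $(1 - \beta)$ factor in the velocity update: with a steady gradient the velocity settles at the gradient itself, so the long-run step size matches plain gradient descent. The classical form $v = \beta v + \nabla J$ (not used in this notebook) instead settles at $\nabla J / (1 - \beta)$. The sketch below compares the three updates on a hypothetical ill-conditioned quadratic, not on the admissions data, using the same `lr=0.2` and `beta=0.9`.

```python
import numpy as np

curvature = np.array([1.0, 0.01])   # hypothetical quadratic: steep in one direction, flat in the other

def loss(x):
    return 0.5 * np.sum(curvature * x ** 2)

def grad(x):
    return curvature * x

def run(update, steps=200, lr=0.2, beta=0.9):
    """Minimize the quadratic from a fixed start and return the final loss."""
    x = np.array([1.0, 1.0])
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        if update == "plain":
            v = g                               # no momentum
        elif update == "ema":
            v = beta * v + (1 - beta) * g       # form used in this notebook
        elif update == "classical":
            v = beta * v + g                    # classical (heavy-ball) form
        x = x - lr * v
    return loss(x)

loss_plain = run("plain")
loss_ema = run("ema")
loss_classical = run("classical")
print(loss_plain, loss_ema, loss_classical)
```

On this toy problem the plain and moving-average runs finish close together, while the classical form gets much lower in the same number of steps. Whether the same holds for the admissions model would need to be checked by running it.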


In [20]:
# Define prediction function with user input and validation
def predict_admission_with_momentum():
    # Get user input
    
    jee_score = float(input("Enter your JEE score: "))
    exam_score = float(input("Enter your Amrita exam score: "))

    # Check if both scores are less than 100
    if jee_score >= 100 or exam_score >= 100:
        print("Error: Both scores must be less than 100!")
        return
    
    # Print user input
    print(f"User Input - JEE Score: {jee_score}, Amrita Exam Score: {exam_score}")
    
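    # Normalize the user input using the same scaler.
    # The scaler was fitted on columns in the order [Amrita_Score, JEE_Score],
    # so the input must follow that order.
    user_input = np.array([[exam_score, jee_score]])
    user_input_normalized = scaler.transform(user_input)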
    
    # Make prediction
    admission_prediction = predict(user_input_normalized, weights, bias)
    
    # Print result
    if admission_prediction[0] == 1:
        print("Admission is likely!")
    else:
        print("Admission is not likely.")

# Call the function to predict admission based on user input
predict_admission_with_momentum()


User Input - JEE Score: 45.0, Amrita Exam Score: 56.0
Admission is not likely.
