Question 1: What is Logistic Regression, and how does it differ from Linear
Regression?

Answer: Logistic Regression is a supervised machine learning algorithm used for classification tasks. It predicts the probability that an input belongs to a particular class, most often in a binary (0 or 1) setting. Linear Regression, by contrast, predicts a continuous numeric value from a linear function of the features. Logistic Regression passes that same linear combination through the sigmoid function, so its output is a probability between 0 and 1 rather than an unbounded number.
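
The contrast can be seen on a tiny example. The sketch below uses made-up toy data (six points, one feature), so the numbers are only illustrative. It fits both models and scores points outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (hypothetical): one feature, label flips from 0 to 1 between x=3 and x=4
X = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Score points well outside the training range
X_new = np.array([[-2.0], [3.5], [10.0]])

linear_preds = linear.predict(X_new)                   # unbounded real values
logistic_probs = logistic.predict_proba(X_new)[:, 1]   # P(y = 1), always in [0, 1]
logistic_labels = logistic.predict(X_new)              # hard class labels

print("Linear regression outputs:", linear_preds)
print("Logistic regression P(y=1):", logistic_probs)
print("Logistic regression classes:", logistic_labels)
```

The linear model's outputs fall below 0 and above 1 at the extreme inputs, while the logistic model's outputs stay within the probability range.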

Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer:
The sigmoid function, also called the logistic function, is σ(z) = 1 / (1 + e^(-z)). It maps the linear combination of features, z = w·x + b, to a value between 0 and 1 that is interpreted as the probability of the positive class. Its S-shaped curve approaches 0 for large negative z and 1 for large positive z, and the resulting probability is compared against a threshold (commonly 0.5) to assign a class.
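
A minimal sketch of this mapping in plain Python follows. The weights, bias, and feature values are hypothetical and chosen only to show the arithmetic. They do not come from a trained model.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# Hypothetical weights, bias, and feature vector (illustration only)
weights = [0.8, -0.5]
bias = 0.1
features = [2.0, 1.0]

z = sum(w * x for w, x in zip(weights, features)) + bias  # linear score
probability = sigmoid(z)                                   # P(y = 1)
predicted_class = int(probability >= 0.5)                  # 0.5 decision threshold

print(f"z = {z:.2f}, P(y=1) = {probability:.3f}, class = {predicted_class}")
```

A score of exactly 0 maps to a probability of 0.5, which is why a 0.5 threshold on the probability is equivalent to checking the sign of the linear score.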

Question 3: What is Regularization in Logistic Regression and why is it needed?

Answer:
Regularization prevents overfitting by adding a penalty on large coefficients to the loss function. In Logistic Regression it helps the model generalize to new, unseen data by discouraging complex solutions that fit noise in the training set. The two common penalties are L1 (Lasso), which can shrink some coefficients to exactly zero, and L2 (Ridge), which shrinks all coefficients toward zero without eliminating them.
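
The effect of the penalty strength can be sketched on the Iris dataset. This example assumes scikit-learn's default L2 penalty and varies only C, which scikit-learn defines as the inverse of the regularization strength. The specific C values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale

# In scikit-learn, C is the INVERSE of regularization strength:
# small C = strong penalty, large C = weak penalty. The default penalty is L2.
coef_norms = {}
for C in [0.01, 1.0, 100.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_scaled, y)
    coef_norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} coefficient norm = {coef_norms[C]:.3f}")
```

The overall size of the coefficients grows as C increases, that is, as the penalty weakens.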

Question 4: What are some common evaluation metrics for classification models, and
why are they important?

Answer: Common metrics are accuracy, precision, recall, F1-score, and ROC-AUC. They matter because each captures a different aspect of model performance. On imbalanced datasets accuracy can be misleading, since a model that always predicts the majority class still scores highly, so precision, recall, F1-score, or ROC-AUC are often more informative.
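
To make the point about accuracy concrete, here is a small sketch with made-up labels and scores (2 positives out of 20 samples). It compares a baseline that never predicts the positive class with a prediction obtained by thresholding the scores at 0.5.

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# Hypothetical labels and scores: 2 positives among 20 samples
y_true = [1, 1] + [0] * 18
scores = [0.9, 0.4, 0.6] + [0.1] * 17        # made-up predicted P(y = 1)
y_pred = [int(s >= 0.5) for s in scores]     # threshold at 0.5
always_negative = [0] * len(y_true)          # baseline that never predicts 1

for name, pred in [("always negative", always_negative), ("thresholded", y_pred)]:
    print(
        f"{name:16s}"
        f" accuracy={accuracy_score(y_true, pred):.2f}"
        f" precision={precision_score(y_true, pred, zero_division=0):.2f}"
        f" recall={recall_score(y_true, pred, zero_division=0):.2f}"
        f" f1={f1_score(y_true, pred, zero_division=0):.2f}"
    )

# ROC-AUC is computed from the scores, not from thresholded labels
auc = roc_auc_score(y_true, scores)
print(f"ROC-AUC={auc:.3f}")
```

Both predictions reach the same accuracy, but only recall and F1-score reveal that the baseline never finds a positive sample.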



In [1]:
#Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
#splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
#(Use Dataset from sklearn package)

#Answer:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


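# Load the Iris dataset as a DataFrame of features and a Series of labels
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# max_iter=200 gives the solver more iterations than the default of 100
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)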

# Accuracy
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)


Accuracy: 1.0


In [2]:
#Question 6: Write a Python program to train a Logistic Regression model using L2
#regularization (Ridge) and print the model coefficients and accuracy.
#(Use Dataset from sklearn package)

#Answer

model_l2 = LogisticRegression(penalty='l2', max_iter=200)
model_l2.fit(X_train, y_train)

print("Coefficients:", model_l2.coef_)
print("Accuracy:", model_l2.score(X_test, y_test))


Coefficients: [[-0.39345607  0.96251768 -2.37512436 -0.99874594]
 [ 0.50843279 -0.25482714 -0.21301129 -0.77574766]
 [-0.11497673 -0.70769055  2.58813565  1.7744936 ]]
Accuracy: 1.0


In [3]:
#Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
#hyperparameters for Logistic Regression and print the best parameters and validation
#accuracy.
#(Use Dataset from sklearn package)

from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}
grid = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Validation Accuracy:", grid.best_score_)



Best Parameters: {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
Validation Accuracy: 0.9583333333333334


In [4]:
#Question 9: Write a Python program to standardize the features before training Logistic
#Regression and compare the model's accuracy with and without scaling.
#(Use Dataset from sklearn package)

from sklearn.preprocessing import StandardScaler

# Without scaling
model_no_scaling = LogisticRegression(max_iter=200)
model_no_scaling.fit(X_train, y_train)
acc_no_scaling = model_no_scaling.score(X_test, y_test)

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaling = LogisticRegression(max_iter=200)
model_scaling.fit(X_train_scaled, y_train)
acc_scaling = model_scaling.score(X_test_scaled, y_test)

print("Accuracy without scaling:", acc_no_scaling)
print("Accuracy with scaling:", acc_scaling)



Accuracy without scaling: 1.0
Accuracy with scaling: 1.0




Question 10: Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model — including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.


Answer:
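For an imbalanced dataset (5% response rate):

Data Handling: Handle missing values and encode categorical features, then split the data with stratified sampling so that both the train and test sets keep the 5% response rate.

Feature Scaling: Standardize the features, fitting the scaler on the training set only, so that the solver converges faster and the regularization penalty treats all features on the same scale.

Balancing Classes: Resample the training data only (oversample the minority class, undersample the majority class, or generate synthetic minority samples with SMOTE), or set class_weight='balanced' so that errors on responders cost more.

Hyperparameter Tuning: Tune C and the penalty type with GridSearchCV, using stratified folds and a scoring metric such as recall or F1-score rather than accuracy.

Evaluation: Report precision, recall, F1-score, and ROC-AUC on the held-out test set, since a model that predicts "no response" for every customer would already reach 95% accuracy.

Together these steps give a model whose results can be trusted when deciding which customers to target in a real-world e-commerce campaign.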
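
The approach above can be sketched end to end as follows. This is an illustration under stated assumptions, not a production recipe. The data is synthetic (generated with make_classification as a stand-in for real campaign data), class_weight='balanced' is used in place of resampling or SMOTE, and the C grid is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: roughly 5% positives (responders)
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# Stratify so train and test keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Scaling lives inside the pipeline, so it is refit on each CV training fold
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune C and select the best model on F1 rather than accuracy
grid = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1, 10]},
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X_train, y_train)

# Evaluate on the held-out test set with imbalance-aware metrics
y_pred = grid.predict(X_test)
y_prob = grid.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_prob)

print("Best parameters:", grid.best_params_)
print(classification_report(y_test, y_pred, zero_division=0))
print("ROC-AUC:", test_auc)
```

Putting the scaler inside the pipeline keeps test-fold statistics out of training during cross-validation. In a real project the decision threshold could also be adjusted to match the cost of contacting a customer versus missing a responder.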

