Q1: What is Logistic Regression, and how does it differ from Linear Regression?

Answer:

Logistic Regression is a supervised machine learning algorithm used for classification problems, where the target variable is categorical (e.g., Yes/No, 0/1).

Instead of predicting a continuous value like Linear Regression, Logistic Regression predicts the probability that an instance belongs to a certain class.

It uses the sigmoid (logistic) function to map linear combinations of features to probabilities between 0 and 1.

Key differences:

Linear Regression: Used for regression tasks (continuous outputs). Output = any real number.

Logistic Regression: Used for classification tasks (categorical outputs). Output = probability (0–1).
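
To make the difference in outputs concrete, here is a minimal sketch on a made-up one-feature dataset (the data and the query points are illustrative assumptions, not part of the assignment). It fits both models to the same 0/1 labels and compares what each returns for points outside the training range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: one feature, label is 1 when the feature is 5 or more
X = np.arange(10).reshape(-1, 1)
y = (X.ravel() >= 5).astype(int)

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

# Query points outside the training range
X_new = np.array([[-5], [15]])

lin_pred = lin.predict(X_new)              # unbounded real numbers
log_prob = log.predict_proba(X_new)[:, 1]  # probability of class 1, always within [0, 1]

print("Linear regression output:", lin_pred)
print("Logistic regression P(y=1):", log_prob)
```

The linear model extrapolates below 0 and above 1, so its output cannot be read as a probability, while the logistic model stays inside the unit interval.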

Q2: Explain the role of the Sigmoid function in Logistic Regression.

Answer:

The sigmoid function converts the model's linear score z = w·x + b (a weighted sum of the features plus an intercept) into a probability between 0 and 1.

Formula:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$


With the default threshold of 0.5: if probability > 0.5 → class 1, otherwise class 0.

It ensures predictions are interpretable as probabilities.
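
As a small illustration of the formula and the 0.5 threshold, the sketch below implements the sigmoid with NumPy. The function name `sigmoid` and the example scores are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Example linear scores z = w·x + b for three instances
z = np.array([-4.0, 0.0, 4.0])

probs = sigmoid(z)
labels = (probs > 0.5).astype(int)  # strict > 0.5, as in the rule above

print("Probabilities:", probs)
print("Predicted classes:", labels)
```

A score of 0 maps to a probability of exactly 0.5, large negative scores approach 0, and large positive scores approach 1.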

Q3: What is Regularization in Logistic Regression and why is it needed?

Answer:

Regularization is a technique used to prevent overfitting by penalizing large coefficients in the model.

Two common types:

L1 (Lasso): Shrinks some coefficients to zero (feature selection).

L2 (Ridge): Shrinks coefficients smoothly but keeps all features.

Regularization is needed because logistic regression can overfit when there are many features relative to the number of samples, or when features are highly correlated.
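
The difference between the two penalties can be seen by counting coefficients that end up exactly zero. The sketch below is one way to check this, assuming the breast cancer dataset from sklearn, the `liblinear` solver (which accepts both penalties for a binary target), and an arbitrary strength of C=0.1. The scaler is fit on the full data only because the goal here is to inspect coefficients, not to evaluate the model.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # penalties are sensitive to feature scale

# Same strength (C=0.1, an arbitrary choice) for both penalties
l1_model = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X_scaled, y)
l2_model = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X_scaled, y)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))

print("Coefficients set to zero with L1:", n_zero_l1)
print("Coefficients set to zero with L2:", n_zero_l2)
```

In scikit-learn, C is the inverse of the regularization strength, so smaller values of C shrink the coefficients more.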

Q4: What are some common evaluation metrics for classification models, and
why are they important?

Answer:

Accuracy: Ratio of correctly predicted observations to total observations.

Precision: Out of predicted positives, how many are truly positive.

Recall (Sensitivity): Out of actual positives, how many were predicted correctly.

F1-Score: Harmonic mean of precision and recall, useful for imbalanced datasets.

ROC-AUC: Measures model’s ability to distinguish between classes across thresholds.

These metrics are important because accuracy alone can be misleading, especially in imbalanced datasets.
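
A short sketch of why accuracy alone can mislead, using hand-made labels and scores (all values below are illustrative assumptions). Ten instances contain two positives; a model thresholded at 0.5 is compared with a baseline that always predicts the negative class.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# 10 instances, 2 of them positive (hand-made example data)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.6, 0.9, 0.45])
y_pred = (y_score > 0.5).astype(int)  # one TP, one FP, one FN, seven TN
y_all_neg = np.zeros_like(y_true)     # baseline: always predict class 0

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses the scores, not the thresholded labels

baseline_acc = accuracy_score(y_true, y_all_neg)
baseline_rec = recall_score(y_true, y_all_neg)

print(f"Model:    accuracy={acc}, precision={prec}, recall={rec}, f1={f1}, roc_auc={auc}")
print(f"Baseline: accuracy={baseline_acc}, recall={baseline_rec}")
```

Both the model and the always-negative baseline reach the same accuracy, but the baseline finds none of the positives, which only recall (and the metrics built on it) reveals.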

Q5: Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use Dataset from sklearn package)

Answer:

In [21]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings('ignore')

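# Dataset: a built-in sklearn dataset stands in for the CSV file
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
# Build the DataFrame (for a real CSV, pd.read_csv would return the same kind of object)
df = pd.DataFrame(data=data.data, columns=data.feature_names)
df['target'] = data.target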

In [22]:
# Separate features and target
X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# Train the model; max_iter is raised so lbfgs can converge on the unscaled features
log_reg_model = LogisticRegression(max_iter=5000)
log_reg_model.fit(X_train, y_train)

y_pred = log_reg_model.predict(X_test)

print(f'The accuracy of the model is : {accuracy_score(y_test,y_pred)}')

The accuracy of the model is : 0.956140350877193


Q6: Write a Python program to train a Logistic Regression model using L2
regularization (Ridge) and print the model coefficients and accuracy.
(Use Dataset from sklearn package)

Answer:

In [23]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import warnings
warnings.filterwarnings('ignore')

#dataset
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
# dataframe formation
df = pd.DataFrame(data = data.data,columns=data.feature_names)
df['target']= data.target
# split data
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# train the model with an L2 (Ridge) penalty; C is the inverse of the regularization strength
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=5000)
model.fit(X_train, y_train)

print("Model Coefficients:", model.coef_)
print("\nAccuracy:", model.score(X_test, y_test))


Model Coefficients: [[ 1.0274368   0.22145051 -0.36213488  0.0254667  -0.15623532 -0.23771256
  -0.53255786 -0.28369224 -0.22668189 -0.03649446 -0.09710208  1.3705667
  -0.18140942 -0.08719575 -0.02245523  0.04736092 -0.04294784 -0.03240188
  -0.03473732  0.01160522  0.11165329 -0.50887722 -0.01555395 -0.016857
  -0.30773117 -0.77270908 -1.42859535 -0.51092923 -0.74689363 -0.10094404]]

Accuracy: 0.956140350877193


Q7: Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)

Answer:

In [24]:
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# OVR Logistic Regression
model = LogisticRegression(multi_class='ovr', max_iter=5000)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

print(f'The classification report is : \n {classification_report(y_test, y_pred)}')


The classification report is : 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.85      0.92        13
           2       0.87      1.00      0.93        13

    accuracy                           0.96        45
   macro avg       0.96      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
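
The `multi_class` argument used above is deprecated in scikit-learn 1.5 and later. A possible alternative that keeps the one-vs-rest strategy is to wrap the estimator in `OneVsRestClassifier`. The sketch below assumes the same Iris split as the answer; the variable name `ovr_model` is only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Same dataset and split as in the answer above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# One binary logistic regression is trained per class
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=5000))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
ovr_accuracy = accuracy_score(y_test, y_pred)

print(classification_report(y_test, y_pred))
```

After fitting, `ovr_model.estimators_` holds one fitted binary classifier per class, which makes the one-vs-rest structure explicit.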



Q8: Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)

Answer:

In [25]:
from sklearn.model_selection import GridSearchCV

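# Candidate values; this cell reuses the Iris X_train/y_train split from the previous cell
params = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2']
}
# The default lbfgs solver does not support the l1 penalty, so use saga, which handles both
log_reg = LogisticRegression(solver='saga', max_iter=5000)
grid = GridSearchCV(estimator=log_reg, param_grid=params, cv=5, verbose=2, scoring='accuracy')
grid.fit(X_train, y_train)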

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)


Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV] END .................................C=0.01, penalty=l1; total time=   0.0s
[CV] END .................................C=0.01, penalty=l1; total time=   0.0s
[CV] END .................................C=0.01, penalty=l1; total time=   0.0s
[CV] END .................................C=0.01, penalty=l1; total time=   0.0s
[CV] END .................................C=0.01, penalty=l1; total time=   0.0s
[CV] END .................................C=0.01, penalty=l2; total time=   0.0s
[CV] END .................................C=0.01, penalty=l2; total time=   0.0s
[CV] END .................................C=0.01, penalty=l2; total time=   0.0s
[CV] END .................................C=0.01, penalty=l2; total time=   0.0s
[CV] END .................................C=0.01, penalty=l2; total time=   0.0s
[CV] END ..................................C=0.1, penalty=l1; total time=   0.0s
[CV] END ..................................C=0.1,

Q9: Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
(Use Dataset from sklearn package)

Answer:

In [26]:
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Without scaling
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Accuracy without scaling:", model.score(X_test, y_test))

# With scaling: fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=5000)
model_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", model_scaled.score(X_test_scaled, y_test))


Accuracy without scaling: 1.0
Accuracy with scaling: 1.0

Both models reach the same test accuracy here. The four Iris features are all measured in centimetres and have similar ranges, so standardization changes little. Scaling usually matters more when feature ranges differ widely.


In [27]:
X_train.shape

(105, 4)

Q10: Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model — including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.

Answer:

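Steps for building a Logistic Regression model on this imbalanced dataset:
1. Data handling: Clean missing values, encode categorical variables, and use a stratified train/test split so both sets keep the 5% response rate.

2. Feature scaling: Standardize/normalize features, fitting the scaler on the training data only.

3. Balancing classes: Use techniques like SMOTE (oversampling, applied to the training data only) or class weights in Logistic Regression (class_weight='balanced').

4. Hyperparameter tuning: Use GridSearchCV with stratified folds to tune C, penalty, and solver.

5. Evaluation metrics: Use Precision, Recall, F1-score, and ROC-AUC rather than accuracy alone; with 5% responders, a model that predicts "no response" for every customer is already 95% accurate.

6. Business perspective: Prioritize recall (catching responders), because missing a potential customer is more costly than wrongly targeting a non-responder. The decision threshold can be lowered below 0.5 to trade precision for recall.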
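
The steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a production recipe: the campaign data is replaced by a synthetic dataset from `make_classification` with roughly 5% positives, class weights are used instead of SMOTE, and recall is chosen as the tuning metric to match step 6. The grid values and step names (`scale`, `clf`) are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: roughly 5% responders (class 1)
X, y = make_classification(n_samples=4000, n_features=20, n_informative=6,
                           n_clusters_per_class=1, weights=[0.95, 0.05],
                           random_state=42)

# Stratify so the train and test sets keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline, so it is refit on each training fold only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear")),
])

# liblinear supports both l1 and l2 for a binary target
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="recall")
grid.fit(X_train, y_train)

# Evaluate on the held-out test set with imbalance-aware metrics
y_pred = grid.predict(X_test)
y_prob = grid.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_prob)

print("Best parameters:", grid.best_params_)
print("Test ROC-AUC:", test_auc)
print(classification_report(y_test, y_pred))
```

Keeping the scaler inside the pipeline means GridSearchCV never sees validation-fold statistics during fitting. If SMOTE were preferred over class weights, it would need the same treatment: resample inside each training fold, never the validation or test data.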