Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

Answer:

Logistic Regression is a statistical method used for classification problems (e.g., predicting "yes/no" or "spam/not spam"). It models the probability of an outcome by passing a linear combination of the input features through the sigmoid (logistic) function, which maps any real number to a value between 0 and 1.

Difference from Linear Regression:

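Linear Regression predicts a continuous, unbounded value (e.g., house price) and is typically fit by minimizing squared error.

Logistic Regression predicts the probability of a categorical outcome, usually binary (0 or 1), and is fit by maximizing the likelihood (equivalently, minimizing log loss).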
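A minimal sketch of this difference on a made-up 1-D dataset (x = 0..9, label 1 when x ≥ 5; the data are purely illustrative). Linear Regression's predictions extrapolate outside [0, 1], while Logistic Regression's predicted probabilities stay inside that range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (illustrative only): x = 0..9, label 1 when x >= 5
X = np.arange(10).reshape(-1, 1)
y = (X.ravel() >= 5).astype(int)

lin_model = LinearRegression().fit(X, y)
logit_model = LogisticRegression().fit(X, y)

X_new = np.array([[-10], [20]])  # points outside the training range
lin_pred = lin_model.predict(X_new)               # unbounded real values
log_prob = logit_model.predict_proba(X_new)[:, 1]  # probability of class 1

print("Linear Regression:", lin_pred)              # approx. [-1.697, 2.848]
print("Logistic Regression P(y=1):", log_prob)     # both values inside [0, 1]
```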


Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer:

The Sigmoid function in Logistic Regression converts the linear combination of input features (which can range from −∞ to +∞) into a value between 0 and 1.

Formula: 

σ(z) = 1 / (1 + e^(−z)), where z = w·x + b is the linear combination of the features x (weights w, bias b).

The output is interpreted as the probability of class 1, and a decision threshold (0.5 by default) converts it into a class label:

If probability ≥ 0.5 → class 1

If probability < 0.5 → class 0
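A small sketch of the sigmoid and the 0.5 threshold, plus a check that, for a binary scikit-learn LogisticRegression, the class-1 probability equals the sigmoid of the linear score from `decision_function`. The breast cancer dataset and the scaling step are choices made for this illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx. [0.0067, 0.5, 0.9933]

# For a binary model, P(class 1) is the sigmoid of z = w·x + b
X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
z = model.decision_function(X)
p = model.predict_proba(X)[:, 1]
print(np.allclose(sigmoid(z), p))  # True

# Apply the 0.5 threshold to turn probabilities into class labels
labels = (sigmoid(z) >= 0.5).astype(int)
```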


Question 3: What is Regularization in Logistic Regression and why is it needed?

Answer:

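Regularization in Logistic Regression is a technique used to prevent overfitting by adding a penalty term to the loss function, discouraging the model from assigning too much importance (large coefficients) to any feature. In scikit-learn, the penalty strength is controlled by C, the inverse of the regularization strength: a smaller C means a stronger penalty.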

Types:

L1 (Lasso): Encourages sparsity by driving some coefficients exactly to zero, which also acts as a form of feature selection.

L2 (Ridge): Shrinks all coefficients toward zero but rarely makes any of them exactly zero.

Why needed?

Prevents overfitting on training data.

Improves generalization to unseen data.

Helps with multicollinearity: L2 in particular spreads weight across correlated features instead of giving one of them a very large coefficient.
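A sketch of the L1 vs. L2 difference, assuming the breast cancer dataset, standardized features, the `liblinear` solver (which supports both penalties), and C=0.1 as an arbitrary fairly strong penalty. L1 is expected to zero out several coefficients; L2 shrinks them without zeroing.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)


def fit_coefs(penalty, C):
    """Fit a scaled Logistic Regression and return its coefficient vector."""
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty=penalty, C=C, solver="liblinear"),
    )
    model.fit(X, y)
    return model[-1].coef_.ravel()


# C is the inverse regularization strength; C=0.1 is an assumed, fairly strong penalty
l1_coefs = fit_coefs("l1", C=0.1)
l2_coefs = fit_coefs("l2", C=0.1)

print("L1 zero coefficients:", np.sum(l1_coefs == 0), "of", l1_coefs.size)
print("L2 zero coefficients:", np.sum(l2_coefs == 0), "of", l2_coefs.size)
```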



Question 4: What are some common evaluation metrics for classification models, and why are they important?

Answer:

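Common evaluation metrics for classification models (TP, TN, FP, FN denote true positives, true negatives, false positives, and false negatives):

Accuracy: Proportion of correctly predicted instances, (TP + TN) / (TP + TN + FP + FN).

Precision: Of the predicted positives, how many are actually positive, TP / (TP + FP).

Recall (Sensitivity): Of the actual positives, how many are correctly identified, TP / (TP + FN).

F1-Score: Harmonic mean of Precision and Recall, 2 · Precision · Recall / (Precision + Recall); balances the two.

ROC-AUC: Area under the ROC curve (true positive rate vs. false positive rate across all thresholds); measures how well the model ranks positives above negatives.

Confusion Matrix: Tabulates TP, FP, TN, and FN for deeper error analysis.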

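Importance:
No single number captures every kind of error, so together these metrics give a complete picture of model performance. This matters most on imbalanced datasets, where accuracy alone is misleading: if 95% of samples are negative, a model that always predicts "negative" scores 95% accuracy while having a Recall of 0.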
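A worked example of these formulas on hand-made labels (illustrative only, chosen so that TP=3, FN=1, FP=1, TN=5), followed by the imbalanced "always predict 0" case from the paragraph above.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hand-made labels (illustrative): TP=3, FN=1, FP=1, TN=5
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For labels [0, 1], ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)               # 3 1 5 1
print("Accuracy :", accuracy_score(y_true, y_pred))    # (3 + 5) / 10 = 0.8
print("Precision:", precision_score(y_true, y_pred))   # 3 / (3 + 1) = 0.75
print("Recall   :", recall_score(y_true, y_pred))      # 3 / (3 + 1) = 0.75
print("F1       :", f1_score(y_true, y_pred))          # 2*0.75*0.75 / 1.5 = 0.75

# Imbalanced case: 95 negatives, 5 positives, model always predicts 0
y_true_imb = [0] * 95 + [1] * 5
y_pred_imb = [0] * 100
print("Accuracy:", accuracy_score(y_true_imb, y_pred_imb))  # 0.95
print("Recall  :", recall_score(y_true_imb, y_pred_imb))    # 0.0
```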



In [1]:
'''
Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use Dataset from sklearn package)
'''

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

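# Load the sklearn dataset into a DataFrame (built in memory rather than read from a CSV file)
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Separate the features from the label
X = df.drop('target', axis=1)
y = df['target']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# max_iter is raised because the unscaled features make the default lbfgs solver slow to converge
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)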

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Accuracy: 0.956140350877193
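The program above builds the DataFrame directly from the sklearn dataset. A sketch of a variant that includes the CSV step the question mentions: the dataset is written to CSV text and read back with `pd.read_csv`. An in-memory `io.StringIO` buffer stands in for a real file here; with a file on disk you would pass its path instead (any filename is hypothetical). The stratified split is an added choice, so the accuracy may differ slightly from the output above.

```python
import io

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Write the sklearn dataset to CSV text, then read it back with pandas.
# The in-memory buffer stands in for a CSV file on disk.
data = load_breast_cancer(as_frame=True)
buffer = io.StringIO()
data.frame.to_csv(buffer, index=False)  # .frame includes the 'target' column
buffer.seek(0)
df = pd.read_csv(buffer)

X = df.drop('target', axis=1)
y = df['target']

# stratify=y keeps the class ratio the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print("Rows x columns read from CSV:", df.shape)
print("Accuracy:", accuracy)
```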


In [2]:
'''
Question 6: Write a Python program to train a Logistic Regression model using L2
regularization (Ridge) and print the model coefficients and accuracy.
(Use Dataset from sklearn package)

'''

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


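# penalty='l2' is scikit-learn's default; C=1.0 (the inverse regularization strength) is left at its default
model = LogisticRegression(penalty='l2', max_iter=10000)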
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Model Coefficients:", model.coef_)
print("Model Intercept:", model.intercept_)
print("Accuracy:", accuracy)


Model Coefficients: [[ 0.99259094  0.2262027  -0.36800922  0.02614103 -0.15872224 -0.22778509
  -0.5314163  -0.28830881 -0.2260235  -0.03563039 -0.09579112  1.39441518
  -0.15067047 -0.09005958 -0.02349575  0.05615668 -0.03581614 -0.03266777
  -0.03170468  0.01291438  0.1063987  -0.51439774 -0.01758165 -0.01655034
  -0.3153412  -0.74998326 -1.43081686 -0.51765878 -0.744688   -0.09811291]]
Model Intercept: [29.3453142]
Accuracy: 0.956140350877193
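The raw coefficients above come from unscaled features with very different units, so their magnitudes are hard to compare. A sketch, assuming a `StandardScaler` is added first, that pairs each L2-regularized coefficient with its feature name and lists the largest by absolute value (a positive coefficient pushes predictions toward class 1).

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling first makes coefficient magnitudes comparable across features;
# LogisticRegression's default penalty is already L2
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1.0))
pipe.fit(X_train, y_train)

# Label each coefficient with its feature name and sort by absolute size
coefs = pd.Series(pipe[-1].coef_.ravel(), index=X.columns)
top = coefs.sort_values(key=abs, ascending=False)
print(top.head(5))
print("Accuracy:", accuracy_score(y_test, pipe.predict(X_test)))
```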


In [3]:
'''
Question 7: Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)
'''

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)


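# multi_class='ovr' fits one binary classifier per class. Recent scikit-learn releases (1.5 and later)
# deprecate this parameter; wrapping the model in sklearn.multiclass.OneVsRestClassifier gives the same strategy.
model = LogisticRegression(multi_class='ovr', max_iter=10000)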
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))


Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
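Because `multi_class` is deprecated in recent scikit-learn versions, a sketch of the one-vs-rest approach using `sklearn.multiclass.OneVsRestClassifier` instead. It fits one binary Logistic Regression per iris class; the report is expected to be close to the one above, though not guaranteed to be identical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fits one binary LogisticRegression per class (class k vs. the rest)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=10000))
ovr.fit(X_train, y_train)

print("Binary models fitted:", len(ovr.estimators_))  # 3, one per iris class
y_pred = ovr.predict(X_test)
print("Classification Report:\n", classification_report(y_test, y_pred))
```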



In [5]:
'''
Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)

'''

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

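# liblinear supports both 'l1' and 'l2' penalties, so both can be searched
log_reg = LogisticRegression(max_iter=10000, solver='liblinear')

# C is the inverse regularization strength: smaller values mean a stronger penalty
param_grid = {'C': [0.01, 0.1, 1, 10, 100], 'penalty': ['l1', 'l2']}

# 5-fold cross-validation on the training set only; the test set stays untouched
grid = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')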
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))


Best Parameters: {'C': 100, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 0.9670329670329672
Test Accuracy: 0.9824561403508771
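The search above runs on unscaled features. A sketch of the same search with a `StandardScaler` placed inside a `Pipeline`, so that each cross-validation fold fits the scaler on its own training portion only. The step names `scaler` and `clf` are arbitrary choices; grid keys use scikit-learn's `<step>__<parameter>` convention. Results are not guaranteed to match the output above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# The scaler sits inside the pipeline, so each CV fold fits it on its own
# training portion only (no information leaks from the validation fold)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(solver="liblinear", max_iter=10000)),
])

# Step parameters are addressed as "<step name>__<parameter>"
param_grid = {
    "clf__C": [0.01, 0.1, 1, 10, 100],
    "clf__penalty": ["l1", "l2"],
}

grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))
```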


In [6]:
'''
Question 9: Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
(Use Dataset from sklearn package)

'''

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model_no_scaling = LogisticRegression(max_iter=10000)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

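# Fit the scaler on the training data only, then apply the same transform to the
# test data, so no test-set statistics leak into training
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)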

model_scaling = LogisticRegression(max_iter=10000)
model_scaling.fit(X_train_scaled, y_train)
y_pred_scaling = model_scaling.predict(X_test_scaled)
accuracy_scaling = accuracy_score(y_test, y_pred_scaling)

print("Accuracy without Scaling:", accuracy_no_scaling)
print("Accuracy with Scaling:", accuracy_scaling)


Accuracy without Scaling: 0.956140350877193
Accuracy with Scaling: 0.9736842105263158
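A sketch of why scaling helps here: the raw feature spreads differ by orders of magnitude, standardization brings every training column to mean 0 and standard deviation 1, and the default lbfgs solver typically needs far fewer iterations on the scaled data. The exact iteration counts depend on the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42)

# Raw feature spreads differ by orders of magnitude
stds = X_train.std(axis=0)
print("Largest / smallest feature std:", stds.max() / stds.min())

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
# After scaling, every training column has mean ~0 and std ~1
print(np.allclose(X_train_scaled.mean(axis=0), 0),
      np.allclose(X_train_scaled.std(axis=0), 1))

# Compare how many lbfgs iterations each fit needed
raw = LogisticRegression(max_iter=10000).fit(X_train, y_train)
scaled = LogisticRegression(max_iter=10000).fit(X_train_scaled, y_train)
print("Iterations without / with scaling:", raw.n_iter_[0], scaled.n_iter_[0])
```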


Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model — including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.


Answer:

The steps I would take to build the Logistic Regression model are:

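1. Data Preprocessing and Feature Engineering – Clean the dataset, handle missing values, remove outliers, and create meaningful features. Then split the data with a stratified train/test split so that both sets keep the 5% response rate.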

2. Feature Scaling – Apply Standardization or Min-Max Normalization to bring features to a similar scale, fitting the scaler on the training data only (ideally inside a Pipeline) to avoid data leakage.

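3. Class Imbalance Handling – Either set class_weight='balanced' in Logistic Regression, which weights each class in the loss inversely to its frequency, or resample the training data (e.g., Synthetic Minority Oversampling Technique – SMOTE). Resampling must be applied only to the training folds, never to validation or test data, so that evaluation reflects the real 5% response rate.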

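4. Hyperparameter Tuning – Use Grid Search Cross-Validation (GridSearchCV) or Randomized Search Cross-Validation (RandomizedSearchCV) with stratified folds to optimize hyperparameters such as C (the inverse of the regularization strength), penalty (L1 or L2), and class_weight. Score the search with an imbalance-aware metric such as F1 or average precision rather than accuracy.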

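5. Model Evaluation – Evaluate performance on the held-out test set using metrics beyond accuracy, such as Precision, Recall, F1-Score, Receiver Operating Characteristic – Area Under the Curve (ROC-AUC), and Precision-Recall Area Under the Curve (PR-AUC). With only 5% positives, accuracy is misleading (predicting "no response" for every customer already scores 95%), and PR-AUC is usually more informative than ROC-AUC.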

6. Business Alignment and Threshold Tuning – Move the classification threshold away from the default 0.5 according to business goals (e.g., favor Recall if the goal is to capture as many responders as possible, or Precision if contacting non-responders is costly). Choose the threshold on validation data rather than on the test set.
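A minimal end-to-end sketch of these steps, under loudly stated assumptions: `make_classification` generates a synthetic stand-in for the customer data (about 5% class 1), imbalance is handled with `class_weight` rather than SMOTE (SMOTE lives in the separate imbalanced-learn package and would need its own pipeline so it runs only on training folds), and the F1-maximizing threshold is just one hypothetical business rule. The threshold is picked from out-of-fold training predictions so the test set stays untouched.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (average_precision_score, classification_report,
                             precision_recall_curve, roc_auc_score)
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_predict, train_test_split)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the customer table: about 5% responders (class 1)
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95],
                           random_state=0)

# Stratified split keeps the ~5% response rate in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scaling inside the pipeline is refit on each training fold
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1, 10],
    "logisticregression__class_weight": [None, "balanced"],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Average precision (PR-AUC) rewards ranking the rare responders highly
grid = GridSearchCV(pipe, param_grid, cv=cv, scoring="average_precision")
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)

# Pick a threshold from out-of-fold training probabilities (not the test set);
# maximizing F1 is one assumed business rule among many
oof_prob = cross_val_predict(grid.best_estimator_, X_train, y_train,
                             cv=cv, method="predict_proba")[:, 1]
precision, recall, thresholds = precision_recall_curve(y_train, oof_prob)
p, r = precision[:-1], recall[:-1]  # align with thresholds
f1 = 2 * p * r / np.maximum(p + r, 1e-12)
threshold = thresholds[np.argmax(f1)]
print("Chosen threshold:", threshold)

# Final evaluation on the untouched test set
test_prob = grid.predict_proba(X_test)[:, 1]
print("ROC-AUC:", roc_auc_score(y_test, test_prob))
print("PR-AUC (average precision):", average_precision_score(y_test, test_prob))
print(classification_report(y_test, (test_prob >= threshold).astype(int)))
```

Replacing the F1 rule with, say, a minimum acceptable precision (a cost constraint on contacting non-responders) only changes how `threshold` is selected from the same precision-recall curve.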