Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

>>Answer:
Logistic Regression is a classification algorithm used to predict categorical outcomes (e.g., Yes/No, Spam/Not Spam).
It estimates the probability of a data point belonging to a class using the sigmoid function, which outputs values between 0 and 1.

Differences from Linear Regression:

- Purpose:

Linear Regression → Predicts continuous values (e.g., price, temperature).

Logistic Regression → Predicts probabilities for classification.

- Output:

Linear Regression → Produces real values (−∞ to +∞).

Logistic Regression → Produces probabilities (0 to 1).

- Function Used:

Linear Regression → Uses a straight line equation.

Logistic Regression → Uses a sigmoid function to map the linear prediction to a probability.
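
The output difference can be seen on a small example. This is a minimal sketch on made-up data (hours studied vs. pass/fail, not from any real dataset): the linear model is free to predict values below 0 or above 1, while the logistic model always returns a probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data, invented for illustration: hours studied vs. pass (1) / fail (0)
hours = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(hours, passed)
logistic = LogisticRegression().fit(hours, passed)

# Inputs outside and inside the training range
new_hours = np.array([[0], [4.5], [12]])
linear_out = linear.predict(new_hours)                   # unbounded real values
logistic_out = logistic.predict_proba(new_hours)[:, 1]   # P(class 1), always in [0, 1]

print("Linear Regression output:  ", linear_out)
print("Logistic Regression output:", logistic_out)
```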

Question 2: Explain the role of the Sigmoid function in Logistic Regression.

>>Answer:
The sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

It converts the linear combination of inputs, z = w·x + b (weights w, bias b), into a probability value between 0 and 1.

It also helps decide the class, using a default threshold of 0.5:

- If probability > 0.5 → Class 1.

- If probability ≤ 0.5 → Class 0.

Without the sigmoid, the model would output unbounded real values, as Linear Regression does, and these could not be interpreted as class probabilities.
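
A minimal sketch of this computation in NumPy. The weights, bias, and input below are hypothetical values picked only to illustrate the steps; they do not come from a trained model.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and input (illustration only)
w = np.array([0.8, -0.4])
b = -0.1
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b      # linear combination of inputs
p = sigmoid(z)            # probability of class 1
label = int(p > 0.5)      # threshold rule: > 0.5 means class 1

print("z =", z, "probability =", p, "class =", label)
```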

Question 3: What is Regularization in Logistic Regression and why is it needed?

>>Answer:
Regularization is a technique to prevent overfitting by adding a penalty on large coefficients to the loss function that Logistic Regression minimizes.

Types:

L1 (Lasso): Encourages sparsity (some coefficients become zero).

L2 (Ridge): Shrinks coefficients toward zero but generally keeps all of them nonzero.

Why needed?

Prevents the model from memorizing training data.

Improves generalization on unseen data.

Controls model complexity.
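
The sparsity difference between L1 and L2 can be checked with a short experiment. This is a sketch on synthetic data (an assumption for illustration: 20 features, of which only 3 are informative), using a fairly strong penalty (small `C`) so the effect is visible.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustration only): 20 features, only 3 carry signal
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

# In scikit-learn, a smaller C means a stronger penalty
l1_model = LogisticRegression(penalty="l1", C=0.05, solver="liblinear").fit(X_demo, y_demo)
l2_model = LogisticRegression(penalty="l2", C=0.05, solver="liblinear").fit(X_demo, y_demo)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))

print("Zero coefficients with L1:", n_zero_l1)
print("Zero coefficients with L2:", n_zero_l2)
```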

Question 4: What are some common evaluation metrics for classification models, and why are they important?

>>Answer:

- Accuracy: % of correct predictions. (Good for balanced datasets).

- Precision: Of predicted positives, how many are truly positive. (Useful in spam detection).

- Recall (Sensitivity): Of actual positives, how many were detected. (Useful in medical tests).

- F1 Score: Harmonic mean of precision & recall, balances both.

- ROC-AUC: Measures how well the model ranks positive cases above negative ones across all thresholds (0.5 = random guessing, 1.0 = perfect separation).

Importance: These metrics ensure we evaluate models correctly, especially on imbalanced datasets where accuracy alone is misleading.
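
The metrics above can be computed with `sklearn.metrics`. The labels and scores below are hand-made for illustration (4 positives, 6 negatives), small enough to verify by counting: 3 true positives, 1 false negative, 1 false positive, 5 true negatives.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# Hand-made labels and predicted probabilities (illustration only)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1])

# Convert probabilities to class labels with a 0.5 threshold
y_hat = (y_score > 0.5).astype(int)

accuracy = accuracy_score(y_true, y_hat)     # (TP + TN) / total
precision = precision_score(y_true, y_hat)   # TP / (TP + FP)
recall = recall_score(y_true, y_hat)         # TP / (TP + FN)
f1 = f1_score(y_true, y_hat)                 # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)         # uses the scores, not the thresholded labels

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
print("ROC-AUC:", auc)
```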

In [1]:
# Question 5: Python program (basic Logistic Regression)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Use only binary classification (two classes)
X, y = X[y != 2], y[y != 2]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0


In [2]:
#Question 6: Logistic Regression with L2 Regularization (Ridge)

from sklearn.linear_model import LogisticRegression

# Train Logistic Regression with L2 regularization
model = LogisticRegression(penalty='l2', solver='liblinear')
model.fit(X_train, y_train)

print("Model Coefficients:", model.coef_)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))

Model Coefficients: [[-0.35865859 -1.36186707  2.09037258  0.94442534]]
Accuracy: 1.0


In [3]:
#Question 7: Logistic Regression (Multiclass with One-vs-Rest)
from sklearn.metrics import classification_report

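# Train Logistic Regression with a one-vs-rest strategy.
# Caveat: X_train/y_train are still the two-class subset from Question 5,
# so this fit is effectively binary. The multi_class parameter is deprecated
# since scikit-learn 1.5; OneVsRestClassifier is the suggested alternative.
model = LogisticRegression(multi_class='ovr', max_iter=200)
model.fit(X_train, y_train)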

# Predict
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))



Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        17
           1       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
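
The report above covers only classes 0 and 1 because the data was reduced to two classes in Question 5. A sketch of a true three-class one-vs-rest fit follows, assuming the full Iris data is reloaded and using `OneVsRestClassifier` as a wrapper. The variable names (`X_train_3c`, etc.) are new here, chosen so the binary split used by the other cells is not overwritten.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Reload all three Iris classes
X_all, y_all = load_iris(return_X_y=True)

# Stratify so each class keeps the same share in train and test
X_train_3c, X_test_3c, y_train_3c, y_test_3c = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all
)

# One binary Logistic Regression per class (class k vs. the rest)
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_model.fit(X_train_3c, y_train_3c)

y_pred_3c = ovr_model.predict(X_test_3c)
acc_3c = accuracy_score(y_test_3c, y_pred_3c)

print("Classification Report:\n", classification_report(y_test_3c, y_pred_3c))
```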



In [4]:
#Question 8: Logistic Regression with GridSearchCV
from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2']
}

# Logistic Regression with GridSearchCV
grid = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
# best_score_ is the mean cross-validated accuracy of the best parameter combination
print("Validation Accuracy:", grid.best_score_)


Best Parameters: {'C': 0.01, 'penalty': 'l2'}
Validation Accuracy: 1.0


In [5]:
#Question 9: Standardization Effect

from sklearn.preprocessing import StandardScaler

# Without scaling
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
print("Accuracy without scaling:", accuracy_score(y_test, model.predict(X_test)))

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=200)
model_scaled.fit(X_train_scaled, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, model_scaled.predict(X_test_scaled)))

Accuracy without scaling: 1.0
Accuracy with scaling: 1.0
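
Both accuracies are 1.0 here because the two Iris classes are easy to separate and the features are on similar scales. As a sketch of where scaling is more likely to matter, the example below uses scikit-learn's wine dataset (swapped in for illustration, not part of the original exercise), whose features span very different ranges. Putting the scaler in a pipeline means it is refit on each training fold only.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_wine, y_wine = load_wine(return_X_y=True)

# Without scaling (a higher max_iter gives the solver more room to converge)
raw_clf = LogisticRegression(max_iter=5000)
raw_scores = cross_val_score(raw_clf, X_wine, y_wine, cv=5)

# With scaling: the scaler is fit inside each CV training fold
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
scaled_scores = cross_val_score(scaled_clf, X_wine, y_wine, cv=5)

print("Mean CV accuracy without scaling:", raw_scores.mean())
print("Mean CV accuracy with scaling:", scaled_scores.mean())
```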


Question 10: Real-world E-commerce Case (Imbalanced Dataset)
>>Answer:
To predict customer response when only 5% of cases are positive, the steps are:

1. Data Handling:

Remove duplicates, handle missing values, encode categorical features.

2. Feature Scaling:

Standardize features using StandardScaler for stable model training, fitting the scaler on the training data only to avoid leaking test-set statistics.

3. Balancing Classes:

Use SMOTE (oversampling, applied to the training set only) or class weights in Logistic Regression.

Example: LogisticRegression(class_weight='balanced').

4. Hyperparameter Tuning:

Tune C (inverse of regularization strength; smaller C means a stronger penalty) and penalty using GridSearchCV.

5. Evaluation Metrics:

Use Precision, Recall, F1-score, ROC-AUC (not just accuracy).

Focus on Recall to capture maximum responders.

6. Business Impact:

Helps target only likely responders, saving marketing cost and improving ROI.
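
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not a production recipe: the data is synthetic (`make_classification` with about 5% positives stands in for real campaign data), class weights are used instead of SMOTE to avoid an extra dependency, and F1 is chosen as the tuning metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data (assumption): about 5% positives
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=10, n_informative=4,
    weights=[0.95, 0.05], random_state=42,
)

# Stratified split keeps the 5% positive rate in both sets
X_train_syn, X_test_syn, y_train_syn, y_test_syn = train_test_split(
    X_syn, y_syn, test_size=0.3, stratify=y_syn, random_state=42
)

# Scaling and the classifier live in one pipeline, so the scaler
# is fit only on the training folds during cross-validation
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", solver="liblinear")),
])

# Tune C and penalty, scoring by F1 rather than accuracy
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_train_syn, y_train_syn)

# Evaluate on the held-out test set
y_pred_syn = search.predict(X_test_syn)
y_score_syn = search.predict_proba(X_test_syn)[:, 1]
test_recall = recall_score(y_test_syn, y_pred_syn)
test_auc = roc_auc_score(y_test_syn, y_score_syn)

print("Best Parameters:", search.best_params_)
print(classification_report(y_test_syn, y_pred_syn))
print("ROC-AUC:", test_auc)
```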