In [None]:
Question 1: What is Logistic Regression, and how does it differ from Linear Regression?

In [None]:
Logistic regression is a statistical model used to predict a **categorical outcome** (often binary, like yes/no) by estimating probabilities using the **logistic (sigmoid) function**, which outputs values between 0 and 1.

It differs from linear regression in that:

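* **Linear regression** predicts an unbounded continuous value and is typically fit by minimizing squared error.
* **Logistic regression** predicts the probability of a class by passing a linear combination of the features through the sigmoid function, so the output is bounded between 0 and 1 and is then thresholded into a class label.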
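
The contrast can be sketched on a toy dataset. The data below (hours studied vs. pass/fail) is made up purely for illustration; the point is that the linear model's predictions can fall outside the 0 to 1 range, while the logistic model's probabilities cannot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied vs. fail (0) / pass (1)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Points below, inside, and above the training range
X_new = np.array([[0.0], [4.5], [12.0]])

linear_pred = linear.predict(X_new)                  # unbounded real values
logistic_prob = logistic.predict_proba(X_new)[:, 1]  # P(class = 1), always in [0, 1]

print("Linear regression output:", linear_pred)
print("Logistic regression P(y=1):", logistic_prob)
```

Treating the linear output as a probability breaks down at the extremes, which is why a bounded link such as the sigmoid is used for classification.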

In [None]:
Question 2: Explain the role of the Sigmoid function in Logistic Regression.

In [None]:
In logistic regression, the **sigmoid function** converts the linear combination of inputs (which can be any real number) into a value between **0 and 1**, representing the probability of the positive class.

Its role is to:

1. **Map outputs to probabilities** → Ensures predictions stay in the valid range (0–1).
2. **Enable classification** → By applying a threshold (e.g., ≥0.5), we decide the class label.
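
As a minimal sketch of both roles, the snippet below applies the sigmoid, σ(z) = 1 / (1 + e^(−z)), to a few arbitrary example scores and then thresholds at 0.5. The helper name `sigmoid` and the input values are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Example linear scores (w·x + b); values are arbitrary
z = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])

probs = sigmoid(z)                   # probabilities of the positive class
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 to get class labels

print(probs)
print(labels)
```

A score of 0 maps to a probability of exactly 0.5, so with this threshold the decision boundary is the set of inputs where the linear combination equals zero.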


In [None]:
Question 3: What is Regularization in Logistic Regression and why is it needed?

In [None]:
Regularization in logistic regression is a technique used to **prevent overfitting** by adding a penalty term to the loss function that discourages overly complex models (large coefficient values).

It’s needed because:

* It **controls model complexity**, improving generalization to unseen data.
* It **reduces variance** at the cost of a small increase in bias.

Common types:

* **L1 (Lasso)** → can shrink some coefficients exactly to zero, which acts as feature selection.
* **L2 (Ridge)** → shrinks all coefficients toward zero but rarely makes them exactly zero.
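
The difference between the two penalties can be sketched by counting zero coefficients. This example assumes the breast cancer dataset from sklearn, standardized features, the `liblinear` solver (which supports both penalties), and an arbitrary `C=0.1`; in scikit-learn, `C` is the inverse of the regularization strength, so smaller values mean a stronger penalty.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # penalties are sensitive to feature scale

# C=0.1 is an illustrative choice, not a tuned value
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.1).fit(X_scaled, y)
l2_model = LogisticRegression(penalty='l2', solver='liblinear', C=0.1).fit(X_scaled, y)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))

print(f"Zero coefficients with L1: {n_zero_l1} of {l1_model.coef_.size}")
print(f"Zero coefficients with L2: {n_zero_l2} of {l2_model.coef_.size}")
```

The exact counts depend on `C` and the scikit-learn version, but L1 should zero out a noticeable share of the 30 coefficients while L2 leaves them small and nonzero.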


In [None]:
Question 4: What are some common evaluation metrics for classification models, and why are they important?

In [None]:
Common evaluation metrics for classification models include:

1. **Accuracy** – Proportion of correct predictions; simple but can be misleading if data is imbalanced.
2. **Precision** – Of the predicted positives, how many are actually positive; important when false positives are costly.
3. **Recall (Sensitivity)** – Of the actual positives, how many were correctly predicted; important when missing positives is costly.
4. **F1-Score** – Harmonic mean of precision and recall; balances the two, useful in imbalanced datasets.
5. **ROC-AUC** – Measures how well the model ranks positives above negatives across all thresholds; 0.5 corresponds to random guessing and 1.0 to perfect separation.

They’re important because they **reveal different aspects of model performance** and help choose the right model depending on the problem’s priorities.
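
A small worked example, using hand-made labels and scores (hypothetical, 2 positives out of 10), shows how these metrics can disagree on imbalanced data:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, roc_auc_score
)

# Hypothetical ground truth: 8 negatives, 2 positives
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
# Hypothetical predicted probabilities of the positive class
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.35])
# Class labels from a 0.5 threshold: one false positive, one false negative
y_pred = (y_score >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)    # 8 of 10 correct
precision = precision_score(y_true, y_pred)  # 1 true positive of 2 predicted positives
recall = recall_score(y_true, y_pred)        # 1 true positive of 2 actual positives
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)         # uses scores, not thresholded labels

print(f"Accuracy:  {accuracy:.3f}")   # 0.800
print(f"Precision: {precision:.3f}")  # 0.500
print(f"Recall:    {recall:.3f}")     # 0.500
print(f"F1-score:  {f1:.3f}")         # 0.500
print(f"ROC-AUC:   {auc:.3f}")        # 0.875
```

Accuracy looks respectable at 0.8 even though the model finds only half of the positives, which is the kind of gap precision, recall, and F1 are meant to expose.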

In [None]:
Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [2]:
# Import required libraries
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from sklearn and convert to DataFrame
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Split features and target
X = df.drop('target', axis=1)
y = df['target']

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train logistic regression model
model = LogisticRegression(max_iter=10000)  # Increase iterations to ensure convergence
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")


Accuracy: 0.9561


In [None]:
Question 6: Write a Python program to train a Logistic Regression model using L2
regularization (Ridge) and print the model coefficients and accuracy.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import pandas as pd

# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression with L2 regularization (Ridge)
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=10000)
model.fit(X_train, y_train)

# Predictions and accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Model Coefficients:", model.coef_)
print(f"Accuracy: {accuracy:.4f}")


Model Coefficients: [[ 0.97796466  0.22675499 -0.36921764  0.02644054 -0.15485375 -0.22665079
  -0.5186091  -0.27936438 -0.22284174 -0.03509306 -0.09377994  1.39092772
  -0.17022173 -0.08877402 -0.02215899  0.05164999 -0.03656395 -0.03142397
  -0.03290299  0.01227996  0.09595287 -0.51563694 -0.01698607 -0.01657517
  -0.30594188 -0.74668265 -1.39907242 -0.50342187 -0.73505594 -0.09765041]]
Accuracy: 0.9561


In [None]:
Question 7: Write a Python program to train a Logistic Regression model for multiclass
classification using multi_class='ovr' and print the classification report.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
import pandas as pd

# Load Iris dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

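# Logistic Regression with One-vs-Rest strategy
# Note: the multi_class parameter is deprecated as of scikit-learn 1.5; on newer
# versions, wrapping the estimator in sklearn.multiclass.OneVsRestClassifier is the alternative.
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=1000)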
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.89      0.94         9
   virginica       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30





In [None]:
Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
hyperparameters for Logistic Regression and print the best parameters and validation
accuracy.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [5]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression model
log_reg = LogisticRegression(max_iter=10000, solver='liblinear')

# Hyperparameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2']
}

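# GridSearchCV
grid_search = GridSearchCV(log_reg, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Best hyperparameters and mean cross-validated accuracy
print("Best Parameters:", grid_search.best_params_)
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.4f}")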


Best Parameters: {'C': 100, 'penalty': 'l1'}
Best Cross-Validation Accuracy: 0.9670


In [None]:
Question 9: Write a Python program to standardize the features before training Logistic
Regression and compare the model's accuracy with and without scaling.
(Use Dataset from sklearn package)
(Include your Python code and output in the code box below.)

In [7]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import pandas as pd

# Load dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Logistic Regression without scaling
model_no_scale = LogisticRegression(max_iter=10000)
model_no_scale.fit(X_train, y_train)
y_pred_no_scale = model_no_scale.predict(X_test)
acc_no_scale = accuracy_score(y_test, y_pred_no_scale)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Logistic Regression with scaling
model_scale = LogisticRegression(max_iter=10000)
model_scale.fit(X_train_scaled, y_train)
y_pred_scale = model_scale.predict(X_test_scaled)
acc_scale = accuracy_score(y_test, y_pred_scale)

# Results
print(f"Accuracy without scaling: {acc_no_scale:.4f}")
print(f"Accuracy with scaling: {acc_scale:.4f}")


Accuracy without scaling: 0.9561
Accuracy with scaling: 0.9737


In [None]:
Question 10: Imagine you are working at an e-commerce company that wants to
predict which customers will respond to a marketing campaign. Given an imbalanced
dataset (only 5% of customers respond), describe the approach you’d take to build a
Logistic Regression model — including data handling, feature scaling, balancing
classes, hyperparameter tuning, and evaluating the model for this real-world business
use case.