1. What is Logistic Regression, and how does it differ from Linear Regression?
    - Logistic Regression is a supervised learning algorithm for classification: it predicts discrete (often binary) outcomes by estimating the probability that an input belongs to a particular class. Linear Regression predicts a continuous value as a linear combination of the input features. Logistic Regression computes the same linear combination but passes it through the sigmoid function, which constrains the output to the interval (0, 1) so it can be read as a probability. Equivalently, the model is linear in the log-odds of the positive class rather than in the target itself. The fitting criteria also differ: linear regression is typically fit by least squares, minimizing the squared errors, while logistic regression is fit by maximum likelihood estimation, which is equivalent to minimizing log loss.
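
The contrast above can be written out as equations. This is a sketch in assumed notation (weights w, bias b, feature vector x, n training examples with labels y_i); none of these symbols appear in the original answer.

```latex
% Linear regression: prediction and least-squares objective
\hat{y} = w^\top x + b, \qquad
\min_{w,b} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

% Logistic regression: predicted probability and the log-odds it implies
p = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}, \qquad
\log \frac{p}{1 - p} = w^\top x + b

% Logistic regression: maximum likelihood objective for labels y_i in {0, 1}
\max_{w,b} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
```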

2. Explain the role of the Sigmoid function in Logistic Regression.
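    - The sigmoid function, also called the logistic function, is σ(z) = 1 / (1 + e^(-z)). It maps the linear model output z = w·x + b, which can be any real number, to a value strictly between 0 and 1 that is interpreted as the probability of the positive class. Its S-shaped curve is monotonic and passes through 0.5 at z = 0, so thresholding the probability at 0.5 is the same as checking the sign of z.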
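
As a minimal sketch of the mapping described above, the function below evaluates the sigmoid in plain Python. The name `sigmoid` and the two-branch form (used here to avoid overflow for large negative inputs) are choices made for this illustration, not something taken from the original answer.

```python
import math


def sigmoid(z):
    """Map any real number z to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form so exp() never overflows.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Large negative scores approach 0, large positive scores approach 1,
# and a score of exactly 0 sits on the decision boundary at 0.5.
for score in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"z = {score:+.1f}  ->  p = {sigmoid(score):.4f}")
```

Because sigmoid(-z) = 1 - sigmoid(z), the curve is symmetric around the point (0, 0.5), which is why a 0.5 probability threshold corresponds to the sign of the linear score.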

3. What is Regularization in Logistic Regression and why is it needed?
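    - Regularization is a technique to prevent overfitting by adding a penalty on coefficient size to the loss being minimized. The two common choices are L1 (lasso), which penalizes the sum of absolute coefficient values and can drive some coefficients to exactly zero, and L2 (ridge), which penalizes the sum of squared coefficients and shrinks them smoothly toward zero. Regularization matters most with many features or small datasets, where an unpenalized model can fit noise; the penalty favors simpler models that rely less on any single variable and therefore tend to generalize better. In scikit-learn the strength is set through C, the inverse of the regularization strength, so smaller C means a stronger penalty.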
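
To make the penalty concrete, here is a small sketch in plain Python that adds an L1 or L2 term to the average log loss. The function names, the toy data, and the strength parameter `lam` are all assumptions made for this example; scikit-learn scales its objective differently and exposes the strength through C instead.

```python
import math


def log_loss(weights, bias, X, y):
    """Average negative log-likelihood of binary labels y under the model."""
    total = 0.0
    for features, label in zip(X, y):
        z = bias + sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-z))
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(y)


def penalized_loss(weights, bias, X, y, lam, penalty="l2"):
    """Log loss plus lam times an L1 or L2 penalty on the weights (bias is not penalized)."""
    if penalty == "l1":
        reg = sum(abs(w) for w in weights)
    elif penalty == "l2":
        reg = sum(w * w for w in weights)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return log_loss(weights, bias, X, y) + lam * reg


# Toy data (hypothetical): four samples, two features each.
X = [[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3], [-0.7, -1.2]]
y = [1, 1, 0, 0]

small_weights = [0.5, 0.1]
large_weights = [5.0, 1.0]

for name, w in (("small", small_weights), ("large", large_weights)):
    print(
        name,
        "| data loss:", round(log_loss(w, 0.0, X, y), 4),
        "| with L2:", round(penalized_loss(w, 0.0, X, y, lam=0.1, penalty="l2"), 4),
        "| with L1:", round(penalized_loss(w, 0.0, X, y, lam=0.1, penalty="l1"), 4),
    )
```

The larger weights fit this toy data more tightly, but they also pay a larger penalty, which is the trade-off the optimizer balances when regularization is on.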

4. What are some common evaluation metrics for classification models, and why are they important?
    - Common evaluation metrics for classification models and their importance:
        - Accuracy: Proportion of correct predictions among total cases, best for balanced datasets.
        - Precision: Proportion of positive predictions that are truly positive, important when false positives matter.
        - Recall (Sensitivity): Proportion of actual positives correctly identified, crucial when false negatives are costly.
        - F1 Score: Harmonic mean of precision and recall, balancing both for imbalanced data.
        - ROC-AUC: Area under the Receiver Operating Characteristic curve, measures discriminative power.
    - These metrics guide model assessment, highlighting strengths and weaknesses with respect to specific business needs.
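
The definitions above can be checked by hand on a small example. The sketch below computes accuracy, precision, recall, and F1 from raw label lists in plain Python; the function name and the toy labels are hypothetical, and precision or recall is reported as 0.0 when its denominator is zero, which is a convention chosen for this example. ROC-AUC is left out because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced data: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Model A never predicts the positive class.
always_negative = [0] * 100

# Model B finds 3 of the 5 positives but raises 6 false alarms.
some_positives = [1, 1, 1, 0, 0] + [1] * 6 + [0] * 89

print("Always negative:", classification_metrics(y_true, always_negative))
print("Some positives :", classification_metrics(y_true, some_positives))
```

Model A has the higher accuracy even though it never finds a positive case, which is why accuracy alone is a poor guide on imbalanced data.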

5. Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.

In [1]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Build a DataFrame from the built-in breast cancer dataset
data = load_breast_cancer()

df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Write the data to CSV, then load it back to demonstrate reading a CSV file
df.to_csv("breast_cancer.csv", index=False)
df_loaded = pd.read_csv("breast_cancer.csv")

# Separate the features from the target column
X = df_loaded.drop("target", axis=1)
y = df_loaded["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# max_iter is raised above the default of 100 to give the solver room
# to converge on the unscaled features
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Logistic Regression Model Accuracy:", accuracy)


Logistic Regression Model Accuracy: 0.956140350877193


6. Write a Python program to train a Logistic Regression model using L2 regularization (Ridge) and print the model coefficients and accuracy.

In [2]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset into a DataFrame
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Separate the features from the target column
X = df.drop("target", axis=1)
y = df["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# penalty="l2" selects the ridge penalty; C is left at its default of 1.0
model = LogisticRegression(penalty="l2", solver="lbfgs", max_iter=5000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# coef_ holds one weight per feature; intercept_ is the bias term
print("Model Coefficients:\n", model.coef_)
print("\nIntercept:\n", model.intercept_)
print("\nAccuracy:", accuracy)


Model Coefficients:
 [[ 0.98208299  0.22519686 -0.36688444  0.0262268  -0.15507824 -0.22867976
  -0.52338614 -0.2793554  -0.22391176 -0.03605388 -0.09476544  1.39135347
  -0.16429246 -0.08903006 -0.02250974  0.04944847 -0.04186075 -0.03193634
  -0.03298528  0.01189208  0.10400464 -0.51389384 -0.01711567 -0.01662253
  -0.30695364 -0.75341491 -1.41533107 -0.50382259 -0.73542849 -0.09913574]]

Intercept:
 [29.38522795]

Accuracy: 0.956140350877193


7. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr' and print the classification report.

In [3]:
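import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the three-class iris dataset into a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Separate the features from the target column
X = df.drop("target", axis=1)
y = df["target"]

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# multi_class="ovr" fits one binary classifier per class (one-vs-rest).
# Note: this parameter is deprecated as of scikit-learn 1.5, where
# OneVsRestClassifier(LogisticRegression(...)) is the suggested replacement.
model = LogisticRegression(multi_class="ovr", solver="lbfgs", max_iter=5000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# Per-class precision, recall, and F1 on the held-out test set
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))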


Classification Report:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.89      0.94         9
   virginica       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30





8. Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression and print the best parameters and validation accuracy.

In [4]:
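import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

# Load the three-class iris dataset into a DataFrame
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Separate the features from the target column
X = df.drop("target", axis=1)
y = df["target"]

# Hold out 20% of the rows; the grid search only sees the training split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],  # inverse regularization strength
    "penalty": ["l1", "l2"],       # lasso or ridge penalty
    "solver": ["liblinear"]        # liblinear supports both l1 and l2
}

# 5-fold cross-validation over every combination in param_grid
log_reg = LogisticRegression(max_iter=5000, multi_class="ovr")
grid_search = GridSearchCV(log_reg, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)

# best_score_ is the mean cross-validated accuracy of the best combination
print("Best Parameters:", grid_search.best_params_)
print("Best Cross-Validation Accuracy:", grid_search.best_score_)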




Best Parameters: {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
Best Cross-Validation Accuracy: 0.9583333333333334




9. Write a Python program to standardize the features before training Logistic Regression and compare the model's accuracy with and without scaling.

In [5]:
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the breast cancer dataset into a DataFrame
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

# Separate the features from the target column
X = df.drop("target", axis=1)
y = df["target"]

# Use the same split for both models so the comparison is fair
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: train on the raw, unscaled features
model_no_scaling = LogisticRegression(max_iter=5000, solver="lbfgs")
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# Standardize: fit the scaler on the training set only, then apply the
# same transformation to the test set to avoid leaking test statistics
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Same model settings, trained on the standardized features
model_with_scaling = LogisticRegression(max_iter=5000, solver="lbfgs")
model_with_scaling.fit(X_train_scaled, y_train)
y_pred_with_scaling = model_with_scaling.predict(X_test_scaled)
accuracy_with_scaling = accuracy_score(y_test, y_pred_with_scaling)

# Compare test accuracy for the two setups
print("Accuracy without scaling:", accuracy_no_scaling)
print("Accuracy with scaling   :", accuracy_with_scaling)


Accuracy without scaling: 0.956140350877193
Accuracy with scaling   : 0.9736842105263158


10. Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model, including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.
    - To build a Logistic Regression model for predicting campaign response, I would first clean and preprocess the data: handle missing values, encode categorical features, and scale numerical variables. Because only 5% of customers respond, I’d address the imbalance with class weights (class_weight="balanced") or a resampling method such as SMOTE, applied to the training data only so the validation and test sets keep the real class ratio. Scaling and any resampling would be fit inside each cross-validation fold to avoid leakage. I’d tune the hyperparameters (C, penalty, solver) using GridSearchCV with stratified cross-validation. For evaluation, I’d avoid plain accuracy, since a model that predicts "no response" for everyone already scores 95%, and instead rely on precision, recall, F1-score, ROC-AUC, and PR-AUC, along with business metrics like lift and gain charts to measure real campaign impact. Finally, I’d deploy the model to assign response probabilities to customers and target the top-ranked group for the marketing campaign.
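
The approach above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished solution: the data is synthetic (generated with `make_classification` at about 5% positives as a stand-in for real campaign data), imbalance is handled with class weights rather than SMOTE, the grid values are arbitrary, and average precision is used as the PR-AUC style tuning metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for campaign data: roughly 5% responders (class 1).
X, y = make_classification(
    n_samples=4000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=42,
)

# A stratified split keeps the responder rate similar in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Putting the scaler in the pipeline means it is refit inside every CV fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(
        class_weight="balanced", solver="liblinear", max_iter=5000
    )),
])

# Illustrative grid; the "clf__" prefix routes parameters to the model step.
param_grid = {"clf__C": [0.01, 0.1, 1, 10], "clf__penalty": ["l1", "l2"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="average_precision")
search.fit(X_train, y_train)

# Score the held-out customers with predicted response probabilities.
proba = search.predict_proba(X_test)[:, 1]
pr_auc = average_precision_score(y_test, proba)
roc_auc = roc_auc_score(y_test, proba)

# Lift in the top 10%: responder rate among the highest-scored customers
# divided by the overall responder rate in the test set.
top_k = max(1, int(0.1 * len(y_test)))
top_idx = np.argsort(proba)[::-1][:top_k]
lift = y_test[top_idx].mean() / y_test.mean()

print("Best parameters:", search.best_params_)
print("PR-AUC (average precision):", round(pr_auc, 3))
print("ROC-AUC:", round(roc_auc, 3))
print("Lift in top 10%:", round(lift, 2))
```

A lift above 1 in the top decile would mean that targeting the highest-scored customers reaches responders at a better rate than contacting customers at random, which is the quantity a campaign team usually cares about most.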