In [None]:
Question 1: What is Logistic Regression, and how does it differ from Linear regression?
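Ans 1: Logistic Regression is a supervised classification algorithm that predicts the probability that a given input belongs to a particular class.
To do this, it passes a linear combination of the features through the logistic (sigmoid) function, which squashes any real-valued number into a range between 0 and 1.
Linear Regression, by contrast, predicts an unbounded continuous value and is usually fit by minimizing squared error, whereas Logistic Regression
is fit by minimizing log loss (maximum likelihood) and is used for classification.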

Question 2: Explain the role of the Sigmoid function in Logistic Regression.
Ans 2: The Sigmoid function (also called the logistic function) plays a crucial role in Logistic Regression because it converts a linear
       equation’s output (which can be any real number) into a probability value between 0 and 1.
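
As a small illustration of the answer above, the sketch below implements the sigmoid with only the standard library and applies it to a linear score. The helper name `predict_proba` and the example weights are assumptions made for illustration; they are not taken from any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real number z to a value between 0 and 1."""
    # Two algebraically equivalent forms are used so exp() never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Probability of the positive class for a single sample."""
    # Linear part: bias + w1*x1 + w2*x2 + ...
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Large negative scores map near 0, zero maps to 0.5, large positive near 1
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

A predicted probability is usually turned into a class label by comparing it with a threshold such as 0.5.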

Question 3: What is Regularization in Logistic Regression and why is it needed?
Ans 3: Regularization is a technique used in Logistic Regression (and other models) to prevent overfitting by adding a penalty term to the
       model’s cost (loss) function.
      It discourages the model from assigning excessive importance (large coefficients) to specific features, so the model generalizes better
      and is less sensitive to noise in the training data.
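
To make the penalty term concrete, here is a minimal sketch of an L2-penalized log loss in plain Python. The function names and the strength parameter `lam` are hypothetical; scikit-learn instead exposes `C`, the inverse of the regularization strength, so a smaller `C` means a stronger penalty.

```python
import math

def log_loss(y_true, y_prob):
    """Average binary cross-entropy between labels and predicted probabilities."""
    eps = 1e-15  # keep probabilities away from 0 and 1 so log() is defined
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def l2_penalized_loss(y_true, y_prob, weights, lam):
    """Log loss plus an L2 penalty of lam * (sum of squared weights)."""
    penalty = lam * sum(w * w for w in weights)
    return log_loss(y_true, y_prob) + penalty

y_true = [1, 0]
y_prob = [0.9, 0.1]
# Same predictions, but the larger weights are charged a larger penalty
print(l2_penalized_loss(y_true, y_prob, [0.1, 0.1], lam=0.1))
print(l2_penalized_loss(y_true, y_prob, [3.0, 3.0], lam=0.1))
```

With identical predictions, the model with larger coefficients gets the higher total loss, which is what pushes training toward smaller weights.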

Question 4: What are some common evaluation metrics for classification models, and why are they important?
Ans 4: When we build a classification model (like Logistic Regression), we need to measure how well it performs.
      These evaluation metrics help us understand how accurately and reliably the model predicts outcomes.
| Metric    | Best For                        | Key Focus                      |
| --------- | ------------------------------- | ------------------------------ |
| Accuracy  | Balanced datasets               | Overall correctness            |
| Precision | When false positives are costly | Purity of positive predictions |
| Recall    | When false negatives are costly | Detecting all positives        |
| F1-Score  | Balancing precision & recall    | Combined performance           |
| AUC-ROC   | Measuring separability          | Ranking/class probability      |
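
The sketch below computes the first four metrics in the table from raw labels, using plain Python. The function name `classification_metrics` and the toy labels are assumptions for illustration. The example shows why accuracy alone can mislead on imbalanced data: a model that always predicts the negative class scores 95% accuracy but finds no positives.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 5 positives among 100 samples; the "model" predicts negative for everyone
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(classification_metrics(y_true, y_pred))
```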


In [1]:
# Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame,
# splits into train/test sets, trains a Logistic Regression model, and prints its accuracy.


# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

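# Step 2: Load dataset (the built-in breast cancer data stands in for a CSV file here;
# for a real file, pd.read_csv would be used with its path)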
data = load_breast_cancer()

# Convert to Pandas DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Step 3: Split into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Step 4: Split dataset into Train and Test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Create and train Logistic Regression model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Step 6: Make predictions
y_pred = model.predict(X_test)

# Step 7: Evaluate model accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of Logistic Regression Model:", accuracy)


Accuracy of Logistic Regression Model: 0.956140350877193
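
Since the question asks for a CSV file, the loading step could look roughly like the sketch below. The CSV content, its column names, and the `target` column are made up for illustration; with a real file, a path would be passed to `pd.read_csv` in place of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory so the example is self-contained
csv_text = """feature_a,feature_b,target
1.0,2.0,0
1.5,1.8,0
5.0,8.0,1
6.0,9.0,1
1.2,0.5,0
7.5,8.5,1
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Separate the features (X) from the label column (y)
X = df.drop(columns="target")
y = df["target"]
print(X.shape, y.shape)
```

From here, the train/test split, model fitting, and accuracy steps are the same as in the cell above.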


In [2]:
# Question 6: Write a Python program to train a Logistic Regression model using L2
# regularization (Ridge) and print the model coefficients and accuracy.


# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Step 2: Load dataset
data = load_breast_cancer()

# Convert to Pandas DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Step 3: Split features and target
X = df.drop('target', axis=1)
y = df['target']

# Step 4: Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Create Logistic Regression model with L2 regularization (default)
model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=10000)
model.fit(X_train, y_train)

# Step 6: Predict on test data
y_pred = model.predict(X_test)

# Step 7: Evaluate model
accuracy = accuracy_score(y_test, y_pred)

# Step 8: Print coefficients and accuracy
print("Model Coefficients:\n", model.coef_)
print("\nIntercept:", model.intercept_)
print("\nAccuracy of Logistic Regression (L2 Regularization):", accuracy)


Model Coefficients:
 [[ 1.0274368   0.22145051 -0.36213488  0.0254667  -0.15623532 -0.23771256
  -0.53255786 -0.28369224 -0.22668189 -0.03649446 -0.09710208  1.3705667
  -0.18140942 -0.08719575 -0.02245523  0.04736092 -0.04294784 -0.03240188
  -0.03473732  0.01160522  0.11165329 -0.50887722 -0.01555395 -0.016857
  -0.30773117 -0.77270908 -1.42859535 -0.51092923 -0.74689363 -0.10094404]]

Intercept: [28.64871395]

Accuracy of Logistic Regression (L2 Regularization): 0.956140350877193


In [3]:
# Question 7: Write a Python program to train a Logistic Regression model for multiclass
# classification using multi_class='ovr' and print the classification report.


# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

# Step 2: Load dataset (Iris dataset - multiclass)
data = load_iris()

# Convert to DataFrame for clarity
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Step 3: Define features and target
X = df.drop('target', axis=1)
y = df['target']

# Step 4: Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

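# Step 5: Create Logistic Regression model with multi_class='ovr'
# Note: the multi_class parameter is deprecated as of scikit-learn 1.5; newer code can
# wrap the model in sklearn.multiclass.OneVsRestClassifier instead.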
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=10000)
model.fit(X_train, y_train)

# Step 6: Make predictions
y_pred = model.predict(X_test)

# Step 7: Print classification report
print("Classification Report for Logistic Regression (One-vs-Rest):\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Classification Report for Logistic Regression (One-vs-Rest):

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.89      0.94         9
   virginica       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
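
Because `multi_class` is deprecated in recent scikit-learn releases, the same one-vs-rest strategy can be sketched with `OneVsRestClassifier`. This is an alternative formulation under that assumption, not the code that produced the report above, and its scores may differ slightly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

# Load the three-class Iris data as NumPy arrays
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# One binary Logistic Regression model is trained per class (class vs. rest)
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=10000))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
print(classification_report(y_test, y_pred, target_names=data.target_names))
```

The fitted wrapper keeps its per-class binary models in `ovr_model.estimators_`.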





In [4]:
# Question 8: Write a Python program to apply GridSearchCV to tune C and penalty
# hyperparameters for Logistic Regression and print the best parameters and validation
# accuracy.


# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Step 2: Load dataset
data = load_breast_cancer()

# Convert to DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Step 3: Split features and target
X = df.drop('target', axis=1)
y = df['target']

# Step 4: Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Define parameter grid for tuning
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],          # Inverse of regularization strength (smaller = stronger)
    'penalty': ['l1', 'l2'],               # Type of regularization
    'solver': ['liblinear']                # Supports both L1 and L2
}

# Step 6: Create Logistic Regression model
log_reg = LogisticRegression(max_iter=10000)

# Step 7: Apply GridSearchCV
grid = GridSearchCV(estimator=log_reg, param_grid=param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Step 8: Best parameters and validation accuracy
print("Best Parameters found:", grid.best_params_)
print("Best Cross-Validation Accuracy:", grid.best_score_)

# Step 9: Evaluate on test data
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy using Best Model:", test_accuracy)


Best Parameters found: {'C': 100, 'penalty': 'l1', 'solver': 'liblinear'}
Best Cross-Validation Accuracy: 0.9670329670329672
Test Accuracy using Best Model: 0.9824561403508771


In [6]:
# Question 9: Write a Python program to standardize the features before training Logistic
# Regression and compare the model's accuracy with and without scaling.


# Step 1: Import libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Step 2: Load dataset
data = load_breast_cancer()

# Convert to DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Step 3: Split features and target
X = df.drop('target', axis=1)
y = df['target']

# Step 4: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without Scaling
model_no_scaling = LogisticRegression(max_iter=10000)
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)

# With Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=10000)
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print results
print("Accuracy WITHOUT Scaling:", accuracy_no_scaling)
print("Accuracy WITH Scaling:", accuracy_scaled)


Accuracy WITHOUT Scaling: 0.956140350877193
Accuracy WITH Scaling: 0.9736842105263158


In [None]:
# Question 10: Imagine you are working at an e-commerce company that wants to
# predict which customers will respond to a marketing campaign. Given an imbalanced
# dataset (only 5% of customers respond), describe the approach you’d take to build a
# Logistic Regression model — including data handling, feature scaling, balancing
# classes, hyperparameter tuning, and evaluating the model for this real-world business
# use case.
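Ans 10:
✅ Data handling: clean missing values, encode categorical features, and use a stratified train/test split so both sets keep the 5% response rate.
✅ Feature scaling: standardize the features, fitting the scaler on the training data only.
✅ Balancing classes: use Logistic Regression with class_weight='balanced' (or resample the training set).
✅ Hyperparameter tuning: tune C and penalty with cross-validation, scored on F1 or AUC rather than accuracy.
✅ Evaluation: report precision, recall, F1-score, and AUC, not accuracy, since predicting "no response" for everyone is already 95% accurate.
✅ Business decision: choose the model and decision threshold that give high recall at an acceptable precision, so potential customers aren’t missed.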
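
A minimal sketch of this approach is shown below, assuming synthetic data from `make_classification` as a stand-in for real campaign data (roughly 5% positives). For brevity only `C` is tuned; `penalty` could be added to the grid in the same way as in Question 8. The grid values and step names are illustrative choices, not a recommended configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for customer data (about 5% responders)
X, y = make_classification(
    n_samples=4000, n_features=10, weights=[0.95, 0.05], random_state=42
)

# Stratify so the train and test sets keep the same response rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline, so it is refit on each CV training fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=10000)),
])

# Tune the inverse regularization strength, scoring on F1 instead of accuracy
search = GridSearchCV(pipeline, {"clf__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate with ranking and per-class metrics on the held-out test set
y_prob = search.predict_proba(X_test)[:, 1]
y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print("ROC AUC:", roc_auc_score(y_test, y_prob))
print(classification_report(y_test, y_pred))
```

In practice the decision threshold applied to `y_prob` would then be adjusted to reach the recall the business needs, rather than relying on the default of 0.5.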
