## Question 1: What is Logistic Regression, and how does it differ from Linear Regression?


Answer: Logistic Regression is a supervised learning algorithm used for classification problems, like predicting whether an email is spam or not spam.
It predicts categorical outcomes (like Yes/No, 0/1) using the sigmoid function, which gives outputs between 0 and 1 — interpreted as probabilities.

Logistic Regression differs from Linear Regression in two main ways:

* Linear Regression predicts continuous values, while Logistic Regression predicts categorical values.

* Linear Regression fits a straight line to the data, while Logistic Regression passes the linear output through the sigmoid function to produce a probability.
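
The contrast can be seen on a tiny example. The sketch below uses hypothetical toy data (hours studied versus passing an exam) that is not part of the original answer; it only illustrates that a fitted line can leave the 0 to 1 range while Logistic Regression always returns a probability.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical toy data: hours studied -> passed the exam (0 or 1)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

linear = LinearRegression().fit(X, y)
logistic = LogisticRegression().fit(X, y)

# Points below, inside, and above the range seen during training
X_new = np.array([[0], [4.5], [12]])

linear_out = linear.predict(X_new)                   # unbounded real values
logistic_out = logistic.predict_proba(X_new)[:, 1]   # probability of class 1

print("Linear Regression output:  ", linear_out)
print("Logistic Regression output:", logistic_out)
```

For inputs far from the training range, the linear model returns values below 0 or above 1, which cannot be read as probabilities.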

## Question 2: Explain the role of the Sigmoid function in Logistic Regression.

Answer: The Sigmoid function plays a very important role in Logistic Regression. It helps to convert the output of a linear equation (which can be any number, positive or negative) into a value between 0 and 1.

This value represents the probability of a data point belonging to a particular class.
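
The sigmoid is defined as sigmoid(z) = 1 / (1 + e^(-z)), where z is the output of the linear equation. A minimal sketch with NumPy, using made-up scores for z, shows the mapping and the usual 0.5 decision threshold:

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical linear scores z = w.x + b for five data points
z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
probabilities = sigmoid(z)

# Predict class 1 when the probability is at least 0.5 (that is, when z >= 0)
predicted_classes = (probabilities >= 0.5).astype(int)

print("Probabilities:", probabilities)
print("Predicted classes:", predicted_classes)
```

Large negative scores map close to 0, large positive scores map close to 1, and a score of exactly 0 maps to 0.5.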

## Question 3: What is Regularization in Logistic Regression and why is it needed?

Answer:  Regularization in Logistic Regression is a method used to reduce overfitting — when a model learns the training data too well but performs poorly on new data.

It works by adding a penalty term to the cost (loss) function, which keeps the model’s weights small and prevents it from becoming too complex.

It is needed because it helps the model stay simple and stable, and generalize better to unseen data, by keeping the weight values from growing large.
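
The effect of the penalty on the weights can be checked directly. The sketch below is an illustration on the Iris dataset, not part of the original answer. It relies on scikit-learn's `C` parameter, which is the inverse of the regularization strength, and the two values of `C` are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C is the inverse of the regularization strength:
# a small C means a strong penalty, a large C means a weak one.
strong = LogisticRegression(C=0.01, max_iter=5000).fit(X, y)
weak = LogisticRegression(C=100.0, max_iter=5000).fit(X, y)

# Overall size of the learned weights under each setting
strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)

print("Weight norm with strong penalty (C=0.01):", strong_norm)
print("Weight norm with weak penalty (C=100):  ", weak_norm)
```

The stronger penalty should give a noticeably smaller weight norm, which is the "keeps the weights small" behaviour described above.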

## Question 4: What are some common evaluation metrics for classification models, and why are they important?


Answer: In classification models like Logistic Regression, evaluation metrics help us understand how well the model is performing. Some common metrics are:

* Accuracy:

  * Tells how many predictions were correct.

  * Useful when classes are balanced (equal number of 0s and 1s).

* Precision:

  * Shows how many predicted positives were actually positive.

  * Important when false positives are costly (like predicting a person has a disease when they don’t).

* Recall:

  * Shows how many actual positives were correctly identified.

  * Important when missing positives is risky (like missing a cancer case).

* F1 Score:

  * The harmonic mean of Precision and Recall.

  * Useful when classes are imbalanced.

* ROC-AUC Score:

  * Measures how well the model separates classes (0 and 1).

  * Higher AUC means better classification.
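
As a rough illustration, the metrics above can be computed with `sklearn.metrics` on a small hand-made example. The labels and scores below are invented for this sketch, and the 0.5 threshold is an assumption.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical true labels and predicted probabilities for 8 samples
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]

# Turn probabilities into class predictions with a 0.5 threshold
y_pred = [1 if s >= 0.5 else 0 for s in y_score]

accuracy = accuracy_score(y_true, y_pred)    # 6 of 8 correct
precision = precision_score(y_true, y_pred)  # 2 of 3 predicted positives are real
recall = recall_score(y_true, y_pred)        # 2 of 3 real positives are found
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
auc = roc_auc_score(y_true, y_score)         # uses the scores, not the labels

print("Accuracy :", accuracy)
print("Precision:", precision)
print("Recall   :", recall)
print("F1 Score :", f1)
print("ROC-AUC  :", auc)
```

Note that ROC-AUC is computed from the predicted probabilities, while the other four metrics are computed from the thresholded class predictions.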

In [1]:
# Question 5: Write a Python program that loads a CSV file into a Pandas DataFrame, splits into train/test sets,
# trains a Logistic Regression model, and prints its accuracy. (Use Dataset from sklearn package)
# (Include your Python code and output in the code box below.)

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load dataset (Iris dataset from sklearn)
iris = load_iris()
X = iris.data
y = iris.target

# Convert to DataFrame (optional)
df = pd.DataFrame(X, columns=iris.feature_names)
df['target'] = y

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train Logistic Regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Calculate and print accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)



Model Accuracy: 1.0
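
The cell above builds the DataFrame straight from the sklearn arrays. If an actual CSV file is wanted, one possible variant is sketched below: it writes the Iris data to a temporary CSV and reads it back with `pd.read_csv`. The file name and temporary directory are assumptions made for this illustration only.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Write the Iris data to a temporary CSV file (hypothetical file name)
iris_frame = load_iris(as_frame=True).frame
csv_path = os.path.join(tempfile.mkdtemp(), "iris.csv")
iris_frame.to_csv(csv_path, index=False)

# Load the CSV into a DataFrame and separate features from the target column
df = pd.read_csv(csv_path)
X = df.drop(columns="target")
y = df["target"]

# Same split and model settings as in the cell above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

csv_accuracy = accuracy_score(y_test, model.predict(X_test))
print("Model Accuracy:", csv_accuracy)
```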


In [4]:
# Question 6: Write a Python program to train a Logistic Regression model using L2 regularization (Ridge)
# and print the model coefficients and accuracy. (Use Dataset from sklearn package)
# (Include your Python code and output in the code box below.)

# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Logistic Regression model with L2 regularization (default)
model = LogisticRegression(penalty='l2', max_iter=200)
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Print model coefficients and accuracy
print("Model Coefficients:\n", model.coef_)
print("\nModel Accuracy:", accuracy_score(y_test, y_pred))



Model Coefficients:
 [[-0.39345607  0.96251768 -2.37512436 -0.99874594]
 [ 0.50843279 -0.25482714 -0.21301129 -0.77574766]
 [-0.11497673 -0.70769055  2.58813565  1.7744936 ]]

Model Accuracy: 1.0


In [5]:
# Question 7: Write a Python program to train a Logistic Regression model for multiclass classification
# using multi_class='ovr' and print the classification report. (Use Dataset from sklearn package)
# (Include your Python code and output in the code box below.)


# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

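# Create Logistic Regression model for multiclass classification (One-vs-Rest)
# Note: the multi_class parameter is deprecated in recent scikit-learn releases (1.5+);
# it is used here because the question asks for it.
model = LogisticRegression(multi_class='ovr', max_iter=200)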
model.fit(X_train, y_train)

# Predict on test data
y_pred = model.predict(X_test)

# Print the classification report
print("Classification Report:\n")
print(classification_report(y_test, y_pred))



Classification Report:

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      0.89      0.94         9
           2       0.92      1.00      0.96        11

    accuracy                           0.97        30
   macro avg       0.97      0.96      0.97        30
weighted avg       0.97      0.97      0.97        30
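
Because recent scikit-learn releases deprecate the `multi_class` argument, the same One-vs-Rest strategy can also be expressed with `OneVsRestClassifier`. The sketch below is an alternative formulation, not the code that produced the report above, so its numbers may differ slightly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Wrap a binary Logistic Regression so that one classifier is trained per class
ovr_model = OneVsRestClassifier(LogisticRegression(max_iter=200))
ovr_model.fit(X_train, y_train)

y_pred = ovr_model.predict(X_test)
ovr_accuracy = accuracy_score(y_test, y_pred)

print("Number of binary classifiers:", len(ovr_model.estimators_))
print(classification_report(y_test, y_pred))
```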





In [6]:
# Question 8: Write a Python program to apply GridSearchCV to tune C and penalty hyperparameters for Logistic Regression
# and print the best parameters and validation accuracy. (Use Dataset from sklearn package)
# (Include your Python code and output in the code box below.)



# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create Logistic Regression model
model = LogisticRegression(max_iter=200)

# Define the parameter grid for tuning
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

# Create GridSearchCV
grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X_train, y_train)

# Predict on test data using best model
y_pred = grid.best_estimator_.predict(X_test)

# Print best parameters and accuracy on the held-out test set
# (the cross-validated accuracy from the search is available as grid.best_score_)
print("Best Parameters:", grid.best_params_)
print("Test Accuracy:", accuracy_score(y_test, y_pred))



Best Parameters: {'C': 10, 'penalty': 'l1', 'solver': 'liblinear'}
Test Accuracy: 1.0
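
The accuracy printed above is measured on the held-out test set. The validation accuracy that GridSearchCV uses to pick the best parameters is stored in `best_score_`. The sketch below shows one way to report both. It assumes the `saga` solver in place of `liblinear` (it also supports both penalties) and a larger `max_iter`, so the selected parameters may not match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Assumption: saga replaces liblinear; it handles both L1 and L2 penalties
param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],
    "penalty": ["l1", "l2"],
}
grid = GridSearchCV(LogisticRegression(solver="saga", max_iter=5000), param_grid, cv=5)
grid.fit(X_train, y_train)

# Mean accuracy over the 5 validation folds for the best parameter setting
cv_accuracy = grid.best_score_

# Accuracy of the refitted best model on data the search never saw
test_accuracy = grid.score(X_test, y_test)

print("Best Parameters:", grid.best_params_)
print("Cross-Validated Accuracy:", cv_accuracy)
print("Test Accuracy:", test_accuracy)
```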


In [7]:
# Question 9: Write a Python program to standardize the features before training Logistic Regression and
# compare the model's accuracy with and without scaling. (Use Dataset from sklearn package)
# (Include your Python code and output in the code box below.)


# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Model without Scaling ---
model1 = LogisticRegression(max_iter=200)
model1.fit(X_train, y_train)
y_pred1 = model1.predict(X_test)
accuracy_without_scaling = accuracy_score(y_test, y_pred1)

# --- Model with Standardization ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model2 = LogisticRegression(max_iter=200)
model2.fit(X_train_scaled, y_train)
y_pred2 = model2.predict(X_test_scaled)
accuracy_with_scaling = accuracy_score(y_test, y_pred2)

# Print both accuracies
print("Accuracy without Scaling:", accuracy_without_scaling)
print("Accuracy with Scaling:", accuracy_with_scaling)


Accuracy without Scaling: 1.0
Accuracy with Scaling: 1.0
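
With only 30 test samples, both models reach the same score, so a single split says little about the effect of scaling. A steadier comparison, sketched below as one possible approach, uses 5-fold cross-validation and puts the scaler in a pipeline so that it is refit on the training part of each fold. The choice of 5 folds is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Model on raw features versus model with standardization inside a pipeline
unscaled_model = LogisticRegression(max_iter=200)
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))

# One accuracy value per fold; the pipeline fits the scaler on each training fold only
unscaled_scores = cross_val_score(unscaled_model, X, y, cv=5)
scaled_scores = cross_val_score(scaled_model, X, y, cv=5)

print("Mean CV accuracy without scaling:", unscaled_scores.mean())
print("Mean CV accuracy with scaling:   ", scaled_scores.mean())
```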


## Question 10: Imagine you are working at an e-commerce company that wants to predict which customers will respond to a marketing campaign. Given an imbalanced dataset (only 5% of customers respond), describe the approach you’d take to build a Logistic Regression model — including data handling, feature scaling, balancing classes, hyperparameter tuning, and evaluating the model for this real-world business use case.


Answer:

1. Understand the Data

* Identify features (e.g., age, purchase history, browsing time) and the target (responded: 0/1).

* Check for missing values and handle them (impute or remove).

* Explore data distribution, especially the imbalance: only 5% respond.

2. Feature Scaling

* Use StandardScaler or MinMaxScaler to scale numeric features.

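* Scaling matters because the regularization penalty acts on the feature weights: without it, features measured on larger scales are penalized unevenly, and the solver also tends to converge more slowly.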

3. Handle Class Imbalance

* Since only 5% respond, the dataset is heavily imbalanced.

* Techniques to handle this:

  * Resampling: oversample the minority class (e.g., SMOTE) or undersample the majority class.

  * Class weights: in Logistic Regression, set class_weight='balanced' to give more importance to the minority class.

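4. Split Data

* Split into training and test sets (e.g., 80/20).

* Use a stratified split to preserve the 5% response rate in both sets.

* Fit the scaler and apply any resampling on the training set only, so that no information from the test set leaks into training.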

5. Train Logistic Regression

* Use L2 regularization (default) to avoid overfitting.

* Include class weights to handle imbalance.

* Fit the model on scaled training data.

6. Hyperparameter Tuning

* Use GridSearchCV or RandomizedSearchCV to tune:

  * C (regularization strength)

  * penalty (L1 or L2)

  * solver

* Use stratified cross-validation with a scoring metric suited to imbalance (e.g., F1 or ROC-AUC) to avoid overfitting and select the best model.

7. Model Evaluation

* Accuracy is not enough for imbalanced data: a model that predicts "no response" for every customer would already be about 95% accurate. Use metrics like:

  * Precision: How many predicted responders actually responded.

  * Recall (Sensitivity): How many actual responders were correctly predicted.

  * F1-score: Balance between precision and recall.

  * ROC-AUC: Overall ability to distinguish responders vs non-responders.

* Use a confusion matrix to visualize correct and incorrect predictions.

8. Real-world Considerations

* High recall might be important to capture as many potential customers as possible.

* High precision reduces wasted marketing cost by avoiding false positives.

* Monitor the model on new campaigns and retrain periodically with updated data.
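
The steps above can be sketched end to end. Since no campaign data is given, the example uses a synthetic dataset from `make_classification` with about 5% positives as a stand-in. The feature counts, the grid of `C` values, and the ROC-AUC scoring choice are all assumptions, and only `C` is tuned to keep the sketch short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: roughly 5% responders (class 1)
X, y = make_classification(n_samples=5000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)

# Stratified split keeps the response rate similar in train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scaling lives inside the pipeline, so it is fit on training folds only;
# class_weight='balanced' gives the rare responders more weight in the loss
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Tune the regularization strength with stratified cross-validation,
# scored by ROC-AUC rather than accuracy
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X_train, y_train)

# Evaluate the best model on the untouched test set
y_pred = search.predict(X_test)
y_prob = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, y_prob)

print("Best Parameters:", search.best_params_)
print("Test ROC-AUC:", test_auc)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

On real campaign data, the decision threshold applied to `y_prob` could then be moved away from 0.5 to trade precision against recall, depending on the cost of contacting a non-responder versus missing a responder.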
