<a href="https://colab.research.google.com/github/kanika0216/python-Basics/blob/main/Logistic_Regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Theoretical**

### 1. What is Logistic Regression, and how does it differ from Linear Regression?  
Logistic Regression is a classification algorithm: it passes a linear combination of the features through the Sigmoid function to predict the probability of a class. Linear Regression uses the same kind of linear combination to predict a continuous value directly.  

### 2. What is the mathematical equation of Logistic Regression?  
The equation is:  
\[
P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \dots + \beta_nX_n)}}
\]  

### 3. Why do we use the Sigmoid function in Logistic Regression?  
The Sigmoid function maps any real number into a probability range (0,1), making it suitable for binary classification.  
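
The mapping from a linear score to a probability can be sketched in a few lines of NumPy. The coefficient values and the observation below are made up for illustration; they do not come from a fitted model.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: intercept beta_0 and weights for two features
beta_0 = -1.0
beta = np.array([2.0, -0.5])

# One illustrative observation (X_1, X_2)
x = np.array([1.5, 2.0])

z = beta_0 + x @ beta   # linear score, i.e. the log-odds
p = sigmoid(z)          # P(Y=1 | X=x)

print(f"log-odds = {z:.2f}, P(Y=1|X) = {p:.4f}")
```

Here the linear score is 1.0, so the predicted probability is about 0.73, and a 0.5 threshold would assign class 1.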

### 4. What is the cost function of Logistic Regression?  
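Logistic Regression uses Log Loss (Cross-Entropy Loss), averaged over the \(m\) training examples:  
\[
J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]
\]  
where \(\hat{y}^{(i)} = P(Y=1|X^{(i)})\) is the predicted probability for example \(i\).  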
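
As a quick check of the formula, the sketch below computes Log Loss by hand and compares it with `sklearn.metrics.log_loss`. The labels and predicted probabilities are invented for illustration.

```python
import numpy as np
from sklearn.metrics import log_loss

# Illustrative true labels and predicted probabilities of class 1
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1])

# Direct translation of the cross-entropy formula
manual_loss = -np.mean(
    y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
)

# scikit-learn's implementation for comparison
library_loss = log_loss(y_true, y_prob)

print(f"Manual log loss:  {manual_loss:.4f}")
print(f"sklearn log loss: {library_loss:.4f}")
```

The fourth example (true label 1, predicted 0.4) contributes the most to the loss, since confident or wrong-side predictions are penalized more heavily.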

### 5. What is Regularization in Logistic Regression? Why is it needed?  
Regularization adds a penalty on coefficient size (L1 or L2) to the cost function. This discourages large coefficients, which reduces overfitting and improves generalization.  

### 6. Explain the difference between Lasso, Ridge, and Elastic Net regression.  
- **Lasso (L1)**: Shrinks some coefficients to zero (feature selection).  
- **Ridge (L2)**: Shrinks coefficients but does not set them to zero.  
- **Elastic Net**: Combines both L1 and L2 penalties.  
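
The sparsity difference between L1 and L2 can be seen by counting coefficients that are exactly zero. This is a minimal sketch on synthetic data; the dataset shape and `C=0.05` are arbitrary choices for illustration, and Elastic Net is left out to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 3 informative features and 17 pure-noise features
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=3, n_redundant=0, random_state=0
)
X = StandardScaler().fit_transform(X)

# Same (illustrative) regularization strength for both penalties
l1_model = LogisticRegression(penalty='l1', solver='liblinear', C=0.05).fit(X, y)
l2_model = LogisticRegression(penalty='l2', solver='lbfgs', C=0.05).fit(X, y)

# Count coefficients that were driven exactly to zero
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))

print(f"Zero coefficients with L1: {n_zero_l1} of {l1_model.coef_.size}")
print(f"Zero coefficients with L2: {n_zero_l2} of {l2_model.coef_.size}")
```

With a penalty this strong, L1 typically zeroes out most of the noise features, while L2 only shrinks them toward zero.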

### 7. When should we use Elastic Net instead of Lasso or Ridge?  
Elastic Net is preferred when features are highly correlated, as it balances feature selection (Lasso) and coefficient shrinkage (Ridge).  

### 8. What is the impact of the regularization parameter (λ) in Logistic Regression?  
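A higher λ strengthens regularization, which reduces overfitting but can cause underfitting; a lower λ weakens regularization and raises the risk of overfitting. In scikit-learn the parameter is `C`, the inverse of λ, so a smaller `C` means stronger regularization.  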
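
The effect of the regularization strength can be observed through the size of the fitted coefficients. The sketch below uses scikit-learn's `C` (the inverse of λ) on the breast cancer dataset; the three `C` values are an arbitrary illustrative grid.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

C_values = [0.01, 1, 100]   # small C = strong regularization (large lambda)
coef_norms = []

for C in C_values:
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    # L2 norm of the coefficient vector as a summary of coefficient size
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"C={C}: coefficient norm = {coef_norms[-1]:.3f}")
```

The coefficient norm grows as `C` increases, which is the shrinkage effect of regularization seen in reverse.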

### 9. What are the key assumptions of Logistic Regression?  
- The dependent variable is binary (for standard binary Logistic Regression).  
- Observations are independent of each other.  
- Little or no multicollinearity among independent variables.  
- The relationship between predictors and log-odds is linear.  
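
One rough way to screen for multicollinearity is to inspect pairwise correlations between features. The sketch below uses scikit-learn's breast cancer dataset; the 0.95 cutoff is an arbitrary choice for illustration, and pairwise correlation will not catch every form of multicollinearity.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, names = data.data, data.feature_names

# Feature-by-feature Pearson correlation matrix
corr = np.corrcoef(X, rowvar=False)

threshold = 0.95  # illustrative cutoff, not a standard value

# Look only above the diagonal so each pair is reported once
rows, cols = np.where(np.triu(np.abs(corr), k=1) > threshold)
high_pairs = [
    (str(names[i]), str(names[j]), float(corr[i, j])) for i, j in zip(rows, cols)
]

for a, b, r in high_pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Pairs such as mean radius and mean perimeter are flagged, which suggests dropping one feature of each pair or relying on regularization.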

### 10. What are some alternatives to Logistic Regression for classification tasks?  
Alternatives include Decision Trees, Random Forest, SVM, Naïve Bayes, and Neural Networks.  

### 11. What are Classification Evaluation Metrics?  
Common metrics include Accuracy, Precision, Recall, F1-score, ROC-AUC, and Log Loss.  

### 12. How does class imbalance affect Logistic Regression?  
Class imbalance biases predictions toward the majority class, so accuracy can look high while recall on the minority class is poor. Solutions include resampling and class weighting.  
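
The effect of class weighting can be sketched by comparing minority-class recall with and without `class_weight='balanced'`. The synthetic 90/10 dataset below is an assumption made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 90% class 0 and 10% class 1
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Same model, with and without class weighting
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
weighted = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X_train, y_train)

# Recall on the minority class (label 1)
recall_plain = recall_score(y_test, plain.predict(X_test))
recall_weighted = recall_score(y_test, weighted.predict(X_test))

print(f"Minority recall without class weights: {recall_plain:.4f}")
print(f"Minority recall with class weights:    {recall_weighted:.4f}")
```

Weighting usually raises minority-class recall at the cost of some precision, so both metrics are worth checking.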

### 13. What is Hyperparameter Tuning in Logistic Regression?  
It involves optimizing parameters like the regularization strength (C or λ) and solver choice to improve model performance.  

### 14. What are different solvers in Logistic Regression? Which one should be used?  
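Solvers include **liblinear** (small datasets; supports L1 and L2 penalties), **lbfgs** and **newton-cg** (L2 penalty; handle multiclass directly), and **saga** (large datasets; supports L1, L2, and Elastic Net penalties). **lbfgs** is scikit-learn's default and a reasonable starting point; switch to **saga** or **liblinear** when an L1 penalty is needed.  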

### 15. How is Logistic Regression extended for multiclass classification?  
It uses **One-vs-Rest (OvR)** or **Softmax Regression** to handle multiple classes.  

### 16. What are the advantages and disadvantages of Logistic Regression?  
**Advantages**: Simple, interpretable, works well with small datasets.  
**Disadvantages**: Assumes linearity in log-odds, struggles with complex relationships.  

### 17. What are some use cases of Logistic Regression?  
It is used in spam detection, medical diagnosis, credit scoring, and customer churn prediction.  

### 18. What is the difference between Softmax Regression and Logistic Regression?  
Logistic Regression is for binary classification, while Softmax Regression extends it to multiclass classification.  
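
The link between the two can be checked numerically: with two classes, Softmax reduces to the Sigmoid of the difference between the class scores. The scores below are arbitrary values chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Turn a vector of class scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtract the max for numerical stability
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Two-class case: illustrative scores for class 0 and class 1
scores = np.array([2.0, 0.5])
p_softmax = softmax(scores)
p_sigmoid = sigmoid(scores[0] - scores[1])

print(f"Softmax P(class 0): {p_softmax[0]:.6f}")
print(f"Sigmoid of score difference: {p_sigmoid:.6f}")

# Three-class case: probabilities still sum to 1
p_three = softmax(np.array([2.0, 0.5, -1.0]))
print(f"Three-class probabilities: {np.round(p_three, 4)}, sum = {p_three.sum():.1f}")
```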

### 19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?  
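OvR trains one independent binary classifier per class, which is simple and works with any binary solver. Softmax fits a single model whose class probabilities sum to 1, and is usually the better fit when classes are mutually exclusive.  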

### 20. How do we interpret coefficients in Logistic Regression?  
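Each coefficient is the change in the log-odds of the positive class for a one-unit increase in its predictor, holding the other predictors fixed; exponentiating a coefficient gives the odds ratio for that predictor.  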
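
A minimal sketch of this interpretation, assuming scikit-learn's breast cancer dataset with standardized features (so "one unit" means one standard deviation):

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
y = data.target

model = LogisticRegression(max_iter=1000).fit(X, y)

coefs = model.coef_[0]        # log-odds change per unit of each feature
odds_ratios = np.exp(coefs)   # multiplicative change in the odds

# Check the interpretation on one sample: raising feature j by one unit
# shifts the predicted log-odds by exactly coefs[j]
j = 0
x = X[:1].copy()
x_plus = x.copy()
x_plus[0, j] += 1.0
log_odds_shift = model.decision_function(x_plus)[0] - model.decision_function(x)[0]

print(f"Feature: {data.feature_names[j]}")
print(f"Coefficient: {coefs[j]:.4f}, odds ratio: {odds_ratios[j]:.4f}")
print(f"Observed log-odds shift: {log_odds_shift:.4f}")
```

An odds ratio below 1 means the feature lowers the odds of the positive class as it increases; above 1 means it raises them.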

**Practical**

1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic Regression, and prints the model accuracy.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")


2. Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1') and print the model accuracy.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression with L1 regularization
model = LogisticRegression(penalty='l1', solver='liblinear')
model.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"L1 Regularization Model Accuracy: {accuracy:.4f}")


3. Write a Python program to train Logistic Regression with L2 regularization (Ridge) using LogisticRegression(penalty='l2'). Print model accuracy and coefficients.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression with L2 regularization
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"L2 Regularization Model Accuracy: {accuracy:.4f}")
print("Model Coefficients:", model.coef_)


4. Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet').

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression with Elastic Net regularization
model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=200)
model.fit(X_train, y_train)

print("Elastic Net Regularization Model Trained Successfully")


5. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression with One-vs-Rest (OvR) classification
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Multiclass OvR Model Accuracy: {accuracy:.4f}")


6. Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic Regression. Print the best parameters and accuracy.

In [None]:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

# Perform GridSearchCV
grid_search = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Print best parameters and accuracy
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Accuracy: {grid_search.best_score_:.4f}")


7. Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the average accuracy.

In [2]:
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Define Stratified K-Fold Cross-Validation
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=200)

# Perform cross-validation
scores = cross_val_score(model, X, y, cv=skf)

print(f"Average Accuracy from Stratified K-Fold CV: {scores.mean():.4f}")


Average Accuracy from Stratified K-Fold CV: 0.9667


8. Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset from CSV file
df = pd.read_csv('dataset.csv')  # Replace 'dataset.csv' with the actual file path

# Assume the last column is the target variable
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Apply Logistic Regression
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"CSV Dataset Model Accuracy: {accuracy:.4f}")


FileNotFoundError: [Errno 2] No such file or directory: 'dataset.csv'

9. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in
Logistic Regression. Print the best parameters and accuracy.

In [3]:
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_dist = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

random_search = RandomizedSearchCV(LogisticRegression(max_iter=200), param_dist, cv=5, n_iter=10, random_state=42)
random_search.fit(X_train, y_train)

print(f"Best Parameters: {random_search.best_params_}")
print(f"Best Accuracy: {random_search.best_score_:.4f}")




Best Parameters: {'solver': 'saga', 'penalty': 'l2', 'C': 1}
Best Accuracy: 0.9750




10. Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy.

In [None]:
from sklearn.multiclass import OneVsOneClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = OneVsOneClassifier(LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"One-vs-One Multiclass Model Accuracy: {accuracy:.4f}")


11. Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary
classification.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()


12. Write a Python program to train a Logistic Regression model and evaluate its performance using Precision,
Recall, and F1-Score.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


13. Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to
improve model performance.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(class_weight='balanced', max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")


14. Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and
evaluate performance.

In [None]:
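import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assumes a Kaggle-style Titanic CSV with 'Survived', 'Age', and 'Embarked' columns
df = pd.read_csv('titanic.csv')

# Drop identifier-like text columns if present (assumed Kaggle column names);
# one-hot encoding them would create one column per unique value
df = df.drop(columns=['PassengerId', 'Name', 'Ticket', 'Cabin'], errors='ignore')

# Fill missing values by assignment; recent pandas versions warn about
# inplace fillna on a selected column
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# One-hot encode the remaining categorical columns
df = pd.get_dummies(df, drop_first=True)

X = df.drop(columns=['Survived'])
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Features are unscaled, so allow extra iterations for convergence
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")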


15. Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression
model. Evaluate its accuracy and compare results with and without scaling.

In [None]:
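from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Baseline: train on the raw features
model_without_scaling = LogisticRegression(max_iter=200)
model_without_scaling.fit(X_train, y_train)
accuracy_without_scaling = accuracy_score(y_test, model_without_scaling.predict(X_test))

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train on the standardized features
model_with_scaling = LogisticRegression(max_iter=200)
model_with_scaling.fit(X_train_scaled, y_train)
accuracy_with_scaling = accuracy_score(y_test, model_with_scaling.predict(X_test_scaled))

print(f"Accuracy without Scaling: {accuracy_without_scaling:.4f}")
print(f"Accuracy with Scaling: {accuracy_with_scaling:.4f}")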


16. Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred_prob = model.predict_proba(X_test)[:, 1]
roc_auc = roc_auc_score(y_test, y_pred_prob)

print(f"ROC-AUC Score: {roc_auc:.4f}")


17. Write a Python program to train Logistic Regression using a custom regularization setting (C=0.5) and evaluate
accuracy. Note that C is the inverse of the regularization strength, not a learning rate.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(C=0.5, max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy with C=0.5: {accuracy:.4f}")


18. Write a Python program to train Logistic Regression and identify important features based on model
coefficients.

In [None]:
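import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target
feature_names = data.feature_names

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize so coefficient magnitudes are comparable across features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

# Iris has three classes, so coef_ has one row per class; average the absolute
# coefficients across classes to get a single score per feature
importance = np.abs(model.coef_).mean(axis=0)
feature_importance = pd.DataFrame({'Feature': feature_names, 'Importance': importance})
feature_importance = feature_importance.sort_values(by='Importance', ascending=False)

print(feature_importance)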


19. Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa
Score.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
kappa = cohen_kappa_score(y_test, y_pred)

print(f"Cohen's Kappa Score: {kappa:.4f}")


20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary
classification.


In [None]:
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred_prob = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, y_pred_prob)

plt.plot(recall, precision, marker='.')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()


21. Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare
their accuracy.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

solvers = ['liblinear', 'saga', 'lbfgs']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=200)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy with solver {solver}: {accuracy:.4f}")


22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews
Correlation Coefficient (MCC).

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import matthews_corrcoef
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mcc = matthews_corrcoef(y_test, y_pred)

print(f"Matthews Correlation Coefficient: {mcc:.4f}")


23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their
accuracy to see the impact of feature scaling.

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model_raw = LogisticRegression(max_iter=200)
model_raw.fit(X_train, y_train)
accuracy_raw = accuracy_score(y_test, model_raw.predict(X_test))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_scaled = LogisticRegression(max_iter=200)
model_scaled.fit(X_train_scaled, y_train)
accuracy_scaled = accuracy_score(y_test, model_scaled.predict(X_test_scaled))

print(f"Accuracy without Scaling: {accuracy_raw:.4f}")
print(f"Accuracy with Scaling: {accuracy_scaled:.4f}")


24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using
cross-validation.

In [2]:
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

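C_values = [0.01, 0.1, 1, 10, 100]
results = {}

# Record the mean 5-fold cross-validated accuracy for each candidate C
for C in C_values:
    model = LogisticRegression(C=C, max_iter=200)
    scores = cross_val_score(model, X, y, cv=5)
    results[C] = scores.mean()
    print(f"C={C}, Average Accuracy: {results[C]:.4f}")

# The optimal C is the one with the highest mean accuracy
best_C = max(results, key=results.get)
print(f"Optimal C: {best_C} (Average Accuracy: {results[best_C]:.4f})")


C=0.01, Average Accuracy: 0.8600
C=0.1, Average Accuracy: 0.9467
C=1, Average Accuracy: 0.9733
C=10, Average Accuracy: 0.9733
C=100, Average Accuracy: 0.9800
Optimal C: 100 (Average Accuracy: 0.9800)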


25. Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to
make predictions.

In [1]:
import joblib
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

joblib.dump(model, 'logistic_model.pkl')
loaded_model = joblib.load('logistic_model.pkl')

accuracy = loaded_model.score(X_test, y_test)
print(f"Loaded Model Accuracy: {accuracy:.4f}")


Loaded Model Accuracy: 1.0000
