**Theoretical**

In [None]:
### Q.1) What is Logistic Regression, and how does it differ from Linear Regression?

In [None]:
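ans) Logistic Regression is a statistical method used for binary classification. Unlike Linear Regression, which predicts a continuous value directly, Logistic Regression models the probability of the positive class by passing a linear combination of the inputs through the sigmoid function; thresholding that probability (typically at 0.5) gives the class label. It is fitted by minimizing log loss rather than squared error.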



In [None]:
### Q.2) What is the mathematical equation of Logistic Regression?

In [None]:
ans) Equation of Logistic Regression:
The logistic regression equation consists of two main parts:

1) The linear predictor (z):
z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
where:

β₀ is the intercept
β₁ to βₙ are the coefficients
x₁ to xₙ are the input features


2) The logistic (sigmoid) function:
P(y=1|x) = 1 / (1 + e⁻ᶻ)
where:

P(y=1|x) is the probability of the positive class given inputs x
e is Euler's number (approximately 2.71828)
z is the linear predictor from part 1



Combining these, the complete equation is:
P(y=1|x) = 1 / (1 + e^(-(β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ)))
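To make the two parts concrete, here is a minimal sketch that recomputes P(y=1|x) by hand from a fitted scikit-learn model's `intercept_` (β₀) and `coef_` (β₁ to βₙ) and compares it with `predict_proba`. It assumes scikit-learn's bundled breast cancer dataset (`load_breast_cancer`) as a stand-in binary problem, standardized so the default lbfgs solver converges.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Stand-in binary dataset; standardize so lbfgs converges quickly
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Part 1: linear predictor z = b0 + b1*x1 + ... + bn*xn for every sample
z = model.intercept_[0] + X @ model.coef_[0]

# Part 2: the sigmoid turns z into P(y=1|x)
p_manual = 1 / (1 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1
p_sklearn = model.predict_proba(X)[:, 1]

print("Manual and scikit-learn probabilities match:", np.allclose(p_manual, p_sklearn))
```

For a binary target, scikit-learn applies the sigmoid to the same linear predictor, so the two arrays should agree up to floating-point error.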

In [None]:
### Q.3) Why do we use the Sigmoid function in Logistic Regression?

In [None]:
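ans) The sigmoid function maps any real-valued input to the open interval (0, 1), so its output can be read as a probability for binary classification. It is also smooth and differentiable, which makes gradient-based minimization of the log loss straightforward, and its inverse is the log-odds (logit), so the model stays linear in the log-odds.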
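A minimal NumPy sketch of the sigmoid, showing the properties above: outputs stay strictly between 0 and 1, σ(0) = 0.5, and σ(−z) = 1 − σ(z). For very large negative inputs `np.exp` can overflow with a warning; `scipy.special.expit` is a numerically stable alternative.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array of numbers) to the open interval (0, 1)."""
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
p = sigmoid(z)

print("sigmoid(z):", p)
print("sigmoid(0):", sigmoid(0.0))                              # midpoint of the curve
print("Symmetric:", np.allclose(sigmoid(-z), 1 - sigmoid(z)))   # sigma(-z) = 1 - sigma(z)
```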

In [None]:
### Q.4) What is the cost function of Logistic Regression?

In [None]:
ans) The cost function for logistic regression is called the "Binary Cross-Entropy" or "Log Loss" function. For a single training example, it is:
J(θ) = -[y log(h_θ(x)) + (1-y)log(1-h_θ(x))]
where:

J(θ) is the cost function
y is the actual label (0 or 1)
h_θ(x) is the predicted probability P(y=1|x)
log is the natural logarithm

For an entire training set of m examples, the cost function becomes:
J(θ) = -(1/m) ∑[y⁽ⁱ⁾log(h_θ(x⁽ⁱ⁾)) + (1-y⁽ⁱ⁾)log(1-h_θ(x⁽ⁱ⁾))]
where:

m is the number of training examples
i represents each training example
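The averaged cost can be computed directly from the formula. This sketch uses small made-up labels and predicted probabilities (illustrative only) and checks the result against `sklearn.metrics.log_loss`; the clipping step is an added safeguard so `log(0)` never occurs.

```python
import numpy as np
from sklearn.metrics import log_loss

def binary_cross_entropy(y, h):
    """Average log loss J(theta) over m examples; h holds P(y=1|x) per example."""
    h = np.clip(h, 1e-15, 1 - 1e-15)  # guard against log(0)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Illustrative labels and predicted probabilities (made up for the example)
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

manual = binary_cross_entropy(y_true, y_prob)
print("Manual log loss:", manual)
print("sklearn log_loss:", log_loss(y_true, y_prob))
```

Confident correct predictions (0.9 for a true 1) contribute little to the cost, while confident wrong ones would be penalized heavily because of the logarithm.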

In [None]:
### Q.5) What is Regularization in Logistic Regression? Why is it needed?

In [None]:
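ans) Regularization prevents overfitting by adding a penalty on the size of the coefficients to the loss function. Large coefficients let the model fit noise in the training data; penalizing them controls model complexity and improves generalization, which matters most when there are many features or few training examples.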
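As a sketch of what "adding a penalty" means, the regularized cost adds a term R(θ) to the log loss J(θ) from Q.4, weighted by λ ≥ 0. The sums start at j = 1, so the intercept θ₀ is left unpenalized here. Constant factors differ between textbooks and libraries (for example λ/2m, or scikit-learn's (1 − α)/2 on the L2 term), and α plays the role of scikit-learn's `l1_ratio`.

```latex
J_{\text{reg}}(\theta) = J(\theta) + \lambda \, R(\theta)

\text{L1 (Lasso):}\quad R(\theta) = \sum_{j=1}^{n} |\theta_j|

\text{L2 (Ridge):}\quad R(\theta) = \sum_{j=1}^{n} \theta_j^2

\text{Elastic Net:}\quad R(\theta) = \alpha \sum_{j=1}^{n} |\theta_j| + (1 - \alpha) \sum_{j=1}^{n} \theta_j^2, \qquad 0 \le \alpha \le 1
```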

In [None]:
### Q.6) Explain the difference between Lasso, Ridge, and Elastic Net regression.

In [None]:
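ans) 1) Lasso (L1 regularization): Penalizes the sum of the absolute values of the coefficients. It shrinks coefficients and can set some exactly to zero, which performs feature selection.
2) Ridge (L2 regularization): Penalizes the sum of the squared coefficients. It shrinks coefficients toward zero but does not eliminate any.
3) Elastic Net: Combines the L1 and L2 penalties, with a mixing parameter (`l1_ratio` in scikit-learn) controlling the balance between them.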
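The feature-selection difference is easy to observe by counting coefficients that are exactly zero. This sketch assumes a synthetic dataset from `make_classification` with 20 features of which only 3 are informative, and uses the same (fairly strong) regularization strength C=0.05 for both penalties.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 3 informative, the rest pure noise
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

# Same regularization strength, different penalty (liblinear supports both)
l1 = LogisticRegression(penalty='l1', solver='liblinear', C=0.05).fit(X, y)
l2 = LogisticRegression(penalty='l2', solver='liblinear', C=0.05).fit(X, y)

n_zero_l1 = int(np.sum(l1.coef_ == 0))
n_zero_l2 = int(np.sum(l2.coef_ == 0))
print("Zero coefficients with L1 (Lasso):", n_zero_l1)
print("Zero coefficients with L2 (Ridge):", n_zero_l2)
```

L1 drives many of the noise coefficients exactly to zero, while L2 only makes them small.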


In [None]:
### Q.7) When should we use Elastic Net instead of Lasso or Ridge?

In [None]:
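ans) Elastic Net is useful when there are many highly correlated features. Lasso tends to keep one feature from a correlated group and drop the others somewhat arbitrarily, while Ridge keeps all of them; Elastic Net combines Lasso's feature selection with Ridge's stable shrinkage of correlated groups.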



In [None]:
### Q.8) What is the impact of the regularization parameter (λ) in Logistic Regression?

In [None]:
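ans) The regularization parameter (λ) controls the strength of regularization. Higher values of λ shrink the coefficients more, which reduces overfitting but can cause underfitting if λ is too large; lower values let the model fit the training data more closely. In scikit-learn the parameter is C, the inverse of λ (C = 1/λ), so a smaller C means stronger regularization.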
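A quick way to see λ at work is to fit the same model for several values of C (= 1/λ) and watch the size of the coefficient vector. This sketch assumes the standardized breast cancer dataset bundled with scikit-learn and the default L2 penalty.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# scikit-learn uses C = 1/lambda: small C means strong regularization
C_values = [0.001, 0.01, 0.1, 1, 10]
norms = []
for C in C_values:
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"C={C:<6} (lambda={1 / C:g})  ||coef||_2 = {norms[-1]:.3f}")
```

As C grows (λ shrinks), the coefficient norm grows, i.e. the model is allowed to fit the training data more aggressively.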

In [None]:
### Q.9) What are the key assumptions of Logistic Regression?

In [None]:
ans) 1) The dependent variable is binary (for binary logistic regression).
2) Observations are independent of each other.
3) The independent variables have little multicollinearity.
4) The independent variables have a linear relationship with the log-odds of the outcome.


In [None]:
### Q.10) What are some alternatives to Logistic Regression for classification tasks?

In [None]:
ans) 1) Decision Trees
2) Support Vector Machines (SVM)
3) Random Forest
4) Neural Networks
5) Naive Bayes

In [None]:
### Q.11) What are Classification Evaluation Metrics?

In [None]:
ans) 1) Accuracy
2) Precision
3) Recall
4) F1-score
5) ROC-AUC
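Accuracy, precision, recall, and F1-score all derive from the four confusion-matrix counts (ROC-AUC instead needs probability scores). This sketch computes them by hand for made-up binary labels (illustrative only) and is meant to be checked against scikit-learn's metric functions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Illustrative labels and predictions (made up for the example)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")
```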

In [None]:
### Q.12) How does class imbalance affect Logistic Regression?

In [None]:
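ans) It can cause the model to favor the majority class: accuracy stays high while performance on the minority class (e.g. its recall) is poor. Techniques such as oversampling the minority class, undersampling the majority class, or weighting the loss function (`class_weight='balanced'` in scikit-learn) can help, along with evaluating with precision, recall, F1-score, or ROC-AUC instead of accuracy alone.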
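A weighted loss gives each minority-class example more influence. scikit-learn documents the `'balanced'` weights as `n_samples / (n_classes * np.bincount(y))`; this sketch checks that formula with `compute_class_weight` on a made-up 90/10 label vector (illustrative only).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative imbalanced labels: 90 negatives, 10 positives
y = np.array([0] * 90 + [1] * 10)

# 'balanced' weights are inversely proportional to class frequency
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
manual = len(y) / (2 * np.bincount(y))

print("Class weights (0, 1):", weights)
print("Manual formula:      ", manual)
```

Each positive example counts about nine times as much as a negative one in the loss, which offsets the 9:1 imbalance.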

In [None]:
### Q.13) What is Hyperparameter Tuning in Logistic Regression?

In [None]:
ans) Hyperparameter tuning involves searching over settings such as C (inverse of regularization strength), solver, and penalty (L1, L2, Elastic Net), typically with cross-validation (e.g. GridSearchCV or RandomizedSearchCV), to improve model performance.

In [None]:
### Q.14) What are different solvers in Logistic Regression? Which one should be used?

In [None]:
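ans) 1) liblinear: A good choice for small datasets; supports L1 and L2 regularization, but handles multiclass problems only with One-vs-Rest.
2) lbfgs: The default solver in scikit-learn; supports L2 (or no) regularization and the multinomial (softmax) loss.
3) newton-cg: Supports L2 (or no) regularization and the multinomial loss.
4) sag: Faster on large datasets; supports L2 (or no) regularization.
5) saga: Suited to large datasets and supports L1, L2, and Elastic Net regularization.
In practice, lbfgs is a sensible default, liblinear suits small datasets, and saga is the choice for large datasets or Elastic Net. sag and saga converge quickly only when features are on a similar scale, so standardize the data first.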
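Solver/penalty compatibility can be checked empirically: scikit-learn raises a `ValueError` at fit time for unsupported pairs. This sketch assumes the standardized breast cancer dataset as a binary stand-in and silences convergence warnings so only the supported/unsupported result is shown.

```python
import warnings
from sklearn.datasets import load_breast_cancer
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

combos = [('liblinear', 'l1'), ('liblinear', 'l2'), ('lbfgs', 'l2'),
          ('lbfgs', 'l1'), ('saga', 'elasticnet')]
supported = {}
for solver, penalty in combos:
    # l1_ratio is only used (and required) by the elastic-net penalty
    extra = {'l1_ratio': 0.5} if penalty == 'elasticnet' else {}
    model = LogisticRegression(solver=solver, penalty=penalty, max_iter=1000, **extra)
    try:
        with warnings.catch_warnings():
            warnings.simplefilter('ignore', ConvergenceWarning)
            model.fit(X, y)
        supported[(solver, penalty)] = True
    except ValueError:
        # Raised when the solver cannot optimize the requested penalty
        supported[(solver, penalty)] = False
    print(f"{solver:<10} {penalty:<11} supported: {supported[(solver, penalty)]}")
```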

In [None]:
### Q.15) How is Logistic Regression extended for multiclass classification?

In [None]:
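ans) 1) One-vs-Rest (OvR): Trains one binary classifier per class (that class vs. all others) and predicts the class whose classifier gives the highest score.
2) Softmax (multinomial) Regression: Fits a single model with one coefficient vector per class and uses the softmax function to assign probabilities across all classes.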
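The structural difference shows up in the fitted objects. This sketch assumes the iris dataset (3 classes, 4 features) and scikit-learn's default lbfgs solver, which fits the multinomial (softmax) model for multiclass targets; OvR is obtained by wrapping the estimator in `OneVsRestClassifier`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# One-vs-Rest: one independent binary logistic model per class
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print("OvR binary models:", len(ovr.estimators_))

# Softmax (multinomial): a single model with one coefficient row per class
softmax = LogisticRegression(max_iter=1000).fit(X, y)
print("Softmax coef_ shape:", softmax.coef_.shape)
print("Softmax probabilities sum to:", np.round(softmax.predict_proba(X[:3]).sum(axis=1), 6))
```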

In [None]:
### Q.16) What are the advantages and disadvantages of Logistic Regression?


In [None]:
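ans) Advantages:

Easy to implement and interpret (coefficients translate into odds ratios).
Outputs class probabilities, not just labels.
Fast to train, and works well when the classes are (approximately) linearly separable.

Disadvantages:

Assumes a linear decision boundary, so it struggles with non-linear relationships unless the features are transformed.
Sensitive to outliers and to strongly correlated features.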

In [None]:
### Q.17) What are some use cases of Logistic Regression?

In [None]:
ans) 1) Spam detection
2) Disease diagnosis
3) Customer churn prediction
4) Credit scoring

In [None]:
### Q.18) What is the difference between Softmax Regression and Logistic Regression?

In [None]:
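ans) 1) Logistic Regression is for binary classification, while Softmax Regression generalizes it to multiclass classification.
2) Logistic Regression outputs a single probability P(y=1|x) via the sigmoid; Softmax outputs one probability per class, and these probabilities sum to 1.
3) With only two classes, Softmax Regression reduces to Logistic Regression.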
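A small NumPy sketch of the softmax function, showing that its outputs sum to 1 and that with two classes and scores [z, 0] it reduces exactly to the sigmoid of z. Subtracting the maximum score before exponentiating is a standard stability trick and does not change the result.

```python
import numpy as np

def softmax(z):
    """Turn a vector of class scores into probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print("Softmax probabilities:", probs, "sum:", probs.sum())

# Two classes with scores [z, 0]: softmax gives e^z / (e^z + 1) = sigmoid(z)
z = 1.5
print("softmax([z, 0])[0] =", softmax(np.array([z, 0.0]))[0], " sigmoid(z) =", sigmoid(z))
```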

In [None]:
### Q.19) How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?

In [None]:
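ans) 1) OvR: Simpler to train; each binary classifier is fitted independently (so they can be trained in parallel), and it works with solvers that only handle binary problems, such as liblinear. Because its per-class probabilities come from separate models, they must be renormalized to sum to 1.
2) Softmax: Preferred when you want a single, jointly trained model whose class probabilities form a proper distribution; it is the default for multiclass data with the lbfgs solver in scikit-learn.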

In [None]:
### Q.20) How do we interpret coefficients in Logistic Regression?

In [None]:
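ans) Each coefficient is the change in the log-odds of the positive class for a one-unit increase in that predictor, holding the other predictors constant. Exponentiating a coefficient gives the odds ratio: the factor by which the odds are multiplied for that one-unit increase (e.g. a coefficient of 0.7 gives an odds ratio of about 2.01, roughly doubling the odds).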
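The odds-ratio interpretation can be verified numerically: bump one feature by one unit, keep the others fixed, and compare the change in odds with exp(coefficient). This sketch assumes the standardized breast cancer dataset (so one unit is one standard deviation) and picks feature 0 (`mean radius`) for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one unit = one standard deviation
model = LogisticRegression(max_iter=1000).fit(X, y)

def odds(sample):
    """Odds p / (1 - p) of the positive class for a single-row sample."""
    p = model.predict_proba(sample)[0, 1]
    return p / (1 - p)

j = 0  # feature to interpret (column 0 is 'mean radius')
odds_ratio = np.exp(model.coef_[0, j])

# Use the sample whose probability is closest to 0.5 so the arithmetic is well conditioned
i = np.argmin(np.abs(model.predict_proba(X)[:, 1] - 0.5))
x = X[i:i + 1].copy()
x_plus = x.copy()
x_plus[0, j] += 1.0  # one-unit increase in feature j, all other features unchanged

measured = odds(x_plus) / odds(x)
print(f"exp(coef) = {odds_ratio:.4f}, measured odds ratio = {measured:.4f}")
```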



**Practical**

In [None]:
### Q.1)  Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic Regression, and prints the model accuracy.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Train Logistic Regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
### Q.2) Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1') and print the model accuracy.

In [None]:
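# liblinear supports the L1 penalty; on multiclass data such as iris it fits
# one-vs-rest binary models (recent scikit-learn versions may warn about this)
model = LogisticRegression(penalty='l1', solver='liblinear', max_iter=200)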
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("L1 Regularization Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
### Q.3) Write a Python program to train Logistic Regression with L2 regularization (Ridge) using LogisticRegression(penalty='l2'). Print model accuracy and coefficients.

In [None]:
model = LogisticRegression(penalty='l2', solver='lbfgs', max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("L2 Regularization Accuracy:", accuracy_score(y_test, y_pred))

# Print coefficients
print("Coefficients:", model.coef_)


In [None]:
### Q.4) Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet').

In [None]:
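from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# saga is the only solver supporting elastic net; it converges faster on scaled features
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, max_iter=1000))
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Elastic Net Accuracy:", accuracy_score(y_test, y_pred))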


In [None]:
### Q.5)   Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'.

In [None]:
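# multi_class='ovr' fits one binary classifier per class.
# Note: recent scikit-learn versions deprecate `multi_class`; wrapping the model in
# sklearn.multiclass.OneVsRestClassifier gives the same one-vs-rest scheme.
model = LogisticRegression(multi_class='ovr', solver='lbfgs', max_iter=200)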
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Multiclass (OvR) Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
### Q.6) Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic Regression. Print the best parameters and accuracy.

In [None]:
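from sklearn.model_selection import GridSearchCV

# liblinear supports both the L1 and L2 penalties
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

grid = GridSearchCV(LogisticRegression(max_iter=200), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
# best_score_ is the mean cross-validation accuracy, not test-set accuracy
print("Best cross-validation accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))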


In [None]:
### Q.7) Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the average accuracy.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

skf = StratifiedKFold(n_splits=5)
model = LogisticRegression(max_iter=200)
scores = cross_val_score(model, X_train, y_train, cv=skf)

print("Average accuracy:", scores.mean())


In [None]:
### Q.8) Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy.

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv("dataset.csv")

# Assuming the last column is the target variable and all other columns are numeric features
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Logistic Regression model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("CSV Dataset Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
### Q.9) Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in Logistic Regression. Print the best parameters and accuracy.

In [None]:
from sklearn.model_selection import RandomizedSearchCV
import numpy as np

# liblinear and saga both support L1 and L2, so every sampled combination is valid
param_dist = {
    'C': np.logspace(-3, 3, 10),
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga']
}

random_search = RandomizedSearchCV(LogisticRegression(max_iter=1000), param_distributions=param_dist, cv=5, n_iter=10, random_state=42)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Best cross-validation accuracy:", random_search.best_score_)


In [None]:
### Q.10) Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy.

In [None]:
from sklearn.multiclass import OneVsOneClassifier

model = OneVsOneClassifier(LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("One-vs-One (OvO) Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
### Q.11) Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary classification.

In [None]:
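from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
import seaborn as sns
import matplotlib.pyplot as plt

# Iris has three classes, so use a binary dataset (malignant vs. benign)
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Train Logistic Regression (scaling first helps lbfgs converge)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_train, y_train)

# Predict and compute confusion matrix
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

# Plot confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()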


In [None]:
### Q.12) Write a Python program to train a Logistic Regression model and evaluate its performance using Precision, Recall, and F1-Score.

In [None]:
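from sklearn.metrics import precision_score, recall_score, f1_score

# average='weighted' averages the per-class scores weighted by class frequency,
# so the same calls work for binary and multiclass targets
y_pred = model.predict(X_test)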

print("Precision:", precision_score(y_test, y_pred, average='weighted'))
print("Recall:", recall_score(y_test, y_pred, average='weighted'))
print("F1-Score:", f1_score(y_test, y_pred, average='weighted'))


In [None]:
### Q.13) Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to improve model performance.

In [None]:
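from sklearn.metrics import classification_report

# 'balanced' weights each class inversely to its frequency in y_train
model = LogisticRegression(class_weight='balanced', max_iter=200)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
# Per-class precision/recall show the effect on the minority class better than accuracy
print(classification_report(y_test, y_pred))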


In [None]:
### Q.14)  Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and evaluate performance.

In [None]:
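import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("titanic.csv")

# Keep numeric/encodable columns (assumes the standard Kaggle Titanic column names)
df = df[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})  # encode text as numbers

X = df.drop(columns=['Survived'])
y = df['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Handling missing values (e.g. Age): fit the imputer on the training split only
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_test = imputer.transform(X_test)

model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

print("Titanic Dataset Accuracy:", accuracy_score(y_test, model.predict(X_test)))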


In [None]:
### Q.15) Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression model. Evaluate its accuracy and compare results with and without scaling.

In [None]:
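from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

raw_model = LogisticRegression(max_iter=200).fit(X_train, y_train)
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

print("Raw Data Accuracy:", accuracy_score(y_test, raw_model.predict(X_test)))
print("Standardized Data Accuracy:", accuracy_score(y_test, model.predict(X_test_scaled)))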


In [None]:
### Q.16)  Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score.

In [None]:
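from sklearn.metrics import roc_auc_score

# Needs a binary target (e.g. the Titanic data from Q.14) and probability scores
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
print("ROC-AUC Score:", roc_auc_score(y_test, y_prob))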


In [None]:
### Q.17) Write a Python program to train Logistic Regression using a custom learning rate (C=0.5) and evaluate accuracy.

In [None]:
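# Note: C is the inverse of the regularization strength, not a learning rate;
# LogisticRegression has no learning-rate parameter. C=0.5 regularizes more than the default C=1.0.
model = LogisticRegression(C=0.5, max_iter=200)
model.fit(X_train, y_train)

print("Accuracy with C=0.5:", accuracy_score(y_test, model.predict(X_test)))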


In [None]:
### Q.18) Write a Python program to train Logistic Regression and identify important features based on model coefficients.

In [None]:
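import numpy as np

# Coefficient magnitudes are comparable only on a common scale, so use the standardized data from Q.15
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

# coef_ has one row for a binary target; rank features by absolute coefficient
feature_importance = np.abs(model.coef_[0])
for idx in np.argsort(feature_importance)[::-1]:
    print(f"Feature {idx}: {feature_importance[idx]:.4f}")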


In [None]:
### Q.19) Write a Python program to train Logistic Regression and evaluate its performance using Cohen's Kappa Score.

In [None]:
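from sklearn.metrics import cohen_kappa_score

# Recompute predictions so y_pred matches the current y_test
y_pred = LogisticRegression(max_iter=200).fit(X_train, y_train).predict(X_test)
print("Cohen's Kappa Score:", cohen_kappa_score(y_test, y_pred))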


In [None]:
### Q.20) Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification.

In [None]:
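from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

# Probability scores for the positive class (binary target required)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)
y_prob = model.predict_proba(X_test)[:, 1]
precision, recall, _ = precision_recall_curve(y_test, y_prob)

plt.plot(recall, precision, marker='.')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()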


In [None]:
### Q.21) Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy.

In [None]:
# saga converges slowly on unscaled features, so use the standardized data from Q.15
solvers = ['liblinear', 'saga', 'lbfgs']
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=1000)
    model.fit(X_train_scaled, y_train)
    print(f"Accuracy with {solver} solver:", accuracy_score(y_test, model.predict(X_test_scaled)))


In [None]:
### Q.22) Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC).

In [None]:
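from sklearn.metrics import matthews_corrcoef

# Recompute predictions so y_pred matches the current y_test
y_pred = LogisticRegression(max_iter=200).fit(X_train, y_train).predict(X_test)
print("Matthews Correlation Coefficient (MCC):", matthews_corrcoef(y_test, y_pred))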


In [None]:
### Q.23) Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling.

In [None]:
model = LogisticRegression(max_iter=200)

# Raw features
model.fit(X_train, y_train)
accuracy_raw = accuracy_score(y_test, model.predict(X_test))

# Standardized features (X_train_scaled / X_test_scaled from Q.15)
model.fit(X_train_scaled, y_train)
accuracy_scaled = accuracy_score(y_test, model.predict(X_test_scaled))

print("Raw Data Accuracy:", accuracy_raw)
print("Standardized Data Accuracy:", accuracy_scaled)


In [None]:
### Q.24) Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using cross-validation.

In [None]:
from sklearn.model_selection import cross_val_score

C_values = [0.01, 0.1, 1, 10]
best_C, best_score = None, -1.0
for C in C_values:
    model = LogisticRegression(C=C, max_iter=200)
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(f"C={C}, Average Accuracy: {scores.mean():.4f}")
    if scores.mean() > best_score:
        best_C, best_score = C, scores.mean()

print("Optimal C:", best_C)


In [None]:
### Q.25) Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions.

In [None]:
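import joblib

# Fit the model first: cross_val_score (Q.24) trains clones, so `model` there is never fitted
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

joblib.dump(model, "logistic_regression_model.pkl")
loaded_model = joblib.load("logistic_regression_model.pkl")

y_pred_loaded = loaded_model.predict(X_test)
print("Loaded Model Accuracy:", accuracy_score(y_test, y_pred_loaded))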
