## Theoretical

1. What is Logistic Regression, and how does it differ from Linear Regression?

* Logistic Regression:  
  Logistic Regression is a statistical method used to predict a binary outcome (e.g., yes/no, 0/1, true/false) based on a set of independent variables. It is a type of regression analysis that models the probability of a binary dependent variable. 

* Linear Regression:   
  Linear Regression is a statistical method used to predict a continuous outcome (e.g., height, weight, salary) based on a set of independent variables. It is a type of regression analysis that models the relationship between the dependent variable and the independent variables using a linear equation.


2. What is the mathematical equation of Logistic Regression?
* Equation of Logistic Regression:  
For a given input X (a set of independent variables), Logistic Regression models the probability P(Y=1∣X) using the sigmoid function:  

$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$$

where:
- P(Y=1∣X) is the probability that the dependent variable 𝑌 is 1 given input 𝑋
- β0, β1, ..., βn are the regression coefficients (weights); β0 is the intercept.
- X1,X2 ,...,Xn  are the independent variables (features).
- e is Euler’s number (≈ 2.718).
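
The equation above can be checked numerically with a minimal sketch in plain Python. The function names `sigmoid` and `predict_proba` and the coefficient values are hypothetical, chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, betas):
    """P(Y=1 | X=x) for one sample, given an intercept and a list of weights."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return sigmoid(z)

# Hypothetical coefficients and one sample with two features
p = predict_proba([2.0, 1.0], beta0=-1.0, betas=[0.5, 0.0])
print(p)  # z = -1 + 0.5*2 + 0*1 = 0, so p = 0.5
```

A linear score of 0 corresponds to a probability of exactly 0.5, which is why 0.5 is the usual default decision threshold.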

3. Why do we use the Sigmoid function in Logistic Regression?

The Sigmoid function is used in Logistic Regression because it squashes the real-valued output of the linear equation into the range (0, 1), so the result can be interpreted as the probability of the event occurring. It is also smooth and differentiable, with the simple derivative σ(z)(1 - σ(z)), which makes the model easy to fit with gradient-based optimization algorithms such as gradient descent.

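4. What is the cost function of Logistic Regression?

The cost function for Logistic Regression is the cross-entropy loss (also called log loss), which measures the difference between the predicted probabilities and the actual binary outcomes. The cross-entropy loss function is defined as:

$$J(\beta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right]$$

where m is the number of training examples, y_i is the actual label (0 or 1), and p̂_i = P(Y=1∣X_i) is the predicted probability for example i.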

* Intuition Behind the Cost Function  
- If the actual label is **1** and the predicted probability is **high (close to 1)**, the cost is **low**.
- If the actual label is **1** but the predicted probability is **low (close to 0)**, the cost is **high**.
- Similarly, for **y = 0**, if the predicted probability is **close to 0**, the cost is **low**, and if it’s **close to 1**, the cost is **high**.
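
The intuition above can be illustrated with a small sketch that evaluates the cross-entropy loss on two hypothetical sets of predictions. The helper name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for this example.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0]
confident_right = binary_cross_entropy(labels, [0.9, 0.1])  # predictions agree with labels
confident_wrong = binary_cross_entropy(labels, [0.1, 0.9])  # predictions contradict labels

print(f"Confident and right: {confident_right:.4f}")
print(f"Confident and wrong: {confident_wrong:.4f}")
```

The confidently wrong predictions receive a much larger loss than the confidently right ones, which is exactly the behaviour the bullet points describe.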

5. What is Regularization in Logistic Regression? Why is it needed?

Regularization is a technique used to prevent overfitting in Logistic Regression by adding a penalty term on the size of the coefficients to the loss function. It is needed because, when there are many features, strongly correlated features, or classes that are (almost) perfectly separable, the unpenalized coefficients can grow very large and the model fits noise in the training data. Penalizing large coefficients reduces the variance of the model and improves its ability to generalize to unseen data.

6. Explain the difference between Lasso, Ridge, and Elastic Net regression.

* Lasso Regression:
  Lasso Regression adds a penalty term called L1 regularization to the loss function. It minimizes the loss function by penalizing the sum of absolute values of the regression coefficients (β) to encourage sparsity (i.e., many regression coefficients becoming zero). This can help to reduce the complexity of the model and improve its interpretability.
  
* Ridge Regression:
  Ridge Regression adds a penalty term called L2 regularization to the loss function. It minimizes the loss function by penalizing the sum of squares of the regression coefficients (β) to encourage smaller values of the regression coefficients. This can help to reduce the complexity of the model and improve its generalization performance.

* Elastic Net Regression:
  Elastic Net Regression combines the L1 and L2 regularization techniques to encourage sparsity and balance the trade-off between minimizing the loss function and minimizing the complexity of the model. It uses a combination of the L1 and L2 regularization terms to achieve this balance. This can help to reduce the complexity of the model and improve its generalization performance.
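
As a rough illustration of the three penalties, the sketch below computes each one for a hypothetical coefficient vector. The Elastic Net mix used here (`l1_ratio * L1 + (1 - l1_ratio) * L2`) is one simple parameterization chosen for the example; libraries may scale the terms differently, and the intercept is assumed to be excluded from the penalty.

```python
def l1_penalty(beta):
    """Lasso penalty: sum of absolute coefficient values."""
    return sum(abs(b) for b in beta)

def l2_penalty(beta):
    """Ridge penalty: sum of squared coefficient values."""
    return sum(b * b for b in beta)

def elastic_net_penalty(beta, l1_ratio):
    """Weighted mix of the L1 and L2 penalties (illustrative parameterization)."""
    return l1_ratio * l1_penalty(beta) + (1 - l1_ratio) * l2_penalty(beta)

beta = [0.5, -2.0, 0.0, 1.5]  # hypothetical coefficients, intercept excluded

print("L1 penalty:", l1_penalty(beta))                         # 4.0
print("L2 penalty:", l2_penalty(beta))                         # 6.5
print("Elastic Net penalty:", elastic_net_penalty(beta, 0.5))  # 5.25
```

Note how the L2 penalty is dominated by the largest coefficient (-2.0), while the L1 penalty grows linearly with every coefficient, which is what pushes small coefficients all the way to zero.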

7. When should we use Elastic Net instead of Lasso or Ridge?

Elastic Net is a good choice when the dataset contains many features, some of which are correlated with each other. Lasso tends to keep one feature from a correlated group somewhat arbitrarily and drop the rest, while Ridge keeps every feature; Elastic Net combines the L1 and L2 terms, so it can still produce a sparse model while keeping groups of correlated features together. It is also often more stable than Lasso when the number of features is much larger than the number of observations.

8. What is the impact of the regularization parameter (λ) in Logistic Regression?

The regularization parameter (λ) in Logistic Regression determines the amount of penalty applied to the regression coefficients. A larger value of λ shrinks the coefficients more strongly (and, with an L1 penalty, drives more of them to exactly zero), which reduces the complexity of the model but can cause underfitting if λ is too large. A smaller value of λ allows the model to fit the training data more closely, which increases the risk of overfitting. The optimal value of λ depends on the specific dataset and is usually chosen by cross-validation. Note that scikit-learn's LogisticRegression is parameterized by C, the inverse of the regularization strength, so a smaller C means stronger regularization.
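
The shrinking effect of λ can be seen in a minimal sketch: a one-feature logistic model without an intercept, fitted by plain gradient descent on the cross-entropy loss plus an L2 penalty of (λ/2)·w². The toy data, the function name `fit_l2_logistic`, the learning rate, and the step count are all assumptions made for this example.

```python
import math

def fit_l2_logistic(xs, ys, lam, lr=0.1, steps=2000):
    """Gradient descent on mean cross-entropy + (lam / 2) * w**2 for a single weight w."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean cross-entropy with respect to w
        grad = sum((1.0 / (1.0 + math.exp(-w * x)) - y) * x for x, y in zip(xs, ys)) / n
        # Add the gradient of the L2 penalty and take one step
        w -= lr * (grad + lam * w)
    return w

# Small, non-separable toy dataset (hypothetical values)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w_weak = fit_l2_logistic(xs, ys, lam=0.01)   # weak regularization
w_strong = fit_l2_logistic(xs, ys, lam=1.0)  # strong regularization

print(f"lambda = 0.01 -> w = {w_weak:.4f}")
print(f"lambda = 1.00 -> w = {w_strong:.4f}")
```

The weight fitted with the larger λ ends up closer to zero, which is the shrinkage described above.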

9. What are the key assumptions of Logistic Regression?

* The dependent variable is binary (i.e., it takes only two values, 0 or 1).
* The observations are independent of each other.
* The log-odds of the outcome are linearly related to the independent variables.
* There is little or no multicollinearity (i.e., the independent variables are not strongly correlated with each other).
* The sample size is large enough to estimate the coefficients reliably.

Note that Logistic Regression does not require the independent variables to be normally distributed.

10. What are some alternatives to Logistic Regression for classification tasks?

* Linear Discriminant Analysis (LDA): LDA assumes that the independent variables are normally distributed within each class and share a common covariance matrix. It works well when these assumptions roughly hold and extends naturally to more than two classes.
* K-Nearest Neighbors (KNN): KNN classifies a point by the majority vote of its k nearest neighbors. It makes no assumption about the shape of the decision boundary, but it becomes slow on large datasets and is sensitive to feature scaling.
* Decision Trees: Decision Trees use a tree-like structure of feature-based splits to make predictions. They handle both numerical and categorical features and can capture non-linear relationships, but a single tree overfits easily.
* Support Vector Machines (SVM): SVM finds the hyperplane that separates the classes with the largest margin, and with kernels it can model non-linear decision boundaries. It is effective on small to medium-sized datasets.


11. What are Classification Evaluation Metrics?

* Accuracy: The proportion of correct predictions made by the model.
* Precision: The proportion of true positives among the predicted positives.
* Recall: The proportion of true positives among the actual positives.
* F1 Score: The harmonic mean of precision and recall, which is a more balanced measure of the model's performance.
* Confusion Matrix: A table that shows the number of true positives, true negatives, false positives, and false negatives for each class.
* ROC Curve: A plot that shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR) for different classification thresholds.
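
The first four metrics can be computed directly from the counts in a confusion matrix. The sketch below uses hypothetical counts and a hypothetical helper name, and it assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the predicted positives, how many are correct
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration
acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, fn=20, tn=30)

print(f"Accuracy:  {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall:    {rec:.4f}")
print(f"F1 Score:  {f1:.4f}")
```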

12. How does class imbalance affect Logistic Regression?

Class imbalance can bias Logistic Regression toward the majority class: because the loss is dominated by the majority examples, the model can reach high accuracy while rarely predicting the minority class, so accuracy becomes a misleading metric. To address class imbalance, we can use techniques such as oversampling the minority class, undersampling the majority class, or using class weights to give more importance to the minority class.
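
One common way to set class weights is to make them inversely proportional to class frequency, using `n_samples / (n_classes * class_count)`; this is the heuristic behind scikit-learn's `class_weight='balanced'`. The sketch below reproduces that formula in plain Python on hypothetical labels; the helper name is an assumption.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced labels: 90 negatives, 10 positives
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

print(weights)  # the minority class receives the larger weight
```

With these weights, each class contributes the same total weight to the loss, so errors on the minority class are no longer drowned out by the majority class.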


13. What is Hyperparameter Tuning in Logistic Regression?

Hyperparameter tuning is the process of finding the values of the hyperparameters (e.g., the regularization strength C, the penalty type, and the solver) that give the best performance for Logistic Regression on held-out data. Hyperparameter tuning can be done using techniques such as grid search, random search, or Bayesian optimization, usually combined with cross-validation.

14. What are different solvers in Logistic Regression? Which one should be used?

* 'liblinear': A coordinate-descent solver that works well for small datasets. It supports L1 and L2 penalties but handles multiclass problems only through one-vs-rest.
* 'lbfgs': The default solver in scikit-learn. It supports L2 or no penalty, handles the multinomial (softmax) loss, and is a good general-purpose choice.
* 'newton-cg': A Newton-type solver that supports L2 or no penalty and the multinomial loss. It can be expensive when the number of features is large.
* 'sag': Stochastic Average Gradient. It supports L2 or no penalty and is fast on large datasets, provided the features are on a similar scale.
* 'saga': A variant of 'sag' that additionally supports L1 and Elastic Net penalties. It is usually the choice for large datasets that need sparse models.

The choice of solver depends on the specific dataset, problem size, and complexity of the problem. It is recommended to experiment with different solvers and choose the one that provides the best performance for your specific use case.

15. How is Logistic Regression extended for multiclass classification?

Logistic Regression can be extended for multiclass classification using techniques such as one-vs-rest (OvR) or softmax. In OvR, one binary classifier is trained for each class against all other classes, and the class with the highest predicted probability is chosen as the predicted class. In softmax (multinomial) regression, a single model learns one set of coefficients per class, and the softmax function turns the class scores into probabilities that sum to 1.

16. What are the advantages and disadvantages of Logistic Regression?

Advantages of Logistic Regression:

* Simple and easy to understand; the coefficients are interpretable.
* Can handle both binary and multiclass classification problems.
* Outputs probability estimates, not just class labels.
* Fast to train, and supports L1/L2 regularization to control overfitting.

Disadvantages of Logistic Regression:

* Sensitive to outliers and cannot handle missing values; both must be treated beforehand.
* Assumes a linear relationship between the independent variables and the log-odds, so it does not capture non-linear relationships without feature engineering.
* Does not handle feature scaling automatically, which matters when regularization is used.
* Can be affected by multicollinearity among the independent variables.

17. What are some use cases of Logistic Regression?

Logistic Regression can be used for binary classification tasks, such as predicting whether a customer will subscribe to a marketing campaign or not. It can also be used for multiclass classification tasks, such as predicting the species of an iris flower based on its features. Logistic Regression can be used in combination with other classification algorithms, such as Naive Bayes, Decision Trees, and Support Vector Machines, to improve the overall performance of the model.

18. What is the difference between Softmax Regression and Logistic Regression?

Softmax Regression is a generalization of Logistic Regression that can handle multiclass classification problems. In Softmax Regression, the predicted probabilities for each class are calculated using the softmax function, which ensures that the probabilities sum up to 1 for each instance. Softmax Regression can be used when the dependent variable has more than two categories.

Logistic Regression in its basic form handles binary outcomes and uses the logistic (sigmoid) function to produce a single probability, P(Y=1∣X). It is the special case of Softmax Regression with two classes: applying softmax to two class scores gives the same probability as applying the sigmoid to their difference.
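
The relationship between the two functions can be checked numerically. This is a minimal sketch with hypothetical class scores; subtracting the maximum score before exponentiating is a standard trick for numerical stability and does not change the result.

```python
import math

def softmax(scores):
    """Turn a list of real-valued class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Three hypothetical class scores
probs = softmax([2.0, 1.0, 0.1])
print(probs, "sum =", sum(probs))

# With two classes and the second score fixed at 0, softmax reduces to the sigmoid
z = 1.5
print(softmax([z, 0.0])[0], sigmoid(z))
```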

19. How do we choose between One-vs-Rest (OvR) and Softmax for multiclass classification?

OvR trains multiple independent binary classifiers, one for each class, and predicts the class whose classifier gives the highest probability. Softmax trains a single model jointly over all classes, so the predicted probabilities are normalized to sum to 1.

OvR is simple, each binary classifier can be inspected on its own, and it works with solvers that only support binary problems (such as 'liblinear'). However, because the classifiers are trained independently, their probabilities are not calibrated against each other, and each binary problem can become imbalanced when there are many classes.

Softmax is usually preferable when the classes are mutually exclusive and well-calibrated class probabilities are needed, since all classes are modelled together. Its coefficients are interpreted relative to the other classes, which can be less intuitive than a set of separate binary models.

It is recommended to experiment with both OvR and Softmax for multiclass classification and choose the method that provides the best performance for your specific use case.

20. How do we interpret coefficients in Logistic Regression?

Coefficients in Logistic Regression represent the change in the log odds of the dependent variable for a one-unit increase in the corresponding independent variable, holding the other variables fixed. The log odds of an instance are:

Log odds = β0 + β1X1 + β2X2 + ... + βnXn

For example, if the coefficient of a predictor variable is 0.5, a one-unit increase in that variable increases the log odds by 0.5, which multiplies the odds by e^0.5 ≈ 1.65. To convert the log odds back to a probability, we can use the following formula:

Probability = 1 / (1 + e^(-Log odds))

By interpreting the coefficients, we can gain insights into the relationship between the independent variables and the dependent variable, as well as the impact of each independent variable on the dependent variable.
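
The interpretation above can be made concrete with a short sketch. The coefficient value 0.5 comes from the example in the text; the baseline log odds of 0 is an assumption chosen so that the starting probability is 0.5.

```python
import math

def prob_from_log_odds(log_odds):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))

coef = 0.5                   # coefficient from the example above
odds_ratio = math.exp(coef)  # multiplicative change in the odds per one-unit increase

baseline_log_odds = 0.0      # assumed baseline, i.e. a probability of 0.5
p_before = prob_from_log_odds(baseline_log_odds)
p_after = prob_from_log_odds(baseline_log_odds + coef)

print(f"Odds ratio: {odds_ratio:.4f}")
print(f"Probability before: {p_before:.4f}, after a one-unit increase: {p_after:.4f}")
```

The change in probability depends on the baseline: the same coefficient moves the probability less when the starting point is already close to 0 or 1, which is why coefficients are usually reported as odds ratios rather than probability changes.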



## Practical

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns 
import matplotlib.pyplot as plt
from scipy.stats import uniform
from collections import Counter
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.multiclass import OneVsOneClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold, cross_val_score, RandomizedSearchCV
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, classification_report, roc_auc_score, roc_curve, auc, cohen_kappa_score


import warnings
warnings.filterwarnings("ignore")

In [None]:
# 1. Write a Python program that loads a dataset, splits it into training and testing sets, applies Logistic Regression, and prints the model accuracy

iris = load_iris()

X = iris.data  
y = (iris.target == 0).astype(int)  

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

In [None]:
# 2.  Write a Python program to apply L1 regularization (Lasso) on a dataset using LogisticRegression(penalty='l1') and print the model accuracy

model = LogisticRegression(penalty='l1', solver='liblinear', C=1.0) 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy with L1 Regularization: {accuracy:.4f}")
print("Feature Coefficients:", model.coef_)

In [None]:
# 3. Write a Python program to train Logistic Regression with L2 regularization (Ridge) using LogisticRegression(penalty='l2'). Print model accuracy and coefficients

model = LogisticRegression(penalty='l2') 
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy with L2 Regularization: {accuracy:.4f}")
print("Feature Coefficients:", model.coef_)

In [None]:
# 4. Write a Python program to train Logistic Regression with Elastic Net Regularization (penalty='elasticnet')

model = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=0.5, C=1.0)  

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy with Elastic Net Regularization: {accuracy:.4f}")
print("Feature Coefficients:", model.coef_)

In [None]:
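# 5. Write a Python program to train a Logistic Regression model for multiclass classification using multi_class='ovr'

# Reload iris with all three classes; the earlier cells used a binary target
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Note: multi_class is deprecated since scikit-learn 1.5
model = LogisticRegression(multi_class='ovr', solver='liblinear')

model.fit(X_train, y_train)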

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy (Multiclass with OvR): {accuracy:.4f}")
print("Feature Coefficients (per class):", model.coef_)  

In [None]:
# 6. Write a Python program to apply GridSearchCV to tune the hyperparameters (C and penalty) of Logistic Regression. Print the best parameters and accuracy

model = LogisticRegression(solver='saga', multi_class='ovr', max_iter=500)  

param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],  
    'penalty': ['l1', 'l2'] 
}

grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)


best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Best Hyperparameters:", grid_search.best_params_)
print(f"Best Model Accuracy: {accuracy:.4f}")

In [None]:
# 7. Write a Python program to evaluate Logistic Regression using Stratified K-Fold Cross-Validation. Print the average accuracy

model = LogisticRegression(multi_class='ovr', solver='liblinear')  

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # 5-fold cross-validation

scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')

print(f"Accuracy for each fold: {scores}")
print(f"Average Accuracy: {np.mean(scores):.4f}")


In [None]:
# 8. Write a Python program to load a dataset from a CSV file, apply Logistic Regression, and evaluate its accuracy.

df = sns.load_dataset("titanic")

# Drop redundant columns first so dropna() does not discard rows because of the sparse 'deck' column
df = df.drop(columns=['class', 'who', 'adult_male', 'deck', 'embark_town', 'alive', 'alone'])
df = df.dropna()

df['sex'] = df['sex'].map({'male': 0, 'female': 1})
df['embarked'] = df['embarked'].map({'S': 1, 'C': 2, 'Q': 3})

# Exclude the target from the features to avoid target leakage
X = df.drop(columns=['survived'])
y = df['survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")


In [None]:
# 9. Write a Python program to apply RandomizedSearchCV for tuning hyperparameters (C, penalty, solver) in Logistic Regression. Print the best parameters and accuracy
 
model = LogisticRegression(multi_class='ovr', max_iter=500)

param_dist = {
    'C': uniform(0.01, 10),  
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear', 'saga'] 
}
random_search = RandomizedSearchCV(model, param_dist, n_iter=10, cv=5, random_state=1 )
random_search.fit(X_train, y_train)

best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Best Hyperparameters:", random_search.best_params_)
print(f"Best Model Accuracy: {accuracy:.4f}")

In [None]:
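# 10. Write a Python program to implement One-vs-One (OvO) Multiclass Logistic Regression and print accuracy

# Use the three-class iris data under separate names so the Titanic split above stays available
X_iris, y_iris = load_iris(return_X_y=True)
X_iris_train, X_iris_test, y_iris_train, y_iris_test = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=1, stratify=y_iris)

model = OneVsOneClassifier(LogisticRegression(solver='liblinear'))

model.fit(X_iris_train, y_iris_train)

y_pred = model.predict(X_iris_test)

accuracy = accuracy_score(y_iris_test, y_pred)
print(f"Model Accuracy (Multiclass with OvO): {accuracy:.4f}")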

In [None]:
# 11. Write a Python program to train a Logistic Regression model and visualize the confusion matrix for binary classification

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)


plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=["Class 0", "Class 1"], yticklabels=["Class 0", "Class 1"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title(f"Confusion Matrix (Accuracy: {accuracy:.4f})")
plt.show()


In [None]:
# 12. Write a Python program to train a Logistic Regression model and evaluate its performance using Precision, Recall, and F1-Score

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


In [None]:
# 13. Write a Python program to train a Logistic Regression model on imbalanced data and apply class weights to improve model performance

# Inspect the class balance of the training labels from the Titanic split above
print("Class distribution:", Counter(y_train))

# class_weight='balanced' weights each class inversely to its frequency;
# an explicit dict such as {0: 1, 1: 10} could be passed instead
model = LogisticRegression(class_weight='balanced', solver='liblinear')

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

In [None]:
# 14. Write a Python program to train Logistic Regression on the Titanic dataset, handle missing values, and evaluate performance

# Load the Titanic dataset
data_url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(data_url)

# Select relevant features
features = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']
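df = df[features + ['Survived']].copy()

# Handle missing values: median for numeric Age, most frequent value for categorical Embarked
df['Age'] = SimpleImputer(strategy='median').fit_transform(df[['Age']]).ravel()
df['Embarked'] = SimpleImputer(strategy='most_frequent').fit_transform(df[['Embarked']]).ravel()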

# Convert categorical variables into numerical
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)

# Split data into training and testing sets
X = df.drop(columns=['Survived'])
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize numerical features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

# Print results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

In [None]:
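# 15. Write a Python program to apply feature scaling (Standardization) before training a Logistic Regression model. Evaluate its accuracy and compare results with and without scaling

df = sns.load_dataset("titanic")

# Copy the selected columns so the assignments below modify this frame, not a view
df = df[["pclass", "sex", "age", "fare", "survived"]].copy()

# Assign the result instead of using inplace=True on a column selection
df["age"] = df["age"].fillna(df["age"].median())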

df["sex"] = df["sex"].map({"male": 1, "female": 0})

X = df.drop(columns=["survived"])  
y = df["survived"]  

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

model_no_scaling = LogisticRegression()
model_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = model_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)


scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)


model_scaled = LogisticRegression()
model_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

print(f"Accuracy WITHOUT Scaling: {accuracy_no_scaling:.4f}")
print(f"Accuracy WITH Scaling: {accuracy_scaled:.4f}")


In [None]:
# 16. Write a Python program to train Logistic Regression and evaluate its performance using ROC-AUC score

df = sns.load_dataset("titanic")

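# Copy the selected columns so the assignments below modify this frame, not a view
df = df[["pclass", "sex", "age", "fare", "survived"]].copy()

# Assign the result instead of using inplace=True on a column selection
df["age"] = df["age"].fillna(df["age"].median())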

df["sex"] = df["sex"].map({"male": 1, "female": 0})

X = df.drop(columns=["survived"])  
y = df["survived"]  

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)


y_prob = model.predict_proba(X_test_scaled)[:, 1]  


roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC-AUC Score: {roc_auc:.4f}")

fpr, tpr, _ = roc_curve(y_test, y_prob)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color="blue", label=f"ROC Curve (AUC = {roc_auc:.4f})")
plt.plot([0, 1], [0, 1], color="red", linestyle="--") 
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver Operating Characteristic (ROC) Curve")
plt.legend()
plt.show()

In [None]:
# 17. Write a Python program to train Logistic Regression using a custom regularization strength (C=0.5; C is the inverse of the regularization strength, not a learning rate) and evaluate accuracy

model = LogisticRegression(C=0.5, random_state=42)
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy with C=0.5: {accuracy:.4f}")

In [None]:
# 18. Write a Python program to train Logistic Regression and identify important features based on model coefficients

feature_importance = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": model.coef_[0]  
})

feature_importance["Abs_Coefficient"] = feature_importance["Coefficient"].abs()
feature_importance = feature_importance.sort_values(by="Abs_Coefficient", ascending=False)

print(feature_importance[["Feature", "Coefficient"]])


plt.barh(feature_importance["Feature"], feature_importance["Coefficient"], color="skyblue")
plt.xlabel("Coefficient Value")
plt.ylabel("Feature")
plt.title("Feature Importance in Logistic Regression")
plt.axvline(x=0, color="red", linestyle="--")  
plt.gca().invert_yaxis()  
plt.show()

In [None]:
# 19. Write a Python program to train Logistic Regression and evaluate its performance using Cohen’s Kappa Score

accuracy = accuracy_score(y_test, y_pred)
kappa_score = cohen_kappa_score(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")
print(f"Cohen’s Kappa Score: {kappa_score:.4f}")

In [None]:
# 20. Write a Python program to train Logistic Regression and visualize the Precision-Recall Curve for binary classification

from sklearn.metrics import precision_recall_curve, auc

precision, recall, _ = precision_recall_curve(y_test, y_prob)

pr_auc = auc(recall, precision)

plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color="blue", label=f"PR Curve (AUC = {pr_auc:.4f})")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.grid()
plt.show()

In [None]:
# 21. Write a Python program to train Logistic Regression with different solvers (liblinear, saga, lbfgs) and compare their accuracy

solvers = ["liblinear", "saga", "lbfgs"]
results = {}

for solver in solvers:
    model = LogisticRegression(solver=solver, random_state=42, max_iter=500)
    model.fit(X_train_scaled, y_train)
    y_pred = model.predict(X_test_scaled)
    
  
    accuracy = accuracy_score(y_test, y_pred)
    results[solver] = accuracy

print("Logistic Regression Solver Comparison:")
for solver, acc in results.items():
    print(f"Solver: {solver}, Accuracy: {acc:.4f}")

In [None]:
# 22. Write a Python program to train Logistic Regression and evaluate its performance using Matthews Correlation Coefficient (MCC)

from sklearn.metrics import matthews_corrcoef

accuracy = accuracy_score(y_test, y_pred)
mcc = matthews_corrcoef(y_test, y_pred)

print(f"Model Accuracy: {accuracy:.4f}")
print(f"Matthews Correlation Coefficient (MCC): {mcc:.4f}")

In [None]:
# 23. Write a Python program to train Logistic Regression on both raw and standardized data. Compare their accuracy to see the impact of feature scaling

model_raw = LogisticRegression()
model_raw.fit(X_train, y_train)
y_pred_raw = model_raw.predict(X_test)
accuracy_raw = accuracy_score(y_test, y_pred_raw)

y_pred_scaled = model_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

print("Logistic Regression Accuracy Comparison:")
print(f"Without Scaling: {accuracy_raw:.4f}")
print(f"With Scaling (Standardized Data): {accuracy_scaled:.4f}")

In [None]:
# 24. Write a Python program to train Logistic Regression and find the optimal C (regularization strength) using cross-validation

model = LogisticRegression(max_iter=500, solver="liblinear")

param_grid = {"C": np.logspace(-4, 4, 10)}  

grid_search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
grid_search.fit(X_train_scaled, y_train)

best_C = grid_search.best_params_["C"]
# GridSearchCV refits the best configuration on the full training set by default (refit=True)
best_model = grid_search.best_estimator_

y_pred = best_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)

print(f"Best C Value: {best_C}")
print(f"Final Model Accuracy: {accuracy:.4f}")


In [None]:
# 25. Write a Python program to train Logistic Regression, save the trained model using joblib, and load it again to make predictions.


import joblib  # Import joblib for saving/loading models

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

joblib.dump(model, "logistic_regression_model.pkl")

loaded_model = joblib.load("logistic_regression_model.pkl")

y_pred = loaded_model.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)

print(f"Loaded Model Accuracy: {accuracy:.4f}")