## Logistic Regression

Logistic regression is a popular machine learning algorithm used for binary classification tasks, where the goal is to predict one of two possible outcomes. Despite its name, logistic regression is a classification algorithm rather than a regression algorithm. The underlying principle of logistic regression is based on the logistic function (also known as the sigmoid function), which maps any real-valued number to a value between 0 and 1. This makes it suitable for modeling the probability of a binary outcome.

Here's a step-by-step overview of how logistic regression works:

1. Data Preparation: Start by collecting a labeled dataset, where each data point is associated with a binary outcome (0 or 1) and a set of input features. Ensure the data is preprocessed and any necessary feature engineering steps are performed.

2. Model Training: Initialize the logistic regression model with random weights. The goal is to learn the optimal weights that minimize the error between predicted probabilities and actual labels. This is typically done using an optimization algorithm like gradient descent or its variants.

3. Feature Scaling: Optionally, perform feature scaling to normalize the input features. This step can improve the convergence of the optimization algorithm and make training more efficient.

4. Hypothesis Function: The logistic regression model applies a linear combination of the input features and weights, followed by the logistic function to obtain the predicted probability. The hypothesis function is defined as: h(x) = sigmoid(w₀ + w₁x₁ + w₂x₂ + ... + wₙxₙ). Here, h(x) is the predicted probability, w₀, w₁, ..., wₙ are the weights, and x₁, x₂, ..., xₙ are the input features.

5. Loss Function: The next step is to define a loss function that quantifies the difference between the predicted probabilities and the actual labels. In logistic regression, the most commonly used loss function is the binary cross-entropy loss, given by: L(y, h(x)) = -[y * log(h(x)) + (1 - y) * log(1 - h(x))]. Here, y represents the true label (0 or 1) and h(x) is the predicted probability.

6. Gradient Descent: To minimize the loss function, gradient descent is commonly employed. The weights are iteratively updated by taking steps in the direction of steepest descent of the loss function's gradient. This process continues until convergence is achieved or a predefined number of iterations is reached.

7. Prediction: Once the model is trained, it can be used to make predictions on new, unseen data. A common threshold of 0.5 is used to convert the predicted probabilities into binary predictions. If the predicted probability is greater than or equal to 0.5, the output is assigned the positive class label; otherwise, it is assigned the negative class label.

Logistic regression is a linear model and assumes a linear relationship between the input features and the log-odds of the binary outcome. It can be extended to handle multiclass classification problems using techniques like one-vs-rest or multinomial logistic regression.

Note that logistic regression has some assumptions and limitations, such as the linearity assumption and sensitivity to outliers. It's important to evaluate the performance of the model using appropriate metrics and consider other algorithms if the assumptions are violated or the performance is unsatisfactory.

In [1]:
from sklearn.datasets import load_iris

In [2]:
dataset = load_iris()

In [3]:
print(dataset.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

In [4]:
dataset.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [5]:
import pandas as pd
import numpy as np

In [6]:
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)

In [7]:
print(dataset.target)

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]


In [8]:
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm)
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


In [9]:
df['target'] = dataset.target

In [10]:
df.head()

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0


In [11]:
df['target'].unique()

array([0, 1, 2])

In [12]:
df[df['target']!=2]

Unnamed: 0,sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
0,5.1,3.5,1.4,0.2,0
1,4.9,3.0,1.4,0.2,0
2,4.7,3.2,1.3,0.2,0
3,4.6,3.1,1.5,0.2,0
4,5.0,3.6,1.4,0.2,0
...,...,...,...,...,...
95,5.7,3.0,4.2,1.2,1
96,5.7,2.9,4.2,1.3,1
97,6.2,2.9,4.3,1.3,1
98,5.1,2.5,3.0,1.1,1


In [13]:
df_copy = df[df['target']!=2]

In [14]:
df_copy['target'].unique()

array([0, 1])

In [15]:
## Independent and dependent features
X=df_copy.iloc[:,:-1]
y=df_copy.iloc[:,-1]

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [17]:
from sklearn.linear_model import LogisticRegression

In [18]:
classification = LogisticRegression(max_iter=200)

In [19]:
classification

LogisticRegression(max_iter=200)

In [20]:
classification.fit(X_train,y_train)

LogisticRegression(max_iter=200)

In [21]:
y_pred= classification.predict(X_test)

In [22]:
y_pred

array([1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0])

## Confusion Metrics,Accuracy,Classification Report

In [23]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [24]:
print(confusion_matrix(y_test,y_pred))

[[12  0]
 [ 0  8]]


In [25]:
print(accuracy_score(y_test,y_pred))

1.0


In [26]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         8

    accuracy                           1.00        20
   macro avg       1.00      1.00      1.00        20
weighted avg       1.00      1.00      1.00        20



## Cross Validation

In [27]:
from sklearn.model_selection import KFold

In [28]:
cv=KFold()

In [29]:
cv

KFold(n_splits=5, random_state=None, shuffle=False)

In [30]:
from sklearn.model_selection import cross_val_score

In [31]:
## these are the scores all the cross validation
scores = cross_val_score(classification,X_train,y_train,scoring='accuracy',cv=cv)

In [32]:
## Final Scores
np.mean(scores)

1.0

## Another Dataset

In [33]:
## Lets see more complex data
## make a prediction with a multinomial logistic regression model

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

In [34]:
# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=2, random_state=1)

In [35]:
X

array([[ 2.56999479, -0.13019997,  3.16075093, ..., -1.93094078,
         3.26130366,  2.05692145],
       [ 0.34129317,  2.51321418, -0.80416572, ...,  6.24734437,
        -1.92769365,  2.9503149 ],
       [ 2.27539972,  3.36561455,  0.17164362, ...,  2.74693781,
         0.13492444,  2.00339547],
       ...,
       [ 0.5234359 ,  1.90466429,  0.93243365, ...,  1.53945231,
         1.90646166,  1.99458587],
       [ 1.33747921,  3.25859684,  0.78792366, ...,  5.18788314,
        -0.82071083,  3.51411431],
       [-0.98534299,  0.83919047,  2.5820803 , ...,  3.04705685,
         0.66885641,  3.32838496]])

In [36]:
from sklearn.model_selection import train_test_split

In [37]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [38]:
complex_class_model=LogisticRegression(max_iter=200)

In [39]:
complex_class_model.fit(X_train,y_train)

LogisticRegression(max_iter=200)

In [40]:
y_pred_complex = complex_class_model.predict(X_test)

In [41]:
print(confusion_matrix(y_test,y_pred_complex))
print(accuracy_score(y_test,y_pred_complex))
print(classification_report(y_test,y_pred_complex))

[[78 13]
 [29 80]]
0.79
              precision    recall  f1-score   support

           0       0.73      0.86      0.79        91
           1       0.86      0.73      0.79       109

    accuracy                           0.79       200
   macro avg       0.79      0.80      0.79       200
weighted avg       0.80      0.79      0.79       200



In [42]:
cv=KFold(n_splits=5)
cross_val_score(complex_class_model,X_train,y_train,cv=cv)

array([0.80625, 0.78125, 0.79375, 0.8125 , 0.85625])

## Hyperparameter Tuning

Hyperparameter tuning is an important step in training machine learning models, including logistic regression. Tuning the hyperparameters allows you to find the optimal settings that result in the best model performance. Here are some common hyperparameters for logistic regression and strategies for tuning them:

1. Regularization parameter (C or alpha): This parameter controls the inverse of the regularization strength. A smaller value of C or a larger value of alpha increases the regularization strength, which can prevent overfitting. To tune this parameter, you can try different values on a logarithmic scale, such as [0.001, 0.01, 0.1, 1, 10, 100]. Use techniques like cross-validation to evaluate the performance of each setting and select the one that gives the best results.

2. Penalty type (L1 or L2): Logistic regression can use either L1 or L2 regularization. L1 regularization can lead to sparse solutions by driving some coefficients to exactly zero, while L2 regularization encourages small non-zero coefficients. You can try both penalty types and see which one works better for your problem. Some libraries provide a combined penalty option (e.g., Elastic Net), which allows you to mix both L1 and L2 penalties.

3. Solver algorithm: Logistic regression models can be optimized using different algorithms, such as 'liblinear', 'lbfgs', 'sag', or 'saga'. The choice of the solver depends on the size of your dataset, the regularization type, and the specific problem you are working on. Experiment with different solvers to find the one that gives the best results.

4. Maximum number of iterations: This parameter determines the maximum number of iterations taken for the solver to converge. If the solver fails to converge, you can increase this value. However, keep in mind that a large number of iterations can increase the training time. Start with a reasonable default value and increase it only if necessary.

5. Class weights: In imbalanced datasets, where one class is much more prevalent than the other, assigning different weights to the classes can help improve model performance. Some libraries provide options to specify class weights during model training. Try assigning higher weights to the minority class and lower weights to the majority class to account for the class imbalance.

In [43]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

In [44]:
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=2, random_state=1)

In [45]:
from sklearn.model_selection import train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

## Model Training Hyperparameter Tuning

## GridsearchCV

GridSearchCV is a function provided by the scikit-learn library in Python, which is used for hyperparameter tuning and model selection. It automates the process of searching for the best combination of hyperparameters by exhaustively evaluating all possible parameter combinations using cross-validation.

Here's a general outline of how GridSearchCV works:

1. You start by defining a parameter grid, which is a dictionary or a list of dictionaries. Each dictionary contains the hyperparameters you want to tune and the corresponding values to try.

2. Instantiate an estimator object for the machine learning algorithm you want to use, such as a classifier or a regression model.

3. Create a GridSearchCV object, providing the estimator, parameter grid, and the number of folds for cross-validation.

4. Fit the GridSearchCV object to your training data. This will iterate through all the parameter combinations, fitting the estimator and evaluating its performance using cross-validation.

5. Once the fitting process is complete, you can access various attributes of the GridSearchCV object to examine the results. For example, you can obtain the best set of hyperparameters found, the best cross-validated score, or detailed information about each parameter combination's performance.

6. Finally, you can use the best estimator found by GridSearchCV to make predictions on new, unseen data.

In [47]:
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings('ignore')

In [48]:
parameters = {'penalty':('l1', 'l2', 'elasticnet'), 'C':[1,10,20,30]}

In [49]:
classifier = LogisticRegression()

In [50]:
clf = GridSearchCV(classifier,param_grid=parameters,cv=5)

In [51]:
## Splitting of Training data to train and validation
clf.fit(X_train,y_train)

GridSearchCV(cv=5, estimator=LogisticRegression(),
             param_grid={'C': [1, 10, 20, 30],
                         'penalty': ('l1', 'l2', 'elasticnet')})

In [52]:
clf.best_params_

{'C': 1, 'penalty': 'l2'}

In [53]:
clf.best_score_

0.8087500000000001

In [54]:
classifier = LogisticRegression(C=1,penalty="l2")

In [55]:
classifier.fit(X_train,y_train)

LogisticRegression(C=1)

In [56]:
y_pred=classifier.predict(X_test)

In [57]:
y_pred

array([0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1,
       0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       0, 1])

In [58]:
classifier.predict_proba(X_test)

array([[0.71649202, 0.28350798],
       [0.19508969, 0.80491031],
       [0.12418141, 0.87581859],
       [0.05045906, 0.94954094],
       [0.88775659, 0.11224341],
       [0.75067497, 0.24932503],
       [0.97980488, 0.02019512],
       [0.3921745 , 0.6078255 ],
       [0.59920135, 0.40079865],
       [0.39295203, 0.60704797],
       [0.20428696, 0.79571304],
       [0.80257879, 0.19742121],
       [0.86422932, 0.13577068],
       [0.92665682, 0.07334318],
       [0.00131743, 0.99868257],
       [0.04171096, 0.95828904],
       [0.56288536, 0.43711464],
       [0.89322764, 0.10677236],
       [0.29278211, 0.70721789],
       [0.00870994, 0.99129006],
       [0.71879454, 0.28120546],
       [0.5108267 , 0.4891733 ],
       [0.76230298, 0.23769702],
       [0.73170811, 0.26829189],
       [0.10155737, 0.89844263],
       [0.04046512, 0.95953488],
       [0.57926768, 0.42073232],
       [0.00526468, 0.99473532],
       [0.03101648, 0.96898352],
       [0.96093035, 0.03906965],
       [0.

In [59]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [60]:
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[78 13]
 [29 80]]
0.79
              precision    recall  f1-score   support

           0       0.73      0.86      0.79        91
           1       0.86      0.73      0.79       109

    accuracy                           0.79       200
   macro avg       0.79      0.80      0.79       200
weighted avg       0.80      0.79      0.79       200



## Randomized Search cv

RandomizedSearchCV (Randomized Search Cross-Validation) is a technique for hyperparameter tuning in machine learning models. It is an alternative to GridSearchCV that allows you to sample a subset of possible hyperparameter values from a defined search space. The RandomizedSearchCV technique is useful when the hyperparameter search space is large, and it is impractical to perform an exhaustive search using GridSearchCV. Instead of trying out all possible combinations, RandomizedSearchCV selects a random subset of hyperparameter values and evaluates the model's performance using cross-validation.

Here is a step-by-step overview of how RandomizedSearchCV works:

1. Define the search space: Specify the hyperparameters you want to tune and the corresponding value ranges or distributions. For example, you might define a search space for a decision tree classifier with parameters like the maximum depth, minimum samples split, and maximum features.

2. Set the number of iterations: Determine the number of iterations or samples you want to draw from the search space during the search. This parameter is typically set based on computational resources and time constraints.

3. Define the scoring metric: Specify the evaluation metric that will be used to assess the performance of the model with different hyperparameter settings. Common metrics include accuracy, precision, recall, F1 score, or custom scoring functions.

4. Perform RandomizedSearchCV: RandomizedSearchCV performs a specified number of iterations by randomly sampling hyperparameters from the defined search space. For each combination of hyperparameters, it trains and evaluates the model using cross-validation.

5. Select the best hyperparameters: After all iterations are completed, RandomizedSearchCV returns the best set of hyperparameters based on the scoring metric you specified.

6. Evaluate the model: Once you have obtained the best hyperparameters, you can retrain the model using the entire training dataset with those hyperparameters and evaluate its performance on a separate test set or using other evaluation methods.

In [61]:
from sklearn.model_selection import RandomizedSearchCV

In [62]:
random_clf=RandomizedSearchCV(LogisticRegression(),param_distributions=parameters,cv=5,n_iter=20)

In [63]:
random_clf

RandomizedSearchCV(cv=5, estimator=LogisticRegression(), n_iter=20,
                   param_distributions={'C': [1, 10, 20, 30],
                                        'penalty': ('l1', 'l2', 'elasticnet')})

In [64]:
random_clf.fit(X_train,y_train)

RandomizedSearchCV(cv=5, estimator=LogisticRegression(), n_iter=20,
                   param_distributions={'C': [1, 10, 20, 30],
                                        'penalty': ('l1', 'l2', 'elasticnet')})

In [65]:
random_clf.best_params_

{'penalty': 'l2', 'C': 1}

In [66]:
random_clf.best_score_

0.8087500000000001

In [67]:
classifier = LogisticRegression(C=1,penalty='l2')

In [68]:
classifier.fit(X_train,y_train)

LogisticRegression(C=1)

In [69]:
y_pred = classifier.predict(X_test)

In [70]:
y_pred

array([0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0,
       0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,
       0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1,
       0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1,
       0, 1])

In [73]:
classifier.predict_proba(X_test)

array([[0.71649202, 0.28350798],
       [0.19508969, 0.80491031],
       [0.12418141, 0.87581859],
       [0.05045906, 0.94954094],
       [0.88775659, 0.11224341],
       [0.75067497, 0.24932503],
       [0.97980488, 0.02019512],
       [0.3921745 , 0.6078255 ],
       [0.59920135, 0.40079865],
       [0.39295203, 0.60704797],
       [0.20428696, 0.79571304],
       [0.80257879, 0.19742121],
       [0.86422932, 0.13577068],
       [0.92665682, 0.07334318],
       [0.00131743, 0.99868257],
       [0.04171096, 0.95828904],
       [0.56288536, 0.43711464],
       [0.89322764, 0.10677236],
       [0.29278211, 0.70721789],
       [0.00870994, 0.99129006],
       [0.71879454, 0.28120546],
       [0.5108267 , 0.4891733 ],
       [0.76230298, 0.23769702],
       [0.73170811, 0.26829189],
       [0.10155737, 0.89844263],
       [0.04046512, 0.95953488],
       [0.57926768, 0.42073232],
       [0.00526468, 0.99473532],
       [0.03101648, 0.96898352],
       [0.96093035, 0.03906965],
       [0.

In [74]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [75]:
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[78 13]
 [29 80]]
0.79
              precision    recall  f1-score   support

           0       0.73      0.86      0.79        91
           1       0.86      0.73      0.79       109

    accuracy                           0.79       200
   macro avg       0.79      0.80      0.79       200
weighted avg       0.80      0.79      0.79       200



## Logistic Regression For Multiclass Classification

Logistic regression is a popular algorithm used for binary classification problems. However, it can also be extended to handle multiclass classification problems through various techniques. In this response, I will explain two common approaches: one-vs-rest and softmax regression.

**One-vs-Rest (OvR) Approach: The one-vs-rest approach, also known as one-vs-all, involves training a separate logistic regression model for each class, treating it as the positive class, and considering all other classes as the negative class. During prediction, the model that outputs the highest probability is chosen as the predicted class.**

Here's a step-by-step breakdown of the one-vs-rest approach for multiclass logistic regression:

1. Split your dataset into a training set and a test set.
2. For each class in your target variable:
   a. Create a binary variable where the positive class is the current class, and the negative class is all other classes.
   b. Train a logistic regression model using the binary variable as the target variable and the original features as inputs.
   c. Store the learned parameters (coefficients) for each model.

During prediction:

1. For a new input, pass it through all the trained models.
2. Calculate the predicted probability for each class using the learned parameters.
3. Assign the class with the highest probability as the predicted class.
4. Softmax Regression (Multinomial Logistic Regression):- Softmax regression, also known as multinomial logistic regression, is another approach to handle multiclass classification with logistic regression. Unlike the one-vs-rest approach, softmax regression directly models the probabilities of each class using a single model.

Here's how softmax regression works:

1. Split your dataset into a training set and a test set.
2. Encode your target variable using one-hot encoding, creating a binary matrix where each column represents a class.
3. Train a logistic regression model with multiple outputs (equal to the number of classes).
4. Use the softmax function to convert the outputs into probabilities, ensuring that the sum of probabilities for all classes adds up to 1.

During prediction:

1. For a new input, pass it through the trained model.
2. Calculate the predicted probabilities for each class using the learned parameters and the softmax function.
3. Assign the class with the highest probability as the predicted class.

In [76]:
# make a prediction with a multinomial logistic regression model

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# define dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, n_classes=3, random_state=1)

In [77]:
# define the multinomial logistic regression model For multiclass classification
model = LogisticRegression(multi_class='ovr', solver='lbfgs')

In [83]:
# Train test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [84]:
# trianing model
model.fit(X_train,y_train)

LogisticRegression(multi_class='ovr')

In [85]:
Y_pred = model.predict(X_test)

In [86]:
Y_pred

array([1, 1, 0, 2, 1, 2, 2, 2, 0, 0, 2, 0, 0, 1, 1, 0, 1, 2, 0, 0, 1, 0,
       2, 1, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 0, 2, 1, 0, 0, 0, 1, 0,
       1, 1, 0, 0, 0, 2, 1, 2, 0, 2, 2, 2, 0, 1, 1, 0, 0, 2, 0, 2, 0, 1,
       2, 1, 0, 0, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 0, 1, 0, 1, 2, 0, 2, 2,
       1, 1, 2, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 2, 1,
       0, 1, 1, 2, 2, 2, 1, 2, 0, 0, 2, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
       2, 2, 2, 2, 2, 2, 0, 2, 1, 1, 0, 1, 2, 0, 1, 0, 2, 0, 0, 1, 1, 0,
       2, 0, 1, 0, 2, 1, 2, 0, 1, 1, 0, 1, 2, 1, 0, 2, 0, 1, 1, 2, 1, 0,
       2, 1, 1, 0, 1, 0, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       1, 0])

In [87]:
model.predict_proba(X_test)

array([[9.10622335e-02, 5.88916888e-01, 3.20020878e-01],
       [7.74864218e-02, 8.83936419e-01, 3.85771592e-02],
       [6.84414416e-01, 2.03496841e-01, 1.12088744e-01],
       [1.43095699e-02, 9.01488946e-02, 8.95541536e-01],
       [6.87497205e-02, 8.36642893e-01, 9.46073865e-02],
       [1.09007702e-01, 3.07621844e-01, 5.83370453e-01],
       [3.78594000e-01, 2.08265604e-01, 4.13140396e-01],
       [3.14084906e-01, 1.75940674e-01, 5.09974420e-01],
       [5.14671931e-01, 1.90311178e-01, 2.95016891e-01],
       [5.97817613e-01, 3.02410459e-01, 9.97719283e-02],
       [4.20115939e-02, 3.31570100e-01, 6.26418306e-01],
       [4.86740270e-01, 1.04804858e-01, 4.08454873e-01],
       [8.01115911e-01, 3.37497449e-02, 1.65134344e-01],
       [8.98140961e-02, 6.97976196e-01, 2.12209707e-01],
       [7.29023255e-02, 8.39105805e-01, 8.79918698e-02],
       [5.10619176e-01, 2.09225440e-01, 2.80155384e-01],
       [1.24309039e-01, 7.90819633e-01, 8.48713286e-02],
       [3.42847759e-03, 3.71703

In [88]:
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

In [89]:
print(confusion_matrix(y_pred,y_test))
print(accuracy_score(y_pred,y_test))
print(classification_report(y_pred,y_test))

[[38 34 35]
 [37 30 26]
 [ 0  0  0]]
0.34
              precision    recall  f1-score   support

           0       0.51      0.36      0.42       107
           1       0.47      0.32      0.38        93
           2       0.00      0.00      0.00         0

    accuracy                           0.34       200
   macro avg       0.33      0.23      0.27       200
weighted avg       0.49      0.34      0.40       200

