## Content


- Hyperparameter Tuning

- LogOdds and Logit
  - [Odds of tails](https://www.scaler.com/hire/test/problem/23537/)
  - [Odds of Drinking](https://www.scaler.com/hire/test/problem/20190/)
  - [Red or Black Jacket](https://www.scaler.com/hire/test/problem/23572/)
  - [Implement the model](https://www.scaler.com/hire/test/problem/23535/) (hold)
  - [Sigmoid Function and logistic model](https://www.scaler.com/hire/test/problem/20188/) (hold)

- Impact of outliers
  - [Impact of Outliers on Logistic Regression](https://www.scaler.com/hire/test/problem/29675/)

- Multiclass classification
  - [One-vs-Rest](https://www.scaler.com/hire/test/problem/24776/) (hold)
  - [Multiclass classification II](https://www.scaler.com/hire/test/problem/24777/)
  - [Fruits/Vegetables](https://www.scaler.com/hire/test/problem/20238/)
  - [One Vs Rest MCQ](https://www.scaler.com/hire/test/problem/29671/)
  - [One vs Rest](https://www.scaler.com/hire/test/problem/24778/)

- Extra (HW):
  - [Logistic regression assumptions](https://www.scaler.com/hire/test/problem/16045/)


Let's see how we can perform hyperparameter tuning on our logistic regression model


<font color='red'>Instructor Note</font>

If possible, show the documentation page of sklearn's logistic regression, and explain a few parameters before going into hyperparameter tuning

## **Hyperparameter tuning**

We will tune the regularization strength of our model.

You can refer to the documentation for the full list of parameters in logistic regression.

Link: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

sklearn does not take $\lambda$ directly; it takes its inverse, $C = \frac{1}{\lambda}$, so a larger $\lambda$ (stronger regularization) means a smaller $C$.

Let's tune the parameter $C = \frac{1}{\lambda}$ to improve the performance of the model

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# accuracy(), X_train, y_train, X_val and y_val are assumed to be defined earlier in the notebook
train_scores = []
val_scores = []
scaler = StandardScaler()
for la in np.arange(0.01, 5000.0, 100): # range of values of Lambda
  # standardize the features, then fit with C = 1/lambda
  scaled_lr = make_pipeline(scaler, LogisticRegression(C=1/la))
  scaled_lr.fit(X_train, y_train)
  train_score = accuracy(y_train, scaled_lr.predict(X_train))
  val_score = accuracy(y_val, scaled_lr.predict(X_val))
  train_scores.append(train_score)
  val_scores.append(val_score)

Now, let's plot the graph and pick the Regularization Parameter $λ$ which gives the best validation score

In [None]:
plt.figure(figsize=(10,5))
plt.plot(list(np.arange(0.01, 5000.0, 100)), train_scores, label="train")
plt.plot(list(np.arange(0.01, 5000.0, 100)), val_scores, label="val")
plt.legend(loc='lower right')

plt.xlabel("Regularization Parameter(λ)")
plt.ylabel("Accuracy")
plt.grid()
plt.show()


- We see how the validation accuracy increases to a peak and then decreases

- Notice that as regularization increases further, the accuracy decreases, since the model is moving towards underfitting
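The peak can also be located programmatically instead of being read off the plot. The following is a minimal sketch, not the lecture's own code: it assumes a synthetic dataset from `make_classification` in place of the lecture data, and sklearn's `accuracy_score` in place of the notebook's `accuracy` helper. The names `X_demo`, `val_scores_demo` and `best_lambda` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the lecture's dataset)
X_demo, y_demo = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

lambdas = np.arange(0.01, 5000.0, 100)  # same lambda grid as in the loop above
val_scores_demo = []
for la in lambdas:
    pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1 / la))
    pipe.fit(X_tr, y_tr)
    val_scores_demo.append(accuracy_score(y_va, pipe.predict(X_va)))

# index of the highest validation accuracy (first one in case of ties)
best_idx = int(np.argmax(val_scores_demo))
best_lambda = lambdas[best_idx]
print(f"best lambda = {best_lambda:.2f}, validation accuracy = {val_scores_demo[best_idx]:.3f}")
```

Note that `np.argmax` returns the first maximum, so when several values of λ tie, this picks the smallest of them, which may or may not be the preferred tie-break.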

Let's take the lambda value as 1000 for this data and check the results

In [None]:
model = LogisticRegression(C=1/1000)
model.fit(X_train, y_train)

In [None]:
accuracy(y_train, model.predict(X_train))

In [None]:
accuracy(y_val, model.predict(X_val))

We can observe an increase of about 0.01 (1 percentage point) in accuracy on both the training and validation data

Let's check our model for test data too

In [None]:
accuracy(y_test, model.predict(X_test))

**Quiz** (Check your understanding)
```
What is the effect of increasing C (the inverse of the regularization strength, C = 1/λ) in logistic regression?
a) The model becomes less prone to overfitting
b) The model's training accuracy increases
c) The model becomes more prone to overfitting
d) The model's test accuracy increases

Answer: c) The model becomes more prone to overfitting

Explanation:
Since C = 1/λ, increasing C in logistic regression reduces the impact of regularization, making the model more prone to overfitting.
```



**Quiz** (Check your understanding)
```
How does C (the inverse of the regularization strength, C = 1/λ) affect the magnitude of the model coefficients in logistic regression?
a) Higher C results in larger coefficient values
b) Higher C results in smaller coefficient values
c) C has no impact on the magnitude of the coefficients
d) The effect of C on the coefficients depends on the dataset

Answer: a) Higher C results in larger coefficient values

Explanation:
Since C = 1/λ, a higher C means a weaker penalty on large coefficients during training,
so the model is free to fit larger coefficient magnitudes.
A lower C (higher λ) increases the penalty and pushes the model to shrink the coefficients to reduce the loss.
```
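To see how C affects coefficient magnitudes in practice, here is a small sketch that fits the same model at several values of C and prints the norm of the weight vector. Recall that $C = \frac{1}{\lambda}$, so a larger C means a weaker penalty. The synthetic data and the names `X_demo` and `coef_norms` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, not the lecture's dataset)
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=1)
X_demo = StandardScaler().fit_transform(X_demo)

coef_norms = {}
for C in [0.001, 0.1, 10.0]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    # L2 norm of the learned weights (the intercept is not penalized and not included)
    coef_norms[C] = float(np.linalg.norm(clf.coef_))
    print(f"C = {C:>6}: ||w|| = {coef_norms[C]:.3f}")
```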

## Logit/ Log odds


**Quiz** (What do you think?)

```
The logistic regression algorithm estimates the parameters by maximizing the:
a) Sum of squared errors
b) Mean squared error
c) Likelihood function
d) Cross-entropy loss

Ans: c) Likelihood function

Explanation:
The logistic regression algorithm estimates the parameters by maximizing the likelihood function, or equivalently by minimizing the negative log-likelihood (the cross-entropy loss).
The likelihood function measures how well the chosen parameters fit the observed data.
```

### Log odds interpretation of logistic regression


<img src='https://drive.google.com/uc?id=1z-0qkx0h81U_iwb7fVeFQG0RVkpqyPGy' width=800>





<img src='https://drive.google.com/uc?id=1mruiW2aBWCEMjW74WtAC3_AQoeDZ4EdJ' width=800>



#### Which concept of earlier is this similar to?

Remember, the sigmoid output $σ(z)$ was also defined as a probability.

So if we treat winning/losing as belonging to class 1/0, then $σ(z)$ here gives the probability of belonging to class 1 (the winning class)

**Quiz** (What do you think?)

```
The logistic regression model predicts:
a) Probabilities
b) Class labels
c) Continuous values
d) Ordinal values

Ans: a) Probabilities

Explanation:
Logistic regression predicts the probabilities of the target variable belonging to the positive class
```


<img src='https://drive.google.com/uc?id=1Xpm2xAc1oT95bAzZvRQPUobikSRR2Fgs' width=800>



<img src='https://drive.google.com/uc?id=1XWM57akV5CFtG8JypxDELnpNokU6nLco' width=800>

#### What does this mean geometrically?



<img src='https://drive.google.com/uc?id=17CVyUuT9ZLlsqgWhsyKUChPP0o6Nlw33' width=800>

**Quiz** (Check your understanding)

```
If log(odds) is negative, which of the options hold true?

a. 1-p > p
b. p > 1-p
c. p == 1-p

Ans: a. 1-p > p

Explanation: since odds = p/(1-p), a negative log value means p/(1-p) < 1, which means 1-p > p
```
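A quick numeric check of the relationship between the probability, the odds and the sign of the log odds. This is a small illustrative sketch; the helper name `odds` and the sample probabilities are chosen here and are not part of the lecture.

```python
import numpy as np

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

for p in [0.2, 0.5, 0.8]:
    # p < 0.5 gives odds < 1 and negative log odds; p > 0.5 gives the opposite
    print(f"p = {p}: odds = {odds(p):.2f}, log odds = {np.log(odds(p)):.2f}")
```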


<img src='https://drive.google.com/uc?id=1F7pWJ-_hmPbEe7LgaJhC9VESNrx0Y24x' width=800>

To find the probability of the point belonging to class 1, we exponentiate both sides of $\log\frac{p}{1-p} = z$ and solve for $p$, which gives:

$p=\frac{1}{1+e^{-z}}$

Note: Sigmoid and Logit are just inverses of each other, and either can be used to build a logistic regression model
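The inverse relationship can be verified numerically. A minimal sketch, with the helper names `sigmoid` and `logit` defined here for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) to its log odds."""
    return np.log(p / (1.0 - p))

z = np.linspace(-5, 5, 11)
p = sigmoid(z)

# Applying one function after the other recovers the original values
print(np.allclose(logit(p), z))
print(np.allclose(sigmoid(logit(p)), p))
```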

**Quiz** (Check your understanding)

```
What is the range of log odds in logistic regression?
a) (0, 1)
b) (-∞, ∞)
c) [0, 1]
d) [0, ∞)

Ans: b) (-∞, ∞)

Explanation:
The log odds in logistic regression can take any real value ranging from negative infinity to positive infinity.
This is because it is the logarithm of the odds, and the odds can take any positive value in (0, ∞).
```

**Quiz** (Check your understanding)

```
How are log odds transformed into probabilities in logistic regression?
a) By applying the sigmoid function
b) By taking the exponential function
c) By dividing by the odds ratio
d) By subtracting the intercept term

Ans: a) By applying the sigmoid function

Explanation:
The sigmoid function maps the log odds to a value between 0 and 1
```
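The same transformation can be observed on a fitted sklearn model: for a binary problem, `decision_function` returns the linear score $z$ (the log odds), and applying the sigmoid to it should reproduce the class-1 column of `predict_proba`. A small sketch on synthetic data (the data and variable names are assumptions for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (an assumption, not the lecture's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X_demo, y_demo)

log_odds = clf.decision_function(X_demo)            # z = w.x + b for each point
probs_from_sigmoid = 1.0 / (1.0 + np.exp(-log_odds))
probs_from_sklearn = clf.predict_proba(X_demo)[:, 1]  # P(class 1)

print(np.allclose(probs_from_sigmoid, probs_from_sklearn))
```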

## Impact of outliers


<img src='https://drive.google.com/uc?id=1aQk_WFojHob2thbycSBBC1hXx2cIL2Lh' width=800>



### Case I: When the outlier lies on the correct side

Now, $\hat{y}=σ(z^i)$


<img src="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/036/753/original/image_2023-06-14_052158593.png?1686700322" height=500 width=600>


<img src='https://drive.google.com/uc?id=1iDeFLogS9rCNs1WiELMsFoRMIx_jRHZ8' width=800>



Since the loss is very small in this case:

=> The impact of the outlier is **very small**

### Case II: When the outlier is on the opposite/wrong side


<img src='https://drive.google.com/uc?id=1SKv32h8SUGk4pbOuS6XQnCv20LMnUV6V' width=800>


Let's say $z^i=-4.3$ for an outlier whose true label is 1

So $\hat{y}$ becomes approximately 0.01

Therefore, $L = -\log_e(0.01)$

This comes out to approximately 4.6, which is a very large value

=> The impact of outlier will be **very high**
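The two cases can be compared numerically. The sketch below assumes a true label of 1, uses $z=-4.3$ for the wrong-side outlier as above, and picks $z=+4.3$ as a hypothetical mirror-image value for the correct-side case; the helper names are illustrative. Without rounding $\hat{y}$ to 0.01, the wrong-side loss comes out a little above 4, in line with the rounded figure quoted above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_single(y, z):
    """Log loss of one point with true label y and linear score z."""
    y_hat = sigmoid(z)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

loss_correct_side = log_loss_single(1, 4.3)   # outlier far on the correct side
loss_wrong_side = log_loss_single(1, -4.3)    # outlier far on the wrong side

print(f"correct side: y_hat = {sigmoid(4.3):.4f}, loss = {loss_correct_side:.4f}")
print(f"wrong side:   y_hat = {sigmoid(-4.3):.4f}, loss = {loss_wrong_side:.4f}")
```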

Thus the best approach is to find the outliers and remove them, so that we get accurate results

**Quiz** (Check your understanding)

```
How do outliers affect the classification boundaries in logistic regression?
a) Outliers shift the classification boundaries closer to the outlier values
b) Outliers have no effect on the classification boundaries
c) Outliers widen the gap between the classification boundaries
d) Outliers make the classification boundaries more sensitive to minor changes

Answer: a) Outliers shift the classification boundaries closer to the outlier values
```

## Multi-class classification

Till now we have seen how to use logistic regression to classify between two classes.

But in the real world there will be cases with many more classes.

#### How can we use logistic regression in cases with more than two output classes?


<img src='https://drive.google.com/uc?id=1ZXmXc62oRRLsGOxNVvHi4GWITISWvL16' width=800>




<img src='https://drive.google.com/uc?id=1MSTuz_D9AJUZlHgDqMwQsBsyTLAE2gE7' width=800>



To train these models, we can't use the target column as it is, since it contains three classes while each model is a binary classifier.

So we will modify the target column for each of the three models.

Say for model 1, to check whether the input is orange or not,
- Our output column will be modified by replacing the value orange with 1, and all other values with 0

We will do the same for the other two models
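The relabelling step can be sketched as follows: for each class, build a binary target and fit one logistic regression on it. This is a minimal illustration on synthetic three-class data (standing in for the fruit example); the names `X_demo`, `y_demo` and `binary_models` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data (an assumption, standing in for the fruit example)
X_demo, y_demo = make_classification(n_samples=300, n_features=2, n_classes=3,
                                     n_redundant=0, n_clusters_per_class=1,
                                     random_state=5)

binary_models = {}
for k in np.unique(y_demo):
    # class k becomes 1, every other class becomes 0
    y_binary = (y_demo == k).astype(int)
    binary_models[k] = LogisticRegression().fit(X_demo, y_binary)

print(f"trained {len(binary_models)} binary models, one per class")
```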


<img src='https://drive.google.com/uc?id=1xCJJoF5j0HJILD0xfhI6hA_1RqwoefHz' width=800>




<img src='https://drive.google.com/uc?id=15kHWLomnIvIkr6EmzB1EiDpAddlOQ-q2' width=800>



**Quiz** (Check your understanding)

```
We want to classify cars based on 20 different brands.
How many logistic regression models will we need?

a. 10
b. 20
c. 21
d. 19

Ans: b. 20

Explanation: if we have yi = {1,2,3...K} in the dataset, we have to train K binary classifier models.
```

#### Now given an input point, how to predict which class it belongs to?



<img src='https://drive.google.com/uc?id=1RTcgUwMq12FlqHJBH3l0jl91mbfCMQxv' width=800>



**Quiz** (Check your understanding)

```
For three models, the yhat values come to be:
M1=0.34
M2=0.28
M3=0.35

What would be the predicted output class by the classifier?

a. M1
b. M2
c. M3
d. None since no model has yhat>0.5

Ans: c. M3

Explanation: The model with the highest yhat value will be chosen irrespective of whether they are greater than the threshold or not
```
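The prediction rule from the quiz amounts to an argmax over the per-model outputs. A tiny sketch using the quiz values:

```python
import numpy as np

# y_hat values produced by the three one-vs-rest models in the quiz
y_hats = np.array([0.34, 0.28, 0.35])
model_names = ["M1", "M2", "M3"]

# Pick the model with the highest y_hat, regardless of the 0.5 threshold
predicted = model_names[int(np.argmax(y_hats))]
print(predicted)  # M3
```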

**Quiz** (Check your understanding)

```
What is the purpose of the one-vs-rest (OvR) strategy in multi-class logistic regression?
a) To improve the interpretability of the model coefficients
b) To handle imbalanced datasets in multi-class problems
c) To reduce the complexity of the model
d) To transform a multi-class problem into multiple binary classification problems
Answer: d) To transform a multi-class problem into multiple binary classification problems

Explanation:
The one-vs-rest (OvR) strategy is used in multi-class logistic regression to transform the multi-class problem into multiple binary classification problems.
We build n models for n classes, where each model treats one class as the positive class and the rest of the classes as the negative class.

```

Let's see an implementation of the same using sklearn

### Sklearn Code implementation for MultiClass Classification

Importing libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression

from sklearn.inspection import DecisionBoundaryDisplay

Creating some data with multiple classes

In [None]:
# dataset creation with 3 classes
from sklearn.datasets import make_classification

X, y = make_classification(n_samples= 498,
                           n_features= 2,
                           n_classes = 3,
                           n_redundant=0,
                           n_clusters_per_class=1,
                           random_state=5)
y=y.reshape(len(y), 1)

print(X.shape, y.shape)

Plotting the data

In [None]:
plt.scatter(X[:, 0], X[:, 1], c = y)
plt.show()


Splitting the data into train, validation and test sets

In [None]:
from sklearn.model_selection import train_test_split

X_tr_cv, X_test, y_tr_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
X_train, X_val, y_train, y_val = train_test_split(X_tr_cv, y_tr_cv, test_size=0.25,random_state=4)
X_train.shape

In [None]:
y_train = y_train.reshape(-1) # making 1D vector

Training the One-vs-Rest Logistic Regression model

In [None]:
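# 'ovr' fits one binary classifier per class ('auto' would choose multinomial with the default solver).
# Newer scikit-learn releases deprecate multi_class; OneVsRestClassifier is the suggested replacement.
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X_train, y_train)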

Checking the Accuracy of Training, validation and Test dataset

In [None]:
print(f'Training Accuracy:{model.score(X_train,y_train)}')
print(f'Validation Accuracy :{model.score(X_val,y_val)}')
print(f'Test Accuracy:{model.score(X_test,y_test)}')

Plotting the decision surface and the hyperplanes of the OvR Logistic Regression for the entire data

In [None]:
X

In [None]:
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(model, X, response_method="predict", cmap=plt.cm.Paired, ax=ax)
plt.title("Decision surface of LogisticRegression")
plt.axis("tight")

# Plot also the data points
colors = "bry"
y_flat = y.ravel()  # y was reshaped to a column vector above; flatten it for indexing
for i, color in zip(model.classes_, colors):
        idx = np.where(y_flat == i)[0]
        plt.scatter(
            X[idx, 0], X[idx, 1], c=color, edgecolor="black", s=20
        )


# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = model.coef_
intercept = model.intercept_

def plot_hyperplane(c, color):
        def line(x0):
            return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]

        plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)

for i, color in zip(model.classes_, colors):
        plot_hyperplane(i, color)

plt.show()

**Observe**

We can see how One-vs-Rest Logistic Regression is able to classify Multi-class Classification data
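One-vs-Rest can also be requested explicitly by wrapping a binary logistic regression in sklearn's `OneVsRestClassifier`, which exposes the per-class models through `estimators_`. The following is a hedged sketch on freshly generated synthetic data (the `_demo` names are illustrative), not a replacement for the code above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 3-class data, generated with the same settings as earlier in this section
X_demo, y_demo = make_classification(n_samples=498, n_features=2, n_classes=3,
                                     n_redundant=0, n_clusters_per_class=1,
                                     random_state=5)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X_demo, y_demo)

# One fitted binary LogisticRegression per class
print(len(ovr.estimators_))
for cls, est in zip(ovr.classes_, ovr.estimators_):
    print(f"class {cls}: coef = {est.coef_.ravel()}, intercept = {est.intercept_}")

print(f"training accuracy: {ovr.score(X_demo, y_demo):.3f}")
```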

**Extra Quizzes**

**Quiz** (What do you think?)
```
Which evaluation metric is commonly used to assess the performance of a logistic regression model?
a) Mean squared error
b) R-squared value
c) Accuracy
d) Root mean squared error

Answer: c) Accuracy
Explanation:
Accuracy is a commonly used metric to check the performance of a logistic regression model.
It measures the ratio of correctly predicted datapoints out of the total number of datapoints.
```
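Accuracy as described in the explanation can be computed by hand and compared with sklearn's `accuracy_score`. The labels below are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 0])

# Fraction of points whose predicted label matches the true label
manual_accuracy = np.mean(y_true == y_pred)
print(manual_accuracy, accuracy_score(y_true, y_pred))
```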

**Quiz** (Check your understanding)

```
Logistic regression assumes that the relationship between the independent variables and the log-odds of the dependent variable is:
a) Exponential
b) Quadratic
c) Non-linear
d) Linear
Answer: d) Linear
```

**Quiz** (What do you think?)

```
How is the loss function typically defined in multi-class logistic regression?
a) Cross-entropy loss
b) Mean squared error (MSE)
c) Mean absolute error (MAE)
d) Hinge loss

Answer: a) Cross-entropy loss
```
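For reference, multi-class cross-entropy is the average negative log of the probability assigned to the true class. The sketch below computes it by hand and compares it with sklearn's `log_loss`; the labels and probabilities are made up for illustration.

```python
import numpy as np
from sklearn.metrics import log_loss

# Made-up true classes and predicted class probabilities (each row sums to 1)
y_true = np.array([0, 2, 1])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])

# Pick out the probability assigned to the true class of each sample
true_class_probs = probs[np.arange(len(y_true)), y_true]
manual_ce = -np.mean(np.log(true_class_probs))

print(manual_ce, log_loss(y_true, probs, labels=[0, 1, 2]))
```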