# LOGISTIC REGRESSION
## Logistic regression is very similar to linear regression, which is used to predict a continuous value, but here we classify objects into different classes or categories. Logistic regression is used when the dependent variable (target) is categorical.
### For example,
### To predict whether an email is spam (1) or not (0)
### Whether the tumor is malignant (1) or not (0)
### Iris flower Classification

### Consider a scenario where we need to classify whether a tumor is malignant or not. If we use linear regression for this problem, we need to set a threshold on which the classification is based. Say the actual class is malignant, the predicted continuous value is 0.4 and the threshold value is 0.5: the data point will be classified as not malignant, which can lead to serious consequences in practice.
### From this example, it can be inferred that linear regression is not suitable for classification problems. The output of linear regression is unbounded, and this brings logistic regression into the picture: its output always lies between 0 and 1.
![image.png](attachment:image.png)


In [41]:
## importing all the necessary libraries like numpy , pandas and matplotlib

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
import random

# Types of Classification:

## 1. Binary Classification
### For the malignant problem we have two possibilities, i.e. the tumor is malignant (1) or not (0), and similarly an email is spam (1) or not (0). This type of classification is called binary classification.
### For binary classification we need a function that maps any real value to a number between 0 and 1, which can then be used to distinguish between the two categories. This function is the sigmoid function. There are other activation functions such as ReLU and Tanh, which you can explore through materials available online.
![image.png](attachment:image.png)
## 2. Multinomial Classification 
### For Iris flower classification we have three classes (Iris-setosa, Iris-versicolor, Iris-virginica).
### This classification can be converted into three binary classifications: the given input is Iris-setosa (1) or not (0), Iris-versicolor (1) or not (0), Iris-virginica (1) or not (0). This approach is known as one-vs-rest.
### This makes our work easy.

In [42]:
## In python it is defined as :
def sigmoid(X):
    return (1.0/(1.0+np.exp(-X)))
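
### A minimal sketch of how the sigmoid behaves, assuming the same definition as in the cell above. The sample inputs in z are arbitrary values chosen only for illustration.

```python
import numpy as np

def sigmoid(X):
    # Same definition as in the cell above
    return 1.0 / (1.0 + np.exp(-X))

# Arbitrary sample inputs, from strongly negative to strongly positive
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z)
print(probs)

# sigmoid(0) is exactly 0.5, which is why 0.5 is a natural threshold
midpoint = sigmoid(0.0)

# Symmetry: sigmoid(-z) equals 1 - sigmoid(z)
symmetric = np.allclose(sigmoid(-z), 1.0 - sigmoid(z))
print(midpoint, symmetric)
```

### Every output lies strictly between 0 and 1 and grows with the input, so it can be read as an estimated probability of class 1.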

## Cost Function and Decision Boundary
### A cost function measures the performance of a machine learning model for given data. It quantifies the error between predicted values and expected values and presents it in the form of a single real number. Depending on the problem, the cost function can be formed in many different ways. The purpose of a cost function is to be either:

### Minimized - then the returned value is usually called cost, loss or error. The goal is to find the values of the model parameters for which the cost function returns as small a number as possible.

### Maximized - then the value it yields is named a reward. The goal is to find the values of the model parameters for which the returned number is as large as possible.

### For algorithms relying on gradient descent to optimize model parameters, the cost function has to be differentiable.

### Now to predict which class a data point belongs to, a threshold can be set. Based on this threshold, the estimated probability is mapped to a class.
### Say, if predicted_value ≥ 0.5, then classify the email as spam, else as not spam.
### The decision boundary can be linear or non-linear. The polynomial order of the features can be increased to get a more complex decision boundary.

![image.png](attachment:image.png)
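
### The thresholding rule described above can be sketched as follows. The probabilities in predicted are made-up values for five hypothetical emails, not output of the model in this notebook.

```python
import numpy as np

# Hypothetical predicted probabilities for five emails
predicted = np.array([0.10, 0.49, 0.50, 0.73, 0.98])
threshold = 0.5

# 1 = spam, 0 = not spam
labels = (predicted >= threshold).astype(int)
print(labels)  # [0 0 1 1 1]

# Because sigmoid(0) = 0.5, thresholding the probability at 0.5
# is the same as checking the sign of the input z of the sigmoid
z = np.array([-2.0, 0.0, 3.0])
same_rule = np.array_equal(1.0 / (1.0 + np.exp(-z)) >= 0.5, z >= 0)
print(same_rule)
```

### This is why a model whose z is linear in the features has a linear decision boundary: the boundary is the set of points where z = 0.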

## Why can the cost function used for linear regression not be used for logistic regression?
### Linear regression uses mean squared error as its cost function. If this is used for logistic regression, it becomes a non-convex function of the parameters (theta). Gradient descent is guaranteed to converge to the global minimum only if the function is convex; on a non-convex function it can get stuck in a local minimum.



![image.png](attachment:image.png)

## Simplified Cost Function 


![image.png](attachment:image.png)

In [43]:
## Here we defined our required cost function:
def cost_fun(y,h):
    m=h.shape[0]
    return np.sum((-y*np.log(h))-((1-y)*np.log(1-h)))/m
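
### A small usage sketch for cost_fun, assuming the definition above. The arrays y_true, good and bad are made-up values, and the clipping constant eps is an assumption added here to avoid log(0); it is not part of the original function.

```python
import numpy as np

def cost_fun(y, h):
    # Same definition as in the cell above
    m = h.shape[0]
    return np.sum((-y * np.log(h)) - ((1 - y) * np.log(1 - h))) / m

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good = np.array([0.9, 0.1, 0.8, 0.2])   # confident and correct
bad = np.array([0.2, 0.8, 0.1, 0.9])    # confident and wrong

cost_good = cost_fun(y_true, good)
cost_bad = cost_fun(y_true, bad)
print(cost_good, cost_bad)

# If h contains exactly 0 or 1, np.log(0) appears in the formula.
# Clipping h slightly away from 0 and 1 keeps the cost finite.
eps = 1e-12
h_extreme = np.array([1.0, 0.0, 1.0, 0.0])
cost_clipped = cost_fun(y_true, np.clip(h_extreme, eps, 1 - eps))
print(cost_clipped)
```

### The cost is small when the predictions agree with the labels and grows quickly when the model is confident and wrong.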

In [44]:
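## There is a minor difference between cost and loss: loss usually refers to the cost of a single data point.
## Note: despite its name, the function below does not compute that loss. It returns
## sigmoid(X)*(1-sigmoid(X)), which is the derivative of the sigmoid function with respect to X.
def loss(X):
    return (sigmoid(X)*(1-sigmoid(X)))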

## Gradient Descent 
### Gradient Descent is the most common optimization algorithm in machine learning and deep learning. It is a first-order optimization algorithm, which means it only takes the first derivative into account when updating the parameters. On each iteration, we update the parameters in the opposite direction of the gradient of the objective function J(w) with respect to the parameters, because the gradient gives the direction of steepest ascent. The size of the step we take on each iteration is determined by the learning rate α. Therefore, we follow the direction of the slope downhill until we reach a local minimum.

### Let's suppose we have a dataset that depends on only two factors; name them x1 and x2. For that case:
![image.png](attachment:image.png)

### Here z is our function and L(ŷ, y) is our cost function. By differentiating the loss we get the gradient values for those two parameters.

![image.png](attachment:image.png)

### Now we have a general formula for the gradient of each parameter.
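
### A gradient formula can be checked numerically. The sketch below assumes the cross-entropy cost used in this notebook and a model with two weights and no bias term; under that assumption the gradient is X^T (h - y) / m. The data is random and the names cost, gradient, analytic and numeric are introduced only for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # Mean cross-entropy cost for weights w
    h = sigmoid(X @ w)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(w, X, y):
    # Analytic gradient: X^T (h - y) / m
    h = sigmoid(X @ w)
    return X.T @ (h - y) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                 # two features x1, x2
y = (rng.random(20) < 0.5).astype(float)     # random 0/1 labels
w = rng.normal(size=2)

# Central finite differences, one parameter at a time
eps = 1e-6
numeric = np.zeros_like(w)
for k in range(w.size):
    step = np.zeros_like(w)
    step[k] = eps
    numeric[k] = (cost(w + step, X, y) - cost(w - step, X, y)) / (2 * eps)

analytic = gradient(w, X, y)
print(analytic, numeric)
```

### If the two printed vectors agree to several decimal places, the analytic formula matches the slope of the cost function.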

# Now we will build a logistic regression model from scratch on the Iris flower dataset
#### Just follow the steps below and try to understand each of them.

### Here we import the Iris dataset, which I have downloaded. You can download it from the given link:
### https://github.com/shubhamkeshri1621/Iris_flower_clssification/blob/master/IRIS.csv

In [45]:
iris_data = pd.read_csv(r"C:\Users\SHUBHAM\Downloads\IRIS.csv")
##iris_data.tail(20) stores the dataset

### Convert the dataset from tabular form into a NumPy array

In [46]:
iris_data = np.array(iris_data)
print(iris_data)


[[5.1 3.5 1.4 0.2 'Iris-setosa']
 [4.9 3.0 1.4 0.2 'Iris-setosa']
 [4.7 3.2 1.3 0.2 'Iris-setosa']
 [4.6 3.1 1.5 0.2 'Iris-setosa']
 [5.0 3.6 1.4 0.2 'Iris-setosa']
 [5.4 3.9 1.7 0.4 'Iris-setosa']
 [4.6 3.4 1.4 0.3 'Iris-setosa']
 [5.0 3.4 1.5 0.2 'Iris-setosa']
 [4.4 2.9 1.4 0.2 'Iris-setosa']
 [4.9 3.1 1.5 0.1 'Iris-setosa']
 [5.4 3.7 1.5 0.2 'Iris-setosa']
 [4.8 3.4 1.6 0.2 'Iris-setosa']
 [4.8 3.0 1.4 0.1 'Iris-setosa']
 [4.3 3.0 1.1 0.1 'Iris-setosa']
 [5.8 4.0 1.2 0.2 'Iris-setosa']
 [5.7 4.4 1.5 0.4 'Iris-setosa']
 [5.4 3.9 1.3 0.4 'Iris-setosa']
 [5.1 3.5 1.4 0.3 'Iris-setosa']
 [5.7 3.8 1.7 0.3 'Iris-setosa']
 [5.1 3.8 1.5 0.3 'Iris-setosa']
 [5.4 3.4 1.7 0.2 'Iris-setosa']
 [5.1 3.7 1.5 0.4 'Iris-setosa']
 [4.6 3.6 1.0 0.2 'Iris-setosa']
 [5.1 3.3 1.7 0.5 'Iris-setosa']
 [4.8 3.4 1.9 0.2 'Iris-setosa']
 [5.0 3.0 1.6 0.2 'Iris-setosa']
 [5.0 3.4 1.6 0.4 'Iris-setosa']
 [5.2 3.5 1.5 0.2 'Iris-setosa']
 [5.2 3.4 1.4 0.2 'Iris-setosa']
 [4.7 3.2 1.6 0.2 'Iris-setosa']
 [4.8 3.1 

### The next step is to pick 120 random rows from the dataset for training, so that the model does not see the same rows in the same order on every execution. The rest of the data (30 rows) is kept for validation. Note that the list of training indices is stored in a variable named test.

In [47]:
test = random.sample(range(0,150),120)
print(test) 


[91, 65, 89, 123, 122, 131, 49, 12, 120, 69, 51, 61, 96, 142, 19, 76, 54, 83, 80, 72, 66, 0, 68, 7, 36, 1, 148, 8, 145, 9, 44, 124, 37, 87, 138, 67, 147, 64, 128, 113, 115, 11, 39, 25, 18, 56, 143, 42, 35, 137, 5, 102, 41, 2, 75, 30, 119, 55, 126, 57, 74, 117, 82, 141, 60, 118, 125, 70, 10, 85, 101, 28, 86, 106, 134, 32, 146, 114, 63, 52, 23, 24, 103, 129, 95, 14, 116, 136, 110, 140, 93, 31, 108, 88, 40, 92, 20, 46, 121, 84, 77, 111, 90, 33, 81, 79, 99, 104, 47, 34, 109, 98, 71, 127, 149, 53, 107, 21, 26, 130]


In [48]:
validation = [i for i in range(150) if i not in test]
print(validation)

[3, 4, 6, 13, 15, 16, 17, 22, 27, 29, 38, 43, 45, 48, 50, 58, 59, 62, 73, 78, 94, 97, 100, 105, 112, 132, 133, 135, 139, 144]

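
### The split above changes on every run. As a hedged alternative, the same 120/30 split can be made reproducible with a seeded NumPy generator. The seed value 42 and the names train_idx and val_idx are arbitrary choices for this sketch, not part of the original notebook.

```python
import numpy as np

# A fixed seed gives the same shuffle on every run
rng = np.random.default_rng(42)
indices = rng.permutation(150)   # shuffled row indices 0..149

train_idx = indices[:120]        # first 120 shuffled rows for training
val_idx = indices[120:]          # remaining 30 rows for validation

print(len(train_idx), len(val_idx))  # 120 30
```

### Since both parts come from one permutation, no row can appear in both sets.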

### y stores the values of the column at index 4 (the fifth column), which holds the class label. x stores the first four columns (selected with :4), which are the features. Our dataset has 4 features and 1 label column.

In [49]:
y = iris_data[:,4]
print(y)

['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor

In [50]:
x=np.array(iris_data[:,:4],dtype=float)
print(x)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.1 1.5 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.

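### Now we subtract the mean from the data so that the values are centered around zero. This is a simple form of FEATURE SCALING. It is useful because different features have different ranges of values, so for plotting and visualization it helps to bring all the features to a similar range. Note that np.mean(x) without an axis argument computes a single mean over all entries (about 3.46 here), so the same value is subtracted from every feature; np.mean(x, axis=0) would give one mean per feature.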

In [51]:
x = x-np.mean(x)
print(x) 

[[ 1.63633333  0.03633333 -2.06366667 -3.26366667]
 [ 1.43633333 -0.46366667 -2.06366667 -3.26366667]
 [ 1.23633333 -0.26366667 -2.16366667 -3.26366667]
 [ 1.13633333 -0.36366667 -1.96366667 -3.26366667]
 [ 1.53633333  0.13633333 -2.06366667 -3.26366667]
 [ 1.93633333  0.43633333 -1.76366667 -3.06366667]
 [ 1.13633333 -0.06366667 -2.06366667 -3.16366667]
 [ 1.53633333 -0.06366667 -1.96366667 -3.26366667]
 [ 0.93633333 -0.56366667 -2.06366667 -3.26366667]
 [ 1.43633333 -0.36366667 -1.96366667 -3.36366667]
 [ 1.93633333  0.23633333 -1.96366667 -3.26366667]
 [ 1.33633333 -0.06366667 -1.86366667 -3.26366667]
 [ 1.33633333 -0.46366667 -2.06366667 -3.36366667]
 [ 0.83633333 -0.46366667 -2.36366667 -3.36366667]
 [ 2.33633333  0.53633333 -2.26366667 -3.26366667]
 [ 2.23633333  0.93633333 -1.96366667 -3.06366667]
 [ 1.93633333  0.43633333 -2.16366667 -3.06366667]
 [ 1.63633333  0.03633333 -2.06366667 -3.16366667]
 [ 2.23633333  0.33633333 -1.76366667 -3.16366667]
 [ 1.63633333  0.33633333 -1.96
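
### For comparison, a minimal sketch of per-feature scaling (standardization), where each column is centered with its own mean and divided by its own standard deviation. This is a swapped-in variant, not what the cell above does. The matrix x_demo holds four sample rows in the style of the Iris data and is used only for illustration.

```python
import numpy as np

# Rows are samples, columns are the four features
x_demo = np.array([[5.1, 3.5, 1.4, 0.2],
                   [4.9, 3.0, 1.4, 0.2],
                   [7.0, 3.2, 4.7, 1.4],
                   [6.3, 3.3, 6.0, 2.5]])

col_mean = x_demo.mean(axis=0)   # one mean per column
col_std = x_demo.std(axis=0)     # one standard deviation per column

x_scaled = (x_demo - col_mean) / col_std
print(x_scaled)
```

### After this transformation every column has mean 0 and standard deviation 1, so all features share a comparable range.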

### Now we take the training data, i.e. the feature values from which the model will learn to produce the required output.
#### Here x_train is our input training data and y_train is our output training data.

In [52]:
x_train=(x[test,:4])
print (x_train)

[[ 2.63633333 -0.46366667  1.13633333 -2.06366667]
 [ 3.23633333 -0.36366667  0.93633333 -2.06366667]
 [ 2.03633333 -0.96366667  0.53633333 -2.16366667]
 [ 2.83633333 -0.76366667  1.43633333 -1.66366667]
 [ 4.23633333 -0.66366667  3.23633333 -1.46366667]
 [ 4.43633333  0.33633333  2.93633333 -1.46366667]
 [ 1.53633333 -0.16366667 -2.06366667 -3.26366667]
 [ 1.33633333 -0.46366667 -2.06366667 -3.36366667]
 [ 3.43633333 -0.26366667  2.23633333 -1.16366667]
 [ 2.13633333 -0.96366667  0.43633333 -2.36366667]
 [ 2.93633333 -0.26366667  1.03633333 -1.96366667]
 [ 2.43633333 -0.46366667  0.73633333 -1.96366667]
 [ 2.23633333 -0.56366667  0.73633333 -2.16366667]
 [ 2.33633333 -0.76366667  1.63633333 -1.56366667]
 [ 1.63633333  0.33633333 -1.96366667 -3.16366667]
 [ 3.33633333 -0.66366667  1.33633333 -2.06366667]
 [ 3.03633333 -0.66366667  1.13633333 -1.96366667]
 [ 2.53633333 -0.76366667  1.63633333 -1.86366667]
 [ 2.03633333 -1.06366667  0.33633333 -2.36366667]
 [ 2.83633333 -0.96366667  1.43

### Now we add a column of 1s to the x_train array. This lets the model learn a bias (intercept) term as one more weight.

In [53]:
x_train = np.concatenate((x_train,np.ones((120,1),dtype=float)),axis=1)
print(x_train)

[[ 2.63633333 -0.46366667  1.13633333 -2.06366667  1.        ]
 [ 3.23633333 -0.36366667  0.93633333 -2.06366667  1.        ]
 [ 2.03633333 -0.96366667  0.53633333 -2.16366667  1.        ]
 [ 2.83633333 -0.76366667  1.43633333 -1.66366667  1.        ]
 [ 4.23633333 -0.66366667  3.23633333 -1.46366667  1.        ]
 [ 4.43633333  0.33633333  2.93633333 -1.46366667  1.        ]
 [ 1.53633333 -0.16366667 -2.06366667 -3.26366667  1.        ]
 [ 1.33633333 -0.46366667 -2.06366667 -3.36366667  1.        ]
 [ 3.43633333 -0.26366667  2.23633333 -1.16366667  1.        ]
 [ 2.13633333 -0.96366667  0.43633333 -2.36366667  1.        ]
 [ 2.93633333 -0.26366667  1.03633333 -1.96366667  1.        ]
 [ 2.43633333 -0.46366667  0.73633333 -1.96366667  1.        ]
 [ 2.23633333 -0.56366667  0.73633333 -2.16366667  1.        ]
 [ 2.33633333 -0.76366667  1.63633333 -1.56366667  1.        ]
 [ 1.63633333  0.33633333 -1.96366667 -3.16366667  1.        ]
 [ 3.33633333 -0.66366667  1.33633333 -2.06366667  1.  
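
### A short sketch of why the column of 1s works: appending it to the inputs and appending the bias as an extra row of the weight matrix gives the same result as adding the bias explicitly. The arrays X, w and b below are random placeholders, and theta_demo is a hypothetical name used only here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))    # 5 samples, 4 features
w = rng.normal(size=(4, 3))    # one weight per feature per class
b = rng.normal(size=(1, 3))    # one bias per class

# Explicit form: weights and bias kept separate
explicit = X @ w + b

# Folded form: a column of 1s in X, the bias as the last row of theta
X_aug = np.concatenate((X, np.ones((X.shape[0], 1))), axis=1)
theta_demo = np.concatenate((w, b), axis=0)
folded = X_aug @ theta_demo

print(np.allclose(explicit, folded))  # True
```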

### Now we define an array y_train of shape 120*3 in which every element is 0; it will hold the expected output.
### If a sample belongs to the first of the 3 classes, the first element of its row is set to 1; similarly, for the second class the second element of the row becomes 1, and for the last class the third element of the row of y_train becomes 1.

In [54]:
y_train = np.zeros((120,3)) 
print(y_train)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0.

In [55]:
for i,j in zip(test,range(120)):
    if y[i]=='Iris-setosa':
        y_train[j,0]=1 ## for iris_data in setosa make 1 in y_train
    if y[i]=='Iris-versicolor':
        y_train[j,1]=1 ## for iris_data in versicolor make 1 in y_train
    if y[i]=='Iris-virginica':
        y_train[j,2]=1 ## for iris_data in virginica make 1 in y_train
print(y_train)

[[0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 1. 0.]
 [0. 1. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 0. 1.]
 [0.
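
### The same one-hot encoding can also be written without a loop. This is a sketch using NumPy broadcasting on a small made-up label array; the names classes, labels and one_hot are introduced only for this illustration.

```python
import numpy as np

classes = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# Hypothetical labels for four samples
labels = np.array(['Iris-versicolor', 'Iris-setosa',
                   'Iris-virginica', 'Iris-setosa'])

# Compare every label (rows) with every class name (columns)
one_hot = (labels[:, None] == classes[None, :]).astype(float)
print(one_hot)

# np.argmax recovers the class index of each row
print(np.argmax(one_hot, axis=1))  # [1 0 2 0]
```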

### Similarly we proceed for the validation dataset, which was already separated earlier in this code.

In [56]:
x_test =(x[validation,:4])
x_test = np.concatenate((x_test,np.ones((30,1))),axis=1)
print(x_test)

[[ 1.13633333 -0.36366667 -1.96366667 -3.26366667  1.        ]
 [ 1.53633333  0.13633333 -2.06366667 -3.26366667  1.        ]
 [ 1.13633333 -0.06366667 -2.06366667 -3.16366667  1.        ]
 [ 0.83633333 -0.46366667 -2.36366667 -3.36366667  1.        ]
 [ 2.23633333  0.93633333 -1.96366667 -3.06366667  1.        ]
 [ 1.93633333  0.43633333 -2.16366667 -3.06366667  1.        ]
 [ 1.63633333  0.03633333 -2.06366667 -3.16366667  1.        ]
 [ 1.13633333  0.13633333 -2.46366667 -3.26366667  1.        ]
 [ 1.73633333  0.03633333 -1.96366667 -3.26366667  1.        ]
 [ 1.23633333 -0.26366667 -1.86366667 -3.26366667  1.        ]
 [ 0.93633333 -0.46366667 -2.16366667 -3.26366667  1.        ]
 [ 1.53633333  0.03633333 -1.86366667 -2.86366667  1.        ]
 [ 1.33633333 -0.46366667 -2.06366667 -3.16366667  1.        ]
 [ 1.83633333  0.23633333 -1.96366667 -3.26366667  1.        ]
 [ 3.53633333 -0.26366667  1.23633333 -2.06366667  1.        ]
 [ 3.13633333 -0.56366667  1.13633333 -2.16366667  1.  

In [57]:
y_test = np.zeros((30,3))
print(y_test)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


### Similarly we fill the y_test array according to the labels in the dataset

In [58]:
for i,j in zip(validation,range(30)):
    if y[i]=='Iris-setosa':
        y_test[j,0]=1 ## for a setosa sample set column 0 of y_test to 1
    if y[i]=='Iris-versicolor':
        y_test[j,1]=1 ## for a versicolor sample set column 1 of y_test to 1
    if y[i]=='Iris-virginica':
        y_test[j,2]=1 ## for a virginica sample set column 2 of y_test to 1
print(y_test)

[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]
 [0. 0. 1.]]


### Now we proceed with the most important part of logistic regression, i.e. gradient descent.
### We define theta, a 5*3 matrix that holds one weight for each of the 5 inputs (4 features plus the column of 1s) for each of the 3 classes. It is initialized with random values.

In [59]:
theta = np.random.rand(x.shape[1]+1,3)
print(theta)

[[0.91015876 0.35429841 0.0609197 ]
 [0.22499069 0.50222546 0.63316599]
 [0.34574705 0.40390574 0.29546395]
 [0.44786027 0.99338298 0.35706383]
 [0.61927458 0.91201326 0.79858073]]


### Now we pass the product of the training data and theta to the sigmoid function defined at the start. The result is used to calculate the cost and then the gradient.


In [60]:
h = sigmoid(np.dot(x_train,theta))
##print(h)

### We have given a learning rate value. Different values give different results; you can try this by changing the learning rate.

In [61]:
alpha = 0.12 ## learning rate 

In [62]:
m = x_train.shape[0]
print(m)

120


### For Training dataset we are calculating theta values

In [63]:

for i in range (0,1000):
    h = sigmoid(np.dot(x_train,theta)) ## sigmoid function update
    theta -= ((alpha)/m)*np.dot((h-(y_train)).T,x_train).T ## theta update
    
    ## The theta update is explained above in the gradient descent section; please refer to that part in case of doubts
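
### To see whether gradient descent is making progress, the cost can be recorded on every iteration. The sketch below uses the same update rule on a small synthetic one-feature problem rather than on the Iris data; X_demo, y_demo, theta_demo, alpha_demo and history are names assumed for this illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_fun(y, h):
    m = h.shape[0]
    return np.sum((-y * np.log(h)) - ((1 - y) * np.log(1 - h))) / m

rng = np.random.default_rng(1)

# Synthetic data: one feature, label 1 when the feature is positive
features = rng.normal(size=(100, 1))
X_demo = np.concatenate((features, np.ones((100, 1))), axis=1)  # add column of 1s
y_demo = (features > 0).astype(float)                           # shape (100, 1)

theta_demo = np.zeros((2, 1))
alpha_demo = 0.12
history = []

for _ in range(200):
    h = sigmoid(X_demo @ theta_demo)
    history.append(cost_fun(y_demo, h))      # cost before this update
    theta_demo -= (alpha_demo / X_demo.shape[0]) * (X_demo.T @ (h - y_demo))

print(history[0], history[-1])
```

### With all weights at zero every prediction is 0.5, so the first recorded cost is ln 2. Plotting history with matplotlib shows how quickly the cost falls for a given learning rate.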
    


In [64]:
print("For Training Dataset")
h = sigmoid(np.dot(x_train,theta))
print(np.argmax(h,axis=1)) ## max value in each row of h array of sigmoid function

q=np.argmax(y_train,axis=1)==np.argmax(h,axis=1) ## check how many output we got correct

print(np.sum(q*100/m)) ## percentage or accuracy

For Training Dataset
[1 1 1 2 2 2 0 0 2 1 1 1 1 2 0 1 1 2 1 1 1 0 1 0 0 0 2 0 2 0 0 2 0 1 2 1 2
 1 2 2 2 0 0 0 0 2 2 0 0 2 0 2 0 0 1 0 1 1 2 1 1 2 1 2 1 2 2 2 0 1 2 0 1 1
 2 0 2 2 1 2 0 0 2 2 1 0 2 2 2 2 1 0 2 1 0 1 0 0 2 1 2 2 1 0 1 1 1 2 0 0 2
 1 1 2 2 1 2 0 0 2]
94.16666666666669
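
### The accuracy calculation above can be wrapped in a small helper so that it can be reused for the test set. accuracy_percent is a hypothetical name, and the arrays below are made-up values for four samples.

```python
import numpy as np

def accuracy_percent(y_onehot, scores):
    """Percentage of rows whose highest-scoring class matches the one-hot label."""
    predicted = np.argmax(scores, axis=1)
    actual = np.argmax(y_onehot, axis=1)
    return 100.0 * np.mean(predicted == actual)

y_demo = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=float)

# Made-up sigmoid outputs; the third row is misclassified
scores_demo = np.array([[0.9, 0.2, 0.1],
                        [0.3, 0.6, 0.2],
                        [0.1, 0.7, 0.4],
                        [0.2, 0.8, 0.1]])

acc = accuracy_percent(y_demo, scores_demo)
print(acc)  # 75.0
```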


## Overfitting In Model Training
### Overfitting refers to a model that models the training data too well. This means we are getting a good output for training dataset but not for the test dataset.

### Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the noise or random fluctuations in the training data are picked up and learned as concepts by the model. The problem is that these concepts do not apply to new data and negatively impact the model's ability to generalize.

### Overfitting is more likely with nonparametric and nonlinear models that have more flexibility when learning a target function. As such, many nonparametric machine learning algorithms also include parameters or techniques to limit and constrain how much detail the model learns.

### For example, decision trees are a nonparametric machine learning algorithm that is very flexible and is subject to overfitting training data. This problem can be addressed by pruning a tree after it has learned in order to remove some of the detail it has picked up.

## UnderFitting In Model Training
### Underfitting refers to a model that can neither model the training data nor generalize to new data. This means that if we are not getting good accuracy on the training set as well as the test set, the model is underfitting.

### An underfit machine learning model is not a suitable model and will be obvious as it will have poor performance on the training data.

### Underfitting is often not discussed as it is easy to detect given a good performance metric. The remedy is to move on and try alternate machine learning algorithms. Nevertheless, it does provide a good contrast to the problem of overfitting.

In [65]:
##print(theta)

### Now we will test our test data set and see the accuracy 

In [66]:
print( "for test data set")
h = sigmoid(np.dot(x_test,theta)) ## sigmoid function update
##print(h)
##   theta -= alpha/m*np.dot((h-(y_test)).T,x_test).T ## theta update
    
##h = sigmoid(np.dot(x_test,theta))
print(np.argmax(h,axis=1))
q1=np.argmax(y_test,axis=1)==(np.argmax(h,axis=1))
##print(q1)
np.sum(q1*100/30)

for test data set
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2]


99.99999999999997

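### Here we got an accuracy of 100 percent on the 30 held-out samples (the printed value 99.99999999999997 differs from 100 only by floating-point rounding). The model generalizes well on this split with the chosen learning rate, although a test set this small cannot show that the learning rate is optimal. Now go to the learning rate block, change the learning rate and observe how the results change.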

## A Good Fit in Machine Learning
### Ideally, you want to select a model at the sweet spot between underfitting and overfitting.
### This is the goal, but is very difficult to do in practice.

### To understand this goal, we can look at the performance of a machine learning algorithm over time as it is learning a training data. We can plot both the skill on the training data and the skill on a test dataset we have held back from the training process.

### Over time, as the algorithm learns, the error for the model on the training data goes down and so does the error on the test dataset. If we train for too long, the error on the training dataset may continue to decrease because the model is overfitting and learning the irrelevant detail and noise in the training dataset. At the same time the error for the test set starts to rise again as the model’s ability to generalize decreases.

### The sweet spot is the point just before the error on the test dataset starts to increase where the model has good skill on both the training dataset and the unseen test dataset.

### You can perform this experiment with your favorite machine learning algorithms. This is often not a useful technique in practice, because by choosing the stopping point for training using the skill on the test dataset, the test set is no longer “unseen” or a standalone objective measure. Some knowledge (a lot of useful knowledge) about that data has leaked into the training procedure.

### There are two additional techniques you can use to help find the sweet spot in practice: resampling methods and a validation dataset.

## To Limit Overfitting
### Both overfitting and underfitting can lead to poor model performance. But by far the most common problem in applied machine learning is overfitting.

### Overfitting is such a problem because the evaluation of machine learning algorithms on training data is different from the evaluation we actually care the most about, namely how well the algorithm performs on unseen data.

### There are two important techniques that you can use when evaluating machine learning algorithms to limit overfitting:

### Use a resampling technique to estimate model accuracy.
### Hold back a validation dataset.
### The most popular resampling technique is k-fold cross validation. It allows you to train and test your model k-times on different subsets of training data and build up an estimate of the performance of a machine learning model on unseen data.

### A validation dataset is simply a subset of your training data that you hold back from your machine learning algorithms until the very end of your project. After you have selected and tuned your machine learning algorithms on your training dataset you can evaluate the learned models on the validation dataset to get a final objective idea of how the models might perform on unseen data.

### Using cross validation is a gold standard in applied machine learning for estimating model accuracy on unseen data. If you have the data, using a validation dataset is also an excellent practice.
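
### A minimal sketch of how k-fold cross validation splits the data, written with NumPy only. k_fold_indices is a hypothetical helper introduced here; the sample count 150 matches the Iris data, while k = 5 and the seed are arbitrary choices.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)      # k nearly equal parts
    for i in range(k):
        val_idx = folds[i]                   # fold i is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(150, 5))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))      # 120 30 for each fold
```

### A model would be trained on train_idx and scored on val_idx once per fold, and the k scores averaged to estimate performance on unseen data.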

# For better understanding, you should visualize the dataset and the gradient descent using the matplotlib library, which you learnt about in the previous assignments.

## Here is your task: try making a logistic regression model using this dataset: https://github.com/shubhamkeshri1621/Iris_flower_clssification/blob/master/datasets_228_482_diabetes.csv