# Logistic Regression

1. Logistic Regression is a supervised learning Model here supervised learning model means it is actually using labelled data.
2. It is a Classification Model - which we predict will be a : yes or no,cat or dog ,0 or 1.
3. Best for Binary Classification Problem which means that it is ahving only 2 classes
4. It uses Sigmoid Function i.e  yhat = 1/1+e^-Z   Z = w.X + b. Where Yhat = Probability that (y=1),in different way we can write Yhat = p(y=1 | X), X = input features, w = weights (the number of weights equal to number of features in the dataset), b = bias, Yhat = sigmoid(Z), if we put the values in the sigmoid function we will get a output from 0 to 1.
5. So if the Yhat value is less than 0.5 we can say that its True or if the value greater than 0.5 it False its upto us even we can say that at the time of predicting the cat or dog. we can say that p > 0.5 then it  is dog if its less its cat  


### Advantages of Logistic Regression
- Easy to implement
- Perform well on a data with linear relationship
- less prone to overfitting for low dimensional dataset

### Disadvantages of Logistic Regression
- High dimensional dataset causing overfitting
- Difficult to capture complex relationships in a dataset
- Sensitive to Outliers 
- Needs a large dataset

## Math Intution behind Logistic Regression


### yhat = 1/1+e^-Z ,  Z = w.X + b

#### If we are getting a large positive  Z value, then the  yhat value will be  1

#### if we are getting a large negative Z value, when we execute the sigmoid function the minus and minus becomes + and we will get a very large positive number in the function so the function will return 0 

## Loss Function

Loss function measures how far an estimated value is from its true value.

Formula 

- Loss = 1/n Σ(yi - yhati)^2

By using this above function we will get multiple local minimum or multiple curves in the function so we will not be getting a correct prediction or good optimum model so here we use another loss funciton for logistic regression.

## Loss function for Logistic Regression
Loss Function is the measure which is used to find the difference between the actual value and the predicted value if the value is very large then we can say that it is having high loss so we need to reduce the loss by using optimisation techniques like Gradient Descent

#### Binary Cross Entropy Loss Function or Log Loss:
##### Formula : L(y,yhat) = -(y*log(yhat)+(1-y)log(1-yhat))
- Loss function (L) mainly applies for a single training set as compared to the Cost Function
- Cost function (J) which deals with a penalty for a number of training sets for the complete batch

##### Formula for Cost Function:  J(w,b) = 1/m Σ( L (y(i) ,yhat(i) )) = - 1 / m  Σ(yi log yhat i + (1-yi) log (1-yhat i ))   , m denotes the number of data points in the training set

# Gradient Descent for Logistic Regression

Gradietnt Descent is an ooptimization algorithm used for minimizing the cost function in various machine learning algorithms. It is used for updating the parameters of the learning model.

w2 = w1 - L*dw

b2 = b1 - L*db

dw = 1/m * (yhat-y).X ,   x is the input feature
db = 1/m * (yhat - y)

W --> Weight
b --> bias
L --> Learning Rate

dw --> Partial Derivative of cost function with respect to w 
db --> Partial Derivative of cost function with respect to b

### Learning Rate 
Learning rate is a tuning parameter in an organization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function

### Formula for Building a Logistic Regression Model 
1. Sigmoid Function : yhat = 1/1+e^-Z , Z = w.X + b
2. Updating weights through Gradient Descent : w2 = w1 - L*dw , b2 = b1 - L*db
3. Derivatives i.e dw,db : dw = 1/m * (yhat-y).X , x is the input feature , db = 1/m * (yhat - y)

## Building Logistic Regression Model From Scratch

In [1]:
#Importing the Dependencies
import numpy as np

### Workflow of Logistic Regression

Step 1. Set Learning Rate & Number of Iterations; Initiate Random weight and bias value

Step 2. Build Logistic Regression Function

Step 3. Update the parameters using Gradient Descent

     Finally we will get the best model(best weight and bias value) as it has minimum cost function.

Step 4. Build the predict function to determine the class of the new data point 

### Logistic Regresssion

In [58]:
class Logistic_Regression():
    
    #Declaring Learning rate and Number of Iterations(These parameters are called as Hyperparameters)
    def __init__(self,learning_rate,no_of_iterations):
        self.learning_rate = learning_rate
        self.no_of_iterations = no_of_iterations
        
        
    # Fit function to train the model with dataset    
    def fit(self,X,Y):
        
        
        
        #number of data points in the dataset (number of rows) --> m
        #number of input features in the dataset (number of columns) --> n 
        self.m,self.n = X.shape 
        
        #Initiating weight and bias value
        self.w = np.zeros(self.n)
        
        self.b = 0
        
        self.X = X
        
        self.Y = Y
        
        #implementing Gradient descent for optimization
        for i in range(self.no_of_iterations):
            self.update_weights()
        
    def update_weights(self):
        
        #Y_hat formula(sigmoid function)
        Y_hat = 1 / (1+np.exp(- self.X.dot(self.w)+self.b))  # Z = w.x + b
        
        #Derivatives
        dw = (1/self.m)*np.dot(self.X.T, (Y_hat - self.Y))
        db = (1/self.m)*np.sum(Y_hat - self.Y)
        
        
        #updating the weights & bias using gradient descent
        self.w = self.w - self.learning_rate*dw 
        self.b = self.b - self.learning_rate*db
        
    
    
    #Sigmoid Equation and Decision Boundary
    def predict(self,X):
        
        Y_pred = 1 / (1+np.exp(- (X.dot(self.w)+self.b)))
        
        Y_pred = np.where(Y_pred > 0.5 , 1, 0)
        
        return Y_pred 
        
        
        

## Implementing Logistic Regression from Scratch

In [59]:
#Importing the Dependencies
import pandas as pd 
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

### Data Collection and Analysis

In [60]:
#Loading the dataset to the pandas dataframe
diabetes_dataset = pd.read_csv(r'E:\ML\diabetes.csv')

In [61]:
#Printing the first 5 rows of the dataset
diabetes_dataset.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [62]:
# No of rows and columns in the dataset
diabetes_dataset.shape

(768, 9)

In [63]:
diabetes_dataset.describe()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


In [64]:
diabetes_dataset['Outcome'].value_counts()

0    500
1    268
Name: Outcome, dtype: int64

#### 0 --> Non Diabetic
#### 1 --> Diabetic

In [65]:
diabetes_dataset.groupby('Outcome').mean()

Unnamed: 0_level_0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
Outcome,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,3.298,109.98,68.184,19.664,68.792,30.3042,0.429734,31.19
1,4.865672,141.257463,70.824627,22.164179,100.335821,35.142537,0.5505,37.067164


In [66]:
#Seperating the data and the labels
features = diabetes_dataset.drop(columns='Outcome',axis=1)
target = diabetes_dataset['Outcome']

In [67]:
#Checking the features 
features

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,6,148,72,35,0,33.6,0.627,50
1,1,85,66,29,0,26.6,0.351,31
2,8,183,64,0,0,23.3,0.672,32
3,1,89,66,23,94,28.1,0.167,21
4,0,137,40,35,168,43.1,2.288,33
...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63
764,2,122,70,27,0,36.8,0.340,27
765,5,121,72,23,112,26.2,0.245,30
766,1,126,60,0,0,30.1,0.349,47


In [68]:
#Checking the target
target

0      1
1      0
2      1
3      0
4      1
      ..
763    0
764    0
765    0
766    1
767    0
Name: Outcome, Length: 768, dtype: int64

## Data Standardization

In [69]:
scaler = StandardScaler()

In [70]:
scaler.fit(features)

In [71]:
#Transforming every features into similar values
standardized_data = scaler.transform(features)
print(standardized_data)

[[ 0.63994726  0.84832379  0.14964075 ...  0.20401277  0.46849198
   1.4259954 ]
 [-0.84488505 -1.12339636 -0.16054575 ... -0.68442195 -0.36506078
  -0.19067191]
 [ 1.23388019  1.94372388 -0.26394125 ... -1.10325546  0.60439732
  -0.10558415]
 ...
 [ 0.3429808   0.00330087  0.14964075 ... -0.73518964 -0.68519336
  -0.27575966]
 [-0.84488505  0.1597866  -0.47073225 ... -0.24020459 -0.37110101
   1.17073215]
 [-0.84488505 -0.8730192   0.04624525 ... -0.20212881 -0.47378505
  -0.87137393]]


In [72]:
features = standardized_data
target = diabetes_dataset['Outcome']

### Train Test Split

In [73]:
X_train, X_test, Y_train, Y_test = train_test_split(features,target, test_size = 0.2, random_state=2)

In [74]:
print(features.shape,X_train.shape,X_test.shape)

(768, 8) (614, 8) (154, 8)


## Training the Model

In [75]:
classifier = Logistic_Regression(learning_rate=0.01,no_of_iterations=1000)

In [76]:
#Training the support vector machine classifier
classifier.fit(X_train,Y_train)

# Model Evaluation

#### Accuracy Score

In [83]:
#Accuracy Score on the training data
X_train_prediction = classifier.predict(X_train)
print(X_train_prediction.shape)
training_data_accuracy = accuracy_score(Y_train,X_train_prediction)

(614,)


In [84]:
print(training_data_accuracy)

0.6726384364820847


In [80]:
#Accuracy Score of testing data
X_test_prediction = classifier.predict(X_test)

print(X_test_prediction.shape)
testing_data_accuracy = accuracy_score(Y_test,X_test_prediction)

(154,)


In [81]:
print(testing_data_accuracy)

0.7402597402597403


### Making a Predictive System

In [85]:
#Giving a new input according to the dataset to predict
input_data = (5,165,72,19,175,25.8,0.587,45)

In [86]:
#Changing the input data to numpy array
input_data_as_numpy_array = np.asarray(input_data)

In [87]:
#Reshape the array as we are predicting a instance
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)
print(input_data_reshaped)

[[  5.    165.     72.     19.    175.     25.8     0.587  45.   ]]


In [88]:
# standardize the input data
std_data = scaler.transform(input_data_reshaped)
print(std_data)

[[ 0.3429808   1.38037527  0.14964075 -0.09637905  0.82661621 -0.78595734
   0.34768723  1.00055664]]




In [89]:
prediction = classifier.predict(std_data)
print(prediction)

[0]


In [90]:
if (prediction[0] == 0):
      print('The person is not diabetic')
else:
      print('The person is diabetic')

The person is not diabetic
