# **Logistic Regression**

---



**Diabetes Prediction Classifier using Logistic Regression**

Importing the required packages:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

Reading the dataset:

In [None]:
data=pd.read_csv("https://assets.datacamp.com/production/repositories/628/datasets/444cdbf175d5fbf564b564bd36ac21740627a834/diabetes.csv")

In [None]:
data.head()

,pregnancies,glucose,diastolic,triceps,insulin,bmi,dpf,age,diabetes
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


Separating the features and labels:

In [None]:
X = data.drop('diabetes', axis=1).values
y = data['diabetes'].values

Dividing into test and train data:

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.02, random_state=0)
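
To make the effect of `test_size=0.02` concrete, the following sketch runs the same call on synthetic data. The row count of 768 is an assumption (it is consistent with the 16 test samples reported later in this notebook), and `X_demo`/`y_demo` are hypothetical stand-ins for the real features and labels. The second call shows one possible alternative, a larger stratified split, and is not what this notebook uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 768 rows and 8 feature columns (row count is an assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(768, 8))
y_demo = rng.integers(0, 2, size=768)

# Same settings as in this notebook: 2% of the rows go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.02, random_state=0
)
print(X_tr.shape, X_te.shape)

# Alternative (not used here): a 20% test set, stratified so that both
# parts keep a similar ratio of the two classes
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)
print(X_tr_s.shape, X_te_s.shape)
```

With only a handful of test rows, a single misclassified sample changes the test accuracy by several percentage points, which is worth keeping in mind when reading the scores.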

**Scale Data:**

Standardization transforms the data so that each column has a mean of zero and a standard deviation of one. This puts all the columns on the same scale.

In [None]:
X_train[0], np.mean(X_train[0]), np.std(X_train[0])

(array([  8.   , 179.   ,  72.   ,  42.   , 130.   ,  32.7  ,   0.719,
         36.   ]), 62.552375, 58.112401877605905)

Scaling the columns using StandardScaler:

In [None]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

In [None]:
X_train[0], np.mean(X_train[0]), np.std(X_train[0])

(array([1.22656957, 1.83455624, 0.15021277, 1.34316464, 0.439292  ,
        0.08873678, 0.78589027, 0.24059085]),
 0.7636266406057524,
 0.6029177622495817)
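
Note that `X_train[0]` selects the first row (a single sample), so its mean and standard deviation mix eight different features and are not expected to be 0 and 1 after scaling. The zero-mean, unit-variance property holds per column, along `axis=0`. A minimal sketch on synthetic data (the array `X_demo` and its values are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic features on very different scales (illustrative values only)
X_demo = rng.normal(loc=[3.0, 120.0, 70.0], scale=[2.0, 30.0, 12.0], size=(200, 3))

X_scaled = StandardScaler().fit_transform(X_demo)

# Per-column statistics: these are what standardization controls
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means.round(6), col_stds.round(6))

# Per-row statistics are not constrained by the scaler
print(X_scaled[0].mean(), X_scaled[0].std())
```

The same check on the real data would be `X_train.mean(axis=0)` and `X_train.std(axis=0)`.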

Training the model:

In [None]:
model = LogisticRegression(C=0.05, multi_class='ovr', random_state=0)
model.fit(X_train, y_train)

LogisticRegression(C=0.05, multi_class='ovr', random_state=0)
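
As a rough illustration of what the fitted model computes, the sketch below reproduces the predicted probabilities of a binary `LogisticRegression` by hand: a weighted sum of the features plus an intercept, passed through the sigmoid function. It uses synthetic data from `make_classification` rather than the diabetes data, and it leaves out `multi_class` because the problem is binary; `C=0.05` is kept, where `C` is the inverse of the regularization strength (smaller values regularize more strongly).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem with 8 features (stand-in for the real data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
clf = LogisticRegression(C=0.05, random_state=0).fit(X_demo, y_demo)

# Linear score z = w . x + b for every sample
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid turns the score into the probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = clf.predict_proba(X_demo)[:, 1]

# A probability above 0.5 corresponds to predicting class 1
manual_labels = (p_manual > 0.5).astype(int)
print(np.allclose(p_manual, p_sklearn))
```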

In [None]:
# Evaluating the model: scale the test data with the scaler fitted on the training data
X_test = scaler.transform(X_test)
y_pred = model.predict(X_test)

**Performance on training data:**

In [None]:
model.score(X_train, y_train)

0.7699468085106383

**Performance on testing data:**

In [None]:
model.score(X_test, y_test)

1.0
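
A score from a single, very small test split can vary a lot with the choice of `random_state`. One common way to get a steadier estimate is k-fold cross-validation. The sketch below is an illustration on synthetic data, not part of the original workflow: it wraps the scaler and the classifier in a pipeline so that scaling is re-fitted on each training fold, and the 5 folds and the 768 synthetic rows are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the diabetes features and labels
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=0)

# Scaling inside the pipeline is fitted on the training folds only
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=0.05, random_state=0))

# Accuracy on each of the 5 held-out folds
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(scores.mean(), scores.std())
```

On the real data the call would be `cross_val_score(pipe, X, y, cv=5)` with the unscaled `X` and `y`.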

***Confusion Matrix:***

In [None]:
confusion_matrix(y_test, y_pred)

array([[10,  0],
       [ 0,  6]])

***Evaluation metrics of logistic regression***

In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         6

    accuracy                           1.00        16
   macro avg       1.00      1.00      1.00        16
weighted avg       1.00      1.00      1.00        16
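
The figures in this report follow directly from the confusion matrix. As a small worked example, the sketch below recomputes precision, recall, F1 and accuracy for class 1 from the matrix printed earlier; in scikit-learn's layout, rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[10, 0],
               [0, 6]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)   # of the samples predicted 1, how many are truly 1
recall_1 = tp / (tp + fn)      # of the samples that are truly 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / cm.sum()

print(precision_1, recall_1, f1_1, accuracy)
```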



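### **Conclusion:** The model classifies all 16 test samples correctly, but the test set is only 2% of the data and training accuracy is about 0.77, so the perfect test score is likely optimistic. A larger test split or cross-validation would give a more reliable estimate of performance.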