# Building a Model To Predict Diabetes
The dataset records diagnostic measurements, including plasma glucose readings, for 768 women, taken either during a survey examination or during routine medical care. The class variable indicates whether the 2-hour post-load plasma glucose was at least 200 mg/dl.

### I'll perform the following tasks here:
1. Find the features of the dataset
2. Find the response label of the dataset
3. Split the dataset into training and testing sets
4. Create and train a model to predict the diabetes outcome
5. Check the accuracy of the model

#### Importing the dataset

In [1]:
#Importing the required libraries
import pandas as pd

In [2]:
#Importing the diabetes dataset
df=pd.read_csv('pima-indians-diabetes.data',header=None)

#### Analyzing the dataset

In [3]:
#Viewing the first five observations of the dataset
df.head()

,0,1,2,3,4,5,6,7,8
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


#### Finding the features of the dataset

In [4]:
#Using the .NAMES file to view and set the features of the dataset
df_names=['Number of times pregnant','Plasma glucose concentration','Diastolic blood pressure','Triceps skin fold thickness','2-Hour serum insulin','Body mass index','Diabetes pedigree function','Age','Class variable']

In [5]:
#Reloading the dataset with the feature names set earlier as the column headers
df=pd.read_csv('pima-indians-diabetes.data',header=None,names=df_names)

In [6]:
#Verifying that the dataset is updated with the new headers
df.head()

,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure,Triceps skin fold thickness,2-Hour serum insulin,Body mass index,Diabetes pedigree function,Age,Class variable
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [7]:
#Viewing the number of observations and features of the dataset
df.shape

(768, 9)

#### Finding the response label of the dataset

In [8]:
#Selecting features from the dataset to create the model
features=['Number of times pregnant','2-Hour serum insulin','Body mass index','Age']

In [9]:
#Creating the feature object
x_features=df[features]

In [10]:
#Creating the response object
y_target=df['Class variable']

In [11]:
#Viewing the shape of the feature object
x_features.shape

(768, 4)

In [12]:
#Viewing the shape of the target object
y_target.shape

(768,)

#### Using training and testing datasets to train the model

In [13]:
#Splitting the dataset into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(x_features, y_target, random_state=1)
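
Because the call above passes neither `test_size` nor `train_size`, `train_test_split` falls back to its default of holding out 25% of the rows for testing. The sketch below only reproduces that arithmetic for the 768 rows reported by `df.shape`; the variable names are illustrative and it does not call scikit-learn.

```python
import math

n_samples = 768        # number of rows, taken from df.shape earlier in the notebook
test_fraction = 0.25   # default hold-out fraction when no size argument is given

# The test set size is rounded up; the training set gets the remaining rows.
n_test = math.ceil(test_fraction * n_samples)
n_train = n_samples - n_test

print("Training rows:", n_train)
print("Testing rows: ", n_test)
```

Under these assumptions the model is trained on 576 rows and evaluated on 192. Setting `random_state=1` makes the shuffle reproducible, so the same rows land in each set on every run.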

#### Creating a model to predict the diabetes outcome

In [14]:
# Creating a logistic regression model using the training set
from sklearn.linear_model import LogisticRegression
logreg=LogisticRegression()
logreg.fit(x_train, y_train)

LogisticRegression()

In [15]:
#Making predictions using the testing set
y_predict=logreg.predict(x_test)

#### Checking the accuracy of the model

In [16]:
#Evaluating the accuracy of the model
from sklearn import metrics
print(metrics.accuracy_score(y_test, y_predict))

0.6927083333333334
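
The accuracy score is the fraction of test observations whose predicted class matches the actual class. As a rough sanity check, the sketch below converts the reported score back into a count of correct predictions. It assumes the test set holds 192 rows, which is what the default 25% split of 768 rows would give; the variable names are illustrative.

```python
n_test = 192                              # assumed size of the test set (25% of 768)
reported_accuracy = 0.6927083333333334    # value printed by accuracy_score above

# accuracy = correct predictions / total predictions, so invert it to get the count
n_correct = round(reported_accuracy * n_test)
n_wrong = n_test - n_correct

print("Correct predictions:  ", n_correct)
print("Incorrect predictions:", n_wrong)
```

Under that assumption the score corresponds to 133 correct predictions out of 192.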


In [17]:
#Printing the first 30 actual and predicted responses
print('Actual:    ', y_test.values[0:30])
print('Predicted: ', y_predict[0:30])

Actual:     [0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0 1 1 1 1 0 0 0 1 0 1]
Predicted:  [0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]
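
Accuracy alone does not show which kind of mistake the model makes. The sketch below tallies the four outcome types by hand for the 30 actual and predicted values printed above. It covers only this 30-row sample, not the full test set, and it uses plain Python rather than a scikit-learn helper.

```python
# The first 30 actual and predicted responses, copied from the output above
actual = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
          1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
             0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

pairs = list(zip(actual, predicted))

# Count each combination of actual class and predicted class
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # diabetic, predicted diabetic
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # non-diabetic, predicted non-diabetic
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # non-diabetic, predicted diabetic
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # diabetic, predicted non-diabetic

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("Recall on this sample:", tp / (tp + fn))
```

In this sample every error is a false negative: the model catches 3 of the 10 positive cases and misses the other 7, while all 20 negative cases are predicted correctly.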


The task is done. Thank you for checking out my notebook. Regards,
* Rachit Shukla :)