# Diabetes Prediction Using SVM
This project trains a Support Vector Machine (SVM) classifier to predict whether a person has diabetes from health measurements. The dataset contains eight features, including glucose level, blood pressure, and BMI, along with a binary `Outcome` column indicating the presence or absence of diabetes. The features are standardized, the data is split into training and test sets, and a linear-kernel SVM is trained and evaluated by accuracy. The trained model is then used to classify a new data point.

In [None]:
# Importing libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.metrics import accuracy_score

In [None]:
# Mounting Google Drive (works only inside Google Colab)
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
# Loading the dataset from Google Drive
dataset = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/Datasets/diabetes.csv')

In [None]:
dataset.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [None]:
# Number of rows and columns
dataset.shape

(768, 9)

In [None]:
dataset.describe()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0
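
The summary above reports a minimum of 0 for Glucose, BloodPressure, SkinThickness, Insulin, and BMI. Such values are physiologically implausible, so they may be placeholders for missing measurements; that is an assumption, since the notebook does not say how missing data was encoded. A minimal sketch for counting zeros per column, using the five rows shown by `dataset.head()` so that it runs on its own:

```python
import pandas as pd

# The five rows shown by dataset.head(), re-typed so the sketch is self-contained.
sample = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Count how many entries in each column are exactly zero.
zero_counts = (sample == 0).sum()
print(zero_counts)
```

On the full data the same expression could be applied to `dataset[list(sample.columns)]`.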


In [None]:
dataset['Outcome'].value_counts()

Outcome
0    500
1    268
Name: count, dtype: int64
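
The classes are imbalanced (500 vs. 268), which sets a floor for judging accuracy: a model that always predicts the majority class is already right on every majority-class row. A small worked calculation of that baseline from the counts above:

```python
# Class counts reported by dataset['Outcome'].value_counts()
counts = {0: 500, 1: 268}

total = sum(counts.values())
majority_class = max(counts, key=counts.get)

# Accuracy of a trivial model that always predicts the majority class
baseline_accuracy = counts[majority_class] / total
print(f"Majority class: {majority_class}, baseline accuracy: {baseline_accuracy:.3f}")
```

Any accuracy reported for the classifier is best read against this baseline of roughly 0.65.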

In [None]:
dataset.groupby('Outcome').mean()

Outcome,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age
0,3.298,109.98,68.184,19.664,68.792,30.3042,0.429734,31.19
1,4.865672,141.257463,70.824627,22.164179,100.335821,35.142537,0.5505,37.067164


In [None]:
# Separate the features (X) from the target (Y)
X = dataset.drop(columns='Outcome')
Y = dataset['Outcome']

In [None]:
print(X.shape, Y.shape)

(768, 8) (768,)


Data Standardization

In [None]:
scaler = StandardScaler()

In [None]:
scaler.fit(X)

In [None]:
standard_data = scaler.transform(X)

In [None]:
standard_data.std()

1.0
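
Note that `standard_data.std()` is computed over the whole array at once. `StandardScaler` works column by column: it subtracts each column's mean and divides by that column's population standard deviation (`ddof=0`). A minimal sketch checking this by hand on the Glucose and BMI values from `dataset.head()` (the variable names here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Glucose and BMI from the five rows shown by dataset.head()
X_small = np.array([
    [148, 33.6],
    [85, 26.6],
    [183, 23.3],
    [89, 28.1],
    [137, 43.1],
])

# Manual standardization: z = (x - mean) / std, per column
col_mean = X_small.mean(axis=0)
col_std = X_small.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
manual = (X_small - col_mean) / col_std

scaled = StandardScaler().fit_transform(X_small)

print("Manual result matches StandardScaler:", np.allclose(manual, scaled))
print("Per-column mean:", scaled.mean(axis=0).round(6))
print("Per-column std: ", scaled.std(axis=0).round(6))
```

Passing `axis=0` gives one mean and one standard deviation per feature, which is a more direct check than a single number for the whole array.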

In [None]:
X = standard_data


Data split

In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size =0.2, stratify = Y, random_state=2)

In [None]:
print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

(614, 8) (154, 8) (614,) (154,)
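
In this notebook the scaler is fitted on the full dataset before the split, so its means and standard deviations include the test rows. A common alternative, which is not what the notebook does, is to fit the scaler on the training split only, for example by wrapping it in a pipeline. The sketch below uses synthetic data from `make_classification` as a stand-in for the real CSV, so its accuracy says nothing about the diabetes data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with the same shape as the notebook's data (768 rows, 8 features).
X_demo, y_demo = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=2
)

# Same split settings as the notebook: 20% test, stratified on the label.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# The pipeline fits the scaler on the training rows only, then trains the SVM.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print(X_tr.shape, X_te.shape)
print(f"Test accuracy on synthetic data: {test_accuracy:.3f}")
```

With this arrangement, `model.predict` applies the training-set scaling to any new rows automatically.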


Training of model

In [None]:
classifier = svm.SVC(kernel='linear')

In [None]:
# training SVM
classifier.fit(X_train, Y_train)
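
With `kernel='linear'`, the fitted model is a single hyperplane: each sample gets a score `w · x + b`, and the sign of the score decides the class. The sketch below illustrates this on synthetic data (a stand-in, not the diabetes dataset) by recomputing the scores from `coef_` and `intercept_` and comparing them with `decision_function`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic two-class data with 8 features, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
clf = SVC(kernel="linear").fit(X_demo, y_demo)

# For a linear kernel, SVC exposes the hyperplane weights and bias.
w = clf.coef_[0]
b = clf.intercept_[0]

# Score each sample by hand and compare with the library's decision function.
manual_scores = X_demo @ w + b
print("Scores match:", np.allclose(manual_scores, clf.decision_function(X_demo)))

# A positive score maps to class 1, otherwise class 0.
manual_pred = (manual_scores > 0).astype(int)
print("Agreement with predict():", np.mean(manual_pred == clf.predict(X_demo)))
```

On the notebook's own `classifier`, the entries of `classifier.coef_[0]` line up with the eight feature columns in order, which gives a rough view of which standardized features push the score up or down.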

Testing of model


In [None]:
# Accuracy score on the training data
X_train_prediction = classifier.predict(X_train)
training_data_accuracy = accuracy_score(Y_train, X_train_prediction)

In [None]:
print("Accuracy on the training data:", training_data_accuracy)

Accuracy on the training data: 0.7866449511400652


In [None]:
# Accuracy score on the test data
X_test_prediction = classifier.predict(X_test)
testing_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [None]:
print("Accuracy on the test data:", testing_data_accuracy)

Accuracy on the test data: 0.7727272727272727
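
Accuracy alone does not show how the errors are split between the two classes, which matters when the classes are imbalanced. A confusion matrix, together with precision and recall, makes that split visible. The labels below are made up for illustration and are not this model's predictions:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions (not taken from the notebook's model).
y_true = [0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
```

In the notebook, the same calls could be made with `Y_test` and the predictions from `classifier.predict(X_test)`.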


Making a prediction

In [None]:
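# Feature values in column order: Pregnancies, Glucose, BloodPressure,
# SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age
input_data = (1,103,30,38,83,43.3,0.183,33)

# Convert the tuple to a NumPy array
np_data = np.asarray(input_data)

# Reshape to a single sample: one row, eight features
np_data_reshaped = np_data.reshape(1,-1)

# Standardize with the scaler fitted earlier
std_data = scaler.transform(np_data_reshaped)

print(std_data)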

[[-0.84488505 -0.56004775 -2.02166474  1.09545411  0.02778979  1.43512945
  -0.87244072 -0.0204964 ]]




In [None]:
prediction = classifier.predict(std_data)

print(prediction)

[0]


0 -> Non-diabetic

1 -> Diabetic

In [None]:
if prediction[0] == 1:
    print("Person is diabetic")
else:
    print("Person is not diabetic")

Person is not diabetic
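
The prediction steps (build one row, scale it, classify it) can be collected into a small helper. The sketch below is one possible arrangement; `predict_one`, `FEATURES`, and the toy training data are hypothetical and exist only so the example runs on its own. Building the row as a DataFrame with the original column names keeps it consistent with a scaler that was fitted on a DataFrame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

FEATURES = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
            "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]

def predict_one(scaler, classifier, values):
    """Scale one tuple of raw feature values and return the predicted label (0 or 1)."""
    row = pd.DataFrame([values], columns=FEATURES)
    return int(classifier.predict(scaler.transform(row))[0])

# Toy training data (made up, not the real dataset): label 1 when Glucose exceeds 130.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.uniform(20, 200, size=(200, 8)), columns=FEATURES)
toy_labels = (toy["Glucose"] > 130).astype(int)

toy_scaler = StandardScaler().fit(toy)
toy_clf = SVC(kernel="linear").fit(toy_scaler.transform(toy), toy_labels)

# Same input tuple as in the notebook; the result here reflects the toy model only.
result = predict_one(toy_scaler, toy_clf, (1, 103, 30, 38, 83, 43.3, 0.183, 33))
print("Person is diabetic" if result == 1 else "Person is not diabetic")
```

In the notebook itself, the helper would be called with the fitted `scaler` and `classifier` instead of the toy objects.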
