This notebook walks through a simple classification task: predicting whether a patient is diabetes-positive from symptom data.

The dataset is available at https://archive.ics.uci.edu/ml/machine-learning-databases/00529/.


### Contents
1. Data Load
2. Preprocess
3. Model
4. Prediction
5. Summary

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
data = pd.read_csv('diabetes_data_upload.csv', header = 0)

In [3]:
data.head()

Unnamed: 0,Age,Gender,Polyuria,Polydipsia,sudden weight loss,weakness,Polyphagia,Genital thrush,visual blurring,Itching,Irritability,delayed healing,partial paresis,muscle stiffness,Alopecia,Obesity,class
0,40,Male,No,Yes,No,Yes,No,No,No,Yes,No,Yes,No,Yes,Yes,Yes,Positive
1,58,Male,No,No,No,Yes,No,No,Yes,No,No,No,Yes,No,Yes,No,Positive
2,41,Male,Yes,No,No,Yes,Yes,No,No,Yes,No,Yes,No,Yes,Yes,No,Positive
3,45,Male,No,No,Yes,Yes,Yes,Yes,No,Yes,No,Yes,No,No,No,No,Positive
4,60,Male,Yes,Yes,Yes,Yes,Yes,No,Yes,Yes,Yes,Yes,Yes,Yes,Yes,Yes,Positive


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 520 entries, 0 to 519
Data columns (total 17 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Age                 520 non-null    int64 
 1   Gender              520 non-null    object
 2   Polyuria            520 non-null    object
 3   Polydipsia          520 non-null    object
 4   sudden weight loss  520 non-null    object
 5   weakness            520 non-null    object
 6   Polyphagia          520 non-null    object
 7   Genital thrush      520 non-null    object
 8   visual blurring     520 non-null    object
 9   Itching             520 non-null    object
 10  Irritability        520 non-null    object
 11  delayed healing     520 non-null    object
 12  partial paresis     520 non-null    object
 13  muscle stiffness    520 non-null    object
 14  Alopecia            520 non-null    object
 15  Obesity             520 non-null    object
 16  class               520 non-null    object
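
As a complement to `info()`, here is a quick sketch of two other checks that are often useful before preprocessing: counting missing values per column and looking at the class balance. The small `sample` frame is hypothetical stand-in data so the snippet runs on its own; with the real dataset the same methods would be called on `data`.

```python
import pandas as pd

# Hypothetical stand-in rows; replace `sample` with `data` for the real dataset
sample = pd.DataFrame({
    "Age": [40, 58, 41, 45],
    "Polyuria": ["No", "No", "Yes", "No"],
    "class": ["Positive", "Positive", "Positive", "Negative"],
})

# Number of missing values in each column (all zeros means no nulls)
null_counts = sample.isnull().sum()
print(null_counts)

# How many rows fall into each target class
class_counts = sample["class"].value_counts()
print(class_counts)
```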

### Preprocess

1. The data has no null values.
2. Most columns hold 'Yes'/'No' values. The code below replaces 'Yes' with 0 and 'No' with 1.
3. Gender and the target column 'class' are encoded the same way: 'Male' and 'Positive' become 0, 'Female' and 'Negative' become 1.

After these steps, the data is ready for the model.

In [5]:
# Encode string categories as integers:
# Yes / Male / Positive -> 0, No / Female / Negative -> 1
data = data.replace(to_replace=['Yes', 'No'], value=[0, 1])
data = data.replace(to_replace=['Male', 'Female'], value=[0, 1])
data = data.replace(to_replace=['Positive', 'Negative'], value=[0, 1])
data.head()

Unnamed: 0,Age,Gender,Polyuria,Polydipsia,sudden weight loss,weakness,Polyphagia,Genital thrush,visual blurring,Itching,Irritability,delayed healing,partial paresis,muscle stiffness,Alopecia,Obesity,class
0,40,0,1,0,1,0,1,1,1,0,1,0,1,0,0,0,0
1,58,0,1,1,1,0,1,1,0,1,1,1,0,1,0,1,0
2,41,0,0,1,1,0,0,1,1,0,1,0,1,0,0,1,0
3,45,0,1,1,0,0,0,0,1,0,1,0,1,1,1,1,0
4,60,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
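
The blanket `replace` calls in the cell above work on every column at once. An alternative sketch, assuming the same encoding, is to keep one explicit dictionary and apply it column by column with `map`, which makes unexpected values easy to spot because they turn into `NaN`. The `sample` frame and the `encoding` name are hypothetical, used only for illustration.

```python
import pandas as pd

# Tiny hypothetical sample with the same kind of values as the dataset
sample = pd.DataFrame({
    "Age": [40, 58, 41],
    "Gender": ["Male", "Male", "Female"],
    "Polyuria": ["No", "No", "Yes"],
    "class": ["Positive", "Positive", "Negative"],
})

# One explicit mapping, mirroring the encoding used in the cell above
encoding = {"Yes": 0, "No": 1, "Male": 0, "Female": 1, "Positive": 0, "Negative": 1}

encoded = sample.copy()
categorical_cols = encoded.columns.drop("Age")  # Age is already numeric
for col in categorical_cols:
    encoded[col] = encoded[col].map(encoding)

# map() leaves NaN for any value missing from the dictionary,
# so this check fails loudly if a category was overlooked
assert not encoded.isna().any().any()
print(encoded)
```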


### Model Build 

In [6]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [7]:
X = data.drop(columns=['class'])
y = data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y , test_size = 0.2, random_state=42)
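
The split above is purely random. If the two classes are imbalanced, passing `stratify=y` to `train_test_split` keeps the class proportions similar in the train and test parts. A minimal sketch on made-up data (the `X_demo` and `y_demo` arrays are illustrative, not the diabetes dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 3 binary features, 60/40 class balance
rng = np.random.default_rng(42)
X_demo = rng.integers(0, 2, size=(100, 3))
y_demo = np.array([0] * 60 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification the 60/40 ratio carries over to the 20-row test part
print(np.bincount(y_te))
```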

#### Logistic Regression

In [8]:
log = LogisticRegression(max_iter=1000)
log.fit(X_train, y_train)

## Predictions and Evaluations
preds = log.predict(X_test)
accuracy_score(y_test, preds)

0.9230769230769231
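
Accuracy alone hides which kind of mistakes a model makes. Below is a minimal sketch of computing a confusion matrix and per-class precision and recall with scikit-learn. The label lists are hypothetical; with the notebook's variables, `y_test` and `preds` would be passed instead.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical true and predicted labels (0 = Positive, 1 = Negative, as encoded above)
y_true_demo = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred_demo = [0, 0, 1, 1, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Precision, recall and F1 for each class
print(classification_report(y_true_demo, y_pred_demo))
```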

#### Support Vector Machines

In [9]:
svc = SVC()
svc.fit(X_train, y_train)

## Predictions and Evaluations
preds1 = svc.predict(X_test)
accuracy_score(y_test, preds1)

0.6826923076923077
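
One plausible reason for the lower SVC score is that `Age` is on a much larger scale than the 0/1 columns, and the default RBF kernel is sensitive to feature scale. This was not tested in the notebook. The sketch below only shows how scaling could be added with a pipeline, using synthetic data in place of the diabetes features.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: one wide-range column (like Age) plus binary columns
rng = np.random.default_rng(0)
age_like = rng.integers(16, 90, size=(200, 1))
binary = rng.integers(0, 2, size=(200, 4))
X_demo = np.hstack([age_like, binary])
y_demo = binary[:, 0]  # label depends only on the first binary column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Standardize every feature, then fit the SVC on the scaled values
scaled_svc = make_pipeline(StandardScaler(), SVC())
scaled_svc.fit(X_tr, y_tr)

scaled_preds = scaled_svc.predict(X_te)
scaled_acc = accuracy_score(y_te, scaled_preds)
print(scaled_acc)
```

On the real data, the same pipeline could be fitted on `X_train` and `y_train` and compared with the unscaled `svc` above.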

### Summary

In this diabetes classification example, Logistic Regression reached about 92% accuracy on the test set, whereas SVC with default settings reached about 68%.