## V. Support Vector Machine

Support-vector machines (SVMs) are supervised machine learning models widely used for classification and regression. In medical research, SVMs can be used to predict a patient's health status with respect to a target disease. In this experiment, we train an SVM model to predict diabetes.

Please download the pima.csv file from Canvas, save it in the same directory as this Jupyter notebook, and run the following code to load the data.

In [56]:
import os
from csv import reader
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn.svm import LinearSVC

In [57]:
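def norm(arr):
    # Min-max scale a list of numbers to the range [0, 1]
    x_max = max(arr)
    x_min = min(arr)
    span = x_max - x_min
    if span == 0:
        # All values are equal; avoid dividing by zero
        return [0.0 for _ in arr]
    return [(x - x_min) / span for x in arr]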
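
The row-wise min-max scaling performed by `norm` can also be written with NumPy. The following is a minimal sketch rather than the notebook's own code: the name `min_max_scale` is hypothetical, and mapping a constant row to all zeros is an assumption about how that edge case should be handled.

```python
import numpy as np

def min_max_scale(values):
    """Scale a 1-D sequence to [0, 1] without modifying the input."""
    arr = np.asarray(values, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        # Assumption: a constant row carries no contrast, so map it to zeros
        return np.zeros_like(arr)
    return (arr - arr.min()) / span

scaled = min_max_scale([2, 4, 6, 10])
print(scaled)
```

Because the function returns a new array, the caller's original list is left unchanged.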

In [58]:

# Load training data
with open("train.csv") as f:
    csv_data = reader(f, delimiter=',')
    raw_data = np.array(list(csv_data))

# Preprocess training data
x_train = []
y_train = []
data_count = len(raw_data)
tuple_len = len(raw_data[0])

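for i in raw_data:
    # Features: every column except the last two, min-max scaled per row
    temp = norm([int(j) for j in i[0:tuple_len - 2]])
    x_train.append(temp)
    # Label: the last column, encoded as 1 for "yes" and 0 otherwise
    y_train.append(1 if i[tuple_len - 1] == "yes" else 0)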


In [59]:

# Load test data
with open("test.csv") as f:
    csv_data = reader(f, delimiter=',')
    raw_data = np.array(list(csv_data))

# Preprocess test data
x_test = []
y_test = []
data_count = len(raw_data)
tuple_len = len(raw_data[0])

for i in raw_data:
    # Features: every column except the last, min-max scaled per row
    temp = norm([int(j) for j in i[0:tuple_len - 1]])
    x_test.append(temp)



In [60]:
# Fit a linear SVM on the full training set and predict labels for the test set
clf = LinearSVC(loss="hinge", random_state=42).fit(x_train, y_train)
y_test = clf.predict(x_test)
print(y_test)

[0 0 1 0 1 0 1 1 1 1]
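
The `loss="hinge"` argument selects the standard hinge loss, and a linear classifier assigns the positive class when its score `w · x + b` is greater than zero. The sketch below illustrates both ideas on three made-up points. The weights `w`, the bias `b`, and the data are hypothetical values chosen for illustration; they are not taken from the fitted model.

```python
import numpy as np

# Hypothetical weights and bias; a trained model learns these from data
w = np.array([2.0, -1.0])
b = -0.5

# Three illustrative samples with 0/1 labels, as in the notebook
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.25]])
y = np.array([1, 0, 0])

# Linear score for each sample
scores = X @ w + b

# Predict class 1 when the score is positive, otherwise class 0
pred = (scores > 0).astype(int)

# Hinge loss uses labels in {-1, +1}: max(0, 1 - y * score)
y_signed = 2 * y - 1
hinge = np.maximum(0.0, 1.0 - y_signed * scores)

print(scores, pred, hinge)
```

The third sample lands on the wrong side of the boundary, so it is the only one with a non-zero hinge loss.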


In [61]:

# Split dataset
xt_train, xt_test, yt_train, yt_test = train_test_split(x_train, y_train, test_size=0.33, random_state=73)

# Fit on the training split and report mean accuracy on the held-out split
clf = LinearSVC(loss="hinge", random_state=42).fit(xt_train, yt_train)
print(clf.score(xt_test, yt_test))

1.0
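
For a classifier, `score` reports mean accuracy: the fraction of held-out samples whose predicted label matches the true label. A minimal sketch of that calculation follows. The helper name `accuracy` and the example labels are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Three of the four illustrative predictions are correct
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```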
