## Half Space classifier using LP Solver

The dataset used for this task is the Pima Indians Diabetes Dataset from Kaggle. (https://www.kaggle.com/uciml/pima-indians-diabetes-database) <br>
Number of Instances: 768 Number of Attributes: 8 plus class <br>
Attribute information:
1. Number of times pregnant
2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)

#### Formulation
Maximize ${1}$ with subject to ${<\underline{w}, y_i\underline{x_i}> \ge 1}$

In [73]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [74]:
df=pd.read_csv("diabetes.csv")
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [75]:
y = np.array(df['Outcome'])
X = np.array([list(df.loc[i][:-1]) for i in range(len(df))])

In [76]:
for i in range(len(y)):
  if y[i] == 0:
    y[i]=-1

In [77]:
l = len(y)
r = np.random.permutation(l)
train = int(l*0.7)
test = int(l*0.3)

In [78]:
X_train = X[r[:train]]
y_train = y[r[:train]]
X_test = X[r[-test:]]
y_test = y[r[-test:]]

In [79]:
ans = np.array([[0 for a in range(len(X_train[0]))] for b in range(train)])
for i in range(train):
  for j in range(len(X_train[0])):
    ans[i][j] = y_train[i] * X_train[i][j]

In [80]:
import pulp as p
prob = p.LpProblem('HSLP',p.LpMinimize)
prob+=+1
w = np.array([p.LpVariable('w'+str(i)) for i in range(len(X_train[0]))])
w = np.transpose(w)
# <w, yixi>
inner_prod = list(np.inner(w,ans))
for i in inner_prod:
  prob+=i>=1
status = prob.solve()
print(p.LpStatus[status])
for i in range(len(X_train[0])):
  print('w'+str(i)+' = '+str(p.value(w[i])))

Infeasible
w0 = 0.079442851
w1 = 0.0073533153
w2 = -0.028720918
w3 = 0.001678764
w4 = 0.0017446856
w5 = -0.0090311493
w6 = 0.32774983
w7 = 0.0018702335


In [81]:
w_val = [p.value(w[i]) for i in range(len(w))]
y_pred = list()
for i in range(test):
  if np.inner(w_val, X_test[i])>0:
    y_pred.append(1)
  else :
    y_pred.append(-1)
cclf=0
for i in range(test):
  if y_pred[i]==y_test[i]:
    cclf+=1
print("Accuracy = "+str(cclf/test))

Accuracy = 0.7043478260869566
