# Heart Disease Risk Predictions

## Data Used: UCI Heart Disease Dataset
This directory contains four databases concerning heart disease diagnosis. All attributes are numeric-valued. The data were collected from the following four locations:

1. Cleveland Clinic Foundation
2. Hungarian Institute of Cardiology, Budapest
3. V.A. Medical Center, Long Beach, CA
4. University Hospital, Zurich, Switzerland

## Number of Instances

| Database | Instances |
|---|---|
| Cleveland | 303 |
| Hungarian | 294 |
| Switzerland | 123 |
| Long Beach VA | 200 |
| Total | 920 |

## Attribute Information

1. age: age in years
2. sex: 1 = male; 0 = female
3. cp: chest pain type
   - Value 1: typical angina
   - Value 2: atypical angina
   - Value 3: non-anginal pain
   - Value 4: asymptomatic
4. trestbps: resting blood pressure (mm Hg, on admission to the hospital); the column is named THRESTBPS in the CSV
5. chol: serum cholesterol (mg/dl)
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   - Value 0: normal
   - Value 1: having ST-T wave abnormality
   - Value 2: showing probable or definite left ventricular hypertrophy
8. thalach: maximum heart rate achieved
9. exang: exercise-induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: slope of the peak exercise ST segment
    - Value 1: upsloping
    - Value 2: flat
    - Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14. category: diagnosis of heart disease, 0-4 (the predicted attribute; 0 = no disease, 1-4 = disease present)


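## Class Distribution

| Database | 0 | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|---|
| Cleveland | 164 | 55 | 36 | 35 | 13 | 303 |
| Hungarian | 188 | 37 | 26 | 28 | 15 | 294 |
| Switzerland | 8 | 48 | 32 | 30 | 5 | 123 |
| Long Beach VA | 51 | 56 | 41 | 42 | 10 | 200 |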

In [1]:
import pandas
import numpy
import matplotlib.pyplot as plt

In [2]:
df=pandas.read_csv('Preprocessed/data_combined.csv')
print df[:15]

    AGE  SEX  CP THRESTBPS CHOL FBS RESTECG THALACH EXANG OLDPEAK SLOPE CA  \
0    63    1   1       145  233   1       2     150     0     2.3     3  0   
1    67    1   4       160  286   0       2     108     1     1.5     2  3   
2    67    1   4       120  229   0       2     129     1     2.6     2  2   
3    37    1   3       130  250   0       0     187     0     3.5     3  0   
4    41    0   2       130  204   0       2     172     0     1.4     1  0   
5    56    1   2       120  236   0       0     178     0     0.8     1  0   
6    62    0   4       140  268   0       2     160     0     3.6     3  2   
7    57    0   4       120  354   0       0     163     1     0.6     1  0   
8    63    1   4       130  254   0       2     147     0     1.4     2  1   
9    53    1   4       140  203   1       2     155     1     3.1     3  0   
10   57    1   4       140  192   0       0     148     0     0.4     2  0   
11   56    0   2       140  294   0       2     153     0     1.

In [3]:
print df.dtypes

AGE           int64
SEX           int64
CP            int64
THRESTBPS    object
CHOL         object
FBS          object
RESTECG      object
THALACH      object
EXANG        object
OLDPEAK      object
SLOPE        object
CA           object
THAL         object
CATEGORY      int64
dtype: object


In [4]:
print df['CATEGORY'].value_counts()

0    411
1    265
2    109
3    107
4     28
Name: CATEGORY, dtype: int64


## Missing Attribute Values (WEKA tool)
1. THRESTBPS (6%)
2. RESTECG (2 values)
3. CHOL (3%)
4. FBS (10%)
5. THALACH (6%)
6. EXANG (6%)
7. OLDPEAK (7%)
8. SLOPE (34%)
9. CA (66%)
10. THAL (53%)
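The percentages above were produced with WEKA. As a cross-check, the share of missing entries per column can also be computed directly in pandas. This is a minimal sketch in Python 3; it assumes, as in this CSV, that missing values are stored as the string '?', and `missing_percentages` is a hypothetical helper name:

```python
import pandas as pd


def missing_percentages(df, marker="?"):
    """Percentage of entries equal to `marker` in each column, largest first.

    Assumes missing values are encoded as the string `marker` (here '?').
    Columns without missing values are omitted.
    """
    # Compare as strings so numeric columns (which cannot hold '?') are handled uniformly
    is_missing = df.astype(str) == marker
    pct = is_missing.mean() * 100
    return pct[pct > 0].sort_values(ascending=False).round(1)


if __name__ == "__main__":
    toy = pd.DataFrame({"AGE": [63, 67, 41, 56],
                        "CA": ["0", "?", "?", "2"],
                        "THAL": ["3", "7", "?", "6"]})
    print(missing_percentages(toy))  # CA 50.0, THAL 25.0
```

Called on the freshly loaded `df`, before any replacement, it should list the same ten columns as the WEKA summary.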

## Replacing missing values for THRESTBPS

In [5]:
print df['THRESTBPS'].value_counts().head()

120    131
130    115
140    102
110     59
?       59
Name: THRESTBPS, dtype: int64


In [6]:
# fill '?' with 120, the most frequent value (resting blood pressure is generally in the range 120-140)
df['THRESTBPS'] = df['THRESTBPS'].replace(['?'],'120')
df['THRESTBPS'] = df['THRESTBPS'].astype('int64')
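Filling '?' with 120 amounts to mode imputation, since 120 is the most frequent observed value in the counts above. A small generic helper could compute the mode instead of hard-coding it, and the same helper fits the RESTECG column. This is a sketch in Python 3; `fill_with_mode` is a hypothetical name:

```python
import pandas as pd


def fill_with_mode(series, marker="?", dtype="int64"):
    """Replace `marker` with the most frequent observed value, then cast to `dtype`."""
    observed = series[series != marker]
    # mode() may return several tied values (sorted); take the first one
    mode_value = observed.mode().iloc[0]
    return series.replace(marker, mode_value).astype(dtype)


if __name__ == "__main__":
    bp = pd.Series(["120", "130", "120", "?", "140"])
    print(fill_with_mode(bp).tolist())  # [120, 130, 120, 120, 140]
```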

## Replacing missing values for FBS

In [7]:
print df['FBS'].value_counts()
# FBS counts split by sex
print "male:\n",df[df['SEX']==1]['FBS'].value_counts()
print "Female:\n",df[df['SEX']==0]['FBS'].value_counts()

0    692
1    138
?     90
Name: FBS, dtype: int64
male:
0    528
1    119
?     79
Name: FBS, dtype: int64
Female:
0    164
1     19
?     11
Name: FBS, dtype: int64


In [8]:
# randomly fill '?' with 0 (p=0.8) or 1 (p=0.2), close to the observed 692:138 ratio (about 83:17)
v=df.FBS.values=='?'
df.loc[v, 'FBS'] = numpy.random.choice(('0','1'), v.sum(), p=(0.8,0.2))
print df['FBS'].value_counts()
df['FBS']=df['FBS'].astype('int64')

0    768
1    152
Name: FBS, dtype: int64
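In the cell above, the probabilities 0.8/0.2 are hard-coded and no random seed is set, so re-running the notebook gives a different imputation each time. A minimal sketch in Python 3 (numpy 1.17 or later for `default_rng`; `fill_by_sampling` is a hypothetical helper) derives the probabilities from the observed values and fixes the seed. The same idea covers the EXANG, SLOPE and CA cells:

```python
import numpy as np
import pandas as pd


def fill_by_sampling(series, marker="?", seed=0):
    """Fill `marker` entries by sampling from the observed value frequencies.

    Probabilities come from the non-missing values instead of being hard-coded,
    and the fixed `seed` makes the imputation reproducible.
    """
    rng = np.random.default_rng(seed)
    missing = series == marker
    freqs = series[~missing].value_counts(normalize=True)
    filled = series.copy()
    filled.loc[missing] = rng.choice(freqs.index.to_numpy(), size=int(missing.sum()),
                                     p=freqs.to_numpy())
    return filled


if __name__ == "__main__":
    fbs = pd.Series(["0"] * 8 + ["1"] * 2 + ["?"] * 5)
    print(fill_by_sampling(fbs, seed=1).value_counts())
```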


## Replacing missing values in CHOL

In [9]:
df['CHOL'].value_counts().head()
# apart from 0, no single value dominates, so '?' is replaced with the column mean

0      172
?       30
254     10
220     10
216      9
Name: CHOL, dtype: int64

In [10]:
df['CHOL']=df['CHOL'].replace('?','-69')#temporarily replacing ? with -69
df['CHOL']=df['CHOL'].astype('int64')
k=int(df[df['CHOL']!=-69]['CHOL'].mean())
df['CHOL']=df['CHOL'].replace(-69,k)


print df['CHOL'].unique()

[233 286 229 250 204 236 268 354 254 203 192 294 256 263 199 168 239 275
 266 211 283 284 224 206 219 340 226 247 167 230 335 234 177 276 353 243
 225 302 212 330 175 417 197 198 290 253 172 273 213 305 216 304 188 282
 185 232 326 231 269 267 248 360 258 308 245 270 208 264 321 274 325 235
 257 164 141 252 255 201 222 260 182 303 265 309 307 249 186 341 183 407
 217 288 220 209 227 261 174 281 221 205 240 289 318 298 564 246 322 299
 300 293 277 214 207 223 160 394 184 315 409 244 195 196 126 313 259 200
 262 215 228 193 271 210 327 149 295 306 178 237 218 242 319 166 180 311
 278 342 169 187 157 176 241 131 132 161 173 194 297 292 339 147 291 358
 412 238 163 280 202 328 129 190 179 272 100 468 320 312 171 365 344  85
 347 251 287 156 117 466 338 529 392 329 355 603 404 518 285 279 388 336
 491 331 393   0 153 316 458 384 349 142 181 310 170 369 165 337 333 139
 385]
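The value counts above show 172 rows with CHOL = 0, which is not a plausible serum cholesterol. These look like undocumented missing values, and because they are included in the mean, the fill value k is pulled down. The sketch below (Python 3; `fill_with_mean` is a hypothetical helper, and treating 0 as missing is an assumption the dataset description does not state) replaces the -69 sentinel with `pandas.to_numeric`:

```python
import numpy as np
import pandas as pd


def fill_with_mean(series, zero_is_missing=False):
    """Convert to numbers ('?' becomes NaN), optionally treat 0 as missing, fill with the mean."""
    values = pd.to_numeric(series, errors="coerce")  # unparseable entries such as '?' -> NaN
    if zero_is_missing:
        values = values.replace(0, np.nan)  # assumption: 0 encodes "not measured"
    # Series.mean() skips NaN, so only observed values contribute to the fill value
    return values.fillna(values.mean())


if __name__ == "__main__":
    chol = pd.Series(["200", "0", "?", "300"])
    print(fill_with_mean(chol).tolist())                        # zeros kept in the mean
    print(fill_with_mean(chol, zero_is_missing=True).tolist())  # [200.0, 250.0, 250.0, 300.0]
```

Unlike the cell above, the mean is not truncated with `int()`; round or cast afterwards if integer values are wanted. THALACH and OLDPEAK could be handled the same way.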


## Replacing missing values in RESTECG

In [11]:
print df['RESTECG'].value_counts()

# fill '?' with the most frequent value (0)
df['RESTECG']=df['RESTECG'].replace('?','0')
df['RESTECG'] = df['RESTECG'].astype('int64')

print "after replacing\n",df['RESTECG'].value_counts()

0    551
2    188
1    179
?      2
Name: RESTECG, dtype: int64
after replacing
0    553
2    188
1    179
Name: RESTECG, dtype: int64


## Replacing missing values in THALACH

In [12]:
df['THALACH'].value_counts().head()

?      55
150    43
140    41
120    35
130    30
Name: THALACH, dtype: int64

In [13]:
df['THALACH']=df['THALACH'].replace('?','-69')#temporarily replacing ? with -69
df['THALACH']=df['THALACH'].astype('int64')
k=int(df[df['THALACH']!=-69]['THALACH'].mean())
print k
df['THALACH']=df['THALACH'].replace(-69,k)

137


In [14]:
df['THALACH'].value_counts().head()

137    60
150    43
140    41
120    35
130    30
Name: THALACH, dtype: int64

## Replacing missing values in EXANG

In [15]:
#exang:exercise induced angina (1 = yes; 0 = no) 
print df['EXANG'].value_counts()

0    528
1    337
?     55
Name: EXANG, dtype: int64


In [16]:
k=528.0/(337.0+528.0)
print k

0.610404624277


In [17]:
# sample 0/1 with the observed proportions (528/865 = 0.61 are 0)
v=df.EXANG.values=='?'
df.loc[v,'EXANG'] = numpy.random.choice(('0','1'), v.sum(), p=(0.61,0.39))
print df['EXANG'].value_counts()
df['EXANG']=df["EXANG"].astype('int64')

0    561
1    359
Name: EXANG, dtype: int64


## Replacing missing values in OLDPEAK

In [18]:
print df['OLDPEAK'].value_counts().head()

0      370
1       83
2       76
?       62
1.5     48
Name: OLDPEAK, dtype: int64


In [19]:
df['OLDPEAK']=df['OLDPEAK'].replace('?','-69')#temporarily replacing ? with -69
df['OLDPEAK']=df['OLDPEAK'].astype('float64')
k=df[df['OLDPEAK']!=-69]['OLDPEAK'].mean()
print k
df['OLDPEAK']=df['OLDPEAK'].replace(-69,numpy.round(k,1))

0.878787878788


In [20]:
print df['OLDPEAK'].value_counts().head()

0.0    370
1.0     83
2.0     76
0.9     66
1.5     48
Name: OLDPEAK, dtype: int64


## Replacing missing values in SLOPE

In [21]:
print df['SLOPE'].value_counts()

2    345
?    309
1    203
3     63
Name: SLOPE, dtype: int64


In [22]:
#k=203.0/(345.0+203.0+63.0)
#print k

In [23]:
# sample roughly in the observed proportions (345:203:63, about 0.56:0.33:0.10)
v=df.SLOPE.values=='?'
df.loc[v,'SLOPE'] = numpy.random.choice(('2','1','3'), v.sum(), p=(0.6,0.30,0.10))
print df['SLOPE'].value_counts()
df['SLOPE']=df['SLOPE'].astype('int64')

2    550
1    278
3     92
Name: SLOPE, dtype: int64


## Replacing missing values in CA

In [24]:
print df["CA"].value_counts()
k=(41.0)/(181+67+41+20)
print k

?    611
0    181
1     67
2     41
3     20
Name: CA, dtype: int64
0.132686084142


In [25]:
# sample roughly in the observed proportions (181:67:41:20, about 0.59:0.22:0.13:0.06)
v=df.CA.values=='?'
df.loc[v,'CA'] = numpy.random.choice(('0','1','2','3'), v.sum(), p=(0.60,0.20,0.13,0.07))
df['CA']=df['CA'].astype('int64')
print df['CA'].value_counts()

0    549
1    186
2    128
3     57
Name: CA, dtype: int64


## Replacing missing values in THAL

In [26]:
print df['THAL'].value_counts()
# 53% of THAL is missing, so random sampling is not used directly here

?    486
3    196
7    192
6     46
Name: THAL, dtype: int64


In [27]:
print df[df['THAL']=='3']['SEX'].value_counts()
print df[df['THAL']=='7']['SEX'].value_counts()

1    110
0     86
Name: SEX, dtype: int64
1    171
0     21
Name: SEX, dtype: int64


In [28]:
print "THAL:3=====>\n",df[df['THAL']=='3']['CATEGORY'].value_counts()
print "THAL:7=====>\n",df[df['THAL']=='7']['CATEGORY'].value_counts()
print "THAL:6=====>\n",df[df['THAL']=='6']['CATEGORY'].value_counts()

THAL:3=====>
0    138
1     30
2     14
3     12
4      2
Name: CATEGORY, dtype: int64
THAL:7=====>
1    63
3    43
0    38
2    37
4    11
Name: CATEGORY, dtype: int64
THAL:6=====>
1    13
2    12
0    11
3     7
4     3
Name: CATEGORY, dtype: int64


In [29]:
df['THAL']=df['THAL'].replace('?',-1)
# NOTE: missing THAL is filled from the target: 7 (reversible defect) if CATEGORY != 0,
# otherwise 3 (normal). This leaks label information into THAL, so scores computed
# on these features are likely optimistic.
df.loc[(df['THAL']==-1)&(df['CATEGORY']!=0),'THAL']='7'
df.loc[(df['THAL']==-1)&(df['CATEGORY']==0),'THAL']='3'
print df['THAL'].value_counts()
df['THAL']=df['THAL'].astype('int64')

7    454
3    420
6     46
Name: THAL, dtype: int64
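Because the cell above fills THAL from CATEGORY, the 486 previously missing THAL values (53%) directly encode the diagnosis, information that would not be available for a new patient. One leakage-free alternative is sketched here in Python 3. It assumes the data are split into training and test frames first, fills '?' with the training-set mode, and adds an indicator column recording that the value was missing. The function name and the THAL_MISSING column are hypothetical:

```python
import pandas as pd


def impute_without_label(train, test, col="THAL", marker="?"):
    """Fill `marker` in `col` with the training-set mode and add a 0/1 missing indicator.

    Only training rows determine the fill value and the label column is never read,
    so no target information leaks into the feature.
    """
    train, test = train.copy(), test.copy()
    fill_value = train.loc[train[col] != marker, col].mode().iloc[0]
    for part in (train, test):
        part[col + "_MISSING"] = (part[col] == marker).astype("int64")
        part[col] = part[col].replace(marker, fill_value).astype("int64")
    return train, test


if __name__ == "__main__":
    tr = pd.DataFrame({"THAL": ["3", "7", "3", "?"], "CATEGORY": [0, 1, 0, 1]})
    te = pd.DataFrame({"THAL": ["?", "6"], "CATEGORY": [1, 0]})
    tr2, te2 = impute_without_label(tr, te)
    print(te2)
```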


In [30]:
print df.dtypes

AGE            int64
SEX            int64
CP             int64
THRESTBPS      int64
CHOL           int64
FBS            int64
RESTECG        int64
THALACH        int64
EXANG          int64
OLDPEAK      float64
SLOPE          int64
CA             int64
THAL           int64
CATEGORY       int64
dtype: object


In [31]:
dummies = pandas.get_dummies(df["CP"],prefix="CP")
df = df.join(dummies)

dummies = pandas.get_dummies(df["RESTECG"],prefix="RESTECG")
df      = df.join(dummies)

dummies = pandas.get_dummies(df["SLOPE"],prefix="SLOPE")
df      = df.join(dummies)

dummies = pandas.get_dummies(df["THAL"],prefix="THAL")
df      = df.join(dummies)


del df['CP']
del df['RESTECG']
del df['SLOPE']
del df['THAL']
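The four join/del pairs above can be collapsed into one call: `pandas.get_dummies` accepts a `columns` argument, drops the original columns, and names the new ones `<column>_<value>`, as in the cell above. A Python 3 sketch (the `one_hot` wrapper is a hypothetical name); passing `dtype=int` yields 0/1 integers directly, which would also make the uint8-to-int64 conversion cell unnecessary:

```python
import pandas as pd


def one_hot(df, columns):
    """One-hot encode `columns`; pandas drops the originals and appends <col>_<value> columns."""
    # dtype=int gives 0/1 integers (pandas otherwise returns uint8 or bool, depending on version)
    return pd.get_dummies(df, columns=columns, dtype=int)


if __name__ == "__main__":
    toy = pd.DataFrame({"AGE": [63, 67], "CP": [1, 4], "SLOPE": [3, 2]})
    print(one_hot(toy, ["CP", "SLOPE"]))
```

On the notebook's frame this would be `one_hot(df, ["CP", "RESTECG", "SLOPE", "THAL"])`.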

In [32]:
print df.dtypes

AGE            int64
SEX            int64
THRESTBPS      int64
CHOL           int64
FBS            int64
THALACH        int64
EXANG          int64
OLDPEAK      float64
CA             int64
CATEGORY       int64
CP_1           uint8
CP_2           uint8
CP_3           uint8
CP_4           uint8
RESTECG_0      uint8
RESTECG_1      uint8
RESTECG_2      uint8
SLOPE_1        uint8
SLOPE_2        uint8
SLOPE_3        uint8
THAL_3         uint8
THAL_6         uint8
THAL_7         uint8
dtype: object


In [33]:
for g in df.columns:
    if df[g].dtype=='uint8':
        df[g]=df[g].astype('int64')

In [34]:
# merge diagnosis levels 1-4 into a single "disease present" class
df.loc[df['CATEGORY']>0,'CATEGORY']=1

In [35]:
# z-score the continuous columns; CA is instead centered and divided by its range
stdcols = ["AGE","THRESTBPS","CHOL","THALACH","OLDPEAK"]
nrmcols = ["CA"]
stddf   = df.copy()
stddf[stdcols] = stddf[stdcols].apply(lambda x: (x-x.mean())/x.std())
stddf[nrmcols] = stddf[nrmcols].apply(lambda x: (x-x.mean())/(x.max()-x.min()))
#stddf[stdcols] = stddf[stdcols].apply(lambda x: (x-x.mean())/(x.max()-x.min()))


for g in stdcols:
    print g,max(stddf[g]),min(stddf[g])
    
for g in nrmcols:
    print g,max(stddf[g]),min(stddf[g])    

AGE 2.49229867231 -2.70681396757
THRESTBPS 3.67440591791 -7.03102349108
CHOL 3.70670590542 -1.82755513195
THALACH 2.56523324659 -3.08339929343
OLDPEAK 5.04825042025 -3.30258023693
CA 0.777898550725 -0.222101449275
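In the cell above, the means and standard deviations come from all 920 rows, including those that later form the test set, so a little test-set information reaches training. The usual remedy is to split first, compute the statistics on the training rows only, and reuse them for the test rows. A Python 3 sketch, assuming numpy arrays and integer column positions (`standardize_train_test` is a hypothetical name):

```python
import numpy as np


def standardize_train_test(x_train, x_test, cols):
    """Z-score the columns at positions `cols` using training-set statistics only."""
    x_train = x_train.astype(float)  # astype returns a copy, so the inputs are not modified
    x_test = x_test.astype(float)
    mu = x_train[:, cols].mean(axis=0)
    sigma = x_train[:, cols].std(axis=0, ddof=1)  # ddof=1 matches pandas' Series.std()
    x_train[:, cols] = (x_train[:, cols] - mu) / sigma
    x_test[:, cols] = (x_test[:, cols] - mu) / sigma
    return x_train, x_test


if __name__ == "__main__":
    tr = np.array([[1.0, 10.0], [3.0, 10.0]])
    te = np.array([[5.0, 0.0]])
    a, b = standardize_train_test(tr, te, [0])
    print(a)
    print(b)
```

scikit-learn's `StandardScaler` (fit on the training split, then `transform` on the test split) follows the same pattern, but uses the population standard deviation (ddof=0).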


In [36]:
print stddf.dtypes

AGE          float64
SEX            int64
THRESTBPS    float64
CHOL         float64
FBS            int64
THALACH      float64
EXANG          int64
OLDPEAK      float64
CA           float64
CATEGORY       int64
CP_1           int64
CP_2           int64
CP_3           int64
CP_4           int64
RESTECG_0      int64
RESTECG_1      int64
RESTECG_2      int64
SLOPE_1        int64
SLOPE_2        int64
SLOPE_3        int64
THAL_3         int64
THAL_6         int64
THAL_7         int64
dtype: object


In [37]:
from sklearn.model_selection import train_test_split


In [38]:
df_copy=stddf.copy()
df_copy=df_copy.drop(['CATEGORY'],axis=1)

dat=df_copy.values
#print dat.shape

print type(dat),dat.shape

<type 'numpy.ndarray'> (920, 22)


In [39]:
labels=df['CATEGORY'].values
print labels[:5],type(labels)

[0 1 1 0 0] <type 'numpy.ndarray'>


In [40]:
x_train,x_test,y_train,y_test=train_test_split(dat,labels, test_size=0.25, random_state=42)

In [41]:
print "x_train:",x_train.shape
print "y_train:",y_train.shape
print
print "x_test:",x_test.shape
print "y_test:",y_test.shape

x_train: (690, 22)
y_train: (690,)

x_test: (230, 22)
y_test: (230,)


In [42]:
#training and testing
#SVM
from sklearn import svm
clf = svm.SVC(gamma=0.001, C=5)
clf.fit(x_train,y_train)
print "SVM:",clf.score(x_test,y_test)*100,"%"
svmpred=clf.predict(x_test)
#print svmpred




#from sklearn.model_selection import cross_val_score
#scores = cross_val_score(clf,dat,labels, cv=5)
#print scores


from sklearn import linear_model
lrcv=linear_model.LogisticRegressionCV(fit_intercept=True,penalty='l2',dual=False)
lrcv.fit(x_train,y_train)
print "Logistic Regression:",lrcv.score(x_test,y_test)*100,"%"


SVM: 89.1304347826 %
Logistic Regression: 88.2608695652 %
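Accuracy alone does not show which kind of mistake a model makes; for a diagnostic task, a missed disease (false negative) and a false alarm (false positive) have different costs. A minimal numpy sketch in Python 3 computes sensitivity and specificity from labels such as `y_test` and predictions such as `svmpred`. `binary_report` is a hypothetical helper; `sklearn.metrics.confusion_matrix` gives the same counts:

```python
import numpy as np


def binary_report(y_true, y_pred):
    """Confusion counts, accuracy, sensitivity and specificity for 0/1 labels (1 = disease)."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        # share of diseased patients that are detected (recall of class 1)
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # share of healthy patients correctly cleared (recall of class 0)
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    print(binary_report([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```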


In [43]:
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

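# importances and the feature selection below are computed on all 920 rows (before any split);
# no random_state is set, so the importances vary between runs
clf = ExtraTreesClassifier()
clf = clf.fit(dat,labels)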
g=clf.feature_importances_
c=stddf.drop(['CATEGORY'],axis=1).columns

print "Importance of various features"
for k in range(len(c)):
    print c[k],g[k]
    
    
model = SelectFromModel(clf, prefit=True)
X_new = model.transform(dat)
print X_new.shape



from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf,X_new,labels, cv=5)
print "Kfold after\n",scores


tx_train,tx_test,ty_train,ty_test=train_test_split(X_new,labels, test_size=0.25, random_state=42)


tclf = svm.SVC(gamma=0.001, C=5)
tclf.fit(tx_train,ty_train)
print "after feature sel SVM:",tclf.score(tx_test,ty_test)*100,"%"
tsvmpred=tclf.predict(tx_test)
#print tsvmpred


lrcv=linear_model.LogisticRegressionCV(fit_intercept=True,penalty='l2',dual=False)
lrcv.fit(tx_train,ty_train)
print "Logistic Regression:",lrcv.score(tx_test,ty_test)*100,"%"


Importance of various features
AGE 0.0391474092892
SEX 0.0346967566595
THRESTBPS 0.0350830915876
CHOL 0.0554300078477
FBS 0.012461149937
THALACH 0.0615110604613
EXANG 0.0888011350796
OLDPEAK 0.0458873517785
CA 0.0314134796584
CP_1 0.00652806038277
CP_2 0.0200896188842
CP_3 0.0155983697576
CP_4 0.0858083232334
RESTECG_0 0.0151922311442
RESTECG_1 0.0127260395155
RESTECG_2 0.0155222359857
SLOPE_1 0.0137939886724
SLOPE_2 0.0126457440528
SLOPE_3 0.00465370697018
THAL_3 0.173541247828
THAL_6 0.0145746780262
THAL_7 0.204894313248
(920, 7)
Kfold after
[ 0.77297297  0.79347826  0.90217391  0.90217391  0.79781421]
after feature sel SVM: 90.0 %
Logistic Regression: 88.2608695652 %
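`SelectFromModel` keeps a feature when its importance is at least the threshold, which by default (for estimators without an L1 penalty, such as `ExtraTreesClassifier`) is the mean importance. The plain Python 3 sketch below, using a hypothetical helper `select_by_mean_importance`, applies that rule to the importances printed above and recovers the seven columns behind the (920, 7) shape. The two highest-ranked features, THAL_7 and THAL_3, are the ones whose missing values were filled from CATEGORY, which is what one would expect from that imputation step.

```python
import numpy as np


def select_by_mean_importance(names, importances):
    """Keep the features whose importance is >= the mean importance."""
    importances = np.asarray(importances, dtype=float)
    keep = importances >= importances.mean()
    return [name for name, k in zip(names, keep) if k]


names = ["AGE", "SEX", "THRESTBPS", "CHOL", "FBS", "THALACH", "EXANG", "OLDPEAK", "CA",
         "CP_1", "CP_2", "CP_3", "CP_4", "RESTECG_0", "RESTECG_1", "RESTECG_2",
         "SLOPE_1", "SLOPE_2", "SLOPE_3", "THAL_3", "THAL_6", "THAL_7"]
# importances as printed by the ExtraTreesClassifier run above
importances = [0.0391474092892, 0.0346967566595, 0.0350830915876, 0.0554300078477,
               0.012461149937, 0.0615110604613, 0.0888011350796, 0.0458873517785,
               0.0314134796584, 0.00652806038277, 0.0200896188842, 0.0155983697576,
               0.0858083232334, 0.0151922311442, 0.0127260395155, 0.0155222359857,
               0.0137939886724, 0.0126457440528, 0.00465370697018, 0.173541247828,
               0.0145746780262, 0.204894313248]
print(select_by_mean_importance(names, importances))
# ['CHOL', 'THALACH', 'EXANG', 'OLDPEAK', 'CP_4', 'THAL_3', 'THAL_7']
```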


In [44]:
import keras
%matplotlib inline
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [45]:
def make_model(activ,opti,ip,layers,trainx,trainy,testx,testy):
    # two hidden layers of sizes layers[0] and layers[1], then one output unit;
    # every layer uses `activ` (with 'relu' the output is not bounded to [0, 1])
    model = Sequential()
    model.add(Dense(layers[0], input_dim=ip, kernel_initializer='uniform', activation=activ))
    model.add(Dense(layers[1], kernel_initializer='uniform', activation=activ))
    model.add(Dense(1, kernel_initializer='uniform', activation=activ))
    model.compile(loss='mse', optimizer=opti, metrics=['accuracy'])
    model.fit(trainx,trainy,epochs=600,batch_size=512,verbose=2,validation_data=(testx,testy))

    # evaluate() returns [loss, accuracy]; report the accuracy as a percentage
    trainScore = model.evaluate(trainx,trainy, verbose=0)
    print "Train accuracy: ",trainScore[1]*100
    testScore = model.evaluate(testx,testy, verbose=0)
    print "Test accuracy: ",testScore[1]*100

    return model


In [46]:
#without k best features,sigmoid and rmsprop
m1=make_model('sigmoid','rmsprop',22,[220,22],x_train,y_train,x_test,y_test)



Train on 690 samples, validate on 230 samples
Epoch 1/600
0s - loss: 0.2504 - acc: 0.4710 - val_loss: 0.2470 - val_acc: 0.5826
Epoch 2/600
0s - loss: 0.2483 - acc: 0.5435 - val_loss: 0.2448 - val_acc: 0.5826
Epoch 3/600
0s - loss: 0.2478 - acc: 0.5435 - val_loss: 0.2438 - val_acc: 0.5826
Epoch 4/600
0s - loss: 0.2475 - acc: 0.5435 - val_loss: 0.2438 - val_acc: 0.5826
Epoch 5/600
0s - loss: 0.2473 - acc: 0.5435 - val_loss: 0.2440 - val_acc: 0.5826
Epoch 6/600
0s - loss: 0.2472 - acc: 0.5435 - val_loss: 0.2430 - val_acc: 0.5826
Epoch 7/600
0s - loss: 0.2469 - acc: 0.5435 - val_loss: 0.2434 - val_acc: 0.5826
Epoch 8/600
0s - loss: 0.2465 - acc: 0.5435 - val_loss: 0.2428 - val_acc: 0.5826
Epoch 9/600
0s - loss: 0.2462 - acc: 0.5435 - val_loss: 0.2425 - val_acc: 0.5826
Epoch 10/600
0s - loss: 0.2459 - acc: 0.5435 - val_loss: 0.2419 - val_acc: 0.5826
Epoch 11/600
0s - loss: 0.2454 - acc: 0.5435 - val_loss: 0.2416 - val_acc: 0.5826
Epoch 12/600
0s - loss: 0.2450 - acc: 0.5435 - val_loss: 0.24

In [47]:
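pr=m1.predict(x_test)
k=0  # number of misclassified test samples
for u in range(len(x_test)):
    # when the network is unsure (rounded output between 0.3 and 0.8), fall back to the SVM prediction
    if round(pr[u][0],1)>=0.3 and round(pr[u][0],1)<=0.8:
        g=svmpred[u]
    else:
        g=round(pr[u][0],0)
    if g!=y_test[u]:
        print "expected",y_test[u],"predicted:",pr[u][0]," ",svmpred[u]
        k=k+1
# error percentage (Python 2 integer division truncates it to a whole number)
print "error",k*100/len(y_test)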

expected 1 predicted: 0.216149   0
expected 0 predicted: 0.785701   1
expected 0 predicted: 0.969262   1
expected 0 predicted: 0.569194   1
expected 1 predicted: 0.0498735   0
expected 1 predicted: 0.0791502   0
expected 1 predicted: 0.0690048   0
expected 0 predicted: 0.886327   1
expected 1 predicted: 0.135113   0
expected 1 predicted: 0.583736   0
expected 0 predicted: 0.701358   1
expected 0 predicted: 0.930863   1
expected 1 predicted: 0.10414   0
expected 1 predicted: 0.053342   0
expected 1 predicted: 0.0861178   0
expected 0 predicted: 0.925943   1
expected 1 predicted: 0.357002   0
expected 1 predicted: 0.342572   0
expected 1 predicted: 0.302132   0
expected 0 predicted: 0.939591   1
expected 1 predicted: 0.224161   1
expected 0 predicted: 0.659478   1
expected 0 predicted: 0.933857   1
expected 1 predicted: 0.0497455   0
error 10
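The loop above combines the two models: the network's rounded output is used unless it falls between 0.3 and 0.8, in which case the SVM's label is taken. A vectorized numpy sketch of the same rule in Python 3 (`hybrid_predict` and `error_percent` are hypothetical names) avoids the per-row loop and keeps the error as a fraction instead of a truncated integer. One caveat: `np.round` rounds exact halves to even, while Python 2's `round` rounds them away from zero, so results can differ for outputs sitting exactly on a rounding boundary.

```python
import numpy as np


def hybrid_predict(nn_prob, svm_pred, low=0.3, high=0.8):
    """Use the network's rounded output unless it is uncertain; then use the SVM label."""
    nn_prob = np.asarray(nn_prob, dtype=float).ravel()  # Keras predict() returns shape (n, 1)
    rounded = np.round(nn_prob, 1)
    uncertain = (rounded >= low) & (rounded <= high)
    return np.where(uncertain, np.asarray(svm_pred), np.round(nn_prob)).astype(int)


def error_percent(y_true, y_pred):
    """Percentage of positions where the prediction differs from the label."""
    return 100.0 * np.mean(np.asarray(y_true) != np.asarray(y_pred))


if __name__ == "__main__":
    pred = hybrid_predict([[0.1], [0.5], [0.9], [0.79]], [1, 1, 0, 0])
    print(pred)                               # [0 1 1 0]
    print(error_percent([0, 1, 0, 0], pred))  # 25.0
```

With the notebook's variables this would read `error_percent(y_test, hybrid_predict(pr, svmpred))`.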


In [48]:
print tx_train.shape

(690, 7)


In [49]:
m2=make_model('sigmoid','rmsprop',tx_train.shape[1],[tx_train.shape[1]*50,tx_train.shape[1]*10],tx_train,ty_train,tx_test,ty_test)



Train on 690 samples, validate on 230 samples
Epoch 1/600
0s - loss: 0.2487 - acc: 0.5435 - val_loss: 0.2456 - val_acc: 0.5826
Epoch 2/600
0s - loss: 0.2479 - acc: 0.5435 - val_loss: 0.2460 - val_acc: 0.5826
Epoch 3/600
0s - loss: 0.2474 - acc: 0.5435 - val_loss: 0.2431 - val_acc: 0.5826
Epoch 4/600
0s - loss: 0.2465 - acc: 0.5435 - val_loss: 0.2433 - val_acc: 0.5826
Epoch 5/600
0s - loss: 0.2457 - acc: 0.5435 - val_loss: 0.2409 - val_acc: 0.5826
Epoch 6/600
0s - loss: 0.2446 - acc: 0.5435 - val_loss: 0.2415 - val_acc: 0.5826
Epoch 7/600
0s - loss: 0.2430 - acc: 0.5435 - val_loss: 0.2373 - val_acc: 0.5826
Epoch 8/600
0s - loss: 0.2432 - acc: 0.5826 - val_loss: 0.2352 - val_acc: 0.5826
Epoch 9/600
0s - loss: 0.2407 - acc: 0.5435 - val_loss: 0.2357 - val_acc: 0.5826
Epoch 10/600
0s - loss: 0.2379 - acc: 0.5435 - val_loss: 0.2330 - val_acc: 0.5826
Epoch 11/600
0s - loss: 0.2362 - acc: 0.5551 - val_loss: 0.2296 - val_acc: 0.5826
Epoch 12/600
0s - loss: 0.2372 - acc: 0.5783 - val_loss: 0.22

In [50]:
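pr=m2.predict(tx_test)
k=0  # number of misclassified test samples
for u in range(len(tx_test)):
    # when the network is unsure (rounded output between 0.3 and 0.8), fall back to the SVM prediction
    if round(pr[u][0],1)>=0.3 and round(pr[u][0],1)<=0.8:
        g=tsvmpred[u]
    else:
        g=round(pr[u][0],0)
    if g!=ty_test[u]:
        print "expected",ty_test[u],"predicted:",pr[u][0]," ",tsvmpred[u]
        k=k+1
# error percentage (Python 2 integer division truncates it to a whole number)
print "error",k*100/len(ty_test)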

expected 1 predicted: 0.177789   0
expected 1 predicted: 0.555469   0
expected 0 predicted: 0.486715   1
expected 0 predicted: 0.956782   1
expected 0 predicted: 0.877582   1
expected 1 predicted: 0.0591064   0
expected 1 predicted: 0.160541   0
expected 1 predicted: 0.183873   0
expected 1 predicted: 0.0772032   0
expected 1 predicted: 0.51156   0
expected 0 predicted: 0.613172   1
expected 1 predicted: 0.155944   0
expected 1 predicted: 0.0764589   0
expected 1 predicted: 0.111633   0
expected 0 predicted: 0.841426   1
expected 1 predicted: 0.291515   0
expected 0 predicted: 0.513341   1
expected 1 predicted: 0.219192   0
expected 1 predicted: 0.581296   0
expected 0 predicted: 0.83139   1
expected 1 predicted: 0.211968   1
expected 0 predicted: 0.822785   1
expected 0 predicted: 0.460131   1
expected 1 predicted: 0.0453796   0
error 10


In [51]:
#without k best features, relu and rmsprop
m3=make_model('relu','rmsprop',22,[220,22],x_train,y_train,x_test,y_test)



Train on 690 samples, validate on 230 samples
Epoch 1/600
0s - loss: 0.5381 - acc: 0.4565 - val_loss: 0.5376 - val_acc: 0.4174
Epoch 2/600
0s - loss: 0.4932 - acc: 0.4565 - val_loss: 0.4716 - val_acc: 0.4174
Epoch 3/600
0s - loss: 0.4294 - acc: 0.4565 - val_loss: 0.3912 - val_acc: 0.4174
Epoch 4/600
0s - loss: 0.3526 - acc: 0.4565 - val_loss: 0.3041 - val_acc: 0.4174
Epoch 5/600
0s - loss: 0.2731 - acc: 0.4681 - val_loss: 0.2282 - val_acc: 0.5739
Epoch 6/600
0s - loss: 0.2060 - acc: 0.6203 - val_loss: 0.1729 - val_acc: 0.7739
Epoch 7/600
0s - loss: 0.1613 - acc: 0.7899 - val_loss: 0.1444 - val_acc: 0.8391
Epoch 8/600
0s - loss: 0.1386 - acc: 0.8348 - val_loss: 0.1293 - val_acc: 0.8348
Epoch 9/600
0s - loss: 0.1267 - acc: 0.8493 - val_loss: 0.1216 - val_acc: 0.8391
Epoch 10/600
0s - loss: 0.1198 - acc: 0.8551 - val_loss: 0.1154 - val_acc: 0.8435
Epoch 11/600
0s - loss: 0.1140 - acc: 0.8638 - val_loss: 0.1106 - val_acc: 0.8522
Epoch 12/600
0s - loss: 0.1091 - acc: 0.8652 - val_loss: 0.10

In [52]:
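pr=m3.predict(x_test)
k=0  # number of misclassified test samples
for u in range(len(x_test)):
    # when the network is unsure (rounded output between 0.3 and 0.8), fall back to the SVM prediction
    if round(pr[u][0],1)>=0.3 and round(pr[u][0],1)<=0.8:
        g=svmpred[u]
    else:
        g=round(pr[u][0],0)
    if g!=y_test[u]:
        print "expected",y_test[u],"predicted:",pr[u][0]," ",svmpred[u]
        k=k+1
# error percentage (Python 2 integer division truncates it to a whole number)
print "error",k*100/len(y_test)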

expected 1 predicted: 0.0   1
expected 1 predicted: 0.0   0
expected 1 predicted: 0.0   1
expected 0 predicted: 0.924106   1
expected 0 predicted: 0.728353   1
expected 1 predicted: 0.0   0
expected 1 predicted: 0.0   0
expected 1 predicted: 0.0   0
expected 0 predicted: 1.03506   1
expected 1 predicted: 0.0637085   0
expected 1 predicted: 0.219852   0
expected 0 predicted: 0.625588   1
expected 1 predicted: 0.0   0
expected 1 predicted: 0.031186   0
expected 1 predicted: 0.0   0
expected 0 predicted: 1.00212   1
expected 1 predicted: 0.383882   0
expected 1 predicted: 0.778516   0
expected 1 predicted: 0.466895   0
expected 1 predicted: 0.132854   1
expected 0 predicted: 0.514256   1
expected 0 predicted: 1.07643   0
expected 1 predicted: 0.146861   1
expected 0 predicted: 1.01527   1
expected 0 predicted: 0.817707   1
expected 0 predicted: 0.857912   0
expected 1 predicted: 0.0   0
error 11


## Trying k-fold splits with SVM and neural networks

In [65]:
# try each of the five folds and keep the train/test partition with the highest SVM test score

from sklearn.model_selection import KFold
kf = KFold(n_splits=5)

clf = svm.SVC(gamma=0.001, C=5)
score=-9  # below any possible accuracy, so the first fold is always kept
for train_index, test_index in kf.split(dat):
    # first/last training index and first/last test index of this fold
    print train_index[0],train_index[len(train_index)-1],test_index[0],test_index[len(test_index)-1]
    fX_train, fX_test = dat[train_index],dat[test_index]
    fy_train, fy_test = labels[train_index],labels[test_index]
    clf.fit(fX_train,fy_train)
    g=clf.score(fX_test,fy_test)*100
    if g>score:
        btrain_x,btrain_y,btest_x,btest_y=fX_train,fy_train,fX_test,fy_test
        score=g

clf.fit(btrain_x,btrain_y)
print "SVM accuracy best",clf.score(btest_x,btest_y)*100

184 919 0 183
0 919 184 367
0 919 368 551
0 919 552 735
0 735 736 919
SVM accuracy best 94.0217391304
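Keeping the fold with the highest held-out score and reporting that score selects on the test data, so 94.02% is an optimistic figure; the mean and spread over all five folds is the usual estimate. In addition, `KFold` does not shuffle by default, and the first rows shown in In [2] appear to be Cleveland records, which suggests the file is ordered by source database, so unshuffled folds may partly follow the hospitals. A numpy sketch in Python 3 (helper names are hypothetical) builds shuffled, class-stratified folds and summarizes the fold scores; `sklearn.model_selection.StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provides the same kind of split.

```python
import numpy as np


def stratified_kfold_indices(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) for shuffled folds that keep class proportions similar."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        # deal this class's shuffled rows round-robin over the folds
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


def summarize_scores(scores):
    """Mean and sample standard deviation of per-fold scores."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)


if __name__ == "__main__":
    toy_labels = [0] * 10 + [1] * 5
    for tr, te in stratified_kfold_indices(toy_labels, n_splits=5, seed=0):
        print(len(tr), te)
    print(summarize_scores([90.0, 92.0, 94.0]))
```

With the notebook's arrays, each `(tr, te)` pair would index `dat` and `labels`, the SVM would be refit per fold, and `summarize_scores` would report the mean accuracy with its spread instead of the maximum.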


In [59]:
m1=make_model('sigmoid','rmsprop',22,[220,22],btrain_x,btrain_y,btest_x,btest_y)




Train on 736 samples, validate on 184 samples
Epoch 1/600
0s - loss: 0.2488 - acc: 0.6087 - val_loss: 0.2621 - val_acc: 0.3315
Epoch 2/600
0s - loss: 0.2432 - acc: 0.6087 - val_loss: 0.2701 - val_acc: 0.3315
Epoch 3/600
0s - loss: 0.2407 - acc: 0.6087 - val_loss: 0.2766 - val_acc: 0.3315
Epoch 4/600
0s - loss: 0.2392 - acc: 0.6087 - val_loss: 0.2802 - val_acc: 0.3315
Epoch 5/600
0s - loss: 0.2385 - acc: 0.6087 - val_loss: 0.2828 - val_acc: 0.3315
Epoch 6/600
0s - loss: 0.2380 - acc: 0.6087 - val_loss: 0.2856 - val_acc: 0.3315
Epoch 7/600
0s - loss: 0.2376 - acc: 0.6087 - val_loss: 0.2868 - val_acc: 0.3315
Epoch 8/600
0s - loss: 0.2373 - acc: 0.6087 - val_loss: 0.2880 - val_acc: 0.3315
Epoch 9/600
0s - loss: 0.2370 - acc: 0.6087 - val_loss: 0.2891 - val_acc: 0.3315
Epoch 10/600
0s - loss: 0.2367 - acc: 0.6087 - val_loss: 0.2926 - val_acc: 0.3315
Epoch 11/600
0s - loss: 0.2363 - acc: 0.6087 - val_loss: 0.2916 - val_acc: 0.3315
Epoch 12/600
0s - loss: 0.2360 - acc: 0.6087 - val_loss: 0.29

In [60]:
# NOTE: m1 was trained on btrain_x, and most rows of x_test also appear in btrain_x,
# so this error rate is optimistic; btest_x/btest_y is the held-out set for this model
pr=m1.predict(x_test)
k=0  # number of misclassified test samples
for u in range(len(x_test)):
    # when the network is unsure (rounded output between 0.3 and 0.8), fall back to the SVM prediction
    if round(pr[u][0],1)>=0.3 and round(pr[u][0],1)<=0.8:
        g=svmpred[u]
    else:
        g=round(pr[u][0],0)
    if g!=y_test[u]:
        print "expected",y_test[u],"predicted:",pr[u][0]," ",svmpred[u]
        k=k+1
# k counts errors, so this is the error percentage (truncated by Python 2 integer division)
print "error",k*100/len(y_test)

expected 1 predicted: 0.12592   1
expected 1 predicted: 0.229137   0
expected 0 predicted: 0.328931   1
expected 0 predicted: 0.97175   1
expected 0 predicted: 0.944708   1
expected 1 predicted: 0.0793388   0
expected 1 predicted: 0.479475   0
expected 1 predicted: 0.667528   0
expected 0 predicted: 0.642096   1
expected 1 predicted: 0.0855344   0
expected 1 predicted: 0.744842   0
expected 0 predicted: 0.881188   0
expected 0 predicted: 0.904613   1
expected 0 predicted: 0.656147   1
expected 1 predicted: 0.203589   0
expected 1 predicted: 0.0913149   0
expected 1 predicted: 0.131262   0
expected 0 predicted: 0.93853   1
expected 1 predicted: 0.667069   0
expected 0 predicted: 0.668592   1
expected 1 predicted: 0.710148   0
expected 0 predicted: 0.87085   1
expected 0 predicted: 0.727532   1
expected 0 predicted: 0.814858   1
expected 1 predicted: 0.0872949   0
error 10


## Results: SVM accuracy increased, but the neural networks did not improve

1. SVM: 94.02% (score on the best of the five folds)
2. Logistic regression: 89%
3. Neural networks: 92.5%