## Setup

First, we import the necessary libraries.

In [30]:
from pandas import read_csv, DataFrame
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn import svm
from sklearn.neural_network import MLPClassifier

Now we will read the data and eliminate rows with invalid values.

In [3]:
# read_csv already returns a DataFrame; "?" marks missing values in this file
data = read_csv("processed.cleveland.data", header=None, na_values="?").dropna()

Note that the variable we're interested in predicting is not binary. It is 0 when the patient does not have heart disease, and ranges from 1 to 4 depending on the severity of the disease. Since we're only interested in predicting whether the patient has the disease or not, we will replace every value greater than 0 with 1.

In [4]:
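# Assign the result back instead of using inplace=True on a column selection,
# which may silently leave the DataFrame unchanged in recent pandas versions
data[13] = data[13].mask(data[13] > 0, 1)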
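As a minimal sketch of this conversion, the snippet below applies the same rule to a small made-up `Series`. The names `severity` and `has_disease` are hypothetical and not part of the data set.

```python
import pandas as pd

# Hypothetical severity labels using the same 0-4 coding as column 13
severity = pd.Series([0, 2, 1, 0, 4, 3])

# Any value above 0 means the disease is present, so it becomes 1
has_disease = (severity > 0).astype(int)

print(has_disease.tolist())  # [0, 1, 1, 0, 1, 1]
```

Comparing against 0 and casting to `int` is equivalent to the `mask` call for this coding, since 0 stays 0 and 1 through 4 all map to 1.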

Before we can start building the models, we will split the data set into a training set, containing 70% of observations, and a testing set, containing 30% of observations. The training set will be used to train the models, and the testing set will be used to test their accuracy.

In [14]:
X = data.iloc[:, 0:13]
y = data.iloc[:, 13]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)
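To illustrate what the split does, here is a small sketch on made-up data (the arrays `X_demo` and `y_demo` are assumptions for illustration only). It shows the resulting sizes and that fixing `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data set: 100 observations, 2 features, binary labels
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0, 1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)
# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1
)

print(X_tr.shape, X_te.shape)       # (70, 2) (30, 2)
print(np.array_equal(X_te, X_te2))  # True
```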

## Support vector machines

We are ready to start building the support vector machines.

### Linear

We start with a linear kernel.

A linear kernel SVM has one important hyper-parameter: *C*, or cost. We can use a grid search to try different values of *C* and select the one that produces the best cross-validated results on the training set.

In [15]:
svm_linear = GridSearchCV(svm.SVC(kernel = "linear"), [{"C": [0.01, 0.1, 0.5, 1, 10, 100]}]).fit(X_train, y_train)
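After fitting, a `GridSearchCV` object exposes the selected value through `best_params_` and its mean cross-validated score through `best_score_`. The sketch below shows this on synthetic data from `make_classification`, which stands in for the heart disease data here; the grid `c_grid` is likewise only an example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption, not the Cleveland data set)
X_demo, y_demo = make_classification(n_samples=120, n_features=5, random_state=0)

c_grid = [0.01, 0.1, 1, 10]
search = GridSearchCV(SVC(kernel="linear"), {"C": c_grid}, cv=5).fit(X_demo, y_demo)

# The value of C with the best mean cross-validated accuracy
print(search.best_params_)
# That mean accuracy, computed on the held-out folds
print(search.best_score_)
```

Inspecting `svm_linear.best_params_` in the same way would show which value of *C* was chosen for the model above.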

The SVM can now be used to predict whether a patient has heart disease based on the given variables.

In [16]:
svm_linear.predict(X_test)

array([0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       1, 0], dtype=int64)

We can also determine its accuracy on the testing set.

In [17]:
svm_linear.score(X_test, y_test)

0.8333333333333334

We obtain an accuracy of 83.33%.
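For a classifier, `score` returns the fraction of observations whose predicted label matches the true label. The following sketch computes that fraction by hand on made-up labels (`y_true` and `y_pred` are hypothetical).

```python
import numpy as np

# Hypothetical true and predicted labels
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Accuracy is the share of positions where both arrays agree
accuracy = np.mean(y_true == y_pred)

print(round(accuracy, 4))  # 0.6667
```

By the same arithmetic, 0.8333 on the 90 testing observations corresponds to 75 correct predictions.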

### Radial

Now we will build a support vector machine with a radial kernel.

We will tune *C* like we did with the linear kernel, and we will also tune *gamma*, the coefficient that controls how quickly the radial kernel decays with the distance between two observations.

In [24]:
svm_rbf = GridSearchCV(svm.SVC(kernel = "rbf"), [{"C": [0.01, 0.1, 0.5, 1, 10, 100], "gamma": [0.01, 0.1, 1, 5]}]).fit(X_train, y_train)
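To make the role of *gamma* concrete, the radial kernel can be written as K(x, z) = exp(-gamma * ||x - z||^2). The sketch below evaluates it for two made-up points over the same *gamma* values as the grid and checks the result against scikit-learn's `rbf_kernel`. The helper `rbf` and the points `x` and `z` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, z, gamma):
    """Radial kernel: exp(-gamma * squared Euclidean distance)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# Two hypothetical observations with squared distance 1 + 4 = 5
x = np.array([1.0, 2.0])
z = np.array([2.0, 4.0])

for gamma in [0.01, 0.1, 1, 5]:
    by_hand = rbf(x, z, gamma)
    reference = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1), gamma=gamma)[0, 0]
    print(gamma, by_hand, np.isclose(by_hand, reference))
```

Larger values of *gamma* drive the kernel towards 0 for all but very close points. If some features span hundreds of units, squared distances become large, so this effect may be strong unless the features are rescaled first.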

As we did before, we can use the model to make predictions using the testing set.

In [25]:
svm_rbf.predict(X_test)

array([0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0,
       1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0,
       0, 0], dtype=int64)

Finally, we test its accuracy.

In [26]:
svm_rbf.score(X_test, y_test)

0.6

We obtain an accuracy value of 60%.

### Sigmoid

Finally, we will build a sigmoid kernel SVM. Just like with the radial kernel, we will tune *C* and *gamma*.

In [27]:
svm_sigmoid = GridSearchCV(svm.SVC(kernel = "sigmoid"), [{"C": [0.01, 0.1, 0.5, 1, 10, 100], "gamma": [0.01, 0.1, 1, 5]}]).fit(X_train, y_train)

Now we can use the SVM to make predictions.

In [28]:
svm_sigmoid.predict(X_test)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0], dtype=int64)

Then we check its accuracy.

In [29]:
svm_sigmoid.score(X_test, y_test)

0.5666666666666667

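The accuracy for the sigmoid kernel SVM is 56.67%. Since the model predicts 0 for every patient, this figure is simply the proportion of patients without heart disease in the testing set.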
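A model that always outputs the same class scores exactly the share of that class in the testing set, which makes it a useful baseline. The sketch below reproduces the figure with made-up labels: 51 zeros out of 90 is inferred from the score and the length of the prediction array, and the ordering of `y_test_demo` is hypothetical.

```python
import numpy as np

# Hypothetical testing labels: 51 patients without disease, 39 with
y_test_demo = np.array([0] * 51 + [1] * 39)

# A constant predictor that outputs 0 for everyone
constant_pred = np.zeros_like(y_test_demo)

baseline = np.mean(constant_pred == y_test_demo)
print(round(baseline, 4))  # 0.5667
```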

## Multilayer perceptrons

Now we will build multilayer perceptrons using various parameters.

### Rectified linear unit function

The rectified linear unit is an activation function that returns f(x) = max(0, x).
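A minimal NumPy sketch of this function follows; `relu` is a hypothetical helper name, not something `MLPClassifier` exposes.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative inputs become 0, others pass through."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])).tolist())  # [0.0, 0.0, 0.0, 1.5]
```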

#### SGD optimizer

We will build two MLPs using the stochastic gradient descent optimizer and two different learning rates.
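To show what the learning rate controls, the sketch below runs plain (non-stochastic) gradient descent on the toy function f(w) = w^2 with both learning rates. It is a simplified illustration, not the procedure `MLPClassifier` runs internally, and `descend` is a hypothetical helper.

```python
def descend(learning_rate, steps=100, w=1.0):
    """Minimise f(w) = w**2 by repeatedly stepping against the gradient."""
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w -= learning_rate * gradient  # step size scales with the learning rate
    return w

w_fast = descend(0.01)   # larger steps, ends closer to the minimum at 0
w_slow = descend(0.001)  # smaller steps, still far from 0 after 100 updates

print(round(w_fast, 4), round(w_slow, 4))  # 0.1326 0.8186
```

On this smooth toy problem the larger rate simply converges faster. On a real network a larger rate can also overshoot, which is why both values are worth trying.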

##### SGD - learning rate 0.01

In [34]:
mlp_relu_sgd2 = MLPClassifier(activation = 'relu', solver = 'sgd', learning_rate_init = 0.01).fit(X_train, y_train)
mlp_relu_sgd2.score(X_test, y_test)

0.5666666666666667

With these parameters, the MLP has an accuracy of 56.67% on the testing data.

##### SGD - learning rate 0.001

In [35]:
mlp_relu_sgd3 = MLPClassifier(activation = 'relu', solver = 'sgd', learning_rate_init = 0.001).fit(X_train, y_train)
mlp_relu_sgd3.score(X_test, y_test)

0.6555555555555556

For these parameters the accuracy is 65.56%.
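When comparing scores like these, note that `MLPClassifier` initialises its weights randomly and shuffles the samples, so repeated runs can give different accuracies unless `random_state` is set. The sketch below, on synthetic stand-in data, checks that a fixed seed reproduces the same weights. The helper `fit_mlp`, the data and `max_iter=50` are assumptions chosen to keep the example fast.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption, not the Cleveland data set)
X_demo, y_demo = make_classification(n_samples=100, n_features=5, random_state=0)

def fit_mlp(seed):
    """Fit a small SGD-trained MLP with a fixed seed, silencing convergence warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model = MLPClassifier(solver="sgd", learning_rate_init=0.001,
                              max_iter=50, random_state=seed)
        return model.fit(X_demo, y_demo)

first, second = fit_mlp(0), fit_mlp(0)

# Identical seeds give identical weight matrices
same = all(np.allclose(a, b) for a, b in zip(first.coefs_, second.coefs_))
print(same)  # True
```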

#### Adam optimizer

Similarly, we will build two MLPs using Adam, an extension of SGD.
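As a rough sketch of how Adam extends the plain gradient step, the code below applies the standard Adam update to the toy function f(w) = w^2. It keeps running averages of the gradient and of its square, and scales each step by their ratio. The helper `adam_minimise` is hypothetical, and the values of `beta1`, `beta2` and `eps` are the commonly used defaults.

```python
import math

def adam_minimise(learning_rate, steps, w=1.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise f(w) = w**2 with the Adam update rule."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2 * w                            # gradient of w**2
        m = beta1 * m + (1 - beta1) * g      # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for early steps
        v_hat = v / (1 - beta2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return w

# On the first step the update is close to learning_rate * sign(gradient)
print(round(adam_minimise(0.01, steps=1), 4))  # 0.99
```

Because the step is normalised by the gradient's recent magnitude, the learning rate sets the approximate size of each update rather than only a multiplier on the raw gradient.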

##### Adam - learning rate 0.01

In [36]:
mlp_relu_adam2 = MLPClassifier(activation = 'relu', solver = 'adam', learning_rate_init = 0.01).fit(X_train, y_train)
mlp_relu_adam2.score(X_test, y_test)

0.6111111111111112

At the same learning rate, Adam performs somewhat better than SGD, with an accuracy of 61.11% on the testing data compared with 56.67%.

##### Adam - learning rate 0.001

In [37]:
mlp_relu_adam3 = MLPClassifier(activation = 'relu', solver = 'adam', learning_rate_init = 0.001).fit(X_train, y_train)
mlp_relu_adam3.score(X_test, y_test)

0.8333333333333334

Learning rate 0.001 produces an accuracy of 83.33%, the highest among the MLPs so far and equal to that of the linear kernel SVM.

### Hyperbolic tangent function

We will repeat this process using the hyperbolic tangent f(x) = tanh(x) activation function.
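For reference, a short sketch of the hyperbolic tangent written out from its definition and checked against `np.tanh`. The helper name `tanh` is only for illustration.

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent from its exponential definition; outputs lie in (-1, 1)."""
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

values = np.array([-2.0, 0.0, 2.0])
print(np.allclose(tanh(values), np.tanh(values)))  # True
```

Unlike the rectified linear unit, this function is bounded on both sides and keeps negative inputs negative.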

#### SGD optimizer

We use the same learning rates as before.

##### SGD - learning rate 0.01

In [38]:
mlp_tanh_sgd2 = MLPClassifier(activation = 'tanh', solver = 'sgd', learning_rate_init = 0.01).fit(X_train, y_train)
mlp_tanh_sgd2.score(X_test, y_test)

0.43333333333333335

The test accuracy is poor at 43.33%, below the 56.67% that the sigmoid kernel SVM reached by predicting 0 for every patient.

##### SGD - learning rate 0.001

In [39]:
mlp_tanh_sgd3 = MLPClassifier(activation = 'tanh', solver = 'sgd', learning_rate_init = 0.001).fit(X_train, y_train)
mlp_tanh_sgd3.score(X_test, y_test)

0.5666666666666667

This time the accuracy is 56.67%.

#### Adam optimizer

Now we use the Adam optimizer.

##### Adam - learning rate 0.01

In [40]:
mlp_tanh_adam2 = MLPClassifier(activation = 'tanh', solver = 'adam', learning_rate_init = 0.01).fit(X_train, y_train)
mlp_tanh_adam2.score(X_test, y_test)

0.6444444444444445

The testing accuracy is better than with the rectified linear unit function at the same learning rate (64.44% versus 61.11%), but the difference is likely not large enough to consider statistically significant given the testing set's size.
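A rough back-of-the-envelope check supports this. With 90 testing observations (the length of the prediction arrays shown earlier), the two accuracies correspond to 58 and 55 correct predictions. The sketch below compares their difference with the binomial standard error of a single accuracy estimate. It is a simplification that ignores the fact that both models are scored on the same observations.

```python
import math

n_test = 90            # size of the testing set
acc_tanh = 58 / n_test  # 0.6444, tanh with Adam at learning rate 0.01
acc_relu = 55 / n_test  # 0.6111, ReLU with Adam at learning rate 0.01

# Binomial standard error of one accuracy estimate
standard_error = math.sqrt(acc_tanh * (1 - acc_tanh) / n_test)
difference = acc_tanh - acc_relu

print(round(difference, 4))      # 0.0333
print(round(standard_error, 4))  # 0.0505
```

The gap of three predictions is smaller than one standard error, so it could easily be due to chance.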

##### Adam - learning rate 0.001

In [41]:
mlp_tanh_adam3 = MLPClassifier(activation = 'tanh', solver = 'adam', learning_rate_init = 0.001).fit(X_train, y_train)
mlp_tanh_adam3.score(X_test, y_test)

0.6666666666666666

With an accuracy of 66.67%, the smaller learning rate didn't perform much better.