## 2.3 Support Vector Machine
In this task you are supposed to implement several types of support vector machines using a Python library.
The classification task is to predict the sex of abalone.
* Download the Abalone dataset https://archive.ics.uci.edu/ml/datasets/abalone
* Predict the three classes: Male, Female and Infant
* Try linear, polynomial, Gaussian (RBF), and other kernels. Which works best and why?
* Choose the network architecture with care.
* Train and validate all algorithms.
* Make the necessary assumptions.
* You can be in groups of up to 3.
* Hand-in: a one-page report, to be delivered at the end of the semester.

## Get and prepare data
* Number of Instances: 4177
* Number of Attributes: 8


* Attribute information:

   Given is the attribute name, attribute type, the measurement unit and a
   brief description. In the original dataset description the number of rings
   is the value to predict, either as a continuous value or as a
   classification problem; in this task the target is Sex instead.


| Name           |  Data Type | Measure | Description                 |
|----------------|:----------:|:-------:|-----------------------------|
| Sex            |   nominal  |     .   | M, F, and I (infant)        |
| Length         | continuous |    mm   | Longest shell measurement   |
| Diameter       | continuous |    mm   | perpendicular to length     |
| Height         | continuous |    mm   | with meat in shell          |
| Whole weight   | continuous |  grams  | whole abalone               |
| Shucked weight | continuous |  grams  | weight of meat              |
| Viscera weight | continuous |  grams  | gut weight (after bleeding) |
| Shell weight   | continuous |  grams  | after being dried           |
| Rings          |   integer  |      .  | +1.5 gives the age in years |

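* Missing Attribute Values: None
* Note: in the distributed data file the continuous values have been scaled by dividing by 200, so they do not appear in the raw units listed above.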
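
The age rule in the table (rings + 1.5) can be applied directly to the ring counts. A minimal sketch, using the ring counts of the first three rows of the data file; the helper name `rings_to_age` is made up for illustration and is not part of the assignment.

```python
import pandas as pd


def rings_to_age(rings):
    """Estimate age in years from ring counts (rings + 1.5)."""
    return rings + 1.5


# Ring counts of the first three abalone in the data file
rings = pd.Series([15, 7, 9], name="rings")
ages = rings_to_age(rings)
print(ages.tolist())
```

Age is not needed for the sex classification itself, but the conversion is handy when interpreting the `rings` feature.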


In [21]:
import requests
import pandas as pd

# Download the raw data file (comma-separated, no header row)
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
r = requests.get(data_url)
r.raise_for_status()  # fail early if the download did not succeed

# Split the response into a list of rows, skipping any blank lines
dataset = [line.split(',') for line in r.text.splitlines() if line.strip()]

# No manual shuffle is needed: train_test_split shuffles the rows later on
columns = ['sex', 'length', 'diameter', 'height', 'whole-weight',
           'shucked-weight', 'viscera-weight', 'shell-weight', 'rings']
pd_data = pd.DataFrame(dataset, columns=columns)

# The values were parsed from text, so convert every column except 'sex' to numbers
numeric_columns = columns[1:]
pd_data[numeric_columns] = pd_data[numeric_columns].apply(pd.to_numeric)

pd_data.head(10)

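|   | sex | length | diameter | height | whole-weight | shucked-weight | viscera-weight | shell-weight | rings |
|---|:---:|:------:|:--------:|:------:|:------------:|:--------------:|:--------------:|:------------:|:-----:|
| 0 | M | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | M | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | F | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | M | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | I | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
| 5 | I | 0.425 | 0.3 | 0.095 | 0.3515 | 0.141 | 0.0775 | 0.12 | 8 |
| 6 | F | 0.53 | 0.415 | 0.15 | 0.7775 | 0.237 | 0.1415 | 0.33 | 20 |
| 7 | F | 0.545 | 0.425 | 0.125 | 0.768 | 0.294 | 0.1495 | 0.26 | 16 |
| 8 | M | 0.475 | 0.37 | 0.125 | 0.5095 | 0.2165 | 0.1125 | 0.165 | 9 |
| 9 | F | 0.55 | 0.44 | 0.15 | 0.8945 | 0.3145 | 0.151 | 0.32 | 19 |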


In [22]:
# Preprocessing: encode the sex labels as integers
from sklearn import preprocessing

le_sex = preprocessing.LabelEncoder()
# LabelEncoder sorts the classes alphabetically: F -> 0, I -> 1, M -> 2
pd_data['sex'] = le_sex.fit_transform(pd_data['sex'])

pd_data.head(10)

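|   | sex | length | diameter | height | whole-weight | shucked-weight | viscera-weight | shell-weight | rings |
|---|:---:|:------:|:--------:|:------:|:------------:|:--------------:|:--------------:|:------------:|:-----:|
| 0 | 2 | 0.455 | 0.365 | 0.095 | 0.514 | 0.2245 | 0.101 | 0.15 | 15 |
| 1 | 2 | 0.35 | 0.265 | 0.09 | 0.2255 | 0.0995 | 0.0485 | 0.07 | 7 |
| 2 | 0 | 0.53 | 0.42 | 0.135 | 0.677 | 0.2565 | 0.1415 | 0.21 | 9 |
| 3 | 2 | 0.44 | 0.365 | 0.125 | 0.516 | 0.2155 | 0.114 | 0.155 | 10 |
| 4 | 1 | 0.33 | 0.255 | 0.08 | 0.205 | 0.0895 | 0.0395 | 0.055 | 7 |
| 5 | 1 | 0.425 | 0.3 | 0.095 | 0.3515 | 0.141 | 0.0775 | 0.12 | 8 |
| 6 | 0 | 0.53 | 0.415 | 0.15 | 0.7775 | 0.237 | 0.1415 | 0.33 | 20 |
| 7 | 0 | 0.545 | 0.425 | 0.125 | 0.768 | 0.294 | 0.1495 | 0.26 | 16 |
| 8 | 2 | 0.475 | 0.37 | 0.125 | 0.5095 | 0.2165 | 0.1125 | 0.165 | 9 |
| 9 | 0 | 0.55 | 0.44 | 0.15 | 0.8945 | 0.3145 | 0.151 | 0.32 | 19 |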
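
The integer codes in the output come from `LabelEncoder`, which sorts the class labels before numbering them. The following sketch shows how the mapping can be inspected and reversed; it uses the sex labels of the first five rows rather than the full dataset, and the variable names are illustrative.

```python
from sklearn.preprocessing import LabelEncoder

# Sex labels of the first five rows of the data file
sex = ["M", "M", "F", "M", "I"]

encoder = LabelEncoder()
codes = encoder.fit_transform(sex)

# classes_ holds the original labels; the position of each label is its code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns codes (for example, model predictions) back into labels
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

Keeping the fitted encoder around makes it possible to report predictions as M, F, and I instead of 2, 0, and 1.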


In [23]:
# Import train_test_split function
from sklearn.model_selection import train_test_split

feature_columns = ['length', 'diameter', 'height', 'whole-weight',
                   'shucked-weight', 'viscera-weight', 'shell-weight', 'rings']
X = pd_data[feature_columns]  # Features
y = pd_data['sex']  # Class labels

# Split the dataset into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=109)

X_train.head(10)

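|      | length | diameter | height | whole-weight | shucked-weight | viscera-weight | shell-weight | rings |
|------|:------:|:--------:|:------:|:------------:|:--------------:|:--------------:|:------------:|:-----:|
| 3995 | 0.245 | 0.175 | 0.055 | 0.0785 | 0.04 | 0.018 | 0.02 | 5 |
| 637 | 0.355 | 0.26 | 0.085 | 0.1905 | 0.081 | 0.0485 | 0.055 | 6 |
| 3263 | 0.565 | 0.45 | 0.115 | 0.9085 | 0.398 | 0.197 | 0.29 | 17 |
| 4024 | 0.33 | 0.245 | 0.065 | 0.1445 | 0.058 | 0.032 | 0.0505 | 6 |
| 1573 | 0.48 | 0.37 | 0.12 | 0.536 | 0.251 | 0.114 | 0.15 | 8 |
| 2681 | 0.62 | 0.49 | 0.155 | 1.1 | 0.505 | 0.2475 | 0.31 | 9 |
| 1557 | 0.425 | 0.325 | 0.11 | 0.317 | 0.135 | 0.048 | 0.09 | 8 |
| 1657 | 0.6 | 0.48 | 0.165 | 0.9165 | 0.4135 | 0.1965 | 0.2725 | 9 |
| 2028 | 0.57 | 0.435 | 0.15 | 0.8295 | 0.3875 | 0.156 | 0.245 | 10 |
| 2055 | 0.465 | 0.355 | 0.09 | 0.4325 | 0.2005 | 0.074 | 0.1275 | 9 |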
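
The split above is purely random, so the class proportions in the training and test sets can drift slightly from those of the full dataset. One possible variation, not used in this notebook, is a stratified split via the `stratify` argument of `train_test_split`. The sketch below uses placeholder data (random features, three balanced classes) so that it runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 300 samples, 8 features, 3 balanced classes (not the Abalone data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 8))
y_demo = np.repeat([0, 1, 2], 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=109, stratify=y_demo
)

# With stratify, each class keeps its share in both subsets
print("train counts:", np.bincount(y_tr))
print("test counts: ", np.bincount(y_te))
```

For the notebook's data the equivalent call would pass `stratify=y`.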


## Training

In [24]:
# Import support vector machine
from sklearn import svm

C = 1.0         # regularization parameter (smaller C means stronger regularization)
gamma = 'auto'  # kernel coefficient; 'auto' uses 1 / n_features

# Only the RBF kernel is trained here; uncomment the other lines to compare kernels
#svm_linear = svm.SVC(kernel='linear', C=C, gamma=gamma).fit(X_train, y_train)
#svm_poly = svm.SVC(kernel='poly', C=C, gamma=gamma).fit(X_train, y_train)
svm_rbf = svm.SVC(kernel='rbf', C=C, gamma=gamma).fit(X_train, y_train)
#svm_sigmoid = svm.SVC(kernel='sigmoid', C=C, gamma=gamma).fit(X_train, y_train)

# Predict the classes of the held-out test set
#y_pred_linear = svm_linear.predict(X_test)
#y_pred_poly = svm_poly.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)
#y_pred_sigmoid = svm_sigmoid.predict(X_test)
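
The task asks for a comparison of several kernels. One way to do that, sketched here under assumptions, is to wrap each `SVC` in a pipeline with feature standardization and score it with cross-validation. The helper `compare_kernels` and the synthetic stand-in data are illustrative and not part of the original notebook; standardization is an added step that the notebook itself does not perform.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_kernels(X, y, kernels=("linear", "poly", "rbf", "sigmoid"), cv=5):
    """Return the mean cross-validated accuracy for each kernel."""
    scores = {}
    for kernel in kernels:
        # The scaler is refitted inside every fold, so the held-out fold
        # never influences the scaling parameters
        model = make_pipeline(
            StandardScaler(),
            SVC(kernel=kernel, C=1.0, gamma="auto"),
        )
        scores[kernel] = cross_val_score(model, X, y, cv=cv).mean()
    return scores


# Synthetic stand-in data with 8 features and 3 classes (not the Abalone data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)

scores = compare_kernels(X_demo, y_demo)
for kernel, score in scores.items():
    print(f"{kernel}: {score:.3f}")
```

With the notebook's data the call would be `compare_kernels(X, y)`. Cross-validation gives a more stable comparison than a single train/test split, at the cost of fitting each model several times.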

## Accuracy

In [25]:
# Import scikit-learn metrics module for accuracy calculation
from sklearn import metrics

# Model accuracy: how often is the classifier correct?
#print("Accuracy (linear):", metrics.accuracy_score(y_test, y_pred_linear))
#print("Accuracy (polynomial):", metrics.accuracy_score(y_test, y_pred_poly))
print("Accuracy (radial basis function):", metrics.accuracy_score(y_test, y_pred_rbf))
#print("Accuracy (sigmoid):", metrics.accuracy_score(y_test, y_pred_sigmoid))


Accuracy (radial basis function): 0.538277511962
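
A single accuracy figure does not show which classes are confused with each other. A confusion matrix breaks the result down per class. The sketch below uses a handful of hypothetical labels (0 = F, 1 = I, 2 = M) purely to show the mechanics; the numbers are not results from this notebook.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels (0 = F, 1 = I, 2 = M), not notebook results
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_hat = [2, 0, 2, 1, 1, 1, 2, 0, 2, 2]

accuracy = accuracy_score(y_true, y_hat)
print("Accuracy:", accuracy)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat, labels=[0, 1, 2])
print(cm)
```

For the notebook's model the equivalent call would be `confusion_matrix(y_test, y_pred_rbf)`.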


Adapted from the tutorial at: https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python