<a href="https://colab.research.google.com/github/nackjaylor/sydney-innovation-program/blob/main/sip_classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sydney Innovation Program - SVM Exercise

In this exercise, we use a support vector machine (SVM) to perform a classification task. Specifically, we use the Breast Cancer Wisconsin (Diagnostic) dataset to predict whether a breast tumour is malignant or benign from a range of measured attributes.

This first cell imports the required libraries:
1. pandas: tabular data structures (DataFrames)
2. NumPy: numerical computing library
3. Matplotlib: graphing library
4. seaborn: statistical visualisation library

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

From scikit-learn's datasets module, import the function that loads the breast cancer dataset.

In [None]:
from sklearn.datasets import load_breast_cancer

We want to take a look at what this dataset contains, so we print a description.

In [None]:
cancer_dset = load_breast_cancer()
print(cancer_dset.DESCR)

In [None]:
cancer_dset.keys()
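
The keys listed above include the feature matrix, the integer labels, and the label names. As a minimal sketch, the cell below checks the dataset size and how the label values map to class names, since that mapping matters when reading the confusion matrix later. The variable names `label_names` and `class_counts` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer_dset = load_breast_cancer()

# target_names is indexed by label value: position 0 names label 0, position 1 names label 1.
label_names = {label: str(name) for label, name in enumerate(cancer_dset.target_names)}

# Count how many samples fall into each class.
labels, counts = np.unique(cancer_dset.target, return_counts=True)
class_counts = {label_names[int(label)]: int(count) for label, count in zip(labels, counts)}

print(cancer_dset.data.shape)  # (samples, features)
print(label_names)
print(class_counts)
```

Note that label 0 is malignant and label 1 is benign, which is the opposite of what many people expect.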

In [None]:
# Stack the feature matrix and the target column side by side, then name the columns.
df_feat = pd.DataFrame(
    data=np.c_[cancer_dset['data'], cancer_dset['target']],
    columns=np.append(cancer_dset['feature_names'], ['target']),
)
df_feat.head(10)
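
As an alternative to assembling the DataFrame by hand, the loader can return pandas objects directly. This is a sketch that assumes a scikit-learn version with the `as_frame` option (0.23 or later); `cancer_frame` and `df_alt` are illustrative names.

```python
from sklearn.datasets import load_breast_cancer

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays.
cancer_frame = load_breast_cancer(as_frame=True)

# .frame holds the 30 feature columns plus a 'target' column.
df_alt = cancer_frame.frame
print(df_alt.shape)
print(df_alt['target'].dtype)
```

One difference from the hand-built version: here the target column keeps an integer dtype, whereas stacking with `np.c_` converts it to float along with the features.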

In [None]:
df_feat.info()
cancer_dset.target_names

In [None]:
sns.pairplot(df_feat, hue = 'target', vars = ['mean radius', 'mean texture', 'mean perimeter', 'mean area', 'mean smoothness', 'mean symmetry'])

In [None]:
plt.figure(figsize=(25,16))
sns.heatmap(df_feat.corr(), annot = True)
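
With 31 columns, the annotated heatmap can be hard to read. A possible complement, sketched below, is to pull out only the correlations with the target and rank them by magnitude. The names `target_corr` and `ranked` are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer

cancer_dset = load_breast_cancer()
df_feat = pd.DataFrame(
    data=np.c_[cancer_dset['data'], cancer_dset['target']],
    columns=np.append(cancer_dset['feature_names'], ['target']),
)

# Correlation of every feature with the target (the target's self-correlation is dropped).
target_corr = df_feat.corr()['target'].drop('target')

# Reorder so the strongest correlations, by absolute value, come first.
ranked = target_corr.reindex(target_corr.abs().sort_values(ascending=False).index)
print(ranked.head(5))
```

Because benign is encoded as 1 and malignant as 0, a negative correlation means larger values of that feature tend to go with malignant samples.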

## Training an SVM

Now that we've taken a look at the data, we can train an SVM!

In [None]:
x = df_feat.drop(['target'],axis=1)
x.head()

In [None]:
y = df_feat['target']
y.head()

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size = 0.15, random_state = 16)

In [None]:
print(f'{train_x.shape} for training')
print(f'{test_x.shape} for testing')
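
The split above is purely random, so the proportion of malignant and benign samples can differ slightly between the two parts. As a hedged sketch, passing `stratify` (an addition, not used in the original cell) keeps the class proportions close in both parts. This version loads the arrays directly with `return_X_y=True` so that it is self-contained.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Same test size and seed as the notebook, plus stratify=y (an addition for illustration).
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.15, random_state=16, stratify=y
)

# Fraction of benign samples (label 1) overall and in each part.
print(f'overall: {y.mean():.3f}')
print(f'train:   {train_y.mean():.3f}')
print(f'test:    {test_y.mean():.3f}')
```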


In [None]:
from sklearn.svm import SVC

### FOR YOU TO DO
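Experiment with the kernel! Choose from linear, poly, rbf, or sigmoid. (SVC also accepts precomputed, but that option expects a square kernel matrix rather than the feature table, so it will not work with this data as-is.)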

In [None]:
svm_model = SVC(kernel='linear')
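
One way to approach the kernel exercise is to loop over the kernels and compare test accuracy. The sketch below does that with default hyperparameters. It also standardises the features first with `StandardScaler`, which is an addition to the notebook's approach: SVMs are sensitive to feature ranges, and the columns here differ by orders of magnitude. The `scores` dictionary is an illustrative name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.15, random_state=16
)

scores = {}
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    # The scaler is fitted on the training data only, then applied to the test data.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(train_x, train_y)
    scores[kernel] = model.score(test_x, test_y)  # mean accuracy on the test set

for kernel, acc in scores.items():
    print(f'{kernel:>8}: {acc:.3f}')
```

A single train/test split gives a noisy comparison, so small differences between kernels should not be over-interpreted.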

In [None]:
svm_model.fit(train_x, train_y)

In [None]:
predict_y = svm_model.predict(test_x)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
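# In this dataset label 0 is malignant and label 1 is benign, so list the labels
# in that order to match the row and column names used below.
cm = confusion_matrix(test_y, predict_y, labels=[0, 1])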

In [None]:
confusion = pd.DataFrame(cm, index=['is_malignant', 'is_benign'], columns=['predicted_malignant', 'predicted_benign'])
sns.heatmap(confusion, annot=True)

In [None]:
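# Show class names (malignant, benign) instead of the numeric labels in the report.
print(classification_report(test_y, predict_y, target_names=cancer_dset.target_names))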
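
The precision and recall figures in the report come straight from the confusion matrix counts. The sketch below works through the arithmetic on a small set of made-up labels (toy data for illustration only, not results from this notebook), treating malignant as the class of interest.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration only: 0 = malignant, 1 = benign.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes, both in the order [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
true_mal, missed_mal = cm[0]  # malignant samples: predicted malignant / predicted benign
false_mal, true_ben = cm[1]   # benign samples: predicted malignant / predicted benign

# Recall: of the truly malignant samples, what fraction was caught?
recall_malignant = true_mal / (true_mal + missed_mal)
# Precision: of the samples predicted malignant, what fraction really was?
precision_malignant = true_mal / (true_mal + false_mal)
# Accuracy: fraction of all samples classified correctly.
accuracy = (true_mal + true_ben) / cm.sum()

print(cm)
print(recall_malignant, precision_malignant, accuracy)
```

In a diagnostic setting, recall for the malignant class is often the figure to watch, since a missed malignant case is usually the more costly error.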