Introduction to SVM

Use a support vector machine (SVM) to build and train a model on human records, and classify each sample as benign (harmless) or malignant (harmful). An SVM works by mapping the data to a high-dimensional feature space so that data points can be categorized even when they are not linearly separable in the original space. (The kernel function of the SVM classifier performs this mapping.) In the transformed space, the separator between the categories can be drawn as a hyperplane.
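The idea of mapping data to a higher-dimensional space can be illustrated without any library. The sketch below is only an assumption-laden toy: the 1-D points, the labels, the helper separable_by_threshold and the feature map phi are all hypothetical and are not part of this notebook's dataset. A real kernel achieves a similar effect implicitly, without computing the mapped points.

```python
# Toy 1-D data: class 1 sits in the middle, class 0 on both sides,
# so no single threshold on x separates the two classes.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1, 1, 0, 0]

def separable_by_threshold(values, labels):
    """Return True if one threshold splits the two classes perfectly."""
    sorted_labels = [label for _, label in sorted(zip(values, labels))]
    # In 1-D, perfect separation means the sorted labels change at most once.
    changes = sum(a != b for a, b in zip(sorted_labels, sorted_labels[1:]))
    return changes <= 1

def phi(x):
    """Hypothetical feature map: lift x to the 2-D point (x, x**2)."""
    return (x, x * x)

lifted = [phi(x) for x in xs]
# In the lifted space the line x2 = 2.5 (a hyperplane) separates the classes.
predicted = [1 if x2 < 2.5 else 0 for _, x2 in lifted]

print(separable_by_threshold(xs, ys))  # is the original 1-D data separable?
print(predicted == ys)                 # does the hyperplane classify every point?
```

The first check fails in one dimension, while the second succeeds after the lift, which is the situation the kernel function is meant to handle.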

Necessary Imports

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load Data from a CSV (Comma-Separated Values) File

In [2]:
cell_df=pd.read_csv('lung_cancer_examples.csv')

In [3]:
cell_df.shape

(59, 7)

In [4]:
cell_df.size

413

In [5]:
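# Without parentheses this displays the bound method (its repr embeds the DataFrame); cell_df.count() returns per-column non-null counts
cell_df.count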

<bound method DataFrame.count of            Name      Surname  Age  Smokes  AreaQ  Alkhol  Result
0          John         Wick   35       3      5       4       1
1          John  Constantine   27      20      2       5       1
2        Camela     Anderson   30       0      5       2       0
3          Alex       Telles   28       0      8       1       0
4         Diego     Maradona   68       4      5       6       1
5     Cristiano      Ronaldo   34       0     10       0       0
6        Mihail          Tal   58      15     10       0       0
7         Kathy        Bates   22      12      5       2       0
8        Nicole       Kidman   45       2      6       0       0
9           Ray      Milland   52      18      4       5       1
10      Fredric        March   33       4      8       0       0
11          Yul      Brynner   18      10      6       3       0
12         Joan     Crawford   25       2      5       1       0
13         Jane        Wyman   28      20      2       8 

In [7]:
cell_df['Name'].value_counts()

Katharine      4
Sidney         2
Maggie         2
Gregory        2
John           2
Jane           2
Glenda         2
Jack           2
Ernest         1
Ray            1
Richard        1
Jessica        1
Peter          1
Charlize       1
Marlon         1
Diane          1
Anna           1
Camela         1
Sissy          1
Alec           1
Faye           1
Gene           1
John           1
Henry          1
Barbra         1
Dustin         1
Halle          1
Mihail         1
Fredric        1
Alex           1
Charlton       1
Barbra         1
Cristiano      1
Rex            1
Joan           1
Diego          1
Ellen          1
Robert         1
Nicole         1
Maximilian     1
Yul            1
Sally          1
Jane           1
Nicole         1
Rod            1
Paul           1
Kathy          1
Lee            1
Gwyneth        1
Name: Name, dtype: int64

In [8]:
cell_df['Result'].value_counts()

0    31
1    28
Name: Result, dtype: int64

Distribution of Classes

In [9]:
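# Result is stored as int64, so compare against the integers 0 and 1 (a string such as '0' matches no rows)
benign_df=cell_df[cell_df['Result']==0][0:200]
malignant_df=cell_df[cell_df['Result']==1][0:200]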
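The Result column holds the integers 0 and 1, as the value_counts output shows, so a boolean mask has to compare against integers rather than strings. The following is a minimal sketch of that pitfall; demo_df and its values are a hypothetical stand-in, not rows read from the CSV.

```python
import pandas as pd

# Small stand-in frame; the values are illustrative only.
demo_df = pd.DataFrame({"Age": [35, 27, 30, 28], "Result": [1, 1, 0, 0]})

# Comparing an integer column with a string matches no rows.
wrong = demo_df[demo_df["Result"] == "0"]
# Comparing with an integer selects the intended rows.
right = demo_df[demo_df["Result"] == 0]

print(len(wrong), len(right))
```

An empty selection raises no error, so checking the length of each subset is a cheap way to catch this kind of mismatch.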

Identifying Unwanted Columns

In [10]:
cell_df.dtypes

Name       object
Surname    object
Age         int64
Smokes      int64
AreaQ       int64
Alkhol      int64
Result      int64
dtype: object

In [11]:
cell_df.columns

Index(['Name', 'Surname', 'Age', 'Smokes', 'AreaQ', 'Alkhol', 'Result'], dtype='object')

In [12]:
feature_df=cell_df[['Age', 'Smokes', 'AreaQ', 'Alkhol']]

In [13]:
x=np.asarray(feature_df)

In [14]:
y=np.asarray(cell_df['Result'])

In [15]:
x[0:5]

array([[35,  3,  5,  4],
       [27, 20,  2,  5],
       [30,  0,  5,  2],
       [28,  0,  8,  1],
       [68,  4,  5,  6]], dtype=int64)

In [16]:
y[0:5]

array([1, 1, 0, 0, 1], dtype=int64)

Divide the Data as Train/Test dataset

In [17]:
from sklearn.model_selection import train_test_split

In [18]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=4)

In [19]:
#47 rows and 4 columns
x_train.shape

(47, 4)

In [20]:
#47 rows
y_train.shape

(47,)

In [22]:
#12 rows and 4 columns
x_test.shape

(12, 4)

In [53]:
#12 rows
y_test.shape

(12,)
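The split sizes follow from the row count and test_size. When only test_size is given as a fraction, scikit-learn rounds the test share up and assigns the remaining rows to training. A short sketch of that arithmetic, using the 59 rows reported by cell_df.shape (the variable names are illustrative):

```python
import math

n_samples = 59    # rows in cell_df, from cell_df.shape
test_size = 0.2   # fraction passed to train_test_split

n_test = math.ceil(test_size * n_samples)  # test share is rounded up
n_train = n_samples - n_test               # the rest goes to training

print(n_train, n_test)  # 47 12
```

These values agree with the (47, 4) and (12, 4) shapes shown for x_train and x_test.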

Modeling (SVM with Scikit-learn)

In [54]:
from sklearn import svm
# gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels; it is ignored for kernel='linear'
classifier=svm.SVC(kernel='linear',gamma='auto',C=2)

In [55]:
classifier.fit(x_train,y_train)

SVC(C=2, gamma='auto', kernel='linear')

In [56]:
y_predict=classifier.predict(x_test)
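With a linear kernel, prediction reduces to checking which side of the hyperplane w . x + b = 0 a point falls on. The sketch below uses hypothetical toy data (not the CSV) to show that the fitted coef_ and intercept_ reproduce decision_function and predict; it assumes a binary problem, as in this notebook.

```python
import numpy as np
from sklearn import svm

# Toy, linearly separable 2-D data (illustrative values, not from the CSV).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=2)
clf.fit(X, y)

# For a linear kernel the separating hyperplane is w . x + b = 0.
w = clf.coef_[0]
b = clf.intercept_[0]
scores = X @ w + b

# A positive score maps to clf.classes_[1], a negative score to clf.classes_[0].
manual_pred = clf.classes_[(scores > 0).astype(int)]

print(np.allclose(scores, clf.decision_function(X)))
print(np.array_equal(manual_pred, clf.predict(X)))
```

coef_ is only available for the linear kernel; with other kernels the hyperplane lives in the implicit feature space and has no explicit weight vector.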

Evaluation (Results)

In [57]:
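from sklearn.metrics import classification_report
# NOTE: the output shown below (classes B/M, support 114) does not match the 0/1 labels and the 12-row test set used in this notebook
print(classification_report(y_test,y_predict))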

              precision    recall  f1-score   support

           B       0.99      0.93      0.95        80
           M       0.85      0.97      0.90        34

    accuracy                           0.94       114
   macro avg       0.92      0.95      0.93       114
weighted avg       0.94      0.94      0.94       114
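Each figure in a classification report can be derived from per-class counts of correct and incorrect predictions. The notebook does not print a confusion matrix, so the counts below are an assumption: they were chosen because they reproduce the rounded figures in the report above, and other counts might do so as well. The helper prf is hypothetical.

```python
# Assumed confusion counts, consistent with the rounded report figures.
b_as_b, b_as_m = 74, 6   # true B: predicted B / predicted M (support 80)
m_as_m, m_as_b = 33, 1   # true M: predicted M / predicted B (support 34)

def prf(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p_b, r_b, f_b = prf(tp=b_as_b, fp=m_as_b, fn=b_as_m)
p_m, r_m, f_m = prf(tp=m_as_m, fp=b_as_m, fn=m_as_b)
accuracy = (b_as_b + m_as_m) / (b_as_b + b_as_m + m_as_m + m_as_b)

print(f"B {p_b:.2f} {r_b:.2f} {f_b:.2f}")
print(f"M {p_m:.2f} {r_m:.2f} {f_m:.2f}")
print(f"accuracy {accuracy:.2f}")
```

The macro average is the plain mean of the two per-class values, while the weighted average weights each class by its support.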

