## Predicting lawn mower ownership using SVM classification
We will predict the `Ownership` column of the Riding Mowers dataset.


### 1. Setup

In [64]:
# Common imports
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

np.random.seed(1)

In [65]:
import os
os.chdir('D:/BAIS/2nd Sem/DSP')
print(os.getcwd())

D:\BAIS\2nd Sem\DSP


### 2. Load the data

In [66]:
# Load the data
mower=pd.read_csv('./data/RidingMowers.csv')
mower.head(3)

Unnamed: 0,Income,Lot_Size,Ownership
0,60.0,18.4,Owner
1,85.5,16.8,Owner
2,64.8,21.6,Owner


In [67]:
# generate a basic summary of the data
mower.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Income     24 non-null     float64
 1   Lot_Size   24 non-null     float64
 2   Ownership  24 non-null     object 
dtypes: float64(2), object(1)
memory usage: 704.0+ bytes


In [68]:
# checking the null values
mower.isna().sum()

Income       0
Lot_Size     0
Ownership    0
dtype: int64

In [69]:
# create a list of the categorical variables
category_var_list = list(mower.select_dtypes(include='object').columns)
category_var_list

['Ownership']

In [70]:
# explore the categorical variable values - often there are typos here that need to be fixed.
for cat in category_var_list:
    print(f"Category: {cat} Values: {mower[cat].unique()}")

Category: Ownership Values: ['Owner' 'Nonowner']


### 3. Summarize the findings from our initial evaluation of the data

* We have 1 categorical variable (`Ownership`)
* We have 0 variables with missing values
* There doesn't seem to be a problem with the categorical class names.

In [71]:
mower.shape

(24, 3)

### 4. Split data (train/test)

In [73]:
# split the data into training and test sets (30% held out for testing)
train_df, test_df = train_test_split(mower, test_size=0.3)
target = 'Ownership'
predictors = list(mower.columns)
predictors.remove(target)
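
The split above relies on the global NumPy seed set in the setup cell. As a minimal sketch, assuming a stand-in DataFrame `demo` filled with randomly generated values (not the real mower data), the same call can also take `random_state` and `stratify`, so the split is reproducible on its own and keeps the class proportions equal in both parts. Both are standard `train_test_split` parameters; whether stratification is appropriate for this dataset is a judgment call.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 20 rows of random values, NOT the RidingMowers data
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Income": rng.uniform(30, 110, size=20),
    "Lot_Size": rng.uniform(14, 24, size=20),
    "Ownership": ["Owner"] * 10 + ["Nonowner"] * 10,
})

# random_state fixes the shuffle; stratify keeps the Owner/Nonowner ratio in both parts
train_demo, test_demo = train_test_split(
    demo, test_size=0.3, random_state=1, stratify=demo["Ownership"]
)

print(train_demo["Ownership"].value_counts().to_dict())
print(test_demo["Ownership"].value_counts().to_dict())
```

With a dataset as small as this one, a stratified split avoids a test set that happens to contain mostly one class.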

### 5. Standardize numeric values

In [74]:
# create a standard scaler
scaler = preprocessing.StandardScaler()
cols_to_stdize = ['Income', 'Lot_Size']

# fit the scaler on the training set only and standardize the training predictors
train_df[cols_to_stdize] = scaler.fit_transform(train_df[cols_to_stdize])

# standardize the test predictors with the training-set mean and standard deviation
test_df[cols_to_stdize] = scaler.transform(test_df[cols_to_stdize])
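
To make the standardization step concrete: `StandardScaler` subtracts the column mean and divides by the column standard deviation, both estimated from the data passed to `fit`. The sketch below checks this by hand on the three rows printed by `mower.head(3)`. The `new_row` values and the names `train_part` and `demo_scaler` are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Income and Lot_Size for the three rows printed by mower.head(3)
train_part = np.array([[60.0, 18.4], [85.5, 16.8], [64.8, 21.6]])
# Hypothetical unseen observation (made-up values)
new_row = np.array([[70.0, 19.0]])

demo_scaler = StandardScaler().fit(train_part)

# Manual z-score using the training mean and population standard deviation (ddof=0)
manual = (new_row - train_part.mean(axis=0)) / train_part.std(axis=0)

# The scaler's output should agree with the manual calculation
print(np.allclose(demo_scaler.transform(new_row), manual))
```

Because the mean and standard deviation come from the training rows only, no information from the test set leaks into the transformation.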


In [75]:
train_X = train_df[predictors]
train_y = train_df[target] # train_y is a Series
test_X = test_df[predictors]
test_y = test_df[target] # test_y is a Series

### 6. Model the data
First, we will create a DataFrame to hold the results of all our models.

In [76]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### 6.1 Fit an SVM classification model using a linear kernel

In [77]:
svm_lin_model = SVC(kernel="linear")
linear = svm_lin_model.fit(train_X, np.ravel(train_y))

In [78]:
model_preds = svm_lin_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"linear svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,linear svm,0.625,0.75,0.6,0.666667
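
The indexing into `c_matrix` works because `confusion_matrix` sorts string labels alphabetically, so row/column 0 is `Nonowner` and row/column 1 is `Owner` (the positive class). As a cross-check, the sketch below recomputes the same metrics with scikit-learn's scoring functions. The label vectors are hypothetical: they are constructed so that the counts (TP=3, FP=1, FN=2, TN=2) reproduce the linear SVM row reported above, and they are not the actual test rows.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels chosen to match the reported counts, not the real test set
y_true = ["Owner"] * 5 + ["Nonowner"] * 3
y_pred = ["Owner"] * 3 + ["Nonowner"] * 2 + ["Owner"] + ["Nonowner"] * 2

# Fix the label order explicitly: rows are true classes, columns are predictions
cm = confusion_matrix(y_true, y_pred, labels=["Nonowner", "Owner"])
tn, fp, fn, tp = (int(v) for v in cm.ravel())

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, pos_label="Owner")
recall = recall_score(y_true, y_pred, pos_label="Owner")
f1 = f1_score(y_true, y_pred, pos_label="Owner")

print(tn, fp, fn, tp)
print(accuracy, precision, recall, round(f1, 6))
```

Passing `pos_label="Owner"` is required here because the labels are strings rather than 0/1.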


### 6.2 Fit an SVM classification model using an RBF kernel

In [79]:
svm_rbf_model = SVC(kernel="rbf", C=10, gamma='scale')
rbf = svm_rbf_model.fit(train_X, np.ravel(train_y))

In [80]:
model_preds = svm_rbf_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"rbf svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,linear svm,0.625,0.75,0.6,0.666667
0,rbf svm,0.75,1.0,0.6,0.75


### 6.3 Fit an SVM classification model using a polynomial kernel

In [81]:
svm_poly_model = SVC(kernel="poly", degree=3, coef0=1, C=10)
poly = svm_poly_model.fit(train_X, np.ravel(train_y))

In [82]:
model_preds = svm_poly_model.predict(test_X)
c_matrix = confusion_matrix(test_y, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"poly svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,linear svm,0.625,0.75,0.6,0.666667
0,rbf svm,0.75,1.0,0.6,0.75
0,poly svm,0.625,1.0,0.4,0.571429
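
The three evaluation cells in this section repeat the same arithmetic. One way to factor it out, sketched here under assumptions, is a small helper that returns a one-row DataFrame per model. The function name `metrics_row` and its parameters are hypothetical, and the label vectors are made up so that their counts match the rbf row (TP=3, FP=0, FN=2, TN=3) and the poly row (TP=2, FP=0, FN=3, TN=3) shown above.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix


def metrics_row(name, y_true, y_pred, positive="Owner", negative="Nonowner"):
    """Hypothetical helper: one-row DataFrame of metrics for a binary classifier."""
    c_matrix = confusion_matrix(y_true, y_pred, labels=[negative, positive])
    TN, FP, FN, TP = (int(v) for v in c_matrix.ravel())
    return pd.DataFrame({
        "model": [name],
        "Accuracy": [(TP + TN) / (TP + TN + FP + FN)],
        # Raises ZeroDivisionError if no observation is predicted positive
        "Precision": [TP / (TP + FP)],
        "Recall": [TP / (TP + FN)],
        "F1": [2 * TP / (2 * TP + FP + FN)],
    })


# Made-up labels whose counts match the reported rows, not the real test set
y_true = ["Owner"] * 5 + ["Nonowner"] * 3
rbf_pred = ["Owner"] * 3 + ["Nonowner"] * 2 + ["Nonowner"] * 3
poly_pred = ["Owner"] * 2 + ["Nonowner"] * 3 + ["Nonowner"] * 3

demo_performance = pd.concat(
    [metrics_row("rbf svm", y_true, rbf_pred),
     metrics_row("poly svm", y_true, poly_pred)],
    ignore_index=True,
)
print(demo_performance)
```

Using `ignore_index=True` gives each model its own row index instead of repeating `0`.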


### 7. Summary

Sorted by accuracy in ascending order (the best model is listed last):

In [83]:
performance.sort_values(by=['Accuracy'])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,linear svm,0.625,0.75,0.6,0.666667
0,poly svm,0.625,1.0,0.4,0.571429
0,rbf svm,0.75,1.0,0.6,0.75


Sorted by precision in ascending order (the best models are listed last):

In [84]:
performance.sort_values(by=['Precision'])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,linear svm,0.625,0.75,0.6,0.666667
0,rbf svm,0.75,1.0,0.6,0.75
0,poly svm,0.625,1.0,0.4,0.571429


Sorted by recall in ascending order (the best models are listed last):


In [85]:
performance.sort_values(by=['Recall'])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,poly svm,0.625,1.0,0.4,0.571429
0,linear svm,0.625,0.75,0.6,0.666667
0,rbf svm,0.75,1.0,0.6,0.75


Sorted by F1 score in ascending order (the best model is listed last):


In [86]:
performance.sort_values(by=['F1'])

Unnamed: 0,model,Accuracy,Precision,Recall,F1
0,poly svm,0.625,1.0,0.4,0.571429
0,linear svm,0.625,0.75,0.6,0.666667
0,rbf svm,0.75,1.0,0.6,0.75


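### Conclusion
#### On this test set, the SVM with the RBF kernel has the highest accuracy (0.75) and F1 score (0.75). It ties the polynomial kernel on precision (1.0) and the linear kernel on recall (0.6), so no other model beats it on any metric. Hence we select the RBF-kernel SVM as the best model for predicting lawn mower ownership. Note that the test set holds only 8 of the 24 observations, so the accuracy gap between models amounts to a single prediction.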

### 8. Save the model to disk

In [93]:
import pickle

# save the fitted RBF model; the with-block closes the file handle
with open('./data/owner_model.pkl', "wb") as f:
    pickle.dump(rbf, f)
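
The saved model was trained on standardized predictors, so anyone loading it also needs the fitted scaler to prepare new observations. One option, which is an assumption and not part of this notebook, is to bundle the scaler and the classifier in a scikit-learn pipeline and pickle that single object. The sketch below uses illustrative data, a temporary file path, and hypothetical names (`pipeline`, `loaded`, `new_obs`).

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative unscaled (Income, Lot_Size) rows; for demonstration only
X = np.array([[60.0, 18.4], [85.5, 16.8], [64.8, 21.6],
              [33.0, 18.8], [51.0, 14.0], [43.2, 20.4]])
y = np.array(["Owner"] * 3 + ["Nonowner"] * 3)

# Bundle the scaler and the classifier so one pickle holds both
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
pipeline.fit(X, y)

# Hypothetical temporary path, used here instead of ./data/owner_model.pkl
path = os.path.join(tempfile.mkdtemp(), "owner_pipeline.pkl")
with open(path, "wb") as f:
    pickle.dump(pipeline, f)

# Load the pipeline back and predict on a made-up, unscaled observation
with open(path, "rb") as f:
    loaded = pickle.load(f)

new_obs = np.array([[70.0, 19.0]])
print(loaded.predict(new_obs))
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that wrote them.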