## Support Vector Classification  
#### Resources  
- [Pierian Training](https://pieriantraining.com/machine-learning-in-python-support-vector-machine-classification/)
- [Data TechNotes](https://www.datatechnotes.com/2020/06/classification-example-with-svc-in-python.html)

In [1]:
import pandas as pd
import numpy as np

In [2]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

In [3]:
df = pd.read_csv('water_potability_AdityaKadiwal.csv')

In [4]:
# remove_null = df.dropna()
# Fill missing values in each column with that column's mean
fillWithMean = df.apply(lambda col: col.fillna(col.mean()), axis=0)

X = fillWithMean.drop(columns=['Potability'])
y = fillWithMean['Potability']

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=1/3,
    random_state=0)

In [5]:
# X = remove_null.drop(columns=['Potability'])
# y = remove_null['Potability']

In [6]:
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [7]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [8]:
svc = SVC()

In [9]:
svc.fit(X_train, y_train)
score = svc.score(X_train, y_train)
test_score = svc.score(X_test, y_test)
print("Score: ", score)
print("Test Score: ", test_score)

Score:  0.733058608058608
Test Score:  0.6785714285714286


In [10]:
# 5-fold cross-validation (scikit-learn's default) on the scaled training set
cv_scores = cross_val_score(svc, X_train, y_train)
print("CV average score: %.2f" % cv_scores.mean())

CV average score: 0.67
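
The cross-validation above runs on data that was mean-filled and scaled before the folds were made, so each fold's held-out rows have already influenced the preprocessing statistics. Below is a minimal sketch of one way to keep those steps inside each fold, using scikit-learn's `make_pipeline` and `SimpleImputer`. The data from `make_classification`, the injected missing values, and the names `X_demo`, `y_demo`, `pipe`, and `pipe_scores` are illustrative assumptions, not the water potability file or the results above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 300 rows, 9 numeric features.
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)

# Blank out roughly 5% of the values to mimic missing measurements.
rng = np.random.default_rng(0)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan

# Imputation and scaling are re-fit on the training part of every fold.
pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), SVC())

pipe_scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print("CV average score: %.2f" % pipe_scores.mean())
```

Whether this changes the score noticeably on the real data is something I would have to check; the point is only that the held-out rows never touch the imputer or scaler.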


In [11]:
y_pred = svc.predict(X_test)

In [12]:
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[625  57]
 [294 116]]


### Understanding the confusion matrix
[Confusion Matrix: How To Use It & Interpret Results [Examples]](https://www.v7labs.com/blog/confusion-matrix-guide)
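
As a reading aid, here is a small sketch that labels the four cells of the matrix printed above. In scikit-learn's layout the rows are true labels and the columns are predicted labels, so `cm.ravel()` on a two-class matrix returns the counts in the order TN, FP, FN, TP. The counts are copied from the output above; the variable names are mine.

```python
# Counts copied from the confusion matrix printed above.
cm_counts = [[625, 57],
             [294, 116]]

# Rows are true labels (0, 1); columns are predicted labels (0, 1).
(tn, fp), (fn, tp) = cm_counts

print("true 0, predicted 0 (TN):", tn)
print("true 0, predicted 1 (FP):", fp)
print("true 1, predicted 0 (FN):", fn)
print("true 1, predicted 1 (TP):", tp)

# Accuracy is the diagonal divided by the total number of test rows.
total = tn + fp + fn + tp
accuracy = (tn + tp) / total
print("accuracy: %.4f" % accuracy)
```

The accuracy computed this way should agree with the test score printed earlier.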

In [13]:
cr = classification_report(y_test, y_pred)
print(cr)

              precision    recall  f1-score   support

           0       0.68      0.92      0.78       682
           1       0.67      0.28      0.40       410

    accuracy                           0.68      1092
   macro avg       0.68      0.60      0.59      1092
weighted avg       0.68      0.68      0.64      1092
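
To see where the numbers in this report come from, the sketch below recomputes precision, recall, and F1 for each class from the confusion matrix counts. The helper `prf` and its argument names are hypothetical, written only for this illustration.

```python
# Counts from the confusion matrix: rows are true labels, columns are predictions.
tn, fp, fn, tp = 625, 57, 294, 116

def prf(correct, wrongly_predicted, missed):
    """Return (precision, recall, f1) for a single class."""
    precision = correct / (correct + wrongly_predicted)
    recall = correct / (correct + missed)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Class 0: rows wrongly predicted as 0 are the FN cell, missed rows are the FP cell.
class0 = prf(tn, fn, fp)
# Class 1: rows wrongly predicted as 1 are the FP cell, missed rows are the FN cell.
class1 = prf(tp, fp, fn)

for label, (p, r, f) in (("0", class0), ("1", class1)):
    print("class %s  precision %.2f  recall %.2f  f1 %.2f" % (label, p, r, f))
```

The low recall for class 1 is the 294 rows of true class 1 that were predicted as class 0.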



After running the example code from both resources, only one produced readable results.  
The SVC `kernel` option appears to matter most: the default `'rbf'` kernel raised the  
score from roughly 60% to 75% compared with everything else I've tried.  
I need to research this more.  
I've yet to determine the best model and data preprocessing to use.
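
One way to make the kernel comparison repeatable is to cross-validate each kernel under identical conditions. This is a rough sketch, not the procedure used for the numbers above: it runs on synthetic data from `make_classification`, and `X_demo`, `y_demo`, and `kernel_scores` are names I made up for the example. The four kernel strings are the built-in options that `SVC` accepts.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 300 rows, 9 numeric features.
X_demo, y_demo = make_classification(n_samples=300, n_features=9, random_state=0)

# Mean 5-fold accuracy for each built-in kernel, with scaling done inside each fold.
kernel_scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    kernel_scores[kernel] = cross_val_score(model, X_demo, y_demo, cv=5).mean()

for kernel, mean_score in kernel_scores.items():
    print("%-8s %.3f" % (kernel, mean_score))
```

Swapping `X_demo` and `y_demo` for the real features and labels would show whether `'rbf'` still comes out ahead on the water potability data.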

I reviewed the article on understanding the confusion matrix. With rows as true labels and  
columns as predictions, the matrix above shows that this model is fairly good at recognizing  
class 0 (625 correct to 57 wrong) but poor at recognizing class 1 (116 correct to 294 missed).