# Resources

## Support Vector Machine (SVM)
* https://towardsdatascience.com/support-vector-machines-svm-c9ef22815589

## Guide on Support Vector Machine (SVM) Algorithm
* https://www.analyticsvidhya.com/blog/2021/10/support-vector-machinessvm-a-complete-guide-for-beginners/

## Support Vector Machines (SVM): An Intuitive Explanation
* https://medium.com/low-code-for-advanced-data-science/support-vector-machines-svm-an-intuitive-explanation-b084d6238106

## plot_decision_regions: Visualize the decision regions of a classifier
* https://rasbt.github.io/mlxtend/user_guide/plotting/plot_decision_regions/


## Desmos
* https://www.desmos.com/


---
# **Introduction To Machine Learning**
## **Supervised Learning (Classification):**

*   k-Nearest Neighbor (kNN)
*   naive Bayesian (NB)
*   Decision Tree (DT)
*   **Support Vector Machine (SVM)**
---

A Support Vector Machine classifies data points by separating them with a hyperplane.

In this technique, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then classify by finding the hyperplane that best separates the two classes, that is, the one with the largest margin to the nearest points of each class.
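
The idea of a separating hyperplane can be illustrated with a minimal sketch. The toy points and the names `X_toy`, `y_toy`, and `toy_clf` are hypothetical and are not part of the diabetes data. The sketch reads the hyperplane parameters from a fitted linear `SVC` and reproduces its predictions by hand from the sign of `w . x + b`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable clusters in 2-D
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                  [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_clf = SVC(kernel="linear", C=1.0).fit(X_toy, y_toy)

w = toy_clf.coef_[0]        # normal vector of the separating hyperplane
b = toy_clf.intercept_[0]   # offset of the hyperplane

# A point is assigned to class 1 when w . x + b > 0, otherwise to class 0
scores = X_toy @ w + b
manual_pred = (scores > 0).astype(int)

print("w =", w, "b =", b)
print("manual prediction:", manual_pred)
print("margin width:", 2 / np.linalg.norm(w))
```

The `coef_` attribute is only available for the linear kernel; with other kernels the boundary is a hyperplane in the implicit feature space and can only be evaluated through `decision_function`.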


## <font color = #950CDF> Part 1: </font> <font color = #4854E8> Information of Dataset </font>

<b>Diabetes Dataset:</b> This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.
<b> Attribute Information </b> <br>
`Pregnancies:` Number of times pregnant <br>
`Glucose:` Plasma glucose concentration at 2 hours in an oral glucose tolerance test <br>
`BloodPressure:` Diastolic blood pressure (mm Hg) <br>
`SkinThickness:` Triceps skin fold thickness (mm) <br>
`Insulin:` 2-Hour serum insulin (mu U/ml) <br>
`BMI:` Body mass index (weight in kg/(height in m)^2) <br>
`DiabetesPedigreeFunction:` Diabetes pedigree function <br>
`Age:` Age (years) <br>
`Outcome:` Class variable (0 or 1) <br>

https://www.kaggle.com/datasets/mathchi/diabetes-data-set

### <font color = #27C3E5> 1.1: </font> <font color = #41EA46> Import Libraries and Dataset </font>

#### <font color = blue> Import the Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import minimize, fmin_tnc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from collections import Counter

#### <font color = blue>Import the Dataset

In [None]:
df = pd.read_csv("diabetes.csv")
df.head(3)

### <font color = #27C3E5> 1.2: </font> <font color = #41EA46> Data Summary and Visualization </font>

#### <font color = blue>Data shape

In [None]:
print("Number of columns:", df.shape[1])
print("Number of rows: ", df.shape[0])

#### <font color = blue>Data Info

In [None]:
df.info()

#### <font color = blue>Data Description

In [None]:
df.describe()

#### <font color = blue> Visualize the Label Class

In [None]:
plt.style.use('fivethirtyeight')
diabetes = df[df['Outcome'] == 1].shape[0]      # Outcome 1 = diabetes
non_diabetes = df[df['Outcome'] == 0].shape[0]  # Outcome 0 = non-diabetes

label = [diabetes, non_diabetes]
plt.pie(label, labels = ['diabetes', 'non-diabetes'], shadow = True, wedgeprops = {'edgecolor': 'black'},
        autopct = '%4.1f%%', startangle = 90, colors = ['red', 'green'])
plt.tight_layout()
plt.show()

#### <font color = blue> Relationship of Features </font>

In [None]:
sns.pairplot(df, hue = 'Outcome')  # points are colored by Outcome: 0 = non-diabetes, 1 = diabetes

#### <font color = blue> Check the Correlation of Features </font>

In [None]:
# Define the figure size
plt.figure(figsize = (16, 9))

# Customize the annotations
annot_kws={'fontsize':10,                      # To change the size of the font
           'fontstyle':'italic',               # To change the style of font 
           'fontfamily': 'serif',              # To change the family of font 
           'alpha':1 }                         # To change the transparency of the text  


# Customize the cbar
cbar_kws = {"shrink":1,                        # To change the size of the color bar
            'extend':'min',                    # To change the end of the color bar like pointed
            'extendfrac':0.1,                  # To adjust the extension of the color bar
            "drawedges":True,                  # To draw lines (edges) on the color bar
           }

# take upper correlation matrix
matrix = np.triu(df.corr())

# Generate heatmap correlation
ax = sns.heatmap(df.corr(), mask = matrix, cmap = 'rainbow', annot = True, linewidths = 1.5, annot_kws = annot_kws, cbar_kws = cbar_kws)

# Set the title etc
plt.title('Correlation Matrix', fontsize = 20)

# Set the size of text
sns.set(font_scale = 1.2)

## <font color = #950CDF> Part 2: </font> <font color = #4854E8> Data Preprocessing </font>

### <font color = #27C3E5> 2.1: </font> <font color = #41EA46> Define Predictor and target Attribute </font>

In [None]:
X = df.iloc[:, :-1]
Y = df.iloc[:, -1]

#### <font color = blue> Predictor Attribute

In [None]:
X.head()

#### <font color = blue> Target Attribute

In [None]:
Y.head()

### <font color = #27C3E5> 2.2: </font> <font color = #41EA46> Dealing with Missing Values </font>

#### <font color = blue> Check for Missing Values </font>

In [None]:
df.isnull().sum()
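
A null check alone may not tell the whole story for this dataset. In columns such as `Glucose`, `BloodPressure`, and `BMI`, a value of 0 is physiologically implausible and is commonly treated as a placeholder for a missing measurement. The sketch below uses a small hypothetical frame (`demo`) that only mimics the column names; the choice of columns and the median imputation are assumptions, not steps taken from this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the dataset's column names
demo = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Outcome": [1, 0, 1, 0],
})

# Columns where a zero is assumed to mean "not measured"
zero_as_missing = ["Glucose", "BloodPressure", "BMI"]

print(demo.isnull().sum())                  # reports no missing values
print((demo[zero_as_missing] == 0).sum())   # counts the hidden gaps

# Mark zeros as NaN, then fill each column with its own median
cleaned = demo.copy()
cleaned[zero_as_missing] = cleaned[zero_as_missing].replace(0, np.nan)
cleaned[zero_as_missing] = cleaned[zero_as_missing].fillna(
    cleaned[zero_as_missing].median()
)
print(cleaned)
```

Whether imputation helps the classifier is an empirical question; the same pattern could be applied to `df` before scaling and compared against the unmodified data.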

### <font color = #27C3E5> 2.3: </font> <font color = #41EA46> Feature Scaling </font>

#### <font color = blue> Apply Standard Scaler

In [None]:
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

#### <font color = blue> After Applying Standard Scaler </font>

In [None]:
pd.DataFrame(X).head()

### <font color = #27C3E5> 2.4: </font> <font color = #41EA46> Split the Data into Train and Test </font>

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
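
A common variation is to split first and fit the scaler on the training part only, so that no statistics from the test rows influence preprocessing. The sketch below shows that ordering on hypothetical random data (`X_demo`, `y_demo`); the `stratify` argument is an optional addition that keeps the class ratio similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the feature matrix and labels
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y_demo = rng.integers(0, 2, size=100)

# Split first; stratify keeps the class ratio similar in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Fit the scaler on the training part only, then reuse it for the test part
scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.mean(axis=0).round(3))  # zero by construction
print(X_te_scaled.mean(axis=0).round(3))  # near zero, but generally not exact
```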

#### <font color = blue> Training Data

In [None]:
print("X_train", X_train.shape)
print("y_train", y_train.shape)

#### <font color = blue> Testing Data

In [None]:
print("X_test", X_test.shape)
print("y_test", y_test.shape)

## <font color = #950CDF> Part 3: </font> <font color = #4854E8> Build Support Vector Machine  </font>
In this part, we build a Support Vector Machine classification model using Scikit-Learn.

#### <font color = blue> Import Model from Sklearn

In [None]:
from sklearn.svm import SVC

#### <font color = blue> Initialize the Model

In [None]:
clf = SVC(
          C=1.0,                          # Regularization parameter (smaller C = stronger regularization)
          kernel='linear',                # Kernel type
        # degree=3,                       # Degree of the polynomial kernel (only for kernel='poly')
          gamma='scale',                  # Kernel coefficient (ignored by the linear kernel)
          coef0=0.0,                      # Independent term (only for kernel='poly'/'sigmoid')
          shrinking=True,                 # Use the shrinking heuristic
          probability=False,              # Enable probability estimates (slows down fitting)
          tol=0.001,                      # Tolerance for the stopping criterion
          cache_size=200,                 # Size of the kernel cache (in MB)
          class_weight=None,              # Weight of each class
          verbose=False,                  # Enable verbose output
          max_iter=-1,                    # Hard limit on iterations (-1 = no limit)
          decision_function_shape='ovr',  # One-vs-rest or one-vs-one (ignored for binary problems)
          break_ties=False,               # How to break ties in predict
          random_state=None               # Seed (ignored when probability=False)
)
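
The `kernel` argument is the setting most likely to change results. As a rough sketch, kernels can be compared with cross-validation; the data here comes from `make_classification` as a synthetic stand-in with 8 features, so the printed scores say nothing about the diabetes dataset itself.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in data with 8 features, like the predictors above
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

kernel_scores = {}
for kernel in ["linear", "rbf", "poly"]:
    model = SVC(kernel=kernel, C=1.0, gamma="scale")
    # Mean accuracy over 5 cross-validation folds for this kernel
    kernel_scores[kernel] = cross_val_score(model, X_syn, y_syn, cv=5).mean()

for kernel, score in kernel_scores.items():
    print(f"{kernel:>6}: {score:.3f}")
```

The same loop could be run on the scaled training data (`X_train`, `y_train`) to choose a kernel before evaluating on the test set.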

#### <font color = blue> Fit the Model

In [None]:
clf.fit(X_train, y_train)

#### <font color = blue> Predict the Test Data</font>

In [None]:
y_pred = clf.predict(X_test)

## <font color = #950CDF> Part 4: </font> <font color = #4854E8> Evaluate the Result </font>
In this part, we evaluate the Support Vector Machine. First, we build the confusion matrix; then we compute and visualize the following scores: Accuracy, Precision, TPR, FPR, F-Score, Specificity, Error, and ROC Area.

### <font color = #27C3E5> 4.1: </font> <font color = #41EA46> Confusion Matrix</font>

In [None]:
from sklearn.metrics import confusion_matrix
# scikit-learn expects (y_true, y_pred): rows = true class, columns = predicted class
confusion_matrix_ = confusion_matrix(y_test, y_pred)

#[row, column]
TP = confusion_matrix_[1, 1]        
TN = confusion_matrix_[0, 0]           
FP = confusion_matrix_[0, 1]           
FN = confusion_matrix_[1, 0]

group_names = ['TN','FP','FN','TP']

group_counts = ["{0:0.0f}".format(value) for value in confusion_matrix_.flatten()]

group_percentages = ["{0:.2%}".format(value) for value in confusion_matrix_.flatten()/np.sum(confusion_matrix_)]

labels = [f"{v1}\n{v2}\n{v3}" for v1, v2, v3 in zip(group_names,group_counts,group_percentages)]

labels = np.asarray(labels).reshape(2,2)

sns.heatmap(confusion_matrix_, annot=labels, fmt='', cmap='Greens')
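
The cell layout of scikit-learn's confusion matrix depends on the argument order, which is `confusion_matrix(y_true, y_pred)`. The small check below uses hypothetical labels (`y_true_demo`, `y_pred_demo`) chosen so that FP and FN differ, which makes a swapped order visible.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = diabetes, 0 = non-diabetes
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 1, 1, 1, 0, 0])

# Rows are true classes, columns are predicted classes
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

print(cm_demo)
print("TN, FP, FN, TP =", tn, fp, fn, tp)  # TN, FP, FN, TP = 3 1 2 2

# Swapping the arguments transposes the matrix, exchanging FP and FN
cm_swapped = confusion_matrix(y_pred_demo, y_true_demo)
print(cm_swapped)
```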

### <font color = #27C3E5> 4.2: </font> <font color = #41EA46>  Evaluate the Results </font>

In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, mean_absolute_error, roc_auc_score

#### <font color = blue>4.2.1: Calculate the Results

In [None]:
# scikit-learn metrics take (y_true, y_pred), in that order

# Accuracy Score
Accuracy = accuracy_score(y_test, y_pred)
print('Accuracy Score:', Accuracy)

# Precision Score
Precision = precision_score(y_test, y_pred)
print('Precision Score:', Precision)

# True Positive Rate (TPR) or Sensitivity or Recall
TPR = recall_score(y_test, y_pred)
print('True Positive Rate:', TPR)

# False Positive Rate (FPR)
FPR = FP / float(TN + FP)
print('False Positive Rate:', FPR)

# F1 Score or F-Measure or F-Score
F1 = f1_score(y_test, y_pred)
print('F1 Score:', F1)

# Specificity
Specificity = TN / (TN + FP)
print('Specificity:', Specificity)

# Mean Absolute Error (for 0/1 labels this equals 1 - accuracy)
Error = mean_absolute_error(y_test, y_pred)
print('Mean Absolute Error:', Error)

# ROC Area (computed from hard labels, so it reflects a single threshold)
Roc = roc_auc_score(y_test, y_pred)
print('ROC Area:', Roc)
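
ROC area is usually computed from continuous scores rather than hard 0/1 predictions, because the curve is traced by varying a threshold. For an `SVC`, `decision_function` returns the signed distance to the hyperplane and can serve as that score. The sketch below compares both variants on synthetic stand-in data, so the numbers are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; not the diabetes dataset
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

svm_demo = SVC(kernel="linear").fit(X_tr, y_tr)

# Hard labels give a single operating point on the ROC curve
auc_from_labels = roc_auc_score(y_te, svm_demo.predict(X_te))

# Signed distances to the hyperplane rank the samples, so the full curve is used
auc_from_scores = roc_auc_score(y_te, svm_demo.decision_function(X_te))

print("AUC from labels:", round(auc_from_labels, 3))
print("AUC from scores:", round(auc_from_scores, 3))
```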

#### <font color = blue>4.2.2: Visualize the Results

In [None]:
plt.figure(figsize = (12, 5))

result = [Accuracy, Precision, TPR, FPR, F1, Specificity, Error, Roc]
label = ["Accuracy", "Precision", "TPR", "FPR", "F-Score", "Specificity", "Error", "Roc Area"]
colors=[ 'red', 'green', 'blue', 'darkgoldenrod', 'orange', 'purple', 'brown', 'darkcyan']

plt.bar(label, result, color = colors, edgecolor='black')
plt.show()