# ML Classification - Network Traffic Analysis
## Part 4 - MODEL TUNING

In this notebook, we load the processed dataset file and use it to tune one of the previously trained classification models.

> Since most models already achieved very good accuracy in the previous phase, and for the sake of this demo, we will tune one of the lower-performing estimators (Support Vector Classifier).

> **INPUT:** the cleaned and processed dataset csv file.<br>
> **OUTPUT:** an analysis of the model's performance before/after tuning.  

***

### 1. INITIALIZATION

In [11]:
# Import necessary libraries and modules
import pandas as pd
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.svm import SVC

### 2. LOADING PROCESSED DATASET

#### Reading dataset file into pandas DataFrame

In [12]:
# Initialize required variables to read the cleaned data file
data_file_location = "..\\data\\processed\\"
data_file_name = "conn.log.labeled_processed"
data_file_ext = ".csv"


# Read the dataset
data_df = pd.read_csv(data_file_location + data_file_name + data_file_ext, index_col=0)
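The path in the cell above uses Windows-style separators. As a minimal sketch of a portable alternative, assuming the same relative directory layout, the path could be built with `pathlib` (the `data_path` name is introduced here for illustration only):

```python
from pathlib import Path

# Same file name components as in the cell above
data_file_name = "conn.log.labeled_processed"
data_file_ext = ".csv"

# Assumed layout: ../data/processed/ relative to the notebook's working directory
data_path = Path("..") / "data" / "processed" / (data_file_name + data_file_ext)

print(data_path.name)
```

`pd.read_csv` accepts path objects, so `pd.read_csv(data_path, index_col=0)` should behave like the original call on any operating system.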

### 3. MODEL TUNING

In [13]:
# Split data into independent and dependent variables
data_X = data_df.drop("label", axis=1)
data_y = data_df["label"]

In [14]:
# Initialize the model
model = SVC()

# Set hyperparameters
parameters = {
    'C': [0.1, 1, 10, 100, 1000],
    'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
}

# Initialize cross validation method
cross_validation_folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Initialize tuning process
grid = GridSearchCV(
    estimator=model, 
    param_grid=parameters, 
    scoring=['f1','precision','recall'],
    cv=cross_validation_folds,
    verbose=100,
    refit="precision")

# Train the model
grid.fit(data_X, data_y)

# Store performance metrics
results = pd.DataFrame(index=["SVC Base", "SVC Tuned"], columns=["Recall", "Precision", "F1"])
results.iloc[0] = [0.999906, 0.995403, 0.997649]  # Results obtained from previous phase

# Mean cross-validated scores of the best candidate (selected by precision via refit)
best = grid.best_index_
results.iloc[1] = [
    grid.cv_results_['mean_test_recall'][best],
    grid.cv_results_['mean_test_precision'][best],
    grid.cv_results_['mean_test_f1'][best],
]
print("Best Parameters: {}".format(grid.best_params_))

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV 1/5; 1/25] START C=0.1, gamma=1.............................................
[CV 1/5; 1/25] END C=0.1, gamma=1; f1: (test=0.998) precision: (test=0.995) recall: (test=1.000) total time=   0.1s
[CV 2/5; 1/25] START C=0.1, gamma=1.............................................
[CV 2/5; 1/25] END C=0.1, gamma=1; f1: (test=0.997) precision: (test=0.995) recall: (test=1.000) total time=   0.2s
[CV 3/5; 1/25] START C=0.1, gamma=1.............................................
[CV 3/5; 1/25] END C=0.1, gamma=1; f1: (test=0.998) precision: (test=0.996) recall: (test=1.000) total time=   0.2s
[CV 4/5; 1/25] START C=0.1, gamma=1.............................................
[CV 4/5; 1/25] END C=0.1, gamma=1; f1: (test=0.998) precision: (test=0.996) recall: (test=1.000) total time=   0.2s
[CV 5/5; 1/25] START C=0.1, gamma=1.............................................
[CV 5/5; 1/25] END C=0.1, gamma=1; f1: (test=0.998) precision: (test=
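The first line of the log ("5 folds for each of 25 candidates, totalling 125 fits") follows directly from the size of the grid. A small sketch, using only the standard library and the same `parameters` dictionary as the tuning cell, shows where these numbers come from (the `candidates` and `total_fits` names are illustrative, not part of scikit-learn):

```python
from itertools import product

# Same grid as in the tuning cell
parameters = {
    "C": [0.1, 1, 10, 100, 1000],
    "gamma": [1, 0.1, 0.01, 0.001, 0.0001],
}
n_splits = 5  # number of folds used by StratifiedKFold above

# Every (C, gamma) combination is one candidate
candidates = [dict(zip(parameters, values)) for values in product(*parameters.values())]

# Each candidate is fitted once per fold
total_fits = len(candidates) * n_splits

print(len(candidates), total_fits)
```

The count covers only the cross-validation fits. With `refit` set, the selected candidate is fitted once more on the full dataset after the search.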

In [15]:
# Check and compare results
results

| | Recall | Precision | F1 |
|---|---|---|---|
| SVC Base | 0.999906 | 0.995403 | 0.997649 |
| SVC Tuned | 0.997644 | 0.998727 | 0.998185 |
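To make the size of each change explicit, the differences between the two rows can be computed directly. The sketch below copies the scores from the table above into plain dictionaries (the `base`, `tuned` and `deltas` names are illustrative):

```python
# Scores copied from the results table above
base = {"Recall": 0.999906, "Precision": 0.995403, "F1": 0.997649}
tuned = {"Recall": 0.997644, "Precision": 0.998727, "F1": 0.998185}

# Absolute change per metric (tuned minus base)
deltas = {metric: round(tuned[metric] - base[metric], 6) for metric in base}

for metric, delta in deltas.items():
    print(f"{metric}: {delta:+.6f}")
```

All three changes are well below one percentage point, with recall moving down and precision and F1 moving up.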


### 4. RESULT ANALYSIS

- In the tuned SVC model, recall decreased slightly (from 0.999906 to 0.997644). The drop is small and recall remains close to 1, so tuning did not meaningfully reduce the model's ability to capture positive instances.
- On the other hand, precision increased slightly (from 0.995403 to 0.998727), which produced a small increase in the F1 score of the best-performing hyperparameter combination. This is expected, since we instructed the grid search to select the best model by precision (`refit="precision"`).
- In summary, tuning the Support Vector Classifier traded a slight decrease in recall for a slight increase in precision, with a marginal net improvement in F1 (from 0.997649 to 0.998185).
- Given the strong initial performance of the base model, substantial improvements from tuning were not expected.
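The relationship between the three metrics can be checked with the usual definition of F1 as the harmonic mean of precision and recall. The sketch below applies it to the reported scores; `f1_from` is a helper introduced here for illustration. Note that the tuned scores are means over five folds, so the mean of the per-fold F1 values is not guaranteed to equal the F1 computed from mean precision and mean recall, although the two should be very close here.

```python
def f1_from(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the results table
base_f1 = f1_from(precision=0.995403, recall=0.999906)
tuned_f1 = f1_from(precision=0.998727, recall=0.997644)

print(f"Base F1:  {base_f1:.6f}")
print(f"Tuned F1: {tuned_f1:.6f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, raising precision (the lower score in the base model) lifts F1 even though recall drops by a similar amount.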