# Loading of Datasets

Using the optimised hyperparameters derived during the data mining process, we measure the training and testing times of each chosen classifier.

The datasets that we have chosen are:
- `noncir_ss_scaled_trimmed_cir_pca_ss_scaled.pkl`: Non-CIR (Standard Scaled after feature selection) with CIR Statistical Measures (PCA and Standard Scaled)
- `noncir_ss_scaled_trimmed_cir_ss_scaled.pkl`: Non-CIR (Standard Scaled after feature selection) with CIR Statistical Measures (Standard Scaled)
- `noncir_ss_scaled_trimmed_cir_pca.pkl`: Non-CIR (Standard Scaled after feature selection) with CIR Statistical Measures (PCA)

The datasets are loaded into `dataset_1`, `dataset_2` and `dataset_3`, respectively.

In [1]:
from utils import *
import pandas as pd
import numpy as np
import time


dataset_1 = load_from_pickle("noncir_ss_scaled_trimmed_cir_pca_ss_scaled.pkl")
dataset_2 = load_from_pickle("noncir_ss_scaled_trimmed_cir_ss_scaled.pkl")
dataset_3 = load_from_pickle("noncir_ss_scaled_trimmed_cir_pca.pkl")
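`load_from_pickle` comes from the project-local `utils` module, whose source is not shown here. The sketch below assumes it simply unpickles a pandas DataFrame that contains the `NLOS` label column used throughout this notebook. `load_from_pickle_sketch` is a hypothetical stand-in, not the project's actual helper.

```python
import os
import pickle
import tempfile

import pandas as pd


def load_from_pickle_sketch(path):
    """Hypothetical stand-in for utils.load_from_pickle: unpickle a DataFrame."""
    with open(path, "rb") as f:
        obj = pickle.load(f)
    # Every dataset in this notebook is expected to carry the 'NLOS' label column
    if not isinstance(obj, pd.DataFrame) or "NLOS" not in obj.columns:
        raise ValueError(f"{path} does not contain a DataFrame with an 'NLOS' column")
    return obj


# Usage: round-trip a tiny toy DataFrame through a temporary pickle file
df = pd.DataFrame({"f1": [0.1, 0.2], "NLOS": [0, 1]})
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "demo.pkl")
    df.to_pickle(p)
    loaded = load_from_pickle_sketch(p)
print(loaded)
```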


# Support Vector Machine (SVM)

## Linear SVM

### Dataset_1 (noncir_ss_scaled_trimmed_cir_pca_ss_scaled)

Evaluation of the training and testing times of `dataset_1`, using a 70:30 training/test split.

In [3]:
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

# Select the features to be used for Support Vector Classification
X = dataset_1.drop(columns = 'NLOS')
Y = dataset_1[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)
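The split cells rely on `TEST_SIZE` and `RANDOM_STATE`, which are not defined in the visible cells. The sketch below assumes `TEST_SIZE = 0.3` (matching the "70% training and 30% test" comment) and an arbitrary `RANDOM_STATE`, and uses a toy DataFrame in place of the real datasets. It shows that selecting `df["NLOS"]` directly gives the same 1-D label array as the `[['NLOS']]` plus `reshape(-1)` pattern. It also shows that a fixed `random_state` reproduces the exact same split, so each classifier is timed on the same rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

TEST_SIZE = 0.3      # assumed value; the notebook takes TEST_SIZE from elsewhere
RANDOM_STATE = 42    # assumed value; the notebook takes RANDOM_STATE from elsewhere

# Hypothetical toy dataset with the same layout: feature columns plus 'NLOS'
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 4)), columns=["f1", "f2", "f3", "f4"])
df["NLOS"] = rng.integers(0, 2, size=1000)

X = df.drop(columns="NLOS")
Y = df["NLOS"].to_numpy()  # already 1-D
assert np.array_equal(Y, df[["NLOS"]].to_numpy().reshape(-1))

x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
print(len(x_train), len(x_test))  # 700 300

# Same random_state -> identical split
x_train2, x_test2, _, _ = train_test_split(
    X, Y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
assert x_train.index.equals(x_train2.index)
```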

In [3]:
clf = SVC(kernel = 'linear', C = 0.01, random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  6.014267921447754


In [4]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  1.4445459842681885
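Each timing above comes from a single `time.time()` measurement, which can be noisy. As a sketch (not the method used for the recorded numbers), `time_call` below is a hypothetical helper. It uses `time.perf_counter`, a monotonic high-resolution clock, and repeats the call so a minimum or median can be reported. For `clf.fit`, each repeat refits the model from scratch.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, **kwargs):
    """Hypothetical helper: run fn(*args, **kwargs) `repeats` times and
    return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repeats):
        t0 = time.perf_counter()  # monotonic, high-resolution clock
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - t0)
    return durations


# Usage with a cheap stand-in workload; in the notebook fn would be clf.fit or clf.predict
durations = time_call(sorted, list(range(100_000, 0, -1)), repeats=3)
print(f"min {min(durations):.4f}s, median {statistics.median(durations):.4f}s")
```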


### Dataset_2 (noncir_ss_scaled_trimmed_cir_ss_scaled)

Evaluation of the training and testing times of `dataset_2`, using a 70:30 training/test split.

In [5]:
# Select the features to be used for Support Vector Classification
X = dataset_2.drop(columns = 'NLOS')
Y = dataset_2[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [6]:
# Create an SVM classifier with a linear kernel (linear hyperplane)
clf = SVC(kernel = 'linear', C = 0.001, random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  144.48278498649597


In [7]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  19.74851965904236
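For an SVM, the fitted model is defined by its support vectors, so their number is one likely driver of the large gap between `dataset_1` and `dataset_2` above. The sketch below uses synthetic data from `make_classification`, not the NLOS datasets. It inspects the support-vector counts and checks that, for a linear kernel, the decision function reduces to a single weight vector exposed by scikit-learn as `coef_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in data; not the NLOS datasets
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
clf = SVC(kernel="linear", C=0.01).fit(X, y)

# Support vectors per class, and in total
print(clf.n_support_, clf.support_vectors_.shape[0])

# With a linear kernel the decision function is w . x + b, with w available as coef_
manual = X @ clf.coef_.ravel() + clf.intercept_[0]
assert np.allclose(manual, clf.decision_function(X))
```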


## Radial Basis Function (RBF) SVM

### Dataset_1 (noncir_ss_scaled_trimmed_cir_pca_ss_scaled)

Evaluation of the training and testing times of `dataset_1`, using a 70:30 training/test split.

In [8]:
# Select the features to be used for Support Vector Classification
X = dataset_1.drop(columns = 'NLOS')
Y = dataset_1[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [9]:
# Create an SVM classifier with a radial basis function (RBF) kernel
clf = SVC(kernel = 'rbf', C = 0.1, gamma = 'auto', random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  9.284297704696655


In [10]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  6.838964223861694
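With `gamma = 'auto'`, scikit-learn uses γ = 1 / n_features in the RBF kernel K(x, x') = exp(-γ‖x − x'‖²). Unlike the linear case, the RBF decision function cannot be collapsed into one weight vector. Every prediction evaluates the kernel against every support vector, which helps explain why RBF testing times above are much longer than for the other classifiers. The sketch below, on synthetic data, rebuilds `decision_function` by hand from `support_vectors_`, `dual_coef_` and `intercept_`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Synthetic stand-in data; not the NLOS datasets
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
clf = SVC(kernel="rbf", C=0.1, gamma="auto").fit(X, y)

# gamma='auto' corresponds to gamma = 1 / n_features
gamma = 1.0 / X.shape[1]

# Kernel between every sample and every support vector: shape (n_samples, n_SV)
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_[0]
assert np.allclose(manual, clf.decision_function(X))
print(f"{clf.support_vectors_.shape[0]} support vectors -> {K.size} kernel evaluations")
```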


### Dataset_2 (noncir_ss_scaled_trimmed_cir_ss_scaled)

Evaluation of the training and testing times of `dataset_2`, using a 70:30 training/test split.

In [11]:
# Select the features to be used for Support Vector Classification
X = dataset_2.drop(columns = 'NLOS')
Y = dataset_2[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [12]:
# Create an SVM classifier with a radial basis function (RBF) kernel
clf = SVC(kernel = 'rbf', C = 0.1, gamma = 'auto', random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  186.82493042945862


In [13]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  83.8489122390747


### Dataset_3 (noncir_ss_scaled_trimmed_cir_pca)

Evaluation of the training and testing times of `dataset_3`, using a 70:30 training/test split.

In [14]:
# Select the features to be used for Support Vector Classification
X = dataset_3.drop(columns = 'NLOS')
Y = dataset_3[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [15]:
clf = SVC(kernel = 'rbf', C = 0.0001, gamma = 'auto', random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  42.361613750457764


In [16]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  28.251997470855713


# Multi-Layer Perceptron (MLP) Classifier

### Dataset_1 (noncir_ss_scaled_trimmed_cir_pca_ss_scaled)

Evaluation of the training and testing times of `dataset_1`, using a 70:30 training/test split.

In [3]:
# Select the features to be used for MLP classification
X = dataset_1.drop(columns = 'NLOS')
Y = dataset_1[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [4]:
# Create an MLP classifier with three hidden layers of 5 neurons each
clf = MLPClassifier(hidden_layer_sizes = (5, 5, 5), activation = 'relu', learning_rate = 'constant', solver = 'adam', max_iter = 1000, random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  5.480480670928955


In [5]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  0.0042514801025390625
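The MLP testing times are milliseconds because prediction is just one forward pass: a few small matrix multiplications. The sketch below uses synthetic data with 10 features, not the NLOS datasets. It counts the weights of a `(5, 5, 5)` network and reproduces `predict_proba` by hand. Hidden layers use ReLU, and for a binary problem the single output unit uses the logistic function. The function `forward` is illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; not the NLOS datasets
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(5, 5, 5), activation="relu", solver="adam",
                    max_iter=1000, random_state=0).fit(X, y)


def forward(X, coefs, intercepts):
    """Manual forward pass: ReLU hidden layers, logistic output (binary case)."""
    a = X
    for W, b in zip(coefs[:-1], intercepts[:-1]):
        a = np.maximum(a @ W + b, 0.0)
    z = a @ coefs[-1] + intercepts[-1]
    return 1.0 / (1.0 + np.exp(-z))


# Weights + biases: (10*5+5) + (5*5+5) + (5*5+5) + (5*1+1)
n_params = sum(W.size for W in clf.coefs_) + sum(b.size for b in clf.intercepts_)
print(n_params)  # 121

p_manual = forward(X, clf.coefs_, clf.intercepts_).ravel()
assert np.allclose(p_manual, clf.predict_proba(X)[:, 1])
```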


### Dataset_2 (noncir_ss_scaled_trimmed_cir_ss_scaled)

Evaluation of the training and testing times of `dataset_2`, using a 70:30 training/test split.

In [6]:
# Select the features to be used for MLP classification
X = dataset_2.drop(columns = 'NLOS')
Y = dataset_2[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [7]:
clf = MLPClassifier(hidden_layer_sizes = (2, 2), activation = 'relu', learning_rate = 'constant', solver = 'adam', max_iter = 1000, random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  36.62067103385925


In [8]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  0.03354620933532715


### Dataset_3 (noncir_ss_scaled_trimmed_cir_pca)

Evaluation of the training and testing times of `dataset_3`, using a 70:30 training/test split.

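In [9]:
# Select the features to be used for MLP classification
X = dataset_3.drop(columns = 'NLOS')
Y = dataset_3[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [10]: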
clf = MLPClassifier(hidden_layer_sizes = (10, 10, 10), activation = 'relu', learning_rate = 'constant', solver = 'adam', max_iter = 1000, random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Training Time: ", time.time() - t0)

Training Time:  6.067868947982788


In [11]:
t0 = time.time()
clf.predict(x_test)
print("Testing Time: ", time.time() - t0)

Testing Time:  0.003998517990112305


# Random Forest

In [12]:
PURITY_MEASURE = ['entropy', 'gini']
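The two split criteria in `PURITY_MEASURE` score the class distribution p at a node. Gini impurity is 1 − Σ p_k², and entropy is −Σ p_k log₂ p_k. The sketch below evaluates both for a few two-class distributions. Entropy requires a logarithm per class, so it is slightly more costly to compute. That is consistent with, though not proof of, the longer entropy training times recorded below. The helper names `gini` and `entropy` are illustrative.

```python
import numpy as np


def gini(p):
    """Gini impurity: 1 - sum(p_k^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))


def entropy(p):
    """Shannon entropy in bits: -sum(p_k * log2 p_k), skipping p_k = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


for dist in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(dist, round(gini(dist), 3), round(entropy(dist), 3))
```

Both measures are maximal for a 50/50 node and zero for a pure node.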

### Dataset_1 (noncir_ss_scaled_trimmed_cir_pca_ss_scaled)

Evaluation of the training and testing times of `dataset_1`, using a 70:30 training/test split.

In [13]:
# Select the features to be used for Random Forest classification
X = dataset_1.drop(columns = 'NLOS')
Y = dataset_1[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [14]:
clf = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[0], random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Entropy Training Time: ", time.time() - t0)


clf2 = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[1], random_state = RANDOM_STATE)
t0 = time.time()
clf2.fit(x_train, y_train)
print("Gini Training Time: ", time.time() - t0)

Entropy Training Time:  13.483222961425781
Gini Training Time:  10.056505680084229


In [15]:
t0 = time.time()
clf.predict(x_test)
print("Entropy Testing Time: ", time.time() - t0)

t0 = time.time()
clf2.predict(x_test)
print("Gini Testing Time: ", time.time() - t0)

Entropy Testing Time:  0.054071903228759766
Gini Testing Time:  0.05740928649902344
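Random Forest prediction passes each sample down every tree. With `max_depth = 10`, that is at most 10 comparisons per tree, which keeps testing times small. The sketch below uses synthetic data and checks the number of trees, which defaults to `n_estimators = 100` in current scikit-learn releases. It also checks that no tree exceeds the depth limit.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; not the NLOS datasets
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = RandomForestClassifier(max_depth=10, criterion="entropy", random_state=0).fit(X, y)

# Depth of each fitted decision tree in the ensemble
depths = [tree.get_depth() for tree in clf.estimators_]
print(len(clf.estimators_), max(depths))
```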


### Dataset_2 (noncir_ss_scaled_trimmed_cir_ss_scaled)

Evaluation of the training and testing times of `dataset_2`, using a 70:30 training/test split.

In [16]:
# Select the features to be used for Random Forest classification
X = dataset_2.drop(columns = 'NLOS')
Y = dataset_2[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [17]:
clf = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[0], random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Entropy Training Time: ", time.time() - t0)



clf2 = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[1], random_state = RANDOM_STATE)
t0 = time.time()
clf2.fit(x_train, y_train)
print("Gini Training Time: ", time.time() - t0)

Entropy Training Time:  34.38670086860657
Gini Training Time:  31.081051349639893


In [18]:
t0 = time.time()
clf.predict(x_test)
print("Entropy Testing Time: ", time.time() - t0)

t0 = time.time()
clf2.predict(x_test)
print("Gini Testing Time: ", time.time() - t0)

Entropy Testing Time:  0.13614916801452637
Gini Testing Time:  0.13197803497314453


### Dataset_3 (noncir_ss_scaled_trimmed_cir_pca)

Evaluation of the training and testing times of `dataset_3`, using a 70:30 training/test split.

In [19]:
# Select the features to be used for Random Forest classification
X = dataset_3.drop(columns = 'NLOS')
Y = dataset_3[['NLOS']].to_numpy()
Y = Y.reshape(-1)

# Split dataset into 70% training and 30% test
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = TEST_SIZE, random_state = RANDOM_STATE)

In [20]:
clf = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[0], random_state = RANDOM_STATE)
t0 = time.time()
clf.fit(x_train, y_train)
print("Entropy Training Time: ", time.time() - t0)



clf2 = RandomForestClassifier(max_depth = 10, criterion = PURITY_MEASURE[1], random_state = RANDOM_STATE)
t0 = time.time()
clf2.fit(x_train, y_train)
print("Gini Training Time: ", time.time() - t0)

Entropy Training Time:  13.614278316497803
Gini Training Time:  10.226720094680786


In [21]:
t0 = time.time()
clf.predict(x_test)
print("Entropy Testing Time: ", time.time() - t0)

t0 = time.time()
clf2.predict(x_test)
print("Gini Testing Time: ", time.time() - t0)

Entropy Testing Time:  0.05671858787536621
Gini Testing Time:  0.0545496940612793
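To compare classifiers side by side, the single-run timings printed in this notebook can be collected into one table. The values below are copied from the outputs above and rounded. No new measurements are added, and Linear SVM was not run on `dataset_3`, so that cell is empty.

```python
import pandas as pd

# (classifier, dataset, training time [s], testing time [s]) from the outputs above
records = [
    ("Linear SVM", "dataset_1", 6.014, 1.445),
    ("Linear SVM", "dataset_2", 144.483, 19.749),
    ("RBF SVM", "dataset_1", 9.284, 6.839),
    ("RBF SVM", "dataset_2", 186.825, 83.849),
    ("RBF SVM", "dataset_3", 42.362, 28.252),
    ("MLP", "dataset_1", 5.480, 0.0043),
    ("MLP", "dataset_2", 36.621, 0.0335),
    ("MLP", "dataset_3", 6.068, 0.0040),
    ("RF (entropy)", "dataset_1", 13.483, 0.0541),
    ("RF (entropy)", "dataset_2", 34.387, 0.1361),
    ("RF (entropy)", "dataset_3", 13.614, 0.0567),
    ("RF (gini)", "dataset_1", 10.057, 0.0574),
    ("RF (gini)", "dataset_2", 31.081, 0.1320),
    ("RF (gini)", "dataset_3", 10.227, 0.0545),
]
df = pd.DataFrame(records, columns=["classifier", "dataset", "train_s", "test_s"])

train = df.pivot(index="classifier", columns="dataset", values="train_s")
test = df.pivot(index="classifier", columns="dataset", values="test_s")
print(train, test, sep="\n\n")
```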
