# Classifiers for feature vectors

According to the literature, the models commonly used for trajectory classification are SVM, KNN, DBSCAN and KMeans. In this notebook we test how well these models, together with decision tree and random forest classifiers, perform on our feature vectors.

First, we load the feature vectors, which describe each trajectory by its characteristics.

In [8]:
import feature_vec as fv

metadata = fv.get_selected_data()
feat_vectors, clss_mask, clss = fv.get_feat_vectors(metadata)

100.00%

Now we split the data, using 70% for model training and the remaining 30% for testing.

In [62]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(feat_vectors, clss, train_size=0.70)
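
The split above is random, so scores can vary from run to run. A minimal sketch of a reproducible, stratified split follows. The synthetic data from `make_classification` is only a stand-in for `feat_vectors` and `clss`, and the seed values and the `*_demo` / `Xd_*` names are assumptions, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for feat_vectors / clss (assumption: 5 classes, 20 features).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)

# random_state fixes the shuffle; stratify keeps the class proportions
# the same in the training and test partitions.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, train_size=0.70, random_state=42, stratify=y_demo
)

print(Xd_train.shape, Xd_test.shape)
```

Fixing the seed makes the scores of different classifiers comparable, since they are all evaluated on the same test partition.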

## Decision Tree Classifier

In [63]:
from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(criterion='entropy')
dtc.fit(X_train, y_train)
dtc.score(X_test, y_test)

# 0.81

0.8148774302620456

## Random Forest Classifier

In [64]:
from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(criterion='entropy', max_features='log2', bootstrap=False)
rfc.fit(X_train, y_train)
rfc.score(X_test, y_test)

# 0.88

0.9070160608622148

## K-Nearest Neighbors

In [10]:
from sklearn.neighbors import KNeighborsClassifier

0.6889264581572274

With the default number of neighbors (n_neighbors=5) for kneighbors queries.

In [73]:
knn = KNeighborsClassifier(weights='distance')
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

# 0.70

0.6990701606086221

The results with the weights parameter set to 'distance' are slightly better than with the default uniform weighting.

With the number of neighbors set to 20.

In [69]:
knn = KNeighborsClassifier(weights='distance', n_neighbors=20)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

# 0.71

0.7379543533389687
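
KNN is distance-based, so features on very different scales may dominate the neighbor search. The sketch below shows one way to choose `n_neighbors` by cross-validation on standardized features instead of trying values by hand. The synthetic data stands in for the notebook's vectors, and the candidate grid `[5, 10, 20]` is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature vectors (assumption).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)

knn_pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(weights='distance'))

# make_pipeline names each step after its lowercased class name,
# hence the "kneighborsclassifier__" prefix.
param_grid = {'kneighborsclassifier__n_neighbors': [5, 10, 20]}

# 5-fold cross-validation over the candidate neighbor counts.
search = GridSearchCV(knn_pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_demo, y_demo)

print(search.best_params_, round(search.best_score_, 3))
```

Because the scaler sits inside the pipeline, it is refitted on each training fold, so no information from the validation fold leaks into the scaling.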

In [96]:
from sklearn.pipeline import make_pipeline, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

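# PCA runs on the unscaled features; the 15 components are then
# standardized before being passed to the random forest.
pipeline = Pipeline([('pca', PCA(n_components=15)), ('std', StandardScaler()),
                     ('Random_Forest', RandomForestClassifier())], verbose=True)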

pipeline.fit(X_train, y_train)
pipeline.score(X_test, y_test)

[Pipeline] ............... (step 1 of 3) Processing pca, total=   0.1s
[Pipeline] ............... (step 2 of 3) Processing std, total=   0.0s
[Pipeline] ..... (step 3 of 3) Processing Random_Forest, total=   1.3s


0.8402366863905325

## Support Vector Machine

In [11]:
from sklearn.svm import SVC

We standardize the data and use the RBF kernel.

In [None]:
svm = make_pipeline(StandardScaler(), SVC(kernel = 'rbf', gamma='auto', probability=True))
svm.fit(X_train, y_train)

svm.score(X_test, y_test)

# 0.85

In [66]:
svm = make_pipeline(StandardScaler(), SVC(kernel='poly', degree=3, gamma='scale'))
svm.fit(X_train, y_train)

svm.score(X_test, y_test)

#0.72

0.7565511411665258

In [67]:
svm = make_pipeline(StandardScaler(), SVC(kernel='sigmoid', gamma='auto'))
svm.fit(X_train, y_train)

svm.score(X_test, y_test)

# 0.71

0.7311918850380389

## Neural Networks

In [None]:
# TODO: add the neural network code here

Let's evaluate the RBF-kernel SVM, the only SVC above fitted with probability=True, using accuracy and one-vs-rest AUC.

In [95]:
from sklearn.metrics import accuracy_score, roc_auc_score

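# Assumes svm is the RBF pipeline fitted with probability=True;
# predict_proba is not available on the other SVC pipelines above.
acc_score = accuracy_score(y_test, svm.predict(X_test))
auc_score = roc_auc_score(y_test, svm.predict_proba(X_test), multi_class='ovr')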
print(f"Accuracy: {acc_score:0.4f}")
print(f"AUC: {auc_score:0.4f}")

Accuracy: 0.8555
AUC: 0.9676
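
The same two metrics can be computed for several classifiers in one loop. The following is a sketch under stated assumptions: the data is synthetic, the `models` dictionary and the `*_demo` / `Xd_*` names are hypothetical, and the hyperparameters are not the tuned values from the cells above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the feature vectors and class labels (assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=20, n_informative=10,
    n_classes=5, random_state=0,
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, train_size=0.70, random_state=0, stratify=y_demo
)

models = {
    'decision_tree': DecisionTreeClassifier(criterion='entropy', random_state=0),
    'random_forest': RandomForestClassifier(random_state=0),
    # probability=True is required for SVC.predict_proba.
    'svm_rbf': make_pipeline(
        StandardScaler(), SVC(kernel='rbf', probability=True, random_state=0)
    ),
}

results = {}
for name, model in models.items():
    model.fit(Xd_train, yd_train)
    acc = accuracy_score(yd_test, model.predict(Xd_test))
    # One-vs-rest AUC needs per-class probabilities, shape (n_samples, n_classes).
    auc = roc_auc_score(yd_test, model.predict_proba(Xd_test), multi_class='ovr')
    results[name] = (acc, auc)
    print(f"{name}: accuracy={acc:0.4f}, AUC={auc:0.4f}")
```

Evaluating every model on the same held-out partition keeps the comparison fair.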


## KMeans

In [101]:
from sklearn.cluster import KMeans 
from sklearn import metrics

kmeans = KMeans(n_clusters=5,
                n_init=10,
                init='random',
                tol=1e-4, 
                random_state=170,
                verbose=True)

# fit returns the fitted estimator, so the cluster labels come from predict.
kmeans.fit(X_train)
y_pred = kmeans.predict(X_test)

# KMeans.score returns the negative inertia on X; the y argument is ignored.
print(kmeans.score(X_test))

'''
# Compute the homogeneity and completeness of the clusters.
homogeneity = metrics.homogeneity_score(y_test, y_pred)
completeness = metrics.completeness_score(y_test, y_pred)

# Compute the silhouette coefficient for each sample.
s = metrics.silhouette_samples(X_test, y_pred)

# Compute the mean silhouette coefficient over all data points.
s_mean = metrics.silhouette_score(X_test, y_pred)
'''

Initialization complete
Iteration 0, inertia 2.4780358113286615e+36
Iteration 1, inertia 2.453453944709991e+36
Iteration 2, inertia 1.278771200133821e+36
Iteration 3, inertia 7.98588464964244e+35
Iteration 4, inertia 3.0059292173441837e+35
Iteration 5, inertia 2.9698281681824833e+35
Iteration 6, inertia 2.935909578057414e+35
Converged at iteration 6: strict convergence.
Initialization complete
Iteration 0, inertia 2.4780358113286524e+36
Iteration 1, inertia 2.3705006336285012e+36
Iteration 2, inertia 1.2787711532082803e+36
Iteration 3, inertia 7.985884649642438e+35
Iteration 4, inertia 3.0139392350703178e+35
Iteration 5, inertia 2.9698281681824833e+35
Iteration 6, inertia 2.935909578057674e+35
Converged at iteration 6: strict convergence.
Initialization complete
Iteration 0, inertia 2.4780358113286683e+36
Iteration 1, inertia 2.4394237262553433e+36
Iteration 2, inertia 1.2787711969985272e+36
Iteration 3, inertia 7.985884649642409e+35
Iteration 4, inertia 3.013564937590225e+35
Iteration

Exception ignored on calling ctypes callback function: <function _ThreadpoolInfo._find_modules_with_dl_iterate_phdr.<locals>.match_module_callback at 0x7f23758cae50>
Traceback (most recent call last):
  File "/home/manoly/.local/lib/python3.8/site-packages/threadpoolctl.py", line 400, in match_module_callback
    self._make_module_from_path(filepath)
  File "/home/manoly/.local/lib/python3.8/site-packages/threadpoolctl.py", line 515, in _make_module_from_path
    module = module_class(filepath, prefix, user_api, internal_api)
  File "/home/manoly/.local/lib/python3.8/site-packages/threadpoolctl.py", line 606, in __init__
    self.version = self.get_version()
  File "/home/manoly/.local/lib/python3.8/site-packages/threadpoolctl.py", line 646, in get_version
    config = get_config().split()
AttributeError: 'NoneType' object has no attribute 'split'


Iteration 7, inertia 8.381650837229101e+33
Iteration 8, inertia 4.9897918247481714e+33
Converged at iteration 8: strict convergence.
-1.1201767083825649e+35
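
Inertia values on the order of 1e35 suggest that at least one feature has a very large magnitude, which would dominate the Euclidean distances KMeans relies on. The sketch below clusters standardized data and computes homogeneity, completeness and the mean silhouette coefficient. It assumes that standardizing is appropriate for these features; the `make_blobs` data and the `*_demo` names are synthetic stand-ins, not the notebook's data.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 5 groups, mirroring n_clusters=5 (assumption).
X_demo, y_demo = make_blobs(n_samples=300, centers=5, n_features=4, random_state=170)

# Put all features on a comparable scale before clustering.
X_scaled = StandardScaler().fit_transform(X_demo)

kmeans_demo = KMeans(n_clusters=5, n_init=10, random_state=170)
# fit_predict returns the cluster index of each sample, unlike fit,
# which returns the fitted estimator.
labels = kmeans_demo.fit_predict(X_scaled)

# Homogeneity and completeness compare clusters against the true classes.
homogeneity = metrics.homogeneity_score(y_demo, labels)
completeness = metrics.completeness_score(y_demo, labels)

# The silhouette coefficient needs only the data and the cluster labels.
s_mean = metrics.silhouette_score(X_scaled, labels)

print(f"homogeneity={homogeneity:0.3f}, "
      f"completeness={completeness:0.3f}, silhouette={s_mean:0.3f}")
```

Unlike accuracy, these metrics do not require cluster indices to match class labels, which makes them suitable for an unsupervised model such as KMeans.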



## DBSCAN
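
A minimal sketch of how DBSCAN could be applied, assuming standardized features. The `eps` and `min_samples` values are illustrative and would need tuning on the real feature vectors, and the data is a synthetic stand-in from `make_blobs`.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature vectors (assumption).
X_demo, _ = make_blobs(
    n_samples=300, centers=3, n_features=2, cluster_std=0.5, random_state=0
)
X_scaled = StandardScaler().fit_transform(X_demo)

# eps is the neighborhood radius; min_samples is the number of points
# needed within that radius for a point to count as a core point.
dbscan = DBSCAN(eps=0.3, min_samples=5)
labels = dbscan.fit_predict(X_scaled)

# DBSCAN labels noise points as -1; the remaining labels are cluster indices.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters={n_clusters}, noise points={n_noise}")
```

DBSCAN does not take the number of clusters as input, so the cluster count found here depends entirely on `eps` and `min_samples`.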