# Unsupervised Learning

As we have seen, many models can be used for trajectory classification. In this notebook we use and test unsupervised learning models.

First, we load the feature vectors that describe each trajectory by its characteristics.

In [3]:
import feature_vec as fv

metadata = fv.get_selected_data()
feat_vectors, clss_mask, clss = fv.get_feat_vectors(metadata)

100.00%

## K-Means

We will use the KMeans model with 5 clusters, the idea being that each class ends up in its own cluster. We also transform the data first: we keep the 30 most important components with a PCA model and then standardize the result.

In [13]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Reduce the feature vectors to 30 principal components, then standardize
# each component to zero mean and unit variance.
feat_v = StandardScaler().fit_transform(PCA(n_components=30, random_state=0).fit_transform(feat_vectors))

# One cluster per expected class; 15 random restarts, keeping the best run.
kmeans = KMeans(n_clusters=5,
                n_init=15,
                init='random',
                tol=1e-6,
                random_state=0,
                verbose=True,
                algorithm='elkan')
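
The transformations described above can also be chained into a single scikit-learn pipeline. The following is only a sketch under stated assumptions: it runs on synthetic data from `make_blobs` rather than the real `feat_vectors`, the name `X_demo` and the component count are illustrative, and it standardizes before PCA (a common ordering), which is not the order used in the cell above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for feat_vectors (assumption: 500 samples, 40 features).
X_demo, _ = make_blobs(n_samples=500, n_features=40, centers=5, random_state=0)

# Scale -> PCA -> KMeans as one estimator, so the same preprocessing is
# always applied before clustering.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=10, random_state=0),
    KMeans(n_clusters=5, n_init=15, random_state=0),
)

# fit_predict runs every transformer and returns the KMeans labels.
demo_labels = pipeline.fit_predict(X_demo)
print(np.bincount(demo_labels))  # number of samples per cluster
```

Keeping the preprocessing inside the pipeline makes it harder to accidentally evaluate a model on data in a different feature space than the one it was fitted on.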


Now let's check how pure the clusters are.

In [43]:
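import pprint

def count_classes(model, n_classes=5):
    """Print, for each cluster, how many samples of each class it contains."""
    # Start with 5 clusters; extra clusters (possible with DBSCAN/OPTICS)
    # are added on demand instead of raising a KeyError.
    count_dict = {i: [0] * n_classes for i in range(5)}
    for label, true_class in zip(model.labels_, clss):
        if label == -1:  # noise points are not counted
            continue
        count_dict.setdefault(label, [0] * n_classes)[true_class] += 1
    pprint.pprint(count_dict)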
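
The same per-cluster class counts can be obtained from scikit-learn's `contingency_matrix`, which also makes it easy to summarize purity as a single number. This is a minimal sketch on toy labels; the helper name `purity_table` and the toy arrays are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.metrics.cluster import contingency_matrix

def purity_table(true_classes, cluster_labels):
    """Return (clusters x classes count table, overall purity).

    Points labeled -1 (noise in DBSCAN/OPTICS) are ignored.
    """
    true_classes = np.asarray(true_classes)
    cluster_labels = np.asarray(cluster_labels)
    keep = cluster_labels != -1
    # contingency_matrix has classes as rows; transpose so rows are clusters.
    table = contingency_matrix(true_classes[keep], cluster_labels[keep]).T
    # Purity: fraction of points that belong to their cluster's majority class.
    purity = table.max(axis=1).sum() / table.sum()
    return table, purity

# Toy example (hypothetical labels): 3 classes, 2 clusters, one noise point.
toy_classes = [0, 0, 1, 1, 2, 2]
toy_clusters = [0, 0, 0, 1, 1, -1]
table, purity = purity_table(toy_classes, toy_clusters)
print(table)
print(purity)
```

A purity close to 1 means almost every cluster is dominated by one class; values near the share of the largest class indicate that the clustering carries little class information.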

In [44]:
y_pred = kmeans.fit_predict(feat_v)
count_classes(kmeans)

Initialization complete
Iteration 0, inertia 126205.7722822872
Iteration 1, inertia 110903.28182396787
Iteration 2, inertia 110305.38403939131
Iteration 3, inertia 109916.39790080061
Iteration 4, inertia 109561.94322885174
Iteration 5, inertia 109209.03269696262
Iteration 6, inertia 108756.51001883682
Iteration 7, inertia 108500.5487137201
Iteration 8, inertia 108425.01470709372
Iteration 9, inertia 108350.17246006029
Iteration 10, inertia 108276.82062786326
Iteration 11, inertia 108201.4205872605
Iteration 12, inertia 108124.5073752792
Iteration 13, inertia 108063.58137652167
Iteration 14, inertia 108013.56371003226
Iteration 15, inertia 107981.25148018509
Iteration 16, inertia 107955.48395906355
Iteration 17, inertia 107938.9383928439
Iteration 18, inertia 107933.38195554455
Iteration 19, inertia 107926.28958117867
Iteration 20, inertia 107915.77559234077
Iteration 21, inertia 107907.63214180157
Iteration 22, inertia 107902.07879837074
Iteration 23, inertia 107893.19745094104
[output truncated]

The result clearly does not look good: the clusters are far from pure.

Now let's compute the homogeneity and completeness of the clusters, and the Silhouette Coefficient.

In [39]:
from sklearn import metrics

# Homogeneity: does each cluster contain members of a single class only?
# Completeness: are all members of a class assigned to the same cluster?
homogeneity = metrics.homogeneity_score(clss, y_pred)
completeness = metrics.completeness_score(clss, y_pred)

# Silhouette coefficient of each sample.
# Note: this is computed on the original feat_vectors, not on the
# transformed feat_v that KMeans was fitted on.
silh = metrics.silhouette_samples(feat_vectors, y_pred)

# Mean Silhouette coefficient over all samples.
silh_mean = metrics.silhouette_score(feat_vectors, y_pred)

print(homogeneity,
      completeness,
      silh, silh_mean)


0.08166501438023839 0.0892204077790774 [-0.68982217 -0.3659875  -0.89824299 ...  0.36445689 -0.3659875
 -0.3659875 ] -0.42709379881958637
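
To help read these numbers, here is a small sketch on toy data (all names and values below are illustrative assumptions, not taken from the notebook). Homogeneity and completeness both lie in [0, 1], and the Silhouette Coefficient lies in [-1, 1]; a negative mean suggests that samples are, on average, closer to another cluster than to their own in the space where the score is computed.

```python
from sklearn import metrics
from sklearn.datasets import make_blobs

# Two classes split into four singleton clusters: every cluster is pure
# (homogeneity 1.0) but each class is spread over two clusters
# (completeness 0.5).
toy_true = [0, 0, 1, 1]
toy_pred = [0, 1, 2, 3]
toy_h = metrics.homogeneity_score(toy_true, toy_pred)
toy_c = metrics.completeness_score(toy_true, toy_pred)
print(toy_h, toy_c)

# Silhouette on well-separated synthetic blobs, evaluated in the same
# feature space as the labels; compact, distant clusters score close to 1.
X_toy, y_toy = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)
toy_silh = metrics.silhouette_score(X_toy, y_toy)
print(toy_silh)
```

When comparing models, it is usually safer to compute the silhouette on the same transformed data that the clustering model was fitted on, since the score depends on distances in that space.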


## DBSCAN

We now apply the DBSCAN model as another alternative.

In [40]:
from sklearn.cluster import DBSCAN

We use `StandardScaler` to standardize the features to zero mean and unit variance. Then we apply the algorithm to the data.

In [49]:
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(PCA(n_components=8).fit_transform(feat_vectors))

dbscan = DBSCAN(eps=0.1, min_samples=14).fit(X)

count_classes(dbscan)

{0: [5, 149, 92, 258, 4],
 1: [0, 0, 21, 0, 0],
 2: [0, 1, 12, 1, 0],
 3: [0, 17, 35, 13, 0],
 4: [0, 0, 0, 0, 0]}
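
DBSCAN results depend strongly on `eps`, and `count_classes` skips the points labeled `-1` (noise), so it helps to check how many points were discarded. A common heuristic for choosing `eps` is to inspect the sorted distance of every point to its `min_samples`-th nearest neighbor. The sketch below illustrates this on synthetic data; the blob centers, the 90th-percentile rule, and the variable names are assumptions for illustration only, not the notebook's method.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized PCA features X.
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=0.6,
    random_state=0,
)
X_demo = StandardScaler().fit_transform(X_demo)

min_samples = 14

# Distance from each point to its min_samples-th neighbor. The query point
# itself is returned as the first neighbor, matching how DBSCAN counts it.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_demo)
distances, _ = nn.kneighbors(X_demo)
k_dist = np.sort(distances[:, -1])

# Illustrative rule of thumb: take a high percentile of the k-distances,
# so that roughly 90% of the points qualify as core points.
eps_guess = float(np.percentile(k_dist, 90))

labels = DBSCAN(eps=eps_guess, min_samples=min_samples).fit_predict(X_demo)
noise_fraction = float(np.mean(labels == -1))
print(eps_guess, noise_fraction)
```

Plotting `k_dist` and looking for the "elbow" is the more usual way to pick `eps`; the percentile is just a compact substitute for this sketch.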


## OPTICS

In [50]:
from sklearn.cluster import OPTICS

optics = OPTICS(min_samples=14).fit(X)

count_classes(optics)

{0: [0, 6, 29, 5, 0],
 1: [0, 2, 0, 26, 1],
 2: [0, 0, 24, 0, 0],
 3: [0, 0, 0, 3, 15],
 4: [0, 0, 0, 0, 0]}
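
As with DBSCAN, OPTICS labels points it cannot assign to any cluster as `-1`, and those points do not appear in the counts above. The following sketch, run on synthetic data with illustrative names, shows how one might report the number of clusters, the noise fraction, and the reachability values in cluster order, which are what a reachability plot would display.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Synthetic stand-in for the standardized PCA features X.
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [8, 8], [-8, 8]],
    cluster_std=0.6,
    random_state=0,
)

optics_demo = OPTICS(min_samples=14).fit(X_demo)
labels = optics_demo.labels_

# Number of clusters found, excluding the noise label -1.
n_clusters = int(np.unique(labels[labels != -1]).size)

# Fraction of samples left unassigned (noise).
noise_fraction = float(np.mean(labels == -1))

# Reachability distances in the order OPTICS visited the samples;
# valleys in this sequence correspond to clusters.
reachability = optics_demo.reachability_[optics_demo.ordering_]

print(n_clusters, noise_fraction)
```

Reporting the noise fraction next to the class counts gives a fairer picture, since a few very pure clusters can hide the fact that most samples were never clustered.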
