# trackintel Similarity Module 
Demonstration notebook. Run the following cells to get an overview of what the similarity module provides.

### import of framework and data

In [None]:
import trackintel as ti

pfs = ti.io.file.read_positionfixes_csv('testtplset.csv')

### preprocessing of the trajectories
When starting from raw tracking data, the following steps have to be performed:
- extract staypoints 
- extract triplegs

The test data are already preprocessed positionfixes. To calculate similarities between trajectories, you always need the trajectories as positionfixes. Each positionfix carries a tripleg_id that identifies the trajectory it belongs to. You can access a single trajectory like this:

In [None]:
pfs[pfs['tripleg_id']==22]

Let's store two trajectories from the pfs data frame.

In [None]:
ta = pfs[pfs['tripleg_id']==22]
tb = pfs[pfs['tripleg_id']==33]

A trajectory distance between these two can be calculated using the methods available in ti.similarity.measures. These are Dynamic Time Warping (DTW) and Edit Distance on Real Sequences (EDR). An algorithm called Start End Distance is also available, but it works a bit differently and is explained later.
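For intuition about EDR, the following is a minimal pure-Python sketch of the textbook formulation, not trackintel's implementation. The function name `edr_distance` and the tolerance parameter `eps` are assumptions made for this illustration. Two points count as a match when both coordinate differences are within `eps`; every mismatch, insertion or deletion costs 1.

```python
def edr_distance(traj_a, traj_b, eps):
    """Edit Distance on Real Sequences between two lists of (x, y) points.

    Hypothetical helper for illustration only; `eps` is the matching tolerance
    in the units of the coordinates.
    """
    n, m = len(traj_a), len(traj_b)
    # table[i][j] = EDR between the first i points of traj_a
    # and the first j points of traj_b
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = i  # delete all i points
    for j in range(m + 1):
        table[0][j] = j  # insert all j points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ax, ay = traj_a[i - 1]
            bx, by = traj_b[j - 1]
            # substitution is free when the points match within eps
            subcost = 0 if abs(ax - bx) <= eps and abs(ay - by) <= eps else 1
            table[i][j] = min(
                table[i - 1][j - 1] + subcost,  # match or substitute
                table[i - 1][j] + 1,            # delete from traj_a
                table[i][j - 1] + 1,            # insert from traj_b
            )
    return table[n][m]


a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 0.1), (1.0, 0.1), (5.0, 5.0)]
print(edr_distance(a, b, eps=0.5))
```

Because EDR counts edit operations, its result is a number of points rather than a length, so the choice of `eps` matters more than the unit of the coordinates.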

The DTW distance of two trajectories can be calculated like this:

In [None]:
ti.similarity.e_dtw(ta,tb)
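As a rough illustration of what a DTW distance with a Euclidean point distance computes, here is a minimal pure-Python sketch of the classic dynamic-programming formulation. It is not trackintel's implementation; the names `euclidean` and `dtw_distance` are hypothetical, and trajectories are assumed to be plain lists of (x, y) tuples.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw_distance(traj_a, traj_b):
    """Dynamic Time Warping distance between two non-empty point sequences."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning the first i points
    # of traj_a with the first j points of traj_b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # traj_a advances, traj_b waits
                cost[i][j - 1],      # traj_b advances, traj_a waits
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two parallel lines, one unit apart: each of the three point pairs
# contributes a distance of 1 along the optimal alignment.
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(dtw_distance(a, b))
```

The result is a sum of point-to-point distances, so it is expressed in whatever unit the coordinates use.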

How can this distance be interpreted? The DTW distance uses a Euclidean distance function, so the result depends on the coordinates of the positionfixes and therefore on their coordinate reference system (CRS). To see the CRS of the data you can call:

In [None]:
print(pfs.crs)

The crs is empty. To set the initial projection, you can assign the EPSG code of the coordinate system to the crs attribute of the pfs GeoDataFrame. In this case this is WGS84, with EPSG code 4326.

In [None]:
pfs.crs='EPSG:4326'

To reproject the data set you could call pfs.to_crs(epsg=1234), with 1234 replaced by the EPSG code of the target system. To avoid changing the positionfixes, it is recommended to reproject a copy or to reproject directly when calculating the similarity matrix of the data set. In this example the data are reprojected to CH1903+ (EPSG:2056). To calculate a distance matrix with the DTW method, the following code can be executed.

In [None]:
distmatrix = ti.similarity.similarity_matrix(pfs.to_crs(epsg=2056), 'dtw', dist=True)

The distance values are stored in 'distmatrix'. To access a value, normal Python matrix syntax can be used; the row and column indices correspond to the tripleg_ids.

In [None]:
distmatrix[22,33]

Unlike the value calculated above, which was based on WGS84 coordinates in degrees, this DTW distance is in meters, because the data were reprojected to a metric coordinate system.
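To see why the unit of the coordinates matters, the sketch below compares a Euclidean distance computed directly on longitude/latitude values with a great-circle distance in meters. The helper names, the approximate city coordinates and the spherical Earth radius are assumptions made for this illustration; none of them come from trackintel.

```python
import math


def euclidean_degrees(lon1, lat1, lon2, lat2):
    """Euclidean distance on raw lon/lat values; the result is in degrees."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters on a sphere of radius `radius_m`."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


# Approximate (lon, lat) coordinates, used only as example input
zurich = (8.5417, 47.3769)
bern = (7.4474, 46.9480)

deg = euclidean_degrees(*zurich, *bern)
meters = haversine_m(*zurich, *bern)
print(f"{deg:.3f} degrees vs. {meters / 1000:.1f} km")
```

A value of roughly one "degree" says little on its own, and a degree of longitude covers less ground than a degree of latitude at this latitude, which is why reprojecting to a metric system before computing distances gives interpretable results.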

The similarity_matrix method can also be called on a positionfixes object. If the dist parameter is not set to True, the trajectory distances are inverted into similarities. This is recommended for large data sets (or data sets with high tripleg ids), because the resulting matrix does not store zero values and is therefore more efficient.

In [None]:
simmatrix = pfs.to_crs(epsg=2056).as_positionfixes.similarity_matrix('dtw')

In [None]:
simmatrix[22,33]

Comparing the two matrices, the distance matrix stores a value at every position, whereas the similarity matrix stores only the relevant values:

In [None]:
print(distmatrix)

In [None]:
print(simmatrix)
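The storage difference can be illustrated without trackintel. The sketch below assumes a dictionary keyed by (row, column) as a stand-in for a sparse matrix; the actual data structure used by the library may differ, and the value 0.8 is made-up example data.

```python
def to_dense(pairs, size):
    """Expand a {(row, col): value} mapping into a full size x size matrix."""
    matrix = [[0.0] * size for _ in range(size)]
    for (i, j), value in pairs.items():
        matrix[i][j] = value
    return matrix


# Hypothetical similarity values for the two triplegs used in this notebook
pairs = {(22, 33): 0.8, (33, 22): 0.8}

# Indices correspond to tripleg ids, so the dense matrix must span
# every id up to the largest one.
size = max(max(i, j) for i, j in pairs) + 1
dense = to_dense(pairs, size)

dense_cells = size * size   # every position holds a value
sparse_cells = len(pairs)   # only the relevant entries are stored
print(dense_cells, sparse_cells)
```

With only two triplegs but ids up to 33, the dense layout already holds over a thousand cells, which shows why high tripleg ids make a sparse layout attractive.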