# MAT-Similarity: Methods to measure the similarity for Multiple Aspect Trajectory Data \[MAT-Tools Framework\]

Welcome to this tutorial on the mat-similarity package. In this tutorial, you will learn how to measure similarity between multiple aspect trajectories using mat-similarity as a Python library.

The package supports the user in measuring the similarity between multiple aspect trajectories. It integrates into a single framework for multiple aspect trajectories and, more generally, for multidimensional sequence data mining methods.

Created in May 2024
Copyright (C) 2024, License GPL Version 3 or later (see LICENSE file)


In [1]:
# Setup and Installation of mat-similarity package
!pip install mat-similarity

ERROR: Ignored the following versions that require a different python version: 0.1b0 Requires-Python <3.10
ERROR: Could not find a version that satisfies the requirement mat-similarity (from versions: none)
ERROR: No matching distribution found for mat-similarity

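The error output above indicates that the listed release (0.1b0) requires a Python version below 3.10. A small check such as the following can be run before installing. It is only a sketch: the bound is taken from the pip message above and may not hold for later releases, and the helper name `python_is_supported` is hypothetical.

```python
import sys

# Assumption: the upper bound comes from the pip message above
# ("0.1b0 Requires-Python <3.10"); newer releases may lift it.
MAX_EXCLUSIVE = (3, 10)


def python_is_supported(version=None, max_exclusive=MAX_EXCLUSIVE):
    """Return True if the (major, minor) version is below the exclusive bound."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) < max_exclusive


if not python_is_supported():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is not below 3.10; "
        "pip may not find a compatible mat-similarity release."
    )
```

If the check fails, creating an environment with an older interpreter before running the install cell is one way to proceed.
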
In [2]:
# Set up and install the other MAT framework packages that are needed.
!pip install mat-model mat-data



In [3]:
# Set up and install the basic libraries.
# If the required packages are not installed yet, run the following command:

!pip install pandas numpy prettytable matplotlib

# Import the necessary libraries
import pandas as pd
import numpy as np
from prettytable import PrettyTable
from typing import List, Dict, Union
import matplotlib.pyplot as plt



# 1. Loading data
Helpers for loading and pre-processing data are provided by the matdata package (dependency: mat-data).

## 1.1 Loading a sample data
a) Let's start by loading the FoursquareNYC data.
(For other preprocessing functions, check the docs: https://mat-analysis.github.io/mat-tools/)

In [4]:
from matdata.preprocess import *

from matdata.dataset import *
ds = 'mat.FoursquareNYC'
df = load_ds(ds, sample_size=0.25)
df

Loading dataset file: https://github.com/mat-analysis/datasets/tree/main/mat/FoursquareNYC/




Spliting Data (class-balanced):   0%|          | 0/193 [00:00<?, ?it/s]

Unnamed: 0,space,time,day,poi,type,root_type,rating,weather,tid,label
0,40.6604738351670 -73.8302910891864,1042,Monday,MTA Subway - Howard Beach/JFK Airport (A),Metro Station,Travel & Transport,-1.0,Clear,128,6
1,40.6086420833785 -73.8190376758575,1179,Monday,MTA Bus - Q53,Beach,Outdoors & Recreation,-1.0,Clear,128,6
2,40.7340555764763 -73.8708472251892,1208,Monday,Queens Center Mall,Shopping Mall,Shop & Service,7.5,Clear,128,6
3,40.7333724746837 -73.8711404741537,1210,Monday,MTA Bus - Q11/Q21/Q29/Q52LTD/Q53LTD/Q59/Q60 - ...,Bus Line,Travel & Transport,-1.0,Clear,128,6
4,40.7631337910326 -73.8752118314646,1273,Monday,"MTABus Q19, Q49 (Astoria Blvd/94th St)",Bus Station,Travel & Transport,-1.0,Clear,128,6
...,...,...,...,...,...,...,...,...,...,...
15267,40.7047332789043 -73.9877378940582,939,Thursday,Miami Ad School Brooklyn,General College & University,College & University,-1.0,Clear,29559,1070
15268,40.6978026652822 -73.9941451630314,483,Friday,Eastern Athletic Club,Gym,Outdoors & Recreation,6.9,Clear,29559,1070
15269,40.6946728967503 -73.9940820360805,794,Friday,Starbucks,Coffee Shop,Food,7.0,Clear,29559,1070
15270,40.7023694709909 -73.9875124790989,1261,Friday,Superfine,American Restaurant,Food,7.6,Clear,29559,1070


## 1.2 Trajectory Objects (Conversions)

You can convert the dataframe into Trajectory objects (together with a Dataset Descriptor object):

In [5]:
from matmodel.util.parsers import df2trajectory

T, dataset_descriptor = df2trajectory(df, data_desc='./FoursquareNYC.json')

Converting Trajectories:   0%|          | 0/694 [00:00<?, ?it/s]

Now you can pick specific trajectory objects to manipulate:

In [6]:
traj1 = T[1]
traj1.display()

traj2 = T[2]
traj2.display()

𝘛𐄁135 	𝘱1⟨(40.690 -73.982), 2024-01-01 02:25:00, Monday, NYCT Transit Survey Unit, Office, Professional & Other Places, -1.0, Clouds⟩↴
	𝘱2⟨(40.709 -73.991), 2024-01-01 03:21:00, Monday, MTA Subway - Manhattan Bridge (B/D/N/Q), Train, Travel & Transport, -1.0, Clouds⟩↴
	𝘱3⟨(40.828 -73.926), 2024-01-01 23:02:00, Monday, MTA Subway - 161st St/Yankee Stadium (4/B/D), Metro Station, Travel & Transport, -1.0, Clouds⟩↴
	𝘱4⟨(40.709 -73.991), 2024-01-01 01:40:00, Tuesday, MTA Subway - Manhattan Bridge (B/D/N/Q), Train, Travel & Transport, -1.0, Clouds⟩↴
	𝘱5⟨(40.690 -73.982), 2024-01-01 02:25:00, Tuesday, NYCT Transit Survey Unit, Office, Professional & Other Places, -1.0, Rain⟩↴
	𝘱6⟨(40.759 -73.988), 2024-01-01 04:07:00, Tuesday, MTA Bus - 8 Av & W 46 St (M20/M104), Bus Stop, Travel & Transport, -1.0, Rain⟩↴
	𝘱7⟨(40.653 -74.002), 2024-01-01 05:07:00, Wednesday, MTA Regional Bus Depot - Jackie Gleason, Bus Station, Travel & Transport, -1.0, Clouds⟩↴
	𝘱8⟨(40.638 -73.979), 2024-01-01 05:53:00, Wed

# 2. Trajectory similarity methods


## 2.1 Similarity methods for Multiple Aspect Trajectories

### 2.1.1 MUITAS:

MUITAS is a similarity measure for trajectory data with heterogeneous semantic dimensions that takes the semantic relationships between attributes into account.
(Paper: "Towards Semantic-Aware Multiple-Aspect Trajectory Similarity Measuring", published in Transactions in GIS, available at https://doi.org/10.1111/tgis.12542)

## Using the MUITAS Class for Measuring Similarity Between Multiple Aspect Trajectories

This part of the tutorial shows how to measure similarity between multiple aspect trajectories using the MUITAS (Multiple Aspect Trajectory Similarity) class.

Objectives:

- Understand the MUITAS class and its functionalities.
- Learn how to set up and initialize the MUITAS class.
- Measure similarity between trajectories using the MUITAS class.
- Visualize the trajectories and similarity scores.


Positional arguments:

- input: two trajectory objects for which the distance/similarity is computed
- output: the distance/similarity score
- config: parameter configuration

    

To compute the similarity between two trajectories P and Q, MUITAS(P, Q) requires an application to be configured. An application defines the context of the problem, that is, how trajectories will be analyzed. An application 𝔸 is defined by a tuple 𝔸 = (attributes, distance functions, thresholds, features, weights). The following steps show how to configure each element of the application.
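As a reading aid, the application tuple can be pictured as a small data structure. The `Application` class below is hypothetical and is not part of mat-similarity; it only mirrors the five elements named above and checks that they are consistent with each other.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Application:
    """Hypothetical container mirroring the application tuple (illustration only)."""
    attributes: List[str]
    distances: List[Callable]   # one distance function per attribute
    thresholds: List[float]     # one threshold per attribute
    features: List[Sequence[int]] = field(default_factory=list)  # attribute indices per feature
    weights: List[float] = field(default_factory=list)           # one weight per feature

    def validate(self):
        """Raise ValueError if the elements do not line up; return True otherwise."""
        n = len(self.attributes)
        if len(self.distances) != n or len(self.thresholds) != n:
            raise ValueError("need one distance function and one threshold per attribute")
        if len(self.features) != len(self.weights):
            raise ValueError("need one weight per feature")
        for feature in self.features:
            if any(i < 0 or i >= n for i in feature):
                raise ValueError(f"feature {list(feature)} references an unknown attribute")
        return True


# Toy application with two attributes, each used as its own feature.
app = Application(
    attributes=["time", "poi"],
    distances=[lambda a, b: abs(a - b), lambda a, b: 0.0 if a == b else 1.0],
    thresholds=[100, 0.0],
    features=[[0], [1]],
    weights=[0.5, 0.5],
)
app.validate()
```
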

The set of attributes is obtained automatically from the dataset descriptor when the dataset is read. Likewise, the distance function of each attribute follows the comparator defined for its type in the matmodel package, for example:

Attribute types and comparators:

| Type    | Comparator      |
|---------|-----------------|
| space2d | euclidean/-1.0  |
| time    | difference/-1.0 |
| nominal | equals/-1.0     |
| numeric | diffnotneg/-1.0 |
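To make the table more concrete, the sketch below gives plain-Python stand-ins for the comparator names. These functions are hypothetical and are not the matmodel implementations: the meaning of the `/-1.0` suffix is not modelled, and treating `diffnotneg` as a plain absolute difference is an assumption.

```python
import math


# Hypothetical stand-ins for the comparators named in the table; the
# matmodel implementations may differ (for example in normalisation).
def euclidean(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.dist(a, b)


def difference(a, b):
    """Absolute difference, for example between two times in minutes."""
    return abs(a - b)


def equals(a, b):
    """0.0 when two nominal values are identical, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


DISTANCE_BY_TYPE = {
    "space2d": euclidean,
    "time": difference,
    "nominal": equals,
    "numeric": difference,  # assumption: 'diffnotneg' treated as a plain difference
}

# Distance between two time values taken from the sample dataframe.
print(DISTANCE_BY_TYPE["time"](1042, 1179))
```
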




In [7]:
print(f"Attributes in Application: {dataset_descriptor.attributes}")

Attributes in Application: [1. space (space2d), 2. time (time), 3. day (nominal), 4. poi (nominal), 5. type (nominal), 6. root_type (nominal), 7. rating (numeric), 8. weather (nominal)]


#### a) First, create the similarity object from the dataset descriptor to be analysed

In [8]:
# Import everything related to the similarity measure method:
from matsimilarity.methods.mat.MUITAS.MUITAS_T2T import *

# Create the MUITAS object 
muitas = MUITAS(dataset_descriptor)

#### b) Second, define the other parameter configurations of MUITAS:

- Defining the features:
  - Choose which attributes will be analyzed, and whether each one is examined individually or jointly with other attributes.
  - In this implementation, each feature is defined by a set of attributes and a weight for that feature, both passed to the `add_feature` method.

In [9]:
# Add features to the MUITAS object
muitas.add_feature([dataset_descriptor.attributes[0]], 1)
muitas.add_feature([dataset_descriptor.attributes[1]], 1)
muitas.add_feature([dataset_descriptor.attributes[2]], 0.25)
muitas.add_feature([dataset_descriptor.attributes[3], dataset_descriptor.attributes[4], dataset_descriptor.attributes[5]], 0.25)
muitas.add_feature([dataset_descriptor.attributes[3], dataset_descriptor.attributes[6]], 0.25)
muitas.add_feature([dataset_descriptor.attributes[7]], 0.25)

Feature: Attributes: [0], Weight: 1
Feature: Attributes: [1], Weight: 1
Feature: Attributes: [2], Weight: 0.25
Feature: Attributes: [3, 4, 5], Weight: 0.25
Feature: Attributes: [3, 6], Weight: 0.25
Feature: Attributes: [7], Weight: 0.25


#### c) Setting Thresholds

You can define the threshold within which two attribute values are considered a match. Thresholds are given per attribute, following the order of the attributes in the dataset descriptor. If they are not defined, default values are used according to the attribute type:

Attribute types and default thresholds:

| Type              | Threshold |
|-------------------|-----------|
| space2d / space3d | 0.2       |
| time              | 100       |
| nominal           | 0.0       |
| numeric           | 0.1       |



In this implementation, we define thresholds for measuring similarity using the `MUITAS` class. Below is an example demonstrating how to set thresholds for each attribute and display them.


In [10]:
# Set thresholds
muitas.set_threshold(threshold_value=[0.2, 100, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0])
muitas.display_attributes_and_thresholds()


Attribute: space, Type: space2d, Threshold: 0.2
Attribute: time, Type: time, Threshold: 100
Attribute: day, Type: nominal, Threshold: 0.0
Attribute: poi, Type: nominal, Threshold: 0.0
Attribute: type, Type: nominal, Threshold: 0.0
Attribute: root_type, Type: nominal, Threshold: 0.0
Attribute: rating, Type: numeric, Threshold: 0.1
Attribute: weather, Type: nominal, Threshold: 0.0
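The list passed to `set_threshold` above coincides with the default values in the table. As a sketch, such a list can be derived from the attribute types in descriptor order. The helper names are hypothetical, and the `<=` comparison in `is_match` is an assumption about how a threshold is applied.

```python
# Default thresholds per attribute type, copied from the table above.
DEFAULT_THRESHOLDS = {
    "space2d": 0.2,
    "space3d": 0.2,
    "time": 100,
    "nominal": 0.0,
    "numeric": 0.1,
}

# Attribute types in descriptor order, as printed earlier in the tutorial.
attribute_types = [
    "space2d", "time", "nominal", "nominal",
    "nominal", "nominal", "numeric", "nominal",
]


def default_threshold_list(types, defaults=DEFAULT_THRESHOLDS):
    """Build one threshold per attribute from its type."""
    return [defaults[t] for t in types]


def is_match(distance, threshold):
    """Assumption: two values match when their distance does not exceed the threshold."""
    return distance <= threshold


thresholds = default_threshold_list(attribute_types)
print(thresholds)  # [0.2, 100, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0]
```

With a threshold of 0.0, nominal attributes match only when their values are identical, while a time threshold of 100 tolerates differences of up to 100 time units.
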


#### d) Measuring Similarity

Finally, it is possible to measure similarity between two trajectories:

In [11]:
# Measure similarity
similarity_score = muitas.similarity_of(traj1, traj2)
print(f"Similarity Score: {similarity_score}")

Similarity Score: 0.6342592592592593
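To give an intuition of what such a score expresses, the following is a simplified, pure-Python sketch in the spirit of the measure described in the cited paper: each point is compared with its best-matching point in the other trajectory, a feature counts only when all of its attributes match, and the result is averaged over both trajectories. It is not the mat-similarity implementation. The function names, the toy trajectories, the normalisation of the weights and the `<=` comparison are assumptions made for illustration, so it will not reproduce the score above.

```python
def feature_matches(p, q, feature, distances, thresholds):
    """A feature matches when every attribute in it is within its threshold."""
    return all(distances[i](p[i], q[i]) <= thresholds[i] for i in feature)


def point_score(p, q, features, weights, distances, thresholds):
    """Sum of the weights of matching features, normalised by the total weight."""
    matched = sum(
        w for f, w in zip(features, weights)
        if feature_matches(p, q, f, distances, thresholds)
    )
    return matched / sum(weights)


def parity(P, Q, **app):
    """For each point of P, take its best score against any point of Q."""
    return sum(max(point_score(p, q, **app) for q in Q) for p in P)


def muitas_sketch(P, Q, **app):
    """Symmetric average of the best point-to-point scores (illustrative only)."""
    if not P or not Q:
        return 0.0
    return (parity(P, Q, **app) + parity(Q, P, **app)) / (len(P) + len(Q))


# Toy trajectories: each point is (time in minutes, POI name).
P = [(600, "Home"), (720, "Cafe")]
Q = [(650, "Home"), (900, "Gym")]

app = dict(
    features=[[0], [1]],   # time and POI as separate features
    weights=[1, 1],
    distances=[lambda a, b: abs(a - b), lambda a, b: 0.0 if a == b else 1.0],
    thresholds=[100, 0.0],
)

print(muitas_sketch(P, Q, **app))  # 0.625
```

In this toy case the first points match on both features, the second point of P matches only on time, and the second point of Q matches nothing, which gives (1.5 + 1.0) / 4 = 0.625.
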


## Conclusion:

In this tutorial, you learned how to use the MUITAS class to measure the similarity between multiple aspect trajectories. We covered:

- Initializing the MUITAS class.
- Setting thresholds for different attributes.
- Defining features with different numbers of attributes and different weights.
- Measuring similarity between trajectories.

\# By Vanessa Lago Machado (2024)