# MAT-model: Model Classes for Multiple Aspect Trajectory Data Mining \[MAT-Tools Framework\]

Sample code, as a Python notebook, for using `mat-model` as a Python library.

This package provides model classes that support the task of modeling multiple aspect trajectories. It integrates into a single framework for mining multiple aspect trajectories and, more generally, multidimensional sequence data.

Created on Apr, 2024
Copyright (C) 2023, License GPL Version 3 or superior (see LICENSE file)

In [None]:
!pip install mat-model
#!pip install --upgrade mat-model

## 1. Loading Trajectory Sample Data

In [1]:
from matdata.dataset import *
ds = 'mat.FoursquareNYC'
df = load_ds(ds, sample_size=0.25)
df

Loading dataset file: https://github.com/mat-analysis/datasets/tree/main/mat/FoursquareNYC/




Stratification (class-balanced):   0%|          | 0/193 [00:00<?, ?it/s]

Sorting data:   0%|          | 0/193 [00:00<?, ?it/s]

|  | space | time | day | poi | type | root_type | rating | weather | tid | label |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40.8340978041072 -73.9452672225881 | 788 | Monday | Galaxy Gourmet Deli | Deli / Bodega | Food | 8.2 | Clouds | 127 | 6 |
| 1 | 40.5671960000000 -73.8825760000000 | 1175 | Monday | MTA Bus - Beach 169 St & Rockaway Point Bl (Q2... | Bus Stop | Travel & Transport | -1.0 | Clouds | 127 | 6 |
| 2 | 40.6899127194574 -73.9815044403076 | 1381 | Monday | MTA Subway - DeKalb Ave (B/Q/R) | Metro Station | Travel & Transport | -1.0 | Clouds | 127 | 6 |
| 3 | 40.7085883614824 -73.9910316467285 | 1404 | Monday | MTA Subway - Manhattan Bridge (B/D/N/Q) | Train | Travel & Transport | -1.0 | Clouds | 127 | 6 |
| 4 | 40.8331652006224 -73.9418603427692 | 845 | Tuesday | The Grinnell | Home (private) | Residence | -1.0 | Clear | 127 | 6 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17 | 40.7047332789043 -73.9877378940582 | 939 | Thursday | Miami Ad School Brooklyn | General College & University | College & University | -1.0 | Clear | 29559 | 1070 |
| 18 | 40.6978026652822 -73.9941451630314 | 483 | Friday | Eastern Athletic Club | Gym | Outdoors & Recreation | 6.9 | Clear | 29559 | 1070 |
| 19 | 40.6946728967503 -73.9940820360805 | 794 | Friday | Starbucks | Coffee Shop | Food | 7.0 | Clear | 29559 | 1070 |
| 20 | 40.7023694709909 -73.9875124790989 | 1261 | Friday | Superfine | American Restaurant | Food | 7.6 | Clear | 29559 | 1070 |
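
In the sample above, the `space` column stores latitude and longitude as a single space-separated string. A minimal sketch of splitting such a value into two floats, using only the standard library. `parse_space` is a hypothetical helper written for illustration; it is not part of `mat-model` or `matdata`.

```python
def parse_space(value):
    """Split a 'lat lon' string into a (lat, lon) tuple of floats.

    Hypothetical helper, not a mat-model function.
    """
    lat, lon = value.split()
    return float(lat), float(lon)


# Value taken from the first row of the sample data
lat, lon = parse_space("40.8340978041072 -73.9452672225881")
print(lat, lon)
```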


## 2. Trajectory Objects

First, convert the trajectory dataframe into Trajectory objects:

In [2]:
from matmodel.util.parsers import df2trajectory

T, data_desc = df2trajectory(df)

Converting Trajectories:   0%|          | 0/694 [00:00<?, ?it/s]

- Text display of the trajectory:

In [3]:
traj = T[1]
traj.display()

𝘛𐄁128 	𝘱1⟨(40.660 -73.830), 2024-01-01 17:22:00, Monday, MTA Subway - Howard Beach/JFK Airport (A), Metro Station, Travel & Transport, -1.0, Clear⟩↴
	𝘱2⟨(40.609 -73.819), 2024-01-01 19:39:00, Monday, MTA Bus - Q53, Beach, Outdoors & Recreation, -1.0, Clear⟩↴
	𝘱3⟨(40.734 -73.871), 2024-01-01 20:08:00, Monday, Queens Center Mall, Shopping Mall, Shop & Service, 7.5, Clear⟩↴
	𝘱4⟨(40.733 -73.871), 2024-01-01 20:10:00, Monday, MTA Bus - Q11/Q21/Q29/Q52LTD/Q53LTD/Q59/Q60 - Queens Blvd & 59th Av, Bus Line, Travel & Transport, -1.0, Clear⟩↴
	𝘱5⟨(40.763 -73.875), 2024-01-01 21:13:00, Monday, MTABus Q19, Q49 (Astoria Blvd/94th St), Bus Station, Travel & Transport, -1.0, Clear⟩↴
	𝘱6⟨(40.757 -73.992), 2024-01-01 02:17:00, Thursday, Port Authority Bus Terminal, Bus Station, Travel & Transport, 5.5, Clear⟩↴
	𝘱7⟨(40.756 -73.986), 2024-01-01 02:27:00, Thursday, Times Square, Plaza, Outdoors & Recreation, 9.0, Clear⟩↴
	𝘱8⟨(40.816 -73.958), 2024-01-01 02:51:00, Thursday, MTA Subway - 125th St (1), Metro 

- The dataset descriptor of attributes:

In [None]:
data_desc.attributes

In [None]:
traj.data_desc.attributes # data_desc is referenced in each trajectory internally

- The special attributes, TID and class label:

In [None]:
# Special descriptors for the trajectory:
print(data_desc.idDesc)
print(data_desc.labelDesc)

- Trajectory points:

In [None]:
traj.points[0]

- The aspect values (from one point):

In [None]:
# Values
traj.points[0].aspects[2], traj.points[0].aspects[3]

In [None]:
# The attribute name (aspect text) and value
data_desc.attributes[3].text, traj.points[0].aspects[3]

Testing the aspect type (instance):

In [None]:
a = traj.points[0].aspects[0]
b = traj.points[0].aspects[2]
from matmodel.base import Space2D

isinstance(a, Space2D), isinstance(b, Space2D), type(a), type(b)

In [None]:
a.value, type(a.value), b.value, type(b.value)

## 3. Distance Comparators

- If the dataset descriptor has aspect comparators instantiated, each attribute exposes its comparator:

In [None]:
a1 = data_desc.attributes[0]
a8 = data_desc.attributes[7]
print(a1.order, a1.text, a1.dtype, sep=' -- ')

print('Comparator 1:', a1.comparator)
print('Comparator 8:', a8.comparator)

In [None]:
for attr in data_desc.attributes:
    print(attr, attr.comparator)

Calculating distances:

In [None]:
# Spatial Distance:
a1.comparator.distance(traj.points[0].aspects[0], traj.points[1].aspects[0])

In [None]:
# Distance of p1 to p2, on attribute Weather (equals)
a8.comparator.distance(traj.points[0].aspects[7], traj.points[1].aspects[7])

In [None]:
# Distance of p1 to p6, also on attribute Weather (nonzero only if the values differ)
a8.comparator.distance(traj.points[0].aspects[7], traj.points[5].aspects[7])
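
To make the calls above more concrete, here is a minimal sketch of what a spatial and a nominal comparator might compute, assuming a Euclidean distance over coordinate pairs and a 0/1 equality test for categorical values. Both functions are hypothetical stand-ins; the formulas `mat-model` actually uses may differ.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two (x, y) pairs (assumed metric)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def equality_distance(a, b):
    """0 when the two values are equal, 1 otherwise (assumed convention)."""
    return 0 if a == b else 1


# Coordinates of p1 and p2 as shown in the trajectory display
p1 = (40.660, -73.830)
p2 = (40.609, -73.819)
print(euclidean_distance(p1, p2))
print(equality_distance("Clear", "Clear"), equality_distance("Clear", "Clouds"))
```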

Examples of working with distance values:

In [None]:
# For these distance values:
d1 = 2
d2 = 10

# enhance() increases the difference proportionally as the distance grows,
# up to the comparator's max_value (if set)
a1.comparator.enhance(d1), a1.comparator.enhance(d2)

In [None]:
# For these distance values, assuming a max_value of 100:
d1 = 25
d2 = 75

# To normalize distance values to the range 0 to 1, assign the largest possible distance to the comparator's max_value:
a1.comparator.max_value = 100
a1.comparator.normalize(d1), a1.comparator.normalize(d2)
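
One plausible reading of the normalization above is a division by `max_value`, clipped to 1. The sketch below works under that assumption; `normalize_distance` is a hypothetical function, and the library's own `normalize` may behave differently (for example, in how it treats values above `max_value`).

```python
def normalize_distance(distance, max_value):
    """Scale a non-negative distance into [0, 1], clipping at max_value.

    Hypothetical stand-in for a comparator's normalization step.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return min(max(distance, 0.0) / max_value, 1.0)


# Same values as in the cell above, with max_value = 100
print(normalize_distance(25, 100), normalize_distance(75, 100))
```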

In [None]:
help(a1.comparator.distance)

Replacing comparator instances:

In [None]:
# You can create other comparators, or swap them:
from matmodel.comparator import LcsDistance, EditlcsDistance

a1.comparator = LcsDistance()
print(traj.points[0].aspects[2], traj.points[2].aspects[2], a1.comparator.distance(traj.points[0].aspects[2], traj.points[2].aspects[2]))
print(traj.points[0].aspects[2], traj.points[5].aspects[2], a1.comparator.distance(traj.points[0].aspects[2], traj.points[5].aspects[2]))

a1.comparator = EditlcsDistance()
print(traj.points[0].aspects[2], traj.points[2].aspects[2], a1.comparator.distance(traj.points[0].aspects[2], traj.points[2].aspects[2]))
print(traj.points[0].aspects[2], traj.points[5].aspects[2], a1.comparator.distance(traj.points[0].aspects[2], traj.points[5].aspects[2]))
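
For intuition about an LCS-based comparator, the sketch below computes the longest common subsequence of two strings and turns it into a distance in [0, 1]. The normalization (one minus the LCS length over the longer string's length) is an assumption chosen for illustration; it is not confirmed to be what `LcsDistance` or `EditlcsDistance` compute.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Assumed normalization: 0.0 for identical strings, 1.0 for nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest


# Day names, as stored in aspect index 2 of the sample data
print(lcs_distance("Monday", "Monday"), lcs_distance("Monday", "Thursday"))
```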

---

- Datetime and interval comparators:

In [None]:
# 1. Instantiate some examples:
from matmodel.base import DateTime, Interval

v1 = DateTime('60')
v2 = DateTime('150')
v3 = DateTime('1430')

i1 = Interval('70', '120') # All three values fall outside
i2 = Interval('90', '180') # Inside: v2; outside: v1 and v3
i3 = Interval('1380', '1430') # v3 == interval end

v1, v2, v3, i1, i2, i3

In [None]:
type(v1.value), v1.get('m'), v2.get('m'), v3.get('m')

In [None]:
# 2. Comparators:
from matmodel.comparator import TimeDistance, DatetimeDistance, InintervalDistance

tD = TimeDistance() # Compares time of day only (in minutes, hours, etc.), forwards or backwards around the clock
dD = DatetimeDistance(units='m') # Always compares from the largest value to the smallest
iD = InintervalDistance(units='m') # Compares dates and date intervals

tD.max_value, iD.max_value, dD.max_value

In [None]:
print(tD.distance(v1, v2), tD.distance(v2, v1))
print(tD.distance(v1, v3), tD.distance(v3, v1)) # The smallest possible difference (forwards or backwards)

print(dD.distance(v1, v2), dD.distance(v2, v1))
print(dD.distance(v1, v3), dD.distance(v3, v1)) # From the largest to the smallest
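
The difference between the two behaviours can be sketched in plain Python, assuming the values are minutes since midnight. `time_of_day_distance` wraps around the 24-hour clock and takes the shorter direction, while `absolute_distance` simply subtracts the smaller value from the larger one. Both names are hypothetical and only illustrate the idea; they are not the library's implementation.

```python
MINUTES_PER_DAY = 24 * 60


def time_of_day_distance(a, b):
    """Shortest gap in minutes between two times of day, forwards or backwards."""
    diff = abs(a - b) % MINUTES_PER_DAY
    return min(diff, MINUTES_PER_DAY - diff)


def absolute_distance(a, b):
    """Plain difference in minutes, larger value minus the smaller one."""
    return abs(a - b)


# Same values as v1, v2, v3 above: 60, 150 and 1430 minutes
print(time_of_day_distance(60, 150), time_of_day_distance(150, 60))
print(time_of_day_distance(60, 1430), time_of_day_distance(1430, 60))
print(absolute_distance(60, 1430), absolute_distance(1430, 60))
```

Under these assumptions, 01:00 and 23:50 are 70 minutes apart around the clock but 1370 minutes apart as a plain difference.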

Instead of the distance, we can use the comparator's match function, which returns True or False:

In [None]:
print(iD.distance(v1, v2), iD.distance(v2, v1))
print(iD.match(v1, v2), iD.match(v2, v1)) # Check whether the values are equal

v4 = DateTime('60') # Equal to v1
print(iD.match(v1, v4), iD.match(v4, v1))

print(iD.match(v1, v2, 60*3), iD.match(v2, v1, 60*3)) # Check whether they match within a threshold of 3 h
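
A thresholded match can be thought of as a distance test. The sketch below assumes the distance between two timestamps is their absolute difference in minutes and that a match means this difference does not exceed the threshold. `within_threshold` is a hypothetical name, and the library's `match` may define the comparison differently.

```python
def within_threshold(a, b, threshold=0):
    """True when two values (in minutes) differ by at most `threshold` minutes."""
    return abs(a - b) <= threshold


# 60 and 150 minutes: not equal, but within a 3-hour (180-minute) threshold
print(within_threshold(60, 150), within_threshold(60, 60))
print(within_threshold(60, 150, 60 * 3))
```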

In [None]:
i1.start, i1.end

In [None]:
# With intervals, distance acts as a match: the result is 0 or 1
print(iD.distance(v1, i1), iD.distance(i1, v1))
print(iD.distance(v2, i1), iD.distance(i1, v2))
print(iD.distance(v3, i1), iD.distance(i1, v3))
print()
print(iD.distance(v1, i2), iD.distance(i2, v1))
print(iD.distance(v2, i2), iD.distance(i2, v2)) # v2 inside
print(iD.distance(v3, i2), iD.distance(i2, v3))
print()
print(iD.distance(v1, i3), iD.distance(i3, v1))
print(iD.distance(v2, i3), iD.distance(i3, v2))
print(iD.distance(v3, i3), iD.distance(i3, v3)) # v3 inside

In [None]:
# match returns a bool
print(iD.match(v1, i1), iD.match(i1, v1))
print(iD.match(v2, i1), iD.match(i1, v2))
print(iD.match(v3, i1), iD.match(i1, v3))
print()
print(iD.match(v1, i2), iD.match(i2, v1))
print(iD.match(v2, i2), iD.match(i2, v2))
print(iD.match(v3, i2), iD.match(i2, v3))
print()
print(iD.match(v1, i3), iD.match(i3, v1))
print(iD.match(v2, i3), iD.match(i3, v2))
print(iD.match(v3, i3), iD.match(i3, v3))
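
The interval checks above can be mirrored in a few lines of plain Python. This sketch assumes a closed interval (a value equal to the interval end counts as inside, as the comment on `v3` and `i3` indicates) and assumes that a distance of 0 means inside and 1 means outside. The function names are hypothetical.

```python
def in_interval(value, start, end):
    """True when value lies in the closed interval [start, end]."""
    return start <= value <= end


def interval_distance(value, start, end):
    """Assumed convention: 0 when the value is inside the interval, 1 otherwise."""
    return 0 if in_interval(value, start, end) else 1


values = {"v1": 60, "v2": 150, "v3": 1430}
intervals = {"i1": (70, 120), "i2": (90, 180), "i3": (1380, 1430)}

for i_name, (start, end) in intervals.items():
    for v_name, value in values.items():
        print(v_name, i_name, in_interval(value, start, end))
```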

---

In [None]:
# For manually setting a weight for each attribute (can be configured in the descriptor JSON)
for attr in data_desc.attributes:
    attr.weight = 1.0

In [None]:
for attr in data_desc.attributes:
    print(attr, '\t>> W:', attr.weight)
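
The notebook does not show how the weights are consumed. One common use, sketched here purely as an assumption, is a weighted average of per-attribute distances between two points. `weighted_distance` is a hypothetical function, not a `mat-model` API.

```python
def weighted_distance(distances, weights):
    """Weighted average of per-attribute distances (assumed aggregation)."""
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be nonzero")
    return sum(d * w for d, w in zip(distances, weights)) / total_weight


# Two attributes: one identical (distance 0.0), one different (distance 1.0)
print(weighted_distance([0.0, 1.0], [1.0, 1.0]))
print(weighted_distance([0.0, 1.0], [1.0, 3.0]))
```

With equal weights of 1.0, as set in the cell above, every attribute contributes equally.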

---
## 4. Other Features

In [None]:
# We can configure how the dataset attributes are going to be instantiated, each with different distance functions (comparators):
from matmodel.util.parsers import df2trajectory
T, data_desc = df2trajectory(df, data_desc='../datasets/mat/FoursquareNYC/FoursquareNYC.json')

In [None]:
for attr in data_desc.attributes:
    print(attr.comparator, '>>', attr)

In [None]:
# We can also configure dependency groups, giving any name to a subset of related attributes (in the JSON descriptor file):
data_desc.dependencies

In [None]:
for attr in data_desc.attributes:
    print(attr.dependency_group, '>>', attr)

In [None]:
# The FeatureDescriptor class can instantiate a descriptor from a JSON object (here, a Python dict):
from matmodel.descriptor import FeatureDescriptor

desc = {
    "order": 7,
    "type": "numeric",
    "text": "rating",
    "dependency": "poi",
    "weight": 0.5,
    "comparator": {
        "distance": "diffnotneg",
        "maxValue": 5.0,
        "param1": 'something x',
        "param2": 'something y',
    }
}

ft = FeatureDescriptor.instantiate(desc)
ft

In [None]:
ft, ft.dependency_group, ft.weight, ft.comparator, ft.comparator.param1, ft.comparator.param2
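
To illustrate the general pattern of building a descriptor from a JSON-like dict, here is a self-contained sketch using dataclasses. `FeatureSketch`, `ComparatorSketch`, and `instantiate_sketch` are hypothetical names, and the handling of extra comparator parameters is an assumption; the real `FeatureDescriptor.instantiate` may store them differently (the cell above reads them as attributes such as `ft.comparator.param1`).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComparatorSketch:
    distance: Optional[str] = None
    max_value: Optional[float] = None
    extra: dict = field(default_factory=dict)  # any additional parameters


@dataclass
class FeatureSketch:
    order: int
    dtype: str
    text: str
    dependency_group: Optional[str]
    weight: float
    comparator: ComparatorSketch


def instantiate_sketch(desc):
    """Build a FeatureSketch from a dict shaped like the descriptor above."""
    comp = dict(desc.get("comparator", {}))  # copy, so the input is not mutated
    comparator = ComparatorSketch(
        distance=comp.pop("distance", None),
        max_value=comp.pop("maxValue", None),
        extra=comp,  # whatever keys remain, e.g. param1 and param2
    )
    return FeatureSketch(
        order=desc["order"],
        dtype=desc["type"],
        text=desc["text"],
        dependency_group=desc.get("dependency"),
        weight=desc.get("weight", 1.0),
        comparator=comparator,
    )


ft_sketch = instantiate_sketch({
    "order": 7,
    "type": "numeric",
    "text": "rating",
    "dependency": "poi",
    "weight": 0.5,
    "comparator": {"distance": "diffnotneg", "maxValue": 5.0, "param1": "something x"},
})
print(ft_sketch)
```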

---

\# By Tarlis Portela (2023)