**First International Summer School in Data Science for Mobility (DSM)**, 3-7 October 2022, Santorini, Greece

<img width="200" src="http://master-school.isti.cnr.it/wp-content/uploads/2022/05/logo-Summer-School-Mobility-2022.png"/>

Author: [Luca Pappalardo](https://twitter.com/lucpappalard)

# Mobility Measures

On a `TrajDataFrame` we can compute two types of measures:

- **individual measures**, describing features related to the mobility patterns of a single individual
- **collective measures**, describing the mobility patterns of an entire population of individuals

In [None]:
# let's import some useful libraries
%matplotlib inline
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde
import pandas as pd
from tqdm import tqdm
import folium
from folium.plugins import HeatMap
from stats_utils import *
import stats_utils
import warnings
warnings.filterwarnings('ignore')

In [None]:
# let's import skmob's data structures
import skmob
from skmob import TrajDataFrame, FlowDataFrame

## Loading the data
- We load the *check-ins* made by users of **Brightkite**
- Brightkite is a location-based social network (LBSN)
- The dataset is freely available at the SNAP website: https://snap.stanford.edu/data/loc-brightkite.html

In [None]:
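# load the dataset with pandas
# the file is read from a local copy; use the commented-out URL to download it from SNAP instead
#url = "https://snap.stanford.edu/data/loc-brightkite_totalCheckins.txt.gz"
url = "data/loc-brightkite_totalCheckins.txt.gz"
# the file has no header row: header=None keeps the first check-in; nrows loads only the first 500,000 rows
df = pd.read_csv(url, sep='\t', header=None, nrows=500000,
                 names=['user', 'check-in_time', 'latitude', 'longitude',
                        'location id'])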

# convert the pandas DataFrame into an skmob TrajDataFrame
tdf = skmob.TrajDataFrame(df, latitude='latitude', 
            longitude='longitude', datetime='check-in_time', user_id='user')
print(tdf.shape)
tdf.head()

In [None]:
print("number of users:\t", len(tdf.uid.unique()))
print("number of records:\t", len(tdf))

## Individual measures

- computed on the trajectories of a <u>single individual</u>
- quantify standard *mobility patterns*
- examples: 
    - radius of gyration
    - jump lengths
    - max distance
    - individual mobility network

### Radius of gyration $r_g$
characteristic distance traveled by an individual:

$$r_g = \sqrt{\frac{1}{N} \sum_{i=1}^N (\mathbf{r}_i - \mathbf{r}_{cm})^2}$$

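where $N$ is the number of points in the individual's trajectory, $\mathbf{r}_i$ is the position of the $i$-th point, and $\mathbf{r}_{cm}$ is the position vector of the *center of mass* of the set of locations visited by the individual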
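
`radius_of_gyration` computes $r_g$ for us. To make the formula concrete, here is a minimal pure-Python sketch, assuming that the center of mass is the plain mean of latitudes and longitudes and that distances are great-circle (haversine) distances on a sphere of radius 6371 km. It illustrates the formula and is not scikit-mobility's own code; the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lmb = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def radius_of_gyration_km(points):
    """Radius of gyration (km) of a non-empty list of (lat, lng) points."""
    n = len(points)
    # center of mass r_cm: plain mean of latitudes and longitudes (assumption)
    lat_cm = sum(lat for lat, _ in points) / n
    lng_cm = sum(lng for _, lng in points) / n
    # mean of the squared distances from r_cm, then the square root
    squared = [haversine_km(lat, lng, lat_cm, lng_cm) ** 2 for lat, lng in points]
    return math.sqrt(sum(squared) / n)

rg = radius_of_gyration_km([(0.0, 0.0), (0.0, 2.0)])
print(round(rg, 1))  # two points 2 degrees apart on the equator: about 111.2 km
```

Each of the two points lies one degree of longitude (about 111.2 km) from their center of mass, so $r_g$ equals that distance.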

In [None]:
from skmob.measures.individual import radius_of_gyration

In [None]:
rg_df = radius_of_gyration(tdf)

In [None]:
rg_df.head()

In [None]:
# let's plot the distribution of the radius of gyration
fig = plt.figure(figsize=(4, 4))
rg_list = list(rg_df[rg_df['radius_of_gyration'] > 1.0]['radius_of_gyration'])
x, y = zip(*lbpdf(1.5, rg_list))
plt.plot(x, y, marker='o')
plt.xlabel('$r_g$ [km]', fontsize=20)
plt.ylabel('P($r_g$)', fontsize=20)
plt.grid(alpha=0.2)
plt.loglog()
plt.show()

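The plots rely on `lbpdf` from the local `stats_utils` module, whose source is not part of this notebook. Judging only from how it is called (a bin growth factor of 1.5 and a list of values, returning (x, y) pairs drawn on log-log axes), it appears to compute a log-binned probability density. The function below is a hypothetical stand-in written under that assumption, not the module's actual code.

```python
import math

def log_binned_pdf(base, values):
    """Hypothetical stand-in for stats_utils.lbpdf (NOT its actual code).

    Splits the positive values into bins whose edges grow geometrically by `base`,
    starting at min(values), and returns (bin center, density) pairs, where
    density = (fraction of values in the bin) / (bin width).
    """
    values = [v for v in values if v > 0]  # log bins need positive values
    lo, hi = min(values), max(values)
    n_bins = int(math.floor(math.log(hi / lo, base))) + 1
    edges = [lo * base ** k for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in values:
        # bin index, clipped to guard against floating-point round-off at the top edge
        k = min(int(math.floor(math.log(v / lo, base))), n_bins - 1)
        counts[k] += 1
    points = []
    for k, count in enumerate(counts):
        if count == 0:
            continue  # empty bins cannot be drawn on a log-log plot
        width = edges[k + 1] - edges[k]
        center = math.sqrt(edges[k] * edges[k + 1])  # geometric midpoint of the bin
        points.append((center, count / (len(values) * width)))
    return points
```

Wider bins at larger values keep the sparse tail of heavy-tailed distributions readable. With this helper, `x, y = zip(*log_binned_pdf(1.5, rg_list))` would yield a plot of the same kind.
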
### Jump lengths
- a jump length is the distance between two consecutive visits of an individual
- given a `TrajDataFrame`, skmob computes the lengths for each individual independently
- use the `jump_lengths` function
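
The idea behind a jump length can be sketched in a few lines: sort one individual's records by time and measure the distance between each pair of consecutive points. This sketch assumes haversine distances in km and uses toy, hypothetical records; it is not scikit-mobility's implementation.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def jump_lengths_km(records):
    """records: (datetime, lat, lng) tuples of ONE individual, in any order."""
    ordered = sorted(records, key=lambda r: r[0])  # "consecutive" means consecutive in time
    return [haversine_km(a[1], a[2], b[1], b[2]) for a, b in zip(ordered, ordered[1:])]

# toy records (hypothetical data), deliberately not in time order
records = [
    (datetime(2022, 10, 3, 9, 0), 0.0, 0.0),
    (datetime(2022, 10, 3, 18, 0), 0.0, 2.0),
    (datetime(2022, 10, 3, 12, 0), 0.0, 1.0),
]
print([round(d, 1) for d in jump_lengths_km(records)])  # [111.2, 111.2]
```

An individual with $n$ records has $n-1$ jumps, so individuals with a single record contribute no jump lengths.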

In [None]:
from skmob.measures.individual import jump_lengths

In [None]:
jl_df = jump_lengths(tdf) # disable progress bar with show_progress=False
jl_df.head()

In [None]:
# merge=True puts the jump lengths of all individuals into a single list
jl_list = jump_lengths(tdf, merge=True)
type(jl_list)

In [None]:
# let's plot the distribution of jump lengths
fig = plt.figure(figsize=(4, 4))
d_list = [dist for dist in jl_list if dist >= 1]
x, y = zip(*lbpdf(1.5, d_list))
plt.plot(x, y, marker='o')
plt.xlabel('jump length [km]', fontsize=15)
plt.ylabel('P(jump length)', fontsize=15)
plt.grid(alpha=0.2)
plt.loglog()
plt.show()

### Distances

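- `maximum_distance`: the maximum distance (in km) traveled by each individual
- `max_distance_from_home` and `distance_straight_line` are other distance measures available in the same module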

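"Maximum distance" can be defined in more than one way, for example as the largest distance between any two points an individual visited, or as the longest single jump between consecutive points. The sketch below computes both on a toy, time-ordered trajectory (hypothetical data, haversine distances assumed) to show that they can differ; which definition `maximum_distance` uses is best checked in the scikit-mobility documentation.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def max_pairwise_distance_km(points):
    """Largest distance between ANY two of the (lat, lng) points."""
    return max((haversine_km(*p, *q) for p, q in combinations(points, 2)), default=0.0)

def max_jump_km(points):
    """Longest single jump between CONSECUTIVE (time-ordered) points."""
    return max((haversine_km(*p, *q) for p, q in zip(points, points[1:])), default=0.0)

# toy, time-ordered trajectory along the equator (hypothetical data)
trajectory = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0), (0.0, 2.0)]
print(round(max_pairwise_distance_km(trajectory), 1))  # 0 -> 3 degrees: 333.6
print(round(max_jump_km(trajectory), 1))               # 1 -> 3 degrees: 222.4
```
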

In [None]:
from skmob.measures.individual import max_distance_from_home, distance_straight_line, maximum_distance

In [None]:
md_df = maximum_distance(tdf)
md_df.head()

In [None]:
# let's plot the distribution
fig, ax1 = plt.subplots(1, 1)
ax1.hist(md_df.maximum_distance, bins=50, rwidth=0.8)
ax1.set_xlabel('maximum distance [km]', fontsize=15)
ax1.set_ylabel('number of individuals', fontsize=15)
plt.show()

### Individual mobility network
a network where: 
- nodes represent locations visited by the individual
- directed edges represent trips between the locations made by the individual 
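
The construction can be sketched by walking through the time-ordered sequence of locations and counting each transition from one location to the next as a trip along a directed edge. This sketch uses hypothetical location labels instead of coordinates and assumes that consecutive records at the same location are not trips; scikit-mobility's own rules may differ.

```python
from collections import Counter

def mobility_network(location_sequence):
    """Map each directed edge (origin, destination) to its number of trips."""
    edges = Counter()
    for origin, destination in zip(location_sequence, location_sequence[1:]):
        if origin != destination:  # assumption: staying in the same place is not a trip
            edges[(origin, destination)] += 1
    return edges

# toy, time-ordered visits of one individual (hypothetical data)
visits = ['home', 'work', 'home', 'work', 'gym', 'home', 'home']
network = mobility_network(visits)
nodes = set(visits)  # the locations visited
print(network.most_common(1))  # [(('home', 'work'), 2)]
```

The trip counts play the same role as the `n_trips` column used below to rank an individual's edges.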

In [None]:
from skmob.measures.individual import individual_mobility_network

In [None]:
imn_df = individual_mobility_network(tdf)
imn_df.head()

In [None]:
an_imn = imn_df[imn_df.uid == 2]
an_imn.sort_values(by='n_trips', ascending=False).head(5)

## Collective measures

- computed on the trajectories of a <u>population of individuals</u>
- quantify standard *mobility patterns*
- examples: 
    - visits per location
    - visits per time unit
    - origin-destination matrix

### Visits per location

the number of visits made to each location by the population of individuals
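
Conceptually, this is a group-by on the coordinates followed by a count. The pandas sketch below assumes that each row is one visit and that a location is identified by its exact (lat, lng) pair; the data and column names are hypothetical.

```python
import pandas as pd

# toy check-ins (hypothetical data): one row per visit
checkins = pd.DataFrame({
    'uid': [1, 2, 1, 3, 2, 3],
    'lat': [43.72, 43.72, 43.77, 43.72, 43.77, 43.80],
    'lng': [10.40, 10.40, 11.25, 10.40, 11.25, 11.00],
})

# count the visits to each distinct (lat, lng) pair, most visited first
visits = (checkins.groupby(['lat', 'lng']).size()
          .reset_index(name='n_visits')
          .sort_values('n_visits', ascending=False, ignore_index=True))
print(visits)
```

Every visit counts, so an individual returning to the same location several times increases its count each time.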

In [None]:
from skmob.measures.collective import visits_per_location

In [None]:
vpl_df = visits_per_location(tdf)
vpl_df.head()

In [None]:
fig = plt.figure(figsize=(4, 4))
x, y = zip(*lbpdf(1.5, list(vpl_df.n_visits)))
plt.plot(x, y, marker='o')
plt.xlabel('visits per location', fontsize=15)
plt.ylabel('P(visits per location)', fontsize=15)
plt.grid(alpha=0.2)
plt.loglog()
plt.show()

### Many more measures can be computed with scikit-mobility
#### See the documentation: https://scikit-mobility.github.io/scikit-mobility/reference/measures.html