# Statistics

For each group of participants and for all trajectories we study the following statistics:

    1. Total amount of time spent (T)
    2. Total distance travelled (D) and Euclidean distances (d) between geo-locations.
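    3. Instantaneous velocity (v):
\begin{align}
v(i)=\frac{d_{i}}{\Delta t_{i}}
\end{align}

where $d_{i}$ is the distance between geo-location "i" and the next one, and $\Delta t_{i}=t_{i+1}-t_{i}$ is the time elapsed between them.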

    4. Radius of gyration (Rg):
\begin{align}
R_{g}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left( r_{i}-r_{cm} \right)^{2}}
\end{align}  

where $r_{i}$ are the coordinates of the $N$ individual points and $r_{cm}=\frac{1}{N}\sum_{i=1}^{N} r_{i}$ is the position of the center of mass of the set of points. In human mobility, it characterizes the typical distance of an individual from the center of mass of their trajectory.

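    5. Tortuosity:
\begin{align}
T=1- \langle \cos(\theta_{i}) \rangle
\end{align}    

where $\theta_{i}$ is the angle between the displacement from position "i-1" to position "i" and the straight line from position "i-1" to the last position "N". It measures how inefficient the path is with respect to the final destination, that is, how winding and indirect it is: the value is 0 for a path that heads straight to the destination and grows as the steps deviate from that direction.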
    
    6. Stops duration 
    
We consider the position "i" as stopped if $\Delta t_{i} = t_{i+1}-t_{i} \geq 10$ s. The value of $\Delta t_{i}$ is then the duration of the stop. If we detect two or more consecutive stops, we merge them into a single stop whose duration is the sum of their durations. 
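
The merging rule can be sketched as follows. This is a minimal illustration rather than the code used later in the notebook: the function name `merge_stops`, its `threshold` parameter and the sample timestamps are assumptions introduced here.

```python
from itertools import groupby


def merge_stops(timestamps, threshold=10.0):
    """Return the durations (s) of the merged stops found in a list of timestamps (s).

    A gap dt_i = t[i+1] - t[i] >= threshold marks position i as stopped;
    runs of consecutive stopped positions are merged into a single stop.
    """
    gaps = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    durations = []
    # groupby yields maximal runs of gaps that share the same stop/flight label
    for is_stop, run in groupby(gaps, key=lambda dt: dt >= threshold):
        if is_stop:
            durations.append(sum(run))
    return durations


# Hypothetical timestamps in seconds; the gaps are 2, 12, 15, 3 and 30
example = [0, 2, 14, 29, 32, 62]
print(merge_stops(example))  # the gaps 12 and 15 are merged into one stop
```

Grouping the gaps into runs avoids manual index bookkeeping, so a stop that reaches the end of the trajectory needs no special case.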


#### Import libraries

In [1]:
from gpxcsv import gpxtolist
import pandas as pd
import networkx as nx
import osmnx as ox


import matplotlib.pyplot as plt
import numpy as np
import glob
import os
from math import sin, cos, sqrt, atan2, radians
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from folium import Map
from folium.plugins import HeatMap, HeatMapWithTime
import utm
from matplotlib.animation import FuncAnimation
import matplotlib.animation as animation
from scipy.stats import gaussian_kde
from shapely.geometry import Point, LineString, Polygon
import folium

#%matplotlib notebook
#%matplotlib inline
ox.config(use_cache=True, log_console=True)
ox.__version__





'1.0.1'

#### Functions

In [2]:
def GPScoordinates_to_utm(lat,lon):
    """ Function that projects the GPS coordinates in degrees (latitude, longitude) into the UTM coordinate system
    in order to work with the concept of "point" and "Euclidean distance" in a plane.
    (https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)
    
    Note: We are dealing with locations in the same area/region. Otherwise, we should be careful if two locations 
    belong to different UTM zones when calculating distances, etc. 
    
    Input:
        - lists of GPS coordinates: latitude and longitude
        
    Output:
        - lists of the UTM projections of the GPS coordinates.
        - The utm package returns Easting, Northing, Zone_number and Zone_letter, so we only store the first two elements.
          Despite their names, lat_utm holds the Easting (x) and lon_utm holds the Northing (y).
    """
    
    lat_utm=[]
    lon_utm=[]
    for i in range(len(lat)):
        u=utm.from_latlon(lat[i],lon[i])  # get the UTM projection
        lat_utm.append(u[0])   # Easting (x coordinate in metres)
        lon_utm.append(u[1])   # Northing (y coordinate in metres)
        
    return lat_utm, lon_utm
        
    

def EuclidianDistance(x1,y1,x2,y2):
    """ Function that returns the distance in metres between 2 points in a plane (NOT GPS locations). 
    
    Note: GPS coordinates (lat,lon) cannot be used directly, only their projections onto a plane (e.g. UTM projection).

    
    Input:
        - The coordinates of two points in a plane: (x1,y1) and (x2,y2).
        
    Output:
        - Euclidian distance in metres between the two points.
    """
    Euclidian_distance = ( (x2-x1)**2 + (y2-y1)**2 ) ** 0.5
        
    return Euclidian_distance
        
    
        
def getDistanceFromLatLonInM(lat1,lon1,lat2,lon2):
    """ Function that returns the distance in metres between 2 GPS locations in degrees (latitude and longitude).
    It is based on the Haversine formula (https://en.wikipedia.org/wiki/Haversine_formula) which takes into account the
    Earth's curvature. 
    
    Input:
        - 2 GPS coordinates: (latitude1,longitude1) of the first point and (latitude2,longitude2) of the second point. 
        
    Output:
        - Distance in metres between the two GPS locations.
    """
    
    R = 6371 # Radius of the earth in km
    dLat = radians(lat2-lat1)  # Difference between latitudes in radians
    dLon = radians(lon2-lon1) # Difference between longitudes in radians
    rLat1 = radians(lat1)   # Latitudes in radians
    rLat2 = radians(lat2)
    a = sin(dLat/2) * sin(dLat/2) + cos(rLat1) * cos(rLat2) * sin(dLon/2) * sin(dLon/2) 
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    d = R * c # Distance in km
    e= d*1000 # Distance in m
   
    return e
   
    
    
def instantaneous_velocity(distance,time):
    """ Function that computes the instantaneous velocity between two points, given their distance and their time difference.
    
    Input:
        - Distance between two points or two locations
        - Time difference between the two points/locations
        
    Output:
        - Instantaneous velocity between the two points
    """
    
    v=distance/time
    
    return v


def convert(seconds):
    """ Function that converts the seconds to hours, minutes and seconds
    
    Input:
        - Number of seconds
        
    Output:
        - Hours, minutes and seconds
    """
    
    seconds = seconds % (24 * 3600)   # durations of one day or longer wrap around
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    
    return hour, minutes, seconds


def map_network(latitude, longitude, dist):
    """ Function that creates a map of the urban network within a given radius around a GPS location.
    
    Input:
        - latitude and longitude (GPS coordinates)
        - dist: distance (radius) from which the network is constructed
        
    Output:
        - The figure and the axes.
        
    """
    
    G = ox.graph_from_point((latitude, longitude), dist, network_type='all')   # Create the graph from lat and lon
    fig, ax = ox.plot_graph(G, show=False, close=False, bgcolor="#333333",edge_color="w", edge_linewidth=0.8, node_size=0)
    
    #for _, edge in ox.graph_to_gdfs(G, nodes=False).fillna('').iterrows():     # Name of the street
        #c = edge['geometry'].centroid
        #text = edge['name']
        #ax.annotate(text, (c.x, c.y), c='w')       
 
    return fig, ax


def map_network2(lat1,lon1,lat2,lon2,lat3,lon3,lat4,lon4):
    """ Function that creates a map of the urban network from a polygon (given the 4 bounds).
    
    Input:
        - bounds: 4 latitude and longitude points (GPS coordinates)
        
    Output:
        - The figure and the axes.
        
    """
    
    # shapely vertices are (x, y), so each GPS point is passed as (lon, lat)
    P = Polygon([(lon1,lat1), (lon2,lat2),(lon3,lat3),(lon4,lat4)])  # Polygon from the lat and lon bounds
    G = ox.graph_from_polygon(P, network_type='all')   # Create the graph inside the polygon
    fig, ax = ox.plot_graph(G, show=False, close=False, bgcolor="#333333",edge_color="w", edge_linewidth=0.8, node_size=0)
    place_name = "Granollers, Vallès Oriental"
    tags={"building": True}
    gdf = ox.geometries_from_place(place_name, tags)   # Retrieve the buildings of the area
    gdf.plot(ax=ax,color='silver',alpha=0.5)           # Draw the buildings on top of the street network
    
    #for _, edge in ox.graph_to_gdfs(G, nodes=False).fillna('').iterrows():     # Name of the street
        #c = edge['geometry'].centroid
        #text = edge['name']
        #ax.annotate(text, (c.x, c.y), c='w')       
 
    return fig, ax


def RadiusOfGyration(x,y):
    """ Function that computes the radius of gyration of a 2-d trajectory with x and y coordinates.
    
    M. C. González, C. A. Hidalgo, and A.-L. Barabási, Nature 453, 779 (2008).
    
    Note: x,y coordinates can't be lat,lon in degrees. They must first be projected onto a plane (e.g. UTM projection).
    
    
    Input:
        - Lists of coordinates x and y
    
    Output:
        - Radius of gyration
    """
    
    r_cm_x=sum(x)/len(x)
    r_cm_y=sum(y)/len(y)
    
    radius2=[]
    for i in range(len(x)):
        r_new_x=x[i]-r_cm_x
        r_new_y=y[i]-r_cm_y
        radius2.append((r_new_x*r_new_x) + (r_new_y*r_new_y))
    
    mean_radius2=sum(radius2)/len(radius2)
    
    rg=mean_radius2**0.5
    
    return rg
            

def vector(latitude0,longitude0,latitude1,longitude1): 
    """ Given two points, returns the vector from the origin point to the destination. If GPS coordinates, use UTM projection.
    
    Input:
        - Coordinates of origin point (p0x,p0y) and destination point (p1x, p1y).
    
    Output:
        - Coordinates of the vector from p0 to p1. 
    """
    
    p0=(latitude0,longitude0)   
    p1=(latitude1,longitude1)
    vec=(p1[0]-p0[0], p1[1]-p0[1]) 
    return vec


def determinant(vec0,vec1):
    """  Returns the determinant of two vectors. If det<0, the second vector has turned in the clockwise direction.
    
    Input:
        - Two consecutive vectors (vec0, vec1) characterized by their x,y coordinates.
        
    Output:
        - Determinant of two vectors. If det<0 the second vector has turned in the clockwise direction with respect to the first.
    """
    det=vec0[0]*vec1[1]-vec0[1]*vec1[0]
    return det


def reorientation(vec0,vec1):
    """Returns the angle between two consecutive vectors (change in orientation, reorientation, turning angle).
    The range is from -pi to +pi. If det<0 the second vector has turned in the clockwise direction and therefore the 
    reorientation angle is also <0. On the contrary, if it has turned counter-clockwise the angle is >0. 
    
    Input: 
        - Two consecutive vectors (vec0, vec1) characterized by their x,y coordinates.
    
    Output:
        - Reorientation angle between them 
    """
    unit_vec0=vec0/np.linalg.norm(vec0)
    unit_vec1=vec1/np.linalg.norm(vec1)
    dot_product=np.clip(np.dot(unit_vec0,unit_vec1),-1.0,1.0)  # guard against rounding slightly outside [-1,1]
    a=np.arccos(dot_product)
    det=determinant(vec0,vec1)
    if det<0:
        return -a
    else:
        return a

    
        
def turtuosity(latitudes,longitudes):
    """ Given the two lists of latitudes and longitudes (UTM projection), it returns the tortuosity of the trajectory. 
    If the tortuosity is near 0, the trajectory heads very straight to the final destination. If it is near 1 (or larger),
    the movement is very tortuous (not direct).
    
    Input:
        - Lists of x and y points (latitudes and longitudes projected) of a trajectory.
        
    Output:
        - Estimation of the tortuosity of the trajectory.
    """

    vectors=[]
    vectors_straight=[]
    for i in range(1,len(latitudes)):
        vectors.append(vector(latitudes[i-1],longitudes[i-1],latitudes[i],longitudes[i]))  # Vector between the point i-1 and i.
        vectors_straight.append(vector(latitudes[i-1],longitudes[i-1],latitudes[-1],longitudes[-1]))  # Straight vector between
                                                                                         # point i-1 and final point (destination)
        
    reorientations=[]   
    for i in range(len(vectors)-1):   # The last step is left out (it coincides with its own straight vector)
        reorientations.append(np.cos(abs(reorientation(vectors_straight[i],vectors[i])))) # Cosine of the angle between the
                                                                                          # vector and the straight vector
        #reorientations2 = [x for x in reorientations if np.isnan(x) == False]  # avoid nan values
            
    turtuo=1.-(sum(reorientations)/len(reorientations))  # The average value of the movement re-orientations gives the efficiency
                                                         # of the trajectory (how straightforward/directed it is towards the
                                                         # final destination). Then 1 - efficiency is the tortuosity
            
    return turtuo
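
As a quick sanity check of the two definitions, the following self-contained NumPy sketch recomputes them on toy trajectories. It is an independent illustration, not the implementation used in this notebook: the names `radius_of_gyration` and `tortuosity`, the choice to skip zero-length steps, and the toy coordinates are assumptions made here.

```python
import numpy as np


def radius_of_gyration(x, y):
    """Root-mean-square distance of the points from their centre of mass."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - x.mean()) ** 2 + (y - y.mean()) ** 2)))


def tortuosity(x, y):
    """1 - <cos(theta_i)>, with theta_i the angle between a step and the
    straight line from the start of that step to the final point."""
    pts = np.column_stack([x, y]).astype(float)
    steps = pts[1:] - pts[:-1]        # vectors from point i-1 to point i
    straight = pts[-1] - pts[:-1]     # vectors from point i-1 to the last point
    # The last step equals its own straight vector; it is left out here,
    # mirroring the loop range of turtuosity().
    steps, straight = steps[:-1], straight[:-1]
    norms = np.linalg.norm(steps, axis=1) * np.linalg.norm(straight, axis=1)
    valid = norms > 0                 # assumption: ignore zero-length steps
    cosines = np.sum(steps * straight, axis=1)[valid] / norms[valid]
    return float(1.0 - cosines.mean())


# Corners of a square of side 2 centred at the origin: every point lies at
# distance sqrt(2) from the centre of mass, so Rg should equal sqrt(2).
print(radius_of_gyration([1, -1, -1, 1], [1, 1, -1, -1]))

# A straight path towards the destination should have zero tortuosity.
print(tortuosity([0, 1, 2, 3], [0, 0, 0, 0]))

# An L-shaped path: the first step is at 45 degrees to the destination.
print(tortuosity([0, 1, 1], [0, 0, 1]))
```

Toy cases with a known answer, such as the square and the straight line, make it easy to spot unit or projection mistakes before running the functions on real GPS tracks.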




  

##  1. Statistics for each group

In [3]:
all_files = glob.glob(os.path.join("*.csv")) # Make list of paths (csv files for all individuals)

latitud_tots=[]  # Initialise all the variables
longitud_tots=[]
velocity_all=[]
time=[]
At_tots=[]

distancia_tots=[]
vel_mitja_tots=[]
durada_tots=[]
nom_grups=[]
turto_tots=[]
radi_tots=[]
nombre_aturades=[]
temps_total_aturada=[]
temps_promig_aturada=[]
temps_aturada_grup=[]


for file in all_files:   # Loop over all csv files in the folder/directory (each group/trajectory)
    
    df = pd.read_csv(file)  # Read the .csv
    
    print('Grup:', file)
    print('')
    
    
    
    #TIME
    t=df['At'][:-1].tolist()  
    total_seconds_trajectory=sum(t)
    hours, minutes, seconds=convert(total_seconds_trajectory)
    print('The duration of the journey is:', hours, 'hours','', minutes,'minutes','','i','', seconds, 'seconds')
    print('')
    
    
    
    # DISTANCE
    distance=df['d'][:-1].tolist()       
    print('The total distance travelled is:', sum(distance), 'metres')
    print('')
    
    
    
    # INSTANTANEOUS VELOCITY
    v3=df.loc[df['stops'] == 'flight', 'v'].tolist()
    velocity=v3[:-1]
    
    mean = sum(velocity) / len(velocity)
    variance = sum([((x - mean) ** 2) for x in velocity]) / len(velocity)
    res = variance ** 0.5
    print('The average instantaneous velocity is, <v>=', mean, 'm/s', '', '\u03C3(v)=',res, 'm/s')
    print('The average instantaneous velocity is, <v>:', 3.6*mean,'km/h','', '\u03C3(v)=',3.6*res ,'km/h')
    print('') 

    

    # RADIUS OF GYRATION.  
    lat=df['lat'].tolist()
    lon=df['lon'].tolist()
    lat_utm, lon_utm=GPScoordinates_to_utm(lat,lon)
    Rg=RadiusOfGyration(lat_utm, lon_utm)
    print('The radius of gyration is, Rg=', Rg, 'm')
    print('')
    
    
    # TURTUOSITY
    turto=turtuosity(lat_utm,lon_utm)
    print('The turtuosity is, T=', turto)
    print('')
    
    
    distancia_tots.append(sum(distance))   # store all distances, <v>, etc.
    vel_mitja_tots.append(mean)
    durada_tots.append(sum(t))
    nom_grups.append(str(file))
    turto_tots.append(turto)
    radi_tots.append(Rg)
    
    latitud_tots.extend(lat)
    longitud_tots.extend(lon)   # keep the longitudes in step with the latitudes
    velocity_all.extend(velocity)
    
    
    
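    # STOPS DURATION
    stops=df['stops'].tolist()  
    At_cs=[]
    i=0
    while i < len(stops):
        if stops[i]!='stop':         # We store the duration of the stops. If there are two or more consecutive stops, we sum
            i=i+1                    # their corresponding durations and we consider them as a single stop.
        else:
            s=0
            stopss=[]
            while i+s < len(t) and stops[i+s]=='stop':   # bounds check: t has one element fewer than stops
                stopss.append(t[i+s])
                s=s+1

            if stopss:               # skip an empty run (a 'stop' label on the last row has no duration in t)
                At_c=sum(stopss)
                At_cs.append(At_c)

            i=i+max(s,1)             # always advance, so the loop cannot stall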
    
    print('The total number of STOPS is:', len(At_cs))
    print('The total duration of STOPS is:', sum(At_cs),'s')
    if len(At_cs)!=0:
        mean_temps=sum(At_cs)/len(At_cs)
        std_temps=np.std(At_cs)
        print('The average STOP duration is:', sum(At_cs)/len(At_cs),'s', '', '\u03C3(stop)=',std_temps, 's')
    else:
        mean_temps=0.0
    print('')
    print('---------------------------------------------------------------------------------------------------------')
    print('')
    print('')
    
    nombre_aturades.append(len(At_cs))
    temps_total_aturada.append(sum(At_cs))
    temps_promig_aturada.append(mean_temps)
    temps_aturada_grup.extend(At_cs)
    





    

Grup: e1.csv

The duration of the journey is: 0.0 hours  45.0 minutes  i  53.0 seconds

The total distance travelled is: 896.0505703017408 metres

The average instantaneous velocity is, <v>= 1.2127910593817441 m/s  σ(v)= 0.3992397698740374 m/s
The average instantaneous velocity is, <v>: 4.366047813774279 km/h  σ(v)= 1.4372631715465347 km/h

The radius of gyration is, Rg= 56.81708741097374 m

The turtuosity is, T= 0.9612257374078075

The total number of STOPS is: 18
The total duration of STOPS is: 2208.0 s
The average STOP duration is: 122.66666666666667 s  σ(stop)= 207.70866349020903 s

---------------------------------------------------------------------------------------------------------


Grup: e2.csv

The duration of the journey is: 1.0 hours  7.0 minutes  i  31.0 seconds

The total distance travelled is: 752.8778994605093 metres

The average instantaneous velocity is, <v>= 1.4906572845472792 m/s  σ(v)= 0.9225465728692595 m/s
The average instantaneous velocity is, <v>: 5.366366224

The duration of the journey is: 1.0 hours  26.0 minutes  i  14.0 seconds

The total distance travelled is: 469.1635140989095 metres

The average instantaneous velocity is, <v>= 0.8352726280154669 m/s  σ(v)= 0.17752588558915272 m/s
The average instantaneous velocity is, <v>: 3.006981460855681 km/h  σ(v)= 0.6390931881209498 km/h

The radius of gyration is, Rg= 51.17609859386947 m

The turtuosity is, T= 0.61294426457153

The total number of STOPS is: 13
The total duration of STOPS is: 4884.0 s
The average STOP duration is: 375.6923076923077 s  σ(stop)= 662.5005526401625 s

---------------------------------------------------------------------------------------------------------


Grup: g7.csv

The duration of the journey is: 1.0 hours  34.0 minutes  i  33.0 seconds

The total distance travelled is: 886.1694566488306 metres

The average instantaneous velocity is, <v>= 1.2874803217367286 m/s  σ(v)= 0.716777924182284 m/s
The average instantaneous velocity is, <v>: 4.6349291582522225 km/h  σ(v

## 2. DataFrame

In [4]:
# New DataFrame with the statistics for each group of participants

df_estadistica=pd.DataFrame()
df_estadistica['grup']=nom_grups
df_estadistica['distancia recorreguda (m)']=distancia_tots
df_estadistica['temps total (s)']=durada_tots
df_estadistica['velocitat mitjana (m/s)']=vel_mitja_tots
df_estadistica['turtuositat']=turto_tots
df_estadistica['radi de gir (m)']=radi_tots
df_estadistica['nombre aturades']=nombre_aturades
df_estadistica['temps promig aturades (s)']=temps_promig_aturada
df_estadistica['temps total aturades (s)']=temps_total_aturada


In [5]:
df_estadistica.style.background_gradient(axis=0) 

| | grup | distancia recorreguda (m) | temps total (s) | velocitat mitjana (m/s) | turtuositat | radi de gir (m) | nombre aturades | temps promig aturades (s) | temps total aturades (s) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | e1.csv | 896.05057 | 2753.0 | 1.212791 | 0.961226 | 56.817087 | 18 | 122.666667 | 2208.0 |
| 1 | e2.csv | 752.877899 | 4051.0 | 1.490657 | 0.947987 | 37.828331 | 17 | 214.529412 | 3647.0 |
| 2 | e3.csv | 1152.083883 | 4351.0 | 1.320079 | 0.979267 | 56.59452 | 16 | 224.9375 | 3599.0 |
| 3 | e4.csv | 716.687572 | 3695.0 | 1.570088 | 0.977765 | 43.869303 | 20 | 164.75 | 3295.0 |
| 4 | e5.csv | 1025.90231 | 3898.0 | 1.263011 | 0.963666 | 55.10657 | 29 | 112.034483 | 3249.0 |
| 5 | e6.csv | 1552.476509 | 3984.0 | 1.668774 | 0.946833 | 73.006098 | 21 | 143.0 | 3003.0 |
| 6 | e7.csv | 1292.43188 | 4039.0 | 1.374222 | 0.96779 | 45.81393 | 32 | 99.3125 | 3178.0 |
| 7 | g1.csv | 1309.595551 | 7407.0 | 1.289463 | 0.795216 | 86.268269 | 36 | 182.527778 | 6571.0 |
| 8 | g2.csv | 280.467736 | 694.0 | 1.223557 | 0.847569 | 26.959992 | 4 | 117.75 | 471.0 |
| 9 | g3.csv | 879.747351 | 6049.0 | 1.103773 | 0.79887 | 62.291938 | 21 | 259.904762 | 5458.0 |


## 3. Statistics for all groups

In [7]:
print('The total number of geo-locations in the dataset is:',len(latitud_tots))

The total number of geo-locations in the dataset is: 2981


### 3.1 - Instantaneous velocity

We obtain the mean, the standard deviation, the standard error of the mean, the minimum and maximum values, and the quartiles.

In [12]:
mean = sum(velocity_all) / len(velocity_all)
variance = sum([((x - mean) ** 2) for x in velocity_all]) / len(velocity_all)
res = variance ** 0.5
error=res/(len(velocity_all)**0.5)
q1=np.quantile(velocity_all, 0.25)
q2=np.quantile(velocity_all, 0.50)
q3=np.quantile(velocity_all, 0.75)
print('<v>=',mean,'m/s', '', '\u03C3(v)=',res,'m/s', '', 'error=',error, 'm/s')
print('min v=',min(velocity_all),'m/s','', 'max v=',max(velocity_all),'m/s')
print('q1 (25%):',q1,'m/s')
print('q2 (50%):',q2,'m/s')
print('q3 (75%):',q3,'m/s')


<v>= 1.3786831832400817 m/s  σ(v)= 1.5915924579715253 m/s  error= 0.034269423611369164 m/s
min v= 0.5559746332236926 m/s  max v= 30.37808942454657 m/s
q1 (25%): 0.8958119430994259 m/s
q2 (50%): 1.1419638586050864 m/s
q3 (75%): 1.412592094406029 m/s
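
The summaries in sections 3.1 to 3.4 repeat the same computation, so it could be factored into a single helper. The sketch below is one possible way to do that, not code from this notebook: the function name `summarize`, the dictionary keys and the sample values are assumptions introduced for illustration. It uses the population standard deviation (division by N), as the cells in this section do.

```python
import numpy as np


def summarize(values):
    """Mean, standard deviation, standard error, range and quartiles of a sample."""
    a = np.asarray(values, dtype=float)
    std = a.std()                          # population standard deviation (ddof=0)
    q1, q2, q3 = np.quantile(a, [0.25, 0.50, 0.75])
    return {
        "mean": float(a.mean()),
        "std": float(std),
        "sem": float(std / np.sqrt(a.size)),   # standard error of the mean
        "min": float(a.min()),
        "max": float(a.max()),
        "q1": float(q1),
        "q2": float(q2),
        "q3": float(q3),
    }


# Hypothetical sample, not data from the study
stats = summarize([1.0, 2.0, 3.0, 4.0])
for name, value in stats.items():
    print(name, value)
```

With such a helper, each subsection would reduce to one call, for example `summarize(velocity_all)` or `summarize(durada_tots)`, followed by printing with the appropriate unit.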


### 3.2 - Duration of the journey

We obtain the mean, the standard deviation, the standard error of the mean, the minimum and maximum values, and the quartiles.

In [13]:
mean = sum(durada_tots) / len(durada_tots)
variance = sum([((x - mean) ** 2) for x in durada_tots]) / len(durada_tots)
res = variance ** 0.5
error=res/(len(durada_tots)**0.5)
q1=np.quantile(durada_tots, 0.25)
q2=np.quantile(durada_tots, 0.50)
q3=np.quantile(durada_tots, 0.75)
print('<T>=',mean,'s', '', '\u03C3(T)=',res,'s', '', 'error=',error,'s')
print('min T=',min(durada_tots),'s','', 'max t=',max(durada_tots),'s')
print('q1 (25%):',q1,'s')
print('q2 (50%):',q2,'s')
print('q3 (75%):',q3,'s')

<T>= 4199.444444444444 s  σ(T)= 1493.2799180261707 s  error= 351.96945208199907 s
min T= 694.0 s  max t= 7407.0 s
q1 (25%): 3745.75 s
q2 (50%): 4045.0 s
q3 (75%): 5063.25 s


### 3.3 - Distance travelled

We obtain the mean, the standard deviation, the standard error of the mean, the minimum and maximum values, and the quartiles.

In [21]:
mean = sum(distancia_tots) / len(distancia_tots)
variance = sum([((x - mean) ** 2) for x in distancia_tots]) / len(distancia_tots)
res = variance ** 0.5
error=res/(len(distancia_tots)**0.5)
q1=np.quantile(distancia_tots, 0.25)
q2=np.quantile(distancia_tots, 0.50)
q3=np.quantile(distancia_tots, 0.75)
print('<D>=',mean,'m', '', '\u03C3(D)=',res,'m', '', 'error=',error,'m')
print('min D=',min(distancia_tots),'m','', 'max D=',max(distancia_tots),'m')
print('q1 (25%):',q1,'m')
print('q2 (50%):',q2,'m')
print('q3 (75%):',q3,'m')

<D>= 955.0727998739726 m  σ(D)= 389.08453847010344 m  error= 91.70810520234943 m
min D= 280.46773642711884 m  max D= 1814.570911296141 m
q1 (25%): 767.7429682012851 m
q2 (50%): 882.9584037955658 m
q3 (75%): 1257.3448807634002 m


### 3.4 - Stops duration
We obtain the mean, the standard deviation, the standard error of the mean, the minimum and maximum values, and the quartiles.

In [23]:
mean = sum(temps_aturada_grup) / len(temps_aturada_grup)
variance = sum([((x - mean) ** 2) for x in temps_aturada_grup]) / len(temps_aturada_grup)
res = variance ** 0.5
error=res/(len(temps_aturada_grup)**0.5)
q1=np.quantile(temps_aturada_grup, 0.25)
q2=np.quantile(temps_aturada_grup, 0.50)
q3=np.quantile(temps_aturada_grup, 0.75)
print('<T(stop)>=',mean,'s', '', '\u03C3(T)=',res,'s', '', 'error=',error,'s')
print('min T=',min(temps_aturada_grup),'s','', 'max T=',max(temps_aturada_grup),'s')
print('q1 (25%):',q1,'s')
print('q2 (50%):',q2,'s')
print('q3 (75%):',q3,'s')

print('The average number of stops for group is:',sum(nombre_aturades)/len(nombre_aturades))

<T(stop)>= 190.57817109144543 s  σ(T)= 370.49006323669056 s  error= 20.1222581053907 s
min T= 10.0 s  max T= 2669.0 s
q1 (25%): 13.0 s
q2 (50%): 36.0 s
q3 (75%): 198.5 s
The average number of stops for group is: 18.833333333333332
