# Data processing and cleaning

     1. We read the .gpx file corresponding to a journey recorded with the Wikiloc application.
     
     2. Each .gpx file contains the GPS coordinates (latitude and longitude), the timestamp (e.g. 2022-04-30T11:26:17Z), the elevation and the name of the file.
     
     3. We are interested in the trajectory inside the district (Barri Primer de Maig de Granollers), but the Wikiloc app was started and stopped outside the district (for example, at the school). We therefore clean the trajectory by discarding the first "XX" and the last "XXX" rows (geolocations) of the dataset. To do so, we plot the trajectory on maps and compute basic statistics (time difference, distance and instantaneous velocity) between consecutive geolocations. We then inspect the map and the dataset to decide where to cut.
     
     4. Once the trajectory is processed and cleaned, we store the new DataFrame in a .csv file (adding columns for the time increment, the distance and the instantaneous velocity between consecutive geolocations).
     
     5. Finally, we label each GPS location as "stop" or "flight". We consider that a geolocation "i" is a stop if the next geolocation "i+1" is recorded at least 10 s later. We add a new column with the labels.
     
     6. We store the new DataFrame in a .csv file.
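
The per-step quantities mentioned in steps 3 and 4 can be sketched on a few made-up fixes using only the standard library. This is a minimal illustration, not the notebook's own implementation: the coordinates and the second and third timestamps are invented, and the helper names `haversine_m` and `step_stats` are hypothetical.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in metres between two GPS fixes given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_m * asin(sqrt(a))


def step_stats(fixes):
    """Return (dt_s, d_m, v_ms) for each pair of consecutive (time, lat, lon) fixes."""
    rows = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dt = (t1 - t0).total_seconds()
        d = haversine_m(la0, lo0, la1, lo1)
        v = d / dt if dt > 0 else float("nan")  # guard against repeated timestamps
        rows.append((dt, d, v))
    return rows


fmt = "%Y-%m-%dT%H:%M:%SZ"  # timestamp layout quoted in step 2
fixes = [  # made-up fixes, for illustration only
    (datetime.strptime("2022-04-30T11:26:17Z", fmt), 41.6040, 2.2820),
    (datetime.strptime("2022-04-30T11:26:19Z", fmt), 41.6041, 2.2820),
    (datetime.strptime("2022-04-30T11:26:34Z", fmt), 41.6041, 2.2821),
]
stats = step_stats(fixes)
for dt, d, v in stats:
    print(f"dt={dt:.0f} s  d={d:.1f} m  v={v:.2f} m/s")
```

With N fixes there are N-1 steps, which is why the notebook pads each derived list with a trailing NaN before storing it as a column.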


#### Import libraries

In [None]:
from gpxcsv import gpxtolist
import pandas as pd
import networkx as nx
import osmnx as ox


import matplotlib.pyplot as plt
import numpy as np
import glob
import os
from math import sin, cos, sqrt, atan2, radians
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from folium import Map
from folium.plugins import HeatMap, HeatMapWithTime
import utm
from matplotlib.animation import FuncAnimation
import matplotlib.animation as animation
from scipy.stats import gaussian_kde
from shapely.geometry import Point, LineString, Polygon
import folium

#%matplotlib notebook
#%matplotlib inline
ox.settings.use_cache = True    # ox.config() is deprecated in recent OSMnx releases; set the options directly
ox.settings.log_console = True
ox.__version__





#### Functions 

In [None]:
def GPScoordinates_to_utm(lat,lon):
    """ Function that projects the GPS coordinates in degrees (latitude, longitude) into the UTM coordinate system
    in order to work with the concepts of "point" and "Euclidean distance" in a plane.
    (https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system)
    
    Note: We are dealing with locations in the same area/region. Otherwise, we should be careful when calculating
    distances, as two locations may belong to different UTM zones.
    
    Input:
        - lists of GPS coordinates: latitude and longitude
        
    Output:
        - lists of the UTM projections of the GPS coordinates, in the order (easting, northing).
        - The utm package returns Easting, Northing, Zone_number and Zone_letter, so we only store the first two elements.
    """
    
    easting=[]
    northing=[]
    for i in range(len(lat)):
        u=utm.from_latlon(lat[i],lon[i])  # get the UTM projection
        easting.append(u[0])    # Easting (x coordinate in the plane)
        northing.append(u[1])   # Northing (y coordinate in the plane)
        
    return easting, northing
        
    

def EuclidianDistance(x1,y1,x2,y2):
    """ Function that returns the distance in metres between 2 points in a plane (NOT GPS locations).
    
    Note: Be careful, as the GPS coordinates (lat,lon) cannot be used directly, only their projections onto a plane
    (e.g. UTM projection).

    
    Input:
        - The coordinates of two points in a plane: (x1,y1) and (x2,y2).
        
    Output:
        - Euclidean distance in metres between the two points.
    """
    Euclidian_distance = ( (x2-x1)**2 + (y2-y1)**2 ) ** 0.5
        
    return Euclidian_distance
        
    
        
def getDistanceFromLatLonInM(lat1,lon1,lat2,lon2):
    """ Function that returns the distance in metres between 2 GPS locations in degrees (latitude and longitude).
    It is based on the Haversine formula (https://en.wikipedia.org/wiki/Haversine_formula) which takes into account the
    Earth's curvature. 
    
    Input:
        - 2 GPS coordinates: (latitude1,longitude1) of the first point and (latitude2,longitude2) of the second point. 
        
    Output:
        - Distance in metres between the two GPS locations.
    """
    
    R = 6371 # Radius of the earth in km
    dLat = radians(lat2-lat1)  # Difference between latitudes in radians
    dLon = radians(lon2-lon1) # Difference between longitudes in radians
    rLat1 = radians(lat1)   # Latitudes in radians
    rLat2 = radians(lat2)
    a = sin(dLat/2) * sin(dLat/2) + cos(rLat1) * cos(rLat2) * sin(dLon/2) * sin(dLon/2) 
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    d = R * c # Distance in km
    e= d*1000 # Distance in m
   
    return e
   
    
    
def instantaneous_velocity(distance,time):
    """ Function that computes the instantaneous velocity between two points, given their distance and their time difference.
    
    Input:
        - Distance between two points or two locations
        - Time difference between the two points/locations
        
    Output:
        - Instantaneous velocity between the two points
    """
    
    v=distance/time
    
    return v


def convert(seconds):
    """ Function that converts a number of seconds into hours, minutes and seconds
    (the number of seconds is first wrapped to a single day).
    
    Input:
        - Number of seconds
        
    Output:
        - Hours, minutes and seconds
    """
    
    seconds = seconds % (24 * 3600)
    hour = seconds // 3600
    seconds %= 3600
    minutes = seconds // 60
    seconds %= 60
    
    return hour, minutes, seconds


def map_network(latitude, longitude, dist):
    """ Function that creates a map of the urban network within a given radius around a GPS location.
    
    Input:
        - latitude and longitude (GPS coordinates)
        - dist: distance (radius) from which the network is constructed
        
    Output:
        - The figure and the axes.
        
    """
    
    G = ox.graph_from_point((latitude, longitude), dist, network_type='all')   # Create the graph from lat and lon
    fig, ax = ox.plot_graph(G, show=False, close=False, bgcolor="#333333",edge_color="w", edge_linewidth=0.8, node_size=0)
    
    #for _, edge in ox.graph_to_gdfs(G, nodes=False).fillna('').iterrows():     # Name of the street
        #c = edge['geometry'].centroid
        #text = edge['name']
        #ax.annotate(text, (c.x, c.y), c='w')       
 
    return fig, ax


def map_network2(lon1,lat1,lon2,lat2,lon3,lat3,lon4,lat4):
    """ Function that creates a map of the urban network from a polygon (given its 4 corners).
    
    Input:
        - corners: 4 points given as (longitude, latitude) pairs in degrees, i.e. in the (x, y) order used by shapely
        
    Output:
        - The figure and the axes.
        
    """
    
    P = Polygon([(lon1,lat1), (lon2,lat2),(lon3,lat3),(lon4,lat4)])  # Polygon from the lon/lat corners
    G = ox.graph_from_polygon(P, network_type='all')   # Create the graph from the polygon
    fig, ax = ox.plot_graph(G, show=False, close=False, bgcolor="#333333",edge_color="w", edge_linewidth=0.8, node_size=0)
    place_name = "Granollers, Vallès Oriental"
    tags={"building": True}
    buildings = ox.geometries_from_place(place_name, tags)   # Retrieve the buildings of the area (downloaded only once)
    buildings.plot(ax=ax,color='silver',alpha=0.5)           # Draw them on top of the street network
    
    #for _, edge in ox.graph_to_gdfs(G, nodes=False).fillna('').iterrows():     # Name of the street
        #c = edge['geometry'].centroid
        #text = edge['name']
        #ax.annotate(text, (c.x, c.y), c='w')       
 
    return fig, ax


def RadiusOfGyration(x,y):
    """ Function that computes the radius of gyration of a 2-d trajectory with x and y coordinates.
    
    M. C. González, C. A. Hidalgo, and A.-L. Barabási, Nature 453, 779 (2008).
    
    Note: x,y coordinates can't be lat,lon in degrees. They must first be projected onto a plane (e.g. UTM projection).
    
    
    Input:
        - Lists of coordinates x and y
    
    Output:
        - Radius of gyration
    """
    
    r_cm_x=sum(x)/len(x)
    r_cm_y=sum(y)/len(y)
    
    radius2=[]
    for i in range(len(x)):
        r_new_x=x[i]-r_cm_x
        r_new_y=y[i]-r_cm_y
        radius2.append((r_new_x*r_new_x) + (r_new_y*r_new_y))
    
    mean_radius2=sum(radius2)/len(radius2)
    
    rg=mean_radius2**0.5
    
    return rg
        
    
        
        
def vector(latitude0,longitude0,latitude1,longitude1): 
    """ Given two points, returns the vector from the origin point to the destination point. If GPS coordinates, use UTM projection.
    
    Input:
        - Coordinates of the origin point (p0x,p0y) and the destination point (p1x, p1y).
    
    Output:
        - Coordinates of the vector from p0 to p1. 
    """
    
    p0=(latitude0,longitude0)   
    p1=(latitude1,longitude1)
    vec=(p1[0]-p0[0], p1[1]-p0[1]) 
    return vec



def determinant(vec0,vec1):
    """  Returns the determinant of two vectors. If det<0, the second vector has turned in the clockwise direction.
    
    Input:
        - Two consecutive vectors (vec0, vec1) characterized by their x,y coordinates.
        
    Output:
        - Determinant of the two vectors. If det<0 the second vector has turned in the clockwise direction with respect to the first.
    """
    det=vec0[0]*vec1[1]-vec0[1]*vec1[0]
    return det


def reorientation(vec0,vec1):
    """Returns the angle between two consecutive vectors (change in orientation, reorientation, turning angle).
    The range is from -pi to +pi. If det<0 the second vector has turned in the clockwise direction and therefore the 
    reorientation angle is also <0. On the contrary, if it has turned counter-clockwise the angle is >0. 
    
    Input: 
        - Two consecutive vectors (vec0, vec1) characterized by their x,y coordinates.
    
    Output:
        - Reorientation angle between them 
    """
    unit_vec0=vec0/np.linalg.norm(vec0)
    unit_vec1=vec1/np.linalg.norm(vec1)
    dot_product=np.clip(np.dot(unit_vec0,unit_vec1),-1.0,1.0)  # Rounding errors can push the dot product slightly outside [-1,1]
    a=np.arccos(dot_product)
    det=determinant(vec0,vec1)
    if det<0:
        return -a
    else:
        return a

    
        
def turtuosity(latitudes,longitudes):
    """ Given the two lists of latitudes and longitudes (UTM projection), it returns the tortuosity of the trajectory. 
    If the tortuosity is near 0, the trajectory heads very straight to the final destination. If it is near 1, 
    the movement is very tortuous (not direct).
    
    Input:
        - Lists of x and y points (latitudes and longitudes projected) of a trajectory.
        
    Output:
        - Estimation of the tortuosity of the trajectory.
    """

    vectors=[]
    vectors_straight=[]
    for i in range(1,len(latitudes)):
        vectors.append(vector(latitudes[i-1],longitudes[i-1],latitudes[i],longitudes[i]))  # Vector between the points i-1 and i.
        vectors_straight.append(vector(latitudes[i-1],longitudes[i-1],latitudes[-1],longitudes[-1]))  # Straight vector between
                                                                                         # point i-1 and the final point (destination)
        
    reorientations=[]   
    for i in range(len(vectors)-1):
        reorientations.append(np.cos(abs(reorientation(vectors_straight[i],vectors[i])))) # Cosine of the (absolute) angle between
                                                                                          # the step vector and the straight vector
        #reorientations2 = [x for x in reorientations if np.isnan(x) == False]  # avoid nan values
            
    turtuo=1.-(sum(reorientations)/len(reorientations))  # The average value of the movement re-orientations gives the efficiency
                                                         # of the trajectory (how straightforward/directed it is towards the final
                                                         # destination). Then 1 - efficiency is the tortuosity
            
    return turtuo


   

In [None]:
df = pd.DataFrame(gpxtolist('g8.gpx'))    # Read the .gpx file 
df

## 0. Original trajectory representation

In [None]:
# We read the .gpx file with the data collected with the Wikiloc app.
df = pd.DataFrame(gpxtolist('g8.gpx'))

# Initialise the map of the district of Granollers (Primer de Maig)
fig, ax = map_network2(2.2804419490443517, 41.605107921345585, 2.28054397694062,41.60183248235341,  
                       2.283748215236327,41.602065783859004,  2.2840062702794897, 41.60530532450823)

lat=df['lat'].tolist()  # Latitude and Longitude to lists
lon=df['lon'].tolist()

ax.plot(lon, lat,'-o',c='blue',markersize=1) # Scatter the GPS locs. in order to visualise the trajectory
ax.plot(lon[0],lat[0],'o',c='green',markersize=10) # Initial point
ax.plot(lon[-1],lat[-1],'o',c='red',markersize=10) # Final point
plt.show()


fig, ax = plt.subplots(figsize=(7,5))   # Scatter the trajectory
ax.plot(lon,lat,'-o',c='blue')
ax.plot(lon[0],lat[0],'o',c='green',markersize=10)
ax.plot(lon[-1],lat[-1],'o',c='red',markersize=10)
plt.show()
   
    


## 1. Data processing and cleaning

In [None]:
df2=df[5:10].copy()   # Copy of the DataFrame selecting the desired range of rows in order to clean (cut) the data

### 1.1- Timestamp and Time difference

In [None]:
df2['time'] = pd.to_datetime(df2['time'], utc=True)  # Parse the ISO 8601 timestamps (e.g. 2022-04-30T11:26:17Z) as datetimes
temps=df2['time'].tolist()  # Time to list


# TIME DIFFERENCE BETWEEN CONSECUTIVE GEOLOCATIONS
diff_time=[]
for i in range(1,len(df2['time'])):
    diff=temps[i]-temps[i-1]
    diff_time.append(diff)

    
# If there are N timestamps, there are N-1 time differences. So we need to add a NaN value at the end of 
# the list to store it as a new column of the DataFrame. 
diff_time.insert(len(diff_time), np.nan)  

df2['At']=diff_time   # Store it as a new column in the DataFrame and transform the difference into total seconds.
df2['At']=df2['At'].dt.total_seconds()   # .dt.seconds would drop whole days from very long gaps

t=df2['At'][:-1].tolist()  # Time difference (At) in seconds into a list (disregarding the last element, NaN)

### 1.2- Distance (haversine formula) between geolocations

In [None]:
# DISTANCE (USING THE HAVERSINE FORMULA) BETWEEN CONSECUTIVE GEOLOCATIONS
distance=[]
lat=df2['lat'].tolist()    # latitude and longitude to list
lon=df2['lon'].tolist()
for p in range(1,len(lat)):
    dist=getDistanceFromLatLonInM(lat[p-1],lon[p-1],lat[p],lon[p])    # distance between each consecutive geolocation
    distance.append(dist)        
    


### 1.3- Instantaneous velocity

In [None]:
# INSTANTANEOUS VELOCITY BETWEEN CONSECUTIVE GEOLOCATIONS (DISTANCE OVER TIME DIFF.)
velocity=[]
for k in range(len(distance)):
    velocity.append(instantaneous_velocity(distance[k],t[k]))

### 1.4- Print some statistics

In [None]:
print('Time Difference (\u0394t):')
print('<\u0394t>=',np.mean(t),'s')
print('\u03C3(\u0394t)=', np.std(t),'s')
print('')
print('')

print('Distance (d):')
print('<d>=', np.mean(distance),'m')
print('\u03C3(d)=', np.std(distance),'m')
print('max distance=',max(distance),'m')
print('min distance=', min(distance),'m')
print('')
print('')

print('Instantaneous velocity (v):')
print('<v>=', np.mean(velocity),'m/s')
print('\u03C3(v)=', np.std(velocity),'m/s')
print('max velocity=',  max(velocity), 'm/s')
print('min velocity=', min(velocity), 'm/s')

#### Finally, insert NaN values at the end of the lists of distances and velocities in order to incorporate them into the DataFrame

In [None]:
distance.insert(len(distance), np.nan)
velocity.insert(len(velocity), np.nan)   # Add NaN value at the end in order to store as a new column of the DataFrame

df2['d']=distance
df2['v']=velocity

df2

### 1.5- Labelling Geo-locations as "Stop" or "Flight"

The Wikiloc app stops recording geolocations when it considers that there is no movement. After some tests with several trajectories, we consider that the participant is stopped when the time difference between two consecutive recordings is at least 10 seconds.


We can therefore label each GPS location as "stop" or "flight".
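
The labelling rule can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the time differences are made up, the helper name `label_stops` is hypothetical, and the 10 s threshold is the one given in the text.

```python
import math

STOP_THRESHOLD_S = 10.0  # threshold taken from the text above


def label_stops(dt_seconds, threshold=STOP_THRESHOLD_S):
    """Label fix i as 'stop' if fix i+1 was recorded at least `threshold` seconds later."""
    labels = []
    for dt in dt_seconds:
        # The last fix has no successor, so its time difference is NaN;
        # NaN >= threshold is False, hence it is labelled 'flight'.
        labels.append("stop" if dt >= threshold else "flight")
    return labels


dt_seconds = [2.0, 15.0, 10.0, 3.0, math.nan]  # made-up time differences in seconds
labels = label_stops(dt_seconds)
print(labels)  # ['flight', 'stop', 'stop', 'flight', 'flight']
```

Note that a difference of exactly 10 s counts as a stop, and that the trailing NaN always yields "flight".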

In [None]:
df2 = df2.reset_index()  # Reset the index after the clean-up process

stops=[]                       # New column labelling each geolocation as "stop" if the time difference is >=10 s, or "flight" otherwise
for i in range(len(df2)):
    if df2['At'][i]>=10.0:
        stops.append('stop')
    else:
        stops.append('flight')

df2['stops']=stops      
df2

### 1.6- Store the processed data into a new .csv file
We store the processed and cleaned dataset, with the new columns (time difference, distance, velocity and stops), as a .csv file.
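
As a quick check of what ends up in the file, the sketch below writes a small stand-in DataFrame to an in-memory buffer and reads it back. The values are made up and the buffer replaces the real file; only the column names follow this notebook.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the processed DataFrame (column names follow this notebook).
demo = pd.DataFrame({
    "lat": [41.6040, 41.6041],
    "lon": [2.2820, 2.2820],
    "At": [2.0, np.nan],
    "d": [11.1, np.nan],
    "v": [5.55, np.nan],
    "stops": ["flight", "flight"],
})

buffer = io.StringIO()            # in-memory file, so nothing is written to disk
demo.to_csv(buffer, index=False)  # index=False avoids an extra unnamed column
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.dtypes)
print(restored.isna().sum())
```

The trailing NaN values of the last row are written as empty fields and come back as NaN when the file is read again.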

In [None]:
# Delete unnecessary columns and store the DataFrame in a .csv file with the name of the trajectory (group)
del df2['cmt']
del df2['desc']
del df2['index']  

#df2.to_csv("g8.csv",index=False)  

### 1.7- Final Map representation
Visualisation of the trajectory after the clean-up process

In [None]:
fig, ax = map_network2(2.2804419490443517, 41.605107921345585, 2.28054397694062,41.60183248235341,  
                       2.283748215236327,41.602065783859004,  2.2840062702794897, 41.60530532450823)

lat=df2['lat'].tolist()  # Latitude and Longitude to lists
lon=df2['lon'].tolist()

ax.plot(lon, lat,'-o',c='blue',markersize=1) # Scatter the GPS locs. 
ax.plot(lon[0],lat[0],'o',c='green',markersize=10)
ax.plot(lon[-1],lat[-1],'o',c='red',markersize=10)
plt.show()


fig, ax = plt.subplots(figsize=(7,5)) 
ax.plot(lon,lat,'-o',c='blue')
ax.plot(lon[0],lat[0],'o',c='green',markersize=10)
ax.plot(lon[-1],lat[-1],'o',c='red',markersize=10)
plt.show()
   
    
