# Module 1: Modal shares

**Description**: The goal of this module is to carry out a first computation of kilometre-based modal shares, i.e. average daily distances by mode and by trip purpose.

**Estimated duration, first part**: 7 days

**Specific objectives**:
- [ ] Subsample residents and visitors by canton (based on GPS)
- [ ] Resample observation days to obtain a continuous calendar per user
- [ ] Integrate transit details
- [ ] Distinguish, as systematically as possible, days without travel from undetected days, and compare them statistically with non-travel days in other databases
- [ ] Recode modes and purposes according to the cantons' needs
- [ ] Preliminary computation of modal shares by kilometre and by trip
- [ ] Add equipment data (e.g. the household's main type of motorisation)
- [ ] Document the assumptions and limits of the preliminary modal share computation (e.g. seasonal aspects, sampling, signal loss, averaging of longitudinal data, ...)

**Expected results**: Kilometre-based modal shares by mode for the residents and visitors of each canton, with a view to computing carbon emissions. It must be possible to compute the modal shares while accounting for non-mobile days.

**Subsampling**:
- Vaud: residents of the canton
- Genève: residents of the canton

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [3]:
import geopandas as gpd
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np

from shapely import geometry, ops
from shapely.geometry import MultiLineString, LineString, Point
import os
import concurrent.futures
from shapely.ops import unary_union
import xyt

import time

### Load the data

In [4]:
%%time
# Define the project CRS (EPSG:4326, i.e. WGS84)
target_crs = 'EPSG:4326'
print("Project CRS: WGS84 \n")

# Load the legs
legs = pd.read_pickle('../Data/time_space_filters/legs_filtered.pkl')
del legs['canton_dep']
legs['started_at'] = pd.to_datetime(legs['started_at'])
legs['finished_at'] = pd.to_datetime(legs['finished_at'])
legs.rename(columns={'IDNO':'user_id_fors', 'id':'leg_id'}, inplace = True)

print("Legs file loaded")

# Load the activities (staypoints)
staypoints = pd.read_pickle('../Data/time_space_filters/staypoints_filtered.pkl').reset_index(drop=True)
staypoints = gpd.GeoDataFrame(staypoints, geometry="geometry", crs=target_crs)

staypoints['started_at'] = pd.to_datetime(staypoints['started_at'])
staypoints['finished_at'] = pd.to_datetime(staypoints['finished_at'])
staypoints.rename(columns={'IDNO':'user_id_fors', 'id':'activity_id'}, inplace = True)
staypoints['lon'] = staypoints.geometry.x
staypoints['lat'] = staypoints.geometry.y

print("Activities file loaded")

# Load the user statistics
usr_stats = pd.read_pickle('../Data/processed_feuille_de_route/gps_user_statistics.pkl')
print("User statistics file loaded")

# Load the trips
trips = pd.read_csv('../Data/dumps_motiontag/Trips.2023-04-01--2023-08-31.csv')
print("Trips file loaded")

# Load the official zoning used for the geographic subsampling:
# https://opendata.swiss/de/dataset/vm-uvek-zones-2017/resource/29b98f2c-42f2-4e72-b8b1-a39500ed0ad0
TAZ = gpd.read_file('../../Vague1/Verkehrszonen_Schweiz_NPVM_2017_shp/Verkehrszonen_Schweiz_NPVM_2017.shp')
TAZ = TAZ[['ID_Agglo', 'N_Agglo', 'N_KT', 'ID_Gem', 'geometry']]
TAZ = TAZ.to_crs(crs=target_crs)
# Repair invalid geometries (zero-width buffer)
TAZ['geometry'] = TAZ['geometry'].buffer(0)
# Dissolve the traffic zones into one polygon per canton
shp_KT = TAZ.dissolve(by='N_KT').reset_index()
print("Traffic zones file loaded")

Project CRS: WGS84 

Legs file loaded
Activities file loaded
User statistics file loaded
Trips file loaded
Traffic zones file loaded
CPU times: user 17 s, sys: 4.81 s, total: 21.8 s
Wall time: 23.1 s


### Add the *next activity_id* to the legs

In [5]:
# Sort 'staypoints' and 'legs' by user and 'started_at' so that each user's data is in chronological order
staypoints.sort_values(by=['user_id_fors','started_at'], inplace=True, ignore_index=True)
legs.sort_values(by=['user_id_fors','started_at'], inplace=True)

In [6]:
legs = pd.merge(legs, staypoints[['activity_id', 'previous_leg_id']],
               left_on='leg_id', right_on='previous_leg_id', how='left')
legs.rename(columns={'activity_id':'leading_stay_id'}, inplace=True)
del legs['previous_leg_id']
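
The merge above attaches to each leg the staypoint that follows it. A minimal sketch of the same pattern on toy data is shown here; the IDs are hypothetical, and the `validate` argument is an optional guard (not used in the notebook) that raises if a leg matches more than one staypoint.

```python
import pandas as pd

# Toy data (hypothetical IDs): each staypoint records the leg that led to it.
legs_toy = pd.DataFrame({"leg_id": ["l1", "l2", "l3"]})
stays_toy = pd.DataFrame({
    "activity_id": ["s1", "s2"],
    "previous_leg_id": ["l1", "l3"],
})

# Left merge keeps every leg; a leg not followed by a detected stay gets NaN.
legs_toy = (
    legs_toy.merge(
        stays_toy,
        left_on="leg_id",
        right_on="previous_leg_id",
        how="left",
        validate="one_to_one",  # assumption: at most one stay per leg
    )
    .rename(columns={"activity_id": "leading_stay_id"})
    .drop(columns="previous_leg_id")
)

print(legs_toy)
```

Without such a guard, a leg referenced by two staypoints would be silently duplicated, which would inflate the summed distances later on.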

### Add the duration and length of the legs

In [7]:
%%time 
# Add length in meters (reproject to EPSG:2056, the Swiss CH1903+ / LV95 system, whose unit is the meter)
legs['length'] = legs.to_crs('EPSG:2056').length
# Add the duration in seconds
legs['duration'] = (legs['finished_at'] - legs['started_at']).dt.total_seconds()

CPU times: user 1min 2s, sys: 16.1 s, total: 1min 18s
Wall time: 1min 26s


### Extract the geographic areas and the subsamples (Genève and Vaud)
We use the traffic zones of the ARE's Modèle Voyageur (NPVM 2017).

We want to sample:
- all the residents of the Canton de Genève
- all the activities that take place in the Canton de Genève

In [8]:
%%time

# Canton code (N_KT) to subsample; 'CH' keeps every user
n_kt = 'CH'

if n_kt == 'CH':
    # Take all residents
    list_residents_N_KT = legs.user_id_fors.unique().tolist()
else:
    # List the residents of the canton
    list_residents_N_KT = usr_stats.loc[usr_stats.KT_home_survey == n_kt, 'user_id_fors'].tolist()

# Subsample of the legs of the canton's residents
legs_N_KT = legs.loc[legs.user_id_fors.isin(list_residents_N_KT)].reset_index(drop=True).copy()

# List of the activities of the canton's residents
# (drop NaN on leading_stay_id only, so that legs with NaN in unrelated columns are kept)
list_staypoints_residents_N_KT = legs_N_KT.leading_stay_id.dropna().tolist()

CPU times: user 629 ms, sys: 165 ms, total: 794 ms
Wall time: 834 ms


In [9]:
%%time
# Identify the activities that take place in the canton
# Note: with n_kt = 'CH', TAZ.N_KT == n_kt presumably matches no zone, so this list stays empty
staypoints_N_KT = staypoints[staypoints.activity_id.isin(list_staypoints_residents_N_KT)].reset_index(drop=True).copy()
list_activity_id_in_KT = staypoints_N_KT.loc[staypoints_N_KT.within(unary_union(TAZ[TAZ.N_KT == n_kt].geometry)), 'activity_id'].tolist()

CPU times: user 670 ms, sys: 564 ms, total: 1.23 s
Wall time: 1.53 s


In [10]:
# Flag the activities that take place in the canton, for the canton's residents
legs_N_KT['leading_stay_id_in_KT'] = 0
legs_N_KT.loc[legs_N_KT.leading_stay_id.isin(list_activity_id_in_KT), 'leading_stay_id_in_KT'] = 1

In [11]:
# Add the user_id_day (user ID + "_" + YYYYMMDD)
legs_N_KT.insert(
    1,"user_id_day",legs_N_KT["user_id_fors"]
    + "_" 
    + legs_N_KT.started_at.dt.year.astype(str)
    + legs_N_KT.started_at.dt.month.astype(str).str.zfill(2)
    + legs_N_KT.started_at.dt.day.astype(str).str.zfill(2),
)
legs_N_KT.insert(1, 'legs_date',legs_N_KT.started_at.dt.date)
legs_N_KT['legs_date'] = pd.to_datetime(legs_N_KT['legs_date'])
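
The same `user_id_day` key and midnight-normalised date can be built more compactly. The sketch below is an equivalent formulation on toy rows, not the notebook's own code; it assumes `started_at` is already a datetime column.

```python
import pandas as pd

# Toy legs (values taken from the first rows of the table shown further down).
toy = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01"],
    "started_at": pd.to_datetime(["2023-05-02 14:46:42", "2023-05-03 04:26:51"]),
})

# strftime('%Y%m%d') yields the zero-padded year, month and day in one call.
toy["user_id_day"] = toy["user_id_fors"] + "_" + toy["started_at"].dt.strftime("%Y%m%d")

# normalize() sets the time to midnight and keeps the datetime64 dtype.
toy["legs_date"] = toy["started_at"].dt.normalize()

print(toy)
```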

In [12]:
%autoreload
xyt.plot_gps(staypoints[staypoints.activity_id.isin(list_activity_id_in_KT)].rename(columns={'user_id_fors':'user_id'}).dropna()[:2000], geo_columns='geometry')



In [17]:
legs_N_KT

Unnamed: 0,leg_id,legs_date,user_id_day,started_at,finished_at,type,started_at_timezone,detected_mode,mode,user_id_fors,geometry,low_quality_legs_1,low_quality_legs_2,scattered_tracking,leading_stay_id,length,duration,leading_stay_id_in_KT
0,aa257257-d427-4a84-81b1-fe3bad92050b,2023-05-02,CH01_20230502,2023-05-02 14:46:42,2023-05-02 15:56:14,Track,Europe/Zurich,Mode::Car,Mode::Car,CH01,"LINESTRING (6.58428 46.54247, 6.58428 46.54247...",0,0,0,7dd02868-9907-4315-85e6-810913107a65,54830.991458,4172.0,0
1,5de690c8-5ee3-46d6-8dce-0733532c6c79,2023-05-02,CH01_20230502,2023-05-02 18:26:22,2023-05-02 18:39:12,Track,Europe/Zurich,Mode::Car,Mode::Car,CH01,"LINESTRING (6.94385 46.27073, 6.94385 46.27073...",0,0,0,,10293.051705,770.0,0
2,a6fa18d2-a25b-4f9d-ba5e-f91bc758774f,2023-05-02,CH01_20230502,2023-05-02 18:39:17,2023-05-02 18:48:41,Track,Europe/Zurich,Mode::Walk,Mode::Walk,CH01,"LINESTRING (6.96785 46.32073, 6.96784 46.32073...",0,0,0,59acb1a8-2a61-4cba-b2ea-cb563e7ae7fa,179.019953,564.0,0
3,4f2b8865-3290-4d4c-826d-7524d2278da9,2023-05-03,CH01_20230503,2023-05-03 04:26:51,2023-05-03 05:13:57,Track,Europe/Zurich,Mode::Car,Mode::Car,CH01,"LINESTRING (6.96822 46.31965, 6.96837 46.32017...",0,0,0,872eee0c-b72e-4e1f-8a20-f12c5e9ff5a7,55009.776548,2826.0,0
4,c17d2027-28a0-4124-a7e1-2ed227436310,2023-05-03,CH01_20230503,2023-05-03 09:49:49,2023-05-03 09:50:45,Track,Europe/Zurich,Mode::Walk,Mode::Walk,CH01,"LINESTRING (6.58502 46.54596, 6.58491 46.54557...",0,0,0,a6ec5bd7-1c05-4845-830d-2a9ac870b9a0,97.611720,56.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
668237,485fb233-bf7e-4aed-a98b-ca3d95846d78,2023-06-05,FR9994_20230605,2023-06-05 10:50:37,2023-06-05 11:09:14,Track,Europe/Zurich,Mode::Walk,Mode::Walk,FR9994,"LINESTRING (6.14684 46.20418, 6.14684 46.20418...",0,0,1,a4a8e0db-8d48-4566-8cd7-1bd38b2e484f,752.201394,1117.0,0
668238,a63683dd-8a30-4f5e-9f7c-a75a70ad1fd5,2023-06-05,FR9994_20230605,2023-06-05 11:30:27,2023-06-05 11:36:38,Track,Europe/Zurich,Mode::Walk,Mode::Walk,FR9994,"LINESTRING (6.15094 46.20234, 6.14995 46.20324...",0,0,1,78e868e5-a2f8-4e8e-b548-6db975c6ea13,452.281133,371.0,0
668239,f37f8129-4b03-480e-8b34-8fec6c3e4c7c,2023-06-05,FR9994_20230605,2023-06-05 16:49:44,2023-06-05 17:15:49,Track,Europe/Zurich,Mode::Car,Mode::Car,FR9994,"LINESTRING (6.14669 46.20446, 6.14669 46.20446...",0,0,1,061ff169-2c57-4a94-ba78-c1ad4cd1a527,20853.079017,1565.0,0
668240,e6d61832-e3f1-48e9-aa5e-b65f92d59eab,2023-06-05,FR9994_20230605,2023-06-05 18:12:27,2023-06-05 18:13:34,Track,Europe/Paris,Mode::Walk,Mode::Walk,FR9994,"LINESTRING (6.09565 46.12210, 6.09542 46.12224...",0,0,1,,97.012728,67.0,0


In [14]:
import pandas as pd

def get_daily_modal_distances(df):
    
    # Create a copy of the DataFrame to avoid modifying the original
    df = df.copy()
    
    df['length'] = df['length'].astype(float)
    # Group by 'user_id_fors', 'user_id_day' and 'mode', then sum the distances
    grouped = df.groupby(['user_id_fors', 'user_id_day', 'mode'])['length'].sum().reset_index()

    # Pivot the table to have modes as columns
    pivoted = grouped.pivot_table(
        index=['user_id_fors', 'user_id_day'],
        columns='mode',
        values='length',
        aggfunc='sum'
    ).reset_index()

    # Resample to include missing days and fill NaNs with different values in different columns
    pivoted['date'] = pd.to_datetime(pivoted['user_id_day'].str[-8:])
    # Create a date range covering the entire date range for each ID
    date_ranges = pivoted.groupby('user_id_fors')['date'].agg(['min', 'max']).reset_index()
    date_ranges['legs_date'] = date_ranges.apply(lambda row: pd.date_range(row['min'], row['max'], freq='D'), axis=1)

    # Create a Cartesian product of IDs and date ranges
    cartesian = date_ranges.explode('legs_date').reset_index(drop=True)

    # Complete the original df with a continuous timeline
    pivoted_filled = pd.merge(pivoted, cartesian[['user_id_fors', 'legs_date']], how='outer', left_on=['user_id_fors', 'date'],
                              right_on=['user_id_fors', 'legs_date'])

    # Create 'days_without_track': 1 for rows added by the calendar fill, 0 for observed days
    pivoted_filled['days_without_track'] = pivoted_filled['date'].isnull().astype(int)
    del pivoted_filled['date']

    # Fill missing values in the user_id_day column
    pivoted_filled['user_id_day'] = pivoted_filled.apply(
        lambda row: row['user_id_day'] if not pd.isnull(row['user_id_day'])
        else row['user_id_fors'] + "_" +
             row['legs_date'].strftime('%Y%m%d'),
        axis=1
    )

    # Fill missing values in the modes columns
    # Get the columns that start with 'Mode::'
    modes_columns = [col for col in pivoted_filled.columns if col.startswith('Mode::')]

    # Fill missing values in the 'modes_columns' with 0
    pivoted_filled[modes_columns] = pivoted_filled[modes_columns].fillna(0)

    # Sort the resulting DataFrame
    pivoted_filled.sort_values(by=['user_id_fors', 'legs_date'], inplace=True)

    return pivoted_filled
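
The calendar-filling step inside `get_daily_modal_distances` can be illustrated in isolation. The sketch below is a simplified variant on toy data, using `reindex` per user rather than the merge used above; `fill_calendar` and the distances are hypothetical.

```python
import pandas as pd

# Toy daily distances in meters (hypothetical): CH01 has no track on 2023-05-03.
daily = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01", "CH02"],
    "legs_date": pd.to_datetime(["2023-05-02", "2023-05-04", "2023-05-02"]),
    "Mode::Car": [1000.0, 3000.0, 500.0],
})

def fill_calendar(group):
    """Reindex one user's rows on every day between first and last observation."""
    full_range = pd.date_range(group["legs_date"].min(), group["legs_date"].max(), freq="D")
    out = group.set_index("legs_date").reindex(full_range)
    out.index.name = "legs_date"
    # Rows created by the reindex have no user ID yet: flag them, then fill.
    out["days_without_track"] = out["user_id_fors"].isna().astype(int)
    out["user_id_fors"] = group["user_id_fors"].iloc[0]
    out["Mode::Car"] = out["Mode::Car"].fillna(0.0)
    return out.reset_index()

filled = pd.concat(
    [fill_calendar(g) for _, g in daily.groupby("user_id_fors")],
    ignore_index=True,
)
print(filled)
```

As in the function above, only gaps between a user's first and last observed day are filled; days before the first or after the last track are not added.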


In [15]:
dmd = get_daily_modal_distances(legs_N_KT)
dmd.head()

Unnamed: 0,user_id_fors,user_id_day,Mode::Airplane,Mode::Bicycle,Mode::Bikesharing,Mode::Boat,Mode::Bus,Mode::Car,Mode::Carsharing,Mode::Ebicycle,Mode::Ecar,Mode::KickScooter,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::RegionalTrain,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk,legs_date,days_without_track
0,CH01,CH01_20230502,0.0,0.0,0.0,0.0,0.0,65124.043163,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,179.019953,2023-05-02,0
1,CH01,CH01_20230503,0.0,0.0,0.0,0.0,0.0,109684.885687,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,97.61172,2023-05-03,0
2,CH01,CH01_20230504,0.0,0.0,0.0,0.0,0.0,109406.18637,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,451.83873,2023-05-04,0
3,CH01,CH01_20230505,0.0,0.0,0.0,0.0,0.0,127922.707667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1220.826887,2023-05-05,0
4,CH01,CH01_20230506,0.0,0.0,0.0,0.0,0.0,301541.634238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,8033.454326,2023-05-06,0


### Mean daily distance per user, in meters

In [None]:
# Select the columns of dmd that start with 'Mode::'
mode_columns = dmd.filter(like='Mode::')

# Mean daily distance per user and mode, counting the zero-distance days
mean_mode_per_user = mode_columns.groupby(dmd['user_id_fors']).mean()
mean_mode_per_user
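
Because the filled days carry a distance of 0, they lower the mean. The sketch below, on hypothetical values, contrasts the mean over all calendar days with the mean over tracked days only, which is the distinction the `days_without_track` flag is meant to support.

```python
import pandas as pd

# Toy daily table for one user (hypothetical distances in meters).
dmd_toy = pd.DataFrame({
    "user_id_fors": ["CH01"] * 4,
    "Mode::Car": [10000.0, 0.0, 0.0, 30000.0],
    "days_without_track": [0, 1, 1, 0],
})

# Mean over every calendar day: untracked days count as zero-distance days.
mean_all_days = dmd_toy.groupby("user_id_fors")["Mode::Car"].mean()

# Mean over tracked days only: untracked days are treated as missing.
tracked = dmd_toy[dmd_toy["days_without_track"] == 0]
mean_tracked_days = tracked.groupby("user_id_fors")["Mode::Car"].mean()

print(mean_all_days["CH01"], mean_tracked_days["CH01"])
```

Which of the two is appropriate depends on whether an untracked day is a genuine non-mobile day or a detection failure, which this sketch does not attempt to decide.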

### Total distance per user, in meters

In [None]:
# Select the columns of dmd that start with 'Mode::'
mode_columns = dmd.filter(like='Mode::')

# Total distance per user and mode
sum_mode_per_user = mode_columns.groupby(dmd['user_id_fors']).sum()


sum_mode_per_user


In [None]:
sum_mode_per_user_ = pd.merge(sum_mode_per_user.reset_index(), usr_stats, how='left', on='user_id_fors')
sum_mode_per_user_

In [None]:
import matplotlib.pyplot as plt

df = sum_mode_per_user_.copy()

# Select only columns that start with 'Mode::'
mode_cols = df.filter(like='Mode::')

# Divide each 'Mode::' column by 'active_days_count'
for col in mode_cols.columns:
    df[col] = df[col] / df['active_days_count']

# Calculate the mean for each 'Mode::' column
mode_means = df.filter(like='Mode::').mean()

# Plotting a Pie Chart
plt.figure(figsize=(10, 6))
plt.pie(mode_means, labels=mode_means.index, autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Modal Shares')
plt.show()
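
The pie chart normalises the across-user mean of daily distances, so users who travel far weigh more in the shares. A minimal sketch on hypothetical per-user means makes the shares explicit and contrasts them with the alternative of averaging each user's own shares.

```python
import pandas as pd

# Mean daily meters per user and mode (hypothetical values).
per_user = pd.DataFrame(
    {"Mode::Car": [20000.0, 0.0], "Mode::Walk": [1000.0, 3000.0]},
    index=["CH01", "CH02"],
)

# Shares as plotted above: average the distances first, then normalise.
mode_means = per_user.mean()
distance_weighted_shares = mode_means / mode_means.sum()

# Alternative: compute each user's shares first, then average them.
user_shares = per_user.div(per_user.sum(axis=1), axis=0)
user_weighted_shares = user_shares.mean()

print(distance_weighted_shares.round(3).to_dict())
print(user_weighted_shares.round(3).to_dict())
```

The first variant is the one relevant to total kilometres travelled (and hence to emissions); the second describes the typical user and can differ markedly when distances are skewed.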