# Module 1: Modal shares

**Description**: This module carries out a first calculation of kilometric modal shares, i.e. average daily distances by mode and by purpose.

**Estimated duration, first part**: 7 days

**Specific objectives**:
- [x] Sub-sample residents and visitors by canton (based on GPS)
- [x] Resample observation days to obtain a continuous calendar per user
- [ ] Integrate transit details
- [x] Distinguish as systematically as possible days without travel from undetected days, and compare statistically with the non-travel days found in other databases
- [ ] Recode modes and purposes according to the cantons' needs
- [x] Preliminary calculation of modal shares by kilometre and by trip
- [ ] Add equipment data (e.g. main motorisation type of the household)
- [ ] Document the assumptions and limits of the preliminary modal share calculation (e.g. seasonal aspects, sampling, signal loss, averaging of longitudinal data, ...)

**Expected results**: Kilometric modal shares by mode for the residents and visitors of each canton, with a view to computing carbon emissions. It must be possible to compute modal shares while accounting for non-mobile days.

**Sub-sampling**:
- Vaud: residents of the canton
- Geneva: residents of the canton

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [3]:
import geopandas as gpd
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np

from shapely import geometry, ops
from shapely.geometry import MultiLineString, LineString, Point
import os
import concurrent.futures
from shapely.ops import unary_union
from shapely.geometry import JOIN_STYLE, Polygon, MultiPolygon

import pycountry
import xyt

import time

from panel_functions import *

### Load the data

In [4]:
%%time
# Define the project CRS (EPSG:4326, i.e. WGS84)
target_crs = 'EPSG:4326'
print("CRS du projet: WGS84 \n")

# Load the legs
legs = pd.read_pickle('../Data/dumps_motiontag/storyline_time_space_filters/legs.pkl')
print("Fichier étape chargé")

# Load the activities (stay points)
staypoints = pd.read_pickle('../Data/dumps_motiontag/storyline_formated/staypoints.pkl').reset_index(drop=True)
print("Fichier activité chargé")

# Load the user statistics
usr_stats = pd.read_pickle('../Data/processed_feuille_de_route/gps_user_statistics.pkl')
print("Fichier statistiques utilisateur·ices chargé")

# Load the trips
trips = pd.read_csv('../Data/dumps_motiontag/Trips.2023-04-01--2023-08-31.csv')
print("Fichier des déplacements chargé")

# Load the official zoning used for geographic sub-sampling: https://opendata.swiss/de/dataset/vm-uvek-zones-2017/resource/29b98f2c-42f2-4e72-b8b1-a39500ed0ad0
TAZ = gpd.read_file('../../Vague1/Verkehrszonen_Schweiz_NPVM_2017_shp/Verkehrszonen_Schweiz_NPVM_2017.shp')
TAZ = TAZ[['ID_Agglo', 'N_Agglo', 'N_KT', 'ID_Gem', 'geometry']]
TAZ = TAZ.to_crs(crs=target_crs)
# Repair invalid geometries with a zero-width buffer
TAZ['geometry'] = TAZ['geometry'].buffer(0)
shp_KT = TAZ.dissolve(by='N_KT').reset_index()
print("Fichier Zones de traffic chargé")

# Get world countries GeoDataFrame
def get_world_countries():
    world_countries = gpd.read_file('../Data/other_shp/countries/ne_110m_admin_0_countries.shp')
    world_countries = world_countries[['SOVEREIGNT','geometry']]
    return world_countries
world_countries = get_world_countries()
print("Fichier Map Monde chargé")

# Get perimetre panel GeoDataFrame
perimetre_panel = gpd.read_file('../Data/other_shp/perimetre_panel/perimetre_panel_08.01.24.shp')
perimetre_panel = perimetre_panel.to_crs(crs=target_crs)
perimetre_panel = perimetre_panel[['COMM_ID','COMM_NAME','Typo_panel','geometry']]
perimetre_panel['panel_area'] = 1

perimetre_panel_full = perimetre_panel.dissolve().geometry.apply(lambda p: close_holes(p))
perimetre_panel_full = gpd.GeoDataFrame(geometry=[perimetre_panel_full.iloc[0]], crs=target_crs)
perimetre_panel_full['panel_area'] = 1
print("Fichier Périmètre panel chargé")

CRS du projet: WGS84 

Fichier étape chargé
Fichier activité chargé
Fichier statistiques utilisateur·ices chargé
Fichier des déplacements chargé
Fichier Zones de traffic chargé
Fichier Map Monde chargé
Fichier Périmètre panel chargé
CPU times: user 17.5 s, sys: 3.89 s, total: 21.4 s
Wall time: 21.9 s


#### Add a user-day ID

In [5]:
# Add the leg date and the user-day key (user_id_day)
legs.insert(1, 'legs_date',legs.started_at.dt.date)
legs['legs_date'] = pd.to_datetime(legs['legs_date'])

legs.insert(
    1,"user_id_day",legs["user_id_fors"]
    + "_" 
    + legs.started_at.dt.year.astype(str)
    + legs.started_at.dt.month.astype(str).str.zfill(2)
    + legs.started_at.dt.day.astype(str).str.zfill(2),
)
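
The day key above is assembled by concatenating year, month and day by hand. A minimal equivalent sketch on toy data, assuming `started_at` is a datetime column, builds the same zero-padded suffix with `dt.strftime`. The toy frame and its values are illustrative only.

```python
import pandas as pd

# Toy legs table (hypothetical values); column names follow the notebook.
toy_legs = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01", "FR996"],
    "started_at": pd.to_datetime([
        "2023-05-02 14:46:42", "2023-05-03 04:26:51", "2023-08-09 23:59:59",
    ]),
})

# Zero-padded YYYYMMDD suffix produced in a single call.
toy_legs["user_id_day"] = (
    toy_legs["user_id_fors"] + "_" + toy_legs["started_at"].dt.strftime("%Y%m%d")
)

# Date truncated to midnight, kept as datetime64 so it merges cleanly later.
toy_legs["legs_date"] = toy_legs["started_at"].dt.normalize()

print(toy_legs)
```

Both forms yield keys such as `CH01_20230502`; the `strftime` version only removes the manual `zfill` padding.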

### Add the *next activity_id* to the legs

In [6]:
# Sort 'staypoints' and 'legs' by user and 'started_at' to ensure data is in chronological order
staypoints.sort_values(by=['user_id_fors','started_at'], inplace=True, ignore_index=True)
legs.sort_values(by=['user_id_fors','started_at'], inplace=True)

In [7]:
legs.head(1)

Unnamed: 0,leg_id,user_id_day,legs_date,user_id_motiontag,type,started_at,started_at_timezone,finished_at,finished_at_timezone,length,detected_mode,mode,purpose,geometry,confirmed_at,started_on,misdetected_completely,merged,created_at,updated_at,started_at_in_timezone,finished_at_in_timezone,confirmed_at_in_timezone,created_at_in_timezone,updated_at_in_timezone,point_per_linestring,max_signlalloss_meters,length_leg,rel_max_signalloss,low_quality_legs_1,low_quality_legs_2,usr_w_constant_bad_signal,user_id_fors
498126,aa257257-d427-4a84-81b1-fe3bad92050b,CH01_20230502,2023-05-02,be168a66-975b-4558-98e8-524c04352039,Track,2023-05-02 14:46:42,Europe/Zurich,2023-05-02 15:56:14,Europe/Zurich,54817.0,Mode::Car,Mode::Car,,"LINESTRING (6.58428 46.54247, 6.58428 46.54247...",2023-05-03 11:14:02.259,2023-05-02,f,f,2023-05-02 16:07:09.917,2023-05-03 11:14:02.259,2023-05-02T16:46:42+02:00,2023-05-02T17:56:14+02:00,2023-05-03T11:14:02Z,2023-05-02T16:07:09Z,2023-05-03T11:14:02Z,1330,3909.203141,54830.991413,0.071296,0,0,0,CH01


In [8]:
result = staypoints.copy()
result['finished_at'] = pd.to_datetime(result['finished_at'], format='%Y-%m-%d %H:%M:%S')
result.sort_values(by=['user_id_fors','finished_at'], inplace=True)


previous_leg = legs[['user_id_fors', 'finished_at', 'leg_id', 'mode']].rename(columns={'finished_at': 'started_at', 'leg_id': 'previous_leg_id', 'mode':'previous_mode'})
previous_leg['started_at'] = pd.to_datetime(previous_leg['started_at'], format='%Y-%m-%d %H:%M:%S')
previous_leg.sort_values(by=['user_id_fors','started_at'], inplace=True)
previous_leg.dropna(inplace=True)

# Attach the leg that ends exactly when the stay starts (exact timestamp match)
result = pd.merge(result, previous_leg, on=['user_id_fors','started_at'], how='left')


# Merge 'staypoints' with 'legs' to find the next leg
next_leg = legs[['user_id_fors', 'started_at', 'leg_id', 'mode']].rename(columns={'started_at': 'finished_at', 'leg_id': 'next_leg_id', 'mode':'next_mode'})
next_leg['finished_at'] = pd.to_datetime(next_leg['finished_at'], format='%Y-%m-%d %H:%M:%S')
next_leg.sort_values(by=['user_id_fors','finished_at'], inplace=True)
next_leg.dropna(inplace=True)

# Attach the leg that starts exactly when the stay ends (exact timestamp match)
result = pd.merge(result, next_leg, on=['user_id_fors','finished_at'], how='left')


# Restore chronological order per user
result.sort_values(by=['user_id_fors','started_at'], inplace=True)

# Replace staypoints with the enriched table
staypoints = result.copy()
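
The two merges above link a stay to a leg only when the timestamps are strictly equal. The sketch below reproduces that exact match on toy data and contrasts it with a tolerance-based variant built on `pandas.merge_asof`. The 60-second tolerance and all values are assumptions for illustration, not settings used in this notebook.

```python
import pandas as pd

# Toy stays and leg end times (hypothetical values).
stays = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01"],
    "activity_id": ["s1", "s2"],
    "started_at": pd.to_datetime(["2023-05-02 15:56:14", "2023-05-02 18:39:20"]),
})
leg_ends = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01"],
    "previous_leg_id": ["l1", "l2"],
    "started_at": pd.to_datetime(["2023-05-02 15:56:14", "2023-05-02 18:39:12"]),
})

# Exact match, as in the notebook: s2 starts 8 s after l2 ends, so it stays unmatched.
exact = stays.merge(leg_ends, on=["user_id_fors", "started_at"], how="left")

# Tolerant match: the latest leg of the same user ending at most 60 s before the stay.
tolerant = pd.merge_asof(
    stays.sort_values("started_at"),
    leg_ends.sort_values("started_at"),
    on="started_at",
    by="user_id_fors",
    direction="backward",
    tolerance=pd.Timedelta(seconds=60),
)

print(exact[["activity_id", "previous_leg_id"]])
print(tolerant[["activity_id", "previous_leg_id"]])
```

Whether a tolerance is appropriate depends on how the tracking app aligns stay and leg boundaries; the exact match is the stricter choice.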

In [9]:
legs = pd.merge(legs, staypoints[['activity_id', 'previous_leg_id']],
               left_on='leg_id', right_on='previous_leg_id', how='left')
legs.rename(columns={'activity_id':'leading_stay_id'}, inplace=True)
del legs['previous_leg_id']

### Add the duration and length of the legs

In [10]:
%%time 
# Add length in meters
legs['length'] = legs.to_crs('EPSG:2056').length
# Add the duration in seconds
legs['duration'] = (legs['finished_at'] - legs['started_at']).dt.total_seconds()

CPU times: user 1min 2s, sys: 21.2 s, total: 1min 23s
Wall time: 1min 39s
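
The length is computed after reprojecting to EPSG:2056 (the Swiss LV95 metric grid) because coordinates in EPSG:4326 are degrees, not metres. As a rough cross-check that needs no GIS library, the sketch below sums great-circle segment lengths with the haversine formula. The helper names and the spherical Earth radius are assumptions of this sketch, and results will differ slightly from projected lengths.

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

def polyline_length_m(coords):
    """Sum of segment lengths for a list of (lon, lat) tuples."""
    return sum(haversine_m(*p, *q) for p, q in zip(coords, coords[1:]))

# Straight line between the first two stay points shown in this notebook.
coords = [(6.58428, 46.54247), (6.94385, 46.27073)]
straight_m = polyline_length_m(coords)
print(round(straight_m))
```

The straight-line distance is a lower bound on the recorded car leg between these two stays, which follows the road network.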


### Extract the geographic areas and the sub-samples (Geneva and Vaud)
We use the traffic zones of the ARE passenger transport model (NPVM 2017).

We want to sample:
- all the residents of the canton of Geneva
- all the activities that take place in the canton of Geneva

To do that, we flag the canton of each leg's destination stay in the column `activity_in_KT`, joined to the legs through `leading_stay_id`.

In [11]:
#staypoints_ = staypoints.copy()
#staypoints = staypoints_.copy()

In [12]:
%%time
# Perform spatial join of staypoints with world_countries
staypoints = gpd.sjoin(staypoints, world_countries, how='left', predicate='within').rename(columns={'SOVEREIGNT':'activity_in_country'})

# Fill NaN values in the 'activity_in_country' column with 'Unknown'
staypoints['activity_in_country'] = staypoints['activity_in_country'].fillna('Unknown')
staypoints.drop(columns=['index_right'], inplace=True)

# Perform spatial join with TAZ
staypoints = gpd.sjoin(staypoints, TAZ[['N_KT', 'geometry']], how='left', predicate='within').rename(columns={'N_KT': 'activity_in_KT'})
staypoints.drop(columns=['index_right'], inplace=True)
# Adjust the spatial join for corner cases: any stay matched to a canton is in Switzerland
staypoints.loc[~staypoints.activity_in_KT.isna(),'activity_in_country'] = 'Switzerland'
staypoints['activity_in_KT'] = staypoints['activity_in_KT'].fillna('Other')

# Perform spatial join with the Panel Lemanique area and flag stays inside it (panel_area = 1)
staypoints = gpd.sjoin(staypoints, perimetre_panel_full.dissolve(), how='left', predicate='within')
staypoints.loc[staypoints.panel_area.isna(),'panel_area'] = 0
staypoints['panel_area'] = staypoints.panel_area.astype(int)
staypoints.drop(columns=['index_right'], inplace=True)

# Get the home canton and motorization of the user
staypoints = pd.merge(staypoints, usr_stats[['KT_home_survey','user_id_fors','car_in_HH_count','main_motor']], on='user_id_fors', how='left')

CPU times: user 7.09 s, sys: 2.16 s, total: 9.26 s
Wall time: 9.84 s


In [13]:
%%time
legs = pd.merge(legs, staypoints[['activity_id','activity_in_KT','panel_area','KT_home_survey']].dropna(subset='activity_id'),
                left_on='leading_stay_id',
                right_on='activity_id',
                how='left')
del legs['activity_id']

CPU times: user 1.25 s, sys: 526 ms, total: 1.78 s
Wall time: 1.93 s


#### Map to check the filters

In [14]:
staypoints.head(2)

Unnamed: 0,activity_id,user_id_motiontag,type,started_at,started_at_timezone,finished_at,finished_at_timezone,purpose,geometry,confirmed_at,started_on,misdetected_completely,merged,created_at,updated_at,started_at_in_timezone,finished_at_in_timezone,confirmed_at_in_timezone,created_at_in_timezone,updated_at_in_timezone,user_id_fors,lon,lat,previous_leg_id,previous_mode,next_leg_id,next_mode,activity_in_country,activity_in_KT,panel_area,KT_home_survey,car_in_HH_count,main_motor
0,2f587959-8604-44c6-bf28-f19e2bf0913b,be168a66-975b-4558-98e8-524c04352039,Stay,2023-05-02 13:04:38,Europe/Zurich,2023-05-02 14:46:42,Europe/Zurich,unknown,POINT (6.58428 46.54247),2023-05-03 11:14:07.967,2023-05-02,f,f,2023-05-02 16:07:09.880,2023-05-03 11:14:07.967,2023-05-02T15:04:38+02:00,2023-05-02T16:46:42+02:00,2023-05-03T11:14:07Z,2023-05-02T16:07:09Z,2023-05-03T11:14:07Z,CH01,6.584277,46.542471,,,aa257257-d427-4a84-81b1-fe3bad92050b,Mode::Car,Switzerland,VD,1,VD,2.0,essence
1,7dd02868-9907-4315-85e6-810913107a65,be168a66-975b-4558-98e8-524c04352039,Stay,2023-05-02 15:56:14,Europe/Zurich,2023-05-02 18:26:22,Europe/Zurich,unknown,POINT (6.94385 46.27073),2023-05-03 11:13:59.761,2023-05-02,f,f,2023-05-03 04:34:05.505,2023-05-03 11:13:59.761,2023-05-02T17:56:14+02:00,2023-05-02T20:26:22+02:00,2023-05-03T11:13:59Z,2023-05-03T04:34:05Z,2023-05-03T11:13:59Z,CH01,6.943852,46.270734,aa257257-d427-4a84-81b1-fe3bad92050b,Mode::Car,5de690c8-5ee3-46d6-8dce-0733532c6c79,Mode::Car,Switzerland,VS,1,VD,2.0,essence


In [15]:
%autoreload
xyt.plot_gps(staypoints[staypoints.activity_in_KT == 'VD'].rename(columns={'user_id_fors':'user_id'}).dropna()[:2000], geo_columns='geometry')

In [16]:
import pandas as pd

def get_daily_modal_distances(df):
    
    # Create a copy of the DataFrame to avoid modifying the original
    df = df.copy()
    
    df['length'] = df['length'].astype(float)
    # Group by user, user-day and mode, then sum the distances
    grouped = df.groupby(['user_id_fors', 'user_id_day', 'mode'])['length'].sum().reset_index()

    # Pivot the table to have modes as columns
    pivoted = grouped.pivot_table(
        index=['user_id_fors', 'user_id_day'],
        columns='mode',
        values='length',
        aggfunc='sum'
    ).reset_index()

    # Resample to include missing days and fill NaNs with different values in different columns
    pivoted['date'] = pd.to_datetime(pivoted['user_id_day'].str[-8:])
    # Create a date range covering the entire date range for each ID
    date_ranges = pivoted.groupby('user_id_fors')['date'].agg(['min', 'max']).reset_index()
    date_ranges['legs_date'] = date_ranges.apply(lambda row: pd.date_range(row['min'], row['max'], freq='D'), axis=1)

    # Create a Cartesian product of IDs and date ranges
    cartesian = date_ranges.explode('legs_date').reset_index(drop=True)

    # Complete the original df with a continuous timeline
    pivoted_filled = pd.merge(pivoted, cartesian[['user_id_fors', 'legs_date']], how='outer', left_on=['user_id_fors', 'date'],
                              right_on=['user_id_fors', 'legs_date'])

    # Create 'days_without_track': 1 for days added by the calendar fill, 0 otherwise
    pivoted_filled['days_without_track'] = pivoted_filled['date'].isnull().astype(int)
    del pivoted_filled['date']

    # Fill missing values in the user_id_day column
    pivoted_filled['user_id_day'] = pivoted_filled.apply(
        lambda row: row['user_id_day'] if not pd.isnull(row['user_id_day'])
        else row['user_id_fors'] + "_" +
             row['legs_date'].strftime('%Y%m%d'),
        axis=1
    )

    # Fill missing values in the modes columns
    # Get the columns that start with 'Mode::'
    modes_columns = [col for col in pivoted_filled.columns if col.startswith('Mode::')]

    # Fill missing values in the 'modes_columns' with 0
    pivoted_filled[modes_columns] = pivoted_filled[modes_columns].fillna(0)

    # Sort the resulting DataFrame
    pivoted_filled.sort_values(by=['user_id_fors', 'legs_date'], inplace=True)

    return pivoted_filled
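
The continuous-calendar step in the function above can be illustrated on a small table. The sketch below is an alternative formulation using `reindex` rather than the explode-and-merge used in the notebook. The helper `fill_calendar` and all values are hypothetical.

```python
import pandas as pd

# Toy daily distances in metres (hypothetical values).
daily = pd.DataFrame({
    "user_id_fors": ["CH01", "CH01", "FR996"],
    "legs_date": pd.to_datetime(["2023-05-02", "2023-05-05", "2023-05-03"]),
    "Mode::Car": [54831.0, 10293.0, 0.0],
    "Mode::Walk": [179.0, 0.0, 98.0],
})

def fill_calendar(user_days):
    """Reindex one user's rows on every day between first and last observation."""
    days = pd.date_range(user_days["legs_date"].min(), user_days["legs_date"].max(), freq="D")
    out = user_days.set_index("legs_date").drop(columns="user_id_fors").reindex(days)
    # Days created by the reindex have no value in any mode column.
    out["days_without_track"] = out.isna().all(axis=1).astype(int)
    return out.fillna(0.0).rename_axis("legs_date")

parts = []
for uid, grp in daily.groupby("user_id_fors"):
    part = fill_calendar(grp).reset_index()
    part.insert(0, "user_id_fors", uid)
    parts.append(part)
filled = pd.concat(parts, ignore_index=True)

# Mean daily car distance for CH01 over all four calendar days, zero days included.
ch01_car_mean = filled.loc[filled["user_id_fors"] == "CH01", "Mode::Car"].mean()
print(filled)
print(ch01_car_mean)
```

Averaging over the filled calendar spreads the distance over days without tracks, which is what lowers the daily mean compared with averaging over tracked days only.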


### Get the weighted distance per user, in metres

In [17]:
col_order = ['leg_id', 'user_id_day', 'user_id_fors', 'user_id_motiontag', 'type',
 'geometry', 'legs_date', 'started_at', 'started_at_timezone', 
 'finished_at','finished_at_timezone', 
 'length', 'detected_mode', 'mode', 'purpose',
 'confirmed_at', 'started_on', 'misdetected_completely', 'merged',
 'created_at', 'updated_at', 'started_at_in_timezone',
 'finished_at_in_timezone', 'confirmed_at_in_timezone',
 'created_at_in_timezone', 'updated_at_in_timezone',
 'point_per_linestring', 'max_signlalloss_meters', 'length_leg',
 'rel_max_signalloss', 'low_quality_legs_1', 'low_quality_legs_2',
 'usr_w_constant_bad_signal', 'leading_stay_id',
 'duration', 'activity_in_KT', 'panel_area', 'KT_home_survey']

legs = legs[col_order].dropna(subset='user_id_fors')
legs.sort_values(by=['user_id_fors','started_at'], inplace=True)

In [23]:
# OUTPUT for streamlit app

legs_nogeometry = legs.copy()
del legs_nogeometry['geometry']
del legs_nogeometry['user_id_motiontag']
legs_nogeometry.to_pickle('../Data/processed_feuille_de_route/legs_nogeometry.pkl')

# And for other usages
#legs.to_pickle('../Data/processed_feuille_de_route/legs.pkl')

In [24]:
legs_nogeometry.head()

Unnamed: 0,leg_id,user_id_day,user_id_fors,type,legs_date,started_at,started_at_timezone,finished_at,finished_at_timezone,length,detected_mode,mode,purpose,confirmed_at,started_on,misdetected_completely,merged,created_at,updated_at,started_at_in_timezone,finished_at_in_timezone,confirmed_at_in_timezone,created_at_in_timezone,updated_at_in_timezone,point_per_linestring,max_signlalloss_meters,length_leg,rel_max_signalloss,low_quality_legs_1,low_quality_legs_2,usr_w_constant_bad_signal,leading_stay_id,duration,activity_in_KT,panel_area,KT_home_survey
0,aa257257-d427-4a84-81b1-fe3bad92050b,CH01_20230502,CH01,Track,2023-05-02,2023-05-02 14:46:42,Europe/Zurich,2023-05-02 15:56:14,Europe/Zurich,54830.991458,Mode::Car,Mode::Car,,2023-05-03 11:14:02.259,2023-05-02,f,f,2023-05-02 16:07:09.917,2023-05-03 11:14:02.259,2023-05-02T16:46:42+02:00,2023-05-02T17:56:14+02:00,2023-05-03T11:14:02Z,2023-05-02T16:07:09Z,2023-05-03T11:14:02Z,1330,3909.203141,54830.991413,0.071296,0,0,0,7dd02868-9907-4315-85e6-810913107a65,4172.0,VS,1.0,VD
1,5de690c8-5ee3-46d6-8dce-0733532c6c79,CH01_20230502,CH01,Track,2023-05-02,2023-05-02 18:26:22,Europe/Zurich,2023-05-02 18:39:12,Europe/Zurich,10293.051705,Mode::Car,Mode::Car,,2023-05-03 11:13:47.741,2023-05-02,f,f,2023-05-03 04:34:05.577,2023-05-03 11:13:47.741,2023-05-02T20:26:22+02:00,2023-05-02T20:39:12+02:00,2023-05-03T11:13:47Z,2023-05-03T04:34:05Z,2023-05-03T11:13:47Z,405,776.067258,10293.051697,0.075397,0,0,0,,770.0,,,
2,a6fa18d2-a25b-4f9d-ba5e-f91bc758774f,CH01_20230502,CH01,Track,2023-05-02,2023-05-02 18:39:17,Europe/Zurich,2023-05-02 18:48:41,Europe/Zurich,179.019953,Mode::Walk,Mode::Walk,,2023-05-03 11:13:43.612,2023-05-02,f,f,2023-05-03 04:34:05.618,2023-05-03 11:13:43.612,2023-05-02T20:39:17+02:00,2023-05-02T20:48:41+02:00,2023-05-03T11:13:43Z,2023-05-03T04:34:05Z,2023-05-03T11:13:43Z,75,44.59998,179.019953,0.249134,0,0,0,59acb1a8-2a61-4cba-b2ea-cb563e7ae7fa,564.0,VD,1.0,VD
3,4f2b8865-3290-4d4c-826d-7524d2278da9,CH01_20230503,CH01,Track,2023-05-03,2023-05-03 04:26:51,Europe/Zurich,2023-05-03 05:13:57,Europe/Zurich,55009.776548,Mode::Car,Mode::Car,,2023-05-03 12:23:52.571,2023-05-03,f,f,2023-05-03 05:28:11.735,2023-05-03 12:23:52.571,2023-05-03T06:26:51+02:00,2023-05-03T07:13:57+02:00,2023-05-03T12:23:52Z,2023-05-03T05:28:11Z,2023-05-03T12:23:52Z,1329,2432.354157,55009.776503,0.044217,0,0,0,872eee0c-b72e-4e1f-8a20-f12c5e9ff5a7,2826.0,VD,1.0,VD
4,c17d2027-28a0-4124-a7e1-2ed227436310,CH01_20230503,CH01,Track,2023-05-03,2023-05-03 09:49:49,Europe/Zurich,2023-05-03 09:50:45,Europe/Zurich,97.61172,Mode::Walk,Mode::Walk,,2023-05-03 12:24:02.279,2023-05-03,f,f,2023-05-03 10:00:16.995,2023-05-03 12:24:02.279,2023-05-03T11:49:49+02:00,2023-05-03T11:50:45+02:00,2023-05-03T12:24:02Z,2023-05-03T10:00:16Z,2023-05-03T12:24:02Z,19,44.689762,97.61172,0.457832,0,0,0,a6ec5bd7-1c05-4845-830d-2a9ac870b9a0,56.0,VD,1.0,VD


In [38]:
#Compute daily modal distances
def calculate_dmd(legs_nogeom, usr_stats, KT, weight, period_of_tracking, visitors, airplane, incl_signal_loss):

    legs_nogeometry = legs_nogeom.copy()
    # Filter Airplane if needed
    if not airplane:
        legs_nogeometry = legs_nogeometry[legs_nogeometry['mode'] != 'Mode::Airplane'].copy()

    # Filter tracks with signal loss
    if not incl_signal_loss:
        legs_nogeometry = legs_nogeometry[legs_nogeometry['low_quality_legs_1'] == 0].copy()
        
        
    # Map each user ID to its weight; with 'Aucun' (no weighting) every user gets 1
    if weight == 'Aucun':
        weight_mapping = dict.fromkeys(usr_stats['user_id_fors'], 1)
    else:
        weight_mapping = usr_stats.set_index('user_id_fors')[weight].to_dict()
    
    # Creating a dictionary mapping user IDs to their corresponding period of tracking values
    active_days_mapping = usr_stats.set_index('user_id_fors')[period_of_tracking].to_dict()
    
    # Setting the condition based on the value of KT
    KT_condition = (legs_nogeometry['KT_home_survey'] == KT) if KT != 'Tous' else np.full(len(legs_nogeometry), True)
    
    # Setting the visit condition if visitors are considered
    visit_condition = (legs_nogeometry['activity_in_KT'] == KT) if visitors else None
    
    # Applying conditions and computing daily modal distances
    if visitors:
        dmd_condition = KT_condition | visit_condition
    else:
        dmd_condition = KT_condition
    
    dmd = get_daily_modal_distances(legs_nogeometry[dmd_condition])
    
    # Filtering columns that start with 'Mode::' for further calculations
    mode_columns = dmd.filter(like='Mode::')
    
    # Calculating the sum for each 'Mode::' column for each user_id
    sum_mode_per_user = mode_columns.groupby(dmd['user_id_fors']).sum()
    
    # Weighting the sum of each 'Mode::' column based on user weights and active days
    sum_mode_per_user_w = sum_mode_per_user.mul(sum_mode_per_user.index.map(weight_mapping), axis=0).div(sum_mode_per_user.index.map(active_days_mapping), axis=0).dropna()

    return sum_mode_per_user_w.astype(int), active_days_mapping
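
The weighting step in `calculate_dmd` multiplies each user's cumulative distance by a survey weight and divides it by the number of tracked days. A minimal worked sketch with made-up numbers follows. The column names `weight` and `active_days` are hypothetical stand-ins for the `usr_stats` columns selected through the `weight` and `period_of_tracking` arguments.

```python
import pandas as pd

# Cumulative distance per user and mode, in metres (hypothetical values).
totals = pd.DataFrame(
    {"Mode::Car": [100_000.0, 20_000.0], "Mode::Walk": [5_000.0, 8_000.0]},
    index=pd.Index(["CH01", "FR996"], name="user_id_fors"),
)
# Hypothetical survey weight and number of tracked days per user.
users = pd.DataFrame(
    {"weight": [1.5, 0.5], "active_days": [10, 20]},
    index=pd.Index(["CH01", "FR996"], name="user_id_fors"),
)

# Weighted average daily distance: total * weight / active days, row by row.
weighted_daily = totals.mul(users["weight"], axis=0).div(users["active_days"], axis=0)

# Kilometric modal shares: each mode's weighted distance over the grand total.
shares = weighted_daily.sum() / weighted_daily.sum().sum()

print(weighted_daily)
print(shares.round(3))
```

Because the shares are a ratio, multiplying every weight by the same constant leaves them unchanged; only the relative weights between users matter.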


In [26]:
# IMPORTANT Check if conditions work well

KT = 'GE'
visitors = True
# Setting the condition based on the value of KT
KT_condition = (legs_nogeometry['KT_home_survey'] == KT) if KT != 'Tous' else np.full(len(legs_nogeometry), True)

# Setting the visit condition if visitors are considered
visit_condition = (legs_nogeometry['activity_in_KT'] == KT) if visitors else None

# Applying conditions and computing daily modal distances
if visitors:
    dmd_condition = KT_condition | visit_condition
else:
    dmd_condition = KT_condition

len(legs_nogeometry[dmd_condition])

129720

In [27]:
# Residents
n_res = len(legs_nogeometry[legs_nogeometry['KT_home_survey'] == KT])
# Visitors but NOT residents
n_vis = len(legs_nogeometry[(legs_nogeometry['KT_home_survey'] != KT) & (legs_nogeometry['activity_in_KT'] == KT)])
n_res + n_vis

129720
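
The two counts agree because the union of "resident" and "activity in the canton" splits into residents plus non-resident visitors. The same check on a toy table, with hypothetical values, also shows that a missing home canton is counted on the visitor side:

```python
import pandas as pd

# Toy legs: home canton of the user and canton of the destination stay.
toy = pd.DataFrame({
    "KT_home_survey": ["GE", "GE", "VD", "VD", None],
    "activity_in_KT": ["GE", "VD", "GE", "VD", "GE"],
})
KT = "GE"

is_resident = toy["KT_home_survey"] == KT   # a missing home canton compares as False
is_visit = toy["activity_in_KT"] == KT

n_union = int((is_resident | is_visit).sum())
n_res = int(is_resident.sum())
n_vis_only = int((~is_resident & is_visit).sum())

print(n_union, n_res, n_vis_only)
```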

In [28]:
import pandas as pd

def dmd_aggreg_modes(dmd, level):
    df = dmd.copy()

    if level == "Motiontag":
        return df
    else:
        if level == "MRMT":
            # First level of mode mapping
            mode_mapping = {
                'Voiture conducteur': ['Mode::Car', 'Mode::Carsharing','Mode::Ecar'],
                'Taxi': ['Mode::TaxiUber'],
                '2RM': ['Mode::KickScooter','Mode::Motorbike'],
                'Train': ['Mode::RegionalTrain','Mode::Train'],
                'Bus': ['Mode::Bus'],
                'Tram/Métro': ['Mode::LightRail','Mode::Subway','Mode::Tram'],
                'Bateau': ['Mode::Boat'],
                'Marche': ['Mode::Walk'],
                'Vélo conventionnel': ['Mode::Bicycle', 'Mode::Bikesharing'],
                'Vélo électrique': ['Mode::Ebicycle'],
                'Engins assimilés à des véhicules': ['Mode::Other'],
                'Avion': ['Mode::Airplane']
            }
    
        elif level == "Niveau 1":
            # Second level of mode mapping
            mode_mapping = {
                'Voiture conducteur': ['Mode::Car', 'Mode::Carsharing','Mode::Ecar','Mode::TaxiUber'],
                '2RM': ['Mode::KickScooter', 'Mode::Motorbike'],
                'Train': ['Mode::Train','Mode::RegionalTrain'],
                'Autre TP': ['Mode::Bus','Mode::LightRail','Mode::Subway','Mode::Tram','Mode::Boat'],
                'Marche': ['Mode::Walk'],
                'Vélo': ['Mode::Bicycle', 'Mode::Bikesharing','Mode::Ebicycle'],
                'Autre': ['Mode::Other'],
                'Avion': ['Mode::Airplane']
            }
    
        elif level == "Niveau 2":
            # Third level of mode mapping
            mode_mapping = {
                'TIM': ['Mode::Car', 'Mode::Carsharing','Mode::Ecar', 'Mode::KickScooter','Mode::Motorbike','Mode::TaxiUber'],
                'TP': ['Mode::Boat','Mode::Bus','Mode::LightRail','Mode::RegionalTrain', 'Mode::Subway','Mode::Train', 'Mode::Tram'],
                'MD': ['Mode::Bicycle', 'Mode::Bikesharing','Mode::Ebicycle', 'Mode::Walk'],
                'Avion': ['Mode::Airplane'],
                'Autre': ['Mode::Other']
            }
        
        else:
            raise ValueError("Invalid level. Please choose Motiontag, MRMT, Niveau 1 or Niveau 2 for the desired level.")
        
        # Create new columns based on the mapping
        for new_column, modes in mode_mapping.items():
            # Check if modes exist in columns before summing
            valid_modes = [mode for mode in modes if mode in df.columns]
            df[new_column] = df[valid_modes].sum(axis=1, min_count=1)
        
        # Create a new DataFrame with the new columns
        new_dmd = df[list(mode_mapping.keys())].copy()
        
        # Check if 'Avion' column is full of NaN, then drop it
        if 'Avion' in new_dmd.columns and new_dmd['Avion'].isnull().all():
            new_dmd.drop(columns=['Avion'], inplace=True)
    
        return new_dmd
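
The aggregation above sums, for each target category, the mode columns that are present in the table. A compact alternative sketch inverts the mapping and groups the transposed table by category. The toy table and the two-category mapping are illustrative, not the notebook's full mappings.

```python
import pandas as pd

# Toy per-user distances in metres (hypothetical values).
dmd_toy = pd.DataFrame(
    {"Mode::Car": [131520, 3361], "Mode::Ecar": [0, 100], "Mode::Walk": [1374, 1054]},
    index=pd.Index(["CH01", "CH10003"], name="user_id_fors"),
)

# Category -> modes, in the same shape as the mappings used in dmd_aggreg_modes.
mode_mapping = {
    "TIM": ["Mode::Car", "Mode::Ecar", "Mode::Motorbike"],
    "MD": ["Mode::Walk", "Mode::Bicycle"],
}

# Invert to mode -> category, then group the mode rows of the transposed table.
mode_to_group = {mode: group for group, modes in mode_mapping.items() for mode in modes}
aggregated = dmd_toy.T.groupby(mode_to_group).sum().T

print(aggregated)
```

Modes listed in the mapping but absent from the data are ignored, as in the loop version. One difference to keep in mind: a mode column missing from the mapping is dropped silently by the grouping.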

In [39]:
# Possible values: 'GE', 'VD', 'Tous'
KT = 'Tous'
# Possible values: 'wgt_agg_trim_gps', 'wgt_cant_gps', 'wgt_agg_gps', 'wgt_cant_trim_gps', 'Aucun'
weight = 'wgt_agg_trim_gps' 

# Selecting the period of tracking for user activities
# Possible values: 'active_days_count', 'days_with_track'
period_of_tracking = 'active_days_count'

visitors = False
airplane = False
incl_signal_loss = True
dmd, active = calculate_dmd(legs_nogeometry, usr_stats, KT, weight, 
              period_of_tracking, visitors, airplane,incl_signal_loss)

dmd

user_id_fors,Mode::Bicycle,Mode::Bikesharing,Mode::Boat,Mode::Bus,Mode::Car,Mode::Carsharing,Mode::Ecar,Mode::Ebicycle,Mode::KickScooter,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::RegionalTrain,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk
CH01,0,0,0,322,131520,0,0,0,0,627,0,0,0,0,0,1632,0,1374
CH10003,546,0,0,35,3361,0,0,0,0,359,0,8,1454,0,0,0,14,1054
CH10039,6593,0,0,2112,158073,0,0,0,0,0,0,0,0,0,0,0,0,2289
CH10068,0,0,0,1208,45341,0,0,0,0,202,0,0,0,0,0,0,0,1516
CH1007,3430,0,0,556,8809,0,0,0,0,852,0,0,0,22,0,33954,0,762
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
FR9891,1924,0,0,0,60647,0,0,0,0,0,0,0,0,0,0,0,25,2848
FR996,2822,0,0,1392,115157,0,0,0,0,0,0,0,105670,8138,0,77469,9049,10753
FR9983,3,0,0,39,5614,0,0,0,0,0,0,0,0,0,0,33,0,132
FR9991,0,0,0,124,18656,0,0,0,0,0,0,0,0,0,0,0,0,233


In [42]:
dist_user_id_day = dmd_aggreg_modes(dmd, level='Niveau 1')
dist_user_id_day.head()

user_id_fors,Voiture conducteur,2RM,Train,Autre TP,Marche,Vélo,Autre
CH01,131520,0,1632,949,1374,0,0
CH10003,3361,0,1454,408,1054,546,8
CH10039,158073,0,0,2112,2289,6593,0
CH10068,45341,0,0,1410,1516,0,0
CH1007,8809,0,33954,1430,762,3430,0


In [31]:
legs_nogeometry_lql = legs_nogeometry[legs_nogeometry.low_quality_legs_1 == 1].copy().reset_index(drop=True)
# calculate_dmd returns (weighted distances, active-days mapping); keep only the distances
dmd_lql, _ = calculate_dmd(legs_nogeometry_lql, usr_stats, KT, weight, 
              period_of_tracking, visitors, airplane,incl_signal_loss)

dist_user_id_day_lql = dmd_aggreg_modes(dmd_lql, level='Niveau 1')
dist_user_id_day_lql.head()

user_id_fors,Voiture conducteur,2RM,Train,Autre TP,Marche,Vélo,Autre
CH01,9267,0,0,0,11,0,0
CH10003,0,0,0,0,30,0,8
CH10039,22417,0,0,0,153,1260,0
CH10068,0,0,0,0,93,0,0
CH1007,206,0,0,0,0,0,0


In [32]:
dist_user_id_day_lql.sum() / dist_user_id_day.sum()

Voiture conducteur    0.160231
2RM                   0.058518
Train                 0.109601
Autre TP              0.080374
Marche                0.203441
Vélo                  0.245603
Autre                 0.149792
dtype: float64

In [33]:
dist_user_id_day_lql.sum().sum() / dist_user_id_day.sum().sum()

0.1519649781539152

In [34]:
(dist_user_id_day_lql / dist_user_id_day).fillna(0).mean()

Voiture conducteur    0.149475
2RM                   0.007020
Train                 0.041241
Autre TP              0.080282
Marche                0.166273
Vélo                  0.108648
Autre                 0.010602
dtype: float64

In [35]:
len(legs_nogeometry[legs_nogeometry.low_quality_legs_1 == 1]) / len(legs_nogeometry)

0.04965262768550662
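
The cells above measure the weight of low-quality legs in different ways, and the figures are not interchangeable. The ratio of sums is distance-weighted and dominated by users who travel far, whereas the mean of per-user ratios gives every user the same weight and counts users with no distance in a mode as 0. A toy sketch with hypothetical values makes the gap visible:

```python
import pandas as pd

# Total and low-quality distances per user and mode, in metres (hypothetical values).
total = pd.DataFrame({"Marche": [1000.0, 10.0], "Train": [50_000.0, 0.0]}, index=["A", "B"])
low_q = pd.DataFrame({"Marche": [100.0, 5.0], "Train": [5_000.0, 0.0]}, index=["A", "B"])

# Distance-weighted share: user A dominates because A walks far more than B.
ratio_of_sums = low_q.sum() / total.sum()

# User-weighted share: each user counts once; 0/0 becomes NaN and is then set to 0.
mean_of_ratios = (low_q / total).fillna(0).mean()

print(ratio_of_sums)
print(mean_of_ratios)
```

For the purpose of kilometric modal shares, the ratio of sums is the figure that describes how much of the total distance comes from low-quality legs.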

In [36]:
# sum_mode_per_user_w is local to calculate_dmd; its value is returned as dmd
dmd.head()

In [None]:
len(dmd)

In [None]:
# Weighted average daily distance per mode, summed over users (metres per day)
modal_share = pd.DataFrame(dmd.sum())
modal_share.astype(int).rename(columns={0:'weighted_daily_distance_m'}).T

In [None]:
import matplotlib.pyplot as plt

# Plotting a Pie Chart
plt.figure(figsize=(10, 6))
mode_means = dmd.sum() / dmd.sum().sum()
plt.pie(mode_means, labels=mode_means.index, autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Modal Shares')
plt.show()

In [None]:
legs.started_at.min()

In [None]:
legs.started_at.max()

In [None]:
import matplotlib.pyplot as plt


# Plot the polygons with no background, grey lines
fig, ax = plt.subplots(figsize=(8, 8))

# Plot the polygons
perimetre_panel.plot(ax=ax, facecolor='none', linewidth=1)

# Remove axes
ax.set_axis_off()

In [None]:
# Save the plot to a PNG file
output_file = "../Data/temp_files/contour_panel.png"
plt.savefig(output_file, bbox_inches='tight', pad_inches=0, transparent=True)
plt.close()

In [None]:
perimetre_panel#.plot()

In [None]:
TAZ

In [None]:
TAZ[TAZ.N_KT=='VD'].plot()