# Module 1: Modal shares

**Description**: The aim of this module is to carry out a first calculation of kilometric modal shares, i.e. the mean daily distances by mode and by trip purpose.

**Estimated duration, first part**: 7 days

**Specific objectives**:
- [ ] Subsample residents and visitors by canton (based on GPS data)
- [ ] Resample the observation days to obtain a continuous calendar per user
- [ ] Integrate transit details
- [ ] Distinguish as systematically as possible days without travel from undetected days, and compare them statistically with the non-travel days in other databases
- [ ] Recode modes and trip purposes according to the cantons' needs
- [ ] Preliminary calculation of modal shares by distance and by trip
- [ ] Add equipment data (e.g. the household's main type of motorisation)
- [ ] Document the assumptions and limits of the preliminary modal share calculation (e.g. seasonal aspects, sampling, signal loss, averaging of longitudinal data, ...)

**Expected results**: Kilometric modal shares by mode for the residents and visitors of each canton, with a view to calculating carbon emissions. It must be possible to calculate the modal shares while taking non-mobile days into account.
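
As a rough illustration of the expected result, the sketch below computes kilometric modal shares from a small table of daily distances per mode. The data and the helper `modal_shares` are hypothetical; only the idea (average the daily distances per mode over all observed days, zero-distance days included, then normalise) comes from the description above.

```python
import pandas as pd

# Hypothetical daily distances in metres for one user; 2023-05-03 is a non-mobile day
daily = pd.DataFrame(
    {
        "Mode::Car": [8000.0, 0.0, 0.0, 4000.0],
        "Mode::Walk": [1000.0, 2000.0, 0.0, 1000.0],
    },
    index=pd.to_datetime(["2023-05-01", "2023-05-02", "2023-05-03", "2023-05-04"]),
)

def modal_shares(daily_distances: pd.DataFrame) -> pd.Series:
    """Share of each mode in the mean daily distance (zero-distance days included)."""
    mean_per_mode = daily_distances.mean()
    return mean_per_mode / mean_per_mode.sum()

shares = modal_shares(daily)
print(shares.round(3))
```

For a single user, including the non-mobile day lowers the mean daily distance of every mode by the same factor, so the shares themselves do not change. It starts to matter once per-user means are pooled across users with different numbers of observed days.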

**Subsampling**:
- Vaud: residents of the canton
- Genève: residents of the canton

In [1]:
%load_ext autoreload

In [2]:
%autoreload 2

In [13]:
import geopandas as gpd
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np

from shapely import geometry, ops
from shapely.geometry import MultiLineString, LineString, Point
import os
import concurrent.futures
from shapely.ops import unary_union
import xyt

import time

### Load the data

In [302]:
%%time
# Define the project CRS (EPSG:4326, i.e. WGS84)
target_crs = 'EPSG:4326'
print("Project CRS: WGS84 \n")

# Load the legs
# Ask the user which file to load
choice = input("Do you want to load a sample or the full leg data frame? \n Enter 'sample' or 'full': ").strip().lower()

# Define the file paths
if choice == 'sample':
    file_path = '../Data/time_space_filters/legs_filtered_randsample.pkl'
elif choice == 'full':
    file_path = '../Data/time_space_filters/legs_filtered.pkl'
else:
    # Stop here: file_path would otherwise be undefined below
    raise ValueError("Invalid choice. Please enter 'sample' or 'full'.")

# Load the selected data frame
try:
    legs = pd.read_pickle(file_path)
    print("Leg file loaded")
except FileNotFoundError:
    print("File not found. Please check the file path.")
    raise  # legs would be undefined in the next line
legs = gpd.GeoDataFrame(legs, geometry="geometry")

del legs['canton_dep']

# Load the activities (staypoints)
staypoints = pd.read_pickle('../Data/time_space_filters/staypoints_filtered.pkl').reset_index(drop=True)
staypoints = gpd.GeoDataFrame(staypoints, geometry="geometry")
print("Activity file loaded")

# Load the user statistics
usr_stats = pd.read_csv('../Data/gps_user_statistics.csv')
print("User statistics file loaded")

Project CRS: WGS84 

Leg file loaded
Activity file loaded
User statistics file loaded
CPU times: user 9.56 s, sys: 12.4 s, total: 21.9 s
Wall time: 34.1 s


### Format the data

In [26]:
staypoints['started_at'] = pd.to_datetime(staypoints['started_at'])
staypoints['finished_at'] = pd.to_datetime(staypoints['finished_at'])

legs['started_at'] = pd.to_datetime(legs['started_at'])
legs['finished_at'] = pd.to_datetime(legs['finished_at'])

staypoints.rename(columns={'IDNO':'user_id', 'id':'activity_id'}, inplace = True)
legs.rename(columns={'IDNO':'user_id', 'id':'leg_id'}, inplace = True)

staypoints['lon'] = staypoints.geometry.x
staypoints['lat'] = staypoints.geometry.y

### Add the *next activity_id* to the legs

In [27]:
# Sort 'staypoints' and 'legs' by 'user_id' and 'started_at' so each user's records are in chronological order
staypoints.sort_values(by=['user_id','started_at'], inplace=True, ignore_index=True)
legs.sort_values(by=['user_id','started_at'], inplace=True)

In [28]:
legs = pd.merge(legs, staypoints[['activity_id', 'previous_leg_id']],
               left_on='leg_id', right_on='previous_leg_id', how='left')
legs.rename(columns={'activity_id':'leading_stay_id'}, inplace=True)
del legs['previous_leg_id']
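
The merge above attaches to each leg the stay that follows it. A minimal sketch on hypothetical toy tables (`legs_toy`, `stays_toy`) shows the expected shape of the result, and uses the `validate` argument of `DataFrame.merge` to guard against a leg being matched to several stays. Dropping stays without a `previous_leg_id` before validating is an assumption about how one would want to handle them, not something the notebook does.

```python
import pandas as pd

# Hypothetical toy tables mirroring the columns used in the merge
legs_toy = pd.DataFrame({"leg_id": ["l1", "l2", "l3"]})
stays_toy = pd.DataFrame(
    {"activity_id": ["a1", "a2", "a3"], "previous_leg_id": ["l1", "l3", None]}
)

# Stays with no preceding leg cannot match any leg; drop them so that
# the uniqueness check is not tripped by repeated missing keys
stays_linked = stays_toy.dropna(subset=["previous_leg_id"])

# validate="one_to_one" raises MergeError if a leg is followed by several stays
merged = (
    legs_toy.merge(
        stays_linked,
        left_on="leg_id",
        right_on="previous_leg_id",
        how="left",
        validate="one_to_one",
    )
    .rename(columns={"activity_id": "leading_stay_id"})
    .drop(columns="previous_leg_id")
)
print(merged)
```

Legs that are not followed by a detected stay (`l2` here) keep a missing `leading_stay_id`, and the number of legs is unchanged.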

### Add the duration and length of the legs

In [29]:
%%time 
# Add length in meters
legs['length'] = legs.to_crs('EPSG:2056').length
# Add the duration in seconds
legs['duration'] = (legs['finished_at'] - legs['started_at']).dt.total_seconds()

CPU times: user 11.9 s, sys: 693 ms, total: 12.5 s
Wall time: 12.6 s


### Extract the geographic areas and the subsamples (Genève and Vaud)
We use the traffic zones of the ARE's Modèle Voyageur (passenger transport model).

We want to sample:
- all the residents of the Canton de Genève
- all the activities that take place in the Canton de Genève

In [69]:
%%time

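# Canton code (N_KT) to process; hard-coded here rather than prompted
n_kt = 'GE'

# Merge the traffic zones of the canton into a single geometry
# (TAZ and dom are not defined in the cells shown here; they are assumed to be loaded beforehand)
shp_KT = unary_union(TAZ[TAZ.N_KT == n_kt].geometry)

# List the residents of the canton
list_residents_N_KT = dom.loc[dom.within(shp_KT), 'IDNO'].tolist()

# Subsample the legs of the canton's residents
legs_N_KT = legs.loc[legs.user_id.isin(list_residents_N_KT)].copy()

# List the activities of the canton's residents
# (drop missing ids only, not every row with a NaN in any column)
list_staypoints_residents_N_KT = legs_N_KT.leading_stay_id.dropna().tolist()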

CPU times: user 2.94 s, sys: 40.4 ms, total: 2.98 s
Wall time: 3.01 s


In [70]:
%%time
# Identify the activities that take place in the canton (KT)
staypoints_N_KT = staypoints[staypoints.activity_id.isin(list_staypoints_residents_N_KT)]
list_activity_id_in_KT = staypoints_N_KT.loc[staypoints_N_KT.within(shp_KT), 'activity_id'].tolist()

# Flag the legs whose next activity takes place in the canton
legs_N_KT['leading_stay_id_in_KT'] = 0
legs_N_KT.loc[legs_N_KT.leading_stay_id.isin(list_activity_id_in_KT), 'leading_stay_id_in_KT'] = 1

CPU times: user 13.1 s, sys: 125 ms, total: 13.2 s
Wall time: 13.4 s
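
The two sampling rules (legs of residents, flagged when the next activity lies inside the canton) can be sketched without any geometry, assuming the spatial tests have already produced two id lists. All names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: resident ids and ids of stays located inside the canton
residents = ["u1", "u2"]
stays_in_canton = ["s1", "s3"]
legs_toy = pd.DataFrame(
    {
        "user_id": ["u1", "u1", "u2", "u9"],
        "leading_stay_id": ["s1", "s2", None, "s3"],
    }
)

# Keep the residents' legs, then flag those that lead to a stay inside the canton
legs_res = legs_toy[legs_toy["user_id"].isin(residents)].copy()
legs_res["leading_stay_id_in_KT"] = (
    legs_res["leading_stay_id"].isin(stays_in_canton).astype(int)
)
print(legs_res)
```

A leg without a following stay gets the flag 0, exactly like a leg whose next stay is outside the canton, so the flag alone does not separate the two cases.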


In [71]:
# Add user_id_day (user id + start date as YYYYMMDD) and the leg date
legs_N_KT.insert(
    1, "user_id_day",
    legs_N_KT["user_id"] + "_" + legs_N_KT.started_at.dt.strftime('%Y%m%d'),
)
legs_N_KT.insert(1, 'leg_date',legs_N_KT.started_at.dt.date)
legs_N_KT['leg_date'] = pd.to_datetime(legs_N_KT['leg_date'])

In [54]:
%autoreload
xyt.plot_gps(staypoints[staypoints.activity_id.isin(list_activity_id_in_KT)].dropna()[:2000], geo_columns='geometry')

In [286]:
usr = legs_N_KT.user_id.sample(20).tolist()
df_ = legs_N_KT.loc[legs_N_KT.user_id.isin(usr)]
df_.head()

Unnamed: 0,leg_id,leg_date,user_id_day,started_at,finished_at,type,started_at_timezone,detected_mode,mode,user_id,geometry,low_quality_legs_1,low_quality_legs_2,scattered_tracking,leading_stay_id,length,duration,leading_stay_id_in_KT
3645,9a395ad6-e135-4932-be54-f02a55f60593,2023-05-02,CH1123_20230502,2023-05-02 15:08:52,2023-05-02 15:22:46,Track,Europe/Zurich,Mode::Bus,Mode::Bus,CH1123,"LINESTRING (6.14338 46.19771, 6.14338 46.19771...",0,0,0,,2668.297805,834.0,0
3646,d3fb4043-0ee6-4fba-b4f4-973fecd6e7b1,2023-05-03,CH1123_20230503,2023-05-03 07:05:11,2023-05-03 07:08:58,Track,Europe/Zurich,Mode::Walk,Mode::Walk,CH1123,"LINESTRING (6.14904 46.21377, 6.14933 46.21416...",0,0,0,26808dea-3acb-4829-9364-795fc1a653f8,421.62147,227.0,1
3647,537f4a3c-3bd8-43c3-8e10-1ea79e933dfe,2023-05-03,CH1123_20230503,2023-05-03 07:53:42,2023-05-03 08:01:09,Track,Europe/Zurich,Mode::Walk,Mode::Walk,CH1123,"LINESTRING (6.14781 46.21660, 6.14781 46.21660...",0,0,0,8c5c8c27-30fd-4337-8571-eb0cfe129db9,502.305745,447.0,1
3648,5c28b9f8-3092-4ef1-baa6-d3c03ee9ab4e,2023-05-03,CH1123_20230503,2023-05-03 15:42:31,2023-05-03 15:46:16,Track,Europe/Zurich,Mode::Walk,Mode::Walk,CH1123,"LINESTRING (6.14267 46.19809, 6.14195 46.19779...",0,0,0,632d0fbd-0eb4-4070-b3ce-785c05ce8d1b,238.431938,225.0,1
3649,cd66e469-9d70-4f34-ae33-f396f027ec95,2023-05-03,CH1123_20230503,2023-05-03 15:47:28,2023-05-03 16:02:44,Track,Europe/Zurich,Mode::Bus,Mode::Bus,CH1123,"LINESTRING (6.14317 46.19802, 6.14317 46.19802...",0,0,0,,2664.174912,916.0,0


In [287]:
df_.leg_date.max() - df_.leg_date.min()

Timedelta('41 days 00:00:00')

In [288]:
import pandas as pd

def get_daily_modal_distances(df):
    
    # Create a copy of the DataFrame to avoid modifying the original
    df = df.copy()
    
    df['length'] = df['length'].astype(float)
    # Group by 'user_id', 'user_id_day' and 'mode', then sum the distances
    grouped = df.groupby(['user_id', 'user_id_day', 'mode'])['length'].sum().reset_index()

    # Pivot the table to have modes as columns
    pivoted = grouped.pivot_table(
        index=['user_id', 'user_id_day'],
        columns='mode',
        values='length',
        aggfunc='sum'
    ).reset_index()

    # Resample to include missing days and fill NaNs with different values in different columns
    pivoted['date'] = pd.to_datetime(pivoted['user_id_day'].str[-8:])
    # Create a date range covering the entire date range for each ID
    date_ranges = pivoted.groupby('user_id')['date'].agg(['min', 'max']).reset_index()
    date_ranges['leg_date'] = date_ranges.apply(lambda row: pd.date_range(row['min'], row['max'], freq='D'), axis=1)

    # Create a Cartesian product of IDs and date ranges
    cartesian = date_ranges.explode('leg_date').reset_index(drop=True)

    # Complete the original df with a continuous timeline
    pivoted_filled = pd.merge(pivoted, cartesian[['user_id', 'leg_date']], how='outer', left_on=['user_id', 'date'],
                              right_on=['user_id', 'leg_date'])

    # Create 'resample' column and mark as True for added rows, False otherwise
    pivoted_filled['resample'] = pivoted_filled['date'].isnull()
    del pivoted_filled['date']

    # Fill missing values in the user_id_day column
    pivoted_filled['user_id_day'] = pivoted_filled.apply(
        lambda row: row['user_id_day'] if not pd.isnull(row['user_id_day'])
        else row['user_id'] + "_" +
             row['leg_date'].strftime('%Y%m%d'),
        axis=1
    )

    # Fill missing values in the modes columns
    # Get the columns that start with 'Mode::'
    modes_columns = [col for col in pivoted_filled.columns if col.startswith('Mode::')]

    # Fill missing values in the 'modes_columns' with 0
    pivoted_filled[modes_columns] = pivoted_filled[modes_columns].fillna(0)

    # Sort the resulting DataFrame
    pivoted_filled.sort_values(by=['user_id', 'leg_date'], inplace=True)

    return pivoted_filled
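
The continuous calendar built by the function above can also be obtained by reindexing each user's rows on a daily date range. The sketch below is an alternative under that assumption, not the notebook's method; the helper `fill_calendar` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical daily distances (metres); user "A" has no record on 2023-05-03
daily = pd.DataFrame(
    {
        "user_id": ["A", "A", "B"],
        "leg_date": pd.to_datetime(["2023-05-02", "2023-05-04", "2023-05-02"]),
        "Mode::Walk": [500.0, 700.0, 300.0],
    }
)

def fill_calendar(group: pd.DataFrame) -> pd.DataFrame:
    """Reindex one user's rows on every day between first and last record."""
    days = pd.date_range(group["leg_date"].min(), group["leg_date"].max(), freq="D")
    out = group.set_index("leg_date").reindex(days)
    out["resample"] = out["user_id"].isna()  # True for the days that were added
    out["user_id"] = group["user_id"].iloc[0]
    return out.rename_axis("leg_date").reset_index()

filled = pd.concat(
    [fill_calendar(g) for _, g in daily.groupby("user_id")], ignore_index=True
)

# Added days carry no distance: fill the mode columns with 0
mode_cols = [c for c in filled.columns if c.startswith("Mode::")]
filled[mode_cols] = filled[mode_cols].fillna(0)
print(filled)
```

As in the function above, the calendar only spans each user's first to last recorded day, so days before the first or after the last leg are never added.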


In [289]:
dmd = get_daily_modal_distances(df_)#.tail(20)
dmd

Unnamed: 0,user_id,user_id_day,Mode::Bicycle,Mode::Bus,Mode::Car,Mode::Ebicycle,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk,leg_date,resample
0,CH1123,CH1123_20230502,0.0,2668.297805,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,2023-05-02,False
1,CH1123,CH1123_20230503,0.0,2664.174912,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,1162.359153,2023-05-03,False
2,CH1123,CH1123_20230504,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,2023-05-04,True
3,CH1123,CH1123_20230505,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,743.840580,2023-05-05,False
4,CH1123,CH1123_20230506,0.0,3563.804483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,1004.374852,2023-05-06,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
676,CH886,CH886_20230601,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,2023-06-01,True
677,CH886,CH886_20230602,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,640.726490,2023-06-02,False
678,CH886,CH886_20230603,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,844.981487,1160.213692,2023-06-03,False
679,CH886,CH886_20230604,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,38.746982,2023-06-04,False


### Mean daily distance per user, in meters

In [290]:
# Select the columns that start with 'Mode::' in the daily table `dmd`
mode_columns = dmd.filter(like='Mode::')

# Mean daily distance per user_id; zero-distance days count in the mean
mean_mode_per_user = mode_columns.groupby(dmd['user_id']).mean()
mean_mode_per_user

user_id,Mode::Bicycle,Mode::Bus,Mode::Car,Mode::Ebicycle,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk
CH1123,0.0,595.380109,1772.473062,0.0,0.0,0.0,0.0,0.0,0.0,2060.850425,128.573803,537.053906
CH11734,0.0,5178.647939,4188.225033,0.0,0.0,0.0,0.0,0.0,0.0,82.142462,0.0,1071.808156
CH11964,0.0,191.204019,908.26572,0.0,0.0,0.0,0.0,0.0,0.0,0.0,297.734388,795.389506
CH15042,0.0,1087.195707,2542.589242,0.0,0.0,0.0,0.0,0.0,0.0,0.0,380.818146,903.41453
CH15059,0.0,454.467758,2044.664956,0.0,0.0,209.911966,0.0,0.0,0.0,1010.106521,63.687047,298.45754
CH16444,0.0,131.992925,625.657009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,467.143977,407.818664
CH17330,484.42228,590.064933,562.03893,186.586782,0.0,0.0,0.0,0.0,0.0,16507.087133,0.0,653.958348
CH20219,4424.623134,37.595936,1778.621788,0.0,0.0,0.0,0.0,0.0,0.0,5081.814986,0.0,351.178334
CH20264,2238.666659,139.366609,5760.188129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,256.390918
CH20290,182.143188,263.833845,339.237859,0.0,0.0,1816.084249,0.0,0.0,0.0,0.0,0.0,327.561956
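
Because the added calendar days enter the mean as zeros, the averages depend on how those days are interpreted. The sketch below, on a hypothetical excerpt laid out like `dmd`, contrasts the mean over all calendar days with the mean over days that have at least one recorded leg.

```python
import pandas as pd

# Hypothetical excerpt: one user, four days, one of them added by resampling
toy = pd.DataFrame(
    {
        "user_id": ["A"] * 4,
        "Mode::Bus": [2000.0, 0.0, 0.0, 4000.0],
        "Mode::Walk": [1000.0, 500.0, 0.0, 500.0],
        "resample": [False, False, True, False],
    }
)
mode_cols = list(toy.filter(like="Mode::").columns)

# Mean over every calendar day (added zero-distance days included)
mean_all_days = toy.groupby("user_id")[mode_cols].mean()

# Mean over days with at least one recorded leg only
mean_recorded_days = toy[~toy["resample"]].groupby("user_id")[mode_cols].mean()

print(mean_all_days)
print(mean_recorded_days)
```

The two results differ only in the treatment of the added days, so the choice between them depends on whether those days are genuine non-mobile days or days without detection.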


### Total distance per user, in meters

In [291]:
import pandas as pd

# Select the columns that start with 'Mode::' in the daily table `dmd`
mode_columns = dmd.filter(like='Mode::')

# Total distance per mode for each user_id
sum_mode_per_user = mode_columns.groupby(dmd['user_id']).sum()

# Number of calendar days (rows of `dmd`) per user_id
sum_mode_per_user['days_in_range_count'] = mode_columns.groupby(dmd['user_id']).size()

sum_mode_per_user


user_id,Mode::Bicycle,Mode::Bus,Mode::Car,Mode::Ebicycle,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk,days_in_range_count
CH1123,0.0,19647.543611,58491.611049,0.0,0.0,0.0,0.0,0.0,0.0,68008.06401,4242.935496,17722.778909,33
CH11734,0.0,212324.565488,171717.226345,0.0,0.0,0.0,0.0,0.0,0.0,3367.840937,0.0,43944.134396,41
CH11964,0.0,6692.140675,31789.300214,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10420.703569,27838.632703,35
CH15042,0.0,38051.849761,88990.623456,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13328.635102,31619.508551,35
CH15059,0.0,16815.307041,75652.603364,0.0,0.0,7766.742749,0.0,0.0,0.0,37373.94128,2356.420751,11042.928968,37
CH16444,0.0,4487.759464,21272.338294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15882.895205,13865.834577,34
CH17330,16954.779815,20652.27266,19671.362545,6530.537355,0.0,0.0,0.0,0.0,0.0,577748.049669,0.0,22888.542194,35
CH20219,141587.940295,1203.069962,56915.897209,0.0,0.0,0.0,0.0,0.0,0.0,162618.07955,0.0,11237.706691,32
CH20264,60443.999795,3762.898444,155525.079494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6922.554777,27
CH20290,7103.584331,10289.519962,13230.276487,0.0,0.0,70827.285697,0.0,0.0,0.0,0.0,0.0,12774.916283,39


In [292]:
user_stat = pd.read_csv('../Data/dumps_fors/UserStatistics.EPFL-Panel.2023-04-24--2023-06-05.csv', sep=';')
stats = user_stat.loc[user_stat.IDNO.isin(usr),['IDNO','inactive_days_count','days_in_range_count']]

sum_mode_per_user_ = pd.merge(sum_mode_per_user.reset_index(), stats, how='left', left_on='user_id', right_on='IDNO')
del sum_mode_per_user_['IDNO']
sum_mode_per_user_

Unnamed: 0,user_id,Mode::Bicycle,Mode::Bus,Mode::Car,Mode::Ebicycle,Mode::LightRail,Mode::Motorbike,Mode::Other,Mode::Subway,Mode::TaxiUber,Mode::Train,Mode::Tram,Mode::Walk,days_in_range_count_x,inactive_days_count,days_in_range_count_y
0,CH1123,0.0,19647.543611,58491.611049,0.0,0.0,0.0,0.0,0.0,0.0,68008.06401,4242.935496,17722.778909,33,8,43
1,CH11734,0.0,212324.565488,171717.226345,0.0,0.0,0.0,0.0,0.0,0.0,3367.840937,0.0,43944.134396,41,0,42
2,CH11964,0.0,6692.140675,31789.300214,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10420.703569,27838.632703,35,1,35
3,CH15042,0.0,38051.849761,88990.623456,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13328.635102,31619.508551,35,0,36
4,CH15059,0.0,16815.307041,75652.603364,0.0,0.0,7766.742749,0.0,0.0,0.0,37373.94128,2356.420751,11042.928968,37,7,43
5,CH16444,0.0,4487.759464,21272.338294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,15882.895205,13865.834577,34,3,35
6,CH17330,16954.779815,20652.27266,19671.362545,6530.537355,0.0,0.0,0.0,0.0,0.0,577748.049669,0.0,22888.542194,35,0,36
7,CH20219,141587.940295,1203.069962,56915.897209,0.0,0.0,0.0,0.0,0.0,0.0,162618.07955,0.0,11237.706691,32,1,33
8,CH20264,60443.999795,3762.898444,155525.079494,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6922.554777,27,1,31
9,CH20290,7103.584331,10289.519962,13230.276487,0.0,0.0,70827.285697,0.0,0.0,0.0,0.0,0.0,12774.916283,39,4,42
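
In the merged table, `days_in_range_count_x` is the length of the reconstructed calendar and `days_in_range_count_y` comes from the user statistics file. A minimal sketch on hypothetical totals shows how per-user modal shares and the gap between the two day counts could be derived from such a table.

```python
import pandas as pd

# Hypothetical totals per user in metres, laid out like the merged table
totals = pd.DataFrame(
    {
        "user_id": ["A", "B"],
        "Mode::Car": [60000.0, 0.0],
        "Mode::Train": [30000.0, 10000.0],
        "Mode::Walk": [10000.0, 10000.0],
        "days_in_range_count_x": [33, 35],  # days in the reconstructed calendar
        "days_in_range_count_y": [43, 35],  # days reported in the user statistics
    }
).set_index("user_id")

mode_cols = [c for c in totals.columns if c.startswith("Mode::")]

# Share of each mode in the user's total distance (each row sums to 1)
shares = totals[mode_cols].div(totals[mode_cols].sum(axis=1), axis=0)

# Days present in the user statistics but absent from the reconstructed calendar
day_gap = totals["days_in_range_count_y"] - totals["days_in_range_count_x"]

print(shares.round(2))
print(day_gap)
```

A positive gap means that the reconstructed calendar, which runs from the first to the last recorded leg, is shorter than the observation period reported for that user.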


In [298]:
def get_user_activity_stats(count_act):
    # Work on a copy so the caller's DataFrame is not modified
    count_act = count_act.copy()

    # Convert 'started_at' column to datetime
    count_act['started_at'] = pd.to_datetime(count_act['started_at'])

    # Extract only the date part
    count_act['date'] = count_act['started_at'].dt.date

    # Group by 'user_id', then find the min and max dates
    user_stats = count_act.groupby('user_id')['date'].agg(['min', 'max']).reset_index()

    # Calculate the total days in the range for each user (both ends included)
    user_stats['days_in_range'] = (pd.to_datetime(user_stats['max']) - pd.to_datetime(user_stats['min'])).dt.days + 1

    # Group by 'user_id' and count the unique dates with at least one record
    user_unique_dates = count_act.groupby(['user_id'])['date'].nunique().reset_index()

    # Merge with user_unique_dates to get active_days_count
    user_stats = pd.merge(user_stats, user_unique_dates, on='user_id', how='left')
    user_stats.rename(columns={'date': 'active_days_count'}, inplace=True)

    # Days within the range without any record
    # (subtracting the length of the full date range would always give 0)
    user_stats['missing_days'] = user_stats['days_in_range'] - user_stats['active_days_count']

    # Rename the min/max columns
    user_stats.rename(columns={'min':'first_activity_date','max':'last_activity_date'}, inplace=True)

    return user_stats

In [299]:
#subset of staypoints
staypoints_ = staypoints.loc[staypoints.user_id.isin(usr),['user_id','started_at']]

get_user_activity_stats(staypoints_)

Unnamed: 0,user_id,first_activity_date,last_activity_date,days_in_range,active_days_count,missing_days
0,CH1123,2023-05-02,2023-06-05,35,35,0
1,CH11734,2023-04-25,2023-06-05,42,42,0
2,CH11964,2023-05-02,2023-06-05,35,35,0
3,CH15042,2023-05-01,2023-06-05,36,36,0
4,CH15059,2023-04-24,2023-05-31,38,38,0
5,CH16444,2023-05-02,2023-06-05,35,35,0
6,CH17330,2023-05-01,2023-06-05,36,36,0
7,CH20219,2023-05-04,2023-06-05,33,33,0
8,CH20264,2023-05-06,2023-06-05,31,31,0
9,CH20290,2023-04-25,2023-06-04,41,41,0
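
Beyond counting them, it can help to list which calendar days have no record for a user. The sketch below does this with a set difference between the full daily range and the observed dates; the helper `missing_dates` and the toy data are hypothetical.

```python
import pandas as pd

# Hypothetical stay start times for one user; nothing is recorded on 2023-05-03
stays = pd.DataFrame(
    {
        "user_id": ["A", "A", "A"],
        "started_at": pd.to_datetime(
            ["2023-05-02 08:00", "2023-05-02 18:30", "2023-05-04 09:15"]
        ),
    }
)

def missing_dates(df: pd.DataFrame) -> pd.Series:
    """For each user, list the days between first and last record that have no row."""
    def _gaps(ts: pd.Series) -> list:
        days = ts.dt.normalize()  # drop the time of day
        full = pd.date_range(days.min(), days.max(), freq="D")
        return list(full.difference(pd.DatetimeIndex(days.unique())))
    return df.groupby("user_id")["started_at"].apply(_gaps)

gaps = missing_dates(stays)
print(gaps)
```

An empty list for every user would match a `missing_days` column of zeros, as in the output above.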


In [295]:
test = staypoints.loc[staypoints.user_id.isin(['CH2158']),['user_id','started_at']]
test['date'] = test.started_at.dt.date
test

Unnamed: 0,user_id,started_at,date
189627,CH2158,2023-05-12 04:50:23,2023-05-12
189628,CH2158,2023-05-12 06:29:01,2023-05-12
189629,CH2158,2023-05-12 08:33:22,2023-05-12
189630,CH2158,2023-05-12 13:30:17,2023-05-12
189631,CH2158,2023-05-12 14:19:47,2023-05-12
...,...,...,...
189823,CH2158,2023-06-05 13:15:33,2023-06-05
189824,CH2158,2023-06-05 13:26:26,2023-06-05
189825,CH2158,2023-06-05 13:45:58,2023-06-05
189826,CH2158,2023-06-05 16:03:14,2023-06-05


In [283]:
len(test.date.unique())

25