# STM Transit Delay Data Preparation

## Data description

### Real-time STM Trip Updates

`current_time` timestamp of the time the data was collected<br>
`trip_id` unique identifier of a trip<br>
`route_id` bus or metro line<br>
`start_date` schedule date<br>
`stop_id` stop number<br>
`arrival_time` actual arrival time, as a Unix timestamp in seconds<br>
`departure_time` actual departure time, as a Unix timestamp in seconds<br>
`schedule_relationship` state of the trip, 0 means scheduled and 1 means skipped

### Scheduled STM Trips

`trip_id` unique identifier of a trip<br>
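`arrival_time` scheduled arrival time, as a GTFS `HH:MM:SS` string relative to the service date (hours can exceed 24 for trips running past midnight)<br>
`departure_time` scheduled departure time, in the same `HH:MM:SS` format<br>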
`stop_id` stop number<br>
`stop_sequence` sequence of the stop, for ordering

### STM Stops

`stop_id` unique identifier of a stop<br>
`stop_code` stop number<br>
`stop_name` stop name<br>
`stop_lat` stop latitude<br>
`stop_lon` stop longitude<br>
`stop_url` stop web page<br>
`location_type` stop type, 0 being a stop or platform, 1 a station and 2 a station entrance or exit<br>
`parent_station` parent station (e.g., a metro station with multiple exits)<br>
`wheelchair_boarding` indicates whether the stop is wheelchair accessible, 1 being true and 2 being false

### Weather Archive

`time` date and hour of the archived weather<br>
`temperature` air temperature at 2 meters above ground, in Celsius<br>
`precipitation` total precipitation (rain, showers, snow) over the preceding hour, in millimeters<br>
`windspeed` wind speed at 10 meters above ground, in km/h<br>
`weathercode` weather condition as a numeric code

## Imports

In [126]:
from datetime import timedelta
from dateutil.easter import easter
import holidays
import numpy as np
import pandas as pd

In [127]:
# Set timezone
local_timezone = 'Canada/Eastern'

## Load Data

In [128]:
real_stm_df = pd.read_csv('data/fetched_stm.csv', low_memory=False)
real_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time,departure_time,schedule_relationship
0,1745385000.0,285028348,189,20250422,54433,1745384718,1745384718,0
1,1745385000.0,285028348,189,20250422,54444,1745384751,1745384751,0
2,1745385000.0,285028348,189,20250422,54445,1745384785,1745384785,0
3,1745385000.0,285028348,189,20250422,54451,1745384806,1745384806,0
4,1745385000.0,285028348,189,20250422,54456,1745384829,1745384829,0


In [129]:
planned_stm_df = pd.read_csv('data/stop_times_2025-04-23.txt')
planned_stm_df.head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence
0,281570788,05:58:00,05:58:00,51095,1
1,281570788,05:59:39,05:59:39,51126,2
2,281570788,06:00:06,06:00:06,51113,3
3,281570788,06:00:44,06:00:44,51084,4
4,281570788,06:01:17,06:01:17,51063,5


In [130]:
stops_df = pd.read_csv('data/stops_2025-04-23.txt')
stops_df.head()

Unnamed: 0,stop_id,stop_code,stop_name,stop_lat,stop_lon,stop_url,location_type,parent_station,wheelchair_boarding
0,STATION_M118,10118,STATION ANGRIGNON,45.446466,-73.603118,,1,,1
1,43,10118,Station Angrignon,45.446466,-73.603118,http://www.stm.info/fr/infos/reseaux/metro/ang...,0,STATION_M118,1
2,43-01,10118,Station Angrignon,45.446319,-73.603835,,2,STATION_M118,1
3,STATION_M120,10120,STATION MONK,45.451158,-73.593242,,1,,2
4,42,10120,Station Monk,45.451158,-73.593242,http://www.stm.info/fr/infos/reseaux/metro/monk,0,STATION_M120,2


In [131]:
weather_df = pd.read_csv('data/fetched_weather.csv')
weather_df.head()

Unnamed: 0,time,temperature,precipitation,windspeed,weathercode
0,2025-04-20T00:00,10.9,0.0,21.1,3
1,2025-04-20T01:00,6.9,0.0,21.9,2
2,2025-04-20T02:00,5.1,0.0,16.3,1
3,2025-04-20T03:00,3.7,0.0,16.1,0
4,2025-04-20T04:00,2.5,0.0,16.3,0


## Merge Data

### Realtime and Scheduled Trips

In [132]:
stm_trips_df = pd.merge(left=real_stm_df, right=planned_stm_df, how='inner', on=['trip_id', 'stop_id'])
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence
0,1745385000.0,285028348,189,20250422,54433,1745384718,1745384718,0,25:05:08,25:05:08,20
1,1745385000.0,285028348,189,20250422,54444,1745384751,1745384751,0,25:05:51,25:05:51,21
2,1745385000.0,285028348,189,20250422,54445,1745384785,1745384785,0,25:06:25,25:06:25,22
3,1745385000.0,285028348,189,20250422,54451,1745384806,1745384806,0,25:06:46,25:06:46,23
4,1745385000.0,285028348,189,20250422,54456,1745384829,1745384829,0,25:07:09,25:07:09,24


In [133]:
stm_trips_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1117217 entries, 0 to 1117216
Data columns (total 11 columns):
 #   Column                 Non-Null Count    Dtype  
---  ------                 --------------    -----  
 0   current_time           1117217 non-null  float64
 1   trip_id                1117217 non-null  int64  
 2   route_id               1117217 non-null  object 
 3   start_date             1117217 non-null  int64  
 4   stop_id                1117217 non-null  int64  
 5   arrival_time_x         1117217 non-null  int64  
 6   departure_time_x       1117217 non-null  int64  
 7   schedule_relationship  1117217 non-null  int64  
 8   arrival_time_y         1117217 non-null  object 
 9   departure_time_y       1117217 non-null  object 
 10  stop_sequence          1117217 non-null  int64  
dtypes: float64(1), int64(7), object(3)
memory usage: 93.8+ MB


In [134]:
# Convert start_date to datetime
stm_trips_df['start_date'] = pd.to_datetime(stm_trips_df['start_date'], format='%Y%m%d')
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24


In [135]:
def parse_gtfs_time(time_str:str, service_date:pd.Timestamp) -> pd.Timestamp:
	'''
	Converts GTFS time string (e.g., '25:30:00') to datetime
	based on the service date.
	'''
	hours, minutes, seconds = map(int, time_str.split(':'))
	total_seconds = hours * 3600 + minutes * 60 + seconds

	parsed_time = service_date + timedelta(seconds=total_seconds)
	return parsed_time
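
The row-wise `apply` used with `parse_gtfs_time` can be slow on a million rows. A minimal vectorized sketch of the same conversion follows; the helper name `gtfs_times_to_datetime` and the two sample rows are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def gtfs_times_to_datetime(times: pd.Series, service_dates: pd.Series) -> pd.Series:
    """Add GTFS 'HH:MM:SS' offsets (hours may exceed 24) to service dates."""
    # Split 'HH:MM:SS' into three integer columns (0, 1, 2)
    parts = times.str.split(':', expand=True).astype('int64')
    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
    return service_dates + pd.to_timedelta(seconds, unit='s')

# Hypothetical sample rows mirroring the merged columns
sample = pd.DataFrame({
    'start_date': pd.to_datetime(['20250422', '20250422'], format='%Y%m%d'),
    'arrival_time_y': ['25:05:08', '05:58:00'],
})
sample['scheduled_arrival_time'] = gtfs_times_to_datetime(
    sample['arrival_time_y'], sample['start_date']
)
print(sample)
```

A time such as `25:05:08` on service date 2025-04-22 rolls over to 01:05:08 on 2025-04-23, as in the row-wise version.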

In [136]:
# Convert planned arrival time to localized datetime
stm_trips_df['scheduled_arrival_time'] = stm_trips_df.apply(lambda row: parse_gtfs_time(row['arrival_time_y'], row['start_date']), axis=1)
stm_trips_df['scheduled_arrival_time'] = stm_trips_df['scheduled_arrival_time'].dt.tz_localize(local_timezone)
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,2025-04-23 01:05:08-04:00
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,2025-04-23 01:05:51-04:00
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,2025-04-23 01:06:25-04:00
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,2025-04-23 01:06:46-04:00
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,2025-04-23 01:07:09-04:00


In [137]:
# Convert planned time to timestamp in milliseconds since epoch
stm_trips_df['scheduled_arrival_time'] = stm_trips_df['scheduled_arrival_time'].astype('int64') // 10**6
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,1745384708000
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,1745384751000
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,1745384785000
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,1745384806000
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,1745384829000
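
At this point the scheduled time is in milliseconds while the realtime columns are still in seconds, so the two cannot be compared directly yet. The sketch below shows one way to get epoch milliseconds that does not depend on the datetime column's internal resolution; the variable names are illustrative assumptions.

```python
import pandas as pd

local_timezone = 'Canada/Eastern'

# One scheduled arrival, localized as in the notebook
scheduled = pd.Series(pd.to_datetime(['2025-04-23 01:05:08'])).dt.tz_localize(local_timezone)

# Subtracting the epoch gives a timedelta, which is then counted in whole milliseconds
epoch = pd.Timestamp('1970-01-01', tz='UTC')
scheduled_ms = (scheduled - epoch) // pd.Timedelta(milliseconds=1)

# The realtime feed value for the same stop is in seconds
realtime_s = pd.Series([1745384718])
realtime_ms = realtime_s * 1000

print(scheduled_ms.iloc[0], realtime_ms.iloc[0])
```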


In [138]:
# Rename the realtime arrival column
stm_trips_df = stm_trips_df.rename(columns={'arrival_time_x': 'realtime_arrival_time'})
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,realtime_arrival_time,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,1745384708000
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,1745384751000
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,1745384785000
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,1745384806000
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,1745384829000


### Trips and Stops

In [139]:
# Merge stops to trips
merged_stm_df = pd.merge(left=stm_trips_df, right=stops_df, how='inner', left_on='stop_id', right_on='stop_code')
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id_x,realtime_arrival_time,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,...,scheduled_arrival_time,stop_id_y,stop_code,stop_name,stop_lat,stop_lon,stop_url,location_type,parent_station,wheelchair_boarding
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,...,1745384708000,54433,54433,Notre-Dame / No 10150,45.617546,-73.507835,https://www.stm.info/fr/recherche#stq=54433,0,,1
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,...,1745384751000,54444,54444,Notre-Dame / Gamble,45.62163,-73.505533,https://www.stm.info/fr/recherche#stq=54444,0,,1
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,...,1745384785000,54445,54445,Notre-Dame / No 10800,45.624606,-73.503332,https://www.stm.info/fr/recherche#stq=54445,0,,1
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,...,1745384806000,54451,54451,Notre-Dame / Richard,45.62627,-73.501486,https://www.stm.info/fr/recherche#stq=54451,0,,1
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,...,1745384829000,54456,54456,Notre-Dame / Hinton,45.628078,-73.499449,https://www.stm.info/fr/recherche#stq=54456,0,,1
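
The merge above joins on `stop_code`, which is not unique in the stops file (code 10118 appears on several rows of `stops_df.head()`). The row count seems unchanged in this run, but a duplicated key on the right side would normally multiply the matching trip rows. A small sketch of how this could be checked, using made-up frames rather than the real data:

```python
import pandas as pd

# Toy data (assumption): one stop_code shared by a station and its platform
trips = pd.DataFrame({'trip_id': [1, 1, 2], 'stop_id': [54433, 54444, 10118]})
stops = pd.DataFrame({
    'stop_id': ['54433', '54444', 'STATION_M118', '43'],
    'stop_code': [54433, 54444, 10118, 10118],
})

# Codes that appear on more than one stops row
duplicated_codes = (
    stops.loc[stops['stop_code'].duplicated(keep=False), 'stop_code'].unique().tolist()
)

# A plain inner merge silently repeats the trip row for each matching stop
merged = pd.merge(trips, stops, how='inner', left_on='stop_id', right_on='stop_code')

# validate='many_to_one' raises instead when the right-hand keys are not unique
try:
    pd.merge(trips, stops, how='inner', left_on='stop_id', right_on='stop_code',
             validate='many_to_one')
    keys_unique = True
except pd.errors.MergeError:
    keys_unique = False

print(duplicated_codes, len(trips), len(merged), keys_unique)
```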


In [140]:
# Keep relevant columns
merged_stm_df = merged_stm_df[[
  'trip_id',
  'route_id',
  'stop_id_x',
  'stop_lat',
  'stop_lon',
  'stop_sequence',
  'wheelchair_boarding',
  'realtime_arrival_time',
  'scheduled_arrival_time'
]]
merged_stm_df = merged_stm_df.rename(columns={'stop_id_x': 'stop_id'})
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time
0,285028348,189,54433,45.617546,-73.507835,20,1,1745384718,1745384708000
1,285028348,189,54444,45.62163,-73.505533,21,1,1745384751,1745384751000
2,285028348,189,54445,45.624606,-73.503332,22,1,1745384785,1745384785000
3,285028348,189,54451,45.62627,-73.501486,23,1,1745384806,1745384806000
4,285028348,189,54456,45.628078,-73.499449,24,1,1745384829,1745384829000


In [141]:
# Convert realtime_arrival_time from seconds to milliseconds
merged_stm_df['realtime_arrival_time'] = merged_stm_df['realtime_arrival_time'] * 1000

In [142]:
# Convert arrival timestamp to datetime
rt_arrival_dt = pd.to_datetime(merged_stm_df['realtime_arrival_time'], origin='unix', unit='ms', utc=True)
rt_arrival_dt

0         2025-04-23 05:05:18+00:00
1         2025-04-23 05:05:51+00:00
2         2025-04-23 05:06:25+00:00
3         2025-04-23 05:06:46+00:00
4         2025-04-23 05:07:09+00:00
                     ...           
1117212   2025-04-25 00:52:24+00:00
1117213   2025-04-25 00:52:54+00:00
1117214   2025-04-25 00:53:29+00:00
1117215   2025-04-25 00:55:00+00:00
1117216   2025-04-25 00:58:00+00:00
Name: realtime_arrival_time, Length: 1117217, dtype: datetime64[ns, UTC]

In [143]:
# TODO: remove this cell after collecting historical data and use previous result below
# Remove 3 days to match historical data
rt_arrival_dt = rt_arrival_dt - pd.DateOffset(days=3)
rt_arrival_dt

0         2025-04-20 05:05:18+00:00
1         2025-04-20 05:05:51+00:00
2         2025-04-20 05:06:25+00:00
3         2025-04-20 05:06:46+00:00
4         2025-04-20 05:07:09+00:00
                     ...           
1117212   2025-04-22 00:52:24+00:00
1117213   2025-04-22 00:52:54+00:00
1117214   2025-04-22 00:53:29+00:00
1117215   2025-04-22 00:55:00+00:00
1117216   2025-04-22 00:58:00+00:00
Name: realtime_arrival_time, Length: 1117217, dtype: datetime64[ns, UTC]

In [144]:
# Truncate arrival time (UTC) to the hour, formatted like the weather `time` column
merged_stm_df['time'] = rt_arrival_dt.dt.strftime('%Y-%m-%dT%H:00')
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,time
0,285028348,189,54433,45.617546,-73.507835,20,1,1745384718000,1745384708000,2025-04-20T05:00
1,285028348,189,54444,45.62163,-73.505533,21,1,1745384751000,1745384751000,2025-04-20T05:00
2,285028348,189,54445,45.624606,-73.503332,22,1,1745384785000,1745384785000,2025-04-20T05:00
3,285028348,189,54451,45.62627,-73.501486,23,1,1745384806000,1745384806000,2025-04-20T05:00
4,285028348,189,54456,45.628078,-73.499449,24,1,1745384829000,1745384829000,2025-04-20T05:00
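
Formatting with a literal `:00` drops the minutes, so every arrival is assigned to the hour it falls in rather than the nearest one. The sketch below contrasts truncation with rounding on two sample timestamps (assumed values); which one suits the weather data depends on how the hourly readings are defined.

```python
import pandas as pd

arrivals = pd.Series(pd.to_datetime(
    ['2025-04-20 05:05:18', '2025-04-20 05:45:00'], utc=True
))

# Same approach as the notebook: keep the hour, hard-code the minutes
truncated = arrivals.dt.strftime('%Y-%m-%dT%H:00')

# Equivalent result using floor, and the rounding alternative for comparison
floored = arrivals.dt.floor('h').dt.strftime('%Y-%m-%dT%H:%M')
rounded = arrivals.dt.round('h').dt.strftime('%Y-%m-%dT%H:%M')

print(truncated.tolist())
print(rounded.tolist())
```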


In [145]:
# Get duplicates
duplicate_mask = merged_stm_df.duplicated()
merged_stm_df[duplicate_mask]

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,time
13812,284728009,361,53757,45.556468,-73.666911,1,1,0,1745387640000,1969-12-29T00:00
19930,284728010,361,53969,45.505515,-73.558300,1,1,0,1745390760000,1969-12-29T00:00
31498,284741271,439,62200,45.618547,-73.607670,1,1,0,1745399040000,1969-12-29T00:00
53077,285006805,218,57823,45.466684,-73.830728,1,1,0,1745402580000,1969-12-29T00:00
57541,284778679,103,56388,45.458914,-73.662951,1,1,0,1745402460000,1969-12-29T00:00
...,...,...,...,...,...,...,...,...,...,...
1116437,285030600,49,60515,45.668193,-73.549411,62,1,1745542320000,1745542320000,2025-04-22T00:00
1116632,285008532,201,60415,45.465962,-73.831866,1,1,0,1745538900000,1969-12-29T00:00
1116684,285008532,201,58116,45.484706,-73.865380,53,1,0,1745541540000,1969-12-29T00:00
1117021,285032398,444,54381,45.618272,-73.607364,1,1,0,1745539200000,1969-12-29T00:00
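
Most of the duplicates shown have a `realtime_arrival_time` of 0, which converts to the Unix epoch and, after the 3-day shift, to an hour key in 1969. A short sketch of that conversion, with a hypothetical `missing_mask` to flag such rows explicitly:

```python
import pandas as pd

# One missing arrival (0) and one real arrival, in milliseconds
arrival_ms = pd.Series([0, 1745384718000])

# Same steps as the notebook: to UTC datetime, shift by 3 days, truncate to the hour
arrival_dt = pd.to_datetime(arrival_ms, unit='ms', utc=True) - pd.DateOffset(days=3)
hour_key = arrival_dt.dt.strftime('%Y-%m-%dT%H:00')

# Assumption: a timestamp of 0 means the feed carried no arrival time
missing_mask = arrival_ms == 0

print(hour_key.tolist(), missing_mask.tolist())
```

Because the weather archive has no 1969 rows, an inner merge on `time` would drop these records even without an explicit filter.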


In [146]:
# Remove duplicates
merged_stm_df = merged_stm_df.drop_duplicates(keep='last')

### STM and Weather

In [147]:
# Merge STM data with weather data
df = pd.merge(left=merged_stm_df, right=weather_df, how='inner', on='time').drop('time', axis=1)
df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,temperature,precipitation,windspeed,weathercode
0,285028348,189,54433,45.617546,-73.507835,20,1,1745384718000,1745384708000,1.5,0.0,16.2,0
1,285028348,189,54444,45.62163,-73.505533,21,1,1745384751000,1745384751000,1.5,0.0,16.2,0
2,285028348,189,54445,45.624606,-73.503332,22,1,1745384785000,1745384785000,1.5,0.0,16.2,0
3,285028348,189,54451,45.62627,-73.501486,23,1,1745384806000,1745384806000,1.5,0.0,16.2,0
4,285028348,189,54456,45.628078,-73.499449,24,1,1745384829000,1745384829000,1.5,0.0,16.2,0
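
An inner merge discards every STM row whose hour has no weather record, without reporting how many were lost. One way to inspect this, sketched on toy frames (the values are taken from the outputs above, the frame names are assumptions), is a left merge with `indicator=True`:

```python
import pandas as pd

stm = pd.DataFrame({
    'stop_id': [54433, 53757],
    'time': ['2025-04-20T05:00', '1969-12-29T00:00'],
})
weather = pd.DataFrame({
    'time': ['2025-04-20T04:00', '2025-04-20T05:00'],
    'temperature': [2.5, 1.5],
})

# The _merge column records whether each STM row found a weather match
checked = pd.merge(stm, weather, how='left', on='time', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
matched = checked[checked['_merge'] == 'both'].drop(columns='_merge')

print(f'{len(unmatched)} row(s) without weather data')
```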


## Prepare Data

In [148]:
# Convert realtime and scheduled timestamps to UTC datetimes
rt_arrival_dt = pd.to_datetime(df['realtime_arrival_time'], origin='unix', unit='ms', utc=True) # TODO: remove this and use previous variable
sch_arrival_dt = pd.to_datetime(df['scheduled_arrival_time'], origin='unix', unit='ms', utc=True)

In [149]:
# Calculate delay in seconds (real - scheduled)
df['delay'] = (rt_arrival_dt - sch_arrival_dt) / pd.Timedelta(seconds=1)
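
Since both columns are already epoch milliseconds, the same delay can be obtained with integer arithmetic. A small worked example on the first two rows shown earlier (the `sample` frame is only for illustration):

```python
import pandas as pd

sample = pd.DataFrame({
    'realtime_arrival_time': [1745384718000, 1745384751000],
    'scheduled_arrival_time': [1745384708000, 1745384751000],
})

# Milliseconds difference divided by 1000 gives seconds; positive means late
sample['delay'] = (
    sample['realtime_arrival_time'] - sample['scheduled_arrival_time']
) / 1000

print(sample['delay'].tolist())
```

The first stop arrives 10 seconds late and the second is on time.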

In [150]:
# Sort data
df = df.sort_values(by=['trip_id', 'stop_sequence'])

In [151]:
# Get previous stop delay
df['delay_previous_stop'] = df.groupby('trip_id')['delay'].shift(1)
df['delay_previous_stop'] = df['delay_previous_stop'].fillna(0)
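
To illustrate the shift, here is a toy example with two trips (invented delays). Each row receives the delay of the preceding stop of the same trip, and the first stop of each trip gets 0:

```python
import pandas as pd

sample = pd.DataFrame({
    'trip_id': [1, 1, 1, 2, 2],
    'stop_sequence': [1, 2, 3, 1, 2],
    'delay': [0.0, 10.0, 25.0, 60.0, 45.0],
}).sort_values(by=['trip_id', 'stop_sequence'])

# shift(1) within each trip; the first stop has no predecessor and is filled with 0
sample['delay_previous_stop'] = sample.groupby('trip_id')['delay'].shift(1).fillna(0)

print(sample)
```

Note that filling with 0 makes a first stop indistinguishable from a stop whose predecessor was exactly on time.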

In [152]:
# Get holiday list
holiday_list = holidays.country_holidays(country='CAN', subdiv='QC', years=[2025])
holiday_list

{datetime.date(2025, 1, 1): "New Year's Day", datetime.date(2025, 4, 18): 'Good Friday', datetime.date(2025, 7, 1): 'Canada Day', datetime.date(2025, 9, 1): 'Labour Day', datetime.date(2025, 12, 25): 'Christmas Day', datetime.date(2025, 5, 19): "National Patriots' Day", datetime.date(2025, 6, 24): 'Saint Jean Baptiste Day', datetime.date(2025, 10, 13): 'Thanksgiving Day'}

In [153]:
# Add missing holidays
easter_dt = easter(2025)
easter_monday_dt = easter_dt + timedelta(days=1)
holiday_list[easter_dt] = 'Easter'
holiday_list[easter_monday_dt] = 'Easter Monday'
holiday_list

{datetime.date(2025, 1, 1): "New Year's Day", datetime.date(2025, 4, 18): 'Good Friday', datetime.date(2025, 7, 1): 'Canada Day', datetime.date(2025, 9, 1): 'Labour Day', datetime.date(2025, 12, 25): 'Christmas Day', datetime.date(2025, 5, 19): "National Patriots' Day", datetime.date(2025, 6, 24): 'Saint Jean Baptiste Day', datetime.date(2025, 10, 13): 'Thanksgiving Day', datetime.date(2025, 4, 20): 'Easter', datetime.date(2025, 4, 21): 'Easter Monday'}

In [154]:
def is_holiday(arrival_time:pd.Timestamp) -> bool:
  return arrival_time.date() in holiday_list.keys()

In [155]:
# Add column is_holiday
rt_arrival_dt = rt_arrival_dt.dt.tz_convert(local_timezone)
df['is_holiday'] = rt_arrival_dt.apply(is_holiday)
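
The timezone conversion matters here because a late-evening arrival can fall on a different calendar date in UTC than in local time. The sketch below shows this with a vectorized `isin` check; the plain `holiday_dates` set stands in for `holiday_list` and is an assumption made to keep the example self-contained.

```python
from datetime import date

import pandas as pd

# Stand-in for holiday_list: Easter and Easter Monday 2025
holiday_dates = {date(2025, 4, 20), date(2025, 4, 21)}

arrivals_utc = pd.Series(pd.to_datetime(
    ['2025-04-22 03:30:00', '2025-04-22 05:00:00'], utc=True
))
arrivals_local = arrivals_utc.dt.tz_convert('Canada/Eastern')

# 03:30 UTC on April 22 is 23:30 on April 21 in Montreal, still Easter Monday
is_holiday_local = arrivals_local.dt.date.isin(holiday_dates)
is_holiday_utc = arrivals_utc.dt.date.isin(holiday_dates)

print(is_holiday_local.tolist(), is_holiday_utc.tolist())
```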

In [156]:
# Convert route_id to integer
df['route_id'] = df['route_id'].astype('int64')

In [157]:
# Convert wheelchair_boarding to boolean
df['wheelchair_boarding'] = df['wheelchair_boarding'] == 1

In [158]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 951210 entries, 25740 to 364918
Data columns (total 16 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   trip_id                 951210 non-null  int64  
 1   route_id                951210 non-null  int64  
 2   stop_id                 951210 non-null  int64  
 3   stop_lat                951210 non-null  float64
 4   stop_lon                951210 non-null  float64
 5   stop_sequence           951210 non-null  int64  
 6   wheelchair_boarding     951210 non-null  bool   
 7   realtime_arrival_time   951210 non-null  int64  
 8   scheduled_arrival_time  951210 non-null  int64  
 9   temperature             951210 non-null  float64
 10  precipitation           951210 non-null  float64
 11  windspeed               951210 non-null  float64
 12  weathercode             951210 non-null  int64  
 13  delay                   951210 non-null  float64
 14  delay_previous_stop  

In [160]:
df.describe()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,realtime_arrival_time,scheduled_arrival_time,temperature,precipitation,windspeed,weathercode,delay,delay_previous_stop
count,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0,951210.0
mean,285203000.0,156.25587,54845.543622,45.527022,-73.635485,25.174796,1745479000000.0,1745479000000.0,7.441714,0.309438,12.537956,16.452838,54.966223,53.680008
std,633015.3,133.479338,3173.08281,0.063952,0.08997,16.899479,47181360.0,47175280.0,3.408796,0.755094,4.814241,23.928044,354.310935,350.930493
min,284726600.0,10.0,50101.0,45.402668,-73.956204,1.0,1745384000000.0,1745384000000.0,-0.4,0.0,3.9,0.0,-8166.0,-8166.0
25%,284776800.0,55.0,52161.0,45.476514,-73.669344,12.0,1745421000000.0,1745421000000.0,5.2,0.0,8.8,1.0,0.0,0.0
50%,285008600.0,121.0,54609.0,45.519907,-73.617545,22.0,1745496000000.0,1745496000000.0,8.0,0.0,12.1,3.0,0.0,0.0
75%,285282400.0,196.0,56962.0,45.573008,-73.573549,35.0,1745520000000.0,1745519000000.0,9.7,0.3,16.9,51.0,0.0,0.0
max,286574700.0,968.0,62442.0,45.701116,-73.480581,117.0,1745539000000.0,1745540000000.0,13.5,3.5,20.6,63.0,27023.0,27023.0


In [159]:
# Export data to CSV
df.to_csv('data/stm_weather_merged.csv', index=False)

## End