# STM Transit Delay Data Preparation

## Data description

### Real-time STM Trip Updates

`current_time` timestamp at which the data was collected<br>
`trip_id` unique identifier of a trip<br>
`route_id` bus or metro line<br>
`start_date` schedule date<br>
`stop_id` stop number<br>
`arrival_time` actual arrival time, in seconds since the Unix epoch<br>
`departure_time` actual departure time, in seconds since the Unix epoch<br>
`schedule_relationship` state of the trip, 0 means scheduled and 1 means skipped

### Scheduled STM Trips

`trip_id` unique identifier of a trip<br>
`arrival_time` scheduled arrival time, as a GTFS `HH:MM:SS` string that can exceed `24:00:00` for trips running past midnight<br>
`departure_time` scheduled departure time, in the same `HH:MM:SS` format<br>
`stop_id` stop number<br>
`stop_sequence` sequence of the stop, for ordering

### STM Stops

`stop_id` unique identifier of a stop<br>
`stop_code` stop number<br>
`stop_name` stop name<br>
`stop_lat` stop latitude<br>
`stop_lon` stop longitude<br>
`stop_url` stop web page<br>
`location_type` stop type, following the GTFS convention: 0 (or empty) is a stop or platform, 1 a station, and 2 a station entrance or exit<br>
`parent_station` parent station (e.g., a metro station with multiple exits)<br>
`wheelchair_boarding` indicates whether the stop is wheelchair accessible, 1 being accessible and 2 being not accessible

### Weather Archive

`time` date and hour of the archived weather<br>
`temperature` air temperature at 2 meters above ground, in Celsius<br>
`precipitation` total precipitation (rain, showers, snow) over the preceding hour, in millimeters<br>
`windspeed` wind speed at 10 meters above ground, in km/h<br>
`weathercode` World Meteorological Organization (WMO) code

## Imports

In [1]:
from datetime import timedelta
import numpy as np
import pandas as pd
import requests
import sys

In [2]:
# Import custom code
sys.path.insert(0, '..')
from scripts.custom_functions import LOCAL_TIMEZONE, MTL_COORDS

In [3]:
real_stm_df = pd.read_csv('../data/fetched_stm.csv', low_memory=False)

In [4]:
planned_stm_df = pd.read_csv('../data/stop_times_2025-04-23.txt')

In [5]:
stops_df = pd.read_csv('../data/stops_2025-04-23.txt')

In [6]:
weather_df = pd.read_csv('../data/fetched_historical_weather.csv')

## Merge Data

### Realtime and Scheduled Trips

In [7]:
stm_trips_df = pd.merge(left=real_stm_df, right=planned_stm_df, how='inner', on=['trip_id', 'stop_id'])
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence
0,1745385000.0,285028348,189,20250422,54433,1745384718,1745384718,0,25:05:08,25:05:08,20
1,1745385000.0,285028348,189,20250422,54444,1745384751,1745384751,0,25:05:51,25:05:51,21
2,1745385000.0,285028348,189,20250422,54445,1745384785,1745384785,0,25:06:25,25:06:25,22
3,1745385000.0,285028348,189,20250422,54451,1745384806,1745384806,0,25:06:46,25:06:46,23
4,1745385000.0,285028348,189,20250422,54456,1745384829,1745384829,0,25:07:09,25:07:09,24
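
An inner merge silently drops real-time rows that have no matching schedule entry. The sketch below, on made-up toy data, shows one way to count those rows with a left merge and `indicator=True` before committing to the inner merge. The frame names `real` and `planned` are placeholders, not the notebook's variables.

```python
import pandas as pd

# Toy stand-ins for the real-time and scheduled tables (values are made up).
real = pd.DataFrame({
    'trip_id': [1, 1, 2],
    'stop_id': [10, 11, 10],
    'arrival_time': [1000, 1060, 2000],
})
planned = pd.DataFrame({
    'trip_id': [1, 1],
    'stop_id': [10, 11],
    'arrival_time': ['08:00:00', '08:01:00'],
    'stop_sequence': [1, 2],
})

# A left merge with indicator=True labels each real-time row as matched
# ('both') or unmatched ('left_only') instead of dropping it.
checked = pd.merge(real, planned, how='left',
                   on=['trip_id', 'stop_id'], indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(f'{len(unmatched)} of {len(real)} real-time rows have no schedule match')
```

As in the merge above, the shared `arrival_time` column is suffixed with `_x` (real-time) and `_y` (scheduled).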


In [8]:
stm_trips_df.dtypes

current_time             float64
trip_id                    int64
route_id                  object
start_date                 int64
stop_id                    int64
arrival_time_x             int64
departure_time_x           int64
schedule_relationship      int64
arrival_time_y            object
departure_time_y          object
stop_sequence              int64
dtype: object

In [9]:
# Convert start_date to datetime
stm_trips_df['start_date'] = pd.to_datetime(stm_trips_df['start_date'], format='%Y%m%d')
assert(stm_trips_df['start_date'].dtype == 'datetime64[ns]')

In [10]:
def parse_gtfs_time(row) -> pd.Timestamp:
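	'''
	Converts a GTFS time string (e.g., '25:30:00') to a naive datetime
	by adding it as an offset to the row's service date (start_date).
	Hours above 24 roll over to the next calendar day.
	'''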
	hours, minutes, seconds = map(int, row['arrival_time_y'].split(':'))
	total_seconds = hours * 3600 + minutes * 60 + seconds

	parsed_time = row['start_date'] + timedelta(seconds=total_seconds)
	return parsed_time
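
Row-wise `apply` can be slow on millions of rows. A vectorized variant of the same conversion might look like the sketch below. The helper name `gtfs_time_to_epoch_ms` is hypothetical, and `LOCAL_TIMEZONE` is assumed here to be `'America/Toronto'` (the real constant lives in `scripts.custom_functions`). The epoch conversion subtracts the Unix epoch explicitly, so it does not depend on the datetime column having nanosecond resolution.

```python
import pandas as pd

LOCAL_TIMEZONE = 'America/Toronto'  # assumption: stand-in for the project's constant


def gtfs_time_to_epoch_ms(start_date: pd.Series, gtfs_time: pd.Series) -> pd.Series:
    """Combine a service date and a GTFS 'HH:MM:SS' string into epoch milliseconds."""
    # Split 'HH:MM:SS' into three integer columns (hours may exceed 24).
    parts = gtfs_time.str.split(':', expand=True).astype('int64')
    seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]

    # Offset from midnight of the service date, then attach the local timezone.
    local_dt = (start_date + pd.to_timedelta(seconds, unit='s')).dt.tz_localize(LOCAL_TIMEZONE)

    # Milliseconds elapsed since the Unix epoch, computed in UTC.
    epoch = pd.Timestamp('1970-01-01', tz='UTC')
    return (local_dt.dt.tz_convert('UTC') - epoch) // pd.Timedelta(milliseconds=1)


sample = pd.DataFrame({
    'start_date': pd.to_datetime(['20250422', '20250422'], format='%Y%m%d'),
    'arrival_time_y': ['25:05:08', '08:30:00'],
})
sample['scheduled_arrival_time'] = gtfs_time_to_epoch_ms(
    sample['start_date'], sample['arrival_time_y'])
print(sample)
```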

In [11]:
# Convert planned arrival time to localized datetime
stm_trips_df['scheduled_arrival_time'] = stm_trips_df.apply(parse_gtfs_time, axis=1)
stm_trips_df['scheduled_arrival_time'] = stm_trips_df['scheduled_arrival_time'].dt.tz_localize(LOCAL_TIMEZONE)
assert stm_trips_df['scheduled_arrival_time'].dt.tz is not None

In [12]:
# Convert planned time to timestamp in milliseconds since epoch
stm_trips_df['scheduled_arrival_time'] = stm_trips_df['scheduled_arrival_time'].astype('int64') // 10**6
stm_trips_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,1745384708000
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,1745384751000
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,1745384785000
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,1745384806000
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,1745384829000


In [13]:
# Convert realtime arrival and departure time to milliseconds
stm_trips_df['arrival_time_x'] = stm_trips_df['arrival_time_x'] * 1000
stm_trips_df['departure_time_x'] = stm_trips_df['departure_time_x'] * 1000

In [14]:
# Get distribution of realtime arrival times
stm_trips_df[['arrival_time_x', 'departure_time_x']].describe()

Unnamed: 0,arrival_time_x,departure_time_x
count,2461222.0,2461222.0
mean,1661471000000.0,1641207000000.0
std,373782200000.0,413843600000.0
min,0.0,0.0
25%,1745496000000.0,1745494000000.0
50%,1745550000000.0,1745548000000.0
75%,1745619000000.0,1745618000000.0
max,1745711000000.0,1745711000000.0


In [15]:
# Replace zero (missing) arrival times with departure times, as the two are usually the same
zero_mask = stm_trips_df['arrival_time_x'] == 0
stm_trips_df.loc[zero_mask, 'arrival_time_x'] = stm_trips_df.loc[zero_mask, 'departure_time_x']

In [16]:
# Drop the rows whose arrival time is still zero (no departure time either)
zero_mask = stm_trips_df['arrival_time_x'] == 0
stm_trips_df = stm_trips_df[~zero_mask]
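
The two-step handling of zero timestamps can be checked on a small example. This is a minimal sketch on invented rows, using `Series.mask` as an equivalent of the `.loc` assignment above; the frame name `toy` is a placeholder.

```python
import pandas as pd

toy = pd.DataFrame({
    'arrival_time_x':   [0, 1745384751000, 0],
    'departure_time_x': [1745384718000, 1745384751000, 0],
})

# Where the arrival is 0, take the departure value; other rows are unchanged.
toy['arrival_time_x'] = toy['arrival_time_x'].mask(
    toy['arrival_time_x'] == 0, toy['departure_time_x'])

# Rows where both values were 0 still have no usable time and are dropped.
toy = toy[toy['arrival_time_x'] != 0].reset_index(drop=True)
print(toy)
```

The first row is repaired from its departure time, the second is untouched, and the third is removed.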

In [17]:
# Rename the real-time arrival column
stm_trips_df = stm_trips_df.rename(columns={'arrival_time_x': 'realtime_arrival_time'})
stm_trips_df.columns

Index(['current_time', 'trip_id', 'route_id', 'start_date', 'stop_id',
       'realtime_arrival_time', 'departure_time_x', 'schedule_relationship',
       'arrival_time_y', 'departure_time_y', 'stop_sequence',
       'scheduled_arrival_time'],
      dtype='object')

### Trips and Stops

In [18]:
# Merge stops to trips
merged_stm_df = pd.merge(left=stm_trips_df, right=stops_df, how='inner', left_on='stop_id', right_on='stop_code')
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id_x,realtime_arrival_time,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,...,scheduled_arrival_time,stop_id_y,stop_code,stop_name,stop_lat,stop_lon,stop_url,location_type,parent_station,wheelchair_boarding
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718000,1745384718000,0,25:05:08,25:05:08,...,1745384708000,54433,54433,Notre-Dame / No 10150,45.617546,-73.507835,https://www.stm.info/fr/recherche#stq=54433,0,,1
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751000,1745384751000,0,25:05:51,25:05:51,...,1745384751000,54444,54444,Notre-Dame / Gamble,45.62163,-73.505533,https://www.stm.info/fr/recherche#stq=54444,0,,1
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785000,1745384785000,0,25:06:25,25:06:25,...,1745384785000,54445,54445,Notre-Dame / No 10800,45.624606,-73.503332,https://www.stm.info/fr/recherche#stq=54445,0,,1
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806000,1745384806000,0,25:06:46,25:06:46,...,1745384806000,54451,54451,Notre-Dame / Richard,45.62627,-73.501486,https://www.stm.info/fr/recherche#stq=54451,0,,1
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829000,1745384829000,0,25:07:09,25:07:09,...,1745384829000,54456,54456,Notre-Dame / Hinton,45.628078,-73.499449,https://www.stm.info/fr/recherche#stq=54456,0,,1


In [19]:
# Keep relevant columns
merged_stm_df = merged_stm_df[[
  'trip_id',
  'route_id',
  'stop_id_x',
  'stop_lat',
  'stop_lon',
  'stop_sequence',
  'wheelchair_boarding',
  'realtime_arrival_time',
  'scheduled_arrival_time'
]]
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id_x,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time
0,285028348,189,54433,45.617546,-73.507835,20,1,1745384718000,1745384708000
1,285028348,189,54444,45.62163,-73.505533,21,1,1745384751000,1745384751000
2,285028348,189,54445,45.624606,-73.503332,22,1,1745384785000,1745384785000
3,285028348,189,54451,45.62627,-73.501486,23,1,1745384806000,1745384806000
4,285028348,189,54456,45.628078,-73.499449,24,1,1745384829000,1745384829000


In [20]:
# Rename stop id
merged_stm_df = merged_stm_df.rename(columns={'stop_id_x': 'stop_id'})

In [21]:
# Convert route_id to integer
merged_stm_df['route_id'] = merged_stm_df['route_id'].astype('int64')

In [22]:
# Convert wheelchair_boarding to boolean
merged_stm_df['wheelchair_boarding'] = merged_stm_df['wheelchair_boarding'] == 1

In [23]:
# Get duplicates
duplicate_mask = merged_stm_df.duplicated()
merged_stm_df[duplicate_mask]

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time
253230,285285001,811,62138,45.589525,-73.537341,1,True,1745447520000,1745444760000
253231,285285001,811,62138,45.589525,-73.537341,22,True,1745447520000,1745447520000
258234,285285013,811,62138,45.589525,-73.537341,1,True,1745448180000,1745445480000
258235,285285013,811,62138,45.589525,-73.537341,22,True,1745448180000,1745448180000
297697,285007882,72,55717,45.508261,-73.672905,34,True,1745450840000,1745450840000
...,...,...,...,...,...,...,...,...,...
2390096,284300774,67,55078,45.577700,-73.640251,34,True,1745703225000,1745703225000
2390097,284300774,67,55057,45.579543,-73.643171,35,True,1745703293000,1745703293000
2390098,284300774,67,55046,45.581373,-73.646077,36,True,1745703360000,1745703360000
2390099,284300774,67,55333,45.583717,-73.649799,37,True,1745703589000,1745703589000


In [24]:
# Remove duplicates
merged_stm_df = merged_stm_df.drop_duplicates()
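
The duplicates output includes a trip that serves the same stop twice (stop 62138 at `stop_sequence` 1 and 22). Because the earlier merge keys on `trip_id` and `stop_id` only, one real-time arrival is paired with both schedule entries, and `drop_duplicates()` does not remove such pairs since the rows differ. One possible heuristic, sketched below, keeps the schedule entry closest in time to the real-time arrival. This is an assumption about what is wanted, not part of the original pipeline, and it can pick the wrong entry for a very late or very early vehicle.

```python
import pandas as pd

# Toy rows modelled on the pattern in the duplicates output: one real-time
# arrival matched to two schedule entries for the same trip and stop.
candidates = pd.DataFrame({
    'trip_id': [285285001, 285285001, 285007882],
    'stop_id': [62138, 62138, 55717],
    'stop_sequence': [1, 22, 34],
    'realtime_arrival_time': [1745447520000, 1745447520000, 1745450840000],
    'scheduled_arrival_time': [1745444760000, 1745447520000, 1745450840000],
})

# Absolute gap between real-time and scheduled arrival, in milliseconds.
gap = (candidates['realtime_arrival_time'] - candidates['scheduled_arrival_time']).abs()

# For each real-time observation, keep the candidate with the smallest gap.
keys = ['trip_id', 'stop_id', 'realtime_arrival_time']
best = (candidates.assign(abs_gap_ms=gap)
        .sort_values('abs_gap_ms')
        .drop_duplicates(subset=keys, keep='first')
        .sort_index()
        .drop(columns='abs_gap_ms'))
print(best)
```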

### STM and Weather

In [25]:
# Convert arrival timestamp to datetime
rt_arrival_dt = pd.to_datetime(merged_stm_df['realtime_arrival_time'], origin='unix', unit='ms', utc=True)
rt_arrival_dt.head()

0   2025-04-23 05:05:18+00:00
1   2025-04-23 05:05:51+00:00
2   2025-04-23 05:06:25+00:00
3   2025-04-23 05:06:46+00:00
4   2025-04-23 05:07:09+00:00
Name: realtime_arrival_time, dtype: datetime64[ns, UTC]

In [26]:
# Round arrival time to the nearest hour
merged_stm_df['rounded_arrival_dt'] = rt_arrival_dt.dt.round('h')

In [27]:
# Format time to match weather data
merged_stm_df['time'] = merged_stm_df['rounded_arrival_dt'].dt.strftime('%Y-%m-%dT%H:%M')
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,rounded_arrival_dt,time
0,285028348,189,54433,45.617546,-73.507835,20,True,1745384718000,1745384708000,2025-04-23 05:00:00+00:00,2025-04-23T05:00
1,285028348,189,54444,45.62163,-73.505533,21,True,1745384751000,1745384751000,2025-04-23 05:00:00+00:00,2025-04-23T05:00
2,285028348,189,54445,45.624606,-73.503332,22,True,1745384785000,1745384785000,2025-04-23 05:00:00+00:00,2025-04-23T05:00
3,285028348,189,54451,45.62627,-73.501486,23,True,1745384806000,1745384806000,2025-04-23 05:00:00+00:00,2025-04-23T05:00
4,285028348,189,54456,45.628078,-73.499449,24,True,1745384829000,1745384829000,2025-04-23 05:00:00+00:00,2025-04-23T05:00
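
The `time` key built above is in UTC. Open-Meteo returns hourly timestamps in whatever zone the request's `timezone` parameter names, so the key only lines up if the weather data was fetched in UTC. The sketch below contrasts the UTC key with a local-time key for two arrival timestamps. `LOCAL_TIMEZONE` is assumed to be `'America/Toronto'`, and which key is correct depends on how the weather file was fetched, which is not shown in this notebook.

```python
import pandas as pd

LOCAL_TIMEZONE = 'America/Toronto'  # assumption: stand-in for the project's constant

arrival_ms = pd.Series([1745384718000, 1745451042000])
arrival_utc = pd.to_datetime(arrival_ms, unit='ms', utc=True)

# Key as built in the notebook: rounded to the hour in UTC.
utc_key = arrival_utc.dt.round('h').dt.strftime('%Y-%m-%dT%H:%M')

# Alternative key: convert to local time first, then round and format.
local_key = (arrival_utc.dt.tz_convert(LOCAL_TIMEZONE)
             .dt.round('h')
             .dt.strftime('%Y-%m-%dT%H:%M'))

print(pd.DataFrame({'utc_key': utc_key, 'local_key': local_key}))
```

In late April the two keys differ by four hours, so joining on the wrong one would attach weather from a different part of the day.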


In [28]:
# Merge STM data with historical weather
df = pd.merge(left=merged_stm_df, right=weather_df, how='left', on='time')
df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,rounded_arrival_dt,time,temperature,precipitation,windspeed,weathercode
0,285028348,189,54433,45.617546,-73.507835,20,True,1745384718000,1745384708000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
1,285028348,189,54444,45.62163,-73.505533,21,True,1745384751000,1745384751000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
2,285028348,189,54445,45.624606,-73.503332,22,True,1745384785000,1745384785000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
3,285028348,189,54451,45.62627,-73.501486,23,True,1745384806000,1745384806000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
4,285028348,189,54456,45.628078,-73.499449,24,True,1745384829000,1745384829000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0


In [29]:
# Get rows with null weather
null_weather_mask = df.isna().any(axis=1)
df[null_weather_mask]

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,rounded_arrival_dt,time,temperature,precipitation,windspeed,weathercode
251713,285032149,32,54794,45.596627,-73.608481,30,True,1745451042000,1745451042000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,,,,
251714,285032149,32,54771,45.599039,-73.611732,31,True,1745451117000,1745451117000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,,,,
251715,285032149,32,61547,45.600558,-73.613056,32,True,1745451157000,1745451157000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,,,,
251716,285032149,32,55283,45.603131,-73.615974,33,True,1745451231000,1745451231000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,,,,
251717,285032149,32,55280,45.604570,-73.617505,34,True,1745451271000,1745451271000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2184734,286589203,106,57064,45.419931,-73.648673,22,True,1745702615000,1745702615000,2025-04-26 21:00:00+00:00,2025-04-26T21:00,,,,
2184735,286589203,106,56765,45.420986,-73.651430,23,True,1745702660000,1745702660000,2025-04-26 21:00:00+00:00,2025-04-26T21:00,,,,
2184736,286589203,106,56768,45.422666,-73.650789,24,True,1745702700000,1745702700000,2025-04-26 21:00:00+00:00,2025-04-26T21:00,,,,
2184737,286589203,106,56770,45.424801,-73.649819,25,True,1745702761000,1745702761000,2025-04-26 21:00:00+00:00,2025-04-26T21:00,,,,


In [30]:
# Separate null and non null rows
not_null_df = df[~null_weather_mask]
null_df = df[null_weather_mask]

In [31]:
def fetch_forecast_weather() -> pd.DataFrame:
	'''
	Fetches hourly forecast weather from Open-Meteo for the date range
	covered by null_df (the rows with missing archive weather).
	'''
	start_date = null_df['rounded_arrival_dt'].min().strftime('%Y-%m-%d')
	end_date = null_df['rounded_arrival_dt'].max().strftime('%Y-%m-%d')

	# Double quotes inside the f-string keep it valid before Python 3.12
	weather_url = (
		'https://api.open-meteo.com/v1/forecast?'
		f'latitude={MTL_COORDS["latitude"]}&longitude={MTL_COORDS["longitude"]}'
		'&hourly=temperature_2m,precipitation,windspeed_10m,weathercode'
		f'&start_date={start_date}&end_date={end_date}'
		'&timezone=America%2FToronto'
	)

	response = requests.get(weather_url, timeout=30)
	response.raise_for_status()  # Fail loudly on HTTP errors
	data = response.json()

	weather_list = []

	if 'hourly' in data:
		hourly = data['hourly']
		for i in range(len(hourly['time'])):
			weather_list.append({
				'time': hourly['time'][i],
				'temperature': hourly['temperature_2m'][i],
				'precipitation': hourly['precipitation'][i],
				'windspeed': hourly['windspeed_10m'][i],
				'weathercode': hourly['weathercode'][i]
			})

	# An empty list yields an empty DataFrame
	return pd.DataFrame(weather_list)
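
The response parsing in `fetch_forecast_weather` can be exercised without a network call. The sketch below feeds a hand-written payload, shaped like the `hourly` block the function reads, through a column-wise conversion. The helper `hourly_to_frame`, the `COLUMN_MAP` constant and the sample values are illustrative assumptions, not part of the notebook or of the Open-Meteo client.

```python
import pandas as pd

# Maps the API's hourly field names to the column names used in this notebook.
COLUMN_MAP = {
    'time': 'time',
    'temperature_2m': 'temperature',
    'precipitation': 'precipitation',
    'windspeed_10m': 'windspeed',
    'weathercode': 'weathercode',
}


def hourly_to_frame(data: dict) -> pd.DataFrame:
    """Turn a decoded JSON payload with an 'hourly' block into a DataFrame."""
    hourly = data.get('hourly')
    if not hourly:
        # Keep the expected columns even when there is nothing to load.
        return pd.DataFrame(columns=list(COLUMN_MAP.values()))
    # The block is a dict of equal-length lists, so it loads column by column.
    return pd.DataFrame(hourly)[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)


sample_payload = {
    'hourly': {
        'time': ['2025-04-24T00:00', '2025-04-24T01:00'],
        'temperature_2m': [6.4, 5.4],
        'precipitation': [0.0, 0.0],
        'windspeed_10m': [7.9, 2.5],
        'weathercode': [0, 0],
    }
}

frame = hourly_to_frame(sample_payload)
empty = hourly_to_frame({'error': True})
print(frame)
```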

In [32]:
weather_df = fetch_forecast_weather()
weather_df

Unnamed: 0,time,temperature,precipitation,windspeed,weathercode
0,2025-04-24T00:00,6.4,0.0,7.9,0
1,2025-04-24T01:00,5.4,0.0,2.5,0
2,2025-04-24T02:00,5.0,0.0,2.0,0
3,2025-04-24T03:00,4.5,0.0,3.3,2
4,2025-04-24T04:00,3.5,0.0,5.4,0
...,...,...,...,...,...
91,2025-04-27T19:00,11.7,0.0,9.2,0
92,2025-04-27T20:00,10.2,0.0,13.4,0
93,2025-04-27T21:00,9.6,0.0,14.2,0
94,2025-04-27T22:00,9.1,0.0,16.3,0


In [33]:
# Merge null weather dataframe with forecast
null_df = null_df.drop(['temperature', 'precipitation', 'windspeed', 'weathercode'], axis=1)
null_df = pd.merge(left=null_df, right=weather_df, how='inner', on='time')
null_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,rounded_arrival_dt,time,temperature,precipitation,windspeed,weathercode
0,285032149,32,54794,45.596627,-73.608481,30,True,1745451042000,1745451042000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,6.4,0.0,7.9,0
1,285032149,32,54771,45.599039,-73.611732,31,True,1745451117000,1745451117000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,6.4,0.0,7.9,0
2,285032149,32,61547,45.600558,-73.613056,32,True,1745451157000,1745451157000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,6.4,0.0,7.9,0
3,285032149,32,55283,45.603131,-73.615974,33,True,1745451231000,1745451231000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,6.4,0.0,7.9,0
4,285032149,32,55280,45.60457,-73.617505,34,True,1745451271000,1745451271000,2025-04-24 00:00:00+00:00,2025-04-24T00:00,6.4,0.0,7.9,0


In [34]:
# Merge null and non null weather dataframes
df = pd.concat([not_null_df, null_df]).reset_index()
df.head()

Unnamed: 0,index,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,wheelchair_boarding,realtime_arrival_time,scheduled_arrival_time,rounded_arrival_dt,time,temperature,precipitation,windspeed,weathercode
0,0,285028348,189,54433,45.617546,-73.507835,20,True,1745384718000,1745384708000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
1,1,285028348,189,54444,45.62163,-73.505533,21,True,1745384751000,1745384751000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
2,2,285028348,189,54445,45.624606,-73.503332,22,True,1745384785000,1745384785000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
3,3,285028348,189,54451,45.62627,-73.501486,23,True,1745384806000,1745384806000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
4,4,285028348,189,54456,45.628078,-73.499449,24,True,1745384829000,1745384829000,2025-04-23 05:00:00+00:00,2025-04-23T05:00,4.5,0.0,9.5,0.0
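
The extra `index` column in the output above comes from `reset_index()`, which moves the old index into a column. It is dropped later when the relevant columns are selected. A small sketch on toy frames (names and values are made up) shows that `ignore_index=True` renumbers the rows without adding the column.

```python
import pandas as pd

not_null = pd.DataFrame({'trip_id': [1, 2], 'temperature': [4.5, 4.5]}, index=[0, 1])
filled = pd.DataFrame({'trip_id': [3], 'temperature': [6.4]}, index=[0])

# reset_index() keeps the old index as a new 'index' column.
with_index_col = pd.concat([not_null, filled]).reset_index()

# ignore_index=True discards the old index and renumbers from 0.
clean = pd.concat([not_null, filled], ignore_index=True)

print(list(with_index_col.columns))
print(list(clean.columns))
```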


## Export Data

In [35]:
df.columns

Index(['index', 'trip_id', 'route_id', 'stop_id', 'stop_lat', 'stop_lon',
       'stop_sequence', 'wheelchair_boarding', 'realtime_arrival_time',
       'scheduled_arrival_time', 'rounded_arrival_dt', 'time', 'temperature',
       'precipitation', 'windspeed', 'weathercode'],
      dtype='object')

In [36]:
# Keep relevant columns
df = df[['trip_id', 'route_id', 'stop_id', 'stop_lat', 'stop_lon',
       'stop_sequence', 'wheelchair_boarding', 'realtime_arrival_time',
       'scheduled_arrival_time', 'temperature', 'precipitation', 'windspeed', 'weathercode']]

In [37]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2184739 entries, 0 to 2184738
Data columns (total 13 columns):
 #   Column                  Dtype  
---  ------                  -----  
 0   trip_id                 int64  
 1   route_id                int64  
 2   stop_id                 int64  
 3   stop_lat                float64
 4   stop_lon                float64
 5   stop_sequence           int64  
 6   wheelchair_boarding     bool   
 7   realtime_arrival_time   int64  
 8   scheduled_arrival_time  int64  
 9   temperature             float64
 10  precipitation           float64
 11  windspeed               float64
 12  weathercode             float64
dtypes: bool(1), float64(6), int64(6)
memory usage: 202.1 MB


In [38]:
df.describe()

Unnamed: 0,trip_id,route_id,stop_id,stop_lat,stop_lon,stop_sequence,realtime_arrival_time,scheduled_arrival_time,temperature,precipitation,windspeed,weathercode
count,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0,2184739.0
mean,285157000.0,150.1766,54847.65,45.5271,-73.63386,24.7303,1745559000000.0,1745559000000.0,10.7605,0.1208163,10.98015,8.761765
std,746532.8,128.5057,3186.937,0.06386357,0.08910073,17.07973,85506450.0,85500910.0,3.394291,0.4122717,3.631941,17.97729
min,283551800.0,10.0,50101.0,45.40267,-73.9562,1.0,1745384000000.0,1745384000000.0,2.2,0.0,2.0,0.0
25%,284739800.0,55.0,52164.0,45.47667,-73.66707,11.0,1745500000000.0,1745500000000.0,8.6,0.0,8.0,0.0
50%,285008400.0,121.0,54573.0,45.52035,-73.61605,22.0,1745558000000.0,1745558000000.0,10.1,0.0,10.7,3.0
75%,285282700.0,193.0,56960.0,45.57297,-73.5729,35.0,1745620000000.0,1745620000000.0,13.3,0.0,13.9,3.0
max,286592000.0,968.0,62442.0,45.70112,-73.48058,117.0,1745711000000.0,1745709000000.0,18.1,2.2,18.0,61.0


In [39]:
# Export data to CSV
df.to_csv('../data/stm_weather_merged.csv', index=False)

## End