# STM Transit Delay Data Preparation

## Data description

### Real-time STM GTFS

`current_time` Unix timestamp (in seconds) of when the data was collected<br>
`trip_id` unique identifier of a trip<br>
`route_id` bus line<br>
`start_date` schedule date<br>
`stop_id` stop number<br>
`arrival_time` actual arrival time, as a Unix timestamp in seconds<br>
`departure_time` actual departure time, as a Unix timestamp in seconds<br>
`schedule_relationship` state of the trip, 0 means scheduled and 1 means skipped

### Scheduled STM GTFS

`trip_id` unique identifier of a trip<br>
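`arrival_time` scheduled arrival time, as an `HH:MM:SS` string relative to the service date (hours can exceed 24 for trips that run past midnight, e.g. `25:05:08`)<br>
`departure_time` scheduled departure time, in the same `HH:MM:SS` format<br>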
`stop_id` stop number<br>
`stop_sequence` sequence of the stop, for ordering

### Weather Archive

`time` date and hour of the archived weather<br>
`temperature` air temperature at 2 meters above ground, in Celsius<br>
`precipitation` total precipitation (rain, showers, snow) over the preceding hour, in millimeters<br>
`windspeed` wind speed at 10 meters above ground, in km/h<br>
`weathercode` weather condition as a numeric code, see [this table](https://open-meteo.com/en/docs#weather_variable_documentation) for details

## Imports

In [1]:
from datetime import timedelta
import pandas as pd

## Load Data

In [2]:
local_timezone = 'Canada/Eastern'

In [3]:
real_stm_df = pd.read_csv('data/fetched_stm.csv', low_memory=False)
real_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time,departure_time,schedule_relationship
0,1745385000.0,285028348,189,20250422,54433,1745384718,1745384718,0
1,1745385000.0,285028348,189,20250422,54444,1745384751,1745384751,0
2,1745385000.0,285028348,189,20250422,54445,1745384785,1745384785,0
3,1745385000.0,285028348,189,20250422,54451,1745384806,1745384806,0
4,1745385000.0,285028348,189,20250422,54456,1745384829,1745384829,0


In [4]:
real_stm_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344313 entries, 0 to 344312
Data columns (total 8 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   current_time           344313 non-null  float64
 1   trip_id                344313 non-null  int64  
 2   route_id               344313 non-null  object 
 3   start_date             344313 non-null  int64  
 4   stop_id                344313 non-null  int64  
 5   arrival_time           344313 non-null  int64  
 6   departure_time         344313 non-null  int64  
 7   schedule_relationship  344313 non-null  int64  
dtypes: float64(1), int64(6), object(1)
memory usage: 21.0+ MB


In [5]:
planned_stm_df = pd.read_csv('data/stop_times_2025-04-23.txt')
planned_stm_df.head()

Unnamed: 0,trip_id,arrival_time,departure_time,stop_id,stop_sequence
0,281570788,05:58:00,05:58:00,51095,1
1,281570788,05:59:39,05:59:39,51126,2
2,281570788,06:00:06,06:00:06,51113,3
3,281570788,06:00:44,06:00:44,51084,4
4,281570788,06:01:17,06:01:17,51063,5


In [6]:
weather_df = pd.read_csv('data/fetched_weather.csv')
weather_df

Unnamed: 0,time,temperature,precipitation,windspeed,weathercode
0,2025-04-20T00:00,10.9,0.0,21.1,3
1,2025-04-20T01:00,6.9,0.0,21.9,2
2,2025-04-20T02:00,5.1,0.0,16.3,1
3,2025-04-20T03:00,3.7,0.0,16.1,0
4,2025-04-20T04:00,2.5,0.0,16.3,0
5,2025-04-20T05:00,1.5,0.0,16.2,0
6,2025-04-20T06:00,0.5,0.0,16.2,1
7,2025-04-20T07:00,0.5,0.0,19.6,1
8,2025-04-20T08:00,1.4,0.0,20.1,1
9,2025-04-20T09:00,3.7,0.0,20.2,1


## Merge Data

In [7]:
# Merge real and planned STM data
merged_stm_df = pd.merge(left=real_stm_df, right=planned_stm_df, how='inner', on=['trip_id', 'stop_id'])
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence
0,1745385000.0,285028348,189,20250422,54433,1745384718,1745384718,0,25:05:08,25:05:08,20
1,1745385000.0,285028348,189,20250422,54444,1745384751,1745384751,0,25:05:51,25:05:51,21
2,1745385000.0,285028348,189,20250422,54445,1745384785,1745384785,0,25:06:25,25:06:25,22
3,1745385000.0,285028348,189,20250422,54451,1745384806,1745384806,0,25:06:46,25:06:46,23
4,1745385000.0,285028348,189,20250422,54456,1745384829,1745384829,0,25:07:09,25:07:09,24


In [8]:
merged_stm_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 341818 entries, 0 to 341817
Data columns (total 11 columns):
 #   Column                 Non-Null Count   Dtype  
---  ------                 --------------   -----  
 0   current_time           341818 non-null  float64
 1   trip_id                341818 non-null  int64  
 2   route_id               341818 non-null  object 
 3   start_date             341818 non-null  int64  
 4   stop_id                341818 non-null  int64  
 5   arrival_time_x         341818 non-null  int64  
 6   departure_time_x       341818 non-null  int64  
 7   schedule_relationship  341818 non-null  int64  
 8   arrival_time_y         341818 non-null  object 
 9   departure_time_y       341818 non-null  object 
 10  stop_sequence          341818 non-null  int64  
dtypes: float64(1), int64(7), object(3)
memory usage: 28.7+ MB
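
The inner merge reduces the data from 344313 to 341818 rows, and the overlapping column names end up with the generic `_x` and `_y` suffixes. As a minimal sketch on toy data (the frames `real` and `planned` and the suffix names are illustrative assumptions, not part of the notebook), `suffixes` gives the columns descriptive names up front and `indicator=True` shows which real-time rows have no scheduled match:

```python
import pandas as pd

# Toy stand-ins for the real-time and scheduled frames (hypothetical values).
real = pd.DataFrame({
    'trip_id': [1, 1, 2],
    'stop_id': [10, 11, 10],
    'arrival_time': [100, 200, 300],
})
planned = pd.DataFrame({
    'trip_id': [1, 1],
    'stop_id': [10, 11],
    'arrival_time': ['05:58:00', '05:59:39'],
})

# A left merge keeps every real-time row; the _merge column records
# whether a matching scheduled row was found.
checked = pd.merge(
    real, planned,
    how='left',
    on=['trip_id', 'stop_id'],
    suffixes=('_realtime', '_scheduled'),
    indicator=True,
)

# Rows that an inner merge would have dropped.
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched)
```

Inspecting `unmatched` before switching back to `how='inner'` makes it easier to tell whether the dropped rows come from trips missing in the schedule file or from some other mismatch.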


In [9]:
# Convert start_date to datetime
merged_stm_df['start_date'] = pd.to_datetime(merged_stm_df['start_date'], format='%Y%m%d')
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24


In [10]:
def parse_gtfs_time(time_str: str, service_date: pd.Timestamp) -> pd.Timestamp:
	'''
	Converts a GTFS time string (e.g., '25:30:00') to a datetime
	based on the service date. Hours may exceed 24 for trips that
	run past midnight, so the offset is added to the service date.
	'''
	hours, minutes, seconds = map(int, time_str.split(':'))
	total_seconds = hours * 3600 + minutes * 60 + seconds

	parsed_time = service_date + timedelta(seconds=total_seconds)
	return parsed_time
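
Applying this function row by row works, but it can be slow on several hundred thousand rows. The following is a sketch of a vectorized alternative that should give the same result; the helper name `gtfs_times_to_datetime` and the sample frame are assumptions made for illustration.

```python
import pandas as pd

def gtfs_times_to_datetime(time_strs: pd.Series, service_dates: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of parse_gtfs_time."""
    # Split 'HH:MM:SS' into three integer columns (0 = hours, 1 = minutes, 2 = seconds).
    parts = time_strs.str.split(':', expand=True).astype('int64')
    total_seconds = parts[0] * 3600 + parts[1] * 60 + parts[2]
    # Add the offset to midnight of the service date; hours >= 24 roll into the next day.
    return service_dates + pd.to_timedelta(total_seconds, unit='s')

# Two sample rows using values that appear in the data above.
sample = pd.DataFrame({
    'start_date': pd.to_datetime(['20250422', '20250422'], format='%Y%m%d'),
    'arrival_time_y': ['25:05:08', '05:58:00'],
})
sample['scheduled_arrival_time'] = gtfs_times_to_datetime(
    sample['arrival_time_y'], sample['start_date']
)
print(sample)
```

The first row lands on 2025-04-23 01:05:08, the day after the service date, which is the rollover behaviour the row-wise function also produces.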

In [11]:
# Convert planned arrival time to localized datetime
merged_stm_df['scheduled_arrival_time'] = merged_stm_df.apply(lambda row: parse_gtfs_time(row['arrival_time_y'], row['start_date']), axis=1)
merged_stm_df['scheduled_arrival_time'] = merged_stm_df['scheduled_arrival_time'].dt.tz_localize(local_timezone)
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,2025-04-23 01:05:08-04:00
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,2025-04-23 01:05:51-04:00
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,2025-04-23 01:06:25-04:00
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,2025-04-23 01:06:46-04:00
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,2025-04-23 01:07:09-04:00


In [12]:
# Convert planned departure time to localized datetime
merged_stm_df['scheduled_departure_time'] = merged_stm_df.apply(lambda row: parse_gtfs_time(row['departure_time_y'], row['start_date']), axis=1)
merged_stm_df['scheduled_departure_time'] = merged_stm_df['scheduled_departure_time'].dt.tz_localize(local_timezone)
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time,scheduled_departure_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,2025-04-23 01:05:08-04:00,2025-04-23 01:05:08-04:00
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,2025-04-23 01:05:51-04:00,2025-04-23 01:05:51-04:00
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,2025-04-23 01:06:25-04:00,2025-04-23 01:06:25-04:00
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,2025-04-23 01:06:46-04:00,2025-04-23 01:06:46-04:00
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,2025-04-23 01:07:09-04:00,2025-04-23 01:07:09-04:00
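
One caveat that does not affect this sample but may matter once more dates are collected: by default `tz_localize` raises an error for wall-clock times that do not exist or are ambiguous around daylight-saving transitions. The sketch below shows one possible policy using the `nonexistent` and `ambiguous` parameters; the choice of `'shift_forward'` and `'NaT'` is an assumption, not something the notebook prescribes.

```python
import pandas as pd

local_timezone = 'Canada/Eastern'

# 02:30 on 2025-03-09 does not exist locally (clocks jump from 02:00 to 03:00);
# the second value is an ordinary time taken from the data above.
naive = pd.Series(pd.to_datetime(['2025-03-09 02:30:00', '2025-04-23 01:05:08']))

localized = naive.dt.tz_localize(
    local_timezone,
    nonexistent='shift_forward',  # move nonexistent times to the next valid instant
    ambiguous='NaT',              # mark ambiguous fall-back times as missing
)
print(localized)
```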


In [13]:
# Convert planned times to timestamp in milliseconds since epoch
merged_stm_df['scheduled_arrival_time'] = merged_stm_df['scheduled_arrival_time'].astype('int64') // 10**6
merged_stm_df['scheduled_departure_time'] = merged_stm_df['scheduled_departure_time'].astype('int64') // 10**6
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,arrival_time_x,departure_time_x,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time,scheduled_departure_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,1745384708000,1745384708000
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,1745384751000,1745384751000
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,1745384785000,1745384785000
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,1745384806000,1745384806000
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,1745384829000,1745384829000


In [14]:
# Rename real time columns
merged_stm_df = merged_stm_df.rename(columns={'arrival_time_x': 'realtime_arrival_time', 'departure_time_x': 'realtime_departure_time'})
merged_stm_df.head()

Unnamed: 0,current_time,trip_id,route_id,start_date,stop_id,realtime_arrival_time,realtime_departure_time,schedule_relationship,arrival_time_y,departure_time_y,stop_sequence,scheduled_arrival_time,scheduled_departure_time
0,1745385000.0,285028348,189,2025-04-22,54433,1745384718,1745384718,0,25:05:08,25:05:08,20,1745384708000,1745384708000
1,1745385000.0,285028348,189,2025-04-22,54444,1745384751,1745384751,0,25:05:51,25:05:51,21,1745384751000,1745384751000
2,1745385000.0,285028348,189,2025-04-22,54445,1745384785,1745384785,0,25:06:25,25:06:25,22,1745384785000,1745384785000
3,1745385000.0,285028348,189,2025-04-22,54451,1745384806,1745384806,0,25:06:46,25:06:46,23,1745384806000,1745384806000
4,1745385000.0,285028348,189,2025-04-22,54456,1745384829,1745384829,0,25:07:09,25:07:09,24,1745384829000,1745384829000


In [15]:
# Keep relevant columns
merged_stm_df = merged_stm_df[[
  'trip_id',
  'route_id',
  'stop_id',
  'stop_sequence',
  'realtime_arrival_time',
  'scheduled_arrival_time',
  'realtime_departure_time',
  'scheduled_departure_time',
  'schedule_relationship'
]]
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship
0,285028348,189,54433,20,1745384718,1745384708000,1745384718,1745384708000,0
1,285028348,189,54444,21,1745384751,1745384751000,1745384751,1745384751000,0
2,285028348,189,54445,22,1745384785,1745384785000,1745384785,1745384785000,0
3,285028348,189,54451,23,1745384806,1745384806000,1745384806,1745384806000,0
4,285028348,189,54456,24,1745384829,1745384829000,1745384829,1745384829000,0


In [16]:
# Get duplicates
duplicate_mask = merged_stm_df.duplicated()
merged_stm_df[duplicate_mask]

Unnamed: 0,trip_id,route_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship
260083,285285001,811,62138,1,1745447520,1745444760000,0,1745444760000,0
260084,285285001,811,62138,22,1745447520,1745447520000,0,1745447520000,0
265217,285285013,811,62138,1,1745448180,1745445480000,0,1745445480000,0
265218,285285013,811,62138,22,1745448180,1745448180000,0,1745448180000,0
305782,285007882,72,55717,34,1745450840,1745450840000,1745450840,1745450840000,0
...,...,...,...,...,...,...,...,...,...
341813,284759120,61,52617,49,1745451807,1745451807000,1745451807,1745451807000,0
341814,284759120,61,62338,50,1745451854,1745451854000,1745451854,1745451854000,0
341815,284759120,61,52482,51,1745451894,1745451894000,1745451894,1745451894000,0
341816,284759120,61,52397,52,0,1745451960000,0,1745451960000,1


In [17]:
# Remove duplicates
merged_stm_df = merged_stm_df.drop_duplicates(keep='last')

In [18]:
# Convert real-time arrival timestamp (seconds since epoch) to a UTC datetime
arrival_time = pd.to_datetime(merged_stm_df['realtime_arrival_time'] * 1000, origin='unix', unit='ms', utc=True)
arrival_time

0        2025-04-23 05:05:18+00:00
1        2025-04-23 05:05:51+00:00
2        2025-04-23 05:06:25+00:00
3        2025-04-23 05:06:46+00:00
4        2025-04-23 05:07:09+00:00
                    ...           
341813   2025-04-23 23:43:27+00:00
341814   2025-04-23 23:44:14+00:00
341815   2025-04-23 23:44:54+00:00
341816   1970-01-01 00:00:00+00:00
341817   2025-04-23 23:49:00+00:00
Name: realtime_arrival_time, Length: 338405, dtype: datetime64[ns, UTC]
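
Row 341816 converts to 1970-01-01 because its `realtime_arrival_time` is 0, which appears on a skipped stop (`schedule_relationship` of 1). A minimal sketch, assuming a zero timestamp means "no value", turns such zeros into missing values before conversion so that they become `NaT` instead of an epoch date. The frame `stops` is illustrative.

```python
import pandas as pd

# Two rows modelled on the output above: a normal stop and a skipped one.
stops = pd.DataFrame({
    'realtime_arrival_time': [1745451894, 0],
    'schedule_relationship': [0, 1],
})

# Keep positive timestamps, replace everything else with NaN.
seconds = stops['realtime_arrival_time'].where(stops['realtime_arrival_time'] > 0)

# NaN converts to NaT rather than 1970-01-01.
stops['arrival_dt'] = pd.to_datetime(seconds, unit='s', utc=True)
print(stops)
```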

In [19]:
# TODO: remove this cell after collecting historical data
# Shift back 3 days so the dates line up with the archived weather data
arrival_time = arrival_time - pd.DateOffset(days=3)
arrival_time

0        2025-04-20 05:05:18+00:00
1        2025-04-20 05:05:51+00:00
2        2025-04-20 05:06:25+00:00
3        2025-04-20 05:06:46+00:00
4        2025-04-20 05:07:09+00:00
                    ...           
341813   2025-04-20 23:43:27+00:00
341814   2025-04-20 23:44:14+00:00
341815   2025-04-20 23:44:54+00:00
341816   1969-12-29 00:00:00+00:00
341817   2025-04-20 23:49:00+00:00
Name: realtime_arrival_time, Length: 338405, dtype: datetime64[ns, UTC]

In [20]:
# Truncate arrival time (UTC) to the hour and format it like the weather data's time column
merged_stm_df['time'] = arrival_time.dt.strftime('%Y-%m-%dT%H:00')
merged_stm_df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship,time
0,285028348,189,54433,20,1745384718,1745384708000,1745384718,1745384708000,0,2025-04-20T05:00
1,285028348,189,54444,21,1745384751,1745384751000,1745384751,1745384751000,0,2025-04-20T05:00
2,285028348,189,54445,22,1745384785,1745384785000,1745384785,1745384785000,0,2025-04-20T05:00
3,285028348,189,54451,23,1745384806,1745384806000,1745384806,1745384806000,0,2025-04-20T05:00
4,285028348,189,54456,24,1745384829,1745384829000,1745384829,1745384829000,0,2025-04-20T05:00
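
Note that the `'%H:00'` format drops the minutes, so every arrival is assigned to the start of its hour rather than to the nearest hour. The sketch below contrasts the two behaviours with `dt.floor` and `dt.round`; the second timestamp is a made-up value 45 minutes past the hour.

```python
import pandas as pd

# 05:05:18 (from the data above) and 05:45:00 (hypothetical), both UTC.
arrivals = pd.Series(pd.to_datetime([1745384718, 1745387100], unit='s', utc=True))

# Floor: same result as formatting with '%H:00'.
floored = arrivals.dt.floor('h').dt.strftime('%Y-%m-%dT%H:00')

# Round: 05:45 moves to the 06:00 weather record instead.
rounded = arrivals.dt.round('h').dt.strftime('%Y-%m-%dT%H:00')

print(floored.tolist())
print(rounded.tolist())
```

Which of the two is preferable depends on how the weather archive defines its hourly values, so this is a design choice rather than a correction.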


In [21]:
# Merge STM data with weather data
df = pd.merge(left=merged_stm_df, right=weather_df, how='inner', on='time').drop('time', axis=1)
df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship,temperature,precipitation,windspeed,weathercode
0,285028348,189,54433,20,1745384718,1745384708000,1745384718,1745384708000,0,1.5,0.0,16.2,0
1,285028348,189,54444,21,1745384751,1745384751000,1745384751,1745384751000,0,1.5,0.0,16.2,0
2,285028348,189,54445,22,1745384785,1745384785000,1745384785,1745384785000,0,1.5,0.0,16.2,0
3,285028348,189,54451,23,1745384806,1745384806000,1745384806,1745384806000,0,1.5,0.0,16.2,0
4,285028348,189,54456,24,1745384829,1745384829000,1745384829,1745384829000,0,1.5,0.0,16.2,0
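
Because this is an inner merge, any row whose `time` has no weather record is dropped silently (the frame goes from 338405 to 311035 rows). As a sketch under the assumption that each hour should appear once in the weather data, `validate='many_to_one'` checks that assumption, and a left merge keeps unmatched rows visible as missing values. The toy frames are illustrative.

```python
import pandas as pd

# One stop with a valid hour and one carrying the shifted epoch value seen earlier.
stops = pd.DataFrame({
    'stop_id': [54433, 52397],
    'time': ['2025-04-20T05:00', '1969-12-29T00:00'],
})
weather = pd.DataFrame({
    'time': ['2025-04-20T05:00'],
    'temperature': [1.5],
})

# validate raises MergeError if a weather hour is duplicated;
# how='left' keeps stops without weather so they can be counted.
joined = pd.merge(stops, weather, how='left', on='time', validate='many_to_one')
missing_weather = joined['temperature'].isna().sum()
print(joined)
print(missing_weather)
```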


In [22]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 311035 entries, 0 to 311034
Data columns (total 13 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   trip_id                   311035 non-null  int64  
 1   route_id                  311035 non-null  object 
 2   stop_id                   311035 non-null  int64  
 3   stop_sequence             311035 non-null  int64  
 4   realtime_arrival_time     311035 non-null  int64  
 5   scheduled_arrival_time    311035 non-null  int64  
 6   realtime_departure_time   311035 non-null  int64  
 7   scheduled_departure_time  311035 non-null  int64  
 8   schedule_relationship     311035 non-null  int64  
 9   temperature               311035 non-null  float64
 10  precipitation             311035 non-null  float64
 11  windspeed                 311035 non-null  float64
 12  weathercode               311035 non-null  int64  
dtypes: float64(3), int64(9), object(1)
memory us

In [23]:
# Convert real-time arrival (seconds since epoch) to a UTC datetime
rt_arrival_dt = pd.to_datetime(df['realtime_arrival_time'] * 1000, origin='unix', unit='ms', utc=True)
rt_arrival_dt

0        2025-04-23 05:05:18+00:00
1        2025-04-23 05:05:51+00:00
2        2025-04-23 05:06:25+00:00
3        2025-04-23 05:06:46+00:00
4        2025-04-23 05:07:09+00:00
                    ...           
311030   2025-04-23 23:42:51+00:00
311031   2025-04-23 23:43:27+00:00
311032   2025-04-23 23:44:14+00:00
311033   2025-04-23 23:44:54+00:00
311034   2025-04-23 23:49:00+00:00
Name: realtime_arrival_time, Length: 311035, dtype: datetime64[ns, UTC]

In [24]:
# Convert scheduled arrival (milliseconds since epoch) to a UTC datetime
sch_arrival_dt = pd.to_datetime(df['scheduled_arrival_time'], origin='unix', unit='ms', utc=True)
sch_arrival_dt

0        2025-04-23 05:05:08+00:00
1        2025-04-23 05:05:51+00:00
2        2025-04-23 05:06:25+00:00
3        2025-04-23 05:06:46+00:00
4        2025-04-23 05:07:09+00:00
                    ...           
311030   2025-04-23 23:42:51+00:00
311031   2025-04-23 23:43:27+00:00
311032   2025-04-23 23:44:14+00:00
311033   2025-04-23 23:44:54+00:00
311034   2025-04-23 23:49:00+00:00
Name: scheduled_arrival_time, Length: 311035, dtype: datetime64[ns, UTC]

In [25]:
# Calculate delay in seconds (real - scheduled)
df['delay'] = (rt_arrival_dt - sch_arrival_dt) / pd.Timedelta(seconds=1)
df.head()

Unnamed: 0,trip_id,route_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship,temperature,precipitation,windspeed,weathercode,delay
0,285028348,189,54433,20,1745384718,1745384708000,1745384718,1745384708000,0,1.5,0.0,16.2,0,10.0
1,285028348,189,54444,21,1745384751,1745384751000,1745384751,1745384751000,0,1.5,0.0,16.2,0,0.0
2,285028348,189,54445,22,1745384785,1745384785000,1745384785,1745384785000,0,1.5,0.0,16.2,0,0.0
3,285028348,189,54451,23,1745384806,1745384806000,1745384806,1745384806000,0,1.5,0.0,16.2,0,0.0
4,285028348,189,54456,24,1745384829,1745384829000,1745384829,1745384829000,0,1.5,0.0,16.2,0,0.0
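
Since the real-time column is in seconds and the scheduled column is in milliseconds, the same delay can also be computed with plain integer arithmetic, without converting to datetimes first. A minimal sketch on the first two rows shown above:

```python
import pandas as pd

# First two rows of the merged data: real-time in seconds, scheduled in milliseconds.
sample = pd.DataFrame({
    'realtime_arrival_time': [1745384718, 1745384751],
    'scheduled_arrival_time': [1745384708000, 1745384751000],
})

# Bring the scheduled time down to seconds, then subtract (real - scheduled).
sample['delay'] = sample['realtime_arrival_time'] - sample['scheduled_arrival_time'] // 1000
print(sample['delay'].tolist())
```

The result is an integer column rather than the float column produced by dividing a timedelta, which is the only difference from the cell above.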


In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 311035 entries, 0 to 311034
Data columns (total 14 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   trip_id                   311035 non-null  int64  
 1   route_id                  311035 non-null  object 
 2   stop_id                   311035 non-null  int64  
 3   stop_sequence             311035 non-null  int64  
 4   realtime_arrival_time     311035 non-null  int64  
 5   scheduled_arrival_time    311035 non-null  int64  
 6   realtime_departure_time   311035 non-null  int64  
 7   scheduled_departure_time  311035 non-null  int64  
 8   schedule_relationship     311035 non-null  int64  
 9   temperature               311035 non-null  float64
 10  precipitation             311035 non-null  float64
 11  windspeed                 311035 non-null  float64
 12  weathercode               311035 non-null  int64  
 13  delay                     311035 non-null  f

In [29]:
df.describe()

Unnamed: 0,trip_id,stop_id,stop_sequence,realtime_arrival_time,scheduled_arrival_time,realtime_departure_time,scheduled_departure_time,schedule_relationship,temperature,precipitation,windspeed,weathercode,delay
count,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0,311035.0
mean,285207000.0,54842.414828,25.516926,1745417000.0,1745417000000.0,1691450000.0,1745417000000.0,0.0,5.228688,0.0,16.137928,0.851882,46.669349
std,636383.1,3167.413859,17.4138,18038.56,18034050.0,302130900.0,18034070.0,0.0,2.005994,0.0,4.928912,1.161018,252.967438
min,284726600.0,50101.0,1.0,1745384000.0,1745384000000.0,0.0,1745384000000.0,0.0,0.5,0.0,6.3,0.0,-8166.0
25%,284776800.0,52164.0,12.0,1745407000.0,1745407000000.0,1745406000.0,1745407000000.0,0.0,4.3,0.0,16.2,0.0,0.0
50%,285008600.0,54609.0,23.0,1745412000.0,1745412000000.0,1745412000.0,1745412000000.0,0.0,5.2,0.0,18.2,0.0,0.0
75%,285282400.0,56963.0,35.0,1745420000.0,1745420000000.0,1745420000.0,1745420000000.0,0.0,7.1,0.0,19.0,1.0,0.0
max,286574700.0,62442.0,117.0,1745453000.0,1745453000000.0,1745453000.0,1745453000000.0,0.0,9.1,0.0,20.6,3.0,9689.0


In [27]:
df.nunique()

trip_id                      8300
route_id                      208
stop_id                      8361
stop_sequence                 117
realtime_arrival_time       42042
scheduled_arrival_time      41418
realtime_departure_time     41944
scheduled_departure_time    41418
schedule_relationship           1
temperature                    13
precipitation                   1
windspeed                      12
weathercode                     3
delay                        1428
dtype: int64

In [28]:
# Export data to CSV
df.to_csv('data/stm_weather_merged.csv', index=False)

## End