# WeGo Public Transit
[WeGo Public Transit](https://www.wegotransit.com/) is a public transit system serving the Greater Nashville and Davidson County area. WeGo provides local and regional bus routes and the WeGo Star train service connecting Lebanon to downtown Nashville, along with several other transit services.

In this project, you'll analyze bus spacing to look for patterns and to identify correlations with controllable or external factors. Specifically, you'll use a dataset containing information on headway, the amount of time between vehicle arrivals at a stop. The dataset contains a column, HDWY_DEV, which shows the headway deviation. This variable is negative when bunching has occurred (shorter headway than scheduled) and positive when gapping has occurred (longer headway than scheduled). Note that you can calculate headway deviation percentage as HDWY_DEV/SCHEDULED_HDWY.
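As a minimal sketch of the calculation described above, the snippet below derives the headway deviation ratio and labels each row as bunching or gapping. The column names HDWY_DEV_PCT and HDWY_STATUS, the "on schedule" and "unknown" labels, and the toy values are assumptions made for illustration; they are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Toy rows (made-up values) that mimic the two relevant columns.
toy = pd.DataFrame({
    "SCHEDULED_HDWY": [30.0, 30.0, 15.0, np.nan],
    "HDWY_DEV": [5.0, -6.0, 0.0, np.nan],
})

# Headway deviation as a fraction of the scheduled headway.
# Multiply by 100 if a true percentage is preferred.
toy["HDWY_DEV_PCT"] = toy["HDWY_DEV"] / toy["SCHEDULED_HDWY"]

# Negative deviation means bunching, positive means gapping.
# Rows without a headway measurement are labeled "unknown".
toy["HDWY_STATUS"] = np.select(
    [toy["HDWY_DEV"].isna(), toy["HDWY_DEV"] < 0, toy["HDWY_DEV"] > 0],
    ["unknown", "bunching", "gapping"],
    default="on schedule",
)

print(toy)
```

Rows with a missing SCHEDULED_HDWY keep a missing ratio, which matches the many NaN headway values visible in the raw data.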

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns

In [2]:
wego_excel = pd.read_excel('../data/Headway Data, 8-1-2023 to 9-30-2023.xlsx')
wego_excel

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,...,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS
0,120230801,1,99457890,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,2,6.500000
1,120230801,1,99457891,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,9,0.000000
2,120230801,1,99457892,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,0.0,0,19,0.000000
3,120230801,1,99457893,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,...,,,0,0,1,0,,0,35,0.000000
4,120230801,1,99457894,2023-08-01,22,2200,1040,345105,0,FROM DOWNTOWN,...,,,0,0,1,0,0.0,0,36,12.866666
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
350323,120230930,2,100406610,2023-09-30,7,702,2312,353448,0,TO DOWNTOWN,...,35.266666,5.266666,0,1,0,0,0.0,0,539,0.000000
350324,120230930,2,100406611,2023-09-30,7,702,2312,353448,0,TO DOWNTOWN,...,,,0,1,0,0,,0,551,0.000000
350325,120230930,2,100406612,2023-09-30,7,702,2312,353449,0,FROM DOWNTOWN,...,31.866666,1.866666,0,0,1,0,0.0,0,552,0.000000
350326,120230930,2,100406613,2023-09-30,7,702,2312,353449,0,FROM DOWNTOWN,...,47.500000,17.500000,0,1,0,0,0.0,0,563,0.000000


In [3]:
wego_excel.to_csv('../data/wego.csv', index=False)

In [4]:
pd.set_option('display.max_columns', None)

In [5]:
wego = pd.read_csv('../data/wego.csv')
wego

Unnamed: 0,CALENDAR_ID,SERVICE_ABBR,ADHERENCE_ID,DATE,ROUTE_ABBR,BLOCK_ABBR,OPERATOR,TRIP_ID,OVERLOAD_ID,ROUTE_DIRECTION_NAME,TIME_POINT_ABBR,ROUTE_STOP_SEQUENCE,TRIP_EDGE,LATITUDE,LONGITUDE,SCHEDULED_TIME,ACTUAL_ARRIVAL_TIME,ACTUAL_DEPARTURE_TIME,ADHERENCE,SCHEDULED_HDWY,ACTUAL_HDWY,HDWY_DEV,ADJUSTED_EARLY_COUNT,ADJUSTED_LATE_COUNT,ADJUSTED_ONTIME_COUNT,STOP_CANCELLED,PREV_SCHED_STOP_CANCELLED,IS_RELIEF,BLOCK_STOP_ORDER,DWELL_IN_MINS
0,120230801,1,99457890,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,MHSP,14.0,1,36.181248,-86.847705,04:42:00,04:37:38,04:44:08,-2.133333,,,,0,0,1,0,0.0,0,2,6.500000
1,120230801,1,99457891,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,ELIZ,10.0,0,36.193454,-86.839981,04:46:00,04:48:27,04:48:27,-2.450000,,,,0,0,1,0,0.0,0,9,0.000000
2,120230801,1,99457892,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,CV23,5.0,0,36.182177,-86.814445,04:54:00,04:54:56,04:54:56,-0.933333,,,,0,0,1,0,0.0,0,19,0.000000
3,120230801,1,99457893,2023-08-01,22,2200,1040,345104,0,TO DOWNTOWN,MCC5_10,1.0,2,36.167091,-86.781923,05:10:00,05:03:43,05:03:43,6.283333,,,,0,0,1,0,,0,35,0.000000
4,120230801,1,99457894,2023-08-01,22,2200,1040,345105,0,FROM DOWNTOWN,MCC5_10,1.0,1,36.167091,-86.781923,05:15:00,05:03:43,05:16:35,-1.583333,,,,0,0,1,0,0.0,0,36,12.866666
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
350323,120230930,2,100406610,2023-09-30,7,702,2312,353448,0,TO DOWNTOWN,21BK,4.0,0,36.138372,-86.800622,22:23:00,22:31:26,22:31:26,-8.433333,30.0,35.266666,5.266666,0,1,0,0,0.0,0,539,0.000000
350324,120230930,2,100406611,2023-09-30,7,702,2312,353448,0,TO DOWNTOWN,MCC5_9,3.0,2,36.167091,-86.781923,22:38:00,22:49:18,22:49:18,-11.300000,,,,0,1,0,0,,0,551,0.000000
350325,120230930,2,100406612,2023-09-30,7,702,2312,353449,0,FROM DOWNTOWN,MCC5_9,3.0,1,36.167091,-86.781923,22:45:00,22:49:19,22:49:19,-4.316666,30.0,31.866666,1.866666,0,0,1,0,0.0,0,552,0.000000
350326,120230930,2,100406613,2023-09-30,7,702,2312,353449,0,FROM DOWNTOWN,21BK,4.0,0,36.138881,-86.800622,22:59:00,23:21:05,23:21:05,-22.083333,30.0,47.500000,17.500000,0,1,0,0,0.0,0,563,0.000000


In [6]:
wego.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 350328 entries, 0 to 350327
Data columns (total 30 columns):
 #   Column                     Non-Null Count   Dtype  
---  ------                     --------------   -----  
 0   CALENDAR_ID                350328 non-null  int64  
 1   SERVICE_ABBR               350328 non-null  int64  
 2   ADHERENCE_ID               350328 non-null  int64  
 3   DATE                       350328 non-null  object 
 4   ROUTE_ABBR                 350328 non-null  int64  
 5   BLOCK_ABBR                 350328 non-null  int64  
 6   OPERATOR                   350328 non-null  int64  
 7   TRIP_ID                    350328 non-null  int64  
 8   OVERLOAD_ID                350328 non-null  int64  
 9   ROUTE_DIRECTION_NAME       350328 non-null  object 
 10  TIME_POINT_ABBR            350328 non-null  object 
 11  ROUTE_STOP_SEQUENCE        350317 non-null  float64
 12  TRIP_EDGE                  350328 non-null  int64  
 13  LATITUDE                   35

In [7]:
wego.iloc[119, :]

CALENDAR_ID                            120230801
SERVICE_ABBR                                   1
ADHERENCE_ID                            99458009
DATE                                  2023-08-01
ROUTE_ABBR                                    22
BLOCK_ABBR                                  2200
OPERATOR                                    2689
TRIP_ID                                   345139
OVERLOAD_ID                                    0
ROUTE_DIRECTION_NAME                 TO DOWNTOWN
TIME_POINT_ABBR                          MCC5_10
ROUTE_STOP_SEQUENCE                          1.0
TRIP_EDGE                                      2
LATITUDE                               36.167091
LONGITUDE                             -86.781923
SCHEDULED_TIME               1900-01-01 00:08:00
ACTUAL_ARRIVAL_TIME          1900-01-01 00:04:51
ACTUAL_DEPARTURE_TIME        1900-01-01 00:04:51
ADHERENCE                                   3.15
SCHEDULED_HDWY                               NaN
ACTUAL_HDWY         

In [8]:
wego['DATE'] = pd.to_datetime(wego['DATE'], format="%Y-%m-%d")
wego['SCHEDULED_TIME'] = pd.to_datetime(wego['SCHEDULED_TIME'], format="%H:%M:%S")
#wego['ACTUAL_ARRIVAL_TIME'] = pd.to_datetime(wego['ACTUAL_ARRIVAL_TIME'], format="%H:%M:%S")
#wego['ACTUAL_DEPARTURE_TIME'] = pd.to_datetime(wego['ACTUAL_DEPARTURE_TIME'], format="%H:%M:%S")

ValueError: time data "1900-01-01 00:08:00" doesn't match format "%H:%M:%S", at position 119. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.
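The ValueError shows that some SCHEDULED_TIME values (row 119, for example) carry a "1900-01-01 " date prefix while most are plain HH:MM:SS strings. One possible workaround, sketched here with made-up sample values, is to keep only the trailing clock time before parsing. The helper name parse_clock_times is hypothetical. The sketch also assumes the date prefix carries no meaning; if it marks trips that run past midnight, that information would be lost and needs separate handling.

```python
import pandas as pd

def parse_clock_times(times: pd.Series) -> pd.Series:
    """Parse strings such as '04:42:00' or '1900-01-01 00:08:00'.

    Only the trailing HH:MM:SS part is kept, so both formats parse the
    same way. Values without a clock time become NaT.
    """
    clock = times.astype(str).str.extract(r"(\d{2}:\d{2}:\d{2})$", expand=False)
    return pd.to_datetime(clock, format="%H:%M:%S", errors="coerce")

# Made-up sample covering the two formats seen in the data plus a missing value.
sample = pd.Series(["04:42:00", "1900-01-01 00:08:00", None])
parsed = parse_clock_times(sample)
print(parsed)
```

The same helper could be applied to ACTUAL_ARRIVAL_TIME and ACTUAL_DEPARTURE_TIME. The error message also suggests `format='mixed'`, which is an alternative when the installed pandas version supports it.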

Goals of this project:
1. How much impact does being late or too spaced out at the first stop have downstream?

2. What is the impact of the layover at the start of the trip (the difference between the first stop arrival and departure time)? Does more layover lead to more stable headways (lower values for % headway deviation)?

3. How closely does lateness (ADHERENCE) correlate with headway?

4. What is the relationship between distance or time travelled since the start of a given trip and the headway deviation? Does headway become less stable the further along the route the bus has travelled?

5. How much of a factor is the driver in headway and on-time performance? The driver is indicated by the OPERATOR variable.

6. How does direction of travel, route, or location affect the headway and on-time performance?

7. How does time of day or day of week affect headway and on-time performance? Can you detect an impact of school schedule on headway deviation (for certain routes and at certain times of day)?

8. Does weather have any effect on headway or on-time performance? To help answer this question, the file bna_weather.csv contains historical weather data recorded at Nashville International Airport. 
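
As a possible starting point for goals 3 and 7, the sketch below computes the correlation between ADHERENCE and HDWY_DEV and then summarizes the absolute headway deviation by day of week. The toy values and the DAY_OF_WEEK column are assumptions for illustration; on the real data, the same calls would be applied to the wego DataFrame once DATE has been converted to a datetime.

```python
import pandas as pd

# Toy rows (made-up values) with the columns used for goals 3 and 7.
toy = pd.DataFrame({
    "DATE": pd.to_datetime(["2023-08-01", "2023-08-01", "2023-08-05", "2023-08-05"]),
    "ADHERENCE": [-2.0, -8.0, 1.0, -4.0],
    "HDWY_DEV": [1.0, 6.0, -1.5, 3.0],
})

# Goal 3: Pearson correlation between lateness and headway deviation.
# Series.corr ignores rows where either value is missing.
adherence_hdwy_corr = toy["ADHERENCE"].corr(toy["HDWY_DEV"])

# Goal 7: mean absolute headway deviation for each day of the week.
toy["DAY_OF_WEEK"] = toy["DATE"].dt.day_name()
mean_abs_dev_by_day = toy["HDWY_DEV"].abs().groupby(toy["DAY_OF_WEEK"]).mean()

print(adherence_hdwy_corr)
print(mean_abs_dev_by_day)
```

The absolute value is used in the grouping so that bunching and gapping do not cancel each other out in the average. Whether that is the right stability measure for this project is a judgment call.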