# iSEPTA Philly Project
## "The Tappiest Time of the Year" Contest

### by John A. Fonte

__Goal:__ Guess the total number of SEPTA key taps done on Friday, December 20, 2019 _without going over._

- __Definition:__ A SEPTA "key tap" is a payment for public transportation fare by way of "tapping" the recently introduced magnetic cards onto digital turnstiles. <br><br>
  - For purposes of this project, a "key tap" shall refer to a single use of a SEPTA key card for payment of fare.

__Link to Contest:__ https://www.iseptaphilly.com/contests/172

   - (_Update:_ The contest closed on December 20, 2019.)

***

***
# OVERVIEW OF PROJECT

Our analysis requires the following steps:

1. Figure out what data to obtain.
2. Figure out where to obtain that data.
3. Within the desired data, figure out what metric to use to determine SEPTA key taps.
4. Perform data analysis and data modeling, assuming that enough relevant data is available.
5. Draw conclusions.

## 1. What Data includes SEPTA Key Card Taps?

SEPTA key taps are done for fare on all four modes of Philadelphia public transportation: 

   1. Subway (Market-Frankford Line (MFL) and Broad Street Line (BSL)) 
   2. Buses
   3. Trolleys 
   4. Regional Rail Lines
   
SEPTA key cards were introduced to the public on February 9, 2017, according to the timeline set forth on the [Wikipedia page](https://en.wikipedia.org/wiki/SEPTA_Key).
   
### Data Limitation #1: No Historic Key Card Information for Regional Rail Lines

The use of SEPTA key cards is a recent change to Philadelphia's public transportation system, and indeed, SEPTA is still in the process of transitioning. Notably, the transition from standard payment to key taps for regional rail lines is [still in flux](https://www.inquirer.com/transportation/septa-key-regional-rail-rollout-travel-wallet-20191118.html). It is unclear when the transition will be completed; according to the Wikipedia timeline, as of now, the regional rail transition is still in its early adoption program phase.

Because the transition is so new, there is not enough information to determine what portion of regional rail riders currently use the SEPTA key card (and, quite frankly, it is unclear whether regional rail taps are counted in the total on iSEPTA Philly's end).

For this reason, we will not include regional rail ridership in the analysis.

### Data Limitation #2: Fares Paid without SEPTA Key Card

Although the SEPTA key card has made fare tokens obsolete, individuals may still pay for fare via single "quick trip" purchases at digital kiosks or with cash via a SEPTA representative in person. As we will see, the data does not show a distinction between key tap ridership and non-key tap ridership.
***

***
## 2. Where to Find SEPTA Data

Luckily, Philadelphia is fairly progressive in offering open data. Here are some open data links used to obtain the data:

1. __[SEPTA Open Data](http://septaopendata-septa.opendata.arcgis.com/):__ Provides .csv compilations of SEPTA statistics for each mode of transportation described above, along with aggregations of data.


2. __Federal Transit Administration's [National Transit Database](https://www.transit.dot.gov/ntd/what-national-transit-database-ntd-program):__ Has reports on all major public transportation systems, in both aggregated and raw data form.


3. __Various Aggregated Reports:__ These reports do not provide analyzable tabular data, but they are still good reference points/sanity checks for a ballpark estimate of where our analyzed values should be.

    - _SEPTA Annual Reports_: Annual Reports provided by SEPTA itself, such as [this one for November 2019](http://septa.org/strategic-plan/pdf/2019-11-revenue-ride.pdf)!<br><br>
    
    - _Center City District & Central Philadelphia Development Corporation_: Has Annual Reports such as this [2018 Annual Report](https://centercityphila.org/uploads/attachments/cjusnd2j40fjnukqdy7j9p2ng-socc-2019-transportation.pdf).
    

Here are some open data links that seemed relevant but are __NOT__ used in this project:

1. __[Scraped Septa-stats.com Data](https://www.dropbox.com/sh/3jnvonaqtmvc3wh/AACvwz3DMTXrW56P8xBUUIcSa?dl=0):__ This is voluminous data on each and every SEPTA service route and timestamps for each stop on each and every service route. This data does not include ridership or revenue stats - only the routes themselves.
    - Plus...I am pretty sure the .json format is broken because a parsing error is thrown on the first line! I think there is no delimiter (i.e., comma) between json lines, and that is simply not worth fixing, especially for irrelevant data.
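
If the scraped file does turn out to hold one JSON object per line with no commas in between (an assumption; the file was not inspected for this sketch), it can be read line by line instead of as a single JSON document. The function name `parse_json_lines` and the sample field names below are hypothetical, made up purely for illustration.

```python
import json


def parse_json_lines(text):
    """Parse text that holds one JSON object per line (hypothetical helper)."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            # Report which line failed rather than stopping with a bare error.
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records


# Made-up sample records; the real file's fields are not known here.
sample = '{"route": "A", "stop": 1}\n{"route": "B", "stop": 2}\n'
records = parse_json_lines(sample)
print(len(records))
```

Since the data is not used in this project, this is only a note on how the parsing error might be worked around, not a step in the analysis.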
***

In [None]:
# IMPORT OF RAW DATA

import pandas as pd

In [23]:
# Data obtained from SEPTA Open Data and the FTA National Transit Database.
# The data files were downloaded to the local machine and are read from local paths.

## Subway Aggregated Datasets - Spring 2018 (with revenue of Fiscal Year 2017)

df_BSLagg = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA__Broad_Street_Line.csv')
df_MFLagg = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Market_Frankford_Line.csv')
df_Norrisagg = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA__Norristown_Highspeed_Line - agg.csv')

In [None]:
## Bus & Trolley Aggregated Datasets - Spring 2018

df_trolleyagg = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Trolley - Spring 2018 revenue.csv')
df_busagg = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Bus - Spring 2018 revenue.csv')

In [25]:
## Subway Non-Aggregated Datasets - Spring 2018

df_BSL = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_BSL - Spring 2018.csv')
df_MFL = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_MFL - Spring 2018.csv')

### (Norristown - Spring 2017)

df_Norris = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA__Norristown_Highspeed_line_Stations.csv')

In [26]:
## Bus & Trolley Non-Aggregated Datasets - Spring 2018 and Spring 2019

df_bus2018 = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Bus - Spring 2018.csv')
df_bus2019 = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Bus - Spring 2019.csv')
df_trolley = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Trolley - Spring 2018.csv')



***
## 3. What Target Metric Should be Used for Calculating SEPTA Key Taps?

If the data contained a "SEPTA Key Card Taps" series, the analysis would be fairly straightforward. Upon review of the datasets loaded above, however, there is no such series. Therefore, we will have to construct our own translation between the target variable (number of key taps) and the available features.

While feature exploration/selection is always a critical part of data analysis, a good starting point is to look more into _ridership_ and _revenue_. 
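
As a rough sketch of what such a translation could look like (not necessarily the approach this project ends up using), a ridership figure can be scaled by an assumed share of riders who pay with a key card. The function name `estimate_key_taps` and the `key_share` value of 0.8 are hypothetical placeholders, not measured figures; the ridership input is the `Average_Weekday_Passengers` value for the Market-Frankford Line from the aggregated dataset. The result is rounded down because the contest penalizes going over.

```python
import math


def estimate_key_taps(ridership, key_share):
    """Scale ridership by an assumed key-card share, rounding down.

    key_share is a placeholder assumption: the fraction of rides paid
    with a SEPTA key card rather than cash or a quick trip ticket.
    """
    if not 0.0 <= key_share <= 1.0:
        raise ValueError("key_share must be between 0 and 1")
    return math.floor(ridership * key_share)


# Average weekday passengers on the Market-Frankford Line (aggregated dataset).
mfl_weekday_passengers = 180512

# 0.8 is an illustrative guess only; the data does not provide this share.
estimate = estimate_key_taps(mfl_weekday_passengers, key_share=0.8)
print(estimate)
```

The same scaling would have to be repeated for every mode included in the analysis and the results summed, and the assumed share remains the weakest part of the estimate until a real figure is found.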

***

***
## 4. Data Analysis

Now that we 

In [18]:
df = pd.read_csv('D:/Github/Data-Science-Bootcamp/SEPTA-Competition-Project/SEPTA_Market_Frankford_Line.csv')

In [19]:
df.head()

| | OBJECTID | Route | Vehicle_Hours | Vehicle_Miles | Peak_Vehicles | Average_Weekday_Passengers | Annual_Passengers | Average_Trip_Length | Passenger_Miles | Passenger_Revenue | Variable_Expenses | Variable_Recovery | Fully_Allocated_Expenses | Operating_Ratio | GlobalID | Shape__Length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Market-Frankford Line | 470930 | 9230221 | 144 | 180512 | 54767414 | 4.7 | 257406800 | 59810392 | 137752084 | 158 | 136694484 | 44 | 9b7ec014-18d2-42af-a482-a86be8a370e6 | 0.237995 |
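
A few quick ratios from this single row help show how the ridership and revenue columns relate to one another. This is a minimal sketch using values copied by hand from the output above; the variable names are made up for illustration, and treating `Passenger_Revenue` as a dollar amount is an assumption.

```python
# Values copied from the Market-Frankford Line row shown above.
row = {
    "Average_Weekday_Passengers": 180512,
    "Annual_Passengers": 54767414,
    "Average_Trip_Length": 4.7,
    "Passenger_Miles": 257406800,
    "Passenger_Revenue": 59810392,
}

# How many "average weekdays" of ridership make up the annual total.
weekday_equivalents = row["Annual_Passengers"] / row["Average_Weekday_Passengers"]

# Average revenue collected per passenger (assumed to be in dollars).
revenue_per_passenger = row["Passenger_Revenue"] / row["Annual_Passengers"]

# Sanity check: passenger miles per passenger should match Average_Trip_Length.
implied_trip_length = row["Passenger_Miles"] / row["Annual_Passengers"]

print(f"{weekday_equivalents:.1f}")    # 303.4
print(f"{revenue_per_passenger:.2f}")  # 1.09
print(f"{implied_trip_length:.2f}")    # 4.70
```

With the DataFrame loaded, `row = df.iloc[0]` would supply the same fields by column name. The implied trip length matching `Average_Trip_Length` suggests the columns are internally consistent, which is reassuring before leaning on `Average_Weekday_Passengers` for a single-day estimate.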
