# 1.2 | Data Acquisition: Manual BART Addendum
* [01 API Data Requests](01_API_pulls.ipynb)
* _[01.1 Additional BART Data](01_v2_bart.ipynb.ipynb)_
* [02 Initial EDA](02_EDA.ipynb)
* [03 First Model: PROPHET](03_prophet.ipynb)
---

### [BART](https://www.bart.gov) reporting changed in mid-2018
* This notebook extracts monthly ridership counts for `August 2018` through `April 2022` from monthly reports archived in yearly directories, combines them into a single table, and appends that table to the data obtained in the previous API requests.


In [2]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import glob
import os

In [8]:
path = '../data/raw/bart/'
file = 'customer-ridership.csv'

filename = path + file
df = pd.read_csv(filename)

df.shape

FileNotFoundError: [Errno 2] No such file or directory: '../data/raw/bart/customer-ridership.csv'

In [7]:
df.tail()

Unnamed: 0.1,Unnamed: 0,RM,EN,EP,NB,BK,AS,MA,19,12,...,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64
46,ML,3.545455,9.5,4.772727,5.409091,44.045455,10.636364,14.772727,20.909091,21.590909,...,,,,,,,,,,
47,BE,6.590909,11.727273,6.045455,6.5,60.090909,11.045455,20.409091,18.772727,32.090909,...,,,,,,,,,,
48,PC,6.181818,3.363636,1.772727,1.363636,9.363636,1.5,23.0,7.818182,19.954545,...,,,,,,,,,,
49,AN,7.5,9.045455,2.318182,3.545455,22.590909,5.954545,52.727273,39.636364,70.5,...,,,,,,,,,,
50,Entries,2038.5,3062.590909,1658.727273,1430.318182,4216.363636,1663.681818,3123.590909,3401.909091,3953.590909,...,,,,,,,,,,


In [None]:
df.rename(columns = {'Unnamed: 0': 'exit'}, inplace=True)          # rename the first unnamed column to 'exit'
df.drop(columns = ['RIDERSHIP GOAL'], inplace = True)

new_col = {
    'RIDERSHIP WEEKAVG' : 'ridership',
    'FiscalMonth':'month',
    'FiscalYear':'year', 
}

df.rename(columns = new_col, inplace = True)
bart = df
bart.head()

In [None]:
# add new cols from old date column
bart['day'] = '01'
# bart['month'] = bart['month'].apply(lambda x: '0' + str(x) if x < 10 else x )
bart['ds'] = bart['year'].astype(str) + '-' + bart['month'].astype(str) + '-01'

bart['ridership'] = 4*bart['ridership'].astype(int) # ridership is weekly, assume 4-week months
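
The conversion above multiplies the weekly figure by a flat 4. The sketch below spells out that arithmetic and, for comparison, an alternative that scales by the actual number of weeks in the month. `weeks_in_month` and `monthly_from_weekly` are hypothetical helpers, not part of this notebook, and the alternative scaling is an assumption about what a closer approximation might look like.

```python
import calendar

def weeks_in_month(year: int, month: int) -> float:
    """Number of 7-day weeks in the given calendar month (hypothetical helper)."""
    days = calendar.monthrange(year, month)[1]  # second item is the day count
    return days / 7

def monthly_from_weekly(weekly: float, year: int, month: int, four_week: bool = True) -> int:
    """Scale a weekly figure to a monthly one.

    four_week=True mirrors the notebook's flat 4-week assumption;
    four_week=False uses the real length of the month instead.
    """
    factor = 4 if four_week else weeks_in_month(year, month)
    return int(weekly * factor)

flat = monthly_from_weekly(1000, 2018, 1)                       # 4 * 1000
by_days = monthly_from_weekly(1000, 2018, 1, four_week=False)   # 31/7 * 1000, truncated
```

The two approaches agree only for a 28-day February; for other months the flat factor understates the total by roughly 7 to 10 percent.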

In [None]:
bart['date'] = pd.to_datetime(bart['ds'])
bart.index = bart['date']
bart.sort_index(inplace=True)

bart_out = bart[['ds', 'ridership']]
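
If `month` is stored as an integer, the `ds` strings built above come out as `2018-8-01` rather than `2018-08-01`. `pd.to_datetime` parses both, but the written `ds` column would then mix formats with the zero-padded strings produced later in this notebook. A minimal sketch of a zero-padded builder, where `build_ds` is a hypothetical helper name:

```python
def build_ds(year, month) -> str:
    """Return an ISO-style date string for the first day of the month."""
    # int() accepts both integer and numeric-string inputs
    return f"{int(year):04d}-{int(month):02d}-01"

padded = build_ds(2018, 8)
```

Applied to the frame, this could look like `bart.apply(lambda r: build_ds(r['year'], r['month']), axis=1)`, assuming the `year` and `month` columns exist as renamed above.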

In [None]:
bart_out.tail()

### Manually extracting data for BART 2018 forward 

This function: 
* goes through one folder
* goes through each file 
* gets a monthly ridership value
* returns the values for a year

This function is called in a loop that iterates over a list of 5 years to concatenate all the data

In [None]:
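def get_monthly_bart(year):
    """Return a list of (ds, rides) tuples, one per monthly report found for `year`."""
    path = '../data/raw/bart/'
    folder = 'ridership_' + str(year) + '/'

    # sort so the files are processed in a deterministic order
    files = sorted(os.listdir(path + folder))

    df_year = []

    for file in files:
        filename = path + folder + file
        # sheet_name=None loads every sheet into a dict keyed by sheet name
        df_in = pd.read_excel(filename, sheet_name=None, skiprows=1)

        # index 50 of the 'Exits' column is expected to be the totals row
        rides = df_in['Total Trips OD']['Exits'][50]
        rides = int(rides/7)   # divide by 7 days

        # characters 14-15 of the file name hold the two-digit month
        ds = str(year) + '-' + file[14:16] + '-01'

        bart_month = (ds, rides)
        df_year.append(bart_month)

    return df_year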
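
The slice `file[14:16]` in `get_monthly_bart` relies on the month sitting at a fixed character offset, so any stray file in the folder (a hidden system file, a differently named report) yields a malformed date. Below is a minimal sketch of a more defensive parse. It assumes file names look like `Ridership_201808.xlsx`; that pattern is inferred from the slice offsets and is not confirmed by this notebook. `ds_from_filename` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed naming pattern: some prefix, an underscore, YYYYMM, then .xls or .xlsx
FILENAME_RE = re.compile(r"_(\d{4})(\d{2})\.xlsx?$", re.IGNORECASE)

def ds_from_filename(filename: str) -> Optional[str]:
    """Return 'YYYY-MM-01' parsed from the file name, or None if it does not match."""
    match = FILENAME_RE.search(filename)
    if match is None:
        return None
    year, month = match.groups()
    if not 1 <= int(month) <= 12:
        return None  # six digits that do not form a valid year-month
    return f"{year}-{month}-01"

parsed = ds_from_filename("Ridership_201808.xlsx")
skipped = ds_from_filename(".DS_Store")
```

Inside the loop, files for which this returns `None` could simply be skipped with `continue`.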

In [None]:
years = [2018, 2019, 2020, 2021, 2022]
all_years = []

for year in years:
    each_year = get_monthly_bart(year)
    all_years.extend(each_year)

In [None]:
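# openpyxl is the engine pd.read_excel uses for .xlsx files; install it before running the loop above
%pip install openpyxl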

In [None]:
all_years = pd.DataFrame(all_years)
all_years.columns = ['ds', 'ridership']

all_years['date'] = pd.to_datetime(all_years['ds'])
all_years = all_years.set_index('date')
all_years.sort_index(inplace=True)   # sort only after the date index is set
all_years.head()

In [None]:
merged = pd.concat([bart_out['2000-01-01':], all_years])
merged.sort_index(inplace=True)
merged.info()
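
Because the two sources meet around mid-2018, it may be worth checking whether any month appears in both before writing the file. The sketch below uses made-up toy frames standing in for `bart_out` and `all_years`; the values are illustrative only, and keeping the manually extracted row on overlap is an assumed policy, not something this notebook specifies.

```python
import pandas as pd

# Toy stand-ins for `bart_out` and `all_years`; the numbers are invented
api_dates = ["2018-06-01", "2018-07-01", "2018-08-01"]
manual_dates = ["2018-08-01", "2018-09-01"]

api = pd.DataFrame({"ds": api_dates, "ridership": [100, 110, 120]},
                   index=pd.to_datetime(api_dates))
manual = pd.DataFrame({"ds": manual_dates, "ridership": [125, 130]},
                      index=pd.to_datetime(manual_dates))

combined = pd.concat([api, manual])

# Months present in both sources show up as repeated index values
overlap = combined.index[combined.index.duplicated()]

# Assumed policy: on overlap keep the last row, i.e. the manual one, then sort
deduped = combined[~combined.index.duplicated(keep="last")].sort_index()
```

If `overlap` is empty for the real data, the plain `pd.concat` above is already sufficient.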

In [None]:
merged.head()

In [None]:
merged.tail()

In [None]:
merged.to_csv('../data/processed/bart.csv', index = False)