#  1 |  Data Acquisition: API Requests
---

* _[01 API Requests](01_API_pulls.ipynb)_
* [02 Initial EDA](02_EDA.ipynb)
* [03 First Model: PROPHET](03_prophet.ipynb)
---


### Requests for `4` datasets via respective APIs. 

* `PROPHET` requires a `df` with `2` columns: `ds` (the date) and `y` (the value to forecast). 
 In this notebook, I document the API pulls and data manipulation used to create new `csv` files in a format that can be sliced as needed for modeling.
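
As a minimal sketch of the target layout, the conversion to `ds` / `y` might look like the following. The helper name `to_prophet_frame` is hypothetical, and the small input frame is illustrative only (its values are the monthly fuel prices pulled later in this notebook).

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col):
    """Return a two-column frame named the way Prophet expects (ds, y)."""
    out = df[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])   # ds should be datetime-like
    return out.sort_values("ds").reset_index(drop=True)

# Illustrative input only
raw = pd.DataFrame({
    "date": ["2022-06-01", "2022-05-01", "2022-04-01"],
    "fuel_m": [6.294, 5.871, 5.692],
})
prophet_df = to_prophet_frame(raw, "date", "fuel_m")
print(prophet_df)
```

Keeping the rename in one helper means each processed `csv` can be reshaped the same way at modeling time.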

#### <b>Data Sources</b>

* [BART](https://data.bart.gov/dataset/customer-ridership/resource/6e653520-58cf-45c5-b40c-d37c8957ec77) publishes monthly ridership reports based on faregate data: [monthly totals](bart.csv) - [source](https://data.bart.gov/group/ridership)

* <b>Gas Prices</b> [Fuel data from US Energy Information Administration](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
    * weekly: [fuel_w.csv](../data/processed/fuel_w.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.W)
    * monthly: [fuel_m.csv](../data/processed/fuel_m.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
<br>

* <b>Car registrations</b> [total for state](cars.csv) - ([CA DMV Vehicle Annual Count](https://www.energy.ca.gov/data-reports/energy-almanac/zero-emission-vehicle-and-infrastructure-statistics/vehicle-population)) ([API for counts by zip code](https://data.ca.gov/api/3/action/datastore_search?resource_id=888bbb6c-09b4-469c-82e6-1b2a47439736))

* <b>Consumer Debt</b> [annual](debt.csv) ([Annual Consumer Debt](https://www.federalreserve.gov/datadownload/Choose.aspx?rel=G19))

> Will not be using: ~~[caltrain](caltrain.com)  Publishes yearly ridership counts, based on model validated with a yearly county daily in January.~~

In [3]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import requests

In [4]:
# CUSTOM IMPORTS AND SETTINGS 
pd.options.display.max_columns = 90                    
pd.options.display.max_rows = 100

## Function to format with date-time index

In [1]:
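def date_index(df):
    """Copy `date` into `ds`, parse `date` as datetime, and use it as the index."""
    # df.dropna(inplace=True)
    df['ds'] = df['date']                      # keep the original date string
    df['date'] = pd.to_datetime(df['date'])    # note: modifies the caller's frame in place
    df = df.set_index('date')
    print(df.head(3))
    return df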
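
Because `date_index` copies the raw string into `ds`, a non-padded input such as `1997-1-01` stays non-padded in the output. A possible variant is sketched below; the name `date_index_normalized` is hypothetical and this is not the function used in the rest of the notebook. It works on a copy and writes `ds` as a zero-padded `YYYY-MM-DD` string.

```python
import pandas as pd

def date_index_normalized(df, date_col="date"):
    """Variant of date_index: works on a copy and writes ds as YYYY-MM-DD."""
    out = df.copy()                                     # leave the caller's frame untouched
    out[date_col] = pd.to_datetime(out[date_col])       # parse strings such as '1997-1-01'
    out["ds"] = out[date_col].dt.strftime("%Y-%m-%d")   # zero-padded date string
    return out.set_index(date_col)

# Illustrative rows only
demo = pd.DataFrame({"date": ["1997-1-01", "1997-2-01"],
                     "ridership": [1006096, 1025044]})
fixed = date_index_normalized(demo)
print(fixed["ds"].tolist())
```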

## EIA DATA: Fuel Price History 

In [2]:
# EIA KEY - SET UP YOUR KEY TO USE THE EIA API #
import os
import sys

try:
    KEY = os.environ['EIAAPI']
except KeyError:
    sys.exit('keys not found')
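
One alternative to `sys.exit`, which can be awkward inside a notebook session, is to raise an exception that says which variable is missing. The sketch below assumes the same `EIAAPI` variable name; the helper `get_api_key` and its `env` parameter are hypothetical and exist only so the lookup can be tried without touching the real environment.

```python
import os

def get_api_key(name="EIAAPI", env=None):
    """Look up an API key by environment-variable name, failing with a clear message."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name!r} is not set; export it before running.")
    return key

# Illustrative call with a stand-in mapping instead of the real environment
print(get_api_key(env={"EIAAPI": "not-a-real-key"}))
```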

#### Function for EIA requests takes KEY and CATEGORY

In [5]:
# function for EIA data requests
def eia_req(KEY, CATEGORY):
    url = 'https://api.eia.gov/series/?api_key=' + KEY + '&series_id=' + CATEGORY

    # send the request and report the HTTP status
    req = requests.get(url)
    print('Request Code:' + str(req.status_code))

    # the observations sit under ['series'][0]['data'] in the JSON response
    data = pd.DataFrame(req.json()['series'][0]['data'])

    return data
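
The request above can be split into two small steps that are easy to check offline: building the URL with encoded query parameters, and pulling the rows out of the decoded JSON. This is a sketch only. The helper names `build_eia_url` and `extract_rows` are hypothetical, and the stand-in payload merely mirrors the `['series'][0]['data']` path that `eia_req` indexes into.

```python
from urllib.parse import urlencode

EIA_SERIES_ENDPOINT = "https://api.eia.gov/series/"   # same endpoint as eia_req

def build_eia_url(key, series_id):
    """Build the request URL with properly encoded query parameters."""
    return EIA_SERIES_ENDPOINT + "?" + urlencode({"api_key": key, "series_id": series_id})

def extract_rows(payload):
    """Return the [date, value] rows from a decoded payload, or raise if they are missing."""
    series = payload.get("series") or []
    if not series or "data" not in series[0]:
        raise ValueError(f"Unexpected payload keys: {sorted(payload)}")
    return series[0]["data"]

# Stand-in payload (assumed shape; values taken from this notebook's output)
fake_payload = {"series": [{"data": [["202206", 6.294], ["202205", 5.871]]}]}
print(build_eia_url("not-a-real-key", "PET.EMM_EPM0_PTE_SCA_DPG.M"))
print(extract_rows(fake_payload))
```

Raising on an unexpected payload gives a clearer message than the `KeyError` that would otherwise surface when the response carries an error body instead of a series.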

#### Fuel prices by month

In [6]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.M'

# call function for pull 
fuel_m = eia_req(KEY, CATEGORY)
fuel_m.columns = ['date', 'fuel_m']
fuel_m.head()

Request Code:200


|  | date | fuel_m |
|---|---|---|
| 0 | 202206 | 6.294 |
| 1 | 202205 | 5.871 |
| 2 | 202204 | 5.692 |
| 3 | 202203 | 5.655 |
| 4 | 202202 | 4.66 |


In [7]:
# add new columns with split data and reformatted date
fuel_m['year'] = fuel_m['date'].str[:4]
fuel_m['month'] = fuel_m['date'].str[-2:]
fuel_m['day'] = '01'    # using day 01 for all monthly data
fuel_m['date'] = fuel_m['year'] + '-' + fuel_m['month'] + '-01' 

fuel_m = date_index(fuel_m)
fuel_m.sort_index(inplace=True)

fuel_m_out = fuel_m[['ds','fuel_m']]
fuel_m_out.tail()

            fuel_m  year month day          ds
date                                          
2022-06-01   6.294  2022    06  01  2022-06-01
2022-05-01   5.871  2022    05  01  2022-05-01
2022-04-01   5.692  2022    04  01  2022-04-01


| date | ds | fuel_m |
|---|---|---|
| 2022-02-01 | 2022-02-01 | 4.66 |
| 2022-03-01 | 2022-03-01 | 5.655 |
| 2022-04-01 | 2022-04-01 | 5.692 |
| 2022-05-01 | 2022-05-01 | 5.871 |
| 2022-06-01 | 2022-06-01 | 6.294 |


In [8]:
# print to file
fuel_m_out.to_csv('../data/processed/fuel_m.csv', index = False)

#### Fuel Prices by Week

In [13]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.W'

# call function for pull 
fuel_w = eia_req(KEY, CATEGORY)
fuel_w.columns = ['date', 'fuel_w']
fuel_w.head(12)

Request Code:200


|  | date | fuel_w |
|---|---|---|
| 0 | 20220718 | 5.794 |
| 1 | 20220711 | 5.994 |
| 2 | 20220704 | 6.138 |
| 3 | 20220627 | 6.23 |
| 4 | 20220620 | 6.307 |
| 5 | 20220613 | 6.364 |
| 6 | 20220606 | 6.276 |
| 7 | 20220530 | 6.087 |
| 8 | 20220523 | 5.998 |
| 9 | 20220516 | 5.891 |


In [14]:
# add new columns from the old date column
fuel_w['year'] = fuel_w['date'].str[:4]

fuel_w['month'] = fuel_w['date'].str[4:6]
fuel_w['day'] = fuel_w['date'].str[-2:]
fuel_w['date'] = fuel_w['date'].str[:4] + '-' + fuel_w['month'] + '-' + fuel_w['day']

fuel_w = date_index(fuel_w)
fuel_w.sort_index(inplace=True)

fuel_w_out = fuel_w[['ds', 'fuel_w']]
fuel_w_out.head()

            fuel_w  year month day          ds
date                                          
2022-07-18   5.794  2022    07  18  2022-07-18
2022-07-11   5.994  2022    07  11  2022-07-11
2022-07-04   6.138  2022    07  04  2022-07-04


| date | ds | fuel_w |
|---|---|---|
| 2000-05-22 | 2000-05-22 | 1.679 |
| 2000-05-29 | 2000-05-29 | 1.673 |
| 2000-06-05 | 2000-06-05 | 1.661 |
| 2000-06-12 | 2000-06-12 | 1.662 |
| 2000-06-19 | 2000-06-19 | 1.664 |


In [15]:
# print to file
fuel_w_out.to_csv('../data/processed/fuel_w.csv', index = False)
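
Both date layouts used above (`YYYYMM` for the monthly series, `YYYYMMDD` for the weekly series) can also be parsed in one step by passing an explicit `format` to `pd.to_datetime`, instead of slicing the string into year, month, and day columns. A minimal sketch, using a few of the date strings shown in the outputs above:

```python
import pandas as pd

monthly = pd.Series(["202206", "202205", "202204"])        # YYYYMM, as in fuel_m
weekly = pd.Series(["20220718", "20220711", "20220704"])   # YYYYMMDD, as in fuel_w

monthly_dates = pd.to_datetime(monthly, format="%Y%m")     # day defaults to the 1st
weekly_dates = pd.to_datetime(weekly, format="%Y%m%d")

print(monthly_dates.dt.strftime("%Y-%m-%d").tolist())
print(weekly_dates.dt.strftime("%Y-%m-%d").tolist())
```

An explicit format also fails loudly if the API ever changes its date layout, rather than silently producing shifted dates.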

## BART Data

#### Weekly Ridership by Month `1997 - 2018`

In [17]:
url = 'https://data.bart.gov/api/3/action/datastore_search?resource_id=6e653520-58cf-45c5-b40c-d37c8957ec77&'
#url = 'https://data.bart.gov/api/3/action/datastore_search?resource_id=6e653520-58cf-45c5-b40c-d37c8957ec77'

bart_req = requests.get(url)
bart_req.status_code

200

In [18]:
# getting data 
bart_data = pd.DataFrame(bart_req.json()['result']['records'])
bart_data.tail()

|  | _id | FiscalYear | FiscalMonth | RIDERSHIP WEEKAVG | RIDERSHIP GOAL |
|---|---|---|---|---|---|
| 95 | 96 | 2004 | 12 | 308792.5455 | 332874.1668 |
| 96 | 97 | 2005 | 1 | 308189.9524 | 308517.7023 |
| 97 | 98 | 2005 | 2 | 304724.3182 | 313073.2802 |
| 98 | 99 | 2005 | 3 | 323235.5714 | 318924.7632 |
| 99 | 100 | 2005 | 4 | 315753.45 | 319959.4083 |
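
The request above returns only 100 records (through April 2005), which is consistent with the default page size of 100 on CKAN-style `datastore_search` endpoints. A minimal paging sketch with the `limit` and `offset` parameters is shown below. The helper names `fetch_json` and `fetch_all_records` are hypothetical, the presence of a `total` field in the result is an assumption, and the demo runs against a fake fetcher rather than the BART server.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import urlopen

def fetch_json(url):
    """Default fetcher: GET a URL and decode the JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)

def fetch_all_records(base_url, resource_id, page_size=100, fetch=fetch_json):
    """Collect every record of a datastore resource by paging with limit/offset."""
    records, offset = [], 0
    while True:
        query = urlencode({"resource_id": resource_id, "limit": page_size, "offset": offset})
        result = fetch(f"{base_url}?{query}")["result"]
        page = result["records"]
        records.extend(page)
        offset += len(page)
        total = result.get("total")   # assumed to be reported by the endpoint
        # Stop on a short (or empty) page, or once the reported total is reached
        if len(page) < page_size or (total is not None and offset >= total):
            return records

# Offline demo: a fake fetcher that serves 250 numbered rows
def fake_fetch(url):
    q = parse_qs(urlparse(url).query)
    lo, n = int(q["offset"][0]), int(q["limit"][0])
    rows = [{"_id": i + 1} for i in range(lo, min(lo + n, 250))]
    return {"result": {"records": rows, "total": 250}}

rows = fetch_all_records("https://example.invalid/datastore_search", "demo", fetch=fake_fetch)
print(len(rows))
```

The collected list can be passed to `pd.DataFrame` in the same way as the single-page `records` list.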


In [19]:
path = '../data/raw/bart/'
file = 'customer-ridership.csv'

filename = path + file
bart_date = pd.read_csv(filename)

bart_date.tail()

|  | FiscalYear | FiscalMonth | RIDERSHIP WEEKAVG | RIDERSHIP GOAL |
|---|---|---|---|---|
| 254 | 2018 | 3 | 422201.0 | 437728.0 |
| 255 | 2018 | 4 | 426492.0 | 439970.0 |
| 256 | 2018 | 5 | 423264.0 | 430308.0 |
| 257 | 2018 | 6 | 391219.0 | 405048.0 |
| 258 | 2018 | 7 | 395222.0 | 423540.0 |


In [20]:
bart_data.drop(columns = ['RIDERSHIP GOAL'], inplace = True)

new_col = {
    'RIDERSHIP WEEKAVG' : 'ridership',
    'FiscalMonth':'month',
    'FiscalYear':'year', 
}

bart_data.rename(columns = new_col, inplace = True)
bart = bart_data
bart.head()

|  | _id | year | month | ridership |
|---|---|---|---|---|
| 0 | 1 | 1997 | 1 | 251524.0 |
| 1 | 2 | 1997 | 2 | 256261.0 |
| 2 | 3 | 1997 | 3 | 263602.0 |
| 3 | 4 | 1997 | 4 | 264442.0 |
| 4 | 5 | 1997 | 5 | 265244.0 |


In [21]:
# add new columns from the old date column
bart['day'] = '01'
# bart['month'] = bart['month'].apply(lambda x: '0' + str(x) if x < 10 else x )
bart['date'] = bart['year'].astype(str) + '-' + bart['month'].astype(str) + '-01'

bart['ridership'] = 4*bart['ridership'].astype(int) # ridership is a weekly average; assume 4-week months

bart = date_index(bart)
bart_out = bart[['ds', 'ridership']]

bart_out.tail()

            _id  year  month  ridership day         ds
date                                                  
1997-01-01    1  1997      1    1006096  01  1997-1-01
1997-02-01    2  1997      2    1025044  01  1997-2-01
1997-03-01    3  1997      3    1054408  01  1997-3-01


| date | ds | ridership |
|---|---|---|
| 2004-12-01 | 2004-12-01 | 1235168 |
| 2005-01-01 | 2005-1-01 | 1232756 |
| 2005-02-01 | 2005-2-01 | 1218896 |
| 2005-03-01 | 2005-3-01 | 1292940 |
| 2005-04-01 | 2005-4-01 | 1263012 |
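
The `4 *` multiplier treats every month as exactly four weeks. If `RIDERSHIP WEEKAVG` really is a per-week average (the assumption this notebook makes), one alternative is to scale by the number of weeks in each calendar month, that is, days in the month divided by 7. The sketch below is illustrative only; the column name `monthly_est` is hypothetical and the two input rows are the 1997 values shown earlier.

```python
import pandas as pd

weekly_avg = pd.DataFrame({
    "date": pd.to_datetime(["1997-01-01", "1997-02-01"]),
    "ridership": [251524.0, 256261.0],   # weekly averages from the 1997 rows shown earlier
})

# Scale each weekly average by the number of weeks in that calendar month
weeks_in_month = weekly_avg["date"].dt.days_in_month / 7
weekly_avg["monthly_est"] = (weekly_avg["ridership"] * weeks_in_month).round().astype(int)
print(weekly_avg)
```

For a 28-day February the two approaches agree; for 31-day months the four-week assumption comes out roughly 10% lower.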



In [23]:
bart_out.to_csv('../data/processed/bart_2005.csv', index = False)

## Vehicle Registration Counts

In [6]:
url = 'https://data.ca.gov/api/3/action/datastore_search?resource_id=888bbb6c-09b4-469c-82e6-1b2a47439736' 

veh_req = requests.get(url)
veh_req.status_code

200

In [7]:
# getting data 
veh = pd.DataFrame((veh_req.json())['result']['records'])
veh.head()

|  | Duty | Make | Vehicles | Zip Code | Fuel | Date | _id | Model Year |
|---|---|---|---|---|---|---|---|---|
| 0 | Light | ACURA | 12 | 90001 | Gasoline | 1/1/2021 | 1 | 2008 |
| 1 | Light | ACURA | 24 | 90003 | Gasoline | 1/1/2021 | 2 | 2008 |
| 2 | Light | ACURA | 20 | 90004 | Gasoline | 1/1/2021 | 3 | 2008 |
| 3 | Light | ACURA | 12 | 90005 | Gasoline | 1/1/2021 | 4 | 2008 |
| 4 | Light | ACURA | 15 | 90006 | Gasoline | 1/1/2021 | 5 | 2008 |


In [8]:
veh.drop(columns = ['_id', 'Duty', 'Make', 'Fuel', 'Model Year'], inplace = True)

new_col = {
    'Zip Code':'zip',
    'Date':'ds', 
    'Vehicles' : 'vehs',
}

veh.rename(columns = new_col, inplace=True)

In [9]:
# add new columns from the old date column
veh_date = veh['ds'].str.rpartition('/')

veh['day'] = '01'
veh['month'] = '01'
veh['year'] = veh_date[2]
veh['date'] = veh['year'].astype(str) + '-01-01'

veh = date_index(veh)

veh_out = veh[['ds','zip', 'vehs']]
veh_out.tail()

           vehs    zip          ds day month  year
date                                              
2021-01-01   12  90001  2021-01-01  01    01  2021
2021-01-01   24  90003  2021-01-01  01    01  2021
2021-01-01   20  90004  2021-01-01  01    01  2021


| date | ds | zip | vehs |
|---|---|---|---|
| 2021-01-01 | 2021-01-01 | 90806 | 17 |
| 2021-01-01 | 2021-01-01 | 90808 | 17 |
| 2021-01-01 | 2021-01-01 | 90810 | 13 |
| 2021-01-01 | 2021-01-01 | 90813 | 22 |
| 2021-01-01 | 2021-01-01 | 90815 | 19 |


In [10]:
veh_out.to_csv('../data/processed/vehs_zip.csv', index = False)
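
The saved file keeps one row per zip code (and per make and model-year combination in the source). To get a single count per date, for example a statewide total, the rows can be summed by `ds`. A minimal sketch follows, assuming the counts may arrive as strings from the JSON response; the three rows are the first values shown in the output above.

```python
import pandas as pd

veh_zip = pd.DataFrame({
    "ds": ["2021-01-01"] * 3,
    "zip": ["90001", "90003", "90004"],
    "vehs": ["12", "24", "20"],     # assumed: counts may be strings after the JSON pull
})

veh_zip["vehs"] = pd.to_numeric(veh_zip["vehs"], errors="coerce")  # non-numeric values become NaN
totals = veh_zip.groupby("ds", as_index=False)["vehs"].sum()
print(totals)
```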

## Consumer Debt

In [12]:
path = '../data/raw/debt/'

filename = path + 'consumer_debt.csv'
debt_df = pd.read_csv(filename)

new_col = {
    'DATE':'date',
    'TOTALSL' : 'debt',
}

debt_df.rename(columns = new_col, inplace=True)
debt_df.head()

|  | date | debt |
|---|---|---|
| 0 | 1943-01-01 | 6.57783 |
| 1 | 1943-02-01 | 6.46304 |
| 2 | 1943-03-01 | 6.23421 |
| 3 | 1943-04-01 | 6.12575 |
| 4 | 1943-05-01 | 5.93626 |


In [13]:
debt_df.to_csv('../data/processed/debt.csv', index = False)