#  1 |  Data Acquisition: API Requests
---

* _[01 API Requests](01_API_pulls.ipynb)_
* [02 Initial EDA](02_EDA.ipynb)
* [03 First Model: PROPHET](03_prophet.ipynb)
---


### Requests for `4` datasets via respective APIs. 

* `PROPHET` requires a `df` with `2` columns: `ds` (dates) and `y` (values).
 In this notebook, I document the API pulls and data manipulation used to create new `csv` files in a format that can be sliced as needed in modeling.
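
As a minimal sketch of that target format, the snippet below renames a two-column frame to the `ds`/`y` names. The helper `to_prophet_frame` and the sample values are hypothetical and only for illustration; the `d` and `fuel_m` column names match the files written later in this notebook.

```python
import pandas as pd

def to_prophet_frame(df, date_col, value_col):
    """Return a two-column frame named ds/y (hypothetical helper, not part of Prophet)."""
    out = df[[date_col, value_col]].copy()
    out.columns = ['ds', 'y']
    out['ds'] = pd.to_datetime(out['ds'])   # dates go in ds
    out['y'] = pd.to_numeric(out['y'])      # numeric values go in y
    return out.sort_values('ds').reset_index(drop=True)

# made-up rows shaped like the processed monthly fuel file
sample = pd.DataFrame({'d': ['2020-02-01', '2020-01-01'], 'fuel_m': [3.1, 3.0]})
prophet_df = to_prophet_frame(sample, 'd', 'fuel_m')
print(prophet_df)
```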

#### <b>Data Sources</b>

* [BART](https://data.bart.gov/dataset/customer-ridership/resource/6e653520-58cf-45c5-b40c-d37c8957ec77) publishes monthly ridership reports based on faregate data: [monthly totals](bart.csv) - [source](https://data.bart.gov/group/ridership)

* <b>Gas Prices</b> [Fuel data from US Energy Information Administration](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
    * weekly: [fuel_w.csv](../data/processed/fuel_w.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.W)
    * monthly: [fuel_m.csv](../data/processed/fuel_m.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
<br>

* <b>Car registrations</b> [total for state](cars.csv) - ([CA DMV Vehicle Annual Count](https://www.energy.ca.gov/data-reports/energy-almanac/zero-emission-vehicle-and-infrastructure-statistics/vehicle-population)) ([API for counts by zip code](https://data.ca.gov/api/3/action/datastore_search?resource_id=888bbb6c-09b4-469c-82e6-1b2a47439736))

* <b>Consumer Debt</b> [annual](debt.csv) ([Annual Consumer Debt](https://www.federalreserve.gov/datadownload/Choose.aspx?rel=G19))

> Will not be using: ~~[caltrain](caltrain.com)  Publishes yearly ridership counts, based on model validated with a yearly county daily in January.~~

In [12]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import requests

In [13]:
# CUSTOM IMPORTS AND SETTINGS 
pd.options.display.max_columns = 90                    
pd.options.display.max_rows = 100

## Function to format with date-time index

In [14]:
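def date_index(df):
    """Keep the date string in column `d` and index the frame by datetime."""
    # df.dropna(inplace=True)
    df['d'] = df['date']                     # preserve the date string for export
    df['date'] = pd.to_datetime(df['date'])  # parse the strings to datetimes
    df = df.set_index('date')                # new frame indexed by date
    print(df.head(3))
    return df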
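
A small usage sketch on made-up rows shows what `date_index` returns. The function is repeated here, without the `print`, only so the block runs on its own; the sample values are invented. Note that the function also adds the `d` column to the frame that is passed in.

```python
import pandas as pd

def date_index(df):
    df['d'] = df['date']                     # keep the original string
    df['date'] = pd.to_datetime(df['date'])  # parse to datetime
    return df.set_index('date')

toy = pd.DataFrame({'date': ['2020-03-01', '2020-01-01'], 'value': [2, 1]})
indexed = date_index(toy).sort_index()

print(indexed)
print('d' in toy.columns)   # the input frame was modified as well
```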

## EIA DATA: Fuel Price History 

In [15]:
# EIA KEY - SET UP YOUR KEY TO USE THE EIA API #
import os
import sys

try:
    KEY = os.environ['EIAAPI']
except KeyError:
    sys.exit('keys not found')
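
One possible variant, sketched below, wraps the lookup in a function so that a missing key raises an error naming the variable. The helper `load_key` and the demo value are hypothetical; only the `EIAAPI` variable name comes from the cell above.

```python
import os

def load_key(name='EIAAPI'):
    """Return the API key stored in environment variable `name` (hypothetical helper)."""
    key = os.environ.get(name, '').strip()
    if not key:
        raise RuntimeError(f'environment variable {name} is not set')
    return key

# demo only: a placeholder variable so the sketch runs without a real key
os.environ['EIAAPI_DEMO'] = 'not-a-real-key'
print(load_key('EIAAPI_DEMO'))
```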

#### Function for EIA requests takes KEY and CATEGORY

In [16]:
# function for EIA data requests
def eia_req(KEY, CATEGORY):
    url = 'https://api.eia.gov/series/?api_key=' + KEY + '&series_id=' + CATEGORY

    # REQUEST
    req = requests.get(url)
    print('Request Code:' + str(req.status_code))

    # getting data: the 'data' list of the first series in the response
    data = pd.DataFrame(req.json()['series'][0]['data'])

    return data

#### Fuel prices by month

In [17]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.M'

# call function for pull 
fuel_m = eia_req(KEY, CATEGORY)
fuel_m.columns = ['date', 'fuel_m']
fuel_m.head()

Request Code:200


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
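
The output above shows a `200` status followed by a `JSONDecodeError`, which means the body that came back was not JSON. A defensive parse, sketched below, reports that case with a readable message. It works on the response text so it can run offline. The helper name `parse_series` and the sample payloads are assumptions that mirror the `['series'][0]['data']` layout used in `eia_req`; they are not a statement about what the EIA API currently returns.

```python
import json
import pandas as pd

def parse_series(body_text):
    """Parse a body shaped like {'series': [{'data': [[date, value], ...]}]}."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError as err:
        raise ValueError(f'response is not JSON: {body_text[:60]!r}') from err
    if not isinstance(payload, dict) or not payload.get('series'):
        raise ValueError(f"no 'series' entry in response: {payload}")
    return pd.DataFrame(payload['series'][0]['data'], columns=['date', 'value'])

# invented payload in the assumed layout
good = '{"series": [{"data": [["202001", 3.0], ["202002", 3.1]]}]}'
print(parse_series(good))

try:
    parse_series('<html>not json</html>')
except ValueError as err:
    print('handled:', err)
```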

In [None]:
# add new columns with split data and reformatted date
fuel_m['year'] = fuel_m['date'].str[:4]
fuel_m['month'] = fuel_m['date'].str[-2:]
fuel_m['day'] = '01'    # using day 01 for all monthly data
fuel_m['date'] = fuel_m['year'] + '-' + fuel_m['month'] + '-01' 

fuel_m = date_index(fuel_m)
fuel_m.sort_index(inplace=True)

fuel_m_out = fuel_m[['d','fuel_m']]
fuel_m_out.tail()

In [None]:
# print to file
fuel_m_out.to_csv('../data/processed/fuel_m.csv', index = False)

#### Fuel Prices by Week

In [None]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.W'

# call function for pull 
fuel_w = eia_req(KEY, CATEGORY)
fuel_w.columns = ['date', 'fuel_w']
fuel_w.head(12)

In [None]:
# add new cols from old date column
fuel_w['year'] = fuel_w['date'].str[:4]

fuel_w['month'] = fuel_w['date'].str[4:6]
fuel_w['day'] = fuel_w['date'].str[-2:]
fuel_w['date'] = fuel_w['date'].str[:4] + '-' + fuel_w['month'] + '-' + fuel_w['day']

fuel_w = date_index(fuel_w)
fuel_w.sort_index(inplace=True)

fuel_w_out = fuel_w[['d', 'fuel_w']]
fuel_w_out.head()
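
The string slicing in the monthly and weekly cells can also be expressed with an explicit `format` in `pd.to_datetime`. The sketch below assumes the raw date strings look like `YYYYMM` (monthly) and `YYYYMMDD` (weekly), which is what the slices above imply; the sample values are invented.

```python
import pandas as pd

monthly = pd.Series(['202001', '202002'])       # assumed YYYYMM strings
weekly = pd.Series(['20200106', '20200113'])    # assumed YYYYMMDD strings

monthly_dates = pd.to_datetime(monthly, format='%Y%m')    # day defaults to 01
weekly_dates = pd.to_datetime(weekly, format='%Y%m%d')

print(monthly_dates.dt.strftime('%Y-%m-%d').tolist())  # ['2020-01-01', '2020-02-01']
print(weekly_dates.dt.strftime('%Y-%m-%d').tolist())   # ['2020-01-06', '2020-01-13']
```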

In [None]:
# print to file
fuel_w_out.to_csv('../data/processed/fuel_w.csv', index = False)

## BART Data

#### Weekly Ridership by Month `1997 - 2018`

In [None]:
url = 'https://data.bart.gov/api/3/action/datastore_search?resource_id=6e653520-58cf-45c5-b40c-d37c8957ec77'

bart_req = requests.get(url)
bart_req.status_code

In [None]:
# getting data 
bart_data = pd.DataFrame(bart_req.json()['result']['records'])
bart_data.tail()

In [None]:
path = '../data/raw/bart/'
file = 'customer-ridership.csv'

filename = path + file
bart_date = pd.read_csv(filename)

bart_date.tail()

In [None]:
bart_data.drop(columns = ['RIDERSHIP GOAL'], inplace = True)

new_col = {
    'RIDERSHIP WEEKAVG' : 'ridership',
    'FiscalMonth':'month',
    'FiscalYear':'year', 
}

bart_data.rename(columns = new_col, inplace = True)
bart = bart_data
bart.head()

In [None]:
# add new cols from old date column
bart['day'] = '01'
# bart['month'] = bart['month'].apply(lambda x: '0' + str(x) if x < 10 else x )
bart['date'] = bart['year'].astype(str) + '-' + bart['month'].astype(str) + '-01'

bart['ridership'] = 4*bart['ridership'].astype(int) # ridership is weekly, assume 4-week months

bart = date_index(bart)
bart_out = bart[['d', 'ridership']]  # date_index() stores the date string in 'd'

bart_out.tail()
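
The cell above scales the weekly figure by a flat `4`. As a point of comparison, the sketch below scales by the actual number of weeks in each month (`days_in_month / 7`). It assumes, as the comment in the cell does, that the ridership figure is a per-week value; the numbers are made up.

```python
import pandas as pd

toy = pd.DataFrame({
    'date': pd.to_datetime(['2018-01-01', '2018-02-01']),
    'weekly': [700, 700],   # invented per-week values
})

toy['flat_4_weeks'] = 4 * toy['weekly']
toy['by_days'] = toy['weekly'] * toy['date'].dt.days_in_month / 7

print(toy)
```

For a 31-day month the flat factor understates the monthly total by roughly 10%, since 31 / 7 is about 4.43 weeks.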

In [None]:
bart_out.to_csv('../data/processed/bart_2005.csv', index = False)

## Vehicle Registration Counts

In [None]:
url = 'https://data.ca.gov/api/3/action/datastore_search?resource_id=888bbb6c-09b4-469c-82e6-1b2a47439736' 

veh_req = requests.get(url)
veh_req.status_code

In [None]:
# getting data 
veh = pd.DataFrame((veh_req.json())['result']['records'])
veh.head()
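
CKAN's `datastore_search` action returns at most `limit` records per call (100 by default, unless the site configures otherwise) and accepts an `offset`, so a single request may cover only part of a resource. The sketch below pages through results until an empty page comes back. To keep it runnable offline, the HTTP call is passed in as a function; `fetch_all`, `fake_fetch`, and the fake records are hypothetical.

```python
import pandas as pd

def fetch_all(fetch_page, page_size=100):
    """Collect records page by page; fetch_page(limit, offset) returns a list of dicts."""
    records, offset = [], 0
    while True:
        page = fetch_page(page_size, offset)
        if not page:            # an empty page means there is nothing left
            break
        records.extend(page)
        offset += len(page)
    return pd.DataFrame(records)

# stand-in for the API: 250 fake rows served in slices
FAKE = [{'_id': i, 'Vehicles': i % 7} for i in range(250)]

def fake_fetch(limit, offset):
    return FAKE[offset:offset + limit]

all_rows = fetch_all(fake_fetch)
print(len(all_rows))  # 250
```

With `requests`, `fetch_page` could call `requests.get(url, params={'limit': limit, 'offset': offset})` and return `.json()['result']['records']`.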

In [None]:
veh.drop(columns = ['_id', 'Duty', 'Make', 'Fuel', 'Model Year'], inplace = True)

new_col = {
    'Zip Code':'zip',
    'Date':'ds', 
    'Vehicles' : 'vehs',
}

veh.rename(columns = new_col, inplace=True)

In [None]:
# add new cols from old date column
veh_date = veh['ds'].str.rpartition('/')

veh['day'] = '01'
veh['month'] = '01'
veh['year'] = veh_date[2]
veh['date'] = veh['year'].astype(str) + '-01-01'

veh = date_index(veh)

veh_out = veh[['d','zip', 'vehs']]
veh_out.tail()
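
The source list mentions a statewide total alongside the per-zip counts. One way to derive a yearly total from a frame shaped like `veh_out` is a `groupby` on the date column, sketched below with invented rows. The cast with `pd.to_numeric` is an assumption, in case the counts arrive as strings in the JSON response.

```python
import pandas as pd

# invented rows in the same column layout as veh_out
toy = pd.DataFrame({
    'd': ['2018-01-01', '2018-01-01', '2019-01-01'],
    'zip': ['94102', '94103', '94102'],
    'vehs': ['10', '5', '12'],
})

toy['vehs'] = pd.to_numeric(toy['vehs'])                      # strings to numbers
state_total = toy.groupby('d', as_index=False)['vehs'].sum()  # one row per date
print(state_total)
```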

In [None]:
veh_out.to_csv('../data/processed/vehs_zip.csv', index = False)

## CONSUMER DEBT

In [None]:
path = '../data/raw/debt/'

filename = path + 'consumer_debt.csv'
debt_df = pd.read_csv(filename)

new_col = {
    'DATE':'date',
    'TOTALSL' : 'debt',
}

debt_df.rename(columns = new_col, inplace=True)
debt_df.head()

In [None]:
debt_df.to_csv('../data/processed/debt.csv', index = False)