#  1 | _API_ Requests: Data Acquisition
<!-- ---

* [02 the next notebook](02.ipynb)
* [03 the one after](03.ipynb) -->

---

### Data Discussion

* Requests for `4` datasets via respective APIs. 

`PROPHET` requires a DataFrame (`df`) with `2` columns: `ds` (the datestamp) and `y` (the numeric value to forecast). In this notebook, I document the API pulls and the data manipulation used to create new `csv` files in that format. 
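
As a rough illustration of that target format, the sketch below renames two columns of an arbitrary frame to `ds` and `y` and coerces their types. The helper name `to_prophet_frame` and the column names `period` and `price` are hypothetical and not part of this notebook; the sample rows only mimic the shape of the fuel data.

```python
import pandas as pd


def to_prophet_frame(df, date_col, value_col):
    """Return a two-column frame (`ds`, `y`) from a frame with a date and a value column.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    out = df[[date_col, value_col]].rename(columns={date_col: 'ds', value_col: 'y'})
    out['ds'] = pd.to_datetime(out['ds'])     # datestamp column
    out['y'] = pd.to_numeric(out['y'])        # numeric value to forecast
    return out.sort_values('ds').reset_index(drop=True)


# sample rows, newest first, with values stored as strings
raw = pd.DataFrame({'period': ['2022-02-01', '2022-01-01'], 'price': ['4.66', '4.584']})
prophet_df = to_prophet_frame(raw, 'period', 'price')
print(prophet_df)
```

Sorting by `ds` is a convenience for reading the frame; it is a choice made here, not something stated in this notebook.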

* [Fuel data from US Energy Information Administration](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
    * weekly data saved: [fuel_w.csv](../data/processed/fuel_w.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.W)
    * monthly data saved: [fuel_m.csv](../data/processed/fuel_m.csv) - [source](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M)
<br>

* [BART](https://data.bart.gov/dataset/customer-ridership/resource/6e653520-58cf-45c5-b40c-d37c8957ec77) publishes monthly ridership reports, using faregate information
    * [monthly totals](https://data.bart.gov/group/ridership)
    * monthly by entrance and exit, saved 
<br>

* Car registration data from 
    * by county: [CA DMV Vehicle Annual Count](https://www.energy.ca.gov/data-reports/energy-almanac/zero-emission-vehicle-and-infrastructure-statistics/vehicle-population)

<br>

* will not be using: ~~[caltrain](caltrain.com)  Publishes yearly ridership counts, based on model validated with a yearly county daily in January.~~

In [95]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os 
import sys

import requests

In [96]:
# CUSTOM IMPORTS AND SETTINGS 
plt.style.use('gstyle.mplstyle')

pd.options.display.max_columns = 90                    
pd.options.display.max_rows = 100

## EIA DATA: Fuel Price History 

In [97]:
# EIA KEY - set the EIAAPI environment variable to your key to use the EIA API
try:
    KEY = os.environ['EIAAPI']
except KeyError:
    sys.exit('EIA API key not found: set the EIAAPI environment variable')

#### Function for EIA requests: takes `KEY` and `CATEGORY`

In [98]:
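# function for EIA data requests
def eia_req(KEY, CATEGORY):
    """Request one EIA series and return its observations as a DataFrame."""
    url = 'https://api.eia.gov/series/'
    params = {'api_key': KEY, 'series_id': CATEGORY}

    # REQUEST
    req = requests.get(url, params=params)
    print('Request Code:' + str(req.status_code))
    req.raise_for_status()    # stop on a failed request before touching the JSON

    # getting data: one row per observation, [date, value]
    data = pd.DataFrame(req.json()['series'][0]['data'])

    return data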
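
The function above assumes the response always contains `series[0]['data']`. A minimal sketch of a more defensive parse step follows. `parse_eia_series` is a hypothetical helper, and the payload shape is inferred only from the lookup used in `eia_req`, not from EIA documentation.

```python
import pandas as pd


def parse_eia_series(payload):
    """Turn a decoded EIA response into a frame with `ds` / `y` columns.

    Assumes the shape used in `eia_req`: payload['series'][0]['data'] is a
    list of [date, value] pairs. Hypothetical helper, for illustration only.
    """
    series = payload.get('series')
    if not series:
        # e.g. a bad key or series id: show the payload instead of a bare KeyError
        raise ValueError(f'no series in response: {payload}')
    return pd.DataFrame(series[0]['data'], columns=['ds', 'y'])


# stand-in payload in the assumed shape, so the example runs without a key
sample = {'series': [{'data': [['202204', 5.692], ['202203', 5.655]]}]}
fuel = parse_eia_series(sample)
print(fuel)
```

Separating the parse from the request makes the column naming testable without a network call.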

#### Fuel prices by month

In [99]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.M'

# call function for pull 
fuel_m = eia_req(KEY, CATEGORY)
fuel_m.columns = ['ds', 'y']
fuel_m.head()

Request Code:200


Unnamed: 0,ds,y
0,202204,5.692
1,202203,5.655
2,202202,4.66
3,202201,4.584
4,202112,4.597


In [100]:
# add new columns with split data and reformatted date
fuel_m['year'] = fuel_m['ds'].str[:4]
fuel_m['month'] = fuel_m['ds'].str[-2:]
fuel_m['day'] = '01'    # using day 01 for all monthly data
fuel_m['ds'] = fuel_m['year'] + '-' + fuel_m['month'] + '-01' 

# ds now holds the reformatted date; year/month/day are kept as helper columns
fuel_m.head()

Unnamed: 0,ds,y,year,month,day
0,2022-04-01,5.692,2022,4,1
1,2022-03-01,5.655,2022,3,1
2,2022-02-01,4.66,2022,2,1
3,2022-01-01,4.584,2022,1,1
4,2021-12-01,4.597,2021,12,1


In [101]:
# print to file
fuel_m.to_csv('../data/processed/fuel_m.csv', index = False)

#### Fuel Prices by Week

In [102]:
CATEGORY = 'PET.EMM_EPM0_PTE_SCA_DPG.W'

# call function for pull 
fuel_w = eia_req(KEY, CATEGORY)
fuel_w.columns = ['ds', 'y']
fuel_w.head()

Request Code:200


Unnamed: 0,ds,y
0,20220509,5.748
1,20220502,5.629
2,20220425,5.609
3,20220418,5.641
4,20220411,5.715


In [103]:
# add new cols from old date column
fuel_w['year'] = fuel_w['ds'].str[:4]
fuel_w['month'] = fuel_w['ds'].str[4:6]
fuel_w['day'] = fuel_w['ds'].str[-2:]
fuel_w['ds'] = fuel_w['year'] + '-' + fuel_w['month'] + '-' + fuel_w['day']

# ds now holds the reformatted date; year/month/day are kept as helper columns
fuel_w.head()

Unnamed: 0,ds,y,year,month,day
0,2022-05-09,5.748,2022,5,9
1,2022-05-02,5.629,2022,5,2
2,2022-04-25,5.609,2022,4,25
3,2022-04-18,5.641,2022,4,18
4,2022-04-11,5.715,2022,4,11
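
Both the monthly and the weekly string-slicing steps can also be written with `pd.to_datetime` and an explicit format. This is an alternative sketch, not the approach used in this notebook; `reformat_ds` is a hypothetical helper name.

```python
import pandas as pd


def reformat_ds(df, fmt):
    """Parse compact date strings in `ds` and rewrite them as 'YYYY-MM-DD'."""
    out = df.copy()
    parsed = pd.to_datetime(out['ds'], format=fmt)    # '%Y%m' defaults the day to 01
    out['ds'] = parsed.dt.strftime('%Y-%m-%d')
    return out


# first rows of the monthly and weekly pulls
monthly = pd.DataFrame({'ds': ['202204', '202203'], 'y': [5.692, 5.655]})
weekly = pd.DataFrame({'ds': ['20220509', '20220502'], 'y': [5.748, 5.629]})

print(reformat_ds(monthly, '%Y%m')['ds'].tolist())
print(reformat_ds(weekly, '%Y%m%d')['ds'].tolist())
```

Parsing with a format also fails loudly on a malformed date string, which plain slicing would pass through silently.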


In [104]:
# print to file
fuel_w.to_csv('../data/processed/fuel_w.csv', index = False)

## BART Data

#### Weekly Ridership by Month `1997 - 2018`

In [105]:
# datastore_search returns at most 100 records per request unless `limit` is passed
url = 'https://data.bart.gov/api/3/action/datastore_search?resource_id=6e653520-58cf-45c5-b40c-d37c8957ec77'

bart_req = requests.get(url)
bart_req.status_code

200

In [106]:
# getting data 
bart_data = pd.DataFrame(bart_req.json()['result']['records'])
bart_data.tail()

Unnamed: 0,_id,FiscalYear,FiscalMonth,RIDERSHIP WEEKAVG,RIDERSHIP GOAL
95,96,2004,12,308792.5455,332874.1668
96,97,2005,1,308189.9524,308517.7023
97,98,2005,2,304724.3182,313073.2802
98,99,2005,3,323235.5714,318924.7632
99,100,2005,4,315753.45,319959.4083


In [107]:
bart_data.shape

(100, 5)
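
The shape `(100, 5)` and the last record (fiscal year 2005) suggest the pull stopped at the first 100 records, which is the default page size of CKAN's `datastore_search` action. Below is a minimal paging sketch, assuming the endpoint honours the standard `limit` and `offset` parameters and reports `total` inside `result`. `fetch_all_records` and its injectable `get_json` argument are hypothetical names added for illustration.

```python
def _requests_get_json(url, params):
    """Default fetcher; imports requests lazily so the paging logic can run offline."""
    import requests
    resp = requests.get(url, params=params)
    resp.raise_for_status()
    return resp.json()


def fetch_all_records(url, resource_id, page_size=100, get_json=_requests_get_json):
    """Collect every record of a CKAN datastore resource, one page at a time."""
    records, offset = [], 0
    while True:
        params = {'resource_id': resource_id, 'limit': page_size, 'offset': offset}
        result = get_json(url, params)['result']
        page = result['records']
        records.extend(page)
        offset += len(page)
        total = result.get('total')
        # stop on an empty or short page, or once the reported total is reached
        if not page or len(page) < page_size or (total is not None and offset >= total):
            break
    return records


# possible usage against the BART resource from this notebook (not run here):
# url = 'https://data.bart.gov/api/3/action/datastore_search'
# records = fetch_all_records(url, '6e653520-58cf-45c5-b40c-d37c8957ec77')
# bart_data = pd.DataFrame(records)
```

Passing the fetcher as an argument is only a convenience for testing the loop without network access.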

In [108]:
bart_data.drop(columns = ['_id', 'RIDERSHIP GOAL'], inplace = True)

new_col = {
    'FiscalMonth':'month',
    'FiscalYear':'year', 
    'RIDERSHIP WEEKAVG' : 'y',
}

bart_data.rename(columns = new_col, inplace=True)
bart = bart_data
bart.head()

Unnamed: 0,year,month,y
0,1997,1,251524.0
1,1997,2,256261.0
2,1997,3,263602.0
3,1997,4,264442.0
4,1997,5,265244.0


In [109]:
# add new cols from old date column
bart['day'] = '01'
bart['month'] = bart['month'].astype(int).astype(str).str.zfill(2)    # zero-pad: 1 -> '01'

bart['ds'] = bart['year'].astype(str) + '-' + bart['month'] + '-' + bart['day']

In [110]:
bart.head()

Unnamed: 0,year,month,y,day,ds
0,1997,1,251524.0,1,1997-01-01
1,1997,2,256261.0,1,1997-02-01
2,1997,3,263602.0,1,1997-03-01
3,1997,4,264442.0,1,1997-04-01
4,1997,5,265244.0,1,1997-05-01
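
The `ds` column can also be assembled without string padding, since `pd.to_datetime` accepts a frame with `year`, `month`, and `day` columns. A small sketch on the first rows shown above; the frame name `demo` is hypothetical.

```python
import pandas as pd

demo = pd.DataFrame({'year': [1997, 1997, 1997], 'month': [1, 2, 3],
                     'y': [251524.0, 256261.0, 263602.0]})

# to_datetime assembles dates from columns named year / month / day
parts = demo[['year', 'month']].assign(day=1)
demo['ds'] = pd.to_datetime(parts).dt.strftime('%Y-%m-%d')

print(demo[['ds', 'y']])
```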


In [111]:
bart.to_csv('../data/processed/bart_2018.csv', index = False)

## Vehicle Registration Counts

In [112]:
url = 'https://data.ca.gov/api/3/action/datastore_search?resource_id=888bbb6c-09b4-469c-82e6-1b2a47439736' 

veh_req = requests.get(url)
veh_req.status_code

200

In [113]:
# getting data (datastore_search returns only the first 100 records unless `limit` is passed)
veh = pd.DataFrame(veh_req.json()['result']['records'])
veh.head()

Unnamed: 0,Duty,Make,Vehicles,Zip Code,Fuel,Date,_id,Model Year
0,Light,ACURA,12,90001,Gasoline,1/1/2021,1,2008
1,Light,ACURA,24,90003,Gasoline,1/1/2021,2,2008
2,Light,ACURA,20,90004,Gasoline,1/1/2021,3,2008
3,Light,ACURA,12,90005,Gasoline,1/1/2021,4,2008
4,Light,ACURA,15,90006,Gasoline,1/1/2021,5,2008


In [114]:
veh.drop(columns = ['_id', 'Duty', 'Make', 'Fuel', 'Model Year'], inplace = True)

new_col = {
    'Zip Code':'zip',
    'Date':'ds', 
    'Vehicles' : 'y',
}

veh.rename(columns = new_col, inplace=True)

In [115]:
# add new cols from old date column
veh_date = veh['ds'].str.rpartition('/')    # split 'M/D/YYYY' at the last '/'

veh['day'] = '01'
veh['month'] = '01'
veh['year'] = veh_date[2]    # the part after the last '/' is the year

veh['ds'] = veh['year'].astype(str) + '-01-01'
veh.head()

Unnamed: 0,y,zip,ds,day,month,year
0,12,90001,2021-01-01,1,1,2021
1,24,90003,2021-01-01,1,1,2021
2,20,90004,2021-01-01,1,1,2021
3,12,90005,2021-01-01,1,1,2021
4,15,90006,2021-01-01,1,1,2021
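
After the drop, several rows can share the same `zip` and `ds` (one per make, fuel, and model year in the original records), while a forecasting series needs a single `y` per date. One possible way to collapse them is sketched below. Whether to sum per ZIP code or across all ZIP codes is an assumption, not something this notebook decides, and the sample rows are made up.

```python
import pandas as pd

# made-up rows in the shape of `veh` after renaming
sample = pd.DataFrame({
    'y':   ['12', '8', '24', '5'],
    'zip': ['90001', '90001', '90003', '90003'],
    'ds':  ['2021-01-01'] * 4,
})
# coerce counts to numbers in case the API returns them as strings
sample['y'] = pd.to_numeric(sample['y'])

# one row per ZIP code and date
by_zip = sample.groupby(['zip', 'ds'], as_index=False)['y'].sum()

# one row per date across all ZIP codes
total = sample.groupby('ds', as_index=False)['y'].sum()

print(by_zip)
print(total)
```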


In [116]:
veh.to_csv('../data/processed/vehs_2018.csv', index = False)

## CONSUMER DEBT

In [117]:
path = '../data/raw/debt/'

filename = path + 'consumer_debt.csv'
debt_df = pd.read_csv(filename)

new_col = {
    'DATE':'ds',
    'TOTALSL' : 'y',
}

debt_df.rename(columns = new_col, inplace=True)
debt_df.head()

Unnamed: 0,ds,y
0,1943-01-01,6.57783
1,1943-02-01,6.46304
2,1943-03-01,6.23421
3,1943-04-01,6.12575
4,1943-05-01,5.93626


In [118]:
debt_df.to_csv('../data/processed/debt.csv', index = False)