# 2 | Importing Data for Initial EDA, Visualizations
---
* [01 API Data Requests](01_API_pulls.ipynb)
* _[02 Initial EDA](02_EDA.ipynb)_
* [03 First Model: PROPHET](02.ipynb)
---

### Data Discussion

* For `Prophet` and `greykite`, all data must be in the format: ` date series  |  y `

`Prophet` requires the date column to be labeled `ds`, whereas `greykite` requires `ts`; the column is renamed in preprocessing. This notebook appends all `y` values to the date index `ds` in a single `.csv`, which can then be sliced as needed during modelling.
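As a rough sketch of the layout described above, the snippet below joins two series on a shared `ds` column and then slices out one target for each library. The toy values are placeholders made up for illustration; only the `ds` / `ts` / `y` naming convention comes from the text.

```python
import pandas as pd

# Toy monthly series; the values are invented for illustration only.
dates = pd.date_range("2020-01-01", periods=3, freq="MS")
ridership = pd.DataFrame({"ds": dates, "ridership": [100.0, 110.0, 105.0]})
fuel = pd.DataFrame({"ds": dates, "fuel_m": [3.1, 3.2, 3.0]})

# Append each y series to the shared date column.
combined = ridership.merge(fuel, on="ds", how="outer").sort_values("ds")

# Slice out one target and rename it to `y`, keeping `ds` for Prophet.
prophet_df = combined[["ds", "ridership"]].rename(columns={"ridership": "y"})

# Same slice with the date column renamed to `ts` for greykite.
greykite_df = prophet_df.rename(columns={"ds": "ts"})
```

An outer merge keeps dates that appear in only one source, which matters here because the sources cover different year ranges.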

* [BART](bart.gov) publishes monthly reports with daily ridership for that month, using faregate counts for station entries and exits. - use `weekly*4` 2010 - 2018 
* [EIA](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M) publishes monthly and weekly fuel rates - will turn to monthly 1943 - 2021
* [CA Energy](https://www.energy.ca.gov/data-reports/energy-almanac/zero-emission-vehicle-and-infrastructure-statistics/vehicle-population) publishes vehicle counts annually. DMV and CA Data only provide annual counts. - will turn to monthly. 2010 - 2021
* [Fed Reserve](federalreserve.gov) publishes yearly consumer debt - will turn to monthly 1942 - 2021
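The notes above mention turning annual and weekly series into monthly ones. One possible way to do that in pandas is sketched below. The linear interpolation and the within-month averaging are assumptions, since the notebook does not say which method was used, and the numbers are invented.

```python
import pandas as pd

# Hypothetical annual counts, one row per January 1st.
annual = pd.DataFrame(
    {"ds": pd.to_datetime(["2010-01-01", "2011-01-01"]), "cars": [100.0, 112.0]}
)

# Upsample to month-start frequency, then fill the new rows linearly (assumed method).
monthly_from_annual = (
    annual.set_index("ds")
    .resample("MS")
    .asfreq()
    .interpolate(method="linear")
    .reset_index()
)

# Hypothetical weekly prices, averaged within each calendar month (assumed method).
weekly = pd.DataFrame(
    {
        "ds": pd.date_range("2021-01-04", periods=8, freq="W-MON"),
        "price": [3.0] * 4 + [4.0] * 4,
    }
)
monthly_from_weekly = weekly.set_index("ds").resample("MS").mean().reset_index()
```

Interpolated monthly values are estimates rather than observations, so a model fit on them will see a smoother series than the source actually published.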

In [1]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [14]:
# CUSTOM IMPORTS AND SETTINGS 
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

plt.style.use('gstyle.mplstyle')                        # matplotlib stylesheet (also applies to sns plots)

pd.options.display.max_columns = 90                     # view settings
pd.options.display.max_rows = 100

path = '../data/processed/'

In [59]:
# FUNCTION DISPLAYS A PLOTLY LINE TRACE INLINE (RETURNS NOTHING)
# TAKES 3 ARGUMENTS: (dataframe with a `ds` column, name of the y column, title for plot)
def plot_traces(df, y, title):
    y_trace = go.Scatter(
                    x = df.ds,
                    y = df[y], 
                    name = y + ' trace',
                    line = dict(color = 'blue'),
                    opacity = 0.4)

    layout = dict(title = title)

    fig = dict(data=[y_trace], layout=layout)
    iplot(fig)                                          # render inline in the notebook
    print('done')

> <br>
>
> 1. Manipulating BART file: 
> 
> <br>

In [6]:
filename = path + 'bart_2018.csv'
bart = pd.read_csv(filename)
bart.head()

Unnamed: 0,year,month,ridership,day,ds
0,1997,1,1006096.0,1,1997-01-01
1,1997,2,1025044.0,1,1997-02-01
2,1997,3,1054408.0,1,1997-03-01
3,1997,4,1057768.0,1,1997-04-01
4,1997,5,1060976.0,1,1997-05-01
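The `Unnamed: 0` column in the output above is most likely the index saved by an earlier `to_csv` call, though that is an assumption about how the file was written. A minimal sketch of reading such a file without the extra column is shown below; the in-memory CSV text stands in for the real file and reuses two rows from the output.

```python
import io

import pandas as pd

# Stand-in for a file written by DataFrame.to_csv(), which stores the index
# as an unnamed first column. The two rows are copied from the output above.
csv_text = """,year,month,ridership,day,ds
0,1997,1,1006096.0,1,1997-01-01
1,1997,2,1025044.0,1,1997-02-01
"""

# index_col=0 reuses the first column as the index instead of "Unnamed: 0";
# parse_dates converts `ds` from strings to datetimes.
bart_clean = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=["ds"])
```

Parsing `ds` at read time also avoids a separate `pd.to_datetime` step before plotting or modelling.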


In [60]:
title = f"BART Monthly Ridership, {bart.year.min()} - {bart.year.max()}"   # year range read from the data
bart_plot = plot_traces(bart, 'ridership', title)

done


> <br>
>
> 2. Manipulating 'GAS' prices file: 
> 
> <br>

In [38]:
filename = path + 'fuel_m.csv'
fuel = pd.read_csv(filename)

# fuel = fuel.drop(columns = ['year', 'month', 'day'])
fuel.head()

Unnamed: 0,ds,fuel_m,year,month,day
0,2022-04-01,5.692,2022,4,1
1,2022-03-01,5.655,2022,3,1
2,2022-02-01,4.66,2022,2,1
3,2022-01-01,4.584,2022,1,1
4,2021-12-01,4.597,2021,12,1
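The fuel file is stored newest-first, as the output above shows. If ascending order is wanted before merging with the other series, a sort along these lines would do it. The small frame below copies three rows from that output, and `fuel_desc` and `fuel_sorted` are illustrative names rather than variables from the notebook.

```python
import pandas as pd

# Three rows copied from the output above, newest first.
fuel_desc = pd.DataFrame(
    {"ds": ["2022-04-01", "2022-03-01", "2022-02-01"], "fuel_m": [5.692, 5.655, 4.66]}
)
fuel_desc["ds"] = pd.to_datetime(fuel_desc["ds"])

# Sort oldest-first and rebuild a clean 0..n-1 index.
fuel_sorted = fuel_desc.sort_values("ds").reset_index(drop=True)
```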


In [61]:
fuel_plot = plot_traces(fuel, 'fuel_m', 'Monthly Average Gas Price ($), California: 2000 - 2022')

done


> <br>
>
> 3. Manipulating 'REGISTERED VEHICLES' file: 
> 
> <br>

In [48]:
filename = path + 'vehs.csv'
vehs = pd.read_csv(filename)

vehs.head()

Unnamed: 0,ds,cars
0,2010-01-01,22286130
1,2011-01-01,22288061
2,2012-01-01,22502680
3,2013-01-01,23270577
4,2014-01-01,23899504


In [62]:
cars_plot = plot_traces(vehs, 'cars', 'Estimated Count of Registered Cars CA: 2010 - 2022')

done


> <br>
>
> 4. Manipulating 'CONSUMER DEBT' file: 
> 
> <br>

In [53]:
filename = path + 'debt.csv'
debt = pd.read_csv(filename)

debt.head()

Unnamed: 0,ds,debt
0,1943-01-01,6.57783
1,1943-02-01,6.46304
2,1943-03-01,6.23421
3,1943-04-01,6.12575
4,1943-05-01,5.93626


In [63]:
debt_plot = plot_traces(debt, 'debt', 'Consumer Debt ($), 1943 - 2022 (not adjusted for inflation, via Fed Reserve)')

done


> <br>
>
> 5. Manipulating 'POPULATION' file: 
> 
> <br>