# 2 | Importing Data for Initial EDA, Visualizations
---
* [01 API Data Requests](01_API_pulls.ipynb)
* _[02 Initial EDA](02_EDA.ipynb)_
* [03 First Model: PROPHET](02.ipynb)
---

### Data Discussion

* [BART](https://www.bart.gov) publishes monthly reports with daily ridership for that month, based on faregate counts for entries and exits. Note: use `weekly*4`. 2010 - 2018
* [EIA](https://www.eia.gov/opendata/qb.php?category=240839&sdid=PET.EMM_EPM0_PTE_SCA_DPG.M) publishes monthly and weekly fuel prices. The weekly series will be converted to monthly. 1943 - 2021
* [CA Energy](https://www.energy.ca.gov/data-reports/energy-almanac/zero-emission-vehicle-and-infrastructure-statistics/vehicle-population) publishes vehicle counts annually. DMV and CA data only provide annual counts, so these will be converted to monthly. 2010 - 2021
* [Fed Reserve](https://www.federalreserve.gov) publishes yearly consumer debt, which will be converted to monthly. 1942 - 2021
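
The "converted to monthly" step for a weekly series could look like the sketch below. The sample prices are made up for illustration, and averaging within each calendar month is an assumption: the notebook does not state which aggregation was actually used.

```python
import pandas as pd

# Hypothetical weekly prices (NOT the real EIA data), indexed by date like 'ds'.
weekly = pd.DataFrame(
    {"fuel_w": [3.00, 3.10, 3.20, 3.30, 3.40, 3.50]},
    index=pd.to_datetime(
        ["2022-01-03", "2022-01-10", "2022-01-17",
         "2022-01-24", "2022-01-31", "2022-02-07"]
    ),
)
weekly.index.name = "ds"

# 'MS' labels each bucket with the first day of its month, which matches
# the YYYY-MM-01 dates used by the other monthly files in this notebook.
# Assumption: the monthly value is the mean of that month's weekly values.
monthly = weekly.resample("MS").mean()
print(monthly)
```

Labeling months by their first day keeps the result aligned with the `ds` values in the BART, vehicle, and debt files, which makes later joins simpler.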

In [1]:
##### BASIC IMPORTS 
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
# CUSTOM IMPORTS AND SETTINGS 

import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

pd.options.display.max_columns = 90                     # view settings
pd.options.display.max_rows = 100

path = '../data/processed/'

In [3]:
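def date_index(df):
    """Parse the 'ds' column as datetimes and set it as the index."""
    df = df.copy()                          # leave the caller's frame untouched
    df['ds'] = pd.to_datetime(df['ds'])
    df = df.set_index('ds')
    print(df.head(3))                       # quick sanity check of the first rows
    return df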

In [4]:
# FUNCTION PLOTS ONE PLOTLY TRACE AND RETURNS THE FIGURE DICT
# TAKES 3 ARGUMENTS: (dataframe, y column name, and title for plot)
def plot_traces(df, y, title):
    y_trace = go.Scatter(
                    x = df.index,
                    y = df[y], 
                    name = y + ' trace',
                    line = dict(color = 'blue'),
                    opacity = 0.4)

    layout = dict(title = title)

    fig = dict(data=[y_trace], layout=layout)
    iplot(fig)
    print('done')
    return fig                              # previously returned None via print()

> <br>
>
> 1. BART Ridership
> 
> <br>

In [5]:
filename = path + 'bart.csv'
bart = pd.read_csv(filename)
bart = date_index(bart)

            ridership
ds                   
1997-01-01    1006096
1997-02-01    1025044
1997-03-01    1054408


In [6]:
bart2 = bart.loc['2010-01-01':]
bart_plot = plot_traces(bart2, 'ridership', 'BART Monthly Ridership, 2010 - 2022')

done


> <br>
>
> 2. Fuel Prices
> 
> <br>

In [7]:
filename = path + 'fuel_w.csv'
fuel_w = pd.read_csv(filename)

fuel_w = date_index(fuel_w)
fuel_w.tail()

            fuel_w
ds                
2000-05-22   1.679
2000-05-29   1.673
2000-06-05   1.661


            fuel_w
ds                
2022-04-11   5.715
2022-04-18   5.641
2022-04-25   5.609
2022-05-02   5.629
2022-05-09   5.748


In [8]:
fuel2 = fuel_w.loc['2010-01-01':]
fuel_plot2 = plot_traces(fuel2, 'fuel_w', 'Weekly Average Gas Price ($), California: 2010 - 2022')

done


In [9]:
filename = path + 'fuel_m.csv'
fuel_m = pd.read_csv(filename)

fuel_m = date_index(fuel_m)
fuel_m.tail()

            fuel_m
ds                
2000-05-01     NaN
2000-06-01   1.669
2000-07-01   1.754


            fuel_m
ds                
2021-12-01   4.597
2022-01-01   4.584
2022-02-01   4.660
2022-03-01   5.655
2022-04-01   5.692


In [10]:
fuel3 = fuel_m.loc['2010-01-01':]
fuel_plot3 = plot_traces(fuel3, 'fuel_m', 'Monthly Average Gas Price ($), California: 2010 - 2022')

done
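
The first monthly row shown above is `NaN`, so it may be worth checking for missing values and calendar gaps before modeling. A minimal sketch, using only the three head rows printed above; dropping the missing rows is one option among several and is an assumption here, not something the notebook does:

```python
import pandas as pd

# The three head rows printed by date_index(fuel_m) above.
monthly = pd.DataFrame(
    {"fuel_m": [float("nan"), 1.669, 1.754]},
    index=pd.to_datetime(["2000-05-01", "2000-06-01", "2000-07-01"]),
)
monthly.index.name = "ds"

# Count missing prices.
n_missing = int(monthly["fuel_m"].isna().sum())

# Compare the index against a complete month-start calendar to find gaps.
expected = pd.date_range(monthly.index.min(), monthly.index.max(), freq="MS")
gaps = expected.difference(monthly.index)

# Assumption: rows without a price are simply dropped.
cleaned = monthly.dropna()

print(n_missing, len(gaps), cleaned.index.min())
```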


> <br>
>
> 3. Manipulating 'REGISTERED VEHICLES' file: 
> 
> <br>

In [11]:
filename = path + 'vehs.csv'
vehs = pd.read_csv(filename)

vehs = date_index(vehs)
vehs.tail()

                cars
ds                  
2010-01-01  22286130
2011-01-01  22288061
2012-01-01  22502680


                cars
ds                  
2017-01-01  28418039
2018-01-01  28681493
2019-01-01  29029787
2020-01-01  28665934
2021-01-01  29942517


In [12]:
vehs2 = vehs.loc['2010-01-01':]
cars_plot = plot_traces(vehs2, 'cars', 'Estimated Count of Registered Cars CA: 2010 - 2021')

done
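
Because the vehicle counts are annual, converting them to monthly means filling in eleven values per year. One possible approach is sketched below with the first three rows printed above. Linear interpolation is an assumption: the notebook does not say which method is used, and forward-filling would be an equally simple alternative.

```python
import pandas as pd

# First three annual rows from the vehs output above.
annual = pd.DataFrame(
    {"cars": [22286130, 22288061, 22502680]},
    index=pd.to_datetime(["2010-01-01", "2011-01-01", "2012-01-01"]),
)
annual.index.name = "ds"

# asfreq('MS') inserts a row for every month start (NaN where no data exists);
# interpolate() then fills those rows on a straight line between the
# neighboring annual values. Assumption: growth is linear within a year.
monthly = annual.asfreq("MS").interpolate(method="linear")

print(monthly.head(13))
```

The annual values are preserved exactly on each January 1, so the monthly series still agrees with the published counts.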


> <br>
>
> 4. Manipulating 'CONSUMER DEBT' file: 
> 
> <br>

In [13]:
filename = path + 'debt.csv'
debt = pd.read_csv(filename)
debt['ds'] = debt['date']

debt = date_index(debt)
debt.tail()

                  date     debt
ds                             
1943-01-01  1943-01-01  6.57783
1943-02-01  1943-02-01  6.46304
1943-03-01  1943-03-01  6.23421


                  date        debt
ds                                
2021-11-01  2021-11-01  4408.96983
2021-12-01  2021-12-01  4431.91715
2022-01-01  2022-01-01  4448.88285
2022-02-01  2022-02-01  4486.57969
2022-03-01  2022-03-01  4539.01445


In [17]:
debt2 = debt.loc['2010-01-01':]
debt_plot = plot_traces(debt2, 'debt', 'Consumer Debt ($), 2010 - 2022 (not adjusted, Federal Reserve)')

done


> <br>
>
> 5. Manipulating 'POPULATION' file: 
> 
> <br>