# Data exploration in advance of model selection

Thanks to neptune.ai for their excellent tutorials on time-series visualization, which were very helpful here. Much of the visualization in this section follows the approach [described here](https://neptune.ai/blog/select-model-for-time-series-prediction-task).

### Import libraries

```python
import pandas as pd
# import matplotlib.pyplot as plt
import plotly.express as px
from statsmodels.tsa.seasonal import seasonal_decompose
```


### Import data

The CSV used here is an extract from the most recent Google BigQuery run of 'weekly_variables_flattened.sql'. I originally intended to do this analysis in Google Colab, but enabling that feature in my personal cloud instance would cost at least $400, so I'm doing it here instead.

```python
# import data extracted from last GCP run of 'weekly_variables_flattened'
df = pd.read_csv("weekly_variables_flattened.csv")
df['week_start'] = pd.to_datetime(df['week_start'])
df = df.sort_values(by='week_start', ascending=True)
```

```python
print(df.head())
print(len(df))
df = df.dropna()
print(len(df))
```

```text
    week_start  approving  disapproving  unsure_or_no_data  \
93  2004-07-04       48.0     48.333333           3.666667   
95  2004-07-11       48.0     48.333333           3.666667   
92  2004-07-18       48.0     48.333333           3.666667   
94  2004-07-25       48.0     48.333333           3.666667   
827 2004-08-01       50.0     46.500000           3.500000   

     BusinessApplications  ConstructionSpending  DurableGoodsNewOrders  \
93                 159034             1006184.0               186835.0   
95                 159034             1006184.0               186835.0   
92                 159034             1006184.0               186835.0   
94                 159034             1006184.0               186835.0   
827                191673             1013616.0               183728.0   

     InternationalTrade_Exports  ManuInventories  ManuNewOrders  \
93                      96907.0         426260.0       356710.0   
95                      96907.0         426260.0
```
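Because `dropna()` discards every row that has at least one missing value, the remaining weeks are no longer guaranteed to be evenly spaced, and a decomposition with a fixed seasonal period assumes they are. A quick check is to look for consecutive `week_start` values that are not exactly seven days apart. The helper `weekly_gaps` below is hypothetical (not part of the original notebook) and is demonstrated on a small made-up series; on the real data it would be called as `weekly_gaps(df['week_start'])`.

```python
import pandas as pd


def weekly_gaps(week_starts: pd.Series) -> pd.Series:
    """Return each week_start that is not exactly 7 days after the previous one.

    Hypothetical helper: both duplicated weeks (0-day gaps) and missing
    weeks (14+ day gaps) are reported.
    """
    ordered = week_starts.sort_values().reset_index(drop=True)
    deltas = ordered.diff()  # first element is NaT because it has no predecessor
    return ordered[deltas.notna() & (deltas != pd.Timedelta(days=7))]


# Made-up example: the week of 2004-07-18 is missing.
sample = pd.Series(pd.to_datetime(["2004-07-04", "2004-07-11", "2004-07-25", "2004-08-01"]))
print(weekly_gaps(sample))
```

An empty result means the cleaned series is still on a regular weekly grid.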

### Data visualizations

Unfortunately, there is not yet enough data to build a model that covers the current administration: at the time of writing, the data does not reflect the full tenure of the Biden presidency.

That said, beyond the quality checks done earlier, this basic plot can be [compared against prior work](https://ballotpedia.org/Joe_Biden%27s_executive_orders_and_actions) to confirm that the data has the expected structure.

```python
# df[['week_start', 'orders_outcome_var']].plot()
fig1 = px.line(df, x='week_start', y='orders_outcome_var')
fig1.update_layout(
    xaxis_title='Date',
    yaxis_title='No. Exec. Orders',
    title='Executive Orders by Date'
)
fig1.show()
```
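To make the comparison against published counts (such as the Ballotpedia page linked above) more concrete, the weekly outcome can be totaled by year. This is a sketch that assumes `orders_outcome_var` is a per-week count of executive orders, as the axis label suggests; `yearly_totals` is a hypothetical helper, demonstrated on made-up rows. Weeks that straddle New Year's Day are credited to the year in which they start, so the totals may differ slightly from strict calendar-year counts.

```python
import pandas as pd


def yearly_totals(frame: pd.DataFrame, date_col: str = "week_start",
                  value_col: str = "orders_outcome_var") -> pd.Series:
    """Sum a weekly count column by the calendar year of each week's start date."""
    years = frame[date_col].dt.year
    return frame.groupby(years)[value_col].sum()


# Made-up rows for illustration only (not real executive-order counts).
demo = pd.DataFrame({
    "week_start": pd.to_datetime(["2021-12-19", "2021-12-26", "2022-01-02"]),
    "orders_outcome_var": [2, 3, 4],
})
print(yearly_totals(demo))
```

On the real data, `yearly_totals(df)` gives one row per year that can be checked against an external tally.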

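```python
# seasonal_decompose needs a numeric series, not a DataFrame that includes a
# Timestamp column (passing one raises "TypeError: float() argument must be a
# string or a real number, not 'Timestamp'"). Index the outcome by week instead.
series = df.set_index('week_start')['orders_outcome_var']
# period=52 assumes a yearly cycle in weekly data; the additive model is used
# because weekly counts can be zero, which a multiplicative model cannot handle.
decomposition = seasonal_decompose(series, model='additive', period=52)
fig2 = decomposition.plot()  # DecomposeResult.plot() draws with matplotlib
```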
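For reference, the additive model treats each observation as trend + seasonal + residual, with the trend estimated by a centered moving average. The sketch below re-implements that classical method in plain NumPy/pandas so the three components are easy to inspect. It is a simplified illustration, not statsmodels' own source; the function `classical_additive_decompose` and the synthetic series are hypothetical and stand in for the real data.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (classical additive model).

    Hypothetical helper for illustration; it follows the textbook method
    rather than reproducing statsmodels' implementation line for line.
    """
    values = series.to_numpy(dtype=float)
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of observations")

    # Trend: centered moving average. For an even period (e.g. 52 weeks) this is
    # the "2 x period" average with weights [0.5, 1, ..., 1, 0.5] / period.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # the ends stay NaN: the window does not fit there
    trend[half:n - half] = np.convolve(values, weights, mode="valid")

    # Seasonal: average the detrended values at each position in the cycle,
    # then center those averages so they sum to zero over one period.
    detrended = values - trend
    position_means = np.array(
        [np.nanmean(detrended[i::period]) for i in range(period)]
    )
    position_means -= position_means.mean()
    seasonal = np.tile(position_means, n // period + 1)[:n]

    # Residual: whatever the trend and seasonal terms do not explain.
    resid = values - trend - seasonal
    return pd.DataFrame(
        {"observed": values, "trend": trend, "seasonal": seasonal, "resid": resid},
        index=series.index,
    )


# Synthetic weekly data (not the real dataset): a linear trend plus a yearly cycle.
weeks = pd.date_range("2004-07-04", periods=52 * 4, freq="W-SUN")
t = np.arange(len(weeks))
synthetic = pd.Series(0.05 * t + 3 * np.sin(2 * np.pi * t / 52), index=weeks)
parts = classical_additive_decompose(synthetic, period=52)
print(parts.dropna().head())
```

On this synthetic input the method recovers the linear trend and the sine wave almost exactly, leaving residuals near zero; on real executive-order counts, large residuals would flag weeks the trend and yearly pattern do not explain. The first and last 26 trend values are undefined, which is why at least two full periods of data are required.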