# <img src="./resources/GA.png" width="25" height="25" />   <span style="color:Purple">Project 5 :  Food Insecurity Regression Study</span> 
---
## <span style="color:Green"> 06 - Univariate Time Series Modeling</span>      

#### Alec Edgecliffe-Johnson, Ryan McDonald, Andrew Roberts, Ira Seidman - General Assembly 



---

In this notebook we develop three univariate time series models (ARIMA, Auto-ARIMA and Prophet) to forecast food insecurity rates for a single state through 2026. We then extend the approach to every state in the country: each state's predictions are saved to a dataframe, and all of these dataframes are concatenated into a single dataframe. Given the poor accuracy of the ARIMA and Auto-ARIMA models, we ultimately drop them and use the Prophet predictions as our forecasted food insecurity rates. We use the exported forecast dataframe in Tableau, and the resulting dashboard is hosted on our Streamlit app. The script for the app can be found in the **'04_streamlit_code.py'** file in this repo.

### Notebook Contents:

- [Reading the Data](#intro)
- [Model Preprocessing](#pre)
- [Time Series Models](#model)
    - [ARIMA](#ARIMA)
    - [Auto-ARIMA](#auto)
    - [Prophet](#prophet)
    - [Aggregation 1 (ARIMA+Auto-ARIMA)](#agg1)
    - [Aggregation 2 (ARIMA+Auto-ARIMA+Prophet)](#agg2)
- [Merging Forecast w/ Original Data](#merge)

**Imports**

In [1]:
# Data manipulation imports
import pandas as pd
import numpy as np

# Plotting imports
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling imports
from sklearn.metrics import r2_score, mean_squared_error
import datetime
import pmdarima as pm
from pmdarima.model_selection import train_test_split
from fbprophet import Prophet
from statsmodels.tsa.arima.model import ARIMA
from sklearn.linear_model import LinearRegression


<a id='intro'></a>
## 1. Reading In and Cleaning the Data

In [None]:
df = pd.read_csv('./data/time_series/df_ts_state_mean_years.csv')

In [None]:
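# Drop the unneeded 'fips' column, blank out the 'state_name' header, and transpose
# so that each state becomes a column and each year becomes a row
df = df.drop(columns = 'fips')
df = df.rename(columns = {"state_name": ""})
df_t = df.T
df_t.reset_index()  # displayed for inspection only; the result is not assigned back to df_t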

In [None]:
# Exporting to CSV
df_t.to_csv('./data/time_series/df_ts_rev.csv')

In [None]:
#Reimporting a renamed file
df_rev = pd.read_csv('./data/time_series/df_ts_rev_fin_yr10.csv')

In [None]:
df_rev.head()

In [None]:
# Final drop: remove the leftover index column
df_rev = df_rev.drop(columns = 'Unnamed: 0')

In [None]:
df_rev['time'] = pd.to_datetime(df_rev['time'], format = "%Y/%m/%d")
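
The reshaping done in this section can be illustrated on a toy table. This is a minimal sketch with invented values and hypothetical year columns, not the project's data. It shows how transposing a state-by-year table yields one row per year and one column per state, with a parsed `time` column.

```python
import pandas as pd

# Toy stand-in for the state-by-year table (values are invented for illustration)
toy = pd.DataFrame({
    "state_name": ["Georgia", "Alabama"],
    "2016": [15.1, 17.0],
    "2017": [14.4, 16.3],
    "2018": [13.0, 15.7],
})

# Use the state names as row labels, then transpose so that years become rows
toy_t = toy.set_index("state_name").T

# Turn the year labels into a regular column named 'time' and parse it as datetime
toy_t = toy_t.rename_axis("time").reset_index()
toy_t.columns.name = None
toy_t["time"] = pd.to_datetime(toy_t["time"], format="%Y")

print(toy_t)
```

Doing the rename and the date parsing in pandas keeps the whole reshaping step reproducible from the raw file.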

<a id='pre'></a>
## 2. Modeling Preprocessing - Make Train and Validation DFs

In [None]:
## Credit to https://stackoverflow.com/questions/26921651/how-to-delete-the-last-row-of-data-of-a-pandas-dataframe
df_train = pd.DataFrame(df_rev.drop(df_rev.tail(2).index))

In [None]:
# Validation set: drop the first 7 rows of df_rev (the index must come from df_rev, not df)
df_val = pd.DataFrame(df_rev.drop(df_rev.head(7).index))

In [None]:
df_val.head()
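
A chronological split can also be written with positional slicing. The sketch below uses an invented yearly series and a hypothetical `n_val` parameter; it is an alternative illustration, not the split used in this notebook. This variant guarantees that the two pieces share no rows, whatever the length of the table.

```python
import pandas as pd

# Toy yearly series (invented values) standing in for df_rev
toy = pd.DataFrame({
    "time": pd.date_range("2010-01-01", periods=10, freq=pd.DateOffset(years=1)),
    "Georgia": [18.0, 18.4, 18.9, 18.7, 17.7, 16.6, 15.1, 14.4, 13.0, 12.9],
})

n_val = 2  # hypothetical number of most recent years to hold out

# Time series are split by position (oldest rows train, newest rows validate), never shuffled
toy_train = toy.iloc[:-n_val]
toy_val = toy.iloc[-n_val:]

print(len(toy_train), len(toy_val))
```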

<a id='model'></a>
## 3. Models
<a id='ARIMA'></a>
### Model 1: ARIMA

In [None]:
## Credit to https://www.youtube.com/watch?v=axjgEgBgIY0

In [None]:
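# Select one state and index it by time (set_index returns a new frame, avoiding in-place edits on a slice)
au = df_train[['time', 'Georgia']].set_index('time')
au_v = df_val[['time', 'Georgia']].set_index('time')

# Yearly labels for the 5 forecast steps (the range starts at the last training date)
index_5_years = pd.date_range(au.index[-1], freq = 'AS-JAN', periods = 5, tz = None)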

In [None]:
mod_arima = ARIMA(au, order = (1, 1, 1), freq = 'AS-JAN')
model_arima_fit = mod_arima.fit()

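# forecast() from statsmodels.tsa.arima.model returns the point forecasts directly (not a tuple),
# so indexing with [0] would keep only the first value
fcast1 = model_arima_fit.forecast(5)

# Relabel the 5 forecast values with our yearly index
fcast1 = pd.Series(np.asarray(fcast1), index = index_5_years)
fcast1 = fcast1.rename('Arima')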

In [None]:
fig, ax = plt.subplots(figsize = (15, 5))
chart = sns.lineplot(x = 'time', y = 'Georgia', data = au)
chart.set_title('Georgia')
fcast1.plot(ax = ax, color = 'green', marker = 'o', legend = True)
au.plot(ax = ax, color = 'blue', marker = 'o', legend = True)
au_v.plot(ax = ax, color = 'orange', marker = 'o', legend = True);
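
The forecast labels in this section come from `pd.date_range` anchored at the last training date. As an alternative sketch, the hypothetical helper `forecast_index` below builds yearly labels that begin one year after the last observation, which is how a one-step-ahead forecast is usually dated. The helper name and the example date are assumptions for illustration only.

```python
import pandas as pd

def forecast_index(last_obs, horizon):
    """Yearly labels for a `horizon`-step forecast, starting the year after `last_obs`."""
    one_year = pd.DateOffset(years=1)
    return pd.date_range(start=pd.Timestamp(last_obs) + one_year,
                         periods=horizon, freq=one_year)

# Example: the last training observation is dated 1 January 2017
idx = forecast_index("2017-01-01", 5)
print(idx)
```

Using a `DateOffset` rather than a frequency alias sidesteps differences in alias names between pandas versions.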

<a id='auto'></a>
### Model 2: Auto-Arima

In [None]:
## Credit to https://www.youtube.com/watch?v=axjgEgBgIY0

In [None]:
auto_arima_mod = pm.auto_arima(au, seasonal = False, m = 0, error_action = 'ignore')
fcast2 = auto_arima_mod.predict(5)
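# Use the raw values so the forecasts are placed by position, whatever type predict() returns
fcast2 = pd.Series(np.asarray(fcast2), index = index_5_years)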
fcast2 = fcast2.rename('Auto_Arima')

In [None]:
fig, ax = plt.subplots(figsize = (15, 5))
chart = sns.lineplot(x = 'time', y = 'Georgia', data = au)
chart.set_title('Georgia')
fcast2.plot(ax = ax, color = 'green', marker = 'o', legend = True)
au.plot(ax = ax, color = 'blue', marker = 'o', legend = True)
au_v.plot(ax = ax, color = 'orange', marker = 'o', legend = True);

<a id='prophet'></a>
### Model 3: Prophet

In [None]:
## Credit to https://www.youtube.com/watch?v=axjgEgBgIY0

In [None]:
au.head()

In [None]:
## Copying and renaming 
au_c = df_train[['time', 'Georgia']].copy()
au_c.columns = ['ds', 'y']
au_c['ds'] = pd.to_datetime(au_c['ds'])

model_p = Prophet(n_changepoints = 5, weekly_seasonality = False, daily_seasonality = False)
model_p.fit(au_c)

future = model_p.make_future_dataframe(5, freq = 'Y')

#make predictions

fcast3 = model_p.predict(future)

fcast3 = pd.Series(fcast3['yhat'].values, name = 'Prophet', index = fcast3['ds']);

In [None]:
fig, ax = plt.subplots(figsize = (15, 5))
chart = sns.lineplot(x = 'time', y = 'Georgia', data = au)
chart.set_title('Georgia')
fcast3.plot(ax = ax, color = 'green', marker = 'o', legend = True)
au.plot(ax = ax, color = 'blue', marker = 'o', legend = True)
au_v.plot(ax = ax, color = 'orange', marker = 'o', legend = True);

In [None]:
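# Left commented out: fcast3 also contains the fitted training years, so it would need to be aligned with au_v before scoring.
# With squared = False the metric is the RMSE rather than the MSE.
# print(f"The RMSE of Prophet is: {mean_squared_error(au_v['Georgia'].values, fcast3.values, squared = False)}.")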
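
Scoring a forecast against held-out data requires both series to cover the same dates. The sketch below uses invented numbers and assumes both series are labelled with identical dates. It keeps only the forecast dates present in the validation data and then computes the root mean squared error by hand.

```python
import numpy as np
import pandas as pd

# Invented numbers: three held-out observations and a longer forecast series
actual = pd.Series([14.4, 13.0, 12.9],
                   index=pd.to_datetime(["2017-01-01", "2018-01-01", "2019-01-01"]))
forecast = pd.Series([15.0, 14.0, 13.5, 12.5, 12.0],
                     index=pd.to_datetime(["2016-01-01", "2017-01-01", "2018-01-01",
                                           "2019-01-01", "2020-01-01"]))

# Keep only the forecast dates that also appear in the validation data
aligned = forecast.reindex(actual.index)

# Root mean squared error over the shared dates
rmse = float(np.sqrt(np.mean((actual - aligned) ** 2)))
print(round(rmse, 3))
```

If the forecast dates were labelled differently from the observations (for example year-end versus year-start), they would have to be mapped to a common label such as the calendar year before this alignment.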

<a id='agg1'></a>
### Model Aggregation 1: Running Arima and Auto-Arima Models Side-By-Side

In [None]:
## Help and base code comes from: https://www.youtube.com/watch?v=axjgEgBgIY0

In [None]:
df_train.head()

In [None]:
## Creating a states list for proof that it can work on multiple states.

states = ['Georgia', 'Alabama']

for s in states:
    #training data
    train_data = df_train[['time', s]]
    
    #valid data
    valid_data = df_val[['time', s]]
    
    #all data
    all_data = df_rev[['time', s]]
    
    #Set time column to index
    train_data.set_index('time', inplace = True)
    valid_data.set_index('time', inplace = True)
    valid_data.columns = ['Valid Data'] ##To see it on the graph
    all_data.set_index('time', inplace = True)
    
    #Set validation index for 7 years
    index_7_years = pd.date_range(train_data.index[-1], freq = 'AS', periods = 7)
    
    #Future index - 7 years
    future_7_years = pd.date_range(valid_data.index[-1], freq = 'AS', periods = 7)
    
    ## Reset the forecasts from the previous state.
    # Otherwise, if a state fails in a model, the forecast from the previous state would be passed in.
    # Assigning None always succeeds, whereas a chain of `del` statements stops at the first missing name.
    t_fcast1 = None
    t_fcast2 = None
    f_fcast1 = None
    f_fcast2 = None

## Arima Model ##

#Arima Validation Phase 

    try:
        model_arima = ARIMA(train_data, order=(1,1,1),freq='AS-JAN')
        model_arima_fit = model_arima.fit()

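        # forecast() returns the 7 point forecasts directly, so no [0] indexing
        t_fcast1 = model_arima_fit.forecast(7)

        # Use this state's forecast (t_fcast1), not the earlier Georgia-only fcast1
        t_fcast1 = pd.Series(np.asarray(t_fcast1), index=index_7_years)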
        t_fcast1 = t_fcast1.rename('Arima')
    except:
        print(s, 'Arima Train Error')
          

#Arima Future Phase
            
    try:
        model_arima = ARIMA(all_data, order=(1,1,1),freq='AS-JAN')
        model_arima_fit = model_arima.fit()

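        # forecast() returns the 7 point forecasts directly, so no [0] indexing
        f_fcast1 = model_arima_fit.forecast(7)

        # Use this state's forecast (f_fcast1), not the earlier Georgia-only fcast1
        f_fcast1 = pd.Series(np.asarray(f_fcast1), index=future_7_years)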
        f_fcast1 = f_fcast1.rename('Future_Arima')
    except:
        print(s, 'Arima Future Error')
        

##### Auto-Arima #####


#Auto Arima Valid phase
    try:
        # auto_arima has no `freq` argument, so it is not passed here
        auto_arima_model = pm.auto_arima(train_data, seasonal = False, m = 0, error_action='ignore')

        t_fcast2 = auto_arima_model.predict(7)
        t_fcast2 = pd.Series(np.asarray(t_fcast2), index=index_7_years)
        t_fcast2 = t_fcast2.rename('Auto_Arima')
    except:
        print(s, 'Auto Arima Train Error')
        

#Auto Arima Future phase
    try:
        # auto_arima has no `freq` argument, so it is not passed here
        auto_arima_model = pm.auto_arima(all_data, seasonal = False, m = 0)

        f_fcast2 = auto_arima_model.predict(7)
        f_fcast2 = pd.Series(np.asarray(f_fcast2), index=future_7_years)
        f_fcast2 = f_fcast2.rename('Future_Auto_Arima')
    except:
        print(s, 'Auto Arima Future Error')
    
    #Plotting (indented so that one figure is drawn per state inside the loop)
    fig, ax = plt.subplots(figsize =(15,5))
    chart = sns.lineplot(x='time', y = s, data = train_data)
    chart.set_title(s)
    valid_data.plot(ax = ax, color = 'blue', marker = 'o', legend = True)
    #Plotting val
    try:
        t_fcast1.plot(ax = ax, color = 'red', marker = 'o', legend = True)
    except:
        print('')
    try:
        t_fcast2.plot(ax = ax, color = 'green', marker = 'o', legend = True)
    except:
        print('')
        
    #plotting future
    try:
        f_fcast1.plot(ax = ax, color = 'red', marker = 'v', legend = True)
    except:
        print('')
    try:
        f_fcast2.plot(ax = ax, color = 'green', marker = 'v', legend = True)
    except:
        print('');
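
The per-state loop in this section relies on each state's result being reset before the models run. A minimal sketch of that pattern follows, using a hypothetical stand-in model (`fit_and_forecast`) and invented data rather than ARIMA. The result is reset to `None` at the top of every iteration, and failures are recorded instead of printed.

```python
def fit_and_forecast(values, horizon):
    """Hypothetical stand-in for a model: repeats the last value, fails on short input."""
    if len(values) < 3:
        raise ValueError("not enough observations")
    return [values[-1]] * horizon

series_by_state = {"Georgia": [15.1, 14.4, 13.0], "Alabama": [17.0]}  # invented data
forecasts, failures = {}, []

for state, values in series_by_state.items():
    result = None  # reset every iteration so nothing carries over from the previous state
    try:
        result = fit_and_forecast(values, horizon=2)
    except ValueError as err:
        failures.append((state, str(err)))
    forecasts[state] = result

print(forecasts)
print(failures)
```

Catching a specific exception type and keeping the message makes it easier to see afterwards which states failed and why.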

<a id='agg2'></a>
### Model Aggregation 2: Arima, Auto Arima and Prophet 

In [None]:
df_train.head()

### **The code below runs all three models for every state. It runs with several ERRORS, but... it runs nonetheless!**
**It may take several minutes or more to run, depending on your PC specs!**

In [None]:
## Creating a full states list:

states = ["Alabama","Alaska","Arizona","Arkansas","California","Colorado",
          "Connecticut","District of Columbia", "Delaware","Florida","Georgia","Hawaii","Idaho",
          "Illinois", "Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland",
          "Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana",
          "Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York",
          "North Carolina","North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania",
          "Rhode Island","South Carolina","South Dakota","Tennessee","Texas","Utah",
          "Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"]

for s in states:
    #training data
    train_data = df_train[['time', s]]
    
    #valid data
    valid_data = df_val[['time', s]]
    
    #all data
    all_data = df_rev[['time', s]]
    
    #Set time column to index
    train_data.set_index('time', inplace=True)
    valid_data.set_index('time', inplace=True)
    valid_data.columns = ['Valid Data'] ##To see it on the graph
    all_data.set_index('time', inplace=True)
    
    #Set valid index for 7 years
    index_7_years = pd.date_range(train_data.index[-1], freq = 'AS', periods = 7)
    
    #Future index - 7 years
    future_7_years = pd.date_range(valid_data.index[-1], freq = 'AS', periods = 7)
    
    ## Reset the forecasts from the previous state.
    # Otherwise, if a state fails in a model, the forecast from the previous state would be passed in.
    # Assigning None always succeeds, whereas a chain of `del` statements stops at the first missing name.
    t_fcast1 = None
    t_fcast2 = None
    t_fcast3 = None
    
    f_fcast1 = None
    f_fcast2 = None
    f_fcast3 = None

##################################################################################################################
################################################ Arima #########################################################
##################################################################################################################


############################################# Arima Validation Phase ###################################################


    try:
        model_arima = ARIMA(train_data, order=(1, 1, 1),freq = 'AS-JAN');
        model_arima_fit = model_arima.fit();

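        # forecast() returns the 7 point forecasts directly, so no [0] indexing
        t_fcast1 = model_arima_fit.forecast(7)

        # Use this state's forecast (t_fcast1), not the earlier Georgia-only fcast1
        t_fcast1 = pd.Series(np.asarray(t_fcast1), index = index_7_years)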
        t_fcast1 = t_fcast1.rename('Arima')
    except:
        print(s, 'Arima Train Error')
          

################################################  Arima Future Phase ###################################################

            
    try:
        model_arima = ARIMA(all_data, order = (1, 1, 1), freq = 'AS-JAN');
        model_arima_fit = model_arima.fit();

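        # forecast() returns the 7 point forecasts directly, so no [0] indexing
        f_fcast1 = model_arima_fit.forecast(7)

        # Use this state's forecast (f_fcast1), not the earlier Georgia-only fcast1
        f_fcast1 = pd.Series(np.asarray(f_fcast1), index = future_7_years)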
        f_fcast1 = f_fcast1.rename('Future_Arima')
    except:
        print(s, 'Arima Future Error')
        

##################################################################################################################
################################################ Auto Arima ######################################################
##################################################################################################################


################################################  Auto Arima Validation Phase ###################################################

    try:
        auto_arima_model = pm.auto_arima(train_data, seasonal = False, m = 0)

        t_fcast2 = auto_arima_model.predict(7)
        t_fcast2 = pd.Series(np.asarray(t_fcast2), index = index_7_years)
        t_fcast2 = t_fcast2.rename('Auto_Arima')
    except:
        print(s, 'Auto Arima Train Error')
        

################################################  Auto Arima Future Phase ###################################################
   
    try:
        auto_arima_model = pm.auto_arima(all_data, seasonal = False, m = 0)

        f_fcast2 = auto_arima_model.predict(7);
        f_fcast2 = pd.Series(np.asarray(f_fcast2), index = future_7_years)
        f_fcast2 = f_fcast2.rename('Future_Auto_Arima')
    except:
        print(s, 'Auto Arima Future Error')
    
##################################################################################################################
################################################ Prophet #########################################################
##################################################################################################################

    #Expected column names Train
    train_data3 = df_train[['time', s]].copy()
    train_data3.columns = ['ds', 'y']
    train_data3['ds'] = pd.to_datetime(train_data3['ds'])

    #Expected column names All
    all_data3 = df_rev[['time', s]].copy()
    all_data3.columns = ['ds', 'y']
    all_data3['ds'] = pd.to_datetime(all_data3['ds'])
    
################################################  Prophet Validation Phase ###################################################
        
    model_p = Prophet(daily_seasonality=False, weekly_seasonality=False);
    model_p.fit(train_data3);

    ##make validation index 
    val = model_p.make_future_dataframe(7, freq = 'Y')
    
    #make predictions
    t_fcast3 = model_p.predict(val)
    t_fcast3 = pd.Series(t_fcast3['yhat'].values, name = 'Prophet', index = t_fcast3['ds'])

################################################  Prophet Future Phase #######################################################
    
    model_pf = Prophet(daily_seasonality=False, weekly_seasonality=False);
    model_pf.fit(all_data3);

    ##make future index 
    future = model_pf.make_future_dataframe(7, freq = 'Y')
    
    #make predictions
    f_fcast3 = model_pf.predict(future)
    f_fcast3 = pd.Series(f_fcast3['yhat'].values, name = 'Future_Prophet', index = f_fcast3['ds'])


                                                            
##################################################################################################################
################################################ Plotting ######################################################
##################################################################################################################

    fig, ax = plt.subplots(figsize = (15, 5))
    chart = sns.lineplot(x = 'time', y = s, data = train_data)
    chart.set_title(s)
    valid_data.plot(ax = ax, color = 'blue', marker = 'o', legend = True);
#Plotting val
    try:
        t_fcast1.plot(ax = ax, color = 'red', marker = 'o', legend = True);
    except:
        print('')
    try:
        t_fcast2.plot(ax = ax, color = 'green', marker = 'o', legend = True);
    except:
        print('')

#plotting future
    try:
        f_fcast1.plot(ax = ax, color = 'red', marker = 'v', legend = True);
    except:
        print('')
    try:
        f_fcast2.plot(ax = ax, color = 'green', marker = 'v', legend = True);
    except:
        print('')
    
    t_fcast3.plot(ax = ax, color = 'orange', marker = 'o', legend = True);
    f_fcast3.plot(ax = ax, color = 'orange', marker = 'o', legend = True);
#     t_fcast4.plot(ax = ax, color = 'black', marker = 'o', legend = True)
#     f_fcast4.plot(ax = ax, color = 'black', marker = 'o', legend = True)

##################################################################################################################
################################################ Saving into DataFrames ##########################################
##################################################################################################################

################################################  DF Arima #######################################################

    try:
        #Creating df for forecast1
        t_fcast1 = t_fcast1.reset_index()
        t_fcast1.columns = ['Year', 'Arima ForecastValue Validation']
        
        f_fcast1 = f_fcast1.reset_index()
        f_fcast1.columns = ['Year', 'Arima ForecastValue Future']
        
        #Extra Columns
        t_fcast1['Arima ForecastValue Future'] = np.nan
        f_fcast1['Arima ForecastValue Validation'] = np.nan
        
        #Reordering
        t_fcast1 = t_fcast1[['Year', 'Arima ForecastValue Future', 'Arima ForecastValue Validation']]
        
        # Joining them together
        df_fcast1 = pd.concat([t_fcast1, f_fcast1], axis = 0)
        df_fcast1['State'] = s
        df_fcast1['ML Method'] = 'Arima'
#         df_fcast1['Arima MSE'] = t_fcast1_mse
    
    except:
        df_fcast1 = pd.DataFrame({'Year': [np.nan], 'Arima ForecastValue Future': [np.nan], 'Arima ForecastValue Validation': [np.nan], 'State': [s], 'ML Method': ['Arima']})

################################################  DF Auto Arima #######################################################
     
    try:
        #Creating df for forecast2
        t_fcast2 = t_fcast2.reset_index()
        t_fcast2.columns = ['Year', 'Auto Arima ForecastValue Validation']
        
        f_fcast2 = f_fcast2.reset_index()
        f_fcast2.columns = ['Year', 'Auto Arima ForecastValue Future']
        
        #Extra Columns
        t_fcast2['Auto Arima ForecastValue Future'] = np.nan
        f_fcast2['Auto Arima ForecastValue Validation'] = np.nan
        
        #Reordering
        t_fcast2 = t_fcast2[['Year', 'Auto Arima ForecastValue Future', 'Auto Arima ForecastValue Validation']]
        
        # Joining them together
        df_fcast2 = pd.concat([t_fcast2, f_fcast2], axis = 0)
        df_fcast2['State'] = s
        df_fcast2['ML Method'] = 'Auto Arima'
#         df_fcast2['Auto Arima MSE'] = t_fcast2_mse
    
    except:
        df_fcast2 = pd.DataFrame({'Year': [np.nan], 'Auto Arima ForecastValue Future': [np.nan], 'Auto Arima ForecastValue Validation': [np.nan], 'State': [s], 'ML Method': ['Auto Arima']})

################################################  DF Prophet #######################################################
     
    try:
        #Creating df for forecast3
        t_fcast3 = t_fcast3.reset_index()
        t_fcast3.columns = ['Year', 'Prophet ForecastValue Validation']
    
        f_fcast3 = f_fcast3.reset_index()
        f_fcast3.columns = ['Year', 'Prophet ForecastValue Future']
        
        #Extra Columns
        t_fcast3['Prophet ForecastValue Future'] = np.nan
        f_fcast3['Prophet ForecastValue Validation'] = np.nan
        
        #Reordering
        t_fcast3 = t_fcast3[['Year', 'Prophet ForecastValue Future', 'Prophet ForecastValue Validation']]
        
        # Joining them together
        df_fcast3 = pd.concat([t_fcast3, f_fcast3], axis = 0)
        df_fcast3['State'] = s
        df_fcast3['ML Method'] = 'Prophet'
#         df_fcast3['Prophet MSE'] = t_fcast3_mse

    except:
        # Fallback must fill df_fcast3 (the Prophet frame), not df_fcast2
        df_fcast3 = pd.DataFrame({'Year': [np.nan], 'Prophet ForecastValue Future': [np.nan], 'Prophet ForecastValue Validation': [np.nan], 'State': [s], 'ML Method': ['Prophet']})
        
################################################ Aggregating and Joining #######################################################

#     df_fcast1 = df_fcast1.groupby(['Year', 'State', 'ML Method'], as_index = False).agg({'Arima ForecastValue Future': 'sum', 'Arima ForecastValue Validation': 'sum'})
#     df_fcast2 = df_fcast2.groupby(['Year', 'State', 'ML Method'], as_index = False).agg({'Auto Arima ForecastValue Future': 'sum', 'Auto Arima ForecastValue Validation': 'sum'})
    df_fcast3 = df_fcast3.groupby(['Year', 'State', 'ML Method'], as_index = False).agg({'Prophet ForecastValue Future': 'sum', 'Prophet ForecastValue Validation': 'sum'})
        
    all_forecasts = df_fcast3.copy()
#     all_forecasts = df_fcast3.merge(df_fcast2[['Year', 'Auto Arima ForecastValue Future', 'Auto Arima ForecastValue Validation']], how ='left', on = 'Year').copy()
#     all_forecasts = pd.DataFrame(all_forecasts.merge(df_fcast1[['Year', 'Arima ForecastValue Future', 'Arima ForecastValue Validation']], how ='left', on = 'Year'))
        
        #Save predictions in df. First time this will fail and just give all_forecasts, after that will concat for each state
    try: 
        final_forecasts = pd.concat([final_forecasts, all_forecasts], ignore_index = True)
    except:
        final_forecasts = all_forecasts;

In [None]:
final_forecasts.shape
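
The aggregation step sums the forecast columns within each group. In pandas, summing a group that contains only `NaN` returns 0 by default, which can make a missing forecast look like a rate of zero. The sketch below, on invented rows with shortened hypothetical column names, contrasts the default with `min_count=1`, which keeps all-missing groups as `NaN`.

```python
import numpy as np
import pandas as pd

# Invented rows: 2020 has only a 'Future' value, 2018 has only a 'Validation' value
toy = pd.DataFrame({
    "Year": ["2018", "2018", "2020"],
    "Future": [np.nan, np.nan, 12.0],
    "Validation": [13.5, np.nan, np.nan],
})

# Default sum: groups with no valid values become 0.0
summed = toy.groupby("Year", as_index=False).sum()

# min_count=1: a group needs at least one valid value, otherwise the result stays NaN
summed_nan = toy.groupby("Year", as_index=False).sum(min_count=1)

print(summed)
print(summed_nan)
```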

In [None]:
#final_forecasts.to_csv('./data/time_series/final_v1.csv')
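
Per-state results can also be gathered in a list and concatenated once after the loop. This is a sketch of that alternative with invented forecast values and a shortened state list, not the code used to produce the exported file.

```python
import pandas as pd

states = ["Georgia", "Alabama"]  # shortened list for illustration
frames = []

for s in states:
    # Stand-in for one state's forecast table (values are invented)
    state_df = pd.DataFrame({"Year": [2021, 2022], "Forecast": [12.0, 11.8]})
    state_df["State"] = s
    frames.append(state_df)

# One concat at the end; rerunning the cell rebuilds the result from scratch
final = pd.concat(frames, ignore_index=True)
print(final.shape)
```

Because `frames` is recreated on every run, rerunning the cell cannot append a second copy of each state to an existing dataframe.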

<a id='merge'></a>
## 4. Merging Final Forecasts with Original Data

**Formatting completed in the initial run.  Below is commented out for data preservation**

Here we merge the final forecasts with the original data. Our original dataframe is in the wrong format for this (one column per state), so we import a new one that stores the state name alongside each historical value.
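
The stacking described here works when both tables have the same columns and the same date type. A minimal sketch on invented rows follows; the column names mirror the ones used in this notebook, but the values and the single-row tables are assumptions for illustration.

```python
import pandas as pd

# Invented historical row and forecast row with different columns
hist = pd.DataFrame({"Year": ["2018-01-01"], "State": ["Georgia"], "fi": [13.0]})
fc = pd.DataFrame({"Year": ["2021-01-01"], "State": ["Georgia"],
                   "Prophet ForecastValue Future": [12.0]})

# Give both frames the same columns in the same order, filling missing ones with NaN
cols = ["Year", "State", "fi", "Prophet ForecastValue Future"]
hist = hist.reindex(columns=cols)
fc = fc.reindex(columns=cols)

# Make the date column the same type in both frames before stacking
hist["Year"] = pd.to_datetime(hist["Year"])
fc["Year"] = pd.to_datetime(fc["Year"])

combined = pd.concat([hist, fc], ignore_index=True)
combined = combined.rename(columns={"fi": "Food Insecurity Rate"})
print(combined)
```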

In [None]:
#Reading in our original data in the correct format and final_forecasts (which we exported above)
df_data = pd.read_csv('./data/time_series/hor_states.csv')

final_forecasts = pd.read_csv('./data/time_series/final_v1.csv')

In [None]:
df_data.head()

In [None]:
# Formatting already applied. Commented out to preserve data

# Setting consistent columns
df_data['Prophet ForecastValue Future'] = np.nan
df_data['Prophet ForecastValue Validation'] = np.nan

# df_data['Year'] = pd.to_datetime(df_data['Year'])
#Dropping Ml Method as it is not necessary
del final_forecasts['ML Method']

#adding this column to keep consistency across both before union
final_forecasts['fi'] = np.nan

In [None]:
# Formatting already applied. Commented out to preserve data
#Converting both to datetime so types are the same for the concat
df_data['Year'] = pd.to_datetime(df_data['Year'])
final_forecasts['Year'] = pd.to_datetime(final_forecasts['Year'])

In [None]:
df_data.head(1)

In [None]:
final_forecasts.head(1)

In [None]:
# Formatting already applied. Commented out to preserve data
# Reordering for concat

final_forecasts = final_forecasts[['Unnamed: 0', 'Year', 'State', 'fi', 'Prophet ForecastValue Future', 'Prophet ForecastValue Validation']]

In [None]:
# Formatting already applied. Commented out to preserve data
# Concatenating, then dropping and renaming columns
output_df = pd.concat([df_data, final_forecasts], axis = 0)

del output_df['Unnamed: 0']

output_df.rename(columns = {'fi': 'Food Insecurity Rate'}, inplace = True)

In [None]:
# Exporting to CSV
output_df.to_csv('./data/time_series/output_df.csv')