# General overview

-----------------------------------------------------------------------------------------------------------------
This Jupyter notebook automates generating and visualizing forecasts of macroeconomic indicators from historical time series retrieved through the World Bank's Indicators API. Each forecast is produced by an ARIMA, SARIMA or SARIMAX model, depending on which model the Auto ARIMA library (`pmdarima`) suggests.
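For reference, the three model families can be written in standard backshift notation, where $B y_t = y_{t-1}$, $c$ is a constant and $\varepsilon_t$ is white noise. This is the textbook formulation, not something defined by this notebook. A non-seasonal ARIMA$(p, d, q)$ model is

$$
\phi(B)\,(1 - B)^{d}\,y_t = c + \theta(B)\,\varepsilon_t,
\qquad \phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}.
$$

SARIMA$(p, d, q)(P, D, Q)_m$ adds seasonal polynomials in $B^{m}$ and seasonal differencing of period $m$:

$$
\phi(B)\,\Phi(B^{m})\,(1 - B)^{d}\,(1 - B^{m})^{D}\,y_t = c + \theta(B)\,\Theta(B^{m})\,\varepsilon_t .
$$

SARIMAX additionally uses exogenous regressors $x_t$; one common formulation (the one used by statsmodels' `SARIMAX`) is a regression with SARIMA errors, $y_t = \beta^{\top} x_t + u_t$, where $u_t$ follows the SARIMA equation above.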

# Import libraries and set display options

In [6]:
import json
import numpy as np
import pandas as pd
from pmdarima.arima import auto_arima
from retrieval_funcs import create_world_bank_api_url_string, retrieve_url_content
from preprocessing_funcs import convert_bytes_to_unicode, extract_dates_and_values_from_json


# Set pandas data display options
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Script parameters  

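Provide the country code and the World Bank indicator code for which the forecast should be generated. The values below select Afghanistan (`afg`) and GDP in current local currency units (`NY.GDP.MKTP.CN`).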

In [3]:
COUNTRY_CODE = 'afg'
INDICATOR_CODE = 'NY.GDP.MKTP.CN'
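`create_world_bank_api_url_string` comes from the `retrieval_funcs` module, which is not included here. The sketch below is a hypothetical equivalent, assuming the v2 endpoint `https://api.worldbank.org/v2/country/{country}/indicator/{indicator}`; it shows how the two codes above end up in the request URL. The name `build_indicator_url` and the `per_page` value are illustrative choices, not the notebook's.

```python
from urllib.parse import urlencode

# Base URL of the World Bank Indicators API, version 2
WORLD_BANK_API_BASE = "https://api.worldbank.org/v2"


def build_indicator_url(country_code: str, indicator_code: str, per_page: int = 1000) -> str:
    """Hypothetical stand-in for create_world_bank_api_url_string.

    The API paginates its results (50 records per page by default), so per_page
    is raised here so that one request can return a long annual history.
    """
    query = urlencode({"format": "json", "per_page": per_page})
    return f"{WORLD_BANK_API_BASE}/country/{country_code}/indicator/{indicator_code}?{query}"


print(build_indicator_url("afg", "NY.GDP.MKTP.CN"))
```

Requesting JSON explicitly matters because the API returns XML unless `format=json` is given.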

# Retrieve and preprocess data 

In [4]:
# Create query string used to retrieve the data using World Bank's Indicators API
query_string = create_world_bank_api_url_string(COUNTRY_CODE, INDICATOR_CODE)

# Retrieve data 
data = retrieve_url_content(query_string)

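# Convert retrieved data represented as bytes object to a JSON string
json_string = convert_bytes_to_unicode(data)
# Parse the JSON string; a new name is used so the json module is not shadowed
json_data = json.loads(json_string)

# Extract dates and values from JSON
dates, values = extract_dates_and_values_from_json(json_data)

# Create DataFrame object using dates and values
time_series_df = pd.DataFrame({"Date": dates, "Value": values})

# Change index to Date column and sort chronologically
# (the World Bank API lists the most recent observation first)
time_series_df.set_index('Date', inplace=True)
time_series_df.sort_index(inplace=True)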
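The helpers from `preprocessing_funcs` are not shown either. Assuming the usual v2 response shape, a two-element list of `[metadata, records]` where each record carries a `"date"` string and a numeric or `null` `"value"`, the parsing step could be sketched as follows. `world_bank_json_to_series` is a hypothetical name; it folds the decoding, extraction and sorting into one function.

```python
import json

import pandas as pd


def world_bank_json_to_series(json_text: str) -> pd.Series:
    """Turn a World Bank Indicators API v2 JSON response into a sorted Series.

    Assumes the response looks like [metadata, [records]]; an error response
    or an empty result has no usable second element.
    """
    payload = json.loads(json_text)
    if len(payload) < 2 or payload[1] is None:
        raise ValueError("response contains no data records")
    records = payload[1]
    series = pd.Series(
        [record["value"] for record in records],  # JSON null becomes NaN
        index=pd.Index([record["date"] for record in records], name="Date"),
        name="Value",
        dtype=float,
    )
    # Records arrive newest first; sort so the model sees time in order
    return series.sort_index()
```

Keeping `null` values as `NaN` (rather than dropping them) preserves the gaps, which the missing-values check below relies on.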

### Missing values check

Next, check whether the time series has any missing values. If it does, store this information so that a Kalman smoother can later be used with the model suggested by the Auto ARIMA library. Letting Auto ARIMA pick the model should make this notebook applicable to different time series.

In [5]:
# Flag recording whether the series contains any missing values
HAS_MISSING = bool(time_series_df.isnull().values.any())

# Forecasting with Auto ARIMA

The Auto ARIMA library speeds up the forecasting process, because preparing the data and tuning the parameters of ARIMA, SARIMA and SARIMAX models by hand is very time-consuming. Auto ARIMA makes forecast preparation much simpler, because it:

<ul>
    <li> Tests the time series for stationarity (with the KPSS test by default)</li>
    <li> Determines the <b>d</b> value, i.e. the number of times the differencing operation has to be applied to make the time series stationary</li>
    <li> Selects the model orders (<b>p</b>, <b>q</b> and, for seasonal models, <b>P</b>, <b>Q</b>) automatically, which saves us from reading them off the ACF and PACF plots</li>
    <li> Chooses, among the candidate models, the one with the lowest information criterion (AIC by default)</li>
</ul>


In [9]:
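# Stepwise search over ARIMA orders; the max_* parameters bound the search space
step_wise_model_selection = auto_arima(time_series_df['Value'].to_numpy(),
                                       start_p=1,               # initial AR order
                                       start_q=1,               # initial MA order
                                       max_p=6,                 # upper bound on the AR order p
                                       max_q=6,                 # upper bound on the MA order q
                                       max_d=3,                 # upper bound on the differencing order d
                                       max_P=3,                 # seasonal bounds: only used when seasonal=True and m > 1
                                       max_Q=3,
                                       trace=True,              # print each candidate model and its score
                                       error_action='ignore',   # skip orders that fail to fit
                                       suppress_warnings=True,
                                       stepwise=True            # stepwise search instead of a full grid search
                                      )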