
# `Final Exam`, `Fall 2022`: `Time Series Analysis of US Inflation`
_Version 1.0.1_

Change history:   
1.0.1 - bugfix ex2 test code.  
1.0 - initial release  

*All of the header information is important. Please read it.*

**Topics, number of exercises:** This problem builds on your knowledge of Pandas, Numpy, basic Python data structures, and implementing mathematical functions. It has **9** exercises, numbered 0 to **8**. There are **18** available points. However, to earn 100% the threshold is **13** points. (Therefore, once you hit **13** points, you can stop. There is no extra credit for exceeding this threshold.)

**Exercise ordering:** Each exercise builds logically on previous exercises, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. Use this to your advantage, as the exercises are **not** necessarily ordered in terms of difficulty. Higher point values generally indicate more difficult exercises.

**Demo cells:** Code cells starting with the comment `### define demo inputs` load results from prior exercises applied to the entire data set and use those to build demo inputs. These must be run for subsequent demos to work properly, but they do not affect the test cells. The data loaded in these cells may be rather large (at least in terms of human readability). You are free to print or otherwise use Python to explore them, but we did not print them in the starter code.

**Debugging your code:** Right before each exercise test cell, there is a block of text explaining the variables available to you for debugging. You may use these to test your code and can print/display them as needed. Be careful when printing large objects: you may want to print only the head or a few rows at a time.

**Exercise point breakdown:**

- Exercise 0: **1** point(s)
- Exercise 1: **1** point(s)
- Exercise 2: **2** point(s)
- Exercise 3: **2** point(s)
- Exercise 4: **2** point(s)
- Exercise 5: **2** point(s)
- Exercise 6: **2** point(s)
- Exercise 7: **3** point(s)
- Exercise 8: **3** point(s)

**Final reminders:**

- Submit after **every exercise**
- Review the generated grade report after you submit to see what errors were returned
- Stay calm, skip problems as needed, and take short breaks at your leisure


## Background: Inflation

Inflation is an increase in overall prices in an economy over time. Deflation is "negative inflation", a decrease in prices over time. A common way to measure inflation is to first calculate the consumer price index (CPI), the price of a representative basket of goods, and then compute the relative change in CPI over a time interval. For example, if the CPI is 100 at one point in time and 105 one year later, then the inflation rate over that year is 5%.
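
The 100-to-105 example can be checked in a few lines of Python. This is only an illustrative sketch: the helper name `inflation_rate` is hypothetical and is not part of the exam's starter code.

```python
def inflation_rate(cpi_start, cpi_end):
    """Relative change in CPI between two points in time (hypothetical helper)."""
    return (cpi_end - cpi_start) / cpi_start

rate = inflation_rate(100.0, 105.0)
print(f"{rate:.1%}")  # 5.0%
```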

## Data

We have obtained the US CPI for each month going back to the early 20th century from the Organisation for Economic Co-operation and Development (OECD).

## Analysis goals
- Use the CPI data to calculate the inflation rate at any point in history over an arbitrary number of months.
- Attempt to predict the inflation rate in future months based on the inflation rate in previous months using exponential smoothing models.
    - Evaluate how "good" the predictions are.
    - Tune the models to pick the best parameters.
    - Make inferences based on the selected parameters.
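
As a rough illustration of the "evaluate" and "tune" goals, the sketch below scores one-step-ahead forecasts with mean squared error (MSE) and picks the best `alpha` from a small grid. The choice of MSE, the grid, the toy series, and the helper names `one_step_forecasts` and `mse` are all assumptions made for this example; the exam may define its own metric and search procedure.

```python
import numpy as np

def one_step_forecasts(ts, alpha):
    """Forecast each ts[t] (t >= 1) with the smoothed value from step t-1."""
    s = ts[0]                      # initial guess is the first observation
    preds = []
    for x in ts[1:]:
        preds.append(s)            # prediction is made before seeing x
        s = alpha * x + (1 - alpha) * s
    return np.array(preds)

def mse(obs, preds):
    """Mean squared error between observations and predictions."""
    return float(np.mean((np.asarray(obs) - np.asarray(preds)) ** 2))

series = np.array([100., 105., 120., 110., 115.])   # toy data, not real CPI
alphas = np.linspace(0.0, 1.0, 11)                  # assumed search grid
scores = {float(a): mse(series[1:], one_step_forecasts(series, a)) for a in alphas}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

A finer grid or a different error metric would change the selected value, so the result of a search like this is only as meaningful as those choices.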

In [2]:
# uncomment in Google Colab
# !python --version
!pip install dill
import dill as pickle
!pip install cryptography

Collecting dill
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Installing collected packages: dill
Successfully installed dill-0.3.7


In [3]:
### Global Imports
import pandas as pd
import numpy as np
import pickle

# Some functionality needed by the notebook and demo cells:
from pprint import pprint, pformat
import math

# === Messages === #

def status_msg(s, verbose=True, **kwargs):
    if verbose:
        print(s, **kwargs)

# === Input/output === #

# def load_df_from_file(basename, dirname='resource/asnlib/publicdata/', abort_on_error=False, verbose=False):
def load_df_from_file(basename, dirname='', abort_on_error=False, verbose=False):
    from os.path import isfile
    from dill import loads
    from pandas import DataFrame
    df = DataFrame()
    filename = f"{dirname}{basename}"
    status_msg(f"Loading `DataFrame` from '{filename}'...", verbose=verbose)
    if isfile(filename):
        try:
            with open(filename, "rb") as fp:
                df = loads(fp.read())
            status_msg(f"  ==> Done!", verbose=verbose)
        except:
            if abort_on_error:
                raise
            else:
                df = DataFrame()
                status_msg(f"  ==> An error occurred.", verbose=verbose)
    return df

# def load_obj_from_file(basename, dirname='resource/asnlib/publicdata/', abort_on_error=False, verbose=False):
def load_obj_from_file(basename, dirname='', abort_on_error=False, verbose=False):
    from os.path import isfile
    from dill import loads
    from pandas import DataFrame
    filename = f"{dirname}{basename}"
    status_msg(f"Loading object from '{filename}'...", verbose=verbose)
    if isfile(filename):
        try:
            with open(filename, "rb") as fp:
                df = loads(fp.read())
            status_msg(f"  ==> Done! Type: `{type(df)}`", verbose=verbose)
        except:
            if abort_on_error:
                raise
            else:
                df = DataFrame()
                status_msg(f"  ==> An error occurred.", verbose=verbose)
    else:
        df = None
    return df

In [5]:
# import files
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tc_4
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tc_5
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tc_6
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/cpi_urban_all.csv

!mkdir tester_fw
%cd tester_fw

!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tester_fw/__init__.py
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tester_fw/test_utils.py
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tester_fw/testers.py

%cd ..

--2023-11-19 19:17:17--  https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tc_4
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 79116 (77K) [text/plain]
Saving to: ‘tc_4’


2023-11-19 19:17:17 (2.66 MB/s) - ‘tc_4’ saved [79116/79116]

--2023-11-19 19:17:17--  https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_456/main/tc_5
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 224568 (219K) [text/plain]
Saving to: ‘tc_5’


2023-11-19 19:17:18 (4.61 MB/s) - ‘tc_5’ saved [224568/224568]

--2023-11-19 19:

## Exercise 0 - (**1** Points):
To start things off we will load the CPI data into the notebook environment. You do not need to modify the cell below, just execute the test and collect your free point!

This cell will also display the first few rows and last few rows of the CPI data we just loaded.

In [6]:
cpi_all_df = pd.read_csv('cpi_urban_all.csv')
display(cpi_all_df.head())
display(cpi_all_df.tail())

Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,HALF1,HALF2
0,1913,9.8,9.8,9.8,9.8,9.7,9.8,9.9,9.9,10.0,10.0,10.1,10.0,,
1,1914,10.0,9.9,9.9,9.8,9.9,9.9,10.0,10.2,10.2,10.1,10.2,10.1,,
2,1915,10.1,10.0,9.9,10.0,10.1,10.1,10.1,10.1,10.1,10.2,10.3,10.3,,
3,1916,10.4,10.4,10.5,10.6,10.7,10.8,10.8,10.9,11.1,11.3,11.5,11.6,,
4,1917,11.7,12.0,12.0,12.6,12.8,13.0,12.8,13.0,13.3,13.5,13.5,13.7,,


Unnamed: 0,Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,HALF1,HALF2
105,2018,247.867,248.991,249.554,250.546,251.588,251.989,252.006,252.146,252.439,252.885,252.038,251.233,250.089,252.125
106,2019,251.712,252.776,254.202,255.548,256.092,256.143,256.571,256.558,256.759,257.346,257.208,256.974,254.412,256.903
107,2020,257.971,258.678,258.115,256.389,256.394,257.797,259.101,259.918,260.28,260.388,260.229,260.474,257.557,260.065
108,2021,261.582,263.014,264.877,267.054,269.195,271.696,273.003,273.567,274.31,276.589,277.948,278.802,266.236,275.703
109,2022,281.148,283.716,287.504,289.109,292.296,296.311,296.276,296.171,296.808,298.012,,,288.347,


<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 0. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [7]:
### test_cell_ex0
assert 'cpi_all_df' in globals()
assert isinstance(cpi_all_df, pd.DataFrame)
print('Passed! Please submit.')

Passed! Please submit.


## Exercise 4 - (**2** Points):
We have the CPI data re-organized into a time series. We are concerned with inflation, which is the multiplicative change in CPI over some time interval. We will need to transform the data a final time to get an inflation time series.

Define the function `multiplicative_change(ts, lag)`. The input `ts` is a 1-D array of floats representing monthly observations of the CPI. The input `lag` is an integer indicating the time interval we want to measure inflation over in months.

Your function should implement the following formula to calculate $\hat{x}$ and return the result as a 1-D array. In the mathematical notation $x$ is `ts`, and $\ell$ is `lag`:
$$\hat{x_i} = \frac{x_i - x_{i-\ell}}{x_{i-\ell}}$$

Note that by this definition the first $\ell$ (or `lag`) entries in $\hat{x}$ are undefined. The output will start with the first defined value.
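
One possible way to express this formula with NumPy slicing is sketched below. It is not presented as the reference solution: the name `multiplicative_change_sketch` is hypothetical, and the sketch assumes `lag` is an integer with `1 <= lag < len(ts)`.

```python
import numpy as np

def multiplicative_change_sketch(ts, lag):
    """(x_i - x_{i-lag}) / x_{i-lag} for every i >= lag. Assumes lag >= 1."""
    ts = np.asarray(ts, dtype=float)
    current = ts[lag:]     # x_i for i = lag, ..., n-1
    previous = ts[:-lag]   # x_{i-lag}, aligned element-wise with `current`
    return (current - previous) / previous

example_ts = np.array([100., 150., 180., 216., 324.])
print(multiplicative_change_sketch(example_ts, 1))
print(multiplicative_change_sketch(example_ts, 2))
```

Slicing avoids an explicit loop and leaves the input array unmodified, since both slices are only read.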

In [10]:
### Define demo inputs
demo_ts_ex4 = np.array([100., 150., 180., 216., 324.])

<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
```
lag of 1
[0.5 0.2 0.2 0.5]

lag of 2
[0.8  0.44 0.8 ]
```
<!-- Include any shout outs here -->

In [11]:
### Exercise 4 solution
def multiplicative_change(ts, lag):
    ### YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell parses; replace with your solution

### demo function call
demo_output_ex4_lag_1 = multiplicative_change(demo_ts_ex4, 1)
demo_output_ex4_lag_2 = multiplicative_change(demo_ts_ex4, 2)
print('lag of 1')
print(demo_output_ex4_lag_1)
print()
print('lag of 2')
print(demo_output_ex4_lag_2)

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 4. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex4
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_4',
    'func': multiplicative_change, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'ts':{
            'dtype':'np.ndarray', # data type of param.
            'check_modified':True,
        },
        'lag':{
            'dtype':'int', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

## On time-series analysis
The following two exercises focus on implementing two time-series analysis techniques: simple and double exponential smoothing. The high-level idea for simple smoothing is that we make an initial guess, compare it with the observation, and use that information to improve our guess for the following observation. For double smoothing, we do this on two levels, adjusting successive guesses for the observations themselves as well as for the difference between observations, in an attempt to capture any trends in our model.

## Exercise 5 - (**2** Points):
This is the formula for our application of simple exponential smoothing. In the math notation $x_t$ is `ts[t]`, and $\hat{x_t}$ is our prediction for $x_t$:   

Initial conditions  
- $s_0 = x_0$.                This is our initial guess.  
- $\hat{x_0}$ is undefined.   We can't call the first guess a prediction since it's actually the first observation.  

For $t > 0$  
- $s_t = \alpha(x_{t}) + (1-\alpha)s_{t-1}$  
- $\hat{x_t} = s_{t-1}$

When $\alpha$ is closer to 1 the model is more sensitive to recent observations. When $\alpha$ is closer to 0 the model is more sensitive to past observations.
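
To see why, it may help to unroll the recurrence. Substituting the update for $s_{t-1}$ into itself repeatedly, and using $s_0 = x_0$, gives a weighted sum of all observations so far:

$$s_t = \alpha \sum_{k=0}^{t-1} (1-\alpha)^k \, x_{t-k} + (1-\alpha)^t \, x_0$$

The weight on an observation $k$ steps in the past shrinks geometrically as $(1-\alpha)^k$. With $\alpha$ near 1 these weights vanish quickly, so $s_t \approx x_t$; with $\alpha$ near 0 they decay slowly and the initial value $x_0$ dominates.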

Define the function `simple_exp_smoothing(ts, alpha)`. The input `ts` will be a 1-D numerical array (the vector $x$ from the formula above), and the input `alpha` (the scalar $\alpha$ from the formula above) will be a floating point number between 0 and 1.

Your function should implement the formula above and return the vector $\hat{x}$ as a 1-D array.   
- Since $\hat{x}_0$ is undefined, the first element in your result should be `np.nan`.  
- Since $\hat{x}_n = s_{n-1}$ is well-defined for $x = (x_0, \dots, x_{n-1}) \in \mathbb{R}^n$, your result should have exactly one more element than the input `ts`.
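
The recurrence can be sketched as a single loop that records each prediction before updating the smoothed value. This is one possible reading of the formula, not the official solution; the name `simple_exp_smoothing_sketch` is hypothetical, and returning a single `np.nan` for an empty input is an assumption.

```python
import numpy as np

def simple_exp_smoothing_sketch(ts, alpha):
    """Return x_hat of length len(ts) + 1, with x_hat[0] = nan."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    x_hat = np.full(n + 1, np.nan)   # x_hat[0] stays undefined
    if n == 0:
        return x_hat                 # assumption: nothing to predict from
    s = ts[0]                        # s_0 = x_0
    for t in range(1, n + 1):
        x_hat[t] = s                 # x_hat_t = s_{t-1}
        if t < n:                    # no observation x_n exists to update with
            s = alpha * ts[t] + (1 - alpha) * s
    return x_hat

print(simple_exp_smoothing_sketch([100., 105., 120., 110., 115.], 0.5))
```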

In [None]:
### Define demo inputs

demo_ts_ex5 = np.array([100., 105., 120., 110., 115.])

<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
```
[ nan 100. 105. 120. 110. 115.]
[ nan 100. 100. 100. 100. 100.]
[ nan 100. 102.5 111.25 110.625 112.8125]
```
The demo below will run your solution 3 times with `alpha` values of `1`, `0`, and `0.5`.

In [None]:
### Exercise 5 solution
def simple_exp_smoothing(ts, alpha):
    ### YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell parses; replace with your solution

### demo function call
print(simple_exp_smoothing(demo_ts_ex5, 1))
print(simple_exp_smoothing(demo_ts_ex5, 0))
print(simple_exp_smoothing(demo_ts_ex5, 0.5))

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 5. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex5
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_5',
    'func': simple_exp_smoothing, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'ts':{
            'dtype':'np.ndarray', # data type of param.
            'check_modified':True,
        },
        'alpha':{
            'dtype':'float', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'np.ndarray',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

## Exercise 6 - (**2** Points):
Now we will implement double exponential smoothing. For our implementation the formula is as follows:

- $s_0 = x_0$  
- $b_0 = 0$  
- $\hat{x}_0$ is undefined  

For $t > 0$:  
- $s_t = \alpha x_{t} + (1-\alpha)(s_{t-1} + b_{t-1})$  
- $b_t = \beta (s_t - s_{t-1}) + (1-\beta)b_{t-1}$  
- $\hat{x}_{t} = s_{t-1} + b_{t-1}$  

Define the function `double_exp_smoothing(ts, alpha, beta)`. The input `ts` will be a 1-D numerical array (the vector $x$ from the formula above), and the inputs `alpha` and `beta` (the scalars $\alpha$ and $\beta$ from the formula above) will be floating point numbers between 0 and 1.

Your function should implement the formula above and return the vector $\hat{x}$ as a 1-D array.   
- Since $\hat{x}_0$ is undefined, the first element in your result should be `np.nan`.  
- Since $\hat{x}_n = s_{n-1} + b_{n-1}$ is well-defined for $x = (x_0, \dots, x_{n-1}) \in \mathbb{R}^n$, your result should have exactly one more element than the input `ts`.
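
A loop-based sketch of these updates follows. The one detail that is easy to get wrong is that $b_t$ needs both $s_t$ and $s_{t-1}$, so the new level is held in a temporary before the old one is overwritten. The name `double_exp_smoothing_sketch` is hypothetical, and the handling of an empty input is an assumption.

```python
import numpy as np

def double_exp_smoothing_sketch(ts, alpha, beta):
    """Return x_hat of length len(ts) + 1, with x_hat[0] = nan."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts)
    x_hat = np.full(n + 1, np.nan)   # x_hat[0] stays undefined
    if n == 0:
        return x_hat                 # assumption: nothing to predict from
    s, b = ts[0], 0.0                # s_0 = x_0, b_0 = 0
    for t in range(1, n + 1):
        x_hat[t] = s + b             # x_hat_t = s_{t-1} + b_{t-1}
        if t < n:
            s_new = alpha * ts[t] + (1 - alpha) * (s + b)
            b = beta * (s_new - s) + (1 - beta) * b   # uses old s and new s
            s = s_new
    return x_hat

print(double_exp_smoothing_sketch([100., 105., 120., 110., 115.], alpha=0.5, beta=1))
```

With `beta=0` the trend term stays at zero, so the output reduces to simple exponential smoothing with the same `alpha`.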

In [None]:
### Define demo inputs

demo_ts_ex6 = np.array([100., 105., 120., 110., 115.])

<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
```
[nan 100. 102.5   111.25     110.625      112.8125]
[nan 100. 105.    122.5      120.         118.75]
[nan 100. 103.75  117.1875   117.109375   119.04296875]
[nan 100. 101.875 109.296875 112.45117188 116.38549805]
```
The demo below performs 4 runs with your solution, each with different `alpha` or `beta` parameters.

In [None]:
### Exercise 6 solution
def double_exp_smoothing(ts, alpha, beta):
    ### YOUR CODE HERE
    raise NotImplementedError  # placeholder so the cell parses; replace with your solution

print(double_exp_smoothing(demo_ts_ex6, alpha=0.5, beta=0))
print(double_exp_smoothing(demo_ts_ex6, alpha=0.5, beta=1))
print(double_exp_smoothing(demo_ts_ex6, alpha=0.5, beta=0.5))
print(double_exp_smoothing(demo_ts_ex6, alpha=0.25, beta=0.5))

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 6. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex6
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_6',
    'func': double_exp_smoothing, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'ts':{
            'dtype':'np.ndarray', # data type of param.
            'check_modified':True,
        },
        'alpha':{
            'dtype':'float', # data type of param.
            'check_modified':True,
        },
        'beta':{
            'dtype':'float', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'np.ndarray',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')