# Module 12 Assignment

A few things you should keep in mind when working on assignments:

1. Make sure you fill in every place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then restart the kernel and run all cells (_Restart & Run all_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from nose.tools import assert_equal, assert_almost_equal, assert_is_instance
import helper

import pandas as pd
import seaborn as sns

# These two lines suppress warnings that sometimes
# occur when making visualizations
import warnings
warnings.filterwarnings('ignore')

# Set global figure properties
import matplotlib as mpl
mpl.rcParams.update({'axes.titlesize' : 20,
                     'axes.labelsize' : 18,
                     'legend.fontsize': 16})

# Set default Seaborn plotting style
sns.set_style('white')

from datetime import datetime
import statsmodels.api as sm

Let's load the stock data and convert its index to datetime.

In [None]:
df = pd.read_csv('./dow_jones_index.data', index_col=1, usecols=(1,2,3,4,5,6,8))
df.index = pd.to_datetime(df.index)

# Problem 1: Plot time series data

Write a function called $\texttt{time_series_plotter}$ which takes in a stock ticker name and a pandas data frame, in that order, and plots the time series of the stock's high price. The function should return a matplotlib axes object. Furthermore, title the plot "Stock price", label the x-axis "Date", and label the y-axis "Price high".
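As a rough illustration of the general pattern (filter rows for one ticker, plot against the datetime index, label the axes, return the axes), here is a minimal sketch on a made-up frame. The column names `stock` and `high`, the ticker `XYZ`, the prices, and the function name `plot_high_sketch` are all assumptions for illustration; check the columns of the actual data frame before adapting it.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: a datetime index, a ticker column and a numeric price column.
demo_dates = pd.date_range('2011-01-07', periods=4, freq='W-FRI')
demo_frame = pd.DataFrame({'stock': ['XYZ'] * 4,
                           'high': [10.0, 10.5, 10.2, 11.0]},
                          index=demo_dates)

def plot_high_sketch(stock_name, data):
    # Keep only the rows for the requested ticker (column names are assumed).
    prices = data.loc[data['stock'] == stock_name, 'high']

    # Draw the series on a fresh axes object and label it.
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.plot(prices.index, prices.values)
    ax.set(title='Stock price', xlabel='Date', ylabel='Price high')
    return ax

demo_plot = plot_high_sketch('XYZ', demo_frame)
```

Returning the axes object, rather than the figure, is what lets a caller inspect the title, labels, and plotted line data afterwards.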

In [None]:
def time_series_plotter(stock_name,data):
    '''
    
    Inputs
    --------
    
    stock_name: a string, the ticker name of the stock
    
    data: the data containing the stock names and prices
    
    
    Outputs
    --------
    
    ax: a matplotlib axes object containing the time series plot
    
    '''
    
    
    # YOUR CODE HERE
    
    return ax

In [None]:
#Let's see how the plot looks
time_series_plot = time_series_plotter('AA',df)

In [None]:
assert_equal(time_series_plot.get_title(),'Stock price')
assert_equal(time_series_plot.get_xlabel(),'Date')
assert_equal(time_series_plot.get_ylabel(),'Price high')
assert_equal(time_series_plot.lines[0].get_data()[1][0],16.72)
assert_equal(np.shape(time_series_plot.lines[0].get_data())[0],2)
assert_equal(np.shape(time_series_plot.lines[0].get_data())[1],25)

# Problem 2: Fit an ARMA model

Write a function called $\texttt{ARMA_model}$ which takes in a stock ticker name and a pandas data frame, in that order, and fits an ARMA model to the high price of the stock. When initializing the ARMA model, pass (7,1) as the order parameter. Furthermore, when fitting the model, set `trend='c'` and `disp=True`. The function should return the estimated parameters as a pandas series.
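For reference, an ARMA(p, q) model with a constant combines p autoregressive terms and q moving-average terms. Written in one common convention with order (7, 1):

$$
X_t = c + \sum_{i=1}^{7} \phi_i X_{t-i} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
$$

Here $\varepsilon_t$ is white noise, the $\phi_i$ are the seven AR coefficients, and $\theta_1$ is the single MA coefficient. Together with the constant, that gives nine parameters to estimate. Sign and parameterization conventions vary between texts and libraries; for example, some libraries report the constant as the process mean rather than as the intercept $c$.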

In [None]:
def ARMA_model(stock_name,data):
    '''
    
    Inputs
    --------
    
    stock_name: a string, the ticker name of the stock
    
    data: the data containing the stock names and prices
    
    
    Outputs
    --------
    
    estimated_parameters: a pandas series, containing the estimated parameter values for the ARMA model
    
    '''
    
    
    # YOUR CODE HERE
    
    return estimated_parameters

In [None]:
estimated_params = ARMA_model('AA',df)
assert_equal(estimated_params[0],16.990166121630605)
assert_equal(len(estimated_params),9)
assert_is_instance(estimated_params,pd.core.series.Series)

In [None]:
#Let's see what the estimated parameter values are.
print(estimated_params)

# Problem 3: Simulating Hidden States

Recall that in Assignment 9, Problem 3, we had a two-state Markov chain (Cancer and No Cancer) run for 50 trials, where the state transition probabilities were:
```
 .90 .10
 .05 .95 
```
which corresponded to
```
CC CN
NC NN
```
where C is Cancer and N is No Cancer and the first state was [0 1].

To complete this problem, simulate 100 hidden states using the first state as the starting state and the transition probabilities above, with numpy's implementation of a [pseudo-random number generator](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.random.RandomState.html).

Store the hidden states in a variable called hidden_state (this should be a 100x1 numpy array).
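The general mechanics of simulating a Markov chain can be sketched as follows: at each step, draw the next state using the row of the transition matrix that belongs to the current state. This is only a sketch under assumptions: state index 0 is taken to mean Cancer and index 1 No Cancer, the seed value `0` is arbitrary, and the names `simulate_chain` and `demo_states` are hypothetical. It is not expected to reproduce the sequence the autograder checks.

```python
import numpy as np

# Rows are the current state, columns the next state.
# Assumed encoding: index 0 = Cancer, index 1 = No Cancer.
transition = np.array([[0.90, 0.10],
                       [0.05, 0.95]])

def simulate_chain(transition, start_state, n_steps, seed):
    # A seeded RandomState makes the simulation reproducible.
    rng = np.random.RandomState(seed)
    states = np.empty(n_steps, dtype=int)
    states[0] = start_state
    for t in range(1, n_steps):
        # Sample the next state from the row of the current state.
        states[t] = rng.choice(len(transition), p=transition[states[t - 1]])
    return states

# Arbitrary seed, chosen only for illustration.
demo_states = simulate_chain(transition, start_state=1, n_steps=100, seed=0)
```

Because every draw comes from the same seeded generator, both the seed and the order of the random calls determine the resulting sequence.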


In [None]:
# YOUR CODE HERE

In [None]:
assert_equal(hidden_state.tolist(), 
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])

# Problem 4: Plot the Hidden States

For this problem, create a figure and axes object using matplotlib's `subplots` function. Use the plot function from the Axes object to plot the index (0-99) against the hidden states. Name the axes object ax.
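A minimal sketch of the figure-and-axes pattern, using a short made-up state sequence rather than the assignment data. The names `demo_sequence`, `demo_fig`, and `demo_ax` and the axis labels are placeholders chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical five-step state sequence, for illustration only.
demo_sequence = np.array([1, 1, 0, 0, 1])

# plt.subplots returns both the figure and the axes object.
demo_fig, demo_ax = plt.subplots(figsize=(12, 4))

# Plot the integer index on the x-axis against the states on the y-axis.
demo_ax.plot(np.arange(len(demo_sequence)), demo_sequence)
demo_ax.set_xlabel('Index')
demo_ax.set_ylabel('State')
```

The plotted data can be read back from the axes with `demo_ax.lines[0].get_xdata()` and `demo_ax.lines[0].get_ydata()`.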

In [None]:
# YOUR CODE HERE

In [None]:
assert_equal(ax.lines[0].get_xdata().tolist()[0:20], 
             [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

assert_equal(ax.lines[0].get_ydata().tolist()[0:100], 
[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1])

# Problem 5: Estimating Hidden States with Kalman Filters

Write a function called kalman_filter that accepts one parameter, the signal. The autograder will pass in the hidden states that you generated as the signal. Create 100x1 numpy arrays for the signal estimates and priors, the error estimates and priors, and the gain. Set the initial signal estimate to 0 and the initial error estimate to 2.5.

Lastly, perform Kalman filtering:
- when updating priors use a variance of 0.1
- when updating estimates use a variance of 0.01

Return the estimate of the signal and also the error of the signal.
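The predict and update steps of a scalar Kalman filter for a constant-state model can be sketched as below. This is one textbook form, not necessarily the exact variant expected here: it assumes the "prior" variance is the process noise added in the predict step, the "estimate" variance is the measurement noise used in the gain, and that the prior error is what gets returned alongside the estimate. The function name `scalar_kalman_sketch` and its keyword arguments are hypothetical.

```python
import numpy as np

def scalar_kalman_sketch(signal, process_var=0.1, measurement_var=0.01,
                         initial_estimate=0.0, initial_error=2.5):
    n = len(signal)
    x_est = np.zeros(n)      # posterior signal estimates
    x_prior = np.zeros(n)    # prior (predicted) signal estimates
    err_est = np.zeros(n)    # posterior error estimates
    err_prior = np.zeros(n)  # prior (predicted) error estimates
    gain = np.zeros(n)       # Kalman gain at each step

    x_est[0] = initial_estimate
    err_est[0] = initial_error

    for k in range(1, n):
        # Predict: carry the estimate forward and inflate the error.
        x_prior[k] = x_est[k - 1]
        err_prior[k] = err_est[k - 1] + process_var

        # Update: blend the prediction with the new observation.
        gain[k] = err_prior[k] / (err_prior[k] + measurement_var)
        x_est[k] = x_prior[k] + gain[k] * (signal[k] - x_prior[k])
        err_est[k] = (1 - gain[k]) * err_prior[k]

    return x_est, err_prior

# Hypothetical short signal, for illustration only.
demo_x_est, demo_err_prior = scalar_kalman_sketch(np.array([1, 1, 1, 0]))
```

With a small measurement variance relative to the prior error, the gain stays close to 1, so the estimate follows each new observation almost immediately.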

In [None]:
# YOUR CODE HERE

In [None]:
x_est, err_prior = kalman_filter(hidden_state)

Let's take a look at the estimates.

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
x = np.linspace(0, len(hidden_state) - 1, len(hidden_state))
ax.scatter(x, hidden_state, marker='+', alpha=0.75, label='Observed Hidden States')
ax.plot(x_est, lw=2,color='orange', label='Estimate')
ax.legend()

In [None]:
for ans, sol in zip(x_est.tolist(), [0.0, 0.996168582375479, 0.9996806132226126, 0.9999731982525261, 0.08391793990566809, 0.007042411702294915, 0.0005910007175486858, 4.959690840496253e-05, 4.162183310932926e-06, 3.492913262326489e-07, 2.931260385886664e-08, 2.459920073751117e-09, 2.0643702614680221e-10, 1.7324240010509863e-11, 1.4538539793162977e-12, 1.2200774128570992e-13, 1.0238916112222692e-14, 8.592520609625629e-16, 7.210861932808018e-17, 6.051370974399703e-18, 5.078323647163105e-19, 0.9160797830996161, 0.9929573971953926, 0.9994089832450941, 0.9999504017457366, 0.9999958377037443, 0.9999996506991954, 0.9999999706866007, 0.9999999975400132, 0.9999999997935574, 0.9999999999826753, 0.9999999999985462, 0.999999999999878, 0.9999999999999898, 0.9999999999999991, 0.9999999999999999, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.08392021690038387, 0.007042602804607473, 0.000591016754905911, 4.959825426346508e-05, 4.162296255670378e-06, 3.4930080457951383e-07, 2.9313399283791385e-08, 2.4599868259833305e-09, 2.0644262800860806e-10, 1.732471011996765e-11, 0.91607978310107, 0.9929573971955146, 0.9994089832451043, 0.9999504017457373, 0.9999958377037445, 0.9999996506991954, 0.9999999706866007, 0.9999999975400132, 0.9999999997935574, 0.08392021688305917, 0.00704260280315358, 0.0005910167547839001, 4.9598254253225877e-05, 4.1622962548111e-06, 3.4930080450740295e-07, 0.9160798124130154, 0.9929573996553793, 0.9994089834515367, 0.9999504017630613, 0.9999958377051982, 0.9999996506993174, 0.08392018758699482, 0.007042600344621508, 0.0005910165484633552, 4.959823693876102e-05, 0.916083945394418, 0.992957746496075, 0.9994090125584831, 0.9999504042057225]):
    assert_almost_equal(ans, sol, places=2)


# Problem 6: Plotting the error convergence

Using a matplotlib axes object, plot the estimated error versus the number of iterations of the Kalman filter. Title the plot "Convergence", label the x-axis "Iteration", and label the y-axis "Error in estimate". Save the matplotlib axes object as a variable called $\texttt{ax}$.

In [None]:
# YOUR CODE HERE

In [None]:
assert_equal(ax.get_title(),'Convergence')
assert_equal(ax.get_xlabel(),'Iteration')
assert_equal(ax.get_ylabel(),'Error in estimate')
assert_equal(ax.lines[0].get_data()[1][0],0.1099616858237546)
assert_equal(np.shape(ax.lines[0].get_data())[0],2)
assert_equal(np.shape(ax.lines[0].get_data())[1],98)