
# Project
Instructions:
- Please read the project description before you start.
- To download a copy of your IPython notebook, click ```File -> Download .ipynb```.
- Write your code in the code cells below each Step description. Write your answers to questions in text cells. You may add extra cells if needed.

```python
import numpy as np
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import statsmodels
import statsmodels.api as sm
from statsmodels.tsa.arima.model import ARIMA
```

## Question 1: Model and forecast electricity consumption data
Step 1: Import data from GitHub
- Check that the DataFrame ```elec``` has two columns, ```date``` and ```elec```.
- The variable ```elec``` is electricity retail sales to the residential sector in the US, in million kilowatt-hours.
- The sample is monthly and covers the period from 1973M1 to 2011M12, but use only the sample up to **2010M12** (do not include observations from 2011).

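```python
# Load the data and attach a monthly DatetimeIndex starting in 1973M1.
# freq='MS' labels each month by its first day; it works across pandas
# versions, whereas freq='M' is deprecated in favor of 'ME' in pandas >= 2.2.
elec = pd.read_csv('https://raw.githubusercontent.com/shihanxie/Econ475/main/data/elec.csv')
elec.index = pd.date_range(start='1973-01-01', periods=elec.shape[0], freq='MS')

# Define start and end date of the estimation sample (1973M1-2010M12);
# label slicing such as elec.loc[start:end] includes both endpoints.
start = '1973-01-01'
end = '2010-12-01'
```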

Step 2: Determine whether it is better to model electricity consumption in logs or in levels. Plot the level or the log of electricity consumption, depending on your choice, and properly label the x- and y-axes.
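A quick numerical check can back up the log-versus-level choice: if the seasonal swings widen as the series trends up, taking logs usually makes them roughly constant. The sketch below uses a hypothetical synthetic monthly series with multiplicative seasonality (not the `elec` data) and compares the within-year standard deviation in the last year to that in the first year, for the level and for the log. The helper `within_year_std` is hypothetical.

```python
import numpy as np

# Hypothetical monthly series with a trend and multiplicative seasonality
# (synthetic data, standing in for a series like elec['elec'])
n_years = 30
t = np.arange(12 * n_years)
month = t % 12
log_y = 5.0 + 0.004 * t + 0.15 * np.sin(2 * np.pi * month / 12)
y = np.exp(log_y)

def within_year_std(x, n_years):
    """Standard deviation of x within each calendar year (one row per year)."""
    return x.reshape(n_years, 12).std(axis=1)

level_std = within_year_std(y, n_years)
log_std = within_year_std(log_y, n_years)

# Seasonal spread in the last year relative to the first year
level_ratio = level_std[-1] / level_std[0]
log_ratio = log_std[-1] / log_std[0]
print(f"level ratio: {level_ratio:.2f}, log ratio: {log_ratio:.2f}")
```

With the actual data, the same comparison over complete years would suggest logs if the level ratio is well above 1 while the log ratio stays near 1. The plot itself still needs axis labels, e.g. via `plt.xlabel(...)` and `plt.ylabel(...)`.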

Step 3: Estimate a model with a linear trend and a model with a quadratic trend. Which one would you choose?
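A minimal sketch of the trend comparison, assuming SIC in the form $\log(SSR/T) + k\log(T)/T$ with $k$ regressors. For a fixed sample, statsmodels' `sm.OLS(y, X).fit().bic` equals $T$ times this quantity plus a constant, so it ranks models identically. The data are synthetic, and `ols_sic` is a hypothetical helper.

```python
import numpy as np

def ols_sic(y, X):
    """OLS fit; returns (coefficients, residuals, SIC).

    SIC here is log(SSR/T) + k*log(T)/T, with k = number of regressors.
    """
    T, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid
    sic = np.log(ssr / T) + k * np.log(T) / T
    return beta, resid, sic

# Hypothetical synthetic series with a mildly curved trend (not the course data)
rng = np.random.default_rng(0)
T = 456  # 38 years of monthly data, the length of 1973M1-2010M12
t = np.arange(1, T + 1)
y = 10.0 + 0.02 * t + 0.00005 * t**2 + rng.normal(scale=0.1, size=T)

X_lin = np.column_stack([np.ones(T), t])         # intercept + linear trend
X_quad = np.column_stack([np.ones(T), t, t**2])  # adds a quadratic term

_, _, sic_lin = ols_sic(y, X_lin)
_, _, sic_quad = ols_sic(y, X_quad)
print(f"SIC linear: {sic_lin:.4f}, SIC quadratic: {sic_quad:.4f}")  # smaller is better
```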

Step 4: Provide the plot and correlogram (up to 12 lags) of the residuals of the model you chose in Step 3. Is there any seasonal pattern?
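The correlogram is the list of sample autocorrelations $\hat\rho_k$ for $k = 1,\dots,12$. The sketch below computes them directly with numpy on hypothetical residuals that still contain a monthly pattern, and flags values outside the simple $\pm 2/\sqrt{T}$ white-noise band. In monthly data, a seasonal pattern tends to show up as large autocorrelations at lags such as 6 and 12. In the notebook, `plot_acf(resid, lags=12)` draws the same kind of plot, although its default bands use Bartlett's formula rather than a fixed $\pm 2/\sqrt{T}$.

```python
import numpy as np

def sample_acf(x, nlags=12):
    """Sample autocorrelations rho_1..rho_nlags (full-sample denominator,
    as in statsmodels' acf with its default adjusted=False)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([d[k:] @ d[:-k] / denom for k in range(1, nlags + 1)])

# Hypothetical residuals with a leftover monthly seasonal pattern (synthetic)
rng = np.random.default_rng(1)
T = 456
month = np.arange(T) % 12
resid = 0.5 * np.cos(2 * np.pi * month / 12) + rng.normal(scale=0.2, size=T)

rho = sample_acf(resid, nlags=12)
band = 2 / np.sqrt(T)  # approximate 95% band under white noise
for k, r in enumerate(rho, start=1):
    flag = "*" if abs(r) > band else ""
    print(f"lag {k:2d}: {r:6.3f} {flag}")
```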

Step 5: Estimate a model with a trend and a full set of dummy variables (for 12 months) and report the result.
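When all 12 monthly dummies are included, the regression should not also contain a separate intercept: the dummies sum to a column of ones, so an intercept would make the regressors perfectly collinear (the dummy-variable trap). The sketch below builds that design matrix for hypothetical synthetic data and checks that OLS recovers the trend slope and the 12 monthly intercepts. An equivalent alternative is to keep the intercept and drop one month.

```python
import numpy as np

# Hypothetical monthly data (synthetic): trend plus a different mean for each month
rng = np.random.default_rng(2)
T = 456
t = np.arange(1, T + 1)
month = (t - 1) % 12                         # 0 = January, ..., 11 = December
true_seasonals = np.linspace(1.0, 2.1, 12)   # hypothetical monthly intercepts
y = 0.01 * t + true_seasonals[month] + rng.normal(scale=0.05, size=T)

# Full set of 12 monthly dummies and NO separate intercept, because the
# 12 dummy columns already sum to a column of ones.
D = np.zeros((T, 12))
D[np.arange(T), month] = 1.0
X = np.column_stack([t, D])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_coef, month_coefs = beta[0], beta[1:]
print("trend:", round(trend_coef, 4))
print("monthly intercepts:", np.round(month_coefs, 2))
```

The same `X` can be passed to `sm.OLS(y, X).fit()` to get the usual regression table.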

Step 6: Provide the plot and correlogram (up to 12 lags) of the residuals of the model you estimated in Step 5. Are there any cycles or serial correlation in the residuals?
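Besides eyeballing the correlogram, the Ljung-Box statistic $Q = T(T+2)\sum_{k=1}^{12}\hat\rho_k^2/(T-k)$ tests whether the first 12 autocorrelations are jointly zero; under white noise it is approximately $\chi^2_{12}$. The numpy sketch below applies it to hypothetical serially correlated residuals. statsmodels provides `acorr_ljungbox` in `statsmodels.stats.diagnostic`. When the test is applied to ARMA residuals, the degrees of freedom are usually reduced by $p+q$.

```python
import numpy as np

def ljung_box(x, nlags=12):
    """Ljung-Box Q statistic over lags 1..nlags."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    denom = d @ d
    rho = np.array([d[k:] @ d[:-k] / denom for k in range(1, nlags + 1)])
    k = np.arange(1, nlags + 1)
    return T * (T + 2) * np.sum(rho**2 / (T - k))

CRIT_5PCT_CHI2_12 = 21.026  # 95th percentile of chi-square with 12 df

# Hypothetical residuals (synthetic): AR(1) with coefficient 0.6
rng = np.random.default_rng(3)
T = 456
e = rng.normal(size=T)
u = np.zeros(T)
for s in range(1, T):
    u[s] = 0.6 * u[s - 1] + e[s]

Q = ljung_box(u, nlags=12)
print(f"Q(12) = {Q:.1f}; reject white noise at 5%: {Q > CRIT_5PCT_CHI2_12}")
```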

Step 7: Estimate an ARMA(p,q) model with $p= 0,1,2,3$ and $q= 0,1,2,3$ except $p=q=0$ and report SICs. Which lag orders would you choose?

Step 8: Provide the plot and correlogram (up to 12 lags) of the residuals of the model you chose in Step 7.  Is there any evidence of cycles in the residuals?

Step 9: Using the model you chose in Step 7, forecast the level of electricity retail sales for 2011 and compute the 95% interval forecasts as well. Plot your point and interval forecasts together with the actual data for the period from 2008 to 2011.
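If the model in Step 7 is for the log of sales, its forecasts have to be mapped back to levels. Because $\exp$ is monotone, exponentiating the endpoints of a 95% interval for the log gives a 95% interval for the level. Exponentiating the point forecast gives the median; under normal log forecast errors, the mean is $\exp(\hat{y} + \sigma^2/2)$. The helper below is a hypothetical sketch with illustrative numbers. With statsmodels, the log-scale inputs could come from `res.get_forecast(steps=12)` via `.predicted_mean` and `.se_mean`, and a model with `exog` needs the 2011 values of those regressors.

```python
import numpy as np

def log_to_level_forecast(mean_log, se_log, z=1.96):
    """Convert log-scale point/interval forecasts into level forecasts.

    Assumes the log forecast errors are approximately normal.
    Returns (median, mean, lower, upper) on the level scale.
    """
    mean_log = np.asarray(mean_log, dtype=float)
    se_log = np.asarray(se_log, dtype=float)
    median = np.exp(mean_log)                   # exp of the log point forecast
    mean = np.exp(mean_log + 0.5 * se_log**2)   # lognormal mean correction
    lower = np.exp(mean_log - z * se_log)       # exp is monotone, so the
    upper = np.exp(mean_log + z * se_log)       # 95% band carries over directly
    return median, mean, lower, upper

# Hypothetical 12-month log forecasts and standard errors (illustrative only)
mean_log = np.full(12, 11.5)
se_log = np.linspace(0.03, 0.06, 12)
median, mean, lower, upper = log_to_level_forecast(mean_log, se_log)
```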

## Question 2: Model and forecast industrial production data
Step 1: Import data from GitHub

- Check that the DataFrame ```industrial``` has two columns, ```date``` and ```lip```.
- The variable ```lip``` is the log of the seasonally adjusted industrial production index of the US ($ip=100$ in 2007).
- The sample is monthly and covers the period from 1980M1 to 2014M12, but use only the sample up to 2013M12 (do not include observations from 2014).

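```python
# Load the data and attach a monthly DatetimeIndex starting in 1980M1
# (freq='MS' labels each month by its first day).
industrial = pd.read_csv('https://raw.githubusercontent.com/shihanxie/Econ475/main/data/industrial.csv')
industrial.index = pd.date_range(start='1980-01-01', periods=industrial.shape[0], freq='MS')

# Define start and end date of the estimation sample (1980M1-2013M12);
# label slicing such as industrial.loc[start:end] includes both endpoints.
start = '1980-01-01'
end = '2013-12-01'
```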

Step 2: Estimate a model with the intercept and a linear time trend and report the estimation result

Step 3: Compute the correlogram of the residuals in Step 2 up to 12 lags and describe any interesting characteristics. Would an AR model or an MA model fit the data better?
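As a rule of thumb, an AR(p) process has an autocorrelation function that decays gradually and a partial autocorrelation function that cuts off after lag $p$, while an MA(q) process shows the reverse pattern. The partial autocorrelation at lag $k$ can be computed as the last coefficient in an OLS regression of $x_t$ on a constant and $x_{t-1},\dots,x_{t-k}$; the sketch below does this on a hypothetical simulated AR(1). statsmodels' `pacf(x, nlags=12, method='ols')` is based on the same regression idea.

```python
import numpy as np

def pacf_ols(x, nlags=12):
    """Partial autocorrelations: the last coefficient of an OLS regression
    of x_t on a constant and x_{t-1}, ..., x_{t-k}, for k = 1..nlags."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    out = []
    for k in range(1, nlags + 1):
        y = x[k:]
        lags = [x[k - j:T - j] for j in range(1, k + 1)]  # x_{t-1}, ..., x_{t-k}
        X = np.column_stack([np.ones(T - k)] + lags)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        out.append(beta[-1])
    return np.array(out)

# Hypothetical AR(1) series with coefficient 0.8 (synthetic)
rng = np.random.default_rng(5)
T = 1000
x = np.zeros(T)
for s in range(1, T):
    x[s] = 0.8 * x[s - 1] + rng.normal()

pacf = pacf_ols(x, nlags=12)
print(np.round(pacf, 2))  # large at lag 1 and small afterwards: the AR(1) signature
```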

Step 4: Estimate an AR model, including the intercept and a linear time trend, with 1, 2, ..., 6 lags and report SICs of all these models.
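When comparing SICs across lag lengths, every model should be estimated on the same observations; otherwise the AR(6) is fit on fewer data points than the AR(1) and the criteria are not directly comparable. A hedged numpy sketch, assuming SIC $=\log(SSR/T)+k\log(T)/T$ and a hypothetical simulated trending AR(2) series: the first 6 observations are held back for every model. statsmodels' `AutoReg` with `trend='ct'` and `hold_back=6` follows the same idea.

```python
import numpy as np

def ar_trend_design(y, p, max_p):
    """Regressors for y_t on [1, t, y_{t-1}, ..., y_{t-p}], using observations
    t = max_p..T-1 so that every lag order is fit on the same sample."""
    T = len(y)
    t = np.arange(max_p, T, dtype=float)
    cols = [np.ones(T - max_p), t]
    cols += [y[max_p - j:T - j] for j in range(1, p + 1)]
    return y[max_p:], np.column_stack(cols)

def sic(y, X):
    """log(SSR/n) + k*log(n)/n for an OLS fit of y on X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return np.log(resid @ resid / n) + k * np.log(n) / n

# Hypothetical trending AR(2) series (synthetic, not the lip data)
rng = np.random.default_rng(6)
T = 500
y = np.zeros(T)
for s in range(2, T):
    y[s] = 0.5 * y[s - 1] + 0.4 * y[s - 2] + 0.002 * s + rng.normal(scale=0.5)

max_p = 6
sics = {p: sic(*ar_trend_design(y, p, max_p)) for p in range(1, max_p + 1)}
best_p = min(sics, key=sics.get)
print({p: round(v, 4) for p, v in sics.items()}, "-> chosen p:", best_p)
```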

Step 5: Choose the lag length for an AR model based on SIC and report the estimation result of the AR model with the chosen lag length. Check the ```Durbin-Watson``` statistic. What does the DW statistic suggest?
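The Durbin-Watson statistic is $DW = \sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2 \approx 2(1-\hat\rho_1)$, so values near 2 indicate little first-order autocorrelation and values well below 2 indicate positive autocorrelation. One caveat matters for Step 5: when lagged dependent variables are among the regressors, as in an AR model, DW is biased toward 2 and is not a valid test, which is one motivation for the Breusch-Godfrey test in Step 7. The sketch below evaluates the formula on hypothetical residuals; `statsmodels.stats.stattools.durbin_watson` computes the same quantity.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e**2)

rng = np.random.default_rng(7)
white = rng.normal(size=1000)            # hypothetical uncorrelated residuals
ar = np.zeros(1000)                      # hypothetical positively correlated residuals
for s in range(1, 1000):
    ar[s] = 0.7 * ar[s - 1] + white[s]

print(f"white noise DW: {durbin_watson(white):.2f}")  # close to 2
print(f"AR(0.7) DW:     {durbin_watson(ar):.2f}")     # well below 2
```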

Step 6: Now use the model chosen in Step 5. Consider the correlogram of the residuals up to 12 lags. Is there any evidence of serial correlation in the residuals?

Step 7: Use the model chosen in Step 5. Perform the ```Breusch-Godfrey``` test (serial correlation LM test) on the residuals with 6 lags included. Do you reject the null hypothesis at the 5% significance level?
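The Breusch-Godfrey test regresses the residuals on the original regressors plus $p$ lagged residuals and uses $LM = T \cdot R^2$, approximately $\chi^2_p$ under the null of no serial correlation; unlike DW, it remains valid with lagged dependent variables among the regressors. Below is a numpy sketch with 6 lags on hypothetical residuals, with pre-sample lags set to zero. In statsmodels, `acorr_breusch_godfrey(results, nlags=6)` from `statsmodels.stats.diagnostic` returns the LM statistic, its p-value, and an F version with its p-value.

```python
import numpy as np

def breusch_godfrey(resid, X, nlags=6):
    """LM = T * R^2 from regressing e_t on the original regressors X and
    e_{t-1}, ..., e_{t-nlags} (pre-sample lags set to 0)."""
    e = np.asarray(resid, dtype=float)
    T = len(e)
    lagged = np.zeros((T, nlags))
    for j in range(1, nlags + 1):
        lagged[j:, j - 1] = e[:-j]
    Z = np.column_stack([X, lagged])
    gamma, *_ = np.linalg.lstsq(Z, e, rcond=None)
    u = e - Z @ gamma
    r2 = 1 - (u @ u) / np.sum((e - e.mean()) ** 2)
    return T * r2

CRIT_5PCT_CHI2_6 = 12.592  # 95th percentile of chi-square with 6 df

# Hypothetical regression: y on a constant and trend, with AR(1) errors (synthetic)
rng = np.random.default_rng(8)
T = 300
t = np.arange(T, dtype=float)
X = np.column_stack([np.ones(T), t])
u = np.zeros(T)
for s in range(1, T):
    u[s] = 0.7 * u[s - 1] + rng.normal()
y = 1.0 + 0.05 * t + u

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
LM = breusch_godfrey(resid, X, nlags=6)
print(f"LM = {LM:.1f}; reject no serial correlation at 5%: {LM > CRIT_5PCT_CHI2_6}")
```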

Step 8: The Great Recession made industrial production drop substantially in 2008 and 2009. Do you think the model chosen in Step 5 became invalid for describing the dynamics of industrial production after the Great Recession? Provide evidence for your conclusion using appropriate statistics.
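One way to assemble evidence for Step 8 is a Chow test with the break placed at the start of the Great Recession: estimate the model on the full sample and on the two subsamples, and compare residual sums of squares; the statistic is compared with an $F(k, T-2k)$ critical value. The sketch below is a hypothetical illustration on a simulated AR(1) whose intercept shifts; the break index, data, and helper names are assumptions. A CUSUM-type check from recursive estimation (for instance `sm.RecursiveLS(...).fit().plot_cusum()`) is another option.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

def chow_f(y, X, break_idx):
    """Chow F statistic for a break at row break_idx (known break date)."""
    T, k = X.shape
    s_pooled = ssr(y, X)
    s1 = ssr(y[:break_idx], X[:break_idx])
    s2 = ssr(y[break_idx:], X[break_idx:])
    return ((s_pooled - s1 - s2) / k) / ((s1 + s2) / (T - 2 * k))

# Hypothetical AR(1) series whose intercept drops at period 300 (synthetic)
rng = np.random.default_rng(9)
T = 400
y = np.zeros(T)
for s in range(1, T):
    c = 1.0 if s < 300 else 0.5
    y[s] = c + 0.5 * y[s - 1] + rng.normal(scale=0.2)

X = np.column_stack([np.ones(T - 1), y[:-1]])   # constant and y_{t-1}
F = chow_f(y[1:], X, break_idx=299)             # row 299 corresponds to period 300
print(f"Chow F = {F:.1f}")
```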

Step 9: Forecast the **level** of industrial production for 2014 and compute the 95% interval forecasts as well. Plot your point and interval forecasts together with the actual data for the period from 2010 to 2014.
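Multi-step forecasts from an AR model are built recursively, feeding each forecast back in as a lag, and the $h$-step forecast error variance is $\sigma^2\sum_{j=0}^{h-1}\psi_j^2$, where $\psi_j$ are the model's MA($\infty$) weights. The hypothetical helper below implements both for an AR(p) with intercept and trend, ignoring parameter uncertainty; the example numbers are illustrative. Since `lip` is in logs, the level forecast and interval follow by exponentiating.

```python
import numpy as np

def ar_forecast(y, const, trend, phis, sigma, steps, t_last):
    """Iterated forecasts for h = 1..steps from
        y_t = const + trend * t + phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t,
    where t_last is the time index of the last observation in y.
    Standard errors use the MA(infinity) weights psi_j and ignore
    parameter-estimation uncertainty. Returns (means, standard errors)."""
    p = len(phis)
    hist = [float(v) for v in y[-p:]]        # last p observations, oldest first
    means = []
    for h in range(1, steps + 1):
        lags = hist[::-1][:p]                # y_{t-1}, ..., y_{t-p}
        f = const + trend * (t_last + h) + float(np.dot(phis, lags))
        means.append(f)
        hist.append(f)
    # psi_0 = 1 and psi_j = sum_{i=1}^{min(j, p)} phi_i * psi_{j-i}
    psi = [1.0]
    for j in range(1, steps):
        psi.append(sum(phis[i - 1] * psi[j - i] for i in range(1, min(j, p) + 1)))
    se = sigma * np.sqrt(np.cumsum(np.square(psi)))
    return np.array(means), se

# Hypothetical AR(1) example: phi = 0.5, constant 1, no trend, last value 4
means, se = ar_forecast(np.array([4.0]), const=1.0, trend=0.0, phis=[0.5],
                        sigma=1.0, steps=3, t_last=0)
lower, upper = means - 1.96 * se, means + 1.96 * se
# For a log series such as lip, np.exp(means), np.exp(lower) and np.exp(upper)
# give the level point forecast (median) and its 95% interval.
```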



## Question 3: Modeling volatility using US Dollar / Australian Dollar exchange rate data
Step 1: Import data from GitHub
- Check that the DataFrame ```usdaud``` has a column ```USDAUD```, which is the USD/AUD exchange rate.
- The sample is daily and covers the period from Jan 2, 2001 to Oct 14, 2004.

```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from scipy import stats

# IPython shell command (Colab/Jupyter): install the arch package for ARCH/GARCH models
!pip install arch
from arch import arch_model
```

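```python
usdaud = pd.read_csv('https://raw.githubusercontent.com/shihanxie/Econ475/main/data/usdaud.csv')
# freq='D' assigns consecutive calendar days; if the file holds trading days only,
# these dates are approximate labels rather than exact observation dates.
usdaud.index = pd.date_range(start='2001-01-02', periods=usdaud.shape[0], freq='D')
```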

Step 2: Compute and plot the first difference of the log exchange rate, scaled by 100: $\Delta \log(usdaud_t) \times 100$. From now on, we use $y_t$ to refer to $\Delta \log(usdaud_t) \times 100$, which is (approximately) the daily percentage change in the USD/AUD exchange rate.

Hint: use ```np.log(var).diff()``` to compute the first difference of the log of ```var```.

Step 3: Plot the histogram and compute the descriptive statistics of $y_t$. Conduct an appropriate test of whether it is normally distributed.
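For the normality test, a standard choice is Jarque-Bera, $JB = \frac{T}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)$, which is approximately $\chi^2_2$ under normality; because the $\chi^2_2$ survival function is $e^{-x/2}$, its p-value is $e^{-JB/2}$. The sketch below computes the descriptive statistics and JB with numpy on hypothetical fat-tailed draws (not the exchange-rate data); its JB value should agree with `scipy.stats.jarque_bera`.

```python
import numpy as np

def describe_and_jb(y):
    """Descriptive statistics and the Jarque-Bera test.

    JB = T/6 * (S^2 + (K - 3)^2 / 4) is approximately chi-square(2) under
    normality, and the chi-square(2) p-value is exactly exp(-JB / 2)."""
    y = np.asarray(y, dtype=float)
    y = y[~np.isnan(y)]                 # drop the NaN created by .diff()
    T = len(y)
    d = y - y.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2        # raw kurtosis (3 for a normal)
    jb = T / 6 * (skew**2 + (kurt - 3) ** 2 / 4)
    return {"T": T, "mean": y.mean(), "std": y.std(ddof=1), "min": y.min(),
            "max": y.max(), "skew": skew, "kurt": kurt, "jb": jb,
            "p_value": np.exp(-jb / 2)}

# Hypothetical fat-tailed daily changes: Laplace draws with a leading NaN,
# mimicking the output of np.log(var).diff() * 100 (not the USDAUD data)
rng = np.random.default_rng(10)
y = np.r_[np.nan, rng.laplace(scale=0.5, size=5000)]
stats_ = describe_and_jb(y)
print({k: round(float(v), 3) for k, v in stats_.items()})
```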

Step 4: Compare the histogram of $y_t$ to a normal distribution.
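One way to make the comparison is to overlay the normal density with the same mean and standard deviation as $y_t$. The sketch below evaluates both on a common grid of bin centers for hypothetical fat-tailed data; a histogram peak clearly above the normal curve at the center is the usual signature of fat tails. The helper `hist_vs_normal` is hypothetical.

```python
import numpy as np

def hist_vs_normal(y, bins=40):
    """Histogram density of y and the normal density with the same mean and
    standard deviation, evaluated at the bin centers."""
    y = np.asarray(y, dtype=float)
    y = y[~np.isnan(y)]
    dens, edges = np.histogram(y, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mu, sd = y.mean(), y.std(ddof=1)
    normal = np.exp(-0.5 * ((centers - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return centers, dens, normal, np.diff(edges)

# Hypothetical fat-tailed data (synthetic Laplace draws)
rng = np.random.default_rng(11)
centers, dens, normal, widths = hist_vs_normal(rng.laplace(scale=0.5, size=5000))
peak = np.argmin(np.abs(centers))   # bin closest to zero
print(f"density near 0: histogram {dens[peak]:.2f} vs normal {normal[peak]:.2f}")
```

To draw it, `plt.bar(centers, dens, width=widths, alpha=0.5)` followed by `plt.plot(centers, normal)` overlays the two.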

Step 5: Compute the correlogram of squared $y_t$ up to 12 lags.

Step 6: Estimate an AR(1) model for squared $y_t$
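An AR(1) for $y_t^2$ is essentially Engle's ARCH test: if $y_{t-1}^2$ helps predict $y_t^2$, volatility is clustered. The sketch below fits $y_t^2 = c + a\,y_{t-1}^2 + u_t$ by OLS on hypothetical simulated ARCH(1) returns and reports $T \cdot R^2$, approximately $\chi^2_1$ under no ARCH effects. `het_arch(y, nlags=1)` from `statsmodels.stats.diagnostic` implements a version of this test.

```python
import numpy as np

def arch_lm_ar1(y):
    """Fit y_t^2 = c + a * y_{t-1}^2 + u_t by OLS (an AR(1) for squared y).

    Returns the slope a and Engle's LM statistic T*R^2, which is
    approximately chi-square with 1 df when there is no ARCH effect."""
    y = np.asarray(y, dtype=float)
    y = y[~np.isnan(y)]                      # drop the NaN created by .diff()
    sq = y**2
    dep, lag = sq[1:], sq[:-1]
    X = np.column_stack([np.ones(len(lag)), lag])
    coef, *_ = np.linalg.lstsq(X, dep, rcond=None)
    u = dep - X @ coef
    r2 = 1 - (u @ u) / np.sum((dep - dep.mean()) ** 2)
    return coef[1], len(dep) * r2

CRIT_5PCT_CHI2_1 = 3.841  # 95th percentile of chi-square with 1 df

# Hypothetical ARCH(1) returns with sigma_t^2 = 0.2 + 0.2 * y_{t-1}^2 (synthetic)
rng = np.random.default_rng(12)
T = 5000
y = np.zeros(T)
for i in range(1, T):
    y[i] = np.sqrt(0.2 + 0.2 * y[i - 1] ** 2) * rng.normal()

slope, lm = arch_lm_ar1(y)
print(f"slope = {slope:.3f}, LM = {lm:.1f}, ARCH effect at 5%: {lm > CRIT_5PCT_CHI2_1}")
```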

Step 7: Estimate an ARCH(1) model and a GARCH(1,1) model for
$$
\begin{aligned}
&y_t =\mu+\varepsilon_{t} \\
&\varepsilon_{t} \mid \Omega_{t-1} \sim N\left(0, \sigma_{t}^{2}\right)
\end{aligned}
$$
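In practice the `arch` package imported above estimates these models, e.g. `arch_model(y, mean='Constant', vol='ARCH', p=1)` and `arch_model(y, mean='Constant', vol='GARCH', p=1, q=1)`, each followed by `.fit(disp='off')` and applied to $y_t$ with its leading NaN dropped. To show what is being maximized, here is a numpy-only sketch of the GARCH(1,1) variance recursion and Gaussian log-likelihood, with ARCH(1) as the special case $\beta = 0$. The data and parameter values are hypothetical, and the initialization of $\sigma_1^2$ is an assumption.

```python
import numpy as np

def garch11_loglik(params, y):
    """Gaussian log-likelihood for y_t = mu + eps_t with
        sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2.
    ARCH(1) is the special case beta = 0.
    sigma_1^2 is set to the sample variance of eps, one common convention
    (the arch package uses its own backcast). Returns (loglik, sigma2)."""
    mu, omega, alpha, beta = params
    eps = np.asarray(y, dtype=float) - mu
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = eps.var()
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    loglik = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps**2 / sigma2)
    return loglik, sigma2

# Hypothetical returns (synthetic) and hand-picked parameter values
rng = np.random.default_rng(13)
y = rng.normal(scale=0.6, size=500)
ll_arch, _ = garch11_loglik((0.0, 0.3, 0.1, 0.0), y)         # ARCH(1): beta = 0
ll_garch, sigma2 = garch11_loglik((0.0, 0.05, 0.1, 0.8), y)  # GARCH(1,1)
print(f"ARCH(1) loglik: {ll_arch:.1f}, GARCH(1,1) loglik: {ll_garch:.1f}")
```

The returned `sigma2` is the estimated conditional variance path, the quantity to plot once the parameters are estimated.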

Step 8: Estimate an AR(1)-ARCH(1) model and an AR(1)-GARCH(1,1) model for
$$
\begin{aligned}
&y_t =\mu+\rho y_{t-1} + \varepsilon_{t} \\
&\varepsilon_{t} \mid \Omega_{t-1} \sim N\left(0, \sigma_{t}^{2}\right)
\end{aligned}
$$
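With the `arch` package, the AR(1) mean is specified as `arch_model(y, mean='AR', lags=1, vol='ARCH', p=1)` or `arch_model(y, mean='AR', lags=1, vol='GARCH', p=1, q=1)`, and `res.bic` supports the comparison across Steps 7 and 8. Because the AR(1) models condition on the first observation, a fair comparison should use the same estimation sample for all four models (the `hold_back` argument of `arch_model` can align them). The numpy sketch below extends the previous likelihood idea to the AR(1) mean and adds the Schwarz criterion; the `sigma2` initialization is again an assumption, and the helper names are hypothetical.

```python
import numpy as np

def ar1_garch11_loglik(params, y):
    """Gaussian log-likelihood of y_t = mu + rho * y_{t-1} + eps_t with
    GARCH(1,1) errors, conditioning on the first observation.
    Setting beta = 0 gives AR(1)-ARCH(1). Returns (loglik, sigma2)."""
    mu, rho, omega, alpha, beta = params
    y = np.asarray(y, dtype=float)
    eps = y[1:] - mu - rho * y[:-1]
    T = len(eps)
    sigma2 = np.empty(T)
    sigma2[0] = eps.var()               # initialization convention (assumption)
    for t in range(1, T):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    loglik = -0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + eps**2 / sigma2)
    return loglik, sigma2

def bic(loglik, n_params, n_obs):
    """Schwarz criterion in the -2*loglik + k*log(T) form (smaller is better)."""
    return -2 * loglik + n_params * np.log(n_obs)

# Number of estimated parameters in each specification from Steps 7 and 8
n_params = {"ARCH(1)": 3, "GARCH(1,1)": 4, "AR(1)-ARCH(1)": 4, "AR(1)-GARCH(1,1)": 5}
```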

Step 9: Plot the estimated conditional variance of the best-fitting model among the ones considered