```julia
using CSV, DataFrames
```

# Problem Set 3
**Due: May 24, 2021** (in class; subject to change if COVID restrictions apply)

## Problem 1 (Gibbs Sampling)

Distributions of sizes and frequencies often follow a Pareto distribution. Examples include:
- wealth of individuals
- size of oil reserves
- size of cities
- word frequency
- return on stocks

The Pareto distribution with shape $\alpha>0$ and scale $c>0$ has pdf 
$$\text{Pareto}(x\vert \alpha,c)=\frac{\alpha c^{\alpha}}{x^{\alpha+1}}\mathbb{1}(x>c)$$
- This is called a power-law distribution because the pdf is proportional to a power of $x$.
- $\alpha$ describes how the probability of observing a city scales with the city's size.
    - Let $\alpha=1$
    - Density looks like $1/x^{\alpha+1}=1/x^{2}$
    - The density at 10,000 inhabitants is $10^{\alpha+1}=100$ times the density at 100,000 inhabitants, so cities with 10,000-10,100 inhabitants occur roughly 100 times as frequently as cities with 100,000-100,100 inhabitants.
 
- $c$ is a lower bound on the observed values, i.e., the cutoff point.
- We will use Gibbs sampling to perform inference for $\alpha$ and $c$.
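As a small numerical check of the bullets above, here is a sketch in Julia (standard library only) of the Pareto density and an inverse-CDF sampler. The names `pareto_pdf` and `pareto_rand` are hypothetical helpers and do not come from any package. For $x>c$ the CDF is $F(x)=1-(c/x)^{\alpha}$, so solving $u=F(x)$ gives $x=c(1-u)^{-1/\alpha}$.

```julia
using Random

# Density of Pareto(x | α, c): α c^α / x^(α+1) for x > c, zero otherwise.
pareto_pdf(x, α, c) = x > c ? α * c^α / x^(α + 1) : 0.0

# Inverse-CDF sampling: x = c * (1 - u)^(-1/α) with u ~ Uniform[0, 1).
# Since 1 - u lies in (0, 1], every draw is finite and at least c.
pareto_rand(rng::AbstractRNG, α, c, n::Integer) = c .* (1 .- rand(rng, n)) .^ (-1 / α)

# With α = 1, the density at 10,000 is 10^(α+1) = 100 times the density at 100,000.
ratio = pareto_pdf(10_000.0, 1.0, 1.0) / pareto_pdf(100_000.0, 1.0, 1.0)
```

Up to floating-point rounding, `ratio` is 100, which matches the city-size example. The sampler is also convenient later for generating synthetic data with known $\alpha$ and $c$.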


Let us use an improper **prior**:
$$p(\alpha,c) \propto \mathbb{1}(\alpha,c>0)$$

Note: An improper prior is a nonnegative function of the parameters which integrates to infinity,
so it can’t really be considered to define a prior distribution. But, we can still plug it into
Bayes’ formula, and often (but not always!) the resulting “posterior” will be proper; in other
words, the likelihood times the prior integrates to a finite value, and so this “posterior” is a
well-defined probability distribution. It is important that the “posterior” be proper, since
otherwise the whole Bayesian framework breaks down. Improper priors are often used in an
attempt to make a prior as non-informative as possible, in other words, to represent as
little prior knowledge as possible. They are sometimes also mathematically convenient.

Plugging into Bayes' theorem, we define the **posterior** to be proportional to the likelihood times the prior:

$$p(\alpha,c\vert x_{1:n})\propto p(x_{1:n} \vert \alpha,c)p(\alpha,c)
\propto \mathbb{1}(\alpha,c>0) \prod _{i=1}^{n}\frac{\alpha c^{\alpha}}{x_i^{\alpha+1}}\mathbb{1}(x_i>c)
= \frac{\alpha^{n} c^{n\alpha}}{(\prod_{i=1}^{n} x_i)^{\alpha+1}}\mathbb{1}(c<x_*)\mathbb{1}(\alpha,c>0)$$

where $x_*=\min\{x_1,\dots,x_n\}$.
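A sketch of the unnormalized log posterior, written directly from the product of the $n$ Pareto densities and the flat prior, can help sanity-check the algebra. For example, it should return `-Inf` whenever $c \ge x_*$. The name `log_posterior` is a hypothetical helper. Working on the log scale avoids overflow in $c^{n\alpha}$ and $(\prod x_i)^{\alpha+1}$ when $n$ is large.

```julia
# Unnormalized log posterior under the improper prior 1(α, c > 0):
#   n log α + n α log c - (α + 1) Σ log x_i   if α > 0 and 0 < c < x_*,
#   -Inf                                      otherwise.
function log_posterior(α::Real, c::Real, x::AbstractVector{<:Real})
    n = length(x)
    xstar = minimum(x)                      # x_* = min(x_1, ..., x_n)
    (α > 0 && 0 < c < xstar) || return -Inf
    return n * log(α) + n * α * log(c) - (α + 1) * sum(log, x)
end
```

Differences such as `log_posterior(2.0, c, x) - log_posterior(1.0, c, x)` should agree with the log-ratio of the Gamma kernel for $\alpha$ at fixed $c$.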

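To use Gibbs sampling, we need to be able to sample from the full conditional distributions $\alpha \vert c, x_{1:n}$ and $c\vert \alpha, x_{1:n}$:

$$\alpha \vert c, x_{1:n} \sim \text{Gamma}\left(n+1,\ \sum_{i=1}^{n} \log x_i-n\log c\right)$$
$$c\vert \alpha, x_{1:n} \sim \text{Mono}(n\alpha+1,x_*)$$

where the Gamma distribution is parameterized by shape and rate, and $\text{Mono}(a,b)$ is the distribution with density $\frac{a x^{a-1}}{b^{a}}\mathbb{1}(0<x<b)$.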
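Both conditionals can be sampled with the Julia standard library alone. This sketch assumes Gamma is parameterized by shape and rate, and that $\text{Mono}(a,b)$ has density $a x^{a-1}/b^{a}$ on $(0,b)$, which is what the factor $c^{n\alpha}\mathbb{1}(0<c<x_*)$ in the posterior implies. Because the Gamma shape $n+1$ is an integer, a Gamma draw is exactly a sum of $n+1$ exponential draws. The names `rand_gamma_intshape`, `rand_mono`, `update_alpha`, and `update_c` are hypothetical.

```julia
using Random

# Gamma(shape = k, rate = λ) for an integer shape k: the sum of k independent
# Exponential(rate = λ) draws, each generated as -log(U) / λ.
function rand_gamma_intshape(rng::AbstractRNG, k::Integer, λ::Real)
    s = 0.0
    for _ in 1:k
        s -= log(1 - rand(rng))   # 1 - rand(rng) lies in (0, 1], so the log is finite
    end
    return s / λ
end

# Mono(a, b), density a x^(a-1) / b^a on (0, b), via the inverse of F(x) = (x/b)^a.
rand_mono(rng::AbstractRNG, a::Real, b::Real) = b * rand(rng)^(1 / a)

# One draw of α from α | c, x ~ Gamma(n + 1, Σ log x_i - n log c).
function update_alpha(rng::AbstractRNG, c::Real, x::AbstractVector{<:Real})
    n = length(x)
    return rand_gamma_intshape(rng, n + 1, sum(log, x) - n * log(c))
end

# One draw of c from c | α, x ~ Mono(n α + 1, x_*).
update_c(rng::AbstractRNG, α::Real, x::AbstractVector{<:Real}) =
    rand_mono(rng, length(x) * α + 1, minimum(x))
```

The exponential-sum sampler costs $O(n)$ per draw. For large $n$, a package such as Distributions.jl provides a faster `Gamma` sampler; note that it is parameterized by shape and scale, with scale $=1/\text{rate}$.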


For the derivation, see https://jwmi.github.io/BMS/chapter6-gibbs-sampling.pdf.

(1.a) Download a dataset that interests you. It could be anything from the population sizes of cities in Spain to the income distribution of the U.S. working population.

(1.b) Estimate the posterior distributions of $\alpha$ and $c$, and plot them.
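Alternating the two conditional draws gives a minimal Gibbs sampler. Treat it as a sketch under stated assumptions. The starting value `c0 = x_*/2` is an arbitrary choice, the function discards no burn-in, and `gibbs_pareto` and its keyword arguments are hypothetical names. The two sampling helpers are repeated so the block runs on its own.

```julia
using Random, Statistics

# Hypothetical helper: Gamma(shape = k, rate = λ) for integer k, as a sum of exponentials.
function rand_gamma_intshape(rng::AbstractRNG, k::Integer, λ::Real)
    s = 0.0
    for _ in 1:k
        s -= log(1 - rand(rng))
    end
    return s / λ
end

# Hypothetical helper: Mono(a, b) via inverse CDF, x = b * U^(1/a).
rand_mono(rng::AbstractRNG, a::Real, b::Real) = b * rand(rng)^(1 / a)

"""
    gibbs_pareto(x, niter; rng, c0)

Gibbs sampler for (α, c) under the flat prior 1(α, c > 0). Alternates
α | c, x ~ Gamma(n + 1, Σ log x_i - n log c) and c | α, x ~ Mono(n α + 1, x_*).
Returns the vectors of α draws and c draws.
"""
function gibbs_pareto(x::AbstractVector{<:Real}, niter::Integer;
                      rng::AbstractRNG = Random.default_rng(),
                      c0::Real = minimum(x) / 2)
    n = length(x)
    xstar = minimum(x)
    slog = sum(log, x)          # Σ log x_i, fixed across iterations
    αs = zeros(niter)
    cs = zeros(niter)
    c = float(c0)
    for t in 1:niter
        α = rand_gamma_intshape(rng, n + 1, slog - n * log(c))
        c = rand_mono(rng, n * α + 1, xstar)
        αs[t] = α
        cs[t] = c
    end
    return αs, cs
end
```

Before you apply the sampler to downloaded data, it may be worth checking it on synthetic data with known parameters. For example, `x = 3.0 .* (1 .- rand(2000)) .^ (-1 / 1.5)` has $\alpha=1.5$ and $c=3$, and `mean(αs)` and `mean(cs)` should land near those values after a short burn-in. With a plotting package such as Plots.jl, `histogram(αs)` shows the estimated posterior of $\alpha$.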

(1.c) Compute the running means of $\alpha$ and $c$ as the number of iterations increases. In other words, plot $k\in \{1,\dots,\text{number of simulations}\}$ on the x-axis and $\frac{1}{k}\sum_{i=1}^{k}\alpha_i$ on the y-axis, and do the same for $c$.
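The running mean is a cumulative sum divided by the iteration index. In this sketch, `running_mean` and its `burnin` keyword are hypothetical names:

```julia
# Running (ergodic) mean of MCMC draws: entry k equals (1/k) Σ_{i=1}^{k} d[i],
# where d is the draw vector after discarding the first `burnin` draws.
function running_mean(draws::AbstractVector{<:Real}; burnin::Integer = 0)
    d = draws[burnin+1:end]
    return cumsum(d) ./ (1:length(d))
end
```

Given draws `αs` from a Gibbs run, `running_mean(αs)` is the y-axis series plotted against $k = 1, 2, \dots$. With Plots.jl, `plot(running_mean(αs))` draws it. If the chain has converged, the curve should settle down as $k$ grows.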

(1.d) Given the estimated $\alpha$ and $c$, generate data from $\text{Pareto}(x\vert\alpha,c)$. Plot the distribution of the generated data and compare it with the distribution of the data you downloaded.
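One simple approach is to plug point estimates, such as the posterior means, into the Pareto distribution and simulate a sample the same size as the data. You can then compare the two samples on a log-log plot of the empirical complementary CDF. A Pareto sample appears there as a roughly straight line with slope $-\alpha$, because $\log P(X>x)=\alpha\log c-\alpha\log x$. This sketch assumes that plug-in approach. Drawing a fresh $(\alpha, c)$ from the Gibbs output for each simulated point would instead give the posterior predictive distribution. All function names here are hypothetical.

```julia
using Random, Statistics

# Simulate n draws from Pareto(x | alpha_hat, c_hat) by inverse CDF.
simulate_pareto(rng::AbstractRNG, alpha_hat::Real, c_hat::Real, n::Integer) =
    c_hat .* (1 .- rand(rng, n)) .^ (-1 / alpha_hat)

# Empirical complementary CDF: for the i-th smallest value (assuming no ties),
# the fraction of points strictly greater than it is (n - i) / n.
# The last value is 0; drop it before taking logs for a log-log plot.
function empirical_ccdf(x::AbstractVector{<:Real})
    xs = sort(x)
    n = length(xs)
    return xs, (n .- (1:n)) ./ n
end

# Side-by-side quantiles of observed and simulated samples: (p, observed, simulated).
compare_quantiles(obs, sim; ps = [0.1, 0.25, 0.5, 0.75, 0.9]) =
    [(p, quantile(obs, p), quantile(sim, p)) for p in ps]
```

For example, `simulate_pareto(rng, mean(αs), mean(cs), length(x))` produces a plug-in sample to pass to `empirical_ccdf` and `compare_quantiles` alongside the observed data `x`.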

## Problem 2 (Time Series)

Let's examine the historical stock price of Google.

```julia
data = DataFrame(CSV.File("../data/GOOG.csv", header=true))
first(data, 6)
```

| | Date | Open | High | Low | Close | Adj Close | Volume |
|---:|---|---:|---:|---:|---:|---:|---:|
| | *Date…* | *Float64* | *Float64* | *Float64* | *Float64* | *Float64* | *Int64* |
| 1 | 2020-05-11 | 1378.28 | 1416.53 | 1377.15 | 1403.26 | 1403.26 | 1412100 |
| 2 | 2020-05-12 | 1407.12 | 1415.0 | 1374.77 | 1375.74 | 1375.74 | 1390600 |
| 3 | 2020-05-13 | 1377.05 | 1385.48 | 1328.4 | 1349.33 | 1349.33 | 1812600 |
| 4 | 2020-05-14 | 1335.02 | 1357.42 | 1323.91 | 1356.13 | 1356.13 | 1603100 |
| 5 | 2020-05-15 | 1350.0 | 1374.48 | 1339.0 | 1373.19 | 1373.19 | 1707700 |
| 6 | 2020-05-18 | 1361.75 | 1392.32 | 1354.25 | 1383.94 | 1383.94 | 1822400 |
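GARCH models describe the conditional variance of returns, not of price levels. A usual first step is therefore to convert the adjusted close into daily log returns $r_t=\log(P_t/P_{t-1})$. This sketch uses the six adjusted closing prices shown in the table. In the notebook, the full column would be `data[!, "Adj Close"]`. The name `log_returns` is hypothetical.

```julia
# Daily log returns r_t = log(P_t / P_{t-1}) from a price series (one fewer element).
log_returns(p::AbstractVector{<:Real}) = diff(log.(p))

# Adjusted closing prices from the six rows displayed above.
adj_close = [1403.26, 1375.74, 1349.33, 1356.13, 1373.19, 1383.94]
r = log_returns(adj_close)
```

A zero-mean GARCH specification is often fitted to the demeaned series `r .- sum(r) / length(r)`. Whether to include a mean term is a modeling choice.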


(2.a) Estimate the following models for the daily log returns of the adjusted closing price: GARCH(1,1), GARCH(1,2), GARCH(2,1), and GARCH(2,2).
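Estimating a GARCH model means maximizing its likelihood. The core of the likelihood is the conditional-variance recursion $\sigma_t^2=\omega+\sum_i \alpha_i \varepsilon_{t-i}^2+\sum_j \beta_j \sigma_{t-j}^2$ combined with a Gaussian density for each $\varepsilon_t$. The sketch below computes that log-likelihood for any number of lags. It assumes a zero-mean (demeaned) series, Gaussian innovations, and presample values set to the sample variance. `garch_loglik` is a hypothetical name. Textbooks and software differ on whether GARCH(p,q) assigns p to the lagged variances or to the lagged squared shocks, so check the convention of whatever package you use. The actual maximization over $\omega>0$, $\alpha_i\ge 0$, $\beta_j\ge 0$ needs a numerical optimizer such as Optim.jl, or a dedicated package such as ARCHModels.jl, and is not shown.

```julia
"""
    garch_loglik(ε, ω, α, β)

Gaussian log-likelihood of a zero-mean GARCH model with conditional variance
σ²_t = ω + Σ_i α[i] ε²_{t-i} + Σ_j β[j] σ²_{t-j}. Presample values of ε² and σ²
are set to the sample variance of ε. Returns (log-likelihood, vector of σ²_t).
"""
function garch_loglik(ε::AbstractVector{<:Real}, ω::Real,
                      α::AbstractVector{<:Real}, β::AbstractVector{<:Real})
    T = length(ε)
    s2 = sum(abs2, ε) / T              # sample variance of the zero-mean series
    σ2 = zeros(T)
    ll = 0.0
    for t in 1:T
        v = float(ω)
        for i in 1:length(α)           # ARCH terms: lagged squared shocks
            v += α[i] * (t > i ? ε[t - i]^2 : s2)
        end
        for j in 1:length(β)           # GARCH terms: lagged conditional variances
            v += β[j] * (t > j ? σ2[t - j] : s2)
        end
        σ2[t] = v
        ll -= 0.5 * (log(2π) + log(v) + ε[t]^2 / v)
    end
    return ll, σ2
end
```

With empty `α` and `β`, the function reduces to the log-likelihood of i.i.d. $N(0,\omega)$ data. This gives a quick check on the implementation.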

(2.b) Calculate the AIC for all four models above.

(2.c) Calculate the BIC for all four models above.
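Let $\hat\ell$ be the maximized log-likelihood, $k$ the number of estimated parameters, and $T$ the number of observations. The standard definitions are then $\text{AIC}=2k-2\hat\ell$ and $\text{BIC}=k\log T-2\hat\ell$, and smaller values indicate a better trade-off between fit and complexity. In this sketch the function names are hypothetical, and the parameter count assumes a zero-mean GARCH(p,q): $\omega$ plus $p+q$ lag coefficients.

```julia
# Information criteria from a maximized log-likelihood `ll`, parameter count `k`,
# and sample size `T`. Smaller is better for both.
aic(ll::Real, k::Integer) = 2 * k - 2 * ll
bic(ll::Real, k::Integer, T::Integer) = k * log(T) - 2 * ll

# Parameters of a zero-mean GARCH(p, q): ω plus p + q lag coefficients;
# add one more if a mean term is estimated as well.
garch_nparams(p::Integer, q::Integer; with_mean::Bool = false) = 1 + p + q + with_mean
```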

(2.d) Choose the optimal model, i.e., the order $(p,q)$ that minimizes either the AIC or the BIC, and explain your result.
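Selecting the order comes down to taking the candidate with the smallest criterion value. The hypothetical helper `select_order` takes a dictionary from $(p,q)$ to maximized log-likelihood and uses the zero-mean parameter count $k=1+p+q$:

```julia
# Return the (p, q) with the smallest AIC or BIC, plus all criterion values.
function select_order(logliks::AbstractDict, T::Integer; criterion::Symbol = :bic)
    crit = Dict{Tuple{Int,Int},Float64}()
    for ((p, q), ll) in logliks
        k = 1 + p + q                  # ω plus p + q lag coefficients (zero-mean model)
        if criterion === :aic
            crit[(p, q)] = 2 * k - 2 * ll
        elseif criterion === :bic
            crit[(p, q)] = k * log(T) - 2 * ll
        else
            throw(ArgumentError("criterion must be :aic or :bic"))
        end
    end
    # Pick the order with the smallest criterion value.
    best = first(keys(crit))
    for (order, value) in crit
        if value < crit[best]
            best = order
        end
    end
    return best, crit
end
```

AIC and BIC need not agree. BIC's per-parameter penalty $\log T$ exceeds AIC's penalty of 2 once $T\ge 8$, so BIC tends to favor smaller models.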

