# Estimation Tutorial

In this section, we dive into the topic of model estimation using **pydsge**.

For this tutorial, we assume a folder structure of the form:

```
analysis
|   README.md
|-- src/
|  |   estimation.py or .ipynb
|  |   model.yaml
|-- data/
|  |   example_data
|-- output/
```
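The layout above can also be created programmatically with `pathlib`. The following is a minimal sketch. The helper `make_analysis_tree` is hypothetical and not part of pydsge. The tree is built inside a fresh temporary directory so that nothing in your working directory is touched.

```python
import tempfile
from pathlib import Path


def make_analysis_tree(root: Path) -> Path:
    """Hypothetical helper: create the analysis/ layout of this tutorial below `root`."""
    base = root / "analysis"
    for sub in ("src", "data", "output"):
        # parents=True also creates analysis/; exist_ok=True makes reruns safe
        (base / sub).mkdir(parents=True, exist_ok=True)
    (base / "README.md").touch()  # empty placeholder README
    return base


# Build the tree inside a fresh temporary directory
base = make_analysis_tree(Path(tempfile.mkdtemp()))
print(sorted(p.name for p in base.iterdir()))  # ['README.md', 'data', 'output', 'src']
```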

```python
# Just for the tutorial: set up the example folder structure
import os
import tempfile
from pathlib import Path

# Temporary output folder (exist_ok=True so the cell can be rerun)
output_path = Path(tempfile.gettempdir(), 'analysis', 'output')
os.makedirs(output_path, exist_ok=True)
```

# Parsing and loading the model

Let us first load the relevant packages.

```python
from pathlib import Path  # For Windows/Unix compatibility
import pandas as pd
import numpy as np
import emcee  # For sampling from the posterior distribution

from pydsge import DSGE, example
```

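Now let us get started. pydsge ships with a prepared example: `example` is a pair holding the path to a model YAML file and the path to a matching data file. If you would like to use your own model or data, simply replace these paths with your own.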

```python
yaml, data_file = example
print(yaml)
print(data_file)
```

Next, parse the DSGE model from the YAML file. You can set the name and description of your model by assigning values to the corresponding attributes, and `mod.path` sets the directory in which output is stored.

```python
mod = DSGE.read(yaml)

mod.name = 'rank_test'
mod.description = 'RANK, crisis sample'

mod.path = output_path
```

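Next, load the data from `data_file`. Here, the `FFR` series of the original example data is modified: all values below the effective lower bound (ELB), given by the model parameter `elb_level`, are raised to that bound. Feel free to load your own data into the model instead. Finally, `mod.load_data` uses the sample starting in 1998Q1.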

```python
d0 = pd.read_csv(
    data_file, sep=";", index_col="date", parse_dates=True
).dropna()

# Adjust for the ELB: floor FFR at the effective lower bound
zlb = mod.get_par('elb_level')
rate = d0['FFR']
d0['FFR'] = np.maximum(rate, zlb)

mod.load_data(d0, start='1998Q1')
```
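The two data-preparation steps can be tried out on a tiny, made-up data set. This sketch reads a hypothetical `;`-separated table, with hypothetical columns `GDP` and `Infl` next to `FFR`, drops incomplete rows, and floors `FFR` at a hypothetical ELB of 0.05. In the tutorial, the ELB comes from `mod.get_par('elb_level')` instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data in the same layout as above: ';'-separated with a 'date' column
csv_text = """date;GDP;Infl;FFR
1998-01-01;0.5;0.4;1.20
1998-04-01;0.3;0.5;0.10
1998-07-01;;0.3;-0.05
1998-10-01;0.2;0.2;-0.20
"""
d0 = pd.read_csv(
    io.StringIO(csv_text), sep=";", index_col="date", parse_dates=True
).dropna()  # the 1998-07-01 row has a missing GDP value and is dropped

elb = 0.05  # hypothetical ELB value
d0["FFR"] = np.maximum(d0["FFR"], elb)  # element-wise floor at the ELB
print(d0["FFR"].tolist())  # [1.2, 0.1, 0.05]
```

`d0["FFR"].clip(lower=elb)` would give the same result. `np.maximum` is kept here to mirror the tutorial code.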

# Preparing the estimation

The `prep_estim` method specifies how the parameters of the model are estimated. The estimation method, filter type, seed, number of ensemble members and various other options can be set here; if no arguments are provided, the default values are used. The number of ensemble members for the TEnKF is set via `N` (default: 300). For Bayesian estimation, set `eval_priors=True`; if it is `False`, the parameters are estimated by maximum likelihood. Whether a linear or a nonlinear filter is used is controlled by the Boolean argument `linear`. A number of additional arguments can be passed to the method as well.
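The difference between the two settings of `eval_priors` can be illustrated on a toy problem. This is a generic sketch with a hypothetical one-parameter model, not pydsge's internal code. With priors, the objective is the log-likelihood plus the log-prior, which equals the log posterior up to a constant. Without priors, it is the log-likelihood alone. Maximizing each over a grid shows how the prior pulls the estimate towards the prior mean.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def objective(theta, data, eval_priors):
    """Toy objective: log-likelihood, plus log-prior if eval_priors is True."""
    loglik = sum(log_normal_pdf(y, theta, 1.0) for y in data)
    if not eval_priors:
        return loglik  # maximum likelihood: the likelihood alone
    logprior = log_normal_pdf(theta, 0.0, 1.0)  # hypothetical N(0, 1) prior
    return loglik + logprior  # Bayesian: log posterior up to a constant


data = [2.0]  # a single hypothetical observation
grid = [i / 100 for i in range(-300, 301)]
ml = max(grid, key=lambda t: objective(t, data, eval_priors=False))
mode = max(grid, key=lambda t: objective(t, data, eval_priors=True))
print(ml, mode)  # 2.0 1.0
```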

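`mod.filter.R` stores the covariance matrix of the measurement errors (ME). In the cell below, `mod.create_obs_cov(1e-1)` creates this matrix, and the last two lines divide the ME variance of `FFR` by 10, setting it to a small value.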

```python
mod.prep_estim(N=350, seed=0, verbose=True)

# Measurement-error covariance; shrink the ME variance of FFR by a factor of 10
mod.filter.R = mod.create_obs_cov(1e-1)
ind = mod.observables.index('FFR')
mod.filter.R[ind, ind] /= 1e1
```
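The indexing logic of the last two lines can be checked on a plain NumPy array. This sketch assumes a hypothetical list of observables and a diagonal covariance with 0.1 on the diagonal. The entries that `mod.create_obs_cov` actually produces may be computed differently.

```python
import numpy as np

# Hypothetical observables and a diagonal ME covariance (illustrative values only)
observables = ["GDP", "Infl", "FFR"]
R = np.diag(np.full(len(observables), 1e-1))

ind = observables.index("FFR")  # position of FFR in the observation vector
R[ind, ind] /= 1e1              # shrink only the FFR measurement-error variance

print(np.diag(R))  # FFR variance is now one tenth of the others
```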

# Running the estimation

Now that we have loaded the model and data and defined the type of estimation, we can turn to estimating the model. To be able to deal with very high-dimensional models, `pydsge` uses *Markov Chain Monte Carlo* (MCMC) methods to sample from the posterior distribution. For further information on MCMC, please refer to the `emcee` [website](https://emcee.readthedocs.io/en/stable/) and the additional resources provided there. We recommend running a **Tempered Ensemble MCMC** first, using the `tmcmc` method. This is particularly valuable for high-dimensional problems, since it provides well-placed initial states for the walkers in the parameter space, which can substantially improve the subsequent sampling. Because it is also efficient, we use it here even though our model is small.
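To make the idea of tempering concrete, here is a toy sampler whose target is prior × likelihood^β. At β = 0 it samples the prior, and at β = 1 it samples the posterior. Note that this swaps in a plain single-chain random-walk Metropolis algorithm for illustration. The `tmcmc` method of pydsge instead works with `emcee` ensemble moves across many walkers and uses its own temperature schedule, which this sketch does not reproduce. The prior, likelihood and observation `y = 2.0` are hypothetical.

```python
import math
import random


def log_prior(theta):
    """Hypothetical N(0, 1) prior, up to an additive constant."""
    return -0.5 * theta ** 2


def log_lik(theta, y=2.0):
    """Hypothetical likelihood of one observation y ~ N(theta, 1), up to a constant."""
    return -0.5 * (y - theta) ** 2


def log_tempered(theta, beta):
    """Tempered log target: beta = 0 gives the prior, beta = 1 the posterior."""
    return log_prior(theta) + beta * log_lik(theta)


def metropolis(beta, nsteps=20000, step=1.0, burn=1000, seed=0):
    """Random-walk Metropolis draws from the tempered target (single chain)."""
    rng = random.Random(seed)
    theta = 0.0
    lp = log_tempered(theta, beta)
    draws = []
    for i in range(nsteps):
        prop = theta + step * rng.gauss(0.0, 1.0)  # symmetric Gaussian proposal
        lp_prop = log_tempered(prop, beta)
        # Accept with probability min(1, target(prop) / target(theta))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        if i >= burn:
            draws.append(theta)  # keep draws after the burn-in phase
    return draws


# The exact tempered mean is beta * y / (1 + beta): 0, 2/3 and 1 for these betas
for beta in (0.0, 0.5, 1.0):
    draws = metropolis(beta)
    print(beta, round(sum(draws) / len(draws), 2))
```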

For ensemble sampling, a variety of options can be specified. Note that `tmcmc` always requires the first four arguments: i) the number of steps, ii) the number of walkers, iii) the number of temperatures, and iv) a temperature target. Here we do not want to set a target and therefore pass `fmax = None`. Moreover, we can choose different "moves", i.e. algorithms that update the walkers' coordinates. As a wrapper around much of `emcee`'s functionality, `tmcmc` works with many different moves; for a list and implementation details, please consult the `emcee` documentation. To use them here, pass them as a list of tuples, each containing a move and its weight. If no move is specified, `StretchMove` is used. For seed setting, the user can choose between three options; here we use the standard NumPy seed. Finally, `tmcmc` returns the final states of the walkers as a NumPy array, which we store in `p0` to pass them to the main sampling process.
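To our understanding, `emcee` picks one of the supplied moves at each step with probability proportional to its weight. The following sketch mimics that selection rule with plain strings standing in for the move objects. The helper `pick_move` is hypothetical and not part of `emcee` or pydsge.

```python
import random
from collections import Counter

# String stand-ins for emcee move objects, with the weights used in this tutorial
moves = [("DEMove", 0.8), ("DESnookerMove", 0.2)]


def pick_move(moves, rng):
    """Hypothetical helper: draw one move with probability proportional to its weight."""
    names, weights = zip(*moves)
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)
counts = Counter(pick_move(moves, rng) for _ in range(10000))
print({name: counts[name] / 10000 for name, _ in moves})  # roughly 0.8 and 0.2
```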

```python
fmax = None

# Mix of emcee moves with their weights
moves = [(emcee.moves.DEMove(), 0.8),
         (emcee.moves.DESnookerMove(), 0.2)]

p0 = mod.tmcmc(200, 200, 0, fmax, moves=moves, update_freq=100, lprob_seed='set')
mod.save()
```

As we can see, the output provides us with various important details. In particular, we learn that `mod.save()` stored the metadata of our model in the directory we specified earlier in `mod.path`. This information is stored as an `.npz` file, so it remains available even in the event of a crash and can be loaded at any time using `numpy.load()`.
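Loading such a file works as for any `.npz` archive. This sketch writes and reads back a file with hypothetical keys and a hypothetical file name. We do not know the exact keys or file name that `mod.save()` uses, so list them with `.files` first. If the archive contains pickled Python objects, `np.load` refuses to load them unless `allow_pickle=True` is passed. Only do that for files you trust.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name and contents; pydsge's actual keys may differ
path = os.path.join(tempfile.mkdtemp(), "rank_test_meta.npz")
np.savez(path, name="rank_test", tmcmc_final=np.zeros((4, 3)))

with np.load(path) as meta:  # NpzFile supports the context-manager protocol
    print(meta.files)        # names of the stored arrays
    final_states = meta["tmcmc_final"]
    name = str(meta["name"])
```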

We now use the initial states obtained above to conduct our full Bayesian estimation with `mcmc`. Initial states do not have to be specified, though: unless `mcmc` can identify previous runs or estimations, the initial values from the "prior" section of the `*.yaml` file are used. The default number of sampling steps is 3000, so it makes sense to run the sampler in parallel. If you want to avoid this, simply set `debug=True`. As before, setting the seed is essential for reproducible results. To keep this tutorial fast, the cell below uses only 20 steps.
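A common way to turn a single vector of initial values into starting states for an ensemble is to scatter the walkers in a small ball around it. This sketch shows that heuristic together with seeding. It is not necessarily what pydsge does internally. The helper `init_walkers`, its `scale`, and the initial values are all hypothetical.

```python
import numpy as np


def init_walkers(init_values, nwalks, scale=1e-3, seed=0):
    """Hypothetical helper: scatter `nwalks` walkers in a small Gaussian ball
    around `init_values` (one row per walker, one column per parameter)."""
    rng = np.random.default_rng(seed)  # seeded generator -> reproducible states
    init = np.asarray(init_values, dtype=float)
    return init + scale * rng.standard_normal((nwalks, init.size))


# Hypothetical initial values, e.g. as listed in the "prior" section of a model file
p0_a = init_walkers([1.5, 0.5, 0.8], nwalks=20, seed=0)
p0_b = init_walkers([1.5, 0.5, 0.8], nwalks=20, seed=0)
print(p0_a.shape, np.array_equal(p0_a, p0_b))  # (20, 3) True
```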

[*What is the purpose of "tune", "update_freq", "append"?*]

```python
mod.mcmc(p0,
         moves=moves,
         nsteps=20,  # the default is 3000; reduced here to keep the tutorial fast
         tune=500,
         update_freq=500,
         lprob_seed='set',
         append=True,
         debug=True)
mod.save()
```

```python
# Inspect all attributes currently stored on the model object
mod.__dict__
```

So where are our estimates? Remember that, so far, we have only drawn samples from the posterior distribution. Our converged (burnt-in) MCMC samples are currently stored in the `rank_test_sampler.h5` file created by `mcmc`. To obtain parameter estimates, we still need to draw a sample from the MCMC object. Here we draw 250 parameter vectors from the posterior.

```python
pars = mod.get_par('posterior', nsamples=250, full=True)
```
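Conceptually, drawing parameters from a stored ensemble chain amounts to discarding early steps, pooling all walkers, and picking rows at random. This sketch uses a simulated chain in `emcee`'s `(nsteps, nwalkers, ndim)` layout. The helper `draw_posterior`, the burn-in length and the with-replacement choice are hypothetical. They are not a description of how `get_par` works internally.

```python
import numpy as np

# Simulated stand-in for a stored chain: (nsteps, nwalkers, ndim)
rng = np.random.default_rng(0)
chain = rng.normal(size=(20, 40, 3))


def draw_posterior(chain, nsamples, burn=0, rng=None):
    """Hypothetical helper: pool walkers after `burn` steps, draw `nsamples` rows."""
    rng = np.random.default_rng() if rng is None else rng
    flat = chain[burn:].reshape(-1, chain.shape[-1])  # (kept steps * walkers, ndim)
    idx = rng.choice(flat.shape[0], size=nsamples, replace=True)  # with replacement
    return flat[idx]


pars = draw_posterior(chain, nsamples=250, burn=5, rng=rng)
print(pars.shape)  # (250, 3)
```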

Now, let's have a look at the estimated shocks. We can do this with `extract()`, which returns the smoothed shocks. This method takes a variety of arguments, all of which have sensible default values. For example, here we set the number of parameter draws in each verification sample to 1. [*is that correct?*]

Note also that the default seed is 0, which we simply use here.

```python
epsd0 = mod.extract(pars, nsamples=1)
mod.save_rdict(epsd0)
```

```python
mod.mode_summary()
```

```python
mod.mcmc_summary()
```
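At their core, posterior summaries like these report a few statistics per parameter. This sketch computes the mean, standard deviation and 5%/95% quantiles with pandas on simulated draws for three hypothetical parameters. The tables that `mode_summary()` and `mcmc_summary()` actually print may contain different columns.

```python
import numpy as np
import pandas as pd

# Simulated posterior draws for three hypothetical parameters
rng = np.random.default_rng(1)
draws = pd.DataFrame(
    rng.normal(loc=[1.5, 0.5, 0.8], scale=[0.2, 0.1, 0.05], size=(5000, 3)),
    columns=["sigma_c", "theta", "rho"],
)

# One row per parameter, one column per statistic
summary = pd.DataFrame({
    "mean": draws.mean(),
    "sd": draws.std(),
    "q05": draws.quantile(0.05),
    "q95": draws.quantile(0.95),
})
print(summary.round(3))
```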