# Deep Equilibrium Nets for Guvenen (2009)

#### By  Matias Covarrubias and Min Fang

In this notebook, we use `TensorFlow` to solve [Guvenen (2009)](http://users.econ.umn.edu/~guvenen/HABHET2008.pdf) with the _deep equilibrium nets_ method by [Azinovic, Gaegauf, & Scheidegger (2020)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3393482).


## The model <a id='model'></a>

GDSGE offers a good [summary](http://www.gdsge.com/example/Guvenen2009/Guvenen2009.html).


## Table of contents

0. [Set up workspace](#workspace)
1. [Model calibration](#modelcal)
2. [_Deep equilibrium net_ hyper-parameters](#deqnparam)
    1. [Neural network](#nn)
3. [Economic model](#econmodel)
    1. [Current period (t)](#currentperiod)
    2. [Next period (t+1)](#nextperiod)
    3. [Cost/Euler function](#cost)
4. [Training](#training)

## 0. Set up workspace <a id='workspace'></a>

First, we need to set up the workspace. All of the packages are standard Python packages. This version of the _deep equilibrium net_ notebook will be computed with `TensorFlow 2`. Make sure you are working in an environment with TF2 installed.

The only special module is `utils` from which we import a mini-batch function `random_mini_batches` and a function that initializes the neural network weights `initialize_nn_weight`.

### Saving and continuing training

You can save and resume training by saving and loading the TensorFlow session and the data starting point.
* The saved session stores the neural network weights and the optimizer's state. If you have saved a session that you would like to reload, set `sess_path` to the session checkpoint path. For example, to load the 100th episode's session, set `sess_path = './output/sess_100.ckpt'`. Otherwise, set `sess_path` to `None` to train from scratch. Currently, this script saves the session at the end of each [episode](#deqnparam).
* The saved data starting point stores an exogenous shock and a capital distribution, from which states can be simulated forward. If you have saved a starting point that you would like to reload, set `data_path` to the numpy data path. For example, to load the 100th episode's starting point, set `data_path = './output/data_100.npy'`. Otherwise, set `data_path` to `None` to train from scratch.
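
As a rough illustration of the starting-point convention described above, the sketch below saves a state with NumPy, reloads it, and recovers the episode number from the file name. The state layout, the temporary directory standing in for `./output`, and the helper name `episode_from_path` are assumptions made for this example and are not part of the notebook.

```python
import os
import re
import tempfile

import numpy as np


def episode_from_path(path):
    """Extract the episode number from a name like 'data_100.npy' (hypothetical helper)."""
    match = re.search(r'_(\d+)\.npy$', path)
    if match is None:
        raise ValueError('No episode number found in ' + path)
    return int(match.group(1))


# Illustrative starting point: one row holding [shock index, capital, bond position].
x_start = np.array([[1.0, 0.6, 0.35]])

out_dir = tempfile.mkdtemp()  # stands in for './output'
demo_path = os.path.join(out_dir, 'data_100.npy')
np.save(demo_path, x_start)

# Reload the starting point and infer which episode to resume from.
x_loaded = np.load(demo_path)
start_episode = episode_from_path(demo_path)
print(start_episode, x_loaded.shape)
```

Anchoring the pattern at the end of the file name keeps digits elsewhere in the path from being picked up by mistake.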

In [1]:
import tensorflow as tf
print('tf version:', tf.__version__)

tf version: 2.4.1


In [2]:
%matplotlib notebook

# Import modules
import os
import re
from datetime import datetime

import numpy as np
from utils import initialize_nn_weight, random_mini_batches

import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 12})
plt.rc('xtick', labelsize='small')
plt.rc('ytick', labelsize='small')
std_figsize = (4, 4)

# Set the seed for replicable results
seed = 0
np.random.seed(seed)
tf.random.set_seed(seed) # was "tf.set_random_seed(seed)" in tf1

# Helper variables
eps = 0.00001  # Small epsilon value

# Make output directory to save network weights and starting point
if not os.path.exists('./output'):
    os.mkdir('./output')

# Path to saved tensorflow session
sess_path = None
# Path to saved data starting point
data_path = None

## 1. Model calibration <a id='modelcal'></a>

In [3]:
### Parameters
alpha = 6
alpha_tf = tf.constant(alpha, dtype=tf.float32)  # cast: alpha is a Python int
beta = 0.9966
beta_tf = tf.constant(beta)
theta = 0.3
theta_tf = tf.constant(theta)
rho_h = 1/0.3
rho_h_tf = tf.constant(rho_h)
rho_n = 1/0.1
rho_n_tf = tf.constant(rho_n)
delta = 0.0066
delta_tf = tf.constant(delta)
mu = 0.2
mu_tf = tf.constant(mu)
phi_k = 0.4
phi_k_tf = tf.constant(phi_k)
chi = 0.005
chi_tf = tf.constant(chi)
Kbar = ((1/beta - 1 + delta)/theta)**(1/(theta-1))
Bbar = -0.1*(1-theta)*Kbar**theta
Kbar_tf = tf.constant(Kbar)
Bbar_tf = tf.constant(Bbar)
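
The definition of `Kbar` above inverts the condition $\theta K^{\theta-1} = 1/\beta - 1 + \delta$, i.e., the marginal product of capital (with productivity normalized to 1) equals the discount rate plus depreciation. The plain-Python sketch below checks this numerically. Reading `Kbar` as a deterministic steady-state capital stock is our interpretation of the formula, not something stated in the notebook.

```python
beta = 0.9966
theta = 0.3
delta = 0.0066

# Same formula as in the calibration cell.
Kbar = ((1 / beta - 1 + delta) / theta) ** (1 / (theta - 1))

# Marginal product of capital at Kbar versus the required gross return net of (1 - delta).
mpk = theta * Kbar ** (theta - 1)
user_cost = 1 / beta - 1 + delta
print(Kbar, mpk, user_cost)
```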

In [39]:
### Exogenous TFP shock
import quantecon
phi_z = 0.95
sigma_z = 0.05
x = quantecon.tauchen(phi_z,sigma_z,n=3)
Pi = x.P
Zgrid = np.exp(x.state_values)
Pi_tf = tf.constant(Pi, dtype=tf.float32)
Zgrid_tf = tf.constant(Zgrid, dtype=tf.float32)
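
For readers who want to see what the discretization does, here is a hand-rolled sketch of Tauchen's method for an AR(1) process in logs. It is not guaranteed to reproduce the output of `quantecon.tauchen` exactly; the function name `tauchen_sketch` and the grid half-width `m = 3` standard deviations are assumptions of this example.

```python
import math

import numpy as np


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tauchen_sketch(rho, sigma, n=3, m=3.0):
    """Discretize y' = rho * y + eps, eps ~ N(0, sigma^2), on n grid points.

    m is the grid half-width in unconditional standard deviations
    (an assumed default for this sketch).
    """
    sigma_y = sigma / math.sqrt(1.0 - rho ** 2)
    grid = np.linspace(-m * sigma_y, m * sigma_y, n)
    step = grid[1] - grid[0]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            upper = (grid[j] - rho * grid[i] + step / 2.0) / sigma
            lower = (grid[j] - rho * grid[i] - step / 2.0) / sigma
            if j == 0:
                P[i, j] = normal_cdf(upper)          # everything below the first midpoint
            elif j == n - 1:
                P[i, j] = 1.0 - normal_cdf(lower)    # everything above the last midpoint
            else:
                P[i, j] = normal_cdf(upper) - normal_cdf(lower)
    return grid, P


log_z_grid, Pi_sketch = tauchen_sketch(0.95, 0.05, n=3)
Zgrid_sketch = np.exp(log_z_grid)  # TFP levels, as in the cell above
print(Zgrid_sketch)
print(Pi_sketch.sum(axis=1))
```

Each row of the transition matrix sums to one because adjacent interval boundaries coincide, so the probabilities telescope.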

# TF2 is very different here!

We would use Model subclassing:

import tensorflow as tf

from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

## 2. _Deep equilibrium net_ hyper-parameters <a id='deqnparam'></a>

Here, we will briefly outline the _deep equilibrium net_ architecture and choice of hyper-parameters. By hyper-parameters, we refer to all of the parameters that can be chosen freely by the modeller. This is in contrast to "parameters", which refers to the neural network weights that are learned during training.
We found it helpful to augment the state we pass to the neural network with redundant information, such as aggregate variables and the distribution of financial wealth.
In this notebook, we do the same.

  * Neural network architecture (see NN diagram below):
    * The neural network takes an extended state that is 32-dimensional:
        * Natural state:
            * 1 dimension for the shock index,
            * 1 dimension for aggregate capital
            * 1 dimension for bond position
        * Redundant extension: (do we need this?)
            * 1 dimension for the current TFP,
            * 1 dimension for the current depreciation,
            * 1 dimension for aggregate capital,
            * 1 dimension for aggregate labor supply, 
            * 1 dimension for the return on capital,
            * 1 dimension for the wage,
            * 1 dimension for the aggregate production,
            * 6 dimensions for the distribution of financial wealth,
            * 6 dimensions for the distribution of labor income, 
            * 6 dimensions for the distribution of total income.
    * The first and second hidden layers have 100 and 50 nodes, respectively.
    * The output layer is 17-dimensional.
  * Training hyper-parameters:
    * We simulate 5'000 episodes.*
    * Per episode, we compute 20 epochs.
    * We use a minibatch size of 512.
    * Each episode is 10'240 periods long.
    * The learning rate is set to 0.00001.

\* In the _deep equilibrium net_ framework we iterate between a simulation and training phase (see [training](#training)). We first simulate a training dataset based on the current state of the neural network. That is, we simulate a random sequence of exogenous shocks for which we compute the capital holdings of the agents. We call the set of simulated periods an _episode_. 5'000 episodes means that we re-simulate the training data 5'000 times.
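
To make the scale of these choices concrete, the following sketch works out the implied number of minibatches and parameter updates. It assumes one gradient step per minibatch; the variable names beyond those in the hyper-parameter list are illustrative.

```python
num_episodes = 5000
len_episodes = 10240
epochs_per_episode = 20
minibatch_size = 512

# Minibatches per pass through one simulated episode.
num_minibatches = len_episodes // minibatch_size
# Gradient steps per episode (assuming one step per minibatch).
steps_per_episode = epochs_per_episode * num_minibatches
# Totals over the whole training run.
total_steps = num_episodes * steps_per_episode
total_simulated_periods = num_episodes * len_episodes
print(num_minibatches, steps_per_episode, total_steps, total_simulated_periods)
```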

Note that this calibration is not optimal for the economic model being solved. This script is intended to provide a simple introduction to _deep equilibrium nets_ and not an in-depth discussion on optimizing the performance.

<img src='analytic_NN1.png' style="width:1000px">

In [42]:
num_episodes = 5000 
len_episodes = 10240
epochs_per_episode = 20 
minibatch_size = 512
num_minibatches = int(len_episodes / minibatch_size)
lr = 0.00001

# Neural network architecture parameters
num_input_nodes = 3  # Dimension of extended state space
num_hidden_nodes = [100, 50]  # Dimension of hidden layers
num_output_nodes = 17  # Output dimension

### 2.A. Neural network <a id='nn'></a>

Since we are using a neural network with 2 hidden layers, the network maps:
$$X \rightarrow \mathcal{N}(X) = \sigma(\sigma(XW_1 + b_1)W_2 + b_2)W_3 + b_3$$
where $\sigma$ is the rectified linear unit (ReLU) activation function and the output layer is activated with the linear function (which is the identity function and hence omitted from the equation). Therefore, we need 3 weight matrices $\{W_1, W_2, W_3\}$ and 3 bias vectors $\{b_1, b_2, b_3\}$ (compare with the neural network diagram above). In total, we train $300+5000+850+167 = 6317$ parameters:

| Parameter | Shape | Count |
| --- | --- | --- |
| $W_1$ | $3 \times 100$ | $300$ |
| $W_2$ | $100 \times 50$ | $5000$ |
| $W_3$ | $50 \times 17$ | $850$ |
| $b_1 + b_2 + b_3$ | $100 + 50 + 17$ | $167$ |


We initialize the neural network parameters with the  `initialize_nn_weight` helper function from `utils`.

Then, we compute the neural network prediction using the parameters in `nn_predict`. 
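
The mapping $\mathcal{N}(X)$ can be sketched in plain NumPy, independent of any TensorFlow version. This is only an illustration of the forward pass and the parameter count: the random initialization and the names `forward`, `weights`, and `biases` are assumptions of the sketch, not the `utils` helpers used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [3, 100, 50, 17]  # input, two hidden layers, output

# Random weights and zero biases (the initialization scheme is an assumption of this sketch).
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]


def relu(x):
    return np.maximum(x, 0.0)


def forward(X):
    """ReLU on the hidden layers, identity on the output layer."""
    h = X
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return h @ weights[-1] + biases[-1]


X_demo = rng.random((4, 3))  # batch of 4 illustrative states
out = forward(X_demo)
n_params = sum(W.size for W in weights) + sum(b.size for b in biases)
print(out.shape, n_params)
```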

In [43]:
# We create a placeholder for X, the input data for the neural network, which corresponds
# to the state.
X = tf.placeholder(tf.float32, shape=(None, num_input_nodes))
# Get number samples
m = tf.shape(X)[0]

# We create all of the neural network weights and biases. The weights are matrices that
# connect the layers of the neural network. For example, W1 connects the input layer to
# the first hidden layer
W1 = initialize_nn_weight([num_input_nodes, num_hidden_nodes[0]])
W2 = initialize_nn_weight([num_hidden_nodes[0], num_hidden_nodes[1]])
W3 = initialize_nn_weight([num_hidden_nodes[1], num_output_nodes])

# The biases are extra (shift) terms that are added to each node in the neural network.
b1 = initialize_nn_weight([num_hidden_nodes[0]])
b2 = initialize_nn_weight([num_hidden_nodes[1]])
b3 = initialize_nn_weight([num_output_nodes])

# Then, we create a function, to which we pass X, that generates a prediction based on
# the current neural network weights. Note that the hidden layers are ReLU activated.
# The output layer is not activated (i.e., it is activated with the linear function).
def nn_predict(X):
    hidden_layer1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))
    hidden_layer2 = tf.nn.relu(tf.add(tf.matmul(hidden_layer1, W2), b2))
    output_layer = tf.add(tf.matmul(hidden_layer2, W3), b3)
    return output_layer


AttributeError: module 'tensorflow' has no attribute 'placeholder'

## 3. Economic model <a id='econmodel'></a>

In this section, we implement the economics outlined in [the model](#model).

Each period, based on the current distribution of capital and the exogenous state, agents decide whether and how much to save in risky capital and to consume. Their savings together with the labor supplied implies the rest of the economic state (e.g., capital return, wages, incomes, ...). We use the neural network to generate the savings based on the agents' capital holdings at the beginning of the period. The remaining economic state is computed using the equations outlined [above](#model). The economic mechanisms are encoded in helper functions. We create one for the `firm`, the `shocks`, and `wealth` (see cell below).

Then, in the face of future uncertainty, the agents again decide whether and how much to save in risky capital for the next period. We use the neural network to generate new savings for each of the future states (one for each shock). The input state for these network predictions is the next period's capital holdings, which are the current period's savings.

First, we define the helper functions for the `firm`, the `shocks`, and `wealth`.


In [None]:
"""        TO DO: replace by these
          Y = Z*(K^theta);              % output
  W = (1-theta)*Z*(K^theta);    % Wage = F_l
  Div = Y - W - Inv - (1-Pf)*chi*Kss;            % dividends
   b_h = (1-bn_shr)*chi*Kss/mu;
  b_n = bn_shr*chi*Kss/(1-mu);"""

def firm(K, eta, alpha, delta):
    """Calculate return, wage and aggregate production.
    
    r = alpha * eta * K^(alpha-1) * L^(1-alpha) + (1-delta)
    w = (1-alpha) * eta * K^(alpha) * L^(-alpha)
    Y = eta * K^(alpha) * L^(1-alpha) + (1-delta) * K

    Args:
        K: aggregate capital,
        eta: TFP value,
        alpha: output elasticity,
        delta: depreciation value.

    Returns:
        return: return (marginal product of capital), 
        wage: wage (marginal product of labor).
        Y: aggregate production.
    """
    L = tf.ones_like(K)

    r = alpha * eta * K**(alpha - 1) * L**(1 - alpha) + (1 - delta)
    w = (1 - alpha) * eta * K**alpha * L**(-alpha)
    Y = eta * K**alpha * L**(1 - alpha) + (1 - delta) * K

    return r, w, Y

def shocks(z, eta, delta):
    """Calculates tfp and depreciation based on current exogenous shock.

    Args:
        z: index of the current exogenous shock (in {0, 1, 2, 3}),
        eta: tensor of TFP values to sample from,
        delta: tensor of depreciation values to sample from.

    Returns:
        tfp: TFP value of exogenous shock, 
        depreciation: depreciation values of exogenous shock.


    """
    tfp = tf.gather(eta, tf.cast(z, tf.int32))
    depreciation = tf.gather(delta, tf.cast(z, tf.int32))
    return tfp, depreciation
    
def wealth(k, R, l, W):
    """Calculates the wealth of the agents.

    Args:
        k: capital distribution,
        R: matrix of return,
        l: labor distribution,
        W: matrix of wages.

    Returns:
        fin_wealth: financial wealth distribution,
        lab_wealth: labor income distribution,
        tot_income: total income distribution.
    """
    fin_wealth = k * R
    lab_wealth = l * W
    tot_income = tf.add(fin_wealth, lab_wealth)
    return fin_wealth, lab_wealth, tot_income


### 3.A. Current period (t) <a id='currentperiod'></a>
Using the current state `X` we can calculate the economy. The state is composed of today's shock ($z_t$) and today's capital ($k_t^h$). Note that this constitutes the minimal state. Often, including redundant variables is a simple way to increase the speed of convergence (see the [working paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3393482) for more information on this). However, this notebook attempts to present the most basic example for simplicity.

Using the current state `X`, we predict how much the agents save ($a_t^h$) based on their current capital holdings ($k_t^h$). We do this by passing the state `X` to the neural network.

Then we can calculate the agents' consumption. To do this, we need to calculate the aggregate capital ($K_t$), the return on capital paid by the firm ($r_t$), and the wages paid by the firm ($w_t$).
$$K_t = \sum_{h=0}^{A-1} k_t^h,$$
$$r_t = \alpha \eta_t K_t^{\alpha - 1} L_t^{1 - \alpha} + (1 - \delta_t),$$
$$w_t = (1 - \alpha) \eta_t K_t^{\alpha} L_t^{-\alpha}.$$

Aggregate labor $L_t$ is always 1 by construction. Then, we can calculate each agent's total income. Finally, we calculate the resulting consumption ($c_t^h$):
$$c_{t}^{h} = r_t k_t^h + w_t l^h_t - a_t^{h}.$$
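
A small NumPy example of these four equations follows. The parameter values, the four-agent economy, and the savings vector are made up for illustration; in the notebook the savings would come from the neural network.

```python
import numpy as np

alpha, eta, delta = 0.3, 1.0, 0.1        # illustrative values only
k = np.array([0.0, 0.5, 1.0, 1.5])       # capital holdings k_t^h
l = np.array([0.25, 0.25, 0.25, 0.25])   # labor endowments, summing to L_t = 1
a = np.array([0.1, 0.3, 0.5, 0.0])       # savings a_t^h (would come from the network)

# Aggregates and prices.
K = k.sum()
L = l.sum()
r = alpha * eta * K ** (alpha - 1) * L ** (1 - alpha) + (1 - delta)
w = (1 - alpha) * eta * K ** alpha * L ** (-alpha)

# Consumption is total income minus savings.
c = r * k + w * l - a

# Total resources: production plus undepreciated capital.
Y = eta * K ** alpha * L ** (1 - alpha) + (1 - delta) * K
print(c, c.sum() + a.sum(), Y)
```

Because the production function has constant returns to scale, factor payments exhaust total resources, so aggregate consumption plus aggregate savings equals `Y`.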

In [None]:
"""To do: input states Z K bn_shr """

# Today's extended state: 
z = X[:, 0]  # exogenous shock
tfp = X[:, 1]  # total factor productivity
depr = X[:, 2]  # depreciation
K = X[:, 3]  # aggregate capital
L = X[:, 4]  # aggregate labor
r = X[:, 5]  # return on capital
w = X[:, 6]  # wage
Y = X[:, 7]  # aggregate production
k = X[:, 8 : 8 + A]  # distribution of capital
fw = X[:, 8 + A : 8 + 2 * A]  # distribution of financial wealth
linc = X[:, 8 + 2 * A : 8 + 3 * A]  # distribution of labor income
inc = X[:, 8 + 3 * A : 8 + 4 * A]   # distribution of total income

""" TO DO:
policy functions c_h c_n Ps Pf Inv bn_shr_next lambdah lambdan;"""
# Today's assets: How much the agents save
# Get today's assets by executing the neural network
a = nn_predict(X)
# The last agent consumes everything they own
a_all = tf.concat([a, tf.zeros([m, 1])], axis=1)

# c_orig: the original consumption implied by the neural network's prediction. The
#     network can predict negative values before it learns not to. We ensure that
#     the network learns itself out of a bad region by penalizing negative
#     consumption, i.e., by including a penalty term on c_orig.
# c: the corrected version of c_orig, in which all negative consumption
#     values are set to ~0. If none of the consumption values are negative, then
#     c_orig == c.
c_orig = inc - a_all
c = tf.maximum(c_orig, tf.ones_like(c_orig) * eps)

### 3.B. Next period (t+1) <a id='nextperiod'></a>

Now that we have calculated the current economic variables, we simulate one period forward. Since we have 4 possible futures---1 for each of the possible shocks that can realize---we simulate forward for each of the 4 states. Hence, we repeat each calculation 4 times. Below, we first calculate the aggregate variables for the beginning of the period. Then, we calculate the economy for shock $z=1$. The calculations for shock 2, 3, and 4 are analogous and will not be commented on.

From the current period's savings $a_t$ we can calculate the next period's capital holdings $k_{t+1}^h$ and, consequently, aggregate capital $K_{t+1}$. Again, aggregate labor $L_{t+1}$ is 1 by construction.

Then, like above, we calculate economic variables for each shock $z_{t+1} \in \{1, 2, 3, 4\}$. That is, the new state `X'`$=[z_{t+1}, k_{t+1}^h]$ (together with the redundant extensions) is passed to the neural network to generate the agents' savings $a_{t+1}^h$. Then, we use `firm`, the `shocks`, and `wealth` to calculate the return on capital ($r_{t+1}$), the wages ($w_{t+1}$), and the agents' total income. Finally, consumption $c_{t+1}^h$ is calculated.
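
Since the same steps are repeated for every shock, they can be expressed as a loop over shock indices. The NumPy sketch below shows the idea for prices only; the TFP and depreciation grids, the helper name `firm_np`, and the parameter values are assumptions made for this illustration.

```python
import numpy as np

alpha = 0.3                                      # illustrative capital share
eta_grid = np.array([0.95, 1.05, 0.95, 1.05])    # assumed TFP value per shock
delta_grid = np.array([0.5, 0.5, 0.9, 0.9])      # assumed depreciation per shock


def firm_np(K, eta, alpha, delta):
    """Return on capital and wage with aggregate labor fixed at 1."""
    r = alpha * eta * K ** (alpha - 1) + (1 - delta)
    w = (1 - alpha) * eta * K ** alpha
    return r, w


# Tomorrow's aggregate capital for a batch of two states (column vector).
K_prime = np.array([[2.0], [3.0]])

next_period = []
for z_next in range(len(eta_grid)):
    r_next, w_next = firm_np(K_prime, eta_grid[z_next], alpha, delta_grid[z_next])
    next_period.append({'z': z_next, 'r': r_next, 'w': w_next})

print(len(next_period), next_period[0]['r'].shape)
```

Collecting the results in a list keeps one entry per future shock, which makes it straightforward to change the number of shocks later.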

In [None]:
""" Next period"
  Knext = (1-delta)*K + (a1*((Inv/K)^((xsi-1)/xsi))+a2)*K;
  bh_share_next is already defined
"""

# Today's savings become tomorrow's capital holding, but the first agent
# is born without a capital endowment.
k_prime = tf.concat([tf.zeros([m, 1]), a], axis=1)

# Tomorrow's aggregate capital
K_prime_orig = tf.reduce_sum(k_prime, axis=1, keepdims=True)
K_prime = tf.maximum(K_prime_orig, tf.ones_like(K_prime_orig) * eps)

# Tomorrow's labor
l_prime = tf.tile(labor_endow, [m, 1])
L_prime = tf.ones_like(K_prime)

# Shock 1 ---------------------------------------------------------------------
# 1) Get remaining parts of tomorrow's extended state
# Exogenous shock
"""TO DO: construct next period exogenous state Z_prime, the auxiliary variables, the full X next period, and policy functions
Also: figure out if we can use this to construct recursive preferences
construct loop for i in range(n_shocks):"""
z_prime_1 = 0 * tf.ones_like(z)

# TFP and depreciation
tfp_prime_1, depr_prime_1 = shocks(z_prime_1, eta, delta)

# Return on capital, wage and aggregate production
r_prime_1, w_prime_1, Y_prime_1 = firm(K_prime, tfp_prime_1, alpha, depr_prime_1)
R_prime_1 = r_prime_1 * tf.ones([1, A])
W_prime_1 = w_prime_1 * tf.ones([1, A])

# Distribution of financial wealth, labor income, and total income
fw_prime_1, linc_prime_1, inc_prime_1 = wealth(k_prime, R_prime_1, l_prime, W_prime_1)

# Tomorrow's state: Concatenate the parts together
x_prime_1 = tf.concat([tf.expand_dims(z_prime_1, -1),
                       tfp_prime_1,
                       depr_prime_1,
                       K_prime,
                       L_prime,
                       r_prime_1,
                       w_prime_1,
                       Y_prime_1,
                       k_prime,
                       fw_prime_1,
                       linc_prime_1,
                       inc_prime_1], axis=1)

# 2) Get tomorrow's policy
# Tomorrow's capital: capital holding at beginning of period and how much they save
a_prime_1 = nn_predict(x_prime_1)
a_prime_all_1 = tf.concat([a_prime_1, tf.zeros([m, 1])], axis=1)

# 3) Tomorrow's consumption
c_orig_prime_1 = inc_prime_1 - a_prime_all_1
c_prime_1 = tf.maximum(c_orig_prime_1, tf.ones_like(c_orig_prime_1) * eps)

#### Repeat for the remaining shocks
Then, we do the same for the remaining shocks ($z = 2,3,4$). Nothing changes in terms of the math for the remaining shocks.

In [None]:
# Shock 2 ---------------------------------------------------------------------
# 1) Get remaining parts of tomorrow's extended state
# Exogenous shock
z_prime_2 = 1 * tf.ones_like(z)

# TFP and depreciation
tfp_prime_2, depr_prime_2 = shocks(z_prime_2, eta, delta)

# return on capital, wage and aggregate production
r_prime_2, w_prime_2, Y_prime_2 = firm(K_prime, tfp_prime_2, alpha, depr_prime_2)
R_prime_2 = r_prime_2 * tf.ones([1, A])
W_prime_2 = w_prime_2 * tf.ones([1, A])

# distribution of financial wealth, labor income, and total income
fw_prime_2, linc_prime_2, inc_prime_2 = wealth(k_prime, R_prime_2, l_prime, W_prime_2)

# Tomorrow's state: Concatenate the parts together
x_prime_2 = tf.concat([tf.expand_dims(z_prime_2, -1),
                       tfp_prime_2,
                       depr_prime_2,
                       K_prime,
                       L_prime,
                       r_prime_2,
                       w_prime_2,
                       Y_prime_2,
                       k_prime,
                       fw_prime_2,
                       linc_prime_2,
                       inc_prime_2], axis=1)

# 2) Get tomorrow's policy
a_prime_2 = nn_predict(x_prime_2)
a_prime_all_2 = tf.concat([a_prime_2, tf.zeros([m, 1])], axis=1)

# 3) Tomorrow's consumption
c_orig_prime_2 = inc_prime_2 - a_prime_all_2
c_prime_2 = tf.maximum(c_orig_prime_2, tf.ones_like(c_orig_prime_2) * eps)

# Shock 3 ---------------------------------------------------------------------
# 1) Get remaining parts of tomorrow's extended state
# Exogenous shock
z_prime_3 = 2 * tf.ones_like(z)

# TFP and depreciation
tfp_prime_3, depr_prime_3 = shocks(z_prime_3, eta, delta)

# return on capital, wage and aggregate production
r_prime_3, w_prime_3, Y_prime_3 = firm(K_prime, tfp_prime_3, alpha, depr_prime_3)
R_prime_3 = r_prime_3 * tf.ones([1, A])
W_prime_3 = w_prime_3 * tf.ones([1, A])

# distribution of financial wealth, labor income, and total income
fw_prime_3, linc_prime_3, inc_prime_3 = wealth(k_prime, R_prime_3, l_prime, W_prime_3)

# Tomorrow's state: Concatenate the parts together
x_prime_3 = tf.concat([tf.expand_dims(z_prime_3, -1),
                       tfp_prime_3,
                       depr_prime_3,
                       K_prime,
                       L_prime,
                       r_prime_3,
                       w_prime_3,
                       Y_prime_3,
                       k_prime,
                       fw_prime_3,
                       linc_prime_3,
                       inc_prime_3], axis=1)

# 2) Get tomorrow's policy
# Tomorrow's capital: capital holding at beginning of period and how much they save
a_prime_3 = nn_predict(x_prime_3)
a_prime_all_3 = tf.concat([a_prime_3, tf.zeros([m, 1])], axis=1)

# 3) Tomorrow's consumption
c_orig_prime_3 = inc_prime_3 - a_prime_all_3
c_prime_3 = tf.maximum(c_orig_prime_3, tf.ones_like(c_orig_prime_3) * eps)

# Shock 4 ---------------------------------------------------------------------
# 1) Get remaining parts of tomorrow's extended state
# Exogenous shock
z_prime_4 = 3 * tf.ones_like(z)

# TFP and depreciation
tfp_prime_4, depr_prime_4 = shocks(z_prime_4, eta, delta)

# return on capital, wage and aggregate production
r_prime_4, w_prime_4, Y_prime_4 = firm(K_prime, tfp_prime_4, alpha, depr_prime_4)
R_prime_4 = r_prime_4 * tf.ones([1, A])
W_prime_4 = w_prime_4 * tf.ones([1, A])

# distribution of financial wealth, labor income, and total income
fw_prime_4, linc_prime_4, inc_prime_4 = wealth(k_prime, R_prime_4, l_prime, W_prime_4)

# Tomorrow's state: Concatenate the parts together
x_prime_4 = tf.concat([tf.expand_dims(z_prime_4, -1),
                       tfp_prime_4,
                       depr_prime_4,
                       K_prime,
                       L_prime,
                       r_prime_4,
                       w_prime_4,
                       Y_prime_4,
                       k_prime,
                       fw_prime_4,
                       linc_prime_4,
                       inc_prime_4], axis=1)

# 2) Get tomorrow's policy
# Tomorrow's capital: capital holding at beginning of period and how much they save
a_prime_4 = nn_predict(x_prime_4)
a_prime_all_4 = tf.concat([a_prime_4, tf.zeros([m, 1])], axis=1)

# 3) Tomorrow's consumption
c_orig_prime_4 = inc_prime_4 - a_prime_all_4
c_prime_4 = tf.maximum(c_orig_prime_4, tf.ones_like(c_orig_prime_4) * eps)


### 3.C. Cost / Euler function <a id='cost'></a>

#### tldr

We train the neural network in an unsupervised fashion by approximating the equilibrium functions. The cost function is composed of the Euler equation, and the punishments for negative consumption and negative aggregate capital.

***

The final key ingredient is the cost function, which encodes the equilibrium functions.

A loss function is needed to train a neural network. In supervised learning, the neural network's prediction is compared to the true value. The weights are updated in the direction that decreases the discrepancy between the prediction and the truth. In _deep equilibrium nets_, we approximate all equilibrium functions directly. That is, we update the weights in the direction that minimizes these equilibrium functions. Since the equilibrium functions are approximately 0 in equilibrium, we do not require labeled data and can train in an unsupervised fashion.

Our loss function has 3 components:

  * the Euler equation,
  * the punishments for negative consumption, and
  * the punishments for negative aggregate capital.
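
The two punishment terms share one pattern: the raw value is floored at a small positive number for use in the Euler equation, while any negative part is scaled by $1/\epsilon$ and added to the loss. A minimal NumPy sketch with made-up consumption values:

```python
import numpy as np

eps = 0.00001
c_orig = np.array([[0.4, -0.02, 0.1]])  # raw consumption, one entry is negative

# Floored consumption that is safe to raise to a negative power.
c = np.maximum(c_orig, eps)

# Penalty: zero where consumption is non-negative, large where it is negative.
punish_cons = (1.0 / eps) * np.maximum(-c_orig, 0.0)
print(c, punish_cons)
```

With `eps = 0.00001`, even a small violation of 0.02 produces a penalty of about 2000, which dominates the loss until the network moves out of the infeasible region.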

The relative error in the Euler equation is given by:
$$e_{\text{REE}}^i(\mathbf{x}_j) := \frac{u^{\prime -1}\left(\beta \mathbf{E}_{z_j}\left[r(\hat{\mathbf{x}}_{j,+})\,u^{\prime}(\hat{c}^{i+1}(\hat{\mathbf{x}}_{j,+}))\right]\right)}{\hat{c}^i(\mathbf{x}_j)}-1$$
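
The following NumPy sketch evaluates this error for a single agent, assuming CRRA marginal utility $u'(c) = c^{-\gamma}$. All numbers and the function name `rel_euler_error` are illustrative assumptions.

```python
import numpy as np

beta, gamma = 0.95, 2.0                    # illustrative values only
pi = np.array([0.25, 0.25, 0.25, 0.25])    # transition probabilities to each shock
r_next = np.array([1.02, 1.06, 0.98, 1.10])
c_next = np.array([1.00, 1.10, 0.90, 1.20])


def rel_euler_error(c_today):
    """Relative Euler error: u'^{-1}(beta * E[r' u'(c')]) / c - 1."""
    expected = np.sum(pi * r_next * c_next ** (-gamma))
    return (beta * expected) ** (-1.0 / gamma) / c_today - 1.0


# Consumption level at which the Euler equation holds exactly.
c_star = (beta * np.sum(pi * r_next * c_next ** (-gamma))) ** (-1.0 / gamma)
print(rel_euler_error(c_star), rel_euler_error(1.1 * c_star))
```

The error is unit-free: a value of $-0.09$ means today's consumption is about 9% above the level implied by the Euler equation.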

In [None]:
"""Put our equilibrium conditions
  err_bdgt_h = 1 - (W + (Div/mu) + b_h - Pf*(chi*Kss*(1-bn_shr_next)/mu))/c_h; % these are individual consumptions
  err_bdgt_n = 1 - (W + b_n - Pf*(bn_shr_next*chi*Kss/(1-mu)))/c_n;
  foc_stock = 1 - (beta*EEulerstock_future*(Evh_future^((alpha-rhoh)/(1-alpha))))/((c_h^(-rhoh))*Ps);
  foc_bondh = 1 - (beta*EEulerbondh_future*(Evh_future^((alpha-rhoh)/(1-alpha))) + lambdah)/((c_h^(-rhoh))*Pf);
  foc_bondn = 1 - (beta*EEulerbondn_future*(Evn_future^((alpha-rhon)/(1-alpha))) + lambdan)/((c_n^-rhon)*Pf);
  foc_f = 1 - (beta*EEulerf_future*(Evh_future^((alpha-rhoh)/(1-alpha))))/((c_h^(-rhoh))*dIdKp);
  
  slack_bn = lambdan*(bn_shr_next - bn_shr_lb);    %mun_lw*bn_shr_next;
  slack_bh = lambdah*(bn_shr_ub - bn_shr_next);    %mun_up*(1-bn_shr_next);
  
  ALSO: try weights for the functions"""

# Prepare transitions to the next period's states. In this setting, there is a 25% chance
# of ending up in any of the 4 states in Z. This has been hardcoded and needs to be changed
# to accommodate a different transition matrix.
pi_trans_to1 = p_transition * tf.ones((m, A-1))
pi_trans_to2 = p_transition * tf.ones((m, A-1))
pi_trans_to3 = p_transition * tf.ones((m, A-1))
pi_trans_to4 = p_transition * tf.ones((m, A-1))

# Euler equation
opt_euler = - 1 + (
    (
        (
            beta * (
                pi_trans_to1 * R_prime_1[:, 0:A-1] * c_prime_1[:, 1:A]**(-gamma) 
                + pi_trans_to2 * R_prime_2[:, 0:A-1] * c_prime_2[:, 1:A]**(-gamma) 
                + pi_trans_to3 * R_prime_3[:, 0:A-1] * c_prime_3[:, 1:A]**(-gamma) 
                + pi_trans_to4 * R_prime_4[:, 0:A-1] * c_prime_4[:, 1:A]**(-gamma)
            )
        ) ** (-1. / gamma)
    ) / c[:, 0:A-1]
)

# Punishment for negative consumption (c)
orig_cons = tf.concat([c_orig, c_orig_prime_1, c_orig_prime_2, c_orig_prime_3, c_orig_prime_4], axis=1)
opt_punish_cons = (1./eps) * tf.maximum(-1 * orig_cons, tf.zeros_like(orig_cons))

# Punishment for negative aggregate capital (K)
opt_punish_ktot_prime = (1./eps) * tf.maximum(-K_prime_orig, tf.zeros_like(K_prime_orig))

# Concatenate the 3 equilibrium functions
combined_opt = [opt_euler, opt_punish_cons, opt_punish_ktot_prime]
opt_predict = tf.concat(combined_opt, axis=1)

# Define the "correct" outputs. For all equilibrium functions, the correct output is zero.
opt_correct = tf.zeros_like(opt_predict)


#### Optimizer

Next, we choose an optimizer, i.e., the algorithm we use to perform gradient descent. We use [Adam](https://arxiv.org/abs/1412.6980), a favorite in deep learning research. Adam uses a parameter-specific learning rate and momentum, which encourages gradient descent steps that occur in a consistent direction.
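
To show what a single update does, here is a framework-free NumPy sketch of one Adam step combined with elementwise gradient clipping to $[-1, 1]$. The function name `adam_step` is hypothetical, and the values of `beta1`, `beta2`, and `eps_adam` are the defaults proposed in the Adam paper, assumed here rather than taken from the notebook.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.00001, beta1=0.9, beta2=0.999, eps_adam=1e-8):
    """One Adam update with elementwise gradient clipping to [-1, 1]."""
    grad = np.clip(grad, -1.0, 1.0)
    m = beta1 * m + (1 - beta1) * grad           # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps_adam)
    return param, m, v


param = np.array([1.0, -2.0])
m = np.zeros_like(param)
v = np.zeros_like(param)
grad = np.array([250.0, -0.5])  # the first entry is clipped to 1.0
param_new, m, v = adam_step(param, grad, m, v, t=1)
print(param_new - param)
```

On the first step the bias-corrected update is approximately `lr * sign(grad)` for every parameter, regardless of the gradient's magnitude.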

In [None]:
"""all the same"""

# Define the cost function
cost = tf.losses.mean_squared_error(opt_correct, opt_predict)

# Adam optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=lr)

# Clip the gradients to limit the extent of exploding gradients
gvs = optimizer.compute_gradients(cost)
capped_gvs = [(tf.clip_by_value(grad, -1., 1.), var) for grad, var in gvs]

# Define a training step
train_step = optimizer.apply_gradients(capped_gvs)


## 4. Training <a id='training'></a>

### tldr

We iterate between simulating a dataset using the current neural network and training on this dataset.

***

In the final stage, we put everything together and train the neural network. 

In this section, we iterate between a simulation phase and a training phase. That is, we first simulate a dataset. We simulate a sequence of states ($[z, k]$) based on a random sequence of shocks. The states are computed using the current state of the neural network. We created the `simulate_episodes` helper function to do this. Then, in the training phase, we use the dataset to update our network parameters through multiple epochs. After completion of the training phase, we resimulate a dataset using the new network parameters and repeat.

By computing the error directly after simulating a new dataset, we are able to evaluate our algorithm's out-of-sample performance.

First, we define the helper function `simulate_episodes` that simulates the training data used in an episode.
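
The shock-selection step inside such a simulation amounts to sampling from a Markov chain by comparing a uniform draw with cumulative transition probabilities. The NumPy sketch below isolates that step for an arbitrary number of shocks; the function name `simulate_shocks` and the uniform transition matrix are assumptions of this example.

```python
import numpy as np


def simulate_shocks(P, z0, n_periods, rng):
    """Draw a shock path from transition matrix P via cumulative probabilities."""
    cum_P = np.cumsum(P, axis=1)
    path = np.empty(n_periods, dtype=int)
    path[0] = z0
    draws = rng.random(n_periods)
    for t in range(1, n_periods):
        # First column whose cumulative probability exceeds the uniform draw.
        idx = np.searchsorted(cum_P[path[t - 1]], draws[t - 1], side='right')
        # Guard against the last cumulative entry falling just below 1.0.
        path[t] = min(idx, P.shape[1] - 1)
    return path


rng = np.random.default_rng(0)
P_uniform = np.full((4, 4), 0.25)   # 25% chance of each of the 4 shocks
path = simulate_shocks(P_uniform, 0, 1000, rng)
print(np.bincount(path, minlength=4))
```

Using the cumulative sum of each row avoids hard-coding one `if`/`elif` branch per shock.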

In [None]:
def simulate_episodes(sess, x_start, episode_length, print_flag=True):
    """Simulate an episode for a given starting point using the current
       neural network state.

    Args:
        sess: Current tensorflow session,
        x_start: Starting state to simulate forward from,
        episode_length: Number of steps to simulate forward,
        print_flag: Boolean that determines whether to print simulation stats.

    Returns:
        X_episodes: Tensor of states [z, k] to train on (training set).
    """
    time_start = datetime.now()
    if print_flag:
        print('Start simulating {} periods.'.format(episode_length))
    dim_state = np.shape(x_start)[1]

    X_episodes = np.zeros([episode_length, dim_state])
    X_episodes[0, :] = x_start
    X_old = x_start

    # Generate a sequence of random shocks
    rand_num = np.random.rand(episode_length, 1)

    for t in range(1, episode_length):
        z = int(X_old[0, 0])  # Current period's shock
        # TO DO: change simulation part
        # Determine which state we will be in in the next period based on
        # the shock and generate the corresponding state (x_prime)
        if rand_num[t - 1] <= pi_np[z, 0]:
            X_new = sess.run(x_prime_1, feed_dict={X: X_old})
        elif rand_num[t - 1] <= pi_np[z, 0] + pi_np[z, 1]:
            X_new = sess.run(x_prime_2, feed_dict={X: X_old})
        elif rand_num[t - 1] <= pi_np[z, 0] + pi_np[z, 1] + pi_np[z, 2]:
            X_new = sess.run(x_prime_3, feed_dict={X: X_old})
        else:
            X_new = sess.run(x_prime_4, feed_dict={X: X_old})
        
        # Append it to the dataset
        X_episodes[t, :] = X_new
        X_old = X_new

    time_end = datetime.now()
    time_diff = time_end - time_start
    if print_flag:
        print('Finished simulation. Time for simulation: {}.'.format(time_diff))

    return X_episodes

### The true analytic solution
This model can be solved analytically.
Therefore, in addition to the relative errors in the Euler equations, we can compare the solution learned by the neural network directly to the true solution.
The true policy is given by
\begin{align}
\mathbf{a}^{\text{analytic}}_t=
\beta
\begin{bmatrix}
\frac{1-\beta^{A-1}}{1-\beta^{A}} w_t \\
\frac{1-\beta^{A-2}}{1-\beta^{A-1}} r_t k^{1}_t \\
\frac{1-\beta^{A-3}}{1-\beta^{A-2}} r_t k^{2}_t \\
\dots \\
\frac{1-\beta^{1}}{1-\beta^2} r_t k^{A-2}_t \\
\end{bmatrix}.
\end{align}

In [None]:
beta_vec = beta_np * (1 - beta_np ** (A - 1 - np.arange(A-1))) / (1 - beta_np ** (A - np.arange(A-1)))
beta_vec = tf.constant(np.expand_dims(beta_vec, 0), dtype=tf.float32)
a_analytic = inc[:, : -1] * beta_vec

### Training the _deep equilibrium net_

Now we can begin training.

In [None]:
"""TO DO: change plots and number of shocks"""

# Helper variables for plotting
all_ages = np.arange(1, A+1)
ages = np.arange(1, A)

# Initialize tensorflow session
sess = tf.Session()

# Initialize interactive plotting
fig, ((ax1, ax2, ax3), (ax4, ax5, ax6), (ax7, ax8, ax9)) = plt.subplots(3, 3, figsize=(18, 18))
plt.ion()
fig.show()
fig.canvas.draw()

# Generate a random starting point
if data_path:
    X_data_train = np.load(data_path)
    print('Loaded initial data from ' + data_path)
    # Extract the episode number from a path such as './output/data_100.npy'
    start_episode = int(re.search(r'_(\d+)\.npy$', data_path).group(1))
else:
    X_data_train = np.random.rand(1, num_input_nodes)
    X_data_train[:, 0] = (X_data_train[:, 0] > 0.5)
    X_data_train[:, 1:] = X_data_train[:, 1:] + 0.1
    assert np.all(np.sum(X_data_train[:, 1:], axis=1) > 0), 'Starting point has non-positive aggregate capital (K)!'
    print('Calculated a valid starting point')
    start_episode = 0

train_seed = 0

cost_store, mov_ave_cost_store = [], []

time_start = datetime.now()
print('start time: {}'.format(time_start))

# Initialize the random variables (neural network weights)
init = tf.global_variables_initializer()

# Initialize saver to save and load previous sessions
saver = tf.train.Saver()

# Run the initializer
sess.run(init)

if sess_path is not None:
    saver.restore(sess, sess_path)
            
for episode in range(start_episode, num_episodes):
    # Simulate data: every episode uses a new training dataset, simulated with the
    # neural network's current parameters.
    X_episodes = simulate_episodes(sess, X_data_train, len_episodes, print_flag=(episode==0))
    X_data_train = X_episodes[-1, :].reshape([1, -1])
    k_dist_mean = np.mean(X_episodes[:, 8 : 8 + A], axis=0)
    k_dist_min = np.min(X_episodes[:, 8 : 8 + A], axis=0)
    k_dist_max = np.max(X_episodes[:, 8 : 8 + A], axis=0)
    
    ee_error = np.zeros((1, num_agents-1))
    max_ee = np.zeros((1, num_agents-1))

    for epoch in range(epochs_per_episode):
        # Every epoch is one full pass through the dataset. We train multiple passes on 
        # one training set before we resimulate a new dataset.
        train_seed += 1
        minibatch_cost = 0

        # Mini-batch the simulated data
        minibatches = random_mini_batches(X_episodes, minibatch_size, train_seed)

        for minibatch_X in minibatches:
            # Evaluate the cost on each mini-batch (no weights are updated here) and
            # accumulate the average over mini-batches.
            minibatch_cost += sess.run(cost, feed_dict={X: minibatch_X}) / num_minibatches
            if epoch == 0:
                # In the first epoch, store the mean and max Euler errors for plotting.
                # They are evaluated before any training step on this dataset, so the
                # errors are calculated out-of-sample.
                opt_euler_ = np.abs(sess.run(opt_euler, feed_dict={X: minibatch_X}))
                ee_error += np.mean(opt_euler_, axis=0) / num_minibatches
                mb_max_ee = np.max(opt_euler_, axis=0, keepdims=True)
                max_ee = np.maximum(max_ee, mb_max_ee)

        if epoch == 0:
            # Record the cost and moving average of the cost at the beginning of each
            # episode to track learning progress.
            cost_store.append(minibatch_cost)
            mov_ave_cost_store.append(np.mean(cost_store[-100:]))

        for minibatch_X in minibatches:
            # Take a mini-batch gradient descent training step. That is, update the
            # weights for one mini-batch.
            sess.run(train_step, feed_dict={X: minibatch_X})
            
    if episode % 20 == 0:
        # Plot the loss function
        ax1.clear()
        line_cost = ax1.plot(np.log10(cost_store), label='Cost')
        line_mov_ave = ax1.plot(np.log10(mov_ave_cost_store), label='Moving average')
        ax1.set_xlabel('Episodes')
        ax1.set_ylabel('Cost [log10]')
        ax1.legend(loc='upper right')

        # Plot the relative errors in the Euler equation
        ax2.clear()
        ee_mean_cost = ax2.plot(ages, np.log10(ee_error).ravel(), 'k-', label='mean')
        ee_max_cost = ax2.plot(ages, np.log10(max_ee).ravel(), '--', label='max')
        ax2.set_xlabel('Age')
        ax2.set_ylabel('Rel EE [log10]')
        ax2.set_xticks(ages)
        ax2.legend()

        # Plot the capital distribution
        ax3.clear()
        k_mean_cost = ax3.plot(all_ages, k_dist_mean, 'k-')
        k_min_cost = ax3.plot(all_ages, k_dist_min, 'k--')
        k_max_cost = ax3.plot(all_ages, k_dist_max, 'k--')
        ax3.set_xlabel('Age')
        ax3.set_ylabel('capital (k)')
        ax3.set_xticks(all_ages)
        
        # =======================================================================================
        # Sample 50 states and compare the neural network's prediction to the analytical solution
        pick = np.random.randint(len_episodes, size=50)
        random_states = X_episodes[pick, :]

        # Sort the states by the exogenous shock
        random_states_1 = random_states[random_states[:, 0] == 0]
        random_states_2 = random_states[random_states[:, 0] == 1]
        random_states_3 = random_states[random_states[:, 0] == 2]
        random_states_4 = random_states[random_states[:, 0] == 3]

        # Get corresponding capital distribution for plots
        random_k_1 = random_states_1[:, 8 : 8 + A]
        random_k_2 = random_states_2[:, 8 : 8 + A]
        random_k_3 = random_states_3[:, 8 : 8 + A]
        random_k_4 = random_states_4[:, 8 : 8 + A]

        # Generate a prediction using the neural network
        nn_pred_1 = sess.run(a, feed_dict={X: random_states_1})
        nn_pred_2 = sess.run(a, feed_dict={X: random_states_2})
        nn_pred_3 = sess.run(a, feed_dict={X: random_states_3})
        nn_pred_4 = sess.run(a, feed_dict={X: random_states_4})

        # Calculate the analytical solution
        true_pol_1 = sess.run(a_analytic, feed_dict={X: random_states_1})
        true_pol_2 = sess.run(a_analytic, feed_dict={X: random_states_2})
        true_pol_3 = sess.run(a_analytic, feed_dict={X: random_states_3})
        true_pol_4 = sess.run(a_analytic, feed_dict={X: random_states_4})

        ax_list = [ax4, ax5, ax6, ax7, ax8]
        # Plot both
        # ax_list holds five panels; zip stops there if A - 1 is larger
        for i, ax in zip(range(A - 1), ax_list):
            
            ax.clear()
            # Plot the true solution with a circle
            ax.plot(random_k_1[:, i], true_pol_1[:, i], 'ro', mfc='none', alpha=0.5, markersize=6, label='analytic')
            ax.plot(random_k_2[:, i], true_pol_2[:, i], 'bo', mfc='none', alpha=0.5, markersize=6)
            ax.plot(random_k_3[:, i], true_pol_3[:, i], 'go', mfc='none', alpha=0.5, markersize=6)
            ax.plot(random_k_4[:, i], true_pol_4[:, i], 'yo', mfc='none', alpha=0.5, markersize=6)
            # Plot the prediction of the neural net
            ax.plot(random_k_1[:, i], nn_pred_1[:, i], 'r*', markersize=2, label='DEQN')
            ax.plot(random_k_2[:, i], nn_pred_2[:, i], 'b*', markersize=2)
            ax.plot(random_k_3[:, i], nn_pred_3[:, i], 'g*', markersize=2)
            ax.plot(random_k_4[:, i], nn_pred_4[:, i], 'y*', markersize=2)
            ax.set_title('Agent {}'.format(i+1))
            ax.set_xlabel(r'$k_t$')
            ax.set_ylabel(r'$a_t$')
            ax.legend()

        ax9.axis('off')
        fig.canvas.draw()
        #========================================================================================

    # Print cost and time log
    print('Episode {}: log10(Cost): {:.4f}; time: {}; time since start: {}'.format(episode, 
                                                                                   np.log10(cost_store[-1]), 
                                                                                   datetime.now().time(), 
                                                                                   datetime.now() - time_start))

    if episode % 100 == 0:
        # Save the tensorflow session
        saver.save(sess, './output/sess_{}.ckpt'.format(episode))
        # Save the starting point
        np.save('./output/data_{}.npy'.format(episode), X_data_train)
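
The loop above accumulates the mean Euler error as the average of per-mini-batch means, and the max as a running element-wise maximum. The sketch below, on hypothetical random data, illustrates how that aggregation behaves. The function `running_stats` is an illustrative name, not part of this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical absolute Euler errors: 12 simulated states, 3 agents
errors = np.abs(rng.normal(size=(12, 3)))

def running_stats(batches):
    """Average of per-batch means and running element-wise max, as in the training loop."""
    n = len(batches)
    mean_acc = np.zeros((1, batches[0].shape[1]))
    max_acc = np.zeros((1, batches[0].shape[1]))
    for b in batches:
        mean_acc += np.mean(b, axis=0) / n
        max_acc = np.maximum(max_acc, np.max(b, axis=0, keepdims=True))
    return mean_acc, max_acc

equal = [errors[0:4], errors[4:8], errors[8:12]]       # equal-sized batches
unequal = [errors[0:5], errors[5:10], errors[10:12]]   # smaller last batch
mean_eq, max_eq = running_stats(equal)
mean_uneq, max_uneq = running_stats(unequal)
```

With equal-sized mini-batches the accumulated mean coincides with the mean over the whole dataset. With a smaller last batch it generally does not, because every batch receives the same weight. The running maximum is exact in both cases.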
