# Solving Krusell-Smith with Deep Equilibrium Nets
*Simon Lebastard, 02/07/2023*

## Krusell-Smith (1998) model
In the KS'98 model, we look at a flavour of the stochastic growth model with a continuum of households facing partially uninsurable risk.

**Demand side**
$$V(c) = \mathrm{E}_0\Big[\sum_{t=0}^{\infty}{\beta^t U(c_t)}\Big]$$
with
$$U(c) = \lim_{\nu \to \sigma}{\frac{c^{1-\nu} - 1}{1 - \nu}}$$
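The limit in the definition matters only at $\sigma = 1$, where the CRRA expression is of the form $0/0$ and reduces to $\log c$. A minimal pure-Python sketch of this utility and its marginal utility follows; the function names and the tolerance `tol` are illustrative assumptions, not part of the notebook's code.

```python
import math

def crra_utility(c: float, nu: float, tol: float = 1e-8) -> float:
    """CRRA utility (c^(1-nu) - 1) / (1 - nu), using the log limit when nu is close to 1."""
    if c <= 0:
        raise ValueError("consumption must be strictly positive")
    if abs(nu - 1.0) < tol:
        return math.log(c)
    return (c ** (1.0 - nu) - 1.0) / (1.0 - nu)

def crra_marginal_utility(c: float, nu: float) -> float:
    """Marginal utility U'(c) = c^(-nu), which is what enters the Euler equation."""
    return c ** (-nu)
```

Evaluating `crra_utility(2.0, 1.0001)` and `crra_utility(2.0, 1.0)` gives nearly the same number, which is a quick check that the two branches agree around $\nu = 1$.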

Agents are each endowed with $\epsilon \tilde{l}$ units of labor per period, where $\epsilon$ follows a first-order Markov chain. (In K&S'98, $\epsilon$ can only take two values, 0 and 1, representing the unemployed and employed idiosyncratic states, respectively.)
As a very large number of agents is considered, and as the idiosyncratic shocks are assumed independent, K&S'98 assume that the total numbers of employed and unemployed people remain constant over time. That is, there are no aggregate fluctuations in labor supply.

- Idiosyncratic endogenous state: capital holding $k$
- Idiosyncratic exogenous state: $\epsilon$ at the individual level
- $\Gamma$: the joint distribution of idiosyncratic states $(k,\epsilon)$

Given the aggregate states (see $z$ defined below), the consumer's optimization problem is:
$$v(k,\epsilon;\Gamma,z) = \max_{c,k'}{ \Big\{ U(c) + \beta\mathrm{E}\Big[v(k',\epsilon';\Gamma',z') \mid z,\epsilon \Big] \Big\} } $$
subject to the budget constraint, rational expectations with respect to the law of motion, and non-negative capital holdings.

The budget constraint reads:
$$c+k' = r(\hat{k},\hat{l},z)k + w(\hat{k},\hat{l},z)\tilde{l}\epsilon + (1-\delta)k$$

**Supply side**
A single-type good is produced using two factors of production: labor $l$ and capital $k$.
The good is produced according to a Cobb-Douglas production function:
$$y = zk^{\alpha}l^{1-\alpha}, \quad \alpha \in [0,1]$$
TFP is stochastic and is the source of aggregate risk. In K&S'98, two aggregate states are considered: $(z_b, z_g)$.
Again, $z$ follows a first-order Markov chain.

We assume the production market to be competitive, such that wages $w$ and rental rates $r$, both functions of aggregate states, are respectively determined by:
$$w(\hat{k},\hat{l},z) = (1 - \alpha)z\left(\frac{\hat{k}}{\hat{l}}\right)^\alpha$$
$$r(\hat{k},\hat{l},z) = \alpha z\left(\frac{\hat{k}}{\hat{l}}\right)^{\alpha-1}$$
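Under Cobb-Douglas technology with competitive factor markets, factor payments exhaust output: $r\hat{k} + w\hat{l} = y$. The sketch below checks this numerically in plain Python. The function name `cobb_douglas` and the input values are illustrative assumptions (only $\alpha = 0.3$ and $z = 1.1$ match values used in the notebook's code).

```python
def cobb_douglas(k: float, l: float, z: float, alpha: float):
    """Return output, rental rate and wage for aggregate capital k and labor l."""
    y = z * k ** alpha * l ** (1.0 - alpha)
    r = alpha * z * (k / l) ** (alpha - 1.0)   # marginal product of capital
    w = (1.0 - alpha) * z * (k / l) ** alpha   # marginal product of labor
    return y, r, w

# Illustrative aggregate quantities (hypothetical values)
k_hat, l_hat = 10.0, 0.5
y, r, w = cobb_douglas(k_hat, l_hat, z=1.1, alpha=0.3)

# Total factor payments should equal output up to floating-point error
factor_payments = r * k_hat + w * l_hat
```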

Here we assume that as the population of households is infinitely large, the share of unemployed remains constant. Moreover, here consumers have no disutility from labor, implying that the labor supply at each period is constant at $L_s = N*\mathrm{E}\big[\epsilon\big]$.
We have the market clearing condition for the capital/consumption good: $$\int{\big(c(k,\epsilon;\Gamma,z) + k'(k,\epsilon;\Gamma,z)\big)dF(\Gamma)} = (1-\delta)\int{k\,dF(\Gamma)} + z\Big(\int{k\,dF(\Gamma)}\Big)^{\alpha}L_s^{1-\alpha}$$

For both Markov chains, we assume the economy is already running at the stationary distribution.
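The stationary distribution $\pi$ of a Markov chain with transition matrix $P$ solves $\pi = \pi P$. A small pure-Python sketch that finds it by repeated multiplication is given below; the function name, the iteration cap and the tolerance are illustrative assumptions. The two matrices are the ones used for $z$ and $\epsilon$ in the notebook's code.

```python
def stationary_distribution(P, n_iter: int = 10_000, tol: float = 1e-12):
    """Iterate pi <- pi P from a uniform start until successive iterates agree."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

P_z = [[0.9, 0.1], [0.1, 0.9]]      # aggregate TFP chain
P_eps = [[0.5, 0.5], [0.5, 0.5]]    # idiosyncratic employment chain

pi_z = stationary_distribution(P_z)
pi_eps = stationary_distribution(P_eps)

# Stationary employment share, i.e. E[eps] when eps takes values 0 and 1
employment_share = pi_eps[1]
```

With these matrices both stationary distributions are uniform, so the employment share equals one half, in line with the labor supply $L_s = N\,\mathrm{E}[\epsilon]$ defined earlier.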

**Law of motion**
$$\Gamma' = H(\Gamma,z,z')$$


### Formulating the problem for solving with DEN
Here we will consider three "implicit" policy functions for which we solve:
- Next-period capital $k'(k,\epsilon;\Gamma,z)$
- The Lagrange multiplier on the next-period capital non-negativity constraint: $\mu_k(k,\epsilon;\Gamma,z)$
- The Lagrange multiplier on the positivity of current-period consumption, ie of the residual of the budget constraint: $\mu_c(k,\epsilon;\Gamma,z)$
Implicitly, we also have current-period consumption $c(k,\epsilon;\Gamma,z)$ as a policy function. Following Azinovic et al., however, we use the consumer's budget constraint to compute consumption in each period. As the utility function we are working with is locally nonsatiated, the consumer's budget constraint binds in every period.

As in Azinovic et al., we simulate $N$ agents, indexed by $i \in \{1,\dots,N\}$.

#### Idiosyncratic error terms
The Euler equation can be obtained by taking the FOC of the consumer's objective function with respect to next-period capital. From the budget constraint, one extra unit of $k'$ yields $1 + r' - \delta$ units of resources next period, with $r' \equiv r(\hat{k}',\hat{l}',z')$, so that:
$$1 = \beta\mathrm{E}\Big[(1 + r' - \delta)\Big(\frac{c(k,\epsilon;\Gamma,z)}{c(k',\epsilon';\Gamma',z')}\Big)^{\nu} \mid z,\epsilon\Big] + \mu_k(k,\epsilon;\Gamma,z)c(k,\epsilon;\Gamma,z)^{\nu}$$

Based on that, we define the Euler error as the left-hand side minus the right-hand side:
$$e_{EE}(k,\epsilon,\Gamma,z) \equiv 1 - \beta\mathrm{E}\Big[(1 + r' - \delta)\Big(\frac{c(k,\epsilon;\Gamma,z)}{c(k',\epsilon';\Gamma',z')}\Big)^{\nu} \mid z,\epsilon\Big] - \mu_k(k,\epsilon;\Gamma,z)c(k,\epsilon;\Gamma,z)^{\nu}$$
and
$$e_{EE,i} \equiv e_{EE}(k_i,\epsilon_i,\Gamma,z)$$
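The Euler error is the gap between the two sides of the Euler equation, with the conditional expectation taken over the joint next-period draw $(z', \epsilon')$. Below is a minimal pure-Python sketch for a single agent. The function name, the layout `c_next[zp][ep]`, and the convention of passing the return on capital per aggregate state as `gross_return_next` are illustrative assumptions; `p_z` and `p_eps` are the rows of the transition matrices for the agent's current $z$ and $\epsilon$.

```python
def euler_residual(c, c_next, gross_return_next, mu_k, p_z, p_eps, beta, nu):
    """Euler residual: 1 - beta * E[R' * (c / c')^nu] - mu_k * c^nu.

    c_next[zp][ep]        next-period consumption in state (z' = zp, eps' = ep)
    gross_return_next[zp] return on capital entering the Euler equation in state zp
    p_z[zp], p_eps[ep]    conditional transition probabilities
    """
    expectation = 0.0
    for zp, prob_z in enumerate(p_z):
        for ep, prob_e in enumerate(p_eps):
            ratio = (c / c_next[zp][ep]) ** nu
            expectation += prob_z * prob_e * gross_return_next[zp] * ratio
    return 1.0 - beta * expectation - mu_k * c ** nu

# Check on a hypothetical unconstrained case: if c' = (beta * R)^(1/nu) * c in
# every state and mu_k = 0, the Euler equation holds exactly.
beta, nu, R = 0.95, 2.0, 1.05
c_today = 1.0
c_tomorrow = (beta * R) ** (1.0 / nu) * c_today
res = euler_residual(
    c_today,
    [[c_tomorrow, c_tomorrow], [c_tomorrow, c_tomorrow]],
    [R, R],
    mu_k=0.0,
    p_z=[0.9, 0.1],
    p_eps=[0.5, 0.5],
    beta=beta,
    nu=nu,
)
```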

We define the error on the complementary-slackness condition on k as:
$$e_{CS_k}(k,\epsilon,\Gamma,z) \equiv \mu_k(k,\epsilon;\Gamma,z)k'(k,\epsilon;\Gamma,z)$$
and
$$e_{CS_k,i} \equiv e_{CS_k}(k_i,\epsilon_i,\Gamma,z)$$

We define the error on the complementary-slackness condition on c as:
$$e_{CS_c}(k,\epsilon,\Gamma,z) \equiv \mu_c(k,\epsilon;\Gamma,z)c(k,\epsilon;\Gamma,z)$$
and
$$e_{CS_c,i} \equiv e_{CS_c}(k_i,\epsilon_i,\Gamma,z)$$
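A product residual is zero whenever one of its two factors is zero, whatever the sign of the other. For instance, a zero multiplier paired with negative capital yields no error. The notebook's code also defines a helper `FB`, which has the form of the Fischer-Burmeister function: it vanishes exactly when both arguments are non-negative and at least one of them is zero. The pure-Python sketch below compares the two. The function names are illustrative, and whether the Fischer-Burmeister form should replace the product residual in the loss is left open here.

```python
import math

def product_residual(a: float, b: float) -> float:
    """Complementary-slackness residual written as a plain product."""
    return a * b

def fischer_burmeister(a: float, b: float) -> float:
    """a + b - sqrt(a^2 + b^2): zero if and only if a >= 0, b >= 0 and a * b = 0."""
    return a + b - math.sqrt(a * a + b * b)

# (multiplier, constrained variable) pairs: two admissible, two not
cases = [(0.0, 3.0), (2.0, 0.0), (1.0, 1.0), (0.0, -1.0)]
for a, b in cases:
    print(a, b, product_residual(a, b), fischer_burmeister(a, b))
```

In the last case the product residual is zero although the variable is negative, while the Fischer-Burmeister value is not zero.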

#### Aggregate error term
At each period, we compute the aggregate $K \equiv \sum_{i=1}^{N}{k_i}$

One way to proceed would be to define an error based on the market clearing condition on capital. Instead, we will enforce consumption to satisfy the market clearing condition at each period. Note that in doing so, we would still need to enforce that consumption is positive. Instead of adding a constraint with its own Lagrange multiplier on consumption, I build positivity into the market clearing (MC) error itself, by using a transformation of the error that satisfies:
$$\lim_{c \downarrow 0}{e_{MC}(c)} = \infty$$
This should prevent consumption from ever being non-positive.

We define the error on the market clearing condition as:
$$e_{MC} \equiv \sum_{i=1}^{N}{\Big[c(k_i,\epsilon_i;\Gamma,z) + k'(k_i,\epsilon_i;\Gamma,z)\Big]} - zK^{\alpha}L_s^{1-\alpha} - (1-\delta)K$$
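When each agent's consumption is backed out from the budget constraint at competitive factor prices, summing the budget constraints over agents reproduces the market clearing condition, so $e_{MC}$ is zero up to rounding. The pure-Python sketch below illustrates this with four hypothetical agents. The savings rule `k_next = 0.9 * k`, the unit labor endowment, and the use of the realized sum of $\epsilon_i$ as aggregate labor are assumptions made only for this check.

```python
def market_clearing_residual(c, k_next, k, z, labor, alpha, delta):
    """Aggregate uses (consumption + savings) minus aggregate resources."""
    K = sum(k)
    output = z * K ** alpha * labor ** (1.0 - alpha)
    return sum(ci + kpi for ci, kpi in zip(c, k_next)) - output - (1.0 - delta) * K

alpha, delta, z = 0.3, 0.1, 1.0
k = [0.5, 1.0, 2.0, 4.0]          # capital holdings (hypothetical)
eps = [1, 0, 1, 1]                # employment states (hypothetical)
K, labor = sum(k), float(sum(eps))

# Competitive factor prices at the aggregate capital-labor ratio
r = alpha * z * (K / labor) ** (alpha - 1.0)
w = (1.0 - alpha) * z * (K / labor) ** alpha

# Arbitrary savings rule; consumption is the budget-constraint residual
k_next = [0.9 * ki for ki in k]
c = [r * ki + w * ei + (1.0 - delta) * ki - kpi
     for ki, ei, kpi in zip(k, eps, k_next)]

residual = market_clearing_residual(c, k_next, k, z, labor, alpha, delta)
```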

#### Defining the loss function
On a batch $\mathcal{D}_{train}$, the loss function is defined as:
$$l(\theta) \equiv \frac{1}{\mid\mathcal{D}_{train}\mid} \sum_{x \in \mathcal{D}_{train}}{\Bigg[\frac{1}{N-1}\sum_{i=1}^{N}{\Big(e_{EE,i}^2 + e_{CS_k,i}^2 + e_{CS_c,i}^2\Big)} + e_{MC}^2\Bigg]}$$
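To make the aggregation explicit, here is a small pure-Python sketch of this loss. Each element of the batch is one simulated state carrying the per-agent errors and the scalar market clearing error. The dictionary keys and the function name `den_loss` are illustrative assumptions; the $1/(N-1)$ scaling follows the formula as written.

```python
def den_loss(batch):
    """Mean over states of (scaled sum of squared agent errors + squared MC error)."""
    total = 0.0
    for state in batch:
        n_agents = len(state["e_EE"])
        idio = sum(
            ee ** 2 + csk ** 2 + csc ** 2
            for ee, csk, csc in zip(state["e_EE"], state["e_CSk"], state["e_CSc"])
        )
        total += idio / (n_agents - 1) + state["e_MC"] ** 2
    return total / len(batch)

# Two hypothetical states with three agents each
batch = [
    {"e_EE": [1.0, 0.0, 0.0], "e_CSk": [0.0, 2.0, 0.0],
     "e_CSc": [0.0, 0.0, 0.0], "e_MC": 0.5},
    {"e_EE": [0.0, 0.0, 0.0], "e_CSk": [0.0, 0.0, 0.0],
     "e_CSc": [0.0, 0.0, 0.0], "e_MC": 0.0},
]
loss = den_loss(batch)
```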

**Note: alternative implementation**<br>
Instead of having a policy variable for the Lagrange multiplier on the non-negativity of next-period capital, we could apply a transformation function to next-period capital that ensures it remains non-negative at all times. I will implement this alternative method and compare results.
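The note does not say which transformation would be used. One common candidate is the softplus function, $\log(1 + e^{x})$, applied to the raw network output. The pure-Python sketch below shows a numerically stable form; treating softplus as the transformation is an assumption made for illustration, not the notebook's stated choice.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)); the result is never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Raw (unconstrained) network outputs mapped to admissible capital holdings
raw_outputs = [-50.0, -1.0, 0.0, 1.0, 50.0]
k_next = [softplus(x) for x in raw_outputs]
```

For large positive inputs the function is close to the identity, and for large negative inputs it approaches zero from above, so the constraint $k' \geq 0$ holds by construction.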

#### Defining network dimensions
As we make consumption implicit, we have the following dimensions:
- $2N+1$ scalar inputs: $(z, \big\{(k_i,\epsilon_i)\big\}_{i \in [1..N]})$
- $4N$ scalar outputs: $\big\{k_i', c_i, \mu_{k,i}, \mu_{c,i}\big\}_{i \in [1..N]}$

In [10]:
%matplotlib notebook

# Import modules
import os
import re
from datetime import datetime

import tensorflow as tf
import numpy as np

import matplotlib.pyplot as plt
from tqdm import tqdm
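# Use the Keras bundled with TensorFlow so that layers and tf.keras initializers come from the same package
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, BatchNormalization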
class Vector: pass

# Set the seed for replicable results
seed = 0
np.random.seed(seed)
tf.random.set_seed(seed)  # TensorFlow 2 replacement for tf.set_random_seed

In [6]:
N = num_agents = 1000
num_episodes = 5000 
len_episodes = 1000
epochs_per_episode = 20
minibatch_size = 512
num_minibatches = int(len_episodes / minibatch_size)
lr = 0.00001

# Neural network architecture parameters
num_input_nodes = 2*N + 1  # Input dimension: z plus (k_i, ε_i) for each agent
num_output_nodes = 2*N  # Output dimension: two policy outputs per agent

In [7]:
from numpy.random import binomial, multinomial

# Shocks structure
Z = np.array([0.9, 1.1])
z_l = Z[0]
z_h = Z[1]
P_z = np.array([[0.9, 0.1], [0.1, 0.9]])
def draw_z(z):
    z_id = int(z==z_h)
    # multinomial returns a one-hot vector; argmax recovers the index of the drawn state
    zp_id = np.argmax(multinomial(1,P_z[z_id,:]))
    return Z[zp_id]

E = np.array([0, 1])
ε_l = E[0]
ε_h = E[1]
P_ε = np.array([[0.5, 0.5], [0.5, 0.5]])
def draw_ε(size):
    return binomial(1,0.5,size)

# Other constants
α = tf.constant(0.3)
β = tf.constant(0.7)
δ = tf.constant(0.1)
γ = tf.constant(2.0)
L = tf.constant(N*E.mean(), dtype=tf.float32)  # aggregate labor supply, cast to match the float32 constants above


In [11]:
def firm(K,z):
    prod = z*tf.pow(K, α)*tf.pow(L, 1-α)
    r = z*α*tf.pow(K, α-1)*tf.pow(L, 1-α)
    w = z*(1-α)*tf.pow(K, α)*tf.pow(L, -α)
    return prod, r, w

## Neural Network - Architecture

### Compact network approach

In [None]:
initializer = tf.keras.initializers.GlorotUniform()
n_aggr_repr = 8

## AGGREGATE REPRESENTATION UNITS
# Common network processes distribution-relevant information
x_aggr = Input(shape = (2,))
xn_aggr = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,)(x_aggr)
aggr_1 = Dense(4*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer)(xn_aggr)
aggr_2 = Dense(4*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer)(aggr_1)
aggr_3 = Dense(2*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer)(aggr_2)
aggr_4 = Dense(n_aggr_repr, activation = 'tanh', kernel_initializer=initializer)(aggr_3)

## POLICY UNITS
# Agent-specific policy units
x_idio = Input(shape = (2,))
idio_aggr = tf.keras.layers.Concatenate(axis=1)([x_idio, aggr_4])  # (k_i, ε_i) joined with the aggregate representation
interp_c_h_1 = Dense(32, activation = 'tanh', kernel_initializer=initializer)(idio_aggr)
interp_c_h_2 = Dense(32, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_1)
policy = Dense(4, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_2)

nn1 = Model(inputs = (x_aggr,x_idio), outputs= policy)

### Brute-force MC-Distribution-processing neural network

In [None]:
n_aggr_reprb = 8

## AGGREGATE REPRESENTATION UNITS
# Common network processes distribution-relevant information
x_aggrb = Input(shape = (2,))
# Each layer must take the previous "b" tensor as input, so that nn2 is connected to x_aggrb
xn_aggrb = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,)(x_aggrb)
aggr_1b = Dense(4*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(xn_aggrb)
aggr_2b = Dense(4*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_1b)
aggr_3b = Dense(2*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_2b)
aggr_4b = Dense(n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_3b)

## POLICY UNITS
# Agent-specific policy units
x_idiob = Input(shape = (2*N,))
idio_aggrb = tf.keras.layers.Concatenate(axis=1)([x_idiob, aggr_4b])  # all agents' states joined with the aggregate representation
interp_c_h_1b = Dense(32, activation = 'tanh', kernel_initializer=initializer)(idio_aggrb)
interp_c_h_2b = Dense(32, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_1b)
policyb = Dense(2*N, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_2b)

nn2 = Model(inputs = (x_aggrb,x_idiob), outputs= policyb)  # the two Input tensors are passed as a tuple, not concatenated

In [None]:
class model():
    def __init__(self, nn: Model):
        self.nn = nn
        
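def forward(nn:Model, k:np.array, ε:np.array, z:float):
    # Evaluate the compact network for all agents in one batched call.
    # Aggregate input is (z, K); idiosyncratic input is (k_i, ε_i).
    k = tf.reshape(tf.cast(k, tf.float32), [-1])
    N = int(k.shape[0])
    ε = tf.broadcast_to(tf.cast(ε, tf.float32), [N])  # also accepts a scalar ε
    K = tf.math.reduce_sum(k)
    aggr_in = tf.tile(tf.reshape(tf.stack([tf.cast(z, tf.float32), K]), [1, 2]), [N, 1])
    idio_in = tf.stack([k, ε], axis=1)
    pol = nn((aggr_in, idio_in))
    # One column per policy output: k', μ_k, c, μ_c
    kp, mup, c, lambdap = tf.unstack(pol, axis=1)
    return kp, mup, c, lambdap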

def budget_residual(k: np.array, c: np.array, ε:int, kp: np.array, r:float, w:float):
    return w*ε + (1+r-δ)*k - kp - c

def FB(a:float, b:float):
    return a + b - tf.sqrt(tf.pow(a,2) + tf.pow(b,2))

def residuals(nn:Model, k:np.array, ε:np.array, z:float):
    # Work in float32 throughout, to match the model constants
    k = tf.cast(k, tf.float32)
    ε_id = np.asarray(ε).astype(int)  # index of each agent's current ε in E, as E = [0, 1]
    ε = tf.cast(ε, tf.float32)
    z_id = int(z==z_h)
    K = tf.math.reduce_sum(k)
    Y,r,w = firm(K,z)

    # Current-period policies, one entry per agent
    kp, mup, c, lambdap = forward(nn, k, ε, z)
    C = tf.math.reduce_sum(c)
    Kp = tf.math.reduce_sum(kp)
    BUDGET_RES = budget_residual(k, c, ε, kp, r, w)
    CSK_RES = mup*kp
    CSC_RES = c*lambdap

    # Expectation term of the Euler equation over next-period exogenous states (z', ε')
    ee_exp = 0.
    for zp_id, zp in enumerate(Z):
        Yp,rp,wp = firm(Kp,zp)
        for εp_id, εp in enumerate(E):
            # Next-period consumption if every agent draws ε' = εp
            _, _, cp, _ = forward(nn, kp, εp, zp)
            # Joint transition probability, one entry per agent
            prob = tf.constant(P_z[z_id,zp_id]*P_ε[ε_id,εp_id], dtype=tf.float32)
            # Return on one unit of k' implied by the budget constraint: 1 + r' - δ
            ee_exp += prob*(1+rp-δ)*tf.pow(c/cp, γ)

    EE_RES = 1 - β*ee_exp - mup*tf.pow(c, γ)
    MC_RES = C + Kp - (1-δ)*K - Y
    return EE_RES, BUDGET_RES, CSK_RES, CSC_RES, MC_RES

def J(nn: Model, z, k, ε, mb_size):
    # Average squared residuals along a simulated path of mb_size periods
    ERR = 0.
    for per_id in range(mb_size):
        EE_RES, BUDGET_RES, CSK_RES, CSC_RES, MC_RES = residuals(nn, k, ε, z)
        agent_err = tf.math.reduce_mean(EE_RES*EE_RES + BUDGET_RES*BUDGET_RES + CSK_RES*CSK_RES + CSC_RES*CSC_RES)
        ERR += (1./mb_size)*(agent_err + MC_RES*MC_RES)
        # Step the economy forward: next-period capital from the policy, then new shocks
        k = forward(nn, k, ε, z)[0]
        ε = draw_ε(N)
        z = draw_z(z)
    return ERR