# Solving Krussel-Smith with Deep Equilibrium Nets
*Simon Lebastard, 02/07/2023*

## Krussel-Smith (1998) model
In the KS'98 model, we look a flavour of the stochastic growth model with a continuum of households under partially uninsurable risk.

**Demand side**
$$V(c) = \mathrm{E}_0\Big[\sum_{t=0}^{\infty}{\beta^t U(c_t)}\Big]$$
with
$$U(c) = \lim_{\nu \to \sigma}{\frac{c^{1-\nu} - 1}{1 - \nu}}$$

Agents are each endowed with $\epsilon \tilde{l}$ units of labor per period, where $\epsilon$ is a first-order Markov chain. (In K&S'98, $\epsilon$ can only take two values: 0 and 1, representing unemployed and employed idiosyncratic states, respectively).
As a very large number of agents is consided, and by assumed independence of the idiosyncratic shocks, K&S'98 assume that the total number of employed and unemployed people remain constant over time. That is, there is no aggregate fluctuations of jobs supply.

Idiosyncratic endogenous state: $k$ holding of capital
Idiosyncratic exogenous state: $\epsilon$ at the individual level
$\Gamma$ the joint distribution of idiosyncratic states $(k,\epsilon)$

Given the aggregate states (see $z$ defined below), the consumer's optimization problem is:
$$v(k,\epsilon;\Gamma,z) = \max_{c,k'}{ \Big\{ U(c) + \beta\mathrm{E}\Big[v(k',\epsilon';\Gamma',z') \mid z,\epsilon \Big] \Big\} } $$
under budget constraint, rational expectations wrt law of motion and non-negative capital holding.

The budget constraint writes:
$$c+k' = r(\hat{k},\hat{l},z)k + w(\hat{k},\hat{l},z)\tilde{l}\epsilon + (1-\delta)k$$

**Supply side**
A single-type good is produced using two factors of production: labor $l$ and capital $k$.
The good is produced according to a Cobb-Douglas production function:
$$y = zk^{\alpha}l^{1-\alpha}, \quad \alpha \in [0,1]$$
The TFP is stochastic and coresponds to the source of aggregate risk. In K&S'98, two aggregate states are considered: $(z_b, z_g)$.
Again, $z$ follows a first-order Markov chain.

We assume the production market to be competitive, such that wages $w$ and rental rates $r$, both functions of aggregate states, are respectively determined by:
$$w(\hat{k},\hat{l},z) = (1 - \alpha)z(\frac{\hat{k}}{\hat{l}})^\alpha$$
$$r(\hat{k},\hat{l},z) = \alpha z(\frac{\hat{k}}{\hat{l}})^{\alpha-1}$$

Here we assume that as the population of households is infinitely large, the share of unemployed remains constant. Moreover, here consumers have no disutility from labor, implying that the labor supply at each period is constant at $L_s = N*\mathrm{E}\big[\epsilon\big]$.
We have the market clearing condition for the capital/cons good: $$\int{(c(k,\epsilon;\Gamma,z) + k'(k,\epsilon;\Gamma,z))dF(\Gamma)} = (1-\delta)\int{k dF(\Gamma)} + z\int{k dF(\Gamma)}^{\alpha}L_s^{1-\alpha}$$

For both Markov chains, we assume the economy is already running at the stationary distribution.

**Law of motion**
$$\Gamma' = H(\Gamma,z,z')$$


### Formulating the problem for solving with DEN
Here we will consider two "implicit" policy functions for which we solve:
- Next-period capital $k'(k,\epsilon;\Gamma,z)$
- The Lagrange multiplier on the next-period capital non-negativity constraint: $\mu_k(k,\epsilon;\Gamma,z)$
- The Lagrange multiplier on the positivity of current-period consumption, ie of the residual of the budget constraint: $\mu_c(k,\epsilon;\Gamma,z)$

As in Azinovic et al, we simulate $N$ agents, and index $i \in [1,...,N]$.

#### Idiosyncratic error terms
The Euler equation can be obtained by taking the FOC of the consumer's objective function with respect to next-period capital:
$$1 = \beta(1-\delta)\mathrm{E}\Big[\Big(\frac{c(k,\epsilon;\Gamma,z)}{c(k',\epsilon';\Gamma',z')}\Big)^{\nu} \mid z,\epsilon\Big] - \big(\mu_c(k,\epsilon;\Gamma,z) - \mu_k(k,\epsilon;\Gamma,z) \big)c(k,\epsilon;\Gamma,z)^{\nu}$$

Based on that, we defined the Euler error as:
$$e_{EE}(k,\epsilon,\Gamma,z) \equiv \bigg[\beta(1-\delta)\mathrm{E}\Big[\Big(\frac{c(k,\epsilon;\Gamma,z)}{c(k',\epsilon';\Gamma',z')}\Big)^{\nu} \mid z,\epsilon\Big] - \big(\mu_c(k,\epsilon;\Gamma,z) - \mu_k(k,\epsilon;\Gamma,z) \big)\bigg]^{-\frac{1}{\nu}} - 1$$
and
$$e_{EE,i} \equiv e_{EE}(k_i,\epsilon_i,\Gamma,z)$$
Here we want to train the model to capture binding constraints by itself. Note that in practice, during training we may end up with negative consumptions (that will be penalized), such that ratio $\frac{c(k,\epsilon;\Gamma,z)}{c(k',\epsilon';\Gamma',z')}^\nu$ may not be defined for $\nu \in \mathbb{R}$.
To solve this issue, I use RELU activation in the last two layers of the network, to enforce $c(k,\epsilon;\Gamma,z), k'(k,\epsilon;\Gamma,z), \mu_c(k,\epsilon;\Gamma,z), \mu_k(k,\epsilon;\Gamma,z)$ to be non-negative for all states.

We define the error on the complementary-slackness condition on k as:
$$e_{CS_k}(k,\epsilon,\Gamma,z) \equiv \frac{\mu_k(k,\epsilon;\Gamma,z)}{U'(\bar{c})} \frac{k'(k,\epsilon;\Gamma,z)}{\bar{k}}$$
and
$$e_{CS_k,i} \equiv e_{CS_k}(k_i,\epsilon_i,\Gamma,z)$$

We define the error on the complementary-slackness condition on c as:
$$e_{CS_c}(k,\epsilon,\Gamma,z) \equiv \frac{\mu_c(k,\epsilon;\Gamma,z)}{U'(\bar{c})} \frac{c(k,\epsilon;\Gamma,z)}{\bar{c}}$$
and
$$e_{CS_c,i} \equiv e_{CS_c}(k_i,\epsilon_i,\Gamma,z)$$

The agent's budget constraint is:
$$e_{BC}(k,\epsilon,\Gamma,z) \equiv \frac{1}{\hat{c}+\hat{k}}\bigg[ c + k' - \big( 1 + r(\hat{k}, \hat{l}, z) - \delta \big)k - w(\hat{k}, \hat{l}, z)\tilde{l}\epsilon \bigg] $$

#### Aggregate error term
At each period, we compute the aggregate $K \equiv \sum_{i=1}^{N}{k_i}$

One way to proceed could be to define an error based on the market clearing condition on capital. Instead, we will enforce consumption to satisfy the market clearing condition at each period. Note that in doing so, we could still need to enforce that consumption is positive. Instead of enforcing a new constraint on a Lagrangian multiplier associated with consumption, I compute the error on MC by constraining consumption to be positive in the error, by using a transformation of the error that satisfies:
$$\lim_{c \downarrow 0}{e_{MC}(c)} = \infty$$
This should prevent consumption from ever being non-positive.

We define the error on the market clearing condition as:
$$e_{MC} \equiv \sum_{i=1}^{N}{\Big[c(k_i,\epsilon_i;\Gamma,z) + k'(k_i,\epsilon_i;\Gamma,z)\Big]} - zK^{\alpha}(Nu)^{1-\alpha} - (1-\delta)K$$

We also define the error on the aggregate law of motion as:
$$e_{LM} \equiv $$

#### Defining the loss function
On a batch $\mathcal{D}_{train}$, the loss function is defined as:
$$l(\theta) \equiv \frac{1}{\mid\mathcal{D}_{train}\mid} \sum_{x \in \mathcal{D}_{train}}{\frac{1}{N-1}\sum_{i=1}^{N}{\Big(e_{EE,i}^2 + e_{CS_k,i}^2 + e_{CS_c,i}^2\Big)} + e_{MC}^2}$$

**Note: alternative implementation**<br>
Instead of having a policy variable for the Lagrange multiplier on the positivity of next-period capital, we could have a transformation function on  next-period capital that ensures it remains non-negative at all times. I will implement this alternative method and compare results.

#### Defining network dimensions
As we make consumption implicit, we have the following dimensions:
- $2N+1$ scalar inputs: $(z, \big\{(k_i,\epsilon_i)\big\}_{i \in [1..N]})$
- $4N$ scalar outputs: ${k_i', c_i, \mu_{k,i}, \mu_{c,i}}_{i \in [1..N]}$

In [39]:
# Import modules
import os
import re
from datetime import datetime

import tensorflow as tf
tf.config.run_functions_eagerly(True)
import numpy as np

import matplotlib.pyplot as plt
from tqdm import tqdm
from keras.models import Model
from keras.layers import * #Input, Dense, BatchNormalization
from tensorflow import Tensor

# Set the seed for replicable results
seed = 0
np.random.seed(seed)
#tf.set_random_seed(seed)

In [2]:
def firm(K:Tensor, z:Tensor):
    prod = z*tf.pow(K, α)*tf.pow(L, 1-α)
    r = z*α*tf.pow(K, α-1)*tf.pow(L, 1-α)
    w = z*(1-α)*tf.pow(K, α)*tf.pow(L, -α)
    return prod, r, w

In [3]:
from tensorflow.random import stateless_binomial

# Shocks structure
Z = tf.constant([0.9, 1.1], dtype=tf.float32)
z_l = Z[0]
z_h = Z[1]
P_z = tf.constant([[0.9, 0.1], [0.1, 0.9]], dtype=tf.float32)
def draw_z(z):
    z_id = int(z==z_h)
    zp_id = stateless_binomial(shape=[1], seed=[123, 456], counts=[1.], probs=[P_z[z_id,1]])[0]
    return Z[zp_id]

E = tf.constant([0, 1], dtype=tf.float32)
eps_l = E[0]
eps_h = E[1]
P_eps = tf.constant([[0.5, 0.5], [0.5, 0.5]], dtype=tf.float32)
def draw_eps(N: int):
    return stateless_binomial(
        shape=[N,],
        seed=[123, 456],
        counts=1,
        probs=0.5,
        output_dtype=tf.float32,
    )

# Other constants
α = tf.constant(0.3, dtype=tf.float32)
β = tf.constant(0.7, dtype=tf.float32)
δ = tf.constant(0.1, dtype=tf.float32)
γ = tf.constant(2.0, dtype=tf.float32)

2023-02-10 15:33:57.420116: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


## Neural Network - Architecture

The following is a specialization class of Keras' Model, with a custom training step with gradient taping

In [27]:
@tf.function
def budget_residual(k: Tensor, c: Tensor, eps:Tensor, kp: Tensor, r:float, w:float):
    return w*eps + (1+r-δ)*k - kp - c

@tf.function
def FB(a:float, b:float):
    return a + b - tf.sqrt(tf.pow(a,2) + tf.pow(b,2))

class DENModel(Model):

    @tf.function
    def __init__(self, inputs, *args, **kwargs):
        self.N = inputs[0].shape[1] - 1
        super().__init__(inputs, *args, **kwargs)

    @tf.function
    def initialize_k(self):
        k_ss = 2.
        return tf.Variable(tf.random.uniform(shape=[self.N,], minval=0.8*k_ss, maxval=1.2*k_ss))

    @tf.function
    def _forward_DEPRECATED(self, z:Tensor, k:Tensor, eps:Tensor, training:bool=False):
        kp = tf.Variable(tf.zeros_like(k))
        mup = tf.Variable(tf.zeros_like(k))
        c = tf.Variable(tf.zeros_like(k))
        lambdap = tf.Variable(tf.zeros_like(k))

        x_aggr = tf.concat([z[None],k],axis=0)
        x_aggr = tf.reshape(x_aggr, shape=[1,-1])
        for agent_id in range(self.N):
            x_idio = tf.concat([k[agent_id][None], eps[agent_id][None]], axis=0)
            x_idio = tf.reshape(x_idio, shape=[1,-1])
            pol = self(inputs=(x_aggr, x_idio), training=training)
            kp[agent_id].assign(pol[0,0])
            mup[agent_id].assign(pol[0,1])
            c[agent_id].assign(pol[0,2])
            lambdap[agent_id].assign(pol[0,3])
        return kp, mup, c, lambdap

    @tf.function
    def forward(self, z:Tensor, k:Tensor, eps:Tensor, training:bool=False):
        x_aggr = tf.concat([z[None],k],axis=0)
        x_aggr = tf.tile(tf.reshape(x_aggr, shape=[1,-1]), tf.constant([self.N,1], tf.int32))
        
        x_idio = tf.concat([tf.reshape(k,[-1,1]),tf.reshape(eps,[-1,1])],axis=1)
        y = self(inputs=(x_aggr, x_idio), training=training)
        return y

    @tf.function
    def residuals(self, z:Tensor, k:Tensor, eps:Tensor):
        K = tf.math.reduce_sum(k)
        if K<0:
            print("K: NEGATIVE AGGREGATE CAPITAL DESPITE RELU!!")
        Y,r,w = firm(K,z)
        #print("CURRENT-PERIOD AGGREGATE CAPITAL")
        #print(K)
        #print("CURRENT-PERIOD PRODUCTION, ROC, WAGE")
        #print(Y,r,w)

        # 1st forward pass
        yp = self.forward(z, k, eps, training=True)
        kp = yp[:,0]
        mup = yp[:,1]
        c = yp[:,2]
        lambdap = yp[:,3]
        C = tf.math.reduce_sum(c)
        Kp = tf.math.reduce_sum(kp)
        if Kp<0:
            print("Kp: NEGATIVE AGGREGATE CAPITAL DESPITE RELU!!")
        BUDGET_RES = budget_residual(k, c, eps, kp, r, w)
        CSK_RES = mup*kp
        CSC_RES = c*lambdap

        # For each possible value of next-period exogenous states, compute the next-period policy
        kpp = tf.Variable(tf.zeros((self.N,2)))
        Kpp = tf.Variable(tf.zeros((2,)))
        mupp = tf.Variable(tf.zeros((self.N,2)))
        lambdapp = tf.Variable(tf.zeros((self.N,2)))
        cp = tf.Variable(tf.zeros((self.N,2)))
        Cp = tf.Variable(tf.zeros((2,)))
        ee_comp = tf.Variable(tf.zeros((self.N,2)))
        # BUDGET_RES_COND = np.zeros((N,2))
        # CSK_RES_COND = np.zeros((N,2))
        # CSC_RES_COND = np.zeros((N,2))
        # MC_RES_COND = np.zeros((N,2))

        for zp_id, zp in enumerate(Z):
            Yp,rp,wp = firm(Kp,zp)
            epsp = draw_eps(self.N)
            ypp = self.forward(zp, kp, epsp, training=True)
            kpp[:,zp_id].assign(ypp[:,0])
            mupp[:,zp_id].assign(ypp[:,1])
            cp[:,zp_id].assign(ypp[:,2])
            lambdapp[:,zp_id].assign(ypp[:,3])
            ee_comp_tmp = tf.pow(c/cp[:,zp_id],γ)
            ee_comp[:,zp_id].assign(tf.where(tf.math.is_nan(ee_comp_tmp), 1e6*tf.ones_like(ee_comp_tmp), ee_comp_tmp))
            # BUDGET_RES_COND[:,zp_id] = budget_residual(kp, ypp[2], epsp, ypp[0], rp, wp)
            # CSK_RES_COND[:,zp_id] = mupp[:,zp_id]*kpp[:,zp_id]
            # CSC_RES_COND[:,zp_id] = cp[:,zp_id]*lambdapp[:,zp_id]

            Kpp[zp_id].assign(tf.math.reduce_sum(kpp[:,zp_id]))
            if Kpp[zp_id]<0:
                print("Kpp[{0:d}]: NEGATIVE AGGREGATE CAPITAL DESPITE RELU!!".format(zp_id))
            Cp[zp_id].assign(tf.math.reduce_sum(cp[:,zp_id]))
            # MC_RES_COND[zp_id] = Cp[zp_id] + Kpp[zp_id] - (1-δ)*Kp - Yp
        
        EE_RES = tf.pow(
            β*(1-δ)*tf.tensordot(ee_comp,tf.transpose(P_z[int(z==z_h),:]), axes=1) - (lambdap - mup),
            -1./γ
        ) - 1
        
        MC_RES = C + Kp - (1-δ)*K - Y
        return BUDGET_RES, CSK_RES, CSC_RES, MC_RES

    @tf.function
    def train_step(self, data, batch_size):
        z, k, eps = data
        ERR = 0
        for per_id in range(batch_size):
            y = self.forward(z, k, eps, training=False)
            BUDGET_RES, CSK_RES, CSC_RES, MC_RES = self.residuals(z, k, eps)
            # print("BUDGET RESIDUAL: ", BUDGET_RES*BUDGET_RES)
            # print("CS on K: ", CSK_RES*CSK_RES)
            # print("CS on C: ", CSC_RES*CSC_RES)
            # print("MK: ", MC_RES*MC_RES)
            # print("TOTAL ERROR FORM IS SOURCES: ", tf.math.reduce_mean(BUDGET_RES*BUDGET_RES + CSK_RES*CSK_RES + CSC_RES*CSC_RES))
            ERR += (1./batch_size)*tf.math.reduce_mean(BUDGET_RES*BUDGET_RES + CSK_RES*CSK_RES + CSC_RES*CSC_RES) + MC_RES*MC_RES
            k = y[:,0]
            eps = draw_eps(N)
            z = draw_z(z)
        return ERR, z, k, eps
    
    @tf.function
    def train(self, optimizer, n_epochs=1000, batch_size: int=64):
        z = z_h
        eps = draw_eps(self.N)
        k = self.initialize_k()
        metrics = {'mse': []}

        for epoch in tqdm(range(n_epochs)):
            with tf.GradientTape() as tape:
                ERR, z, k, eps = self.train_step([z, k, eps], batch_size)
            grads = tape.gradient(ERR, self.trainable_weights)
            optimizer.apply_gradients(zip(grads, self.trainable_weights))
            metrics['mse'].append(ERR.numpy())

            # Log every 200 batches.
            if epoch % 100 == 0:
                print(
                    "Training loss (for one batch) at epoch %d: %.4f"
                    % (epoch, float(ERR))
                )
                print("Total # time iterations: %d" % (batch_size*(1+epoch)))
            
        return metrics
    
lr = 0.00001
optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr,
)

### Compact network approach

In [28]:
N = num_agents = 500
L = N*tf.reduce_mean(E)
n_aggr_repr = 8

initializer = tf.keras.initializers.GlorotUniform()

## AGGREGATE REPRESENTATION UNITS
# Common network processes distribution-relevant information
x_aggr = Input(shape=(N+1, ), name='Distr-In')
#xn_aggr = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,)(x_aggr)
aggr_1 = Dense(units=4*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer, name='Distr-Dense1')(x_aggr)
aggr_2 = Dense(units=4*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer, name='Distr-Dense2')(aggr_1)
aggr_3 = Dense(units=2*n_aggr_repr, activation = 'tanh', kernel_initializer=initializer, name='Distr-Dense3')(aggr_2)
aggr_4 = Dense(units=n_aggr_repr, activation = 'tanh', kernel_initializer=initializer, name='Distr-Dense4')(aggr_3)

## POLICY UNITS
# Agent-specific policy units
x_idio = Input(shape = (2, ), name='Idio-In')
combined = Concatenate(name='Intermediate_Input')([aggr_4, x_idio])
interp_c_h_1 = Dense(units=32, input_dim=2+n_aggr_repr, activation = 'tanh', kernel_initializer=initializer, name='Policy-Dense1')(combined)
interp_c_h_2 = Dense(units=32, activation = 'tanh', kernel_initializer=initializer, name='Policy-Dense2')(interp_c_h_1)
interp_c_h_3 = Dense(units=32, activation = 'relu', kernel_initializer=initializer, name='Policy-Relu1')(interp_c_h_2)
policy = Dense(units=4, activation = 'relu', kernel_initializer=initializer, name='Policy-Relu2')(interp_c_h_3)

model = DENModel(inputs = [x_aggr,x_idio], outputs= policy)

In [29]:
model.summary()

Model: "den_model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 Distr-In (InputLayer)          [(None, 501)]        0           []                               
                                                                                                  
 Distr-Dense1 (Dense)           (None, 32)           16064       ['Distr-In[0][0]']               
                                                                                                  
 Distr-Dense2 (Dense)           (None, 32)           1056        ['Distr-Dense1[0][0]']           
                                                                                                  
 Distr-Dense3 (Dense)           (None, 16)           528         ['Distr-Dense2[0][0]']           
                                                                                        

### Brute-force MC-Distribution-processing neural network

In [30]:
# n_aggr_reprb = 8

## AGGREGATE REPRESENTATION UNITS
## Common network processes distribution-relevant information
# x_aggrb = Input(shape = (None,2))
# xn_aggrb = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,)(x_aggr)
# aggr_1b = Dense(4*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(xn_aggr)
# aggr_2b = Dense(4*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_1)
# aggr_3b = Dense(2*n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_2)
# aggr_4b = Dense(n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(aggr_3)

## POLICY UNITS
# Agent-specific policy units
# x_idiob = Input(shape = (None,2*N))
# interp_c_h_1b = Dense(32, input_dim=2*N+n_aggr_reprb, activation = 'tanh', kernel_initializer=initializer)(tf.concat(values=(x_idiob,aggr_4b),axis=1))
# interp_c_h_2b = Dense(32, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_1b)
# policyb = Dense(2*N, activation = 'tanh', kernel_initializer=initializer)(interp_c_h_2b)

# nn2 = DENModel(inputs = tf.concat(values=(x_aggrb,x_idiob)), outputs= policyb)

### Model training

In [31]:
k = model.initialize_k()
eps = draw_eps(N)

In [33]:
yp = model.forward(z_h, k, eps)

In [23]:
RES = model.residuals(z_h, k, eps)

In [24]:
ERR, z, k, eps = model.train_step([z_h,k,eps], batch_size=3)

Training time!

In [34]:
metrics = model.train(optimizer=optimizer, n_epochs=100, batch_size=16)

  1%|          | 1/100 [00:01<01:57,  1.19s/it]

Training loss (for one batch) at epoch 0: 1794371.0000
Total # time iterations: 16


100%|██████████| 100/100 [01:38<00:00,  1.02it/s]


In [35]:
metrics

{'mse': [1794371.0,
  490332.0,
  483397.97,
  476075.2,
  468594.34,
  461042.22,
  453463.25,
  445885.2,
  439635.47,
  433902.6,
  428245.4,
  422646.1,
  417103.06,
  411615.03,
  406180.94,
  400800.28,
  395472.78,
  390197.9,
  384975.72,
  379806.1,
  374689.06,
  369624.38,
  364611.9,
  359651.62,
  354743.2,
  349886.34,
  345080.84,
  340326.12,
  335621.53,
  330967.47,
  326484.0,
  322104.16,
  317777.53,
  313496.62,
  309284.22,
  305332.06,
  301417.94,
  297529.03,
  293661.5,
  289816.88,
  285996.03,
  282199.16,
  278994.2,
  276345.16,
  273779.0,
  271266.94,
  269080.25,
  267005.56,
  265003.12,
  263067.06,
  261159.02,
  259278.16,
  257423.42,
  255593.81,
  253816.86,
  252080.19,
  250371.0,
  248687.42,
  247029.56,
  245393.86,
  243780.88,
  242253.34,
  240751.62,
  239266.64,
  237799.31,
  236394.78,
  235031.3,
  233651.42,
  232255.3,
  230846.53,
  229473.28,
  228141.9,
  226832.61,
  225528.02,
  224228.55,
  222934.55,
  221646.36,
  220381.7

In [47]:
#fig, ax = plt.subplots(1,1)
plt.plot(
    range(100),
    metrics['mse'],
)
plt.xscale('log')
plt.yscale('log')
plt.grid()

<IPython.core.display.Javascript object>