# 4 Heterogeneous Agents 
Consider the environment in section 5. We need to solve the following:
\begin{align}
    \begin{split}
    {L}^e_{HJB} \left(v^e, v^h, \kappa, \chi;x \right) &= \frac{\rho_e}{1-\rho_e} \delta_e^{1 / \rho_e} \exp\left[\left(1- \frac{1}{\rho_e}\right)v^e\right]-\frac{\delta_e}{1-\rho_e}+r\\
    &+\frac{1}{2 \gamma_e} \frac{\left(\Delta^e+\pi^h \cdot \sigma_R\right)^2}{\left\|\sigma_R\right\|^2}
+\left[\mu_X+\frac{1-\gamma_e}{\gamma_e}\left(\frac{\Delta^e+\pi^h \cdot \sigma_R}{\left\|\sigma_R\right\|^2}\right) \sigma_X \sigma_R\right] \cdot \partial_x v^e \\
    &+\frac{1}{2}\left[\operatorname{tr}\left(\sigma_X^{\prime} \partial_{xx^{\prime}} v^e \sigma_X\right)+\frac{1-\gamma_e}{\gamma_e}\left(\sigma_X^{\prime} \partial_x v^e\right)^{\prime}\left[\gamma_e \mathbb{I}_d+\left(1-\gamma_e\right) \frac{\sigma_R \sigma_R^{\prime}}{\left\|\sigma_R\right\|^2}\right] \sigma_X^{\prime} \partial_x v^e\right]= 0
\end{split}\\
\begin{split}
    {L}^h_{HJB} \left(v^e, v^h, \kappa, \chi;x \right) &=\frac{\rho_h}{1-\rho_h} \delta_h^{1 / \rho_h}  \exp\left[\left(1- \frac{1}{\rho_h}\right)v^h\right]-\frac{\delta_h}{1-\rho_h}+r+\frac{1}{2 \gamma_h}\|\pi^h\|^2\\
    &+\left[\mu_X+\frac{1-\gamma_h}{\gamma_h} \sigma_X \pi^h\right] \cdot \partial_x v^h +\frac{1}{2}\left[\operatorname{tr}\left(\sigma_X^{\prime} \partial_{xx^{\prime}} v^h \sigma_X\right)+\frac{1-\gamma_h}{\gamma_h}\left\|\sigma_X^{\prime} \partial_x v^h\right\|^2\right]=0
\end{split} \\
\begin{split}
    {L}_{\kappa} \left(v^e, v^h,\kappa, \chi;x \right) &= \min\Big\{ 1 - \kappa, \, w\gamma_h (1-\chi\kappa) | \sigma_R |^2 - (1-w) \gamma_e \chi \kappa | \sigma_R |^2  \\
	\qquad &+ w(1-w) \frac{\alpha_e - \alpha_h}{\underline{\chi} q} + w(1-w) \left( \sigma_x \sigma_R \right) \cdot \left[ (\gamma_h-1)\partial_x \upsilon^h -  (\gamma_e-1)\partial_x \upsilon^e \right] \Big\} = 0\\
\end{split} \\
\begin{split}
    {L}_{\chi} \left(v^e, v^h, \kappa, \chi;x \right) &= \min\Big\{ \chi - \underline{\chi}, \, \Big[ ((1-w)\gamma_e + w\gamma_h) | D_{z} |^2 + (\partial_w \log q) D_{\upsilon,z} - D_{\upsilon,w} \Big](\chi - w) \\
 \quad &+ w(1-w) (\gamma_e - \gamma_h) | D_{z} |^2 - D_{\upsilon,z} \Big\} = 0
\end{split}
\end{align}

where the auxiliary terms are defined as:

\begin{align}
 D_{z} &\doteq \sqrt{z_2}\sigma_k + \sigma_{z}' \partial_{z} \log q \\
 D_{\upsilon,w} &\doteq w(1-w) | D_{z} |^2 \partial_w  \big[ (\gamma_h - 1) \upsilon^h - (\gamma_e - 1)\upsilon^e \big] \\
 D_{\upsilon,z} &\doteq w(1-w) \left(\sigma_{z}D_{z} \right) \cdot \partial_{z} \big[ (\gamma_h - 1) \upsilon^h - (\gamma_e - 1)\upsilon^e  \big]
\end{align}

Since $\chi$ can be solved for algebraically, we solve only equations (1) to (3): the two HJB equations and the $\kappa$ condition.
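
The $\min\{\cdot,\cdot\}$ form of equations (3) and (4) encodes a complementarity condition: $\min\{a, b\} = 0$ holds exactly when both terms are nonnegative and at least one of them is zero. A minimal NumPy sketch of such a residual is given below. The function name `complementarity_residual` and the sample values are illustrative only and are not taken from the original code.

```python
import numpy as np

def complementarity_residual(slack, condition):
    """Pointwise min{slack, condition}; zero iff both are >= 0 and at least one is 0."""
    return np.minimum(slack, condition)

# Hypothetical values at three sample points.
kappa = np.array([1.0, 0.6, 1.0])
condition = np.array([0.3, 0.0, -0.2])  # stand-in for the second argument of the min in (3)

# Point 1: constraint binds, condition positive  -> residual 0
# Point 2: interior kappa, condition exactly zero -> residual 0
# Point 3: condition negative                     -> nonzero residual (violation)
residual = complementarity_residual(1.0 - kappa, condition)
print(residual)
```

A nonzero entry flags a point where the complementarity condition fails, which is what the penalized loss term is meant to drive toward zero.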

## Model Architecture
We modify the `DeepGalerkinMethod` code from https://github.com/alialaradi/DeepGalerkinMethod. We construct an object `sim_NN` of class `DGMNet` with the following hyperparameters:

```{list-table}
:header-rows: 1

* - Input
  - Description
  - Parameter used in paper
* - `n_layers`
  - Number of layers
  - 2
* - `units`
  - Number of neurons in each layer
  - 16
* - `input_dim`
  - Dimension of input into first layer
  - 3 (This should be the same as the number of states)
* - `activation`
  - Activation function for all layers except the last
  - tanh
* - `final_activation`
  - Activation function for final layer
  - Identity function for first two dimensions; sigmoid for third dimension. This is so that...
* - `seed`
  - Seed for weight and bias initialization
  - 256
```

Weights are initialized with a Glorot normal initializer and biases with a Glorot uniform initializer.
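
To make the output layer concrete, the following NumPy sketch applies the mixed final activation from the table (identity on the first two outputs, sigmoid on the third) to a Glorot-initialized linear layer. It is an illustration under stated assumptions, not the `DGMNet` implementation: the helper names are hypothetical, a plain (untruncated) normal draw is used for the weights, and the fan sizes used for the bias are an assumption.

```python
import numpy as np

rng = np.random.default_rng(256)  # seed value from the table

def glorot_normal(fan_in, fan_out):
    # Normal draw with the Glorot standard deviation sqrt(2 / (fan_in + fan_out)).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def glorot_uniform_bias(units):
    # Assumption: a length-`units` bias is treated as fan_in = fan_out = units.
    limit = np.sqrt(6.0 / (2 * units))
    return rng.uniform(-limit, limit, size=units)

def mixed_final_activation(z):
    # Identity on the first two columns, sigmoid on the third.
    out = z.copy()
    out[:, 2] = 1.0 / (1.0 + np.exp(-z[:, 2]))
    return out

units, n_outputs = 16, 3
W = glorot_normal(units, n_outputs)
b = glorot_uniform_bias(n_outputs)

hidden = np.tanh(rng.normal(size=(5, units)))  # stand-in for the last hidden layer
pre_activation = hidden @ W + b
output = mixed_final_activation(pre_activation)
print(output.shape)  # (5, 3)
```

The sigmoid keeps the third output strictly inside $(0, 1)$, while the first two outputs remain unbounded.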


## Training
The training set is constructed by drawing uniformly from the three-dimensional box with lower corner [`wMin`,`zMin`,`vMin`] and upper corner [`wMax`,`zMax`,`vMax`]. The loss function is given by:

$$
{L}^e_{HJB} + {L}^h_{HJB} + p{L}_{\kappa}
$$

where $p$ is a penalization parameter. We compute the gradient using `tf.GradientTape` and minimize the loss with an `L-BFGS-B` solver. The full list of parameters for the training stage is:

```{list-table}
:header-rows: 1

* - Input
  - Description
  - Parameter used in paper
* - `points_size`
  - Determines the `batchSize`, which is $2^x$ where $x$ is `points_size`. Batch size is the number of training samples in each epoch.
  - 10
* - `iter_num`
  - Number of epochs, i.e. the number of complete passes through the training set.
  - 5
* - `penalization`
  - Penalty for violating $\kappa$ constraint
  - 10000
* - `seed`
  - Seed for drawing uniform samples
  - 256 (same as seed for initialization)
* - `max_iter`
  - Maximum number of L-BFGS-B iterations (number of times parameters are updated)
  - 100
* - `maxfun`
  - Maximum number of function evaluations
  - 100
* - `gtol`
  - Iteration stops when $\max_i |\operatorname{proj}(g_i)| \leq$ `gtol`, where $g_i$ is entry $i$ of the (projected) gradient vector
  - Machine epsilon
```
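
As a rough illustration of the sampling step and the penalized loss, the sketch below draws `2 ** points_size` points uniformly from the state box and combines three residual arrays. The numerical bounds are placeholders, and aggregating each residual by its mean square over the batch is an assumption; the text above only specifies the weighted sum.

```python
import numpy as np

points_size = 10
penalization = 10000.0
batch_size = 2 ** points_size  # 1024 samples per epoch

# Placeholder bounds; the real wMin, zMin, vMin, wMax, zMax, vMax come from the model setup.
lower = np.array([0.01, -1.0, 0.0])
upper = np.array([0.99, 1.0, 1.0])

rng = np.random.default_rng(256)
X = rng.uniform(lower, upper, size=(batch_size, 3))  # columns: w, z, v

def total_loss(res_e, res_h, res_kappa, p):
    # Assumption: each residual enters the loss as a mean square over the batch.
    return np.mean(res_e**2) + np.mean(res_h**2) + p * np.mean(res_kappa**2)

# Dummy residuals stand in for the HJB and kappa operators evaluated at X.
zeros = np.zeros(batch_size)
loss_value = total_loss(zeros, zeros, zeros + 0.01, penalization)
print(X.shape, loss_value)
```

With a large penalty such as 10000, even a small $\kappa$ residual dominates the total, which is the intended effect of the penalization.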



The penalization parameter $p$ is set to 10000 for the results in the paper.
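
The solver options in the table closely match the option names of SciPy's `L-BFGS-B` interface, so one plausible wiring is `scipy.optimize.minimize`. The sketch below is an assumption about how the pieces fit together and uses a toy quadratic in place of the network loss; in the actual code the loss and gradient would come from `tf.GradientTape` evaluated on the flattened network parameters.

```python
import numpy as np
from scipy.optimize import minimize

target = np.array([1.0, -2.0, 0.5])

def loss_and_grad(theta):
    # Toy objective standing in for the network loss and its gradient.
    diff = theta - target
    return float(diff @ diff), 2.0 * diff

result = minimize(
    loss_and_grad,
    x0=np.zeros(3),
    jac=True,  # the callable returns (loss, gradient)
    method="L-BFGS-B",
    options={
        "maxiter": 100,               # `max_iter` in the table
        "maxfun": 100,                # cap on function evaluations
        "gtol": np.finfo(float).eps,  # machine epsilon
    },
)
print(result.x, result.nit)
```

Setting `gtol` to machine epsilon means the gradient criterion rarely triggers first, so in practice the iteration and function-evaluation caps (or the solver's other default tolerances) determine when each call stops.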