### Solving the Krusell-Smith (KS) model with DeepHAM and generalized moments (GM) (code on Nuvolos)

In [1]:
!/opt/bin/nvidia-smi

Tue Aug 26 00:24:40 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:00:04.0 Off |                    0 |
| N/A   31C    P0             45W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
import os
os.chdir('/content/drive/MyDrive/DeepHAM_Turin/src')
os.getcwd()

'/content/drive/MyDrive/DeepHAM_Turin/src'

In [4]:
!pip install quantecon

Collecting quantecon
  Downloading quantecon-0.9.0-py3-none-any.whl.metadata (5.3 kB)
Downloading quantecon-0.9.0-py3-none-any.whl (324 kB)
Installing collected packages: quantecon
Successfully installed quantecon-0.9.0


## Change to the code directory on Nuvolos

In [5]:
import os
os.chdir('/files/day2/Yang/code/DeepHAM_nuvolos/src')
os.getcwd()

#### Code on local machine starts here

In [6]:
# Define the configurations directly instead of using absl flags
config_path = "./configs/KS/game_nn_n50_0fm1gm.json"
exp_name = "1gm"
seed_index = 3

In [7]:
# Imports from the original script
import json
import time
import datetime
from param import KSParam
from dataset import KSInitDataSet
from value import ValueTrainer
from policy import KSPolicyTrainer
from util import print_elapsedtime
from util import set_random_seed

In [8]:
# Load the configuration from the JSON file
with open(config_path, 'r') as f:
    config = json.load(f)

if "random_seed" in config:
    seed = config["random_seed"][seed_index]
    set_random_seed(seed)
    print(f"Using seed {seed} (index {seed_index})")

print("Solving the problem based on the config path {}".format(config_path))

Using seed 789 (index 3)
Solving the problem based on the config path ./configs/KS/game_nn_n50_0fm1gm.json


In [9]:
mparam = KSParam(config["n_agt"], config["beta"], config["mats_path"])
# save config at the beginning for checking
model_path = "../data/simul_results/KS/{}_{}_n{}_{}".format(
    "game" if config["policy_config"]["opt_type"] == "game" else "sp",
    config["dataset_config"]["value_sampling"],
    config["n_agt"],
    exp_name,
)
config["model_path"] = model_path
config["current_time"] = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
os.makedirs(model_path, exist_ok=True)
with open(os.path.join(model_path, "config_beg.json"), 'w') as f:
    json.dump(config, f)

## Block 1 — Dataset & policy initialization (one-time setup)

In [None]:
# --- Setup & dataset ---
start_time   = time.monotonic()
init_ds      = KSInitDataSet(mparam, config)
value_config = config["value_config"]

# --- Initial policy choice used to build the first value-training dataset ---
if config["init_with_bchmk"]:
    init_policy = init_ds.k_policy_bchmk     # PDE / bspline benchmark policy
    policy_type = "pde"
else:
    init_policy = init_ds.c_policy_const_share  # constant consumption share NN policy
    policy_type = "nn_share"

# --- Build value-training datasets from the chosen initial policy (supervised targets) ---
train_vds, valid_vds = init_ds.get_valuedataset(
    init_policy,
    policy_type,
    update_init=False,
)

## Block 2 — Initial value-function training (before any policy optimization)


In [None]:
vtrainers = []
for i in range(value_config["num_vnet"]):
    config["vnet_idx"] = str(i)
    vtrainers.append(ValueTrainer(config))

for vtr in vtrainers:
    vtr.train(train_vds, valid_vds, value_config["num_epoch"], value_config["batch_size"])

> **Notes for readers:**
>
> * We pre-train the value network(s) once on data generated by an initial policy.
> * Each `ValueTrainer.train(...)` runs for `value_config["num_epoch"]` epochs over the prepared datasets.
> * These $V$ nets supply the terminal bootstrap term $\beta^T V(s_T)$ inside policy optimization: the policy objective simulates $T$ periods and uses $V$ to value everything after period $T$.
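
The role of the terminal bootstrap can be sketched in a few lines. The snippet below is a minimal illustration and not the DeepHAM implementation: `truncated_objective`, `toy_value_fn`, and the choice of log utility are assumptions made here for the example. It evaluates $\sum_{t=0}^{T-1} \beta^t u(c_t) + \beta^T V(s_T)$ for a batch of simulated paths.

```python
import numpy as np

def truncated_objective(consumption, terminal_state, value_fn, beta):
    """Discounted utility over T simulated periods plus a terminal value term.

    consumption:    array of shape (n_path, T), consumption along each path
    terminal_state: array of shape (n_path, state_dim), the state s_T
    value_fn:       callable mapping states to values of shape (n_path,)
    beta:           discount factor in (0, 1)
    """
    n_path, T = consumption.shape
    discounts = beta ** np.arange(T)           # beta^0, ..., beta^(T-1)
    flow = np.log(consumption) @ discounts     # log utility is an assumption
    bootstrap = beta ** T * value_fn(terminal_state)
    return flow + bootstrap                    # one objective value per path

def toy_value_fn(states):
    # Hypothetical stand-in for a trained value network.
    return np.log(states[:, 0]) / (1.0 - 0.99)

rng = np.random.default_rng(0)
c = rng.uniform(0.5, 1.5, size=(4, 10))        # 4 paths, T = 10 periods
s_T = rng.uniform(30.0, 40.0, size=(4, 1))     # terminal states
objective = truncated_objective(c, s_T, toy_value_fn, beta=0.99)
print(objective.shape)
```

Because the terminal term is weighted by $\beta^T$, errors in $V$ matter less as the simulated horizon $T$ grows, which is one reason the value nets can be retrained only periodically.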


## Block 3 — Policy training with periodic value function updating

**What happens here.**

* We launch `KSPolicyTrainer.train(num_step, batch_size)`.
* **Every step**:

  * draw a **fresh mini-batch** from `policy_ds` and **simulate new shocks** (`sampler`),
  * take **one policy gradient step** (`train_step`).
* **Policy-dataset refresh (from simulation)**:

  * Inside `sampler`, when `policy_ds.epoch_used > epoch_resample`, we **rebuild the dataset** via `update_policydataset(update_init)`.
  * With the config used here (`epoch_resample = 0`), the condition `epoch_used > 0` first holds after one full pass, so the dataset is **rebuilt after each full pass over it** (cadence ≈ `ceil(dataset_rows / batch_size)` steps).
  * If `update_init=True` (set right after value retraining), the next rebuild is a **hard refresh**: we also update dataset stats from the new simulation.
* **Every `freq_valid` steps**: run validation on a fixed validation set.
* **Every `freq_update_v` steps** (if `value_sampling != "bchmk"`):

  * **rebuild value datasets** under the **current policy**,
  * **retrain** each value net for `value_config["num_epoch"]` epochs,
  * set `update_init=True` so the **next policy-dataset rebuild** performs a **hard refresh**.
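
The schedule described above can be mimicked with a small bookkeeping model. This is a sketch under stated assumptions, not the code in `policy.py`: `simulate_schedule` and its counters are hypothetical names, no network is trained, the `value_sampling != "bchmk"` check is omitted, steps are counted from 1, and `update_init` is assumed to reset after one hard refresh.

```python
def simulate_schedule(num_step, batch_size, dataset_rows,
                      freq_valid, freq_update_v, epoch_resample=0):
    """Count the events fired by a loop that follows the bullets above."""
    events = {"soft_rebuild": 0, "hard_rebuild": 0,
              "validation": 0, "value_retrain": 0}
    rows_used = 0         # rows consumed from the current policy dataset
    epoch_used = 0        # completed passes over the current dataset
    update_init = False   # True means the next rebuild is a hard refresh

    for step in range(1, num_step + 1):
        # Sampler: rebuild the dataset once enough passes have been used.
        if epoch_used > epoch_resample:
            events["hard_rebuild" if update_init else "soft_rebuild"] += 1
            update_init = False          # assumption: hard refresh happens once
            rows_used, epoch_used = 0, 0

        # One mini-batch and one policy gradient step.
        rows_used += batch_size
        if rows_used >= dataset_rows:
            rows_used = 0
            epoch_used += 1

        if step % freq_valid == 0:
            events["validation"] += 1
        if step % freq_update_v == 0:
            events["value_retrain"] += 1
            update_init = True           # next rebuild becomes a hard refresh

    return events

# Toy sizes so the counts are easy to check by hand.
events = simulate_schedule(num_step=20, batch_size=2, dataset_rows=4,
                           freq_valid=5, freq_update_v=10)
print(events)
```

In this toy run the dataset is rebuilt every `ceil(4 / 2) = 2` steps, and only the rebuild that directly follows a value retrain is counted as a hard refresh.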


In [10]:
# Iterative policy and value training
policy_config = config["policy_config"]
ptrainer = KSPolicyTrainer(vtrainers, init_ds)
ptrainer.train(policy_config["num_step"], policy_config["batch_size"])

Average of total utility 20.068157.
The dataset has 4608 samples in total.
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 8

100%|██████████| 500/500 [07:36<00:00,  1.10it/s]


Step: 500, valid util: 76.0626, k_end: 36.0824


100%|██████████| 500/500 [03:49<00:00,  2.17it/s]


Step: 1000, valid util: 87.6414, k_end: 34.6034


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 1500, valid util: 87.746, k_end: 34.0113


100%|██████████| 500/500 [03:48<00:00,  2.19it/s]


Step: 2000, valid util: 87.7874, k_end: 33.8822


  0%|          | 0/500 [00:00<?, ?it/s]

Average of total utility 20.068157.
The dataset has 4608 samples in total.
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 8

  0%|          | 1/500 [01:19<11:01:06, 79.49s/it]

Value function learning epoch: 200
{'current': 60438016, 'peak': 2091181312}


100%|██████████| 500/500 [05:09<00:00,  1.62it/s]


Step: 2500, valid util: 103.806, k_end: 37.5111


100%|██████████| 500/500 [03:46<00:00,  2.20it/s]


Step: 3000, valid util: 103.801, k_end: 37.0634


100%|██████████| 500/500 [03:50<00:00,  2.17it/s]


Step: 3500, valid util: 103.803, k_end: 37.0857


100%|██████████| 500/500 [03:48<00:00,  2.18it/s]


Step: 4000, valid util: 103.806, k_end: 37.1375


  0%|          | 0/500 [00:00<?, ?it/s]

Average of total utility 20.068157.
The dataset has 4608 samples in total.
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 8

  0%|          | 1/500 [01:19<11:03:17, 79.76s/it]

Value function learning epoch: 200
{'current': 60130304, 'peak': 2091201280}


100%|██████████| 500/500 [05:08<00:00,  1.62it/s]


Step: 4500, valid util: 103.981, k_end: 39.202


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 5000, valid util: 103.983, k_end: 39.3408


100%|██████████| 500/500 [03:46<00:00,  2.21it/s]


Step: 5500, valid util: 103.979, k_end: 39.1172


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 6000, valid util: 103.984, k_end: 39.4195


  0%|          | 0/500 [00:00<?, ?it/s]

Average of total utility 20.068157.
The dataset has 4608 samples in total.
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 8

  0%|          | 1/500 [01:19<11:04:36, 79.91s/it]

Value function learning epoch: 200
{'current': 60888576, 'peak': 2091201280}


100%|██████████| 500/500 [05:09<00:00,  1.62it/s]


Step: 6500, valid util: 104.012, k_end: 38.956


100%|██████████| 500/500 [03:49<00:00,  2.17it/s]


Step: 7000, valid util: 104.01, k_end: 38.8526


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 7500, valid util: 104.006, k_end: 38.725


100%|██████████| 500/500 [03:46<00:00,  2.20it/s]


Step: 8000, valid util: 104.012, k_end: 39.067


  0%|          | 0/500 [00:00<?, ?it/s]

Average of total utility 20.068157.
The dataset has 4608 samples in total.
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 80
Value function learning epoch: 100
Value function learning epoch: 120
Value function learning epoch: 140
Value function learning epoch: 160
Value function learning epoch: 180
Value function learning epoch: 200
Value function learning epoch: 0
Value function learning epoch: 20
Value function learning epoch: 40
Value function learning epoch: 60
Value function learning epoch: 8

  0%|          | 1/500 [01:19<11:02:16, 79.63s/it]

Value function learning epoch: 200
{'current': 60439552, 'peak': 2091201280}


100%|██████████| 500/500 [05:09<00:00,  1.62it/s]


Step: 8500, valid util: 104.026, k_end: 39.4568


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 9000, valid util: 104.024, k_end: 39.3557


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]


Step: 9500, valid util: 104.023, k_end: 39.0643


100%|██████████| 500/500 [03:49<00:00,  2.18it/s]

Step: 10000, valid util: 104.025, k_end: 39.3181






> **Notes for readers:**
>
> * **Mini-batch & shocks:** new **every step**.
> * **Policy-dataset cadence:** with `t_sample=200` and `t_skip=4`, each path contributes ~50 time-slices,
>   so dataset rows ≈ `n_path * 50` (minus NaN rows). The dataset is rebuilt after each full pass:
>   `steps_per_dataset ≈ ceil(dataset_rows / batch_size)`.
>   Example: if `n_path=384` and `batch_size=384`, rows ≈ `384*50=19,200` → `19,200/384=50` steps per rebuild.
> * **Validation:** every `freq_valid=500` steps (20 times for `num_step=10,000`).
> * **Value retrain:** every `freq_update_v=2000` steps (5 times total). This sets `update_init=True`; the **next** dataset rebuild then also updates dataset statistics from the new simulation (hard refresh).
>
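
The arithmetic in these notes can be reproduced directly. The values below are the ones quoted in the notes; `n_path=384` and `batch_size=384` are the example's assumptions and are not read from the config file, and the variable names are chosen here for illustration.

```python
import math

# Values quoted in the notes above.
t_sample, t_skip = 200, 4
n_path, batch_size = 384, 384            # example values, not read from the config
num_step, freq_valid, freq_update_v = 10_000, 500, 2_000

slices_per_path = t_sample // t_skip                       # time-slices kept per path
dataset_rows = n_path * slices_per_path                    # before dropping NaN rows
steps_per_dataset = math.ceil(dataset_rows / batch_size)   # steps between rebuilds
n_validation = num_step // freq_valid                      # validation passes
n_value_rounds = num_step // freq_update_v                 # value-training rounds

print(slices_per_path, dataset_rows, steps_per_dataset, n_validation, n_value_rounds)
```

Dropping NaN rows lowers `dataset_rows` slightly, so the actual number of steps between rebuilds can be a little below this estimate.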

In [11]:
# Save config and models
with open(os.path.join(model_path, "config.json"), 'w') as f:
    json.dump(config, f)

for i, vtr in enumerate(vtrainers):
    vtr.save_model(os.path.join(model_path, "value{}.weights.h5".format(i)))

ptrainer.save_model(os.path.join(model_path, "policy.weights.h5"))

end_time = time.monotonic()
print_elapsedtime(end_time - start_time)

Elapsed time: 01:27:57.68


In [12]:
model_path

'../data/simul_results/KS/game_nn_n50_1gm'