# 1D J1=1.0, J2=0.2: Training with gradient clipping

This notebook is part of the work arXiv:2505.22083 (https://arxiv.org/abs/2505.22083), "Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz". Code written by HLD. 

This notebook collects the best results obtained when training with gradient clipping by value and by global norm.
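To make the two clipping schemes concrete, here is a minimal pure-Python sketch of what each one does to a set of gradients. The function names `clip_by_value` and `clip_by_global_norm` and the toy gradient values are illustrative assumptions, not code from the utility module; the range [-1, 1] and the norm 9.0 are the values quoted in this notebook.

```python
import math

def clip_by_value(grads, lo, hi):
    """Clamp every gradient component independently to the range [lo, hi]."""
    return [[min(max(g, lo), hi) for g in tensor] for tensor in grads]

def clip_by_global_norm(grads, max_norm):
    """Rescale all tensors by one common factor so that their joint L2 norm
    does not exceed max_norm. Returns the clipped gradients and the norm
    measured before clipping."""
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if global_norm <= max_norm:
        return [list(tensor) for tensor in grads], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in tensor] for tensor in grads], global_norm

# Toy gradients for two parameter tensors (illustrative values only).
grads = [[3.0, -4.0], [12.0]]

by_value = clip_by_value(grads, -1.0, 1.0)
by_norm, norm_before = clip_by_global_norm(grads, 9.0)
norm_after = math.sqrt(sum(g * g for tensor in by_norm for g in tensor))

print("clipped by value:      ", by_value)
print("clipped by global norm:", by_norm)
print("global norm before/after:", norm_before, norm_after)
```

Clipping by value treats each component separately, so it can change the direction of the update. Clipping by global norm rescales all gradients by the same factor, so the direction is preserved and only the step length is limited. In Keras optimizers these two behaviours are normally selected with the `clipvalue` and `global_clipnorm` arguments; whether the utility module configures Adam in exactly this way is not shown in this notebook.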

In [1]:
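import sys
import time          # used below to time the training runs
import numpy as np   # used below to round the elapsed time
sys.path.append('../../utility')
from j1j2_hyprnn_train_loop_grad_clipping_gn import *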

2025-10-24 19:37:22.329153: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Training with gradient clipping by global norm of 9.0 (both Euclidean and hyperbolic)
The norm clipping is defined in the Adam optimizer
Num GPUs Available:  0


In [3]:
E_exact = -20.3150
syssize = 50
nssamples = 50
J1 = 1.0
J2 = 0.2
nsteps = 401
var_tol = 2.0
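The training logs in this notebook print "Best model saved" lines only when the reported `varE` is below `var_tol = 2.0` and the energy is lower than at the previous save. A plausible reading is the selection rule sketched below. This is an assumption inferred from the logs, not the actual logic of `run_J1J2` or `run_J1J2_hypvars`; the helper `select_checkpoints` is hypothetical. The sample history uses the real parts of the energies from the HypGRU log in this notebook, treating the logged step and epoch indices as the same counter.

```python
def select_checkpoints(history, var_tol):
    """Return the (epoch, energy, variance) entries that would trigger a save
    under the ASSUMED rule: variance below var_tol and energy lower than any
    previously saved energy."""
    best_energy = float("inf")
    saved = []
    for epoch, energy, variance in history:
        if variance < var_tol and energy < best_energy:
            best_energy = energy
            saved.append((epoch, energy, variance))
    return saved

# (epoch, Re[mean energy], varE) taken from the HypGRU training log.
history = [
    (80, -16.96469, 2.76358),
    (81, -17.33574, 1.62055),
    (83, -17.46841, 1.74533),
    (87, -17.57648, 1.70142),
    (90, -16.86822, 4.29619),
]

saved = select_checkpoints(history, var_tol=2.0)
for epoch, energy, variance in saved:
    print(f"would save at epoch {epoch}: E={energy:.5f}, varE={variance:.5f}")
```

Under this assumed rule, steps 80 and 90 are skipped because their variance exceeds the tolerance, which matches the absence of "Best model saved" lines at those steps.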

# EuclGRU
- Without gradient clipping, the energy curve shows a large jump, after which training converges to too high an energy.
- Clipping by value eliminates the jump but still leaves a large bump in the curve.
- Clipping by global norm eliminates the jump (only two small kinks remain before convergence) and improves the overall convergence.

In [3]:
cell_type = 'EuclGRU'
hidden_units = 75
wf_egru = rnn_eucl_wf(syssize, cell_type, hidden_units)
wf_egru

<j1j2_hyprnn_wf.rnn_eucl_wf at 0x1aa96aa50>

In [4]:
# WITH GRAD CLIPPING (GLOBAL NORM = 8.0)
nsteps = 451
start = time.time()

mE, vE = run_J1J2(wf=wf_egru, numsteps=nsteps, systemsize=syssize, var_tol=var_tol,
                  J1_=J1, J2_=J2, Marshall_sign=True,
                  numsamples=nssamples, learningrate=1e-2, seed=111,
                  fname='../gn_results_grad_clipping_global_norm')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -0.73938, mean energy: 14.45013+0.08308j, varE: 0.32071
step: 10, loss: 3.19461, mean energy: -3.29343-0.04501j, varE: 9.89220
step: 20, loss: -2.86215, mean energy: -7.93687+0.06412j, varE: 6.51807
step: 30, loss: -2.64299, mean energy: -13.70494+0.01687j, varE: 9.23479
step: 40, loss: -3.08258, mean energy: -13.84703-0.09183j, varE: 5.69278
step: 50, loss: 3.34232, mean energy: -14.58271+0.17713j, varE: 5.62505
step: 60, loss: -0.82777, mean energy: -15.45306+0.01563j, varE: 6.77785
step: 70, loss: 2.34790, mean energy: -16.64071-0.18790j, varE: 3.77840
step: 80, loss: 2.29428, mean energy: -16.63818-0.13210j, varE: 2.08316
step: 90, loss: 3.44128, mean energy: -16.73352+0.28136j, varE: 4.86252
step: 100, loss: 6.12521, mean energy: -17.62121-0.12376j, varE: 2.24605
step: 110, loss: 0.69685, mean energy: -17.65031+0.03554j, varE: 3.08913
step: 120, loss: 4.11014, mean energy: -17.81325-0.08980j, varE: 1.39905
Best model saved at epoch 127 with best E=-18.20065+0.19124j
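To put the last checkpoint shown in this log in context, the short calculation below compares it with `E_exact`. It uses only numbers that appear in this notebook; the variable names `best_E`, `abs_error`, `rel_error`, and `energy_per_site` are introduced here for illustration, and only the real part of the complex mean energy is used.

```python
E_exact = -20.3150      # exact ground-state energy quoted in this notebook
syssize = 50            # number of spins in the chain

# Last "Best model saved" entry in the EuclGRU log shown above.
best_E = complex(-18.20065, 0.19124)

abs_error = best_E.real - E_exact        # positive: estimate lies above E_exact
rel_error = abs(abs_error / E_exact)     # relative error of the real part
energy_per_site = best_E.real / syssize

print(f"absolute error:  {abs_error:.5f}")
print(f"relative error:  {100 * rel_error:.2f} %")
print(f"energy per site: {energy_per_site:.5f} (exact: {E_exact / syssize:.5f})")
```

At this point in the run the estimate is still roughly 10% above the exact energy, so this checkpoint should be read as an intermediate value rather than the converged result.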

# HypGRU
- Without gradient clipping, the energy curve shows a small kink before convergence.
- With gradient clipping by value in the range [-1, 1], or by global norm of 8.0, training does not converge to the correct value.
- With gradient clipping by global norm of 9.0, the kink is eliminated and training converges close to the correct value.

In [4]:
cell_type = 'HypGRU'
hidden_units = 75
wf_hgru = rnn_hyp_wf(syssize, cell_type, 'hyp', 'id', hidden_units)
wf_hgru

<j1j2_hyprnn_wf.rnn_hyp_wf at 0x10f53f890>

In [5]:
# WITH GRAD CLIPPING (GLOBAL NORM = 9.0)
nsteps = 451
start = time.time()
mE, vE = run_J1J2_hypvars(wf=wf_hgru, numsteps=nsteps, systemsize=syssize, var_tol=var_tol,
                          J1_=J1, J2_=J2, Marshall_sign=True,
                          numsamples=nssamples, lr1=1e-2, lr2=1e-2, seed=111,
                          fname='../gn_results_grad_clipping_global_norm')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -0.92465, mean energy: 12.68636+0.32971j, varE: 3.82003
step: 10, loss: 3.08801, mean energy: -2.07289-0.03995j, varE: 10.76103
step: 20, loss: 7.24075, mean energy: -4.85589+0.18382j, varE: 10.33183
step: 30, loss: -0.66522, mean energy: -9.19421+0.08008j, varE: 8.62918
step: 40, loss: -2.61273, mean energy: -12.04500+0.39547j, varE: 8.87465
step: 50, loss: -7.09064, mean energy: -14.25949-0.01635j, varE: 7.86061
step: 60, loss: -2.34944, mean energy: -16.53273-0.13804j, varE: 3.29932
step: 70, loss: 2.31102, mean energy: -17.25464+0.05029j, varE: 3.49061
step: 80, loss: -0.07616, mean energy: -16.96469-0.15660j, varE: 2.76358
Best model saved at epoch 81 with best E=-17.33574-0.23736j, varE=1.62055
Best model saved at epoch 83 with best E=-17.46841-0.00353j, varE=1.74533
Best model saved at epoch 87 with best E=-17.57648-0.15065j, varE=1.70142
step: 90, loss: 0.25479, mean energy: -16.86822+0.06549j, varE: 4.29619
step: 100, loss: 1.20917, mean energy: -17.87919-0.1769