# 1D J1=1.0, J2=0.5: Training with gradient clipping

This notebook is part of the work arXiv:2505.22083 (https://arxiv.org/abs/2505.22083), "Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz". Code written by HLD. 

This notebook collects the best results obtained when training with gradient clipping, either by value or by global norm.

In [1]:
import sys
sys.path.append('../../utility')

In [2]:
E_exact = -18.75
syssize = 50
nssamples = 50
J1 = 1.0
J2 = 0.5
nsteps = 401
var_tol = 1.0
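The value `E_exact = -18.75` is consistent with the Majumdar-Ghosh point J2 = J1/2, where the dimerized ground state has energy -3·J1/8 per site. The sketch below reproduces the number under the assumption that the Hamiltonian is the spin-1/2 J1-J2 Heisenberg chain written with spin operators (not Pauli matrices) on an even number of sites. That convention lives in the accompanying utility code and is not shown in this notebook, and the helper name `majumdar_ghosh_energy` is hypothetical.

```python
def majumdar_ghosh_energy(n_sites, j1=1.0):
    """Ground-state energy at the Majumdar-Ghosh point J2 = J1 / 2.

    Assumes a spin-1/2 chain with an even number of sites and the
    Hamiltonian written in terms of spin operators S = sigma / 2.
    """
    if n_sites % 2 != 0:
        raise ValueError("the dimerized ground state needs an even number of sites")
    # Each of the n_sites / 2 singlet dimers contributes -3/4 * J1.
    return -0.375 * j1 * n_sites


print(majumdar_ghosh_energy(50, j1=1.0))
```

For `syssize = 50` and `J1 = 1.0` this gives the same value as `E_exact` above.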

# EuclGRU
- Without gradient clipping, the energy curve showed multiple kinks and converged to a value far above the exact energy.
- Clipping the gradients by value to the range [-1, 1] eliminated the kinks and improved the convergence toward the exact energy.
- Clipping by global norm was not performed for this cell type.
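The two clipping schemes mentioned in this notebook differ in what they preserve. The sketch below is a plain-Python illustration, not the code used in `j1j2_hyprnn_train_loop_grad_clipping` (which is not shown here); the function names and the nested-list representation of gradients are assumptions made for the example. In TensorFlow the corresponding operations are `tf.clip_by_value` and `tf.clip_by_global_norm`.

```python
import math


def clip_by_value(grads, lo=-1.0, hi=1.0):
    """Clamp every gradient entry independently to [lo, hi].

    This can change the direction of the overall update.
    """
    return [[min(max(g, lo), hi) for g in grad] for grad in grads]


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients by one common factor so that their joint
    L2 norm does not exceed max_norm. The update direction is preserved.
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # The factor is 1 when the norm is already within the bound.
    scale = max_norm / max(global_norm, max_norm)
    return [[g * scale for g in grad] for grad in grads], global_norm


# Two hypothetical parameter groups with a joint norm of 5.
grads = [[3.0, 0.0], [4.0]]
print(clip_by_value(grads))
print(clip_by_global_norm(grads, max_norm=1.0))
```

Clipping by value caps each component separately, which is what removes isolated spikes; clipping by global norm shrinks the whole update uniformly.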

In [3]:
from j1j2_hyprnn_train_loop_grad_clipping import *
cell_type = 'EuclGRU'
hidden_units = 75
wf_egru = rnn_eucl_wf(syssize, cell_type, hidden_units)
wf_egru

2025-10-21 16:38:44.567611: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Training with gradient clipping in the range [-1.0,1.0] (both Euclidean and hyperbolic)
Num GPUs Available:  0


<j1j2_hyprnn_wf.rnn_eucl_wf at 0x19a6e5dd0>

In [4]:
# With gradient clipping by value in the range [-1, 1]
import time
import numpy as np  # explicit imports; the cell otherwise relies on the star import above

nsteps = 451
start = time.time()

mE, vE = run_J1J2(wf=wf_egru, numsteps=nsteps, systemsize=syssize, var_tol=1.0,
                  J1_=J1, J2_=J2, Marshall_sign=True,
                  numsamples=nssamples, learningrate=1e-2, seed=111,
                  fname='../results_grad_clipping')
finish = time.time()
duration = finish - start  # elapsed wall-clock time in seconds
print(f'Total time taken in hours: {np.round(duration/3600, 3)}')

step: 0, loss: -1.05695, mean energy: 18.03808+0.09419j, varE: 0.64345
step: 10, loss: 13.47987, mean energy: -3.38808-0.27251j, varE: 13.48253
step: 20, loss: 6.60693, mean energy: -8.06062+0.16802j, varE: 6.12464
step: 30, loss: -1.91852, mean energy: -10.75037+0.11048j, varE: 4.17511
step: 40, loss: -2.93944, mean energy: -12.41296+0.09308j, varE: 5.97954
step: 50, loss: -5.46370, mean energy: -11.79622+0.18810j, varE: 4.25440
step: 60, loss: -3.16319, mean energy: -12.91744+0.00946j, varE: 5.26457
step: 70, loss: -0.61833, mean energy: -13.43084+0.12261j, varE: 3.00497
step: 80, loss: 6.42728, mean energy: -13.94592-0.47673j, varE: 3.41955
step: 90, loss: 2.85764, mean energy: -14.56147-0.03251j, varE: 5.14067
step: 100, loss: 11.43860, mean energy: -14.77916+0.45880j, varE: 6.21000
step: 110, loss: -0.65875, mean energy: -15.89623-0.20456j, varE: 4.75950
step: 120, loss: -1.23068, mean energy: -16.21000+0.30454j, varE: 4.01557
step: 130, loss: -3.93219, mean energy: -16.91632+0.02

# HypGRU
- No gradient clipping was performed, since the original training already showed a smooth energy curve that converged to the correct value.

In [3]:
# NO GRAD CLIPPING
from j1j2_hyprnn_train_loop import *
cell_type = 'HypGRU'
hidden_units = 70
wf_hgru = rnn_hyp_wf(syssize, cell_type, 'hyp', 'id', hidden_units)
wf_hgru

<j1j2_hyprnn_wf.rnn_hyp_wf at 0x1a104eb90>

In [4]:
nsteps = 451
start = time.time()
mE, vE = run_J1J2_hypvars(wf=wf_hgru, numsteps=nsteps, systemsize=syssize, var_tol=2.0,
                          J1_=J1, J2_=J2, Marshall_sign=True,
                          numsamples=nssamples, lr1=1e-2, lr2=1e-2, seed=111,
                          fname='../results')
finish = time.time()
duration = finish - start  # elapsed wall-clock time in seconds
print(f'Total time taken in seconds: {duration}')

step: 0, loss: -2.03650, mean energy: 17.50668-0.35098j, varE: 1.58106
step: 10, loss: 1.77698, mean energy: -2.41737+0.21426j, varE: 10.51818
step: 20, loss: -0.17918, mean energy: -3.62105+0.00802j, varE: 13.37337
step: 30, loss: -0.57536, mean energy: -6.97623+0.04984j, varE: 9.35391
step: 40, loss: -0.86969, mean energy: -7.91884+0.01327j, varE: 8.56101
step: 50, loss: -13.84029, mean energy: -9.77025-0.77116j, varE: 10.29447
step: 60, loss: 0.27380, mean energy: -11.32603-0.25297j, varE: 6.58634
step: 70, loss: -3.91599, mean energy: -13.35884+0.24383j, varE: 5.84760
step: 80, loss: -3.76614, mean energy: -13.64775-0.25852j, varE: 3.91111
step: 90, loss: 5.40735, mean energy: -14.65660+0.13743j, varE: 6.07031
step: 100, loss: -13.16647, mean energy: -15.23525+0.03523j, varE: 5.97112
step: 110, loss: 1.22281, mean energy: -16.09338+0.12285j, varE: 3.08350
step: 120, loss: 3.10577, mean energy: -16.17319-0.10365j, varE: 4.42243
step: 130, loss: -4.07657, mean energy: -16.82370-0.004

In [5]:
print(f'Total time taken in hours is {np.round(165683.04630088806/3600,3)}')

Total time taken in hours is 46.023
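
The training logs above share one line format, so the distance to `E_exact` can be tracked directly from the printed output. The following is a minimal sketch that assumes the format `step: ..., loss: ..., mean energy: ..., varE: ...` exactly as printed above; the helper names `parse_log_line` and `relative_error` are hypothetical and not part of the notebook's utility code. Lines that are cut off before `varE` are skipped rather than guessed.

```python
import re

E_EXACT = -18.75  # exact energy quoted at the top of this notebook

LOG_PATTERN = re.compile(
    r"step:\s*(\d+),\s*loss:\s*([-+\d.eE]+),\s*"
    r"mean energy:\s*([^,\s]+),\s*varE:\s*([-+\d.eE]+)"
)


def parse_log_line(line):
    """Return (step, loss, mean_energy, var_e), or None if the line is incomplete."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    step, loss, energy, var_e = match.groups()
    # Python's complex() accepts strings such as "-3.38808-0.27251j".
    return int(step), float(loss), complex(energy), float(var_e)


def relative_error(energy, e_exact=E_EXACT):
    """Relative deviation of the real part of the energy from e_exact."""
    return abs((energy.real - e_exact) / e_exact)


record = parse_log_line(
    "step: 10, loss: 13.47987, mean energy: -3.38808-0.27251j, varE: 13.48253"
)
if record is not None:
    step, _, energy, var_e = record
    print(step, relative_error(energy), var_e)
```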
