# 1D J1=1.0, J2=0.8: Training with gradient clipping

This notebook is part of the work arXiv:2505.22083 (https://arxiv.org/abs/2505.22083), "Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz". Code written by HLD. 

This notebook collects the best results obtained when training with gradient clipping (GC), either by value or by global norm.

In [1]:
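# Training with gradient clipping
import sys
import time          # used below to time the training runs
import numpy as np   # used below to round the reported duration

sys.path.append('../../utility')
from j1j2_hyprnn_train_loop_grad_clipping_gn import *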

2025-11-02 17:57:05.552643: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Training with gradient clipping by global norm of 9.0 (both Euclidean and hyperbolic)
The norm clipping is defined in the Adam optimizer
Num GPUs Available:  0


In [2]:
E_exact = -20.9841
syssize = 50
nssamples = 50
J1 = 1.0
J2 = 0.8
var_tol = 2.0

# EuclGRU

- Without gradient clipping, the energy curve showed several kinks and was not smooth overall.
- GC by value in the range [-1, 1] gave a worse result: the converged energy was far from the exact value.
- GC by global norm of 8.0 smoothed the curve, but the converged energy was very slightly higher than in the no-GC case.
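The two clipping modes compared above behave quite differently, and a small sketch may make the distinction concrete. The code below is a plain-Python illustration, not the code used in the utility module: the helper names `clip_by_value`, `global_norm` and `clip_by_global_norm` and the toy gradient values are assumptions made for this example. In Keras optimizers, settings of this kind are typically exposed through arguments such as `clipvalue` and `global_clipnorm`; how the utility module configures its optimizer is not shown in this notebook.

```python
import math

def clip_by_value(grads, lo, hi):
    """Clamp every gradient entry independently to [lo, hi]."""
    return [[min(max(g, lo), hi) for g in tensor] for tensor in grads]

def global_norm(grads):
    """L2 norm computed over all entries of all gradient tensors."""
    return math.sqrt(sum(g * g for tensor in grads for g in tensor))

def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients by one common factor if their global norm exceeds max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm:
        return [list(tensor) for tensor in grads]
    scale = max_norm / norm
    return [[g * scale for g in tensor] for tensor in grads]

# Toy gradients for two hypothetical parameter tensors (made-up values).
grads = [[3.0, -4.0], [12.0]]

by_value = clip_by_value(grads, -1.0, 1.0)   # each entry clamped on its own
by_norm = clip_by_global_norm(grads, 8.0)    # every entry scaled by the same factor

print(global_norm(grads), by_value, global_norm(by_norm))
```

Clipping by value changes the direction of the update, because large and small entries are treated differently, whereas clipping by global norm only shortens the update and keeps its direction. This difference is one possible reason the two modes give such different energy curves, although the notebook itself does not establish the cause.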

In [3]:
cell_type = 'EuclGRU'
hidden_units = 75
wf_egru = rnn_eucl_wf(syssize, cell_type, hidden_units)
wf_egru

<j1j2_hyprnn_wf.rnn_eucl_wf at 0x7fcf204b7b50>

In [5]:
#TRAINING WITH GRADIENT CLIPPING BY GLOBAL NORM 8.0
nsteps = 451
start = time.time()

mE, vE = run_J1J2(wf=wf_egru, numsteps=nsteps, systemsize=syssize, var_tol=var_tol,
                  J1_=J1, J2_=J2, Marshall_sign=True,
                  numsamples=nssamples, learningrate=1e-2, seed=111,
                  fname='../gn_results_grad_clipping_global_norm')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -1.37457, mean energy: 21.62602+0.10530j, varE: 1.11107
step: 10, loss: 19.63758, mean energy: -3.92291-0.02272j, varE: 12.71488
step: 20, loss: -0.26210, mean energy: -8.55120+0.30371j, varE: 7.94827
step: 30, loss: -7.23647, mean energy: -9.89894+0.03235j, varE: 14.51643
step: 40, loss: 4.14861, mean energy: -9.67870+0.18526j, varE: 2.90372
step: 50, loss: -6.01268, mean energy: -11.30545+0.16722j, varE: 2.46759
step: 60, loss: -4.62308, mean energy: -11.19780-0.08426j, varE: 5.25986
step: 70, loss: -0.80757, mean energy: -11.82998+0.18075j, varE: 13.72872
step: 80, loss: 0.97972, mean energy: -11.13771+0.13303j, varE: 4.90203
step: 90, loss: 10.39035, mean energy: -11.97471-0.36018j, varE: 8.09400
step: 100, loss: 2.78839, mean energy: -11.36987-0.55350j, varE: 6.05023
step: 110, loss: -2.45224, mean energy: -14.45003-0.01314j, varE: 6.97956
step: 120, loss: 4.46085, mean energy: -14.54065+0.32027j, varE: 5.57924
step: 130, loss: 1.66693, mean energy: -14.08788-0.1258

# HypGRU
- Without gradient clipping, the energy curve showed some kinks late in training and was not smooth overall.
- GC by value in the range [-1, 1] led to a worse result, with a jump near the end of training.
- GC by global norm of 9.0 made the training unstable and drove the energy towards the opposite value.
- GC by global norm of 8.0 improved the curve by eliminating the two late kinks, but it also introduced an earlier kink (well before convergence was reached). The converged energy was also slightly higher than in the no-GC case.
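Kinks and jumps like those described above can also be checked numerically from the printed training log. The sketch below is only an illustration: `parse_log_line` and `relative_error` are hypothetical helpers written for this example, and the regular expression assumes the `step: ..., loss: ..., mean energy: ..., varE: ...` line format that appears in this notebook's output.

```python
import re

E_EXACT = -20.9841  # exact energy quoted earlier in this notebook

# Assumed log format: "step: N, loss: X, mean energy: A+Bj, varE: V"
LOG_PATTERN = re.compile(
    r"step:\s*(\d+),\s*loss:\s*([^,\s]+),\s*mean energy:\s*([^,\s]+),\s*varE:\s*([^,\s]+)"
)

def parse_log_line(line):
    """Return a dict with step, loss, energy and varE, or None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None  # for example a truncated or unrelated line
    step, loss, energy, var_e = match.groups()
    return {
        "step": int(step),
        "loss": float(loss),
        "energy": complex(energy),  # Python parses strings such as "-3.92291-0.02272j"
        "varE": float(var_e),
    }

def relative_error(record, e_exact=E_EXACT):
    """Relative deviation of the real part of the mean energy from the exact value."""
    return abs(record["energy"].real - e_exact) / abs(e_exact)

# One line copied from the EuclGRU log in this notebook.
record = parse_log_line(
    "step: 10, loss: 19.63758, mean energy: -3.92291-0.02272j, varE: 12.71488"
)
print(record["step"], record["energy"].real, relative_error(record))
```

Applying `parse_log_line` to every line of a saved log gives an energy series in which a kink shows up as a sudden change between consecutive records. Lines that are cut off before the `varE` field simply return `None` and can be skipped.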

In [3]:
cell_type = 'HypGRU'
hidden_units = 75
wf_hgru = rnn_hyp_wf(syssize, cell_type, 'hyp', 'id', hidden_units)
wf_hgru

<j1j2_hyprnn_wf.rnn_hyp_wf at 0x7fbb064b7d60>

In [7]:
#TRAINING WITH GRADIENT CLIPPING BY GLOBAL NORM 8.0
nsteps=451
start = time.time()
mE, vE = run_J1J2_hypvars(wf=wf_hgru, numsteps=nsteps, systemsize=syssize, var_tol=var_tol,
                          J1_=J1, J2_=J2, Marshall_sign=True,
                          numsamples=nssamples, lr1=1e-2, lr2=1e-2, seed=111,
                          fname='../gn_results_grad_clipping_global_norm')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -2.68951, mean energy: 18.92231+0.30570j, varE: 8.41262
step: 10, loss: 12.79593, mean energy: -2.87399+0.01304j, varE: 17.69459
step: 20, loss: -12.23351, mean energy: -6.03501-0.93176j, varE: 20.92467
step: 30, loss: -10.16785, mean energy: -10.47509+0.20206j, varE: 10.04062
step: 40, loss: -17.00056, mean energy: -8.32347-0.17412j, varE: 12.44150
step: 50, loss: -12.61209, mean energy: -10.68806-0.41055j, varE: 9.36091
step: 60, loss: -3.36862, mean energy: -11.26937+0.18819j, varE: 7.40503
step: 70, loss: 3.67706, mean energy: -11.68194+0.45844j, varE: 8.33232
step: 80, loss: -7.35777, mean energy: -10.16223-0.19470j, varE: 9.66072
step: 90, loss: 4.06979, mean energy: -10.59185+0.08429j, varE: 10.90531
step: 100, loss: -9.17120, mean energy: -6.70228-0.46023j, varE: 9.04622
step: 110, loss: -3.20023, mean energy: -5.12317-0.04790j, varE: 15.04341
step: 120, loss: 4.78419, mean energy: -7.56306-0.07825j, varE: 16.01200
step: 130, loss: -12.68428, mean energy: -11.416