# 1D J1=1.0, J2=0.0: Training with gradient clipping

This notebook is part of the work arXiv:2505.22083 (https://arxiv.org/abs/2505.22083), "Hyperbolic recurrent neural network as the first type of non-Euclidean neural quantum state ansatz". Code written by HLD. 

In [1]:
import sys
sys.path.append('../../utility')
from j1j2_hyprnn_train_loop_grad_clipping_gn import *

2025-10-22 18:41:40.004661: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Training with gradient clipping by global norm of 8.0 (both Euclidean and hyperbolic)
The norm clipping is defined in the Adam optimizer
Num GPUs Available:  0


In [2]:
E_exact = -21.9721
syssize = 50
nssamples = 50
J1 = 1.0
J2 = 0.0
var_tol = 2.0
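
The exact energy defined above is the reference against which the training logs in this notebook can be read. The following is a minimal sketch, not part of the original utility code, of how a logged energy might be compared with `E_exact`. The helper `relative_error` and the variable names `e_best`, `rel_err`, and `gap_per_site` are hypothetical; the sample energy is the best HypGRU value reported in the training log further down in this notebook (epoch 117).

```python
E_exact = -21.9721   # exact ground-state energy, as set in the cell above
syssize = 50         # number of spins, as set in the cell above


def relative_error(e_var, e_exact):
    """Relative deviation of the real part of a variational energy from the exact value."""
    return abs(e_var.real - e_exact) / abs(e_exact)


# Best HypGRU energy taken from the training log in this notebook (epoch 117).
e_best = complex(-19.55673, 0.00414)

rel_err = relative_error(e_best, E_exact)
# Distance to the exact energy, divided by the number of sites.
gap_per_site = (e_best.real - E_exact) / syssize

print(f"relative error: {rel_err:.4f}, energy gap per site: {gap_per_site:.5f}")
```

Only the real part is compared because the small imaginary parts in the logs come from finite sampling of a real-valued energy.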

# EuclGRU
- The no clipping case has a kink in the energy convergence curve.
- Clipping by value in the range [-1,1] eliminated the kink but led to a worse result than the no clipping case.
- Clipping with global norm of 8.0 eliminated the kink and led to a better result than the no clipping case.
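
The two clipping schemes compared above differ in how they treat the gradient as a whole. The sketch below is a plain-Python illustration of that difference, not the implementation used in this notebook (the log output states that the norm clipping is set in the Adam optimizer). The function names and the sample gradients are hypothetical.

```python
import math


def clip_by_value(grads, low=-1.0, high=1.0):
    """Clamp every gradient component independently to [low, high]."""
    return [[min(max(g, low), high) for g in tensor] for tensor in grads]


def clip_by_global_norm(grads, clip_norm=8.0):
    """Rescale all tensors by one common factor so the joint L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    # The factor is 1 when the norm is already below the threshold.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in tensor] for tensor in grads], global_norm


# Hypothetical gradients for two (flattened) parameter tensors.
grads = [[6.0, -8.0], [0.5, 0.0]]

by_value = clip_by_value(grads)                     # components clamped one by one
by_norm, norm_before = clip_by_global_norm(grads)   # all components share one scale factor
norm_after = math.sqrt(sum(g * g for tensor in by_norm for g in tensor))

print(by_value)
print(norm_before, norm_after)
```

Clipping by value can change the direction of the update, since large components are cut while small ones are left alone. Clipping by global norm keeps the direction and only shortens the step.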

In [4]:
cell_type = 'EuclGRU'
hidden_units = 75
wf_egru = rnn_eucl_wf(syssize, cell_type, hidden_units)
wf_egru

<j1j2_hyprnn_wf.rnn_eucl_wf at 0x13d2dce60>

In [5]:
#WITH GRADIENT CLIPPING: GLOBAL NORM 8.0
nsteps = 451
start = time.time()

mE, vE = run_J1J2(wf=wf_egru, numsteps=nsteps, systemsize=syssize, var_tol=0.8,
                  J1_=J1, J2_=J2, Marshall_sign=True,
                  numsamples=nssamples, learningrate=1e-2, seed=111,
                  fname='../gn_results_grad_clipping_global_norm')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -0.52756, mean energy: 12.05817+0.07567j, varE: 0.18605
step: 10, loss: -3.35701, mean energy: -3.01039+0.11845j, varE: 9.50387
step: 20, loss: -1.59732, mean energy: -10.23455-0.00318j, varE: 9.74746
step: 30, loss: -4.76671, mean energy: -14.28209-0.06600j, varE: 10.34802
step: 40, loss: 4.08353, mean energy: -16.18631-0.16386j, varE: 6.41848
step: 50, loss: -0.08484, mean energy: -16.90627-0.07891j, varE: 7.01464
step: 60, loss: 2.18680, mean energy: -16.70524-0.00042j, varE: 4.39975
step: 70, loss: 1.32184, mean energy: -17.56263-0.11947j, varE: 6.37796
step: 80, loss: 3.11919, mean energy: -19.20714+0.00435j, varE: 3.41796
step: 90, loss: 6.82774, mean energy: -19.21720-0.13891j, varE: 3.63528
step: 100, loss: -4.40378, mean energy: -19.07112+0.19135j, varE: 3.84235
step: 110, loss: 8.89722, mean energy: -19.16039-0.00685j, varE: 5.30370
step: 120, loss: -3.08514, mean energy: -19.60507+0.00065j, varE: 2.43396
step: 130, loss: 12.27612, mean energy: -19.65952-0.0377

# HypGRU (units = 60)

- The no clipping case has a kink in the energy curve.
- Clipping by the global norm of 8.0 led to much worse results than the no clipping case (values converging in the opposite direction, with many instabilities).
- Clipping by value in the range [-1,1] eliminated the kink and led to a better result than the no clipping case.

In [6]:
cell_type = 'HypGRU'
hidden_units = 60
wf_hgru = rnn_hyp_wf(syssize, cell_type, 'hyp', 'id', hidden_units)
wf_hgru

<j1j2_hyprnn_wf.rnn_hyp_wf at 0x1330a2c90>

In [None]:
#WITH GRAD CLIPPING BY VALUE [-1,1]
nsteps=451
start = time.time()
mE, vE = run_J1J2_hypvars(wf=wf_hgru, numsteps=nsteps, systemsize=syssize, var_tol=var_tol,
                          J1_=J1, J2_=J2, Marshall_sign=True,
                          numsamples=nssamples, lr1=1e-2, lr2=1e-2, seed=111,
                          fname='../results_grad_clipping')
finish = time.time()
duration = finish-start
print(f'Total time taken: {np.round(duration/3600,3)} hours')

step: 0, loss: -0.92902, mean energy: 11.82525-0.35288j, varE: 1.14977
step: 10, loss: -6.00800, mean energy: -4.82071+0.66407j, varE: 15.49046
step: 20, loss: -5.75761, mean energy: -9.20987-0.02513j, varE: 9.76353
step: 30, loss: 6.09009, mean energy: -13.62626+0.01417j, varE: 11.07539
step: 40, loss: -4.46379, mean energy: -15.18603+0.02579j, varE: 8.35044
step: 50, loss: -2.60687, mean energy: -14.58490-0.20262j, varE: 10.14807
step: 60, loss: -10.39885, mean energy: -15.69618-0.21522j, varE: 8.73098
step: 70, loss: -8.58556, mean energy: -17.20873-0.21899j, varE: 6.68708
step: 80, loss: 11.73067, mean energy: -17.53881-0.16029j, varE: 3.75793
step: 90, loss: -3.82687, mean energy: -18.34618-0.17288j, varE: 3.65695
step: 100, loss: 0.71140, mean energy: -18.55058+0.12360j, varE: 3.55686
step: 110, loss: -1.60905, mean energy: -18.82741-0.04570j, varE: 2.11870
Best model saved at epoch 117 with best E=-19.55673+0.00414j, varE=1.41861
step: 120, loss: -1.51529, mean energy: -19.23568