In [None]:
import numpy as np
import struct 

In [None]:
np.set_printoptions(suppress=True)

In [None]:
gdd = np.loadtxt("pod.gdd.dat", skiprows=4)
ldd = np.loadtxt("pod.ldd.dat", skiprows=4)

ld_atom = np.loadtxt("dump_ld", skiprows=9)
dd_atom = np.loadtxt("dump_dd", skiprows=9)

coeffs = np.loadtxt("../HfO2_FPOD_020224_v2_coefficients.pod", skiprows=1)


In [None]:
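def parse_bin_file(fname):
    """Read a binary file of native-endian doubles into a list of floats."""
    numbers = []
    with open(fname, mode="rb") as f:
        # Read 8 bytes (one double) at a time until the file is exhausted.
        while (byte := f.read(8)):
            (number,) = struct.unpack('d', byte)
            numbers.append(number)
    return numbers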
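
The loop above unpacks one double at a time. A more compact variant is sketched below, assuming the file is a flat stream of native-endian float64 values (the same assumption `struct.unpack('d', ...)` already makes). `read_doubles` is a hypothetical helper name, and the demo only round-trips a throwaway file with made-up values, not actual POD output.

```python
import os
import struct
import tempfile

import numpy as np


def read_doubles(fname):
    """Read a flat binary file of native-endian float64 values into an array."""
    return np.fromfile(fname, dtype=np.float64)


# Round-trip check on a throwaway file (synthetic values, not POD output).
values = [12.0, 560.0, 0.25, -1.5]
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(struct.pack(f"{len(values)}d", *values))
    tmp_name = tmp.name

roundtrip = read_doubles(tmp_name)
os.remove(tmp_name)
print(roundtrip)
```

Returning an array rather than a list also makes the strided slices used later in this notebook cheaper, though for files of this size either approach is fine.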


In [None]:
fit_gd = parse_bin_file("../fitpod_ref/train/globaldescriptors_config1.bin")
fit_ld = parse_bin_file("../fitpod_ref/train/basedescriptors_config1.bin")

In [None]:
print(np.shape(gdd))
print(np.shape(ldd))
print(np.shape(ld_atom))
print(np.shape(dd_atom))
print(np.shape(fit_gd[2:]))
print(np.shape(fit_ld[2:]))

12 atoms in this system: 4 Hf and 8 O.

For gdd, the first row holds the global energy descriptors, followed by $12*3=36$ rows of descriptor derivatives (12 atoms, 3 Cartesian directions). The first column is the atom id/index (id=0 marks the global row and can be ignored), followed by the 1122 descriptors ($560*2=1120$ plus two 1-body columns).

ldd is, as far as I can tell, kind of nonsense. It aggregates descriptors that should be kept separate (based on the element type of the central atom), and it does this for both the global energy descriptors and the "global" descriptor derivatives. The only way I can justify this would be the equivalence between AB and BA or something similar, but even that doesn't hold up: I think even AA- and BB-style descriptors get summed together into a single descriptor column, which doesn't make sense.

ld_atom has, for each atom, the id, the type, and then the 560 (local) energy descriptors.
dd_atom has, for each atom, the id, the type, and then the $560*12*3=20160$ descriptor derivatives (i.e. with respect to every atom, including itself).

fit_gd should have the same global energy descriptors as the first row of gdd.
fit_ld has the 560 local energy descriptors of each of the 12 atoms ($12*560=6720$ values in total), all flattened into a single array. It is organized in blocks of descriptors, so the first 12 entries are the first descriptor for atom 1, atom 2, etc., then the next 12 entries are the second descriptor for atom 1, atom 2, etc.
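
The descriptor-major layout described for fit_ld maps onto a reshape followed by a transpose. The sketch below uses a synthetic array in place of the real file; the two leading header entries are placeholders (the notebook only skips them via `fit_ld[2:]`, so their contents are an assumption here), and each value encodes its (descriptor, atom) position so the layout can be checked by eye.

```python
import numpy as np

n_atoms, n_desc = 12, 560

# Synthetic stand-in for fit_ld: two header entries, then descriptor-major data.
# Each value is 100 * descriptor_index + atom_index.
header = [0.0, 0.0]  # placeholder header values (assumption)
body = [d * 100.0 + a for d in range(n_desc) for a in range(n_atoms)]
flat = np.array(header + body)

# Rows of the reshaped array are descriptors; transpose to get one row per atom.
per_atom = flat[2:].reshape(n_desc, n_atoms).T

print(per_atom.shape)   # (12, 560)
print(per_atom[0, :3])  # first three descriptors of atom 1
```

With this view, `per_atom[k]` is the same data as the strided slice `flat[2 + k::12]`, which is the form used for the comparison against ld_atom later on.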

In [None]:
gdd[0][1:22]

A couple of notes on how these global descriptors are organized. The first descriptor column is the one-body descriptor, which here equals the number of Hf atoms (implying that Hf is the first type, i.e. type A). Then come the two-body descriptors: the 8 AA descriptors followed by the 8 AB descriptors. After that come the three-body descriptors, etc.

Now we skip ahead to descriptors corresponding to a central atom of O:

In [None]:
gdd[0][562:583]

Here, once again, the first column corresponds to the 1-body descriptor, then we have 8 BA descriptors, 8 BB, then move on to the 3 body descriptors. 

A key thing to note here is that Cuong has set up POD in such a way that AB and BA are equivalent (this is not necessarily true in other potentials like ACE). Despite this equivalence, both are listed out explicitly.
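
The column layout described above can be written down as a small index helper. This is only a sketch under the assumptions stated in these notes: 561 columns per central element (one 1-body column plus 560 others), 8 two-body descriptors per (central, neighbor) pair, and a leading id column in each gdd row. `two_body_slice` and the constants are hypothetical names.

```python
N_PER_TYPE = 561   # 1 one-body + 560 many-body columns per central element
N_TWO_BODY = 8     # two-body descriptors per (central, neighbor) pair
ID_OFFSET = 1      # column 0 of each gdd row is the atom id


def two_body_slice(central, neighbor):
    """Column slice of a gdd row for one two-body block (0-based element types)."""
    start = ID_OFFSET + central * N_PER_TYPE + 1 + neighbor * N_TWO_BODY
    return slice(start, start + N_TWO_BODY)


for central, neighbor, label in [(0, 0, "AA"), (0, 1, "AB"), (1, 0, "BA"), (1, 1, "BB")]:
    s = two_body_slice(central, neighbor)
    print(f"{label}: columns {s.start}:{s.stop}")
```

The AB and BA slices produced this way are the two column ranges compared in the next cell.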

In [None]:
np.allclose(gdd[0][10:18],gdd[0][563:571],atol=1e-8)

Moreover, this explicit AB/BA enumeration also shows up in the coefficients and in the number of global descriptors reported in the LAMMPS output. Critically, the AB and BA coefficients are very close to each other, but not identical! So some numerical noise evidently prevents exactly matching coefficients from being recovered.

In [None]:
print(coeffs[9:17])
print(coeffs[562:570])

In [None]:
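def threebody_invar(p,q,s,Ne):
    """Three-body block index for 1-based indices (p, q, s); symmetric in q and s."""
    # Both products below are even, so integer division is exact.
    if s >= q:
        l = s + (q-1)*Ne - q*(q-1)//2 + (p-1)*Ne*(1+Ne)//2
    else:
        l = q + (s-1)*Ne - s*(s-1)//2 + (p-1)*Ne*(1+Ne)//2
    return l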

def iterate_3body_invar(Ne): 
    for p in range(1,Ne+1):
        for q in range(1,Ne+1):
            for s in range(1,Ne+1):
                l = threebody_invar(p,q,s,Ne)
                print(f"{p}{q}{s}: {l}")

In [None]:
iterate_3body_invar(2)
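
A quick self-contained way to confirm what the printout shows: swapping q and s gives the same index, and the distinct indices run from 1 to $N_e \cdot N_e(N_e+1)/2$. The sketch below restates the formula with integer arithmetic so it runs on its own; `threebody_index` and `unique_threebody_indices` are hypothetical names.

```python
def threebody_index(p, q, s, Ne):
    """Same formula as threebody_invar above, written with integer division."""
    lo, hi = min(q, s), max(q, s)
    return hi + (lo - 1) * Ne - lo * (lo - 1) // 2 + (p - 1) * Ne * (Ne + 1) // 2


def unique_threebody_indices(Ne):
    """Collect the distinct indices over all 1-based (p, q, s) triples."""
    return sorted({threebody_index(p, q, s, Ne)
                   for p in range(1, Ne + 1)
                   for q in range(1, Ne + 1)
                   for s in range(1, Ne + 1)})


Ne = 2
indices = unique_threebody_indices(Ne)
print(indices)  # [1, 2, 3, 4, 5, 6]
```

For two elements that leaves 6 distinct three-body blocks out of the 8 possible (p, q, s) triples.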

While this equivalence exists, it does **not** show up in the descriptors or coefficients, as the following quick check shows: the only contiguous stretch of repeated descriptors is in the two-body block (other matches are spurious, a consequence of looking at just one small and fairly symmetric configuration).

In [None]:
for i in range(1122):
    for j in range(i,1122):
        if i!=j and np.isclose(gdd[0][i+1], gdd[0][j+1], atol=1e-8):
            print(f"{i} {j} {gdd[0][i+1]} {gdd[0][j+1]}")
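
The double loop above can also be expressed with broadcasting, which returns the matching index pairs instead of printing them. This is a sketch on a small made-up row, not on gdd itself; `close_pairs` is a hypothetical helper, and it keeps the default `rtol` of `np.isclose`, as the loop does.

```python
import numpy as np


def close_pairs(values, atol=1e-8):
    """Return (i, j) index pairs with i < j whose values are numerically close."""
    values = np.asarray(values, dtype=float)
    # Pairwise comparison matrix; keep only the strict upper triangle (i < j).
    close = np.isclose(values[:, None], values[None, :], atol=atol)
    i, j = np.nonzero(np.triu(close, k=1))
    return list(zip(i.tolist(), j.tolist()))


row = np.array([4.0, 1.5, 2.5, 0.7, 1.5, 2.5])  # synthetic descriptor row
print(close_pairs(row))  # [(1, 4), (2, 5)]
```

A contiguous run of pairs with a constant offset `j - i` is what the repeated two-body block looks like in this representation.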


Uh oh, the global descriptors printed by the dump and those output during fitpod are not the same (close, but meaningfully different).
Never mind, this was because I was displacing the atoms in the LAMMPS dump test. Oops.

In [None]:
print(gdd[0][1:11])
print(fit_gd[2:12])
np.allclose(gdd[0][1:],fit_gd[2:],atol=1e-5)

In [None]:
print(ld_atom[0][2:12])
print(fit_ld[2:122:12])

np.allclose(ld_atom[0][2:],fit_ld[2:6720:12], atol=1e-5)


Checking that in-place addition into array slices works as expected with numpy

In [None]:
test_arr = np.zeros(12)
add_arr1 = np.array([0,2,2,2,2,2])
add_arr2 = np.array([1,10,10,10,10,10])

test_arr[1:6] += add_arr1[1:]
test_arr[7:]  += add_arr2[1:]
print(test_arr)

Checking that I can recover the global descriptors from the local descriptors without any funkiness

In [None]:
my_gd = np.zeros(1122)
num_ld = 561
for atom_ld in ld_atom:
    atom_type = int(atom_ld[1]) -1 
    my_gd[atom_type*num_ld] += 1.0
    start = (atom_type*num_ld)+1
    stop = start + num_ld-1
    my_gd[start:stop] += atom_ld[2:]


In [None]:
print(my_gd[:10])
print(gdd[0][1:11])
np.allclose(my_gd, gdd[0][1:], atol=1e-4)
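
The per-atom loop used to build `my_gd` can also be written as a single scatter-add. The sketch below assumes the same layout as above (for each element type, one 1-body count column followed by the summed local descriptors) and runs on a tiny made-up system; `sum_local_to_global` is a hypothetical helper name.

```python
import numpy as np


def sum_local_to_global(types, local, n_types):
    """Sum per-atom descriptors into per-type blocks.

    types: 1-based element type per atom; local: (n_atoms, n_desc) array.
    """
    local = np.asarray(local, dtype=float)
    n_atoms, n_desc = local.shape
    idx = np.asarray(types, dtype=int) - 1          # 0-based type index
    # Prepend a column of ones so the 1-body column counts atoms of each type.
    rows = np.hstack([np.ones((n_atoms, 1)), local])
    out = np.zeros((n_types, n_desc + 1))
    np.add.at(out, idx, rows)                       # unbuffered add handles repeated types
    return out.ravel()


# Synthetic example: one atom of type 1, two atoms of type 2, two descriptors each.
demo_local = np.array([[1.0, 2.0], [10.0, 20.0], [100.0, 200.0]])
result = sum_local_to_global([1, 2, 2], demo_local, n_types=2)
print(result.tolist())  # [1.0, 1.0, 2.0, 2.0, 110.0, 220.0]
```

`np.add.at` is used rather than `out[idx] += rows` because plain fancy-index assignment would only keep one contribution per repeated type.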

In [None]:
ref_energy = np.loadtxt("pe.dat",skiprows=1)[1]

Pretty good agreement between the reference energy and the dot product of the coefficients with the global descriptors, though there is likely some issue with the output precision of the descriptors

In [None]:
energy_check = np.dot(coeffs,gdd[0][1:])
print(ref_energy)
print(energy_check)


I thought that using the higher-precision local descriptors would give better accuracy, but it actually got a bit worse...

In [None]:
energy_check2 = np.dot(coeffs,my_gd)
print(energy_check2)
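
One rough way to gauge how much limited output precision could move the energy is to round a descriptor vector to a fixed number of decimals and compare the dot products. The sketch below uses random synthetic coefficients and descriptors (not the HfO2 data), so the magnitudes it prints say nothing about this potential; it only illustrates that the error is bounded by half the last printed digit times $\sum_i |c_i|$.

```python
import numpy as np

rng = np.random.default_rng(0)
coeffs_demo = rng.normal(size=1122)         # synthetic coefficients
desc_demo = rng.normal(size=1122) * 10.0    # synthetic descriptors

exact = np.dot(coeffs_demo, desc_demo)
for digits in (4, 6, 8):
    rounded = np.round(desc_demo, digits)   # mimic text output with limited decimals
    err = abs(np.dot(coeffs_demo, rounded) - exact)
    # Each entry moves by at most half a unit in the last kept decimal place.
    bound = 0.5 * 10.0 ** (-digits) * np.abs(coeffs_demo).sum()
    print(f"{digits} digits: error {err:.3e}, worst-case bound {bound:.3e}")
```

Applying the same bound with the real coefficients and the number of decimals in the dump files would indicate whether output precision alone can account for the observed energy differences.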