# 02 Train PIP-NN

## Raw data files

The following files are read from the `rawdata/` directory:

- `NaOH_train.xyz`
- `NaOH_valid.xyz`
- `NaOH_CCSD-T_train.txt`
- `NaOH_CCSD-T_valid.txt`

## PIP

In [1]:
from openbabel import pybel
from MZMol import MZMol
import numpy as np
from pip_1_1_1_4 import basis

In [2]:
# Read every geometry in an XYZ file and wrap each one in an MZMol object
def loadxyz(filename):
    return [MZMol(mol) for mol in pybel.readfile(format="xyz", filename=filename)]

mol_train = loadxyz("rawdata/NaOH_train.xyz")
mol_valid = loadxyz("rawdata/NaOH_valid.xyz")

### Distance Vectors

Given the molecular geometries, we can compute the distance matrix of each molecule:

$$
D = \begin{bmatrix}
0 & r_{12} & r_{13} & \cdots & r_{1n} \\
r_{21} & 0 & r_{23} & \cdots & r_{2n} \\
r_{31} & r_{32} & 0 & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_{n1} & r_{n2} & r_{n3} & \cdots & 0
\end{bmatrix}
$$

where $r_{ij}$ is the distance between atoms $i$ and $j$. Since $r_{ij} = r_{ji}$, the matrix is symmetric with a zero diagonal.

Taking only the elements of the strictly upper triangular part of the distance matrix gives the distance vector $R$, which has $n(n-1)/2$ entries (3 for the three atoms of NaOH).
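As a minimal sketch of this step, the distance vector can be built from Cartesian coordinates with NumPy. The helper name `distance_vector` and the coordinates are made up for illustration, and the pair ordering used by `MZMol.distance_vector` is not shown in this notebook, so the row-by-row ordering here is an assumption.

```python
import numpy as np


def distance_vector(coords: np.ndarray) -> np.ndarray:
    """Return the strictly upper-triangular pair distances, row by row."""
    # Pairwise difference vectors, shape (n, n, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    # Full symmetric distance matrix, shape (n, n)
    dist = np.linalg.norm(diff, axis=-1)
    # Indices (i, j) with i < j, i.e. everything above the diagonal
    i, j = np.triu_indices(len(coords), k=1)
    return dist[i, j]


# Hypothetical linear three-atom geometry (not taken from the NaOH data set)
coords = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 2.0], [0.0, 0.0, 3.0]])
R = distance_vector(coords)  # assumed order: r12, r13, r23
```

For three atoms this yields a vector of length 3, one entry per atom pair.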

### Morse-Like Transform

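Each element $r$ of the distance vector $R$ is transformed by a Morse-like function with a range parameter $\alpha$ (the code below uses the default $\alpha = 1.0$):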

$$
\mathrm{morse}(r) = \exp \left[ - \frac{r}{\alpha}\right]
$$
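To get a feel for the transform, the short sketch below evaluates it at a few made-up distances. It only illustrates the formula above: the value at $r = 0$ is 1, the value decays towards 0 as $r$ grows, and a larger $\alpha$ slows the decay.

```python
import math


def morse(r: float, alpha: float = 1.0) -> float:
    """Morse-like variable exp(-r / alpha) for a single distance."""
    return math.exp(-r / alpha)


# Illustrative distances only (units follow whatever the geometries use)
values = [morse(r) for r in (0.0, 1.0, 5.0)]
# Same distance with a larger range parameter decays more slowly
wider = morse(5.0, alpha=2.0)
```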

### Permutation Invariant Polynomial

Next, we generate the permutationally invariant polynomials (PIPs) from the Morse-like vector, using the Fortran module from MSA-2.0 (imported above as `basis` from `pip_1_1_1_4`).
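The sketch below is one way to sanity-check the size of the basis. It assumes that `1_1_1_4` denotes three mutually distinct atoms and a maximum polynomial order of 4, which is an interpretation of the module name and is not confirmed in this notebook. Under that assumption no two atoms can be exchanged, so the invariant basis would reduce to all monomials in the three Morse-like variables up to total degree 4. This is not the MSA-2.0 implementation, and the term ordering produced by `basis.bemsav` may differ.

```python
import math
from itertools import product


def monomial_exponents(n_vars: int, max_degree: int):
    """All exponent tuples whose total degree is at most max_degree."""
    return [
        e
        for e in product(range(max_degree + 1), repeat=n_vars)
        if sum(e) <= max_degree
    ]


def evaluate(exponents, m):
    """Evaluate every monomial at the Morse-like variables m."""
    return [math.prod(x ** p for x, p in zip(m, e)) for e in exponents]


exponents = monomial_exponents(n_vars=3, max_degree=4)
n_terms = len(exponents)                                  # includes the constant term
n_features = sum(1 for e in exponents if sum(e) > 0)      # constant term excluded
features = evaluate(exponents, (0.5, 0.25, 0.125))        # made-up input values
```

Under these assumptions the count is 35 terms, or 34 without the constant, which is consistent with the 34 feature columns reported by `X_train.shape` in this notebook.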

In [3]:
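def morse(r: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Morse-like transform exp(-r / alpha), applied element-wise."""
    return np.exp(-r / alpha)

# Training set: distance vectors -> Morse-like variables -> PIP basis values
r_train = np.array([mol.distance_vector for mol in mol_train])
M_train = np.apply_along_axis(morse, 1, r_train)
pip_train = np.apply_along_axis(basis.bemsav, 1, M_train)
X_train = pip_train[:, 1:]  # drop the first basis column (presumably the constant term)

# Validation set: same pipeline
r_valid = np.array([mol.distance_vector for mol in mol_valid])
M_valid = np.apply_along_axis(morse, 1, r_valid)
pip_valid = np.apply_along_axis(basis.bemsav, 1, M_valid)
X_valid = pip_valid[:, 1:]  # drop the first basis column (presumably the constant term)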

X_train.shape, X_valid.shape

((12720, 34), (3181, 34))

## Energy

In [4]:
from scipy import constants as C

# Load E
E_train = np.loadtxt("rawdata/NaOH_CCSD-T_train.txt").reshape((-1, 1))
E_valid = np.loadtxt("rawdata/NaOH_CCSD-T_valid.txt").reshape((-1, 1))

# Shift both sets by the same global minimum, then convert Hartree to eV
hartree_to_ev = C.physical_constants["Hartree energy in eV"][0]
E_min = min(E_train.min(), E_valid.min())
Y_train = (E_train - E_min) * hartree_to_ev
Y_valid = (E_valid - E_min) * hartree_to_ev

Y_train.shape, Y_valid.shape

((12720, 1), (3181, 1))
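The energy preparation above can be illustrated on a few made-up numbers. The sketch assumes the raw energies are in Hartree, as the conversion in the cell implies, and uses a rounded conversion factor instead of the SciPy constant. Because both sets are shifted by the same minimum, they share one zero of energy and every target is non-negative.

```python
HARTREE_TO_EV = 27.211386  # rounded value of the Hartree energy in eV

# Made-up energies in Hartree, not taken from the NaOH data set
e_train = [-237.50, -237.42]
e_valid = [-237.48]

# One global minimum across both sets defines the common zero
e_min = min(min(e_train), min(e_valid))
y_train = [(e - e_min) * HARTREE_TO_EV for e in e_train]
y_valid = [(e - e_min) * HARTREE_TO_EV for e in e_valid]
```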

## Save to files

In [5]:
np.savetxt("X_train.txt", X_train)
np.savetxt("X_valid.txt", X_valid)
np.savetxt("Y_train.txt", Y_train)
np.savetxt("Y_valid.txt", Y_valid)
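When these files are read back later, note that `np.loadtxt` returns a one-dimensional array for a single-column file, so the target arrays need the same `reshape((-1, 1))` that is applied to the raw energies above. The sketch below shows this round trip on a small made-up array written to a temporary directory; the file name `Y_demo.txt` is hypothetical.

```python
import os
import tempfile

import numpy as np

# Small stand-in for Y_train with shape (4, 1)
Y = np.arange(4.0).reshape((-1, 1))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Y_demo.txt")
    np.savetxt(path, Y)
    flat = np.loadtxt(path)                       # single column comes back 1-D
    column = np.loadtxt(path).reshape((-1, 1))    # restore the column shape
```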