Copyright (c) 2021 April M. Miksch <miksch@theochem.uni-stuttgart.de>, Alex Urban <a.urban@columbia.edu>, and Nong Artrith <nartrith@atomistic.net>
Distributed under the terms of the Mozilla Public License, version 2.0 (https://www.mozilla.org/en-US/MPL/2.0/)

This notebook is a companion to Miksch, Morawietz, Kaestner, Urban, Artrith, to be published (2021).

The tutorial demonstrates the construction of an artificial neural network (ANN) interatomic potential for TiO2 clusters using the provided data set.



# Prerequisites - installation of ænet

We will use the atomic energy network (ænet) package (http://ann.atomistic.net) for artificial neural network (ANN) potentials:

    N. Artrith and A. Urban, Comput. Mater. Sci. 114 (2016) 135-150.
    N. Artrith, A. Urban, and G. Ceder, Phys. Rev. B 96 (2017) 014112.
    Larsen et al., J. Phys.: Condens. Matter 29 (2017) 273002.


In [None]:
try:
  import aenet
  print("successfully imported aenet")
except ImportError:
  !! git clone https://github.com/atomisticnet/aenet.git
  !! cd aenet/lib && make
  !! cd aenet/src && make -f makefiles/Makefile.gfortran_serial
  !! cd aenet/src && make -f makefiles/Makefile.gfortran_serial lib
  !! cd aenet/python3 && python3 setup.py build_ext --inplace
  !! cd aenet/python3 && pip install -e . --user
  print("completed installation of aenet")

# Importing Materials from GitHub
Get additional materials for this tutorial from GitHub (https://github.com/patra-group/AENET_TiO2_Files.git).

In [None]:
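# Start from a clean state: remove leftovers from previous runs.
!rm -rf 0* xsf sample_data AENET_TiO2_Files
import os
if not os.path.exists('AENET_TiO2_Files'):
  !! git clone https://github.com/patra-group/AENET_TiO2_Files.git
else:
  print("Tutorial files are already installed.")
# Unpack the archive into the current directory.  (A separate "!!cd" would
# have no effect, because each shell command runs in its own subshell.)
!!unzip AENET_TiO2_Files/AENET_TiO2_Files.zip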

# Python imports

We need to import the Python packages and libraries that we will use below.

In [None]:
import numpy as np
import pandas as pd
import re
from IPython.display import Image
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams.update({"font.size": 15})
plt.close("all")

In [None]:
try:
  import ase
  print("successfully imported ase")
except ImportError:
  !! pip install ase --user --upgrade
  print("completed installing ASE")

In [None]:
try:
  import aenet
  import ase
  print("Both aenet and ase could be imported. You are all set for the tutorial.")
except ImportError:
  print("The notebook needs to be restarted. Run this cell again once the restart is done.")
  import os
  os.kill(os.getpid(), 9)
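import ase.io  # needed explicitly for ase.io.read
import ase.spacegroup
import ase.visualize
# Load one of the reference structures and display it inline.
atoms = ase.io.read("/content/xsf/structure0200.xsf")
ase.visualize.view(atoms, viewer='x3d')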

# 01-generate: Generating the reference data set

The first step is the transformation of the atomic structures in the reference data set into a feature-vector representation of local atomic environments.  This is done with the ænet tool `generate.x`.

We enter a subdirectory in which we will generate the reference data set.

Let's take a look at the *fingerprint* set-up file for **Ti**.  The one for **O** is similar.

[1] J. Behler, [J. Chem. Phys. 134, 074106 (2011)](https://doi.org/10.1063/1.3553717).
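
To make the idea of a fingerprint more concrete, here is a minimal sketch of a radial symmetry function of the kind described in [1], using only the Python standard library. It assumes that the fingerprint is built from such functions; the function names and the parameters `eta`, `r_s`, and `r_c` are chosen for illustration and are not taken from the set-up file or from ænet.

```python
import math

def cosine_cutoff(r, r_c):
    """Smoothly switch off contributions at the cutoff radius r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)

def radial_symmetry_function(center, neighbors, eta, r_s, r_c):
    """Sum of Gaussians of the neighbor distances, damped by the cutoff.

    center    -- (x, y, z) of the central atom
    neighbors -- iterable of (x, y, z) positions of the other atoms
    eta, r_s  -- width and shift of the Gaussian (illustrative parameters)
    r_c       -- cutoff radius
    """
    g = 0.0
    for position in neighbors:
        r = math.dist(center, position)
        g += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_c)
    return g

# Toy example: one neighbor inside the cutoff, one outside.
value = radial_symmetry_function(
    (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 5.0, 0.0)],
    eta=0.5, r_s=0.0, r_c=2.0)
print(value)
```

Evaluating several such functions with different parameters yields a vector that depends only on the distances to neighboring atoms, so it does not change when the cluster is rotated or translated.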

In this example, everything is already prepared to start with the reference data set generation.  So, we can simply run `generate.x` and wait until all structures have been processed.  We collect the output that `generate.x` prints to the screen in a file named `generate.out`.  Note that this can take a minute:
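
One way to run a command-line tool and collect its screen output in a file from Python is sketched below. The helper `run_and_log` is hypothetical, and the executable path and input file name in the commented usage line are assumptions that depend on how ænet was built and where the input files are located.

```python
import subprocess

def run_and_log(command, logfile):
    """Run a command, write its stdout and stderr to logfile, return the exit code."""
    with open(logfile, "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode

# Hypothetical usage (executable path and input file name are assumptions):
# run_and_log(["./generate.x", "generate.in"], "generate.out")
```

A non-zero return value indicates that the tool stopped with an error, in which case the log file is the first place to look.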

The training-set file `TiO2.train` has been generated. **Note that training-set files are binary files and cannot be opened with a text editor.**

Now, we are ready to train our TiO2 cluster potential.  Let's return to the main directory.

# 02-train: Training an ANN potential

In this section, we will use the reference data set generated in the previous section to train an artificial neural network (ANN) potential.

Potential training is also done with a command-line tool, which is named `train.x`.

An input file for `train.x`, named `train.in`, has already been prepared for us.  This file contains the following instructions:

- The path to the reference data set file (`TiO2.train`),
- The fraction of the data set that should be set aside as validation set,
- The number of training iterations (epochs) to perform,
- The training method and its parameters, and
- The *architecture* of the atomic-energy ANNs for each species.

For more details, see also the aenet documentation: http://ann.atomistic.net/documentation/
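
To illustrate what the *architecture* of an atomic-energy ANN refers to, the following sketch evaluates small feed-forward networks in plain Python and sums the atomic energies to a total energy. It is not ænet's implementation: the `tanh` activation, the layer sizes, and all weights are made up for illustration, and the function names are hypothetical.

```python
import math

def dense(x, weights, biases, activation=None):
    """One fully connected layer: activation(W x + b)."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(activation(z) if activation else z)
    return out

def atomic_energy(fingerprint, layers):
    """Hidden layers use tanh (an assumption); the output layer is linear."""
    x = list(fingerprint)
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases, math.tanh)
    weights, biases = layers[-1]
    return dense(x, weights, biases)[0]

def total_energy(species, fingerprints, networks):
    """Total energy as the sum of atomic energies, one network per species."""
    return sum(atomic_energy(fp, networks[s])
               for s, fp in zip(species, fingerprints))

# Toy networks: 2 inputs -> 1 hidden node -> 1 output (made-up weights).
toy_networks = {
    "Ti": [([[1.0, 0.0]], [0.0]), ([[2.0]], [0.5])],
    "O":  [([[0.0, 0.0]], [0.0]), ([[1.0]], [-1.0])],
}
energy = total_energy(["Ti", "O", "O"],
                      [[0.0, 3.0], [1.0, 1.0], [2.0, 0.5]],
                      toy_networks)
print(energy)
```

In this picture, the architecture is the number of hidden layers, the number of nodes per layer, and the activation function, specified separately for each species.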

Before we can start with training, the reference data set file needs to be made available.  We will copy **TiO2.train from 01-generate** to the present directory.
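
The copy step could be done from Python roughly as follows. The helper `stage_training_set` is hypothetical, and the directory names in the commented usage line are assumptions based on the section names of this tutorial.

```python
import shutil
from pathlib import Path

def stage_training_set(source_dir, target_dir, filename="TiO2.train"):
    """Copy the binary training-set file into the training directory."""
    source = Path(source_dir) / filename
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; run generate.x first")
    target = Path(target_dir) / filename
    shutil.copy2(source, target)
    return target

# Hypothetical usage (directory names are assumptions):
# stage_training_set("01-generate", "02-train")
```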

Now, we perform the training.  Note that this can take a few minutes.  We can observe the training iterations.

# Analyzing the training results

Was the training successful?  One measure is the performance of the trained potential on the validation set, which is reported for each training epoch on lines that look like the following:

    N  MAE_train  RMSE_train  MAE_val  RMSE_val  <

`N` is the training iteration (or *epoch*), `MAE_train` and `RMSE_train` are the *mean absolute error* and *root mean squared error* on the training set, and `MAE_val` and `RMSE_val` are the same for the validation set.
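
As a reminder of how the two error measures are defined, here is a small sketch using only the standard library. The energies in the example are made up, and the function names are chosen for illustration.

```python
import math

def mae(reference, predicted):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(p - r) for r, p in zip(reference, predicted)) / len(reference)

def rmse(reference, predicted):
    """Root mean squared error between reference and predicted values."""
    squared = sum((p - r) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))

# Made-up energies per atom (eV/atom), for illustration only.
reference = [-8.10, -7.95, -7.50]
predicted = [-8.08, -7.99, -7.40]
print(mae(reference, predicted), rmse(reference, predicted))
```

Because the differences are squared before averaging, the RMSE weights large individual errors more strongly than the MAE, so it is never smaller than the MAE.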

In [None]:
errors = []
with open("/content/02-train/train.out") as fp:
  for line in fp:
    # Epoch lines start with the epoch number and end with "<".
    if re.match(r"^ *[0-9].*<$", line):
      # Drop the epoch number and the trailing "<" marker.
      errors.append([float(a) for a in line.split()[1:-1]])
errors = np.array(errors)
errors = pd.DataFrame(
    data=errors,
    columns=['MAE_train', 'RMSE_train', 'MAE_val', 'RMSE_val'])
ax = errors[['RMSE_train', 'RMSE_val']].plot(logy=True)
ax.set_xlabel("Epoch"); ax.set_ylabel("RMSE (eV/atom)")
plt.show()

## Comparison of training and validation errors

`train.x` can export the reference and predicted energies of all structures in the training and validation sets.

In [None]:
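# This file name refers to the training set, so the points are labeled accordingly.
train_file = "/content/02-train/energies.train.0"
# The two columns read here are the DFT and ANN energies per atom.
energies = np.loadtxt(train_file, skiprows=1, usecols=(3,4))
limits = np.linspace(-8.533210, -6.884724)
# Diagonal: perfect agreement between DFT and ANN.
plt.plot(limits, limits, color="black")
plt.ticklabel_format(useOffset=False)
plt.scatter(energies[:,0], energies[:,1], color="red", s=20, label="training")
plt.xlabel('DFT (eV/atom)')
plt.ylabel('ANN (eV/atom)')
plt.legend()
plt.show()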

# 03-predict: Predicting the TiO2 structure
There are 10 XSF files in `03-predict`.  We use the trained model to predict their energies.