# Example Workflow

## Assumed directory structure

```
example_directory
├── cp2k_input
│   └── example_template.inp
├── cp2k_output
├── lammps
│   └── template.lmp
├── n2p2
│   └── input.nn.template
├── scripts
│   ├── cp2k_batch_template.bat
│   ├── example.ipynb
│   └── template.bat
├── xyz
└── example_trajectory.history
```

Although the functions accept explicit file paths, their default arguments assume the directory structure above and read from and write to the corresponding locations.
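
Because the defaults depend on this layout, it can help to check it before starting. The following is a minimal sketch, not part of the package: `missing_paths`, `EXPECTED_DIRECTORIES` and `EXPECTED_FILES` are hypothetical names, and the paths are simply copied from the tree above.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_DIRECTORIES = ("cp2k_input", "cp2k_output", "lammps", "n2p2", "scripts", "xyz")
EXPECTED_FILES = (
    "cp2k_input/example_template.inp",
    "lammps/template.lmp",
    "n2p2/input.nn.template",
    "scripts/cp2k_batch_template.bat",
    "scripts/template.bat",
    "example_trajectory.history",
)


def missing_paths(data_directory):
    """Return the expected relative paths that do not exist under `data_directory`."""
    root = Path(data_directory)
    missing = [d for d in EXPECTED_DIRECTORIES if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

Calling `missing_paths('../')` from the `scripts` directory should return an empty list when the layout is complete.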

File names follow a simple convention when many files share a regular naming pattern. A single file, such as the trajectory, is given by its full name (e.g. `'example_trajectory.history'`), whereas a family of files, such as the individual frames, is given as a pattern containing a pair of braces that is filled in for each file (e.g. `'xyz/{}.xyz'`).
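
As an illustration of the brace convention, the sketch below expands a pattern with Python's built-in `str.format`. It assumes the braces are filled with a zero-based frame index, which is consistent with the `xyz/0.xyz` file used in step 8 but is not confirmed here; `frame_pattern` and `frame_files` are hypothetical names.

```python
# A pattern with one pair of braces stands for a whole family of files.
frame_pattern = 'xyz/{}.xyz'

# str.format substitutes the frame index into the braces.
frame_files = [frame_pattern.format(index) for index in range(3)]
print(frame_files)  # ['xyz/0.xyz', 'xyz/1.xyz', 'xyz/2.xyz']
```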

Finally, the workflow relies on "template" files, which hold the settings that do not need to change between frames. To change these settings, simply edit the template files.

In [None]:
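# TODO: cannot pip install (tmp is full?), so add the source directory to the import path instead
import sys
sys.path.append('/home/vol00/scarf860/cc_placement/CC_HDNNP/src')
from data import Data

# `elements` maps each chemical symbol to its atomic number.
# 'path/to/bin' is a placeholder: replace it with the location of the n2p2 binaries.
d = Data(elements={'H': 1, 'C': 6, 'O': 8}, data_directory='../', n2p2_bin='path/to/bin')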

## 1. Generate atomic configurations
There are no utility scripts for generating configurations; however, a full trajectory can be split into individual frames with:

In [None]:
d.read_trajectory('example_trajectory.history')
d.write_xyz('xyz/{}.xyz')

## 2. Write CP2K
Both batch scripts and input files for CP2K can be generated by: 

In [None]:
d.write_cp2k(file_batch='scripts/cp2k_batch_{}.bat',
             file_input='cp2k_input/example_{}.inp',
             file_xyz='xyz/{}.xyz',
             n_config=1,
             cutoff=(400, 600, 800),
             relcutoff=(40, 60, 80))

In this case, specifying 3 values each for `cutoff` and `relcutoff` generates 9 input files and 9 batch scripts, one for each combination. These can then be used to determine the values that best balance accuracy against time taken.
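
The count of 9 follows from pairing every `cutoff` with every `relcutoff`. The sketch below reproduces that arithmetic with `itertools.product`; it is not the package's own code, and the `labels` are hypothetical, loosely modelled on the output file names used in step 5.

```python
from itertools import product

cutoff = (400, 600, 800)
relcutoff = (40, 60, 80)

# Every cutoff is paired with every relcutoff: 3 x 3 = 9 combinations.
combinations = list(product(cutoff, relcutoff))
print(len(combinations))  # 9

# Hypothetical label for each combination, for illustration only.
labels = ['cutoff_{}_relcutoff_{}'.format(c, r) for c, r in combinations]
print(labels[0])  # cutoff_400_relcutoff_40
```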

## 3. Run CP2K
The previous step should print the command `bash ../scripts/all.bash`. This bash script submits all of the `.bat` files, creating a Slurm job for each of the 9 cases.

## 4. Choose (rel)cutoff
The following function extracts the useful information from the CP2K output and prints a table comparing energy, time taken and grid allocation:

In [None]:
d.print_cp2k_table(n_config=1, cutoff=(400, 600, 800), relcutoff=(40, 60, 80))
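
One common way to read such a table is to treat the largest setting as the reference and pick the fastest setting whose energy agrees with it to within a tolerance. The sketch below assumes that criterion; it is not part of the package, `cheapest_converged` and its arguments are hypothetical, and the energies and times would have to be copied from the printed table by hand.

```python
def cheapest_converged(results, tolerance):
    """Pick the fastest setting whose energy is within `tolerance` of the reference.

    `results` maps (cutoff, relcutoff) to (energy, time_taken). The reference
    energy is taken from the largest (cutoff, relcutoff) pair, on the assumption
    that it is the most accurate setting tested.
    """
    reference_energy = results[max(results)][0]
    converged = {
        setting: time_taken
        for setting, (energy, time_taken) in results.items()
        if abs(energy - reference_energy) <= tolerance
    }
    # The reference itself always qualifies, so `converged` is never empty.
    return min(converged, key=converged.get)
```

The tolerance should be in the same units as the energies in the table, and a suitable value depends on the accuracy required of the final potential.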

Once the best values are chosen, repeat steps 2 and 3 with different arguments to run CP2K on more frames:

In [None]:
d.write_cp2k(file_batch='scripts/cp2k_batch_{}.bat',
             file_input='cp2k_input/example_{}.inp',
             file_xyz='xyz/{}.xyz',
             n_config=101,
             cutoff=(600,),
             relcutoff=(60,))

## 5. Write N2P2
Once force and energy values are obtained from CP2K, these can be written to the N2P2 data format:

In [None]:
d.write_n2p2_data(file_log='cp2k_output/example_n_{}_cutoff_600_relcutoff_60.log',
                  file_forces='cp2k_output/example_n_{}_cutoff_600_relcutoff_60-forces-1_0.xyz',
                  file_xyz='xyz/{}.xyz',
                  file_input='n2p2/input.data',
                  n_config=101)

Several sets of symmetry functions can be written to the same network input file, for example both centered and shifted versions of the radial, narrow angular and wide angular functions:

In [None]:
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='radial',
                rule='imbalzano2018',
                mode='center',
                n_pairs=5)
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='angular_narrow',
                rule='imbalzano2018',
                mode='center',
                n_pairs=5,
                zetas=[1])
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='angular_wide',
                rule='imbalzano2018',
                mode='center',
                n_pairs=5,
                zetas=[1])
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='radial',
                rule='imbalzano2018',
                mode='shift',
                n_pairs=5)
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='angular_narrow',
                rule='imbalzano2018',
                mode='shift',
                n_pairs=5,
                zetas=[1])
d.write_n2p2_nn(file_template='input.nn.template',
                file_nn='input.nn',
                r_cutoff=12.0,
                type='angular_wide',
                rule='imbalzano2018',
                mode='shift',
                n_pairs=5,
                zetas=[1])

## 6. Scale and prune symmetry functions
Before training, the symmetry functions must be "scaled", and to make training less expensive they can also be "pruned". Symmetry functions whose values span only a small range across the configurations in `input.data` are deemed less useful than those that vary widely, and are commented out of `input.nn`.
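
The pruning criterion can be pictured with the sketch below. It is a simplified illustration of the idea, not n2p2's implementation: `low_range_indices` is a hypothetical name, the values are assumed to be available as a plain table, and the strict `<` comparison against the threshold is also an assumption.

```python
def low_range_indices(values, range_threshold):
    """Return the indices of symmetry functions whose range is below the threshold.

    `values[i][j]` is the value of symmetry function `j` for atom `i`.
    """
    # Transpose so that each column holds one symmetry function's values.
    columns = list(zip(*values))
    return [
        index
        for index, column in enumerate(columns)
        if max(column) - min(column) < range_threshold
    ]
```

Raising the threshold flags more symmetry functions for removal and lowering it flags fewer.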

## 7. Train network
Provided that an acceptable number of symmetry functions remain after pruning, the network can now be trained. If too many or too few remain, step 6 can be re-run with a higher or lower threshold.

The batch scripts for steps 6 and 7 are generated by the following:


In [None]:
d.write_n2p2_scripts(range_threshold=1e-4)

The most recent weights (those from the last epoch) are copied and renamed to the format `weights.<atomic_number>.data`. If the weights from a different epoch are wanted, the corresponding files should be renamed manually.
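
Selecting a different epoch by hand can be sketched as follows. This is not part of the package: `select_epoch_weights` is a hypothetical helper, and it assumes that training writes per-epoch files named `weights.<ZZZ>.<EEEEEE>.out` (zero-padded atomic number and epoch) and that the atomic number in `weights.<atomic_number>.data` is padded to 3 digits. Check both against the files in the `n2p2` directory before relying on it.

```python
import shutil
from pathlib import Path


def select_epoch_weights(n2p2_directory, atomic_numbers, epoch):
    """Copy the weights of a chosen epoch to `weights.<atomic_number>.data`.

    Assumes training wrote files named `weights.<ZZZ>.<EEEEEE>.out`, with a
    zero-padded 3-digit atomic number and 6-digit epoch.
    """
    directory = Path(n2p2_directory)
    copied = []
    for atomic_number in atomic_numbers:
        source = directory / 'weights.{:03d}.{:06d}.out'.format(atomic_number, epoch)
        target = directory / 'weights.{:03d}.data'.format(atomic_number)
        # Raises FileNotFoundError if no file exists for the requested epoch.
        shutil.copy(source, target)
        copied.append(target.name)
    return copied
```

For the elements used in this example, a call might look like `select_epoch_weights('../n2p2', [1, 6, 8], epoch=10)`. Copying rather than renaming keeps the per-epoch files available for later comparison.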

## 8. Write LAMMPS
To set up LAMMPS with data from an existing `.xyz` file, the `write_lammps_data` function can be used. The interaction is defined by `write_lammps_pair`, which creates a LAMMPS input file based on the template provided:

In [None]:
d.write_lammps_data(file_xyz='xyz/0.xyz', lammps_unit_style='metal')
d.write_lammps_pair(r_cutoff=6.351,
                    file_template='lammps/template.lmp',
                    file_out='lammps/md.lmp',
                    n2p2_directory='n2p2',
                    lammps_unit_style='metal')
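
The value `r_cutoff=6.351` is close to the `r_cutoff=12.0` from step 5 converted from Bohr to Å, the length unit of the LAMMPS `metal` style. That reading is an assumption (the units of the step 5 cutoff are not stated here), as is the choice to round up so that the LAMMPS cutoff is never smaller than the network's. Under those assumptions the conversion can be sketched as:

```python
import math

# CODATA 2018 value of the Bohr radius in Angstrom.
BOHR_TO_ANGSTROM = 0.529177210903


def bohr_to_angstrom(length_bohr):
    """Convert a length from Bohr to Angstrom."""
    return length_bohr * BOHR_TO_ANGSTROM


def round_up(value, decimals=3):
    """Round up so that the result is never smaller than `value`."""
    factor = 10 ** decimals
    return math.ceil(value * factor) / factor


r_cutoff_angstrom = bohr_to_angstrom(12.0)
print(round(r_cutoff_angstrom, 4))  # 6.3501
print(round_up(r_cutoff_angstrom))  # 6.351
```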

## 9. Run LAMMPS
Finally, LAMMPS can be run with the neural network potential defining the interactions.