# DeeperMD Package 

This package builds on DeepMD, a package for developing machine learning interatomic potentials (MLIPs), and streamlines data preparation, model training, and model validation. By interfacing with dpdata (data preparation), LAMMPS (molecular dynamics), and deepmd (MLIP training), DeeperMD brings the steps of model development into a single workflow. It also adds new functionality, such as hyperparameter optimization for tuning model hyperparameters.

<div>
<img src="images/deepermd-01.png" width="1000"/>
</div>

## Data Preparation Sub-Package

This sub-package is separated into two modules: `process_data` and `cross_val`.

It reads in DFT data (currently only VASP OUTCAR files are supported) and processes them for use in ML-based potentials.

### process_data

Converts OUTCARs to `.npy` files via the `dpdata` package and separates the data into training and validation directories based on a user-defined training split proportion.

In [49]:
# Import os (used to build paths below) and the data preparation functions
import os
from data_prep.process_data import OUTCAR_to_ms, OUTCAR_to_npy, train_test_split

In [50]:
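# Define the parameters used by the functions below
par = '/blue/subhash/kimia.gh/B4C_ML_Potential/data/devel/B4C'  # parent directory containing the OUTCARs
dest = '/home/kimia.gh/blue2/python_course/DeeperMD/example'  # destination directory for the processed data
sub = ['temperature_hold', 'small_strains']  # sub-directories to search (sub_dirs)
run = ['temperature_hold', 'shear_strain', 'volumetric_strain', 'uniaxial_strain']  # run types to include (run_types)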

<div>
<img src="images/deepermd_file_Artboard_2.png" width="1000"/>
</div>

#### `OUTCAR_to_ms`

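Converts the OUTCARs found under the given `parent_dir` into dpdata `MultiSystems` objects, which store the systems and simplify data management. Frames without and with virials are returned as separate objects (`ms` and `ms_virial`), together with per-run-type frame counts for each group.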

In [51]:
# Convert the OUTCARs to dpdata MultiSystems objects
ms,ms_virial,count_novirial,count_virial = OUTCAR_to_ms(
    parent_dir=par,
    sub_dirs=sub,
    run_types=run,
    )
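
As a rough illustration of the bookkeeping behind this step, the sketch below walks a directory tree, collects every `OUTCAR`, and tallies them per run type. It is not the package's implementation: the helper name `find_outcars` is hypothetical, and it assumes that the run type appears as one of the directory names in each path, which may not match the actual layout.

```python
import os
import tempfile
from collections import Counter


def find_outcars(parent_dir, run_types):
    """Return sorted (path, run_type) pairs for every OUTCAR under parent_dir.

    Assumption: the run type is one of the directory names in the path.
    """
    found = []
    for root, _dirs, files in os.walk(parent_dir):
        if "OUTCAR" not in files:
            continue
        parts = os.path.relpath(root, parent_dir).split(os.sep)
        # Use the deepest directory name that matches a known run type.
        for part in reversed(parts):
            if part in run_types:
                found.append((os.path.join(root, "OUTCAR"), part))
                break
    return sorted(found)


# Build a small throwaway tree (made-up layout) to demonstrate the walk.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("B104C16/temperature_hold/300K",
                "B96C24/small_strains/shear_strain/01"):
        os.makedirs(os.path.join(tmp, rel))
        open(os.path.join(tmp, rel, "OUTCAR"), "w").close()
    outcars = find_outcars(tmp, ["temperature_hold", "shear_strain"])
    counts = Counter(run_type for _path, run_type in outcars)

print(dict(counts))
```

In a real workflow each discovered file would then be loaded with dpdata and appended to a `MultiSystems` object; that part is omitted here so the sketch runs without any DFT data.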

In [52]:
ms.systems

{'B104C16': Data Summary
 Labeled System
 -------------------
 Frame Numbers      : 100
 Atom Numbers       : 120
 Including Virials  : No
 Element List       :
 -------------------
 B  C
 104  16,
 'B96C24': Data Summary
 Labeled System
 -------------------
 Frame Numbers      : 100
 Atom Numbers       : 120
 Including Virials  : No
 Element List       :
 -------------------
 B  C
 96  24}

In [53]:
ms_virial.systems

{'B104C16': Data Summary
 Labeled System
 -------------------
 Frame Numbers      : 600
 Atom Numbers       : 120
 Including Virials  : Yes
 Element List       :
 -------------------
 B  C
 104  16,
 'B96C24': Data Summary
 Labeled System
 -------------------
 Frame Numbers      : 600
 Atom Numbers       : 120
 Including Virials  : Yes
 Element List       :
 -------------------
 B  C
 96  24}

In [54]:
count_virial

{'temperature_hold': 1200,
 'shear_strain': 0,
 'volumetric_strain': 0,
 'uniaxial_strain': 0}

In [55]:
# Write the systems to DeePMD-format .npy files via dpdata
ms.to_deepmd_npy(os.path.join(dest, 'no_virials'), set_size=1000000)
ms_virial.to_deepmd_npy(os.path.join(dest, 'virials'), set_size=1000000)

MultiSystems (2 systems containing 1200 frames)
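
The `set_size` argument controls how many frames dpdata writes into each `set.NNN` sub-directory of a system, so a very large value such as the one used above keeps all frames of a system in a single set. The arithmetic can be sketched as follows; the helper name `number_of_sets` is hypothetical and only illustrates the chunking.

```python
import math


def number_of_sets(n_frames, set_size):
    """Number of set.NNN directories needed to hold n_frames frames."""
    return math.ceil(n_frames / set_size)


# 600 frames per system, as in the ms_virial summary above.
single = number_of_sets(600, 1000000)
chunked = number_of_sets(600, 250)
print(single, chunked)
```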

#### `train_test_split`

Scans a directory containing `.npy` files and splits them into training and validation directories according to `train_split` (0.9 in the example below).

In [56]:
train_test_split(
    destination_dir='/home/kimia.gh/blue2/python_course/DeeperMD/example',
    train_split=0.9,
    ms_virial=ms_virial,
    ms=ms)
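
The idea behind a proportional split can be sketched on frame indices alone. This is a minimal illustration, not the package's code: the helper name `split_indices`, the fixed seed, and the choice to shuffle before splitting are all assumptions.

```python
import random


def split_indices(n_frames, train_split, seed=0):
    """Shuffle frame indices and split them by the given training proportion."""
    indices = list(range(n_frames))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(round(n_frames * train_split))
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, val_idx = split_indices(n_frames=600, train_split=0.9)
print(len(train_idx), len(val_idx))
```

Splitting on indices first keeps the training and validation sets disjoint, and the same index lists can then be applied to every array (coordinates, energies, forces, virials) of a system.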

#### `OUTCAR_to_npy`

Combines the functions above into a single end-to-end call that simplifies the data preparation stage.

In [58]:
ms_nov,ms_v,count_nov,count_v = OUTCAR_to_npy(
    parent_dir=par,
    destination_dir='/home/kimia.gh/blue2/python_course/DeeperMD/example_2',
    run_types=run)

### cross_val 

[TO-DO]

This sub-module splits the training and validation data into k sets for use in k-fold cross-validation. It is mostly a back-end module, used to prepare data for hyperparameter optimization.
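
Since this sub-module is still marked as to-do, the following is only a sketch of what a k-fold split over frame indices could look like. The helper name `k_fold_indices` and the interleaved fold assignment are assumptions, not the package's design.

```python
def k_fold_indices(n_frames, k):
    """Yield (train, validation) index lists for each of k folds."""
    if not 2 <= k <= n_frames:
        raise ValueError("k must be between 2 and the number of frames")
    indices = list(range(n_frames))
    # Fold i takes every k-th frame starting at i, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), val


splits = list(k_fold_indices(n_frames=10, k=5))
for train, val in splits:
    print(val, len(train))
```

Each frame lands in exactly one validation fold. For molecular dynamics trajectories, where neighbouring frames are correlated, contiguous blocks may be a better choice than the interleaved assignment used here.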


## Model Training Sub-Package

This sub-package is separated into four sub-modules: `hyperparam_optimization`, `hyperparam_train_test`, `post_training_handling`, and `train`.

In [26]:
import hyperparam_optimization, hyperparam_train_test, post_training_handling, train

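Define the parameters to optimize and the candidate values for each. Each key is a space-separated path to a setting in the DeepMD input JSON (for example, `model descriptor axis_neuron`), and each value is a list of candidates to try.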

In [27]:
params={
    "model descriptor axis_neuron":[4,8],
    "model fitting_net neuron":[[10,10]]
    }
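
To make the size of the search concrete, the sketch below counts the models implied by this dictionary under two strategies: a 1D search, assumed here to vary one parameter at a time, and a full grid over all combinations. The helper names are hypothetical, and the one-model-per-value interpretation of the 1D search is an assumption.

```python
import itertools

params = {
    "model descriptor axis_neuron": [4, 8],
    "model fitting_net neuron": [[10, 10]],
}


def one_d_candidates(param_dict):
    """One (key, value) pair per model: a single parameter varied at a time."""
    return [(key, value) for key, values in param_dict.items() for value in values]


def full_grid_candidates(param_dict):
    """One dict per model: every combination of candidate values."""
    keys = list(param_dict)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*param_dict.values())]


one_d = one_d_candidates(params)
grid = full_grid_candidates(params)
print(len(one_d), len(grid))
```

The 1D count grows with the sum of the list lengths, while the full grid grows with their product, which is why a 1D sweep is a cheaper first pass when many parameters are involved.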

#### Submodule `post_training_handling` 

This submodule contains the functions used after training: freezing models (`post_training_handling.freeze()`), compressing models (`post_training_handling.compress()`), testing models (`post_training_handling.test()`), modifying a LAMMPS input script (`post_training_handling.lammps_lat_const_modifier()`), and running LAMMPS lattice constant simulations (`post_training_handling.lattice_constants()`).

In the cell below we call `lammps_lat_const_modifier` to modify a LAMMPS input script so that it runs the simulation with the intended structure data file and model.

In [29]:
post_training_handling.lammps_lat_const_modifier(
    model='graph_compress.pb',
    data='/blue/subhash/michaelmacisaac/functions/deepmd/model1/data.b4c_cell',
    lammps_script='in.lattice_constants')

['# Mark Tschopp, 2010\n', '\n', '# ---------- Initialize Simulation --------------------- \n', 'clear \n', 'units metal \n', 'boundary p p p \n', 'atom_style atomic \n', 'read_data /blue/subhash/michaelmacisaac/functions/deepmd/model1/data.b4c_cell \n', '\n', '# ---------- Create Atoms --------------------- \n']
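
The kind of edit performed on the script can be sketched as a simple line rewrite. This is not the package's implementation: the helper name `point_script_at` is hypothetical, and it assumes the script selects the model through a `pair_style deepmd` line and the structure through a `read_data` line.

```python
def point_script_at(lines, model, data):
    """Return a copy of the script lines with read_data and pair_style redirected."""
    updated = []
    for line in lines:
        tokens = line.split()
        if tokens[:1] == ["read_data"]:
            updated.append(f"read_data {data}\n")
        elif tokens[:2] == ["pair_style", "deepmd"]:
            updated.append(f"pair_style deepmd {model}\n")
        else:
            updated.append(line)  # leave every other line untouched
    return updated


# Made-up script fragment for illustration.
script = [
    "units metal\n",
    "read_data old_structure.data\n",
    "pair_style deepmd old_model.pb\n",
    "pair_coeff * *\n",
]
new_script = point_script_at(script, model="graph_compress.pb", data="data.b4c_cell")
print("".join(new_script))
```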


In [30]:
train_paths='/home/kimia.gh/blue2/python_course/test_03/virials/training_data'
val_paths='/home/kimia.gh/blue2/python_course/test_03/virials/validation_data'

#### Submodule `hyperparam_optimization`

This submodule includes the `json_dir_gen_1d` function. Given a dictionary of parameters and a base input JSON file, it generates the input JSON files and corresponding directories for 1D hyperparameter optimization. The model JSON files are stored in the directory named by `d1_dir` (`1d_gridsearch` in the example below).

In [None]:
hyperparam_optimization.json_dir_gen_1d(
    base_json='base.json', 
    param_dict=params, 
    training_paths=train_paths, 
    validation_paths=val_paths,
    crossval=False,
    d1_dir='1d_gridsearch')
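
A minimal sketch of this kind of generation step is shown below, assuming that each key in the parameter dictionary is a space-separated path into the base JSON and that each candidate value gets its own directory. The helper names, the directory naming scheme, the `input.json` file name, and the contents of `base` are all hypothetical.

```python
import copy
import json
import tempfile
from pathlib import Path


def set_by_path(config, key, value):
    """Set config[a][b][c] = value for a key written as 'a b c'."""
    *parents, leaf = key.split()
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value


def write_1d_inputs(base_config, param_dict, out_dir):
    """Write one input JSON per candidate value, varying one parameter at a time."""
    out_dir = Path(out_dir)
    written = []
    for key, values in param_dict.items():
        for i, value in enumerate(values):
            config = copy.deepcopy(base_config)  # never mutate the base
            set_by_path(config, key, value)
            model_dir = out_dir / f"{key.split()[-1]}_{i}"
            model_dir.mkdir(parents=True, exist_ok=True)
            path = model_dir / "input.json"
            path.write_text(json.dumps(config, indent=2))
            written.append(path)
    return written


# Made-up base configuration; a real one would come from base.json.
base = {"model": {"descriptor": {"axis_neuron": 16},
                  "fitting_net": {"neuron": [240, 240, 240]}}}
params = {"model descriptor axis_neuron": [4, 8],
          "model fitting_net neuron": [[10, 10]]}

with tempfile.TemporaryDirectory() as tmp:
    paths = write_1d_inputs(base, params, Path(tmp) / "1d_gridsearch")
    configs = [json.loads(p.read_text()) for p in paths]

print(len(configs))
```

Deep-copying the base configuration for every candidate means each generated file differs from the base in exactly one setting, which makes the resulting errors easy to attribute.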

#### Submodule `hyperparam_train_test`

This submodule features the `hyperparam_train_test()` function, which trains models using the input JSON files generated by `hyperparam_optimization.json_dir_gen_1d()` and the `train()` function from the `train` submodule. It also calls the following functions from the `post_training_handling` submodule: `freeze()`, `compress()`, `test()`, and, optionally, `lattice_constants()` when the lattice constants and cohesive energy of a structure should be evaluated with the trained models.
The function accepts many optional arguments, including whether to run LAMMPS simulations, whether to compress models, and whether to perform cross-validation.
It writes a `.txt` file that reports model performance (error) for the given key parameters.


```
hyperparam_train_test.hyperparam_train_test(
    directory='/home/kimia.gh/blue2/python_course/DeeperMD/',
    d1_dir='1d_gridsearch',
    lammps_script='/home/kimia.gh/blue2/python_course/DeeperMD/in.lattice_constants', 
    ref_len=5.65,
    ref_coh=-7.2183,
    test_path='/home/kimia.gh/blue2/python_course/test_03/virials/validation_data',
    n=40,
    multisystem=True,
    compression=True,
    crossval=False)
```
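
For orientation, the train, freeze, compress, and test sequence described above corresponds roughly to the DeePMD-kit command line calls sketched below. The helper name `post_training_commands` is hypothetical, the model file names are chosen for illustration, and the mapping of the `n` argument to `dp test -n` is an assumption; how the package actually invokes DeePMD-kit may differ.

```python
import shlex


def post_training_commands(input_json, test_path, n, compression=True):
    """Build the DeePMD-kit CLI calls for one model, in execution order."""
    commands = [
        ["dp", "train", input_json],
        ["dp", "freeze", "-o", "graph.pb"],
    ]
    model = "graph.pb"
    if compression:
        commands.append(["dp", "compress", "-i", "graph.pb", "-o", "graph_compress.pb"])
        model = "graph_compress.pb"
    # Assumption: n is the number of test frames passed to `dp test -n`.
    commands.append(["dp", "test", "-m", model, "-s", test_path, "-n", str(n)])
    return commands


commands = post_training_commands("input.json", "validation_data", n=40)
for command in commands:
    print(shlex.join(command))
```

The sketch only builds the command lists. To run them, each list could be passed to `subprocess.run(command, check=True, cwd=model_dir)` inside the directory of the model being trained.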

## Model Validation Sub-Package

[TO-DO]

This sub-package combines the LAMMPS Python API and the DeePMD-kit API to enable validation of models against MD-based quantities (melting point, elastic constants, etc.).
