# Example: Training a Descriptor-based Potential

Let us define a key-value dict directly and train a simple descriptor-based Si potential.

#### Step 0: Get the dataset

In [1]:
!wget https://raw.githubusercontent.com/openkim/kliff/main/examples/Si_training_set_4_configs.tar.gz
!tar -xvf Si_training_set_4_configs.tar.gz

--2025-03-05 11:54:51--  https://raw.githubusercontent.com/openkim/kliff/main/examples/Si_training_set_4_configs.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8001::154, 2606:50c0:8000::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7691 (7.5K) [application/octet-stream]
Saving to: ‘Si_training_set_4_configs.tar.gz.12’


2025-03-05 11:54:51 (29.6 MB/s) - ‘Si_training_set_4_configs.tar.gz.12’ saved [7691/7691]

Si_training_set_4_configs/
Si_training_set_4_configs/Si_alat5.431_scale0.005_perturb1.xyz
Si_training_set_4_configs/Si_alat5.409_scale0.005_perturb1.xyz
Si_training_set_4_configs/Si_alat5.442_scale0.005_perturb1.xyz
Si_training_set_4_configs/Si_alat5.420_scale0.005_perturb1.xyz


#### Step 1: workspace config
Create a folder named `DNN_train_example` and use it as the workspace for everything the run produces. The random seed is fixed so that the run is reproducible.

In [2]:
workspace = {"name": "DNN_train_example", "random_seed": 12345}

#### Step 2: define the dataset 

In [3]:
dataset = {"type": "path", "path": "Si_training_set_4_configs", "shuffle": True}
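Before training, it can help to confirm that the directory named in `path` holds the configurations you expect. The snippet below is a minimal sketch using only the standard library. The helper `list_xyz_files` is hypothetical (it is not part of KLIFF), and it assumes the `"path"` dataset type reads the `.xyz` files directly inside that directory.

```python
from pathlib import Path


def list_xyz_files(root):
    """Return sorted names of .xyz files directly inside `root`.

    Hypothetical helper for illustration; returns an empty list when
    the directory does not exist.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("*.xyz"))


files = list_xyz_files("Si_training_set_4_configs")
print(f"found {len(files)} configuration file(s)")
for name in files:
    print("  ", name)
```

If the archive from Step 0 was extracted in the current directory, this should list the four files shown in the `tar` output.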

#### Step 3: model
We will use a simple fully connected neural network with `tanh` non-linearities. The input width is 51, matching the dimension of the descriptor chosen in Step 4, and the model contains two hidden layers of width 50, i.e.

In [4]:
import torch
import torch.nn as nn
torch.set_default_dtype(torch.double) # default float = double

torch_model = nn.Sequential(nn.Linear(51, 50), nn.Tanh(), nn.Linear(50, 50), nn.Tanh(), nn.Linear(50, 1))
torch_model

Sequential(
  (0): Linear(in_features=51, out_features=50, bias=True)
  (1): Tanh()
  (2): Linear(in_features=50, out_features=50, bias=True)
  (3): Tanh()
  (4): Linear(in_features=50, out_features=1, bias=True)
)
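As a quick check on the size of this network, the number of trainable parameters can be worked out by hand from the layer widths in the printed `Sequential`. The sketch below does the arithmetic in plain Python; the helper name `count_dense_parameters` is made up for this example.

```python
# Layer widths read off the printed Sequential: 51 -> 50 -> 50 -> 1
layer_widths = [51, 50, 50, 1]


def count_dense_parameters(widths):
    """Count weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


n_params = count_dense_parameters(layer_widths)
print(n_params)  # 2600 + 2550 + 51 = 5201
```

The same number can be obtained from the model itself with `sum(p.numel() for p in torch_model.parameters())`.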

In [5]:
model = {"name": "MY_ML_MODEL"}

#### Step 4: select appropriate configuration transforms
Let us use the default `set51` hyperparameters of the Behler symmetry functions as the configuration transform descriptor.

In [6]:
transforms = {
        "configuration": {
            "name": "Descriptor",
            "kwargs": {
                "cutoff": 4.0,
                "species": ['Si'],
                "descriptor": "SymmetryFunctions",
                "hyperparameters": "set51"
            }
        }
}
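To give a feel for what the descriptor computes, the sketch below evaluates one radial symmetry function in the standard Behler-Parrinello form, $G^2_i = \sum_j e^{-\eta (r_{ij} - R_s)^2} f_c(r_{ij})$, with the cosine cutoff $f_c(r) = \tfrac{1}{2}[\cos(\pi r / r_c) + 1]$ for $r \le r_c$ and zero otherwise. It is an illustration only: the `eta` and `r_shift` values and the neighbor distances are made-up numbers, not the ones in `set51`, and the code is not KLIFF's implementation. Only the cutoff of 4.0 is taken from the manifest above.

```python
import math


def cosine_cutoff(r, r_cut):
    """Smoothly decay from 1 at r = 0 to 0 at r = r_cut; zero beyond."""
    if r > r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def radial_g2(distances, eta, r_shift, r_cut):
    """Radial symmetry function: Gaussian-weighted sum over neighbor distances."""
    return sum(
        math.exp(-eta * (r - r_shift) ** 2) * cosine_cutoff(r, r_cut)
        for r in distances
    )


# Hypothetical neighbor distances for one atom (illustrative values only)
neighbor_distances = [2.35, 2.35, 2.35, 2.35, 3.84]

g2 = radial_g2(neighbor_distances, eta=0.5, r_shift=0.0, r_cut=4.0)
print(g2)
```

A full descriptor stacks many such values per atom, each computed with different hyperparameters. That is why the network input in Step 3 has width 51.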

#### Step 5: training
Let us train it using the Adam optimizer, with a validation-to-training split of 1:3 (one validation configuration and three training configurations).

In [7]:
training = {
        "loss": {
            "function": "MSE",
            "weights": {
                "config": 1.0,
                "energy": 1.0,
                "forces": 10.0
            },
        },
        "optimizer": {
            "name": "Adam",
            "learning_rate": 1e-3
        },
        "training_dataset": {
            "train_size": 3
        },
        "validation_dataset": {
            "val_size": 1
        },
        "batch_size": 1,
        "epochs": 10,
}
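The `weights` entries control how much the energy and force errors contribute to the loss. The sketch below shows one plausible way such weights could combine into a per-configuration squared-error loss. The normalization (mean over force components) and the function `weighted_mse` are assumptions made for illustration; KLIFF's actual loss may differ in detail. The step count at the end also assumes one optimizer step per batch.

```python
def weighted_mse(pred_energy, ref_energy, pred_forces, ref_forces,
                 w_config=1.0, w_energy=1.0, w_forces=10.0):
    """Hypothetical weighted squared-error loss for one configuration.

    Forces are lists of [fx, fy, fz] per atom; ref_forces must be non-empty.
    """
    energy_term = (pred_energy - ref_energy) ** 2
    force_sq = sum(
        (p - r) ** 2
        for pf, rf in zip(pred_forces, ref_forces)
        for p, r in zip(pf, rf)
    )
    forces_term = force_sq / (3 * len(ref_forces))  # mean over components
    return w_config * (w_energy * energy_term + w_forces * forces_term)


# Toy numbers: energy off by 0.1, one force component off by 0.3
loss = weighted_mse(
    pred_energy=-4.5, ref_energy=-4.6,
    pred_forces=[[0.3, 0.0, 0.0]], ref_forces=[[0.0, 0.0, 0.0]],
)
print(loss)

# With 3 training configurations, batch_size 1 and 10 epochs:
train_size, batch_size, epochs = 3, 1, 10
steps_per_epoch = -(-train_size // batch_size)  # ceiling division
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```

With a force weight of 10.0, a given mean squared force error counts ten times as much as the same squared energy error in this sketch.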

#### Step 6: (Optional) export the model

In [8]:
export = {"model_path":"./", "model_name": "MyDNN__MO_111111111111_000"} # name can be anything, but better to have KIM-API qualified name for convenience

#### Step 7: Put it all together, and pass to the trainer

In [9]:
training_manifest = {
    "workspace": workspace,
    "model": model,
    "dataset": dataset,
    "transforms": transforms,
    "training": training,
    "export": export
}
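Because the manifest is a plain nested dict, it is easy to sanity-check before handing it to the trainer. The sketch below uses a hypothetical helper, `missing_sections`, to confirm that the six sections used in this example are present and that the dict survives a JSON round trip. The list of sections is taken from this example, not from any KLIFF validation rule, and the stand-in manifest is abbreviated so that the snippet runs on its own.

```python
import json

# Sections used in this example (not an official list of required keys)
EXAMPLE_SECTIONS = ["workspace", "model", "dataset", "transforms", "training", "export"]


def missing_sections(manifest, sections=EXAMPLE_SECTIONS):
    """Return the names in `sections` that are absent from `manifest`."""
    return [name for name in sections if name not in manifest]


# Abbreviated stand-in for the manifest built above
example_manifest = {
    "workspace": {"name": "DNN_train_example", "random_seed": 12345},
    "model": {"name": "MY_ML_MODEL"},
    "dataset": {"type": "path", "path": "Si_training_set_4_configs", "shuffle": True},
    "transforms": {"configuration": {"name": "Descriptor"}},
    "training": {"batch_size": 1, "epochs": 10},
    "export": {"model_path": "./", "model_name": "MyDNN__MO_111111111111_000"},
}

print(missing_sections(example_manifest))  # [] when nothing is missing

# A JSON round trip confirms the manifest holds only plain data
round_tripped = json.loads(json.dumps(example_manifest))
print(round_tripped == example_manifest)
```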

In [10]:
from kliff.trainer.torch_trainer import DNNTrainer

trainer = DNNTrainer(training_manifest, model=torch_model)
trainer.train()
trainer.save_kim_model()

2025-03-05 11:55:01.129 | INFO     | kliff.trainer.base_trainer:initialize:343 - Seed set to 12345.
2025-03-05 11:55:01.131 | INFO     | kliff.trainer.base_trainer:setup_workspace:390 - Either a fresh run or resume is not requested. Starting a new run.
2025-03-05 11:55:01.131 | INFO     | kliff.trainer.base_trainer:initialize:346 - Workspace set to DNN_train_example/MY_ML_MODEL_2025-03-05-11-55-01.
2025-03-05 11:55:01.133 | INFO     | kliff.dataset.dataset:add_weights:1126 - No explicit weights provided.
2025-03-05 11:55:01.134 | INFO     | kliff.dataset.dataset:add_weights:1131 - Weights set to the same value for all configurations.
2025-03-05 11:55:01.134 | INFO     | kliff.trainer.base_trainer:initialize:349 - Dataset loaded.
2025-03-05 11:55:01.135 | INFO     | kliff.trainer.base_trainer:setup_dataset_split:601 - Training dataset size: 3
2025-03-05 11:55:01.135 | INFO     | kliff.trainer.base_trainer:setup_dataset_split:609 - Validation dataset size: 1
2025-03-05 11:55:01.136 | INF

To execute this model you need to install ``libtorch``, the C++ API for PyTorch. Details on how to install it and execute these ML models are provided in the :ref:`following sections <_lammps>`.