# Re-running a simulation from the database

This Jupyter Notebook is a guide for re-running simulations and adding new data points to this database repository for training a ReaxFF model.

This notebook aims to provide developers with a clear and concise workflow for re-running simulations, enabling them to reproduce results and contribute to the FAIR (Findable, Accessible, Interoperable, and Reusable) principles in scientific research.

The notebook is structured into three main sections: 

- [**Re-running simulations using `PLAMS`**](#section1): The first section focuses on using the SCM PLAMS package to re-run simulations. It covers importing the necessary libraries, connecting to the database, retrieving a specific simulation entry, preparing the simulation setup, running the simulation, and adding the new results back to the database.

- [**Re-running simulations using `ASE` and `AMSCalculator`**](#section2): The second section demonstrates an alternative approach using the ASE and AMSCalculator packages to re-run simulations. It follows a structure similar to the PLAMS section but showcases different tools and techniques.

- [**Updating metadata**](#section3): The third section provides a simple function to update the metadata, ensuring the database remains comprehensive, up-to-date, and in agreement with FAIR principles.

## <a id='section1'></a> Re-running simulations using SCM `PLAMS`

### Step 1: Importing the Required Libraries

First, we import the necessary libraries and modules: `os`, `sys`, `connect` from `ase.db`, and several names from `scm.plams`. We also add the parent folder to `sys.path` so that the repository's modules can be imported.

In [40]:
import os
import sys

from ase.db import connect
from scm.plams import AMSJob, Settings, config, finish, fromASE, init

# Add the parent folder to the path to access the modules of this repository in `../tools`
sys.path.append("..")
from tools.db import add_to_db

### Step 2: Connecting to the Database

We establish a connection to the database file located at `"../data/LiF.db"`.


In [4]:
# Connect to the database
db = connect(os.path.join("..", "data", "LiF.db"))
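
A mistyped path may go unnoticed until the first query returns nothing, so it can help to check that the database file exists before connecting. The sketch below is a hypothetical helper (`require_db_file` is not part of this repository or of ASE) that builds the path and fails early if the file is missing.

```python
from pathlib import Path


def require_db_file(*parts: str) -> Path:
    """Build a path from its parts and raise early if the file is missing.

    Hypothetical helper for illustration; not part of the repository.
    """
    path = Path(*parts)
    if not path.is_file():
        raise FileNotFoundError(f"Database file not found: {path}")
    return path


# Usage with the connection cell above (not executed here):
# db = connect(str(require_db_file("..", "data", "LiF.db")))
```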

### Step 3: Get a specific Simulation result and prepare it for the simulation

To re-run a specific simulation, we first retrieve the relevant entry from the database using a set of selection criteria.
In this example, we retrieve the result for the stable LiF crystal structure with space group Fm-3m and one interstitial atom, using the criteria `subset_name`, `task`, `space_group`, and `natoms`.
Alternatively, a new system can be loaded directly from a file as an ASE `Atoms` object or a PLAMS `Molecule` object.

To browse and retrieve entries from the database using the Python interface, you can refer to the `browsing_sb.ipynb` notebook or the `README.md` file for more details.

The following code snippet demonstrates how to retrieve the desired simulation result:

In [12]:
# Get a specific simulation result from the database

row = db.get(subset_name="interstitial defects", task="geometry optimization", space_group="Fm-3m", natoms=25)

# Print the name of the retrieved simulation
print(row.name)

5.1-0-LiF_Fm-3m_-3.18_ni_1


<div class="alert alert-block alert-warning">
<b>Warning:</b> 

Make sure to adjust the criteria according to your specific simulation requirements.
Once the desired result is obtained, you can proceed with the remaining steps to prepare and run the simulation.
</div>
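
`db.get` is meant for criteria that match exactly one row, so when adjusting the criteria it can be useful to see how many rows actually match. The sketch below is a hypothetical wrapper (`get_unique_row` is an illustrative name) that relies only on an ASE-style `select(**criteria)` method and reports the number of matches when it is not exactly one.

```python
def get_unique_row(db, **criteria):
    """Return the single row matching `criteria`, or raise a descriptive error.

    `db` is any object with an ASE-style `select(**criteria)` method.
    Hypothetical helper for illustration; not part of ASE or this repository.
    """
    rows = list(db.select(**criteria))
    if len(rows) != 1:
        raise LookupError(
            f"Expected exactly 1 row for {criteria}, found {len(rows)}"
        )
    return rows[0]


# Usage with the database opened above (not executed here):
# row = get_unique_row(db, subset_name="interstitial defects",
#                      task="geometry optimization",
#                      space_group="Fm-3m", natoms=25)
```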

We convert the retrieved row to an ASE `Atoms` object.

In [23]:
# Get the ASE `Atoms` object
atoms = row.toatoms()
print(atoms)

Atoms(symbols='Li13F12', pbc=True, cell=[[5.0011581428, 0.0, 2.88742], [1.6670527143, 4.7151371154, 2.88742], [0.0, 0.0, 8.662259999999998]], calculator=SinglePointCalculator(...))
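
As a quick sanity check, the composition printed above can be compared with the `natoms` criterion used in the query. The sketch below parses a simple formula string such as `Li13F12` with the standard library only. `count_atoms` is a hypothetical helper, and it assumes a flat formula without parentheses.

```python
import re
from collections import Counter


def count_atoms(formula: str) -> Counter:
    """Count atoms per element in a flat formula such as 'Li13F12'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


counts = count_atoms("Li13F12")
print(dict(counts), sum(counts.values()))
```

For this entry the total is 13 Li + 12 F = 25 atoms, which matches `natoms=25`.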


We convert the ASE `Atoms` object to a PLAMS `Molecule` object.

In [22]:
# Convert it to a PLAMS `Molecule` object
mol = fromASE(atoms)
print(mol)

  Atoms: 
    1        Li       0.041422       0.007810      -0.038750 
    2        Li      -0.012681      -0.204558       2.910880 
    3        Li       0.011859      -0.019194       5.788982 
    4        Li       0.938206       2.431588       1.263373 
    5        Li       0.910682       2.412126       4.514320 
    6        Li       0.845876       2.366301       7.209048 
    7        Li       2.511341       0.007610       1.390495 
    8        Li       2.469011      -0.022322       4.384582 
    9        Li       2.480417      -0.014257       7.220685 
   10        Li       3.355276       2.394019       2.848670 
   11        Li       3.137020       2.413799       5.798300 
   12        Li       3.319962       2.375147       8.676402 
   13        Li       4.994207       1.763261       5.782333 
   14         F       1.635423       1.187588       2.890077 
   15         F       1.648685       1.217318       5.805550 
   16         F       1.686816       1.209821       8.627823

### Step 4: Setting up and Running the *AMS/BAND* Simulation with `PLAMS`

To re-run the simulation, we extract its settings from the retrieved row.
We can use them unchanged to repeat the same simulation, or modify them to use more accurate settings or compute additional properties.

In [36]:
# Get the simulation settings from the retrieved row
setting = Settings(row.calculator_parameters["input"])
print(setting)

AMS: 	
    task: 	GeometryOptimization
    Properties: 	
               Gradients: 	yes
               StressTensor: 	no
               Hessian: 	no
               PESPointCharacter: 	no
               ElasticTensor: 	no
    GeometryOptimization: 	
                         OptimizeLattice: 	no
                         Convergence: 	
                                     Energy: 	3.8087988488664447e-05
                                     Gradients: 	0.3808798848866444
                                     StressEnergyPerAtom: 	0.01904399424433222
                                     Step: 	0.05
                         PretendConverged: 	yes
                         MaxIterations: 	20
BAND: 	
     basis: 	
           type: 	DZP
           Core: 	Medium
     Dependency: 	
                Core: 	0.8
     xc: 	
        GGA: 	PBE
        MetaGGA: 	postscf TPSS
     scf: 	
         mixing: 	0.3
     numericalquality: 	Normal
     beckegrid: 	
               quality: 	Normal
     CPVector: 	25

We update some of the simulation settings. In this example, we increase the maximum iterations for geometry optimization and enable the computation and storage of the elastic tensor.

In [38]:
# Update some settings, for example to compute the `ElasticTensor`
setting.AMS.GeometryOptimization.MaxIterations = 100  # increase the GO iterations
setting.AMS.Properties.ElasticTensor = "yes"  # compute and store the elastic tensor
print(setting)

AMS: 	
    task: 	GeometryOptimization
    Properties: 	
               Gradients: 	yes
               StressTensor: 	no
               Hessian: 	no
               PESPointCharacter: 	no
               ElasticTensor: 	yes
    GeometryOptimization: 	
                         OptimizeLattice: 	no
                         Convergence: 	
                                     Energy: 	3.8087988488664447e-05
                                     Gradients: 	0.3808798848866444
                                     StressEnergyPerAtom: 	0.01904399424433222
                                     Step: 	0.05
                         PretendConverged: 	yes
                         MaxIterations: 	100
BAND: 	
     basis: 	
           type: 	DZP
           Core: 	Medium
     Dependency: 	
                Core: 	0.8
     xc: 	
        GGA: 	PBE
        MetaGGA: 	postscf TPSS
     scf: 	
         mixing: 	0.3
     numericalquality: 	Normal
     beckegrid: 	
               quality: 	Normal
     CPVector: 	
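
When settings are modified before a re-run, it helps to record exactly which entries changed compared with the stored ones. The sketch below compares two nested dictionaries and lists the differing leaves. `diff_settings` is a hypothetical helper, shown here on plain dictionaries. A PLAMS `Settings` object would first be converted, for example with `setting.as_dict()`.

```python
import copy


def diff_settings(old, new, prefix=""):
    """Return {dotted_key: (old_value, new_value)} for each differing leaf.

    A key missing on one side is reported with the value None on that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        path = f"{prefix}{key}"
        a, b = old.get(key), new.get(key)
        if isinstance(a, dict) and isinstance(b, dict):
            changes.update(diff_settings(a, b, prefix=path + "."))
        elif a != b:
            changes[path] = (a, b)
    return changes


# Illustrative values taken from the printed settings above
original = {"AMS": {"GeometryOptimization": {"MaxIterations": 20},
                    "Properties": {"ElasticTensor": "no"}}}
updated = copy.deepcopy(original)
updated["AMS"]["GeometryOptimization"]["MaxIterations"] = 100
updated["AMS"]["Properties"]["ElasticTensor"] = "yes"
print(diff_settings(original, updated))
```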

Here we set the working directory, simulation name, and the number of cores to be used for the simulation.

In [None]:
working_dir = os.path.join("simulation", "new_run")
simulation_name = row.name + "_new"
ncores = 32
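
The value `ncores = 32` assumes a machine with at least that many cores. A minimal sketch, assuming the simulation runs on a single machine, caps the request at the number of CPUs reported by the operating system. `pick_ncores` is a hypothetical helper, and on a cluster the scheduler allocation would be the right limit instead.

```python
import os


def pick_ncores(requested: int) -> int:
    """Cap the requested core count at the locally reported CPU count."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


ncores = pick_ncores(32)
print(ncores)
```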

The following block initializes, configures, and runs an *AMS/BAND* simulation using a basic job workflow with `PLAMS`.
For advanced settings, see the PLAMS documentation at https://www.scm.com/doc/plams/general.html

In [None]:
# SCM simulation initialization
init(folder=working_dir)

# Configuring the number of cores to be used for the simulation
config.job.runscript.nproc = ncores
config.job.runscript.shebang = r"#!/bin/bash"

# Create and run the simulation job using the PLAMS `AMSJob` class
job = AMSJob(molecule=mol, settings=setting, name=simulation_name)
job.run()

# Finishing
finish()
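
Before storing a result, it is worth checking that the job finished successfully. The sketch below assumes the job object exposes `ok()` and `results.get_energy(unit=...)`, as PLAMS `AMSJob` and `AMSResults` do. `summarize_job` itself is a hypothetical helper and not part of PLAMS or this repository.

```python
def summarize_job(job, unit="eV"):
    """Return a small status dictionary for a finished job.

    The energy is only read when `job.ok()` reports success.
    """
    ok = job.ok()
    energy = job.results.get_energy(unit=unit) if ok else None
    return {"name": job.name, "ok": ok, "energy": energy, "unit": unit}


# Usage after `job.run()` (not executed here):
# print(summarize_job(job))
```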

### Step 5: Add the new simulation into the Database

After the desired analysis, the new result can be stored in the `LiF.db` database.

In [None]:
# Add the new data to the database
subset_name = "interstitial defects (new)"
task = ("geometry optimization",)
user = "John Doe"

add_to_db(db, job, subset_name, task, user=user, add_ic=False, use_runtime=True)

## <a id='section2'></a> Re-running a simulation using ASE and AMSCalculator

### Step 1: Importing the Required Libraries

First, we import the necessary libraries and modules: `os`, `sys`, `connect` from `ase.db`, the `BFGS` optimizer from `ase.optimize`, and several names from `scm.plams`. We also add the parent folder to `sys.path` so that the repository's modules, including `AMSCalculator` from `tools.plams_experimental`, can be imported.

In [51]:
import os
import sys

from ase.db import connect
from ase.optimize import BFGS
from scm.plams import Settings, config, finish, init

# Add the parent folder to the path to access the modules of this repository in `../tools`
sys.path.append("..")
from tools.db import add_to_db, update_metadata
from tools.plams_experimental import AMSCalculator

### Step 2: Connecting to the Database

We establish a connection to the database file located at `"../data/LiF.db"`.


In [None]:
# Connect to the database
db = connect(os.path.join("..", "data", "LiF.db"))

### Step 3: Get a specific Simulation result and prepare it for the simulation

To re-run a specific simulation, we first retrieve the relevant entry from the database using a set of selection criteria.
In this example, we retrieve the result for the stable LiF crystal structure with space group Fm-3m and one interstitial atom, using the criteria `subset_name`, `task`, `space_group`, and `natoms`.
Alternatively, a new system can be loaded directly from a file as an ASE `Atoms` object or a PLAMS `Molecule` object.

To browse and retrieve entries from the database using the Python interface, you can refer to the `browsing_sb.ipynb` notebook or the `README.md` file for more details.

The following code snippet demonstrates how to retrieve the desired simulation result:

In [None]:
# Get a specific simulation result from the database

row = db.get(subset_name="interstitial defects", task="geometry optimization", space_group="Fm-3m", natoms=25)

# Print the name of the retrieved simulation
print(row.name)

5.1-0-LiF_Fm-3m_-3.18_ni_1


<div class="alert alert-block alert-warning">
<b>Warning:</b> 

Make sure to adjust the criteria according to your specific simulation requirements.
Once the desired result is obtained, you can proceed with the remaining steps to prepare and run the simulation.
</div>

We convert the retrieved row to an ASE `Atoms` object.

In [None]:
# Get the ASE `Atoms` object
atoms = row.toatoms()
print(atoms)

Atoms(symbols='Li13F12', pbc=True, cell=[[5.0011581428, 0.0, 2.88742], [1.6670527143, 4.7151371154, 2.88742], [0.0, 0.0, 8.662259999999998]], calculator=SinglePointCalculator(...))


### Step 4: Setting up and Running the *AMS/BAND* Simulation with `AMSCalculator`

To re-run the simulation, we extract its settings from the retrieved row.
We can use them unchanged to repeat the same simulation, or modify them to use more accurate settings or compute additional properties.

In [48]:
# Get the simulation settings from the retrieved row
setting = Settings(row.calculator_parameters["input"])
print(setting)

AMS: 	
    task: 	GeometryOptimization
    Properties: 	
               Gradients: 	yes
               StressTensor: 	no
               Hessian: 	no
               PESPointCharacter: 	no
               ElasticTensor: 	no
    GeometryOptimization: 	
                         OptimizeLattice: 	no
                         Convergence: 	
                                     Energy: 	3.8087988488664447e-05
                                     Gradients: 	0.3808798848866444
                                     StressEnergyPerAtom: 	0.01904399424433222
                                     Step: 	0.05
                         PretendConverged: 	yes
                         MaxIterations: 	20
BAND: 	
     basis: 	
           type: 	DZP
           Core: 	Medium
     Dependency: 	
                Core: 	0.8
     xc: 	
        GGA: 	PBE
        MetaGGA: 	postscf TPSS
     scf: 	
         mixing: 	0.3
     numericalquality: 	Normal
     beckegrid: 	
               quality: 	Normal
     CPVector: 	25

We are going to use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimizer from ASE, so the geometry optimization is driven by ASE rather than by AMS. Thus we drop the `task` and `GeometryOptimization` entries from the settings.

In [49]:
del setting.AMS.task  # Delete the task setting
del setting.AMS.GeometryOptimization  # Delete the GeometryOptimization setting
print(setting)

AMS: 	
    Properties: 	
               Gradients: 	yes
               StressTensor: 	no
               Hessian: 	no
               PESPointCharacter: 	no
               ElasticTensor: 	no
BAND: 	
     basis: 	
           type: 	DZP
           Core: 	Medium
     Dependency: 	
                Core: 	0.8
     xc: 	
        GGA: 	PBE
        MetaGGA: 	postscf TPSS
     scf: 	
         mixing: 	0.3
     numericalquality: 	Normal
     beckegrid: 	
               quality: 	Normal
     CPVector: 	256
     KGRPX: 	4
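
Deleting entries with `del` modifies the settings object in place, so the stored settings are no longer available for comparison. A minimal sketch of a non-destructive alternative is shown below on plain nested dictionaries. `without_keys` is a hypothetical helper, and a PLAMS `Settings` object would first be converted, for example with `setting.as_dict()`.

```python
import copy


def without_keys(settings, *dotted_paths):
    """Return a deep copy of a nested dict with the given dotted paths removed.

    Paths that do not exist are ignored; the input is left untouched.
    """
    result = copy.deepcopy(settings)
    for dotted in dotted_paths:
        *parents, leaf = dotted.split(".")
        node = result
        for name in parents:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # parent path does not exist, nothing to remove
        else:
            node.pop(leaf, None)
    return result


# Illustrative values taken from the printed settings above
stored = {"AMS": {"task": "GeometryOptimization",
                  "GeometryOptimization": {"MaxIterations": 20},
                  "Properties": {"Gradients": "yes"}}}
for_ase = without_keys(stored, "AMS.task", "AMS.GeometryOptimization")
print(for_ase)
```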



Here we set the working directory, simulation name, and the number of cores to be used for the simulation.

In [None]:
working_dir = os.path.join("simulation", "new_run")
simulation_name = row.name + "_new"
ncores = 32

The following block initializes, configures, and runs a *BAND* simulation using ASE calculators and PLAMS. In this example, we will perform a *Geometry Optimization* using the BFGS algorithm available in ASE.

For advanced settings, you can refer to the `AMSCalculator` page in the PLAMS documentation at [https://www.scm.com/doc/plams/interfaces/amscalculator.html](https://www.scm.com/doc/plams/interfaces/amscalculator.html), and the ASE documentation at [https://wiki.fysik.dtu.dk/ase/ase/ase.html](https://wiki.fysik.dtu.dk/ase/ase/ase.html).

In [None]:
# SCM simulation initialization
init(folder=working_dir)

# Configuring the number of cores to be used for the simulation
config.job.runscript.nproc = ncores
config.job.runscript.shebang = r"#!/bin/bash"

with AMSCalculator(settings=setting, amsworker=True) as calc:
    atoms.calc = calc  # attach the calculator (`set_calculator` is deprecated in recent ASE versions)
    optimizer = BFGS(atoms)
    optimizer.run(fmax=0.1)  # optimize until the largest atomic force is below 0.1 eV/Å

# Finishing
finish()
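
The `fmax` criterion used by ASE optimizers refers to the largest force magnitude on any single atom. The sketch below reproduces that check with the standard library, so the convergence of a set of forces can be verified independently. `max_force` and `is_converged` are hypothetical helper names.

```python
import math


def max_force(forces):
    """Largest per-atom force magnitude, in the same units as the input."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def is_converged(forces, fmax=0.1):
    """True if every atomic force magnitude is below `fmax`."""
    return max_force(forces) < fmax


# Usage after the optimization (not executed here):
# print(max_force(atoms.get_forces()), is_converged(atoms.get_forces()))
```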

### Step 5: Add the new simulation into the Database

After the desired analysis, the new result can be stored in the `LiF.db` database.

In [None]:
# Add the new data to the database
subset_name = "interstitial defects (new)"
task = ("geometry optimization",)
user = "John Doe"

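# NOTE: `job` is only created by the PLAMS workflow in the first section;
# check which object `add_to_db` expects when the run was driven by ASE.
add_to_db(db, job, subset_name, task, user=user, add_ic=False, use_runtime=True)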

## <a id='section3'></a> Updating Metadata

To ensure that the information stored in the database reflects the latest changes and additions after completing your simulation and additional study, you need to update the metadata using the `update_metadata` function.

The `update_metadata` function checks for new values in specific keys (`user`, `subset_name`, `task`, and `used_in`) and prompts you to provide descriptions for any new values found.
This ensures that the metadata remains accurate and up-to-date.

Make sure to pass the `SQLite3Database` database object (`db`) as the argument to the function. 
The function will display detailed messages by default, but you can set the `verbose` parameter to `False` if you want to suppress them.

Once the function is executed, it will check for new values in the specified keys, prompt you for descriptions if necessary, update the metadata, and save the changes to the database.

Remember to call the `update_metadata` function whenever you make changes to the database or add new data to keep the metadata synchronized with your updates.

In [None]:
update_metadata(db)
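
To illustrate the kind of check described above, the sketch below finds values that appear in the data but have no description yet. It is not the repository's `update_metadata` implementation: `find_undescribed`, the plain-dictionary rows, and the layout of `described` are all assumptions made for this example.

```python
def find_undescribed(rows, described,
                     keys=("user", "subset_name", "task", "used_in")):
    """Map each key to the sorted values found in `rows` without a description.

    `rows` is an iterable of dicts; `described` maps key -> {value: description}.
    """
    missing = {}
    for key in keys:
        seen = {row[key] for row in rows if key in row}
        new = sorted(seen - set(described.get(key, {})))
        if new:
            missing[key] = new
    return missing


# Illustrative data only
rows = [
    {"user": "John Doe", "subset_name": "interstitial defects (new)"},
    {"user": "Jane Roe", "subset_name": "interstitial defects"},
]
described = {
    "user": {"Jane Roe": "original author"},
    "subset_name": {"interstitial defects": "LiF cells with one interstitial atom"},
}
print(find_undescribed(rows, described))
```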