# amp_extract tutorial

Welcome to the `amp_extract` tutorial. Here, we show how to use the `amp_extract` script to generate data for the testing step.

We first need to load all the packages, including `amp_extract` itself. Because it cannot yet be used as a regular package, loading it properly takes several steps. A version that can be used as a normal module is in development.

## Loading modules

Here, we import the standard modules needed to load `amp_extract` properly.

In [None]:
import sys
import os
from pprint import pprint as pp # For pretty printing, optional

Now, we need to load the `amp_extract` module. Several steps are needed:

1. We build the path to the module directory on your machine and store it in the variable `module_dir`
2. We insert this path at the front of Python's module search path using the `sys` module, so that Python knows where to find the module
3. We can now import the `Calc` class from the `amp_extract` module

In [None]:
# Insert the path where amp_extract is located
module_dir = os.path.join(os.path.dirname(os.path.dirname(os.getcwd())), 'extract')
sys.path.insert(0, module_dir)

# Load amp_extract
from amp_extract import Calc
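
The path handling above can also be sketched with `pathlib`, which avoids hard-coding the `/` separator. The helper name `find_module_dir` is hypothetical and not part of `amp_extract`. The sketch assumes, as in the cell above, that the module lives in an `extract` folder two levels above the working directory.

```python
import sys
from pathlib import Path


def find_module_dir(start, levels_up=2, folder="extract"):
    """Return the folder `folder` located `levels_up` directories above `start`."""
    base = Path(start).resolve()
    for _ in range(levels_up):
        base = base.parent
    return base / folder


module_dir = find_module_dir(Path.cwd())

# Only extend sys.path when the directory really exists, so that a wrong
# working directory is reported here rather than as an ImportError later.
if module_dir.is_dir():
    sys.path.insert(0, str(module_dir))
else:
    print(f"Module directory not found: {module_dir}")
```

If the message is printed, the notebook was probably started from a different folder than the one this tutorial expects.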

## Setting constants

Constants in Python are usually written in UPPER_SNAKE_CASE. This step is optional, but it simplifies building the `Calc` object later. Some of the arguments accepted by `Calc` are mandatory, others are optional.

**Mandatory**

- `AMP_FILENAME`: The name of the file containing the amp neural network parameters

- `TRAJ_FILENAME`: The reference traj file

- `GENERATED_TRAJ_FILENAME`: The list of positions for which we want a prediction

**Optional**

- `AMP_ENERGIES_OUTFILENAME`: The name of the file containing the energies predicted by the amp neural network.

- `AMP_FORCES_OUTFILENAME`: The name of the file containing the forces predicted by the amp neural network.

- `SRC_ENERGIES_OUTFILENAME`: The name of the file containing reference energies previously calculated.

- `SRC_FORCES_OUTFILENAME`: The name of the file containing reference forces previously calculated.

In [None]:
# Constants
AMP_FILENAME = 'amp.amp'
TRAJ_FILENAME = 'CO_disso.traj'
GENERATED_TRAJ_FILENAME = 'CO_disso_test.traj'
AMP_ENERGIES_OUTFILENAME = 'amp_energies.dat'
AMP_FORCES_OUTFILENAME = 'amp_forces.dat'
SRC_ENERGIES_OUTFILENAME = 'src_energies.dat'
SRC_FORCES_OUTFILENAME = 'src_forces.dat'
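
Before building the `Calc` object, it can help to check that the mandatory input files are present. The sketch below is not part of `amp_extract`. The helper name `missing_files` is hypothetical, and it assumes the three inputs are regular files in the working directory.

```python
from pathlib import Path


def missing_files(filenames, directory="."):
    """Return the names in `filenames` that are not regular files in `directory`."""
    return [name for name in filenames if not (Path(directory) / name).is_file()]


# Same input names as in the constants cell above.
required = ["amp.amp", "CO_disso.traj", "CO_disso_test.traj"]
missing = missing_files(required)
if missing:
    print("Missing input files:", ", ".join(missing))
```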

## Building the `Calc` object

The next step is to build the `Calc` object. In this example, we store it in the variable `data`. Because all the constants were set beforehand, we can simply use them. You can also pass the `Calc` arguments directly as literal values.

In [None]:
# Load the data into the Calc class
data = Calc(
    amp_filename=AMP_FILENAME,
    traj_filename=TRAJ_FILENAME,
    generated_traj_filename=GENERATED_TRAJ_FILENAME,
    amp_energies_outfilename=AMP_ENERGIES_OUTFILENAME,
    amp_forces_outfilename=AMP_FORCES_OUTFILENAME,
    src_energies_outfilename=SRC_ENERGIES_OUTFILENAME,
    src_forces_outfilename=SRC_FORCES_OUTFILENAME,
)

## Data extraction/prediction

We can now extract the data from the reference files and make some predictions using our `amp` neural network. To do so, the `Calc` object provides the `extract_data` method. Note that this method does not return anything; it computes or extracts the properties and stores them inside the `Calc` object.

Let's extract all the data.

In [None]:
data.extract_data()  # Extract the data

As you can see, we obtain no output. Nevertheless, we can access the data through class attributes as shown below. 

In [None]:
# Now, we can access reference data as:
src_forces = data.src_forces
src_energies = data.src_energies

print('SRC data\n')
pp(src_forces[0:2])  # Print the forces for the first two geometries
print(*src_energies[0:2], sep='\n')  # Print the first two reference energies

print('\n-----\n')

# Now, we can access amp predicted data as:
amp_forces = data.amp_forces
amp_energies = data.amp_energies

print('AMP data\n')
pp(amp_forces[0:2])  # Print the forces for the first two geometries
print(*amp_energies[0:2], sep='\n')  # Print the first two predicted energies

Note that the data are only kept in memory as `list` objects; nothing has been written to a file yet.
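
Since the purpose is testing, a natural next step is to compare the reference and predicted values. The sketch below computes a mean absolute error. It assumes that `data.src_energies` and `data.amp_energies` are flat lists of numbers of equal length, which this tutorial does not state explicitly. The function name is hypothetical, and the numbers are placeholders rather than real results.

```python
def mean_absolute_error(reference, predicted):
    """Mean absolute difference between two equally long sequences of numbers."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        raise ValueError("at least one value is required")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Placeholder numbers for illustration only, not results from amp_extract.
toy_reference = [1.0, 2.0, 4.0]
toy_predicted = [1.5, 2.0, 3.0]
print(mean_absolute_error(toy_reference, toy_predicted))  # 0.5
```

Under the assumption above, the call on real data would be `mean_absolute_error(src_energies, amp_energies)`.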

## Write the data

Once the data have been extracted/generated, we can write the results in the `aeneth` data format. To do so, we use the `write_all_data` method, which writes all the data at once.

In [None]:
# We can also write all the results

data.write_all_data()

Note that you can also write only part of the data to save time. For example, to write only the energies predicted by the `amp` neural network, use:

In [None]:
# Write only amp_energies

data.write_energies(which='amp')

Following the same logic, we can write only the reference forces by using:

In [None]:
# Write only reference forces

data.write_forces(which='src')

## Predict: doing all the steps in one line

If you want to be quicker, the `Calc` object lets you extract, predict, and write all the data in one line with the `predict` method. Basically, it calls the `extract_data` method followed by the `write_all_data` method. Let's try it:

In [None]:
# Do all the previous steps at once:

data.predict()
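
The call order described above can be illustrated with a toy class. `ToyCalc` is a hypothetical stand-in that only records which steps ran. It is not the actual implementation of `Calc`, which this tutorial does not show.

```python
class ToyCalc:
    """Hypothetical stand-in for Calc; it only records which steps ran."""

    def __init__(self):
        self.calls = []

    def extract_data(self):
        # The real method stores the extracted data on the object; here we just log.
        self.calls.append("extract_data")

    def write_all_data(self):
        # The real method writes files; here we just log.
        self.calls.append("write_all_data")

    def predict(self):
        # Run both steps in the order described in the text.
        self.extract_data()
        self.write_all_data()


toy = ToyCalc()
toy.predict()
print(toy.calls)  # ['extract_data', 'write_all_data']
```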

## Cleaning unwanted `amp` databases

When using the `amp` neural network, it seems that the `amp-atomistics` package creates `amp` databases with the extension `.ampdb`. Without knowing more, we guess that those databases are required to use the neural network. In addition, it recreates the log file from the `amp` training. However, we generally do not want to keep them.

To delete them once your predictions are done, you can use the `clean` method of the `Calc` object. With the optional `logfile` argument, you can choose to delete the log file as well.

In [None]:
# Cleaning

data.clean(logfile=True)

Now, you should not have any `amp` databases in your folder.
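
To confirm this, one option is to list what is left in the working directory. This is a minimal sketch that is not part of `amp_extract`. The helper name `leftover_ampdb` is hypothetical, and it assumes the databases sit directly in the working directory with names ending in `.ampdb`.

```python
from pathlib import Path


def leftover_ampdb(directory="."):
    """Return the sorted names of entries in `directory` ending in '.ampdb'."""
    return sorted(path.name for path in Path(directory).glob("*.ampdb"))


remaining = leftover_ampdb()
if remaining:
    print("Still present:", ", ".join(remaining))
else:
    print("No .ampdb entries left in the working directory.")
```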

We are done. I hope you'll enjoy this module.

> This module is under development. Please refer to the 'In development' page of the documentation for more details.