# Dataset Generation (Quantum Espresso)

This notebook covers dataset generation using Quantum Espresso (QE). For the corresponding steps using CP2K, see [here](cp2k.ipynb).

## Assumed directory structure

```
example_directory
├── active_learning
│   ├── xyz
│   └── simulation.lammps
├── cp2k_input
│   └── template.inp
├── cp2k_output
├── lammps
│   └── template.lmp
├── n2p2
│   └── input.nn.template
├── qe
│   ├── pseudos
│   │   └── ...
│   └── mcresol-T300-p1.xyz
├── scripts
│   ├── cp2k.ipynb
│   ├── data_pruning.ipynb
│   ├── quantum_espresso.ipynb
│   ├── workflow.ipynb
│   └── visualise.ipynb
├── validation
├── xyz
└── reference.history
```

To generate charges, the valence of each atomic species must be provided. If charges are not needed, `valence` is an optional argument and is not used.

In [None]:
# Executables and filepaths
main_directory = '..'
n2p2_bin = '/path/to/n2p2/bin'
lammps_executable = '/path/to/lammps/build/lmp_mpi'
qe_module_commands = [
    'module use ...',
    'module load ...',
]
slurm_constraint = "constraint"

In [None]:
from cc_hdnnp.controller import Controller
from cc_hdnnp.structure import AllStructures, Species, Structure

# Create objects for all elements in the structure
H = Species(symbol='H', atomic_number=1, mass=1.00794, valence=1)
C = Species(symbol='C', atomic_number=6, mass=12.011, valence=4)
O = Species(symbol='O', atomic_number=8, mass=15.9994, valence=6)

# Define a name for the Structure which has the above constituent elements
# Information used for active learning, such as the energy and force tolerances, is also defined here
all_species = [H, C, O]
structure = Structure(
    name='mcresol', all_species=all_species, delta_E=1e-4, delta_F=1e-2
)
all_structures = AllStructures(structure)

controller = Controller(
    structures=all_structures,
    main_directory=main_directory,
    n2p2_bin=n2p2_bin,
    lammps_executable=lammps_executable,
    qe_module_commands=qe_module_commands,
)

## 1. Prepare Quantum Espresso Input
There are no utility scripts for generating configurations. Instead, each set of input frames should be provided as a single file within `"../qe"`, named following the pattern `{structure.name}-T{temperature}-p{pressure}.xyz`. In this example, one temperature and one pressure are given, so a single file is needed. In general, multiple files are needed when multiple `Structure` objects, temperatures or pressures are used.
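
As a minimal sketch of the naming pattern above, the expected filenames can be listed before calling `prepare_qe`. The helper `expected_xyz_files` is hypothetical and not part of `cc_hdnnp`. It also assumes that one file is expected for every combination of structure name, temperature and pressure, which the pattern suggests but which should be checked against the library.

```python
from itertools import product


def expected_xyz_files(structure_names, temperatures, pressures):
    """Hypothetical helper: list the filenames implied by the naming pattern.

    Assumes one file per (structure, temperature, pressure) combination.
    """
    return [
        f"{name}-T{temperature}-p{pressure}.xyz"
        for name, temperature, pressure in product(
            structure_names, temperatures, pressures
        )
    ]


# For this example: one structure, one temperature, one pressure
print(expected_xyz_files(["mcresol"], [300], [1]))
```

For the values used here, this gives the single filename shown in the directory structure, `mcresol-T300-p1.xyz`.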

In [None]:
pseudos = {
    "H": "H.pbe-rrkjus_psl.1.0.0.UPF",
    "C": "C.pbe-n-rrkjus_psl.1.0.0.UPF",
    "O": "O.pbe-n-rrkjus_psl.1.0.0.UPF",
}
controller.prepare_qe(
    qe_directory="qe",
    temperatures=[300,],
    pressures=[1,],
    selection=(0, 1),
    structure=structure,
    pseudos=pseudos,
    constraint=slurm_constraint,
)

To control which frames are selected, `selection` sets the index of the first frame (first element) and the stride between sampled frames (second element). For example, `(0, 1)` samples every frame, while `(10, 2)` samples every second frame starting from frame `10`.
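
The effect of `selection` can be illustrated with ordinary Python slicing. This is only a sketch of the behaviour described above, assuming the two elements act like the start and step of a slice. The function `select_frames` is hypothetical, and the actual implementation in `cc_hdnnp` may differ.

```python
def select_frames(frames, selection):
    """Hypothetical illustration: treat `selection` as (start, step) of a slice."""
    start, step = selection
    return frames[start::step]


# Stand-in for 20 frames, identified by their index
frames = list(range(20))

# (0, 1) keeps every frame
assert select_frames(frames, (0, 1)) == frames

# (10, 2) keeps every second frame, starting from frame 10
print(select_frames(frames, (10, 2)))  # [10, 12, 14, 16, 18]
```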

For each selected frame, a subdirectory within `"../qe"` is created containing the relevant input files. Additionally, a utility script that submits all QE batch scripts in one go is written to the scripts folder:

In [None]:
!bash qe_all.sh

## 2. Write data to N2P2
After QE has run and the energies, forces and charges have been calculated, these need to be written in the N2P2 format:

In [None]:
controller.write_n2p2_data_qe(
    structure_name="mcresol", temperatures=[300], pressures=[1],
)
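
For orientation, the sketch below shows the general layout of a single frame in the N2P2 data format (`begin`, `comment`, three `lattice` lines, one `atom` line per atom, `energy`, `charge`, `end`). It is not the code used by `write_n2p2_data_qe`. The function `format_n2p2_frame`, the number formatting and the placeholder values are assumptions made for illustration, and unit conversions are not handled.

```python
def format_n2p2_frame(lattice, atoms, energy, charge, comment=""):
    """Hypothetical helper: format one frame in the N2P2 data layout.

    lattice: three lattice vectors, each a sequence of three floats
    atoms: sequence of (symbol, (x, y, z), atomic_charge, (fx, fy, fz))
    """
    lines = ["begin", f"comment {comment}"]
    for vector in lattice:
        lines.append("lattice " + " ".join(f"{v:.6f}" for v in vector))
    for symbol, (x, y, z), atomic_charge, (fx, fy, fz) in atoms:
        # Columns: position, element, atomic charge, unused value, force
        lines.append(
            f"atom {x:.6f} {y:.6f} {z:.6f} {symbol} {atomic_charge:.6f} 0.0 "
            f"{fx:.6f} {fy:.6f} {fz:.6f}"
        )
    lines.append(f"energy {energy:.6f}")
    lines.append(f"charge {charge:.6f}")
    lines.append("end")
    return "\n".join(lines)


# Placeholder values only, not results from a calculation
frame_text = format_n2p2_frame(
    lattice=[(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)],
    atoms=[("H", (0.0, 0.0, 0.0), 0.0, (0.0, 0.0, 0.0))],
    energy=0.0,
    charge=0.0,
    comment="placeholder frame",
)
print(frame_text)
```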