# 8. Gaussian process regression with MD data
## Lennard-Jones fluid

Finally, we replace the *mock* MD data with actual simulation data. To generate a dataset, we run an active learning simulation similar to the one in the previous tutorial. The YAML input for this simulation looks like this:

```yaml
options:
    output: data/parabolic_lj_md
    write_freq: 100
    use_tstamp: True
grid:
    Lx: 1470.
    Ly: 1.
    Nx: 200
    Ny: 1
    xE: ['D', 'N', 'N']
    xW: ['D', 'N', 'N']
    yS: ['P', 'P', 'P']
    yN: ['P', 'P', 'P']
    xE_D: 0.8
    xW_D: 0.8
geometry:
    type: parabolic
    hmin: 12.
    hmax: 60.
    U: 0.12
    V: 0.
numerics:
    CFL: 0.5
    adaptive: 1
    tol: 1e-8
    dt: 0.05
    max_it: 50_000
properties:
    shear: 2.15
    bulk: 0.
    EOS: DH
    T: 1.0
    rho0: 0.8
gp:
    press:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        obs_stddev: 2.e-2
        max_steps: 10
    shear:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        obs_stddev: 4.e-3
        max_steps: 10
db:
    dtool: True
    dtool_path: data/gapflow_training_lj  # defaults to options['output']/train   
    init_size: 5
    init_method: rand
    init_width: 0.01 # default (for density)
md:
    system: lj # Lennard-Jones system
    ncpu: 10  # Max. number of CPUs
    atoms_per_cpu: 1000  # Min. number of atoms per CPU
    infile: lmp/lj/in.lmp  # Location of the LAMMPS input file
    wallfile: lmp/lj/wall.lmp  # Location of the LAMMPS file that contains the coordinates of the wall atoms
    vWall: 0.12  # Sliding velocity of the lower wall (LJ units)
    cutoff: 2.5  # Cutoff radius of LJ interactions
    temp: 1.0   # Temperature
    tsample: 100000  # Sampling time
```

Here, the `md` section of the YAML file configures the MD runs.
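The comments on `ncpu` ("Max. number of CPUs") and `atoms_per_cpu` ("Min. number of atoms per CPU") suggest that the number of MPI processes per MD run is capped by `ncpu` and otherwise scales with the system size. The following minimal sketch illustrates one plausible rule. The function `n_mpi_processes` and the rule itself are assumptions made for illustration, not GaPFlow's actual implementation.

```python
def n_mpi_processes(n_atoms: int, ncpu: int = 10, atoms_per_cpu: int = 1000) -> int:
    """Hypothetical rule (an assumption, not taken from GaPFlow):
    use at least one process, at most `ncpu` processes, and keep at
    least `atoms_per_cpu` atoms on each process where possible."""
    return max(1, min(ncpu, n_atoms // atoms_per_cpu))


# Small, medium, and large systems with the settings from the `md` section
for n_atoms in (500, 4_200, 50_000):
    print(n_atoms, "atoms ->", n_mpi_processes(n_atoms), "MPI processes")
```

Under this assumed rule, the three example systems would use 1, 4, and 10 processes, respectively.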

We ran the simulation from the command line using

`mpirun -n 1 python GaPFlow -i parabolic_1d_lj_gp_lammps.yaml`

Note that we start the run on a single processor (`mpirun -n 1`). The number of MPI processes for the LAMMPS simulations is configured in the YAML file, and these processes are spawned from the *parent* process.
The active learning simulation triggered 24 MD simulations over the course of a run that took approximately 3.5 hours.
We now want to use this dataset in a subsequent simulation with active learning turned *off*. The training dataset has been uploaded to [Zenodo](https://doi.org/10.5281/zenodo.18761223).

We now download the dataset to our local machine and test it there:

In [None]:
!wget -O- https://zenodo.org/records/18761223/files/gapflow_training_lj.tar.gz | tar -xz -C data
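Note that `tar -C data` expects the directory `data` to exist already. Where `wget` or `tar` are not available, the same download can be sketched with the Python standard library alone. The helper names `extract_tar_gz_stream` and `download_dataset` are ours and are not part of `GaPFlow`.

```python
import tarfile
import urllib.request
from pathlib import Path


def extract_tar_gz_stream(fileobj, dest="data"):
    """Unpack a gzip-compressed tar stream into `dest`, creating it if needed."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r|gz" reads the archive sequentially, so a non-seekable
    # stream such as an HTTP response can be used directly.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        tar.extractall(dest)
    return dest


def download_dataset(url, dest="data"):
    """Stream the archive at `url` and unpack it without a temporary file."""
    with urllib.request.urlopen(url) as response:
        return extract_tar_gz_stream(response, dest)


# Example call (requires network access):
# download_dataset(
#     "https://zenodo.org/records/18761223/files/gapflow_training_lj.tar.gz"
# )
```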

The input file is almost the same as before, but we turn active learning off by setting `active_learning: False` for both GPs. We also have to make sure that `GaPFlow` looks for the training dataset in the right location (`dtool_path`).

In [None]:
lj_gp_input = """
options:
    output: data/parabolic_lj_md
    write_freq: 100
    use_tstamp: True
grid:
    Lx: 1470.
    Ly: 1.
    Nx: 200
    Ny: 1
    xE: ['D', 'N', 'N']
    xW: ['D', 'N', 'N']
    yS: ['P', 'P', 'P']
    yN: ['P', 'P', 'P']
    xE_D: 0.8
    xW_D: 0.8
geometry:
    type: parabolic
    hmin: 12.
    hmax: 60.
    U: 0.12
    V: 0.
numerics:
    CFL: 0.5
    adaptive: 1
    tol: 1e-8
    dt: 0.05
    max_it: 50_000
properties:
    shear: 2.15
    bulk: 0.
    EOS: DH
    T: 1.0
    rho0: 0.8
gp:
    press:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
        active_learning: False  # AL turned off, so no new MD data is generated
    shear:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
        active_learning: False  # AL turned off, so no new MD data is generated
db:
    dtool: True
    dtool_path: data/gapflow_training_lj  # downloaded from Zenodo
    init_size: 5
    init_method: rand
    init_width: 0.01 # default (for density)
md:
    system: lj
    ncpu: 10
    atoms_per_cpu: 1000 
    infile: lmp/lj/in.lmp
    wallfile: lmp/lj/wall.lmp
    vWall: 0.12
    cutoff: 2.5
    temp: 1.0
    tsample: 100000
"""

In [None]:
from GaPFlow import Problem
lj_problem = Problem.from_string(lj_gp_input)

We see that the downloaded training simulations have been recognized. Thus, we are ready to run the simulation.

In [None]:
lj_problem.run()

In [None]:
lj_problem.animate()

## Hexadecane/gold

The second example with real MD data uses the gold/hexadecane system, which is also implemented in `GaPFlow`:

```yaml
options:
    output: data/cosine_mol
    write_freq: 500
    use_tstamp: True
grid:
    Lx: 1918.
    Ly: 1.
    Nx: 100
    Ny: 1
    xE: ['D', 'N', 'N']
    xW: ['D', 'N', 'N']
    yS: ['P', 'P', 'P']
    yN: ['P', 'P', 'P']
    xE_D: 0.51
    xW_D: 0.51
geometry:
    type: journal
    hmin: 30.
    hmax: 60.
    U: 20.e-5
    V: 0.
numerics:
    CFL: 0.5
    adaptive: 1
    tol: 1e-7
    dt: 1.
    max_it: 50_000
properties:
    shear: 2.15
    bulk: 0.
    EOS: BWR
    T: 1.0
    rho0: 0.51
gp:
    press:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
    shear:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
db:
    dtool: True
    dtool_path: data/gapflow_training_mol
    init_size: 5
    init_method: rand
md:
    system: mol
    rotation: False
    wall_rotation: False
    ncpu: 20
    atoms_per_cpu: 1000     
    wall: eam
    molecule: hexadecane
    fftemplate: lmp/mol/moltemplate_files/trappe1998.lt
    topo: lmp/mol/moltemplate_files/hexadecane.lt
    staticFiles: lmp/mol/static
    nx: 30
    nz: 3
    vWall: 20.  # in m/s
    temperature: 400.
    Ninit: 50_000
    Nsteady: 50_000
    Nsample: 100_000
```
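Note that the wall velocity appears twice with different numerical values: `vWall: 20.` (in m/s) in the `md` section and `U: 20.e-5` in the `geometry` section. The two values agree if `U` is expressed in Å/fs, as in LAMMPS `real` units. This unit choice is our assumption, since the input file does not state the units of `U`. A quick check of the conversion:

```python
import math

ANGSTROM_PER_METER = 1e10
FEMTOSECOND_PER_SECOND = 1e15


def m_per_s_to_angstrom_per_fs(v: float) -> float:
    """Convert a velocity from m/s to Å/fs."""
    return v * ANGSTROM_PER_METER / FEMTOSECOND_PER_SECOND


# Wall velocity from the `md` section (m/s), converted to Å/fs
U = m_per_s_to_angstrom_per_fs(20.0)
print(U)

# Matches `U: 20.e-5` from the `geometry` section up to rounding
assert math.isclose(U, 20.0e-5)
```

In the Lennard-Jones example, by contrast, both `U` and `vWall` are given in LJ units and therefore carry the same value, 0.12.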

We will again use a precomputed dataset, generated with:

`mpirun -n 1 python GaPFlow -i journal_1d_gold-hexadecane_gp_lammps.yaml`

and fetch it from [Zenodo](https://doi.org/10.5281/zenodo.18761223).

In [None]:
!wget -O- https://zenodo.org/records/18761223/files/gapflow_training_mol.tar.gz | tar -xz -C data
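Before building the problem, it can be worth checking that the archive was unpacked where `dtool_path` points. The helper below is a minimal sketch using only `pathlib`. It assumes that each training run is stored as one subdirectory of `dtool_path`, which we have not verified against the `GaPFlow` source, and the name `list_training_runs` is ours.

```python
from pathlib import Path


def list_training_runs(dtool_path):
    """Return the sorted names of all subdirectories of `dtool_path`.

    Assumption: one subdirectory per stored MD run.
    """
    root = Path(dtool_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Training data directory not found: {root}")
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Example (after the download above):
# runs = list_training_runs("data/gapflow_training_mol")
# print(len(runs), "training runs found")
```

A `FileNotFoundError` at this point indicates that either the download failed or `dtool_path` in the input does not match the extraction directory.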

In [None]:
mol_gp_input = """
options:
    output: data/cosine_mol
    write_freq: 500
    use_tstamp: True
grid:
    Lx: 1918.
    Ly: 1.
    Nx: 100
    Ny: 1
    xE: ['D', 'N', 'N']
    xW: ['D', 'N', 'N']
    yS: ['P', 'P', 'P']
    yN: ['P', 'P', 'P']
    xE_D: 0.51
    xW_D: 0.51
geometry:
    type: journal
    hmin: 30.
    hmax: 60.
    U: 20.e-5
    V: 0.
numerics:
    CFL: 0.5
    adaptive: 1
    tol: 1e-7
    dt: 1.
    max_it: 50_000
properties:
    shear: 2.15
    bulk: 0.
    EOS: BWR
    T: 1.0
    rho0: 0.51
gp:
    press:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
        active_learning: False
    shear:
        fix_noise: True
        atol: 1.5
        rtol: 0.
        max_steps: 10
        active_learning: False
db:
    dtool: True
    dtool_path: data/gapflow_training_mol
    init_size: 5
    init_method: rand
md:
    system: mol
    rotation: False
    wall_rotation: False
    ncpu: 20
    atoms_per_cpu: 1000     
    wall: eam
    molecule: hexadecane
    fftemplate: lmp/mol/moltemplate_files/trappe1998.lt
    topo: lmp/mol/moltemplate_files/hexadecane.lt
    staticFiles: lmp/mol/static
    nx: 30
    nz: 3
    vWall: 20.  # in m/s
    temperature: 400.
    Ninit: 50_000
    Nsteady: 50_000
    Nsample: 100_000
"""

In [None]:
mol_problem = Problem.from_string(mol_gp_input)

In [None]:
mol_problem.run()

In [None]:
mol_problem.animate()