# [**Workflows for atomistic simulations**](http://potentials.rub.de/) 

## **Day 1 - Atomistic simulations with [pyiron](https://pyiron.org)**


### **Exercise 2: Creating and working with structure databases**

Before the exercise, you should:

* Finish exercise 1

The aim of this exercise is to make you familiar with:

* Creating structure databases and working with them for potential fitting (day 2)

## **Importing necessary modules and creating a project**

This is done the same way as shown in the first exercise

In [1]:
import numpy as np
%matplotlib inline
import matplotlib.pylab as plt

In [2]:
from pyiron import Project

In [3]:
pr = Project("creating_datasets")

## **Creating a structure "container" from the data**

We now go over the jobs generated in the first notebook and store their structures, energies, and forces in a structure container, which will later be used for potential fitting

**Note**: Such datasets are usually created from highly accurate DFT calculations. For practical reasons, we only demonstrate the procedure with data from LAMMPS calculations (the workflow remains the same)

In [4]:
# Access the project created in exercise 1 
pr_fs = pr["../first_steps"]

In [5]:
# Create a TrainingContainer job (to store structures and databases)
container = pr.create.job.TrainingContainer('dataset_example')

## **Add structures from the E-V curves**

For starters, we append structures from the energy-volume curves we calculated earlier

In [6]:
# Iterate over the jobs in this sub-project and append the final structure, potential energy, and forces
for job in pr_fs["E_V_curve"].iter_jobs(status="finished"):
    container.include_job(job, iteration_step=-1)

We can obtain this data as a `pandas` table

In [7]:
container.to_pandas()

Unnamed: 0,name,atoms,energy,forces,number_of_atoms
0,job_a_3_4,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.142019,"[[1.1869253621046177e-16, -1.7429070520896771e-16, -1.397277865277868e-16]]",1.0
1,job_a_3_5,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.338596,"[[-1.92404082484227e-16, 4.231084758750405e-17, 3.6193775346684653e-17]]",1.0
2,job_a_3_6,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.416929,"[[-2.9113397169423695e-17, 7.54965057835309e-17, -3.624419431643654e-17]]",1.0
3,job_a_3_7,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.409602,"[[3.771496125435321e-17, 3.412312546579927e-17, -2.4310047599025677e-17]]",1.0
4,job_a_3_8,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.330215,"[[-2.0545362501508919e-16, -3.5486130576273854e-17, 3.5486130576273854e-17]]",1.0
5,job_a_3_9,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.195118,"[[1.6101257219667079e-16, -4.2948421129906387e-17, 4.2948421129906387e-17]]",1.0
6,job_a_4_0,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.035358,"[[-5.946777565406637e-17, -1.0605082175909553e-16, -1.2946304704347008e-16]]",1.0
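
As a minimal sketch of how such a table can be inspected, the snippet below copies the `name` and `energy` columns shown above into plain Python records and picks out the lowest-energy entry. The `records` list and the names `lowest` and `relative` are illustrative stand-ins and not part of pyiron; in the notebook one would work directly on the `DataFrame` returned by `container.to_pandas()`.

```python
# Energies copied from the table above, keyed by job name
records = [
    {"name": "job_a_3_4", "energy": -3.142019},
    {"name": "job_a_3_5", "energy": -3.338596},
    {"name": "job_a_3_6", "energy": -3.416929},
    {"name": "job_a_3_7", "energy": -3.409602},
    {"name": "job_a_3_8", "energy": -3.330215},
    {"name": "job_a_3_9", "energy": -3.195118},
    {"name": "job_a_4_0", "energy": -3.035358},
]

# Entry with the lowest energy along the E-V curve
lowest = min(records, key=lambda r: r["energy"])

# Energy of every entry relative to that minimum (same units as the table)
relative = {r["name"]: r["energy"] - lowest["energy"] for r in records}

print(lowest["name"], lowest["energy"])
for name, delta in relative.items():
    print(f"{name}: {delta:+.6f}")
```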


## **Add structures from the MD**

We also add some structures obtained from the MD simulations

In [8]:
# Reloading the MD job
job_md = pr_fs["lammps_job"]

In [9]:
# Iterate over the MD-trajectory to append structures

traj_length = len(job_md["output/generic/positions"])
stride = 10 # append structures every 10 steps

for i in range(0, traj_length, stride):
    container.include_job(job_md, iteration_step=i)
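
The loop above keeps every tenth frame of the trajectory. A minimal sketch of which frame indices this selects, written without pyiron; the helper `strided_indices` and the frame count of 25 are hypothetical and serve only as illustration:

```python
def strided_indices(traj_length, stride, include_last=False):
    """Return the frame indices visited by range(0, traj_length, stride).

    If include_last is True, the final frame is appended when the
    stride does not land on it.
    """
    if traj_length <= 0:
        return []
    indices = list(range(0, traj_length, stride))
    if include_last and indices[-1] != traj_length - 1:
        indices.append(traj_length - 1)
    return indices


# Hypothetical trajectory of 25 frames, sampled every 10 steps
print(strided_indices(25, 10))
print(strided_indices(25, 10, include_last=True))
```

Note that the plain `range` call samples the final frame only when `traj_length - 1` is a multiple of `stride`.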

## **Add some defect structures (vacancies, surfaces, etc)**

It is also necessary to include defect structures, such as vacancies and surfaces, in the training dataset

In [10]:
# Set up an MD calculation for a structure with a vacancy
job_lammps = pr.create.job.Lammps("lammps_job_vac")
job_lammps.structure = pr.create_ase_bulk('Cu', cubic=True, a=3.61).repeat([3, 3, 3])
del job_lammps.structure[0]
job_lammps.potential = '2012--Mendelev-M-I--Cu--LAMMPS--ipr1'
job_lammps.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
job_lammps.run()

The job lammps_job_vac was saved and received the ID: 52


In [11]:
# Set up an MD calculation for a surface structure
job_lammps = pr.create.job.Lammps("lammps_job_surf")
job_lammps.structure = pr.create_surface("Cu", surface_type="fcc111", size=(4, 4, 8), vacuum=12, orthogonal=True)
job_lammps.potential = '2012--Mendelev-M-I--Cu--LAMMPS--ipr1'
job_lammps.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
job_lammps.run()

The job lammps_job_surf was saved and received the ID: 53


In [12]:
pr

{'groups': [], 'nodes': ['lammps_job_vac', 'lammps_job_surf']}

We now add these structures to the dataset

In [13]:
for job_name in ["lammps_job_vac", "lammps_job_surf"]:
    job_md = pr.load(job_name)
    pos = job_md["output/generic/positions"]
    traj_length = len(pos)
    stride = 10
    for i in range(0, traj_length, stride):
        container.include_job(job_md, iteration_step=i)

In [14]:
# We run the job to store this dataset in the pyiron database
container.run()

The job dataset_example was saved and received the ID: 54




In [15]:
pr.job_table()

Unnamed: 0,id,status,chemicalformula,job,subjob,projectpath,project,timestart,timestop,totalcputime,computer,hamilton,hamversion,parentid,masterid
0,52,finished,Cu107,lammps_job_vac,/lammps_job_vac,/home/pyiron/,day_1/creating_datasets/,2021-03-09 09:00:43.012322,2021-03-09 09:00:47.308799,4.0,pyiron@jupyter-janssen#1,Lammps,0.1,,
1,53,finished,Cu128,lammps_job_surf,/lammps_job_surf,/home/pyiron/,day_1/creating_datasets/,2021-03-09 09:00:48.219183,2021-03-09 09:00:52.902683,4.0,pyiron@jupyter-janssen#1,Lammps,0.1,,
2,54,finished,,dataset_example,/dataset_example,/home/pyiron/,day_1/creating_datasets/,2021-03-09 09:01:01.281092,NaT,,pyiron@jupyter-janssen#1,TrainingContainer,0.4,,
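
The `chemicalformula` column offers a quick sanity check on the two new structures. A short sketch of the arithmetic, assuming the conventional cubic fcc cell with 4 atoms and reading the surface `size=(4, 4, 8)` as 4 x 4 atoms per layer and 8 layers; the variable names are illustrative:

```python
# Vacancy cell: 3 x 3 x 3 repeat of the 4-atom cubic fcc cell, minus one atom
atoms_per_cubic_fcc_cell = 4
n_bulk = atoms_per_cubic_fcc_cell * 3 * 3 * 3
n_vacancy = n_bulk - 1

# Surface slab: 4 x 4 atoms per layer, 8 layers
n_surface = 4 * 4 * 8

print(f"Cu{n_vacancy}", f"Cu{n_surface}")
```

These counts agree with the `Cu107` and `Cu128` entries in the job table.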


## **Reloading the dataset**

This dataset can now be reloaded anywhere for use in the potential fitting procedures

In [16]:
dataset = pr["dataset_example"]
dataset.to_pandas()

Unnamed: 0,name,atoms,energy,forces,number_of_atoms
0,job_a_3_4,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.142019,"[[1.1869253621046177e-16, -1.7429070520896771e-16, -1.397277865277868e-16]]",1.0
1,job_a_3_5,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.338596,"[[-1.92404082484227e-16, 4.231084758750405e-17, 3.6193775346684653e-17]]",1.0
2,job_a_3_6,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.416929,"[[-2.9113397169423695e-17, 7.54965057835309e-17, -3.624419431643654e-17]]",1.0
3,job_a_3_7,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.409602,"[[3.771496125435321e-17, 3.412312546579927e-17, -2.4310047599025677e-17]]",1.0
4,job_a_3_8,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.330215,"[[-2.0545362501508919e-16, -3.5486130576273854e-17, 3.5486130576273854e-17]]",1.0
5,job_a_3_9,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.195118,"[[1.6101257219667079e-16, -4.2948421129906387e-17, 4.2948421129906387e-17]]",1.0
6,job_a_4_0,"(Atom('Cu', [0.0, 0.0, 0.0], index=0))",-3.035358,"[[-5.946777565406637e-17, -1.0605082175909553e-16, -1.2946304704347008e-16]]",1.0
7,lammps_job,"(Atom('Cu', [0.0, 0.0, 0.0], index=0), Atom('Cu', [0.0, 1.804999999999592, 1.804999999999592], index=1), Atom('Cu', [1.804999999999592, 1.1052437362302367e-16, 1.804999999999592], index=2), Atom('...",-369.311743,"[[-1.2656542480726799e-14, -1.46965772884755e-14, -1.61017033040167e-14], [-1.3905543383430098e-14, 4.5310977192514186e-15, 4.8333732849403796e-15], [4.9682480351975795e-15, -1.4072076837123899e-1...",108.0
8,lammps_job,"(Atom('Cu', [0.140426153531212, 11.00934611760493, 10.96820769600138], index=0), Atom('Cu', [10.983357200359302, 1.779939365335074, 1.7146804782560903], index=1), Atom('Cu', [2.1228644677344763, 0...",-360.190839,"[[-0.21910202935187897, -0.37573419410584397, 0.43392575377979187], [0.16208168404695897, -0.00671505675904709, 1.03458554920361], [-1.2001630139266497, -0.40207322348963503, -0.45620473735655703]...",108.0
9,lammps_job,"(Atom('Cu', [0.1407579358923329, 11.020287239626356, 10.855878337455094], index=0), Atom('Cu', [0.29542104007972325, 1.6514729828183248, 1.7760939949715], index=1), Atom('Cu', [1.9957825818756145,...",-356.403521,"[[-0.023031834879864897, 0.04284143869144259, 0.5899774836434099], [-0.5418151518758109, 0.6754733604037649, -0.5582999589285099], [-0.6011411771363389, -0.355590065330048, -0.003590198630652358],...",108.0


We can now inspect the data in this dataset quite easily

In [17]:
struct = dataset.get_structure(30)

In [18]:
structures, energies, forces, num_atoms = dataset.to_list()
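
A minimal sketch of what can be done with these lists, using small stand-in lists instead of a real dataset so that it runs without pyiron. The first entry reuses the energy and forces of `job_a_3_4` from the table above; the second entry is invented for illustration. The helpers `energy_per_atom` and `max_force_norm` are hypothetical and not part of pyiron:

```python
import math

# Stand-ins for the energies, forces and num_atoms lists from dataset.to_list()
energies = [-3.142019, -6.80]  # second value is invented
num_atoms = [1, 2]
forces = [
    [[1.1869253621046177e-16, -1.7429070520896771e-16, -1.397277865277868e-16]],
    [[0.10, 0.00, -0.20], [-0.10, 0.00, 0.20]],  # invented forces
]


def energy_per_atom(energy, n_atoms):
    """Total energy divided by the number of atoms in the structure."""
    return energy / n_atoms


def max_force_norm(force_rows):
    """Largest Euclidean norm among the per-atom force vectors."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in force_rows)


for e, n, f in zip(energies, num_atoms, forces):
    print(f"{n} atoms: E/atom = {energy_per_atom(e, n):.6f}, "
          f"max |F| = {max_force_norm(f):.3e}")
```

Normalising the energy by the number of atoms makes structures of different size, such as the single-atom E-V cells and the 108-atom MD snapshots, directly comparable.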

The datasets used in the potential fitting procedure for day 2 (obtained from accurate DFT calculations) will be accessed in the same way