# <font style="font-family:roboto;color:#455e6c"> Generating datasets for Machine Learning Interatomic Potentials with the ASSYST method </font>  

<div class="admonition note" name="html-admonition" style="background:#e3f2fd; padding: 10px">
<font style="font-family:roboto;color:#455e6c"> <b> DPG Tutorial: Automated Workflows and Machine Learning for Materials Science Simulations </b> </font> </br>
<font style="font-family:roboto;color:#455e6c"> 16 March 2024 </font>
</div>

In [2]:
from pyiron_workflow import Workflow

In [3]:
from pyironflow import PyironFlow

In [7]:
from pyiron_nodes.atomistic.mlips.fitting.assyst import make_assyst

In [8]:
from pyiron_nodes.atomistic.mlips.fitting.assyst.structures import ElementInput, Elements, ElementsTable

## <font style="font-family:roboto;color:#455e6c"> Background </font> 

*Automated Small SYmmetric Structure Training* (ASSYST) is a method to generate training data for machine learning potentials.
The key idea is to use small structures to automatically explore structurally and chemically diverse atomic environments and provide training data around the energetically most favorable ones.

### <font style="font-family:roboto;color:#455e6c"> Workflow Overview </font>

![image](img/AssystSchematic.svg)

### <font style="font-family:roboto;color:#455e6c"> Transferability </font>

ASSYST-trained potentials also describe structures they were not directly trained on, such as point and planar defects.

![image](img/Fig8_MTP24_2d0_8d2_DefectsManual.png)

The liquid state is also well described, and the potentials remain stable during long-running thermodynamic integrations.

![image](img/Fig11_MgCa.png)

This phase diagram is our goal for today!

### <font style="font-family:roboto;color:#455e6c"> Literature </font>

- Mg and Defects: https://journals.aps.org/prb/abstract/10.1103/PhysRevB.107.104103
- Ternary Mg/Al/Ca: https://www.researchsquare.com/article/rs-4732459/v1

## <font style="font-family:roboto;color:#455e6c"> Constructing element combinations </font> 

The first step in the ASSYST workflow is to decide which chemical space to cover and how densely.
Increasing the maximum total number of atoms per structure allows you to generate increasingly complex structures
and to sample the chemical space more densely.
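To see why a larger atom budget samples the chemical space more densely, consider a binary A-B system. The short plain-Python sketch below is independent of the workflow nodes (the helper name `binary_concentrations` is our own, hypothetical choice). It lists the distinct concentrations `x_A = a / (a + b)` that are reachable with at least one atom of each element and at most `max_atoms` atoms in total.

```python
from fractions import Fraction


def binary_concentrations(max_atoms):
    """Distinct fractions x_A = a / (a + b) with a, b >= 1 and a + b <= max_atoms.

    Hypothetical helper for illustration only; not part of pyiron_nodes.
    """
    return sorted(
        {
            Fraction(a, a + b)
            for a in range(1, max_atoms)  # at least one A atom
            for b in range(1, max_atoms - a + 1)  # at least one B atom, total <= max_atoms
        }
    )


for n in (2, 4, 8, 10):
    print(n, len(binary_concentrations(n)))
```

Going from at most 4 to at most 8 atoms raises the number of distinct binary concentrations from 5 to 21 (and to 31 for 10 atoms), at the price of larger and more expensive cells.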

Here's an example for a ternary system, where we sampled the unaries with ASSYST datasets of 1-10 atoms, and the binaries and ternaries with 2-8 and 3-8 atoms, respectively.

![img](img/Fig3_Everything_Conc_Plot.png)

Log-histogram of the compositions in a final training set.
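For a rough sense of scale, the sketch below counts the distinct compositions implied by the sampling scheme quoted above for Mg/Al/Ca. It assumes that every element of a subsystem appears at least once and that the atom ranges refer to the total number of atoms per cell; the helper `compositions` is hypothetical and not part of `pyiron_nodes`. It counts compositions only: the workflow then samples random crystals at each composition, so the final number of structures is larger.

```python
from itertools import combinations, product


def compositions(elements, min_atoms, max_atoms):
    """All compositions in which every element appears at least once and the
    total number of atoms lies in [min_atoms, max_atoms].

    Hypothetical helper for illustration; not part of pyiron_nodes.
    """
    result = []
    for atoms in product(range(1, max_atoms + 1), repeat=len(elements)):
        if min_atoms <= sum(atoms) <= max_atoms:
            result.append(dict(zip(elements, atoms)))
    return result


elements = ("Mg", "Al", "Ca")
# (min, max) total atoms for unaries, binaries and ternaries, as quoted above
atom_ranges = {1: (1, 10), 2: (2, 8), 3: (3, 8)}

n_compositions = {
    order: sum(
        len(compositions(subset, n_min, n_max))
        for subset in combinations(elements, order)
    )
    for order, (n_min, n_max) in atom_ranges.items()
}
print(n_compositions)
```

Under these assumptions this gives 30 unary, 84 binary, and 56 ternary compositions, 170 in total.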

`Elements` wraps a collection of compositions, each a dictionary mapping element symbols to atom counts, at which we will sample random crystals.

In [28]:
mg = Elements((
    {'Mg': 2}, {'Mg': 4}
))
mg

Elements(atoms=({'Mg': 2}, {'Mg': 4}))

In [29]:
al = Elements((
    {'Al': 1}, {'Al': 2}
))
al

Elements(atoms=({'Al': 1}, {'Al': 2}))

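`Elements` objects can be combined with standard Python operators: `+` concatenates the two lists of compositions, `|` merges them pairwise, and `*` merges every composition of the first with every composition of the second.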

In [30]:
mg + al

Elements(atoms=({'Mg': 2}, {'Mg': 4}, {'Al': 1}, {'Al': 2}))

In [31]:
mg | al

Elements(atoms=({'Mg': 2, 'Al': 1}, {'Mg': 4, 'Al': 2}))

In [32]:
mg * al

Elements(atoms=({'Mg': 2, 'Al': 1}, {'Mg': 2, 'Al': 2}, {'Mg': 4, 'Al': 1}, {'Mg': 4, 'Al': 2}))
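To make the semantics of the three operators explicit, here is a plain-Python sketch on tuples of dictionaries that reproduces the three outputs above. It is inferred from those outputs and is not the `Elements` implementation; the function names are hypothetical, and the behaviour for inputs of unequal length (`zip` truncates) or with shared elements (counts are summed) is an assumption.

```python
from itertools import product


def merge(x, y):
    """Combine two compositions; counts of an element present in both are added (assumption)."""
    out = dict(x)
    for element, n in y.items():
        out[element] = out.get(element, 0) + n
    return out


def combine_add(a, b):
    """Like `mg + al`: concatenate the two lists of compositions."""
    return tuple(a) + tuple(b)


def combine_or(a, b):
    """Like `mg | al`: merge compositions pairwise (first with first, second with second, ...)."""
    return tuple(merge(x, y) for x, y in zip(a, b))


def combine_mul(a, b):
    """Like `mg * al`: merge every composition of `a` with every composition of `b`."""
    return tuple(merge(x, y) for x, y in product(a, b))


mg = ({"Mg": 2}, {"Mg": 4})
al = ({"Al": 1}, {"Al": 2})
print(combine_add(mg, al))
print(combine_or(mg, al))
print(combine_mul(mg, al))
```

Note that `|` keeps the number of compositions fixed, while `*` multiplies it, which is why `*` quickly fills a chemical space and a size filter is usually needed afterwards.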

In a workflow, `Elements` are created by the `ElementInput` node and visualized by the `ElementsTable` node.

In [33]:
wf = Workflow("ASSYST_Elements_Unary")
wf.Element = ElementInput(element="Mg")
wf.ElementsTable = ElementsTable(wf.Element)

In [34]:
pf = PyironFlow([wf])
pf.gui


TASK: Build a small workflow that creates a table of Mg, Al, and Ca compositions such that:
1. the Mg:Ca ratio is always 2:1,
2. this Mg-Ca part is combined with 2-8 Al atoms,
3. every composition contains at least 2 Ca atoms, and
4. no composition contains more than 16 atoms.

Check `utilities` for the `Add()`, `Multiply()`, and `Or()` nodes to combine objects.

Check `atomistic` -> `mlips` -> `fitting` -> `assyst` for the `FilterSize()` and `ElementsTable()` nodes.

In [None]:
pf = PyironFlow([])
pf.gui

Load this workflow for the solution.

In [26]:
wf = Workflow("ASSYST_Elements_Combine")

In [27]:
pf = PyironFlow([wf])
pf.gui

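To cross-check the table from the solution workflow, the task constraints can also be enumerated directly in plain Python. This sketch does not use the workflow nodes; the function name `task_compositions` is hypothetical, and it assumes that "2-8 Al" means 2 to 8 Al atoms and that Mg, Ca, and Al all count towards the 16-atom limit. If the solution workflow interprets the constraints the same way, its table should list the same compositions.

```python
def task_compositions(max_atoms=16, min_ca=2, al_counts=range(2, 9)):
    """Enumerate Mg/Ca/Al compositions satisfying the task constraints (hypothetical helper)."""
    result = []
    # Mg:Ca = 2:1, so the Mg-Ca part always uses 3 * n_ca atoms
    for n_ca in range(min_ca, max_atoms // 3 + 1):
        for n_al in al_counts:
            if 3 * n_ca + n_al <= max_atoms:  # size filter, like FilterSize()
                result.append({"Mg": 2 * n_ca, "Ca": n_ca, "Al": n_al})
    return result


solution = task_compositions()
print(len(solution))
for composition in solution:
    print(composition)
```

Under these assumptions there are 16 compositions, from Mg4Ca2Al2 up to Mg8Ca4Al4.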

## <font style="font-family:roboto;color:#455e6c"> Full Workflow for a Small Structure Set </font> 

This demonstration uses the GRACE universal force fields for the relaxation steps.
Usually, we would run these relaxations with low-convergence DFT.

In [40]:
wf = make_assyst('ASSYST', 'Mg', 'Ca', 'Al', delete_existing_savefiles=True)

In [None]:
pf = PyironFlow([wf], flow_widget_ratio=.85)

In [None]:
pf.gui

## <font style="font-family:roboto;color:#455e6c"> Precomputed Full Workflow with Large Structure Set </font> 

This is the same workflow, but pre-run with realistic input for a unary system.
It contains about 10k structures, and you can attach plotting functions to various nodes to inspect them.

In [36]:
wf = make_assyst('ASSYST_Mg_FULL', 'Mg', min_atoms=1, max_atoms=10, max_structures=None)

In [37]:
pf = PyironFlow([wf], flow_widget_ratio=.85)

In [38]:
pf.gui
