### Materials Project Workshop – July 31–August 2 2019, Berkeley, California
#### Link to notebook: [http://workshop.materialsproject.org/pymatgen/core/pymatgen_advanced.ipynb](http://workshop.materialsproject.org/pymatgen/core/pymatgen_advanced.ipynb)

# Lesson 3, Advanced use of pymatgen
*Remember to download your notebook if you want to keep a copy of your code.*

# 0. Introducing Transformations

Transformations in *pymatgen* can be used to change one structure to another such that a later calculation can be performed.

A Transformation *object* exists because it is re-usable: it allows you to apply the same transformation to multiple different structures, which is useful in a high-throughput context.

Typically, in pymatgen, code might live in two places, for example:

* `pymatgen.analysis.bond_valence` contains the `BVAnalyzer`, a code to help estimate likely oxidation states in your crystal
* `pymatgen.transformations.standard_transformations` contains `AutoOxiStateDecorationTransformation`, the corresponding transformation that applies `BVAnalyzer` to your structure

Transformations often wrap this other code to give it a standardized interface.


## 0.1 Conventional Cell Transformation

To start with, let's create a primitive lattice for silicon.

In [None]:
from pymatgen.core import Structure, Lattice

In [None]:
from crystal_toolkit.helpers.pythreejs_renderer import view

In [None]:
si_lattice = Lattice.from_parameters(3.85, 3.85, 3.85, 60, 60, 60)
si = Structure(si_lattice, ["Si", "Si"], [[0.75, 0.75, 0.75], [0, 0, 0]])

In [None]:
view(si)

In its primitive setting, this does not look much like the textbook picture of silicon. It can be useful to convert a crystal to its conventional setting, and some tasks (for example, reporting tensor properties) require the crystal to be in a standard setting.

All transformations live in the `pymatgen.transformations` submodule, and wrap up operations that map one Structure to one or more transformed Structures.

Each transformation has a standard format. You create the transformation along with any options for that transformation like so:

You can also inspect to see if the transformation is one-to-one or one-to-many:

If it's one-to-one, the output is a single Structure; if it's one-to-many, the output is a list of dictionaries: `[{"structure": first transformed structure, ...}, {"structure": second transformed structure, ...}]`

Let's test this transformation out:

In [None]:
view(si_conv, draw_image_atoms=True)

### How Finding the Conventional Cell Works and When It Might Fail

This transformation is very robust. Behind the scenes, it uses the [`spglib`](https://atztogo.github.io/spglib/) library, a powerful code for symmetry analysis. However, due to the limits of numerical precision, sites may not lie exactly on symmetrically equivalent positions, so tolerance factors are introduced (`symprec`, a length tolerance, and `angle_tolerance`). These can be modified when constructing the transformation as appropriate:

`trans = ConventionalCellTransformation(symprec=0.1, angle_tolerance=5)`

Also using `spglib` is a `PrimitiveCellTransformation` to transform a crystal into its primitive setting.

# 1. Case Study for Structure Prediction

In [None]:
barium_titanate = Structure.from_spacegroup("Pm-3m",
                                            Lattice.cubic(3.9),
                                            ["Ba", "Ti", "O"],
                                            [[0, 0, 0], [0.5, 0.5, 0.5], [0.5, 0.5, 0]])

In [None]:
view(barium_titanate, draw_image_atoms=True)

## 1.1 Transformation to Decorate Structure with Oxidation States

Import the transformation which applies oxidation states:

Initialize the transformation:

If we apply it to our structure, we can see oxidation states are added:

### How Oxidation State Decoration Works and When It Might Fail

The Bond Valence analyzer implements a maximum a posteriori (MAP) estimation method to
determine oxidation states in a structure. The algorithm is as follows:

1. The bond valence sum of all symmetrically distinct sites in a structure
    is calculated using the element-based parameters in [O'Keeffe, Michael, and N. E. Brese. "Atom sizes and bond lengths in molecules and crystals." Journal of the American Chemical Society 113.9 (1991): 3226-3229](http://doi.org/10.1021/ja00009a002).

2. The posterior probabilities of all oxidation states are then calculated using `P(oxi_state|BV) = K * P(BV|oxi_state) * P(oxi_state)`, where K is
    a constant factor for each element. `P(BV|oxi_state)` is calculated as a
    Gaussian with mean and standard deviation determined from an analysis of
    the ICSD. The prior `P(oxi_state)` is determined from a frequency
    analysis of the ICSD.

3. The oxidation states are then ranked in order of decreasing probability
    and the oxidation state combination that results in a charge-neutral cell
    is selected.
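
The steps above can be illustrated with a toy calculation in plain Python. This is only a sketch of the MAP idea, not pymatgen's implementation: every number below is hypothetical (the real analyzer uses ICSD-derived statistics), and the helper names `gaussian_pdf` and `posterior` are made up for this example.

```python
import math
from itertools import product


def gaussian_pdf(x, mean, std):
    """Normal probability density, used here for P(BV | oxi_state)."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def posterior(bv_sum, stats):
    """Return P(oxi_state | BV) for one site, normalised to sum to 1."""
    unnormalised = {
        oxi: gaussian_pdf(bv_sum, mean, std) * prior
        for oxi, (mean, std, prior) in stats.items()
    }
    k = 1.0 / sum(unnormalised.values())  # plays the role of the constant K
    return {oxi: k * value for oxi, value in unnormalised.items()}


# HYPOTHETICAL statistics: oxi_state -> (mean BV sum, std deviation, prior).
cation_stats = {2: (2.0, 0.3, 0.2), 3: (3.0, 0.3, 0.3), 4: (4.0, 0.3, 0.5)}
anion_stats = {-1: (1.0, 0.3, 0.1), -2: (2.0, 0.3, 0.9)}

# One cation site and two anion sites with made-up bond valence sums.
site_posteriors = [
    posterior(3.8, cation_stats),
    posterior(1.9, anion_stats),
    posterior(1.9, anion_stats),
]

# Step 3: among charge-neutral combinations, keep the most probable one.
neutral = [combo for combo in product(*site_posteriors) if sum(combo) == 0]
best = max(
    neutral,
    key=lambda combo: math.prod(p[o] for p, o in zip(site_posteriors, combo)),
)
print(best)
```

With these made-up numbers the cation's bond valence sum sits closest to 4, so the neutral combination built from +4 and two -2 states comes out on top.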

Therefore, the bond valence analysis will fail if either the parameters for that element are missing from the pre-tabulated data or the oxidation state is unusual and not well represented in the ICSD.

As a fallback, we have "oxidation state guesses" which are *composition-only* guesses.

This is a composition *object*, which has many useful properties and methods, including `oxi_state_guesses()`:

## 1.2 Transformation to Predict Similar Structures

This is our first one-to-many transformation. We indicate that we're interested in multiple results by setting `return_ranked_list`. The ranking of this list varies between different transformations.

Let's see our first predicted structure:

### How Structure Prediction Works and When It Might Fail

This is a probabilistic model based on substitution probabilities data-mined from the ICSD. A full description of the algorithm is available in: [Hautier, Geoffroy, et al. "Data mined ionic substitutions for the discovery of new compounds." Inorganic chemistry 50.2 (2010): 656-663.](https://doi.org/10.1021/ic102031h)

Without subsequent calculation, we cannot say whether the predicted structures are stable or not. It can be useful to check The Materials Project to see if the predicted structure has in fact been calculated and whether it is predicted to be stable. Additionally, like the bond valence analyzer, if a given element or oxidation state is rare, predictions might be inaccurate.

# 2. Example for Adsorbate Calculation

Imagine a simple use case for studying the adsorption of CO on a catalyst surface.

You might start with a disordered structure representing your catalyst. Many crystal structures obtained via experimental methods are only given in a disordered form, that is, with partial occupancies on their sites. *On average*, a site might contain 50% Pt and 50% Au, as in the following example:

In [None]:
ptau = Structure.from_spacegroup('Fm-3m', Lattice.cubic(4), [{"Pt": 0.5, "Au": 0.5}], [[0, 0, 0]])

In [None]:
view(ptau, draw_image_atoms=True)

## 2.1 Transformation to Enumerate Ordered Approximations for Disordered Structures

Most computational methods require ordered structures (integer occupancies), and therefore the first step when starting from a disordered structure is to create an *ordered approximation*.

### How Creating Ordered Approximations Works and When It Might Fail

There are two ways of creating ordered approximations in *pymatgen*: `EnumerateStructuresTransformation`, which uses the [`enumlib`](https://github.com/msg-byu/enumlib) code, and `OrderDisorderedStructureTransformation`, which is implemented purely in *pymatgen* but requires your structure to be decorated with oxidation states.

Creating ordered approximations might fail if your cell contains a large number of species or is otherwise very complex, such that performing an enumeration creates a combinatorial explosion of different possible orderings.
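
To get a feel for this combinatorial explosion, the sketch below counts the ways of placing Pt on half the sites of a 50/50 Pt-Au cell as the cell grows. This is plain combinatorics rather than what `enumlib` does internally: it is an upper bound before any symmetry reduction, and `count_orderings` is a hypothetical helper for this example.

```python
from math import comb


def count_orderings(n_sites, n_pt):
    """Ways to choose which n_pt of n_sites sites hold Pt (the rest hold Au)."""
    return comb(n_sites, n_pt)


# Half-filled cells of increasing size.
counts = {n: count_orderings(n, n // 2) for n in (4, 8, 16, 32)}
for n_sites, n_orderings in counts.items():
    print(n_sites, "sites:", n_orderings, "orderings")
```

Going from 4 to 32 sites takes the count from 6 to several hundred million, which is why enumeration is usually restricted to small supercells.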

There are also physical concerns: a symmetric ordered approximation might not be the most appropriate model, and a more "random-like" cell might be more physical. This would require a different transformation.

## 2.2 Transformation to Create a Surface

This is an example of a transformation that requires mandatory arguments to be set:

Note that in this case the transformation returns a `Slab` and not a `Structure`. In object-oriented fashion, `Slab` is a subclass of `Structure`, meaning that it has all the same functionality as `Structure` but with added information such as the Miller indices used to generate the surface.

We can show that this is a `Slab` using the `type` function:

### How Creating Surfaces Works and When It Might Fail

This is a fairly robust transformation but care must be taken when performing
actual calculations to ensure that there is sufficient vacuum present such
that periodic images do not interact with one another, and that the surfaces 
are appropriately charge balanced.

## 2.3 Transformation to Add an Adsorbate

Finally, we want to add an adsorbate to our surface.

### How Adding Adsorbates Works and When It Might Fail

The AdsorbateSiteFinder finds adsorbate sites on slabs and generates
adsorbate structures according to user-defined criteria.

The algorithm for finding sites proceeds as follows:

1. Determine "surface sites" by finding those within
   a height threshold along the Miller index of the
   highest site
2. Create a network of surface sites using the Delaunay
   triangulation of the surface sites
3. Assign on-top, bridge, and hollow adsorption sites
   at the nodes, edges, and face centers of the Delaunay
   triangulation
4. Generate structures from a molecule positioned at
   these sites
   
This algorithm is fairly robust but was developed primarily for metal surfaces, 
with less testing performed for oxide surfaces. Full details can be found in
the associated publication:

[Montoya, Joseph H., and Kristin A. Persson. "A high-throughput framework for determining adsorption energies on solid surfaces." npj Computational Materials 3.1 (2017): 14.](https://doi.org/10.1038/s41524-017-0017-z)

# Summary

Transformations are powerful because they can be applied repeatedly to different materials, and chained together to create complex results. This can be useful when setting up your own calculations, or for educational use to demonstrate a particular system in a particular configuration. Using Transformations, *pymatgen* also makes it easy to glue together codes from other parts of the Materials Science software ecosystem.

Transformations also form the foundations of the high-throughput workflows that power The Materials Project. Tomorrow, we will show how *atomate* workflows build upon some of these transformations to automate the calculation of complex materials properties.