## Predicting synthesizability of arbitrary crystal structures and compositions
This notebook shows how to:
* Access material structures from the Materials Project (MP) using the Materials API (MAPI) or figshare.  
* Use pre-trained models to predict synthesizability of materials from either 1) Materials Project ID; 2) crystal composition; or 3) crystal structure.

You will need a [Materials Project API key](https://materialsproject.org/open) to use the features shown in this notebook.

In [None]:
%load_ext autoreload
%autoreload 2

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

from monty.serialization import loadfn, dumpfn

from pymatgen.ext.matproj import MPRester
from pymatgen.core import Structure, Lattice

from pumml.pupredict import PUPredict

### Accessing MP data

You can access all MP structures (as of 04-24-2020) directly from figshare: https://figshare.com/account/home#/collections/4952793.  

However, the MP is constantly being updated and new structures are added. It is highly recommended that you use the MAPI to pull structure data that you are interested in. Get an API key for the Materials Project [here](https://materialsproject.org/open).

This code shows how to apply some criteria (e.g., ignore compounds with f-block elements), get MP IDs (which does not take much time), and then download structures in chunks (time-consuming).

In [None]:
# Treat materials with f-block electrons separately.
fblock = ['Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 
         'Tm', 'Yb', 'Lu', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk',
         'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr']

criteria = {"elements": {"$nin": fblock}}  # exclude fblock
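
The `$nin` operator in `criteria` is MongoDB-style "not in" syntax: an entry is kept only if none of its elements appear in the excluded list. As a rough local illustration of that behavior (a sketch, not how the MP server evaluates the query), the snippet below applies the same rule to a few made-up entries. The `toy-*` IDs and the `has_no_fblock` helper are hypothetical names used only for this example.

```python
# Local illustration of a "$nin" (not-in) filter on element lists.
# The entries and the helper name are made up for demonstration only.
FBLOCK = {
    "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho", "Er",
    "Tm", "Yb", "Lu", "Th", "Pa", "U", "Np", "Pu", "Am", "Cm", "Bk",
    "Cf", "Es", "Fm", "Md", "No", "Lr",
}


def has_no_fblock(elements, excluded=FBLOCK):
    """Return True if none of the given elements is in the excluded set."""
    return excluded.isdisjoint(elements)


toy_entries = [
    {"material_id": "toy-1", "elements": ["Fe", "O"]},
    {"material_id": "toy-2", "elements": ["Cs", "Tb", "O"]},  # contains Tb
    {"material_id": "toy-3", "elements": ["Na", "Cl"]},
]

# Keep only the entries without any f-block element.
kept = [e["material_id"] for e in toy_entries if has_no_fblock(e["elements"])]
print(kept)
```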

In [None]:
# https://wiki.materialsproject.org/The_Materials_API
with MPRester() as m:  # pass your API key as an argument, or configure it with the pmg command-line tool
    mp_ids = m.query(criteria, ["material_id"], chunk_size=0)

In [None]:
# Tag with date collected
today = datetime.today().strftime('%Y-%m-%d')

mp_ids = [mpid['material_id'] for mpid in mp_ids]
dumpfn(mp_ids, "mp_ids_%s.json" % (today))

In [None]:
mp_ids = loadfn('mp_ids_%s.json' %(today))

The sublists contain MP IDs in chunks of 1000.

In [None]:
chunk_size = 1000
sublists = [mp_ids[i:i+chunk_size] for i in range(0, len(mp_ids), chunk_size)]

# MPRester.supported_properties
properties = ['energy_per_atom', 'formation_energy_per_atom',
              'e_above_hull', 'icsd_ids',
             'material_id', 'structure']

data = []
# Get all materials from MP by mpid
with MPRester() as m:  # use api_key arg or set up with pmg command line tool
    for sublist in sublists:
        data += m.query({"material_id":{"$in": sublist}}, properties=properties)
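
The slicing idiom used above to build `sublists` can be checked on a small toy list before running it against the full set of MP IDs. This is a minimal sketch: the `chunked` helper and the toy IDs are hypothetical and not part of pumml or pymatgen.

```python
def chunked(items, chunk_size):
    """Split a list into consecutive sublists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


# Seven made-up IDs split into chunks of three.
toy_ids = ["mp-%d" % i for i in range(1, 8)]
toy_sublists = chunked(toy_ids, 3)

# The last chunk holds the remainder, so no ID is dropped.
print([len(s) for s in toy_sublists])
```

Because the final slice simply runs past the end of the list, the last chunk may be shorter than `chunk_size`, and the chunks concatenate back to the original list.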

In [None]:
dumpfn(data, "mp_fblock_%s.json" % (today))

### Access a small sample dataset
We want to be responsible users of the MAPI, so to test out pumml models we can work with small MP datasets that are already downloaded.
You can download a small example dataset of 500 structures [here](https://figshare.com/articles/500_example_structures_from_Materials_Project/12252962).

In [None]:
data = loadfn('mp_example_dataset_042420.json')  # json file must be in same directory as this notebook

Materials Project data is a useful source for training models, but what if we are interested in the synthesizability of a particular theoretical compound? We have pre-trained PUMML models on large subsets of the Materials Project to enable quick predictions.

### Predict synthesizability of theoretical compounds from MP IDs
We can use the `pumml.pupredict.PUPredict` class to generate synthesizability scores directly for compounds from the Materials Project. Information related to the predictions and Materials Project API access is logged to a file called `output.log`. When you create an instance of `PUPredict`, the pre-trained models and data will be downloaded to your local machine.

In [2]:
api_key = '<api_key>'  # fill this in with your key
pup = PUPredict(api_key)

In [3]:
print(pup.synth_score_from_mpid('mp-1213718')) # theoretical Cs2TbO3
print(pup.synth_score_from_mpid('mp-771359'))  # theoretical Cu2O3

[array([0.37218361])]
[array([0.49403711])]


The outputs represent the synthesizability scores of the theoretical compounds.
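
Each printed result above is a list holding a one-element array. To compare compounds it can help to unpack these into plain floats. The sketch below does this for the two values printed above, with nested Python lists standing in for the NumPy arrays (an assumption made so the example runs without pumml). The `first_score` helper is a hypothetical name.

```python
# Values copied from the printed outputs above; nested lists stand in for
# the one-element NumPy arrays returned by the prediction call.
raw_results = {
    "mp-1213718": [[0.37218361]],  # theoretical Cs2TbO3
    "mp-771359": [[0.49403711]],   # theoretical Cu2O3
}


def first_score(result):
    """Unpack the first score of a result into a plain float."""
    return float(result[0][0])


# Sort compounds from highest to lowest score.
ranked = sorted(
    ((mpid, first_score(res)) for mpid, res in raw_results.items()),
    key=lambda pair: pair[1],
    reverse=True,
)

for mpid, score in ranked:
    print(f"{mpid}: {score:.3f}")
```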

### Predict synthesizability by chemical formula
We can also predict synthesizability for a crystal composition. If there are multiple crystal structures with the same composition, synthesizability scores for each will be predicted.

In [4]:
pup.synth_score_from_formula('Ba2Yb2Al4Si2N10O4')

[array([0.04694946]), array([0.04952542])]

In [5]:
pup.synth_score_from_formula('Na1Mg1')

'No such compound exists in Materials Project Database'
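
As the two calls above show, the formula lookup can return either a list of scores or a message string. Downstream code may want a single return type. The sketch below normalizes both cases to a list of floats. It assumes that a call producing no prediction yields `None`, and it uses nested lists in place of NumPy arrays. `scores_or_empty` is a hypothetical helper, not part of pumml.

```python
def scores_or_empty(result):
    """Return scores as a list of floats, or [] when no scores are available.

    Handles a message string (compound not found) and None (assumed to be
    what a call returns when it makes no prediction).
    """
    if result is None or isinstance(result, str):
        return []
    return [float(r[0]) for r in result]


# Stand-ins for the two outputs shown above.
found = [[0.04694946], [0.04952542]]
missing = "No such compound exists in Materials Project Database"

print(scores_or_empty(found))
print(scores_or_empty(missing))
```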

### Predict synthesizability by crystal structure
Finally, we can predict synthesizability for a crystal structure represented as a `pymatgen.core.Structure` object.

In [6]:
bcc_fe = Structure(Lattice.cubic(2.8), ["Fe", "Fe"], [[0, 0, 0], [0.5, 0.5, 0.5]])

In [7]:
pup.synth_score_from_structure(bcc_fe)

No synthesizability prediction is returned! If we check the log, we see that the input structure matches Fe entries that already exist in MP, so there's no need to predict synthesizability.

In [8]:
!tail -n 2 output.log

INFO - The given input with material_id "mp-1271295" exists.
INFO - The given input with material_id "mp-136" exists.
