# Introduction

This notebook demonstrates how to predict likely n- and p-type dopant atoms using pymatgen. This example uses the Materials API to download the structure of interest but any `Structure` object can be used. Two methods for choosing dopants are demonstrated. The first uses a simple Shannon radii comparison, whereas the second is based on the substitution probability of two atoms calculated using the `SubstitutionPredictor` utility in pymatgen. This code requires knowledge of the oxidation state of all elements in the structure. These can be guessed using pymatgen but should be checked to ensure the validity of the results.


Written using:
- pymatgen==2018.10.18

*Author: Alex Ganose (10/06/18)*

In [1]:
# Imports we need for generating dopant suggestions

from pymatgen.analysis.structure_prediction.substitution_probability import SubstitutionPredictor
from pymatgen.core.periodic_table import Specie, Element
from pymatgen.analysis.local_env import CrystalNN

from pymatgen import MPRester
from pprint import pprint

In [2]:
# Establish rester for accessing Materials API
mpr = MPRester(api_key='#########')  # INSERT YOUR OWN API KEY

Here we define a variable, `num_dopants`, that sets how many n-type and how many p-type dopant suggestions to report.

In [3]:
num_dopants = 5  # number of highest probability dopants you wish to see 

## Download a structure and add oxidation states

In this section, we use the Materials API to download a structure and add information on the oxidation states of the atoms.

In [4]:
mp_id = 'mp-856'  # Materials Project id for rutile SnO2

structure = mpr.get_structure_by_material_id(mp_id)

The downloaded structure does not contain oxidation state information. There are two ways to add this information. The first is to specify the oxidation state of the elements manually.

In [5]:
structure.add_oxidation_state_by_element({"Sn": 4, "O": -2})

Alternatively, we can use pymatgen to guess the oxidation states. If you use this method, check that the guessed oxidation states are what you expect.

In [6]:
structure.add_oxidation_state_by_guess()

Let's check what oxidation states pymatgen guessed.

In [7]:
species = structure.composition.elements

print(species)

[Specie O2-, Specie Sn4+]
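
One simple way to sanity-check a set of oxidation states is to confirm that the formula unit is charge neutral. The block below is a minimal, standalone sketch of that check; it does not use pymatgen, and the helper name `net_charge` and the plain dictionaries are assumptions made for illustration.

```python
def net_charge(composition, oxidation_states):
    """Sum of (amount x oxidation state) over all elements in a formula unit."""
    return sum(amount * oxidation_states[element]
               for element, amount in composition.items())


sno2 = {"Sn": 1, "O": 2}       # one formula unit of SnO2
guessed = {"Sn": 4, "O": -2}   # the oxidation states printed above

# A neutral formula unit sums to zero.
print(net_charge(sno2, guessed))

# A deliberately wrong assignment (illustrative only) leaves a net charge.
print(net_charge(sno2, {"Sn": 2, "O": -2}))
```

Charge neutrality is a necessary check rather than a sufficient one: a wrong assignment can still sum to zero, so the states should also be compared against chemical expectations.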


## Finding dopants by Shannon radii

In this section, we use the known Shannon radii to predict likely dopants. We will prefer dopants which have the smallest difference in radius to the host atoms. As the Shannon radii depend on the coordination number of the site, we must first calculate the bonding in the structure. In this example we do this using the `CrystalNN` class.
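
The ranking idea can be sketched without pymatgen. The block below assumes a small hard-coded table of six-coordinate ionic radii (in ångström) for a handful of ions; the table, the `(element, oxidation state)` keys and the function name `rank_by_radius` are illustrative assumptions, whereas the notebook itself looks the radii up through pymatgen.

```python
# Six-coordinate ionic radii in angstrom, hard-coded for illustration only.
RADII_VI = {
    ("Sn", 4): 0.69,
    ("Nb", 5): 0.64,
    ("Ta", 5): 0.64,
    ("U", 6): 0.73,
    ("Ni", 2): 0.69,
    ("Ru", 3): 0.68,
}


def rank_by_radius(host, radii):
    """Rank every other ion by the absolute radius difference to the host."""
    host_radius = radii[host]
    candidates = [
        {"species": species, "radii_diff": radius - host_radius}
        for species, radius in radii.items()
        if species != host
    ]
    # Smallest absolute mismatch first; ties keep their table order.
    return sorted(candidates, key=lambda c: abs(c["radii_diff"]))


ranked = rank_by_radius(("Sn", 4), RADII_VI)
for candidate in ranked:
    element, oxi = candidate["species"]
    print(f"{element}{oxi:+d}  radii_diff = {candidate['radii_diff']:+.2f}")
```

Sorting on the absolute difference treats ions that are slightly too large and slightly too small the same way, which matches the preference stated above.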

In [8]:
cnn = CrystalNN()
bonded_structure = cnn.get_bonded_structure(structure)

Next, we define a function to take a bonded structure with oxidation states and report the closest n- and p-type dopants, sorted by the difference in Shannon radii. We also define some helper functions to facilitate getting the Shannon radii for all elements in their common charge states.

In [9]:
all_species = [Specie(el, oxi) for el in Element
               for oxi in el.common_oxidation_states]

ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def int_to_roman(number):
    """Convert an int to a roman numeral."""
    result = []
    for (arabic, roman) in ROMAN:
        (factor, number) = divmod(number, arabic)
        result.append(roman * factor)
        if number == 0:
            break
    return "".join(result)


def get_shannon_radii_by_cn(cn_roman, radius_to_compare=0):
    """Gets all the Shannon radii for a particular coordination number.
    
    As the Shannon radii depend on charge state and coordination number,
    species without an entry for a particular coordination number will
    be skipped.
    """
    shannon_radii = []
    
    for s in all_species:
        try:
            radius = s.get_shannon_radius(cn_roman)
            shannon_radii.append({
                'species': s, 'radius': radius,
                'radii_diff': radius - radius_to_compare}) 
        except KeyError:
            pass

    return shannon_radii


def get_dopant_by_shannon_radii(bonded_structure, num_dopants=5):
    """Get dopant suggestions based on Shannon radii differences.
    
    Args:
        bonded_structure (StructureGraph): A pymatgen structure graph
            decorated with oxidation states.
        num_dopants (int): The number of suggestions to return for
            n- and p-type dopants.
    """
    # get a series of tuples with (coordination number, specie)
    cn_and_species = set((bonded_structure.get_coordination_of_site(i),
                          bonded_structure.structure[i].specie)
                         for i in range(bonded_structure.structure.num_sites))
    
    cn_to_radii_map = {}
    possible_dopants = []
    
    for cn, species in cn_and_species:
        cn_roman = int_to_roman(cn)
        
        try:
            species_radius = species.get_shannon_radius(cn_roman)
        except KeyError:
            print("Shannon radius not found for {} with "
                  "coordination number {}".format(species, cn))
            print("skipping...")
            continue
        
        if cn not in cn_to_radii_map:
            cn_to_radii_map[cn] = get_shannon_radii_by_cn(
                cn_roman, radius_to_compare=species_radius)
        
        shannon_radii = cn_to_radii_map[cn]

        possible_dopants += [{'radii_diff': p['radii_diff'],
                              'new_species': p['species'],
                              'old_species': species}
                             for p in shannon_radii]
    
    possible_dopants.sort(key=lambda x: abs(x['radii_diff']))

    n_type = [pred for pred in possible_dopants
              if pred['new_species'].oxi_state > pred['old_species'].oxi_state]
    p_type = [pred for pred in possible_dopants
              if pred['new_species'].oxi_state < pred['old_species'].oxi_state]
    
    return {'n_type': n_type[:num_dopants], 'p_type': p_type[:num_dopants]}       
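
The numeral helper can be checked in isolation, since the Shannon radii are looked up by Roman-numeral coordination number. The block below is a minimal standalone sketch that repeats `ROMAN` and `int_to_roman` from the cell above and converts a few typical coordination numbers; it does not need pymatgen.

```python
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def int_to_roman(number):
    """Convert an int to a roman numeral (valid for small values such as
    coordination numbers, because only symbols up to X are listed)."""
    result = []
    for arabic, roman in ROMAN:
        factor, number = divmod(number, arabic)
        result.append(roman * factor)
        if number == 0:
            break
    return "".join(result)


# Coordination numbers commonly met in oxides.
for cn in (3, 4, 6, 8, 9, 12):
    print(cn, int_to_roman(cn))
```

In rutile SnO2 the coordination numbers are 6 for Sn and 3 for O, so the lookups use `"VI"` and `"III"`.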

Now let's run the function on our bonded structure:

In [10]:
dopants = get_dopant_by_shannon_radii(bonded_structure, num_dopants=num_dopants)

pprint(dopants)

{'n_type': [{'new_species': Specie U6+,
             'old_species': Specie Sn4+,
             'radii_diff': 0.040000000000000036},
            {'new_species': Specie Nb5+,
             'old_species': Specie Sn4+,
             'radii_diff': -0.04999999999999993},
            {'new_species': Specie Ta5+,
             'old_species': Specie Sn4+,
             'radii_diff': -0.04999999999999993},
            {'new_species': Specie F-,
             'old_species': Specie O2-,
             'radii_diff': -0.06000000000000005},
            {'new_species': Specie Np5+,
             'old_species': Specie Sn4+,
             'radii_diff': 0.06000000000000005}],
 'p_type': [{'new_species': Specie Ni2+,
             'old_species': Specie Sn4+,
             'radii_diff': 0.0},
            {'new_species': Specie Ru3+,
             'old_species': Specie Sn4+,
             'radii_diff': -0.009999999999999898},
            {'new_species': Specie Ir3+,
             'old_species': Specie Sn4+,
             '

The most favoured n-type dopant is U on a Sn site. Unfortunately, this is not a sustainable or safe choice of dopant. The most common industrial n-type dopant for SnO2 is fluorine. While F is present in our list of suggested dopants, it only appears at suggestion number 4.

Another limitation of the Shannon radii approach to choosing dopants is that the radii depend on both the coordination number and charge state. For many elements, the radii for charge states and coordination numbers have not been tabulated, meaning this approach is incomplete.

Instead we should use a more robust approach to determine possible dopants. 

## Finding dopants by substitution probability

In this section, we use the `SubstitutionPredictor` to predict likely dopant substitutions using probabilities data-mined from the ICSD. Based on the species in the structure, we get a list of species that are likely to substitute in but have a different charge state. The substitution prediction methodology is presented in:
*Hautier, G., Fischer, C., Ehrlacher, V., Jain, A., and Ceder, G. (2011) Data Mined Ionic Substitutions for the Discovery of New Compounds. Inorganic Chemistry, 50(2), 656-663. doi:10.1021/ic102031h*

Here, we define a variable, `threshold`, that sets the minimum probability a substitution must have to be included in the predictions.

In [11]:
threshold = 0.001  # probability threshold for substitution/structure predictions

Next, we define a function to filter the predicted substitutions by their charge states. Based on this, we report those substitutions that will be n- or p-type dopants.

In [12]:
def get_dopant_suggestions(oxid_structure, num_dopants=5, threshold=0.001):
    """Get dopant suggestions based on substitution probabilities.
    
    Args:
        oxid_structure (Structure): A pymatgen structure decorated with
            oxidation states.
        num_dopants (int): The number of suggestions to return for
            n- and p-type dopants.
        threshold (float): Probability threshold for substitutions.
    """
    # build the predictor here so that the threshold argument is respected
    sp = SubstitutionPredictor(threshold=threshold)

    # predict substitutions for each species of the structure passed in
    subs = [sp.list_prediction([s])
            for s in oxid_structure.composition.elements]
    subs = [{'probability': pred['probability'],
             'new_species': list(pred['substitutions'].keys())[0],
             'old_species': list(pred['substitutions'].values())[0]} 
            for species_preds in subs for pred in species_preds]
    subs.sort(key=lambda x: x['probability'], reverse=True)
    
    n_type = [pred for pred in subs
              if pred['new_species'].oxi_state > pred['old_species'].oxi_state]
    p_type = [pred for pred in subs
              if pred['new_species'].oxi_state < pred['old_species'].oxi_state] 
    
    return {'n_type': n_type[:num_dopants], 'p_type': p_type[:num_dopants]}
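
The charge-state filter at the end of the function can be illustrated on its own. The block below is a minimal sketch that uses plain records instead of pymatgen objects; the `Substitution` dataclass and the `split_by_charge` helper are hypothetical names, and the probabilities are rounded values from this notebook's output for SnO2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    new_species: str
    new_oxi: int
    old_species: str
    old_oxi: int
    probability: float


def split_by_charge(subs, num_dopants=5):
    """Sort by probability, then split into n-type and p-type suggestions."""
    ranked = sorted(subs, key=lambda s: s.probability, reverse=True)
    # A higher oxidation state than the host ion donates electrons (n-type);
    # a lower one accepts them (p-type). Isovalent substitutions are dropped.
    n_type = [s for s in ranked if s.new_oxi > s.old_oxi]
    p_type = [s for s in ranked if s.new_oxi < s.old_oxi]
    return {"n_type": n_type[:num_dopants], "p_type": p_type[:num_dopants]}


subs = [
    Substitution("Ta", 5, "Sn", 4, 0.0195),
    Substitution("Co", 2, "Sn", 4, 0.0234),
    Substitution("F", -1, "O", -2, 0.0669),
    Substitution("Cd", 2, "Sn", 4, 0.0226),
    Substitution("Cl", -1, "O", -2, 0.0210),
]

result = split_by_charge(subs, num_dopants=2)
for kind, suggestions in result.items():
    print(kind, [(s.new_species, s.old_species) for s in suggestions])
```

The same comparison works on anion and cation sites: F- on an O2- site has the higher oxidation state (-1 versus -2), so it is classed as n-type.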

Now let's run the function on the structure we downloaded earlier:

In [13]:
dopants = get_dopant_suggestions(structure, num_dopants=num_dopants, threshold=threshold)

pprint(dopants)

INFO:root:11 substitutions found
INFO:root:87 substitutions found


{'n_type': [{'new_species': Specie F-,
             'old_species': Specie O2-,
             'probability': 0.06692682583342519},
            {'new_species': Specie Cl-,
             'old_species': Specie O2-,
             'probability': 0.0210226382884322},
            {'new_species': Specie Ta5+,
             'old_species': Specie Sn4+,
             'probability': 0.019486221245908524},
            {'new_species': Specie Sb5+,
             'old_species': Specie Sn4+,
             'probability': 0.010380692735493769},
            {'new_species': Specie Nb5+,
             'old_species': Specie Sn4+,
             'probability': 0.009988531781437165}],
 'p_type': [{'new_species': Specie Co2+,
             'old_species': Specie Sn4+,
             'probability': 0.023398867249112963},
            {'new_species': Specie Cd2+,
             'old_species': Specie Sn4+,
             'probability': 0.022644061067779372},
            {'new_species': Specie Li+,
             'old_species': Specie S

We get a list of potential dopants sorted by their substitution probability. The most likely n-type dopant is F on an O site. Fluorine-doped SnO2 (FTO) is one of the most widely used transparent conducting oxides, which supports this approach.