# Lesson 8: Test your Skills

## Now it's time to put your skills to the test!
* Exercise: As a materials investigator, your team has just finished running calculations for a large number of crystal structures. Now you want to analyze the dataset by writing __functions__ that use logic (in the form of __conditionals__) to parse data stored as __dictionaries__ and __lists__.

First, load the relevant datasets:

In [None]:
import os
from monty.serialization import loadfn

data_dir = "../data/"

# Our crystal structures, in addition to useful elemental information
crystals = loadfn(os.path.join(data_dir, "crystals.json"))
atomic_numbers = loadfn(os.path.join(data_dir, "atomic_numbers.json"))
atomic_weights = loadfn(os.path.join(data_dir, "atomic_weights.json"))

# Names of elements associated with an element symbol (e.g. {"Al": "Aluminium"})
element_names = loadfn(os.path.join(data_dir, "element_names.json"))

# Elemental mass fraction of Earth's crust (source: https://en.wikipedia.org/wiki/Abundances_of_the_elements_(data_page))
mass_frac_earth_crust = loadfn(os.path.join(data_dir, "mass_frac_earth_crust.json"))

# Dictionary of elements, with those who discovered them 
# and the year they were discovered 
# (sources: https://en.wikipedia.org/wiki/Timeline_of_chemical_element_discoveries,
#  https://education.jlab.org/qa/discover_ele.html)
discovery_dict = loadfn(os.path.join(data_dir, "discoveries.json"))

We have the mapping from `symbol` $\rightarrow$ `name`; now let's obtain `name` $\rightarrow$ `symbol` (we'll need it later).

In [None]:
print(element_names["Ag"])

element_symbols = {v: k for k, v in element_names.items()}

print(element_symbols["Silver"])
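
Inverting a dictionary with a comprehension only round-trips cleanly when every value is unique; duplicate values would silently overwrite one another. A quick sanity check can be sketched on a small toy mapping (`toy_names` is a hypothetical stand-in, not the real `element_names` dataset):

```python
# Hypothetical toy mapping, standing in for element_names
toy_names = {"Ag": "Silver", "Au": "Gold", "Al": "Aluminium"}

# Invert symbol -> name into name -> symbol
toy_symbols = {name: symbol for symbol, name in toy_names.items()}

# If two symbols shared a name, the inverted dictionary would be shorter
is_invertible = len(toy_symbols) == len(toy_names)

print(is_invertible)
print(toy_symbols["Silver"])
```

The same length comparison could be run on `element_names` and `element_symbols` before relying on the inverted mapping.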

Next, we'll choose the set of elements that we would like to start with. Each element has a unique story associated with it. We can explore these stories with the information from the element discovery timeline.

__Our task:__ Build a function that takes a list of element symbols as input. To also include the elements associated with particular discoverers, we can pass strings to match against the discoverers' names. Note that the function below matches these strings as plain substrings (Python's `in` operator), not as true _regex_ (or "regular expression") patterns.

In [None]:
from pymatgen.core import Element, Composition

In [None]:
def my_element_discoverer(elements:list=[], discoverers:list=[], prior_to:int=2020):
    
    # Start with elements provided by user 
    # (use copy() to not modify list passed to function)
    all_elements = elements.copy()
    
    # Loop through discoverers to find which elements they discovered
    # and then add them to your list
    for discoverer in discoverers:
        for element_name in discovery_dict:
            
            for name in discovery_dict[element_name]['discovered_by']:
                if discoverer in name:
                    elem = Element(element_symbols[element_name])
                    all_elements.append(elem)
    
    # Create a new list that will contain only the elements discovered 
    # before the year you provide
    elements_pruned = []
    
    for elem in all_elements:
        element_name = element_names[str(elem)]
        if discovery_dict[element_name]['year'] != '?':
            year_discovered = int(discovery_dict[element_name]['year'])
        else:
            # Unknown discovery date (known since ancient times)  
            year_discovered = -2000
            
        if year_discovered < prior_to:
            elem = Element(elem)
            elements_pruned.append(elem)
    
    all_elements = elements_pruned
    
    # Use set() to convert your list into a unique set
    # then recast as a list()
    all_elements = list(set(all_elements))
    
    return all_elements
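
The discoverer check in the function above uses `in`, which is a plain substring test. If true regular expressions are wanted, the check could be swapped for `re.search`. The following is a minimal sketch on toy data: `toy_discoveries` and `find_elements_by_pattern` are hypothetical names introduced for illustration, and the dictionary only mimics the shape assumed for `discovery_dict`.

```python
import re

# Hypothetical toy data in the same shape as discovery_dict (not the real dataset)
toy_discoveries = {
    "Polonium": {"discovered_by": ["Pierre Curie", "Marie Curie"], "year": "1898"},
    "Radium": {"discovered_by": ["Pierre Curie", "Marie Curie"], "year": "1898"},
    "Oxygen": {"discovered_by": ["Carl Wilhelm Scheele"], "year": "1771"},
}


def find_elements_by_pattern(pattern, discoveries):
    """Return sorted element names with at least one discoverer matching the regex."""
    compiled = re.compile(pattern)
    matches = []
    for element_name, info in discoveries.items():
        # re.search looks for the pattern anywhere in the discoverer's name
        if any(compiled.search(name) for name in info["discovered_by"]):
            matches.append(element_name)
    return sorted(matches)


curie_elements = find_elements_by_pattern(r"(Marie|Pierre) Curie", toy_discoveries)
print(curie_elements)
```

A pattern such as `r"Scheele$"` would anchor the match to the end of the name, which a substring test cannot express.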


Let's test out our function! Say we want to use oxygen in addition to elements discovered by scientists with Curie in their name (Marie and Pierre), as well as elements discovered by scientists at Berkeley.

In [None]:
my_elements = my_element_discoverer(elements=["O"], 
                                    discoverers=["Curie", "Lawrence Berkeley"], 
                                    prior_to=2020)
print(my_elements)

Can you spot which elements are missing?

(Hint below)

In [None]:
# Discovered by scientists at UC Berkeley
my_elements = my_element_discoverer(discoverers=["Ghiorso"], 
                                    prior_to=2020)
print(my_elements)

Platinum was found in gold alloys in present-day Colombia dating as far back as 800 BC.

In [None]:
my_elements = my_element_discoverer(elements=["Ni"], 
                                    discoverers=["Indigenous People of South America"], 
                                    prior_to=1800)
print(my_elements)

Next, our goal is to find all of the crystals that contain the elements in our list. 

Once we obtain this list, it is often useful to sort these materials by a metric that reflects the properties we would like to assess. The two criteria we will use in this study are the molar mass and the abundance:

* Molar mass = $\sum_{i=1}^N x_i n_i$, where $n_i$ is the molar mass of element $i$, and $x_i$ is its molar (atomic) fraction in the material, so the result is the molar mass averaged per atom
* Abundance metric = $\prod_{i=1}^N a_i^{y_i}$, where $a_i$ is the mass abundance of element $i$ in Earth's crust, and $y_i$ is its mass fraction in the material
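
To make the two formulas concrete, here is a small worked sketch for CaO in plain Python. The dictionaries are illustrative inputs typed in by hand, not the lesson's datasets, and the crustal abundances are approximate values assumed only for this example.

```python
import math

# Illustrative inputs for CaO (hand-typed, not loaded from the lesson's data files)
atomic_fraction = {"Ca": 0.5, "O": 0.5}        # x_i
molar_mass = {"Ca": 40.078, "O": 15.999}       # n_i in g/mol
crust_abundance = {"Ca": 0.0415, "O": 0.461}   # a_i as mass fractions (approximate, assumed)

# Molar mass = sum_i x_i * n_i (mean molar mass per atom)
mean_molar_mass = sum(atomic_fraction[el] * molar_mass[el] for el in atomic_fraction)

# Mass fraction y_i = x_i * n_i / sum_j x_j * n_j
mass_fraction = {
    el: atomic_fraction[el] * molar_mass[el] / mean_molar_mass
    for el in atomic_fraction
}

# Abundance metric = prod_i a_i ** y_i (a weighted geometric mean)
abundance_metric = math.prod(
    crust_abundance[el] ** mass_fraction[el] for el in atomic_fraction
)

print("Mean molar mass per atom (g/mol):", mean_molar_mass)
print("Mass fractions:", mass_fraction)
print("Abundance metric (mass fraction):", abundance_metric)
```

Because the mass fractions $y_i$ sum to 1, expressing each $a_i$ as a percentage instead of a fraction simply multiplies the metric by 100.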

In [None]:
import numpy as np

In [None]:
def compute_molar_weight(crystal):
    # Obtain atomic fractions of material
    comp = Composition(crystal["pretty_formula"])
    atomic_fractions = [comp.get_atomic_fraction(Element(elem)) for elem in crystal['elements']]
    
    # Obtain atomic weights of elements in material
    weights = [atomic_weights[elem] for elem in crystal['elements']]
    
    # Compute molar weight of material (weighted mean)
    molar_weight = 0.0
    for frac, weight in zip(atomic_fractions, weights):
        molar_weight += frac * weight
    
    return molar_weight

def compute_abundance_metric(crystal):
    # Obtain mass fractions of material
    comp = Composition(crystal["pretty_formula"])
    molar_weight = compute_molar_weight(crystal)
    mass_fractions = [comp.get_atomic_fraction(Element(elem))*(atomic_weights[elem]/molar_weight) 
                      for elem in crystal['elements']]
    
    # Obtain crustal mass abundances of elements in material
    abundances = [mass_frac_earth_crust[elem] for elem in crystal['elements']]
    
    # Compute abundance metric (geometric mean)
    abundance_metric = 1.0
    for frac, abundance in zip(mass_fractions, abundances):
        abundance_metric *= (100 * abundance) ** frac
    
    return abundance_metric

In [None]:
crystal = crystals[0]
print("Material:", crystal["pretty_formula"])
print("Molar weight (g / mol / # atoms per formula unit) = ", compute_molar_weight(crystal))
print("Abundance metric (% kg/kg) = ", compute_abundance_metric(crystal))

Now that we have our sorting metrics, let's find the materials in our dataset that contain the elements we have provided, and sort them based on the criteria above.

A helpful relation: For finite sets $A$ and $B$, $B \subseteq A$ if and only if $B = A \cap B$
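
This relation can be checked directly with Python sets. The sketch below uses plain element symbols as strings, and `contains_all` is a hypothetical helper name introduced only for illustration.

```python
def contains_all(required, available):
    """Return True if every item in `required` is also in `available`."""
    required, available = set(required), set(available)
    # B is a subset of A exactly when B equals the intersection of A and B
    return required == (required & available)


crystal_elems = {"Ca", "Ti", "O"}

print(contains_all({"Ca", "O"}, crystal_elems))   # True
print(contains_all({"Ca", "N"}, crystal_elems))   # False

# Python's built-in subset operator gives the same answer
print({"Ca", "O"} <= crystal_elems)               # True
```

The intersection form spells out the logic step by step, while `<=` (or `issubset`) is the more compact built-in equivalent.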

In [None]:
def get_crystals_from_elements(elements:list, crystals:list, sort_scheme:int=0):
    
    crystals_matched = []
    for crystal in crystals:
        
        crystal_elems = [Element(elem) for elem in crystal['elements']]
        
        # We can check if the material contains the elements that we have provided 
        # (set B in A) by checking if B = intersection(A,B)
        intersection = set(elements) & set(crystal_elems)
        if set(elements) == intersection:
            crystals_matched.append(crystal)
        
    if sort_scheme == 0:
        # Sort by molar weight
        sort_key = "Molar weight (g / mol / # atoms per formula unit)"
        sortable_values = [compute_molar_weight(crystal) for crystal in crystals_matched]
    elif sort_scheme == 1:
        # Sort by abundance metric
        sort_key = "Abundance metric (% kg/kg)"
        sortable_values = [compute_abundance_metric(crystal) for crystal in crystals_matched]
    else:
        # Catch all case - no sorting provided
        sortable_values = []
        print("Warning: Invalid sort scheme!")
        return [], [], ""
    
    if sortable_values:
        #print(sortable_values)
        
        # Sort crystals based on sorting metric (reverse to descending order)
        value_map = [{"value":v, "crystal":c} for v,c in zip(sortable_values, crystals_matched)]
        value_map = sorted(value_map, key=lambda x: x["value"], reverse=True)        
        #sortable_values, crystals_matched = zip(*sorted(zip(sortable_values, crystals_matched), reverse=True))
        
        # Extract desired values after sorting
        sortable_values = [x['value'] for x in value_map]
        crystals_matched = [x['crystal'] for x in value_map]
        
        #print(sortable_values)
        
    return crystals_matched, sortable_values, sort_key
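
One reason to sort with an explicit `key`, as the function above does, is that sorting bare `(value, crystal)` tuples falls back to comparing the crystal dictionaries whenever two values tie, and dictionaries do not support `<`. A minimal sketch with made-up values and placeholder formulas (none of this is real data):

```python
# Made-up metric values with a deliberate tie, and placeholder "crystals"
toy_values = [2.0, 5.0, 2.0]
toy_crystals = [
    {"pretty_formula": "A"},
    {"pretty_formula": "B"},
    {"pretty_formula": "C"},
]

# Sorting on the value alone never compares the dictionaries
pairs = sorted(zip(toy_values, toy_crystals), key=lambda pair: pair[0], reverse=True)
sorted_toy_values = [value for value, _ in pairs]
sorted_toy_formulas = [crystal["pretty_formula"] for _, crystal in pairs]
print(sorted_toy_values, sorted_toy_formulas)

# Without a key, the tie forces Python to compare the dicts, which raises TypeError
try:
    sorted(zip(toy_values, toy_crystals), reverse=True)
    tie_break_failed = False
except TypeError:
    tie_break_failed = True
print("Sorting without a key failed on the tie:", tie_break_failed)
```

Python's sort is stable, so tied entries keep their original relative order even with `reverse=True`.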

We've built our functions! Now let's use them to search the materials in our dataset.

In [None]:
# The elements we would like to search for in our dataset
elements = [Element(elem) for elem in ['Ca', 'O']]

# Testing our function
crystals_matched, sorted_values, sort_key = get_crystals_from_elements(elements, crystals, 
                                                                       sort_scheme=0)

# The formulas of the crystals that matched our search
formulas = [crystal['pretty_formula'] for crystal in crystals_matched]

print("Number of crystals found: ", len(crystals_matched))
# print(formulas)

For a large number of materials, we often gain the most information by analyzing how the sorting criterion is distributed. In this case, we can create a histogram of the data using the `matplotlib` package.

In [None]:
import matplotlib.pyplot as plt

number_of_bins = 20
plt.hist(sorted_values, number_of_bins)
plt.ylabel('Counts')
plt.xlabel(sort_key)
plt.show()