## *Querying, Organizing and Visualizing Materials Data*


**Why?** Access to data associated with materials in electronic form enables engineers, scientists and
students to explore this data, display it graphically, find trends and develop models.

**What?** In this tutorial, we will learn how to query, organize and plot data from the databases associated with the Python libraries [Pymatgen](http://pymatgen.org/) and [Mendeleev](https://mendeleev.readthedocs.io/en/stable/). 

**How to use this?** This tutorial uses Python; some familiarity with programming is helpful but not required. Run the code cells in order by pressing "Shift + Enter". Feel free to modify the code or change the queries to familiarize yourself with how the code works.


Suggested modifications and exercises are included in <font color=blue> blue</font>.

**Outline:**

1. Query from Pymatgen
2. Processing and Organizing Data
3. Plotting
4. Query from Mendeleev

**Get started:** Press "Shift-Enter" on each code cell to run it!

```python
# Import the libraries and define a list of element symbols to be used below.
# You'll need to install pymatgen and mendeleev (use conda or pip).

import pymatgen.core as pymat
import mendeleev as mendel
import pandas as pd

elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg',
            'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr',
            'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br',
            'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag',
            'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'Hf', 'Ta', 'W',
            'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'La', 'Ce', 'Pr',
            'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu',
            'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu']
```

### 1. Query from Pymatgen

Pymatgen is an open-source Python library for materials analysis. It is a powerful and popular resource that can be used to access data in two repositories: the [Materials Project](https://materialsproject.org/) and the [Crystallography Open Database](http://www.crystallography.net/cod/). Pymatgen makes it easy to query these resources and to obtain data from its own internal database. We will start by querying the database within the library using the **Element** class.

Making a query in Pymatgen requires the chemical symbol of the element; all the symbols used here are listed in the cell above. Once an Element object has been created from a symbol, each property is accessible as an attribute of that object. To learn more about the Element class, including the full list of available properties, see the [Pymatgen documentation](http://pymatgen.org/pymatgen.core.periodic_table.html?pymatgen.core.periodic_table.Specie.element).

In this example, we will query the Young's modulus of the elements in the list "sample". The values are printed together with their units. You can uncomment the commented-out code to query all the properties in the "querable_pymatgen" list for the "sample" elements.

```python
querable_pymatgen = ["atomic_mass", "poissons_ratio", "atomic_radius", "electrical_resistivity", "molar_volume",
                     "thermal_conductivity", "bulk_modulus", "youngs_modulus", "brinell_hardness",
                     "average_ionic_radius", "melting_point", "rigidity_modulus", "density_of_solid",
                     "coefficient_of_linear_thermal_expansion"]

sample = ['Fe', 'Co', 'Ni', 'Cu', 'Zn']

for item in sample:
    element_object = pymat.periodic_table.Element(item)
    print(item, element_object.youngs_modulus)  # Change "youngs_modulus" to any property in the querable_pymatgen list

# Uncomment to print every property in querable_pymatgen for each element in sample:
# for item in sample:
#     element_object = pymat.Element(item)
#     for i in querable_pymatgen:
#         print(item, i, getattr(element_object, i))
```

```text
Fe 211.0 GPa
Co 209.0 GPa
Ni 200.0 GPa
Cu 130.0 GPa
Zn 108.0 GPa
```


 * <font color=blue> **Exercise 1.** Modify the query above to extract Brinell hardness. </font>
 * <font color=blue> **Exercise 2.** Uncomment the lines above to see all the properties of the selected elements. </font>
 
 Remember: "Shift-Enter" to re-run the cell.
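The commented-out loop above relies on Python's built-in getattr function, which reads an attribute whose name is stored in a string: getattr(obj, "youngs_modulus") returns the same value as obj.youngs_modulus. The minimal sketch below shows this pattern without needing Pymatgen. ElementStub is a hypothetical stand-in for Pymatgen's Element class (it is not part of any library), filled with a few values copied from the outputs in this tutorial; the names stub_sample, properties and table are illustrative only. The results are collected in a nested dictionary keyed by element symbol.

```python
from dataclasses import dataclass


@dataclass
class ElementStub:
    """Hypothetical stand-in for pymatgen's Element class (not part of pymatgen).

    Values are copied from the outputs shown in this tutorial.
    """
    symbol: str
    youngs_modulus: float    # GPa
    brinell_hardness: float  # as reported by pymatgen
    melting_point: float     # K


properties = ["youngs_modulus", "brinell_hardness", "melting_point"]
stub_sample = [ElementStub("Cu", 130.0, 874.0, 1357.77),
               ElementStub("Ni", 200.0, 700.0, 1728.0)]

# getattr(obj, name) returns the same value as obj.<name>,
# but lets the attribute name come from a variable (here, a list entry)
table = {}
for stub in stub_sample:
    table[stub.symbol] = {prop: getattr(stub, prop) for prop in properties}

for symbol, values in table.items():
    print(symbol, values)
```

With Pymatgen installed, the same loop should work on pymat.Element(symbol) objects in place of ElementStub, which is what the commented-out code in the cell above does.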
 

### 2. Processing and Organizing Data

After going through the basics of a query, we will now learn how to organize data in Python lists and dictionaries.

Each entry in a dictionary is a key-value pair. In our case, the keys are property names and the values are the corresponding properties of one element. Dictionaries are useful for storing a collection of data values for a particular element. In this example, we will create one to store some of the properties of iron, using queries from both libraries. Note that the atomic number and the specific heat are obtained from Mendeleev, another database of element properties, which we will explore in Section 4.

```python
Fe_data = {}  # Initialize an empty dictionary

# Each of the following lines adds a single entry (key: property name, value: property value)
Fe_data["atomic_number"] = mendel.element("Fe").atomic_number
Fe_data["coefficient_of_linear_thermal_expansion"] = pymat.Element("Fe").coefficient_of_linear_thermal_expansion
Fe_data["youngs_modulus"] = pymat.Element("Fe").youngs_modulus
Fe_data["specific_heat"] = mendel.element("Fe").specific_heat

# Print the entire dictionary for Fe
print(Fe_data)

# Print a specific attribute
print(Fe_data["specific_heat"])

# Uncomment the next line to delete an entry
# del Fe_data["atomic_number"]
```

```text
{'atomic_number': 26, 'coefficient_of_linear_thermal_expansion': 1.18e-05, 'youngs_modulus': 211.0, 'specific_heat': 0.449}
0.449
```
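A few other dictionary operations are handy when collecting element data. This minimal sketch rebuilds the Fe dictionary from the values printed above (as plain numbers without units, so it runs without Pymatgen or Mendeleev) under the illustrative name fe_demo. It shows how to loop over entries with .items(), read a possibly missing key with .get(), and remove an entry with .pop(), an alternative to del that also returns the removed value. The key "density_of_solid" is used only as an example of a key that is not in the dictionary.

```python
# Fe properties rebuilt from the values printed above (plain numbers, without units)
fe_demo = {
    "atomic_number": 26,
    "coefficient_of_linear_thermal_expansion": 1.18e-05,
    "youngs_modulus": 211.0,
    "specific_heat": 0.449,
}

# Loop over all key-value pairs
for key, value in fe_demo.items():
    print(key, "=", value)

# .get() returns a default (None unless another default is given)
# instead of raising KeyError when the key is missing
density = fe_demo.get("density_of_solid")
print("density_of_solid:", density)

# .pop() removes the entry and returns its value
atomic_number = fe_demo.pop("atomic_number")
print(atomic_number, list(fe_demo))

# A dictionary of dictionaries can hold several elements at once
materials = {"Fe": fe_demo}
print(materials["Fe"]["youngs_modulus"])
```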


Another way to organize data is in lists, which are very helpful for creating plots. Following the examples above, we will now query three properties (coefficient of linear thermal expansion, Young's modulus and melting point) for all elements. Each resulting list is indexed in the same order as the "elements" list defined in the first cell of the tutorial.

```python
sample = elements.copy()

CTE = []                  # Coefficients of linear thermal expansion
youngs_modulus = []       # Young's moduli
melting_temperature = []  # Melting temperatures

for item in sample:
    element_object = pymat.Element(item)  # Create the Element object once per symbol
    CTE.append(element_object.coefficient_of_linear_thermal_expansion)
    youngs_modulus.append(element_object.youngs_modulus)
    melting_temperature.append(element_object.melting_point)

# Uncomment these print statements to inspect the lists
# print(CTE)
# print(youngs_modulus)
# print(melting_temperature)

# The following lists group elements by their crystal structure at room temperature (RT).
# Elements that are gases or liquids at RT have been left out.

fcc_elements = ["Ag", "Al", "Au", "Cu", "Ir", "Ni", "Pb", "Pd", "Pt", "Rh", "Sr", "Th", "Yb"]
bcc_elements = ["Ba", "Cr", "Cs", "Eu", "Fe", "K", "Li", "Mn", "Mo", "Na", "Nb", "P", "Rb", "Ta", "V", "W"]
hcp_elements = ["Be", "Ca", "Cd", "Co", "Dy", "Er", "Gd", "Hf", "Ho", "Lu", "Mg", "Os", "Re", "Ru", "Sc", "Tb", "Tc", "Ti", "Tl", "Tm", "Y", "Zn", "Zr"]

# Other solids: "B", "Sb", "Sm", "Bi" and "As" are rhombohedral; "C", "Ce" and "Sn" are allotropic;
# "Si" and "Ge" are face-centered diamond cubic; "Pu" is monoclinic; "S", "I", "U", "Np" and "Ga"
# are orthorhombic; "Se" and "Te" are hexagonal; "In" and "Pa" are tetragonal;
# "La", "Pr", "Nd" and "Pm" are double hexagonal close-packed.
```
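Because every list above follows the order of "elements", the i-th entries of CTE, youngs_modulus and melting_temperature all belong to the same element. The sketch below shows two ways to use this shared index: list.index() to look up one element's value, and zip() to pair symbols with values and keep only the FCC elements. So that it runs on its own, it uses the five Young's moduli printed in Section 1 instead of a new query; the names demo_symbols, demo_moduli and demo_fcc are illustrative and chosen so they do not overwrite the tutorial's variables.

```python
# Parallel lists: demo_moduli[i] belongs to demo_symbols[i]
# (Young's moduli in GPa, copied from the output in Section 1)
demo_symbols = ["Fe", "Co", "Ni", "Cu", "Zn"]
demo_moduli = [211.0, 209.0, 200.0, 130.0, 108.0]

# Same FCC grouping as in the cell above
demo_fcc = ["Ag", "Al", "Au", "Cu", "Ir", "Ni", "Pb", "Pd", "Pt", "Rh", "Sr", "Th", "Yb"]

# Look up a single element through the shared index
cu_modulus = demo_moduli[demo_symbols.index("Cu")]
print("Cu:", cu_modulus, "GPa")

# zip() pairs each symbol with its value; keep only the FCC elements
fcc_moduli = {symbol: modulus
              for symbol, modulus in zip(demo_symbols, demo_moduli)
              if symbol in demo_fcc}
print(fcc_moduli)

# Average Young's modulus of the FCC elements in this small sample
mean_fcc = sum(fcc_moduli.values()) / len(fcc_moduli)
print(mean_fcc)
```

The same zip() pattern applies unchanged to the full CTE, youngs_modulus and melting_temperature lists once they have been queried.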

Finally, a convenient way to inspect a dataset is to display it with the [Pandas](https://pandas.pydata.org/) library. Pandas takes a list of lists and shows it as a readable table (a DataFrame), with the properties as the column headers.

For this exercise, we will work with the data extracted for elements with the <b>FCC crystal structure</b>. 

First, we will create a list of lists with a for-loop, using the values queried from Pymatgen. The column names are taken from the list of properties we queried, "querable_pymatgen".

```python
all_values = []  # One inner list of property values per element

for item in fcc_elements:
    element_values = []

    element_object = pymat.Element(item)
    for i in querable_pymatgen:
        element_values.append(getattr(element_object, i))

    all_values.append(element_values)  # Appending each list to all_values creates a list of lists

# Build a Pandas DataFrame, using the property names as column headers
df = pd.DataFrame(all_values, columns=querable_pymatgen)
display(df)  # display() is available by default in Jupyter notebooks
```

| | atomic_mass | poissons_ratio | atomic_radius | electrical_resistivity | molar_volume | thermal_conductivity | bulk_modulus | youngs_modulus | brinell_hardness | average_ionic_radius | melting_point | rigidity_modulus | density_of_solid | coefficient_of_linear_thermal_expansion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107.8682 | 0.37 | 1.6 | 1.63e-08 | 10.27 | 430.0 | 100.0 | 83.0 | 24.5 | 1.086667 | 1234.93 | 30.0 | 10490.0 | 1.9e-05 |
| 1 | 26.981539 | 0.35 | 1.25 | 2.7e-08 | 10.0 | 235.0 | 76.0 | 70.0 | 245.0 | 0.675 | 933.47 | 26.0 | 2700.0 | 2.3e-05 |
| 2 | 196.966569 | 0.44 | 1.35 | 2.2e-08 | 10.21 | 320.0 | 220.0 | 78.0 | 2450.0 | 1.07 | 1337.33 | 27.0 | 19300.0 | 1.4e-05 |
| 3 | 63.546 | 0.34 | 1.35 | 1.72e-08 | 7.11 | 400.0 | 140.0 | 130.0 | 874.0 | 0.82 | 1357.77 | 48.0 | 8920.0 | 1.7e-05 |
| 4 | 192.217 | 0.26 | 1.35 | 4.7e-08 | 8.52 | 150.0 | 320.0 | 528.0 | 1670.0 | 0.765 | 2739.0 | 210.0 | 22650.0 | 6e-06 |
| 5 | 58.6934 | 0.31 | 1.35 | 7.2e-08 | 6.59 | 91.0 | 180.0 | 200.0 | 700.0 | 0.74 | 1728.0 | 76.0 | 8908.0 | 1.3e-05 |
| 6 | 207.2 | 0.44 | 1.8 | 2.1e-07 | 18.26 | 35.0 | 46.0 | 16.0 | 38.3 | 1.1225 | 600.61 | 5.6 | 11340.0 | 2.9e-05 |
| 7 | 106.42 | 0.39 | 1.4 | 1.08e-07 | 8.56 | 72.0 | 180.0 | 121.0 | 37.3 | 0.84625 | 1828.05 | 44.0 | 12023.0 | 1.2e-05 |
| 8 | 195.084 | 0.38 | 1.35 | 1.06e-07 | 9.09 | 72.0 | 230.0 | 168.0 | 392.0 | 0.805 | 2041.4 | 61.0 | 21090.0 | 9e-06 |
| 9 | 102.9055 | 0.26 | 1.35 | 4.3e-08 | 8.28 | 150.0 | 380.0 | 275.0 | 1100.0 | 0.745 | 2237.0 | 150.0 | 12450.0 | 8e-06 |
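A list of lists is not the only input Pandas accepts. If the data is already stored as a dictionary of dictionaries (one inner dictionary per element, like the Fe dictionary built earlier), pd.DataFrame.from_dict with orient="index" turns each outer key into a row label and each inner key into a column. The sketch below uses three rows and two columns copied from the table above; fcc_data and df_small are illustrative names.

```python
import pandas as pd

# One inner dictionary per element (values copied from the table above)
fcc_data = {
    "Ag": {"atomic_mass": 107.8682, "youngs_modulus": 83.0},
    "Al": {"atomic_mass": 26.981539, "youngs_modulus": 70.0},
    "Au": {"atomic_mass": 196.966569, "youngs_modulus": 78.0},
}

# orient="index": outer keys become the row index, inner keys become columns
df_small = pd.DataFrame.from_dict(fcc_data, orient="index")
print(df_small)
```

One design consequence: with this layout the rows are labeled by element symbol from the start.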


Pandas makes the data easier to manipulate than the structures we discussed before (dictionaries and lists of lists).
The following cells modify this DataFrame to showcase some of the flexibility that Pandas offers.

For example, to make the DataFrame easier to read, we can label the rows with the element symbols instead of numbers.

```python
df.index = fcc_elements  # Label rows with element symbols instead of 0, 1, 2, ...
display(df)
```

| | atomic_mass | poissons_ratio | atomic_radius | electrical_resistivity | molar_volume | thermal_conductivity | bulk_modulus | youngs_modulus | brinell_hardness | average_ionic_radius | melting_point | rigidity_modulus | density_of_solid | coefficient_of_linear_thermal_expansion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ag | 107.8682 | 0.37 | 1.6 | 1.63e-08 | 10.27 | 430.0 | 100.0 | 83.0 | 24.5 | 1.086667 | 1234.93 | 30.0 | 10490.0 | 1.9e-05 |
| Al | 26.981539 | 0.35 | 1.25 | 2.7e-08 | 10.0 | 235.0 | 76.0 | 70.0 | 245.0 | 0.675 | 933.47 | 26.0 | 2700.0 | 2.3e-05 |
| Au | 196.966569 | 0.44 | 1.35 | 2.2e-08 | 10.21 | 320.0 | 220.0 | 78.0 | 2450.0 | 1.07 | 1337.33 | 27.0 | 19300.0 | 1.4e-05 |
| Cu | 63.546 | 0.34 | 1.35 | 1.72e-08 | 7.11 | 400.0 | 140.0 | 130.0 | 874.0 | 0.82 | 1357.77 | 48.0 | 8920.0 | 1.7e-05 |
| Ir | 192.217 | 0.26 | 1.35 | 4.7e-08 | 8.52 | 150.0 | 320.0 | 528.0 | 1670.0 | 0.765 | 2739.0 | 210.0 | 22650.0 | 6e-06 |
| Ni | 58.6934 | 0.31 | 1.35 | 7.2e-08 | 6.59 | 91.0 | 180.0 | 200.0 | 700.0 | 0.74 | 1728.0 | 76.0 | 8908.0 | 1.3e-05 |
| Pb | 207.2 | 0.44 | 1.8 | 2.1e-07 | 18.26 | 35.0 | 46.0 | 16.0 | 38.3 | 1.1225 | 600.61 | 5.6 | 11340.0 | 2.9e-05 |
| Pd | 106.42 | 0.39 | 1.4 | 1.08e-07 | 8.56 | 72.0 | 180.0 | 121.0 | 37.3 | 0.84625 | 1828.05 | 44.0 | 12023.0 | 1.2e-05 |
| Pt | 195.084 | 0.38 | 1.35 | 1.06e-07 | 9.09 | 72.0 | 230.0 | 168.0 | 392.0 | 0.805 | 2041.4 | 61.0 | 21090.0 | 9e-06 |
| Rh | 102.9055 | 0.26 | 1.35 | 4.3e-08 | 8.28 | 150.0 | 380.0 | 275.0 | 1100.0 | 0.745 | 2237.0 | 150.0 | 12450.0 | 8e-06 |
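With element symbols as the index, rows can be selected by label with .loc, and the table can be reordered with sort_values. So that it runs on its own, the sketch below rebuilds a small DataFrame (illustratively named df_demo, to avoid overwriting df) from the Young's modulus and melting point columns of the table above.

```python
import pandas as pd

# Young's modulus (GPa) and melting point (K) copied from the table above
df_demo = pd.DataFrame(
    {"youngs_modulus": [83.0, 70.0, 78.0, 130.0, 528.0, 200.0],
     "melting_point": [1234.93, 933.47, 1337.33, 1357.77, 2739.0, 1728.0]},
    index=["Ag", "Al", "Au", "Cu", "Ir", "Ni"],
)

# Select one row, or a single value, by element symbol
print(df_demo.loc["Cu"])
cu_melting = df_demo.loc["Cu", "melting_point"]

# Sort rows from the stiffest to the most compliant element
stiffest_first = df_demo.sort_values("youngs_modulus", ascending=False)
print(stiffest_first)
```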


We can then use simple Pandas comparison operations to show only the elements that satisfy a certain condition.

The first cell will display a version of the DataFrame filtered to elements with an atomic mass <i>greater than or equal to</i> 150 u (Pandas operator .ge). <br>
The second cell will display a version of the DataFrame filtered to elements with a Poisson's ratio of exactly 0.26 (Pandas operator .eq). <br>

There are standard operators for greater than or equal (.ge), less than or equal (.le), equal (.eq) and not equal (.ne), among others. A list of these operations can be found [here](https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/frame.html#binary-operator-functions). We can also build our own custom conditions by combining comparisons.

The third cell will display a version with a custom condition: the elements shown have a Young's modulus less than 120 GPa and a Poisson's ratio greater than 0.25. <br>
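The three filters can also be sketched on a small self-contained DataFrame, using values copied from the table above (df_demo is an illustrative name chosen to avoid overwriting df). Each comparison returns a boolean Series (a mask), and indexing the DataFrame with a mask keeps only the rows where it is True. For the custom condition, the two masks are combined with &; each comparison must be wrapped in parentheses because & binds more tightly than < and > in Python.

```python
import pandas as pd

# Atomic mass (u), Poisson's ratio and Young's modulus (GPa), copied from the table above
df_demo = pd.DataFrame(
    {"atomic_mass": [107.8682, 26.981539, 196.966569, 63.546, 192.217,
                     58.6934, 207.2, 106.42, 195.084, 102.9055],
     "poissons_ratio": [0.37, 0.35, 0.44, 0.34, 0.26, 0.31, 0.44, 0.39, 0.38, 0.26],
     "youngs_modulus": [83.0, 70.0, 78.0, 130.0, 528.0, 200.0, 16.0, 121.0, 168.0, 275.0]},
    index=["Ag", "Al", "Au", "Cu", "Ir", "Ni", "Pb", "Pd", "Pt", "Rh"],
)

# 1. Atomic mass greater than or equal to 150 u
heavy = df_demo[df_demo["atomic_mass"].ge(150)]

# 2. Poisson's ratio exactly equal to 0.26
nu_026 = df_demo[df_demo["poissons_ratio"].eq(0.26)]

# 3. Custom condition: combine two masks with & (the parentheses are required)
soft = df_demo[(df_demo["youngs_modulus"] < 120) & (df_demo["poissons_ratio"] > 0.25)]

print(heavy, nu_026, soft, sep="\n\n")
```

Exact equality on floats works here because both sides come from the same decimal literal; for computed values, a tolerance-based comparison is usually safer.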