# PBAs: E Above Hull Analysis

In this notebook, I analyze the energy above hull data for the PBAs (Prussian blue analogues). In materials science, the energy above hull measures a structure's thermodynamic stability relative to competing phases: the lower the energy above hull, the more stable the structure.
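To make the definition concrete, here is a toy calculation for a hypothetical binary A-B system. The phase names and energies are made up for illustration, and the functions `lower_hull` and `e_above_hull` are not pymatgen functions. Pymatgen's `PhaseDiagram`, used later in this notebook, performs the multi-element equivalent of this construction.

```python
def lower_hull(points):
    """Lower convex hull of (x, energy) points via the monotone chain method."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the line hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def e_above_hull(x, energy, hull):
    """Energy of a phase minus the hull energy interpolated at composition x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            hull_energy = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
            return energy - hull_energy
    raise ValueError("composition lies outside the hull range")


# Hypothetical phases: (fraction of B, formation energy in eV/atom).
phases = {"A": (0.0, 0.0), "A3B": (0.25, -0.1), "AB": (0.5, -0.4), "B": (1.0, 0.0)}
hull = lower_hull(phases.values())
results = {name: e_above_hull(x, e, hull) for name, (x, e) in phases.items()}
print(hull)
print(results)
```

In this toy system AB sits on the hull, while A3B lies above the line joining A and AB, so it has a positive energy above hull and would be expected to decompose into those two phases.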

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

## Pre-processing and Loading

Before we can load the JSON into Python, we need to fix a few formatting issues in the file:

In [2]:
# with open('pba.json', 'r') as file :
#   pba_json = file.read()

In [3]:
# #Getting rid of the /* i */ and replacing with a comma:
# for i in range(1,700):
#     j = str(i)
#     pba_json = pba_json.replace('/* ' + j + ' */', ',')

# print(pba_json[:100])

In [4]:
# #Adding square brackets:
# pba_json = '[\n' + pba_json + '\n]'
# #Deleting first comma:
# pba_json = pba_json[:2] + pba_json[3:]

In [5]:
# #Getting rid of the ObjectId() tag from the _id value:
# pba_json = pba_json.replace("ObjectId(", "")
# pba_json = pba_json.replace(")", "")

In [6]:
# #Saving to file as pba_1.json
# pba_1 = open('pba_1.json', 'w')
# pba_1.write(pba_json)
# pba_1.close()
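The clean-up steps above can also be sketched as a single function. This is a minimal sketch, assuming the raw file consists of records separated by `/* n */` markers with `ObjectId("...")` wrappers, as the commented code implies. The function name `clean_mongo_export` and the sample string are hypothetical. Unlike the blanket `replace(")", "")` above, the regular expression only removes the parenthesis that closes an `ObjectId(...)` wrapper, so parentheses inside other string values survive.

```python
import json
import re


def clean_mongo_export(raw: str) -> str:
    """Turn a '/* n */'-separated dump into a JSON array string."""
    # Split on the '/* n */' record separators and drop empty pieces.
    parts = re.split(r"/\*\s*\d+\s*\*/", raw)
    records = [part.strip() for part in parts if part.strip()]
    # Unwrap ObjectId("...") while leaving all other parentheses alone.
    records = [re.sub(r'ObjectId\((".*?")\)', r"\1", rec) for rec in records]
    return "[\n" + ",\n".join(records) + "\n]"


# Hypothetical two-record sample in the assumed raw format.
sample = (
    '/* 1 */\n{"_id": ObjectId("58e5d103"), "note": "f(x)"}\n'
    '/* 2 */\n{"_id": ObjectId("5949939e"), "note": "ok"}'
)
cleaned = json.loads(clean_mongo_export(sample))
print(cleaned)
```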

Now we are ready to load the file using monty.serialization.loadfn, which loads the JSON into a list of dictionaries.

In [7]:
from monty.serialization import loadfn

In [8]:
data_1 = loadfn('pba_1.json')

In [9]:
len(data_1)

536

In [10]:
data_1[1]

{'_id': '58e5d103d95cbb63a64878f0', 'input': {'structure': Structure Summary
  Lattice
      abc : 9.9509025313318 9.9509025313318 9.9509025313318
   angles : 89.99613296435679 90.00386703564321 90.00386703564321
   volume : 985.3429511575596
        A : 9.95090252 -0.0003358 -0.0003358
        B : -0.0003358 9.95090252 0.0003358
        C : -0.0003358 0.0003358 9.95090252
  PeriodicSite: Ca (7.4762, 7.4759, 7.4759) [0.7514, 0.7513, 0.7513]
  PeriodicSite: Ca (2.4747, 2.4744, 7.4759) [0.2487, 0.2486, 0.7513]
  PeriodicSite: Ca (2.4747, 7.4759, 2.4744) [0.2487, 0.7513, 0.2486]
  PeriodicSite: Ca (2.4705, 7.4801, 7.4801) [0.2483, 0.7517, 0.7517]
  PeriodicSite: Fe (0.0067, 9.9439, 9.9439) [0.0007, 0.9993, 0.9993]
  PeriodicSite: Fe (4.9721, 4.9785, 9.9466) [0.4997, 0.5003, 0.9996]
  PeriodicSite: Fe (0.0039, 4.9785, 4.9785) [0.0004, 0.5003, 0.5003]
  PeriodicSite: Fe (4.9721, 9.9466, 4.9785) [0.4997, 0.9996, 0.5003]
  PeriodicSite: Co (4.9696, 9.9460, 9.9460) [0.4995, 0.9995, 0.9995]
  P

Now that the PBA data is loaded into Python, we can begin building pymatgen entries for each structure.

## Using Pymatgen

In [11]:
import pymatgen as mg

### Creating pymatgen entries

Next, we want to make pymatgen entries using the composition and energy values. Here is an example of a ComputedEntry:

In [12]:
from pymatgen.entries.computed_entries import ComputedEntry

my_entry = ComputedEntry(composition="Ni4O2",
                  energy=-28,
                  parameters={"potcar_symbols": ['pbe Ni_pv', 'pbe O'],
                              "hubbards":{'Ni': 6.2, 'O': 0.0}},
                  data={"oxide_type":"oxide"})

print(my_entry)

ComputedEntry None - Ni4 O2
Energy = -28.0000
Correction = 0.0000
Parameters:
potcar_symbols = ['pbe Ni_pv', 'pbe O']
hubbards = {'Ni': 6.2, 'O': 0.0}
Data:
oxide_type = oxide


The first step in creating a ComputedEntry is getting the composition, which can be given either as a dict or as a string.

In [13]:
struct=data_1[1]['input']['structure']

In [14]:
struct.composition

Comp: Ca4 Fe4 Co4 C24 N24

Next, we access the energy value from the 'output' section of the same data_1 record:

In [15]:
out = data_1[1]['output']
out['energy']

-476.8670732
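The value printed above is the total energy of the whole cell (Ca4 Fe4 Co4 C24 N24), whereas the hull energies returned later by pymatgen's `get_e_above_hull` are per atom. As a quick sanity check, the per-atom energy can be worked out by hand. The `count_atoms` helper below is a hypothetical parser for space-separated formulas, not a pymatgen function.

```python
import re


def count_atoms(formula: str) -> int:
    """Sum the atom counts in a space-separated formula such as 'Ca4 Fe4'."""
    total = 0
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        # A missing count means a single atom.
        total += int(match.group(2) or 1)
    return total


n_atoms = count_atoms("Ca4 Fe4 Co4 C24 N24")  # 4 + 4 + 4 + 24 + 24
energy_per_atom = -476.8670732 / n_atoms      # eV per atom
print(n_atoms, energy_per_atom)
```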

Now we can create a pymatgen entry with the pba structure in data_1[1]:

In [16]:
struct = data_1[1]['input']['structure']
pba_1 = ComputedEntry(composition=struct.composition,
                      energy=data_1[1]['output']['energy'],
                      parameters={"nelect": data_1[1]['input']['parameters']['NELECT'],
                                  "hubbards": data_1[1]['input']['hubbards'],
                                  "potcar_spec": data_1[1]['input']['potcar_spec'],
                                  "is_hubbard": data_1[1]['input']['is_hubbard']})

print(pba_1)

ComputedEntry None - Ca4 Fe4 Co4 C24 N24
Energy = -476.8671
Correction = 0.0000
Parameters:
nelect = 348.0
hubbards = {}
potcar_spec = [{'titel': 'PAW_PBE Ca_sv 06Sep2000', 'hash': 'eb006721e214c04b3c13146e81b3a27d'}, {'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE Co 06Sep2000', 'hash': 'b169bca4e137294d2ab3df8cbdd09083'}, {'titel': 'PAW_PBE C 08Apr2002', 'hash': 'c0a8167dbb174fe492a3db7f5006c0f8'}, {'titel': 'PAW_PBE N 08Apr2002', 'hash': 'b98fd027ddebc67da4063ff2cabbc04b'}]
is_hubbard = False
Data:


In [17]:
# *** START HERE (9/14): 
#we'll make full entries for all of the structures in data_1, then we'll start working with the Materials Project Data

Let's iterate over the whole data_1 file with this method to make a list of pba entries:

In [18]:
pba_entries = []
for doc in data_1:
    # Skip documents that have no 'input' section.
    if 'input' not in doc:
        continue
    inp = doc['input']
    pba_entry = ComputedEntry(composition=inp['structure'].composition,
                              energy=doc['output']['energy'],
                              parameters={"nelect": inp['parameters']['NELECT'],
                                          "hubbards": inp['hubbards'],
                                          "potcar_spec": inp['potcar_spec'],
                                          "is_hubbard": inp['is_hubbard']})
    pba_entries.append(pba_entry)
pba_entries[:5]

[ComputedEntry None - Ca4 Fe4 Co4 C24 N24
 Energy = -476.8671
 Correction = 0.0000
 Parameters:
 nelect = 348.0
 hubbards = {}
 potcar_spec = [{'titel': 'PAW_PBE Ca_sv 06Sep2000', 'hash': 'eb006721e214c04b3c13146e81b3a27d'}, {'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE Co 06Sep2000', 'hash': 'b169bca4e137294d2ab3df8cbdd09083'}, {'titel': 'PAW_PBE C 08Apr2002', 'hash': 'c0a8167dbb174fe492a3db7f5006c0f8'}, {'titel': 'PAW_PBE N 08Apr2002', 'hash': 'b98fd027ddebc67da4063ff2cabbc04b'}]
 is_hubbard = False
 Data:, ComputedEntry None - Mg4 Cr4 Os4 C24 N24
 Energy = -501.1275
 Correction = 0.0000
 Parameters:
 nelect = 352.0
 hubbards = {}
 potcar_spec = [{'titel': 'PAW_PBE Mg_pv 06Sep2000', 'hash': 'bbcf6f81cc34a3090d483ad641178746'}, {'titel': 'PAW_PBE Cr_pv 07Sep2000', 'hash': 'eb23364cc25164418f9f79efd8f04f7d'}, {'titel': 'PAW_PBE Os_pv 20Jan2003', 'hash': '7bb96cace1809ebeb4d030d71024c5bf'}, {'titel': 'PAW_PBE C 08Apr2002', 'hash': '

In [19]:
len(pba_entries)

535
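data_1 holds 536 documents but only 535 entries were built, so one document has no 'input' key, and every entry after it sits one position earlier than its source document. A minimal sketch of recovering that mapping, using toy dictionaries in place of data_1 (the names `docs`, `skipped` and `entry_to_doc` are illustrative only):

```python
# Toy stand-ins for data_1: the second document has no 'input' section.
docs = [
    {"input": {"formula": "Ca4 Fe4 Co4 C24 N24"}},
    {"note": "no input section"},
    {"input": {"formula": "Mg4 Cr4 Os4 C24 N24"}},
]

# Indices of documents that are skipped when the entry list is built.
skipped = [i for i, doc in enumerate(docs) if "input" not in doc]

# entry_to_doc[k] is the index in docs that produced entry k.
entry_to_doc = [i for i, doc in enumerate(docs) if "input" in doc]

print(skipped)
print(entry_to_doc)
```

Applied to the real data, the same two comprehensions over data_1 would show which document was dropped and which data_1 index belongs to a given position in pba_entries.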

### Accessing materials entries from the Materials Project

Next, let's add a PBA entry to the list of entries retrieved from the Materials Project. We will then apply the MaterialsProjectCompatibility (MPC) corrections to this list and compute e_above_hull values.

In [20]:
#Element symbols of the first PBA entry, used to define the chemical system to query:
comp = pba_entries[0].as_dict()['composition'].keys()
comp

dict_keys(['Ca', 'Fe', 'Co', 'C', 'N'])

In [21]:
from pymatgen import MPRester
mpr = MPRester(api_key='clRGHmBDgp1xt9zA')

In [22]:
entries = mpr.get_entries_in_chemsys(comp)
entries #these are all the entries from the actual Materials Project with those atoms

[ComputedEntry mp-10683 - Ca1
 Energy = -1.6033
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-45 - Ca1
 Energy = -2.0218
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-1067285 - Ca4
 Energy = -6.9113
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-1008498 - Ca4
 Energy = -7.3837
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseu

In [23]:
entries.append(pba_entries[0])

In [24]:
len(entries)

213

### Applying Correction

In [25]:
from pymatgen import MPRester

In [26]:
mpr = MPRester(api_key='clRGHmBDgp1xt9zA') #need API key (from MP website -> dashboard)

In [27]:
#entries = mpr.get_entries_in_chemsys('Ca-Fe-Co-C-N'.split('-'))
#this gets all of the entries containing these elements

In [28]:
type(entries[1])

pymatgen.entries.computed_entries.ComputedEntry

Next, we need to account for the Materials Project energy corrections.

Note that NELECT becomes nelect in the entry parameters.

See the email for the full list of parameters and how to access them in the dictionary.

In [29]:
from pymatgen.entries.compatibility import MaterialsProjectCompatibility

In [30]:
mpc = MaterialsProjectCompatibility()

In [31]:
entries_1 = mpc.process_entries(entries)
#use correction term here in building phase diagram
#put this entries_1 into the phase diagram method

In [32]:
entries_1[:]

[ComputedEntry mp-10683 - Ca1
 Energy = -1.6033
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-45 - Ca1
 Energy = -2.0218
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-1067285 - Ca4
 Energy = -6.9113
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseudo_potential = {'functional': 'PBE', 'labels': ['Ca_sv'], 'pot_type': 'paw'}
 hubbards = {}
 potcar_symbols = ['PBE Ca_sv']
 oxide_type = None
 Data:
 oxide_type = None, ComputedEntry mp-1008498 - Ca4
 Energy = -7.3837
 Correction = 0.0000
 Parameters:
 run_type = GGA
 is_hubbard = False
 pseu

In [33]:
import pymatgen.entries.compatibility as pycomp

In [34]:
last = entries[-1]

In [35]:
last

ComputedEntry None - Ca4 Fe4 Co4 C24 N24
Energy = -476.8671
Correction = 0.0000
Parameters:
nelect = 348.0
hubbards = {}
potcar_spec = [{'titel': 'PAW_PBE Ca_sv 06Sep2000', 'hash': 'eb006721e214c04b3c13146e81b3a27d'}, {'titel': 'PAW_PBE Fe_pv 06Sep2000', 'hash': '5963411539032ec3298fa675a32c5e64'}, {'titel': 'PAW_PBE Co 06Sep2000', 'hash': 'b169bca4e137294d2ab3df8cbdd09083'}, {'titel': 'PAW_PBE C 08Apr2002', 'hash': 'c0a8167dbb174fe492a3db7f5006c0f8'}, {'titel': 'PAW_PBE N 08Apr2002', 'hash': 'b98fd027ddebc67da4063ff2cabbc04b'}]
is_hubbard = False
Data:

After this step we will have energy above hull values, which are what we will analyze.

See the example for how to access the e above hull (a higher e above hull means a less stable structure).

Eventually, put the results into a DataFrame with A, P, R, # of A, and e above hull (we need an e above hull for every A = 1 to 8).

Remember that we only need the e above hull data for the PBAs; we don't need that calculation for the other structures with the same A, P, R, etc.

Open question: how do we calculate the e above hull for materials that aren't already in the Materials Project database?

### Starting the analysis

In [36]:
from pymatgen import MPRester
mpr = MPRester(api_key='clRGHmBDgp1xt9zA')
from pymatgen.entries.compatibility import MaterialsProjectCompatibility
mpc = MaterialsProjectCompatibility()
from pymatgen.analysis.phase_diagram import PhaseDiagram, PDPlotter

In [37]:
#First, initialize a dataframe filled with NaN, which we will fill as we go through pba_entries:
pba_e_hull_df = pd.DataFrame(index=range(len(pba_entries)),
                             columns=['Composition','e_above_hull','A_atom','P_atom','R_atom','Correction'])
for i in range(len(pba_entries)): #Looping through the length of the pba_entries list
    try:
        entries = mpr.get_entries_in_chemsys(pba_entries[i].as_dict()['composition'].keys()) #access entries from MP
        entries.append(pba_entries[i])
        entries = mpc.process_entries(entries) #applying correction
        phase_d = PhaseDiagram(entries) #making phase diagram, which will allow calculation of e_above_hull
        pba = entries[-1] #the PBA entry was appended last, so it is expected at the end
        elements = list(pba.composition.as_dict())

        #Putting composition, e_above_hull, and correction values into pba_e_hull_df:
        pba_e_hull_df.loc[i, 'Composition'] = pba.composition.formula
        pba_e_hull_df.loc[i, 'e_above_hull'] = phase_d.get_e_above_hull(pba)
        pba_e_hull_df.loc[i, 'Correction'] = pba.correction

        #With 4 elements, P and R are the same atom, so both come from index 1.
        pba_e_hull_df.loc[i, 'A_atom'] = elements[0]
        pba_e_hull_df.loc[i, 'P_atom'] = elements[1]
        pba_e_hull_df.loc[i, 'R_atom'] = elements[1] if len(elements) == 4 else elements[2]
    except Exception: #deliberately broad: report the index and keep going
        print('The error occured on loop ' + str(i))

The error occured on loop 1
The error occured on loop 12
The error occured on loop 21
The error occured on loop 40
The error occured on loop 56
The error occured on loop 134
The error occured on loop 185
The error occured on loop 187
The error occured on loop 258
The error occured on loop 271
The error occured on loop 273
The error occured on loop 277
The error occured on loop 429
The error occured on loop 456
The error occured on loop 475
The error occured on loop 493


We need to figure out what is causing these errors. It appears that one of the pba_entries doesn't have a correction term.

In [53]:
#Re-running only the indices that failed in the first pass:
error_entries = [1,12,21,40,56,134,185,187,258,271,273,277,429,456,475,493]
for i in error_entries:
    try:
        entries = mpr.get_entries_in_chemsys(pba_entries[i].as_dict()['composition'].keys()) #access entries from MP
        entries.append(pba_entries[i])
        entries = mpc.process_entries(entries) #applying correction
        phase_d = PhaseDiagram(entries) #making phase diagram, which will allow calculation of e_above_hull
        pba = entries[-1] #the PBA entry was appended last, so it is expected at the end
        elements = list(pba.composition.as_dict())

        #Putting composition, e_above_hull, and correction values into pba_e_hull_df:
        pba_e_hull_df.loc[i, 'Composition'] = pba.composition.formula
        pba_e_hull_df.loc[i, 'e_above_hull'] = phase_d.get_e_above_hull(pba)
        pba_e_hull_df.loc[i, 'Correction'] = pba.correction

        #With 4 elements, P and R are the same atom, so both come from index 1.
        pba_e_hull_df.loc[i, 'A_atom'] = elements[0]
        pba_e_hull_df.loc[i, 'P_atom'] = elements[1]
        pba_e_hull_df.loc[i, 'R_atom'] = elements[1] if len(elements) == 4 else elements[2]
    except Exception: #deliberately broad: report the index and keep going
        print('Error on loop ' + str(i))

Error on loop 456
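Fifteen of the sixteen indices succeeded on this second pass, which suggests those failures were transient. The actual cause is not recorded, because the except block discards the exception, so a network or API hiccup is only a guess. A minimal sketch of a retry helper that keeps the last exception for inspection; `run_with_retries` and the toy tasks are hypothetical and stand in for one iteration of the loop:

```python
import time


def run_with_retries(task, attempts=3, delay=0.0):
    """Call task() up to `attempts` times; return (result, None) or (None, last_error)."""
    last_error = None
    for _ in range(attempts):
        try:
            return task(), None
        except Exception as exc:  # keep the exception so the cause can be inspected
            last_error = exc
            time.sleep(delay)
    return None, last_error


# Toy task that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result, error = run_with_retries(flaky)

# Toy task that can never succeed: indexing past the end of an empty list.
failed_result, failed_error = run_with_retries(lambda: [][2], attempts=2)

print(result, error)
print(type(failed_error).__name__)
```

Returning the exception instead of printing only the loop index makes it possible to separate failures that go away on retry from failures that are caused by the entry itself.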


In [56]:
pba_e_hull_df.tail(100)

| | Composition | e_above_hull | A_atom | P_atom | R_atom | Correction |
|---|---|---|---|---|---|---|
| 435 | K2 Mn4 Co4 C24 N24 | 0.284048 | K | Mn | Co | 0 |
| 436 | Na2 Cr4 Fe4 C24 N24 | 0.290788 | Na | Cr | Fe | 0 |
| 437 | Mg2 Cr4 In4 C24 N24 | 0.434479 | Mg | Cr | In | 0 |
| 438 | Na2 Mn4 Os4 C24 N24 | 0.107546 | Na | Mn | Os | 0 |
| 439 | K2 Cr4 Co4 C24 N24 | 0.303521 | K | Cr | Co | 0 |
| 440 | Mg2 Mn4 Ni4 C24 N24 | 0.47905 | Mg | Mn | Ni | 0 |
| 441 | Ca2 Fe4 Os4 C24 N24 | 0.20536 | Ca | Fe | Os | 0 |
| 442 | K2 Fe4 Co4 C24 N24 | 0.220437 | K | Fe | Co | 0 |
| 443 | Li2 Fe4 Co4 C24 N24 | 0.264315 | Li | Fe | Co | 0 |
| 444 | K2 Cr8 C24 N24 | 0.223728 | K | Cr | Cr | 0 |


In [55]:
#Saving to file as pba_e_hull_df.csv
pba_e_hull_df.to_csv('pba_e_hull_df.csv')

In [59]:
pba_e_hull_df

| | Composition | e_above_hull | A_atom | P_atom | R_atom | Correction |
|---|---|---|---|---|---|---|
| 0 | Ca4 Fe4 Co4 C24 N24 | 0.435563 | Ca | Fe | Co | 0 |
| 1 | Mg4 Cr4 Os4 C24 N24 | 0.319625 | Mg | Cr | Os | 0 |
| 2 | Ca4 Mn4 Fe4 C24 N24 | 0.38262 | Ca | Mn | Fe | 0 |
| 3 | Ca4 Mn4 Os4 C24 N24 | 0.303886 | Ca | Mn | Os | 0 |
| 4 | Li4 Cr8 C24 N24 | 0.320146 | Li | Cr | Cr | 0 |
| 5 | Mg4 Cr4 Os4 C24 N24 | 0.325666 | Mg | Cr | Os | 0 |
| 6 | Sr4 Cr4 Fe4 C24 N24 | 0.374969 | Sr | Cr | Fe | 0 |
| 7 | Sr4 V4 Ni4 C24 N24 | 0.523641 | Sr | V | Ni | 0 |
| 8 | Mg4 Mn4 V4 C24 N24 | 0.486088 | Mg | Mn | V | 0 |
| 9 | Li4 Fe4 Co4 C24 N24 | 0.264808 | Li | Fe | Co | 0 |


In [57]:
pba_entries[456]

ComputedEntry None - C8 N12
Energy = -141.5829
Correction = 0.0000
Parameters:
nelect = 92.0
hubbards = {}
potcar_spec = [{'titel': 'PAW_PBE C 08Apr2002', 'hash': 'c0a8167dbb174fe492a3db7f5006c0f8'}, {'titel': 'PAW_PBE N 08Apr2002', 'hash': 'b98fd027ddebc67da4063ff2cabbc04b'}]
is_hubbard = False
Data:
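Entry 456 contains only C and N, so it has no A, P, or R atom. If the loop reaches the positional lookup with index 2 for this entry, that lookup raises an IndexError on the two-element list. A minimal sketch of a guard that identifies the metals by exclusion instead of by list length; `split_apr` is a hypothetical helper, and it assumes the elements are ordered A, P, R before C and N, as in the compositions shown above:

```python
def split_apr(elements, framework=("C", "N")):
    """Return (A, P, R) from an ordered element list, or None if it is not a PBA."""
    metals = [el for el in elements if el not in framework]
    if len(metals) == 3:
        return tuple(metals)
    if len(metals) == 2:
        # P and R are the same element, e.g. Li4 Cr8 C24 N24.
        a_atom, p_atom = metals
        return (a_atom, p_atom, p_atom)
    # Anything else (such as C8 N12) has no A/P/R assignment.
    return None


print(split_apr(["Ca", "Fe", "Co", "C", "N"]))
print(split_apr(["Li", "Cr", "C", "N"]))
print(split_apr(["C", "N"]))
```

Rows for which the helper returns None could be skipped or flagged explicitly rather than being caught by the general except block.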

In [58]:
data_1[457]

{'_id': '5949939ed95cbb63a6202332', 'input': {'structure': Structure Summary
  Lattice
      abc : 10.17854965954659 7.4124 6.97575
   angles : 90.0 90.20579928116425 90.0
   volume : 526.2993739872612
        A : 10.178484 0.0 -0.03656
        B : 0.0 7.4124 0.0
        C : 0.0 0.0 6.97575
  PeriodicSite: C (9.7825, 5.7943, 0.9184) [0.9611, 0.7817, 0.1367]
  PeriodicSite: C (0.3959, 1.6181, 6.0207) [0.0389, 0.2183, 0.8633]
  PeriodicSite: C (5.4852, 2.0881, 2.5146) [0.5389, 0.2817, 0.3633]
  PeriodicSite: C (4.6933, 5.3243, 4.4246) [0.4611, 0.7183, 0.6367]
  PeriodicSite: C (7.0282, 3.8300, 3.9760) [0.6905, 0.5167, 0.5736]
  PeriodicSite: C (3.1502, 3.5824, 2.9631) [0.3095, 0.4833, 0.4264]
  PeriodicSite: C (8.2395, 0.1238, 6.4327) [0.8095, 0.0167, 0.9264]
  PeriodicSite: C (1.9390, 7.2886, 0.5065) [0.1905, 0.9833, 0.0736]
  PeriodicSite: N (4.5956, 1.9146, 5.9324) [0.4515, 0.2583, 0.8528]
  PeriodicSite: N (5.5829, 5.4978, 1.0068) [0.5485, 0.7417, 0.1472]
  PeriodicSite: N (0.4937, 5