# Checking A, P, and R Atoms

In this notebook, I check that the identities of the A, P, and R atoms assigned with CrystalNN in Notebook 01 match the identities recorded in the main database. We will compare the dataframe pba_e_hull_df, from Notebook 01, with the JSON file pba_w_APR, which contains the data taken directly from the main database.

## Importing and Cleaning pba_w_APR

pba_w_APR is a JSON file. Let's first look at the raw text to see whether it is in the proper format to import:

In [1]:
with open('pba_w_APR.json', 'r') as file:
    pba_json = file.read()

In [2]:
print(pba_json[:1000])

[{"R": "Co", "P": "Fe", "A": "Ca", "n": 4, "input": {"structure": {"@module": "pymatgen.core.structure", "@class": "Structure", "lattice": {"matrix": [[9.95090252, -0.0003358, -0.0003358], [-0.0003358, 9.95090252, 0.0003358], [-0.0003358, 0.0003358, 9.95090252]], "a": 9.9509025313318, "b": 9.9509025313318, "c": 9.9509025313318, "alpha": 89.99613296435679, "beta": 90.00386703564321, "gamma": 90.00386703564321, "volume": 985.3429511575596}, "sites": [{"species": [{"element": "Ca", "occu": 1}], "abc": [0.75135993, 0.75127745, 0.75127745], "xyz": [7.476204862928603, 7.47588864272739, 7.47588864272739], "label": "Ca"}, {"species": [{"element": "Ca", "occu": 1}], "abc": [0.24872255, 0.24864007, 0.75127745], "xyz": [2.47467807727261, 2.474361857071396, 7.47588864272739], "label": "Ca"}, {"species": [{"element": "Ca", "occu": 1}], "abc": [0.24872255, 0.75127745, 0.24864007], "xyz": [2.47467807727261, 7.47588864272739, 2.474361857071396], "label": "Ca"}, {"species": [{"element": "Ca", "occu": 1

This looks good. We'll import the data using the loadfn function from monty.serialization, which reads the file into a list of Python dictionaries and decodes serialized pymatgen objects such as Structure. This is the same function used in Notebook 01.

In [3]:
from monty.serialization import loadfn

In [4]:
data_1 = loadfn('pba_w_APR.json')

In [5]:
data_1[0:2]

[{'R': 'Co',
  'P': 'Fe',
  'A': 'Ca',
  'n': 4,
  'input': {'structure': Structure Summary
   Lattice
       abc : 9.9509025313318 9.9509025313318 9.9509025313318
    angles : 89.99613296435679 90.00386703564321 90.00386703564321
    volume : 985.3429511575596
         A : 9.95090252 -0.0003358 -0.0003358
         B : -0.0003358 9.95090252 0.0003358
         C : -0.0003358 0.0003358 9.95090252
   PeriodicSite: Ca (7.4762, 7.4759, 7.4759) [0.7514, 0.7513, 0.7513]
   PeriodicSite: Ca (2.4747, 2.4744, 7.4759) [0.2487, 0.2486, 0.7513]
   PeriodicSite: Ca (2.4747, 7.4759, 2.4744) [0.2487, 0.7513, 0.2486]
   PeriodicSite: Ca (2.4705, 7.4801, 7.4801) [0.2483, 0.7517, 0.7517]
   PeriodicSite: Fe (0.0067, 9.9439, 9.9439) [0.0007, 0.9993, 0.9993]
   PeriodicSite: Fe (4.9721, 4.9785, 9.9466) [0.4997, 0.5003, 0.9996]
   PeriodicSite: Fe (0.0039, 4.9785, 4.9785) [0.0004, 0.5003, 0.5003]
   PeriodicSite: Fe (4.9721, 9.9466, 4.9785) [0.4997, 0.9996, 0.5003]
   PeriodicSite: Co (4.9696, 9.9460, 9.946

Let's loop through data_1 and parse the composition and atom identities into a pandas dataframe, which we'll then use to compare with the previously created dataframe.

In [72]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [34]:
# One row per database entry; A_atom and n_A stay NaN when an entry has no 'A' or 'n' key.
pba_APR_df = pd.DataFrame(index=range(len(data_1)), columns=['A_atom', 'P_atom', 'R_atom', 'n_A'])
for i in range(len(data_1)):
    try:
        pba_APR_df.loc[i, 'P_atom'] = data_1[i]['P']
        pba_APR_df.loc[i, 'R_atom'] = data_1[i]['R']
        if 'A' in data_1[i]:
            pba_APR_df.loc[i, 'A_atom'] = data_1[i]['A']
        if 'n' in data_1[i]:
            pba_APR_df.loc[i, 'n_A'] = data_1[i]['n']
    except Exception:
        # Print the index of any entry that could not be parsed.
        print(i)


3755
3756
3757
3758
3759
3760
3761
3762
3763
3764
3765
3766
3767
3768
3769
3770
3771
3772
3773
3774
3775
3776
3777
3778
3779
3780
3781
3782
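
The indices printed above are the entries for which the loop raised an exception; the notebook does not record why. One plausible cause, and it is only an assumption here, is that those entries lack a 'P' or 'R' key. A minimal sketch of how that could be checked, using hypothetical toy records in place of data_1:

```python
# Hypothetical toy records standing in for data_1; the real entries also
# carry structure data and may fail for a different reason.
records = [
    {"R": "Co", "P": "Fe", "A": "Ca", "n": 4},
    {"R": "Os", "P": "Cr"},   # no A or n key: still parses
    {"input": {}},            # no P or R key: would raise KeyError in the loop
]

required = ("P", "R")

# Map each problematic index to the list of required keys it is missing.
missing = {
    i: [key for key in required if key not in rec]
    for i, rec in enumerate(records)
    if any(key not in rec for key in required)
}
print(missing)
```

Running the same comprehension on data_1 would show whether the failing indices are explained by missing keys or by something else.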


In [36]:
pba_APR_df.head()

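|   | A_atom | P_atom | R_atom | n_A |
|---|--------|--------|--------|-----|
| 0 | Ca | Fe | Co | 4 |
| 1 | Mg | Cr | Os | 4 |
| 2 | Ca | Fe | Mn | 4 |
| 3 | Ca | Mn | Os | 4 |
| 4 | Li | Cr | Cr | 4 |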
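
For reference, a table like this can also be built without an explicit loop, since pandas accepts a list of dictionaries directly and fills missing keys with NaN. The sketch below uses hypothetical toy records rather than data_1, and toy_df is an illustrative name, not one used elsewhere in this notebook:

```python
import pandas as pd

# Hypothetical toy records standing in for data_1.
records = [
    {"R": "Co", "P": "Fe", "A": "Ca", "n": 4},
    {"R": "Os", "P": "Cr", "A": "Mg", "n": 2},
    {"R": "Mn", "P": "Fe"},  # no A or n key: becomes NaN
]

# Source key -> column name used in this notebook.
columns = {"A": "A_atom", "P": "P_atom", "R": "R_atom", "n": "n_A"}

toy_df = (
    pd.DataFrame(records)             # one row per record, NaN for missing keys
    .reindex(columns=list(columns))   # keep only the four keys, in a fixed order
    .rename(columns=columns)
)
print(toy_df)
```

Unlike the loop, this version does not skip entries that lack 'P' or 'R'; they simply appear as rows with NaN in those columns.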


In [37]:
len(pba_APR_df)

3783

As we can see, this data set (3,783 entries) is much longer than the one originally analyzed in Notebook 01. Since we only need to check the A, P, and R atoms from Notebook 01, we won't worry about the extra entries for now.

The approach is to loop through pba_e_hull_df and, for each structure, check that pba_APR_df contains an entry with the same A, P, and R atoms.

If CrystalNN misclassified any of the atom identities in Notebook 01, the affected structure will most likely not match any entry in pba_APR_df, so the mismatch will tell us there was an error.

In [88]:
#Importing pba_e_hull_df:
pba_e_hull_df = pd.read_csv('pba_e_hull_df.csv')
pba_e_hull_df.drop('Unnamed: 0', axis = 1, inplace = True)

In [None]:
pba_e_hull_df.head()

In [91]:
# Indices of the structures in the original df that do not correspond to any
# of the structures in pba_APR_df.
list_of_errors = []
for i in range(len(pba_e_hull_df)):
    A_atom = pba_e_hull_df.iloc[i]['A_atom']
    P_atom = pba_e_hull_df.iloc[i]['P_atom']
    R_atom = pba_e_hull_df.iloc[i]['R_atom']
    # Narrow pba_APR_df step by step: same A, then same P, then same R.
    A_entries = pba_APR_df[pba_APR_df['A_atom'] == A_atom]
    AP_entries = A_entries[A_entries['P_atom'] == P_atom]
    APR_entries = AP_entries[AP_entries['R_atom'] == R_atom]
    if len(APR_entries) == 0:
        list_of_errors.append(i)
print(list_of_errors)

[]


As we can see, every structure in the Notebook 01 dataframe, pba_e_hull_df, has a corresponding entry with the same A, P, and R atoms in the new dataframe, pba_APR_df.
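
The same membership check can also be written as a single left merge with indicator=True, which labels each row by whether it found a partner. This is a sketch on hypothetical toy dataframes (hull_df and ref_df stand in for pba_e_hull_df and pba_APR_df), not the method used above:

```python
import pandas as pd

# Hypothetical stand-ins for pba_e_hull_df and pba_APR_df.
hull_df = pd.DataFrame({
    "A_atom": ["Ca", "Mg", "Li"],
    "P_atom": ["Fe", "Cr", "Cr"],
    "R_atom": ["Co", "Os", "Fe"],
})
ref_df = pd.DataFrame({
    "A_atom": ["Ca", "Mg", "Li", "Ca"],
    "P_atom": ["Fe", "Cr", "Cr", "Fe"],
    "R_atom": ["Co", "Os", "Cr", "Co"],
})

keys = ["A_atom", "P_atom", "R_atom"]

# Deduplicate the reference keys so the left merge keeps one row per
# hull_df row, then flag rows that found no partner.
merged = hull_df.merge(
    ref_df[keys].drop_duplicates(), on=keys, how="left", indicator=True
)
unmatched = merged.index[merged["_merge"] == "left_only"].tolist()
print(unmatched)
```

Here the third row (Li, Cr, Fe) has no counterpart in ref_df, so its position is reported. One behavioral difference worth keeping in mind: a merge matches missing key values (NaN) to each other, whereas the == comparisons in the loop never match NaN, so the two approaches can disagree for structures without an A atom.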