# Exploration of the ICSD database

Load the ICSD dataset as a `pandas.DataFrame`. The pickle file is produced by running `icsd/download.py` and then `icsd/augment.py` (see `README.md` for directions).

In [2]:
import os
import pandas as pd
import matplotlib.pyplot as plt 
ICSD_AUG_PKL = os.path.join("icsd","all_icsd_cifs_augmented.pkl")
icsdf = pd.read_pickle(ICSD_AUG_PKL)
icsdf.head()

| | `id` | `cif` | `_database_code_ICSD` | `_chemical_formula_structural` | `_chemical_formula_sum` | `_cell_length_a` | `_cell_length_b` | `_cell_length_c` | `_cell_angle_alpha` | `_cell_angle_beta` | `_cell_angle_gamma` | `_cell_volume` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 73729 | `data_9745-ICSD\n_database_code_ICSD 9745\n_aud...` | 9745 | Ce | Ce1 | 3.0940 | 6.0070 | 5.2460 | 90.000 | 90.000 | 90.000 | 97.50 |
| 1 | 966671 | `data_677067-ICSD\n_database_code_ICSD 677067\n...` | 677067 | B28 | B28 | 5.0400 | 5.6100 | 6.9200 | 90.000 | 90.000 | 90.000 | 195.66 |
| 2 | 73731 | `data_9785-ICSD\n_database_code_ICSD 9785\n_aud...` | 9785 | Kr | Kr1 | 4.0000 | 4.0000 | 6.5300 | 90.000 | 90.000 | 120.000 | 90.48 |
| 3 | 73732 | `data_9786-ICSD\n_database_code_ICSD 9786\n_aud...` | 9786 | Xe | Xe1 | 4.3400 | 4.3400 | 7.0900 | 90.000 | 90.000 | 120.000 | 115.65 |
| 4 | 45072 | `data_86375-ICSD\n_database_code_ICSD 86375\n_a...` | 86375 | Se6 | Se6 | 11.4000 | 11.4000 | 4.4700 | 90.000 | 90.000 | 120.000 | 503.09 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 218834 | 13648 | `data_27815-ICSD\n_database_code_ICSD 27815\n_a...` | 27815 | `(Na2.35Ca11.01Sr0.14Mn0.032Mg0.11Ce1.36La0.27N...` | `H0.41Ca11.01Ce1.36F6.34Fe0.75La0.27Mg0.11Mn0.0...` | 10.4300 | 10.4300 | 10.4300 | 90.000 | 90.000 | 90.000 | 1134.63 |
| 218835 | 407812 | `data_155061-ICSD\n_database_code_ICSD 155061\n...` | 155061 | `((H3O)11.61Na3.0K0.2Ba0.033Sr0.63Ce0.22Y0.05)(...` | `H42.44Al0.07Ba0.033Ca4.65Ce0.22Cl1.1Fe0.66Hf0....` | 14.1557 | 14.1557 | 30.4880 | 90.000 | 90.000 | 120.000 | 5290.81 |
| 218836 | 412660 | `data_159045-ICSD\n_database_code_ICSD 159045\n...` | 159045 | `(Ba1.47K0.53Ca0.31Ce0.17Nd0.1Na0.06La0.02)(Mg0...` | `H24Al0.03Ba2.19Ca0.31Ce0.17Fe0.23K0.53La0.02Mg...` | 13.0170 | 13.0170 | 13.0170 | 90.000 | 90.000 | 90.000 | 2205.63 |
| 218837 | 415347 | `data_161277-ICSD\n_database_code_ICSD 161277\n...` | 161277 | `(Li1.14K0.75Cs0.09Na0.02)(Na0.78Ca0.22)(Fe5.64...` | `H4Al0.15Ca0.29Cs0.09F1Fe5.64K0.75Li1.14Mg0.04M...` | 5.3745 | 11.6509 | 11.9242 | 64.425 | 77.038 | 85.476 | 656.21 |


## Available columns

ICSD only provides the `cif` file of each material; the other columns were extracted from it in a further processing step. See the `icsd/augment.py` script for how this information is obtained.

In [4]:
for column in list(icsdf.columns): 
    print(column)

id
cif
_database_code_ICSD
_chemical_formula_structural
_chemical_formula_sum
_cell_length_a
_cell_length_b
_cell_length_c
_cell_angle_alpha
_cell_angle_beta
_cell_angle_gamma
_cell_volume
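To illustrate the kind of extraction mentioned above, here is a minimal sketch that pulls simple one-line `tag value` entries out of a CIF string. It is not the code of `icsd/augment.py`, which may work differently; the helper name `extract_cif_tags` and the `sample_cif` fragment are hypothetical, with the fragment assembled from values visible in the preview of the first row.

```python
def extract_cif_tags(cif_text, tags):
    """Return {tag: value} for one-line `tag value` entries found in a CIF string."""
    values = {}
    for line in cif_text.split("\n"):
        # Split each line into the tag and the rest of the line
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] in tags:
            # Drop surrounding quotes, which CIF uses for values containing spaces
            values[parts[0]] = parts[1].strip().strip("'\"")
    return values

# Hypothetical fragment, not a complete ICSD entry
sample_cif = (
    "data_9745-ICSD\n"
    "_database_code_ICSD 9745\n"
    "_cell_length_a 3.0940\n"
    "_cell_length_b 6.0070\n"
)

wanted = {"_database_code_ICSD", "_cell_length_a"}
print(extract_cif_tags(sample_cif, wanted))
# {'_database_code_ICSD': '9745', '_cell_length_a': '3.0940'}
```

Multi-line values (text blocks delimited by `;`) and `loop_` tables are deliberately ignored by this sketch; a dedicated CIF parser would be needed for those.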


## Stoichiometric compounds

A decimal point in a formula indicates fractional amounts, so formulae without one identify the stoichiometric compounds. Formulae containing parentheses are excluded as well.

In [5]:
def has_integer_formula(s):
    # True when the formula contains neither a decimal point nor a parenthesised group
    return not (("." in s) or ("(" in s))

int_sum_icsdf = icsdf.loc[icsdf['_chemical_formula_sum'].apply(has_integer_formula)]
int_struct_icsdf = icsdf.loc[icsdf['_chemical_formula_structural'].apply(has_integer_formula)]

print(f"{len(int_sum_icsdf)}/{len(icsdf)} materials from ICSD have no '.' or '(' in their sum formula")
print(f"{len(int_struct_icsdf)}/{len(icsdf)} materials from ICSD have no '.' or '(' in their structural formula")

149798/218839 materials from ICSD have no '.' or '(' in their sum formula
94915/218839 materials from ICSD have no '.' or '(' in their structural formula
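The substring test above is quick but coarse. As a possible alternative, the sketch below parses a sum formula into (element, amount) pairs and checks that every amount is a whole number. It assumes sum formulae are sequences of `ElementAmount` tokens as in the table preview; the names `parse_sum_formula` and `has_integer_amounts` are hypothetical, and the third sample is a shortened prefix of the last formula in the preview.

```python
import re

# One element symbol (capital letter, optional lowercase) followed by its amount
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)")

def parse_sum_formula(formula):
    """Split a sum formula such as 'H4Al0.15' into (element, amount) pairs."""
    return [(element, float(amount)) for element, amount in TOKEN.findall(formula)]

def has_integer_amounts(formula):
    """True when the formula parses and every amount is a whole number."""
    pairs = parse_sum_formula(formula)
    return bool(pairs) and all(amount.is_integer() for _, amount in pairs)

for formula in ["Ce1", "Se6", "H4Al0.15Ca0.29"]:
    print(formula, has_integer_amounts(formula))
# Ce1 True
# Se6 True
# H4Al0.15Ca0.29 False
```

Unlike the substring test, this version would accept an amount written as `1.0`, so the two approaches can give slightly different counts.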


## Look for a substring in the cif file

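Here we attempt to find magnetic compounds by searching for the string `magneti` in every `cif` entry of the dataset. The matches shown below come from the titles of papers and journals (including the German `magnetischen`), so a hit does not by itself mean that the compound is magnetic.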

In [30]:
substr = "magneti"
icsdf_with_substr = icsdf.loc[icsdf['cif'].apply(lambda cif: substr in cif)]
nb_substr = len(icsdf_with_substr)
print(f"{nb_substr} cif entries contain the string '{substr}'")
print("found for example:")
# Show the matching lines of the first few hits (at most 6)
for i in range(min(6, nb_substr)):
    formula = icsdf_with_substr['_chemical_formula_sum'].iloc[i]
    for line in icsdf_with_substr['cif'].iloc[i].split('\n'):
        if substr in line:
            print(f"{formula}:\t {line}")

20301 cif entries contain the string 'magneti'
found for example:
Pd1:	 Das Zustandsdiagramm Lithium-Palladium und die magnetischen Eigenschaften der
Au1:	 Energetics and the magnetic state of Mn2 adsorbed on Au(111): Dimer bond
Co1:	 energetic calculations to investigate the hard magnetic phase
Pr1:	 energetic calculations to investigate the hard magnetic phase
W1:	 'Enhancement of the spin transfer torque efficiency in magnetic STM junctions'
Mo1:	 Ab initio study of energetics and magnetism of sigma phase in Co-Mo and Fe-Mo
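The search above is case-sensitive, so a title starting with "Magnetic" would be missed. A minimal sketch of a case-insensitive variant follows; the helper `matching_lines` and the `sample_cif` fragment are hypothetical (the title line is taken from the output above, and the surrounding lines are only an assumption about how a CIF stores a publication title).

```python
def matching_lines(cif_text, substr):
    """Return the lines of a CIF string that contain `substr`, ignoring case."""
    needle = substr.lower()
    return [line.strip() for line in cif_text.split("\n") if needle in line.lower()]

# Hypothetical fragment, not a complete ICSD entry
sample_cif = (
    "data_example\n"
    "_publ_section_title\n"
    ";\n"
    "Ab initio study of energetics and magnetism of sigma phase in Co-Mo and Fe-Mo\n"
    ";\n"
)

print(matching_lines(sample_cif, "Magneti"))
# ['Ab initio study of energetics and magnetism of sigma phase in Co-Mo and Fe-Mo']
```

Applied to the full dataset, the same helper could be passed to `icsdf['cif'].apply(...)` to collect the matching lines per entry, at the cost of lowercasing every line.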
