# Exploration of the ICSD database

Load the ICSD dataset as a `pandas.DataFrame`. The pickle file is written by running `icsd/download.py` and then `icsd/augment.py` (see `README.md` for directions).

In [1]:
import os
import pandas as pd
import matplotlib.pyplot as plt 
ICSD_AUG_PKL = os.path.join("icsd","all_icsd_cifs_augmented.pkl")
icsdf = pd.read_pickle(ICSD_AUG_PKL)
icsdf

AttributeError: 'DataFrame' object has no attribute '_data'
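An `AttributeError` about a missing `_data` attribute is commonly reported when a pickle is read by a different pandas version than the one that wrote it. The sketch below is one possible way to make that failure easier to diagnose. The helper name `load_icsd_pickle` is hypothetical and not part of the repository, and the version-mismatch explanation is an assumption about this particular file.

```python
import os
import pandas as pd

def load_icsd_pickle(path):
    """Read a pickled DataFrame, adding context if the file is missing or unreadable.

    Hypothetical helper: it only wraps pd.read_pickle with clearer error messages.
    """
    if not os.path.exists(path):
        raise FileNotFoundError(
            f"{path} not found; run the download and augment scripts first"
        )
    try:
        return pd.read_pickle(path)
    except AttributeError as err:
        # Assumption: this usually points to a pandas version mismatch
        # between the environment that wrote the file and this one.
        raise RuntimeError(
            f"Could not unpickle {path} with pandas {pd.__version__}; "
            "the file may have been written by a different pandas version"
        ) from err
```

In the notebook this would replace the direct call, for example `icsdf = load_icsd_pickle(ICSD_AUG_PKL)`. If the mismatch is confirmed, regenerating the pickle in the current environment is likely the simplest fix.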

## Available columns

ICSD only provides the `cif` file of each material; further processing was required to extract the other columns. See the `icsd/augment.py` script for how they are obtained.
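To illustrate the kind of processing involved, here is a minimal sketch that pulls a single-line tag value out of CIF text with a regular expression. It is not the code in `icsd/augment.py`. The function `read_cif_tag` and the sample CIF text are invented for illustration, and the sketch ignores multi-line values and `loop_` blocks.

```python
import re

def read_cif_tag(cif_text, tag):
    """Return the value that follows `tag` on the same line, or None if absent.

    Hypothetical helper: handles only simple `tag value` lines.
    """
    pattern = re.compile(
        r"^" + re.escape(tag) + r"[ \t]+(.+?)\s*$", re.MULTILINE
    )
    match = pattern.search(cif_text)
    if match is None:
        return None
    # Drop the surrounding quotes that CIF uses for values containing spaces.
    return match.group(1).strip("'\"")

# Invented sample text, not taken from the ICSD.
example_cif = """data_example
_chemical_formula_sum 'Na1 Cl1'
_cell_length_a 5.64
"""

print(read_cif_tag(example_cif, "_chemical_formula_sum"))  # Na1 Cl1
print(read_cif_tag(example_cif, "_cell_length_a"))         # 5.64
```

Values are returned as strings; numeric tags in real CIF files often carry an uncertainty in parentheses, which would need to be stripped before converting to `float`.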

In [4]:
for column in icsdf.columns:
    print(column)

id
cif
_database_code_ICSD
_chemical_formula_structural
_chemical_formula_sum
_cell_length_a
_cell_length_b
_cell_length_c
_cell_angle_alpha
_cell_angle_beta
_cell_angle_gamma
_cell_volume


## Stoichiometric compounds

Filtering out formulae that contain a decimal point or a parenthesis gives a simple way to find stoichiometric compounds.

In [5]:
def has_integer_formula(s):
    """Return True if the formula contains neither a decimal point nor a parenthesis."""
    return not (("." in s) or ("(" in s))

int_sum_icsdf = icsdf.loc[icsdf['_chemical_formula_sum'].apply(has_integer_formula)]
int_struct_icsdf = icsdf.loc[icsdf['_chemical_formula_structural'].apply(has_integer_formula)]

print(f"{len(int_sum_icsdf)}/{len(icsdf)} materials from ICSD have no '.' or '(' in their sum formula")
print(f"{len(int_struct_icsdf)}/{len(icsdf)} materials from ICSD have no '.' or '(' in their structural formula")

149798/218839 materials from ICSD have no '.' or '(' in their sum formula
94915/218839 materials from ICSD have no '.' or '(' in their structural formula
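A character test is quick, but it would also reject a formula such as `Fe1.0`. As a stricter alternative, the sketch below parses a sum formula into element counts and checks that every count is a whole number. It assumes the sum formula is a space-separated list of tokens of the form element symbol plus amount, as in `Pd1` or `Au1` in this notebook's output. The names `parse_sum_formula` and `is_integer_composition` are hypothetical.

```python
import re

# One token: an element symbol followed by an integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)")

def parse_sum_formula(formula):
    """Parse e.g. 'Co17 Pr2' into {'Co': 17.0, 'Pr': 2.0}.

    Assumes space-separated 'El<amount>' tokens; raises ValueError otherwise.
    """
    counts = {}
    for token in formula.split():
        match = TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised token {token!r} in {formula!r}")
        element, amount = match.groups()
        counts[element] = counts.get(element, 0.0) + float(amount)
    return counts

def is_integer_composition(formula):
    """Return True if every element amount in the formula is a whole number."""
    return all(v.is_integer() for v in parse_sum_formula(formula).values())

print(is_integer_composition("Co17 Pr2"))     # True
print(is_integer_composition("Fe0.5 Ni0.5"))  # False
```

Formulae that do not follow the assumed token format raise a `ValueError`, so applying this to the whole `_chemical_formula_sum` column would need a `try`/`except` around the call.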


## Look for a substring in the cif file

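Here we attempt to find magnetic compounds by searching for the string `magneti` in the `cif` text of every entry. The matches include words such as `magnetischen` in the titles of papers and journals, so a hit does not necessarily mean that the compound itself is magnetic.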

In [30]:
substr = "magneti"
icsdf_with_substr = icsdf.loc[icsdf['cif'].apply(lambda cif: substr in cif)]
nb_substr = len(icsdf_with_substr)
print(f"{nb_substr} entries contain the string '{substr}'")
print("found for example:")
for i in range(min(6, nb_substr)):
    formula = icsdf_with_substr['_chemical_formula_sum'].iloc[i]
    for line in icsdf_with_substr['cif'].iloc[i].split('\n'):
        if substr in line:
            print(f"{formula}:\t {line}")

20301 entries contain the string 'magneti'
found for example:
Pd1:	 Das Zustandsdiagramm Lithium-Palladium und die magnetischen Eigenschaften der
Au1:	 Energetics and the magnetic state of Mn2 adsorbed on Au(111): Dimer bond
Co1:	 energetic calculations to investigate the hard magnetic phase
Pr1:	 energetic calculations to investigate the hard magnetic phase
W1:	 'Enhancement of the spin transfer torque efficiency in magnetic STM junctions'
Mo1:	 Ab initio study of energetics and magnetism of sigma phase in Co-Mo and Fe-Mo
