# Reading CIF files

## Introduction
In this example we will see how to load a CIF file and access the information stored inside it. For brevity we will use the $B_2Mg$ CIF file available at http://crystallography-online.com/structure/1000026. Let us assume that the file is saved as `B2Mg.cif` in the current directory (otherwise, change the variable below to point to the right path).

In [1]:
B2MG_CIF_FILE = 'B2Mg.cif'

Here are the raw contents of the file for easy reference:
```text
#------------------------------------------------------------------------------
#$Date: 2016-02-14 15:26:36 +0100 (Sun, 14 Feb 2016) $
#$Revision: 176435 $
#$URL: svn://www.crystallography.net/cod/cif/1/00/00/1000026.cif $
#------------------------------------------------------------------------------
#
# This file is available in the Crystallography Open Database (COD),
# http://www.crystallography.net/
#
# All data on this site have been placed in the public domain by the
# contributors.
#
data_1000026
_journal_coden_ASTM              JAPUAW
_journal_name_full               'J. Appl. Chem. USSR, engl. trans.'
_journal_page_first              970
_journal_page_last               974
_journal_volume                  44
_journal_year                    1971
_chemical_formula_sum            'B2 Mg'
_space_group_IT_number           191
_symmetry_cell_setting           hexagonal
_symmetry_Int_Tables_number      191
_symmetry_space_group_name_Hall  '-P 6 2'
_symmetry_space_group_name_H-M   'P 6/m m m'
_audit_creation_date             2002-02-11
_cell_angle_alpha                90
_cell_angle_beta                 90
_cell_angle_gamma                120.
_cell_formula_units_Z            1
_cell_length_a                   3.085
_cell_length_b                   3.085
_cell_length_c                   3.523
_cell_volume                     29.04
_cod_original_formula_sum        'Mg B2'
_cod_database_code               1000026
loop_
_symmetry_equiv_pos_as_xyz
x,y,z
-y,x-y,z
-x+y,-x,z
-x,-y,z
y,-x+y,z
x-y,x,z
y,x,-z
x-y,-y,-z
-x,-x+y,-z
-y,-x,-z
-x+y,y,-z
x,x-y,-z
-x,-y,-z
y,-x+y,-z
x-y,x,-z
x,y,-z
-y,x-y,-z
-x+y,-x,-z
-y,-x,z
-x+y,y,z
x,x-y,z
y,x,z
x-y,-y,z
-x,-x+y,z
loop_
_atom_site_label
_atom_site_symmetry_multiplicity
_atom_site_Wyckoff_symbol
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
_atom_site_type_symbol
Mg 1 a 0 0 0 1 Mg
B 2 d 0.3333 0.6667 0.5000 1 B
```

## Importing necessary stuff
The only thing that we need to import in order to read our CIF file is the `kristal.io.read_cif` function.

In [2]:
from kristal.io import read_cif

## Reading file and accessing data blocks
The basic invocation of `read_cif` requires only a single parameter: the path to the CIF file.

In [3]:
b2mg_data = read_cif(B2MG_CIF_FILE)

The returned value is a dictionary keyed by data block names. Let's verify that Kristal read the only data block present in the file.

In [4]:
print(b2mg_data.keys())

dict_keys(['1000026'])


## Accessing data in the data blocks
The values of the `b2mg_data` dict are instances of the `DataBlock` namedtuple. They have the following attributes:

- `name`: name of the data block (this is the same as the key corresponding to the data block)
- `loops`: a list of loops found in the file. The contents of each loop are stored in a `pandas.DataFrame`
- `entries`: other entries from the CIF file, stored as a `pandas.Series`. Most of the time you can use this attribute like a dictionary.

Let's access some data.

In [5]:
datablock = b2mg_data['1000026']
datablock.name

'1000026'

### Accessing entries
Let us first see how one can access data entries. As already mentioned, they can be accessed in a dictionary-like fashion. For example, the code below displays all cell angles.

In [6]:
for key in ['cell_angle_alpha', 'cell_angle_beta', 'cell_angle_gamma']:
    print(datablock.entries[key])

90
90
120.0


One can easily list the names of all available entries using the `datablock.entries.index` attribute.

In [7]:
datablock.entries.index

Index(['journal_coden_ASTM', 'journal_name_full', 'journal_page_first',
       'journal_page_last', 'journal_volume', 'journal_year',
       'chemical_formula_sum', 'space_group_IT_number',
       'symmetry_cell_setting', 'symmetry_Int_Tables_number',
       'symmetry_space_group_name_Hall', 'symmetry_space_group_name_H-M',
       'audit_creation_date', 'cell_angle_alpha', 'cell_angle_beta',
       'cell_angle_gamma', 'cell_formula_units_Z', 'cell_length_a',
       'cell_length_b', 'cell_length_c', 'cell_volume',
       'cod_original_formula_sum', 'cod_database_code'],
      dtype='object')

Note that Kristal does its best to detect the type of each item stored in the CIF file and convert it accordingly. Let's print the cell angles again, this time showing the corresponding type.

In [8]:
for key in ['cell_angle_alpha', 'cell_angle_beta', 'cell_angle_gamma']:
    angle = datablock.entries[key]
    print('{0} {1} {2}'.format(key, angle, type(angle)))

cell_angle_alpha 90 <class 'int'>
cell_angle_beta 90 <class 'int'>
cell_angle_gamma 120.0 <class 'float'>


As we can see, Kristal tries to match as specific a type as possible. In the above example the $\alpha$ and $\beta$ angles were read as integers since they consist only of digits. The $\gamma$ angle, on the other hand, was read as a floating point number since it contains a decimal point.
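The conversion rule described above can be approximated in a few lines of plain Python. This is only a sketch of the idea, not Kristal's actual implementation, and the helper name `guess_cif_value` is hypothetical.

```python
def guess_cif_value(text):
    """Convert a raw CIF string to int, float, or str, in that order of preference.

    Hypothetical helper for illustration only; not part of Kristal's API.
    """
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            # This conversion does not apply, try the next, less specific one.
            continue
    # Neither numeric conversion succeeded, so keep the text unchanged.
    return text


for raw in ['90', '120.', 'hexagonal']:
    value = guess_cif_value(raw)
    print(raw, repr(value), type(value).__name__)
```

In this sketch, a value that carries a standard uncertainty, such as `3.085(2)`, would stay a string, because neither `int` nor `float` accepts the parenthesized suffix.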

Remember that the entries are stored inside a `pandas.Series` instance. This allows us to use all of the nice features pandas has to offer, like subsetting. For example, we can easily access all entries containing information relevant to unit cell geometry in the following way:

In [9]:
datablock.entries[datablock.entries.index.str.startswith('cell')]

cell_angle_alpha           90
cell_angle_beta            90
cell_angle_gamma          120
cell_formula_units_Z        1
cell_length_a           3.085
cell_length_b           3.085
cell_length_c           3.523
cell_volume             29.04
dtype: object
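
As a quick sanity check, the cell entries are sufficient to recompute the unit cell volume with the general formula $V = abc\sqrt{1 - \cos^2\alpha - \cos^2\beta - \cos^2\gamma + 2\cos\alpha\cos\beta\cos\gamma}$. The sketch below uses a plain dict holding the values from the file as a stand-in for `datablock.entries`, so it runs without Kristal; the function name `cell_volume` is an assumption made for this example.

```python
import math

# Stand-in for datablock.entries, filled with the values from B2Mg.cif.
cell = {
    'cell_length_a': 3.085,
    'cell_length_b': 3.085,
    'cell_length_c': 3.523,
    'cell_angle_alpha': 90,
    'cell_angle_beta': 90,
    'cell_angle_gamma': 120.0,
}


def cell_volume(entries):
    """Return the unit cell volume from cell lengths and angles given in degrees."""
    a, b, c = (entries['cell_length_' + axis] for axis in 'abc')
    cos_a, cos_b, cos_g = (
        math.cos(math.radians(entries['cell_angle_' + name]))
        for name in ('alpha', 'beta', 'gamma')
    )
    return a * b * c * math.sqrt(
        1 - cos_a**2 - cos_b**2 - cos_g**2 + 2 * cos_a * cos_b * cos_g
    )


volume = cell_volume(cell)
print(round(volume, 2))
```

Rounded to two decimal places, the result agrees with the `_cell_volume` value of 29.04 stored in the file.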

### Accessing loops
Accessing loops is as simple as accessing data entries. Let's see the contents of the second loop.

In [10]:
datablock.loops[1]

| | atom_site_label | atom_site_symmetry_multiplicity | atom_site_Wyckoff_symbol | atom_site_fract_x | atom_site_fract_y | atom_site_fract_z | atom_site_occupancy | atom_site_type_symbol |
|---|---|---|---|---|---|---|---|---|
| 0 | Mg | 1 | a | 0.0 | 0.0 | 0.0 | 1 | Mg |
| 1 | B | 2 | d | 0.3333 | 0.6667 | 0.5 | 1 | B |


As with `entries`, we can use all of the pandas magic. For instance, selecting only the columns of interest is easy. Let's display the atom labels along with their positions in fractional coordinates.

In [11]:
datablock.loops[1][['atom_site_label', 'atom_site_fract_x', 'atom_site_fract_y', 'atom_site_fract_z']]

| | atom_site_label | atom_site_fract_x | atom_site_fract_y | atom_site_fract_z |
|---|---|---|---|---|
| 0 | Mg | 0.0 | 0.0 | 0.0 |
| 1 | B | 0.3333 | 0.6667 | 0.5 |


Or maybe you are interested in the data corresponding to a particular element?

In [12]:
datablock.loops[1].query('atom_site_label=="B"')

| | atom_site_label | atom_site_symmetry_multiplicity | atom_site_Wyckoff_symbol | atom_site_fract_x | atom_site_fract_y | atom_site_fract_z | atom_site_occupancy | atom_site_type_symbol |
|---|---|---|---|---|---|---|---|---|
| 1 | B | 2 | d | 0.3333 | 0.6667 | 0.5 | 1 | B |
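
The same kind of selection can also be written with a boolean mask and `.loc`, which filters rows and picks columns in one step. The snippet below builds a small `DataFrame` by hand from the atom site loop as a stand-in for `datablock.loops[1]`, so that it runs without Kristal; the variable names are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for datablock.loops[1], using values from B2Mg.cif.
atom_sites = pd.DataFrame({
    'atom_site_label': ['Mg', 'B'],
    'atom_site_fract_x': [0.0, 0.3333],
    'atom_site_fract_y': [0.0, 0.6667],
    'atom_site_fract_z': [0.0, 0.5],
})

# Boolean mask marking the rows that describe boron sites.
is_boron = atom_sites['atom_site_label'] == 'B'
fract_columns = ['atom_site_fract_x', 'atom_site_fract_y', 'atom_site_fract_z']

# Select the boron rows and the fractional coordinate columns together.
boron_positions = atom_sites.loc[is_boron, fract_columns]
print(boron_positions)
```

Unlike `query`, this form does not require the condition to be written as a string, which can be convenient when the element symbol comes from a variable.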


## Acknowledgements
For parsing CIF files Kristal uses [Lark](https://github.com/lark-parser/lark), an excellent library for parsing arbitrary context-free grammars.