# Match refs to crystal_prototypes

This notebook builds a spreadsheet of identified matches between the downloaded reference structures and the listed crystal_prototypes.

**NOTE**: The matches are not guaranteed to be completely accurate, and some may be missing or incorrect.

**Library imports**

In [1]:
# Standard Python libraries
from __future__ import (absolute_import, print_function,
                        division, unicode_literals)
import os
import glob

# http://www.numpy.org/
import numpy as np

# https://pandas.pydata.org/
import pandas as pd

# https://github.com/usnistgov/DataModelDict
from DataModelDict import DataModelDict as DM

# https://github.com/usnistgov/atomman
import atomman as am
import atomman.unitconvert as uc

# https://github.com/usnistgov/iprPy
import iprPy
import iprPy.highthroughput as htp
print('iprPy version', iprPy.__version__)

iprPy version 0.8.a


## 1. Access information

**Load database**

In [2]:
database = htp.get_database('master')

**Get calculation_crystal_space_group results**

In [3]:
crystal_space_group_df = database.get_records_df(style='calculation_crystal_space_group', full=True, flat=True)

## 2. Match prototypes to references

**Parse out results for prototypes and reference structures**

In [4]:
prototype_df = crystal_space_group_df[crystal_space_group_df.family+'.xml'==crystal_space_group_df.load_file]
reference_df = crystal_space_group_df[crystal_space_group_df.family+'.poscar'==crystal_space_group_df.load_file]
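
The split above relies on each record's `load_file` being its `family` name plus a file extension: `.xml` for prototypes and `.poscar` for references. A minimal sketch of the same boolean-mask selection on a toy table follows. The record names here are hypothetical placeholders; the real table comes from `database.get_records_df()`.

```python
import pandas as pd

# Hypothetical toy records standing in for crystal_space_group_df
toy_df = pd.DataFrame({
    'family': ['A1--Cu--fcc', 'mp-134', 'oqmd-12345'],
    'load_file': ['A1--Cu--fcc.xml', 'mp-134.poscar', 'oqmd-12345.poscar'],
})

# String concatenation and comparison are both element-wise,
# so each expression yields one True/False value per row
is_prototype = toy_df.family + '.xml' == toy_df.load_file
is_reference = toy_df.family + '.poscar' == toy_df.load_file

# Indexing with a boolean mask keeps only the rows marked True
toy_prototypes = toy_df[is_prototype]
toy_references = toy_df[is_reference]

print(toy_prototypes.family.tolist())
print(toy_references.family.tolist())
```

Any record whose `load_file` matches neither pattern is silently left out of both tables.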

**Match by space group and Pearson symbol (lattice type and atoms per cell)**

In [5]:
# Collect one dict per reference structure, then build the DataFrame at the end
match_records = []
for reference in reference_df.itertuples():
    match_dict = {}
    match_dict['reference'] = reference.family

    # Reference names have the form <site>-<number>
    match_dict['site'], match_dict['number'] = reference.family.split('-')
    match_dict['number'] = int(match_dict['number'])

    # Candidate prototypes share both Pearson symbol and space group number
    matches = prototype_df[(
                            (reference.pearson_symbol == prototype_df.pearson_symbol)
                           &(reference.spacegroup_number == prototype_df.spacegroup_number)
                          # &(reference.wykoff_letters == prototype_df.wykoff_letters)
                          )]
    if len(matches) == 1:
        # Unique candidate: record it along with the reference's Wyckoff letters
        match_dict['prototype'] = matches.iloc[0].family
        match_dict['ref_wykoff'] = reference.wykoff_letters
    elif len(matches) == 0:
        # No candidate prototype
        match_dict['prototype'] = np.nan
    else:
        # More than one candidate: flag as ambiguous
        match_dict['prototype'] = 'multiple'
    match_records.append(match_dict)
match_df = pd.DataFrame(match_records)
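
The row-by-row loop above could also be written as a join. The sketch below is one possible alternative, not the approach this notebook uses: it left-merges toy references onto toy prototypes using the same two keys, then collapses the candidates for each reference into a name, `NaN`, or `'multiple'`. All data and the `label` helper are hypothetical, and the sketch does not carry along the `ref_wykoff` column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for prototype_df and reference_df
toy_prototypes = pd.DataFrame({
    'family': ['A1--Cu--fcc', 'A2--W--bcc'],
    'pearson_symbol': ['cF4', 'cI2'],
    'spacegroup_number': [225, 229],
})
toy_references = pd.DataFrame({
    'family': ['mp-1', 'mp-2', 'mp-3'],
    'pearson_symbol': ['cF4', 'cI2', 'hP2'],
    'spacegroup_number': [225, 229, 194],
})

# Left merge keeps every reference; unmatched ones get NaN in family_proto
keys = ['pearson_symbol', 'spacegroup_number']
merged = toy_references.merge(toy_prototypes, on=keys, how='left',
                              suffixes=('_ref', '_proto'))

def label(candidates):
    """Collapse the candidate prototypes for one reference into one label."""
    names = candidates.dropna().unique()
    if len(names) == 1:
        return names[0]
    return np.nan if len(names) == 0 else 'multiple'

toy_match = (merged.groupby('family_ref', sort=False)['family_proto']
                   .apply(label)
                   .rename('prototype')
                   .reset_index()
                   .rename(columns={'family_ref': 'reference'}))
print(toy_match)
```

For a table of this size the loop is perfectly adequate; the merge form mainly makes the matching keys explicit in one place.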

**Check that Wyckoff positions are symmetrically equivalent**

In [6]:
match_df.loc[(match_df.prototype=='A1--Cu--fcc') & (~match_df.ref_wykoff.isin(['a', 'b'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A2--W--bcc') & (~match_df.ref_wykoff.isin(['a'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A3--Mg--hcp') & (~match_df.ref_wykoff.isin(['b', 'c', 'd'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=="A3'--alpha-La--double-hcp") & (~match_df.ref_wykoff.isin(['a b', 'a c', 'a d'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A4--C--dc') & (~match_df.ref_wykoff.isin(['a', 'b'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A5--beta-Sn') & (~match_df.ref_wykoff.isin(['a', 'b'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A6--In--bct') & (~match_df.ref_wykoff.isin(['a', 'b'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A7--alpha-As') & (~match_df.ref_wykoff.isin(['c'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='Ah--alpha-Po--sc') & (~match_df.ref_wykoff.isin(['a', 'b'])),
             'prototype'] = np.nan
match_df.loc[(match_df.prototype=='A15--beta-W') & (~match_df.ref_wykoff.isin(['a c', 'a d'])),
             'prototype'] = np.nan
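
The ten statements above all follow the same pattern, so they could be driven from a lookup table instead. The sketch below shows the idea on hypothetical toy data, using the allowed Wyckoff letters for two of the prototypes as listed above. The `allowed_wykoff` dictionary name is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

# Allowed Wyckoff letter strings per prototype, copied from two of the checks above
allowed_wykoff = {
    'A1--Cu--fcc': ['a', 'b'],
    'A3--Mg--hcp': ['b', 'c', 'd'],
}

# Hypothetical toy matches; 'mp-3' occupies a site not accepted for hcp
toy_match = pd.DataFrame({
    'reference': ['mp-1', 'mp-2', 'mp-3'],
    'prototype': ['A1--Cu--fcc', 'A3--Mg--hcp', 'A3--Mg--hcp'],
    'ref_wykoff': ['a', 'c', 'a'],
})

for prototype, letters in allowed_wykoff.items():
    # Rows matched to this prototype whose Wyckoff letters are not accepted
    rejected = (toy_match.prototype == prototype) & (~toy_match.ref_wykoff.isin(letters))
    toy_match.loc[rejected, 'prototype'] = np.nan

print(toy_match)
```

Keeping the accepted letters in one dictionary makes it easier to review them against a space group table and to add further prototypes later.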

## 3. Simplify and save

In [7]:
match_df = match_df.sort_values(['site', 'number']).reset_index()[['reference', 'prototype']]

In [8]:
match_df.to_csv('reference_prototype_match.csv', index=False)
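
As a quick sanity check on the saved file, the three outcomes (unique match, no match, multiple candidates) can be counted after reading it back. The sketch below uses hypothetical toy data and an in-memory buffer rather than the real CSV file. It also shows that `NaN` entries are written as empty fields and come back as `NaN`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy table with one row for each possible outcome
toy_match = pd.DataFrame({
    'reference': ['mp-1', 'mp-2', 'mp-3'],
    'prototype': ['A1--Cu--fcc', np.nan, 'multiple'],
})

# Round-trip through CSV text in memory instead of a file on disk
buffer = io.StringIO()
toy_match.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Count each outcome
n_unmatched = int(reloaded.prototype.isna().sum())
n_multiple = int((reloaded.prototype == 'multiple').sum())
n_unique = len(reloaded) - n_unmatched - n_multiple

print('unique:', n_unique, 'unmatched:', n_unmatched, 'multiple:', n_multiple)
```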