# Benchmark data

We parse the Supplementary Information (SI) of the paper by Head-Gordon (HG) and build a dictionary with information on all the molecules of the HG dataset. For each molecule we record the type of computation (spin-polarized or non-spin-polarized) and the values of the static polarizability obtained with LDA (in two flavors, Slater and SPW92), PBE and PBE0. We also collect the results obtained with the reference method of the HG paper, CCSD(T).

In [3]:
from pyexcel_ods import get_data
import json

In [4]:
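# read all the sheets of the spreadsheet
data = get_data('HG_data.ods')
# round trip through JSON to obtain plain dicts and lists, then select the sheet
data = json.dumps(data)
data = json.loads(data)['Sheet1']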

The molecule SO-trip has been renamed SO, both in the .ods file and in the geometries folder. The posinp file SO.xyz has been added to the molecules database of BigDFT.

The data are represented as follows:

In [5]:
print data[0]

[u'molecule', u'spin_pol', u'lda-Slater', u'lda-SPW92', u'pbe', u'pbe0', u'CCSD(T)']
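
The header has one label per method, while the parsing loop further down reads three consecutive values for each label. A minimal sketch of that column layout, under the assumption that the method blocks start at column 2 and are three columns wide (as the loop implies). The helper `split_row` is hypothetical, and the row is reconstructed from the `dataset['O2']` output shown later in this notebook (the raw spelling of the `sp` flag is an assumption).

```python
# Header labels as printed above (unicode prefixes dropped).
header = ['molecule', 'spin_pol', 'lda-Slater', 'lda-SPW92',
          'pbe', 'pbe0', 'CCSD(T)']

def split_row(header, row, first=2, width=3):
    """Map each method label to its block of `width` consecutive values.

    Hypothetical helper: `first` and `width` encode the assumed layout.
    """
    blocks = {}
    for k, label in enumerate(header[first:]):
        start = first + width * k
        blocks[label] = row[start:start + width]
    return blocks

# Row reconstructed from the O2 entry of the dataset (illustration only).
row = ['O2', 'sp',
       1.377, 1.377, 2.439,   # lda-Slater
       1.287, 1.287, 2.294,   # lda-SPW92
       1.292, 1.292, 2.322,   # pbe
       1.224, 1.224, 2.313,   # pbe0
       1.21, 1.21, 2.263]     # CCSD(T)

blocks = split_row(header, row)
print(blocks['pbe'])
```

With this layout a row holds 2 + 5 × 3 = 17 entries, although only the first seven columns carry a label.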


We build a dictionary that encodes all the relevant information. For each molecule we also parse the corresponding file in the geometries folder and extract the spin polarization value of the ground state of the molecule.

In [6]:
dataset = {}
header = data[0]
for row in data[1:]:
    molecule = str(row[0])
    datamol = {}
    # spin polarization flag of the computation
    datamol[str(header[1])] = str(row[1]).lower()
    # reference data: each of the five methods fills three consecutive columns
    ida = 2
    for idata in range(2, 7):
        datamol[str(header[idata])] = row[ida:ida+3]
        ida += 3
    # spin polarization value, taken from the second line of the xyz file
    with open('geometries/' + molecule + '.xyz') as f:
        lines = f.readlines()
    datamol['mpol_ref'] = lines[1].rstrip('/\n').split(' ')[1]
    dataset[molecule] = datamol
    

For instance:

In [8]:
dataset['O2']

{'CCSD(T)': [1.21, 1.21, 2.263],
 'lda-SPW92': [1.287, 1.287, 2.294],
 'lda-Slater': [1.377, 1.377, 2.439],
 'mpol_ref': '3',
 'pbe': [1.292, 1.292, 2.322],
 'pbe0': [1.224, 1.224, 2.313],
 'spin_pol': 'sp'}
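
The `mpol_ref` value above is the second space-separated token on the second line of the geometry file. A minimal sketch of that step in isolation, assuming a file whose second line reads `0 3`. The file content, the coordinates and the helper `read_mpol` are hypothetical and serve only as an illustration; the actual files in the geometries folder may carry more information on that line.

```python
import os
import tempfile

def read_mpol(path):
    """Return the second space-separated token on line 2 of an xyz file."""
    with open(path) as f:
        lines = f.readlines()
    return lines[1].rstrip('\n').split(' ')[1]

# Hypothetical file content: atom count, a line whose second token is
# the spin value, then placeholder coordinates.
example = "2\n0 3\nO 0.0 0.0 0.6\nO 0.0 0.0 -0.6\n"

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'O2.xyz')
with open(path, 'w') as f:
    f.write(example)

mpol = read_mpol(path)
print(mpol)
```

The value is kept as a string, as in the dataset; convert it with `int` if a number is needed.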

In [9]:
dataset['AlF']

{'CCSD(T)': [5.971, 5.971, 5.132],
 'lda-SPW92': [6.154, 6.154, 5.59],
 'lda-Slater': [6.648, 6.648, 6.092],
 'mpol_ref': '1',
 'pbe': [6.292, 6.292, 5.656],
 'pbe0': [6.248, 6.248, 5.378],
 'spin_pol': 'nsp'}
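
Since CCSD(T) is the reference method, each entry can be used to measure how far a functional deviates from it. The sketch below computes the relative deviation of each of the three stored values for the AlF entry shown above. The helper `relative_errors` is hypothetical, and this is only one possible metric, not necessarily the one used in the HG paper.

```python
# Values copied from the dataset['AlF'] output above.
alf = {'CCSD(T)': [5.971, 5.971, 5.132],
       'lda-SPW92': [6.154, 6.154, 5.59],
       'lda-Slater': [6.648, 6.648, 6.092],
       'pbe': [6.292, 6.292, 5.656],
       'pbe0': [6.248, 6.248, 5.378]}

def relative_errors(entry, method, reference='CCSD(T)'):
    """Value-by-value relative deviation of `method` from `reference`."""
    return [(v - r) / r for v, r in zip(entry[method], entry[reference])]

errors = {}
for method in ['lda-Slater', 'lda-SPW92', 'pbe', 'pbe0']:
    errors[method] = relative_errors(alf, method)
    print('%s %s' % (method, errors[method]))
```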

Save the dataset as a YAML file.

In [10]:
import yaml

In [11]:
with open('hg_data.yaml', 'w') as outfile:
    yaml.dump(dataset, outfile, default_flow_style=False)