# Benchmark data

We parse the Supplementary Information (SI) of the paper by Head-Gordon (HG) and build a dictionary with information on every molecule of the HG dataset. For each molecule we record the type of computation (spin-polarized or non-spin-polarized) and the static polarizability values obtained with LDA (in two flavors, Slater and SPW92), PBE and PBE0. We also collect the results of the reference method of the HG paper, CCSD(T).

In [1]:
from pyexcel_ods import get_data
import json

In [2]:
# Read the spreadsheet, then round-trip through JSON to get plain dicts and lists
data = get_data('HG_data.ods')
data = json.dumps(data)
data = json.loads(data)['Sheet1']

The first row of the sheet holds the column headers:

In [3]:
print(data[0])

[u'molecule', u'spin_pol', u'lda-Slater', u'lda-SPW92', u'pbe', u'pbe0', u'CCSD(T)']
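
As a minimal sketch of how this header maps onto a data row, assume (as the loop in the next cell implies) that each method contributes three consecutive values, so that a row holds 2 + 5 × 3 = 17 entries. The row below is rebuilt from the H2O values shown later in this notebook, and the helper name `split_row` is hypothetical.

```python
# Header row as printed above.
header = ['molecule', 'spin_pol', 'lda-Slater', 'lda-SPW92', 'pbe', 'pbe0', 'CCSD(T)']

# Hypothetical flat row: name, spin flag, then three values per method.
row = ['H2O', 'nsp',
       1.771, 1.669, 1.703,   # lda-Slater
       1.58, 1.572, 1.572,    # lda-SPW92
       1.579, 1.572, 1.564,   # pbe
       1.412, 1.485, 1.444,   # pbe0
       1.362, 1.46, 1.406]    # CCSD(T)


def split_row(header, row, width=3, first=2):
    """Group the flat values of `row` into one list of `width` values per method."""
    methods = header[first:]
    return {name: row[first + i * width: first + (i + 1) * width]
            for i, name in enumerate(methods)}


grouped = split_row(header, row)
print(grouped['pbe0'])  # -> [1.412, 1.485, 1.444]
```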


We build a dictionary that encodes all the relevant information. For each molecule we also parse the corresponding file in the geometries folder and extract the spin polarization of the molecule's ground state.

In [4]:
dataset = {}
header = data[0]
for row in data[1:]:
    molecule = str(row[0])
    datamol = {}
    # spin polarization flag ('sp' or 'nsp')
    datamol[str(header[1])] = str(row[1]).lower()
    # reference data: three consecutive values per method, starting at column 2
    ida = 2
    for idata in range(2, 7):
        datamol[str(header[idata])] = row[ida:ida+3]
        ida += 3
    # spin polarization value, read from the second line of the geometry file
    with open('geometries/' + molecule + '.xyz') as f:
        lines = f.readlines()
    datamol['mpol_ref'] = lines[1].rstrip('/\n').split(' ')[1]
    dataset[molecule] = datamol
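
The geometry parsing step can be isolated in a small helper. This is only a sketch: it assumes, as the cell above does, that the second line of each .xyz file carries the spin polarization as its second token. The sample file content and the name `read_mpol` are hypothetical. Unlike the cell above, it splits on any run of whitespace and raises an error if the token is missing.

```python
def read_mpol(xyz_text):
    """Return the second token of the second line of an xyz file, as a string."""
    lines = xyz_text.splitlines()
    tokens = lines[1].split()
    if len(tokens) < 2:
        raise ValueError('no spin polarization found on the second line')
    return tokens[1]


# Hypothetical file content: atom count, comment line, then coordinates.
sample = """3
angstroem 1
O 0.000 0.000 0.119
H 0.000 0.763 -0.477
H 0.000 -0.763 -0.477
"""

print(read_mpol(sample))  # -> 1
```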
    

For instance, the entry for H2O is:

In [5]:
dataset['H2O']

{'CCSD(T)': [1.362, 1.46, 1.406],
 'lda-SPW92': [1.58, 1.572, 1.572],
 'lda-Slater': [1.771, 1.669, 1.703],
 'mpol_ref': '1',
 'pbe': [1.579, 1.572, 1.564],
 'pbe0': [1.412, 1.485, 1.444],
 'spin_pol': 'nsp'}
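
As an example of how such an entry can be used, the sketch below compares each functional with the CCSD(T) reference. It assumes that the three numbers are the diagonal components of the polarizability tensor, which this notebook does not state explicitly, and works with their average. The names `mean_alpha`, `relative_errors` and `METHODS` are hypothetical.

```python
# Entry copied from the output above.
h2o = {'CCSD(T)': [1.362, 1.46, 1.406],
       'lda-SPW92': [1.58, 1.572, 1.572],
       'lda-Slater': [1.771, 1.669, 1.703],
       'mpol_ref': '1',
       'pbe': [1.579, 1.572, 1.564],
       'pbe0': [1.412, 1.485, 1.444],
       'spin_pol': 'nsp'}

METHODS = ('lda-Slater', 'lda-SPW92', 'pbe', 'pbe0')


def mean_alpha(components):
    """Average of the three components (assumed to be xx, yy, zz)."""
    return sum(components) / float(len(components))


def relative_errors(entry, methods=METHODS, reference='CCSD(T)'):
    """Relative deviation of each method's mean value from the reference."""
    ref = mean_alpha(entry[reference])
    return {m: (mean_alpha(entry[m]) - ref) / ref for m in methods}


errors = relative_errors(h2o)
for method in sorted(errors):
    print('%-10s %+.1f%%' % (method, 100 * errors[method]))
```

Passing the method names explicitly keeps the string fields (`mpol_ref`, `spin_pol`) out of the arithmetic.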

In [6]:
molecule = sorted(dataset.keys())
#molecule

We add a key holding the field intensity for each molecule.

In [7]:
data = get_data('HG_field_intensity.ods')
data = json.dumps(data)
data = json.loads(data)['Sheet1']

In [8]:
# skip the header row and the last three rows of the sheet
for rows in data[1:-3]:
    dataset[str(rows[0])]['field_int'] = rows[1]

In [9]:
dataset['CO']

{'CCSD(T)': [1.753, 1.753, 2.283],
 'field_int': 0.01,
 'lda-SPW92': [1.872, 1.872, 2.358],
 'lda-Slater': [1.993, 1.993, 2.49],
 'mpol_ref': '1',
 'pbe': [1.856, 1.856, 2.363],
 'pbe0': [1.778, 1.778, 2.274],
 'spin_pol': 'nsp'}
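
The update above raises a KeyError if the second sheet names a molecule that is missing from the dataset. A more defensive variant, sketched here with hypothetical in-memory rows instead of the .ods file, collects such names instead of failing. The function name `add_field_intensity` and the toy values are assumptions made for illustration.

```python
def add_field_intensity(dataset, rows):
    """Store row[1] under 'field_int' for known molecules; return unknown names."""
    unknown = []
    for row in rows:
        name = str(row[0])
        if name in dataset:
            dataset[name]['field_int'] = row[1]
        else:
            unknown.append(name)
    return unknown


# Hypothetical stand-ins for the parsed sheets.
toy_dataset = {'CO': {'spin_pol': 'nsp'}, 'H2O': {'spin_pol': 'nsp'}}
toy_rows = [['CO', 0.01], ['H2O', 0.01], ['XYZ', 0.01]]

missing = add_field_intensity(toy_dataset, toy_rows)
print(missing)  # -> ['XYZ']
```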

Save the dataset as a YAML file.

In [10]:
import yaml

In [52]:
with open('hg_data.yaml', 'w') as outfile:
    yaml.dump(dataset, outfile, default_flow_style=False)
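
A quick way to check the saved data is to load it back and compare it with the in-memory dictionary. The sketch below does this on a string rather than on hg_data.yaml, using part of one entry copied from the output above. It relies on PyYAML's `yaml.dump` and `yaml.safe_load`.

```python
import yaml

# Part of the CO entry shown above.
entry = {'CO': {'CCSD(T)': [1.753, 1.753, 2.283],
                'field_int': 0.01,
                'mpol_ref': '1',
                'spin_pol': 'nsp'}}

# With no stream argument, yaml.dump returns the YAML text as a string.
text = yaml.dump(entry, default_flow_style=False)
restored = yaml.safe_load(text)

# Check that dicts, lists, floats and strings come back unchanged.
print(restored == entry)
```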

## Set up data for supplementary.tex

In [11]:
import scipy
from tabulate import tabulate

ImportError: No module named tabulate

In [53]:
dataset

{'AlF': {'CCSD(T)': [5.971, 5.971, 5.132],
  'field_int': 0.01,
  'lda-SPW92': [6.154, 6.154, 5.59],
  'lda-Slater': [6.648, 6.648, 6.092],
  'mpol_ref': '1',
  'pbe': [6.292, 6.292, 5.656],
  'pbe0': [6.248, 6.248, 5.378],
  'spin_pol': 'nsp'},
 'Ar': {'CCSD(T)': [1.634, 1.634, 1.634],
  'field_int': 0.01,
  'lda-SPW92': [1.779, 1.779, 1.779],
  'lda-Slater': [1.889, 1.889, 1.889],
  'mpol_ref': '1',
  'pbe': [1.769, 1.769, 1.769],
  'pbe0': [1.693, 1.693, 1.693],
  'spin_pol': 'nsp'},
 'BF': {'CCSD(T)': [2.95, 2.95, 2.686],
  'field_int': 0.01,
  'lda-SPW92': [3.315, 3.315, 2.991],
  'lda-Slater': [3.603, 3.603, 3.27],
  'mpol_ref': '1',
  'pbe': [3.271, 3.271, 2.974],
  'pbe0': [3.159, 3.159, 2.78],
  'spin_pol': 'nsp'},
 'BH2': {'CCSD(T)': [2.832, 3.019, 2.914],
  'field_int': 0.01,
  'lda-SPW92': [3.172, 3.648, 3.239],
  'lda-Slater': [3.374, 4.144, 3.476],
  'mpol_ref': '2',
  'pbe': [3.094, 3.545, 3.177],
  'pbe0': [2.977, 3.271, 3.037],
  'spin_pol': 'sp'},
 'BH2Cl': {'CCSD(T)'