In [1]:
import yaml
import os, sys
sys.path.insert(0,'../')
import Routines as R

# Construction of the dataset

The aim of this notebook is to build a dictionary that collects all the information needed to compute the static polarizability (for all the chosen psp's) of the molecules in the HG dataset.

To achieve this task we split the HG dataset into two parts according to the sp (spin-polarized) or nsp (non-spin-polarized) nature of the computation. Then we associate with each molecule a set of studies, where a study is a choice of xc and psp. Each study is processed with the single-study workflow previously defined.

In [9]:
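# safe_load needs no explicit Loader argument and the file is closed after reading
with open('../HG Dataset/hg_data.yaml') as f:
    hg_dataset = yaml.safe_load(f)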
#hg_dataset

We choose to treat the $H$ 'molecule' within the spin-polarized framework, so we set its 'spin_pol' field to 'sp'.

In [10]:
hg_dataset['H']['spin_pol'] = 'sp'

In [11]:
hg_dataset['H']

{'CCSD(T)': [0.667, 0.667, 0.667],
 'lda-SPW92': [0.902, 0.902, 0.902],
 'lda-Slater': [0.958, 0.958, 0.958],
 'pbe': [0.829, 0.829, 0.829],
 'pbe0': [0.775, 0.775, 0.775],
 'spin_pol': 'sp'}

## Construction of the nsp_dataset

We build the nsp_dataset, which contains all the molecules with spin_pol = nsp.

This dictionary also contains the reference results extracted from the hg_dataset and a key 'study' that lists all the (xc, psp) pairs associated with each molecule of the dataset.

In [12]:
nsp_dataset = {}
for mol, data in hg_dataset.items():
    if data['spin_pol'] == 'nsp':
        ref_results = {'lda_pw' : data['lda-SPW92'], 'pbe' : data['pbe'], 'pbe0' : data['pbe0']}
        nsp_dataset[mol] = {'ref_results' : ref_results}

In [13]:
print 'number of nsp molecules = ', len(nsp_dataset.keys())

number of nsp molecules =  74


In [14]:
# for instance
nsp_dataset['CO']

{'ref_results': {'lda_pw': [1.872, 1.872, 2.358],
  'pbe': [1.856, 1.856, 2.363],
  'pbe0': [1.778, 1.778, 2.274]}}

Now we add the key 'study' to the dataset. The possible (xc, psp) pairs are:

* (lda_pt,hgh_k) : lda_pt stands for the Teter Pade parametrization of the lda. This choice is realized by setting ixc = 1 in the input file.

* (lda_pw,hgh_k) : this corresponds to the lda xc of Perdew and Wang (1992). This choice is realized by setting ixc = -001012 in the input file.
  
* (pbe,hgh_k) : this choice is realized by setting ixc = 11 in the input file.

* (pbe,nlcc_aw) : this corresponds to the pbe xc with the nonlinear core correction psp of Alex Willand. This choice is realized by setting ixc = 11 and by adding the appropriate psp's to the folder of the study.

* (pbe,nlcc_ss) : this corresponds to the pbe xc with the nonlinear core correction psp of S. Saha. This choice is realized by setting ixc = 11 and by adding the appropriate psp's to the folder of the study.


* (pbe0,hgh_k) : this corresponds to the hybrid pbe0 xc with the hgh_k psp. This choice is realized by setting ixc = 'PBE0' in the input. Moreover, since BigDFT has no default psp associated with this functional, the appropriate psp has to be specified in the input file as follows:
  
  inp['psppar.atom']={'Pseudopotential XC': 11} 
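
The correspondence between the xc label of a study and the input settings listed above can be collected in a small helper. This is only a sketch: the name `xc_settings`, the nesting of `ixc` under a `dft` key and the per-element `psppar.<symbol>` keys are assumptions made for illustration, not part of the workflow used in this notebook. The nlcc psp's are not handled here, since they are selected by adding files to the folder of the study.

```python
# Hypothetical lookup table: xc label -> value of ixc described in the list above.
IXC = {
    'lda_pt': 1,
    # Written as -001012 in the text. Python 3 integer literals cannot carry
    # leading zeros, so the zeros are dropped (assumed to be equivalent).
    'lda_pw': -1012,
    'pbe': 11,
    'pbe0': 'PBE0',
}

def xc_settings(xc, atoms=()):
    """Return a dictionary with the input settings for the given xc label."""
    inp = {'dft': {'ixc': IXC[xc]}}  # the 'dft' nesting is an assumption
    if xc == 'pbe0':
        # No default psp is associated with PBE0, so the PBE one (11) is
        # requested explicitly for each atom type (key naming is an assumption).
        for atom in atoms:
            inp['psppar.' + atom] = {'Pseudopotential XC': 11}
    return inp

print(xc_settings('pbe'))
print(xc_settings('pbe0', atoms=['C', 'O']))
```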

The nlcc_aw and nlcc_ss psp's are not available for all the atoms of the dataset. The lists below contain the atoms for which these psp's are provided (see http://bigdft.org/Wiki/index.php?title=NLCC_PBE_psppar for nlcc_aw and http://bigdft.org/Wiki/index.php?title=New_Soft-Accurate_NLCC_pseudopotentials for nlcc_ss).

In [15]:
nlcc_aw = ['Al','B','C','Cl','F','H','Mg','N','O','P','S','Si']
nlcc_ss = ['Al','B','Be','C','Ca','Cl','F','H','He','K','Li','Mg','N','Ne','O','P','Si']
# plus other atoms that are not present in the  molecules of the dataset

We now associate all the possible studies with each molecule of the dataset. The studies that can be performed for all the molecules are:

In [16]:
studies = [('lda_pt','hgh_k'),('lda_pw','hgh_k'),('pbe','hgh_k'),('pbe0','hgh_k')]

In [17]:
for mol in nsp_dataset:
    # copy the list so that the molecules do not share one list object
    nsp_dataset[mol]['study'] = list(studies)

The studies with the nlcc psp's are possible only if all the atoms of the molecule belong to the list nlcc_aw or nlcc_ss, respectively.

In [18]:
for mol in nsp_dataset:
    if R.molecule_inlist(mol,nlcc_aw):
        nsp_dataset[mol]['study'] = nsp_dataset[mol]['study'] + [('pbe','nlcc_aw')]
    if R.molecule_inlist(mol,nlcc_ss):
        nsp_dataset[mol]['study'] = nsp_dataset[mol]['study'] + [('pbe','nlcc_ss')]
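
The helper `R.molecule_inlist` is defined in the Routines module and is not shown in this notebook. A check with the same purpose could be sketched as follows, assuming that the name of the molecule is a chemical formula from which the element symbols can be read off. The name `atoms_in_list` and the parsing rule are hypothetical and may differ from the actual implementation.

```python
import re

def atoms_in_list(formula, available):
    """Return True if every element symbol found in formula is in available."""
    # An element symbol is one uppercase letter optionally followed by one
    # lowercase letter; digits (stoichiometry) are ignored.
    symbols = set(re.findall(r'[A-Z][a-z]?', formula))
    return bool(symbols) and symbols.issubset(available)

nlcc_aw = ['Al', 'B', 'C', 'Cl', 'F', 'H', 'Mg', 'N', 'O', 'P', 'S', 'Si']

print(atoms_in_list('CO', nlcc_aw))    # C and O are both in the list
print(atoms_in_list('NaCl', nlcc_aw))  # Na is not in the list
```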

In [21]:
nsp_dataset['CO']

{'ref_results': {'lda_pw': [1.872, 1.872, 2.358],
  'pbe': [1.856, 1.856, 2.363],
  'pbe0': [1.778, 1.778, 2.274]},
 'study': [('lda_pt', 'hgh_k'),
  ('lda_pw', 'hgh_k'),
  ('pbe', 'hgh_k'),
  ('pbe0', 'hgh_k'),
  ('pbe', 'nlcc_aw'),
  ('pbe', 'nlcc_ss')]}

Save the dataset as a YAML file.

In [22]:
import yaml
with open('nsp_dataset.yaml', 'w') as outfile:
    yaml.dump(nsp_dataset, outfile, default_flow_style=False)
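
With its default dumper, PyYAML writes Python tuples with a `!!python/tuple` tag, which `yaml.safe_load` does not accept. If the file has to be read back with the safe loader, one option is to convert the tuples into lists before dumping. The helper name `to_plain` is hypothetical; this is a sketch and not part of the original workflow.

```python
def to_plain(obj):
    """Recursively replace tuples with lists, leaving other values untouched."""
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(value) for value in obj]
    return obj

# Small example with the same layout as an entry of nsp_dataset
example = {'CO': {'ref_results': {'pbe': [1.856, 1.856, 2.363]},
                  'study': [('pbe', 'hgh_k'), ('pbe', 'nlcc_aw')]}}
print(to_plain(example))
```

The result of `to_plain(nsp_dataset)` can then be written with `yaml.safe_dump`. After loading, each study is a two-element list rather than a tuple.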

This dataset can be passed to the 'Dataset calculator' notebook which runs the computations.
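
To illustrate how the dataset might be consumed, the sketch below flattens it into one (molecule, xc, psp) triple per computation. The function name `list_runs` is hypothetical, and the 'Dataset calculator' notebook may organize its loop differently.

```python
def list_runs(dataset):
    """Return one (molecule, xc, psp) triple for each study of each molecule."""
    return [(mol, xc, psp)
            for mol in sorted(dataset)
            for xc, psp in dataset[mol]['study']]

# Toy dataset with the same layout as nsp_dataset (the studies shown are illustrative)
toy = {'CO': {'study': [('lda_pt', 'hgh_k'), ('pbe', 'nlcc_aw')]},
       'H2O': {'study': [('pbe', 'hgh_k')]}}

for run in list_runs(toy):
    print(run)
```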

## Construction of the sp_dataset

...TO BE DONE:.....