In [9]:
import yaml
import os, sys
sys.path.insert(0,'../')
import Routines as R

# Construction of the dataset

The aim of this notebook is to build a dictionary that collects all the information needed to compute the static polarizability (for all the chosen psps) for the HG dataset.

To do so, we split the HG dataset into two parts according to the spin-polarized (sp) or non-spin-polarized (nsp) nature of the computation. Then we associate each molecule with a set of studies, where a study is a choice of xc and psp. Each study is processed with the single-study workflow defined previously.
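
The split described above only depends on the `spin_pol` field of each entry. Below is a minimal sketch of it, assuming every entry of the loaded YAML is a dict with a `spin_pol` value of either `'nsp'` or `'sp'`. The function name `split_by_spin` is hypothetical and is not part of `Routines`.

```python
def split_by_spin(dataset):
    """Split a {molecule: data} dict into (nsp, sp) dicts using data['spin_pol']."""
    nsp, sp = {}, {}
    for mol, data in dataset.items():
        spin = data['spin_pol']
        if spin == 'nsp':
            nsp[mol] = data
        elif spin == 'sp':
            sp[mol] = data
        else:
            # Fail loudly instead of silently dropping an unexpected label.
            raise ValueError('unknown spin_pol %r for %s' % (spin, mol))
    return nsp, sp


# Toy input: only the 'spin_pol' field is kept, reference values are omitted.
toy = {'AlF': {'spin_pol': 'nsp'}, 'H2O': {'spin_pol': 'nsp'}, 'Li': {'spin_pol': 'sp'}}
nsp, sp = split_by_spin(toy)
print(sorted(nsp))
print(sorted(sp))
```

Raising on an unknown label is a design choice. It makes a typo in the YAML file visible instead of shrinking one of the two datasets.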

In [24]:
with open('../HG Dataset/hg_data.yaml') as f:
    hg_dataset = yaml.safe_load(f)  # plain YAML data: the full (unsafe) loader is not needed
#hg_dataset

## Construction of the nsp_dataset

We build the nsp_dataset, which contains all the molecules with spin_pol = nsp.

This dictionary also contains the reference results extracted from the hg_dataset, and a key 'study' that specifies
all the (xc, psp) pairs associated with each molecule of the dataset.

In [28]:
# CHECK: is 'lda-SPW92' the right choice for the LDA reference?
nsp_dataset = {}
for mol, data in hg_dataset.items():
    if data['spin_pol'] == 'nsp':
        # keep the reference results for the three functionals
        ref_results = {'lda': data['lda-SPW92'], 'pbe': data['pbe'], 'pbe0': data['pbe0']}
        nsp_dataset[mol] = {'ref_results': ref_results}

In [29]:
print 'number of nsp molecules = ', len(nsp_dataset.keys())

number of nsp molecules =  75


In [30]:
# for instance
nsp_dataset['AlF']

{'ref_results': {'lda': [6.154, 6.154, 5.59],
  'pbe': [6.292, 6.292, 5.656],
  'pbe0': [6.248, 6.248, 5.378]}}

Now we add the 'study' key to the dataset. The possible (xc, psp) pairs are:

* for xc = lda the psp is hgh-k. This is the default choice of BigDFT, so no input is needed apart 
  from the choice of xc.
  
* for xc = pbe one possible psp is hgh-k; again, this is the default choice of BigDFT.

* for xc = pbe0 BigDFT has no default choice, so the psp has to be specified in the
  input file as follows:
  
  inp['psppar.atom']={'Pseudopotential XC': 11}

* for xc = pbe there is a further choice: the PBE psp with nonlinear core correction (nlcc). The list of atoms for which 
  these psps are provided is given at http://bigdft.org/Wiki/index.php?title=NLCC_PBE_psppar. This study can be activated 
  only if all the atoms of a molecule belong to this list.  
  
  The atoms for which nlcc pseudopotentials are given are:
  

In [31]:
nlcc_psp = ['B','C','N','O','F','Al','Si','P','Cl','H','Mg']

where, as stated at the link, the H and Mg psps in this list give more accurate nonrelativistic atomization energies than the relativistic Krack pseudopotentials when used together with the other nlcc psps.
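
The check "all the atoms of a molecule belong to this list" is done by `R.molecule_inlist` from the `Routines` module, which is not shown in this notebook. Below is a hypothetical stand-in with the same call shape. It assumes the molecule label is a formula written with standard capitalization (`CH3Cl`, `NaCl`). It also assumes that lowercase-only tags such as `-t` or `-trip` carry no atoms. This is an illustration, not the actual implementation in `Routines`.

```python
import re

# One element symbol: an uppercase letter optionally followed by a lowercase one.
ELEMENT_RE = re.compile(r'[A-Z][a-z]?')


def molecule_elements(name):
    """Return the set of element symbols appearing in a molecule label.

    Digits and lowercase-only tags match nothing, so 'CH2-t' -> {'C', 'H'},
    'SO-trip' -> {'S', 'O'} and 'H2O-Li' -> {'H', 'O', 'Li'}.
    """
    return set(ELEMENT_RE.findall(name))


def molecule_inlist(name, allowed):
    """Hypothetical stand-in: True if every element of the molecule is in `allowed`."""
    return molecule_elements(name) <= set(allowed)


nlcc_psp = ['B', 'C', 'N', 'O', 'F', 'Al', 'Si', 'P', 'Cl', 'H', 'Mg']
for mol in ['CH3Cl', 'CH3SH', 'NaCl', 'Mg2']:
    print('%s %s' % (mol, molecule_inlist(mol, nlcc_psp)))
```

With this reading, CH3Cl and Mg2 qualify for the (pbe, nlcc) study. CH3SH (S) and NaCl (Na) do not.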

We represent the 'study' field as a list of (xc, psp) tuples, as follows:

In [49]:
base_study = [('lda','hgh-k'),('pbe','hgh-k'),('pbe0','hgh-k')]

In [50]:
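for mol, data in nsp_dataset.items():
    if R.molecule_inlist(mol, nlcc_psp):
        # all the atoms have an nlcc psp: add the (pbe, nlcc) study
        data['study'] = base_study + [('pbe', 'nlcc')]
    else:
        data['study'] = list(base_study)  # copy, so that molecules do not share one list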
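
Each (xc, psp) pair must eventually become BigDFT input, following the rules in the list above. Below is a hypothetical sketch of that translation for the pseudopotential part only. It assumes that `atom` in `inp['psppar.atom']` stands for an element symbol, giving keys like `'psppar.C'`. To our understanding, the value 11 is the ABINIT-style ixc code for PBE, so pbe0 runs would reuse the PBE hgh-k pseudopotentials. The notebook does not say how the nlcc psppar files are supplied, so that case is left unimplemented.

```python
def psp_input(study, elements):
    """Hypothetical helper: extra psp entries of a BigDFT input dict for one study.

    Assumption: 'psppar.<element>' is the key used to set the psp of each element.
    """
    xc, psp = study
    if psp == 'hgh-k' and xc in ('lda', 'pbe'):
        # BigDFT default psp: nothing to add besides the choice of xc.
        return {}
    if (xc, psp) == ('pbe0', 'hgh-k'):
        # No default for pbe0: request the 'Pseudopotential XC': 11 psp for every element.
        return {'psppar.' + el: {'Pseudopotential XC': 11} for el in sorted(elements)}
    # The nlcc psppar files must be provided separately; not covered by this sketch.
    raise NotImplementedError('no input rule for study %r' % (study,))


print(psp_input(('pbe0', 'hgh-k'), {'Al', 'F'}))
print(psp_input(('lda', 'hgh-k'), {'Al', 'F'}))
```

The element set could come from a formula parser such as the `molecule_inlist` logic. The xc selection itself is not shown, because its input key is not given in this notebook.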

In [59]:
#nsp_dataset

Save the dataset as a YAML file:

In [56]:
import yaml
with open('nsp_dataset.yaml', 'w') as outfile:
    yaml.dump(nsp_dataset, outfile, default_flow_style=False)

This dataset can be passed to the 'Dataset calculator' notebook which runs the computations.
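
One caveat with `yaml.dump` and the default dumper: tuples are written with a `!!python/tuple` tag, which `yaml.safe_load` refuses to read. Objects referenced more than once, such as the same (xc, psp) tuple in several study lists, are also written as anchors and aliases (`&id001`, `*id001`). Below is a minimal sketch of one way around both, assuming the calculator notebook only needs plain lists. It is a hypothetical helper, not part of the original workflow.

```python
def plain_copy(dataset):
    """Copy a {mol: {'ref_results': ..., 'study': [...]}} dict so that every
    study pair becomes a fresh list and no study list or pair is shared
    between molecules. Other values are passed through unchanged."""
    out = {}
    for mol, data in dataset.items():
        entry = {}
        for key, value in data.items():
            if key == 'study':
                entry[key] = [list(pair) for pair in value]
            else:
                entry[key] = value
        out[mol] = entry
    return out


base = [('lda', 'hgh-k'), ('pbe', 'hgh-k')]
demo = {'AlF': {'study': base}, 'CO': {'study': base}}
plain = plain_copy(demo)
print(plain['AlF']['study'])
print(plain['AlF']['study'] is plain['CO']['study'])
```

With PyYAML, the copy could then be written with `yaml.safe_dump(plain_copy(nsp_dataset), outfile, default_flow_style=False)` and read back with `yaml.safe_load`.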

TO BE DONE: 

Consider also the psps found at

http://bigdft.org/Wiki/index.php?title=New_Soft-Accurate_NLCC_pseudopotentials

Some of these are nlcc while others are not. Do they belong to a different class with respect to the previous ones? Should
we include all of them?

In [58]:
saha_elements = ['H','He','Li','Be','B','C','N','O','F','Ne','Mg','Al','Si','P','Cl','K']
# plus other atoms that do not appear in the molecules of the dataset
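
A first step toward answering the question above is to compare the two element lists as sets. Below is a minimal sketch that copies both lists so that it runs on its own.

```python
nlcc_psp = ['B', 'C', 'N', 'O', 'F', 'Al', 'Si', 'P', 'Cl', 'H', 'Mg']
saha_elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Mg', 'Al', 'Si', 'P', 'Cl', 'K']

only_saha = sorted(set(saha_elements) - set(nlcc_psp))  # covered only by the new psps
only_nlcc = sorted(set(nlcc_psp) - set(saha_elements))  # covered only by the NLCC_PBE page

print(only_saha)  # ['Be', 'He', 'K', 'Li', 'Ne']
print(only_nlcc)  # []
```

Every element of `nlcc_psp` also appears in `saha_elements`. The new set would therefore extend a study to molecules containing Be, He, K, Li or Ne.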

## Construction of the sp_dataset

TO BE DONE.
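
Below is a hypothetical sketch of how the sp construction could mirror the nsp cells. It assumes that sp entries of `hg_data.yaml` use the same reference keys as the nsp ones (`'lda-SPW92'`, `'pbe'`, `'pbe0'`) and that the same base studies apply. Neither assumption has been checked.

```python
# Assumption: same reference keys and base studies as for the nsp molecules.
REF_KEYS = {'lda': 'lda-SPW92', 'pbe': 'pbe', 'pbe0': 'pbe0'}
BASE_STUDY = [('lda', 'hgh-k'), ('pbe', 'hgh-k'), ('pbe0', 'hgh-k')]


def build_dataset(hg_dataset, spin_pol, has_nlcc, ref_keys=REF_KEYS):
    """Collect reference results and studies for the molecules with the given spin_pol.

    has_nlcc(mol) must return True when every atom of mol has an nlcc psp.
    """
    dataset = {}
    for mol, data in hg_dataset.items():
        if data['spin_pol'] != spin_pol:
            continue
        ref_results = {xc: data[key] for xc, key in ref_keys.items()}
        study = list(BASE_STUDY)  # fresh list for each molecule
        if has_nlcc(mol):
            study.append(('pbe', 'nlcc'))
        dataset[mol] = {'ref_results': ref_results, 'study': study}
    return dataset
```

In this notebook it could be called as `sp_dataset = build_dataset(hg_dataset, 'sp', lambda mol: R.molecule_inlist(mol, nlcc_psp))`. The `has_nlcc` argument keeps the sketch independent of `Routines`.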

In [53]:
#####################################################
# OLD STUFF

We split the HG dataset into two lists, according to the nsp or sp character of the associated study:

In [1]:
nsp_list = ['AlF','Ar','BeH2','BF','BH2Cl','BH2F','BH3','BHF2','C2H2','C2H4','CH2BH','CH3BH2','CH3Cl',
 'CH3F','CH3NH2','CH3OH','CH3SH','CH4','Cl2','ClCN','ClF','CO','CO2','CS','CSO','FCN','FNO','H',
 'H2','H2O','HBO','HBS','HCCCl','HCCF','HCHO','HCl','HCN','HCONH2','HCOOH','He','HF','HNC','HOCl',
 'HOOH','LiBH4','LiCl','LiCN','LiH','Mg','Mg2','N2','N2H4','NaCl','NaCN','NaH','Ne','NH2Cl','NH2F',
 'NH2OH','NH3','NH3O','OCl2','P2H4','PH2OH','PH3','PH3O','S2H2','SCl2','SF2','SH2','SiH3Cl','SiH3F',
 'SiH4','SiO','SO2']
sp_list = ['Be','BeH','BH2','BN','BO','BS','C2H','C2H3','CH2F','CH2NH','CH2PH','CH2-t','CH3O','F2',
 'FH-OH','H2CN','H2O-Li','HCHS','HCO','HCP','HNO','HNS','HO2','HOF','Li','Li2','N','N2H2','Na','Na2',
 'NaLi','NH','NH2','NO','NOCl','NP','O2','O3','OCl','OF','OF2','OH','P','PH','PH2','S2','SCl','SF','SH',
 'SO-trip','CN','P2','PS','CH3','NCO','FCO','SiH3']

In [23]:
print 'number of nsp elements : ', len(nsp_list)
print 'number of sp elements : ', len(sp_list)

number of nsp elements :  75
number of sp elements :  57
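
The nsp count (75) matches the size of nsp_dataset, but matching counts do not guarantee matching names. Below is a minimal sketch of a membership check between these hand-written lists and the `spin_pol` labels of a dataset. The function `check_split` is hypothetical.

```python
def check_split(nsp_list, sp_list, dataset):
    """Compare hand-written nsp/sp lists with the spin_pol labels of `dataset`.

    Returns a dict of sorted problem lists; all of them are empty when
    the lists and the dataset agree.
    """
    nsp_set, sp_set = set(nsp_list), set(sp_list)
    labelled = {mol: data['spin_pol'] for mol, data in dataset.items()}
    return {
        'in_both_lists': sorted(nsp_set & sp_set),
        'missing_from_lists': sorted(set(labelled) - nsp_set - sp_set),
        'not_in_dataset': sorted((nsp_set | sp_set) - set(labelled)),
        'wrong_list': sorted(
            mol for mol in nsp_set | sp_set
            if mol in labelled and labelled[mol] != ('nsp' if mol in nsp_set else 'sp')
        ),
    }
```

Calling `check_split(nsp_list, sp_list, hg_dataset)` would confirm that the old lists and the YAML file describe the same split.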
