In [1]:
# useful to autoreload the module without restarting the kernel
%load_ext autoreload
%autoreload 2

In [2]:
from mppi import InputFiles as I, Calculators as C, Datasets as D

# Tutorial for the Dataset module

Dataset is the class used to build, run and post-process a set of calculations performed with both QuantumESPRESSO and Yambo.

Here we discuss some explicit examples to describe the usage and the main features of the package.

## Perform a convergence analysis for the gs energy of Silicon

We use this class to find the value of the energy cutoff that guarantees a converged result for the
ground state energy of Silicon.

We start from a given input file for Silicon

In [20]:
inp = I.PwInput(file='IO_files/si_scf.in')
#inp

And we define a Calculator that will be used by the Dataset class to run the computation

In [49]:
code1 = C.QeCalculator(mpi_run='mpirun -np 2', skip = False)
code2 = C.QeCalculator(mpi_run='mpirun -np 4', skip = False)
code1.global_options()

Initialize a parallel QuantumESPRESSO calculator with scheduler direct
Initialize a parallel QuantumESPRESSO calculator with scheduler direct


{'omp': 1,
 'executable': 'pw.x',
 'multiTask': True,
 'scheduler': 'direct',
 'mpi_run': 'mpirun -np 2',
 'cpus_per_task': 4,
 'ntasks': 3,
 'skip': False,
 'verbose': True}

Now we can define the instance of Dataset used to perform the convergence procedure.

In [50]:
gs_convergence = D.Dataset(label='Si_gs_convergence',run_dir='Si_gs_convergence', spin_orbit = False)

Dataset inherits from Runner, so it has the same structure and exposes the same methods as QeCalculator and YamboCalculator
to access its global options.

Note that in this case we have defined a spin_orbit variable that can be used later. This variable is
stored in the global options of the dataset

In [51]:
gs_convergence.global_options()

{'label': 'Si_gs_convergence',
 'run_dir': 'Si_gs_convergence',
 'spin_orbit': False}
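
The way extra keyword arguments end up in the global options can be pictured with a small stand-alone class. This is only a sketch of the pattern: MiniRunner is a hypothetical name, not the Runner class of mppi, whose actual implementation may differ.

```python
class MiniRunner:
    """Hypothetical stand-in for a runner that stores its options in a dict."""

    def __init__(self, **kwargs):
        # every keyword argument becomes a global option
        self._global_options = dict(kwargs)

    def global_options(self):
        # return a copy so that callers cannot modify the options by accident
        return dict(self._global_options)

    def update_global_options(self, **kwargs):
        # add new options or overwrite existing ones
        self._global_options.update(kwargs)


runner = MiniRunner(label='Si_gs_convergence', run_dir='Si_gs_convergence',
                    spin_orbit=False)
runner.update_global_options(spin_orbit=True)
print(runner.global_options())
```

Storing the options in a plain dictionary is what makes it possible to attach arbitrary user variables, such as spin_orbit, without changing the class.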

The next step is to append to the Dataset all the calculations that we want to perform later.

For instance we can append some calculations as a function of the cutoff energy. To show the design of the class
we use two different calculators

In [52]:
eng_cut = 20 
idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset
inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp, variable1 = 'first_run')

The append_run method sets the attributes of the object, for instance

In [53]:
print(gs_convergence.ids) # identify each element of the dataset
print(gs_convergence.calculators) # list with the calculators and the associated runs
gs_convergence.runs

[{'eng_cut': 20}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7fca9d72bef0>, 'iruns': [0]}]


[{'names': ['eng_cut_20'],
  'inputs': [{'control': {'verbosity': "'high'",
     'pseudo_dir': "'../pseudos'",
     'calculation': "'scf'",
     'prefix': "'eng_cut_20'"},
    'system': {'force_symmorphic': '.true.',
     'occupations': "'fixed'",
     'ibrav': '2',
     'celldm(1)': '10.3',
     'ntyp': '1',
     'nat': '2',
     'ecutwfc': 20},
    'electrons': {'conv_thr': '1e-08'},
    'ions': {},
    'cell': {},
    'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
    'atomic_positions': {'type': 'crystal',
     'values': [['Si', [0.125, 0.125, 0.125]],
      ['Si', [-0.125, -0.125, -0.125]]]},
    'kpoints': {'type': 'automatic',
     'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
    'cell_parameters': {},
    'file': 'IO_files/si_scf.in'}],
  'label': 'Si_gs_convergence',
  'run_dir': 'Si_gs_convergence',
  'spin_orbit': False,
  'variable1': 'first_run'}]

The names of the input files are built from the ids using the function name_from_id.

We add further calculations

In [54]:
eng_cut = 30 
idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset
inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp, variable2 = 'second_run')

In [55]:
print(gs_convergence.ids) 
print(gs_convergence.calculators) 
gs_convergence.runs

[{'eng_cut': 20}, {'eng_cut': 30}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7fca9d72bef0>, 'iruns': [0, 1]}]


[{'names': ['eng_cut_20', 'eng_cut_30'],
  'inputs': [{'control': {'verbosity': "'high'",
     'pseudo_dir': "'../pseudos'",
     'calculation': "'scf'",
     'prefix': "'eng_cut_20'"},
    'system': {'force_symmorphic': '.true.',
     'occupations': "'fixed'",
     'ibrav': '2',
     'celldm(1)': '10.3',
     'ntyp': '1',
     'nat': '2',
     'ecutwfc': 20},
    'electrons': {'conv_thr': '1e-08'},
    'ions': {},
    'cell': {},
    'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
    'atomic_positions': {'type': 'crystal',
     'values': [['Si', [0.125, 0.125, 0.125]],
      ['Si', [-0.125, -0.125, -0.125]]]},
    'kpoints': {'type': 'automatic',
     'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
    'cell_parameters': {},
    'file': 'IO_files/si_scf.in'},
   {'control': {'verbosity': "'high'",
     'pseudo_dir': "'../pseudos'",
     'calculation': "'scf'",
     'prefix': "'eng_cut_30'"},
    'system': {'force_symmorphic': '.true.',
     'occupations': "'fixed'",
     'i

Note that the variables passed as kwargs to append_run are added to the runs member.

We add further computations, this time also using the second calculator

In [56]:
eng_cut = 40 
idd = 'eng_cut_%s'%eng_cut # the id can be also a string
inp.set_prefix(D.name_from_id(idd)) 
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code2,input=inp)

eng_cut = 50 
idd = {'eng_cut' : eng_cut} 
inp.set_prefix(D.name_from_id(idd))
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp)

In [57]:
print(gs_convergence.ids) 
print(gs_convergence.calculators) 

[{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7fca9d72bef0>, 'iruns': [0, 1, 3]}, {'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7fca9d72be48>, 'iruns': [2]}]


gs_convergence.runs is a list that merges the input objects and the global options of the appended runs, grouped by calculator.
In this way one can check which inputs are associated to each calculator.
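
The grouping of the run indices by calculator, as printed above in the 'iruns' lists, can be reproduced with a few lines of plain Python. The function group_runs_by_calculator is a hypothetical helper written for this illustration and is not part of mppi; the two calculators are replaced by placeholder objects.

```python
def group_runs_by_calculator(appended):
    """
    Group run indices by the runner they were appended with.

    appended is a list of (id, runner) pairs in the order of insertion.
    """
    calculators = []
    for irun, (_, runner) in enumerate(appended):
        for entry in calculators:
            if entry['calc'] is runner:  # same calculator object already seen
                entry['iruns'].append(irun)
                break
        else:
            # first run associated with this calculator
            calculators.append({'calc': runner, 'iruns': [irun]})
    return calculators


code1, code2 = object(), object()  # placeholders for two calculators
appended = [({'eng_cut': 20}, code1), ({'eng_cut': 30}, code1),
            ('eng_cut_40', code2), ({'eng_cut': 50}, code1)]
groups = group_runs_by_calculator(appended)
print([g['iruns'] for g in groups])
```

The index of each run is its position in the order of insertion, which is why the run appended with the second calculator keeps the index 2.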

In [58]:
#gs_convergence.runs[1] #give the parameters of the runs associated to the second calculator

The attribute .results is a dictionary that is empty before the run

In [59]:
gs_convergence.results

{}

Once all the computations have been added we can run the Dataset

In [32]:
results = gs_convergence.run()
results

Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log
run0_is_running:True  run1_is_running:True  run2_is_running:True  
Job completed
Executing command: cd Si_gs_convergence; mpirun -np 4 pw.x -inp eng_cut_40.in > eng_cut_40.log
run0_is_running:True  
Job completed


{0: 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml',
 1: 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml',
 3: 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml',
 2: 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}

The run method returns the attribute .results of the Dataset. 

In [33]:
gs_convergence.results

{0: 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml',
 1: 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml',
 3: 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml',
 2: 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}

This implementation allows us to parse the data after the execution of the dataset and/or to choose a parser 
among several choices. 

### Usage of the multiTask feature

By default Dataset runs in parallel all the computations associated to the same calculator. However, if the multiTask = False option
is passed to the calculator, all the computations are performed in sequence.
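
The difference between the two modes can be illustrated with the standard library alone. The sketch below does not use mppi: fake_job and run_jobs are hypothetical names, the jobs only sleep for a short time, and a thread pool is assumed as the mechanism for running them side by side (the package may use a different one).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_job(name):
    """Placeholder for a computation: sleep briefly and return the name."""
    time.sleep(0.1)
    return name


def run_jobs(names, multi_task=True):
    if multi_task:
        # submit all the jobs at once and wait for all of them
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            return list(pool.map(fake_job, names))
    # run the jobs one after the other
    return [fake_job(name) for name in names]


names = ['eng_cut_20', 'eng_cut_30', 'eng_cut_50']
for multi_task in (True, False):
    start = time.perf_counter()
    done = run_jobs(names, multi_task=multi_task)
    elapsed = time.perf_counter() - start
    print('multi_task =', multi_task, done, 'elapsed %.2f s' % elapsed)
```

Both modes return the results in the same order; only the elapsed time is expected to change.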

In [34]:
code1.update_global_options(multiTask=False)
code2.update_global_options(multiTask=False)

In [35]:
results = gs_convergence.run()
results

delete log file: Si_gs_convergence/eng_cut_20.log
delete xml file: Si_gs_convergence/eng_cut_20.xml
delete folder: Si_gs_convergence/eng_cut_20.save
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log
run0_is_running:True  
Job completed
delete log file: Si_gs_convergence/eng_cut_30.log
delete xml file: Si_gs_convergence/eng_cut_30.xml
delete folder: Si_gs_convergence/eng_cut_30.save
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log
run0_is_running:True  
Job completed
delete log file: Si_gs_convergence/eng_cut_50.log
delete xml file: Si_gs_convergence/eng_cut_50.xml
delete folder: Si_gs_convergence/eng_cut_50.save
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log
run0_is_running:True  
Job completed
delete log file: Si_gs_convergence/eng_cut_40.log
delete xml file: Si_gs_convergence/eng_cut_40.xml
delete folder: Si_gs_convergence/eng_cut_40.save
Executin

{0: 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml',
 1: 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml',
 3: 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml',
 2: 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}

### Parsing of the results

One way to parse the results is _a posteriori_, after the run of the dataset.

For instance we can parse the results with the PwParser class of this package

In [36]:
from mppi import Parsers as P
results = {}
for run,data in gs_convergence.results.items():
    results[run] = P.PwParser(data)

Parse file : Si_gs_convergence/eng_cut_20.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_30.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_50.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_40.save/data-file-schema.xml


In [37]:
results

{0: <mppi.Parsers.PwParser.PwParser at 0x7fca86fecd30>,
 1: <mppi.Parsers.PwParser.PwParser at 0x7fca9d8959b0>,
 3: <mppi.Parsers.PwParser.PwParser at 0x7fca9d895a90>,
 2: <mppi.Parsers.PwParser.PwParser at 0x7fca9d84a8d0>}

The results associated to the key "i" correspond to the i-th element appended to the dataset.

The input parameters associated to each key of results are stored in the gs_convergence.runs list.

For instance the total energy is extracted as

In [38]:
for run,res in results.items():
    print('run',run,'energy',res.get_energy(convert_eV=False))

run 0 energy -7.870821313272014
run 1 energy -7.872953197530331
run 3 energy -7.874492376312396
run 2 energy -7.874327291306248
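
A quick way to read these numbers is to look at how much the total energy changes between successive cutoffs. The snippet below is a stand-alone sketch that copies the printed values by hand (run 2 corresponds to the cutoff 40 and run 3 to the cutoff 50); the variable names are illustrative and the energies are in the units returned by get_energy(convert_eV=False).

```python
# total energies printed above, keyed by the cutoff of the corresponding run
energies = {20: -7.870821313272014,
            30: -7.872953197530331,
            40: -7.874327291306248,
            50: -7.874492376312396}

cutoffs = sorted(energies)
diffs = {}
for previous, current in zip(cutoffs, cutoffs[1:]):
    # change of the total energy when the cutoff is raised to `current`
    diffs[current] = abs(energies[current] - energies[previous])
    print('cutoff', current, 'change', '%.2e' % diffs[current])
```

The change shrinks at every step, which is the behaviour expected from a converging quantity.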


### Usage of the post processing function

The parsing, or other more specific procedures, can be performed directly when the run method is called.

To do so, we define a post processing function and pass it to the Dataset.

The class applies this function when the run method of Dataset is called. For instance, in this way we can directly
extract the total energy

In [40]:
def extract_energy(dataset): 
    from mppi import Parsers as P
    energy = {}
    for run,data in dataset.results.items():
        results = P.PwParser(data,verbose=False)
        energy[run] = results.get_energy(convert_eV = False)
    return energy

In [41]:
gs_convergence.set_postprocessing_function(extract_energy)

Once the post processing function is passed to the dataset, it is applied directly when the run is executed

In [42]:
code1.update_global_options(verbose=False,skip=True,multiTask=True)
code2.update_global_options(verbose=False,skip=True,multiTask=True)
gs_convergence.run()

{0: -7.870821313272014,
 1: -7.872953197530331,
 3: -7.874492376312396,
 2: -7.874327291306248}

In [52]:
gs_convergence.post_processing()

{0: -7.870821313272014,
 1: -7.872953197530331,
 3: -7.874492376312396,
 2: -7.874327291306248}

Note that the attribute results always contains the names of the xml data files, while the post processed results
can be accessed with the post_processing() method of the class.
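
The separation between the raw results and the post processed ones can be summarized with a toy class. MiniDataset and file_stems are hypothetical names used only for this sketch; the run method does not launch any computation and simply fills the results with placeholder file names.

```python
class MiniDataset:
    """Hypothetical container that mimics the post-processing hook."""

    def __init__(self):
        self.results = {}
        self._post_processing = None

    def set_postprocessing_function(self, func):
        self._post_processing = func

    def post_processing(self):
        # without a function the raw results are returned unchanged
        if self._post_processing is None:
            return self.results
        return self._post_processing(self)

    def run(self):
        # a real dataset would launch the computations here; this sketch
        # only fills the results with placeholder file names
        self.results = {0: 'eng_cut_20.xml', 1: 'eng_cut_30.xml'}
        return self.post_processing()


def file_stems(dataset):
    # toy post-processing: strip the extension from each result
    return {run: name.rsplit('.', 1)[0] for run, name in dataset.results.items()}


ds = MiniDataset()
ds.set_postprocessing_function(file_stems)
print(ds.run())
print(ds.results)
```

Since the function receives the whole dataset, it can read both the results and the global options, which is the design exploited by the post processing functions of this tutorial.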

In [None]:
######################################################################################

In [35]:
def name_from_id(id):
    """
    Convert the id into a run name. If id is a string, set name = id, if it is a
    dictionary build the name string of the run from the id dictionary.

    Args:
        id : id associated to the run
    Returns:
        name (str): name of the run associated to the dictionary ``id``
    """
    if isinstance(id, str):
        name = id
    elif isinstance(id, dict):
        # sort the keys so that the name does not depend on the insertion order
        name = '-'.join(k + '_' + str(id[k]) for k in sorted(id))
    else:
        print('id type not recognized')
        name = None
    return name

In [36]:
ids = [{'ecut' : 40},{'ecut' : 40, 'k' : 6},{'ecut' : 40, 'k' : 7, 's' : 0}]
sel_id = {'ecut' : 40, 'k' : 6}

In [37]:
names = [name_from_id(id) for id in ids]
names

['ecut_40', 'ecut_40-k_6', 'ecut_40-k_7-s_0']

In [38]:
sel_name = name_from_id(sel_id)
sel_name

'ecut_40-k_6'

In [40]:
fetch_indices = []
for irun,name in enumerate(names):
    if sel_name in name : fetch_indices.append(irun)
fetch_indices

[1]

In [24]:
bla = [0,1,3]

In [25]:
bla.index(3)

2

In [None]:
##################################################################################################

### Usage of the fetch_results method

Another possible approach is to define a post processing function that performs a simple parsing of the data.

Then we can use fetch_results to look for the attribute energy in the computation(s) that match the id
passed to fetch_results

In [60]:
def parse_data(dataset):
    from mppi import Parsers as P
    results = {}
    for run,data in dataset.results.items():
        results[run] = P.PwParser(data,verbose=False)
    return results

In [61]:
gs_convergence.set_postprocessing_function(parse_data)

In [64]:
gs_convergence.fetch_results(id={'eng_cut': 50},attribute='energy')

[3]
[]


[-7.874492376312396]

Note that it is not necessary to run the dataset beforehand, since the fetch_results method performs the runs that match
the id (if the option run_if_not_present=True is used)
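
The selection step can be sketched without the package. The function fetch below is a hypothetical simplification: it assumes that a run matches when its id is equal to the requested one and that the attribute is read with getattr, and it skips the step that runs the missing computations. The parsed results are replaced by placeholder objects that carry the energies obtained earlier in this tutorial.

```python
from types import SimpleNamespace


def fetch(ids, parsed, id, attribute=None):
    """
    Return the results of the runs whose id equals ``id``.

    ids is the list of run ids, parsed maps the run index to its result.
    """
    indices = [irun for irun, run_id in enumerate(ids) if run_id == id]
    values = [parsed[irun] for irun in indices]
    if attribute is not None:
        # extract the requested attribute from each result object
        values = [getattr(value, attribute) for value in values]
    return values


ids = [{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]
energies = [-7.870821313272014, -7.872953197530331,
            -7.874327291306248, -7.874492376312396]
# placeholder objects standing in for the parsed results of each run
parsed = {irun: SimpleNamespace(energy=e) for irun, e in enumerate(energies)}
print(fetch(ids, parsed, id={'eng_cut': 50}, attribute='energy'))
```

A list is returned because, in general, more than one run can match the requested id.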

### Usage of the seek_convergence method

We present the functionality of this method by performing a second convergence test on the number of kpoints.

In this example we set the energy cutoff to 60 Ry and build a new dataset, appending runs with an increasing number of
kpoints.

In [78]:
inp = I.PwInput('IO_files/si_scf.in')
inp.set_energy_cutoff(60)

In [80]:
code = C.QeCalculator(skip=True,verbose=False)
#code.global_options()

Initialize a QuantumESPRESSO calculator with OMP_NUM_THREADS=1 and command mpirun -np 4 pw.x


In [81]:
gs_kpoint = D.Dataset(label='Si_kpoints_convergence',run_dir='Si_gs_convergence')

In [82]:
kpoints = [2,3,4,5,6,7,8]

In [84]:
for k in kpoints:
    id = {'kp':k}
    inp.set_kpoints(points = [k,k,k])
    inp.set_prefix(D.name_from_id(id))
    gs_kpoint.append_run(id=id,runner=code,input=inp)

The runs have been appended but not performed; we now call seek_convergence.

We want to perform a convergence procedure based on the value of the total energy of the system,
so we can use the post processing function that directly provides this quantity

In [85]:
gs_kpoint.set_postprocessing_function(extract_energy)

In [86]:
gs_kpoint.seek_convergence(rtol=0.001)

Fetching results for id " {'kp': 2} "
Fetching results for id " {'kp': 3} "
Fetching results for id " {'kp': 4} "
Fetching results for id " {'kp': 5} "
Convergence reached in Dataset "Si_kpoints_convergence" for id " {'kp': 4} "


({'kp': 4}, -7.874513952262473)

seek_convergence runs the computations (in the order provided by append_run) until convergence is reached.
It is also possible to pass a list of ids as argument of the method; in this case the calculations are restricted
to the simulations associated to the provided ids.

It is also possible to use a more generic post processing function that simply parses the data.
In this case we can choose which quantity is used to check whether convergence is reached by specifying the attribute = ...
option in the call of seek_convergence. For instance

In [87]:
gs_kpoint.set_postprocessing_function(parse_data)

In [98]:
gs_kpoint.seek_convergence(rtol=0.001,attribute='energy')

Fetching results for id " {'kp': 2} "
Fetching results for id " {'kp': 3} "
Fetching results for id " {'kp': 4} "
Fetching results for id " {'kp': 5} "
Convergence reached in Dataset "Si_kpoints_convergence" for id " {'kp': 4} "


({'kp': 4}, -7.874513952262473)

## Perform a convergence test for Hartree-Fock computations with Yambo

We consider a set of Hartree-Fock computations for silicon and we look for the value of EXXRLvcs that ensures
a converged value of the direct gap.

First of all we need an nscf computation. We start from the scf result with ecutoff = 60 and kpoints = [4,4,4]

In [3]:
inp = I.PwInput('Si_gs_convergence/kp_4.in')
inp.set_nscf(8)
inp.set_kpoints(points = [6,6,6]) #nscf kpoints can be different from the scf
name = 'nscf_kp6_ecut60'
inp.set_prefix(name)
#inp

In [117]:
code = C.QeCalculator()
code.global_options()

Initialize a QuantumESPRESSO calculator with OMP_NUM_THREADS=1 and command mpirun -np 4 pw.x


{'omp': 1,
 'mpi_run': 'mpirun -np 4',
 'executable': 'pw.x',
 'skip': False,
 'verbose': True}

In [118]:
code.run(run_dir='Si_gs_convergence',input=inp,name=name,source_dir='Si_gs_convergence/kp_4.save')

delete log file: Si_gs_convergence/nscf_kp6_ecut60.log
delete xml file: Si_gs_convergence/nscf_kp6_ecut60.xml
delete folder: Si_gs_convergence/nscf_kp6_ecut60.save
Copy source_dir Si_gs_convergence/kp_4.save in the Si_gs_convergence/nscf_kp6_ecut60.save
Run directory Si_gs_convergence
Executing command: mpirun -np 4 pw.x -inp nscf_kp6_ecut60.in > nscf_kp6_ecut60.log


'Si_gs_convergence/nscf_kp6_ecut60.save/data-file-schema.xml'

The next step is the generation of the run_dir and SAVE folder

In [119]:
from mppi import Utilities as U

In [3]:
run_dir = 'Si_hf_convergence'
source_dir = 'Si_gs_convergence/nscf_kp6_ecut60.save'

In [122]:
U.build_SAVE(source_dir,run_dir)

Create folder Si_hf_convergence
Executing command: cd Si_gs_convergence/nscf_kp6_ecut60.save; p2y -a 2
Executing command: cp -r Si_gs_convergence/nscf_kp6_ecut60.save/SAVE Si_hf_convergence
Executing command: cd Si_hf_convergence;OMP_NUM_THREADS=1 yambo


Now we are ready to build the Yambo dataset

In [4]:
code = C.YamboCalculator(skip=True)

Initialize a Yambo calculator with OMP_NUM_THREADS=1 and command mpirun -np 4 yambo


In [5]:
inp = I.YamboInput(args='yambo -x -V rl',folder=run_dir)
# we are interested in the direct gap at Gamma, so we include only the first kpoint
inp['variables']['QPkrange'] = [[1,1,1,8],'']
inp

{'args': 'yambo -x -V rl',
 'folder': 'Si_hf_convergence',
 'filename': 'yambo.in',
 'arguments': ['HF_and_locXC'],
 'variables': {'FFTGvecs': [2733.0, 'RL'],
  'SE_Threads': [0.0, ''],
  'EXXRLvcs': [17153.0, 'RL'],
  'QPkrange': [[1, 1, 1, 8], '']}}

In [6]:
hf_convergence = D.Dataset(label='Si_hf',run_dir=run_dir)

Let us start by adding some computations to see how to manage the data

In [7]:
exx_values = [1.,2.,3.,4.] #in Hartree

In [8]:
for e in exx_values:
    id = {'exxrl' : e}
    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']
    hf_convergence.append_run(id=id,input=inp,runner=code)

If needed we can also pass the jobname attribute by adding, for instance,

jobname=D.name_from_id(id)+'-job'

to the append_run call

In [9]:
hf_convergence.run()

Run directory Si_hf_convergence
Skip the computation for input exxrl_1.0
Run directory Si_hf_convergence
Skip the computation for input exxrl_2.0
Run directory Si_hf_convergence
Skip the computation for input exxrl_3.0
Run directory Si_hf_convergence
Skip the computation for input exxrl_4.0


{0: {'output': ['Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf'],
  'ndb': ['Si_hf_convergence/exxrl_1.0/ndb.HF_and_locXC']},
 1: {'output': ['Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf'],
  'ndb': ['Si_hf_convergence/exxrl_2.0/ndb.HF_and_locXC']},
 2: {'output': ['Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf'],
  'ndb': ['Si_hf_convergence/exxrl_3.0/ndb.HF_and_locXC']},
 3: {'output': ['Si_hf_convergence/exxrl_4.0/o-exxrl_4.0.hf'],
  'ndb': ['Si_hf_convergence/exxrl_4.0/ndb.HF_and_locXC']}}

### Parsing the results with a post processing function

We can define a general post processing function to extract all the results from the o- files of the dataset.

We can use the YamboParser class of this package

In [10]:
def parse_data(dataset):
    from mppi import Parsers as P
    results = {}
    for run,data in dataset.results.items():
        results[run] = P.YamboParser(data['output'],verbose=True)
    return results

In [11]:
hf_convergence.set_postprocessing_function(parse_data)

In [12]:
code.update_global_options(verbose=False,skip=True)
results = hf_convergence.run()

Parse file Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf
Parse file Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf
Parse file Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf
Parse file Si_hf_convergence/exxrl_4.0/o-exxrl_4.0.hf


Results can be extracted as

In [14]:
for irun in results:
    print(results[irun]['hf']['ehf'])

[-19.05416   -1.101     -1.067     -1.62535    6.79846    6.790734
   6.649487   7.654567]
[-19.13128   -1.552     -1.519     -2.08017    6.501243   6.493162
   6.351671   7.156224]
[-19.16567   -1.701     -1.667     -2.22218    6.34695    6.329222
   6.187717   6.996501]
[-19.17205   -1.718     -1.684     -2.23912    6.328658   6.310536
   6.169065   6.977456]


### Computing the direct gap with a post processing function

We describe the usage of a post processing function to perform a more specific operation like computing
the direct band gap. We define the post processing function

In [59]:
def get_direct_gap(dataset):
    """
    Compute the direct band gap assuming that there is only one kpoint.
    The arguments energy_col, val_band and cond_band are read from the global_options
    of the dataset.
    """
    from mppi import Parsers as P
    import numpy as np
    glob_opt = dataset.global_options()
    val_band = glob_opt.get('val_band',1)
    cond_band = glob_opt.get('cond_band',1)
    # the name of the column used to compute the gap
    energy_col = glob_opt.get('energy_col','hf') 
    gap = {}
    for run,data in dataset.results.items():
        results = P.YamboParser(data['output'])
        key = list(results.keys())[0] # select the key (can be hf or qp)
        bands = results[key]['band']
        index_val = np.where(bands == val_band)
        index_cond = np.where(bands == cond_band)
        energy = results[key][energy_col]
        delta = energy[index_cond]-energy[index_val]
        # delta is a one-element array: .item() extracts the Python float
        gap[run] = delta.item()
    return gap

This function assumes that some inputs, like the specification of the conduction and valence bands, are given in the global options
of the dataset. So we can set

In [60]:
hf_convergence.update_global_options(val_band = 4, cond_band = 5, energy_col = 'hf')

Then we set the new post processing function and run the dataset

In [61]:
hf_convergence.set_postprocessing_function(get_direct_gap)

In [62]:
hf_convergence.run()

{0: 6.5168870000000005, 1: 6.67449, 2: 6.662207, 3: 6.660855000000001}
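
The band selection performed inside get_direct_gap can be checked in isolation with NumPy. The sketch below is independent of the parser: it copies by hand the ehf values printed earlier for the first run and assumes that they are ordered from band 1 to band 8. Since it uses the ehf column instead of the hf column chosen through energy_col, the number it prints differs from the gaps listed above.

```python
import numpy as np

# band indices (assumed 1..8) and ehf values of the first run printed earlier
bands = np.arange(1, 9)
ehf = np.array([-19.05416, -1.101, -1.067, -1.62535,
                6.79846, 6.790734, 6.649487, 7.654567])

val_band, cond_band = 4, 5
# boolean masks select the entries of the requested bands
e_val = ehf[bands == val_band]
e_cond = ehf[bands == cond_band]
# .item() converts the one-element array into a Python float
gap = (e_cond - e_val).item()
print(gap)
```

Selecting by band label rather than by position keeps the function valid when QPkrange does not start from the first band.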

### Usage of seek convergence

The post processing function defined above can be used together with the seek_convergence method to perform a convergence study.

In this case we define a new dataset and append many possible runs. Only the ones needed to reach the given tolerance will be executed

In [63]:
hf_convergence2 = D.Dataset(label='Si_hf',run_dir=run_dir,val_band = 4, cond_band = 5, energy_col = 'hf')

In [64]:
exx_values = [float(i) for i in range(1,10)] #in Hartree
exx_values

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

In [65]:
for e in exx_values:
    id = {'exxrl' : e}
    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']
    hf_convergence2.append_run(id=id,input=inp,runner=code)

In [66]:
hf_convergence2.set_postprocessing_function(get_direct_gap)

In [67]:
hf_convergence2.seek_convergence(rtol=0.0001)

Fetching results for id " {'exxrl': 1.0} "
Fetching results for id " {'exxrl': 2.0} "
Fetching results for id " {'exxrl': 3.0} "
Fetching results for id " {'exxrl': 4.0} "
Fetching results for id " {'exxrl': 5.0} "
Fetching results for id " {'exxrl': 6.0} "
Convergence reached in Dataset "Si_hf" for id " {'exxrl': 5.0} "


({'exxrl': 5.0}, 6.661784)
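
The stopping rule suggested by these outputs can be sketched in a few lines. This is an assumption about the logic, not the implementation of the package: the values are evaluated in order, each one is compared with the previous one, and the search stops when their difference is within rtol times the latest value. The gaps are the ones obtained in this tutorial for EXXRLvcs from 1 to 5 Hartree; since the value at 6 Hartree is not printed, a looser tolerance than the one used above is chosen.

```python
def seek_convergence(ids, compute, rtol=1e-3):
    """
    Evaluate compute(id) in order and stop as soon as two successive
    values agree within the relative tolerance rtol.

    Returns the id and the value of the first element of the converged pair,
    or None if the tolerance is never met.
    """
    previous_id, previous = ids[0], compute(ids[0])
    for current_id in ids[1:]:
        current = compute(current_id)
        if abs(current - previous) <= rtol * abs(current):
            return previous_id, previous
        previous_id, previous = current_id, current
    return None


# direct gaps obtained in this tutorial, keyed by the EXXRLvcs value in Hartree
gaps = {1.0: 6.516887, 2.0: 6.67449, 3.0: 6.662207, 4.0: 6.660855, 5.0: 6.661784}
evaluated = []


def compute(id):
    evaluated.append(id)  # keep track of which runs were actually needed
    return gaps[id['exxrl']]


ids = [{'exxrl': e} for e in sorted(gaps)]
result = seek_convergence(ids, compute, rtol=1e-3)
print(result)
print(len(evaluated), 'of', len(ids), 'runs evaluated')
```

With this tolerance the last appended run is never evaluated, which mirrors the fact that only the runs needed to reach convergence are executed.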