In [1]:
# useful to autoreload the module without restarting the kernel
%load_ext autoreload
%autoreload 2

In [2]:
from mppi import InputFiles as I, Calculators as C, Datasets as D

# Tutorial for the Dataset module

Dataset is the class used to build, run and post-process a set of calculations performed with QuantumESPRESSO and/or Yambo.

Here we discuss some explicit examples to describe the usage and the main features of the package.

## Perform a convergence analysis for the ground-state energy of Silicon

We use this class to find the value of the energy cutoff that guarantees a converged result for the
ground state energy of Silicon.

We start from a given input file for Silicon

In [3]:
inp = I.PwInput(file='IO_files/si_scf.in')
#inp

We then define a Calculator that the Dataset class will use to run the computations

In [4]:
code = C.QeCalculator(omp = 1, mpi_run='mpirun -np 4', skip = True, verbose= True)

Initialize a QuantumESPRESSO calculator with OMP_NUM_THREADS=1 and command mpirun -np 4 pw.x


Now we can define the instance of Dataset to perform the convergence procedure

In [20]:
gs_convergence = D.Dataset(label='Si_gs_convergence',run_dir='Si_gs_convergence')

Dataset inherits from Runner, so it has the same structure as QeCalculator and YamboCalculator and the same methods can be used
to access its global options

In [21]:
gs_convergence.global_options()

{'label': 'Si_gs_convergence', 'run_dir': 'Si_gs_convergence'}

The next step is to append to the Dataset all the calculations that we want to perform later.

For instance, we can first append a set of calculations as a function of the cutoff energy

In [22]:
from mppi.Utilities import Utils as U

In [23]:
energy_cutoffs = [20,30,40,50] # in Ry

In [24]:
for e in energy_cutoffs:
    idd = {'eng_cut' : e} #id that identifies the run in the Dataset
    inp.set_prefix(U.name_from_id(idd)) #attribute the id as the prefix of the input
    inp.set_energy_cutoff(e)
    gs_convergence.append_run(id=idd,runner=code,input=inp)

The append_run method sets the attributes of the object, for instance

In [25]:
print(gs_convergence.ids) # identify each element of the dataset
print(gs_convergence.names) # name of the input file written on disk
print(gs_convergence.calculators)

[{'eng_cut': 20}, {'eng_cut': 30}, {'eng_cut': 40}, {'eng_cut': 50}]
['eng_cut_20', 'eng_cut_30', 'eng_cut_40', 'eng_cut_50']
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f15a818d518>, 'runs': [0, 1, 2, 3]}]
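
The naming pattern visible in the output above (`{'eng_cut': 20}` becomes `eng_cut_20`) can be reproduced in a few lines of plain Python. This is only a sketch of the mapping and not the code of `U.name_from_id`: the helper name `name_from_id_sketch` is hypothetical, and joining several keys with `-` is an assumption, since the tutorial only shows ids with a single key.

```python
def name_from_id_sketch(run_id):
    """Build a file-friendly name from a run id (hypothetical helper).

    A string id is returned unchanged; a dict id is flattened as key_value.
    Joining several keys with '-' is an assumption of this sketch.
    """
    if isinstance(run_id, str):
        return run_id
    return '-'.join(f'{key}_{run_id[key]}' for key in sorted(run_id))

ids = [{'eng_cut': 20}, {'eng_cut': 30}, {'eng_cut': 40}, {'eng_cut': 50}]
names = [name_from_id_sketch(run_id) for run_id in ids]
print(names)
```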


gs_convergence.runs is a list that contains, for each appended run, the merge of the input object and the global options.
In this way one can check which input is associated with each run

In [26]:
gs_convergence.runs[2] #give the parameters of the third computation appended to the dataset

{'label': 'Si_gs_convergence',
 'run_dir': 'Si_gs_convergence',
 'input': {'control': {'verbosity': "'high'",
   'pseudo_dir': "'../pseudos'",
   'calculation': "'scf'",
   'prefix': "'eng_cut_40'"},
  'system': {'force_symmorphic': '.true.',
   'occupations': "'fixed'",
   'ibrav': '2',
   'celldm(1)': '10.3',
   'ntyp': '1',
   'nat': '2',
   'ecutwfc': 40},
  'electrons': {'conv_thr': '1e-08'},
  'ions': {},
  'cell': {},
  'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
  'atomic_positions': {'type': 'crystal',
   'values': [['Si', [0.125, 0.125, 0.125]],
    ['Si', [-0.125, -0.125, -0.125]]]},
  'kpoints': {'type': 'automatic',
   'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
  'cell_parameters': {},
  'file': 'IO_files/si_scf.in'}}
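
The structure of one entry of `runs` can be pictured as the global options of the Dataset extended with an `input` key. The snippet below is a plain-dictionary sketch under that assumption. The tutorial reuses the same `inp` object in the loop and `runs[2]` still reports `ecutwfc: 40`, which suggests that a copy of the input is stored; here this is mimicked with `copy.deepcopy`. The trimmed `input_sketch` dictionary keeps only a few of the fields shown above and is not how the package builds the entry internally.

```python
import copy

global_options = {'label': 'Si_gs_convergence', 'run_dir': 'Si_gs_convergence'}

# Trimmed stand-in for the PwInput object (only a few fields are kept)
input_sketch = {'control': {'prefix': "'eng_cut_40'"}, 'system': {'ecutwfc': 40}}

# Deep copy, so that later changes to the input do not alter the stored run
run_entry = {**global_options, 'input': copy.deepcopy(input_sketch)}

# Reuse the same input for the next run, as done in the loop of the tutorial
input_sketch['system']['ecutwfc'] = 50
print(run_entry['input']['system']['ecutwfc'])
```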

The attribute .results is a dictionary that is empty before the run

In [27]:
gs_convergence.results

{}

Before running the Dataset we can add another computation, performed with a different calculator

In [28]:
code2 = C.QeCalculator(omp = 1, mpi_run='mpirun -np 2')

Initialize a QuantumESPRESSO calculator with OMP_NUM_THREADS=1 and command mpirun -np 2 pw.x


In [29]:
idd = 'code_number2' # we can use a string as id of the run
inp.set_prefix(U.name_from_id(idd))
inp.set_energy_cutoff(60)
gs_convergence.append_run(id=idd,runner=code2,input=inp,verbose = False)

In [30]:
print(gs_convergence.ids)
gs_convergence.calculators

[{'eng_cut': 20}, {'eng_cut': 30}, {'eng_cut': 40}, {'eng_cut': 50}, 'code_number2']


[{'calc': <mppi.Calculators.QeCalculator.QeCalculator at 0x7f15a818d518>,
  'runs': [0, 1, 2, 3]},
 {'calc': <mppi.Calculators.QeCalculator.QeCalculator at 0x7f15a80a1278>,
  'runs': [4]}]
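
The bookkeeping shown in `calculators` (one entry per calculator, each with the indices of its runs) can be sketched as follows. The helper `register_calculator` is hypothetical and the two `object()` instances are stand-ins for the calculators; the snippet only illustrates the data layout, not the package internals.

```python
def register_calculator(calculators, calc, run_index):
    """Record that run number run_index uses calc (hypothetical helper)."""
    for entry in calculators:
        if entry['calc'] is calc:
            # calculator already known: add the run to its list
            entry['runs'].append(run_index)
            return
    # first run that uses this calculator: create a new entry
    calculators.append({'calc': calc, 'runs': [run_index]})

code_a, code_b = object(), object()  # stand-ins for two calculator instances
calculators = []
for index, calc in enumerate([code_a, code_a, code_a, code_a, code_b]):
    register_calculator(calculators, calc, index)

print([entry['runs'] for entry in calculators])
```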

Once all the computations have been added we can run the Dataset

In [31]:
gs_convergence.run()

Run directory Si_gs_convergence
Skip the computation for input  eng_cut_20
Run directory Si_gs_convergence
Skip the computation for input  eng_cut_30
Run directory Si_gs_convergence
Skip the computation for input  eng_cut_40
Run directory Si_gs_convergence
Skip the computation for input  eng_cut_50


{0: {'xml_data': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},
 1: {'xml_data': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},
 2: {'xml_data': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'},
 3: {'xml_data': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},
 4: {'xml_data': 'Si_gs_convergence/code_number2.save/data-file-schema.xml'}}

Note that the run information of code2 is not printed, since verbose = False was provided when the run was appended.

The run method returns the attribute .results of the Dataset. 

In [33]:
gs_convergence.results

{0: {'xml_data': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},
 1: {'xml_data': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},
 2: {'xml_data': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'},
 3: {'xml_data': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},
 4: {'xml_data': 'Si_gs_convergence/code_number2.save/data-file-schema.xml'}}

This implementation allows the data to be parsed after the execution of the dataset, and the parser to be chosen
among several options at that stage.

## Parsing of the results

One way to parse the results is a posteriori, after the run of the dataset.

For instance we can parse the results with the PwParser class of this package

In [34]:
from mppi import Parsers as P

results = {}
for run,data in gs_convergence.results.items():
    results[run] = P.PwParser(data['xml_data'])

Parse file : Si_gs_convergence/eng_cut_20.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_30.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_40.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_50.save/data-file-schema.xml
Parse file : Si_gs_convergence/code_number2.save/data-file-schema.xml


In [35]:
results

{0: <mppi.Parsers.PwParser.PwParser at 0x7f15a80a1a20>,
 1: <mppi.Parsers.PwParser.PwParser at 0x7f158f043748>,
 2: <mppi.Parsers.PwParser.PwParser at 0x7f158f040e48>,
 3: <mppi.Parsers.PwParser.PwParser at 0x7f158f029fd0>,
 4: <mppi.Parsers.PwParser.PwParser at 0x7f158efe7860>}

The results dictionary is a separate object from gs_convergence; however, the result associated with the key "i"
corresponds to the i-th run appended to the dataset.

The input parameters associated with each key of results are stored in gs_convergence.runs[key].

For instance the total energy is extracted as

In [36]:
for run,res in results.items():
    print('run',run,'energy',res.get_energy(convert_eV=False))

run 0 energy -7.870821313426413
run 1 energy -7.872953197509275
run 2 energy -7.874327291306248
run 3 energy -7.874492376334014
run 4 energy -7.874513952356973
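
Because the keys of `results` follow the order in which the runs were appended, they can also be used to index `gs_convergence.ids`. The sketch below copies the ids and the energies printed above into plain Python containers, so that it runs without the package, and prints the energy variation between consecutive runs. The name `variations` is introduced here only for illustration.

```python
ids = [{'eng_cut': 20}, {'eng_cut': 30}, {'eng_cut': 40}, {'eng_cut': 50}, 'code_number2']
energies = {0: -7.870821313426413, 1: -7.872953197509275, 2: -7.874327291306248,
            3: -7.874492376334014, 4: -7.874513952356973}

# Energy variation with respect to the previously appended run
variations = {run: energies[run] - energies[run - 1] for run in range(1, len(ids))}

for run, delta in variations.items():
    print(ids[run], 'energy', energies[run], 'variation', delta)
```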


### Usage of the post processing function

The parsing, or other more specific procedures, can be performed directly when the run method is called.

To do so, we define a post processing function and pass it to the Dataset.

The class applies this function when the run of the Dataset is complete. For instance, in this way we can directly
extract the total energy

In [37]:
def extract_energy(dataset):
    from mppi import Parsers as P
    energy = {}
    # dataset.results maps each run index to the dictionary with the xml output path
    for run,data in dataset.results.items():
        results = P.PwParser(data['xml_data'],verbose=False)
        energy[run] = results.get_energy(convert_eV = False)
    return energy

In [38]:
gs_convergence.set_postprocessing_function(extract_energy)

Once the post processing function has been passed to the dataset, it is applied automatically when the run is executed

In [39]:
code.update_global_options(verbose=False)
gs_convergence.run()

{0: -7.870821313426413,
 1: -7.872953197509275,
 2: -7.874327291306248,
 3: -7.874492376334014,
 4: -7.874513952356973}
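
The mechanism can be pictured with a small stand-in class: `run` stores the raw results and, if a post processing function has been set, returns what that function computes from the object itself. `MiniDataset` and `list_files` are hypothetical names used only for this sketch, which does not reproduce the package internals and does not execute any computation.

```python
class MiniDataset:
    """Toy container that mimics the post-processing hook (illustration only)."""

    def __init__(self):
        self.results = {}
        self._post = None

    def set_postprocessing_function(self, func):
        self._post = func

    def run(self):
        # Stand-in for the real computations: store one fake output path per run
        self.results = {0: {'xml_data': 'run_0.xml'}, 1: {'xml_data': 'run_1.xml'}}
        # The hook receives the whole object, so it can read self.results
        return self._post(self) if self._post is not None else self.results

def list_files(dataset):
    return {run: out['xml_data'] for run, out in dataset.results.items()}

toy = MiniDataset()
raw = toy.run()                      # no hook set: the raw results are returned
toy.set_postprocessing_function(list_files)
processed = toy.run()                # hook set: its output is returned instead
print(processed)
```

Note that `toy.results` still holds the raw dictionaries after the second call, in line with the behaviour described for the attribute results.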

Note that the attribute results always contains the name of the xml data, while the post processed results
can be accessed in the class as self.post_processing().

For this reason the method fetch_results has been slightly modified with respect to the original implementation
of PyBigDFT.

Since in this case the post processing function directly returns the energy, we can use the method fetch_results without specifying the
attribute to extract the energy for a specific value of the energy cutoff

In [40]:
gs_convergence.fetch_results(id={'eng_cut' : 30})

[-7.872953197509275]

### Usage of the fetch_results method

Another possible approach is to define a post processing function that performs a simple parsing of the data.

Then we can use fetch_results to look up the attribute energy in the computation(s) that match the id
passed to fetch_results

In [41]:
def parse_data(dataset):
    from mppi import Parsers as P
    results = {}
    # build one parser object for each run of the dataset
    for run,data in dataset.results.items():
        results[run] = P.PwParser(data['xml_data'],verbose=False)
    return results

In [42]:
gs_convergence.set_postprocessing_function(parse_data)

In [43]:
gs_convergence.run()

{0: <mppi.Parsers.PwParser.PwParser at 0x7f15a80a17f0>,
 1: <mppi.Parsers.PwParser.PwParser at 0x7f15a80a1c18>,
 2: <mppi.Parsers.PwParser.PwParser at 0x7f158ef0a7f0>,
 3: <mppi.Parsers.PwParser.PwParser at 0x7f158eec6d68>,
 4: <mppi.Parsers.PwParser.PwParser at 0x7f158ee869b0>}

In [44]:
gs_convergence.fetch_results(id={'eng_cut': 50},attribute='energy')

[-7.874492376334014]
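
The two ways of calling fetch_results (with and without `attribute`) can be summarised by the sketch below. It is a simplified stand-in, not the method of the package: the name `fetch_results_sketch` is hypothetical, ids are matched by plain equality (an assumption made for brevity), and `SimpleNamespace` objects play the role of the parsers.

```python
from types import SimpleNamespace

def fetch_results_sketch(ids, results, wanted_id, attribute=None):
    """Return the results of the runs whose id equals wanted_id (hypothetical helper)."""
    selected = []
    for run, run_id in enumerate(ids):
        if run_id != wanted_id:
            continue
        value = results[run]
        # without attribute the stored object is returned as it is
        selected.append(value if attribute is None else getattr(value, attribute))
    return selected

ids = [{'eng_cut': 40}, {'eng_cut': 50}]
parsed = {0: SimpleNamespace(energy=-7.874327291306248),
          1: SimpleNamespace(energy=-7.874492376334014)}

print(fetch_results_sketch(ids, parsed, {'eng_cut': 50}, attribute='energy'))
```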

### Usage of the seek_convergence method

We present the functionality of this method by performing a second convergence test on the number of kpoints.

In this example we set the energy cutoff to 60 Ry and build a new dataset by appending runs with an increasing number of
kpoints.

In [45]:
inp = I.PwInput('IO_files/si_scf.in')
inp.set_energy_cutoff(60)

In [46]:
code = C.QeCalculator(skip=True,verbose=False)
code.global_options()

Initialize a QuantumESPRESSO calculator with OMP_NUM_THREADS=1 and command mpirun -np 4 pw.x


{'omp': 1,
 'mpi_run': 'mpirun -np 4',
 'executable': 'pw.x',
 'skip': True,
 'verbose': False}

In [47]:
gs_kpoint = D.Dataset(label='Si_kpoints_convergence',run_dir='Si_gs_convergence')

In [48]:
kpoints = [2,3,4,5,6,7,8]

In [49]:
for k in kpoints:
    idd = {'kp':k} # avoid shadowing the built-in id
    inp.set_kpoints(points = [k,k,k])
    inp.set_prefix(U.name_from_id(idd))
    gs_kpoint.append_run(id=idd,runner=code,input=inp)

The runs have been appended but not performed; they are executed when we call seek_convergence.

We want to perform a convergence procedure based on the value of the total energy of the system,
so we have to define a post processing function that provides this quantity

In [50]:
def extract_energy(dataset):
    from mppi import Parsers as P
    energy = {}
    # dataset.results maps each run index to the dictionary with the xml output path
    for run,data in dataset.results.items():
        results = P.PwParser(data['xml_data'],verbose=False)
        energy[run] = results.get_energy(convert_eV = False)
    return energy

In [51]:
gs_kpoint.set_postprocessing_function(extract_energy)

In [52]:
gs_kpoint.seek_convergence(rtol=0.001)

Fetching results for id " {'kp': 2} "
Fetching results for id " {'kp': 3} "
Fetching results for id " {'kp': 4} "
Fetching results for id " {'kp': 5} "
Convergence reached in Dataset "Si_kpoints_convergence" for id " {'kp': 4} "


({'kp': 4}, -7.874513952262473)

seek_convergence runs the computations (in the order provided by append_run) until convergence is reached.
Alternatively, a list of ids can be passed as argument of the method; in this case the calculations are restricted
to the simulations associated with the provided ids.
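
The log above suggests that each result is compared with the next one and that the first id whose value agrees with its successor is returned. A minimal sketch of this logic is given below. It works on values that are already available (the real method runs the computations as needed), it uses `math.isclose` for the comparison, and it raises `LookupError` when no pair agrees; all three choices, like the name `seek_convergence_sketch`, are assumptions of this example. The data are the energies of the cutoff scan printed earlier in this tutorial.

```python
import math

def seek_convergence_sketch(ids, values, rtol=1e-5):
    """Return the first (id, value) that agrees with the next value within rtol."""
    for i in range(len(ids) - 1):
        if math.isclose(values[i], values[i + 1], rel_tol=rtol):
            return ids[i], values[i]
    raise LookupError('convergence not reached with the given ids')

ids = [{'eng_cut': 20}, {'eng_cut': 30}, {'eng_cut': 40}, {'eng_cut': 50}, 'code_number2']
energies = [-7.870821313426413, -7.872953197509275, -7.874327291306248,
            -7.874492376334014, -7.874513952356973]

converged_id, converged_energy = seek_convergence_sketch(ids, energies, rtol=1e-4)
print(converged_id, converged_energy)
```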


## Perform a set of Hartree-Fock computations with Yambo

Dataset can be used to organize Yambo computations in the same way as the QE ones. The differences are the usage of YamboIn to build the input files, the usage of YamboCalculator to run the computations, and the pre_processing function.

In [42]:
code = C.YamboCalculator(omp=1,mpi_run='mpirun -np 4',executable='yambo',suffix='hf',verbose=True,skip=True)

Initialize a Yambo calculator with command OMP_NUM_THREADS=1 mpirun -np 4 yambo
Suffix for post_processing :  hf


In [43]:
yambo_hf = D.Dataset(label='Hartree-Fock',run_dir='yambo_hf',pre_processing='yambo')

In this case the pre_processing function _has to be_ called before appending the runs, because the YamboIn class needs the SAVE folder to initialize the input object.

The dataset makes use of __one__ nscf computation to build the SAVE folder that is used in all the runs

In [44]:
source = 'si_nscf/k_6.save/'

In [45]:
yambo_hf.pre_processing_function(source_dir=source)

Create folder yambo_hf
execute :  cd si_nscf/k_6.save/;p2y -a 2
execute :  cp -r si_nscf/k_6.save//SAVE yambo_hf
execute :  cd yambo_hf;OMP_NUM_THREADS=1 yambo
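
The three commands reported in the log can be rebuilt from the source and run directories as in the sketch below. The function `yambo_preprocessing_commands` is a hypothetical helper written for this tutorial: it only returns the command strings and executes nothing, and the creation of the run directory (the "Create folder" line of the log) is not included.

```python
def yambo_preprocessing_commands(source_dir, run_dir, omp=1):
    """List the shell commands reported in the log above (nothing is executed)."""
    return [
        # run p2y inside the QE .save folder to generate the SAVE folder
        f'cd {source_dir};p2y -a 2',
        # copy the SAVE folder into the run directory of the dataset
        f'cp -r {source_dir}/SAVE {run_dir}',
        # plain yambo call inside the run directory
        f'cd {run_dir};OMP_NUM_THREADS={omp} yambo',
    ]

for command in yambo_preprocessing_commands('si_nscf/k_6.save/', 'yambo_hf'):
    print('execute : ', command)
```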


Now the runs can be appended to the dataset. For instance, we perform a parametric scan of the EXXRLvcs parameter, which sets the cutoff on the G-vector components included in the exchange term (expressed here as an energy)

In [46]:
exx_values = [2.,3.,4.] #in Hartree

In [47]:
yambo_in = I.YamboIn('yambo -x -V rl',folder=yambo_hf.run_dir)

for ex in exx_values:
    idd = {'EXXRLvcs' : ex} 
    yambo_in['EXXRLvcs'] = [1000.0*ex,'mHa']
    yambo_hf.append_run(id=idd,calculator=code,input=yambo_in)  

In [48]:
yambo_hf.ids

[{'EXXRLvcs': 2.0}, {'EXXRLvcs': 3.0}, {'EXXRLvcs': 4.0}]

In [51]:
print(yambo_hf.runs[0])

HF_and_locXC
FFTGvecs = 2133.000000 RL
SE_Threads = 0.000000e+00 
EXXRLvcs = 2000.000000 mHa
% QPkrange
 1 | 32 | 1 | 10 |   
%



In [52]:
yambo_hf.run()

execute : cd yambo_hf ; OMP_NUM_THREADS=1 mpirun -np 4 yambo -F EXXRLvcs_2.0.in -J EXXRLvcs_2.0 -C EXXRLvcs_2.0
parse file : yambo_hf/EXXRLvcs_2.0/o-EXXRLvcs_2.0.hf
execute : cd yambo_hf ; OMP_NUM_THREADS=1 mpirun -np 4 yambo -F EXXRLvcs_3.0.in -J EXXRLvcs_3.0 -C EXXRLvcs_3.0
parse file : yambo_hf/EXXRLvcs_3.0/o-EXXRLvcs_3.0.hf
execute : cd yambo_hf ; OMP_NUM_THREADS=1 mpirun -np 4 yambo -F EXXRLvcs_4.0.in -J EXXRLvcs_4.0 -C EXXRLvcs_4.0
parse file : yambo_hf/EXXRLvcs_4.0/o-EXXRLvcs_4.0.hf


Results can be extracted in various ways, either with the fetch_results method or by direct access to the attributes of the Yambo parser. Here we provide some examples.

First of all, we can see the names of the attributes of each element of yambo_hf.results as follows

In [53]:
keys = yambo_hf.results[0].getAttributes()
print(keys)

dict_keys(['K-point', 'Band', 'Eo', 'Ehf', 'DFT', 'HF'])


Then, we can access the values directly as

In [54]:
yambo_hf.results[0].Ehf[0:5]

[-18.69873, -1.207, -1.438, -0.62819, 6.98599]

Or by using fetch_results

In [55]:
yambo_hf.fetch_results(id={'EXXRLvcs' : 2.0},attribute='Ehf')[0][0:5]

[-18.69873, -1.207, -1.438, -0.62819, 6.98599]

We can also use fetch_results to extract the computation(s) that we need and then access the attributes directly

In [56]:
yambo_hf.fetch_results(id={'EXXRLvcs' : 2.0})[0].Ehf[0:5]

[-18.69873, -1.207, -1.438, -0.62819, 6.98599]