# YamboWorkflow to easily compute several quasiparticle corrections

Often, many quasiparticle corrections (>100) need to be computed: we may want to compute interpolated G0W0 bands (with `yambopy` or `wannier90`, for example), or we may need them to solve the Bethe-Salpeter equation on top of G0W0 results, which represents the state-of-the-art protocol to compute the optical properties of materials.

However, this task is really time consuming: we need to split it into several simulations and then merge back the `ndb.QP` databases containing the quasiparticle corrections. 
When only a small number of quasiparticles is needed, this can routinely be done by hand with the `yambopy` package.

When we are talking about a large set of simulations, we can exploit the power of AiiDA to automatically obtain the final quasiparticle database.  

The logic is simple: divide et impera. The workflow decides how to distribute the quasiparticle corrections among the calculations, following the input parameters provided by the user. 
This is the main difference with respect to a standard `YamboWorkflow` run. 
Then, under the hood, the plugin calls `yambopy` to perform the final merging.
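The divide-and-merge idea can be pictured with a toy model in plain Python. This is not the plugin's code: here we assume that each subset calculation returns its corrections as a dictionary keyed by (k-point, band) index, whereas the real workflow merges `ndb.QP` files through `yambopy`. The hypothetical `merge_subsets` helper simply collects all entries into one table and refuses duplicates.

```python
def merge_subsets(subset_results):
    """Toy merge: combine per-subset {(k, band): (E_dft, E_qp)} dictionaries.

    Hypothetical helper for illustration only; the real workflow merges
    ndb.QP databases with yambopy.
    """
    merged = {}
    for result in subset_results:
        for key, energies in result.items():
            if key in merged:
                # each QP should be computed by exactly one subset
                raise ValueError(f"QP {key} appears in more than one subset")
            merged[key] = energies
    # sort by (k-point, band) so the final table is ordered like a QP list
    return dict(sorted(merged.items()))


# two "subset calculations" (illustrative numbers in eV, not results)
subset_1 = {(1, 8): (-0.5, -0.9), (1, 9): (1.2, 2.0)}
subset_2 = {(2, 8): (-0.6, -1.0), (2, 9): (1.4, 2.2)}
print(merge_subsets([subset_2, subset_1]))
```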

This tutorial proceeds as the previous one, except that, before the submission, we provide the information needed to compute the wanted quasiparticles. 

In [1]:
from aiida import orm, load_profile
load_profile()

from aiida.plugins import WorkflowFactory
YamboWorkflow = WorkflowFactory('yambo.yambo.yambowf')

from aiida_quantumespresso.common.types import ElectronicType

## Providing the minimal inputs needed for protocols

We have to provide minimal inputs for the creation of the builder instance, namely:
- codes;
- structure;

If a parent calculation is provided as input, the steps already performed are skipped, to avoid wasting human and computational time.
If no parent is passed to the builder, the DFT inputs are also created from the protocols, as provided by the `PwBaseWorkChain`.

In [2]:
options = {
    'pwcode_id': 'pw-7.1@hydralogin', 
    'pseudo_family':"PseudoDojo/0.4/PBE/SR/standard/upf",
    'yamboprecode_id':'p2y-5.1@hydralogin',
    'yambocode_id':'yambo-5.1@hydralogin',
    'protocol':'fast',
    #'parent_id':274, #optional: set it to the pk of a previous nscf calculation to skip the DFT part.
    'structure_id':161,
}

In [3]:
builder = YamboWorkflow.get_builder_from_protocol(
            pw_code = options['pwcode_id'],
            preprocessing_code = options['yamboprecode_id'],
            code = options['yambocode_id'],
            protocol=options['protocol'],
            protocol_qe=options['protocol'],
            structure= orm.load_node(options['structure_id']),
            overrides={},
            pseudo_family= options['pseudo_family'],
            #parent_folder=orm.load_node(options['parent_id']).outputs.remote_folder,
            electronic_type=ElectronicType.INSULATOR, #default is METAL: in that case, smearing is used
            calc_type='gw', #or 'bse'; default is 'gw'
)


Summary of the main inputs:
BndsRnXp = 200
GbndRnge = 200
NGsBlkXp = 6 Ry
FFTGvecs = 18 Ry


kpoint mesh for nscf: [6, 6, 2]


In [4]:
#You can also try different protocols:
    
YamboWorkflow.get_available_protocols()

{'fast': {'description': 'Under converged for most materials, but fast'},
 'moderate': {'description': 'Meta converged for most materials, higher computational cost than fast'},
 'precise': {'description': 'Converged for most materials, higher computational cost than moderate'}}

Now, if you inspect the prepopulated inputs, you can see the default values set by the chosen protocol:

In [5]:
builder.nscf.pw.parameters.get_dict()

{'CONTROL': {'calculation': 'nscf',
  'forc_conv_thr': 0.001,
  'tprnfor': True,
  'tstress': True,
  'etot_conv_thr': 0.0004},
 'SYSTEM': {'nosym': False,
  'occupations': 'fixed',
  'ecutwfc': 60.0,
  'ecutrho': 480.0,
  'force_symmorphic': True,
  'nbnd': 200},
 'ELECTRONS': {'electron_maxstep': 80,
  'mixing_beta': 0.4,
  'conv_thr': 1.6e-09}}

In [6]:
builder.yres.yambo.parameters.get_dict()

{'arguments': ['dipoles', 'ppa', 'HF_and_locXC', 'gw0'],
 'variables': {'Chimod': 'hartree',
  'DysSolver': 'n',
  'GTermKind': 'BG',
  'X_and_IO_nCPU_LinAlg_INV': [1, ''],
  'NGsBlkXp': [6, 'Ry'],
  'FFTGvecs': [18, 'Ry'],
  'BndsRnXp': [[1, 200], ''],
  'GbndRnge': [[1, 200], ''],
  'QPkrange': [[[1, 1, 32, 32]], '']}}

We then provide the computational resources:

In [7]:
builder.scf.pw.metadata.options = {
    'max_wallclock_seconds': 60*60, # in seconds
    'resources': {
            "num_machines": 1, # nodes
            "num_mpiprocs_per_machine": 16, # MPI per nodes
            "num_cores_per_mpiproc": 1, # OPENMP
        },
    'prepend_text': u"export OMP_NUM_THREADS="+str(1), # if needed
    #'account':'project_name',
    'queue_name':'s3par',
    #'qos':'',
}

builder.nscf.pw.metadata.options = builder.scf.pw.metadata.options
builder.yres.yambo.metadata.options = builder.scf.pw.metadata.options
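The same resources dictionary is written out repeatedly in this notebook (here and again in the overrides). A small helper can generate it; `make_options` is a hypothetical name, not part of aiida-yambo, and it assumes, as the comments above do, that `num_cores_per_mpiproc` matches the number of OpenMP threads.

```python
def make_options(num_machines=1, mpi_per_machine=16, omp_threads=1,
                 walltime_s=60 * 60, queue_name=None, account=None):
    """Build a fresh metadata.options dictionary (hypothetical helper)."""
    options = {
        "max_wallclock_seconds": walltime_s,  # in seconds
        "resources": {
            "num_machines": num_machines,                 # nodes
            "num_mpiprocs_per_machine": mpi_per_machine,  # MPI per node
            "num_cores_per_mpiproc": omp_threads,         # OpenMP threads
        },
        "prepend_text": f"export OMP_NUM_THREADS={omp_threads}",
    }
    # optional scheduler keys, only added when given
    if queue_name is not None:
        options["queue_name"] = queue_name
    if account is not None:
        options["account"] = account
    return options


print(make_options(queue_name="s3par"))
```

With it, the cell above could read `builder.scf.pw.metadata.options = make_options(queue_name='s3par')`, and similarly for `nscf.pw` and `yres.yambo`. Each call returns a new dictionary, so changing the settings of one step (e.g. more nodes for Yambo) is explicit and local.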

### Overrides

As in the previous examples (see e.g. the `YamboRestart` notebook), the default inputs can be modified already during the builder creation, rather than a posteriori. This is done by using overrides:
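Conceptually, overrides behave like a recursive dictionary update: only the keys you provide are replaced, and everything else keeps its protocol default. This matches what the nscf `ELECTRONS` namespace shows after the overrides are applied, where `diagonalization` is added while `conv_thr` and `mixing_beta` survive. The function below is a generic sketch under that assumption, not the plugin's implementation; in particular, how lists such as the Yambo `arguments` are combined is not covered here.

```python
import copy


def deep_update(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # both sides are namespaces: merge them key by key
            result[key] = deep_update(result[key], value)
        else:
            # leaf value (or new key): the override wins
            result[key] = copy.deepcopy(value)
    return result


example_defaults = {"ELECTRONS": {"electron_maxstep": 80, "mixing_beta": 0.4, "conv_thr": 1.6e-09}}
example_overrides = {"CONTROL": {}, "ELECTRONS": {"diagonalization": "cg"}}
print(deep_update(example_defaults, example_overrides)["ELECTRONS"])
```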

In [8]:
overrides_scf = {
        'pseudo_family': "PseudoDojo/0.4/PBE/SR/standard/upf", 
        'pw':{
            
        'metadata':{
                    'options':{
                    'max_wallclock_seconds': 60*60, # in seconds
                    'resources': {
                            "num_machines": 1, # nodes
                            "num_mpiprocs_per_machine": 16, # MPI per nodes
                            "num_cores_per_mpiproc": 1, # OPENMP
                        },
                    'prepend_text': u"export OMP_NUM_THREADS="+str(1), # if needed
                    #'account':'project_name',
                    'queue_name':'s3par',
                    #'qos':'',
                                    },
        },
        },
    }

overrides_nscf = {
        'pseudo_family': "PseudoDojo/0.4/PBE/SR/standard/upf", 
        'pw': {
            'parameters':{
                'CONTROL':{}, #not needed if you don't override something
                'SYSTEM':{},
                'ELECTRONS':{'diagonalization':'cg'},
            },
             'metadata':{
                    'options':{
                    'max_wallclock_seconds': 60*60, # in seconds
                    'resources': {
                            "num_machines": 1, # nodes
                            "num_mpiprocs_per_machine": 16, # MPI per nodes
                            "num_cores_per_mpiproc": 1, # OPENMP
                        },
                    'prepend_text': u"export OMP_NUM_THREADS="+str(1), # if needed
                    #'account':'project_name',
                    'queue_name':'s3par',
                    #'qos':'',
                                    },
        },
    },
}

overrides_yambo = {
        "yambo": {
            "parameters": {
                "arguments": [
                    "rim_cut",
                ],
                "variables": {
                    "NGsBlkXp": [4, "Ry"],
                    "FFTGvecs": [24, "Ry"],
                },
            },
        'metadata':{
                    'options':{
                    'max_wallclock_seconds': 60*60, # in seconds
                    'resources': {
                            "num_machines": 1, # nodes
                            "num_mpiprocs_per_machine": 16, # MPI per nodes
                            "num_cores_per_mpiproc": 1, # OPENMP
                        },
                    'prepend_text': u"export OMP_NUM_THREADS="+str(1), # if needed, i.e. in PBS/Torque 
                    #'account':'project_name',
                    'queue_name':'s3par',
                    #'qos':'',
                                    },
                    },
        },
    
}

overrides = {
    'yres': overrides_yambo,
    'nscf': overrides_nscf,
    'scf': overrides_scf
    
}


So, let's create a new builder instance, this time also passing the `overrides`:

In [9]:
builder = YamboWorkflow.get_builder_from_protocol(
            pw_code = options['pwcode_id'],
            preprocessing_code = options['yamboprecode_id'],
            code = options['yambocode_id'],
            protocol=options['protocol'],
            protocol_qe=options['protocol'],
            structure= orm.load_node(options['structure_id']),
            overrides=overrides,
            #parent_folder=orm.load_node(options['parent_id']).outputs.remote_folder,
            electronic_type=ElectronicType.INSULATOR, #default is METAL: smearing is used
            calc_type='gw', #or 'bse'; default is 'gw'
)

Summary of the main inputs:
BndsRnXp = 200
GbndRnge = 200
NGsBlkXp = 4 Ry
FFTGvecs = 24 Ry


kpoint mesh for nscf: [6, 6, 2]


In [10]:
builder.nscf.pw.parameters.get_dict()

{'CONTROL': {'calculation': 'nscf',
  'forc_conv_thr': 0.001,
  'tprnfor': True,
  'tstress': True,
  'etot_conv_thr': 0.0004},
 'SYSTEM': {'nosym': False,
  'occupations': 'fixed',
  'ecutwfc': 84.0,
  'ecutrho': 336.0,
  'force_symmorphic': True,
  'nbnd': 200},
 'ELECTRONS': {'electron_maxstep': 80,
  'mixing_beta': 0.4,
  'diagonalization': 'cg',
  'conv_thr': 1.6e-09}}

In [11]:
builder.yres.yambo.metadata.options

{'stash': {}, 'resources': {'num_machines': 1, 'num_mpiprocs_per_machine': 16, 'num_cores_per_mpiproc': 1}, 'max_wallclock_seconds': 3600, 'withmpi': True, 'prepend_text': 'export OMP_NUM_THREADS=1', 'queue_name': 's3par'}

In [12]:
builder.nscf.pw.parameters.get_dict()['ELECTRONS']['diagonalization']

'cg'

In [13]:
family = orm.load_group("PseudoDojo/0.4/PBE/SR/standard/upf")
structure = orm.load_node(options['structure_id'])
#in general: builder.<sublevels_up_to .pw>.pseudos = family.get_pseudos(structure=structure)
builder.scf.pw.pseudos = family.get_pseudos(structure=structure)
builder.nscf.pw.pseudos = family.get_pseudos(structure=structure)

### Requesting the YamboWorkflow to compute a specific quantity: the minimum band gap and the direct band gap at Gamma

Within `YamboWorkflow`, the band gap of a material can be obtained automatically. The workflow contains the logic to inspect the DFT band structure, as computed in the nscf step,
and to determine the k-points and band indices corresponding to the minimal band gap of the material.
In this way, the relevant quasiparticle levels can be computed without additional human intervention. 
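A minimal sketch of such a search, assuming fixed occupations (an insulator, as with `ElectronicType.INSULATOR` here) and DFT eigenvalues stored as a nested list `energies[k][band]` in eV. The helper name and data layout are assumptions; the returned indices are 1-based, as in Yambo's `QPkrange`.

```python
def find_band_edges(energies, n_occupied):
    """Locate the valence-band maximum and conduction-band minimum.

    energies[k][b]: DFT eigenvalues in eV (0-based k and b).
    Returns 1-based (k_vbm, k_cbm, band_vbm, band_cbm) and the minimal gap.
    """
    vb = n_occupied - 1  # index of the last occupied band
    cb = n_occupied      # index of the first empty band
    kpoints = range(len(energies))
    k_vbm = max(kpoints, key=lambda k: energies[k][vb])
    k_cbm = min(kpoints, key=lambda k: energies[k][cb])
    gap = energies[k_cbm][cb] - energies[k_vbm][vb]
    return k_vbm + 1, k_cbm + 1, vb + 1, cb + 1, gap


# two k-points, three bands, two occupied: an indirect gap (illustrative values)
E = [[-1.0, 0.5, 2.0],
     [-1.2, 0.0, 1.0]]
print(find_band_edges(E, 2))  # (1, 2, 2, 3, 0.5)
```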

Below we see how to request additional parsing through the `additional_parsing` attribute of the builder. This is an AiiDA `List` containing strings, each of them
representing a desired quantity. In this case, we want to compute the band gap at Gamma and the minimal gap, i.e. "gap_GG" and "gap_", respectively.

It is also possible to ask for other high-symmetry points, e.g. M or K. However, if a point is not contained in the k-point mesh, its quasiparticle correction is skipped, as it cannot be computed. 
Indirect gaps can be computed by providing a string of the type "gap_AB", where `A` is the k-point of the top valence band and `B` is the k-point of the bottom conduction band. For example, the indirect gap G->M 
can be computed by providing the "gap_GM" string in the `additional_parsing` List.
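Whether a high-symmetry point belongs to the mesh can be checked with exact fractions: on an unshifted, Gamma-centred Monkhorst-Pack grid, a point lies on the grid when each of its crystal coordinates times the number of divisions is an integer. This is a sketch under that assumption; the M and K coordinates below are those of a hexagonal lattice and are given only for illustration.

```python
from fractions import Fraction


def point_in_mesh(kpoint, mesh):
    """True if `kpoint` (crystal coordinates) lies on an unshifted,
    Gamma-centred Monkhorst-Pack grid with `mesh` divisions."""
    for coord, n in zip(kpoint, mesh):
        # limit_denominator turns floats such as 1/3 into exact fractions
        frac = Fraction(coord).limit_denominator(1000)
        if (frac * n).denominator != 1:
            return False
    return True


# hexagonal high-symmetry points (illustrative)
M = ("1/2", "0", "0")
K = ("1/3", "1/3", "0")
print(point_in_mesh(M, [6, 6, 2]), point_in_mesh(K, [6, 6, 2]))  # True True
print(point_in_mesh(K, [4, 4, 2]))                               # False
```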

Finally, individual quasiparticle levels of the last valence and first conduction bands can also be computed, by providing the string "homo_K" or "lumo_K", respectively, where `K` is the desired high-symmetry k-point.
To explicitly compute the top valence and bottom conduction GW energies, just provide "homo" and "lumo".
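The naming convention of these strings can be summarised by a small parser. It is a hypothetical helper, written only to make the convention explicit (single-letter labels such as G, M, K are assumed); it is not how the plugin itself parses `additional_parsing`.

```python
def parse_request(label):
    """Decode 'gap_', 'gap_AB', 'homo', 'homo_K', 'lumo', 'lumo_K'."""
    kind, _, points = label.partition("_")
    if kind == "gap":
        if points == "":
            # minimal gap: k-points found automatically from the DFT bands
            return {"quantity": "gap", "k_vb": None, "k_cb": None}
        if len(points) != 2:
            raise ValueError(f"expected 'gap_AB', got {label!r}")
        # A = k-point of the top valence band, B = k-point of the bottom conduction band
        return {"quantity": "gap", "k_vb": points[0], "k_cb": points[1]}
    if kind in ("homo", "lumo"):
        # no label means the global top valence / bottom conduction level
        return {"quantity": kind, "k": points or None}
    raise ValueError(f"unknown request {label!r}")


for label in ["gap_GG", "gap_", "gap_GM", "homo_K", "lumo"]:
    print(label, parse_request(label))
```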

In [14]:
builder.additional_parsing = orm.List(list=['gap_GG','gap_'])

### Requesting the YamboWorkflow to compute a specific set of quasiparticles

The idea is to split the QP calculation into several subsets and then merge them into a final database, using `yambopy` functionalities.
There are several ways to specify the QP calculations, provided through the `QP_subset_dict` input of the `YamboWorkflow`: 

(1) provide the wanted QPs already grouped into subsets (i.e. already split); each entry is [k_i,k_f,b_i,b_f];

```python
QP_subset_dict= {
    'subsets':[
        [[1,1,8,9],[2,2,8,9]], #first subset
        [[3,3,8,9],[4,4,8,9]], #second subset
                ],
}
```

(2) provide explicit QPs, i.e. a list of single QPs that the workflow will split;

```python
QP_subset_dict= {
    'explicit':[
        [1,1,8,9],[2,2,8,9],[3,3,8,9],[4,4,8,9], #to be split
                ],
}
```
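Going from the explicit list (2) to ready-made subsets as in (1) can be pictured as chunking the list. A minimal sketch, assuming each entry is one [k_i,k_f,b_i,b_f] range and the chunk size is counted in entries; the workflow's own splitting (controlled by 'qp_per_subset' and 'split_bands') may group QPs differently. Both helpers are hypothetical.

```python
def n_qp(entry):
    """Number of QPs in one [k_i, k_f, b_i, b_f] range (inclusive bounds)."""
    k_i, k_f, b_i, b_f = entry
    return (k_f - k_i + 1) * (b_f - b_i + 1)


def split_explicit(explicit, entries_per_subset):
    """Cut the explicit list into consecutive chunks (hypothetical helper)."""
    return [explicit[i:i + entries_per_subset]
            for i in range(0, len(explicit), entries_per_subset)]


explicit = [[1, 1, 8, 9], [2, 2, 8, 9], [3, 3, 8, 9], [4, 4, 8, 9]]
subsets = split_explicit(explicit, 2)
print(subsets)                                     # the same two subsets as in (1)
print([sum(n_qp(e) for e in s) for s in subsets])  # [4, 4]
```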

(3) provide the boundaries of the k-points and bands to be computed, [k_i,k_f,b_i,b_f];

```python
QP_subset_dict= {
    'boundaries':{
        'k_i':1,    #default=1
        'k_f':20,   #default=NK_ibz
        'b_i':8,
        'b_f':9,
    },
}
```
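The boundaries describe a rectangular block of k-points and bands, the same information carried by a single Yambo `QPkrange` entry (compare the `QPkrange` shown in the Yambo parameters above). A sketch of how such a block could be expanded into explicit entries, one per k-point or, with band splitting, one per (k-point, band) pair; the helper and its `split_bands` switch are illustrative assumptions.

```python
def boundaries_to_explicit(k_i, k_f, b_i, b_f, split_bands=False):
    """Expand inclusive k-point/band boundaries into explicit entries."""
    kpoints = range(k_i, k_f + 1)
    if split_bands:
        # one entry per single QP: [k, k, b, b]
        return [[k, k, b, b] for k in kpoints for b in range(b_i, b_f + 1)]
    # one entry per k-point, covering the whole band range
    return [[k, k, b_i, b_f] for k in kpoints]


print(boundaries_to_explicit(1, 2, 8, 9))
# [[1, 1, 8, 9], [2, 2, 8, 9]]
print(len(boundaries_to_explicit(1, 20, 8, 9, split_bands=True)))
# 40
```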

(4) provide a range of (DFT) energies in which the bands and k-points to be computed are selected. This is useful if we don't know the system,
    or if we want BSE for given energies: usually, the BSE spectrum is well converged for about 75% of this range. The selected QPs are generated as 
    explicit QPs and then split.
    It is also possible to provide 'range_spectrum', which finds the bands to be included in the BSE calculation; the bands outside 
    the 'range_QP' window are included with a scissor correction, applied automatically by Yambo in the BSE calculation. So the final QP database will contain 
    the 'range_QP' bands, while the BSE calculation will include all the 'range_spectrum' bands.
    Both ranges define windows of width 2*range, centered at the Fermi level. 
    If you set 'full_bands': True, all the k-points are included for each selected band; otherwise, only the QPs inside the window are computed.

```python
QP_subset_dict= {
    'range_QP':3, #eV, default=nscf_gap_eV*1.2
    'range_spectrum':10, #eV

}
```
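The energy window can be sketched as follows, assuming DFT eigenvalues `energies[k][band]` in eV, a known reference (Fermi) energy, and the window [E_F - range, E_F + range] described above. With `full_bands=True`, every band that enters the window somewhere is kept at all k-points. The function name and data layout are hypothetical.

```python
def select_in_window(energies, e_fermi, half_width, full_bands=False):
    """Return 1-based [k, k, b, b] entries with |E - E_F| <= half_width."""
    inside = [[k, k, b, b]
              for k, bands in enumerate(energies, start=1)
              for b, e in enumerate(bands, start=1)
              if abs(e - e_fermi) <= half_width]
    if full_bands:
        # keep every band that enters the window somewhere, at all k-points
        hit = sorted({entry[2] for entry in inside})
        inside = [[k, k, b, b]
                  for k in range(1, len(energies) + 1) for b in hit]
    return inside


# two k-points, four bands (illustrative eigenvalues in eV)
E = [[-5.0, -1.0, 1.0, 6.0],
     [-2.0, -0.5, 2.5, 4.0]]
print(select_in_window(E, 0.0, 3.0))
print(select_in_window(E, 0.0, 3.0, full_bands=True))
```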

for (2) and (4) there are additional options:
    - (a) 'split_bands': split the subsets also in bands, not only in k-points. Default is True.
    - (b) 'extend_QP': extend the QPs after the merging, including QPs not explicitly computed
        as FD+scissored corrections (see the high-throughput paper by M. Bonacci et al., 2023). Useful in G0W0 interpolations,
        e.g. within the aiida-yambo-wannier90 plugin.
        (b.1) 'consider_only': the only bands to be considered explicitly; the other ones are deleted from the explicit subsets;
        (b.2) 'T_smearing': the fictitious smearing temperature of the correction.

```python
QP_subset_dict.update({
    'split_bands':True, #default
    'extend_QP': True, #default is False
    'consider_only':[8,9],
    'T_smearing':1e-2, #default
})
```
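The effect of 'consider_only' on an explicit list can be sketched as a filter. This assumes single-QP entries of the form [k, k, b, b]; with 'extend_QP', the QPs that are not computed explicitly are then added as FD+scissored corrections after the merge. The helper is hypothetical.

```python
def keep_only(explicit, consider_only):
    """Drop single-QP entries [k, k, b, b] whose band is not in consider_only."""
    allowed = set(consider_only)
    kept = []
    for entry in explicit:
        _, _, b_i, b_f = entry
        if b_i != b_f:
            raise ValueError(f"expected a single-band entry, got {entry}")
        if b_i in allowed:
            kept.append(entry)
    return kept


single_qps = [[1, 1, 7, 7], [1, 1, 8, 8], [1, 1, 9, 9], [1, 1, 10, 10]]
print(keep_only(single_qps, [8, 9]))  # [[1, 1, 8, 8], [1, 1, 9, 9]]
```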

Computation options: 

(a) 'qp_per_subset': 20; how many QPs in each split subset;
(b) 'parallel_runs': 4; how many subsets are submitted at the same time on the remote machine. The remote folders are then deleted, as the QPs are stored locally;
(c) 'resources': res_QP, #see below
(d) 'parallelism': para_QP, #see below
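To estimate how many submissions a given setup implies, consider a simplified picture where subsets are submitted in waves of at most 'parallel_runs' calculations. This is an assumption for illustration: the workflow may instead start a new run as soon as a slot frees up, but the number of subsets is the same. Both helpers are hypothetical.

```python
import math


def plan(n_qp_total, qp_per_subset, parallel_runs):
    """Return (number of subsets, number of submission waves)."""
    n_subsets = math.ceil(n_qp_total / qp_per_subset)
    n_waves = math.ceil(n_subsets / parallel_runs)
    return n_subsets, n_waves


def waves(subsets, parallel_runs):
    """Group subsets into consecutive batches of at most parallel_runs."""
    return [subsets[i:i + parallel_runs]
            for i in range(0, len(subsets), parallel_runs)]


# e.g. 200 QPs with the settings used below: 20 QPs per subset, 4 parallel runs
print(plan(200, 20, 4))              # (10, 3)
print(waves(list(range(1, 11)), 4))  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```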


In [16]:
para_QP = {}
para_QP['SE_CPU'] = '2 2 4'
para_QP['SE_ROLEs'] = 'q qp b'
res_QP = {
                        'num_machines': 1,
                        'num_mpiprocs_per_machine': 16,
                        'num_cores_per_mpiproc': 1,
            }


QP_subset_dict= {
    'range_QP':10, #eV, default=nscf_gap_eV*1.2
    'full_bands':True,
    'consider_only':[7,8,9,10], #band indices
    'qp_per_subset': 20,
    'parallel_runs':4,

}

QP_subset_dict.update({
    'resources':res_QP, #default is the same as previous GW
    'parallelism': para_QP, #default is the same as previous GW

})


builder.QP_subset_dict= orm.Dict(dict=QP_subset_dict) #set this if you also want to compute the QP subsets after the single GW calculation.

### Submission phase

In [17]:
from aiida.engine import submit

In [11]:
run = None

In [19]:
if run:
    print('run is already running -> {}'.format(run.pk))
    print('Are you sure you want to run again? If so, copy the else instruction into the cell below and run it!')
else:
    run = submit(builder)

print(run)



uuid: 27d9e615-7b9a-4058-a1d0-24619021ccd7 (pk: 4265) (aiida.workflows:yambo.yambo.yambowf)


### Inspecting the outputs

Once your calculation has finished successfully, you can access the outputs via the `outputs` attribute of the run node. All the outputs of `YamboRestart` and `YamboCalculation` are inherited here.

In [12]:
run.is_finished_ok

True

In [15]:
!verdi process report {run.pk}

2024-01-09 19:40:56 [1975 | REPORT]: [4265|YamboWorkflow|start_workflow]: no previous pw calculation found, we will start from scratch
2024-01-09 19:40:56 [1976 | REPORT]: [4265|YamboWorkflow|start_workflow]:  workflow initilization step completed.
2024-01-09 19:40:56 [1977 | REPORT]: [4265|YamboWorkflow|can_continue]: the workflow continues with a scf calculation
2024-01-09 19:40:56 [1978 | REPORT]: [4265|YamboWorkflow|perform_next]: performing a scf calculation
2024-01-09 19:40:58 [1979 | REPORT]:   [4266|PwBaseWorkChain|run_process]: launching PwCalculation<4271> iteration #1
2024-01-09 19:43:06 [1985 | REPORT]:   [4266|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-01-09 19:43:06 [1986 | REPORT]:   [4266|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-01-09 19:43:06 [1987 | REPORT]: [4265|YamboWorkflow|can_continue]: the workflow continues with a nscf calculation
2024-01-09 19:43:06 [1988 | REPORT]: [4265|YamboWorkflow|perform_ne

Inspecting the report of the process, you can see that the workflow indeed splits the quasiparticle sets and performs a final merge via the `merge_QP` calcfunction.

#### How to access the merged QP file and any other file retrieved from a run

The merged database is stored in the AiiDA repository and, in principle, cannot be accessed "by hand". 
However, a simple trick is to create a temporary directory and copy the file into it.
At that point, we can move it wherever we want, so that we can also use it outside AiiDA (maybe the only 
reason we use AiiDA is to easily compute 1000 quasiparticle corrections).

In [18]:
import pathlib
import tempfile
import os


#a given simulation retrieved folder (you can select the wanted YamboCalculation instead of the run node).
retrieved_node = run.outputs.retrieved

# Create temporary directory
with tempfile.TemporaryDirectory() as dirpath:
    # Open the output file from the AiiDA storage and copy content to the temporary file
    for filename in retrieved_node.base.repository.list_object_names():
        # Create the file with the desired name
        temp_file = pathlib.Path(dirpath) / filename
        with retrieved_node.open(filename, 'rb') as handle:
            temp_file.write_bytes(handle.read())
            
        print(filename)
        
        #here you can do the copy of the file:
        # os.system("cp <dirpath/filename> <your wanted destination>")

_scheduler-stderr.txt
_scheduler-stdout.txt
l-aiida.out_HF_and_locXC_gw0_rim_cut_ppa_CPU_1
l_p2y_CPU_1
l_setup_CPU_1
ndb.HF_and_locXC
ndb.QP
ns.db1
o-aiida.out.qp
r-aiida.out_HF_and_locXC_gw0_rim_cut_ppa
r_setup


In [26]:
#the merged QP
retrieved_node = run.outputs.merged_QP

# Create temporary directory
with tempfile.TemporaryDirectory() as dirpath:
    # Open the output file from the AiiDA storage and copy content to the temporary file
    for filename in retrieved_node.base.repository.list_object_names():
        # Create the file with the desired name
        temp_file = pathlib.Path(dirpath) / filename
        with retrieved_node.open(filename, 'rb') as handle:
            temp_file.write_bytes(handle.read())
            
        print(filename)
        
        #here you can do the copy of the file:
        # os.system("cp <dirpath/filename> <your wanted destination>")

ndb.QP_fixed


Why is the merged ndb.QP named `ndb.QP_fixed`? The reason is that the merged database goes through a sanitizing procedure: 
as the number of QPs is very high, some of them may be lost or give NaN results. The logic is to find these quasiparticle corrections 
and replace them with a scissor-and-stretching correction.
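The sanitizing step can be illustrated with a scissor-and-stretch model, i.e. a linear fit E_QP = a*E_DFT + b done separately for valence and conduction states, used to replace missing (NaN) values. This is a sketch under that assumption; the exact procedure the plugin applies to produce `ndb.QP_fixed` may differ. `e_split` is any energy inside the gap, and the helper names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares slope and intercept of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fix_nan(levels, e_split):
    """levels: list of (E_dft, E_qp) in eV, where E_qp may be NaN.

    NaN values are replaced by a linear (scissor + stretch) fit, done
    separately below (valence) and above (conduction) e_split.
    """
    fits = {}
    for conduction in (False, True):
        good = [(e, q) for e, q in levels
                if (e > e_split) == conduction and not math.isnan(q)]
        if len(good) >= 2:
            xs, ys = zip(*good)
            fits[conduction] = fit_line(xs, ys)
    fixed = []
    for e, q in levels:
        if math.isnan(q):
            side = e > e_split
            if side not in fits:
                raise ValueError("not enough valid QPs to fit this manifold")
            slope, intercept = fits[side]
            q = slope * e + intercept
        fixed.append((e, q))
    return fixed


# illustrative levels (eV): one NaN in valence, one in conduction
levels = [(-2.0, -2.5), (-1.0, -1.3), (-0.5, float("nan")),
          (1.0, 1.8), (2.0, 3.0), (1.5, float("nan"))]
print(fix_nan(levels, 0.0))
```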