# YamboWorkflow to easily compute several quasiparticle corrections

Often, several quasiparticle corrections (>100) need to be computed: we may want to compute interpolated G0W0 bands (with `yambopy` or `wannier90`, for example), or we need them to solve the Bethe-Salpeter equation on top of G0W0 results.

However, this task is time consuming: we need to split the work into several simulations and then merge back the `ndb.QP` databases containing the quasiparticle corrections.
Routinely, this can be done with simple scripts, directly within the `yambopy` package.

When we are dealing with demanding simulations, we can exploit the power of AiiDA to automatically obtain the merged quasiparticles database.  

The logic is simple: divide et impera. The workflow decides how to distribute the quasiparticle corrections among the calculations, following input parameters provided by the user.
This is the main difference with respect to the standard `YamboWorkflow` run.
Then, under the hood, the plugin calls `yambopy` to perform the final merging.

This tutorial proceeds like the previous one, except that, before the submission, we provide the information needed to compute the wanted quasiparticles.

In [1]:
from aiida import orm, load_profile
load_profile()

from aiida.plugins import WorkflowFactory
from aiida.orm import QueryBuilder
from aiida.engine import submit

from aiida_quantumespresso.common.types import ElectronicType

import yaml

qb = QueryBuilder()
qb.append(orm.Group, filters={'label': 'Silicon/bulk'}, tag='group')
qb.append(orm.StructureData, with_group='group')

loaded_structure_id = qb.all()[0][0].pk

# Read YAML file
with open("../configuration/codes_localhost.yaml", 'r') as stream:
    codes = yaml.safe_load(stream)
    
with open("../configuration/resources_localhost.yaml", 'r') as stream:
    resources = yaml.safe_load(stream)
    
options = {
    'pseudo_family':"PseudoDojo/0.4/PBE/SR/standard/upf",
    'protocol':'fast',
    #'parent_id':274, # optional: set it to the id (pk) of a previous nscf calculation to skip the DFT part.
    'structure_id':loaded_structure_id,
}

YamboWorkflow = WorkflowFactory('yambo.yambo.yambowf')

builder = YamboWorkflow.get_builder_from_protocol(
            pw_code = codes['pwcode_id'],
            preprocessing_code = codes['yamboprecode_id'],
            code = codes['yambocode_id'],
            protocol=options['protocol'],
            protocol_qe=options['protocol'],
            structure= orm.load_node(options['structure_id']),
            overrides={
                'yres': {"yambo": {
                    "parameters": {
                        "variables": {
                            "NGsBlkXp": [4, "Ry"],
                            "FFTGvecs": [24, "Ry"],
                            },
                        },
                    },
                }
            },
            pseudo_family= options['pseudo_family'],
            #parent_folder=orm.load_node(options['parent_id']).outputs.remote_folder,
            electronic_type=ElectronicType.INSULATOR, #default is METAL: in that case, smearing is used
            calc_type='gw', #or 'bse'; default is 'gw'
)

builder.scf.pw.metadata.options = resources

builder.nscf.pw.metadata.options = builder.scf.pw.metadata.options
builder.yres.yambo.metadata.options = builder.scf.pw.metadata.options

Summary of the main inputs:
BndsRnXp = 150
GbndRnge = 150
NGsBlkXp = 4 Ry
FFTGvecs = 24 Ry


kpoint mesh for nscf: [6, 6, 6]


### Requesting the YamboWorkflow to compute a specific set of quasiparticles

The idea is to split the QP calculation into several subsets and then merge the results into a final database. At the end of the calculations, the `ndb.QP` databases are merged into a single database, which is exposed as an AiiDA `SinglefileData` output.
The merging procedure is performed using *yambopy*.
There are several ways to request QP calculations, all provided through the `QP_subset_dict` input of the YamboWorkflow:

(1) provide the wanted QP already grouped in subsets (i.e. already split);

```python
QP_subset_dict= {
    'subsets':[
        [[1,1,8,9],[2,2,8,9]], #first subset
        [[3,3,8,9],[4,4,8,9]], #second subset
                ],
}
```

(2) provide explicit QP, i.e. a list of single QP to be split;

```python
QP_subset_dict= {
    'explicit':[
        [1,1,8,9],[2,2,8,9],[3,3,8,9],[4,4,8,9], #to be split
                ],
}
```

(3) provide boundaries for the bands to be computed: [k_i,k_f,b_i,b_f];

```python
QP_subset_dict= {
    'boundaries':{
        'k_i':1,    #default=1
        'k_f':20,   #default=NK_ibz
        'b_i':8,
        'b_f':9,
    },
}
```
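
As a minimal sketch of the `[k_i, k_f, b_i, b_f]` convention, the snippet below expands such a block into explicit (k-point, band) pairs. The helper names `boundaries_to_block` and `expand_qp_block` are hypothetical and not part of the plugin; the defaults follow the comments in the dictionary above, `nk_ibz=20` is an illustrative value, and it is assumed that the four-integer entries of options (1) and (2) follow the same convention.

```python
from itertools import product


def boundaries_to_block(boundaries, nk_ibz):
    """Turn a 'boundaries' dict into one [k_i, k_f, b_i, b_f] block.

    Defaults (k_i=1, k_f=nk_ibz) mirror the comments in the example above.
    """
    return [
        boundaries.get('k_i', 1),
        boundaries.get('k_f', nk_ibz),
        boundaries['b_i'],
        boundaries['b_f'],
    ]


def expand_qp_block(block):
    """Expand one [k_i, k_f, b_i, b_f] block into (k-point, band) pairs."""
    k_i, k_f, b_i, b_f = block
    return list(product(range(k_i, k_f + 1), range(b_i, b_f + 1)))


block = boundaries_to_block({'b_i': 8, 'b_f': 9}, nk_ibz=20)
pairs = expand_qp_block(block)
print(block)       # [1, 20, 8, 9]
print(len(pairs))  # 40 quasiparticles: 20 k-points x 2 bands
```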

(4) provide a range of (DFT) energies in which to select the bands and the k-points to be computed. This is useful if we don't know the system,
    or if we want BSE for given energies: usually, the BSE spectrum is well converged within 75% of this range. The QP are generated as
    explicit QP and then split.
    It is also possible to provide 'range_spectrum', which selects the bands to be included in the BSE calculation; the bands
    outside the range_QP window are scissored automatically by yambo in the BSE calculation. So the final QP database will contain the
    range_QP bands, but the BSE calculation will take as input all the range_spectrum bands.
    These ranges are windows of width 2*range, centered at the Fermi level.
    If you set the key 'full_bands' to True, all the k-points are included for each selected band; otherwise, only the QP inside the window are computed.

```python
QP_subset_dict= {
    'range_QP':3, #eV         , default=nscf_gap_eV*1.2
    'range_spectrum':10, #eV

}
```
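
The window selection of option (4) can be illustrated with a toy sketch. This is not the plugin's implementation: the function `select_qp_in_window`, the nested-list layout of the eigenvalues, and the inclusive `<=` test are assumptions made for illustration. Only the idea of a window of width 2*range centered at the Fermi level, and the meaning of 'full_bands', come from the description above.

```python
def select_qp_in_window(eigenvalues_ev, fermi_ev, range_ev, full_bands=False):
    """Return [k, k, b, b] entries for states with |E - E_F| <= range_ev.

    eigenvalues_ev[k][b] is a hypothetical nested list of DFT energies (eV);
    returned indices are 1-based, as in the examples above.
    """
    in_window = [
        (k, b)
        for k, bands in enumerate(eigenvalues_ev, start=1)
        for b, energy in enumerate(bands, start=1)
        if abs(energy - fermi_ev) <= range_ev
    ]
    if full_bands:
        # Keep every k-point for each band that enters the window at least once.
        wanted_bands = sorted({b for _, b in in_window})
        n_kpoints = len(eigenvalues_ev)
        in_window = [(k, b) for k in range(1, n_kpoints + 1) for b in wanted_bands]
    return [[k, k, b, b] for k, b in in_window]


# Toy eigenvalues: 2 k-points x 4 bands, Fermi level at 0 eV.
toy = [
    [-6.0, -1.0, 0.8, 5.0],
    [-5.0, -2.5, 3.5, 4.0],
]
print(select_qp_in_window(toy, fermi_ev=0.0, range_ev=3.0))
# [[1, 1, 2, 2], [1, 1, 3, 3], [2, 2, 2, 2]]
print(select_qp_in_window(toy, fermi_ev=0.0, range_ev=3.0, full_bands=True))
# [[1, 1, 2, 2], [1, 1, 3, 3], [2, 2, 2, 2], [2, 2, 3, 3]]
```
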
#### Additional options for (2) and (4)
For (2) and (4) there are additional options:
- (a) 'split_bands': split the subsets also over bands, not only over k-points. Default is True;
- (b) 'consider_only': the only bands to be considered explicitly; the other ones are removed from the explicit subsets;
- (c) 'extend_QP': extend the QP database after the merging, including the QP not explicitly computed
        as FermiDirac+scissored corrections (see the HT paper by M. Bonacci et al., 2023). Useful in G0W0 interpolations,
        e.g. within the aiida-yambo-wannier90 plugin.
        (c.1) 'T_smearing': the fake smearing temperature of the correction.

```python
QP_subset_dict.update({
    'split_bands':True, #default
    'extend_QP': True, #default is False
    'consider_only':[8,9],
    'T_smearing':1e-2, #default
})
```
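
To make the effect of 'consider_only' concrete, here is a small sketch of the filtering it describes. The function `apply_consider_only` is hypothetical and only mimics the behaviour stated above (bands outside the list are dropped from the explicit QP); the plugin's actual code may differ.

```python
def apply_consider_only(explicit_qp, consider_only):
    """Keep only the bands listed in consider_only.

    Each input entry is [k_i, k_f, b_i, b_f]; entries spanning several
    bands are broken into single-band entries before filtering.
    """
    kept = []
    for k_i, k_f, b_i, b_f in explicit_qp:
        for band in range(b_i, b_f + 1):
            if band in consider_only:
                kept.append([k_i, k_f, band, band])
    return kept


explicit = [[1, 1, 7, 10], [2, 2, 8, 9]]
print(apply_consider_only(explicit, consider_only=[8, 9]))
# [[1, 1, 8, 8], [1, 1, 9, 9], [2, 2, 8, 8], [2, 2, 9, 9]]
```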

#### Basic usage
Usually, the settings that we should provide are:

- (a) 'qp_per_subset': 20, how many QP are in each split subset;
- (b) 'parallel_runs': 4, how many calculations are submitted at the same time remotely. The remote folder is then deleted, as the QP database is stored locally;
- (c) 'resources': res_QP, see below;
- (d) 'parallelism': para_QP, see below.
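
The interplay between 'qp_per_subset' and 'parallel_runs' can be sketched as a simple chunking of the explicit QP list. This is an illustration under assumptions, not the workflow's actual splitting algorithm: the helper `chunk` is hypothetical, the 16 k-points and bands 4 and 5 are toy values, and modelling 'parallel_runs' as fixed batches of subsets is a simplification.

```python
def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Toy explicit list: 16 k-points x 2 bands = 32 single quasiparticles.
explicit = [[k, k, b, b] for k in range(1, 17) for b in (4, 5)]

subsets = chunk(explicit, 10)  # 'qp_per_subset': 10
batches = chunk(subsets, 4)    # 'parallel_runs': 4 subsets submitted together

print(len(explicit))              # 32
print([len(s) for s in subsets])  # [10, 10, 10, 2]
print(len(batches))               # 1
```

With these toy numbers, four calculations would be needed, and all of them fit in a single round of parallel submissions.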


In [2]:
para_QP = {}
para_QP['SE_CPU'] = '2 2 4'
para_QP['SE_ROLEs'] = 'q qp b'
res_QP = {
                        'num_machines': 1,
                        'num_mpiprocs_per_machine': 1,
                        'num_cores_per_mpiproc': 1,
            }


QP_subset_dict= {
    'range_QP':3, #eV         , default=nscf_gap_eV*1.2
    'full_bands':True,
    'consider_only':[4,5], # band indices
    'qp_per_subset': 10,
    'parallel_runs':4,

}

QP_subset_dict.update({
    'resources':res_QP, #default is the same as previous GW
    'parallelism': para_QP, #default is the same as previous GW

})


builder.QP_subset_dict= orm.Dict(dict=QP_subset_dict) #set this if you want to compute also QP after the single GW calculation.

### Submission phase

In [3]:
run = None

In [4]:
if run:
    print('run is already running -> {}'.format(run.pk))
    print('Are you sure that you want to run again? If so, copy the else instruction in the cell below and run!')
else:
    run = submit(builder)

print(run)



uuid: 965e0277-96d3-434c-b924-93c4cbf7464a (pk: 540) (aiida.workflows:yambo.yambo.yambowf)


### Inspecting the outputs

Suppose that your calculation finished successfully; then you can access the outputs via the `outputs` attribute of the run node. All the outputs of YamboRestart and YamboCalculation are inherited here.

In [13]:
run.is_finished_ok

True

In [14]:
!verdi process report {run.pk}

2024-02-19 15:19:21 [51 | REPORT]: [540|YamboWorkflow|start_workflow]: no previous pw calculation found, we will start from scratch
2024-02-19 15:19:21 [52 | REPORT]: [540|YamboWorkflow|start_workflow]:  workflow initilization step completed.
2024-02-19 15:19:21 [53 | REPORT]: [540|YamboWorkflow|can_continue]: the workflow continues with a scf calculation
2024-02-19 15:19:21 [54 | REPORT]: [540|YamboWorkflow|perform_next]: performing a scf calculation
2024-02-19 15:19:23 [55 | REPORT]:   [541|PwBaseWorkChain|run_process]: launching PwCalculation<546> iteration #1
2024-02-19 15:19:28 [56 | REPORT]:   [541|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-02-19 15:19:28 [57 | REPORT]:   [541|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-02-19 15:19:28 [58 | REPORT]: [540|YamboWorkflow|can_continue]: the workflow continues with a nscf calculation
2024-02-19 15:19:28 [59 | REPORT]: [540|YamboWorkflow|perform_next]: performing a nscf calcu

Inspecting the report of the process, you can see that the workflow indeed splits the quasiparticle sets and performs a final merge via the `merge_QP` calcfunction.

In [15]:
run.outputs.output_ywfl_parameters.get_dict()

{'SOC': False,
 'QP_pk': 611,
 'c_max': 5,
 'q_ind': 13,
 'v_min': 4,
 'gap_GW': 1.0607,
 'nscf_pk': 555,
 'GW_k_c_ind': 13,
 'GW_k_v_ind': 1,
 'candidate_for_BSE': True}

In [16]:
run.outputs.nscf_mapping.get_dict()

{'soc': False,
 'gap_': [[1, 1, 4, 4], [13, 13, 5, 5]],
 'homo_k': 1,
 'lumo_k': 13,
 'valence': 4,
 'gap_type': 'indirect',
 'conduction': 5,
 'nscf_gap_eV': 0.665,
 'dft_predicted': 'semiconductor/insulator',
 'number_of_kpoints': 16,
 'magnetic_calculation': False}
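
The two dictionaries can be combined in plain Python. The sketch below uses values copied from the outputs above and assumes that `gap_GW` is expressed in eV, like `nscf_gap_eV`; it also evaluates the default `range_QP` quoted earlier (`nscf_gap_eV*1.2`).

```python
# Values copied from the two output dictionaries shown above.
ywfl_parameters = {'gap_GW': 1.0607, 'v_min': 4, 'c_max': 5}
nscf_mapping = {'nscf_gap_eV': 0.665, 'valence': 4, 'conduction': 5}

# Opening of the gap due to the G0W0 correction (assuming both gaps are in eV).
gw_opening = round(ywfl_parameters['gap_GW'] - nscf_mapping['nscf_gap_eV'], 4)

# Default energy window used for 'range_QP' when it is not provided.
default_range_qp = round(1.2 * nscf_mapping['nscf_gap_eV'], 4)

print(gw_opening)        # 0.3957
print(default_range_qp)  # 0.798
```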

#### How to access the merged QP file and any other file retrieved from a run

The merged database is stored in the AiiDA repository, so in principle it cannot be accessed "by hand".
However, a simple trick is to create a temporary directory and copy the retrieved files into it.

In [17]:
import pathlib
import tempfile
import os


#a given simulation retrieved folder (you can select the wanted YamboCalculation instead of the run node).
retrieved_node = run.outputs.retrieved

# Create temporary directory
with tempfile.TemporaryDirectory() as dirpath:
    # Open the output file from the AiiDA storage and copy content to the temporary file
    for filename in retrieved_node.base.repository.list_object_names():
        # Create the file with the desired name
        temp_file = pathlib.Path(dirpath) / filename
        with retrieved_node.open(filename, 'rb') as handle:
            temp_file.write_bytes(handle.read())
            
        print(filename)
        
        # here you can copy the file, while the temporary directory still exists:
        # os.system(f"cp {temp_file} <your wanted destination>")

_scheduler-stderr.txt
_scheduler-stdout.txt
l-aiida.out_HF_and_locXC_gw0_ppa_el_el_corr
l_p2y
l_setup
ndb.HF_and_locXC
ndb.QP
ns.db1
o-aiida.out.qp
r-aiida.out_HF_and_locXC_gw0_ppa_el_el_corr
r_setup


In [18]:
#the merged QP
retrieved_node = run.outputs.merged_QP

# Create temporary directory
with tempfile.TemporaryDirectory() as dirpath:
    # Open the output file from the AiiDA storage and copy content to the temporary file
    for filename in retrieved_node.base.repository.list_object_names():
        # Create the file with the desired name
        temp_file = pathlib.Path(dirpath) / filename
        with retrieved_node.open(filename, 'rb') as handle:
            temp_file.write_bytes(handle.read())
            
        print(filename)
        
        #here you can do the copy of the file:
        # os.system("cp <dirpath/filename> <your wanted destination>")

ndb.QP_fixed
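
If the files should outlive the temporary directory, the same loop can write to a persistent folder instead. The helper below is a sketch: `export_repository_files` is a hypothetical name, and it assumes a flat repository (no subdirectories) accessed through the `base.repository` interface of AiiDA 2.x nodes.

```python
import pathlib


def export_repository_files(node, destination):
    """Copy every top-level file of a node's repository into `destination`.

    `node` is expected to expose `base.repository.list_object_names()` and
    `base.repository.open(name, 'rb')`, as AiiDA 2.x nodes do.
    """
    destination = pathlib.Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    written = []
    for filename in node.base.repository.list_object_names():
        target = destination / filename
        with node.base.repository.open(filename, 'rb') as handle:
            target.write_bytes(handle.read())
        written.append(target)
    return written


# Usage, assuming `run` is the finished workflow node from the cells above:
# export_repository_files(run.outputs.merged_QP, './merged_QP')
```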


Why is the merged ndb.QP named `ndb.QP_fixed`? The reason is that the original merged database goes through
a sanitizing procedure: as the number of QP is very high, it may happen that some of them are lost or
give a NaN result. The logic is to find these quasiparticle corrections and replace them with a scissor-and-stretching
correction.
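
The idea behind such a replacement can be sketched with a generic linear fit, where the intercept plays the role of the scissor and the slope that of the stretching. This is not the routine used by the plugin or by `yambopy`: the functions `linear_fit` and `fill_missing_corrections` are hypothetical, the data are toy values, and in practice valence and conduction states would typically be fitted separately.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fill_missing_corrections(e_dft, corrections):
    """Replace NaN corrections with the value of a line fitted on the valid ones."""
    valid = [(e, c) for e, c in zip(e_dft, corrections) if not math.isnan(c)]
    slope, intercept = linear_fit([e for e, _ in valid], [c for _, c in valid])
    return [
        slope * e + intercept if math.isnan(c) else c
        for e, c in zip(e_dft, corrections)
    ]


# Toy data: DFT energies (eV) and QP corrections (eV), one of which is NaN.
e_dft = [1.0, 2.0, 3.0, 4.0]
corrections = [0.5, 0.6, float('nan'), 0.8]
fixed = fill_missing_corrections(e_dft, corrections)
print(fixed)  # the third entry is replaced by the fitted value, close to 0.7
```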