# YamboWorkflow: a full DFT+MBPT flow

The `YamboWorkflow` is the core workchain of the plugin. It takes care of all the steps needed in a typical Yambo simulation,
from the preliminary self-consistent (SCF) and non-self-consistent (NSCF) DFT calculations to the actual GW (or BSE) calculations and the related post-processing.
The workflow ensures robust interoperability between the DFT and MBPT codes (Quantum ESPRESSO and Yambo, respectively): it links subsequent calculations
and passes the data between them automatically.
In practice, `YamboWorkflow` executes dynamically according to the instructions provided in input.
It performs all the intermediate steps needed for a specific calculation without requiring you to request them explicitly. Conversely, it skips the
intermediate steps for which parent calculations are already available, fully exploiting the provenance information. It uses the `PwBaseWorkChain` from `aiida-quantumespresso`
as a sub-workflow to perform the DFT part, if required, and the `YamboRestart` workchain for the GW part. At each stage, the workflow decides which
process has to be run next to reach the requested result. If the previous calculation is not `finished_ok`, the workflow exits in a failed state: we rely on
the `BaseRestartWorkChain` used at the lower level of the plugin to recover each single calculation from errors whenever possible.

NB: it is also possible to run BSE@GW (or just DFT), as you will see in a following dedicated tutorial.

```python
from aiida import orm, load_profile
load_profile()

from aiida.plugins import WorkflowFactory
from aiida.orm import QueryBuilder
from aiida.engine import submit

from aiida_quantumespresso.common.types import ElectronicType

import yaml
```

## Providing the minimal inputs needed for protocols

To create the builder instance, we only need to provide a minimal set of inputs, namely:
- the codes;
- the structure.

If a parent calculation is provided as input, the steps already performed are skipped, to avoid wasting human and computational time.
If no parent is passed to the builder, the DFT inputs are also created from the protocols, as provided by the `PwBaseWorkChain`.
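The skipping logic can be pictured as a short chain of steps: scf, then nscf, then the GW (yambo) step, which is the sequence also visible in the workflow report of the submission section. The snippet below is a simplified, hypothetical sketch: the function `remaining_steps` and the step labels are ours, not the plugin's API, and it assumes the parent is either absent or a pw (scf or nscf) calculation.

```python
# Hypothetical illustration, not the actual YamboWorkflow implementation:
# the DFT steps that can be skipped, followed by the GW step that always runs.
DFT_STEPS = ["scf", "nscf"]


def remaining_steps(parent_type=None):
    """Return the steps still to be run, given the type of the parent calculation.

    parent_type: None (no parent, start from scratch), "scf" or "nscf".
    """
    if parent_type is None:
        return DFT_STEPS + ["yambo"]
    if parent_type not in DFT_STEPS:
        raise ValueError(f"unsupported parent type: {parent_type!r}")
    # Skip every DFT step up to and including the one the parent already performed
    return DFT_STEPS[DFT_STEPS.index(parent_type) + 1:] + ["yambo"]


print(remaining_steps())        # ['scf', 'nscf', 'yambo']
print(remaining_steps("scf"))   # ['nscf', 'yambo']
print(remaining_steps("nscf"))  # ['yambo']
```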

```python
# Select the first structure stored in the 'Silicon/bulk' group
qb = QueryBuilder()
qb.append(orm.Group, filters={'label': 'Silicon/bulk'}, tag='group')
qb.append(orm.StructureData, with_group='group')

loaded_structure_id = qb.all()[0][0].pk
```

```python
# Read the YAML configuration files for codes and computational resources
with open("../configuration/codes_localhost.yaml", 'r') as stream:
    codes = yaml.safe_load(stream)

with open("../configuration/resources_localhost.yaml", 'r') as stream:
    resources = yaml.safe_load(stream)
```

```python
options = {
    'pseudo_family': "PseudoDojo/0.4/PBE/SR/standard/upf",
    'protocol': 'fast',
    # 'parent_id': 274,  # optional: set it to the pk of your previous nscf calculation to skip the DFT part
    'structure_id': loaded_structure_id,
}
```

```python
options.update(codes)
```
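Since the builder below reads the codes from `options['pwcode_id']`, `options['yamboprecode_id']` and `options['yambocode_id']`, the codes YAML file is expected to provide exactly these keys. A small sanity check, sketched under that assumption (the helper `missing_code_keys` and the example pk values are hypothetical), can catch a misconfigured file before the builder is created:

```python
# Keys read by get_builder_from_protocol in this notebook
REQUIRED_CODE_KEYS = ("pwcode_id", "yamboprecode_id", "yambocode_id")


def missing_code_keys(options):
    """Return the required code keys that are absent from ``options``."""
    return [key for key in REQUIRED_CODE_KEYS if key not in options]


# Hypothetical content of codes_localhost.yaml, already loaded as a dict
codes_example = {"pwcode_id": 1, "yamboprecode_id": 2, "yambocode_id": 3}
options_example = {"protocol": "fast"}
options_example.update(codes_example)

missing = missing_code_keys(options_example)
if missing:
    raise KeyError(f"missing code entries in the YAML file: {missing}")
```

In the notebook, the same check would be `missing_code_keys(options)` right after `options.update(codes)`.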

```python
YamboWorkflow = WorkflowFactory('yambo.yambo.yambowf')

builder = YamboWorkflow.get_builder_from_protocol(
    pw_code=options['pwcode_id'],
    preprocessing_code=options['yamboprecode_id'],
    code=options['yambocode_id'],
    protocol=options['protocol'],
    protocol_qe=options['protocol'],
    structure=orm.load_node(options['structure_id']),
    overrides={},
    pseudo_family=options['pseudo_family'],
    # parent_folder=orm.load_node(options['parent_id']).outputs.remote_folder,
    electronic_type=ElectronicType.INSULATOR,  # default is METAL: in that case, smearing is used
    calc_type='gw',  # or 'bse'; default is 'gw'
)
```


```
Summary of the main inputs:
BndsRnXp = 150
GbndRnge = 150
NGsBlkXp = 2 Ry
FFTGvecs = 9 Ry

kpoint mesh for nscf: [6, 6, 6]
```


```python
# You can also try different protocols:
YamboWorkflow.get_available_protocols()
```

```
{'fast': {'description': 'Under converged for most materials, but fast'},
 'moderate': {'description': 'Meta converged for most materials, higher computational cost than fast'},
 'precise': {'description': 'Converged for most materials, higher computational cost than moderate'}}
```

If you now inspect the prepopulated inputs, you can see the default values set according to the chosen protocol:

```python
builder.nscf.pw.parameters.get_dict()
```

```
{'CONTROL': {'calculation': 'nscf',
  'forc_conv_thr': 0.001,
  'tprnfor': True,
  'tstress': True,
  'etot_conv_thr': 0.0002},
 'SYSTEM': {'nosym': False,
  'occupations': 'fixed',
  'ecutwfc': 30.0,
  'ecutrho': 240.0,
  'force_symmorphic': True,
  'nbnd': 150},
 'ELECTRONS': {'electron_maxstep': 80, 'mixing_beta': 0.4, 'conv_thr': 8e-10}}
```

```python
builder.yres.yambo.parameters.get_dict()
```

```
{'arguments': ['dipoles', 'ppa', 'HF_and_locXC', 'gw0'],
 'variables': {'Chimod': 'hartree',
  'DysSolver': 'n',
  'GTermKind': 'BG',
  'X_and_IO_nCPU_LinAlg_INV': [1, ''],
  'NGsBlkXp': [2, 'Ry'],
  'FFTGvecs': [9, 'Ry'],
  'BndsRnXp': [[1, 150], ''],
  'GbndRnge': [[1, 150], ''],
  'QPkrange': [[[1, 1, 32, 32]], '']}}
```
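The Yambo variables are stored as `[value, unit]` pairs; band ranges such as `BndsRnXp` and `GbndRnge` are given as `[first, last]` lists, and `QPkrange` as a list of `[k_first, k_last, band_first, band_last]` entries. As a sketch of how to read this convention, the hypothetical helper `summarize` below reproduces the summary printed at builder creation, assuming that summary reports the upper bound of each band range:

```python
def summarize(variables, names=("BndsRnXp", "GbndRnge", "NGsBlkXp", "FFTGvecs")):
    """Format selected Yambo variables given in the [value, unit] convention."""
    lines = []
    for name in names:
        value, unit = variables[name]
        if isinstance(value, list):
            # band ranges are [first, last]: report the last band
            value = value[-1]
        # rstrip removes the trailing space left by an empty unit
        lines.append(f"{name} = {value} {unit}".rstrip())
    return lines


variables = {
    'NGsBlkXp': [2, 'Ry'],
    'FFTGvecs': [9, 'Ry'],
    'BndsRnXp': [[1, 150], ''],
    'GbndRnge': [[1, 150], ''],
}
for line in summarize(variables):
    print(line)
```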

We then provide the computational resources:

```python
builder.scf.pw.metadata.options = resources

# Use the same resources for the nscf and yambo steps
builder.nscf.pw.metadata.options = builder.scf.pw.metadata.options
builder.yres.yambo.metadata.options = builder.scf.pw.metadata.options
```
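Here all three steps share the same options. If you want, for instance, more MPI processes for the nscf step than for the scf one, it is safer to start from independent copies of the loaded dictionary, so that editing one copy does not silently change the others. A minimal sketch, assuming `resources` follows the standard AiiDA `metadata.options` layout (the dictionary `resources_example` and its values are hypothetical):

```python
import copy

# Hypothetical content of resources_localhost.yaml, already loaded as a dict
resources_example = {
    "resources": {"num_machines": 1, "num_mpiprocs_per_machine": 1},
    "max_wallclock_seconds": 3600,
}

# Independent deep copies: nested dicts such as "resources" are not shared
resources_scf = copy.deepcopy(resources_example)
resources_nscf = copy.deepcopy(resources_example)
resources_nscf["resources"]["num_mpiprocs_per_machine"] = 4

print(resources_scf["resources"]["num_mpiprocs_per_machine"])   # 1
print(resources_nscf["resources"]["num_mpiprocs_per_machine"])  # 4
```

The copies can then be assigned to `builder.scf.pw.metadata.options` and `builder.nscf.pw.metadata.options`, respectively.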

### Overrides

As in the previous examples (see e.g. the `YamboRestart` notebook), the default inputs can also be modified while the builder is created, rather than a posteriori. This is done through overrides:

```python
overrides_scf = {
    'pseudo_family': "PseudoDojo/0.4/PBE/SR/standard/upf",
    'pw': {
        'metadata': {"options": resources},
    },
}

overrides_nscf = {
    'pseudo_family': "PseudoDojo/0.4/PBE/SR/standard/upf",
    'pw': {
        'parameters': {
            'CONTROL': {},  # not needed if you don't override anything in it
            'SYSTEM': {},
            'ELECTRONS': {'diagonalization': 'david'},
        },
        'metadata': {"options": resources},
    },
}

overrides_yambo = {
    "yambo": {
        "parameters": {
            "arguments": [
                "rim_cut",
            ],
            "variables": {
                "NGsBlkXp": [4, "Ry"],
                "FFTGvecs": [24, "Ry"],
            },
        },
        'metadata': {"options": resources},
    },
}

overrides = {
    'yres': overrides_yambo,
    'nscf': overrides_nscf,
    'scf': overrides_scf,
}
```


Let's now create a new builder instance that also includes the `overrides`:

```python
builder = YamboWorkflow.get_builder_from_protocol(
    pw_code=options['pwcode_id'],
    preprocessing_code=options['yamboprecode_id'],
    code=options['yambocode_id'],
    protocol=options['protocol'],
    protocol_qe=options['protocol'],
    structure=orm.load_node(options['structure_id']),
    overrides=overrides,
    # parent_folder=orm.load_node(options['parent_id']).outputs.remote_folder,
    electronic_type=ElectronicType.INSULATOR,  # default is METAL: smearing is used
    calc_type='gw',  # or 'bse'; default is 'gw'
)
```

```
Summary of the main inputs:
BndsRnXp = 150
GbndRnge = 150
NGsBlkXp = 4 Ry
FFTGvecs = 24 Ry

kpoint mesh for nscf: [6, 6, 6]
```


```python
builder.nscf.pw.parameters.get_dict()
```

```
{'CONTROL': {'calculation': 'nscf',
  'forc_conv_thr': 0.001,
  'tprnfor': True,
  'tstress': True,
  'etot_conv_thr': 0.0002},
 'SYSTEM': {'nosym': False,
  'occupations': 'fixed',
  'ecutwfc': 36.0,
  'ecutrho': 144.0,
  'force_symmorphic': True,
  'nbnd': 150},
 'ELECTRONS': {'electron_maxstep': 80,
  'mixing_beta': 0.4,
  'diagonalization': 'david',
  'conv_thr': 8e-10}}
```

```python
builder.nscf.pw.parameters.get_dict()['ELECTRONS']['diagonalization']
```

```
'david'
```
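Compared with the protocol defaults shown earlier, the `ELECTRONS` namespace now contains `'diagonalization': 'david'` while keeping `electron_maxstep`, `mixing_beta` and `conv_thr`: the override is merged into the defaults key by key rather than replacing the whole namespace. A minimal sketch of such a recursive merge follows; the `recursive_merge` function here is written for illustration and is not claimed to be the plugin's implementation.

```python
def recursive_merge(defaults, overrides):
    """Return a new dict with ``overrides`` merged into ``defaults``.

    Nested dictionaries are merged key by key; any other value in
    ``overrides`` replaces the corresponding default.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {'ELECTRONS': {'electron_maxstep': 80, 'mixing_beta': 0.4, 'conv_thr': 8e-10}}
override = {'CONTROL': {}, 'ELECTRONS': {'diagonalization': 'david'}}

merged = recursive_merge(defaults, override)
print(merged['ELECTRONS'])
# {'electron_maxstep': 80, 'mixing_beta': 0.4, 'conv_thr': 8e-10, 'diagonalization': 'david'}
```

The defaults are left untouched, because every nested level is copied before being updated.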

As you may notice, the builder now has new attributes referring to the scf, nscf and yambo parts (`scf`, `nscf` and `yres`): through them, we are actually providing the inputs
of the `PwBaseWorkChain` (for `scf` and `nscf`) and of `YamboRestart` (for `yres`), respectively.
The only 'strict' `YamboWorkflow` input is now the `parent_folder`.

### Requesting the YamboWorkflow to compute a specific quantity: the minimum band gap and the direct band gap at Gamma

Within `YamboWorkflow`, it is possible to obtain the band gap of a material without explicitly providing its position in reciprocal space. The workflow inspects the DFT band structure computed in the nscf step and determines the k-points and band indices corresponding to the minimal band gap of the material.
In this way, the relevant quasiparticle levels are computed without additional human intervention. Note, however, that this logic relies on the DFT band edges: if the GW correction moves the band gap to a different position in reciprocal space, this is not captured, and a further analysis of the GW band structure is required.

Below we show how to request additional parsing through the `additional_parsing` attribute of the builder. This is an AiiDA `List` containing strings, each of them
representing a desired quantity. In this case, we want to compute the direct band gap at Gamma and the minimal gap, i.e. "gap_GG" and "gap_", respectively:

```python
builder.additional_parsing = orm.List(list=['gap_GG', 'gap_'])
```

It is also possible to request other high-symmetry points, e.g. M or K. However, if these points are not contained in the k-point mesh, their quasiparticle correction is skipped, as it cannot be computed.
Indirect gaps can be requested with a string of the form "gap_AB", where `A` is the k-point of the top valence band and `B` is the k-point of the bottom conduction band. For example, the indirect gap G->M
is obtained by adding the string "gap_GM" to the `additional_parsing` List.

Other examples are:
```python
builder.additional_parsing = orm.List(list=['gap_', 'gap_GG', 'homo', 'lumo'])  # GW
builder.additional_parsing = orm.List(list=['lowest_exciton', 'brightest_exciton'])  # BSE
```

Finally, single-particle levels can also be computed for the last valence and the first conduction band, by providing the strings "homo_K" and "lumo_K", respectively, where `K` is the desired high-symmetry k-point.
To compute the top valence and the bottom conduction GW energies, simply provide "homo" and "lumo".
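The naming convention of the `additional_parsing` strings can be made explicit with a small parser. This is a hypothetical sketch that only covers the GW keys described above ("gap_", "gap_AB", "homo", "lumo", "homo_K", "lumo_K") and assumes single-letter k-point labels such as G, M or K; it is not the plugin's internal parsing.

```python
def describe_request(key):
    """Translate an additional_parsing string into a readable description."""
    if key == "gap_":
        return "minimal band gap"
    if key.startswith("gap_") and len(key) == 6:
        # "gap_AB": A is the valence k-point, B the conduction k-point
        valence_k, conduction_k = key[4], key[5]
        kind = "direct" if valence_k == conduction_k else "indirect"
        return f"{kind} gap: top valence at {valence_k}, bottom conduction at {conduction_k}"
    if key in ("homo", "lumo"):
        return f"{key} level (global band edge)"
    if key.startswith(("homo_", "lumo_")) and len(key) == 6:
        return f"{key[:4]} level at {key[5]}"
    raise ValueError(f"unsupported request: {key!r}")


for key in ["gap_", "gap_GG", "gap_GM", "homo", "lumo_G"]:
    print(key, "->", describe_request(key))
```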

The requested quantity is then stored in a human-readable output Dict called `output_ywfl_parameters`.

### Submission phase

```python
run = None
```

```python
if run:
    print('a workflow has already been submitted -> {}'.format(run.pk))
    print('Are you sure you want to run again? If so, copy the instruction in the else branch into the cell below and run it.')
else:
    run = submit(builder)

print(run)
```

```
uuid: 2d8e28d0-fc3d-4c7e-83c6-d2326f0cf32b (pk: 467) (aiida.workflows:yambo.yambo.yambowf)
```


```python
# IPython shell escape: print the report of the submitted workflow
!verdi process report {run.pk}
```

```
2024-02-14 18:44:00 [19 | REPORT]: [467|YamboWorkflow|start_workflow]: no previous pw calculation found, we will start from scratch
2024-02-14 18:44:00 [20 | REPORT]: [467|YamboWorkflow|start_workflow]:  workflow initilization step completed.
2024-02-14 18:44:00 [21 | REPORT]: [467|YamboWorkflow|can_continue]: the workflow continues with a scf calculation
2024-02-14 18:44:00 [22 | REPORT]: [467|YamboWorkflow|perform_next]: performing a scf calculation
2024-02-14 18:44:02 [23 | REPORT]:   [468|PwBaseWorkChain|run_process]: launching PwCalculation<473> iteration #1
2024-02-14 18:44:08 [24 | REPORT]:   [468|PwBaseWorkChain|results]: work chain completed after 1 iterations
2024-02-14 18:44:09 [25 | REPORT]:   [468|PwBaseWorkChain|on_terminated]: remote folders will not be cleaned
2024-02-14 18:44:09 [26 | REPORT]: [467|YamboWorkflow|can_continue]: the workflow continues with a nscf calculation
2024-02-14 18:44:09 [27 | REPORT]: [467|YamboWorkflow|perform_next]: performing a nscf calcu
```

### Inspecting the outputs

Once the workflow has finished successfully, you can access its outputs through the `outputs` attribute of the workflow node. All the outputs of `YamboRestart` and `YamboCalculation` are also exposed here.

```python
run.is_finished_ok
```

```
True
```

We can then inspect the outputs, in particular the additional quantities that we requested.
These are collected in the `output_ywfl_parameters` output node, which is an AiiDA `Dict`.

```python
run.outputs.output_ywfl_parameters.get_dict()
```

```
{'gap_': 1.0569413485557,
 'homo': -0.30429274897575,
 'lumo': 0.75264859957993,
 'gap_GG': 3.0603234552562,
 'homo_G': -0.30429274897575,
 'lumo_G': 2.7560307062805,
 'gap_dft': 0.66473089347482,
 'homo_dft': 0.0,
 'lumo_dft': 0.66473089347482,
 'gap_GG_dft': 2.5552411541998,
 'homo_G_dft': 0.0,
 'lumo_G_dft': 2.5552411541998}
```
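The entries are consistent with each other: each gap is the difference between the corresponding `lumo` and `homo` values, and the `_dft` entries give the DFT counterparts. A short sketch that checks this and computes the GW gap opening from the dictionary above, assuming energies in eV (consistent with `nscf_gap_eV` in the `nscf_mapping` output):

```python
import math

# Values copied from output_ywfl_parameters above
ywfl = {
    'gap_': 1.0569413485557,
    'homo': -0.30429274897575,
    'lumo': 0.75264859957993,
    'gap_GG': 3.0603234552562,
    'homo_G': -0.30429274897575,
    'lumo_G': 2.7560307062805,
    'gap_dft': 0.66473089347482,
    'gap_GG_dft': 2.5552411541998,
}

# Each gap is lumo - homo at the corresponding k-points
assert math.isclose(ywfl['lumo'] - ywfl['homo'], ywfl['gap_'], rel_tol=1e-9)
assert math.isclose(ywfl['lumo_G'] - ywfl['homo_G'], ywfl['gap_GG'], rel_tol=1e-9)

# GW opening of the minimal and of the direct (Gamma) gap with respect to DFT
opening_min = ywfl['gap_'] - ywfl['gap_dft']
opening_gamma = ywfl['gap_GG'] - ywfl['gap_GG_dft']
print(f"minimal gap: +{opening_min:.3f} eV, gap at Gamma: +{opening_gamma:.3f} eV")
# minimal gap: +0.392 eV, gap at Gamma: +0.505 eV
```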

Moreover, the information extracted from the nscf step is stored in the `nscf_mapping` output node:

```python
run.outputs.nscf_mapping.get_dict()
```

```
{'soc': False,
 'gap_': [[1, 1, 4, 4], [13, 13, 5, 5]],
 'gap_GG': [[1, 1, 4, 4], [1, 1, 5, 5]],
 'homo_k': 1,
 'lumo_k': 13,
 'valence': 4,
 'gap_type': 'indirect',
 'conduction': 5,
 'nscf_gap_eV': 0.665,
 'dft_predicted': 'semiconductor/insulator',
 'number_of_kpoints': 16,
 'magnetic_calculation': False}
```
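Each `gap_` and `gap_GG` entry in `nscf_mapping` appears to list two ranges in the `QPkrange` format `[k_first, k_last, band_first, band_last]`, the first for the valence level and the second for the conduction level. This reading matches the other fields: the valence band maximum sits at k-point 1 (`homo_k`) in band 4 (`valence`), and the conduction band minimum at k-point 13 (`lumo_k`) in band 5 (`conduction`), which is why the gap is classified as indirect. A small sketch under that assumption (the helper `band_edges` is hypothetical):

```python
# Relevant entries copied from nscf_mapping above
nscf_mapping = {
    'gap_': [[1, 1, 4, 4], [13, 13, 5, 5]],
    'gap_GG': [[1, 1, 4, 4], [1, 1, 5, 5]],
    'homo_k': 1,
    'lumo_k': 13,
    'valence': 4,
    'conduction': 5,
    'gap_type': 'indirect',
}


def band_edges(entry):
    """Return (k-point, band) of the valence and conduction levels of a gap entry."""
    (k_v, _, band_v, _), (k_c, _, band_c, _) = entry
    return (k_v, band_v), (k_c, band_c)


valence, conduction = band_edges(nscf_mapping['gap_'])
gap_type = 'direct' if valence[0] == conduction[0] else 'indirect'
print(valence, conduction, gap_type)  # (1, 4) (13, 5) indirect
```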