
Linking LOBSTER calculations to the underlying DFT calculations #12

Open
ondracka opened this issue Mar 20, 2023 · 11 comments

@ondracka
Contributor

In pull request #11 it was suggested to do:

archive_qe = archive.m_context.resolve_archive(f'../upload/archive/mainfile/{path}')
archive.system = archive_qe.run[0].system[0]

as a means of linking to the underlying DFT calculations. I thought I would first try linking with VASP, where the mainfile is just vasprun.xml, and that one is automatically generated in the directory where VASP was executed (LOBSTER needs to run in the same directory afterwards).

So, for example, would this be correct?

archive_vasp = archive.m_context.resolve_archive('../upload/archive/mainfile/vasprun.xml')
archive.system = archive_vasp.run[0].system[0]

Now to test this I need a full NOMAD setup (like a working Oasis), correct? If I just run nomad parse lobsterout locally in a directory where the lobsterout file and the corresponding vasprun.xml are, I end up with nomad.metainfo.metainfo.MetainfoReferenceError: cannot retrieve archive PkzmvrnrhINXqYtkbSekZCxGqpdX from http://nomad-lab.eu/prod/v1/api/v1. So this can't be tested at the parser level only, or how should I do it?

BTW, regarding the linking to QE: QE does not write its output to a standardized location. In fact, the current QE DFT parser uses the QE stdout (which is usually redirected and saved somewhere, but that depends on the user). QE also writes XML output, but this is quite recent and parsing it is not supported ATM by the electronic parsers. However, assuming the output of the QE run was indeed saved and is somewhere in the directory, should I just try archive.m_context.resolve_archive for all files in the directory to see if I can hit the jackpot, or how should I proceed?
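Something like this minimal sketch is what I have in mind (mainfile_dir and upload_dir here are hypothetical helpers, not existing API): every file next to the LOBSTER mainfile is tried with resolve_archive, and failures are simply skipped.

import os

# Hypothetical sketch: try resolve_archive on every file next to the
# LOBSTER mainfile and keep the first candidate that resolves to an
# archive containing a run section.
def find_dft_archive(archive, mainfile_dir, upload_dir=''):
    for name in sorted(os.listdir(mainfile_dir)):
        try:
            candidate = archive.m_context.resolve_archive(
                f'../upload/archive/mainfile/{upload_dir}{name}')
        except Exception:
            continue  # not a recognized mainfile, try the next file
        if candidate is not None and candidate.run:
            return candidate
    return None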

@ondracka
Contributor Author

BTW CC @ladinesa

@JosePizarro3
Contributor

Hi @ondracka,

This is very interesting. A couple of weeks ago I also started trying to link some codes to the underlying DFT calculation, with the "automatically resolved workflow" idea in mind.

As you pointed out, you cannot test this with nomad parse; you need a local NOMAD installation, dragging files in and printing things to the terminal (I don't think there is really any other way to do this, but we can ask @markus1978).

There are other things:

  • You have to define level at the beginning of the parser class, similar to Phonopy (see the sketch at the end of this comment). Though I think it is better to define the level one step earlier (in the GitLab repository, where the matching of the parser is executed).
  • The main issue here is how to reference the correct mainfile paths, since within an upload there can be several mainfiles corresponding to level=0 (DFT) and level=X (the next task).

I am going to investigate this and open a merge request in the electronic-parsers once I have something. We can keep in contact with each other if you want, so we don't duplicate work 🙂
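For the first point, a minimal sketch of what I mean (the level attribute mirrors what Phonopy does; the exact base class and method signature are assumptions about the parser interface, not confirmed API):

class LobsterParser:
    # Assumed convention: level > 0 means this parser's mainfiles are
    # matched after the level=0 (DFT) mainfiles have been processed.
    level = 1

    def parse(self, filepath, archive, logger):
        ...  # regular LOBSTER parsing, then resolve the DFT reference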

@ladinesa
Collaborator

  1. You can skip the test for the workflow.
  2. If this is not something standard, we should not write automatic workflow generation. We run the risk of linking the incorrect calculation. It should be left to the user to generate the workflow.

@JosePizarro3
Contributor

If this is not something standard, we should not write automatic workflow generation. We run the risk of linking the incorrect calculation. It should be left to the user to generate the workflow.

I think workflows like this are kind of standard; what is not standard is the placement of files in the upload. In any case, we could leave a try for these kinds of situations, guessing where the files are likely to be (like one folder up w.r.t. the next level).
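As a rough sketch of that guess (assuming, and this is an assumption, that the path passed to resolve_archive is resolved relative to the current entry's mainfile, and that the DFT mainfile is vasprun.xml):

# Try the same directory first, then one folder up; give up quietly
# if neither location yields a resolvable DFT archive.
archive_dft = None
for relative in ('vasprun.xml', '../vasprun.xml'):
    try:
        archive_dft = archive.m_context.resolve_archive(
            f'../upload/archive/mainfile/{relative}')
        break
    except Exception:
        continue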

@ladinesa
Collaborator

But is the mainfile of the reference calculation specified in the (LOBSTER) mainfile? What if there are a number of these files in different locations?

@JosePizarro3
Contributor

JosePizarro3 commented Mar 29, 2023

But is the mainfile of the reference calc specified in the mainfile?

This is a challenge, indeed. Maybe (I am just starting to explore this) we can resolve it from the upload? Example: one DFT mainfile vasprun.xml and four GW mainfiles from Yambo at different k-grids. The DFT is placed in the main directory, while the four GW calculations are in subfolders. There we could resolve it, I think.

In some cases, codes report in their output the original DFT code they come from (as LOBSTER does here, right?).

What if there are a number of these files in different locations?

Then the try raises the exception, as we cannot predict people moving files around too much.

@ladinesa
Collaborator

We generate the workflow only if we can find the correct number of reference files. My opinion is that we should be as conservative as possible when generating these workflows automatically; it is better not to have them than to have incorrect links. Is this not the case for xspectra: since it does not specify the starting point, we did not try to generate the workflow automatically?
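In code, that conservative rule could look roughly like this (upload_files is a hypothetical list of all upload-relative file paths, and vasprun.xml stands in for whatever reference mainfile is expected):

# Link only when there is exactly one candidate reference mainfile;
# with zero or several candidates, generate no workflow at all.
candidates = [p for p in upload_files if p.endswith('vasprun.xml')]
if len(candidates) == 1:
    archive_dft = archive.m_context.resolve_archive(
        f'../upload/archive/mainfile/{candidates[0]}')
    archive.system = archive_dft.run[0].system[0]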

ladinesa reopened this Mar 29, 2023
@JosePizarro3
Contributor

Well, xspectra is an easy case, as it always comes from QE. In that case we just need to check a couple of things in the output of the DFT entry, hence we can work out the automatic workflow:

  • 1 QE file for the ground state,
  • N QE files for the excited states,
  • 3N XSPECTRA files (there are 3 dipoles per core-hole).

In my opinion, if we know which files have the proper metainfo, we can resolve the automatic workflow. It is then a matter of properly parsing SinglePoints and scanning sections. Do you think this makes sense, or is it better in practice not to even try?
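As a sketch, the file-count check I have in mind (qe_ground, qe_excited and xspectra_files would be hypothetical lists of mainfile paths collected from the upload, and build_automatic_workflow is a placeholder):

# Attempt the automatic workflow only when the counts match the expected
# pattern: 1 ground state, N excited states, 3N XSPECTRA dipole runs.
n = len(qe_excited)
if len(qe_ground) == 1 and n >= 1 and len(xspectra_files) == 3 * n:
    build_automatic_workflow(qe_ground[0], qe_excited, xspectra_files)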

@JosePizarro3
Contributor

A small note: this of course only works within the same upload, not across uploads.

@ladinesa
Collaborator

ladinesa commented Mar 29, 2023

We can implement automatic workflow generation, but again, we should only provide a link if we can uniquely identify the reference calculations.

Yes, of course, only within the same upload. It is up to the user to link tasks across uploads.

@JosePizarro3
Contributor

Indeed, you are totally right. Only in safe situations where we can double-check the metadata.
