
Linking LOBSTER calculations to the underlying DFT calculations #12

Open
ondracka opened this issue Mar 20, 2023 · 11 comments

@ondracka
Contributor

In pull request #11 it was suggested to do:

archive_qe = archive.m_context.resolve_archive(f'../upload/archive/mainfile/{path}')
archive.system = archive_qe.run[0].system[0]

as a means of linking to the underlying DFT calculations. I thought I would first try linking with VASP, where the mainfile is just vasprun.xml, and that one is automatically generated in the directory where VASP was executed (LOBSTER needs to run in the same directory afterwards).

So, for example, would this be correct?

archive_vasp = archive.m_context.resolve_archive('../upload/archive/mainfile/vasprun.xml')
archive.system = archive_vasp.run[0].system[0]

Now to test this I need a full NOMAD setup (like a working Oasis), correct? If I just run nomad parse lobsterout locally in a directory where the lobsterout file and the corresponding vasprun.xml are, I end up with nomad.metainfo.metainfo.MetainfoReferenceError: cannot retrieve archive PkzmvrnrhINXqYtkbSekZCxGqpdX from http://nomad-lab.eu/prod/v1/api/v1. So this can't be tested at the parser level only, or how should I do it?

BTW, regarding the linking to QE: QE does not write its output to a standardized location. In fact, the current QE DFT parser uses the QE stdout (which is usually redirected and saved somewhere, but that depends on the user). QE also writes XML output, but this is quite recent and parsing it is not supported ATM by the electronic parsers. However, assuming the output of the QE run was indeed saved and is somewhere in the directory, should I just try archive.m_context.resolve_archive for all files in the directory to see if I can hit the jackpot, or how should I proceed?
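Something like this minimal sketch is what I have in mind (mainfile_dir and upload_dir here are hypothetical helpers, not existing API): every file next to the LOBSTER mainfile is tried with resolve_archive, and failures are simply skipped.

import os

# Hypothetical sketch: try resolve_archive on every file next to the
# LOBSTER mainfile and keep the first candidate that resolves to an
# archive containing a run section.
def find_dft_archive(archive, mainfile_dir, upload_dir=''):
    for name in sorted(os.listdir(mainfile_dir)):
        try:
            candidate = archive.m_context.resolve_archive(
                f'../upload/archive/mainfile/{upload_dir}{name}')
        except Exception:
            continue  # not a recognized mainfile, try the next file
        if candidate is not None and candidate.run:
            return candidate
    return None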

@ondracka
Contributor Author

BTW CC @ladinesa

@JosePizarro3
Contributor

Hi @ondracka,

This is very interesting. A couple of weeks ago I also started trying to link some codes to the underlying DFT calculation, with the "automatically resolved workflow" idea in mind.

As you pointed out, you cannot test this with nomad parse; you need a local NOMAD installation, dragging files in and printing things to the terminal (I don't think there is really any other way to do this, but we can ask @markus1978).

There are other things:

  • You have to define level at the beginning of the parser class, similar to Phonopy (see the sketch at the end of this comment). Though I think it is better to define the level one step earlier (in the GitLab repository, where the matching of the parser is executed).
  • The main issue here is how to reference the correct mainfile paths, since within an upload there can be several mainfiles corresponding to level=0 (DFT) and level=X (the next task).

I am going to investigate this and open a merge request in the electronic-parsers once I have something. We can keep in contact with each other if you want, so we don't duplicate work 🙂
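For the first point, a minimal sketch of what I mean (the level attribute mirrors what Phonopy does; the exact base class and method signature are assumptions about the parser interface, not confirmed API):

class LobsterParser:
    # Assumed convention: level > 0 means this parser's mainfiles are
    # matched after the level=0 (DFT) mainfiles have been processed.
    level = 1

    def parse(self, filepath, archive, logger):
        ...  # regular LOBSTER parsing, then resolve the DFT reference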

@ladinesa
Collaborator

  1. You can skip the test for the workflow.
  2. If this is not something standard, we should not write automatic workflow generation. We run the risk of linking the incorrect calculation. It should be left to the user to generate the workflow.

@JosePizarro3
Contributor

If this is not something standard, we should not write automatic workflow generation. We run the risk of linking the incorrect calculation. It should be left to the user to generate the workflow.

I think workflows like this are kind of standard; what is not standard is the placement of files in the upload. In any case, we could leave a try for these kinds of situations, guessing where the files are likely to be (like one folder up w.r.t. the next level).
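As a rough sketch of that guess (assuming, and this is an assumption, that the path passed to resolve_archive is resolved relative to the current entry's mainfile, and that the DFT mainfile is vasprun.xml):

# Try the same directory first, then one folder up; give up quietly
# if neither location yields a resolvable DFT archive.
archive_dft = None
for relative in ('vasprun.xml', '../vasprun.xml'):
    try:
        archive_dft = archive.m_context.resolve_archive(
            f'../upload/archive/mainfile/{relative}')
        break
    except Exception:
        continue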

@ladinesa
Collaborator

But is the mainfile of the reference calculation specified in the (LOBSTER) mainfile? What if there are a number of these files in different locations?

@JosePizarro3
Contributor

JosePizarro3 commented Mar 29, 2023

But is the mainfile of the reference calc specified in the mainfile?

This is a challenge, indeed. Maybe (I am just starting to explore this) we can resolve it from the upload? Example: one DFT mainfile vasprun.xml and four GW mainfiles from Yambo at different k-grids. The DFT is placed in the main directory, while the four GW calculations are in subfolders. There we could resolve it, I think.

In some cases, codes report in their output the original DFT code they come from (as LOBSTER does here, right?).

What if there are a number of these files in different locations?

Then the try raises the exception, as we cannot predict people moving files around too much.

@ladinesa
Collaborator

We generate the workflow only if we can find the correct number of reference files. My opinion is that we should be as conservative as possible when generating these workflows automatically; it is better not to have them than to have incorrect links. Is this not the case for xspectra: since it does not specify the starting point, we did not try to generate the workflow automatically?
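In code, that conservative rule could look roughly like this (upload_files is a hypothetical list of all upload-relative file paths, and vasprun.xml stands in for whatever reference mainfile is expected):

# Link only when there is exactly one candidate reference mainfile;
# with zero or several candidates, generate no workflow at all.
candidates = [p for p in upload_files if p.endswith('vasprun.xml')]
if len(candidates) == 1:
    archive_dft = archive.m_context.resolve_archive(
        f'../upload/archive/mainfile/{candidates[0]}')
    archive.system = archive_dft.run[0].system[0]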

ladinesa reopened this Mar 29, 2023
@JosePizarro3
Contributor

Well, xspectra is an easy case, as it always comes from QE. In that case we just need to check a couple of things in the output of the DFT entry, hence we can work out the automatic workflow:

  • 1 QE file for the ground state,
  • N QE files for the excited states,
  • 3N XSPECTRA files (there are 3 dipoles per core-hole).

In my opinion, if we know which files have the proper metainfo, we can resolve the automatic workflow. It is then a matter of properly parsing SinglePoints and scanning sections. Do you think this makes sense, or is it better in practice not to even try?
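As a sketch, the file-count check I have in mind (qe_ground, qe_excited and xspectra_files would be hypothetical lists of mainfile paths collected from the upload, and build_automatic_workflow is a placeholder):

# Attempt the automatic workflow only when the counts match the expected
# pattern: 1 ground state, N excited states, 3N XSPECTRA dipole runs.
n = len(qe_excited)
if len(qe_ground) == 1 and n >= 1 and len(xspectra_files) == 3 * n:
    build_automatic_workflow(qe_ground[0], qe_excited, xspectra_files)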

@JosePizarro3
Contributor

A small note: this of course only works within the same upload, not across uploads.

@ladinesa
Collaborator

ladinesa commented Mar 29, 2023

We can implement automatic workflow generation, but again, we should only provide a link if we can uniquely identify the reference calculations.

Yes, of course, only within the same upload. It is up to the user to link tasks across uploads.

@JosePizarro3
Contributor

Indeed, you are totally right. Only in safe situations where we can double-check the metadata.
