Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2 general base classes for workflows #3

Merged
merged 17 commits into from Feb 7, 2024

Conversation

JosePizarro3
Copy link
Contributor

@JosePizarro3 JosePizarro3 commented Jan 15, 2024

Hey @ladinesa ,

I applied some changes for inheritance to my workflows. I also reformatted with Ruff those files. Can you please review this? I'd like to merge this before working on #1 .

Btw, regarding classes and attributes: is there any way to rewrite attributes with a different naming in a class? I could imagine defining some FirstPrinciplesCalculation which encodes DFT and GW, and then inherit from them in DFTOutputs and GWOutputs, but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

This is related with the magres MR in Gitlab.

@JosePizarro3 JosePizarro3 self-assigned this Jan 15, 2024
@JosePizarro3 JosePizarro3 linked an issue Jan 15, 2024 that may be closed by this pull request
@JosePizarro3
Copy link
Contributor Author

(Something is going on with the pipeline, it seems it cannot install dependencies, tho I didn't touch anything but here).

@ladinesa
Copy link
Collaborator

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

@JosePizarro3
Copy link
Contributor Author

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

The problem is with inheritance: if I have two classes with the same named attributes, one of the classes attrs will overwrite the other, hence the "_dft" and "_gw".

@ladinesa
Copy link
Collaborator

ladinesa commented Jan 15, 2024

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

The problem is with inheritance: if I have two classes with the same named attributes, one of the classes attrs will overwrite the other, hence the "_dft" and "_gw".

This is an indication of a bad hierarchy design then, In DMFTResults, I am not sure why it would inherit from both TB and DFT results. I think you should define these separately,

dos_quantity = Quantity(type=References(Dos))

class FirstPrinciplesResults(SimulationWorkflowResults):
    dos = dos_quantity

class DFTResults(FirstPrinciplesResults):
    pass

class TBResults(FirstPrinciplesResults):
   pass

class DMFTResults(SimulationWorkflowResults):
    dos_dft = dos_quantity
    dos_tb = dos_quantity

@JosePizarro3
Copy link
Contributor Author

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

The problem is with inheritance: if I have two classes with the same named attributes, one of the classes attrs will overwrite the other, hence the "_dft" and "_gw".

This is an indication of a bad hierarchy design then, In DMFTResults, I am not sure why it would inherit from both TB and DFT results. I think you should define these separately,

dos_quantity = Quantity(type=References(Dos))

class FirstPrinciplesResults(SimulationWorkflowResults):
    dos = dos_quantity

class DFTResults(FirstPrinciplesResults):
    pass

class TBResults(FirstPrinciplesResults):
   pass

class DMFTResults(SimulationWorkflowResults):
    dos_dft = dos_quantity
    dos_tb = dos_quantity

This is again inheritance vs composition. In any case, I think in this case inheritance is the way of going, because we can reuse the base DFT, TB, DMFT, GW... results in multiple standard workflows. This is because it is easier to define these than rather compose from two sections.

I will keep it then to maintain base Results sections with the "_" naming, and then inherit from the workflow Results sections.

@ladinesa
Copy link
Collaborator

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

The problem is with inheritance: if I have two classes with the same named attributes, one of the classes attrs will overwrite the other, hence the "_dft" and "_gw".

This is an indication of a bad hierarchy design then, In DMFTResults, I am not sure why it would inherit from both TB and DFT results. I think you should define these separately,

dos_quantity = Quantity(type=References(Dos))

class FirstPrinciplesResults(SimulationWorkflowResults):
    dos = dos_quantity

class DFTResults(FirstPrinciplesResults):
    pass

class TBResults(FirstPrinciplesResults):
   pass

class DMFTResults(SimulationWorkflowResults):
    dos_dft = dos_quantity
    dos_tb = dos_quantity

This is again inheritance vs composition. In any case, I think in this case inheritance is the way of going, because we can reuse the base DFT, TB, DMFT, GW... results in multiple standard workflows. This is because it is easier to define these than rather compose from two sections.

I will keep it then to maintain base Results sections with the "_" naming, and then inherit from the workflow Results sections.

The tree should reflect the relationship between workflows. In this case, dmft is not a kind of dft and tb workflow but composed of them. The base class for this should be BeyondDFT which contains references to the dft and tb workflow results.

@JosePizarro3
Copy link
Contributor Author

but overwritting the names of the attributes (band_gap, dos, band_structure) adding "_dft" and "_gw" respectively. I was checking the metainfo.py but I could not find this, and I am not sure whether it is a good practice at all.

My opinion is it is not necessary to put a suffix because it is clear from the inheritance tree. You can perhaps put an alias or modify the description.

The problem is with inheritance: if I have two classes with the same named attributes, one of the classes attrs will overwrite the other, hence the "_dft" and "_gw".

This is an indication of a bad hierarchy design then, In DMFTResults, I am not sure why it would inherit from both TB and DFT results. I think you should define these separately,

dos_quantity = Quantity(type=References(Dos))

class FirstPrinciplesResults(SimulationWorkflowResults):
    dos = dos_quantity

class DFTResults(FirstPrinciplesResults):
    pass

class TBResults(FirstPrinciplesResults):
   pass

class DMFTResults(SimulationWorkflowResults):
    dos_dft = dos_quantity
    dos_tb = dos_quantity

This is again inheritance vs composition. In any case, I think in this case inheritance is the way of going, because we can reuse the base DFT, TB, DMFT, GW... results in multiple standard workflows. This is because it is easier to define these than rather compose from two sections.
I will keep it then to maintain base Results sections with the "_" naming, and then inherit from the workflow Results sections.

The tree should reflect the relationship between workflows. In this case, dmft is not a kind of dft and tb workflow but composed of them. The base class for this should be BeyondDFT which contains references to the dft and tb workflow results.

Yes, I think composition makes sense, you are totally right. I was confused by the previous structure, so I am trying again composing sub-sections inside each of the workflow results (using the XXOutputs sections I defined here).

I will push something after lunch, thanks for the feedback.

@JosePizarro3
Copy link
Contributor Author

JosePizarro3 commented Jan 16, 2024

@ladinesa new version (sorry I also formatted the testing with Ruff)

  • I follow now composition instead of inheritance by creating a ElectronicStructureOutputs class and use it as sub-section of the results of each GW, DMFT... workflow.
  • I kept inheritance for the workflows Method (as it makes sense, the workflow should inherit from the DFTMethod base class)
  • I also inherit from BeyondDFT for each main workflow section, and the normalization works in a compositional way for each workflow.

I also understand that, under results now we have something somehow similar to tasks[i].outputs but more tailored to the standard electronic-structure workflows. I think it is also easier to add more of these workflows with these changes, or extend the existing ones (I can use the NMR Magres example for it).

What do you think?

@JosePizarro3
Copy link
Contributor Author

Ok, I had to add Plus in order for the BeyondDFT.normalize function to grep the workflow_name. I think it is more robust like this.

As this is part of the merge regarding NMR, I am thinking whether it is better if I directly merge #1 here, then merge this branch and update the reference to the plugin accordingly. Sorry for the mesh, I was working in parallel and realized that there were more changes here and in #1.

@JosePizarro3 JosePizarro3 mentioned this pull request Jan 17, 2024
workflow_name = []
for task in self.tasks:
workflow_name.append(task.name)
self.name = "+".join(workflow_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be problematic if task.name is None. do '+'.join([task.name for task in self.tasks if task.name])

@JosePizarro3 JosePizarro3 force-pushed the 2-general-base-classes-for-workflows branch from 1c05f0a to eab3897 Compare January 17, 2024 13:48
@JosePizarro3
Copy link
Contributor Author

@ladinesa sorry, Ruff is really a pain to disable in VSCode, and it added formatting to this file...

Anyways, I changed the imports to runschema but it seems the pipeline is failing. Should I wait until you merge #7 and rebase?

@ladinesa
Copy link
Collaborator

ladinesa commented Feb 2, 2024

@ladinesa sorry, Ruff is really a pain to disable in VSCode, and it added formatting to this file...

Anyways, I changed the imports to runschema but it seems the pipeline is failing. Should I wait until you merge #7 and rebase?

Sorry I am waiting for the review from @JFRudzinski , but I can also merge this right away.

@JFRudzinski
Copy link
Contributor

@ladinesa sorry, Ruff is really a pain to disable in VSCode, and it added formatting to this file...
Anyways, I changed the imports to runschema but it seems the pipeline is failing. Should I wait until you merge #7 and rebase?

Sorry I am waiting for the review from @JFRudzinski , but I can also merge this right away.

@ladinesa sorry I somehow missed the request. I see you merged anyway, but I took a look. Thanks for notifying me.

@ladinesa
Copy link
Collaborator

ladinesa commented Feb 2, 2024

I am now done fixing the develop branch. There is one lingering error that I commented out in test_xs_workflow. Can you have a look @JosePizarro3 ?

@JosePizarro3 JosePizarro3 force-pushed the 2-general-base-classes-for-workflows branch from 53e7b36 to 194c65f Compare February 2, 2024 12:49
@JosePizarro3
Copy link
Contributor Author

I am now done fixing the develop branch. There is one lingering error that I commented out in test_xs_workflow. Can you have a look @JosePizarro3 ?

Well, I investigated a bit, but I think it makes sense if I pass the ball to you back.

It seems that the XS.normalize function is being executed two times. I am guessing this is coming from the MetainfoNormalizer, althought it gets very obscure when trying to understand _normalize() and the interface with nomad.normalizing.__init__.py.

After some debugging, it seems that the first time that the section XS comes to self.normalize_section(section, logger) is already normalized, and then it passes again creating a new workflow2.results.spectra section, hence creating twice as much spectras (which surprisingly means that the testing and results.properties.spectroscopic.spectra are correct, but normalization is out of control).

Thus, @ladinesa do you mind checking again this normalization? This is why I wanted to send the ball back to you, because you know much more the details and ordering of normalization. Let me know if you need some extra help.

@ladinesa
Copy link
Collaborator

ladinesa commented Feb 2, 2024

I am now done fixing the develop branch. There is one lingering error that I commented out in test_xs_workflow. Can you have a look @JosePizarro3 ?

Well, I investigated a bit, but I think it makes sense if I pass the ball to you back.

It seems that the XS.normalize function is being executed two times. I am guessing this is coming from the MetainfoNormalizer, althought it gets very obscure when trying to understand _normalize() and the interface with nomad.normalizing.__init__.py.

After some debugging, it seems that the first time that the section XS comes to self.normalize_section(section, logger) is already normalized, and then it passes again creating a new workflow2.results.spectra section, hence creating twice as much spectras (which surprisingly means that the testing and results.properties.spectroscopic.spectra are correct, but normalization is out of control).

Thus, @ladinesa do you mind checking again this normalization? This is why I wanted to send the ball back to you, because you know much more the details and ordering of normalization. Let me know if you need some extra help.

the workflow normalizer also calls the normalize function. I will provide a hot fix

@JosePizarro3
Copy link
Contributor Author

the workflow normalizer also calls the normalize function. I will provide a hot fix

Thanks 🙌🏻

@JosePizarro3 JosePizarro3 force-pushed the 2-general-base-classes-for-workflows branch from 76017c8 to 6053935 Compare February 5, 2024 14:35
@JosePizarro3 JosePizarro3 merged commit 2f26453 into develop Feb 7, 2024
1 of 2 checks passed
@JosePizarro3 JosePizarro3 deleted the 2-general-base-classes-for-workflows branch February 7, 2024 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

General base classes for workflows
3 participants