Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardise and update VASP input sets #3484

Merged
merged 15 commits into from
Jan 20, 2024
Merged

Conversation

utf
Copy link
Member

@utf utf commented Nov 26, 2023

Summary

This PR refactors the VASP input sets to make them compatible with atomate2 and fixes some of the issues that have been raised previously.

This PR does not introduce any breaking changes. The API for the input sets has not changed and pymatgen users can continue to use their existing code.

The input sets are compatible with pymatgen, atomate1, and atomate2. The most drastic change is that now all input sets are dataclasses. This made addressing the following issues easier and also should make it easier to create new input sets in the future. Note, I did not copy the atomate2 input sets to pymatgen, instead I refactored the pymatgen input sets specifically to avoid any breaking changes.

Issues addressed

Closes #3492.

In addition to updating the get_vasp_input function to allow starting from a previous directory, this PR addresses the following points:

  • Clarity: Previously, input set logic was scattered over the __init__, .incar, override_from_prev_calc functions. Some input sets inherited from DictSet, some from MPRelaxSet, some from MPStaticSet. This made it difficult to pin down why a particular setting was getting set. Now, all logic specific to an input set is provided in the incar_updates and kpoints_updates property functions that define any changes to the base input set config. This function is also used when starting from a previous calculation, so all the logic is in one place.
  • Consistency: Previously, respecting user_incar_settings was left down to specific input set implementations and there were a number of examples where this was not respected (e.g., see EDIFF setting in MPStaticSet or MPNonSCFSet where it is not possible to specify MAGMOM through user_incar_settings). Now, user_incar_settings is always applied in a standard way and will always take precedence.
  • Shared functionality: I noticed that there were a number of features that are shared across multiple input sets but had been re-implemented multiple times. For example, line mode k-point generation for PBE and HSE. I also found that some features, such as automated KSPACING in the Scan input sets might be desired features in other input sets (e.g., static calculations). I've therefore tried to put logic in the base input set where possible to share these common/broadly useful functionalities. Now, If one wants to make a MPScanBS input set, one doesn't need to duplicate the line mode generation logic a third time.
  • A minor point, but I found that some similar input sets, like MPNonSCFSet and MPHSEBSSet did not have feature parity. E.g., MPNonSCFSet had optics modes which also make sense for MPHSEBSSet. I tried to add feature parity between similar input sets where possible.
  • I also found that the override_from_prev and from_prev_calc were defined repeatedly in most input sets. I therefore moved these functions to the base DictSet. I also added an option inherit_incar to dict set which can be True, False, or a list of keys to inherit (as copied from MatPESSet). If a class has inherit_incar enabled and but uses the from_prev function to initialise their input set, the only thing that will get used is the Structure from the previous directory and the bandgap information.

@janosh
Copy link
Member

janosh commented Nov 26, 2023

Thanks, this is great stuff (as usual)!🥇

Previously, respecting user_incar_settings was left down to specific input set implementations and there were a number of examples where this was not respected

Do you think this needs better test coverage? In general, just having looked at the whole code and knowing atomate2, any parts that stand out as having particularly poor test coverage? Happy to help write more tests.

@JaGeo
Copy link
Member

JaGeo commented Nov 26, 2023

Just wanted to say thank you for this! Both from pymatgen and atomate2 perspective!

@utf
Copy link
Member Author

utf commented Nov 26, 2023

Just to note that the remaining failing tests are also failing on the master branch currently, and not to do with this PR (seems like it is missing PBE64 potcars).

@matthewkuner
Copy link
Contributor

matthewkuner commented Nov 27, 2023

Thank you for this!
I do have a question tho---do you think the existing tests are sufficient to ensure that these changes truly do not break anyone's code/expectations? Or should more testing be added?

@utf
Copy link
Member Author

utf commented Nov 28, 2023

do you think the existing tests are sufficient to ensure that these changes truly do not break anyone's code/expectations?

That is a good question. It is almost impossible to catch all edge cases. That said, we could probably develop a quick testing suite that generates input files using the old input sets for a range of structures and options and then compares them to the new input sets.

I would very much appreciate help with this if possible.

@janosh
Copy link
Member

janosh commented Nov 28, 2023

Just to note that the remaining failing tests are also failing on the master branch currently, and not to do with this PR (seems like it is missing PBE64 potcars).

Just pushed a fix (816cd3a) for the failing test on master. (CC @esoteric-ephemera)

I would very much appreciate help with this if possible.

Is this otherwise ready to go from your side? I.e. is now the time to add tests?

@utf
Copy link
Member Author

utf commented Nov 28, 2023

Is this otherwise ready to go from your side? I.e. is now the time to add tests?

Yes, from my point of view this should be ready for more testing. I just added some commits that aim to minimise the diff.

@davidwaroquiers
Copy link
Contributor

davidwaroquiers commented Dec 1, 2023

Hello everyone,

Thanks for this @utf, I think it's very good work and a very good way forward to merge the two input sets implementations lying respectively in pymatgen and atomate2 and I definitely agree with the need to not have two different implementations of the input generators/sets.

I have a couple of questions and/or comments though, no idea if any of this can be addressed in any way.

  1. Are all the input generators in atomate2 gonna be in pymatgen ultimately ? In this PR, the ones already existing in pymatgen have been modified so that, as you mention a/ they are backward-compatible in pymatgen, b/ they can be used in atomate1 and c/ they can be used in atomate2. I did not test that but I trust you on this ;)
  2. The current input generators (and input sets) in atomate2 follow the abstract interface defined in pymatgen/io/core (not only vasp, also CP2K, abinit/ongoing PR, aims/ongoing PR, Qchem/ongoing PR). Here the vasp input sets will work due to some part being duck typed wrt the pymatgen.io.core generator/set interface (e.g. the VaspInput returned by get_input_set, is not an InputSet as defined in the InputGenerator interface, hence it works "only" because VaspInput also has a write_input method). I would think that it is good to have an overall standard as much as possible there, which is even more important with all the ongoing work being done in atomate2 for having common workflows (elastic workflow, phonons, ... which can be done using "any" code following the "contract). Would it be possible to have VaspInput be a subclass of InputSet ?
  3. The naming (and probably this is to keep full backward compatibility in pymatgen, no idea if something can be done there) can be confusing for the user in atomate2: a job maker in atomate2 requires an input_set_generator as an argument and will be passed an object called a "VaspInputSet". (Side note on this: pymatgen vasp input sets have always been of great use to many people, but I always found the name weird, a set of inputs, to me is a set of inputs, and it could be generated manually or automatically, through a generator or a factory or else...). Note that my main point here is on the VaspInputSet name, not the VaspInput. If something were to be made possible here, I would prefer having VaspInputSet being a VaspInputGenerator, and the VaspInput stay as VaspInput (this last one is, I think, not confusing for the user). Anyway again, no idea if this is open to discussion/possible (or e.g. if an alias could be done, so that "former" pymatgen users don't need to change their code, and atomate2 users can use the "new" name, is it good/bad ? I don't have a strong opinion on that, and it's not up to me anyway :) )
  4. Corollary of points 1., 2. and 3., if some specific vasp input generators stay in atomate2 for some reason, and others (more general) are in pymatgen, and both are kind of different (the general ones being "VaspInputSet", the specific ones being "VaspInputGenerator", I would guess it would add more confusion to the atomate2.

I understand there has already been many discussions on this topic but maybe some of these questions could be addressed somehow ? (VaspInput becoming a subclass of InputSet and VaspInputSet having "VaspInputGenerator" alias ? other ideas ?)

@janosh
Copy link
Member

janosh commented Dec 2, 2023

I always assumed the Set in VaspInputSet came from generating a set of input files (INCAR, POSCAR, ...) but maybe that's wrong.

I'm open to adding an alias VaspInputSet -> VaspInputGenerator and deprecating VaspInputSet. But I'll let others weigh in as well.

@utf
Copy link
Member Author

utf commented Dec 3, 2023

Thanks for your thoughts @davidwaroquiers.

Are all the input generators in atomate2 gonna be in pymatgen ultimately?

That is the goal. If there are some very specialised input sets needed for an atomate2 workflow then potentially they don't have to go in pymatgen. But by and large, the input sets will live in pymatgen and the workflow code will live in atomate2.

Would it be possible to have VaspInput be a subclass of InputSet?

I'm in favour of this, since we have the abstract interface defined and there are a growing number of codes that support it. The downsides to doing this are small as far as I can see and won't require breaking any existing pymatgen code.

I would prefer having VaspInputSet being a VaspInputGenerator, and the VaspInput stay as VaspInput

I understand the logic here but I'm personally against renaming VaspInputSet. I think this will add additional confusion to existing pymatgen users. If necessary we could rename any remaining input sets in atomate2 to match the pymatgen naming, and also optionlly update the input_set_generator field on the atomate2 makers (although this last point will be more disruptive).

I'm open to adding an alias VaspInputSet -> VaspInputGenerator and deprecating VaspInputSet.

On the point of VaspInputSet, I think there is a strong case for merging DictSet and VaspInputSet. Currently, the VaspInputSet object is not useable by itself and is only functional when used as a superclass for DictSet. This is because the potcar property uses the potcar_symbols property, which in turn tries to access self._config_dict which is only present in DictSet subclasses. E.g., see here:

settings = self._config_dict["POTCAR"]

My suggestion would be:

  • Merge VaspInputSet properties and functions into DictSet.
  • Deprecate VaspInputSet.
  • Leave VaspInput as it is (but make it a subclass of InputSet.

I think this leaves the overall picture cleaner since now there is no confusion between VaspInput and VaspInputSet.

On a separate note, I've found the pymatgen/io/vasp/sets.py file to be getting quite enormous (currently 3337 lines). Can we turn sets into a package and have a number of modules (e.g., base, mp, mit, atomate2, mvl). We can import all the input sets in sets/__init__.py so that nobody's code is broken. I'm happy to do that in this PR if others agree.

@janosh
Copy link
Member

janosh commented Dec 3, 2023

Go right ahead with the file split! I've been considering merging DictSet and VispInputSet as well so no convincing needed here but was planning to deprecate DictSet. Why keep that?

@utf
Copy link
Member Author

utf commented Dec 3, 2023

My thought process is that most (if not all) people who have made a custom input set will have subclassed DictSet rather than VaspInputSet (for the reasons listed above). So keeping DictSet will mean their code won't break. E.g., a quick GitHub search brings up:

I'll just note that the features in this PR will not break these other implementations!

@JaGeo
Copy link
Member

JaGeo commented Dec 3, 2023

I think this is a really important point!

@janosh
Copy link
Member

janosh commented Dec 3, 2023

We can have a long deprecation period or even keep DictSet around as a permanent alias (though I'm against that) but DictSet is a misnomer imo. VaspInputSet is clearly a more self-explanatory name.

@davidwaroquiers
Copy link
Contributor

Thanks for the feedback @utf and indeed I agree with trying to be backward compatible with existing code using pymatgen.

I understand the push towards keeping VaspInputSet (or DictSet for the matter). Still, it confuses me: the VaspInputSet (or the DictSet, if VIS properties and methods are merged into it) has a get_input_set method, which should return a (subclass of) InputSet (i.e. the VaspInput object, currently not a subclass of InputSet). If a change from VaspInputSet (DictSet) to VaspInputGenerator (?) is not on the table, I think it could even be considered to change the pymatgen/io/core interface:

e.g. InputSet becoming an object called e.g. "Input", and InputGenerator becoming an InputSet. That way the InputSet is a "factory"/"generator" object that returns an Input object. This means modifying all atomate2 generators and sets but I think it decreases confusion (my preference still being on generators being VaspInputGenerators and InputSets being InputSets, although again, I acknowledge the problems for existing users/codes of pymatgen).

On a general level, while I agree that backward incompatible changes should be avoided as much as possible, in some cases, it can be beneficial (either from a developer or from a user's perspective). In this case, I would think that having a long deprecation period for VaspInputSet to change it to VaspInputGenerator would have been ok and would be beneficial for both the user (a vasp input generator generates a set of inputs, i.e. a vasp input set/ or just vasp input) and the developer (clear api defined in pymatgen/io/core).

In any case, I will of course adhere to the decisions made.

@utf
Copy link
Member Author

utf commented Dec 4, 2023

Hi @davidwaroquiers, agreed that the naming is confusing. I'll defer to others as to the exact direction taken if deprecations/renaming is to happen.

@shyuep
Copy link
Member

shyuep commented Dec 4, 2023

My decision is that I have no problems with having an alias called VaspInputGenerator. But VASPInputSet will not be deprecated. Let's be pragmatic - there are many things we probably want to rename. But deprecation of classes/methods creates headaches and unless there is an actual practical benefit, i.e., a large change in functionality or the API, these should not be done.

@computron
Copy link
Member

We discussed at the atomate2 meeting, I think the plan is for @Zhuoying to add more tests and then propose merging this

@shyuep
Copy link
Member

shyuep commented Jan 17, 2024

@Zhuoying When are the tests going to be done? Surely it does not take several months to write a few tests?

@janosh
Copy link
Member

janosh commented Jan 18, 2024

In the interest of getting this unstuck I suggest we merge this PR as is (thanks again @utf!) and add tests later if we discover any problems.

@JaGeo
Copy link
Member

JaGeo commented Jan 19, 2024

I - cautious by nature - would like to point out that this is a core functionality of pymatgen. Thus, in my opinion, we should merge it only if we are sure that such bugs are very unlikely. (Decision is up to you all as maintainers, of course!)

@janosh
Copy link
Member

janosh commented Jan 19, 2024

Thus, in my opinion, we should merge it only if we are sure that such bugs are very unlikely

Existing tests are passing which is meaningful since test coverage on VASP IO is decent. IMO we shouldn't hold this PR to higher standards than PMG in general and frankly, bugs are not that unlikely in PMG.

@janosh janosh merged commit e502e33 into materialsproject:master Jan 20, 2024
22 checks passed
@janosh
Copy link
Member

janosh commented Jan 20, 2024

and frankly, bugs are not that unlikely in PMG

case in point 386b969 and 882452e

@JaGeo
Copy link
Member

JaGeo commented Jan 20, 2024

(Thanks. I know that there are bugs in pymatgen and that they are not unlikely. Also in my code. 😉 I don't see why this is supporting fewer tests though 😅)

As I said, maintainers have to decide. And I even gave a 👍 for the point that existing tests are passing.

@janosh
Copy link
Member

janosh commented Jan 20, 2024

I don't see why this is supporting fewer tests though

more test coverage is always good but it seems like no one has time to actually write the tests right now

@esoteric-ephemera
Copy link
Contributor

I can support the effort of writing more tests - where do we feel like current test coverage is lacking?

Also, does this PR cover the recent atomate2 PR that corrected how dict-like user_incar_settings objects are handled?

@utf
Copy link
Member Author

utf commented Jan 23, 2024

@esoteric-ephemera yes it does!

@Zhuoying
Copy link
Contributor

@utf Thanks for the nice PR and @janosh @shyuep Thanks for approving and merging it.
I have submitted another PR #3576 to add tests. We can discuss any controversial default settings under that PR.

@esoteric-ephemera Thanks for offering the efforts. Any tests on Magmomldau are welcome (currently coverage not very high). Also we are roaming between atomate2 and pymatgen to accommodate the recent changes.

@janosh
Copy link
Member

janosh commented Jan 26, 2024

@esoteric-ephemera @Zhuoying have a look at the output of

pip install pytest pytest-coverage
# cd into pmg repo
pytest tests/io/vasp --cov pymatgen/io/vasp

for where new tests are most needed:

---------- coverage: platform darwin, python 3.11.7-final-0 ----------
Name                           Stmts   Miss  Cover
--------------------------------------------------
pymatgen/io/vasp/__init__.py       3      0   100%
pymatgen/io/vasp/help.py          35     35     0%
pymatgen/io/vasp/inputs.py      1074    112    90%
pymatgen/io/vasp/optics.py       186     18    90%
pymatgen/io/vasp/outputs.py     2351    210    91%
pymatgen/io/vasp/sets.py        1204     79    93%
--------------------------------------------------
TOTAL                           4853    454    91%

use --cov-report html for an interactive line-by-line browser report

@janosh janosh added io Input/output functionality vasp Vienna Ab initio Simulation Package api Application programming interface labels Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Application programming interface io Input/output functionality vasp Vienna Ab initio Simulation Package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

reciprocal_density not working in MPHSEBSSet
9 participants