Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vasp options #799

Closed
wants to merge 147 commits into from
Closed

Vasp options #799

wants to merge 147 commits into from

Conversation

liamhuber
Copy link
Member

Amendments to #769.

A new field on vasp input is created, the DataContainer code_specific_options, which in this case stores the flag allowing the structure order to preserved (which is the meat of #769).

Still needs tests.

ligerzero-ai and others added 7 commits September 20, 2022 22:54
I just copied the pull sudarshan did for direct positioning.

In particular, I am not sure why there are two different places where he needed to place the same setting.

I also don't understand the logic of "allow_reordering=not self.allow_structure_reordering"

And also why in the second block of code, he adjusts the setter to be _write_in_direct_coordinates opposed to the first one where he writes write_in_direct_coordinates
I just copied the pull sudarshan did for direct positioning.

In particular, I am not sure why there are two different places where he needed to place the same setting.

I also don't understand the logic of "allow_reordering=not self.allow_structure_reordering"

And also why in the second block of code, he adjusts the setter to be _write_in_direct_coordinates opposed to the first one where he writes write_in_direct_coordinates
@coveralls
Copy link

coveralls commented Oct 6, 2022

Pull Request Test Coverage Report for Build 3226717749

  • 26 of 34 (76.47%) changed or added relevant lines in 3 files are covered.
  • 63 unchanged lines in 2 files lost coverage.
  • Overall coverage increased (+0.04%) to 68.635%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pyiron_atomistics/vasp/potential.py 5 8 62.5%
pyiron_atomistics/vasp/structure.py 13 18 72.22%
Files with Coverage Reduction New Missed Lines %
pyiron_atomistics/interactive/quasi_newton.py 14 77.9%
pyiron_atomistics/atomistics/structure/atoms.py 49 75.88%
Totals Coverage Status
Change from base Build 3190143705: 0.04%
Covered Lines: 12077
Relevant Lines: 17596

💛 - Coveralls

@ligerzero-ai
Copy link
Contributor

ligerzero-ai commented Oct 6, 2022

Upon closer inspection, it looks like this problem runs much deeper than I (we) thought, I did some initial scripting for tests and found that the structure was "sorting" itself even in the branch in which it shouldn't be. I found that this was due to the following lines:

in lines 102-104 the search for species strings to be written to the POSCAR relies on the sorting behaviour.

atom_numbers = structure.get_number_species_atoms()
if write_species:
    f.write(" ".join(atom_numbers.keys()) + endline)

So the writing behaviour gets the positions/coordinates right, but fails on the species ordering side, because the get_number_species_atoms implicitly relies on the sorting behaviour. This is really all too complicated for something that is already implemented elsewhere - e.g. the Structure.to(fmt="poscar", filename="POSCAR_test") from pymatgen.

I would prefer a complete refactoring of the write_poscar function in favour of a:

structure_pmg = pyiron_to_pymatgen(structure)
structure_pmg.to(fmt="poscar", filename="POSCAR_test")

This is a simple, quick and easy fix.

For the current default sorting behaviour, we can add that as an if-else statement.

However, it comes to my mind that I am not aware of how many other places rely on this kind of implicit ordering behaviour by pyiron. That could prove a major pain to weed out.

@ligerzero-ai
Copy link
Contributor

ligerzero-ai commented Oct 6, 2022

image

So I can see that the get_number_species_atoms is called in 5 other places, so explicitly it's not too bad, since I can see that the others are just simple uses of that function without implicit reliance on sorting behaviour. But I don't know where else in the codebase there will be an implicit reliance on this kind of sorting behaviour.

I am especially worried about the reading of output, because if there is a similar implicit reliance on this kind of sorting behaviour for parsing it could be a major pain to refactor out of the codebase, not to mention future maintenance of backwards-compatibility with already-generated data.

Are there tests written in mind expecting this kind of implicit sorting behaviour? Or is input-output matching explicitly tested for without a reliance on this behaviour?

For example, are there tests for the OUTCAR parsing which read to see if the species order list matches what pyiron is expecting? e.g. does pyiron implicitly rely on this ordering behaviour to remap to the user-fed original ionic positions to store into the database? are there tests for this kind of implicit behaviour? e.g. if it suddenly stopped existing, will a test break? Or will it break silently?

tl;dr If I change the input now, won't output parsing which relies on this behaviour be completely f'd?

@liamhuber
Copy link
Member Author

So the writing behaviour gets the positions/coordinates right, but fails on the species ordering side, because the get_number_species_atoms implicitly relies on the sorting behaviour.

@ligerzero-ai good testing and good catch!

I would prefer a complete refactoring of the write_poscar function in favour of a: ...

I am open to wrapping the pymatgen implementation for some/all of this functionality, although I don't know enough about it to comment on the exact implementation. In general, if someone already in our dependencies does something we want to do, that's the absolute perfect time to cut code out of our code base and just use their implementation!

So I can see that the get_number_species_atoms is called in 5 other places, so explicitly it's not too bad...

Indeed. Actually, the fact that get_number_species_atoms is returning a sorted list with all the types clustered together is totally fine. That's the behaviour I would want. So I don't think we will even need to "fix" those occurrences.

in lines 102-104 the search for species strings to be written to the POSCAR relies on the sorting behaviour.

Agreed, this is trouble. The most straightforward solution I see is to add a second if clause here that depends on allow_reordering. Something like:

if allow_reordering:
    atom_numbers = structure.get_number_species_atoms()
else:
    atom_numbers = get_ordered_species_count(structure)
if write_species:
    f.write(" ".join(atom_numbers.keys()) + endline)
num_str = [str(val) for val in atom_numbers.values()]

Where get_ordered_species_count counts up all the consecutive atoms of the same type in the structure.

tl;dr If I change the input now, won't output parsing which relies on this behaviour be completely f'd?

Maaaaaybe not? Elsewhere in pyiron_atomistics.vasp.structure, I see stuff that looks like it is reversing the reordering. If we're really lucky, it will just continue working in the case that allow_structure_reordering==True, and it will be a useless waste of flops but not break anything in the case that allow_structure_reordering==False.

Definitely I would be more comfortable if the VASP job simply stored (and serialized) the map to transform pyiron-indices to vasp-indices, and simply applied it in reverse on reading files. I'm cautiously optimistic that such a setup would simplify a lot of code in this module.

Be a bit patient with me here, I never looked in side vasp.structure until you brought up this issue 😝

For example, are there tests for the OUTCAR parsing which read to see if the species order list matches what pyiron is expecting? e.g. does pyiron implicitly rely on this ordering behaviour to remap to the user-fed original ionic positions to store into the database? are there tests for this kind of implicit behaviour? e.g. if it suddenly stopped existing, will a test break? Or will it break silently?

Excellent questions and I'm afraid I don't know. Reassuringly, there are a bunch of tests in tests/vasp, including test_structure (which in turn includes something called test_vasp_sorter which is great news), and test_outcar. Both of these make use of a relatively extensive set of static POSCAR and OUTCAR files, so (again, if we're lucky), there is only minimal work for us. All the tests passed on the continuous integration, but that's not shocking since none of the tests take us away from the default value of allow_structure_reordering=True. I suspect what's needed is to go through a bunch of the existing tests and add new if-branches where this is set to False and test that case too. Might also require some new reference files in tests/static/vasp_test_files to accomplish this.

Again though, I'm not a VASP user so I don't have any specific insight into this particular corner of our test suite. I'm happy to provide this sort of high level advice, but we should see at the meeting if there's someone at-institute to provide more fine-grained and real-time assistance. Maybe you already had a chance to talk with Sam about it?

tldr; Good investigating. Hopefully we will get away with a minor adjustment to the species and mimicking a bunch of existing tests 🤞

@ligerzero-ai
Copy link
Contributor

Thanks so much for the response Liam, v. grateful.

I will take a deeper dive on the weekend, but I doubt that I will have something complete written by Monday. I will probably test this behaviour explicitly on the cluster using a simple case and see what the output reader does. I will also need to take a deeper look into the output parsing to see if I can catch anything there as well.

Hopefully there is no kind of implicit reliance on this kind of behaviour, but if there is I'll be sure to let the group know, and we can decide on how to proceed with grandfathering in of old datasets after we implement this change.

@liamhuber
Copy link
Member Author

Per pyiron meeting (2022-10-10), renamed input.code_specific_options to input.options.

@ligerzero-ai
Copy link
Contributor

So, it looks like the problems are only just beginning - I can see that the POTCAR generation isn't explicitly reading the structure at runtime and relies on the default sorting behaviour:

image

So I'm off to fix this (POTCAR generation), but I'm not holding my breath for output parsing being rigorous with it's implementation either.

@ligerzero-ai
Copy link
Contributor

ligerzero-ai commented Oct 11, 2022

Okay, this just got way more complicated, actually ( :((((( <---- me ). I realise now that the potential generation is dependent on the implicit ordering, 2716561, but I also cannot just feed it the raw structure, because the sorting takes place at the write_poscar step, which means that there is no way to neatly feed the POTCAR the structure and have it generate it, outside of adding another flag which should disable the default sorting behaviour.

Imho, I feel quite strongly that it's terrible practice to include sorting inside input generation routines. The expected behaviour of any filewriter should be that it generates what is fed to it, without alterations.

So, it's my view that the writer should be "dumb" and the sorting should happen outside of it. If I feed a potcar a structure: structure it should generate the POTCAR directly from the structure. If you want a sorted POTCAR, you should sort your structure before you feed it into your POTCAR writer. Right now, the potcar generator is performing sorting inside of the function, similar to what is happening in write_poscar.

What is even more concerning is that this behaviour is scattered in the output as well: the output parsing is not explicitly reading a mapped list of the indices. It's relying on the sorting behaviour being the same as what is used to generate the structure, which you can see in lines 1978-onwards in the collect function in vasp.base. This is just really nasty overall.

Currently:
vasp job -> structure -> writers which perform sorting magic inside of them --> output parsed with implied sorting magic assuming that the same function was used.

My proposal:
vasp job -> structure -> sorted_structure (can be turned off with a flag) -> user struct---written input struct map stored to hdf5 -> "dumb" writers for input file generation --> output parsed as is --> re-mapping explicitly using the hdf5 stored user vs input map.

This leaves exactly 0 room for any kind of bs happening as a result of the sorting behaviour. There's no reliance whatsoever on the input and output functions calling the same function, and is far more maintainable overall.

Imho, this calls for some heavy re-writing of the behaviour of the vasp modules because the fact that the map is implicitly relying on the same function being called at input and output makes me feel terribly uncomfortable in general. If you change one, you will break the other, silently.

@ligerzero-ai
Copy link
Contributor

ligerzero-ai commented Oct 11, 2022

And to top it all off, you can see that the unittests pass on my latest commit, which means that the tests are failing to catch the mismatch between input and output. This should fail with the elements in the generated POTCAR having a mismatch with what is specified in the generated POSCAR.

@ligerzero-ai
Copy link
Contributor

Evidence of unsorted POSCARs breaking output parsing:

RAW DATA:
CONTCAR, magmom datablock from OUTCAR of job:

Fe14 Ni1 P1
   1.00000000000000
     5.7400000000000002    0.0000000000000000    0.0000000000000000
     0.0000000000000000    5.7400000000000002    0.0000000000000000
     0.0000000000000000    0.0000000000000000    5.7400000000000002
   Fe   P    Fe   Ni   Fe
     1     1     3     1    10
Direct
  0.0025664630704604  0.0032319176062993  0.0032274764708489
  0.2499994228784569  0.2500029153329241  0.2500007627117586
  0.0025666904584504  0.4967683803040508  0.0032278685645549
  0.2499989124577435  0.7499961246198761  0.2500012030899373
  0.4974336564145658  0.0032323325444499  0.0032278986920338
  0.7500007266307670  0.2500018099441704  0.2500006522263918
  0.4974329547739163  0.4967689527429046  0.0032274639824345
  0.7500011310338164  0.7499959463162306  0.2500012500073081
  0.0025662305143739  0.0032323470480516  0.4967722750902888
  0.2499989179606291  0.2500039379306188  0.7499988070948959
  0.0025672336441982  0.4967688955773852  0.4967728567290476
  0.2499989374875279  0.7499959701448191  0.7499987773856067
  0.4974331611726478  0.0032319315360019  0.4967728211408048
  0.7500011539600592  0.2500039814947535  0.7499987588800141
  0.4974333446765785  0.4967684450015159  0.4967723431120620
  0.7500010628658087  0.7499961118559484  0.7499987848220121
 magnetization (x)

# of ion       s       p       d       tot
------------------------------------------
    1       -0.012  -0.034   2.425   2.378
    2        0.004  -0.072   0.000  -0.068
    3       -0.012  -0.034   2.426   2.379
    4       -0.004  -0.041   2.547   2.501
    5       -0.012  -0.034   2.425   2.378
    6       -0.021  -0.058   0.807   0.728
    7       -0.012  -0.034   2.426   2.379
    8       -0.011  -0.032   2.622   2.578
    9       -0.012  -0.034   2.425   2.378
   10       -0.004  -0.041   2.547   2.501
   11       -0.012  -0.034   2.425   2.379
   12       -0.015  -0.043   2.667   2.610
   13       -0.012  -0.034   2.425   2.378
   14       -0.011  -0.032   2.622   2.578
   15       -0.012  -0.034   2.425   2.379
   16       -0.011  -0.068   2.561   2.483
--------------------------------------------------
tot         -0.172  -0.662  35.773  34.939

note P -0.068 and Ni 0.728

IN JUPYTER NOTEBOOK:

job.get_structure() prints

Fe: [0.01473148 0.01855122 0.01852574]
P: [4.30500608 4.30497767 4.304993  ]
Fe: [1.43499667 1.43501676 1.43500436]
Fe: [0.0147328  2.8514505  0.01852797]
Fe: [1.43499374 4.30497773 1.43500689]
Ni: [2.85526737 2.8514509  2.85147323]
Fe: [2.85526921 0.01855357 0.01852815]
Fe: [4.30500419 1.43501039 1.43500373]
Fe: [2.85526513 2.85145377 0.01852562]
Fe: [4.30500649 4.30497675 1.43500718]
Fe: [0.01473016 0.01855369 2.85147289]
Fe: [1.4349938  1.43502262 4.30499317]
Fe: [0.0147359  2.85145349 2.85147622]
Fe: [1.43499392 4.30497687 4.304993  ]
Fe: [2.85526634 0.01855128 2.85147599]
Fe: [4.3050066  1.43502285 4.30499288]
tags: 
    spin: [(0: 2.378) (1: 2.483) (2: -0.068) (3: 2.379) (4: 2.501) (5: 2.379) (6: 2.378) (7: 0.728) (8: 2.379) (9: 2.578) (10: 2.378) (11: 2.501) (12: 2.379) (13: 2.61) (14: 2.378) (15: 2.578)]
pbc: [ True  True  True]
cell: 
Cell([5.74, 5.74, 5.74])

note P 2.483 Ni 2.379, when in fact it should be -0.068 (2) and 0.728 (7).

@ligerzero-ai
Copy link
Contributor

ligerzero-ai commented Oct 11, 2022

@samwaseda #769 (comment)

So what you are saying is that the class Output should also include a self.options = VaspSpecificOptions() as it is in the Input() class currently? Then we pass different structures using structure[id_pyi_to_dft] according to the bool value of self.options.allow_reordering?

Ok, great, I think I get it.

Then we can safely remove all the vasp_sorter and get_chemical_species calls inside the actual writers and parsers and just have the raw output parsed as is before adding the final conversion back at the end of variable assignment.

@property
def id_pyi_to_dft(self):
    if self.input.option.allow_reorder:
        return self._reordered_indices
    return np.arange(len(self.structure))

@samwaseda
Copy link
Member

So what you are saying is that the class Output should also include a self.options = VaspSpecificOptions() as it is in the Input() class currently? Then we pass different structures according to the bool value of self.options.allow_reordering?

No I'm saying the permutation should happen only in this one variable, so that you don't need any other if ... else anywhere else. The storage should be only one place, and I'm fine with self.input.options.allow_reordering.

samwaseda and others added 20 commits November 4, 2022 22:06
Bumps [pymatgen](https://github.com/materialsproject/pymatgen) from 2022.9.21 to 2022.10.22.
- [Release notes](https://github.com/materialsproject/pymatgen/releases)
- [Changelog](https://github.com/materialsproject/pymatgen/blob/master/CHANGES.rst)
- [Commits](materialsproject/pymatgen@v2022.9.21...v2022.10.22)

---
updated-dependencies:
- dependency-name: pymatgen
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [mp-api](https://github.com/materialsproject/api) from 0.29.1 to 0.29.3.
- [Release notes](https://github.com/materialsproject/api/releases)
- [Commits](materialsproject/api@v0.29.1...v0.29.3)

---
updated-dependencies:
- dependency-name: mp-api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [scipy](https://github.com/scipy/scipy) from 1.9.1 to 1.9.3.
- [Release notes](https://github.com/scipy/scipy/releases)
- [Commits](scipy/scipy@v1.9.1...v1.9.3)

---
updated-dependencies:
- dependency-name: scipy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
For Methfessel-Paxton smearing the occupation of orbitals can be
negative.  Checking >0 in check_band_occupancy is therefore not enough.
Bumps [pint](https://github.com/hgrecco/pint) from 0.19.2 to 0.20.1.
- [Release notes](https://github.com/hgrecco/pint/releases)
- [Changelog](https://github.com/hgrecco/pint/blob/master/CHANGES)
- [Commits](hgrecco/pint@0.19.2...0.20.1)

---
updated-dependencies:
- dependency-name: pint
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [scikit-learn](https://github.com/scikit-learn/scikit-learn) from 1.1.2 to 1.1.3.
- [Release notes](https://github.com/scikit-learn/scikit-learn/releases)
- [Commits](scikit-learn/scikit-learn@1.1.2...1.1.3)

---
updated-dependencies:
- dependency-name: scikit-learn
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Previously get_structure returned added all elements in the storage to
each structure and defined the indices accordingly, even if the
structure only contains a subset of these elements.  Not all codes can
handle this, so I'm restricting this again.
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@stale
Copy link

stale bot commented Nov 23, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ligerzero-ai
Copy link
Contributor

Closed because rebase failed to satisfy unittests requirement for traitlets

Superceded by #880

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_black reformat the code using the black standard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants