Vasp options #799
I just copied the pull request Sudarshan made for direct positioning. In particular, I am not sure why the same setting has to be placed in two different places. I also don't understand the logic of "allow_reordering=not self.allow_structure_reordering", nor why, in the second block of code, he sets _write_in_direct_coordinates, whereas in the first one he sets write_in_direct_coordinates.
Tests in next commit
…? Is this appropriate?
Pull Request Test Coverage Report for Build 3226717749
Upon closer inspection, it looks like this problem runs much deeper than I (we) thought. I did some initial scripting for tests and found that the structure was "sorting" itself even in the branch in which it shouldn't. This was due to the following lines: in lines 102-104, the search for species strings to be written to the POSCAR relies on the sorting behaviour.
So the writing behaviour gets the positions/coordinates right, but fails on the species ordering side, because get_number_species_atoms implicitly relies on the sorting behaviour. This is really all too complicated for something that is already implemented elsewhere, e.g. Structure.to(fmt="poscar", filename="POSCAR_test") from pymatgen. I would prefer a complete refactoring of the write_poscar function in favour of a:
This is a simple, quick and easy fix. For the current default sorting behaviour, we can add that as an if-else statement. However, it occurs to me that I am not aware of how many other places in pyiron rely on this kind of implicit ordering behaviour. That could prove a major pain to weed out.
So I can see that get_number_species_atoms is called in 5 other places. Explicitly it's not too bad, since the others are just simple uses of that function without implicit reliance on the sorting behaviour. But I don't know where else in the codebase there is an implicit reliance on it. I am especially worried about the reading of output: if parsing relies on the same implicit sorting, it could be a major pain to refactor out of the codebase, not to mention future maintenance of backwards compatibility with already-generated data.

Are there tests written expecting this kind of implicit sorting behaviour? Or is input-output matching explicitly tested without relying on it? For example, are there tests for the OUTCAR parsing that check whether the species order list matches what pyiron is expecting? Does pyiron implicitly rely on this ordering to remap to the user-fed original ionic positions when storing into the database? If that behaviour suddenly stopped existing, would a test break, or would it break silently?

tl;dr: If I change the input now, won't output parsing which relies on this behaviour be completely f'd?
@ligerzero-ai good testing and good catch!
I am open to wrapping the pymatgen implementation for some/all of this functionality, although I don't know enough about it to comment on the exact implementation. In general, if someone already in our dependencies does something we want to do, that's the absolute perfect time to cut code out of our code base and just use their implementation!
Indeed. Actually, the fact that
Agreed, this is trouble. The most straightforward solution I see is to add a second

    if allow_reordering:
        atom_numbers = structure.get_number_species_atoms()
    else:
        atom_numbers = get_ordered_species_count(structure)
    if write_species:
        f.write(" ".join(atom_numbers.keys()) + endline)
    num_str = [str(val) for val in atom_numbers.values()]

Where
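The body of get_ordered_species_count is not shown in this thread. As a rough sketch of what such a helper might do, the version below is hypothetical: it takes a plain list of chemical symbols (an assumption, the call above passes a structure) and counts atoms per species in order of first appearance. Because a dict can hold each species only once, it assumes atoms of one species sit in a single contiguous block and raises otherwise.

```python
from collections import OrderedDict


def get_ordered_species_count(symbols):
    """Count atoms per species, keeping the order in which species first appear.

    Hypothetical helper for illustration: `symbols` is a plain list of
    chemical symbols in the stored atom order, not a pyiron structure.
    """
    counts = OrderedDict()
    previous = None
    for symbol in symbols:
        # A species that reappears after a different one cannot be written
        # as a single "species / count" column pair.
        if symbol != previous and symbol in counts:
            raise ValueError(
                "species '{}' occurs in more than one block".format(symbol)
            )
        counts[symbol] = counts.get(symbol, 0) + 1
        previous = symbol
    return counts


atom_numbers = get_ordered_species_count(["Ni", "Ni", "P", "Fe"])
species_line = " ".join(atom_numbers.keys())
count_line = " ".join(str(val) for val in atom_numbers.values())
```

With this input the species come out as Ni, P, Fe rather than in sorted order, which is the point of the non-reordering branch.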
Maaaaaybe not? Elsewhere in
Definitely I would be more comfortable if the VASP job simply stored (and serialized) the map to transform pyiron-indices to vasp-indices, and simply applied it in reverse on reading files. I'm cautiously optimistic that such a setup would simplify a lot of code in this module. Be a bit patient with me here, I never looked inside
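To make the idea of a stored index map concrete, here is a minimal sketch. All function names are hypothetical, and grouping atoms alphabetically by symbol is only an assumption for illustration; it is not necessarily the order pyiron uses. The point is that the permutation is computed once, stored, and its inverse is applied when reading per-atom output.

```python
def species_sort_permutation(symbols):
    """Return indices that group atoms by species (hypothetical helper).

    sorted() is stable, so atoms of one species keep their relative order.
    """
    return sorted(range(len(symbols)), key=lambda i: symbols[i])


def to_vasp_order(values, permutation):
    """Reorder per-atom values from the user's order into the written order."""
    return [values[i] for i in permutation]


def to_pyiron_order(vasp_values, permutation):
    """Undo the permutation on per-atom values parsed from output files."""
    restored = [None] * len(permutation)
    for vasp_index, pyiron_index in enumerate(permutation):
        restored[pyiron_index] = vasp_values[vasp_index]
    return restored


symbols = ["Ni", "P", "Ni", "Fe"]
permutation = species_sort_permutation(symbols)  # this is what would be stored
written_symbols = to_vasp_order(symbols, permutation)

per_atom = [0.1, 0.2, 0.3, 0.4]  # made-up per-atom values in user order
parsed = to_vasp_order(per_atom, permutation)  # what the output would contain
recovered = to_pyiron_order(parsed, permutation)
```

Since the reader only needs the stored permutation, it no longer has to call the same sorting function as the writer.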
Excellent questions and I'm afraid I don't know. Reassuringly, there are a bunch of tests in
Again though, I'm not a VASP user so I don't have any specific insight into this particular corner of our test suite. I'm happy to provide this sort of high-level advice, but we should see at the meeting if there's someone at-institute to provide more fine-grained and real-time assistance. Maybe you already had a chance to talk with Sam about it?
tl;dr: Good investigating. Hopefully we will get away with a minor adjustment to the species and mimicking a bunch of existing tests 🤞
Thanks so much for the response Liam, v. grateful. I will take a deeper dive on the weekend, but I doubt that I will have something complete written by Monday. I will probably test this behaviour explicitly on the cluster using a simple case and see what the output reader does. I will also need to take a deeper look into the output parsing to see if I can catch anything there as well. Hopefully there is no implicit reliance on this kind of behaviour, but if there is I'll be sure to let the group know, and we can decide how to proceed with grandfathering in old datasets after we implement this change.
Per pyiron meeting (2022-10-10), renamed
So, it looks like the problems are only just beginning. I can see that the POTCAR generation isn't explicitly reading the structure at runtime and relies on the default sorting behaviour: So I'm off to fix this (POTCAR generation), but I'm not holding my breath for output parsing being rigorous with its implementation either.
Okay, this just got way more complicated, actually ( :((((( <---- me ). I realise now that the potential generation is dependent on the implicit ordering, 2716561, but I also cannot just feed it the raw structure, because the sorting takes place at the write_poscar step. That means there is no neat way to feed the POTCAR generation the structure, short of adding another flag that disables the default sorting behaviour.

Imho, it's terrible practice to include sorting inside input generation routines. The expected behaviour of any file writer is that it writes what is fed to it, without alterations. So the writer should be "dumb" and the sorting should happen outside of it. If I feed a potcar a structure:

What is even more concerning is that this behaviour is scattered through the output as well: the output parsing is not explicitly reading a mapped list of the indices. It relies on the sorting behaviour being the same as what was used to generate the structure, which you can see in lines 1978-onwards in the

Currently:

My proposal:

This leaves exactly 0 room for any kind of bs happening as a result of the sorting behaviour. There's no reliance whatsoever on the input and output functions calling the same function, and it is far more maintainable overall. Imho, this calls for some heavy rewriting of the vasp modules, because the fact that the map implicitly relies on the same function being called at input and output makes me terribly uncomfortable. If you change one, you will break the other, silently.
And to top it all off, you can see that the unit tests pass on my latest commit, which means that the tests are failing to catch the mismatch between input and output. This should fail, because the elements in the generated POTCAR do not match what is specified in the generated POSCAR.
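A test that would catch this kind of mismatch could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names are hypothetical, the POSCAR is assumed to be in VASP 5 style (species symbols on line 6), and the POTCAR element order is passed in as a plain list instead of being parsed from a file.

```python
def poscar_species(poscar_text):
    """Return the species line of a VASP 5 style POSCAR as a list of symbols."""
    lines = poscar_text.splitlines()
    # Line 1: comment, line 2: scaling factor, lines 3-5: lattice vectors,
    # line 6: species symbols, line 7: atom counts per species.
    return lines[5].split()


def assert_species_order_matches(poscar_text, potcar_species):
    """Raise AssertionError if the POSCAR and POTCAR species orders differ."""
    found = poscar_species(poscar_text)
    if found != list(potcar_species):
        raise AssertionError(
            "POSCAR species {} do not match POTCAR species {}".format(
                found, list(potcar_species)
            )
        )


# Made-up two-atom cell used only to exercise the check.
EXAMPLE_POSCAR = "\n".join(
    [
        "example cell",
        "1.0",
        "3.0 0.0 0.0",
        "0.0 3.0 0.0",
        "0.0 0.0 3.0",
        "Ni P",
        "1 1",
        "Direct",
        "0.0 0.0 0.0",
        "0.5 0.5 0.5",
    ]
)

assert_species_order_matches(EXAMPLE_POSCAR, ["Ni", "P"])  # passes silently
```

Calling the same check with ["P", "Ni"] raises, which is the failure the existing tests currently do not produce.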
Evidence of unsorted POSCARs breaking output parsing:
RAW DATA:
Note P -0.068 and Ni 0.728.
IN JUPYTER NOTEBOOK:
Note P 2.483 and Ni 2.379, when in fact they should be -0.068 (2) and 0.728 (7).
So what you are saying is that the class Output should also include a
Ok, great, I think I get it. Then we can safely remove all the vasp_sorter and get_chemical_species calls inside the actual writers and parsers and just have the raw output parsed as is, before adding the final conversion back at the end of variable assignment.
No, I'm saying the permutation should happen only in this one variable, so that you don't need any other
For Methfessel-Paxton smearing the occupation of orbitals can be negative. Checking >0 in check_band_occupancy is therefore not enough.
Previously get_structure added all elements in the storage to each structure and defined the indices accordingly, even if the structure only contained a subset of these elements. Not all codes can handle this, so I'm restricting this again.
Closed because the rebase failed to satisfy the unit-test requirement for traitlets. Superseded by #880.
Amendments to #769.
A new field on vasp input is created, the DataContainer code_specific_options, which in this case stores the flag allowing the structure order to be preserved (which is the meat of #769). Still needs tests.