Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling partial periodic boundary conditions #2364

Closed
gpetretto opened this issue Jan 20, 2022 · 8 comments
Closed

Handling partial periodic boundary conditions #2364

gpetretto opened this issue Jan 20, 2022 · 8 comments

Comments

@gpetretto
Copy link
Contributor

Is your feature request related to a problem? Please describe.
We (I and @davidwaroquiers) were thinking about what would be the best option to handle objects with partial periodic boundary conditions.
In particular this would be helpful for defining the inputs of codes that support calculations with this option.

Describe the solution you'd like
A potential solution would be to extend the Structure object to support partial PBC or introduce a new object specific to these cases.
We mainly wanted to know if this was ever considered in pymatgen and, if not, start a discussion about whether this could be a desirable feature to implement.

Describe alternatives you've considered
A trivial solution is to use a Structure with some vacuum in the non-periodic directions. This, however, forces to keep somewhere else the information about which directions are periodic and all the operations on the Structure need to pass through a series of if statements.

@shyuep
Copy link
Member

shyuep commented Jan 20, 2022

This is definitely desired. I would be happy to discuss.

@gpetretto
Copy link
Contributor Author

gpetretto commented Jan 24, 2022

Thanks for the interest. Here are some considerations that I hope will be useful as a starting point.

The distinction between the Molecule and Structure objects always worked well for me in general. Other codes (e.g. ASE, aiida) handle all cases of partial periodic boundary conditions (PPBC) in a single object, with the information about PPBC stored as a list of 3 booleans. Introducing such a classification in pymatgen when two objects already exist may lead to some confusion: is a Molecule an entity with 0 PBC? Is a Structure an entity with 3 PBC?
Another important point is how much breaking of backward compatibility would be allowed to implement such a feature. Objects that involve PBC (or the lack of it) include Structure/Molecule, Lattice, Site/PeriodicSite and their use is ubiquitous in pymatgen. Considering that the need for this feature apparently did not emerge for more than 10 years, it would seem unreasonable to cause a large disruption in the rest of the code just to introduce it.
Depending on the solution implemented it can be that the same physical object could be represented with two different python objects. Would this be acceptable?
My impression is that, when considering its representation, a physical object with PPBC is "closer" to a Structure than to a Molecule. For example, a 1D object is still associated with the concept of a lattice (even though it is basically just one number), while an object with no PBC does not need it at all.
In principle defining the periodicity with the triplet of boolean is not strictly necessary. A 1D object could be defined by a "lattice" consisting of a single float and set of 2D cartesian coordinates plus 1D fractional coordinates. Without the need to specify its direction in a 3D cartesian space. However this, aside from being messy to handle, would be limiting in the definition of the system. There could be good reasons for defining the 1D system along the x-axis rather than the z-axis. It seems that the DFT codes that support this kind of input may have different conventions. For this reason in the following I focused on the idea of the PBC defined in a 3D cartesian space.

That said, and given that this would be a new feature in an already existing and widely used package, I don't think there are too many options to add the support for PPBC. I will list those that I can think of, in order of increasing impact for the rest of the existing code. I also tried to outline a (non exhaustive) list of pros and cons.

  • Create new subclass(es) of SiteCollection to handle the specific cases of PPBC. Two distinct possibilities are the introduction of two new classes (a 1DStructure plus a 2DStructure) or a single class for generic PPBC.

    • pros:
      • little impact on existing code. Support for these classes could be introduced at different times and only if needed.
      • should be relatively easy to implement. Some code duplication could be avoided with an additional abstraction layer (e.g. PeriodicSiteCollection) that handles common functionalities between the new PPBC object(s) and Structure
    • cons:
      • more objects to maintain separately
      • probably more new objects needed that play the role of Lattice and Site?
      • in the option with single new object, this may end up overlapping with Structure and Molecule if 3D or 0D PBC are allowed inside it.
  • Implement a new (non abstract) class from which Structure inherits. Such a new class can have a pbc=(bool, bool, bool) attribute and will be the reference for handling PPBC. Structure would subclass it and have pbc=(True, True, True) fixed. (False, False, False) is forbidden and remains a distinct object from Molecule.

    • pros:
      • not much disruption for the existing code
      • most of the code could probably be in the parent class
      • switching to accepting an instance of the base class (e.g. in the input generator of some code) should be relatively painless if only methods of the base class are used.
    • cons:
      • ambiguity when handling the case pbc=(True, True, True). Both the new object and Structure could be used.
      • potential issues in the inheritance tree due to the presence of IStructure. Should an immutable PPBC object exist as well? How would the final inheritance tree be?
      • a bit disappointing not accepting the pbc=(False, False, False) option, but adding the Molecule in the inheritance tree would introduce additional complications with little benefit.
      • new objects for Lattice and Site may be needed as well with associated coding and maintenance efforts required.
  • Add the pbc=(bool, bool, bool) directly in Structure. (False, False, False) is forbidden and remains a distinct object from Molecule:

    • pros:
      • no new objects. Might be easier for both users and developers to have a single entry point and a single object to maintain.
      • if everything is made consistently, Lattice and Site could be updated to handle pbc as well and remove the need for new specific objects.
    • cons:
      • likely disruptive for a large part of the existing code. When Structure is used it currently means a 3D PBC. Passing a Structure with PPBC will still respect the API but PPBC will not be really supported by most of the code. Checks should be added almost everywhere. Alternatively the user should be responsible for using a proper 3D periodic Structure, with the risk of failures or, worse, unexpected behavior.
      • still does not allow for pbc=(False, False, False)
  • Create a new object that handles all kind of PBC, from 0 to 3, replacing Molecule and Structure in the long run or coexisting with them. I honestly listed this just for completeness, because I believe that this would wreak havoc in all depending codes and drop the nice distinction between Structure and Molecule with very little to gain.

These were my personal thoughts about it. I would be glad to hear comments and if other options should be considered.

@shyuep
Copy link
Member

shyuep commented Jan 24, 2022

Thanks for the thoughtful suggestions. Oddly, my inclination is to do the last one - a new SiteCollection subclass for all kinds of PBC. The general idea is that this new class would mainly be used for 1D and 2D for now, and the design should be such that we can make Molecule and Structure to subclass it with little difficulty. There are some codes where we'd want to support any PBC, and there are some codes where only certain types of PBC are supported (molecules and 3D crystals). The analogy I would make is that you should write a Polygon class from which specific implementations like Square, Triangle, Hexagon, Pentagon can be derived. Maybe 90% of the time people only work with Squares and Triangles and so, we have special subclasses for them. But Hexagons, Pentagons or even Heptadecagons can be defined as just polygons in general.

Your second suggestion is fine too, though preferably, I don't want to treat Molecules separate from other kinds of PBC.

@shyuep
Copy link
Member

shyuep commented Jan 24, 2022

@mkhorton You should weigh in since this is a major change.

@shyuep
Copy link
Member

shyuep commented Jan 24, 2022

I should add that you should not worry about IStructure for now. The intent is really to get rid of it soon. There is very little benefit to keeping IStructure and IMolecule.

@mkhorton
Copy link
Member

Thanks @shyuep.

For the motivation of this feature, I agree and I also have been in touch with several people who would like PPBCs. They're useful for a number of reasons (2D materials, mixed basis set codes, classical modelling codes, etc.).

For the implementation, I think we should first acknowledge that to define PBCs, a Lattice should be required. E.g., the Lattice defines the periodicity. At present, all lattices in pymatgen are 3D lattices. While defining a 2D lattice is technically possible, I think in practice it would see limited use -- even 2D materials are often more practically described with a 3D lattice, so that fractional co-ordinates can be used to describe atomic positions.

So one option to consider might be a property on Lattice. By default, Lattice.periodic = (True, True, True). This can then optionally be changed as necessary, and Structure will continue to work as-is except for any features that need to take the Lattice periodicity into account.

In this way, the periodicity of a SiteCollection is defined by the Lattice associated with the sites within it, in much the same way as it currently works when the PeriodicSite contains lattice information.

@gpetretto
Copy link
Contributor Author

Thanks for your comments. So let me try to summarize your main suggestions.

One option would be to have a single a tree like this:
pbc1

I think this is not so dissimilar from what I had in mind in my second option in the previous post:
pbc2

The first being @shyuep preferred option.

The suggestion from @mkhorton seems to go in the opposite direction. No new classes and instead add the periodic attribute to Lattice, so that Structure can handle PPBC. I have just one doubt: do I understand correctly that you would require that a reasonable 3D lattice should always be defined, so that the main reference will still be the fractional coordinates? My initial idea was to consider a Lattice with a 3x3 Matrix, but to basically ignore the axes of the directions with no periodicity. However the latter would indeed leave ambiguity in the definition of fractional coordinates for PPBC and how to store them in the serialized version (currently they are stored as fractional coordinates).

Does this summarizes correctly your positions?

While we personally have a slight preference for the creation of a new class (let's call it System at this stage, just for reference. Suggestions on the name are welcome, if this would be the chosen option), I am still afraid that there will be some issues including Molecule as a subclass. Some examples are

  • as also pointed out by @mkhorton, the defining property of a system with PBC is the presence of a Lattice. So, while System needs to have a lattice attribute, this is basically meaningless for a Molecule. This might force to handle this special case of the lattice attribute in the base class.
  • System will still be a SiteCollection. While Structure is a collection of PeriodicSites, Molecule is a collection of Sites. What should System contain? PeriodicSite seems the obvious choice, but Molecule will be a subclass of System while having Sites internally.

Handling all the different cases in the base class can be more challenging to implement and maintain. If to avoid these kind of issues all the methods and properties of System need to be overridden in Molecule, what would be the benefit of having it as a subclass of System?

If you are interested, I would be available to try to work on this. In that case I can open a WIP PR to discuss other points in more details. However, since this is a major change, I think that a clear decision about which solution to implement should preferably come from your side.

@shyuep
Copy link
Member

shyuep commented Feb 1, 2022

Thanks. I have thought about it further. I think we can go with your original scheme (option B), i.e., keep Molecule and Structure as separate branches. We then adopt @mkhorton suggestion that we define PBC in terms of the lattice basis vectors. This will be integrated into PeriodicSites and Structure, which can be something with arbitrary PBC (with 3D being default). To be honest, I think it is difficult to do pure strategization without seeing a real concrete example of what it would look like. I think a WIP PR with a barebones implementation (maybe it does not need to be in Structure in the first instance if it makes it easier to write) would be useful to stimulate ideas and discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants