Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Molecular box hashing #86

Closed
richardjgowers opened this issue Mar 8, 2022 · 4 comments
Closed

Molecular box hashing #86

richardjgowers opened this issue Mar 8, 2022 · 4 comments
Assignees

Comments

@richardjgowers
Copy link
Contributor

No description provided.

@dwhswenson
Copy link
Member

Do we intend to wrap the periodic box as its own class? Or do we just have the thing that contains it know how to serialize/deserialize that attribute? I don't immediately see a need for a separate class in this case.

How do we want to handle rounding issues in floats/float arrays? I see a few possibilities:

  1. Equality/hashing of these attributes is based on the serialized form:
    serialized = original.to_serialized()
    deserialized = from_serialized(serialized)
    reserialized = deserialized.to_serialized()
    serialized == reserialized  # this is true, so use it for the internal equality check
    # note that we do not require exact equality of deserialized == original
  2. Equality/hashing based on UUID. If we don't create a wrapper class for boxes, then we only need the UUID in the molecular system container class.
  3. Use a binary format for serialization.

This issue will be closely tied to #87; it might be best to combine them into one.

@dwhswenson dwhswenson changed the title Molecular box hasing Molecular box hashing Mar 8, 2022
@richardjgowers
Copy link
Contributor Author

Yeah the box is essentially an "item" in the state (see #85) as this can change the energy of the system. Because there's only 6 values which completely define a box (for a triclinic cell which covers most/all? cases) we can just dump 6 doubles (aka 6x8 bytes) of raw bits. This should mean we can get perfect reproduction, I think this is option 3 below.

Yeah serialisation and hashing might end up one and the same.

@dwhswenson
Copy link
Member

Still not clear whether this is actually a separate class to create (possibly more a discussion for #85) -- does it have any functionality other than holding data? If not, then it seems like just an attribute of something else (probably the solvent container, otherwise the overall molecular system container).

If we're good with serialization here being non-human-readable, then I assume we'll just do ndarray.tobytes()?

I've always stored the full 3x3 array just for clarity (I don't think the extra few bytes is a major storage issue here). Relevant information for OpenMM: http://docs.openmm.org/latest/userguide/theory/05_other_features.html#periodic-boundary-conditions. Note the hard requirement on that set of inequalities -- this can be an issue with rounding (solved by passing input vectors through openmm.app.internal.unitcell.reducePeriodicBoxVectors before going to OpenMM; this is a common problem when setting up from a low-precision format like XTC).

@richardjgowers
Copy link
Contributor Author

This is part of the gufe.ChemicalSystem.__hash__, went for box.to_bytes() to get its hash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants