Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API changes for ForceField serialization #291

Merged
merged 15 commits into from
May 6, 2019
Merged

Conversation

j-wags
Copy link
Member

@j-wags j-wags commented Apr 25, 2019

This PR:

Status:

  • Ready for review

@@ -652,6 +646,40 @@ def __init__(self, permit_cosmetic_attributes=False, **kwargs):
"'permit_cosmetic_attributes=True'".format(key, self.__class__))


def _add_parameters(self, section_dict, permit_cosmetic_attributes=False):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've separated the parameter addition code out from the ParameterHandler initialization code. This way, we can go through multiple rounds of parameter addition from different SMIRNOFF data sources with a single ParameterHandler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs):
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler.
super().__init__(**kwargs)

def check_handler_compatibility(self,
Copy link
Member Author

@j-wags j-wags Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an explicit check_handler_compatibility function for each ParameterHandler currently in use. These functions are nearly identical, and could be refactored/consolidated once we get a sense for how well they work.

check_handler_compatibility now takes another ParameterHandler as a parameter, instead of a SMIRNOFF dict. So now, when we're about to load a SMIRNOFF data section that we already have a ParameterHandler for, we initialize a temporary new handler from the new section header information, and ensure that our physics are compatible with that. This new approach will let us use ParameterHandler.__init__'s behavior to process the new SMIRNOFF data section for sanity and set any default values, instead of trying to duplicate that behavior in this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely like the idea of this! And for sure when you get more of a handle on what comparisons can be made there definitely looks to be scope for condensing things, but for now is probably fine (maybe throw in some todo's in case of bus?)!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it. In general, I think passing around objects (at least in the public API) rather than dicts will generally improve encapsulation and thus extensibility.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a good idea to open an issue to remind us to look into code simplification.

@j-wags j-wags changed the title [WIP] API changes for ForceField serialization API changes for ForceField serialization Apr 30, 2019
@j-wags j-wags requested a review from jchodera April 30, 2019 17:09
@codecov-io
Copy link

codecov-io commented Apr 30, 2019

Codecov Report

Merging #291 into 0.3.0-branch will increase coverage by 0.42%.
The diff coverage is 80.87%.

Impacted file tree graph

@@               Coverage Diff                @@
##           0.3.0-branch     #291      +/-   ##
================================================
+ Coverage         75.21%   75.64%   +0.42%     
================================================
  Files                17       17              
  Lines              5004     5129     +125     
================================================
+ Hits               3764     3880     +116     
- Misses             1240     1249       +9
Impacted Files Coverage Δ
openforcefield/typing/engines/smirnoff/io.py 70.96% <100%> (+6.89%) ⬆️
openforcefield/utils/structure.py 43.59% <100%> (ø) ⬆️
...enforcefield/typing/engines/smirnoff/parameters.py 76.07% <76.42%> (+0.43%) ⬆️
...enforcefield/typing/engines/smirnoff/forcefield.py 69.56% <89.47%> (+1.72%) ⬆️
openforcefield/utils/toolkits.py 87.01% <0%> (-0.12%) ⬇️
openforcefield/topology/topology.py 72.91% <0%> (+1.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44354b5...b4ba97f. Read the comment docs.

@j-wags j-wags requested a review from SimonBoothroyd May 1, 2019 22:58
Copy link
Contributor

@SimonBoothroyd SimonBoothroyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good man - my main concern is using a string to define format but otherwise LGTM!

@@ -131,7 +131,7 @@
<Bond smirks="[#6X4:1]-[#6X3:2]" id="b2" k="634.0" length="1.51"/>
</Bonds>
<!-- WARNING: AMBER functional forms drop the factor of 2 in the angle energy term, so cross-comparing this file with a corresponding .frcmod file, it will appear that the values here are twice as large as they should be. -->
<Angles angle_unit="degrees" k_unit="kilocalories_per_mole/radian**2">
<Angles angle_unit="degrees" k_unit="kilocalories_per_mole/radian**2" cosmetic_element="why not?">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be slightly safer to define `"why not?" as a constant, and then f string it in here, so that further down you can just compare against the exact cosmetic element which was used.

@@ -153,6 +153,8 @@
'''



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v. pedantic but maybe remove the extra space here.

Test roundtripping a ForceField to and from an XML file
"""
# These files will be deleted once garbage collection runs (end of this function)
iofile1 = NamedTemporaryFile(suffix='.' + filename_extension)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I'm reading this correctly, but won't this sometimes produce suffixes of ..xml or ..offxml?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right. Good catch!

forcefield_1.to_file(iofile1.name, discard_cosmetic_attributes=False, format=specified_format)

# Ensure that the new XML string has cosmetic attributes in it
assert 'cosmetic_element="why not?"' in open(iofile1.name).read()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I was referring to in my above comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see the benefit of doing this, but I'm concerned that the system is already confusing, and adding in string formatting commands would make it even more so.

available for lookup in the ForceField.

.. warning :: This API is experimental and subject to change.

Parameters
----------
parameter_handler_class : A ParameterHandler-derived object
The ParameterHandler to register
parameter_handler : A ParameterHandler-derived object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just specifying a type of ParameterHandler already implies that derived objects are also allowable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now that I look at it, I feel weird about how I use the word "derived" in these docstrings. I'm removing this phrase from the files.


Parameters
----------
format : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good, although I'm really not sure about having format as just some string literal... it makes it really unclear as to which values it can take (e.g 'Xml, xml, XML, .xml, .offxml, ..., json, Json, JSON... etc) and doesn't lend itself well to static linting. My personal preference would be to have an Enum which contains all of the available formats, and to pass this instead.

e.g: ff_string = ff.to_string(ForceField.Format.OFFXML, True) or ff_string = ff.to_string(ForceField.Format.JSON, True) if we extend support to other formats in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this it's less than ideal to have string literals instead of enumerated types.

Alternatively, we could have registration of a handler add actual methods like object.to_xml() and object.from_xml(), but that might make it harder to generally and flexibly handle many parameter types.

----------
filename : str
The filename to write to
format : str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above ^^^^

@@ -652,6 +646,40 @@ def __init__(self, permit_cosmetic_attributes=False, **kwargs):
"'permit_cosmetic_attributes=True'".format(key, self.__class__))


def _add_parameters(self, section_dict, permit_cosmetic_attributes=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call!

IncompatibleParameterError if handler_kwargs are incompatible with existing parameters.
"""
float_attrs_to_compare = []
string_attrs_to_compare = ['potential']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As opposed to splitting into different arrays, it may be a bit cleaner to have a single array of all attrs and then just do a type check in the below loops. If these checks (i.e abs(this_val - other_val) > 1.e-6) are going to be used repeatedly they could probably just be dumped into static methods on the ParameterHandler class.

Something like

@staticmethod
def compare_numbers(a, b, tolerance)
    ....

or something more like numpy's isclose method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a good first step for streamlining. For now, I want to see what this function turns into. There may be some cases where differences in some attributes are permissible, so I want to develop these separately a bit more to see how much flexibility a single inherited solution would require.

@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs):
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler.
super().__init__(**kwargs)

def check_handler_compatibility(self,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely like the idea of this! And for sure when you get more of a handle on what comparisons can be made there definitely looks to be scope for condensing things, but for now is probably fine (maybe throw in some todo's in case of bus?)!

Copy link
Member

@jchodera jchodera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some comments on the API!

self.register_parameter_handler(new_handler)
return_handler = new_handler

return return_handler

def get_io_handler(self, io_format):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be get_parameter_io_handler to be consistent with register_parameter_io_handler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I'll also change get_handler to get_parameter_handler.

raise IOError(msg)


def to_string(self, format, discard_cosmetic_attributes=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think format should be an optional kwarg to avoid breaking assumptions about how object.to_*() should typically work. The kwargs can modify behavior, but some sensible behavior should be default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. I'll set the default to XML.


Parameters
----------
format : str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this it's less than ideal to have string literals instead of enumerated types.

Alternatively, we could have registration of a handler add actual methods like object.to_xml() and object.from_xml(), but that might make it harder to generally and flexibly handle many parameter types.

@@ -312,8 +303,6 @@ def prepend_all_keys(d, char='@'):
prepend_all_keys(item)

prepend_all_keys(smirnoff_data['SMIRNOFF'])
print(smirnoff_data)
print()
return xmltodict.unparse(smirnoff_data, pretty=True)

def to_xml(self, smirnoff_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is to_xml() added to the ForceField class, or is it something awkward like object.parameter_io_handlers['XML'].to_xml()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like you said, it would have to be ff_object.get_io_handler('XML').to_xml(). And all is does is call XMLParameterIOHandler.to_string() anyway, which is easier to access via ff_object.to_string('XML')

It's so unlikely that this would ever be used, that I'd be OK to remove it.

return_cosmetic_attributes : bool, optional. default = False
Whether to return non-spec attributes of each ParameterType-derived object.
discard_cosmetic_attributes : bool, optional. Default = True
Whether to discard non-spec attributes of each ParameterType-derived object.

Returns
-------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these deep copies of ParameterTypes, or the actual objects? Would modifying an object modify the corresponding object in ForceField?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are serialized representations of ParameterTypes (so, not ParameterTypes at all, but dicts).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So modifying them has no effect on the corresponding object in ForceField.

@@ -274,15 +274,15 @@ def __contains__(self, item):
# Fall back to traditional access
return list.__contains__(self, item)

def to_list(self, return_cosmetic_attributes=False):
def to_list(self, discard_cosmetic_attributes=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a .to_list() but not a .from_list()?

@@ -447,16 +449,16 @@ def smirks(self, smirks):
smirks, ensure_valence_type=self._VALENCE_TYPE)
self._smirks = smirks

def to_dict(self, return_cosmetic_attributes=False):
def to_dict(self, discard_cosmetic_attributes=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a .to_dict() but not a .from_dict()?

@@ -818,16 +846,16 @@ def postprocess_system(self, topology, system, **kwargs):
"""
pass

def to_dict(self, output_units=None, return_cosmetic_attributes=False):
def to_dict(self, output_units=None, discard_cosmetic_attributes=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have a .to_dict() but not a .from_dict()?

def check_handler_compatibility(self,
other_handler):
"""
Checks whether this ParameterHandler encodes the same physics as another ParameterHandler. This is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say "the same physics", but at least "compatible physics".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. Added.

@j-wags
Copy link
Member Author

j-wags commented May 3, 2019

@jchodera asked several questions of the form "why is there a to_X function and not a from_X function?"

This was a really interesting question to think about. What it comes down to is that the SMIRNOFF data model makes serialization and deserialization asymmetric. That is, it would be satisfying to be able to take a SMIRNOFF dict (hierarchical dict representation of an OFFXML), and pass it to the ForceField initializer, and have that pass each parameter section into the appropriate ParameterHandler initializer, and have that pass the parameter list into the ParameterList initializer, and have that pass each individual parameter into the appropriate ParameterType initializer. But we can't.

This is because there are at least two cases where information that is scientifically important to the ParameterType is actually stored in the section header (and therefore becomes an attribute of the ParameterHandler). These are:

  • units
  • Torsions's default_idivf

So, what really happens is:

  • the ParameterHandler is initialized using the section dict as kwargs for its initializer
  • the ParameterHandler initializes a ParameterList for itself (no kwargs)
  • the ParameterHandler modifies each parameter (which is still a dict) by attaching units,
  • the ParameterHandler uses the unit-attached dict as kwargs to the ParameterType initializer,
  • the ParameterHandler finally appends the initialized ParameterType to its ParameterList.

Because of this, we can't directly serialize a ParameterType to a line in an OFFXML file. At a minimum, the owning ParameterHandler must decide which units all of its ParameterTypes will use, and then convert all of the serialized ParameterTypes it owns to use those values.

I think that modifying the SMIRNOFF spec to achieve this symmetry would be more than just satisfying -- It would probably head off several errors and simplify development in the future. What do you think, @jchodera @SimonBoothroyd @andrrizzi @davidlmobley ?

@jchodera
Copy link
Member

jchodera commented May 3, 2019

100% supportive of modifying the spec to make the architecture simpler! This would mean each entry would just have its own units attached, right?

We can still have convenience methods that will convert all unit-bearing quantities of a force field or parameter class to a desired unit system. It could even be chained for convenience:

forcefields.standardize_units().to_file(filename, format=XML)

@SimonBoothroyd
Copy link
Contributor

I definitely think that would be a good way forward, even if it adds some extra bulk / repeated information into the force field file itself. At the end of the day, I think we really want people to be accessing and editing the FF objects through the python API rather than the file itself, such that we don't need to be too concerned about slightly verbose files / the previously common copy paste errors!

It seems like this change would very much help in allowing the force field to be serialized to different formats in the future, making things a bit more future proof. If I understand this correctly, making these changes would allow FF objects to be directly turned into ( / created from) a hierarchical dict without any extra processing, by which I mean every object which makes up the FF object would be able to turn itself into (/ be created from) a dict, without knowledge of any other objects which would be a big simplification!

I guess this would also allow easier integration of libraries like attrs (or pydantic) which could allow cleaner class definitions / validation logic.

@davidlmobley
Copy link
Contributor

I also agree this makes sense. We don't want our thoughts of what we liked in terms of the spec to keep us from doing stuff which really matters/make software architecture hard, and most of those preferences were not strong preferences.

@j-wags
Copy link
Member Author

j-wags commented May 4, 2019

Great. Thanks for the feedback, I think this is the right way to go. I'm going to handle the spec change in a later PR so the FF versioning can move forward for now (otherwise we'll never finish 0.3.0)

Copy link
Collaborator

@andrrizzi andrrizzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff! I've added a few very minor comments in the review.

I also agree that the simplification obtained by changing the units OFFXML spec might be worth the redundant info.

Dict to be passed to the handler for construction or checking compatibility. If None, will be assumed
to represent handler defaults.
permit_cosmetic_attributes : bool, optional. Default = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outrageously nitpicky, but we used allow_ instead of permit_ somewhere else (at least) in allow_undefined_stereo, and I wonder if we should change one of the two for naming consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I agree we should standardize on one, and am basically ambivalent about which. If I had to choose, I think allow is slightly clearer.


.. warning :: This API is experimental and subject to change.

Parameters
----------
parameter_io_handler_class : A subclass of ParameterIOHandler
parameter_io_handler : A ParameterIOHandler object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it. Good call!


Parameters
----------
format : str, optional. Default='XML'
Copy link
Collaborator

@andrrizzi andrrizzi May 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could think of having formats codification here, for io_format, and in toolkits.py in an Enum object instead of strings (probably for a later PR/API-breaking version).

We could also extend this to accept directly a ParameterIOHandler object so that we could easily implement translations between specifications in the future. So maybe we can call this something like parameter_io_handler, which accepts also a parameter handler key (i.e. a format), and the code could be something like

if isinstance(parameter_io_handler, ParameterIOHandler):
    io_handler = parameter_io_handler
else:
    io_handler = self. get_parameter_io_handler(parameter_io_handler)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant! Agree that IO format enums should come later, but will extend this to accept ParameterIOHandlers right now

@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs):
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler.
super().__init__(**kwargs)

def check_handler_compatibility(self,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it. In general, I think passing around objects (at least in the public API) rather than dicts will generally improve encapsulation and thus extensibility.

@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs):
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler.
super().__init__(**kwargs)

def check_handler_compatibility(self,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a good idea to open an issue to remind us to look into code simplification.

@j-wags j-wags merged commit e2e1092 into 0.3.0-branch May 6, 2019
@j-wags
Copy link
Member Author

j-wags commented May 6, 2019

Thanks for the feedback, everyone. I think I've either addressed everything here, or moved it to a issue that we can track. Merging this into 0.3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants