-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API changes for ForceField serialization #291
Conversation
parameter addition, add "discard_cosmetic_attributes" kwarg
@@ -652,6 +646,40 @@ def __init__(self, permit_cosmetic_attributes=False, **kwargs): | |||
"'permit_cosmetic_attributes=True'".format(key, self.__class__)) | |||
|
|||
|
|||
def _add_parameters(self, section_dict, permit_cosmetic_attributes=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've separated the parameter addition code out from the ParameterHandler initialization code. This way, we can go through multiple rounds of parameter addition from different SMIRNOFF data sources with a single ParameterHandler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs): | |||
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler. | |||
super().__init__(**kwargs) | |||
|
|||
def check_handler_compatibility(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an explicit check_handler_compatibility
function for each ParameterHandler
currently in use. These functions are nearly identical, and could be refactored/consolidated once we get a sense for how well they work.
check_handler_compatibility
now takes another ParameterHandler
as a parameter, instead of a SMIRNOFF dict. So now, when we're about to load a SMIRNOFF data section that we already have a ParameterHandler
for, we initialize a temporary new handler from the new section header information, and ensure that our physics are compatible with that. This new approach will let us use ParameterHandler.__init__
's behavior to process the new SMIRNOFF data section for sanity and set any default values, instead of trying to duplicate that behavior in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely like the idea of this! And for sure when you get more of a handle on what comparisons can be made there definitely looks to be scope for condensing things, but for now is probably fine (maybe throw in some todo's in case of bus?)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it. In general, I think passing around objects (at least in the public API) rather than dict
s will generally improve encapsulation and thus extensibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a good idea to open an issue to remind us to look into code simplification.
Codecov Report
@@ Coverage Diff @@
## 0.3.0-branch #291 +/- ##
================================================
+ Coverage 75.21% 75.64% +0.42%
================================================
Files 17 17
Lines 5004 5129 +125
================================================
+ Hits 3764 3880 +116
- Misses 1240 1249 +9
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good man - my main concern is using a string to define format but otherwise LGTM!
@@ -131,7 +131,7 @@ | |||
<Bond smirks="[#6X4:1]-[#6X3:2]" id="b2" k="634.0" length="1.51"/> | |||
</Bonds> | |||
<!-- WARNING: AMBER functional forms drop the factor of 2 in the angle energy term, so cross-comparing this file with a corresponding .frcmod file, it will appear that the values here are twice as large as they should be. --> | |||
<Angles angle_unit="degrees" k_unit="kilocalories_per_mole/radian**2"> | |||
<Angles angle_unit="degrees" k_unit="kilocalories_per_mole/radian**2" cosmetic_element="why not?"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be slightly safer to define `"why not?" as a constant, and then f string it in here, so that further down you can just compare against the exact cosmetic element which was used.
@@ -153,6 +153,8 @@ | |||
''' | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v. pedantic but maybe remove the extra space here.
Test roundtripping a ForceField to and from an XML file | ||
""" | ||
# These files will be deleted once garbage collection runs (end of this function) | ||
iofile1 = NamedTemporaryFile(suffix='.' + filename_extension) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I'm reading this correctly, but won't this sometimes produce suffixes of ..xml
or ..offxml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're right. Good catch!
forcefield_1.to_file(iofile1.name, discard_cosmetic_attributes=False, format=specified_format) | ||
|
||
# Ensure that the new XML string has cosmetic attributes in it | ||
assert 'cosmetic_element="why not?"' in open(iofile1.name).read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where I was referring to in my above comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see the benefit of doing this, but I'm concerned that the system is already confusing, and adding in string formatting commands would make it even more so.
available for lookup in the ForceField. | ||
|
||
.. warning :: This API is experimental and subject to change. | ||
|
||
Parameters | ||
---------- | ||
parameter_handler_class : A ParameterHandler-derived object | ||
The ParameterHandler to register | ||
parameter_handler : A ParameterHandler-derived object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think just specifying a type of ParameterHandler
already implies that derived objects are also allowable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now that I look at it, I feel weird about how I use the word "derived" in these docstrings. I'm removing this phrase from the files.
|
||
Parameters | ||
---------- | ||
format : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good, although I'm really not sure about having format as just some string literal... it makes it really unclear as to which values it can take (e.g 'Xml, xml, XML, .xml, .offxml, ..., json, Json, JSON... etc) and doesn't lend itself well to static linting. My personal preference would be to have an Enum
which contains all of the available formats, and to pass this instead.
e.g: ff_string = ff.to_string(ForceField.Format.OFFXML, True)
or ff_string = ff.to_string(ForceField.Format.JSON, True)
if we extend support to other formats in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this it's less than ideal to have string literals instead of enumerated types.
Alternatively, we could have registration of a handler add actual methods like object.to_xml()
and object.from_xml()
, but that might make it harder to generally and flexibly handle many parameter types.
---------- | ||
filename : str | ||
The filename to write to | ||
format : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above ^^^^
@@ -652,6 +646,40 @@ def __init__(self, permit_cosmetic_attributes=False, **kwargs): | |||
"'permit_cosmetic_attributes=True'".format(key, self.__class__)) | |||
|
|||
|
|||
def _add_parameters(self, section_dict, permit_cosmetic_attributes=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call!
IncompatibleParameterError if handler_kwargs are incompatible with existing parameters. | ||
""" | ||
float_attrs_to_compare = [] | ||
string_attrs_to_compare = ['potential'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As opposed to splitting into different arrays, it may be a bit cleaner to have a single array of all attrs and then just do a type check in the below loops. If these checks (i.e abs(this_val - other_val) > 1.e-6
) are going to be used repeatedly they could probably just be dumped into static methods on the ParameterHandler
class.
Something like
@staticmethod
def compare_numbers(a, b, tolerance)
....
or something more like numpy's isclose
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good first step for streamlining. For now, I want to see what this function turns into. There may be some cases where differences in some attributes are permissible, so I want to develop these separately a bit more to see how much flexibility a single inherited solution would require.
@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs): | |||
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler. | |||
super().__init__(**kwargs) | |||
|
|||
def check_handler_compatibility(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely like the idea of this! And for sure when you get more of a handle on what comparisons can be made there definitely looks to be scope for condensing things, but for now is probably fine (maybe throw in some todo's in case of bus?)!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some comments on the API!
self.register_parameter_handler(new_handler) | ||
return_handler = new_handler | ||
|
||
return return_handler | ||
|
||
def get_io_handler(self, io_format): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be get_parameter_io_handler
to be consistent with register_parameter_io_handler
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'll also change get_handler
to get_parameter_handler
.
raise IOError(msg) | ||
|
||
|
||
def to_string(self, format, discard_cosmetic_attributes=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think format
should be an optional kwarg to avoid breaking assumptions about how object.to_*()
should typically work. The kwargs can modify behavior, but some sensible behavior should be default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, OK. I'll set the default to XML.
|
||
Parameters | ||
---------- | ||
format : str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this it's less than ideal to have string literals instead of enumerated types.
Alternatively, we could have registration of a handler add actual methods like object.to_xml()
and object.from_xml()
, but that might make it harder to generally and flexibly handle many parameter types.
@@ -312,8 +303,6 @@ def prepend_all_keys(d, char='@'): | |||
prepend_all_keys(item) | |||
|
|||
prepend_all_keys(smirnoff_data['SMIRNOFF']) | |||
print(smirnoff_data) | |||
print() | |||
return xmltodict.unparse(smirnoff_data, pretty=True) | |||
|
|||
def to_xml(self, smirnoff_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is to_xml()
added to the ForceField
class, or is it something awkward like object.parameter_io_handlers['XML'].to_xml()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like you said, it would have to be ff_object.get_io_handler('XML').to_xml()
. And all is does is call XMLParameterIOHandler.to_string()
anyway, which is easier to access via ff_object.to_string('XML')
It's so unlikely that this would ever be used, that I'd be OK to remove it.
return_cosmetic_attributes : bool, optional. default = False | ||
Whether to return non-spec attributes of each ParameterType-derived object. | ||
discard_cosmetic_attributes : bool, optional. Default = True | ||
Whether to discard non-spec attributes of each ParameterType-derived object. | ||
|
||
Returns | ||
------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these deep copies of ParameterType
s, or the actual objects? Would modifying an object modify the corresponding object in ForceField
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are serialized representations of ParameterTypes (so, not ParameterType
s at all, but dicts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So modifying them has no effect on the corresponding object in ForceField
.
@@ -274,15 +274,15 @@ def __contains__(self, item): | |||
# Fall back to traditional access | |||
return list.__contains__(self, item) | |||
|
|||
def to_list(self, return_cosmetic_attributes=False): | |||
def to_list(self, discard_cosmetic_attributes=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a .to_list()
but not a .from_list()
?
@@ -447,16 +449,16 @@ def smirks(self, smirks): | |||
smirks, ensure_valence_type=self._VALENCE_TYPE) | |||
self._smirks = smirks | |||
|
|||
def to_dict(self, return_cosmetic_attributes=False): | |||
def to_dict(self, discard_cosmetic_attributes=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a .to_dict()
but not a .from_dict()
?
@@ -818,16 +846,16 @@ def postprocess_system(self, topology, system, **kwargs): | |||
""" | |||
pass | |||
|
|||
def to_dict(self, output_units=None, return_cosmetic_attributes=False): | |||
def to_dict(self, output_units=None, discard_cosmetic_attributes=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a .to_dict()
but not a .from_dict()
?
def check_handler_compatibility(self, | ||
other_handler): | ||
""" | ||
Checks whether this ParameterHandler encodes the same physics as another ParameterHandler. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say "the same physics", but at least "compatible physics".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. Added.
@jchodera asked several questions of the form "why is there a This was a really interesting question to think about. What it comes down to is that the SMIRNOFF data model makes serialization and deserialization asymmetric. That is, it would be satisfying to be able to take a SMIRNOFF dict (hierarchical dict representation of an OFFXML), and pass it to the This is because there are at least two cases where information that is scientifically important to the
So, what really happens is:
Because of this, we can't directly serialize a ParameterType to a line in an OFFXML file. At a minimum, the owning I think that modifying the SMIRNOFF spec to achieve this symmetry would be more than just satisfying -- It would probably head off several errors and simplify development in the future. What do you think, @jchodera @SimonBoothroyd @andrrizzi @davidlmobley ? |
100% supportive of modifying the spec to make the architecture simpler! This would mean each entry would just have its own units attached, right? We can still have convenience methods that will convert all unit-bearing quantities of a force field or parameter class to a desired unit system. It could even be chained for convenience: forcefields.standardize_units().to_file(filename, format=XML) |
I definitely think that would be a good way forward, even if it adds some extra bulk / repeated information into the force field file itself. At the end of the day, I think we really want people to be accessing and editing the FF objects through the python API rather than the file itself, such that we don't need to be too concerned about slightly verbose files / the previously common copy paste errors! It seems like this change would very much help in allowing the force field to be serialized to different formats in the future, making things a bit more future proof. If I understand this correctly, making these changes would allow FF objects to be directly turned into ( / created from) a hierarchical I guess this would also allow easier integration of libraries like |
I also agree this makes sense. We don't want our thoughts of what we liked in terms of the spec to keep us from doing stuff which really matters/make software architecture hard, and most of those preferences were not strong preferences. |
Great. Thanks for the feedback, I think this is the right way to go. I'm going to handle the spec change in a later PR so the FF versioning can move forward for now (otherwise we'll never finish 0.3.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! I've added a few very minor comments in the review.
I also agree that the simplification obtained by changing the units OFFXML spec might be worth the redundant info.
Dict to be passed to the handler for construction or checking compatibility. If None, will be assumed | ||
to represent handler defaults. | ||
permit_cosmetic_attributes : bool, optional. Default = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is outrageously nitpicky, but we used allow_
instead of permit_
somewhere else (at least) in allow_undefined_stereo
, and I wonder if we should change one of the two for naming consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I agree we should standardize on one, and am basically ambivalent about which. If I had to choose, I think allow
is slightly clearer.
|
||
.. warning :: This API is experimental and subject to change. | ||
|
||
Parameters | ||
---------- | ||
parameter_io_handler_class : A subclass of ParameterIOHandler | ||
parameter_io_handler : A ParameterIOHandler object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it. Good call!
|
||
Parameters | ||
---------- | ||
format : str, optional. Default='XML' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could think of having formats codification here, for io_format
, and in toolkits.py
in an Enum
object instead of strings (probably for a later PR/API-breaking version).
We could also extend this to accept directly a ParameterIOHandler
object so that we could easily implement translations between specifications in the future. So maybe we can call this something like parameter_io_handler
, which accepts also a parameter handler key (i.e. a format), and the code could be something like
if isinstance(parameter_io_handler, ParameterIOHandler):
io_handler = parameter_io_handler
else:
io_handler = self. get_parameter_io_handler(parameter_io_handler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brilliant! Agree that IO format enums should come later, but will extend this to accept ParameterIOHandlers
right now
@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs): | |||
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler. | |||
super().__init__(**kwargs) | |||
|
|||
def check_handler_compatibility(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it. In general, I think passing around objects (at least in the public API) rather than dict
s will generally improve encapsulation and thus extensibility.
@@ -1069,6 +1097,33 @@ def __init__(self, **kwargs): | |||
# TODO: Do we want a docstring here? If not, check that docstring get inherited from ParameterHandler. | |||
super().__init__(**kwargs) | |||
|
|||
def check_handler_compatibility(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably a good idea to open an issue to remind us to look into code simplification.
Thanks for the feedback, everyone. I think I've either addressed everything here, or moved it to a issue that we can track. Merging this into 0.3.0. |
This PR:
ParameterHandler
compatibility checksForceField
'sregister_X_handler
andget_X_handler
functions to accept initialized objects instead of classes (details in PR comments)Status: