Fragmentation API #49
Comments
This also would enable a bunch of easy optimizations. For example, a A future extension could also use |
@jchodera @ChayaSt @j-wags I wanted to restart this conversation about the Fragmenter API ahead of the planned integration into the OFFtoolkit. Following the idea of rebasing the code to use the I also like the idea of the I am not fully sure yet of how |
We want to avoid making the fragmentation scheme be a method of
I think creating a
I do not think the factory should store the parent molecule and fragments. That's not any sort of standard idiom or design pattern I'm familiar with. Instead, the factory should be something you create, configure, and use to create new objects that are returned by a factory method. The object created by the factory, however, could contain the parent and daughter fragments in a convenient data object model, however---call it a
Writing out a few clear use cases is the easiest way to approach design. For example, consider a fragmenter that just returns new
# Create a Fragmenter
fragmenter = WBOFragmenter(wbo_threshold=0.8, capping_rules='default')
# Further configure the fragmenter
fragmenter.set_wbo_threshold(0.7)
# Fragment a bunch of molecules
all_fragments = set()
for molecule in molecules:
    fragments = fragmenter.fragment(molecule)
    # update() merges the individual fragments into the set
    all_fragments.update(fragments)
That could be very useful on its own. If we wanted to also bundle the results of fragmentation---including the computed WBOs for the parent and child molecules---we could do that through a
# Create a Fragmenter
fragmenter = WBOFragmenter(wbo_threshold=0.8, capping_rules='default')
# Further configure the fragmenter
fragmenter.set_wbo_threshold(0.7)
# Fragment a bunch of molecules
all_fragments = set()
for molecule in molecules:
    fragmented_molecule = fragmenter.fragment(molecule)
    # update() merges the individual fragments into the set
    all_fragments.update(fragmented_molecule.fragments)
    # we can also inspect the parent molecule
    parent = fragmented_molecule.parent_molecule
etc. But do we really need to preserve this relationship by bundling everything in an object? What information would be useful to bundle together, and can we make a compelling case for why? Filtering can simply be done by using a |
In light of the planned integration of Fragmenter into OFFTK, I agree with @jchodera's last proposal about not keeping links between the original molecule and the fragmentation results. Keeping or caching those relationships would introduce all sorts of surprising behaviors and edge cases. For example, a user who wants to fragment one million molecules would get stuck with a huge memory footprint, even if they wrote out the fragments after processing each molecule. Also, since
I like this idea, with the added bonus that the |
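To make the memory argument concrete, here is a minimal toy sketch. The `Molecule` and `StatelessFragmenter` classes and the split-on-"-" rule are hypothetical placeholders, not the Fragmenter or OFFTK API. The point is only that a fragmenter which keeps no reference to its inputs lets each parent be released as soon as the caller is done with it.

```python
import gc
import weakref


class Molecule:
    """Hypothetical stand-in for a molecule object (just wraps a string)."""

    def __init__(self, name: str):
        self.name = name


class StatelessFragmenter:
    """Toy fragmenter that keeps no reference to molecules it has processed."""

    def fragment(self, molecule: Molecule) -> list:
        # Placeholder rule: every "-" separated piece becomes a new Molecule.
        return [Molecule(piece) for piece in molecule.name.split("-")]


def fragment_stream(fragmenter, molecules):
    """Yield fragment names one parent at a time so nothing accumulates."""
    for molecule in molecules:
        yield [frag.name for frag in fragmenter.fragment(molecule)]


fragmenter = StatelessFragmenter()
parent = Molecule("CCO-CN")
parent_ref = weakref.ref(parent)

# Stand-in for "write the fragments out after processing each molecule".
written = list(fragment_stream(fragmenter, [parent]))

# Once the caller drops the parent, nothing in the fragmenter keeps it alive.
del parent
gc.collect()
parent_was_freed = parent_ref() is None
```

If the fragmenter cached parents or fragments internally, the weak reference would still resolve after `del parent`, which is the footprint problem described above.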
Are we set on integrating the |
@jchodera I recall that Fragmenter was planned for integration into OFFTK, but CMILES was slated to remain largely independent. I can dig up the meeting notes for that tonight if we'd like to reevaluate. |
I think it could still work to do this, provided we do so in a way that makes it easy to extend via subclassing and still use externally developed variants of |
The only reason to maintain the relationship between the parent and daughter molecule is to know which torsions to drive for bespoke torsion fitting, because the fragments are generated for the rotatable bond. For general force field parameterization, there is no need for that relationship because all torsions (or combinations of torsions) will be driven. I like the idea of having the |
I'm not sure I get this. My understanding is that the input to the fragmenter algorithm includes the torsion to fragment around, which would mean we start with that information. Or do you mean that this would be useful for a different API call in which you ask it to "fragment this whole molecule", which would then iteratively fragment around each rotatable bond? In that case, we'd like to know which fragments go with which bond. |
I think that with bespoke fitting for a molecule with multiple bonds, users will want to fragment around all of them and refit the parameters, which is why I want to return the relational information for each torsion. If this were not built into fragmenter, it could instead be part of the bespoke fitting package, which could make multiple calls to fragmenter and record the relation itself, but I think this functionality would make more sense in fragmenter. |
Currently, the torsion to fragment around is not part of
Validating fragmentation schemes is another reason to retain the parental relationship, but since it will probably not happen very often, it is not that important. The relationship is needed when validating because it makes it straightforward to find all fragments that have the bond of interest. It can be a good idea to have the torsion of interest be part of the input because that will reduce the cost of fragmenting in general. |
That might be a nice extension to the API to have the option to fragment all or fragment around a list of user-supplied bonds. In the future, we may have some method of estimating the performance of a set of parameters applied to a molecule (for bespoke fitting), which could indicate some suspect torsions that should be refit. In this case, only fragmenting around these bonds would be more efficient. So when fragmenting one molecule we could
# Create a Fragmenter
fragmenter = WBOFragmenter(wbo_threshold=0.8, capping_rules='default')
# Further configure the fragmenter
fragmenter.set_wbo_threshold(0.7)
# fragment our molecule around target bonds
target_bonds = [(0,1), (3,4)]
fragmented_molecules = fragmenter.fragment(molecule, target_bonds=target_bonds)
The default use of the fragmenter would then still fragment around all rotatable bonds.
fragmented_molecules = fragmenter.fragment(molecule, target_bonds='all') |
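One way the `target_bonds` idea could keep the bond-to-fragment relationship is to return a mapping keyed by bond. The sketch below is a toy under stated assumptions: `ToyChainFragmenter`, the linear chain of atom labels, and the `radius` window rule are all hypothetical and do no real chemistry. Only the `target_bonds` argument and its `'all'` default come from the proposal above.

```python
from typing import Dict, List, Sequence, Tuple, Union

Bond = Tuple[int, int]


class ToyChainFragmenter:
    """Hypothetical fragmenter for a linear chain of atom labels."""

    def __init__(self, radius: int = 1):
        # Number of neighbouring atoms kept on each side of the target bond.
        self.radius = radius

    def fragment(
        self,
        atoms: Sequence[str],
        target_bonds: Union[str, List[Bond]] = "all",
    ) -> Dict[Bond, Tuple[str, ...]]:
        if target_bonds == "all":
            # Placeholder for "all rotatable bonds": every bond in the chain.
            target_bonds = [(i, i + 1) for i in range(len(atoms) - 1)]
        fragments = {}
        for i, j in target_bonds:
            if j != i + 1 or i < 0 or j >= len(atoms):
                raise ValueError(f"({i}, {j}) is not a bond in this chain")
            start = max(0, i - self.radius)
            stop = min(len(atoms), j + self.radius + 1)
            # Key each fragment by the bond it was built around.
            fragments[(i, j)] = tuple(atoms[start:stop])
        return fragments


chain = ["C", "C", "O", "C", "N", "C"]
fragmenter = ToyChainFragmenter(radius=1)
by_bond = fragmenter.fragment(chain, target_bonds=[(0, 1), (3, 4)])
all_bonds = fragmenter.fragment(chain)
```

With a return value shaped like this, a bespoke fitting workflow could look up the fragment for a given torsion directly, without the fragmenter storing anything between calls.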
Can you folks make some concrete proposals about the API you're envisioning here? Simple examples illustrating use (like my example above) make it much easier to see what you're proposing. |
My proposal for the API is:
The fragmenter settings can be changed on the object directly:
The fragmentation engine will inherit from the The The output data models (again
A mapping between each fragment and the parent could then be generated by:
|
@SimonBoothroyd : I like the idea of using pydantic to be able to easily serialize class options, but how would one modify the
# Change an option after factory has been instantiated
fragmenter.options.threshold = 0.04 |
I've been playing around with this a bit more and it seems to make sense to make the fragmenter class inherit directly from a pydantic base model. In this way options can be easily changed like normal properties:
and the whole factory can be easily (de)serialized:
I've updated the above comment to reflect this proposed API. |
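For readers who want to see the shape of this idea, here is a minimal sketch of options as plain fields that can be reassigned and round-tripped through JSON. It uses a standard-library dataclass as a stand-in so it runs without third-party packages. It is not the pydantic-based implementation discussed here, and `ToyFragmenter`, `to_json`, and `from_json` are hypothetical names. The `threshold` and `capping_rules` field names are borrowed from the snippets earlier in the thread, with made-up defaults.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyFragmenter:
    """Hypothetical fragmenter whose settings are plain, serializable fields."""

    threshold: float = 0.03          # assumed default, for illustration only
    capping_rules: str = "default"

    def to_json(self) -> str:
        # Serialize every option so the configured factory can be stored.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ToyFragmenter":
        # Rebuild an equivalent factory from its serialized options.
        return cls(**json.loads(text))


fragmenter = ToyFragmenter()
# Options are changed like normal attributes after instantiation.
fragmenter.threshold = 0.04

serialized = fragmenter.to_json()
restored = ToyFragmenter.from_json(serialized)
```

A pydantic base model would add type validation on top of this, but the usage pattern of assigning to a field and then (de)serializing the whole object is the same.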
@SimonBoothroyd : This is great! I was just looking into this, and it sounds like this is the best way to go. Any class fields prepended with |
Great! I've implemented this now in #103 (example notebook here) if anyone has any additional feedback they'd like to get in. |
The API for fragmenting is a little awkward:
Instead of making the Fragmenter object only operate on a single molecule, and splitting all the options that control its behavior across the constructor (WBOFragmenter()) and the fragment() method, what about making this a factory?
You could still produce an object for each molecule that had all the information about the fragments, like FragmentSet. Using this API would look something like this:
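As a minimal, hypothetical sketch of such a factory: `ToyFragmenter`, the string "molecules", and the split-and-filter rule below are illustrative assumptions, not the project's actual API. Only the `FragmentSet` name and the configure-once, reuse-many-times pattern come from the proposal.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FragmentSet:
    """Hypothetical result object: one parent and the fragments made from it."""

    parent: str
    fragments: Tuple[str, ...]


class ToyFragmenter:
    """Hypothetical factory: configured once, then reused for many molecules."""

    def __init__(self, separator: str = "-", min_size: int = 1):
        # All options live on the factory, not on the fragment() call.
        self.separator = separator
        self.min_size = min_size

    def fragment(self, molecule: str) -> FragmentSet:
        # Placeholder rule: split the string and drop pieces that are too short.
        pieces = tuple(
            piece
            for piece in molecule.split(self.separator)
            if len(piece) >= self.min_size
        )
        # The result carries the parent; the factory itself stores nothing.
        return FragmentSet(parent=molecule, fragments=pieces)


fragmenter = ToyFragmenter(separator="-", min_size=2)
results = [fragmenter.fragment(m) for m in ["CCO-CN-C", "c1ccccc1-CC"]]
```

Here the per-molecule information lives in each returned `FragmentSet`, so the same configured factory can be applied to any number of molecules.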