
__getitem__ Implementation for ForceField class #505

Closed
umesh-timalsina opened this issue Feb 10, 2021 · 2 comments

@umesh-timalsina
Member

Related to #238, #188, and #192. Currently, the ForceField class in GMSO is just a container holding a dictionary for each kind of potential term. You can look up an AtomType, BondType, AngleType, etc. by its class/type name with . However, the following features are missing from the current implementation:

  1. No support for BondTypes/AngleTypes/DihedralTypes with wildcards (full or partial).
  2. No utility to support wildcard patterns for an AtomType class name.

Resolution

After #501, we can attach arbitrary tags to any Potential type in gmso. This means that if we want to record the wildcard patterns a particular atom type is supposed to match, these patterns (as a list or set) can be saved as a tag such as atom_class_patterns on an AtomType. A tokenizer for an AtomType's wildcard tokens might look like the following (assuming there is no branching in partial wildcards for a particular type/class, which I am not convinced is the case and which warrants further discussion):

class WildCardTokenizer:
    """Build a chain of progressively more general wildcard tokens."""

    def __init__(self, token: str) -> None:
        self.token = token
        self.tokens_chain = []
        self._initialize()

    def _initialize(self) -> None:
        max_len = len(self.token)
        # j == 0 keeps the full token; each larger j drops j trailing
        # characters and appends '*', ending with the bare '*' wildcard
        self.tokens_chain = [
            self.token if j == 0 else f'{self.token[:max_len - j]}*'
            for j in range(max_len + 1)
        ]

>>> print(WildCardTokenizer('CHH').tokens_chain)
['CHH', 'CH*', 'C*', '*']

And the AtomType can be extended to have a method like:

    def initialize_match_tokens(self, overwrite=True) -> None:
        """Add wildcard tokens for this atomtype's potential matches"""
        if self.atomclass:
            self.add_tag(
                'class_tokens',
                WildCardTokenizer(self.atomclass),
                overwrite=overwrite
            )
        if self.name:
            self.add_tag(
                'type_tokens',
                WildCardTokenizer(self.name),
                overwrite=overwrite
            )

Now comes the search problem in the ForceField. I think a straightforward approach would be to implement __getitem__ for the ForceField class so that it searches for wildcard matches in multiple passes. For example, if a ForceField has a dihedral type like:

'*~Ar~Ar~*': <DihedralType DihedralType1, id 139964614361616>,

A Dihedral in a Topology whose four Atoms have the atomclasses ['Ar', 'Ar', 'Ar', 'Ar'] should match the above dihedral type when the Topology is parametrized.
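A minimal sketch of the multi-pass idea in plain Python, independent of gmso. `WildcardLookup` and `wildcard_chain` are hypothetical names, the '~'-joined key format is assumed from the dihedral example above, and plain strings stand in for DihedralType objects. It tries the exact key first, then keys that need one step of generalization, then two, and so on, returning the first hit.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


def wildcard_chain(token: str) -> List[str]:
    """Same chain as WildCardTokenizer above: 'Ar' -> ['Ar', 'A*', '*']."""
    n = len(token)
    return [token if j == 0 else f"{token[:n - j]}*" for j in range(n + 1)]


class WildcardLookup:
    """Hypothetical stand-in for ForceField.__getitem__ over one potential dict.

    Keys are assumed to be member classes joined by '~', e.g. '*~Ar~Ar~*'.
    """

    def __init__(self, potentials: Dict[str, object], sep: str = "~") -> None:
        self.potentials = potentials
        self.sep = sep

    def find(self, member_classes: Sequence[str]) -> Optional[object]:
        chains = [wildcard_chain(c) for c in member_classes]
        # Every choice of one token per member; index 0 is the exact class,
        # higher indices are progressively more general wildcards.
        combos = product(*(range(len(chain)) for chain in chains))
        # Sorting by the index sum gives one "pass" per generalization level:
        # the exact key first, then keys needing 1 step, then 2, and so on.
        for combo in sorted(combos, key=sum):
            key = self.sep.join(chain[i] for chain, i in zip(chains, combo))
            if key in self.potentials:
                return self.potentials[key]
        return None

    def __getitem__(self, member_classes: Sequence[str]) -> object:
        match = self.find(member_classes)
        if match is None:
            raise KeyError(self.sep.join(member_classes))
        return match


dihedral_types = {"*~Ar~Ar~*": "DihedralType1"}
lookup = WildcardLookup(dihedral_types)
print(lookup[["Ar", "Ar", "Ar", "Ar"]])  # DihedralType1
```

Within one pass, ties are broken by itertools.product order, which is arbitrary here; that is the open precedence question, not an answer to it. The sketch also does not try the reversed member order, and the number of candidate keys grows as the product of the chain lengths.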

Unanswered Questions

  1. What rules if any should be followed while matching partial/full wildcards?
  2. Are there any restrictions in the presence of wildcards?
  3. What is the precedence order in case a multiple match is found?
  4. Any alternative ideas for the resolution of the problem?
@umesh-timalsina
Member Author

In #506, I have implemented a tokenizer class that uses an exact-match rule. However, I think a regex-based solution is possible as well. Let's look at an example:

>>> from gmso import ForceField
>>> from gmso.tests.utils import get_path
>>> ff = ForceField(get_path('ff-example0.xml'))
>>> ff.atom_types
{'Ar': <AtomType Ar, id 139711668333968>, 'Xe': <AtomType Xe, id 139711668558096>, 'Li': <AtomType Li, id 139711668346128>}
>>> ff.atom_types['Ar'].tags
{'element': 'Ar', 'class_tokens': <gmso.utils.wildcard.WildCardTokenizer object at 0x7f112dd93090>, 'type_tokens': <gmso.utils.wildcard.WildCardTokenizer object at 0x7f112861df90>}
>>> ff.atom_types['Ar'].get_tag('class_tokens').tokens_chain
['Ar', 'A*', '*']
>>> ff.atom_types['Xe'].get_tag('class_tokens').tokens_chain
['Xe', 'X*', '*']
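For comparison with the exact-token rule, a regex-based matcher might look like the sketch below. `pattern_to_regex` and `key_matches` are hypothetical helpers, keys are assumed to be '~'-joined, and `*` is assumed to be the only wildcard, meaning "any sequence of characters". Instead of precomputing a tokens_chain, each pattern is compiled to a regex and matched against the atom class directly.

```python
import re
from typing import Sequence


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Escape regex metacharacters, then turn each escaped '*' back into '.*'
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def key_matches(key: str, member_classes: Sequence[str], sep: str = "~") -> bool:
    """True if every '~'-separated pattern in key matches the class at that position."""
    patterns = key.split(sep)
    if len(patterns) != len(member_classes):
        return False
    return all(
        pattern_to_regex(p).fullmatch(c) is not None
        for p, c in zip(patterns, member_classes)
    )


bond_keys = ["Xe~Ar", "X*~Ar", "*~Ar", "Li~Ar"]
print([k for k in bond_keys if key_matches(k, ["Xe", "Ar"])])
# ['Xe~Ar', 'X*~Ar', '*~Ar']
```

This only answers "which keys match"; choosing among several matches still needs a precedence rule.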

Now, suppose these two AtomClasses, i.e. Xe and Ar, are associated with two atoms that share a Bond in a Topology. When parameterizing the system with this ForceField, the following precedence order should be followed:

  1. If there is a BondType whose member types are [ Xe, Ar ] in the ForceField, the Bond should be assigned that BondType.
  2. If there is a BondType whose member types are [ X*, Ar ] in the ForceField, the Bond should be assigned that BondType.
  3. If there is a BondType whose member types are [ *, Ar ] in the ForceField, the Bond should be assigned that BondType.
  4. If there is a BondType whose member types are [ Xe, * ] in the ForceField, the Bond should be assigned that BondType. etc.
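One possible way to encode a precedence list like the one above is to rank every matching BondType by how many literal (non-wildcard) characters its key contains. This is a sketch under assumptions: keys are '~'-joined, `specificity` and `ranked_matches` are hypothetical names, and matching uses the standard library's fnmatch.fnmatchcase. The score cannot separate [ *, Ar ] from [ Xe, * ] (both score 2); because Python's sort is stable, such ties simply keep the ForceField's insertion order.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Sequence


def specificity(key: str, sep: str = "~") -> int:
    # Count literal characters: 'Xe~Ar' -> 4, 'X*~Ar' -> 3, '*~Ar' -> 2
    return sum(len(part.replace("*", "")) for part in key.split(sep))


def ranked_matches(
    bond_types: Dict[str, object], member_classes: Sequence[str], sep: str = "~"
) -> List[str]:
    matching = [
        key
        for key in bond_types
        if len(key.split(sep)) == len(member_classes)
        # note: fnmatchcase also treats '?' and '[...]' as wildcards
        and all(fnmatchcase(c, p) for p, c in zip(key.split(sep), member_classes))
    ]
    # Most specific first; sorted() stays stable with reverse=True,
    # so equal scores keep their insertion order
    return sorted(matching, key=specificity, reverse=True)


bond_types = {"Xe~Ar": "b1", "X*~Ar": "b2", "*~Ar": "b3", "Xe~*": "b4", "Li~Ar": "b5"}
print(ranked_matches(bond_types, ["Xe", "Ar"]))
# ['Xe~Ar', 'X*~Ar', '*~Ar', 'Xe~*']
```

The first element would be the BondType to assign; how ties should really be resolved is still one of the open questions.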

However, these rules have many cases, and I think we need some domain expertise to generalize them. @mosdef-hub/mosdef-contributors

@umesh-timalsina
Member Author

Closed by #519
