### Welcome to the ProtoSyn.jl examples

# 2 - Selections

ProtoSyn comes equipped with a powerful selection syntax, useful for highlighting and specifying targets for its several manipulation tools. In this example, we will take a closer look at the different types of selections and how to apply them. As a canvas for this exploration, the 2A3D peptidic structure will be used.

In [1]:
using ProtoSyn

 | Loading TorchANI
 | Loading ONNX models
 | Loading PyRosetta
 | Loading SeqDes
[ Loading: ProtoSyn loaded successfully!

.      ____            _       ____              
      |  _ \ _ __ ___ | |_ ___/ ___| _   _ _ __  
      | |_) | '__/ _ \| __/ _ \___ \| | | | '_ \ 
      |  __/| | | (_) | || (_) |__) | |_| | | | |
      |_|   |_|  \___/ \__\___/____/ \__, |_| |_|
                                       |_/       
    
      ---------------------------------------------

 Version      : 1.10
 License      : GNU-GPL-3
 Developed by : José Pereira (jose.manuel.pereira@ua.pt)
                Sérgio Santos




In [2]:
pose = ProtoSyn.load("data/2a3d.pdb")

Pose{Topology}(Topology{/2a3d:922}, State{Float64}:
 Size: 1140
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

## Masks

Before exploring the different selection types, it's important to introduce the concept of Masks. All ProtoSyn selections can be applied to either a Pose or an AbstractContainer (a subset of the Pose's Graph) and return a Mask, a BitArray that states whether each Atom, Residue or Segment is currently selected. Importantly, Masks in ProtoSyn are typed according to the level of the Graph they refer to: Atom, Residue or Segment.
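
As a rough mental model, a mask is one Boolean flag per instance. The sketch below uses a plain Julia `BitVector` as a stand-in (it is not ProtoSyn's actual `Mask` type, which also records the level it refers to) to show how such a mask answers "how many are selected?" and "which ones?".

```julia
# Stand-in for an Atom-level mask over a hypothetical 10-atom container.
mask = falses(10)      # nothing selected yet
mask[10] = true        # mark the 10th atom as selected

n_selected = count(mask)      # number of selected instances
selected   = findall(mask)    # positions of the selected instances

println("Count: ", n_selected, " / ", length(mask))
println("Selected positions: ", selected)
```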

## Stateful and Stateless selections

Another important distinction is between Stateful and Stateless selections. As the names suggest, Stateless selections do not depend on the Pose's State (no information about the position of Atoms is needed). In contrast, Stateful selections require a State to determine the selected Atom, Residue or Segment instances.

## a) Selecting by index

SerialSelections allow us to select Atom, Residue and Segment instances based on their `:index` or `:id`.

In [3]:
selection = SerialSelection{Atom}(10, :id)

SerialSelection › Atom.id = 10


In [4]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 1 / 1140
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

As stated before, a selection can be applied to a subset of the Pose's graph, such as a single Residue. This will loop over this subset of Atoms only and return a smaller Mask.

In [5]:
residue = pose.graph[1][1]
selection(residue)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (19,)
 ├── Count: 1 / 19
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

A selection type defines the type of instance looped over. In the next example, we will be selecting Residue instances, instead of Atom instances.

In [6]:
selection = SerialSelection{Residue}(71, :id)

SerialSelection › Residue.id = 71


In [7]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 1 / 73
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

Finally, most selection types in ProtoSyn have a short syntax. In the next example, we will showcase some of the SerialSelection short syntax. Check the documentation for more details.

In [8]:
rid"71"(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 1 / 73
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 1, 0, 0]

In [9]:
rix"10"(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 1 / 73
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [10]:
aid"10"(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 1 / 1140
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## b) Selecting a range of instances

A common selection type is selecting a block of Atom, Residue or Segment instances, based on their `:id` or `:index` field. All instances with an `:id` or `:index` between the given values are marked as selected (both ends inclusive).

In [11]:
selection = RangeSelection{Atom}(40:70, :id)

RangeSelection › Atom.id between 40 and 70


In [12]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 31 / 1140
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

A short syntax is available.

In [13]:
rid"1:10"(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 10 / 73
 └── Content: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## c) Selecting by field

FieldSelections allow the user to search the Pose for Atom, Residue or Segment instances with a given field value. This is, in essence, similar to the behaviour of SerialSelections, but with specialized short syntax versions. Here are some examples. Check the documentation for more details.

In [14]:
selection = FieldSelection{Residue}("ALA", :name)

FieldSelection › Residue.name = ALA


In [15]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 15 / 73
 └── Content: [0, 0, 0, 0, 1, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

In [16]:
rn"ILE"(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 3 / 73
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [17]:
an"O"(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 73 / 1140
 └── Content: [0, 0, 0, 1, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

FieldSelections, as stated above, have a specialized short syntax that allows the usage of regular expressions when querying the Pose. Below are some examples.

In [18]:
selection = FieldSelection{Residue}("GL*", :name, is_regex = true)

FieldSelection › Residue.name = r"GL*"


In [19]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 25 / 73
 └── Content: [0, 1, 0, 0, 0, 1, 0, 0, 1, 1  …  1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
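
Note that `*` in a regular expression is a quantifier ("zero or more of the preceding character"), not a shell-style wildcard. The sketch below uses plain Julia and assumes the pattern is searched for anywhere in the name, as Julia's `occursin` does (whether ProtoSyn matches in exactly this way is an assumption here). Under that assumption, `GL*` matches any name containing a `G`, including ARG, while an anchored pattern such as `^GL` only matches names starting with "GL".

```julia
# Hypothetical residue names, used only to illustrate regex behaviour.
names = ["GLY", "GLU", "GLN", "ARG", "ALA"]

# "G followed by zero or more L": any name containing a G matches.
loose = [occursin(r"GL*", n) for n in names]

# "Starts with GL": only GLY, GLU and GLN match.
strict = [occursin(r"^GL", n) for n in names]

println(loose)   # Bool[1, 1, 1, 1, 0]
println(strict)  # Bool[1, 1, 1, 0, 0]
```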

In [20]:
an"C$|CA$"r(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 146 / 1140
 └── Content: [0, 1, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## d) Selecting based on distance

DistanceSelection instances allow the selection of Atom instances based on their distance to another set of selected atoms, selecting those under the given cut-off value (in Angstrom). As such, this is an example of a Stateful selection, as it requires the Pose's State to correctly determine the distance between the particles. Below are some examples. Check the documentation for more details.

In [21]:
selection = DistanceSelection(3.0, an"CA")

DistanceSelection ❯ Within 3.0 Å (Atom)
 └── FieldSelection › Atom.name = CA


In [22]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 813 / 1140
 └── Content: [1, 1, 1, 1, 1, 1, 0, 0, 1, 1  …  1, 0, 0, 1, 1, 1, 1, 1, 1, 0]
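
To see why a distance-based selection is Stateful, consider the following minimal sketch in plain Julia. It is not ProtoSyn's implementation: the `within_cutoff` helper and the coordinates are hypothetical, and a simple all-pairs loop is assumed. The same reference atoms yield a different mask once a coordinate changes.

```julia
using LinearAlgebra  # provides norm

# Mark every atom lying closer than `cutoff` (in Å) to any reference atom.
# `coords` holds one column of x, y, z coordinates per atom.
function within_cutoff(coords, reference, cutoff)
    n = size(coords, 2)
    mask = falses(n)
    for i in 1:n, j in findall(reference)
        if norm(coords[:, i] - coords[:, j]) < cutoff
            mask[i] = true
        end
    end
    return mask
end

coords = [0.0 1.5 4.0 9.0;
          0.0 0.0 0.0 0.0;
          0.0 0.0 0.0 0.0]
reference = BitVector([1, 0, 0, 0])   # measure distances from atom 1

before = within_cutoff(coords, reference, 3.0)
coords[1, 3] = 2.0                    # move atom 3 closer to atom 1
after = within_cutoff(coords, reference, 3.0)

println(before)  # Bool[1, 1, 0, 0]
println(after)   # Bool[1, 1, 1, 0]
```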

Note that all DistanceSelection instances create an Atom Mask (even if the given selection is of type Residue). Check the "Promotion" topic below or the documentation for more details. As with other types of Selection, a short syntax is available.

In [23]:
(5.0:rn"ALA")(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 793 / 1140
 └── Content: [0, 1, 1, 1, 0, 0, 0, 0, 0, 0  …  1, 1, 1, 0, 1, 1, 1, 1, 1, 1]

## e) Selecting all Atom, Residue and Segment instances

In certain cases it can be useful to select all instances of a given type, especially when combined with unary and binary selections (see below and the documentation for more details). For this, ProtoSyn makes available the TrueSelection, as exemplified next.

In [24]:
selection = TrueSelection{Residue}()

TrueSelection (Residue)


In [25]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 73 / 73
 └── Content: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

## f) Selecting the terminal ends of a Pose

Sometimes it can be useful to query the Pose for all the Residue instances that either have no children or are children of the Root of the Graph (marking them as terminals). This is achieved in ProtoSyn by applying the DownstreamTerminalSelection{Residue}() and UpstreamTerminalSelection{Residue}() selections, as exemplified below. **Note that, in the given example, since we are loading the Pose via the Core `ProtoSyn.load` method, no information regarding the Residue-level graph can be inferred, and all Residue instances are marked as children of the Root. A more realistic result can be achieved by loading the PDB file with the more specific `ProtoSyn.Peptides.load` method, which correctly identifies Residue parenthood relationships in peptidic chains.**

In [26]:
selection = DownstreamTerminalSelection{Residue}() | UpstreamTerminalSelection{Residue}()

BinarySelection ❯  | "or" (Residue)
 ├── DownstreamTerminalSelection (Residue)
 └── UpstreamTerminalSelection (Residue)


In [27]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 73 / 73
 └── Content: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
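
The 73 / 73 result above can be understood with a small sketch in plain Julia. This is an illustration only, not ProtoSyn's data structures: residues are represented by a hypothetical vector of parent indices, with `0` standing for the Root. When every residue is a child of the Root, every residue is both an upstream and a downstream terminal.

```julia
# parents[i] is the parent of residue i; 0 stands for the Root (assumption).
function terminals(parents)
    n = length(parents)
    upstream = BitVector([p == 0 for p in parents])   # children of the Root
    has_child = falses(n)
    for p in parents
        p != 0 && (has_child[p] = true)
    end
    downstream = .!has_child                           # residues with no children
    return upstream, downstream
end

# A linear chain: Root -> 1 -> 2 -> 3 -> 4
up_chain, down_chain = terminals([0, 1, 2, 3])
# No residue-level graph inferred: every residue hangs from the Root
up_flat, down_flat = terminals([0, 0, 0, 0])

println(up_chain .| down_chain)  # Bool[1, 0, 0, 1]
println(up_flat .| down_flat)    # Bool[1, 1, 1, 1]
```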

## g) Randomly selecting instances of a given type

RandomSelections allow the user to retrieve a random Atom, Residue or Segment every time the selection is applied to a Pose, as exemplified below.

In [28]:
selection = RandomSelection{Residue}()

RandomSelection › Residue.id


In [29]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 1 / 73
 └── Content: [0, 0, 0, 0, 0, 0, 1, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Alternatively, a RandomSelection may receive an input selection, in which case the random instance is picked only from the pre-selected instances.

In [30]:
selection = RandomSelection{Atom}(rid"1")

RandomSelection › Atom.id › From
 └── SerialSelection › Residue.id = 1


In [31]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 1 / 1140
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

## h) Randomly selecting a range of instances of a given type

Another common selection task is to retrieve a random range of consecutively numbered Atom, Residue or Segment instances every time the selection is applied to a Pose. This is achieved by employing the RandomRangeSelection, as exemplified next.

In [32]:
selection = RandomRangeSelection{Residue}()

RandomRangeSelection › Residue.id


In [33]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 69 / 73
 └── Content: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

## i) Selecting a random selection to apply to a Pose

A final example of random selections is the selection of a random selection. Every time the RandomSelectionFromList is applied, a selection is chosen from the provided list and applied to the Pose. This allows the user to fine-tune the exploration space for random selection, as exemplified below. Note that all selections in the list must be of the same type.

In [34]:
selection = RandomSelectionFromList([rid"1", rid"2"])

RandomSelectionFromList ❯ (Residue)
 ├── SerialSelection › Residue.id = 1
 └── SerialSelection › Residue.id = 2


In [35]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 1 / 73
 └── Content: [0, 1, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
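
Conceptually, this amounts to drawing one candidate from a list and then applying it. The following plain-Julia sketch models each candidate as an anonymous function that returns a mask. The names are hypothetical and this is not how ProtoSyn implements it.

```julia
# Two hypothetical residue-level "selections" over n residues:
# the first marks residue 1, the second marks residue 2.
candidates = [n -> (1:n) .== 1, n -> (1:n) .== 2]

# Draw one candidate at random and apply it.
pick_and_apply(candidates, n) = rand(candidates)(n)

mask = pick_and_apply(candidates, 5)
println(mask)   # either residue 1 or residue 2 is marked, never both
```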

## j) Selecting the opposite of a selection

UnarySelections select the Atom, Residue and Segment instances that are not selected by a given selection (commonly referred to as the "not" selection). The short syntax is exemplified below.

In [36]:
selection = !rid"1:3"

UnarySelection ❯ ! "not" (Residue)
 └── RangeSelection › Residue.id between 1 and 3


In [37]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 70 / 73
 └── Content: [0, 0, 0, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

## k) Binary combinations of selections

One important type of selection is the BinarySelection, allowing users to combine two or more selections using "and" and "or" operators, as shown next.

In [38]:
selection = BinarySelection(&, rn"ALA", an"CA")

BinarySelection ❯  & "and" (Atom)
 ├── FieldSelection › Residue.name = ALA
 └── FieldSelection › Atom.name = CA


In [39]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 15 / 1140
 └── Content: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

This can also be achieved using the available short syntax.

In [40]:
(an"CA" | an"C")(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 146 / 1140
 └── Content: [0, 1, 1, 0, 0, 0, 0, 0, 0, 0  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Note that, when combining more than two selections, the order of evaluation matters. In Julia, `&` binds more tightly than `|`, so `a & b | c` is read as `(a & b) | c`. A specific grouping of selections can be achieved by using parentheses, as exemplified below.

In [41]:
an"CA" & rid"1:3" | rid"6:7"

BinarySelection ❯  | "or" (Atom)
 ├── BinarySelection ❯  & "and" (Atom)
 |    ├── FieldSelection › Atom.name = CA
 |    └── RangeSelection › Residue.id between 1 and 3
 └── RangeSelection › Residue.id between 6 and 7


In [42]:
an"CA" & (rid"1:3" | rid"6:7")

BinarySelection ❯  & "and" (Atom)
 ├── FieldSelection › Atom.name = CA
 └── BinarySelection ❯  | "or" (Residue)
      ├── RangeSelection › Residue.id between 1 and 3
      └── RangeSelection › Residue.id between 6 and 7
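
The two trees above differ because Julia gives `&` a higher precedence than `|`. The effect on the resulting masks can be checked with plain `BitVector`s standing in for the three selections (the values are hypothetical, chosen only to make the two groupings differ):

```julia
# Stand-in masks over four atoms.
is_ca  = BitVector([1, 0, 1, 0])   # plays the role of an"CA"
in_1_3 = BitVector([1, 1, 0, 0])   # plays the role of rid"1:3"
in_6_7 = BitVector([0, 0, 0, 1])   # plays the role of rid"6:7"

# No parentheses: parsed as (is_ca .& in_1_3) .| in_6_7
default_grouping = is_ca .& in_1_3 .| in_6_7

# Explicit grouping: the "or" is evaluated first
explicit_grouping = is_ca .& (in_1_3 .| in_6_7)

println(default_grouping)   # Bool[1, 0, 0, 1]
println(explicit_grouping)  # Bool[1, 0, 0, 0]
```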


## l) Promoting a selection from one type to another

A final selection type is the PromoteSelection, which takes a selection and transforms the resulting Mask from one type (for example, Residue) to another (for example, Atom).

In [43]:
selection = PromoteSelection(rid"1", Atom, all)

PromoteSelection ❯ From Residue to Atom
 └── SerialSelection › Residue.id = 1


In [44]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 19 / 1140
 └── Content: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

The same can be written with the `promote` method. By default, this uses the `any` operator, meaning that an instance of the requested type is marked as selected if it contains at least one selected instance. An alternative is the `all` operator, under which an instance of the requested type is marked as true in the resulting Mask only if all the instances it contains are selected.

In [45]:
selection = ProtoSyn.promote(an"CG", Residue)

PromoteSelection ❯ From Atom to Residue
 └── FieldSelection › Atom.name = CG


In [46]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 45 / 73
 └── Content: [1, 0, 0, 1, 0, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 0, 1, 1, 1, 1]
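
The difference between the `any` and `all` operators when promoting from Atom to Residue can be sketched in plain Julia. The `promote_up` helper and the atom-to-residue mapping below are hypothetical stand-ins, not ProtoSyn's API:

```julia
# residue_of[i] is the (hypothetical) residue that atom i belongs to.
residue_of = [1, 1, 1, 2, 2, 3]
atom_mask  = BitVector([1, 1, 1, 0, 1, 0])

# Reduce the atom-level mask to one flag per residue using `op` (any or all).
function promote_up(atom_mask, residue_of, op)
    n_res = maximum(residue_of)
    return BitVector([op(atom_mask[residue_of .== r]) for r in 1:n_res])
end

with_any = promote_up(atom_mask, residue_of, any)
with_all = promote_up(atom_mask, residue_of, all)

println(with_any)  # Bool[1, 1, 0]: residue 2 has at least one selected atom
println(with_all)  # Bool[1, 0, 0]: only residue 1 has all its atoms selected
```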

## Some more examples

In this short topic, an assorted set of complex selections is showcased.

+ Selecting all Residue instances where all atom instances are within 10 angstrom of any atom of the first residue.

In [47]:
ProtoSyn.promote((10:rid"1"), Residue, all)

PromoteSelection ❯ From Atom to Residue
 └── DistanceSelection ❯ Within 10 Å (Atom)
      └── SerialSelection › Residue.id = 1


+ Selecting all backbone Atom instances in a pose which belong to either an ALA or ARG residue.

In [48]:
an"^C$|^CA$|^N$|^H$|^O$"r & (rn"ALA" | rn"ARG")

BinarySelection ❯  & "and" (Atom)
 ├── FieldSelection › Atom.name = r"^C$|^CA$|^N$|^H$|^O$"
 └── BinarySelection ❯  | "or" (Residue)
      ├── FieldSelection › Residue.name = ALA
      └── FieldSelection › Residue.name = ARG


+ Selecting all CA atoms from a random residue range.

In [49]:
an"CA" & RandomRangeSelection{Residue}()

BinarySelection ❯  & "and" (Atom)
 ├── FieldSelection › Atom.name = CA
 └── RandomRangeSelection › Residue.id


+ Selecting all sidechain atoms in the first 20 residues or from residue 63 to the end of the Pose.

In [50]:
!an"^C$|^CA$|^N$|^H$|^O$"r & (rid"1:20" | rid"63:end")

BinarySelection ❯  & "and" (Atom)
 ├── UnarySelection ❯ ! "not" (Atom)
 |    └── FieldSelection › Atom.name = r"^C$|^CA$|^N$|^H$|^O$"
 └── BinarySelection ❯  | "or" (Residue)
      ├── RangeSelection › Residue.id between 1 and 20
      └── RangeSelection › Residue.id larger than 63


***

## Peptides Module Selections

Some ProtoSyn modules may include extra selection types, specific to the topic of that module. This is the case with the Peptides module, which adds the following selection types. Note that, to simplify the syntax, all added selection types are available directly from ProtoSyn, without the `ProtoSyn.Peptides` prefix. For the next examples, the pose will be re-loaded using the more specific `Peptides.load` method.

In [51]:
pose = ProtoSyn.Peptides.load("data/2a3d.pdb")

Pose{Topology}(Topology{/2a3d:3914}, State{Float64}:
 Size: 1140
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

## m) Selecting the polar amino acids

A first addition is the ability to directly select only the polar amino acids, as described in the `ProtoSyn.Peptides.polar_residues` list.

In [52]:
selection = PolarSelection()

PolarSelection › (Residue)


In [53]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 34 / 73
 └── Content: [0, 0, 1, 0, 0, 1, 0, 1, 1, 1  …  1, 1, 1, 0, 1, 0, 0, 1, 1, 1]

## n) Selecting the sidechain directly

In the previous examples, an approach to selecting the backbone and sidechain atoms was shown, using regular expressions. In the Peptides module, this is simplified, by employing the SidechainSelection directly.

In [54]:
selection = SidechainSelection()

SidechainSelection › (Atom)


In [55]:
selection(pose)

ProtoSyn.Mask
 ├── Type: Atom
 ├── Size: (1140,)
 ├── Count: 777 / 1140
 └── Content: [0, 0, 0, 0, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

## o) Selecting residues based on the current secondary structure

This final selection type attempts to select residue instances based on the phi and psi dihedral angle values, placing them in bins such as alpha helix, beta sheet, etc. Note that, in certain loop residues, the phi and psi dihedral angle values may still fall into the classification of one of these categories. Check the documentation for more details.

In [56]:
selection = SecondaryStructureSelection(:helix)

SecondaryStructureSelection › helix (± 50.0°)


In [57]:
selection(pose)

[33m[1m└ [22m[39m[90m@ ProtoSyn.Peptides ~/ProtoSyn.jl/src/Peptides/Types/types.jl:33[39m


ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 63 / 73
 └── Content: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

A short syntax is also available.

In [58]:
ss"helix"(pose)

[33m[1m└ [22m[39m[90m@ ProtoSyn.Peptides ~/ProtoSyn.jl/src/Peptides/Types/types.jl:33[39m


ProtoSyn.Mask
 ├── Type: Residue
 ├── Size: (73,)
 ├── Count: 63 / 73
 └── Content: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
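
The binning idea can be sketched in plain Julia as follows. The reference angles for a helix (phi = -60°, psi = -45°) are illustrative assumptions, not necessarily the values used by ProtoSyn. The ± 50° tolerance is taken from the output above, and the `is_helix` and `angdiff` helpers are hypothetical. The sketch also shows how a residue that is not part of a helix can still fall inside the helix bin.

```julia
# Signed difference between two angles (degrees), wrapped to [-180, 180).
angdiff(a, b) = mod(a - b + 180.0, 360.0) - 180.0

# Assumed reference dihedrals for a helix (illustrative values only).
const HELIX_PHI = -60.0
const HELIX_PSI = -45.0

# A residue falls in the helix bin if both dihedrals are within `tol` degrees.
is_helix(phi, psi; tol = 50.0) =
    abs(angdiff(phi, HELIX_PHI)) <= tol && abs(angdiff(psi, HELIX_PSI)) <= tol

println(is_helix(-57.0, -47.0))   # true: close to the reference values
println(is_helix(-120.0, 130.0))  # false: psi is far outside the tolerance
println(is_helix(-90.0, -10.0))   # true: within tolerance despite the deviation
```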

## Conclusion

In this example script we took a look into the rich and detailed selection syntax for ProtoSyn. In the next examples, this will be used to specify what parts of the structures are subjected to certain changes, allowing a great deal of control over the simulations performed in ProtoSyn.