### Welcome to the ProtoSyn.jl examples

# 1 - Getting Started

In this first example script, we will explore how ProtoSyn.jl is organized and which data structures are available. The script is divided into 4 parts:
+ Loading a PDB file as a Pose
+ Exploring the Graph structure
+ Exploring the State structure
+ Export a Pose as a PDB file

## Loading a PDB file as a Pose

A Pose is the main data structure in ProtoSyn, and is subdivided into a directed Graph and a State. We will explore these later. To load a new Pose from a file, we can use the `load` function.

In [1]:
using ProtoSyn

┌ Info: Precompiling ProtoSyn [c9758760-7c0d-11e9-0ffc-fb9355b7d293]
└ @ Base loading.jl:1317
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading required packages
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m | Loading SIMD
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m | Loading CUDA
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mSetting up variables
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mCurrent acceleration set to ProtoSyn.Acceleration(ProtoSyn.CUDA_2)
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading Core
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading Calculators
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m | Loading TorchANI
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m | Loading Restraint Models
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39m | Loading Energy Function
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading Mutators
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading Drivers
[36m[1m[ [22m[39m[36m[1mInfo: [22m[39mLoading Peptides
[36


.      ____            _       ____              
      |  _ \ _ __ ___ | |_ ___/ ___| _   _ _ __  
      | |_) | '__/ _ \| __/ _ \___ \| | | | '_ \ 
      |  __/| | | (_) | || (_) |__) | |_| | | | |
      |_|   |_|  \___/ \__\___/____/ \__, |_| |_|
                                       |_/       
    
      ---------------------------------------------

 Version      : 0.90
 License      : GNU-GPL-3
 Developed by : José Pereira (jose.manuel.pereira@ua.pt)
                Sérgio Santos


In [29]:
pose = ProtoSyn.load("data/2a3d.pdb")

Pose{Topology}(Topology{/2a3d:4745}, State{Float64}:
 Size: 1140
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)

Note that ProtoSyn automatically detects the file format from the file name extension (PDB and YML formats are currently supported). As stated before, a Pose comprises 2 important data structures: the Graph and the State. On the one hand, the Graph is responsible for maintaining an accurate representation of the atoms contained in the Pose, their organization and their relationships with each other: for example, which atoms belong to a given amino acid in a peptide chain, or which other atoms are bonded to any given atom. Unless a molecular manipulation task is performed (such as a mutation or the appendage of residues), the Graph information remains unchanged throughout a simulation. On the other hand, the State is responsible for holding the 3D position of each atom in the Pose, and usually changes at each step of a simulation job. Both structures are explored in more depth in the following sections.

## Exploring the Graph structure

In [30]:
pose.graph

Topology{/2a3d:4745}

First, a Graph comprises several levels of data organization, each more specialized than the last. At the top level, a Topology holds all chains of a molecular structure. Each chain is called a Segment, and holds one or more Residue instances. In the case of peptides, for example, a Residue can be considered an amino acid. Each Residue holds one or more Atom instances, the lowest level of organization.

### Topology > Segment > Residue > Atom

In [31]:
pose.graph.items

1-element Vector{Segment}:
 Segment{/2a3d:4745/A:1}

In [32]:
pose.graph.items[1].items

73-element Vector{Residue}:
 Residue{/2a3d:4745/A:1/MET:1}
 Residue{/2a3d:4745/A:1/GLY:2}
 Residue{/2a3d:4745/A:1/SER:3}
 Residue{/2a3d:4745/A:1/TRP:4}
 Residue{/2a3d:4745/A:1/ALA:5}
 Residue{/2a3d:4745/A:1/GLU:6}
 Residue{/2a3d:4745/A:1/PHE:7}
 Residue{/2a3d:4745/A:1/LYS:8}
 Residue{/2a3d:4745/A:1/GLN:9}
 Residue{/2a3d:4745/A:1/ARG:10}
 Residue{/2a3d:4745/A:1/LEU:11}
 Residue{/2a3d:4745/A:1/ALA:12}
 Residue{/2a3d:4745/A:1/ALA:13}
 ⋮
 Residue{/2a3d:4745/A:1/ALA:62}
 Residue{/2a3d:4745/A:1/ILE:63}
 Residue{/2a3d:4745/A:1/ARG:64}
 Residue{/2a3d:4745/A:1/ASP:65}
 Residue{/2a3d:4745/A:1/GLU:66}
 Residue{/2a3d:4745/A:1/LEU:67}
 Residue{/2a3d:4745/A:1/GLN:68}
 Residue{/2a3d:4745/A:1/ALA:69}
 Residue{/2a3d:4745/A:1/TYR:70}
 Residue{/2a3d:4745/A:1/ARG:71}
 Residue{/2a3d:4745/A:1/HIS:72}
 Residue{/2a3d:4745/A:1/ASN:73}

In [33]:
pose.graph.items[1].items[1].items

19-element Vector{Atom}:
 Atom{/2a3d:4745/A:1/MET:1/N:1}
 Atom{/2a3d:4745/A:1/MET:1/CA:2}
 Atom{/2a3d:4745/A:1/MET:1/C:3}
 Atom{/2a3d:4745/A:1/MET:1/O:4}
 Atom{/2a3d:4745/A:1/MET:1/CB:5}
 Atom{/2a3d:4745/A:1/MET:1/CG:6}
 Atom{/2a3d:4745/A:1/MET:1/SD:7}
 Atom{/2a3d:4745/A:1/MET:1/CE:8}
 Atom{/2a3d:4745/A:1/MET:1/HA:9}
 Atom{/2a3d:4745/A:1/MET:1/HB2:10}
 Atom{/2a3d:4745/A:1/MET:1/HB3:11}
 Atom{/2a3d:4745/A:1/MET:1/HG2:12}
 Atom{/2a3d:4745/A:1/MET:1/HG3:13}
 Atom{/2a3d:4745/A:1/MET:1/HE1:14}
 Atom{/2a3d:4745/A:1/MET:1/HE2:15}
 Atom{/2a3d:4745/A:1/MET:1/HE3:16}
 Atom{/2a3d:4745/A:1/MET:1/H1:17}
 Atom{/2a3d:4745/A:1/MET:1/H2:18}
 Atom{/2a3d:4745/A:1/MET:1/H3:19}

In this example, the 2A3D structure only has 1 chain, which in turn contains 73 Residue instances (amino acids, in this case). As an example, the first Residue is a MET amino acid, with 19 Atom instances.

There are multiple ways of accessing this information:

1. As shown above, each level in a Graph has an `:items` field (a list of the lower level instances contained within). This can quickly become cumbersome to type.

2. We can also access this list using a short syntax: `pose.graph[1][1]`

3. Or using the condensed syntax: `pose.graph[1, 1]`

In [34]:
pose.graph[1][1][1]

Atom{/2a3d:4745/A:1/MET:1/N:1}

In [35]:
pose.graph[1, 1, 1]

Atom{/2a3d:4745/A:1/MET:1/N:1}

When dealing with Atom instances inside a Residue, an extra way of indexing the atom is provided, based on the name of the atom. Following the IUPAC recommendations, Atom names in a residue should be unique. In ProtoSyn, this information is stored in the Residue `:itemsbyname` field, as a dictionary. As such, the last level of indexing can be the Atom's name.

In [36]:
pose.graph[1][1]["CA"]

Atom{/2a3d:4745/A:1/MET:1/CA:2}

In [37]:
pose.graph[1, 1, "CA"]

Atom{/2a3d:4745/A:1/MET:1/CA:2}
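
To make the three indexing styles concrete, here is a minimal, self-contained sketch of how a nested container could support them by overloading `Base.getindex`. The `Toy*` types are hypothetical stand-ins written for this illustration; they are not ProtoSyn's actual types, and ProtoSyn's own implementation may differ.

```julia
# Hypothetical toy containers, NOT ProtoSyn's real Topology/Segment/Residue/Atom.
struct ToyAtom
    name::String
end

struct ToyResidue
    name::String
    items::Vector{ToyAtom}
    itemsbyname::Dict{String, ToyAtom}
end

# Convenience constructor: build the name lookup from the atom list.
ToyResidue(name::String, atoms::Vector{ToyAtom}) =
    ToyResidue(name, atoms, Dict(a.name => a for a in atoms))

struct ToySegment
    name::String
    items::Vector{ToyResidue}
end

struct ToyTopology
    name::String
    items::Vector{ToySegment}
end

const ToyContainer = Union{ToyTopology, ToySegment, ToyResidue}

# Short syntax: c[i] forwards to c.items[i].
Base.getindex(c::ToyContainer, i::Int) = c.items[i]

# Name-based lookup, only defined at the residue level.
Base.getindex(r::ToyResidue, name::String) = r.itemsbyname[name]

# Condensed syntax: c[i, rest...] peels off one level per index.
Base.getindex(c::ToyContainer, i::Int, rest...) = c[i][rest...]

met = ToyResidue("MET", [ToyAtom("N"), ToyAtom("CA"), ToyAtom("C")])
top = ToyTopology("toy", [ToySegment("A", [met])])

top.items[1].items[1].items[2]   # long form
top[1][1][2]                     # short syntax
top[1, 1, 2]                     # condensed syntax
top[1, 1, "CA"]                  # name as the last index
```

All four expressions return the same `ToyAtom("CA")`. The varargs method only peels off the first index and delegates the rest, so the name lookup works as the last index without any special casing.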

Another important aspect of the Graph structure is that it is a directed graph: every node has a parenthood relationship with the rest of the structure. In ProtoSyn, this applies at the level of both Atom and Residue instances. Every Atom in a Pose has a `:parent` and may have one or more `:children` atoms, and the same applies to Residue instances.

In [38]:
pose.graph[1, 1, "CA"].parent

Atom{/2a3d:4745/A:1/MET:1/N:1}

In [39]:
pose.graph[1, 1, "CA"].children

3-element Vector{Atom}:
 Atom{/2a3d:4745/A:1/MET:1/C:3}
 Atom{/2a3d:4745/A:1/MET:1/CB:5}
 Atom{/2a3d:4745/A:1/MET:1/HA:9}

Note that parenthood relationships do not necessarily follow the physical bonds of a molecular structure. In fact, Atom instances have a `:bonds` field that lists the atoms bonded to that atom, while having only a single `:parent` Atom.

In [40]:
pose.graph[1, 1, "CA"].bonds

4-element Vector{Atom}:
 Atom{/2a3d:4745/A:1/MET:1/N:1}
 Atom{/2a3d:4745/A:1/MET:1/C:3}
 Atom{/2a3d:4745/A:1/MET:1/CB:5}
 Atom{/2a3d:4745/A:1/MET:1/HA:9}

A directed graph allows us to traverse the totality of the molecular structure until a break or cut is encountered. It is also useful when applying internal coordinates, which will be discussed further ahead. Note that, when loading a Pose from a file, most parenthood relationships are inferred, and may not be correct. Always verify the integrity of your Graph. Some specialized `load` functions can be found in other ProtoSyn modules, such as the Peptides module, allowing the program to correctly assign parenthood relationships.
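
The difference between parenthood and bonds, and the kind of traversal a directed graph enables, can be sketched with a small stand-alone example. `ToyNode`, `setparent!` and `downstream` are hypothetical names invented for this illustration and are not part of ProtoSyn; the sketch simply assumes that each node keeps one parent, a list of children and a separate list of bonded nodes.

```julia
# Hypothetical toy node, NOT ProtoSyn's Atom type.
mutable struct ToyNode
    name::String
    parent::Union{ToyNode, Nothing}
    children::Vector{ToyNode}
    bonds::Vector{ToyNode}
end

ToyNode(name) = ToyNode(name, nothing, ToyNode[], ToyNode[])

# Add a directed parent -> child edge and record the (undirected) bond on both ends.
function setparent!(child::ToyNode, parent::ToyNode)
    child.parent = parent
    push!(parent.children, child)
    push!(parent.bonds, child)
    push!(child.bonds, parent)
    return child
end

# Depth-first walk: visit a node, then everything downstream of it.
function downstream(node::ToyNode, acc = String[])
    push!(acc, node.name)
    for child in node.children
        downstream(child, acc)
    end
    return acc
end

n, ca, c, cb, ha = ToyNode.(["N", "CA", "C", "CB", "HA"])
setparent!(ca, n)
for child in (c, cb, ha)
    setparent!(child, ca)
end

ca.parent.name                 # "N"
[b.name for b in ca.bonds]     # ["N", "C", "CB", "HA"]
downstream(n)                  # ["N", "CA", "C", "CB", "HA"]
```

In this toy case the bonds happen to coincide with the parent and children, as in the `CA` example above. A ring closure is the typical case where they would not: the closing bond appears in `bonds` on both atoms, but neither atom becomes the other's parent.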

## Exploring the State structure

As stated before, a State contains the information regarding the 3D position of each Atom. In ProtoSyn, this is done using two complementary coordinate systems: the internal coordinates and the cartesian coordinates. The cartesian coordinates `:t` are a set of 3 values: the X, Y and Z positions. The internal coordinates position the Atom relative to the previously placed Atoms, using a distance, an angle and a dihedral angle. In other words, while the cartesian coordinates are independent from the rest of the structure, the internal coordinates place each atom in relation to the previous atoms (called the `:ascendents`). As such, an atom is placed at a distance `:b` from its `parent`; at an angle `:θ` defined with its `parent` and its `parent.parent`; and at a dihedral angle `:ϕ` defined with its `parent`, `parent.parent` and `parent.parent.parent`. This is one of the main reasons for using a directed Graph.

The information for both coordinate systems is stored individually for each Atom, in an AtomState structure.
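
As a rough illustration of how the two coordinate systems relate, the sketch below places one atom from `b`, `θ` and `ϕ` given the cartesian positions of its three ascendents, and then measures the three values back. The function names (`place`, `distance`, `angle3`, `dihedral`) are hypothetical, the ascendent positions are made up, and the sign and ordering conventions (IUPAC-style dihedral, `θ` measured at the parent) are assumptions of this sketch rather than something taken from ProtoSyn's source.

```julia
using LinearAlgebra  # cross, dot, norm, normalize (standard library)

# Place an atom from internal coordinates, given its three ascendents.
# θ is the angle atom-parent-grandparent; ϕ is the dihedral
# greatgrandparent-grandparent-parent-atom (assumed conventions).
function place(parent, grandparent, greatgrandparent, b, θ, ϕ)
    bc = normalize(parent - grandparent)
    n  = normalize(cross(grandparent - greatgrandparent, bc))
    m  = cross(n, bc)
    return parent + b * (-cos(θ) * bc + sin(θ) * cos(ϕ) * m + sin(θ) * sin(ϕ) * n)
end

# Measure internal coordinates back from cartesian positions.
distance(p, q) = norm(p - q)

function angle3(p, q, r)  # angle at q
    u, v = p - q, r - q
    return acos(clamp(dot(u, v) / (norm(u) * norm(v)), -1, 1))
end

function dihedral(p, q, r, s)
    b1, b2, b3 = q - p, r - q, s - r
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return atan(norm(b2) * dot(b1, n2), dot(n1, n2))
end

# Made-up ascendent positions (not taken from 2a3d).
l = [0.0, 1.0, 0.0]   # parent.parent.parent
k = [0.0, 0.0, 0.0]   # parent.parent
j = [1.5, 0.0, 0.0]   # parent

x = place(j, k, l, 1.493, 1.936, -1.142)

distance(x, j)        # ≈ 1.493
angle3(x, j, k)       # ≈ 1.936
dihedral(l, k, j, x)  # ≈ -1.142
```

The round trip shows why three ascendents are needed: the distance fixes the atom on a sphere around its parent, the angle narrows that to a circle, and the dihedral picks a single point on the circle.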

In [41]:
pose.state.items[4]

AtomState{Float64}:
 Index: 1
 T: [55.881, 8.038, 16.840]
 b: 58.914 Å | θ:  2.819 rad ( 161.53°) | ϕ:  1.125 rad (  64.48°) | Δϕ:  0.000 rad (   0.00°)
 Changed: false


Following the template set by the Graph, the State also has a list of `:items`. A shorter (and recommended) syntax for indexing the AtomStates is as follows:

In [42]:
pose.state[1]

AtomState{Float64}:
 Index: 1
 T: [55.881, 8.038, 16.840]
 b: 58.914 Å | θ:  2.819 rad ( 161.53°) | ϕ:  1.125 rad (  64.48°) | Δϕ:  0.000 rad (   0.00°)
 Changed: false


Note that, when indexing via the `:items` list, there are 3 pseudo atoms at positions 1, 2 and 3, which are skipped when using the shorter syntax. These are called the Root atoms. In order to place the first Atom using internal coordinates, that atom must have a `parent`, a `parent.parent` and a `parent.parent.parent`. For this reason, all Pose structures have a root, which can be queried using the `ProtoSyn.root` method.

In [43]:
r = ProtoSyn.root(pose.graph)

Atom{/ROOT:7385/OO:0}

In [44]:
r.parent

Atom{/ROOT:7385/OX:-1}

In [45]:
r.parent.parent

Atom{/ROOT:7385/OY:-2}
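
The offset between the `:items` list and the shorter syntax can be pictured with a tiny stand-alone sketch. `ToyStateList` and `N_ROOT` are hypothetical names, and the order in which the three root entries are stored is an assumption made for illustration only.

```julia
# Hypothetical toy list, NOT ProtoSyn's State type.
struct ToyStateList
    items::Vector{String}   # the first 3 entries stand for the root pseudo atoms
end

const N_ROOT = 3

# Short syntax: skip the root entries so that index 1 is the first real atom.
Base.getindex(s::ToyStateList, i::Int) = s.items[i + N_ROOT]

s = ToyStateList(["OY", "OX", "OO", "N", "CA", "C"])

s.items[4]   # "N"
s[1]         # "N"
```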

A third option for indexing an AtomState structure is to query with an actual Atom instance. This can be useful to make sure we get the right atom.

In [46]:
atom = pose.graph[1, 1, "CA"]
pose.state[atom]

AtomState{Float64}:
 Index: 2
 T: [56.203, 9.492, 16.942]
 b: 1.493 Å | θ:  1.936 rad ( 110.92°) | ϕ: -1.142 rad ( -65.45°) | Δϕ:  0.000 rad (   0.00°)
 Changed: false


Besides the list of AtomState instances, a State also contains a StateMatrix. This is simply a matrix representation of all cartesian coordinates, which is useful in certain matrix operations, such as rotating a large block of Atom instances around a virtual axis. Note that updating a StateMatrix column also updates the corresponding AtomState, and vice versa.

In [47]:
pose.state.x

StateMatrix{Float64}:
 Parent set: true
3×1140 Matrix{Float64}:
 55.881  56.203  56.556  56.095  …   51.228   49.869   50.721   49.18
  8.038   9.492  10.053   9.567      -2.694   -1.716   -5.225   -5.708
 16.84   16.942  15.56   14.547     -15.449  -16.001  -15.756  -15.232
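
One simple way to get this kind of two-way coupling in plain Julia is to let each per-atom record hold a view of its column in a shared matrix. The sketch below uses a hypothetical `ToyAtomState` type; it only illustrates the idea, and ProtoSyn may keep its AtomState and StateMatrix in sync through a different mechanism.

```julia
# Hypothetical toy record, NOT ProtoSyn's AtomState: it stores a view, not a copy.
struct ToyAtomState{V <: AbstractVector{Float64}}
    t::V
end

# One 3×N matrix holds every cartesian coordinate.
x = zeros(3, 4)

# Each atom state wraps a view of its own column.
states = [ToyAtomState(view(x, :, i)) for i in 1:size(x, 2)]

# Writing through the atom state changes the matrix column...
states[2].t .= [56.203, 9.492, 16.942]
x[:, 2]          # [56.203, 9.492, 16.942]

# ...and writing through the matrix is seen by the atom state.
x[1, 3] = 1.0
states[3].t[1]   # 1.0
```

Because both sides point at the same memory, whole-matrix operations (for example, multiplying `x` by a rotation matrix in place) update every atom state at once without any copying.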

Finally, the last two points of note are the energy `:e` field and the forces `:f` field. These are included in the State structure and can be set by calculating the energy and forces using an EnergyFunction, which will be explored in another example.

In [48]:
pose.state.e

Dict{Symbol, Float64} with 1 entry:
  :Total => Inf

In [49]:
pose.state.f

3×1140 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0

An AtomState contains one last piece of important information, the `:Δϕ` field. This is a dihedral angle change applied to all `:children` Atom instances, which is useful when applying dihedral rotations. Instead of changing the `:ϕ` field of every atom affected by the rotation, changing the `:Δϕ` field of the common `:parent` Atom automatically rotates all `:children` Atom instances.

In [50]:
pose.state[pose.graph[1, 10, "CA"]].Δϕ += 0.10
pose.state[pose.graph[1, 10, "CA"]]

AtomState{Float64}:
 Index: 147
 T: [56.124, 7.219, 1.744]
 b: 1.492 Å | θ:  2.095 rad ( 120.01°) | ϕ: -3.141 rad (-179.98°) | Δϕ:  0.100 rad (   5.73°)
 Changed: true
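
The bookkeeping idea behind `:Δϕ` can be sketched in a few lines. The sketch assumes that a child's effective dihedral is its own `ϕ` plus its parent's `Δϕ`; this rule, and the names `ToyDihedral` and `effective`, are assumptions made for illustration and are not taken from ProtoSyn's source. The angle values are made up.

```julia
# Hypothetical toy record, NOT ProtoSyn's AtomState.
mutable struct ToyDihedral
    ϕ::Float64    # the atom's own dihedral angle
    Δϕ::Float64   # extra rotation applied to this atom's children
end

# Assumed rule: effective dihedral = child's own ϕ + parent's Δϕ.
effective(child::ToyDihedral, parent::ToyDihedral) = child.ϕ + parent.Δϕ

ca = ToyDihedral(-3.141, 0.0)
children = [ToyDihedral(1.0, 0.0), ToyDihedral(-1.1, 0.0), ToyDihedral(3.0, 0.0)]

before = [effective(c, ca) for c in children]
ca.Δϕ += 0.10                                   # a single write on the parent
after = [effective(c, ca) for c in children]

after .- before               # every child moved by ≈ 0.10
[c.ϕ for c in children]       # the children's own ϕ values are untouched
```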


The internal and cartesian coordinates may often become "out of sync" with each other: it can sometimes be useful to apply multiple internal coordinate changes (such as dihedral rotations) before calculating the actual cartesian coordinates. Therefore, a State contains two flags: `:i2c` (internal to cartesian) and `:c2i` (cartesian to internal), which can be set using the `ProtoSyn.request_i2c!` and `ProtoSyn.request_c2i!` methods, respectively.

In [51]:
ProtoSyn.request_i2c!(pose.state)

State{Float64}:
 Size: 1140
 i2c: true | c2i: false
 Energy: Dict(:Total => Inf)


Note that a Pose cannot be synced with both the `:i2c` and `:c2i` flags set to `true` simultaneously. Once a flag is set, the Pose can be synced using the `ProtoSyn.sync!` method.

In [52]:
ProtoSyn.sync!(pose)

Pose{Topology}(Topology{/2a3d:4745}, State{Float64}:
 Size: 1140
 i2c: false | c2i: false
 Energy: Dict(:Total => Inf)
)
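
The request-then-sync pattern is a form of lazy evaluation with dirty flags, and can be sketched independently of ProtoSyn. In the toy version below, `ToyState` and its `syncs` counter are hypothetical, the function names merely mirror the ProtoSyn methods used above (they are defined locally, not imported), and raising an error when both flags are set is a choice made for this sketch.

```julia
# Hypothetical toy state, NOT ProtoSyn's State type.
mutable struct ToyState
    i2c::Bool
    c2i::Bool
    syncs::Int   # counts how many conversions actually ran
end

ToyState() = ToyState(false, false, 0)

# Requests only raise a flag; no coordinates are recomputed yet.
request_i2c!(s::ToyState) = (s.i2c = true; s)
request_c2i!(s::ToyState) = (s.c2i = true; s)

function sync!(s::ToyState)
    s.i2c && s.c2i && error("both i2c and c2i are set: cannot tell which coordinates to trust")
    if s.i2c || s.c2i
        s.syncs += 1            # the expensive conversion would run here
        s.i2c = s.c2i = false   # coordinates are consistent again
    end
    return s
end

s = ToyState()
request_i2c!(s)
request_i2c!(s)   # several changes still amount to one pending request
sync!(s)
sync!(s)          # nothing pending, so nothing is recomputed
s.syncs           # 1
```

Deferring the conversion means that any number of internal coordinate edits cost a single recomputation, which matters when the conversion touches every atom downstream of the change.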

## Export a Pose as a PDB file

In order to visualize the rotation we just introduced in this structure, we can export the current synced Pose using the `ProtoSyn.write` method.

In [53]:
ProtoSyn.write(pose, "output/2a3d_mod.pdb")

## Conclusion

In this first example, we took a look under the hood of the main ProtoSyn data structure: the Pose. This is subdivided into the Graph and the State, which work together to correctly define a molecular structure and enable common, useful tasks in molecular simulations.