# Spheronizator Documentation

## Getting Started
Using the code should be straightforward as I've designed the interface to work nicely with the IPython interpreter, either standalone or in Jupyter.

### Installation
There are several ways to install and use this package. Because the package is still being published and details may change, please refer to the installation documentation elsewhere in the repository.

A sensible way to use this package is to create a conda environment for it, then install the package to that conda environment with a package manager of your choice.


## Importing
First create your Jupyter notebook and import the package.
```
import spheronizator as sp
```
This allows you to create instances of the voxelBuilder class, which is the heart of the package.

As an example, create a new voxelBuilder object called *x*.
```
x=sp.voxelBuilder()
```

In [2]:
import spheronizator as sp
x=sp.voxelBuilder()

Configation file was unable to be parsed, applying defaults.


## Configuration

### Config File

When you create a new instance of the voxelBuilder class, it accepts one optional argument, *config*, which is the path to a configuration file. A configuration file holding the default settings is provided in the directory; pass *config* to use a different file.

```
x=sp.voxelBuilder('~/path/to/config/testing.config')
```
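
The fallback behaviour seen earlier (defaults are applied when the configuration file cannot be parsed) could look roughly like the sketch below. This is only an illustration under stated assumptions: the INI-style layout, the `voxelBuilder` section name, the `load_settings` helper, and the default values are all hypothetical and are not taken from the package.

```python
import configparser
import os

# Hypothetical defaults; the real default values live in the package.
DEFAULTS = {"boxSize": 20.0, "voxelSpacing": 1.0, "useFloatVoxels": True}


def load_settings(path=None, section="voxelBuilder"):
    """Return settings from an INI-style file, or DEFAULTS if it cannot be read."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve camelCase option names
    try:
        # parser.read() returns the list of files it managed to read.
        if path is None or not parser.read(os.path.expanduser(path)):
            raise OSError("no readable configuration file")
        values = parser[section]  # KeyError if the section is missing
        return {
            "boxSize": values.getfloat("boxSize", fallback=DEFAULTS["boxSize"]),
            "voxelSpacing": values.getfloat(
                "voxelSpacing", fallback=DEFAULTS["voxelSpacing"]
            ),
            "useFloatVoxels": values.getboolean(
                "useFloatVoxels", fallback=DEFAULTS["useFloatVoxels"]
            ),
        }
    except (OSError, KeyError, ValueError, configparser.Error):
        print("Configuration file could not be parsed, applying defaults.")
        return dict(DEFAULTS)
```

Returning a copy of the defaults on any parse failure keeps object creation from failing just because a configuration file is missing or malformed.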

### Config Attributes

Optionally, you can update the settings by changing the object's attributes on the fly. It's not necessary to edit or load a different configuration file every time.

```
x.boxSize=20
x.voxelSpacing=1
x.useFloatVoxels=True
```

In [3]:
x.boxSize

20

In [4]:
x.boxSize=30
x.boxSize

30

## Loading Protein Data

To load protein data, you will need to parse both a PDB and corresponding mol2 file. Parsing of both of these file types is handled separately by the mol2parser class which can be found in *mol2parser.py*.

To simplify the interface of this project, a wrapper for the mol2parser class is included as a method for the voxelBuilder class. This method prevents you from needing to interact with the mol2parser class at all; however, the mol2parser class can be used as a standalone parser for other projects if needed.

### Using the wrapper

Call the parse method with one argument specifying the path to a PDB file:

```
x.parse('testing_set/1YU6_C.pdb')
```

By default, the parser looks in the same directory for the corresponding mol2 file. The mol2 file must have the same name as the PDB file, plus the extension *.mol2* (for example, *1YU6_C.pdb.mol2*). If your files follow this naming scheme, you only need to specify the PDB file.

If needed, a second argument specifies the corresponding mol2 file:

```
x.parse('testing_set/1YU6_C.pdb', 'testing_set/1YU6_C.pdb.mol2')
```
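
The default naming rule described above can be sketched as follows. The helper names `default_mol2_path` and `resolve_inputs` are hypothetical and only illustrate the rule; they are not part of the package.

```python
from pathlib import Path


def default_mol2_path(pdb_path):
    """Return the mol2 path implied by the naming scheme: '<pdb file name>.mol2'."""
    pdb_path = Path(pdb_path)
    # Append '.mol2' to the full file name, keeping the '.pdb' part.
    return pdb_path.with_name(pdb_path.name + ".mol2")


def resolve_inputs(pdb_path, mol2_path=None):
    """Use the explicit mol2 path if given, otherwise fall back to the default."""
    if mol2_path is None:
        mol2_path = default_mol2_path(pdb_path)
    return Path(pdb_path), Path(mol2_path)


pdb, mol2 = resolve_inputs("testing_set/1YU6_C.pdb")
print(mol2.name)  # 1YU6_C.pdb.mol2
```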

### Parsing Details
The parser will parse the specified PDB file and from it extract a list of atom objects. The parser then parses the corresponding mol2 file and updates the atom objects with additional data. The residues and atom objects are then all stored as attributes of the voxelBuilder class.

```
x.structure
x.residues
x.atoms
x.resnames
```

In [5]:
x.parse('../testing_set/1YU6_C.pdb')

In [6]:
x.structure

<Structure id=../testing_set/1YU6_C.pdb>

In [7]:
x.residues[0]

<Residue VAL het=  resseq=6 icode= >

In [8]:
x.atoms[0]

<Atom N>

In [9]:
x.atoms[0].get_vector()

<Vector 17.03, 9.60, 37.84>

In [10]:
x.resnames[0]

'VAL'

### Atom Objects
Biopython atom objects are the core data type for this project. You can learn more about them in the Biopython documentation. Each Biopython atom object is extended with attributes holding data extracted from the associated mol2 file. A list of these attributes follows.

```
atom.bondData
atom.isAA
atom.detailedAtomType
atom.atomType
atom.residueIndex
atom.mol2atomIndex
```

In [11]:
x.atoms[0].bondData

[[1, '1']]

In [12]:
x.atoms[0].isAA

True

In [13]:
x.atoms[0].detailedAtomType

'N.3'

In [14]:
x.atoms[0].atomType

'N'

In [15]:
x.atoms[0].residueIndex

0

In [16]:
x.atoms[0].mol2atomIndex

0

## Building Boxes
Building the boxes is as simple as invoking the voxelBuilder.buildData method. This presupposes that the data has already been parsed. The method performs the following steps, in order:
1. Generates the voxels needed for the boxes, based on the configuration parameters boxSize and voxelSpacing. The voxels are not unique to each box, so the same voxel array is used for all residues in the protein. The generated voxels can be found under the attribute *voxelBuilder.voxels*.
2. Initializes numpy arrays of zeros for our output data with the appropriate shape. Currently two arrays are created:
    - *voxelBuilder.output* for atom presence / absence
    - *voxelBuilder.outputBonds* for count of bonds within each box
3. Iterates through each residue computing a box for each and updating the output arrays.
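
Step 1 above can be sketched with NumPy as follows. This is a minimal illustration, not the package's implementation: the function name `make_voxel_grid` and its parameters are hypothetical, and it assumes the grid is centered on the origin with points running from `-boxSize/2` to `+boxSize/2` inclusive, which is consistent with the `x.voxels` values shown in this section.

```python
import numpy as np


def make_voxel_grid(box_size=30, spacing=1):
    """Hypothetical helper: regular grid of voxel centers around the origin."""
    half = box_size / 2
    # Points per axis, counting both ends: 30 / 1 + 1 = 31.
    n = int(round(box_size / spacing)) + 1
    axis = np.linspace(-half, half, n)
    # indexing="ij" makes grid[i, j, k] == (axis[i], axis[j], axis[k]).
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1)


grid = make_voxel_grid(30, 1)
print(grid.shape)     # (31, 31, 31, 3)
print(grid[0][0][0])  # [-15. -15. -15.]
```

Because the grid depends only on the box size and spacing, it can be built once and reused for every residue.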

In [17]:
x.buildData()



In [18]:
x.voxels[0][0]

array([[-15., -15., -15.],
       [-15., -15., -14.],
       [-15., -15., -13.],
       [-15., -15., -12.],
       [-15., -15., -11.],
       [-15., -15., -10.],
       [-15., -15.,  -9.],
       [-15., -15.,  -8.],
       [-15., -15.,  -7.],
       [-15., -15.,  -6.],
       [-15., -15.,  -5.],
       [-15., -15.,  -4.],
       [-15., -15.,  -3.],
       [-15., -15.,  -2.],
       [-15., -15.,  -1.],
       [-15., -15.,   0.],
       [-15., -15.,   1.],
       [-15., -15.,   2.],
       [-15., -15.,   3.],
       [-15., -15.,   4.],
       [-15., -15.,   5.],
       [-15., -15.,   6.],
       [-15., -15.,   7.],
       [-15., -15.,   8.],
       [-15., -15.,   9.],
       [-15., -15.,  10.],
       [-15., -15.,  11.],
       [-15., -15.,  12.],
       [-15., -15.,  13.],
       [-15., -15.,  14.],
       [-15., -15.,  15.]])

*For boxes with a size of 30 and a voxel spacing of 1, you can see the grid scanning through the z dimension from -15 to 15, including the origin.*

In [19]:
x.output.shape

(51, 31, 31, 31, 6, 2)

In [20]:
x.outputBonds.shape

(51, 5)

For fun, let's use the private methods to find where our last atom is located in the output array.

In [21]:
x.atoms[-1]

<Atom OXT>

In [22]:
x.atoms[-1].get_vector()

<Vector 13.72, 24.93, 28.44>

In [23]:
x.atoms[-1].residueIndex

50

We see that the last atom, OXT, belongs to residue number 50. Let's assume it's within the box for residue 50.

In [24]:
foundAtomIndices, projectedCoords=x._build_box(x.residues[50])

In [25]:
len(x.atoms)

387

In [26]:
if 386 in foundAtomIndices:
    print("True")

True


Turns out it is within that box, a good guess.

In [27]:
len(foundAtomIndices)

247

In [28]:
projectedCoords[-1]

array([-1.66864626,  1.59396444, -0.45076026])

So it looks like this OXT atom is pretty close to the center of the box. Let's find the voxel closest to it, along with that voxel's index, and see if it all checks out.

In [29]:
from spheronizator import functions as box
voxelIndex, voxelCoords=box.get_closestVoxel(projectedCoords[-1], x.voxels)

In [30]:
voxelIndex

(13, 17, 15)

In [31]:
voxelCoords

array([-2.,  2.,  0.])
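
A nearest-voxel lookup like the one used above can be approximated with a brute-force distance search. The function below is a hypothetical stand-in, not the package's `get_closestVoxel`; it assumes the voxel array has shape `(n, n, n, 3)`, matching the `x.voxels` output shown earlier.

```python
import numpy as np


def closest_voxel(coords, voxels):
    """Hypothetical stand-in for a nearest-voxel lookup (brute-force search)."""
    # Euclidean distance from the point to every voxel center.
    dists = np.linalg.norm(voxels - np.asarray(coords, dtype=float), axis=-1)
    # Convert the flat position of the minimum into an (i, j, k) index.
    index = np.unravel_index(np.argmin(dists), dists.shape)
    return tuple(int(i) for i in index), voxels[index]


# Assumed grid: 31 points per axis from -15 to 15.
axis = np.linspace(-15, 15, 31)
voxels = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

index, center = closest_voxel([-1.66864626, 1.59396444, -0.45076026], voxels)
print(index)   # (13, 17, 15)
print(center)  # [-2.  2.  0.]
```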

In [32]:
x.atoms[-1].atomType

'O'

We should see this oxygen in our output array at this index position for residue 50. Let's check!

In [33]:
x.output[50][13][17][15]

array([[False, False],
       [False, False],
       [False, False],
       [ True,  True],
       [False, False],
       [False, False]])

Indeed, oxygens are in our 4th channel (index 3), and because this oxygen belongs to the box of its parent residue, both entries in that row are `True`.

In [34]:
x.outputBonds[50]

array([142,  34,   0,  35,  33])