In [None]:
from ptmpsi.alphafold import prediction
from ptmpsi.protein import Protein
from ptmpsi.nwchem import get_qm_data
from ptmpsi.polymers import gen_pdb

# Structure Prediction with AlphaFold

## Pass a FASTA sequence to the `prediction` function

In [None]:
prediction("GGGGGGGGGGG", multimer=False)

Three files will be generated in the current working directory: `alphafold.sbatch`, `run_singularity.py`, and `temp1.fasta`.
Paths to the required databases default to their locations on the Tahoma cluster.

The `prediction` function also accepts a list of sequences, in which case a separate prediction is run sequentially for each sequence.

To predict a **multimer** instead, pass `multimer=True` to `prediction` together with the path to
a file containing two or more sequences in the following format:

```
>sequence_1
GGGGGGGG

>sequence_2
AAAAAAAA
```
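If you prefer to generate the multimer input programmatically, the sketch below writes and reads back FASTA text in the layout shown above. `format_fasta`, `parse_fasta`, and the file name `multimer.fasta` are hypothetical choices made for illustration. They are not part of PTM-Psi, and the code assumes only that `prediction` accepts a standard multi-record FASTA file.

```python
def format_fasta(records):
    """Format (name, sequence) pairs in the multi-record layout shown above."""
    blocks = [f">{name}\n{seq}\n" for name, seq in records]
    return "\n".join(blocks)  # blank line between consecutive records


def parse_fasta(text):
    """Parse multi-record FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator lines are optional
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    chains = [("sequence_1", "GGGGGGGG"), ("sequence_2", "AAAAAAAA")]
    with open("multimer.fasta", "w") as handle:
        handle.write(format_fasta(chains))
    # A multimer input should contain at least two chains
    assert len(parse_fasta(format_fasta(chains))) >= 2
```

The resulting path would then be handed to `prediction` with `multimer=True`.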

# Loading a predicted 3D structure

The `Protein` class handles all protein/polymer operations. We will use this class to load or build the 3D structure 
of a polymer. 

## Read a structure from a local PDB file
Create a `Protein` object using the `filename` option to pass the path to a local PDB file.

In [None]:
local_pdb = Protein(filename="/path/to/local.pdb")

## Fetch a structure from the Protein Data Bank
This requires internet access. Create a `Protein` object using the `pdbid` option to pass a Protein Data Bank ID code.

In [None]:
fetched_pdb = Protein(pdbid="6tht")

## Fetch a structure from the AlphaFold Database
This requires internet access. Create a `Protein` object using the `uniprotid` option to pass a UniProt accession code.

In [None]:
uniprot_pdb = Protein(uniprotid="G9BY57")
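Because both fetch paths depend on a network request, a cheap format check before calling `Protein` can catch typos such as passing a PDB ID as `uniprotid`. The sketch below is not something PTM-Psi is known to do. `is_pdb_id` and `is_uniprot_accession` are hypothetical helpers. The PDB pattern covers only the classic four-character IDs, and the UniProt pattern is the accession format published in the UniProt documentation.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")

# UniProtKB accession format (6 or 10 characters), as documented by UniProt.
UNIPROT_ACC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_pdb_id(code):
    """True if code looks like a classic PDB ID (case-insensitive)."""
    return PDB_ID.fullmatch(code.strip()) is not None


def is_uniprot_accession(code):
    """True if code matches the UniProtKB accession pattern."""
    return UNIPROT_ACC.fullmatch(code.strip().upper()) is not None
```

For example, `is_pdb_id("6tht")` and `is_uniprot_accession("G9BY57")` both return `True`, while swapping the two arguments returns `False`.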

## Build a chain from scratch
Create an empty `Protein` object and use its `append` and `prepend` methods to add residues to a given polymer *chain*. The backbone torsions `phi` and `psi` can be set when calling `prepend` and `append`, respectively. The default values produce torsion angles compatible with an alpha-helical conformation.

In [None]:
chain = Protein()

# Build a trimer of epsilon-aminocaproic acid (EAC)
chain.prepend(chain="A", residue="EAC")
chain.append(chain="A", residue="EAC", psi=120.0)
chain.append(chain="A", residue="EAC", psi=120.0)

# Cap the trimer to have neutral ends
chain.prepend(chain="A", residue="ACE", phi=-120.0)
chain.append(chain="A", residue="NME", psi=120.0)
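To check that the `phi`/`psi` values requested above ended up in the built structure, one can measure torsions directly from atom coordinates. The function below is a generic signed dihedral (IUPAC sign convention) computed from four points. It takes plain `(x, y, z)` tuples because this notebook does not show how coordinates are read from a `Protein` object. For standard alpha-amino acids, phi is C(i-1)-N-CA-C and psi is N-CA-C-N(i+1). How PTM-Psi defines these torsions for a non-alpha residue such as EAC is not stated here and is left as an open assumption.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    # atan2 form avoids the precision loss of acos near 0 and 180 degrees
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

A cis arrangement gives 0, trans gives 180, and the sign distinguishes clockwise from counterclockwise rotation when looking along the p1 to p2 bond.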

## Generate a Nylon-6 surface
Use the `gen_pdb` function to write PDB files of a finite-cluster approximation of a Nylon-6 surface, specifying the number of unit cells to include along the `a`, `b`, and `c` directions. Two PDB files are generated: one in which all strands share the same *MODEL* and another in which each strand is its own *MODEL*. Both files contain the same structure, but some visualization packages may not handle many strands in a single *MODEL* correctly.

In [None]:
gen_pdb(rangea=3, rangeb=3, rangec=3)
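The difference between the two output files can be illustrated with a small sketch that regroups ATOM/HETATM records so that each strand is wrapped in its own `MODEL ... ENDMDL` block. `one_model_per_chain` is a hypothetical helper, not part of PTM-Psi. By default it assumes that each strand carries a distinct chain ID (column 22). That assumption is unverified for `gen_pdb` output, which is why the grouping `key` can be replaced.

```python
def _chain_id(line):
    """Chain identifier from PDB column 22 (blank if the line is short)."""
    return line[21] if len(line) > 21 else " "


def one_model_per_chain(pdb_text, key=_chain_id):
    """Regroup ATOM/HETATM records so that each strand gets its own MODEL block."""
    strands = {}  # strand key -> records, in order of first appearance
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            strands.setdefault(key(line), []).append(line)
    out = []
    for serial, records in enumerate(strands.values(), start=1):
        out.append(f"MODEL     {serial:>4}")  # serial in columns 11-14
        out.extend(records)
        out.append("ENDMDL")
    out.append("END")
    return "\n".join(out) + "\n"
```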

# Introducing point mutations

The `mutate` function of the `Protein` class can be used to introduce point mutations in a protein. For example, the active form of the ICCG variant of cutinase can be obtained by the following commands.

In [None]:
cutinase = Protein(pdbid="6tht")
cutinase.mutate("A:ALA130","SER")

The residue to be mutated is specified with the nomenclature "\[chain\]:\[name\]\[number\]", where \[chain\] is the one-letter chain identifier, \[name\] is the three-letter amino-acid name, and \[number\] is the residue number. The \[name\] can be omitted (e.g. "A:130").
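The nomenclature can be made concrete with a small parser. `parse_residue_spec` is a hypothetical helper that illustrates the format; it is not PTM-Psi's own parsing code. It assumes a single alphanumeric chain character, an optional three-letter name, and an integer residue number. Insertion codes are not handled.

```python
import re

# [chain]:[name][number], where the three-letter [name] is optional
_RESIDUE_SPEC = re.compile(r"([A-Za-z0-9]):([A-Za-z]{3})?(-?\d+)")


def parse_residue_spec(spec):
    """Split e.g. 'A:ALA130' into ('A', 'ALA', 130); name is None when omitted."""
    match = _RESIDUE_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not a valid residue specifier: {spec!r}")
    chain, name, number = match.groups()
    return chain, (name.upper() if name else None), int(number)
```

For instance, `parse_residue_spec("A:ALA130")` returns `("A", "ALA", 130)` and `parse_residue_spec("A:130")` returns `("A", None, 130)`.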

# Obtaining AMBER99 parameters for a residue

The `get_qm_data` function produces a series of `NWChem` input files that generate `RESP` charges and bonded force constants following the `AMBER99` force-field recipe. The output is currently written to standard output (`STDOUT`), and the force constants are printed in a format that can be copied directly into a `GROMACS` force-field definition.

By default, `get_qm_data` generates two conformations of the new residue, capped by `ACE` and `NME` groups. The `RESP` fit is performed on both conformers simultaneously, so only one set of charges is computed. Bond and angle force constants, however, are generated **separately** for each conformer, and the user must average these values to obtain a single constant for each bond or angle.
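The averaging step can be scripted once the per-conformer constants have been copied out of `STDOUT`. In the minimal sketch below, each conformer is represented as a dictionary that maps a bond or angle (a tuple of atom names) to its force constant. That representation, the helper names, and the numbers in the example are hypothetical. A term and its reverse (A-B vs. B-A, or A-B-C vs. C-B-A) are treated as the same term, and a term missing from either conformer is reported rather than silently averaged.

```python
from statistics import mean


def canonical(term):
    """Treat A-B and B-A (or A-B-C and C-B-A) as the same bonded term."""
    return min(tuple(term), tuple(reversed(term)))


def average_constants(*conformers):
    """Average each bond/angle force constant over all conformers."""
    per_term = {}
    for constants in conformers:
        for term, k in constants.items():
            per_term.setdefault(canonical(term), []).append(k)
    expected = len(conformers)
    missing = sorted(t for t, ks in per_term.items() if len(ks) != expected)
    if missing:
        raise ValueError(f"terms not present exactly once per conformer: {missing}")
    return {term: mean(ks) for term, ks in per_term.items()}


# Illustrative, made-up values for two conformers
conf1 = {("N", "CA"): 300000.0, ("C", "N", "CA"): 500.0}
conf2 = {("CA", "N"): 310000.0, ("CA", "N", "C"): 520.0}
averaged = average_constants(conf1, conf2)
```

With these inputs, `averaged` holds 305000.0 for the N-CA bond and 510.0 for the C-N-CA angle, keyed by their canonical orientation.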

When needed, torsion potentials are fitted for both conformations at the same time.
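The sketch below shows what a joint torsion fit over both conformers can look like. It assumes the Fourier-series dihedral form used by AMBER-style force fields, which is also the form of GROMACS proper dihedrals (type 9). The multiplicities $n$, phases $\gamma_n$, per-conformer energy offsets $C_c$, and the least-squares objective itself are assumptions made for illustration and are not taken from PTM-Psi:

$$
V(\phi) = \sum_{n=1}^{N} k_n \left[ 1 + \cos\left( n\phi - \gamma_n \right) \right]
$$

$$
\min_{k_1,\dots,k_N,\;C_1,C_2} \; \sum_{c=1}^{2} \sum_{j} \left[ E^{\mathrm{QM}}_{c}(\phi_{c,j}) - E^{\mathrm{MM},0}_{c}(\phi_{c,j}) - V(\phi_{c,j}) - C_c \right]^2
$$

Here $E^{\mathrm{MM},0}_c$ is the MM energy of conformer $c$ with the fitted torsion term removed, and $j$ runs over the scan points of that conformer. Because both conformers enter one sum, the fit yields a single set of $k_n$ rather than one set per conformer. With the $\gamma_n$ fixed, the $k_n$ and $C_c$ appear linearly, so this is an ordinary linear least-squares problem.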

In [None]:
eac = Protein()
eac.prepend(chain="A", residue="EAC")
get_qm_data(eac)

Some NWChem input options can be changed via keyword arguments. These options, along with their default values, are:

```
mult    = kwargs.get("mult",1)
charge  = kwargs.get("charge",0)
memory  = kwargs.get("memory",2000)
aobasis = kwargs.get("aobasis","def2-tzvp")
tdbasis = kwargs.get("tdbasis","def2-svp")
cdbasis = kwargs.get("cdbasis","def2-universal-jfit")
xcfun   = kwargs.get("xcfun","r2scan")
grid    = kwargs.get("grid","lebedev 120 14")
tdgrid  = kwargs.get("tdgrid","lebedev 100 14")
nscf    = kwargs.get("nscf",100)
nopt    = kwargs.get("nopt",60)
disp    = kwargs.get("disp","disp vdw 4")
delta   = kwargs.get("delta",0.0189)
lshift  = kwargs.get("lshift",0.1)
```
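A lookup such as `kwargs.get` does not by itself flag a misspelled keyword (for example `charges=-1` instead of `charge=-1`); the default would simply be used. The wrapper below copies the defaults listed above and rejects unknown keys before anything is passed on. `NWCHEM_DEFAULTS` and `resolve_options` are hypothetical names, and the sketch covers only the options listed here, not any other keyword arguments `get_qm_data` may accept.

```python
# Defaults copied from the list above; keys mirror get_qm_data's keyword names.
NWCHEM_DEFAULTS = {
    "mult": 1,
    "charge": 0,
    "memory": 2000,
    "aobasis": "def2-tzvp",
    "tdbasis": "def2-svp",
    "cdbasis": "def2-universal-jfit",
    "xcfun": "r2scan",
    "grid": "lebedev 120 14",
    "tdgrid": "lebedev 100 14",
    "nscf": 100,
    "nopt": 60,
    "disp": "disp vdw 4",
    "delta": 0.0189,
    "lshift": 0.1,
}


def resolve_options(**overrides):
    """Merge user overrides into the defaults, rejecting unknown option names."""
    unknown = set(overrides) - set(NWCHEM_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown NWChem option(s): {sorted(unknown)}")
    return {**NWCHEM_DEFAULTS, **overrides}
```

For a negatively charged residue, the checked options could then be forwarded as `get_qm_data(eac, **resolve_options(charge=-1))`.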

# Post-translational modifications

Post-translational modifications can be introduced with the `modify` method of the `Protein` class. The following CYS post-translational modifications are available:
 - nitrosylation
 - sulfenylation
 - sulfynilation
 - sulfonylation
 - sulfhydration
 - glutathionylation
 - cysteinylation
 - methylation
 - carbamoylation
 - cyanylation

The following post-translational modifications are also available for a subset of amino acids:
 - acetylation:
     - LYS, LYN
 - methylation:
     - GLU, GLH
     - LYS, LYN
     - ARG
     - HIS, HIP, HID, HIE
 - dimethylation:
     - LYS, LYN, ARG
 - trimethylation:
     - LYS, LYN
 - symmetric dimethylation:
     - ARG
 - asymmetric dimethylation:
     - ARG        
 - phosphorylation:
     - SER
     - THR
     - TYR
     - ARG
     - HIS, HIP, HID, HIE
     - LYS, LYN
     - ASP, ASH
     - CYS, CYM
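The two lists above can be transcribed into a lookup table to check which modifications are documented for a given residue before calling `modify`. This is a hypothetical sketch: the table reflects only this notebook's lists, which may lag behind the code, and it assumes that the modification names passed to `modify` are the same strings as the list labels. For that reason, the CYS entries keep the exact spelling used in the list.

```python
# CYS-only modifications, spelled exactly as listed above
CYS_MODIFICATIONS = [
    "nitrosylation", "sulfenylation", "sulfynilation", "sulfonylation",
    "sulfhydration", "glutathionylation", "cysteinylation", "methylation",
    "carbamoylation", "cyanylation",
]

# Modifications available for other residues, transcribed from the second list
OTHER_MODIFICATIONS = {
    "acetylation": {"LYS", "LYN"},
    "methylation": {"GLU", "GLH", "LYS", "LYN", "ARG", "HIS", "HIP", "HID", "HIE"},
    "dimethylation": {"LYS", "LYN", "ARG"},
    "trimethylation": {"LYS", "LYN"},
    "symmetric dimethylation": {"ARG"},
    "asymmetric dimethylation": {"ARG"},
    "phosphorylation": {"SER", "THR", "TYR", "ARG", "HIS", "HIP", "HID", "HIE",
                        "LYS", "LYN", "ASP", "ASH", "CYS", "CYM"},
}


def _build_support_table():
    """Merge both lists into {modification: set of residue names}."""
    table = {name: set(residues) for name, residues in OTHER_MODIFICATIONS.items()}
    for name in CYS_MODIFICATIONS:
        table.setdefault(name, set()).add("CYS")
    return table


SUPPORTED = _build_support_table()


def supported_modifications(residue):
    """Sorted list of documented modifications for a three-letter residue name."""
    residue = residue.upper()
    return sorted(name for name, residues in SUPPORTED.items() if residue in residues)


def is_documented(residue, modification):
    """True if the (residue, modification) pair appears in the lists above."""
    return residue.upper() in SUPPORTED.get(modification, set())
```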

In [None]:
snc = Protein()
snc.prepend(chain="A", residue="CYS")
snc.modify("A:CYS1", "nitrosylation")

# Docking

Docking relies on existing installations of AutoDock Vina and AutoDock Tools; PTM-Psi will not attempt to download or install these packages.

In [None]:
cutinase = Protein(pdbid="6tht")
cutinase.mutate("A:ALA130","SER")
cutinase.write_pdb("cutinase.pdb")

nylon6 = Protein()
nylon6.prepend("A", "EAC")
nylon6.prepend("A", "ACE")
nylon6.append("A", "EAC")
nylon6.append("A", "NME")
nylon6.write_xyz("nylon6.xyz")
cutinase.dock(ligand="nylon6.xyz", receptor="cutinase.pdb", boxcenter="A:SER130", boxsize=20.0)
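The `boxcenter` and `boxsize` arguments describe the Vina search box. A plausible reading is a cubic box with a 20 Å edge centered on residue A:SER130. However, this notebook does not state how PTM-Psi derives the center, for example from the residue centroid or from a specific atom. The sketch below assumes the residue centroid and emits the box in AutoDock Vina's configuration-file syntax (`center_x` ... `size_z`). `residue_centroid` and `vina_box_config` are hypothetical helpers, not part of PTM-Psi.

```python
def residue_centroid(pdb_text, chain, resname, resnum):
    """Mean (x, y, z) of all ATOM/HETATM records of one residue in PDB text."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # Fixed PDB columns: chain 22, resName 18-20, resSeq 23-26, x/y/z 31-54
        if (line[21] == chain and line[17:20].strip() == resname
                and int(line[22:26]) == resnum):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"residue {chain}:{resname}{resnum} not found")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def vina_box_config(center, size):
    """Cubic search box in AutoDock Vina config syntax (lengths in angstroms)."""
    cx, cy, cz = center
    return (
        f"center_x = {cx:.3f}\ncenter_y = {cy:.3f}\ncenter_z = {cz:.3f}\n"
        f"size_x = {size:.3f}\nsize_y = {size:.3f}\nsize_z = {size:.3f}\n"
    )
```

For example, `vina_box_config(residue_centroid(open("cutinase.pdb").read(), "A", "SER", 130), 20.0)` would produce the six box lines for the mutated active-site serine.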

# NWChem QM/MM input generation