Notebook for creating initial models of the mt-tRNAs, particularly those without an available 3D structure.

### This notebook is step 1 of the process!

By: Mihir G

#### RNA Sequence

Below are the primary structures of the mt-tRNAs, which are needed to write the base-pair sequence (bpseq) files later on. The Met sequence is listed for reference but is not processed further in this notebook.


In [None]:
seqlys = 'CACUGUAAAGCUAACUUAGCAUUAACCUUUUAAGUUAAAGAUUAAGAGAACCAACACCUCUUUACAGUGA' # lys
seqlysM = 'CACUGUAAAGCUAACUUAGCAUUAACCUUUUAAGUUAAAGAUUAAGAGAGCCAACACCUCUUUACAGUGA' # lys mutated
seqleu = 'GUUAAGAUGGCAGAGCCCGGUAAUCGCAUAAAACUUAAAACUUUACAGUCAGAGGUUCAAUUCCUCUUCUUAACA' # leu(uur)
seqleuM = 'GUUAAGAUGGCAGGGCCCGGUAAUCGCAUAAAACUUAAAACUUUACAGUCAGAGGUUCAAUUCCUCUUCUUAACA' # leu(uur) mutated
seqval = 'CAGAGUGUAGCUUAACACAAAGCACCCAACUUACACUUAGGAGAUUUCAACUUAACUUGACCGCUCUGA' # val
seqmet = 'AGUAAGGUCAGCUAAAUAAGCUAUCGGGCCCAUACCCCGAAAAUGUUGGUUAUACCCUUCCCGUACUA' # met

# numeric code for each nucleotide
lconv = {'A': 1, 'C': 2, 'G': 3, 'U': 4}

seqlyslist = [lconv[i] for i in seqlys]
seqlysMlist = [lconv[i] for i in seqlysM]
seqleulist = [lconv[i] for i in seqleu]
seqleuMlist = [lconv[i] for i in seqleuM]
seqvallist = [lconv[i] for i in seqval]

#### bpseq and ct formats

Each list below gives, for every base in order, the 1-based index of the base it is Watson-Crick paired with; 0 means the base is unpaired. For example, the mt-tRNA_{Lys} list starts [69, 68, ...], meaning the first base, 'C', pairs with the 69th base and the second base pairs with the 68th.
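Because every bpseq line depends on these lists being consistent, a quick sanity check can catch typos before any files are written. The sketch below (the helper `check_pairs` is hypothetical, not part of the original notebook) verifies that a list has one entry per base, that pairings are symmetric (if base i points to j, base j points back to i), and that each pair is complementary. It assumes G-U wobble pairs count as canonical; remove them from `CANONICAL` for a strictly Watson-Crick check.

```python
# Pairs accepted as canonical. Including G-U wobble is an assumption of this sketch.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def check_pairs(seq, bonds):
    """Return a list of problems found in a 1-based pairing list (empty if none).

    seq:   RNA sequence string.
    bonds: bonds[i] is the 1-based index of the partner of base i+1, or 0 if unpaired.
    """
    if len(seq) != len(bonds):
        return [f"length mismatch: {len(seq)} bases vs {len(bonds)} entries"]
    problems = []
    for i, j in enumerate(bonds, start=1):
        if j == 0:
            continue  # unpaired base, nothing to check
        if not 1 <= j <= len(seq):
            problems.append(f"base {i}: partner {j} out of range")
        elif j == i:
            problems.append(f"base {i}: paired with itself")
        elif bonds[j - 1] != i:
            problems.append(f"base {i} -> {j}, but base {j} -> {bonds[j - 1]}")
        elif i < j and (seq[i - 1], seq[j - 1]) not in CANONICAL:
            # only check each pair once (from its 5' side)
            problems.append(f"base {i} ({seq[i - 1]}) - {j} ({seq[j - 1]}) is non-canonical")
    return problems


# Toy hairpin: three G-C pairs closing an AAA loop.
print(check_pairs("GGGAAACCC", [9, 8, 7, 0, 0, 0, 3, 2, 1]))
```

Once the lists are defined, it can be applied to the notebook's data, e.g. `check_pairs(seqlys, lysbond)`; an empty list means no problems were found.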


In [None]:
# These lists contain only canonical base pairs: any proposed non-Watson-Crick interaction is left out (marked 0).
lysbond = [69, 68, 67, 66, 65, 64, 63, 0, 0, 20, 19, 18, 17, 0, 0, 0, 13, 12, 11, 10, 0, 38, 37, 36, 35, 34, 0, 0, 0, 0, 0, 0, 0, 26, 25, 24, 23, 22, 0, 0, 0, 0, 0, 62, 61, 60, 59, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 47, 46, 45, 44, 7, 6, 5, 4, 3, 2, 1, 0]
leubond = [74, 73, 72, 71, 70, 69, 68, 0, 0, 27, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 10, 0, 45, 44, 43, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 31, 30, 29, 0, 0, 0, 0, 0, 67, 66, 65, 64, 63, 0, 0, 0, 0, 0, 0, 0, 55, 54, 53, 52, 51, 7, 6, 5, 4, 3, 2, 1, 0]
valbond = [68, 67, 66, 65, 64, 63, 62, 0, 0, 23, 22, 21, 20, 0, 0, 0, 0, 0, 0, 13, 12, 11, 10, 0, 41, 40, 0, 38, 37, 0, 0, 0, 0, 0, 0, 0, 29, 28, 0, 26, 25, 0, 0, 0, 0, 0, 60, 59, 58, 57, 0, 0, 0, 0, 0, 0, 50, 49, 48, 47, 0, 7, 6, 5, 4, 3, 2, 1, 0]

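# The Leu(UUR) and Lys mutations (1-based positions 14 and 50, respectively) both lie in unpaired, non-stem regions,
# so no Watson-Crick pair is lost and the mutants share the wild-type pairing lists.
leubond_M = list(leubond)  # copy, so editing one list cannot silently change the other
lysbond_M = list(lysbond)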
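The claim that the mutations fall outside the stems can be checked directly. A minimal sketch with hypothetical helper names: `mutated_positions` compares a wild-type and a mutant string and returns the 1-based positions that differ, and `mutations_unpaired` confirms that each of those positions has a 0 in the pairing list.

```python
def mutated_positions(wild_type, mutant):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(wild_type, mutant), start=1) if a != b]


def mutations_unpaired(wild_type, mutant, bonds):
    """True if every mutated position is unpaired (0) in the 1-based pairing list."""
    return all(bonds[pos - 1] == 0 for pos in mutated_positions(wild_type, mutant))


# Toy example: the A -> G change at position 5 falls in the AAA loop.
print(mutated_positions("GGGAAACCC", "GGGAGACCC"))
print(mutations_unpaired("GGGAAACCC", "GGGAGACCC", [9, 8, 7, 0, 0, 0, 3, 2, 1]))
```

On the notebook's data, `mutations_unpaired(seqleu, seqleuM, leubond)` and `mutations_unpaired(seqlys, seqlysM, lysbond)` should both return `True`.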

The function below prints the sequence in bpseq format (one `index base partner` line per nucleotide), ready to paste into the corresponding bpseq files, which already exist in the directory.


In [None]:
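def bpseqout(seq, bonds):
    """
    Prints an RNA sequence in bpseq format: one 'index base partner' line per base.
    seq: string, RNA sequence.
    bonds: list of ints; bonds[i] is the 1-based index of the base that base i+1 pairs with (0 if unpaired).
    """
    if len(seq) != len(bonds):
        raise ValueError(f"{len(seq)} bases but {len(bonds)} pairing entries")
    for i, (base, partner) in enumerate(zip(seq, bonds), start=1):
        print(f"{i} {base} {partner}")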
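The section heading also mentions the ct format, but only a bpseq printer is defined. A minimal sketch of a ct writer, assuming the common six-column connectivity-table layout (as used by mfold and RNAstructure): a header line with the number of bases and a title, then one row per base. The helper `ct_lines` and its default title are hypothetical.

```python
def ct_lines(seq, bonds, title="tRNA"):
    """Return the lines of a connectivity-table (.ct) file for seq and its pairing list.

    First line: number of bases and a title. Then one six-column row per base:
    index, base, previous index (0 for the first base), next index (0 for the last
    base), partner index (0 if unpaired), and the index again (natural numbering).
    """
    if len(seq) != len(bonds):
        raise ValueError(f"{len(seq)} bases but {len(bonds)} pairing entries")
    n = len(seq)
    lines = [f"{n} {title}"]
    for i, (base, partner) in enumerate(zip(seq, bonds), start=1):
        nxt = i + 1 if i < n else 0  # the last base has no 3' neighbour
        lines.append(f"{i} {base} {i - 1} {nxt} {partner} {i}")
    return lines


# Toy hairpin; for the notebook data, e.g. ct_lines(seqlys, lysbond, "mt-tRNA Lys").
print("\n".join(ct_lines("GGGAAACCC", [9, 8, 7, 0, 0, 0, 3, 2, 1], "hairpin")))
```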

In [None]:
# Example run (uncomment to print the bpseq lines for Leu(UUR))
# bpseqout(seqleu, leubond)
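Instead of printing and pasting, the same lines can be written straight to disk. A sketch, assuming the file should contain only the `index base partner` lines that `bpseqout` prints, with no header; the helper `write_bpseq` is hypothetical.

```python
from pathlib import Path


def write_bpseq(path, seq, bonds):
    """Write a .bpseq file: one 'index base partner' line per nucleotide."""
    if len(seq) != len(bonds):
        raise ValueError(f"{len(seq)} bases but {len(bonds)} pairing entries")
    lines = [f"{i} {base} {partner}" for i, (base, partner) in enumerate(zip(seq, bonds), start=1)]
    Path(path).write_text("\n".join(lines) + "\n")
```

For example, `write_bpseq("leu.bpseq", seqleu, leubond)`, where `leu.bpseq` is a placeholder for whichever bpseq file already exists in the directory.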



#### Remove Hydrogen Atoms from the Files
This step is required before running the structures through the SMOG web tool: removing the hydrogens allows the files to be parameterized.

Note: all the mt-tRNA files were either extracted from larger PDB structures or generated computationally.

In [None]:
!grep -v H lys.pdb > lys_noH.pdb
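`grep -v H` drops every line that contains a capital H anywhere, not only hydrogen atoms: HETATM records (for example modified nucleotides in structures extracted from larger PDB entries), HEADER and HELIX records, and every atom of a chain named H would also be removed. If that matters for a given file, the sketch below filters only hydrogen ATOM/HETATM records. It is a heuristic that assumes standard fixed-column PDB formatting: it reads the element symbol (columns 77-78) when present and otherwise falls back to the atom name (columns 13-16). The helper names are hypothetical.

```python
from pathlib import Path


def is_hydrogen(line):
    """Heuristic: True if an ATOM/HETATM record describes a hydrogen atom."""
    if not line.startswith(("ATOM", "HETATM")):
        return False  # keep HEADER, TER, END, etc.
    element = line[76:78].strip()  # element symbol, columns 77-78
    if element:
        return element == "H"
    # No element column: use the atom name (columns 13-16), ignoring leading
    # digits as in old-style names such as "1H5'".
    name = line[12:16].strip().lstrip("0123456789")
    return name.startswith("H")


def strip_hydrogens(src, dst):
    """Copy a PDB file, dropping only hydrogen ATOM/HETATM records."""
    lines = Path(src).read_text().splitlines(keepends=True)
    Path(dst).write_text("".join(line for line in lines if not is_hydrogen(line)))
```

`strip_hydrogens("lys.pdb", "lys_noH.pdb")` mirrors the shell command above.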

#### Run through SMOG

Upload the hydrogen-free structure to the SMOG server: https://smog-server.org/cgi-bin/GenTopGro.pl

Specifications (changed from the defaults):
- Contact map: *cut-off*

The server returns a .tar.gz archive.

#### Post-SMOG processing with GROMACS

Extract the .tar.gz archive returned by SMOG:

In [1]:
filename = 'smog_output.tar.gz'  # hypothetical name: set this to the archive downloaded from SMOG
!tar -xf {filename}



The extracted folder should contain a new PDB file, which is used for the simulations in step (notebook) 3.
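The same extraction can be done from Python, which also makes it easy to locate the PDB file inside the new folder. A sketch assuming the SMOG archive is gzip-compressed (as the .tar.gz name suggests); `extract_and_find_pdbs` is a hypothetical helper, and the safer `data` extraction filter is used only on Python versions that provide it.

```python
import tarfile
from pathlib import Path


def extract_and_find_pdbs(archive, dest="."):
    """Extract a .tar.gz archive into dest and return the .pdb files it contained."""
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Extraction filters exist on newer Python versions; "data" blocks
            # absolute paths and paths that escape dest.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return sorted(Path(dest, name) for name in members if name.endswith(".pdb"))
```

For example, `extract_and_find_pdbs(filename)` returns the paths of any `.pdb` files in the archive.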