# Mutagenesis: determine codon substitution

## Set working directory

The working directory should contain two text files: the amino acid sequence of the protein of interest and the nucleotide sequence of the coding region of the corresponding gene.

In [2]:
import os

os.chdir('C:/Users/mny3/Desktop/Mike Young Yang Lab/Protocols/Mutagenesis')


## Codon table

Establish a codon table as a Python dictionary that maps each DNA codon to its single-letter amino acid code. Stop codons are represented by '_'.

In [3]:
codontable = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
    }
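As a cross-check on a hand-typed table, the same mapping can be generated programmatically. The sketch below assumes the standard genetic code written in TCAG base order and uses '_' for stop codons to match the table above. The names `build_codon_table` and `codons_for` are hypothetical helpers introduced only for illustration.

```python
from itertools import product

# Standard genetic code, one letter per codon, with codons enumerated in
# TCAG order at each position (assumption: standard code, '*' marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(stop="_"):
    """Return a dict mapping each of the 64 DNA codons to its amino acid."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    return {
        codon: (stop if aa == "*" else aa)
        for codon, aa in zip(codons, AMINO_ACIDS)
    }


def codons_for(table, residue):
    """Return the sorted codons that encode a single-letter residue."""
    return sorted(codon for codon, aa in table.items() if aa == residue)


table = build_codon_table()
print(len(table))
print(codons_for(table, "A"))
```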

## Custom functions

strip_string(s)

    Strips the amino acid letters from user input of the form X##X, in which X is a single-letter amino acid 
    designation and ## is the position of the residue, leaving only the position.
        
    Arguments:
        s - string to be stripped of amino acid designations in order to determine the residue number from user input
            
    Ex.
        input:strip_string('R296A')
        output:'296'

In [4]:
def strip_string(s):
    remove = ["I", "T", "N", "S","L","P","H","R","V","A","D","G","F","Y",
                     "C","E","M","W","_","Q","K"]
    for i in remove:
        s = s.replace(i, "")
    return s
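strip_string removes letters but does not check that the input actually has the X##X shape. A minimal alternative sketch, assuming the input is a single-letter residue (or '_'), a position, and a second single-letter residue, uses a regular expression to validate and split the input in one step. `parse_mutation` and `MUTATION_PATTERN` are hypothetical names introduced for illustration.

```python
import re

# X##X: one letter (or '_' for stop), one or more digits, one letter (or '_')
MUTATION_PATTERN = re.compile(r"^([A-Z_])(\d+)([A-Z_])$")


def parse_mutation(s):
    """Split a mutation such as 'R296A' into (original, position, new)."""
    match = MUTATION_PATTERN.match(s.strip().upper())
    if match is None:
        raise ValueError(f"Expected a mutation of the form X##X, got {s!r}")
    original, position, new = match.groups()
    return original, int(position), new


print(parse_mutation("R296A"))
```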

similarity(s1,l2)

    Compares a codon against each codon in a list and counts the mismatched positions, so that the substitution requiring 
    the fewest nucleotide changes can be selected for the mutation of interest.
    
    Arguments:
        s1 - codon of the original amino acid
        l2 - list of putative codons of the amino acid for the mutation of interest
       
    Ex. 
        input:similarity('AAC',['CAC','CAT','AAT','ACA'])
        output:[1,2,1,2]

In [5]:
def similarity(s1, l2):
    # One mismatch count per candidate codon, in the same order as l2
    mismatch=[]
    for i in l2:
        score=0
        # Compare the three codon positions one at a time
        for pos in range(3):
            if i[pos]!=s1[pos]:
                score+=1
        mismatch.append(score)
    return mismatch
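Several candidate codons can share the lowest mismatch count. The sketch below, which is an illustration under that assumption rather than a replacement for similarity, returns every candidate at the minimum distance so that ties are visible. `hamming` and `closest_codons` are hypothetical helper names.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def closest_codons(original, candidates):
    """Return every candidate codon at the minimum distance from original."""
    distances = [hamming(original, c) for c in candidates]
    best = min(distances)
    return [c for c, d in zip(candidates, distances) if d == best]


print(closest_codons("AAC", ["CAC", "CAT", "AAT", "ACA"]))
```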

substitution(s1,s2)

    Creates a binary list marking each position at which two codons match with 1 and each position at which they 
    differ with 0.

    Arguments:
        s1 - codon of original amino acid
        s2 - codon of substituted amino acid
    
    Ex. 
        input:substitution('AAT','TAT')
        output:[0,1,1]

In [6]:
def substitution(s1, s2): 
    matches=[]
    for i in range(len(s2)):
        if s2[i]==s1[i]:
            matches.append(1)
        else:
            matches.append(0)
    return matches

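combine(s1,s2)
    
    Takes two codons and constructs the substitution code in the form NN###NN, in which the leading N are the original 
    nucleotide(s) being replaced, ### is the position of the first substituted nucleotide in the coding sequence, and the 
    trailing N are the new nucleotide(s). This format is specific for primer design on 
    http://www.bioinformatics.org/primerx/cgi-bin/DNA_1.cgi
    
    The position is computed from the global variable codon_start, which is set by the mutation optimization program, so 
    codon_start must be defined before this function is called.
    
    Arguments:
        s1:original amino acid codon
        s2:optimal codon for mutagenesis
    
    Ex. (with codon_start = 1365)
        input:combine('ATT','TAT')
        output:AT1365TA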

In [7]:
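def combine(s1,s2):
    mut_cod=''
    matched = substitution(s1,s2)
    # Original nucleotides at the mismatched positions
    for i in range(len(matched)):
        if matched[i] == 0:
            mut_cod+=s1[i]
    # Offset of the first mismatched position within the codon
    counter = 0
    for i in matched:
        if i==0:
            break
        counter += 1
    # codon_start is a global set by the main program (1-based position of the codon's first nucleotide)
    mut_cod += str(codon_start + counter)
    # Replacement nucleotides at the mismatched positions
    for i in range(len(matched)):
        if matched[i] == 0:
            mut_cod+=s2[i]
    return mut_cod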
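Because combine reads codon_start from the global scope, it cannot be tested in isolation. A minimal sketch of the same construction with the position passed in explicitly is shown below. `substitution_code` is a hypothetical name, and unlike combine it raises an error when the two codons are identical, which is an assumption about the desired behavior.

```python
def substitution_code(original, mutant, codon_start):
    """Build the NN###NN code for two codons.

    codon_start is the 1-based position of the codon's first nucleotide,
    passed as an argument instead of being read from a global.
    """
    mismatched = [i for i in range(len(original)) if original[i] != mutant[i]]
    if not mismatched:
        raise ValueError("codons are identical; nothing to substitute")
    old = "".join(original[i] for i in mismatched)
    new = "".join(mutant[i] for i in mismatched)
    # Position of the first substituted nucleotide in the coding sequence
    return f"{old}{codon_start + mismatched[0]}{new}"


print(substitution_code("ATT", "TAT", 1365))
```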

## Mutation optimization program

Ensure that your working directory contains text files with your coding sequence and amino acid sequence, and change the filenames in the code below to match. When prompted, enter the mutation of interest in the form X##X, e.g. R452A.
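The index arithmetic that maps a residue number to its codon can be checked on a toy sequence before running the program on real files. The sketch below assumes the coding sequence starts at the first nucleotide of codon 1 and contains no header or whitespace. `codon_at` and `toy_seq` are hypothetical names used only for illustration.

```python
def codon_at(coding_seq, residue_number):
    """Return (codon, codon_start) for a 1-based residue number."""
    index = residue_number - 1            # zero-based residue index
    codon = coding_seq[index * 3:index * 3 + 3].upper()
    codon_start = index * 3 + 1           # 1-based position of the codon's first nucleotide
    return codon, codon_start


toy_seq = "atgcgtaaa"  # hypothetical sequence encoding M-R-K
print(codon_at(toy_seq, 2))
```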

In [8]:
#Open the DNA coding sequence for the gene of interest
with open('16F coding seq.txt') as f:
    coding_seq = f.read().replace('\n','')

#Open the amino acid sequence for the protein of interest
with open('mtmem16f prot.txt') as f:
    aa_seq = f.read().replace('\n','')

#Request user input for the mutation of interest, AA# subtracted by 1 due to python's zero-based indexing
mut = input("Enter mutation: ").strip().upper()
aa_num = int(strip_string(mut))-1

#Determine the codon corresponding to the residue of interest and the 1-based position of its first nucleotide
codon = coding_seq[aa_num*3:aa_num*3+3]
codon_start=(aa_num+1)*3-2
codon = codon.upper()

#Residue to be substituted based on user input
new_res = mut[-1]

#Refers to dictionary for codons and finds optimal substitution for mutation of interest
#(if several codons tie for the fewest mismatches, the first one found is used)
new_codons = [key for key,value in codontable.items() if value==new_res]
rel_match = similarity(codon,new_codons)
best_mut = new_codons[rel_match.index(min(rel_match))]

#Prints the DNA coding sequence, original codon, substituted codon, and the substitution code in the form of NN###NN
print(coding_seq)
print(codon)
print(best_mut)
print(combine(codon,best_mut))

Enter mutation: R452A
atgcagatgatgactaggaaggtcctgctgaacatggagctggaggaggacgacgatgaggatggagacattgtgctggaaaactttgaccagacaattgtctgccccacctttggatcactggagaatcagcaggacttcaggactccagagtttgaagaatttaacgggaagcccgactccctctttttcaccgatggccagaggcgaatcgacttcatcctcgtgtatgaagatgagagcaaaaaggagaacaataagaaagggacaaatgagaaacagaagaggaaaagacaagcatacgaatctaaccttatctgccatgggctgcagctggaagcaacaagatctgtttctgatgacaagcttgtgttcgtaaaagtgcacgcgccctgggaagtgctgtgcacctatgctgagatcatgcacatcaaactcccgctaaagccaaacgacctgaaaacgcgctcgccctttggcaacctcaactggttcaccaaggtcctccgggtgaacgagagtgtcatcaagccagagcaggagttcttcactgccccttttgagaagagccggatgaatgatttctacatcctcgatagagattccttcttcaaccctgccaccagaagccgcattgtttatttcatcctctctcgggtcaaataccaagtgatgaacaacgttaacaaatttgggattaatagactggtcagctctggaatctacaaagcagcgtttcctctgcacgactgcagattcaactatgagtcggaggacatcagttgtcctagcgagcgttacctcctgtacagagaatgggctcaccctcggagtatatacaagaagcagcccttggatcttatcaggaagtattacggcgagaagattggaatctactttgcttggctgggctattacacgcagatgctccttctcgcagctgtggtgggcgtggcctgcttcctctatggatatctt