Name: Jonathan Kim

Email: jkim185@uncc.edu

# OOP2

In a notebook, start with a markdown cell and plan out what you think these 3 classes should look like. What are the common elements of Sequences (things we could define in the parent class Sequence) and what would need to be unique to DNASequence and ProteinSequence classes? What rules do you want to enforce about what these sequences should look like and how do you want to enforce those rules? Do you need to override constructors, or could the parent's work? Remember, eventually you want these to work with the SequenceRecord class we built earlier, so don't make any huge fundamental changes that would break that.

 

Your classes should, at minimum:

- have a `__repr__` and `__str__` that provide a meaningful representation as a string
- check that the bases or amino acids in the string are valid
- work as the argument for a SequenceRecord

# My Plan
I wanted DNA and protein sequences to each have their own methods: translation for DNA and reverse translation for proteins. A sequence is treated as valid for its type only if it passes the `if` check in its class. The parent class's components could also work on their own, but since this assignment is about DNA and protein sequences, I focused on those two subclasses. I also tested both of them with SequenceRecord in this notebook.
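
The assignment asks that the bases or amino acids in a sequence be checked for validity. The following is a minimal sketch of one way such an `if` check could sit in a constructor. The names `DNA_ALPHABET`, `PROTEIN_ALPHABET`, `find_invalid`, and `CheckedDNA` are hypothetical and do not appear elsewhere in this notebook, and the alphabets assume unambiguous uppercase bases and the 20 standard amino acids plus `*` for stop.

```python
# Hypothetical alphabets; this notebook does not define these names.
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY*")


def find_invalid(seq, alphabet):
    """Return the sorted list of characters in seq that are not in alphabet."""
    return sorted(set(seq.upper()) - alphabet)


class CheckedDNA:
    """Stand-in for DNASequence that validates its input in the constructor."""

    def __init__(self, seq):
        bad = find_invalid(seq, DNA_ALPHABET)
        if bad:
            # Refuse to build the object rather than store an invalid sequence
            raise ValueError(f"Invalid DNA characters: {bad}")
        self.seq = seq.upper()


print(find_invalid("ATGCXZ", DNA_ALPHABET))  # characters outside the alphabet
print(CheckedDNA("atgc").seq)                # lowercase input is normalized
```

Raising an exception in the constructor means an invalid sequence never exists as an object, which may be simpler than checking validity later in each method.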

# Code from OOP1.5

In [1]:
#sequence class goes here
from functools import total_ordering

@total_ordering
class Sequence:
    def __init__(self, seq):
        self.seq = seq
    def __len__(self):
        return len(self.seq)
    def __add__(self, other):
        if isinstance(other, Sequence):
            # Build a new Sequence so that neither operand is modified
            return Sequence(self.seq + other.seq)
        else:
            raise TypeError("Other is not a sequence, cannot add.")
    # Informal Report
    def __str__(self):
        return self.seq
    # Formal Representation
    def __repr__(self):
        return f'The Sequence is {self.seq}.'
    # If both strings are EXACTLY the same
    def __eq__(self,other):
        return self.seq == other.seq
    # Comparing lengths of two sequences
    def __lt__(self,other):
        return len(self.seq) < len(other.seq)

### Testing of Sequences w/ Sanity Checks

In [2]:
#Use this cell for testing your Sequence class. Show us what tests you ran to confirm your methods worked correctly
s1 = Sequence("TCGTCAGCTGACTGATATAGC")
s2 = Sequence("CTGACCTAGTCGATCGATCG")
s3 = Sequence("TCGTCAGCTGACTGATATAGC")
print("Test for __str__: ", s1.__str__())
print("Test for __repr__: ", s1.__repr__())
print("Sanity Check: s1 == s2 is ", s1 == s2, "and __eq__ gives ", s1.__eq__(s2))
print("Sanity Check: s1 == s3 is ", s1 == s3, "and __eq__ gives ", s1.__eq__(s3))
print("Sanity Check: s1 < s3 is ", s1 < s3, "and __lt__ gives ", s1.__lt__(s3))
print("Sanity Check: s2 < s3 is ", s2 < s3, "and __lt__ gives ", s2.__lt__(s3))

Test for __str__:  TCGTCAGCTGACTGATATAGC
Test for __repr__:  The Sequence is TCGTCAGCTGACTGATATAGC.
Sanity Check: s1 == s2 is  False and __eq__ gives  False
Sanity Check: s1 == s3 is  True and __eq__ gives  True
Sanity Check: s1 < s3 is  False and __lt__ gives  False
Sanity Check: s2 < s3 is  True and __lt__ gives  True


In [3]:
s = 'hello'
print("test1",isinstance(s1,Sequence))
print("test2",isinstance(s1,str))

test1 True
test2 False


In [4]:
# SequenceRecord class goes here
class SequenceRecord:
    def __init__(self,label,seq):
        self.label = label
        self.seq = self.seqCheck(seq)
    def seqCheck(self, var):
        temp = "Not a Sequence, input a valid Sequence"
        if (isinstance(var,Sequence)):
            temp = var
        return temp
    def __str__(self):
        return self.label
    def __repr__(self):
        return f"The header is {self.label} with sequence: {self.seq}"

### Sanity Check Testing

In [5]:
# Use this cell to test your SequenceRecord class
header1 = "MD10G1276500"
header2 = "MD10G1110200"
header3 = "MD10G1036500"
rec = SequenceRecord(header1,s1)
print("Test for __str__: ", rec.__str__())
print("Test for __repr__: ", rec.__repr__())

Test for __str__:  MD10G1276500
Test for __repr__:  The header is MD10G1276500 with sequence: TCGTCAGCTGACTGATATAGC


In [6]:
fakes = "ATGCTAGCTGATGTCAG"
# fakeSeq # works
fakerec = SequenceRecord(header2,fakes)
fakerec # A plain str is not a Sequence, so the error message is stored instead

The header is MD10G1110200 with sequence: Not a Sequence, input a valid Sequence
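
In the output above, the record is still created and the error message is stored in place of the sequence. A possible alternative, sketched here under the assumption that rejecting bad input outright is acceptable, is to raise a `TypeError`. `StrictSequenceRecord` is a hypothetical name, and `Sequence` is reduced to a minimal stand-in so the example runs on its own.

```python
class Sequence:
    """Minimal stand-in for the notebook's Sequence class."""

    def __init__(self, seq):
        self.seq = seq


class StrictSequenceRecord:
    """Hypothetical variant of SequenceRecord that rejects non-Sequence input."""

    def __init__(self, label, seq):
        if not isinstance(seq, Sequence):
            # Fail loudly instead of storing an error message as the sequence
            raise TypeError(f"seq must be a Sequence, got {type(seq).__name__}")
        self.label = label
        self.seq = seq


try:
    StrictSequenceRecord("MD10G1110200", "ATGCTAGCTGATGTCAG")
except TypeError as err:
    print("Rejected:", err)
```

With this variant, a record that exists is guaranteed to hold a Sequence, so later code does not need to check for the error string.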

# DNA

- a translate method that will convert the DNA sequence and return a ProteinSequence object
- one other method of your choice (what you did previously is fine)

In [7]:
class DNASequence(Sequence):
    def __init__(self, seq):
        super().__init__(seq)
    def __str__(self):
        return self.seq
    def __repr__(self):
        return f'The DNA sequence is {self.seq}'
    def GCcount(self):
        count = 0
        print(self.seq.__str__())
        for i in self.seq.__str__():
            if i == "G" or i == "C":
                count+=1
        return count
    def trans(self):
        aa_dict = {'M':['ATG'], 'F':['TTT', 'TTC'], 'L':['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'], 'C':['TGT', 'TGC'], 'Y':['TAC', 'TAT'], 'W':['TGG'], 'P':['CCT', 'CCC', 'CCA', 'CCG'], 'H':['CAT', 'CAC'],
    'Q':['CAA', 'CAG'], 'R':['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'], 'I':['ATT', 'ATC', 'ATA'], 'T':['ACT', 'ACC', 'ACA', 'ACG'],
    'N':['AAT', 'AAC'], 'K':['AAA', 'AAG'], 'S':['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'], 'V':['GTT', 'GTC', 'GTA', 'GTG'],
    'A':['GCT', 'GCC', 'GCA', 'GCG'], 'D':['GAT', 'GAC'], 'E':['GAA', 'GAG'], 'G':['GGT', 'GGC', 'GGA', 'GGG'], '*':['TAA','TAG','TGA']}
        prot = ''
        if (len(self.seq.__str__()))%3 == 0:
            for i in range(0,len(self.seq.__str__()),3):
                codon = self.seq.__str__()[i:i+3]
                for key, values in aa_dict.items():
                    if codon in values:
                        prot+=key
        else:
            print("Sequence length is not a multiple of 3, so it cannot be translated")
        return ProteinSequence(prot)
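
The `trans` method above searches every entry of `aa_dict` for each codon. As a sketch of an alternative, the same table can be inverted once into a codon-to-amino-acid dictionary so that each codon is a single lookup. The names `codon_to_aa` and `translate` are hypothetical, the function works on plain strings rather than on the classes above, and unlike `trans` it raises an error on bad input instead of printing a message or skipping unknown codons.

```python
# Same amino acid -> codons table as in the notebook.
aa_dict = {
    'M': ['ATG'], 'F': ['TTT', 'TTC'],
    'L': ['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'],
    'C': ['TGT', 'TGC'], 'Y': ['TAC', 'TAT'], 'W': ['TGG'],
    'P': ['CCT', 'CCC', 'CCA', 'CCG'], 'H': ['CAT', 'CAC'],
    'Q': ['CAA', 'CAG'],
    'R': ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'I': ['ATT', 'ATC', 'ATA'], 'T': ['ACT', 'ACC', 'ACA', 'ACG'],
    'N': ['AAT', 'AAC'], 'K': ['AAA', 'AAG'],
    'S': ['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'],
    'V': ['GTT', 'GTC', 'GTA', 'GTG'],
    'A': ['GCT', 'GCC', 'GCA', 'GCG'], 'D': ['GAT', 'GAC'],
    'E': ['GAA', 'GAG'], 'G': ['GGT', 'GGC', 'GGA', 'GGG'],
    '*': ['TAA', 'TAG', 'TGA'],
}

# Invert the table once: each codon maps to exactly one amino acid.
codon_to_aa = {codon: aa for aa, codons in aa_dict.items() for codon in codons}


def translate(dna):
    """Translate a DNA string codon by codon.

    Raises ValueError if the length is not a multiple of 3 and
    KeyError if a codon is not in the table.
    """
    if len(dna) % 3 != 0:
        raise ValueError("Sequence length is not a multiple of 3")
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGCGATCGATCGAGAGCTAG"))  # MRSIES*
```

The inverted table holds all 64 codons, so a `KeyError` here would point to a character outside A, C, G, and T.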

# Protein

- a method of your choice. In this case, if the method you would implement is too complex to reasonably implement or would use resources you don't have access to, it is okay to leave it as what is called a stub method (has only one line, "pass") and explain in comments what this method would do and its purpose


In [8]:
import random
class ProteinSequence(Sequence):
    def __init__(self, seq):
        super().__init__(seq)
    def __str__(self):
        return self.seq
    def __repr__(self):
        return f'The protein sequence is {self.seq}'
    # Reverse translation
    # Gives a possible DNA sequence (as a string) for a protein sequence
    # Chooses a codon at random when an amino acid has more than one
    def revtrans(self):
        aa_dict = {'M':['ATG'], 'F':['TTT', 'TTC'], 'L':['TTA', 'TTG', 'CTT', 'CTC', 'CTA', 'CTG'], 'C':['TGT', 'TGC'], 'Y':['TAC', 'TAT'], 'W':['TGG'], 'P':['CCT', 'CCC', 'CCA', 'CCG'], 'H':['CAT', 'CAC'],
    'Q':['CAA', 'CAG'], 'R':['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'], 'I':['ATT', 'ATC', 'ATA'], 'T':['ACT', 'ACC', 'ACA', 'ACG'],
    'N':['AAT', 'AAC'], 'K':['AAA', 'AAG'], 'S':['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'], 'V':['GTT', 'GTC', 'GTA', 'GTG'],
    'A':['GCT', 'GCC', 'GCA', 'GCG'], 'D':['GAT', 'GAC'], 'E':['GAA', 'GAG'], 'G':['GGT', 'GGC', 'GGA', 'GGG'], '*':['TAA','TAG','TGA']}
        seq = ''
        for i in self.seq.__str__():
            # Look the residue up directly; characters not in the table are skipped
            if i in aa_dict:
                seq += random.choice(aa_dict[i])
        return seq
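
Because `revtrans` picks codons at random, each call can return a different DNA string. One way to test it anyway is a round trip: whatever codons are chosen, translating the result back should give the original protein. The sketch below assumes a seeded `random.Random` instance for repeatable runs and uses an abbreviated table covering only the residues in the example. The names `reverse_translate` and `translate` are hypothetical and operate on plain strings.

```python
import random

# Abbreviated table: only the residues needed for the example protein below.
aa_dict = {
    'M': ['ATG'],
    'R': ['CGT', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'S': ['AGT', 'AGC', 'TCT', 'TCC', 'TCA', 'TCG'],
    'I': ['ATT', 'ATC', 'ATA'],
    'E': ['GAA', 'GAG'],
    '*': ['TAA', 'TAG', 'TGA'],
}
codon_to_aa = {codon: aa for aa, codons in aa_dict.items() for codon in codons}


def reverse_translate(protein, rng):
    """Pick one codon per residue using the supplied random generator."""
    return "".join(rng.choice(aa_dict[aa]) for aa in protein)


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(codon_to_aa[dna[i:i + 3]] for i in range(0, len(dna), 3))


rng = random.Random(0)  # seeded so that reruns choose the same codons
dna = reverse_translate("MRSIES*", rng)
print(len(dna))        # 21: three bases per residue
print(translate(dna))  # MRSIES*
```

The round trip holds for any seed, since every codon listed for a residue maps back to that residue.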

## DNASequence Sanity Check Testing

In [9]:
s4 = s1 + s2
s4 = DNASequence(s4)
s5 = DNASequence(Sequence("ATGCGATCGATCGAGAGCTAG"))
print("Test for __str__: ", s4.__str__())
print("Test for __repr__: ", s4.__repr__())
print("GCcount is", s4.GCcount())
print("Translated Protein is", s5.trans())

Test for __str__:  TCGTCAGCTGACTGATATAGCCTGACCTAGTCGATCGATCG
Test for __repr__:  The DNA sequence is TCGTCAGCTGACTGATATAGCCTGACCTAGTCGATCGATCG
TCGTCAGCTGACTGATATAGCCTGACCTAGTCGATCGATCG
GCcount is 21
Translated Protein is MRSIES*


## ProteinSequence Sanity Check Testing

In [10]:
s6 = ProteinSequence(s5.trans())
print("Test for __str__: ", s6.__str__())
print("Test for __repr__: ", s6.__repr__())
print("A possible DNA sequence is",s6.revtrans())
print("Another possible DNA sequence is",s6.revtrans())

Test for __str__:  MRSIES*
Test for __repr__:  The protein sequence is MRSIES*
A possible DNA sequence is ATGCGCTCGATAGAAAGCTGA
Another possible DNA sequence is ATGAGGAGCATTGAGAGCTGA


# SequenceRecord Sanity Check Testing

In [11]:
# Use this cell to test your SequenceRecord class
header1 = "MD10G1276500"
rec = SequenceRecord(header1,s4.__str__())
print("Test for __str__: ", rec.__str__())
print("Test for __repr__: ", rec.__repr__())

Test for __str__:  MD10G1276500
Test for __repr__:  The header is MD10G1276500 with sequence: TCGTCAGCTGACTGATATAGCCTGACCTAGTCGATCGATCG


In [12]:
header2 = "MD10G1110200"
rec2 = SequenceRecord(header2,s6.__str__())
print("Test for __str__: ", rec2.__str__())
print("Test for __repr__: ", rec2.__repr__())

Test for __str__:  MD10G1110200
Test for __repr__:  The header is MD10G1110200 with sequence: MRSIES*
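
The two tests above pass the result of `__str__()` to SequenceRecord. Since DNASequence and ProteinSequence inherit from Sequence, an instance of either subclass should also satisfy the `isinstance` check by itself. The sketch below illustrates this with simplified stand-ins for the classes in this notebook; the stand-ins are assumptions made so the example runs on its own, and they are built directly from strings.

```python
class Sequence:
    """Simplified stand-in for the notebook's Sequence class."""

    def __init__(self, seq):
        self.seq = seq

    def __str__(self):
        return self.seq


class DNASequence(Sequence):
    """Simplified stand-in; inherits everything from Sequence."""


class SequenceRecord:
    """Simplified stand-in that keeps the same isinstance check."""

    def __init__(self, label, seq):
        self.label = label
        if isinstance(seq, Sequence):
            self.seq = seq
        else:
            self.seq = "Not a Sequence, input a valid Sequence"

    def __repr__(self):
        return f"The header is {self.label} with sequence: {self.seq}"


dna = DNASequence("ATGCGATCGATCGAGAGCTAG")
rec = SequenceRecord("MD10G1276500", dna)
print(isinstance(dna, Sequence))  # True: a subclass instance is a Sequence
print(repr(rec))
```

Passing the object itself keeps its subclass methods available through the record, which a plain string would not.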
