# Python. More advanced topics
## Classes and Methods
Python is an object-oriented language. In fact, all the data types used so far (numbers, strings, lists, tuples and dictionaries) are Python objects.

What is an object? In older programming languages, data was stored in variables, while operations on the data were bundled into code units called functions, subroutines or procedures. Data and operations were separate entities. Objects, however, combine data storage with the operations on that data. In Python, objects are used throughout to store data and to implement methods that operate on it. For example, string objects store sequences of characters and support many methods, like $\textbf{.upper()}$, which return transformed copies of the string. The list object stores many kinds of data and has methods like $\textbf{.append(...)}$ and $\textbf{.sort()}$. Different objects have different sets of methods, appropriate to the data being stored.
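
As a small illustration of data and methods living together in one object, the sketch below calls the string and list methods mentioned above. The variable names are made up for this example.

```python
# Strings and lists are objects: each bundles data with methods.
dna = "atgc"
upper_dna = dna.upper()      # strings are immutable, so .upper() returns a new string

bases = ["T", "A", "G"]
bases.append("C")            # lists are mutable: .append() changes the list in place
bases.sort()                 # .sort() also works in place and returns None

print(upper_dna)             # ATGC
print(bases)                 # ['A', 'C', 'G', 'T']
```

Note the difference: the string method hands back a new object, while the list methods modify the existing one.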

You can create your own customized objects using classes. There are three steps:

#### 1) Define the class.

#### 2) Create one or more 'instances' or 'cases' of your class.

#### 3) Use these instance(s) of your class.

It's easiest to learn by example:

In [None]:
# Step 1: A class definition is basically a template for the creation of your new type of object
# by convention class names start with Upper case to distinguish them from defs and regular variables
class Sequence: 
    """ 
    as always, start with comments on your code
    Sequence is a general class object which will be the basis for
    DNA, RNA or protein sequence objects. 
    """
    def __init__(self):
      """ 
      most classes start with a special def named __init__ which is automatically executed just once
      every time you create an instance of your class. Here you can place any operations that must be done
      every time, like initializing variables. Since the __init__ def is always executed first,
      put here any data or data structure needed by the other defs, so they will be defined when needed
      """
      # define which characters can occur in our sequences
      self.legal = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' # A-Z are the legal characters
      # create an empty list to store our sequence
      self.seq = []
    """
    The heart of any class is a set of defs which implement the methods of your class
    Note the defs are one indent over, since they live inside the class definition
    """   
    def getseq(self):
      # input our sequence
      raw = input('enter sequence >> ')  # not named 'str', which would shadow the built-in string type
      s = raw.upper()
      for c in s: 
        #  only store legal characters
        if(c in self.legal):
          self.seq.append(c)
      print('input sequence length: ',len(self.seq))

    def printseq(self):
      # print nicely in blocks of 10, 50 per line
      n = len(self.seq)
      for i in range(n):
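        if(i > 0 and i % 50 == 0): print()              # new line after every 50 characters
        elif(i > 0 and i % 10 == 0): print(' ',end="")  # space after every block of 10
        print(self.seq[i],end="")
      print()   # finish the last line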

Some things to note: the syntax for the defs inside a class is identical to that of normal Python defs, except that each def has an extra argument, conventionally called $\textbf{self}$. This argument must come first, and it appears only in the class definition: as we see below, when a method is invoked on an instance, Python supplies $\textbf{self}$ automatically, so you do not pass it yourself. Some variables in the defs, such as $\textbf{self.seq}$, have the prefix $\textbf{self.}$. These belong to the instance, so they can be referred to from outside the class once an instance has been created. All other variables, such as $\textbf{c}$ and $\textbf{s}$, are local to the def in which they appear and cannot be referred to (or messed up!) from outside.
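
To make the role of $\textbf{self}$ concrete without needing keyboard input, here is a minimal sketch using a hypothetical toy class, `Counter`, which is not part of the Sequence example. It shows that calling a method on an instance is the same as calling the def through the class and passing the instance explicitly.

```python
class Counter:
    """Hypothetical toy class, used only to illustrate 'self'."""
    def __init__(self):
        self.count = 0            # instance attribute: stored on the object

    def add(self, step):
        doubled = 2 * step        # local variable: exists only while add() runs
        self.count += step
        return doubled

c1 = Counter()
c1.add(3)                         # Python passes c1 as 'self' automatically
Counter.add(c1, 4)                # the same kind of call, written out explicitly
print(c1.count)                   # 7
print(hasattr(c1, "doubled"))     # False: the local variable is not part of the object
```

The second form is rarely written in practice, but it shows where the first argument of every def comes from.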

In [None]:
# Step 2: create an instance of our Sequence object
seq1 = Sequence()
# behind the scenes, 'self' refers to the instance itself, which we have named 'seq1'
#
# Step 3: we can now use the methods (defs) of our Sequence object with the usual Python syntax
seq1.getseq()
seq1.printseq()
# remember 'self' was the dummy argument. There were no other arguments in the defs,
# and so these two particular methods take no argument.
#
# we can access a self-prefixed variable from outside the class object, replacing 'self' by the class instance
# name, here 'seq1'. Accessing instance variables in this way should be done with caution: generally for
# debug-printing or, more rarely, for assigning the value to another variable, NOT for changing it! Changing the
# value can have unpredictable results or break your object
print(seq1.seq)
#
# create and use more instances of our Sequence object as needed
seq2 = Sequence()
seq2.getseq()
# etc...

In [None]:
print(seq2.seq)

## Class Inheritance
One of the most powerful features of Python classes is inheritance: a new sub-class can be defined which inherits all the defs and data structures of one or more previously defined classes.

In [None]:
class SequenceAA(Sequence):
    """
    inherit defs (methods) from the Sequence class by including its name as an argument in the
    class statement. To inherit from more than one class, include their names separated by commas
    as arguments in the class statement
    """
    def __init__(self):
      # we redefine this def for proteins
      self.seq = []
      self.legal = 'ACDEFGHIKLMNPQRSTVWY' # legal characters exclude BJOUXZ
    
    def rescount(self,resLetterName):
        # a new def, specific to the SequenceAA class
        # it has one dummy argument, and one 'real' argument
        count = 0
        for s in self.seq:
            if(s == resLetterName): count += 1
        return count
    #
    # No defs for the getseq and printseq methods, so they behave the same as in the parent class
    #

In [None]:
aa1 = SequenceAA()
# we get our inherited methods for free!
aa1.getseq()
aa1.printseq()
# use our new method.
prolines = aa1.rescount('P') # remember, the dummy self argument is gone, we only need the one real argument
print('# of prolines: ',prolines)
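
In the sub-class above, the new `__init__` replaces the parent's, so everything the parent set up has to be repeated. A common alternative is to run the parent's `__init__` first with `super().__init__()` and then override only what differs. The sketch below illustrates this with hypothetical stand-in classes (`Base`, `Protein`) and a hypothetical `load` method that takes the sequence as an argument instead of reading it from the keyboard.

```python
class Base:
    """Hypothetical stand-in for the Sequence class (no keyboard input)."""
    def __init__(self):
        self.legal = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        self.seq = []

    def load(self, text):
        # keep only the legal characters
        self.seq = [ch for ch in text.upper() if ch in self.legal]


class Protein(Base):
    def __init__(self):
        super().__init__()                    # run the parent's __init__ first
        self.legal = 'ACDEFGHIKLMNPQRSTVWY'   # then override only what differs


p = Protein()
p.load('mkbpz')                # inherited method, using the sub-class alphabet
print(p.seq)                   # ['M', 'K', 'P']
print(isinstance(p, Base))     # True: a Protein is also a Base
```

With this pattern, anything added to the parent's `__init__` later is picked up by the sub-classes automatically.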

We'll define a couple more Sequence sub-classes, and have some fun with dictionaries.

In [None]:
class SequenceNA(Sequence):
    """
    inherits general sequence methods, but specializes to nucleic acids
    """
    def __init__(self):
      self.seq = []
      self.legal = 'AGCTU' # legal characters
      # define a dictionary to implement WC base pairing
      self.pair = {'A':'T', 'G':'C', 'C':'G', 'T':'A', 'U':'A'}
    
    def complement(self,RNA=False): 
      # return the complementary strand, either DNA or RNA
      # the RNA argument is optional since a default is given
      cseq  = []
      for c in self.seq:
        c_comp  = self.pair[c]
        if(RNA and (c_comp == 'T')): c_comp = 'U'
        cseq.append(c_comp)
      return cseq


In [None]:
# an instance of a nucleic acid sequence
dna1 = SequenceNA()
dna1.getseq()
# complementary DNA strand (printed explicitly, since the returned list would otherwise be discarded)
print(dna1.complement())
# another NA instance
rna1 = SequenceNA()
# complementary RNA strand
rna1.seq = dna1.complement(RNA=True)
rna1.printseq()
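
The cell above needs a sequence typed in at the prompt. For a quick check of the pairing logic on its own, the following sketch applies the same dictionary lookup to a fixed example strand. It is written as a stand-alone function rather than a method purely for illustration.

```python
# Watson-Crick pairing table, copied from SequenceNA above
pair = {'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'U': 'A'}

def complement(seq, rna=False):
    """Return the complementary strand of seq as a list of bases."""
    cseq = []
    for base in seq:
        comp = pair[base]
        if rna and comp == 'T':
            comp = 'U'            # RNA uses U where DNA uses T
        cseq.append(comp)
    return cseq

strand = list('ATGC')
print(complement(strand))             # ['T', 'A', 'C', 'G']
print(complement(strand, rna=True))   # ['U', 'A', 'C', 'G']
```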

In [None]:
class SequenceRNA(SequenceNA):
    """
    inherits SequenceNA methods
    """
    dna2rna = {'T':'U','U':'U','A':'A','C':'C','G':'G'} # in case we get DNA sequence by mistake
    # dictionaries defining genetic code
    #=================================================
    # if 1st position is U
    pos23_u = { \
                'UU':'f','UC':'f','UA':'l','UG':'l', \
                'CU':'s','CC':'s','CA':'s','CG':'s', \
                'AU':'y','AC':'y','AA':'|','AG':'|', \
                'GU':'c','GC':'c','GA':'|','GG':'w', \
                  }
    # if 1st position is C
    pos23_c = { \
                   'UU':'l','UC':'l','UA':'l','UG':'l', \
                   'CU':'p','CC':'p','CA':'p','CG':'p', \
                   'AU':'h','AC':'h','AA':'q','AG':'q', \
                   'GU':'r','GC':'r','GA':'r','GG':'r', \
                    }
    # if 1st position is A
    pos23_a = { \
                   'UU':'i','UC':'i','UA':'i','UG':'m', \
                   'CU':'t','CC':'t','CA':'t','CG':'t', \
                   'AU':'n','AC':'n','AA':'k','AG':'k', \
                   'GU':'s','GC':'s','GA':'r','GG':'r', \
                    }
    # if 1st position is G
    pos23_g = { \
                   'UU':'v','UC':'v','UA':'v','UG':'v', \
                   'CU':'a','CC':'a','CA':'a','CG':'a', \
                   'AU':'d','AC':'d','AA':'e','AG':'e', \
                   'GU':'g','GC':'g','GA':'g','GG':'g', \
                    }
    # 1st position is dictionary of dictionaries
    pos1 = { 'U':pos23_u, 'C':pos23_c, 'A':pos23_a, 'G':pos23_g }
    #=================================================
    def __init__(self):
        self.seq = []
        self.legal = 'AGCU' # legal characters
        self.pair = {'A':'U', 'G':'C', 'C':'G', 'U':'A'}

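    def translate(self,fshift):
      # translate the sequence into amino acids, reading one codon (3 bases) at a time
      # fshift is the frame shift: the index (0, 1 or 2) of the first base to be read
      # amino acids are returned as lower case one-letter codes; '|' marks a stop codon
      # (translation continues past stop codons)
      aaseq  = []
      i = fshift
      n = len(self.seq)
      while(i+3 <= n):
        # convert any T to U in case we were given DNA
        base1 = self.dna2rna[self.seq[i]]
        base2 = self.dna2rna[self.seq[i+1]]
        base3 = self.dna2rna[self.seq[i+2]]
        base23  = base2 + base3
        codon = base1 + base23
        print('codon: ',codon)
        # the 1st base selects a dictionary, the 2nd and 3rd bases select the amino acid
        base23_dict = self.pos1[base1]
        aa  = base23_dict[base23]
        aaseq.append(aa)
        i += 3   # move on to the next codon
      return aaseq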


In [None]:
rna2 = SequenceRNA()
rna2.seq = dna1.complement(RNA=True)
rna2.printseq()
aa2 = SequenceAA()
aa2.seq = rna2.translate(0)
aa2.printseq()
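
The dictionary-of-dictionaries lookup and the effect of the frame shift can be tried out without any typed input. The sketch below uses a deliberately cut-down table that holds only the four codons needed for the example, and a stand-alone `translate` function that takes a plain string. Both are simplifications for illustration, not a replacement for the class above.

```python
# Cut-down genetic code: first base -> {second + third bases -> amino acid}
# Only the codons needed for this example are included.
pos1 = {
    'A': {'UG': 'm'},              # AUG = methionine (start)
    'G': {'CU': 'a', 'GC': 'g'},   # GCU = alanine, GGC = glycine
    'U': {'AA': '|'},              # UAA = stop
}

def translate(seq, fshift=0):
    """Translate an RNA string codon by codon, starting at index fshift."""
    aaseq = []
    for i in range(fshift, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # first base picks the inner dictionary, the last two bases pick the amino acid
        aaseq.append(pos1[codon[0]][codon[1:]])
    return aaseq

print(translate('AUGGCUUAA'))        # ['m', 'a', '|']
print(translate('CAUGGGCUAA', 1))    # ['m', 'g', '|']
```

In the second call the frame shift of 1 skips the leading C, so reading starts at the AUG codon.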