# Module 2 - Python classes, modules, and packages

## Classes and objects
Python is an object-oriented programming language, which means that it provides features that support object-oriented programming (OOP). OOP brings together data and its behaviour (methods) in a single unit called an “object”, which makes programs easier to conceptualize and understand. This reduces complexity and makes it easier to reuse code in different parts of a program or in different programs.

A class is a blueprint for an object. It describes how the object is made: what data (attributes) it contains and what methods it has. An object is an instance of a class and holds actual values for those attributes. You can create as many objects as you want from a class. Each object is independent of the others, so you can modify one without affecting the rest.

### Creating a class
Let's start by creating a class to hold a DNA sequence. We will need to define the attributes of the class, which are the variables that will be associated with each instance of the class.

We will start with a simple class that only has a single attribute, the DNA sequence itself.

In [None]:
class DNA:
    def __init__(self, seq):
        self.seq = seq



Now let's create an instance of our class. We do this by calling the class name as if it were a function, and passing the arguments required by the class's `__init__` method. `__init__` is a special method that is called when an instance of the class is created, and it is used to initialize the attributes of that instance. Its first parameter is always the object itself (the instance being created); by convention it is called `self`, and Python supplies it automatically. The remaining parameters receive the arguments that we pass to the class when we create the instance.

In [None]:
myDNA = DNA('ATGCAGTACTGACGTATCGCATTCGTCATGC')
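
To make the role of `self` concrete, here is a minimal sketch that restates the single-attribute class so the block runs on its own. The names `first` and `second` and the second sequence are hypothetical, chosen only for illustration. It shows that calling `DNA(...)` runs `__init__` with the new instance as `self`, and that each instance keeps its own `seq`.

```python
class DNA:
    def __init__(self, seq):
        # self is the instance being created; seq is the argument passed to DNA(...)
        self.seq = seq

# Two independent instances of the same class (example values are hypothetical)
first = DNA('ATGCAGTACTGACGTATCGCATTCGTCATGC')
second = DNA('GGCC')

# Attributes are read with dot notation
print(first.seq)
print(second.seq)

# Changing one instance leaves the other untouched
second.seq = 'TTAA'
print(first.seq)
print(second.seq)
```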

Right now this instance doesn't do much. It has a single attribute, but no methods to do anything with it. Let's add a few methods to our class that will allow us to calculate features of the DNA sequence.

In [None]:
class DNA:
    def __init__(self, seq):
        self.seq = seq

    def length(self):
        # Number of bases in the sequence
        return len(self.seq)

    def gc(self):
        # GC content as a percentage of the sequence length
        G = self.seq.count("G")
        C = self.seq.count("C")
        return (G + C) / len(self.seq) * 100

    def tm(self):
        # Estimated melting temperature in degrees Celsius, from the base counts
        G = self.seq.count("G")
        C = self.seq.count("C")
        A = self.seq.count("A")
        T = self.seq.count("T")
        Tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)
        return Tm


In [None]:
myDNA = DNA('ATGCAGTACTGACGTATCGCATTCGTCATGC')

print(myDNA.length())
print(myDNA.gc())
print(myDNA.tm())


31
48.38709677419355
63.0483870967742
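
The printed values can be checked by hand. The sketch below recounts the bases of this particular sequence with `str.count` (7 G, 8 C, 7 A and 9 T) and repeats the arithmetic used in `gc()` and `tm()` outside the class. The variable names are illustrative only.

```python
seq = 'ATGCAGTACTGACGTATCGCATTCGTCATGC'

# Count each base
G, C, A, T = (seq.count(base) for base in 'GCAT')
print(G, C, A, T)

# GC content: share of G and C bases, as a percentage (15 / 31 * 100)
gc_percent = (G + C) / len(seq) * 100

# Melting temperature with the same formula as DNA.tm():
# 64.9 + 41 * (15 - 16.4) / 31
tm = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)

print(gc_percent)
print(tm)
```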


# Reading and parsing genomics data files
## Reading FASTA files
FASTA is a file format for representing nucleotide or peptide sequences. A FASTA file consists of a header line followed by lines of sequence data. The header line is distinguished from the sequence data by a greater-than (">") symbol in the first column. The word following the ">" symbol is the identifier (name) of the sequence, and the rest of the line is an optional description of the entry. There should be no space between the ">" and the first letter of the identifier. The sequence ends if another line starting with a ">" appears; this indicates the start of another sequence.
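
As an illustration of the layout described above, here is a minimal sketch that splits a header line into its identifier and description. The records are made up for the example (`seq1`, `seq2`, the description, and the bases are all hypothetical), and `split_header` is an illustrative helper, not part of any library.

```python
# A made-up FASTA file with two records (all names and bases are hypothetical)
fasta_text = """>seq1 example description
ATGCAGTACTGA
CGTATCGC
>seq2
GGCCTTAA
"""

def split_header(header_line):
    # Drop the leading '>' and split once on whitespace:
    # the first word is the identifier, the rest is the optional description
    parts = header_line[1:].strip().split(maxsplit=1)
    identifier = parts[0] if parts else ''
    description = parts[1] if len(parts) > 1 else ''
    return identifier, description

# Print the (identifier, description) pair for every header line
for line in fasta_text.splitlines():
    if line.startswith('>'):
        print(split_header(line))
```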

FASTA is a common format in bioinformatics for storing sequence strings. It is a simple format that is easy to parse. Let's write a function that reads a FASTA file and returns a list of DNA sequences.

In [None]:
# Create a Python function that takes a FASTA filename as an argument and returns a list of sequences
def parseFasta(filename):
    # Create an empty list to store the sequences
    sequences = []
    # Create an empty string to store the current sequence
    currentSequence = ''
    # Open the file; the with block closes it automatically, even if an error occurs
    with open(filename, 'r') as fastaFile:
        # Loop through the lines in the file
        for line in fastaFile:
            # If the line starts with a >, we have a new sequence
            if line.startswith('>'):
                # If we have a current sequence, add it to the list
                if currentSequence != '':
                    sequences.append(currentSequence)
                # Reset the current sequence
                currentSequence = ''
            # Otherwise, we have a sequence line
            else:
                # Add the line to the current sequence, without surrounding whitespace
                currentSequence += line.strip()
    # Add the last sequence to the list, unless the file contained no sequence data
    if currentSequence != '':
        sequences.append(currentSequence)
    # Return the list of sequences
    return sequences
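
A quick way to try this kind of parser is to write a small FASTA file and read it back. The sketch below rests on several assumptions: the two records are invented, the file is a temporary one created with `tempfile`, and `parse_fasta_compact` is a condensed, hypothetical restatement of the same parsing logic, included only so that the block runs on its own.

```python
import os
import tempfile

def parse_fasta_compact(filename):
    # Condensed restatement of the parsing logic: collect the sequence lines
    # between header lines and join them into one string per record
    sequences = []
    current = ''
    with open(filename, 'r') as handle:
        for line in handle:
            if line.startswith('>'):
                if current:
                    sequences.append(current)
                current = ''
            else:
                current += line.strip()
    if current:
        sequences.append(current)
    return sequences

# Write a made-up two-record FASTA file to a temporary location
with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as tmp:
    tmp.write('>seq1 example description\nATGCAG\nTACTGA\n>seq2\nGGCC\n')
    path = tmp.name

sequences = parse_fasta_compact(path)
os.remove(path)  # clean up the temporary file

print(sequences)
```

Sequence lines that belong to the same record are joined into one string, so line wrapping in the file does not affect the result. The header lines are discarded; only the sequences are returned.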