# Day 2: Programming Styles

## Functional Programming

Functional programming rests on a few basic principles:

1. **First-Class and Higher-Order Functions**: Python treats functions as first-class objects, which can be assigned to variables, stored in data structures, passed as arguments to other functions, or returned as values from other functions. A higher-order function is a function that takes one or more functions as arguments, returns a function as its result, or both.
2. **Pure Functions**: A pure function is a function where the return value is only determined by its input values, without observable side effects. This is a core concept in functional programming.
3. **Recursion**: Functional programming relies heavily on recursion, the process of a function calling itself as a subroutine.
4. **Lambda Functions**: These are small anonymous functions defined inline with the `lambda` keyword, which helps keep your code concise. They are often passed as arguments to higher-order functions.
5. **Immutability**: In functional programming, once a data structure is created, it cannot be changed. Instead of modifying it in place, you create a new data structure with the updated values.
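
To make the principles above more concrete, here is a minimal sketch contrasting an impure function with a pure one, and showing recursion on an immutable input. The function names (`append_stop_impure`, `append_stop_pure`, `count_bases`) and the codon values are made up for illustration.

```python
# Impure: mutates its argument, so the caller's list changes (a side effect)
def append_stop_impure(codons):
    codons.append("TAA")
    return codons

# Pure: returns a new tuple and leaves the input untouched
def append_stop_pure(codons):
    return codons + ("TAA",)

original = ("ATG", "GCC")            # tuples are immutable
extended = append_stop_pure(original)
print(original)   # ('ATG', 'GCC')
print(extended)   # ('ATG', 'GCC', 'TAA')

mutable = ["ATG", "GCC"]
append_stop_impure(mutable)
print(mutable)    # ['ATG', 'GCC', 'TAA'] -> the input itself was changed

# Recursion: the function calls itself on a shorter input until it is empty
def count_bases(seq):
    if not seq:
        return 0
    return 1 + count_bases(seq[1:])

print(count_bases("ATGC"))  # 4
```

The pure version can be tested by checking its return value alone, which is one reason pure functions are considered easier to reason about.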

In [None]:
# Higher-order functions
def multiply_by_two(n):
    return n * 2

numbers = [1, 2, 3, 4, 5]
numbers_multiplied = map(multiply_by_two, numbers)
print(list(numbers_multiplied))  # Outputs: [2, 4, 6, 8, 10]

In [None]:
numbers = [1, 2, 3, 4, 5]
numbers_multiplied = map(lambda n: n * 2, numbers)
print(list(numbers_multiplied))  # Outputs: [2, 4, 6, 8, 10]

The often cited benefits of functional programming are that it is easier to reason about, and easier to test. It also tends to be more concise, and more easily parallelizable.

What are some of the downsides? Functional programming can be more difficult to learn, and it can be more difficult to find developers with the necessary skillset. It can also be more difficult to debug, depending on the complexity of the code.

There are a few functions in Python you will often hear about in the context of functional programming. Let's take a look at them.

### map()

The `map()` function applies a given function to each item of an iterable (list, tuple etc.) and returns an iterator over the results. Wrap it in `list()` if you need a list.

### filter()

The `filter()` function constructs an iterator from elements of an iterable (list, tuple etc.) for which a function returns true.

### reduce()

The `reduce()` function applies a two-argument function cumulatively to the items of an iterable, from left to right, reducing them to a single value. It is defined in the `functools` module.

However, today we mostly use list comprehensions and generator expressions instead of `map()` and `filter()`, and built-in functions such as `sum()` instead of `reduce()`. Why? Because they tend to be more readable and concise. Let's compare how they fare against each other.


In [1]:
sequences = ["ATGC", "AAGC", "TTGC", "ATTC", "ATGG"]

#get sequence lengths
sequence_lengths = map(len, sequences)
print(list(sequence_lengths))  # Outputs: [4, 4, 4, 4, 4]

#filter sequences with AT start
at_sequences = filter(lambda seq: seq.startswith("AT"), sequences)
print(list(at_sequences))  # Outputs: ['ATGC', 'ATTC', 'ATGG']

#calculate total length of all sequences
from functools import reduce
total_length = reduce(lambda x, y: x + y, map(len, sequences))
print(total_length)  # Outputs: 20

#get sequences with GC 
gc_sequences = filter(lambda seq: "GC" in seq, sequences)
print(list(gc_sequences))  # Outputs: ['ATGC', 'AAGC', 'TTGC']

#filter sequences with length > 3
long_sequences = [seq for seq in sequences if len(seq) > 3]
print(long_sequences)  # Outputs: ['ATGC', 'AAGC', 'TTGC', 'ATTC', 'ATGG']

#put everything together the old way
total_gc_length = reduce(lambda x, y: x + y, map(len, filter(lambda seq: "GC" in seq, sequences)))
print(total_gc_length)  # Outputs: 12

#put everything together with a generator expression
total_gc_length = sum(len(seq) for seq in sequences if "GC" in seq)
print(total_gc_length)  # Outputs: 12

[4, 4, 4, 4, 4]
['ATGC', 'ATTC', 'ATGG']
20
['ATGC', 'AAGC', 'TTGC']
['ATGC', 'AAGC', 'TTGC', 'ATTC', 'ATGG']
12
12


## Object-Oriented Programming (OOP)

Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data, in the form of fields (often known as attributes or properties), and code, in the form of procedures (often known as methods).

A defining feature of objects is that their procedures can access and often modify the data fields of the object with which they are associated (objects have a notion of "this" or "self"). In OOP, computer programs are designed by making them out of objects that interact with one another. OOP languages are diverse, but the most popular ones are class-based, meaning that objects are instances of classes, which also determine their types.

In Python, the concept of OOP follows some basic principles:

* **Inheritance**: Creating a new class that reuses the details of an existing class without modifying the existing class.
* **Composition**: Classes can be composed of objects of other classes, enabling re-use of code.
* **Encapsulation**: Hiding the private details of a class from other objects.
* **Polymorphism**: Using a common operation in different ways for different types of data.
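
Encapsulation is the one principle in this list that is easy to miss in Python, because the language has no strictly private attributes. The following is a minimal sketch of the usual convention, assuming a hypothetical `Sample` class: a leading underscore marks an attribute as internal, and a read-only `property` exposes it.

```python
class Sample:
    def __init__(self, name, concentration):
        self.name = name
        # Leading underscore: "internal" by convention, not enforced by Python
        self._concentration = concentration

    @property
    def concentration(self):
        """Read-only view of the internal value."""
        return self._concentration

    def dilute(self, factor):
        """The only supported way to change the concentration."""
        if factor <= 0:
            raise ValueError("factor must be positive")
        self._concentration /= factor

sample = Sample("plasmid_prep", 100.0)
sample.dilute(4)
print(sample.concentration)  # 25.0
```

Assigning to `sample.concentration` directly raises an `AttributeError`, because the property defines no setter.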

### Class

A class is a blueprint for the object.

We can think of a class as a sketch of a parrot with labels. It contains all the details about the name, colors, size, etc. Based on this description, we can learn about the parrot. Here, an actual parrot is an object.

An example class for a parrot could be:



In [None]:
class Parrot:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def sing(self, song):
        return f"{self.name} sings {song}"

    def dance(self):
        return f"{self.name} is now dancing"

### Object

Objects on the other hand are an encapsulation of variables and functions into a single entity. Objects get their variables and functions from classes. Classes are essentially a template to create your objects.
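
As a short illustration, the sketch below repeats a trimmed version of the `Parrot` class so that it runs on its own, then creates two objects from it. The names and ages are invented for the example.

```python
class Parrot:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def sing(self, song):
        return f"{self.name} sings {song}"

# Two independent objects created from the same class
blu = Parrot("Blu", 10)
woo = Parrot("Woo", 15)

print(blu.sing("Happy"))        # Blu sings Happy
print(woo.age)                  # 15
print(isinstance(blu, Parrot))  # True
```

Each object carries its own `name` and `age`, while the `sing` method comes from the shared class.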

There are a few rules in Python that are useful to keep in mind:

### Variable lookup via LEGB: Local, Enclosing, Global, Built-in

The LEGB rule determines the order in which Python searches namespaces when it resolves a name. The scopes are listed from narrowest to broadest:

* Local (L): Names defined inside the current function
* Enclosing (E): Names defined inside any enclosing functions (nested functions)
* Global (G): Names defined at the top level of the module
* Built-in (B): Names predefined in Python's `builtins` module
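
The lookup order can be sketched with a variable that exists in several scopes at once. The function names here are made up for the example, and the code is meant to run at the top level of a module or notebook cell.

```python
x = "global"

def outer():
    x = "enclosing"

    def inner():
        x = "local"
        return x          # found in the local scope

    def inner_no_local():
        return x          # no local x, so the enclosing scope is used

    return inner(), inner_no_local()

def uses_global():
    return x              # no local or enclosing x, so the global one is used

print(outer())        # ('local', 'enclosing')
print(uses_global())  # global
print(len("ATGC"))    # 4 -> len is found in the built-in scope
```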

### Attribute lookup via ICPO: Instance, Class, Parent, Object

The ICPO rule determines the order in which Python searches for an attribute when you write `obj.attribute`. The lookup goes from most specific to most general:

* Instance (I): Attributes stored on the instance itself (typically set via `self`)
* Class (C): Attributes defined in the body of the instance's class
* Parent (P): Attributes defined in parent classes, following the method resolution order
* Object (O): Attributes defined on `object`, the base class of all classes
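
A minimal sketch of this lookup order, using two hypothetical classes (`BaseSequence` and `DNARecord`) that are not part of the course code:

```python
class BaseSequence:
    alphabet = "generic"   # defined on the parent class
    kind = "sequence"      # defined only on the parent class

class DNARecord(BaseSequence):
    alphabet = "ATGC"      # overrides the parent's attribute

record = DNARecord()
record.label = "sample_1"  # stored on the instance

print(record.label)     # sample_1 -> found on the instance
print(record.alphabet)  # ATGC     -> found on the class
print(record.kind)      # sequence -> found on the parent
print([c.__name__ for c in DNARecord.__mro__])
# ['DNARecord', 'BaseSequence', 'object']

# An instance attribute shadows the class attribute of the same name
record.alphabet = "ATGCN"
print(record.alphabet)     # ATGCN
print(DNARecord.alphabet)  # ATGC (the class itself is unchanged)
```

The `__mro__` attribute lists the classes in the order they are searched after the instance itself.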


### Inheritance

Inheritance is a way of creating a new class that reuses the details of an existing class without modifying it. The newly formed class is a derived class (or child class). Similarly, the existing class is a base class (or parent class).

In [2]:
class Sequence:
    def __init__(self, seq):
        self.seq = seq

    def length(self):
        return len(self.seq)
    
class DNASequence(Sequence):
    def gc_content(self):
        return (self.seq.count('G') + self.seq.count('C')) / self.length()

class ProteinSequence(Sequence):
    def count_amino_acid(self, amino_acid):
        return self.seq.count(amino_acid)

### Composition

In contrast to inheritance, composition is a way of combining objects of different types into a new type of object. Composition is an alternative to inheritance. Inheritance is used to reuse the code and composition is used to combine existing objects in different ways.

When would you use inheritance vs. composition? Inheritance is useful when you want to create a new class and there is already a class that includes some of the code that you want, so you can derive your new class from the existing class. On the other hand, composition is useful when you have some existing code that you want to reuse but there is not a suitable existing class to derive from. In general, composition tends to be more flexible than inheritance and leads to less tightly coupled code, in the spirit of the principle of *separation of concerns*: the ability to change one component without affecting others.

In [7]:
class SequenceAnalyzer:
    def __init__(self, sequence):
        self.sequence = sequence

    def get_gc_content(self):
        """calculates the GC content of a DNA sequence"""
        #check if sequence is of class DNASequence
        if not isinstance(self.sequence, DNASequence):
            raise TypeError("Sequence must be of type DNASequence")
        return self.sequence.gc_content()

    def get_amino_acid_count(self, amino_acid):
        """counts the occurrences of an amino acid in a protein sequence"""
        #check if sequence is of class ProteinSequence
        if not isinstance(self.sequence, ProteinSequence):
            raise TypeError("Sequence must be of type ProteinSequence")
        return self.sequence.count_amino_acid(amino_acid)
    
analyser = SequenceAnalyzer(DNASequence("ATGC"))
print(analyser.get_gc_content())  # Outputs: 0.5

0.5


### Polymorphism and Abstract Base Classes (ABCs)

Polymorphism is a concept that allows us to use a unified interface for different classes. Let's suppose we want to ensure that all sequence types can calculate a representation of their content. We can use an abstract base class to enforce that all subclasses implement this functionality.

Abstract Base Classes (ABCs) ensure that derived classes implement particular methods from the base class. They can also provide default implementations of these methods.
Why would you want to use ABCs? They allow you to create a common API for a set of subclasses. This is especially useful for libraries, where you can define a base class that specifies the methods that a subclass must implement, without actually implementing any of the methods. This way, you can ensure that the subclasses implement the methods you need, without having to implement them yourself.

In [11]:
from abc import ABC, abstractmethod

class Sequence(ABC):
    def __init__(self, seq):
        self.seq = seq

    def length(self):
        return len(self.seq)

    @abstractmethod
    def content(self):
        pass

class DNASequence(Sequence):
    def content(self):
        return (self.seq.count('G') + self.seq.count('C')) / self.length()

class ProteinSequence(Sequence):
    def content(self):
        # For protein, let's return a dictionary of amino acid counts
        return {aa: self.seq.count(aa) for aa in set(self.seq)}
    
class RNASequence(Sequence):
    def gc_content(self):
        return (self.seq.count('G') + self.seq.count('C')) / self.length()

# RNASequence does not implement content(), so creating an instance raises a TypeError
rna = RNASequence("AUGCUAGC")

TypeError: Can't instantiate abstract class RNASequence with abstract method content
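
To show the "unified interface" side of polymorphism, here is a small sketch that repeats the abstract base class and two subclasses so it runs on its own, then calls `content()` on different sequence types through the same loop. The example sequences are invented, and the protein version sorts its keys only to make the printed output predictable.

```python
from abc import ABC, abstractmethod

class Sequence(ABC):
    def __init__(self, seq):
        self.seq = seq

    def length(self):
        return len(self.seq)

    @abstractmethod
    def content(self):
        """Return a summary of the sequence content."""

class DNASequence(Sequence):
    def content(self):
        return (self.seq.count("G") + self.seq.count("C")) / self.length()

class ProteinSequence(Sequence):
    def content(self):
        return {aa: self.seq.count(aa) for aa in sorted(set(self.seq))}

# The calling code does not need to know which subclass it is dealing with
for s in [DNASequence("ATGC"), ProteinSequence("MKK")]:
    print(type(s).__name__, s.content())
# DNASequence 0.5
# ProteinSequence {'K': 2, 'M': 1}
```

The abstract base class itself cannot be instantiated either: `Sequence("ATGC")` raises a `TypeError`.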

### Making your classes readable with docstrings, \_\_repr__ and \_\_str__

Documentation is an important part of any codebase. It helps you and others understand what your code does, and how to use it. In Python, there are a few ways to document your code. The first is docstrings, which are a type of comment that can be used to document modules, functions, classes and methods. They are surrounded by triple quotes, and are the preferred way to document your code.

In addition, there are two special methods you can implement to control how your classes are displayed. The first is `__repr__`, which is used to display the "official" string representation of an object. The second is `__str__`, which is used to display a more informal string representation of an object.

When would you use `__repr__` vs `__str__`? `__repr__` is used for debugging, and should return the most unambiguous representation of an object possible. `__str__` is used for display, and should return a human-readable representation of an object. If you only implement one of these methods, choose `__repr__`. If `__str__` is not implemented, Python will fall back on `__repr__`, so you will get the best of both worlds.
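
A minimal sketch of a class with a docstring and both methods follows. The exact wording of the two string representations is a choice made for this example, not a fixed convention.

```python
class DNASequence:
    """A DNA sequence stored as a string of nucleotides."""

    def __init__(self, seq):
        self.seq = seq

    def __repr__(self):
        # Unambiguous: looks like the code needed to recreate the object
        return f"DNASequence({self.seq!r})"

    def __str__(self):
        # Human-readable: meant for display
        return f"DNA sequence of length {len(self.seq)}: {self.seq}"

dna = DNASequence("ATGC")
print(repr(dna))            # DNASequence('ATGC')
print(str(dna))             # DNA sequence of length 4: ATGC
print([dna])                # [DNASequence('ATGC')] -> containers use __repr__
print(DNASequence.__doc__)  # A DNA sequence stored as a string of nucleotides.
```

Note that `print()` uses `__str__`, while the interactive prompt and containers such as lists use `__repr__`.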

### Exceptions, where to find them and how to write them

Exceptions signal errors that occur during the execution of a program. They are raised ("thrown") from a function, and can be "caught" with a `try`/`except` block. If an exception is not caught, the program terminates with a traceback.

Exceptions are a useful way to handle errors in your code. They allow you to gracefully handle errors, and provide useful error messages to your users.

Where can you find exceptions? The Python Standard Library has a list of built-in exceptions, which you can find [here](https://docs.python.org/3/library/exceptions.html). You can also create your own custom exceptions, which is useful if you want to create a custom error message for your users.

What do you need to do in order to create a custom exception? You need to create a class that inherits from the `Exception` class. You can also implement the `__str__` method, which will allow you to customize the error message that is displayed to your users. Ideally you actually inherit not from the generic `Exception` class, but from a more specific subclass of `Exception` that is appropriate for your use case (e.g. `ValueError` or `TypeError`).

In [13]:
class InvalidDNASymbolError(ValueError):
    pass

class DNASequence(Sequence):
    def __init__(self, seq):
        if set(seq) - {'A', 'T', 'G', 'C'}:
            raise InvalidDNASymbolError("Invalid symbol in DNA sequence")
        super().__init__(seq)

    def content(self):
        return (self.seq.count('G') + self.seq.count('C')) / self.length()

# This will raise an InvalidDNASymbolError
dna_seq = DNASequence("ATGCB")

InvalidDNASymbolError: Invalid symbol in DNA sequence
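
The following sketch shows how such a custom exception can be caught. To stay self-contained it uses a hypothetical helper function, `validate_dna`, instead of the class above, and its error message format is an assumption made for the example.

```python
class InvalidDNASymbolError(ValueError):
    """Raised when a sequence contains symbols other than A, T, G, C."""

def validate_dna(seq):
    """Hypothetical helper: return seq unchanged or raise InvalidDNASymbolError."""
    invalid = set(seq) - {"A", "T", "G", "C"}
    if invalid:
        raise InvalidDNASymbolError(
            f"Invalid symbol(s) in DNA sequence: {sorted(invalid)}"
        )
    return seq

# Catch the specific exception
try:
    validate_dna("ATGCB")
except InvalidDNASymbolError as err:
    print(f"Caught: {err}")  # Caught: Invalid symbol(s) in DNA sequence: ['B']

# Because it inherits from ValueError, a ValueError handler catches it too
try:
    validate_dna("ATGX")
except ValueError as err:
    print(type(err).__name__)  # InvalidDNASymbolError
```

This is the practical benefit of inheriting from a specific built-in exception: existing code that already handles `ValueError` keeps working.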

### Instance, Class, and Static Methods

Before we talk about the topic of instance, class, and static methods, let's first talk about the difference between variables of a class and an instance. A class is a blueprint for an object. An instance is a specific object created from a particular class.

Instance variables are variables that are unique to each instance. They are typically defined inside the `__init__` method of a class by assigning to `self`. Class variables are variables that are shared by all instances of a class. They are defined directly in the class body, outside of any method.

In [41]:
class MyClass:
    class_variable = "I am a class variable"

    def __init__(self, value):
        self.instance_variable = value

x = MyClass('x')
y = MyClass('y')

print(x.class_variable)  # Outputs: "I am a class variable"
print(y.class_variable)  # Outputs: "I am a class variable"

MyClass.class_variable = "Changed class variable"

print(x.class_variable)  # Outputs: "Changed class variable"
print(y.class_variable)  # Outputs: "Changed class variable"

x.instance_variable = "Changed instance variable"

print(x.instance_variable)  # Outputs: "Changed instance variable"
print(y.instance_variable)  # Outputs: "y"

I am a class variable
I am a class variable
Changed class variable
Changed class variable
Changed instance variable
y


A similar distinction exists for methods (functions attached to a class), not only for attributes (variables attached to a class). There are three types of methods in Python: instance methods, class methods, and static methods.

- Instance methods are defined inside a class and are called on an instance of that class. They receive the instance as their first argument (`self`) and can access both instance and class variables.
- Class methods are defined with the `@classmethod` decorator and are usually called on the class itself. They receive the class as their first argument (`cls`) and can access class variables, but not the variables of any particular instance.
- Static methods are defined with the `@staticmethod` decorator. They receive neither the instance nor the class automatically, so they behave like plain functions that live in the namespace of the class.

This distinction is important because it signals what state a method depends on. A class method cannot accidentally read or modify the state of a particular instance, and a static method touches neither instance nor class state, which makes both easier to reason about and to test.

Instance methods are the most common type of methods in Python classes. They always take self as the first parameter, which is a reference to the instance of the class.

Let's create an instance method to calculate the GC content of a DNA sequence.

In [14]:
class DNASequence:
    def __init__(self, seq):
        self.seq = seq

    def gc_content(self):
        return (self.seq.count('G') + self.seq.count('C')) / len(self.seq)

In [15]:
dna = DNASequence("ATGCGC")
print(dna.gc_content())

0.6666666666666666


Class methods are methods that operate on the class itself, rather than on instances of the class. They are defined using the @classmethod decorator and their first parameter is cls, which refers to the class.

Let's create a class method that creates a DNASequence object from a string in FASTA format.



In [16]:
class DNASequence:
    def __init__(self, seq):
        self.seq = seq

    @classmethod
    def from_fasta(cls, fasta_str):
        return cls(''.join(fasta_str.split('\n')[1:]))

In [19]:
fasta_str = ">seq1\nATGC\nGCTA"
dna = DNASequence.from_fasta(fasta_str)
dna.seq

'ATGCGCTA'

Static methods are methods that don't operate on instances or the class itself. They're related to the class in some way, but they don't change the state of instances or the class. They are defined using the @staticmethod decorator.

Let's create a static method that checks whether a string is a valid DNA sequence:

In [27]:
class DNASequence:
    def __init__(self, seq):
        self.seq = seq

    @staticmethod
    def is_valid_dna(seq):
        if set(seq.upper()) - {'A', 'T', 'G', 'C'}:
            return False
        return True

In [29]:
print(DNASequence.is_valid_dna("WGC")) 

False


## Context Managers

In [None]:
data = open("../data/assay_data.csv", 'r')
print(data)
print(data.readline())
print(data.readline())

#read data into a list
data_list = []
for line in data:
    data_list.append(line.strip().split(','))
print(data_list[0:5])

data.close()

In [None]:
with open("../data/assay_data.csv", 'r') as data:
    data_list = []
    for line in data:
        data_list.append(line.strip().split(','))
print(data_list[0:5])

In [None]:
data = open("../data/assay_data.csv", 'r')
try:
    data_list = []
    for line in data:
        data_list.append(line.strip().split(','))
finally:
    data.close()

Internally, context managers use the `__enter__` and `__exit__` methods. The `__enter__` method is called when the context is entered and the `__exit__` method is called when the context is exited. The `__exit__` method is called even if an exception is raised in the context. This is useful for cleaning up resources even if an error occurs.
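
The claim that `__exit__` runs even when an exception is raised can be checked with a small sketch. The `Tracker` class is a hypothetical helper written for this example; it only records which hooks were called.

```python
class Tracker:
    """Hypothetical helper that records which context manager hooks ran."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished without an error
        name = exc_type.__name__ if exc_type else "no error"
        self.events.append(f"exit ({name})")
        return False  # do not suppress the exception

tracker = Tracker()
try:
    with tracker:
        raise ValueError("something went wrong")
except ValueError:
    pass

print(tracker.events)  # ['enter', 'exit (ValueError)']
```

Returning `False` (or `None`) from `__exit__` lets the exception propagate after cleanup, which is why the surrounding `try`/`except` is still needed here.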

In [None]:
#write a context manager that implements a timer that can be used with the with statement
import time

class Timer:
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        self.end = time.time()
        self.interval = self.end - self.start
        print(f"Elapsed time: {self.interval} seconds")

In [None]:
with Timer() as t:
    #perform 1 million calculations
    for i in range(1000000):
        i = i + 1
    with Timer() as t2:
        l = []
        for i in range(100000):
            l.append(i)
        for i in range(100000):
            l.pop()
    with Timer() as t3:
        l = []
        for i in range(100000):
            l.insert(0, i)
        for i in range(100000):
            l.pop(0)

## Type Hints

Type hints formalize what used to be written informally in comments and docstrings. They allow you to annotate your code with hints about the types of variables, function parameters and function return values. These hints are not enforced by the Python interpreter, but they can be used by third party tools such as type checkers, IDEs, linters, etc.

In statically typed languages like Java, C++, C#, etc., the compiler checks that the types of the variables, parameters and return values match their declarations. This is called static type checking. Python is a dynamically typed language, so the type of a value is determined at runtime, and the interpreter does not check annotations. However, third party tools such as mypy can analyze type hints without running the code, which brings a form of static type checking to Python.

Let's look at how we can incorporate type hints into our DNASequence class.

In [30]:
class DNASequence:
    def __init__(self, seq: str) -> None:
        if not self.is_valid_dna(seq):
            raise ValueError("Invalid DNA sequence")
        self.seq = seq

    @staticmethod
    def is_valid_dna(seq: str) -> bool:
        if not seq:  # seq should not be empty
            return False
        return set(seq.upper()) <= {'A', 'T', 'G', 'C'}

    def gc_content(self) -> float:
        return (self.seq.count('G') + self.seq.count('C')) / len(self.seq)

    @classmethod
    def from_fasta(cls, fasta_str: str) -> 'DNASequence':
        return cls(''.join(fasta_str.split('\n')[1:]))

In this version of the class, each method is annotated with type hints. Here's what they mean:

- The __init__ method expects seq to be a string (str), and its return value is None. This is a convention used to indicate that the method doesn't return a meaningful result.

- The is_valid_dna method expects seq to be a string (str) and it returns a boolean (bool).

- The gc_content method doesn't take any parameters other than self, and it returns a float (float).

- The from_fasta class method expects fasta_str to be a string (str), and it returns an instance of DNASequence. Note that we use 'DNASequence' (a string) instead of DNASequence because DNASequence isn't fully defined at this point in the code.

Now, suppose we want to write a function that calculates the average GC content for a list of sequences. Here's how we can use type hints:

In [31]:
from typing import List

def average_gc_content(sequences: List[DNASequence]) -> float:
    return sum(seq.gc_content() for seq in sequences) / len(sequences)

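In this function, `sequences` is expected to be a list of `DNASequence` instances, and the function returns a float. The `List` type hint comes from the `typing` module, which contains many other useful type hints. (Since Python 3.9, the built-in `list` can be subscripted directly, as in `list[DNASequence]`.)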

Type hints can make your code easier to understand and debug, especially for large and complex codebases. However, they are completely optional and do not affect the runtime behavior of your code.
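
That hints do not affect runtime behavior can be seen in a short sketch. The function `gc_fraction` is made up for this example; it is annotated to take a `str`, yet the interpreter happily accepts a list.

```python
def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_fraction("ATGC"))  # 0.5

# A list is not a str, but the interpreter does not check the annotation.
# The call works because lists also support count() and len().
print(gc_fraction(["G", "C", "A", "T"]))  # 0.5

# The hints are simply stored on the function object
print(gc_fraction.__annotations__)
# {'seq': <class 'str'>, 'return': <class 'float'>}
```

A static type checker would be expected to flag the second call as a type error, even though it runs without problems.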

Python's typing module includes a variety of constructs that can be used to create more descriptive type hints. Let's explore a few of these.

1. **Union** can be used when a variable could be one of several types. For example, suppose we want to modify the DNASequence class to also accept a list of strings in the from_fasta method:



In [36]:
from typing import Union, List

class DNASequence:
    def __init__(self, seq: str) -> None:
        if not self.is_valid_dna(seq):
            raise ValueError("Invalid DNA sequence")
        self.seq = seq
        
    @staticmethod
    def is_valid_dna(seq: str) -> bool:
        if not seq:  # seq should not be empty
            return False
        return set(seq.upper()) <= {'A', 'T', 'G', 'C'}

    def gc_content(self) -> float:
        return (self.seq.count('G') + self.seq.count('C')) / len(self.seq)

    @classmethod
    def from_fasta(cls, fasta_str: Union[str, List[str]]) -> 'DNASequence':
        if isinstance(fasta_str, list):
            return cls(''.join(fasta_str))
        else:
            return cls(''.join(fasta_str.split('\n')[1:]))

2. **Optional** is used when a variable could be a certain type or None. For instance, suppose we have a function that finds a subsequence in a DNA sequence, which may or may not exist:

In [33]:
from typing import Optional

def find_subsequence(seq: DNASequence, subseq: str) -> Optional[int]:
    try:
        return seq.seq.index(subseq)
    except ValueError:
        return None

3. **Callable** is used to type hint parameters that are functions, lambdas or other callables. For example, suppose we have a function that applies a certain transformation to a DNA sequence:

In [39]:
from typing import Callable

def transform_sequence(seq: DNASequence, transformation: Callable[[str], str]) -> DNASequence:
    return DNASequence(transformation(seq.seq))

#use a transform function to convert a DNA sequence to RNA
def to_rna(seq: str) -> str:
    return seq.replace('T', 'U')

#use a transform function to convert a DNA sequence to its reverse complement
def reverse_complement(seq: str) -> str:
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(complement[base] for base in reversed(seq))

dna = DNASequence("ATGCGC")
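# to_rna is left commented out: its result contains 'U', which DNASequence rejects with a ValueError
# rna = transform_sequence(dna, to_rna)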
rev = transform_sequence(dna, reverse_complement)

4. **Dict** and **Tuple** are used for type hinting dictionaries and tuples.

In [40]:
from typing import Dict, Tuple

def count_nucleotides(seq: DNASequence) -> Dict[str, int]:
    return {base: seq.seq.count(base) for base in 'ATGC'}

def subsequence_indices(seq: DNASequence, subseq: str) -> Tuple[int, int]:
    start = seq.seq.index(subseq)
    return start, start + len(subseq)

5. **TypeVar** and **Generic** can be used to specify that the type of one value depends on another. Here's an example:

In [None]:
from typing import TypeVar, Generic

T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value

In this example, a Box can contain a value of any type, and the get method is guaranteed to return a value of the same type.
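
A brief usage sketch follows. It repeats the `Box` class so that it runs on its own, and the variable names are chosen for the example. The annotations `Box[int]` and `Box[str]` tell a type checker what `get()` returns in each case.

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class Box(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value

int_box: Box[int] = Box(42)
str_box: Box[str] = Box("ATGC")

print(int_box.get() + 1)      # 43   -> a type checker treats get() as int here
print(str_box.get().lower())  # atgc -> and as str here
```

As with all type hints, this is not enforced at runtime; the benefit comes from tools that read the annotations.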