# Classes and Objects

## Simple classes and objects: Atom and Molecule revisited

Here are the `Atom` and `Molecule` classes from the first part of the session, but with proper docstrings. How close this docstring style is to the optimal approach is unclear (e.g. some sources clearly put more information about each method in the class docstring than I have here):

In [3]:
class Atom:
    """
    A simple class for handling atomic coordinates.
    
    Object attributes:
    element (str): Standard abbreviation for atomic element.
    x (float): x coordinate (position of atom).
    y (float): y coordinate (position of atom).
    z (float): z coordinate (position of atom).

    Object methods:
    __init__(self, element, x=0, y=0, z=0): Create an Atom object.
    coordinates(self): Return x,y,x coordinates of atom.
    """

    def __init__(self, element, x=0, y=0, z=0):
        self.element = element
        (self.x, self.y, self.z) = (x, y, z)

    def coordinates(self):
        """ Return x,y,z coordinates of atom as tuple. """
        return (self.x, self.y, self.z)

class Molecule:
    """
    A simple class for handling inorganic molecules.
    
    Object attributes:
    mol_id (str): Identifier (e.g. name of molecule).
    formula (str): Chemical formula of molecule.
    atoms (list): List of Atom objects.
 
    Object methods: 
    __init__(self, mol_id, formula='', atom_list=None): Create a Molecule object.
    """

    def __init__(self, mol_id, formula='', atom_list=None):
        self.mol_id = mol_id
        self.formula = formula
        if atom_list is None:
            self.atoms = []
        else:
            self.atoms = atom_list

Here's a small file containing the atoms from two amino-acid residues (an aspartic acid and an asparagine) together with their coordinates:

In [1]:
%%bash
head -5 ../data/atoms.txt

N:      23.036  -7.827  85.247  
C:      21.861  -7.231  84.536  
C:      22.278  -6.291  83.393  
O:      21.585  -6.193  82.376  
C:      20.964  -6.488  85.538  


Now I'm going to create two `Molecule` objects from this data. Given the limited functionality of the `Molecule` class (it doesn't have any methods), I'm going to simply count the number of hydrogens in each:

In [18]:
atom_fname = '../data/atoms.txt'

with open(atom_fname, 'r') as f:
    atom_pos_list = f.read().splitlines()

atom_object_list = []
for pos in atom_pos_list:
    (atom, coords) = pos.split(':')
    atom = Atom(atom, coords.split())
    atom_object_list.append(atom)

asp = Molecule('aspartic acid', 'C4H7NO4', atom_object_list[:17])
asn = Molecule('asparagine', 'C4H8N2O3', atom_object_list[17:])

# Count the number of hydrogens in each molecule
h_count = 0
for a in asp.atoms:
    if a.element == 'H':
        h_count += 1
print(asp.formula, 'has', h_count, 'hydrogens')

h_count = 0
for a in asn.atoms:
    if a.element == 'H':
        h_count += 1
print(asn.formula, 'has', h_count, 'hydrogens')

C4H7NO4 has 7 hydrogens
C4H8N2O3 has 8 hydrogens


## Inheritance with overriding: OrganicMolecule

Here is an `OrganicMolecule` class. It differs from the one in my slides as follows: 

- There is a class variable (`global_carbon_count`) to maintain a global count of carbon atoms in all molecules. The count increases when a new `OrganicMolecule` object is created, and descreases when an `OrganicMolecule` object is deleted. 

- Calculating the object carbon count and increasing the class carbon count are handled by the `__init__` method, which means the base class `__init__` is overridden. But the existing functionality in the `__init__` method of the base class, `Molecule`, is still needed. Rather than copy and paste, the base method is invoked within the derived `__init__`, which involves using the Python builtin function `super()`. 

- Decreasing the class carbon count is carried out by a new `__del__` method.

- It has a much better docstring! (Documenting derived classes is quite a big topic; I've chosen a fairly simple approach.)

In [19]:
class OrganicMolecule(Molecule):
    """
    A simple class for handling organic molecules.
    
    Object attributes inherited from base class:
    mol_id (str): Identifier (e.g. name of molecule).
    formula (str): Chemical formula of molecule.
    atoms (list): List of Atom objects.
    
    Object attribute:
    n_carbons (int): Number of carbon atoms in molecule.
    Class attribute:
    global_carbon_count (int): Global count of carbon atoms in all molecules.
 
    Methods: 
    __init__(self, mol_id, formula='', atom_list=None): Create an OrganicMolecule object.
    carbon_count(self): Return count of molecule's carbon atoms.
    """

    global_carbon_count = 0
    
    def __init__(self, mol_id, formula='', atom_list=None):
        super().__init__(mol_id, formula, atom_list)       # This is a key line!
        self.n_carbons = 0
        for a in self.atoms:
            if a.element == 'C':
                self.n_carbons += 1
        OrganicMolecule.global_carbon_count += self.n_carbons

    def __del__(self):
        OrganicMolecule.global_carbon_count -= self.n_carbons
        
    def carbon_count(self):
        """Return count of molecule's carbon atoms."""
        return self.n_carbons


Now I create two glycine residues and test the `carbon_count()` method and the class variable `global_carbon_count`: 

In [20]:
my_atoms = [Atom('C'), Atom('C'), Atom('H'), 
            Atom('H'), Atom('H'), Atom('H'), 
            Atom('H'), Atom('N'), Atom('O'), 
            Atom('O')]

gly1 = OrganicMolecule('glycine A', 'C2H5NO2', my_atoms)
gly2 = OrganicMolecule('glycine B', 'C2H5NO2', my_atoms)
print(gly1.mol_id, ': carbons = ', gly1.carbon_count())
print(gly2.mol_id, ': carbons = ', gly2.carbon_count())

print('Carbons in two gylcines:', OrganicMolecule.global_carbon_count)
del gly1
print('Carbons in one gylcine:', OrganicMolecule.global_carbon_count)

glycine A : carbons =  2
glycine B : carbons =  2
Carbons in two gylcines: 4
Carbons in one gylcine: 2


## Operator overloading: Atom and Molecule revisted (again)

Here I've redefined the `Atom` and `Molecule` classes to incorporate operator overloading, as follows:

- atom + atom
- molecule + atom
- molecule + molecule
- molecule += atom
- molecule += molecule

Notice that in class `Molecule`, both `__add__` and `__iadd__` accept either `Atom` and `Molecule` objects as arguments: they check which of these objects has been passed using the `isinstance()` function, and vary their behaviour accordingly. 

(I've also reverted to minimalistic docstrings...) 

In [3]:
class Atom:
    """A simple class for handling inorganic molecules."""
    def __init__(self, element, x=0, y=0, z=0):
        self.element = element
        (self.x, self.y, self.z) = (x, y, z)

    def __add__(self, other):
        return Molecule('', self.element + other.element, 
                        [self, other])
    
class Molecule:
    """A simple class for handling inorganic molecules."""
    def __init__(self, mol_id, formula='', atom_list=None):
        self.mol_id = mol_id
        self.formula = formula
        if atom_list is None:
            self.atoms = []
        else:
            self.atoms = atom_list
            
    def __add__(self, other):
        if isinstance(other, Atom): # if the other object you're adding is an Atom
            return Molecule(self.mol_id, 
                        self.formula + other.element, 
                        self.atoms.append(other))
        elif isinstance(other, Molecule): # if the other object is a Molecule
            return Molecule(self.mol_id + ' + ' + other.mol_id, 
                        self.formula + other.formula, 
                        self.atoms.extend(other.atoms)) # extend works the same as append here?
        else:
            return None

    def __iadd__(self, other):
        if isinstance(other, Atom):
            self.formula += other.element
            self.atoms.append(other)
            return self # don't need to define as a molecule like with '+', because self is already Molecule class
        elif isinstance(other, Molecule):
            self.mol_id += ' + ' + other.mol_id
            self.formula += other.formula
            self.atoms.extend(other.atoms)
            return self
        else:
            return self

In [25]:
help(list.extend)

Help on method_descriptor:

extend(self, iterable, /)
    Extend list by appending elements from the iterable.



Here are a few examples to test the behaviour of the overloaded operators: 

In [4]:
atom1 = Atom('C')
atom2 = Atom('O')
mol1 = atom1 + atom2
print('Atom + Atom:', mol1.formula)

atom3 = Atom('H')
mol2 = mol1 + atom3
print('Molecule + Atom:', mol2.formula)

atom4 = Atom('C')
mol2 += atom4
print('Molecule += Atom:', mol2.formula)

mol1 += mol2
print('Molecule += Molecule:', mol1.formula)

Atom + Atom: CO
Molecule + Atom: COH
Molecule += Atom: COHC
Molecule += Molecule: COCOHC


## Creating your own simple DnaBase and DnaSeq objects

If you have the time and inclination, create your own simple `DnaBase` and `DnaSeq` classes. They should have the following instance variables and methods:

`DnaBase`:
- `name`: full name of base (adenine, cytosine, thymine, or guanine)
- `letter()`: returns single letter code for base (A, C, T or G). 

`DnaSeq`:
- `seq_id`: DNA sequence identifer 
- `bases`: list of `DnaBase` objects
- `seq()`: returns the DNA sequence as a string.

In other respects, how you decide to implement the classes is up to you (e.g. with `__init__` methods).

In [144]:
class DnaBase:
    """
    Class to define DNA Bases
    
    ---- Class Attributes ----
    base_codes (dict): Key: full name of DNA base. Value: corresponding single letter code for DNA base.
    
    ---- Class Methods ----
    None
    
    ---- Object Attributes ----
    name (str): Full name of base (one of: Adenine, Guanine, Cytosine, or Thymine).
    
    ---- Object Methods ----
    __init__(self, name)
    letter (self)
    """
    base_codes = {"adenine" :"A",
                  "guanine" :"G",
                  "cytosine":"C",
                  "thymine" :"T"}
    
    def __init__(self, name: str) -> str:
        """Creates instance of DnaBase object"""
        self.name = name.lower()
    
    def letter(self: str) -> str:
        """Checks if given name is present in dictionary of base names
        Returns single letter base as a string if true (one of: A, C, T or G)."""
        return DnaBase.base_codes.get(self.name)

    
class DnaSeq:
    """
    Class to define a DNA Sequence
    
    ---- Class Attributes ----
    None
    
    ---- Class Methods ----
    None
    
    ---- Object Attributes ----
    seq_id (str): DNA sequence identifier
    bases (list): A list of bases, of class DnaBase

    ---- Object Methods ---- 
    __init__(self, seq, bases = None)
    seq(self)
    """
    
    def __init__(self, seq_id, bases = None) -> str:
        """Creates instance of DnaSeq object, including seq_id and bases attributes for the object"""
        self.seq_id = seq_id
        if bases == None:
            self.bases = []
        else:
            self.bases = bases
            
    def seq(self: str) -> str:
        """Returns the DNA sequence as a string"""
        try:
            return "".join(self.bases)
        except:
            return "****ERROR: Non-base found in sequence, please check bases_list attribute****"

Having created the classes, create a few `DnaBase` objects, use them to create a `DnaSeq` object, then print out its sequence:

In [151]:
a = DnaBase("Adenine")
a = a.letter()

g = DnaBase("Guanine")
g = g.letter()

c = DnaBase("Cytosine")
c = c.letter()

t = DnaBase("Thymine")
t = t.letter()

bases_list = [a,c,g,c,t,t,g,t,t,t,a,c,c,g,g,c,c,c,c,g,c,a,t,g]

sequence = DnaSeq("Sequence_00001", bases_list)
print(sequence.seq_id, "is:", sequence.seq())

help(DnaBase)
help(DnaSeq)

Sequence_00001 is: ACGCTTGTTTACCGGCCCCGCATG
Help on class DnaBase in module __main__:

class DnaBase(builtins.object)
 |  DnaBase(name: str) -> str
 |  
 |  Class to define DNA Bases
 |  
 |  ---- Class Attributes ----
 |  base_codes (dict): Key: full name of DNA base. Value: corresponding single letter code for DNA base.
 |  
 |  ---- Class Methods ----
 |  None
 |  
 |  ---- Object Attributes ----
 |  name (str): Full name of base (one of: Adenine, Guanine, Cytosine, or Thymine).
 |  
 |  ---- Object Methods ----
 |  __init__(self, name)
 |  letter (self)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, name: str) -> str
 |      Creates instance of DnaBase object
 |  
 |  letter(self: str) -> str
 |      Checks if given name is present in dictionary of base names
 |      Returns single letter base as a string if true (one of: A, C, T or G).
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |    

In [133]:
help(AssertionError)

Help on class AssertionError in module builtins:

class AssertionError(Exception)
 |  Assertion failed.
 |  
 |  Method resolution order:
 |      AssertionError
 |      Exception
 |      BaseException
 |      object
 |  
 |  Methods defined here:
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from BaseException:
 |  
 |  __delattr__(self, name, /)
 |      Implement delattr(self, name).
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __reduce__(...)
 |      Helper for pickle.
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  __s