# <span style="color:teal;">CIS 211 Live Coding Exercise</span>

* demonstrate classes and inheritance
* discuss testing and debugging strategies

###  <span style="color:teal">Background</span>

Cells in animals and plants have **chromosomes** that carry genetic information.

Chromosomes are long strands of DNA.  If we zoom in close enough to see the double-helix structure, we'll see four different kinds of molecules (sometimes called **bases**) connecting the strands:
* adenine
* cytosine
* guanine
* thymine

A base on one strand is always paired with its **complement** on the other strand:

$$
\mathtt{A} \Longleftrightarrow \mathtt{T}
$$

$$
\mathtt{C} \Longleftrightarrow \mathtt{G}
$$

![dna](http://pages.uoregon.edu/conery/CIS211/Inheritance/chromosome_lg.jpg)

In bioinformatics, we represent strands of DNA by strings containing just four letters:
* `A` = adenine
* `C` = cytosine
* `G` = guanine
* `T` = thymine
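
As a minimal sketch of this representation, a strand can be an ordinary Python string and the pairing rule a dictionary. The names `strand` and `COMPLEMENT` are illustrative assumptions, not something the exercise prescribes.

```python
# Illustrative names only: the exercise does not prescribe them.
strand = 'GATTACA'

# Base-pairing rule from the Background section: A <-> T, C <-> G.
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Look up the partner of each base on the opposite strand.
partners = ''.join(COMPLEMENT[base] for base in strand)
print(partners)   # CTAATGT
```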

![dna](http://pages.uoregon.edu/conery/CIS211/Inheritance/transcription_lg.jpg)

A **gene** is a short segment of DNA
* the part of the chromosome that contains a gene is a **coding sequence**

A complex process called **transcription** copies the information in a coding sequence
* the transcripts are used by the cell to manufacture proteins used throughout the body

###  <span style="color:teal">Part 1: &nbsp; DNA Class</span>

Define a class named DNA that will represent a DNA sequence.  Each DNA object should have two attributes:  a name and the sequence letters, both of which will be passed to the constructor.

The class should have two "getters" called `name` and `chars` that return the sequence name and sequence letters.  The representation string should show the sequence using a format called FASTA:  a greater-than sign and the name on the first line, and the sequence characters on the second line:
```
>>> s = DNA('tiny sequence', 'ACGTTGCA'); print(s)
```
should produce
```
> tiny sequence
ACGTTGCA
```
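
One possible way to build that two-line string is sketched below. The helper name `fasta` is hypothetical; in your class the same expression could be what the representation method returns. The space after `>` follows the example output above.

```python
def fasta(name, chars):
    """Return a two-line FASTA-style string (hypothetical helper)."""
    # First line: '> ' plus the name; second line: the sequence letters.
    return '> ' + name + '\n' + chars

print(fasta('tiny sequence', 'ACGTTGCA'))
```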

There should be a method named `splice` that removes part of the sequence. A call of the form
```
>>> s.splice(i,j)
```
should remove the bases at locations `i` through `j`-1 from `s`.
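
The index arithmetic can be sketched on a plain string. The function name `spliced` is hypothetical, and unlike the `splice` method described above it returns a new string instead of changing an object in place.

```python
def spliced(chars, i, j):
    """Return chars without the bases at positions i through j-1."""
    # Keep everything before i and everything from j onward.
    return chars[:i] + chars[j:]

print(spliced('AAAGGGGAAA', 3, 7))   # AAAAAA
```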

Finally, write a method named `revcomp` that creates the "reverse complement" of a sequence.  If we call
```
>>> s.revcomp()
```
we should get back a new DNA object.  The name of the new object will be the name of `s` with `_rc` appended.  To make the sequence characters in the new object, reverse the order of the characters in `s` and replace each character with its complement.  For example, the reverse complement of `AATC` is `GATT`.
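
A minimal sketch of the string computation is shown here as a standalone function. The names `COMPLEMENT` and `reverse_complement` are assumptions for illustration; the `revcomp` method itself still needs to wrap the result in a new object with the `_rc` name.

```python
# Base-pairing rule: A <-> T, C <-> G (illustrative name).
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def reverse_complement(chars):
    """Walk the string backwards, replacing each base with its complement."""
    return ''.join(COMPLEMENT[base] for base in reversed(chars))

print(reverse_complement('AATC'))   # GATT
```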

**Extra Credit** &nbsp; Give DNA sequences many of the same operations defined for lists and strings.  For example, if `s` is a DNA object, users should be able to
* access individual letters using the index operator, _e.g._ `s[0]`, `s[-1]`
* get a subsequence using a slice operator:  `s[i:j]` (which should return a string)
* add characters using `s.insert` or `s.append`
* delete characters using `del` or `s.remove`
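
One way to get most of these operations is to inherit them from `list` and override only what needs to differ. The sketch below uses a toy class with the hypothetical name `Letters`; it is not the `DNA` class, and it only shows how a slice can come back as a string.

```python
class Letters(list):
    """Toy list subclass (hypothetical name) whose slices come back as strings."""

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # A slice of a list is a list; join it so callers get a string.
        if isinstance(index, slice):
            return ''.join(result)
        return result

t = Letters('TACTGCCTAGT')   # list() splits the string into single letters
print(t[0], t[-1], t[4:7])

# append, insert, del, and remove are inherited unchanged from list.
del t[0]
t.remove('G')                # removes the first 'G'
print(''.join(t))
```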

##### <span style="color:red">Code:</span>

In [None]:
class DNA(list):
    # YOUR CODE HERE
    raise NotImplementedError()
    

##### <span style="color:red">Tests:</span>

Use the following code cell as a "sandbox" if you want to do your own tests. You can add additional cells here if you want.

##### <span style="color:red">Autograder Test Cells:</span>

In [None]:
s1 = DNA('test1', 'GATTACA')
assert isinstance(s1,DNA)
assert s1.name() == 'test1'
assert s1.chars() == 'GATTACA'
assert repr(s1) == '> test1\nGATTACA'

In [None]:
s2 = DNA('test2', 'AAAGGGGAAA')
s2.splice(3,7)
assert len(s2.chars()) == 6

In [None]:
s3 = DNA('test3', 'ACCGGGTTTT')
s4 = s3.revcomp()
assert isinstance(s4,DNA)
assert s4.name() == 'test3_rc'
assert s4.chars() == 'AAAACCCGGT'

In [None]:
s4 = DNA('x1', 'TACTGCCTAGT')
assert len(s4) == 11
assert s4[0] == 'T'
assert s4[4:7] == 'GCC'

In [None]:
s5 = DNA('x2', 'AAA')
s5.append('TTT')
assert s5.chars() == 'AAATTT'
s5.insert(3,'CCC')
assert s5.chars() == 'AAACCCTTT'

###  <span style="color:teal">Part 2: &nbsp; CDS Class</span>

Define a new class called CDS (for "coding sequence").  A coding sequence is a strand of DNA that carries codes that will be translated into protein sequences.  These codes are 3-base sequences called **codons**.

CDS sequences will be created just like DNA sequences, by passing a name and sequence letters to the constructor.  CDS objects should support all the operations defined for DNA (`name`, `chars`, `splice`, and `revcomp`).
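
The mechanism that makes this possible is inheritance: a subclass gets its parent's constructor and methods without redefining them. The sketch below uses hypothetical stand-in classes `Strand` and `CodingStrand` (and a made-up method `length_in_codons`) purely to show the pattern.

```python
class Strand:
    """Hypothetical stand-in for a base class; not the DNA class itself."""

    def __init__(self, name, chars):
        self._name = name
        self._chars = chars

    def name(self):
        return self._name

    def chars(self):
        return self._chars


class CodingStrand(Strand):
    """Inherits the constructor and getters; adds one method of its own."""

    def length_in_codons(self):
        # Number of complete 3-letter groups in the sequence.
        return len(self._chars) // 3


c = CodingStrand('demo', 'ATGACGTAA')
print(c.name(), c.chars(), c.length_in_codons())
```

Because `CodingStrand` defines no `__init__`, creating one runs the constructor inherited from `Strand`, and `isinstance(c, Strand)` is true.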

The class should also have a method named `codons`.  If `s` is a CDS, a call to `s.codons()` should return the list of 3-letter substrings of `s`.  If the length of the coding sequence is not divisible by 3, the last substring should hold the leftover characters.  Example:
```
>>> c1 = CDS('cds1', 'GATTACA')
>>> c1.codons()
['GAT', 'TAC', 'A']
```
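
The chunking step can be sketched on a plain string by stepping through the start positions 0, 3, 6, and so on. The function name `chunks_of_three` is hypothetical; a slice that runs past the end of the string simply returns the leftover characters.

```python
def chunks_of_three(chars):
    """Split a string into consecutive 3-letter pieces (last may be shorter)."""
    return [chars[i:i + 3] for i in range(0, len(chars), 3)]

print(chunks_of_three('GATTACA'))   # ['GAT', 'TAC', 'A']
```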

##### <span style="color:red">Code:</span>

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

##### <span style="color:red">Tests:</span>

Use the following code cell as a "sandbox" if you want to do your own tests. You can add additional cells here if you want.

##### <span style="color:red">Autograder Test Cells:</span>

In [None]:
c1 = CDS('cds1', 'AACCGGTT')
assert isinstance(c1,CDS)
assert isinstance(c1,DNA)
assert c1.name() == 'cds1'
assert c1.chars() == 'AACCGGTT'

In [None]:
c2 = CDS('cds2', 'ATGACGTAA')
assert c2.codons() == ['ATG', 'ACG', 'TAA']

In [None]:
c3 = CDS('cds3', 'GATTACA')
assert c3.codons() == ['GAT', 'TAC', 'A']