# Problem 2: DNA Sequence Analysis

This problem is about strings and regular expressions. It has two (2) exercises worth nine (9) points.

In [None]:
import re # You'll need this module

## DNA Sequence Analysis

Your friend is a biologist who is studying a particular DNA sequence. The sequence is a string built from an alphabet of four possible letters, `A`, `G`, `C`, and `T`. Biologists refer to each of these letters as a _base_.

Here is an example of a DNA fragment as a string of bases.

In [None]:
dna_seq = 'ATGGCAATAACCCCCCGTTTCTACTTCTAGAGGAGAAAAGTATTGACATGAGCGCTCCCGGCACAAGGGCCAAAGAAGTCTCCAATTTCTTATTTCCGAATGACATGCGTCTCCTTGCGGGTAAATCACCGACCGCAATTCATAGAAGCCTGGGGGAACAGATAGGTCTAATTAGCTTAAGAGAGTAAATCCTGGGATCATTCAGTAGTAACCATAAACTTACGCTGGGGCTTCTTCGGCGGATTTTTACAGTTACCAACCAGGAGATTTGAAGTAAATCAGTTGAGGATTTAGCCGCGCTATCCGGTAATCTCCAAATTAAAACATACCGTTCCATGAAGGCTAGAATTACTTACCGGCCTTTTCCATGCCTGCGCTATACCCCCCCACTCTCCCGCTTATCCGTCCGAGCGGAGGCAGTGCGATCCTCCGTTAAGATATTCTTACGTGTGACGTAGCTATGTATTTTGCAGAGCTGGCGAACGCGTTGAACACTTCACAGATGGTAGGGATTCGGGTAAAGGGCGTATAATTGGGGACTAACATAGGCGTAGACTACGATGGCGCCAACTCAATCGCAGCTCGAGCGCCCTGAATAACGTACTCATCTCAACTCATTCTCGGCAATCTACCGAGCGACTCGATTATCAACGGCTGTCTAGCAGTTCTAATCTTTTGCCAGCATCGTAATAGCCTCCAAGAGATTGATGATAGCTATCGGCACAGAACTGAGACGGCGCCGATGGATAGCGGACTTTCGGTCAACCACAATTCCCCACGGGACAGGTCCTGCGGTGCGCATCACTCTGAATGTACAAGCAACCCAAGTGGGCCGAGCCTGGACTCAGCTGGTTCCTGCGTGAGCTCGAGACTCGGGATGACAGCTCTTTAAACATAGAGCGGGGGCGTCGAACGGTCGAGAAAGTCATAGTACCTCGGGTACCAACTTACTCAGGTTATTGCTTGAAGCTGTACTATTTTAGGGGGGGAGCGCTGAAGGTCTCTTCTTCTCATGACTGAACTCGCGAGGGTCGTGAAGTCGGTTCCTTCAATGGTTAAAAAACAAAGGCTTACTGTGCGCAGAGGAACGCCCATCTAGCGGCTGGCGTCTTGAATGCTCGGTCCCCTTTGTCATTCCGGATTAATCCATTTCCCTCATTCACGAGCTTGCGAAGTCTACATTGGTATATGAATGCGACCTAGAAGAGGGCGCTTAAAATTGGCAGTGGTTGATGCTCTAAACTCCATTTGGTTTACTCGTGCATCACCGCGATAGGCTGACAAAGGTTTAACATTGAATAGCAAGGCACTTCCGGTCTCAATGAACGGCCGGGAAAGGTACGCGCGCGGTATGGGAGGATCAAGGGGCCAATAGAGAGGCTCCTCTCTCACTCGCTAGGAGGCAAATGTAAAACAATGGTTACTGCATCGATACATAAAACATGTCCATCGGTTGCCCAAAGTGTTAAGTGTCTATCACCCCTAGGGCCGTTTCCCGCATATAAACGCCAGGTTGTATCCGCATTTGATGCTACCGTGGATGAGTCTGCGTCGAGCGCGCCGCACGAATGTTGCAATGTATTGCATGAGTAGGGTTGACTAAGAGCCGTTAGATGCGTCGCTGTACTAATAGTTGTCGACAGACCGTCGAGATTAGAAAATGGTACCAGCATTTTCGGAGGTTCTCTAACTAGTATGGATTGCGGTGTCTTCACTGTGCTGCGGCTACCCATCGCCTGAAATCCAGCTGGTGTCAAGCCATCCCCTCTCCGGGACGCCGCATGTAGTGAAACATATACGTTGCACGGGTTCACCGCGGTCCGTTCTGAGTCGACCAAGGACACAATCGAGCTCCGATCCGTACCCTCGACAAACTTGTACCCGACCCCCGGAGCTTGCCAGCTCCTCGGGTATCATGGAGCCTGTGGTTCATCGCGTCCGATATCAAACTTCGTCATGATAAAGTCCCCCCCTCGGGAGTACCAGAGAAGATGACTACTGAGTTGTGCGAT'
print("=== Sequence (Number of bases: {}) ===\n\n{}".format(len(dna_seq), dna_seq))

In this problem, you will help your friend analyze this sequence.

**Enzyme "scissors."** Your friend is interested in what will happen to the sequence if she uses certain "restriction enzymes" to cut it. Each enzyme scans the DNA sequence from left to right for a particular pattern and cuts the DNA wherever it finds a match.

**A biologist's notation.** Your friend does not know about regular expressions. Instead, she uses a [special notation](https://en.wikipedia.org/wiki/Nucleic_acid_sequence) that other biologists use to describe base patterns. The notation adds "extra letters," each of which stands for a set of bases.

For example, the special letter `N` denotes any base, i.e., any single occurrence of an `A`, `C`, `G`, or `T`. Therefore, when a biologist writes `ANT`, it means `AAT`, `ACT`, `AGT`, or `ATT`.

Here is the complete set of special letters:

* `R`: Either `G` or `A`
* `Y`: Either `T` or `C`
* `K`: Either `G` or `T`
* `M`: Either `A` or `C`
* `S`: Either `G` or `C`
* `W`: Either `A` or `T`
* `B`: Anything but `A` (i.e., `G`, `T`, or `C`)
* `D`: Anything but `C`
* `H`: Anything but `G`
* `V`: Anything but `T`
* `N`: Anything, i.e., `A`, `C`, `G`, or `T`
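To make the notation concrete, here is a small sketch that expands a pattern into every plain-base string it stands for. The `BIO_CODES` table and the `expand_bio` helper are hypothetical names introduced only for this illustration; the table simply restates the list above.

```python
from itertools import product

# Hypothetical table restating the list above: each letter maps to the bases it allows.
BIO_CODES = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': 'AG', 'Y': 'CT', 'K': 'GT', 'M': 'AC', 'S': 'CG', 'W': 'AT',
    'B': 'CGT', 'D': 'AGT', 'H': 'ACT', 'V': 'ACG', 'N': 'ACGT',
}

def expand_bio(pattern_bio):
    """Return every plain-base string that pattern_bio denotes (hypothetical helper)."""
    choices = [BIO_CODES[c] for c in pattern_bio]
    # product() picks one base from each position's choices, in every combination.
    return [''.join(bases) for bases in product(*choices)]

print(expand_bio('ANT'))  # ['AAT', 'ACT', 'AGT', 'ATT']
```

The number of expansions grows multiplicatively (`NNN` alone stands for 64 strings), which is one reason a regular expression is a more practical way to represent these patterns.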

**Exercise 0** (4 points). Given a string in the biologist's notation, complete the function `bio_to_regex(pattern_bio)` so that it returns an equivalent pattern in Python's regular expression language.

If your function is correct, then the following assertion should pass:

```python
  assert re.search(bio_to_regex('ANT'), 'AGATTA') is not None
```

That's because `ANT` matches `ATT`, which is contained in `AGATTA`.
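For intuition, here is the same check with `ANT` translated by hand, writing `N` as the character class `[ACGT]`. This hand translation is only an illustration; it is not a claim about the exact string `bio_to_regex` must return.

```python
import re

# 'ANT' written by hand as a regex: 'A', then any one base, then 'T'.
m = re.search(r'A[ACGT]T', 'AGATTA')
print(m.group(), m.span())  # ATT (2, 5)
```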

In [None]:
def bio_to_regex(pattern_bio):
    ### BEGIN SOLUTION
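    # A handy conversion table, to map bio letters to regex subpatterns.
    # Note: '[^A]' and '.' are exact here only because the input contains nothing
    # but A, C, G, and T; for other input, explicit classes such as '[CGT]' are stricter.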
    translation_table = {'R': '[AG]', 'Y': '[TC]', 'K': '[GT]', 'M': '[AC]', 'S': '[GC]', 'W': '[AT]',
                         'B': '[^A]', 'D': '[^C]', 'H': '[^G]', 'V': '[^T]', 'N': '.'}

    # Here is the most compact solution we came up with:
    translator = str.maketrans(translation_table)
    return pattern_bio.translate(translator)

    # Alternatively, this loop-based version also works. (It is unreachable here,
    # because the function has already returned above.)
    pattern_regex = ''
    for c in pattern_bio:
        if c in translation_table:
            pattern_regex += translation_table[c]
        else:
            pattern_regex += c
    return pattern_regex
    ### END SOLUTION

answer_bio_to_regex = bio_to_regex('ANT')
display(answer_bio_to_regex)
assert re.search(bio_to_regex('ANT'), 'AGATTA') is not None
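The compact solution relies on `str.maketrans`, which, when given a single dictionary, accepts one-character keys mapped to replacement strings of any length. A minimal standalone illustration, using a two-entry table chosen only for this example:

```python
# A tiny table for illustration; keys must be single characters.
table = str.maketrans({'N': '[ACGT]', 'R': '[AG]'})

# Characters not in the table (here 'A', 'T', and parentheses) pass through unchanged.
print('ANT'.translate(table))    # A[ACGT]T
print('RAN'.translate(table))    # [AG]A[ACGT]
print('(ANT)'.translate(table))  # (A[ACGT]T)
```

Because unmapped characters pass through, any regex syntax already in the input, such as parentheses, survives the translation.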

In [None]:
# Test cell: `exercise_0_test_0`

assert re.search(bio_to_regex('ANT'), 'AGATTA') is not None
assert set(re.findall(bio_to_regex('ANTAAT'), dna_seq)) == {'ATTAAT', 'ACTAAT'}
assert set(re.findall(bio_to_regex('GCRWTG'), dna_seq)) == {'GCGTTG', 'GCAATG'}
assert len(re.findall(bio_to_regex('CDCHA'), dna_seq)) == 18

print("\n(Passed first group of tests!)")

In [None]:
# Test cell: `exercise_0_test_1`
if False:  # Disabled: this loop generated the expected sets asserted below.
    for c in {'Y', 'K', 'M', 'S', 'B', 'D', 'V'}:
        from random import sample
        x = ''.join([sample('ACGT', 1)[0] for _ in range(2)])
        y = ''.join([sample('ACGT', 1)[0] for _ in range(2)])
        pattern = '{}{}{}'.format(x, c, y)
        ans = set(re.findall(bio_to_regex(pattern), dna_seq))
        print("assert set(re.findall(bio_to_regex('{}'), dna_seq)) == {}".format(pattern, ans))

assert set(re.findall(bio_to_regex('GABAT'), dna_seq)) == {'GACAT', 'GAGAT', 'GATAT'}
assert set(re.findall(bio_to_regex('GAVCA'), dna_seq)) == {'GACCA', 'GAACA'}
assert set(re.findall(bio_to_regex('TGYGG'), dna_seq)) == {'TGTGG', 'TGCGG'}
assert set(re.findall(bio_to_regex('GCKAA'), dna_seq)) == {'GCGAA'}
assert set(re.findall(bio_to_regex('ATSCA'), dna_seq)) == {'ATCCA'}
assert set(re.findall(bio_to_regex('GCMTT'), dna_seq)) == {'GCCTT', 'GCATT'}
assert set(re.findall(bio_to_regex('AGDCC'), dna_seq)) == {'AGTCC', 'AGACC'}

print("\n(Passed second set of tests!)")

**Restriction sites.** When an enzyme cuts the string, it does so at a specific location relative to the target pattern. This location is encoded as a _restriction site_.

A biologist specifies the restriction site with a special notation that embeds the cut in the pattern. For example, one enzyme has the restriction site `ANT|AAT`, where the vertical bar, `'|'`, shows where the enzyme splits the sequence. So, if the input DNA sequence were

```
   GCATAGTAATGTATTAATGGC
```

then there would be two matches:

```
   GCATAGTAATGTATTAATGGC
       ^^^^^^  ^^^^^^
       match!  match!
```

Furthermore, there would be two cuts, since this enzyme splits its pattern in the middle (between `ANT` and `AAT`):

```
   GCATAGT|AATGTATT|AATGGC
       ^^^ ^^^  ^^^ ^^^
```

That would result in three fragments: `GCATAGT`, `AATGTATT`, and `AATGGC`.
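Put another way, each cut lands where the part of the site to the left of `|` ends inside a match. A minimal sketch of that idea, assuming the site `ANT|AAT` has already been translated by hand into the two regex pieces `A[ACGT]T` and `AAT`; `cut_at_sites` is a hypothetical name used only for this illustration, not the function the exercise asks for:

```python
import re

def cut_at_sites(left_regex, right_regex, s):
    """Hypothetical helper: cut s between left_regex and right_regex at every
    non-overlapping match, scanning left to right."""
    site = re.compile('({})({})'.format(left_regex, right_regex))
    # m.end(1) is where the left piece's match ends, i.e., where the cut falls.
    cuts = [m.end(1) for m in site.finditer(s)]
    # Fragment boundaries: start of s, every cut, end of s.
    bounds = [0] + cuts + [len(s)]
    return [s[i:j] for i, j in zip(bounds, bounds[1:])]

# 'ANT|AAT' translated by hand: left piece 'A[ACGT]T', right piece 'AAT'.
print(cut_at_sites('A[ACGT]T', 'AAT', 'GCATAGTAATGTATTAATGGC'))
# ['GCATAGT', 'AATGTATT', 'AATGGC']
```

This version computes cut positions directly instead of inserting a marker character into the string; like any `re`-based scan, it only finds non-overlapping matches.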

**Exercise 1** (5 points). Complete the function, `sim_cuts(site_pattern, s)`, below. 

The first argument, `site_pattern`, is the biologist's restriction site pattern, e.g., `ANT|AAT`, where there may be an embedded cut. 

The second argument, `s`, is the DNA sequence to cut. The function should return the list of fragments in the order in which they appear in `s`.

For the preceding example,

```python
  sim_cuts('ANT|AAT', 'GCATAGTAATGTATTAATGGC') == ['GCATAGT', 'AATGTATT', 'AATGGC']
```

> **Note.** There are *two* test cells, below. Both must pass for full credit, but if only one passes, you'll at least get some partial credit.

In [None]:
def sim_cuts(site_pattern, s):
    r"""
    Simulate cutting the DNA sequence `s` with a restriction enzyme whose site
    is `site_pattern`, written in the biologist's notation with '|' marking
    each cut. Return the fragments in the order they appear in `s`.

    Strategy: use re.sub to insert a '|' at every cut point in `s`, then
    split the result on '|'. re.sub needs three components:

    1. The pattern to look for (regex_pattern). This is the regex version of
       site_pattern, produced with bio_to_regex from Exercise 0. For example,
       in 'ANT|AAT', 'N' can be any base, so the piece 'ANT' becomes a regex
       that matches 'A', then any single base, then 'T'.
    2. The replacement (repl_pattern), built from backreferences such as \1
       and \2. See the note after this cell for a detailed explanation of
       backreferences.
    3. The text to search, which is `s` itself.

    Because we don't know in advance what will be passed in, we cannot
    hard-code these components; we build them from the function's arguments:

    - Split site_pattern on '|' to get its pieces, e.g., 'ANT|AAT' gives
      ['ANT', 'AAT'].
    - Wrap each piece in parentheses so that it becomes a capturing group,
      join the groups, and translate the result with bio_to_regex. This is
      regex_pattern: for 'ANT|AAT', it matches an 'ANT' piece immediately
      followed by 'AAT'.
    - Join one backreference per group with '|', e.g., \1|\2. This is
      repl_pattern. When re.sub finds a match, it writes each group back
      unchanged, with a '|' between consecutive groups (that is, at each
      cut), and then continues scanning to the end of `s`.

    Finally, split the modified string on '|' to get the list of fragments.
    """
    ### BEGIN SOLUTION
    # Split the site pattern (not s) into its pieces on '|',
    # e.g., 'ANT|AAT' -> ['ANT', 'AAT'].
    cut_parts = site_pattern.split('|')

    # Wrap each piece in parentheses so it becomes a capturing group,
    # e.g., ['(ANT)', '(AAT)'].
    pure_bio_parts = ['(' + p + ')' for p in cut_parts]

    # Join the groups into a single bio pattern, e.g., '(ANT)(AAT)'.
    pure_bio_pattern = ''.join(pure_bio_parts)

    # Translate to regex. bio_to_regex leaves '(' and ')' untouched,
    # so the capturing groups survive the translation.
    regex_pattern = bio_to_regex(pure_bio_pattern)

    # Build one backreference per capturing group: [r'\1', r'\2', ...].
    repl_parts = [r'\{}'.format(i + 1) for i in range(len(cut_parts))]

    # Join the backreferences with '|', e.g., r'\1|\2', so that each match is
    # written back unchanged except for a '|' at every cut point.
    repl_pattern = '|'.join(repl_parts)

    # Insert the cut markers throughout s (non-overlapping matches, left to right).
    s_with_cuts = re.sub(regex_pattern, repl_pattern, s)

    # Split on the markers to get the fragments, in sequence order.
    return s_with_cuts.split('|')
    ### END SOLUTION

answer_sim_cuts = sim_cuts('ANT|AAT', 'GCATAGTAATGTATTAATGGC')
display(answer_sim_cuts)

#### Explanation of the variables in our code

1. In Python regular expressions, `\1`, `\2`, etc., are *backreferences*. They refer to the text matched by the corresponding capturing group in the regex pattern.

2. **Capturing groups.** These are in the variable `regex_pattern` in our code.

    * Capturing groups are created using parentheses, `( )`, in a regular expression.
    * The text matched by each group can be referenced later, either in the pattern itself or in a replacement string.
    * Groups are numbered from left to right, starting from 1.

3. **Backreferences.** These are in the variable `repl_pattern` in our code.

    * `\1` refers to the text captured by the first group.
    * `\2` refers to the text captured by the second group, and so on.
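A short standalone illustration of both ideas, using the `ANT|AAT` site translated by hand into the regex `(A[ACGT]T)(AAT)`. The hand translation is an assumption of this example; in the solution above, the pattern comes from `bio_to_regex`.

```python
import re

pattern = r'(A[ACGT]T)(AAT)'   # two capturing groups: group 1 and group 2
s = 'GCATAGTAATGTATTAATGGC'

# Each group captures its own part of the first match.
m = re.search(pattern, s)
print(m.group(1), m.group(2))  # AGT AAT

# In the replacement, \1 and \2 write back what each group matched,
# so every match is kept as-is with a '|' inserted between the groups.
print(re.sub(pattern, r'\1|\2', s))  # GCATAGT|AATGTATT|AATGGC
```

Splitting that last string on `'|'` yields exactly the three fragments from the earlier example.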

In [None]:
# Test cell: `exercise_1_test_0`

def check_sim_cuts(bio_pattern, s, true_cuts):
    print("\nChecking: '{}'...".format(bio_pattern))
    your_cuts = sim_cuts(bio_pattern, s)
    print("   Your result ({} fragments): {}".format(len(your_cuts), your_cuts))
    print("   True result ({}): {}".format(len(true_cuts), true_cuts))
    assert your_cuts == true_cuts, "Did not match!"
    print("   ==> Matched!")

# Check a simple case:
check_sim_cuts('ANT|AAT', 'GCATAGTAATGTATTAATGGC', ['GCATAGT', 'AATGTATT', 'AATGGC'])

print("\n(Passed first test of Exercise 1; two more to go in the next cell.)")

In [None]:
# Test cell: `exercise_1_test_1`

check_sim_cuts('ANT|AAT', dna_seq, ['ATGGCAATAACCCCCCGTTTCTACTTCTAGAGGAGAAAAGTATTGACATGAGCGCTCCCGGCACAAGGGCCAAAGAAGTCTCCAATTTCTTATTTCCGAATGACATGCGTCTCCTTGCGGGTAAATCACCGACCGCAATTCATAGAAGCCTGGGGGAACAGATAGGTCTAATTAGCTTAAGAGAGTAAATCCTGGGATCATTCAGTAGTAACCATAAACTTACGCTGGGGCTTCTTCGGCGGATTTTTACAGTTACCAACCAGGAGATTTGAAGTAAATCAGTTGAGGATTTAGCCGCGCTATCCGGTAATCTCCAAATTAAAACATACCGTTCCATGAAGGCTAGAATTACTTACCGGCCTTTTCCATGCCTGCGCTATACCCCCCCACTCTCCCGCTTATCCGTCCGAGCGGAGGCAGTGCGATCCTCCGTTAAGATATTCTTACGTGTGACGTAGCTATGTATTTTGCAGAGCTGGCGAACGCGTTGAACACTTCACAGATGGTAGGGATTCGGGTAAAGGGCGTATAATTGGGGACTAACATAGGCGTAGACTACGATGGCGCCAACTCAATCGCAGCTCGAGCGCCCTGAATAACGTACTCATCTCAACTCATTCTCGGCAATCTACCGAGCGACTCGATTATCAACGGCTGTCTAGCAGTTCTAATCTTTTGCCAGCATCGTAATAGCCTCCAAGAGATTGATGATAGCTATCGGCACAGAACTGAGACGGCGCCGATGGATAGCGGACTTTCGGTCAACCACAATTCCCCACGGGACAGGTCCTGCGGTGCGCATCACTCTGAATGTACAAGCAACCCAAGTGGGCCGAGCCTGGACTCAGCTGGTTCCTGCGTGAGCTCGAGACTCGGGATGACAGCTCTTTAAACATAGAGCGGGGGCGTCGAACGGTCGAGAAAGTCATAGTACCTCGGGTACCAACTTACTCAGGTTATTGCTTGAAGCTGTACTATTTTAGGGGGGGAGCGCTGAAGGTCTCTTCTTCTCATGACTGAACTCGCGAGGGTCGTGAAGTCGGTTCCTTCAATGGTTAAAAAACAAAGGCTTACTGTGCGCAGAGGAACGCCCATCTAGCGGCTGGCGTCTTGAATGCTCGGTCCCCTTTGTCATTCCGGATT',
 'AATCCATTTCCCTCATTCACGAGCTTGCGAAGTCTACATTGGTATATGAATGCGACCTAGAAGAGGGCGCTTAAAATTGGCAGTGGTTGATGCTCTAAACTCCATTTGGTTTACTCGTGCATCACCGCGATAGGCTGACAAAGGTTTAACATTGAATAGCAAGGCACTTCCGGTCTCAATGAACGGCCGGGAAAGGTACGCGCGCGGTATGGGAGGATCAAGGGGCCAATAGAGAGGCTCCTCTCTCACTCGCTAGGAGGCAAATGTAAAACAATGGTTACTGCATCGATACATAAAACATGTCCATCGGTTGCCCAAAGTGTTAAGTGTCTATCACCCCTAGGGCCGTTTCCCGCATATAAACGCCAGGTTGTATCCGCATTTGATGCTACCGTGGATGAGTCTGCGTCGAGCGCGCCGCACGAATGTTGCAATGTATTGCATGAGTAGGGTTGACTAAGAGCCGTTAGATGCGTCGCTGTACT',
 'AATAGTTGTCGACAGACCGTCGAGATTAGAAAATGGTACCAGCATTTTCGGAGGTTCTCTAACTAGTATGGATTGCGGTGTCTTCACTGTGCTGCGGCTACCCATCGCCTGAAATCCAGCTGGTGTCAAGCCATCCCCTCTCCGGGACGCCGCATGTAGTGAAACATATACGTTGCACGGGTTCACCGCGGTCCGTTCTGAGTCGACCAAGGACACAATCGAGCTCCGATCCGTACCCTCGACAAACTTGTACCCGACCCCCGGAGCTTGCCAGCTCCTCGGGTATCATGGAGCCTGTGGTTCATCGCGTCCGATATCAAACTTCGTCATGATAAAGTCCCCCCCTCGGGAGTACCAGAGAAGATGACTACTGAGTTGTGCGAT'])
check_sim_cuts('GCRW|TG', dna_seq, ['ATGGCAATAACCCCCCGTTTCTACTTCTAGAGGAGAAAAGTATTGACATGAGCGCTCCCGGCACAAGGGCCAAAGAAGTCTCCAATTTCTTATTTCCGAATGACATGCGTCTCCTTGCGGGTAAATCACCGACCGCAATTCATAGAAGCCTGGGGGAACAGATAGGTCTAATTAGCTTAAGAGAGTAAATCCTGGGATCATTCAGTAGTAACCATAAACTTACGCTGGGGCTTCTTCGGCGGATTTTTACAGTTACCAACCAGGAGATTTGAAGTAAATCAGTTGAGGATTTAGCCGCGCTATCCGGTAATCTCCAAATTAAAACATACCGTTCCATGAAGGCTAGAATTACTTACCGGCCTTTTCCATGCCTGCGCTATACCCCCCCACTCTCCCGCTTATCCGTCCGAGCGGAGGCAGTGCGATCCTCCGTTAAGATATTCTTACGTGTGACGTAGCTATGTATTTTGCAGAGCTGGCGAACGCGT',
 'TGAACACTTCACAGATGGTAGGGATTCGGGTAAAGGGCGTATAATTGGGGACTAACATAGGCGTAGACTACGATGGCGCCAACTCAATCGCAGCTCGAGCGCCCTGAATAACGTACTCATCTCAACTCATTCTCGGCAATCTACCGAGCGACTCGATTATCAACGGCTGTCTAGCAGTTCTAATCTTTTGCCAGCATCGTAATAGCCTCCAAGAGATTGATGATAGCTATCGGCACAGAACTGAGACGGCGCCGATGGATAGCGGACTTTCGGTCAACCACAATTCCCCACGGGACAGGTCCTGCGGTGCGCATCACTCTGAATGTACAAGCAACCCAAGTGGGCCGAGCCTGGACTCAGCTGGTTCCTGCGTGAGCTCGAGACTCGGGATGACAGCTCTTTAAACATAGAGCGGGGGCGTCGAACGGTCGAGAAAGTCATAGTACCTCGGGTACCAACTTACTCAGGTTATTGCTTGAAGCTGTACTATTTTAGGGGGGGAGCGCTGAAGGTCTCTTCTTCTCATGACTGAACTCGCGAGGGTCGTGAAGTCGGTTCCTTCAATGGTTAAAAAACAAAGGCTTACTGTGCGCAGAGGAACGCCCATCTAGCGGCTGGCGTCTTGAATGCTCGGTCCCCTTTGTCATTCCGGATTAATCCATTTCCCTCATTCACGAGCTTGCGAAGTCTACATTGGTATATGAATGCGACCTAGAAGAGGGCGCTTAAAATTGGCAGTGGTTGATGCTCTAAACTCCATTTGGTTTACTCGTGCATCACCGCGATAGGCTGACAAAGGTTTAACATTGAATAGCAAGGCACTTCCGGTCTCAATGAACGGCCGGGAAAGGTACGCGCGCGGTATGGGAGGATCAAGGGGCCAATAGAGAGGCTCCTCTCTCACTCGCTAGGAGGCAAATGTAAAACAATGGTTACTGCATCGATACATAAAACATGTCCATCGGTTGCCCAAAGTGTTAAGTGTCTATCACCCCTAGGGCCGTTTCCCGCATATAAACGCCAGGTTGTATCCGCATTTGATGCTACCGTGGATGAGTCTGCGTCGAGCGCGCCGCACGAATGTTGCAA',
 'TGTATTGCATGAGTAGGGTTGACTAAGAGCCGTTAGATGCGTCGCTGTACTAATAGTTGTCGACAGACCGTCGAGATTAGAAAATGGTACCAGCATTTTCGGAGGTTCTCTAACTAGTATGGATTGCGGTGTCTTCACTGTGCTGCGGCTACCCATCGCCTGAAATCCAGCTGGTGTCAAGCCATCCCCTCTCCGGGACGCCGCATGTAGTGAAACATATACGTTGCACGGGTTCACCGCGGTCCGTTCTGAGTCGACCAAGGACACAATCGAGCTCCGATCCGTACCCTCGACAAACTTGTACCCGACCCCCGGAGCTTGCCAGCTCCTCGGGTATCATGGAGCCTGTGGTTCATCGCGTCCGATATCAAACTTCGTCATGATAAAGTCCCCCCCTCGGGAGTACCAGAGAAGATGACTACTGAGTTGTGCGAT'])

print("\n(Passed the remaining tests of Exercise 1!)")

**Fin!** If you've reached this point and all tests above pass, your biologist friend thanks you, and you are ready to submit your solution to this problem. Don't forget to save your work before submitting.

Portions of this problem were inspired by a fun book called [Python for Biologists](https://pythonforbiologists.com/python-books).