# Midterm exam 1

There are 4 questions, each worth 25 points. Write Python code to solve each question.

Points will be deducted for 

- Functions or classes without `docstrings`
- Grossly inefficient or redundant code
- Excessively verbose code
- Use of *magic* numbers

Partial credit may be given for incomplete or wrong answers but not if you do not attempt the question.

You should only have this notebook tab open during the exam and stay on the same notebook throughout. You may use built-in help, accessed via `?foo`, `foo?` or `help(foo)`.

**IMPORTANT**

- This is a **closed book** exam meant to evaluate fluency in Python
- Use a stopwatch to record the number of minutes you took to complete the exam in the cell below **honestly**. 1 point will be deducted for every 2 minutes (or part thereof) beyond 75 minutes, so if you take 90 minutes to complete the exam, 8 points will be deducted.
- Upload the notebook to Sakai when done

**Honor Code**: You agree to follow the Duke Honor code when taking this exam.

**Time taken**

Time: xx mins

**1**. (25 points)

Find the number of times `CATCAT` appears in the file `seq.txt`.

- Count overlapping occurrences - i.e. `CATCATCAT` should count as 2 occurrences.
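Python's built-in `str.count` only counts non-overlapping occurrences, so on its own it would undercount here. A minimal sketch of an overlapping count that restarts each `str.find` search one character after the previous match; the helper name `count_overlapping` is hypothetical:

```python
def count_overlapping(text, target):
    """Count occurrences of target in text, including overlapping ones."""
    count = 0
    start = text.find(target)
    while start != -1:
        count += 1
        # Resume one character past the previous match start so overlaps are found
        start = text.find(target, start + 1)
    return count


print('CATCATCAT'.count('CATCAT'))               # non-overlapping only
print(count_overlapping('CATCATCAT', 'CATCAT'))  # overlapping
```

The regex lookahead solution below achieves the same effect, because a zero-width `(?=...)` match does not consume characters.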

In [218]:
import numpy as np
from collections import Counter
import re
np.random.seed(123)

In [19]:
# Read in file
with open('seq.txt') as f:
    seq = f.read()

In [222]:
# Initialize counter of 'CATCAT' instances
counter = 0

# Loop through each window of len(target) letters
target = 'CATCAT'
for index in range(len(seq) - len(target) + 1):
    if seq[index:index + len(target)] == target:
        counter += 1

# Display result        
counter

CPU times: user 32 ms, sys: 8 ms, total: 40 ms
Wall time: 39.4 ms


In [223]:
# Additional option from solutions
len(re.findall('(?=(CATCAT))', seq))

CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 2.58 ms


141

In [224]:
# Additional option from solutions
count = 0
for x in zip(seq, seq[1:], seq[2:], seq[3:], seq[4:], seq[5:]):
    if ''.join(x) == 'CATCAT':
        count += 1
count

CPU times: user 32 ms, sys: 0 ns, total: 32 ms
Wall time: 29.4 ms


**2**. (25 points)

Suppose you had two decks of cards, each numbered from 1 to 1,000. We define a *match* to occur if the cards in the same position in both decks have the same number. For example, if deck 1 is [1,3,2,4] and deck 2 is [3,4,2,1], there is a single match at position 3 for the card with value 2.

Assuming the cards in each set are randomly shuffled, use 100,000 simulations to estimate

- the expected number of matches (this should be an integer)
- the probability of finding at least one match

Hint: You can use `np.random.permutation`
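Because both decks are independent uniform shuffles, the number of matches has the same distribution as the number of positions where a single shuffled deck agrees with the deck in its original order, so one shuffle per simulation is enough. A minimal sketch under that observation, using NumPy's `Generator` API (`np.random.default_rng`) instead of the legacy `np.random.seed`; the function name `simulate_matches` and the seed are assumptions for illustration:

```python
import numpy as np


def simulate_matches(n_cards=1000, n_sims=100_000, seed=123):
    """Estimate the expected number of matches and P(at least one match).

    Compares one shuffled deck against the deck in sorted order, which has
    the same match distribution as comparing two independently shuffled decks.
    """
    rng = np.random.default_rng(seed)
    reference = np.arange(n_cards)
    matches = np.array([np.sum(rng.permutation(n_cards) == reference)
                        for _ in range(n_sims)])
    return matches.mean(), np.mean(matches > 0)


expected, p_any = simulate_matches(n_sims=10_000)
print(f"Expected matches: {expected:.3f}, P(at least one): {p_any:.3f}")
```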

In [38]:
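# Number of simulations, cards per deck, and an array for the number of matches in each simulation
n_sims = 100000
n_cards = 1000
matches = np.zeros(n_sims)

# Perform the simulations (permutations hold 0..n_cards-1; the labels do not affect matching)
for i in range(n_sims):
    deck1 = np.random.permutation(n_cards)
    deck2 = np.random.permutation(n_cards)
    matches[i] = np.sum(deck1 == deck2)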

In [229]:
# Find most common number of matches
print("Most common number of matches: " + str(int(Counter(matches).most_common(1)[0][0])))

# Probability of having at least one match
print("Probability of at least one match: " + str(np.mean(matches > 0)))

# Find expected number of matches
print("Expected number of matches: " + str(int(round(np.mean(matches)))))

Most common number of matches: 0
Probability of at least one match: 0.62833
Expected number of matches: 1
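These simulated values can be checked against the exact results. Write $X = \sum_{i=1}^{n} I_i$, where $I_i$ indicates a match at position $i$ and $n = 1000$. By linearity of expectation and inclusion-exclusion (the derangement formula):

$$
E[X] = \sum_{i=1}^{n} P(I_i = 1) = n \cdot \frac{1}{n} = 1,
\qquad
P(X \ge 1) = 1 - \sum_{k=0}^{n} \frac{(-1)^k}{k!} \approx 1 - e^{-1} \approx 0.632
$$

This is close to the simulated 0.62833. It also explains why the most common count (0) differs from the expected count (1): $P(X = 0)$ and $P(X = 1)$ are both approximately $e^{-1} \approx 0.368$, with $P(X = 0)$ larger by only $1/n!$.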


**3**. (25 points)

One way to find a root (zero) of a function between two points $a$ and $b$ is to bisect the interval $(a, b)$ (find its midpoint $c$), determine whether the root lies in $(a, c)$ or $(c, b)$, and repeat the bisection on that half until the function value at the midpoint is sufficiently close to zero.

Write a bisection function with signature `bisect(f, a, b, tol)` and use it to find the square root of 2 given $a=0, b=2$. Stop when the function evaluated at the bisected point is within $10^{-6}$ of 0.

- Hint 1: There is a root between $a$ and $b$ if $f(a)$ and $f(b)$ have opposite signs
- Hint 2: Think about what the function $f$ should be
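Following the two hints, a natural choice is $f(x) = x^2 - 2$, whose positive root is $\sqrt{2}$; the bracket is valid because $f(0) = -2 < 0$ and $f(2) = 2 > 0$. If $c_n$ is the midpoint after the bracket $(0, 2)$ has been halved $n$ times, the bracket has width $2^{1-n}$ and still contains $\sqrt{2}$, so $|c_n - \sqrt{2}| \le 2^{-n}$ and, since $0 \le c_n \le 2$,

$$
|f(c_n)| = |c_n - \sqrt{2}|\,(c_n + \sqrt{2}) \le 2^{-n}\,(2 + \sqrt{2}) \le 10^{-6}
\quad\text{whenever}\quad
n \ge \log_2\!\left((2 + \sqrt{2}) \times 10^{6}\right) \approx 21.7
$$

So at most 22 halvings are needed (the stopping test may trigger earlier), which also means recursion depth is not a concern for this problem.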

In [232]:
def bisect(f, a, b, tol):
    """Find a root of f in (a, b) by recursive bisection.

    Assumes f(a) and f(b) have opposite signs; returns the midpoint c
    once |f(c)| <= tol.
    """
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")

    # Find bisected point and check f(c) against tolerance
    c = (a + b) / 2
    if abs(f(c)) <= tol:
        return c

    # Recurse into the half whose endpoints have opposite signs
    if f(a) * f(c) < 0:
        return bisect(f, a, c, tol)
    return bisect(f, c, b, tol)

In [233]:
# Find the square root of 2 as the positive root of f(x) = x**2 - 2, given a = 0 and b = 2
bisect(f=lambda x: x**2 - 2, a=0, b=2, tol=1e-6)

1.4142136573791504

In [230]:
# Another option from solutions
def bisect(f, a, b, tol=1e-6):
    """Bisection to find roots of f given brackets (a, b)."""
    
    c = (a + b) / 2
    while np.abs(f(c)) > tol:
        if f(a) * f(c) < 0:
            b = c
        else:
            a = c
        c = (a + b) / 2
    return c

In [231]:
bisect(lambda x: 2-x**2, 0, 2)

1.4142136573791504
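The iterative version above stops only when $|f(c)|$ is small, so a bracket without a sign change (Hint 1), or a tolerance below what floating point can reach, would make it loop forever. A minimal defensive sketch that checks the bracket and caps the number of iterations; the name `bisect_safe`, the `max_iter` parameter and the exceptions raised are assumptions, not part of the exam's required signature:

```python
import math


def bisect_safe(f, a, b, tol=1e-6, max_iter=100):
    """Bisection with a sign-change check and an iteration cap.

    Returns the midpoint c with |f(c)| <= tol.
    """
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c)
        if abs(fc) <= tol:
            return c
        # Keep the half-interval whose endpoints still bracket the root
        if f(a) * fc < 0:
            b = c
        else:
            a = c
    raise RuntimeError(f"no convergence after {max_iter} iterations")


root = bisect_safe(lambda x: x**2 - 2, 0, 2)
print(root, math.isclose(root, math.sqrt(2), abs_tol=1e-6))
```

Raising instead of returning a possibly wrong midpoint makes a failed search visible to the caller.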

**4**. (25 points)

In a coin tossing example, you count the number of tosses until one of the following sequences appears

- Seq 1: `HT`
- Seq 2: `HH`

For example, `HTTHH` would be of type Seq 2 with a run length of 5.

Simulate 10,000 coin tossing experiments of the following kind:

- Expt 1: Stop when Seq 1 is observed
- Expt 2: Stop when Seq 2 is observed
- Expt 3: Stop when Seq 1 *or* Seq 2 is observed

Report the average run length of experiments 1, 2 and 3, rounding to the nearest integer.

In [238]:
# Define function that can be used for all three experiments
def coin_exp(target1, target2 = 'NNNNN'):
    """Determine number of coin flips needed to see either of 2 specified sequences"""
    tosses = ''
    while True:
        tosses += np.random.choice(a = ['H', 'T'], replace = True)
        if target1 in tosses or target2 in tosses:
            return(len(tosses))

In [239]:
# Run experiments
n_expts = 10000
result1 = [coin_exp('HT') for _ in range(n_expts)]
result2 = [coin_exp('HH') for _ in range(n_expts)]
result3 = [coin_exp('HH', 'HT') for _ in range(n_expts)]
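Growing a string and searching it after every toss works, but both target sequences are only two tosses long, so it is enough to remember the previous toss instead of the whole history. A minimal sketch of that idea using NumPy's `Generator.integers`; the names `run_length` and `FACES`, the seed, and passing the targets as a set are assumptions for illustration:

```python
import numpy as np

FACES = 'HT'  # index 0 -> heads, index 1 -> tails


def run_length(targets, rng):
    """Return the number of tosses until any two-toss pattern in targets appears."""
    prev = FACES[rng.integers(2)]
    n_tosses = 1
    while True:
        curr = FACES[rng.integers(2)]
        n_tosses += 1
        # Only the previous toss and the current one can complete a two-toss pattern
        if prev + curr in targets:
            return n_tosses
        prev = curr


rng = np.random.default_rng(123)
n_expts = 10_000
for targets in ({'HT'}, {'HH'}, {'HT', 'HH'}):
    runs = [run_length(targets, rng) for _ in range(n_expts)]
    print(f"{' or '.join(sorted(targets))}: {np.mean(runs):.2f}")
```

Memory per experiment stays constant, whereas the string-based version rescans an ever-growing history.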

In [242]:
# Print results
print("Exp 1 avg length: " + str(int(round(np.mean(result1)))))
print("Exp 2 avg length: " + str(int(round(np.mean(result2)))))
print("Exp 3 avg length: " + str(int(round(np.mean(result3)))))

Exp 1 avg length: 4
Exp 2 avg length: 6
Exp 3 avg length: 3
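These averages agree with the exact expected run lengths, which follow from first-step analysis. A wait for the first head is geometric with mean 2. For Expt 1, once a head appears, further heads keep us in the same state, so we simply wait (mean 2) for a tail. For Expt 3, the experiment ends on the toss right after the first head. For Expt 2, let $E$ be the expected number of tosses from scratch and $E_H$ the expected number still needed when the last toss was H:

$$
\begin{aligned}
\text{Expt 1 (HT):}\quad & E = 2 + 2 = 4 \\
\text{Expt 2 (HH):}\quad & E = 1 + \tfrac{1}{2}E_H + \tfrac{1}{2}E, \qquad E_H = 1 + \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}E \;\Rightarrow\; E = 6,\; E_H = 4 \\
\text{Expt 3 (HT or HH):}\quad & E = 2 + 1 = 3
\end{aligned}
$$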
