In [23]:
import numpy as np
import string
import random

**1**. (25 points)

The following iterative sequence is defined for the set of positive integers:

- n → n/2 (n is even)
- n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.
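
As a quick check of the rule, the following minimal sketch reproduces the example chain starting at 13. The helper name `collatz_chain` is hypothetical and is not part of the assignment.

```python
def collatz_chain(start):
    """Return the list of terms from `start` down to 1 (hypothetical helper)."""
    chain = [start]
    while chain[-1] != 1:
        n = chain[-1]
        # Halve even terms (floor division keeps integers); map odd terms to 3n + 1
        chain.append(n // 2 if n % 2 == 0 else 3 * n + 1)
    return chain

chain = collatz_chain(13)
print(" → ".join(map(str, chain)))
print(len(chain), "terms")
```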

In [7]:
# Define recursive function to count the number of terms in a sequence, given a starting number
def iter_sequence(start_num, chain_len = 1):
    if start_num == 1:
        return chain_len
    if start_num % 2 == 0:
        start_num //= 2  # Floor division keeps the terms as exact integers
    else:
        start_num = 3*start_num + 1
    chain_len += 1
    return iter_sequence(start_num, chain_len)

In [21]:
# Initialize result variables
largest_start_num = 1 # Starting number for sequence with longest chain
longest_chain_len = 1 # Number of terms in longest sequence

# Iterate through all integers less than one million
for num in range(1, 1000000):
    chain_len = iter_sequence(num)
    # If this sequence is the longest, update the records
    if chain_len > longest_chain_len:
        longest_chain_len = chain_len
        largest_start_num = num

# Display result
print(largest_start_num)

837799
55.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]:
# Initialize result variables
largest_start_num = 1 # Starting number for sequence with longest chain
longest_chain_len = 1 # Number of terms in longest sequence
storage = {}

# Iterate through all integers less than one million
for num in range(1, 1000000):
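    # Reset step count, and preserve original value of "num" before it is altered
    chain_len = 0
    storage_value = num
    # Perform sequence rules until 1 or a previously stored number is reached
    while num > 1:
        if num % 2 == 0:
            num //= 2  # Floor division keeps the terms as exact integers
        else:
            num = 3*num + 1
        chain_len += 1
        # Reuse the stored step count for the rest of the chain
        if num in storage:
            chain_len += storage[num]
            break
    # Store the number of steps from this starting number down to 1
    storage[storage_value] = chain_len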
    if chain_len > longest_chain_len:
        longest_chain_len = chain_len
        largest_start_num = storage_value

# Display result
print(largest_start_num)

837799
2.19 s ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
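
The loop above stores a step count only for each starting number. One possible variation, sketched below under that observation, also back-fills the cache for every intermediate term visited on the way, so later starts can stop earlier. The function name `longest_collatz_start` and its `limit` parameter are hypothetical, and no timing is claimed for it.

```python
def longest_collatz_start(limit):
    """Return (start, steps) for the longest chain with 1 <= start < limit."""
    steps = {1: 0}  # cache: number -> steps needed to reach 1
    best_start, best_steps = 1, 0
    for start in range(2, limit):
        path = []
        n = start
        # Walk until reaching a number whose step count is already cached
        while n not in steps:
            path.append(n)
            n = n // 2 if n % 2 == 0 else 3 * n + 1
        # Back-fill the cache for every number visited on this walk
        count = steps[n]
        for m in reversed(path):
            count += 1
            steps[m] = count
        if steps[start] > best_steps:
            best_start, best_steps = start, steps[start]
    return best_start, best_steps

print(longest_collatz_start(100))
```

Note that this counts steps rather than terms, so the chain from 13 is reported as 9 steps (10 terms).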


In [19]:
# Initialize result variables
largest_start_num = 1 # Starting number for sequence with longest chain
longest_chain_len = 1 # Number of terms in longest sequence

# Iterate through all integers less than one million
for num in range(1, 1000000):
    # Reset chain length, and preserve original value of "num" before it is altered
    chain_len = 0
    storage_value = num
    # Perform sequence rules until 1 is reached
    while num > 1:
        if num % 2 == 0:
            num //= 2  # Floor division keeps the terms as exact integers
        else:
            num = 3*num + 1
        chain_len += 1
    # If this sequence is the longest, update the records
    if chain_len > longest_chain_len:
        longest_chain_len = chain_len
        largest_start_num = storage_value

# Display result
print(largest_start_num)

837799
32.1 s ± 39.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [None]:
### Consider making code more efficient using a dictionary to store results that we've already seen
### Look for other areas in homework to increase efficiency and/or refine comments

**2**. (25 points)


- Perform the median polish to calculate just the *residuals* for this [example](https://mgimond.github.io/ES218/Week11a.html) in Python. 
- Use the matrix `xs` provided
- Display the final result after 3 iterations to 1 decimal place and check whether it agrees with the table below:

![img](https://mgimond.github.io/ES218/img/twoway_09.jpg)

In [25]:
xs = np.array([
    (25.3,32.1,38.8,25.4), 
    (25.3,29,31,21.1),
    (18.2,18.8,19.3,20.3),
    (18.3,24.3,15.7,24),
    (16.3,19,16.8,17.5)
]).T

In [26]:
# Initialize row_median and column median arrays
row_medians = np.zeros(xs.shape[0] + 1)
col_medians = np.zeros(xs.shape[1] + 1)

# Calculate initial global median and residual table
global_median = np.median(xs)
xs_res = xs - global_median

# Run three iterations of median polish
for it in range(1, 4):
    
    # Calculate row medians, and subtract as instructed
    med_col_medians = np.median(col_medians)
    row_medians = np.median(xs_res, axis = 1)
    global_median += med_col_medians
    col_medians -= med_col_medians
    xs_res -= np.array([row_medians]).reshape(-1, 1)
    
    # Calculate column medians, and subtract as instructed
    med_row_medians = np.median(row_medians)
    col_medians = np.median(xs_res, axis = 0).reshape(1, -1)
    global_median += med_row_medians
    row_medians -= med_row_medians
    xs_res -= col_medians

# Print resulting residual matrix rounded to one decimal place
print(xs_res.round(decimals = 1))

[[-1.4  0.2  0.  -1.   0.7]
 [ 1.4 -0.2 -3.4  1.  -0.7]
 [11.   4.7 -0.  -4.7  0. ]
 [-3.1 -5.9  0.3  2.9  0. ]]
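
Because only the residuals are requested, the row, column, and overall effects do not need to be tracked. The following is a minimal sketch under that assumption; the function name `median_polish_residuals` and its `n_iter` parameter are hypothetical. It alternately sweeps out row medians and column medians, and its result should agree with the residuals printed above up to rounding.

```python
import numpy as np

def median_polish_residuals(table, n_iter=3):
    """Return the residuals after n_iter rounds of row and column sweeps."""
    res = np.array(table, dtype=float)  # Work on a float copy of the input
    for _ in range(n_iter):
        # Sweep out the median of each row
        res -= np.median(res, axis=1, keepdims=True)
        # Sweep out the median of each column
        res -= np.median(res, axis=0, keepdims=True)
    return res

xs = np.array([
    (25.3, 32.1, 38.8, 25.4),
    (25.3, 29, 31, 21.1),
    (18.2, 18.8, 19.3, 20.3),
    (18.3, 24.3, 15.7, 24),
    (16.3, 19, 16.8, 17.5)
]).T

print(median_polish_residuals(xs).round(decimals=1))
```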


**3**. (50 points)

A Caesar cipher is a very simple method of encoding and decoding data. The cipher simply replaces characters with the character offset by $k$ places. For example, if the offset is 3, we replace `a` with `d`, `b` with `e` etc. The cipher wraps around so we replace `y` with `b`, `z` with `c` and so on. Punctuation, spaces and numbers are left unchanged.
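
The wrap-around can be expressed with modular arithmetic on a letter's position in the alphabet. The sketch below illustrates this for single lowercase letters with an offset of 3; `shift_letter` is a hypothetical helper, not one of the functions requested below.

```python
import string

def shift_letter(letter, k):
    """Shift one lowercase letter by k places, wrapping past 'z'."""
    alphabet = string.ascii_lowercase
    # Position in the alphabet, moved by k, wrapped with modulo 26
    return alphabet[(alphabet.index(letter) + k) % len(alphabet)]

for letter in "abyz":
    print(letter, "->", shift_letter(letter, 3))
```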

- Write a function `encode` that takes as arguments a string and an integer offset and returns the encoded cipher.
- Write a function `decode` that takes as arguments a cipher and an integer offset and returns the decoded string. 
- Write a function `auto_decode` that takes as argument a cipher and uses a statistical method to guess the optimal offset to decode the cipher, assuming the original string is in English which has the following letter frequency:

```python
freq = {
 'a': 0.08167,
 'b': 0.01492,
 'c': 0.02782,
 'd': 0.04253,
 'e': 0.12702,
 'f': 0.02228,
 'g': 0.02015,
 'h': 0.06094,
 'i': 0.06966,
 'j': 0.00153,
 'k': 0.00772,
 'l': 0.04025,
 'm': 0.02406,
 'n': 0.06749,
 'o': 0.07507,
 'p': 0.01929,
 'q': 0.00095,
 'r': 0.05987,
 's': 0.06327,
 't': 0.09056,
 'u': 0.02758,
 'v': 0.00978,
 'w': 0.0236,
 'x': 0.0015,
 'y': 0.01974,
 'z': 0.00074
}
```

- Encode the following nursery rhyme using a random offset from 10 to 20, then recover the original using `auto_decode`:

```text
Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.
```

In [2]:
# Input nursery rhyme above as a string
rhyme = """Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.
"""

In [22]:
# Input provided dictionary of letter frequency for English language
expected_freq = {
 'a': 0.08167,
 'b': 0.01492,
 'c': 0.02782,
 'd': 0.04253,
 'e': 0.12702,
 'f': 0.02228,
 'g': 0.02015,
 'h': 0.06094,
 'i': 0.06966,
 'j': 0.00153,
 'k': 0.00772,
 'l': 0.04025,
 'm': 0.02406,
 'n': 0.06749,
 'o': 0.07507,
 'p': 0.01929,
 'q': 0.00095,
 'r': 0.05987,
 's': 0.06327,
 't': 0.09056,
 'u': 0.02758,
 'v': 0.00978,
 'w': 0.0236,
 'x': 0.0015,
 'y': 0.01974,
 'z': 0.00074
}

In [3]:
# Determine numerical bounds of `ord` function for lower and upper case letters
lwr_bound_upr_case = ord('A')
upr_bound_upr_case = ord('Z')
lwr_bound_lwr_case = ord('a')
upr_bound_lwr_case = ord('z')

# Store number of letters in alphabet
num_letters = len(string.ascii_lowercase)

In [4]:
# Create encode function
def encode(message, offset):
    """Take a message (a string) and an offset (an integer)
    and return an encoded version of the message (a cipher).
    """
    
    # Check for bad inputs
    if not isinstance(message, str):
        raise TypeError("Please enter a string for the message argument")
    if not isinstance(offset, int):
        raise TypeError("Please enter an integer for the offset argument")
    
    # Initialize output string
    cipher = ""
    
    # Loop through message and only encode letters
    for char in message:
        # Shift each letter within its own case; the modulo wraps the
        # alphabet for any integer offset, including negative and large ones
        if char in string.ascii_lowercase:
            position = (ord(char) - lwr_bound_lwr_case + offset) % num_letters
            cipher += chr(lwr_bound_lwr_case + position)
        elif char in string.ascii_uppercase:
            position = (ord(char) - lwr_bound_upr_case + offset) % num_letters
            cipher += chr(lwr_bound_upr_case + position)
        else:
            # Punctuation, spaces and numbers are left unchanged
            cipher += char
            
    # Return encoded message
    return cipher

In [5]:
# Create decode function
def decode(cipher, offset):
    """Take an encoded cipher (a string) and the offset (an integer) used to
    encode it, and return the decoded message.
    """
    
    # Check for bad inputs
    if not isinstance(cipher, str):
        raise TypeError("Please enter a string for the cipher argument")
    if not isinstance(offset, int):
        raise TypeError("Please enter an integer for the offset argument")
    
    # Decoding is encoding with the opposite offset
    return encode(cipher, -offset)

In [44]:
# Create auto-decode function
def auto_decode(cipher):
    """Take an encoded cipher (a string), guess the offset whose decoding best
    matches English letter frequencies, and return the decoded message.
    """
    
    # Check for bad input
    if not isinstance(cipher, str):
        raise TypeError("Please enter a string for the cipher argument")
    
    # Without any letters there is nothing to decode (and no frequencies to compute)
    total_letters = sum(char in string.ascii_letters for char in cipher)
    if total_letters == 0:
        return cipher
    
    # Initialize smallest SSE and predicted offset
    min_SSE = float("inf")
    correct_offset = 0
    
    # Loop through all potential offsets to see which gives min SSE
    for offset in range(num_letters):
        decoded_message = decode(cipher, offset).lower()
        
        # Calculate relative frequency of each letter in decoded message
        letter_freq = {letter: decoded_message.count(letter) / total_letters
                       for letter in string.ascii_lowercase}
        
        # Calculate SSE against expected frequencies and compare with current minimum
        SSE = 0
        for key in expected_freq:
            SSE += (letter_freq[key] - expected_freq[key]) ** 2
        if SSE < min_SSE:
            min_SSE = SSE
            correct_offset = offset
    
    # Decode cipher using the offset with the smallest SSE
    return decode(cipher, correct_offset)
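
The function above scores each candidate offset with a sum of squared errors. A chi-squared statistic is another common choice, since it weights deviations in rare letters more heavily. The following is a self-contained sketch of that alternative, not the method used in this notebook; `shift_text`, `chi_squared_offset`, and `ENGLISH_FREQ` are hypothetical names, and the frequencies are copied from the table given in the problem.

```python
import string

# English letter frequencies from the problem statement, in a-z order
ENGLISH_FREQ = dict(zip(string.ascii_lowercase, [
    0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015,
    0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749,
    0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
    0.00978, 0.0236, 0.0015, 0.01974, 0.00074,
]))

def shift_text(text, k):
    """Shift every ASCII letter in text by k places, preserving case."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    k %= 26
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)

def chi_squared_offset(cipher):
    """Guess the encoding offset by minimizing a chi-squared statistic."""
    best_offset, best_score = 0, float("inf")
    for offset in range(26):
        candidate = shift_text(cipher, -offset).lower()
        counts = {c: candidate.count(c) for c in string.ascii_lowercase}
        total = sum(counts.values())
        if total == 0:
            return 0  # No letters, so any offset is as good as another
        # Compare observed counts with the counts expected for English text
        score = sum((counts[c] - total * ENGLISH_FREQ[c]) ** 2
                    / (total * ENGLISH_FREQ[c])
                    for c in string.ascii_lowercase)
        if score < best_score:
            best_offset, best_score = offset, score
    return best_offset

RHYME = ("Baa, baa, black sheep,\nHave you any wool?\n"
         "Yes, sir, yes, sir,\nThree bags full;\n"
         "One for the master,\nAnd one for the dame,\n"
         "And one for the little boy\nWho lives down the lane.\n")

cipher = shift_text(RHYME, 13)
guess = chi_squared_offset(cipher)
print(shift_text(cipher, -guess))
```

As with the SSE approach, very short messages may not carry enough letters for the statistic to single out the right offset.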

In [63]:
# Generate random offset from 10 to 20
random_offset = np.random.randint(10, 21)

# Encode nursery rhyme
encoded_rhyme = encode(rhyme, random_offset)

# Auto-decode
print(auto_decode(encoded_rhyme))

Baa, baa, black sheep,
Have you any wool?
Yes, sir, yes, sir,
Three bags full;
One for the master,
And one for the dame,
And one for the little boy
Who lives down the lane.

