In [4]:
import numpy as np
import matplotlib.pyplot as plt
import hashlib as hl

# STUDENT ID NUMBER: 2024-10379

# INSTRUCTIONS

1. Put your student ID number (no other identifying features).  
If collaborating, put collaborators' id numbers underneath (again, no identifying features).
1. Submit both a notebook and exported PDF of the notebook after restart-and-run-all.  
    1. Submitted notebook must run in reasonable time.
    1. Ensure the exported pdf is LaTex formatted. 
1. For the written section:
    1. Output should be listed in a separate, typeset PDF with your ID number.
    1. Question numbers are from *The Algorithm Design Manual*, 2nd edition by Steven Skiena.
    1. Incomplete solutions recieve no credit. 
1. For the coding section:
    1. Limit code answers to the designated spaces in functions (marked off by "###") and cells (marked by "### DO: *\<instructions\>*").  
       You may add extra cells to run subroutines (in order), but only marked cells will be graded. 
    1. Do not change the provided cells, or their order.  
       Any detected alterations will zero out that section. 
       
### Tips
- You are encouraged to work either alone or with your official partner, but not more.  
- For the coding section, test and sanity-check your work extensively.  
- For written questions on the coding exercises, if you really understand the solution, the answer should fit in 1-3 sentences.  

# Written [$2(3\times 5) + 5(2\times 4) = 70$ pts]

## Chapter 1
1. Skiena 1-1
1. Skiena 1-2
2. Skiena 1-8
3. Skiena 1-10
4. Skiena 1-11

## Chapter 2

1. Skiena 2-5
1. Skiena 2-13
1. Skiena 2-32
1. Skiena 2-42
1. Skiena 2-46

## Chapter 3

1. Skiena 3-5
1. Skiena 3-12
1. Skiena 3-18
1. Skiena 3-19
1. Skiena 3-29

## Ex. 1

1. Give the theoretical worst-case analysis for the time complexities of each strategy. 
1. Give the theoretical average-case analysis for the time complexities of each strategy. 
1. If you want to minimize average steps, what is the best strategy and why it it objectively optimal?  
Note: the best strategy may be one of the strategies given, or you may need to make your own.
1. How does the optimal strategy change if instead of saying higher or lower, each guess tells you if you are within $\pm N/k$ of the target?  
For simplicity, assume $k$ is an integer greater than 1.

## Ex. 2

1. Explain your choice of hashing function.  
1. In the book, the hashes are constructed to be pseudo-random, while Python dictionary hashes are not random, both for different reasons.  
Is randomness a necessary property of Rabin-Karp hashes?
1. The earliest implementations of Rabin-Karp used the Rabin fingerprint for hashing.  
What is this fingerprint, and what are its advantages?  
1. How would you modify your function to search multiple patterns? 

# Coding [$15 + 15 = 30$ pts]

## Ex. 1 - Guess The Number

Play the following game.

I am thinking of an integer between $1$ and $N$ inclusive. 

Guess what it is, and I will tell you if the true answer is higher or lower.  
Find the number in as few guesses as possible. 

Below I have left functions for three strategies:
1. Guess from lowest to highest ("linear")
2. Guess randomly, and reduce the guessing range if higher or lower ("random")
3. Guess the middle, and reduce the guessing range if higher or lower ("binary")

Fill in the code below. 

In [15]:
### DO: Fill in the code for the 3 guessing strategies. 

def std_in(N, x):
    x = int(x)
    N = int(N)
    if (x<1)|(x>N):
        raise Exception('Picking a number out of range')
    elif (N<1):
        raise Exception('Low range')

    return N, x

def guessingGame(N, x, guess): 
    N, x = std_in(N, x)
    guess = int(guess)
    
    if guess == x:
        print("Correct")
        return 0
    elif guess < x:
        print("Guess {} - Go higher".format(guess))
        return 1
    else:
        print("Guess {} - Go lower".format(guess))
        return -1

def linGuess(N, x): 
    N, x = std_in(N, x)
    steps = 1

    ### YOUR CODE HERE ###
    
    while guessingGame(N, x, steps) != 0:
        steps += 1
    
    ### END ###
    return steps


def randGuess(N, x):
    N, x = std_in(N, x)
    steps = 1
    
    ### YOUR CODE HERE ###

    min, max = 1, N + 1
    while True:
        guess = np.random.randint(min, max)
        result = guessingGame(N, x, guess)
        if result == 0:
            break
        elif result == 1:
            min = guess + 1
        else:
            max = guess
        steps += 1
    
    ### END ###
    return steps


def binGuess(N, x):
    N, x = std_in(N, x)
    steps = 1
    
    ### YOUR CODE HERE ###

    min, max = 1, N
    while True:
        guess = round((min + max) / 2)
        result = guessingGame(N, x, guess)
        if result == 0:
            break
        elif result == 1:
            min = guess
        else:
            max = guess
        steps += 1
    
    ### END ###
    return steps

In [17]:
### DO: Demonstrate the 3 strategies find the number over a representative and convincincing set of N and x

N,x=100,99

print(f"Linear Search: {linGuess(N, x)}")
print(f"Random Guess: {randGuess(N, x)}")
print(f"Binary Search: {binGuess(N, x)}")



Guess 1 - Go higher
Guess 2 - Go higher
Guess 3 - Go higher
Guess 4 - Go higher
Guess 5 - Go higher
Guess 6 - Go higher
Guess 7 - Go higher
Guess 8 - Go higher
Guess 9 - Go higher
Guess 10 - Go higher
Guess 11 - Go higher
Guess 12 - Go higher
Guess 13 - Go higher
Guess 14 - Go higher
Guess 15 - Go higher
Guess 16 - Go higher
Guess 17 - Go higher
Guess 18 - Go higher
Guess 19 - Go higher
Guess 20 - Go higher
Guess 21 - Go higher
Guess 22 - Go higher
Guess 23 - Go higher
Guess 24 - Go higher
Guess 25 - Go higher
Guess 26 - Go higher
Guess 27 - Go higher
Guess 28 - Go higher
Guess 29 - Go higher
Guess 30 - Go higher
Guess 31 - Go higher
Guess 32 - Go higher
Guess 33 - Go higher
Guess 34 - Go higher
Guess 35 - Go higher
Guess 36 - Go higher
Guess 37 - Go higher
Guess 38 - Go higher
Guess 39 - Go higher
Guess 40 - Go higher
Guess 41 - Go higher
Guess 42 - Go higher
Guess 43 - Go higher
Guess 44 - Go higher
Guess 45 - Go higher
Guess 46 - Go higher
Guess 47 - Go higher
Guess 48 - Go higher
G

In [4]:
### DO: Find the worst case for each strategy and plot their performance in log(N) vs. steps. 
### Note: For the random strategy, you may copy the binary strategy's worst case.  



In [5]:
### DO: Plot the (empirical) average performance in log(N) vs. steps for all strategies. 
### Per log(N), plot the sample average with the sample stdev as error bars. 



# Ex. 2 - Rabin-Karp Algorithm

See the lecture notes on Rabin-Karp and string matching.   

In [80]:
### DO: Implement a Rabin-Karp string matcher. 
### You may use either hashlib or your own hashing function. 

def hash(s):
    hashVal = 0
    ### YOUR CODE HERE ###

    for i in range(len(s)):
        hashVal += s[i].encode('windows-1252')[0] * (256 ** (len(s) - i - 1))
    
    ### END ###
    return hashVal

def RK(s1, s2):
    ### YOUR CODE HERE ###

    patternHash = hash(s2)
    windowHash = hash(s1[0:len(s2)])
    matches = []
    if windowHash == patternHash:
        if s1[:len(s2)] == s2:
            matches.append(0)
    for i in range(len(s1) - len(s2)):
        windowHash = windowHash * 256 - hash(s1[i]) * 256 ** len(s2) + hash(s1[i + len(s2)])
        if windowHash == patternHash:
            print(s1[i:i+len(s2)], s2)
            if s1[i+1:i+1+len(s2)] == s2:
                matches.append(i+1)

    return matches
    
    ### END ###
 

In [81]:
print(hash("ab")*256-hash("a")*256**2+hash("c"))
print(hash("a")*256**2)

25187
6356992


In [82]:
### DO: Verify the algorithm works- you need to show it works over a reasonably large set of test cases, including any edge cases.  
### You can use native string methods as a stable reference of correctness.  
### Identifying a reasonable set of test cases is part of the assessment.  

def run_rk_tests():
    test_cases = [
        # (text, pattern, expected_indices)
        ("abcde", "bc", [1]),
        ("aaaaa", "aa", [0, 1, 2, 3]),
        ("abcdef", "gh", []),
        ("abcabcabc", "abc", [0, 3, 6]),
        ("abababab", "aba", [0, 2, 4]),
        ("a", "a", [0]),
        ("a", "b", []),
        ("", "", [0]),  # edge: empty pattern in empty string
        ("abc", "", [0,1,2,3]),  # edge: empty pattern in non-empty string
        ("", "a", []),  # edge: empty text, non-empty pattern
        ("abc", "abc", [0]),
        ("abc", "abcd", []),
        ("zzzzzz", "zz", [0,1,2,3,4]),
        ("abcdefghijklmnopqrstuvwxyz", "z", [25]),
        ("abcdefghijklmnopqrstuvwxyz", "a", [0]),
        ("abcdefghijklmnopqrstuvwxyz", "mno", [12]),
        ("ababababab", "baba", [1,3,5]),
        ("abcabcabcabc", "cab", [2,5,8]),
        ("abcabcabcabc", "abcabc", [0,3,6]),
        ("abcabcabcabc", "abcabcd", []),
        ("aaaaabaaaaa", "ab", [4]),
    ]

    for idx, (text, pattern, expected) in enumerate(test_cases):
        result = RK(text, pattern)
        assert result == expected, f"Test case {idx+1} failed: RK({text!r}, {pattern!r}) = {result}, expected {expected}"
        # Also check against Python's native findall for correctness
        native = [i for i in range(len(text) - len(pattern) + 1) if text[i:i+len(pattern)] == pattern]
        assert result == native, f"Test case {idx+1} mismatch with native: {result} vs {native}"
    print("All Rabin-Karp test cases passed.")

run_rk_tests()


25187
25187 25187
ab bc
0 [1]
25187 c
25444 25187
1 [1]
25444 d
25701 25187
2 [1]
25701 e
24929
24929 24929
aa aa
0 [0, 1]
24929 a
24929 24929
aa aa
1 [0, 1, 2]
24929 a
24929 24929
aa aa
2 [0, 1, 2, 3]
24929 a
26472
25187 26472
0 []
25187 c
25444 26472
1 []
25444 d
25701 26472
2 []
25701 e
25958 26472
3 []
25958 f
6382179
6447969 6382179
0 [0]
6447969 a
6512994 6382179
1 [0]
6512994 b
6382179 6382179
cab abc
2 [0, 3]
6382179 c
6447969 6382179
3 [0, 3]
6447969 a
6512994 6382179
4 [0, 3]
6512994 b
6382179 6382179
cab abc
5 [0, 3, 6]
6382179 c
6382177
6447458 6382177
0 [0]
6447458 b
6382177 6382177
bab aba
1 [0, 2]
6382177 a
6447458 6382177
2 [0, 2]
6447458 b
6382177 6382177
bab aba
3 [0, 2, 4]
6382177 a
6447458 6382177
4 [0, 2, 4]
6447458 b
97
98
0
0
0 0
 
0 [0, 1]
0 a
0 0
 
1 [0, 1, 2]
0 b
0 0
 
2 [0, 1, 2, 3]
0 c
97
6382179
1633837924
31354
31354 31354
zz zz
0 [0, 1]
31354 z
31354 31354
zz zz
1 [0, 1, 2]
31354 z
31354 31354
zz zz
2 [0, 1, 2, 3]
31354 z
31354 31354
zz zz
3 [0, 1, 2, 3, 

In [8]:
### DO: Plot its time complexity. 
### Compare against both the naive hashing scheme and the theoretical complexity.  

