**Q1 [Python]** The probability of rain on a given calendar day in Vancouver is **p[i]**, where i is the day's index. For example, **p[0]** is the probability of rain on January 1st, and **p[10]** is the probability of rain on January 11th. Assume the year has 365 days (i.e. p has 365 elements). What is the chance it rains on more than n (e.g. 100) days in Vancouver? Write a function that accepts p (the probabilities of rain on each calendar day) and n as input arguments and returns the probability that it rains on more than n days. 

In [16]:
import random
from typing import Sequence

def prob_rain_more_than_n(p: Sequence[float], n: int) -> float:
    """Probability of rain on more than n days, assuming the days are independent."""
    if len(p) != 365:
        raise ValueError("The length of p should be 365.")

    if n < 0 or n > 365:
        raise ValueError("n should be between 0 and 365.")

    # dist[k] = probability of exactly k rainy days among the days processed so far
    dist = [1.0] + [0.0] * len(p)

    for day, prob in enumerate(p, start=1):
        # Update from high k to low k so dist[k - 1] still holds the previous day's value
        for k in range(day, 0, -1):
            dist[k] = dist[k] * (1.0 - prob) + dist[k - 1] * prob
        dist[0] *= 1.0 - prob

    # Strictly more than n rainy days: sum the mass at n + 1, ..., 365
    return sum(dist[n + 1:])

daily_probabilities = [random.uniform(0.1, 0.9) for _ in range(365)]  # Example daily probabilities
# These probabilities average about 0.5, so roughly 182 rainy days are expected
# and exceeding 100 is all but certain
threshold = 100

probability = prob_rain_more_than_n(daily_probabilities, threshold)
print(f"The probability of it raining more than {threshold} days is: {probability:.10f}")


The probability of it raining more than 100 days is: 1.0000000000
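
If each day's rain is treated as an independent event (an assumption; the question does not state it), the number of rainy days follows a Poisson binomial distribution. The sketch below computes its tail exactly with dynamic programming and compares the result against brute-force enumeration on a short list. The names `rain_count_distribution`, `tail_probability`, and `brute_force_tail` are illustrative and not part of the original solution.

```python
from itertools import product
from typing import List, Sequence


def rain_count_distribution(p: Sequence[float]) -> List[float]:
    """dist[k] = P(exactly k rainy days), assuming independent days."""
    dist = [1.0]  # before any day is processed, zero rainy days is certain
    for prob in p:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - prob)  # this day is dry
            nxt[k + 1] += mass * prob      # this day is rainy
        dist = nxt
    return dist


def tail_probability(p: Sequence[float], n: int) -> float:
    """P(more than n rainy days)."""
    return sum(rain_count_distribution(p)[n + 1:])


def brute_force_tail(p: Sequence[float], n: int) -> float:
    """Same quantity by enumerating all 2**len(p) outcomes (tiny inputs only)."""
    total = 0.0
    for outcome in product([0, 1], repeat=len(p)):
        if sum(outcome) > n:
            weight = 1.0
            for rained, prob in zip(outcome, p):
                weight *= prob if rained else 1.0 - prob
            total += weight
    return total


small_p = [0.2, 0.5, 0.9, 0.4]
# The two methods should agree up to floating-point rounding
print(abs(tail_probability(small_p, 1) - brute_force_tail(small_p, 1)) < 1e-12)
```

The dynamic program needs on the order of 365² multiplications for a full year, whereas the enumeration grows as 2^365, so the latter is only useful as a check on very short inputs.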


**Q2 [Python]** A phoneme is a sound unit (similar to a character for text). We have an extensive pronunciation dictionary (think millions of words). Below is a snippet: 

*   ABACUS AE B AH K AH S 
*   BOOK B UH K 
*   THEIR DH EH R 
*   THERE DH EH R 
*   TOMATO T AH M AA T OW 
*   TOMATO T AH M EY T OW 

Given a sequence of phonemes as input (e.g. **["DH", "EH", "R", "DH", "EH", "R"]**), find all the combinations of the words that can produce this sequence (e.g. **[["THEIR", "THEIR"], ["THEIR", "THERE"], ["THERE", "THEIR"], ["THERE", "THERE"]]**). You can preprocess the dictionary into a different data structure if needed. 

In [20]:
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

PRONUNCIATION_DICT = [
    "ABACUS AE B AH K AH S",
    "BOOK B UH K",
    "THEIR DH EH R",
    "THERE DH EH R",
    "TOMATO T AH M AA T OW",
    "TOMATO T AH M EY T OW"
]

def preprocess_pronunciation_dict(pronunciation_dict: Sequence[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Map each phoneme tuple to the list of words pronounced that way."""
    processed_dict = defaultdict(list)
    for word_phonemes in pronunciation_dict:
        word, *phonemes = word_phonemes.split()
        processed_dict[tuple(phonemes)].append(word)
    return dict(processed_dict)

# Preprocess once, not on every query
PROCESSED_DICT = preprocess_pronunciation_dict(PRONUNCIATION_DICT)
MAX_PRONUNCIATION_LEN = max(len(key) for key in PROCESSED_DICT)

def find_word_combos_with_pronunciation(phonemes: Sequence[str]) -> List[List[str]]:
    n = len(phonemes)
    # memo[i] holds every word sequence that spells out phonemes[i:];
    # the empty suffix has exactly one decomposition: no words
    memo = {n: [[]]}

    def backtrack(index):
        if index in memo:
            return memo[index]
        combos = []
        # No dictionary entry is longer than MAX_PRONUNCIATION_LEN phonemes
        for end in range(index + 1, min(n, index + MAX_PRONUNCIATION_LEN) + 1):
            words = PROCESSED_DICT.get(tuple(phonemes[index:end]))
            if words:
                remaining_combos = backtrack(end)  # computed once per split point
                for word in words:
                    for combo in remaining_combos:
                        combos.append([word] + combo)
        memo[index] = combos
        return combos

    # Returns an empty list (not None) when no combination matches
    return backtrack(0)

phonemes = ["DH", "EH", "R", "DH", "EH", "R"]
combos = find_word_combos_with_pronunciation(phonemes)
print(combos)


[['THEIR', 'THEIR'], ['THEIR', 'THERE'], ['THERE', 'THEIR'], ['THERE', 'THERE']]
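
For a dictionary with millions of entries, one option for the preprocessing step is a trie keyed on phonemes, so that from any position in the input only prefixes that actually occur in the dictionary are explored. The sketch below is one possible approach, not the original solution; `build_phoneme_trie`, `segment`, and the `"$words"` sentinel key are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Sequence

WORDS_KEY = "$words"  # hypothetical sentinel; assumed never to be a real phoneme


def build_phoneme_trie(entries: Sequence[str]) -> Dict:
    """Build a nested-dict trie from lines like 'BOOK B UH K'."""
    root: Dict = {}
    for entry in entries:
        word, *phonemes = entry.split()
        node = root
        for phoneme in phonemes:
            node = node.setdefault(phoneme, {})
        words = node.setdefault(WORDS_KEY, [])
        if word not in words:  # skip duplicate word/pronunciation pairs
            words.append(word)
    return root


def segment(phonemes: Sequence[str], trie: Dict) -> List[List[str]]:
    """Return every word sequence whose concatenated phonemes equal the input."""
    n = len(phonemes)
    # memo[i] caches all decompositions of phonemes[i:]
    memo: Dict[int, List[List[str]]] = {n: [[]]}

    def solve(start: int) -> List[List[str]]:
        if start in memo:
            return memo[start]
        results: List[List[str]] = []
        node = trie
        for end in range(start, n):
            node = node.get(phonemes[end])
            if node is None:
                break  # no dictionary word continues with this phoneme
            for word in node.get(WORDS_KEY, []):
                for rest in solve(end + 1):
                    results.append([word] + rest)
        memo[start] = results
        return results

    return solve(0)


trie = build_phoneme_trie(["THEIR DH EH R", "THERE DH EH R", "BOOK B UH K"])
print(segment(["B", "UH", "K", "DH", "EH", "R"], trie))
# prints [['BOOK', 'THEIR'], ['BOOK', 'THERE']]
```

Walking the trie one phoneme at a time avoids building a tuple for every candidate slice and stops as soon as no dictionary word shares the current prefix.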


**Q3 [C]** Find the **n** most frequent words in the TensorFlow Shakespeare dataset. 
