<a href="https://colab.research.google.com/github/shuaiy125/735-project/blob/main/computational_complexity_hw.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Homework: Computational Complexity

## Problem 1: Complexity Analysis

For each of the following code snippets, determine the time complexity and space complexity in Big-O notation. Provide a brief justification for each.

**(a)**

In [None]:
def func_a(data):
    n = len(data)
    total = 0
    for i in range(0, n, 2):
        for j in range(5):
            total += data[i]
    return total

**(b)**

In [None]:
def func_b(matrix):
    n = len(matrix)
    result = []
    for i in range(n):
        for j in range(i):
            result.append(matrix[i][j])
    return result

**(c)**

In [None]:
def func_c(n):
    if n <= 1:
        return 1
    return func_c(n // 2) + func_c(n // 2)

**(d)**

In [None]:
def func_d(items):
    seen = set()
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)
        seen.add(item)
    return duplicates

## Problem 2: Choosing the Right Data Structure

Finding elements common to two collections is a frequent operation in data processing (for example, finding shared gene IDs across two experiments).

**(a)** The following function finds common elements using a naive approach. What is its time complexity if `list1` has n elements and `list2` has m elements? Explain why.

In [None]:
def find_common_naive(list1, list2):
    common = []
    for x in list1:
        for y in list2:
            if x == y and x not in common:
                common.append(x)
    return common

**(b)** Write an efficient version called `find_common_fast` that uses a set to achieve better time complexity. What is the time complexity of your version? For example, `find_common_fast([1, 2, 3, 4], [3, 4, 5, 6])` should return `[3, 4]` (in any order).

In [None]:
def find_common_fast(list1, list2):
    # Your code here
    pass

**(c)** Verify the speedup empirically. Write a script that times both functions on lists of size n = 1000, 5000, 10000, and 20000 (where elements are random integers from 0 to n). Print the time for each function at each size.
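One way to organize the timing script is a small helper built on `time.perf_counter`. The sketch below is only a suggestion, not a required design: `make_random_list` and `time_call` are hypothetical helper names, and the built-in `sorted` stands in for the functions you actually want to compare.

```python
import random
import time


def make_random_list(n, seed=None):
    """Return n random integers drawn from 0 to n (inclusive)."""
    rng = random.Random(seed)
    return [rng.randint(0, n) for _ in range(n)]


def time_call(func, *args):
    """Return the wall-clock seconds taken by one call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


# `sorted` is only a placeholder; swap in the functions you want to compare.
for n in [1000, 5000, 10000, 20000]:
    data = make_random_list(n, seed=0)
    elapsed = time_call(sorted, data)
    print(f"n = {n:>6}: {elapsed:.6f} s")
```

A single call can be noisy, so repeating each measurement and averaging usually gives more stable numbers.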

## Problem 3: Empirical Complexity Measurement

The following function performs a computation on a list:

In [None]:
def mystery_function(data):
    n = len(data)
    data = sorted(data)
    total = 0
    for i in range(n):
        left, right = 0, n - 1
        while left < right:
            if data[left] + data[right] == data[i]:
                total += 1
            if data[left] + data[right] < data[i]:
                left += 1
            else:
                right -= 1
    return total

Your task is to empirically determine the time complexity of `mystery_function` using the log-log method from the lecture.

**(a)** Write a timing script that measures the runtime of `mystery_function` for input sizes n = 500, 1000, 2000, 4000, and 8000. Use random integer data for each size. Run each size at least 3 times and take the average.

**(b)** Compute the log-log slope using `scipy.stats.linregress` on the log of the sizes and the log of the times. Report the slope value.
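To see what the fit looks like before applying it to real measurements, the sketch below runs `scipy.stats.linregress` on synthetic timings. The `times` array is made up to follow t = c * n^2 exactly (with an arbitrary constant c = 1e-8), so it is not a measurement of `mystery_function`; replace it with your averaged timings.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic timings that follow t = c * n^2 exactly (not real measurements).
sizes = np.array([500, 1000, 2000, 4000, 8000])
times = 1e-8 * sizes.astype(float) ** 2

# Fit log(t) = slope * log(n) + intercept; the slope estimates the exponent.
fit = linregress(np.log(sizes), np.log(times))
print(f"slope = {fit.slope:.3f}")
```

Because the synthetic data is exactly quadratic, the fitted slope is 2 up to floating-point error. Real timings include noise and lower-order terms, so expect a slope near, rather than exactly at, an integer.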

**(c)** Based on the slope, what is the time complexity of `mystery_function`? Explain why this matches (or doesn't match) what you would expect from reading the code.

## Problem 4: Improving a Statistical Computation

In Bayesian statistics and spatial statistics, you often need to solve many linear systems with the same coefficient matrix but different right-hand side vectors. Examples include drawing samples from a multivariate normal distribution and computing conditional distributions.

The following function solves m linear systems using a loop:

In [None]:
import numpy as np


def solve_systems_naive(A, B):
    """Solve A @ X = B for X, where B has m columns.

    Parameters
    ----------
    A : np.ndarray
        Symmetric positive definite matrix of shape (n, n).
    B : np.ndarray
        Matrix of shape (n, m), each column is a right-hand side vector.

    Returns
    -------
    np.ndarray
        Solution matrix X of shape (n, m).
    """
    n, m = B.shape
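    # Allocate a float array: np.zeros_like(B) would inherit an integer dtype
    # from an integer-valued B and silently truncate the solutions.
    X = np.zeros((n, m), dtype=float)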
    for i in range(m):
        X[:, i] = np.linalg.solve(A, B[:, i])
    return X

**(a)** What is the time complexity of `solve_systems_naive` in terms of n and m? Explain what happens inside `np.linalg.solve` on each iteration.

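**(b)** Rewrite the function as `solve_systems_cholesky` using `scipy.linalg.cholesky` and `scipy.linalg.cho_solve` to factor A once and then solve each system cheaply. Note that `cho_solve` expects a `(factor, lower)` tuple, where `lower` states whether the factor is lower triangular, and that `scipy.linalg.cholesky` returns the upper-triangular factor by default. What is the new time complexity? For example, `solve_systems_cholesky(np.array([[2, 1], [1, 2]]), np.array([[1, 0], [0, 1]]))` should return `np.array([[2/3, -1/3], [-1/3, 2/3]])` (the inverse of A, since B is the identity).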

In [None]:
import scipy.linalg


def solve_systems_cholesky(A, B):
    # Your code here
    pass

**(c)** Time both approaches with n = 300 and m = 100, and report the speedup. Use a random symmetric positive definite matrix (for example, `A = Z @ Z.T + n * np.eye(n)` where `Z` is random).
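The test inputs for this comparison can be built as sketched below. The seed, the use of `np.random.default_rng`, and the standard normal entries are assumptions chosen for reproducibility; any random `Z` and `B` of the right shapes would do.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n, m = 300, 100

Z = rng.standard_normal((n, n))
A = Z @ Z.T + n * np.eye(n)      # symmetric positive definite by construction
B = rng.standard_normal((n, m))  # m right-hand sides, one per column

# Sanity checks: A is symmetric and its smallest eigenvalue is positive.
print(np.allclose(A, A.T), np.linalg.eigvalsh(A).min() > 0)
```

Adding `n * np.eye(n)` shifts every eigenvalue of `Z @ Z.T` up by n, which keeps `A` well away from singular and makes the Cholesky factorization numerically safe.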

## Problem 5: Optimizing Pairwise Computation

The following function computes the sum of all pairwise absolute differences in an array:

$$S = \sum_{i=0}^{n-1} \sum_{j=i+1}^{n-1} |a_i - a_j|$$

In [None]:
def pairwise_abs_diff_slow(arr):
    """Compute sum of all pairwise absolute differences.

    Parameters
    ----------
    arr : list
        A list of numbers.

    Returns
    -------
    int or float
        Sum of |a_i - a_j| for all pairs i < j.

    Examples
    --------
    >>> pairwise_abs_diff_slow([1, 2, 4])
    6
    >>> pairwise_abs_diff_slow([2, 8, 4, 6])
    20
    """
    n = len(arr)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += abs(arr[i] - arr[j])
    return total

This runs in O(n^2) time. Your task is to write a function `pairwise_abs_diff_fast` that computes the same result in O(n log n) time.

In [None]:
def pairwise_abs_diff_fast(arr):
    # Your code here
    pass

Hint: consider what happens when you sort the array first. After sorting, every element `a[k]` is greater than or equal to all elements before it. Think about how many times `a[k]` is added versus subtracted across all pairs that include index k.
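Before timing a fast version, it helps to confirm that it returns the same values as the brute-force definition. The sketch below is one possible check, not part of the assignment: `reference_sum` and `matches_reference` are hypothetical helper names, and the test lists hold integers so that results can be compared exactly.

```python
import random


def reference_sum(arr):
    """Brute-force O(n^2) sum of |a_i - a_j| over all pairs i < j."""
    n = len(arr)
    return sum(abs(arr[i] - arr[j]) for i in range(n) for j in range(i + 1, n))


def matches_reference(candidate, trials=200, max_len=30, seed=0):
    """Return True if candidate agrees with reference_sum on random integer lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        arr = [rng.randint(-50, 50) for _ in range(rng.randint(0, max_len))]
        if candidate(arr) != reference_sum(arr):
            return False
    return True


# The reference trivially agrees with itself; pass your own function instead.
print(matches_reference(reference_sum))
```

The random lists include empty and single-element cases, which are easy to get wrong in a rewritten loop.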