# Week 03 Assignment: Huffman Encoding


### Your assignment

The code below is based on the [work we did in the classroom](https://github.com/lgreco/comp-363-f25-live-coding/blob/main/week03/huffman.ipynb) this week. We are given a binary tree [`Node` object](./Node.py) and a few basic methods towards building a Huffman encoder/decoder. Your assignment is to complete the Huffman encoder/decoder according to the following specifications.

- You may assume that the input message contains only uppercase letters and spaces. Function `filter_uppercase_and_spaces` converts any string to uppercase letters and spaces, discarding punctuation marks, digits, and other characters.

- You may assume that the message is long enough that space is its most frequent character.
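The input alphabet is fixed at 26 letters plus space, so letter frequencies fit in a 26-slot list indexed by `ord(char) - ord("A")`. The space count can be kept separately, which lets you check the second assumption instead of trusting it. The sketch below is one way to do this. `count_letters_and_spaces` is a hypothetical helper. It is not the skeleton's `count_frequencies`, which returns only the 26 letter counts.

```python
def count_letters_and_spaces(message: str) -> tuple[list[int], int]:
    """Counts each letter A-Z and the spaces in `message` (hypothetical helper).

    Assumes `message` contains only uppercase letters and spaces.
    """
    letter_counts = [0] * 26
    space_count = 0
    for char in message:
        if char == " ":
            space_count += 1
        else:
            # 'A' maps to index 0, 'B' to index 1, ..., 'Z' to index 25.
            letter_counts[ord(char) - ord("A")] += 1
    return letter_counts, space_count


counts, spaces = count_letters_and_spaces("TO BE OR NOT TO BE")
print(spaces, max(counts))  # 5 4
print(spaces > max(counts))  # True
```

The check can fail for short messages. In `"A CAB AND A BAD CAD"`, for example, the letter A appears 6 times but there are only 5 spaces.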

An example of an encoding table is shown below. In this example, `0` denotes a left child and `1` a right child. This particular example _does not_ assume that space is the most frequent character. It comes from a [slide deck about Huffman encoding](https://docs.google.com/presentation/d/1kSXEB7mzumoUm4pw7dhtxJxX7xckzjpUDyWlGfAxzAI/edit?usp=sharing) that you are welcome to peruse.

![Huffman Table](./Huffman%20Encoding.png)
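You read an encoding table off the tree the same way whether or not space is the most frequent character. Each symbol's code is the sequence of left (`0`) and right (`1`) moves from the root down to that symbol's leaf. The sketch below assumes a hypothetical `TreeNode` class with `weight`, `symbol`, `left`, and `right` attributes. The provided `Node` class may use different names. The recursive helper `collect_codes` is also illustrative only.

```python
class TreeNode:
    """Hypothetical minimal tree node; the provided `Node` class may differ."""

    def __init__(
        self,
        weight: int,
        symbol: str = "",
        left: "TreeNode | None" = None,
        right: "TreeNode | None" = None,
    ) -> None:
        self.weight = weight
        self.symbol = symbol  # empty for internal nodes
        self.left = left
        self.right = right


def collect_codes(node: TreeNode, path: str, table: dict[str, str]) -> None:
    """Records the root-to-leaf path of every leaf: '0' = left, '1' = right."""
    if node.left is None and node.right is None:
        table[node.symbol] = path
    else:
        collect_codes(node.left, path + "0", table)
        collect_codes(node.right, path + "1", table)


# Space sits alone under the root, so it receives the one-bit code "0".
root = TreeNode(
    10,
    left=TreeNode(5, " "),
    right=TreeNode(5, left=TreeNode(2, "A"), right=TreeNode(3, "B")),
)
codes: dict[str, str] = {}
collect_codes(root, "", codes)
print(codes[" "], codes["A"], codes["B"])  # 0 10 11
```

No code in the resulting table is a prefix of another, because every symbol sits at a leaf. This is what allows decoding without separators.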


### Reading

- [Greedy Algorithms](https://jeffe.cs.illinois.edu/teaching/algorithms/book/04-greedy.pdf) from Jeff Erickson's book *Algorithms*.


```python
from Node import Node


def filter_uppercase_and_spaces(input_string: str) -> str:
    """
    Filters the input string to retain only uppercase letters A-Z and spaces.
    Letters are converted to uppercase first; digits, punctuation, and any
    other characters (including accented letters) are discarded.
    """
    return "".join(
        char
        for char in input_string.upper()
        # Compare against A-Z explicitly: str.isalpha() would also accept
        # non-ASCII letters such as "É", which have no slot in a 26-entry table.
        if "A" <= char <= "Z" or char == " "
    )


def count_frequencies(input_string: str) -> list[int]:
    """
    Counts the frequency of each uppercase letter in the input string.
    Returns a list of 26 integers, where indices 0-25 correspond to 'A'-'Z'.
    You can assume the input string contains only uppercase letters and
    spaces, and that space is the most frequent character, so spaces do
    not need to be counted.
    """
    pass


def initialize_forest(frequencies: list[int]) -> list[Node]:
    """
    Initializes a forest (list) of Node objects, one for each letter with a non-zero frequency.
    """
    pass


def build_huffman_tree(frequencies: list[int]) -> Node:
    """
    Builds the Huffman tree from the list of frequencies and returns the root Node.
    """
    forest = initialize_forest(frequencies)
    # Your code here
    return forest[0]


def build_encoding_table(huffman_tree_root: Node) -> list[str]:
    """
    Builds the encoding table from the Huffman tree.
    Returns a list of 27 strings, where indices 0-25 correspond to 'A'-'Z'
    and index 26 corresponds to space.
    Each string is the binary encoding for that character.
    """
    pass


def encode(input_string: str, encoding_table: list[str]) -> str:
    """
    Encodes the input string using the provided encoding table. Remember
    that the encoding table has 27 entries, one for each letter A-Z and
    one for space. Space is at the last index (26).
    """
    pass


def decode(encoded_string: str, huffman_root: Node) -> str:
    """
    Decodes the encoded string by walking the Huffman tree rooted at
    `huffman_root`.
    """
    pass
```
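Huffman's greedy rule works as follows. Repeatedly take the two lowest-weight trees out of the forest and join them under a new parent whose weight is their sum. Stop when a single tree remains. Decoding then walks that tree bit by bit and restarts at the root after each leaf.

The sketch below illustrates both steps under these assumptions:

- It uses a hypothetical `TreeNode` class and hypothetical helpers `pop_lightest`, `build_tree`, and `decode_bits`. These are not the skeleton's functions, and the provided `Node` class may differ.
- The coding requirements forbid imports, so it finds the lightest tree with a linear scan instead of `heapq`. Each extraction costs O(n), which is negligible for at most 27 symbols.
- It assumes at least two distinct symbols, because a one-leaf tree has no edges to follow.

```python
class TreeNode:
    """Hypothetical minimal tree node; the provided `Node` class may differ."""

    def __init__(
        self,
        weight: int,
        symbol: str = "",
        left: "TreeNode | None" = None,
        right: "TreeNode | None" = None,
    ) -> None:
        self.weight = weight
        self.symbol = symbol  # empty for internal nodes
        self.left = left
        self.right = right


def pop_lightest(forest: list[TreeNode]) -> TreeNode:
    """Removes and returns the lowest-weight tree; ties go to the earliest one."""
    lightest = 0
    for i in range(1, len(forest)):
        if forest[i].weight < forest[lightest].weight:
            lightest = i
    return forest.pop(lightest)


def build_tree(weights: dict[str, int]) -> TreeNode:
    """Greedily merges the two lightest trees until only one remains."""
    forest = [
        TreeNode(weight, symbol) for symbol, weight in weights.items() if weight > 0
    ]
    while len(forest) > 1:
        first = pop_lightest(forest)
        second = pop_lightest(forest)
        forest.append(TreeNode(first.weight + second.weight, left=first, right=second))
    return forest[0]


def decode_bits(bits: str, root: TreeNode) -> str:
    """Follows '0' left and '1' right; each leaf reached emits its symbol."""
    decoded: list[str] = []
    node = root
    for bit in bits:
        node = node.left if bit == "0" else node.right
        if node.left is None and node.right is None:
            decoded.append(node.symbol)
            node = root  # restart at the root for the next symbol
    return "".join(decoded)


# Letter and space counts for the message "TO BE OR NOT TO BE".
weights = {" ": 5, "O": 4, "T": 3, "B": 2, "E": 2, "R": 1, "N": 1}
root = build_tree(weights)
print(decode_bits("1110010010011", root))  # TO BE
```

With this tie-breaking rule the sketch assigns `111` to T, `00` to O, `10` to space, `010` to B, and `011` to E. A different rule for breaking ties between equal weights can produce different codes with the same total encoded length.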

### Coding requirements

- You may _not_ import modules in your code without explicit permission from Leo. In practice, this means no `import`, `include`, or similar statements in your programs beyond the provided `from Node import Node`.

- You may _not_ use `break` to exit loops, or `continue` and `pass` to skip through branches.

- When possible, methods that return values should have only one return statement. This is no longer a strict requirement (if you took COMP 271/272 with me, you know what I am talking about). In general, there is no good reason for a method of at most 20-25 lines to have multiple return statements.

- Your code should be neat and well documented. If you are coding with Visual Studio Code, there are extensions that can do a great job formatting your program. For Python, consider installing the **Black Formatter** by Microsoft.

- If you code in Python, learn to use type hints. They are annoying but useful.

- Use a standard style guide for your code. I like Google's style guides for [Java](https://google.github.io/styleguide/javaguide.html) and [Python](https://google.github.io/styleguide/pyguide.html).

- If you are using Jupyter notebooks, spend some time exploring Markdown syntax for documentation and LaTeX for mathematical typesetting. These are good skills to have.
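The loop and return rules above still allow ordinary search code. Put the exit condition in the `while` header instead of using `break`, and compute the result into a single value that is returned at the end. The minimal sketch below uses a hypothetical helper, `find_code_index`, that looks up a bit string in an encoding table like the 27-entry list `build_encoding_table` returns. It also shows the type hints and docstring style recommended above.

```python
def find_code_index(code: str, encoding_table: list[str]) -> int:
    """Returns the index of `code` in `encoding_table`, or -1 if it is absent.

    Hypothetical helper illustrating the coding requirements: no `break`,
    no `continue`, no `pass`, and a single return statement.
    """
    index = 0
    found = False
    # The loop condition, not a `break`, ends the search early.
    while index < len(encoding_table) and not found:
        if encoding_table[index] == code:
            found = True
        else:
            index += 1
    # One return statement; the conditional expression picks the result.
    return index if found else -1


print(find_code_index("10", ["0", "10", "11"]))  # 1
print(find_code_index("111", ["0", "10", "11"]))  # -1
```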

### Finals week policy

There is no final exam for the course. A final assignment will be published the week before finals and will be due during finals week. In addition, eight randomly selected students will be invited to a brief meeting with the instructor during the course's final exam slot. If you are selected, we will spend about 15 minutes reviewing your work. The interview covers coding practices based on your past assignments and serves as a checkpoint to ensure that you have internalized the work you submitted.
