## Table of contents

[Structures](#Structures)

[Static algorithm](#Static)

[Dynamic algorithm](#Dynamic)

[Utilities](#Utilities)

[Tests - static algorithm](#Tests-static)

[Tests - dynamic algorithm](#Tests-dynamic)

## Imports

In [2]:
from bitarray import bitarray
from time import time
import os

<a id='Structures'></a>
## Structures

In [3]:
class Tree:
    """ Huffman tree """

    def __init__(self, node):
        """ Store the root of the tree; printing the tree is delegated to the root

            Arg:
                node: root of the tree
        """
        self._root = node

    def __str__(self):
        """ Print the whole tree in an indented form

            Returns:
                Result of the root's __str__ called in recursive (tree) mode
        """
        return self._root.__str__(tree=True, indent="")

    @staticmethod
    def get_codes_recursively(node, curr_code, letters_codes):
        """ Collect the bit codes of all letters into letters_codes (static algorithm)

            Args:
                node:          current node
                curr_code:     code of the path from the root to this node
                letters_codes: dictionary where key is a letter and value is the letter's bit code

            Note:
                A code is written to letters_codes each time a leaf is reached
        """
        if node.letter == "##":
            Tree.get_codes_recursively(node[0], curr_code + "0", letters_codes)
            Tree.get_codes_recursively(node[1], curr_code + "1", letters_codes)
        else:
            letters_codes[node.letter] = curr_code

    def get_codes(self):
        """ Build the bit codes of all letters for the static algorithm

            Returns:
                dictionary where keys are the letters of the text being compressed
                and values are the letters' bit codes

            Note:
                It works only if the tree contains at least one letter
        """
        
        assert (self._root.weight > 0)

        letters_codes = {}
        code = bitarray()

        Tree.get_codes_recursively(self._root, code, letters_codes)

        return letters_codes

    @property
    def root(self):
        """ Returns root of the tree """
        return self._root


class Node:
    """ Node of Huffman tree """
    def __init__(self, weight, letter="##", parent=None):
        """ Nodes form the structure of the Huffman tree

            Args:
                weight: number of letter occurrences in this node and its descendants
                    (every repetition of a letter is counted)
                letter: letter stored in the node; internal nodes and the special
                    weight-0 node used by the dynamic algorithm keep the default ("##")
                parent: parent of the node; the root has None

            Note:
                It is a binary tree and every node has exactly 2 or exactly 0 children.
                Children can be accessed as if the node were a two-element array, thanks to
                the overridden __setitem__ and __getitem__ methods
        """
        self._weight = weight
        self._letter = letter
        self._children = [None, None]
        self._parent = parent

    def __str__(self, tree=False, indent=""):
        """ Print the node (or the whole subtree) in a readable form

            Args:
                tree: if true, the whole subtree is printed recursively; it is recommended to use
                    print(tree) where tree is an instance of class Tree
                indent: indentation used to visualise the depth of a node

            Returns:
                Empty string, because the method prints by itself and nothing more should be printed
        """
        print(f'{indent}\"{self._letter}\":  {self._weight}')

        if tree and self[0]:
            self[0].__str__(tree=True, indent=(indent + "  "))
            self[1].__str__(tree=True, indent=(indent + "  "))

        return ""

    def __getitem__(self, item):
        """ Get a child as if the node were an array of size 2 """
        return self._children[item]

    def __setitem__(self, key, value):
        """ Set a child as if the node were an array of size 2 """
        self._children[key] = value
        
    def get_code(self, curr_code):
        """ Get the reversed bit code of a leaf (dynamic algorithm)

            Args:
                curr_code: bits collected so far on the path from the leaf towards the root;
                    the method is called on a leaf and recurses through the parents up to the root

            Returns:
                Bit code in leaf-to-root order, i.e. the reversed root-to-leaf code
        """
        if self._parent:
            if self._parent[0] == self:
                return self._parent.get_code(curr_code + "0")
            else:
                return self._parent.get_code(curr_code + "1")
        else:
            return curr_code

    def add_children(self, child_1, child_2):
        """ Add children to the node

            Args:
                child_1, child_2: nodes which will become children of the current node

            Note:
                Order matters: child_1 becomes child 0 and child_2 becomes child 1;
                the node's weight is set to the sum of the children's weights
        """
        self[0] = child_1
        self[1] = child_2
        self._weight = child_1.weight + child_2.weight

    @staticmethod
    def swap(node1, node2):
        """ Swap two nodes in the tree (dynamic algorithm)

            Args:
                node1, node2: nodes which will exchange their positions in the tree
        """
        if node1.parent[0] == node1:
            bit_to_node2 = 0

        else:
            bit_to_node2 = 1

        if node2.parent[0] == node2:
            node2.parent[0] = node1
        else:
            node2.parent[1] = node1

        node1.parent, node2.parent = node2.parent, node1.parent

        node2.parent[bit_to_node2] = node2

    def increment(self):
        """ Keep the tree well balanced after a weight change

            Note:
                To maintain the balance, the node is swapped with its uncle or its sibling
                when the node has the greater weight; then the parent's weight is incremented
                and the update continues in the parent
        """
        if self.uncle and self.uncle.weight < self.weight:
            Node.swap(self, self.uncle)

        if self._parent:
            if self._parent[0] == self and self._parent[1].weight < self._weight:
                Node.swap(self, self._parent[1])

            self._parent.weight += 1
            self._parent.increment()

    @property
    def code(self):
        """ Get the bit code of the node (root-to-node order)

            Note:
                It is a property, so it is accessed like an attribute
        """
        code = self.get_code(bitarray())
        code.reverse()
        return code
            
    @property
    def uncle(self):
        """ Get the parent's sibling

            Returns:
                parent's sibling if it exists, otherwise None

            Note:
                It is a property, so it is accessed like an attribute
        """
        if self._parent and self._parent.parent:
            if self._parent == self._parent.parent[0]:
                return self._parent.parent[1]
            else:
                return self._parent.parent[0]
        return None

    @property
    def weight(self):
        """ Getting weight of the node """
        return self._weight

    @property
    def letter(self):
        """ Getting letter stored in the node """
        return self._letter

    @property
    def parent(self):
        """ Getting node's parent """
        return self._parent

    @weight.setter
    def weight(self, weight):
        """ Setting weight of the node """
        self._weight = weight

    @parent.setter
    def parent(self, parent):
        """ Setting node's parent """
        self._parent = parent
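
The `code` property above walks from a leaf up to the root, records one bit per edge, and reverses the result. A minimal standalone sketch of that idea follows; it uses plain strings instead of `bitarray`, and the `MiniNode` class is a hypothetical stand-in that is not part of this notebook.

```python
class MiniNode:
    """Hypothetical stand-in for Node: only parent/children links."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = [None, None]

    def attach(self, left, right):
        # child 0 is reached with bit "0", child 1 with bit "1"
        self.children = [left, right]
        left.parent = right.parent = self

    def code(self):
        bits = []
        node = self
        # climb to the root, recording which child we came from
        while node.parent is not None:
            bits.append("0" if node.parent.children[0] is node else "1")
            node = node.parent
        # bits were collected leaf-to-root, so reverse them
        return "".join(reversed(bits))


root, inner = MiniNode(), MiniNode()
a, b, c = MiniNode(), MiniNode(), MiniNode()
root.attach(a, inner)    # a is child 0 of the root
inner.attach(b, c)       # b and c hang under child 1 of the root

print(a.code(), b.code(), c.code())  # 0 10 11
```

The root itself has an empty code, which is why a tree with a single leaf cannot encode anything with this scheme.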

<a id='Static'></a>
## Static algorithm

In [4]:
def count_letters(text):
    """ Get a dictionary of letters and their numbers of occurrences in the text

        Arg:
            text: text from which all letters and their occurrences are extracted

        Returns:
            dictionary where key is a letter and value is the number of its occurrences in the text
    """
    letters = {}
    for letter in text:
        if letter not in letters:
            letters[letter] = 1
        else:
            letters[letter] += 1

    return letters


def huffman(letter_counts):
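    """ Static Huffman algorithm for creating a Huffman tree

        Arg:
            letter_counts: dictionary of letters and their numbers of occurrences in the text
                which we want to compress

        Returns:
            Huffman tree created from the letters and their occurrences in letter_counts

        Note:
            letter_counts must contain at least two different letters; with fewer, no internal
            node is created and internal_nodes[0] raises IndexError
    """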
    
    # list for nodes
    nodes = []

    # creating nodes from letter_counts and putting them to the list
    for letter, weight in letter_counts.items():
        nodes.append(Node(weight, letter))

    # list for internal nodes
    internal_nodes = []
    
    # sorting all leaves (the nodes list) by their weight
    leafs = sorted(nodes, key=lambda node: node.weight)

    # connecting nodes to get proper Huffman tree
    while len(leafs) + len(internal_nodes) > 1:
        head = []

        # selecting 2 nodes with the lowest weights
        if len(leafs) >= 2:
            head += leafs[:2]
        elif len(leafs) == 1:
            head += leafs[:1]

        if len(internal_nodes) >= 2:
            head += internal_nodes[:2]
        elif len(internal_nodes) == 1:
            head += internal_nodes[:1]

        element_1, element_2 = sorted(head, key=lambda n: n.weight)[:2]
        
        # making these two nodes children of new node
        new_internal = Node(0)
        new_internal.add_children(element_1, element_2)
        internal_nodes.append(new_internal)

        # removing used nodes from lists
        if len(leafs) > 0 and element_1 == leafs[0]:
            leafs = leafs[1:]
        else:
            internal_nodes = internal_nodes[1:]

        if len(leafs) > 0 and element_2 == leafs[0]:
            leafs = leafs[1:]
        else:
            internal_nodes = internal_nodes[1:]

    # returning Huffman tree
    return Tree(internal_nodes[0])


def save_bitarray_to_file(file_path, array):
    """ Save a bitarray to a file as raw bytes

    Args:
        file_path: path of the file to which we want to save the bitarray
        array:     bitarray which we want to save to the file
    """
    with open(file_path, "wb") as file:
        file.write(bytearray(array))

        
def get_bitarray_from_file(file_path):
    """ Getting bitarray representing text in file
    
    Arg:
        file_path: path of the file from which we want to get text
    
    Returns:
        bitarray representing text in the file
    """
    with open(file_path, "rb") as file:
        text = file.read()
        array = bitarray()
        array.frombytes(bytes(text))
        return array
    
    
def get_bits_from_number(number):
    """ Auxiliary function which converts a number to a bitarray

    Arg:
        number: number to convert to a bitarray

    Returns:
        24-bit bitarray representing the given number (most significant bit first)

    Note:
        number must be smaller than 2**24 because it is stored in 3 bytes
        in the compressed text
    """
    assert(number < 2**24)
    
    number_bits = bitarray(24)
    number_bits.setall(False)
    
    curr_inx = 23
    while number > 0:
        if number % 2 != 0:
            number_bits[curr_inx] = True
        
        curr_inx -= 1
        number //= 2
    
    return number_bits
    
    
def get_number_from_bits(number_bits):
    """ Auxiliary function which converts a bitarray to a number (integer)

    Arg:
        number_bits: bitarray representing the number

    Returns:
        number read from the bitarray

    Note:
        the bitarray must have length 24 because the number takes 3 bytes
        in the compressed text
    """
    assert(len(number_bits) == 24)
    
    number = 0
    
    multiplier = 1
    
    for i in range(23, -1, -1):
        if number_bits[i]:
            number += multiplier
        
        multiplier *= 2
        
    return number
    
    
def compress(text):
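    """ Compress text using the static algorithm

    Arg:
        text: string containing the text which should be compressed

    Returns:
        compressed text as a bitarray

    Note:
        the format stores every letter in one byte, so each letter must encode to a single
        UTF-8 byte (ASCII); the zero byte is reserved as the end-of-header marker, and the
        text must contain at least two different letters (required by huffman)
    """

    # getting all letters and the number of their occurrences in the text
    letter_counts = count_letters(text)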
    
    # making huffman tree
    tree = huffman(letter_counts)
    
    # compressed text to return
    compressed_text = bitarray()
    
    # encoding every letter and the number of its occurrences in the text
    for letter, counter in letter_counts.items():
        # letter in UTF-8 (the format assumes 1 byte per letter)
        letter_bits = bitarray()
        letter_bits.frombytes(bytes(letter, "utf-8"))

        # counter in range 0 .. 2^24 - 1 (enough for files smaller than 16 MB),
        # so every counter takes 3 bytes
        counter_bits = get_bits_from_number(counter)
        
        # adding compressed letter and number
        compressed_text += letter_bits
        compressed_text += counter_bits
        
    # a zero byte marks the end of letter_counts
    compressed_text += "00000000"

    # bit codes for every letter
    letters_codes = tree.get_codes()
    
    # adding code of every letter of the text to bitarray
    for letter in text:
        compressed_text += letters_codes[letter]
        
    # the length of the bit stream has to be divisible by 8, so we pad it with "0" bits
    # up to a byte boundary and then append a special byte which marks the end of the text;
    # the special byte has a single "1" at the same position within the byte as the
    # last data bit of the text
    special_bit = (len(compressed_text) - 1) % 8

    compressed_text += "0" * (7 - special_bit)
    
    special_byte = bitarray(8)
    special_byte.setall(False)
    special_byte[special_bit] = True
    
    compressed_text += special_byte
    
    # returning compressed text in form of bitarray
    return compressed_text

    
def decompress(compressed_text):
    """ Decompressing text using static algorithm
    
    Arg: 
        compressed_text: bitarray with compressed text which should be decompressed
    
    Returns:
        decompressed text (string)
    """
    
    # where we are reading now
    pointer = 0
    
    # we have to read letter_counts from compressed text
    letter_counts = {}
    
    # getting letter_counts
    while pointer < len(compressed_text):
        # end of letter_counts
        if compressed_text[pointer:(pointer+8)] == bitarray("00000000"):
            pointer += 8
            break
        
        # key in letter_counts dictionary (letter)
        letter = compressed_text[pointer:(pointer+8)].tobytes().decode("utf-8")
        pointer += 8
        
        # value in letter_counts dictionary (number of occurrences of letter in text)
        counter = get_number_from_bits(compressed_text[pointer:(pointer+24)])
        pointer += 24
        
        # adding new item to the dictionary
        letter_counts[letter] = counter
    
    # building huffman tree for decompression
    tree = huffman(letter_counts)
    
    # getting the special bit to know where the last letter ends
    special_bit = 0
    for i in range(len(compressed_text) - 8, len(compressed_text)):
        if compressed_text[i]:
            special_bit = i - (len(compressed_text) - 8)
    
    # decompressed text
    text = ""
    
    # decompressing text using tree
    while pointer <= len(compressed_text) - 16 + special_bit:
        curr_node = tree.root
        while curr_node.letter == "##":
            if not compressed_text[pointer]:
                curr_node = curr_node[0]
            else:
                curr_node = curr_node[1]
            pointer += 1
            
        text += curr_node.letter
        
    return text
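
`huffman` above relies on the two-queue construction: once the leaves are sorted by weight, new internal nodes are created in non-decreasing weight order, so the two lightest nodes are always at the front of one of the two lists. A rough standalone sketch of the same idea with `collections.deque` follows. The name `two_queue_huffman` and the nested-tuple tree representation are illustrative assumptions, not part of the notebook.

```python
from collections import deque


def two_queue_huffman(letter_counts):
    """Return {letter: code string}; needs at least two distinct letters."""
    # queue of (weight, tree) leaves sorted by weight; a leaf tree is just the letter
    leaves = deque(sorted(((weight, letter) for letter, weight in letter_counts.items()),
                          key=lambda item: item[0]))
    internal = deque()  # internal nodes are produced in non-decreasing weight order

    def pop_lightest():
        # on ties take the leaf first, like the stable sort used in huffman()
        if not internal or (leaves and leaves[0][0] <= internal[0][0]):
            return leaves.popleft()
        return internal.popleft()

    while len(leaves) + len(internal) > 1:
        w1, t1 = pop_lightest()
        w2, t2 = pop_lightest()
        internal.append((w1 + w2, (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):      # internal node: (child 0, child 1)
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                            # leaf: a letter
            codes[tree] = prefix

    walk(internal[0][1], "")
    return codes


codes = two_queue_huffman({"a": 5, "b": 2, "c": 1, "d": 1})
print(codes)  # {'b': '00', 'c': '010', 'd': '011', 'a': '1'}
```

Each step only inspects the fronts of the two queues, so after the initial sort the tree is built in linear time.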

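Each header entry written by `compress` is one letter byte followed by a 24-bit counter, most significant bit first, which is what `get_bits_from_number` and `get_number_from_bits` compute bit by bit. As a cross-check, the same 4-byte layout can be sketched with the built-in `int.to_bytes` and `int.from_bytes`. The helper names are hypothetical, and the sketch assumes `bitarray`'s default big-endian bit order.

```python
def pack_entry(letter, count):
    """Encode one header entry: 1 letter byte + 3-byte big-endian count."""
    letter_byte = letter.encode("utf-8")
    if len(letter_byte) != 1:
        raise ValueError("this layout assumes one byte per letter (ASCII)")
    # to_bytes raises OverflowError when count does not fit in 3 bytes
    return letter_byte + count.to_bytes(3, "big")


def unpack_entry(entry):
    """Decode the 4-byte entry produced by pack_entry."""
    return entry[:1].decode("utf-8"), int.from_bytes(entry[1:4], "big")


entry = pack_entry("a", 1000)
print(entry.hex())          # 610003e8
print(unpack_entry(entry))  # ('a', 1000)
```
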
<a id='Dynamic'></a>
## Dynamic algorithm

In [5]:
def adaptive_compress(text):
    """ Compress text using the dynamic algorithm

        Arg:
            text: string containing the text which should be compressed

        Returns:
            compressed text as a bitarray
    """
    
    # dictionary with all leaves for constant time access
    nodes = {"##": Node(weight=0)}
    
    # Huffman tree
    tree = Tree(nodes["##"])
    
    # compressed text, result of the algorithm
    compressed_text = bitarray()

    # compressing all letters
    for letter in text:
        # letter has already been added
        if letter in nodes:
            # getting node representing letter
            node = nodes[letter]
            
            # adding node code to the compressed text
            compressed_text += node.code
            
            # incrementing weight of the node
            node.weight += 1
            
            # recursively maintaining good balance of the tree
            node.increment()
            
        # adding new letter
        else:
            # special node under which the new letter will be placed
            updated_node = nodes["##"]
            
            # adding code of special node (with weight 0) to indicate adding new letter
            compressed_text += updated_node.code
            
            # adding new letter in utf-8 
            letter_bits = bitarray()
            letter_bits.frombytes(bytes(letter, "utf-8"))
            compressed_text += letter_bits

            # node with new letter
            node = Node(1, letter=letter, parent=updated_node)
            
            # new special node with weight 0
            zero_node = Node(0, parent=updated_node)

            # adding new nodes as children of former special node
            updated_node.add_children(zero_node, node)

            # adding new leaves to the dictionary
            del nodes["##"]
            nodes["##"] = zero_node
            nodes[letter] = node
            
            # recursively maintaining good balance of the tree
            updated_node.increment()
            
    # the length of the bit stream has to be divisible by 8, so we pad it with "0" bits
    # up to a byte boundary and then append a special byte which marks the end of the text;
    # the special byte has a single "1" at the same position within the byte as the
    # last data bit of the text
    special_bit = (len(compressed_text) - 1) % 8

    compressed_text += "0" * (7 - special_bit)
    
    special_byte = bitarray(8)
    special_byte.setall(False)
    special_byte[special_bit] = True
    
    compressed_text += special_byte
    
    # returning bitarray which is compressed text
    return compressed_text

def adaptive_decompress(compressed_text):
    """ Decompress text using the dynamic algorithm

        Arg:
            compressed_text: bitarray containing the compressed text

        Returns:
            decompressed text as a string
    """
    
    # dictionary with all leaves for constant time access
    nodes = {"##": Node(weight=0)}
    
    # Huffman tree
    tree = Tree(nodes["##"])
    
    # text after decompression
    text = ""
    
    # current position in compressed text
    pointer = 0
    
    # getting the special bit to know where the last letter ends
    special_bit = 0
    for i in range(len(compressed_text) - 8, len(compressed_text)):
        if compressed_text[i]:
            special_bit = i - (len(compressed_text) - 8)

    # decompressing text
    while pointer <= len(compressed_text) - 16 + special_bit:
        # using current Huffman tree to get next letter
        curr_node = tree.root
        
        # going from root to the leaf using code saved in compressed text
        while curr_node.weight > 0 and curr_node.letter == "##":
            if not compressed_text[pointer]:
                curr_node = curr_node[0]
            else:
                curr_node = curr_node[1]
            
            pointer += 1
        
        # leaf contains letter
        if curr_node.letter != "##":
            # letter will be added to the decompressed text
            letter = curr_node.letter
            
            # getting node with letter
            node = nodes[letter]
            
            # incrementing node's weight
            node.weight += 1
            
            # recursively maintaining good balance of the tree
            node.increment()
            
        # leaf is a special node so we are adding new letter to the tree
        else:
            # reading the new letter saved in UTF-8 (one byte)
            letter = compressed_text[pointer:(pointer+8)].tobytes().decode("utf-8")

            # moving past the letter byte
            pointer += 8
    
            # getting special node
            updated_node = nodes["##"]

            # new node containing new letter
            node = Node(1, letter=letter, parent=updated_node)
            
            # new special node
            zero_node = Node(0, parent=updated_node)
            
            # making new nodes children of former special node
            updated_node.add_children(zero_node, node)

            # adding new leaves to dictionary
            del nodes["##"]
            nodes["##"] = zero_node
            nodes[letter] = node

            # recursively maintaining good balance of the tree
            updated_node.increment()
        
        # adding letter to the decompressed text
        text += letter
    
    # returning decompressed text
    return text


def adaptive_huffman(text_or_bitarray, mode):
    """ Use the adaptive Huffman algorithm for compression or decompression

        Args:
            text_or_bitarray: text to compress or bitarray to decompress
            mode:             "compress" - compress text; "decompress" - decompress bitarray

        Returns:
            compressed bitarray or decompressed string, depending on mode

        Raises:
            ValueError: if mode is neither "compress" nor "decompress"
    """
    if mode == "compress":
        return adaptive_compress(text_or_bitarray)
    elif mode == "decompress":
        return adaptive_decompress(text_or_bitarray)
    raise ValueError(f"unknown mode: {mode!r}")
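
Both `compress` and `adaptive_compress` finish the stream the same way: zero bits pad the data to a byte boundary, and one extra byte carries a single 1 bit at the in-byte position of the last data bit. A small sketch of that trailer on plain "0"/"1" strings follows; the function names are illustrative, and strings stand in for `bitarray`.

```python
def add_trailer(bits):
    """Pad a '0'/'1' string to whole bytes and append the marker byte."""
    last_pos = (len(bits) - 1) % 8          # position of the last data bit in its byte
    padded = bits + "0" * (7 - last_pos)    # fill the rest of that byte with zeros
    marker = ["0"] * 8
    marker[last_pos] = "1"
    return padded + "".join(marker)


def strip_trailer(stream):
    """Recover the data bits from a stream produced by add_trailer."""
    last_pos = stream[-8:].index("1")       # the marker byte holds exactly one 1 bit
    end = len(stream) - 16 + last_pos       # index of the last data bit
    return stream[:end + 1]


stream = add_trailer("10110")
print(stream)                 # 1011000000001000
print(strip_trailer(stream))  # 10110
```

The marker costs one extra byte per file, but it lets the decoder tell padding zeros apart from real code bits without storing the text length.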

<a id='Utilities'></a>
## Utilities

In [6]:
def compress_file(file_path, destination_path, mode):
    """ Compressing file
    
    Args: 
        file_path:        path of the file which we want to compress
        destination_path: path of the file to which we want to save result of compression
        mode:             "static" - use static algorithm; "dynamic" - use dynamic algorithm
    """
    
    with open(file_path, "r") as file:
        text = file.read()
        
        if mode == "static":
            compressed_text = compress(text)
        else:
            compressed_text = adaptive_huffman(text, mode="compress")
        
        save_bitarray_to_file(destination_path, compressed_text)
        
def decompress_file(file_path, destination_path, mode):
    """ Decompressing file
    
    Args: 
        file_path:        path of the file which we want to decompress
        destination_path: path of the file to which we want to save result of decompression
        mode:             "static" - use static algorithm; "dynamic" - use dynamic algorithm
    """
    
    compressed_text = get_bitarray_from_file(file_path)

    if mode == "static":
        text = decompress(compressed_text)
    else:
        text = adaptive_huffman(compressed_text, mode="decompress")
    
    with open(destination_path, "w") as file:
        file.write(text)

def get_measurements(file_size_string, mode):
    """ Print statistics about compression and decompression

    Args:
        file_size_string: size label of the file which we want to compress and decompress
        mode: "static" - use static algorithm, "dynamic" - use dynamic algorithm

    Notes:
        the function prints whether the compression is lossless, the times of compression and
        decompression, the file sizes before and after compression and the compression rate.

        the measured files are in the directory "texts" and have fixed names, so
        file_size_string has to be one of "1kB", "10kB", "100kB" or "1MB"
    """
    
    # names of files
    file = "texts/" + file_size_string + ".txt"
    compressed_file = "texts/" + file_size_string + "_compressed.txt"
    decompressed_file = "texts/" + file_size_string + "_decompressed.txt"
    
    # time measurements
    start_time = time()
    compress_file(file,compressed_file, mode)
    compression_time = time() - start_time

    start_time = time()
    decompress_file(compressed_file, decompressed_file, mode)
    decompression_time = time() - start_time

    # checking if compression and decompression are correct
    with open(file, "r") as file1, open(decompressed_file, "r") as file2:
        if file1.read() == file2.read():
            print("File before compression and after decompression is the same")
        else:
            print("File before compression and after decompression is different")
    
    # printing results of measurements
    print(f"Compression time: {compression_time}")
    print(f"Decompression time: {decompression_time}")
    
    print(f"File size: {os.path.getsize(file)}")
    print(f"Compressed file size: {os.path.getsize(compressed_file)}")
    
    compression_rate = 100 * (1 - os.path.getsize(compressed_file) / os.path.getsize(file))
    print(f"Compression rate: {compression_rate:.2f}%")
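
The compression rate printed by `get_measurements` is the share of the original size that was saved, `100 * (1 - compressed / original)`. A tiny sketch with a hypothetical helper, checked against the 1kB static result reported in the tests (1000 bytes compressed to 784 bytes):

```python
def compression_rate(original_size, compressed_size):
    """Percentage of the original size saved by compression."""
    return 100 * (1 - compressed_size / original_size)


print(f"{compression_rate(1000, 784):.2f}%")  # 21.60%
```

A negative value would mean that the compressed file is larger than the original, which can happen for very small inputs because of the header and the trailer byte.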

<a id='Tests-static'></a>
## Tests - static algorithm

### File size: 1kB

In [7]:
get_measurements("1kB", mode="static")

File before compression and after decompression is the same
Compression time: 0.0013990402221679688
Decompression time: 0.004316806793212891
File size: 1000
Compressed file size: 784
Compression rate: 21.60%


### File size: 10kB

In [8]:
get_measurements("10kB", mode="static")

File before compression and after decompression is the same
Compression time: 0.004400014877319336
Decompression time: 0.03122711181640625
File size: 10000
Compressed file size: 5841
Compression rate: 41.59%


### File size: 100kB

In [9]:
get_measurements("100kB", mode="static")

File before compression and after decompression is the same
Compression time: 0.0459895133972168
Decompression time: 0.2397468090057373
File size: 100000
Compressed file size: 58073
Compression rate: 41.93%


### File size: 1MB

In [10]:
get_measurements("1MB", mode="static")

File before compression and after decompression is the same
Compression time: 0.24092888832092285
Decompression time: 2.1474621295928955
File size: 1000000
Compressed file size: 584719
Compression rate: 41.53%


<a id='Tests-dynamic'></a>
## Tests - dynamic algorithm

### File size: 1kB

In [11]:
get_measurements("1kB", mode="dynamic")

File before compression and after decompression is the same
Compression time: 0.017184972763061523
Decompression time: 0.01620030403137207
File size: 1000
Compressed file size: 632
Compression rate: 36.80%


### File size: 10kB

In [12]:
get_measurements("10kB", mode="dynamic")

File before compression and after decompression is the same
Compression time: 0.14078497886657715
Decompression time: 0.12728595733642578
File size: 10000
Compressed file size: 5766
Compression rate: 42.34%


### File size: 100kB

In [13]:
get_measurements("100kB", mode="dynamic")

File before compression and after decompression is the same
Compression time: 1.3902168273925781
Decompression time: 1.2570855617523193
File size: 100000
Compressed file size: 58852
Compression rate: 41.15%


### File size: 1MB

In [14]:
get_measurements("1MB", mode="dynamic")

File before compression and after decompression is the same
Compression time: 11.10156512260437
Decompression time: 12.311598777770996
File size: 1000000
Compressed file size: 594845
Compression rate: 40.52%
