# Encoding

We will be using 4 functions to help us with encoding: read_file_as_int(), rice_encoding_4(), rice_encoding_2(), and write_to_binfile()

First, we have to import the necessary libraries

In [1]:
import os

The read_file_as_int() function reads a file, and converts the file into a list of binary integers which we need for encoding. It takes the file path as input so as to retrieve the file, then reads the bytes in the specified file and adds it to the list. It then prints the size of the original file for reference.

In [2]:
def read_file_as_int(file_name):
    file_as_int = []
    with open(file_name, "rb") as f:
        filesize = os.path.getsize(file_name)
        while True:
            bytes = f.read(filesize)
            if bytes:
                for byte in bytes:
                    file_as_int.append(byte)
                else:
                    break
        f.close()

        print(filesize)
        
    return file_as_int

The rice_encoding_4() and rice_encoding_2() functions do basically the same things, except the former performs the rice coding compression in 4 bits while the latter does it in 2 bits

The function first specifies the value of k, which is the compression bit length, to either 4 or 2, depending on the function, and calculates the value of M, which is found by M = 2\**k. 

It then loops through the integers in the input list to calculate the values of q and r. q is the quotient of the integer value divided by M. r is the remainder of the integer value divided by M

q is then written in unary to a string, bit_string, and r is converted from decimal binary and added to the end of bit_string

In [3]:
def rice_encoding_4(int_file):
    k = 4
    M = 2**k
    bit_string = ""
    for S in int_file:
        q = int(S/M)
        r = S%M
        unary = ""
        for u in range(q):
            unary += "1"
        unary += "0"
        r_bin = format(r, "b")
        bit = unary + r_bin
        bit_string += bit
    
    return bit_string

In [4]:
def rice_encoding_2(int_file):
    k = 2
    M = 2**k
    bit_string = ""
    for S in int_file:
        q = int(S/M)
        r = S%M
        unary = ""
        for u in range(q):
            unary += "1"
        unary += "0"
        r_bin = format(r, "b")
        bit = unary + r_bin
        bit_string += bit
    
    return bit_string

The function write_to_binfile() takes the bit string produced from the rice encoding function converts it to a bytearray, and writes the bytearray to a new specified file

In [5]:
def write_to_binfile(bit_string, encoded_file_name):
    i = 0
    buffer = bytearray()
    while i < len(bit_string):
        buffer.append( int(bit_string[i:i+8], 2) )
        i += 8

    encoded_file = open(encoded_file_name, "wb")
    encoded_file.write(buffer)
    encoded_file.close

# Decoding

In [6]:
file_as_int = []
with open("Sound1.ex2", "rb") as f:
    filesize = os.path.getsize("Sound1.ex2")
    while True:
        bytes = f.read(filesize)
        if bytes:
            for byte in bytes:
                file_as_int.append(byte)
            else:
                break
    f.close()

    print(filesize)
    print(file_as_int[0:20])

1370895
[250, 244, 251, 123, 126, 122, 158, 125, 253, 251, 122, 254, 223, 183, 250, 100, 1, 19, 211, 255]


In [7]:
k = 4
M = 2**k

In [8]:
byte_array = bytearray()
for i in file_as_int:
    q = 0
    r_string = ""
    bin_val = bin(i).replace("0b","")
    
    for val in bin_val:
        if val == "1":
            q += 1
        else:
            break
    
    r_string = bin_val[q:q+k]
    if r_string == "":
        r_string = "0"

    r = int(r_string, 2)
    S = q*M +r

    byte_array.append(S)    
    
print (byte_array[0:20])

bytearray(b'RDSC`B\x13QaSBp\'\x16R"\x10\x13$\x80')


In [10]:
encoded_file = open("Sound1.enc.wav", "wb")
encoded_file.write(byte_array)
encoded_file.close

<function BufferedWriter.close>