# Problem 2

We begin, as always, by importing the required libraries and then defining the functions we will use later.

## Library Imports

For this implementation, we will require Python's "*heapq*" library so that we can create a priority queue.

In [1]:
import heapq
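To illustrate what *heapq* provides, the short sketch below treats an ordinary list as a min-priority queue: `heappush` adds items and `heappop` always removes the smallest one. The `(frequency, symbol)` pairs are made-up sample values, not output from this notebook.

```python
import heapq

# Hypothetical (frequency, symbol) pairs; tuples compare by frequency first
queue = []
for item in [(5, ' '), (1, 'x'), (4, 't'), (3, 'e')]:
    heapq.heappush(queue, item)

# heappop always returns the smallest remaining tuple
ordered = [heapq.heappop(queue) for _ in range(len(queue))]
print(ordered)  # [(1, 'x'), (3, 'e'), (4, 't'), (5, ' ')]
```

Tuples are compared element by element, so ties in frequency fall back to comparing the symbols. Entries whose second element cannot be compared (for example, objects without an ordering) would raise a `TypeError`, which is why a tie-breaking counter is often inserted as a middle element.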

We will also need the "*csv*" library so that we can read and write CSV files.

In [2]:
import csv

Finally, we will need both the "*numpy*" and "*pandas*" libraries for handling arrays and for writing arrays out to CSV files.

In [3]:
import numpy
import pandas

Now that we have imported all the required libraries, we can move on to defining **_OUR_** functions and subroutines.

## Function Definitions

We will need several functions of our own, both to allow us to encode/decode using Huffman codes and to make the later code easier to read and write by moving simple and/or repeated tasks into subroutines of their own.  These subroutines are:

* File Reader (*for text files*)
* File Writer (*for text files*)
* File Reader (*for CSV files*)
* File Writer (*for CSV files*)
* Dictionary Extractor
* N-Gram Generator
* Frequency Counter
* Huffman Tree Maker
* Huffman Code Builder

We will also need an object class for

* Huffman Nodes

### File Reader (*for text files*)

We start with the **File Reader** for text files.  We will name it _**fileRDR**_ and its code is

In [4]:
def fileRDR(filename):
    # Read the entire file into a single string;
    # the with-block closes the file automatically
    with open(filename, 'r') as myTextFileIn:
        myTextIn = myTextFileIn.read()

    return myTextIn

#### Testing

To test it, we define a string with the same contents as our *TestTextFile.txt* test file

In [5]:
testText = "This is a test text file"

Then we import the file's contents to another string

In [6]:
testTextIn = fileRDR("TestTextFile.txt")

Last, we check that they are the same and print the string if they are

In [7]:
if testText == testTextIn:
    print(testTextIn)
else:
    print("OOOPS!!!")

This is a test text file


Since the text file reader works, we move on to the next subroutine.

### File Writer (*for text files*)

We continue with the **File Writer** for text files.  We will name it _**fileWTR**_ and its code is

In [8]:
def fileWTR(filename, strToWrite):
    # Create (or overwrite) the file and write the string to it;
    # the with-block closes the file automatically
    with open(filename, 'w') as myTextFileOut:
        myTextFileOut.write(strToWrite)

#### Testing

We test this subroutine by writing our previously defined string **testText** to another file, *TestTextFile2.txt*, reading that new file back in with **fileRDR**, and comparing the result with the original string.  Starting with the write

In [9]:
fileWTR("TestTextFile2.txt", testText)

we then read the new file back in

In [10]:
testTextIn2 = fileRDR("TestTextFile2.txt")

and check to see that they are the same

In [11]:
if testText == testTextIn2:
    print(testTextIn2)
else:
    print("OOOPS!!!")

This is a test text file
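The same round trip can also be checked without leaving files behind by working in a temporary directory. The sketch below restates simplified but equivalent versions of **fileRDR** and **fileWTR** so that it runs on its own; the use of *tempfile* is our own assumption and not part of the original workflow.

```python
import os
import tempfile

def fileRDR(filename):
    # Read the whole file into one string; the with-block closes the file
    with open(filename, 'r') as myTextFileIn:
        return myTextFileIn.read()

def fileWTR(filename, strToWrite):
    # Create (or overwrite) the file with the given string
    with open(filename, 'w') as myTextFileOut:
        myTextFileOut.write(strToWrite)

testText = "This is a test text file"

# Write to and read from a throwaway directory instead of the working directory
with tempfile.TemporaryDirectory() as tmpDir:
    path = os.path.join(tmpDir, "TestTextFile2.txt")
    fileWTR(path, testText)
    roundTrip = fileRDR(path)

print(roundTrip == testText)  # True
```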


Since the text writer works, we move on to CSV file readers and writers.

### File Reader (*for CSV files*)

Now, we will create a **File Reader** for CSV files.  We will name it _**csvFileRDR**_ and its code is

In [12]:
def csvFileRDR(filename):
    # newline='' is what the csv module documentation recommends
    # for file objects passed to csv.reader
    with open(filename, 'r', newline='') as myCSVfileIn:
        csvReader = csv.reader(myCSVfileIn)

        # Each row comes back as a list of strings
        csvOUT = list(csvReader)

    return csvOUT

#### Testing

To test our new CSV file reader, we define a string array with the same contents as our *testCSV.csv* test file

In [13]:
testCSV = [['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'], ['e', '5'], ['f', '6'], ['g', '7'], ['h', '8'], ['i', '9'], ['j', '10']]

We now import the file's contents to another array

In [14]:
testCSVin = csvFileRDR('testCSV.csv')

and finally check whether they match our previously defined array, printing the array if they do

In [15]:
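# The arrays can only match if they have the same number of rows
testCOND = len(testCSVin) == len(testCSV)

for expectedRow, readRow in zip(testCSV, testCSVin):
    # Rows must have the same length and agree field by field
    # (ignoring surrounding whitespace)
    if len(expectedRow) != len(readRow) or any(
            expected.strip() != read.strip()
            for expected, read in zip(expectedRow, readRow)):
        testCOND = False
        break

if testCOND:
    print(testCSVin)
else:
    print("OOOPS!!!")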

[['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'], ['e', '5'], ['f', '6'], ['g', '7'], ['h', '8'], ['i', '9'], ['j', '10']]


Since the CSV reader works, we move on to the CSV writer.

### File Writer (*for CSV files*)

Now, we will create a **File Writer** for CSV files.  We will name it _**csvFileWTR**_ and its code is

In [16]:
def csvFileWTR(filename, arrayToWrite):
    # newline='' lets the csv module control line endings itself
    with open(filename, 'w', newline='') as myCSVfileOut:
        # Use the double-quote character (the csv default) so that fields
        # containing commas are quoted in a way csv.reader can undo, and a
        # lone space (e.g. a ' ' dictionary key) is written unchanged
        csvWriter = csv.writer(myCSVfileOut, delimiter=',',
                               quotechar='"', quoting=csv.QUOTE_MINIMAL)

        for row in arrayToWrite:
            csvWriter.writerow(row)

#### Testing

To test our new CSV file writer, we will use our previously defined **testCSV** string array and have the CSV file writer write it to the new file *testCSV2.csv*.  Then we will read this newly written file in and compare it to the original **testCSV**.  Writing the new file

In [17]:
csvFileWTR('testCSV2.csv', testCSV)

Reading the newly written file into the new string array **testCSVin2**.

In [18]:
testCSVin2 = csvFileRDR("testCSV2.csv")

Finally, checking whether the **testCSV** and **testCSVin2** string arrays match, element for element.

In [19]:
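# The arrays can only match if they have the same number of rows
testCOND = len(testCSVin2) == len(testCSV)

for expectedRow, readRow in zip(testCSV, testCSVin2):
    # Rows must have the same length and agree field by field
    # (ignoring surrounding whitespace)
    if len(expectedRow) != len(readRow) or any(
            expected.strip() != read.strip()
            for expected, read in zip(expectedRow, readRow)):
        testCOND = False
        break

if testCOND:
    print(testCSVin2)
else:
    print("OOOPS!!!")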

[['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'], ['e', '5'], ['f', '6'], ['g', '7'], ['h', '8'], ['i', '9'], ['j', '10']]
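Because our character dictionaries include the space character, the choice of quote character in a CSV writer matters. The sketch below writes a hypothetical frequency-table row `[' ', '5']` to an in-memory buffer (*io.StringIO* stands in for a file) and reads it back with *csv.reader*'s defaults, once with `quotechar=' '` and once with the default `'"'`. The helper name **csvRoundTrip** is our own.

```python
import csv
import io

def csvRoundTrip(row, quotechar):
    # Write one row to an in-memory buffer with the given quote character
    buffer = io.StringIO(newline='')
    csv.writer(buffer, quotechar=quotechar, quoting=csv.QUOTE_MINIMAL).writerow(row)
    # Read it back using the reader's defaults (quotechar='"')
    buffer.seek(0)
    return next(csv.reader(buffer))

# A frequency-table row whose key is the space character
row = [' ', '5']

print(csvRoundTrip(row, ' '))  # the key comes back padded with extra spaces
print(csvRoundTrip(row, '"'))  # [' ', '5']
```

With a space as the quote character, a field that is itself a space must be quoted, so it no longer reads back as a single space. Comparing with `.strip()`, as in the tests above, would hide this, since both versions strip to an empty string.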


### Dictionary Extractor

We will also need a subroutine to extract a dictionary of all the distinct characters used in a specified text.  Thus, we create the **dictExtractr** subroutine, which returns a list of the unique characters in the input string provided to it.

In [20]:
def dictExtractr(textIn):
    dictOut = list(set(textIn))
    
    return dictOut

#### Testing

To test our **dictExtractr** function, we will provide it with the previously defined string, **testText**, and a new string **testGophers**, which is defined as

In [21]:
testGophers = "go go gophers"

Testing on **testText** gives

In [22]:
print(testText)

testDict = dictExtractr(testText)
print(testDict)

This is a test text file
['f', 'h', 'T', 'i', 'e', 's', ' ', 'l', 't', 'x', 'a']


While testing on **testGophers** gives

In [23]:
print(testGophers)

testDict2 = dictExtractr(testGophers)
print(testDict2)

go go gophers
['h', 'g', 'e', ' ', 's', 'o', 'r', 'p']
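The two outputs above list the characters in an arbitrary order: Python sets are unordered, and because string hashing is randomized per interpreter session, the order can change from one run to the next. If a reproducible order is wanted (for example, so that CSV files written later are identical between runs), a minimal variant, given the hypothetical name **dictExtractrSorted** here, sorts the unique characters by code point:

```python
def dictExtractrSorted(textIn):
    # set() removes duplicate characters; sorted() fixes their order
    return sorted(set(textIn))

print(dictExtractrSorted("go go gophers"))
# [' ', 'e', 'g', 'h', 'o', 'p', 'r', 's']
```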


Since the dictionary extractor works, we will now move on to the N-Gram Generator.

### N-Gram Generator

Since we may wish to encode based on Bi-Grams, Tri-Grams, or some other type of N-Grams (*instead of just characters*), we need a routine that builds every possible N-Gram of the specified length (*N*) from a specified character dictionary.  We call this function **nGramBuilder**; it works recursively, prefixing each character in the dictionary to every (*N*-1)-Gram.  Its code is

In [24]:
def nGramBuilder(nIn, dictIn):
    gramsOut = []
    
    if nIn == 1:
        gramsOut = dictIn
    else: #if nIn > 1:
        nOut = nIn - 1
        
        tempGrams = nGramBuilder(nOut, dictIn)
        
        for letter in dictIn:
            for gram in tempGrams:
                gramsOut.append(letter + gram)
                
    
    return gramsOut
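Since this section does not test **nGramBuilder** directly, here is a small usage sketch. It restates the function compactly (so the block runs on its own) and compares it with an equivalent built on *itertools.product*; **nGramProduct** is a hypothetical name used only for this comparison. For a dictionary of *k* characters, both return all *k*^*N* possible N-Grams, so the count grows quickly: for the 11-character **testDict**, there are already 11^3 = 1331 Tri-Grams.

```python
import itertools

def nGramBuilder(nIn, dictIn):
    # Compact restatement of the recursive builder above:
    # prefix every (n-1)-gram with each letter of the dictionary
    if nIn == 1:
        return list(dictIn)
    shorterGrams = nGramBuilder(nIn - 1, dictIn)
    return [letter + gram for letter in dictIn for gram in shorterGrams]

def nGramProduct(nIn, dictIn):
    # Same result via the Cartesian product of the dictionary with itself n times
    return [''.join(p) for p in itertools.product(dictIn, repeat=nIn)]

print(nGramBuilder(2, ['a', 'b']))            # ['aa', 'ab', 'ba', 'bb']
print(nGramProduct(2, ['a', 'b']))            # ['aa', 'ab', 'ba', 'bb']
print(len(nGramBuilder(3, ['a', 'b', 'c'])))  # 27
```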

Our character dictionary is also used by one more function, which we define next.

### Frequency Counter

We need to know the frequency of characters from a given dictionary in a given document.  Thus, we create the **freqCTR** subroutine to determine these frequencies.

In [26]:
def freqCTR(dictIn, textIn):
    counts = {}
    
    for x in dictIn:
        counts[x] = textIn.count(x)
        
        
    return counts

#### Testing

We test this using the strings and character dictionaries defined earlier:

In [27]:
print(testText)
print(testGophers)

This is a test text file
go go gophers


In [28]:
print(testDict)
print(testDict2)

['f', 'h', 'T', 'i', 'e', 's', ' ', 'l', 't', 'x', 'a']
['h', 'g', 'e', ' ', 's', 'o', 'r', 'p']


In [29]:
testFreqs = freqCTR(testDict, testText)
print(testFreqs)

{'f': 1, 'h': 1, 'T': 1, 'i': 3, 'e': 3, 's': 3, ' ': 5, 'l': 1, 't': 4, 'x': 1, 'a': 1}
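One caveat applies when **freqCTR** is used with N-Grams rather than single characters: *str.count* counts non-overlapping occurrences, so it can undercount repeated grams. If overlapping counts are wanted, a sliding window fed to *collections.Counter* is one option. **overlappingFreqs** is a hypothetical helper, and whether overlapping counts, non-overlapping matches, or counts over consecutive chunks suit the later encoding depends on how the text is split into grams.

```python
from collections import Counter

def overlappingFreqs(textIn, n):
    # Slide a window of width n over the text, one character at a time
    return Counter(textIn[i:i + n] for i in range(len(textIn) - n + 1))

text = "aaaa"
print(text.count("aa"))                 # 2 (non-overlapping matches)
print(overlappingFreqs(text, 2)["aa"])  # 3 (windows starting at 0, 1 and 2)
```

For single characters (n = 1) the two approaches agree, so the counts printed above for **testText** are unaffected.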


Before continuing with subroutines and functions, we need to define an object class for Huffman Nodes (*nodes in our Huffman tree(s)*).

### Huffman Nodes

