# Assignment 1

## Brief
This Jupyter notebook contains a small program that reads a plain text file and searches it for one or more words. It counts the number of times each word appears in the file and reports the counts.

### Functions

#### Helper functions
To aid readability, there are some helper functions defined. These reduce repetition within the code and keep the notebook neat. The two helper functions are:

    1. input_checks(input1, input2)

This function has two parameters, input1 and input2, and checks the type of each argument given in the function call. input1 must be a string. input2 must be either a string or a list in which every item is a string. If a check fails, the function returns an error message; otherwise it returns None.

    2. print_dict_as_table(data)

This function has one parameter, which must be of type dictionary. The function sums all the values in the dictionary and stores the result in a variable. It then prints the dictionary as a table, where string values are left-aligned and integer values are right-aligned. The table has a header row containing 'WORD' and 'COUNT', and a summary row containing 'TOTAL' and the summed count, that is, the total number of times the words in the dictionary appear in the text.

Within this function are two smaller functions, which aim to reduce repetition of code. These are print_dash_row() and print_kv_rows(dict_input). 
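
The alignment rule described above can be sketched with nested replacement fields in `str.format`, where the column width and the alignment character are passed as arguments. This is a minimal illustration rather than the notebook's code: the helper name `format_row` and the sample widths are assumptions, and it returns the row as a string, whereas `print_kv_rows` prints rows directly.

```python
def format_row(key, value, key_width, value_width):
    """Hypothetical helper: build one table row as a string."""
    # Integers are right-aligned (">"); anything else is left-aligned ("<")
    align = ">" if isinstance(value, int) else "<"
    # Nested {} fields are filled in order: key, key_width, value, align, value_width
    return "| {:<{}} | {:{}{}} |".format(key, key_width, value, align, value_width)


# Sample widths chosen for illustration only
print(format_row("WORD", "COUNT", 9, 5))
print(format_row("Elizabeth", 12, 9, 5))
```

Supplying the widths as arguments keeps every row the same length, so the column borders line up whatever the longest key or value is.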

## Helper Functions

In [1]:
# Import necessary libraries
import re

In [2]:
# Define a function to check types of inputs 
def input_checks(input1, input2):

    # Check first input is a string
    if not isinstance(input1, str):
        return "First input must be of type string."
        
    # Check second input is a string, or list of strings
    elif not (isinstance(input2, str) or
        (isinstance(input2, list) and all(isinstance(item, str) for item in input2))):
        return "Second input must either be a string, or a list of strings."

    # If both inputs are of the necessary type, the check is complete
    else:
        return None 
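
The rule applied to the second argument can be tried in isolation. The sketch below is a minimal illustration, not the notebook's function: the helper name `is_string_or_list_of_strings` is hypothetical, and it returns a boolean instead of an error message.

```python
def is_string_or_list_of_strings(value):
    """Hypothetical helper: True for a str, or for a list whose items are all str."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


print(is_string_or_list_of_strings("the"))             # True
print(is_string_or_list_of_strings(["Jane", "Mary"]))  # True
print(is_string_or_list_of_strings(["Jane", 3]))       # False
print(is_string_or_list_of_strings([]))                # True
```

The last call shows that an empty list passes the check, because `all()` over an empty iterable is `True`.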


    
    

In [3]:
# Define a function that takes a dictionary as the argument, and prints out a table
def print_dict_as_table(data):

    # Calculate total of values from dictionary
    total = sum(data.values())

    # Calculate the column widths from the longest key and value, compared against the header and total rows
    key_col_width = max([len(str(item)) for item in data.keys()] + [len("TOTAL")])
    value_col_width = max([len(str(item)) for item in data.values()] + [len("COUNT"), len(str(total))])

    # Create header and total
    header_row = {"WORD": "COUNT"}
    total_row = {"TOTAL": total}

    # Define a function to generate a dashed row in the table
    def print_dash_row():
        print("|{:-<{}}|{:-<{}}|".format("", key_col_width + 2, "", value_col_width + 2))

    # Define a function to print the search terms
    def print_kv_rows(dict_input):
        for key, value in dict_input.items():
            # If both the keys and values of the dictionary are strings, left align both
            if (isinstance(key, str) and isinstance(value, (str))):
                print("| {:<{}} | {:<{}} |".format(key, key_col_width, value, value_col_width))
            # If key is a string and values are integers, left align strings and right align integers    
            elif (isinstance(key, str) and (isinstance(value, int))):
                print("| {:<{}} | {:>{}} |".format(key, key_col_width, value, value_col_width))
            else:
                # Raise rather than return, because the callers below ignore the return value
                raise TypeError("Dictionary contains incorrect data types. Keys must be of type string. Values must be of either type string or integer.")
                
    # Do header first
    print_dash_row()
    print_kv_rows(header_row)
    print_dash_row()

    # Search terms
    print_kv_rows(data)
    print_dash_row()

    # Total
    print_kv_rows(total_row)        
    print_dash_row()


    

### Main Function

    1. word_count_summary_case(file_path, search_terms)

'word_count_summary_case' takes two inputs. The first, 'file_path', is a string giving the path of the file to read into the program. The second, 'search_terms', is either a string or a list of strings. These are the words to count within the file that has been read in. Matching is case-sensitive.

The function first checks that the inputs are of the correct type, using the function input_checks(). It then opens the file specified by file_path, stores its contents as a string, and finds all words within this string, saving each individual word into a list.
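
The word-splitting step, and the counting that follows it, can be sketched on an in-memory string instead of a file. The helper name `count_terms` and the sample sentence are assumptions made for illustration.

```python
import re


def count_terms(text, search_terms):
    """Hypothetical helper: count exact, case-sensitive matches of each term in text."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped
    words = re.findall(r"\w+", text)
    counts = {term: 0 for term in search_terms}
    for word in words:
        if word in counts:
            counts[word] += 1
    return counts


sample = "Jane met Elizabeth. Elizabeth greeted jane."
print(count_terms(sample, ["Jane", "Elizabeth", "Mary"]))
# {'Jane': 1, 'Elizabeth': 2, 'Mary': 0}
```

Because the comparison is exact, the lowercase 'jane' is not counted. The same pattern also splits a word such as "don't" into 'don' and 't', since an apostrophe is not matched by `\w`.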

The function handles cases when 'search_terms' is a string or a list of strings separately. If 'search_terms' is a single string...

In [4]:

# Define a function that takes two inputs: a file path, and a string or list of strings.
def word_count_summary_case(file_path, search_terms):

    # Check inputs are of the correct type, and return the error message if a check fails
    error = input_checks(file_path, search_terms)
    if error is not None:
        return error
    
    # Open the file and read its contents (case is preserved, so matching is case-sensitive)
    with open(file_path, "r") as file:
        text = file.read()
    
    # Find all words, and store as a list
    words = re.findall(r"\w+", text)

    # We will handle single strings differently to a list of strings.
    # If search terms is a single string.
    if isinstance(search_terms, str):
        count = 0
        for word in words:
            if search_terms == word:
                count += 1
        return "The word '" + search_terms + "' appears " + str(count) + " times."

    # Create a dictionary by setting the keys to the terms in search_terms and initialise values to 0.
    aggregates = {term: 0 for term in search_terms}

    # Use a for loop to loop through the words from the file, and check them against the keys in the dictionary.
    # If the word from the file is present in the keys from the dictionary, corresponding value is increased by 1. 
    # In this way the values act as a counter for each key.
    for word in words:
        if word in aggregates:
            aggregates[word] += 1
            
    return print_dict_as_table(aggregates)
    

## Testing

In [9]:
word_count_summary_case("CS5901_topic_1/resources/pride-and-prejudice.txt", ["Jane", "Elizabeth", "Mary", "Late"])
word_count_summary_case("CS5901_topic_1/resources/pride-and-prejudice.txt", "the") 

FileNotFoundError: [Errno 2] No such file or directory: 'CS5901_topic_1/resources/pride-and-prejudice.txt'
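
The FileNotFoundError above is what `open()` raises when a relative path does not exist under the current working directory. One way to check this before calling `word_count_summary_case` is sketched below. The helper name `resolve_from_cwd` is hypothetical, and the sketch assumes the path is given relative to the directory the notebook is running in.

```python
from pathlib import Path


def resolve_from_cwd(relative_path):
    """Hypothetical helper: return the absolute path open() would look up, and whether it is a file."""
    candidate = Path.cwd() / relative_path
    return candidate, candidate.is_file()


candidate, found = resolve_from_cwd("CS5901_topic_1/resources/pride-and-prejudice.txt")
print(candidate)
print("found" if found else "not found from this working directory")
```

If the file is reported as not found, the printed absolute path shows where the notebook is looking, which usually makes clear whether the relative path or the working directory needs to change.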

In [None]:
word_count_summary_case("resources/a-tale-of-two-cities.txt", ["London", "Paris"])