# Programming Assignment 8

## Objectives 

**The purpose of this assignment is to:**

- Practice opening a text file, reading it in, and manipulating its contents
- Use exception handling to handle `FileNotFoundError` exceptions
- Practice using a dictionary

### Using this Notebook

Notice that some of the cells cannot be edited. They are meant only to be run, to test your code and perform the autograding. Begin at the top and execute each code cell in order, adding your code wherever you see `# YOUR CODE HERE`. If you have written the functions correctly, you should see no errors.

When you execute the TEST CASES cell, what you want to see is no output.  You may want to intentionally make the function give a wrong result, so that you can see what the error message looks like – and then fix your function.  Once you execute the TEST CASES cell, and get no output, then your function is most likely correct!
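
As a minimal sketch of why a passing test cell prints nothing, the example below uses a hypothetical function `add_one` that is not part of this assignment. An `assert` statement is silent when its condition is true and raises an `AssertionError` carrying the message when the condition is false.

```python
def add_one(x):
    # Hypothetical function, used only to illustrate how assert behaves.
    return x + 1

# Passing assertion: nothing is printed.
assert add_one(1) == 2, "Not quite. add_one(1) should be 2."

# Failing assertion: caught here so the message can be shown
# without stopping the rest of the cell.
try:
    assert add_one(1) == 3, "Not quite. add_one(1) should be 3."
except AssertionError as error:
    failure_message = str(error)
    print(failure_message)
```

In the graded cells the `AssertionError` is not caught, so a failing test shows up as an error traceback ending with the message.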


## Assignment Instructions

Write a program that will process a file and count the number of occurrences of each word, storing the results in a dictionary. As a sample file, we will use the plain text version of Shakespeare's Macbeth, available on the Project Gutenberg website at https://www.gutenberg.org/cache/epub/2264/pg2264.txt. This file is provided in the Jupyter Lab environment as `Macbeth.txt`.

### Function 1: process_file(filename) 
Write a function that will open a text file, read each line, and add all words on that line to a dictionary, and finally return the dictionary. 
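
The file-handling half of this function can be tried on its own before any counting is added. The sketch below is one possible shape, not the required solution: the name `print_lines` and its `True`/`False` return value are assumptions made for illustration. It opens the file inside `try`/`except`, prints each line, and prints a message instead of crashing when the file is missing.

```python
def print_lines(filename):
    """Print every line of a text file.

    Returns True on success and False if the file does not exist.
    (The name and the return value are illustrative assumptions.)
    """
    try:
        with open(filename, 'r') as file:
            for line in file:
                # Remove the trailing newline so print() does not double-space.
                print(line.rstrip('\n'))
        return True
    except FileNotFoundError:
        print(f"There is no file named {filename} in this directory.")
        return False

# A filename that should not exist, to exercise the except branch.
print_lines('doesnotexist.txt')
```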

1) Use `try`/`except` and `with open(filename...)` to open the file, as demonstrated in the [Live Coding lecture from this week](https://www.coursera.org/learn/ball-state-univeristy-introduction-to-programming/item/jPzz4).  To start, use a for loop to print each line of the file, so that you know it is working.
    * Test your code to make sure your exception handling is working: if you use the wrong filename, do you get a printed error message instead of a crash?
    * If you type in the correct filename, is the file opened correctly?  
    

2) Now, modify the for loop from step 1.  Our overall goal is to take each line, split it into a list of words, and then count each word using a dictionary.  This can be accomplished by the following pseudocode.  Translate this to real Python.  (Hint: Look at the Live Coding: Dictionary Example from Module 7.)  

- **Hint:** you may want to put this pseudocode in your function, and gradually add each line. As you enter the code, be sure to test it frequently (and use the debugger to visualize what is happening!) 

3) Return the dictionary.
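
The counting idea in step 2 can be tried on a few in-memory lines before it is connected to a file. This is a minimal sketch under stated assumptions: the function name `count_words` and the sample lines are made up for illustration and are not part of the assignment.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = {}
    for line in lines:
        # split() with no arguments splits on any run of whitespace.
        for word in line.split():
            if word in counts:
                counts[word] += 1
            else:
                counts[word] = 1
    return counts

# Made-up sample lines; note that 'the' and 'The' are counted separately.
sample_lines = ["the cat and the hat", "The cat sat"]
print(count_words(sample_lines))
# {'the': 2, 'cat': 2, 'and': 1, 'hat': 1, 'The': 1, 'sat': 1}
```

Because a file object yields one line per loop iteration, the same nested loop works when `lines` is an open file.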

### Step 1: Implement the process_file function


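def process_file(filename):
    """Count how often each whitespace-separated word appears in a text file."""
    my_dict = {}
    my_list = []
    try:
        with open(filename, 'r') as file:
            # Collect every word from every line into one list.
            for each_line in file:
                my_list += each_line.split()
        # Tally each word; case and punctuation are left untouched.
        for each in my_list:
            if each in my_dict:
                my_dict[each] += 1
            else:
                my_dict[each] = 1
        return my_dict
    except FileNotFoundError:
        # The file could not be opened: return a message instead of crashing.
        return f"There is no file named {filename} in this directory."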

process_file('Macbeth.txt')

{'The': 117,
 'Tragedie': 1,
 'of': 314,
 'Macbeth': 26,
 'Actus': 5,
 'Primus.': 1,
 'Scoena': 1,
 'Prima.': 5,
 'Thunder': 2,
 'and': 375,
 'Lightning.': 1,
 'Enter': 73,
 'three': 13,
 'Witches.': 4,
 '1.': 21,
 'When': 21,
 'shall': 47,
 'we': 56,
 'meet': 5,
 'againe?': 2,
 'In': 28,
 'Thunder,': 1,
 'Lightning,': 1,
 'or': 25,
 'in': 167,
 'Raine?': 1,
 '2.': 12,
 'the': 528,
 "Hurley-burley's": 1,
 'done,': 7,
 "Battaile's": 1,
 'lost,': 3,
 'wonne': 1,
 '3.': 14,
 'That': 80,
 'will': 59,
 'be': 119,
 'ere': 13,
 'set': 7,
 'Sunne': 3,
 'Where': 20,
 'place?': 2,
 'Vpon': 11,
 'Heath': 2,
 'There': 12,
 'to': 310,
 'with': 133,
 'I': 314,
 'come,': 8,
 'Gray-Malkin': 1,
 'All.': 13,
 'Padock': 1,
 'calls': 2,
 'anon:': 1,
 'faire': 3,
 'is': 156,
 'foule,': 2,
 'foule': 2,
 'faire,': 1,
 'Houer': 1,
 'through': 6,
 'fogge': 1,
 'filthie': 2,
 'ayre.': 1,
 'Exeunt.': 29,
 'Scena': 22,
 'Secunda.': 5,
 'Alarum': 5,
 'within.': 7,
 'King,': 19,
 'Malcome,': 1,
 'Donalbaine,': 3,
 

### Step 2: Testing with the Main Program

The main program is provided for you, to help with testing your code.  Once your function is working, you may want to explore the data a bit, 
and see what other words are frequently (or infrequently) found! 
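
One way to explore the data is to sort the dictionary by count. The sketch below is only a suggestion: the helper name `most_common_words` is hypothetical, and the small sample dictionary reuses a few counts from the output shown in this notebook rather than reading the file.

```python
def most_common_words(word_counts, n=5):
    """Return the n (word, count) pairs with the highest counts."""
    # items() yields (word, count) pairs; sort them by count, largest first.
    ranked = sorted(word_counts.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

# A few entries copied from the Macbeth output, for illustration only.
sample_counts = {'the': 528, 'and': 375, 'of': 314, 'Tragedie': 1, 'Enter': 73}
for word, count in most_common_words(sample_counts, n=3):
    print(f'{word}: {count}')
```

Passing the dictionary returned by `process_file` instead of `sample_counts` would rank the words of the whole play.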

Notice that this approach treats “The” and “the” as different words, and also includes the punctuation with the word... if you want to experiment (optional!), you could try creating a separate function that would first change the words to lowercase, and remove the punctuation.  However, if you do this, be sure to leave the original function, as that is what the autograder is looking at! 
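
For the optional experiment, a separate helper could normalize each word before counting. This is a sketch under stated assumptions, not part of the graded work: the names `normalize_word` and `count_normalized` are hypothetical, and only punctuation at the start or end of a word is removed, so an internal apostrophe or hyphen is kept.

```python
import string

def normalize_word(word):
    """Lowercase a word and strip punctuation from both ends."""
    return word.lower().strip(string.punctuation)

def count_normalized(words):
    """Count words after normalization, skipping tokens that become empty."""
    counts = {}
    for word in words:
        cleaned = normalize_word(word)
        if cleaned:  # skip tokens that were only punctuation
            counts[cleaned] = counts.get(cleaned, 0) + 1
    return counts

print(count_normalized(['The', 'the', 'Thunder,', 'Thunder', 'againe?']))
# {'the': 2, 'thunder': 2, 'againe': 1}
```

Keeping this in its own function leaves `process_file` unchanged for the autograder.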

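def process_file(filename):
    """Count how often each whitespace-separated word appears in a text file."""
    my_dict = {}
    my_list = []
    try:
        with open(filename, 'r') as file:
            # Collect every word from every line into one list.
            for each_line in file:
                my_list += each_line.split()
        # Tally each word; case and punctuation are left untouched.
        for each in my_list:
            if each in my_dict:
                my_dict[each] += 1
            else:
                my_dict[each] = 1
        return my_dict
    except FileNotFoundError:
        # The file could not be opened: return a message instead of crashing.
        return "Not quite. Verify that your filename is accurate"


def main():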
    # Source: https://www.gutenberg.org/cache/epub/2264/pg2264.txt
    macbeth_filename = 'Macbeth.txt'

    # open and process file
    macbeth_dict = process_file(macbeth_filename)
    
    # find a few things
    print(f'Enter filename: {macbeth_filename}')
    print(f'The word "The" appears {macbeth_dict["The"]} times.')
    print(f'The word "the" appears {macbeth_dict["the"]} times.')
    print(f'The word "thee" appears {macbeth_dict["thee"]} times.')

    # print the Macbeth dictionary
    print('The entire dictionary is here:')
    print(macbeth_dict)

main()

Enter filename: Macbeth.txt
The word "The" appears 117 times.
The word "the" appears 528 times.
The word "thee" appears 43 times.
The entire dictionary is here:
{'The': 117, 'Tragedie': 1, 'of': 314, 'Macbeth': 26, 'Actus': 5, 'Primus.': 1, 'Scoena': 1, 'Prima.': 5, 'Thunder': 2, 'and': 375, 'Lightning.': 1, 'Enter': 73, 'three': 13, 'Witches.': 4, '1.': 21, 'When': 21, 'shall': 47, 'we': 56, 'meet': 5, 'againe?': 2, 'In': 28, 'Thunder,': 1, 'Lightning,': 1, 'or': 25, 'in': 167, 'Raine?': 1, '2.': 12, 'the': 528, "Hurley-burley's": 1, 'done,': 7, "Battaile's": 1, 'lost,': 3, 'wonne': 1, '3.': 14, 'That': 80, 'will': 59, 'be': 119, 'ere': 13, 'set': 7, 'Sunne': 3, 'Where': 20, 'place?': 2, 'Vpon': 11, 'Heath': 2, 'There': 12, 'to': 310, 'with': 133, 'I': 314, 'come,': 8, 'Gray-Malkin': 1, 'All.': 13, 'Padock': 1, 'calls': 2, 'anon:': 1, 'faire': 3, 'is': 156, 'foule,': 2, 'foule': 2, 'faire,': 1, 'Houer': 1, 'through': 6, 'fogge': 1, 'filthie': 2, 'ayre.': 1, 'Exeunt.': 29, 'Scena': 22

#### Sample Output 1
(filename was “doesnotexist.txt”) 

There is no file named `doesnotexist.txt` in this directory. 

#### Sample Output 2
(filename was “Macbeth.txt”) 

    Enter filename:  Macbeth.txt 
    The word "The" appears 117 times. 
    The word "the" appears 528 times. 
    The word "thee" appears 43 times. 
    The entire dictionary is here: 
    {'The': 117, 'Tragedie': 1, 'of': 314, 'Macbeth': 26, 'Actus': 5, 'Primus.': 1,  
    'Scoena': 1, 'Prima.': 5, 'Thunder': 2, 'and': 375, 'Lightning.': 1, 'Enter': 73,  
    'three': 13, 'Witches.': 4, '1.': 21, 'When': 21, 'shall': 47, 'we': 56,  
    'meet': 5, 'againe?': 2, 'In': 28, 'Thunder,': 1, 'Lightning,': 1, 'or': 25,  
    'in': 167, 'Raine?': 1, '2.': 12, 'the': 528, "Hurley-burley's": 1, 'done,': 7,  
    "Battaile's": 1, 'lost,': 3, 'wonne': 1, '3.': 14, 'That': 80, 'will': 59,  
    'be': 119, 'ere': 13, 'set': 7, 'Sunne': 3, 'Where': 20, 'place?': 2,  
    'Vpon': 11, 'Heath': 2, 

    .... <snip> for brevity -- the dictionary is many pages long! 

### Step 3: Graded Test Cases

### GRADED TEST CASE 1: Correct file is opened & dictionary is returned
filename = 'Macbeth.txt'
result = type(process_file(filename))

assert str(result) == "<class 'dict'>", "Not quite. Verify that you are returning a dictionary and have accessed a real file. "

### GRADED TEST CASE 2: Incorrect file is opened
errorCheck = bool

filename = 'Macbeth2.txt'
result = process_file(filename)

if "{'File not found', 'Error'}" in result:
    errorCheck = True
    
assert errorCheck, "Not quite. Verify that your filename is accurate"

### GRADED TEST CASE 3: Word grouping test
word_group = bool

filename = 'Macbeth.txt'
result = process_file(filename)

try:
    if result["The"] == 117:
        word_group = True
except:
    word_group = False

assert word_group == True, "Not quite, verify you've grouped words correctly in your dictionary."

### Step 4: Submit your Work for Grading

Congratulations on completing this assignment.

To receive a final score for your work, please select the "Submit Assignment" button at the top of your lab.