## Exercises

### Exercise 11.1

Write a program that reads the contents of the file `blakepoems.txt`, splits it into words (where everything that is not a letter is considered a word boundary), and case-insensitively builds a dictionary that stores how often each word occurs in the text. Then print all the words with their counts in alphabetical order.

In [2]:
# Counting words in blakepoems.txt.
#fp = open( "blakepoems.txt")



### Solution: Exercise 11.1

In [5]:
import re

def word_counter(filepath):
    
    '''
    word_counter(filepath):

    reads a file and returns a dictionary of 
    words as keys and 
    word counts as values
    sorted alphabetically
    
    arguments:
    filepath: path to the text file to read
    
    '''
    
    # initialise an empty dictionary
    word_dict = {}
    
    # open file
    with open(filepath) as file:
        
        # read all lines of the file into a list
        lines = file.readlines()
        
        # iterate through each line
        for line in lines:
            
            # find all runs of word characters in the line
            # (note: \w also matches digits and underscores, not only letters)
            words = re.findall(r"\w+", line)
            
            # lower-case each word so that counting is case-insensitive
            for word in words:
                word = word.lower()
                
                # if word is already in the dictionary of words, add 1
                if word in word_dict:
                    word_dict[word] += 1
                    
                # otherwise, add word to dict and initialise to 1
                else:
                    word_dict[word] = 1

    # return a new dictionary with the words in alphabetical order
    return dict(sorted(word_dict.items(), key=lambda item: item[0]))

In [6]:
word_counter('./text-files/blakepoems.txt')

{'1780': 1,
 '1789': 1,
 'a': 128,
 'about': 1,
 'abroad': 1,
 'abstract': 1,
 'ache': 1,
 'admired': 1,
 'adona': 1,
 'afar': 1,
 'affright': 1,
 'after': 2,
 'again': 3,
 'against': 1,
 'age': 2,
 'aged': 1,
 'agree': 1,
 'ah': 7,
 'air': 6,
 'airy': 1,
 'alas': 1,
 'albion': 1,
 'ale': 1,
 'alehouse': 1,
 'all': 39,
 'allay': 1,
 'alone': 1,
 'altar': 1,
 'always': 1,
 'am': 16,
 'ambush': 1,
 'among': 8,
 'an': 21,
 'ancient': 3,
 'and': 348,
 'angel': 9,
 'angels': 2,
 'angry': 2,
 'annoy': 1,
 'another': 8,
 'answer': 2,
 'answerd': 3,
 'answered': 2,
 'anvil': 1,
 'anxious': 1,
 'any': 3,
 'appall': 1,
 'appals': 1,
 'apparel': 1,
 'appear': 2,
 'appeared': 1,
 'appears': 1,
 'appendix': 1,
 'apple': 1,
 'are': 25,
 'arise': 9,
 'arm': 2,
 'armed': 4,
 'arms': 1,
 'arose': 1,
 'around': 5,
 'arrow': 1,
 'arrows': 1,
 'art': 6,
 'artful': 1,
 'as': 16,
 'ask': 2,
 'asleep': 2,
 'aspire': 2,
 'astonish': 1,
 'at': 7,
 'author': 1,
 'away': 24,
 'awoke': 1,
 'babe': 2,
 'babes': 2,
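
The exercise treats everything that is not a letter as a word boundary, whereas `\w+` also matches digits and underscores, which is why entries such as `1780` and `1789` appear in the output above. The following is a minimal sketch of a stricter variant, assuming ASCII letters are sufficient for this text. The function name `count_letter_words` and the sample string are hypothetical and only serve as illustration.

```python
import re
from collections import Counter

def count_letter_words(text):
    """Count words case-insensitively, treating every non-letter as a boundary."""
    # On lower-cased text, [a-z]+ matches only runs of ASCII letters,
    # so digits, underscores and punctuation all act as separators.
    words = re.findall(r"[a-z]+", text.lower())
    # Sorting the (word, count) pairs orders the result alphabetically by word.
    return dict(sorted(Counter(words).items()))

# Illustrative sample text (an assumption, not the contents of blakepoems.txt)
sample = "Poems by William Blake, 1789. The Lamb; the lamb!"
for word, count in count_letter_words(sample).items():
    print(word, count)
```

With this pattern the year in the sample is dropped, while `The` and `the` are counted together. Accented letters would need a different character class.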

### Exercise 11.2

Do the same thing as in the previous exercise, but now process the text line by line. This is what you would have to do when processing a very long text (as is the case here).

### Solution: Exercise 11.2

In [120]:
from collections import Counter
import re

def word_counter_by_line(filepath):
    
    '''
    word_counter_by_line(filepath):

    reads a file line by line and returns a dictionary of 
    line labels ('line 1', 'line 2', ...) as keys and 
    per-line dictionaries of word counts as values
    (empty lines are skipped)
    
    arguments:
    filepath: path to the text file to read
    
    '''
    
    # initialise an empty dictionary
    word_dict_by_line = {}
    
    # open file
    with open(filepath) as file:
        
        # iterate over the file object, which yields one line at a time;
        # enumerate supplies the 1-based line number, which stays correct
        # even when the same line occurs more than once in the file
        for line_number, line in enumerate(file, start=1):
            
            # skip empty lines
            if line != '\n':
                
                # find all runs of word characters in the line
                words = re.findall(r"\w+", line)

                # count how often each word occurs in this line
                word_counts = Counter(words)
                
                # store this line's counts under its line label
                word_dict_by_line[f'line {line_number}'] = dict(word_counts)

    # return the per-line word counts, in file order
    return word_dict_by_line
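
The solution above keeps a separate dictionary per line and preserves the original capitalisation. If the aim is the same overall, case-insensitive count as in Exercise 11.1, one possible sketch is to iterate over the file object so that only one line is held in memory at a time. The function name `count_words_streaming` is hypothetical, the letters-only pattern `[a-z]+` is an assumption based on the exercise text, and the demo writes a small temporary file rather than relying on `blakepoems.txt` being present.

```python
import os
import re
import tempfile
from collections import Counter

def count_words_streaming(filepath):
    """Return overall lower-cased word counts, reading one line at a time."""
    counts = Counter()
    with open(filepath) as file:
        # The file object yields lines lazily, so the whole text
        # never has to be loaded into memory at once.
        for line in file:
            counts.update(re.findall(r"[a-z]+", line.lower()))
    # Sort the (word, count) pairs alphabetically by word.
    return dict(sorted(counts.items()))

# Demo on a throwaway file with two illustrative lines
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Piping down the valleys wild,\nPiping songs of pleasant glee,\n")
    demo_path = tmp.name

demo_counts = count_words_streaming(demo_path)
os.remove(demo_path)

for word, count in demo_counts.items():
    print(word, count)
```

`Counter.update` adds the counts from each line to the running total, so no explicit membership test is needed.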
            

In [121]:
word_counter_by_line('./text-files/blakepoems.txt')

{'line 1': {'Poems': 1, 'by': 1, 'William': 1, 'Blake': 1, '1789': 1},
 'line 3': {'You': 1, 'can': 1, 'do': 1, 'this': 1},
 'line 6': {'SONGS': 1, 'OF': 2, 'INNOCENCE': 1, 'AND': 1, 'EXPERIENCE': 1},
 'line 7': {'and': 1, 'THE': 1, 'BOOK': 1, 'of': 1, 'THEL': 1},
 'line 10': {'SONGS': 1, 'OF': 1, 'INNOCENCE': 1},
 'line 13': {'INTRODUCTION': 1},
 'line 15': {'Piping': 1, 'down': 1, 'the': 1, 'valleys': 1, 'wild': 1},
 'line 16': {'Piping': 1, 'songs': 1, 'of': 1, 'pleasant': 1, 'glee': 1},
 'line 17': {'On': 1, 'a': 2, 'cloud': 1, 'I': 1, 'saw': 1, 'child': 1},
 'line 18': {'And': 1, 'he': 1, 'laughing': 1, 'said': 1, 'to': 1, 'me': 1},
 'line 20': {'Pipe': 1, 'a': 2, 'song': 1, 'about': 1, 'Lamb': 1},
 'line 21': {'So': 1, 'I': 1, 'piped': 1, 'with': 1, 'merry': 1, 'cheer': 1},
 'line 22': {'Piper': 1, 'pipe': 1, 'that': 1, 'song': 1, 'again': 1},
 'line 23': {'So': 1,
  'I': 1,
  'piped': 1,
  'he': 1,
  'wept': 1,
  'to': 1,
  'hear': 1},
 'line 25': {'Drop': 1, 'thy': 2, 'pipe': 2