# Words

## The Paragraph

The example paragraph from the talk's introduction comes from near the very end of the short story. The code below normalizes the text by replacing every non-letter character with a space and lowercasing the result, then splits it into words and prints the word count and the sorted word list. It then tallies the words in a dictionary and prints the number of unique words along with each word's frequency.

In [48]:
import re

paragraph = """
Rainsford had hardly tumbled to the ground when the pack took up the cry again. 
'Nerve, nerve, nerve!' he panted, as he dashed along. A blue gap showed between 
the trees dead ahead. Ever nearer drew the hounds. Rainsford forced himself on 
toward that gap. He reached it. It was the shore of the sea. Across a cove he 
could see the gloomy gray stone of the chateau. Twenty feet below him the sea 
rumbled and hissed. Rainsford hesitated. He heard the hounds. Then he leaped far 
out into the sea."""

# Just the words, sorted
p_words = re.sub("[^a-zA-Z]"," ", paragraph).lower().split()
print(len(p_words))
print(sorted(p_words))

# Word frequencies & uniques
p_dict = {}
for word in p_words:
    # Start unseen words at 0, then add one for this occurrence
    p_dict[word] = p_dict.get(word, 0) + 1

frequencies = list(p_dict.items())
print(len(frequencies))
print(sorted(frequencies))

91
['a', 'a', 'across', 'again', 'ahead', 'along', 'and', 'as', 'below', 'between', 'blue', 'chateau', 'could', 'cove', 'cry', 'dashed', 'dead', 'drew', 'ever', 'far', 'feet', 'forced', 'gap', 'gap', 'gloomy', 'gray', 'ground', 'had', 'hardly', 'he', 'he', 'he', 'he', 'he', 'he', 'heard', 'hesitated', 'him', 'himself', 'hissed', 'hounds', 'hounds', 'into', 'it', 'it', 'leaped', 'nearer', 'nerve', 'nerve', 'nerve', 'of', 'of', 'on', 'out', 'pack', 'panted', 'rainsford', 'rainsford', 'rainsford', 'reached', 'rumbled', 'sea', 'sea', 'sea', 'see', 'shore', 'showed', 'stone', 'that', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'the', 'then', 'to', 'took', 'toward', 'trees', 'tumbled', 'twenty', 'up', 'was', 'when']
64
[('a', 2), ('across', 1), ('again', 1), ('ahead', 1), ('along', 1), ('and', 1), ('as', 1), ('below', 1), ('between', 1), ('blue', 1), ('chateau', 1), ('could', 1), ('cove', 1), ('cry', 1), ('dashed', 1), ('dead', 1), ('drew', 1), ('ever', 1), (
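
The same tally can also be sketched with `collections.Counter` from the standard library. This is an alternative to the dictionary loop above, not the code used in the talk; the helper name `word_frequencies` is made up for illustration, and the sample reuses two sentences from the paragraph.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Replace non-letters with spaces, lowercase, split, and count each word."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return Counter(words)


sample = "He reached it. It was the shore of the sea."
counts = word_frequencies(sample)

print(sum(counts.values()))    # total number of words
print(len(counts))             # number of unique words
print(counts.most_common(3))   # the three most frequent words
```

A `Counter` behaves like a dictionary, so `sorted(counts.items())` gives the same kind of alphabetical listing as the cell above.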

## The Story

In [63]:
# Ingest and listify the complete text of the story
with open('texts/most_dangerous_game.txt', 'r') as f:
    mdg = f.read()
# Keep apostrophes so contractions like "isn't" survive as single words
words = re.sub("[^'a-zA-Z]", " ", mdg).lower().split()

# Get a word count 
# & check to see how things look by printing first 100 words
print("The text is {} words long and made up of {} unique word forms.".format(
    len(words), len(set(words))))
print(words[0:100])

The text is 8017 words long and made up of 1947 unique word forms.
['off', 'there', 'to', 'the', 'right', 'somewhere', 'is', 'a', 'large', 'island', 'said', 'whitney', "it's", 'rather', 'a', 'mystery', 'what', 'island', 'is', 'it', 'rainsford', 'asked', 'the', 'old', 'charts', 'call', 'it', "'ship", 'trap', 'island', "'", 'whitney', 'replied', 'a', 'suggestive', 'name', "isn't", 'it', 'sailors', 'have', 'a', 'curious', 'dread', 'of', 'the', 'place', 'i', "don't", 'know', 'why', 'some', 'superstition', "can't", 'see', 'it', 'remarked', 'rainsford', 'trying', 'to', 'peer', 'through', 'the', 'dank', 'tropical', 'night', 'that', 'was', 'palpable', 'as', 'it', 'pressed', 'its', 'thick', 'warm', 'blackness', 'in', 'upon', 'the', 'yacht', "you've", 'good', 'eyes', 'said', 'whitney', 'with', 'a', 'laugh', 'and', "i've", 'seen', 'you', 'pick', 'off', 'a', 'moose', 'moving', 'in', 'the', 'brown', 'fall']
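
Because the pattern keeps every apostrophe, single quotation marks end up in the word list too: the output above contains `"'ship"` and a bare `"'"`. One possible refinement, sketched here under the assumption that only apostrophes between letters should count as part of a word, is to match words directly instead of stripping characters. The names `WORD_RE` and `tokenize` are hypothetical, the sample line is an approximate reconstruction of the story's punctuation, and applying this to the full text would change the counts reported above.

```python
import re

# Hypothetical helper, not part of the original notebook.
# A word is a run of letters, optionally followed by apostrophe + letters
# (so "isn't" stays whole, but a leading or trailing quote mark is dropped).
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(text):
    """Return lowercase words, keeping only word-internal apostrophes."""
    return WORD_RE.findall(text.lower())


# Approximate reconstruction of a line from the story, for illustration only
sample = "call it 'Ship-Trap Island,' Whitney replied. A suggestive name, isn't it?"
tokens = tokenize(sample)
print(tokens)
```

The trade-off is that a trailing possessive apostrophe (as in a plural possessive) is dropped as well, which may or may not matter for a frequency count.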
