# 1. Improving our `kwic` function
Our `kwic` code has at least one annoying bug left: whenever you run *any* KWIC analysis, `get_kwics` prints one extra result after the search has run out of matches, that is, when `loc == -1`.

Modify our `kwic` code so that it does *not* print a KWIC result when `loc == -1`.

If the target word never appears in the text, `print` a message telling the user that there are no instances of their target word in the text.

In [11]:
def find_next(word, text, loc = 0):
    return text.lower().find(word, loc) # this returns the character position of the next instance

In [12]:
def kwic(loc, text, window = 75):
    mn = 0
    mx = len(text)
    start = loc - window
    stop = loc + window
    
    if start < mn:
        start = mn
        stop = window
    if stop > mx:
        start = mx - window
        stop = mx
        
    return text[start:stop]

In [13]:
def get_kwics(word, text, loc = 0):
    while loc != -1:
        loc = find_next(word, text, loc + 1)
        print('word:', word)
        print('loc:', loc)
        print('kwic:')
        print(kwic(loc, text))
        print('-'*50)

In [15]:
whitman = '''I SING the Body electric;
The armies of those I love engirth me, and I engirth them;
They will not let me off till I go with them, respond to them,
And discorrupt them, and charge them full with the charge of the Soul.'''

In [16]:
get_kwics('butter', whitman)

word: butter
loc: -1
kwic:
I SING the Body electric;
The armies of those I love engirth me, and I engi
--------------------------------------------------
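
The extra entry appears because the `while` condition is checked *before* `find_next` runs, so the loop body still prints once after `find_next` has returned `-1`. One possible restructuring, offered as a sketch rather than the only answer, is to search first and then test the result before printing. The name `get_kwics_fixed`, the `found` counter, and the shortened `kwic` stand-in are assumptions introduced here for illustration. The sketch also starts its first search at `loc` instead of `loc + 1`, so a match at position 0 is not skipped.

```python
def find_next(word, text, loc=0):
    # Same helper as above: position of the next match, or -1 if there is none.
    return text.lower().find(word, loc)


def kwic(loc, text, window=75):
    # Simplified stand-in for the kwic function above (an assumption made
    # to keep this sketch short): clamp the start of the slice at 0.
    return text[max(loc - window, 0):loc + window]


def get_kwics_fixed(word, text, loc=0):
    found = 0  # number of matches printed so far
    loc = find_next(word, text, loc)
    while loc != -1:  # the body only runs for a real match
        found += 1
        print('word:', word)
        print('loc:', loc)
        print('kwic:')
        print(kwic(loc, text))
        print('-' * 50)
        loc = find_next(word, text, loc + 1)  # search again before re-checking
    if found == 0:
        print('No instances of', repr(word), 'were found in the text.')
    return found
```

Returning the count is a design choice of this sketch, not something the assignment requires; it makes it easy to check how many contexts were printed.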


# Optional challenge
How could you rewrite `find_next`, `kwic`, and `get_kwics` as two or fewer functions? You may answer in pseudocode or code.

# 2. Textual analysis with `get_kwics`

1. Pick a text of your choosing from Project Gutenberg or elsewhere.
2. Save the `utf-8` file to your computer with the extension `.txt`.
3. Using your improved `get_kwics` function, print every instance of a word that interests you in the text. For example, if you're looking at "Sherlock Holmes," the word "clue" might be interesting.
4. Read each instance of your target word in your `get_kwics` results.
5. Write a few sentences describing the different *contexts* in which your word appears.
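
Steps 2 and 3 could be wired together roughly as follows. This is a minimal sketch: `load_text` is a hypothetical helper, and the temporary `sample.txt` file and its one-line contents are placeholders that exist only so the cell runs on its own. In practice you would pass the path of the `.txt` file you saved.

```python
import tempfile
from pathlib import Path


def load_text(path):
    # Read the whole file into a single string, decoding it as UTF-8.
    return Path(path).read_text(encoding='utf-8')


# Placeholder demonstration: write a tiny file to a temporary folder,
# then load it back the same way you would load your own text.
with tempfile.TemporaryDirectory() as folder:
    sample_path = Path(folder) / 'sample.txt'
    sample_path.write_text('A clue! Holmes found another clue.', encoding='utf-8')
    text = load_text(sample_path)

print(len(text), 'characters loaded')
```

Once `text` holds your own file's contents, passing it to your improved `get_kwics` along with a target word prints each context in turn.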

# 3. Repeating Piper's punctuation analysis on other texts

In *Enumerations*, Piper counted the number of punctuation marks per poem over the 19th and 20th centuries. We're going to use the same idea to compare two texts:

1. Download one additional text of your choosing to compare to your text from question 2. It might be another text by the same author, from the same period or genre, or one chosen for any other reason that makes the comparison interesting.
2. Write a function that counts all of the instances of the punctuation marks `.`, `,`, and `?` in any text. The function should `print` your results.
3. Scale the total number of each of those punctuation marks by the total number of words in the text. (Remember, we discussed how to count the total number of words in a string using `split`.)
4. Write a few sentences about the differences you observe between the two texts. Do they use these punctuation marks at similar rates? How are they different? This answer will depend entirely on the texts you choose.

**Here is a hint for how to scale your texts:**

In [25]:
'I don\'t have strong opinions about punctuation. Oh, wait, perhaps I do?'

"I don't have strong opinions about punctuation. Oh, wait, perhaps I do?"

I count 2 `,` and 12 words in the string above. So the scaled value measuring the rate at which commas occur would be:

In [22]:
2/12

0.16666666666666666

The `float` above represents the *number of commas per word* in that example string. We want a similar measure for each of the three punctuation marks identified above: `.`, `,`, and `?`
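
The hint's arithmetic can be reproduced with `str.count` and `split`. This is a minimal sketch, assuming that a "word" is whatever `split` separates on whitespace; the variable names are introduced here for illustration. It handles only the comma and leaves the general function for you to write.

```python
sample = "I don't have strong opinions about punctuation. Oh, wait, perhaps I do?"

n_commas = sample.count(',')     # number of commas in the string
n_words = len(sample.split())    # split() with no arguments breaks on whitespace

comma_rate = n_commas / n_words  # commas per word
print(n_commas, n_words, comma_rate)
```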

# 4. Reflection
This week we discussed the relationship between our data, our models, our corpora, and our results. Piper, as well as Arnold and Tilton, described this relationship as an *iterative process*.

Reflect on the iterative process that led to your improved `kwic` function above. In what other ways could you still improve it? How did we discover its different properties and behaviors? What would we need to change about it in order for it to be more useful for analyzing texts?

*Write your answer in this markdown cell*