# "For" loops

Now that we have seen how to read files into Python and extract something meaningful from them (so far, TTR), it would be nice to go a little further. When doing corpus analysis, we would like to:

1. read a number of files in, one at a time
1. extract some value from each of them (TTR, MLU, etc.)
1. save those values in a separate location (e.g. a spreadsheet)

To tackle the first of these steps, we will need "for" loops.

We'll start by using a "for" loop to cycle through a list of words and find the mean word length.

## Start with a list

To keep it simple, we'll start with a single sentence, cleaned up and split into a list of words.

```python
import string

# our text to analyze
a = 'The farmer killed the duckling.'

# make all characters lower-case
a = a.lower()

# remove punctuation
punct = set(string.punctuation)
a = ''.join(x for x in a if x not in punct)

# split into a list of words; every occurrence is kept,
# so "the" appears twice (we want every word for the mean length)
a = a.split()
print(a)
```

```
['the', 'farmer', 'killed', 'the', 'duckling']
```
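
Whether repeated words are kept matters here. For the mean word length we want every word occurrence (tokens), including both copies of "the". Converting the list to a set, as you might do when counting unique words (types) for TTR, keeps only one copy of each word. A quick comparison, using the same word list typed out by hand:

```python
words = ['the', 'farmer', 'killed', 'the', 'duckling']

# every word occurrence (tokens)
print(len(words))       # 5

# unique words only (types): the repeated "the" collapses to one
print(len(set(words)))  # 4

# note: a set has no fixed order, so list(set(words)) may come back in any order
```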


## Finding the length of a word

We already know how to pick any particular word from the list by its position (counting from 0) and find its length with `len()`:

```python
b = a[0]
len(b)
```

```
3
```
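
Each position in the list is called an index. Indices start at 0, and negative indices count back from the end of the list. A small sketch using the same word list:

```python
words = ['the', 'farmer', 'killed', 'the', 'duckling']

first = words[0]    # 'the'
second = words[1]   # 'farmer'
last = words[-1]    # 'duckling' (negative indices count from the end)

print(first, len(first))  # the 3
print(last, len(last))    # duckling 8
```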

## Find all the word lengths

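Now let's use a "for" loop to cycle through all the words in our list. NOTE: the indentation is important! The lines inside the loop must be indented (conventionally by four spaces), and Python will yell at you with an `IndentationError` if you get it wrong!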

```python
for word in a:
    b = len(word)
    print(b)
```

```
3
6
6
3
8
```
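
To see what the indentation warning means in practice, here is a loop whose body is not indented. Rather than letting the error stop the notebook, this sketch passes the code as a string to Python's built-in `compile()` and catches the resulting `IndentationError`; the exact wording of the error message varies between Python versions.

```python
# the body of the loop (the print line) is missing its indentation
bad_code = "for word in ['the', 'farmer']:\nprint(len(word))\n"

try:
    compile(bad_code, '<example>', 'exec')
    caught = False
except IndentationError as err:
    caught = True
    print('IndentationError:', err.msg)

# the same loop with its body indented compiles without complaint
good_code = "for word in ['the', 'farmer']:\n    print(len(word))\n"
compile(good_code, '<example>', 'exec')
```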


## Adding context

We might want to add some context to these numbers, so we know what they mean.

```python
for word in a:
    b = len(word)
    print('"' + word + '"' + ' is ' + str(b) + ' letters long')
```

```
"the" is 3 letters long
"farmer" is 6 letters long
"killed" is 6 letters long
"the" is 3 letters long
"duckling" is 8 letters long
```
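
Joining pieces with `+` only works on strings, which is why the number has to be converted with `str()` first. If you are on Python 3.6 or newer, an f-string does that conversion for you. This is an alternative style, shown side by side with the approach used above:

```python
words = ['the', 'farmer', 'killed', 'the', 'duckling']

for word in words:
    b = len(word)
    # with "+", the number must be converted to text with str()
    with_plus = '"' + word + '"' + ' is ' + str(b) + ' letters long'
    # an f-string converts the values inside {} automatically
    with_fstring = f'"{word}" is {b} letters long'
    print(with_fstring)
```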


## Saving our data

In the examples above, each time the loop moves on to the next word, `b` is overwritten with the new length, so the length of the previous word is lost. If we want to calculate the mean length of these words, we need to keep all of these values. One way to do this is to set up an empty list before the loop, and fill it with the values that are calculated inside the loop.

```python
# set up an empty list, to be filled with word lengths
word_length_counter = []

for word in a:
    b = len(word)

    # use the "append" method to put the word-length values in our previously empty counter variable
    word_length_counter.append(b)

print(word_length_counter)
```

```
[3, 6, 6, 3, 8]
```
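
Starting with an empty list and calling `append()` inside a loop is a very common pattern. Python also offers a compact shorthand, the list comprehension, which builds the same list in a single line. It is shown here only for comparison; the loop version is easier to extend when the body grows:

```python
words = ['the', 'farmer', 'killed', 'the', 'duckling']

# the loop-and-append version
word_length_counter = []
for word in words:
    word_length_counter.append(len(word))

# the same list built with a list comprehension
word_lengths = [len(word) for word in words]

print(word_length_counter)  # [3, 6, 6, 3, 8]
print(word_lengths)         # [3, 6, 6, 3, 8]
```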


## Find the mean

Now we can do simple math: add up all the word-length values in `word_length_counter` with `sum()`, and divide by the number of items, which `len()` counts for us. Notice that we do not need to know in advance how many items are stored in `word_length_counter`.
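
Written out as a formula, with $\ell_i$ standing for the length of the $i$-th word and $n$ for the number of words (the value `len()` supplies), the mean for our sentence is:

$$\bar{\ell} = \frac{1}{n}\sum_{i=1}^{n} \ell_i = \frac{3 + 6 + 6 + 3 + 8}{5} = \frac{26}{5} = 5.2$$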

```python
av = sum(word_length_counter) / len(word_length_counter)
print(av)
```

```
5.2
```
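
Python's standard `statistics` module has a `mean()` function that does the same calculation, which makes a handy cross-check. One thing to watch when you move on to many files: dividing by `len()` raises a `ZeroDivisionError` if the list is empty (for example, if a file turned out to contain no words). The `safe_mean()` helper below is a hypothetical sketch of one way to guard against that:

```python
import statistics

word_length_counter = [3, 6, 6, 3, 8]

# the hand-written mean and the standard-library mean should agree
av = sum(word_length_counter) / len(word_length_counter)
print(av)                                    # 5.2
print(statistics.mean(word_length_counter))  # 5.2


def safe_mean(values):
    """Hypothetical helper: return the mean of values, or None if the list is empty."""
    if not values:
        return None
    return sum(values) / len(values)


print(safe_mean(word_length_counter))  # 5.2
print(safe_mean([]))                   # None
```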


# Quiz


1. Adapt the line above so that it outputs the sentence: "The mean length of words in the sentence 'The farmer killed the duckling' is 5.2"
1. Write a script that calculates:
    1. the mean word length in the Jane Austen novel "Pride and Prejudice"
    1. the total number of words in the Jane Austen novel "Pride and Prejudice"
1. [Here][austen] you can find six novels by Jane Austen. Write a script that cycles through all of them and calculates:
    1. the total number of words for each novel
    1. the mean length of words for each novel
    1. the "grand average" word length for a Jane Austen novel (i.e., the mean of all the individual word-length means calculated above)
    
#### HINT: you can nest many "for" loops inside of each other
#### HINT to the HINT: make sure each loop is indented properly; otherwise nothing will work
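
As a small illustration of nesting, the sketch below has an outer loop over a few sentences and an inner loop over the words of each sentence. The sentences are typed in by hand as hypothetical stand-ins for the texts you would read from files, and the variable names are made up for the example:

```python
import string

# hypothetical stand-ins for the texts you would read from files
sentences = [
    'The farmer killed the duckling.',
    'The duckling was small.',
]

punct = set(string.punctuation)
means = []

for sentence in sentences:              # outer loop: one sentence at a time
    text = ''.join(x for x in sentence.lower() if x not in punct)
    lengths = []
    for word in text.split():           # inner loop: one word at a time
        lengths.append(len(word))
    mean_length = sum(lengths) / len(lengths)
    means.append(mean_length)
    print(sentence, '->', mean_length)
```

Notice how the inner loop is indented one level further than the outer one, and how `lengths` is reset to an empty list at the start of each pass through the outer loop.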

[austen]: https://drive.google.com/open?id=0B4lOOWBOiL9jUXNYMWNBZkFROG8