For citation information, please see the "Source Information" section listed in the associated README file: https://github.com/stephbuon/digital-history/tree/master/hist3368-week2-critical-word-count

# Week 2 Assignment: For Loops Tutorial

In this week's assignment, you'll learn how to loop over lists of data.  You'll also start the process of thinking critically about which words matter to you for the purposes of text mining, and how to use a thesaurus and the powers of reason to expand your expert vocabulary and divide it into categories of information. 

We'll be looking at commands that tell Python to repeat a set of steps:

    take an item in a list
    do something to it 
    take the next item in the list
    do something to it
    repeat until all the items in the list have been touched.

This structure is called a "loop" because when Python reaches the end of the statements in the body, it "loops" back to the beginning of the body, and executes the same statements again (this time with the next item in the list).


The list comprehension syntax discussed earlier is very powerful: it allows you to succinctly transform one list into another list by a repeated modification. 

### Using the 'for'...'in' formula

The 'in' keyword is part of the grammar of every Python for loop.  'In' tells Python to iterate over the items in a list (or another sequence), one at a time.

The basic 'for loop' formula that we'll be using in this class is:

for [dummy variable] in [list]:
    [do something to the] [dummy variable]
    
What you should notice:
    
    * notice that the line begins with 'for'
    * note the use of 'in'
    * notice that the 'for' line closes with a colon -- ':' -- which is right next to the name of the list.
    * note that the name of the dummy variable called by 'for' is repeated inside the loop.
    
That may seem terribly abstract, so let's look at a hands-on example.

Let's start out with a list of words.

In [42]:
wordstring = ['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times']
print(wordstring)

['it', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times']


We can use a for loop to format wordstring in new ways.

In [43]:
for word in wordstring:
    print(word)

it
was
the
best
of
times
it
was
the
worst
of
times


In [44]:
for word in wordstring:
    print(word + '!!')

it!!
was!!
the!!
best!!
of!!
times!!
it!!
was!!
the!!
worst!!
of!!
times!!


You might be wondering where we got the 'word' in the formula 'for [blank] in [list].' This is important: **word** in wordstring could be anything.  'word' is just a dummy variable.  

In [45]:
for rutabaga in wordstring:
    print(rutabaga)

it
was
the
best
of
times
it
was
the
worst
of
times


What's important is consistency. Whatever you name a dummy variable, you must continue to use that same variable name **inside** the for loop.  Otherwise you'll be telling Python to do something very different.

In [46]:
for tyrannosaurus in wordstring:
    print(rutabaga)

times
times
times
times
times
times
times
times
times
times
times
times


In [47]:
for tyrannosaurus in wordstring:
    print(tyrannosaurus)

it
was
the
best
of
times
it
was
the
worst
of
times


*Can you see what is different between the two commands above?*
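One way to check your answer: a dummy variable does not disappear when its loop ends. It keeps the last value it was given. The sketch below illustrates this with a shortened, made-up list (`short_list` and `leftovers` are names invented for this illustration, not part of the tutorial's data).

```python
short_list = ['best', 'of', 'times']

for rutabaga in short_list:
    pass  # do nothing; we only want the loop to run to the end

# After the loop, rutabaga still holds the last item it was given.
print(rutabaga)

leftovers = []
for tyrannosaurus in short_list:
    # rutabaga never changes inside this loop, so the same word is added each time.
    leftovers.append(rutabaga)

print(leftovers)
```

The second loop runs once per item, but because its body never mentions `tyrannosaurus`, every pass produces the same leftover value.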

## What is 'for' doing?

"For" is Python's command to repeat.  

Consider this: if you wanted to print out each word in 'wordstring,' you could just write out a series of commands like so:

In [48]:
print(wordstring[0]+'!!!')
print(wordstring[1]+'!!!')
print(wordstring[2]+'!!!')
print(wordstring[3]+'!!!')
print(wordstring[4]+'!!!')
print(wordstring[5]+'!!!')
print(wordstring[6]+'!!!')
print(wordstring[7]+'!!!')
print(wordstring[8]+'!!!')
print(wordstring[9]+'!!!')
print(wordstring[10]+'!!!')
print(wordstring[11]+'!!!')

it!!!
was!!!
the!!!
best!!!
of!!!
times!!!
it!!!
was!!!
the!!!
worst!!!
of!!!
times!!!


But that's a lot of typing.  'For' saves you from unnecessary, repetitive typing. It's one of the kinds of repeated tasks that computers are great at.

In [49]:
for word in wordstring:
    print(word + ' -- is the best word! ')

it -- is the best word! 
was -- is the best word! 
the -- is the best word! 
best -- is the best word! 
of -- is the best word! 
times -- is the best word! 
it -- is the best word! 
was -- is the best word! 
the -- is the best word! 
worst -- is the best word! 
of -- is the best word! 
times -- is the best word! 


Here's a mathematical example.

In [50]:
numberlist = [1, 2, 3, 4]
    

In [51]:
for number in numberlist:
    # 'number' is the dummy variable; naming it 'int' would hide Python's built-in int()
    print(number * 1000)


1000
2000
3000
4000


In the rest of this class, we won't be making up silly punctuation for lines of words. But we will want to count the number of words for every document in a given year. We will want to change the punctuation, spelling, or plural form of many words so as to produce a uniform text that is easy to count.  So we will have many occasions to use repeated commands.
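As a rough preview of that kind of counting, here is a minimal sketch. It assumes a tiny, made-up corpus in which each "document" is already a list of words; the names `documents` and `word_counts` are hypothetical, and `.append()` (explained further below) simply adds an item to the end of a list.

```python
# Hypothetical mini-corpus: each "document" is a list of words.
documents = [
    ['it', 'was', 'the', 'best', 'of', 'times'],
    ['it', 'was', 'the', 'worst', 'of', 'times'],
    ['call', 'me', 'ishmael'],
]

word_counts = []
for document in documents:
    # len() gives the number of items in the list, i.e. the number of words.
    word_counts.append(len(document))

print(word_counts)
```

The loop body is written once, but it runs for every document, however many there are.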

## Other formats for for loops

For loops don't always iterate over a list that you have typed out yourself.

Sometimes they iterate over the result of a function.  For instance, 'range()' is often used with for loops.

'Range' generates a sequence of integers that starts at 0 and stops just before the number you give it. Thus 'range(4)' produces "0, 1, 2, 3".

In the example that follows, we make an empty list with the line

    integers = []
    
We then invoke 'for' and 'range' to tell Python to repeat the next command in a 'loop'.

    for i in range(10): -- tells Python to do the next line 10 times 
        integers.append(i)  --- this line tells Python to 'append' the contents of the changing variable 'i' to the list 'integers'
        
The result of this for loop is that Python takes the dummy variable 'i' and 'appends' it as a new member in the list 'integers.'  

Because of the structure of for...range, on each pass through the loop the variable i increases by one, from 0 up to 9.

In [52]:
integers = []

for i in range(10):
    integers.append(i)

print(integers)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
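To check what 'range' generates, you can wrap it in `list()`. The sketch below, assuming Python 3, also shows two optional extras (a starting number and a step size) and one common use: stepping through a list by position. The shortened `wordstring` and the name `numbered` are just for illustration.

```python
# range(stop) starts at 0 and stops just before 'stop'.
print(list(range(4)))          # [0, 1, 2, 3]

# range(start, stop) begins at 'start' instead of 0.
print(list(range(2, 6)))       # [2, 3, 4, 5]

# range(start, stop, step) skips ahead by 'step' each time.
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]

# Using range with len() to number the items in a list.
wordstring = ['it', 'was', 'the', 'best', 'of', 'times']
numbered = []
for i in range(len(wordstring)):
    numbered.append(str(i) + ': ' + wordstring[i])

print(numbered)
```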


## Doing More than One Thing inside a For Loop


    
Of course, the body of the loop can have more than one statement, and you can assign values to variables inside the loop:


In [54]:
for item in wordstring:
    yelling = item.upper()
    print(yelling)

IT
WAS
THE
BEST
OF
TIMES
IT
WAS
THE
WORST
OF
TIMES


You can even put a loop inside a for loop.  This is called a 'nested for loop'.

The nested for loop below contains TWO for statements.  
    * The first statement ('for item in wordstring') moves through each word in wordstring, as we saw above.
    * The second statement ('for letter in item') moves through each letter in each word.

The result is to print out one letter per line.

In [55]:
for item in wordstring:
    for letter in item:
        print(letter)

i
t
w
a
s
t
h
e
b
e
s
t
o
f
t
i
m
e
s
i
t
w
a
s
t
h
e
w
o
r
s
t
o
f
t
i
m
e
s


Here's an example of a nested for loop that calls two lists in succession.

In [59]:
num_list = [1, 2, 3]
alpha_list = ['a', 'b', 'c']

In [60]:
for number in num_list:
    print(number)
    for letter in alpha_list:
        print('   ' + letter)

1
   a
   b
   c
2
   a
   b
   c
3
   a
   b
   c


First the **first** for loop calls an item from num_list.

Then the **second** for loop calls all the items from alpha_list, printing each slightly indented.

The loop repeats until all the numbers in num_list are exhausted.
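The same nested structure can build something new rather than just print. As a small sketch (the list name `labels` is made up for this example), the code below pairs every number with every letter:

```python
num_list = [1, 2, 3]
alpha_list = ['a', 'b', 'c']

labels = []
for number in num_list:
    for letter in alpha_list:
        # str() turns the number into text so it can be glued to the letter.
        labels.append(str(number) + letter)

print(labels)
print(len(labels))
```

Because the inner loop runs in full for each pass of the outer loop, three numbers and three letters give nine labels.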

## Conditional Loops

You can also include other kinds of nested statements inside the for loop.  

"Conditional" statements, for instance "if", ask the computer to first consider whether a certain condition is true before proceeding. 

In the code below, the "if" statement asks if the length of item in characters is 2 -- in other words: 

    if len(item) == 2:
    
If that statement is true, then the computer will obey the next command, which tells Python to print words that meet the above condition in uppercase:

    print(item.upper())
   
In other words, the command

    if len(item) == 2:
         print(item.upper())

means that the computer will look for ONLY two-character words in wordstring, and those two-character words will be printed in uppercase. 

In [61]:

for item in wordstring:
    if len(item) == 2:
        print(item.upper())

IT
OF
IT
OF


Conditional statements can become very complicated. You might not see very many of these in our class, but it's useful to have seen the commands just in case.

"If" statements are often followed by one or more "elif" statements that mean: "if the condition for the original 'if' is not true, test the next condition":

    elif len(item) == 3:
        print("   " + item)
        
"If" and "elif" statements are often given with an alternative, which is formatted as "else." An "else" statement tells the computer what to do when none of the conditions above it is true:

    else:
        print(item)

In [66]:

for item in wordstring:
    if len(item) == 2:
        print(item.upper())
    elif len(item) == 3:
        print("   " + item)
    else:
        print(item)

IT
   was
   the
best
OF
times
IT
   was
   the
worst
OF
times
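Conditionals inside a loop are also a handy way to count things. The following is a minimal sketch, using the same wordstring and three made-up counter variables, that tallies how many words fall into each branch:

```python
wordstring = ['it', 'was', 'the', 'best', 'of', 'times',
              'it', 'was', 'the', 'worst', 'of', 'times']

# Start each counter at zero before the loop begins.
two_letter = 0
three_letter = 0
longer = 0

for item in wordstring:
    if len(item) == 2:
        two_letter = two_letter + 1
    elif len(item) == 3:
        three_letter = three_letter + 1
    else:
        longer = longer + 1

print(two_letter, three_letter, longer)
```

Each word adds to exactly one counter, so the three totals always sum to the length of the list.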


# Digression: Doing things with Text

That's enough for loops for now.  Let's quickly pick up a few more functions that are useful for working with text.

## Introducing the .split() method

One quick way to make a list out of a line of text is to use the ".split()" method.  Applied to a string of text, .split() will *split* the text into a list of words.

Let's say that you want to turn a sentence into a list of words. Here's a short text:

In [1]:
text = "it was the best of times, it was the worst of times"

We can make a list of all the words in the text by splitting on whitespace:

In [2]:
words = text.split()

Of course, we can see what's in the list simply by evaluating the variable:

In [3]:
words

['it',
 'was',
 'the',
 'best',
 'of',
 'times,',
 'it',
 'was',
 'the',
 'worst',
 'of',
 'times']
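Notice that the sixth item is 'times,' with the comma still attached: .split() only cuts at whitespace and leaves punctuation alone. One possible way to tidy this up, sketched here with a for loop and the string method .strip() (the list name `clean_words` and the particular set of punctuation marks are choices made for this example):

```python
text = "it was the best of times, it was the worst of times"

clean_words = []
for word in text.split():
    # .strip(',.;:!?') removes those characters from both ends of the word.
    clean_words.append(word.strip(',.;:!?'))

print(clean_words)
```

After this loop, both occurrences of 'times' are identical, which matters when you start counting words.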

## Join: Making strings from lists

Once we've created a list of words, it's a common task to want to take that list and "glue" it back together, so it's a single string again, instead of a list. So, for example:

In [11]:
element_list = ["hydrogen", "helium", "lithium", "beryllium", "boron"]
glue = ", and "
glue.join(element_list)

'hydrogen, and helium, and lithium, and beryllium, and boron'

The .join() method needs a "glue" string to the left of it---this is the string that will be placed in between the list elements. In the parentheses to the right, you need to put an expression that evaluates to a list. Very frequently with .join(), programmers don't bother to assign the "glue" string to a variable first, so you end up with code that looks like this:


In [12]:
words = ["this", "is", "a", "test"]
" ".join(words)

'this is a test'
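Putting the pieces of this tutorial together, here is a short sketch that splits a text into words, changes each word inside a for loop, and then joins the words back into a single string. The variable names `shouted` and `result` are made up for this example.

```python
text = "it was the best of times"

shouted = []
for word in text.split():
    # Change each word, then collect it in a new list.
    shouted.append(word.upper())

# Glue the changed words back together with a single space between them.
result = " ".join(shouted)
print(result)
```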