# Week 4: Conditionals, Iteration, and Counting Types

## Part 1: Conditionals

We will learn all about `if`, `elif`, `else`, and indentation. We will do some more fun group role-playing in order to learn about conditionals. Get ready for 🐍!


## Part 2: Loops and Iteration

We will learn all about `for` loops and how to embed conditionals *inside* loops, followed by... more role-playing, more 🐍!


## Part 3: Using Loops and Iteration to Calculate Types

We will put together what we learned about conditionals and iteration, plus some new friends, `in` and `list.append()`, in order to count the number of unique words in a text.

## Links

You may find these sections of Melanie Walsh's textbook useful:
* [Python Comparisons and Conditionals](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/08-Comparisons-Conditionals.html)
* [Python Lists and Loops](https://melaniewalsh.github.io/Intro-Cultural-Analytics/02-Python/09-Lists-Loops-Part1.html)



# 1. Conditionals

## Revisiting comparisons

Two whole weeks ago (how the time flies!), you may recall that we met a lot of **operators**:

* `==`: equal to
* `!=`: not equal to
* `>`: greater than
* `>=`: greater than or equal to
* `<`: less than
* `<=`: less than or equal to

We used these operators to make **comparisons**. For instance, 


In [None]:
author = "Zadie Smith"
age = 47

In [None]:
author == "Zadie Smith"

In [None]:
age < 100

Let's add one further layer of complexity to comparisons with `and` and `or` (and `not`, which we'll use later on).

| **Logical Operator** | **Explanation** |
|:-------------:|:---------------------------------------------------------------------------------------------------:|
| `x and y` | `True` if x and y are both `True` |
| `x or y` | `True` if either x or y (or both) is `True` |
| `not x` | `True` if x is not `True` |

In [None]:
sugar = True
cream = True

In [None]:
sugar == True and cream == True

In [None]:
sugar == True or cream == True

Note that Python's `or` is not to be confused with "the exclusive or". 

`or` returns `True` if EITHER (or BOTH) of the conditions is true.

The "exclusive or" returns `True` ONLY if ONE but **not BOTH** of the conditions is true.

* **`or`**: "Do you take sugar or cream in your coffee?" You can choose one, the other, or both.
* **"exclusive or"**: "Would you like fries or a salad with your burger?" You're being asked to choose one or the other, not both. 

The `or` we're talking about today is the "sugar or cream" `or`; **not** the "fries or salad" "exclusive or."
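Python has no `xor` keyword. As a minimal sketch, assuming both values are real booleans (`True` or `False`), comparing them with `!=` behaves like the "fries or salad" exclusive or:

```python
sugar = True
cream = True

# Python's `or` is inclusive: True if either one, or both, is True
takes_either = sugar or cream       # True

# For two booleans, != acts as an exclusive or: True only if exactly one is True
takes_exactly_one = sugar != cream  # False, since both are True

print(takes_either, takes_exactly_one)  # True False
```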

In [None]:
age == 47 and author == "Zadie Smith"

In [None]:
age == 47 or author == "Zadie Smith"

In [None]:
age == 50 and author == "Zadie Smith"

In [None]:
age == 50 or author == "Zadie Smith"

**Conditionals**, the thing we're learning about right now, allow us to actually **do something** with the comparisons that we make.

For instance, we could make a little program that has Python inform anyone 35 or older that they're not eligible for the 5 under 35 award.

Anyway, for better or worse, it's pretty easy to make this horribly cruel program. 

You do it with an **`if` statement**. 

## `if` statements

An `if` statement is an instruction to do something *if* a particular condition is met. `if` statements are useful because they enable Python to **reason** about something rather than just compute!


A common Python conditional is made up of two lines:
* On the first line, you type the English word `if` followed by an **expression** (for instance, a **comparison**) and then a colon (`:`).
* On the second line, you **indent** (Jupyter will automatically insert this indentation for you, but you could also use the `Tab` key on your keyboard) and write an instruction or "statement" to be completed if the condition is met.

Here's a Python `if` statement:

In [None]:
if age >= 35:
    print("You are not eligible for the 5 under 35 award.")

As humanities students, you're all ready to handle the syntax of an `if` statement, because it's a lot like the way you introduce a block quotation in an essay.

```
The opening of Eliot's The Waste Land immediately establishes a mood of dread:
    April is the cruellest month, breeding
    Lilacs out of the dead land, mixing
    Memory and desire, stirring
    Dull roots with spring rain. (1-4)
From this point onward in the poem, it is all just further downhill.
```

The first line introduces the quotation — it sets up what's to follow — and ends with a colon `:`, which signals that we're about to move into something else. 

Then all subsequent lines are *indented*, to signal that they are in some way subordinate to that introductory phrase. The power of the colon `:` — the subordination of all subsequent lines to that introductory phrase — goes away only when we stop indenting.

**So much is communicated with one bit of punctuation (`:`) and one element of layout (indentation)!**

Same with an `if` statement in Python. Other languages are hopelessly inelegant by comparison. Python, like our convention for introducing block quotes, does a lot with a little: mere colons and indentation.

The opening line of an `if` statement names the condition we're looking to meet; and it ends with a `:`, signalling we're about to specify what will actually happen if that condition is met. The `:` leaves us hanging, waiting to know what action will occur if the condition is met!

The second line is indented, to show that it's subordinated to the first line. It picks up where the `:` left off, filling in the blank: if the condition is met, **do this**.

In [None]:
if age >= 35:
    print("You are not eligible for the 5 under 35 award.")

Although this syntax is gorgeously and elegantly minimalist, it is also quite unforgiving. Think of Python as a demanding aesthete: it has exquisite taste, and so will not tolerate even the slightest gaffe.

In [None]:
if age >= 35
    print("You are not eligible for the 5 under 35 award.")

Actually, what I just said is not fair: unlike a demanding aesthete, who would merely turn up their nose and shoo you away, Python is kind enough to explain where we've gone wrong when we make a faux pas.

In [None]:
if age >= 35:
print("You are not eligible for the 5 under 35 award.")

## `else` statements

You can add more complexity to your `if` statement by specifying what to do if the condition in the `if` statement **isn't** met.

An `else` statement comes after an `if` statement and is formatted in the same way, except that you don't have to specify a condition (because it serves as an "if-all-else-fails-do-*this*" bucket).

In [35]:
if age >= 35:
    print("You are not eligible for the 5 under 35 award.")
else:
    print("Hell yeah, you might be eligible for the award, " + author + "!")

You are not eligible for the 5 under 35 award.


## `elif` statements

We can add even **more** nuance with `elif` — "else if" — statements.

The Python Interpreter (🐍) will evaluate the `if` statement first. Then, if it's not true, 🐍 will go to the `elif` statement (and we can stack as many of these as we like) until 🐍 finds a true one. If none of those are true, 🐍 will go to the `else` statement, if we've provided one.
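To see that 🐍 really does stop at the first true condition, here is a small sketch (the names `result_elif` and `result_ifs` are just for illustration) comparing an `if`/`elif` chain with two separate `if` statements, for an age where both conditions are true:

```python
age = 47

# if/elif: Python checks the conditions in order and stops at the FIRST True one
if age >= 35:
    result_elif = "35 or older"
elif age > 15:
    result_elif = "older than 15"  # also True for 47, but never reached
print(result_elif)  # 35 or older

# Two separate if statements: BOTH are checked, so the second overwrites the first
if age >= 35:
    result_ifs = "35 or older"
if age > 15:
    result_ifs = "older than 15"
print(result_ifs)  # older than 15
```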

In [36]:
if age >= 35:
    print("You are not eligible for the 5 under 35 award.")
elif age == 34:
    print("This is the last year, " + author + ". You better nail it.")
else:
    print("Plenty of time. Go out and smell the flowers.")

You are not eligible for the 5 under 35 award.


In [41]:
if age >= 35:
    print("You are not eligible for the 5 under 35 award.")
elif age == 34:
    print("This is the last year, " + author + ". You better nail it.")
elif age > 15 and age < 34:
    print("Plenty of time. Go out and smell the flowers.")
else:
    print("You are too young to know anything at all. Stop worrying about literary awards.")

You are not eligible for the 5 under 35 award.


Okay, those 🐍s up there were to get you excited about the group role-playing activity I have planned now...

# 🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍🐍

# 2. Iteration and Loops

I'm just going to show you some code for a particular kind of loop, and let's see if you can figure out its syntax and what it does.

In [None]:
number = 10

while number > 0:
    print(number)
    number = number - 1

## `for` loops

That's one kind of Python loop — a `while` loop. And `while` it's very cool and useful, it's not as useful `for` us as another kind of loop: the `for` loop.

Last class, we talked about **indexing** and **slicing** in relation to two data types: `str`s and `list`s. 

This taught us the way that Python "breaks down" those two data types:
* `str`s are broken up into...
* and `list`s are broken up into...

In [77]:
text = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
text_words = text.split()

In [None]:
text[:5]

In [None]:
text_words[:5]

A `for` loop allows us to **move through the parts of** a particular variable — **iterate over it**, in the stylish and fashionable lexicon of Python — and **do something** to each part of it.

In [78]:
for character in text:
    print(character.upper())

I
T
 
I
S
 
A
 
T
R
U
T
H
 
U
N
I
V
E
R
S
A
L
L
Y
 
A
C
K
N
O
W
L
E
D
G
E
D
,
 
T
H
A
T
 
A
 
S
I
N
G
L
E
 
M
A
N
 
I
N
 
P
O
S
S
E
S
S
I
O
N
 
O
F
 
A
 
G
O
O
D
 
F
O
R
T
U
N
E
,
 
M
U
S
T
 
B
E
 
I
N
 
W
A
N
T
 
O
F
 
A
 
W
I
F
E
.


In [82]:
for word in text_words:
    print(word.upper())

IT
IS
A
TRUTH
UNIVERSALLY
ACKNOWLEDGED,
THAT
A
SINGLE
MAN
IN
POSSESSION
OF
A
GOOD
FORTUNE,
MUST
BE
IN
WANT
OF
A
WIFE.


Here's the way I personally understand the syntax of a `for` loop.

In the below `for` loop, you're telling Python, 
> **Hey, Python! `for` every `element` that's "`in`" the variable `whatever`, please go in and do the following thing to it`:` `print()` off that `element`**

In [88]:
whatever = "blah blah blah"

for element in whatever:
    print(element)

b
l
a
h
 
b
l
a
h
 
b
l
a
h


Below is a more formal overview of a `for` loop.

The above `for` loops consist of two lines and have this syntax:

* On the first line, you type the word `for`, then a **variable name** for each item in the thing you'll be iterating over, then the word `in`, then the **name of the variable you want to iterate over**, and then a colon (`:`)
* On the second line, you indent and write an instruction or “statement” to be completed for each item.

Note that the **variable name** you provide between `for` and `in` can be anything (as long as it follows variable naming conventions). It's nice to give it a descriptive name that corresponds to what the individual items of the string or list *are*, but it doesn't need to be.

In [89]:
instructors = ["Karl", "Dash", "Mary"]

In [90]:
for name in instructors:
    print(f"This instructor's name is {name}.")

This instructor's name is Karl.
This instructor's name is Dash.
This instructor's name is Mary.


In [91]:
for x in instructors:
    print(f"This instructor's name is {x}.")

This instructor's name is Karl.
This instructor's name is Dash.
This instructor's name is Mary.


## Combining loops and conditionals

So, it turns out that our new friends `if` and `for` are already friends! They get along really well with one another. 

For instance:

In [92]:
for name in instructors:
    print(f"Your name is {name}.")
    if name == "Dash":
        print("You are standing at the front of the room.")
    elif name == "David":
        print("You are sitting at the front of the room.")
    elif name == "Mary":
        print("You are sitting at the back of the room.")

Your name is Karl.
Your name is Dash.
You are standing at the front of the room.
Your name is Mary.
You are sitting at the back of the room.


Notice, in the above, how indentation and colons work to indicate how everything fits together, keeping everything neatly **nested** like a matryoshka doll.

![Matryoshka dolls](matryoshki.jpg)

```
On the outside is the for loop:
    Inside that is a print() function.
    Then there is an if statement:
        Which contains a print() function.
    Then there is an elif statement:
        Which contains another print() function.
    Then there is yet another elif statement:
        Which contains yet another print() function.
    And then we go back to the start of the for loop, for as long as there are items to iterate over.
And then when there are no items left to iterate over, the for loop is done, and we are outside it.
```
On your own time, play around with the levels of indentation, breaking the logical, nested structure in multiple ways and then bringing it back to life!

How could we combine an age-related `if` statement like this one,

```
if age > 41:
    print("You are hopelessly out of touch with today's world.")
elif age == 41:
    print("You are the perfect age and don't need to worry about anything.")
elif age > 15 and age < 41:
    print("You are young and perfectly in tune with your age.")
else:
    print("You are too young to know anything at all. Get back in your crib.")
```

with a for loop that iterates over a list of ages?
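One possible answer, as a sketch: indent the whole `if`/`elif`/`else` block one level so it sits inside the loop. The list `ages` below is hypothetical, chosen so that each branch runs at least once.

```python
ages = [5, 30, 41, 60]  # hypothetical list of ages, for illustration only

for age in ages:
    # The whole if/elif/else block is indented one level, inside the loop,
    # so it runs once for every age in the list
    if age > 41:
        message = "You are hopelessly out of touch with today's world."
    elif age == 41:
        message = "You are the perfect age and don't need to worry about anything."
    elif age > 15 and age < 41:
        message = "You are young and perfectly in tune with your age."
    else:
        message = "You are too young to know anything at all. Get back in your crib."
    print(age, message)
```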

# 3. Using Loops and Iteration to Calculate Types

Okay, let's work together to think through what we would actually need to do in order to calculate the number of unique words or **types** in a text file that we've created.

* First we would need to load the text into a string.
* Then we would need to break it up into a list of words.
* Then we would need to go through that list of words and, at every step, determine if we've already met that word before. If it's a new word, we would store it in a new list of unique words.
* When we're done, we need to count how many words are in that new list of unique words.

We already have pretty much all the tools we need to do this. 

## The `in` operator — and `not`

`in` checks whether a particular item is in a particular list.

It can be combined with `not` — cousin to `and` and `or`, which we met above — to check if a particular item is **absent from** a particular list.
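`in` works on strings as well as lists, but it means something slightly different there. A sketch, reusing the *Pride and Prejudice* sentence: on a string, `in` looks for a **substring**; on a list, it looks for an **exact element**. This is a first hint of the punctuation problem we'll run into later.

```python
text = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
text_words = text.split()

# On a string, `in` looks for a run of characters anywhere inside it
print("acknowledged" in text)         # True

# On a list, `in` looks for an element that matches EXACTLY
print("acknowledged" in text_words)   # False: the list holds "acknowledged," with its comma
print("acknowledged," in text_words)  # True
```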

In [None]:
print(instructors)

In [None]:
"Adam" in instructors

In [None]:
"Joey" in instructors

In [None]:
"Adam" not in instructors

In [None]:
"Joey" not in instructors

In [None]:
text_words

In [None]:
"universally" in text_words

In [None]:
print(instructors)
instructors.append("Diana")
print(instructors)

Note what happens if we run the above cell multiple times. 

This is an important point about Jupyter Notebooks: **when cells are run multiple times, they can yield different results.**

Imagine a scenario in which your homework gets the results you want... but only if someone runs a particular cell multiple times (which they won't know to do). Since the autograding software runs each cell only once, your homework would be marked "incorrect." 

**To make sure your code runs correctly without depending on particular cells running more than once, you should regularly "Run All Cells"** (under the Cell menu).

Let's try using the `list.append()` method in a `for` loop.

Let's make a little loop that goes through a string and, for each of its letters, adds it to an empty list.

In [93]:
word = "plenipotentiary"

new_list = []

for letter in word:
    new_list.append(letter)

In [94]:
new_list

['p', 'l', 'e', 'n', 'i', 'p', 'o', 't', 'e', 'n', 't', 'i', 'a', 'r', 'y']

Now let's stick a conditional inside a loop, *and* use the `list.append()` method within that conditional statement. 

### **This will stack together all the skills we need to do today's task!**

Let's look through every word from our `text_words` variable (the opening of *Pride and Prejudice*, split up into words) and store all the ones that begin with a vowel in a new list variable called `vowel_words`. (The internet informs me that the "sometimes Y" very seldom applies to Ys at the beginning of words...)

## Mutating List Methods

We need one new list method to finish our task... but we may as well use this as an opportunity to learn about a few other list methods, since they will come in handy down the line.

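* `list.append(another_item)`: adds one new item (any value: a `str`, `int`, `float`, `bool`, a tuple, or even a whole list) to the end of the list
* `list.extend(another_list)`: adds each item from `another_list` (usually a `list`, though any iterable works) to the end of the list
* `list.remove(item)`: removes the first instance of `item` from the list (and raises a `ValueError` if `item` isn't there)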
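A quick sketch of all three methods on a throwaway list (the titles are just examples). Note that each method changes the list in place rather than handing back a new one:

```python
shelf = ["Emma", "Persuasion"]

shelf.append("Emma")                          # adds ONE item to the end
shelf.extend(["Mansfield Park", "Sanditon"])  # adds EACH item of another list to the end
print(shelf)  # ['Emma', 'Persuasion', 'Emma', 'Mansfield Park', 'Sanditon']

shelf.remove("Emma")                          # removes only the FIRST "Emma"
print(shelf)  # ['Persuasion', 'Emma', 'Mansfield Park', 'Sanditon']

# append() with a list adds the whole list as a single (nested) item
nested = ["Emma"]
nested.append(["Lady Susan"])
print(nested)  # ['Emma', ['Lady Susan']]
```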

In [80]:
vowel_words = []

for word in text_words:
    if word[0] == "a" or word[0] == "e" or word[0] == "i" or word[0] == "o" or word[0] == "u":
        vowel_words.append(word)

In [81]:
print(vowel_words)

['is', 'a', 'universally', 'acknowledged,', 'a', 'in', 'of', 'a', 'in', 'of', 'a']
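Notice that "It" is missing from `vowel_words`, because `"I"` is not equal to `"i"`. One way to catch capitalized words too, sketched below with a separate list name (`vowel_words_any_case`, chosen for illustration), is to lowercase the first letter and check it against a string of vowels with `in`:

```python
text = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
text_words = text.split()

vowel_words_any_case = []

for word in text_words:
    # Lowercase the first letter, then check whether it appears in "aeiou";
    # this also catches capitalized words such as "It"
    if word[0].lower() in "aeiou":
        vowel_words_any_case.append(word)

print(vowel_words_any_case)
```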


## Now, for some roleplaying and a group exercise...

For this exercise, we will need:

* 1 Python Interpreter
* 8 Students

Now, we shall sort you at the front of the room...

### Ok, now on to the exercise

First, we need to store a list of all the students.

In [97]:
#This is a list of tuples in the form (name, year)
students = [('Al', 1), ('Betty',3), ('Cindy', 2), ('Dorothy',4), ('Karl',1),('Xavier',2), ('Zeke',3), ('Vince',2)]

Now we need to make some lists so Python knows where to put stuff!

In [98]:
#Students to stand on the left side of the room
first_half_alphabet = []
#Students to stand on the right side of the room 
second_half_alphabet = []


Finally, let's put together a for loop!

In [99]:
for name, year in students:
    first_letter = name[0]
    if first_letter < 'M':  # Earlier letters count as "smaller" in Python, but case matters: every uppercase letter is smaller than every lowercase one!
        first_half_alphabet.append((name, year))
    else: 
        second_half_alphabet.append((name, year))

Hmm, what do we think?

In [100]:
print('First Half: ',  first_half_alphabet)
print('Second Half: ', second_half_alphabet)

First Half:  [('Al', 1), ('Betty', 3), ('Cindy', 2), ('Dorothy', 4), ('Karl', 1)]
Second Half:  [('Xavier', 2), ('Zeke', 3), ('Vince', 2)]


Now let's go one step further and put all the **upperclassmen** in the middle of the room.

In [101]:
#Students to stand in the center of the room
upperclassmen = []

for name, year in first_half_alphabet[:]:  # loop over a copy, because we remove items from the original
    if year >= 3:
        upperclassmen.append((name, year))
        first_half_alphabet.remove((name, year))

for name, year in second_half_alphabet[:]:  # same here: loop over a copy
    if year >= 3:
        upperclassmen.append((name, year))
        second_half_alphabet.remove((name, year))

OK how did we do???

In [102]:
print('First Half: ',  first_half_alphabet)
print('Second Half: ', second_half_alphabet)
print('Upperclassmen: ',upperclassmen)

First Half:  [('Al', 1), ('Cindy', 2), ('Karl', 1)]
Second Half:  [('Xavier', 2), ('Vince', 2)]
Upperclassmen:  [('Betty', 3), ('Dorothy', 4), ('Zeke', 3)]
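Removing items from a list while a `for` loop is moving through that same list can make the loop silently skip items, because everything after the removed item shifts one place to the left. A small demonstration with hypothetical year values; looping over a copy made with the slice `[:]` avoids the problem:

```python
# Buggy version: removing from the same list we're looping over
buggy = [3, 4, 1]
for year in buggy:
    if year >= 3:
        buggy.remove(year)
print(buggy)  # [4, 1]: the 4 slid into the removed 3's spot and was never checked

# Fixed version: loop over a copy (fixed[:]), remove from the original
fixed = [3, 4, 1]
for year in fixed[:]:
    if year >= 3:
        fixed.remove(year)
print(fixed)  # [1]
```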


# Finally: Calculating Type-Token Ratio

## Okay, now we're ready to calculate the number of unique words in `text_words`

Read through the code below and try to figure out what every line does.

In [103]:
unique_words = []

for word in text_words:
    if word not in unique_words:
        unique_words.append(word)

In [104]:
unique_words

['It',
 'is',
 'a',
 'truth',
 'universally',
 'acknowledged,',
 'that',
 'single',
 'man',
 'in',
 'possession',
 'of',
 'good',
 'fortune,',
 'must',
 'be',
 'want',
 'wife.']

## Now we're ready to calculate a type-token ratio!

In [None]:
len(text_words)

In [None]:
len(unique_words)

In [None]:
(len(unique_words) / len(text_words)) * 100

Note that I wrapped (types / tokens) in `()` to make sure that the "order of operations" is calculated correctly. It doesn't actually matter in this case — but might as well get used to it!
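Not something this lesson relies on, but worth knowing: Python's built-in `set` type keeps only one copy of each distinct item, so `len(set(...))` counts types in one step. A sketch, reusing the *Pride and Prejudice* sentence. For a long text it is also much faster than checking `word not in unique_words`, which has to scan the whole list every time.

```python
text = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."
text_words = text.split()

# set() keeps one copy of each distinct word, so its length is the number of types
types = len(set(text_words))
tokens = len(text_words)

ttr = (types / tokens) * 100
print(types, tokens, round(ttr, 2))  # 18 23 78.26
```

A set does not keep the original word order, which doesn't matter when all we need is a count.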

## Shall we give this a try with an actual text??

In [105]:
sot4 = open("sign-of-four.txt", encoding="utf-8").read()

In [106]:
sot4[:20]

'Chapter I The Scienc'

In [107]:
sot4_words = sot4.split()

In [108]:
sot4_words[:20]

['Chapter',
 'I',
 'The',
 'Science',
 'of',
 'Deduction',
 'Sherlock',
 'Holmes',
 'took',
 'his',
 'bottle',
 'from',
 'the',
 'corner',
 'of',
 'the',
 'mantel-piece',
 'and',
 'his',
 'hypodermic']

In [109]:
sot4_unique_words = []

for word in sot4_words:
    if word not in sot4_unique_words:
        sot4_unique_words.append(word)

In [110]:
sot4_unique_words[:20]

['Chapter',
 'I',
 'The',
 'Science',
 'of',
 'Deduction',
 'Sherlock',
 'Holmes',
 'took',
 'his',
 'bottle',
 'from',
 'the',
 'corner',
 'mantel-piece',
 'and',
 'hypodermic',
 'syringe',
 'its',
 'neat']

In [111]:
sot4_ttr = len(sot4_unique_words) / len(sot4_words) * 100
print(sot4_ttr)

19.966513185433236


Let's have a peek inside our `sot4_unique_words` variable to see how well we're doing in finding unique words. Let's apply the `list.sort()` method to make our list more legible. (We'll lose word order, but that's okay in this case!)

In [112]:
sot4_unique_words.sort() #This alphabetizes the list

In [113]:
sot4_unique_words[:50]

['1857,',
 '1871,',
 '1878',
 '1878,—nearly',
 '1882',
 '1882.”',
 '1882—an',
 '221_b_',
 '28th',
 '3',
 '3,',
 '340',
 '34th',
 '3rd',
 '4th',
 '7',
 '7.',
 'A',
 'Abdullah',
 'Abdullah,',
 'Abdullah.',
 'Abel',
 'About',
 'Achillis.”',
 'Achmet',
 'Achmet,',
 'Achmet.',
 'Afghan',
 'Afghanistan;',
 'Africa,',
 'After',
 'Again',
 'Again,',
 'Agra',
 'Agra,',
 'Agra.',
 'Ah,',
 'Aided',
 'Akbar',
 'Akbar,',
 'Akbar.’',
 'Alison’s',
 'All',
 'Altogether',
 'America',
 'America,',
 'American,”',
 'Among',
 'An',
 'And']

In [114]:
sot4_unique_words[-50:]

['“magnifiques,”',
 '“not',
 '“pray',
 '“rebels',
 '“shall',
 '“that',
 '“the',
 '“there',
 '“to',
 '“tours-de-force,”',
 '“we',
 '“were,',
 '“whatever',
 '“would',
 '“‘An',
 '“‘Black',
 '“‘But',
 '“‘Consider,',
 '“‘Does',
 '“‘For',
 '“‘Friends,’',
 '“‘Half',
 '“‘Here',
 '“‘How',
 '“‘Hum!’',
 '“‘I',
 '“‘If',
 '“‘It',
 '“‘It’s',
 '“‘Listen',
 '“‘Look',
 '“‘No;',
 '“‘None',
 '“‘Nonsense!’',
 '“‘Nonsense,',
 '“‘Not',
 '“‘Nothing',
 '“‘Quite',
 '“‘Take',
 '“‘The',
 '“‘Then',
 '“‘There',
 '“‘This',
 '“‘To',
 '“‘Well,',
 '“‘What',
 '“‘Who',
 '“‘Why,',
 '“‘You',
 '“‘Your']

How could we improve our list of unique words?
* remove punctuation
* remove capitalization

The former is tricky, but we already know how to do the latter. How could we add that to the for loop that looks for the number of unique words?

In [115]:
sot4_unique_words = []

for word in sot4_words:
    word = word.lower() #We'll come back to this
    if word not in sot4_unique_words:
        sot4_unique_words.append(word)

In [116]:
sot4_unique_words[:50]

['chapter',
 'i',
 'the',
 'science',
 'of',
 'deduction',
 'sherlock',
 'holmes',
 'took',
 'his',
 'bottle',
 'from',
 'corner',
 'mantel-piece',
 'and',
 'hypodermic',
 'syringe',
 'its',
 'neat',
 'morocco',
 'case.',
 'with',
 'long,',
 'white,',
 'nervous',
 'fingers',
 'he',
 'adjusted',
 'delicate',
 'needle,',
 'rolled',
 'back',
 'left',
 'shirt-cuff.',
 'for',
 'some',
 'little',
 'time',
 'eyes',
 'rested',
 'thoughtfully',
 'upon',
 'sinewy',
 'forearm',
 'wrist',
 'all',
 'dotted',
 'scarred',
 'innumerable',
 'puncture-marks.']

In [117]:
sot4_ttr_lowered = (len(sot4_unique_words) / len(sot4_words)) * 100

Let's compare our two TTR results: `sot4_ttr` (capitalization present) and `sot4_ttr_lowered` (capitalization removed). Which do you think will be higher? Why? How much do you expect the two numbers to differ?

In [118]:
print(sot4_ttr)
print(sot4_ttr_lowered)

19.966513185433236
19.240965536486677
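As for the trickier problem of punctuation, here is one possible starting point, sketched with a handful of words taken from the sorted lists above. It assumes that stripping marks from the *ends* of each word is good enough. `string.punctuation` only covers ASCII marks, so the curly quotes and em dash in this edition have to be added by hand (an assumption about which characters appear in the file).

```python
import string

# string.punctuation holds ASCII marks only (such as , . ! ? and -);
# the curly quotes and em dash are added by hand
PUNCTUATION = string.punctuation + "“”‘’—"

sample_words = ["“‘Nonsense!’", "Abdullah,", "Abdullah.", "Abdullah", "mantel-piece", "1878,—nearly"]

cleaned_unique = []
for word in sample_words:
    # strip() removes these characters from the two ENDS of the word only,
    # so the hyphen inside "mantel-piece" survives
    word = word.lower().strip(PUNCTUATION)
    if word not in cleaned_unique:
        cleaned_unique.append(word)

print(cleaned_unique)  # ['nonsense', 'abdullah', 'mantel-piece', '1878,—nearly']
```

Because `strip()` only touches the ends of a word, "1878,—nearly" survives as a single token; splitting on dashes would be a further step.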


## Discussion Question: 

Why did we just spend all this time teaching you how to calculate a TTR using Python? Why not give you a tool that will calculate it for you, like [this one](https://jsfiddle.net/vsr4nt27/)? How does knowing how the TTR is calculated affect your ability to argue with it? 