# Iteration

# For loops

In data science, we often want to go through a list and perform some computation on each element.

Python provides the "for" loop for this purpose. A for loop is given a list, and it executes its indented body once per item in the list, each time setting the loop variable to the current item. The indentation marks which lines are repeated.

In [None]:
people = ['Alice', 'Bob', 'Che']
for person in people:
    print('Hooray for ' + person + '!')

We could "unroll" this loop and write out code that achieves the same effect without using a loop. The loop version is easier to maintain: in the unrolled version, changing the message means editing it in three places.

In [None]:
person = 'Alice'
print('Hooray for ' + person + '!')
person = 'Bob'
print('Hooray for ' + person + '!')
person = 'Che'
print('Hooray for ' + person + '!')

Here's an iteration over a list of numbers.

In [None]:
my_grades = [4, 3, 2, 3, 4]
for g in my_grades:
    if g == 4:
        print('A')
    elif g == 3:
        print('B')
    elif g == 2:
        print('C')

In the next example, a variable stores a running total that is updated with every iteration of the loop.  In the end, the running total is the sum of all the numbers.

In [None]:
running_total = 0
numbers = [1,2,3,4,10]
for n in numbers:
    running_total = running_total + n  # Could be abbreviated running_total += n
    print('Sum so far: ' + str(running_total))
print('Sum: ' + str(running_total))

This example is similar, but calculates an average.

In [None]:
my_numbers = [0, 5, 10, 15]
my_total = 0
for number in my_numbers:  # number is 0, then 5, then 10, then 15
    print(number)
    my_total += number  # +=: shorthand for my_total = my_total + number
print("Average: " + str(my_total / len(my_numbers)))
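One thing to watch for: if my_numbers were empty, len(my_numbers) would be 0 and the division would raise a ZeroDivisionError. Below is a minimal sketch of one way to guard against that (treating the average of an empty list as None is our own choice, not a rule). It also checks the loop's total against Python's built-in sum(), which adds up a list in a single call.

```python
my_numbers = [0, 5, 10, 15]
my_total = 0
for number in my_numbers:
    my_total += number

# Only divide when there is at least one number
if len(my_numbers) > 0:
    average = my_total / len(my_numbers)
else:
    average = None  # Our choice: no average for an empty list
print("Average: " + str(average))  # Average: 7.5

# The built-in sum() gives the same total as the loop
print(my_total == sum(my_numbers))  # True
```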

It's common to do some computation on each element of the list that results in a new list.  We can initialize a variable to an empty list before the computation gets started, and then call append() to add to the contents of the list on each iteration.

In [None]:
my_grades = [4, 3, 2, 3, 4]
letter_grades = []
for g in my_grades:
    if g == 4:
        letter_grades.append('A')
    elif g == 3:
        letter_grades.append('B')
    elif g == 2:
        letter_grades.append('C')
print(letter_grades)
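Because the if/elif chain has no else branch, a grade outside 2 to 4 would be skipped silently, and letter_grades would end up shorter than my_grades. Here is a sketch with a hypothetical grade of 1 added, using an else branch and a placeholder '?' (our own choice) so that every grade produces exactly one letter:

```python
my_grades = [4, 3, 1, 2, 4]  # Hypothetical data that includes a grade of 1
letter_grades = []
for g in my_grades:
    if g == 4:
        letter_grades.append('A')
    elif g == 3:
        letter_grades.append('B')
    elif g == 2:
        letter_grades.append('C')
    else:
        letter_grades.append('?')  # Placeholder for any grade we didn't plan for
print(letter_grades)  # ['A', 'B', '?', 'C', 'A']
```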

If a list of tuples is iterated over in a for loop, you can split the tuple so that its parts are assigned to different variables.

In [None]:
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5)]
for moviename, stars in my_movies:  # Notice the two variable names
    print('I would rate ' + moviename + ' ' + str(stars) + ' stars')

You could assign each whole tuple to a variable and then index to get each part, but this is just a bit less readable.

In [None]:
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5)]
for movietuple in my_movies:  # One variable holds the whole tuple
    print('I would rate ' + movietuple[0] + ' ' + str(movietuple[1]) + ' stars')
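The two-variable form of the loop does the same tuple unpacking you can do in an ordinary assignment. This sketch makes that step explicit inside the loop body. The number of names on the left must match the tuple's length, or Python raises a ValueError.

```python
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5)]
ratings = []
for movietuple in my_movies:
    moviename, stars = movietuple  # Same unpacking the two-variable loop header does
    ratings.append(stars)
print(ratings)  # [4, 4.5, 5]
```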

The next example shows how iteration can find the tuple with the maximum rating in a list. We keep a variable holding the best rating seen so far and compare each new rating to it; whenever the new rating beats the old one, the new rating takes its place. If we want to remember other information about the current winner, such as its title, we have to update a variable with that information at the same time.

In [None]:
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5)]

best_rating = 0  # Start below every rating in the list, so the first movie always beats it
best_movie = "none"
for movie, rating in my_movies:
    if rating > best_rating:
        best_rating = rating
        best_movie = movie

print("Best movie: " + best_movie + "...rating = " + str(best_rating))
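Starting best_rating at 0 works here because every rating is positive, but it would give the wrong answer if every rating were below 0. A common alternative, sketched below, is to treat the first movie as the best so far and compare the rest against it. This assumes the list isn't empty. Because the comparison is a strict >, a later movie that only ties the best rating doesn't replace it.

```python
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5)]

# Before looking at any others, the first movie is the best we've seen
best_movie, best_rating = my_movies[0]
for movie, rating in my_movies[1:]:  # Compare each remaining movie to the best so far
    if rating > best_rating:
        best_rating = rating
        best_movie = movie

print("Best movie: " + best_movie + "...rating = " + str(best_rating))
# Best movie: Casablanca...rating = 5
```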

Another thing we can do by iterating through a list is filter the list, creating a new list that only contains elements that satisfy some criterion.  We can do this by creating a new empty list, and only copying over the values we want.

In [None]:
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5), ("The Great Escape", 3), ("Cats", 1)]

good_movies = []
for movie, stars in my_movies:
    if stars >= 4:
        good_movies.append((movie, stars))
print(good_movies)

# Exercise

The list below contains tuples of (Star Wars movie name, Rotten Tomatoes score).  Produce a list (using .append() and a foreach loop) that contains only the titles of movies that scored 80 or above.

In [None]:
sw_movies = [('The Phantom Menace', 52),
             ('Attack of the Clones', 65),
             ('Revenge of the Sith', 80),
             ('Rogue One', 84),
             ('Solo', 70),
             ('Star Wars', 92),
             ('The Empire Strikes Back', 94),
             ('Return of the Jedi', 82),
             ('The Force Awakens', 93),
             ('The Last Jedi', 90),
             ('The Rise of Skywalker', 51)]

In [None]:
# TODO

# For and range

Sometimes you don't want to tie your iteration to a list - you just want to do something n times.  To do this, iterate over range(n), where n is the number of times you want to iterate. 

In [None]:
for i in range(5):
    print("Iteration " + str(i))

The loop variable starts at 0 and stops *before* n, so the last value used for an iteration is n-1. Counting from zero is a common computer science practice, and counting 0 through n-1 still gives exactly n iterations.

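Incidentally, range() doesn't return a list but a *range object*, a lazy sequence that works out each number only when it's asked for one instead of storing all n numbers ahead of time. (It is sometimes loosely called a generator, but unlike a generator it supports len() and indexing and can be iterated over more than once.)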
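A quick way to see that a range doesn't hold a full list of numbers: printing it shows only its bounds, list() is needed to pull every value out, and even an enormous range is created instantly. A range also supports len() and indexing.

```python
r = range(5)
print(r)        # range(0, 5)
print(list(r))  # [0, 1, 2, 3, 4]
print(len(r))   # 5
print(r[2])     # 2

# Each number is computed when asked for, so this doesn't build a trillion-element list
big = range(10 ** 12)
print(big[-1])  # 999999999999
```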

range() can start from a number that isn't 0, as long as you supply two arguments, where the first argument is the starting number and the second is one more than the last number you want in the sequence.

In [None]:
for i in range(1, 6):
    print(i)
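Converting a range to a list is a handy way to check exactly which numbers it produces. range() also accepts an optional third argument, the step between numbers, which can be negative to count down. The last case shows that a range whose start is already past its stop is simply empty.

```python
up = list(range(1, 6))
down = list(range(5, 0, -1))
empty = list(range(6, 1))
print(up)     # [1, 2, 3, 4, 5]: stops before 6
print(down)   # [5, 4, 3, 2, 1]: a step of -1 counts down
print(empty)  # []: start is already past stop, so there are no values
```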

The number generated by range can be used to index into a list.

In [None]:
my_itinerary = ['Boston', 'Atlanta', 'LA', 'Seattle']
for idx in range(len(my_itinerary)):
    print(my_itinerary[idx])

Normally we'd iterate over the list directly, as in the earlier examples, but sometimes the index itself is handy; for instance, it lets us look at the next item along with the current one.

In [None]:
my_itinerary = ['Boston', 'Atlanta', 'LA', 'Seattle']
for idx in range(len(my_itinerary)-1):  # Avoid indexing out of bounds
    print(my_itinerary[idx] + '-' + my_itinerary[idx+1])

Here is some code that answers the pressing question of which two *consecutive* Star Wars movies had the highest total rating. (Consecutive means adjacent in the list, which roughly follows the story's timeline rather than release date.)

In [None]:
best_pair = (None, None)
best_rating_total = 0
for idx in range(len(sw_movies)-1):
    rating_total = sw_movies[idx][1] + sw_movies[idx+1][1] # [1] is the rating
    if rating_total > best_rating_total:
        best_rating_total = rating_total
        best_pair = (sw_movies[idx], sw_movies[idx+1])
print(best_pair)

# Exercise

Print "Hello!" 3 times in a loop.  Then modify your code so that it prints "Hello 1", "Hello 2", then "Hello 3".

In [None]:
# TODO

Or, another way to do it:

In [None]:
# OPTIONAL TODO

# Break

Sometimes it makes sense to quit early from a loop - maybe we were searching through a list for something and we found it, for example.  A break statement, if reached, immediately breaks out of the current loop and continues executing after the end of the loop.

In [None]:
for movie, rating in sw_movies:
    if movie == 'Rogue One':
        print('The rating of Rogue One is ' + str(rating))
        break  # We don't need to look at any other entries

Printing each movie as we reach it shows that the loop never looks at the movies after Rogue One.

In [None]:
for movie, rating in sw_movies:
    print('Looking at ' + movie)
    if movie == 'Rogue One':
        print('The rating of Rogue One is ' + str(rating))
        break  # We don't need to look at any other entries
print('Done')

There's often a way to structure the code so that a break isn't necessary, and many programmers consider it slightly bad style to use break when a clearer alternative exists. Still, it's good to know break exists, since it does come up sometimes.

# Day 2

# While loops

Sometimes you want to keep iterating until some condition changes, rather than iterating once per item in a list as a for loop does. This is where while loops come in. A while loop checks its condition before each iteration and runs its body only while the condition is true; the loop ends as soon as the condition is false. While loops are more general than foreach loops, but they risk running forever if you have a bug.


In [None]:
i = 5
while i > 0:
    print(i)
    i = i - 1  # Can be abbreviated i -= 1
print('Final value of i: ' + str(i))

Without the update to i, the code would run forever.  (At least, it would run until we killed it.)

In [None]:
i = 5
while i > 0:  # Runs forever, since nothing decrements i
    print(i)  # Warning: if you run this cell, interrupt the kernel to stop it

When writing a while loop, it's important to know how each iteration brings you closer to termination (the end of the loop). At least one line in the loop must advance toward the loop quitting. Often, this is just a line that updates a numerical counter.

In [None]:
i = 0
while i <= 20:
    print(i)
    i += 2

The next example shows how a list can act as the "todo list" for a while loop.  The pop() method removes the last element of a list and returns it.  Here, by calling pop() on every iteration, we know we'll eventually have an empty list and the code will terminate.

In [None]:
list1 = ['a', 'b', 'c', 'd', 'e']
reversed_str = ""  # Named to avoid hiding Python's built-in reversed()
while len(list1) > 0:
    reversed_str += list1.pop()  # pop() removes the last element and returns it
print(reversed_str)
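Note that pop() changes the list it's called on, so after the loop above list1 is empty. If you still need the original list afterwards, one option (sketched here) is to pop from a copy; list(list1) makes a new list with the same elements.

```python
list1 = ['a', 'b', 'c', 'd', 'e']
todo = list(list1)  # A copy: popping from it leaves list1 alone
reversed_str = ""
while len(todo) > 0:
    reversed_str += todo.pop()  # Removes and returns the last element of the copy
print(reversed_str)  # edcba
print(list1)         # ['a', 'b', 'c', 'd', 'e']
```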

If you want to iterate down a list with a while loop, first double-check that a for loop wouldn't be more elegant. If you still want to use a while loop, you can use a variable as a counter that indexes into the list. You have to advance this counter yourself, and also check in the condition that you don't go past the end of the list. (So this example needs two termination conditions: finding an item greater than 10, or reaching the end of the list if the list doesn't contain one.)

In [None]:
my_list = [9, 4, 3, 7, 10, 12, 9]

# Find the first value that is greater than 10
index = 0
# Check the index first: once it's past the end, "and" stops before evaluating my_list[index]
while index < len(my_list) and my_list[index] <= 10:
    index += 1
if index < len(my_list):
    print(my_list[index])
else:
    print('not found')

Here is very similar code, but with a for loop. To quit the loop as soon as we find a match, as the while loop does, we use the break keyword from earlier, which ends the loop immediately and moves on. We also use a special value, NOT_FOUND, to signal whether large_item was ever updated.

In [None]:
NOT_FOUND = -1
large_item = NOT_FOUND
for item in my_list:
    if item > 10:
        large_item = item
        break  # A keyword that means end the loop immediately
if large_item == NOT_FOUND:
    print('not found')
else:
    print(large_item)
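A sentinel like -1 is safe here because we're looking for a value greater than 10, so -1 can never be a real answer. For a search where -1 could be a legitimate match, a common alternative is Python's None, which can't be confused with any number and is checked with "is None". A sketch of the same search using None:

```python
my_list = [9, 4, 3, 7, 10, 12, 9]

large_item = None  # None means "nothing found yet"
for item in my_list:
    if item > 10:
        large_item = item
        break  # Stop at the first match
if large_item is None:
    print('not found')
else:
    print(large_item)  # 12
```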

It's also possible to write a while loop where the condition is just set to True, so it never terminates "normally."  A break statement can finally get out of the loop when a termination condition is met.  If you want to be sure the while loop executes at least once, you can use this pattern.  It's commonly used while getting input from a user - you need to prompt at least once, but may need to prompt again on bad input. 

In [None]:
while True:
    name = input('What is your name? ')  # Get name string from the user
    if name != 'Bob':
        break  # Successfully set name, don't ask anymore.
    print('Pick a name other than "Bob".')

# Exercise

Write a while loop that prints squares of numbers (1, 4, 9, ...) until the square would be greater than 1000.  (Recall that you can square a value *n* by writing *n ** 2*.)

In [None]:
# TODO

# Nested loops

Loops may need to be nested within each other.  Every iteration through the outer loop results in executing the full set of iterations for the inner loop.  One use for this is generating all the different combinations of values from different lists.

In [None]:
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]
for l in letters:
    print('beginning letter ' + l)
    for n in numbers:
        print(l + str(n))

Another use case for nested loops is iterating over a list-of-lists. To reach every element, the outer loop iterates over the inner lists, and the inner loop iterates over the elements of whichever inner list the outer loop is currently on.

In [None]:
# Sum each list in my_lists
my_lists = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

my_totals = []  # empty list
for inner_list in my_lists:
    print('new list')
    listsum = 0
    for value in inner_list:  # iterating over the list we got from the outer loop
        print('adding ' + str(value))
        listsum += value
    my_totals.append(listsum)
print('List sums:')
print(my_totals)

Iterating over a 2D array is similar.  We can do something with every item in the array if we create every possible combination of indices into the 2D array - usually with range().

In [None]:
import numpy as np

my_array = np.array([[1,2,3],[4,5,6],[7,8,9]]) # 3x3 matrix
total = 0  # Avoid naming this "sum", which would hide Python's built-in sum()
for row in range(3):
    for col in range(3):
        total += my_array[row, col]
        print('adding row ' + str(row) + ', col ' + str(col))
print(total)
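The same pattern works with a plain list of lists instead of a NumPy array, as sketched below. Two differences: lists are indexed as grid[row][col] rather than with NumPy's [row, col], and using len() instead of a hard-coded 3 lets the loops adapt to a grid of any size.

```python
my_grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 rows x 3 columns, as nested lists
total = 0
for row in range(len(my_grid)):           # One index per row
    for col in range(len(my_grid[row])):  # One index per column in this row
        total += my_grid[row][col]        # Lists need [row][col], not [row, col]
print(total)  # 45
```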

# Exercise

Write a nested for loop that individually prints every string in the following list-of-lists.

In [None]:
my_strings = [["three", "blind", "mice"],["happily","ever","after"],["no", "free", "lunch"]]

In [None]:
# TODO

It's also possible for two for loops to iterate over the same list in a nested fashion. This lets you iterate over every ordered pair of items from the list, including each item paired with itself.

For example, if we have a chess tournament and want to list every possible matchup, including who plays white:

In [None]:
print('Possible matchups:')
players = ['Alice', 'Bobby', 'Caspar', 'Dmitri', 'Eve']
for white_player in players:
    for black_player in players:
        print("White: " + white_player + "; Black player: " + black_player)

There's a slight bug in this code, because players shouldn't be able to play themselves.  But it's often okay to try a first pass at a program, then see what needs fixing.  **Make it run, then make it right, then make it fast** is an old programmer saying that I like.  Get a prototype that mostly works running, then clear out the bugs and get it working for corner cases, and only then worry about speed (which we won't worry about for a while).

So, how do we *not* print "Alice-Alice" and so forth?

In [None]:
print('Possible matchups:')
players = ['Alice', 'Bobby', 'Caspar', 'Dmitri', 'Eve']
for white_player in players:
    for black_player in players:
        if white_player != black_player:  # Skip anyone playing themselves
            print("White: " + white_player + "; Black player: " + black_player)

We could go further and say that we only want to print the possible pairs ignoring who plays which color; so if we list Alice-Eve, we shouldn't also list Eve-Alice. The idea is not to have the inner loop start at the beginning, but instead start just after the outer loop's current index. That way, only pairs (i,j) with j > i are produced; pairs with i > j never come up.

In [None]:
listlen = len(players)
for i in range(listlen):
    for j in range(i + 1, listlen):  # Notice the i + 1 starting point
        print(players[i] + "-" + players[j])
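With n players, this loop produces n * (n - 1) / 2 pairs, so 5 players give 10. The sketch below collects the pairs to check that count. Python's standard library also provides itertools.combinations, which generates the same pairs in the same order; the comparison at the end confirms they match.

```python
import itertools

players = ['Alice', 'Bobby', 'Caspar', 'Dmitri', 'Eve']
listlen = len(players)

pairs = []
for i in range(listlen):
    for j in range(i + 1, listlen):  # j always starts just past i
        pairs.append((players[i], players[j]))

print(len(pairs))  # 10, which is 5 * 4 / 2
print(pairs == list(itertools.combinations(players, 2)))  # True
```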

We can illustrate which pairs are evaluated, and which aren't, by running the same loop logic over a 2D array. Only the entries above the diagonal are set to 1; these are the entries (row, col) with row < col. So for any i < j, entry (i,j) gets acted on but (j,i) doesn't.

In [None]:
n = 5
my_grid = np.zeros((n,n))  # Create n x n grid of zeros
for row in range(n):
    for col in range(row+1,n): # Start setting to 1 after the i=j entry on diagonal
        my_grid[row,col] = 1
print(my_grid)

But most of the time, nested loops aren't so fancy and iterate through all possible combinations of their lists.

In [None]:
n = 5
my_grid = np.zeros((n,n))  # Create n x n grid of zeros
for i in range(n):  # Each row 
    for j in range(n):  # Each column
        my_grid[i,j] = 1
print(my_grid)

# Enumerate

You don't have to choose between iterating over an index (as with range()) and getting the item itself (as with a foreach loop). You can have both if you use enumerate: on every iteration, it produces a pair (index, item), which the for loop can unpack into two variables.

In [None]:
for i, item in enumerate(["duck", "duck", "goose"]):
    print("item " + str(i) + " is " + item)
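To see what enumerate actually produces, convert it to a list: each element is an (index, item) tuple, which the for loop unpacks just like the movie tuples earlier. enumerate also takes an optional start argument if you want the count to begin somewhere other than 0.

```python
pairs = list(enumerate(["duck", "duck", "goose"]))
print(pairs)  # [(0, 'duck'), (1, 'duck'), (2, 'goose')]

# start=1 changes only the numbering, not which items are visited
for i, item in enumerate(["duck", "duck", "goose"], start=1):
    print("item " + str(i) + " is " + item)
# item 1 is duck
# item 2 is duck
# item 3 is goose
```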

This is useful when, for example, you care about the next item in the list. The item by itself can't tell you what comes after it, so you need the index, too.

In [None]:
my_list = ["duck", "duck", "goose"]
for i, item in enumerate(my_list):
    if i < len(my_list) - 1:  # Make sure we don't index off the end
        print("item " + str(i) + " is " + item + " (next is " + my_list[i+1] + ")")
    else:
        print("item " + str(i) + " is " + item + " (end)")

But enumerate is only a convenience: we could achieve the same effect by using an index from range() and indexing into the list whenever we want the item.

In [None]:
my_list = ["duck", "duck", "goose"]
for i in range(len(my_list)):
    if i < len(my_list) - 1:  # Make sure we don't index off the end
        print("item " + str(i) + " is " + my_list[i] + " (next is " + my_list[i+1] + ")")
    else:
        print("item " + str(i) + " is " + my_list[i] + " (end)")

# List comprehensions

A list comprehension is a way to generate a list from a list, a range, or another iterable, when you might otherwise have needed a few lines of iterative code. The syntax is [expression **for** variable **in** iterable], where "for *variable* in *iterable*" works like a foreach loop, and *expression* transforms each element into the value you want.

In [None]:
# A list comprehension with range input
my_multiples_of_3 = [v * 3 for v in range(5)]
my_multiples_of_3

In [None]:
# A list comprehension with a list input
a = [1, 5, 9]
more_mult_by_3 = [i * 3 for i in a]
more_mult_by_3

We could achieve this functionality with a for loop - it would just take more lines of code.

In [None]:
a = [1, 5, 9]
out = []
for item in a:
    out.append(item * 3)
print(out)

The list elements don't have to be integers, and the output elements don't need to be the same type as the input elements.

In [None]:
my_list = ['duck', 'duck', 'pig']
plurals = [item + 's' for item in my_list]
print('Plurals:')
print(plurals)

In [None]:
my_list = ['duck', 'duck', 'pig']
lengths = [len(s) for s in my_list]
print('Lengths: ')
print(lengths)
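A list comprehension can also end with an if clause that keeps only some elements, which does the same job as the filtering loop with append() from earlier. This sketch reuses the movie tuples and unpacks each one, just as a for loop would:

```python
my_movies = [("No", 4), ("Rogue One", 4.5), ("Casablanca", 5), ("The Great Escape", 3), ("Cats", 1)]

# Keep only the titles of movies rated 4 stars or more
good_titles = [movie for movie, stars in my_movies if stars >= 4]
print(good_titles)  # ['No', 'Rogue One', 'Casablanca']
```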

# Exercise

Try writing a list comprehension that results in the even numbers from 0 to 20 (inclusive).

In [None]:
# TODO