# CSE 101: Computer Science Principles
#### Stony Brook University
#### Kevin McDonnell (ktm@cs.stonybrook.edu)
## Module 10: Iteration and Lists



#### Overview

Much of what applies to iteration with strings also applies to iteration with lists.

One important difference is that lists are mutable (strings are not), so we can change a list while processing it.

#### Example: Counter-driven for-loops

Suppose that at a company everyone gets a 5% raise, but no new salary can exceed \$100,000. Anyone already making \$100,000 or more keeps their current salary.

In [0]:
salaries = [120000, 99000, 55000, 79000, 110000]
for i in range(len(salaries)):
    new_salary = min(salaries[i]*1.05, 100000)
    salaries[i] = max(new_salary, salaries[i])  # don't cut anyone's pay!
salaries

[120000, 100000, 57750.0, 82950.0, 110000]

#### Example: for-each Loops

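A for-each loop should not be used to change the contents of a list. The loop variable `salary` is just a name that refers to each element in turn; `salary *= 1.10` makes `salary` refer to a new number and leaves the list untouched. For lists of numbers or strings (which cannot be modified in place), a for-each loop therefore cannot change the list at all.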

In [0]:
salaries = [120000, 99000, 55000, 79000, 110000]
for salary in salaries:
    salary *= 1.10  # attempt to give everyone a 10% raise (doesn't work)
salaries 

[120000, 99000, 55000, 79000, 110000]
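A minimal sketch contrasting two cases, using hypothetical data: rebinding the loop variable never changes the list, but if the elements are themselves lists (which are mutable), the loop can modify each inner list in place.

```python
# Rebinding the loop variable: nums is unchanged.
nums = [1, 2, 3]
for n in nums:
    n += 10  # n now names a new int; the list still holds the old one
print(nums)  # [1, 2, 3]

# Mutating mutable elements: each row is a hypothetical [name, salary] list.
rows = [['Ana', 50000], ['Ben', 60000]]
for row in rows:
    row[1] += 1000  # changes the inner list object that rows refers to
print(rows)  # [['Ana', 51000], ['Ben', 61000]]
```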

#### Example: Using the `enumerate` Function

`enumerate` works with lists just as it does with strings. The difference is that because `enumerate` supplies each element's index, we can use that index to change the list's contents.


In [0]:
salaries = [120000, 99000, 55000, 79000, 110000]
for i, salary in enumerate(salaries):
    salaries[i] = salary + 500  # give everyone a $500 bonus
salaries 

[120500, 99500, 55500, 79500, 110500]
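Converting the result of `enumerate` to a list shows the (index, element) pairs it produces. The optional `start` argument changes the first index, which is handy for numbered output. The short `fruits` list below is hypothetical.

```python
fruits = ['apple', 'pear', 'fig']
pairs = list(enumerate(fruits))
print(pairs)  # [(0, 'apple'), (1, 'pear'), (2, 'fig')]

# start=1 numbers the items from 1 instead of 0
for number, fruit in enumerate(fruits, start=1):
    print(f'{number}. {fruit}')
```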

#### Example of What Not to Do

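Never insert elements into, or delete elements from, a list inside a for-loop that iterates over that list. A for-loop is meant to process every item in the list, and changing the list's length mid-loop causes items to be skipped or an `IndexError` to be raised.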

In [0]:
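# This code is commented out because it crashes with an IndexError:
# after nums[2] is deleted, the list has only 4 elements, but range(len(nums))
# was computed once at the start and still produces i == 4.
# (The value 4, which slides into index 2, is also never examined.)
#nums = [1, 2, 3, 4, 5]
#for i in range(len(nums)):
#    if nums[i] == 3:
#        del nums[i]
#nums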
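One safe alternative, sketched here, is to leave the original list alone during the loop and build a new list holding only the values to keep, then reassign the name.

```python
nums = [1, 2, 3, 4, 5]
kept = []
for num in nums:  # nums is never modified while it is being iterated over
    if num != 3:
        kept.append(num)
nums = kept
print(nums)  # [1, 2, 4, 5]
```

To delete a single known value, calling `nums.remove(3)` outside of any loop also works; it removes the first occurrence of 3.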

#### Example: YouTube Data

All sorts of interesting YouTube data is available from websites like [kaggle.com](https://www.kaggle.com/).

Below is a sample from a [Kaggle dataset](https://www.kaggle.com/datasnaek/youtube-new?select=USvideos.csv) of trending YouTube videos from November 2017:


In [0]:
fields = 'video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count'
row = 'jr9QtXwC9vc,17.14.11,"The Greatest Showman | Official Trailer 2 [HD] | 20th Century FOX","20th Century Fox",1,2017-11-13T14:00:23.000Z,"Trailer"|"Hugh Jackman"|"Michelle Williams"|"Zac Efron"|"Zendaya"|"Rebecca Ferguson"|"pasek and paul"|"la la land"|"moulin rouge"|"high school musical"|"hugh jackman musical"|"zac efron musical"|"musical"|"the greatest showman"|"greatest showman"|"Michael Gracey"|"P.T. Barnum"|"Barnum and Bailey"|"Barnum Circus"|"Barnum and Bailey Circus"|"20th century fox"|"greatest showman trailer"|"trailer"|"official trailer"|"the greatest showman trailer"|"logan"|"Benj Pasek"|"Justin Paul",826059,3543,119,340'
fields = fields.split(',')
row = row.split(',')
print(fields[6])  # "tags"
print(row[6])   # the tags field itself
tags = row[6].split('|')  # collect the tags into a separate list
for i in range(len(tags)):
    tags[i] = tags[i].strip('"')
print(tags)

tags
"Trailer"|"Hugh Jackman"|"Michelle Williams"|"Zac Efron"|"Zendaya"|"Rebecca Ferguson"|"pasek and paul"|"la la land"|"moulin rouge"|"high school musical"|"hugh jackman musical"|"zac efron musical"|"musical"|"the greatest showman"|"greatest showman"|"Michael Gracey"|"P.T. Barnum"|"Barnum and Bailey"|"Barnum Circus"|"Barnum and Bailey Circus"|"20th century fox"|"greatest showman trailer"|"trailer"|"official trailer"|"the greatest showman trailer"|"logan"|"Benj Pasek"|"Justin Paul"
['Trailer', 'Hugh Jackman', 'Michelle Williams', 'Zac Efron', 'Zendaya', 'Rebecca Ferguson', 'pasek and paul', 'la la land', 'moulin rouge', 'high school musical', 'hugh jackman musical', 'zac efron musical', 'musical', 'the greatest showman', 'greatest showman', 'Michael Gracey', 'P.T. Barnum', 'Barnum and Bailey', 'Barnum Circus', 'Barnum and Bailey Circus', '20th century fox', 'greatest showman trailer', 'trailer', 'official trailer', 'the greatest showman trailer', 'logan', 'Benj Pasek', 'Justin Paul']
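Splitting on commas works for this row only because none of its quoted fields contains a comma. A sketch of a more robust approach uses Python's standard `csv` module, which respects quotation marks. The short `line` below is a hypothetical row, not part of the Kaggle data.

```python
import csv

line = 'abc123,"Hello, World",42'  # hypothetical row with a comma inside quotes
print(line.split(','))  # ['abc123', '"Hello', ' World"', '42']  (4 pieces: wrong)

parsed = next(csv.reader([line]))  # csv.reader accepts any iterable of lines
print(parsed)  # ['abc123', 'Hello, World', '42']
```

Note that `csv` still returns every field as a string, so numeric fields such as `'42'` must be converted with `int` as usual.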


#### The `zip` Function

The `zip` function allows you to iterate over multiple lists at the same time.

In [0]:
nums1 = [10, 20, 30, 40, 50]
nums2 = [1, 2, 3, 4, 5]
nums3 = []
for n1, n2 in zip(nums1, nums2):
    nums3.append(n1 + n2)
nums3

[11, 22, 33, 44, 55]

If the lists have different lengths, the loop runs only as many times as the length of the shortest list; the extra elements of the longer list are ignored.

In [0]:
nums1 = [10, 20, 30]
nums2 = [1, 2, 3, 4, 5]
nums3 = []
for n1, n2 in zip(nums1, nums2):
    nums3.append(n1 + n2)
nums3

[11, 22, 33]
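Converting a `zip` object to a list shows the tuples it pairs up. If the leftover elements of the longer list should not be dropped, the standard library's `itertools.zip_longest` keeps going and substitutes a `fillvalue` for the missing items. This sketch assumes 0 is an appropriate filler for addition.

```python
from itertools import zip_longest

nums1 = [10, 20, 30]
nums2 = [1, 2, 3, 4, 5]
print(list(zip(nums1, nums2)))  # [(10, 1), (20, 2), (30, 3)]

nums3 = []
for n1, n2 in zip_longest(nums1, nums2, fillvalue=0):  # missing values become 0
    nums3.append(n1 + n2)
print(nums3)  # [11, 22, 33, 4, 5]
```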

#### Example: Computing the Total Profit/Loss on a Group of Sales
A retailer has sold several lots of goods and wants to know the total profit/loss on the transactions. The wholesale costs and sale prices are stored in separate lists.

In [0]:
costs = [100, 700, 200, 300, 900]
sales = [150, 800, 210, 320, 1100]
total_profit = 0
for c, s in zip(costs, sales):
    total_profit += s - c
total_profit

380

An alternative solution builds a list of the individual profits/losses and then adds them up with `sum`.

In [0]:
costs = [100, 700, 200, 300, 900]
sales = [150, 800, 210, 320, 1100]
diffs = []
for c, s in zip(costs, sales):
    diffs += [s - c]  # concatenating a one-element list has the same effect as append
sum(diffs)

380

#### List Comprehensions

A **list comprehension** is a compact means in Python of creating a list based on an existing list. A list comprehension combines list notation with a for-loop in a single statement.

In [0]:
names = ['Sasha', 'Manny', 'Leslie', 'Krista']
uppercase_names = [name.upper() for name in names]
uppercase_names

['SASHA', 'MANNY', 'LESLIE', 'KRISTA']
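A list comprehension is shorthand for a loop that starts with an empty list and appends one value per element. This sketch builds the same `uppercase_names` list the long way, to show the correspondence.

```python
names = ['Sasha', 'Manny', 'Leslie', 'Krista']
uppercase_names = []
for name in names:                        # the "for name in names" part
    uppercase_names.append(name.upper())  # the expression before "for"
print(uppercase_names)  # ['SASHA', 'MANNY', 'LESLIE', 'KRISTA']
```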

In [0]:
names = ['Sasha', 'Manny', 'Leslie', 'Krista']
initials = [name[0] for name in names]
initials

['S', 'M', 'L', 'K']

We can call a function on each element in a list to generate the new list.

In [0]:
# computes f(x) = x^2 - 2x + 5
def evaluate(x):
    return x**2 - 2*x + 5

xs = [4, 1, 0, -3]
values = [evaluate(x) for x in xs]
values

[13, 4, 5, 20]

We can use `zip` to generate the new list from two or more lists.

In [0]:
xs = [4, 6, 0, -3]
ys = [1, 4, 8, 4]
zs = [7, 5, 1, 2]
averages = [(x+y+z)/3 for x, y, z in zip(xs, ys, zs)]
averages

[4.0, 5.0, 3.0, 1.0]

A list comprehension can also end with an `if` clause that includes or excludes values from the new list. (This is called **filtering**.) An else-clause is not allowed in this position.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
passing_scores = [score for score in scores if score >= 65]
passing_scores

[80, 99, 92, 78]

The value inserted into the new list can be computed using the current values.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
bonus = 5
passing_scores = [score+bonus for score in scores if score >= 65]
passing_scores

[85, 104, 97, 83]

With the index-based form of the for-loop, we can find the indices of those values that satisfy a condition.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
passing_scores_indices = [i for i in range(len(scores)) if scores[i] >= 65]
passing_scores_indices

[0, 1, 3, 4]

The `enumerate` function can also be used inside a list comprehension.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
passing_scores_indices = [i for i, score in enumerate(scores) if score >= 65]
passing_scores_indices

[0, 1, 3, 4]

To insert one value or another depending on a condition, place a *conditional expression* of the form `value1 if condition else value2` before the `for`. The else-clause is required in this form.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
results = ['Pass' if score >= 65 else 'Fail' for score in scores]
results

['Pass', 'Pass', 'Fail', 'Pass', 'Pass', 'Fail']

Counting the number of list elements that satisfy a given condition is easy: filter the list with a comprehension and take the length of the result.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
passing_scores = [score for score in scores if score >= 65]
print(f'Number passing: {len(passing_scores)}')
print(f'Passing rate: {100*len(passing_scores)/len(scores):0.2f}%')

Number passing: 4
Passing rate: 66.67%


If the index-based version of the for-loop is used, we can pull values from multiple lists at once.

In [0]:
scores = [80, 99, 52, 92, 78, 55]
names = ['Bobby', 'Manny', 'Callie', 'Jenny', 'Fanny', 'Billy']
passed = [names[i] for i in range(len(scores)) if scores[i] >= 65]
passed

['Bobby', 'Manny', 'Jenny', 'Fanny']
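Because `zip` also works inside a comprehension, the same names can be collected without indexing. This sketch pairs each name with its score directly.

```python
scores = [80, 99, 52, 92, 78, 55]
names = ['Bobby', 'Manny', 'Callie', 'Jenny', 'Fanny', 'Billy']
passed = [name for name, score in zip(names, scores) if score >= 65]
print(passed)  # ['Bobby', 'Manny', 'Jenny', 'Fanny']
```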

#### Revisiting the `join` Method

If we have a list of numbers that we want to join together, we need to convert each number into a string first, because `join` works only with strings. A list comprehension makes this easy. Note that the `str` function converts non-string data into a string (if possible).

In [0]:
lengths = [5, 7, 3, 4, 9]
lengths = [str(length) for length in lengths]
displayed = ' '.join(lengths)
displayed

'5 7 3 4 9'

Before long, you will be able to write code like the above on a single line.

In [0]:
lengths = [5, 7, 3, 4, 9]
displayed = ' '.join([str(length) for length in lengths])
displayed

'5 7 3 4 9'
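Without the conversion, `join` raises a `TypeError`, since it accepts only strings. The sketch below shows this, along with the built-in `map` function as another way to apply `str` to every element.

```python
lengths = [5, 7, 3, 4, 9]
try:
    ' '.join(lengths)
except TypeError as err:
    print('join failed:', err)  # join only accepts strings

displayed = ' '.join(map(str, lengths))  # map applies str to each number
print(displayed)  # 5 7 3 4 9
```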

#### Try to Mix It up!

This is just the beginning of how to process a list with iteration. As with many other aspects of Python, your only limitation in mixing all the language features we have studied is your imagination!

#### Example: Processing a Row from a CSV File

Suppose we have student scores stored in a CSV file. The top row of the file contains the names of the columns, and each subsequent row contains one student's data. We want to compute each student's weighted total according to these weights:

* 15% of the average quiz score, plus
* 30% for each exam (60% total), plus
* 25% for the project

One minor annoyance is that the numerical data are read in as strings. Fortunately, this is easy to fix.

In [0]:
headings = 'name,quiz1,quiz2,quiz3,quiz4,exam1,exam2,project'
data = 'Richard,89,80,81,78,90,86,96'
values = [int(score) for score in data.split(',')[1:]]  # [1:] skips the name; int converts a string of digits to an integer
weighted_total = 0.15*sum(values[:4])/4 + 0.3*sum(values[4:6]) + 0.25*values[6]
weighted_total

89.1
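The slices `values[:4]`, `values[4:6]`, and `values[6]` depend on the column order. A sketch that avoids these positional indices uses `zip` to pair each heading with its value and filters on the column names. It assumes the quiz and exam columns keep the `quiz` and `exam` prefixes; the weights are the ones listed above.

```python
headings = 'name,quiz1,quiz2,quiz3,quiz4,exam1,exam2,project'.split(',')
data = 'Richard,89,80,81,78,90,86,96'.split(',')

# Select columns by name instead of by position.
quizzes = [int(v) for h, v in zip(headings, data) if h.startswith('quiz')]
exams = [int(v) for h, v in zip(headings, data) if h.startswith('exam')]
project = int(data[headings.index('project')])

weighted_total = 0.15 * sum(quizzes) / len(quizzes) + 0.30 * sum(exams) + 0.25 * project
print(round(weighted_total, 2))  # 89.1
```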

#### Application from Mathematics: The Sieve of Eratosthenes

The [Sieve of Eratosthenes](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes) is an ancient algorithm for finding prime numbers:

1. Make a list of numbers, starting with 2.
1. Repeat until there are no more numbers we can cross off:
    
    1. The first unmarked number in the list is prime.
    1. Cross off the multiples of the most recently discovered prime.

First, we cross off all multiples of 2, other than 2. This leaves 3 as the next prime number.

Then we cross off all multiples of 3, other than 3. (Note that the even multiples of 3 have already been crossed off.) This leaves 5 as the next prime number.

Next we cross off all multiples of 5, other than 5. (Note that multiples of 5 that are also multiples of 2 and/or 3 have already been crossed off.)

This process continues until there are no more numbers to cross off.

---
This example and its code were adapted from *Explorations in Computing: An Introduction to Computer Science and Python Programming* by John S. Conery. Chapman and Hall/CRC, 2014. ISBN 978-1466572447. 

In [0]:
# Initialize the list of values, which will be transformed into a list of primes.
worksheet = list(range(2, 100))  # quickly creates the list 2, 3, ..., 99
worksheet = [None, None] + worksheet  # add placeholders for the numbers 0 and 1
# The placeholders will make the math a little easier below.

In [0]:
# Crosses off multiples of k by replacing them with None.
def sift(k, nums):
    for i in range(2*k, len(nums), k):  # nums[i] == i, thanks to the placeholders
        nums[i] = None

sample = [None, None] + list(range(2, 20))
print(f'Original sample list: {sample}')
sift(2, sample)
print(f'After crossing off multiples of 2: {sample}')
sift(3, sample)
print(f'After crossing off multiples of 3: {sample}')

Original sample list: [None, None, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
After crossing off multiples of 2: [None, None, 2, 3, None, 5, None, 7, None, 9, None, 11, None, 13, None, 15, None, 17, None, 19]
After crossing off multiples of 3: [None, None, 2, 3, None, 5, None, 7, None, None, None, 11, None, 13, None, None, None, 17, None, 19]


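We keep calling `sift` for each `k` that has not been crossed off, stopping just before `k` reaches $\lceil \sqrt{n} \rceil$, where the worksheet holds the integers less than $n$. This is enough: every composite number $m < n$ has a prime factor $p \le \sqrt{m} < \sqrt{n}$, so $m$ is crossed off when `k` equals that $p$.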

In [0]:
import math

n = 100
worksheet = list(range(2, n))  # quickly creates the list 2, 3, ..., 99
worksheet = [None, None] + worksheet  # add placeholders for the numbers 0 and 1

for k in range(2, int(math.ceil(math.sqrt(n)))):
    if worksheet[k] is not None:
        sift(k, worksheet)

print(worksheet)
worksheet = [num for num in worksheet if num is not None]  # keep only the numbers never crossed off
print(worksheet)

[None, None, 2, 3, None, 5, None, 7, None, None, None, 11, None, 13, None, None, None, 17, None, 19, None, None, None, 23, None, None, None, None, None, 29, None, 31, None, None, None, None, None, 37, None, None, None, 41, None, 43, None, None, None, 47, None, None, None, None, None, 53, None, None, None, None, None, 59, None, 61, None, None, None, None, None, 67, None, None, None, 71, None, 73, None, None, None, None, None, 79, None, None, None, 83, None, None, None, None, None, 89, None, None, None, None, None, None, None, 97, None, None]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
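A common variant of the sieve, sketched here as an alternative to the `None`-placeholder version above, stores a list of booleans and begins crossing off at `k*k`, since the smaller multiples of `k` were already crossed off by smaller primes. It uses `math.isqrt` (Python 3.8 and later), and the function name `primes_below` is hypothetical.

```python
import math

def primes_below(n):
    """Return the primes less than n using a boolean-list Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * n  # is_prime[i] stays True while i is uncrossed
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
    for k in range(2, math.isqrt(n - 1) + 1):  # all k with k*k <= n - 1
        if is_prime[k]:
            for multiple in range(k * k, n, k):  # cross off k*k, k*k + k, ...
                is_prime[multiple] = False
    return [i for i in range(n) if is_prime[i]]

print(primes_below(100))
```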


#### Example: Processing a List of Lists

Tabular (2D) data in Python can be represented as a *list of lists*.

Imagine that we have a group of 4 students, and for each student we have 3 exam scores:

In [0]:
scores = [[89, 91, 90], [78, 79, 80], [99, 90, 92], [82, 84, 79]]

`scores[k]` gives the list of scores for student `k` (counting from 0):

In [0]:
scores[1]

[78, 79, 80]

We need two indices to access a particular student's score. For example, `scores[3][1]` is the score of student #3 on exam #1, counting both from 0 (that is, the fourth student's second exam).

In [0]:
scores[3][1]

84

Sometimes we use **nested loops** to iterate over a list of lists. The outermost loop is called the **outer loop**, and the one inside is called the **inner loop**.

In [0]:
averages = [0] * len(scores)
for i in range(len(scores)):
    for exam_score in scores[i]:
        averages[i] += exam_score
    averages[i] /= len(scores[i])

averages

[90.0, 79.0, 93.66666666666667, 81.66666666666667]

For this example, we can eliminate the inner loop if we use the `sum` function.

In [0]:
averages = []
for i in range(len(scores)):
    averages.append(sum(scores[i]) / len(scores[i]))

averages

[90.0, 79.0, 93.66666666666667, 81.66666666666667]

We can reduce the entire solution to one line of code with a list comprehension.

In [0]:
averages = [sum(scores[i]) / len(scores[i]) for i in range(len(scores))]
averages

[90.0, 79.0, 93.66666666666667, 81.66666666666667]
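Two more sketches using the same `scores` list. Iterating over the rows directly removes the need for indices, and `zip(*scores)`, which here is the same as `zip(scores[0], scores[1], scores[2], scores[3])`, regroups the rows into columns so we can average each exam across students.

```python
scores = [[89, 91, 90], [78, 79, 80], [99, 90, 92], [82, 84, 79]]

# Average for each student: iterate over the rows themselves.
student_averages = [sum(row) / len(row) for row in scores]

# Average for each exam: zip(*scores) yields one tuple per column.
exam_averages = [sum(column) / len(column) for column in zip(*scores)]
print(exam_averages)  # [87.0, 86.0, 85.25]
```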