# Loops in Python
### by [Jason DeBacker](https://jasondebacker.com), September 2019

This Jupyter Notebook outlines writing loops in Python, with special attention to Pythonic ways of writing loops.

## Types of loops

Python has two types of loops built into the language, `for` and `while` loops.  These should be familiar to you if you've done any other programming.  Use a `for` loop when you know the number of iterations in advance (or are iterating over a known collection), and a `while` loop when you do not know in advance how many times you'll need to iterate.

There are a few peculiarities of writing loops in Python that you need to be aware of.  First, the syntax:

In [None]:
# Example of a for loop
for i in range(10):  # note the colon to start the loop
    print(i)  # note that code to execute in the loop is indented

In [None]:
# Nested loops follow the same structure
for i in range(2):
    for j in range(3):
        print(i, "+", j, "=", i+j)

In [None]:
# while loops include a condition that determines when the loop stops
i = 0  # initialize the counter before the loop
while i < 10:
    print(i)
    i += 1  # note this shorthand for adding 1 to i
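
As a sketch of the case where the number of iterations is not known in advance, the loop below keeps halving a number until it falls below a tolerance.  The names `value`, `tol`, and `n_iter` are illustrative choices, not part of the original notebook.

```python
# Halve a value until it drops below a tolerance; how many passes this
# takes depends on the starting value and the tolerance.
value = 1.0
tol = 1e-3
n_iter = 0
while value >= tol:
    value /= 2
    n_iter += 1
print("Stopped after", n_iter, "iterations with value", value)
```

A `for` loop would require guessing an upper bound on the number of passes; the `while` loop simply stops when the condition fails.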

## Loop control statements

There are a few control statements that can be very useful in certain loops.  These include:

* break statements
* continue statements

### Break statements

Break statements help the code break out of the innermost `for` or `while` loop in which the break statement is enclosed.  

Consider the following nested loop:

In [None]:
for i in range(2):
    for j in range(3):
        print(i, "+", j, "=", i+j)
        break

Here, the `break` broke us out of the innermost loop, which contains the break command, so that only the first iteration is completed each time the `for j` loop is called.

Perhaps a more interesting example is when there are conditionals inside the nested loop, such as in these nested loops that find prime numbers:

In [None]:
for n in range(2, 10):
    for x in range(2, n):
        if n % x == 0:
            print(n, 'equals', x, '*', n//x)
            break
    else:  # Note that this else statement goes with the for x loop
        # loop fell through without finding a factor
        print(n, 'is a prime number')
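
The `else` clause attached to a loop runs only when the loop finishes without hitting `break`.  A minimal sketch of the same pattern on a simpler search follows, where `find_first_negative` is a hypothetical helper written only for illustration:

```python
def find_first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for v in values:
        if v < 0:
            result = v
            break  # leaving the loop this way skips the else clause
    else:
        # runs only if the loop finished without hitting break
        result = None
    return result


print(find_first_negative([3, 1, -4, 1, -5]))
print(find_first_negative([3, 1, 4]))
```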

### Continue statements

The `continue` statement skips the rest of the current iteration and continues with the next iteration of the loop.  This is useful if you have a conditional statement nested within a loop.  E.g., in this code searching for even numbers:

In [None]:
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found a number", num)

## Nested loops with nested data

Suppose you have nested data, such as a list whose items each contain a list.  There is an efficient way to iterate over the items at both levels.  Consider the following list of (name, list of subjects) pairs:

In [9]:
students = [("Alejandro", ["CompSci", "Physics"]),
            ("Justin", ["Math", "CompSci", "Stats"]),
            ("Ed", ["CompSci", "Accounting", "Economics"]),
            ("Margot", ["InfSys", "Accounting", "Economics", "CommLaw"]),
            ("Peter", ["Sociology", "Economics", "Law", "Stats", "Music"])]

Now let's iterate over both levels in one `for` loop.  We can do this as `for (item1, item2) in list_of_pairs:`, where `item1` will refer to the first element of each pair in the outermost list and `item2` will refer to the second element, which here is the list one layer deep.

In [10]:
# print all students with a count of their courses.
for (name, subjects) in students:
    print(name, "takes", len(subjects), "courses")

Alejandro takes 2 courses
Justin takes 3 courses
Ed takes 3 courses
Margot takes 4 courses
Peter takes 5 courses


In [11]:
for (name, subjects) in students:
    for course in subjects:
        print(name, "takes", course)

Alejandro takes CompSci
Alejandro takes Physics
Justin takes Math
Justin takes CompSci
Justin takes Stats
Ed takes CompSci
Ed takes Accounting
Ed takes Economics
Margot takes InfSys
Margot takes Accounting
Margot takes Economics
Margot takes CommLaw
Peter takes Sociology
Peter takes Economics
Peter takes Law
Peter takes Stats
Peter takes Music
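
The `for (name, subjects) in students:` form relies on tuple unpacking.  The sketch below uses a shortened copy of the data (the name `students_small` is an illustrative choice) to show that unpacking gives the same result as indexing into each pair by position:

```python
students_small = [("Alejandro", ["CompSci", "Physics"]),
                  ("Justin", ["Math", "CompSci", "Stats"])]

# Indexing into each pair by position
by_index = []
for record in students_small:
    by_index.append((record[0], len(record[1])))

# Unpacking each pair directly in the for statement
by_unpacking = []
for name, subjects in students_small:
    by_unpacking.append((name, len(subjects)))

print(by_index == by_unpacking)
```

The parentheses around `name, subjects` are optional; both spellings unpack the pair.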


## List comprehensions

A list comprehension is a way to create lists using compact syntax.  Some examples:

In [3]:
all_files = ['bin', 'Data', 'Desktop', '.bashrc', '.ssh', '.vimrc']
not_hidden = [name for name in all_files if name[0] != '.']
print(not_hidden)
numbers = [1, 2, 3, 4]
squares = [x**2 for x in numbers]
print(squares)

['bin', 'Data', 'Desktop']
[1, 4, 9, 16]
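
To make the syntax concrete, here is a sketch of the first comprehension written out as an explicit loop.  The names `not_hidden_loop` and `not_hidden_comp` are introduced only for this comparison.

```python
all_files = ['bin', 'Data', 'Desktop', '.bashrc', '.ssh', '.vimrc']

# Explicit loop: start with an empty list and append matching items
not_hidden_loop = []
for name in all_files:
    if name[0] != '.':
        not_hidden_loop.append(name)

# Equivalent list comprehension: expression, then loop, then optional filter
not_hidden_comp = [name for name in all_files if name[0] != '.']

print(not_hidden_loop == not_hidden_comp)
```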


Using a list comprehension can also be faster.  Consider the two equivalent ways of computing the square of each element in the list of numbers below:

In [8]:
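import time
N = 5000
# time the list comprehension
start_time = time.time()
for i in range(N):
    squares = [x**2 for x in numbers]
end_time = time.time()
print('List comprehension time = ', end_time - start_time)
# time the equivalent explicit loop, restarting the clock first
start_time = time.time()
for i in range(N):
    squares = []  # rebuild the list each pass, as the comprehension does
    for j in range(len(numbers)):
        squares.append(numbers[j] ** 2)
end_time = time.time()
print('Naive loop time = ', end_time - start_time)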

List comprehension time =  0.009401798248291016
Naive loop time =  0.03306078910827637


In this run, the list comprehension is a few times faster than the explicit loop; the exact timings will vary from machine to machine and run to run.
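
Timings taken with `time.time()` are sensitive to whatever else the machine is doing.  One hedged alternative is the standard library's `timeit` module, sketched below.  The function names `squares_comprehension` and `squares_loop` are hypothetical, and no timing output is shown because it depends on the machine.

```python
import timeit

numbers = [1, 2, 3, 4]


def squares_comprehension():
    return [x**2 for x in numbers]


def squares_loop():
    squares = []
    for x in numbers:
        squares.append(x**2)
    return squares


# Both functions build the same list
print(squares_comprehension() == squares_loop())

# timeit.timeit calls each function `number` times and returns total seconds
t_comp = timeit.timeit(squares_comprehension, number=5000)
t_loop = timeit.timeit(squares_loop, number=5000)
print('List comprehension time =', t_comp)
print('Explicit loop time =', t_loop)
```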

What's happening?  Remember that Python is an interpreted language: the interpreter executes your code one step at a time, and each step carries some overhead.  The explicit loop pays for indexing into `numbers`, looking up `squares.append`, and making a method call on every iteration.  A list comprehension does the same work with fewer interpreted steps, because building the list is handled by a routine inside the interpreter that is written for the specific task of applying an operation to each element of an iterable.

This is an important general rule to keep in mind when writing code in Python - one generally wants to use more "Pythonic" syntax where possible as it will call more efficient routines. 

That said, there is also the ability to use "just in time compilers" in Python, such as that available in the [Numba](http://numba.pydata.org) package.  We may talk about these later in the semester.

## Looping over items in special data structures

Two common data structures that you'll use in economics applications are NumPy arrays and pandas DataFrames.  Each comes with tools for iterating that are simpler than writing index-based Python loops by hand.

### Numpy arrays

NumPy provides an iterator object, `np.nditer`, that allows for iteration over the elements of a NumPy array.  E.g.,

In [13]:
import numpy as np
a = np.arange(6).reshape(2,3)
print(a)
for x in np.nditer(a):
    print(x)

[[0 1 2]
 [3 4 5]]
0
1
2
3
4
5


You can of course slice your array so that you iterate only over certain rows or columns.
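
A brief sketch of iterating over a slice, reusing the same 2-by-3 array as above.  The names `second_column` and `first_row` are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Iterate over the second column only (all rows, column index 1)
second_column = [int(x) for x in np.nditer(a[:, 1])]
print(second_column)

# Iterate over the first row only
first_row = [int(x) for x in np.nditer(a[0, :])]
print(first_row)
```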

### Pandas dataframes

pandas DataFrames have two methods to iterate over rows, `iterrows()` and `itertuples()`.  `iterrows()` iterates over the rows, returning (index, Series) pairs at each iteration (i.e., one pair for each row iterated over).  `itertuples()` iterates over the rows, returning a named tuple of the values in each row. Generally, `itertuples()` is faster.

Let's compare the differences:

In [None]:
import pandas as pd

# create a dictionary with some data
data = {'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
        'year': [2014, 2015, 2016, 2015, 2016],
        'wins': [6, 5, 5, 10, 8]}

# create a DataFrame from the dictionary 
frame = pd.DataFrame(data)

# iterate over the rows with iterrows()
print(frame.iterrows())

Note what is returned: not the items at each iteration, but rather a Python object called a "generator", which produces the items one at a time as you loop over it.
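
To see the behavior of a generator in isolation, here is a minimal pure-Python sketch.  The function `count_up_to` is a hypothetical example and is unrelated to pandas.

```python
def count_up_to(n):
    """Yield 0, 1, ..., n-1 one at a time."""
    for k in range(n):
        yield k


gen = count_up_to(3)
print(gen)            # prints a generator object, not the values
first = next(gen)     # ask for the first value
rest = list(gen)      # consume whatever remains
print(first, rest)
leftover = list(gen)  # the generator is now exhausted
print(leftover)
```

Because a generator can be consumed only once, looping over the same generator object a second time produces nothing.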

In [1]:
# assign a name to the generator object
frame_rows = frame.iterrows()

# print each item in the generator object
for item in frame_rows:
    print(item)

In [None]:
# or do this all in one statement
for item in frame.iterrows():
    print(item)

In [None]:
# Now do itertuples to see how output differs
for item in frame.itertuples():
    print(item)

In [None]:
# to see how named tuples work, let's print just the school
# for each row with the itertuples() method
for item in frame.itertuples():
    print(item.school)

In [None]:
# same thing, but referencing the school field by position
# in the named tuple (position 0 holds the index), rather than by name
for item in frame.itertuples():
    print(item[1])

Named tuples are classes whose attribute names are the names of the elements in the tuple.  Thus you can reference an element in the named tuple by either its attribute name or its position. They can be very useful data structures and you can read more [here](http://intermediatepythonista.com/python-generators).
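
Named tuples can also be built directly with `collections.namedtuple` from the standard library.  In this sketch the class name `SeasonRecord` is a hypothetical choice, and the values come from the first row of the data above.

```python
from collections import namedtuple

# Define a named tuple class with three fields
SeasonRecord = namedtuple('SeasonRecord', ['school', 'year', 'wins'])
record = SeasonRecord(school='Texas', year=2014, wins=6)

print(record.school)  # access by attribute name
print(record[0])      # access by position
print(record.school == record[0])
```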


### Words of warning

While we looked at ways to iterate over NumPy arrays and pandas DataFrames, in general you will want to avoid such loops.  Always look first for ways to perform vectorized operations on these objects.
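
As a rough sketch of what "vectorized" means here, the example below adds one to every win total, first with a row loop and then on the whole column at once.  The computation itself is a hypothetical illustration, as are the names `wins_plus_one_loop` and `wins_plus_one_vec`.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'school': ['Texas', 'Texas', 'Texas', 'UGA', 'UGA'],
                      'year': [2014, 2015, 2016, 2015, 2016],
                      'wins': [6, 5, 5, 10, 8]})

# Loop version: add one to each win total, row by row
wins_plus_one_loop = [row.wins + 1 for row in frame.itertuples()]

# Vectorized version: operate on the whole column at once
wins_plus_one_vec = frame['wins'] + 1

print(wins_plus_one_vec.tolist() == wins_plus_one_loop)

# The same idea with a NumPy array: no explicit loop needed
a = np.arange(6).reshape(2, 3)
print(a ** 2)
```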

