**PySDS Week 1 Lecture 2.**

# Learning Python: Iterations and control statements

# Section 1. For Loops

Last time we ended on collections. Often we want to do something with every element in a collection. To do this we __iterate__ over the collection. To iterate means to start at the beginning of a collection, do something with the first value, move on to the next, and continue until we run out of values. Dictionaries are iterable too. Since Python 3.7 they keep their items in insertion order, so that is the order you iterate in.

We can imagine iterating over a list and then transforming the values in that list. Imagine you have a list of words and you want to find the average word length ($\bar{x}$, the __arithmetic mean__). You would first sum ($\sum$) the lengths of all the words and then divide this sum by the number of words $n$. Writing $w_i$ for the length of the $i$-th word, the formula is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} w_i $$

Here the big Greek letter sigma ($\sum$) means "add up everything that follows, over the specified range". The $i=1$ at the bottom says we start at the first word, and the $n$ at the top says we stop at the $n$-th (last) word. In between, $i$ goes up by one each time, so every word is included exactly once.
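As a worked check, take the four food words apple, banana, chocolate and dumpling. Their lengths are 5, 6, 9 and 8, so $n = 4$:

$$\bar{x} = \frac{1}{4}\sum_{i=1}^{4} w_i = \frac{1}{4}(5 + 6 + 9 + 8) = \frac{28}{4} = 7$$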

Now we can translate this into computer code:


~~~ python
words = ["apple","banana","chocolate","dumpling"]

word_length_sum = 0

for i in words:
    word_length_sum += len(i) # notice we use += here. len() gives the length of the string.
    
n = len(words) 

average_length = word_length_sum / n
~~~ 

Above, the ```for i in words:``` line is what defines the loop. It reads much like English. ```i``` is our __loop variable__: on each pass through the loop it holds the current element. We could have named it anything, but traditionally, when the name doesn't matter, we use ```i``` for the outer loop, ```j``` for an inner loop and ```k``` for a third nested loop. If you need more than three nested loops, you really should rethink your program design.

Notice that ```i``` and ```words``` are not given special colors, because they are variables we named ourselves. These colors help us tell Python's own words apart from user-defined names. This is called __syntax highlighting__.

We also use a shortcut above. Recall that in the past we have seen 

~~~ python 
result = x + y
~~~

But if we wanted to add something to an existing variable, like when we add some numbers to a ```total```, we can do this: 

~~~ python
total += y 
~~~ 

This is sometimes called __syntactic sugar__, as it is just a shorter way to write the same code. Some languages have an additional idiom, ```x++```, meaning ```x += 1```. This is where C++ gets its name. Python doesn't have this feature, so for us it's just ```x += 1```. However, later we will see some very clever syntactic sugar in Python... all in due course.
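Here is a small sketch showing that ```+=``` is just shorthand for "take the current value, add to it, and store the result back". The same pattern also works for ```-=``` and ```*=```. The variable names are made up for illustration.

~~~ python
total = 10
y = 5
total += y          # same as: total = total + y
print(total)        # 15

count = 0
count += 1          # Python's way of writing what other languages spell x++
print(count)        # 1

price = 8
price -= 3          # same as: price = price - 3  -> 5
price *= 2          # same as: price = price * 2  -> 10
print(price)        # 10
~~~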

Below we will practice the concept of a loop and see some of its features. 

In [None]:
# As a matter of style I like to define variables before doing my loops. 

food_list = ["apple","banana","chocolate","dumpling"]
word_length_sum = 0
n = len(food_list) 

for i in food_list:
    word_length_sum += len(i) 

average_length = word_length_sum / n
print(average_length)

## Iterating through a dictionary

It's pretty obvious how you iterate through a list. One element after the other. But dictionaries have keys and values. What are you iterating over then? It depends on what you ask of python. Try the default: 

In [None]:
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'Fruit', 'potato': 'Vegetable'}

for something in food_dict:
    print(something)


By default, iterating over a dictionary gives us its keys. We can get the values with ```food_dict.values()```, or each whole item (key and value together) with ```food_dict.items()```. Observe:

In [None]:
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'Fruit', 'potato': 'Vegetable'}

for value in food_dict.values():
    print(value)

print()

for item in food_dict.items():
    print(item)
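To confirm that looping over a dictionary directly is the same as looping over its keys, here is a small check. It uses ```.keys()```, a dictionary method we haven't shown above. It also assumes Python 3.7 or newer, where dictionaries keep their insertion order.

~~~ python
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'fruit', 'potato': 'vegetable'}

keys_from_loop = []
for key in food_dict:              # looping over the dict itself gives the keys
    keys_from_loop.append(key)

print(keys_from_loop)
print(list(food_dict.keys()))      # the same keys, asked for explicitly

# Since Python 3.7, both lists come out in the order the items were written above.
~~~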

### Slight diversion: the tuple

Notice how it prints the items as 
~~~ python 
('salmon','fish') 
~~~
What is that? Well, it's actually a new kind of collection: a __tuple__. A tuple (I pronounce it like "couple") is basically a list, except that it's immutable and is written with ```()``` instead of ```[]```. So with a list we could write ```my_list[2] = "grasshopper"``` and it would replace the ~~second~~ third element in the list with grasshopper (assuming there's already a third element). With a tuple, you cannot. You can read the third item in a tuple with ```my_tuple[2]```, but you can't assign a new value to it. See below (the assignment to the tuple raises a ```TypeError```).

In [None]:
my_list = ["ant","ladybug","beetle"]
print(my_list[2])
my_list[2] = "grasshopper"
print(my_list[2])

my_tuple = ("ant","ladybug","beetle")
print(my_tuple[2])
my_tuple[2] = "grasshopper"  # raises TypeError: tuples are immutable
print(my_tuple[2])
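If you want to see that error without the whole cell stopping, you can catch it with ```try```/```except```. We haven't covered this yet, so treat it as a preview. The sketch also shows the usual workaround: make an edited list copy, then build a new tuple from it.

~~~ python
my_tuple = ("ant", "ladybug", "beetle")

try:
    my_tuple[2] = "grasshopper"      # tuples are immutable, so this fails
except TypeError as err:
    print("TypeError:", err)         # the message says tuples don't support item assignment

# To get a "changed" tuple, build a new one: list -> edit -> tuple
as_list = list(my_tuple)
as_list[2] = "grasshopper"
new_tuple = tuple(as_list)
print(new_tuple)                     # ('ant', 'ladybug', 'grasshopper')
print(my_tuple)                      # the original is untouched
~~~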

One of the nice things about ```food_dict.items()``` giving us tuples is that we can unpack them right in the for loop. Instead of ```for i in food_dict.items():```, where ```i``` would be a ```(key, value)``` tuple, we can literally write ```for thekey, thevalue in food_dict.items():``` and then work with the key and the value directly.
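The same unpacking works on a single tuple, outside any loop. Here is a minimal sketch using one ```(key, value)``` pair of the kind ```.items()``` produces:

~~~ python
pair = ('salmon', 'fish')        # one (key, value) tuple
food, foodtype = pair            # first element -> food, second -> foodtype
print(food, "is a", foodtype)    # salmon is a fish
~~~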

In [None]:
food_dict = {'salmon': 'fish', 'enoki': 'mushroom', 'apple': 'fruit', 'potato': 'vegetable'}

for key,value in food_dict.items():
    print(key, "is a", value)
    
print()
# Reminder: We don't need to use the words 'key' and 'value' 
for food,foodtype in food_dict.items():
    print(food, "is a", foodtype)

# Section 2. If statements and boolean logic.

Boolean logic is very useful and really important to computation. Combined with memory and a way to repeat steps, the basic operations ```not```, ```and``` and ```or``` are in principle enough to build any computation, given enough time. We use boolean logic to evaluate the truth of a statement. If a statement is true, we ask the computer to do something. We can also ask it to do something else if the statement is false.

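In Python these are the main comparison and boolean operators:

- ```==``` is used for comparison. Does x equal y? ```x == y```
- ```!=``` asks whether two things are not equal. ```x != y```
- ```>``` is used for left side greater than right side, and ```<``` for left side less than right side. ```>=``` and ```<=``` also allow the two sides to be equal.
- ```and``` is used to ask if two things are both true. ```x and y```
- ```or``` is used to ask if at least one of two things is true. ```x or y```
- ```not``` flips true to false and false to true. ```not x``` (Python has no ```!``` operator for this; ```!``` only appears inside ```!=```.)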
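A quick sketch of the operators in that list that the next cell doesn't exercise. The values of ```x``` and ```y``` are arbitrary.

~~~ python
x = 4
y = 5

not_equal = x != y            # True: 4 is not 5
at_most = x <= y              # True
at_least = x >= y             # False
both = x < y and y < 10       # True: both sides are True
either = x > y or y == 5      # True: the right side is True
flipped = not x < y           # False: x < y is True, and not flips it

print(not_equal, at_most, at_least, both, either, flipped)
~~~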

In [None]:
x = 4
y = 5
z = 5 

print( x == y )
print( y == z )
print( x != y )
print( not (x == y) ) 

Python does comparisons all over the place. Any time you use one of the operators it will evaluate them. But sometimes you want to use these operators to __control the flow__ of a program. For example, if you get some data and it includes a URL you might want to do something with that URL, whereas if it doesn't contain a URL you might want to do something else. For this we use ```if``` and ```else``` statements.

In [None]:
x = 5 
y = 2 
z = x + y 

if z == 7: 
    print("Yes, Z equals 7.")
else:
    print("My math is not good today.")

You can chain further conditions with ```elif```, which is short for ```else if```. Python checks each condition in order and runs only the first branch whose condition is true.

In [None]:
x = 5 
y = 2 
z = x + y 

if z == 10: 
    print("Hmm...should this be? ")
elif z == 7:
    print("Okay, I was worried for a second there.")
else: 
    print("I give up.")
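Because ```elif``` branches are checked top to bottom, their order matters. In this sketch, a 5-letter word satisfies both ```<= 5``` and ```<= 7```, but it stops at the first branch. The size thresholds are made up for illustration.

~~~ python
food_list = ["apple", "banana", "chocolate", "dumpling"]

labels = []
for food in food_list:
    if len(food) <= 5:        # checked first
        size = "short"
    elif len(food) <= 7:      # only checked if the first test was False
        size = "medium"
    else:                     # everything longer than 7
        size = "long"
    labels.append(size)
    print(food, "is", size)
~~~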

## Important notes on comparisons 

### Note 1. You can compare strings. 
Python compares strings character by character, using each character's Unicode __code point__ (the number the character is stored as). So you can ask whether 'a' > 'b'. The behavior can be a bit unexpected, so I would only use this with caution. For example, which is greater than A: a or B?

In [None]:
# String comparisons 
print("Is a > b?")
print('a' > 'b')

print("\nIs A > b?")
print('A' > 'b')

print("\nIs a > A?")
print('a' > 'A')
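The built-in ```ord()``` shows the code point behind each character. That explains the surprise: every uppercase letter has a smaller code point than every lowercase one. This sketch also shows one common fix, lower-casing both sides before comparing.

~~~ python
print(ord('A'), ord('B'), ord('a'), ord('b'))   # 65 66 97 98

# 'B' (66) is smaller than 'a' (97), even though b comes after a in the alphabet
print('B' < 'a')                                # True

# For an alphabetical, case-insensitive comparison, lower-case both sides first
print('B'.lower() < 'a'.lower())                # False
~~~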

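### Note 2. Zero equals False, one equals True, and other numbers equal neither

In comparisons, Python (like a great deal of programming languages) treats ```True``` as 1 and ```False``` as 0. So ```1 == True``` and ```0 == False```, but any other number, positive or negative, is equal to neither. This is different from how an ```if``` statement treats a number: there, 0 counts as false and every other number, including negative ones, counts as true.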


In [None]:
print("What equals True?")
print("-1\t", -1 == True)
print("0\t",   0 == True)
print("1\t",   1 == True)
print("2\t",   2 == True)


print("\nWhat equals False?")

print("-1\t", -1 == False)
print("0\t",   0 == False)
print("1\t",   1 == False)
print("2\t",   2 == False)
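The built-in ```bool()``` asks the same question an ```if``` statement asks, so it shows the difference between "equals True" and "counts as true". Here is a short sketch with the same four numbers:

~~~ python
equals_true = []
truthy = []
for number in [-1, 0, 1, 2]:
    equals_true.append(number == True)   # value comparison: only 1 equals True
    truthy.append(bool(number))          # what an if statement sees: only 0 is false
    print(number, "\t== True:", number == True, "\tbool():", bool(number))
~~~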

### Note 3. Not everything that is empty...is False.

There are a number of ways of expressing _nothing_ in Python. There's ```None```, which marks a variable as empty. There's ```nan``` (Not A Number), a numeric value used for results that don't compute or for missing data. There's the empty string ```""```, and I'm sure more. Be extra careful when evaluating these. For example, ```None == False``` is ```False```, yet ```None``` still behaves like false in an ```if``` statement. Meanwhile ```nan``` is equal to neither ```True``` nor ```False```, yet it behaves like true in an ```if```.


In [None]:
import numpy as np # The python numeric package 'numpy'; we will be using this more later.

print(np.nan == True)
print(np.nan == False)

if np.nan: 
    print("Nan is True")
else: 
    print("Nan is False")

print()
print(None == True)
print(None == False)

if None: 
    print("None is True")
else:
    print("None is False")

print()
print("" == True)
print("" == False)

if "": 
    print("Empty quotes are True")
else:
    print("Empty quotes are False")

print()
print(1 == True)
print(1 == False)
    
if 1: 
    print("One is True")
else:
    print("One is False")

Yeah, it is a bit confusing. This is more to just remind you to be careful. 
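Given all that, it helps to test for "nothing" explicitly rather than relying on ```==```. This sketch uses ```is None``` and ```math.isnan()``` from the standard library. The ```math``` module is an assumption here: the cell above used numpy, which also has ```np.isnan()```.

~~~ python
import math

value = None
print(value is None)              # True: the usual way to test for None

missing = float("nan")
print(missing == missing)         # False: nan is not even equal to itself
print(math.isnan(missing))        # True: use isnan() to detect it

# Empty strings and empty collections count as false; non-empty ones as true
print(bool(""), bool([]), bool({}), bool("0"))   # False False False True
~~~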

# Section 3. Combining Loops and control statements

Often we want to do something only under certain conditions. For example, you might loop through a list of email addresses and add the domain name (e.g., gmail.com, yahoo.com, oii.ox.ac.uk, etc.) to a set of domain names if it hasn't appeared before. This means that within each loop you want to include an if statement.
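A minimal sketch of that email example, with made-up addresses. It assumes every address contains exactly one ```@```. A set ignores duplicates on its own, so the ```if``` here is really only needed to announce each new domain once.

~~~ python
emails = ["ana@gmail.com", "bo@yahoo.com", "cy@oii.ox.ac.uk", "di@gmail.com"]  # hypothetical data

domains = set()
for email in emails:
    domain = email.split("@")[1]     # the part after the @
    if domain not in domains:        # only act on domains we haven't seen yet
        domains.add(domain)
        print("New domain:", domain)

print(len(domains))                  # 3
~~~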

Doing this might involve looping through an awful lot of data, and you might also want a way to report on progress along the way. For example, if you are examining a million emails, you might print a message every 20,000 emails, just to remind yourself that the program isn't stuck in a loop. Here we introduce a function called ```enumerate()```. It hands you a running count (starting at 0) alongside each element as you go through the loop. See these two examples below:

In [None]:
food_list = ["apple","banana","chocolate","dumpling"]

counter = 0
for i in food_list:
    print("Food number",counter,"is",i)
    counter += 1
    
print()

for c,i in enumerate(food_list):
    print("Food number",c,"is",i)
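Combining ```enumerate()``` with the ```%``` (remainder) operator gives the progress report described above. In this sketch ```range(n_items)``` stands in for a real dataset. ```range()``` is a built-in we haven't formally introduced; it produces the numbers 0 up to ```n_items - 1```.

~~~ python
report_every = 20_000      # print a progress line this often
n_items = 100_000          # stand-in for a large dataset

reported = []
for c, item in enumerate(range(n_items)):
    if c % report_every == 0:          # true for c = 0, 20000, 40000, ...
        reported.append(c)
        print("Processed", c, "items so far")
~~~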


# Section 4. List comprehensions

The list comprehension is literally my favorite piece of syntactic sugar in Python. You will encounter it all over the place in my code and in other people's code, so it is worth understanding it now. It will also help us think about operating on a full list at a time. This is important, as we will be doing this a lot with data later on, for example when transforming a column of data.

The list comprehension is very much like a for loop but is very condensed. 

Here is an example in the traditional way:

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    new_list.append(i)
~~~

Now here it is as a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [i.upper() for i in my_list]  
~~~

We have condensed it to one line. But it gets better: you can add a condition at the end, so a value is only included if it meets the condition. For example, only keep the words of length 5.

~~~ python 
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    if len(i) == 5:
        new_list.append(i)
~~~

Now here is the same outcome using a list comprehension: 

~~~ python
my_list = ["allspice","basil","cumin"]

new_list = [i.upper() for i in my_list if len(i) == 5]  
~~~

The second way is much more condensed and yet it still reads in an intelligible way. Try them out below: 

In [None]:
my_list = ["allspice","basil","cumin"]

new_list = []
for i in my_list:
    i = i.upper()
    if len(i) == 5:
        new_list.append(i)

print(new_list)
my_list = ["allspice","basil","cumin"]

new_list = [i.upper() for i in my_list if len(i) == 5]  

print()
print(new_list)
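Here is one more sketch of "operating on a full list at a time". The average word length from Section 1 can be computed with a list comprehension plus the built-in ```sum()```:

~~~ python
food_list = ["apple", "banana", "chocolate", "dumpling"]

lengths = [len(i) for i in food_list]          # [5, 6, 9, 8]
average_length = sum(lengths) / len(lengths)   # 28 / 4
print(average_length)                          # 7.0
~~~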


# Section 5. Building your own functions I 

The list comprehension can tighten a for loop, but everything has to fit into the single expression at the front. In the examples above, that was ```i.upper()```. What if you wanted to do two things, like square a number and format it as a string before returning it? You could cram both into one expression, but that soon becomes hard to read. The cleaner solution is to __write your own function__ that does both of these things and then returns the value we wanted.

Building your own functions is a crucial part of coding. Without functions, you are left with code that is literally just one command after another. With functions you can abstract away the common parts and just send the novel parts to the function as __arguments__. 

We have already seen a few functions, such as ```print()``` and ```len()```. Below is some repetitive code, followed by a function that does the same job in one place. As I mentioned above, if you are going more than three loops deep (some would say one loop deep), then you're doing it wrong. Rather than writing loops within loops, you can often call a function. How to do that will become clear as we learn more about these crucial parts of programming.

In [None]:
x = 5
if x % 2 == 1: 
    y = x * 2 
else:
    y = x

x = 8
if x % 2 == 1: 
    y = x * 2
else:
    y = x 
    
x = 10
if x % 2 == 1: 
    y = x * 2 
else:
    y = x
    
    
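def doubleIfOdd(number):
    # Double odd numbers; leave even numbers unchanged
    if number % 2 == 1:
        return number * 2
    else:
        return number

x = 6
y = doubleIfOdd(x) 
print(y)  # 6 is even, so it comes back unchanged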
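The payoff of the function is that the three copy-pasted ```if```/```else``` blocks shrink to one call per value. This sketch redefines ```doubleIfOdd``` so it runs on its own, and uses a list comprehension from Section 4 to apply it to several values at once.

~~~ python
def doubleIfOdd(number):
    # Double odd numbers; leave even numbers unchanged
    if number % 2 == 1:
        return number * 2
    else:
        return number

results = [doubleIfOdd(x) for x in [5, 8, 10, 3]]
print(results)   # [10, 8, 10, 6]: every result is even
~~~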

This is a first introduction to functions, and a really, really superficial one. Later we will make these functions more complicated, but for now let's stick with the bare minimum.

A function is a piece of code with a name, a place for some input, a place for some calculations, and a means to return the results of those calculations to wherever the function was invoked.

Imagine I have a function called ```doubleTheNumber``` that literally just takes a number and doubles it.

~~~ python
x = 5
y = doubleTheNumber(x) 
print(y)
> 10
~~~

Now to build that function we need to do the four things specified above: 
- Name
- Inputs
- Calculations
- Outputs

~~~ python
def doubleTheNumber(input_number): 
    output_number = input_number * 2
    return output_number
~~~ 

The code above has all four things we wanted. Of course, we could simply have returned ```input_number * 2``` directly, but spelling out each step is how we learn to use functions. We can do lots of things inside a function, and we can then call that function inside a list comprehension. In the ```doubleIfOdd``` example above, we did not just double the number; we doubled it only if it was odd. That way every number that gets returned is even.

Below we will use a similar function, one that doubles only the even numbers, inside a list comprehension.

In [None]:
def doubleTheNumberIfEven ( input_number ): 
    if input_number%2==0:
        return input_number * 2
    else:
        return input_number
    
numbers = [1,4,6,7,9,14,17]

new_numbers = [doubleTheNumberIfEven(i) for i in numbers]

print(new_numbers)

## Important notes on functions 
Functions are a huge topic. These will not be the last notes you'll need. 

### Note 1. Variables have a 'scope'. 
A variable that is created inside of a function is not the same as the one created outside of that function even if they have the same name. This is because the variable inside the function is a __local__ variable. Variables created in jupyter are typically treated as __global__ variables if they are created in a cell but not in a function. To be global means that they can be used anywhere. Local variables are created and destroyed within their local context. You can watch this behavior with a code snippet. 

In [4]:
# Local / Global scope example 1: Variable in the function stays in there.

x = 4 
print( "Before the function",x)

def multiplyTheValue(input_number):
    x = input_number * 2
    print("Inside the function",x)
    return x 

output_number = multiplyTheValue(x)

print("After the function",x)

Before the function 4
Inside the function 8
After the function 4


But ```x``` wasn't the argument; ```input_number``` was. So what if we change ```input_number``` inside the function?

In [3]:
# Local / Global scope example 2: Argument sent to function doesn't escape the function.

x = 4 
print("Before the function",x)

def multiplyTheValue(input_number):
    x = input_number * 2
    input_number = 33
    print("Inside the function",input_number)
    return x 

output_number = multiplyTheValue(x)

print("After the function",input_number)  # raises NameError: input_number only exists inside the function

Before the function 4
Inside the function 33
NameError: name 'input_number' is not defined

The same thing happens. We changed ```input_number``` to 33 inside the function, yet when we try to print it outside the function, Python raises a ```NameError```. This is because ```input_number``` is a parameter of the function, and parameters are local variables: they exist only inside the function. In general, variables created inside a function aren't available outside it unless we explicitly make them global (which is often a very bad idea that leads to all sorts of unexpected issues).

In [5]:
# Local / Global scope example 3: Declaring a variable global makes changes to it visible outside the function.

x = 4
print("Before the function",x)

def multiplyTheValue(input_number):
    global x
    x = input_number * 2
    
    input_number = 33
    print("Inside the function",x)
    return x 

output_number = multiplyTheValue(x)

print("After the function",x)

Before the function 4
Inside the function 8
After the function 8


In this third example, we declare ```x``` as a global variable inside the function, so assigning to ```x``` there changes the ```x``` outside the function. We double ```x``` inside the function, and when we later print ```x``` it is no longer 4: it keeps the value it was given inside the function.
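The usual alternative to ```global``` is to return the new value and let the caller decide where to store it. Here is a minimal sketch of the same doubling written that way:

~~~ python
x = 4

def multiplyTheValue(input_number):
    return input_number * 2      # hand the result back instead of touching a global

x = multiplyTheValue(x)          # the caller explicitly chooses to overwrite x
print("After the function", x)   # After the function 8
~~~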

### Note 2. There are all kinds of ways of passing data to a function. 


Functions can take more than one input. Here are some things we can do with inputs: 
1. Just give it a name. 

    ~~~ python 
    def example1(just_name):
        return just_name

    print(example1("some data"))
    ~~~

2. Give it a name and a default value.

    ~~~ python
    def example2(just_name, name_default=True):
        if name_default:
            return just_name
        return "Something else"

    print(example2("some data"))
    ~~~

3. Leave it open-ended with ```*args```, which collects any extra positional values into a tuple. You'll have to read these in order.

    ~~~ python
    def example3(just_name, *args):
        if len(args) > 0:
            for i in args: print(i)

    example3("some data", "Maybe", "more data")
    ~~~

4. Leave it open-ended with ```**kwargs```, which collects any extra named values into a dictionary of variable names and values. You'll have to look these up by key.

    ~~~ python
    def example4(**kwargs):
        if len(kwargs) > 0:
            for i, j in kwargs.items(): print(i, j)

    example4(var1="some data", var3="Maybe", var2="more data")
    ~~~
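These kinds of parameter can be mixed in one function. Below is a sketch with a hypothetical ```order``` function. In a signature like this, ```**kwargs``` must come last, and parameters with defaults go after the plain ones. Any named parameter can also be passed by name, as in the second call.

~~~ python
def order(dish, size="regular", *extras, **notes):
    # dish: required; size: has a default; extras: tuple of leftover positional values;
    # notes: dict of leftover keyword values
    print("Dish:", dish, "| size:", size)
    for extra in extras:
        print("  extra:", extra)
    for key, value in notes.items():
        print("  note:", key, "=", value)
    return dish, size, extras, notes

result = order("soup", "large", "bread", "butter", table=4, spicy=False)
result2 = order("salad", size="small")     # size passed by name
~~~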

[This page from ProTech](https://www.protechtraining.com/content/python_fundamentals_tutorial-functions) gives a nice simple overview of these sorts of arguments. Below you can try these out for yourself.


In [18]:
# Example 1. Just a single positional argument
def example1( just_name):
    print( just_name)

example1("example 1 argument")

example 1 argument


In [20]:
# Example 2. A positional argument with a default value
def example2( just_name, name_default = True ):
    if name_default:
        print(just_name)
    else:
        print("name_default was false")

example2("Are the defaults true?")

Are the defaults true?


In [21]:
# Example 3. Positional arguments passed but not defined ahead of time
def example3( just_name, *args):
    if len(args) > 0:
        for i in args: print(i)

example3("some data","Maybe","more data")

Maybe
more data


In [22]:
# Example 4. Keyword arguments passed but not defined ahead of time
def example4(**kwargs):
    if len(kwargs) > 0:
        for i,j in kwargs.items(): print("var name:",i,"\tvalue:",j)

example4(var1="some data from var1",var3="Maybe it's var3?",var2="var2's valuedata")

var name: var1 	value: some data from var1
var name: var3 	value: Maybe it's var3?
var name: var2 	value: var2's valuedata


### Note 3. A function always returns, but it might be nothing at all.

A function stops as soon as it reaches a ```return``` statement. You can have multiple return statements for different conditions (like saying if... return one thing, else... return another). Anything after the return statement that runs will not be evaluated by the program. If your function has no return at all, Python still returns ```None```, which, as we saw above, counts as false in an ```if``` statement (even though ```None == False``` is ```False```). Just try it for yourself.

In [24]:
def noReturn():
    pass

print( noReturn())

if noReturn(): 
    print("Did it work?")
else:
    print("Oh right, None evaluates to false.")

None
Oh right, None evaluates to false.
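Here is a sketch of several return statements in one function, plus a line that can never run because a ```return``` comes before it. Both function names are made up for illustration.

~~~ python
def describe_number(number):
    if number < 0:
        return "negative"      # the function stops here for negative input
    if number == 0:
        return "zero"
    return "positive"          # only reached when neither return above ran

def stops_early():
    return "first"
    print("This line never runs")   # unreachable: the return above already ended the function

print(describe_number(-5))     # negative
print(describe_number(0))      # zero
print(describe_number(12))     # positive
print(stops_early())           # first
~~~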
