<center> 

# Functions, Conditionals, and Iteration

## Dr. Lange - University of Chicago
## Data 11800 - Winter 2024 

<img src="https://raw.githubusercontent.com/amandakube/Data118LectureImages/main/UChicago_DSI.png" alt="UC-DSI" width="500" height="600">
    
</center>

In [4]:
#import the basics!

import numpy as np
import pandas as pd

## Today

* Functions
* Conditionals
* Iteration

## Goals: 

- Functions 
    
    - User defined functions

    - How to apply them to DataFrames
    
- Conditionals

    - if statements
    
- Iteration

    - for loops
 

## We have seen built-in functions

| Built-in Python Functions     | Description |
| ----------- | ----------- |
| print(...)      | Print function: Displays the given inputs (and returns `None`). |
| max(...)      | Maximum function: Returns the maximum of the given inputs. |
| min(...)   | Minimum function: Returns the minimum of the given inputs.  |
| abs(...)     | Absolute value function: Returns the absolute value of the given input. |
| round(...)   | Rounding function: Returns the rounded input. |
| len(...)   | Length function: Returns the length of given input. |
| type(...)   | Data Type function: Returns the datatype of the input. |

Sometimes we want to repeat a process many times, but there is no built-in function that does it for us...


## We can write our own functions!

Replacing multiple lines of code with a function allows seamless reuse of a process or computation.


<img src="https://github.com/SusannaLange/Data_118_images/blob/main/DSSI_images/function_photo.png?raw=true" width="800">

Photo Source: Data8 Textbook

## Let's explore this function

In [None]:
def double(x):
    """Double the input x"""
    
    y = 2*x
    
    return y   

Note: we need the `return` statement; without it, the function returns `None`!

The `return` statement:

 
 - Immediately terminates the function 
 
 
 - Allows for the output to be stored
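A minimal sketch illustrating both points; `double_and_stop` is a hypothetical name used only for this demo. The line after `return` never runs, and the returned value can be saved in a variable:

```python
def double_and_stop(x):
    """Double x; the line after return is never reached."""
    y = 2 * x
    return y
    print("This line is never reached")  # return already ended the function

result = double_and_stop(5)  # the returned value is stored in `result`
print(result)
```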
 

We haven't actually computed anything yet; we've just **defined** the new function.

We defined this function with a numerical input in mind, but Python doesn't enforce that: we can pass in anything that supports `2*x`.

In [None]:
double(3)

In [None]:
double(4.5)

In [None]:
double('data')

## We can even double an expression or an array

In [None]:
double(np.array([1,2,3]))

In [None]:
double(2*3)

### Importance of docstring!



A docstring documents what your function does.


It can contain: 

 - arguments
 
 
 - function’s purpose
 
 
 - information about return values
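Here is a sketch of a docstring that covers all three items. The function `area_of_rectangle` is a hypothetical example, and the "Arguments/Returns" layout is just one common convention, not something Python requires:

```python
def area_of_rectangle(width, height):
    """Compute the area of a rectangle.

    Arguments:
        width: the rectangle's width (a number)
        height: the rectangle's height (a number)

    Returns:
        width * height, the rectangle's area.
    """
    return width * height

help(area_of_rectangle)  # displays the docstring above
```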

In [None]:
help(double)

## Return vs print Discussion

<mark style="background:Thistle;color:black"> What's the difference between print and return?! </mark>

Answer here

Consider the following two functions:

In [2]:
def convert_temp(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    return np.round(celsius,decimals=2)

In [1]:
def convert_temp2(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    print(np.round(celsius,decimals=2))

What is the difference between 'convert_temp' and 'convert_temp2'?
Let's investigate.

In [5]:
temp = convert_temp(68)

Hi, I'm in the function and celsius = 20.0


In [6]:
temp2 = convert_temp2(68)

Hi, I'm in the function and celsius = 20.0
20.0


In [7]:
print(temp)

20.0


In [8]:
print(temp2)

None
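One way to see why this matters: a returned value can be used in further computation, while a function that only prints hands back `None`. The two functions below are hypothetical, simplified versions of `convert_temp` and `convert_temp2` (without the extra message):

```python
import numpy as np

def to_celsius_return(fahrenheit):
    """Return fahrenheit converted to celsius, rounded to 2 decimals."""
    celsius = 5/9*(fahrenheit-32)
    return np.round(celsius, decimals=2)

def to_celsius_print(fahrenheit):
    """Print the conversion; there is no return, so the result is None."""
    celsius = 5/9*(fahrenheit-32)
    print(np.round(celsius, decimals=2))

# The returned value can be used in arithmetic
print(to_celsius_return(68) + 10)

# The printed value cannot: the function itself gives back None
result2 = to_celsius_print(68)
print(result2 is None)
```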


## Name Scoping - what meaning do names have where? 

### INDENTATION matters in Python...

**Important:** Variables defined inside function bodies are not visible outside the function

In [None]:
def convert_temp(fahrenheit):
    """This function converts fahrenheit to celsius"""
    
    celsius = 5/9*(fahrenheit-32)
    print("Hi, I'm in the function and celsius =", celsius)
    
    return np.round(celsius,decimals=2)

In [None]:
celsius

Where different variables are defined is called *scoping* in programming languages.

In general, be thoughtful when naming your variables and functions.

Variables defined outside of functions are known as **global variables**, while variables inside of functions are **local variables**. In general you should refrain from directly referring to global variables inside of a function. However, it is possible.

Local variables exist only while the function is running: once the call finishes, they are gone forever (unless you return their values).
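A sketch of why relying on global variables inside a function is discouraged. The names `tax_rate`, `price_with_tax_global`, and `price_with_tax` are hypothetical. The first function silently changes its output when the global changes; the second receives everything it needs as arguments, and its local variable `total` disappears after each call:

```python
tax_rate = 0.10  # global variable, defined outside any function

def price_with_tax_global(price):
    """Relies on the global tax_rate (works, but discouraged)."""
    return price * (1 + tax_rate)

def price_with_tax(price, rate):
    """Receives everything it needs as arguments (preferred)."""
    total = price * (1 + rate)  # `total` is local to this function
    return total

print(price_with_tax_global(100))  # depends on the current value of tax_rate
tax_rate = 0.25                    # changing the global changes the output
print(price_with_tax_global(100))
print(price_with_tax(100, 0.10))   # same inputs always give the same output
```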

In [None]:
#In fact, you can reuse the name outside of the function and it is a different variable
celsius = "Hi, I'm a string!"

print('The result of the function is:', convert_temp(68), '\n')
print('The value of celsius outside the function is:', celsius)

### Now that we've defined our own function...

In [None]:
help(convert_temp)

We can call the built-in `help` function on any function we create, and it displays the docstring we wrote!

### Let's try another example. Here we write a function that takes two values as input (two arguments).

In [None]:
def register_class(Student_ID, class_name):
    """Takes Student_ID and class_name as input,
    Returns a message about registration"""
    
    message = "Thank you student with ID:" \
                + str(Student_ID) + " for registering for "\
                + str(class_name)
    
    return message

In [None]:
register_class(12345, 'Calculus')

In [None]:
register_class(12345, 'Introduction to Statistics')

### Note: order matters when calling a function!

In [None]:
register_class('Calculus', 12345)

We can also call the function using keyword arguments, naming each argument explicitly:

In [None]:
register_class(Student_ID = 12345, class_name = "Calculus")

If we do this, we can list the arguments in any order.

In [None]:
register_class(class_name = "Calculus", Student_ID = 12345)

Note: the function as written *requires* two arguments

In [None]:
register_class(class_name = "Calculus")

## Making default arguments in our functions!

How do we do this for our own functions?

By assigning a default value to the parameter in the function definition.

In [9]:
def register_class(Student_ID, class_name = 'Calculus'):
    """Takes Student_ID and class_name as input,
    Returns a message about registration"""
    
    message = "Thank you student with ID:" \
                + str(Student_ID) + " for registering for "\
                + str(class_name)
    
    return message

Now we can call this function with one or two arguments.

In [None]:
register_class(12345)

In [None]:
register_class(12345, 'math')

Calling `help` on your function shows its default values.

In [10]:
help(register_class)

Help on function register_class in module __main__:

register_class(Student_ID, class_name='Calculus')
    Takes Student_ID and class_name as input,
    Returns a message about registration



### <mark style="background-color: Thistle"> Code comprehension - Multiple Choice</mark>

## Now we can build our own functions, but how do we apply them to DataFrames?

Let's define a function that replaces any numbers greater than 100 with 100

In [1]:
def cut_off(age):
    '''returns the smaller of age and 100'''
    return min(age, 100)

In [2]:
print('adjusted age of 70 is:', cut_off(70))
print('adjusted age of 170 is:', cut_off(170))

adjusted age of 70 is: 70
adjusted age of 170 is: 100


Create a DataFrame

In [5]:
ages = pd.DataFrame(
    {'Person': np.array(['A', 'B', 'C', 'D']),
    'Age': np.array([63, 101, 99, 102])}
)
ages

|   | Person | Age |
|---|---|---|
| 0 | A | 63 |
| 1 | B | 101 |
| 2 | C | 99 |
| 3 | D | 102 |


### Using "apply()" on a single column

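Select the column, then pass the function name (without parentheses) to `apply()`, which calls the function on each value in the column.

`apply()` returns a new Series; the original DataFrame is not changed.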
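A self-contained sketch checking both claims, using a small hypothetical `demo_ages` table with the same values as `ages`. Note that we pass `cut_off` itself, not `cut_off()`:

```python
import pandas as pd

def cut_off(age):
    """Return the smaller of age and 100."""
    return min(age, 100)

demo_ages = pd.DataFrame({'Person': ['A', 'B', 'C', 'D'],
                          'Age': [63, 101, 99, 102]})

result = demo_ages['Age'].apply(cut_off)  # pass the function, no parentheses
print(isinstance(result, pd.Series))      # apply() gave back a Series
print(result.tolist())                    # the cut-off values
print(demo_ages['Age'].tolist())          # the original column is unchanged
```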

In [6]:
ages['Age'].apply(cut_off)

0     63
1    100
2     99
3    100
Name: Age, dtype: int64

Note: it is often good practice to retain the original data when you modify it for further analysis, so that you don't lose data that may be useful later and you keep a record of what you did.

`apply()` does not change `ages`; to add the result to the table, you must assign it to a new column explicitly.

In [7]:
ages

|   | Person | Age |
|---|---|---|
| 0 | A | 63 |
| 1 | B | 101 |
| 2 | C | 99 |
| 3 | D | 102 |


In [8]:
ages['AgeCutoff'] = ages['Age'].apply(cut_off)
ages

|   | Person | Age | AgeCutoff |
|---|---|---|---|
| 0 | A | 63 | 63 |
| 1 | B | 101 | 100 |
| 2 | C | 99 | 99 |
| 3 | D | 102 | 100 |


When applying functions to dataframes it is sometimes useful to use "anonymous" (unnamed) functions.   

These are called **"lambda functions".**

Note, lambda functions are also handy for impressing friends and family!


Here's the example above done with a lambda function. This has the advantage that we don't need to define and name a separate `cut_off` function.

In [None]:
ages.Age.apply(lambda x: min(x,100))

In the example above, `x` is bound to each value in the `Age` column because we called `apply()` on `ages.Age`.

Note there's really very little difference between lambda functions and the ordinary functions we've been defining. In fact, we can assign a lambda function to a name and then call it like any other function.

In [None]:
double = (lambda x: x*2)
double(4)

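However, you should avoid this style: a function you will use many times deserves a `def` definition, which gives it a proper name and a place for a docstring. (Python's style guide, PEP 8, also recommends `def` over assigning a lambda to a name.)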

Lambda functions are not limited to a single input:

In [None]:
mult = (lambda x,y: x*y)
mult(3,7)

When we call `apply()` on a whole DataFrame, the lambda's variable can bind to entire rows. This is useful when your function needs to access multiple columns.

Note that we have to use `axis="columns"` (or `axis = 1`) so that the lambda function is applied to each row.

In [None]:
ages['NewDiff'] = ages.apply(lambda x: x.Age - x.AgeCutoff, axis='columns')                         
ages

In [None]:
ages['Average'] = ages.apply(lambda x: (x.Age + x.AgeCutoff) / 2, axis='columns')
ages
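Row-wise `apply()` does not require a lambda: an ordinary named function also works, and it receives each row as a Series. A sketch with a hypothetical `demo` table and a hypothetical function `amount_trimmed`:

```python
import pandas as pd

demo = pd.DataFrame({'Age': [63, 101, 99, 102],
                     'AgeCutoff': [63, 100, 99, 100]})

def amount_trimmed(row):
    """Given one row, return how much the age was reduced by the cutoff."""
    return row['Age'] - row['AgeCutoff']

# axis='columns' passes each row (as a Series) to the function
demo['Trimmed'] = demo.apply(amount_trimmed, axis='columns')
print(demo)
```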

Description of `axis` from the pandas documentation for `DataFrame.apply`:

**axis**: *{0 or 'index', 1 or 'columns'}, default 0*
<br>
Axis along which the function is applied:
- 0 or 'index': apply function to each column.
- 1 or 'columns': apply function to each row.
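A small sketch contrasting the two settings on a hypothetical `scores` table. With `axis='index'` the function sees each column and we get one result per column; with `axis='columns'` it sees each row and we get one result per row:

```python
import pandas as pd

scores = pd.DataFrame({'quiz1': [8, 6, 9],
                       'quiz2': [7, 10, 5]})

def largest(values):
    """Return the largest entry of a Series (a row or a column)."""
    return values.max()

col_max = scores.apply(largest, axis='index')    # axis=0: one result per column
row_max = scores.apply(largest, axis='columns')  # axis=1: one result per row
print(col_max.tolist())
print(row_max.tolist())
```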

### <mark style="background-color: Thistle"> Code comprehension - Your Turn</mark>

Define a function that takes two inputs, x and y, and returns the value
$$x^2+5y$$

The default value of y should be -1.

Call this function my_function.

Test your function by calling my_function(2,6) and my_function(5)

In [None]:
#your function here

## Conditional Statements

Sometimes we want to perform an action only if a certain condition (or several conditions) is satisfied.

Conditional statements have the form of an "if-then" statement: *if* statement `P` (the hypothesis) occurs, *then* statement `Q` (the conclusion) also occurs.


How do we write this in code? We use the `if` statement in Python.

The statement below executes the indented block *conclusion* if *hypothesis* is true; otherwise, the indented block is skipped:

```python
if hypothesis:
    conclusion
```

In [None]:
x=2

if x > 0:
    print('Positive')

In [None]:
x = -6
if x > 0:
    print('Positive')

If we want this to output something when we enter a negative number we can do the following:

In [None]:
x = -6

if x > 0:
    print('Positive')
    
if x < 0:
    print('Negative')

In the code block above, the first `if` condition is tested, and then the second `if` condition is tested as well, regardless of whether the first one was true.

### We can use an `elif` statement instead.

Used in combination with an `if` expression, an `elif` statement is only checked if all previous statements evaluate to False. This allows us to check if our first condition is true, otherwise we move to evaluate the truth value of the next statement.
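Since `elif` is checked only when every earlier condition is False, separate `if` statements and an `if`/`elif` chain behave differently when conditions overlap. A sketch with two hypothetical functions:

```python
def describe_with_ifs(x):
    """Two separate if statements: both conditions are tested."""
    labels = []
    if x > 0:
        labels.append('Positive')
    if x > 10:
        labels.append('Greater than 10')
    return labels

def describe_with_elif(x):
    """if/elif: the elif is tested only when the if was False."""
    labels = []
    if x > 10:
        labels.append('Greater than 10')
    elif x > 0:
        labels.append('Positive')
    return labels

print(describe_with_ifs(25))   # both conditions fire
print(describe_with_elif(25))  # only the first true condition fires
```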

In [None]:
if x > 0:
    print('Positive')
    
elif x < 0:
    print('Negative')

We haven't addressed all possibilities! What happens when `x` is 0? Let's try it.

In [None]:
x=0

if x > 0:
    print('Positive')
    
elif x < 0:
    print('Negative')    

To address this we can add another condition.

In [None]:
if x > 0:
    print('Positive')
    
elif x < 0:
    print('Negative')  
    
elif x == 0:
    print('Neither positive nor negative')

## Else statement

In an `if`/`elif` chain, the conditions are checked in order until one is true. Often we can replace the last `elif` with an `else` statement, whose body is executed only if all the previous conditions are false.

At this point, all other conditions have been evaluated and none executed. Thus, there is no condition or hypothesis associated with the `else` statement. If it is reached, the conclusion of the `else` statement is executed.

In [None]:
if x > 0:
    print('Positive')
    
elif x < 0:
    print('Negative')  
    
else:
    print('Neither positive nor negative')


In [None]:
x = 0  
if x > 0:
    print('Positive')
    
elif x < 0:
    print('Negative')  
    
else:
    print('Neither positive nor negative')


## The General Format


```python
if hypothesis_1:
    conclusion_1
else:
    conclusion_2
```
    
Or we could have $n+1$ conclusions for a chosen $n$ in which case we have the format:
    
```python
if hypothesis_1:
    conclusion_1
elif hypothesis_2:
    conclusion_2
... 
elif hypothesis_n:
    conclusion_n
else:
    conclusion    
```    

BE CAREFUL!!! Since the `else` statement is executed without checking a condition, you want to be absolutely certain that all desired possibilities are accounted for in the previous condition(s).
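A sketch of the pitfall using two hypothetical functions: the careless version assumes every non-positive number is negative, so its `else` silently mislabels 0. The careful version checks each case before falling back to `else`:

```python
def sign_careless(x):
    """Bug: treats 0 as negative because else catches everything left."""
    if x > 0:
        return 'Positive'
    else:
        return 'Negative'

def sign_careful(x):
    """Checks each case explicitly before the else."""
    if x > 0:
        return 'Positive'
    elif x < 0:
        return 'Negative'
    else:
        return 'Neither positive nor negative'

print(sign_careless(0))  # wrong label for 0
print(sign_careful(0))
```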

In [None]:
def sign(x):
    '''returns whether a number is positive, negative, or neither'''
    if x > 0:
        return 'Positive'
    
    elif x < 0:
        return 'Negative'
    
    else:
        return 'Neither positive nor negative'

In [None]:
ages['Sign'] = ages['Age'].apply(sign)                         
ages

## For Statements

Repeat code a specified number of times (called iteration)

`for` iterates over the contents of a list (or an array, or a series...)
Remember - indentation matters...


```python
for item in sequence:
    action    
```

For each *item* in *sequence*, the indented body of the `for` statement (here `action`) is executed once. In other words, the indented body is "looped" over the sequence.
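A common pattern is to accumulate a result across the loop. A minimal sketch with a hypothetical `prices` list:

```python
prices = [4.5, 2.0, 3.5]

total = 0
for price in prices:
    total = total + price  # the body runs once per item in the list
print(total)
```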

## An example

In [None]:
list_of_things= ["red", 2, 7.3, "dog"]

for element in list_of_things:
    print(element)

We chose the name `element` for the loop variable, but we could choose any name we want.

In [None]:
for i in list_of_things:
    print(i)

### What can we iterate over?

Some common things:

- lists
    - `['item1', 'item2']`
- arrays
    - `np.arange(10)`
    - `np.array([2,4,6,8])`
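A pandas Series can be looped over too, and iterating it yields its values (not its index labels). A sketch with a hypothetical `ages_series`, counting how many ages exceed 100:

```python
import pandas as pd

ages_series = pd.Series([63, 101, 99], name='Age')

over_100 = 0
for age in ages_series:  # each `age` is a value from the Series
    if age > 100:
        over_100 = over_100 + 1
print(over_100)
```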
     

In [None]:
for i in [1,2,3]:
    print(i)

We could also specify a *range* of values

In [None]:
for i in np.arange(3): 
    print(i)

In [None]:
for i in np.arange(1,3): # start at 1 and stop before 3 (the end value is excluded)
    print(i)

The loop variable, `i`, does not have to appear in the body of the `for` statement (the for loop).

In [None]:
for i in np.arange(3):
    print('hello')

Often for loops are useful just to repeat something a specified number of times.

## Nested *for* loops

Suppose we have two lists, and we want to pair every element in `list_1` with every element in `list_2`. 

This takes the following form:

```python
for item_1 in list_1:
    for item_2 in list_2:
        print(item_1, item_2)
```

We *could* write out by hand all the possible combinations pairing elements of `list_1` with `list_2` … or, we could use nested `for` statements to systematically consider each element in `list_2` for each element in `list_1`.

### Example



Suppose we want to find all possible combinations of the below lists.

In [None]:
my_animals = ['dog', 'cat', 'cow']

adjectives = ['hairy', 'scary', 'cute']

In [None]:
for adj in adjectives:
    for animal in my_animals:
        print(adj, animal)

Order of the inner and outer *for* loops does matter!

In [None]:
for animal in my_animals:
    for adj in adjectives:
        print(adj, animal)
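Swapping the loops changes the order in which pairs appear, not which pairs appear. A sketch that stores the pairs in two hypothetical lists so we can compare them:

```python
my_animals = ['dog', 'cat', 'cow']
adjectives = ['hairy', 'scary', 'cute']

pairs_adj_outer = []
for adj in adjectives:          # outer loop: changes slowest
    for animal in my_animals:   # inner loop: runs completely for each adj
        pairs_adj_outer.append(adj + ' ' + animal)

pairs_animal_outer = []
for animal in my_animals:
    for adj in adjectives:
        pairs_animal_outer.append(adj + ' ' + animal)

print(pairs_adj_outer[:3])
print(pairs_animal_outer[:3])
print(len(pairs_adj_outer), sorted(pairs_adj_outer) == sorted(pairs_animal_outer))
```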

## We can use this to count things

In [None]:
count = 0

for adj in adjectives:
    for animal in my_animals:
        if animal.startswith('c'):
            print(adj,animal)
            count = count + 1
        
print(count)

## Our Python repertoire:

* Arithmetic Operations
* Comparisons
* Assignment Statements
* Call Expressions
* Arrays
* Lists
* DataFrames
* Groupby
* Pivot_table
* Merge
* Functions
* Conditionals
* Iteration

### Optional extras

#### That is, some additional notes (optional, not tested, but cool)

Please keep reading if you want to know some extra material on functions and for loops and if statements!

We can use the `break` statement to stop a `for` loop early.

In [None]:
for x in range(6): # 0 to 5
    if x == 3: 
        break
    print(x)

We can use the `continue` statement to skip the rest of the current iteration and move on to the next one.

In [None]:
list_of_things= ["red", 2, 7.3, "dog"]

for element in list_of_things:
    if element == 7.3:
        continue
    print(element)

Suppose we want to ensure a user passes the correct (or expected) data type into our function.

We could use an `assert` statement. `assert` checks that a condition is true; if it is not, Python stops and raises an `AssertionError`.

This is useful because we can include an explanatory message that appears in the error.

In [None]:
def double(x):
    """Double the input x"""
    assert isinstance(x, (int, float)), "double expects an int or a float"
    y = 2*x
    
    return y 

In [None]:
double(5)

In [None]:
double(4.9)

In [None]:
double('data')