# Class 16: For loops and writing functions

In this notebook we will continue learning some of the fundamentals of Python, namely using for loops, conditional statements, and writing functions. This material will be useful for analyzing data, and more generally for any programming you do in the future. 

In [94]:
import YData

# YData.download.download_class_code(16)       
# YData.download.download_class_code(16, True) # get the code with the answers 
# YData.download.download_homework(6)  # download the homework 


YData.download.download_data("States_shapefile.geojson")
YData.download.download_data("state_demographics.csv")

YData.download_data('daily_bike_totals.csv')

The file `States_shapefile.geojson` already exists.
If you would like to download a new copy of the file, please rename the existing copy of the file.
The file `state_demographics.csv` already exists.
If you would like to download a new copy of the file, please rename the existing copy of the file.
The file `daily_bike_totals.csv` already exists.
If you would like to download a new copy of the file, please rename the existing copy of the file.


If you are using Google Colab, you should also uncomment and run the code below to install the YData package and mount your Google Drive.

In [95]:
# !pip install https://github.com/emeyers/YData_package/tarball/master
# from google.colab import drive
# drive.mount('/content/drive')

In [96]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


## 1. Warm-up exercise on mapping

Let's get a little more practice creating maps by creating a choropleth map of the percentage of people who are over 64 in each state. 

Data with the state boundaries and the demographic information are loaded below. Please do the following steps:

1. Join/merge the map data with the demographic data to create a new GeoDataFrame called `state_map_demo`.
2. Create a new column in `state_map_demo` called `percent_over_64` that has the percentage of people over 64 (i.e., we are normalizing our choropleth map).
3. Plot the choropleth showing the percentage of people over 64. Does this map look like you would expect? 
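The join and normalization steps can be sketched on a small toy table. This is only an illustration, not the exercise solution: `toy_map`, `toy_demo`, and `toy_joined` are made-up names, the toy map table has no geometry column, and the numbers are copied from the first rows of the demographics table displayed in this notebook.

```python
import pandas as pd

# Toy stand-in for the map table (no geometry column, to keep the sketch small)
toy_map = pd.DataFrame({
    "State_Name": ["ALABAMA", "ALASKA", "ARIZONA"],
    "State_Code": ["AL", "AK", "AZ"],
})

# Toy stand-in for the demographics table
toy_demo = pd.DataFrame({
    "State": ["ALABAMA", "ALASKA", "ARIZONA"],
    "over_64": [741954.681, 69252.808, 1070305.956],
    "total": [4849377, 736732, 6731484],
})

# Join the two tables on the rows where the state names match
toy_joined = toy_map.merge(toy_demo, left_on="State_Name", right_on="State")

# Normalize: percentage of each state's population that is over 64
toy_joined["percent_over_64"] = 100 * toy_joined["over_64"] / toy_joined["total"]

print(toy_joined[["State_Name", "percent_over_64"]])
```

With the real data, the same pattern should apply when the left table is the GeoDataFrame, and calling `.plot(column="percent_over_64", legend=True)` on the result is one way to draw the choropleth.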


In [97]:
import geopandas as gpd

# The state boundaries data
state_map = gpd.read_file("States_shapefile.geojson")
print(state_map.crs)
state_map.head(3)

EPSG:4326


|   | FID | Program | State_Code | State_Name | Flowing_St | FID_1 | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 1 | PERMIT TRACKING | AL | ALABAMA | F | 919 | POLYGON ((-85.07007 31.98070, -85.11515 31.907... |
| 1 | 2 |  | AK | ALASKA | N | 920 | MULTIPOLYGON (((-161.33379 58.73325, -161.3824... |
| 2 | 3 | AZURITE | AZ | ARIZONA | F | 921 | POLYGON ((-114.52063 33.02771, -114.55909 33.0... |


In [98]:
# The state demographic information 
state_demographics = pd.read_csv("state_demographics.csv")
state_demographics["State"] = state_demographics["State"].apply(str.upper)
state_demographics.head(3)

|   | State | under_5 | over_64 | bachelors_degree | total |
|---|---|---|---|---|---|
| 0 | ALABAMA | 295811.997 | 741954.681 | 1095959.202 | 4849377 |
| 1 | ALASKA | 54518.168 | 69252.808 | 202601.3 | 736732 |
| 2 | ARIZONA | 430814.976 | 1070305.956 | 1810769.196 | 6731484 |


In [99]:
# Join/merge the state map with the state demographic data




In [100]:
# Add a new column called "percent_over_64" to "normalize your map"




In [101]:
# Plot the choropleth map




## 2. Loops

Loops allow us to repeat a process many times. They are particularly useful in conjunction with lists to process and store multiple values. 
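As a minimal sketch of the two most common loop patterns, the code below loops over the items of a list and then over numbers produced by `range()`. The names `colors` and `counted` are made up for this illustration.

```python
# Loop over the items in a list, one at a time
colors = ["red", "green", "blue"]
for color in colors:
    print(color)

# range(3) produces 0, 1, 2 (the stop value is not included)
for i in range(3):
    print(i)

# range(1, 4) produces 1, 2, 3
counted = list(range(1, 4))
print(counted)  # [1, 2, 3]
```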


In [102]:
# Loop over items in a list
a_list = ["first", "second", "third", "fourth"]





In [103]:
# Loop over numbers using the range() function





In [104]:
# Can you print the squares of the numbers from 1 to 6? 



We can use a loop to build up values in a list...
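Here is a small sketch of this "start empty, then append" pattern. It uses cubes of the numbers 1 to 4 (and the made-up name `cubes`) so that it stays separate from the exercise in the next cell.

```python
# Start with an empty list and add one value per loop iteration
cubes = []
for i in range(1, 5):
    cubes.append(i ** 3)

print(cubes)  # [1, 8, 27, 64]
```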

In [105]:
# Create a list that has the squares of the numbers 1 to 6
# hint: the .append() method will be useful

my_squares = []








### Exercise 1

Can you use loops to sum the numbers 1 to 10? 

Or, to use mathematical notation, can you compute $\sum_{i=1}^{10} i$ ?


In [106]:
# Sum numbers from 1 to 10
my_sum = 0






### Enumerate

We can use `enumerate(my_list)` to get both the values from a list and their sequential index numbers.
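For example, the sketch below prints each index next to its value. The names `fruits` and `pairs` are made up for this illustration. Note that the index numbers start at 0.

```python
fruits = ["apple", "banana", "cherry"]

pairs = []
for index, fruit in enumerate(fruits):
    print(index, fruit)
    pairs.append((index, fruit))  # store each (index, value) pair

print(pairs)  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]
```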


In [107]:
# We can use enumerate(my_list) to get both values from a list 
# and sequential index numbers

a_list = ["first", "second", "third", "fourth"]






### zip function

We can use `zip(list_1, list_2)` to get values from two lists in a for loop. 
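A minimal sketch of looping over two lists in step is shown below. The lists `names` and `ages` are made up for this illustration; on each iteration `zip()` hands back one value from each list.

```python
names = ["Ann", "Bo", "Cy"]
ages = [31, 25, 40]

descriptions = []
for name, age in zip(names, ages):
    # name and age come from the same position in the two lists
    descriptions.append(f"{name} is {age}")

print(descriptions)  # ['Ann is 31', 'Bo is 25', 'Cy is 40']
```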


In [108]:
list_1 = ["a", "b", "c"]
list_2 = ["x", "y", "z"]







### Exercise 2

The code below extracts two lists that have the high and low temperatures from 2014 in NYC. 

Please use for loops to create a list called `temp_range` that has the temperature range (high - low temperature) for each day.

There are a few ways to do this, so see if you can come up with a solution that works. Try to do this without using numpy, and once you have a solution, see if you can get the same result using numpy. 



In [109]:
import pandas as pd
import matplotlib.pyplot as plt

bikes = pd.read_csv("daily_bike_totals.csv", parse_dates = [0])
bikes_2014 = bikes.query("date > '2013-12-31'").query("date < '2015-01-01'")

max_temps = bikes_2014["max_temperature"].to_list()
min_temps = bikes_2014["min_temperature"].to_list()

print(max_temps[0:5])
print(min_temps[0:5])

[33.08, 33.08, 18.14, 29.12, 39.92]
[24.26, 18.14, 9.14, 8.24, 27.14]


In [110]:
# Create a list called "temp_range" that has the temperature range for each day in 2014...

# Start with an empty list
temp_range = []








In [111]:
# Another solution using for loops (i.e., can you do it with and without using the zip() function)

# Start with an empty list
temp_range2 = []









In [112]:
# Can you do the same calculation using numpy arrays?  
# Which is easier? 
import numpy as np






## 3a. Review of comparisons

Let's do a very quick review of mathematical and string comparisons in Python.
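As a quick reference, here is a sketch of the kinds of comparisons reviewed in the cells below. All the variable names are made up for this illustration.

```python
is_smaller = 3 < 5                  # True
print(type(is_smaller))             # <class 'bool'>

is_equal = (3 == 3)                 # True; == tests equality, = assigns a name
both = (3 < 5) and (5 < 4)          # False: both parts must be True
either = (3 < 5) or (5 < 4)         # True: at least one part is True

same_word = "apple" == "apple"      # True
alphabetical = "apple" < "banana"   # True: "a" comes before "b"
prefix_first = "app" < "apple"      # True: the shorter matching word comes first
```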

In [113]:
# Basic math comparison



In [114]:
# Checking the type of a basic math comparison



In [115]:
# We use == to compare whether two items are equal (not 3 = 3)



In [116]:
# We can use the `and` keyword to combine multiple logical statements 




In [117]:
# We can also use the `or` keyword to combine multiple logical statements 



In [118]:
# We can also compare strings



In [119]:
# Strings compare alphabetically



In [120]:
# Shorter words occur earlier than longer words that have matching letters



## 3b. Conditional Statements 

Conditional statements allow us to execute particular pieces of code when certain conditions are met; i.e., they execute a piece of code when a Boolean value is True. 

Let's explore!
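As a minimal sketch, the code below shows an `if`/`elif`/`else` chain and then a conditional statement inside a loop. The names `temperature`, `description`, and `evens` are made up for this illustration.

```python
temperature = 75

# Only the first branch whose condition is True gets executed
if temperature > 85:
    description = "hot"
elif temperature > 60:
    description = "mild"
else:
    description = "cold"

print(description)  # mild

# A conditional inside a loop: keep only the even numbers from 1 to 6
evens = []
for i in range(1, 7):
    if i % 2 == 0:
        evens.append(i)

print(evens)  # [2, 4, 6]
```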

In [121]:
num_semesters = 7











In [122]:
# Let's look at a conditional statement in a loop


















## 4. Functions!

We have already used many functions in this class that are built into Python or are imported from different modules/packages. 

Let's now write some new functions ourselves! 
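Here is a small sketch of defining and calling a function, using a made-up function `triple` so that it stays separate from the function written in the cells below. It also illustrates local scope: names created inside the function are not visible outside it, and calling the function does not change the name that was passed in.

```python
def triple(value):
    result = value * 3   # `result` only exists while the function runs
    return result

print(triple(4))      # 12
print(triple(2.5))    # 7.5
print(triple("ab"))   # ababab (multiplying a string repeats it)

x = 17
y = triple(x)
print(x)  # 17: x itself is unchanged
print(y)  # 51
```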


In [123]:
# Write a function that doubles a value




In [124]:
# Try the function out 1



In [125]:
# Try the function out 2



In [126]:
# Try the function out 3




In [127]:
# Try the function out 4



In [128]:
# Will this work?



In [129]:
# Will this work? 




In [130]:
# What about this? 




In [131]:
# "local scope"



In [132]:
# Let's set x to 17 



In [133]:
# Double 2



In [134]:
# Did x change?



In [135]:
# What if we double x? 



In [136]:
# Did x change?



### Function extras: docstrings

When writing functions that will be used by other people (or your future self) it is important to write some documentation describing how your function works. In Python, this type of documentation is called a "docstring". The text in a docstring is in triple quotes which allows for multi-line comments.

There are a number of [conventions](https://peps.python.org/pep-0257/) on how to write a docstring, including: 

- The docstring line should begin with a capital letter and end with a period.
- The first line should be a short description.
- If there are more lines in the documentation string, the second line should be blank, visually separating the summary from the rest of the description.
- The following lines should be one or more paragraphs describing the object’s calling conventions, its side effects, etc.
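A docstring written this way is stored on the function itself, so it can be read back later. The sketch below uses a made-up function `halve` and reads its docstring through the `__doc__` attribute.

```python
def halve(x):
    """Return half of a number.

    Parameters:
    x (float): The number to halve

    Returns:
    float: Half of x
    """
    return x / 2

# The docstring is available as an attribute of the function
print(halve.__doc__)

# help(halve) would print a formatted version of the same text
```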


In [137]:
def double(x):
    """Take a number and double it.
    
    Parameters:
    x (int): A number that should be doubled
    
    Returns:
    int: The doubled number
    
    """
    return x * 2

In [138]:
# View the docstring



## Multiple arguments and default values

We can also write functions that take multiple arguments and we can set particular arguments to have default values that are used if no value for an argument is given. 

Let's explore this...
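As a minimal sketch of the idea, the made-up function `raise_to_power` below takes two arguments, and the second one has a default value that is used when the caller leaves it out.

```python
def raise_to_power(base, exponent=2):
    """Return base raised to exponent (exponent defaults to 2)."""
    return base ** exponent

print(raise_to_power(3))               # 9: uses the default exponent of 2
print(raise_to_power(3, 3))            # 27: the default is overridden
print(raise_to_power(2, exponent=10))  # 1024: the argument is given by name
```

Arguments with default values have to come after the arguments without defaults in the `def` line.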



In [139]:
# Define powerit function 




In [140]:
# Use the function 




In [141]:
# Try the function with a single argument



In [142]:
# Set a default argument value




In [143]:
# Try the new function with a single argument




In [144]:
# Try the function with two arguments




![higher powers](powers.jpg)

## Multiple return values 

We can also write functions that return multiple values. We can do this by returning a tuple. 

Recall that tuples are a basic data structure in Python that is similar to a list. However, unlike lists, tuples are "immutable", meaning that once we create a tuple, we cannot modify the values in it.

We create tuples by using values in parentheses separated by commas:

`my_tuple = (10, 20, 30)`

Let's explore tuples now... 
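The sketch below walks through indexing, immutability, and unpacking on a made-up tuple called `point`. The `try`/`except` is only there so that the failed assignment does not stop the rest of the code from running.

```python
point = (3, 4, 5)

# Indexing works the same way as for lists
first = point[0]
print(first)  # 3

# Reassigning an element raises a TypeError because tuples are immutable
try:
    point[0] = 99
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Tuple unpacking assigns each element to its own name
a, b, c = point
print(a, b, c)  # 3 4 5
```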

In [145]:
# Recall tuples
my_tuple = (10, 20, 30)

my_tuple

(10, 20, 30)

In [146]:
# We can access elements of the tuple using square brackets (the same as lists)



In [147]:
# Unlike a list, we can't reassign values in a tuple 



In [148]:
# We extract values from tuples into regular names using "tuple unpacking"




Let's create a function `power23(x)` that returns a number squared and a number cubed. 
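One possible way to write such a function is sketched below. Separating the two return values with a comma makes Python pack them into a tuple, which the caller can then unpack into two names (`squared` and `cubed` are made-up names).

```python
def power23(x):
    """Return x squared and x cubed as a tuple."""
    return x ** 2, x ** 3

print(power23(3))  # (9, 27)

# Tuple unpacking gives each output its own name
squared, cubed = power23(3)
print(squared)  # 9
print(cubed)    # 27
```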

In [149]:
# Create a function that returns a value squared and cubed




In [150]:
# We can use "tuple unpacking" to assign both outputs to different names




## Passing functions as input arguments

We can also pass functions as input arguments to other functions. Let's explore this...
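As a minimal sketch, the made-up function `apply_function` below receives another function as its second argument and calls it on some data. The name `example_array` is also made up. Note that we pass `np.mean` without parentheses: we are handing over the function itself, not the result of calling it.

```python
import numpy as np

def apply_function(values, func):
    """Call func on values and return the result."""
    return func(values)

example_array = np.array([1, 2, 3, 4])

mean_value = apply_function(example_array, np.mean)
total = apply_function(example_array, np.sum)

print(mean_value)  # 2.5
print(total)       # 10
```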

In [151]:
# Apply the np.mean function to my_array




In [152]:
# Apply the np.sum function to my_array




In [153]:
# Apply power23 to my_array




## 5. Additional practice writing functions

As additional practice, let's write a function that will mimic flipping coins. This function will be useful when we start talking about statistical inference. 

In particular, let's write a function called `flip_coins(n, prob)` which will simulate flipping a coin `n` times where:
- `n` is the number of times to flip the coin
- `prob` is the probability that each coin flip will return "head"

The function should return the number of "heads" that occurred from flipping the coin `n` times; i.e., it should return a number between 0, which means no heads occurred, and `n` which means a "head" occurred on every flip. 

When writing functions, it is often useful to write the bulk of the code outside of a function and then turn it into a function by wrapping your code in a `def` statement. Let's go through a few steps of writing this function now. 


#### Step 1: Generating random numbers between 0 and 1

We can use the numpy function `np.random.rand(n)` to generate `n` random numbers between 0 and 1. Please create a name called `n` and set it equal to 500 to simulate 500 random coin flips. Then use the name `n`, along with `np.random.rand(n)`, to generate 500 random numbers between 0 and 1. Save these random numbers to the name `rand_nums`. 

Finally, to see what these numbers look like, visualize `rand_nums` using a histogram. 
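A sketch of the number-generating part of this step is shown below, using the names `n` and `rand_nums` from the text. The values change on every run because they are random, but they should always be at least 0 and less than 1.

```python
import numpy as np

n = 500
rand_nums = np.random.rand(n)   # n random numbers, each at least 0 and below 1

print(len(rand_nums))                    # 500
print(rand_nums[:5])                     # the first five values (different every run)
print(rand_nums.min(), rand_nums.max())  # both fall between 0 and 1

# plt.hist(rand_nums) is one way to draw the histogram
```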


In [154]:
# Use np.random.rand() to generate n = 500 random numbers between 0 and 1, and visualize them as a histogram. 








#### Step 2: Count the number of "heads"

Next, create a name called `prob` which holds the probability that a coin flip is a "head". Let's set `prob` equal to .5 to simulate flipping a fair coin. Then count how many of the `rand_nums` are less than the `prob` value to see how many of your coin flips were heads; i.e., use `np.sum()` on the comparison to count the heads. 
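To see why summing a comparison counts the heads, the sketch below uses a small fixed array (the made-up name `example_nums`) instead of random numbers. The comparison produces an array of Booleans, and `np.sum()` treats each `True` as 1 and each `False` as 0.

```python
import numpy as np

example_nums = np.array([0.12, 0.73, 0.45, 0.91, 0.08])
prob = 0.5

is_head = example_nums < prob   # [ True False  True False  True]
num_heads = np.sum(is_head)     # each True counts as 1

print(is_head)
print(num_heads)  # 3
```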



In [155]:
# Set prob to .5 and count how many values are less than prob





#### Step 3: Creating the function flip_coins(n, prob)

Now write the function `flip_coins(n, prob)` by taking the code you wrote in the previous two steps and turning it into a function. 

Then try out the function a few times and see how the number of "heads" you get varies from simulation to simulation, and also experiment with different values for the arguments `n` and `prob`. 
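One possible version of the function is sketched below. It simply wraps the two earlier steps in a `def` statement; your own solution may be organized differently.

```python
import numpy as np

def flip_coins(n, prob):
    """Simulate n coin flips and return the number of heads."""
    rand_nums = np.random.rand(n)      # one random number per flip
    return np.sum(rand_nums < prob)    # a flip is a head when its number is below prob

print(flip_coins(500, 0.5))  # varies from run to run, usually near 250
print(flip_coins(500, 0.9))  # usually near 450
```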


In [156]:
# Create a function flip_coins(n, prob) that generates n random numbers and returns how many are less than prob







#### Step 4: Adding an additional argument to the function 

Let's add an additional parameter to the `flip_coins` function called `return_prop` which has a default value of `False`; i.e., the function should now be `flip_coins(n, prob, return_prop = False)`. If `return_prop` is set to `True`, then the function should return the proportion of coin flips that were heads rather than the number of coin flips that were heads. 

Hint: Adding a conditional statement to your function could be useful. 
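A sketch of one way to add the `return_prop` argument is shown below. It is written as a complete function so that it can be run on its own, and it is only one of several reasonable ways to structure the conditional.

```python
import numpy as np

def flip_coins(n, prob, return_prop=False):
    """Simulate n coin flips; return the number (or proportion) of heads."""
    num_heads = np.sum(np.random.rand(n) < prob)

    if return_prop:
        return num_heads / n   # proportion of flips that were heads
    return num_heads           # default: the number of heads

print(flip_coins(500, 0.5))                    # a count, usually near 250
print(flip_coins(500, 0.5, return_prop=True))  # a proportion, usually near 0.5
```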



In [157]:
# Add an argument return_prop that, when set to True, returns the proportion of coin flips that were heads (rather than the number of heads)



