# Python: Conditionals, data structures, functions

Notebook developed by Sam Maurer

This demo introduces and reviews a range of lower-level Python code structures. You'll learn how to use *conditionals* to run blocks of code only when certain conditions are met, how to loop through code and through data like lists and dictionaries, how to define your own functions, and when to use low-level data structures vs. Pandas.

## 1. Conditionals (if/then)

RealPython [tutorial](https://realpython.com/python-conditional-statements/)

###  `if <expression>:`

This evaluates an expression, and only runs the subsequent indented code if the expression is True. You'll most commonly use this embedded inside other logic, in order to perform different actions depending on what some input looks like.

Optionally, an `if ... :` block can be followed with an `else:` block, which runs if the original `<expression>` evaluates to False.

In [None]:
x = 17

In [None]:
if (x == 17):
    print("it's 17")
    
else:
    print("it's not 17")

Try changing the value of `x` and then re-running the two cells. 

Note that Python evaluates the expression component of the `if ... :` line and treats the result as either True or False. A comparison like this one produces one of those two values directly:

In [None]:
print(x==17)

Because `x==17` is an *expression* rather than a value, Python will always evaluate it before passing it to another function.

In [None]:
type(x==17)

An `if ... :` block can also be followed with one or more `elif ... :` blocks (standing for "else-if"):

In [None]:
x = 0.0

In [None]:
if (x == 0):
    print(x, "is zero")

elif (x < 0):
    print(x, "is negative")
    
elif (x > 0):
    print(x, "is positive")
    
else:
    print(x, "is not a number")
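
The final `else:` branch can actually be reached. A floating-point NaN ("not a number") compares as False against everything, so all three tests fail. A minimal sketch, where the variable `label` is introduced only for illustration:

```python
x = float("nan")  # NaN: a special float value meaning "not a number"

# Every comparison involving NaN is False, so none of the tests below pass
print(x == 0, x < 0, x > 0)

if x == 0:
    label = "is zero"
elif x < 0:
    label = "is negative"
elif x > 0:
    label = "is positive"
else:
    label = "is not a number"

print(x, label)
```

Note that a non-numeric value such as a text string would not reach the `else:` branch; comparing it with `<` would raise an error instead.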

## 2. Loops

RealPython [tutorial](https://realpython.com/python-for-loop/)

### `for <i> in range(<int>):`

This runs the subsequent indented code `<int>` number of times, while giving you access to a count variable named `<i>`.

You can think of `range(<int>)` as generating a list of integers from 0 to `<int>`-minus-one.

In [None]:
list(range(10))

In [None]:
for i in range(5):
    print(i/100)
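
`range` also accepts a starting value and, optionally, a step size. A short sketch of the three call forms, with arbitrary numbers and illustrative variable names:

```python
counts = list(range(5))          # stop only: starts at 0
print(counts)                    # [0, 1, 2, 3, 4]

shifted = list(range(2, 7))      # start and stop: stop is still excluded
print(shifted)                   # [2, 3, 4, 5, 6]

stepped = list(range(0, 20, 5))  # start, stop, and step
print(stepped)                   # [0, 5, 10, 15]
```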

Note that in programming, just like in math, it's a convention to use `x`, `y`, or `z` to represent numerical variables, and `i`, `j`, or `k` to represent counts.

It's best not to use `l` as a variable, because it looks similar to the number 1 in monospaced fonts.

### Exercise

Loop through the numbers from 0 to 99 and print the ones that are evenly divisible by 15. 

(Tip: `a % b` gives you the remainder when `a` is divided by `b`)
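
A quick illustration of the remainder operator, using arbitrary numbers rather than the ones from the exercise:

```python
print(17 % 5)   # 2, because 17 = 3*5 + 2
print(20 % 5)   # 0, so 20 is evenly divisible by 5

# A remainder of zero is the usual test for divisibility
divisible = (20 % 5 == 0)
print(divisible)
```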

## 3. Working with lists

RealPython [tutorial 1](https://realpython.com/python-lists-tuples/), [tutorial 2](https://realpython.com/python-for-loop/)

### `for <item> in <list>:`

As we've seen, this runs the subsequent indented code for each item in the list, referring to it using the variable named `<item>`.


### `for (<i>, <item>) in enumerate(<list>):`

This variation counts the items as it loops through them, giving you a count variable named `<i>`.

In [None]:
for city in ['New York', 'Los Angeles', 'Chicago']:
    print(city)

In [None]:
for (i, city) in enumerate(['New York', 'Los Angeles', 'Chicago']):
    print(i, '-', city)

### `break`

Invoking this keyword forces Python to exit the current loop. You might use this for looking at a small portion of a large dataset. But it's risky to rely on a `break` statement as the only way out of a loop, because if it *doesn't* get triggered, your code might keep running indefinitely. (The "stop" button at the top of the notebook window can save you!)

In [None]:
for i in range(1000000000000000000000000):
    print(i)
    
    if (i>5):
        break
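
One way to limit that risk is to loop over something finite, so the loop ends on its own even if the `break` is never reached. A minimal sketch that previews the first few items of a list; the list contents and the `count` variable are made up for illustration:

```python
cities = ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']

count = 0
for (i, city) in enumerate(cities):
    if i >= 3:
        break            # stop after the first three items
    print(city)
    count = count + 1    # track how many items were shown

print(count, "cities shown")
```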

### `[]`

This is an empty list.

### `<list>.append(<item>)`

This appends an item to a list.

### `<list> + <list>`

This concatenates (joins) two lists together into a single list.

In [None]:
x = []

In [None]:
x.append(0)
x.append(5)
x

In [None]:
x + [10, 15]
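
One difference worth noting: `append` changes the list in place, while `+` builds a new list and leaves the originals untouched. A small sketch with arbitrary variable names:

```python
a = [0, 5]
b = a + [10, 15]   # builds a new list; a is unchanged
print(a)           # [0, 5]
print(b)           # [0, 5, 10, 15]

a.append(20)       # modifies a in place
print(a)           # [0, 5, 20]
```

To keep the result of `+`, assign it to a variable, as with `b` above.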

## 4. Working with dictionaries

RealPython [tutorial](https://realpython.com/python-dicts/)

### `for <key> in <dict>:`

This loops through the keys of a dictionary; inside the loop, you can look up the matching value with `<dict>[<key>]`. For example, you might use this to convert dictionaries into lists so that you can work with them in a table.

In [None]:
d = {'AL': 'Alabama', 
     'AK': 'Alaska', 
     'AZ': 'Arizona'}

postcodes = []
names = []

for k in d:
    postcodes.append(k)
    names.append(d[k])

print(postcodes)
print(names)
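
An alternative sketch of the same kind of loop uses the dictionary's `items()` method, which yields each key together with its value, so no separate lookup is needed. The small dictionary here is illustrative:

```python
states = {'CA': 'California', 'NY': 'New York'}

codes = []
full_names = []

# items() gives (key, value) pairs in insertion order
for (k, v) in states.items():
    codes.append(k)
    full_names.append(v)

print(codes)       # ['CA', 'NY']
print(full_names)  # ['California', 'New York']
```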

### Exercise

Here's the earthquake data from last class:

In [None]:
import json
import requests

endpoint_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
response = requests.get(endpoint_url)
data = json.loads(response.text)

quakes = [q['properties'] for q in data['features']]

# properties of the most recent earthquake
quakes[0]

The `quakes` variable is a list of dictionaries. Print the 'place' for the first quake.

Print the 'place' for all the quakes with magnitude greater than 5.2.

Did any earthquakes have a tsunami associated with them?

Print the 'place' for the first **ten** quakes with magnitude greater than 4. (There are a few different ways to do this using the syntax we've learned, but it will take a little creativity!)

## 5. Writing your own functions

### `def <funcname>(<args>):`

This defines a function named `<funcname>`. When called, it will run all the subsequent indented code.

A function can specify one or more arguments to accept. Each argument must have a *name*, which will be used to refer to it inside the function. You should describe what the expected data looks like in comments or in a docstring (the triple-quoted text at the top of the function body).

You can provide default values for arguments by using `<arg>=<value>` instead of `<arg>`. This makes the argument *optional*: users don't have to provide it if they're happy with the default. Optional arguments must be listed at the end, after any required arguments.
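
A minimal sketch of a required argument followed by an optional one. The function `scale` and its argument names are made up for illustration:

```python
def scale(value, factor=2):
    """
    Multiplies 'value' (a number) by 'factor' (a number, default 2).

    """
    return value * factor

print(scale(10))            # 20, using the default factor
print(scale(10, 3))         # 30, factor passed by position
print(scale(10, factor=5))  # 50, factor passed by name
```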

In [None]:
def isdog(animal):
    """
    Shoddy function for identifying dogs. 'animal' should be a string.
    
    """
    answer = False
    
    if animal in ['retriever', 'beagle', 'bulldog', 'labrador']:
        answer = True
    
    return answer

In [None]:
isdog('beagle')

In [None]:
isdog('horse')

A more serious example:

In [None]:
def primes(nmax=50):
    """
    Finds and returns a list of all the prime numbers less than nmax. 
    Default nmax is 50 if not specified.
    
    """
    L = []
    for n in range(2, nmax):
        for factor in L:
            if n % factor == 0:
                break
        else:
            L.append(n)
    return L
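
Note that the `else:` in `primes` lines up with the inner `for`, not with the `if`. A loop's `else:` block runs only when the loop finishes without hitting `break`. A stripped-down sketch of that behavior, with arbitrary numbers and a `results` list added only to record what happened:

```python
results = []

for n in [9, 7]:
    for factor in [2, 3, 5]:
        if n % factor == 0:
            results.append((n, factor))   # found a factor, so leave the loop early
            break
    else:
        # Reached only if the inner loop never hit 'break'
        results.append((n, None))

print(results)  # [(9, 3), (7, None)]
```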

You can run the function by calling `primes()`, `primes(<int>)`, or `primes(nmax=<int>)`

In [None]:
primes()

### Exercises

Write a function that takes a number between 1 and 26 and returns the associated letter of the alphabet. (Recall that you can retrieve characters out of a text string just like you would items from a list.)

Write a function that takes a list of numbers and returns the first value greater than 10, or else 0 if there isn't one.

## 6. Lists vs. Series

One of the advantages of using a Pandas `Series` instead of a native Python `list` is that you can apply calculations to all the elements at once.

This means less code, and it's also more efficient computationally. So in practice, you'll mostly use list manipulation as a way to prepare data for Pandas.

In [None]:
L = list(range(10))
L

How can we add 1 to each entry? `L + 1` will give you an error.

One approach:

In [None]:
L2 = []

for val in L:
    L2.append(val + 1)
    
L2

Or, more cryptically, using a *list comprehension*:

In [None]:
[(val + 1) for val in L]

But in Pandas, math operates on the individual values. Whenever possible, this is the better way to do things!

In [None]:
import pandas as pd

S = pd.Series(L) + 1

print(S.values)  # .values gives the underlying NumPy array, which prints more compactly than the Series

For more complicated transformations, you can also apply a function to a Series, running it automatically on each row:

### `pd.Series.apply(<function>)`, documentation [here](https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.Series.apply.html)

This runs a function on each value in a Series. The function should take a single value and return a single value. (More complicated transformations are possible too.)

In [None]:
# Set up a function that performs a numerical transformation

prime_list = primes(10000)

def nth_prime(n):
    """
    Returns the prime at position n in prime_list, counting from 0
    (so n=0 gives 2). Works for n up to around 1200.
    
    """
    return prime_list[n]

In [None]:
locations = pd.Series([14, 50, 85, 97, 148, 740, 390, 999])

print(locations.values)

In [None]:
print(locations.apply(nth_prime).values)
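
A smaller sketch of the same pattern, comparing `apply` with ordinary Series math. The function `double` is hypothetical and deliberately trivial; for arithmetic this simple, the direct version would normally be preferred, and `apply` is more useful for logic that can't be written as a single expression:

```python
import pandas as pd

def double(v):
    """
    Takes a single number and returns twice its value.

    """
    return v * 2

s = pd.Series([1, 2, 3])

applied = s.apply(double)  # runs double() once per value
vectorized = s * 2         # same result using Series math

print(applied.tolist())    # .tolist() converts the Series to a plain Python list
print(applied.equals(vectorized))
```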

### Exercise

Write a function that takes a number and returns a category name: "Negative", "Positive", or "Really big". (You can decide where the cutoffs are.)

Then apply the function to a column of data.