# NM Supercomputing Challenge 2020

## Session #3 (Jon Wheeler): Data workflows, automation, and visualization with Pandas and Matplotlib

This notebook is modified from the "Data Workflows and Automation" and "Data Ingest and Visualization - Matplotlib and Pandas" lessons in the Data Carpentry module, _Data Analysis and Visualization in Python for Ecologists_. The lessons are available at <http://www.datacarpentry.org/python-ecology-lesson/>, and are licensed with a Creative Commons Attribution (CC-BY-4.0) license: <https://creativecommons.org/licenses/by/4.0/>. The requested citation is:

> John Gosset, April Wright (eds): "Data Carpentry Python Ecology lesson."
> Version 2017.04.0, April 2017,
> <http://www.datacarpentry.org/python-ecology-lesson/>

Before we begin, let's go ahead and re-import the libraries we're using for this workshop:

In [50]:
# We will keep this commented out for now, but if needed it is possible to install packages from within a Jupyter notebook.
# The syntax is:
#     !pip install <package>

#!pip install pandas

In [51]:
import pandas as pd     # 'pd' is an alias that lets us reference pandas functions and methods without having to type out 'pandas' every time
import os

## Workflows & automation

Recap: What are some of the benefits of using Python? Everyone is encouraged to jot down some reflections or thoughts in this space. (Double-click this cell to edit it and add text.)

> .
> .
> .

Loops and functions allow us to build efficiencies into our analyses by repeating actions and re-using bits of code. They also reduce the likelihood of the mistakes and errors we might make when we perform repetitive tasks manually.

## Loops

In [52]:
# For loops

# Syntax
# for item in collection:
#    do something

# The "collection" we iterate through can be any one of a number of data structures or objects, but we often use for loops to iterate
# over lists. Let's start with that:

animals = ['lion', 'tiger', 'crocodile', 'vulture', 'hippo']

# Print the list to screen:
print(animals)

['lion', 'tiger', 'crocodile', 'vulture', 'hippo']
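The comment above notes that a for loop can iterate over many kinds of collections, not only lists. As a small sketch (the variables `animal_name`, `sizes`, and `habitats` are made up for illustration), here is the same loop syntax applied to a string, a tuple, and a dictionary. The loop variables are deliberately named differently from `animal` so that running this cell does not change the value printed later in this notebook.

```python
# A string is a collection of characters
animal_name = 'hippo'
for letter in animal_name:
    print(letter)

# A tuple behaves just like a list in a loop
sizes = ('small', 'medium', 'large')
for size in sizes:
    print(size)

# Looping over dict.items() gives (key, value) pairs
habitats = {'lion': 'savanna', 'crocodile': 'river'}
for species, habitat in habitats.items():
    print(species, 'lives in the', habitat)
```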


In [53]:
# Referring to the syntax above, "the collection" is the list we have assigned to the variable "animals."
# The items in the list are the individual animals we have named - lion, tiger, etc.

# The following loop will print out the name of each animal. How does the output differ from the 'print(animals)' command above?
# The loop variable is "animal" - we can call the loop variable almost anything, but we recommend using a term consistent with the collection.
for animal in animals:
    print(animal)

lion
tiger
crocodile
vulture
hippo


In [54]:
# Note that the loop variable still exists after the for loop has completed, and has the value of the last item in the collection:

print(animal)

hippo


## Exercise

Run the next two code blocks, then answer the questions below:

1. What happens if we don’t include the `pass` statement?
2. Rewrite the loop so that it prints the animals separated by commas instead of on new lines (Hint: you can concatenate strings using a plus sign. For example, `print(string1 + string2)` outputs ‘string1string2’).


In [55]:
for creature in animals:
    pass

In [56]:
print('The loop variable is now: ' + creature)

The loop variable is now: hippo


## Functions

Loops are handy, but a loop runs only at the point in the code where it is written. Using the example above, if we are developing a script in which the list of animals is continually being modified, we might find ourselves adding a new for loop every time we want to print the animal names or do something else. Imagine, for example, a process like this:

```
# A list of family pets:
pets = ['dog', 'cat', 'lizard']

# Let's print the list for reference:
for pet in pets:
    print(pet)

# Hooray, we got a parakeet!
pets.append('parakeet')

# Oh no, the cat ran away!
pets.remove('cat')

# Now we need to re-print the updated list...
for pet in pets:
    print(pet)
```

This is an artificial example, but it demonstrates that we often need to update lists or other objects and repeat certain actions on one or several objects. Functions allow us to create reusable bits of code that can be called as needed from any point in a script.
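Applied to the pets example, a minimal sketch might wrap the printing loop in a function (the name `print_pets` is our own choice, not part of the lesson) so that re-printing the list after each change takes a single line:

```python
def print_pets(pet_list):
    """Print each pet in the list on its own line."""
    for pet in pet_list:
        print(pet)

# A list of family pets:
pets = ['dog', 'cat', 'lizard']
print_pets(pets)

# Hooray, we got a parakeet!
pets.append('parakeet')

# Oh no, the cat ran away!
pets.remove('cat')

# Re-printing the updated list is now one function call
print_pets(pets)
```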

The syntax for a function is:

```
def function_name(input_argument_1, input_argument_2):
    
    # Do things. The next line is an example.
    combined_args = str(input_argument_1) + " " + str(input_argument_2) 
    
    # Return a result
    return combined_args
```
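As a quick sketch of the template in action, here it is defined and called with two arguments of different types (the argument values are arbitrary). Because both arguments are converted with `str()`, the function returns a single string:

```python
def function_name(input_argument_1, input_argument_2):
    # Convert both arguments to strings and join them with a space
    combined_args = str(input_argument_1) + " " + str(input_argument_2)
    # Send the combined string back to whoever called the function
    return combined_args

result = function_name('Supercomputing', 2020)
print(result)   # prints: Supercomputing 2020
```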

In [57]:
# This is the example function provided in the Carpentries lesson.
# Note that we have to execute this code block before we can call the function.

def this_is_the_function_name(input_argument1, input_argument2):

    # The body of the function is indented
    # This function prints the two arguments to screen
    print('The function arguments are:', input_argument1, input_argument2, '(this is done inside the function!)')

    # And returns their product
    return input_argument1 * input_argument2

In [58]:
product_of_inputs = this_is_the_function_name(2, 5)

The function arguments are: 2 5 (this is done inside the function!)


In [59]:
print('Their product is:', product_of_inputs, '(this is done outside the function!)')

Their product is: 10 (this is done outside the function!)


In [60]:
# We can re-use the function as needed.

new_product = this_is_the_function_name(98, 1876)

The function arguments are: 98 1876 (this is done inside the function!)


In [61]:
print('Their product is:', new_product, '(this is done outside the function!)')

Their product is: 183848 (this is done outside the function!)


In [62]:
# Combining loops and functions allows us to get a lot of work done with just a few lines of code.
# For example, we can create a short function to return the square of a number:

def calculate_square(number):
    return number * number

In [63]:
# Now we can call the function as part of a loop:

for n in range(1, 21, 2):
    n_squared = calculate_square(n)
    print("The square of", n, "is", n_squared)

The square of 1 is 1
The square of 3 is 9
The square of 5 is 25
The square of 7 is 49
The square of 9 is 81
The square of 11 is 121
The square of 13 is 169
The square of 15 is 225
The square of 17 is 289
The square of 19 is 361
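Printing is only one option. A small variation (a sketch; the `squares` list is our own addition) stores each result in a list so it can be used later in the program:

```python
def calculate_square(number):
    return number * number

# Start with an empty list and append one result per loop iteration
squares = []
for n in range(1, 21, 2):
    squares.append(calculate_square(n))

print(squares)   # prints: [1, 9, 25, 49, 81, 121, 169, 225, 289, 361]
```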


## Exercise

1. Change the values of the arguments in either of the functions above and check the output.
2. Try calling one of the functions with the wrong number of arguments, or without assigning the function call to a variable (that is, leave off `product_of_inputs =`).
3. Declare a variable inside the function and test to see where it exists (Hint: can you print it from outside the function?)
4. Explore what happens when a variable inside the function and a variable outside the function have the same name. What happens to the global variable when you change the value of the local variable?
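If you want to check your answer to question 4, here is one small sketch of how a local variable and a global variable with the same name behave (the names `total` and `add_ten` are made up for this example):

```python
total = 100   # a global variable, defined outside any function

def add_ten(number):
    # Assigning to 'total' here creates a new local variable;
    # the global 'total' is not touched
    total = number + 10
    return total

result = add_ten(5)
print(result)   # prints: 15 (the value of the local 'total')
print(total)    # prints: 100 (the global 'total' is unchanged)
```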


## Conditionals

Another way we can control the flow of execution within our programs is to create statements that allow us to execute different commands depending on conditions that we set. Conditional statements are also known as "if" statements.

In [64]:
a = 5

if a < 0:
    print("a is a negative number")

elif a > 0:                           # Think of "elif" as "else if." You can use as many elif statements as needed to test multiple conditions. 
    print("a is a positive number")
    
else:                                 # Use "else" for any cases that don't satisfy any other conditions.
    print("a must be zero!") 

a is a positive number
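One detail worth knowing about `if`/`elif`/`else` chains: Python checks the conditions from top to bottom and runs only the first branch whose condition is true. The sketch below (the function name `size_label` is hypothetical) shows why the order of the conditions matters:

```python
def size_label(a):
    # Only the first branch whose condition is True is executed
    if a > 10:
        return "greater than 10"
    elif a > 5:
        return "greater than 5"   # skipped when a = 15, even though 15 > 5
    else:
        return "5 or less"

print(size_label(15))   # prints: greater than 10
print(size_label(7))    # prints: greater than 5
print(size_label(3))    # prints: 5 or less
```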


In [65]:
# If statements can be used in for loops:

for n in range(-10, 11):
    if n < 0:
        print(n, "is a negative number")
    elif n > 0 and n < 5:
        print(n, "is greater than zero but less than 5.")
    elif n == 5:
        print(n, "is equal to 5.")
    elif n > 5 and n < 10:
        print(n, "is greater than 5 but less than 10.")
    elif n == 10:
        print(n, "is equal to 10.")
    elif n > 10:
        print(n, "is greater than 10.")
    else:
        print(n, "must equal zero!")

-10 is a negative number
-9 is a negative number
-8 is a negative number
-7 is a negative number
-6 is a negative number
-5 is a negative number
-4 is a negative number
-3 is a negative number
-2 is a negative number
-1 is a negative number
0 must equal zero!
1 is greater than zero but less than 5.
2 is greater than zero but less than 5.
3 is greater than zero but less than 5.
4 is greater than zero but less than 5.
5 is equal to 5.
6 is greater than 5 but less than 10.
7 is greater than 5 but less than 10.
8 is greater than 5 but less than 10.
9 is greater than 5 but less than 10.
10 is equal to 10.
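Python also allows chained comparisons, so a condition like `n > 0 and n < 5` can be written as `0 < n < 5`. Here is a sketch of the same loop in that style. It leaves out the `n > 10` branch, which can never run here because `range(-10, 11)` stops at 10, so it prints the same lines as the cell above:

```python
# Same loop as above, written with chained comparisons
for n in range(-10, 11):
    if n < 0:
        print(n, "is a negative number")
    elif 0 < n < 5:                      # equivalent to: n > 0 and n < 5
        print(n, "is greater than zero but less than 5.")
    elif n == 5:
        print(n, "is equal to 5.")
    elif 5 < n < 10:                     # equivalent to: n > 5 and n < 10
        print(n, "is greater than 5 but less than 10.")
    elif n == 10:
        print(n, "is equal to 10.")
    else:
        print(n, "must equal zero!")
```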


In [66]:
# If statements can also be used in functions:

def compare(numeric_argument_1, numeric_argument_2):
    if numeric_argument_1 > numeric_argument_2:
        print(numeric_argument_1, "is greater than", numeric_argument_2)
    elif numeric_argument_1 < numeric_argument_2:
        print(numeric_argument_1, "is less than", numeric_argument_2)
    else:
        print(numeric_argument_1, "is equal to", numeric_argument_2)

In [67]:
compare(2, 987)

2 is less than 987


In [69]:
# Note that the arguments can themselves be mathematical expressions: Python evaluates them first, then passes the results to the function.

compare(1324/6.8726, 1.259 * 198)

192.6490702208771 is less than 249.28199999999998
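Because `compare` only prints its result, the comparison cannot be reused elsewhere in a program. A sketch of an alternative (the name `compare_values` is hypothetical) returns a string instead, which we can then print, store in a variable, or check:

```python
def compare_values(value_1, value_2):
    """Return a sentence describing how value_1 compares to value_2."""
    if value_1 > value_2:
        relation = "is greater than"
    elif value_1 < value_2:
        relation = "is less than"
    else:
        relation = "is equal to"
    # Build the sentence with string concatenation
    return str(value_1) + " " + relation + " " + str(value_2)

message = compare_values(2, 987)
print(message)   # prints: 2 is less than 987
```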
