# Welcome to the **GW Libraries Programming with Python workshop.**

### Quick tips

Here are some handy shortcuts when working in a Jupyter notebook:
- Shift-Return (Shift-Enter) re-runs the cell you're on.
- Esc, then A inserts a cell above where you are.
- Esc, then B inserts a cell below where you are.
- More shortcuts are listed under Help --> Keyboard Shortcuts.

You will probably get some errors while working through this notebook.  That's okay: just go back, change the cell, and re-run it.

The notebook auto-saves as you work, just like Gmail!

# Variables

The first thing we're going to do is to learn about **variables**.  Variables are a way to store values that can be numbers, text, lists, boolean (true/false), etc.

Much like in algebra, you set the value of a variable using what looks like an equation:


In [None]:
price = 7.99

So let's see the effect of that, by having Python **evaluate** our variable called `price`:

In [None]:
price

Next, set the value of a new variable to your name.   A couple of things you'll need to know:
* Variable names can't contain spaces or special characters (except `_`), and they can't start with a digit.
* String values (like words or text) must be contained in a pair of either matching single quotes or matching double quotes.

Try using non-ASCII characters (such as accented letters) in your string.  Python 3 handles these well.
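Here is a minimal sketch of one way to do this. The variable names `name` and `city` and their values are made up for illustration:

```python
# Strings can use single or double quotes, as long as the pair matches
name = "Zoë Müller"   # hypothetical name containing non-ASCII characters
city = 'São Paulo'    # single quotes work the same way

print(name)
print(city)
```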

What if we wanted to do some simple math using these variables?

* Create a new variable called `quantity` and set it to a number.
* Create another new variable called `extended_price` and set it to the product of price \* quantity (use `*` for multiplying)
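One possible version of this exercise is sketched below. The quantity of 3 is an arbitrary choice:

```python
price = 7.99
quantity = 3                        # any number will do here
extended_price = price * quantity   # * multiplies the two values

print(extended_price)
```

Don't be surprised if the result shows a few extra decimal places; that is normal for floating-point arithmetic.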


What happens if you try multiplying the name variable by 4?

How about if you *divide* it by 2?  Add a number to it?  Add another name to it (using `+`)?
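As a rough sketch of what to expect (the names used here are just examples), multiplying and adding strings works, while dividing a string or adding a number to it raises a `TypeError`:

```python
name = "Dan"

print(name * 4)              # repeats the string: DanDanDanDan
print(name + " and Laura")   # + joins two strings together

# Dividing a string, or adding a number to it, is not allowed
try:
    name / 2
except TypeError as err:
    print("Can't divide a string:", err)

try:
    name + 4
except TypeError as err:
    print("Can't add a number to a string:", err)
```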

What we're starting to see here is that Python has different **types** for numbers, text, and more.  You can find out the type of a variable by using:

**`type(myvariablename)`**

Try it.

Numbers, by the way, aren't just numbers.  Look at the type of the `price` variable versus another variable set to `7`.

You can even re-assign a variable to a new type.  Try it!
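A short sketch of both ideas follows. The variable `count` is an assumed name for the "another variable set to `7`":

```python
price = 7.99
count = 7

print(type(price))   # <class 'float'>
print(type(count))   # <class 'int'>

# Re-assigning the same variable to a value of a different type
count = "seven"
print(type(count))   # <class 'str'>
```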

There's also another important basic variable type, called a boolean.  Booleans can have a value of **`True`** or **`False`** (capitalized, in Python).

Create a variable called `gwstudent` and assign its value to `True`:

Now evaluate the `type()` of gwstudent.

#### Comparison and logical operators:  <, >, ==, !=, <=, >=, and, or, not, ...

Try out some comparisons, for example, whether:
* `price` is greater than 5.99
* `name` is equal to "Dan"
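For instance, assuming `price` and `name` hold the values shown here, the comparisons might look like this:

```python
price = 7.99
name = "Dan"

print(price > 5.99)                     # True
print(name == "Dan")                    # True
print(price > 5.99 and name != "Dan")   # False: the second part is False
print(not price <= 5.99)                # True
```

Each comparison evaluates to a boolean, so you can store the result in a variable just like any other value.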

# Lists and Tuples

**Lists** in Python hold an _ordered_ sequence of elements, like this:

`states = ['Virginia', 'Maryland', 'New Jersey', 'Utah', 'Rhode Island']`

Try creating a list of several countries.

We can access elements of the list using the `mylist[n]` notation.

Try retrieving the first country in the list you created above.

Notice that in Python, the first element of the list is really the "0th" element (this is not the case when programming in R!)

You can also access parts of the list using syntax like: `[0:2]`, `[:2]`, `[3:]`, `[-2]`, `[-2:]`.  See if you can figure out what these do.
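If you want to check your guesses, here is a sketch using the `states` list from above:

```python
states = ['Virginia', 'Maryland', 'New Jersey', 'Utah', 'Rhode Island']

print(states[0])     # first element: 'Virginia'
print(states[0:2])   # elements 0 and 1: ['Virginia', 'Maryland']
print(states[:2])    # same as [0:2]
print(states[3:])    # from element 3 to the end: ['Utah', 'Rhode Island']
print(states[-2])    # second from the end: 'Utah'
print(states[-2:])   # the last two elements: ['Utah', 'Rhode Island']
```

Note that a slice like `[0:2]` stops *before* index 2.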

Lists work with useful built-in functions, like `len(mylist)`, which returns the length of the list.

What if you wanted to add an element to a list?  Or remove an element from a list?  Try using:

`mylist.append(<the new element>)`
for example, `states.append('New York')`

There are also list methods to do things like insert, remove, sort, and reverse.  You can also use the `+` operator to add lists together. You can even multiply a list by a number, similar to how earlier we multiplied a string by a number!

Try adding a list of two new states to your list.
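A sketch of these list operations, with arbitrarily chosen extra states:

```python
states = ['Virginia', 'Maryland', 'New Jersey', 'Utah', 'Rhode Island']

states.append('New York')            # adds one element to the end
states = states + ['Ohio', 'Texas']  # + joins two lists into a new list

print(len(states))      # 8
print(sorted(states))   # a sorted copy; states itself is unchanged
print(['*'] * 3)        # multiplying repeats the list: ['*', '*', '*']
```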

Strings (like your name variable) also have some list-like behaviors, because they're sequences of individual characters.  How might you get the n'th character from a string?

### Tuples

Notice that we created a list using [] square brackets.  If we use () parentheses, we create what's called a *tuple*.  A tuple might be something like this:

In [None]:
colors = ('blue', 'green', 'red')

Notice that tuples are like lists: you can access elements, etc.  But what if you try to append to a tuple?
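Here is a sketch of what happens. Tuples have no `append` method, so the attempt raises an `AttributeError`:

```python
colors = ('blue', 'green', 'red')

print(colors[0])   # indexing works just like a list: 'blue'

try:
    colors.append('yellow')
except AttributeError as err:
    print("Tuples can't be changed in place:", err)

# You can still build a new, longer tuple from an existing one
more_colors = colors + ('yellow',)
print(more_colors)
```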

### The "in" operator

We can use `in` to see whether an element is found in a list.  Try running this:

In [None]:
'yellow' in colors

# Dictionaries

A **Dictionary** is a container that holds pairs of objects - keys and values.  Keys are usually strings or numbers, though other hashable types, such as tuples of strings, also work.  A value may be any type, whether an integer, string, boolean, list, etc.  It can even be another dictionary!

Here's an example of a dictionary:

In [None]:
workshop = {'name': 'Programming With Python', 'duration': 2,
                 'instructors': ['Dan', 'Laura', 'Justin', 'Ian'],
                 'awesome': True}

So dictionaries are very similar to lists, except that they're indexed with keys.

Dictionaries are actually data structures that can represent objects in **JSON** (JavaScript Object Notation) format, which today is a very common way of representing data!

How do you think you would add an item to a dictionary?  Try adding a 'location' item.  How about replacing an item?

How do you think you would access the value of `'instructors'` key from the `workshop` dictionary?
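One way to approach these questions is sketched below. The `'location'` value is a hypothetical placeholder:

```python
workshop = {'name': 'Programming With Python', 'duration': 2,
            'instructors': ['Dan', 'Laura', 'Justin', 'Ian'],
            'awesome': True}

workshop['location'] = 'Gelman Library'   # a new key adds an item (value is made up)
workshop['duration'] = 3                  # an existing key replaces the old value

print(workshop['instructors'])            # look up a value by its key
```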

### Challenge

How might you add two more names to the list of instructors?  (Challenge:  Try to do this in one line, *without* just overwriting the list of instructors)

We can also look up whether a certain key exists in a dictionary.  How might you evaluate whether or not `workshop` contains a `location` key?
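A possible solution to both questions, using two made-up instructor names:

```python
workshop = {'name': 'Programming With Python', 'duration': 2,
            'instructors': ['Dan', 'Laura', 'Justin', 'Ian'],
            'awesome': True}

# extend() adds several elements to the existing list in one line
workshop['instructors'].extend(['Ana', 'Ben'])   # hypothetical names
print(workshop['instructors'])

# in checks the dictionary's keys
print('name' in workshop)       # True
print('location' in workshop)   # False: this fresh copy has no 'location' key
```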

# Comments

Comment lines start with `#`.  They don't execute any code, but it's a very good idea to comment your code so that the reader (which might be your future self) can understand anything that's not already obvious from your clear and well-written code.

In [None]:
# This is a comment and is here for the reader's benefit

## Iteration (looping)

Iteration allows us to repeat a section of code, and to step through a list (or other "iterable") at the same time.  **`for`** and **`while`** create loops, like this:

```
numbers = [4, 6, 0, 5.5, 3]
for n in numbers:
    print("The next number in the list, squared is ", n**2)
    
print("We're done iterating -- notice that this line isn't indented, so it's outside the loop")
```    

This brings up the topic of **indentation**!  The block (i.e., lines) of code that you're iterating over needs to be indented.  In Python, you should indent by 4 spaces.

Try creating a list of exam scores that we'd like to grade on a curve.  Use `max()` to find out the highest score.  Then create an iteration that prints out the score as well as the score graded on a curve (by dividing by the highest score).
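A minimal sketch of this exercise, with invented scores. It multiplies by 100 so the curved score reads as a percentage, which is an assumption about the desired output:

```python
scores = [72, 88, 95, 61, 79]   # made-up exam scores
highest = max(scores)

curved_scores = []
for score in scores:
    curved = 100 * score / highest   # the highest score becomes 100
    curved_scores.append(curved)
    print(score, "curves to", round(curved, 1))
```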


Iterating over dictionaries is slightly different:


In [None]:
for key, value in workshop.items():
    print("The key is ", key, " and the value is ", value)

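Notice that dictionaries are indexed by key rather than by position.  Since Python 3.7, iterating over a dictionary yields the items in the order they were inserted; in older versions of Python, there was no guarantee about the order.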

## Conditionals

Conditional statements, using `if`, `else`, and `elif` (a contraction of "else if"), allow us to execute a block of code only if a condition is true.

Create some code that iterates through a list of exam scores.  If a score is 70 or above, it should print out "Pass!"  If it's below 70, it should print out "Fail!"
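One way this could look, again with invented scores:

```python
scores = [72, 88, 95, 61, 70]   # made-up exam scores

results = []
for score in scores:
    if score >= 70:
        result = "Pass!"
    else:
        result = "Fail!"
    results.append(result)
    print(score, result)
```

Because the test is `>=`, a score of exactly 70 passes.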

## Functions

We've already seen some built-in functions, such as `print()`, `max()`, and `len()`.  What if we want to write our own functions?

Defining part of a program in Python as a function is done using the `def` keyword. For example, a function that takes two arguments and returns their sum can be defined as:

In [None]:
def add_function(a, b):
    result = a + b
    return result

z = add_function(20, 22)
print(z)

Here's a function that will normalize a score.  We can keep reusing it.

In [None]:
def normalize(score, factor):
    return 100*score/factor

normalize(86, 96)

If we specify which parameter is which, we can pass them in any order we like:

In [None]:
normalize(factor=96, score=86)

Key points here:

- definition starts with **`def`**
- function body is indented
- **`return`** keyword precedes returned value


### Challenge

Can you create a function called `pad` that takes a list and pads it with some new item if it's shorter than the length you want?  You would use it like this:

```
names = ['Ali', 'Bob', 'Carla']
names = pad(names, 5, '*')
```
Then `names` would evaluate to:
```
['Ali', 'Bob', 'Carla', '*', '*']
```
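Here is one possible solution, offered as a sketch rather than the only right answer. The parameter names `items`, `length`, and `filler` are my own choices:

```python
def pad(items, length, filler):
    """Return a copy of items, padded with filler up to the given length."""
    missing = max(0, length - len(items))   # never negative
    return items + [filler] * missing

names = ['Ali', 'Bob', 'Carla']
names = pad(names, 5, '*')
print(names)   # ['Ali', 'Bob', 'Carla', '*', '*']
```

This version returns a new list rather than changing the original, and it leaves a list alone if it is already long enough.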

# Data with Pandas

Since Pandas is a Python library (i.e. not part of what comes "built in") we need to **import** it as a library.

In [None]:
import pandas as pd

### The data we're using today

For this lesson, we will be using the Portal Teaching data (https://figshare.com/articles/Portal_Project_Teaching_Database/1314459), a subset of the data from Ernst et al., "Long-term monitoring and experimental manipulation of a Chihuahuan Desert ecosystem near Portal, Arizona, USA" (http://onlinelibrary.wiley.com/doi/10.1890/15-2115.1/abstract).

This section will use the surveys.csv file that can be downloaded here:  https://ndownloader.figshare.com/files/2292172

Each row records the species and weight of each animal caught in plots in the study area.

The columns represent:

| Column | Description |
| --- | --- |
| record_id | Unique ID for the observation |
| month | Month of observation |
| day | Day of observation |
| year | Year of observation |
| plot_id | ID of a particular plot |
| species_id | 2-letter code |
| sex | Sex of animal ("M", "F") |
| hindfoot_length | Length of the hindfoot in mm |
| weight | Weight of the animal in grams |

Each time we call a function that's in a library, we use the syntax *LibraryName.FunctionName*. Adding the library name with a `.` before the function name tells Python where to find the function. In the example above, we have imported Pandas as `pd`. This means we don't have to type out `pandas` each time we call a Pandas function.

Let's use Pandas' built-in function that reads in a CSV file:

In [None]:
pd.read_csv("surveys.csv")

That read our CSV file, but we'd like to store it as an **object**.  So we'll create a variable for it, called `surveys_df`.  This is just like how we used a variable above to store an integer, or a string, or a list, or a dictionary.  We're just storing a Pandas DataFrame object instead.

Make sure to run the cell below:

In [None]:
surveys_df = pd.read_csv("surveys.csv")

Try evaluating `surveys_df`:

In [None]:
surveys_df 

How would you now check what **class** (type) of object `surveys_df` is?

`surveys_df` is a Pandas **DataFrame**.   A DataFrame is a 2-dimensional structure that can store data in rows and columns - similar to a spreadsheet or a table, but with some other nice features.  (Yes, it's very similar to a `data.frame` in R.)

Just like the Pandas *library* has functions, *objects* can have **functions** of their own, called methods (which may take arguments), and **attributes** (which don't).

A Pandas DataFrame object has an attribute called `dtypes` which lists out the type of each column:

In [None]:
surveys_df.dtypes

Try these to see what they do:

    surveys_df.columns
    surveys_df.head()
Also, what does `surveys_df.head(15)` do, versus `surveys_df.head(4)`?

    surveys_df.tail()
    surveys_df.shape
    
Take note of the output of the `shape` attribute. What format does it return the shape of the DataFrame in?


In [None]:
surveys_df.shape

Let's see what type `surveys_df['species_id']` is.  Try it.

In [None]:
type(surveys_df['species_id'])

You can think of a Pandas **Series** as a series of observations of one variable.  In many ways it behaves like a Python list.

We can also slice and dice -- similar to how we selected parts of list objects above.  What does the next line do?

In [None]:
surveys_df[3:10]

Another way to select is with `.loc`, which selects based on *labels* (as opposed to `.iloc`, which selects using *integer positions*).  Try this:

In [None]:
surveys_df.loc[[3, 10, 12], ['day', 'year', 'species_id']]
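To see how `.loc` and `.iloc` differ, here is a small sketch using a made-up DataFrame. `toy_df` and its values are assumptions for illustration and are not taken from surveys.csv. Its row labels are deliberately different from its row positions:

```python
import pandas as pd

# A tiny invented table whose row labels are 10, 20, 30 instead of 0, 1, 2
toy_df = pd.DataFrame(
    {'day': [16, 16, 17],
     'year': [1977, 1977, 1978],
     'species_id': ['NL', 'DM', 'PF']},
    index=[10, 20, 30],
)

print(toy_df.loc[20, 'species_id'])           # row *labelled* 20: 'DM'
print(toy_df.iloc[1, 2])                      # *second* row, *third* column: 'DM'
print(toy_df.loc[[10, 30], ['day', 'year']])  # two labelled rows, two named columns
```

In `surveys_df` the default row labels happen to match the positions, which is why the two can look interchangeable there.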

We can also use `.query` to select only rows matching certain conditions.  Note that the query expression is in single quotes.

In [None]:
surveys_df.query('hindfoot_length < 10')

**Challenge**:  How might you query to get back only rows with `hindfoot_length < 10` **and** `weight > 10` in ONE expression?  (There is more than one way to accomplish this!)
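Two possible approaches are sketched below on a small invented DataFrame. `toy_df` and its values are assumptions so that the example runs without surveys.csv:

```python
import pandas as pd

toy_df = pd.DataFrame({
    'hindfoot_length': [8.0, 9.0, 35.0, 7.0],
    'weight': [12.0, 5.0, 40.0, 11.0],
})

# Option 1: combine both conditions inside one query string
with_query = toy_df.query('hindfoot_length < 10 and weight > 10')

# Option 2: combine two boolean masks with & (each in parentheses)
with_mask = toy_df[(toy_df['hindfoot_length'] < 10) & (toy_df['weight'] > 10)]

print(with_query)
print(with_mask)
```

Both options should select the same rows; which one reads better is largely a matter of taste.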

Pandas has a handy function (well, it has many handy functions!) to get all the unique elements in the column:

In [None]:
unique_species = pd.unique(surveys_df['species_id'])

Try evaluating the **`.size`** attribute on the above result to see how many unique species there are in the data set.

In [None]:
unique_species.size

We see from above that we can also isolate just the data in one column.  Let's try isolating the `weight` column, and calling the **`describe()`** function to get some statistics on it.

In [None]:
surveys_df['weight'].describe()

Pandas can also group data based on the values in a column:

In [None]:
grouped_by_sex = surveys_df.groupby('sex')

Try running **`describe()`** on `grouped_by_sex`:

Now we're going to create some series with:

* The number of animals observed per species
* The mean weight of all animals observed in each species

In [None]:
# a series with the number of samples by species
species_counts = surveys_df.groupby('species_id')['record_id'].count()
# a series with the mean weight by species
species_mean_weights = surveys_df.groupby('species_id')['weight'].mean()
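To get a feel for what these two lines produce, here is a sketch of the same pattern on a tiny invented table. `toy_df` and its values are assumptions, not real survey data:

```python
import pandas as pd

toy_df = pd.DataFrame({
    'record_id': [1, 2, 3, 4],
    'species_id': ['DM', 'DM', 'NL', 'NL'],
    'weight': [40.0, 44.0, 150.0, None],   # one animal was not weighed
})

toy_counts = toy_df.groupby('species_id')['record_id'].count()
toy_means = toy_df.groupby('species_id')['weight'].mean()

print(toy_counts)   # one row per species: DM 2, NL 2
print(toy_means)    # missing weights are skipped: DM 42.0, NL 150.0
```

Each result is a Series indexed by `species_id`, which is what makes it convenient to plot.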

Let's try creating some quick bar charts.  First we need to make sure figures appear inline in the notebook:

In [None]:
%matplotlib inline

And now we'll create some quick charts:

In [None]:
species_counts.plot(kind='bar', title="TITLE GOES HERE")

In [None]:
species_mean_weights.plot(kind='bar')

In [None]:
surveys_df.plot(kind='scatter', x='hindfoot_length', y='weight', 