# Introduction to Python

## Part I: Basics of Python Programming

This document is heavily based on the Python-Novice-Gapminder lesson developed by Software Carpentry, and the original lesson can be found online at http://swcarpentry.github.io/python-novice-gapminder/

Throughout this Jupyter Notebook file, code and comments going through the lesson are already written. You should follow along and run code block by block; this is so that you have a version of the code that won't fail because of simple things like spelling mistakes or missing characters (which happens when you feverishly try to type what's happening up front). Don't worry; there will be opportunities to practice as we move along. If at any point when following along some pre-provided code fails, please don't hesitate to ask one of the instructors for help.

A few more notes: unlike R, which we also have workshops in, Python is a general-purpose programming language and has fewer conveniences than R for data analysis. This means that programming concepts such as loops and if-statements, which we usually don't cover in R, are far more important in Python. Still, in the interest of time, we're going to prioritize data analysis (specifically working with the `pandas` and `matplotlib` packages) before we move on to these other topics.

Given that you are reading this in Jupyter notebook, the cell containing this text is called a Markdown cell, in contrast to a Code cell, which will contain, you guessed it, our code. You can create a cell by clicking an existing cell above or below where you want the new cell, then clicking 'Insert' (near the top of this page) and choosing one of the options. Most people use Jupyter notebook and these cells to write a kind of interactive report that contains the code to perform the analysis of the report inside it; we'll be using it for some comments.

You'll also need to know to click 'Run' to run the code in a cell. There are some useful keyboard shortcuts you can also use - Shift-Enter runs a cell, while Alt-Enter creates a new cell (shortcut may differ on Mac).

In [None]:
# There are other types of comments; this cell is a Code cell and everything in here will be executed when you
# run this cell. However, it's common in programming languages to write helpful messages to yourself and other
# readers throughout the code that aren't meant to be executed. In Python we can create a comment by typing '#';
# everything on that line after the '#' will be ignored and not executed.

In [None]:
# Let's start with some very simple math
2 + 2

If you ran the cell above you should now see `4` as the output. One thing to note about Jupyter notebook cells is that a cell only displays the value of its last line.

In [None]:
3 + 3 # you never see the answer as there is code beneath this
5 + 5 # last line; so you'll see 10

If you run the previous cell you'll see that we never get the answer for `3 + 3`; that's because it wasn't the last line of the cell.

In [None]:
# This isn't to say that the first lines don't get run; you just don't see the output
x = 5
x + 2

Above we ran two lines of code and both were run; we assigned `5` to the variable (think label) `x`, and then we added `2` to `x` (which is `5`) to get `7`. `x = 5` was still run despite not being the last line.

### Mathematical operations

* Addition: use `+`
* Subtraction: use `-`
* Multiplication: use `*`
* Division: use `/`
* Power: use `**`

In [None]:
3 + 2 # addition

In [None]:
3 - 2 # subtraction

In [None]:
3 * 2 # multiplication

In [None]:
3 / 2 # division

In [None]:
3**2 # 3 to the power of 2

The order of operations in Python follows the same order of operations as in algebra.

In [None]:
# Is this 3 + (2*3) = 9 or (3+2) * 3 = 15? 
3 + 2 * 3

In [None]:
# Use parentheses to explicitly set the order
(3 + 2) * 3
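
The same rules extend to the other operators listed above. As a small sketch (the variable names here are only for illustration), `**` is evaluated before `*` and `/`, which in turn are evaluated before `+` and `-`:

```python
# Exponentiation is evaluated before multiplication,
# and multiplication/division before addition/subtraction.
power_first = 2 * 3 ** 2   # 3 ** 2 is evaluated first, then multiplied by 2
grouped = (2 * 3) ** 2     # parentheses force the multiplication first
mixed = 10 - 4 / 2 + 1     # division first, then left to right: 10 - 2.0 + 1

print(power_first)  # 18
print(grouped)      # 36
print(mixed)        # 9.0
```

Note that `/` always produces a decimal number, which is why the last result is `9.0` rather than `9`.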

## Variables
Just above we performed a variable assignment of `x = 5`; we can assign different types of values to different labels. The label name (or variable name) has a few rules though - it can only contain letters, digits, and the underscore; and it can't start with a digit.
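
One way to check these naming rules, as a quick sketch, is the string method `isidentifier` (methods are covered later in this lesson). The candidate names below are made up for illustration. Note that `isidentifier` only checks the spelling rules; it does not check whether a name is reserved by Python, such as `for`.

```python
# Each line asks: would this text be allowed as a variable name?
print('instructor_age'.isidentifier())   # True: letters and underscores
print('age2'.isidentifier())             # True: digits are fine after the first character
print('2nd_instructor'.isidentifier())   # False: starts with a digit
print('first name'.isidentifier())       # False: contains a space
print('e-mail'.isidentifier())           # False: contains a hyphen
```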

In [None]:
instructor_age = 22 # we are only as old as we think we are!
instructor_first_name = 'Brooke'
instructor_email = 'luetgert@uchicago.edu' # feel free to email any questions you might have later on

Notice that we defined text using `'`; this is because if we just typed `instructor_first_name = Brooke` then Python would look for a variable called `Brooke` which we didn't define (and might store something unrelated like a number); so the quotes tell Python that we mean text. You can also define text using `"`; Python treats it the same as `'` as long as the opening and closing quotes match.
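
Having two kinds of quotes is handy when the text itself contains a quote character. A minimal sketch, with example text chosen only for illustration:

```python
# Double quotes let the text contain an apostrophe without ending the string.
greeting = "That's clearly a fake ID"

# Single quotes let the text contain double quotes.
quote = 'She said "hello"'

# Either style produces the same kind of value.
same = 'Brooke' == "Brooke"

print(greeting)
print(quote)
print(same)  # True
```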

We can use the `print` function to display things on the screen (instead of just typing the variable name). `print` can also be used to get around Jupyter's limitation of only displaying the last line.

In [None]:
print(instructor_age)
print(instructor_email)

In [None]:
# Here print takes several arguments to print all the things on the screen in the same line
print(instructor_first_name, 'is', instructor_age, 'years old')

There were 4 'things' that we printed on the screen in the previous print statement; `instructor_first_name`, the text "is", `instructor_age`, and the text "years old".

Let's see what happens if we try to use a variable that hasn't been defined. Pay attention to the error message; in the future you might misspell a variable name and seeing this error will be very informative.

In [None]:
print(instructor_last_name)

`name 'instructor_last_name' is not defined`; which makes sense as we never defined a last name variable.

### Practice
Define two variables called `my_age` and `my_first_name` with your age and first name. Use `print` to print a message similar to the one I printed above, using your information.

Remember that there are two instructors available to help if you need it!

## Variables have Types
So far we've worked with just strings (text) and numbers, but there are many, many, other types of variables in Python. In this workshop today we're also going to work with datasets as variables and lists, but in general know that basically everything is a variable. Technically even functions are variables that we use. We can see the type of a variable by calling the `type` function on it.

In [None]:
type(instructor_age)

In [None]:
type(instructor_first_name)

In [None]:
type(print) # yes; print is technically a variable

Obviously some operations between variables have different meaning depending on the types involved.

In [None]:
instructor_first_name - instructor_age

I can't subtract a number from a string! Also pay attention to the error message as it describes exactly what we did wrong.

In some cases this is inconvenient, because a variable could be converted into another type.

In [None]:
other_age = '23' # note here I'm storing a number as text so we can't do math with it yet

We can convert the types (if possible) using the `int`, `float`, or `str` functions. `float` turns something into a decimal number, while `int` only turns something into a whole number.

In [None]:
instructor_age - int(other_age)

In [None]:
instructor_age - float(other_age)

In [None]:
str(instructor_age) + other_age # just to show that adding strings together joins them: '2223'
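
The "if possible" caveat matters in practice. The sketch below repeats the conversions in one place and shows what happens when the text does not look like a number. It uses `try`/`except`, which is not covered in this lesson, only so that the cell keeps running after the failed conversion.

```python
other_age = '23'  # text that happens to look like a number

print(int(other_age) + 1)    # 24
print(float(other_age) + 1)  # 24.0
print(int(23.9))             # 23: int() drops the decimal part instead of rounding
print(str(22) + other_age)   # 2223: two strings joined together

# Conversion only works when the text looks like a number.
try:
    int('twenty-three')
except ValueError as error:
    print('Could not convert:', error)
```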

## Strings
Strings (text) are common enough that they have special features. Here is some information about them.

We can get the length of a string by using the `len` function

In [None]:
helper_first_name = 'Daniel'
len(helper_first_name)

We can use a 'slice' to get a part of the string. Here we use square brackets and say we want the zeroth up to (but not including) the 3rd character. This is a bit confusing; in Python when you have a numbered collection, such as a list or a string, the first element is element 0, the next one is element 1, and so on. 

D  A  N  I  E  L

0  1  2  3  4  5

`0:3` corresponds to asking for elements 0, 1, and 2.

In [None]:
helper_first_name[0:3]

In [None]:
# Get the first letter
helper_first_name[0]

In [None]:
# Get the last letter
helper_first_name[len(helper_first_name) - 1]

### Question
Why did we have to subtract 1 from the length to get the last letter? What would have happened if we didn't subtract 1?

Answer: 


It turns out that Python has a shortcut with slices to go from some index to the end (or from the beginning to some index) by leaving one side of `:` blank. This way we could avoid typing something long like:

In [None]:
helper_first_name[3:len(helper_first_name)]

With something short like:

In [None]:
helper_first_name[3:]

In [None]:
# or to get the beginning of the word
helper_first_name[:3]
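
Python also accepts negative positions, which count from the end of the string. This is a small sketch of how they combine with the slice shortcuts; the variable names on the left are only for illustration.

```python
helper_first_name = 'Daniel'

last_letter = helper_first_name[-1]    # same as helper_first_name[len(helper_first_name) - 1]
last_three = helper_first_name[-3:]    # from the third-to-last character to the end
all_but_last = helper_first_name[:-1]  # everything except the last character

print(last_letter)   # l
print(last_three)    # iel
print(all_but_last)  # Danie
```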

## Methods
Most variables also have something called methods that are special functions that apply to that variable.

In [None]:
# upper turns it upper case
helper_first_name.upper()

In [None]:
# lower case
helper_first_name.lower()

In [None]:
# swap case
helper_first_name.swapcase()

In [None]:
# Replace wherever 'iel' appears with 'ny'
helper_first_name.replace('iel', 'ny')

In [None]:
helper_first_name

Notice that these methods didn't actually modify `helper_first_name` but instead created a new string that we could have assigned to a new variable. Sometimes methods will modify their variable; other times they won't. It depends on the method and the object. For strings they won't modify it.
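
To make the difference concrete, here is a short sketch contrasting a string method, which returns a new value, with the list method `append`, which changes the list itself. The names `shouted` and `friends` are just for this example.

```python
helper_first_name = 'Daniel'

# String methods return a new string; the original is left alone.
shouted = helper_first_name.upper()
print(helper_first_name)  # Daniel
print(shouted)            # DANIEL

# Some list methods change the list in place and return nothing useful.
friends = ['Daniel', 'Neil']
result = friends.append('Shaun')
print(friends)  # ['Daniel', 'Neil', 'Shaun']
print(result)   # None
```

If you want to keep the result of a string method, assign it to a variable as done with `shouted` above.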

We can look at the help page for a method, function, or variable by adding `?` before or after its name in Jupyter, or by using the `help` function.

In [None]:
# Will bring up in Jupyter a help file
?helper_first_name.replace

In [None]:
# Will print the help file below the cell
help(helper_first_name.replace)

In [None]:
# Try it on a function
help(print)

We can look at the methods and sub-objects (attributes) available on a variable by using the `dir` command; ignore everything that starts with `_`

In [None]:
dir(helper_first_name)

There's a lot of output here but if you look you can see the same functions we just used like `lower` and `upper`. We can then check the help files for whatever method looks useful, like we did for `replace`. 

### Practice
Using the `my_first_name` variable from the previous practice, experiment with:
* Making your name upper case
* Making your name lower case
* Get just the first 2 letters of your name
* Read the help page for the `count` method on strings and describe what it does. Try experimenting with it.


## Functions

A function is a block of code which only runs when it is called.

You can pass data, known as parameters, into a function.

A function can return data as a result.


In [None]:
print(instructor_first_name, 'is', instructor_age, 'years old')

In [None]:
len(instructor_first_name)

In [None]:
help(len)

In [None]:
dir(instructor_first_name)

Here are some other helpful functions

`max`, `min`, and `round`

In [None]:
# Can take several numbers directly
print(max(5,3,8,2))
print(min(5,3,8,2))

In [None]:
# Or can take a list of numbers
some_numbers = [5.24,3.12,8.92,2.42]
print(max(some_numbers))
print(min(some_numbers))

In [None]:
print(round(6.7357))
print(round(6.7357, 2))
print(round(6.7357, -1))
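
The second argument to `round` is the number of decimal places to keep; a negative value rounds to tens, hundreds, and so on. The sketch below annotates the calls above and adds one behaviour that often surprises people: exact halves are rounded to the nearest even number.

```python
print(round(6.7357))      # 7: no second argument, so the result is a whole number
print(round(6.7357, 2))   # 6.74: keep 2 decimal places
print(round(6.7357, -1))  # 10.0: round to the nearest ten

# Exact halves go to the nearest even number rather than always up.
print(round(2.5))  # 2
print(round(3.5))  # 4
```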

In [None]:
# Worth noting that round(some_numbers) won't work
round(some_numbers) # will give error

For-loops can be helpful to apply a function to every item in a list.

## For-Loops
Now that we have lists, we might want to do something to every item inside. As we saw with `round`, it's not always easy. There we might want to round all the numbers and get a new list back. In other cases we might just want to *do* something with the items inside, for example saying "Hello" to all the names in a list. Manually writing code to do the same thing for element 0, element 1, and so on is not feasible.

There are two variants for for-loops; the simpler version is shown first.
### Variant 1

In [None]:
instructors_friends = ['Daniel', 'Neil', 'Shaun']
for friend in instructors_friends:
    print("Hello", friend)

A few things to note - 
1. We start a for loop with the `for` keyword
2. We define a temporary variable (in this case `friend`) which will be each element from the list
3. We use the `in` keyword to specify which list we're going to use
4. We end that line with a `:`; this tells Python we're now going to start our code
5. On the next line(s) we **indent** the code (either using a tab or spaces); this tells Python we're still in the for-loop. Once the indentation ends the for-loop ends. The length of the indentation doesn't matter, as long as it's consistent in the entire loop.

Here's an example of a for-loop with multiple lines to show the indentation

In [None]:
for friend in instructors_friends:
    friend_upper_case = friend.upper()
    print("Hello", friend_upper_case)
    print("Goodbye", friend_upper_case)
    print() # print blank line
print("For-loop completed; this line is run after the for-loop finishes")

In many cases we might want to loop over a range of numbers, say from 1 to 100. We can use the `range` function to help with this instead of creating a long list. Note that `range(a, b)` runs from `a, a+1, a+2, ...., b-2, b-1`; `b` doesn't get included.

In [None]:
sum_hundred = 0
for i in range(1, 101):
    sum_hundred = sum_hundred + i

print(sum_hundred)
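
To see exactly which numbers `range` produces, we can convert it to a list. This is a small sketch; the built-in `sum` function is used only to double-check the loop above.

```python
print(list(range(1, 6)))  # [1, 2, 3, 4, 5]: the end value 6 is not included
print(list(range(5)))     # [0, 1, 2, 3, 4]: with one argument, counting starts at 0

# The built-in sum function gives the same total as the for-loop.
total = sum(range(1, 101))
print(total)  # 5050
```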

Remember when we were working with `some_numbers` and we wanted to `round` each of them? We can use a for-loop to solve it.

In [None]:
rounded_some_numbers = []
for number in some_numbers:
    rounded_number = round(number)
    rounded_some_numbers.append(rounded_number)
    
rounded_some_numbers

### Variant 2

It turns out that using for-loops to do something to everything in a list and then making a new list with that is so common that Python provides a special shortcut way of doing it. We use the list notation of square brackets to create a new list, but inside the list we instead have our command, followed by `for <variable> in <previous_list>`.

In [None]:
rounded_some_numbers = [round(number) for number in some_numbers]
rounded_some_numbers

Much easier than the previous method for this common case.
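
The same pattern works with any command, not just `round`. As a sketch, here are both variants side by side on a list of names (the list is re-created so the example runs on its own):

```python
instructors_friends = ['Daniel', 'Neil', 'Shaun']

# Variant 1: build the new list step by step
shouted_loop = []
for friend in instructors_friends:
    shouted_loop.append(friend.upper())

# Variant 2: the same thing in one line
shouted_short = [friend.upper() for friend in instructors_friends]

print(shouted_loop)                   # ['DANIEL', 'NEIL', 'SHAUN']
print(shouted_loop == shouted_short)  # True

# Any expression can go in front, for example a function call
name_lengths = [len(friend) for friend in instructors_friends]
print(name_lengths)  # [6, 4, 5]
```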

### Practice

* Write a for-loop to loop through `some_numbers` and print the number on the screen. You'll have to use **Variant 1**.
* Use a for-loop to create a new list containing each element of `some_numbers` divided by 5. Try to use **Variant 2**.
* Let `x = [1, 2, 5, 8]` and `y = [2, 5, 9, 0]`. Write a for-loop that creates a new list that is the sum of each element of `x` and `y` (so your new list should show `[3, 7, 14, 8]`).


## Libraries
In Python there are libraries to provide extra functionality. You might have noticed that those built-in functions of `max`, `min`, and `round` are pretty basic; what about getting `sin` or `cos` to do some trigonometry? We can get these functions by importing the `math` library.

In [None]:
# First to show that sin won't work without math
math.sin(3.14/2)

In [None]:
import math
print(math.pi)
print(math.sin(math.pi/2))

Once `import math` has been run, we can use `math` in any later cell. Sometimes library names can get really long; we can use something called *aliasing* to replace the name with a shorter version. This is also convenient if two libraries share the same name.

In [None]:
# aliasing
import math as m
print(m.pi)

If you're getting tired of writing `m.` all the time; say we are going to use `sin` all throughout our calculations, then we can import specific items from a library.

In [None]:
from math import sin, pi
sin(pi/2)

I know that I showed you 3 ways to import libraries and that it can be hard to remember all of them, so if you want permission to forget one or two of them, just remember the first way.

***After*** importing a library we can use `dir` to see what functions and values it has available.

In [None]:
dir(math)

We can also look at the help page for a library.

In [None]:
help(math)

### Practice

There's a library called `random` that can be used to generate random numbers. Import this package, look at its help file to find a function called `randint`, and use that function to generate a random number between 0 and 10.

## User-Defined Functions

Throughout this workshop there have been a few times where we've had to write our own code over and over again to do the same thing with only minor changes, like on a different dataset. For example, we converted the columns of a dataset to years by taking the last 4 characters and then converting it to an integer, and we did that again in the practice. 

Wouldn't it be nice if we could write it once then easily use it moving forward? That's what defining our own functions will help us do.

First let's start with something simple, a function that takes a name as an argument and prints a greeting to that person.
We start a function by writing `def`, the name of the function, and then parentheses with the labels of all the parameters that are needed. We then have a `:`, and then the code we want to run is on later lines, indented like we did with for-loops.

In [None]:
def say_hello(name):
    print('Hello', name)
say_hello("Joel")

Sometimes a function returns something back, like when we called `math.sin` we expected a number back.


In [None]:
def count_up_to(n):
    summation = 0
    for i in range(0, n+1):
        summation = summation + i
    return summation
count_up_to(10) # this runs, but isn't displayed because it's not the last line
count_up_to(100) # last line; so you'll see 5050

Note that this function isn't calling print; we can save the output here for later use.

In [None]:
sum_hundred = count_up_to(100)
sum_hundred/2
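
Returning to the motivating example at the start of this section, the "take the last 4 characters and convert to an integer" step could be wrapped in a function like the sketch below. The function name `year_from_label` and the column labels are made up for illustration; they are not taken from a real dataset in this notebook.

```python
def year_from_label(label):
    # Take the last 4 characters of the text and turn them into a whole number
    return int(label[-4:])

# Hypothetical column labels that end in a 4-digit year
column_labels = ['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962']

years = [year_from_label(label) for label in column_labels]
print(years)  # [1952, 1957, 1962]
```

Once the function is defined, the same conversion can be reused on any other list of labels without rewriting the slicing code.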

## If-statements
When we define our own functions we often want them to be flexible enough to handle different scenarios. We can use if-statements to define different 'paths' our code takes given different scenarios.

We start with `if`, then some condition that returns either `True` or `False` (below we use the `>` symbol to do a comparison). A `:` then denotes when the code starts, and we then have those lines of code indented like we did with for-loops and custom functions.
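
A condition is an ordinary expression, so we can run one on its own to see the `True` or `False` it produces. A quick sketch with a single made-up value:

```python
m = 3.54

print(m > 3.0)    # True: greater than
print(m < 3.0)    # False: less than
print(m >= 3.54)  # True: greater than or equal to
print(m == 3.54)  # True: equal to (note the double equals sign)
print(m != 3.54)  # False: not equal to

# The result can be stored in a variable like any other value
is_large = m > 3.0
print(type(is_large))
```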

Below here I use a for-loop as an illustration.

In [None]:
masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')

In this case we only do something when `m` is larger than `3.0`; we can use `else` to specify what happens when the condition is not met.

In [None]:
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')

We can use `elif` to specify additional conditions.

In [None]:
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
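
The order of the conditions matters: Python checks them from top to bottom and runs only the first branch whose condition is `True`. As a sketch of what goes wrong when the order is swapped, the loop below checks `m > 3.0` first, so the `'HUGE'` branch can never be reached. The list `labels_wrong_order` is just a name chosen for this example.

```python
masses = [3.54, 2.07, 9.22, 1.86, 1.71]

labels_wrong_order = []
for m in masses:
    if m > 3.0:
        labels_wrong_order.append('large')
    elif m > 9.0:
        # Never reached: any m above 9.0 is also above 3.0,
        # so the first branch has already been taken.
        labels_wrong_order.append('HUGE')
    else:
        labels_wrong_order.append('small')

print(labels_wrong_order)  # ['large', 'small', 'large', 'small', 'small']
```

Putting the most restrictive condition first, as in the cell above, avoids this problem.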

Here's an if statement integrated in a custom function.

In [None]:
def sell_beer(true_customer_age):
    if true_customer_age >= 19:
        print('Would you like a receipt with that?')
    elif true_customer_age >= 13:
        print("That's clearly a fake ID")
    else:
        print('Where are your parents?')

In [None]:
sell_beer(25)

In [None]:
sell_beer(17)

In [None]:
sell_beer(4)

### Practice

Define a function that takes a grade from 0 to 100 and returns a letter grade.
* If the grade is less than 50, return an 'F'
* If the grade is at least 50 but no more than 70, return a 'C'
* If the grade is greater than 70 but no more than 85, return a 'B'
* If the grade is greater than 85, return an 'A'.