# What have we covered so far?


* Week 1 - Introduction to Jupyter
    * JupyterHub
    * Jupyter Notebooks
    * Markdown
* Week 2 - Python Programming
    * Python Syntax
    * Values & Data types
    * Statements & Expressions
    * Variables
    * Operators
    * Errors
* Week 3 - Control Flow
    * Boolean Expressions
    * Logical Operators
    * Conditional Statements
    * Functions
    * Modules
* Week 4 - Lists & Iteration
    * Lists
    * List methods
    * While Loops
    * For loops
* Week 5 - Collections
    * Strings
    * More lists
    * Dictionaries
    * Indexing into Nested Collections
* Week 6 - Files
    * Opening files
    * File types (CSV, JSON, MARC)
* Week 7 - Debugging
    * Debugging
    * Exception Handling
    
![It is a lot to process](https://media.giphy.com/media/dvmhOt9jWPKxwc4mY3/giphy.gif)

## The Purpose of this Notebook

This notebook provides a (hopefully) helpful summary of everything you have learned so far in one easy-to-read place. Use it as a kind of "cheat sheet" (and feel free to modify it) for most of the Python topics we have covered up to this point.

---

## Jupyter Notebooks

Notebooks are made of cells:

* Markdown Cells - For writing human readable, stylized text
* Raw Text Cells - For plain old text
* Code Cells - For executable Python code

## Markdown

Markdown cells are a way to write **formatted** text for *human consumption*.

See this [Markdown Cheat Sheet](https://www.markdownguide.org/cheat-sheet/) to help you learn the Markdown syntax.




## Python Programming

![Python Expressions](https://images.slideplayer.com/39/10997989/slides/slide_2.jpg)

Crudely: An expression evaluates to a value. A statement does something.

Expressions are evaluated and replaced by the value they produce. The cells below show, one step at a time, how Python evaluates the `print` statement in the first cell.


In [None]:
x = 3.145
print("The answer is: " + str(round(x * 5, 1)))

In [None]:
x = 3.145
print("The answer is: " + str(round(3.145 * 5, 1)))

In [None]:
x = 3.145
print("The answer is: " + str(round(15.725, 1)))

In [None]:
x = 3.145
print("The answer is: " + str(15.7))

In [None]:
x = 3.145
print("The answer is: " + "15.7")

In [None]:
x = 3.145
print("The answer is: 15.7")

### Data Types 

The four basic Python data types we have used so far are:
* Integer 
* Float 
* String
* Boolean

In [None]:
# int is a whole number
3

In [None]:
# float has a dot and decimals
3.1415926535897931

In [None]:
# string has quotes around it
"Hello, what a wonderful day!"

In [None]:
# boolean true is capital T True
True

In [None]:
# boolean false is capital F False
False

### Variables

Use the *assignment operator* (`=`) to assign a *value* to a *variable*.

In [None]:
# assign the integer value 3 to the variable a_var
a_var = 3

In [None]:
# A variable is an expression that evaluates to its value
a_var # 3

### Variable Names and Python Keywords

* There are rules and conventions for naming variables (also called *identifiers*) in Python.
* Rules:
    * Variable names cannot begin with numbers.
    * Variable names cannot contain spaces.
    * Variable names are case sensitive.
    * Variable names cannot be one of the Python *keywords*

```python
and       continue  finally   is        raise
as        def       for       lambda    return
assert    del       from      None      True
async     elif      global    nonlocal  try
await     else      if        not       while
break     except    import    or        with
class     False     in        pass      yield
```
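The naming rules above can be checked from code. The sketch below uses the standard library `keyword` module and the string method `isidentifier()`; the candidate names are made up purely for illustration.

```python
import keyword

# Hypothetical candidate names, chosen only for illustration
candidates = ["a_var", "2fast", "my var", "class", "Total"]

results = {}
for name in candidates:
    # isidentifier() checks the character rules (no leading digit, no spaces)
    # iskeyword() checks whether the name is reserved by Python
    results[name] = name.isidentifier() and not keyword.iskeyword(name)
    print(name, "->", results[name])
```

`Total` passes because capital letters are allowed by the rules; avoiding them is only a convention.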

* Conventions:
    * Don't start variables with capital letters.
    * Use underscores instead of spaces.
    * Use descriptive variable names.
    * Read [PEP 8 -- Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)



### Mathematical Operators
* All of the mathematical operators you know and love are available in Python
    * Addition: `+`
    * Subtraction: `-`
    * Multiplication: `*`
    * Division: `/`
    * Modulus: `%`

In [None]:
# addition
5 + 3

In [None]:
# subtraction
5 - 3

In [None]:
# modulus - gives you the remainder of 5 divided by 3
5 % 3

In [None]:
10 % 7 # 10 divided by 7 leaves a remainder of 3

### String Operators

* You can use some mathematical operators (`+`) on string types
* This operator allows you to *concatenate* strings together

In [None]:
# String concat
"John" + "Do'h"

In [None]:
# don't forget the space
"John" + " " + "Doe"

In [None]:
# no - subtraction is not defined for strings, so this raises a TypeError
"john" - "Doe"

### Changing Data Types

* If you want to do mathematical operations on your data, make sure your data values are the right type
* You can't add a string to an integer.
* You can't divide a float by a string.
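As a small sketch of what goes wrong and how a conversion fixes it, the example below tries to add a string to an integer, catches the resulting `TypeError`, and then converts the string first. The variable names and values are hypothetical.

```python
# Hypothetical values: one count stored as text, one stored as a number
books_text = "5"
books_number = 3

try:
    total = books_text + books_number  # str + int is not allowed
except TypeError as error:
    print("TypeError:", error)

# Convert the string to an integer first, then the addition works
total = int(books_text) + books_number
print(total)  # 8
```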

In [None]:
# convert string to integer
int("5")

In [None]:
# Convert integer to float 
float(5)

In [None]:
# convert integer to string
str(5)

In [None]:
# convert integer to boolean
bool(4)

In [None]:
# convert empty string to boolean
bool("")

In [None]:
# convert "False" string to boolean (True, because the string is not empty)
bool("False")

### Truthiness

* Any Python thing can be tested for a *truth value*
    * convert it with `bool()`
* The following conditions are considered false according to the [specification](https://docs.python.org/3/library/stdtypes.html#truth-value-testing)
    * constants defined to be false: `None` and `False`
    * zero of any numeric type: `0`(int), `0.0`(float), `0j`(complex), `Decimal(0)`([decimal](https://docs.python.org/3/library/decimal.html)), `Fraction(0, 1)`([fractions](https://docs.python.org/3/library/fractions.html))
    * empty sequences and collections: `''`, `()`, `[]`, `{}`, `set()`, `range(0)` (collections are reviewed later in this notebook)
* All other things evaluate to `True`
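A quick way to get a feel for these rules is to pass a handful of values to `bool()`. The two lists below are just illustrative samples, not a complete catalogue.

```python
# Sample values that test as false
falsy_values = [None, False, 0, 0.0, "", (), [], {}, set(), range(0)]
# Sample values that test as true, including some that look "empty-ish"
truthy_values = [1, -1, 0.1, "False", " ", [0], {"a": 1}]

for value in falsy_values:
    print(repr(value), "->", bool(value))

for value in truthy_values:
    print(repr(value), "->", bool(value))
```

Note that `[0]` is true: the list is not empty, even though the item inside it is false.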

## Control flow

Control flow allows you to insert decision points in your code to determine the flow of execution

### Comparing Values


We can use *comparison operators* (`==`, `!=`, `<`, `>`, `is`, `is not`) and *conditional statements* (`if`, `elif`, `else`) to *control the flow of execution*.
* If a condition is true then do this, else do that.
* We use *boolean expressions*, which evaluate to `True` or `False`, to articulate the yes/no questions.

In [None]:
# testing for equality
5 == 5

In [None]:
# Does five equal six?
5 == 6

In [None]:
# Does six not equal five?
6 != 5

* Note the **double equals** sign; a single equals sign would be a variable assignment.
* `==` is called a *comparison operator* and it has many friends:
```python
x != y               # x is not equal to y
x > y                # x is greater than y
x < y                # x is less than y
x >= y               # x is greater than or equal to y
x <= y               # x is less than or equal to y
x is y               # x is the same as y  # compares identity, not just value
x is not y           # x is not the same as y
```
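The difference between `==` and `is` is easiest to see with two lists that hold the same items. This is a minimal sketch with made-up variable names.

```python
list_a = [1, 2, 3]
list_b = [1, 2, 3]  # same contents, but a separate list object
list_c = list_a     # a second name for the very same object

same_value = list_a == list_b     # compares the contents
same_object = list_a is list_b    # compares identity
same_name_target = list_a is list_c

print(same_value)        # True
print(same_object)       # False
print(same_name_target)  # True
```

In everyday code `==` is almost always what you want; `is` is mostly used for checks such as `x is None`.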

### Logical Operators


Use `and` and `or` and `not` to create complex logical expressions

In [None]:
# Truth tables for and operator
print("True and True   - ", True and True)
print("True and False  - ", True and False)
print("False and True  - ", False and True)
print("False and False - ", False and False)

In [None]:
# Truth tables for or operator
print("True or True   - ", True or True)
print("True or False  - ", True or False)
print("False or True  - ", False or True)
print("False or False - ", False or False)

In [None]:
# Truth tables for not operator
print("Not True  - ", not True)
print("Not False  - ", not False)

### Conditional Execution

* Using these conditional expressions we can then let the program make decisions
* The `if`, `else`, and `elif` statements allow us to steer the direction of your program 
    * or  *control* the *flow* of execution
    

In [None]:
# Try changing the value of age
age = 900

if 0 <= age < 1:
    print("you a baby")
elif 1 <= age < 4:
    print("you a toddler")
elif 4 <= age < 10:
    print("you a kid")
elif 10 <= age < 13:
    print("you a tween")
elif 13 <= age < 18:
    print("you a teen")
elif 18 <= age < 120:
    print("You an Adult, welcome to misery.")
else:
    # anything outside the range 0 to 119 ends up here
    print("No one can live that long!")
    

### Functions

Functions are important because they allow for the re-use and modularization of your code. Functions have three important components:

* **Function name** - this is the name we use to "run" the block of code inside the function. Built-in functions such as `print` and `type` are examples of function names we have already seen. Choose names carefully: pick function names that make sense and are easy to recall.

* **Parameters** - these are the values or variables that a function accepts as input. When calling a function, values are "passed" to it by placing them in parentheses `()` after the function name. Functions can accept more than one parameter; these are separated by commas `,`.

* **Return values** - All functions "do some work", but not every function hands back a useful result. `print` is an example of a function that does not return a new value. Such functions are sometimes called "void" functions; in Python they actually return the special value `None`. Other functions do have return values: `type` returns the type of a value, `max` and `min` return the largest and smallest of the items they are given, and `len` returns an int.
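To see return values in action, you can store the result of a function call in a variable and inspect it. The sketch below uses only built-in functions; the variable names are made up.

```python
# print displays text on screen, but the value it returns is None
result = print("hello")
print(result)  # None

# these built-in functions return a value you can store and reuse
length = len("hello")                    # number of characters
largest = max(3, 7, 5)                   # largest of the numbers
smallest = min("pear", "apple", "fig")   # strings compare alphabetically

print(length, largest, smallest)  # 5 7 apple
```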

### Defining a Function

![Anatomy of a function](https://swcarpentry.github.io/python-novice-inflammation/fig/python-function.svg)

*Image used without permission. [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/08-func/index.html).*

In [None]:
# define a celsius to fahrenheit conversion function
def celsius_to_fahr(temp):
    return 9/5 * temp + 32 # the expression 9/5 * temp + 32 evaluates to a single value

In [None]:
# define a fahrenheit to celsius conversion function
def fahr_to_celsius(temp):
    return ((temp - 32) * (5/9))

In [None]:
# define a celsius to kelvin function
def celsius_to_kelvin(temp_c):
    return temp_c + 273.15

### Calling Functions

Calling a function will replace the entire function call with the value it returns. Function calls are *expressions*.

In [None]:
# call the function, passing it the parameter 100
celsius_to_fahr(100) # will return 212.0

In [None]:
# call the function, passing it the parameter 68
fahr_to_celsius(68) # will return 20.0

In [None]:
# call celsius_to_kelvin with the output of fahr_to_celsius
celsius_to_kelvin(fahr_to_celsius(68))

In [None]:
# same as above with the inner function evaluated
celsius_to_kelvin(20.0)

In [None]:
print("The boiling point of water in kelvin is: " + str(celsius_to_kelvin(fahr_to_celsius(212))) + "K")

### Methods are functions

Methods are a special kind of function associated with specific objects. Methods are called with a dot `.`, method name, and parentheses `()`. Methods will automatically be passed the value before the dot.

In [None]:
# string method upper
"boo".upper()

In [None]:
scary = "boo"
scary.upper()

In [None]:
my_list = [1,2,3,4]
my_list.append("cat")
my_list

In [None]:
my_list.append("cat")
my_list.append("cat")
my_list.append("cat")

# How many cats? call the count method
my_list.count('cat')

In [None]:
# len is not a method, it is a built-in function
# so you have to pass it the list
len(my_list)

### Modules

The [Python Standard Library](https://docs.python.org/3/library/index.html) is a collection of Other People's Python. You can use their functions, but you have to explicitly load the module that contains them with the `import` statement.

It is like going to a library to check out a book rather than reading a book you already have at home.

In [None]:
# check out the calendar module from the library
import calendar
calendar.prmonth(2021,2)

Modules and 3rd party libraries will be covered in more detail in the coming weeks.

## Iteration

Iteration allows you to process large amounts of data with only a few lines of code. There are two kinds of looping mechanisms in Python.

### While Loops
Use the `while` loop to repeat a *block* of code for as long as a *condition* is true. The loop stops once the condition becomes false.

```python
while <conditional expression>:
    <python statement>
    <another python statement>
```
Be careful with infinite loops!

```python
while 5 == 5: # 5 always equals 5
    print("Never run this code")
    print("It will make JupyterHub cry")

```
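The loop above never ends because its condition can never become false. One common safeguard, sketched here with a hypothetical `max_attempts` limit, is to count the passes through the loop and `break` out once the limit is reached.

```python
attempts = 0
max_attempts = 5  # hypothetical safety limit, pick whatever suits your task

while True:  # would loop forever without the break below
    attempts += 1
    print("attempt", attempts)
    if attempts >= max_attempts:
        break  # leave the loop immediately

print("stopped after", attempts, "attempts")
```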

In [None]:
n = 10
while n > 0:
    print(n)
    n = n - 1
print("blastoff")

### For Loops

Use the `for` loop to iterate over a sequence of items until you reach the end of the sequence.

![Anatomy of a for loop](https://i.imgur.com/91NoaP0.jpg)

*Image used without permission. [Dataquest](https://www.dataquest.io/blog/python-generators-tutorial/)*

In [None]:
# example from https://geo-python.github.io/site/notebooks/L3/for-loops.html

# create a list of weather conditions
weather_conditions = ['rain', 'sleet', 'snow', 'freezing fog', 'sunny', 'cloudy', 'ice pellets']

# loop over each element in the weather_conditions list 
for weather in weather_conditions:
    print(weather) # print the current value of the weather variable

## Collections

Python provides several data structures for containing and structuring collections of data.

### Lists
A sequence of values, with a numerical index starting at zero. Lists may be constructed in several ways:

* Using a pair of square brackets to denote the empty list: `[]`
* Using square brackets, separating items with commas: `[a]`, `[a, b, c]`
* Using the type constructor: `list()` or `list(sequence)`
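The three construction styles above can be sketched side by side. The values are arbitrary examples.

```python
empty = []                        # square brackets with nothing inside
animals = ["cat", "dog", "cow"]   # square brackets with comma-separated items
letters = list("abc")             # type constructor applied to a string
numbers = list(range(3))          # type constructor applied to a range

print(empty)    # []
print(animals)  # ['cat', 'dog', 'cow']
print(letters)  # ['a', 'b', 'c']
print(numbers)  # [0, 1, 2]
```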

In [None]:
# make a list of farm animals
matt_farm = ["barn cat", "cattle dog", "cows", "chickens", "ducks", "goats", "alpaca", "spiders"]

In [None]:
# add some horses and pigs to the farm
matt_farm.append("horses")
matt_farm.append("pigs")
# See my menagerie
matt_farm

### Indexing and Slicing a list

*Index* a list by putting square brackets and a number after the list name to extract a single value: `<listname>[<index>]`.

In [None]:
# get the item at index position two
matt_farm[2] # cows?

*Slice* a list by putting square brackets and two numbers separated by a colon after the list name: `<listname>[<start>: <end>]`. Note that start is inclusive and end is exclusive.

In [None]:
# fowl slice 
matt_farm[3:5]

Lists work nicely with `for` loops when used as the sequence to iterate over.

In [None]:
for animal in matt_farm:
    if animal == "spiders":
        continue # skip spiders
    else:
        print(animal) # print the value of animal

In [None]:
# use the remove list method to EXTERMINATE the spiders
matt_farm.remove("spiders")
matt_farm

![Burn the spider](https://media.giphy.com/media/MdPZFGgDL3PC8/giphy.gif)

### Strings

The string base datatype is technically a *sequence* of characters. Because it is a sequence, it can be looped over, indexed, and sliced, and like other objects it has methods.

In [None]:
my_string = "I am a string"
type(my_string)

In [None]:
# loop over a string as a sequence of characters
for character in my_string:
    print(character)

### Indexing and Slicing a String

In [None]:
# give me the character at index position 5
my_string[5]

In [None]:
# Create a new string
city_name = "Pittsburgh"
print(city_name[0:4])
print(city_name[-5:]) # slice starting 5 characters from the end and continuing to the end

### String Methods

Python provides a set of [String Methods](https://docs.python.org/3/library/stdtypes.html#string-methods) for transforming and manipulating strings. Like list methods, use the dot, method name, and parentheses surrounding parameters (if applicable) to call the method.

`<string>.<method>(<parameter>)`

In [None]:
# make a string of words and then split
my_string = "We built this city on rock and roll and steel n'at."
my_string.split() # split() defaults to splitting on whitespace

We can use string looping, slicing, and string methods to do some data science on strings.

In [None]:
# create a string of some classic song lyrics
lyric = "cash rules everything around me"
# create an empty list for storing letters
letters = []
# Use the split method to create a list of words and loop over the list
for word in lyric.split():
    # append the first letter of the word converted to uppercase to the list 
    letters.append(word[0].upper()) 

# Join all the items in the list together with periods
greatest_wu_tang_song = ".".join(letters)
print(greatest_wu_tang_song)

### Dictionaries

![Dictionary Picture](http://www.trytoprogram.com/images/python_dictionary.jpg)

*Image used without permission from [Trytoprogram](http://www.trytoprogram.com/python-programming/python-dictionary/)*

Dictionaries can be created by several means:

* Using a pair of curly braces to denote the empty dictionary: `{}`
* Use a comma-separated list of key: value pairs within braces: `{'jack': 4098, 'sjoerd': 4127}` or `{4098: 'jack', 4127: 'sjoerd'}`
* Use the type constructor: `dict()`, `dict([('foo', 100), ('bar', 200)])`, `dict(foo=100, bar=200)`
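As a quick check that these construction styles are interchangeable, the sketch below builds the same dictionary three ways and compares the results. The variable names are made up.

```python
from_braces = {"foo": 100, "bar": 200}
from_pairs = dict([("foo", 100), ("bar", 200)])  # list of (key, value) tuples
from_keywords = dict(foo=100, bar=200)           # keys written as keyword arguments

all_equal = from_braces == from_pairs == from_keywords
print(all_equal)  # True
```

The keyword form only works when the keys are strings that are also valid variable names.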

In [None]:
word_of_the_day = {} # create an empty dictionary
word_of_the_day['megillah'] = "a long involved story or account"
word_of_the_day['cognoscente'] = "a person who has expert knowledge in a subject"
word_of_the_day['perdure']  = "to continue to exist"

In [None]:
# what does megillah mean?
word_of_the_day['megillah']

You can loop over dictionaries too, but observe the looping variable will be the key so you still have to look up the value.

In [None]:
# loop over the keys in the dictionary
for word in word_of_the_day:
    # lookup each key
    print("The word", word, "means", word_of_the_day[word])
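If you would rather not look up each value by key, the dictionary method `items()` hands you the key and the value together on every pass. This sketch rebuilds the same dictionary so it can run on its own; the `lines` list is only there to collect the output.

```python
word_of_the_day = {
    "megillah": "a long involved story or account",
    "cognoscente": "a person who has expert knowledge in a subject",
    "perdure": "to continue to exist",
}

lines = []
# items() yields (key, value) pairs, unpacked here into two loop variables
for word, meaning in word_of_the_day.items():
    lines.append("The word " + word + " means " + meaning)

for line in lines:
    print(line)
```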

## Files

The [open() function](https://docs.python.org/3/library/functions.html#open) gives Python the power to programmatically read data from the hard drive. When you call this function, you get back a *file handle* that points to the file. Because you can read through a handle one line at a time, this is handy with *really big* files.

In [None]:
# open a file handle to a text file in the files directory
fileHandle = open('files/romeo.txt')

In [None]:
# iterate over each line in the file using the loop
numberOfLines = 0
for line in fileHandle :
    numberOfLines += 1
    
print(numberOfLines)

Always remember to close your files when done!

In [None]:
fileHandle.close()

To re-read a file you must re-open the file.

Use the `read()` method to read the entire contents of a file into a string. 

In [None]:
# open a file handle to a text file in the files directory
fileHandle = open('files/romeo.txt')

# read all of romeo
whole_file = fileHandle.read()


# display the whole_file variable
whole_file

In [None]:
# don't forget to close the file when you are done!
fileHandle.close()

Now we can work with the data without worrying about the file

In [None]:
neo_romeo = whole_file.replace('she', "JULIET").replace("She", "JULIET")
print(neo_romeo) # use print to see newlines


We have re-written the play, we should write it to disk.

Use the `with` statement to automatically close the file handle when the indented block ends.

In [None]:
# write a file to the current directory
with open("neo-romeo.txt", "w") as fileh:
    fileh.write(neo_romeo)
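The `with` statement works for reading as well as writing. The sketch below is an illustration only: it writes a short stand-in text to a temporary folder (so it runs anywhere, without the course files), reads it back, and then confirms that the handle was closed automatically.

```python
import os
import tempfile

# Stand-in text, not the contents of the course's romeo.txt
sample_text = "But soft what light\nthrough yonder window breaks\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # write the text; the file is closed when the block ends
    with open(path, "w") as file_handle:
        file_handle.write(sample_text)

    # read it back as a list of lines
    with open(path) as file_handle:
        lines = file_handle.readlines()

    is_closed = file_handle.closed  # True once the with block has ended

print(len(lines))  # 2
print(is_closed)   # True
```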

## Putting it all together - Stylometrics


Don't be afraid of the $5 word, it is just a technique for analyzing texts (usually to [determine authorship](https://www.latimes.com/science/sciencenow/la-sci-sn-shakespeare-play-linguistic-analysis-20150410-story.html)). Computational Stylometrics does a lot of fancy statistics, but much of it is based on *counting words*.

Given the text `romeo.txt`, we want to write a program that counts all the words.

In [None]:
# open romeo.txt and read it all in at once
with open("files/romeo.txt", "r") as swoon_handler:
    romeo = swoon_handler.read()
    
#swoon
print(romeo)

### Computational Thinking

If we want to count the words in the text above, we need to do the following things.

1. Normalize the text by removing punctuation and converting to lowercase.
2. Split the string of text into a list of words
3. Loop over the list and count each instance of a word

In [None]:
# convert everything to lowercase (this returns a new string; romeo itself is unchanged)
romeo.lower()

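Calling `lower()` returned a lowercase copy of the text, but it did not change `romeo` itself, so we will lowercase again after removing the punctuation. To remove the punctuation we can use the `replace()` string method and manually identify and remove each punctuation mark. It's ugly but it works.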

In [None]:
# ugly approach to removing punctuation
romeo_no_punc = romeo.replace(".","").replace(",","").replace("!","").replace("'","").replace(":","").replace("?","").replace(";","")
romeo_no_punc
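A less repetitive alternative to chaining `replace()` calls, offered here as a sketch rather than as the approach this notebook uses, is the string method `translate()` together with `string.punctuation` from the standard library. The sample sentence is made up, and `string.punctuation` covers more characters than the seven replaced above (hyphens, for example), so results on real text may differ slightly.

```python
import string

# Hypothetical sample text, not the contents of romeo.txt
sample = "O Romeo, Romeo! Wherefore art thou Romeo?"

# Build a translation table that deletes every ASCII punctuation character
remove_punctuation = str.maketrans("", "", string.punctuation)

sample_no_punc = sample.translate(remove_punctuation)
print(sample_no_punc)  # O Romeo Romeo Wherefore art thou Romeo
```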

Now let's lowercase every word.

In [None]:
# normalize the text
romeo_normalized = romeo_no_punc.lower()
romeo_normalized

Now we can do computational thinking step 2: split the string of words into a list. Also an easy task thanks to the string method `split()` which will automatically split on whitespace

In [None]:
#split text into a list of words
romeo_list = romeo_normalized.split()
romeo_list[0:10] #look at the first 10 words in the list

Ok, now we can do the final step, which is loop over each word and count them up in a dictionary

In [None]:
# create a counter
word_counter = {}

# loop over each word
for word in romeo_list:
    # check to see if we have encountered the word
    if word not in word_counter:
        # have not seen this word before, so create a key with value 1
        word_counter[word] = 1
    else: 
        # we have seen this word before, so increment the value by 1
        word_counter[word] += 1

print(word_counter)
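The counting loop above is worth knowing by heart, but the standard library also offers a shortcut. As a sketch, `collections.Counter` does the same counting in one step and can report the most frequent words. The word list below is a made-up stand-in so the example runs on its own.

```python
from collections import Counter

# Stand-in word list, not the words from romeo.txt
words = "the quick fox and the lazy dog and the cat".split()

# Counter builds a dictionary-like object of word -> count
word_counts = Counter(words)

# most_common(2) returns the two most frequent words with their counts
top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]
```

A `Counter` behaves like a dictionary, so `word_counts["the"]` looks up a single count just as `word_counter[word]` does above.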

![The bard](https://media.giphy.com/media/oveqQA2LxpwYg/giphy.gif)