# Appendix C — Python tutorial

# Introduction

In this tutorial you'll learn the basics of Python.
Don't freak out, this is not a big deal.
You're not becoming a programmer or anything,
you'll just be learning how to use Python as a fancy calculator.


**Calculator analogy** 
Python commands are similar to the commands you give to a calculator,
but Python commands are more powerful since they allow you to define variables, functions, etc.
The same way a calculator has different buttons for the various arithmetic operations,
the Python language has a number of commands you can "run" or "execute."
Whereas calculators allow only simple arithmetic calculations of one expression (evaluated when you press the = button),
the Python prompt accepts entire "paragraphs" of commands allowing you to write complicated multi-step procedures.
This is what people call "coding" or "programming."

Just like knowing how to use a calculator is helpful for doing lots of arithmetic operations,
learning Python is helpful when you need to deal with repetitive operations and procedures.


## Why learn Python?

First off, Python is a really good calculator.
- expressions
- for loops for repeated calculations
- custom functions

Also a scientific calculator
- mathematical operations as functions
- linear algebra with numpy
- SymPy for symbolic math calculations. Very powerful stuff. See examples of solutions to math problems expressed as sympy commands in this [paper](https://arxiv.org/pdf/2112.15594.pdf#page=11) (see [this video](https://www.youtube.com/watch?v=9JZdAq8poww?t=169) for explainer).

Not only that, but Python can be used as a graphical calculator
- plot functions
- data distributions

You can also use Python for spreadsheet-like functionality,
manipulate tabular data, compute totals, etc.


### Python for statistics

Once you learn the basics of Python syntax,
you'll have access to the best-in-class Python libraries for
data management (Pandas, see [pandas_tutorial.ipynb](./pandas_tutorial.ipynb)),
data visualization (Seaborn, see [seaborn_tutorial.ipynb](./seaborn_tutorial.ipynb)),
statistical modelling (e.g. `statsmodels`), and machine learning (`scikit-learn`, `pytorch`, `huggingface`, etc.).

In this tutorial,
we'll focus on using Python for data calculations and procedures needed for statistics.
Learning a few basic Python constructs like the `for` loop
will enable you to simulate frequentist probability calculations
and experimentally verify how statistics procedures work.
This is a really big deal!
It's good to know the statistical formulas and recipes,
but it's even better when you can run your own simulations and check when the formulas work and when they fail.

Don't worry, there won't be any advanced math: just sums, products, exponents, logs, and square roots.
Nothing fancy, I promise.
If you've ever created a formula in a spreadsheet,
then you're familiar with all the operations we'll see.
In a spreadsheet formula you'd use `SUM(`; in Python we write `sum(`.
You see, it's nothing fancy.

Yes, there will be a lot of code (paragraphs of Python commands) in this tutorial,
but you can totally handle this.

If you ever start to freak out and think "OMG this is too complicated!" remember that Python is just a fancy calculator.

## Overview

TODO: redo as sentences

1. Introduction
1. Getting started
1. Variables and expressions
1. Getting to know Python
1. Lists and for loops
1. Functions
1. Dictionaries and other data structures
1. Objects and classes
1. Python syntax review
1. Bonus topics
1. Python libraries and modules
1. Links

After this tutorial, you'll be ready to read the other two:
- Pandas (see [pandas_tutorial.ipynb](./pandas_tutorial.ipynb))
- Seaborn (see [seaborn_tutorial.ipynb](./seaborn_tutorial.ipynb))  

# Getting started


## Installing JupyterLab Desktop

- TODO: Import screenshots from Sec 1.2
- JupyterLab UI: file browser, notebooks, code cells, Markdown cells
- Alternative: run JupyterLab instance in the cloud via mybinder


### Code cells contain Python commands

The Python command prompt (each of the code cells in this notebook)
allows you to enter Python commands and "run" them by pressing SHIFT + ENTER,
or by clicking the play button in the toolbar.

For example,
you can make Python compute the sum of two numbers by entering `2+3` in a code cell,
then pressing SHIFT + ENTER.

In [1]:
x = 2 + 3
print(x)

5


In the first line of the above code cell, we set the variable `x` to the value `2 + 3` (the sum of two integers). On the second line, we call the `print` function to display the value of the variable `x` on the screen.

Note the print statement on the last line can be skipped,
since notebook cells print the result of the last expression by default.

In [2]:
x = 2 + 3
x

5

When you run a code cell,
you're telling the computer to "execute" the Python instructions in that cell,
which means to do the actions described in the code,
and print the result of the final value computed in that cell.

Running a code cell is similar to using the EQUALS button on the calculator: whatever math expression you entered, the calculator will compute its value and display it as the output. The process is identical when you execute some Python code, but you're allowed to input multiple lines of commands at once. The computer will execute the lines of code one by one in the order it sees them.

The result of final calculations in the cell gets automatically printed in the output cell right below the input cell. This feature allows you to skip the print statements, since the last output gets printed automatically for you. This makes it easy and fun to explore and "poke around" each example given in this notebook.
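To make the "multiple lines, executed in order" idea concrete, here is a small sketch of a multi-line cell. The variable names and numbers are made up for illustration.

```python
price = 20                      # this line runs first
quantity = 3                    # this line runs second
subtotal = price * quantity     # uses the two values defined above
subtotal                        # last expression: a notebook displays its value
```

In a notebook, only the value of the last line (`60`) is displayed; the earlier lines run silently.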

<a id="vars_and_expr"></a>
# Variables and expressions




## Variables

Similar to variables in math, a variable in Python is a convenient name we use to refer to any value: a constant, the input to a function `x`, the output of a function `y`, or any other intermediate value.

To assign a value to a variable, you use the symbol `=` as follows, from left to right:

* we start by writing the name of the variable
* then, we add the symbol `=`
* finally, we write the value of the variable

In [3]:
x = 3
print(x)

3


In the first line of the above code cell, we set the variable `x` to the value `3`. On the second line, we call the `print` function to display the value of `x` on the screen.

In [4]:
x = 3
x

3

### Variable types

There are multiple types of variables in Python:

- **int** - integers, ex: `34`, `65`, `78`, `-4`, etc. (roughly equivalent to $\mathbb{Z}$)
- **float** - ex: `4.6`, `78.5`, `1e-3` (full name is "floating point number"; similar to $\mathbb{R}$ but only with finite precision)
- **bool** - a Boolean truth value with only two choices: `True` or `False`.
- **string** - text, ex: `'Hello'`, `'Hello everyone'`
- **list** - a sequence of values, ex: `[69, 81, 92, 77]`. The beginning and the end of the list are denoted by the brackets `[` and `]`, and its elements are separated by commas.
- **dictionary** - a collection of key-value pairs. Each key is associated with a value, ex: `{'first_name': 'Julie', 'last_name': 'Tremblay', 'score': 98}`. Dictionaries are denoted by curly braces `{` and `}` inside which we place `'key': value` pairs separated by commas.
- **tuples**, **sets**, **functions**, **objects**, etc. These are other useful Python building blocks which we'll talk about in later sections.

Let's look at some examples with variables of different types:
an `int`eger, a `float`ing point number, a `bool`ean value, a `str`ing,
a list, and a `dict`ionary.

In [5]:
score = 98
average = 77.5
above_the_average = True
message = "Hello everyone"
scores = [61, 85, 92, 72]
profile = {"first_name":"Julie", "last_name":"Tremblay", "score":98}

In [6]:
level = 3
health = 42.1
alive = True
name = "Julie"
names = ["Al", "Bo33", "Carine", "Julie", "Uma", "Zeno"]
player = {"name":"Julie", "level":3, "team":"a"}


In [7]:
len(message)

14

In [8]:
message.split()

['Hello', 'everyone']

You can explore the methods available on any Python object (`int`, `float`, `str`, etc.) by typing a dot `.` after its name, e.g., `message.`, then pressing the TAB button to get an "autocomplete" dropdown of all the methods available on the variable `message`. Since `message` is a string, these methods are available on every string in Python.
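As a rough illustration of what the autocomplete dropdown offers, here is a sketch that calls a few standard string methods on `message`. The result variable names are invented for this example.

```python
message = "Hello everyone"

shouted = message.upper()                          # all uppercase
quiet = message.lower()                            # all lowercase
greeting = message.replace("everyone", "Julie")    # swap one piece of text for another
starts_ok = message.startswith("Hello")            # True or False

print(shouted)     # HELLO EVERYONE
print(quiet)       # hello everyone
print(greeting)    # Hello Julie
print(starts_ok)   # True
```

None of these methods change `message` itself; each one returns a new value.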

## Expressions

Similar to expressions in algebra, a Python expression can be any combination of variables and operations:

In [9]:
# Expression involving numerical values
secs_in_1min = 60 
secs_in_1day = secs_in_1min * 60 * 24
secs_in_1week = secs_in_1day * 7
print("The number of seconds in a week is", secs_in_1week)


# Expression involving strings
name = "Julie"
message = "Hello " + name    # for strings, + means concatenate
print(message)


# Expression involving a list
scores = [61, 85, 92, 72]
average = sum(scores)/len(scores)
#        `sum` computes the sum of values in the list
#                and `len` gives you the length of the list
print("The average score is", average)


# String expression using a value from a dictionary
profile = {"first_name":"Julie", "last_name":"Tremblay", "score":98}
message2 = "Hi " + profile["first_name"]
print(message2)

The number of seconds in a week is 604800
Hello Julie
The average score is 77.5
Hi Julie


Note in all the above examples, the code had the form:

```Python
var_name = <some expression>
```
which is the important new pattern you have to get used to in programming. Even though this looks like a math equation, the meaning you have to associate with it is much simpler—we are setting variable `var_name` to the value of the expression `<some expression>`.
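One way to see that `=` means "set" rather than "equals" is a statement that would make no sense as a math equation. The sketch below uses a made-up variable `count`.

```python
count = 10
count = count + 1   # the right side is computed first (10 + 1), then stored in count
print(count)        # 11
```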

Don't worry about the list and dictionary examples. I know they are complicated and we haven't explained all the syntax. We'll get to that in just a little bit. First let's practice computing some Python expressions.

In [10]:
# Numeric expressions

expr1 = 1 + 2.4
print(expr1)  # 3.4

expr2 = 4 - 6
print(expr2)  # -2

expr3 = 0.5 * 3
print(expr3)  # 1.5

3.4
-2
1.5


Here is an expression that Python cannot compute,
so it raises an exception.

In [11]:
expr4 = 5/0

ZeroDivisionError: division by zero

You'll see these threatening looking messages on a red background any time Python encounters an error when trying to run the commands you specified.
This is nothing to be alarmed by.
It usually means you made a typo (symbol not defined error),
forgot a syntax element (e.g. `(`, `,`, `[`, `:`, etc.),
or tried to compute something impossible like the `ZeroDivisionError` in the above example.

The way to read these red messages is to focus on the name of the exception and the message that gets printed on the last line. This should tell you what you need to fix. The solution will be obvious for typos and syntax errors, but more complicated situations may require googling the error message or searching on Stack Overflow.


### Integration exercise

Write an expression that converts a temperature in Celsius to a temperature in Fahrenheit.


In [12]:
temp_C = 100
temp_F = (temp_C * 9/5) + 32

temp_F


212.0

Test: if temp_C = 100 then value of expression should be temp_F = 212.
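Two more checks can be run the same way. This sketch reuses the conversion expression from the cell above; the variable names `freezing_F` and `crossover_F` are invented for this example.

```python
temp_C = 0
freezing_F = (temp_C * 9/5) + 32
print(freezing_F)     # 32.0, the freezing point of water

temp_C = -40
crossover_F = (temp_C * 9/5) + 32
print(crossover_F)    # -40.0, the two scales agree at this temperature
```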

### String expressions

In [13]:
message3 = "Hello" + " " + "world"  # + means concatenate
print(message3)

message4 = "Ai " * 3                # * means repeat
print(message4 + 'Caramba!') 

Hello world
Ai Ai Ai Caramba!


### Boolean expressions

You can use `bool` variables and the logical operations `and`, `or`, `not`, etc. to build more complicated boolean expressions (disjunctions, conjunctions, negations, etc.).

In [14]:
print('not True ==', not True)
print('not False ==', not False)
print('True and True ==', True and True)
print('True and False ==', True and False)
print('True or False ==', True or False)
print('False or False ==', False or False)

not True == False
not False == True
True and True == True
True and False == False
True or False == True
False or False == False
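The same operations work on named `bool` variables, which is how they usually appear in practice. The scenario and variable names below are made up for illustration.

```python
is_weekend = True
is_raining = False

go_hiking = is_weekend and not is_raining    # True and True
stay_home = is_raining or not is_weekend     # False or False

print(go_hiking)   # True
print(stay_home)   # False
```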


## Types and type conversions

The function `type` tells you the type of any variable, meaning what kind of number or object it is.
Look back to the list above—the Python types are shown in **bold**.

In [15]:
# an integer
score = 98
type(score)

int

In [16]:
# floating point number
average = 77.5
type(average)

float

In [17]:
# a string
name = "Julie"
type(name)

str

In [18]:
# a boolean value
above_the_average = True
type(above_the_average)

bool

You can convert between these types using the function that has the same name as the target type:

- `int`: transform an expression into an `int`
- `float`: transform an expression into a `float`
- `str`: transform an expression into its string representation (this is what the function `print` does before displaying a value).
- `bool`: transform an expression into `True` or `False`

In [19]:
n = int("42")
print("The variable", n, "has type", type(n))

The variable 42 has type <class 'int'>


In [20]:
f = float("42.4")
print("The variable", f, "has type", type(f))

The variable 42.4 has type <class 'float'>


In [21]:
s = str(100.123)
print("The variable", s, "has type", type(s))

The variable 100.123 has type <class 'str'>


In [22]:
b = bool(1)
print("The variable", b, "has type", type(b))

The variable True has type <class 'bool'>


### Example: converting to `float` to compute sum

Suppose we're given two numbers $m$ and $n$ and we want to compute their sum $m+n$.
The two numbers are given to us as strings.

In [23]:
mstr = "2.1"
nstr = "3.4"
print("The variable mstr has value", mstr, "and type", type(mstr))
print("The variable nstr has value", nstr, "and type", type(nstr))

The variable mstr has value 2.1 and type <class 'str'>
The variable nstr has value 3.4 and type <class 'str'>


Let's try adding the two numbers together to see what happens...

In [24]:
mstr + nstr

'2.13.4'

This is because the addition operator `+` for strings means concatenate...

Python doesn't know automatically that the two text strings are meant to be numbers.
We have to manually convert the strings to a Python numerical type (`float`) first,
then we can do the addition.

In [25]:
mfloat = float(mstr)
nfloat = float(nstr)

print("The variable mfloat has value", mfloat, "and type", type(mfloat))
print("The variable nfloat has value", nfloat, "and type", type(nfloat))

The variable mfloat has value 2.1 and type <class 'float'>
The variable nfloat has value 3.4 and type <class 'float'>


In [26]:
# We can compute the sum:
mfloat + nfloat

5.5

In [27]:
# int/int --> float autoconversion
print('If you divide an integer by an integer in Python using the / operator...')
print('...you get a float number', 6/5, 'has type', type(6/5))

If you divide an integer by an integer in Python using the / operator...
...you get a float number 1.2 has type <class 'float'>
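The `/` operator gives a float even when the division comes out even. The sketch below also shows the operators `//` (integer division) and `%` (remainder), which this tutorial has not introduced; they are included only as a side note.

```python
even_division = 6 / 3     # still a float
half = 7 / 2
whole_part = 7 // 2       # integer division: drops the fractional part
remainder = 7 % 2         # what is left over after integer division

print(even_division)   # 2.0
print(half)            # 3.5
print(whole_part)      # 3
print(remainder)       # 1
```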


### Anything-to-str conversions


In [28]:
str(42)

'42'

In [29]:
str(43.3)

'43.3'

In [30]:
str(True)

'True'
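A common reason to convert a value to `str` is to join it to other text with `+`. Here is a minimal sketch; the variable `report` is made up for this example.

```python
score = 98

# "Your score is " + score would raise a TypeError (cannot add str and int),
# so we convert the number to a string first
report = "Your score is " + str(score)
print(report)   # Your score is 98
```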

## Your turn to try this...

Try typing in some Python code in this cell. If you've been simply reading until now, this is your chance to switch to "active" mode: use the rocket-button in the top right of the menu at the top, and choose the `Live Code` option to make all the cells in this notebook interactive, then try entering some Python commands in this code cell below:




## Exercises

- Review Python syntax cheatsheet [https://blog.finxter.com/wp-content/uploads/2020/07/Finxter_WorldsMostDensePythonCheatSheet.pdf](https://blog.finxter.com/wp-content/uploads/2020/07/Finxter_WorldsMostDensePythonCheatSheet.pdf)
  - add single dot next to concepts you've heard about
  - double dot next to python concepts you understand
  - triple dot next to concepts you've used in your code
- Try poking-around and explore expressions involving numbers (`int` and `float`),
  strings (`str`), and booleans (`bool`).
- Go through all quiz questions in reading material as notebooks:
  - 03-Variables: [https://introductorypython.github.io/tutorials/03-Variables.html](https://introductorypython.github.io/tutorials/03-Variables.html)
  - 04-Operators: [https://introductorypython.github.io/tutorials/04-Operators.html](https://introductorypython.github.io/tutorials/04-Operators.html)
  - 06-Data types: [https://introductorypython.github.io/tutorials/06-DataTypes.html](https://introductorypython.github.io/tutorials/06-DataTypes.html) 
  - Review Collections: Lists section. This is important because we can use lists
    to represent vectors in Python, for example the two-dimensional vector can be defined as `v = [3,2]`
  - Complete the section on for loops in 07-Loops notebook.
    The for loop is important because it allows you to do operations for each element in the list.

# Getting to know Python

### Getting help

There are multiple ways to obtain helpful information about Python modules, functions, and objects.

#### Python doc-string
- `?`
- SHIFT + TAB while cursor on top of Python function 
- `help()`
- `??`
- `%psource`
- `%pdef` 

#### Auto complete with attributes
- `TAB` button
- `dir()`
- `__dict__`


- online docs


see also https://jakevdp.github.io/PythonDataScienceHandbook/01.01-help-and-documentation.html


Let's say you're interested to know the options available for the function `print`,
which we use to print Python expressions.

In [31]:
# put cursor in the middle of function and press SHIFT+TAB
print

<function print>

In [32]:
msg = "Hello"
print(msg)

Hello


You know this function accepts a variable and prints it,
but what other keywords arguments does it take?

Use the help() function on `print`

In [33]:
help(print)

Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.



In [34]:
print?

Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method


Another option is `print??` which will show even more help details.
Others are `%pdef` and `%psource` 

In [35]:
# place cursor after dot, and press TAB button
msg

'Hello'

In [36]:
msg = "Hello"
msg

'Hello'

In [37]:
# print(print.__doc__)
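The `dir()` function and the `__doc__` attribute from the lists above also work in plain Python, outside of a notebook. This is a minimal sketch; the variable names are made up.

```python
msg = "Hello"

# dir() returns a list of the names of all attributes and methods of an object
attributes = dir(msg)
print("split" in attributes)    # True, since strings have a .split method

# every documented function stores its help text in the __doc__ attribute
doc = print.__doc__
print(doc)
```

The exact text stored in `print.__doc__` depends on the Python version, so the output is not shown here.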

### Errors
Sometimes the executed commands will encounter an error,
and Python will give an error message describing the problem encountered.
Get psychologically ready for those, because they can be very discouraging.
REJECTED!
The computer doesn't like what you entered.

Examples of errors include `SyntaxError`, `ValueError`, etc.
The error messages look scary,
but really they are there to help you—if you read what the error message tells you,
you will know what needs to be fixed in your input.
The error message literally describes the problem!

The code cell below shows what happens when the code contains a math error.

In [38]:
3/0

ZeroDivisionError: division by zero

Here we get an error since we're trying to compute an expression that contains a division by zero:
Python tells us a `ZeroDivisionError: division by zero` has occurred.
Indeed it's not possible to divide by zero.
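To look at an error's name and message without the red traceback, the error can be caught. The sketch below uses `try`/`except`, which this tutorial has not introduced, so treat it as a side note and not as something you need to learn now.

```python
try:
    number = int("hello")      # "hello" is not a valid integer
except ValueError as err:
    error_name = type(err).__name__    # the name of the exception
    error_message = str(err)           # the message describing the problem
    print(error_name + ":", error_message)
```

The printed line has the same two parts as the last line of a red error message: the exception name, then a description of the problem.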

# Boolean variables and conditional statements

Let's talk about boolean variables, which can take on one of two values: `True` or `False`. 

We obtain boolean values from various comparisons.

In [39]:
3 > 2

True

Other comparison operators include `>=`, `<`, `<=`, `==` (equal to), and `!=` (not equal to).

The `in` operator can be used to check if an object is part of a list (or another kind of collection).

In [40]:
3 in [1,2,3,4]

True
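Here are a few more comparisons in one place, including `in` applied to a list of strings and to a single string. The list of names is made up for this sketch.

```python
names = ["Al", "Carine", "Julie"]

found = "Julie" in names       # membership in a list
missing = "Zeno" in names
has_ell = "ell" in "Hello"     # for strings, `in` checks for a substring
same = (3 == 3.0)              # an int and a float with equal values compare equal
different = (3 != 4)

print(found, missing, has_ell, same, different)   # True False True True True
```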

Additionally, there is a convention about which values are considered "truthy" (i.e. get converted to `True` when passed through `bool`) and which are "falsy" (i.e. get converted to `False` when passed through `bool`).

In [41]:
bool(0)

False

In [42]:
bool(1)

True

Any non-zero float is considered truthy:

In [43]:
bool(133.3)

True

Any non-empty string is considered truthy:

In [44]:
bool("something")

True

But the empty string is falsy:

In [45]:
bool("")

False
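The same convention extends to containers and to the special value `None`: empty things are falsy, non-empty things are truthy. A short sketch, with variable names invented for this example:

```python
empty_list = bool([])       # an empty list is falsy
list_with_zero = bool([0])  # a non-empty list is truthy, even if it only contains 0
empty_dict = bool({})       # an empty dictionary is falsy
nothing = bool(None)        # None is falsy
zero_float = bool(0.0)      # zero is falsy for floats too

print(empty_list, list_with_zero, empty_dict, nothing, zero_float)
# False True False False False
```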

Boolean expressions are important to understand because they are used in conditional statements.

### Conditional statements

Conditional statements control which of several alternative code blocks gets run.


In [46]:
if True:
    print("This code will run")

if False:
    print("This code will not run")

This code will run


We can construct boolean values like `True` and `False` using expressions that involve comparisons:
`==` equal to, `>` greater than, `<` less than, etc.

In [47]:
x = 3

x > 2

True

In [48]:
if x > 2:
    print("x is greater than 2")
else:
    print("x is less than or equal to 2")

x is greater than 2


We can do multiple checks using `elif` statements.

In [49]:
temp = 20 # in Celsius

if temp >= 15:
    print("It's nice!")
elif temp < 0:
    print("It's cold!")
else:
    print("It's OK.")

It's nice!


Exercise: add another condition to the above code to print `It's hot` if the temperature is above 25.

### Inline if statements

We can also use if-else keywords to compute conditional expressions.
The general syntax for these is:
```Python
<value1> if <condition> else <value2>
```

This expression evaluates to `<value1>` if `<condition>` is `True`,
and to `<value2>` if `<condition>` is `False`.
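A minimal sketch of the inline if syntax, using a made-up variable `description`:

```python
temp = 20   # in Celsius

# <value1> if <condition> else <value2>
description = "warm" if temp >= 15 else "cool"
print(description)   # warm
```

This does the same job as a four-line `if`/`else` statement that assigns `description` in each branch.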


# Lists and for loops


## Lists

To create a list of two values:
- start with an opening square bracket `[` ,
- then put the first value,
- comma `,`,
- then the second value,
- finally close the square bracket `]`

In [50]:
first_val = 3
second_val = 4

my_list = [first_val, second_val]
my_list

[3, 4]

In [51]:
scores = [61.0, 85.0, 92.0, 72.0]  # define a list of floats
scores

[61.0, 85.0, 92.0, 72.0]

In [52]:
# lists have a "length"
len(scores)

4

In [53]:
# elements of a list are accessed using [ ] and 0-based index
scores[0]  # first score

61.0

In [54]:
# lists can be sorted
sorted(scores)  # returns a new list of sorted scores

[61.0, 72.0, 85.0, 92.0]

In [55]:
scores

[61.0, 85.0, 92.0, 72.0]

In [56]:
scores.sort()  # in-place sort the list
scores

[61.0, 72.0, 85.0, 92.0]

In [57]:
scores.reverse()
scores

[92.0, 85.0, 72.0, 61.0]

In [58]:
scores.append(22)

In [59]:
scores

[92.0, 85.0, 72.0, 61.0, 22]

In [60]:
scores.insert(2, 25)
scores

[92.0, 85.0, 25, 72.0, 61.0, 22]

In [61]:
scores.pop()

22

In [62]:
# list membership
92 in scores

True

Just like the `.sort()` method, lists have all kinds of useful methods: `.insert`, `.remove`, `.pop`, `.reverse`, ...

You can see all those methods by starting to type `scores.` then pause for a second to see the auto-complete suggestions:

In [63]:
# scores.

In [3]:
scores = [61.0, 85.0, 92.0, 72.0]
scores.sort()

In [65]:
# len(scores)
# sum(scores)
# any(scores)
# all(scores)
# append/extend/+/
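The commented-out names in the cell above can be tried as follows. This is one possible sketch; the extra score values are made up.

```python
scores = [61.0, 85.0, 92.0, 72.0]

n = len(scores)            # number of elements: 4
total = sum(scores)        # sum of the elements: 310.0
some_truthy = any(scores)  # True if at least one element is truthy
all_truthy = all(scores)   # True if every element is truthy (no zeros here)

longer = scores + [50.0]       # + builds a NEW list; scores is unchanged
scores.append(70.0)            # append adds ONE element to scores in place
scores.extend([80.0, 90.0])    # extend adds EACH element of another list in place

print(n, total, some_truthy, all_truthy)   # 4 310.0 True True
print(longer)   # [61.0, 85.0, 92.0, 72.0, 50.0]
print(scores)   # [61.0, 85.0, 92.0, 72.0, 70.0, 80.0, 90.0]
```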

## For loops

A for loop repeats a code block once for each element in an iterable.
The code block can contain multiple lines, and Python identifies which lines belong to the block based on their indentation.


The "for loop" is a Python code construct of the form:
```Python
for el in <container>:
    <operations on element `el`>

```
that allows you to repeat a block of operations **for each element of the list**.

In [4]:
# Example 1: print all the scores
for score in scores:
    print(score)

61.0
72.0
85.0
92.0


In [5]:
# Example 2: compute the average score  ==  sum(scores)/len(scores)

total = 0

for score in scores:
    total = total + score

total / len(scores)

77.5

The name of the variable used for the for loop is totally up to you, but in general you should choose logical names for elements of the list.

Here is a for loop that uses a single-letter variable name:

In [6]:
for s in scores:
    print(s)


61.0
72.0
85.0
92.0


In [7]:
scores

[61.0, 72.0, 85.0, 92.0]

### List comprehension 

In [8]:
[score/100 for score in scores]

[0.61, 0.72, 0.85, 0.92]
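A list comprehension is a compact way to write a for loop that builds a new list. The sketch below shows the two forms side by side; the names `fractions_loop` and `fractions_comp` are invented for this comparison.

```python
scores = [61.0, 72.0, 85.0, 92.0]

# Long form: start with an empty list and append inside a for loop
fractions_loop = []
for score in scores:
    fractions_loop.append(score/100)

# Short form: the same computation as a list comprehension
fractions_comp = [score/100 for score in scores]

print(fractions_loop == fractions_comp)   # True
```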

### Example: file open and readlines

In [10]:
lines = open("story.txt").read().splitlines()
lines

['This is a short story.',
 'It is very short.',
 'It has only four lines.',
 'It ends with the word cat.']

In [11]:
contents = open("story.txt").read()
lines = contents.splitlines()
lines

['This is a short story.',
 'It is very short.',
 'It has only four lines.',
 'It ends with the word cat.']
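The cells above assume a file named `story.txt` already exists. The sketch below is a self-contained variant that first writes such a file into a temporary folder, then reads it back. It uses the `with` statement and the `tempfile` module, neither of which is covered in this tutorial; both are assumptions of this example.

```python
import os
import tempfile

text = "This is a short story.\nIt is very short.\n"

# Create a temporary folder that is deleted automatically at the end of the block
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "story.txt")

    # Write the text to the file; `with` closes the file for us
    with open(path, "w") as f:
        f.write(text)

    # Read the file back and split its contents into lines
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)   # ['This is a short story.', 'It is very short.']
```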

### List-like iterable objects

The term "iterable" is used in Python to refer to all objects that are list-like,
and can be iterated using for loops.

- strings
- dictionaries (keys, values, and key:value items)
- sets
- `range` (lazy list of integers)

https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Iterables.html#Functions-that-act-on-iterables




In [12]:
range(0, 4)

range(0, 4)

In [13]:
list(range(0, 4))

[0, 1, 2, 3]
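A `range` is most often used directly in a for loop, without converting it to a list first. Here is a small sketch that collects the squares of the integers 0 to 3; the list name `squares` is made up.

```python
squares = []
for i in range(0, 4):      # i takes the values 0, 1, 2, 3
    squares.append(i**2)

print(squares)   # [0, 1, 4, 9]
```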

#### Iterating over dictionaries



In [73]:
player = {"name":"Julie", "level":3, "team":"a"}
list(player.keys())

['name', 'level', 'team']

In [74]:
# # ALT.
# list(player)

In [75]:
list(player.values())

['Julie', 3, 'a']

In [76]:
list(player.items())

[('name', 'Julie'), ('level', 3), ('team', 'a')]
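Since `.items()` produces (key, value) pairs, a for loop can unpack each pair into two variables. A minimal sketch, with the list `descriptions` invented for this example:

```python
player = {"name": "Julie", "level": 3, "team": "a"}

descriptions = []
for key, value in player.items():
    # str(value) is needed because some values are not strings
    descriptions.append(key + ": " + str(value))

print(descriptions)   # ['name: Julie', 'level: 3', 'team: a']
```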

We'll talk more about dictionaries [later on](#Dictionaries-and-other-data-structures).

#### Strings are lists of characters

In [77]:
# Define a str of length 26 that contains all the lowercase Latin letters
letters = "abcdefghijklmnopqrstuvwxyz"

# Accessing individual characters within a string
print("The index of the letter `a` in the string letters is 0")
first_letter = letters[0]  # a
print(first_letter)

print("The index of the letter `b` in the string letters is 1")
second_letter = letters[1]  # b
print(second_letter)

print("The index of the last letter in the string letters is -1")
last_letter = letters[-1]  # z
print(last_letter)

print("The last element in a list of 26 elements has index 25")
print(letters[25] == "z")

print('\n\n\n')  # '\n' is a special character (an escape sequence) that prints a newline
                 # we'll use this kind of print-newlines statement to logically
                 # separate the output lines

# Slicing = getting the substring for a particular range of indices
first_four = letters[0:4]
print('The first four letters of the alphabet are:', first_four)
# the notation 0:4 is syntactic sugar for `slice(0,4)` and corresponds
# to the range of indices from 0 to 4 (non-inclusive): [0,1,2,3].

The index of the letter `a` in the string letters is 0
a
The index of the letter `b` in the string letters is 1
b
The index of the last letter in the string letters is -1
z
The last element in a list of 26 elements has index 25
True
True




The first four letters of the alphabet are: abcd
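Slices have a few shorthand forms that are worth trying. This sketch reuses the `letters` string from the cell above; the other variable names are made up.

```python
letters = "abcdefghijklmnopqrstuvwxyz"

start = letters[:4]       # leaving out the start index means "from the beginning"
end = letters[23:]        # leaving out the stop index means "until the end"
also_end = letters[-3:]   # negative indices count from the end of the string

print(start)                 # abcd
print(end)                   # xyz
print(also_end)              # xyz
print(len(letters[0:4]))     # 4, the length of a slice is stop minus start
```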


#### Tricks for lists

Tricks:
- `enumerate`
- `zip`

In [78]:
enumerate(scores)

<enumerate at 0x10e415b80>

In [79]:
# Bonus concept: use `enumerate` to get pairs (index, value) from a list
# enumerate(scores) == [(0, 61.0), (1, 72.0), (2, 85.0), (3, 92.0)]

# example
for idx, score in enumerate(scores):
    # this for loop has two variables: idx and score
    print("Processing score", score, "which is at index", idx, "in the list")

Processing score 61.0 which is at index 0 in the list
Processing score 72.0 which is at index 1 in the list
Processing score 85.0 which is at index 2 in the list
Processing score 92.0 which is at index 3 in the list


In [80]:
# New concept: use `zip(list1,list2)` to get pairs (value1, value2) from two lists 
# list(zip([1,2,3], ['a','b','c'])) == [(1, 'a'), (2, 'b'), (3, 'c')]

# example
list1 = [1, 2, 3]
list2 = [4, 5, 6]

for value1, value2 in zip(list1, list2):
    print("Processing values", value1, "and", value2)


Processing values 1 and 4
Processing values 2 and 5
Processing values 3 and 6


In [81]:
list1 = [1, 2, 3]
list2 = [4, 5, 6]

list(zip(list1, list2))

[(1, 4), (2, 5), (3, 6)]
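One practical use of `zip` is to combine two lists element by element. The sketch below computes the dot product of two vectors stored as lists; the names `u`, `v`, and `dot` are made up for this example.

```python
u = [1, 2, 3]
v = [4, 5, 6]

dot = 0
for ui, vi in zip(u, v):
    dot = dot + ui * vi    # add the product of the matching elements

print(dot)   # 32, which is 1*4 + 2*5 + 3*6
```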

<a name="funcs"></a>
# Functions

Functions are one of the most important concepts in math and programming.


    y = f(x)  # common convention in math to call function inputs x, and outputs y

We can also draw this as `x -----f----> y`: the function is a map from input values x to output values y.


    def f(x):
        <steps to compute y from x>
        return y


Functions!! Finally we get to the good stuff! The previous two sessions were important
foundations, but now we get to unlocking the first superpower — modelling.
Once you know the basic properties of 10 or so functions,
you can build precise mathematical models for any real-world system.


Functions are all over the place:
- In high school math (the green book) we learn the basic vocabulary of y=f(x) functions
  and their parameters, which allows us to describe any real world process
- In calculus we analyze functions f(x) behaviour over time
  (integral of f = sum of values of f(x) between x=start and x=finish;
  and derivative of f at a = the slope of the graph of f(x) when x=a)
- In linear algebra we study linear transformations, which are functions
  that satisfy f(ax+by)=af(x)+bf(y), meaning a linear combination of inputs
  produces the same linear combination of outputs.
- In probability theory we use functions to describe the probability density of
  a random variable. For example X = Normal(mu, sigma^2) is a random variable
  whose density is described by the function p(x) = K*exp(-((x-mu)/sigma)^2/2).
- In statistics we talk about functions computed from samples (estimators).
- In ML we learn about probabilistic models and use them to predict y from given input x



see also https://www.pythonlikeyoumeanit.com/Module2_EssentialsOfPython/Functions.html

## Python functions

Python functions are reusable chunks of code that can be `def`-ined once, and used multiple times by "calling" them with different arguments.



Functions in Python are similar to functions in math: a transformation that takes certain inputs and produces certain outputs. The math functions you're familiar with take numbers as inputs and produce numbers as outputs, but a Python function can take any type of input and produce any type of output.

Functions allow us to package code into reusable chunks that we can later use in other programs.

We declare a function with the keyword `def`.

```python
def function_name(function inputs):
    """
    doc string that describes what the function does (optional)
    """
    <function body line 1>
    <function body line 2>
    <function body line ...>
    <function body line n-1>
    return <function output value>
```

We enter the name of the function, then define the function's arguments inside parentheses: these are the names of the variables that the function receives as inputs. Then the function body is written as an indented block of code (all lines start with four spaces of indentation). The output of the function is specified using the `return` keyword. The return statement is usually the last line in the function body.

Certain functions do not return a value (we call these _procedures_) and they consist of sequences of commands we want to execute, that don't have any outputs. Functions can also be attached to objects, in which case they are called _methods_. We'll talk about these later on; for now let's focus on simple math-like functions that receive some input and produce output:

#### Example 1

A first example of a simple math-like function. The function is called `f`,
takes numbers as inputs, and produces numbers as outputs:

In [82]:
def f(x):
    return 2*x + 3

f(10)

23

#### Example 2

In [83]:
# Example from Session 1
import random

def flip_coin():
    r = random.random()
    if r < 0.5:
        print("heads")
    else:
        print("tails")

flip_coin()  # no return value, but prints output

heads
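
For comparison, here is a sketch of a variant that returns the outcome instead of printing it, which lets us store the result in a variable and use it in later calculations. The name `flip_coin_value` is hypothetical and chosen to avoid overwriting `flip_coin`.

```python
import random

def flip_coin_value():
    """Returns the string "heads" or "tails" with equal probability."""
    r = random.random()   # random float in the interval [0, 1)
    if r < 0.5:
        return "heads"
    else:
        return "tails"

outcome = flip_coin_value()   # the result is stored, nothing is printed yet
print(outcome)
```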


#### Example 3

Write a function `water_phase(temp)` that takes input temperature `temp` in Celsius
and uses if/else statements to find what state water is in (assume pressure is 1 atm).
The function returns a string, which is one of "Solid", "Liquid", "Gas".

In [84]:
def water_phase(temp):
    """
    Returns phase of water at `temp`.
    Input temp is temperature in Celsius (int or float)
    temp must be greater than -273.15.
    """
    if temp > 0 and temp < 100:
        return 'Liquid'
    elif temp <= 0:
        return 'Solid'
    elif temp >= 100:
        return 'Gas'

In [85]:
## tests to try: correct implementation of `water_phase` should return all True
print( water_phase(20.0) == "Liquid" )
print( water_phase(-20.0) == "Solid" )
print( water_phase(200.0) == "Gas" )
print( water_phase(0.0) in ["Liquid", "Solid"] )

True
True
True
True


In [86]:
## range tests
results = []

for temp in range(1, 100):
    result = (water_phase(temp) == 'Liquid')
    results.append(result)

# temp <= 0
for temp in range(-100, 1):
    result = (water_phase(temp) == 'Solid')
    results.append(result)

# temp >= 100
for temp in range(100, 5000):
    result = (water_phase(temp) == 'Gas')
    results.append(result)

print('Completed a total of', len(results), 'checks...')
all(results)  # True if all results are True

Completed a total of 5100 checks...


True

In [87]:
# any = OR for a list
# all = AND for a list
all([True, True, False])

False
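
The comment in the cell above also mentions `any`. The short sketch below shows both built-in functions side by side, including their behaviour on an empty list; the variable name `checks` is just for illustration.

```python
checks = [True, True, False]

print(all(checks))   # False, since not every item is True
print(any(checks))   # True, since at least one item is True
print(all([]))       # True: an empty list contains no False items
print(any([]))       # False: an empty list contains no True items
```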

### List functions

Your turn to play with lists now! Complete the code required to implement the functions `mean` and `std` below.


#### Question 1: Mean

The formula for the mean of a list of numbers $[x_1, x_2, \ldots, x_n]$ is:
$$
    \text{mean} = \overline{x}
    = \frac{1}{n}\sum_{i=1}^n x_i
    = \tfrac{1}{n} \left[ x_1 + x_2 + \cdots + x_n \right].
$$


Write the function `mean(numbers)` that computes the mean of a list of numbers.

In [88]:
def mean(numbers):
    """
    Computes the mean of the `numbers` list using a for loop.
    """
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)  


mean([100,101])

100.5

In [89]:
# TEST CODE (run this code to test your solution)

def random_list(n=10, min=0.0, max=100.0):
    """Returns a list of length `n` of random floats between `min` and `max`."""
    import random
    values = []
    for i in range(n):
        r = random.random()
        value = min + r*(max-min)
        values.append(value)
    return values


def test_mean(function):
    """
    Run a few lists to check if value returned by `function` matches expected.
    """
    import math, statistics
    assert function([1,1,1]) == 1
    assert function([61,72,85,92]) == 77.5
    list10 = random_list(n=10)
    assert math.isclose(function(list10), statistics.mean(list10))
    list100 = random_list(n=100)
    assert math.isclose(function(list100), statistics.mean(list100))
    print("All tests passed. Good job y'all!")


# RUN TESTS
test_mean(mean)

All tests passed. Good job y'all!


In [90]:
(1 + 1e-15)  ==  1 

False

In [91]:
import math
math.isclose(1 + 1e-10, 1)

True
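
The two cells above hint at why the test code uses `math.isclose` instead of `==` when comparing floats. Here is a short sketch of the same idea with a few more cases; the particular numbers are chosen for illustration, and `rel_tol` and `abs_tol` are the optional tolerance arguments of `math.isclose`.

```python
import math

a = 0.1 + 0.2
print(a == 0.3)                # False, because of floating point rounding
print(math.isclose(a, 0.3))    # True, equal within the default relative tolerance

# rel_tol sets the allowed relative difference (default is 1e-09)
print(math.isclose(1 + 1e-10, 1, rel_tol=1e-12))   # False, the tolerance is now stricter

# comparisons against zero need an absolute tolerance
print(math.isclose(1e-12, 0))                  # False
print(math.isclose(1e-12, 0, abs_tol=1e-9))    # True
```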

#### Question 2: Sample standard deviation

The formula for the sample standard deviation of a list of numbers is:
$$
    \text{std}(\textbf{x}) = s
    = \sqrt{ \tfrac{1}{n-1}\sum_{i=1}^n (x_i-\overline{x})^2 }
    = \sqrt{ \tfrac{1}{n-1}\left[ (x_1-\overline{x})^2 + (x_2-\overline{x})^2 + \cdots + (x_n-\overline{x})^2\right]}.
$$

Note the division is by $(n-1)$ and not $n$. Strange, no? You'll have to wait until stats to see why this is the case.

Write the function `std(numbers)` that computes the sample standard deviation of a list of numbers.

In [92]:
import math

def std(numbers):
    """
    Computes the sample standard deviation (square root of the sample variance)
    using a for loop.
    """
    avg = mean(numbers) 
    total = 0
    for number in numbers:
        total = total + (number-avg)**2
    var = total/(len(numbers)-1)    
    return math.sqrt(var)

numbers = list(range(0,100))
std(numbers)

29.011491975882016

In [93]:
# compare to known good function...
import statistics
statistics.stdev(numbers)

29.011491975882016
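
To see the effect of dividing by $(n-1)$ instead of $n$, we can compare `statistics.stdev` with `statistics.pstdev`, the standard library function that divides by $n$. This is a small sketch that reuses one of the lists from the test code; the variable names are illustrative.

```python
import math
import statistics

grades = [61, 72, 85, 92]

s = statistics.stdev(grades)        # sample standard deviation, divides by n-1
sigma = statistics.pstdev(grades)   # population standard deviation, divides by n

print(s)
print(sigma)
print(s > sigma)   # True: dividing by the smaller number n-1 gives a larger result
```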

In [94]:
# TEST CODE (run this code to test your solution)

def test_std(function):
    """
    Run a few lists to check if value returned by `function` matches expected.
    """
    import math, statistics
    assert function([1,1,1]) == 0
    assert math.isclose(function([61,72,85,92]), 13.771952173409064)
    list10 = random_list(n=10)
    assert math.isclose(function(list10), statistics.stdev(list10))
    list100 = random_list(n=100)
    assert math.isclose(function(list100), statistics.stdev(list100))
    print("All tests passed. Good job y'all!")


# RUN TESTS
test_std(std)

All tests passed. Good job y'all!


#### Exercise 2

Write a Python function called `temp_convert` that converts temperatures from Celsius to Fahrenheit.

In [95]:
import math
def temp_convert(temp_C):
    """
    Convert the temperature temp_C (in Celsius) to temp_F (in Fahrenheit).
    """
    pass



#### Exercise 4

In [96]:
import random
def roll_die():
    value = random.randint(1, 6)
    return value

In [97]:
roll_die()

1

In [98]:
for n in range(0,10000):
    if roll_die() not in [1,2,3,4,5,6]:
        print("error")
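
Beyond checking that the values are valid, we can also tally how often each face comes up. The sketch below repeats the definition of `roll_die` so it runs on its own, and uses the list method `count`. The choice of 6000 rolls is arbitrary.

```python
import random

def roll_die():
    value = random.randint(1, 6)
    return value

# collect the outcomes of many rolls in a list
rolls = []
for n in range(6000):
    rolls.append(roll_die())

# for a fair die, each count should be roughly 1000
for face in [1, 2, 3, 4, 5, 6]:
    print(face, rolls.count(face))
```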

## Example function `head`

We often want to print the first few lines of a file to see what data it contains.
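
A minimal sketch of such a function could look like the following. The signature `head(path, n=5)` and the sample file are assumptions made for illustration, not a fixed design.

```python
import os
import tempfile

def head(path, n=5):
    """
    Prints the first `n` lines of the text file at `path`.
    """
    with open(path) as file:
        for i, line in enumerate(file):
            if i >= n:
                break
            print(line, end="")   # each line already ends with a newline character


# usage: write a small sample file, then show its first three lines
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as file:
    file.write("line 1\nline 2\nline 3\nline 4\nline 5\nline 6\n")

head(sample_path, n=3)
```

Reading the file line by line means the function stops early and never loads the whole file into memory, which matters for large data files.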

## Lambda functions
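
As a brief sketch of the idea: the `lambda` keyword defines a short function in a single expression, without using `def` or `return`. The example below rewrites the function `f` from Example 1 in this style; the list `words` is made up for illustration.

```python
f = lambda x: 2*x + 3     # same as the function f(x) from Example 1
print(f(10))              # 23

# lambda functions are handy as short, throwaway arguments to other functions
words = ["banana", "fig", "cherry"]
print(sorted(words, key=lambda word: len(word)))   # ['fig', 'banana', 'cherry']
```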





# Dictionaries and other data structures

- Dictionaries
- Sets
- Tuples (use for swap operation)
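
Here is a short sketch of the three data structures listed above, including the tuple-based swap. All the names and values are made up for illustration.

```python
# dictionary: a collection of key-value pairs
ages = {"alice": 29, "bob": 35}
ages["carol"] = 41            # add a new key-value pair
print(ages["alice"])          # 29

# set: an unordered collection without duplicates
digits = set([1, 2, 2, 3, 3, 3])
print(len(digits))            # 3

# tuple: a fixed-length sequence, handy for swapping two variables
x, y = 1, 2
x, y = y, x                   # the right-hand side builds the tuple (2, 1)
print(x, y)                   # 2 1
```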

# Objects and classes

- object = general purpose data structure
- attributes
- methods
- Application: custom interval class
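
The sketch below shows one way a custom interval class might look. The class name `Interval`, its attributes `lower` and `upper`, and its methods `length` and `contains` are assumptions chosen for illustration.

```python
class Interval:
    """
    A closed interval [lower, upper] on the number line.
    """
    def __init__(self, lower, upper):
        self.lower = lower     # attribute
        self.upper = upper     # attribute

    def length(self):          # method
        return self.upper - self.lower

    def contains(self, x):     # method
        return self.lower <= x <= self.upper


interval = Interval(2, 5)      # create an object
print(interval.lower)          # 2
print(interval.length())       # 3
print(interval.contains(4))    # True
print(interval.contains(7))    # False
```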

# Python syntax review

- Square brackets `[]` are used for:
  - defining lists: `numbers = [1, 2, 3]`
  - list indexing: `ages[3] = 29`
  - dict indexing: `d[key]` (calls `__getitem__` or `__setitem__` behind the scenes)
  - slicing


- Round brackets `()` are used for:
  - defining tuples: `(1, 2, 3)`
  - enforcing operation precedence: `result = (x + y) * z`
  - defining functions
  - calling functions
  - defining class
  - creating object


- Curly brackets `{}` are used for:
  - define dict literals
  - define sets


- Quotes `"` and `'` 
  - define string literals
  - note raw string variant `r"..."` also exists


- Triple quotes `"""` and `'''`
  - long string literals that span multiple lines (entire paragraphs)


- Hash symbol `#`
  - comment


- colon `:`
  - key: value separator in dict literals
  - signal beginning of indented block
  - slice of indices `0:3` (first three items, at indices 0, 1, and 2)


- period `.`
  - decimal separator for floating point literals
  - access object attributes
  - access object methods


- comma `,`
  - element separator in lists, tuples, dicts
  - separate function arguments


- asterisk `*`
  - multiplication
  - (advanced) unpack elements of a list


- double asterisk `**`
  - exponent
  - (advanced) unpack elements of a dict


- equal sign `=`
  - assignment
  - define default keyword argument
  - pass keyword arguments


- semicolon `;`
  - allows putting multiple Python commands on a single line
  - rarely used
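
The following sketch exercises several of the symbols from the list above in one place. The function `describe` and all the values are hypothetical.

```python
ages = [31, 25, 47, 29, 52]
print(ages[0:3])          # [31, 25, 47], the items at indices 0, 1, 2
print(ages[3])            # 29

def describe(name, age=0):             # `=` defines a default keyword argument
    return name + " is " + str(age)

print(describe("Ana", age=29))         # `=` passes a keyword argument: Ana is 29

args = ["Bo", 35]
print(describe(*args))                 # `*` unpacks the list: Bo is 35

kwargs = {"name": "Cy", "age": 41}
print(describe(**kwargs))              # `**` unpacks the dict: Cy is 41

a = 1; b = 2                           # `;` separates two commands on one line
print(2**10)                           # `**` as exponent: 1024
```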



### Python keywords

Here is a list of reserved keywords in Python:

    False      class      finally    is         return
    None       continue   for        lambda     try
    True       def        from       nonlocal   while
    and        del        global     not        with
    as         elif       if         or         yield
    assert     else       import     pass
    break      except     in         raise
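
The exact list depends on the Python version you are running; for example, Python versions 3.7 and later also reserve `async` and `await`. The standard library module `keyword` lets you check the list for your own installation, as in this short sketch.

```python
import keyword

print(keyword.kwlist)                 # list of keywords for the Python version you are running
print(keyword.iskeyword("lambda"))    # True
print(keyword.iskeyword("mean"))      # False, so `mean` is a valid name for a function
```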



# Bonus topics

- writing standalone scripts (`argparse`, example: turn `head` into script)
- `functools.partial` for currying functions (e.g. sample-generator callables)
- generic functions with `*args` and `**kwargs`
- Reading and writing files https://python-textbok.readthedocs.io/en/1.0/Python_Basics.html#files
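
As a small taste of two of these topics, here is a sketch of `functools.partial` and of a function that accepts `*args` and `**kwargs`. The functions `power` and `show_inputs` are made up for illustration.

```python
from functools import partial

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)   # new function with `exponent` fixed to 2
print(square(7))                      # 49


def show_inputs(*args, **kwargs):
    """Accepts any number of positional and keyword arguments."""
    return args, kwargs

print(show_inputs(1, 2, color="red"))   # ((1, 2), {'color': 'red'})
```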



# Python libraries and modules

Everything we discussed so far was using the Python built-in functions and data types,
but that is only a small subset of all the functionality available when using Python.
There are hundreds of Python libraries and modules that provide additional functions and data types
for all kinds of applications. 
There are Python modules for processing different data files, making web requests, performing computations, etc.
The list is almost endless,
and the vast number of libraries and frameworks is all available to you behind a simple `import` statement.

## The `import` statement 

We use the `import` statement to load a Python module and make it available in the current context.
The code below shows how to import the module `<mod>` in the current notebook.
```Python
import <mod>
```
After this statement, we can use the functions in the module `<mod>` by calling them with the prefix `<mod>.`,
which is called the "dot notation" for accessing names within the namespace `<mod>`.

For example, let's import the statistics module and use the function `statistics.mean` to compute the mean of three numbers.

In [99]:
import statistics

statistics.mean([1,2,6])

3

A very common trick you'll see in Python notebooks
is to import Python modules under an "alias" name,
which is usually a shorter name that is faster to type.

The alias-import statement looks like this:
```Python
import <mod> as <alias>
```

For example, let's import the statistics module under the alias `stats` and repeat the mean calculation we saw above.

In [100]:
import statistics as stats

stats.mean([1,2,6])

3

As you can imagine,
if you're writing some Python code that requires calling a lot of statistics calculations,
you'll appreciate the alias-import statement,
since you can call `stats.mean` and `stats.median` instead of having to type the
full module name each time, `statistics.mean` and `statistics.median`.

## The standard library

The [Python standard library](https://docs.python.org/3/library/) consists of over a hundred Python modules that come bundled with every Python installation.

Here are some modules that come in handy.

- `math`
- `random`
- `statistics`
- `re` 
- `datetime`
- `urllib.parse`
- `json`, `csv`, `os.path`
- `functools`
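
To give a feel for what these modules offer, here is a sketch with one call from a few of them. The inputs are arbitrary examples.

```python
import math
import datetime
import json
import os.path
import re

print(math.sqrt(16))                                  # 4.0
print(datetime.date(2024, 1, 31).isoformat())         # 2024-01-31
print(json.loads("[1, 2, 3]"))                        # [1, 2, 3]
print(os.path.basename("data/file.csv"))              # file.csv
print(re.findall(r"\d+", "3 apples and 12 pears"))    # ['3', '12']
```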


There are also a few libraries that are not part of the standard library,
but almost as important:
- `requests`: download data from the web

## Installing Python packages with `pip`

TODO


## Scientific computing libraries


### NumPy
Numerical Python (NumPy) is a library that provides high-performance arrays and matrices. NumPy arrays allow mathematical operations to run very fast, which is important when working with medium and large datasets.

#### Example: `linspace` and other numerical calculations
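
The NumPy function `np.linspace(start, stop, num)` returns an array of `num` evenly spaced values from `start` to `stop`, with both endpoints included. The sketch below is a pure-Python stand-in that computes the same values as a list, so it runs even without NumPy installed. It is meant only to illustrate the idea and is not how NumPy implements it.

```python
def linspace(start, stop, num):
    """
    Returns a list of `num` evenly spaced values from `start` to `stop` (both included).
    Pure-Python stand-in for `numpy.linspace`, assumes num >= 2.
    """
    step = (stop - start) / (num - 1)
    values = []
    for i in range(num):
        values.append(start + i*step)
    return values

xs = linspace(0, 1, 5)
print(xs)     # [0.0, 0.25, 0.5, 0.75, 1.0]

# with NumPy installed, the equivalent call would be:
#   import numpy as np
#   np.linspace(0, 1, 5)
```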


### SciPy
Scientific Python (SciPy) is a library that provides the most common algorithms and special functions used by scientists and engineers. See https://scipy-lectures.org/


### SymPy
Symbolic math expressions.
See [sympy_tutorial.pdf](https://minireference.com/static/tutorials/sympy_tutorial.pdf).

### Matplotlib
Powerful library for plotting points, lines, and other graphs.

#### Examples: how to create reusable functions for plotting probability distributions
- `plot_pdf_and_cdf`
- `calc_prob_and_plot`
- `calc_prob_and_plot_tails`


## Data science libraries
- `pandas` library for tabular data (See [pandas_tutorial.ipynb](./pandas_tutorial.ipynb) notebook)
- `statsmodels` models for linear regression and other statistical models

- `seaborn` high-level library for statistical plots (See [seaborn_tutorial.ipynb](./seaborn_tutorial.ipynb) notebook).
- `plotnine` another high-level library for data visualization based on the grammar of graphics principles
- `scikit-learn` tools and algorithms for machine learning


# Discussion

Let's go over some of the things we skipped in the tutorial,
because they were not essential for getting started.
Now that you know a little bit about Python,
it's worth mentioning some of these details,
since it's useful context to see how this "Python calculator" business works.
I also want to tell you about some of the cool Python applications
you can look forward to if you choose to develop your Python skills further.


## Running Python code interactively

Notebooks are an example of "interactive" use of the Python interpreter.
You enter a command like `2+3` in a code cell,
press SHIFT+ENTER to run the code,
and you see the result.

There are several different ways you can access the Python interpreter.

- `python` shell.
  This is what you get if you install Python on your computer.
  You can open a command prompt (terminal or cmd.exe) and type in the
  command `python` to start the interactive Python shell.

    ```
    > python
    Python 3.6.9 (default, Oct  6 2019, 21:40:49)
    [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 2+3
    5
    >>> 
    ```

- `ipython` shell.
   This is a fancier shell with line numbering and
   many helpful commands.
    ```
    > ipython
    Python 3.6.9 (default, Oct  6 2019, 21:40:49)
    Type 'copyright', 'credits' or 'license' for more information
    IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

    In [1]: 2+3
    Out[1]: 5

    In [2]: 
    ```

- Jupyter notebooks are web-based coding environments that allow you
  to mix code cells and Markdown cells to create "code documents."
  Notebook files have an extension `.ipynb` and can be created using JupyterLab.
  Several other systems like nbviewer, GitHub, VSCode, and Google Colab
  can also be used to "open" notebooks for viewing, and some of them can "run" notebooks interactively.  
  <img src="attachment:f082c596-74ed-47c2-9081-8faf2984ccb2.png" width="300" alt="jupter-code-cell">

- Colab notebooks.
  Google operates a service called "Google Colaboratory" (Colab for short)
  that allows you to run Python code as Colab notebooks.
  <img src="attachment:d41cd497-3947-46de-8aee-5bfee616a406.png" width="200" alt="colab code cell">

Note the "Python calculator" functionality works the same way in each case.
The basic Python shell, the fancy `ipython` shell, and the notebook interface
all offer a place to input your commands.
They READ your command input,
EVALUATE it (i.e. run it),
and PRINT the output of the command's execution.
At the end of the READ-EVAL-PRINT steps,
the Python interpreter goes back into "listening mode,"
waiting for your next command input.

The overall behaviour of the Python interpreter is an example of the
READ-EVAL-PRINT Loop (REPL) that appears in many human-computer interfaces.
Other examples include the command line prompt (terminal on UNIX or `cmd.exe` on Windows),
database prompts,
the JavaScript console in your browser,
the Ruby interactive console `irb`,
and any other interface that accepts commands.
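
To make the READ-EVAL-PRINT steps concrete, here is a toy sketch of the loop written in Python itself. The function `toy_repl` is hypothetical: it reads its "inputs" from a list instead of the keyboard, and it relies on the built-in `eval`, which only handles expressions and should never be used on untrusted input. Real interpreters are considerably more sophisticated.

```python
def toy_repl(commands):
    """
    Runs the READ-EVAL-PRINT steps for each expression in the list `commands`.
    """
    for command in commands:      # LOOP over the inputs
        text = command            # READ the next input (a real shell waits for you to type)
        value = eval(text)        # EVAL the expression
        print(repr(value))        # PRINT the result

toy_repl(["2+3", "10/4", "'ab'*2"])
```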

Given this multitude of choices,
we've opted to use a Jupyter notebook to present this tutorial.
Keep in mind you could run all the code examples in the `python` shell,
the `ipython` shell, or a Colab notebook.

While we're on the topic of running Python code,
let's briefly mention the other ways Python applications can operate.
This is completely out of scope for the remainder of the discussion in this tutorial,
since we're just using Python as a fancy calculator,
but I thought I'd mention some of the other uses of Python code.

## Python applications

Python is not just a calculator.
Python can also be used for non-interactive programs and services.
Python is a general-purpose programming language so it enables a lot of applications. The list below talks about some areas where Python programming is currently being used.

- command line scripts: many command line scripts are written in Python,
  and you can run them on the command line (terminal on UNIX or `cmd.exe` on Windows).
  For example, you can download any video from YouTube by running the command
  `youtube-dl <youtube_url>`.
  If all you want is the audio, you can use some command-line options to specify 
  `youtube-dl --extract-audio --audio-format mp3 <youtube_url>` to extract the
  audio track from the youtube video and save it as an mp3.
  The author uses this type of command daily to make local copies of songs
  to listen to them offline.

- graphical user interface (GUI) programs: many desktop applications are written in Python.
  An example of a graphical, point-and-click application written in Python is `Calibre`,
  which is a powerful eBook manager, reader, and converter
  that supports most common eBook formats.

- web applications: the Django and Flask frameworks are often used to build web applications.
  Many of the websites you access every day have a server component written in Python.

- machine learning systems: create task-specific functions by using probabilistic models instead of code. Machine learning models undergo a training stage in which the model parameters are "learned" from the training data examples, after which the model can be queried to make predictions. 
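
To illustrate the first item in the list, here is a minimal sketch of a command line script built with the standard library module `argparse`. The script name `c2f.py`, the argument `temp_C`, and the option `--round` are all made up for this example.

```python
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert a temperature from Celsius to Fahrenheit.")
    parser.add_argument("temp_C", type=float, help="temperature in Celsius")
    parser.add_argument("--round", action="store_true", help="round the result to the nearest integer")
    args = parser.parse_args(argv)      # argv=None means: read the options from the command line

    temp_F = 9/5 * args.temp_C + 32
    if args.round:
        temp_F = round(temp_F)
    print(temp_F)
    return temp_F

# simulate running `python c2f.py 100 --round` from the command line
main(["100", "--round"])
```

Passing the list of options to `main` explicitly makes it possible to try the script from a notebook. In a standalone file you would call `main()` with no arguments so that the options are read from the command line.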





I mention these examples so you'll know the other possibilities enabled by Python,
beyond the basic "use Python interactively like a calculator" code examples
that you've seen in this tutorial.
There is a lot of useful stuff you can do with Python.

## Python programming

Coding, a.k.a. programming, software engineering, or software development, is a broad topic
that is out of scope for this short tutorial.
If you're interested in learning more about coding,
see the article [What is code?](https://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/) by Paul Ford.
Think mobile apps, web apps, APIs, algorithms, CPUs, GPUs, TPUs, SysOps, etc.
There is a lot to learn about the applications enabled by basic coding skills,
which are almost as fundamental as reading and writing skills.

Learning programming usually takes several years,
but you don't need to become a professional coder to start using Python for simple tasks,
the same way you don't need to become a professional author to use writing for everyday tasks.

# Links

I've collected some of the best resources
you can use to learn more about Python.


## Python cheatsheets

- Good quick reference  
  https://gto76.github.io/python-cheatsheet/
- https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/
- https://ipgp.github.io/scientific_python_cheat_sheet/
- https://learnxinyminutes.com/docs/python/
- https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf
- https://www.pythoncheatsheet.org/
- https://homepage.univie.ac.at/michael.blaschek/media/Cheatsheet_python.pdf
- https://cheatsheets.quantecon.org/


## Introductions and tutorials

- Python tutorial by Russell A. Poldrack  
  https://statsthinking21.github.io/statsthinking21-python/01-IntroductionToPython.html

- Programming with Python by the Software Carpentry team:  
  https://swcarpentry.github.io/python-novice-inflammation/

- Official Python tutorial:
  https://docs.python.org/3.10/tutorial/

- Python glossary:
  https://docs.python.org/3.10/glossary.html#glossary

- Nice tutorial:  
  https://www.pythonlikeyoumeanit.com/

- Python data structures  
  https://devopedia.org/python-data-structures

- Further reading
  https://github.com/rasbt/python_reference

- https://walkintheforest.com/Content/Introduction+to+Python/%F0%9F%90%8D+Introduction+to+Python

- Online tutorial 
  https://www.kaggle.com/learn/python

- Complete list of all the Python builtins  
  https://treyhunner.com/2019/05/python-builtins-worth-learning/  
  via https://news.ycombinator.com/item?id=30621552




## Special topics

- Stats-related python functions  
  https://www.statology.org/python-guides/

- https://github.com/mtlpy/mp-84-atelier/blob/main/ressources.md


- Python types (`int`s, `float`s, and `bool`s)    
  https://github.com/anthony-agbay/introduction-to-python/blob/main/modules/basic-python-types-ints-floats-bools/basic-python-types-ints-floats-bools.ipynb

- Python string operations  
  https://github.com/anthony-agbay/introduction-to-python/blob/main/modules/basic-python-types-strings/basic-python-types-strings.ipynb

- Scientific computing  
  https://devopedia.org/python-for-scientific-computing

- about NaNs
  https://news.ycombinator.com/item?id=30558690



## Books

- Python book for beginners ([discussed here](https://news.ycombinator.com/item?id=27141644))  
  https://learnpythontherightway.com/

- https://automatetheboringstuff.com/

- Object-Oriented Programming in Python  
  https://python-textbok.readthedocs.io/en/1.0/index.html