# Math^Industry Python Prep Notes

Prior to the taught material on Jupyter/Python, we would like you to self-assess your skill level in Python and brush up your skills if necessary. Python is very user friendly and quite easy to learn. You'll find some notes below, but they are mostly intended to remind you of syntax rather than introduce the concepts. If the concepts are unfamiliar to you, there are _lots_ of other learning resources; here are a few examples

* [Codecademy](https://www.codecademy.com/learn/learn-python)
* [Google's Python Class](https://developers.google.com/edu/python/)
* [Learnpython.org](http://learnpython.org/)

Please read through the material below and try to complete the exercises, referring to the above resources as needed. Some of these resources will let you run python snippets as part of the tutorial, but you can also access python in a number of other ways.

  1. [m2pi.syzygy.ca](https://m2pi.syzygy.ca) - A free JupyterHub instance created by PIMS, accessible via your GitHub account
  1. [Google Colab](https://colab.research.google.com/?utm_source=scs-index) - Google's browser based collaborative Jupyter environment
  1. [Miniconda/Anaconda](https://www.anaconda.com/products/individual) - A local installation option if you want to run things directly on your own device.
  
Once you've selected an environment, let's get started!

## 1.1 Basic Syntax

Like other programming languages, Python [reserves some keywords](https://www.w3schools.com/python/python_ref_keywords.asp) and symbols for particular purposes. One important symbol is `=`, which is reserved for assignment in Python; it binds a name (on the left hand side) to an object (on the right hand side).

In [None]:
a = 1
a = "lemon"

Python is dynamically typed, so we can change the type of the object `a` refers to. It also supports the notion of multiple assignment using the `,` symbol

In [None]:
a, b, c, _  = (1, 2, 3, 4)

The `_` in the fourth position isn't a mistake here. Python variable names are [very flexible](https://www.w3schools.com/python/gloss_python_variable_names.asp), and in this case a single underscore is a perfectly valid name...

In [None]:
_

That said, _conventionally_, people reading the code above would usually interpret the 4th value as being "thrown away". Python's syntax is very flexible, which is great, but to avoid unintentional problems, there are a number of idioms and conventions people stick to. You'll see people referring to good code as [pythonic](https://intermediate-and-advanced-software-carpentry.readthedocs.io/en/latest/idiomatic-python.html). Trying to emulate these styles is very much recommended.

Since `=` is used for assignment, we need another operator for equality; Python uses `==`. There is also an `is` keyword which does something related but subtly different

In [None]:
a = 3
b = 3.0
a == b

In [None]:
a is b

The distinction is that `==` tests for equality of objects, while `is` tests if the names/variables refer to the same object in memory.
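A small sketch of the difference, using hypothetical list names (`first`, `second` and `alias` are illustrative and not part of the notes above):

```python
# Two separate list objects with the same contents
first = [1, 2, 3]
second = [1, 2, 3]
alias = first  # a second name bound to the same object as `first`

print(first == second)  # True: the contents are equal
print(first is second)  # False: two distinct objects in memory
print(first is alias)   # True: both names refer to one object

# Mutating the object through one name is visible through the other
alias.append(4)
print(first)            # [1, 2, 3, 4]
```

In practice `is` is mostly used for comparisons against singletons such as `None`.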

Python has the usual arithmetic operators (`+`, `-`, `*`, `**`, `/`, `//`, `%`). In general these are overloaded and will do something sensible with lots of different types (e.g. `*` will repeat strings). There are also bitwise operators like `>>`, `&`, `|` etc. and convenience functions (`hex`, `bin`, `int`, `abs`, `round` etc.). In general python follows C-style operator precedence, but if in doubt add parentheses.
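As a quick reminder of how some of these operators behave, here is a short sketch (the values are arbitrary examples):

```python
print(7 / 2)       # 3.5    true division always returns a float
print(7 // 2)      # 3      floor division
print(-7 // 2)     # -4     floor division rounds towards negative infinity
print(7 % 2)       # 1      remainder
print(2 ** 10)     # 1024   exponentiation
print("ab" * 3)    # ababab `*` repeats strings
print([0] * 3)     # [0, 0, 0]

# Precedence: `**` binds more tightly than unary minus
print(-2 ** 2)     # -4
print((-2) ** 2)   # 4
```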

### 1.1.1 Types

Python has a generous type system, including types for...

  * Numbers: (ints, floats, complex numbers, Fractions, Decimals, ...). 
  * Strings: Python strings support unicode, indexing, slicing etc. and have _lots_ of very useful methods such as `.upper()`, `.join()`, `.strip()`, `.startswith()`
  * Files: Python has good support for the usual file operations and lots of useful methods such as `.read()`, `.readline()`, `.close()`. The built-in `open` function can be used with the `with` statement (a context manager) to make sure file handles are tidied up as needed
  * Collections
    * Lists: ordered collections which support arbitrary objects, indexing, slicing etc. and lots of nice methods (e.g. `.append`, `.pop`, `.reverse`)
    * Dictionaries: the default hashed collection (`for k, v in D.items()`), again lots of really nice methods and fast lookups.
    * Tuples: An immutable collection (being immutable can be very useful and also allows certain operations to be very fast)
    * Sets: Supports unique entries + set operations. Fast lookups and lots of methods (see also [Collections in the Standard Library](https://docs.python.org/3/library/collections.html))
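To illustrate the file and string points above, here is a minimal sketch that writes and reads a scratch file. The file name `notes.txt` and the use of `tempfile` are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Work inside a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")  # hypothetical file name

    # The file is closed automatically when the `with` block ends
    with open(path, "w") as f:
        f.write("first line\nsecond line\n")

    # Iterating over a file object yields one line at a time
    with open(path) as f:
        lines = [line.strip() for line in f]

print(f.closed)                  # True
print(lines)                     # ['first line', 'second line']
print(", ".join(lines).upper())  # FIRST LINE, SECOND LINE
```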

### 1.1.2 Conditionals

The most common conditional statements in python are made up of the `if`, `elif` and `else` keywords.

In [None]:
a = 5
if a < 4:
    print("A is less than 4")
elif a == 4:
    print("A is equal to 4")
else:
    print("A is greater than 4")

but there is also a ternary operator for short statements

In [None]:
e = a if b < c else b

To implement more complicated logic, you can nest these conditionals as well as using the `and`, `or` and `not` keywords, and Python has a well developed notion of the boolean value of various objects...

#### "Truthy" Values

  * Non-empty lists, sets, tuples, dicts, strings
  * non-zero values
  * `True`

#### "Falsy" Values

  * Empty lists, sets, tuples, dicts, strings
  * Zero values `0`, `0.0`, `0j`
  * `False`
  * `None`

You can assign boolean values to variables in your code or you can use the results of a boolean expression directly. Try predicting the output of the cells below before executing them

In [None]:
bool(None)

In [None]:
bool(1)

In [None]:
if (True or False):
    print('a')
else:
    print('b')

In [None]:
for i in [1, 7, 0, -3, -0.1]:
    if i: print(i)

In [None]:
x = []
if x: 
    print('if x') 
if x is not None: 
    print('if x is not None')

In [None]:
myList = [1, 3, "cat", -2, None, 13, dict(), "dog"]

while myList:
    item = myList.pop()
    if item:
        print(f"I found a ... {item}")
    else:
        print(f"I found something, but I think it might be nothing!")


### 1.1.3 Indexing and Slicing

Python indexing is very flexible and can help you solve lots of different kinds of problems. Most collections in Python support some notion of indexing, but we'll focus on numerical indexing (lists and tuples) for now and talk about hashes (dicts and sets) later. The advantage of numerical indexing is that the index values have a natural order $(0, 1, \ldots, N-1)$

In [None]:
alphabet = [x for x in 'abcdefghijklmnopqrstuvwxyz']
alphabet

Python is "zero indexed"; the first position in an index is labeled by 0 and if the list is $N$ elements long, the last position will be labeled by $N-1$

In [None]:
print(f"Starting at '{alphabet[0]}' and ending at '{alphabet[len(alphabet) - 1]}'")

We can specify ranges (called slices) by separating the start and stop with a colon, but notice that the start value is _inclusive_ while the end value is _exclusive_.

In [None]:
print(f"positions [1:5] give us {alphabet[1:5]} but position [5] is '{alphabet[5]}'")

If you omit the start index, Python assumes you want to start at position 0. If you omit the stop index, it assumes you want to continue to the end of the collection (including the last element). You can also specify a "stride" length to select regular patterns of items (e.g. every 3rd letter)

In [None]:
alphabet[5:15:3]
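The defaults for omitted slice positions can be checked with a short sketch (the `alphabet` list is rebuilt here so the block runs on its own):

```python
alphabet = list('abcdefghijklmnopqrstuvwxyz')

print(alphabet[:3])    # ['a', 'b', 'c']        start omitted: begin at position 0
print(alphabet[23:])   # ['x', 'y', 'z']        stop omitted: run to the end
print(alphabet[::5])   # ['a', 'f', 'k', 'p', 'u', 'z']  every 5th letter
```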

Python also lets you give negative numbers in any of the slice positions. In the start and stop positions, these are interpreted relative to the _end_ (-1 is the last element of the collection). In the stride position, a negative value means stepping backwards, so the start position should come after the stop position. The inclusive/exclusive behaviour still applies for a negative stride (the start is included, the stop is not) but it takes a little bit of getting used to!

In [None]:
alphabet[5:0:-1]

In [None]:
alphabet[-10:-5:1]

In [None]:
alphabet[-5:-10:-2]

## 1.2 Exercises

1. Given the edges of a triangle labelled `a`, `b` and `c`, write a conditional test to say whether a triangle is equilateral, isosceles or scalene
1. Given the following rubric, test the value of a variable called `grade` and assign a letter grade
  * A: grade > 80
  * B: 70 < grade <= 80
  * C: 60 < grade <= 70
  * D: grade <= 60
1. What is the output of
```python
a = 3
b = 14
print( not ( not a == 3 or not b == 14) )
```
1. Assign the following sentence to a variable (a string) then use the `.split()` method of the string to break it into a list of words. Finally, use indexing to display every second word 
> It was a bright cold day in April, and the clocks were striking 13.


## 2.1 Loops

Python has two main types of loop, `while` and `for`.

#### 2.1.1 `while` loops
A `while` loop evaluates the condition at the top of the loop. If it is true, the body of the loop is executed and the condition is checked again; if not, execution moves on to the next statement after the loop.


In [None]:
a = 0
while True:
    a = a + 1
    if a > 10:
        break
    print(a)

In this case, `True` is always `True` so we use the `break` to tell the loop when we are done. There is also a `continue` statement for flow control inside loops; take a look at `help('continue')`.
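As a hedged illustration of `continue`, the sketch below skips the rest of the loop body for even numbers (the list name `odds` is just for this example):

```python
a = 0
odds = []
while a < 10:
    a = a + 1
    if a % 2 == 0:
        continue  # jump straight back to the condition at the top of the loop
    odds.append(a)

print(odds)  # [1, 3, 5, 7, 9]
```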

#### 2.1.2 `for` loops

`for` loops are very common in Python and similar to those in other languages. One nice tweak in Python is that the `for` loop can iterate over any sequence

In [None]:
for animal in ['cat', 'dog', 'elephant']:
    print(animal, len(animal))

In [None]:
for i in range(10):
    print(i, i**2)

#### Comprehensions

When the loop body is small and simple, you can also use a list comprehension in place of a for loop. Once you get used to the syntax these are very handy, but they can make your code a bit harder for newcomers to follow and it is easy to get carried away so use them sparingly. The syntax is

```python
[<statement in x> for x in <list>]
```

and it will generate a list of the values of `<statement in x>`. You can also include an optional `if` clause after the `<list>` to filter the list, but again it's best to keep list comprehensions short and simple.

In [None]:
[x for x in range(1, 100) if x % 2 == 1]

There is also a notion of dictionary comprehensions

In [None]:
{f"{x}^2": x**2 for x in range(10)}
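To make the comprehension syntax concrete, here is a sketch placing an explicit loop next to the equivalent list comprehension. The example (squares of the multiples of 3 below 20) is arbitrary and chosen only for illustration.

```python
# Explicit loop: squares of the multiples of 3 below 20
squares_loop = []
for x in range(20):
    if x % 3 == 0:
        squares_loop.append(x**2)

# Equivalent list comprehension with a filter
squares_comp = [x**2 for x in range(20) if x % 3 == 0]

print(squares_comp)                  # [0, 9, 36, 81, 144, 225, 324]
print(squares_loop == squares_comp)  # True
```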

#### Exercises

1. Make a simple/explicit for loop printing the odd numbers from 1 to 99
1. Use nested for loops to print all of the items in this list individually
```python
regions = [
    ['Canada','USA', 'Mexico'],
    ['France', 'Germany', 'Romania'],
    ['Australia', 'New Zealand']
]
```
1. Redo the first question as a list comprehension

## 3.1 Functions

### 3.1.1 defining functions
Functions let you encapsulate and reuse logic. To define a function in python you use the `def` keyword.

Typical form
```python
def <name>(arguments):
    <statements>
    return <object>
```

Basically, `def` is assigning that executable code to the name `<name>`.

In [None]:
def double(x):
    return 2 * x

double(3)

In [None]:
double('Hello')
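Because `def` simply binds a function object to a name, that object can be given other names or stored in collections. A minimal sketch (the names `twice` and `operations` are hypothetical):

```python
def double(x):
    return 2 * x

twice = double          # bind a second name to the same function object
print(twice is double)  # True
print(twice(21))        # 42
print(double.__name__)  # double

# Functions can be stored and passed around like any other object
operations = {"double": double, "length": len}
print(operations["length"]("Hello"))  # 5
```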

Adding more arguments is easy

In [None]:
def multiplyby(x, n):
    return x * n

multiplyby(5, 3)

The arguments and return value(s) don't have to be simple types...

In [None]:
multiplyby("hip hip, ",3).strip()

### 3.1.2 Scope
Python uses namespaces to keep variables from clobbering one another and to make modules and code more portable. For example, when you define $\pi = 3$ you don't want the value defined in the scipy module to clobber it. With namespacing you can safely set the variable x in two different contexts and have them not interfere with each other. When you _want_ to have them interfere with each other, you have to understand the hierarchy of namespaces that Python defines (the scope of the name x).

The basic hierarchy is something like this...
* **B**uilt in: e.g. built-in names such as `open`, `range`, ...
  * **G**lobal (module): Things at the top level of a module e.g. random inside numpy
    * **E**nclosing function locals
      * **L**ocal (function): names assigned within a function and not set global

The further down that list you go, the more specific the name is and the idea is that the most specific should win (like CSS etc.). It is usually referred to as the LEGB rule. As an example, if I do `from numpy import random`, then define random as a variable, my definition "wins"

In [None]:
from numpy import random
random=3
random
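The lookup order can be sketched with nested functions. All of the function names below are hypothetical and exist only to show which `x` each lookup finds.

```python
x = "global"

def outer():
    x = "enclosing"

    def inner():
        x = "local"
        return x  # the local name wins

    def inner_no_local():
        # No local x here, so the lookup falls back to the enclosing function
        return x

    return inner(), inner_no_local()

def no_local():
    # No local or enclosing x, so the global (module level) one is found
    return x

print(outer())     # ('local', 'enclosing')
print(no_local())  # global
print(len("abc"))  # 3: `len` is found last of all, in the built-in namespace
```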

Sometimes you might need to change a variable from one of the outer scopes. You can do this with the `global` keyword as follows (compare the output of the two cells)


In [None]:
x = 3
def increment_x():
    x = 0
    x += 1

increment_x()
print(x)

In [None]:
x = 3
def increment_x():
    global x
    x += 1
increment_x()
print(x)

### 3.1.3 Lambda Functions
Python supports the idea of lambda functions. These are basically "anonymous functions". You can use them anywhere you would normally use a function, but you don't want to go to the bother of actually naming the thing. This sounds abstract or odd, but it is sometimes useful, I swear!

In [None]:
def operateon(f, x, n):
    return f(x, n)

operateon(lambda x, n: x**n, 3, 4)

Lambda functions commonly come up where someone has written code which expects a function as one of the arguments (e.g. massaging numbers to look like dates so that pandas can ingest them). Similar to list comprehensions and generators, you might skip over lambda functions when first learning Python but they are worth picking up sooner or later because they can make your code much neater and more efficient.
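A common case of "code which expects a function as an argument" is the `key` parameter of the built-in `sorted`. The sketch below uses an arbitrary list of animal names:

```python
animals = ['elephant', 'cat', 'horse', 'ox']

# Sort by length rather than alphabetically
by_length = sorted(animals, key=lambda name: len(name))
print(by_length)  # ['ox', 'cat', 'horse', 'elephant']

# Sort by the last letter of each word (ties keep their original order)
by_last = sorted(animals, key=lambda name: name[-1])
print(by_last)    # ['horse', 'elephant', 'cat', 'ox']
```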



### 3.1.4 Function Arguments
Functions act on arguments passed to them between the parentheses. Going beyond the simple examples above, Python adds a little flexibility to how arguments are specified:

  * Argument lists can be arbitrarily long and each argument can be an arbitrary Python object.
  * You can include both positional and keyword arguments. Positional arguments are just a list of names (`x`, `y`, `z`), while keyword arguments include values (`x=1`, `y=2`, `z=3`). You can mix the types of arguments, but the positional arguments must come first.
  * You can specify default values when writing keyword arguments, e.g. if you include `x=1` in the argument list but don't include a value for `x` when calling the function, the value 1 will be used.
  * Functions can support arbitrary numbers of positional arguments. To do this, you prefix the argument with a `*`. Inside the function you can iterate over this argument as a tuple.
  * Functions can support arbitrary keyword arguments. To do this, you prefix the argument with `**`. Inside the function you can iterate over this argument as a dictionary of whatever the caller decided to pass in.

These last two points might sound arcane, but they are important and widely used. A good example is matplotlib where plotting functions can use hundreds of arguments. It is much easier to prepare a dictionary of all of your settings and expand that as needed.



In [None]:
def arguments(a, b, *args, c=1, **kwargs):
    print(f"a and b are required arguments: {a}, {b}")
    print(f"and c always has a value: {c}")
    for arg in args:
        print(f"I found an extra argument: {arg}")
    
    for k, v in kwargs.items():
        print(f"I found an extra keyword argument: {k}:{v}")
        
        
arguments(1, 2, 3, 4, 5, c=6, fruit="banana", time="noon", color='red')
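The same `*` and `**` symbols can also be used when _calling_ a function, to expand a sequence or dictionary into arguments. The sketch below uses `describe_plot`, a hypothetical stand-in for a plotting function with many options (it is not a matplotlib function):

```python
def describe_plot(x, y, color="blue", linewidth=1, **kwargs):
    """Hypothetical stand-in for a plotting function with many options."""
    extras = ", ".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
    return f"{len(x)} points, color={color}, linewidth={linewidth}, extras: {extras}"

# Prepare the settings once, then expand them at the call site
settings = {"color": "red", "linewidth": 2, "marker": "o"}
data = ([1, 2, 3], [4, 5, 6])

summary = describe_plot(*data, **settings)
print(summary)
# 3 points, color=red, linewidth=2, extras: marker=o
```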

## 3.2 Exercises

1. Write a function which takes two strings as its arguments and returns a tuple where the first item is both strings concatenated and the second is their combined length
1. Write a function which takes a list of numbers and returns the max and min of the elements
1. Write a function which takes an arbitrary number of positional arguments and multiplies them together
1. Change the function above so that it accepts a single keyword argument which is a binary operator and will apply that operator to all of the positional arguments (try `from operator import mul, add`)

## 4.1 Modules

Python is often described as a "batteries included" language, in that the basic installation has a _lot_ of functionality included. Beyond the basics, there is an extensive [standard library](https://docs.python.org/3/library/index.html) which provides some extremely useful functionality.

* **os, pathlib, sys, argparse**: Interface with your operating system. Work with files etc.
* **string, re**: Working with strings and regular expressions
* **math, random, statistics, decimal, fractions**: Basic mathematics (we'll stick with `numpy`)
* **time, datetime**: Work with dates and times, datetime interfaces well with `pandas`
* **zlib, gzip, bz2, lzma, zipfile, tarfile**: Working with compression
* **json, csv**: dealing with files (see also `pandas`)
* **email, smtplib, urllib**: Internet protocols
* **hashlib, hmac, secrets**: Working with cryptography
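* **unittest**: Testing!
* **collections, itertools**: Specialised container types and tools for building iterators
* **pickle, shelve, dbm, sqlite3**: Object serialisation and simple persistent storage/databases
* **subprocess, threading, multiprocessing**: Running external commands and concurrent execution
* **asyncio, socket, ssl**: Asynchronous I/O and low-level networking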

#### 4.1.1 OS

The `os` module integrates with the operating system of the machine where Python is running.


  1. Try to find the value of the `HOME` environment variable.
  1. What is the current working directory? Try `os.*cwd?` in a python cell

In [None]:
import os

In [None]:
os.environ['HOME']

In [None]:
os.getcwd()

#### 4.1.2 Datetime

Datetime provides classes for manipulating dates and times. For the most part we'll end up using `pandas` to do this for us, but there is significant overlap and `datetime` is well worth knowing

In [None]:
from datetime import date, datetime

 1. Import the `date` object and use it to get today's date
 1. Calculate the number of days between Jan 1st and today
 1. Parse this string to a datetime "2021, August 3 11AM" (try `datetime.strptime`)

In [None]:
date.today()

In [None]:
(date.today() - date(2021,1,1)).days

In [None]:
datetime.strptime("2021, August 3 11AM", "%Y, %B %d %I%p")

#### 4.1.3 Regular Expressions (`re`)

A regular expression is...

> A sequence of characters that specifies a search pattern - _[Wikipedia: Regular Expression](https://en.wikipedia.org/wiki/Regular_expression)_

Any time you are working with text where there is any sort of regularity or structure in the text, regular expressions can help you pick out the parts you need. [Python Regular Expressions](https://docs.python.org/3/howto/regex.html#) are implemented in the `re` module. At a very basic level, `re` can help you match explicit strings within blocks of text...

In [None]:
import re

line = "Of all the gin joints in all the towns in all the world, she walks into mine."

x = re.findall('in', line)
x

But regular expressions can do a whole lot more. The basic idea is to use a special language to describe exactly what you are trying to find. Here is a simple example

```
^[A-Z]{1}[a-z]+
```

and here is how to interpret that specific regular expression (regex)

  * The `^` at the beginning says to match the beginning of a line (first char)
  * The `[A-Z]{1}` is a single instruction
      * `[A-Z]` match any upper case letter from `A` to `Z`
      * `{1}` match exactly 1 occurrence of the preceding item (1 upper case letter)
  * `[a-z]+` matches one _or more_ occurrences of the letters `a` to `z` 

So this regex should be able to match a single (ascii) word at the beginning of a line which starts with a capital letter.

In [None]:
dahl = """
In fairy-tales, witches always wear silly black hats and black coats,
and they ride on broomsticks. But this is not a fairy-tale. This is 
about REAL WITCHES.
"""

for line, text in enumerate(dahl.strip().split('\n')):

    # The leading ^ anchors the match to the start of each line
    matches = re.search(r'^[A-Z]{1}[a-z]+', text)

    print(f"On line {line} matches {matches}")

In addition to just telling you if it finds a match to your regular expression, the `re` module can help you extract the specific text which matched and use it in your code. This is called [match grouping](https://docs.python.org/3/howto/regex.html#grouping) and to use it, you surround the items you want to extract with parentheses

In [None]:
race = [
    "Kiesenhofer AUT  3:52:45",
    "Vleuten     NED  3:54:00",
    "Borghini    ITA  3:54:14"
]

m = re.compile(r'^([A-Z]{1}[a-z]+)\s+([A-Z]{3})\s+(\d{1,2}:\d{1,2}:\d{1,2})$')

for cyclist in race:
    name, country, time = m.match(cyclist).groups()
    print(f"{name} finished in {time} from {country}")

Another handy trick with grouping is to assign names to the match groups. The basic syntax looks like `(?P<name>[A-Z]{1}[a-z]+)` but see the [named groups documentation](https://docs.python.org/3/howto/regex.html#non-capturing-and-named-groups) for more details.
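As a sketch of named groups, the pattern from the race example can be rewritten with a name for each group. The group names `name`, `country` and `time` are choices made for this example.

```python
import re

# Same structure as the race pattern, but each group is given a name
pattern = re.compile(
    r'^(?P<name>[A-Z]{1}[a-z]+)\s+(?P<country>[A-Z]{3})\s+(?P<time>\d{1,2}:\d{1,2}:\d{1,2})$'
)

result = pattern.match("Kiesenhofer AUT  3:52:45")

print(result.group('name'))  # Kiesenhofer
print(result.groupdict())
# {'name': 'Kiesenhofer', 'country': 'AUT', 'time': '3:52:45'}
```

Named groups make the unpacking step less dependent on the order of the parentheses in the pattern.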

#### Tips

Constructing regex's can be a bit of an art form and the final expressions _can_ become hard to read (even when you are the person who wrote them!). All I can say is it is worth slowing down and reading them carefully. Once you've broken them down into components they become much more digestible and the rewards you will get from being able to wield regex's far outweigh the difficulty in learning them.

* The [Python Regex Documentation](https://docs.python.org/3/howto/regex.html#) is very good
* Make sure you understand the difference between `re.search`, `re.match`, `re.findall`
* [pythex.org](https://pythex.org) is a very handy website which lets you put in your regular expression and a test string. It'll show you the results with syntax highlighting and explain how it used your regex to get them. If you stumble across an unreadable regex in the wild, [pythex.org](https://pythex.org) can be a handy first step in understanding it.
* If there is _even more_ structure in your data (e.g. a CSV), then regex's might _not_ be the answer, in that case modules like pandas can do all of your regexes and matching for you.
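The difference between `re.match`, `re.search` and `re.findall` can be sketched using the same sentence as earlier in this section:

```python
import re

line = "Of all the gin joints in all the towns in all the world, she walks into mine."

# re.match only looks at the very start of the string
print(re.match(r'all', line))         # None
print(re.match(r'Of', line).group())  # Of

# re.search scans the string and returns the first match anywhere
first = re.search(r'all', line)
print(first.span())                   # (3, 6)

# re.findall returns every non-overlapping match as a list of strings
print(re.findall(r'all', line))       # ['all', 'all', 'all']
```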

#### 4.1.4 Collections 

In addition to the builtin collection types, the Python standard library includes the [collections module](https://docs.python.org/3/library/collections.html) which adds some useful variations.

  * namedtuple: Like a tuple, but with names!
  * deque: A list-like container with fast appends and pops at either end
  * ChainMap: Combine several dictionaries
  * Counter: Turns an iterable into a dictionary of keys and value counts
  * defaultdict: Like a dictionary but supplies a default value for missing keys
  * UserDict, UserList, UserString: Wrapper classes that make it easier to subclass the built-in dict, list and string types
  
We'll just take a quick look at Counter for now, but they are all worth playing with.


In [None]:
from collections import Counter

message = "we slept in what had once been the gymnasium"

letter_count = Counter(message)
letter_count.most_common()

As with many things in python, you _could_ implement this functionality yourself, but when there's a reliable and efficient module available you should make use of it!
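For a brief taste of some of the other containers listed above, here is a hedged sketch. The names `Result`, `words_by_letter` and `recent` are hypothetical and the data is reused from earlier examples.

```python
from collections import defaultdict, deque, namedtuple

# namedtuple: a tuple whose fields can also be accessed by name
Result = namedtuple("Result", ["name", "country", "time"])
winner = Result("Kiesenhofer", "AUT", "3:52:45")
print(winner.name, winner[1])  # Kiesenhofer AUT

# defaultdict: missing keys are created on demand (here as empty lists)
words_by_letter = defaultdict(list)
for word in "we slept in what had once been the gymnasium".split():
    words_by_letter[word[0]].append(word)
print(words_by_letter["w"])    # ['we', 'what']

# deque: with maxlen set, old items fall off the far end automatically
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)
print(recent)                  # deque([2, 3, 4], maxlen=3)
```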

## 4.2 Exercises

1. Use the `datetime` module to find the day of the week for the date `2322-06-13`
1. Using the `re` module, write a regular expression to extract the first dollar amount from the following sentence
> Lunch cost \\$12.00, but I also spent \\$4.00 on a coffee.

## 5. Classes

Python implements an `object` type and all of the basic types (`int`, `bool`, `str`, etc.) are actually subclasses of that type. When you use the `class` keyword to create your own classes, they will also subclass `object`, placing your objects on the same level as the built-in types.

In [None]:
issubclass(int, object)

In [None]:
class Vehicle:
    def honk(self):
        print("HONK")

In [None]:
issubclass(Vehicle, object)

We can create instances with `<classname>()`

In [None]:
IansBike = Vehicle()
IansBike.honk()

Part of what we inherited from `object` was the default `__init__` constructor (and a lot of other methods; take a look at the tab completion or `dir(IansBike)`).


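Python supports multiple inheritance. Method lookup follows the method resolution order (MRO), which is roughly depth first then left to right, with a shared base class searched only after all of its subclasses. For example, if a class inherits from both `B` and `C` and the `.m()` methods of `B` and `C` are at the same depth, `B` occurs first so that is what we get.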
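A minimal sketch of such a hierarchy, assuming hypothetical classes `A`, `B`, `C` and `D` (the notes name `B`, `C` and `.m()` but do not show their definitions, so the class bodies here are illustrative):

```python
class A:
    def m(self):
        return "A.m"

class B(A):
    def m(self):
        return "B.m"

class C(A):
    def m(self):
        return "C.m"

class D(B, C):  # B is listed first, so it is searched before C
    pass

print(D().m())                              # B.m
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```

The `__mro__` attribute shows the exact order in which Python searches the classes.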

## 5.2 Exercises

1. Rewrite the `Vehicle` class to include an explicit `__init__` constructor which can take one argument called `alert` which contains the noise the vehicle horn should make. Cars should go "HONK" and bikes should go "RING"

1. Write a subclass of the `Vehicle` class called `Bus` whose constructor requires a `number` argument which gives the number of the bus. Try overriding the `__repr__()` method to change how `Bus` objects are displayed.