# Python Overview

## Python's Origins

Python was conceived in the late 1980s, and its implementation began in December 1989 by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC language. It takes its name from Monty Python's Flying Circus.

Python is dynamically typed but strongly typed: variables are untyped names that can refer to objects of any type, but every object has a fixed type, and Python does not implicitly convert between unrelated types.
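Here is a small sketch of what this means in practice. The same name can be rebound to objects of different types, but Python refuses to mix incompatible types implicitly:

```python
x = 42  # x refers to an int object
print(type(x))  # <class 'int'>
x = "forty-two"  # the same name now refers to a str object
print(type(x))  # <class 'str'>

# Strong typing: Python won't implicitly convert between str and int
try:
    "1" + 1
except TypeError as err:
    print("TypeError:", err)

# An explicit conversion is needed instead
result = int("1") + 1
print(result)  # 2
```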

## How Python Evolves

Python evolves in a fairly straightforward way, more-or-less like this:

- people propose changes by writing *Python Enhancement Proposals* (PEPs)
- after some amount of discussion and revision, a PEP is accepted or rejected; the decision is often delegated to a designated expert (a 'PEP-Delegate', historically called a 'BDFL-Delegate')
- disagreements were historically settled by Guido van Rossum, Python's inventor and 'Benevolent Dictator for Life' (BDFL); since he stepped down from that role in 2018, final decisions have rested with an elected Steering Council

An important standard PEP is the Style Guide, PEP-8 (https://www.python.org/dev/peps/pep-0008/). By default, PyCharm will warn of any PEP-8 violations. There are external tools such as `flake8` (https://gitlab.com/pycqa/flake8) that can be used to check code for compliance in other environments.


## StackOverflow is your friend!

For Python questions and Python data science questions, make use of StackOverflow. Pay attention to comments on suggested answers; the "accepted answer" is not always the best. Look for comments about whether it is the "most Pythonic".

https://stackoverflow.com/questions/tagged/python

## Python 2.7 or Python 3.x?

You can use conda to create a Python 2.7 virtual environment for when you have to use 2.7, but all new projects should use Python 3 (these notes use f-strings, which need Python 3.6 or later). Python 2.7 was the last release of the 2.x line and reached end-of-life on 1 January 2020, so it no longer receives fixes. Avoid it; the only reason to use it is if there is a package you really need that hasn't been ported.

## Python docs

https://docs.python.org/3/ has very detailed documentation.

Many Python packages host good documentation on Read the Docs: https://readthedocs.org/

If you use Python a lot on a Mac you may find Dash useful: https://kapeli.com/dash

Python also has a built-in `help()` function that is very useful.

## Installing Packages

The standard way to install packages is with `pip install`. However, if you use `conda`, try `conda install` first and fall back to `pip install` only if that fails. Conda offers a smaller set of packages, which is why it doesn't always succeed, but the packages it does offer have been built for Conda, so installing them that way is preferred.

Use `conda uninstall` or `pip uninstall` to remove packages.

To see what packages are installed use `pip freeze`.

There's a lot more to package installation than this but this is enough for 90%+ of what you will do.


## Using the REPL

To start the REPL, just type `python` at the command line.

Use the `help()` function to read the documentation for a module, class or function, e.g. `help(len)`. Called with no arguments, `help()` starts an interactive help system in which you can explore various topics.

Python scripts are stored in plain text files with a `.py` extension. You can run the script `foo.py` at the command line by invoking `python foo.py`. The interpreter first compiles the source to an intermediate bytecode and then executes that bytecode. For modules that are *imported* (but not for the top-level script itself), CPython caches this bytecode in a `__pycache__` directory next to the source, in files such as `__pycache__/foo.cpython-312.pyc`. On later imports it reuses the cached `.pyc` file as long as it is still up to date with the source, which saves recompiling.
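As an illustration, the standard-library function `importlib.util.cache_from_source()` reports where CPython would cache the bytecode for a given source file, and the built-in `compile()` performs the source-to-bytecode step on a string. The file name `foo.py` is only an example and doesn't need to exist; the exact cache file name depends on your Python version.

```python
import importlib.util
import os

# Path CPython would use for the cached bytecode of foo.py when it is imported
cached = importlib.util.cache_from_source("foo.py")
print(cached)  # e.g. __pycache__/foo.cpython-312.pyc (the tag depends on your version)
print(os.path.dirname(cached))  # __pycache__

# compile() turns source text into a code object containing bytecode
code = compile("print(6 * 7)", "<example>", "exec")
print(type(code))  # <class 'code'>
exec(code)  # runs the bytecode, which prints 42
```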

## A better REPL: bpython

https://www.bpython-interpreter.org/

You can install with `pip install bpython`.

bpython adds a number of useful features at the command line, like syntax highlighting and auto-completion. If you're going to use the command-line REPL I recommend it, although there are other options that I haven't tried:

- ptpython https://github.com/jonathanslenders/ptpython
- DreamPie http://dreampie.sourceforge.net/


## Python is an OOPL

Python is a pure object-oriented language: every value, including numbers and functions, is an object. Operators are simply methods on a class; the Python interpreter converts an infix operator into a call to the corresponding instance method.

For example, there is an `int` class for integers, and the literal `3` is already an instance of it (unlike Java, there is no separate primitive type that needs boxing; `int(3)` simply returns an `int`). The `int` class defines an `__add__` method for addition. So:

    3 + 4
    
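is roughly equivalent to:

    (3).__add__(4)

The parentheses are needed: written as `3.__add__(4)`, the `3.` would be read as the start of a float literal, which is a syntax error. ("Roughly" because if the left operand's `__add__` can't handle the right operand, Python next tries the right operand's `__radd__` method.)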
    
The double underscore in Python is called 'dunder' and is used extensively internally; `__add__` is called a dunder-method. Dunder-methods are important to understand if you want to take full advantage of Python, hence this early introduction.

You can see the methods on a class by using the `dir` function, for example `dir(int)`.

We will discuss how to define new classes later. A key takeaway here is that, because operators map to dunder-methods, we can override many operators simply by overriding the associated dunder-method. Two particularly useful ones are `__str__` (the human-readable string form, used by `str()` and `print()`) and `__repr__` (an unambiguous representation for developers, used by `repr()` and by the REPL to echo values). If a class defines only `__repr__`, `str()` falls back to it. For example, notice the differences here:

In [None]:
a = "abc"
print(a.__str__())  # Equivalent to str(a)
print(a.__repr__())
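To see how this applies to your own classes, here is a sketch using a hypothetical `Money` class (not from any library) that overrides `__add__`, `__str__` and `__repr__`; `+`, `print()` and `repr()` then dispatch to those methods:

```python
class Money:
    """A hypothetical amount of money stored in whole cents (illustration only)."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Called for `self + other`
        if not isinstance(other, Money):
            return NotImplemented  # lets Python try other.__radd__, then raise TypeError
        return Money(self.cents + other.cents)

    def __str__(self):
        # Human-readable form, used by str() and print()
        return f"${self.cents // 100}.{self.cents % 100:02d}"

    def __repr__(self):
        # Unambiguous form, used by repr() and the REPL
        return f"Money({self.cents})"


total = Money(150) + Money(275)
print(total)        # $4.25
print(repr(total))  # Money(425)
```

Returning `NotImplemented` from `__add__`, rather than raising an error directly, is the conventional way to tell Python the operand type isn't supported, so it can try the other operand's `__radd__` before giving up with a `TypeError`.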

## Indentation and Comments

Python does not use `{}` to demarcate blocks of code; instead it uses indentation. This distinguishes it from most other programming languages and can take some getting used to. In particular, it requires care when pasting code into an editor (most Python-aware editors handle this well, but other editors may not). Python inherited this approach from ABC, a language designed for teaching, and it reflects Guido's emphasis on readability.

The convention in Python is to indent with spaces, not tabs (this avoids differing tab settings causing misinterpretation of code; Python 3 also refuses to run code that mixes tabs and spaces inconsistently). The standard is 4 spaces per indentation level, although some companies have different conventions (2 spaces is the most common alternative).

Comments start with `#` and continue to the end of the line. By convention, if `#` is used on the same line as code, it should be preceded by at least two spaces.
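A short sketch showing nested indentation and an inline comment, plus what happens when indentation is inconsistent. The function `classify` is just an illustration; the badly indented source is compiled from a string so that the error can be caught rather than stopping the script.

```python
def classify(n):
    # Each nested block is indented by four more spaces
    if n < 0:
        return "negative"  # inline comment: at least two spaces before the '#'
    return "non-negative"


print(classify(-5))  # negative

# Inconsistent indentation is a syntax error, detected when the code is compiled
bad_source = "if True:\n    x = 1\n  y = 2\n"
try:
    compile(bad_source, "<example>", "exec")
except IndentationError as err:
    print("IndentationError:", err.msg)
```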

## Simple Functions

Python named functions are defined with `def`:

In [74]:
def add(a, b):
    return a + b

add(2, 3)

5

## Types

### None

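Python has no `null`; instead it has a special singleton object `None` (of type `NoneType`) that represents the absence of a value. Because there is only one `None`, test for it with `is None` rather than `== None`.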
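For example, a function with no `return` statement returns `None` (the function `greet` here is purely illustrative):

```python
def greet(name):
    print(f"Hello, {name}")  # no return statement


result = greet("Alice")  # prints "Hello, Alice"
print(result)            # None: functions without a return value return None
print(type(None))        # <class 'NoneType'>

# Compare with `is`, because None is a singleton
if result is None:
    print("no value")
```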

### Numbers

Most of the typical operators you know from other languages are supported. Here are some that are more specific to Python:

In [None]:
print(bool(3))  # Convert to Boolean
print(str(3))  # Convert to string
print(bool(0))

In [None]:
print(3 // 2)  # Floor division: rounds down (towards negative infinity)
print(3 / 2)  # Float division

In [None]:
print(int(2.5))  # Convert to int by truncating towards zero
print(round(2.5))  # Round to nearest int; exact halves round to the nearest even number, so this gives 2
print(round(2.5001))  # Round to nearest int
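The surprising `round(2.5)` result comes from "round half to even" (banker's rounding), which Python 3 uses to avoid a systematic upward bias. Relatedly, `//` rounds towards negative infinity, whereas `int()` truncates towards zero; the two only differ for negative numbers. A quick check:

```python
import math

print(round(2.5), round(3.5))    # 2 4  (halves go to the nearest even number)
print(7 // 2, -7 // 2)           # 3 -4 (floor division)
print(int(3.5), int(-3.5))       # 3 -3 (truncation towards zero)
print(math.floor(-3.5), math.trunc(-3.5))  # -4 -3
```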

In [None]:
print(2 ** 3)  # Exponentiation
print(~3)  # Bitwise inverse
print(2**120)  # Python ints are arbitrary precision, not 64-bit

In [None]:
print(2.0.is_integer())
print(2.5.as_integer_ratio())  # Convert to fraction tuple; we'll cover tuples later

In [None]:
# Note that += and -= (and *=, etc) are supported but ++ and -- are not.
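A short sketch of augmented assignment, and of why `++` is a trap: `++x` is legal but is just two unary plus operators, while `x++` doesn't parse at all (checked here by compiling it from a string):

```python
x = 5
x += 1    # augmented assignment: x is now 6
x *= 2    # x is now 12
print(x)  # 12

# ++x is legal syntax but is just +(+x), so it changes nothing
print(++x)  # 12

# x++ is not valid Python at all
try:
    compile("x++", "<example>", "exec")
except SyntaxError:
    print("x++ is a SyntaxError")
```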

### Strings

Python 3 strings are Unicode. String literals can use single or double quotes (but must close with the same type of quote they open with). Multi-line strings are most easily written using triple quotes.

In [43]:
print('foo')
print("bar")
print('"foo"')
print("'bar'")
print("""I am a 
multiline string""")

foo
bar
"foo"
'bar'
I am a 
multiline string


You can use the usual escape sequences such as `\n` (newline) and `\t` (tab) in strings, and use `\` to escape special characters like quotes and `\` itself.

In [44]:
a = "the cat sat on the mat"
print(len(a))  # len gets the length of the string; implemented by __len__

22


In [46]:
print("cat" in a)  # 'in' is implemented by __contains__
print("dog" in a)

True
False


In [47]:
print(a[0])  # Implemented by __getitem__
a[0] = "t"  # No can do; strings are immutable.

t


TypeError: 'str' object does not support item assignment

In [50]:
# Some useful functions. Note these all return copies of the string; strings are immutable!
print(a.lower())
print(a.upper())
print(a.capitalize())  # Capitalize first letter

the cat sat on the mat
THE CAT SAT ON THE MAT
The cat sat on the mat


In [54]:
# Strings are sliceable (slicing is implemented by __getitem__, which receives a slice object).
# Slicing uses [start:end] or [start:end:step], where any of these are optional;
# start defaults to 0, end to len(), and step to 1. The end index is excluded.
# start and end can be positive (from the start of the string) or negative (from the end).

print(a[2:])   # skip first two characters
print(a[-7:])  # the last 7 characters
print(a[2:6])  # the characters at indices 2 up to (but not including) 6
print(a[::2])  # Every second character

e cat sat on the mat
the mat
e ca
tectsto h a


In [55]:
# Use find and rfind to find the first/last occurrence of a substring; they return the offset, or -1 if not found.
# You can also use index/rindex, which are similar but raise a ValueError exception if not found.

print(a.find('he'))
print(a.rfind('he'))
print(a.find('cat'))
print(a.find('dog'))

1
16
4
-1


In [62]:
# You can convert from character to ordinal or vice-versa with ord() and chr()
print(chr(65))
print(ord('A'))

A
65


In [66]:
# Python has no character type, just strings (a "character" is a string of length 1). So tests that
# apply to a single character in other languages are string methods that check every character.
print("123".isdigit())
print("1X3".isdigit())
print("NOOOOooo".isupper())

True
False
False


There are many more string operations available; these are just the basics.

### Lists

Lists are ordered, mutable sequences. They can be indexed, sliced (more on that below), appended to, have elements deleted, and sorted. They are heterogeneous. Examples:

In [None]:
a = [1, 2, 3, "cat"]

print(a)
print(len(a))  # len() gives the length of the list
print(a[1])  # [] can be used to index in to the list; implemented by list.__getitem__; assignment uses list.__setitem__
print(a[-1])  # negative indices can be used to index from the end of the list (-1 for last element)

In [None]:
# * can be used to create multiple concatenated copies of a list; implemented by list.__mul__

print(a)
a = a * 2
print(a)

In [None]:
# `in` can be used to check for membership; implemented by list.__contains__

print(a)
print('cat' in a)  
print('dog' in a)

In [None]:
print(a)
print(['dog'] + a)  # + can be used to concatenate lists; implemented by list.__add__
a.append('dog')  # append() adds a single element to the end, in place
print(a)

In [None]:
print(a)
print(a.index('dog'))  # Get index of first matching entry; raises ValueError if not found
print(a.count('cat'))  # Count the number of instances of an element

In [None]:
print(a)
a.remove('dog')  # Remove first matching instance of element
print(a)
del a[-1]  # Remove element at index; implemented by list.__delitem__
print(a)

In [None]:
# reverse() reverses the order of the list in place
# (the built-in reversed() instead returns a reverse iterator, via list.__reversed__)
print(a)
a.reverse()
print(a)

In [None]:
# for..in iterates over elements

print(a)
for elt in a:
    print(elt)

In [None]:
print(a)
for i, v in enumerate(a):
    print(f'Value at index {i} is {v}')  # f'...' is a formatted string literal (f-string); expressions in {} are evaluated

In [None]:
b = list(a)  # Makes a shallow copy; can also use b = a.copy()
print(b)
print(a == b)  # Elementwise comparison; implemented by list.__eq__
b[-1] += 1  # Add 1 to last element
print(a == b)
print(a > b)  # Lexicographic: compares element by element up to the first difference; implemented by list.__gt__
print(a < b)  # Likewise lexicographic; implemented by list.__lt__

In [None]:
print(a)
a.pop()  # Removes and returns the last element
print(a)
a.pop(0)  # Removes and returns the element at the given index
print(a)

In [58]:
# You can join a list of strings into a single string, with the given separator between items
','.join(['cat', 'dog'])

'cat,dog'

In [71]:
# Like strings, lists are sliceable (via __getitem__ receiving a slice object).
# Slicing uses [start:end] or [start:end:step], where any of these are optional;
# start defaults to 0, end to len(), and step to 1. The end index is excluded.
# start and end can be positive (from the start of the list) or negative (from the end).
x = [1, 2, 3, 4, 5, 6]
print(x[2:])
print(x[1:3])
print(x[-3:])
print(x[::2])

[3, 4, 5, 6]
[2, 3]
[4, 5, 6]
[1, 3, 5]


In [73]:
# Use insert() to insert at some position. This is done in-place.
x.insert(2, 'A')
print(x)
x.insert(3, [1, 2])
print(x)

[1, 2, 'A', 3, 4, 5, 6]
[1, 2, 'A', [1, 2], 3, 4, 5, 6]


In [None]:
a.clear()  # empty the list
print(a)

### Dicts

Dictionaries are mutable mappings of keys to values. Keys must be hashable, but values can be any object. 

---
_Under the hood_

A hashable object is one that defines a `__hash__` dunder-method and an `__eq__` dunder-method, and whose hash never changes during its lifetime. If two objects are equal, their hashes must be the same, or dict lookups may give unpredictable results.

---
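As a sketch, here is a hypothetical `Point` class that defines `__eq__` and `__hash__` consistently (both based on the same fields), so that an equal but distinct instance finds the same dict entry. Lists, being mutable, are not hashable and can't be used as keys:

```python
class Point:
    """A hypothetical 2D point, treated as immutable by convention."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so equal objects hash equally
        return hash((self.x, self.y))


labels = {Point(0, 0): "origin"}
print(labels[Point(0, 0)])  # origin: a different but equal object finds the entry

try:
    bad = {[1, 2]: "list key"}
except TypeError as err:
    print("TypeError:", err)  # lists are unhashable
```

If a class defines `__eq__` without `__hash__`, Python sets `__hash__` to `None`, so its instances become unhashable.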


In [2]:
# dict literals (actually a list of dicts in this example)

contacts = [
    {
        'name': 'Alice',
        'phone': '555-123-4567'
    },
    {
        'name': 'Bob',
        'phone': '555-987-6543'        
    }
]
contacts

[{'name': 'Alice', 'phone': '555-123-4567'},
 {'name': 'Bob', 'phone': '555-987-6543'}]

In [None]:
# Use [key] to get an item; this calls dict.__getitem__
contacts[0]['name']

In [3]:
# Use dict[key] = value to change an item; this calls dict.__setitem__
contacts[0]['name'] = 'Carol'
contacts[0]

{'name': 'Carol', 'phone': '555-123-4567'}

In [4]:
# Trying to use a non-existent key causes an exception
contacts[0]['address']

KeyError: 'address'

In [5]:
# You can avoid the exception above, and return a default value instead, by using .get()
print(contacts[0].get('name', 'No name'))
print(contacts[0].get('address', 'No address'))

Carol
No address


In [6]:
# Use 'in' to see if a key exists in a dict; this calls dict.__contains__
print('name' in contacts[0])
print('address' in contacts[0])

True
False


In [7]:
# Test for equality with '==' and !=; this calls dict.__eq__ and dict.__ne__
print(contacts[0] == contacts[1])
print(contacts[0] == { 'name': 'Carol', 'phone': '555-123-4567'})

False
True


In [None]:
# Use for-in to iterate over items; this calls dict.__iter__

for x in contacts[0]:
    print(x)

In [1]:
# Use len() to get number of items; this calls dict.__len__

print(len(contacts[0]))

2

In [None]:
# Use 'del' to delete a key from a dict; this calls dict.__delitem__
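For example, using a small dictionary of its own (hypothetical data, so that `contacts` above is left alone):

```python
d = {'name': 'Dave', 'phone': '555-000-0000'}  # hypothetical example data
del d['phone']  # calls d.__delitem__('phone')
print(d)        # {'name': 'Dave'}

# Deleting a missing key raises KeyError; pop() with a default avoids that
print(d.pop('phone', None))  # None
```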

In [None]:
# Use .clear() to empty dict (without changing references)

a = {'name': 'me'}
b = a
a.clear()
b

In [None]:
# Contrast above with assigning empty dict
a = {'name': 'me'}
b = a
a = {}
b

In [None]:
# Use .keys(), .values() or .items() to get the keys, values, or both
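For example (the `contact` dict here is illustrative). Note that these methods return *views*, which reflect later changes to the dict; wrap them in `list()` if you need a snapshot:

```python
contact = {'name': 'Carol', 'phone': '555-123-4567'}

print(list(contact.keys()))    # ['name', 'phone']
print(list(contact.values()))  # ['Carol', '555-123-4567']
for key, value in contact.items():  # items() yields (key, value) tuples
    print(f'{key}: {value}')

keys = contact.keys()
contact['email'] = 'carol@example.com'
print('email' in keys)  # True: the view sees the newly added key
```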

There are some alternative implementations in the `collections` module; you won't need these now but they may come in handy in the future, especially the first two:

* `collections.OrderedDict`s remember the order of insertion, so it is preserved when iterating over the entries or keys (since Python 3.7 ordinary dicts also preserve insertion order, but `OrderedDict` adds extras such as `move_to_end()`)
* `collections.defaultdict`s take a factory (such as a type like `list` or `int`) in the constructor; it is called to create a default value whenever a missing key is accessed
* `collections.ChainMap`s group multiple dictionaries into a single view for lookups; inserts and updates go in the first dictionary
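A brief sketch of all three (the data is made up for illustration):

```python
from collections import ChainMap, OrderedDict, defaultdict

# defaultdict: missing keys are created by calling the factory (here list())
by_letter = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    by_letter[word[0]].append(word)
print(dict(by_letter))  # {'a': ['apple', 'avocado'], 'b': ['banana']}

# ChainMap: lookups search each mapping in turn; writes go to the first one
defaults = {'theme': 'light', 'size': 'M'}
overrides = {'size': 'L'}
settings = ChainMap(overrides, defaults)
print(settings['size'], settings['theme'])  # L light
settings['theme'] = 'dark'
print(overrides)  # {'size': 'L', 'theme': 'dark'}
print(defaults)   # {'theme': 'light', 'size': 'M'} (unchanged)

# OrderedDict: insertion-ordered, with extras such as move_to_end()
od = OrderedDict(a=1, b=2, c=3)
od.move_to_end('a')
print(list(od))  # ['b', 'c', 'a']
```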

### Sets

A set is a mutable unordered collection that cannot contain duplicates. Sets are used to remove duplicates and test for membership. One use for sets is to quickly see differences. For example, if you have two dicts and want to see what keys are in one but not the other:

In [None]:
a = {'food': 'ham', 'drink': 'soda', 'dessert': 'ice cream'}
b = {'food': 'tofu', 'dessert': 'cake'}

set(a) - set(b)
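For comparison, removing duplicates and testing membership look like this (a sketch with made-up data; the order of elements in a set is arbitrary):

```python
colors = ['red', 'blue', 'red', 'green', 'blue']
unique = set(colors)  # duplicates are dropped; the order is arbitrary
print(len(unique))    # 3
print('green' in unique)  # True: membership tests on sets are fast

a = {'food': 'ham', 'drink': 'soda', 'dessert': 'ice cream'}
b = {'food': 'tofu', 'dessert': 'cake'}
print(set(a) & set(b))  # keys present in both dicts: 'food' and 'dessert', in some order
```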

Sets are less commonly used than lists and dicts and we will not discuss them further here. You can read more here: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

### Tuples

Tuples are immutable sequences. Typically they are used to store record type data, or to return multiple values from a function. Tuples behave a lot like lists and support many of the same operations with similar behavior, aside from their immutability. We'll consider them briefly here.

The `collections` package defines a variant `namedtuple` which allows each field to be given a name; we won't go into that here other than to point out its existence.

In [None]:
('dog', 'canine')  # tuple

In [None]:
('dog')  # Not a tuple! This is just a string in parens

In [None]:
('dog',)  # For a single-valued tuple, use a trailing comma to avoid above issue

In [None]:
'dog',  # Parentheses are often optional

In [None]:
# Indexing can be used to get at elements, much like lists
print(('dog', 'canine')[0])
print(('dog', 'canine')[1])
print(('dog', 'canine')[-2])
print(('dog',)[0])
print(('dog',)[1])  # IndexError: tuple index out of range

In [None]:
# We can unpack a tuple through assignment to multiple variables
a = ('dog', 'bone')
animal, toy = a
print(animal)
print(toy)

In [None]:
# But need to ensure we use the right number of variables
a = ('dog', 'bone')
animal, toy, place = a  # Raises ValueError: not enough values to unpack

In [None]:
a = ('dog', 'bone', 'house')
animal, toy = a  # Raises ValueError: too many values to unpack

In [None]:
# Tuples allow us to do a neat trick in Python that is harder in many languages - swap two values without using a
# temporary intermediate.
# Note what is going on here: the RHS of the assignment is creating a tuple; the LHS is unpacking the tuple.

a = 1
b = 2
print(a, b)
a, b = b, a
print(a, b)

### Iterables

## Some built-in Functions

`abs(num)` - Return absolute value

In [9]:
print(abs(3))
print(abs(-3))

3
3


`all(iterable)` - returns True if all items in the iterable are truthy (or if the iterable is empty)

In [10]:
print(all([True, True, True]))
print(all([True, False, True]))

True
False


`any(iterable)` - returns True if any item in the iterable is truthy (False for an empty iterable)

In [11]:
print(any([False, False]))
print(any([False, True]))

False
True


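`filter(function, iterable)` - returns a lazy iterator over the items for which `function` returns a truthy value

`input(prompt)` - prints the optional prompt, then reads a line from standard input and returns it as a string (without the trailing newline)

`isinstance(obj, classinfo)` - returns True if `obj` is an instance of `classinfo` or of a subclass of it; `classinfo` may also be a tuple of types

`iter(obj)` - returns an iterator over `obj`, normally by calling its `__iter__` method

`len(obj)` - calls the object's `__len__` method to get the length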

`max(arg1, ...)` - returns the largest argument. If a single iterable argument is given, it returns the largest item in it.

`min(arg1, ...)` - returns the smallest argument; works the same way as `max`

In [22]:
print(max(2, 3, 1))  # Multiple scalar args
print(max([3, 2, 1])) # Single list arg
print(max([3, 2, 1], 4))  # Not allowed

3
3


TypeError: '>' not supported between instances of 'int' and 'list'

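`next(iterator[, default])` - returns the next item from an iterator; once it is exhausted, returns `default` if given, otherwise raises `StopIteration`

`open(file, mode='r')` - opens a file and returns a file object (see Reading and Writing Files)

`quit()` - exits the interpreter; intended for use at the interactive prompt

`repr(obj)` - calls the object's `__repr__` method to get a string representation

`reversed(seq)` - returns an iterator over the items in reverse order, without making a copy (the object must define `__reversed__`, or support `__len__` and `__getitem__`)

`round(number[, ndigits])` - rounds to `ndigits` decimal places, or to the nearest int if `ndigits` is omitted; exact halves go to the nearest even number, so `round(2.5)` is `2`

`sorted(iterable)` - returns a new sorted list containing the items of any iterable.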

In [13]:
print(sorted([3, 1, 3]))

[1, 3, 3]


`sum(iterable)` - returns the sum of the iterable

In [15]:
print(sum([1, 2, 3]))

6


`type(obj)` - return the type of an object

In [16]:
print(type('foo'))

<class 'str'>


`zip(iterable, ...)` - combines the items of several iterables into tuples, pairing them up position by position and stopping at the shortest. Note this returns a lazy iterator, not a list

In [18]:
print(zip(['a', 'b', 'c'], [1, 2, 3]))
print(list(zip(['a', 'b', 'c'], [1, 2, 3])))  # instantiates the iterable as a list

<zip object at 0x10ca11488>
[('a', 1), ('b', 2), ('c', 3)]
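A short sketch exercising several of the built-ins above whose descriptions are brief (`filter`, `isinstance`, `iter`, `next`, `reversed`, `round`). The helper `is_positive` is just for illustration:

```python
def is_positive(n):
    return n > 0


nums = [3, -1, 4, -1, 5]

positives = list(filter(is_positive, nums))  # filter() is lazy; list() consumes it
print(positives)  # [3, 4, 5]

print(isinstance(3, int), isinstance(3, (str, float)))  # True False

it = iter(nums)  # an iterator over nums
print(next(it), next(it))  # 3 -1
print(next(it, 'done'))  # 4: the default is only used once the iterator is exhausted

backwards = list(reversed(nums))  # reversed() is lazy too
print(backwards)  # [5, -1, 4, -1, 3]
print(round(3.14159, 2))  # 3.14
```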


## String Formatting

String formatting has evolved over time with Python. Python 3.6 introduced formatted string literals ("f-strings"), which allow expressions to be embedded directly in the string. This is an improvement over older approaches and we will use it extensively.
Format strings have an `f` prefix and include code in `{}`. For example:

In [38]:
a = 10
print(f"2 x {a} = {2*a}")

2 x 10 = 20


If you need to use the old approaches, there are a lot of details here: https://pyformat.info/ (this doesn't seem to cover format strings yet, though). That site covers things like padding, justification, truncation, leading zeroes, fixing the number of decimal places, etc. We won't cover these here except for decimal places and leading zeroes:

In [40]:
a = 1.23456
print(a)
print(f'{a:.2f}')  # Float restricted to two decimal places
print(f'{a:06.2f}')  # Two decimal places, padded with leading zeroes to a total width of 6 characters

1.23456
1.23
001.23


When you use `f'{a}'`, Python calls `format(a, '')`, which invokes `a`'s `__format__` method. The default `__format__` inherited from `object` returns `str(a)`, which uses `__str__` and, if that isn't defined, falls back to `__repr__`. You can force it to use `__repr__` with `f'{a!r}'` or `__str__` with `f'{a!s}'`.
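A sketch with a hypothetical `Temperature` class that defines `__str__` and `__repr__` but not `__format__`, so a plain `{t}` ends up using `__str__`:

```python
class Temperature:
    """A hypothetical class defining both __str__ and __repr__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        return f"{self.degrees} degrees C"

    def __repr__(self):
        return f"Temperature({self.degrees})"


t = Temperature(21)
print(f"{t}")    # 21 degrees C (object.__format__ with an empty spec returns str(t))
print(f"{t!s}")  # 21 degrees C
print(f"{t!r}")  # Temperature(21)
```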

## Sorting



## Statements

Here we will consider statements. We'll leave some statements to when we get to exceptions, functions and classes.

For more info on statements see https://docs.python.org/3/reference/simple_stmts.html

### import

Python code is packaged in the form of _packages_ consisting of one or more _modules_. A module is a single Python file, while a package is a directory of Python modules containing an additional `__init__.py` file, which distinguishes a package from a directory that just happens to contain a bunch of Python scripts.

You install a package with `pip` or `conda` (modules in the standard library, such as `math` or `os`, are always available and need no installation). Once installed, to use the package you must import it. You can also import individual modules, although this is less common.

There are several common ways of importing. Let's say we want to import a package `foo` that defines a class `Widget`:

* `import foo` will import the `foo` package; any reference to modules/classes/functions will need to be prefixed with `foo.`; e.g. `foo.Widget`
* `import foo as bar` will import the `foo` package; any reference to modules/classes/functions will need to be prefixed with `bar.`; e.g. `bar.Widget`
* `from foo import Widget` can be used to import a specific module/class/function from `foo` and it will be available as `Widget`
* `from foo import *` will import every item in `foo` into the current namespace; this is bad practice, don't do it.
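The same forms with real standard-library modules standing in for the hypothetical `foo` (`math`, `collections` and `os` are used only as convenient examples):

```python
import math                  # access names as math.sqrt, etc.
import collections as coll   # access via the alias: coll.Counter
from os import path          # the name `path` is now available directly

print(math.sqrt(16))                  # 4.0
print(coll.Counter('banana')['a'])    # 3
print(path.join('data', 'file.txt'))  # data/file.txt on POSIX systems
```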

### pass

The `pass` statement is a no-op. Because Python uses indentation rather than braces, a block cannot simply be left empty, so `pass` serves as the equivalent of an empty `{}` block in Java- or C-like languages.
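Typical places where `pass` stands in for an empty block (the names here are placeholders):

```python
def not_implemented_yet():
    pass  # a function body can't be empty, so pass acts as a placeholder


class Placeholder:
    pass  # an empty class body


for _ in range(3):
    pass  # a loop that deliberately does nothing

print(not_implemented_yet())  # None
```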

### del

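`del` removes a name binding (or an item from a container, or an attribute); it doesn't delete the object directly. Once nothing refers to an object any more it can be garbage-collected, so `del` can occasionally be useful for releasing the memory of a large object early. It isn't used much.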
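A sketch showing that `del` removes a name, not the object: the list stays alive while another name still refers to it (the variable names are illustrative):

```python
data = [0] * 1_000_000  # a large list
alias = data            # a second name for the same list

del data                # removes the name `data`; the list survives via `alias`
length = len(alias)
print(length)           # 1000000

try:
    data                # the name no longer exists
except NameError:
    name_gone = True
else:
    name_gone = False
print(name_gone)        # True

del alias               # no names refer to the list now, so its memory can be reclaimed
```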

### for, break and continue

You can loop over any iterable with `for...in`. `break` and `continue` are supported, and behave in the expected fashion.

In [None]:
for i in ['green eggs', 'ham']:
    print(i)

In [None]:
for i in 'green eggs':
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}: # This will loop over keys
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}.values(): # This will loop over values
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}.items():  # This will loop over key-value pairs as tuples
    print(i)

In [None]:
for i in [1, 2, 3]:
    print(i)

In [None]:
for i in enumerate([1, 2, 3]):  # Returns (index, value) tuples
    print(i)

In [None]:
for index, value in enumerate([1, 2, 3]):  # We can unpack the (index, value) tuples
    print(f'At position {index} we have value {value}')

In [None]:
for i in range(1, 10):
    print(i)

In [None]:
for i in range(1, 10, 2):
    print(i)

Python has an unusual construct: for..else. The else part is executed if there was no early break from the loop.

This is a common construct in other languages:

    has_even_number = False
    for elt in [1, 2, 3]:
        if elt % 2 == 0:
            has_even_number = True
            break
    if not has_even_number:
        print("list has no even numbers")

but in Python, we can just do:

    for elt in [1, 2, 3]:
        if elt % 2 == 0:
            break
    else:
        print("list has no even numbers")


### while

`while` loops are very straightforward:

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2

`while...else` is supported:

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2
else:
    print('Done')

In [None]:
i = 0
while i < 10:
    print(i)
    if i % 2 == 0:
        print('Found an even number!')
        break
    i += 2
else:
    print('No even numbers!')

In [None]:
i = 1
while i < 10:
    print(i)
    if i % 2 == 0:
        print('Found an even number!')
        break
    i += 2
else:
    print('No even numbers!')

### if Statement and Boolean Expressions

Python uses `if...elif...else` syntax:

In [None]:
grade = 75
if grade > 90:
    print('A')
elif grade > 80:
    print('B')
elif grade > 70:
    print('C')
else:
    print('D')

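`and`, `or` and `not` are Boolean operators, while `&`, `|` and `^` are bitwise operators. `and` and `or` short-circuit: the right-hand operand is only evaluated if the left-hand one doesn't already decide the result, and the value of the expression is whichever operand was evaluated last: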

In [None]:
1 and 1/0

In [None]:
1 or 1/0

In [None]:
0 and 1/0

In [None]:
0 or 1/0

You can chain multiple comparisons into a single expression:

In [None]:
print(0 < 2 < 4)
print(2 < 0 < 4)

Note that the Boolean literals are `True` and `False`, with capitalized first letters.

In [None]:
print(0 < 2 < 4 < 6)

If an instance of a class is used in a Boolean expression, it is evaluated by calling its `__bool__` method if it has one, else its `__len__` method (where non-zero is `True`), else it is considered `True`.
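A sketch using a hypothetical `Playlist` class that defines `__len__` but not `__bool__`, alongside the truthiness of some built-in values:

```python
class Playlist:
    """A hypothetical container; it defines __len__ but not __bool__."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        return len(self.songs)


for playlist in [Playlist([]), Playlist(['Yesterday'])]:
    if playlist:  # falls back to __len__, since there is no __bool__
        print('has songs')
    else:
        print('empty')

print(bool([]), bool(''), bool(0), bool(None))  # False False False False
print(bool([0]), bool('0'))  # True True: non-empty containers are truthy
```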

Python doesn't support the C-style `?:` conditional operator, but it does support conditional expressions of the form `x if condition else y`:

In [None]:
for count in range(0, 3):
    print(f'{count} {"Widget" if count == 1 else "Widgets"}')

### with

`with` is used for scoped use of classes that need to clean up when they are no longer used (e.g. file objects that need to release underlying file handles). 

The most common place you'll see this is with file reading and writing, which we cover in the next section.

---
_Under the Hood_

When the `with` statement is executed, Python evaluates the expression following `with`, calls the `__enter__` method on the resulting value (a "context manager"), and assigns whatever `__enter__` returns to the variable named after `as`. Python then executes the code body and, no matter what happens in that code, calls the context manager's `__exit__` method.

As an extra bonus, the `__exit__` method can look at the exception, if any, and suppress it or act on it as necessary (to suppress it, it just needs to return `True`).

We're getting ahead of ourselves here with classes, but here is an example:


In [None]:
class Wither:
    def __enter__(self):
        return 'green eggs'
    def __exit__(self, exc_type, exc_value, traceback):
        print('ham')

with Wither() as x:
    print(x)

---
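Building on that, here is a sketch of an `__exit__` that suppresses one kind of exception by returning `True`. The class `Suppressor` is hypothetical:

```python
class Suppressor:
    """A hypothetical context manager that swallows ValueError only."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None if the body finished without an exception
        if exc_type is not None and issubclass(exc_type, ValueError):
            print(f'suppressed: {exc_value}')
            return True   # True tells Python the exception has been handled
        return False      # any other exception propagates as normal


with Suppressor():
    int('not a number')  # raises ValueError, which __exit__ suppresses
reached = True
print('still running')
```

The standard library's `contextlib.suppress` provides this behaviour ready-made, e.g. `with contextlib.suppress(ValueError): ...`.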

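## Reading and Writing Files

Open files with `open()` inside a `with` block, so that the file is closed automatically when the block ends, even if an error occurs:

    with open('myfile.txt') as f:
        for line in f:
            print(line, end='')  # each line already ends with '\n'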
            
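Writing works the same way, with a mode argument to `open()`. A self-contained sketch that writes a file and reads it back, using a temporary directory so nothing is left behind (the file name is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'myfile.txt')  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists
    with open(filename, 'w', encoding='utf-8') as f:
        f.write('first line\n')
        f.write('second line\n')

    # Reading uses the default mode 'r'; iterating over the file yields lines
    with open(filename, encoding='utf-8') as f:
        lines = [line.rstrip('\n') for line in f]

print(lines)  # ['first line', 'second line']
```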

## Functions and Lambdas

Recall that Python named functions are defined with `def`:

In [74]:
def add(a, b):
    return a + b

add(2, 3)

5

Default arguments are allowed. If a default argument is specified, then all following arguments must have defaults as well:

In [82]:
def add(a, b=1):
    print(f'a={a}, b={b}')
    return a + b

print(add(2, 3))
print(add(2))
print(add())

a=2, b=3
5
a=2, b=1
3


TypeError: add() missing 1 required positional argument: 'a'

Arguments can be passed positionally, in the order the parameters are declared, or by name as keyword arguments, in which case the order doesn't matter:

In [83]:
print(add(b=2, a=1))

a=1, b=2
3


Variables assigned to inside a function are local to it. A function can read a global variable without any declaration, but to assign to one you must explicitly declare it `global` (and it is better to avoid using globals):

In [84]:
x = 2

def foo():
    x = 1  # This is local
    
print(x)  # This is the global
foo()
print(x)

2
2


In [85]:
x = 2

def foo():
    global x
    x = 1
    
print(x)
foo()
print(x)

2
1
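Note that it is only *assignment* that needs `global`; reading a global works without it. A sketch of the difference, including the `UnboundLocalError` raised when a function assigns to a name but also reads it before the assignment has happened (the names are illustrative):

```python
counter = 10


def read_only():
    return counter + 1  # reading a global needs no declaration


def broken_increment():
    # The assignment makes `counter` local to the whole function, so reading
    # it on the right-hand side happens before any local value exists
    counter = counter + 1
    return counter


print(read_only())  # 11
try:
    broken_increment()
except UnboundLocalError as err:
    print('UnboundLocalError:', err)
```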


Functions can be nested. In Python 3 you can declare a variable as `nonlocal` to assign to a variable in an enclosing (but non-global) scope; as with globals, merely reading it needs no declaration.

In [87]:
def outside():
    msg = "Outside!"
    def inside():
        msg = "Inside!"  # This is different to the one in outside()
        print(msg)
    inside()
    print(msg)
    
outside()

Inside!
Outside!


In [88]:
def outside():
    msg = "Outside!"
    def inside():
        nonlocal msg  # This is the same as the one in outside()
        msg = "Inside!"
        print(msg)
    inside()
    print(msg)
    
outside()

Inside!
Inside!


It is good practice to follow the `def` line with a _docstring_ to document the function. There are different conventions for how this should be formatted; I like the Google style: http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

In [90]:
def add(a, b):
    """Adds two objects and returns the result.

    Args:
        a: The first parameter.
        b: The second parameter.

    Returns:
        The result of adding a and b.
    """
    return a + b

# Now we can use help() to get the docstring.
help(add)

Help on function add in module __main__:

add(a, b)
    Adds two objects and returns the result.
    
    Args:
        a: The first parameter.
        b: The second parameter.
    
    Returns:
        The result of adding a and b.



You can return multiple values from a function (really just a tuple):

In [92]:
def sum_diff(a, b):
    return a+b, a-b

print(sum_diff(3, 2))
x, y = sum_diff(4, 5)
print(x)
print(y)

(5, 1)
9
-1


Python supports generator functions using `yield`: calling such a function returns a generator, which produces values lazily, one per `yield` (we will discuss generators later):

In [96]:
def get_next_even_number(l):
    for v in l:
        if v % 2 == 0:
            yield v
    
x = [1, 2, 3, 4, 5, 6]
for e in get_next_even_number(x):
    print(e)

2
4
6


You can use `*args` for a variable number of non-keyword arguments, which will be available internally as a tuple:

In [99]:
def multiply(*args):
    z = 1
    for num in args:
        z *= num
    return z
    
print(multiply(1, 2, 3, 4))

24


In [102]:
def foo(*args):
    for i, arg in enumerate(args):
        print(f'Argument {i} is {arg}')


foo(1, 2, 'cat')

Argument 0 is 1
Argument 1 is 2
Argument 2 is cat


For keyword arguments, you can use `**kwargs`, which will be available internally as a dictionary:

In [103]:
def foo(*args, **kwargs):
    for i, arg in enumerate(args):
        print(f'Positional argument {i} is {arg}')
    for k, v in kwargs.items():
        print(f'Keyword argument {k} is {v}')


foo('cat', 1, clothing='hat', location='mat')

Positional argument 0 is cat
Positional argument 1 is 1
Keyword argument clothing is hat
Keyword argument location is mat


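You can mix all types of parameters, but the order is important:
* Formal positional parameters (including any with default values)
* `*args`
* Keyword-only parameters (anything declared after `*args` can only be passed by keyword)
* `**kwargs`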
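A sketch combining these in one hypothetical function, `report`; `sep` is declared after `*items`, so it can only be passed by keyword:

```python
def report(title, *items, sep=', ', **options):
    """Hypothetical helper: title is positional, items collects extra positionals,
    sep is keyword-only, and options collects any other keyword arguments."""
    line = f"{title}: {sep.join(items)}"
    if options.get('upper'):
        line = line.upper()
    return line


print(report('Pets', 'cat', 'dog'))                         # Pets: cat, dog
print(report('Pets', 'cat', 'dog', sep=' & ', upper=True))  # PETS: CAT & DOG
```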

You can do the opposite as well - pass a list instead of several positional arguments, and a dictionary instead of several keyword arguments, by using `*` and `**`:

In [105]:
def foo(pos1, pos2, named1='a', named2='b'):
    print(f"Positional 1 is {pos1}")
    print(f"Positional 2 is {pos2}")
    print(f"Named1 is {named1}")
    print(f"Named2 is {named2}")

p = [1, 2]
n = {'named1': 'cat', 'named2': 'hat'}
foo(*p, **n)

Positional 1 is 1
Positional 2 is 2
Named1 is cat
Named2 is hat


Finally, you can use `lambda` to define anonymous functions. These will be very useful when we get to using Pandas for data manipulation:

In [106]:
adder = lambda a, b: a + b

adder(1, 2)

3
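A common use of `lambda` is as the `key` argument to `sorted()`, which calls the function once per element to get the value to sort by. A sketch with made-up data:

```python
words = ['banana', 'Cherry', 'apple']

print(sorted(words))                           # ['Cherry', 'apple', 'banana'] (uppercase sorts first)
print(sorted(words, key=lambda w: w.lower()))  # ['apple', 'banana', 'Cherry']
print(sorted(words, key=lambda w: w[-1]))      # ['banana', 'apple', 'Cherry'] (by last letter)
```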

## Classes

In Python, we declare a class with `class Name(Base):`. If the base class is omitted then `object` is assumed.

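Methods receive the object they are called on explicitly, as their first parameter; by convention this is named `self` (the equivalent of `this` in other languages). A function defined in a class is treated as an instance method unless it is decorated otherwise, so a function with no `self` parameter, like `print_class` below, can be called through the class but fails when called through an instance. To define true class methods and static methods, use the `@classmethod` and `@staticmethod` decorators.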

In [None]:
class Widget:  # same as "class Widget(object):"
    """ This is a Widget class. """  # Classes have docstrings too.
    
    def print_my_class(self):  # Instance method: the instance is passed in as 'self'
        """ Print the instance class. """
        print(self.__class__)  # __class__ is the easy way to get at an object's class

    def print_class():  # No 'self': works when called via the class, not via an instance
        """ Print the class class. """
        print(Widget)
        
        
x = Widget()  # We don't use 'new' in Python
x.__doc__  # __doc__ has the docstring

In [None]:
help(x)

In [None]:
x.print_my_class()

In [None]:
x.print_class()  # Raises TypeError: the instance is passed as an argument, but print_class() takes none

In [None]:
Widget.print_class()

In [None]:
Widget.print_my_class()  # Raises TypeError: no instance is supplied for 'self'
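Python's proper mechanism for class-level methods is the `@classmethod` and `@staticmethod` decorators. A sketch with a hypothetical `Gadget` class: the class method receives the class as `cls`, and the static method receives neither the class nor an instance, so both can be called through the class or through an instance:

```python
class Gadget:
    count = 0  # class variable shared by all instances

    def __init__(self):
        Gadget.count += 1

    @classmethod
    def how_many(cls):  # receives the class, not an instance
        return cls.count

    @staticmethod
    def describe():  # receives neither; it is simply namespaced in the class
        return 'A gadget'


g = Gadget()
Gadget()
print(Gadget.how_many(), g.how_many())  # 2 2
print(Gadget.describe(), g.describe())  # A gadget A gadget
```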

### Constructors and visibility

A class does not require a constructor, but can have (at most) one. The constructor is an instance method named `__init__`. It can take additional parameters other than `self`.

Python does not support private or protected members. By convention, private members are named starting with an underscore, but this is an 'honor system'; everything is public. Also by convention, you should avoid names that begin and end with double underscores; those are reserved for dunder-methods. (A name with only a leading double underscore triggers 'name mangling', which is rarely needed.)

In [None]:
class Bug:
    """ A class for creepy crawly things. """
    
    heads = 1  # This is a class variable
    
    def __init__(self, legs=6, name='bug'):
        self.legs = legs  # Any variable assigned to with self.var = ... in constructor is an instance variable
        self.name = name
        
    @staticmethod
    def _article(name):  # 'private' static method: takes neither self nor cls
        """ Return the English article for the given name. """
        return 'an' if name[0] in 'aeiouAEIOU' else 'a'

    def article(self):  # 'public' instance method
        """ Return the English article for this bug's name. """
        return Bug._article(self.name)
    
    def __repr__(self):  # __repr__ is called to get a printable representation of an object
        return f"I'm {Bug._article(self.name)} {self.name} with {self.legs} legs"

# Notice how help() will show help for article() but not _article().
# It respects the '_' convention for 'privacy'.
help(Bug)

In [None]:
Bug()

In [None]:
Bug(legs=8)

It is recommended to always define a `__repr__` method on your classes.
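One reason `__repr__` is worth defining: when a class has no `__str__`, `str()` and `print()` fall back to `__repr__`, and containers always display their items with `repr()`. A minimal sketch with a hypothetical `Point` class, following the common convention of making the repr look like the constructor call:

```python
class Point:
    """ Hypothetical example class. """

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Mirror the constructor call, a common convention for __repr__
        return f"Point({self.x!r}, {self.y!r})"


p = Point(1, 2)
print(repr(p))  # Point(1, 2)
print(str(p))   # Point(1, 2): str() falls back to __repr__ when __str__ is missing
print([p, p])   # [Point(1, 2), Point(1, 2)]: lists show repr() of their items
```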

### Inheritance

Python supports both single and multiple inheritance; we'll only cover single inheritance here. To call a base-class method from a subclass, use `super()`:

In [None]:
class Insect(Bug):
    
    def __init__(self):
        super().__init__(name='insect')
        
Insect()

In [None]:
class Spider(Bug):
    
    def __init__(self):
        super().__init__(legs=8, name='spider')
        
Spider()
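`super()` works for any method, not just `__init__`. A common pattern is to extend the base-class behaviour rather than replace it. The `Shape` and `Square` classes here are hypothetical:

```python
class Shape:
    """ Hypothetical base class. """

    def describe(self):
        return "a shape"


class Square(Shape):

    def __init__(self, side):
        super().__init__()
        self.side = side

    def describe(self):
        # Build on the base-class result instead of duplicating it
        return f"{super().describe()} with side {self.side}"


s = Square(3)
print(s.describe())          # a shape with side 3
print(isinstance(s, Shape))  # True: a Square is also a Shape
```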

## Exceptions

You can raise an exception with the `raise` statement and catch exceptions using `try:`/`except:`. If you want a reference to the caught exception, use `except ... as name:`:

In [109]:
try:
    raise Exception('The dude minds, man!')
except Exception as x:  # Exception is the type of exception to catch, x is the variable to catch it with.
    print(x)
    
# You can catch different types of exceptions, and you can use 'raise' on its own in the exception handling
# block to rethrow the exception.

def average(seq):
    """ Compute the average of a sequence, returning None if it is empty. """
    try:
        result = sum(seq) / len(seq)
    except ZeroDivisionError:
        return None
    except Exception:
        raise
    return result

print(average([]))
print(average(['cat']))

The dude minds, man!
None


TypeError: unsupported operand type(s) for +: 'int' and 'str'
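A single `except` clause can also catch several exception types by listing them in a tuple, and a `try` statement can have an `else` block (runs only if nothing was raised) and a `finally` block (always runs, even after a `return`). A sketch using a hypothetical `parse_ratio` helper:

```python
def parse_ratio(text):
    """ Hypothetical helper: parse 'a/b' and return a / b, or None on bad input. """
    try:
        num, den = text.split('/')  # ValueError if there isn't exactly one '/'
        value = int(num) / int(den)  # ValueError or ZeroDivisionError
    except (ValueError, ZeroDivisionError) as e:  # a tuple catches several types
        print(f"Bad input {text!r}: {type(e).__name__}")
        return None
    else:
        return value  # runs only if the try block raised nothing
    finally:
        print(f"Finished parsing {text!r}")  # runs in every case


print(parse_ratio('3/4'))
print(parse_ratio('3/0'))
print(parse_ratio('three'))
```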

## Comprehensions

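Comprehensions are a powerful feature in Python, allowing lists, dictionaries and sets to be constructed from iterative computations with minimal code. (There is no tuple comprehension: the same syntax in parentheses is a generator expression, which you can pass to `tuple()`.) These are best illustrated by examples: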

In [29]:
# A list of all squares from 1 to 25
[x*x for x in range(1, 6)]

[1, 4, 9, 16, 25]

In [32]:
# A list of all squares from 1 to 1024 except those divisible by 5
[x*x for x in range(1, 33) if (x*x) % 5 != 0]

[1, 4, 9, 16, 36, 49, 64, 81, 121, 144, 169, 196, 256, 289, 324, 361, 441, 484, 529, 576, 676, 729, 784, 841, 961, 1024]

In [33]:
# Comprehensions can be nested
t = [
    ['1', '2'],
    ['3', '4']
]

# Make a list of lists from t where we convert the strings to floats
[[float(y) for y in x] for x in t]

[[1.0, 2.0], [3.0, 4.0]]

In [37]:
# Dictionary comprehension
{f'Square of {x}': x*x for x in range(1, 6)}

{'Square of 1': 1,
 'Square of 2': 4,
 'Square of 3': 9,
 'Square of 4': 16,
 'Square of 5': 25}
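The same syntax also gives set comprehensions (curly braces without a `key: value` pair) and generator expressions (parentheses), which produce items lazily and can feed functions like `sum()` or `tuple()`. The word list is just illustrative data:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Set comprehension: duplicates collapse automatically
first_letters = {w[0] for w in words}
print(sorted(first_letters))  # ['a', 'b', 'c']

# Generator expression: evaluated lazily, one item at a time
total_length = sum(len(w) for w in words)
print(total_length)  # 33

# There is no "tuple comprehension"; pass a generator expression to tuple()
lengths = tuple(len(w) for w in words)
print(lengths)  # (5, 7, 6, 9, 6)
```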

## async/await

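By default a Python program runs on a single thread, so blocking I/O (network, disk) can slow things down a lot. It is possible to use multiple threads or processes, and there are several libraries for that (note that in CPython the Global Interpreter Lock lets only one thread execute Python bytecode at a time, although threads still help while waiting on I/O). But even with a single thread, big improvements are possible for I/O-bound work with async code. The details are beyond the scope of the bootcamp, but more info is available here: https://docs.python.org/3/library/asyncio-task.html. The `async`/`await` syntax (added in Python 3.5) and `asyncio.run()` (added in Python 3.7) have made this much easier to use.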
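A minimal sketch of the idea: three simulated I/O calls run concurrently on one thread, so the total time is roughly that of one call. `fetch` is a hypothetical stand-in for real I/O; `asyncio.sleep` only simulates the waiting. In a Jupyter notebook, which already runs an event loop, `asyncio.run()` raises an error; there you can write `await main()` directly in a cell instead.

```python
import asyncio
import time


async def fetch(name, delay):
    """ Hypothetical stand-in for an I/O call. """
    await asyncio.sleep(delay)  # yields control to the event loop while waiting
    return f"{name} done"


async def main():
    # gather() runs the coroutines concurrently and returns results in order
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)            # ['a done', 'b done', 'c done']
print(f"{elapsed:.1f}s")  # roughly 0.2s rather than 0.6s, since the waits overlap
```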

## Type Annotations

Python supports optional type annotations (type hints). Standard CPython does not check them at run time and they don't affect execution speed, but static checkers such as mypy use them to catch bugs before the code runs, and there are some packages that can enforce them at run time. It's not a bad idea to start using them, but they're out of scope for this bootcamp.

See https://docs.python.org/3/library/typing.html and http://mypy-lang.org/ for more.
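A short sketch showing that annotations are stored on the function but not enforced by the interpreter. The `repeat` function is a hypothetical example:

```python
from typing import get_type_hints


def repeat(word: str, times: int) -> str:
    """ Return word repeated the given number of times. """
    return word * times


print(repeat("ab", 3))         # ababab
print(get_type_hints(repeat))  # the annotations are recorded on the function...

# ...but CPython does not check them: this call runs and returns [0, 0, 0],
# whereas a static checker such as mypy would report the list argument as an error.
print(repeat([0], 3))
```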


## Logging

See https://opensource.com/article/17/9/python-logging for details on Python logging.

I recommend looking at Daiquiri, which builds on top of the standard logging library and makes things easy:

https://julien.danjou.info/blog/python-logging-easy-with-daiquiri

In [None]:
!pip install daiquiri

In [27]:
import logging
import daiquiri

daiquiri.setup(level=logging.INFO)

logger = daiquiri.getLogger("bootcamp")
logger.info("It works and logs to stderr by default with color!")

2017-11-08 21:19:46,564 [81452] INFO     bootcamp: It works and logs to stderr by default with color!
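For comparison, here is a sketch of roughly the same setup using only the standard `logging` module. It writes to an in-memory `io.StringIO` stream purely so the output can be inspected; normally you would log to `sys.stderr` or a file. The logger name `bootcamp.demo` is arbitrary:

```python
import io
import logging

stream = io.StringIO()  # stand-in for stderr or a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("bootcamp.demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # don't also pass records to root-logger handlers

logger.debug("Not shown: DEBUG is below the INFO threshold")
logger.info("Plain standard-library logging")

print(stream.getvalue(), end="")  # INFO bootcamp.demo: Plain standard-library logging
```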


## Cool Stuff

See https://github.com/tukkek/notablepython

Finding good packages: https://python.libhunt.com/ and https://awesome-python.com/

Concise reference: https://github.com/mattharrison/Tiny-Python-3.6-Notebook

## Going Deeper

### The sys module

`sys.modules` is a dictionary of the currently imported modules:

In [None]:
import sys

sys.modules

`sys.path` is the list of directories Python searches, in order, for imports:

In [None]:
sys.path
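A quick sketch of how these fit together: once a module is imported it is cached in `sys.modules`, so later imports of the same name reuse that object instead of searching `sys.path` again. `json` is used here only as a convenient standard-library module:

```python
import sys
import json

print('json' in sys.modules)        # True: imported modules are cached by name
print(sys.modules['json'] is json)  # True: the cache holds the very same object

# sys.path is an ordinary list of directory strings
print(type(sys.path).__name__)      # list
print(sys.path[:3])                 # first few search locations (varies by setup)
```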

### Using Threads and Processes

See https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b
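For a quick start, the standard library's `concurrent.futures` module offers a thread pool with a simple `map` interface. This sketch suits I/O-bound work; for CPU-bound work, `ProcessPoolExecutor` has the same interface but runs tasks in separate processes. `slow_io` is a hypothetical task that just sleeps:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_io(n):
    """ Hypothetical I/O-bound task; time.sleep releases the GIL while waiting. """
    time.sleep(0.2)
    return n * n


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(slow_io, range(4)))  # map preserves input order
elapsed = time.perf_counter() - start

print(results)            # [0, 1, 4, 9]
print(f"{elapsed:.1f}s")  # about 0.2s rather than 0.8s, because the sleeps overlap
```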

### Extending Python with C code

See https://dbader.org/blog/python-ctypes-tutorial#.


## Exercise 1 - building a find function that works on 2-dimensional sorted arrays

Consider a 2-dimensional array which is sorted on primary, secondary, and possibly more columns. E.g.:

    [
        [2, 3, 7],
        [2, 5, 2],
        [2, 6, -1],
        [3, 1, 0],
        [3, 9, 1],
        [8, 0, -2]
    ]
    
which is sorted first on column 0 and then on column 1. We want to write a function that will help us to locate a particular value in this table given a set of keys. E.g. given keys (2, 6) it will return -1, and given (3, 9) it will return 1.

Python has a library module, `bisect`, that can do this on one-dimensional lists (https://docs.python.org/3.6/library/bisect.html) but not on 2-d lists.
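As a starting point, here is how `bisect_left` and `bisect_right` behave on a single column: together they give the half-open range `[lo, hi)` of matching items, and both accept optional `lo`/`hi` bounds to search only part of a list. Applying this row-wise to the 2-d table is left for the exercise; the `column0` list is simply column 0 of the example above.

```python
from bisect import bisect_left, bisect_right

column0 = [2, 2, 2, 3, 3, 8]  # column 0 of the example table

# The matching items occupy the half-open range [lo, hi)
lo = bisect_left(column0, 2)
hi = bisect_right(column0, 2)
print((lo, hi))  # (0, 3)

# Restrict the search to start at index 3
print((bisect_left(column0, 8, 3), bisect_right(column0, 8, 3)))  # (5, 6)

# A missing key gives an empty range (lo == hi)
print((bisect_left(column0, 5), bisect_right(column0, 5)))  # (5, 5)
```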

One way to do this is to generalize it: pass in the start and end indices of the list, the index of the column that is the key, and the key value, and return the new start and end indices of the matching subrange (start inclusive, end exclusive). Then call that function on successive keys. For example, with `data` being the table above:

- start=0, end=len(data), index=0, key=2 should return (0, 3)
- start=0, end=len(data), index=0, key=8 should return (5, 6)
- start=0, end=3, index=1, key=5 should return (1, 2)

If the value is not found it should return (-1, -1).

Write a function to do this. Call it something like `bisect_2d`, since a Python name can't start with a digit (so `2dbisect` won't work). A binary search can be used. For reference, here is a plain binary search over a one-dimensional sorted list:

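    def binary_search(a, target):
        """ Return an index of target in sorted list a, or -1 if not found. """
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = lo + (hi - lo) // 2  # integer division keeps mid an index
            if a[mid] == target:
                return mid
            elif a[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1  # target was not found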