# Python Overview

## Introduction

### Python's Origins

Python was conceived in the late 1980s, and its implementation began in December 1989 by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC language. It takes its name from Monty Python's Flying Circus.

Python is a dynamic language but is strongly typed (i.e. variables are untyped but refer to objects of fixed type).

> ### How Python Evolves
> 
> Python evolves in a fairly straightforward way, more-or-less like this:
> 
> - people propose changes by writing *Python Enhancement Proposals* (PEPs): https://www.python.org/dev/peps/
> - a delegate (historically called the 'BDFL-Delegate') may be assigned to decide, after some amount of discussion and revision, whether the PEP is accepted and becomes part of the standard
> - final decisions were historically made by Guido van Rossum, Python's inventor and the 'Benevolent Dictator for Life' (BDFL); since he stepped down from that role in 2018, an elected Steering Council has had the final say
> 
> An important standard PEP is the Style Guide, PEP-8 (https://www.python.org/dev/peps/pep-0008/). By default, PyCharm will warn of any PEP-8 violations. There are external tools such as `flake8` (https://gitlab.com/pycqa/flake8) that can be used to check code for compliance in other environments.


### StackOverflow is your friend!

For Python questions and Python data science questions, make use of StackOverflow. Pay attention to comments on suggested answers; the "accepted answer" is often not the best. Look for comments about whether it is the "most Pythonic". Python has an idiomatic style different to many other languages and so a novice coming from another language will often accept an answer that is closer to idiomatic in that other language rather than Python.

https://stackoverflow.com/questions/tagged/python

Also, if you're struggling to understand some code in your early days with Python, you may find this 'execution visualizer' helpful:

http://pythontutor.com/

### "Batteries Included"

Python is often described as having "batteries included". This is a reference to the rich set of libraries (packages) included in the standard distribution, as well as the vast collection of freely available packages that can be used to bootstrap your development.

![](https://imgs.xkcd.com/comics/python.png)

There are many thousands of Python packages available, often giving you many choices for similar purposes. One way to find better packages is to look at the curated lists at https://python.libhunt.com/ and https://awesome-python.com/

### Python 2.7 or Python 3.x?

You can use conda to create a Python 2.7 virtual environment for when you have to use 2.7, but all new projects should be Python 3.5 or later. Python 2.7 is the end of the 2.x line and reached end-of-life on January 1, 2020. Avoid it; the only reason to use it is if there is a package you really need that hasn't been ported yet.

## Python docs

https://docs.python.org/3/ has very detailed documentation.

Most Python packages have good documentation at https://readthedocs.org/

If you use Python a lot on a Mac you may find Dash useful: https://kapeli.com/dash

That said, Python has a `help()` function that is very useful.

## Using the REPL

To start the REPL, just type `python` at the command line.

Use the `help()` function to read the documentation for a module/class/function. As a standalone invocation, you enter the help system and can explore various topics.

Python scripts are stored in plain text files with `.py` extensions. You can run the script `foo.py` at the command line by invoking `python foo.py`. When you do so the Python interpreter compiles the script to an intermediate bytecode before executing it. For modules that the script imports (but not for the script you run directly), the bytecode is cached in a file with a `.pyc` extension. As an optimisation, the next time the module is imported the interpreter checks whether an up-to-date `.pyc` file exists and, if so, uses it instead of recompiling. In Python 2.x these files were saved alongside the Python source files, but in Python 3.x they are stored in a subdirectory named `__pycache__`.

> ### A better REPL: bpython
> 
> https://www.bpython-interpreter.org/
> 
> You can install with `pip install bpython`.
> 
> bpython adds a number of useful features at the command line, like syntax highlighting and auto-completion. If you're going to use the command-line REPL I recommend it, although there are other options too that I haven't tried:
> 
> - ptpython https://github.com/jonathanslenders/ptpython
> - DreamPie http://dreampie.sourceforge.net/
> 
> Yet another alternative to the REPL, of course, is Jupyter.
> 
> For the hard-core Pythonista, you can replace your entire shell with one based on Python; see http://xon.sh/.

## Quickstart - A Simple Example

Before diving into the details, let's look at a simple Python script to get a quick overview. We're not going to go into details here but have annotated the code with some comments and if you are familiar with other object-oriented languages this should be quite easy to understand. Some things that may be unusual to you:

- No braces; in Python whitespace is significant. This can take some getting used to but isn't as bad as it seems once you do.
- Instance methods require an explicit "this" argument, which in Python by convention is called `self`.
- Static methods have a `@staticmethod` decorator.
- The class constructor - of which there can only be one - is called `__init__`.
- Docstrings are specified using actual string literals inline rather than in comments.
- The method to convert to string is named `__str__`, not `toString`.
- String formatting is done using embedded code in `{}` and preceding the string with `f` (this is new in Python 3.6).

In [None]:
import math  # import math module
from IPython.display import SVG, display

"""
A simple turtle graphics example that produces SVG output that can
be displayed in Jupyter.
"""

class Turtle:
    " Turtle graphics drawing to SVG path "  # class docstring
    
    DEG2RAD = math.pi/180  # class level variable
    
    @staticmethod
    def deg2rad(d):  # static method
        """ Convert degrees to radians """
        return d * Turtle.DEG2RAD
    
    def __init__(self):  # class constructor; "self" is like "this"
        self.reset()
        
    def reset(self):
        self.draw = True  # instance variable
        self.path = "M0,0 "
        self.x = self.y = 0
        self.turnto(0.0)
    
    def turnto(self, angle):
        " Turn to absolute angle. "
        self.angle = angle % 360.0
        self.dx = math.sin(Turtle.deg2rad(self.angle))
        self.dy = math.cos(Turtle.deg2rad(self.angle))
        
    def right(self, angle):
        " Relative turn "
        self.turnto(self.angle + angle)

    def left(self, angle):
        " Relative turn in the opposite direction to right() "
        self.right(-angle)
        
    def up(self):
        self.draw = False
        
    def down(self):
        self.draw = True
        
    def move(self, distance):
        " Relative move by distance "
        dx = int(distance * self.dx)
        dy = int(distance * self.dy)
        self.x += dx  # keep the absolute position up to date
        self.y += dy
        self.path += f"{'l' if self.draw else 'm'}{dx},{dy} "

    def moveto(self, x, y):
        " Absolute move to (x, y)"
        self.x = x
        self.y = y
        self.path += f"{'L' if self.draw else 'M'}{self.x},{self.y} "
        
    def svg(self):
        return '<svg id="doc" xmlns="http://www.w3.org/2000/svg" ' +\
            'version="1.1" width="500" height="500"><path d="' +\
            self.path +\
            '" stroke="green" fill="none" vector-effect="non-scaling-stroke" /></svg>'
            
    def __str__(self):
        " Convert to string representation. "
        return f"Turtle at {self.x},{self.y} facing {self.angle}"

            
def swisscross(turtle, level):  # top-level function
    " Swiss cross is a space filling curve. "
    if level >= 0:
        swisscross(turtle, level - 1)
        turtle.right(90)  # use the parameter, not the global variable t
        swisscross(turtle, level - 1)
        turtle.move(10)
        swisscross(turtle, level - 1)
        turtle.right(90)
        swisscross(turtle, level - 1)
        

t = Turtle()  # create class instance; note no 'new' 
t.up()
t.moveto(20, 30)
t.turnto(315)
t.down()
swisscross(t, 5)
t.move(10)
swisscross(t, 5)

# Display the result using SVG
display(SVG(t.svg()))
        
# final state
print(t)

## Installing Third-Party Packages

The standard way to install packages is with `pip install`. However, if you have installed `conda` you should try `conda install` first, and use `pip install` only if that fails. Conda has a smaller set of packages, which is why it doesn't always succeed, but the packages it does provide have been built for Conda, so installing that way is preferred.

Use `conda uninstall` or `pip uninstall` to remove packages.

To see what packages are installed use `pip freeze`.

There's a lot more to package installation than this but this is enough for 90%+ of what you will do.

## Python is an OOPL

Python is a pure object-oriented language. Operators like `+` are simply methods on a class. The Python interpreter will convert an infix operator to an instance method call.

For example, there is an `int` class for integers. There is an `__add__` method defined on that class for addition. So:    

In [None]:
3 + 4

is the same as:

In [None]:
(3).__add__(4)

The double underscore in Python is called *dunder* and is used extensively internally; `__add__` is called a *dunder-method*. Dunder-methods are important to understand if you want to take full advantage of Python, hence this early introduction.

You can see the methods on a class by using the `dir` function, for example `dir(int)`.

We will discuss how to define new classes later. A key takeaway here is that this use of dunder-methods allows us to override many operators simply by overriding the associated dunder-method. Two particularly useful ones are `__str__` (cast to string) and `__repr__` (cast to text representation); these are typically the same for a class but need not be. For example, notice the differences here:

In [None]:
a = "abc"
print(a.__str__())  # Equivalent to str(a)
print(a.__repr__())
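
As a rough sketch of what overriding dunder-methods looks like in practice, the hypothetical `Money` class below (its name and fields are assumptions made up for this illustration, not part of any library) defines `__add__`, `__str__` and `__repr__`:

```python
class Money:
    """Hypothetical example class; not part of the standard library."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):  # called for: self + other
        return Money(self.cents + other.cents)

    def __str__(self):  # called by str() and print()
        return f"${self.cents / 100:.2f}"

    def __repr__(self):  # called by repr() and when echoed in the REPL
        return f"Money(cents={self.cents})"


total = Money(150) + Money(275)
print(str(total))
print(repr(total))
```

Here `str(total)` gives a human-friendly form while `repr(total)` gives something closer to the code that would recreate the object, which is the usual convention when the two differ.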

## Indentation and Comments

Python does not use {} for demarcating blocks of code; instead it uses indentation. This distinguishes it from most other programming languages and can take some getting used to. In particular, it requires care when pasting code in an editor (most Python editors are smart about this but other editors are not). The reason for this choice is that Guido originally designed Python as a teaching language and favored readability.

The convention in Python is to indent with spaces, not tabs (this avoids tab settings causing misinterpretation of code). The standard indentation is 4 spaces per level, although some companies have different conventions (usually 2, if not 4).

Comments start with # and continue to the end of the line. By convention if # is used on the same line as code it should be preceded by at least two spaces.

## Simple Functions

Python named functions are defined with `def`:

In [None]:
def add(a, b):
    return a + b

add(2, 3)

### import

Python code is packaged in the form of _packages_ consisting of one or more _modules_. A module is a single Python file, while a package is a directory of Python modules containing an additional `__init__.py` file, to distinguish a package from a directory that just happens to contain a bunch of Python scripts.

You install a package with `pip` or `conda`. Once installed, to use the package you must import it. You can also import modules although this is less common. 

There are several common ways of importing. Let's say we want to import a package `foo` that defines a class `Widget`:

* `import foo` will import the `foo` package; any reference to modules/classes/functions will need to be prefixed with `foo.`; e.g. `foo.Widget`
* `import foo as bar` will import the `foo` package with the alias `bar`; any reference to modules/classes/functions will need to be prefixed with `bar.`; e.g. `bar.Widget`
* `from foo import Widget` can be used to import a specific module/class/function from `foo` and it will be available as `Widget`
* `from foo import *` will import every item in `foo` into the current namespace; this is bad practice, don't do it.
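
Since `foo` and `Widget` are placeholders, here is a minimal sketch of the same import forms using the standard library's `math` module instead (the choice of `math` is just for illustration):

```python
import math                # names are accessed as math.<name>
import math as m           # same module under the alias m
from math import sqrt, pi  # bring specific names into this namespace

print(math.sqrt(16))
print(m.floor(2.7))
print(sqrt(9), pi)
```

All three forms refer to the same underlying module object; they differ only in which names end up in your namespace.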

### Writing a main function and handling command line arguments

The `sys` module lets us access command line arguments:

```python
#!/usr/bin/env python3

import sys


def main():
    # Print each command-line argument, skipping the script name in sys.argv[0]
    for arg in sys.argv[1:]:
        print(arg)


if __name__ == "__main__":
    main()
```

The `__name__` variable is set to the name of the module, or to `"__main__"` if the module is being run as the top-level script.

If you want to parse command-line arguments like flags etc, there is an `argparse` library as part of the standard distribution, but a much easier way IMO is to use `docopt`: just write the help string and `docopt` generates the parser for you: http://docopt.org/
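
For comparison, here is a minimal sketch of the standard-library `argparse` route. The program name `greet`, the positional `name` argument and the `--shout` flag are all made up for this illustration:

```python
import argparse


def build_parser():
    # The program name, argument and flag below are illustrative assumptions
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("-s", "--shout", action="store_true", help="use upper case")
    return parser


def greet(argv):
    args = build_parser().parse_args(argv)
    message = f"Hello {args.name}!"
    return message.upper() if args.shout else message


# Passing an explicit list avoids reading the real sys.argv in a notebook
print(greet(["World"]))
print(greet(["--shout", "World"]))
```

In a real script you would typically call `parse_args()` with no arguments so that it reads `sys.argv[1:]`.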

## An Overview of Python Types

See https://docs.python.org/3/library/stdtypes.html for detailed documentation.

The main types are:

| TYPE      | GROUP     | MUTABLE? |
|-----------|-----------|----------|
| int       | Numerics  | N        |
| float     | Numerics  | N        |
| complex   | Numerics  | N        |
| str       | Sequences | N        |
| bytes     | Sequences | N        |
| bytearray | Sequences | Y        |
| list      | Sequences | Y        |
| tuple     | Sequences | N        |
| range     | Sequences | N        |
| set       | Sets      | Y        |
| frozenset | Sets      | N        |
| dict      | Mapping   | Y        |

In addition, modules, classes, instances, methods, and functions are all types. The Boolean constants `True` and `False`, and the value `None`, are instances of their own special types, and there are several other special cases like this. See the link above for more. Note that there is a string type but not a character type; characters are not treated any differently from other strings.

### The Boolean Truth Value of Types

Any object can be tested for truth value, for use in an `if` or `while` condition or as operand in a Boolean expression.

By default, an object is considered true unless its class defines either a `__bool__()` method that returns `False` or a `__len__()` method that returns zero. Zero numeric values are considered false, as are `None` and empty collections or sequences; non-zero numbers and non-empty collections are considered true.
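
A quick sketch of these rules, using a few built-in values plus a hypothetical `Basket` class (invented for this illustration) that defines only `__len__`:

```python
class Basket:
    """Hypothetical container used only for this illustration."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):  # no __bool__, so truthiness falls back to __len__
        return len(self.items)


for value in [0, 1, 0.0, "", "a", [], [0], {}, None]:
    print(f"{value!r} -> {bool(value)}")

print(bool(Basket([])))
print(bool(Basket(["egg"])))
```

Note that `[0]` is true: a list is judged by whether it is empty, not by the truth value of its contents.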

Operations and built-in functions that have a Boolean result always return `0` or `False` for false and `1` or `True` for true, unless otherwise stated.

Important exception: the Boolean operations `or` and `and` always return one of their operands. This allows for useful defaults using Boolean expressions with `or`:

In [None]:
s = None

name = s or "N/A"

print(name)

### None

Python has no `null` keyword; instead it has a special singleton object, `None`.

To test if an object is `None`, use `is` or `is not`, not `==` or `!=`.

In [None]:
a = None
print(a is None)
print(a is not None)

### Numbers

Most of the typical operators you know from other languages are supported. Here are some more-specific to Python:

In [None]:
print(bool(3))  # Convert to Boolean
print(str(3))  # Convert to string
print(bool(0))

In [None]:
print(3 // 2)  # Floor division (rounds toward negative infinity, so -3 // 2 is -2)
print(3 / 2)  # Float division

In [None]:
print(int(2.5))  # Convert to int with truncation (toward zero)
print(round(2.4999))  # Convert to int with rounding
print(round(2.5001))  # Convert to int with rounding
print(round(2.5))  # Convert to int with rounding (rounds half to nearest even)
print(round(3.5))  # Convert to int with rounding (rounds half to nearest even)


In [None]:
print(2 ** 3)  # Exponentiation
print(~3)  # Bitwise inverse
print(2**120)  # Python ints are arbitrary precision, not 64-bit

In [None]:
print(2.0.is_integer())
print(2.5.as_integer_ratio())  # Convert to fraction tuple; we'll cover tuples later

In [None]:
# Note that += and -= (and *=, etc) are supported but ++ and -- are not. Use +=1 and -=1 instead.

### Strings

Python 3 strings are Unicode. String literals can use single or double quotes (but must use the same type to close as to open). Multi-line strings are most easily written using triple quotes.

In [None]:
print('foo')
print("bar")
print('"foo"')
print("'bar'")
print("""I am a 
multiline string""")

You can use the usual suspects of `\n`, `\t`, etc in strings, and use `\` to escape special characters like quotes and `\` itself.

In [None]:
a = "the cat sat on the mat"
print(len(a))  # len gets the length of the string; implemented by __len__

In [None]:
print("cat" in a)  # 'in' is implemented by __contains__
print("dog" in a)

In [None]:
print(a[0])  # Implemented by __getitem__
a[0] = "t"  # No can do; strings are immutable, so this raises TypeError

In [None]:
# Some useful functions. Note these all return copies of the string; strings are immutable!
print(a.lower())
print(a.upper())
print(a.capitalize())  # Capitalize first letter

In [None]:
# Like any object that supports __len__ and __getitem__, strings are sliceable.
# Slicing uses [start:end] or [start:end:increment] where any of these are optional
# start defaults to 0, end to __len__(), and increment to 1. 
# start and end can be positive (from start of string) or negative (from end of string).

print(a[2:])   # skip first two characters
print(a[-7:])  # the last 7 characters
print(a[2:6])  # 4 characters starting after 2nd character
print(a[::2])  # Every second character

In [None]:
# Use find and rfind to find the first/last occurrence of a substring; they return the offset, or -1 if not found
# You can also use index/rindex which are similar but raise ValueError exception if not found.

print(a.find('he'))
print(a.rfind('he'))
print(a.find('cat'))
print(a.find('dog'))

In [None]:
# You can convert from character to ordinal or vice-versa with ord() and chr()
print(chr(65))
print(ord('A'))

In [None]:
# Python has no character type, just string. So functions that would apply to just 
# a character in other languages apply to entire string in Python.
print("123".isdigit())
print("1X3".isdigit())
print("NOOOOooo".isupper())

There are many more string operations available; these are just the basics.

### Lists

Lists are ordered, mutable sequences. They can be indexed, sliced (more on that below), appended to, have elements deleted, and sorted. They are heterogeneous. Examples:

In [None]:
a = [1, 2, 3, "cat"]

print(a)
print(len(a))  # len() gives the length of the list
print(a[1])  # [] can be used to index in to the list; implemented by list.__getitem__; assignment uses list.__setitem__
print(a[-1])  # negative indices can be used to index from the end of the list (-1 for last element)

In [None]:
# * can be used to create multiple concatenated copies of a list; implemented by list.__mul__
    
print(a)
a = a * 2 
print(a)

In [None]:
# `in` can be used to check for membership; implemented by list.__contains__

print(a)
print('cat' in a)  
print('dog' in a)

In [None]:
print(a)
print(['dog'] + a)  # + can be used to concatenate lists; implemented by list.__add__
a.append('dog')  # append() adds a single element to the end of the list, in place
print(a)

In [None]:
print(a)
print(a.index('dog'))  # Get index of first matching entry; raises ValueError if not found
print(a.count('cat'))  # Count the number of instances of an element

In [None]:
print(a)
a.remove('dog')  # Remove first matching instance of element
print(a)
del a[-1]  # Remove element at index; implemented by list.__delitem__

In [None]:
# reverse() reverses the order of the list in place (the built-in reversed() instead uses list.__reversed__ to return a reverse iterator)
print(a)
a.reverse()  
print(a)

In [None]:
# for..in iterates over elements
    
print(a)
for elt in a: 
    print(elt)

In [None]:
# enumerate() will return tuples of index, value
print(a)
for i, v in enumerate(a):
    print(f'Value at index {i} is {v}')  # f'' is a format string that can contain code in {}

In [None]:
b = list(a)  # Makes a shallow copy; can also use b = a.copy()
print(b)
print(a == b)  # Elementwise comparison; implemented by list.__eq__
b[-1] += 1  # Add 1 to last element
print(a == b)
print(a > b)  # Compares starting from first element; implemented by list.__gt__
print(a < b)  # Compares starting from first element; implemented by list.__lt__

In [None]:
print(a)
a.pop()  # Removes last element
print(a)
a.pop(0)  # removes element at index 0
print(a)

In [None]:
# You can join a list of words into a string
','.join(['cat', 'dog'])

In [None]:
# Like any object that supports __len__ and __getitem__, lists are sliceable.
# Slicing uses [start:end] or [start:end:increment] where any of these are optional
# start defaults to 0, end to __len__(), and increment to 1. 
# start and end can be positive (from start of list) or negative (from end of list).
x = [1, 2, 3, 4, 5, 6]
print(x[2:])
print(x[1:3])
print(x[-3:])
print(x[::2])

In [None]:
# Use insert() to insert at some position. This is done in-place.
x.insert(2, 'A')
print(x)
x.insert(3, [1, 2])  # Note: insert() is for elements, so [1, 2] is a single element, not expanded
print(x)

In [None]:
a.clear()  # empty the list
print(a)

### Dicts

Dictionaries are mutable mappings of keys to values. Keys must be hashable, but values can be any object. 

---
_Under the hood_

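A hashable object is one that defines a `__hash__` dunder-method and an `__eq__` dunder-method; if two objects are equal their hashes must be the same, or the results may be unpredictable. Immutable built-ins such as `str`, `int` and tuples of hashable items are hashable; mutable containers such as `list` and `dict` are not.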

---


In [None]:
# dict literals (actually a list of dicts in this example)

contacts = [
    {
        'name': 'Alice',
        'phone': '555-123-4567'
    },
    {
        'name': 'Bob',
        'phone': '555-987-6543'        
    }
]
contacts

In [None]:
# Use [key] to get an item; this calls dict.__getitem__
contacts[0]['name']

In [None]:
# Use dict[key] = value to change an item; this calls dict.__setitem__
contacts[0]['name'] = 'Carol'
contacts[0]

In [None]:
# Trying to use a non-existent key raises an exception
contacts[0]['address']

In [None]:
# You can avoid above and return a default value by using .get()
print(contacts[0].get('name', 'No name'))
print(contacts[0].get('address', 'No address'))

In [None]:
# Use 'in' to see if a key exists in a dict; this calls dict.__contains__
print('name' in contacts[0])
print('address' in contacts[0])

In [None]:
# Test for equality with '==' and !=; this calls dict.__eq__ and dict.__ne__
print(contacts[0] == contacts[1])
print(contacts[0] == { 'name': 'Carol', 'phone': '555-123-4567'})

In [None]:
# Use for-in to iterate over the keys; this calls dict.__iter__

for x in contacts[0]:
    print(x)

In [None]:
# Use len() to get number of items; this calls dict.__len__

print(len(contacts[0]))

In [None]:
# Use 'del' to delete a key from a dict; this calls dict.__delitem__
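
A minimal sketch of deleting keys, using a made-up `contact` dict so that it can be run on its own:

```python
contact = {'name': 'Alice', 'phone': '555-123-4567'}  # illustrative data

del contact['phone']  # calls dict.__delitem__
print(contact)

try:
    del contact['address']  # deleting a missing key raises KeyError
except KeyError as err:
    print(f'No such key: {err}')

# pop() also removes a key, returning its value (or a default if missing)
print(contact.pop('address', None))
```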

In [None]:
# Use .clear() to empty dict (without changing references)

a = {'name': 'me'}
b = a
a.clear()
b

In [None]:
# Contrast above with assigning empty dict
a = {'name': 'me'}
b = a
a = {}
b

In [None]:
# Use .keys(), .values() or .items() to get the keys, values, or both
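
A short sketch of these three methods, again on a made-up `contact` dict:

```python
contact = {'name': 'Alice', 'phone': '555-123-4567'}  # illustrative data

print(list(contact.keys()))
print(list(contact.values()))
print(list(contact.items()))  # (key, value) tuples

# The returned objects are live views: they reflect later changes to the dict
keys = contact.keys()
contact['email'] = 'alice@example.com'
print('email' in keys)
```

Wrapping the views in `list()` is only needed when you want an actual list; for iteration or membership tests you can use them directly.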

There are some alternative implementations in the `collections` module; you won't need these now but they may come in handy in the future, especially the first two:

* `collections.OrderedDict`s remember the order of insertion so this is preserved when iterating over the entries or keys (since Python 3.7 the built-in `dict` preserves insertion order too)
* `collections.defaultdict`s take a factory callable (often a type such as `list` or `int`) in the constructor, whose return value will be used if an entry can't be found
* `collections.ChainMap`s group multiple dictionaries into a single item for lookups; inserts go in the first dictionary
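
To make two of these more concrete, here is a small sketch of `defaultdict` and `ChainMap`; the word list and settings are invented for the example:

```python
from collections import ChainMap, defaultdict

# Group words by first letter; missing keys start out as an empty list
words = ['apple', 'avocado', 'banana', 'cherry', 'blueberry']  # illustrative data
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)
print(dict(by_letter))

# Look up settings in user overrides first, then fall back to defaults
defaults = {'color': 'green', 'width': 500}
overrides = {'color': 'red'}
settings = ChainMap(overrides, defaults)
print(settings['color'], settings['width'])
settings['height'] = 300  # inserts go into the first mapping
print(overrides)
```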

### Sets

A set is a mutable unordered collection that cannot contain duplicates. Sets are used to remove duplicates and test for membership. One use for sets is to quickly see differences. For example, if you have two dicts and want to see what keys are in one but not the other:

In [None]:
a = {'food': 'ham', 'drink': 'soda', 'dessert': 'ice cream'}
b = {'food': 'tofu', 'dessert': 'cake'}

set(a) - set(b)

Sets are less commonly used than lists and dicts and we will not discuss them further here. You can read more here: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

### Tuples

Tuples are immutable sequences. Typically they are used to store record type data, or to return multiple values from a function. Tuples behave a lot like lists and support many of the same operations with similar behavior, aside from their immutability. We'll consider them briefly here.

The `collections` module defines a variant, `namedtuple`, which allows each field to be given a name; we won't go into that here other than to point out its existence.

In [None]:
('dog', 'canine')  # tuple

In [None]:
('dog')  # Not a tuple! This is just a string in parens

In [None]:
('dog',)  # For a single-valued tuple, use a trailing comma to avoid above issue

In [None]:
'dog',  # Parentheses are often optional

In [None]:
# Indexing can be used to get at elements, much like lists
print(('dog', 'canine')[0])
print(('dog', 'canine')[1])
print(('dog', 'canine')[-2])
print(('dog',)[0])
print(('dog',)[1])

In [None]:
# We can unpack a tuple through assignment to multiple variables
a = ('dog', 'bone')
animal, toy = a
print(animal)
print(toy)

In [None]:
# But need to ensure we use the right number of variables
a = ('dog', 'bone')
animal, toy, place = a

In [None]:
a = ('dog', 'bone', 'house')
animal, toy = a

In [None]:
# Tuples allow us to do a neat trick in Python that is harder in many languages - swap two values without using a
# temporary intermediate.
# Note what is going on here: the RHS of the assignment is creating a tuple; the LHS is unpacking the tuple.

a = 1
b = 2
print(a,b)
a, b = b, a
print(a,b)

## Some built-in Functions

See https://docs.python.org/3.6/library/functions.html for a full list and more details.

`abs(num)` - Return absolute value

In [None]:
print(abs(3))
print(abs(-3))

`all(iterable)` - returns True if all items in the iterable are true (or if the iterable is empty)

In [None]:
print(all([True, True, True]))
print(all([True, False, True]))

`any(iterable)` - returns True if any item in the iterable is true.

In [None]:
print(any([False, False]))
print(any([False, True]))

`filter` - construct an iterator from the elements of iterable for which a function returns true.

In [None]:
names = ["John Smith", "Alan Alda"]

# Get the names that start and end with same letter
for i in filter(lambda s: s[0].upper() == s[-1].upper(), names):
    print(i)

`input` - get input from the console

In [None]:
n = input("What is your name?")
print(f'Hello {n}!')

`isinstance` - check if an object has a certain type

In [None]:
s = 'abc'
n = 123
print(isinstance(s, int))
print(isinstance(s, str))
print(isinstance(n, int))
print(isinstance(n, str))

`iter` - create an iterator from an iterable object; we will discuss iterables later

In [None]:
x = iter([1, 2, 3, 4])
print(x)
print("Before first next()")
print(next(x))  # returns first item and advances
print("Before second next()")
print(next(x))  # returns second item and advances
print("After second next()")
for v in x:  # iterates through remaining items
    print(v)

`len` - calls the object's `__len__` method to get the length.

`map` - similar to `filter` but returns an iterable with the results of applying the function

In [None]:
names = ["John Smith", "Alan Alda"]

print(list(map(lambda s: s[0].upper() == s[-1].upper(), names)))

`max(arg1,...)` - returns the largest arg. If a single iterable arg is given it will iterate.

`min(arg1, ...)` - returns the smallest arg

In [None]:
print(max(2, 3, 1))  # Multiple scalar args
print(max([3, 2, 1])) # Single list arg
print(max([3, 2, 1], 4))  # Not allowed

`next` - gets the next item from an iterator; see the section on iterables and the example for `iter` above.

`repr` - calls the object's `__repr__` method to get a string representation

`reversed` - returns an iterator over the items in reverse order, without copying or modifying the object (which must support `__reversed__`, or both `__len__` and `__getitem__`)
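
A small sketch of what `reversed` actually hands back (the list contents are arbitrary):

```python
letters = ['a', 'b', 'c']  # illustrative data

r = reversed(letters)
print(r)        # an iterator object, not a list
print(list(r))  # materialize it to see the items
print(letters)  # the original list is unchanged

print(''.join(reversed('abc')))  # works on strings too
```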

`round` - rounds number to some number of decimal places (default 0)

In [None]:
pi = 3.1415927
print(round(pi))
print(round(pi, 3))

`sorted(iterable)` - returns a new sorted list built from the items of the iterable.

In [None]:
print(sorted([3, 1, 3]))

`sum(iterable)` - returns the sum of the iterable

In [None]:
print(sum([1, 2, 3]))

`type(obj)` - return the type of an object

In [None]:
print(type('foo'))

`zip(iterable, ...)` - pairs up the items of multiple iterables into tuples. Note this returns a lazy iterator, not a list

In [None]:
print(zip(['a', 'b', 'c'], [1, 2, 3]))
print(list(zip(['a', 'b', 'c'], [1, 2, 3])))  # instantiates the iterable as a list

## String Formatting

String formatting has evolved over time with Python. Python 3.6 introduced "format strings" which allow code to be directly embedded in the string. This is an improvement over older approaches and we will use it extensively.
Format strings have an `f` prefix and include code in `{}`. For example:

In [None]:
a = 10
print(f"2 x {a} = {2*a}")

If you need to use the old approaches, there are a lot of details here: https://pyformat.info/ (this doesn't seem to cover format strings yet though). That site covers things like padding, justification, truncation, leading zeroes, fixing number of decimal places, etc. We won't cover these here except the last of them:

In [None]:
a = 1.23456
print(a)
print(f'{a:.2f}')  # Float restricted to two decimal places
print(f'{a:06.2f}')  # Float restricted to two decimal places and padded with leading zeroes if less than 6 chars

When you use `f'{a}'`, Python calls `format(a)`, which uses the `__format__` method of `a`; the default `__format__` inherited from `object` falls back to `__str__`, which in turn falls back to `__repr__` if the class doesn't define `__str__`. You can force it to use `__repr__` with `f'{a!r}'` or to use `__str__` with `f'{a!s}'`.
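
The hypothetical `Temperature` class below (invented purely for this illustration) defines all three methods so you can see which one each form ends up calling:

```python
class Temperature:
    """Hypothetical class used only to show which method each form calls."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        return f"Temperature({self.degrees})"

    def __str__(self):
        return f"{self.degrees} degrees"

    def __format__(self, spec):
        # Apply the format spec to the number; fall back to __str__ if none given
        return format(self.degrees, spec) if spec else str(self)


temp = Temperature(21.456)
print(f"{temp}")      # __format__ with an empty spec
print(f"{temp:.1f}")  # __format__ with spec '.1f'
print(f"{temp!r}")    # __repr__
print(f"{temp!s}")    # __str__
```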

## Sorting

We've already seen the `sorted` function, that can create a sorted list from any iterable:

In [None]:
d = [3,5,2,4,1,7]
for i in sorted(d):
    print(i)

You can do a descending sort by adding a `reverse=True` argument:

In [None]:
for i in sorted(d, reverse=True):
    print(i)

You can sort a list in place with `sort`, but this only applies to lists:

In [None]:
print(d)
d.sort()
print(d)

You can read more about sorting here, including how to sort composite objects like dictionaries, tuples and nested lists, and by multiple keys: https://docs.python.org/3/howto/sorting.html
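
As a brief taste of what that guide covers, here is a sketch of sorting with the `key` argument; the `people` and `scores` data are made up:

```python
people = [('Carol', 35), ('alice', 30), ('Bob', 30)]  # illustrative (name, age) data

# key= takes a function that returns the value to sort by
by_age = sorted(people, key=lambda p: p[1])
print(by_age)

# Sort by multiple keys by returning a tuple: age descending, then name ignoring case
by_age_then_name = sorted(people, key=lambda p: (-p[1], p[0].lower()))
print(by_age_then_name)

# Sorting a dict's items by value
scores = {'x': 3, 'y': 1, 'z': 2}
by_score = sorted(scores.items(), key=lambda kv: kv[1])
print(by_score)
```

Python's sort is stable, so in `by_age` the two 30-year-olds keep their original relative order.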

## Statements

Here we will consider statements. We'll leave some statements to when we get to exceptions, functions and classes.

For more info on statements see https://docs.python.org/3/reference/simple_stmts.html

### pass

The `pass` statement is a no-op. This is needed in Python as the language doesn't use braces, so it is the equivalent of `{}` in Java- or C-like languages.
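
A few places where `pass` typically shows up, as a minimal sketch (the class and function names are placeholders):

```python
class NotReadyError(Exception):
    pass  # a class needs a body, even if it adds nothing


def todo():
    pass  # placeholder body; calling this returns None


for _ in range(3):
    pass  # loop that deliberately does nothing

print(todo())
```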

### del

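`del` removes a name binding (or an item from a container) rather than directly destroying an object; it isn't used much, but can be useful when an object uses a lot of memory, since removing the last reference to it allows it to be garbage-collected.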
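
A small sketch of `del` applied to a variable; the names `data` and `alias` are arbitrary:

```python
data = [1, 2, 3]  # illustrative object
alias = data      # a second reference to the same list

del data  # removes the name 'data'; the list survives because 'alias' still refers to it
remaining = len(alias)
print(remaining)

name_gone = False
try:
    data  # the name no longer exists
except NameError:
    name_gone = True
print(name_gone)
```

Only once every reference (here, `alias` as well) has gone away can the list itself be reclaimed.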

### for, break and continue

You can loop over any iterable with `for...in`. `break` and `continue` are supported, and behave in the expected fashion.

In [None]:
for i in ['green eggs', 'ham']:
    print(i)

In [None]:
for i in 'green eggs':
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}: # This will loop over keys
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}.values(): # This will loop over values
    print(i)

In [None]:
for i in {'a': 1, 'b': 2}.items():  # This will loop over key-value pairs as tuples
    print(i)

In [None]:
for i in [1, 2, 3]:
    print(i)

In [None]:
for i in enumerate([1, 2, 3]):  # Returns (index, value) tuples
    print(i)

In [None]:
for index, value in enumerate([1, 2, 3]):  # We can unpack the (index, value) tuples
    print(f'At position {index} we have value {value}')

In [None]:
for i in range(1, 10):
    print(i)

In [None]:
for i in range(1, 10, 2):
    print(i)
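
The loops above all run to completion; as a minimal sketch of `break` and `continue` (the thresholds are arbitrary), this one skips even numbers and stops early:

```python
seen = []
for n in range(1, 10):
    if n % 2 == 0:
        continue  # skip even numbers and go on to the next iteration
    if n > 7:
        break  # leave the loop entirely
    seen.append(n)

print(seen)
```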

Python has an unusual construct: for..else. The else part is executed if there was no early break from the loop.

This is a common construct in other languages:

```python
    has_even_number = False
    for elt in [1, 2, 3]:
        if elt % 2 == 0:
            has_even_number = True
            break
    if not has_even_number:
        print("list has no even numbers")
```

but in Python, we can just do:

```python
    for elt in [1, 2, 3]:
        if elt % 2 == 0:
            break
    else:
        print("list has no even numbers")
```

### while

`while` loops are very straightforward:

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2

`while...else` is supported:

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2
else:
    print('Done')

In [None]:
i = 0
while i < 10:
    print(i)
    if i % 2 == 0:
        print('Found an even number!')
        break
    i += 2
else:
    print('No even numbers!')

In [None]:
i = 1
while i < 10:
    print(i)
    if i % 2 == 0:
        print('Found an even number!')
        break
    i += 2
else:
    print('No even numbers!')

### if Statement and Boolean Expressions

Python uses `if...elif...else` syntax:

In [None]:
grade = 75
if grade > 90:
    print('A')
elif grade > 80:
    print('B')
elif grade > 70:
    print('C')
else:
    print('D')

`and`, `or` and `not` are Boolean operators, while `&`, `|` and `^` are bitwise operators. Short-circuiting rules apply to `and` and `or`, so the right-hand side is only evaluated when needed:

In [None]:
1 and 1/0

In [None]:
1 or 1/0

In [None]:
0 and 1/0

In [None]:
0 or 1/0

You can combine multiple range comparisons into a single one:

In [None]:
print(0 < 2 < 4)
print(2 < 0 < 4)

Note that the Boolean literals are `True` and `False`, with capitalized first letters.

In [None]:
print(0 < 2 < 4 < 6)

If an instance of a class is used in a Boolean expression, it is evaluated by calling its `__bool__` method if it has one, else its `__len__` method (where non-zero is `True`), else it is considered `True`.
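
The lookup order can be sketched with three hypothetical classes (none of these names come from a library):

```python
class Basket:
    """Hypothetical container: truthiness falls back to __len__."""
    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch:
    """Hypothetical class: __bool__ is used when it is defined."""
    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on


class Plain:
    """No __bool__ or __len__, so instances are always truthy."""


print(bool(Basket([])), bool(Basket([1])))
print(bool(Switch(False)), bool(Switch(True)))
print(bool(Plain()))
```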

Python doesn't have the C-style `?:` conditional operator, but it does support conditional (ternary) expressions with `if...else`:

In [None]:
for count in range(0, 3):
    print(f'{count} {"Widget" if count == 1 else "Widgets"}')

### with

`with` is used for scoped use of objects that need to clean up when they are no longer used (e.g. file objects that need to release underlying file handles).

The most common place you'll see this is with file reading and writing, which we cover in the next section.

---
> _Under the Hood_
>
> When the `with` statement is executed, Python evaluates the expression that follows it, calls the `__enter__` method on the resulting value (a "context guard"), and assigns whatever `__enter__` returns to the variable given by `as`. Python will then execute the code body and, no matter what happens in that code, call the guard object's `__exit__` method.
> 
> As an extra bonus, the `__exit__` method can look at the exception, if any, and suppress it or act on it as necessary (to suppress it, it just needs to return `True`).
> 
> We're getting ahead of ourselves here with classes, but here is an example:

In [None]:
class Wither:
    def __enter__(self):
        return 'green eggs'
    def __exit__(self,  type, value, traceback):
        print('ham')
    
with Wither() as x:
    print(x)
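
To illustrate the point about suppressing exceptions, here is a sketch with a hypothetical `Suppress` class whose `__exit__` returns `True` only for `ZeroDivisionError`:

```python
class Suppress:
    """Hypothetical context manager that swallows ZeroDivisionError only."""
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None if the body finished without raising
        return exc_type is not None and issubclass(exc_type, ZeroDivisionError)


with Suppress():
    1 / 0  # Raised here, then suppressed by __exit__

reached = True
print('Execution continues after the with block')
```

Any other exception type raised in the body would still propagate, because `__exit__` returns `False` for it.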

## Reading and Writing Files

Python has a built-in `open` function for opening files for reading and writing: https://docs.python.org/3.6/library/functions.html#open

The simplest form of reading a file is just:

```python
with open('myfile.txt') as f:
    for line in f:
        print(line, end='')  # each line already ends with a newline
```

and writing a file, assuming we have a list of strings `data` (each already ending in a newline, since `write` does not add one):

```python
with open('myfile.txt', 'w') as f:
    for line in data:
        f.write(line)
```

You can see more detailed examples in the tutorial, section 7.2, here: https://docs.python.org/3/tutorial/inputoutput.html
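
Putting the two together, here is a sketch of a write-then-read round trip. It uses a temporary directory so that nothing is left behind; the file name and the contents of `data` are illustrative only.

```python
import os
import tempfile

data = ['green eggs', 'ham']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'myfile.txt')

    with open(path, 'w') as f:
        for line in data:
            f.write(line + '\n')  # write() does not add a newline for you

    lines = []
    with open(path) as f:
        for line in f:
            lines.append(line.rstrip('\n'))  # Strip the trailing newline

print(lines)
```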

## Functions and Lambdas

Recall that Python named functions are defined with `def`:

In [None]:
def add(a, b):
    return a + b

add(2, 3)

Default arguments are allowed. If a default argument is specified, then all following arguments must have defaults as well:

In [None]:
def add(a, b=1):
    print(f'a={a}, b={b}')
    return a + b

print(add(2, 3))
print(add(2))
print(add())  # TypeError: 'a' has no default, so it must be supplied

Arguments with no defaults are "positional" arguments and must be specified in order _except_ if they are named explicitly when calling the function:

In [None]:
print(add(b=2, a=1))

Variables assigned in a function are local by default. A function can read a global variable without any declaration, but to assign to one you must explicitly declare it `global` (though it is better to avoid using globals):

In [None]:
x = 2

def foo():
    x = 1  # This is local
    
print(x)  # This is the global
foo()
print(x)

In [None]:
x = 2

def foo():
    global x
    x = 1
    
print(x)
foo()
print(x)

Functions can be nested. In Python 3 you can declare a variable as "nonlocal" to access an outer but non-global scope.

In [None]:
def outside():
    msg = "Outside!"
    def inside():
        msg = "Inside!"  # This is different to the one in outside()
        print(msg)
    inside()
    print(msg)
    
outside()

In [None]:
def outside():
    msg = "Outside!"
    def inside():
        nonlocal msg  # This is the same as the one in outside()
        msg = "Inside!"
        print(msg)
    inside()
    print(msg)
    
outside()

It is good practice to follow the `def` line with a _docstring_ to document the function. There are different conventions for how this should be formatted; I like the Google style: http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html

In [None]:
def add(a, b):
    """Adds two objects and returns the result.

    Args:
        a: The first parameter.
        b: The second parameter.

    Returns:
        The result of adding a and b.
    """
    return a + b

# Now we can use help() to get the docstring.
help(add)

You can return multiple values from a function (really just a tuple):

In [None]:
def sum_diff(a, b):
    return a+b, a-b

print(sum_diff(3, 2))
x, y = sum_diff(4, 5)
print(x)
print(y)

Python supports generator functions with `yield` (calling one returns a generator, which we will discuss later):

In [None]:
def get_next_even_number(l):
    for v in l:
        if v % 2 == 0:
            yield v
    
x = [1, 2, 3, 4, 5, 6]
for e in get_next_even_number(x):
    print(e)

You can use `*args` for a variable number of non-keyword arguments, which will be available internally as a tuple:

In [None]:
def multiply(*args):
    z = 1
    for num in args:
        z *= num
    return z
    
print(multiply(1, 2, 3, 4))

In [None]:
def foo(*args):
    for i in range(0, len(args)):
        print(f'Argument {i} is {args[i]}')

        
foo(1, 2, 'cat')

For keyword arguments, you can use `**kwargs`, which will be available internally as a dictionary:

In [None]:
def foo(*args, **kwargs):
    for i in range(0, len(args)):
        print(f'Positional argument {i} is {args[i]}')
    for k, v in kwargs.items():
        print(f'Keyword argument {k} is {v}')
        
foo('cat', 1, clothing='hat', location='mat')

You can mix all types of arguments but the order is important:
* Formal positional arguments
* `*args`
* Keyword arguments
* `**kwargs`
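
The ordering can be sketched with a hypothetical `describe` function that uses all four kinds of parameter:

```python
def describe(first, *args, sep=', ', **kwargs):
    """Hypothetical function: formal positional, *args, keyword, **kwargs."""
    parts = [str(first)]
    parts.extend(str(a) for a in args)
    parts.extend(f'{k}={v}' for k, v in kwargs.items())
    return sep.join(parts)


print(describe(1))
print(describe(1, 2, 3, sep=' | ', colour='red'))
```

Because `sep` comes after `*args`, it can only be passed by name; any other named argument ends up in `kwargs`.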

You can do the opposite as well - pass a list instead of several positional arguments, and a dictionary instead of several keyword arguments, by using `*` and `**`:

In [None]:
def foo(pos1, pos2, named1='a', named2='b'):
    print(f"Positional 1 is {pos1}")
    print(f"Positional 2 is {pos2}")
    print(f"Named1 is {named1}")
    print(f"Named2 is {named2}")
    
p = [1, 2]
n = {'named1': 'cat', 'named2': 'hat'}
foo(*p, **n)

The above is actually a common pattern in Python when writing wrapper functions that need to support arbitrary arguments that they are just going to pass on to some other function. For example, say we wanted to write a wrapper that timed the execution of a function:

In [None]:
import datetime as dt


def foo(a, b=None, c=None):
    print(f'a={a}, b={b}, c={c}')


def log_time(fn, *args, **kwargs):
    start = dt.datetime.now()
    fn(*args, **kwargs)
    end = dt.datetime.now()
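    # total_seconds() covers the whole interval; .microseconds is only the sub-second part
    print(f"{fn.__name__} took {(end - start).total_seconds() * 1e6:.0f} microseconds")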
    
log_time(foo, 1, c='hello')
    

Finally, you can use `lambda` to define anonymous functions. These will be very useful when we get to using Pandas for data manipulation:

In [None]:
adder = lambda a, b: a + b

adder(1, 2)
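
Lambdas are most often passed directly as arguments to other functions. A small sketch with made-up data:

```python
words = ['banana', 'fig', 'cherry', 'kiwi']

# Sort by length, using a lambda as the key function
by_length = sorted(words, key=lambda w: len(w))

# Keep only the short words; filter() returns an iterator, so wrap it in list()
short = list(filter(lambda w: len(w) <= 4, words))

print(by_length)
print(short)
```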

## Classes

In [None]:
class Widget:  # same as "class Widget(object):"
    """ This is a Widget class. """  # Classes have docstrings too.
    
    def print_my_class(self):  # Instance method as it has a 'self' parameter
        """ Print the instance class. """
        print(self.__class__)  # __class__ is the easy way to get at an object's class
    
    @staticmethod
    def print_class():  # Static method as it has no 'self' parameter
        """ Print the class class. """
        print(Widget)
        
        
x = Widget()  # We don't use 'new' in Python
x.__doc__  # __doc__ has the docstring

In Python, we declare a class with `class Name(Base):`. If the base class is omitted then `object` is assumed.

As mentioned earlier, instance methods take an explicit `self` first parameter which references the instance. So if `widget` is an instance of a `Widget` class and we call:

```python
widget.foo()
```

internally that gets converted to the equivalent of:

```python
Widget.foo(widget)
```

To declare a static method, we omit the `self` argument and use a `staticmethod` decorator. The latter prevents the instance being passed as a parameter when we call the method from that instance.

In [None]:
help(x)

In [None]:
x.print_my_class()

In [None]:
x.print_class()

In [None]:
Widget.print_class()

In [None]:
Widget.print_my_class()

Note that if we had:

```python
class Foo():
     def s1():
         print('s1')

     @staticmethod
     def s2():
         print('s2')
```

then we could call `Foo.s1()` or `Foo.s2()` with no issues, but if `foo` was an instance of `Foo`, while we could call `foo.s2()` without a problem, if we called `foo.s1()` we would get an error:

```
TypeError: s1() takes 0 positional arguments but 1 was given
```

because Python would try to pass the instance as a parameter, as `s1` is missing the `@staticmethod` decorator.

We can get the docstring of the class with `help`, as in the `help(x)` call above.

### Constructors and visibility

A class does not require a constructor, but can have (at most) one. The constructor is an instance method named `__init__`. It can take additional parameters other than `self`.

Python does not support private or protected members. By convention, private members should be named starting with an underscore, but this is an 'honor system'; everything is public. Also by convention, you should avoid double underscores; these should be reserved for dunder-methods.

In [None]:
class Bug:
    """ A class for creepy crawly things. """
    
    heads = 1  # This is a class variable
    
    def __init__(self, legs=6, name='bug'):
        self.legs = legs  # Any variable assigned to with self.var = ... in constructor is an instance variable
        self.name = name
    
    @staticmethod
    def _article(name):  # 'private' static method
        """ Return the English article for the given name. """
        return 'an' if name[0] in 'aeiouAEIOU' else 'a'

    def article(self):  # 'public' instance method
        """ Return the English article for the given name. """
        return Bug._article(self.name)
    
    def __repr__(self):  # __repr__ is called to get a printable representation of an object
        return f"I'm {Bug._article(self.name)} {self.name} with {self.legs} legs"

# Notice how help() will show help for article() but not _article().
# It respects the '_' convention for 'privacy'.
help(Bug)

In [None]:
Bug()

In [None]:
Bug(legs=8)

It is recommended to always define a `__repr__` method on your classes.

### Inheritance

Python supports both single and multiple inheritance (which we won't discuss). To up-call to a base method we use `super()`:

In [None]:
class Insect(Bug):
    
    def __init__(self):
        super().__init__(name='insect')
        
Insect()

In [None]:
class Spider(Bug):
    
    def __init__(self):
        super().__init__(legs=8, name='spider')
        
Spider()

### Under the Hood

You can skip this section if you're not interested, but it can be useful to have some understanding of how classes work in Python.

Classes and class instances both have a `.__dict__` attribute that holds their methods and variables/attributes. For example:

In [None]:
class Example:
    """ This is a class docstring. """
    
    class_var = 'this is a class variable'
    
    def __init__(self):
        """ This is a constructor docstring. """
        self.instance_var = 'this is an instance var'
        
    def class_method():
        """ This is a class method docstring. """
        pass
    
    def instance_method(self):
        return self.instance_var
    
Example.__dict__

In the case of classes we really have a special object, a `mappingproxy`; this is a wrapper around a dictionary that makes it read-only and enforces that all keys are strings.

In [None]:
# Similarly for an instance, although this really is a dict, not a mappingproxy.
e = Example()
print(e.__dict__)
print(e.__dict__.__class__)

In [None]:
# Instances have a .__class__ attribute that points to their class.
e.__class__

In [None]:
# To change a class variable, qualify with the class name:

e2 = Example()
print(e.class_var)
print(e2.class_var)

Example.class_var = 'Changed class var'

# Note how it is changed for all instances
print(e.class_var)
print(e2.class_var)

In [None]:
# If you qualify with an instance instead, you'll end up creating an instance variable instead!
e2.class_var = 'e2 class var is actually an instance var'
print(e.class_var)
print(e2.class_var)
print(e.__dict__)
print(e2.__dict__)

In [None]:
# When we dereference an instance method, we get a *bound method*; the instance method bound to the instance:
e.instance_method

In [None]:
# We can save a reference to the bound method and call it later and it will use the right instance

f = e.instance_method
e.instance_var = 'e\'s instance var'
f()

There's a lot more to it than this, but this should give you some idea of how Python can support monkey-patching at run-time and other flexibility.
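
As a sketch of what monkey-patching looks like in practice (the `Greeter` class and `shout` function are invented for this example), a method can be replaced on the class at run time and existing instances pick up the change:

```python
class Greeter:
    """Hypothetical class used only for this illustration."""
    def greet(self):
        return 'hello'


g = Greeter()
before = g.greet()


def shout(self):
    return 'HELLO!'


# Replace the method on the class; the existing instance sees the change
Greeter.greet = shout
after = g.greet()

print(before, after)

# Methods live in the class __dict__, not the instance __dict__
print('greet' in Greeter.__dict__, 'greet' in g.__dict__)
```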

## Exceptions

You can raise an exception with the `raise` statement, giving it an instance of any class that derives from `BaseException`. You can catch exceptions using `try...except`. If you want to get a reference to the exception, use `except...as`:

In [None]:
try:
    raise Exception('The dude minds, man!')
except Exception as x:  # Exception is the type of exception to catch, x is the variable to catch it with.
    print(x)
    
# You can catch different types of exceptions, and you can use 'raise' on its own in the exception handling
# block to rethrow the exception.

def average(seq):
    """Compute the average of a sequence; return None if it is empty."""
    try:
        result = sum(seq) / len(seq)
    except ZeroDivisionError:
        return None
    except Exception:
        raise
    return result

print(average([]))
print(average(['cat']))  # Raises TypeError, re-raised by the bare 'raise'
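
You can also define your own exception classes, and `try` accepts optional `else` and `finally` clauses. The sketch below uses a hypothetical `InsufficientFunds` exception and `withdraw` function to show the order in which the clauses run:

```python
class InsufficientFunds(Exception):
    """Hypothetical custom exception; user code normally derives from Exception."""


def withdraw(balance, amount):
    if amount > balance:
        raise InsufficientFunds(f'cannot withdraw {amount} from {balance}')
    return balance - amount


log = []
for amount in (30, 300):
    try:
        remaining = withdraw(100, amount)
    except InsufficientFunds as e:
        log.append(f'failed: {e}')
    else:
        log.append(f'ok: {remaining} left')  # Runs only if no exception was raised
    finally:
        log.append('done')  # Always runs

print(log)
```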

## Comprehensions

Comprehensions are a powerful feature in Python, allowing lists, dictionaries and sets to be constructed from iterative computations with minimal code. These are best illustrated by examples:

In [None]:
# A list of all squares from 1 to 25
[x*x for x in range(1, 6)]

In [None]:
# A list of all squares from 1 to 1024 except those divisible by 5
[x*x for x in range(1, 33) if (x*x) % 5 != 0]

In [None]:
# Comprehensions can be nested
t = [
    ['1', '2'],
    ['3', '4']
]

# Make a list of lists from t where we convert the strings to floats
[[float(y) for y in x] for x in t]

In [None]:
# Dictionary comprehension
{ f'Square of {x}': x*x for x in range(1, 6)}
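
The same syntax works with braces for sets, and with parentheses it produces a lazy generator expression. A short sketch with made-up data:

```python
words = ['green', 'eggs', 'and', 'ham', 'and', 'eggs']

# Set comprehension: duplicates collapse
lengths = {len(w) for w in words}

# Generator expression: values are produced lazily, so no intermediate list is built
total_chars = sum(len(w) for w in words)

# There is no tuple comprehension; wrap a generator expression in tuple() instead
as_tuple = tuple(w.upper() for w in words[:2])

print(lengths, total_chars, as_tuple)
```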

## Iterators and Generators

A Python iterator is an object with a `__next__` method for sequential access, which raises `StopIteration` when done.

A Python iterable is an object that defines a `__getitem__` method that can take sequential integer indices starting from 0 (so not necessarily random access) and raises an IndexError when done, or that has an `__iter__` method which returns an iterator.
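
The `__getitem__` route is less well known, so here is a sketch using a hypothetical `Countdown` class that defines no `__iter__` at all and is still iterable:

```python
class Countdown:
    """Hypothetical iterable that defines only __getitem__, not __iter__."""
    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index > self.start:
            raise IndexError(index)  # Signals the end of iteration
        return self.start - index


values = list(Countdown(3))
print(values)
print(hasattr(Countdown(3), '__iter__'))
```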

See https://docs.python.org/3/tutorial/classes.html#iterators for more; here's an example from that link:

In [None]:
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
    
for char in Reverse("spam"):
    print(char)

*TODO* add more here, including generators. Worth noting that using these is very idiomatic to Python normally (see the *Fluent Python* book for example), but in the data science domain, this idiom is more commonly replaced by vectorising. This web-based book goes deep into this different way of thinking: http://www.labri.fr/perso/nrougier/from-python-to-numpy/
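
As a brief sketch of a generator, here is a function that yields the items of a sequence in reverse; because it uses `yield`, Python builds the `__iter__` and `__next__` machinery for us:

```python
def reverse(data):
    """Generator that yields the items of a sequence backwards."""
    for index in range(len(data) - 1, -1, -1):
        yield data[index]


gen = reverse('spam')
first = next(gen)   # Values are produced one at a time, on demand
rest = list(gen)    # Consume the remaining values
again = list(gen)   # The generator is now exhausted

print(first, rest, again)
```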

## async/await

By default, Python code runs in a single thread. That means things like I/O can slow things down a lot. It is possible to use multiple threads - there are several libraries for that - but even with a single thread big improvements are possible with async code. The details are beyond the scope of the bootcamp, but more info is available here: https://docs.python.org/3/library/asyncio-task.html. Recent changes in Python have made this much more powerful, flexible and easy to use.
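
To give a flavour of what this looks like, here is a minimal sketch in which `fetch` is a hypothetical stand-in for an I/O-bound call (it just sleeps). It assumes Python 3.7 or later for `asyncio.run`:

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical stand-in for an I/O-bound call such as a network request."""
    await asyncio.sleep(delay)  # Yields control while "waiting on I/O"
    return f'{name} done'


async def main():
    # Run both coroutines concurrently on a single thread;
    # gather() returns results in the order the awaitables were passed
    return await asyncio.gather(fetch('a', 0.02), fetch('b', 0.01))


# Inside Jupyter, where an event loop is already running,
# use `await main()` in a cell instead of asyncio.run().
results = asyncio.run(main())
print(results)
```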

## Type Annotations

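Python has some mechanisms for doing optional type annotations. The standard interpreter does not enforce them and they do not by themselves improve execution speed, but static checkers such as mypy can use them to catch errors, and there are some packages that can enforce type checking at run-time. It's not a bad idea to start using these but they're out of scope of this bootcamp.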

See https://docs.python.org/3/library/typing.html and http://mypy-lang.org/ for more.
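
A minimal sketch of what annotations look like (the `mean` function is invented for illustration):

```python
from typing import List, Optional


def mean(values: List[float]) -> Optional[float]:
    """Return the mean of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))
print(mean.__annotations__)  # Annotations are stored on the function object

# The interpreter does not enforce annotations: this call runs despite passing ints
print(mean([1, 2]))
```

A static checker such as mypy would read the same annotations without running the code.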


## Logging

See https://opensource.com/article/17/9/python-logging for details on Python logging.

I recommend looking at Daiquiri, which builds on top of the standard logging library and makes things easy:

https://julien.danjou.info/blog/python-logging-easy-with-daiquiri

In [None]:
import sys
!{sys.executable} -m pip install daiquiri

In [None]:
import logging
import daiquiri

daiquiri.setup(level=logging.INFO)

logger = daiquiri.getLogger("bootcamp")
logger.info("It works and logs to stderr by default with color!")

## Converting between Lists/Dictionaries and JSON

Non-tabular data can be stored in dictionaries, which may be nested and contain lists. This is similar to JSON data on the web and in JavaScript, and Python provides a `json` module for converting between these formats.

In [None]:
import json

my_albums = [
    {
        'title': 'Tales of the Inexpressible',
        'artist': 'Shpongle',
        'year': 2001,
        'tracks': [
            { 'title': 'Dorset Perception', 'time': '8:12' },
            { 'title': 'Star Shpongled Banner', 'time': '8:23' },
            { 'title': 'A New Way to Say Hooray!', 'time': '8:32' },
            { 'title': 'Room 2ॐ', 'time': '5:05' },
            { 'title': 'My Head Feels Like a Frisbee', 'time': '8:52' },
            { 'title': 'Shpongleyes', 'time': '8:56' },
            { 'title': 'Once Upon the Sea of Blissful Awareness', 'time': '7:30' },
            { 'title': 'Around the World in a Tea Daze', 'time': '11:21' },
            { 'title': 'Flute Fruit', 'time': '2:09' },
        ],
    }
]

j = json.dumps(my_albums)  # Convert to JSON string
print(type(j))
j

In [None]:
p = json.loads(j)  # Convert from JSON string to Python object
print(type(p))
p

## Dates and Times

It's worth briefly discussing Python's support for date and time operations as these are relevant to the exploratory data analysis we will be doing.

The standard library has two modules related to this area:

- `time`, which includes many low-level wrappers around platform C APIs, in particular routines that convert between epoch time (seconds since Jan 1, 1970) and the various time components found in a C `tm` struct. The most useful functions here are related to getting the system time zone and the `time.sleep()` function which pauses execution;
- `datetime` which provides a more high-level set of functions for dealing with dates, times, and time intervals; this is the module we will focus on here.

In addition to this, there are some good third-party libraries to be aware of, that, amongst other things, provide flexible date parsing operations from different formats. The most commonly used one, that extends the functionality of `datetime`, is `dateutil` (https://dateutil.readthedocs.io/en/stable/) but another that is growing in popularity is `arrow` (http://arrow.readthedocs.io/en/latest/) which provides a completely different approach with a very natural API. 

The `datetime` module (https://docs.python.org/3.6/library/datetime.html) defines several classes, including:

- `datetime`, combining a date and time
- `date`, a date only with no time component
- `time`, a time of day only, with no date component
- `timedelta`, an interval between two points in time
- `tzinfo`, a class that contains information about a time zone
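
A short sketch of how these classes fit together (the dates are arbitrary):

```python
import datetime as dt

start = dt.datetime(2018, 1, 31, 9, 30)  # A datetime: date plus time
day = start.date()                       # The date part only
clock = start.time()                     # The time-of-day part only

delta = dt.timedelta(days=1, hours=2)    # An interval
later = start + delta                    # Arithmetic rolls over month ends

print(later)
print(later - start == delta)
print(start.strftime('%Y-%m-%d %H:%M'))                # Format as a string
print(dt.datetime.strptime('2018-02-01', '%Y-%m-%d'))  # Parse from a string

# dt.timezone is a concrete tzinfo subclass; this datetime is timezone-aware
aware = dt.datetime(2018, 1, 1, tzinfo=dt.timezone.utc)
print(aware)
```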


## Cool Stuff

See https://github.com/tukkek/notablepython

Concise reference: https://github.com/mattharrison/Tiny-Python-3.6-Notebook

The Hitchhiker's Guide to Python documents many best practices: http://docs.python-guide.org/en/latest/

Easily add progress bars to loops (works in Jupyter and console): https://pypi.python.org/pypi/tqdm

For anyone who wants to get really serious about Python, Mark Lutz's and David Beazley's books are good, although some are dated; the best book on the language itself is IMO "Fluent Python" by Luciano Ramalho. There are also many excellent talks at http://pyvideo.org/.

Blog aggregator for Python: http://planetpython.org/


## Going Deeper

### The sys module

`sys.modules` is a dictionary of the currently imported modules:

In [None]:
import sys

sys.modules

`sys.path` is the path to look for imports:

In [None]:
sys.path

### Using Threads and Processes

See https://medium.com/@bfortuner/python-multithreading-vs-multiprocessing-73072ce5600b

### Extending Python with C code

See https://dbader.org/blog/python-ctypes-tutorial#.

### Functional Programming in Python

See https://docs.python.org/dev/howto/functional.html#iterators and http://coconut-lang.org/

### Making HTTP Requests and Parsing Responses

There are numerous ways to do this in Python, but the most commonly used libraries for these are `requests` (http://docs.python-requests.org/en/master/) and Beautiful Soup (https://www.crummy.com/software/BeautifulSoup/); look at those first before considering anything else as they are powerful, stable, mature and easy to use. Kenneth Reitz, who wrote `requests`, has recently implemented a new library on top of both `requests` and Beautiful Soup: https://github.com/kennethreitz/requests-html



## Exercise - write a function to count the number of characters, words and lines in a file

## Exercise - write a function to count the number of occurrences of each word in a file