# Welcome to Python!
*David Norrish, December 2019*

If this is your first time using Jupyter and/or Python, congratulations on making it this far.

### Itinerary for today
I have no idea how far we'll get, but here's the road map:

1. Packages and environments: Anaconda & pip
1. Python
1. Code quality: YAPF & Pylint
1. Data wrangling: Numpy & Pandas
1. Seer-py
1. Plotting

In [None]:
print("Let's go!")

## 1. Packages and environments: Anaconda & pip
### 1.1. System Python vs virtual environments
- Unix systems come pre-installed with a Python (I don't think Windows does?)
- This is the "system" Python - it should never be meddled with!
- Instead, create **virtual environments** to manage projects and packages
  - **node_modules** vs Python paths

### 1.2. Virtual environments
- There are at least half a dozen widely used virtual environment management tools in Python - it's a bit of a mess
- Possibly the most widely used in scientific computing (and the easiest to get started with) is [Anaconda](https://docs.anaconda.com/anaconda/install/)
  - environment manager (conda - like `NVM`)
  - Python & scientific computing packages
  - (package manager*)

Playing with environments:

    conda info --envs
    conda create --name testenv python=3.7
    conda activate testenv

### 1.3. Packages
- For installing packages, the go-to method is the much-beloved `pip` ("Pip installs packages")
  - Cf. `NPM`
- Can install anything hosted on the [PyPI (Python Package Index)](https://pypi.org/).

See current packages:

    pip freeze

Install some more:

    pip install numpy pandas
    pip install -r requirements.txt

### 1.4. Jupyter tips and tricks

- State is shared across all cells - enjoy/beware!
- Files are saved as long gross JSON files that change every time you execute a cell
  - BAD for version control: large files, merge conflict hell, all data retained in notebook
  - Best used for RnD, refactoring valuable code into modules
  - A recommended practice is to have two types of notebook:
    - Owned by one person (with their initials prepended to the filename) - no one else touches it
    - Report/demonstration notebooks - refined by multiple people, but careful with version control & file sizes

function                | shortcut
------------------------|---------------------
execute a cell          | ctrl + enter (stay on cell) / shift + enter (move to next cell)
create a new cell       | a (above) / b (below)
copy a cell             | c
paste a cell            | v
cut a cell              | x
undo copy/paste/cut     | z
delete a cell           | dd
hide left side-bar      | cmd + b
show function docstring | shift + tab (while keyboard cursor in function)

`Jupyter` is based on `IPython`, an improved Python REPL with tab complete, history & [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html). I recommend `IPython` for command-line Python fun times.

Here's an example of a useful magic command - this forces reloading of any imported packages every time before executing code. This is essential if you're editing code in another module that you're importing here.

In [None]:
%load_ext autoreload
%autoreload 2

## 2. Python

Some major language points:

- dynamically typed (and strongly typed: `'1' + 1` raises a `TypeError`) BUT
- object-oriented, procedural, (functional if you like - I mostly do)
- standard implementation is CPython
  - written in C
  - interpreted scripting language
  - so slow*, especially loops and especially recursion
  - Cf. pypy (pure Python, JIT compiled), Cython (statically typed and pre-compiled)

### 2.1 Getting started: Types

Note that in Python everything is an object. As such there are no true "primitive" data types.

- Numeric: `float`, `int` (yay, fewer floating-point annoyances!)
- String: `str`
- Boolean: `bool`
- Containers: `list`, `dict`, `set`, `tuple`, (`frozenset`)

NB `dict` and `set` are basically hash tables. `list` is a dynamic array (not a linked list), so indexing by position is *O(1)*.

#### 2.1.1. Numeric types

In [None]:
3 == 3.0

In [None]:
4 * 2 + 7

In [None]:
4 * 2.5

In [None]:
1e2 / 5

In [None]:
int(20.3)

In [None]:
# Modulo
7 % 3

In [None]:
import math

# Area of a circle: pi * r^2
radius = 5
math.pi * radius**2

In [None]:
math.cos(2 * math.pi)

#### 2.1.2. Strings

In [None]:
food = 'quarter pounder with cheese'
food.title()

In [None]:
print(food + " weighs ~0.25 pounds")
food

In [None]:
"pound" in food

In [None]:
# Test whether all whitespace
food.isspace()

In [None]:
food.replace(' ', ' ~!~ ')

In [None]:
food[:15]

In [None]:
food[-6:]

#### 2.1.3 Lists and tuples

- `Lists` are mutable (so easy to extend and modify), and are generally used for series of a given object type (e.g. a list of names or numbers)
- `Tuples` are immutable and generally used for fixed-length records of mixed data types.

In [None]:
words = food.split(' ')
print(words)
type(words)

In [None]:
len(words)

In [None]:
# Python uses 0-indexing
words[0]

In [None]:
words[-2]

In [None]:
some_words = words[1:3]
some_words

In [None]:
# Mutability
words[0] = 'half'
words.append('and')
words.append('pickles')
words.insert(2, 'lettuce')
words

In [None]:
# You can extend lists using list1.extend(list2), or simply:
words + ['MORE', 'WORDS']

You can sort a list by calling its `sort` method, or using the built-in `sorted` function.

Somewhat confusingly, the former is an in-place operation whereas the latter returns a new object.
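
A common slip that follows from this difference is assigning the result of `sort` to a variable. The short sketch below (using a made-up `letters` list) shows that `sorted` hands back a new list, while `sort` changes the list in place and returns `None`.

```python
letters = ['c', 'a', 'b']

# sorted() returns a new list and leaves the original untouched
new_list = sorted(letters)
print(new_list, letters)

# list.sort() reorders the list in place and returns None
result = letters.sort()
print(result, letters)
```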

In [None]:
words.sort()
words

In [None]:
other_words = 'A man, a plan, a canal: Panama'.split()
other_words

In [None]:
other_words_sorted = sorted(other_words)
print(other_words)
other_words_sorted

In [None]:
words.reverse()
words

In [None]:
popped = words.pop()
print(popped, '-', words)

In [None]:
print('pickles' in words)
print(popped in words)

Tuples! Very similar to lists but immutable

In [None]:
# A tuple of (taxa, number of legs, whether drinks blood)
cat = ('Mammalia', 4, False)
tick = ('Arachnida', 8, True)
starfish = ('Asteroidea', 5, True)

In [None]:
print("Guess the animal!\n")
for animal in [cat, tick, starfish]:
    print("I am class", animal[0], "have", animal[1], "legs, and", "do" if animal[2] else "don't", "drink blood")

In [None]:
# Our starfish lost an arm
try:
    starfish[1] = 4
except TypeError as err:
    print("Error:", err)

In [None]:
starlist = list(starfish)
starlist[1] = 4
starlist

#### 2.1.4. Dictionaries & Sets
- *O(1)* lookup time
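
A rough way to see the constant-time lookup in action is to time a membership test against a large `list` and a `set` holding the same items. This is only a sketch: the collection size and repeat count are arbitrary choices, and the exact timings will vary by machine.

```python
import timeit

# Arbitrary size, chosen for illustration only
n = 100_000
big_list = list(range(n))
big_set = set(big_list)

# Worst case for the list: the item we look for is at the very end
target = n - 1

list_time = timeit.timeit(lambda: target in big_list, number=100)
set_time = timeit.timeit(lambda: target in big_set, number=100)

print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```

The list has to be scanned item by item, whereas the set jumps straight to the right bucket via the item's hash.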

In [None]:
pantry = {
    'can of chickpeas': 3,
    'cheese': 1,
    'milk carton': 2
}

pantry

In [None]:
pantry['apples'] = 5
pantry

In [None]:
pantry['milk carton'] -= 1
pantry

In [None]:
# A set
fruit = {'apple', 'banana', 'strawberry'}
fruit

In [None]:
# Alternative construction
fruit == set(['banana', 'apple', 'strawberry'])

In [None]:
popped = fruit.pop()
print(popped, "-", fruit)

In [None]:
{'strawberry', 'apple', 'pineapple'} - fruit

In [None]:
try:
    pantry[fruit] = 1
except TypeError as error:
    print("Error!:", error)

Dictionary keys must be hashable (which in practice usually means immutable), so cannot be anything like a list or set.

In [None]:
pantry[frozenset(fruit)] = 2
pantry

### 2.2. Functions
- Functions are defined with the `def` keyword, a function name, parentheses containing any parameters, a colon, and an indented block
- Functions exit when they encounter a `return` statement or when the indented block ends

In [None]:
def check_is_numeric(mystery_variable):
    """Check if a variable is numeric"""
    if isinstance(mystery_variable, int) or isinstance(mystery_variable, float):
        return True
    return False

In [None]:
result = check_is_numeric(42)
result

In [None]:
# If you don't assign the result to a variable, the notebook just displays it
check_is_numeric('squirrel')

A simpler version of the same function

In [None]:
def check_is_numeric2(mystery_variable):
    """Check if a variable is numeric"""
    return isinstance(mystery_variable, (int, float))

If no return statement is encountered, the function returns `None` when it finishes.

In [None]:
def say_my_name(name, what_to_say = "you look good today"):
    """Say something to me (demonstrate default argument)"""
    print(f"Hi there {name}, {what_to_say}.")

In [None]:
message = say_my_name('David')
print("The message:", message)
type(message)

If you want to accept a variable number of arguments, JS-style -

In [None]:
def heteroflexible_function(*args, **kwargs):
    """A function that takes named or unnamed arguments"""
    for arg in args:
        print("I'm an arg alright:", arg)
    for key, value in kwargs.items():
        print(key, "is", value)

In [None]:
heteroflexible_function(6, False, dingbat='Gustav', **{'science': 'cool', 'truth': 'unattainable'})

#### 2.2.1. A note on typing
- Python is *dynamically typed*, so you can legally pass anything into functions, and it may or may not explode at runtime
- Consequence: quick and fun to write code, harder to maintain (and harder to read other people's code)
- Solution: When appropriate, please be generous and provide **type hints**!

In [None]:
from typing import Iterable

def get_nth_word(list_of_words: Iterable[str], index: int = 0) -> str:
    """Sort a list of words alphabetically then get the Nth word"""
    sorted_words = sorted(list_of_words)  # Creates a new list object
    try:
        return sorted_words[index]
    except IndexError:
        # Return the last word if list not long enough
        return sorted_words[-1]
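
It's worth knowing that type hints are documentation, not enforcement: Python itself doesn't check them at runtime. The sketch below uses a hypothetical `double` function to show that a "wrong" argument type still runs; catching that kind of mistake is a job for a static checker such as `Mypy`.

```python
def double(number: int) -> int:
    """Return twice the input (hypothetical example function)"""
    return number * 2

# Hints are stored as metadata on the function...
print(double.__annotations__)

# ...but nothing stops a caller from passing a different type
print(double(21))
print(double('ha'))
```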

#### 2.2.2 Lambda functions
These are probably quite familiar to Javascript(ers?)

In [None]:
# Revisit an earlier variable
words.sort()
words

In [None]:
# What if we want to sort by the last letter not the first?
words.sort(key=lambda x: x[-1])
words

### 2.3. Control flow
- Indentation (4 spaces) used instead of curly braces

In [None]:
temperature = 22
windy = False

if temperature < 18:
    clothes = 'jacket'
elif 18 <= temperature < 25:
    if windy:
        clothes = 'windbreaker'
    else:
        clothes = 'shirt'
else:
    clothes = 'singlet'

print("Aaaand we're wearing...", clothes)

In [None]:
for i, tool in enumerate(['spanner', 'wrench', 'spirit level']):
    print(tool * (i + 1))

In [None]:
blocks = 3
stacked = 0

while blocks >= 0:
    word = 'blocks'
    if stacked == 1:
        word = 'block'
    print(f"{stacked} stacked {word} and {blocks} to go")
    blocks -= 1
    stacked += 1

### 2.4. Mutability
(Does JavaScript not have mutability? Something about Redux...)

To recap:
- The built-in immutable types are: `int`, `float`, `bool`, `str`, `tuple`, `frozenset`
- The built-in mutable types are: `list`, `set`, `dict`

User-defined classes are mutable (unless you do some fancy work to prevent it).
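
One example of that "fancy work", assuming Python 3.7+ where the standard library `dataclasses` module is available, is a frozen dataclass. The `Point` class here is a hypothetical illustration, not something used elsewhere in this notebook.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    """A hypothetical immutable 2D point"""
    x: float
    y: float

origin = Point(0.0, 0.0)

# Assigning to an attribute of a frozen dataclass raises an error
try:
    origin.x = 1.0
except FrozenInstanceError as err:
    print("Error:", err)
```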

In [None]:
x = 6
y = x

In [None]:
print(f"x: {x}, y: {y}")
x == y

In [None]:
x = 7

In [None]:
print(f"x: {x}, y: {y}")
x == y

In [None]:
biology = {
    'brains': 'squishy',
    'kidneys': 'also squishy'
}

organs = biology

In [None]:
print(biology)
print(organs)
organs == biology

In [None]:
biology['bones'] = 'hard'

In [None]:
print(biology)
print(organs)
organs == biology

In [None]:
soft_organs = organs.copy()
del soft_organs['bones']
soft_organs

In [None]:
organs

Handy way to think about it:
- Mutable variables point to a living object
- Immutable variables hold a set value

This brings us to something to be wary of.

In [None]:
def danger_function(new_int: int, a_list = [2, 3, 4]):
    """
    It's an anti-pattern in Python to provide a default argument that's mutable
    """
    a_list.append(new_int)
    return a_list

In [None]:
test_list = danger_function(5)
test_list

In [None]:
test_list = danger_function(5)
test_list
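
The second call returns a longer list because the default `[2, 3, 4]` is created once, when the function is defined, and the same list object is then reused by every call. A common way around this, sketched here with a hypothetical `safer_function`, is to default to `None` and build a fresh list inside the function.

```python
def safer_function(new_int: int, a_list=None):
    """Use None as a sentinel and build a fresh list on every call"""
    if a_list is None:
        a_list = [2, 3, 4]
    a_list.append(new_int)
    return a_list

# Both calls now start from a brand new [2, 3, 4]
print(safer_function(5))
print(safer_function(5))
```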

### 2.5. Classes & objects

Classes are a way to couple attributes (variables) and methods (functions)

In [None]:
cat = ('Mammalia', 4, False)
tick = ('Arachnida', 8, True)
starfish = ('Asteroidea', 5, True)

In [None]:
class Animal:
    """An object to represent important aspects of animals"""

    def __init__(self, animal: str, num_legs: int, has_jaws: bool):
        """
        This is the constructor which gets called when you instantiate an
        Animal object
        """
        self.animal = animal
        self.num_legs = num_legs
        self.has_jaws = has_jaws
        self.lives = self._calculate_lives(animal)

    def _calculate_lives(self, animal):
        """A private method called by the constructor only"""
        if animal.lower() == 'cat':
            return 9
        return 1

    def feed(self):
        """Have a nom"""
        if self.has_jaws:
            return "Chewing away"
        # Implicit `else`
        return "Secreting digestive enzymes..."

    def advance(self, speed: str = 'slowly'):
        """Move forwards"""
        if speed == 'slowly':
            if self.num_legs % 2 == 0:
                return "Walking forward"
            return "Slithering along"
        elif speed == 'quickly':
            if self.num_legs % 2 == 0:
                return "Running!"
            return "Slithering disturbingly quickly"
        raise ValueError(f"`speed` must be 'quickly' or 'slowly', got: {speed!r}")

    def __str__(self):
        """Define how the object prints"""
        word = 'life' if self.lives == 1 else 'lives'
        return f"{self.animal.title()} with {self.num_legs} legs, {self.lives} {word} remaining"

In [None]:
print(Animal)

In [None]:
mr_meowmeow = Animal('cat', 4, True)
leggalot = Animal('starfish', 5, False)

In [None]:
print(mr_meowmeow)
print(leggalot)

In [None]:
mr_meowmeow.advance('slowly')

In [None]:
# Oh no!
mr_meowmeow.lives -= 1
print(mr_meowmeow)

In [None]:
print(leggalot.feed())

In [None]:
class Mammal(Animal):
    """A subclass of Animal; inherits everything"""
    def __init__(self, animal: str, herbivore: bool):
        """
        Because this is a subclass, we must use `super` to access the
        __init__ (constructor) of the parent class.
        """
        super().__init__(animal=animal, num_legs=4, has_jaws=True)
        self.herbivore = herbivore

    def feed(self):
        """Override this method from the parent class"""
        if self.herbivore:
            return "Chew some cud"
        return "Better go kill something"
        
    def feed_young(self):
        """A special mammal power"""
        return "Milk time!"

In [None]:
maru = Mammal('cat', False)
print(maru)

In [None]:
maru.feed()

## 3. Code quality: YAPF & Pylint
A brief aside to mention a couple of useful tools for making sure your Python is up to scratch and looking pretty.

The holy book of Python style is called [PEP8](https://www.python.org/dev/peps/pep-0008/). Intermediate Pythoners should definitely check it out.

- By default VS Code asks to install `Pylint`, which is a pretty good option for basic PEP8 compliance.
- Another great addition is `YAPF` to automatically format all code consistently. We'll probably move to having it as a pre-commit hook.
- NB there are a bunch of other linters and formatters (`Black`, `Mypy`, `Autopep8` etc).

Here we gooo!

    pip install pylint yapf
    
    # Lint file
    pylint {filename}

    # Auto-format [see diff/change in-place]
    yapf {filename} -d/-i

## 4. Data wrangling: Numpy & Pandas
For anything involving tabular data, matrices, non-trivial mathematical operations etc, life is infinitely better with `Numpy` and `Pandas`.
  - Almost always imported as `np` and `pd` by convention

In [None]:
import numpy as np
import pandas as pd

# A useful option to make Pandas show more columns than the default
pd.set_option('display.max_columns', 200)

### 4.1. Speed
The standard implementation of Python, while written in C, is undeniably pretty slow, due to:
- High-level
- Dynamic typing (lots more lookups to do)
- The infamous GIL ("global interpreter lock" - a mutex that allows only one thread to hold the control of the Python interpreter)
  - (Python uses reference counting for memory management, so the GIL protects against race conditions where two threads change an object's reference count simultaneously. Having only one lock also means no deadlocks)

HOWEVER, there are a bunch of Python libraries that run in optimised C code which can achieve excellent speeds.

The take-home is that you should always avoid using loops if you can, and instead use "vectorized methods". Let's see how long it takes to find the mean of a list using pure Python versus two important packages: `Numpy` and `Pandas`.

In [None]:
def mean(list_of_nums):
    """Find the mean of a list of numbers"""
    total = 0
    for num in list_of_nums:
        total += num
    return total / len(list_of_nums)

In [None]:
# Use Numpy to generate a bunch of random ints
num_array = np.random.randint(low=0, high=100, size=1000000)
num_list = list(num_array)  # Convert to a plain Python list for the pure-Python comparison

In [None]:
type(num_array)

This is a new data type*. It has an enormous number of built-in methods.

\**`np.ndarray` objects store values of a single type in one contiguous block of memory, whereas Python `list` objects are dynamic arrays of references to separate Python objects.*

In [None]:
%%time
mean(num_list)

In [None]:
%%time
num_array.mean()

In [None]:
# Can also call Numpy's mean() method on any iterable
%time np.mean(num_array)
%time np.mean(num_list)  # Slower because must load into an ndarray first

`Pandas` is sort of like a wrapper around `Numpy` for tabular data. It stores its data in `Numpy` format and offers many of the same functionalities, but is more optimised for Excel-type operations like creating pivot tables, and makes visualising data a bit easier.

In [None]:
# Load the array into a Pandas `Series` object
series = pd.Series(num_array)
series.iloc[:10]

In [None]:
%%time
series.mean()

### 4.2. Working with files
To deal with file paths, the standard library `pathlib` is the way to go.
- NB this is a newish library (added in Python 3.4), so a lot of old code samples use the `os` library
- (`os` does a lot more than `pathlib`, so it's still good for other things)
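
As a quick taste of why `pathlib` is pleasant to use, here is a small sketch of building and picking apart a path. The path itself is hypothetical, and `PurePosixPath` is used only so the output looks the same on every operating system; in real code you'd normally use `Path`.

```python
from pathlib import PurePosixPath

# Hypothetical path, used purely for illustration
data_file = PurePosixPath('projects') / 'python-intro' / 'income.csv'

print(data_file)         # projects/python-intro/income.csv
print(data_file.name)    # income.csv
print(data_file.stem)    # income
print(data_file.suffix)  # .csv
print(data_file.parent)  # projects/python-intro
```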

In [None]:
from pathlib import Path

# Look up the current working directory
cwd = Path.cwd()
cwd

Let's get a list of all the files in this directory.

In [None]:
local_files = cwd.glob("*")  # '*' matches all files

In [None]:
local_files

*Generators* are a little advanced to explain properly right now.
- They are like a list that hasn't been computed yet
- You can only ever look at the next item in a generator, and it is then discarded when you move onto the following item
- You can always cast a generator to a list for simplicity
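
To make those points concrete, here's a minimal sketch using a hypothetical generator function, `count_up_to`. Note how items disappear once they have been consumed.

```python
def count_up_to(limit):
    """A hypothetical generator function: yields 1, 2, ..., limit"""
    number = 1
    while number <= limit:
        yield number
        number += 1

counter = count_up_to(3)
print(next(counter))  # 1
print(list(counter))  # [2, 3] - only the remaining items
print(list(counter))  # [] - the generator is now exhausted
```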

In [None]:
local_files = list(local_files)
local_files

In [None]:
# Use Pandas to read in a sample dataset about adult income
# https://archive.ics.uci.edu/ml/datasets/Adult
income_path = cwd / 'income.csv'
df = pd.read_csv(income_path)
df.shape

In [None]:
df.head()

### 4.3. Wrangling
`Pandas` is really a whole micro-language of its own, so I'll just demonstrate a few common operations.

In [None]:
df.dtypes[:6]

Pandas doesn't really have a `str` data type, so any columns with strings will be generalised to `object` type (everything is an object in Python). Be careful, as there could be other data types like numbers in these object columns.

In [None]:
# Get statistical summary of all numeric columns
df.describe()

A DataFrame is basically a column index, a row index and an *[n x m]* array of Numpy data.

In [None]:
# columns
df.columns

In [None]:
# In this case our index is just an incrementing integer, not labels
df.index[:10]

In [None]:
type(df.values)

In [None]:
df.values

In [None]:
# Index first 5 rows and columns 3-5 (remember 0-indexing)
df.iloc[:5, 3:6]

In [None]:
# You can also index by label, which indexes inclusively
# In this case the row labels are integers, so `:4` returns rows 0 to 4 (5 rows)
df.loc[:4, 'education':'marital-status']

In [None]:
# Grab a selection of columns - shortcut for df.loc[:, [cols...]]
df[['relationship', 'workclass']].head()

In [None]:
# Grab a selection of rows
df[1000:1005]

In [None]:
# Can pull out a single column to a pd.Series object
salaries = df['salary']
salaries

In [None]:
salaries.value_counts()

In [None]:
salaries.unique()

In [None]:
rich_df = df[df['salary'] == ' >50K']
rich_df.shape

In [None]:
df['occupation'].value_counts()

In [None]:
df['workclass'].value_counts()

In [None]:
# Filter with multiple criteria
# Find all craft-repair and sales workers employed by government (yes this is contrived)
df[(df['occupation'].isin({' Craft-repair', ' Sales'})) & 
   (df['workclass'].str.contains('gov'))]

Other common tasks are counting or summarising values, often grouped by another category.

In [None]:
df.groupby('sex')['hours-per-week'].mean()

In [None]:
df.groupby('marital-status').agg({
    'age': 'median',
    'capital-gain': 'max',
    'capital-loss': 'std',
    'native-country': 'count',
    # Count the high earners: sum the boolean mask (`.count()` would count every row)
    'salary': lambda series: (series == ' >50K').sum()
})

Pivot tables are handy too

In [None]:
df.pivot_table(index='marital-status', columns='race',
               values='age', aggfunc='median')

In [None]:
df.pivot_table(index='education', columns='sex',
               values='hours-per-week', aggfunc='mean')

### 4.4. Plotting
Finally, a quick demo of using the most standard plotting package: `matplotlib`

In [None]:
# Read in some sample ECG data
ecg_path = cwd / 'ecg.csv'
ecg_df = pd.read_csv(ecg_path)
print(ecg_df.shape)
ecg_df.head()

In [None]:
ecg_df.mean()

We'll plot the three lines on a single axes for simplicity, so to separate the signals we'll add some constants.

In [None]:
ecg_df['ECG1'] += 0.002
ecg_df['ECG2'] += 0.001
ecg_df.head()

Matplotlib involves creating a figure, then one or more axes for that figure, then adding series to the axes.

In [None]:
import matplotlib.pyplot as plt
# plt.style.use('seaborn-whitegrid')

In [None]:
fig = plt.figure()
ax = plt.axes()

In [None]:
for col in ecg_df.columns:
    plt.plot(ecg_df.index, ecg_df[col])

In [None]:
from signal_filter import fir_filter

In [None]:
filter_df = ecg_df.copy()

In [None]:
test_df = filter_df.apply(fir_filter)

In [None]:
test_df.head()

In [None]:
plt.subplot(311)
plt.plot(test_df.index, test_df['ECG1'])
plt.subplot(312)
plt.plot(test_df.index, test_df['ECG2'])
plt.subplot(313)
plt.plot(test_df.index, test_df['ECG3']);