# ![Python logo](data/python.ico) Hello Python!

## Python: history and now

- Created by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum), who started work on it in late 1989 while at CWI, Amsterdam; the first release followed in 1991. He later worked at Google and Dropbox, and joined Microsoft in 2020.
- Named after [**Monty Python**](https://en.wikipedia.org/wiki/Monty_Python%27s_Flying_Circus); Guido is a big fan.
- Development is guided by the Python Steering Council:
  - an entirely community-led effort,
  - supported by the Python Software Foundation.

## The Python ecosystem

- Ease of use among its primary considerations
- Highly extendable, including through extensions written in lower level languages like C/C++.
- The reference interpreter has certain performance limitations (e.g. the *Global Interpreter Lock* or GIL), but in practice these can often be worked around with extensions, e.g. `numpy`, `xarray`, `pandas`, `numba`, `pyarrow`, etc.
- Writing *native* extensions is easier with `Cython` (Python + `type` information + ability to bypass the GIL).

## Understanding variables

Variables hold a *reference* to a value. Values can be:

- *objects* of simple *types* (e.g. numbers, strings, booleans),
- user defined types,
- *sequences* or *containers* (which hold references to other values), and others (esp. in Python).

### Literals

In [None]:
# values
1, 3.14, 0b10, 0x1e, "string", b"bytes"

In [None]:
a = 2.14  # variable assignment: hold a reference to a value
a + 1  # the variable refers to the value later

In [None]:
a # what is a?

To refer to an `object` later, you must store a reference to it in a variable.
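
A minimal sketch of what "holding a reference" means: assigning one variable to another copies the reference, not the value. The names `first` and `second` are illustrative only, and the example uses a list (introduced further below) because it can be modified in place.

```python
# Two names can refer to the same object
first = [1, 2, 3]
second = first           # copies the reference, not the list
second.append(4)         # modify the object through one name ...

same_object = first is second  # identity check
print(first, same_object)      # ... and the change is visible through the other

# Rebinding a name makes it refer to a different object
second = [0]
print(first, second)
```

Rebinding `second` at the end leaves `first` untouched, since only the reference stored in `second` changes.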

### Numbers

In [None]:
x, y = 3, 3.0
x, type(x), y, type(y)

In [None]:
# alternate notations for readability
p, q = 10_000, 1.1e4
p, type(p), q, type(q)

#### Numeric operations

In [None]:
# increment, similar for most other operators: -=, *=, /=
a = 40
a = a + 1
a += 1
a

In [None]:
# division
1 / 2, 4 / 3, 4 / 2

In [None]:
# floor division
1 // 2, 4 // 3, 4 // 2

In [None]:
# modulo/remainder, exponent
4 % 3, 3 ** 3

### Booleans

In [None]:
eq = p == 10_000
eq, type(eq)

In [None]:
# numbers and strings: 0, empty string, None -> False, everything else -> True
bool(1), bool(0), bool(""), bool("foo"), bool(None)

In [None]:
# containers: empty -> False, has element -> True
bool(list()), bool([1]), bool(set()), bool({1}), bool(dict()), bool({1: 4})

In [None]:
bool(None), bool([]), bool([1])

In [None]:
bool([None])  # given the above, what does this evaluate to?

#### Boolean operations

In [None]:
-2 > 3 or -1 < 0, 2 > 3 and -1 > 3, not True

##### Truth table

<table>
<tr><th>Logical OR</th><th>Logical AND</th></tr>
<tr><td>

| `or` | T | F |
|:----:|---|---|
| T    | T | T |
| F    | T | F |

</td><td>

| `and` | T | F |
|:-----:|---|---|
| T     | T | F |
| F     | F | F |

</td></tr> </table>
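
The tables above can be reproduced by evaluating every combination of operands. This is only a sketch; the names `lhs`, `rhs` and `rows` are made up for the example.

```python
# Evaluate `or` and `and` for every combination of two boolean operands
rows = []
for lhs in (True, False):
    for rhs in (True, False):
        rows.append((lhs, rhs, lhs or rhs, lhs and rhs))
        print(lhs, rhs, lhs or rhs, lhs and rhs)
```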

### Strings

In [None]:
txt = "foo bar baz"  # 'also valid'
txt, type(txt)

In [None]:
# escaping, and nested quotes
'Don\'t be an ass', "Don't be an ass"

In [None]:
multi = "First\nSecond"
multi

In [None]:
print(multi)

#### Triple quoted strings

In [None]:
prose = """This is a pre-formatted string.

You may have paragraphs, and lists:
- an item
- no need for "escaping"

    Or whatever you like

"""
print(prose)
prose

#### String operations

In [None]:
# adjacent string literals are concatenated
"foo" "bar"

In [None]:
"foo"\
"bar"

In [None]:
# concatenation with +
a = "foo"
b = "bar"
a + b

In [None]:
# multiply
"--8<-" * 5

In [None]:
# in
"foo" in "foo bar baz", "foo" not in "foo bar baz"

### User defined types

In [None]:
class MyClass:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f'MyClass("{self.name}")'


instance = MyClass("foo")

instance, type(instance)

### Containers

In [None]:
l = [1, 2.0, 3, 4.0, "foo", True]
t1 = (1, 3.14, True)

type(l), type(t1)

In [None]:
# concatenation: returns a new list, `l` itself is unchanged
l + [1, 2]

In [None]:
# multiply
t1 * 3

### Indexing

Syntax: `container[index]`

In [None]:
l[0] == t1[0] == 1

In [None]:
l[-1] == True

### Slicing

Syntax: `container[start:stop:step]` (each part is optional; `stop` is exclusive, and omitted parts default to the beginning, the end, and a step of 1)

In [None]:
l, l[:2], l[1:3]

In [None]:
l[0:4:2], l[::3]

In [None]:
txt = "foo bar baz"
txt[4:7]
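
A short sketch of how the optional parts of a slice behave. The variable `seq` is an illustrative stand-in with the same value as `txt` above.

```python
seq = "foo bar baz"  # illustrative value, same as `txt` above

whole = seq[:]          # a copy of the whole sequence
head = seq[:3]          # start defaults to the beginning
tail = seq[8:]          # stop defaults to the end
every_other = seq[::2]  # step of 2
backwards = seq[::-1]   # a negative step walks from the end
last_three = seq[-3:]   # negative indices count from the end
print(whole, head, tail, every_other, backwards, last_three)
```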

### Mutable and immutable types

Certain types cannot be "edited" once created, e.g. `str`, `tuple`.

In [None]:
l  # mutable

In [None]:
l[0] = 1000
l

In [None]:
t1, txt  # immutable

In [None]:
t1[0] = 1000

In [None]:
txt[5] = "e"

### Operators & type conversion

- Implicit type conversion happens on supported operations.
- The result has the widest type involved: one that can represent both operands **and** the result (e.g. `int` + `float` gives `float`).

In [None]:
1 +  3.14, type(1 + 3.14), 1 + True, type(1 + True), 2.16 * False

In [None]:
"1" + 1

In [None]:
1 + "1"  # same for: "foo" - "foo"

## Flow control

- conditionals
- iteration
- callable / functions: reusable routines

### Conditionals

In [None]:
# if .. else ..
txt = "foo bar baz"
if "foo" in txt:
    pass
elif "bar" in txt:  # optional
    pass
else:  # optional
    pass

In [None]:
# ternary conditional
par = None
5 if par is None else par

### Iteration

#### `for` loops

In [None]:
# iterate over items
t1 = (1, 3.14, True)
for i in t1:
    print(i)

In [None]:
# iterate by index
for i in range(len(t1)):  # len(..) returns the length of a sequence
    print(t1[i])

In [None]:
# iterate over a range: start, stop (exclusive), step
for i in range(1, 10, 2):
    print(i)

#### `while` loops

*What does the following do?*

In [None]:
# flexible iteration
txt = "foo bar baz"
i = 0
res = []
while i < len(txt):
    res += [txt[-1 - i]]
    i += 1
"".join(res)  # calling: str.join(Iterable[str]) -> str

**Ans:** *Reverse the string `txt`*

#### `break` and `continue` statements

*What does the following do?*

In [None]:
txt = "foo bar baz"
i = 0
while i < len(txt):
    if txt[i] == "b":
        break  # exits the enclosing loop immediately
    i += 1
txt[:i]

**Ans:** *Finds the first b / Shows the string up to the first b*

*What does the following do?*

In [None]:
txt = "foo bar baz"
i = 0
res = ""
while i < len(txt):
    char = txt[i]
    i += 1
    if char == "b":
        continue  # skip this iteration
    res += char
res

**Ans:** *Removes every b from the string*

### Callables / Functions

- Wraps a code block so that it can be reused; arguments act as parameters.
- Usually ends in a `return` statement, which hands a value and control back to the caller.

In [None]:
def add(i, j):
    return i + j

def sub(i, j):
    return i - j

def op(i, j, operator=add):  # default operator: addition
    """Applies a binary operator to two numbers"""  # <- docstring
    return operator(i, j)

In [None]:
op(3, 4)  # addition

In [None]:
op(3, 4, sub)  # subtraction

In [None]:
# multiply w/ anonymous function
op(3, 4, lambda i, j: i * j)

In the absence of a `return` statement, a function returns `None`.

In [None]:
def myfunc():
    pass

In [None]:
res = myfunc()
print(res)

#### Anonymous / `lambda` functions

- cannot contain statements
- the body is a single expression, whose value is returned
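
As a rough illustration, a `lambda` and the equivalent named function are shown side by side below. The names `square`, `square_fn` and `words` are made up for the example.

```python
# A lambda is a single expression; its value is returned implicitly
square = lambda x: x * x

# The equivalent named function needs an explicit `return`
def square_fn(x):
    return x * x

# A common use: a short function passed as an argument to another callable
words = ["banana", "fig", "cherry"]
by_length = sorted(words, key=lambda w: len(w))
print(square(4), square_fn(4), by_length)
```

Binding a `lambda` to a name, as with `square`, is done here only for comparison; a `def` is usually the clearer choice when the function needs a name.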

#### Positional and keyword arguments

In [None]:
def myfunc(pos1, pos2, kw1="foo", kw2="bar"):
    print(pos1, pos2, kw1, kw2)

In [None]:
# what's the print order?
myfunc(1, 2, 3, 4)

In [None]:
myfunc(1, 2, kw2="random", kw1="order")

In [None]:
myfunc(5, kw1="bla", pos2=99, kw2="dibla")

In [None]:
# explain the errors
myfunc()

In [None]:
myfunc(42)

In [None]:
myfunc(kw1=42)

## Variable scope

Code blocks such as `if` and `for` bodies do not create a new scope; at any indentation level they are in the *same scope*.

In [None]:
num = 42

for i in range(3):
    if i == 0:
        assert num == 42
        num = 5
    assert num == 5

assert num == 5

Functions have access to variables in the enclosing scope; the names are looked up *during execution*, not when the function is defined.

In [None]:
num = 42

def testfn1():
    assert num == 42, f"{num} is not 42"

In [None]:
testfn1()

In [None]:
num = 5
testfn1()

In [None]:
del num

In [None]:
testfn1()

When two variables have the same name, the variable in the innermost scope is said to *shadow* the outer variable.

In [None]:
num = 42

def testfn2():
    num = 5
    assert num == 5, f"{num} is not 5"

In [None]:
testfn2()

In [None]:
assert num == 5, f"{num} is not 5"

*Review errors at this point*

<center><strong>&#9646;&#9646;</strong></center>

### How to help yourself

- Use `help(object|"syntax token")`
- docstrings
- function signatures
- Library reference, tutorials, HOWTOs, etc @ [Python.org](https://docs.python.org/3.7/)
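
The docstring and signature that `help` displays can also be inspected directly. The function `scale` below is a made-up example, not part of the tutorial helpers.

```python
import inspect

def scale(value, factor=2):
    """Multiply value by factor"""
    return value * factor

doc = scale.__doc__                  # the docstring
sig = str(inspect.signature(scale))  # the function signature as text
print(doc)
print(sig)
# In an interactive session, help(scale) shows both together
```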

### Problem 1

- Read a sequence, and count the number of bases of each type.
- Print out the counts as a table, or write it to a CSV file.

In [None]:
from tutorial.io import read_file, fasta_seqs

# `next` will work only twice, as the file has only two sequences
fasta = read_file("data/example.fa")
seq_itr = fasta_seqs(fasta)
_, seq1 = next(seq_itr)  # work with seq1, it's a string

In [None]:
seq1[:12]

### Problem 2

- Read a sequence, and identify the sequence of *Codons*.
- Print out the position and codon as a table, or write it to a CSV file.

In [None]:
from tutorial.seq import CODON_MAP

_, seq2 = next(seq_itr)
# CODON_MAP["START"], list of starting seq
# CODON_MAP["STOP"], list of all stopping seqs
# CODON_MAP["REST"], list of all other seqs

In [None]:
CODON_MAP["STOP"]

<center><strong>&#9654;</strong></center>

# ![Python logo](data/python.ico) More Python concepts

## Format strings

In [None]:
fmt_float = "{0} {0:.3f} {0:+.3f} {0: .3f} {0:.3e} {0:.3g}"

In [None]:
fmt_float.format(79 / 3)

In [None]:
fmt_float.format(-79 / 3)

In [None]:
fmt = "{} {} {}"

In [None]:
fmt.format("foo", "bar", "baz")

In [None]:
fmt = "{a} {b} {c}"

In [None]:
fmt.format(a="foo", c="bar", b="baz")

In [None]:
a, b, c = "foo bar baz".split()  # calling: str.split() -> List[str]
f"{a} {b} {c} {c.upper()}"

## Function calls

### Argument unpacking

In [None]:
def myfunc(pos1, pos2, kw1="foo", kw2="bar"):
    print(pos1, pos2, kw1, kw2)

In [None]:
x, y, z = (1, 2, 3)
x, y, z

In [None]:
t1 = (1, 2, "foo", "bar")
myfunc(*t1)  # also works for lists

In [None]:
d1 = {"kw1": "foo", "kw2": "bar"}
myfunc(*t1[:2], **d1)

### Arbitrary arguments

- positional arguments, followed by keyword arguments

In [None]:
def myflexiblefn(pos1, *args, kw1="foo", **kwargs):
    print(pos1, kw1, args, kwargs)

In [None]:
myflexiblefn(1, 2, 3, kw1="foo", kw2="bar", kw3="baz")

In [None]:
# argument order: positional, *args, keyword, **kwargs
# note: myflexiblefn(**d1, *t1[:2]) is a SyntaxError, `*` unpacking must come before `**`
myflexiblefn(*t1[:2], **d1)

**Note:** default arguments should never be mutable objects, e.g. empty containers

In [None]:
def fn1(a, b=42):  # a-okay!
    pass

def fn1(a, b=[]):  # not okay! the list is created once and shared by all calls
    pass

def fn1(a, b=None):
    # use this idiom instead
    if b is None:
        b = []
    pass
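
To see why a mutable default is a problem, the sketch below calls two hypothetical functions twice each; `append_shared` and `append_fresh` are names invented for this illustration.

```python
def append_shared(item, bucket=[]):  # the default list is created only once
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):  # a new list is created on every call
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first_shared = list(append_shared(1))   # copy, to record the state right now
second_shared = list(append_shared(2))  # the item from the first call is still there
first_fresh = append_fresh(1)
second_fresh = append_fresh(2)
print(first_shared, second_shared, first_fresh, second_fresh)
```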

## Containers & Iteration

### List comprehension

In [None]:
[i for i in range(5)]

#### with conditionals

In [None]:
[i for i in range(10) if i % 2]

### Sets

- elements are unique

In [None]:
set("foo bar baz")

#### Set comprehension

In [None]:
{c for c in "foo bar baz"}

#### Operations with `set`s

In [None]:
a_, b_, c_ = set("foo"), set("bar"), set("baz")

In [None]:
b_.intersection(c_), a_.isdisjoint(b_)

In [None]:
b_.difference(c_), c_.difference(b_)

### Dictionaries

In [None]:
week = {
    "mon": 9,
    "tues": 1,
    "wed": 2,
    "thurs": 3,
    "fri": 4,
    "sat": 5,
    "sun": 6,
}
week

In [None]:
week["mon"] = 0
week

#### Dictionary comprehension

In [None]:
{k: (v, v < 5) for k, v in week.items()}  # add weekday boolean flag

### Merging tuples, lists, and dictionaries

In [None]:
t1 = (1, 2)
t2 = ("foo", "bar", "baz")
(42, *t1, *t2)  # same for lists

In [None]:
d0 = {"a": 1, "b": 2}
d1 = {"c": 3, "d": 4}
{**d0, **d1}

In [None]:
# later keys have precedence
{**d0, **d1, "a": 0}

## I/O

- text file formats (read using parsers):
  - delimited files: `CSV`, `TSV`
  - files with nested data: `JSON`, `XML`, `YAML`
  - genomics: `FASTA` (DNA sequence), `VCF` (variant calls)
- binary formats (dedicated libraries):
  - advanced compression support (smaller files)
  - faster (optimised for a common task)
  - formats: `HDF5` (hierarchical), `Parquet` (columnar), `Avro` (row)

1. open file: read, write, text, binary
2. do operations: read, write, seek
3. close file (otherwise it may get corrupted)

Standard library modules: `csv`, `json`, `xml`, `zipfile`, `bz2` (`YAML` needs a third-party package)

In [None]:
txt = open("/tmp/afile.txt", mode="w")
txt.write("foo")
txt.write("bar")
txt.write("baz\n")
txt.write(str(1) + "\n")
txt.close()

In [None]:
! cat /tmp/afile.txt

In [None]:
txt = open("/tmp/afile.txt", mode="r")  # open
for line in txt.readlines():  # work with it
    print(line)

In [None]:
txt.close()  # close

In [None]:
# why the extra lines?

In [None]:
# use context managers
with open("/tmp/afile.txt", mode="w") as txt:
    txt.write("foo")
    txt.write("bar")
    txt.write("baz\n")
    txt.write(str(42) + "\n")

In [None]:
! cat /tmp/afile.txt
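
The same open / operate / close pattern applies to the parsers in the standard library. Below is a small sketch with the `csv` module; the rows are made-up data, and a temporary directory is used so that no existing file is touched.

```python
import csv
import tempfile
from pathlib import Path

rows = [("base", "count"), ("A", 3), ("C", 1)]  # made-up data

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "counts.csv"

    # newline="" lets the csv module control the line endings
    with open(path, mode="w", newline="") as fh:
        csv.writer(fh).writerows(rows)

    with open(path, mode="r", newline="") as fh:
        read_back = list(csv.reader(fh))

print(read_back)  # every field comes back as a string
```

Note that the numbers are read back as strings; converting them is left to the caller.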

### Advanced example from tutorial helpers

In [None]:
from tutorial.io import read_file

In [None]:
read_file??

<center><strong>&#9646;&#9646;</strong></center>

### Problem 1, attempt 2

- Read a sequence, and count the number of bases of each type.
- Print out the counts as a table, or write it to a CSV file.
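
One possible starting point, not the reference solution: `collections.Counter` from the standard library does the counting. The sequence below is made up; in the exercise, `seq1` from Problem 1 would take its place.

```python
from collections import Counter

seq = "ACGTTGCAAC"  # made-up sequence; use `seq1` from Problem 1 instead

counts = Counter(seq)  # maps each character to its number of occurrences
for base in sorted(counts):
    print(f"{base}\t{counts[base]}")
```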

### Problem 2, attempt 2

- Read a sequence, and identify the sequence of *Codons*.
- Print out the position and codon as a table, or write it to a CSV file.
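
A rough sketch of the splitting step only, assuming the reading frame starts at position 0 and leaving out the start/stop lookup in `CODON_MAP`. The sequence is made up; in the exercise, `seq2` would take its place.

```python
seq = "ATGGCCTAAGG"  # made-up sequence; use `seq2` from Problem 2 instead

# Step through the sequence three bases at a time, dropping an incomplete tail
codons = [(pos, seq[pos:pos + 3]) for pos in range(0, len(seq) - 2, 3)]
for pos, codon in codons:
    print(f"{pos}\t{codon}")
```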

### Problem 3

- read `data/summary.txt` as a "table", and summarise the data
- *Note:* you might need to do some cleaning, to interpret the table as numbers

<center><strong>&#9654;</strong></center>

# ![Python logo](data/python.ico) Advanced Python concepts

## Writing scripts



### *shebang* line:

    #!/usr/bin/python
    # script continues ...

or more portable:

    #!/usr/bin/env python
    # script continues ...

### `import` statements

**Syntax:** `import module` or `from module import name [, name]`

In [None]:
import sys  # module name
from pathlib import Path  # class name
from itertools import chain, accumulate  # multiple names

### Argument parsing

In [None]:
sys.argv  # all command line arguments (including script name)

### Problem 1, attempt 3

- Read a sequence, and count the number of bases of each type.
- Print out the counts as a table, or write it to a CSV file.
- Write the solution as a script that takes a fasta sequence file as argument

*Tips:* use libraries like,
- `argparse` (in the standard library, no installation needed),
- `click` (external library, more concise *decorator* based API).

*Solutions:* see [simple](prob-1-soln-1.py), [more complete](prob-1-soln-2.py).
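
A minimal `argparse` sketch, independent of the linked solutions. The `--csv` option and the function name `parse_args` are assumptions made for this illustration.

```python
import argparse

def parse_args(argv=None):
    """Parse command line arguments; argv=None means use sys.argv[1:]"""
    parser = argparse.ArgumentParser(description="Count bases in a FASTA file")
    parser.add_argument("fasta", help="path to a FASTA sequence file")
    parser.add_argument("--csv", help="optional path for CSV output")  # hypothetical option
    return parser.parse_args(argv)

# Passing a list explicitly makes it easy to try out in a notebook
opts = parse_args(["data/example.fa", "--csv", "counts.csv"])
print(opts.fasta, opts.csv)
```

In a script, calling `parse_args()` without arguments reads the options from the command line.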

## Decorators

- Allows you to modify behaviour by "wrapping" an existing function
- They are functions themselves

In [None]:
def make_bold(fn):
    def wrapper(*args, **kwargs):
        return "<b>" + fn(*args, **kwargs) + "</b>"
    return wrapper

@make_bold
def hello1():
    return "Hello World!"

In [None]:
hello1()

*How does it work?*

In [None]:
@make_bold
def hello1(name="World"):
    return f"Hello {name}!"

def hello2(name="World"):
    return f"Hello {name}!"
hello2 = make_bold(hello2)  # equivalent

In [None]:
hello1("Foo"), hello2("Bar")

Reimplement the decorator so that the wrapped function retains its name

In [None]:
from functools import wraps

def makebold(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return "<b>" + fn(*args, **kwargs) + "</b>"
    return wrapper

@makebold
def hello2(name="World"):
    return f"Hello {name}!"

In [None]:
hello1.__name__, hello2.__name__

#### Decorators with parameters

In [None]:
def format_tag(tag):
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            return f"<{tag}>{fn(*args, **kwargs)}</{tag}>"
        return wrapper
    return decorator

@format_tag("b")
def hello1(name="World"):
    return f"Hello {name}!"

@format_tag("i")
def hello2(name="World"):
    return f"Hello {name}!"

In [None]:
hello1("Felix"), hello2("Phoenix")

## Generators

- functions with lazy evaluation
- `yield` takes the place of `return`: it hands back a value and pauses the function until the next value is requested

In [None]:
def myrange(start, end, step=1):
    """My range implementation"""
    res = start
    while res < end:
        yield res
        res += step

In [None]:
[i for i in myrange(0, 10, 3)]
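
The lazy evaluation can be observed by pulling values one at a time with `next`. The generator `countdown` below is a made-up example.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1"""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)  # nothing in the function body has run yet
first = next(gen)   # runs up to the first `yield`
rest = list(gen)    # consumes the remaining values
again = list(gen)   # an exhausted generator yields nothing more
print(first, rest, again)
```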

## Iterators & generator expressions

- represent a stream of data, returning one element at a time
- can be used in a `for`-loop like a container
- more memory efficient: elements are produced on demand instead of being stored

In [None]:
lst_iter = iter([1, 2, 3])
a = next(lst_iter)  # get next element
b, c = lst_iter     # unpack the remaining elements
a, b, c

In [None]:
import sys

lst = [i/2 for i in range(1_000_000)]
lst_itr = (i/2 for i in range(1_000_000))
sys.getsizeof(lst), sys.getsizeof(lst_itr)  # size in bytes

In [None]:
sys.getsizeof(lst), sys.getsizeof(list(lst_itr))  # size in bytes

### Other built-ins to create iterators

- `zip` "ties" two or more iterables together, element by element
- `enumerate` pairs each element with its index

In [None]:
vowels = ("a", "e", "i", "o", "u")
for num, vowel in zip(range(len(vowels)), vowels):
    print(num, vowel)

In [None]:
for index, vowel in enumerate(vowels):
    print(index, vowel)

## Using `lambda` functions with `map`, `all`, `any`

*Note:* `map` returns an iterator

- split the strings to a pair of numbers: `["1,2", ..]` -> `[[1, 2], ..]`
  - flatten the result

In [None]:
pairs = ["1,2", "3,4", "5,6"]

In [None]:
res = list(map(lambda i: list(map(int, i.split(","))), pairs))
res

*Hints for flattening:*
- `map` and `list.extend` or nested for loop
- `itertools.chain` or `itertools.chain.from_iterable` (returns an iterator)
- `functools.reduce`

*Solutions*

In [None]:
flat = []
_ = list(map(flat.extend, res))
flat

In [None]:
import itertools

list(itertools.chain.from_iterable(res))

In [None]:
from functools import reduce

reduce(lambda i, j: i + j, res)

*A variation of reduce that also keeps the intermediate results*

In [None]:
from itertools import accumulate
from operator import add

list(accumulate(res, add))

In [None]:
# play around here

- in a random sequence of characters, find
  - if any of them are vowels,
  - if all of them are vowels,
  - count vowels

In [None]:
from random import choices
from string import ascii_lowercase

vowels = ("a", "e", "i", "o", "u")
population = choices(ascii_lowercase, k=10)  # `k` random characters

In [None]:
is_vowel = [c in vowels for c in population]
population, is_vowel

In [None]:
any(is_vowel), all(is_vowel), sum(is_vowel)

## Playground

- Play around, try different things
- apart from the standard library, you can also try `numpy` and `pandas`