# Python Basics

This module covers the foundations of Python, focusing on its use in an interactive environment such as this Jupyter Notebook.

## Expressions

A Python program, script, or Notebook cell (such as the one below) consists of one or more expressions. The expressions in a Notebook cell are evaluated line by line, from left to right, and from the inside out in the case of nested expressions.

In [None]:
42

For potentially ambiguous expressions, Python follows a [hierarchy of operator precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence).

In [None]:
42 + 1 * 2

We can use parentheses to explicitly indicate which sub-expressions should be evaluated first:

In [None]:
(42 + 1) * 2 

In a Jupyter Notebook, the value of the last expression in a cell is displayed as the cell's output:

In [None]:
42 + 1
sum([1, 2] * 2) + 42

## Everything Is an Object

All expressions evaluate to an _object_. Every Python object has a _type_, an _identity_, and a _value_.

The type can be obtained by the `type()` built-in function:

In [None]:
type(42)

In [None]:
type(sum)

For your reference, here is the [list of all built-in functions](https://docs.python.org/3/library/functions.html).

The `id()` built-in function returns an object's identity. The identity can be thought of as a representation of the memory address where this object is stored, but this depends on the Python implementation.

In [None]:
id(42)

In [None]:
id(sum)

## Variables

Objects can be assigned to variables:

In [None]:
a = 42

In [None]:
a

Variables in an expression evaluate to the object assigned to them:

In [None]:
b = a
b

In [None]:
a = 1

__Question__: After assigning a new value to `a`, what exactly happened to `b`? What is its value? (expand the cell below for an explanation)

Regard the memory as a space consisting of uniquely identified storage units:

<img alt="Memory Layout" src="images/var_assignment/var_assignment.001.png" width="600">

With the statement

```python
a = 42
```

the object `42` is created in one of the storage units, and the variable `a` refers to that object:

<img alt="Memory Layout" src="images/var_assignment/var_assignment.002.png" width="600">

With the statement

```python
b = a
```

a new label (or variable) `b` is created, which points to the exact same object as `a`:

<img alt="Memory Layout" src="images/var_assignment/var_assignment.003.png" width="600">

Finally, with the statement

```python
a = 1
```

the existing label (or variable) `a` is now pointing to a newly created object `1`, which is stored in a new storage unit:

<img alt="Memory Layout" src="images/var_assignment/var_assignment.004.png" width="600">

Please note that the object `42` in the storage unit with id `123` _was not mutated_! This holds regardless of whether the variable `b` exists: if we only had the following statements
```python
a = 42
a = 99
```
the result would have been the creation of the two objects `42` and `99` in memory, with the variable `a` pointing to `99`.

The conclusion here is that assigning to a variable with `=` never mutates the object that the variable on its left-hand side previously referred to, but rather attaches the label to an object in memory!

In [None]:
a = 10
a += 1  # shorthand notation for a = a + 1
a
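The rebinding described above can be checked with the `is` keyword, which is covered in the section on equality. The following is only a minimal sketch; the names `first` and `second` are illustrative and not part of the original example:

```python
first = 42
second = first           # both names refer to the same object
first += 1               # for an int, this rebinds `first` to a new object

print(first, second)     # 43 42
print(first is second)   # False
```

The object `42` is untouched, which is why `second` still evaluates to `42`.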

## Functions

To prevent repeating sequences of similar statements, we can organize our code by defining functions using the `def` [keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords):

In [None]:
def add_one(x):
    return x + 1

This function defines one _positional_ argument `x` and can be _called_ by using parentheses:

In [None]:
add_one(42)

The positional arguments are required: omitting them or providing unexpected arguments will result in a `TypeError`:

In [None]:
add_one()

In [None]:
add_one(42, 99)

A function returns the value of the expression directly following the `return` keyword that is executed. If the function body contains no `return` statement, or its end is reached without executing one, `None` is implicitly returned from the function.

_Gotcha for R users:_ In R, a function without an explicit return implicitly returns the value of its last evaluated expression, which differs from Python's default behavior!
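A small sketch of the implicit `None` return. The function name `add_one_no_return` is hypothetical and only serves to illustrate what happens when the `return` keyword is forgotten:

```python
def add_one_no_return(x):
    x + 1  # the sum is computed and then discarded


result = add_one_no_return(42)
print(result)           # None
print(result is None)   # True
```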

In some cases (details will follow later), a function has access to variables defined outside of its body:

In [None]:
n = 10
def add_n(x):
    return x + n

In [None]:
add_n(42)

Let's try to re-assign `n` to a new value from within a function:

In [None]:
def add_and_change_n(x):
    n = 1
    return x + n

In [None]:
add_and_change_n(42)

In [None]:
n

This may come as a surprise: our 'global' `n` did not change! The assignment inside the function created a new, local `n` instead. Using the `global` keyword within the function body, we have a way to reassign the global `n`:

In [None]:
def add_and_change_n_global(x):
    global n
    n = 1
    return x + n

In [None]:
add_and_change_n_global(42), n

Let's try to reassign the positional argument passed to a function:

In [None]:
def change_or_not(x):
    x = 100
    return x

In [None]:
x = 1234
change_or_not(x)

__Question(s)__: 

- What is the value of `x` after calling the function?
- What was passed to `change_or_not()`? Was it the exact same object? Or a copy? How can we find out?
- What is happening at the statement `x = 100` within the function body? Reassignment of the existing `x`? Creation of a new object?

Expand the hidden cells below for explanations:

In [None]:
def change_or_not(x):
    print(id(x))
    x = 100
    print(id(x))
    return x

id(x), id(change_or_not(x)), id(x)

In [None]:
# The name of the argument doesn't matter:
def change_or_not(foo):
    print(id(foo))
    foo = 100
    print(id(foo))
    return foo

id(x), id(change_or_not(x)), id(x)

Just as in the previous example about assignments, the statement
```python
x = 1234
```
creates an integer object `1234` in memory. But there are some important details about what exactly is happening: every piece of Python code is run in an _execution frame_. The subdivision of Python code into such execution frames is explained in [the documentation](https://docs.python.org/3/reference/executionmodel.html) and is too detailed for this tutorial.

What's important in this case is that the body of a function always runs in a different execution frame than the code block that declares or calls the function (which in this example is the Jupyter Notebook itself). Each of these execution frames is associated with a _scope_, which defines the visibility and resolution of names (i.e. variables).

So when we assign `1234` to `x` in our notebook cell, the name `x` is available on the global (i.e. notebook) scope:

<img alt="Global scope assignment" src="images/scopes/scopes.001.png" width="600">

When we call our function with
```python
change_or_not(x)
```
the same name `x` is also created in the function scope. Names are resolved using the nearest enclosing scope, so any usage of the variable name `x` within the function body will refer to whatever object is passed to the function as argument. Looking at the output of the functions with debug prints above, note that our function-scoped `x` initially points to the exact same object as the globally scoped `x`.

<img alt="Function scope assignment" src="images/scopes/scopes.002.png" width="600">

Finally, the assignment `x = 100` within the function body assigns a new integer object `100` to the name `x` _within the function scope_. This is why the assignment within the function did not reassign the variable in the enclosing scope.

<img alt="Function scope reassignment" src="images/scopes/scopes.003.png" width="600">


What happens when we declare names within a function scope? Are these also visible in the outer scope (after calling the function)?

In [None]:
def is_y_visible(x):
    y = 42
    return y + x

In [None]:
is_y_visible(10)
y
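Evaluating a name that only exists inside a function scope raises a `NameError`. The sketch below catches that error so the outcome can be inspected; the names `define_local`, `local_only`, and `leaked` are hypothetical and chosen so that they do not clash with other variables in this notebook:

```python
def define_local():
    local_only = 42      # exists only while the function body runs
    return local_only


define_local()

try:
    local_only           # not defined in the enclosing (notebook) scope
    leaked = True
except NameError:
    leaked = False

print(leaked)            # False
```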

Besides positional arguments, we can also provide (optional) keyword arguments to functions. These must have default values and are defined as follows:

In [None]:
def add_optional(x, add=0, sub=0):
    return x + add - sub

In [None]:
add_optional(42, sub=10)

Default values of keyword arguments can also be variables, but there's an important gotcha!

In [None]:
a = 99
def add(x, y=a):
    return x + y

In [None]:
add(1)

In [None]:
a = 0
add(1)

As you can see, the binding of the value of `a` to the keyword argument `y` happens _only at the time when the function is defined_, and does not take into account runtime changes to `a` afterwards!
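If the current value of `a` should be used on every call instead, a common idiom is to default to `None` and look up the variable inside the function body. This is a sketch of that idiom, not the only possible approach; the function name `add_late` is hypothetical:

```python
a = 99


def add_late(x, y=None):
    # `None` acts as a sentinel: `a` is looked up when the function is called
    if y is None:
        y = a
    return x + y


print(add_late(1))   # 100
a = 0
print(add_late(1))   # 1
```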

## Container Types

So far, we looked at simple numeric objects. Python offers different container types to hold collections of objects.

### List

Lists are created and indexed using square-bracket notation:

In [None]:
l = [1, 2, 3]

In [None]:
l[1]

The `:` (slice) operator allows us to retrieve arbitrary slices from a list:

In [None]:
l[0:2]

In [None]:
l[:2]

Slicing in general uses a syntax of `[from_index:to_index:step_size]`, where `to_index` is not included in the result. When using a negative step size, the list is traversed from end to beginning, and the slice also expects the `from_index` and `to_index` to go from high to low index values.

_Gotcha for R users_: List indexing (and, in general, indexing of all sequence types such as tuples and strings) always starts at `0`.

In [None]:
l[::2]

In [None]:
l[-1]

In [None]:
l[-2:]

In [None]:
l[-2::-1]

In [None]:
l[::-1]
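With a longer list, the effect of explicit bounds combined with a negative step becomes easier to see. A short sketch using a hypothetical list `letters`:

```python
letters = ["a", "b", "c", "d", "e"]

print(letters[1:4])      # ['b', 'c', 'd']
print(letters[4:1:-1])   # ['e', 'd', 'c']
print(letters[1:4:-1])   # []
```

The last slice is empty because, with a negative step, the `from_index` has to be higher than the `to_index`.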

It is possible to assign new objects to list elements:

In [None]:
l[1] = 42
l

In [None]:
l = [1, 2]
m = l
m[0] = 42
m

__Question__: What is the value of `l`? Did it change?

Expand the cell below for an explanation.

By assigning `[1, 2]` to `l`, a new _list_ object was created in memory:

<img alt="List creation" src="images/containers/containers.001.png" width="600">

Because this is a container object, the list does not hold the actual values, but rather (nameless) references to the actual values. We'll learn later how these square brackets are used to look up values for any kind of container object.

When we assign the value of `l` to `m`, we only created another name that refers to the same object:

<img alt="List creation" src="images/containers/containers.002.png" width="600">

Through the statement
```python
m[0] = 42
```
we assign a new integer object `42` to the first element of our list object:

<img alt="List creation" src="images/containers/containers.003.png" width="600">

Please note that this assignment did _not_ create a new list object: we _mutated_ the existing list by reassigning one of its members to a new object. Every variable referring to this list object will see the changes that were made.

In [None]:
l = [1, 2, 3]

What's the result type of slicing a list?

In [None]:
type(l[::-1])

Is it a _copy_ or a _view_? What happens when we try to mutate the slice result?

In [None]:
l[::-1][0] = 100

In [None]:
l
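To make the copy-versus-view question more tangible, we can keep a reference to the slice result and mutate it. The sketch below uses the hypothetical names `original` and `reversed_copy`:

```python
original = [1, 2, 3]
reversed_copy = original[::-1]   # slicing builds a new list object
reversed_copy[0] = 100

print(original)                  # [1, 2, 3]
print(reversed_copy)             # [100, 2, 1]
print(original[:] is original)   # False
```

Slicing a list returns a new list. The copy is shallow: the new list refers to the same element objects as the original.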

Several built-in functions can handle list objects in an intuitive way:

In [None]:
len(l), sum(l)

Once created, we can append elements to a list using the `append()` _method_. Note the different syntax using the `.` operator, compared to using the built-in functions. The next module about Object-Oriented Programming will provide more details about the difference between functions and methods.

In [None]:
k = [1, 2, 3]
k.append(4)
k

The `+` operator is used to concatenate two lists:

In [None]:
k + [10, 11]

Besides `append()`, what other methods for our list object can we use? The `dir()` built-in function provides a list of all the _attributes_ of an object.

In [None]:
dir(l)

In [None]:
dir(42)

In [None]:
dir(sum)

### Tuple

A tuple is a container object very similar to a list. It is usually written using parentheses instead of square brackets. Note in the examples below that a tuple with a single element requires a trailing comma.

In [None]:
t = (1, 2, 3)

In [None]:
t1 = (1)
t1, type(t1)

In [None]:
t1 = (1,)
t1, type(t1)

Tuples are indexed and sliced just like lists:

In [None]:
t[1]

In [None]:
t[:2]

But modifying tuples seems to be problematic:

In [None]:
t[1] = 42

In [None]:
t.append(4)

The above shows the difference between tuples and lists: tuples are _immutable_. Once a tuple is created, its elements cannot be reassigned to new objects, and no elements can be appended to it.

__Question__: If tuples are immutable, why is it ok to concatenate them, as shown below?

In [None]:
t + (10, 11)
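A hint for the question above: the identity of the result can be compared with the identity of the original tuple. A minimal sketch, where the name `combined` is illustrative:

```python
t = (1, 2, 3)
combined = t + (10, 11)   # builds a new tuple; `t` itself is untouched

print(t)                  # (1, 2, 3)
print(combined)           # (1, 2, 3, 10, 11)
print(combined is t)      # False
```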

Also note that immutability only means that the tuple itself cannot be mutated. If the tuple contains objects that are themselves mutable (such as lists, see the example below), these contained objects can still be changed.

In [None]:
t = ([1, 2], [3, 4], [5, 6])
t[1]

In [None]:
t[1][0] = 99
t

### Set

Sets are collections of unique values. When constructing them from any collection of non-unique values, de-duplication is automatic:

In [None]:
s = {'a', 'b', 'c', 'b'}

In [None]:
s

In [None]:
s = set(['a', 'b', 'c', 'b'])
s

Sets can be mutated by adding elements:

In [None]:
s.add('d')
s

A set's elements do not have order, so indexing does not make much sense:

In [None]:
s[1]

Some useful (and non-mutating) operations on sets are _union_, _intersection_ and _difference_:

In [None]:
s = {'a', 'b', 'c'}
s | {'a', 'd', 'b'}

In [None]:
s & {'a', 'd', 'b'}

In [None]:
s - {'a', 'd', 'b'}

### Dictionary

Dictionaries are used to efficiently look up values based on keys:

In [None]:
d = {'foo': 42, 'bar': 21}

Using square brackets, the value for the corresponding key is returned:

In [None]:
d['foo']

In [None]:
d['baz']

To prevent `KeyError`s for non-existing keys, there is a `get()` method that allows us to define default fall-back return values.

In [None]:
d.get('baz', 99)

As with lists, we can assign a new value to a key using the `[]` notation:

In [None]:
d['bar'] = 99
d

The above also works for keys that don't (yet) exist in the dictionary:

In [None]:
d['baz'] = 21
d

Since Python 3.9, two dicts can be merged with the `|` operator, similar to the union of sets:

In [None]:
d | {'x': 1}

An alternative (pre Python 3.9) approach to merge two dicts (these `**` operators are explained later):

In [None]:
{**d, **{'x': 1}}
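When both dicts contain the same key, the value from the right-hand dict wins, and neither input dict is mutated. A short sketch with the hypothetical dicts `defaults` and `overrides` (the `|` line assumes Python 3.9 or later):

```python
defaults = {"color": "red", "size": 1}
overrides = {"size": 2}

print(defaults | overrides)        # {'color': 'red', 'size': 2}
print({**defaults, **overrides})   # {'color': 'red', 'size': 2}
print(defaults)                    # {'color': 'red', 'size': 1}
```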

There's also a mutating `update()` method:

In [None]:
d.update({'x': 1})

And the following methods allow us to iterate over a dict's keys, values, and items:

In [None]:
d.keys(), d.values(), d.items()
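As an illustration of how these methods are typically used, the sketch below iterates over key-value pairs with a `for` loop (loops are covered in the section on control structures) and passes the other two results to built-in functions:

```python
d = {"foo": 42, "bar": 21}

for key, value in d.items():
    print(f"{key} -> {value}")

print(list(d.keys()))     # ['foo', 'bar']
print(sum(d.values()))    # 63
```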

### String

A string can be seen as a special kind of sequence that holds characters:

In [None]:
'foo'[2]

In [None]:
'foo' + 'bar'

In [None]:
b = 'bar'
'foo' + b + 'baz'

In [None]:
b = 42
'foo' + b + 'baz'

In modern Python (3.6 and later), the most convenient way to combine strings with data from variables (a.k.a. string formatting) is to use f-strings:

In [None]:
f'foo{b}baz'

For more details about these, see [this great article](https://realpython.com/python-f-strings/) at Real Python.
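Two features that often come in handy are sketched below: an f-string can contain arbitrary expressions, and an optional format specifier after a colon controls the presentation. The variable names `name` and `ratio` are illustrative:

```python
name = "bar"
ratio = 2 / 3

print(f"foo{name}baz")     # foobarbaz
print(f"{40 + 2} items")   # 42 items
print(f"{ratio:.2f}")      # 0.67
```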

And, by the way, strings are immutable:

In [None]:
b = 'bar'
b[1] = '*'

### Useful Keywords and Built-in Functions

In [None]:
l, 1 in l, 101 in l

In [None]:
w = 'foobar'

In [None]:
'f' in w, 'oo' in w, 'x' in w

In [None]:
s, len(s)

In [None]:
t = (1, 2, 3)
t, sum(t), max(t)

In [None]:
all([True, False, True]), any([False, True, False])

When putting objects of different types in containers, prepare to meet unexpected behavior:

In [None]:
sum([1, 2, 'a'])

## Equality

Python has two different checks of object equality:

In [None]:
l1 = [1, 2, 3]
l2 = l1

Equality by _value_ is checked using the `==` operator:

In [None]:
l1 == l2

Equality by _identity_ is checked using the `is` keyword. Objects that are equal by identity do not just have the same value, but are the _exact same object_.

In [None]:
l1 is l2, id(l1) == id(l2)

In [None]:
l1 = [1, 2, 3]
l2 = [1, 2, 3]

In [None]:
l1 == l2

In [None]:
l1 is l2

Even though `l1` and `l2` are not the same object, their elements can be!

In [None]:
l1[0] is l2[0]

In [None]:
a = 999
b = 999
a is b

For some values of immutable types, Python _may_ reuse objects for performance reasons:

In [None]:
a = 42
b = 42
a is b

In [None]:
a = 'foobar'
b = 'foobar'
a is b

In [None]:
a = '_'.join([str(i) for i in range(100)])
b = '_'.join([str(i) for i in range(100)])
a is b

## Mutation or No Mutation Quiz

In [None]:
def foo(l):
    l = [1, 2, 3]
    return l

In [None]:
l1 = [42, 99]
foo(l1)

__Question__: Has `l1` changed after calling `foo()`?

In [None]:
def bar(l):
    l[0] = 99
    return l

In [None]:
l2 = [1, 2, 3]
bar(l2)

__Question__: Has `l2` changed after calling `bar()`?
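One way to check the answers to both questions is to compare the returned object with the argument using `is`, and to inspect the argument afterwards. The sketch repeats both functions under the hypothetical names `rebind` and `mutate`, so that the quiz functions above stay untouched:

```python
def rebind(l):
    l = [1, 2, 3]     # rebinds the local name only
    return l


def mutate(l):
    l[0] = 99         # changes the list object that was passed in
    return l


l1 = [42, 99]
l2 = [1, 2, 3]

print(rebind(l1) is l1, l1)
print(mutate(l2) is l2, l2)
```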

## Packing / Unpacking

Any object that holds a collection (such as instances of the container types discussed above) and allows iteration over its elements is called an Iterable. The elements of an Iterable can be _unpacked_ by assigning them to multiple variables in a single statement.

In [None]:
a, b = [1, 2]

In [None]:
a, b

In [None]:
(a, b) = [1, 2]
a, b

If you don't know how many values you can expect, you can use the prefix `*` operator to capture a variable number of elements into a single variable:

In [None]:
a, *b, = [1, 2, 3]
a, b

In [None]:
a, *b, c = [1, 2, 3]
a, b, c

We can use this operator to unpack an Iterable into function arguments:

In [None]:
def foo(x, y, z):
    print(f'x: {x}, y: {y}, z: {z}')

In [None]:
foo(*[1, 2, 3])

... or pass a variable number of arguments to a function:

In [None]:
def bar(x, *args):
    print(f'x: {x}')
    print(f'args: {args}')

In [None]:
bar(*[1, 2, 3])

There's a similar `**` prefix operator that unpacks dictionaries into keyword arguments:

In [None]:
def baz(x, *args, **kwargs):
    print(f'x: {x}, args: {args}, kwargs: {kwargs}')

In [None]:
baz(42, *[1, 2, 3], **{'f': 42, 'g': 99})

In [None]:
baz(*[1, 2, 3], *[4, 5], **{'f': 42}, **{'g': 99})

The unpacking operators can also be used to merge several Iterables:

In [None]:
[*[1, 2, 3], *[4, 5]]

In [None]:
{**{'a': 1, 'b': 2}, **{'c': 3}}

## Control Structures

Although Python has the necessary [control flow constructs](https://docs.python.org/3/reference/compound_stmts.html), explicit control flow is often not needed. In many cases, (list) comprehensions (explained later) offer more readable alternatives.

In [None]:
x = 42

if x < 100:
    y = 1
elif x < 200:  # elif is optional
    y = 2
else:  # else is optional
    y = 3

y

Instead of writing an explicit control flow as above, the same behavior can be achieved using an expression:

In [None]:
y = 1 if x < 100 else 2
y

In [None]:
# This is an anti-pattern!
even_numbers = 0

for x in [1, 2, 3]:
    even_numbers += 1 if x % 2 == 0 else 0

even_numbers
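The counting loop above could, for example, be replaced by a single expression. The sketch uses a generator expression, a construct closely related to the comprehensions explained in the next section; the name `numbers` is illustrative:

```python
numbers = [1, 2, 3]

# Count the even numbers without an explicit loop and accumulator
even_numbers = sum(1 for x in numbers if x % 2 == 0)
print(even_numbers)   # 1
```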

Typical filtering, mapping, or reducing applications are often more readable using comprehensions. Manually iterating over elements to apply some operation to them is often considered un-Pythonic. One of the valid use cases for explicit `for` or `while` loops is when every iteration involves some side effect such as reading from a file or database:

In [None]:
def read_user_data(user_id):
    # reads user data from file or database
    return [42, 99]

user_ids = [1, 2, 3]

for user_id in user_ids:
    user_items = read_user_data(user_id)
    if len(user_items) > 100:
        big_user = user_id
        break
else:  # runs whenever loop iterates through all elements and doesn't break
    big_user = None
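Because the example data above never triggers the `break`, here is a compact sketch that shows both outcomes of a `for` loop with an `else` clause. The helper `first_negative` is hypothetical:

```python
def first_negative(numbers):
    for number in numbers:
        if number < 0:
            found = number
            break
    else:  # only runs when the loop finished without `break`
        found = None
    return found


print(first_negative([3, -1, -7]))   # -1
print(first_negative([3, 1, 7]))     # None
```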
    

## Comprehensions

Comprehensions are expressions for building new Iterables from existing ones and are best explained using an example:

In [None]:
[elt * 2 for elt in [1, 2, 3]]

The general syntax is `[build_expression for var in iterable if filter_expression]`, where the `if` part is optional and both `build_expression` and `filter_expression` can refer to `var`.

In [None]:
[elt * 2 for elt in [1, 2, 3] if elt <= 2]
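For comparison, the filtered comprehension above corresponds roughly to the following explicit loop. This is a sketch to show which part of the comprehension plays which role; the name `doubled` is illustrative:

```python
doubled = []
for elt in [1, 2, 3]:
    if elt <= 2:                 # filter_expression
        doubled.append(elt * 2)  # build_expression

print(doubled)   # [2, 4]
print(doubled == [elt * 2 for elt in [1, 2, 3] if elt <= 2])   # True
```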

The elements of the iterable in a comprehension can themselves be iterables:

In [None]:
user_tuples = (
    # user_id, name, items_bought
    (123, 'Ren', 42),
    (456, 'Stimpy', 99)
)

{user[0]: {'name': user[1], 'items_bought': user[2]} for user in user_tuples}

... which can be unpacked and assigned to meaningful variable names:

In [None]:
{
    user_id: {'name': user_name, 'items_bought': items_bought}
    for user_id, user_name, items_bought in user_tuples
}

Note the use of `{}` for the last two comprehensions above: these are _dict comprehensions_. A final example of a dict comprehension using a filter:

In [None]:
{
    user_id: {'name': user_name, 'items_bought': items_bought}
    for user_id, user_name, items_bought in user_tuples
    if items_bought < 50
}

## Exercises

The `assert` keyword makes Python evaluate the expression following it and raise an `AssertionError` (with the optional message after the comma) when it evaluates to `False`, or pass silently when it evaluates to `True`:

In [None]:
assert 1 < 2, 'That was unexpected!'

In [None]:
assert 1 > 2, 'That was unexpected!'

Create a function `largest_number_divisible_by_n()` that accepts as arguments a list of integers `numbers` and an integer `n`. It should return the largest number in `numbers` that is divisible by `n`. Use the assertions below to test your implementation.

In [None]:
# Your solution:

In [None]:
# %load solutions/largest_number_divisible_by_n.py

In [None]:
assert largest_number_divisible_by_n([1, 2, 3, 4], 2) == 4
assert largest_number_divisible_by_n([20, 99, 100, 101], 10) == 100

Create a function `prefix_names()` that accepts as arguments an Iterable of users `users` such as `user_tuples` above, and optional keyword arguments `FIRST_LETTER_OF_NAME=PREFIX`. It should return a list of the names in `users` with every name that matches one of the keyword arguments being prefixed with the prefix for that keyword argument.

In [None]:
# Your solution:

In [None]:
# %load solutions/prefix_names.py

In [None]:
assert prefix_names(user_tuples, R='--') == ['--Ren', 'Stimpy']

Create a function `variance()` that accepts as argument an Iterable of numbers `numbers` and returns the (biased) variance of that sample, i.e., $\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$

In [None]:
# Your solution:

In [None]:
# %load solutions/variance.py

In [None]:
assert variance([1, 2, 3, 10]) == 12.5

_Bonus_: Make sure that your implementation of `variance()` can handle zero-length inputs or an optional keyword argument to indicate that the unbiased variance should be returned (i.e. the sum of squared differences from the mean is divided by $n - 1$). What would be reasonable return values for these edge cases?