<img src="img/dsci511_header.png" width="600">

# Lecture 2: Loops & functions

## Lecture learning objectives

- Write `for` and `while` loops in Python
- Identify iterable data types which can be used in `for` loops
- Create a `list`, `dictionary`, or `set` using comprehension
- Write a `try`/`except` statement
- Define a function and an anonymous function in Python
- Describe the difference between positional and keyword arguments
- Describe the difference between local and global variables
- Apply the `DRY principle` to write modular code
- Assess whether a function has side effects
- Write a docstring for a function that describes parameters, return values, behaviour and usage

## `for` loops

- For loops allow us to execute code a specific number of times.

In [1]:
for n in [2, 7, -1, 5]:
    print(f"The number is {n} and its square is {n**2}")

print("I'm outside the loop!")

The number is 2 and its square is 4
The number is 7 and its square is 49
The number is -1 and its square is 1
The number is 5 and its square is 25
I'm outside the loop!


The main points to notice:

* Keyword `for` begins the loop. Colon `:` ends the first line of the loop.
* The indented block of code is executed for each value in the list (hence the name "for" loops)
* The loop ends after the variable `n` has taken all the values in the list
* We can iterate over any kind of **"iterable"**: `list`, `tuple`, `range`, `set`, `str`.
* An iterable is really just any object with a sequence of values that can be looped over. In this case, we are iterating over the values in a list.
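
As a small sketch of the last two points, the same `for` syntax works on several built-in iterables. The variable name `seen` is illustrative only; note that looping over a dictionary directly yields its keys.

```python
seen = []

for item in (2, 7, -1):        # tuple
    seen.append(item)

for item in range(3):          # range
    seen.append(item)

for item in "hi":              # str: one character at a time
    seen.append(item)

for key in {"a": 1, "b": 2}:   # dict: iterates over the keys
    seen.append(key)

print(seen)
```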

In [2]:
word = "Python"
for letter in word:
    print("Gimme a " + letter + "!")

print(f"What's that spell?!! {word}!")

Gimme a P!
Gimme a y!
Gimme a t!
Gimme a h!
Gimme a o!
Gimme a n!
What's that spell?!! Python!


- A very common pattern is to use `for` with the `range` object. 
- `range` gives you a sequence of integers up to some value (non-inclusive of the end-value) and is typically used for looping.

In [3]:
range(10)

range(0, 10)

In [4]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [5]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9


- We can also specify a start value and a step value with `range`:

In [6]:
for i in range(1, 101, 10):
    print(i)

1
11
21
31
41
51
61
71
81
91


- We can write a loop inside another loop to iterate over multiple dimensions of data. Consider the following loop as enumerating the coordinates in a 3 by 3 grid of points.

In [7]:
for x in [1, 2, 3]:
    for y in ["a", "b", "c"]:
        print((x, y))

(1, 'a')
(1, 'b')
(1, 'c')
(2, 'a')
(2, 'b')
(2, 'c')
(3, 'a')
(3, 'b')
(3, 'c')


In [8]:
list_1 = [0, 1, 2]
list_2 = ["a", "b", "c"]

In [9]:
for i in range(3):
    print(list_1[i], list_2[i])

0 a
1 b
2 c


- There are many clever ways of doing these kinds of things in Python

- `zip()` returns a zip object which is an iterable of tuples

In [10]:
for i in zip(list_1, list_2):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')


- We can "unpack" these tuples directly in the `for` loop:

In [11]:
for i, j in zip(list_1, list_2):
    print(i, j)

0 a
1 b
2 c


- `enumerate()` adds a counter to an iterable 

In [12]:
for i in enumerate(list_2):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')


- It is also possible to extract the counter and the value:

In [13]:
for n, i in enumerate(list_2):
    print(f"index {n}, value {i}")

index 0, value a
index 1, value b
index 2, value c


- We can loop through key-value pairs of a dictionary using `.items()`
- The general syntax is `for key, value in dictionary.items()`

In [14]:
courses = {521: "awesome 🤗",
           551: "riveting 🤯",
           511: "naptime 😄"}

In [15]:
for course_num, description in courses.items():
    print(f"DSCI {course_num}, is {description}")

DSCI 521, is awesome 🤗
DSCI 551, is riveting 🤯
DSCI 511, is naptime 😄


- We can even use `enumerate()` to do more complex un-packing:

In [16]:
for n, (course_num, description) in enumerate(courses.items()):
    print(f"Item {n}: DSCI {course_num}, is {description}")

Item 0: DSCI 521, is awesome 🤗
Item 1: DSCI 551, is riveting 🤯
Item 2: DSCI 511, is naptime 😄


## `while` loops

- We can also use a [`while` loop](https://docs.python.org/3/reference/compound_stmts.html#while) to execute a block of code until a condition becomes `False`.
- Beware! If the conditional expression is always `True`, then you've got an infinite loop!
>(Use the "Stop" button in the toolbar above, or `Ctrl+C` in the terminal, to kill the program if you get an infinite loop.)

In [17]:
n = 10
while n > 0:
    print(n)
    n -= 1

print("Lift off!")

10
9
8
7
6
5
4
3
2
1
Lift off!


- We can read the `while` statement above as if it were English.
- It means, 
> _While n is greater than 0, display the value of n and then decrement n by 1. When you get to 0, display "Lift off!"_
- But for some loops, it's hard to tell when, or if, they will stop!

- For example, here is the [Collatz conjecture](https://en.wikipedia.org/wiki/Collatz_conjecture):
> Pick a starting positive integer $ n $. Next terms are obtained as follows: if the previous term is even, the next term is one half of the previous term. If the previous term is odd, the next term is 3 times the previous term plus 1.

  - The conjecture states that no matter what positive integer $ n $ we start with, the sequence will always eventually reach 1
  - Nobody has been able to prove or disprove this yet!

In [18]:
n = 5

while n != 1:
    print(n)
    if n % 2 == 0:  # n is even
        n = n // 2
    else:  # n is odd
        n = n * 3 + 1

print(n)

5
16
8
4
2
1


- In some cases, you may want to force a `while` loop to stop based on some criteria, using the `break` keyword

In [19]:
n = 123
i = 0

while n != 1:
    print(n)
    if n % 2 == 0:  # n is even
        n = n // 2
    else:  # n is odd
        n = n * 3 + 1
    i += 1
    if i == 10:
        print("Ugh, too many iterations!")
        break

123
370
185
556
278
139
418
209
628
314
Ugh, too many iterations!


- The `continue` keyword is similar to `break` but won't stop the loop
- Instead, `continue` ignores what comes after it, and goes to the next iteration.

In [20]:
n = 10
while n > 0:
    if n % 2 != 0:  # n is odd
        n -= 1
        continue
        print('Can anyone hear me?')  # this line is never executed
    print(n)
    n -= 1

print("Lift off!")

10
8
6
4
2
Lift off!


## Comprehensions

- Comprehensions allow us to build lists/sets/dictionaries in one convenient, compact line of code.
- I use these quite a bit!
- Here is a standard for loop you might use to iterate over an iterable and create a list

In [21]:
subliminal = ['Toby', 'ingests', 'many', 'eggs', 'to', 'outrun',
              'large', 'eagles', 'after', 'running', 'near', '!']

first_letters = []

for word in subliminal:
    first_letters.append(word[0])

print(first_letters)

['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']


- List comprehension allows us to do this in one line

In [22]:
letters = [word[0] for word in subliminal]  # list comprehension
letters

['T', 'i', 'm', 'e', 't', 'o', 'l', 'e', 'a', 'r', 'n', '!']

- We can make things more complicated by doing multiple iteration or conditional iteration

In [23]:
[(i, j) for i in range(3) for j in range(4)]

[(0, 0),
 (0, 1),
 (0, 2),
 (0, 3),
 (1, 0),
 (1, 1),
 (1, 2),
 (1, 3),
 (2, 0),
 (2, 1),
 (2, 2),
 (2, 3)]

- Condition the iterator to select even numbers only:

In [24]:
[i for i in range(11) if i % 2 == 0]

[0, 2, 4, 6, 8, 10]

In [25]:
[-i if i % 2 else i for i in range(11)]

[0, -1, 2, -3, 4, -5, 6, -7, 8, -9, 10]

Note that:
> - If you only want to keep certain elements (filtering), put the `if` condition at the end.
> - If you want to keep every element but transform some of them, put a conditional expression (`a if condition else b`) at the beginning. **The `else` part is mandatory in this case!**

- There is also set comprehension:

In [26]:
words = ['hello', 'goodbye', 'the', 'antidisestablishmentarianism']
y = {word[-1] for word in words}  # set comprehension
y

{'e', 'm', 'o'}

We see only 3 elements because a set contains only unique items and there would have been two "e"s.

- Dictionary comprehension:

In [27]:
words = ['hello', 'goodbye', 'the', 'antidisestablishmentarianism']
word_lengths = {word: len(word) for word in words}  # dictionary comprehension
word_lengths

{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}

- Tuple comprehension doesn't work as you might expect...
- We get a generator instead (more on that later)

In [28]:
y = (word[-1] for word in words)  # this is NOT a tuple comprehension
print(y)

<generator object <genexpr> at 0x10c1a9850>


- To build a tuple, pass the generator expression to `tuple()`:

In [29]:
y = tuple(word[-1] for word in words)
y

('o', 'e', 'e', 'm')

## `try` / `except`

- If something goes wrong, we don't want our code to crash - we want it to **fail gracefully**.
- In Python, this can be accomplished using `try`/`except` statements

Here is a basic example:

In [30]:
this_variable_does_not_exist
print("Another line")  # code fails before getting to this line

NameError: name 'this_variable_does_not_exist' is not defined

In [31]:
try:
    this_variable_does_not_exist
except:
    print("You did something bad! But I won't raise an error.")

You did something bad! But I won't raise an error.


- Python tries to execute the code in the `try` block.
- If an error is encountered, we "catch" this in the `except` block (also called `try`/`catch` in other languages).
- There are many different error types, or **exceptions** - we saw `NameError` above. 

In [32]:
5 / 0  # ZeroDivisionError

ZeroDivisionError: division by zero

In [33]:
my_list = [1, 2, 3]
my_list[5]  # IndexError

IndexError: list index out of range

In [34]:
my_tuple = (1, 2, 3)
my_tuple[0] = 0  # TypeError

TypeError: 'tuple' object does not support item assignment

- Ok, so there are apparently a bunch of different errors one could run into. 
- With `try`/`except` you can also catch the exception itself:

In [35]:
try:
    this_variable_does_not_exist
except Exception as ex:
    print("You did something bad!")
    print(ex)
    print(type(ex))

You did something bad!
name 'this_variable_does_not_exist' is not defined
<class 'NameError'>


- In the above, we caught the exception and assigned it to the variable `ex` so that we could print it out.
- This is useful because you can see what the error message would have been, without crashing your program.

- You can also catch specific exception types.
- This is typically the recommended way to catch errors: being specific tells you exactly where and why your code failed.

In [40]:
try:
    this_variable_does_not_exist  # name error
#     (1, 2, 3)[0] = 1  # type error
#     5/0  # ZeroDivisionError
except TypeError:
    print("You made a type error!")
except NameError:
    print("You made a name error!")
except:
    print("You made some other sort of error")

You made a name error!


- The last `except` would trigger if the error is none of the above types.
- There are also optional `else` and `finally` clauses; read more [here](https://docs.python.org/3/tutorial/errors.html)

In [37]:
try:
    1 + 100
    this_variable_does_not_exist
except:
    print("The variable does not exist!")
else:
    print("I'm `else`, did someone call me?")
finally:
    print("I'm printing anyway!")

The variable does not exist!
I'm printing anyway!
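
The cell above exercises only the failure path. As a hedged sketch (the helper `safe_divide` is hypothetical, not part of the lecture), the function below records which blocks run on the success path and on the failure path:

```python
def safe_divide(a, b):
    """Return (result, log), where log records which blocks ran."""
    log = []
    result = None
    try:
        result = a / b
    except ZeroDivisionError:
        log.append("except")    # only when the division fails
    else:
        log.append("else")      # only when no exception was raised
    finally:
        log.append("finally")   # runs on both paths
    return result, log


print(safe_divide(6, 3))
print(safe_divide(6, 0))
```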


- The `else` block runs only if the `try` block raised no exception (so it was skipped here), while the `finally` clause will always get executed.
- We can also write code that raises an exception on purpose, using `raise`

In [38]:
def add_one(x):  # we'll get to functions in the next section
    return x + 1

In [39]:
add_one("blah")

TypeError: can only concatenate str (not "int") to str

In [41]:
def add_one(x):
    if not isinstance(x, float) and not isinstance(x, int):
        raise TypeError(f"Sorry, x must be numeric, you entered a {type(x)}.")

    return x + 1

In [42]:
add_one("blah")

TypeError: Sorry, x must be numeric, you entered a <class 'str'>.

- This is useful when your function is complicated and would fail in a complicated way, with a weird error message.
- You can make the cause of the error much clearer to the _caller_ of the function.
- Thus, your function is more usable this way.
- If you do this, you should ideally describe these exceptions in the function documentation, so a user knows what to expect if they call your function.  
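
As a sketch of the last point, here is one way the exception could be described, using a `Raises` section in the NumPy docstring style (docstrings are covered at the end of this lecture). The docstring wording is illustrative, not a required template.

```python
def add_one(x):
    """Add 1 to a number.

    Parameters
    ----------
    x : int or float
        The number to increment.

    Returns
    -------
    int or float
        The value of x plus 1.

    Raises
    ------
    TypeError
        If x is not an int or a float.
    """
    # isinstance accepts a tuple of types, so one check covers both
    if not isinstance(x, (int, float)):
        raise TypeError(f"Sorry, x must be numeric, you entered a {type(x)}.")
    return x + 1
```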

- Finally, we can even define our own exception types, as you'll do in lab 2.
- We do this by inheriting from the `Exception` class (more on classes and inheritance next lecture!)

In [43]:
class CustomAdditionError(Exception):
    pass

In [44]:
def add_one(x):
    if not isinstance(x, float) and not isinstance(x, int):
        raise CustomAdditionError("Sorry, x must be numeric")

    return x + 1

In [45]:
add_one("blah")

CustomAdditionError: Sorry, x must be numeric

## Functions

- Define a [**function**](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) to re-use a block of code with different input parameters, also known as **arguments**.
- Function definition syntax:

```python
def function(arg1, arg2, ...):
    # do something
    output = ...
    return output
```
* Functions begin with the `def` keyword, then the function name, arguments in parentheses, and then a colon (`:`)
* The function block is defined by indentation
* Output or "return" value of the function is given by the `return` keyword

- For example, suppose that we want to compute the probability density function of the [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution), which is given by:
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}
$$
Let's assume that we want to compute $ f(2) $ for a mean of $ \mu = 2.5 $ and a standard deviation of $ \sigma = 0.3 $:

In [46]:
import math

(1 / (0.3 * (2 * math.pi)**0.5)) * math.exp(-0.5 * ((2 - 2.5) / 0.3)**2)

0.3315904626424956

- With a function, we can abstract things and avoid repetition:

In [47]:
def pdf_normal(x, μ, σ):
    prefactor = (1 / (σ * (2 * math.pi)**0.5))
    exp_value = math.exp(-0.5 * ((x - μ) / σ)**2)
    pdf = prefactor * exp_value
    return pdf

In [48]:
pdf_normal(2, 2.5, 0.3)

0.3315904626424956

In [49]:
pdf_normal(1, 0, 1)

0.24197072451914337

### Side effects & local variables

- When you create a variable inside a function, it is local, which means that it only exists inside the function. For example:

In [50]:
def cat_string(str1, str2):
    string = str1 + str2
    return string

In [51]:
cat_string('My name is ', 'Arman')

'My name is Arman'

In [52]:
string

NameError: name 'string' is not defined
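
The reverse direction is worth a quick sketch too. A function can read a global variable, but assigning to the same name inside a function creates a new local variable and leaves the global untouched. The names `greeting`, `shout`, and `read_global` are made up for illustration.

```python
greeting = "hello"  # global variable


def shout():
    greeting = "HELLO"  # creates a new local variable; the global is unchanged
    return greeting


def read_global():
    return greeting  # no local with this name, so the global is used


local_result = shout()
global_result = read_global()
print(local_result, global_result, greeting)
```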

- If a function changes the variables passed into it, then it is said to have **side effects**
- Example:

In [53]:
def silly_sum(my_list):
    my_list.append(8.5)
    return sum(my_list)

In [54]:
nums = [1, 2, 3, 4]
out = silly_sum(nums)
out

18.5

- Looks like what we wanted.
- But wait... it changed our `nums` object...

In [55]:
nums

[1, 2, 3, 4, 8.5]

- If your function has side effects like this, you must mention it in the documentation (later today).
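
Often the simplest fix is to avoid the side effect altogether. A minimal sketch, assuming we only need the sum and not the modified list (the function name is hypothetical):

```python
def silly_sum_no_side_effects(my_list):
    # my_list + [8.5] builds a new list, so the caller's list is not modified
    return sum(my_list + [8.5])


nums = [1, 2, 3, 4]
out = silly_sum_no_side_effects(nums)
print(out, nums)
```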

### `None` return type

- If you do not specify a return value, the function returns `None` when it terminates:

In [56]:
def f(x):
    x + 1  # no return!
    if x == 999:
        print('x = 999!')


print(f(0))

None


### Optional & required arguments

- Sometimes it is convenient to have _default values_ for some arguments in a function. 
- Because they have default values, these arguments are optional, hence "optional arguments"
- Example:

In [57]:
def repeat_string(s, n=2):
    return s*n

In [58]:
repeat_string("mds", 2)

'mdsmds'

In [59]:
repeat_string("mds", 5)

'mdsmdsmdsmdsmds'

In [60]:
repeat_string("mds")  # do not specify `n`; it is optional

'mdsmds'

- Ideally, the default should be carefully chosen. 
- Here, the idea of "repeating" something makes me think of having 2 copies, so `n=2` feels like a sane default.

- You can have any number of required arguments and any number of optional arguments
- All the optional arguments must come after the required arguments
- The required arguments are mapped by the order they appear
- The optional arguments can be specified out of order when they are passed by keyword

In [61]:
def example(a, b, c="DEFAULT", d="DEFAULT"):
    print(a, b, c, d)


example(1, 2, 3, 4)

1 2 3 4


- Using the defaults for `c` and `d`:

In [62]:
example(1, 2)

1 2 DEFAULT DEFAULT


- Specifying `c` and `d` as **keyword arguments** (i.e. by name):

In [63]:
example(1, 2, c=3, d=4)

1 2 3 4


- Specifying only one of the optional arguments, by keyword:

In [64]:
example(1, 2, c=3)

1 2 3 DEFAULT


- Specifying all the arguments as keyword arguments, even though only `c` and `d` are optional:

In [65]:
example(a=1, b=2, c=3, d=4)

1 2 3 4


- Specifying `c` by the fact that it comes 3rd (I do not recommend this because I find it confusing):

In [66]:
example(1, 2, 3)

1 2 3 DEFAULT


- Specifying the optional arguments by keyword, but out of order:

In [67]:
example(1, 2, d=4, c=3)

1 2 3 4


- Specifying the non-optional arguments by keyword (I am fine with this):

In [68]:
example(a=1, b=2)

1 2 DEFAULT DEFAULT


- Specifying the non-optional arguments by keyword, but in the wrong order (not recommended, I find it confusing):

In [69]:
example(b=2, a=1)

1 2 DEFAULT DEFAULT


- Specifying keyword arguments before non-keyword arguments (this throws an error):

In [70]:
example(a=2, 1)

SyntaxError: positional argument follows keyword argument (1657783790.py, line 1)

- In general, I am used to calling non-optional arguments by order, and optional arguments by keyword.
- The language allows us to deviate from this, but it can be unnecessarily confusing sometimes.

### Multiple return values

- In many programming languages, functions can only return one object
- That is technically true in Python, but there is a "workaround", which is to return a tuple.

In [71]:
def sum_and_product(x, y):
    return (x + y, x * y)

In [72]:
sum_and_product(5, 6)

(11, 30)

- The parentheses can be omitted in this case, and a `tuple` is implicitly returned as defined by the use of the comma: 

In [73]:
def sum_and_product(x, y):
    return x + y, x * y

In [74]:
sum_and_product(5, 6)

(11, 30)

- It is common to immediately unpack a returned tuple into separate variables, so it really feels like the function is returning multiple values:

In [75]:
s, p = sum_and_product(5, 6)

In [76]:
s

11

In [77]:
p

30

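- As an aside, it is conventional in Python to use `_` for values you don't want. (In IPython/Jupyter, `_` is also a shortcut for the most recent output, so the value displayed for `_` below may not be the one you assigned.)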

In [78]:
s, _ = sum_and_product(5, 6)

In [79]:
s

11

In [80]:
_

11

### Advanced stuff

- You can also call/define functions that accept an arbitrary number of positional or keyword arguments using `*args` and `**kwargs`. See, e.g. [here](https://realpython.com/python-kwargs-and-args/)

In [81]:
def add(*args):
    print(args)
    return sum(args)

In [82]:
add(1, 2, 3, 4, 5, 6)

(1, 2, 3, 4, 5, 6)


21

In [83]:
def add(**kwargs):
    print(kwargs)
    return sum(kwargs.values())

In [84]:
add(a=3, b=4, c=5)

{'a': 3, 'b': 4, 'c': 5}


12
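
The same `*` and `**` syntax also works on the calling side, where it unpacks a sequence into positional arguments or a dictionary into keyword arguments. A short sketch, with the hypothetical helpers `add_all` and `describe`:

```python
def add_all(*args):
    return sum(args)


def describe(name, course="DSCI 511"):
    return f"{name} is taking {course}"


nums = [1, 2, 3]
total = add_all(*nums)  # equivalent to add_all(1, 2, 3)

options = {"name": "Arman", "course": "DSCI 521"}
message = describe(**options)  # equivalent to describe(name="Arman", course="DSCI 521")

print(total)
print(message)
```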

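- Do not use mutable objects (like empty lists) as default argument values in the function definition: the default is created once, when the function is defined, and is then shared across calls - see [here](https://docs.python-guide.org/writing/gotchas/) under "Mutable Default Arguments"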

In [85]:
def example(a, b=[]):  # don't do this!
    b.append(a)
    return b

In [86]:
example(1)

[1]

In [87]:
example(2)  # the list inside the function persists and got appended to!

[1, 2]

In [88]:
def example(a, b=None):  # instead, do this
    if b is None:
        b = []
    b.append(a)
    return b

In [89]:
example(1)

[1]

In [90]:
example(2)

[2]

## Functions as a data type

- In Python, functions are a data type just like anything else. 

In [91]:
def do_nothing(x):
    return x

In [92]:
type(do_nothing)

function

In [93]:
print(do_nothing)

<function do_nothing at 0x10d08aa70>


- This means you can pass functions as arguments into other functions.

In [94]:
def square(y):
    return y**2


def evaluate_function_on_x_plus_1(fun, x):
    return fun(x+1)

In [95]:
evaluate_function_on_x_plus_1(square, 5)

36

- Above: what happened here?
  - `fun(x+1)` becomes `square(5+1)`
  - `square(6)` becomes `36`

- You can also write functions that return functions, or define functions inside of other functions.
- We'll see examples of this when we get to classes & decorators
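
In the meantime, here is a minimal sketch of a function that defines and returns another function. The names `make_power`, `to_square`, and `to_cube` are hypothetical.

```python
def make_power(exponent):
    def power(base):
        # the inner function remembers `exponent` from the enclosing call
        return base ** exponent

    return power  # return the function itself, without calling it


to_square = make_power(2)
to_cube = make_power(3)

print(to_square(6), to_cube(2))
```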

## Anonymous functions

- There are two ways to define functions in Python:

In [96]:
def add_one(x):
    return x+1

In [97]:
add_one(7.2)

8.2

In [98]:
lambda x: x+1

<function __main__.<lambda>(x)>

In [99]:
type(lambda x: x+1)

function

In [100]:
(lambda x: x+1)(7.2)

8.2

- The two approaches above are equivalent. The one with `lambda` is called an **anonymous function**.
- Anonymous functions can only contain a single expression, so they aren't appropriate in most cases, but can be useful for smaller things

In [101]:
evaluate_function_on_x_plus_1(lambda x: x ** 2, 5)

36

Above:

- First, `lambda x: x**2` evaluates to a value of type `function`
  - Notice that this function is never given a name - hence "anonymous functions" !
- Then, the function and the integer `5` are passed into `evaluate_function_on_x_plus_1`
- At which point the anonymous function is evaluated on `5+1`, and we get `36`.

- Anonymous functions can have multiple arguments, as well as multiple outputs:

In [102]:
(lambda x, y: (x+y, x-y, x**y))(5, 2)

(7, 3, 25)
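
One of the "smaller things" where anonymous functions are handy is the `key` argument of the built-in `sorted()`, which takes a function that computes the value to sort by. A short sketch with a made-up list `fruits`:

```python
fruits = ["banana", "kiwi", "apple"]

# sort by word length
by_length = sorted(fruits, key=lambda word: len(word))

# sort by the last letter of each word
by_last_letter = sorted(fruits, key=lambda word: word[-1])

print(by_length)
print(by_last_letter)
```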

## DRY principle: designing good functions

- DRY: **Don't Repeat Yourself**
- See [Wikipedia article](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)
- Consider the task of, for each element of a list, turning it into a palindrome
  - e.g. "mike" => "mikeekim"

In [103]:
names = ["milad", "arman", "tiffany"]

In [104]:
name = "arman"
name[::-1]  # creates a slice that starts at the end and moves backwards, syntax is [begin:end:step]

'namra'

In [105]:
names_backwards = list()

names_backwards.append(names[0] + names[0][::-1])
names_backwards.append(names[1] + names[1][::-1])
names_backwards.append(names[2] + names[2][::-1])
names_backwards

['miladdalim', 'armannamra', 'tiffanyynaffit']

- Above: this is gross and terrible coding:
  1. It only works for a list with 3 elements
  2. It only works for a list named `names`
  3. If we want to change its functionality, we need to change 3 similar lines of code (Don't Repeat Yourself!!)
  4. It is hard to understand what it does just by looking at it

In [106]:
names_backwards = list()

for name in names:
    names_backwards.append(name + name[::-1])

names_backwards

['miladdalim', 'armannamra', 'tiffanyynaffit']

- Above: this is slightly better. We have solved problems (1) and (3).
- But let's create a function to make our life easier

In [107]:
def make_palindromes(names):
    names_backwards = []

    for name in names:
        names_backwards.append(name + name[::-1])

    return names_backwards

In [108]:
make_palindromes(names)

['miladdalim', 'armannamra', 'tiffanyynaffit']

- Above: this is even better. We have now also solved problem (2), because you can call the function with any list, not just `names`. 
- For example, what if we had multiple _lists_:

In [109]:
names1 = ["milad", "arman", "tiffany"]
names2 = ["apple", "orange", "banana"]

In [110]:
make_palindromes(names1)

['miladdalim', 'armannamra', 'tiffanyynaffit']

In [111]:
make_palindromes(names2)

['appleelppa', 'orangeegnaro', 'bananaananab']

### Designing good functions

- How far you go and how you choose to apply the DRY principle is up to you and the programming context
- These decisions are often ambiguous. For example: 
  - Should `make_palindromes` be a function if I'm only ever doing it once? Twice?
  - Should the loop be inside the function, or outside?
  - Or should there be TWO functions, one that loops over the other??

- In my personal opinion, `make_palindromes` does a bit too much to be understandable.
- I prefer this:

In [112]:
def make_palindrome(name):
    return name + name[::-1]

In [113]:
make_palindrome("milad")

'miladdalim'

- From here, we want to "apply `make_palindrome` to every element of a list"
- We could do this with list comprehension

In [114]:
[make_palindrome(name) for name in names]

['miladdalim', 'armannamra', 'tiffanyynaffit']

- Or there is also the built-in `map()` function, which does exactly this: it applies a function to every element of an iterable

In [115]:
list(map(make_palindrome, names))

['miladdalim', 'armannamra', 'tiffanyynaffit']

Other function design considerations:

- Should we print output or produce plots inside or outside functions? 
  - I would usually say outside, because this is a "side effect" of sorts
  - Although there are certainly cases where I do plot or print within a function
  - In these cases I usually add a function argument such as `plot=False` or `verbose=0` that allows users to control this behaviour.
- Should the function do one thing or many things?
  - This is a tough one, hard to answer in general, depends on the situation and programming style
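
As a sketch of the `verbose` idea, the printing can be switched off by default so that the function has no visible side effect unless the caller asks for it. The function name `palindromes_with_logging` is hypothetical.

```python
def palindromes_with_logging(names, verbose=False):
    palindromes = []
    for name in names:
        palindrome = name + name[::-1]
        if verbose:  # print only when the caller opts in
            print(f"{name} -> {palindrome}")
        palindromes.append(palindrome)
    return palindromes


quiet = palindromes_with_logging(["milad", "arman"])
loud = palindromes_with_logging(["milad", "arman"], verbose=True)
```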

## Generators

- Recall list comprehension from earlier in the lecture

In [116]:
[n for n in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

- Comprehensions evaluate the entire expression at once, and then return the full data product
- Sometimes, we want to work with just one part of our data at a time, for example, when we can't fit all of our data in memory (I'll show an example of this a little later)
- For this, we can use generators (you'll see more of these when we get to DSCI 572!)

In [117]:
(n for n in range(10))

<generator object <genexpr> at 0x10d04b220>

- Notice that we just created a `generator object`
- Generator objects are like a "recipe" for generating values
- They don't actually do any computation until they are asked to
- We can get values from a generator in three main ways:
    - Using `next()`
    - Using `list()`
    - Looping

In [118]:
gen = (n for n in range(10))

In [119]:
next(gen)

0

In [120]:
next(gen)

1

- But once the generator is exhausted, it will no longer return values; calling `next()` again raises `StopIteration`:

In [121]:
gen = (n for n in range(10))
for i in range(11):
    print(next(gen))

0
1
2
3
4
5
6
7
8
9


StopIteration: 
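
If you do call `next()` by hand, the `try`/`except` pattern from earlier in the lecture can catch `StopIteration`, and `next()` also accepts a default value to return instead of raising. A minimal sketch (the names `values` and `leftover` are illustrative):

```python
gen = (n for n in range(3))

values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration:  # raised once the generator is exhausted
        break

# with a default, next() returns it instead of raising StopIteration
leftover = next(gen, None)

print(values, leftover)
```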

- We can see all the values of a generator using `list()` but this defeats the purpose of using a generator in the first place

In [122]:
gen = (n for n in range(10))
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

- Finally, we can loop over generator objects too

In [123]:
gen = (n for n in range(10))
for i in gen:
    print(i)

0
1
2
3
4
5
6
7
8
9


- Above, we saw how to create a generator object using comprehension syntax but with parentheses
- We can also create a generator using functions and the `yield` keyword (instead of the `return` keyword)

In [124]:
def gen():
    for n in range(10):
        yield (n, n ** 2)

In [125]:
g = gen()
print(next(g))
print(next(g))
print(next(g))
list(g)

(0, 0)
(1, 1)
(2, 4)


[(3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81)]

- We'll work with generators more when we get to 572 and other ML courses where we are often working with large datasets (images are especially memory-consuming!)
- To keep them in the back of your mind, here is some real-world motivation for when a generator might be useful
- Say we want to create a list of dictionaries containing information about houses in Canada

In [126]:
# !conda install -y memory_profiler

In [131]:
import random  # we'll learn about imports next lecture
import time
import memory_profiler
city = ['Vancouver', 'Toronto', 'Ottawa',
        'Montreal', 'Edmonton', 'Calgary']

In [132]:
def house_list(n):
    houses = []
    for i in range(n):
        house = {
            'id': i,
            'city': random.choice(city),
            'bedrooms': random.randint(1, 5),
            'bathrooms': random.randint(1, 3),
            'price ($1000s)': random.randint(300, 1000)
        }
        houses.append(house)
    return houses

In [133]:
house_list(2)

[{'id': 0,
  'city': 'Ottawa',
  'bedrooms': 5,
  'bathrooms': 2,
  'price ($1000s)': 868},
 {'id': 1,
  'city': 'Toronto',
  'bedrooms': 4,
  'bathrooms': 3,
  'price ($1000s)': 648}]

- What happens if we want to create a list of 1,000,000 houses?
- How much time/memory will it take?

In [134]:
start = time.time()
print(f"Memory usage before: {memory_profiler.memory_usage()[0]:.0f} MB")

result_list = house_list(1_000_000)

print(f"Memory usage after: {memory_profiler.memory_usage()[0]:.0f} MB")
print(f"Time taken: {time.time() - start:.2f}s")

Memory usage before: 82 MB
Memory usage after: 382 MB
Time taken: 1.80s


In [135]:
def house_generator(n):
    for i in range(n):
        house = {
            'id': i,
            'city': random.choice(city),
            'bedrooms': random.randint(1, 5),
            'bathrooms': random.randint(1, 3),
            'price ($1000s)': random.randint(300, 1000)
        }
        yield house

In [136]:
start = time.time()
print(f"Memory usage before: {memory_profiler.memory_usage()[0]:.0f} MB")

result_gen = house_generator(1_000_000)

print(f"Memory usage after: {memory_profiler.memory_usage()[0]:.0f} MB")
print(f"Time taken: {time.time() - start:.2f}s")

Memory usage before: 382 MB
Memory usage after: 382 MB
Time taken: 0.21s


In [137]:
next(result_gen)

{'id': 0,
 'city': 'Ottawa',
 'bedrooms': 2,
 'bathrooms': 3,
 'price ($1000s)': 990}

- Although, if we used `list()` to extract all of the generator values, we'd lose our memory savings

In [138]:
print(f"Memory usage before: {memory_profiler.memory_usage()[0]:.0f} MB")

result_gen = list(house_generator(1_000_000))

print(f"Memory usage after: {memory_profiler.memory_usage()[0]:.0f} MB")

Memory usage before: 382 MB
Memory usage after: 682 MB
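
To keep the memory savings, consume the generator one item at a time and keep only a running summary. A minimal sketch, assuming a simplified generator `price_generator` (hypothetical, and seeded so the run is repeatable) rather than the `house_generator` above:

```python
import random


def price_generator(n, seed=0):
    rng = random.Random(seed)  # local random generator with a fixed seed
    for _ in range(n):
        yield rng.randint(300, 1000)  # price in $1000s


total = 0
count = 0
for price in price_generator(100_000):
    # only the current price and the running totals are held in memory
    total += price
    count += 1

mean_price = total / count
print(f"Mean price over {count} houses: {mean_price:.1f} ($1000s)")
```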


## Docstrings

- One problem we never really solved when talking about writing good functions was: **"4. It is hard to understand what it does just by looking at it"**
- Enter the idea of function documentation, called "docstrings"
- The [docstring](https://www.python.org/dev/peps/pep-0257/) goes right after the `def` line and is wrapped in **triple quotes** `"""`

In [139]:
def make_palindrome(string):
    """Turns the string into a palindrome by concatenating itself with a reversed version of itself."""
    return string + string[::-1]

- In IPython/Jupyter, we can use `?` to view the documentation string of any function in our environment.

In [140]:
make_palindrome?

Signature: make_palindrome(string)
Docstring: Turns the string into a palindrome by concatenating itself with a reversed version of itself.
File:      /var/folders/qm/c_scj_0n7vj7r36900wc3j140000gn/T/ipykernel_69946/2916775673.py
Type:      function


- But, even easier than that, if your cursor is in the function parentheses, you can use the shortcut `shift + tab` to open the docstring at will

In [141]:
make_palindrome('uncomment and try pressing shift+tab here.')

'uncomment and try pressing shift+tab here..ereh bat+tfihs gnisserp yrt dna tnemmocnu'

### Docstring structure

- General docstring convention in Python is described in [PEP 257 - Docstring Conventions](https://www.python.org/dev/peps/pep-0257/). 
- There are many different docstring style conventions used in Python.
- The exact style you use can be important for helping you to render your documentation (more on that in a later course), or for helping your IDE parse your documentation.
- Common styles include:

1. **Single-line**: If it's short, then just a single line describing the function will do (as above).
2. **reST style**: see [here](https://www.python.org/dev/peps/pep-0287/).
3. **NumPy/SciPy style**: see [here](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html). (RECOMMENDED! and MDS-preferred)
4. **Google style**: see [here](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html#example-google).

The NumPy/SciPy style:

In [142]:
def function_name(param1, param2, param3):
    """First line is a short description of the function.

    A paragraph describing in a bit more detail what the
    function does and what algorithms it uses and common
    use cases.

    Parameters
    ----------
    param1 : datatype
        A description of param1.
    param2 : datatype
        A description of param2.
    param3 : datatype
        A longer description because maybe this requires
        more explanation and we can use several lines.

    Returns
    -------
    datatype
        A description of the output, datatypes and behaviours.
        Describe special cases and anything the user needs to
        know to use the function.

    Examples
    --------
    >>> function_name(3, 8, -5)
    2.0
    """

In [143]:
def make_palindrome(string):
    """Turns the string into a palindrome by concatenating
    itself with a reversed version of itself.

    Parameters
    ----------
    string : str
        The string to turn into a palindrome.

    Returns
    -------
    str
        string concatenated with a reversed version of string

    Examples
    --------
    >>> make_palindrome('arman')
    'armannamra'
    """
    return string + string[::-1]
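
The `Examples` section uses the interactive-prompt format (`>>>`) that the standard library's `doctest` module understands, so the example can double as a small test. A minimal sketch, assuming a hypothetical helper `run_examples` (not part of the lecture) that runs the examples found in one function's docstring:

```python
import doctest


def make_palindrome(string):
    """Turns the string into a palindrome by concatenating
    itself with a reversed version of itself.

    Examples
    --------
    >>> make_palindrome('arman')
    'armannamra'
    """
    return string + string[::-1]


def run_examples(func):
    """Run the doctest examples in func's docstring.

    Hypothetical helper for illustration; returns (failed, attempted).
    """
    parser = doctest.DocTestParser()
    # The globals dict makes the function visible to the example code.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(run_examples(make_palindrome))  # (0, 1): one example ran, none failed
```

`doctest` compares the expected output with the printed result character for character, so the expected line has to be written exactly as Python would display it.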

In [144]:
make_palindrome?

Signature: make_palindrome(string)
Docstring:
Turns the string into a palindrome by concatenating
itself with a reversed version of itself.

Parameters
----------
string : str
    The string to turn into a palindrome.

Returns
-------
str
    string concatenated with a reversed version of string

Examples
--------
>>> make_palindrome('arman')
'armannamra'
File:      /var/folders/qm/c_scj_0n7vj7r36900wc3j140000gn/T/ipykernel_69946/3201762563.py
Type:      function


### Docstrings in your labs

In MDS we will accept:

- One-line docstrings for very simple functions.
- Either the PEP 257 or NumPy/SciPy style for bigger functions.
  - But we think the NumPy/SciPy style is more common in the wild so you may want to get into the habit of using it.

### Docstrings with optional arguments

- When specifying the parameters, we specify the defaults for optional arguments:

In [145]:
# NumPy/SciPy style
def repeat_string(s, n=2):
    """
    Repeat the string s, n times.

    Parameters
    ----------
    s : str
        the string
    n : int, optional
        the number of times, by default 2

    Returns
    -------
    str
        the repeated string

    Examples
    --------
    >>> repeat_string("Blah", 3)
    'BlahBlahBlah'
    """
    return s * n
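
A short sketch of how the optional argument behaves in practice, and one way to confirm that the default stated in the docstring matches the code. The docstring is shortened here for space, and `default_n` is just an illustrative variable name:

```python
import inspect


def repeat_string(s, n=2):
    """Repeat the string s, n times (docstring shortened for this sketch)."""
    return s * n


# Omitting n falls back to the documented default of 2.
print(repeat_string("Blah"))       # BlahBlah
print(repeat_string("Blah", n=3))  # BlahBlahBlah

# inspect.signature exposes the default value, so it can be compared
# against what the docstring claims.
default_n = inspect.signature(repeat_string).parameters["n"].default
print(default_n)  # 2
```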

### Type hinting

- [Type hinting](https://docs.python.org/3/library/typing.html) is exactly what it sounds like: it hints at the data types of a function's arguments and return value.
- You can indicate the type of an argument using the syntax `argument: dtype`, and the type of the return value using `def func() -> dtype`.
- Let's see an example:

In [146]:
# NumPy/SciPy style
def repeat_string(s: str, n: int = 2) -> str:  # <- note the type hinting here
    """
    Repeat the string s, n times.

    Parameters
    ----------
    s : str
        the string
    n : int, optional (default = 2)
        the number of times

    Returns
    -------
    str
        the repeated string

    Examples
    --------
    >>> repeat_string("Blah", 3)
    'BlahBlahBlah'
    """
    return s * n

In [147]:
repeat_string?

Signature: repeat_string(s: str, n: int = 2) -> str
Docstring:
Repeat the string s, n times.

Parameters
----------
s : str
    the string
n : int, optional (default = 2)
    the number of times

Returns
-------
str
    the repeated string

Examples
--------
>>> repeat_string("Blah", 3)
'BlahBlahBlah'
File:      /var/folders/qm/c_scj_0n7vj7r36900wc3j140000gn/T/ipykernel_69946/2733892460.py
Type:      function


- Type hints help your users and your IDE identify the expected dtypes and catch bugs.
- They are just another level of documentation.
- They do not force users to use that dtype. For example, I can still pass a `dict` to `repeat_string` if I want to:

In [148]:
repeat_string({'key_1': 1, 'key_2': 2})

TypeError: unsupported operand type(s) for *: 'dict' and 'int'

- The `TypeError` above is raised by the `*` operation inside the function, not by the type hints; Python does not check hints at runtime.
- Further, IDEs (e.g., VS Code) can read your type hints and warn you if you pass a different dtype to the function.
- You don't **have** to use type hinting in MDS, but it is **highly recommended** that you get into the practice of doing so.
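
To make the "hints are only documentation" point concrete, here is a small sketch. It reads the hints back with `typing.get_type_hints` and then calls the function with a `list`, which runs without complaint because `list * int` is a valid operation. The docstring is shortened for space:

```python
from typing import get_type_hints


def repeat_string(s: str, n: int = 2) -> str:
    """Repeat the string s, n times (docstring shortened for this sketch)."""
    return s * n


# Hints are stored as metadata on the function ...
hints = get_type_hints(repeat_string)
print(hints)  # {'s': <class 'str'>, 'n': <class 'int'>, 'return': <class 'str'>}

# ... but Python does not check them when the function is called.
# A list is not a str, yet this call succeeds and returns a list.
result = repeat_string([1, 2], 3)
print(result)  # [1, 2, 1, 2, 1, 2]
```

An IDE or static type checker would typically flag the second call, even though it runs.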

### Automatically generated documentation

- As mentioned before, docstring formatting matters if you want standard tools to render your documentation into readable, accessible documents, using libraries like [Sphinx](http://www.sphinx-doc.org/en/master/), [pydoc](https://docs.python.org/3.7/library/pydoc.html) or [Doxygen](http://www.doxygen.nl/).
- For example: compare this [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) with this [code](https://github.com/scikit-learn/scikit-learn/blob/1495f6924/sklearn/neighbors/classification.py#L23).
- Notice the similarities? The webpage was automatically generated because the authors used standard conventions for docstrings!
- You'll have to use some string methods to extract information from a docstring in lab 1.
- The [website for this course](https://pages.github.ubc.ca/mds-2021-22/DSCI_511_py-prog_instructors/) is built with **Jupyter Book** which leverages some of the above libraries.
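
As a small-scale illustration of generated documentation, the standard library's `pydoc` can build a plain-text help page directly from a function's signature and docstring. This is only a sketch of the idea; Sphinx and Jupyter Book have their own, more elaborate pipelines. The variable name `page` is just for illustration:

```python
import pydoc


def repeat_string(s: str, n: int = 2) -> str:
    """
    Repeat the string s, n times.

    Parameters
    ----------
    s : str
        the string
    n : int, optional
        the number of times, by default 2

    Returns
    -------
    str
        the repeated string
    """
    return s * n


# render_doc combines the signature and the docstring into one page;
# pydoc.plaintext avoids terminal-specific formatting in the result.
page = pydoc.render_doc(repeat_string, renderer=pydoc.plaintext)
print(page)
```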