<a href="https://colab.research.google.com/github/gt-cse-6040/skills_oh_week_01/blob/main/week01_session01_NB01_efficiency_readability.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Efficiency and Readability, Function Development

_Main topics covered during today's session:_


This NB:
1. **Asymptotic running time (or cost)** and **cost (or work) efficiency**, e.g., algorithms that scale like $\mathcal{O}(n)$ vs. $\mathcal{O}(n^2)$
2. **Slicing notation**
3. **List comprehensions**
4. **Helper functions**


Next NB:

5. **Assert yourself:** Does my code do what I think it should?


## Example: Cumulative Sums ##

Consider a sequence of $n$ values, $[x_0, x_1, x_2, \ldots, x_{n-1}]$. Its _cumulative sum_ (or _running sum_) is 

$$[x_0, \underbrace{x_0+x_1}, \underbrace{x_0+x_1+x_2}, \ldots, \underbrace{x_0+x_1+x_2+\cdots+x_{n-1}}].$$

For example, the list

```python
    [5, 3, -4, 20, 2, 9, 0, -1]
```

has the following cumulative sum:

```python
    [5, 8, 4, 24, 26, 35, 35, 34]
```

In [None]:
# Define the running example used throughout this notebook:

x = [5, 3, -4, 20, 2, 9, 0, -1]

## A (very) naïve implementation ##

1. Let $y = [y_0, y_1, \ldots, y_{n-1}]$ denote the output sequence.
2. For each $y_i$, calculate $y_i = x_0 + x_1 + \cdots + x_i$

In [None]:
def cumulative_sum__v0(x):   # `x = [x[0], x[1], ..., x[n-1]]`
    y = []                   # Holds the output, `y = [y[0], y[1], ..., y[n-1]]`; initially empty
    for i in range(len(x)):  # Computes `y[i]` for every `i`
        s = 0                # Lines 4-6: Compute `s = x[0] + x[1] + ... + x[i]`
        for k in range(i+1):
            s += x[k]
        y.append(s)          # Line 7: `y[i] = s`
    return y

y = cumulative_sum__v0(x)
y # In Jupyter, like `print(y)`

### This implementation suffers from two problems:

1. _Efficiency_. For an $n$-element input list, its **asymptotic cost (or work)** scales like $\mathcal{O}(n^2)$. That should seem suspicious: the calculation seems simple enough, involving just $n$ inputs and $n$ outputs, and "feels like" it should only require a similar number of operations. That is, we should guess that there is a method that scales more like $\mathcal{O}(n)$.

2. _Readability_. Let's set aside the fact that it is slow and uses single-character variable names. It also has several levels of nesting and multiple variables to track. The more nesting there is and the more variables there are, the more effort a reader needs to understand what is happening.
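
To make the quadratic claim concrete, here is a minimal sketch that counts the additions the nested loops perform. The helper name `count_additions__v0` is hypothetical and not part of the original notebook. For output position `i` the inner loop runs `i+1` times, so the total should be $1 + 2 + \cdots + n = n(n+1)/2$.

```python
def count_additions__v0(n):
    """Count the `s += x[k]` operations the nested-loop version performs on n inputs."""
    count = 0
    for i in range(n):          # one pass per output element
        for k in range(i + 1):  # i+1 additions to build the i-th partial sum
            count += 1
    return count

for n in [8, 16, 32]:
    # The measured count should match the closed form n*(n+1)/2.
    print(n, count_additions__v0(n), n * (n + 1) // 2)
```

Doubling `n` roughly quadruples the count, which is the signature of $\mathcal{O}(n^2)$ growth.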

### Revisions for readability ###

We should fix the algorithm first, but as an exercise, let's start by tackling some of the readability issues. We'll write several versions and use a few Python idioms from the bootcamp to "clean it up."

**Slices and direct iteration over values.** In `cumulative_sum__v0`, consider the loop over `k` in lines 5-6:

```python
    for k in range(i+1):
        s += x[k]
```

This loop has a higher-level interpretation: it performs a sum on the first `i+1` elements of `x`. Python has two basic constructs that can help express this higher-level idea more concisely:

1. **Slices for sublists** — A sublist of `x`, beginning at position `a` and continuing up until but _excluding_ `b` in steps of size `s`, is written compactly as `x[a:b:s]`, with appropriate default values if any of `a`, `b`, or `s` are omitted.
2. **Direct iteration over values** — Rather than iterating over positions and then indexing using the positions, you can iterate over the values of many types of collections directly.
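
As a quick illustration of these two constructs in isolation, here is a small sketch using the running example list. The variable names (`first_three`, `middle`, `every_other`, `total`) are chosen just for this illustration.

```python
x = [5, 3, -4, 20, 2, 9, 0, -1]

# Slices: x[a:b:s] starts at a, stops before b, and steps by s.
first_three = x[:3]    # a defaults to the start of the list
middle = x[2:5]        # positions 2, 3, and 4
every_other = x[::2]   # a and b default to the two ends; step of 2

# Direct iteration: loop over the values rather than over positions.
total = 0
for e in first_three:
    total += e

print(first_three, middle, every_other, total)
```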

The next version applies these two ideas.

In [None]:
def cumulative_sum__v1(x):
    y = []
    for i in range(len(x)):
        s = 0
        for e in x[:i+1]:
            s += e
        y.append(s)
    return y
        
cumulative_sum__v1(x)

**Higher-level primitives, e.g., `sum`.** Of course, the innermost loop computes a sum, for which Python has a built-in function called `sum`. Knowing that such functions exist, or knowing how to find them by reading documentation or doing a "just-in-time" Google search, is a skill you should be building.

In [None]:
def cumulative_sum__v2(x):
    y = []
    for i in range(len(x)):
        s = sum(x[:i+1])
        y.append(s)
    return y

cumulative_sum__v2(x)

The pair of lines `s = ...` and `y.append(s)` could be condensed into `y.append(sum(x[:i+1]))`. However, this compact form is arguably less readable.

In [None]:
def cumulative_sum__v2b(x):
    y = []
    for i in range(len(x)):
        y.append(sum(x[:i+1]))
    return y

cumulative_sum__v2b(x)

**List comprehensions.** The `__v2b` version is an example of a common pattern or idiom, which is to generate a new collection where every element can (in principle) be generated independently of any other.

Consider the following general loop, which for each value in `x` applies a function `f` to it:

```python
y = []
for e in x:
    y.append(f(e))
```

**************

Observe that `f(e)` depends only on a given `e`, and not any other elements of `x`. Such constructions can be written in the compact form,

```python
y = [f(e) for e in x]
```

#### This form is a _list comprehension_. 

You can think of it as syntactic sugar, but actually, it's more than that. It expresses how to generate an entire collection of objects from a collection of input objects by applying a function to every element of the input.
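
To see that the two forms produce the same result, here is a minimal sketch with a concrete `f`. The choice of squaring is an assumption made only for illustration; any function of a single element would do.

```python
def f(e):
    # Hypothetical example function: square the input.
    return e * e

x = [5, 3, -4, 20, 2, 9, 0, -1]

# Explicit loop form.
y_loop = []
for e in x:
    y_loop.append(f(e))

# Equivalent list comprehension.
y_comp = [f(e) for e in x]

print(y_comp)
assert y_loop == y_comp
```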

> A Python implementation will execute the list comprehension "serially," that is, element by element in the same order as the explicit loop above. However, the _intent_ of a list comprehension is different.
>
> Regarding "intent," consider an analogy to natural language. It is entirely correct to say, "In my first meal of the day, that is, the one that I eat shortly after I wake up in the morning, I had ..." But you can communicate your intent more quickly by saying, "For breakfast, I had ..."

Applying list comprehensions to our naïve cumulative sum:

In [None]:
def cumulative_sum__v3(x):
    return [sum(x[:i+1]) for i in range(len(x))]

cumulative_sum__v3(x)

Cute? Yes! Readable? Hmmm...


**Helper functions.** One way to make your code a little more readable is to break it up into smaller functions with meaningful names.

In our cumulative sum example, the sublist `x[:i+1]` is intended to select the elements up to `i` (inclusive). So, let's use a little (inner) helper function to communicate this intent.

> Pay attention to how this code is constructed, and in particular, what names are visible in what parts of the program.

In [None]:
def cumulative_sum__v4(x):
    def up_to(i):
        return x[:i+1]
    return [sum(up_to(i)) for i in range(len(x))]

cumulative_sum__v4(x)
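
On the question of which names are visible where, the following sketch repeats the function with comments and then probes the scoping from outside. The flag `visible` is a hypothetical name used only for this check.

```python
def cumulative_sum__v4(x):
    def up_to(i):
        # `x` is not a parameter of `up_to`; it is found in the enclosing function's scope.
        return x[:i+1]
    return [sum(up_to(i)) for i in range(len(x))]

print(cumulative_sum__v4([1, 2, 3]))

# `up_to` is defined inside `cumulative_sum__v4`, so it is not a name at this level.
try:
    up_to(0)
    visible = True
except NameError:
    visible = False

print(visible)
```

Keeping the helper inside the function lets it use `x` without an extra argument, at the cost of not being reusable (or separately testable) elsewhere.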

### Revisions for efficiency ###

The bigger problem with our original code is its _cost_ or _running time_, which scales quadratically with the size of the input. There is a lot of redundant computation, since the partial sums overlap:

$$\begin{array}{rcl}
  y_0 & = & x_0 \\
  y_1 & = & x_0 + x_1 \\
  y_2 & = & x_0 + x_1 + x_2 \\
      & \cdots & \\
  y_{n-1} & = & x_0 + x_1 + x_2 + \cdots + x_{n-1}
\end{array}$$
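
One way to see the redundancy is that each row repeats the previous row and adds a single new term. Written as a recurrence (a restatement of the rows above, not a new result):

$$y_0 = x_0, \qquad y_i = y_{i-1} + x_i \quad \text{for } i = 1, 2, \ldots, n-1.$$

Given the previous output, each new output costs just one addition.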



As you may have already deduced, you can remove the redundancy by maintaining a running partial sum and updating it as you generate each output element:

In [None]:
def cumulative_sum__v5(x):
    s = 0
    y = []
    for e in x:
        s += e
        y.append(s)
    return y

cumulative_sum__v5(x)

This version is reasonable. It touches every entry of `x` just once (line 4), and each time it does so, it performs one addition (line 5). Therefore, the total number of additions is proportional to the size of the input, $\mathcal{O}(n)$.
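
If you would like to observe the difference empirically, the sketch below times a prefix-re-summing version against a running-sum version using the standard `timeit` module. The function names and input sizes are assumptions chosen for illustration, and the measured times will vary by machine, so no expected output is shown.

```python
import timeit

def cumulative_sum_quadratic(x):
    # Same idea as the naive versions: re-sum a prefix for every output element.
    return [sum(x[:i+1]) for i in range(len(x))]

def cumulative_sum_linear(x):
    # Same idea as the running-sum version: one addition per input element.
    s = 0
    y = []
    for e in x:
        s += e
        y.append(s)
    return y

for n in [250, 500, 1000]:
    data = list(range(n))
    # Both approaches must agree before their timings are worth comparing.
    assert cumulative_sum_quadratic(data) == cumulative_sum_linear(data)
    t_quad = timeit.timeit(lambda: cumulative_sum_quadratic(data), number=5)
    t_lin = timeit.timeit(lambda: cumulative_sum_linear(data), number=5)
    print(f"n={n:5d}  quadratic: {t_quad:.5f}s  linear: {t_lin:.5f}s")
```

As `n` doubles, you should expect the quadratic column to grow noticeably faster than the linear one.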

***********************

One minor issue is the "extra" intermediate value, `s`, which adds to what a reader has to track. Since the output list entries are themselves partial sums, once we know an output element, we know the partial sum needed to produce the next output element.

***********************

Here is a version that eliminates the use of `s`. It uses two more idiomatic forms from Python:

1. To refer to elements of a list relative to the _end_ of the list, use negative indices. For example, `y[-1]` would refer to the last element of `y`, `y[-2]` would refer to the second-to-last element of `y`, and so on.
2. This implementation "seeds" the output with a dummy partial sum that holds the value of `0`. But when we are done, we don't need that any more. To remove it, we can use the `.pop` method associated with `list` objects.
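
Here is a small sketch of both idioms on a short list. The list contents and variable names are made up for illustration.

```python
y = [0, 5, 8, 4]

last = y[-1]             # last element
second_to_last = y[-2]   # second-to-last element

removed = y.pop(0)       # removes and returns the element at position 0

print(last, second_to_last, removed, y)
```

Note that `pop(0)` has to shift every remaining element down by one position, so its cost grows with the length of the list. Calling it once at the end, as the implementation below does, keeps the overall cost linear.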

In [None]:
def cumulative_sum__v6(x):
    y = [0]
    for e in x:
        y.append(y[-1] + e)
    y.pop(0) 
    return y

print(cumulative_sum__v6(x))

> As a final tweak, you can also use _slicing_ to extract all computed elements other than the first. How? (_Answer:_ Rather than use `y.pop(0)` followed by `return y`, try `return y[1:]`.)
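
A minimal sketch of that final tweak follows. The name `cumulative_sum__v7` is hypothetical and simply continues the numbering used above. As a cross-check, the result is compared against `itertools.accumulate` from the standard library, which also produces running sums.

```python
from itertools import accumulate

def cumulative_sum__v7(x):
    # Same as the seeded version, but slice off the dummy 0 instead of popping it.
    y = [0]
    for e in x:
        y.append(y[-1] + e)
    return y[1:]

x = [5, 3, -4, 20, 2, 9, 0, -1]
print(cumulative_sum__v7(x))

# Cross-check against the standard library's running-sum iterator.
assert cumulative_sum__v7(x) == list(accumulate(x))
```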