# How much, how long?

In [1]:
from sys import getsizeof

help(getsizeof)

Help on built-in function getsizeof in module sys:

getsizeof(...)
    getsizeof(object, default) -> int
    
    Return the size of object in bytes.



In [2]:
from timeit import timeit

help(timeit)

Help on function timeit in module timeit:

timeit(stmt='pass', setup='pass', timer=<built-in function perf_counter>, number=1000000, globals=None)
    Convenience function to create Timer object and call timeit method.



# We can ask how big a Python object is

### Be careful of the reference versus object distinction: `getsizeof` counts the container itself, not the objects it refers to

In [3]:
from sys import getsizeof
print('[]:', getsizeof([]))
print('1:', getsizeof(1))
print('[1]:', getsizeof([1]))

[]: 64
1: 28
[1]: 72
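
To make the reference versus object distinction concrete, here is a minimal sketch. The variable names are illustrative, and exact byte counts vary with Python version and platform. A list stores references, so `getsizeof` on the list should not change with the size of what those references point to:

```python
from sys import getsizeof

small = "a"
large = "a" * 1_000_000  # a string of roughly one megabyte

# Each list holds a single reference, so both lists report the same size,
# even though the objects they refer to differ enormously.
print(getsizeof([small]) == getsizeof([large]))

# The large string on its own is far bigger than the list that refers to it.
print(getsizeof(large) > getsizeof([large]))
```

To estimate the total footprint of a container, you would have to add up the sizes of the objects it refers to yourself.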


# The standard library timeit module makes some poor choices in methodology: it reports only the best run, with no measure of spread

In [4]:
%%bash

python -m timeit "5000 in range(100)"

2000000 loops, best of 5: 179 nsec per loop


In [5]:
%%bash

python -m perf timeit "5000 in range(100)"

.....................
Mean +- std dev: 185 ns +- 9 ns
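
The same measurement can also be taken from inside Python. The following is a rough sketch using `timeit.repeat` from the standard library; the loop count of 100,000 is an arbitrary choice made here, and the numbers will differ from machine to machine:

```python
from timeit import repeat

LOOPS = 100_000

# Take 5 measurements, each timing LOOPS executions of the statement.
measurements = repeat("5000 in range(100)", number=LOOPS, repeat=5)

# Convert each total (in seconds) to nanoseconds per loop.
per_loop_ns = [m / LOOPS * 1e9 for m in measurements]
print(f"best: {min(per_loop_ns):.0f} ns, worst: {max(per_loop_ns):.0f} ns")
```

Printing both the best and the worst measurement gives a crude sense of how noisy the timing is.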


# Run timeit on your computer to find out in rough terms how long it takes to print "hello noisebridge!".

# Put the result on the whiteboard

# Every operating system has a different way of asking how much space a process is using 

# : (

### psutil is a Python library that tries to provide a common interface for asking about your system

In [6]:
import os

import psutil  # Not in the standard library!

process = psutil.Process(os.getpid())
process.memory_info()

pmem(rss=48652288, vms=531881984, shared=14782464, text=8192, lib=0, data=101851136, dirty=0)

# "How efficient is this code?" is a hard and poorly defined question.  

### It depends on what we care about and what computer we are using 

# Big O notation

### Big O notation is a way of categorizing code.

### Big O only examines how cost grows as you add data.

# The memory occupied by a list with `N` elements

### 64 + 8 * N bytes
### &rarr; O(1) + O(N) bytes
### &rarr; O(N) bytes
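
As a rough check of the formula above, the sketch below measures lists of increasing length. It assumes CPython, where `[None] * n` allocates exactly `n` slots; the constant term depends on the Python version, so only the growth per element is compared:

```python
from sys import getsizeof

for n in (0, 1_000, 2_000, 3_000):
    print(n, getsizeof([None] * n))

# If memory is linear in N, equal steps in N cost equal numbers of bytes.
step_1 = getsizeof([None] * 2_000) - getsizeof([None] * 1_000)
step_2 = getsizeof([None] * 3_000) - getsizeof([None] * 2_000)
print(step_1, step_2)
```

Lists built by repeated `append` over-allocate, so their sizes grow in jumps, but the overall trend is still linear.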

# The time it takes to print `N` numbers

### 'time to set up the code' + 'time to print one number' * N
### &rarr; O(1) + O(N) time
### &rarr; O(N) time

In [None]:
N = 10
for i in range(N):
    print(i)

# Use sys.getsizeof to determine the big O category for memory of
# * range(N) 
# * list(range(N))

# Big O complexity signifies

### The extra resources it takes to process an extra bit of data, 
### in the worst case, 
### if we are already processing many elements.

# What is the big O complexity for execution time of the following code?

In [22]:
def find_max(collection):
    """Print the maximum element of `collection` (N elements) and return its index."""
    collection_max = collection[0]
    max_index = 0
    for i, item in enumerate(collection):
        if collection_max < item:
            collection_max = item
            max_index = i
    
    print(collection_max)
    return max_index  # index of the maximum, not the last index visited

In [9]:
def find_duplicates(collection):
    """Print any elements of `collection` that occur twice."""
    
    for i, item1 in enumerate(collection):
        for j, item2 in enumerate(collection):
            if i != j and item1 == item2:
                print(item1)

In [23]:
def sort(collection):
    """Print elements of `collection` in descending order."""
    new_collection = list(collection)
    for i in range(len(new_collection)):
        index = find_max(new_collection)
        del new_collection[index]


# Python built-in collections give you control over time complexity
* Accessing an element by index in a list or tuple is O(1)
* Finding an element in a list or tuple is O(N)
* Finding an element in a set, or a key in a dictionary, is O(1) on average
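
The difference between the last two bullet points can be seen with a quick timing. This is an illustrative sketch, not a rigorous benchmark; the collection size and repeat count are arbitrary choices made here:

```python
from timeit import timeit

N = 100_000
as_list = list(range(N))
as_set = set(as_list)

# -1 is absent, so the list has to be scanned to the end (the worst case),
# while the set only needs a hash lookup.
list_time = timeit(lambda: -1 in as_list, number=50)
set_time = timeit(lambda: -1 in as_set, number=50)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

Try changing `N` and watch which of the two timings changes with it.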

Given an array of integers, find the maximum sum of any contiguous slice. If the maximum is negative, return zero.

```
-------------------------------------------------------
| 31 | -41 | 59 | 26 | -53 | 58 | 97 | -93 | -23 | 84 |
-------------------------------------------------------
           ↑                         ↑
           2                         7
```

Here the slice `[2:7]` has a sum of 187.

What is the worst strategy we could use to find the maximum sum?

In [25]:
values = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]

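def solution_1(values):
    current_max = 0
    for i in range(len(values)):
        # j is an exclusive end index, so it must reach len(values)
        # for slices that include the last element to be considered
        for j in range(i, len(values) + 1):
            current_max = max(current_max, sum(values[i:j]))  # O(N)
            
    return current_max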
        
print(solution_1(values))

187


What is the performance of this algorithm?
1. $O(n)$
2. $O(n \log n)$
3. $O(n ^ 2)$
4. $O(n ^ 3)$
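
One way to check your answer without a stopwatch is to count work instead of measuring time. The sketch below uses a hypothetical helper, `count_additions`, which counts how many numbers get added when every slice of an `n`-element list is summed from scratch:

```python
def count_additions(n):
    """Count the numbers summed when every slice of an n-element list is summed from scratch."""
    total = 0
    for i in range(n):
        for j in range(i + 1, n + 1):
            total += j - i  # summing values[i:j] touches j - i numbers
    return total

for n in (10, 20, 40, 80):
    print(n, count_additions(n))
```

Watch how the count changes each time `n` doubles, and compare that with the four options above.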

# Duplicate work

The inner loop adds the same numbers together over and over.  There are two options to avoid this.

1. Accumulate the sum as we iterate, instead of recomputing it.

2. Calculate sums ahead of time, and look them up later.

# Duplicate work

Adapt your implementation by accumulating the sum while iterating, or by calculating the sums ahead of time.

For the second option, you can use
```python
sum(array[i:j]) == sum(array[:j]) - sum(array[:i])
```
to find the sum of a slice from the cumulative sum.
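
As a small sketch of the lookup idea, the cumulative sums can be built once with `itertools.accumulate` and a leading zero. The names `prefix` and `slice_sum` are made up for this example:

```python
from itertools import accumulate

values = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]

# prefix[k] holds sum(values[:k]); the leading 0 stands for the empty slice.
prefix = [0] + list(accumulate(values))

def slice_sum(i, j):
    """Return sum(values[i:j]) in O(1) using the precomputed sums."""
    return prefix[j] - prefix[i]

print(slice_sum(2, 7), sum(values[2:7]))
```

Building `prefix` costs O(N) once, after which every slice sum is a single subtraction.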

In [26]:
def solution_2(values):
    current_max = 0
    for i in range(len(values)):
        current_sum = 0
        for val in values[i:]:
            current_sum += val
            current_max = max(current_max, current_sum)
    
    return current_max
            
print(solution_2(values))

187


In [29]:
def solution_3(values):
    cumulative = [0] * (len(values) + 1)
    current_sum = 0

    # This is equivalent to itertools.accumulate plus a leading zero
    for i, v in enumerate(values):
        current_sum += v
        cumulative[i + 1] = current_sum
    
    current_max = 0
    for i, cumulative_i in enumerate(cumulative):
        for cumulative_j in cumulative[i:]:
            current_sum = cumulative_j - cumulative_i
            current_max = max(current_max, current_sum)
    
    return current_max

print(solution_3(values))


187


# Divide and Conquer

There is a tricky way of going faster, by noticing that we can divide this problem in two.

```
      left maximum       right maximum
------===========------   ===========------------
| -41 | 59 | 26 | -53 |   | 58 | 97 | -93 | -23 |
------===========------   ===========------------
```

We can keep doing this recursively:

```
       ------===========------         
       | -41 | 59 | 26 | -53 |           
       ------===========------           
            /           \                
     ------======    ======------    
     | -41 | 59 |    | 26 | -53 |    
     ------======    ======------    
      /      \          /     \      
-------    ======    =====-    -------
| -41 |    | 59 |    | 26 |    | -53 |
-------    ======    =====-    -------
```

At each node, the maximum slice is one of three options:

1. the left child's maximum
2. the right child's maximum
3. a slice that crosses the middle

Checking the third case takes O(n) time at each node, where n is the number of elements that node covers.
- What is the big O complexity of this algorithm?
- If we didn't have to worry about the third case, what would be the complexity?
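
The three options can be sketched as a recursive function. This is one possible sketch, not necessarily the version the author had in mind; `solution_divide` and its `lo` and `hi` parameters are names invented here. The crossing case adds the best run ending at the middle to the best run starting at the middle, and each run may be empty so the result is never negative:

```python
def solution_divide(values, lo=0, hi=None):
    """Return the maximum slice sum of values[lo:hi], or 0 if all sums are negative."""
    if hi is None:
        hi = len(values)
    if hi - lo == 0:
        return 0
    if hi - lo == 1:
        return max(values[lo], 0)

    mid = (lo + hi) // 2
    left_max = solution_divide(values, lo, mid)   # option 1
    right_max = solution_divide(values, mid, hi)  # option 2

    # Option 3: best run ending just before mid, walking leftwards...
    best_suffix = current = 0
    for k in range(mid - 1, lo - 1, -1):
        current += values[k]
        best_suffix = max(best_suffix, current)

    # ...plus the best run starting at mid, walking rightwards.
    best_prefix = current = 0
    for k in range(mid, hi):
        current += values[k]
        best_prefix = max(best_prefix, current)

    return max(left_max, right_max, best_suffix + best_prefix)

print(solution_divide([31, -41, 59, 26, -53, 58, 97, -93, -23, 84]))
```

Passing index bounds rather than slicing the list avoids copying the data at every level of the recursion.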

# Dynamic programming

Let's say we have a solution for a list with N elements. If we add an element to the end, how do we need to modify the solution?

```
======-----------       ------
| 59 | -41 | 31 |   +   | 30 |    =    ?
======-----------       ------
```

In [30]:
def solution_4(values):
    current_max = 0
    max_ending_here = 0
    
    for val in values:
        max_ending_here = max(max_ending_here + val, 0)
        current_max = max(current_max, max_ending_here)
        
    return current_max

print(solution_4(values))

187
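
`solution_4` returns only the sum. As a possible extension, the same single pass can also remember where the best slice starts and ends. `max_slice_with_bounds` is a hypothetical name used for this sketch:

```python
def max_slice_with_bounds(values):
    """Return (best_sum, start, end) so that sum(values[start:end]) == best_sum."""
    best_sum, best_start, best_end = 0, 0, 0
    current_sum, current_start = 0, 0

    for k, val in enumerate(values):
        current_sum += val
        if current_sum < 0:
            # A negative running sum can only hurt later slices: start afresh.
            current_sum = 0
            current_start = k + 1
        elif current_sum > best_sum:
            best_sum = current_sum
            best_start, best_end = current_start, k + 1

    return best_sum, best_start, best_end

values = [31, -41, 59, 26, -53, 58, 97, -93, -23, 84]
print(max_slice_with_bounds(values))
```

For the example array this reports the slice `[2:7]` with a sum of 187, matching the diagram at the start of the problem.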


Rules to remember:
- Don't do work you don't have to do.
- Look results up instead of recomputing them each time.
- Look for ways to divide the problem into subproblems that can be combined into the full solution.
- Look for simpler subproblems that can be extended into a full solution.
- The way you would solve the problem with pen and paper might be close to the optimal method.