# Efficient Sequence Processing

Many of Python's built-in sequence functions (e.g. `map`, `filter`, `sum`), as well as `functools.reduce`, take iterable objects as input, and `map` and `filter` also return iterators.

## Sequence Operations

How convenient can these sequence operations be?

`map`, `filter` and `reduce` express sequence manipulation using compact expressions.

Example: Let's say we want to sum all the prime numbers in an interval from `a` (inclusive) to `b` (exclusive). And let's pretend that we already defined a function `is_prime` that determines whether a number is prime.

Early in the course, we might have implemented `sum_primes` as follows:

In [None]:
def sum_primes(a, b):
    total = 0
    x = a
    while x < b:
        if is_prime(x):
            total = total + x
        x = x + 1
    return total

The space that's required to execute this function given an interval from `a` to `b` of size `n` is $\Theta(1)$.
* We only need to keep track of `total`, `x`, `a`, and `b` regardless of the size of the interval.

Here's a more compact definition of the same function,

In [None]:
# Sum the result of filtering the range from a to b using the is_prime function
def sum_primes(a, b):
    return sum(filter(is_prime, range(a, b)))
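
To check that the compact definition computes the same thing as the loop, here is a small self-contained sketch. The `is_prime` below is an assumed stand-in (simple trial division), since the document only defines its own version later. The names `sum_primes_loop` and `sum_primes_lazy` are hypothetical and are used only to keep the two versions apart.

```python
def is_prime(x):
    """Trial-division primality test (assumed stand-in for the course's is_prime)."""
    if x <= 1:
        return False
    return all(x % y for y in range(2, x))

def sum_primes_loop(a, b):
    """Iterative version: only total and x are tracked, so Theta(1) space."""
    total = 0
    x = a
    while x < b:
        if is_prime(x):
            total = total + x
        x = x + 1
    return total

def sum_primes_lazy(a, b):
    """Sequence version: sum over a lazy filter over a lazy range."""
    return sum(filter(is_prime, range(a, b)))

# The two definitions should agree on every interval we try.
for a, b in [(1, 6), (1, 10), (1, 100), (50, 60)]:
    assert sum_primes_loop(a, b) == sum_primes_lazy(a, b)
```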

Now if we run the following,

In [None]:
sum_primes(1, 6)

What would happen?

The `range` is implicit: the iterator that `filter` obtains from the `range` only keeps track of which number comes next. It never explicitly stores all the numbers in the range, so it uses constant space.

<img src = 'range.jpg' width = 200/>

The `filter` object created by calling `filter` remembers that `is_prime` is the filtering function. It also remembers the source iterator that will yield values.

<img src = 'filter.jpg' width = 300>

`sum` takes the `filter` object as its source and keeps track of the running total.

<img src = 'sum.jpg' width = 300/>

Now `1` is not prime, so `1` is not added to the total. The `range` iterator moves to the next element, which is 2.

<img src = '2.jpg' width = 300/>

`2` is a prime number, so `2` is added to the total. The `range` iterator then moves to the next element, 3.

<img src = '3.jpg' width = 300/>

`3` is a prime number, so `3` is added to the total. The `range` iterator moves to the next element, `4`.

<img src = '4.jpg' width = 300/>

The process repeats until we finish processing the last element before `b`, which is `5`. Since `5` is prime, it is added to the total. `6` is never processed, because `range(a, b)` excludes `b`.
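
The step-by-step trace above can be reproduced in code. This sketch uses hypothetical logging wrappers (`logged_range`, `logged_is_prime`, `logged_sum`) that stand in for `range`, `is_prime` and `sum` and record each step. The primality test inside `logged_is_prime` is an assumed trial-division stand-in.

```python
log = []

def logged_range(a, b):
    """Generator that behaves like range(a, b) but records each value it yields."""
    x = a
    while x < b:
        log.append(('yield', x))
        yield x
        x += 1

def logged_is_prime(x):
    """Trial-division primality test that records each number it is asked about."""
    log.append(('test', x))
    if x <= 1:
        return False
    return all(x % y for y in range(2, x))

def logged_sum(iterable):
    """Like sum, but records each value it adds to the running total."""
    total = 0
    for value in iterable:
        log.append(('add', value))
        total += value
    return total

result = logged_sum(filter(logged_is_prime, logged_range(1, 6)))
```

After running this, `log` shows the steps interleaved: yield 1, test 1, yield 2, test 2, add 2, and so on. Each number is produced, tested and (if prime) added before the next number is produced, so no step ever needs the whole sequence at once.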

<img src = 'finish.jpg' width = 300/>

The space that's required to run this function is $\Theta(1)$!

Even though we expressed the computation in terms of sequences, we've managed to keep our implementation down to constant space. This is only true because of the **lazy**, implicit nature of `range` and `filter`. If `range` had explicitly written out all the elements from `a` to `b`, or if `filter` had explicitly written out all of its results, we would have ended up using linear space. Iterators are a convenient way to avoid that outcome.
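
The laziness can be observed directly. In the sketch below, `is_even` is a hypothetical predicate that records every number it is applied to. Creating the `filter` object evaluates nothing, and each `next` call does only as much work as needed to find the next kept element. The last line uses `sys.getsizeof` to show that a `range` object's size does not depend on how many numbers it describes.

```python
import sys

calls = []

def is_even(x):
    """Hypothetical predicate that records each number it is applied to."""
    calls.append(x)
    return x % 2 == 0

evens = filter(is_even, range(10))
# Creating the filter object does not call is_even at all.
assert calls == []

first = next(evens)   # is_even is asked only about 0
second = next(evens)  # is_even is asked about 1 (rejected), then 2
assert (first, second) == (0, 2)
assert calls == [0, 1, 2]

# A range stores only its bounds and step, so describing a billion numbers
# takes no more memory than describing ten.
assert sys.getsizeof(range(10)) == sys.getsizeof(range(10**9))
```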

## Demo

The `is_prime` function can also be expressed in terms of sequence operations.

In [3]:
def is_prime(x):
    if x <= 1: # numbers less than or equal to 1 are not prime
        return False
    # x is prime if x % y is nonzero (truthy) for every y from 2 up to, but not including, x.
    # map computes each remainder lazily; all returns False as soon as it sees a 0.
    return all(map(lambda y: x % y, range(2, x)))

Above,

`all` is a built-in function that takes an iterable and returns `True` if every value in the iterable is truthy (evaluates to true). If the iterable is empty, it also returns `True`.
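
A few quick checks illustrate these rules. The empty-iterable case matters for `is_prime(2)`: `range(2, 2)` is empty, so `all` returns `True` and 2 is correctly reported as prime.

```python
# all returns True only if every value is truthy; nonzero integers are truthy.
assert all([1, 2, 1]) is True
assert all([0, 1]) is False

# An empty iterable contains no falsy value, so all returns True.
assert all([]) is True

# Consequence for is_prime(2): there are no candidate divisors to check.
assert list(range(2, 2)) == []
assert all(map(lambda y: 2 % y, range(2, 2))) is True
```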

In the following line,

In [None]:
map(lambda y: x % y, range(2, x))

...applies the lambda function to each number `y` from 2 up to (but not including) `x`, computing `x % y`. For example, if `x` is 5, the remainders computed are:

In [9]:
5 % 2

1

In [10]:
5 % 3

2

In [11]:
5 % 4

1

...so the `map` iterator yields the values 1, 2, and 1. These are not computed up front and collected. Each is computed only when `all` asks for the next value.

The built-in `all` function checks whether any value produced by the iterator is falsy (such as 0). If `x` is not prime, then some `y` divides it evenly, so `x % y` is 0 and `all` returns `False`. For example, if `x` is 4,

In [1]:
4 % 2

0

In [2]:
4 % 3

1

Here the first value the `map` iterator yields is `0` (from `4 % 2`), which is falsy, so `all` returns `False` immediately. Because `all` stops at the first falsy value, `4 % 3` is never computed inside `all`.
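
The early stopping can be made visible. In this sketch, `remainders_checked` is a hypothetical helper. It returns the divisors `y` that `map` was actually asked to try before `all` finished.

```python
def remainders_checked(x):
    """Return the divisors y that all() actually pulled from map before stopping."""
    checked = []

    def remainder(y):
        checked.append(y)
        return x % y

    all(map(remainder, range(2, x)))
    return checked

assert remainders_checked(5) == [2, 3, 4]  # no zero remainder, so every y is tried
assert remainders_checked(4) == [2]        # 4 % 2 == 0, so all stops immediately
assert remainders_checked(9) == [2, 3]     # 9 % 2 == 1, then 9 % 3 == 0
```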

The implementation of `sum_primes` simply sums the numbers that remain after filtering the range from `a` to `b` with `is_prime`.

In [2]:
def sum_primes(a, b):
    return sum(filter(is_prime, range(a, b)))

Above, recall that `filter` is a built-in function that takes a one-argument function and an iterable, and returns an iterator that **yields only the elements for which the function returns a true value**. This means the following expression,

In [None]:
filter(is_prime, range(a, b))

returns an iterator over the prime numbers from `a` (inclusive) to `b` (exclusive).
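
One consequence of `filter` returning an iterator rather than a list is that it can be traversed only once. Here is a short sketch using the `is_prime` defined above (repeated so the block runs on its own).

```python
def is_prime(x):
    if x <= 1:
        return False
    return all(map(lambda y: x % y, range(2, x)))

primes = filter(is_prime, range(1, 10))
assert list(primes) == [2, 3, 5, 7]

# The iterator is now exhausted, so iterating again yields nothing.
assert list(primes) == []

# To traverse the primes again, build a fresh filter object.
assert list(filter(is_prime, range(1, 10))) == [2, 3, 5, 7]
```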

In [3]:
sum_primes(1, 6) # 2 + 3 + 5

10

In [4]:
sum_primes(1, 10) # 2 + 3 + 5 + 7

17

In [5]:
sum_primes(1, 100)

1060

All of the above were computed in constant space. No matter how large the interval is, the numbers in the range are never all stored in memory at once.
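
The constant-space claim can be checked empirically with the standard `tracemalloc` module. This sketch compares the peak traced memory of the lazy pipeline against a version that materializes lists. It makes two assumptions. First, it uses a cheap stand-in predicate `is_odd` instead of `is_prime` so that it runs quickly on a large interval. Second, `peak_bytes` is a hypothetical helper. Exact byte counts vary across Python versions, so only the relative comparison is meaningful.

```python
import tracemalloc

def is_odd(x):
    """Cheap stand-in predicate (used instead of is_prime to keep this fast)."""
    return x % 2 == 1

def peak_bytes(thunk):
    """Call thunk() with tracemalloc running; return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = thunk()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # stop tracing and clear the collected traces
    return result, peak

n = 100_000
# Lazy: range and filter produce one value at a time; sum keeps a running total.
lazy_total, lazy_peak = peak_bytes(lambda: sum(filter(is_odd, range(n))))
# Eager: list(...) writes out every element before summing.
eager_total, eager_peak = peak_bytes(lambda: sum(list(filter(is_odd, list(range(n))))))
```

Both versions compute the same total. The eager version's peak grows with `n`, while the lazy version's stays small.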

Can we do the same in Scheme?

We can certainly take the same sequence processing approach using the tools we already have and the built-in `list` data structure.

Below are the sequence operations implemented in Scheme. We can define the procedure `map` that applies some function `f` to every element in `s`.

In [7]:
(define false #f)

In [8]:
(define nil '())

In [9]:
(define (map f s)
  (if (null? s) ; If we've gone through all the elements in s
      nil ; then return nil
      (cons (f (car s)) ; Otherwise, the first element is f applied to (car s)
            (map f ; and the rest is the result of recursively mapping f over (cdr s)
                 (cdr s)))))

And we can define the function `filter` that keeps the elements in `s` for which `f` is `True`.

In [10]:
(define (filter f s)
  (if (null? s) ; If we've gone through all the elements in s
      nil ; then return nil
      (if (f (car s)) ; Otherwise, if applying f to (car s) returns true
          (cons (car s) ; keep (car s) as the first element; the rest is filter applied to (cdr s)
                (filter f (cdr s)))
          (filter f (cdr s))))) ; Otherwise, skip (car s) and recursively filter (cdr s)

And we can define the function `reduce` that combines the elements in `s` using a 2-argument function `f` starting with the `start` value.

In [1]:
(define (reduce f s start)
  (if (null? s)
      start
      (reduce f
              (cdr s)
              (f start (car s)))))

For example, `(reduce - '(1 2) 8)` computes `(- (- 8 1) 2)`: it subtracts 1 and then 2 from 8, resulting in `5`.

In [6]:
(reduce - '(1 2) 8)

5

...and `(reduce + '(1 3) 9)` computes `(+ (+ 9 1) 3)`, which is `13`. This makes `reduce` an excellent procedure for defining `sum`.
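
For comparison, Python's counterpart is `functools.reduce(function, iterable[, initial])`. It folds from the left in the same way. The `total` function below is a hypothetical name for a `sum` defined via `reduce`, mirroring the Scheme definition.

```python
from functools import reduce
from operator import add, sub

# reduce(f, s, start) computes f(f(start, s[0]), s[1]), ... from the left.
assert reduce(sub, [1, 2], 8) == 5    # (8 - 1) - 2
assert reduce(add, [1, 3], 9) == 13   # (9 + 1) + 3

def total(s):
    """sum defined via reduce, mirroring the Scheme (reduce + s 0)."""
    return reduce(add, s, 0)

assert total([2, 3, 5]) == 10
assert total([]) == 0  # with a start value, an empty sequence just returns it
```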

We also have to define `range` and `sum` ourselves.

In [12]:
(define (range a b) ; a range is an explicit list of the integers from a up to, but not including, b
  (if (>= a b) ; If a is greater than or equal to b, there are no numbers left to include
      nil ; so return the empty list
      (cons a (range (+ a 1) b)))) ; Otherwise, the first element is a and the rest is
      ; ...the range that starts at a + 1

In [13]:
(define (sum s) ; if we want to sum up all the elements in the sequence s
  (reduce + s 0)) ; then just reduce all of the elements in s using addition
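
To see why these definitions are eager, here is a rough Python model of the Scheme lists above. It is a hypothetical sketch: `nil` is modeled as `None`, and a pair `(cons first rest)` as the tuple `(first, rest)`. Building `scheme_range(1, 100)` creates all 99 pairs before any filtering starts.

```python
# Hypothetical model of Scheme lists: nil is None, (cons first rest) is (first, rest).
nil = None

def cons(first, rest):
    return (first, rest)

def scheme_range(a, b):
    """Like the Scheme range: builds every pair of the list immediately."""
    if a >= b:
        return nil
    return cons(a, scheme_range(a + 1, b))

def scheme_filter(f, s):
    """Like the Scheme filter: returns a new, fully built list."""
    if s is nil:
        return nil
    first, rest = s
    if f(first):
        return cons(first, scheme_filter(f, rest))
    return scheme_filter(f, rest)

def scheme_reduce(f, s, start):
    """Like the Scheme reduce: combine elements from the left, starting at start."""
    if s is nil:
        return start
    first, rest = s
    return scheme_reduce(f, rest, f(start, first))

def length(s):
    """Count the pairs in a list by walking it."""
    n = 0
    while s is not nil:
        n += 1
        s = s[1]
    return n

s = scheme_range(1, 100)
assert length(s) == 99  # all 99 pairs exist before any filtering starts
odds = scheme_filter(lambda x: x % 2 == 1, s)
assert length(odds) == 50
assert scheme_reduce(lambda acc, x: acc + x, scheme_range(1, 6), 0) == 15
```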

Finally, we define `prime?`, a procedure that checks whether its input is a prime number.

In [None]:
(define (prime? x) ; x is prime if no y from 2 up to (but not including) x divides it evenly
  (if (<= x 1) false ; If x is less than or equal to 1, then it's not a prime
      (null? (filter (lambda (y) (= 0 (remainder x y)))
                     (range 2 x)))))

Consider this part of the definition above:

In [None]:
(filter (lambda (y) (= 0 (remainder x y)))
                     (range 2 x))

This call returns a list of the numbers `y` from 2 up to (but not including) `x` that divide `x` evenly, because our `filter` procedure keeps exactly the elements for which the given procedure returns true. For example, if `x` is 5, then:

In [3]:
(= 0 (remainder 5 2))

#f

In [4]:
(= 0 (remainder 5 3))

#f

In [5]:
(= 0 (remainder 5 4))

#f

If `x` is `5`, the `filter` procedure returns the empty list, since none of the evaluations above returns `#t`. Because `prime?` checks whether the filtered list is empty using `null?`, `(prime? 5)` returns `#t`. However, if `x` is `4`,

In [6]:
(= 0 (remainder 4 2))

#t

In [7]:
(= 0 (remainder 4 3))

#f

The first evaluation returns `#t`! This means the result of the `filter` procedure is a list containing `2`. Since that list is not empty, `(prime? 4)` returns `#f`.
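
The same "divisor list is empty" test can be written in Python with an explicit list. In this sketch, `divisors_between` and `is_prime_by_divisors` are hypothetical names. Unlike the `all`-based `is_prime`, which stops at the first zero remainder, building the full list always tests every candidate `y`, just as the Scheme `filter` does.

```python
def divisors_between(x):
    """Eagerly build the list of y in [2, x) that divide x evenly."""
    return [y for y in range(2, x) if x % y == 0]

def is_prime_by_divisors(x):
    """Mirror of the Scheme prime?: x is prime if the divisor list is empty."""
    if x <= 1:
        return False
    return divisors_between(x) == []

assert divisors_between(5) == []
assert divisors_between(4) == [2]
assert divisors_between(12) == [2, 3, 4, 6]
assert [x for x in range(1, 20) if is_prime_by_divisors(x)] == [2, 3, 5, 7, 11, 13, 17, 19]
```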

Summing the primes is straightforward. The sum of the primes from `a` to `b` is the sum of the result of filtering the `range` from `a` to `b` with `prime?`.

In [15]:
(define (sum-primes a b)
  (sum (filter prime? (range a b))))

In [16]:
(sum-primes 1 6)

10

In [17]:
(sum-primes 1 10)

17

In [18]:
(sum-primes 1 100)

1060

We are using linear space to compute `sum-primes`, since `(range a b)` explicitly writes out every number from `a` up to (but not including) `b`. For `(sum-primes 1 100)`, that is every number from `1` to `99`:

In [19]:
(range 1 100)

(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99)

When we filter the `range` above using `filter` and `prime?`, we obtain a shorter list, but its elements are still written out explicitly. In Python, by contrast, values are produced one at a time thanks to **lazy computation**.

In [20]:
(filter prime? (range 1 100))

(2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97)

We lost efficiency when we switched from Python's iterator-based approach to the explicit list representation we just used in Scheme. Fortunately, there's a way to get that efficiency back!