#### Writer Note:

Start the lecture with the Python kernel. At a certain point you will need to switch to the Scheme kernel.

# Efficient Sequence Processing

Many of Python's sequence manipulation functions (e.g. `map`, `filter`, `sum`, and `functools.reduce`) take iterable objects as arguments, and `map` and `filter` also return lazy iterators.
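As a minimal sketch of that laziness, the cell below shows that `map` does no work until a value is requested. The `square` helper and the `calls` log are hypothetical names introduced only for this illustration.

```python
calls = []  # hypothetical log of every argument square has been called with

def square(x):
    calls.append(x)
    return x * x

squares = map(square, [1, 2, 3])  # builds a map object; nothing is computed yet
assert calls == []

first = next(squares)             # computes only the first element
assert first == 1 and calls == [1]

rest = list(squares)              # forces the remaining elements
assert rest == [4, 9] and calls == [1, 2, 3]
```

`filter` behaves the same way: it only calls its predicate when the next value is requested.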

## Sequence Operations

How convenient can these sequence operations be?

`map`, `filter` and `reduce` express sequence manipulation using compact expressions.

Example: Let's say we want to sum all the prime numbers in an interval from `a` (inclusive) to `b` (exclusive). And let's pretend that we already defined a function `is_prime` that determines whether a number is prime.

Early in the course, we might have implemented `sum_primes` as follows:

In [None]:
def sum_primes(a, b):
    total = 0
    x = a
    while x < b:
        if is_prime(x):
            total = total + x
        x = x + 1
    return total

The space that's required to execute this function given an interval from `a` to `b` of size `n` is $\Theta(1)$.
* We only need to keep track of `total`, `x`, `a`, and `b` regardless of the size of the interval.

Here's a more compact definition of the same function,

In [None]:
# Sum the result of filtering the range from a to b using the is_prime function
def sum_primes(a, b):
    return sum(filter(is_prime, range(a, b)))

Now if we run the following,

In [None]:
sum_primes(1, 6)

What would happen?

The `range` is implicit: the `range` iterator that `filter` extracts only keeps track of what comes next. It never explicitly represents all the numbers in the range, so it uses constant space.
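One way to check this claim, sketched under the assumption that we are running CPython (where `sys.getsizeof` reports the size of the object itself), is to compare a tiny range with an enormous one. The names `small` and `huge` are just for illustration.

```python
import sys

small = range(1, 6)
huge = range(1, 10**12)

# A range object only stores its start, stop, and step, so its size does not
# grow with the number of values it represents (observed on CPython).
assert sys.getsizeof(small) == sys.getsizeof(huge)

it = iter(huge)   # the iterator just remembers which value comes next
first = next(it)
second = next(it)
assert (first, second) == (1, 2)
```

Creating `huge` is instantaneous because none of its values exist until the iterator is asked for them.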

<img src = 'range.jpg' width = 200/>

The `filter` object that's created by calling `filter` remembers that `is_prime` is the filtering function and remembers the source iterator that's going to yield values.

<img src = 'filter.jpg' width = 300>

`sum` takes the source and keeps track of the total. 

<img src = 'sum.jpg' width = 300/>

Now `1` is not prime, so `1` is not added to the total. The `range` iterator moves to the next element, which is `2`.

<img src = '2.jpg' width = 300/>

`2` is a prime number, so `2` is added to the total. The `range` iterator then moves to the next element, `3`.

<img src = '3.jpg' width = 300/>

`3` is a prime number, so `3` is added to the total. The `range` iterator moves to the next element, `4`.

<img src = '4.jpg' width = 300/>

And the process repeats until we finish processing the last element before `b`, which is `5`. `6` is not included because `b` is exclusive.
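The step-by-step process above can be traced in code. This is only a sketch: `logged_range` is a hypothetical generator standing in for `range`, `logged_is_prime` is a hypothetical wrapper, and `is_prime` is one possible definition, included so the cell runs on its own.

```python
log = []  # hypothetical event log

def is_prime(x):
    if x <= 1:
        return False
    return all(x % y for y in range(2, x))

def logged_range(a, b):
    """Stand-in for range(a, b) that records each value as it is handed out."""
    for x in range(a, b):
        log.append(f"range yields {x}")
        yield x

def logged_is_prime(x):
    result = is_prime(x)
    log.append(f"is_prime({x}) -> {result}")
    return result

total = sum(filter(logged_is_prime, logged_range(1, 6)))

for event in log:
    print(event)
```

The log alternates between "range yields" and "is_prime" entries, so each element is produced, tested, and possibly added to the total before the next one is requested.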

<img src = 'finish.jpg' width = 300/>

The space that's required to run this function is $\Theta(1)$!

Even though we expressed the computation in terms of sequences, we've managed to keep our implementation down to constant space. This is only true because of the **lazy**, implicit nature of `range` and `filter`. If either `range` had explicitly written out all the elements from `a` to `b`, or `filter` had explicitly written out all of the remaining elements, we would have ended up using linear space. Iterators are a convenient way to avoid that outcome.
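A rough way to see the difference is to measure peak memory with the standard library's `tracemalloc` module. This is a sketch under some assumptions: `is_even` is a hypothetical cheap predicate standing in for `is_prime` (so the cell runs quickly), `peak_bytes` is a hypothetical helper, and the exact byte counts depend on the Python version.

```python
import tracemalloc

def is_even(x):  # hypothetical cheap predicate standing in for is_prime
    return x % 2 == 0

def peak_bytes(thunk):
    """Call thunk() and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    result = thunk()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak

n = 100_000

# Lazy: range and filter hand over one element at a time.
lazy_result, lazy_peak = peak_bytes(lambda: sum(filter(is_even, range(n))))

# Eager: both the range and the filtered result are written out as lists.
eager_result, eager_peak = peak_bytes(
    lambda: sum(list(filter(is_even, list(range(n)))))
)

assert lazy_result == eager_result
print("lazy peak bytes: ", lazy_peak)
print("eager peak bytes:", eager_peak)
```

The eager peak grows with `n`, while the lazy peak stays roughly the same no matter how large `n` is.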

## Demo

The `is_prime` function can also be considered a sequence operation. 

In [1]:
def is_prime(x):
    if x <= 1: # if x is less than or equal to 1, then it's definitely not a prime
        return False
    # Map a function that computes the remainder x % y over every y in the range from 2 to x
    # (exclusive). x is prime when all of the remainders are nonzero, i.e. true values.
    return all(map(lambda y: x % y, range(2, x)))
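Because `map` is lazy and `all` stops at the first false value, this definition does not necessarily try every divisor. The sketch below makes that visible. `is_prime_logged`, `remainder_logged`, and the `checked` list are hypothetical names added only to record which divisors get tried.

```python
checked = []  # hypothetical log of the divisors that were actually tried

def remainder_logged(x, y):
    checked.append(y)
    return x % y

def is_prime_logged(x):
    if x <= 1:
        return False
    return all(map(lambda y: remainder_logged(x, y), range(2, x)))

assert is_prime_logged(91) is False    # 91 = 7 * 13
assert checked == [2, 3, 4, 5, 6, 7]   # stopped at the first divisor, 7

checked.clear()
assert is_prime_logged(7) is True
assert checked == [2, 3, 4, 5, 6]      # a prime has to try every candidate
```

For a composite number the work ends at its smallest divisor. For a prime, every candidate from `2` up to `x - 1` is still checked.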

`sum_primes` simply sums the result of filtering the range from `a` to `b` using `is_prime`.

In [2]:
def sum_primes(a, b):
    return sum(filter(is_prime, range(a, b)))

In [3]:
sum_primes(1, 6) # 2 + 3 + 5

10

In [4]:
sum_primes(1, 10) # 2 + 3 + 5 + 7

17

In [5]:
sum_primes(1, 100)

1060

All of the above were computed in constant space. No matter how large the interval is, we will not run out of memory.

Can we do the same in Scheme?

We can certainly take the same sequence processing approach using the tools we already have and the built-in `list` data structure.

#### Writer Note: Switch to Scheme kernel!

Below are the sequence operations implemented in Scheme. `map` applies some function `f` to every element in `s`.

In [25]:
(define false #f)

In [26]:
(define nil '())

In [27]:
(define (map f s)
  (if (null? s)
      nil
      (cons (f (car s))
            (map f
                 (cdr s)))))

`filter` keeps the elements in `s` for which `f` returns a true value.

In [28]:
(define (filter f s)
  (if (null? s)
      nil
      (if (f (car s))
          (cons (car s)
                (filter f (cdr s)))
          (filter f (cdr s)))))

`reduce` combines the elements in `s` using a 2-argument function `f` starting with the `start` value.

In [29]:
(define (reduce f s start)
  (if (null? s)
      start
      (reduce f
              (cdr s)
              (f start (car s)))))
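For comparison with the Python side of the lecture, here is a small sketch using `functools.reduce`, which takes its arguments in the same order (function, sequence, starting value) and also combines from the left. The subtraction example is only there to make the order of combination visible.

```python
from functools import reduce
from operator import add

total = reduce(add, [1, 2, 3, 4], 0)   # ((((0 + 1) + 2) + 3) + 4)
assert total == 10

# A non-commutative function shows the left-to-right order: the function is
# called as f(accumulated, element), like (f start (car s)) in the Scheme code.
diff = reduce(lambda acc, x: acc - x, [1, 2, 3], 10)   # ((10 - 1) - 2) - 3
assert diff == 4
```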

For `range` and `sum`, we have to define them ourselves.

In [30]:
(define (range a b) ; a range is a list
  (if (>= a b)
      nil ; nil if a is greater than or equal to b
      (cons a (range (+ a 1) b)))) ; starts with a, followed by a range from a+1 to b

In [31]:
(define (sum s) ; if we want to sum up all the elements in the sequence s
  (reduce + s 0)) ; then just reduce all of the elements in s using addition

In [32]:
(define (prime? x) ; whether x is a prime or not
  (if (<= x 1) ; if x is less than or equal to 1
      false ; then it's not a prime
      ; otherwise, keep every y from 2 up to (but not including) x that divides x evenly.
      ; If the filtered list is empty, x has no such divisor, so it's prime.
      (null? (filter (lambda (y) (= 0 (remainder x y)))
                     (range 2 x)))))

Summing primes is straightforward. The sum of the primes from `a` to `b` is the sum of the result of filtering the `range` from `a` to `b` using `prime?`.

In [33]:
(define (sum-primes a b)
  (sum (filter prime? (range a b))))

In [34]:
(sum-primes 1 6)

10

In [35]:
(sum-primes 1 10)

17