# Structure and Interpretation of Computer Programs

## Chapter 2, Building Abstractions with Data

> Why do we want compound data in a programming language? For the same reasons that we want
compound procedures: to elevate the conceptual level at which we can design our programs, to increase
the modularity of our designs, and to enhance the expressive power of our language. Just as the ability to
define procedures enables us to deal with processes at a higher conceptual level than that of the primitive
operations of the language, the ability to construct compound data objects enables us to deal with data at a
higher conceptual level than that of the primitive data objects of the language.

> We will see that the key to forming compound data is that a programming language should provide some
kind of glue so that data objects can be combined to form more complex data objects. There are many
possible kinds of glue. Indeed, we will discover how to form compound data using no special *data*
operations at all, only procedures. This will further blur the distinction between *procedure* and *data*,
which was already becoming tenuous toward the end of chapter 1. We will also explore some conventional
techniques for representing sequences and trees. One key idea in dealing with compound data is the notion
of *closure* -- that the glue we use for combining data objects should allow us to combine not only primitive
data objects, but compound data objects as well. Another key idea is that compound data objects can serve
as *conventional interfaces* for combining program modules in mix-and-match ways. We illustrate some of
these ideas by presenting a simple graphics language that exploits closure.

> We will then augment the representational power of our language by introducing *symbolic expressions* --
data whose elementary parts can be arbitrary symbols rather than only numbers.

### Introduction to Data Abstraction

Suppose we wish to do arithmetic with rational numbers. We can define the operations without first deciding how rationals are represented, by assuming that three procedures exist:

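+ `(make-rat n d)` returns the rational number whose numerator is the integer `n` and whose denominator is the integer `d`

+ `(numer x)` returns the numerator of the rational `x`

+ `(denom x)` returns the denominator of the rational `x`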

With these we can define rational addition:

In [2]:
(define (add-rat x y)
  (make-rat (+ (* (numer x) (denom y))
               (* (numer y) (denom x)))
            (* (denom x) (denom y))))

as well as other operations:

In [3]:
(define (sub-rat x y)
  (make-rat (- (* (numer x) (denom y))
               (* (numer y) (denom x)))
            (* (denom x) (denom y))))

(define (mul-rat x y)
  (make-rat (* (numer x) (numer y))
            (* (denom x) (denom y))))

(define (div-rat x y)
  (make-rat (* (numer x) (denom y))
            (* (denom x) (numer y))))

(define (equal-rat? x y)
  (= (* (numer x) (denom y))
     (* (numer y) (denom x))))
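
Written as formulas, the procedures above implement the usual rules of rational arithmetic, taking $x = n_1/d_1$ and $y = n_2/d_2$:

$$
\begin{aligned}
\frac{n_1}{d_1} + \frac{n_2}{d_2} &= \frac{n_1 d_2 + n_2 d_1}{d_1 d_2} \\
\frac{n_1}{d_1} - \frac{n_2}{d_2} &= \frac{n_1 d_2 - n_2 d_1}{d_1 d_2} \\
\frac{n_1}{d_1} \cdot \frac{n_2}{d_2} &= \frac{n_1 n_2}{d_1 d_2} \\
\frac{n_1/d_1}{n_2/d_2} &= \frac{n_1 d_2}{d_1 n_2} \\
\frac{n_1}{d_1} = \frac{n_2}{d_2} &\iff n_1 d_2 = n_2 d_1
\end{aligned}
$$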

How should we represent rationals? One way is to treat a rational number as a pair of integers: a numerator and a denominator.

For that we can use Scheme's `cons`, `car` and `cdr`:

In [35]:
(define x (cons 1 2)) ; makes the pair (1 . 2)

(display (car x))
(newline)
(display (cdr x))
(newline)
(pair? x) ; check if it's a pair

1
2


#t

Using pairs, the constructor and selectors can be defined like this:

In [5]:
(define (make-rat n d) (cons n d))

(define (numer x) (car x))

(define (denom x) (cdr x))

; just add a print procedure:
(define (print-rat x)
  (newline)
  (display (numer x))
  (display "/")
  (display (denom x)))

In [6]:
(define one-half (make-rat 1 2))
(define one-fourth (make-rat 1 4))

(print-rat (add-rat one-half one-fourth))


6/8

We can also have `make-rat` reduce each rational to lowest terms, by dividing the numerator and denominator by their greatest common divisor:

In [7]:
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))

(define (make-rat n d)
  (let ((g (gcd n d)))
    (cons (/ n g) (/ d g))))

(define one-half (make-rat 1 2))
(define one-fourth (make-rat 1 4))

(print-rat (add-rat one-half one-fourth))


3/4
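
The reduced form still leaves signs unnormalized: with the `gcd` defined above, `(make-rat 1 -2)` produces `(1 . -2)`. One possible refinement (it is posed as an exercise in the book) is to keep the denominator positive, so only the numerator carries the sign. The sketch below is our own version; it takes the absolute value of `gcd` in case a given `gcd` returns a negative result, and uses `quotient` for the exact integer division.

```scheme
;; make-rat that reduces to lowest terms and keeps the denominator positive
(define (make-rat n d)
  (let ((g (abs (gcd n d)))          ; abs guards against a negative gcd
        (sign (if (< d 0) -1 1)))    ; flip both signs when d is negative
    (cons (* sign (quotient n g))
          (* sign (quotient d g)))))

(make-rat 1 -2)   ; => (-1 . 2)
(make-rat -6 -8)  ; => (3 . 4)
```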

> Constraining the dependence on the representation to a few interface procedures helps us design programs
as well as modify them, because it allows us to maintain the flexibility to consider alternate
implementations.

### What Is Meant by Data?

> We began the rational-number implementation in section 2.1.1 by implementing the rational-number
operations add-rat, sub-rat, and so on in terms of three unspecified procedures: make-rat,
numer, and denom. At that point, we could think of the operations as being defined in terms of data
objects -- numerators, denominators, and rational numbers -- whose behavior was specified by the latter
three procedures.

> But exactly what is meant by data? It is not enough to say *whatever is implemented by the given
selectors and constructors.* Clearly, not every arbitrary set of three procedures can serve as an appropriate
basis for the rational-number implementation. We need to guarantee that, if we construct a rational number
x from a pair of integers n and d, then extracting the numer and the denom of x and dividing them
should yield the same result as dividing n by d. In other words, make-rat, numer, and denom must
satisfy the condition that, for any integer n and any non-zero integer d, if x is (make-rat n d), then

$$\frac{\text{(numer x)}}{\text{(denom x)}} = \frac{n}{d}$$

> In fact, this is the only condition make-rat, numer, and denom must fulfill in order to form a suitable
basis for a rational-number representation. In general, we can think of data as defined by some collection
of selectors and constructors, together with specified conditions that these procedures must fulfill in order
to be a valid representation.

> This point of view can serve to define not only "high-level" data objects, such as rational numbers, but
lower-level objects as well. Consider the notion of a pair, which we used in order to define our rational
numbers. We never actually said what a pair was, only that the language supplied procedures cons, car,
and cdr for operating on pairs. But the only thing we need to know about these three operations is that if
we glue two objects together using cons we can retrieve the objects using car and cdr. That is, the
operations satisfy the condition that, for any objects x and y, if z is (cons x y) then (car z) is x
and (cdr z) is y. Indeed, we mentioned that these three procedures are included as primitives in our
language. However, any triple of procedures that satisfies the above condition can be used as the basis for
implementing pairs. This point is illustrated strikingly by the fact that we could implement cons, car,
and cdr without using any data structures at all but only using procedures. Here are the definitions:

> The point of exhibiting the procedural representation of pairs is not that our language works this way
(Scheme, and Lisp systems in general, implement pairs directly, for efficiency reasons) but that it could
work this way. The procedural representation, although obscure, is a perfectly adequate way to represent
pairs, since it fulfills the only conditions that pairs need to fulfill. This example also demonstrates that the
ability to manipulate procedures as objects automatically provides the ability to represent compound data.
This may seem a curiosity now, but procedural representations of data will play a central role in our
programming repertoire. This style of programming is often called **message passing**, and we will be using
it as a basic tool in chapter 3 when we address the issues of modeling and simulation.

### Hierarchical Data and the Closure Property

> The ability to create pairs whose elements are pairs is the essence of list structure's importance as a
representational tool. We refer to this ability as the **closure property** of cons. In general, an operation for
combining data objects satisfies the closure property if the results of combining things with that operation
can themselves be combined using the same operation. Closure is the key to power in any means of
combination because it permits us to create *hierarchical structures* -- structures made up of parts, which
themselves are made up of parts, and so on.
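
As a small illustration of closure, the result of `cons` can itself be an argument to `cons`, so pairs nest to any depth and can be taken apart again with `car` and `cdr`:

```scheme
;; a pair whose elements are themselves pairs; it prints as ((1 . 2) 3 . 4)
(define pp (cons (cons 1 2) (cons 3 4)))

(car (car pp))  ; => 1
(cdr (car pp))  ; => 2
(car (cdr pp))  ; => 3
(cdr (cdr pp))  ; => 4
```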

### Representing Sequences

> One of the useful structures we can build with pairs is a sequence -- an ordered collection of data objects.
There are, of course, many ways to represent sequences in terms of pairs.

In [8]:
(define nil '())  ; standard Scheme has no nil; the book uses it as a name for the empty list '()

(cons 1
  (cons 2
    (cons 3
      (cons 4 nil)))) 

(1 2 3 4)

> Such a sequence of pairs, formed by nested conses, is called a list, and Scheme provides a primitive called
list to help in constructing lists. The above sequence could be produced by (list 1 2 3 4). [...] We can think of car as selecting the first item in the list, and of cdr as selecting the sublist consisting of all but the first item.
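
Since `list` is shorthand for nested `cons`es ending in the empty list, the two constructions are structurally equal, and combinations of `car` and `cdr` reach later elements (Scheme abbreviates `(car (cdr x))` as `cadr`). The name `one-through-four` is only for illustration:

```scheme
(define one-through-four (list 1 2 3 4))

(equal? one-through-four (cons 1 (cons 2 (cons 3 (cons 4 '())))))  ; => #t
(car (cdr one-through-four))   ; => 2
(cadr one-through-four)        ; => 2, the same selection abbreviated
(cdr (cdr (cdr (cdr one-through-four))))  ; => (), the end-of-list marker
```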

In [9]:
(display (list 1 2 3 4))
(newline)
(cdr (list 1 2 3 4))

(1 2 3 4)


(2 3 4)

### List Operations

In [10]:
; index of list
(define (list-ref xs n)
  (if (= n 0)
      (car xs)
      (list-ref (cdr xs) (- n 1))))

(list-ref (list 1 2 3 4) 2)

3

In [11]:
; length of list
(define (length xs)
  (if (null? xs)
      0
      (+ 1 (length (cdr xs)))))
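
The recursive `length` above builds up a chain of deferred `+` operations. The book also gives an iterative version that carries a running count; here is a sketch under the name `length-iterative` (our name, so the definition above is not replaced):

```scheme
;; iterative length: walk the list, counting as we go
(define (length-iterative xs)
  (define (iter rest count)
    (if (null? rest)
        count
        (iter (cdr rest) (+ count 1))))  ; tail call: runs in constant stack space
  (iter xs 0))

(length-iterative (list 1 3 5 7))  ; => 4
```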

In [12]:
; concatenate lists
(define (append xs ys)
  (if (null? xs)
      ys
      (cons (car xs) (append (cdr xs) ys))))
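
`append` rebuilds the cells of its first argument with `cons` but reuses its second argument unchanged, as the `(null? xs) ys` case shows. A quick check of that sharing, which holds for the definition above and for the standard `append`; the variable names are only for illustration:

```scheme
(define front (list 1 2))
(define back (list 3 4))
(define joined (append front back))

joined                     ; => (1 2 3 4)
(eq? (cddr joined) back)   ; => #t: the tail is back itself, not a copy
(eq? joined front)         ; => #f: the cells of front were copied
```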

In [13]:
; mapping over a list
(define (map f xs)
  (if (null? xs)
      nil
      (cons (f (car xs))
            (map f (cdr xs)))))

(map (lambda (x) (* x x)) (list 1 2 3 4))

(1 4 9 16)

> Map is an important construct, not only because it captures a common pattern, but because it establishes a
higher level of abstraction in dealing with lists. [...] map helps establish an abstraction barrier that isolates the implementation of procedures that transform lists from the details of how the elements of the list are extracted and combined.

In [14]:
; implementing for-each: apply f to each element for its side effect
(define (for-each f xs)
  (cond ((null? xs) #t)
        (else (f (car xs))
              (for-each f (cdr xs)))))

(for-each (lambda (x) (newline) (display x)) (list 57 321 88))


57
321
88

#t

### Hierarchical Structures

> The representation of sequences in terms of lists generalizes naturally to represent sequences whose
elements may themselves be sequences [...] Another way to think of sequences whose elements are sequences is as trees. The elements of the sequence are the branches of the tree, and elements that are themselves sequences are subtrees [...] Recursion is a natural tool for dealing with tree structures, since we can often reduce operations on trees to
operations on their branches, which reduce in turn to operations on the branches of the branches, and so on,
until we reach the leaves of the tree.

In [15]:
(cons (list 1 2) (list 3 4))

((1 2) 3 4)

In [16]:
; count leaves on a tree
(define (count-leaves x)
  (cond ((null? x) 0)
        ((not (pair? x)) 1)
        (else (+ (count-leaves (car x))
                 (count-leaves (cdr x))))))

; mapping over trees
(define (scale-tree tree factor)
  (map (lambda (sub-tree)
         (if (pair? sub-tree)
             (scale-tree sub-tree factor)
             (* sub-tree factor)))
       tree))
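
The book also writes `scale-tree` without `map`, recursing directly on `car` and `cdr` in the same shape as `count-leaves`. A sketch under the name `scale-tree-rec` (our name, so the `map` version above is kept), applied to the book's sample tree:

```scheme
;; scale every leaf by factor, recursing on car and cdr like count-leaves
(define (scale-tree-rec tree factor)
  (cond ((null? tree) '())                     ; empty tree
        ((not (pair? tree)) (* tree factor))   ; a leaf
        (else (cons (scale-tree-rec (car tree) factor)
                    (scale-tree-rec (cdr tree) factor)))))

(scale-tree-rec (list 1 (list 2 (list 3 4) 5) (list 6 7)) 10)
; => (10 (20 (30 40) 50) (60 70))
```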

### Sequences as Conventional Interfaces

This section is about structuring programs as a flow of signals passed from one procedure to the next (à la Unix pipes).

> The key to organizing programs so as to more clearly reflect the signal-flow structure is to concentrate on
the ``signals'' that flow from one stage in the process to the next. If we represent these signals as lists, then
we can use list operations to implement the processing at each of the stages.

In [17]:
; filter procedure
(define (filter p xs)
  (cond ((null? xs) nil)
        ((p (car xs))
         (cons (car xs)
               (filter p (cdr xs))))
        (else  (filter p (cdr xs)))))

; fold
(define (accumulate op initial xs)
  (if (null? xs)
      initial
      (op (car xs)
          (accumulate op initial (cdr xs)))))

; range
(define (range low high)
  (if (> low high)
      nil
      (cons low (range (+ low 1) high))))
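
As an example of this signal-flow style, the book's `sum-odd-squares` can be written as enumerate, filter, map, accumulate. In this sketch `filter` and `accumulate` are repeated from the cell above so that the block runs on its own, and `enumerate-tree` flattens a tree into the list of its leaves:

```scheme
;; filter and accumulate, repeated from above so this block runs standalone
(define (filter p xs)
  (cond ((null? xs) '())
        ((p (car xs)) (cons (car xs) (filter p (cdr xs))))
        (else (filter p (cdr xs)))))

(define (accumulate op initial xs)
  (if (null? xs)
      initial
      (op (car xs) (accumulate op initial (cdr xs)))))

;; enumerate: list the leaves of a tree, left to right
(define (enumerate-tree tree)
  (cond ((null? tree) '())
        ((not (pair? tree)) (list tree))
        (else (append (enumerate-tree (car tree))
                      (enumerate-tree (cdr tree))))))

;; enumerate >> filter odd? >> map square >> accumulate +
(define (sum-odd-squares tree)
  (accumulate +
              0
              (map (lambda (x) (* x x))
                   (filter odd? (enumerate-tree tree)))))

(enumerate-tree '(1 (2 3) (4 (5))))   ; => (1 2 3 4 5)
(sum-odd-squares '(1 (2 3) (4 (5))))  ; => 35, i.e. 1 + 9 + 25
```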

Let's write a procedure that lists the even numbers among the Fibonacci numbers Fib(0) through Fib(n):

In [18]:
(define (fib n)
  (fib-iter 1 0 n))

(define (fib-iter a b count)
  (if (= count 0)
      b
      (fib-iter (+ a b) a (- count 1))))

; even-fibs = range >> map fib >> filter even >> accumulate
(define (even-fibs n)
  (accumulate cons
              nil
              (filter even?
                      (map fib
                           (range 0 n)))))

(even-fibs 15)

(0 2 8 34 144 610)

> The value of expressing programs as sequence operations is that this helps us make program designs that
are modular, that is, designs that are constructed by combining relatively independent pieces. We can
encourage modular design by providing a library of standard components together with a conventional
interface for connecting the components in flexible ways.

> Modular construction is a powerful strategy for controlling complexity in engineering design. In real signal processing applications, for example, designers regularly build systems by cascading elements selected
from standardized families of filters and transducers. Similarly, sequence operations provide a library of
standard program elements that we can mix and match. 

> Sequences, implemented here as lists, serve as a conventional interface that permits us to combine
processing modules. Additionally, when we uniformly represent structures as sequences, we have localized
the data-structure dependencies in our programs to a small number of sequence operations. By changing
these, we can experiment with alternative representations of sequences, while leaving the overall design of
our programs intact.

### Nested Mappings

Consider this problem: given a positive integer n, find all ordered pairs of distinct positive integers i and j, where 1 ≤ j < i ≤ n, such that i + j is prime.

In [19]:
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; operations for checking if prime
(define (square x) 
  (* x x)) 

(define (divides? a b)
  (= (remainder b a) 0))

(define (find-divisor n test-divisor)
  (cond ((> (square test-divisor) n) n)
        ((divides? test-divisor n) test-divisor)
        (else (find-divisor n (+ test-divisor 1)))))

(define (smallest-divisor n)
  (find-divisor n 2))

(define (prime? n)
  (= n (smallest-divisor n)))

(define (prime-sum? pair)
  (prime? (+ (car pair) (cadr pair))))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(define (flatmap f xs)
  (accumulate append nil (map f xs)))

(define (make-pair-sum pair)
  (list (car pair) (cadr pair) (+ (car pair) (cadr pair))))

(define (prime-sum-pairs n)
  (map make-pair-sum
       (filter prime-sum?
               (flatmap
                 (lambda (i)
                   (map (lambda (j) (list i j))
                        (range 1 (- i 1))))
                 (range 1 n)))))

(prime-sum-pairs 7)

((2 1 3) (3 2 5) (4 1 5) (4 3 7) (5 2 7) (6 1 7) (6 5 11) (7 4 11) (7 6 13))

Another example: generating all permutations of a set.

In [20]:
(define (remove item sequence)
  (filter (lambda (x) (not (= x item)))
          sequence))

(define (permutations s)
  (if (null? s)                         ; empty set?
      (list nil)                        ; sequence containing empty set
      (flatmap (lambda (x)
                 (map (lambda (p) (cons x p))
                      (permutations (remove x s))))
               s)))

(permutations (range 1 3))

((1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1))

### Symbolic Data

An important extension is the ability to work with arbitrary symbols as data, not just numbers.

> In order to manipulate symbols we need a new element in our language: the ability to quote a data object.
Suppose we want to construct the list `(a b)`. We can't accomplish this with `(list a b)`, because this
expression constructs a list of the values of a and b rather than the symbols themselves. This issue is well
known in the context of natural languages, where words and sentences may be regarded either as semantic
entities or as character strings (syntactic entities). The common practice in natural languages is to use
quotation marks to indicate that a word or a sentence is to be treated literally as a string of characters. [...] our format for quoting differs from that of natural languages in that we place a quotation mark (traditionally, the single quote symbol ') only at the beginning of the object to be quoted.

In [24]:
(define a 1)
(define b 2)

(display (list a b))
(display (list 'a 'b))
(display (list 'a b))

(1 2)(a b)(a 2)

In [27]:
(display (car '(a b c))) (newline)
(display (cdr '(a b c)))

a
(b c)
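
The quotation mark is an abbreviation: `'x` is read as `(quote x)`, so both forms below denote the same data. Quoting a list quotes every element inside it, including sublists:

```scheme
(eq? 'a (quote a))                 ; => #t: the same symbol
(equal? '(a b c) (list 'a 'b 'c))  ; => #t: the same list structure
(car '(a (b c)))                   ; => a
(cadr '(a (b c)))                  ; => (b c), a quoted sublist
```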

> One additional primitive used in manipulating symbols is `eq?`, which takes two symbols as arguments and
tests whether they are the same [...] Using eq?, we can implement a useful procedure called memq. This
takes two arguments, a symbol and a list. If the symbol is not contained in the list (i.e., is not eq? to any
item in the list), then memq returns false. Otherwise, it returns the sublist of the list beginning with the first
occurrence of the symbol.

In [28]:
(define (memq item x)
  (cond ((null? x) #f)
        ((eq? item (car x)) x)
        (else (memq item (cdr x)))))

(memq 'apple '(x (apple sauce) y apple pear))

(apple pear)
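
`memq` compares with `eq?`, which is reliable for symbols. To look for an element that is itself a list, such as `(apple sauce)`, the standard procedure `member` compares with `equal?` instead; the same concern is why the unordered-set code below uses `equal?`:

```scheme
(memq 'pear '(apple banana))                  ; => #f: not present
(member '(apple sauce) '(x (apple sauce) y))  ; => ((apple sauce) y)
```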

### Representing Sets

> Informally, a set is simply a collection of distinct objects. To give a more precise definition we can employ
the method of data abstraction. That is, we define *set* by specifying the operations that are to be used on
sets. These are `union-set`, `intersection-set`, `element-of-set?`, and `adjoin-set`.

One way to represent a set is as an unordered list of its elements in which no element appears more than once:

In [None]:
(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set)
  (if (element-of-set? x set)
      set
      (cons x set)))

(define (intersection-set set1 set2)
  (cond ((or (null? set1) (null? set2)) '())
        ((element-of-set? (car set1) set2)
         (cons (car set1)
               (intersection-set (cdr set1) set2)))
        (else (intersection-set (cdr set1) set2))))

(define (union-set set1 set2) 
  (if (null? set1) 
      set2 
      (union-set (cdr set1) (adjoin-set (car set1) set2)))) 

The book includes alternative implementations for ordered lists and binary search trees.
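
For comparison, a sketch of the ordered-list representation for sets of numbers. Keeping the list in increasing order lets membership stop early and lets intersection walk both lists once, in O(n) steps rather than the O(n²) of the unordered version. The names `element-of-ordered-set?` and `intersection-ordered-set` are our own, chosen so as not to overwrite the definitions above:

```scheme
;; Sets of numbers as lists in increasing order, e.g. (1 3 5 7).
(define (element-of-ordered-set? x set)
  (cond ((null? set) #f)
        ((= x (car set)) #t)
        ((< x (car set)) #f)          ; every remaining element is larger
        (else (element-of-ordered-set? x (cdr set)))))

;; Walk both ordered lists once, always advancing past the smaller head.
(define (intersection-ordered-set set1 set2)
  (if (or (null? set1) (null? set2))
      '()
      (let ((x1 (car set1))
            (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1 (intersection-ordered-set (cdr set1) (cdr set2))))
              ((< x1 x2) (intersection-ordered-set (cdr set1) set2))
              (else (intersection-ordered-set set1 (cdr set2)))))))

(element-of-ordered-set? 4 '(1 3 5 7))            ; => #f
(intersection-ordered-set '(1 3 5 7) '(3 4 5 8))  ; => (3 5)
```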

In [None]:
;; BINARY TREES
(define (entry tree) (car tree))

(define (left-branch tree) (cadr tree))

(define (right-branch tree) (caddr tree))

(define (make-tree entry left right)
  (list entry left right))

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((= x (entry set)) #t)
        ((< x (entry set))
         (element-of-set? x (left-branch set)))
        ((> x (entry set))
         (element-of-set? x (right-branch set)))))

(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree (entry set) 
                    (adjoin-set x (left-branch set))
                    (right-branch set)))
        ((> x (entry set))
         (make-tree (entry set)
                    (left-branch set)
                    (adjoin-set x (right-branch set))))))
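
On a balanced tree, `element-of-set?` and `adjoin-set` discard about half of the remaining elements at each step, so they take O(log n) steps. To read the elements back out in increasing order, an in-order traversal works. A sketch, with `tree->list` as our name; it walks the `(entry left right)` lists directly with `car`, `cadr`, and `caddr` so that it runs without the selectors above:

```scheme
;; In-order traversal: left subtree, then entry, then right subtree.
;; result-list accumulates the elements that come after the current subtree.
(define (tree->list tree)
  (define (copy-to-list tree result-list)
    (if (null? tree)
        result-list
        (copy-to-list (cadr tree)                       ; left branch
                      (cons (car tree)                  ; entry
                            (copy-to-list (caddr tree)  ; right branch
                                          result-list)))))
  (copy-to-list tree '()))

(tree->list '(5 (3 (1 () ()) ()) (9 () (11 () ()))))  ; => (1 3 5 9 11)
```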

### Multiple Representations for Abstract Data

> there might be more than one useful representation for a data object, and we might like to
design systems that can deal with multiple representations. To take a simple example, complex numbers
may be represented in two almost equivalent ways: in rectangular form (real and imaginary parts) and in
polar form (magnitude and angle). Sometimes rectangular form is more appropriate and sometimes polar
form is more appropriate. Indeed, it is perfectly plausible to imagine a system in which complex numbers
are represented in both ways, and in which the procedures for manipulating complex numbers work with
either representation.

> In this section, we will learn how to cope with data that may be represented in different ways by different
parts of a program. This requires constructing generic procedures -- procedures that can operate on data
that may be represented in more than one way. Our main technique for building generic procedures will be
to work in terms of data objects that have type tags, that is, data objects that include explicit information
about how they are to be processed. We will also discuss data-directed programming, a powerful and
convenient implementation strategy for additively assembling systems with generic operations.

We can define four selectors, `real-part`, `imag-part`, `magnitude`, and `angle`, whose definitions depend on the chosen representation of a complex number, together with two constructors, `make-from-real-imag` and `make-from-mag-ang`.
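
The two representations are related by the usual conversions between rectangular coordinates $(x, y)$ and polar coordinates $(r, A)$, which the selectors and constructors below implement:

$$
x = r\cos A, \qquad y = r\sin A, \qquad r = \sqrt{x^2 + y^2}, \qquad A = \arctan(y, x)
$$

Here $\arctan(y, x)$ is the two-argument arctangent, computed in Scheme by `(atan y x)`.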

In [None]:
(define (add-complex z1 z2)
  (make-from-real-imag (+ (real-part z1) (real-part z2))
                       (+ (imag-part z1) (imag-part z2))))

This operation is agnostic to how complex numbers are represented: it only uses the selectors and a constructor.

The following selectors and constructors are based on rectangular (Cartesian) coordinates:

In [None]:
(define (real-part z) (car z))

(define (imag-part z) (cdr z))

(define (magnitude z)
  (sqrt (+ (square (real-part z)) (square (imag-part z)))))

(define (angle z)
  (atan (imag-part z) (real-part z)))

(define (make-from-real-imag x y) (cons x y))

(define (make-from-mag-ang r a) 
  (cons (* r (cos a)) (* r (sin a))))

An alternative representation is based on polar coordinates. Because it reuses the same names, evaluating this cell replaces the rectangular definitions above:

In [40]:
(define (real-part z)
  (* (magnitude z) (cos (angle z))))

(define (imag-part z)
  (* (magnitude z) (sin (angle z))))

(define (magnitude z) (car z))

(define (angle z) (cdr z))

(define (make-from-real-imag x y) 
  (cons (sqrt (+ (square x) (square y)))
        (atan y x)))

(define (make-from-mag-ang r a) (cons r a))

### Tagged Data

What if we wish to use both representations? One way is to tag each value with its representation (such as `rectangular` or `polar`) so that the program can tell them apart. The tag can be attached by pairing it with the contents using `cons`:

In [None]:
(define (attach-tag type-tag contents)
  (cons type-tag contents))

(define (type-tag datum)
  (if (pair? datum)
      (car datum)
      (error "Bad tagged datum -- TYPE-TAG" datum)))

(define (contents datum)
  (if (pair? datum)
      (cdr datum)
      (error "Bad tagged datum -- CONTENTS" datum)))

To test which representation a complex number uses:

In [None]:
(define (rectangular? z)
  (eq? (type-tag z) 'rectangular))

(define (polar? z)
  (eq? (type-tag z) 'polar))

Each representation's selectors and constructors then need distinct names so that both can coexist, and the constructors attach the appropriate tag:

In [42]:
;; (rectangular)

(define (real-part-rectangular z) (car z))

(define (imag-part-rectangular z) (cdr z))

(define (magnitude-rectangular z)
  (sqrt (+ (square (real-part-rectangular z))
           (square (imag-part-rectangular z)))))

(define (angle-rectangular z)
  (atan (imag-part-rectangular z)
        (real-part-rectangular z)))

(define (make-from-real-imag-rectangular x y)
  (attach-tag 'rectangular (cons x y)))

(define (make-from-mag-ang-rectangular r a) 
  (attach-tag 'rectangular
              (cons (* r (cos a)) (* r (sin a)))))

;; (polar)

(define (real-part-polar z)
  (* (magnitude-polar z) (cos (angle-polar z))))

(define (imag-part-polar z)
  (* (magnitude-polar z) (sin (angle-polar z))))

(define (magnitude-polar z) (car z))

(define (angle-polar z) (cdr z))

(define (make-from-real-imag-polar x y) 
  (attach-tag 'polar
               (cons (sqrt (+ (square x) (square y)))
                     (atan y x))))

(define (make-from-mag-ang-polar r a)
  (attach-tag 'polar (cons r a)))

Finally, the generic selectors check the tag, strip it with `contents`, and call the matching representation-specific procedure:

In [None]:
(define (real-part z)
  (cond ((rectangular? z) 
         (real-part-rectangular (contents z)))
        ((polar? z)
         (real-part-polar (contents z)))
        (else (error "Unknown type -- REAL-PART" z))))

(define (imag-part z)
  (cond ((rectangular? z)
         (imag-part-rectangular (contents z)))
        ((polar? z)
         (imag-part-polar (contents z)))
        (else (error "Unknown type -- IMAG-PART" z))))

(define (magnitude z)
  (cond ((rectangular? z)
         (magnitude-rectangular (contents z)))
        ((polar? z)
         (magnitude-polar (contents z)))
        (else (error "Unknown type -- MAGNITUDE" z))))

(define (angle z)
  (cond ((rectangular? z)
         (angle-rectangular (contents z)))
        ((polar? z)
         (angle-polar (contents z)))
        (else (error "Unknown type -- ANGLE" z))))

Operations such as `add-complex` keep exactly the same definition and now work with both representations at the same time:

In [44]:
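; generic constructors: rectangular for real/imag input, polar for mag/angle input
(define (make-from-real-imag x y) (make-from-real-imag-rectangular x y))
(define (make-from-mag-ang r a) (make-from-mag-ang-polar r a))
(define (add-complex z1 z2)
  (make-from-real-imag (+ (real-part z1) (real-part z2))
                       (+ (imag-part z1) (imag-part z2))))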
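
The introduction quoted at the start of this section also mentions data-directed programming, where the `cond` on the tag inside each generic selector is replaced by a lookup in a table indexed by operation and type. A minimal sketch, assuming the table is kept as an association list: `op-table`, `put`, `get`, and this one-argument `apply-generic` are illustrative names (the book's own `apply-generic` accepts any number of arguments), and the registered bodies are compact restatements of the selectors above.

```scheme
;; Operation table: an association list from (operation type) to a procedure.
(define op-table '())

(define (put op type proc)
  (set! op-table (cons (cons (list op type) proc) op-table)))

(define (get op type)
  (let ((entry (assoc (list op type) op-table)))  ; assoc compares keys with equal?
    (if entry (cdr entry) #f)))

;; Simplified to one argument: find op for z's tag, apply it to z's contents.
(define (apply-generic op z)
  (let ((proc (get op (car z))))
    (if proc
        (proc (cdr z))
        (error "No method for these types -- APPLY-GENERIC" op z))))

;; Each representation registers its selectors under its own tag.
(put 'real-part 'rectangular (lambda (z) (car z)))
(put 'imag-part 'rectangular (lambda (z) (cdr z)))
(put 'real-part 'polar (lambda (z) (* (car z) (cos (cdr z)))))
(put 'imag-part 'polar (lambda (z) (* (car z) (sin (cdr z)))))

(apply-generic 'real-part (cons 'rectangular (cons 3 4)))  ; => 3
(apply-generic 'real-part (cons 'polar (cons 2 0)))        ; 2 * cos 0, numerically 2
```

Adding a new representation then means registering its procedures with `put`, without editing the existing generic selectors.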

---

SICP exercise solutions: http://community.schemewiki.org/?SICP-Solutions