# 2.3 Symbolic Data

All the compound data objects we have used so far were constructed ultimately from numbers. In this section we extend the representational capability of our language by introducing the ability to work with arbitrary symbols as data.

## 2.3.1 Quotation

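If we can form compound data using symbols, we can have lists such as $\textrm{(a b c d)}$, $\textrm{(23 45 17)}$, and $\textrm{((Norah 12) (Molly 9) (Anna 7) (Lisa 6) (Charlotta 4))}$.

Lists containing symbols can look just like the expressions of our language, for example $\textrm{(* (+ 23 45) (+ x 9))}$ or $\textrm{(define (fact n) (if (= n 1) 1 (* n (fact (- n 1)))))}$.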

In order to manipulate symbols we need a new element in our language: the ability to $\textbf{quote}$ a data object. Suppose we want to construct the list $\textrm{(a b)}$. We can’t accomplish this with $\textrm{(list a b)}$, because this expression constructs a list of the values of $a$ and $b$ rather than the symbols themselves. This issue is well known in the context of natural languages, where words and sentences may be regarded either as semantic entities or as character strings (syntactic entities). The common practice in natural languages is to use quotation marks to indicate that a word or a sentence is to be treated literally as a string of characters. For instance, the first letter of “John” is clearly “J.” If we tell somebody “say your name aloud,” we expect to hear that person’s name. However, if we tell somebody “say ‘your name’ aloud,” we expect to hear the words “your name.” Note that we are forced to nest quotation marks to describe what somebody else might say.

We can follow this same practice to identify $\text{lists}$ and $\text{symbols}$ that are to be treated as data objects rather than as expressions to be evaluated. However, our format for quoting differs from that of natural languages in that we place a quotation mark (traditionally, the single quote symbol $\textbf{'}$) only at the beginning of the object to be quoted. We can get away with this in Scheme syntax because we rely on blanks and parentheses to delimit objects. Thus, the meaning of the single quote character is to quote the next object.

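Now we can $\text{distinguish}$ between symbols and their values. If $a$ is defined as 1 and $b$ as 2, then $\textrm{(list a b)}$ evaluates to $\textrm{(1 2)}$, $\textrm{(list 'a 'b)}$ evaluates to $\textrm{(a b)}$, and $\textrm{(list 'a b)}$ evaluates to $\textrm{(a 2)}$.

Quotation also allows us to type in compound objects, using the conventional printed representation for lists. For example, $\textrm{(car '(a b c))}$ evaluates to $\textrm{a}$ and $\textrm{(cdr '(a b c))}$ evaluates to $\textrm{(b c)}$.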

In keeping with this, we can obtain the empty list by evaluating $\textrm{'()}$, and thus dispense with the variable $\text{nil}$.

## Example 14: Just an example

One additional primitive used in manipulating symbols is $\textrm{eq?}$, which takes two symbols as arguments and tests whether they are the same. Using $\textrm{eq?}$, we can implement a useful procedure called $\texttt{memq}$. This takes two arguments, a symbol and a list. If the symbol is not contained in the list (i.e., is not $\textrm{eq?}$ to any item in the list), then $\texttt{memq}$ returns false. Otherwise, it returns the sublist of the list beginning with the first occurrence of the symbol:

In [1]:
cat 2.3/Example_14/memq.scm

(define (memq item x)
  (cond ((null? x) #f)
        ((eq? item (car x)) x)
        (else (memq item (cdr x)))))


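For example, the value of $\textrm{(memq 'apple '(pear banana prune))}$ is false, whereas the value of $\textrm{(memq 'apple '(x (apple sauce) y apple pear))}$ is $\textrm{(apple pear)}$.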

## Exercise 2.53: 
What would the interpreter print in response to evaluating each of the following expressions?

## Answer:
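
A sketch of how the answers can be checked at a Scheme prompt, assuming the expressions in question are the seven below. Each comment gives the value that follows from the definitions of $\texttt{list}$, $\texttt{cdr}$, $\texttt{cadr}$, $\texttt{pair?}$, and $\texttt{memq}$.

```scheme
;; Assumed expressions; the comment on each line is the printed value.
(list 'a 'b 'c)                          ; (a b c)
(list (list 'george))                    ; ((george))
(cdr '((x1 x2) (y1 y2)))                 ; ((y1 y2))
(cadr '((x1 x2) (y1 y2)))                ; (y1 y2)
(pair? (car '(a short list)))            ; #f, since the car is the symbol a
(memq 'red '((red shoes) (blue socks)))  ; #f, memq does not look inside sublists
(memq 'red '(red shoes blue socks))      ; (red shoes blue socks)
```

The two $\texttt{memq}$ results differ because $\texttt{memq}$ compares its first argument only with the top-level items of the list.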

## Exercise 2.54: 
Two lists are said to be $\texttt{equal?}$ if they contain equal elements arranged in the same order. For example, $\textrm{(equal? '(this is a list) '(this is a list))}$ is true, but $\textrm{(equal? '(this is a list) '(this (is a) list))}$ is false. To be more precise, we can define $\texttt{equal?}$ recursively in terms of the basic $\texttt{eq?}$ equality of symbols by saying that $a$ and $b$ are $\texttt{equal?}$ if they are both symbols and the symbols are $\texttt{eq?}$, or if they are both lists such that $\textrm{(car a)}$ is $\texttt{equal?}$ to $\textrm{(car b)}$ and $\textrm{(cdr a)}$ is $\texttt{equal?}$ to $\textrm{(cdr b)}$. Using this idea, implement $\texttt{equal?}$ as a procedure.

## Answer:

In [1]:
cat 2.3/Exercise_2.54/equal.scm

(define (symbol-equal? x y) (eq? x y))
(define (list-equal? x y)
  (cond ((and (null? x) (null? y)) #t)
        ((or (null? x) (null? y)) #f)
        ((equal? (car x) (car y))
         (equal? (cdr x) (cdr y)))
        (else #f)))

(define (equal? x y)
  (cond ((and (symbol? x) (symbol? y))
         (symbol-equal? x y))
        ((and (list? x) (list? y))
         (list-equal? x y))
        ;; Mixed kinds (e.g. a symbol against a list) are unequal, not an error;
        ;; eqv? also lets two equal numbers compare as equal.
        (else (eqv? x y))))


### Running Instance:
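
A compact variant for comparison, offered as a sketch rather than as the answer's own code. It uses $\texttt{pair?}$ and $\texttt{null?}$ in place of $\texttt{list?}$ and treats values of different kinds as unequal. The name $\texttt{my-equal?}$ is hypothetical, chosen so that the built-in $\texttt{equal?}$ is left untouched.

```scheme
;; my-equal? is a hypothetical name; it follows the recursive definition
;; given in the exercise (symbols via eq?, lists via car and cdr).
(define (my-equal? x y)
  (cond ((and (symbol? x) (symbol? y)) (eq? x y))
        ((and (pair? x) (pair? y))
         (and (my-equal? (car x) (car y))
              (my-equal? (cdr x) (cdr y))))
        ((and (null? x) (null? y)) #t)
        (else #f)))

(my-equal? '(this is a list) '(this is a list))    ; #t
(my-equal? '(this is a list) '(this (is a) list))  ; #f
(my-equal? 'a '(a))                                ; #f
```

The second call is the interesting one: the comparison reaches the symbol $\texttt{is}$ on one side and the list $\textrm{(is a)}$ on the other, so the final clause has to answer false rather than signal an error.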

## Exercise 2.55: 
Eva Lu Ator types to the interpreter the expression $\textrm{(car ''abracadabra)}$.

To her surprise, the interpreter prints back $\textrm{quote}$. Explain.

## Answer:

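Because the reader expands $\textrm{'x}$ into $\textrm{(quote x)}$, the expression $\textrm{(car ''abracadabra)}$ is equivalent to $\textrm{(car (quote (quote abracadabra)))}$.

The outer $\texttt{quote}$ yields the two-element list $\textrm{(quote abracadabra)}$, whose $\texttt{car}$ is the symbol $\texttt{quote}$.

So the interpreter prints $\textrm{quote}$.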

## Example 15: Symbolic Differentiation

As an illustration of symbol manipulation and a further illustration of data abstraction, consider the design of a procedure that performs symbolic differentiation of algebraic expressions. We would like the procedure to take as arguments an algebraic expression and a variable and to return the $\text{derivative}$ of the expression with respect to the variable. For example, if the arguments to the procedure are $ax^2 + bx + c$ and $x$, the procedure should return $2ax + b$. Symbolic differentiation is of special historical significance in Lisp. It was one of the motivating examples behind the development of a computer language for symbol manipulation. Furthermore, it marked the beginning of the line of research that led to the development of powerful systems for symbolic mathematical work, which are currently being used by a growing number of applied mathematicians and physicists.

In developing the symbolic-differentiation program, we will follow the same strategy of data abstraction that we followed in developing the rational-number system of Section 2.1.1. That is, we will first define a differentiation algorithm that operates on abstract objects such as “sums,” “products,” and “variables” without worrying about how these are to be represented. Only afterward will we address the representation problem.

### $\spadesuit\quad$ The differentiation program with abstract data

In order to keep things simple, we will consider a very simple symbolic-differentiation program that handles expressions that are built up using only the operations of addition and multiplication with two arguments.
Differentiation of any such expression can be carried out by applying the following reduction rules:

\begin{align}
\frac{dc}{dx} & = 0,\quad\textrm{for c a constant or a variable different from x},\\
\frac{dx}{dx} & = 1,\\
\frac{d(u+v)}{dx} & = \frac{du}{dx}+\frac{dv}{dx},\\
\frac{d(uv)}{dx} & = u\frac{dv}{dx}+v\frac{du}{dx}.
\end{align}

Observe that the latter two rules are recursive in nature. That is, to obtain the derivative of a sum we first find the derivatives of the terms and add them. Each of the terms may in turn be an expression that needs to be decomposed. Decomposing into smaller and smaller pieces will eventually produce pieces that are either constants or variables, whose derivatives will be either 0 or 1.

To embody these rules in a procedure we indulge in a little wishful thinking, as we did in designing the rational-number implementation. If we had a means for representing algebraic expressions, we should be able to tell whether an expression is a sum, a product, a constant, or a variable. We should be able to extract the parts of an expression. For a sum, for example, we want to be able to extract the addend (first term) and the augend (second term). We should also be able to construct expressions from parts. Let us assume that we already have procedures to implement the following $\text{selectors}$, $\text{constructors}$, and $\text{predicates}$:

\begin{align}
\textrm{(variable? e)}&\qquad\text{Is e a variable?}\\
\textrm{(same-variable? v1 v2)}&\qquad\text{Are v1 and v2 the same variable?}\\
\textrm{(sum? e)}&\qquad\text{Is e a sum?}\\
\textrm{(addend e)}&\qquad\text{Addend of the sum e.}\\
\textrm{(augend e)}&\qquad\text{Augend of the sum e.}\\
\textrm{(make-sum a1 a2)}&\qquad\text{Construct the sum of a1 and a2.}\\
\textrm{(product? e)}&\qquad\text{Is e a product?}\\
\textrm{(multiplier e)}&\qquad\text{Multiplier of the product e.}\\
\textrm{(multiplicand e)}&\qquad\text{Multiplicand of the product e.}\\
\textrm{(make-product m1 m2)}&\qquad\text{Construct the product of m1 and m2.}
\end{align}

Using these, and the primitive predicate $\texttt{number?}$, which identifies numbers, we can express the differentiation rules as the following procedure:

In [24]:
cat 2.3/Example_15/deriv-rules.scm

(load "2.3/Example_15/deriv-repr.scm")

(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression) (if (same-variable? expression var) 1 0))
        ((sum? expression) (make-sum (deriv (addend expression) var)
                                     (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
           (make-product (multiplier expression)
                         (deriv (multiplicand expression) var))
           (make-product (deriv (multiplier expression) var)
                         (multiplicand expression))))
        (else
          (error "Unknown expression type: DERIV" expression))))


This $\texttt{deriv}$ procedure incorporates the complete differentiation algorithm. Since it is expressed in terms of abstract data, it will work no matter how we choose to represent algebraic expressions, as long as we design a proper set of selectors and constructors. This is the issue we must address next.

### $\spadesuit\quad$ Representing algebraic expressions

We can imagine many ways to use list structure to represent algebraic expressions. For example, we could use lists of symbols that mirror the usual algebraic notation, representing $ax + b$ as the list $\textrm{(a * x + b)}$. However, one especially straightforward choice is to use the same parenthesized prefix notation that Lisp uses for combinations; that is, to represent $ax + b$ as $\textrm{(+ (* a x) b)}$. Then our data representation for the differentiation problem is as follows:

$\bullet\quad$The variables are symbols. They are identified by the primitive predicate $\texttt{symbol?}$:

In [16]:
cat 2.3/Example_15/is-variable.scm

(define (variable? x) (symbol? x))


$\bullet\quad$Two variables are the same if the symbols representing them are $\texttt{eq?}$:

In [23]:
cat 2.3/Example_15/is-same-variable.scm

(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))


$\bullet\quad$Sums are constructed as lists:

In [6]:
cat 2.3/Example_15/make-sum.scm

(define (make-sum a1 a2) (list '+ a1 a2))


$\bullet\quad$Products are constructed as lists:

In [7]:
cat 2.3/Example_15/make-product.scm

(define (make-product m1 m2) (list '* m1 m2))


$\bullet\quad$A sum is a list whose first element is the symbol +:

In [14]:
cat 2.3/Example_15/is-sum.scm

(define (sum? x) (and (pair? x) (eq? (car x) '+)))


$\bullet\quad$The addend is the second item of the sum list:

In [9]:
cat 2.3/Example_15/addend.scm

(define (addend s) (cadr s))


$\bullet\quad$The augend is the third item of the sum list:

In [10]:
cat 2.3/Example_15/augend.scm

(define (augend s) (caddr s))


$\bullet\quad$A product is a list whose first element is the symbol $\ast$:

In [15]:
cat 2.3/Example_15/is-product.scm

(define (product? x) (and (pair? x) (eq? (car x) '*)))


$\bullet\quad$The multiplier is the second item of the product list:

In [12]:
cat 2.3/Example_15/multiplier.scm

(define (multiplier p) (cadr p))


$\bullet\quad$The multiplicand is the third item of the product list:

In [13]:
cat 2.3/Example_15/multiplicand.scm

(define (multiplicand p) (caddr p))


Thus, we need only combine these with the algorithm as embodied by $\texttt{deriv}$ in order to have a working symbolic-differentiation program.

Let us look at some examples of its behavior:
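
The sketch below gathers the definitions above into one self-contained block and applies $\texttt{deriv}$ to three sample inputs. The inputs are assumptions, chosen to match the discussion that follows; the comments show what these definitions return for them.

```scheme
;; Representation: prefix lists, with no simplification in the constructors.
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))
(define (make-sum a1 a2) (list '+ a1 a2))
(define (make-product m1 m2) (list '* m1 m2))
(define (sum? x) (and (pair? x) (eq? (car x) '+)))
(define (addend s) (cadr s))
(define (augend s) (caddr s))
(define (product? x) (and (pair? x) (eq? (car x) '*)))
(define (multiplier p) (cadr p))
(define (multiplicand p) (caddr p))

(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression)
         (if (same-variable? expression var) 1 0))
        ((sum? expression)
         (make-sum (deriv (addend expression) var)
                   (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
          (make-product (multiplier expression)
                        (deriv (multiplicand expression) var))
          (make-product (deriv (multiplier expression) var)
                        (multiplicand expression))))
        (else (error "Unknown expression type: DERIV" expression))))

(deriv '(+ x 3) 'x)
;; (+ 1 0)
(deriv '(* x y) 'x)
;; (+ (* x 0) (* 1 y))
(deriv '(* (* x y) (+ x 3)) 'x)
;; (+ (* (* x y) (+ 1 0)) (* (+ (* x 0) (* 1 y)) (+ x 3)))
```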

The program produces answers that are correct; however, they are unsimplified. It is true that

$$\frac{d(xy)}{dx}=x \cdot 0 + 1 \cdot y,$$

but we would like the program to know that $x \cdot 0 = 0$, $1 \cdot y = y$, and $0 + y = y$. The answer for the second example should have been simply $y$. As the third example shows, this becomes a serious issue when the expressions are complex.

Our difficulty is much like the one we encountered with the rational-number implementation: we haven’t reduced answers to simplest form. To accomplish the rational-number reduction, we needed to change only the $\text{constructors}$ and the $\text{selectors}$ of the implementation. We can adopt a similar strategy here. We won’t change $\texttt{deriv}$ at all. Instead, we will change $\texttt{make-sum}$ so that if both summands are numbers, $\texttt{make-sum}$ will add them and return their sum. Also, if one of the summands is 0, then $\texttt{make-sum}$ will return the other summand.

In [18]:
cat 2.3/Example_15/make-sum-reduce.scm

(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list '+ a1 a2))))


This uses the procedure $\texttt{is-number?}$, which checks whether an expression is equal to a given number:

In [19]:
cat 2.3/Example_15/is-number.scm

(define (is-number? expression num)
  (and (number? expression) (= expression num)))


Similarly, we will change $\texttt{make-product}$ to build in the rules that 0 times anything is 0 and 1 times anything is the thing itself:

In [20]:
cat 2.3/Example_15/make-product-reduce.scm

(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list '* m1 m2))))


For simplicity, we collect all the $\text{constructors}$, $\text{selectors}$, and $\text{predicates}$ given above in a single file:

In [25]:
cat 2.3/Example_15/deriv-repr.scm

;;
;; Constructors
;;

;; Construct the sum of a1 and a2
(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list '+ a1 a2))))

;; Construct the product of m1 and m2
(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list '* m1 m2))))


;;
;; Selectors
;;

;; Addend of the sum s
(define (addend s) (cadr s))

;; Augend of the sum s
(define (augend s) (caddr s))

;; Multiplier of the product p
(define (multiplier p) (cadr p))

;; Multiplicand of the product p
(define (multiplicand p) (caddr p))

;;
;; Predicates
;;

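;; Is expression a number equal to num?
(define (is-number? expression num)
  (and (number? expression) (= expression num)))

;; Is x a variable?
(define (variable? x) (symbol? x))

;; Are v1 and v2 the same variable?
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))

;; Is x a sum?
(define (sum? x) (and (pair? x) (eq? (car x) '+)))

;; Is x a product?
(define (product? x) (and (pair? x) (eq? (car x) '*)))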

Here is how this version works on our three examples:
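
A self-contained sketch of the simplifying version, applied to the same three sample inputs as before (the inputs are assumptions; the comments show what these definitions return):

```scheme
(define (is-number? expression num)
  (and (number? expression) (= expression num)))
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))

;; Constructors that simplify as they build.
(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list '+ a1 a2))))
(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list '* m1 m2))))

(define (sum? x) (and (pair? x) (eq? (car x) '+)))
(define (addend s) (cadr s))
(define (augend s) (caddr s))
(define (product? x) (and (pair? x) (eq? (car x) '*)))
(define (multiplier p) (cadr p))
(define (multiplicand p) (caddr p))

;; deriv itself is unchanged from the unsimplified version.
(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression)
         (if (same-variable? expression var) 1 0))
        ((sum? expression)
         (make-sum (deriv (addend expression) var)
                   (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
          (make-product (multiplier expression)
                        (deriv (multiplicand expression) var))
          (make-product (deriv (multiplier expression) var)
                        (multiplicand expression))))
        (else (error "Unknown expression type: DERIV" expression))))

(deriv '(+ x 3) 'x)               ; 1
(deriv '(* x y) 'x)               ; y
(deriv '(* (* x y) (+ x 3)) 'x)   ; (+ (* x y) (* y (+ x 3)))
```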

Although this is quite an improvement, the third example shows that there is still a long way to go before we get a program that puts expressions into a form that we might agree is “simplest.” The problem of algebraic simplification is complex because, among other reasons, a form that may be simplest for one purpose may not be for another.

## Exercise 2.56: 
Show how to extend the basic differentiator to handle more kinds of expressions. For instance, implement the differentiation rule

$$\frac{d(u^n)}{dx}=nu^{n-1}\frac{du}{dx}$$

by adding a new clause to the $\texttt{deriv}$ program and defining appropriate procedures $\texttt{exponentiation?}$, $\texttt{base}$, $\texttt{exponent}$, and $\texttt{make-exponentiation}$. (You may use the symbol ${**}$ to denote exponentiation.) Build in the rules that anything raised to the power 0 is 1 and anything raised to the power 1 is the thing itself.

## Answer:

$\spadesuit\quad$Additions to $\text{deriv-rules.scm}$ from Example_15:

In [26]:
cat 2.3/Exercise_2.56/deriv-rules.scm

(load "2.3/Exercise_2.56/deriv-repr.scm")

(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression) (if (same-variable? expression var) 1 0))
        ((sum? expression) (make-sum (deriv (addend expression) var)
                                     (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
           (make-product (multiplier expression)
                         (deriv (multiplicand expression) var))
           (make-product (deriv (multiplier expression) var)
                         (multiplicand expression))))
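        ;; BEGIN new clause: d(u^n)/dx = n * u^(n-1) * du/dx
        ((is-exponentiation? expression)
         (let ((n (exponent expression))
               (u (base expression)))
           ;; make-sum (rather than -) also works when the exponent is a symbol
           (make-product n (make-product (make-exponentiation u (make-sum n -1))
                                         (deriv u var)))))
        ;; END new clause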
        (else
          (error "Unknown expression type: DERIV" expression))))

$\spadesuit\quad$Additions to $\text{deriv-repr.scm}$ from Example_15:

In [27]:
cat 2.3/Exercise_2.56/deriv-repr.scm

;;
;; Constructors
;;

;; Construct the sum of a1 and a2
(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list '+ a1 a2))))

;; Construct the product of m1 and m2
(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list '* m1 m2))))


;;
;; Selectors
;;

;; Addend of the sum s
(define (addend s) (cadr s))

;; Augend of the sum s
(define (augend s) (caddr s))

;; Multiplier of the product p
(define (multiplier p) (cadr p))

;; Multiplicand of the product p
(define (multiplicand p) (caddr p))

;;
;; Predicates
;;

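;; Is expression a number equal to num?
(define (is-number? expression num)
  (and (number? expression) (= expression num)))

;; Is x a variable?
(define (variable? x) (symbol? x))

;; Are v1 and v2 the same variable?
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))

;; Is x a sum?
(define (sum? x) (and (pair? x) (eq? (car x) '+)))

;; Is x a product?
(define (product? x) (and (pair? x) (eq? (car x) '*)))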

### Running Instance:
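
A self-contained sketch of the extended differentiator. The exponentiation helpers below are assumptions: their names follow the calls made in $\texttt{deriv}$, the list form $\textrm{(** base exponent)}$ follows the hint in the exercise, and the bodies are one plausible way to build in the power-0 and power-1 rules.

```scheme
(define (is-number? expression num)
  (and (number? expression) (= expression num)))
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))
(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list '+ a1 a2))))
(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list '* m1 m2))))
(define (sum? x) (and (pair? x) (eq? (car x) '+)))
(define (addend s) (cadr s))
(define (augend s) (caddr s))
(define (product? x) (and (pair? x) (eq? (car x) '*)))
(define (multiplier p) (cadr p))
(define (multiplicand p) (caddr p))

;; Assumed exponentiation helpers, represented as (** base exponent).
(define (make-exponentiation b n)
  (cond ((is-number? n 0) 1)                       ; u^0 = 1
        ((is-number? n 1) b)                       ; u^1 = u
        ((and (number? b) (number? n)) (expt b n))
        (else (list '** b n))))
(define (is-exponentiation? x) (and (pair? x) (eq? (car x) '**)))
(define (base e) (cadr e))
(define (exponent e) (caddr e))

(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression)
         (if (same-variable? expression var) 1 0))
        ((sum? expression)
         (make-sum (deriv (addend expression) var)
                   (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
          (make-product (multiplier expression)
                        (deriv (multiplicand expression) var))
          (make-product (deriv (multiplier expression) var)
                        (multiplicand expression))))
        ((is-exponentiation? expression)
         (let ((n (exponent expression))
               (u (base expression)))
           (make-product
            n
            (make-product (make-exponentiation u (make-sum n -1))
                          (deriv u var)))))
        (else (error "Unknown expression type: DERIV" expression))))

(deriv '(** x 3) 'x)   ; (* 3 (** x 2))
(deriv '(** x 2) 'x)   ; (* 2 x)
(deriv '(** x n) 'x)   ; (* n (** x (+ n -1)))
```

Building $n - 1$ with $\texttt{make-sum}$ lets the last call succeed even though the exponent is a symbol.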

## Exercise 2.57: 
Extend the differentiation program to handle sums and products of arbitrary numbers of (two or more) terms. Then the last example above could be expressed as $\textrm{(deriv '(* x y (+ x 3)) 'x)}$.

Try to do this by changing only the representation for $\text{sums}$ and $\text{products}$, without changing the $\texttt{deriv}$ procedure at all. For example, the $\texttt{addend}$ of a sum would be the first term, and the $\texttt{augend}$ would be the sum of the rest of the terms.

$\spadesuit\quad$Redefine the constructor and selectors for $\text{sum}$:

In [28]:
cat 2.3/Exercise_2.57/sum-in-deriv.scm

(define (make-sum a1 . a2)
  (if (single-operand? a2)
    (let ((a2 (car a2)))
      (cond ((is-number? a1 0) a2)
            ((is-number? a2 0) a1)
            ((and (number? a1) (number? a2)) (+ a1 a2))
            (else (list '+ a1 a2))))
    (cons '+ (cons a1 a2))))

(define (addend s) (cadr s))

(define (augend s)
  (let ((tail-operand (cddr s)))
    (if (single-operand? tail-operand)
        (car tail-operand)
        (apply make-sum tail-operand))))

(define (sum? x)
  (and (pair? x) (eq? (car x) '+)))


$\spadesuit\quad$Redefine the constructor and selectors for $\text{product}$:

In [29]:
cat 2.3/Exercise_2.57/product-in-deriv.scm

(define (make-product m1 . m2)
  (if (single-operand? m2)
    (let ((m2 (car m2)))
      (cond ((or (is-number? m1 0) (is-number? m2 0)) 0)
            ((is-number? m1 1) m2)
            ((is-number? m2 1) m1)
            ((and (number? m1) (number? m2)) (* m1 m2))
            (else (list '* m1 m2))))
    (cons '* (cons m1 m2))))

(define (multiplier p) (cadr p))

(define (multiplicand p)
  (let ((tail-operand (cddr p)))
    (if (single-operand? tail-operand)
        (car tail-operand)
        (apply make-product tail-operand))))

(define (product? x)
  (and (pair? x) (eq? (car x) '*)))



Here, the $\texttt{single-operand?}$ procedure checks whether the list of remaining arguments holds only a single operand:

In [30]:
cat 2.3/Exercise_2.57/is-single-operand.scm

(define (single-operand? x)
  (not (pair? (cdr x))))


### Running Instance:

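$\bullet\quad$First, load the $\texttt{deriv-rules}$ procedure from Exercise_2.56:

$\bullet\quad$Then load the $\texttt{sum}$ and $\texttt{product}$ constructors and selectors redefined in this exercise, which override the corresponding procedures loaded from Exercise_2.56:

$\bullet\quad$Evaluate the expression $\textrm{(deriv '(* x y (+ x 3)) 'x)}$ to verify:

Evaluating a more complex expression: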
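
The steps above can be sketched in one self-contained block instead of separate $\texttt{load}$ calls. The definitions are the ones from this answer (with $\texttt{deriv}$ limited to sums and products for brevity); the second sample input is an assumption added for illustration.

```scheme
(define (is-number? expression num)
  (and (number? expression) (= expression num)))
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))

;; x is a non-empty list of operands; true when it holds exactly one.
(define (single-operand? x) (not (pair? (cdr x))))

(define (make-sum a1 . a2)
  (if (single-operand? a2)
      (let ((a2 (car a2)))
        (cond ((is-number? a1 0) a2)
              ((is-number? a2 0) a1)
              ((and (number? a1) (number? a2)) (+ a1 a2))
              (else (list '+ a1 a2))))
      (cons '+ (cons a1 a2))))
(define (sum? x) (and (pair? x) (eq? (car x) '+)))
(define (addend s) (cadr s))
(define (augend s)
  (let ((tail-operand (cddr s)))
    (if (single-operand? tail-operand)
        (car tail-operand)
        (apply make-sum tail-operand))))

(define (make-product m1 . m2)
  (if (single-operand? m2)
      (let ((m2 (car m2)))
        (cond ((or (is-number? m1 0) (is-number? m2 0)) 0)
              ((is-number? m1 1) m2)
              ((is-number? m2 1) m1)
              ((and (number? m1) (number? m2)) (* m1 m2))
              (else (list '* m1 m2))))
      (cons '* (cons m1 m2))))
(define (product? x) (and (pair? x) (eq? (car x) '*)))
(define (multiplier p) (cadr p))
(define (multiplicand p)
  (let ((tail-operand (cddr p)))
    (if (single-operand? tail-operand)
        (car tail-operand)
        (apply make-product tail-operand))))

(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression)
         (if (same-variable? expression var) 1 0))
        ((sum? expression)
         (make-sum (deriv (addend expression) var)
                   (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
          (make-product (multiplier expression)
                        (deriv (multiplicand expression) var))
          (make-product (deriv (multiplier expression) var)
                        (multiplicand expression))))
        (else (error "Unknown expression type: DERIV" expression))))

(deriv '(* x y (+ x 3)) 'x)   ; (+ (* x y) (* y (+ x 3)))
(deriv '(+ x x x) 'x)         ; 3
```

Only the constructors and selectors changed; $\texttt{deriv}$ still sees every sum and product as having exactly two parts.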

## Exercise 2.58: 
Suppose we want to modify the differentiation program so that it works with ordinary mathematical notation, in which $\text{+}$ and $\ast$ are infix rather than prefix operators. Since the differentiation program is defined in terms of abstract data, we can modify it to work with different representations of expressions solely by changing the $\text{predicates}$, $\text{selectors}$, and $\text{constructors}$ that define the representation of the algebraic expressions on which the differentiator is to operate.

$\quad a.\;$Show how to do this in order to differentiate algebraic expressions presented in infix form, such as $\textrm{(x + (3 * (x + (y + 2))))}$. To simplify the task, assume that $\text{+}$ and $\ast$ always take two arguments and that expressions are fully parenthesized.

$\quad b.\;$The problem becomes substantially harder if we allow standard algebraic notation, such as $\textrm{(x + 3 * (x + y + 2))}$, which drops unnecessary parentheses and assumes that multiplication is done before addition. Can you design appropriate predicates, selectors, and constructors for this notation such that our derivative program still works?
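
One possible sketch for part a, under the stated assumptions that every operator takes exactly two arguments and that expressions are fully parenthesized. Only the representation procedures differ from the prefix version: the operator is now the second item of the list, and the operands are the first and third.

```scheme
(define (is-number? expression num)
  (and (number? expression) (= expression num)))
(define (variable? x) (symbol? x))
(define (same-variable? v1 v2)
  (and (variable? v1) (variable? v2) (eq? v1 v2)))

;; Infix representation: (operand operator operand).
(define (make-sum a1 a2)
  (cond ((is-number? a1 0) a2)
        ((is-number? a2 0) a1)
        ((and (number? a1) (number? a2)) (+ a1 a2))
        (else (list a1 '+ a2))))
(define (make-product m1 m2)
  (cond ((is-number? m1 1) m2)
        ((is-number? m2 1) m1)
        ((or (is-number? m1 0) (is-number? m2 0)) 0)
        ((and (number? m1) (number? m2)) (* m1 m2))
        (else (list m1 '* m2))))
(define (sum? x)
  (and (pair? x) (pair? (cdr x)) (eq? (cadr x) '+)))
(define (addend s) (car s))
(define (augend s) (caddr s))
(define (product? x)
  (and (pair? x) (pair? (cdr x)) (eq? (cadr x) '*)))
(define (multiplier p) (car p))
(define (multiplicand p) (caddr p))

;; deriv is the same procedure as in the prefix version.
(define (deriv expression var)
  (cond ((number? expression) 0)
        ((variable? expression)
         (if (same-variable? expression var) 1 0))
        ((sum? expression)
         (make-sum (deriv (addend expression) var)
                   (deriv (augend expression) var)))
        ((product? expression)
         (make-sum
          (make-product (multiplier expression)
                        (deriv (multiplicand expression) var))
          (make-product (deriv (multiplier expression) var)
                        (multiplicand expression))))
        (else (error "Unknown expression type: DERIV" expression))))

(deriv '(x + (3 * (x + (y + 2)))) 'x)   ; 4
(deriv '((x * y) * (x + 3)) 'x)         ; ((x * y) + (y * (x + 3)))
```

This sketch does not attempt part b, where operator precedence has to be recovered from a flat list.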

## 2.3.3 Example: Representing Sets

In the previous examples we built representations for two kinds of compound data objects: $\text{rational numbers}$ and $\text{algebraic expressions}$. In one of these examples we had the choice of simplifying (reducing) the expressions at either construction time or selection time, but other than that the choice of a representation for these structures in terms of lists was straightforward. When we turn to the representation of sets, the choice of a representation is not so obvious. Indeed, there are a number of possible representations, and they differ significantly from one another in several ways.

Informally, a set is simply a collection of distinct objects. To give a more precise definition we can employ the method of data abstraction. That is, we define “set” by specifying the operations that are to be used on sets. These are $\texttt{union-set}$, $\texttt{intersection-set}$, $\texttt{element-of-set?}$, and $\texttt{adjoin-set}$. $\texttt{element-of-set?}$ is a predicate that determines whether a given element is a member of a set. $\texttt{adjoin-set}$ takes an object and a set as arguments and returns a set that contains the elements of the original set and also the adjoined element. $\texttt{union-set}$ computes the union of two sets, which is the set containing each element that appears in either argument. $\texttt{intersection-set}$ computes the intersection of two sets, which is the set containing only elements that appear in both arguments. From the viewpoint of data abstraction, we are free to design any representation that implements these operations in a way consistent with the interpretations given above.

## Example 16: Sets as unordered lists

One way to represent a set is as a list of its elements in which no element appears more than once. The empty set is represented by the empty list. In this representation, $\texttt{element-of-set?}$ is similar to the procedure $\texttt{memq}$ of Section 2.3.1. It uses $\texttt{equal?}$ instead of $\texttt{eq?}$ so that the set elements need not be symbols:

In [1]:
cat 2.3/Example_16/element-of-set.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))


Using this, we can write $\texttt{adjoin-set}$. If the object to be adjoined is already in the set, we just return the set. Otherwise, we use $\texttt{cons}$ to add the object to the list that represents the set:

In [2]:
cat 2.3/Example_16/adjoin-set.scm

(define (adjoin-set x set)
  (if (element-of-set? x set) set (cons x set)))


For $\texttt{intersection-set}$ we can use a recursive strategy. If we know how to form the intersection of $\text{set2}$ and the $\text{cdr}$ of $\text{set1}$, we only need to decide whether to include the $\text{car}$ of $\text{set1}$ in this. But this depends on whether $\textrm{(car set1)}$ is also in $\text{set2}$. Here is the resulting procedure:

In [3]:
cat 2.3/Example_16/intersection-set.scm

(define (intersection-set set1 set2)
  (cond ((or (null? set1) (null? set2)) '())
        ((element-of-set? (car set1) set2)
         (cons (car set1) (intersection-set (cdr set1) set2)))
        (else (intersection-set (cdr set1) set2))))


In designing a representation, one of the issues we should be concerned with is $\text{efficiency}$.

Consider the number of steps required by our set operations. Since they all use $\texttt{element-of-set?}$, the speed of this operation has a major impact on the efficiency of the set implementation as a whole. Now, in order to check whether an object is a member of a set, $\texttt{element-of-set?}$ may have to scan the entire set. (In the worst case, the object turns out not to be in the set.) Hence, if the set has $n$ elements, $\texttt{element-of-set?}$ might take up to $n$ steps. Thus, the number of steps required grows as $\Theta(n)$. The number of steps required by $\texttt{adjoin-set}$, which uses this operation, also grows as $\Theta(n)$. For $\texttt{intersection-set}$, which does an $\texttt{element-of-set?}$ check for each element of $\text{set1}$, the number of steps required grows as the product of the sizes of the sets involved, or $\Theta(n^2)$ for two sets of size $n$. The same will be true of $\texttt{union-set}$.
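
The growth rates can be made concrete by counting comparisons. The sketch below mirrors the control flow of $\texttt{element-of-set?}$ and $\texttt{intersection-set}$ but returns the number of $\texttt{equal?}$ tests instead of a result; the names $\texttt{lookup-steps}$ and $\texttt{intersection-steps}$ are hypothetical helpers introduced only for this illustration.

```scheme
;; Number of equal? comparisons element-of-set? makes when looking for x.
(define (lookup-steps x set)
  (cond ((null? set) 0)
        ((equal? x (car set)) 1)
        (else (+ 1 (lookup-steps x (cdr set))))))

;; Total comparisons made by intersection-set: one lookup in set2
;; for every element of set1.
(define (intersection-steps set1 set2)
  (if (or (null? set1) (null? set2))
      0
      (+ (lookup-steps (car set1) set2)
         (intersection-steps (cdr set1) set2))))

(lookup-steps 'a '(a b c))                   ; 1, found at once
(lookup-steps 'd '(a b c))                   ; 3, the whole list is scanned
(intersection-steps '(1 2 3 4) '(5 6 7 8))   ; 16, i.e. 4 * 4 for disjoint sets
(intersection-steps '(1 2 3 4) '(1 2 3 4))   ; 10, i.e. 1 + 2 + 3 + 4
```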

## Exercise 2.59: 
Implement the $\texttt{union-set}$ operation for the $\text{unordered-list}$ representation of sets.

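## Answer:

The $\texttt{union-set}$ procedure uses $\texttt{append}$ to combine the two given lists into a single input list and then calls the internal procedure $\texttt{iter}$. $\texttt{iter}$ maintains an input list (input) and a result list (result). At each step it takes one element from the input list and uses $\texttt{element-of-set?}$ to check whether that element is already in the result list. If it is not, $\texttt{iter}$ adds it to the result list with $\texttt{cons}$; otherwise it moves on to the next element of the input list. This continues until the input list is empty.

Before the result list is returned, it is passed through $\texttt{reverse}$, because $\texttt{iter}$ builds the result list in reverse order.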

In [4]:
cat 2.3/Exercise_2.59/union-set.scm

(load "2.3/Example_16/element-of-set.scm")
(define (iter input result)
  (if (null? input) (reverse result)
      (let ((current-element (car input))
            (remain-element (cdr input)))
        (if (element-of-set? current-element result)
            (iter remain-element result)
            (iter remain-element (cons current-element result))))))

(define (union-set set1 set2)
  (iter (append set1 set2) '()))


For convenience, we define the four basic set operations in a single file:

In [5]:
cat 2.3/Exercise_2.59/set-rules.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set)
  (if (element-of-set? x set) set (cons x set)))

(define (intersection-set set1 set2)
  (cond ((or (null? set1) (null? set2)) '())
        ((element-of-set? (car set1) set2)
         (cons (car set1) (intersection-set (cdr set1) set2)))
        (else (intersection-set (cdr set1) set2))))


(define (union-set set1 set2)
  (define (iter input result)
    (if (null? input) (reverse result)
        (let ((current-element (car input))
              (remain-element (cdr input)))
          (if (element-of-set? current-element result)
              (iter remain-element result)
              (iter remain-element (cons current-element result))))))
  (iter (append set1 set2) '()))


### Running Instance:
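
A self-contained sketch of $\texttt{union-set}$ in use. The definitions repeat the ones above; the sample sets are assumptions chosen for illustration.

```scheme
(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

(define (union-set set1 set2)
  ;; iter moves elements from input to result, skipping ones already seen.
  (define (iter input result)
    (if (null? input)
        (reverse result)
        (let ((current-element (car input))
              (remain-element (cdr input)))
          (if (element-of-set? current-element result)
              (iter remain-element result)
              (iter remain-element (cons current-element result))))))
  (iter (append set1 set2) '()))

(union-set '(1 2 3) '(3 4 5))   ; (1 2 3 4 5)
(union-set '() '(a b))          ; (a b)
(union-set '(a b) '(b a))       ; (a b)
```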

## Exercise 2.60: 
We specified that a set would be represented as a list with no duplicates. Now suppose we allow duplicates. For instance, the set $\{1, 2, 3\}$ could be represented as the list $\textrm{(2 3 2 1 3 2 2)}$. Design procedures $\texttt{element-of-set?}$, $\texttt{adjoin-set}$, $\texttt{union-set}$, and $\texttt{intersection-set}$ that operate on this representation. How does the efficiency of each compare with the corresponding procedure for the non-duplicate representation? Are there applications for which you would use this representation in preference to the non-duplicate one?

## Answer:

$\bullet\quad$The procedure $\texttt{element-of-set?}$ works for both the duplicate and the non-duplicate representations; its complexity is $\Theta(n)$.

In [6]:
cat 2.3/Exercise_2.60/element-of-set.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))


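$\bullet\quad$The duplicate version of $\texttt{adjoin-set}$ does not need to call $\texttt{element-of-set?}$ to check whether the element being added is already in the set. It simply uses $\texttt{cons}$ to attach the element, so its complexity is $\Theta(1)$.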

In [7]:
cat 2.3/Exercise_2.60/adjoin-set.scm

(define (adjoin-set x set) (cons x set))


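$\bullet\quad$The $\texttt{intersection-set}$ procedure for sets with duplicates is very similar to the one for sets without duplicates, but it needs one additional condition: if an element appears in both input lists, we must also check whether it is already in the result list. If it is, the element is ignored; otherwise it is added to the result list. The complexity is $\Theta(n^2)$.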

In [8]:
cat 2.3/Exercise_2.60/intersection-set.scm

(define (intersection-set set another)
  (define (iter set result)
    (if (or (null? set) (null? another)) (reverse result)
        (let ((current-element (car set))
              (remain-element (cdr set)))
          (if (and (element-of-set? current-element another)
                   (not (element-of-set? current-element result)))
              (iter remain-element (cons current-element result))
              (iter remain-element result)))))
  (iter set '()))


$\bullet\quad$The procedure $\texttt{union-set}$ works for both the duplicate and the non-duplicate representations; its complexity is $\Theta(n^2)$.

In [10]:
cat 2.3/Exercise_2.60/union-set.scm

(define (union-set set1 set2)
  (define (iter input result)
    (if (null? input) (reverse result)
        (let ((current-element (car input))
              (remain-element (cdr input)))
          (if (element-of-set? current-element result)
              (iter remain-element result)
              (iter remain-element (cons current-element result))))))
  (iter (append set1 set2) '()))


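As we can see, in terms of complexity, $\texttt{adjoin-set}$ for sets with duplicates is one order lower than $\texttt{adjoin-set}$ for sets without duplicates; apart from that, the other operations have the same complexity in both versions.

Even so, when $\texttt{element-of-set?}$, $\texttt{union-set}$, and $\texttt{intersection-set}$ are applied to sets with duplicates, the constant factor is higher than for sets without duplicates, and as the number of duplicates keeps growing, these three operations become slower and slower.

Therefore, the representation with duplicates suits applications in which insertions are frequent,  
whereas the representation without duplicates is better for applications that frequently perform lookups, intersections, and unions.

For convenience, we define the four basic set operations in a single file: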

In [16]:
cat 2.3/Exercise_2.60/set-rules.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set) (cons x set))

(define (intersection-set set another)
  (define (iter set result)
    (if (or (null? set) (null? another)) (reverse result)
        (let ((current-element (car set))
              (remain-element (cdr set)))
          (if (and (element-of-set? current-element another)
                   (not (element-of-set? current-element result)))
              (iter remain-element (cons current-element result))
              (iter remain-element result)))))
  (iter set '()))

(define (union-set set1 set2)
  (define (iter input result)
    (if (null? input) (reverse result)
        (let ((current-element (car input))
              (remain-element (cdr input)))
          (if (element-of-set? current-element result)
              (iter remain-element result)
              (iter remain-element (cons current-element result))))))
  (iter (append set1 set2) '()))

### Running Instance:
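
A self-contained sketch of the duplicate-tolerant representation in use. $\texttt{element-of-set?}$, $\texttt{adjoin-set}$, and $\texttt{intersection-set}$ repeat the definitions above. $\texttt{union-set-append}$ is a hypothetical alternative, not the answer's own code: because duplicates are allowed, a union can also be formed by simply appending the lists, at the cost of letting the lists grow.

```scheme
(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((equal? x (car set)) #t)
        (else (element-of-set? x (cdr set)))))

;; No membership check is needed when duplicates are allowed.
(define (adjoin-set x set) (cons x set))

(define (intersection-set set another)
  (define (iter set result)
    (if (or (null? set) (null? another))
        (reverse result)
        (let ((current-element (car set))
              (remain-element (cdr set)))
          (if (and (element-of-set? current-element another)
                   (not (element-of-set? current-element result)))
              (iter remain-element (cons current-element result))
              (iter remain-element result)))))
  (iter set '()))

;; Hypothetical alternative union: keep every element, duplicates included.
(define (union-set-append set1 set2) (append set1 set2))

(adjoin-set 2 '(2 3 2 1))                       ; (2 2 3 2 1)
(element-of-set? 1 '(2 3 2 1 3 2 2))            ; #t
(intersection-set '(2 3 2 1 3 2 2) '(3 3 4))    ; (3)
(union-set-append '(1 2) '(2 3))                ; (1 2 2 3)
```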

## Example 17: Sets as ordered lists

One way to speed up our set operations is to change the representation so that the set elements are listed in increasing order. To do this, we need some way to compare two objects so that we can say which is bigger. For example, we could compare symbols lexicographically, or we could agree on some method for assigning a unique number to an object and then compare the elements by comparing the corresponding numbers. To keep our discussion simple, we will consider only the case where the set elements are numbers, so that we can compare elements using $\text{>}$ and $\text{<}$. We will represent a set of numbers by listing its elements in increasing order. Whereas our first representation above allowed us to represent the set $\{1, 3, 6, 10\}$ by listing the elements in any order, our new representation allows only the list $\textrm{(1 3 6 10)}$.

One advantage of ordering shows up in $\texttt{element-of-set?}$: In checking for the presence of an item, we no longer have to scan the entire set. If we reach a set element that is larger than the item we are looking for, then we know that the item is not in the set:

In [11]:
cat 2.3/Example_17/element-of-set.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((= x (car set)) #t)
        ((< x (car set)) #f)
        (else (element-of-set? x (cdr set)))))


How many steps does this save? In the worst case, the item we are looking for may be the largest one in the set, so the number of steps is the same as for the unordered representation. On the other hand, if we search for items of many different sizes we can expect that sometimes we will be able to stop searching at a point near the beginning of the list and that other times we will still need to examine most of the list. On the average we should expect to have to examine about half of the items in the set. Thus, the average number of steps required will be about $n/2$. This is still $\Theta(n)$ growth, but it does save us, on the average, a factor of 2 in number of steps over the previous implementation.

We obtain a more impressive speedup with $\texttt{intersection-set}$. In the unordered representation this operation required $\Theta(n^2)$ steps, because we performed a complete scan of $\text{set2}$ for each element of $\text{set1}$. But with the ordered representation, we can use a more clever method. Begin by comparing the initial elements, $x1$ and $x2$, of the two sets. If $x1$ equals $x2$, then that gives an element of the intersection, and the rest of the intersection is the intersection of the $\text{cdr}$-s of the two sets. Suppose, however, that $x1$ is less than $x2$. Since $x2$ is the smallest element in $\text{set2}$, we can immediately conclude that $x1$ cannot appear anywhere in $\text{set2}$ and hence is not in the intersection. Hence, the intersection is equal to the intersection of $\text{set2}$ with the $\text{cdr}$ of $\text{set1}$. Similarly, if $x2$ is less than $x1$, then the intersection is given by the intersection of $\text{set1}$ with the $\text{cdr}$ of $\text{set2}$. Here is the procedure:

In [12]:
cat 2.3/Example_17/intersection-set.scm

(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2)) '()
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1 (intersection-set (cdr set1) (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))


To estimate the number of steps required by this process, observe that at each step we reduce the intersection problem to computing intersections of smaller sets—removing the first element from $\text{set1}$ or $\text{set2}$ or both. Thus, the number of steps required is at most the sum of the sizes of $\text{set1}$ and $\text{set2}$, rather than the product of the sizes as with the unordered representation. This is $\Theta(n)$ growth rather than $\Theta(n^2)$—a considerable speedup, even for sets of moderate size.
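As a small illustration of the step count, the sketch below repeats the ordered-list $\texttt{intersection-set}$ so that it runs on its own and applies it to two sample sets. The sample lists are chosen here for illustration only.

```scheme
;; Ordered-list intersection, repeated so the example is self-contained.
(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2)) '()
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1 (intersection-set (cdr set1) (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))

;; Every recursive call drops the head of at least one list, so the number
;; of calls is bounded by (+ (length set1) (length set2)).
(display (intersection-set '(1 3 6 10) '(3 4 6 7 11)))  ; prints (3 6)
(newline)
```

Both inputs must already be in increasing order; the procedure does not check this.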

## Exercise 2.61: 
Give an implementation of $\texttt{adjoin-set}$ using the ordered representation. By analogy with $\texttt{element-of-set?}$ show how to take advantage of the ordering to produce a procedure that requires on the average about half as many steps as with the unordered representation.

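## Answer:

The $\texttt{adjoin-set}$ procedure walks down the list and rebuilds it with $\texttt{cons}$, inserting $x$ at its proper position along the way. It stops as soon as it reaches an element that is equal to or larger than $x$, so on average it examines about half of the list. That is about $n/2$ steps, which is still $\Theta(n)$.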

In [14]:
cat 2.3/Exercise_2.61/adjoin-set.scm

(define (adjoin-set x set)
  (if (null? set) (list x)
      (let ((current-element (car set))
            (remain-element (cdr set)))
        (cond ((= x current-element) set)
              ((> x current-element)
               (cons current-element (adjoin-set x remain-element)))
              ((< x current-element) (cons x set))))))
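The three cases of this procedure (insert in the middle, element already present, insert at the end) can be exercised with a short sketch. The definition is repeated so the block runs on its own, and the sample set is chosen here for illustration.

```scheme
;; Ordered-list adjoin-set, repeated so the example is self-contained.
(define (adjoin-set x set)
  (if (null? set) (list x)
      (let ((current-element (car set))
            (remain-element (cdr set)))
        (cond ((= x current-element) set)
              ((> x current-element)
               (cons current-element (adjoin-set x remain-element)))
              ((< x current-element) (cons x set))))))

(display (adjoin-set 5 '(1 3 6 10)))   ; prints (1 3 5 6 10)
(newline)
(display (adjoin-set 6 '(1 3 6 10)))   ; prints (1 3 6 10), 6 is already present
(newline)
(display (adjoin-set 12 '(1 3 6 10)))  ; prints (1 3 6 10 12), the worst case
(newline)
```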


## Exercise 2.62: 
Give a $\Theta(n)$ implementation of $\texttt{union-set}$ for sets represented as ordered lists.

## Answer:

In [15]:
cat 2.3/Exercise_2.62/union-set.scm

(define (union-set set1 set2)
  (cond ((and (null? set1) (null? set2)) '())
        ((null? set1) set2)
        ((null? set2) set1)
        (else
          (let ((x1 (car set1)) (x2 (car set2)))
            (cond ((= x1 x2) (cons x1 (union-set (cdr set1) (cdr set2))))
                  ((< x1 x2) (cons x1 (union-set (cdr set1) set2)))
                  ((> x1 x2) (cons x2 (union-set set1 (cdr set2)))))))))
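A short usage sketch for the ordered-list union follows. The definition is a slightly condensed restatement (the clause testing both lists for emptiness is covered by the two clauses after it), and the sample sets are chosen here for illustration.

```scheme
;; Ordered-list union: merge the two lists, keeping one copy of equal heads.
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        (else
          (let ((x1 (car set1)) (x2 (car set2)))
            (cond ((= x1 x2) (cons x1 (union-set (cdr set1) (cdr set2))))
                  ((< x1 x2) (cons x1 (union-set (cdr set1) set2)))
                  ((> x1 x2) (cons x2 (union-set set1 (cdr set2)))))))))

(display (union-set '(1 3 6 10) '(3 4 6 7 11)))  ; prints (1 3 4 6 7 10 11)
(newline)
```

As with $\texttt{intersection-set}$, each call removes the head of at least one list, which is why the number of steps is bounded by the sum of the two lengths.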


For ease of testing, we put the four basic set operations from Example_17, Exercise_2.61, and Exercise_2.62 in a single file:

In [17]:
cat 2.3/Exercise_2.62/set-rules.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((= x (car set)) #t)
        ((< x (car set)) #f)
        (else (element-of-set? x (cdr set)))))

(define (adjoin-set x set)
  (if (null? set) (list x)
      (let ((current-element (car set))
            (remain-element (cdr set)))
        (cond ((= x current-element) set)
              ((> x current-element)
               (cons current-element (adjoin-set x remain-element)))
              ((< x current-element) (cons x set))))))

(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2)) '()
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2)
               (cons x1 (intersection-set (cdr set1) (cdr set2))))
              ((< x1 x2)
               (intersection-set (cdr set1) set2))
              ((< x2 x1)
               (intersection-set set1 (cdr set2)))))))

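(define (union-set set1 set2)
  (cond ((and (null? set1) (null? set2)) '())
        ((null? set1) set2)
        ((null? set2) set1)
        (else
          (let ((x1 (car set1)) (x2 (car set2)))
            (cond ((= x1 x2) (cons x1 (union-set (cdr set1) (cdr set2))))
                  ((< x1 x2) (cons x1 (union-set (cdr set1) set2)))
                  ((> x1 x2) (cons x2 (union-set set1 (cdr set2)))))))))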

## Example 18: Sets as binary trees

We can do better than the ordered-list representation by arranging the set elements in the form of a tree. Each node of the tree holds one element of the set, called the “entry” at that node, and a link to each of two other (possibly empty) nodes. The “left” link points to elements smaller than the one at the node, and the “right” link to elements greater than the one at the node.

The figure above shows some trees that represent the set $\{1, 3, 5, 7, 9, 11\}$. The same set may be represented by a tree in a number of different ways. The only thing we require for a valid representation is that all elements in the left subtree be smaller than the node entry and that all elements in the right subtree be larger.

The advantage of the tree representation is this: Suppose we want to check whether a number $x$ is contained in a $\text{set}$. We begin by comparing $x$ with the entry in the top node. If $x$ is less than this, we know that we need only search the left subtree; if $x$ is greater, we need only search the right subtree. Now, if the tree is “balanced,” each of these subtrees will be about half the size of the original. Thus, in one step we have reduced the problem of searching a tree of size $n$ to searching a tree of size $n/2$. Since the size of the tree is halved at each step, we should expect that the number of steps needed to search a tree of size $n$ grows as $\Theta(\log n)$. For large sets, this will be a significant speedup over the previous representations.

We can represent trees by using lists. Each node will be a list of three items: the entry at the node, the left subtree, and the right subtree. A left or a right subtree of the empty list will indicate that there is no subtree connected there. We can describe this representation by the following procedures:

In [18]:
cat 2.3/Example_18/set-repr.scm

(define (make-tree entry left right)
  (list entry left right))

(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))


Now we can write the $\texttt{element-of-set?}$ procedure using the strategy described above:

In [20]:
cat 2.3/Example_18/element-of-set.scm

(define (element-of-set? x set)
  (cond ((null? set) #f)
        ((= x (entry set)) #t)
        ((< x (entry set)) (element-of-set? x (left-branch set)))
        ((> x (entry set)) (element-of-set? x (right-branch set)))))


Adjoining an item to a set is implemented similarly and also requires $\Theta(\log n)$ steps. To adjoin an item $x$, we compare $x$ with the node entry to determine whether $x$ should be added to the right or to the left branch, and having adjoined $x$ to the appropriate branch we piece this newly constructed branch together with the original entry and the other branch. If $x$ is equal to the entry, we just return the node. If we are asked to adjoin $x$ to an empty tree, we generate a tree that has $x$ as the entry and empty right and left branches. Here is the procedure:

In [21]:
cat 2.3/Example_18/adjoin-set.scm

(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree (entry set)
                    (adjoin-set x (left-branch set))
                    (right-branch set)))
        ((> x (entry set))
         (make-tree (entry set)
                    (left-branch set)
                    (adjoin-set x (right-branch set))))))


The above claim that searching the tree can be performed in a logarithmic number of steps rests on the assumption that the tree is “balanced,” i.e., that the left and the right subtree of every tree have approximately the same number of elements, so that each subtree contains about half the elements of its parent. But how can we be certain that the trees we construct will be balanced? Even if we start with a balanced tree, adding elements with $\texttt{adjoin-set}$ may produce an unbalanced result. Since the position of a newly adjoined element depends on how the element compares with the items already in the set, we can expect that if we add elements “randomly” the tree will tend to be balanced on the average. But this is not a guarantee. For example, if we start with an empty set and adjoin the numbers 1 through 7 in sequence we end up with the highly unbalanced tree shown in the figure below:

In this tree all the left subtrees are empty, so it has no advantage over a simple ordered list. One way to solve this problem is to define an operation that transforms an arbitrary tree into a balanced tree with the same elements. Then we can perform this transformation after every few $\texttt{adjoin-set}$ operations to keep our set in balance. There are also other ways to solve this problem, most of which involve designing new data structures for which searching and insertion both can be done in $\Theta(\log n)$ steps.
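The effect of insertion order on tree shape can be observed with a minimal sketch. The tree procedures are repeated so the block runs on its own; $\texttt{adjoin-all}$ and $\texttt{tree-depth}$ are hypothetical helpers introduced only for this illustration.

```scheme
(define (make-tree entry left right) (list entry left right))
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))

(define (adjoin-set x set)
  (cond ((null? set) (make-tree x '() '()))
        ((= x (entry set)) set)
        ((< x (entry set))
         (make-tree (entry set)
                    (adjoin-set x (left-branch set))
                    (right-branch set)))
        ((> x (entry set))
         (make-tree (entry set)
                    (left-branch set)
                    (adjoin-set x (right-branch set))))))

;; Hypothetical helper: adjoin the items one by one, in the order given.
(define (adjoin-all items tree)
  (if (null? items) tree
      (adjoin-all (cdr items) (adjoin-set (car items) tree))))

;; Hypothetical helper: number of nodes on the longest root-to-leaf path.
(define (tree-depth tree)
  (if (null? tree) 0
      (+ 1 (max (tree-depth (left-branch tree))
                (tree-depth (right-branch tree))))))

(display (tree-depth (adjoin-all '(1 2 3 4 5 6 7) '())))  ; prints 7
(newline)
(display (tree-depth (adjoin-all '(4 2 6 1 3 5 7) '())))  ; prints 3
(newline)
```

The same seven elements give a chain of depth 7 when adjoined in increasing order, and a tree of depth 3 when the middle element is adjoined first.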

## Exercise 2.63: 
Each of the following two procedures converts a binary tree to a list.

In [1]:
cat 2.3/Exercise_2.63/tree-to-list-1.scm

(define (tree->list-1 tree)
  (if (null? tree) '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))


In [2]:
cat 2.3/Exercise_2.63/tree-to-list-2.scm

(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree) result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree) result-list)))))
  (copy-to-list tree '()))


a. Do the two procedures produce the same result for every tree? If not, how do the results differ? What lists do the two procedures produce for the trees shown earlier that represent the set $\{1, 3, 5, 7, 9, 11\}$?

b. Do the two procedures have the same order of growth in the number of steps required to convert a balanced tree with $n$ elements to a list? If not, which one grows more slowly?

$\bullet\quad$Test directly in the interpreter:

## Answer:

$\spadesuit\quad$

1. Represent the three trees above (in left-to-right order) as:

In [3]:
cat 2.3/Exercise_2.63/tree-define.scm

(define tree-a 
  (make-tree 7
             (make-tree 3
                        (make-tree 1 '() '())
                        (make-tree 5 '() '()))
             (make-tree 9
                        '()
                        (make-tree 11 '() '()))))

(define tree-b
  (make-tree 3
             (make-tree 1 '() '())
             (make-tree 7
                        (make-tree 5 '() '())
                        (make-tree 9
                                   '()
                                   (make-tree 11 '() '())))))

(define tree-c
  (make-tree 5
             (make-tree 3
                        (make-tree 1 '() '())
                        '())
             (make-tree 9
                        (make-tree 7 '() '())
                        (make-tree 11 '() '()))))


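2. Load the procedures:

3. Apply $\texttt{tree->list-1}$ and $\texttt{tree->list-2}$ to the three trees defined above:

From the results returned, for any one tree $\texttt{tree->list-1}$ and $\texttt{tree->list-2}$ produce the same list.

Moreover, for trees of different shapes that contain the same elements, both procedures also produce the same list, since both perform an in-order traversal.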
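The comparison can be reproduced with a self-contained sketch. It repeats the tree representation and both conversion procedures, and rebuilds the three trees; $\texttt{leaf}$ is a hypothetical shorthand introduced here to keep the tree definitions short.

```scheme
(define (make-tree entry left right) (list entry left right))
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))

(define (tree->list-1 tree)
  (if (null? tree) '()
      (append (tree->list-1 (left-branch tree))
              (cons (entry tree)
                    (tree->list-1 (right-branch tree))))))

(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree) result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree) result-list)))))
  (copy-to-list tree '()))

;; Hypothetical shorthand for a node with two empty branches.
(define (leaf x) (make-tree x '() '()))

(define tree-a
  (make-tree 7 (make-tree 3 (leaf 1) (leaf 5)) (make-tree 9 '() (leaf 11))))
(define tree-b
  (make-tree 3 (leaf 1) (make-tree 7 (leaf 5) (make-tree 9 '() (leaf 11)))))
(define tree-c
  (make-tree 5 (make-tree 3 (leaf 1) '()) (make-tree 9 (leaf 7) (leaf 11))))

;; Print the result of both procedures for each tree.
;; All six printed lines are (1 3 5 7 9 11).
(for-each
  (lambda (tree)
    (display (tree->list-1 tree)) (newline)
    (display (tree->list-2 tree)) (newline))
  (list tree-a tree-b tree-c))
```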

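To compare the efficiency of $\texttt{tree->list-1}$ and $\texttt{tree->list-2}$, the simplest and most direct approach is to expand the evaluation of each expression:

$\bullet\quad$Expanding the expression $\textrm{(tree->list-1 tree-a)}$

In converting this 6-element tree to a list, $\texttt{tree->list-1}$ is applied to 6 non-empty nodes, and each such application performs one $\texttt{append}$ and one $\texttt{cons}$, giving 6 calls to $\texttt{append}$ and 6 calls to $\texttt{cons}$.

Hence, for a tree with $n$ elements, the number of calls to $\texttt{append}$ and $\texttt{cons}$ is proportional to $n$.

Since $\texttt{cons}$ takes $\Theta(1)$ steps while $\texttt{append}$ takes a number of steps proportional to the length of its first argument, the cost of $\texttt{tree->list-1}$ is dominated by the calls to $\texttt{append}$:

Each node requires one call to $\texttt{append}$, whose first argument is the list built from that node's left subtree. In a balanced tree the left subtree holds about half of the elements, so the step count satisfies $T(n) = 2T(n/2) + \Theta(n)$, which gives $\Theta(n \log n)$ for a balanced tree with $n$ nodes. The $\Theta(n^2)$ bound is reached only in the worst case, a degenerate tree in which every node has only a left child.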

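$\bullet\quad$Expanding the expression $\textrm{(tree->list-2 tree-a)}$

In this expansion, $\texttt{copy-to-list}$ is applied to 6 non-empty nodes, and each such application performs one $\texttt{cons}$. So for a tree with $n$ nodes, the number of calls that $\texttt{tree->list-2}$ makes to $\texttt{copy-to-list}$ and to $\texttt{cons}$ is proportional to $n$.

The complexity of $\texttt{tree->list-2}$ can therefore be found by counting the calls to $\texttt{cons}$:

Each node requires one $\texttt{cons}$, and $\texttt{cons}$ takes $\Theta(1)$ steps, so for a tree with $n$ nodes $\texttt{tree->list-2}$ takes $\Theta(n)$ steps. It therefore grows more slowly than $\texttt{tree->list-1}$.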

## Exercise 2.64: 
The following procedure $\texttt{list->tree}$ converts an ordered list to a balanced binary tree. The helper procedure $\texttt{partial-tree}$ takes as arguments an integer $n$ and a list of at least $n$ elements and constructs a balanced tree containing the first $n$ elements of the list. The result returned by $\texttt{partial-tree}$ is a pair (formed with $\texttt{cons}$) whose $\texttt{car}$ is the constructed tree and whose $\texttt{cdr}$ is the list of elements not included in the tree.

In [3]:
cat 2.3/Exercise_2.64/list-to-tree.scm

(load "2.3/Example_18/set-repr.scm")

(define (partial-tree elts n)
  (if (= n 0) (cons '() elts)
      (let ((left-size (quotient (- n 1) 2)))
        (let ((left-result (partial-tree elts left-size)))
          (let ((left-tree (car left-result))
                (non-left-elts (cdr left-result))
                (right-size (- n (+ left-size 1))))
            (let ((this-entry (car non-left-elts))
                  (right-result (partial-tree (cdr non-left-elts) right-size)))
              (let ((right-tree (car right-result))
                    (remaining-elts (cdr right-result)))
                (cons (make-tree this-entry left-tree right-tree)
                      remaining-elts))))))))

(define (list->tree elements)
  (car (partial-tree elements (length elements))))


a. Write a short paragraph explaining as clearly as you can how $\texttt{partial-tree}$ works. Draw the tree produced by $\texttt{list->tree}$ for the list $\textrm{(1 3 5 7 9 11)}$.

b. What is the order of growth in the number of steps required by $\texttt{list->tree}$ to convert a list of $n$ elements?

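## Answer:

Evaluating $\texttt{list->tree}$ on the list $\textrm{(list 1 3 5 7 9 11)}$ gives the following result:

The following diagram shows how $\texttt{list->tree}$ converts the list $\textrm{(list 1 3 5 7 9 11)}$ into a balanced tree:

For each element of the list, $\texttt{list->tree}$ performs one $\texttt{make-tree}$ (which takes $\Theta(1)$ steps) to combine that node with its left and right subtrees, so for a list of length $n$ the complexity of $\texttt{list->tree}$ is $\Theta(n)$.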
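In outline, $\texttt{partial-tree}$ splits the first $n$ elements into a left part of $\lfloor (n-1)/2 \rfloor$ elements, one entry, and a right part holding the rest. It builds the left subtree recursively, takes the next unused element as the entry, builds the right subtree from the elements after it, and returns the assembled tree together with whatever elements remain. The sketch below checks the result for the list $\textrm{(1 3 5 7 9 11)}$. It is a restatement that uses $\texttt{let*}$ in place of the nested $\texttt{let}$ forms, which is a stylistic choice made here and does not change the behavior.

```scheme
(define (make-tree entry left right) (list entry left right))

(define (partial-tree elts n)
  (if (= n 0) (cons '() elts)
      (let* ((left-size (quotient (- n 1) 2))
             (left-result (partial-tree elts left-size))
             (left-tree (car left-result))
             (non-left-elts (cdr left-result))
             (right-size (- n (+ left-size 1)))
             (this-entry (car non-left-elts))
             (right-result (partial-tree (cdr non-left-elts) right-size))
             (right-tree (car right-result))
             (remaining-elts (cdr right-result)))
        (cons (make-tree this-entry left-tree right-tree)
              remaining-elts))))

(define (list->tree elements)
  (car (partial-tree elements (length elements))))

(display (list->tree '(1 3 5 7 9 11)))
;; prints (5 (1 () (3 () ())) (9 (7 () ()) (11 () ())))
(newline)
```

The root is 5, its left subtree has entry 1 with 3 as a right child, and its right subtree has entry 9 with children 7 and 11.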

## Exercise 2.65: 
Use the results of Exercise 2.63 and Exercise 2.64 to give $\Theta(n)$ implementations of $\texttt{union-set}$ and $\texttt{intersection-set}$ for sets implemented as (balanced) binary trees.

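## Answer:

The steps for implementing $\Theta(n)$ versions of $\texttt{intersection-tree}$ and $\texttt{union-tree}$ on the tree representation are as follows:

$\bullet\quad$Use the $\texttt{tree->list-2}$ procedure from Exercise_2.63 to convert the two input trees into lists, which takes $\Theta(n)$ steps

$\bullet\quad$Use the $\texttt{intersection-set}$ procedure from Example_17 to compute the intersection of the two lists, which takes $\Theta(n)$ steps

$\bullet\quad$Use the $\texttt{union-set}$ procedure from Exercise_2.62 to compute the union of the two lists, which takes $\Theta(n)$ steps

$\bullet\quad$Use the $\texttt{list->tree}$ procedure from Exercise_2.64 to convert the resulting intersection or union list into a balanced tree, which takes $\Theta(n)$ steps

Each of $\texttt{intersection-tree}$ and $\texttt{union-tree}$ chains three $\Theta(n)$ steps (tree to list, list operation, list to tree), so the overall complexity is still $\Theta(n)$.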

In [4]:
cat 2.3/Exercise_2.65/intersection-tree.scm

(load "2.3/Exercise_2.63/tree-to-list-2.scm")
(load "2.3/Exercise_2.64/list-to-tree.scm")
(load "2.3/Example_17/intersection-set.scm")

(define (intersection-tree tree another)
  (list->tree
    (intersection-set (tree->list-2 tree)
                      (tree->list-2 another))))


In [5]:
cat 2.3/Exercise_2.65/union-tree.scm

(load "2.3/Exercise_2.63/tree-to-list-2.scm")
(load "2.3/Exercise_2.64/list-to-tree.scm")
(load "2.3/Exercise_2.62/union-set.scm")

(define (union-tree tree another)
  (list->tree
    (union-set (tree->list-2 tree)
               (tree->list-2 another))))
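The two procedures depend on several files through $\texttt{load}$. For a quick check without those files, everything can be gathered into one self-contained sketch. The sample trees $\texttt{t1}$ and $\texttt{t2}$ are hypothetical, and $\texttt{partial-tree}$ and $\texttt{union-set}$ are written in a condensed but equivalent form.

```scheme
(define (make-tree entry left right) (list entry left right))
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))

;; Tree -> ordered list (Exercise 2.63).
(define (tree->list-2 tree)
  (define (copy-to-list tree result-list)
    (if (null? tree) result-list
        (copy-to-list (left-branch tree)
                      (cons (entry tree)
                            (copy-to-list (right-branch tree) result-list)))))
  (copy-to-list tree '()))

;; Ordered list -> balanced tree (Exercise 2.64).
(define (partial-tree elts n)
  (if (= n 0) (cons '() elts)
      (let* ((left-size (quotient (- n 1) 2))
             (left-result (partial-tree elts left-size))
             (non-left-elts (cdr left-result))
             (right-result (partial-tree (cdr non-left-elts)
                                         (- n (+ left-size 1)))))
        (cons (make-tree (car non-left-elts)
                         (car left-result)
                         (car right-result))
              (cdr right-result)))))
(define (list->tree elements)
  (car (partial-tree elements (length elements))))

;; Ordered-list set operations (Example 17 and Exercise 2.62).
(define (intersection-set set1 set2)
  (if (or (null? set1) (null? set2)) '()
      (let ((x1 (car set1)) (x2 (car set2)))
        (cond ((= x1 x2) (cons x1 (intersection-set (cdr set1) (cdr set2))))
              ((< x1 x2) (intersection-set (cdr set1) set2))
              (else (intersection-set set1 (cdr set2)))))))
(define (union-set set1 set2)
  (cond ((null? set1) set2)
        ((null? set2) set1)
        (else
          (let ((x1 (car set1)) (x2 (car set2)))
            (cond ((= x1 x2) (cons x1 (union-set (cdr set1) (cdr set2))))
                  ((< x1 x2) (cons x1 (union-set (cdr set1) set2)))
                  (else (cons x2 (union-set set1 (cdr set2)))))))))

(define (intersection-tree tree another)
  (list->tree (intersection-set (tree->list-2 tree) (tree->list-2 another))))
(define (union-tree tree another)
  (list->tree (union-set (tree->list-2 tree) (tree->list-2 another))))

;; Hypothetical sample sets, built as balanced trees.
(define t1 (list->tree '(1 3 5 7)))
(define t2 (list->tree '(3 4 7 8 9)))
(display (tree->list-2 (intersection-tree t1 t2)))  ; prints (3 7)
(newline)
(display (tree->list-2 (union-tree t1 t2)))         ; prints (1 3 4 5 7 8 9)
(newline)
```

The results are converted back to lists only for display; the values returned by $\texttt{intersection-tree}$ and $\texttt{union-tree}$ are themselves balanced trees.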


## Example 19: Sets and information retrieval

We have examined options for using lists to represent sets and have seen how the choice of representation for a data object can have a large impact on the performance of the programs that use the data. Another reason for concentrating on sets is that the techniques discussed here appear again and again in applications involving $\text{information retrieval}$.

Consider a data base containing a large number of individual records, such as the personnel files for a company or the transactions in an accounting system. A typical data-management system spends a large amount of time accessing or modifying the data in the records and therefore requires an efficient method for accessing records. This is done by identifying a part of each record to serve as an identifying $key$. A key can be anything that uniquely identifies the record. For a personnel file, it might be an employee’s $\text{ID}$ number. For an accounting system, it might be a transaction number. Whatever the key is, when we define the record as a data structure we should include a $\text{key}$ selector procedure that retrieves the key associated with a given record.

Now we represent the data base as a set of records. To locate the record with a given key we use a procedure $\texttt{lookup}$, which takes as arguments a key and a data base and which returns the record that has that key, or false if there is no such record. $\texttt{lookup}$ is implemented in almost the same way as $\texttt{element-of-set?}$. For example, if the set of records is implemented as an unordered list, we could use

In [6]:
cat 2.3/Example_19/lookup.scm

(define (lookup given-key set-of-records)
  (cond ((null? set-of-records) #f)
        ((equal? given-key (key (car set-of-records)))
         (car set-of-records))
        (else (lookup given-key (cdr set-of-records)))))


Of course, there are better ways to represent large sets than as unordered lists. Information-retrieval systems in which records have to be “randomly accessed” are typically implemented by a tree-based method, such as the binary-tree representation discussed previously. In designing such a system the methodology of data abstraction can be a great help. The designer can create an initial implementation using a simple, straightforward representation such as unordered lists. This will be unsuitable for the eventual system, but it can be useful in providing a “quick and dirty” data base with which to test the rest of the system. Later on, the data representation can be modified to be more sophisticated. If the data base is accessed in terms of abstract selectors and constructors, this change in representation will not require any changes to the rest of the system.

## Exercise 2.66: 
Implement the $\texttt{lookup}$ procedure for the case where the set of records is structured as a binary tree, ordered by the numerical values of the keys.

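## Answer:

By the principle of data abstraction, we do not need to know how the binary tree underlying the set of records is implemented. It is enough to know that the following operations are available:

\begin{aligned}
\text{entry} &\qquad \textrm{returns the record stored at the current node}\\
\text{key} &\qquad \textrm{returns the key of a record}\\
\text{left-branch} &\qquad \textrm{returns the left branch of the tree}\\
\text{right-branch} &\qquad \textrm{returns the right branch of the tree}
\end{aligned}

Using these procedures, we can write the $\texttt{lookup}$ procedure for a data base implemented as a binary tree: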

In [7]:
cat 2.3/Exercise_2.66/lookup.scm

(define (lookup given-key tree-of-records)
  (if (null? tree-of-records) #f
      (let ((entry-key (key (entry tree-of-records))))
        (cond ((= given-key entry-key)
               (entry tree-of-records))
              ((> given-key entry-key)
               (lookup given-key (right-branch tree-of-records)))
              ((< given-key entry-key)
               (lookup given-key (left-branch tree-of-records)))))))
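To try this procedure, a concrete record representation is needed. The sketch below assumes, purely for illustration, that a record is a pair of a numeric key and a name; $\texttt{make-record}$, this definition of $\texttt{key}$, and the sample data base $\texttt{db}$ are all hypothetical.

```scheme
(define (make-tree entry left right) (list entry left right))
(define (entry tree) (car tree))
(define (left-branch tree) (cadr tree))
(define (right-branch tree) (caddr tree))

;; Assumption: a record is a pair (key . name).
(define (make-record k name) (cons k name))
(define (key record) (car record))

(define (lookup given-key tree-of-records)
  (if (null? tree-of-records) #f
      (let ((entry-key (key (entry tree-of-records))))
        (cond ((= given-key entry-key)
               (entry tree-of-records))
              ((> given-key entry-key)
               (lookup given-key (right-branch tree-of-records)))
              ((< given-key entry-key)
               (lookup given-key (left-branch tree-of-records)))))))

;; Hypothetical data base of three records, ordered by key.
(define db
  (make-tree (make-record 5 'eve)
             (make-tree (make-record 2 'bob) '() '())
             (make-tree (make-record 9 'zoe) '() '())))

(display (lookup 9 db))  ; the record with key 9
(newline)
(display (lookup 4 db))  ; false, since no record has key 4
(newline)
```

Because $\texttt{lookup}$ touches records only through $\texttt{key}$, a different record layout would require changing only $\texttt{make-record}$ and $\texttt{key}$.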


## 2.3.4 Example: Huffman Encoding Trees

This section provides practice in the use of list structure and data abstraction to manipulate sets and trees. The application is to methods for representing data as sequences of ones and zeros (bits). For example, the $\texttt{ASCII}$ standard code used to represent text in computers encodes each character as a sequence of seven bits. Using seven bits allows us to distinguish $2^7$, or 128, possible different characters. In general, if we want to distinguish $n$ different symbols, we will need to use $\log_2 n$ bits per symbol. If all our messages are made up of the eight symbols A, B, C, D, E, F, G, and H, we can choose a code with three bits per character, for example

With this code, the message

is encoded as the string of 54 bits

Codes such as $\texttt{ASCII}$ and the A-through-H code above are known as $fixed$-$length$ codes, because they represent each symbol in the message with the same number of bits. It is sometimes advantageous to use $variable$-$length$ codes, in which different symbols may be represented by different numbers of bits. For example, Morse code does not use the same number of dots and dashes for each letter of the alphabet. In particular, E, the most frequent letter, is represented by a single dot. In general, if our messages are such that some symbols appear very frequently and some very rarely, we can encode data more efficiently (i.e., using fewer bits per message) if we assign shorter codes to the frequent symbols. Consider the following alternative code for the letters A through H:

With this code, the same message as above is encoded as the string

This string contains 42 bits, so it saves more than 20% in space in comparison with the fixed-length code shown above.
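As a quick check of that figure, using only the two bit counts stated above:

```latex
$$
\frac{54 - 42}{54} = \frac{12}{54} = \frac{2}{9} \approx 22.2\%
$$
```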

One of the difficulties of using a variable-length code is knowing when you have reached the end of a symbol in reading a sequence of zeros and ones. Morse code solves this problem by using a special separator code (in this case, a pause) after the sequence of dots and dashes for each letter. Another solution is to design the code in such a way that no complete code for any symbol is the beginning (or $prefix$) of the code for another symbol. Such a code is called a $prefix$ $code$. In the example above, A is encoded by 0 and B is encoded by 100, so no other symbol can have a code that begins with 0 or with 100.
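The prefix property can be tested mechanically. The sketch below assumes, for illustration, that each code is a list of bits; $\texttt{prefix?}$ and $\texttt{prefix-code?}$ are hypothetical helpers, and of the sample codes only 0 (for A) and 100 (for B) come from the text, while the third is made up.

```scheme
;; #t if the bit list a is a prefix of the bit list b.
(define (prefix? a b)
  (cond ((null? a) #t)
        ((null? b) #f)
        ((= (car a) (car b)) (prefix? (cdr a) (cdr b)))
        (else #f)))

;; #t if no code in the list is a prefix of another code in the list.
(define (prefix-code? codes)
  ;; Does code clash, in either direction, with any of the others?
  (define (clashes? code others)
    (cond ((null? others) #f)
          ((or (prefix? code (car others))
               (prefix? (car others) code)) #t)
          (else (clashes? code (cdr others)))))
  (cond ((null? codes) #t)
        ((clashes? (car codes) (cdr codes)) #f)
        (else (prefix-code? (cdr codes)))))

;; 0 and 100 are the codes for A and B; 101 is a made-up third code.
(display (prefix-code? '((0) (1 0 0) (1 0 1))))  ; prints a true value
(newline)
;; Here 10 is a prefix of 100, so this is not a prefix code.
(display (prefix-code? '((0) (1 0 0) (1 0))))    ; prints a false value
(newline)
```

This pairwise check takes a number of comparisons that grows with the square of the number of codes, which is acceptable for a small alphabet.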

In general, we can attain significant savings if we use variable-length prefix codes that take advantage of the relative frequencies of the symbols in the messages to be encoded. One particular scheme for doing this is called the Huffman encoding method, after its discoverer, David Huffman. A Huffman code can be represented as a binary tree whose leaves are the symbols that are encoded. At each non-leaf node of the tree there is a set containing all the symbols in the leaves that lie below the node. In addition, each symbol at a leaf is assigned a weight (which is its relative frequency), and each non-leaf node contains a weight that is the sum of all the weights of the leaves lying below it. The weights are not used in the encoding or the decoding process. We will see below how they are used to help construct the tree.

The figure above shows the Huffman tree for the A-through-H code given above. The weights at the leaves indicate that the tree was designed for messages in which A appears with relative frequency 8, B with relative frequency 3, and the other letters each with relative frequency 1.