§1 GB\_GATES INTRODUCTION 1

Important: Before reading GB\_GATES, please read or at least skim the program for GB\_GRAPH.

## 1. Introduction. This GraphBase module provides six external subroutines:

```
risc, a routine that creates a directed acyclic graph based on the logic of a simple RISC computer;
```

prod, a routine that creates a directed acyclic graph based on the logic of parallel multiplication circuits;

print\_gates, a routine that outputs a symbolic representation of such directed acyclic
graphs;

gate\_eval, a routine that evaluates such directed acyclic graphs by assigning boolean values to each gate;

partial\_gates, a routine that extracts a subgraph by assigning random values to some of the input gates;

run\_risc, a routine that can be used to play with the output of risc.

Examples of the use of these routines can be found in the demo programs TAKE\_RISC and MULTIPLY.

```
\langle gb\_gates.h 1 \rangle \equiv
                               /* abbreviation for Procrustean linkers */
#define print_gates p_gates
  extern Graph *risc();
                             /* make a network for a microprocessor */
                              /* make a network for high-speed multiplication */
  extern Graph *prod();
                                /* write a network to standard output file */
  extern void print_gates();
                               /* evaluate a network */
  extern long gate_eval();
  extern Graph *partial_gates(); /* reduce network size */
                              /* simulate the microprocessor */
  extern long run_risc();
  extern unsigned long risc_state[]; /* the output of run_risc */
See also sections 2 and 50.
```

2 INTRODUCTION GB\_GATES §2

2. The directed acyclic graphs produced by GB\_GATES are GraphBase graphs with special conventions related to logical networks. Each vertex represents a gate of a network, and utility field *val* is a boolean value associated with that gate. Utility field *typ* is an ASCII code that tells what kind of gate is present:

- 'I' denotes an input gate, whose value is specified externally.
- '&' denotes an AND gate, whose value is the logical AND of two or more previous gates (namely, 1 if all those gates are 1, otherwise 0).
- '|' denotes an OR gate, whose value is the logical OR of two or more previous gates (namely, 0 if all those gates are 0, otherwise 1).
- '^' denotes an XOR gate, whose value is the logical EXCLUSIVE-OR of two or more previous gates (namely, their sum modulo 2).
- '~' denotes an inverter, whose value is the logical complement of the value of a single previous gate.
- 'L' denotes a latch, whose value depends on past history; it is the value that was assigned to a subsequent gate when the network was most recently evaluated. Utility field *alt* points to that subsequent gate.

Latches can be used to include "state" information in a circuit; for example, they correspond to registers of the RISC machine constructed by risc. The prod procedure does not use latches.

The vertices of the directed acyclic graph appear in a special "topological" order convenient for evaluation: All the input gates come first, followed by all the latches; then come the other types of gates, whose values are computed from their predecessors. The arcs of the graph run from each gate to its arguments, and all arguments to a gate precede that gate.

If g points to such a graph of gates, the utility field g-outs points to a list of  $\mathbf{Arc}$  records, denoting "outputs" that might be used in certain applications. For example, the outputs of the graphs created by prod correspond to the bits of the product of the numbers represented in the input gates.

A special convention is used so that the routines will support partial evaluation: The *tip* fields in the output list either point to a vertex or hold one of the constant values 0 or 1 when regarded as an unsigned long integer.

```
#define val x.I
                      /* the field containing a boolean value */
                      /* the field containing the gate type */
#define typ y.I
#define alt z.V
                      /* the field pointing to another related gate */
                         /* the field pointing to the list of output gates */
#define outs zz.A
#define is\_boolean(v) ((unsigned long)(v) \le 1)
                                                      /* is a tip field constant? */
#define the\_boolean(v) ((long)(v))
                                        /* if so, this is its value */
\#define tip\_value(v) (is\_boolean(v) ? the\_boolean(v):(v) \neg val)
#define AND '&'
#define OR ','
#define NOT ', ~,
#define XOR ', ',
\langle gb\_gates.h 1 \rangle + \equiv
#define val x.I
                      /* the definitions are repeated in the header file */
#define typ y.I
#define alt z.V
\#define outs zz.A
\#define is\_boolean(v)
                          ((\mathbf{unsigned\ long})(v) \le 1)
\#define the\_boolean(v)
                           ((\mathbf{long})(v))
                         (is\_boolean(v) ? the\_boolean(v) : (v) \rightarrow val)
#define tip_value(v)
#define AND
                '&'
#define OR
                , | ,
                 ,~,
#define NOT
#define XOR
```

§3 GB\_GATES INTRODUCTION 3

3. Let's begin with the *gate\_eval* procedure, because it is quite simple and because it illustrates the conventions just explained. Given a gate graph g and optional pointers *in\_vec* and *out\_vec*, the procedure *gate\_eval* will assign values to each gate of g. If *in\_vec* is non-null, it should point to a string of characters, each '0' or '1', that will be assigned to the first gates of the network, in order; otherwise *gate\_eval* assumes that all input gates have already received appropriate values and it will not change them. New values are computed for each gate after the bits of *in\_vec* have been consumed.

If  $out\_vec$  is non-null, it should point to a memory area capable of receiving m+1 characters, where m is the number of outputs of g; a string containing the respective output values will be deposited there.

If  $gate\_eval$  encounters an unknown gate type, it terminates execution prematurely and returns the value -1. Otherwise it returns 0.

```
\langle \text{The } gate\_eval \text{ routine } 3 \rangle \equiv
  long gate\_eval(g, in\_vec, out\_vec)
                         /* graph with gates as vertices */
        Graph *q;
                          /* string for input values, or \Lambda */
        char *in\_vec;
                             /* string for output values, or \Lambda */
        \mathbf{char} * out\_vec;
   { register Vertex *v; /* the current vertex of interest */
                              /* the current arc of interest */
     register Arc *a;
                             /* boolean value being computed */
     register char t;
     if (\neg g) return -2; /* no graph supplied! */
     v = g \neg vertices;
     if (in\_vec) (Read a sequence of input values from in\_vec 4);
     for ( ; v < g \neg vertices + g \neg n; v \leftrightarrow)  {
        switch (v \rightarrow typ) { /* branch on type of gate */
        case 'I': continue;
                                   /* this input gate's value should be externally set */
        case 'L': t = v \rightarrow alt \rightarrow val; break;
     \langle Compute the value t of a classical logic gate 6\rangle;
        default: return -1;
                                   /* unknown gate type! */
        }
                      /* assign the computed value */
        v \rightarrow val = t;
     if (out_vec) \langle Store the sequence of output values in out_vec 5 \rangle;
     return 0:
This code is used in section 7.
4. \langle \text{Read a sequence of input values from } in\_vec 4 \rangle \equiv
  while (*in\_vec \land v < g \neg vertices + g \neg n) (v++) \neg val = *in\_vec ++ - '0';
This code is used in section 3.
     \langle Store the sequence of output values in out\_vec 5 \rangle \equiv
     for (a = g \neg outs; a; a = a \neg next) * out\_vec ++ = `o` + tip\_value(a \neg tip);
     *out\_vec = 0;
                       /* terminate the string */
```

This code is used in section 3.

4 INTRODUCTION GB\_GATES §6

```
6. \langle Compute the value t of a classical logic gate 6 \rangle \equiv
case AND: t = 1;
   for (a = v \rightarrow arcs; a; a = a \rightarrow next) t \&= a \rightarrow tip \rightarrow val;
   break;
case OR: t = 0;
   for (a = v \rightarrow arcs; a; a = a \rightarrow next) t = a \rightarrow tip \rightarrow val;
   break;
case XOR: t = 0;
   for (a = v \rightarrow arcs; a; a = a \rightarrow next) t \oplus = a \rightarrow tip \rightarrow val;
   break;
case NOT: t = 1 - v \rightarrow arcs \rightarrow tip \rightarrow val;
   break;
This code is used in section 3.
7. Here now is an outline of the entire GB_GATES module, as seen by the C compiler:
#include "gb_flip.h"
                                       /* we will use the GB_FLIP routines for random numbers */
#include "gb_graph.h"
                                        /* and we will use the GB_GRAPH data structures */
   \langle Preprocessor definitions \rangle
   (Private variables 12)
   (Global variables 48)
   \langle Internal subroutines 11 \rangle
    \langle \text{ The } gate\_eval \text{ routine } 3 \rangle
    The print\_gates routine 49
    The risc routine 8
    \langle \text{ The } run\_risc \text{ routine } 43 \rangle
   \langle \text{ The } prod \text{ routine } 66 \rangle
   ⟨ The partial_gates routine 84⟩
```

 $\S 8$  GB\_GATES THE RISC NETLIST 5

8. The RISC netlist. The subroutine call risc(regs) creates a gate graph having regs registers; the value of regs must be between 2 and 16, inclusive, otherwise regs is set to 16. This gate graph describes the circuitry for a small RISC computer, defined below. The total number of gates turns out to be 1400 + 115 \* regs; thus it lies between 1630 (when regs = 2) and 3240 (when regs = 16). EXCLUSIVE-OR gates are not used; the effect of xoring is obtained where needed by means of ANDs, ORs, and inverters.

If risc cannot do its thing, it returns  $\Lambda$  (NULL) and sets  $panic\_code$  to indicate the problem. Otherwise risc returns a pointer to the graph.

```
#define panic(c) { panic\_code = c; gb\_trouble\_code = 0; return \Lambda; }
\langle The risc routine 8 \rangle \equiv
  Graph *risc(regs)
        unsigned long regs;
                                      /* number of registers supported */
  { \langle Local \ variables \ for \ risc \ 9 \rangle
     \langle \text{Initialize } new\_graph \text{ to an empty graph of the appropriate size } 16 \rangle;
     \langle Add \text{ the RISC data to } new\_graph 17 \rangle;
     if (gb\_trouble\_code) {
        gb\_recycle(new\_graph);
                                 /* oops, we ran out of memory somewhere back there */
        panic(alloc\_fault);
     return new_graph;
This code is used in section 7.
9. \langle \text{Local variables for } risc \ 9 \rangle \equiv
  Graph *new_graph; /* the graph constructed by risc */
                              /* all-purpose indices */
  register long k, r;
See also sections 18, 20, 25, 28, 33, 37, and 40.
This code is used in section 8.
```

6 THE RISC NETLIST GB\_GATES §10

10. This RISC machine works with 16-bit registers and 16-bit data words. It cannot write into memory, but it assumes the existence of an external read-only memory. The circuit has 16 outputs, representing the 16 bits of a memory address register. It also has 17 inputs, the last 16 of which are supposed to be set to the contents of the memory address computed on the previous cycle. Thus we can run the machine by accessing memory between calls of gate\_eval. The first input bit, called RUN, is normally set to 1; if it is 0, the other inputs are effectively ignored and all registers and outputs will be cleared to 0. Input bits for the memory appear in "little-endian order," that is, least significant bit first; but the output bits for the memory address register appear in "big-endian order," most significant bit first.

Words read from memory are interpreted as instructions having the following format:

| DST |    |    | MOD |    |    |   | 0P |   | A |   | SRC |   |   |   |   |
|-----|----|----|-----|----|----|---|----|---|---|---|-----|---|---|---|---|
| 15  | 14 | 13 | 12  | 11 | 10 | 9 | 8  | 7 | 6 | 5 | 4   | 3 | 2 | 1 | 0 |

The SRC and A fields specify a "source" value. If A = 0, the source is SRC, treated as a 4-bit signed number between -8 and +7 inclusive. If A = 1, the source is the contents of register DST plus the (signed) value of SRC. If A = 2, the source is the contents of register SRC. And if A = 3, the source is the contents of the memory location whose address is the contents of register SRC. Thus, for example, if DST = 3 and SRC = 10, and if r3 contains 17 while r10 contains 1009, the source value will be -6 if A = 0, or A = 10 if A = 10, or A = 10 if A = 10, or the contents of memory location 1009 if A = 3.

The DST field specifies the number of the destination register. This register receives a new value based on its previous value and the source value, as prescribed by the operation defined in the OP and MOD fields. For example, when OP = 0, a general logical operation is performed, as follows: Suppose the bits of MOD are called  $\mu_{11}\mu_{10}\mu_{01}\mu_{00}$  from left to right. Then if the kth bit of the destination register currently is equal to i and the kth bit of the source value is equal to j, the general logical operator changes the kth bit of the destination register to  $\mu_{ij}$ . If the MOD bits are, for example, 1010, the source value is simply copied to the destination register; if MOD = 0110, an exclusive-or is done; if MOD = 0011, the destination register is complemented and the source value is effectively ignored.

The machine contains four status bits called S (sign), N (nonzero), K (carry), and V (overflow). Every general logical operation sets S equal to the sign of the new result transferred to the destination register; this is bit 15, the most significant bit. A general logical operation also sets N to 1 if any of the other 15 bits are 1, to 0 if all of the other bits are 0. Thus S and N both become zero if and only if the new result is entirely zero. Logical operations do not change the values of K and V; the latter are affected only by the arithmetic operations described below.

The status of the S and N bits can be tested by using the conditional load operator, OP = 2: This operation loads the source value into the destination register if and only if MOD bit  $\mu_{ij} = 1$ , where i and j are the current values of S and N, respectively. For example, if MOD = 0011, the source value is loaded if and only if S = 0, which means that the last value affecting S and N was greater than or equal to zero. If MOD = 1111, loading is always done; this option provides a way to move source to destination without affecting S or N.

A second conditional load operator, OP = 3, is similar, but it is used for testing the status of K and V instead of S and N. For example, a command having MOD = 1010, OP = 3, A = 1, and SRC = 1 adds the current overflow bit to the destination register. (Please take a moment to understand why this is true.)

We have now described all the operations except those that are performed when  $\mathtt{OP}=1$ . As you might expect, our machine is able to do rudimentary arithmetic. The general addition and subtraction operators belong to this final case, together with various shift operators, depending on the value of  $\mathtt{MOD}$ .

Eight of the  $\mathtt{OP}=1$  operations set the destination register to a shifted version of the source value:  $\mathtt{MOD}=0$  means "shift left 1," which is equivalent to multiplying the source by 2;  $\mathtt{MOD}=1$  means "cyclic shift left 1," which is the same except that it also adds the previous sign bit to the result;  $\mathtt{MOD}=2$  means "shift left 4," which is equivalent to multiplying by 16;  $\mathtt{MOD}=3$  means "cyclic shift left 4";  $\mathtt{MOD}=4$  means "shift right 1," which is equivalent to dividing the source by 2 and rounding down to the next lower integer if there was a remainder;  $\mathtt{MOD}=5$  means "unsigned shift right 1," which is the same except that the most significant bit is always set to zero instead of retaining the previous sign;  $\mathtt{MOD}=6$  means "shift right 4," which is equivalent to dividing the source by 16 and rounding down;  $\mathtt{MOD}=7$  means "unsigned shift right 4." Each of these shift

 $\S10$  GB\_GATES THE RISC NETLIST 7

operations affects S and N, as in the case of logical operations. They also affect K and V, as follows: Shifting left sets K to 1 if and only if at least one of the bits shifted off the left was nonzero, and sets V to 1 if and only if the corresponding multiplication would cause overflow. Shifting right 1 sets K to the value of the bit shifted out, and sets V to 0; shifting right 4 sets K to the value of the last bit shifted out, and sets V to the logical OR of the other three lost bits. The same values of K and V arise from cyclic or unsigned shifts as from ordinary shifts.

When OP = 1 and MOD = 8, the source value is added to the destination register. This sets S, N, and V as you would expect; and it sets K to the carry you would get if you were treating the operands as 16-bit unsigned integers. Another addition operation, having MOD = 9, is similar, but the current value of K is also added to the result; in this case, the new value of N will be zero if and only if the 15 non-sign bits of the result are zero and the previous values of S and N were also zero. This means that you can use the first addition operation on the lower halves of a 32-bit number and the second operation on the upper halves, thereby obtaining a correct 32-bit result, with appropriate sign, nonzero, carry, and overflow bits set. Higher precision (48 bits, 64 bits, etc.) can be obtained in a similar way.

When OP = 1 and MOD = 10, the source value is subtracted from the destination register. Again, S, N, K, and V are set; the K value in this case represents the "borrow" bit. An auxiliary subtraction operation, having MOD = 11, subtracts also the current value of K, thereby allowing for correct 32-bit subtraction.

The operations for  $\mathtt{OP}=1$  and  $\mathtt{MOD}=12$ , 13, and 14 are "reserved for future expansion." Actually they will never change, however, since this RISC chip is purely academic. If you check out the logic below, you will find that they simply set the destination register and the four status bits all to zero.

A final operation, called JUMP, will be explained momentarily. It has OP = 1 and MOD = 15. It does not affect S, N, K, or V.

If the RISC is made with fewer than 16 registers, the higher-numbered ones will effectively contain zero whenever their values are fetched. But if you use them as destination registers, you will set S, N, K, and V as if actual numbers were being stored.

Register 0 is different from the other 15 registers: It is the location of the current instruction. Therefore if you change the contents of register 0, you are changing the control flow of the program. If you do not change register 0, it automatically increases by 1.

Special treatment occurs when  $\mathtt{A}=3$  and  $\mathtt{SRC}=0$ . In such a case, the normal rules given above say that the source value should be the contents of the memory location specified by register 0. But that memory location holds the current instruction; so the machine uses the *following* location instead, as a 16-bit source operand. If the contents of register 0 are not changed by such a two-word instruction, register 0 will increase by 2 instead of 1.

We have now discussed everything about the machine except the operation of the JUMP command. This command moves the source value to register 0, thereby changing the flow of control. Furthermore, if DST  $\neq$  0, it also sets register DST to the location of the instruction following the JUMP. Assembly language programmers will recognize this as a convenient way to jump to a subroutine.

Example programs can be found in the TAKE\_RISC module, which includes a simple subroutine for multiplication and division.

8 THE RISC NETLIST GB\_GATES §11

11. A few auxiliary functions will ameliorate the task of constructing the RISC logic. First comes a routine that "christens" a new gate, assigning it a name and a type. The name is constructed from a prefix and a serial number, where the prefix indicates the current portion of logic being created.

```
\langle \text{Internal subroutines } 11 \rangle \equiv
  static Vertex *new\_vert(t)
                  /* the type of the new gate */
  \{ \text{ register Vertex } *v; 
     v = next\_vert ++;
     if (count < 0) \ v \rightarrow name = gb\_save\_string(prefix);
       sprintf(name_buf, "%s%ld", prefix, count);
       v \rightarrow name = gb\_save\_string(name\_buf);
       count ++;
     v \rightarrow typ = t;
     return v;
See also sections 13, 14, 15, 38, and 51.
This code is used in section 7.
12. #define start\_prefix(s) strcpy(prefix, s); count = 0
#define numeric\_prefix(a, b) sprintf(prefix, "%c%ld: ", a, b); count = 0;
\langle \text{Private variables } 12 \rangle \equiv
  static Vertex *next_vert;
                                     /* the first vertex not yet assigned a name */
                             /* prefix string for vertex names */
  static char prefix [8];
                             /* serial number for vertex names */
  static long count;
  static char name_buf[100]; /* place to form vertex names */
This code is used in section 7.
```

 $\S13$  GB\_GATES THE RISC NETLIST

13. Here are some trivial routines to create gates with 2, 3, or more arguments. The arcs from such a gate to its inputs are assigned length 100. Other routines, defined below, assign length 1 to the arc between an inverter and its unique input. This convention makes the lengths of shortest paths in the resulting network a bit more interesting than they would otherwise be.

```
#define DELAY 100_{L}
\langle \text{Internal subroutines } 11 \rangle + \equiv
  static Vertex *make2(t, v1, v2)
       char t;
                /* the type of the new gate */
       Vertex *v1, *v2;
  { register Vertex *v = new\_vert(t);
    gb\_new\_arc(v, v1, DELAY);
    qb\_new\_arc(v, v2, DELAY);
    return v;
  static Vertex *make3(t, v1, v2, v3)
       char t:
                /* the type of the new gate */
       Vertex *v1, *v2, *v3;
  { register Vertex *v = new\_vert(t);
    gb\_new\_arc(v, v1, DELAY);
    gb\_new\_arc(v, v2, DELAY);
    gb\_new\_arc(v, v3, DELAY);
    return v;
  static Vertex *make4(t, v1, v2, v3, v4)
                 /* the type of the new gate */
       char t;
       Vertex *v1, *v2, *v3, *v4;
  { register Vertex *v = new\_vert(t);
    qb\_new\_arc(v, v1, DELAY);
    gb\_new\_arc(v, v2, DELAY);
    qb\_new\_arc(v, v3, DELAY);
    gb\_new\_arc(v, v \not 4, DELAY);
    return v;
  static Vertex *make5(t, v1, v2, v3, v4, v5)
                  /* the type of the new gate */
       Vertex *v1, *v2, *v3, *v4, *v5;
  { register Vertex *v = new\_vert(t);
    gb\_new\_arc(v, v1, DELAY);
    gb\_new\_arc(v, v2, DELAY);
    gb\_new\_arc(v, v3, DELAY);
    gb\_new\_arc(v, v \not 4, DELAY);
    gb\_new\_arc(v, v5, DELAY);
    return v;
  }
```

10 THE RISC NETLIST GB\_GATES  $\S14$ 

14. We use utility field w.V to store a pointer to the complement of a gate, if that complement has been formed. This trick prevents the creation of excessive gates that are equivalent to each other. The following subroutine returns a pointer to the complement of a given gate.

```
subroutine returns a pointer to the complement of a given gate.
#define bar w.V
                         /* field pointing to complement, if known to exist */
#define even\_comp(s, v) ((s) & 1 ? v : comp(v))
\langle Internal subroutines 11 \rangle + \equiv
  static Vertex *comp(v)
       Vertex *v;
  { register Vertex *u;
     if (v \rightarrow bar) return v \rightarrow bar;
     u = next\_vert ++;
     u \rightarrow bar = v; v \rightarrow bar = u;
     sprintf(name\_buf, "%s~", v \rightarrow name);
     u \rightarrow name = gb\_save\_string(name\_buf);
     u \rightarrow typ = NOT;
     gb\_new\_arc(u, v, 1_L);
     return u;
  }
      To create a gate for the EXCLUSIVE-OR of two arguments, we can either construct the OR of two ANDs
or the AND of two ORs. We choose the former alternative:
\langle \text{Internal subroutines } 11 \rangle + \equiv
  static Vertex *make\_xor(u, v)
       Vertex *u, *v;
  { register Vertex *t1, *t2;
     t1 = make2(AND, u, comp(v));
     t2 = make2 (AND, comp(u), v);
     return make2(OR, t1, t2);
  }
16. OK, let's get going.
(Initialize new_graph to an empty graph of the appropriate size 16) \equiv
  if (regs < 2 \lor regs > 16) regs = 16;
  new\_graph = gb\_new\_graph(1400 + 115 * regs);
```

17. ⟨Add the RISC data to new\_graph 17⟩ ≡
⟨Create the inputs and latches 19⟩;
⟨Create gates for instruction decoding 21⟩;
⟨Create gates for fetching the source value 22⟩;
⟨Create gates for the general logic operation 26⟩;
⟨Create gates for the conditional load operations 27⟩;
⟨Create gates for the arithmetic operations 41⟩;
⟨Create gates that bring everything together properly 29⟩;
if (next\_vert ≠ new\_graph¬vertices + new\_graph¬n) panic(impossible);
/\* oops, we miscounted; this should be impossible \*/
This code is used in section 8.

 $\S18$  GB\_GATES THE RISC NETLIST 11

18. Internal names will make it convenient to refer to the most important gates. Here are the names of inputs and latches.

```
\langle \text{Local variables for } risc \ 9 \rangle + \equiv
  Vertex *run\_bit;
                        /* the RUN input */
  Vertex *mem[16];
                        /* 16 bits of input from read-only memory */
                      /* first of 10 bits in the program register */
  Vertex *prog;
  \mathbf{Vertex} * sign;
                    /* the latched value of S */
  Vertex *nonzero; /* the latched value of N */
  Vertex * carry;
                      /* the latched value of K */
                         /* the latched value of V */
  Vertex *overflow;
  \mathbf{Vertex} * extra;
                      /* latched status bit: are we doing an extra memory cycle? */
                       /* the least-significant bit of a given register */
  Vertex *reg[16];
     #define first\_of(n,t) new\_vert(t); for (k = 1; k < n; k++) new\_vert(t);
\langle Create the inputs and latches 19\rangle \equiv
  strcpy(prefix, "RUN"); count = -1; run\_bit = new\_vert('I');
  start\_prefix("M"); for (k = 0; k < 16; k++) mem[k] = new\_vert('I');
  start\_prefix("P"); prog = first\_of(10, 'L');
  strcpy(prefix, "S"); count = -1; sign = new\_vert('L');
  strcpy(prefix, "N"); nonzero = new\_vert('L');
  strcpy(prefix, "K"); carry = new_vert('L');
  strcpy(prefix, "V"); overflow = new_vert('L');
  strcpy(prefix, "X"); extra = new\_vert('L');
  for (r = 0; r < regs; r++) {
    numeric\_prefix('R',r);
    reg[r] = first\_of(16, 'L');
  }
This code is used in section 17.
```

12 THE RISC NETLIST GB\_GATES §20

**20.** The order of evaluation of function arguments is not defined in C, so we introduce a few macros that force left-to-right order.

```
#define do2(result, t, v1, v2)
         \{ t1 = v1; t2 = v2; 
           result = make2(t, t1, t2); 
#define do3(result, t, v1, v2, v3)
         \{ t1 = v1; t2 = v2; t3 = v3; 
           result = make3(t, t1, t2, t3); 
#define do4 (result, t, v1, v2, v3, v4)
         \{ t1 = v1; t2 = v2; t3 = v3; t4 = v4; 
           result = make \{(t, t1, t2, t3, t4); \}
#define do5(result, t, v1, v2, v3, v4, v5)
         \{t1 = v1; t2 = v2; t3 = v3; t4 = v4; t5 = v5;
           result = make5(t, t1, t2, t3, t4, t5); 
\langle \text{Local variables for } risc \ 9 \rangle + \equiv
  Vertex *t1, *t2, *t3, *t4, *t5;
                                     /* temporary holds to force evaluation order */
  Vertex *tmp[16];
                        /* additional holding places for partial results */
                     /* is the source value immediate (a given constant)? */
  Vertex *imm;
  Vertex *rel;
                   /* is the source value relative to the current destination register? */
  Vertex *dir;
                   /* should the source value be fetched directly from a source register? */
                    /* should the source value be fetched indirectly from memory? */
  Vertex *ind;
  Vertex *op;
                   /* least significant bit of OP */
                     /* most significant bit of OP */
  Vertex *cond;
                      /* the MOD bits */
  Vertex *mod[4];
  Vertex * dest[4];
                       /* the DEST bits */
```

21. The sixth line of the program here can be translated into the logic equation

```
op = (extra \land prog) \lor (\overline{extra} \land mem[6]).
```

Once you see why, you'll be able to read the rest of this curious code.

```
 \begin{array}{l} \langle \operatorname{Create\ gates\ for\ instruction\ decoding\ 21} \rangle \equiv \\ start\_prefix("D"); \\ do3(imm, \operatorname{AND}, comp(extra), comp(mem[4]), comp(mem[5])); \\ /* \ A = 0 \ */\\ do3(rel, \operatorname{AND}, comp(extra), mem[4], comp(mem[5])); \\ /* \ A = 1 \ */\\ do3(dir, \operatorname{AND}, comp(extra), comp(mem[4]), mem[5]); \\ /* \ A = 2 \ */\\ do3(ind, \operatorname{AND}, comp(extra), mem[4], mem[5]); \\ /* \ A = 3 \ */\\ do2(op, \operatorname{OR}, make2(\operatorname{AND}, extra, prog), make2(\operatorname{AND}, comp(extra), mem[6])); \\ do2(cond, \operatorname{OR}, make2(\operatorname{AND}, extra, prog + 1), make2(\operatorname{AND}, comp(extra), mem[7])); \\ \mathbf{for}\ (k = 0; \ k < 4; \ k++) \ \{ \\ do2(mod[k], \operatorname{OR}, make2(\operatorname{AND}, extra, prog + 2 + k), make2(\operatorname{AND}, comp(extra), mem[8 + k])); \\ do2(dest[k], \operatorname{OR}, make2(\operatorname{AND}, extra, prog + 6 + k), make2(\operatorname{AND}, comp(extra), mem[12 + k])); \\ \} \end{array}
```

This code is used in section 17.

 $\S22$  GB\_GATES THE RISC NETLIST 13

```
\langle Create gates for fetching the source value 22\rangle \equiv
    start\_prefix("F");
     (Set old_dest to the present value of the destination register 23);
     (Set old_src to the present value of the source register 24);
     \langle \text{ Set } inc\_dest \text{ to } old\_dest \text{ plus SRC } 39 \rangle;
    for (k = 0; k < 16; k++)
         do4 (source[k], OR, make2 (AND, imm, mem[k < 4? k:3]), make2 (AND, rel, inc_dest[k]),
                   make2 (AND, dir, old\_src[k]), make2 (AND, extra, mem[k]));
This code is used in section 17.
23. Here and in the immediately following section we create OR gates old\_dest[k] and old\_src[k] that might
have as many as 16 inputs. (The actual number of inputs is regs.) All other gates in the network will have
at most five inputs.
\langle \text{Set } old\_dest \text{ to the present value of the destination register 23} \rangle \equiv
    for (r = 0; r < regs; r++)
         do4 (dest\_match[r], AND, even\_comp(r, dest[0]), even\_comp(r \gg 1, dest[1]),
                   even\_comp(r \gg 2, dest[2]), even\_comp(r \gg 3, dest[3]));
    for (k = 0; k < 16; k++) {
         for (r = 0; r < regs; r ++)
              tmp[r] = make2 \text{ (AND, } dest\_match[r], reg[r] + k);
         old\_dest[k] = new\_vert(OR);
         for (r = 0; r < regs; r \leftrightarrow) gb\_new\_arc(old\_dest[k], tmp[r], DELAY);
    }
This code is used in section 22.
24. \langle Set old_src to the present value of the source register 24\rangle \equiv
    for (k = 0; k < 16; k++) {
         for (r = 0; r < regs; r++)
              do5(tmp[r], AND, reg[r] + k, even\_comp(r, mem[0]), even\_comp(r \gg 1, mem[1]), even\_comp(r \gg 2, 
                        mem[2]), even\_comp(r \gg 3, mem[3]));
         old\_src[k] = new\_vert(OR);
         \mathbf{for}\ (r = 0;\ r < regs;\ r +\!\!\!\!+)\ gb\_new\_arc(old\_src[k], tmp[r], \mathtt{DELAY});
This code is used in section 22.
25. \langle \text{Local variables for } risc \ 9 \rangle + \equiv
                                                                /* dest_match[r] \equiv 1 \text{ iff DST} = r */
    Vertex * dest_match[16];
    Vertex *old\_dest[16];
                                                          /* contents of destination register before operation */
    Vertex *old\_src[16];
                                                        /* contents of source register before operation */
                                                          /* old_dest plus the SRC field */
    Vertex *inc\_dest[16];
    Vertex *source[16];
                                                       /* source value for the operation */
    Vertex *log[16];
                                                /* result of general logic operation */
    Vertex *shift[18];
                                                   /* result of shift operation, with carry and overflow */
    Vertex *sum[18];
                                                   /* old_dest plus source plus optional carry */
    Vertex *diff [18];
                                                  /* old_dest minus source minus optional borrow */
    Vertex *next\_loc[16];
                                                         /* contents of register 0, plus 1 */
    Vertex *next\_next\_loc[16];
                                                                   /* contents of register 0, plus 2 */
    Vertex *result[18];
                                                     /* result of operating on old_dest and source */
```

14 THE RISC NETLIST GB\_GATES  $\S 26$ 

```
\langle Create gates for the general logic operation 26\rangle \equiv
  start\_prefix("L");
  for (k = 0; k < 16; k ++)
     do4 (log[k], OR,
          make3 (AND, mod[0], comp(old\_dest[k]), comp(source[k])),
          make3 (AND, mod[1], comp(old\_dest[k]), source[k]),
          make3 (AND, mod[2], old\_dest[k], comp(source[k])),
          make3 (AND, mod[3], old\_dest[k], source[k]));
This code is used in section 17.
     \langle Create gates for the conditional load operations 27\rangle \equiv
  start\_prefix("C");
  do4 (tmp[0], OR,
       make3 (AND, mod[0], comp(sign), comp(nonzero)),
       make3 (AND, mod[1], comp(sign), nonzero),
       make3 (AND, mod[2], sign, comp(nonzero)),
       make 3 (AND, mod [3], sign, nonzero));
  do4 (tmp[1], OR,
       make3 (AND, mod[0], comp(carry), comp(overflow)),
       make3 (AND, mod[1], comp(carry), overflow),
       make3 (AND, mod[2], carry, comp(overflow)),
       make3 (AND, mod[3], carry, overflow));
  do3(change, OR, comp(cond), make2(AND, tmp[0], comp(op)), make2(AND, tmp[1], op));
This code is used in section 17.
     \langle \text{Local variables for } risc \ 9 \rangle + \equiv
                         /* is the destination register supposed to change? */
  Vertex *change;
     Hardware is like software except that it performs all the operations all the time and then selects only
the results it needs. (If you think about it, this is a profound observation about economics, society, and
nature. Gosh.)
\langle Create gates that bring everything together properly 29\rangle \equiv
  start_prefix("Z");
   \langle \text{ Create gates for the } next\_loc \text{ and } next\_next\_loc \text{ bits } 30 \rangle;
   Create gates for the result bits 31);
   (Create gates for the new values of registers 1 to regs 34);
    Create gates for the new values of S, N, K, and V 35);
    Create gates for the new values of the program register and extra 32);
   Create gates for the new values of register 0 and the memory address register 36;
This code is used in section 17.
     \langle \text{Create gates for the } next\_loc \text{ and } next\_next\_loc \text{ bits } 30 \rangle \equiv
  next\_loc[0] = comp(req[0]); next\_next\_loc[0] = req[0];
  next\_loc[1] = make\_xor(req[0] + 1, req[0]); next\_next\_loc[1] = comp(req[0] + 1);
  for (t5 = reg[0] + 1, k = 2; k < 16; t5 = make2(AND, t5, reg[0] + k++)) {
     next\_loc[k] = make\_xor(reg[0] + k, make2(AND, reg[0], t5));
     next\_next\_loc[k] = make\_xor(reg[0] + k, t5);
This code is used in section 29.
```

§31 GB\_GATES THE RISC NETLIST 15

```
\langle \text{ Create gates for the } result \text{ bits } 31 \rangle \equiv
  jump = make 5 \text{ (AND, } op, mod [0], mod [1], mod [2], mod [3]); /* assume cond = 0 */
  for (k = 0; k < 16; k++) {
     do5 (result[k], OR,
          make2 (AND, comp(op), log[k]),
          make2 (AND, jump, next\_loc[k]),
          make3 (AND, op, comp(mod[3]), shift[k]),
          make 5 (AND, op, mod [3], comp (mod [2]), comp (mod [1]), sum [k]),
          make 5 	ext{ (AND, } op, mod [3], comp (mod [2]), mod [1], diff [k]));
     do2(result[k], OR,
          make3 (AND, cond, change, source[k]),
          make2 (AND, comp(cond), result[k]);
  for (k = 16; k < 18; k++)
                                    /* carry and overflow bits of the result */
     do3(result[k], OR,
          make3 (AND, op, comp(mod[3]), shift[k]),
          make 5 (AND, op, mod [3], comp (mod [2]), comp (mod [1]), sum [k]),
          make 5 (AND, op, mod [3], comp (mod [2]), mod [1], diff [k]);
This code is used in section 29.
```

**32.** The program register *prog* and the *extra* bit are needed for the case when we must spend an extra cycle to fetch a word from memory. On the first cycle, *ind* is true, so a "result" is calculated but not actually used. On the second cycle, *extra* is true.

A slight optimization has been introduced in order to make the circuit a bit more interesting: If a conditional load instruction occurs with indirect addressing and a false condition, the extra cycle is not taken. (The <code>next\_next\_loc</code> values were computed for this reason.)

```
#define latchit(u, latch) (latch) \rightarrow alt = make2 (AND, u, run_bit)
                                                                       /* u & run_bit is new value for latch */
\langle Create gates for the new values of the program register and extra 32\rangle \equiv
  for (k = 0; k < 10; k++) \ latchit(mem[k+6], prog + k);
  do2(nextra, OR, make2(AND, ind, comp(cond)), make2(AND, ind, change));
  latchit(nextra, extra);
  nzs = make_4 (OR, mem[0], mem[1], mem[2], mem[3]);
  nzd = make4 (OR, dest[0], dest[1], dest[2], dest[3]);
This code is used in section 29.
33. \langle \text{Local variables for } risc \ 9 \rangle + \equiv
                     /* is this command a JUMP, assuming cond is false? */
  Vertex *jump;
                      /* must we take an extra cycle? */
  Vertex *nextra;
                     /* is the SRC field nonzero? */
  Vertex *nzs;
                     /* is the DST field nonzero? */
  Vertex *nzd;
34. \langle Create gates for the new values of registers 1 to regs 34\rangle \equiv
  t5 = make2 (AND, change, comp(ind));
                                             /* should destination register change? */
  for (r = 1; r < regs; r++) {
    t4 = make2 (AND, t5, dest_match[r]);
                                              /* should register r change? */
    for (k = 0; k < 16; k++) {
       do2(t3, OR, make2(AND, t4, result[k]), make2(AND, comp(t4), reg[r] + k));
       latchit(t3, reg[r] + k);
```

This code is used in section 29.

16 The risc netlist gb-gates  $\S 35$ 

```
\langle Create gates for the new values of S, N, K, and V 35 \rangle \equiv
  do4(t5, OR,
       make2 (AND, sign, cond),
       make 2 (\mathtt{AND}, sign, jump),
       make2 (AND, sign, ind),
       make4 (AND, result[15], comp(cond), comp(jump), comp(ind));
  latchit(t5, sign);
  do4(t5, OR,
       make 4 (OR, result[0], result[1], result[2], result[3]),
       make 4 (OR, result[4], result[5], result[6], result[7]),
       make4 (OR, result [8], result [9], result [10], result [11]),
       make4 (OR, result [12], result [13], result [14],
                   make 5 \text{ (AND, } make 2 \text{ (OR, } nonzero, sign), op, mod [0], comp(mod [2]), mod [3])));
  do4(t5, OR,
       make2 (AND, nonzero, cond),
       make2 (AND, nonzero, jump),
       make2 (AND, nonzero, ind),
       make4 (AND, t5, comp(cond), comp(jump), comp(ind)));
  latchit(t5, nonzero);
  do5(t5, OR,
       make2 (AND, overflow, cond),
       make2 (AND, overflow, jump),
       make2 (AND, overflow, comp(op)),
       make2 (AND, overflow, ind),
       make 5 (AND, result[17], comp(cond), comp(jump), comp(ind), op));
  latchit(t5, overflow);
  do5(t5, OR,
       make2 (AND, carry, cond),
       make2 (AND, carry, jump),
       make2 (AND, carry, comp(op)),
       make2 (AND, carry, ind),
       make 5 (AND, result[16], comp(cond), comp(jump), comp(ind), op));
  latchit(t5, carry);
This code is used in section 29.
```

§36 GB\_GATES THE RISC NETLIST 17

**36.** As usual, we have left the hardest case for last, hoping that we will have learned enough tricks to handle it when the time of reckoning finally arrives.

The most subtle part of the logic here is perhaps the case of a JUMP command with A = 3. We want to increase register 0 by 1 during the first cycle of such a command, if SRC = 0, so that the *result* will be correct on the next cycle.

```
\langle Create gates for the new values of register 0 and the memory address register 36 \rangle
                                               /* false conditional? */
  skip = make2 (AND, cond, comp(change));
                                               /* JUMP command? */
  hop = make2 (AND, comp(cond), jump);
  do4 (normal, OR,
       make2 (AND, skip, comp(ind)),
       make2 (AND, skip, nzs),
       make3 (AND, comp(skip), ind, comp(nzs)),
       make3 (AND, comp(skip), comp(hop), nzd));
  special = make3 (AND, comp(skip), ind, nzs);
  for (k = 0; k < 16; k++) {
    do4(t5, OR,
         make2 (AND, normal, next\_loc[k]),
         make4 (AND, skip, ind, comp(nzs), next\_next\_loc[k]),
         make3 (AND, hop, comp(ind), source[k]),
         make 5 (AND, comp(skip), comp(hop), comp(ind), comp(nzd), result[k]);
    do2(t4, OR,
         make2 (AND, special, reg[0] + k),
         make2 (AND, comp(special), t5));
    latchit(t4, req[0] + k);
    do2(t4, OR,
         make2 (AND, special, old\_src[k]),
         make2 (AND, comp(special), t5));
    { register Arc *a = gb\_virgin\_arc();
       a \rightarrow tip = make2 (AND, t4, run_bit);
       a \neg next = new\_graph \neg outs;
       new\_graph \neg outs = a; /* pointer to memory address bit */
       /* arcs for output bits will appear in big-endian order */
This code is used in section 29.
37. \langle \text{Local variables for } risc \ 9 \rangle + \equiv
                     /* are we skipping a conditional load operation? */
  Vertex *skip;
                     /\ast\, are we doing a JUMP? \,\ast/\,
  Vertex *hop;
                        /* is this a case where register 0 is simply incremented? */
  Vertex *normal;
  Vertex *special;
    /* is this a case where register 0 and the memory address register will not coincide? */
```

18 SERIAL ADDITION GB\_GATES §38

**38.** Serial addition. We haven't yet specified the parts of *risc* that deal with addition and subtraction; somehow, those parts wanted to be separate from the rest. To complete our mission, we will use subroutine calls of the form ' $make\_adder(n, x, y, z, carry, add)$ ', where x and y are n-bit arrays of input gates and z is an (n+1)-bit array of output gates. If  $add \neq 0$ , the subroutine computes x+y, otherwise it computes x-y. If  $carry \neq 0$ , the carry gate is effectively added to y before the operation.

A simple n-stage serial scheme, which reduces the problem of n-bit addition to (n-1)-bit addition, is adequate for our purposes here. (A parallel adder, which gains efficiency by reducing the problem size from n to  $n/\phi$ , can be found in the *prod* routine below.)

The handy identity  $x - y = \overline{x} + y$  is used to reduce subtraction to addition.

```
\langle \text{Internal subroutines } 11 \rangle + \equiv
  static void make\_adder(n, x, y, z, carry, add)
       unsigned long n; /* number of bits */
       Vertex *x[], *y[];
                                /* input gates */
       Vertex *z[];
                        /* output gates */
       Vertex *carry; /* add this to y, unless it's null */
       \mathbf{char}\ add; \qquad /*\ \mathrm{should}\ \mathrm{we}\ \mathrm{add}\ \mathrm{or}\ \mathrm{subtract?}\ */
  \{ \text{ register long } k; 
     Vertex *t1, *t2, *t3, *t4; /* temporary storage used by do4 */
     if (\neg carry) {
       z[0] = make\_xor(x[0], y[0]);
       carry = make2(AND, even\_comp(add, x[0]), y[0]);
       k = 1;
     } else k = 0;
     for ( ; k < n; k \leftrightarrow)  {
       comp(x[k]); comp(y[k]); comp(carry);
                                                       /* generate inverse gates */
       do4(z[k], OR,
            make3 (AND, x[k], comp(y[k]), comp(carry)),
            make3 (AND, comp(x[k]), y[k], comp(carry)),
            make3 (AND, comp(x[k]), comp(y[k]), carry),
            make \Im (\mathtt{AND}, x[k], y[k], carry));
       do3(carry, OR,
             make2 (AND, even\_comp(add, x[k]), y[k]),
            make2 (AND, even\_comp(add, x[k]), carry),
            make2(\mathtt{AND},y[k],carry));
     z[n] = carry;
```

§39 GB\_GATES SERIAL ADDITION 19

**39.** OK, now we can add. What good does that do us? In the first place, we need a 4-bit adder to compute the least significant bits of  $old\_dest + SRC$ . The other 12 bits of that sum are simpler.

```
\langle \text{Set } inc\_dest \text{ to } old\_dest \text{ plus SRC } 39 \rangle \equiv
  make\_adder(4_{L}, old\_dest, mem, inc\_dest, \Lambda, 1);
  up = make2 \text{ (AND, } inc\_dest[4], comp(mem[3]));
                                                           /* remaining bits must increase */
  down = make2 (AND, comp(inc\_dest[4]), mem[3]);
                                                             /* remaining bits must decrease */
  for (k = 4; ; k++) {
     comp(up); comp(down);
     do3(inc\_dest[k], OR,
          make2 (AND, comp(old\_dest[k]), up),
          make2 (AND, comp(old\_dest[k]), down),
          make3 (AND, old\_dest[k], comp(up), comp(down)));
     if (k < 15) {
       up = make2 (AND, up, old_dest[k]);
       down = make2 (AND, down, comp(old\_dest[k]));
     } else break;
This code is used in section 22.
40. \langle \text{Local variables for } risc \ 9 \rangle + \equiv
                             /* gates used when computing inc\_dest */
  Vertex *up,*down;
```

41. In the second place, we need a 16-bit adder and a 16-bit subtracter for the four addition/subtraction commands.

This code is used in section 17.

20 SERIAL ADDITION GB\_GATES  $\S42$ 

```
\langle Create gates for the shift operations 42 \rangle \equiv
  for (k = 0; k < 16; k++)
     do4 (shift[k], OR,
         (k \equiv 0 ? make4 (AND, source [15], mod [0], comp (mod [1]), comp (mod [2])) :
                      make3 (AND, source[k-1], comp(mod[1]), comp(mod[2]))),
         (k < 4 ? make4 (AND, source[k + 12], mod[0], mod[1], comp(mod[2])):
                      make3 (AND, source[k-4], mod[1], comp(mod[2]))),
         (k \equiv 15 ? make4 (AND, source [15], comp (mod [0]), comp (mod [1]), mod [2]) :
                      make3 (AND, source[k+1], comp(mod[1]), mod[2])),
         (k > 11 ? make4 (AND, source[15], comp(mod[0]), mod[1], mod[2]) :
                      make3 (AND, source[k+4], mod[1], mod[2])));
  do4 (shift [16], OR,
       make2 (AND, comp(mod[2]), source[15]),
       make3 (AND, comp (mod [2]), mod [1], make3 (OR, source [14], source [13], source [12])),
       make3 (AND, mod[2], comp(mod[1]), source[0]),
       make3 (AND, mod[2], mod[1], source[3]);
                                                     /* "carry" */
  do3 (shift [17], OR,
       make3 (AND, comp (mod [2]), comp (mod [1]), make\_xor (source [15], source [14])),
       make 4 \text{ (AND, } comp \text{ } (mod \text{ } [2]), mod \text{ } [1],
                   make 5 (OR, source [15], source [14], source [13], source [12], source [11]),
                   make 5 (OR, comp(source[15]), comp(source[14]), comp(source[13]),
                                comp(source[12]), comp(source[11]))),
       make3 (AND, mod[2], mod[1], make3 (OR, source[0], source[1], source[2])));
                                                                                      /* "overflow" */
This code is used in section 41.
```

§43 GB\_GATES RISC MANAGEMENT 21

**43. RISC management.** The *run\_risc* procedure takes a gate graph output by *risc* and simulates its behavior, given the contents of its read-only memory. (See the demonstration program TAKE\_RISC, which appears in a module by itself, for a typical illustration of how *run\_risc* might be used.)

This procedure clears the simulated machine and begins executing the program that starts at address 0. It stops when it gets to an address greater than the size of read-only memory supplied. One way to stop it is therefore to execute a command such as  $^{\#}$ 0f00, which will transfer control to location  $^{\#}$ ffff; even better is  $^{\#}$ 0f8f, which transfers to location  $^{\#}$ ffff without changing the status of S and N. However, if the given read-only memory contains a full set of  $2^{16}$  words,  $run\_risc$  will never stop.

When  $run\_risc$  does stop, it returns 0 and puts the final contents of the simulated registers into the global array  $risc\_state$ . Or, if g was not a decent graph,  $run\_risc$  returns a negative value and leaves  $risc\_state$  untouched.

```
\langle \text{The } run\_risc \text{ routine } 43 \rangle \equiv
  long run\_risc(g, rom, size, trace\_regs)
       Graph *q:
                       /* graph output by risc */
       unsigned long rom[];
                                     /* contents of read-only memory */
                                    /* length of rom vector */
       unsigned long size;
                                           /* if nonzero, this many registers will be traced */
       unsigned long trace_regs;
  { register unsigned long l;
                                        /* memory address */
                                        /* memory or register contents */
     register unsigned long m;
     register Vertex *v;
                                 /* the current gate of interest */
                              /* the current output list element of interest */
     register Arc *a;
                               /* general-purpose indices */
     register long k, r;
     long x, s, n, c, o;
                            /* status bits */
     if (trace\_regs) \langle Print a headline 44 \rangle;
     r = gate\_eval(g, "0", \Lambda);
                                  /* reset the RISC by turning off the RUN bit */
     if (r < 0) return r;
                                 /* not a valid gate graph! */
     g \rightarrow vertices \rightarrow val = 1;
                              /* turn the RUN bit on */
     while (1) {
       for (a = g \rightarrow outs, l = 0; a; a = a \rightarrow next) l = 2 * l + a \rightarrow tip \rightarrow val; /* set l = memory address */
       if (trace_regs) \langle Print register contents 46 \rangle;
                                  /* stop if memory check occurs */
       if (l \geq size) break;
       for (v = q \rightarrow vertices + 1, m = rom[l]; v \leq q \rightarrow vertices + 16; v + +, m \gg 1) v \rightarrow val = m \& 1;
             /* store bits of memory word in the input gates */
       gate\_eval(g, \Lambda, \Lambda);
                                /* do another RISC cycle */
     if (trace_regs) \( \text{Print a footline 45} \);
     \langle \text{Dump the register contents into } risc\_state \ 47 \rangle;
     return 0;
This code is used in section 7.
44. If tracing is requested, we write on the standard output file.
\langle \text{ Print a headline 44} \rangle \equiv
     for (r = 0; r < trace\_regs; r++) printf("_\rangle r\%-2ld_\rangle", r);
                                                                         /* register names */
     printf("⊔P⊔XSNKV⊔MEM\n"); /* prog, extra, status bits, memory */
This code is used in section 43.
```

22 RISC MANAGEMENT GB\_GATES §45

```
45. \langle \text{ Print a footline 45} \rangle \equiv
  printf("Execution_terminated_with_memory_address_%04lx.\n", l);
This code is used in section 43.
46. Here we peek inside the circuit to see what values are about to be latched.
\langle \text{ Print register contents } 46 \rangle \equiv
     for (r = 0; r < trace\_regs; r++) {
        v = g \text{-}vertices + (16 * r + 47); /* most significant bit of register r * / r + 47
        m = 0:
        if (v \rightarrow typ \equiv 'L')
           for (k = 0, m = 0; k < 16; k++, v--) m = 2 * m + v \rightarrow alt \rightarrow val;
        printf("\%041x_{\sqcup}", m);
     for (k = 0, m = 0, v = q \rightarrow vertices + 26; k < 10; k++, v--) m = 2 * m + v \rightarrow alt \rightarrow val;
     x = (g \neg vertices + 31) \neg alt \neg val;
                                             /* extra */
                                              /* sign */
     s = (g \neg vertices + 27) \neg alt \neg val;
                                               /* nonzero */
     n = (g \rightarrow vertices + 28) \rightarrow alt \rightarrow val;
                                               /* carry */
     c = (g \rightarrow vertices + 29) \rightarrow alt \rightarrow val;
                                              /* overflow */
     o = (g \rightarrow vertices + 30) \rightarrow alt \rightarrow val;
     printf("\%031x\%c\%c\%c\%c_{\square}", m \ll 2, x?'X':'.',s?'S':'.',n?'N':'.',c?'K':'.',
           o ? '∀' : '.');
     if (l \geq size) printf("????\n");
     else printf("\%04lx\n", rom[l]);
This code is used in section 43.
47. \(\rangle\) Dump the register contents into risc\_state 47\(\rangle\) \equiv
  for (r = 0; r < 16; r ++) {
     v = g \text{-}vertices + (16 * r + 47); /* most significant bit of register r * / r + 47
     m = 0;
     if (v \rightarrow typ \equiv 'L')
        for (k = 0, m = 0; k < 16; k++, v--) m = 2 * m + v \rightarrow alt \rightarrow val;
     risc\_state[r] = m;
  for (k = 0, m = 0, v = g \text{-}vertices + 26; k < 10; k++, v--) m = 2 * m + v \text{-}alt \text{-}val; /* prog */
  m = 4 * m + (g \rightarrow vertices + 31) \rightarrow alt \rightarrow val; /* extra */
  m = 2 * m + (g \neg vertices + 27) \neg alt \neg val;
                                                      /* sign */
  m = 2 * m + (g \neg vertices + 28) \neg alt \neg val;
                                                       /* nonzero */
  m = 2 * m + (g \neg vertices + 29) \neg alt \neg val;
                                                       /* carry */
  m = 2 * m + (g \neg vertices + 30) \neg alt \neg val;
                                                        /* overflow */
  risc\_state[16] = m; /* program register and status bits go here */
  risc\_state[17] = l;
                           /* this is the out-of-range address that caused termination */
This code is used in section 43.
48. \langle Global variables 48\rangle \equiv
  unsigned long risc_state [18];
This code is used in section 7.
```

- **49. Generalized gate graphs.** For intermediate computations, it is convenient to allow two additional types of gates:
  - 'C' denotes a constant gate of value z.I.
  - '=' denotes a copy of a previous gate; utility field alt points to that previous gate.

Such gates might appear anywhere in the graph, possibly interspersed with the inputs and latches.

Here is a simple subroutine that prints a symbolic representation of a generalized gate graph on the standard output file:

```
#define bit z.I
                           /* field containing the constant value of a 'C' gate */
\#define \ print\_gates \ p\_gates
                                            /* abbreviation makes chopped-off name unique */
\langle \text{ The } print\_gates \text{ routine } 49 \rangle \equiv
  static void pr\_gate(v)
         Vertex *v;
   { register Arc *a;
      printf("\%s_{\sqcup}=_{\sqcup}", v \rightarrow name);
      switch (v \rightarrow typ) {
      case 'I': printf("input"); break;
      case 'L': printf("latch");
        if (v \rightarrow alt) printf("ed_\%s", v \rightarrow alt \rightarrow name);
      case ', '; printf(", "); break;
      case 'C': printf("constant_{\square}%ld", v \rightarrow bit);
         break;
      case '=': printf("copy_lof_l%s", v \rightarrow alt \rightarrow name);
      for (a = v \rightarrow arcs; a; a = a \rightarrow next) {
        if (a \neq v \rightarrow arcs) printf ("\"\chick\cu'\", (char) v \rightarrow typ);
         printf(a \rightarrow tip \rightarrow name);
      printf("\n");
  void print_gates(g)
         Graph *g;
   { register Vertex *v;
      register Arc *a;
      for (v = g \neg vertices; \ v < g \neg vertices + g \neg n; \ v ++) \ pr\_gate(v);
      for (a = g \rightarrow outs; a; a = a \rightarrow next)
         if (is_boolean(a¬tip)) printf("Output⊔%ld\n", the_boolean(a¬tip));
         else printf("Output_{\square}%s\n", a \rightarrow tip \rightarrow name);
This code is used in section 7.
50. \langle gb\_gates.h 1 \rangle +\equiv
#define bit z.I
```

This code is used in section 51.

24

**51.** The reduce routine takes a generalized graph q and uses the identities  $\overline{\overline{x}} = x$  and

```
\begin{split} &x\wedge 0=0,\quad x\wedge 1=x,\quad x\wedge x=x,\quad x\wedge \overline{x}=0,\\ &x\vee 0=x,\quad x\vee 1=1,\quad x\vee x=x,\quad x\vee \overline{x}=1,\\ &x\oplus 0=x,\quad x\oplus 1=\overline{x},\quad x\oplus x=0,\quad x\oplus \overline{x}=1, \end{split}
```

to create an equivalent graph having no 'C' or '=' or obviously redundant gates. The reduced graph also excludes any gates that are not used directly or indirectly in the computation of the output values.

```
\langle \text{Internal subroutines } 11 \rangle + \equiv
  static Graph *reduce(g)
       Graph *g;
  { register Vertex *u, *v; /* the current vertices of interest */
     register Arc *a, *b; /* the current arcs of interest */
     \mathbf{Arc} *aa, *bb; /* their predecessors */
     Vertex *latch_ptr; /* top of the latch list */
     long n = 0; /* the number of marked gates */
     Graph *new_graph; /* the reduced gate graph */
     Vertex *next_vert = \Lambda, *max_next_vert = \Lambda; /* allocation of new vertices */
     \mathbf{Arc} * avail\_arc = \Lambda; /* list of recycled arcs */
                               /* end of the vertices */
     Vertex *sentinel;
     if (g \equiv \Lambda) panic (missing_operand); /* where is g? */
     sentinel = g \neg vertices + g \neg n;
     while (1) {
       latch\_ptr = \Lambda;
       for (v = q \text{-}vertices; v < sentinel; v ++) \ \langle \text{Reduce gate } v, \text{ if possible, or put it on the latch list 53} \rangle;
       (Check to see if any latch has become constant; if not, break 52);
     (Mark all gates that are used in some output 60);
     \langle \text{Copy all marked gates to a new graph } 62 \rangle;
     gb\_recycle(g);
     return new_graph;
  }
      We will link latches together via their v.V fields.
\langle Check to see if any latch has become constant; if not, break 52 \rangle \equiv
  { char no\_constants\_yet = 1;
     \mathbf{for}\ (v = latch\_ptr;\ v;\ v = v \neg v.V)\ \{
       u = v \rightarrow alt; /* the gate whose value will be latched */
       if (u \rightarrow typ \equiv '=') v \rightarrow alt = u \rightarrow alt;
       else if (u \rightarrow typ \equiv 'C') {
          v \rightarrow typ = 'C'; v \rightarrow bit = u \rightarrow bit; no\_constants\_yet = 0;
       }
     if (no_constants_yet) break;
```

```
#define foo x.V /* link field used to find all the gates later */
\langle \text{Reduce gate } v, \text{ if possible, or put it on the latch list } 53 \rangle \equiv
      switch (v \rightarrow typ) {
      case 'L': v \rightarrow v.V = latch\_ptr; latch\_ptr = v; break;
      case 'I': case 'C': break;
      case '=': u = v \rightarrow alt;
         if (u \rightarrow typ \equiv '=') v \rightarrow alt = u \rightarrow alt;
         else if (u \rightarrow typ \equiv 'C') {
            v \rightarrow bit = u \rightarrow bit; goto make\_v\_constant;
         break;
      case NOT: (Try to reduce an inverter, then goto done 54);
      case AND: (Try to reduce an AND gate 55); goto test_single_arg;
      case OR: (Try to reduce an OR gate 56); goto test_single_arg;
      case XOR: (Try to reduce an EXCLUSIVE-OR gate 57);
      test\_single\_arg:
         if (v \rightarrow arcs \rightarrow next) break;
         v \rightarrow alt = v \rightarrow arcs \rightarrow tip;
      make\_v\_eq: v \rightarrow typ = '='; goto make\_v\_arcless;
      make\_v\_1: v \rightarrow bit = 1; goto make\_v\_constant;
      make_{-}v_{-}\theta: v_{-}bit = 0;
      make_v\_constant: v \rightarrow typ = 'C';
      make\_v\_arcless: v \neg arcs = \Lambda;
      v - bar = \Lambda; /* this field will point to the complement, if computed later */
   done: v \rightarrow foo = v + 1; /* this field will link all the vertices together */
This code is used in section 51.
54. \langle Try to reduce an inverter, then goto done 54\rangle
   u = v \neg arcs \neg tip;
   if (u \rightarrow typ \equiv '=') u = v \rightarrow arcs \rightarrow tip = u \rightarrow alt;
   if (u \rightarrow typ \equiv C)
      v \rightarrow bit = 1 - u \rightarrow bit; goto make\_v\_constant;
   } else if (u \rightarrow bar) { /* this inverse already computed */
      v \rightarrow alt = u \rightarrow bar; goto make\_v\_eq;
   } else {
      u \rightarrow bar = v; \ v \rightarrow bar = u; \ \mathbf{goto} \ done;
This code is used in section 53.
```

26

```
55. \langle Try to reduce an AND gate 55\rangle
   for (a = v \neg arcs, aa = \Lambda; a; a = a \neg next) {
       u = a \rightarrow tip;
       if (u \rightarrow typ \equiv '=') u = a \rightarrow tip = u \rightarrow alt;
       if (u \rightarrow typ \equiv 'C') {
          if (u \rightarrow bit \equiv 0) goto make_{-}v_{-}\theta;
           goto bypass_and;
       } else for (b = v \rightarrow arcs; b \neq a; b = b \rightarrow next) {
              if (b \rightarrow tip \equiv u) goto bypass\_and;
              if (b \rightarrow tip \equiv u \rightarrow bar) goto make_{-}v_{-}\theta;
           }
       aa = a; continue;
   bypass\_and:
       if (aa) aa \rightarrow next = a \rightarrow next;
       else v \rightarrow arcs = a \rightarrow next;
   if (v \rightarrow arcs \equiv \Lambda) goto make_{-}v_{-}1;
This code is used in section 53.
56. \langle Try to reduce an OR gate 56\rangle \equiv
   for (a = v \rightarrow arcs, aa = \Lambda; a; a = a \rightarrow next) {
       u = a \rightarrow tip;
       if (u \rightarrow typ \equiv '=') u = a \rightarrow tip = u \rightarrow alt;
       if (u \rightarrow typ \equiv C) {
          if (u \rightarrow bit) goto make_{-}v_{-}1;
           goto bypass_or;
       } else for (b = v \rightarrow arcs; b \neq a; b = b \rightarrow next) {
              if (b \neg tip \equiv u) goto bypass\_or;
              if (b \rightarrow tip \equiv u \rightarrow bar) goto make_{v-1};
       aa = a; continue;
   bypass\_or:
       if (aa) aa \rightarrow next = a \rightarrow next;
       else v \rightarrow arcs = a \rightarrow next;
   if (v \rightarrow arcs \equiv \Lambda) goto make_{-}v_{-}\theta;
This code is used in section 53.
```

```
\langle Try to reduce an EXCLUSIVE-OR gate 57\rangle
   \{ long cmp = 0;
      \textbf{for} \ (a=v \neg arcs, aa=\Lambda; \ a; \ a=a \neg next) \ \{
         u = a \rightarrow tip;
         if (u \rightarrow typ \equiv '=') u = a \rightarrow tip = u \rightarrow alt;
         if (u \rightarrow typ \equiv C) {
            if (u \rightarrow bit) cmp = 1 - cmp;
             goto bypass_xor;
         } else for (bb = \Lambda, b = v \rightarrow arcs; b \neq a; b = b \rightarrow next) {
                if (b \rightarrow tip \equiv u) goto double\_bypass;
                if (b \rightarrow tip \equiv u \rightarrow bar) {
                   cmp = 1 - cmp;
                   goto double_bypass;
                bb = b; continue;
             double\_bypass:
                if (bb) bb \rightarrow next = b \rightarrow next;
                else v \rightarrow arcs = b \rightarrow next;
                goto bypass_xor;
         aa = a; continue;
      bypass\_xor:
         if (aa) aa \rightarrow next = a \rightarrow next;
         else v \rightarrow arcs = a \rightarrow next;
         a \rightarrow a.A = avail\_arc;
         avail\_arc = a;
      if (v \rightarrow arcs \equiv \Lambda) {
         v \rightarrow bit = cmp;
         goto make\_v\_constant;
      if (cmp) \langle Complement one argument of v 58\rangle;
   }
This code is used in section 53.
      \langle Complement one argument of v 58\rangle \equiv
      for (a = v \rightarrow arcs; ; a = a \rightarrow next) {
         u = a \rightarrow tip;
         if (u \rightarrow bar) break;
                                      /* good, the complement is already known */
                                        /* oops, this is our last chance */
         if (a \rightarrow next \equiv \Lambda) {
             \langle Create a new vertex for complement of u 59\rangle;
            break;
      a \rightarrow tip = u \rightarrow bar;
This code is used in section 57.
```

28

**59.** Here we've come to a subtle point: If a lot of XOR gates involve an input that is set to the constant value 1, the "reduced" graph might actually be larger than the original, in the sense of having more vertices (although fewer arcs). Therefore we must have the ability to allocate new vertices during the reduction phase of *reduce*. At least one arc has been added to the *avail\_arc* list whenever we reach this portion of the program.

```
\langle \text{Create a new vertex for complement of } u \text{ 59} \rangle \equiv
  if (next\_vert \equiv max\_next\_vert) {
      next\_vert = gb\_typed\_alloc(7, Vertex, g \neg aux\_data);
      if (next\_vert \equiv \Lambda) {
         gb\_recycle(g);
         panic(no\_room + 1);
                                        /* can't get auxiliary storage! */
      max\_next\_vert = next\_vert + 7;
  }
  next\_vert \rightarrow typ = NOT;
  sprintf(name\_buf, "\%s~", u \rightarrow name);
  next\_vert \neg name = gb\_save\_string(name\_buf);
  next\_vert \neg arcs = avail\_arc;
                                           /* this is known to be non-\Lambda */
  avail\_arc \neg tip = u;
   avail\_arc = avail\_arc \rightarrow a.A;
  next\_vert \neg arcs \neg next = \Lambda;
  next\_vert \neg bar = u;
  next\_vert \neg foo = u \neg foo;
  u \rightarrow foo = u \rightarrow bar = next\_vert ++;
This code is used in section 58.
```

**60.** During the marking phase, we will use the w.V field to link the list of nodes-to-be-marked. That field will turn out to be non- $\Lambda$  only in the marked nodes. (We no longer use its former meaning related to complementation, so we call it lnk instead of bar.)

```
#define lnk\ w.V /* stack link for marking */

\( \text{Mark all gates that are used in some output } 60 \) \( \equiv \) \( \text{for } (v = g \tau vertices; \ v \neq sentinel; \ v = v \tau foo) \ v \tau lnk = \Lambda; \\
\( \text{for } (a = g \tau outs; \ a; \ a = a \tau next) \ \{ \quad v = a \tau tip; \\
\( \text{if } (is \text{boolean}(v)) \) \( \text{continue}; \\
\( \text{if } (v \tau typ \equiv '\text{c'}) \) \( \text{ this output is constant, so make it boolean } */ \\
\( a \tau tip = (\text{Vertex} *) v \tau bit; \\
\( \text{continue}; \\
\) \( \text{Mark all gates that are used to compute } v \ 61 \); \\
\} \)
```

This code is used in section 51.

```
61. \langle Mark all gates that are used to compute v 61\rangle \equiv
  if (v \rightarrow lnk \equiv \Lambda) {
      v \!\! \! \rightarrow \!\! lnk = sentinel;
                                /* v now represents the top of the stack of nodes to be marked */
      do {
         n++;
         b = v \neg arcs;
         if (v \rightarrow typ \equiv `L") {
            \dot{u} = \dot{v}-alt; /* latch vertices have a "hidden" dependency */
            if (u < v) n++; /* latched input value will get a special gate */
            if (u \rightarrow lnk \equiv \Lambda) {
               u \neg lnk = v \neg lnk;
               v = u;
            } else v = v \rightarrow lnk;
         } else v = v \rightarrow lnk;
         for (; b; b = b \rightarrow next) {
            u = b \rightarrow tip;
            if (u \rightarrow lnk \equiv \Lambda) {
               u \dashv lnk = v;
               v = u;
      } while (v \neq sentinel);
```

This code is used in section 60.

GENERALIZED GATE GRAPHS

30

It is easier to copy a directed acyclic graph than to copy a general graph, but we do have to contend with the feedback in latches.

```
\#define reverse\_arc\_list(alist)
            { for (aa = alist, b = \Lambda; aa; b = aa, aa = a) {
                   a = aa \neg next;
                   aa \neg next = b;
               }
                alist = b; }
\langle Copy all marked gates to a new graph 62\rangle \equiv
   new\_graph = gb\_new\_graph(n);
   if (new\_graph \equiv \Lambda) {
      qb\_recycle(q);
      panic(no\_room + 2);
                                      /* out of memory */
   strcpy(new\_graph \rightarrow id, g \rightarrow id);
   strcpy (new\_graph \neg util\_types, \verb"ZZZIIVZZZZZZA");
   next\_vert = new\_graph \neg vertices;
   for (v = g \neg vertices, latch\_ptr = \Lambda; v \neq sentinel; v = v \neg foo) {
      if (v→lnk) {
                          /* yes, v is marked */
         u = v \rightarrow lnk = next\_vert ++;
                                                /* make note of where we've copied it */
         \langle Make u a copy of v; put it on the latch list if it's a latch 63\rangle;
      }
   \langle Fix up the alt fields of the newly copied latches 64\rangle;
   reverse\_arc\_list(g \rightarrow outs);
   for (a = g \rightarrow outs; a; a = a \rightarrow next) {
      b = gb\_virgin\_arc();
      b \rightarrow tip = is\_boolean(a \rightarrow tip) ? a \rightarrow tip : a \rightarrow tip \rightarrow lnk;
      b \rightarrow next = new\_graph \rightarrow outs;
      new\_graph \neg outs = b;
This code is used in section 51.
63. \langle Make u a copy of v; put it on the latch list if it's a latch 63\rangle \equiv
   u \rightarrow name = gb\_save\_string(v \rightarrow name);
   u \rightarrow typ = v \rightarrow typ;
   if (v \rightarrow typ \equiv `L") {
      u \rightarrow alt = latch\_ptr; latch\_ptr = v;
   }
   reverse\_arc\_list(v \rightarrow arcs);
   for (a = v \rightarrow arcs; a; a = a \rightarrow next) gb\_new\_arc(u, a \rightarrow tip \rightarrow lnk, a \rightarrow len);
This code is used in section 62.
```

```
64. ⟨Fix up the alt fields of the newly copied latches 64⟩ ≡
while (latch_ptr) {
u = latch_ptr¬lnk; /* the copy of a latch */
v = u¬alt;
u¬alt = latch_ptr¬alt¬lnk;
latch_ptr = v;
if (u¬alt < u) ⟨Replace u¬alt by a new gate that copies an input 65⟩;</li>
}
This code is used in section 62.
```

65. Suppose we had a latch whose value was originally the AND of two inputs, where one of those inputs has now been set to 1. Then the latch should still refer to a subsequent gate, equal to the value of the other input on the previous cycle. We create such a gate here, making it an OR of two identical inputs. We do this because we're not supposed to leave any '=' in the result of *reduce*, and because every OR is supposed to have at least two inputs.

```
 \left\langle \text{Replace } u \text{-}alt \text{ by a new gate that copies an input } 65 \right\rangle \equiv \left\{ \begin{array}{l} v = u \text{-}alt; & /* \text{ the input gate that should be copied for latching } */ \\ u \text{-}alt = next\_vert ++; \\ sprintf\left(name\_buf, \text{"%s>%s"}, v \text{-}name, u \text{-}name\right); \\ u = u \text{-}alt; \\ u \text{-}name = gb\_save\_string\left(name\_buf\right); \\ u \text{-}typ = \text{OR}; \\ gb\_new\_arc\left(u, v, \text{DELAY}\right); \ gb\_new\_arc\left(u, v, \text{DELAY}\right); \\ \end{array} \right\}
```

This code is used in section 64.

**66. Parallel multiplication.** Now comes the *prod* routine, which constructs a rather different network of gates, based this time on a divide-and-conquer paradigm. Let's take a breather before we tackle it. (Deep breath.)

The subroutine call prod(m, n) creates a network for the binary multiplication of unsigned m-bit numbers by n-bit numbers, assuming that  $m \ge 2$  and  $n \ge 2$ . There is no upper limit on the sizes of m and n, except of course the limits imposed by the size of memory in which this routine is run.

The overall strategy used by *prod* is to start with a generalized gate graph for multiplication in which many of the gates are identically zero or copies of other gates. Then the *reduce* routine will perform local optimizations leading to the desired result. Since there are no latches, some of the complexities of the general *reduce* routine are avoided.

All of the AND, OR, and XOR gates of the network returned by prod have exactly two inputs. The depth of the circuit (i.e., the length of its longest path) is  $3\log m/\log 1.5 + \log(m+n)/\log \phi + O(1)$ , where  $\phi = (1+\sqrt{5})/2$  is the golden ratio. The grand total number of gates is  $6mn + 5m^2 + O((m+n)\log(m+n))$ .

There is a demonstration program called MULTIPLY that uses prod to compute products of large integers.

```
 \begin{array}{ll} \left\langle \text{The } prod \; \text{routine } 66 \right\rangle \equiv \\ & \text{Graph } *prod (m,n) \\ & \text{unsigned long } m,n; \quad /* \; \text{lengths of the binary numbers to be multiplied } */ \\ \left\{ \; \left\langle \text{Local variables for } prod \; 68 \right\rangle \right. \\ & \text{if } \left( m < 2 \right) \; m = 2; \\ & \text{if } \left( n < 2 \right) \; n = 2; \\ & \left\langle \text{Allocate space for a temporary graph } g \; \text{and for auxiliary tables } 67 \right\rangle; \\ & \left\langle \text{Fill } g \; \text{with generalized gates that do parallel multiplication } 70 \right\rangle; \\ & \text{if } \left( gb\_trouble\_code \right) \; \left\{ \\ & gb\_recycle(g); \; panic(alloc\_fault); \quad /* \; \text{too big } */ \\ & \left. \right\} \\ & g = reduce(g); \\ & \text{return } g; \quad /* \; \text{if } g \equiv \Lambda, \; \text{the } panic\_code \; \text{was set by } reduce \; */ \\ & \right\} \\ \end{array}
```

This code is used in section 7.

33

§67

67. The divide-and-conquer recurrences used in this network lead to interesting patterns. First we use a method for parallel column addition that reduces the sum of three numbers to the sum of two numbers. Repeated use of this reduction makes it possible to reduce the sum of m numbers to a sum of just two numbers, with a total circuit depth that satisfies the recurrence T(3N) = T(2N) + O(1). Then when the result has been reduced to a sum of two numbers, we use a parallel addition scheme based on recursively "golden sectioning the data"; in other words, the recursion partitions the data into two parts such that the ratio of the larger part to the smaller part is approximately  $\phi$ . This technique proves to be slightly better than a binary partition would be, both asymptotically and for small values of m + n.

We define flog N, the Fibonacci logarithm of N, to be the smallest nonnegative integer k such that  $N \leq F_{k+1}$ . Let N = m+n. Our parallel adder for two numbers of N bits will turn out to have depth at most  $2+\operatorname{flog} N$ . The unreduced graph g in our circuit for multiplication will have fewer than  $(6m+3\operatorname{flog} N)N$  gates.

```
\langle Allocate space for a temporary graph g and for auxiliary tables 67\rangle \equiv
  m_{plus} = m + n; \langle \text{Compute } f = \text{flog}(m + n) | 69 \rangle;
  g = gb\_new\_graph((6*m - 7 + 3*f)*m\_plus\_n);
  if (g \equiv \Lambda) \ panic(no\_room);
                                     /* out of memory before we're even started */
  sprintf(g \neg id, "\texttt{prod}(\% \texttt{lu}, \% \texttt{lu})", m, n);
  strcpy(g→util_types, "ZZZIIVZZZZZZZA");
  long\_tables = gb\_typed\_alloc(2 * m\_plus\_n + f, long, g \neg aux\_data);
  vert\_tables = gb\_typed\_alloc(f * m\_plus\_n, Vertex *, g \rightarrow aux\_data);
  if (gb_trouble_code) {
     gb\_recycle(g);
     panic(no\_room + 1);
                                  /* out of memory trying to create auxiliary tables */
  }
This code is used in section 66.
68. \langle \text{Local variables for } prod 68 \rangle \equiv
                                     /* guess what this variable holds */
  unsigned long m_{-}plus_{-}n;
                /* initially flog(m+n), later flog of other things */
                   /* graph of generalized gates, to be reduced eventually */
                           /* beginning of auxiliary array of long numbers */
  long *long\_tables;
                                 /* beginning of auxiliary array of gate pointers */
  Vertex **vert_tables;
See also sections 71 and 77.
This code is used in section 66.
69. \langle \text{Compute } f = \text{flog}(m+n) \text{ 69} \rangle \equiv
  f=4;\; j=3;\; k=5;\;\; /*\;\; j=F_f,\; k=F_{f+1}\;\; */\;\; while (k < m\_plus\_n) {
     k = k + j;
     j = k - j;
     f++;
This code is used in section 67.
```

§70

**70.** The well-known formulas for a "full adder,"

$$x + y + z = s + 2c$$
, where  $s = x \oplus y \oplus z$  and  $c = xy \lor yz \lor zx$ ,

can be applied to each bit of an N-bit number, thereby providing us with a way to reduce the sum of three numbers to the sum of two.

The input gates of our network will be called  $x_0, x_1, \ldots, x_{m-1}, y_0, y_1, \ldots, y_{n-1}$ , and the outputs will be called  $z_0, z_1, \ldots, z_{m+n-1}$ . The logic of the *prod* network will compute

$$(z_{m+n-1} \dots z_1 z_0)_2 = (x_{m-1} \dots x_1 x_0)_2 \cdot (y_{n-1} \dots y_1 y_0)_2$$

by first considering the product to be the m-fold sum  $A_0 + A_1 + \cdots + A_{m-1}$ , where

$$A_j = 2^j x_j \cdot (y_{n-1} \dots y_1 y_0)_2, \qquad 0 \le j < m.$$

Then the three-to-two rule for addition is used to define further numbers  $A_m$ ,  $A_{m+1}$ , ...,  $A_{3m-5}$  by the scheme

$$A_{m+2j} + A_{m+2j+1} = A_{3j} + A_{3j+1} + A_{3j+2}, \qquad 0 \le j \le m-3.$$

[A similar but slightly less efficient scheme was used by Pratt and Stockmeyer in Journal of Computer and System Sciences 12 (1976), Proposition 5.3. The recurrence used here is related to the Josephus problem with step-size 3; see Concrete Mathematics, §3.3.] For this purpose, we compute intermediate results  $P_j$ ,  $Q_j$ , and  $R_j$  by the rules

$$\begin{split} P_j &= A_{3j} \oplus A_{3j+1} \,; \\ Q_j &= A_{3j} \wedge A_{3j+1} \,; \\ A_{m+2j} &= P_j \oplus A_{3j+2} \,; \\ R_j &= P_j \wedge A_{3j+2} \,; \\ A_{m+2j+1} &= 2(Q_j \vee R_j) \,. \end{split}$$

Finally we let

$$U = A_{3m-6} \oplus A_{3m-5}$$
,  
 $V = A_{3m-6} \wedge A_{3m-5}$ ;

these are the values that would be  $P_{m-2}$  and  $Q_{m-2}$  if the previous formulas were allowed to run past j = m - 3. The final result  $Z = (z_{m+n-1} \dots z_1 z_0)_2$  can now be expressed as

$$Z = U + 2V.$$

The gates of the first part of the network are conveniently obtained in groups of N=m+n, representing the bits of the quantities  $A_j$ ,  $P_j$ ,  $Q_j$ ,  $R_j$ , U, and V. We will put the least significant bit of  $A_j$  in gate position g-vertices +a(j)\*N, where a(j)=j+1 for  $0 \le j < m$  and a(m+2j+t)=m+5j+3+2t for  $0 \le j \le m-3$ ,  $0 \le t \le 1$ .

 $\langle \text{Fill } g \text{ with generalized gates that do parallel multiplication } 70 \rangle \equiv next\_vert = g\neg vertices; \\ start\_prefix("X"); x = first\_of(m, 'I'); \\ start\_prefix("Y"); y = first\_of(n, 'I'); \\ \langle \text{Define } A_j \text{ for } 0 \leq j < m \text{ } 72 \rangle; \\ \langle \text{Define } P_j, Q_j, A_{m+2j}, R_j, \text{ and } A_{m+2j+1} \text{ for } 0 \leq j \leq m-3 \text{ } 73 \rangle; \\ \langle \text{Define } U \text{ and } V \text{ } 74 \rangle; \\ \langle \text{Compute the final result } Z \text{ by parallel addition } 75 \rangle;$ 

This code is used in section 66.

 $\langle \text{Local variables for } prod 68 \rangle + \equiv$ 

```
register long i, j, k, l; /* all-purpose indices */
                             /* current vertex of interest */
  register Vertex *v;
                     /* least-significant bits of the input gates */
  Vertex *x, *y;
  Vertex *alpha, *beta; /* least-significant bits of arguments */
72. \langle \text{ Define } A_j \text{ for } 0 \leq j < m \text{ 72} \rangle \equiv
  for (j = 0; j < m; j ++) {
     numeric\_prefix(`A',j);
     for (k = 0; k < j; k++) {
       v = new\_vert('C'); v \rightarrow bit = 0; /* this gate is the constant 0 */
     for (k = 0; k < n; k++) make2(AND, x + j, y + k);
     for (k = j + n; k < m_plus_n; k++) {
       v = new\_vert(,C,); v \rightarrow bit = 0; /* this gate is the constant 0 */
  }
This code is used in section 70.
73. Since m is unsigned, it is necessary to say j < m-2 here instead of j \le m-3.
#define a\_pos(j) (j < m? j + 1: m + 5*((j - m) \gg 1) + 3 + (((j - m) \& 1) \ll 1))
\langle \text{ Define } P_j, Q_j, A_{m+2j}, R_j, \text{ and } A_{m+2j+1} \text{ for } 0 \leq j \leq m-3 \ 73 \rangle \equiv
  for (j = 0; j < m - 2; j ++) {
     alpha = g \neg vertices + (a\_pos(3 * j) * m\_plus\_n);
     beta = g \neg vertices + (a\_pos(3 * j + 1) * m\_plus\_n);
     numeric\_prefix('P', j);
     for (k = 0; k < m_plus_n; k++) make 2(XOR, alpha + k, beta + k);
     numeric\_prefix('Q',j);
     for (k = 0; k < m_plus_n; k++) make 2(AND, alpha + k, beta + k);
     alpha = next\_vert - 2 * m\_plus\_n;
     beta = g \neg vertices + (a\_pos(3 * j + 2) * m\_plus\_n);
     numeric\_prefix(`A`, (long) m + 2 * j);
     for (k = 0; k < m_plus_n; k++) make 2(XOR, alpha + k, beta + k);
     numeric\_prefix('R', j);
     \mathbf{for}\ (k=0;\ k < m\_plus\_n;\ k+\!\!+\!\!)\ make2\,(\mathtt{AND}, alpha + k, beta + k);
     alpha = next\_vert - 3 * m\_plus\_n;
     beta = next\_vert - m\_plus\_n;
     numeric\_prefix(`A`, (long) m + 2 * j + 1);
     v = new\_vert("C"); v \rightarrow bit = 0; /* another 0, it multiplies Q \lor R by 2 */
     for (k = 0; k < m\_plus\_n - 1; k++) make2(OR, alpha + k, beta + k);
  }
This code is used in section 70.
```

36

74. Actually  $v_{m+n-1}$  will never be used (it has to be zero); but we compute it anyway. We don't have to worry about such nitty gritty details because *reduce* will get rid of all the obvious redundancy.

```
\begin{split} &\langle \, \text{Define } U \text{ and } V \text{ 74} \, \rangle \equiv \\ & \, alpha = g \text{-}vertices + (a\_pos(3*m-6)*m\_plus\_n); \\ & \, beta = g \text{-}vertices + (a\_pos(3*m-5)*m\_plus\_n); \\ & \, start\_prefix("U"); \\ & \, \textbf{for } (k=0; \ k < m\_plus\_n; \ k++) \ make2 (\texttt{XOR}, alpha + k, beta + k); \\ & \, start\_prefix("V"); \\ & \, \textbf{for } (k=0; \ k < m\_plus\_n; \ k++) \ make2 (\texttt{AND}, alpha + k, beta + k); \\ & \, \text{This code is used in section 70.} \end{split}
```

Parallel addition. It's time now to take another deep breath. We have finished the parallel multiplier except for one last step, the design of a parallel adder.

The adder is based on the following theory: We want to perform the binary addition

where we know that  $u_k + v_k \leq 1$  for all k. It follows that  $z_k = u_k \oplus w_k$ , where  $w_0 = 0$  and

$$w_k = v_{k-1} \lor u_{k-1}v_{k-2} \lor u_{k-1}u_{k-2}v_{k-3} \lor \cdots \lor u_{k-1}\dots u_1v_0$$

for k>0. The problem has therefore been reduced to the evaluation of  $w_1, w_2, \ldots, w_{N-1}$ . Let  $c_k^j$  denote the OR of the first j terms in the formula that defines  $w_k$ , and let  $d_k^j$  denote the j-fold product  $u_{k-1}u_{k-2}\ldots u_{k-j}$ . Then  $w_k=c_k^k$ , and we can use a recursive scheme of the form

$$c_k^j = c_k^i \vee d_k^i c_{k-i}^{j-i}, \qquad d_k^j = d_k^i d_{k-i}^{j-i}, \qquad j \ge 2,$$

to do the evaluation.

It turns out that this recursion behaves very nicely if we choose i = down[i], where down[i] is defined for j > 1 by the formula

$$down[j] = j - F_{(flog j)-1}.$$

For example, flog 18 = 7 because  $F_7 = 13 < 18 \le 21 = F_8$ , hence down $[18] = 18 - F_6 = 10$ .

Let us write  $j \to \text{down}[j]$ , and consider the oriented tree on the set of all positive integers that is defined by this relation. One of the paths in this tree, for example, is  $18 \to 10 \to 5 \to 3 \to 2 \to 1$ . Our recurrence for  $w_{18} = c_{18}^{18}$  involves  $c_{18}^{10}$ , which involves  $c_{18}^{5}$ , which involves  $c_{18}^{3}$ , and so on. In general, we will compute  $c_{k}^{3}$ for all j with  $k \to^* j$ , and we will compute  $d_k^j$  for all j with  $k \to^+ j$ . It is not difficult to prove that

$$k \to^* j \to i$$
 implies  $k-i \to^* j-i$ ;

therefore the auxiliary factors  $c_{k-i}^{j-i}$  and  $d_{k-i}^{j-i}$  needed in the recurrence scheme will already have been evaluated. (Indeed, one can prove more: Let  $l = \log k$ . If the complete path from k to 1 in the tree is  $k = k_0 \rightarrow k_1 \rightarrow \cdots \rightarrow k_t = 1$ , then the differences  $k_0 - k_1, k_1 - k_2, \ldots, k_{t-2} - k_{t-1}$  will consist of precisely the Fibonacci numbers  $F_{l-1}, F_{l-2}, \ldots, F_2$ , except for the numbers that appear when  $F_{l+1} - k$  is written as a sum of non-consecutive Fibonacci numbers.)

It can also be shown that, when k > 1, we have

$$flog k = \min_{0 \le j \le n} \max (1 + flog j, 2 + flog(k - j)),$$

and that down [k] is the smallest j such that the minimum is achieved in this equation. Therefore the depth of the circuit for computing  $w_k$  from the u's and v's is exactly flog k.

In particular, we can be sure that at most  $3 \log N$  gates will be created when computing  $z_k$ , and that there will be at most  $3N \operatorname{flog} N$  gates in the parallel addition portion of the circuit.

 $\langle$  Compute the final result Z by parallel addition 75 $\rangle \equiv$ 

(Set up auxiliary tables to handle Fibonacci-based recurrences 76);

Create the gates for W, remembering intermediate results that might be reused later 78;

 $\langle$  Compute the last gates  $Z = U \oplus W$ , and record their locations as outputs of the network 83 $\rangle$ ;

 $g - n = next\_vert - g - vertices;$  /\* reduce to the actual number of gates used \*/

This code is used in section 70.

38 PARALLEL ADDITION GB\_GATES §76

**76.** After we have created a gate for  $w_k$ , we will store its address as the value of w[k] in an auxiliary table. After we've created a gate for  $c_k^i$  where i < k is a Fibonacci number  $F_{l+1}$  and  $l = \text{flog } i \ge 2$ , we will store its address as the value of c[k + (l-2)N]; the gate  $d_k^i$  will immediately follow this one. Tables of flog j and down[j] will facilitate all these manipulations.

```
\langle Set up auxiliary tables to handle Fibonacci-based recurrences 76\rangle \equiv
  w = vert\_tables;
  c = w + m_{plus}n;
  flog = long\_tables;
  down = flog + m_{-}plus_{-}n + 1;
  anc = down + m_plus_n;
  flog[1] = 0; flog[2] = 2;
  down[1] = 0; \ down[2] = 1;
  for (i = 3, j = 2, k = 3, l = 3; l \le m\_plus\_n; l ++) {
     if (l > k) {
        k = k + j;
        \begin{array}{ll} j = k - j; \\ i + +; & / * \ F_i = j < l \leq k = F_{i+1} \ * / \end{array} 
     flog[l] = i;
     down[l] = l - k + j;
This code is used in section 75.
77. \langle \text{Local variables for } prod 68 \rangle + \equiv
  Vertex *uu, *vv;
                          /* pointer to u_0 and v_0 */
                      /* table of pointers to w_k */
  Vertex **w;
  Vertex **c;
                      /* table of pointers to potentially important intermediate values c_k^i *
  \mathbf{Vertex} \ *cc, *dd; \qquad /* \ \text{pointers to} \ c_k^i \ \text{and} \ d_k^i \ */
                   /* table of flog values */
  long *flog;
                   /* table of down values */
  long *down;
                    /* table of ancestors of the current k */
  long *anc;
```

 $\S78$  GB\_GATES

```
\langle Create the gates for W, remembering intermediate results that might be reused later 78\rangle \equiv
  vv = next\_vert - m\_plus\_n; uu = vv - m\_plus\_n;
  start_prefix("W");
  for (k = 2; k < m_{plus_n}; k++)
     \langle Set the anc table to a list of the ancestors of k in decreasing order, stopping with anc[l] = 2 79\rangle;
     i = 1; cc = vv + k - 1; dd = uu + k - 1;
     while (1) {
                         /* \text{ now } i = \text{down}[j] */
        j = anc[l];
        \langle \text{ Compute the gate } b_k^j = d_k^i \wedge c_{k-i}^{j-i} \text{ 80} \rangle;
        \langle \text{ Compute the gate } c_k^j = c_k^i \vee b_k^j \text{ 81} \, \rangle;
        if (flog[j] < flog[j+1]) /* j is a Fibonacci number */
           c[k + (flog[j] - 2) * m_plus_n] = v;
        if (l \equiv 0) break;
        cc = v;
        \langle \text{ Compute the gate } d_k^j = d_k^i \wedge d_{k-i}^{j-i} \text{ 82} \rangle;
        dd = v;
        i = j;
        l--;
     w[k] = v;
  }
This code is used in section 75.
79. If k \to j, we call j an "ancestor" of k because we are thinking of the tree defined by '\to'; this tree is
\langle Set the anc table to a list of the ancestors of k in decreasing order, stopping with anc[l] = 2.79 \rangle \equiv
  for (l = 0, j = k; ; l++, j = down[j]) {
     anc[l] = j;
     if (j \equiv 2) break;
  }
This code is used in section 78.
       #define spec\_gate(v, a, k, j, t) v = next\_vert +++;
           sprintf(name\_buf, "%c%ld:%ld", a, k, j);
           v \rightarrow name = gb\_save\_string(name\_buf);
           v \rightarrow typ = t;
\langle Compute the gate b_k^{\,j} = d_k^{\,i} \wedge c_{k-i}^{\,j-i}80 \rangle \equiv
  spec\_gate(v, 'B', k, j, AND);
  gb\_new\_arc(v, dd, DELAY);
                                     /* first argument is d_k^i */
  f = flog[j-i]; /* get ready to compute the second argument, c_{k-i}^{j-i} */
  gb\_new\_arc(v, f > 0 ? c[k - i + (f - 2) * m\_plus\_n] : vv + k - i - 1, DELAY);
This code is used in section 78.
```

40 PARALLEL ADDITION GB\_GATES §81

```
81. (Compute the gate c_k^j = c_k^i \vee b_k^j 81)
   if (l) {
      spec\_gate(v, 'C', k, j, OR);
                                       /* if l is zero, this gate is c_k^k = w_k */
   } else v = new\_vert(\mathtt{OR});
   gb\_new\_arc(v, cc, DELAY);
                                      /* first argument is c_k^i */
   gb\_new\_arc(v, next\_vert - 2, DELAY); /* second argument is b_k^j */
This code is used in section 78.
82. Here we reuse the value f = flog(j-i) computed a minute ago.
\langle \, \text{Compute the gate} \,\, d_k^{\,j} = d_k^{\,i} \wedge d_{k-i}^{\,j-i} \,\, 82 \, \rangle \equiv spec\_gate(v,\, {}^{\backprime}\text{D}\, {}^{\backprime}, k, j, \, \text{AND});
   gb\_new\_arc(v, dd, DELAY);
                                      /* first argument is d_k^i */
   gb\_new\_arc(v, f > 0 ? c[k-i+(f-2)*m\_plus\_n] + 1 : uu + k - i - 1, \texttt{DELAY}); /* d_{k-i}^{j-i}*/
This code is used in section 78.
83. The output list will contain the gates in "big-endian order" z_{m+n-1}, \ldots, z_1, z_0, because we insert
them into the outs list in little-endian order.
\langle Compute the last gates Z=U\oplus W, and record their locations as outputs of the network 83\rangle
   start_prefix("Z");
   for (k = 0; k < m\_plus\_n; k++) { register Arc *a = gb\_virgin\_arc();
      a \rightarrow tip = make2 (XOR, uu + k, w[k]);
     a \rightarrow next = g \rightarrow outs;
     g \rightarrow outs = a;
This code is used in section 75.
```

§84 GB\_GATES PARTIAL EVALUATION 41

**84.** Partial evaluation. The subroutine call  $partial\_gates(g, r, prob, seed, buf)$  creates a new gate graph from a given gate graph g by "partial evaluation," i.e., by setting some of the inputs to constant values and simplifying the result. The new graph is usually smaller than g; it might, in fact, be a great deal smaller. Graph g is destroyed in the process.

The first r inputs of g are retained unconditionally. Each remaining input is retained with probability prob/65536, and if not retained it is assigned a random constant value. For example, about half of the inputs will become constant if prob = 32768. The seed parameter defines a machine-independent source of random numbers, and it may be given any value between 0 and  $2^{31} - 1$ .

If the buf parameter is non-null, it should be the address of a string. In such a case,  $partial\_gates$  will put a record of its partial evaluation into that string; buf will contain one character for each input gate after the first r, namely '\*' if the input was retained, '0' if it was set to 0, or '1' if it was set to 1.

The new graph will contain only gates that contribute to the computation of at least one output value. Therefore some input gates might disappear even though they were supposedly "retained," i.e., even though their value has not been set constant. The *name* field of a vertex can be used to determine exactly which input gates have survived.

If graph g was created by risc, users will probably want to make  $r \geq 1$ , since the whole RISC circuit collapses to zero whenever its first input 'RUN' is set to 0.

An interesting class of graphs is produced by the function call  $partial\_gates(prod(m, n), m, 0, seed, \Lambda)$ , which creates a graph corresponding to a circuit that multiplies a given m-bit number by a fixed (but randomly selected) n-bit constant. If the constant is not zero, all m of the "retained" input gates necessarily survive. The demo program called MULTIPLY illustrates such circuits.

The graph g might be a generalized network; that is, it might involve the 'C' or '=' gates described earlier. Notice that if r is sufficiently large,  $partial\_gates$  becomes equivalent to the reduce routine. Therefore we need not make that private routine public.

As usual, the result will be  $\Lambda$ , and  $panic\_code$  will be set, if  $partial\_gates$  is unable to complete its task.

```
\langle \text{ The } partial\_gates \text{ routine } 84 \rangle \equiv
  Graph *partial\_gates(g, r, prob, seed, buf)
       Graph *q;
                        /* generalized gate graph */
                               /* the number of initial gates to leave untouched */
       unsigned long r;
                                 /* scaled probability of not touching subsequent input gates */
       unsigned long prob;
                       /* seed value for random number generation */
       char *buf;
                        /* optional parameter for information about partial assignment */
  { register Vertex *v;
                                /* the current gate of interest */
    if (g \equiv \Lambda) panic (missing_operand);
                                                /* where is q? */
    gb\_init\_rand(seed);
                            /* get them random numbers rolling */
    for (v = g \neg vertices + r; \ v < g \neg vertices + g \neg n; \ v ++)
       switch (v \rightarrow typ) {
       case 'C': case '=': continue;
                                            /* input gates might still follow */
       case 'I':
         if ((gb\_next\_rand() \gg 15) \geq prob) {
            v \rightarrow typ = 'C'; v \rightarrow bit = gb\_next\_rand() \gg 30;
            if (buf) *buf ++ = v \rightarrow bit + '0';
          } else if (buf) *buf ++ = '*';
         break:
       default: goto done;
                                  /* no more input gates can follow */
  done:
                             /* terminate the string */
    if (buf) *buf = 0;
    g = reduce(g);
    \langle Give the reduced graph a suitable id 85\rangle;
                   /* if (g \equiv \Lambda), a panic_code has been set by reduce */
  }
```

42 PARTIAL EVALUATION GB\_GATES §84

This code is used in section 7.

```
85. The buf parameter is not recorded in the graph's id field, since it has no effect on the graph itself. \langle Give the reduced graph a suitable id 85\rangle \equiv if (g) { strcpy(name\_buf, g \rightarrow id); if (strlen(name\_buf) > 54) strcpy(name\_buf + 51, "..."); sprintf(g \rightarrow id, "partial\_gates(%s,%lu,%lu,%ld)", name\_buf, r, prob, seed); } This code is used in section 84.
```

86. Index. Here is a list that shows where the identifiers of this program are defined and used.

a: 3, 36, 43, 49, 51, 83.  $first\_of$ : 19, 70. flog: 76, <u>77</u>, 78, 80.  $a\_pos: 73, 74.$ aa: <u>51</u>, 55, 56, 57, 62. foo: <u>53,</u> 59, 60, 62.  $g: \ 3, \ 43, \ 49, \ 51, \ 68, \ 84.$ add: 38.alist:  $\underline{62}$ .  $gate\_eval$ :  $\underline{1}$ ,  $\underline{3}$ ,  $\underline{10}$ ,  $\underline{43}$ . alloc\_fault: 8, 66. $gb\_init\_rand$ : 84. alpha: 71, 73, 74. gb\_new\_arc: 13, 14, 23, 24, 63, 65, 80, 81, 82. alt: 2, 3, 32, 46, 47, 49, 52, 53, 54, 55, 56, 57,  $gb\_new\_graph$ : 16, 62, 67. 60, 61, 63, 64, 65, 78.  $qb\_next\_rand$ : 84. anc: 76, <u>77</u>, 78, 79. gb\_recycle: 8, 51, 59, 62, 66, 67. AND: 2, 6, 15, 21, 22, 23, 24, 26, 27, 30, 31, 32, 34, *qb\_save\_string*: 11, 14, 59, 63, 65, 80.  $gb\_trouble\_code$ : 8, 66, 67. 35, 36, 38, 39, 41, 42, 53, 66, 72, 73, 74, 80, 82. **Arc**: 2, 3, 36, 43, 49, 51, 83.  $qb\_typed\_alloc$ : 59, 67.  $gb\_virgin\_arc$ : 36, 62, 83. arcs: 6, 49, 53, 54, 55, 56, 57, 58, 59, 61, 63.  $aux_{-}data$ : 59, 67. **Graph**: 1, 3, 8, 9, 43, 49, 51, 66, 68, 84. avail\_arc: 51, 57, 59. hop:  $36, \ \underline{37}$ .  $b: \ \underline{51}.$ i:  $\underline{71}$ . bar: 14, 53, 54, 55, 56, 57, 58, 59, 60. id: 16, 62, 67, 85. bb: 51, 57. imm: 20, 21, 22. beta: 71, 73, 74. impossible: 17.bit: 49, 50, 52, 53, 54, 55, 56, 57, 60, 72, 73, 78, 84.  $in\_vec:$  3, 4.  $inc\_dest$ : 22, 25, 39, 40. buf: 84, 85.  $bypass\_and: 55.$ ind: 20, 21, 32, 34, 35, 36.  $is\_boolean: 2, 49, 60, 62.$  $bypass\_or: \underline{56}.$  $bypass\_xor: \underline{57}.$  $j: \frac{71}{}$ .  $c: \underline{43}, \underline{77}.$ Josephus, Flavius, problem: 70. carry: 18, 19, 27, 35, 38, 41, 46, 47. jump: 31, 33, 35, 36. cc: 77, 78, 81. $k: \ \underline{9}, \ \underline{38}, \ \underline{43}, \ \underline{71}.$ change: 27, 28, 31, 32, 34, 36.  $l: \ \underline{43}, \ \underline{71}.$  $latch: \underline{32}$ comp: 14, 15, 21, 26, 27, 30, 31, 32, 34, 35, latch\_ptr: 51, 52, 53, 62, 63, 64. 36, 38, 39, 41, 42. latchit: 32, 34, 35, 36.cond: 20, 21, 27, 31, 32, 33, 35, 36. len: 63.count: 11, 12, 19. lnk: 60, 61, 62, 63, 64. dd: 77, 78, 80, 82.log: 25, 26, 31.DELAY: 13, 23, 24, 65, 80, 81, 82. long\_tables: 67, <u>68</u>, 76. dest: 20, 21, 23, 32. $m: \ \underline{43}, \ \underline{66}.$  $dest\_match$ : 23,  $\underline{25}$ , 34. *m\_plus\_n*: 67, <u>68</u>, 69, 72, 73, 74, 76, 78, 80, 82, 83.  $diff: \underline{25}, 31, 41.$  $make\_adder$ : 38, 39, 41.  $dir: \underline{20}, 21, 22.$  $make\_v\_arcless: \underline{53}.$ done: 53, 54, 84.  $make\_v\_constant$ : 53, 54, 57.  $double\_bypass:$  57.  $make_{v}=eq:$  53, 54. down: 39, 40, 76, 77, 79.  $make_{-}v_{-}\theta$ : 53, 55, 56.  $do2\colon \ \underline{20},\ 21,\ 31,\ 32,\ 34,\ 36,\ 41.$  $make_{-}v_{-}1:$  53, 55, 56.  $do3\colon \ \ \underline{20},\ 21,\ 27,\ 31,\ 38,\ 39,\ 42.$ make\_xor: 15, 30, 38, 42. make2: 13, 15, 20, 21, 22, 23, 27, 30, 31, 32, 34,  $do4: \underline{20}, 22, 23, 26, 27, 35, 36, 38, 42.$ do5: 20, 24, 31, 35.35, 36, 38, 39, 41, 42, 72, 73, 74, 83.  $even\_comp: 14, 23, 24, 38.$ make3: 13, 20, 26, 27, 31, 36, 38, 39, 41, 42. extra: 18, 19, 21, 22, 32, 44, 46, 47. make4: 13, 20, 32, 35, 36, 42. *f*: 68. make5: 13, 20, 31, 35, 36, 42. Fibonacci, Leonardo, numbers: 67.  $max_next_vert$ : 51, 59.

mem: 18, 19, 21, 22, 24, 32, 39.  $missing\_operand$ : 51, 84. mod: <u>20,</u> 21, 26, 27, 31, 35, 41, 42. n: <u>38</u>, <u>43</u>, <u>51</u>, <u>66</u>. name: 11, 14, 49, 59, 63, 65, 80, 84. name\_buf: 11, 12, 14, 59, 65, 80, 85. new\_graph: 8, 9, 16, 17, 36, 51, 62. new\_vert: 11, 13, 19, 23, 24, 72, 73, 78, 81. next: 5, 6, 36, 43, 49, 53, 55, 56, 57, 58, 59, 60, 61, 62, 63, 83.  $next\_loc$ : 25, 30, 31, 36.  $next\_next\_loc$ : 25, 30, 32, 36.  $next\_vert$ : 11, <u>12</u>, 14, 16, 17, <u>51</u>, 59, 62, 65, 70, 73, 75, 78, 80, 81. nextra: 32, 33. $no\_constants\_yet$ : 52. no\_room: 16, 59, 62, 67. nonzero: 18, 19, 27, 35, 46, 47. normal:  $36, \underline{37}$ . NOT: 2, 6, 14, 53, 59.  $numeric\_prefix: 12, 19, 72, 73.$ nzd: 32, 33, 36.nzs: 32, <u>33</u>, 36. o: 43. old\_dest: 23, 25, 26, 39, 41. old\_src: 22, 23, 24, <u>25</u>, 36. op: 20, 21, 27, 31, 35. OR: 2, 6, 15, 21, 22, 23, 24, 26, 27, 31, 32, 34, 35, 36, 38, 39, 41, 42, 53, 65, 66, 73, 81.  $out\_vec: \underline{3}, 5.$ outs: 2, 5, 36, 43, 49, 60, 62, 83. overflow: 18, 19, 27, 35, 46, 47.  $p\_gates$ : 1, 49. panic: 8, 16, 17, 51, 59, 62, 66, 67, 84. panic\_code: 8, 66, 84.  $partial\_gates: 1, 84.$  $pr\_gate$ : 49. Pratt, Vaughan Ronald: 70. prefix: 11, 12, 19.  $print\_gates: 1, 49.$ printf: 44, 45, 46, 49.  $prob: \underline{84}, 85.$ prod: 1, 2, 38, 66, 70, 84. prog: 18, 19, 21, 32, 44, 46, 47.  $r: \ \underline{9}, \ \underline{43}, \ \underline{84}.$ reduce: 51, 59, 65, 66, 74, 84.

reg: 18, 19, 23, 24, 30, 34, 36.

result: 20, 25, 31, 34, 35, 36.

risc: 1, 2, 8, 9, 38, 43, 84.

regs: 8, 16, 19, 23, 24, 34.

 $reverse\_arc\_list$ : 62, 63.

rel: 20, 21, 22.

 $risc\_state$ :  $\underline{1}$ , 43, 47,  $\underline{48}$ .  $rom: \underline{43}, 46.$  $run\_bit$ : 18, 19, 32, 36.  $run\_risc: \underline{1}, \underline{43}.$ s: 43. seed: <u>84</u>, 85. sentinel: <u>51</u>, 60, 61, 62. shift: 25, 31, 42. sign: 18, 19, 27, 35, 46, 47. $size: \underline{43}, 46.$ skip: 36, 37. source: 22, <u>25,</u> 26, 31, 36, 41, 42.  $spec\_gate: 80, 81, 82.$ special:  $36, \underline{37}$ . sprintf: 11, 12, 14, 16, 59, 65, 67, 80, 85. start\_prefix: 12, 19, 21, 22, 26, 27, 29, 41, 70, 74, 78, 83. Stockmeyer, Larry Joseph: 70. strcpy: 12, 16, 19, 62, 67, 85. strlen: 85. $sum\colon \ \underline{25},\ 31,\ 41.$  $t: \ \ \underline{3}, \ \underline{11}, \ \underline{13}.$  $test\_single\_arg: 53.$  $the\_boolean$ : 2, 49. tip: 2, 5, 6, 36, 43, 49, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 83.  $tip\_value: 2, 5.$ tmp: 20, 23, 24, 27. $trace\_regs: \underline{43}, 44, 46.$ typ: 2, 3, 11, 14, 46, 47, 49, 52, 53, 54, 55, 56,57, 59, 60, 61, 63, 65, 80, 84. t1: 15, 20, 38.t2: 15, 20, 38.t3: 20, 34, 38.t4: 20, 34, 36, 38*t5*: 20, 30, 34, 35, 36. u: 14, 15, 51. up: 39, 40. $util\_types$ : 16, 62, 67. uu: 77, 78, 82, 83. $v\colon \ \ \underline{3},\ \underline{11},\ \underline{13},\ \underline{14},\ \underline{15},\ \underline{43},\ \underline{49},\ \underline{51},\ \underline{71},\ \underline{84}.$ val: 2, 3, 4, 6, 43, 46, 47.vert\_tables: 67, 68, 76. Vertex: 3, 11, 12, 13, 14, 15, 18, 20, 25, 28, 33, 37, 38, 40, 43, 49, 51, 59, 60, 67, 68, 71, 77, 84. vertices: 3, 4, 16, 17, 43, 46, 47, 49, 51, 60, 62, 70, 73, 74, 75, 84.  $vv: \ 77, 78, 80.$ v1: 13, 20.v2: 13, 20.

v3: 13, 20.

v4: 13, 20.

 $\S 86$  GB\_GATES INDEX 45

y: <u>38</u>, <u>71</u>.

z: 38.

zz: 2.

```
\langle Add \text{ the RISC data to } new\_graph 17 \rangle Used in section 8.
\langle Allocate space for a temporary graph g and for auxiliary tables 67\rangle Used in section 66.
(Check to see if any latch has become constant; if not, break 52) Used in section 51.
 Complement one argument of v 58 \ Used in section 57.
 Compute f = flog(m+n) 69 \ Used in section 67.
 Compute the final result Z by parallel addition 75 \) Used in section 70.
\langle Compute the gate b_k^j = d_k^i \wedge c_{k-i}^{j-i} 80 \rangle Used in section 78.
 \langle \text{ Compute the gate } c_k^j = c_k^i \vee b_k^j \text{ 81 } \rangle \quad \text{Used in section 78.}   \langle \text{ Compute the gate } d_k^j = d_k^i \wedge d_{k-i}^{j-i} \text{ 82 } \rangle \quad \text{Used in section 78.} 
(Compute the last gates Z = U \oplus W, and record their locations as outputs of the network 83) Used in
     section 75.
\langle Compute the value t of a classical logic gate 6 \rangle Used in section 3.
 Copy all marked gates to a new graph 62 \ Used in section 51.
 Create a new vertex for complement of u 59 \ Used in section 58.
 Create gates for fetching the source value 22 \ Used in section 17.
 Create gates for instruction decoding 21 \rangle Used in section 17.
 Create gates for the arithmetic operations 41 \rangle Used in section 17.
 Create gates for the conditional load operations 27 \ Used in section 17.
 Create gates for the general logic operation 26 \ Used in section 17.
 Create gates for the new values of S, N, K, and V 35 \ Used in section 29.
 Create gates for the new values of register 0 and the memory address register 36 \ Used in section 29.
 Create gates for the new values of registers 1 to regs 34 \ Used in section 29.
 Create gates for the new values of the program register and extra 32 Used in section 29.
 Create gates for the shift operations 42 \rangle Used in section 41.
 Create gates for the next_loc and next_next_loc bits 30 \ Used in section 29.
 Create gates for the result bits 31 \) Used in section 29.
 Create gates that bring everything together properly 29 \> Used in section 17.
 Create the gates for W, remembering intermediate results that might be reused later 78 \ Used in section 75.
 Create the inputs and latches 19 \rangle Used in section 17.
 Define A_j for 0 \le j < m 72 \ Used in section 70.
 Define P_j, Q_j, A_{m+2j}, R_j, and A_{m+2j+1} for 0 \le j \le m-3 73 \rangle Used in section 70.
 Define U and V 74 \rangle Used in section 70.
 Dump the register contents into risc_state 47 \ Used in section 43.
 Fill q with generalized gates that do parallel multiplication 70 \rangle Used in section 66.
 Fix up the alt fields of the newly copied latches 64 \ Used in section 62.
 Give the reduced graph a suitable id 85 \ Used in section 84.
 Global variables 48 \rangle Used in section 7.
 Initialize new\_graph to an empty graph of the appropriate size 16 \( \rightarrow \) Used in section 8.
 Internal subroutines 11, 13, 14, 15, 38, 51 \ Used in section 7.
 Local variables for prod 68, 71, 77 \ Used in section 66.
 Local variables for risc 9, 18, 20, 25, 28, 33, 37, 40 Used in section 8.
 Make u a copy of v; put it on the latch list if it's a latch 63 \) Used in section 62.
 Mark all gates that are used in some output 60 \ Used in section 51.
 Mark all gates that are used to compute v 61 \quad Used in section 60.
 Print a footline 45 \> Used in section 43.
 Print a headline 44 \> Used in section 43.
 Print register contents 46 \ Used in section 43.
 Private variables 12 \rangle Used in section 7.
 Read a sequence of input values from in\_vec \ 4 Used in section 3.
 Reduce gate v, if possible, or put it on the latch list 53 \rangle Used in section 51.
 Replace u \rightarrow alt by a new gate that copies an input 65 \ Used in section 64.
```

(Set the anc table to a list of the ancestors of k in decreasing order, stopping with anc[l] = 2.79) Used in section 78. (Set up auxiliary tables to handle Fibonacci-based recurrences 76) Used in section 75.  $\langle$  Set  $inc\_dest$  to  $old\_dest$  plus SRC 39 $\rangle$  Used in section 22. (Set old\_dest to the present value of the destination register 23) Used in section 22. Set  $old\_src$  to the present value of the source register 24 \rangle Used in section 22. Store the sequence of output values in  $out\_vec 5$  Used in section 3. The  $gate\_eval$  routine 3 \rangle Used in section 7. The  $partial\_gates$  routine 84  $\rangle$  Used in section 7. The  $print\_gates$  routine 49  $\rangle$  Used in section 7. The prod routine 66  $\rangle$  Used in section 7. The risc routine 8 \rightarrow Used in section 7. The  $run\_risc$  routine 43 \rangle Used in section 7. Try to reduce an inverter, then **goto** done 54 \) Used in section 53. Try to reduce an AND gate 55 \ Used in section 53. Try to reduce an EXCLUSIVE-OR gate 57 \ Used in section 53. Try to reduce an OR gate 56 \ Used in section 53.  $\langle gb\_gates.h 1, 2, 50 \rangle$ 

## $GB_-GATES$

|                         | Secti | ion I | 2age |
|-------------------------|-------|-------|------|
| Introduction            |       | . 1   | 1    |
| The RISC netlist        |       | . 8   | 5    |
| Serial addition         |       | 38    | 18   |
| RISC management         |       | 43    | 21   |
| Generalized gate graphs |       | 49    | 23   |
| Parallel multiplication |       |       |      |
| Parallel addition       |       | 75    | 37   |
| Partial evaluation      |       | 84    | 41   |
| Index                   |       | 86    | 43   |

## © 1993 Stanford University

This file may be freely copied and distributed, provided that no changes whatsoever are made. All users are asked to help keep the Stanford GraphBase files consistent and "uncorrupted," identical everywhere in the world. Changes are permissible only if the modified file is given a new name, different from the names of existing files in the Stanford GraphBase, and only if the modified file is clearly identified as not being part of that GraphBase. (The CWEB system has a "change file" facility by which users can easily make minor alterations without modifying the master source files in any way. Everybody is supposed to use change files instead of changing the files.) The author has tried his best to produce correct and useful programs, in order to help promote computer science research, but no warranty of any kind should be assumed.

Preliminary work on the Stanford Graph Base project was supported in part by National Science Foundation grant CCR-86-10181.