## Complementary Synthesis for Pipelined Encoder

**Abstract**— Complementary synthesis automatically generates an encoder's decoder that recovers the encoder's inputs from its output. However, the generated decoders is non-pipelined and therefore very slow. On the other hand, many encoders from industrial projects have pipelined structure that can be exploited to resolve this problems.

Thus, we propose a novel algorithm to first find out the encoder's pipeline registers in each pipeline stage, and then characterize all Boolean functions that recover each of these pipeline registers from the registers in the next pipeline stage, and finally characterize the Boolean functions that recover the encoder's input variables from the first pipeline stage.

Experimental results on several complex encoders indicate that this algorithm can always correctly generate a pipelined decoders with significantly improved speed.

#### I. Introduction

One of the most difficult jobs in designing communication and multimedia chips is to design and verify complex encoder and decoder pairs. The encoder maps its input variables  $\vec{i}$  to its output variables  $\vec{o}$ , while the decoder recovers  $\vec{i}$  from  $\vec{o}$ . Complementary synthesis [11, 10, 9, 5, 6, 12] eases this job by automatically generating a decoder from an encoder, with the assumption that  $\vec{i}$  can always be uniquely determined by a bounded sequence of  $\vec{o}$ . Thus, the decoder's Boolean function can be characterized with the algorithm proposed by Jiang et al. [4] based on Craig interpolant [1].

By studying the structure of many encoders from industrial projects, we find that most of them have a pipeline structure that can be exploited to significantly improve the quality of the generated decoders.

For example, one simple encoder is shown in Figure Ia). It has a pipeline stage  $s\vec{t}g^0$  with some registers. The inputs variables  $\vec{i}$  are used to compute  $s\vec{t}g^0$ , while  $s\vec{t}g^0$  are used to compute the output variables  $\vec{o}$ . According to this structure,  $s\vec{t}g^0$  can be uniquely determined by  $\vec{o}$ , while  $\vec{i}$  can be uniquely determined by  $s\vec{t}g^0$ .

So, a properly designed decoder, often written by human engineers, should be like the one shown in Figure Ib), which recovers  $s\vec{t}g^0$  from  $\vec{o}$  with combinational logic  $C^1$ , and further recovers  $\vec{i}$  from  $s\vec{t}g^0$  with combinational logic  $C^0$ . In such a decoder, the critical path is cut by registers  $s\vec{t}g^0$ , which improves the circuit speed.

However, all existing complementary synthesis algorithms [10, 9, 5, 6, 12] generate the decoders with Jiang's algorithm [4] based on Craig interpolant [1]. As shown in



Fig. 1. The pipelined encoder and its decoders

Figure Ic), these decoders recover  $\vec{i}$  directly from  $\vec{o}$  with a large combinational logic  $C^0 * C^1$ . Such decoders are unnecessarily slow because there are no registers to cut their critical path.

To generate a proper decoder like Figure Ib) in complementary synthesis, we propose a novel algorithm to first find out the encoder's pipeline registers in each pipeline stage  $s\vec{t}g^j$ , and then characterize all Boolean functions that recover each pipeline stage  $s\vec{t}g^j$  from the next pipeline stage  $s\vec{t}g^{j+1}$  or output variables  $\vec{o}$ , and finally characterize the Boolean functions that recover the encoder's input  $\vec{i}$  from the first pipeline stage  $s\vec{t}g^0$ .

Experimental results on several complex encoders, such as PCI Express [8] and Ethernet [3], indicate that this algorithm can always correctly generate a pipelined decoder with significantly improved speed.

The remainder of this paper is organized as follows. Section II introduces the background material; Section III infers the pipeline structure, while Section IV characterizes the Boolean functions that recover the input variables and pipeline registers; Sections V and VI present the experimental results and related works; Finally, Section VII sums up the conclusion.

## II. Preliminaries

#### A. Propositional satisfiability

The Boolean set is  $\mathbb{B} = \{0, 1\}$ . A variables vector is  $\vec{v} = (v, ...)$ . The number of variables in  $\vec{v}$  is  $|\vec{v}|$ . If a variable v is a member of  $\vec{v}$ , then we say  $v \in \vec{v}$ ; otherwise



Fig. 2. The under and over-approximative approaches

 $v \notin \vec{v}$ .  $v \cup \vec{v}$  is the vector containing both v and all members of  $\vec{v}$ .  $\vec{v}/\vec{w}$  is the vector containing all members of  $\vec{v}$  but no member of  $\vec{w}$ .  $\vec{a} \cup \vec{b}$  is the vector with all members of  $\vec{a}$  and  $\vec{b}$ . The set of truth valuations of  $\vec{v}$  is  $[\![\vec{v}]\!]$ , for instance,  $[\![(v_1, v_2)]\!] = \{(0, 0), (0, 1), (1, 0), (1, 1)\}$ .

The propositional satisfiability problem (SAT) for a formula F over a variable set V is to find a satisfying assignment  $A: V \to \mathbb{B}$ , so that F can be evaluated to 1. If A exists, then F is satisfiable; otherwise, it is unsatisfiable.

For formulas  $\phi_A$  and  $\phi_B$ , with  $\phi_A \wedge \phi_B$  unsatisfiable, there exists a formula  $\phi_I$  referring only to the common variables of  $\phi_A$  and  $\phi_B$  such that  $\phi_A \Rightarrow \phi_I$  and  $\phi_I \wedge \phi_B$  is unsatisfiable.  $\phi_I$  is the **Craig interpolant** [1] of  $\phi_A$  with respect to  $\phi_B$ .  $\phi_I$  can be computed by McMillan's algorithm [7] and used to characterize a Boolean function from a Boolean relation [4].

#### B. Finite state machine

The encoder is modeled by a finite state machine(FSM)  $M = (\vec{s}, \vec{i}, \vec{o}, T)$ , consisting of a state variable vector  $\vec{s}$ , an input variable vector  $\vec{i}$ , an output variable vector  $\vec{o}$ , and a transition function  $T : [\![\vec{s}]\!] \times [\![\vec{i}]\!] \to [\![\vec{s}]\!] \times [\![\vec{o}]\!]$  that computes the next state and output variable vector from the current state and input variable vector.

The behavior of FSM M can be reasoned by unrolling transition function for multiple steps. The state variable  $s \in \vec{s}$ , input variable  $i \in \vec{i}$  and output variable  $o \in \vec{o}$  at the n-th step are respectively denoted as  $s_n$ ,  $i_n$  and  $o_n$ . Furthermore, the state, the input and the output variable vectors at the n-th step are respectively denoted as  $\vec{s}_n$ ,  $\vec{i}_n$ 

and  $\vec{o}_n$ . A **path** is a state sequence  $\langle \vec{s}_n, \dots, \vec{s}_m \rangle$  with  $\exists \vec{i}_j \vec{o}_j (\vec{s}_{j+1}, \vec{o}_j) \equiv T(\vec{s}_j, \vec{i}_j)$  for all  $n \leq j < m$ . A **loop** is a path  $\langle \vec{s}_n, \dots, \vec{s}_m \rangle$  with  $\vec{s}_n \equiv \vec{s}_m$ .

C. The halting algorithm to determine if an input variable can be uniquely determined by a bounded sequence of output variable vector

The first halting algorithm [10] iteratively unrolls the transition function. For each iteration, it uses an underapproximative and an over-approximative approaches presented respectively in C.1 and C.2 to determine the answer. We will show in C.3 that these two approaches will eventually converge to a conclusive answer.

## C.1 The under-approximative approach

As shown in Figure 2a), on the unrolled transition functions, an input variable  $i \in \vec{i}$  can be uniquely determined, if there exist three integers p, l and r, such that for any particular valuation of the output sequence  $\langle \vec{o}_p, \ldots, \vec{o}_{p+l+r} \rangle$ ,  $i_{p+l}$  cannot be 0 and 1 at the same time. This is equal to the unsatisfiability of  $F_{PC}(p, l, r)$  in Equation (1).

Here, p is the length of the prefix state transition sequence. l and r are the lengths of the two output sequences  $\langle \vec{o}_{p+1}, \ldots, \vec{o}_{p+l} \rangle$  and  $\langle \vec{o}_{p+l+1}, \ldots, \vec{o}_{p+l+r} \rangle$  used to determine  $i_{p+l}$ . Line 2 of Equation (1) corresponds to the path in Figure 2a), while Line 3 is a copy of it. These two paths are of the same length. Line 4 forces these two paths' output sequences to be the same, while Line 5 forces their  $i_{p+l}$  to be different.

According to Equation (1), for  $p' \geq p$ ,  $l' \geq l$  and  $r' \geq r$ , the clause set of  $F_{PC}(p', l', r')$  is a super set of  $F_{PC}(p, l, r)$ . So, the bounded proof of  $F_{PC}(p, l, r)$ 's unsatisfiability can be generalized to unbounded cases.

**Proposition 1** If  $F_{PC}(p,l,r)$  is unsatisfiable, then  $i_{p+l}$  can be uniquely determined by  $\langle \vec{o}_p, \dots, \vec{o}_{p+l+r} \rangle$  for all larger p, l and r.

#### C.2 The over-approximative approach

If  $F_{PC}(p, l, r)$  presented in last subsection is satisfiable, there are two possibilities:

1.  $i_{p+l}$  can be uniquely determined by  $\vec{o}_p, \dots, \vec{o}_{p+l+r} > \text{for some larger } p, l \text{ and } r;$ 

**Algorithm 1:** CheckUniqueness(i): The halting algorithm to determine whether  $i \in \vec{i}$  can be uniquely determined by a bounded sequence of output variable vector  $\vec{o}$ 

**Input**: The input variable  $i \in \vec{i}$ .

**Output**: whether  $i \in \vec{i}$  can be uniquely determined by  $\vec{o}$ , and the value of p, l and r.

```
1 p:=0; l:=0; r:=0;

2 while 1 do

3 | p++; l++; r++;

4 | if F_{PC}(p,l,r) is unsatisfiable then

5 | return (yes, p, l, r);

6 | else if F_{LN}(p,l,r) is satisfiable then

7 | return (no, p, l, r);
```

2.  $i_{p+l}$  can't be uniquely determined by  $<\vec{o}_p,\ldots,\vec{o}_{p+l+r}>$  for any  $p,\ l$  and r at all.

If it is the 1st case, then by iteratively increasing p, l and r,  $F_{PC}(p, l, r)$  will eventually become unsatisfiable. But if it is the 2nd case, this method will never terminate.

So, to obtain a halting algorithm, we need to distinguish these two cases. One such solution is shown in Figure 2b), which is similar to Figure 2a) but with three additional constraints used to detect loops on the three state sequences  $\langle \vec{s}_0, \ldots, \vec{s}_p \rangle, \langle \vec{s}_{p+1}, \ldots, \vec{s}_{p+l} \rangle$  and  $\langle \vec{s}_{p+l+1}, \ldots, \vec{s}_{p+l+r} \rangle$ . It is formally defined in Equation (2) with the last three lines corresponding to the three new constraints used to detect loops.

$$F_{LN}(p, l, r) := \begin{cases} F_{PC}(p, l, r) \\ \wedge \bigvee_{x=0}^{p-1} \bigvee_{y=x+1}^{p} \{\vec{s}_x \equiv \vec{s}_y \wedge \vec{s'}_x \equiv \vec{s'}_y\} \\ \wedge \bigvee_{x=p+1}^{p+l-1} \bigvee_{y=x+1}^{p+l+1} \{\vec{s}_x \equiv \vec{s}_y \wedge \vec{s'}_x \equiv \vec{s'}_y\} \\ \wedge \bigvee_{x=p+l+1}^{p+l+r-1} \bigvee_{y=x+1}^{p+l+r} \{\vec{s}_x \equiv \vec{s}_y \wedge \vec{s'}_x \equiv \vec{s'}_y\} \end{cases}$$
 (2)

When  $F_{LN}(p,l,r)$  is satisfiable, then  $i_{p+l}$  can't be uniquely determined by  $\langle \vec{o}_p, \dots, \vec{o}_{p+l+r} \rangle$ . More importantly, by unrolling these three loops, we can generalize the satisfiability of  $F_{LN}(p,l,r)$  to all larger p,l and r. This means:

**Proposition 2** If  $F_{LN}(p,l,r)$  is satisfiable, then  $i_{p+l}$  cannot be uniquely determined by  $\langle \vec{o}_p, \ldots, \vec{o}_{p+l+r} \rangle$  for all larger p, l and r.

Please refer to [10] for more detail of this.

## C.3 The full algorithm

With Propositions 1 and 2, we can generalize their bounded proof to unbounded cases. This leads to the halting Algorithm 1 that search for p, l and r that enable

an input variable  $i_{p+l}$  to be uniquely determined by the output sequence  $\langle \vec{o}_p, \dots, \vec{o}_{p+l+r} \rangle$ :

- 1. On the one hand, if there exists such p, l and r, then let p' := max(p, l, r), l' := max(p, l, r) and r' := max(p, l, r). From Propositions 1, we know that  $F_{PC}(p', l', r')$  is unsatisfiable. So eventually  $F_{PC}(p, l, r)$  will become unsatisfiable in Line 4;
- 2. On the other hand, if there doesn't exist such p, l and r, then eventually p, l and r will be larger than the encoder's longest path without loop, which means that there will be three loops in  $\langle \vec{s}_0, \ldots, \vec{s}_p \rangle, \langle \vec{s}_{p+1}, \ldots, \vec{s}_{p+l} \rangle$  and  $\langle \vec{s}_{p+l+1}, \ldots, \vec{s}_{p+l+r} \rangle$ . This will make  $F_{LN}(p, l, r)$  satisfiable in Line 6.

Both cases will lead to this Algorithm's termination. Please refer to [10] for more detail of this algorithm's correctness and termination proof.

### III. INFERRING THE ENCODER'S PIPELINE STRUCTURE

## A. A general model for the encoder

As shown in Figure 3, we assume that the encoder have n pipeline stages. If we take the combinational logic block  $C^j$  as a function, then this encoder can be represented by the following equations.

$$\begin{array}{lll}
s\vec{t}g^{0} & := & C^{0}(\vec{i}) \\
s\vec{t}g^{j} & := & C^{j}(s\vec{t}g^{j-1}) & 1 \le j \le n-1 \\
\vec{o} & := & C^{n}(s\vec{t}g^{n-1})
\end{array} \tag{3}$$

Thus, each  $C^j$  can be seen as a small encoder that computes  $s\vec{t}g^j$  or  $\vec{o}$  from  $s\vec{t}g^{j-1}$  or  $\vec{i}$ .

In the remainder of this paper, superscript always means the pipeline stage, while the subscript, as mentioned in Subsection B, always means the step index in the unrolled transition function. For example,  $s\vec{t}g^j$  is the j-th pipeline stage. While  $s\vec{t}g^j_i$  is the value of this j-th pipeline stage at the i-th step in the unrolled state transition sequence.

## B. Inferring p, l and r

Before inferring the pipeline stages, we first apply Algorithm 1 to infer the value of p, l and r that can make the output sequence  $\langle o_p, \ldots, o_{p+l+r} \rangle$  uniquely determine all  $i_{p+l} \in \vec{i}_{p+l}$ .



Fig. 3. A general structure of the encoder

## **Algorithm 2:** RemoveRedundancy(p, l, r)

for 
$$r' := r \to 0$$
 do
$$\begin{array}{c|c}
\mathbf{if} \ r' \equiv 0 \ or \ F_{PC}(p, l, r' - 1) \ is \ satisfiable \ for \\
some \ i \in \vec{i} \ \mathbf{then} \\
\mathbf{3} \quad \boxed{\quad \text{break}} \\
\mathbf{4} \ \text{return} \ r'
\end{array}$$

As there are more than one  $i \in \vec{i}$ , we need to apply Algorithm 1 for each  $i \in \vec{i}$  to get the p, l and r for each of them.

And then we set the final p,l and r to be the maximal p,l and r of all  $i \in \vec{i}$  respectively. According to Equation 1, these values of p,l and r can indeed make the output sequence  $< o_p, \ldots, o_{p+l+r} >$  uniquely determine all  $i_{p+l} \in \vec{i}_{p+l}$ .

## C. Minimizing r and l

As Algorithm 1 increases p, l and r simultaneously, there may be some redundancy in the value of l and r. So we need to first minimize r in Algorithm 2

In Line 2, when  $F_{PC}(p,l,r'-1)$  is satisfiable, then r' is the last one that makes  $F_{PC}(p,l,r')$  unsatisfiable, we return it directly. On the other hand, when  $r' \equiv 0$ ,  $F_{PC}(p,l,0)$  must have been tested in last iteration, and the result must be unsatisfiable. In this case we return 0.

Now, we have a minimized r from Algorithm 2, which can make  $\vec{i}_{p+l}$  to be uniquely determined by  $\langle \vec{o}_p, \dots, \vec{o}_{p+l+r} \rangle$ .

We further require that:

- 1. As shown in Figure 4, l can be reduced to 0, which means  $\vec{i}_p$  can be uniquely determined by  $\langle \vec{o}_p, \dots, \vec{o}_{p+r} \rangle$ , that is, the set of future outputs.
- 2. The above mentioned output sequence  $\langle \vec{o}_p, \dots, \vec{o}_{p+r} \rangle$  can be further reduced to  $\vec{o}_{p+r}$ . This means  $\vec{o}_{p+r}$  is the only output vector needed to recover the input vector  $\vec{i}_p$ .

Checking these two requirements equals to checking the unsatisfiability of the following equation.

$$F'_{PC}(p,r) := \begin{cases} & \bigwedge_{m=0}^{p+r} \{ (\vec{s}_{m+1}, \vec{o}_m) \equiv T(\vec{s}_m, \vec{i}_m) \} \\ & \wedge & \bigwedge_{m=0}^{p+r} \{ (\vec{s'}_{m+1}, \vec{o'}_m) \equiv T(\vec{s'}_m, \vec{i'}_m) \} \\ & \wedge & \vec{o}_{p+r} \equiv \vec{o'}_{p+r} \\ & \wedge & i_n \equiv 1 \wedge i'_n \equiv 0 \end{cases}$$

$$(4)$$

This equation seems much stronger than the general requirement in Equation (1). But we will show in experimental results that they are always fulfilled.



Fig. 4. Recovering input with reduced output sequence

### D. Inferring pipeline stages

Now, with the inferred p and r, we need to generalize  $F'_{PC}$  in Equation (4) to the following new formula that can determine whether a particular variable v at step j can be uniquely determined by a vector  $\vec{w}$  at step k. Now v and  $\vec{w}$  can be either input, registers or output variables.

Obviously, when  $F_{PC}''(p,r,v,j,\vec{w},k)$  is unsatisfiable,  $\vec{w}_k$  can uniquely determine  $v_j$ .

The last pipeline stage  $s\vec{t}g^{n-1}$  is exactly the set of registers  $s \in \vec{s}$  that can be uniquely determined at the p+r-th step by  $\vec{o}$ . It can be formally defined as:

$$s\vec{t}g^{n-1} := \left\{ s \in \vec{s} \mid \begin{array}{l} F_{PC}''(p,r,s,p+r,\vec{o},p+r) \\ is \ unsatisfiable \end{array} \right\} \quad (6)$$

Similarly, for  $0 \le j \le n-2$ ,  $s\vec{t}g^j$  at j-((n-2)-(p+r-1))-th step can be uniquely determined by  $s\vec{t}g^{j+1}$  at j-((n-2)-(p+r-1))+1-th step. So we can recursively defined  $s\vec{t}g^j$  as :

$$S := \vec{s} / \bigcup_{j < k \le n-2} s \vec{t} g^k$$

$$D := (n-2) - (p+r-1)$$
(7)

$$\vec{stg}^{j} := \begin{cases} s \in S \mid F_{PC}^{"}(p, r, s, j - D, s\vec{tg}^{j+1}, j - D + 1) \\ is \ unsatisfiable \end{cases}$$
 (8)

With Equation (6) and (8), all the pipeline stages can now be inferred.

E. Inferring the pipeline stage that uniquely determines input vector

According to Figure 3,  $s\vec{t}g^0$  defined in Equation (8) is exactly the pipeline stage that uniquely determined the input vector  $\vec{i}$ .

But in real encoders, this may not be the case. So we need to search for the smallest j from 0 to n-1 that can make  $\vec{i}$  to be uniquely determined by  $s\vec{t}g^j$ , that is, the smallest j that can make  $F''_{PC}(p,r,i,p,s\vec{t}g^j,j-D)$  unsatisfiable for all  $i \in \vec{i}$ , with D defined in Equation (7).

## IV. CHARACTERIZING THE BOOLEAN FUNCTION OF INPUT VARIABLES AND PIPELINE REGISTERS

A. Characterizing the Boolean function of the last pipeline stage

According to Equation (6), every registers  $s \in s\vec{t}g^{n-1}$  can be uniquely determined by  $\vec{o}$  at the p+r-th step, that is,  $F_{PC}''(p,r,s,p+r,\vec{o},p+r)$  is unsatisfiable and can be partitioned into:

$$\phi_A := \begin{cases} & \bigwedge_{m=0}^{p+r} \{ (\vec{s}_{m+1}, \vec{o}_m) \equiv T(\vec{s}_m, \vec{i}_m) \} \\ & \bigwedge & s_{p+r} \equiv 1 \end{cases}$$
(9)

As  $F_{PC}''(p,r,s,p+r,\vec{o},p+r)$  equals to  $\phi_A \wedge \phi_B$ , so  $\phi_A \wedge \phi_B$  is unsatisfiable. And the common variables of  $\phi_A$  and  $\phi_B$  is  $\vec{o}_{p+r}$ .

According to [4], a Craig interpolant  $\phi_I$  of  $\phi_A$  with respect to  $\phi_B$  can be constructed, which refer only to  $\vec{o}_{p+r}$ , and covers all the valuation of  $\vec{o}_{p+r}$  that can make  $s_{p+r} \equiv 1$ . At the same time,  $\phi_I \wedge \phi_B$  is unsatisfiable, which means  $\phi_I$  covers nothing that can make  $s_{p+r} \equiv 0$ .

Thus,  $\phi_I$  can be used as the decoder's Boolean function that recovers  $s \in s\vec{tg}^{n-1}$  from  $\vec{o}$ .

## B. Characterizing the Boolean function of other pipeline stages

Similar to last subsection, we can partition the unsatisfiable formula  $F_{PC}''(p,r,s,j-D,s\vec{tg}^{j+1},j-D+1)$  in Equation (8) into the following two equations:

$$\phi_A := \begin{cases} & \bigwedge_{m=0}^{p+r} \{ (\vec{s}_{m+1}, \vec{o}_m) \equiv T(\vec{s}_m, \vec{i}_m) \} \\ & \land & s_{j-D} \equiv 1 \end{cases}$$

$$(11)$$

Again, a Craig interpolant  $\phi_I$  of  $\phi_A$  with respect to  $\phi_B$  can be constructed, and used as the decoder's Boolean function that recovers  $s \in s\vec{t}g^j$  from  $s\vec{t}g^{j+1}$ .

# C. Characterizing the Boolean function of the encoder's input variables

According to Subsection III.E, we have found the smallest j that can make  $F_{PC}''(p,r,i,p,s\vec{t}g^j,j-D)$  unsatisfiable for all  $i\in\vec{i}$ , with D defined in Equation (7).  $F_{PC}''(p,r,i,p,s\vec{t}g^j,j-D)$  is unsatisfiable and can be partitioned into:

$$\phi_A := \begin{cases} & \bigwedge_{m=0}^{p+r} \{ (\vec{s}_{m+1}, \vec{o}_m) \equiv T(\vec{s}_m, \vec{i}_m) \} \\ & \wedge & i_p \equiv 1 \end{cases}$$
 (13)

$$\phi_{B} := \begin{cases} & \bigwedge_{m=0}^{p+r} \{ (\vec{s'}_{m+1}, \vec{o'}_{m}) \equiv T(\vec{s'}_{m}, \vec{i'}_{m}) \} \\ & \wedge & s\vec{t}g_{j-D}^{j} \equiv s\vec{t}g'_{j-D}^{j} \\ & \wedge & i'_{p} \equiv 0 \end{cases}$$

$$(14)$$

Similar to last subsection, the Craig interpolant  $\phi_I$  of  $\phi_A$  with respect to  $\phi_B$  can be used as the decoder's Boolean function that recovers  $i \in \vec{i}$  from  $s\vec{t}g^j$ .

#### V. Experimental Results

We have implemented these algorithms in OCaml language, and solved the generated CNF formulas with MiniSat 1.14 [2]. All experiments have been run on a server with 16 Intel Xeon E5648 processors at 2.67GHz, 192GB memory, and CentOS 5.4 Linux.

Table I shows the benchmarks used in this paper. The 2nd and 3rd column show respectively the number of inputs, outputs and registers of each benchmark. The 4th column shows the area of the encoder when mapped to LSI10K library with Design Compiler. In this paper, all area and delay are obtained in the same setting.

The 6th to 8th columns show respectively the run time of [10]'s algorithm to generate the decoder without pipeline, and the delay and area of the generated decoder. While the 9th to 11th columns show respectively the run time of this paper's algorithm to generate the pipelined

TABLE I
BENCHMARKS AND EXPERIMENTAL RESULTS

| Names     | The encoders |     |      |                        | decoder gener- |              |      | decoder gener- |                    |      |           |  |
|-----------|--------------|-----|------|------------------------|----------------|--------------|------|----------------|--------------------|------|-----------|--|
|           |              |     |      |                        |                | ated by [10] |      |                | ated by this paper |      |           |  |
|           | #            | #   | area | Description            | run            | delay        | area | run            | delay              | area | #         |  |
|           | in/out       | reg |      | of Encoders            | time           | (ns)         |      | time           | (ns)               |      | reg       |  |
| pcie      | 10/11        | 23  | 326  | PCIE 2.0 [8]           | 0.37           | 7.20         | 624  | 3.57           | 5.89               | 652  | 9/12      |  |
| xgxs      | 10/10        | 16  | 453  | Ethernet clause 48 [3] | 0.21           | 7.02         | 540  | 1.57           | 5.93               | 829  | 13        |  |
| t2eth     | 14/14        | 49  | 2252 | Ethernet clause 36 [3] | 12.7           | 6.54         | 434  | 47.2           | 6.12               | 877  | 8/8/10/20 |  |
| scrambler | 64/64        | 58  | 1034 | inserting 01 flipping  | no pipeline    |              |      |                |                    |      |           |  |
| xfi       | 72/66        | 72  | 7772 | Ethernet clause 49 [3] | stages found   |              |      |                |                    |      |           |  |

decoder, and the delay and area of the generated decoder. The last column shows the number of registers in each pipeline stage.

Comparing the 7th and the 10th column indicates that the decoders' delay have been significantly improved. And the the last column shows that there actually exist very deep pipeline, especially the t2eth with 4 pipeline stages.

One thing that is a little bit surprise is, the two largest benchmarks scrambler and xfi do not have pipeline stages inside. We study their code and confirm that this is true. Their area are so large because they use much wider datapaths with 64 to 72 bits.

### VI. RELATED PUBLICATIONS

Shen et al.[11] proposed the first complementary synthesis algorithm. It checks the decoder's existence by iteratively unrolling transition function to longer path, and generates the decoder's Boolean function by enumerating all satisfying assignments. But it may not halt and is too slow in building the decoder.

The halting problem was independently tackled in Shen et al.[10] and Liu at al.[5] by searching for loops in the state sequence, while the runtime overhead problem was addressed in [9, 5] by interpolant [4].

Shen et al. [9] inferred an assertion for configuration pins that can lead to the decoder's existence. Tu et al.[12] proposed a breakthrough algorithm that uses the encoder's infinite history to generate the decoder's output.

### VII. CONCLUSIONS

This paper proposes the first complementary synthesis algorithm that can handle pipelined encoders. Experimental results on several complex encoders indicate that this algorithm can always correctly infer the encoder's pipeline structure, and generate a pipelined decoder that is significantly faster.

#### References

- [1] W. Craig. Linear reasoning: A new form of the herbrand-gentzen theorem. *The Journal of Symbolic Logic*, 22(3):250–268, Sept. 1957.
- [2] N. Eén and N. Sörensson. An extensible sat-solver. In SAT 2003, pages 502-518, 2003.
- [3] IEEE. Ieee standard for ethernet section 4, 2012.
- [4] W.-L. H. Jie-Hong Roland Jiang, Hsuan-Po Lin. Interpolating functions from large boolean relations. In ICCAD '09, pages 779–784, 2009.
- [5] H.-Y. Liu, Y.-C. Chou, C.-H. Lin, and J.-H. R. Jiang. Towards completely automatic decoder synthesis. In ICCAD '11, pages 389–395, 2011.
- [6] H.-Y. Liu, Y.-C. Chou, C.-H. Lin, and J.-H. R. Jiang. Automatic decoder synthesis: Methods and case studies. *IEEE Tran. on CAD of IC and Sys.*, 31(9):31:1319–31:1331, September 2012.
- [7] K. L. McMillan. Interpolation and sat-based model checking. In CAV 2003, pages 1–13,2003.
- [8] PCI-SIG. Pci express base 2.1 specification, 2009.
- [9] S. Shen, Y. Qin, K. Wang, Z. Pang, J. Zhang, and S. Li. Inferring assertion for complementary synthesis. *IEEE Tran. on CAD of IC and Sys.*, 31(8):31:1288–31:1292, August 2012.
- [10] S. Shen, Y. Qin, L. Xiao, K. Wang, J. Zhang, and S. Li. A halting algorithm to determine the existence of the decoder. *IEEE Tran. on CAD of IC and Sys.*, 30(10):30:1556–30:1563, October 2011.
- [11] S. Shen, J. Zhang, Y. Qin, and S. Li. Synthesizing complementary circuits automatically. In ICCAD '09, pages 381–388, 2009.
- [12] K.-H. Tu and J.-H. R. Jiang. Synthesis of feedback decoders for initialized encoders. In DAC '13, pages 1–6, 2013.