### An efficient implementation of D-Flip-Flop using the GDI technique




Arkadiy Morgenshtein<sup>1</sup>, Alexander Fish<sup>2</sup> and Israel A. Wagner<sup>3</sup>

<sup>1</sup> Electrical Engineering Department, Technion, Haifa, Israel, e-mail: arkadiy@tx.technion.ac.il
<sup>2</sup> The VLSI Systems Center, Ben-Gurion University, Beer-Sheva, Israel, e-mail: afish@ee.bgu.ac.il
<sup>3</sup> IBM Research Laboratory, Haifa, Israel, e-mail: wagner@il.ibm.com

#### **ABSTRACT**

A new implementation of an efficient D-Flip-Flop (DFF) using the Gate-Diffusion-Input (GDI) technique is presented. This DFF design reduces the power-delay product and the area of the circuit while maintaining low logic-design complexity. A performance comparison with other DFF design techniques is presented with respect to gate area, number of devices, delay and power dissipation, showing the advantages and drawbacks of the GDI DFF relative to other methods. A variety of circuits have been implemented in $0.35\mu m$ and $0.18\mu m$ technologies to compare the proposed GDI structure with existing alternatives, showing up to 45% reduction in power-delay product for GDI. Properties of the implemented circuit are discussed and simulation results are reported.

### I. INTRODUCTION

The wide use of memory storage systems and sequential logic in modern electronics creates a demand for high-performance, low-area implementations of basic memory components. One of the most important state-holding elements is the D-Flip-Flop (DFF) [1]. Various DFF circuits have been researched and presented in the literature [1][3][4], aiming to achieve an optimal design in terms of delay, power and area. Some efficient techniques have been developed and adopted by designers for a variety of technologies [1]. The Gate-Diffusion-Input (GDI) design technique, recently developed and presented in [6], offers an efficient alternative for logic design in standard CMOS and SOI technologies.

The GDI method is based on the simple cell shown in Fig. 1. A basic GDI cell contains four terminals - G (the common gate input of the nMOS and pMOS transistors), P (the outer diffusion node of the pMOS transistor), N (the outer diffusion node of the nMOS transistor) and the D node (the common diffusion of both transistors). P, N and D may be used as either input or output ports, depending on the circuit structure.

Table 1 shows how various configuration changes of the inputs P, N and G in the basic GDI cell correspond to different Boolean functions at the output D. GDI enables simpler gates, lower transistor count, and lower power dissipation in many implementations, as compared with standard CMOS and PTL design techniques [6].

Multiple-input gates can be implemented by combining several GDI cells. The buffering constraints, due to a possible $V_{TH}$ drop, are described in detail in [6], as well as technological compatibility with CMOS and SOI.



Fig. 1. GDI basic cell

| N   | P   | G | D                    | Function |
|-----|-----|---|----------------------|----------|
| '0' | B   | A | $\overline{A}B$      | F1       |
| B   | '1' | A | $\overline{A} + B$   | F2       |
| '1' | B   | A | $A + B$              | OR       |
| B   | '0' | A | $AB$                 | AND      |
| C   | B   | A | $\overline{A}B + AC$ | MUX      |
| '0' | '1' | A | $\overline{A}$       | NOT      |

Table 1. Some logic functions that can be implemented with a single GDI cell
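The rows of Table 1 can be checked with a small behavioural model. The sketch below is not taken from the paper: it assumes an idealized GDI cell in which D simply follows P when G is low and N when G is high, and it ignores the $V_{TH}$ drop and all other electrical effects. The names `gdi_cell`, `TABLE_1` and `verify_table_1` are illustrative.

```python
from itertools import product


def gdi_cell(g: int, p: int, n: int) -> int:
    """Idealized GDI cell: D follows P when G is 0 (pMOS on) and N when G is 1 (nMOS on)."""
    return n if g else p


# Each Table 1 row: name -> ((P, N) wiring as a function of (B, C), expected D(A, B, C)).
TABLE_1 = {
    "F1":  (lambda b, c: (b, 0), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda b, c: (1, b), lambda a, b, c: (1 - a) | b),
    "OR":  (lambda b, c: (b, 1), lambda a, b, c: a | b),
    "AND": (lambda b, c: (0, b), lambda a, b, c: a & b),
    "MUX": (lambda b, c: (b, c), lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda b, c: (1, 0), lambda a, b, c: 1 - a),
}


def verify_table_1() -> dict:
    """Return {row name: bool} after checking every combination of A, B and C."""
    results = {}
    for name, (wiring, expected) in TABLE_1.items():
        results[name] = all(
            gdi_cell(a, *wiring(b, c)) == expected(a, b, c)
            for a, b, c in product((0, 1), repeat=3)
        )
    return results


if __name__ == "__main__":
    for row, ok in verify_table_1().items():
        print(row, "matches" if ok else "differs")
```

Under this idealization every row reduces to a two-way multiplexer controlled by G, which is why the MUX function itself needs only a single cell.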

In this paper, a new implementation of a DFF using the GDI technique is presented. The new DFF design reduces the power-delay product and the area of the circuit while maintaining low logic-design complexity. The GDI DFF is compared with other design techniques with respect to gate area, number of devices, delay and power dissipation, showing the advantages and drawbacks of the GDI implementation. The circuits have been implemented in $0.35\mu m$ and $0.18\mu m$ technologies and simulated to compare the proposed GDI structure with existing alternatives.

Section II presents the structure of the GDI D-Flip-Flop and its operational principle. In Section III the alternative design methods are presented and compared with GDI. Conclusions and future work are discussed in Section IV.

### II. GDI D-FLIP-FLOP IMPLEMENTATION AND OPTIMIZATION

A novel implementation of a GDI DFF is shown in Fig. 2. It is based on the Master-Slave connection of two GDI D-Latches. Each latch consists of four basic GDI cells, resulting in a simple eight-transistor structure.

The components of the circuit can be divided into two main categories:

- (a) *Body gates*, responsible for the state of the circuit. These gates are controlled by the Clk signal and create two alternative paths: one for the transparent state of the latch (when Clk is low and the signals propagate through the pMOS transistors), and another for the holding state of the latch (when Clk is high and the internal values are maintained by conduction of the nMOS transistors).
- (b) *Inverters* (marked by ×), responsible for maintaining the complementary values of the internal signals and the circuit outputs. An additional important role of the inverters is buffering the internal signals for swing restoration and improved driving ability of the outputs.

This partition into categories is helpful for understanding the operation and optimization of the circuit. As can be seen, in the body gates the signal is transmitted through the diffusion nodes of the GDI cells, which may cause a swing drop of $V_{TH}$ in the output signals. This problem is solved by the internal inverters in their buffering role.
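The latch behaviour described above can be illustrated with a purely logical model. The following sketch is not the circuit of Fig. 2: it assumes that each latch is transparent while its clock input is low and holds while it is high, and it further assumes that the slave latch is driven by the inverted clock so that the pair acts as an edge-triggered flip-flop. Signal swing, the $V_{TH}$ drop and timing are not modelled, and the class names are illustrative.

```python
class Latch:
    """Level-sensitive latch: transparent while clk is 0, holding while clk is 1."""

    def __init__(self, state: int = 0) -> None:
        self.q = state

    def update(self, d: int, clk: int) -> int:
        if clk == 0:          # transparent path: the input passes to the output
            self.q = d
        return self.q         # holding path: the stored value is kept


class MasterSlaveDff:
    """Two latches in series; Q changes only on the rising edge of clk."""

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, d: int, clk: int) -> tuple:
        m = self.master.update(d, clk)
        # ASSUMPTION: the slave sees the inverted clock (not stated in the text).
        q = self.slave.update(m, 1 - clk)
        return q, 1 - q       # Q and its complement


if __name__ == "__main__":
    dff = MasterSlaveDff()
    for d, clk in [(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]:
        print(f"D={d} Clk={clk} -> Q, Q' = {dff.step(d, clk)}")
```

In this model a change of D while Clk is high does not reach Q, because the master latch is holding during that phase.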

Performance optimization of the proposed circuit can be carried out by adjusting the transistor sizes (as a sweep parameter in simulation) to obtain a minimal power-delay product. This procedure is iterative and consists of a sequence of separate size adjustments:

- (a) First, the same scaling factor is obtained for all transistors of the circuit (body gates and inverters).
- (b) Secondly, iterative size optimizations are applied separately to inverters and body gates (mostly by opposite shifting of the scaling factors around the "operation point" found in (a)), while targeting the minimal power-delay product.
- (c) For high load requirements, an additional optimization can be separately performed on the inverter of the Slave latch.
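Steps (a) and (b) of this procedure can be sketched as a simple search over scaling factors. The code below is only an illustration under stated assumptions: in a real flow the cost function would be a circuit simulation returning the power-delay product, whereas `toy_pdp` is a made-up placeholder with no relation to measured data. The function name `optimize_sizing` and the candidate grid are likewise hypothetical, and step (c) is omitted.

```python
from typing import Callable, Sequence, Tuple


def optimize_sizing(
    pdp: Callable[[float, float], float],
    factors: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (body_scale, inverter_scale, pdp) found by an iterative sweep."""
    # Step (a): one common scaling factor for body gates and inverters.
    common = min(factors, key=lambda s: pdp(s, s))
    body, inv = common, common
    best = pdp(body, inv)

    # Step (b): sweep body gates and inverters separately until nothing improves.
    improved = True
    while improved:
        improved = False
        for cand in factors:                 # body gates, inverters fixed
            cost = pdp(cand, inv)
            if cost < best:
                body, best, improved = cand, cost, True
        for cand in factors:                 # inverters, body gates fixed
            cost = pdp(body, cand)
            if cost < best:
                inv, best, improved = cand, cost, True
    return body, inv, best


def toy_pdp(body: float, inv: float) -> float:
    """HYPOTHETICAL cost surface standing in for a simulated power-delay product."""
    return (body - 1.0) ** 2 + (inv - 2.0) ** 2 + 10.0


if __name__ == "__main__":
    print(optimize_sizing(toy_pdp, [0.5, 1.0, 1.5, 2.0, 2.5]))
```

With the toy surface, the common factor found in step (a) is 1.5, and step (b) then moves the two groups in opposite directions around it, mirroring the "opposite shifting" mentioned in the text. The loop terminates because the cost strictly decreases on a finite grid.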

The relatively compact structure of the proposed DFF, containing 18 transistors (with the inverter for the complementary value of D), makes it an efficient alternative for obtaining the combination of low area and high performance.



Fig. 2. GDI D-Flip-Flop implementation

### III. SIMULATION RESULTS

In this work, the proposed GDI DFF circuit has been implemented in $0.35\mu m$ and $0.18\mu m$ technologies to compare the GDI design with a set of representative flip-flops commonly used in high-performance design.

Nine sets of comparisons were carried out on the test circuits. The circuits were simulated using Cadence SpectreS at 3.3V and 1.8V (for 0.35µm and 0.18µm technologies, respectively), 250 MHz and 27°C, with load capacitance of 100fF. In our simulations the parasitic capacitances were taken into account. The simulation setup is shown in Fig. 3. The device under test was placed between input buffers to account for the current consumption from the previous stage, and output buffers to emulate real environmental conditions.



Fig. 3. Simulation setup

The reference circuits are presented in Fig. 4. The set includes (a) modified $C^2MOS$ after [1], (b) PowerPC after [3], (c) HLFF after [4] and (d) DSTC after [1]. These circuits have been sized according to the optimization procedure presented in [1].



Fig. 4. Set of representative flip-flops for comparison: (a) modified  $C^2MOS$ , (b) PowerPC, (c) HLFF and (d) DSTC

| Circuit   | # of transistors | Total width [µm] | CLK-Q (LH) [ps] | CLK-Q' (HL) [ps] | CLK-Q (HL) [ps] | CLK-Q' (LH) [ps] | Power [µW] | Max delay [ps] | PDPtot [fJ] |
|-----------|------------------|------------------|-----------------|------------------|-----------------|------------------|------------|----------------|-------------|
| GDI       | 18               | 33.1             | 196.9           | 272.1            | 275.0           | 203.2            | 151.7      | 275.0          | 41.7        |
| DSTC      | 12               | 56.8             | 380.5           | 208.5            | 380.5           | 177.3            | 171.9      | 380.6          | 65.4        |
| PowerPC   | 20               | 59.5             | 282.5           | 390.4            | 248.4           | 332.5            | 141.0      | 390.4          | 55.0        |
| $C^2MOS$  | 24               | 64.9             | 185.5           | 463.1            | 88.8            | 343.9            | 138.2      | 463.1          | 64.0        |
| HLFF      | 20               | 105.4            | 147.2           | 193.4            | 102.9           | 210.7            | 242.9      | 210.7          | 51.2        |

Table 2. Simulation results for 0.18µm technology

| Circuit   | # of transistors | Total width [µm] | CLK-Q (LH) [ps] | CLK-Q' (HL) [ps] | CLK-Q (HL) [ps] | CLK-Q' (LH) [ps] | Power [µW] | Max delay [ps] | PDPtot [fJ] |
|-----------|------------------|------------------|-----------------|------------------|-----------------|------------------|------------|----------------|-------------|
| GDI       | 18               | 51.3             | 313.3           | 299.3            | 300.4           | 313.3            | 812.7      | 313.3          | 254.7       |
| DSTC      | 12               | 148.5            | 125.1           | 333.3            | 131.5           | 361.1            | 1100.3     | 361.0          | 397.3       |
| PowerPC   | 20               | 85.5             | 428.3           | 285.9            | 417.9           | 259.5            | 741.9      | 428.2          | 317.7       |
| $C^2MOS$  | 24               | 174.4            | 222.2           | 480.3            | 75.3            | 341.3            | 960.3      | 480.3          | 461.2       |
| HLFF      | 20               | 148.2            | 292.7           | 233.2            | 292.8           | 128.3            | 1050.4     | 292.8          | 307.5       |

Table 3. Simulation results for 0.35µm technology

The comparative results are presented in Table 2 and Table 3 for $0.18\mu m$ and $0.35\mu m$ technologies, respectively. The best results in each compared category are emphasized. It can be seen that the GDI DFF outperforms the other circuits in terms of power-delay product and total gate area in both technologies (including the inverted inputs). The improvement in power-delay product with GDI is up to 37% in $0.18\mu m$ technology and up to 45% in $0.35\mu m$ technology.

Although the DSTC circuit has fewer transistors than the GDI circuit, the total gate area of GDI after optimization is smaller than that of the alternative implementations. The improvement in performance and gate area with GDI is consistent across both fabrication technologies.
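The quoted improvements can be cross-checked against the tabulated values. The sketch below recomputes the power-delay product as power times maximal delay and compares GDI with the worst alternative in each technology, which is our reading of "up to". The data are copied from Tables 2 and 3; the helper names are illustrative. With these rounded values the result is about 45% for $0.35\mu m$ and about 36% for $0.18\mu m$, close to the quoted 37%; the small difference may be due to rounding in the tables.

```python
# (power [µW], max delay [ps], total width [µm]) copied from Tables 2 and 3.
TABLE_018 = {
    "GDI": (151.7, 275.0, 33.1),
    "DSTC": (171.9, 380.6, 56.8),
    "PowerPC": (141.0, 390.4, 59.5),
    "C2MOS": (138.2, 463.1, 64.9),
    "HLFF": (242.9, 210.7, 105.4),
}
TABLE_035 = {
    "GDI": (812.7, 313.3, 51.3),
    "DSTC": (1100.3, 361.0, 148.5),
    "PowerPC": (741.9, 428.2, 85.5),
    "C2MOS": (960.3, 480.3, 174.4),
    "HLFF": (1050.4, 292.8, 148.2),
}


def pdp_fj(power_uw: float, delay_ps: float, width_um: float = 0.0) -> float:
    """Power-delay product in fJ: 1 µW * 1 ps = 1e-18 J = 1e-3 fJ."""
    return power_uw * delay_ps * 1e-3


def width_um(power_uw: float, delay_ps: float, width: float) -> float:
    """Total transistor width in µm, used here as the area metric."""
    return width


def max_reduction(table: dict, metric) -> float:
    """Percentage by which GDI undercuts the worst alternative for a metric."""
    gdi = metric(*table["GDI"])
    worst = max(metric(*row) for name, row in table.items() if name != "GDI")
    return 100.0 * (1.0 - gdi / worst)


if __name__ == "__main__":
    for label, table in (("0.18um", TABLE_018), ("0.35um", TABLE_035)):
        print(label,
              f"PDP reduction {max_reduction(table, pdp_fj):.1f}%,",
              f"width reduction {max_reduction(table, width_um):.1f}%")
```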

It should be noted that the optimization of all compared circuits is performance-driven (the minimal power-delay product is obtained by sizing), while separate parameters, such as average power and maximal delay, are secondary.

### IV. CONCLUSIONS AND FUTURE RESEARCH

A new implementation of a high-performance D-Flip-Flop using the Gate-Diffusion-Input technique was presented. The proposed circuit has a simple structure, based on the Master-Slave principle, and contains 18 transistors. An optimization procedure was developed for the GDI DFF, based on iterative transistor sizing targeting a minimal power-delay product.

A performance comparison with other DFF design techniques was shown with respect to gate area, number of devices, delay and power dissipation. A variety of circuits have been implemented in $0.35\mu m$ and $0.18\mu m$ technologies to compare the proposed GDI structure with a set of representative flip-flops commonly used in high-performance design, showing up to 45% reduction in power-delay product and up to 71% reduction in gate area with GDI.

Future research may include integration of the proposed DFF in complex digital systems that combine sequential and combinatorial logic. Hybrid design using both GDI and CMOS techniques should be researched for further optimization of various VLSI structures. The applicability of the GDI method to advanced fabrication technologies, such as SOI and SOS, is currently under research.

### **ACKNOWLEDGEMENTS**

The authors would like to thank G. Samuel and the staff of Technion Research Center of Microelectronic Systems, for their support during the research. We also thank O. Shirak, N. Ben-Shahar and other students for participating in projects in different stages of the research.

### REFERENCES

- [1] J.M. Rabaey, A. Chandrakasan, B. Nikolic, "Digital Integrated Circuits", 2nd edition, Prentice Hall, 2002.
- [2] V. Stojanovic and V.G. Oklobdzija, "Comparative Analysis of Master–Slave Latches and Flip-Flops for High-Performance and Low-Power Systems", *IEEE J. Solid-State Circuits*, vol. 34, no. 4, April 1999.
- [3] G. Gerosa, S. Gary, C. Dietz, P. Dac, K. Hoover, J. Alvarez, H. Sanchez, P. Ippolito, N. Tai, S. Litch, J. Eno, J. Golab, N. Vanderschaaf, and J. Kahle, "A 2.2 W, 80 MHz superscalar RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 29, pp. 1440–1452, December 1994.

- [4] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," *ISSCC Dig. Tech. Papers*, pp. 138–139, February 1996.
- [5] J. Yuan and C. Svensson, "New single-clock CMOS latches and flipflops with improved speed and power savings," *IEEE J. Solid-State Circuits*, vol. 32, January 1997.
- [6] A. Morgenshtein, A. Fish, I.A. Wagner, "Gate-Diffusion Input (GDI) A Power Efficient Method for Digital Combinatorial Circuits," *IEEE Trans. VLSI*, vol. 10, no. 5, pp. 566-581, October 2002.