# DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

TECHNISCHE UNIVERSITÄT MÜNCHEN

Master's Thesis in Electrical Engineering

# Signal Distribution Networks in Automatic QCA Standard Cell Placement and Routing

Benjamin Hien

# Signalverteilungs-Netzwerke in automatisierter QCA Standard Zellen Plazierung und Verdrahtung

Author: Benjamin Hien

Supervisor: Prof. Dr. Robert Wille

Advisor: Dr. Marcel Walter

Submission Date: 08.02.2023

I confirm that this master's thesis in electrical engineering is my own work and I have documented all sources and material used.

Munich, 08.02.2023

Benjamin Hien



# **Abstract**

In this thesis, a new approach for placement and routing in Quantum-dot Cellular Automata (QCA) is presented by introducing Distribution Networks. These networks enhance the functionalities of an already existing scalable placement and routing algorithm. The first Distribution Network, called Ordering Distribution Network, reduces wire crossings and layout area by reorganizing primary inputs. The second Distribution Network, called Majority Gates Distribution Network, allows for the placement of majority gates while adhering to timing constraints, resulting in an increase in layout area, even though fewer gates have to be placed for logic networks containing majority gates. Lastly, the third Distribution Network, called Sequential Distribution Network, is the first to enable the placement and routing of sequential logic in QCA. The proposed method is evaluated through extensive simulation and experimentation and shows different trade-offs in terms of design metrics such as performance, layout area, and number of wire crossings.

# **Contents**

- Acknowledgments
- Abstract
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Objective
- 2 Preliminaries
  - 2.1 Representation of Logic Circuits
    - 2.1.1 Boolean Functions
    - 2.1.2 Logic Networks
  - 2.2 QCA Technology
    - 2.2.1 Cells
    - 2.2.2 Gates
    - 2.2.3 Clocking
  - 2.3 Placement and routing problem
  - 2.4 Sequentiality
    - 2.4.1 CMOS storage elements
    - 2.4.2 QCA storage elements
- 3 State of the Art
  - 3.1 Combinational P&R Algorithms
  - 3.2 Design of Sequential QCA circuits
    - 3.2.1 Sequential logic in QCA
    - 3.2.2 QCA storage cells (QCA RAM)
- 4 Methodology
  - 4.1 Ordering Distribution Network
  - 4.2 Majority Gate Distribution Network
    - 4.2.1 The proposed signal distribution Network
    - 4.2.2 Placement and routing
    - 4.2.3 Signal synchronization and buffer insertion
  - 4.3 Sequential Distribution Network
- 5 Experimental Evaluation
  - 5.1 Benchmarks
  - 5.2 Results
    - 5.2.1 Ordering Distribution Network
    - 5.2.2 Majority Gates Distribution Network
    - 5.2.3 Sequential Distribution Network
- 6 Conclusion
- List of Figures
- List of Tables
- Bibliography

# 1 Introduction

Quantum-dot Cellular Automata (QCA) is a promising technology for the design of ultra-low power, high-density, and high-performance digital circuits. QCA technology is based on the manipulation of the electronic states of quantum dots (QDs) to perform logic operations. The potential benefits of QCA include its ability to operate at extremely low power levels and high device density.

## 1.1 Motivation

Although the theory sounds very promising, the design of QCA circuits is a challenging task due to the complexity of the underlying physics and the lack of appropriate design tools. Placement and routing are two critical steps in the design of QCA circuits that determine the overall performance, power consumption, and area efficiency of the circuit.

Placement refers to the process of arranging the QCA cells on the chip. It involves determining the optimal position of the cells to minimize the number of interconnects and the routing area. Routing, on the other hand, refers to the process of connecting the QCA cells to form a functional circuit. It involves determining the optimal path for the interconnects to minimize the routing area. In QCA, placement and routing are strongly related to each other and have to be viewed as a connected process. Placement and routing is a non-trivial task due to the constraints imposed by the QCA technology, such as local and global timing constraints.

The importance of placement and routing in the design of QCA circuits cannot be overstated. The performance, power consumption, and area efficiency of the circuit are all directly affected by the placement and routing of the QCA cells. An optimal placement and routing can significantly improve the performance and reduce the power consumption of the circuit, making it more suitable for practical applications.

In this thesis, we will explore the various placement and routing techniques used in the design of QCA circuits. We will emphasize the need for scalable placement and routing algorithms, investigate the trade-offs between performance, power consumption, and area efficiency, and propose new techniques to improve the design of QCA circuits.

## 1.2 Objective

Hence, the limitations of the current state-of-the-art scalable placement and routing algorithms should be overcome by introducing so-called distribution networks. The proposed distribution networks add new functionalities to a scalable placement and routing algorithm, such as the ability to reduce wire crossings and layout area and to place and route majority gates and sequential logic.

The first distribution network, called the Ordering Distribution Network, is designed to reduce wire crossings and layout area by introducing a new ordering of primary inputs (PIs). This improves the overall area required for the circuit and reduces the number of wire crossings and therefore the costs of the designed QCA circuits.

The second distribution network, called the Majority Gates Distribution Network, is designed to enable the placement of majority gates in the QCA circuit. However, it is important to note that while the number of gates placed is reduced, the layout area may actually increase due to the timing constraints.

The third distribution network, called the Sequential Distribution Network, enables the algorithm for the first time to place and route sequential logic in QCA. This is a significant advancement as current placement and routing algorithms are not able to handle sequential logic.

The main objective of this thesis is to improve the scalability and performance of the placement and routing algorithm for QCA circuits by introducing these new distribution networks. The proposed distribution networks have the potential to improve the scalability and performance of the algorithm while enabling the placement and routing of majority gates and sequential logic, which were previously not possible. This thesis aims to provide a deeper understanding of the challenges and opportunities of QCA technology and to contribute to the development of practical design tools for QCA circuits.

# 2 Preliminaries

This chapter establishes a theoretical basis consisting of declarations and definitions required for the understanding of the ideas and their implementations proposed in this work. The four fields forming this basis are the representation of Logic Circuits, QCA technology, the placement and routing problem and sequentiality.

## 2.1 Representation of Logic Circuits

Logic Circuits provide a powerful construct that allows an abstraction of digital circuits to a logic level and thereby makes it possible to discuss and argue about them scientifically. This abstraction was made possible by the Boolean algebra, formed by the mathematician George Boole in 1847. It shows that every digital circuit can be represented by logic functions, independent of their underlying technology. In the following sections first, a definition of a Boolean algebra is given, and second, it is shown how logic networks can be formed using them.

#### 2.1.1 Boolean Functions

A definition of Boolean calculus was first provided by Edward V. Huntington in 1933. From his *set of independent postulates for the algebra of logic* and his own correction [22, 21], the following equations form the basis of every Boolean algebra:

**Definition 2.1.1** (Basis for Boolean algebra). Let $a, b, c$ be arbitrary elements of an abstract algebra $(B_{abc}, +, ')$ with their set denoted as $B_{abc}$. The algebra includes the binary function disjunction $+: B_{abc} \times B_{abc} \to B_{abc}$ and the unary function negation $': B_{abc} \to B_{abc}$.

$$\begin{aligned}
a + b &= b + a \\
(a + b) + c &= a + (b + c) \\
(a' + b')' + (a' + b)' &= a
\end{aligned}$$

The last of his postulates is named after its author and is commonly known as the *Huntington equation*. There also exists a "universe element" $u \in B_{abc}$ for which the following holds:

$$\begin{aligned}
\exists u' &: a + u' = a \\
\exists u &: a + a' = u.
\end{aligned}$$

Even though the two operators disjunction and negation are powerful enough to form a Boolean algebra, the most common definition also uses the conjunction function $\cdot: B_{abc} \times B_{abc} \to B_{abc}$ in order to form shorter and more readable logic terms. The most common Boolean algebra $\mathbb B$ is defined by the tuple $(B_{abc}, \vee, \wedge, \neg)$, where $\vee$ and $\wedge$ are alternative notations for the binary operators disjunction $+$ and conjunction $\cdot$ in $\mathbb B$. The third function $\neg$ describes the unary negation function, which was previously denoted as $'$. The set $B_{abc}$ contains exactly two distinct elements $\{0,1\}$, with $u=1$ and $\neg u=0$ [19].

Since this definition restricts us to the three Boolean functions $(\vee, \wedge, \neg)$, we extend it by the following definition [44]:

**Definition 2.1.2** (Boolean function). A Boolean function can be described as  $f: \{0,1\}^k \to \{0,1\}$ , with  $k \in \mathbb{N}^*$  being the number of arguments or arity of the function. A function with k arguments is referred to as k-ary. Multi-output Boolean functions can be described as  $\{0,1\}^k \to \{0,1\}^m$ , with  $k \in \mathbb{N}^*$  and the integer m > 0.

Nonetheless, every k-ary function can still be decomposed into a set of common Boolean functions  $(\lor, \land, \lnot)$ . One example for this is the 3-ary majority function, which is a very important Boolean function for QCA technology.

**Definition 2.1.3** (Majority Function). The ternary Boolean majority function is defined as $\langle a,b,c\rangle = ab + ac + bc$, so that the function value equals the majority of its incoming values.

It follows:  $\langle a, b, 0 \rangle = a \cdot b$  and  $\langle a, b, 1 \rangle = a + b$ .
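The two special cases above can be checked exhaustively. The following is a minimal Python sketch; the function name `majority` and the encoding of truth values as the integers 0 and 1 are illustrative choices, not part of the thesis.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Ternary majority <a, b, c> = ab + ac + bc over {0, 1}."""
    return (a & b) | (a & c) | (b & c)


# Fixing the third input to a constant yields AND and OR.
for a, b in product((0, 1), repeat=2):
    assert majority(a, b, 0) == (a & b)  # <a, b, 0> = a * b
    assert majority(a, b, 1) == (a | b)  # <a, b, 1> = a + b

# The function value equals the majority vote of the three inputs.
for a, b, c in product((0, 1), repeat=3):
    assert majority(a, b, c) == int(a + b + c >= 2)
```

Because the input space has only eight elements, exhaustive enumeration is a complete proof of both identities.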

Common notations for Boolean functions are *conjunctive normal form* (CNF) or *disjunctive normal form* (DNF), using literals.

**Definition 2.1.4.** A literal is either an atom a (positive literal) or the negation of an atom  $\neg a$  (negative literal).

**Definition 2.1.5.** A propositional Boolean formula is said to be in CNF if it is a conjunction of *clauses*, each of which is a disjunction of literals [62]:

$$\bigwedge_{i}\bigvee_{j}(\neg)v_{ij},$$

where  $v_{ij} \in \mathbb{B}$ .

A propositional Boolean formula is said to be in DNF if it is a disjunction of clauses, each of which is a conjunction of literals:

$$\bigvee_{i} \bigwedge_{j} (\neg) v_{ij},$$

where  $v_{ij} \in \mathbb{B}$ .
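To make the two normal forms concrete, the sketch below evaluates a formula given as a list of clauses. The encoding of literals as signed integers (`+k` for atom `k`, `-k` for its negation) and all function names are assumptions made for this illustration only.

```python
from typing import Dict, List

# Assumed encoding: +k is the positive literal of atom k, -k its negation.
Clause = List[int]


def literal_value(lit: int, assignment: Dict[int, bool]) -> bool:
    """Value of a single literal under a truth assignment."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value


def eval_cnf(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Conjunction of clauses, each clause a disjunction of literals."""
    return all(any(literal_value(l, assignment) for l in c) for c in clauses)


def eval_dnf(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Disjunction of clauses, each clause a conjunction of literals."""
    return any(all(literal_value(l, assignment) for l in c) for c in clauses)


# The exclusive-or of atoms 1 and 2, written in both normal forms.
xor_cnf = [[1, 2], [-1, -2]]  # (a or b) and (not a or not b)
xor_dnf = [[1, -2], [-1, 2]]  # (a and not b) or (not a and b)
```

Both lists describe the same Boolean function, which illustrates that CNF and DNF are alternative notations rather than different functions.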

Combining the CNF or the DNF with Definition 2.1.1 and De Morgan's laws [23] yields a further simplification:

**Definition 2.1.6.** Given a Boolean algebra $\mathbb{B} = (B_{ab}, \vee, \wedge, \neg)$ with two arbitrary elements $a, b \in B_{ab} = \{0, 1\}$, the following logic principles can be applied:

$$\begin{aligned}
\neg a \wedge \neg b &= \neg (b \vee a) \\
\neg a \vee \neg b &= \neg (b \wedge a).
\end{aligned}$$

It follows that any Boolean algebra can be reduced to only two operators, e.g., conjunction ($\wedge$) and negation ($\neg$) or disjunction ($\vee$) and negation ($\neg$). Any such set of two Boolean functions is called *universal*.
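The universality argument can be illustrated by rebuilding the missing operator from the other two. This is a small Python sketch under the assumption that truth values are encoded as 0 and 1; the helper names are hypothetical.

```python
from itertools import product


def not_(a: int) -> int:
    return 1 - a


def and_(a: int, b: int) -> int:
    return a & b


def or_(a: int, b: int) -> int:
    return a | b


def or_from_and_not(a: int, b: int) -> int:
    """Disjunction expressed with conjunction and negation only."""
    return not_(and_(not_(a), not_(b)))


def and_from_or_not(a: int, b: int) -> int:
    """Conjunction expressed with disjunction and negation only."""
    return not_(or_(not_(a), not_(b)))


for a, b in product((0, 1), repeat=2):
    # De Morgan's laws as stated in Definition 2.1.6.
    assert and_(not_(a), not_(b)) == not_(or_(b, a))
    assert or_(not_(a), not_(b)) == not_(and_(b, a))
    # Each two-operator set recovers the third operator.
    assert or_from_and_not(a, b) == or_(a, b)
    assert and_from_or_not(a, b) == and_(a, b)
```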

#### 2.1.2 Logic Networks

There are many ways of representing Boolean functions. However, all *canonical* forms, like truth tables (TTs), reduced sums of products (RSOPs), or binary decision diagrams (BDDs), suffer from exponential representations, making them impractical for big logic circuits. Even if a reasonable representation exists for a given function, simple operations like forming the complement could yield an exponential function representation [38]. Logic networks overcome these restrictions by being *non-canonical*, meaning that a given function can be represented by different logic networks. The following definition for logic networks is derived from [9]:

**Definition 2.1.7** (Logic Network). A logic network N(V, E) is a rooted, directed graph with vertex set V and edge set E. For any vertex  $v \in V$ , vertices connected by incoming edges  $e_{inc} \in E$  are called children. A vertex connected by an outgoing edge  $e_{out} \in E$  is called parent. V contains two types of vertices. A *non-terminal* vertex v has as attributes an argument index  $index(v) \in \{1,...,n\}$ , and l children  $child_1(v),...,child_l(v) \in V$ . A *terminal* vertex v has as attribute a value  $value(v) \in \{0,1\}$ .

Furthermore, for any non-terminal vertex $v$ and each of its children $child_i(v)$ with $1 \le i \le l$, we must have $index(child_i(v)) < index(v)$.

This definition allows vertices to have an unrestricted number of children, implying that the Boolean function represented by the vertex can be k-ary:



The corresponding recursive Boolean function reads:

$$\begin{aligned}
f_v &= f_v(v_7) \\
f_v(v_7) &= f_v(v_5) \lor f_v(v_6) \\
f_v(v_6) &= f_v(v_3) \land f_v(v_4) \\
f_v(v_5) &= f_v(v_1) \lor f_v(v_2)
\end{aligned}$$

with primary inputs: $f_v(v_1), f_v(v_2), f_v(v_3), f_v(v_4) \in \{0, 1\}$

Figure 2.1: Binary Logic Network

**Definition 2.1.8** (Logic Network Boolean Functions). A set of k-ary Boolean Functions  $x_1, ..., x_n \in \mathbb{B}$  is assigned to every vertex via the argument index index(v) = i. The graph function  $f_v$  is defined recursively as:

- 1. If v is a terminal vertex:
  - a) If value(v) = 1, then  $f_v = 1$
  - b) If value(v) = 0, then  $f_v = 0$
- 2. If v is a non-terminal vertex with index(v) = i, then $f_v$ is the function $f_v(v_i) = x_i(f_{child_1(v)}(v_{i-1}), ..., f_{child_l(v)}(v_{i-n}))$.

The recursive nature of the Boolean Function definition in logic networks can be seen in figure 2.1.
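The recursion can be sketched in a few lines of Python for the network of figure 2.1. The dictionary layout and the name `evaluate` are assumptions for illustration; only the vertex names and node functions are taken from the figure.

```python
from typing import Callable, Dict, Tuple

# Each non-terminal vertex maps to (node function, children).
Node = Tuple[Callable[..., int], Tuple[str, ...]]

NETWORK: Dict[str, Node] = {
    "v5": (lambda a, b: a | b, ("v1", "v2")),  # f(v5) = f(v1) OR f(v2)
    "v6": (lambda a, b: a & b, ("v3", "v4")),  # f(v6) = f(v3) AND f(v4)
    "v7": (lambda a, b: a | b, ("v5", "v6")),  # f(v7) = f(v5) OR f(v6)
}


def evaluate(vertex: str, inputs: Dict[str, int]) -> int:
    """Recursively compute the function of `vertex` from primary input values."""
    if vertex in inputs:  # terminal vertex: its value is given directly
        return inputs[vertex]
    function, children = NETWORK[vertex]
    return function(*(evaluate(child, inputs) for child in children))


result = evaluate("v7", {"v1": 0, "v2": 0, "v3": 1, "v4": 1})
assert result == 1  # (0 OR 0) OR (1 AND 1)
```

Since node functions are plain callables taking any number of arguments, the same evaluator also covers k-ary nodes such as a ternary majority node.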

The non-canonical property can be explained by the fact that nodes with the identity function are allowed. Such nodes can be inserted anywhere in the logic network while the function represented by the logic network stays the same. Even excluding such identity nodes has no impact, since simple node combinations, like two consecutive negation nodes, collapse to the identity function. Following this argument, there exists an infinite number of logic networks representing the same Boolean function, resulting in the widely accepted assumption that the determination of an optimal logic network is an $\mathcal{NP}$-complete problem [56]. Attempts to create canonical logic networks seem to evade this problem, but themselves involve $co\mathcal{NP}$-complete problems [9]. Nevertheless, logic networks have proven to be very useful in *logic synthesis*, the transformation of logic circuits into gate representations. Due to the complexity of the representations, algorithms commonly used for logic synthesis are based on approximate solutions.

Adapting the names used in the literature, a terminal vertex is referred to as a *primary input* (PI), and the set of all PIs is denoted as $I$. The set of non-terminal vertices, referred to as *nodes*, is denoted as $\Lambda$. The definition requires $I \cap \Lambda = \emptyset$. An edge connecting a child $v_i$ and a parent vertex $v_j$ is called a *signal*. With $i < j$, the notation of a signal is given as $(v_i, v_j)$. The set of all signals is denoted as $\Sigma$. If an edge is dangling, i.e., it does not point to another vertex, it is called a *primary output* (PO), and the set of all POs is denoted as $O$. Therefore, $\Sigma \cap O = \emptyset$ also holds. From the definition of a logic network, we can now describe it as an acyclic directed graph $N = (\Lambda, I, \Sigma, O)$.
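As an illustration of this notation, the tuple $N = (\Lambda, I, \Sigma, O)$ can be held in a small data structure that checks the stated constraints. This is a hedged sketch: the class name, the use of integer vertex indices, and the encoding of a primary output by the vertex that drives the dangling edge are all assumptions, not the thesis implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Signal = Tuple[int, int]  # (child index i, parent index j)


@dataclass(frozen=True)
class LogicNetwork:
    nodes: FrozenSet[int]            # Lambda: non-terminal vertices
    primary_inputs: FrozenSet[int]   # I: terminal vertices
    signals: FrozenSet[Signal]       # Sigma: edges between two vertices
    primary_outputs: FrozenSet[int]  # O: drivers of dangling edges (assumed encoding)

    def is_well_formed(self) -> bool:
        vertices = self.nodes | self.primary_inputs
        return (
            self.nodes.isdisjoint(self.primary_inputs)  # I and Lambda are disjoint
            and all(
                i < j and i in vertices and j in self.nodes  # child index below parent index
                for i, j in self.signals
            )
            and self.primary_outputs <= vertices
        )


# The network of figure 2.1 with vertices v1..v7 encoded as 1..7.
figure_2_1 = LogicNetwork(
    nodes=frozenset({5, 6, 7}),
    primary_inputs=frozenset({1, 2, 3, 4}),
    signals=frozenset({(1, 5), (2, 5), (3, 6), (4, 6), (5, 7), (6, 7)}),
    primary_outputs=frozenset({7}),
)
```

Requiring $i < j$ for every signal is what rules out cycles: any path through the graph visits strictly increasing indices.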

As already mentioned in subsection 2.1.1, a universal set of two Boolean functions can form any Boolean algebra. As long as this universality is retained, the set of node functions in a logic network can be extended arbitrarily. Common logic networks containing only two node functions are, e.g., *AND-Inverter Graphs* (AIGs), allowing only conjunction and negation. Another logic network with two node functions, which is used in the QCA domain, is the *Majority-Inverter Graph* (MIG), utilizing the ternary majority function and negation. There also exists a wide range of logic networks that permit more than two node functions. One example is the *XOR-AND-Inverter Graph* (XAG) with the parity function, conjunction, and negation [56].

As part of the logic synthesis, a suitable logic network representation of the combinational circuit has to be determined, because, even though these logic networks can implement any Boolean function given by a specification, not every logic network can be synthesized into any given technology. In the current standard technology, complementary metal-oxide-semiconductor (CMOS), the logic network is synthesized using building blocks consisting of metal-oxide-semiconductor field-effect transistors (MOSFETs), the elemental unit of this technology.

## 2.2 QCA Technology

In order to fulfill the well-known Moore's law [42], demanding a doubling of transistors on a chip every two years, CMOS technology is facing a multitude of challenges. Most notable are the short-channel effect, impurity variations, and most importantly, the heat dissipation resulting from static and dynamic power losses [30, 53, 61]. To tackle these challenges, among others, the International Roadmap for Devices and Systems (IRDS, formerly ITRS) provides a platform proposing solutions within the semiconductor domain, e.g., new materials and multi-core architectures. New technologies are also being researched, including quantum computing and the domain of Field-Coupled Nanocomputing (FCN). This work focuses on one of the most promising FCN technologies, namely Quantum-dot Cellular Automata (QCA). The main difference of this technology compared to CMOS is the representation of logical states using the location of electron pairs in QCA cells rather than voltage levels. Data is transferred between cells based on Coulomb repulsion, using electromagnetic fields, enabling the technology to achieve high performance in terms of device density, clock frequency, and power consumption [35]. Hence, QCA tackles exactly the main issues faced by CMOS technology and provides a promising digital system for the future [2]. However, QCA also presents its own challenges in terms of the manufacturing process, manufacturing standards, and different design methodologies [34]. Because this work focuses on the design of QCA circuits, it mainly considers circuit parameters such as area, complexity, and clock delays [2]. In order to allow an analysis under these parameters, first the QCA technology and the resulting design constraints have to be understood. Therefore, subsection 2.2.1 introduces QCA cells as building blocks for QCA gates, which are discussed in subsection 2.2.2. After the clocking is explained in subsection 2.2.3, the constraints for the placement and routing problem can be formulated. In addition, sequential behavior is discussed for CMOS and transferred to the QCA domain in section 2.4.

#### 2.2.1 Cells

As already mentioned, in QCA technology logical states are no longer represented by voltage levels but by the location of electrons [5]. In order to achieve this property a nanosized structure is needed, which is capable of trapping electrons in a certain position. For this purpose so-called *quantum-dots* (QDs) are utilized. A QD consists of several to one hundred atoms of a semiconductor, and therefore quantum mechanics applies for their electrical properties. For this work it is sufficient to understand that every QD has a bound state, where a particle tends to be localized and the bound state is subject to a potential, which can be external or due to the presence of other particles.



Figure 2.2: QCA-Cell states

Because this enables QDs to have discrete electronic states, they are also referred to as *artificial atoms*. A combination of them is used to build a QCA cell, also known as an *artificial molecule* [43]. Figure 2.2 shows three such QCA cells, where the four circles at the corners of each QCA cell show the QDs, or rather their quantum barriers, each of which is capable of trapping one electron. In addition, a cell contains two excess electrons, which can be localized by the quantum dots. A QD that currently traps an electron is depicted in black, and an unoccupied QD is depicted in white. Coulomb repulsion causes electrons to occupy diagonally opposed QDs, resulting in three possible cell states [41, 28, 29].

A stable state indicates that it is easily distinguishable from the usual energy band. Therefore, the energy difference between two consecutive energy states must be well above the thermal noise energy ( $k_BT$ ). Only such states are suited for information transfer. The stable states can be derived from cell polarization, which can be +1 and -1 or null in the unexcited state. The two stable states contain the same electrostatic energy and are used to encode the binary values 0 and 1 [41]. Figure 2.2 shows three cells with possible states and their resulting polarizations and logical states.
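The polarization values mentioned above can be illustrated numerically. The sketch below uses the polarization definition commonly found in the QCA literature, $P = \frac{(\rho_1 + \rho_3) - (\rho_2 + \rho_4)}{\rho_1 + \rho_2 + \rho_3 + \rho_4}$, which is not stated in this text and is therefore an assumption here, as are the dot numbering (dots 1 and 3 on one diagonal), the mapping of $P = +1$ to logic 1, and the decision threshold.

```python
from typing import Optional


def polarization(rho1: float, rho2: float, rho3: float, rho4: float) -> float:
    """Cell polarization from the charge on the four dots.

    Assumed convention: dots 1 and 3 lie on one diagonal, dots 2 and 4 on the other.
    """
    total = rho1 + rho2 + rho3 + rho4
    if total == 0:
        raise ValueError("cell holds no charge")
    return ((rho1 + rho3) - (rho2 + rho4)) / total


def logic_value(p: float, threshold: float = 0.5) -> Optional[int]:
    """Map a polarization to a logic value; None marks the unpolarized (null) state.

    The threshold is a hypothetical parameter chosen for illustration.
    """
    if p >= threshold:
        return 1
    if p <= -threshold:
        return 0
    return None
```

With both excess electrons on one diagonal the function returns +1 or -1, and with the charge spread evenly over all four dots it returns 0, matching the three cell states of figure 2.2.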

As already stated, QDs can be influenced externally, allowing the designer to fix the polarization of a QCA cell. This effect is used to input information into one QCA cell, called the driver cell. When a driver cell is placed side-by-side with other QCA cells, its polarization causes the polarization of the adjacent cell to change. When the adjacent cell is polarized, it passes its state again to the next cell and so on [28]. Figure 2.3 depicts such a structure, where the left-most cell represents the driver cell with fixed polarization representing a logic "1". With every cell passing its polarization to the cell to its right, the polarization of the right-most cell can be measured and the logic value, which propagated through the structure can be extracted. Due to the property of just passing the information from its input to its output, the shown structure represents a QCA wire.



Figure 2.3: Adjacent QCA-cells forming a wire segment



Figure 2.4: Different QCA Inverter representations

#### 2.2.2 Gates

In this subsection, QCA cells are combined to form different logic gates, which make up the gate library used later for the design of QCA logic circuits. Some of these gates are inherited from the QCA ONE library [39], which is already used fully [37] or partially [59, 15] as a basis for some works. The QCA ONE library proposes gates formed by one tile as well as gates formed by multiple tiles. A tile describes a uniform size of gates. For the QCA ONE library, a tile is of dimension $5 \times 5$ cells. A major drawback of this library is that it presupposes a clocking scheme (USE) in order to form multi-tile gates. This restricts the underlying placement and routing algorithm in the clocking domain. Moreover, manual changes to the clock zones, size, or positioning of the standard cells are not allowed [39], imposing even more restrictions on the designer. For this work, the standard cell library contains only gates that occupy one tile. Every other logic function is composed of these standard cells. The tiles used are also of dimension $5 \times 5$ cells, which means that all standard gates are reduced to this area.

Inverters and majority gates are the main building blocks of QCA circuits. Starting with the inverter or NOT gate, the simplest implementation is shown in figure 2.4(a). It consists of two wire segments that are shifted by exactly one cell height, so that the polarization is transferred diagonally, resulting in an inversion of the input [41]. In order to obtain a gate that is more robust against disturbances, the C-shaped inverter shown in figure 2.4(b) was introduced [39]. This gate is used as standard in many libraries and works [37, 15, 31, 11], but it should be mentioned that this implementation is highly prone to complex single-electron faults [32] and even common displacement faults [45], suggesting the addition of an inverter leg that results in an E-shaped gate structure.



Figure 2.5: The QCA Majority gate

Nevertheless the C-shaped inverter gate is part of the QCA ONE library and is also selected as standard gate for this work.

Due to its importance in QCA technology, the next gate to be investigated is the majority gate. Definition 2.1.3 suggests that the implementation of this function in CMOS technology requires multiple AND and OR gates to be placed and routed. In QCA technology, on the other hand, a majority function can be represented by exactly one gate, which is one of the major advantages of QCA over CMOS technology. There are two main implementations of the majority gate: the rotated majority gate in Figure 2.5(a), which is used in QCA ONE, and the +-majority gate shown in Figure 2.5(b). Both of these implementations have their advantages and drawbacks. On the one hand, the rotated majority gate exhibits a sufficiently high degree of fault tolerance against cell displacement or misalignment but has a very poor degree of fault tolerance against single cell omission or extra cell deposition [25]. On the other hand, the +-majority gate is very prone to cell displacement, but it is also used as a building block for the AND and OR gates in most works. This means that the fabrication process for all these gates is very similar, and since this work aims to enhance the design process of QCA circuits, the +-majority gate is chosen as the standard gate for this work.

Following Definition 2.1.3, the AND gate can be derived by fixing one input of the majority gate to logic 0, while the OR gate is obtained by fixing one input to logic 1. The resulting gates, which are also part of the standard gate library of this work, are shown in figure 2.5(c) and figure 2.5(d).

Figure 2.6: QCA wires

Unlike in CMOS technology, wires in QCA must also be introduced as gates. As already seen in figure 2.3, a QCA wire also consists of QCA cells and therefore forms a gate. Since wires do not add functionality to the logic, they are viewed as logic gates representing the identity function. This property has the major drawback that the cost of wires is comparable to that of other logic gates, so wire cost is used as one major cost metric for the circuit design. Until now, only straight planar wires have been introduced. From the implementation of the majority gate, it can already be seen that data is transferred not only in the x-dimension but also in the y-dimension, requiring the wiring to be able to flow in both dimensions as well. When gates are placed side by side in two dimensions, the resulting circuit can be viewed as a 2D grid. In order to allow information to change its propagation direction from the x-direction to the y-direction and vice versa, bent wires also have to be introduced. They are depicted in Figure 2.6(b) and show a 90-degree bend. Given that all tiles can be rotated by 90°, 180°, and 270°, a tile connected to a bent wire can be routed to each adjacent tile of the bent wire. Also, since all gates introduced so far have a fan-out of one, a fan-out node is needed to duplicate signals. This is done by adding a bent wire to a straight wire, resulting in the fan-out shown in Figure 2.6(c).
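The effect of tile rotation on routing can be sketched with a simple port model. In this illustration a tile is reduced to the set of sides on which it has a connection; the side names, the concrete port sets for the straight and bent wire, and the function `rotate` are assumptions, and signal direction (input versus output) is deliberately ignored.

```python
from typing import FrozenSet

DIRECTIONS = ("N", "E", "S", "W")  # sides of a tile in clockwise order


def rotate(ports: FrozenSet[str], quarter_turns: int) -> FrozenSet[str]:
    """Rotate a tile's connected sides clockwise by quarter_turns * 90 degrees."""
    return frozenset(
        DIRECTIONS[(DIRECTIONS.index(p) + quarter_turns) % 4] for p in ports
    )


STRAIGHT_WIRE = frozenset({"W", "E"})  # assumed base orientation
BENT_WIRE = frozenset({"W", "S"})      # assumed base orientation

# Rotations by 0, 90, 180, and 270 degrees.
bent_orientations = {rotate(BENT_WIRE, k) for k in range(4)}
straight_orientations = {rotate(STRAIGHT_WIRE, k) for k in range(4)}

assert len(bent_orientations) == 4      # every pair of adjacent sides is reachable
assert len(straight_orientations) == 2  # horizontal and vertical
```

The four distinct orientations of the bent wire are what allow a signal entering from any side to leave toward either neighboring side, while the straight wire only distinguishes horizontal from vertical.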

The last special case of wires is the crossing. By rotating the cells of one wire string by 45°, the rotated cells do not have crosstalk with non-rotated cells [45], as shown in figure 2.7(a). This solution is very handy because it preserves the planar structure of the circuit and is therefore called *coplanar wire crossing*. Furthermore, the possibility of multi-layer QCA has been investigated [17] and found especially useful in the case of wire crossings. To use this, one wire string is raised to an additional higher layer, which is connected with a vertical interconnect as in figure 2.7(b). The signal transmission in the vertically stacked cells works just as in the horizontal direction. To impede any crosstalk between the wire strings, two intermediate layers of cells are used in the vertical direction. Theoretically, the top layer can be used not only for wires: since signal distribution works just as in the ground layer, gates can also be placed in these additional layers. Simulations have shown that coplanar crossovers significantly reduce the coupling between the horizontal wire segments. This makes the horizontal interconnect very sensitive to crosstalk and therefore highly prone to cell displacements. Multilayer circuits, on the other hand, show high robustness and are therefore used as standard in this library. In the tile representation of the multilayer interconnect, the top wire string is described with a $\times$, while the vertical layers are described with a circle. It has to be mentioned, however, that the complex structure of the multilayer wire crossing incurs high costs, so such crossings are avoided wherever possible in the design of QCA circuits. In Figure 2.8 all gates used in this work are summarized.

Figure 2.7: Different QCA wire crossing implementations

#### 2.2.3 Clocking

As mentioned above, data transfer in the QCA paradigm is accomplished by cell-to-cell interaction. Given a fixed polarization of a cell, the next cell reacts to the Coulomb repulsion and changes its polarization accordingly. Looking at the wire segment in figure 2.3, the leftmost cell has a fixed polarization and is called the input. After some time the information propagates through to the rightmost cell, representing the output of the simplified QCA-circuit. Generally, a QCA circuit can be seen as an assembly of cells on a two-dimensional grid or array, where each cell has a position with x and y coordinates assigned. Every cell with fixed polarization is called an input cell and drives the other cells gradually into matching polarization. When all cells have matching polarization, meaning that the electrons in two adjacent cells have the maximum distance and therefore minimum energy following Coulomb repulsion,



Figure 2.8: QCA Standard Library



Figure 2.9: Schematic of a combined QCA and CMOS system [29].

the QCA array is said to be in *ground state*. When a cell has no adjacent cell in the propagation direction, it is called an output cell. While the polarization is propagating through the array, the direction of propagation always points away from the input cells and towards the output cells. In reality, the propagation does not proceed gradually through the array but rather sloshes around, showing quite unpredictable behavior. This is the first reason why *clocking*, which involves well-defined states for the polarization of the cells and enables well-ordered signal propagation, is introduced. Another reason is that the straightforward process described so far, called abrupt switching with dissipative coupling to the environment, requires the QCA array to be embedded into a CMOS environment, as shown in Figure 2.9. In the following, the evolution from this primitive clocking to the currently used clocking is briefly described.

Therefore, the QCA array is divided into smaller decoupled sub-regions called *clock zones*, and each clock zone is assigned an external signal, a clock. The clock



Figure 2.10: QCA wire divided into the four clock zones according to Bennett clocking

can then activate and deactivate the cells of a zone such that the information propagates gradually from one zone to the next through the whole QCA circuit. In the approach first used, the clock lowers the QD barriers of all cells in a clocking zone when a new input is applied. The electrons are then no longer trapped and can move freely following Coulomb repulsion, thereby taking over the polarization of the input cell. When all cells in the region are stable, the barriers are raised again, localizing the electrons in the cells, which now have the desired polarization. At the same time, the barriers of the subsequent clocking zone are lowered, so that the previous sub-region acts as a fixed input and its polarization is again taken over. In this way, the information gradually propagates through the whole circuit [28]. Current approaches create electrical fields with an external clock generator and distribute them to the cells through the device substrate using embedded electrodes. Thereby the energy level of the *null* state can be controlled, resulting in an effect equivalent to that of the former approach [56].

A wire divided into such clock zones is shown in Figure 2.10. The colors of the zones and cells, as well as the zone numbers, redundantly encode the type of clocking zone. The zones differ in the externally applied electrical field and therefore in the energy of their cells. In QCA the clocking is divided into four consecutive states: *switch*, *hold*, *release* and *relax*. They are aligned in a pipeline-like structure, where each of these states is phase-shifted by $\pi/2$, forming a $2\pi$ clock cycle. In the switch phase, cells start to polarize depending on the polarization of the driving cell. Once the cells are polarized, they are fixed in the hold phase. Afterwards, in the release phase, the excitation gradually decreases, resulting in the unexcited relax state [41]. After one clocking cycle, the next cycle starts with the same order of states, which are also denoted by numbers $i \in \{1, 2, 3, 4\}$. Hence, consecutive clock numbers satisfy $i_{next} = (i_{previous} + 1) \mod clk$. An example of such a clocking pipeline is depicted in Figure 2.11(a).
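The pipeline behavior of the four phases can be illustrated with a minimal sketch. It assumes zero-based zone numbers 0 to 3 (instead of 1 to 4) and discrete time steps of a quarter clock cycle; the function names `next_zone` and `phase_of` are hypothetical and only serve the illustration.

```python
PHASES = ("switch", "hold", "release", "relax")
CLK = 4  # number of clock zones in QCA


def next_zone(i: int) -> int:
    """Clock number that follows zone i (zero-based, assumption)."""
    return (i + 1) % CLK


def phase_of(zone: int, step: int) -> str:
    """Phase of a zone at a quarter-cycle time step.

    Each zone lags its predecessor by a quarter cycle (pi/2), so a zone
    switches exactly while the previous zone holds its value.
    """
    return PHASES[(step - zone) % CLK]


if __name__ == "__main__":
    for step in range(CLK):
        print(step, [phase_of(zone, step) for zone in range(CLK)])
```

In each printed row, the zone in the switch phase directly follows a zone in the hold phase, which is the property that lets information move from one zone to the next.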

The described clocking is named *Landauer clocking*. The inventor Rolf Landauer himself pointed out the vast power dissipation of this clocking mechanism. The main cause of the high power dissipation is the *erase* function, which occurs because in Landauer clocking the release state directly follows the hold phase, irreversibly



Figure 2.11: QCA clocking pipeline

erasing the information and therefore transforming it into heat [26]. To tackle this problem in the QCA domain, Landauer pointed out that the erase function has to be eliminated from the clocking. He argues that every erased bit dissipates at least $k_B T \ln(2)$ as heat [24]. For example, if a QCA cell has a size of $1\,nm \times 1\,nm$ and an operating frequency of 100 GHz, the corresponding device density amounts to $10^{14}\,cm^{-2}$. Assuming further a dissipation of 0.1 eV per cell and clock cycle, the total power dissipation is 160 kW $cm^{-2}$. This directly implies that a device operating with this clocking would be inoperable (it would evaporate due to heat) [27]. Bennett clocking tackles exactly this problem by altering the timing of the clocking signals. Just as in Landauer clocking, the clocking wave propagates in one direction, but it leaves no trailing edge when information is passed. Instead, the cells are held in the excited state until the information has propagated through the whole QCA array. Once the output has been read, the excitation is released in reverse order, so that no erase function occurs. This *quasi-adiabatic* clocking thus leads to minimal power dissipation, but with two drawbacks: the effective clock rate is at least halved due to the additional backwards propagation, and since only one signal vector can be transmitted through the system at a time, the pipeline capabilities are reduced [27]. The resulting clocking scheme for Bennett clocking is shown in Figure 2.11(b). In order to preserve the pipeline-like behavior of Landauer clocking and the energetic advantages of Bennett



Figure 2.12: Cell based layout of a 2:1 mux [33]

clocking, QCA circuits are divided into sub-regions, each of which is Bennett clocked.
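The power density quoted for Landauer clocking can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers from the example above (cell size, frequency, energy per cycle); the function name `power_density_w_per_cm2` is hypothetical.

```python
EV_TO_JOULE = 1.602176634e-19  # one electronvolt in joules (SI definition)


def power_density_w_per_cm2(cell_edge_nm: float, freq_hz: float, energy_ev: float) -> float:
    """Dissipated power per square centimetre for densely packed square cells."""
    cells_per_cm = 1e7 / cell_edge_nm      # 1 cm equals 1e7 nm
    density_per_cm2 = cells_per_cm ** 2    # cells per square centimetre
    return density_per_cm2 * energy_ev * EV_TO_JOULE * freq_hz


if __name__ == "__main__":
    p = power_density_w_per_cm2(cell_edge_nm=1.0, freq_hz=100e9, energy_ev=0.1)
    print(f"{p / 1e3:.0f} kW/cm^2")
```

With $10^{14}$ cells per $cm^2$, each dissipating $0.1\,eV \approx 1.6 \times 10^{-20}\,J$ a total of $10^{11}$ times per second, the result is about $1.6 \times 10^{5}\,W\,cm^{-2}$, matching the 160 kW $cm^{-2}$ stated above.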

As already stated, the QCA array has to be divided into clock zones in order to apply proper clocking. Allowing an arbitrary number of cells in one clock zone gives considerable freedom in designing clock zones with variable geometries. This clocking is referred to as *cell-based* and complicates the fabrication process due to the variety of clock zone geometries. Assuming that uniform fabrication is necessary to manufacture circuits with millions of cells, this clocking becomes infeasible for large circuits. Moreover, since single cells can be clocked in this scheme, electrodes of the same size would have to be fabricated to provide a clocking signal to a single-cell region. Since this is not feasible either, this design is obsolete. An example of a cell-based clocking design of a 2:1 mux can be seen in Figure 2.12. To achieve uniform clock zones to which clocking signals can feasibly be distributed, *tile-based* clocking is introduced. This design provides uniform tiles of size $3 \times 3$ or $5 \times 5$. For clock zones larger than this, information propagation was suggested to be erroneous, which is a further argument against cell-based clocking [48].

Tile-based clocking has led to several proposed clocking schemes, each of which defines a certain distribution of clock zones. Since they follow a uniform pattern, they can easily be extended to any circuit size. Figure 2.13 shows three clocking schemes, each based on a different idea. Since information flow is only allowed in ascending clock order modulo *clk*, the 2DDWave clocking scheme in Figure 2.13(a) only allows information to propagate in two directions, south and east [55]. This simplicity rules out back propagation, prohibiting the placement and routing of sequential circuits. It also restricts gates in the scheme to at most two inputs, thereby ruling out the placement and routing of majority gates. The USE scheme, shown in Figure 2.13(b), tackles the first problem by introducing clocking



Figure 2.13: Different clocking Schemes in QCA

loops into the scheme, making it possible to place sequential circuits [11]. To tackle the second problem, the RES scheme, shown in Figure 2.13(c), makes it possible to place gates with three inputs. Since a tile has at most four adjacent tiles, one of which has to receive the output of the gate on the tile, three is the maximum number of inputs. As already stated, this is especially important for the placement of majority gates [18]. In QCA technology, they can be realized on a single tile, which makes them one of the key advantages of QCA technology.

# 2.3 Placement and routing problem

In simple terms, the placement and routing (P&R) problem can be formulated as follows: logic gates, which are represented by vertices in a logic network, have to be placed, and the connections between them, which are represented by edges in the logic network, have to be routed via wires such that the logic of the network is retained in the designed circuit. Because the function of each gate at its position in the layout strongly depends on the clocking in the QCA domain, the circuit additionally has to satisfy two timing constraints in order to function correctly. In this section, the notation and constraints of placement and routing in the QCA domain are introduced. The P&R problem, originally derived in [56], builds on a grid that enables tile-based design in conjunction with a logic network.

**Definition 2.3.1.** A *layout* is defined by a $w \times h$ grid $\Gamma_{w,h}$ and a graph $G = (V, E)$, which is placed on the grid. Each *tile* of the layout can be accessed via its x and y coordinates. The set of tiles is denoted as $T$ with $t = (x,y) \in T$. Any vertex $v = (x,y)$ of the graph is restricted to the boundaries $0 \le x < w$ and $0 \le y < h$. For edges $\{(x,y),(x^*,y^*)\}$ it holds that $|x-x^*| + |y-y^*| = 1$ with $0 \le x, x^* < w$ and $0 \le y, y^* < h$.

**Definition 2.3.2.** A *gate-level layout* describes a layout grid in combination with a logic network $N=(\Lambda,I,\Sigma,O)$. In addition to the already known placement mapping $p$, which assigns nodes to tiles, there are two further mappings: the routing $r$, which assigns logic network signals to layout paths (connected tiles), and the clocking $c$, which assigns clock numbers to tiles. The gate-level layout is therefore described as $L=(\Gamma,N,p,r,c)$. Nodes placed on the gate-level layout are referred to as *gates*. Two tiles $t_i=(x_i,y_i)$ and $t_j=(x_j,y_j)$ with $|x_i-x_j|+|y_i-y_j|=1$ are called *adjacent*. A path that is routed through adjacent tiles is called a *wire*. In this context, one tile corresponds to a wire segment. If neither a gate nor a wire segment is placed on a tile, the tile is *empty*. It follows that a layout with only empty tiles is also empty. A layout is said to be $S$-clocked if it follows a clocking scheme $S$. Otherwise, it is irregularly clocked. Moreover, a tile adjacent to a tile $t \in T$, where $T$ is the set of all tiles, is an *incoming* tile $t^-$ if $(c(t)-c(t^-)) \mod clk = 1$. This means that the incoming tile can forward information to the tile under consideration according to pipelined clocking. For *outgoing* tiles $t^+$, it holds accordingly that $(c(t^+)-c(t)) \mod clk = 1$. For QCA, it was already stated that the number of clocks is $clk=4$.
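The notions of adjacent, incoming and outgoing tiles can be made concrete with a small sketch. It assumes zero-based clock numbers and, as an assumption not stated in the definition, the diagonal zone assignment $c(x,y) = (x+y) \mod 4$ for the 2DDWave scheme, with y growing towards the south. All function names are hypothetical.

```python
CLK = 4  # number of clock zones in QCA


def twoddwave(x: int, y: int) -> int:
    """Assumed 2DDWave clocking: zones form diagonal wavefronts."""
    return (x + y) % CLK


def adjacent(t1, t2) -> bool:
    """Two tiles are adjacent if their Manhattan distance is exactly one."""
    return abs(t1[0] - t2[0]) + abs(t1[1] - t2[1]) == 1


def neighbours(t, w, h):
    """All tiles adjacent to t that lie inside the w x h grid."""
    x, y = t
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < w and 0 <= b < h]


def incoming(t, w, h, c=twoddwave):
    """Adjacent tiles t- with (c(t) - c(t-)) mod clk == 1."""
    return [n for n in neighbours(t, w, h) if (c(*t) - c(*n)) % CLK == 1]


def outgoing(t, w, h, c=twoddwave):
    """Adjacent tiles t+ with (c(t+) - c(t)) mod clk == 1."""
    return [n for n in neighbours(t, w, h) if (c(*n) - c(*t)) % CLK == 1]


if __name__ == "__main__":
    print("incoming:", incoming((1, 1), 3, 3))
    print("outgoing:", outgoing((1, 1), 3, 3))
```

Under the assumed zone assignment, the centre tile of a $3 \times 3$ grid receives information from its western and northern neighbours and forwards it to the east and south.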

From this definition we can outline the difficulty of placing and routing a logic network onto a two-dimensional grid, where the only departure from planarity is the wire crossing, which is costly and should therefore be minimized. One major challenge for P&R algorithms is signal synchronization, which results in a strong dependency between clocking and signal distribution. As already pointed out, for every signal path it has to hold that information can only propagate from a tile with clock number $i$ to an outgoing tile with clock number $(i+1 \mod clk)$. This property is called the *local synchronization constraint*. The existence of valid signal paths can be ensured by using predefined clocking schemes, which, however, impose constraints of their own. In addition, the *global synchronization constraint* states that any two signal paths leading to the same tile need to pass the same number of tiles, starting at their respective primary inputs. Since this constraint has to hold for every gate, the complexity increases rapidly with growing network size. The combination of all these challenges forms a P&R problem that is commonly accepted as $\mathcal{NP}$-complete [57].
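A check of the global synchronization constraint can be sketched as follows. The layout is represented, as a simplifying assumption, by a hypothetical dictionary `preds` that maps every occupied tile to the tiles feeding it, with an empty list for primary inputs; the routing is assumed to be acyclic.

```python
def path_lengths(preds):
    """Number of tiles passed since the primary inputs for every tile.

    A tile maps to None if two paths of different length meet at it or
    at one of its predecessors, i.e. if global synchronization is violated.
    """
    depth = {}

    def visit(tile):
        if tile not in depth:
            lengths = {visit(p) for p in preds[tile]}
            if None in lengths or len(lengths) > 1:
                depth[tile] = None          # unbalanced paths meet here
            else:
                depth[tile] = (lengths.pop() + 1) if lengths else 0
        return depth[tile]

    for tile in preds:
        visit(tile)
    return depth


def is_globally_synchronized(preds) -> bool:
    """True if all paths leading to the same tile pass equally many tiles."""
    return None not in path_lengths(preds).values()


if __name__ == "__main__":
    balanced = {(1, 0): [], (0, 1): [], (1, 1): [(1, 0), (0, 1)]}
    unbalanced = {(0, 0): [], (1, 0): [(0, 0)], (0, 1): [], (1, 1): [(1, 0), (0, 1)]}
    print(is_globally_synchronized(balanced), is_globally_synchronized(unbalanced))
```

In the second example, one input reaches the gate at (1, 1) directly while the other passes an additional wire tile, so the two signals would arrive in different clock cycles.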

Once a gate-level layout is designed, a technology still has to be mapped onto it. For this mapping, the standard library proposed in subsection 2.2.2 is used. Because the definition of the P&R problem is based on the book [56], which defines it for the domain of field-coupled nanotechnologies (FCN), several FCN technologies can be mapped onto such a gate-level layout. For example, changing the number of clocks to clk = 3 also allows placement and routing for *Nanomagnet Logic* (NML). Hence, although the distribution networks in this work are designed only for QCA, the ideas may also be transferred to other FCN technologies.



Figure 2.14: Eccles-Jordan-Flip-Flop

## 2.4 Sequentiality

The section on the placement and routing problem summarized the main constraints for the design of QCA circuits. With this knowledge, both combinational and sequential circuits can be designed and verified. While combinational circuits can be understood as a combination of logic operations, sequential circuits additionally reuse information computed by their logic, which demands a back-loop functionality. This back-looping is achieved through storage elements, which are built according to the clocking properties of the circuit and technology. Since clocking in QCA is completely different from clocking in CMOS, storage elements and sequentiality have to be rethought. Hence, the properties of storage elements are first discussed in the CMOS domain and then transferred to the QCA domain. It also has to be mentioned that the clocking in QCA only supports synchronous information flow, which is why the first part considers only synchronous CMOS logic.

#### 2.4.1 CMOS storage elements

In order to achieve sequential behavior in CMOS technology, *registers* are implemented in the circuits. A register is capable of storing and stabilizing several bits of information, which are then looped back to the logic via wires. In order to understand registers, *flip-flops*, each storing one bit, and their building blocks, *latches*, have to be discussed first.

The simplest storage element in CMOS that can store one bit is the so-called Eccles-Jordan (EJ) flip-flop (FF). It is formed by connecting the output of one inverter to the input of another inverter and vice versa. The logic level is thus propagated from one inverter stage to the other, while the same value is held in the loop. Due to its simplicity, a large voltage shift is needed to change the stored value, making the FF very sluggish.

Also the direct connection of the input and output makes their voltage levels highly dependent on each other, which can be interpreted as noisy behavior, also referred to as



| S | R | Qnext | State           |
|---|---|-------|-----------------|
| 0 | 0 | Q     | hold state      |
| 0 | 1 | 0     | reset           |
| 1 | 0 | 1     | set             |
| 1 | 1 | X     | undefined state |

Figure 2.15: SR-Latch

transparency [20]. To avoid errors caused by the transparent behavior in the transition region, the input voltage needs to be very stable. In order to obtain more robust storage elements, more sophisticated designs were developed while preserving the general idea introduced by the EJ-FF.

By replacing the inverters in an EJ-FF with NOR gates, as shown in Figure 2.15, the transparency is reduced and two inputs, a set S and a reset R, are introduced, leading to four possible input combinations and clearly defined states. When both S=0 and R=0, the current value is latched and the storage element is in the hold state. By holding R=0 and changing to S=1, the latch output is forced to Q=1, called the set state. Reversing both inputs to R=1 and S=0 leads to the reset state, where the latch output is Q=0. With S=R=1, the latch value is unstable, so we get an undefined state, which prohibits this input combination. Based on its behavior, this element is called Set-Reset-Latch (SR-Latch).
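The behavior of the SR-Latch can be reproduced by iterating two cross-coupled NOR gates until their outputs settle. The following is a minimal gate-level sketch without any timing model; the function names and the fixed iteration bound are assumptions made for the illustration.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR on 0/1 values."""
    return int(not (a or b))


def sr_latch(s: int, r: int, q: int):
    """Next output of a cross-coupled NOR latch that currently stores q.

    Returns None for the forbidden input S = R = 1.
    """
    if s and r:
        return None                      # undefined state
    q_bar = 1 - q
    for _ in range(4):                   # a few passes suffice to settle
        new_q = nor(r, q_bar)            # Q  = NOR(R, not Q)
        new_q_bar = nor(s, new_q)        # not Q = NOR(S, Q)
        if (new_q, new_q_bar) == (q, q_bar):
            return q                     # outputs are stable
        q, q_bar = new_q, new_q_bar
    return None


if __name__ == "__main__":
    for s, r in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(s, r, [sr_latch(s, r, q) for q in (0, 1)])
```

The printed rows correspond to the hold, reset, set and undefined states of the table in Figure 2.15.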

In order to achieve synchronous behavior, the Set and Reset inputs are clocked, resulting in a gated SR-Latch. To eliminate the undefined state, the set and inverted reset inputs are connected together, forming the D-Latch shown in Figure 2.16. Now a value is held when the clock clk = 0, and the D-Latch propagates the input value D when clk = 1. To give a memory element the ability to clock Boolean operations from one stage to another, *edge-triggered flip-flops* are introduced. Rising clock edges determine when the FF overwrites and passes its data. An edge-triggered D-FF can be constructed by connecting two D-Latch stages in series. The master stage is activated at rising clock edges, while the slave stage is activated at falling clock edges. In this way, the D-FF takes new data at a rising edge and passes it on from its output in the next clock cycle. The edge-triggered FF also eliminates the transparency. In contrast to D-FFs, which are edge-sensitive, latches are considered level-sensitive [20].



| clk | D | Qnext | State      |
|-----|---|-------|------------|
| 0   | * | Q     | hold state |
| 1   | 0 | 0     | propagate  |
| 1   | 1 | 1     | propagate  |

Figure 2.16: D-Latch
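The difference between the level-sensitive D-Latch and a two-stage D-FF can be sketched behaviorally. The model below is an assumption-laden simplification: the master stage is enabled while clk = 1 and the slave stage while clk = 0, clock levels stand in for edges, and the names `d_latch` and `MasterSlaveDFF` are hypothetical.

```python
def d_latch(clk: int, d: int, q: int) -> int:
    """Level-sensitive D-Latch: propagate D while clk = 1, otherwise hold q."""
    return d if clk else q


class MasterSlaveDFF:
    """Two D-Latch stages in series, enabled on opposite clock levels."""

    def __init__(self) -> None:
        self.master = 0
        self.slave = 0

    def step(self, clk: int, d: int) -> int:
        # The master stage samples D while the clock is high ...
        self.master = d_latch(clk, d, self.master)
        # ... and the slave stage takes over the master value while it is low.
        self.slave = d_latch(1 - clk, self.master, self.slave)
        return self.slave


if __name__ == "__main__":
    ff = MasterSlaveDFF()
    for clk, d in [(1, 1), (0, 0), (1, 0), (0, 1)]:
        print(clk, d, ff.step(clk, d))
```

Because the two stages are never enabled at the same time, a change of D never reaches the output within the same clock level, which is the non-transparency discussed above.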

#### 2.4.2 QCA storage elements

The goal of a storage element in QCA is to have all the properties of a D-FF, so that it can store data from one clock cycle and pass it to the combinational logic in the next clock cycle. The element should also not be transparent, and the effect of an edge-triggered element has to be discussed. In the QCA ONE library, an effort was made to translate the D-FF into QCA by simply replacing the CMOS gates and wires with the corresponding QCA gates. To this end, a second external *clk* signal is introduced, mimicking the clock of a CMOS circuit. However, since the QCA circuit already has its own clock, on which all gates of the circuit depend, and since clocking differs substantially between QCA and CMOS, as already observed in subsection 2.2.3, this implementation is questionable and the implementation of latches and FFs in QCA has to be rethought from scratch.

This leads to the idea of considering QCA storage elements in terms of their clocking. The effort of rethinking storage elements in QCA from scratch was already made in [52]. The proposed solution for a QCA element that is able to store one bit is rather simple. Since every tile is clocked on its own, a simple latch can be formed by a wire segment that is kept in the hold phase of the clocking. Figure 2.17 depicts the clocking of this wire segment. The data propagates into the latch, and by extending the hold phase by exactly one clock cycle, the data is passed to the next logic block exactly one clock cycle later. The question arises whether this element is a level-sensitive latch or an edge-sensitive FF. In favor of the latch, one could argue that the information is clearly not sampled at a single point in time as in an ideal FF, but is rather taken into the latch during the whole switch phase. On the other hand, one could argue that the switch phase can be seen as a real-time rising clock edge, where the data is sampled and then held while the clock is in the hold phase or "1". For this work, the comparison to a FF seems more suitable because no transparency is possible due to the strictly independently clocked tiles in QCA. The only functionality the wire-FF lacks is a set-reset option. An idea for this was also proposed in [56]: the wire is replaced by a majority gate with the same clocking. The D-FF then has basically the same functionality but has two external inputs which



Figure 2.17: Clocking of a basic latch in QCA

can force the gate to zero by setting both inputs to zero or force the gate to one by setting both inputs to one. In the other configurations, the majority-FF acts as a normal storage element.
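The set-reset behavior of the majority-FF follows directly from the majority function, as the following minimal sketch shows. It models only the logic of the gate, not its clocking, and the function names are hypothetical.

```python
def maj(a: int, b: int, c: int) -> int:
    """Three-input majority function on 0/1 values."""
    return int(a + b + c >= 2)


def majority_ff(data: int, ext1: int, ext2: int) -> int:
    """Value taken over by a majority-FF with two external control inputs."""
    return maj(data, ext1, ext2)


if __name__ == "__main__":
    for ext1, ext2 in [(0, 0), (1, 1), (0, 1), (1, 0)]:
        print(ext1, ext2, [majority_ff(d, ext1, ext2) for d in (0, 1)])
```

With both external inputs at zero or both at one, the data input is outvoted; with opposing external inputs, the data input decides the majority and the element stores it unchanged.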

# 3 State of the Art

In this chapter, various approaches to solving the placement and routing problem for QCA are reviewed. In the first part, algorithms that work only with combinational circuits are investigated on the basis of the theoretical groundwork laid in chapter 2. These algorithms are further divided into those determining *optimal* and those determining *scalable* solutions. In the second part, ideas and challenges of sequential placement and routing algorithms are investigated.

# 3.1 Combinational P&R Algorithms

In order to understand optimal solutions for placement and routing, recall from section 2.3 that this problem is $\mathcal{NP}$-hard. Here the complexity class $\mathcal{NP}$ (nondeterministic polynomial time) describes the set of decision problems for which every instance with a positive answer has a proof that can be verified in polynomial time by a deterministic Turing machine. The existence of these problems has led to several ideas for solving them, one of these being Satisfiability Modulo Theories (SMT). The satisfiability problem can be formulated as the question of whether there exists a model under which a first-order formula over some theories evaluates to true. The corresponding solver for propositional logic is a Boolean satisfiability (SAT) solver, whose propositions are Boolean equations that have to be proven satisfiable. Building on this basic solver, two different solving strategies have been proposed. The first strategy is called *eager SMT-solving* and is used for theories such as uninterpreted functions or bit-vectors, which can be reduced to propositional logic. Its first step therefore transforms the theory constraints into equisatisfiable propositional logic. These problem instances are then passed to SAT solvers, which check them for satisfiability. Due to the equisatisfiability of the problems, a solution to the original problem can be derived from the solution of the propositional formula. The second approach, called *lazy SMT-solving*, relies on the assistance of theory solvers; its process is depicted in Figure 3.1. In the first step, the first-order problem with formula $\varphi$ is transformed into a Boolean abstraction $\varphi'$, mapping the concrete problem to an abstract problem over a finite set of Boolean predicates [4]. The abstraction is then passed to the SAT solver, which computes solutions and hands them to a set of theory solvers. They in turn check whether the Boolean predicates hold, that is, whether they are consistent in the provided solution. If so, the abstraction



Figure 3.1: Lazy SMT-solving process [1]

is satisfiable and the theory solver instance returns SAT. Otherwise, UNSAT is passed back to the SAT solver together with an explanation, which helps to refine the abstraction. If the abstraction is finally found to be unsatisfiable, the problem is said to be unsatisfiable [1].
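The interplay between SAT solver and theory solver in lazy SMT-solving can be illustrated with a deliberately naive toy loop. Everything in it is an assumption made for the illustration: the SAT solver is replaced by brute-force enumeration, the theory consists of bounds on a single integer variable x, and the explanation passed back is simply a clause that blocks the whole rejected assignment, whereas real solvers return much smaller explanations.

```python
from itertools import product

NEG_INF, POS_INF = float("-inf"), float("inf")


def sat_solve(clauses, names):
    """Brute-force stand-in for a SAT solver: first satisfying model or None."""
    for values in product([False, True], repeat=len(names)):
        model = dict(zip(names, values))
        if all(any(model[a] == pol for a, pol in clause) for clause in clauses):
            return model
    return None


def theory_check(model, atoms):
    """Toy theory solver: are the chosen bounds on the integer x consistent?"""
    lo, hi = NEG_INF, POS_INF
    for name, value in model.items():
        op, k = atoms[name]
        if op == "<":
            if value:
                hi = min(hi, k - 1)      # x < k
            else:
                lo = max(lo, k)          # not (x < k)
        else:
            if value:
                lo = max(lo, k + 1)      # x > k
            else:
                hi = min(hi, k)          # not (x > k)
    return lo <= hi


def lazy_smt(clauses, atoms):
    """Returns (model or None, number of refinements of the abstraction)."""
    clauses = [list(c) for c in clauses]
    names = sorted(atoms)
    refinements = 0
    while True:
        model = sat_solve(clauses, names)
        if model is None:
            return None, refinements     # abstraction unsatisfiable
        if theory_check(model, atoms):
            return model, refinements    # consistent with the theory
        # Naive explanation: forbid exactly this assignment.
        clauses.append([(a, not v) for a, v in model.items()])
        refinements += 1


if __name__ == "__main__":
    atoms = {"a1": ("<", 3), "a2": (">", 5), "a3": ("<", 5)}
    phi = [[("a1", True), ("a2", True)], [("a3", True)]]  # (a1 or a2) and a3
    print(lazy_smt(phi, atoms))
```

In the example, the first Boolean model selects x > 5 together with x < 5, which the theory check rejects; after one refinement the loop finds the consistent choice x < 3.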

With this knowledge, optimal placement and routing algorithms can be discussed by reviewing two approaches proposed in [56]. The first algorithm, "Exact Placement and Routing", finds a valid placement, routing and clocking, also described as the tuple $(p,r,c)$, given an empty layout $L$ and a logic network $N$. In order to find an optimal solution, the minimum layout size $w \times h$ for which the constraints on $(p,r,c)$ hold has to be determined. Therefore, all possible layout sizes are encoded and passed to a SAT solver iteratively, and the first layout for which the solver returns true is the minimum and thus the optimal solution. The experimental results show that the layouts determined by the algorithm are many times smaller than those of the state of the art it was compared against [15, 54]. However, due to the complexity of the satisfiability-based approach, the algorithm times out even for quite small circuits, making it unsuitable for the manufacturing of commercial QCA circuits.
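The iteration over layout sizes can be sketched as follows. The enumeration order by ascending area is an assumption, and the satisfiability query is abstracted into a hypothetical callback `is_satisfiable(w, h)`, which in a real implementation would encode the constraints on $(p,r,c)$ for that size and invoke a solver.

```python
def layout_sizes(max_area: int):
    """Yield all (w, h) with w * h <= max_area in order of ascending area."""
    for area in range(1, max_area + 1):
        for w in range(1, area + 1):
            if area % w == 0:
                yield w, area // w


def exact_placement_and_routing(is_satisfiable, max_area: int):
    """Return the first, and hence area-minimal, satisfiable layout size."""
    for w, h in layout_sizes(max_area):
        if is_satisfiable(w, h):
            return w, h
    return None  # no layout within the area bound


if __name__ == "__main__":
    # Stand-in oracle: pretend the network needs at least 2 columns and 3 rows.
    print(exact_placement_and_routing(lambda w, h: w >= 2 and h >= 3, 20))
```

Since every smaller area is tried and rejected first, the returned size is minimal in area by construction; the cost lies in the potentially expensive solver call hidden behind the callback.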

The other exact P&R algorithm proposed in [56] performs a *one-pass synthesis*, which combines logic synthesis and physical design in a single run with the idea of adapting the whole design process to the needs of the QCA design rules. This algorithm therefore has to tackle two $\mathcal{NP}$-hard problems, again relying on the power of satisfiability solvers. This particular algorithm uses eager SMT-solving. The idea is to eliminate some shortcomings of the two-step synthesis derived from CMOS. This includes treating wires as gates, since their costs are equal in QCA, and including data synchronization, which depends on the number of tiles passed. In this manner a SAT problem can be formed

and passed to a SAT solver. The instances are now created by passing only an empty layout $L$ of size $w \times h$. Even though this algorithm is able to find *truly minimal* solutions, since non-optimal logic networks are eliminated, the experimental results show the same problems as in the exact P&R approach: the high complexity of the satisfiability solver leads to a time-out of the algorithm for circuits with a gate count of $|N| \ge 30$.

These shortcomings have led to the use of *scalable* placement and routing algorithms. This approach trades optimality of the circuit for computing time, yielding larger, more expensive layouts, but in a short time. This makes them scalable in the time domain and therefore applicable to the manufacturing of commercial QCA circuits. All algorithms reviewed in the following are based on the original VLSI process, meaning that they treat logic synthesis and physical design as separate problems and not as a one-pass synthesis.

Starting with logic synthesis, many works present a preprocessing of logic networks that enables them to be translated directly into gate-level representations. Several steps are widely used to modify logic networks. The first is node duplication, or rather dummy node insertion. The idea of this step is to minimize wire crossings, which we have found to be very costly in QCA, and to reduce the number of fan-outs at the nodes, leading to a reduction of the placement and routing complexity. One simple algorithm for this visits every node in a breadth-first search from each primary output to the primary inputs. If the current node has not been visited yet, it is marked as visited; if an already marked node is visited, it is duplicated. This process is quite problematic because not only the visited node is duplicated but also all nodes in the subtree rooted at it [49]. This simple example already suggests that the insertion of dummy nodes can lead to uncontrollable growth of the logic network and thus of the layout size. More sophisticated algorithms do not incur such a high overhead in dummy nodes, but in return they cannot eliminate all wire crossings, making it necessary to include nodes for crossings, called crossing edge insertions [13]. Another preprocessing step, the insertion of so-called buffer nodes, aims at synchronizing signals in order to meet the global timing constraint. Since this constraint requires two paths leading to the same node to pass the same number of tiles, a valid layout can easily be deduced from the logic network if every path has the same number of nodes. The insertion of buffers also allows the generation of different partitions of a logic network [10]. Some approaches insert an even higher number of nodes in order to obtain a complete ternary logic network representation of a QCA circuit. This idea is based on the majority function representation of gates. When extra nodes are included in the logic network, extra area is produced as well, as shown in Figure 3.2, which in turn leads to an increase in wire lengths. Because these approaches are based on cell-based clocking, this implies



Figure 3.2: Gate placement with black circles showing wasted area [49]

that if the longest wire has to be split into more than one clock zone, the shortest wire also has to be split into the same number of clock zones in order to preserve the signal synchronization for gates with two or three inputs [49].
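The growth caused by the simple node duplication described above can be observed in a small sketch. It is one possible reading of the breadth-first procedure, not the implementation from [49]: the network is given as a hypothetical dictionary from each node to its fan-in nodes, and every repeated visit creates a fresh copy whose fan-ins are visited, and therefore duplicated, again.

```python
from collections import deque


def duplicate_shared_nodes(fanins, output):
    """Unfold a logic network (DAG) into a tree by duplicating revisited nodes.

    Returns the edges (fan-in copy, consumer copy) of the resulting tree.
    """
    visits = {output: 1}
    tree_edges = []
    queue = deque([(output, output)])          # (original node, name of its copy)
    while queue:
        node, copy = queue.popleft()
        for fanin in fanins.get(node, []):
            visits[fanin] = visits.get(fanin, 0) + 1
            # First visit keeps the node, later visits create a duplicate.
            if visits[fanin] == 1:
                fanin_copy = fanin
            else:
                fanin_copy = f"{fanin}_dup{visits[fanin] - 1}"
            tree_edges.append((fanin_copy, copy))
            queue.append((fanin, fanin_copy))  # the subtree is traversed again
    return tree_edges


if __name__ == "__main__":
    network = {"out": ["g1", "g2"], "g1": ["a", "s"], "g2": ["s", "b"], "s": ["c", "d"]}
    for edge in duplicate_shared_nodes(network, "out"):
        print(edge)
```

In the example, the shared node `s` feeds two gates, so it is duplicated together with its fan-ins `c` and `d`: the original network with 8 nodes and 8 edges grows to a tree with 11 nodes and 10 edges.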

Another major problem of these algorithms is the requirement of cell-based clocking itself, which has already been shown to be infeasible. Even though there are also algorithms using tile-based clocking, they are limited by the general drawbacks of preprocessing [54], which lead to exploding logic networks, and they even use greedy placement and routing algorithms, which limits the approach to small and simple reconvergent patterns [49].

All these reasons led to the proposal of *ortho* [59], an algorithm implementing scalable placement, routing and clocking without preprocessing steps. Since this algorithm forms the basis of this work, it is explained in detail in the following.

First of all, a proper representation of the logic network is needed. To this end, some works have already proposed the idea of an orthogonal embedding [10]. Orthogonal embedding is the mapping of a logic network onto a two-dimensional grid, so it can be seen as an assignment of the tuple $(p,r)$. For ortho this is done by orthogonal graph drawing (OGD), which is described in [14].

**Definition 3.1.1** (Orthogonal Graph Drawing). An OGD maps a graph $G = (V, E)$ onto a plane grid of size $w \times h$. The mapping assigns vertices $v \in V$ to grid points with coordinates $(x, y)$, where $1 \le x < w$ and $1 \le y < h$. Edges $e \in E$ are assigned to paths in the grid that consist only of horizontal and vertical segments. The paths are non-overlapping, meaning that they are not allowed to cross any vertices.



Figure 3.3: Example OGD drawing



Figure 3.4: Insufficient timing constraints of a OGD representation [56]

Figure 3.3 shows an example OGD. The dots in the graph represent vertices and are connected via straight line paths. Thus, the graph is drawn orthogonally.

Nonetheless, an OGD only addresses placement and routing, leaving the clocking to be handled separately. The problem of insufficient clocking in a valid OGD representation can be seen in the example in Figure 3.4. For the given logic network in subfigure 3.4(a) and its valid OGD in subfigure 3.4(b), there is no clocking that can satisfy the timing constraints. In subfigure 3.4(c) it stands out that for the bottom-right corner no clock zone can be found such that both the local and the global synchronization constraint are satisfied. Since the clocking, or rather the signal synchronization, was a main task of the preprocessing, which is not used here, another solution has to be found.



(a) Color assignment of incoming signals (b) Color assignment of outgoing signals

Figure 3.5: Relative positions of an OGD graph with correct color assignment

The idea used for the ortho algorithm comes from an extension to OGDs, which makes it possible to determine a special OGD from a logic network in polynomial time, the prerequisite for a scalable approach. The basis used in [59] is the work of Therese Biedl [7], who proposes an OGD with an additional edge coloring. However, the effectiveness and complexity bounds in her work were proven under the restriction to undirected 3-graphs, and, as we have seen in the previous chapter, a logic network neither contains only nodes of degree at most 3 nor is it undirected. To overcome the first restriction, a custom logic network can be created by assigning dedicated nodes to fan-outs and inverters. This way the maximum node degree is reduced to three, while the expressiveness of the logic network representation is maintained. The second restriction can be overcome by a custom coloring built on the original approach, which also serves as a direction assignment. Given a logic network converted to a 3-graph, the coloring is assigned in the form of edge directions $d: \Delta \to \{east, south\}$. The coloring can be understood as an arrangement of relative positions. If an edge $(v_i, v_j)$ is colored *east*, the vertex $v_j$ is positioned east of $v_i$, so that $x_j > x_i$. The color *south* for an edge $(v_i, v_j)$ assigns $v_j$ a position relative to $v_i$ such that $v_j$ is south of $v_i$, or $y_j > y_i$. In order to color a graph with only these two colors, the following assignment constraints must hold:

- 1. All **incoming** edges of a vertex have to be painted with the **same** color.
- 2. All **outgoing** edges of a vertex have to be painted with **opposite** colors.
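The two assignment constraints can be checked mechanically. The following is a minimal sketch, assuming the coloring is given as a dictionary from directed edges to colors; the function name `is_valid_coloring` and the data layout are illustrative and not taken from [59].

```python
from collections import defaultdict


def is_valid_coloring(coloring):
    """Check both assignment constraints.

    coloring maps a directed edge (u, v) to "east" or "south".
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(list)
    for (u, v), color in coloring.items():
        incoming[v].add(color)
        outgoing[u].append(color)
    # Constraint 1: all incoming edges of a vertex share one color.
    same_in = all(len(colors) == 1 for colors in incoming.values())
    # Constraint 2: no two outgoing edges of a vertex share a color
    # (a vertex of a 3-graph has at most two outgoing edges).
    distinct_out = all(len(c) == len(set(c)) for c in outgoing.values())
    return same_in and distinct_out
```

A gate with two *east* inputs and a fan-out with one *east* and one *south* output pass the check, whereas a gate with mixed incoming colors does not.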

The relative position assignment under the proposed constraints is illustrated by an arbitrary example in figure 3.5. In the example for outgoing edges (figure 3.5(a)), the assignment constraint makes sure that two outgoing edges of the same vertex are routed in different directions to avoid a conflict. In line with the definition of the colors, the layout is extended in the x-direction for an east-coloring and, analogously, in the y-direction for a south-coloring. Figure 3.5(b) depicts the assignment constraint for the incoming edges of one vertex. This assignment has to use one color, and the node can be placed without conflict for both incoming edges by extending the layout in the x-direction according to the east-coloring. However, there exist logic networks for which no coloring satisfying the constraints can be found. When such a coloring conflict appears, the conflicting edge, for which no direction can be assigned under the formulated constraints, is subdivided and an auxiliary node is introduced to resolve the conflict.

In order to allow data to propagate, ortho needs to map a valid clocking onto the layout. Because ortho uses OGDs with exactly two directions, *east* and *south*, the 2DDWave scheme, which also supports data flow in exactly these two directions, is well suited for the algorithm. Using a predefined clocking scheme also satisfies the local synchronization constraint by construction, and due to the uniformity and simplicity of the 2DDWave scheme, the global synchronization constraint is maintained as well for nodes placed and routed according to the proposed direction assignment.
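The synchronization argument can be made concrete with a small sketch. It assumes the common formulation of 2DDWave in which the tile at $(x, y)$ carries clock number $(x + y) \bmod 4$; the helper names are illustrative only.

```python
def clock_2ddwave(x, y):
    """Clock number of tile (x, y), assuming 2DDWave is (x + y) mod 4."""
    return (x + y) % 4


def step_ok(src, dst):
    """Local synchronization: dst must carry the clock number following src."""
    return clock_2ddwave(*dst) == (clock_2ddwave(*src) + 1) % 4


def path_length(src, dst):
    """Number of tiles travelled by any east/south-only path from src to dst."""
    return (dst[0] - src[0]) + (dst[1] - src[1])
```

Every step to the east or south moves to the next clock number, so the local constraint holds. Since every east/south-only path between two tiles has the same length, signals that start synchronized also arrive synchronized, which is the global constraint.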

In the following, the pseudocode of ortho, derived from [59], is given as algorithm 1 and described, before it is applied to an example and its main characteristics are discussed. Following the VLSI design process, the ortho algorithm takes as input a logic network N and a clocking scheme, or rather a clock number clk for every tile, in order to fulfill the timing constraints. As already mentioned, N has to be converted into a 3-graph by substitution so that a valid coloring can be assigned. Then an empty layout L with a 2DD-Wave clocking scheme is created and the coloring is calculated for N. The nodes also need to be topologically ordered, with the lowest numbers at the inputs and the highest numbers at the outputs. The algorithm therefore starts with the lowest-numbered vertices, which represent the primary inputs. In order to connect the inputs to external signals, they are placed at the borders of the layout. In ortho, the first column of the layout is therefore reserved for placing the inputs below each other. Due to the properties of OGDs, a conflict arises when inputs have outgoing edges colored *south*, because the algorithm would then wire these edges in the y-direction and therefore over other primary inputs. Since this is forbidden in OGDs and in QCA layouts, these conflicts need to be resolved by the algorithm. To do so, each primary input colored *south* is first rewired into a new column, which allows wiring in the y-direction and thus the placement of the nodes connected to its outgoing edges.

For the placement and routing of all remaining nodes, two parameters need to be evaluated in each step: the coloring and the pair (w, h), which stores the current dimensions of L. If a node is colored *east*, the layout is extended by one column and the node is placed at $(w-1, h_p)$, where w is the current width of L and $h_p$ is the maximum vertical position of its predecessors. Accordingly, for a node colored *south*, the layout is extended by one row and the node is placed at $(w_p, h-1)$, where $w_p$ is the maximum horizontal position of the node's predecessors and h is the current height of the layout. After placing the node, it is wired to its predecessors; the placement into a new row or column ensures that the wiring does not pass over another gate. If only one predecessor exists, the wiring goes only south or only east. If two predecessors exist and the node is colored *east*, the predecessor determining $h_p$ is wired only in the x-direction, while the other predecessor has to be wired with two wire segments, the first going in the x-direction and the second connecting in the y-direction. If the node is colored *south*, the two-segment wiring goes south first and then east. After all nodes have been placed in this fashion, the primary outputs are connected to the eastern or southern border and the finished layout L is returned by the algorithm.
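The two-segment wiring described above can be sketched as follows. This is an illustrative helper, not code from [59]; it assumes that the destination lies east and/or south of the source and returns only the wire tiles strictly between the two endpoints.

```python
def wire(src, dst, color):
    """Tiles of an orthogonal wire from src to dst (both excluded).

    color "east":  first segment in x-direction, second in y-direction.
    color "south": first segment in y-direction, second in x-direction.
    Assumes dst[0] >= src[0], dst[1] >= src[1] and src != dst.
    """
    (x0, y0), (x1, y1) = src, dst
    if color == "east":
        first = [(x, y0) for x in range(x0 + 1, x1 + 1)]
        second = [(x1, y) for y in range(y0 + 1, y1 + 1)]
    else:
        first = [(x0, y) for y in range(y0 + 1, y1 + 1)]
        second = [(x, y1) for x in range(x0 + 1, x1 + 1)]
    # The last generated tile is always the destination itself; drop it.
    return (first + second)[:-1]
```

For a predecessor that is directly adjacent to the placed node, the function returns an empty list, which corresponds to a direct connection without wire tiles.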

#### Algorithm 1 Ortho algorithm

```
Input: Logic network N
Input: Clock number clk
Output: Gate level layout L
 1: Convert N to a 3-graph by substitution
 2: L \leftarrow empty 2DDWave-clocked layout of size (w = 0) \times (h = 0)
 3: Generate direction assignment d: \Delta \rightarrow \{east, south\} and subdivide signals if necessary
 4: Compute topological ordering v_1, ..., v_i \in N
 5: Extend L by one column and reserve it for primary inputs
 6: for all vertex v_1, ..., v_i \in N with at most two incoming signals \sigma_1, \sigma_2 do
 7:     if vertex v is terminal/primary input then
 8:         Extend L by one row
 9:         Place v at position (0, h-1)
10:         if vertex v is colored south then
11:             Extend L by one column
12:             Wire the primary input to position (w-1, h-1)
13:         end if
14:     else if d(\sigma_1) = d(\sigma_2) = east then
15:         Extend L by one column
16:         h_p \leftarrow max. vertical position of v's predecessors
17:         Place v at position (w-1, h_p)
18:     else if signals are labeled south then
19:         Extend L by one row
20:         w_p \leftarrow max. horizontal position of v's predecessors
21:         Place v at position (w_p, h-1)
22:     end if
23:     Draw orthogonal wire segments to connect v with its predecessor(s) accordingly
24: end for
25: Connect the primary outputs to the respective borders
26: return L
```

Figure 3.6: Placement and routing of a 2:1 mux network using the ortho algorithm

The example depicted in figure 3.6 shows the placement, routing, and clocking (p,r,c) of a 2:1 mux. Starting with a layout of size (1,0), the layout is extended by one row to (1,1) for the PI $v_1$, which is placed at (0,h-1)=(0,0). Because the outgoing edge of the PI is labeled *south*, the wiring has to be resolved by extending L to (2,1) and wiring the PI to (w-1,h-1)=(1,0). The other PIs $v_2$ and $v_3$ are placed accordingly at (0,1) and (0,2), with an increase of one column per PI since both have outgoing edges labeled *south*. The resolving wiring leads to (2,1) and (3,2). Now the remaining nodes can be placed following the *east*/*south* scheme. The size of the layout after the input network is (4,3). With $v_4$, a fan-out node is placed south of the third input, whose wire ends at (3,2). After extending L by one row, the y-coordinate evaluates to h-1=3 and the x-coordinate 3 is adopted from its only predecessor. Therefore, the node is placed at (3,3) and L=(4,4). In the same fashion, the successor of the fan-out, $v_5$, which represents an inverter, is placed east of it. The coordinates of the inverter are thus (4,3) and L=(5,4). The next node $v_6$ is the first node with two predecessors and is labeled *south*; its x-coordinate is determined by the eastern predecessor $v_5=(4,3)$. The y-coordinate is again derived from the size of the layout after it has been increased once in the y-direction, resulting in a placement at (4,4) and L=(5,5). In the same way, the AND node $v_7=(3,5)$ (L=(5,6)) and the OR node $v_8=(4,6)$ (L=(5,7)) are placed. Since all nodes are now placed, the primary output of $v_8$ has to be placed. For this, the layout is increased by one column again (L=(6,7)) and the PO is placed at the eastern border.
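The coordinates of this walk-through can be reproduced with a minimal sketch of the placement part of algorithm 1. Wiring and clocking are omitted, the tuple-based network encoding is an assumption made for illustration, and the assignment of the data inputs $v_1$ and $v_2$ to the two AND gates is assumed, since it does not influence the resulting coordinates.

```python
def ortho_place(nodes):
    """Place nodes given as (name, kind, predecessors, color) in topological order.

    For a PI, color is the color of its outgoing edge; for all other nodes it is
    the color of the incoming edges. Returns positions and the layout size (w, h).
    """
    w, h = 1, 0  # first column is reserved for the primary inputs
    pos = {}
    for name, kind, preds, color in nodes:
        if kind == "pi":
            h += 1
            pos[name] = (0, h - 1)
            if color == "south":
                w += 1
                # successors connect to the end of the rewired input
                pos[name] = (w - 1, h - 1)
        elif color == "east":
            w += 1
            h_p = max(pos[p][1] for p in preds)
            pos[name] = (w - 1, h_p)
        else:  # south
            h += 1
            w_p = max(pos[p][0] for p in preds)
            pos[name] = (w_p, h - 1)
    return pos, (w, h)


MUX = [
    ("v1", "pi", [], "south"),
    ("v2", "pi", [], "south"),
    ("v3", "pi", [], "south"),
    ("v4", "fanout", ["v3"], "south"),
    ("v5", "inv", ["v4"], "east"),
    ("v6", "and", ["v5", "v1"], "south"),  # data input assumed
    ("v7", "and", ["v4", "v2"], "south"),  # data input assumed
    ("v8", "or", ["v6", "v7"], "south"),
]
positions, size = ortho_place(MUX)
```

Under these assumptions, `positions` contains the gate coordinates stated above and `size` is the layout size before the primary output column is added.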

The returned layout shows that the signal flow is strictly directed to the south-east due to the 2DD-Wave clocking scheme the algorithm is bound to. As already discussed in the section on clocking (2.2.3), this limits ortho in several ways. First, since the "+"-majority gate was selected as the standard gate, majority gates cannot be placed on the 2DD-Wave scheme, because the clocking only allows two-input logic gates. Second, back-loops are not possible in this clocking, which prohibits the placement and routing of sequential circuits. Nevertheless, ortho provides an efficient tool for the placement and routing of purely combinational circuits. As shown in [58], the 2DD-Wave scheme provides the most area-efficient clocking for combinational circuits, outperforming both the USE and the RES scheme.

With a thorough understanding of the ortho algorithm, its constraints, and the resulting drawbacks and advantages, other approaches that are comparable to or based on ortho can now be discussed. *Ropper*, a placement and routing framework [16], proposes an algorithm that is based on [54]. As already discussed in the part about preprocessing, this algorithm has some disadvantages, because dummy nodes are inserted as part of logic synthesis, which increases the size of the logic network. In ortho this step is unnecessary. Nevertheless, the authors of Ropper point out that they have overcome some restrictions of ortho, namely the ability to place majority gates and a more area-efficient placement and routing. These improvements come at a price. First, the framework achieves the placement and routing of majority gates not through a clocking scheme that provides three incoming tiles for a given tile, but by supporting rotated majority gates, which were discussed to be very prone to crosstalk. In addition, custom gates and double wiring are used, so that in many tiles QCA cells are placed at a distance of only one cell rather than at the minimum distance of two QCA cells suggested in [60]. The use of these custom gates is necessary in this algorithm, because its design does not rely on the same strict constraints as ortho. The Ropper framework even routes wires above gates (solved with custom tiles) and does not use border inputs and outputs, making it challenging to feed data into and read data from the circuit. Based on the argumentation used in this work, the Ropper framework violates too many design rules that have been identified as necessary for a sufficient placement and routing algorithm.

Another work that implements majority gates for QCA is *migortho*, which is based on the ortho algorithm provided in fiction. The difference between the algorithms lies in the underlying gate library. While this work uses the same library as ortho, migortho utilizes the QCA-ONE library. From the preliminaries, it is already known that this again permits rotated majority gates and therefore double wires. This means that circuits designed by migortho are also considered prone to crosstalk, following the argumentation above. Nevertheless, the algorithm shows that, with a different library, ortho is already powerful enough to overcome some restrictions. This fact has motivated this work to implement several ideas that enable ortho to be more area efficient, to place "+"-majority gates, and even to provide a strategy for the automated placement and routing of sequential circuits.

## 3.2 Design of Sequential QCA circuits

In this section, the state of the art for sequential circuit design in QCA is discussed. Compared to the existing algorithms for the placement and routing of combinational logic, this area is still in its infancy. This section first focuses on the implementation of sequential logic and then uses this knowledge to give an insight into the implementation of storage cells.

#### 3.2.1 Sequential logic in QCA

In recent literature, several attempts have been made to implement latches and flip-flops (FFs) in order to obtain storage elements and enable sequentiality for quantum-dot cellular automata (QCA) circuits [31, 40, 46, 50]. These works primarily aim to translate Boolean CMOS equations into majority representations and implement this representation using QCA gates. However, the resulting circuits often rely on external 2-phase clocking signals adopted from the CMOS domain [31], despite the fact that QCA circuits already employ 4-phase clocking. The authors of [40] argue that a latching can be accomplished using QCA clocking, but that this would lead to a restriction of the circuit. However, as discussed in the subsections on clocking (2.2.3) and storage elements (2.4), it is suggested that the external clocking signal may be unnecessary and that the 4-phase clocking of QCA circuits should be used to accomplish sequential functionalities.

While some works propose more advanced sequential circuits, such as dual edge-triggered D-FFs [46] and reversible latches [50], they still rely on cell-based clocking, which was found to be insufficient in subsection 2.2.3. The USE clocking scheme [11] has been proposed as an evolution towards sequential tile-based placement and routing, but leads to worse layouts compared to 2DD-Wave for most benchmark problems [58]. Other works, such as [36] and [6], propose different latches and sequential element implementation algorithms using the USE scheme. However, these approaches all share the drawback of translating sequential logic directly from the CMOS domain.

As previously discussed, wire segments should be used as storage elements, as proposed in [56]. However, [56] does not provide any circuits or placement and routing algorithms using these elements. In this work, the basic ideas of wire delays are utilized and adapted to the placement and routing of QCA circuits.

#### 3.2.2 QCA storage cells (QCA RAM)

Another area of research in QCA technology focuses on the implementation of random access memory (RAM) cells. While this thesis primarily focuses on a placement and routing algorithm for sequential logic, the implementation of QCA storage can also be adapted. The current state of the art can be divided into two approaches for implementing RAM in QCA.

The first approach, as presented in the papers [12, 60, 47], involves translating CMOS technology into QCA and thus dealing with the same shortcomings resulting from an external clock signal. Additionally, these circuits are also based on cell-based clocking, which is insufficient for this work.

The second approach, as presented in [2, 33], proposes the use of multiplexer (MUX) structures to implement RAM cells. By using a 2:1 MUX with a RAM cell holding one bit of information and a bitline (BL) also holding one bit as inputs, the 2:1 MUX can decide whether the information on the BL is passed into the RAM cell or if the information in the RAM cell is retained, based on the information on the wordline (WL) which serves as the third input to the 2:1 MUX. However, the implementations shown in [2] and [33] still use an external clock signal and are also clocked cell-based.
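The behavior of such a MUX-based RAM cell can be summarized in a short behavioral sketch. It models only the logic function described above, not the QCA implementations of [2] or [33]; the function names are chosen for illustration.

```python
def mux2(select, a, b):
    """2:1 multiplexer: returns a if select is 0, otherwise b."""
    return b if select else a


def ram_cell_step(stored, bitline, wordline):
    """Next content of the cell: keep the stored bit or take over the bitline."""
    return mux2(wordline, stored, bitline)


# Write a 1, hold it while the bitline changes, then overwrite it with a 0.
state = 0
state = ram_cell_step(state, bitline=1, wordline=1)
held = ram_cell_step(state, bitline=0, wordline=0)
cleared = ram_cell_step(held, bitline=0, wordline=1)
```

As long as the wordline is 0, the output of the MUX is fed back unchanged, which is the retention behavior of the cell.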

The ideas of wire delays can be combined with the use of a MUX to build a RAM cell, which can be constructed using the orthogonal algorithm and the corresponding sequential distribution network, as proposed later in this work.

# 4 Methodology

This chapter proposes three different signal distribution networks that overcome the restrictions of the placement and routing algorithm, ortho, which was reviewed as the state of the art in chapter 3. Specifically, an ordering network, a majority gate network, and a sequential distribution network are introduced.

The placement and routing algorithm, ortho, has been shown to have limitations, but it still possesses a powerful placement and routing procedure. To overcome these limitations, it is proposed to maintain the foundation of the algorithm while incorporating new functionalities. These new functionalities often result in irregularities, such as in the clocking or routing, and the goal of the proposed signal distribution networks is to redistribute signals on the layout in a way that these irregular parts align with the regular placement and routing while still satisfying all design constraints. In this way, the algorithm can still function in a similar manner as ortho. However, the underlying logic network and the placement and routing must also be modified to incorporate the desired functionalities, adding complexity to the preprocessing and algorithm.

Since ortho is restricted to the use of only 2DD-Wave clocking, the signal distribution networks must have the ability to change the clocking in the layout to implement their respective functionalities. Therefore, the implementation of these networks must be done with care and synchronization constraints must be closely considered.

The ordering distribution network is discussed first, which aims to reduce the area in the input region of the layout. Afterwards, the networks used for implementing majority gates and sequential parts are discussed and analyzed.

## 4.1 Ordering Distribution Network

Looking again at the layout of a 2:1 MUX designed by ortho in figure 3.6, it can be observed that in the first few rows, where the primary inputs are placed, no other gates are placed, as the space has been reserved for rewiring to resolve conflicts. The idea of the ordering distribution network is to order and place inputs in such a way that wire crossings are minimized and gates can be placed in the input area to save space. Additionally, after reordering the primary inputs (PIs), the remaining logic network is ordered topologically, so that the improvements following from the ordering of the PIs can be applied to the entire network. To allow the usage of the input area, the ordering network must resolve conflicts in a different way.

Figure 4.1: Scheme of area usage in the Ordering Distribution Network

Recalling the pseudocode of the ortho algorithm (algorithm 1), an input has a conflict when it is colored *south*, because the routing would then wire over the other inputs in the same column, which is not allowed. This means that the area overhead in the input region is highly dependent on the coloring assigned to the outgoing edges of the inputs. Line 3 of the pseudocode, which invokes the coloring algorithm, finds a valid but not necessarily optimal coloring for the given logic network. Unfortunately, due to the nature of the algorithm, it often assigns the color *south* to edges connected to PIs, resulting in conflicts and area overhead.

Therefore, two preprocessing steps are introduced. In the first step, the PIs are ordered depending on their outgoing edges, so that PIs connected to the same gates are placed near each other, which reduces wiring effort and crossings. In the second step, the coloring in the input region of the logic network is improved to prevent excess wiring. To enable these preprocessing steps, a new rule for edges colored *south* is also introduced, which makes the rewiring redundant. This rule is found to be effective not only inside the conflicting input area but for the whole layout.

To examine the concept, it is necessary to first identify the portions of the logic network that belong to the ordering distribution network. This can be determined by examining the nodes that are placed in the input area. These include nodes that are directly connected to the primary inputs (PIs) and nodes that are connected to these nodes and colored *east*. For the network, all nodes that are successively connected, starting at each PI and ending at the first node with two fan-ins, are considered. Once these nodes are placed correctly, all conflicts are resolved, and the ortho algorithm can function as intended. The respective ordering and coloring are discussed next, starting with the different gates that inputs can be connected to.

#### Algorithm 2 Ortho changes with ordering distribution network

```
Convert N to a 3-graph by substitution and balance inverters at fan-out nodes
Order primary input nodes
Generate conditional direction assignment d: \Delta \rightarrow \{east, south\} and subdivide
signals if necessary
Compute topological ordering v_1, ..., v_i \in N
Extend L by one column and reserve it for primary inputs
for all vertex v_1,...,v_i \in N with at most two incoming signals \sigma_1,\sigma_2 do
   if vertex v is terminal/primary input then
       Extend L by one row
       Place v at position (0, h-1)
   else if d(\sigma_1) = d(\sigma_2) = east then
   else if signals are labeled south then
       if not root node exists then
           Extend L by one row
       end if
       w_p \leftarrow max. horizontal position of v's predecessors
       Place v at position (w_p, h-1)
   end if
end for
return L
```
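The modified *south* branch of algorithm 2 can be sketched as follows. The sketch assumes the definition of the root node used in this section: the predecessor with the lower horizontal position that also has the higher vertical position, excluding fan-out nodes. The tuple encoding of the predecessors and the function name are illustrative assumptions.

```python
def place_south(preds, w, h):
    """Place a south-colored node.

    preds is a list of (x, y, is_fanout) tuples describing its predecessors.
    Returns the position of the node and the updated layout size (w, h).
    """
    root = None
    if len(preds) == 2:
        low_x = min(preds, key=lambda p: p[0])
        other = preds[1] if low_x is preds[0] else preds[0]
        is_root = low_x[0] < other[0] and low_x[1] > other[1]
        # Exception from the text: a fan-out node must not act as root node.
        if is_root and not low_x[2]:
            root = low_x
    if root is None:
        h += 1  # regular rule: the node occupies a new row
    w_p = max(p[0] for p in preds)
    return (w_p, h - 1), (w, h)
```

With a root node, the layout height stays unchanged and the gate shares the column of the predecessor with the higher x-coordinate; without one, the function behaves like lines 19 to 21 of algorithm 1.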

For the ordering, primary inputs (PIs) connected to the same two-fanin gate are placed consecutively, minimizing the routing distance and reducing the probability of wire crossings. In addition to the ordering, a valid coloring must also be found for all nodes in the ordering distribution network. This requires consideration of all types of nodes connected to PIs.

The direction assignment for one-fanin nodes, including inverters and fan-out nodes, can be chosen arbitrarily as the primary input to which they are connected has only one outgoing edge, resulting in no dependencies. In this case, the non-conflicting east assignment can always be chosen. When examining two-input logic gates such as AND and OR gates, the coloring can only be chosen arbitrarily if both input nodes are primary inputs, again allowing for the non-conflicting assignment of east. In all other cases, the direction assignment must consider the coloring of the other incoming edge of the gate. To determine dependencies, all one-fanin nodes must first be colored, including inverters and fan-outs.

Regarding fan-outs, the coloring rules require the two outgoing edges to be colored in different directions, so that every fan-out node placed in the network has one output colored *east* and one output colored *south*. The first coloring constraint then requires both incoming edges of the gate connected to the *south*-colored edge to be colored *south* as well. If the second incoming edge of this gate is connected to a PI, a conditional coloring alone is not powerful enough to resolve all conflicts. For this case, a new placement rule for the *south* coloring is introduced in order to preserve the direction assignment rules but still resolve the conflict between primary inputs. The original part of the algorithm that handles the placement of nodes based on their coloring (lines 14-22) makes sure that every node colored *east* occupies a new column and every node colored *south* occupies a new row. These placement rules allow every gate to be placed without interfering with other gates, but they have been found to be too restrictive, which permits the following relaxed placement rule for *south*. If a node is labeled *south* and its predecessor with the lower horizontal position also has the higher vertical position, this predecessor is called the root node, and the layout is **not** extended by a row, while the gate is still placed at position $(w_p, h-1)$. Following this rule, the gate is placed in the same row as its root node and in the same column as the predecessor with the higher x-coordinate. If this is applied to a two-input gate in the ordering distribution network with a primary input and a fan-out node as predecessors, the primary input is always the root node due to the ordering and the new coloring. Thus, the new rule allows the two-input gate connected to the primary input colored *south* and to the fan-out node to be placed in the same row as the primary input, resulting in no conflict because the node is not actually placed south of its predecessors.

Figure 4.2: Placement and routing of a 2:1 mux network using the ortho algorithm with the ordering distribution network

It was found that this rule can be used not only for resolving PI conflicts but also for the general *south* placement in the algorithm, with one exception. If a fan-out node were the root node, the coloring would wire both its *east*-colored and its *south*-colored outgoing edge onto the same row, yielding a conflict. In this case the new rule is not applied, and the case is excluded for the input area through the ordering. The resulting pseudocode snippets that replace the original code are shown in algorithm 2. It also has to be considered that the conditional coloring in the distribution network still needs to include helper nodes, e.g. when three fan-out nodes are connected to the same nodes. Before the coloring, the input nodes first need to be ordered according to the ideas presented. Primary input nodes connected to fan-out nodes are placed first, followed by the primary input nodes that are connected to the outgoing edges of these fan-out nodes. This is done to reduce the distance between related gates and therefore also the number of wire crossings. Afterwards, primary inputs directly connected to a gate whose other incoming edge is connected to a second primary input are placed. Finally, all input nodes that are not connected to the rest of the ordering distribution network are placed arbitrarily, and the logic network is topologically ordered according to the new order of the primary inputs.
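The ordering of the primary inputs described above can be sketched as a stable sort over four groups. The network encoding (`kind` and `preds` dictionaries) is a hypothetical one chosen for illustration, and the sketch only inspects the direct successor of each PI, so it is a simplification of the ordering used in this work.

```python
def order_primary_inputs(pis, kind, preds):
    """Order PIs: fan-out PIs, their partner PIs, PI-PI gate inputs, rest.

    kind maps node -> "pi", "fanout", "inv" or a gate type,
    preds maps node -> list of predecessor nodes.
    """
    succs = {}
    for node, ps in preds.items():
        for p in ps:
            succs.setdefault(p, []).append(node)

    def source(node):
        # Walk back through inverters to the node that drives them.
        while kind[node] == "inv":
            node = preds[node][0]
        return node

    def rank(pi):
        outs = succs.get(pi, [])
        if len(outs) != 1:
            return 3
        gate = outs[0]
        if kind[gate] == "fanout":
            return 0
        others = [p for p in preds[gate] if p != pi]
        if any(kind[source(p)] == "fanout" for p in others):
            return 1
        if any(kind[p] == "pi" for p in others):
            return 2
        return 3

    return sorted(pis, key=rank)  # sorted() is stable within each group


KIND = {"a": "pi", "b": "pi", "s": "pi", "f": "fanout", "n": "inv",
        "g1": "and", "g2": "and", "o": "or"}
PREDS = {"f": ["s"], "n": ["f"], "g1": ["n", "a"], "g2": ["f", "b"], "o": ["g1", "g2"]}
mux_order = order_primary_inputs(["a", "b", "s"], KIND, PREDS)
```

For the 2:1 mux encoded above, the select input `s`, which drives the fan-out, is moved to the front, followed by the two data inputs.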

In addition to fan-out gates, inverter nodes also cause issues in the network design. For example, if an inverter node is assigned *south*, such as after a fan-out node, and it is intended to be placed in the same row as a primary input, a conflict arises because the input always has to be wired in the x-direction. To avoid such conflicts, all inverters colored *south* must be placed at least one row below the southernmost primary input.

To further reduce the number of inverters in the logic network and prevent excessive overhead, a balancing network is introduced. This network aims to reduce the number of inverters by substituting them with fan-out nodes. For instance, in some cases a fan-out node has two inverters connected to its outgoing edges. These inverters can be replaced by a single inverter as the incoming node to the fan-out, resulting in an overall lower number of inverter nodes.
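The balancing step for the case mentioned above, a fan-out node with an inverter on each outgoing edge, can be sketched as a small rewrite on a dictionary-based network. The encoding and the function name are assumptions for illustration; other balancing cases are not covered.

```python
def balance_inverters(kind, preds):
    """Rewrite x -> fanout -> (inv, inv) into x -> inv -> fanout.

    kind maps node -> type, preds maps node -> list of predecessors.
    Returns new dictionaries; the inputs are not modified.
    """
    kind = dict(kind)
    preds = {n: list(p) for n, p in preds.items()}
    succs = {}
    for node, ps in preds.items():
        for p in ps:
            succs.setdefault(p, []).append(node)
    for f in [n for n, k in kind.items() if k == "fanout"]:
        outs = succs.get(f, [])
        if len(outs) == 2 and all(kind[o] == "inv" for o in outs):
            keep, drop = outs
            # The kept inverter moves in front of the fan-out.
            preds[keep] = [preds[f][0]]
            preds[f] = [keep]
            # Consumers of both former inverters now read from the fan-out.
            for inv in (keep, drop):
                for c in succs.get(inv, []):
                    preds[c] = [f if p == inv else p for p in preds[c]]
            del kind[drop], preds[drop]
    return kind, preds
```

Both consumers still receive the inverted signal, but the network contains one inverter less.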

Figure 4.2 shows the placement and routing produced by the ortho algorithm after implementing the proposed ordering distribution network. The ordering of the inputs places the input of the fan-out node first and then the two connected primary inputs. Since the inverter is part of the ordering distribution network, it is colored *east*, which allows the AND gates connected to the PIs to be colored *south*, so that the new rule for placement and routing can be applied. The last OR gate is not part of the distribution network and is therefore placed according to the normal rules of the ortho algorithm. Compared to the layout in figure 3.6, the resulting layout clearly saves area and even wire crossings. The exact results are presented and analyzed in the next chapter.

## 4.2 Majority Gate Distribution Network

This section addresses the placement and routing of majority gates in QCA circuits using the ortho algorithm. The use of majority gates is significant in QCA, as the majority function (Definition 2.1.3) can be implemented using only one gate, whereas CMOS circuits require multiple gates. However, this theoretical advantage can only be realized through efficient placement and routing. To this end, a majority gate distribution network for the orthogonal algorithm is proposed, allowing for the placement and routing of majority gates and enabling a comparison of design metrics between a logic network in the MIG representation and its corresponding logic network in the AIG representation.

#### 4.2.1 The proposed signal distribution Network

The orthogonal algorithm uses 2DDWave clocking, which limits the direction assignment to only two options, *east* and *south*, and can only handle the placement and routing of 2-input logic gates. To introduce "+"-majority gates into the layout, a RES-like clocking scheme is necessary, which includes tiles with three incoming tiles and one outgoing tile. As shown in Figure 2.13(c), such a tile is located at position (1,1) and is suitable for placing a "+"-majority gate, allowing it to be connected with three incoming signals. However, changing the clocking scheme of ortho to RES would be highly inefficient and difficult to implement. This is because if the clocking were completely changed to RES, the algorithm would not be able to utilize every row and column of the clocking. The RES scheme also lets signals flow in the western and northern directions, so only the first and third rows and the second and fourth columns support eastern and southern signal propagation. Only these parts of the clocking could be utilized for the placement of two-input gates, resulting in approximately double the area usage for two-input logic gates alone.

Figure 4.3: Global synchronization violation when connecting 2DD-Wave and RES clocked scheme

An alternative approach to utilizing the RES scheme is to only support it in certain evenly distributed regions, allowing majority gates to be placed in specific, permanently assigned locations. For example, the layout could be divided into 4x4 tile sub-regions, and every fifth sub-region would be RES clocked, while the rest would be occupied with the 2DDWave scheme. On the one hand, this approach should not produce as much area overhead, since only some regions are inaccessible for two-input logic gates. On the other hand, the permanent clocking assignment limits the placement of majority gates to specific spots, leading to large area overhead if a majority gate needs to be placed far away from such a sub-region. For a network consisting mainly of majority gates, this implementation would also waste most of the 2DD-Wave clocked area.

Figure 4.4: Proposed majority gate distribution network

Another aspect to consider is that the global synchronization constraint, which is trivially satisfied within a uniformly 2DDWave clocked layout, is disrupted by introducing RES clocking within the layout. By introducing irregular clocking such as RES sub-regions, signals can pass different numbers of tiles to reach the same tile, thereby violating the global synchronization constraint. In Figure 4.3, the top four rows are 2DD-Wave clocked and the bottom four rows are RES clocked. In the 2DD-Wave scheme, all three paths start globally synchronized. The two left paths need exactly one clock cycle to reach the majority gate. However, the right path in the RES scheme causes a delay, so that the signal travels for two clock cycles before reaching the majority gate, thereby violating the global synchronization constraint.
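The violation described for Figure 4.3 can be expressed as a simple check on path lengths. The sketch assumes that a signal advances one tile per clock phase and that a clock cycle consists of four phases; the tile counts used in the example are derived from the stated cycle counts, not read from the figure.

```python
PHASES_PER_CYCLE = 4  # QCA uses a 4-phase clocking


def cycles(tiles_passed):
    """Clock cycles a signal needs to pass the given number of tiles."""
    return tiles_passed / PHASES_PER_CYCLE


def globally_synchronized(path_lengths):
    """All signals reaching a gate must have passed the same number of tiles."""
    return len(set(path_lengths)) == 1


# Two paths of one clock cycle and one path of two clock cycles.
paths_to_majority_gate = [4, 4, 8]
violated = not globally_synchronized(paths_to_majority_gate)
```

For the three paths of the example, the check reports a violation because the third signal arrives one full clock cycle later than the other two.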

To overcome these complications, the proposed distribution network uses a custom clocking scheme only in areas where majority gates are placed, and addresses the global signal synchronization constraint. As a result, the placement and routing of solely two-input gates should not produce any additional area overhead. Figure 4.4 illustrates the proposed majority gate signal distribution network, which is integrated into the orthogonal algorithm. The red-marked cells indicate the three inputs of the majority gate distribution network. The cells in the middle of the tile are marked because they can be connected from the north or from the west, which results in one straight wire and one bent wire. The output, marked in blue, can be wired in the *east* or *south* direction without any limitations. It is also important to note that the input tiles as well as the output tiles have the same clock number as in a regular 2DDWave scheme, allowing for straightforward connections. Although the proposed distribution network does not produce any area overhead for the placement and routing of two-input gates, it does produce excess area itself due to its complex wiring, which results from the synchronization conditions that had to be considered in its design. Compared to an AIG representation of the majority function laid out with the orthogonal algorithm, the placement and routing of this single majority gate already needs more area. However, it should be noted that no wire crossings are used. Assuming that wire crossings have a high cost, the proposed majority gate distribution network is considered preferable, although a meaningful cost comparison of the two implementations can only be made under a cost function that expresses wire crossings in terms of normal gates.

The design constraints used to develop the signal distribution network are discussed in the following. Firstly, the distribution network should not contain any wire crossings, as they are considered highly costly. While introducing a cost metric for wire crossings might result in a more efficient implementation, this work excludes wire crossings by design rule. Secondly, the distribution network must meet the global synchronization constraint. In a 2DDWave clocked layout, every diagonal is synchronous, and all signals wired to the same diagonal by the orthogonal placement and routing pass the same number of tiles. However, when examining the incoming tiles of a three-input tile, it can be seen that only two of the incoming tiles are on the same diagonal, while the third one is shifted by half a clock cycle. This results in the third incoming signal being delayed by half a clock cycle, violating the global synchronization constraint. Additionally, signals must pass a whole number of clock cycles in the signal distribution network in order to support the further use of 2DDWave and the local synchronization constraint. To satisfy this, the initially synchronous signals are first delayed by half a clock cycle at the tile where the majority gate is placed, meeting the global synchronization constraint. Then, the output signal of the majority gate is delayed by another half clock cycle, allowing it to be connected to the regular 2DDWave clocking scheme used in the remaining layout. In total, a signal propagating through the majority gate distribution network is delayed by one whole clock cycle compared to all other signals in the logic network. Because this delay affects the global synchronization constraint, it has to be considered for each gate connected to the parents of a majority gate distribution network. In the following, the basic placement and routing of the majority gate distribution network and the insertion of buffers in order to meet global synchronization are discussed.

Figure 4.5: Scheme of the P&R using the Majority Gate Distribution Network
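The half-cycle offset of the third input can be checked with a few lines of Python. The sketch below assumes the common formulation of 2DDWave in which the clock zone of tile (x, y) is (x + y) mod 4, with x growing to the east and y to the south. The function names and the example coordinates are illustrative and not taken from the implementation.

```python
ZONES_PER_CYCLE = 4  # a QCA clock cycle consists of four clock zones


def clock_zone_2ddwave(x, y):
    """Clock zone of tile (x, y); assumes zone = (x + y) mod 4."""
    return (x + y) % ZONES_PER_CYCLE


def zone_gap(tile_a, tile_b):
    """Number of clock zones by which tile_b lags behind tile_a (0..3)."""
    return (clock_zone_2ddwave(*tile_b) - clock_zone_2ddwave(*tile_a)) % ZONES_PER_CYCLE


# Hypothetical three-input tile with its western, northern and eastern neighbours.
gate = (5, 5)
west, north, east = (4, 5), (5, 4), (6, 5)

print("west vs. north:", zone_gap(west, north))  # same diagonal
print("west vs. gate: ", zone_gap(west, gate))   # one zone ahead of the gate
print("west vs. east: ", zone_gap(west, east))   # two zones apart
```

The western and northern neighbours lie on the same diagonal, whereas any third neighbour is two clock zones away from them, which is half of a four-zone clock cycle.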

#### 4.2.2 Placement and routing

The placement and routing of the proposed signal distribution network are subject to certain constraints. Firstly, the coloring must be reviewed, as the logic network now includes three-input nodes. However, the coloring algorithm can insert helper nodes to resolve coloring conflicts of edges. By dividing every edge with a helper node, a trivial coloring can always be found, even when the logic network includes three-input nodes. Another aspect to consider is the need for a new direction to connect a third signal to the majority gate. However, since such wiring only occurs inside the fixed distribution network, which itself can be placed and routed in the usual south-eastern manner, no additional directions need to be included. Additionally, the irregular clocking inside the signal distribution network must be reviewed. These irregularities do not allow the algorithm to wire connections across the network, so the placement of majority gates requires special treatment. From the algorithm's perspective, a majority gate cannot be placed just south or east of another gate, as these gates could need wiring through the majority gate distribution network. Instead, the algorithm is forced to always assign the majority gate distribution network to the south and east direction to prevent routing conflicts. This means that for majority gates, a trivial coloring is always chosen. The major drawback is that the area is not used optimally and the layout is extended in two directions, as shown in Figure 4.5, compromising the beneficial use of the "+" majority gate.

Figure 4.6: Buffer in east direction with resolve column and respective clocking

#### 4.2.3 Signal synchronization and buffer insertion

Placement and routing with the proposed distribution network delays every signal that passes through a majority gate by one clock cycle. Since tile-based clocking does not support speeding up a signal, every other signal that is combined with a delayed signal also has to be delayed in order to meet the global synchronization constraint. Therefore, a function is introduced that computes the delay of signals and allows signals that are connected together to be synchronized by buffer insertion. For the delay computation, the algorithm visits every incoming edge of every node, starting at the primary outputs. If an incoming edge is connected to a majority gate, every other incoming edge of the same node is assigned a delay of one, unless that edge is also connected to a majority gate. In the latter case, all incoming edges of the node are delayed equally, which again results in synchronous behavior. Nodes with incoming edges from majority gates are marked as delayed, and the delays stack for every majority gate on a path. The inserted delays then result from the difference between the delays of the incoming edges of a node and are realized by inserting wire buffers.
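The delay computation can be sketched as follows. This is one possible reading of the description above, not the code used in the implementation: the network is assumed to be given as a dictionary that maps each node to its fan-in nodes, and `buffer_plan` as well as all node names are hypothetical. A memoized recursion replaces the traversal from the primary outputs.

```python
def buffer_plan(fanins, majority_gates):
    """Compute signal delays and the wire buffers needed per edge.

    fanins:         dict mapping a node to the list of its fan-in nodes
    majority_gates: set of nodes realized by the majority gate network
    Returns (delay, buffers); delay[n] is the lag of the output of n in
    clock cycles, buffers[(src, dst)] the number of one-cycle buffers.
    """
    delay = {}

    def output_delay(node):
        if node not in delay:
            arrival = max((output_delay(f) for f in fanins.get(node, ())), default=0)
            # Each majority gate network adds one whole clock cycle.
            delay[node] = arrival + (1 if node in majority_gates else 0)
        return delay[node]

    buffers = {}
    for node, sources in fanins.items():
        output_delay(node)
        if not sources:
            continue
        latest = max(output_delay(s) for s in sources)
        for s in sources:
            # Earlier signals are delayed until they match the latest one.
            buffers[(s, node)] = latest - output_delay(s)
    return delay, buffers


# Hypothetical network: majority gate m feeds g1, whose output feeds g2.
example_fanins = {"m": ["a", "b", "c"], "g1": ["m", "d"], "g2": ["g1", "e"]}
example_delay, example_buffers = buffer_plan(example_fanins, {"m"})
for edge, count in sorted(example_buffers.items()):
    print(edge, count)
```

In this example the inputs `d` and `e` each receive one buffer, because the signals they are combined with have passed one majority gate distribution network.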

Figure 4.6 depicts a buffer in the east direction, which can also be used in the south direction by rotating it by 90 degrees. The snake-shaped structure delays a signal by exactly one clock cycle and is also used in custom placement and routing based on the QCA ONE library [39]. As in the majority gate distribution network, the buffers use irregular clocking, creating zones through which the algorithm cannot wire. In the case of buffers, only one column or row is made impassable, which allows the blocked columns and rows to be tracked and conflicting signals to be rewired. Algorithm 3 shows the steps that are changed in or added to ortho in order to allow the placement and routing of majority gate distribution networks and the corresponding majority buffers. Figure 4.7 shows the placement of a majority gate inside the input distribution network and two AND gates whose inputs have to be delayed before they can be connected to the delayed signal coming out of the majority gate distribution network. The insertion of the first buffer blocks the eastern direction of the second input. For this case, a resolve column is introduced in which the signal can be assigned to a new row and be wired without conflict. From this layout, it can already be seen that the majority gate distribution network brings several complications with it, all of which result in area overhead. This stands in contrast to the area that majority gates were supposed to save in the first place.

#### Algorithm 3 Ortho changes with majority gate distribution network

```
Convert N to a 3-graph by substitution and balance inverters at fan-out nodes, except
for majority gates
Compute the delay as majority buffer insertion buf_{maj} for every node and assign it to
the incoming signals \sigma
Order primary input nodes
Create vectors of the columns bl_c and rows bl_r blocked by majority buffers
for all vertices v_1,...,v_i \in N with at most three incoming signals \sigma_1,\sigma_2,\sigma_3 do
   Rewire incoming signals which are wired on bl_c or bl_r
   if vertex v has fanin of three (is a majority gate) then
       if d(\sigma_1) = d(\sigma_2) = d(\sigma_3) = south then
          Extend L by one row and wire the incoming signal to (w_p, h-1) for every
          incoming signal
       end if
       Insert majority buffers according to the delay computed in buf_{maj} and save
       blocked columns bl_c and rows bl_r
       Extend the layout by the number of rows (7) and columns (5) of the majority gate
       distribution network and place the distribution network at (w-5, h-7)
       Connect incoming signals west to the inputs of the distribution network
   else if d(\sigma_1) = d(\sigma_2) = east then
       Insert majority buffers according to the delay computed in buf_{maj} and save
       blocked rows bl_r
   else if signals are labeled south then
       Insert majority buffers according to the delay computed in buf_{maj} and save
       blocked columns bl_c
   end if
end for
```
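The layout bookkeeping in Algorithm 3 can be illustrated with a small sketch. The class `Layout` and its methods are hypothetical and only mirror the steps named in the listing: extending the layout by the 5 columns and 7 rows of the distribution network, and recording the rows and columns blocked by buffers. How blocked rows and columns are interpreted during rewiring is an assumption made for this example.

```python
from dataclasses import dataclass, field

MAJ_NET_COLS = 5  # columns of the majority gate distribution network
MAJ_NET_ROWS = 7  # rows of the majority gate distribution network


@dataclass
class Layout:
    w: int
    h: int
    blocked_cols: set = field(default_factory=set)  # bl_c in Algorithm 3
    blocked_rows: set = field(default_factory=set)  # bl_r in Algorithm 3

    def place_majority_network(self):
        """Extend the layout and return the origin (w-5, h-7) of the network."""
        self.w += MAJ_NET_COLS
        self.h += MAJ_NET_ROWS
        return (self.w - MAJ_NET_COLS, self.h - MAJ_NET_ROWS)

    def block(self, direction, index):
        """Record the row (east buffer) or column (south buffer) a buffer blocks."""
        if direction == "east":
            self.blocked_rows.add(index)
        elif direction == "south":
            self.blocked_cols.add(index)
        else:
            raise ValueError("direction must be 'east' or 'south'")

    def must_rewire(self, direction, index):
        """Assumed rule: an east wire conflicts with a blocked row, a south wire with a blocked column."""
        blocked = self.blocked_rows if direction == "east" else self.blocked_cols
        return index in blocked


layout = Layout(w=10, h=12)
origin = layout.place_majority_network()
print(origin, (layout.w, layout.h))
```

Because the network is always appended at the south-eastern corner, its origin coincides with the layout size before the extension.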



Figure 4.7: Placement and routing of a majority gate in conjunction with a delayed primary input

## 4.3 Sequential Distribution Network

In this section, a distribution network that enables the orthogonal automatic design of sequential circuits is presented. To the author's knowledge, there is currently no solution for placing and routing sequential circuits in Quantum-dot Cellular Automata (QCA). The existing approaches to sequentiality in QCA, discussed in Chapter 3, simply translate CMOS structures into QCA and rely on an external clock signal, which is unnatural in the context of QCA's inherent clocking paradigm. The proposed placement and routing method takes the concept of signal-delaying wires from [56] and builds upon it to enable automated placement and routing.

#### 4.3.1 Placement and Routing

Before looking at the algorithm, the idea of an FF wire should first be discussed in the context of placement and routing rather than as a single element. The FF wire implementation requires a more complex clock generator for every FF, and it is unclear whether this can be implemented. Looking back at the analogy with CMOS, a sequential circuit comprises a combinational logic block and a storage element, which in this case is formed by a wire FF. There is still a big difference, however. While in CMOS the information can be wired back arbitrarily from the storage to the inputs of the combinational logic, in QCA wiring back implies placing wire segments, each of which already delays the information by one clock zone and thus acts as a partial FF. If a signal is wired through four adjacent wire segments, a basic FF is already formed, since the information is delayed by four clock zones, which equals one clock cycle. Looking back at the functionality of a storage element, this delay is exactly the purpose of a clocked element and the reason why the clocking of the wire FF is customized. The idea proposed in this work is to use the delay that occurs naturally due to sequential wiring to mimic storage elements, so that wire segments can be grouped into FFs without the need for customized clocking.
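The relation between wire length and storage delay can be stated in a few lines. The sketch assumes, as described above, that every wire segment delays the signal by exactly one clock zone and that four clock zones form one clock cycle. The function name is illustrative.

```python
ZONES_PER_CYCLE = 4  # four clock zones form one clock cycle


def wire_delay(num_segments):
    """Delay of a wire as (whole clock cycles, remaining clock zones)."""
    if num_segments < 0:
        raise ValueError("number of wire segments must be non-negative")
    return divmod(num_segments, ZONES_PER_CYCLE)


for segments in (1, 4, 10):
    cycles, zones = wire_delay(segments)
    print(f"{segments} segments: {cycles} cycle(s) and {zones} zone(s)")
```

A back-loop only behaves like a chain of FFs if the remainder is zero, which is why the loops between register inputs and register outputs must have lengths that are multiples of four segments.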

For the placement and routing of sequential circuits, not only does a distribution network have to be designed, but the logic network also has to be extended by storage elements. They are represented in the logic network by registers, each with an input that determines the value to be stored and an output that feeds the register value back to the combinational logic after delaying it to the next *circuit clocking cycle*. A circuit clocking cycle refers to one cycle of Bennett clocking that has propagated through the circuit completely. The registers are integrated into the logic network as follows. Register inputs (RIs) are treated similarly to primary outputs: they are dangling edges that point to no node but additionally have a register output assigned. Register outputs (ROs) are treated similarly to primary inputs: they are terminal vertices, but they always feed in the data that was given to the corresponding register input in the last circuit clocking cycle. Therefore, the logic network extends to $N = (\Lambda, I, RO, \Sigma, O, RI)$. Because of their similarity, ROs can be treated just like PIs in the input distribution network, which enables its combination with the sequential distribution network. The similarities between PIs/ROs and POs/RIs can also be exploited for placement and routing.

Figure 4.8: Scheme of a sequential circuit layout after placement and routing

The schematic layout resulting from the described algorithm is shown in Figure 4.8. In the first part of ortho, the combinational logic is placed and routed, treating ROs just like PIs and RIs just like POs. From this stage, a routing from the RIs to the ROs has to be found that retains the local and global synchronization constraints. Because every register input has exactly one register output assigned, the register inputs are first rewired and sorted in the same order as the register outputs. The ordering places all RIs on one diagonal, and since up to this point every tile is clocked uniformly with 2DDWave, the signals are all synchronized. From this starting position, wires of equal length have to be found between every register input and its register output. Since these wires also have to run in western and northern directions in order to close the loop between the ROs in the upper left corner and the RIs in the lower right corner of the layout, the wiring is not arbitrary.

One major issue is that the clocking cannot be chosen independently for each back-loop because the loops cross each other. Another issue concerns timing. Consider a primary input of a purely combinational circuit that is placed in the fifth row of the layout. This PI lies in a different time zone, because its signal is globally delayed by one clock cycle. So far, the assumption was made that an input network can be used to delay the primary input by one clock cycle in order to restore global synchronization. When an RO is placed in a different time zone, this also has to be respected by the wiring of the registers. As already mentioned, the registers delay the information not by one clock cycle but by multiple clock cycles, which slows down the circuit drastically. This large delay is due to the fact that ortho lays out the combinational logic only in the south-eastern direction, always increasing the distance between PIs/ROs and POs/RIs. Therefore, the sequential signal distribution network always grows with the size of the combinational logic. A folding operation for the ortho algorithm might reduce the distance between RIs and ROs and thereby the delay produced by the sequential distribution network.

#### Algorithm 4 Ortho changes with sequential distribution network

```
if vertex v is primary input then
   Extend L by one row
   Place v at position (0, h - 1)
   Wire the primary input to position (num_{reg} * 2, h - 1)
   if vertex v is colored south then
       Extend L by one column
       Wire the primary input to position (w-1, h-1)
   end if
else if vertex v is register output then
   Extend L by one row
   Place v at position (num_{reg} * 2, h - 1)
   if vertex v is colored south then
       Extend L by one column
       Wire the register output to position (w-1, h-1)
   end if
end if
Connect the primary outputs to the respective borders
Connect the register inputs to the respective borders and connect the RIs to the ROs
return L
```
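The terminal placement of Algorithm 4 can be traced with the following sketch. It only reproduces the coordinate bookkeeping of the listing: PIs are placed in column 0 and wired to column `2 * num_reg`, while ROs are placed directly in column `2 * num_reg`. The function `place_terminals`, the tuple format of its input and the initial layout width are assumptions made for this example.

```python
def place_terminals(terminals, num_reg):
    """Place PIs and ROs row by row.

    terminals: list of (name, kind, colored_south), kind is "pi" or "ro"
    num_reg:   number of registers; columns 0 .. 2*num_reg - 1 stay free
               for the back-loops (reading of Algorithm 4)
    Returns (positions, wires, (w, h)).
    """
    loop_cols = 2 * num_reg
    w, h = loop_cols + 1, 0  # assumed initial width: loop columns plus one
    positions, wires = {}, []
    for name, kind, colored_south in terminals:
        h += 1  # extend L by one row
        row = h - 1
        if kind == "pi":
            positions[name] = (0, row)
            wires.append((name, (0, row), (loop_cols, row)))
        elif kind == "ro":
            positions[name] = (loop_cols, row)
        else:
            raise ValueError("kind must be 'pi' or 'ro'")
        if colored_south:
            w += 1  # extend L by one column
            wires.append((name, (loop_cols, row), (w - 1, row)))
    return positions, wires, (w, h)


positions, wires, size = place_terminals(
    [("a", "pi", False), ("r0", "ro", True)], num_reg=1
)
print(positions, size)
```

Reserving the first `2 * num_reg` columns keeps the area west of the combinational logic available for the wires that close the loops from the RIs back to the ROs.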

#### 4.3.2 RAM cell

As mentioned in 3.2.2, a RAM cell can be realized using wire delays to store the data and a MUX that can write new data into the RAM cell. With the possibility of placing and routing sequential circuits, a RAM cell becomes a MUX with two PIs and one RO. The PIs are the bitline (BL) and the wordline (WL), while the RO is the information held in the RAM cell. The output of the MUX is both the PO and the RI. Figure 4.9 shows a RAM cell with its respective wordline(s), bitline(s) and latch. The core of the cell is a latch, which simply propagates the data in a circle and therefore produces a stable output once in every clock cycle. The input mechanism works via two majority gates. The majority gate connected to the wordlines decides whether $BL$ or $\overline{BL}$ is propagated to the majority gate connected to the loop. When $WL_1 \neq WL_2$, the first majority gate always outputs its third input $BL$. For $WL_1 = WL_2 = \overline{BL}$, however, the first majority gate outputs $\overline{BL}$. If the first majority gate outputs $BL$, the second majority gate also outputs $BL$. If the first majority gate outputs $\overline{BL}$, the second majority gate has two distinct inputs in addition to the data held in the cell. In this case the stored bit always decides the output and is therefore latched, meaning that the data is stored. For the read operation only one majority gate is needed, and in order to read the RAM cell the output wordlines have to be inverted. Otherwise, if both output wordlines are set to "0", the output is not read. The corresponding truth tables for the RAM cell are shown in Figure 4.9.

Figure 4.9: QCA RAM cell designed as 2:1 mux with a sequential distribution network
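The behavior of the input mechanism follows directly from the majority function and can be checked exhaustively. The sketch below makes one explicit assumption about the wiring that the text does not spell out: the second majority gate is taken to combine the bitline, the output of the first gate and the stored bit. All function names are illustrative.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority function on bits."""
    return int(a + b + c >= 2)


def first_gate(wl1, wl2, bl):
    """Majority gate driven by the two wordlines and the bitline."""
    return maj(wl1, wl2, bl)


def ram_cell_next(wl1, wl2, bl, stored):
    """Next stored bit; ASSUMED wiring: MAJ(BL, first gate output, stored)."""
    return maj(bl, first_gate(wl1, wl2, bl), stored)


for bl, stored in product((0, 1), repeat=2):
    # Distinct wordlines pass BL through, so the cell is written.
    written = ram_cell_next(0, 1, bl, stored)
    # Wordlines equal to the inverted bitline make the stored bit decide.
    held = ram_cell_next(1 - bl, 1 - bl, bl, stored)
    print(f"BL={bl} stored={stored} -> write: {written}, hold: {held}")
```

Under this assumption, distinct wordlines always overwrite the cell with BL, while wordlines set to the inverse of BL leave the stored bit unchanged.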

## 5 Experimental Evaluation

The proposed Ordering Networks were implemented in C++17 on top of the open-source framework *fiction*. The forked source code is available on GitHub at https://github.com/hibenj/fiction.

The experimental results were obtained on a system with an AMD Ryzen 5 PRO 3500U CPU with 8 cores, 16 GB RAM and Windows 11.

#### 5.1 Benchmarks

The selected benchmarks aim to compare the Distribution Networks with the state of the art, which is ortho. Hence, the benchmarks [15, 54, 3, 8] on which ortho was evaluated in [59] are considered. All of these benchmarks can be evaluated with the Ordering Distribution Network and therefore allow a sufficient comparison to the state of the art.

Since only a few of the selected benchmarks contain majority gates, they are not suited to evaluate the properties of the Majority Gates Distribution Network. Therefore, the random network generator in *fiction* is used to create MIGs of different sizes. Each MIG benchmark is placed and routed with the Majority Gates Distribution Network and is also converted into an AIG so that ortho can place and route the same logic. Because the Sequential Distribution Network is the first algorithm able to place sequential QCA circuits, no comparison can be made. In this case, the *itc99-poli* [51] benchmarks are selected for the evaluation.

#### 5.2 Results

In this section, the evaluation results of the three Distribution Networks are presented. Where possible, each is compared with the state of the art with regard to its *key metrics*, that is, the metrics the respective Distribution Network aims to influence. Common metrics that were already investigated for the state of the art are also given for completeness.

#### 5.2.1 Ordering Distribution Network

The data for the comparison between the Ordering Distribution Network and ortho as the state of the art is shown in Table 5.1. The table is organized as follows. The first column section describes the benchmarks with their source, name, number of PIs and POs, and the number of gates that are placed on the layout, including inverter and fan-out nodes. The second section gives the evaluation results of ortho: the layout size, given as the tile count in horizontal (width) and vertical (height) direction, the number of wire crossings, the total number of wires, and the running time. The same metrics are shown for the Ordering Distribution Network for comparison.

Since the Ordering Distribution Network aims to reduce area and wire crossings by ordering the inputs and introducing new placement and routing rules, the layout area and the number of wire crossings are, together with the running time, the most important metrics for comparison. Because these functionalities add complexity to the placement and routing algorithm, a slight increase in running time is expected, while wire crossings and area should be reduced. As already stated, inverter balancing was also introduced, which reduces the gate count only for the *par\_check* benchmark. Therefore, the same gate number is given for both algorithms, and for *par\_check* it has to be kept in mind that it is reduced by one gate.

Looking first at the running time of both algorithms, the average is 4% higher for the Ordering Distribution Network. The largest running time difference, plus 75%, occurs for the benchmark *c5315*, which is one of the benchmarks with the highest number of gates. Nevertheless, the Ordering Distribution Network performs very well on this benchmark, reducing wire crossings by 21% and the layout area by 11%. Overall, the wire crossings are reduced by 13%. For the *parity* benchmark they are even reduced by 56%, meaning that wire crossings are reduced not only within the newly utilized area east of the PIs, but also in the subsequent placement and routing. A negative impact on wire crossings is observed in only four out of 34 cases (12%). Furthermore, the layout area is reduced for all benchmarks, by 17% on average. For smaller benchmarks the relative area reduction is larger, since the area utilization in the PI section has more impact compared to the total size of the layout. For bigger benchmarks this effect diminishes, but with an increasing number of gates, the area-efficient placement and routing of gates colored south can be applied more often. Hence, even for the *c7552* benchmark with 5654 gates an area reduction of 11% is reached.

|             | tins      | 7          | 7    | 7     | 7       | 7   | 7                 |          | 7     | 7             | 7            | 7            | 7     | 7      | 7       | 7     |       | 1,95  |        |        |        |        |         |        | 000     |
|-------------|-----------|------------|------|-------|---------|-----|-------------------|----------|-------|---------------|--------------|--------------|-------|--------|---------|-------|-------|-------|--------|--------|--------|--------|---------|--------|---------|
| NW          | M         | 6          | 11   | 18    | 51      | 66  | 154               | 112      | 95    | 79            | 178          | 413          | 438   | 2212   | 44      | 36732 | 95451 | 62090 | 104694 | 66266  | 304793 | 440620 | 1577735 | 705176 | 0,000   |
| Ordering NW | IWC    IW | 1          |      | 3     | ∞       | 13  | 10                | 19       | 12    | 12            | 27           | 53           | 54    | 72     | 8       | 4063  | 9172  | 8166  | 5912   | 9319   | 25247  | 39534  | 96594   | 34994  | 1       |
|             | l h       | 4          | 4    | 9     | 10      | 13  | 14                | 16       | 12    | 15            | 56           | 51           | 42    | 103    | 10      | 419   | 734   | 645   | 1110   | 763    | 1497   | 820    | 3267    | 5713   | 77 70   |
|             | *         | 4          | гC   | 9     | 6       | 13  | 11                | 12       | 11    | 11            | 13           | 22           | 20    | 48     | ^       | 193   | 328   | 267   | 440    | 342    | 604    | 1949   | 1509    | 1330   | 7       |
|             | t in s    | 7          | 7    | 7     | 7       | 7   | $\overline{\lor}$ | 7        | 7     | 7             | 7            | 7            | 7     | 7      | 7       | 7     | 2,02  | 1,41  | 2,34   | 1,90   | 7,19   | 9,71   | 34,50   | 24,73  | 2       |
|             | I M I     | 24         | 17   | 20    | 26      | 110 | 107               | 132      | 117   | 93            | 181          | 418          | 475   | 1867   | 62      | 34911 | 88686 | 65197 | 103721 | 101085 | 308518 | 433132 | 1551411 | 629779 | 7000    |
| Ortho       | MC        | rc.        | 2    | 2     | 9       | 14  | 22                | 22       | 15    | 14            | 28           | 62           | 89    | 164    | 8       | 4273  | 9422  | 7918  | 6318   | 10300  | 29792  | 41794  | 122373  | 31535  | 0000    |
|             | l h       |            | ^    | 8     | 13      | 16  | 19                | 24       | 17    | 18            | 32           | 53           | 48    | 119    | 12      | 432   | 841   | 969   | 1113   | 819    | 1580   | 2028   | 3436    | 5715   | 2,0     |
|             | 8         | 9          | гO   | 9     | 6       | 14  | 12                | 6        | 13    | 12            | 13           | 23           | 22    | 48     | 6       | 187   | 359   | 292   | 431    | 365    | 664    | 840    | 1616    | 1361   | 717     |
|             | 191       | 5          | 9    | 8     | 14      | 21  | 21                | 21       | 19    | 21            | 36           | 26           | 26    | 133    | 10      | 537   | 1089  | 845   | 1329   | 1092   | 1840   | 2748   | 4615    | 6963   | בעבע    |
| mark        | 0/I       | 3/1        | 2/1  | 2/1   | 3/1     | 3/2 | 4/1               | 5/1      | 3/4   | 3/2           | 3/1          | 5/2          | 5/3   | 16/1   | 5/2     | 36/7  | 41/32 | 60/26 | 41/32  | 33/25  | 233/63 | 50/22  | 178/123 | 32/32  | 707/101 |
| Benchmark   | Name      | mux21      | xor2 | xnor2 | par_gen | FA  | par_check         | majority | b1_r2 | 1bitAdderAOIG | 1bitAdderMaj | 2bitAdderMaj | cm82a | parity | c17     | c432  | c499  | c880  | c1355  | c1908  | c2670  | c3540  | c5315   | c6288  | CLLL    |
|             |           | trindade16 |      |       |         |     |                   | fontes18 |       |               |              |              |       |        | ISCAS85 |       |       |       |        |        |        |        |         |        |         |

|             | _      | _     |           |        |        |        |         |          |         |         |         |
|-------------|--------|-------|-----------|--------|--------|--------|---------|----------|---------|---------|---------|
|             | t in s | 7     | 1,05      | 1,16   | 3,04   | 5,61   | 20,15   | 12,30    | 27,53   | 128,79  | 142,19  |
| NM          | MI     | 23703 | 47421     | 56821  | 101910 | 279467 | 859067  | 593235   | 1020260 | 3484387 | 5289167 |
| Ordering NW | WC     | 2497  | 5319      | 2699   | 7166   | 28325  | 82809   | 41365    | 84432   | 283554  | 360324  |
|             | h      | 344   | 480       | 502    | 471    | 1393   | 2668    | 1990     | 2469    | 6189    | 6158    |
|             | W      | 161   | 222       | 240    | 410    | 268    | 668     | 813      | 1221    | 2438    | 2506    |
|             | t in s | <1    | 1,67      | 1,30   | 3,73   | 5,79   | 15,23   | 12,17    | 33,16   | 83,12   | 136,49  |
| 0           | M      | 25240 | 112860    | 51370  | 159743 | 284337 | 693203  | 585776   | 998017  | 3515968 | 5028437 |
| Ortho       | MC     | 2690  | 5533      | 7873   | 7161   | 28690  | 83063   | 60749    | 92862   | 306390  | 544606  |
|             | h      | 252   | 495       | 549    | 472    | 1428   | 2670    | 2211     | 2576    | 6426    | 6415    |
|             | W      | 185   | 228       | 234    | 999    | 578    | 1027    | 815      | 1329    | 2565    | 2606    |
|             | B      | 498   | 693       | 629    | 864    | 1973   | 3055    | 2761     | 3507    | 8592    | 2866    |
| Benchmark   | 0/1    | 7/27  | 11/7      | 60/3   | 8/256  | 10/11  | 256/129 | 128/8    | 136/127 | 135/128 | 512/130 |
| Be          | Name   | ctrl  | int2float | router | dec    | cavlc  | adder   | priority | i2c     | bar     | max     |
|             |        | EPFL  |           |        |        |        |         |          |         |         |         |

Table 5.1: I/O number of primary inputs/outputs, |G| number of logic network nodes (gates + fan-outs), $w \times h$ aspect ratio given in tiles, |WC| number of wire crossings, |W| number of wires, t in s runtime in seconds, OOM maximum RAM reached, — no data available

#### 5.2.2 Majority Gates Distribution Network

For the Majority Gates Distribution Network, the evaluated data is given in Table 5.2. The results are again divided into three column sections. The first section describes the benchmark with its name, the number of PIs and POs, and the number of majority gates in the network. The remaining two sections again contain the metrics: number of gates, layout size given as the tile count in horizontal (width) and vertical (height) direction, number of wire crossings, total number of wires, and running time. Since the benchmarks are given as MIGs, the number of gates differs between the logic network placed and routed with the Majority Gates Distribution Network and the AIG representation placed and routed with ortho. Due to this property, the Distribution Network should always have to place fewer gates. However, as discussed in Section 4.2, because of the placement and routing of the majority gate itself and the additionally required buffer insertion, the majority gate implementation is expected to scale very badly, especially if buffers need to be inserted, which is more likely for bigger circuits. On the other hand, since the Majority Gates Distribution Network does not include any wire crossings, a decrease in wire crossings is expected. Even though the number of gates decreases, the buffer insertion increases the complexity of the algorithm dramatically, so an increase in running time is expected. Therefore, the key metrics examined for this Distribution Network are the number of placed gates, layout size, number of wire crossings and running time.

Looking at the number of gates placed in these benchmarks, the MIG representation contains on average only 28% of the gates of the AIG representation. However, the layouts produced with the Majority Gates Distribution Network occupy on average 865% of the ortho layout area and need 457% of the ortho running time, which makes the approach very costly while also performing badly. The only positive property is the decrease of wire crossings by 40% on average. The real impact of this Distribution Network can only be assessed with a cost metric that expresses how much area and how many wires the removal of one wire crossing is worth. All of these effects scale with the size of the network. In the smallest network r1 all effects are damped: the area grows to 421% of the ortho area and the wire crossings are reduced by 24%, while the MIG contains only 31% of the gates of the AIG network. For both algorithms the benchmark r1 finishes in under one second. For the benchmark r8 with the most gates, the area and running time grow to 1012% and 666% of the ortho values, while the wire crossings are decreased by 52%. The number of gates in the MIG is only 26% of the number of gates in the AIG.
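The figures quoted for r1 and r8 can be reproduced from Table 5.2. The sketch below uses the first table row and the row with the most gates, which the text identifies as r1 and r8. The helper names are illustrative. Note that the area and running time percentages are ratios between the two results, so 1012% corresponds to a layout about ten times as large.

```python
def ratio_percent(new, old):
    """Value of `new` expressed as a percentage of `old`."""
    return 100.0 * new / old


def reduction_percent(new, old):
    """Relative decrease from `old` to `new` in percent."""
    return 100.0 * (old - new) / old


# Rows of Table 5.2: (gates, width, height, wire crossings) and runtime in s.
r1_ortho, r1_maj = (84, 29, 64, 92), (26, 107, 73, 70)
r8_ortho, r8_maj = (4638, 1321, 3436, 86195), (1189, 13714, 3349, 41489)
r8_ortho_time, r8_maj_time = 29.24, 194.62

for name, ortho, maj_dnw in (("r1", r1_ortho, r1_maj), ("r8", r8_ortho, r8_maj)):
    area = ratio_percent(maj_dnw[1] * maj_dnw[2], ortho[1] * ortho[2])
    gates = ratio_percent(maj_dnw[0], ortho[0])
    crossings = reduction_percent(maj_dnw[3], ortho[3])
    print(f"{name}: area {area:.0f}%, gates {gates:.0f}%, crossings -{crossings:.0f}%")

print(f"r8 running time: {ratio_percent(r8_maj_time, r8_ortho_time):.0f}%")
```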

| Benchmark I/O |  | \|M\| | Ortho \|G\| | Ortho w | Ortho h | Ortho \|WC\| | Ortho \|W\| | Ortho t in s | MAJ DNW \|G\| | MAJ DNW w | MAJ DNW h | MAJ DNW \|WC\| | MAJ DNW \|W\| | MAJ DNW t in s |
|-----------|------|-----|------|------|------|-------|---------|-----------|------|-------|------|--------|---------|--------|
| 3/2       |      | 5   | 84   | 29   | 64   | 92    | 602     | <1        | 26   | 107   | 73   | 70     | 1440    | <1     |
| 4/3       |      | ^   | 06   | 30   | 72   | 92    | 731     | $\nabla$  | 28   | 176   | 87   | 78     | 2377    | 7      |
| 6/1       | 0.   | 30  | 432  | 138  | 318  | 1205  | 11838   | $\forall$ | 132  | 904   | 385  | 831    | 34158   |        |
| 10/27     | .27  | 93  | 1321 | 390  | 086  | 8479  | 100222  | 2,42      | 350  | 3722  | 1000 | 4813   | 355351  |        |
| 10/       | . 20 | 190 | 2668 | 292  | 1975 | 31667 | 371272  | 14,68     | 712  | 8112  | 1978 | 16555  | 1431295 |        |
| 10/       | .75  | 82  | 1204 | 361  | 890  | 6803  | 79105   | 2,01      | 310  | 2744  | 902  | 3915   | 235040  |        |
| 23/       |      | 267 | 3799 | 1106 | 2811 | 63270 | 744287  | 22,72     | 971  | 11421 | 2801 | 32302  | 2791464 | 111,13 |
| 16/       | .85  | 324 | 4638 | 1321 | 3436 | 86195 | 1050366 | 29,24     | 1189 | 13714 | 3349 | 41489  | 3943616 | 194,62 |
| 16/       |      | 61  | 864  | 253  | 643  | 3907  | 40424   | 1,12      | 245  | 1815  | 999  | 2280   | 120456  | 6,82   |
| 8/21      |      | 83  | 1149 | 333  | 855  | 6861  | 74636   | 1,89      | 306  | 2706  | 854  | 3623   | 235136  | 9,74   |
| 12/46     | 46   | 140 | 2014 | 290  | 1496 | 19547 | 221230  | 2,96      | 520  | 6247  | 1544 | 9331   | 857118  | 43,59  |

Table 5.2: I/O: number of primary inputs/outputs; |M|: number of majority gates; |G|: number of logic network nodes (gates + fan-outs); $w \times h$: aspect ratio given in tiles; |WC|: number of wire crossings; |W|: number of wires; t in s: runtime in seconds; OOM: maximum RAM reached; —: no data available
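The percentages quoted for r1 and r8 can be reproduced from Table 5.2. The following is a minimal sketch, assuming the values in the first and eighth data rows of the table (which match the figures quoted for r1 and r8); the dictionary layout and the helper names `percent_of` and `compare` are illustrative and not part of the thesis implementation. Note that the area and runtime figures are the MAJ DNW value expressed as a percentage of the ortho value.

```python
# Values for r1 and r8 as read from Table 5.2.
# "ortho" is the AIG-based baseline, "maj" the majority gate distribution network.
ROWS = {
    "r1": {
        "ortho": {"G": 84, "w": 29, "h": 64, "WC": 92},
        "maj": {"G": 26, "w": 107, "h": 73, "WC": 70},
    },
    "r8": {
        "ortho": {"G": 4638, "w": 1321, "h": 3436, "WC": 86195, "t": 29.24},
        "maj": {"G": 1189, "w": 13714, "h": 3349, "WC": 41489, "t": 194.62},
    },
}


def percent_of(value, reference):
    """Return value as a rounded percentage of reference."""
    return round(100 * value / reference)


def compare(row):
    """Relate the MAJ DNW result of one benchmark to its ortho baseline."""
    ortho, maj = row["ortho"], row["maj"]
    result = {
        # Layout area in tiles is width times height.
        "area_pct": percent_of(maj["w"] * maj["h"], ortho["w"] * ortho["h"]),
        # Share of wire crossings saved relative to ortho.
        "wc_reduction_pct": round(100 * (1 - maj["WC"] / ortho["WC"])),
        # Size of the MIG relative to the AIG.
        "gate_pct": percent_of(maj["G"], ortho["G"]),
    }
    # Runtime is only compared where both values are given as numbers.
    if "t" in ortho and "t" in maj:
        result["runtime_pct"] = percent_of(maj["t"], ortho["t"])
    return result


for name, row in ROWS.items():
    print(name, compare(row))
```

With these inputs, r1 yields 421% area, 24% fewer wire crossings and 31% of the gates; r8 yields 1012% area, 666% runtime, 52% fewer wire crossings and 26% of the gates.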

#### 5.2.3 Sequential Distribution Network

The Sequential Distribution Network is the first algorithm that is able to place sequential logic, based on the theory provided in chapter 4. The experiments in this section are therefore the first ones made with such an algorithm, and table 5.3 contains only the evaluation results of the Sequential Distribution Network. Again, each row holds one benchmark problem, evaluated in the key metrics number of registers, number of gates, layout area, number of wire crossings, total number of wires and running time. The benchmarks are taken from the itc99-poli library for sequential circuits. For benchmarks with more than 245 registers, the test system ran out of memory (OOM). The total number of wires is especially important because registers are implemented using only wires. For combinational circuits, the size of the layout, and hence the total number of wires, scales with the number of gates in a benchmark. For sequential circuits, the number of wires also scales with the number of registers, which directly increases the layout size and the number of wire crossings. Comparing, e.g., benchmark b04 with b05 shows that b04 has about 800 gates fewer but about 30 registers more than b05, which already results in more wires and more wire crossings. The Sequential Distribution Network is therefore costly in terms of wiring and area. This explosion in wiring cost and layout size with the number of registers is the reason the system runs out of memory.

| Library    | Name | I/O   | \|R\| | \|G\| | w    | h     | \|WC\| | \|W\|    | t in s  |
|------------|------|-------|-------|-------|------|-------|--------|----------|---------|
| itc99-poli | b01  | 2/2   | 5     | 127   | 72   | 107   | 226    | 2837     | <1      |
|            | b02  | 1/1   | 4     | 89    | 51   | 26    | 127    | 1214     | 7       |
|            | b03  | 4/4   | 30    | 420   | 352  | 376   | 4421   | 47514    | 1,01365 |
|            | b04  | 11/8  | 66    | 1866  | 1032 | 1469  | 28841  | 440650   | 9,527   |
|            | b05  | 1/36  | 34    | 2636  | 1004 | 1887  | 19518  | 366310   | 15,2846 |
|            | b06  | 2/6   | 6     | 143   | 102  | 125   | 425    | 4660     | 7       |
|            | b07  | 1/8   | 49    | 1149  | 682  | 941   | 15163  | 196220   | 4,66632 |
|            | b08  | 9/4   | 21    | 462   | 298  | 387   | 3428   | 39367    | 1,34292 |
|            | b09  | 1/1   | 28    | 426   | 341  | 365   | 4619   | 44597    | 7       |
|            | b10  | 11/6  | 17    | 549   | 291  | 447   | 4546   | 47939    | 1,23536 |
|            | b11  | 9/2   | 31    | 1718  | 683  | 1303  | 18781  | 261924   | 6,27877 |
|            | b12  | 9/9   | 121   | 2854  | 1706 | 2313  | 74904  | 1195053  | 42,3646 |
|            | b13  | 10/10 | 53    | 878   | 670  | 762   | 14746  | 174757   | 10,0356 |
|            | b14  | 32/54 | 245   | 24900 | 8648 | 18087 | 876527 | 26976087 | 690'068 |
|            | b15  | 36/70 | 449   |       |      |       |        |          | OOM     |
|            | b17  | 37/97 | 1415  |       |      |       |        |          | OOM     |

Table 5.3: I/O: number of primary inputs/outputs; |R|: number of registers (D flip-flops); |G|: number of logic network nodes (gates + fan-outs); w × h: aspect ratio given in tiles; |WC|: number of wire crossings; |W|: number of wires; t in s: runtime in seconds; OOM: maximum RAM reached; —: no data available
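The comparison of b04 and b05 can be checked directly against Table 5.3. The sketch below uses only the node, wire and wire-crossing counts of these two rows; the variable names and the derived "wires per node" figure are illustrative assumptions and not a metric defined in this thesis.

```python
# Values copied from Table 5.3 for the two benchmarks compared in the text.
B04 = {"G": 1866, "WC": 28841, "W": 440650}
B05 = {"G": 2636, "WC": 19518, "W": 366310}

# b04 has fewer logic network nodes than b05 ...
gate_diff = B04["G"] - B05["G"]

# ... but still needs more wires and more wire crossings.
wire_ratio = B04["W"] / B05["W"]
crossing_ratio = B04["WC"] / B05["WC"]

# Illustrative normalisation: wires spent per logic network node.
wires_per_node = {
    "b04": B04["W"] / B04["G"],
    "b05": B05["W"] / B05["G"],
}

print(f"node difference b04 - b05: {gate_diff}")
print(f"wire ratio b04 / b05: {wire_ratio:.2f}")
print(f"wire crossing ratio b04 / b05: {crossing_ratio:.2f}")
print({name: round(value) for name, value in wires_per_node.items()})
```

With these numbers, b04 has 770 fewer nodes than b05 but about 20% more wires and about 48% more wire crossings, which is consistent with the registers dominating the wiring cost.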

## 6 Conclusion

In this work, the state-of-the-art scalable placement and routing algorithm ortho was extended with three signal distribution networks, each adding different functionality to the preprocessing and the base algorithm.

The ordering distribution network aims to reduce wire crossings and area by ordering the PIs and allowing gates to be placed in the input area. The results show not only that these goals were reached, but also suggest that the ordering of primary inputs affects the placement and routing of the whole logic network, resulting in even more benefits.

The majority distribution network shows how difficult it is to exploit one of the theoretically biggest advantages of QCA technology, the placement and routing of majority gates. Although the number of gates could be reduced significantly in some networks, the implementation in ortho results in higher area usage and wiring effort. A big part of this is the need for buffer insertion, which results from the 2DDWave scheme used within ortho in combination with the signal delay caused by the placement of majority gates. As a further improvement, the area usage could be reduced by finding a way to place majority gates in the same rows or columns, which would, however, penalize the running time. Using an algorithm that natively supports a clocking scheme in which majority gates can be implemented is also suggested as a way to improve the placement and routing of majority gates.

The sequential distribution network enables the ortho algorithm to design sequential circuits after rethinking the basic theory of sequentiality, storage elements and clocking for the QCA domain. To this end, so-called wire registers were proposed and used in the placement and routing. This has to be considered for the Bennett clocking and slows down the circuit considerably, depending on the size of the layout's combinational part. The idea of the wire register was even powerful enough to implement a RAM cell based on a 2:1 mux, and it is suggested for further research on implementing sequential behavior in QCA circuits.

# **List of Figures**

| Figure | Caption |
|------|-----------------------------------------------------------------------------|
| 2.1  | Binary Logic Network |
| 2.2  | QCA-Cell states |
| 2.3  | Adjacent QCA-cells forming a wire segment |
| 2.4  | Different QCA Inverter representations |
| 2.5  | The QCA Majority gate |
| 2.6  | QCA wires |
| 2.7  | Different QCA wire crossing implementations |
| 2.8  | QCA Standard Library |
| 2.9  | Schematic of a combined QCA and CMOS system [29] |
| 2.10 | QCA wire divided into the four clock zones according to Bennett clocking |
| 2.11 | QCA clocking pipeline |
| 2.12 | Cell based layout of a 2:1 mux [33] |
| 2.13 | Different clocking Schemes in QCA |
| 2.14 | Eccles-Jordan-Flip-Flop |
| 2.15 | SR-Latch |
| 2.16 | D-Latch |
| 2.17 | Clocking of a basic latch in QCA |
| 3.1  | Lazy SMT-solving process [1] |
| 3.2  | Gate placement with black circles showing wasted area [49] |
| 3.3  | Example OGD drawing |
| 3.4  | Insufficient timing constraints of an OGD representation [56] |
| 3.5  | Relative positions of an OGD graph with correct color assignment |
| 3.6  | Placement and routing of a 2:1 mux network using the ortho algorithm |
| 4.1  | Scheme of area usage in the Ordering Distribution Network |
| 4.2  | Placement and routing of a 2:1 mux network using the ortho algorithm with the ordering distribution network |
| 4.3  | Global synchronization violation when connecting 2DD-Wave and RES clocked scheme |
| 4.4  | Proposed majority gate distribution network |
| 4.5  | Scheme of the P&R using the Majority Gate Distribution Network |
| 4.6  | Buffer in *east* direction with resolve column and respective clocking |
| 4.7  | Placement and routing of a majority gate in conjunction with a delayed primary input |
| 4.8  | Scheme of a sequential circuit layout after placement and routing |
| 4.9  | QCA RAM cell designed as 2:1 mux with a sequential distribution network |

# **List of Tables**

| Table | Caption |
|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------|
| 5.1 | I/O: number of primary inputs/outputs; \|G\|: number of logic network nodes (gates + fan-outs); w × h: aspect ratio given in tiles; \|WC\|: number of wire crossings; \|W\|: number of wires; t in s: runtime in seconds; OOM: maximum RAM reached; —: no data available |
| 5.2 | I/O: number of primary inputs/outputs; \|M\|: number of majority gates; \|G\|: number of logic network nodes (gates + fan-outs); w × h: aspect ratio given in tiles; \|WC\|: number of wire crossings; \|W\|: number of wires; t in s: runtime in seconds; OOM: maximum RAM reached; —: no data available |
| 5.3 | I/O: number of primary inputs/outputs; \|R\|: number of registers (D flip-flops); \|G\|: number of logic network nodes (gates + fan-outs); w × h: aspect ratio given in tiles; \|WC\|: number of wire crossings; \|W\|: number of wires; t in s: runtime in seconds; OOM: maximum RAM reached; —: no data available |

# **Bibliography**

- [1] E. Ábrahám and G. Kremer. "SMT solving for arithmetic theories: Theory and tool support." In: *2017 19th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)*. IEEE. 2017, pp. 1–8.
- [2] F. Ahmad. "An optimal design of QCA based 2n: 1/1: 2n multiplexer/demultiplexer and its efficient digital logic realization." In: *Microprocessors and Microsystems* 56 (2018), pp. 64–75.
- [3] L. Amarú, P.-E. Gaillardon, and G. De Micheli. "The EPFL combinational benchmark suite." In: *Proceedings of the 24th International Workshop on Logic & Synthesis (IWLS)*. 2015.
- [4] T. Ball, A. Podelski, and S. K. Rajamani. "Boolean and Cartesian abstraction for model checking C programs." In: *International Conference on Tools and Algorithms for the Construction and Analysis of Systems*. Springer. 2001, pp. 268–283.
- [5] Z. Beiki and A. Shahidinejad. "An Introduction to Quantum Cellular Automata Technology and Its Defects." In: *Reviews in Theoretical Science* 2 (Dec. 2014), pp. 334–342. DOI: 10.1166/rits.2014.1028.
- [6] D. Bhowmik, A. K. Pramanik, J. Pal, P. Sen, A. R. Singh, A. K. Saha, and B. Sen. "Regular clocking-based Automated Cell Placement technique in QCA targeting sequential circuit." In: *Computers & Electrical Engineering* 98 (2022), p. 107668.
- [7] T. Biedl and G. Kant. "A better heuristic for orthogonal graph drawings." In: *Computational Geometry* 9.3 (1998), pp. 159–180.
- [8] F. Brglez and H. Fujiwara. "A neutral netlist of 10 combinational benchmark circuits and a target translator in Fortran." In: *ISCAS'85*. 1985.
- [9] R. E. Bryant. "Graph-based algorithms for boolean function manipulation." In: *Computers, IEEE Transactions on* 100.8 (1986), pp. 677–691.
- [10] M. Bubna, S. Roy, N. Shenoy, and S. Mazumdar. "A layout-aware physical design method for constructing feasible QCA circuits." In: *Proceedings of the 18th ACM Great Lakes symposium on VLSI*. 2008, pp. 243–248.

- [11] C. A. T. Campos, A. L. Marciano, O. P. V. Neto, and F. S. Torres. "USE: a universal, scalable, and efficient clocking scheme for QCA." In: *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 35.3 (2015), pp. 513–517.
- [12] J. Chaharlang and M. Mosleh. "An overview on RAM memories in QCA technology." In: *Majlesi Journal of Electrical Engineering* 11.2 (2017).
- [13] W.-J. Chung, B. Smith, and S. K. Lim. "Node duplication and routing algorithms for quantum-dot cellular automata circuits." In: *IEE Proceedings-Circuits, Devices and Systems* 153.5 (2006), pp. 497–505.
- [14] M. Eiglsperger, S. P. Fekete, and G. W. Klau. "Orthogonal graph drawing." In: *Drawing Graphs*. Springer, 2001, pp. 121–171.
- [15] G. Fontes, P. A. R. Silva, J. A. M. Nacif, O. P. V. Neto, and R. Ferreira. "Placement and routing by overlapping and merging QCA gates." In: *2018 IEEE International Symposium on Circuits and Systems (ISCAS)*. IEEE. 2018, pp. 1–5.
- [16] R. E. Formigoni, R. S. Ferreira, and J. A. M. Nacif. "Ropper: A placement and routing framework for field-coupled nanotechnologies." In: *2019 32nd Symposium on Integrated Circuits and Systems Design (SBCCI)*. IEEE. 2019, pp. 1–6.
- [17] A. Gin, P. D. Tougaw, and S. Williams. "An alternative geometry for quantum-dot cellular automata." In: *Journal of Applied Physics* 85.12 (1999), pp. 8281–8286.
- [18] M. Goswami, A. Mondal, M. H. Mahalat, B. Sen, and B. K. Sikdar. "An efficient clocking scheme for quantum-dot cellular automata." In: *International Journal of Electronics Letters* 8.1 (2020), pp. 83–96.
- [19] A. Grabowski. "Mechanizing complemented lattices within Mizar type system." In: *Journal of Automated Reasoning* 55.3 (2015), pp. 211–221.
- [20] C. Hawkins, J. Segura, and P. Zarkesh-Ha. *CMOS Digital Integrated Circuits: A First Course*. Materials, Circuits and Devices. Institution of Engineering and Technology, 2012. ISBN: 9781613530023.
- [21] E. V. Huntington. "Boolean Algebra. A Correction." In: *Transactions of the American Mathematical Society* 35 (1933), p. 557.
- [22] E. V. Huntington. "A New Set of Independent Postulates for the Algebra of Logic with Special Reference to Whitehead and Russell's Principia Mathematica\*." In: *Proceedings of the National Academy of Sciences* 18.2 (1932), pp. 179–180. DOI: 10.1073/pnas.18.2.179. eprint: https://www.pnas.org/doi/pdf/10.1073/pnas.18.2.179.
- [23] P. T. Johnstone. "Conditions related to De Morgan's law." In: *Applications of sheaves* (1979), pp. 479–491.

- [24] R. W. Keyes and R. Landauer. "Minimal energy dissipation in logic." In: *IBM Journal of Research and Development* 14.2 (1970), pp. 152–157.
- [25] D. Kumar and D. Mitra. "A systematic approach towards fault-tolerant design of QCA circuits." In: *Analog Integrated Circuits and Signal Processing* 98 (Mar. 2019), pp. 1–15. DOI: 10.1007/s10470-018-1270-x.
- [26] R. Landauer. "Irreversibility and heat generation in the computing process." In: *IBM journal of research and development* 5.3 (1961), pp. 183–191.
- [27] C. S. Lent, M. Liu, and Y. Lu. "Bennett clocking of quantum-dot cellular automata and the limits to binary logic scaling." In: *Nanotechnology* 17.16 (2006), p. 4240.
- [28] C. S. Lent and P. D. Tougaw. "A device architecture for computing with quantum dots." In: *Proceedings of the IEEE* 85.4 (1997), pp. 541–557.
- [29] C. S. Lent, P. D. Tougaw, and W. Porod. "Quantum cellular automata: the physics of computing with arrays of quantum dot molecules." In: *Proceedings Workshop on Physics and Computation. PhysComp*'94. IEEE. 1994, pp. 5–13.
- [30] C. S. Lent, P. D. Tougaw, W. Porod, and G. H. Bernstein. "Quantum cellular automata." In: *Nanotechnology* 4.1 (1993), p. 49.
- [31] L. A. Lim, A. Ghazali, S. C. T. Yan, and C. C. Fat. "Sequential circuit design using Quantum-dot Cellular Automata (QCA)." In: *2012 IEEE International Conference on Circuits and Systems (ICCAS)*. IEEE. 2012, pp. 162–167.
- [32] M. Mahdavi, M. A. Amiri, S. Mirzakuchaki, and M. N. Moghaddasi. "Single Electron Fault in QCA Inverter Gate." In: *2009 Fifth International Conference on MEMS NANO, and Smart Systems*. 2009, pp. 63–66. DOI: 10.1109/ICMENS.2009.23.
- [33] A. H. Majeed, E. Alkaldy, M. S. Zainal, K. Navi, and D. Nor. "Optimal design of RAM cell using novel 2:1 multiplexer in QCA technology." In: *Circuit World* (2019).
- [34] A. H. Majeed, M. S. Zainal, and E. Alkaldy. "Quantum-dot Cellular Automata." In: *International Journal of Integrated Engineering* 11.8 (2019), pp. 143–158.
- [35] M. Mohammadi, M. Mohammadi, and S. Gorgin. "An efficient design of full adder in quantum-dot cellular automata (QCA) technology." In: *Microelectronics journal* 50 (2016), pp. 35–43.
- [36] M. Alharbi, G. Edwards, R. S., et al. "Novel Ultra-Energy-Efficient Reversible Designs of Sequential Logic Quantum-Dot Cellular Automata Flip-Flop Circuits." Preprint (Version 1), Research Square. 2022. DOI: 10.21203/rs.3.rs-2145478/v1.

- [37] F. Peng, Y. Zhang, R. Kuang, and G. Xie. "Spars: A Full Flow Quantum-Dot Cellular Automata Circuit Design Tool." In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 68.4 (2020), pp. 1233–1237.
- [38] F. Pigorsch, C. Scholl, and S. Disch. "Advanced unbounded model checking based on AIGs, BDD sweeping, and quantifier scheduling." In: *2006 Formal Methods in Computer Aided Design*. IEEE. 2006, pp. 89–96.
- [39] D. A. Reis, C. A. T. Campos, T. R. B. S. Soares, O. P. V. Neto, and F. S. Torres. "A Methodology for Standard Cell Design for QCA." In: *2016 IEEE International Symposium on Circuits and Systems (ISCAS)*. 2016, pp. 2114–2117. DOI: 10.1109/ISCAS.2016.7538997.
- [40] J. I. Reshi, M. T. Banday, and F. A. Khanday. "Sequential circuit design using quantum dot cellular automata (QCA)." In: *2015 Symposium on Computers, Communication and Electronic Engineering*. 2015, pp. 143–148.
- [41] T. N. Sasamal, A. K. Singh, and A. Mohan. "Quantum-Dot Cellular Automata Based Digital Logic Circuits: A Design Perspective." In: *Studies in Computational Intelligence*. 2020.
- [42] R. R. Schaller. "Moore's law: past, present and future." In: *IEEE spectrum* 34.6 (1997), pp. 52–59.
- [43] G. Schedelbeck, W. Wegscheider, M. Bichler, and G. Abstreiter. "Coupled quantum dots fabricated by cleaved edge overgrowth: From artificial atoms to molecules." In: *Science* 278.5344 (1997), pp. 1792–1795.
- [44] C. Scholl and P. Molitor. *Communication based FPGA synthesis for multi-output Boolean functions*. IEEE, 1995.
- [45] G. Schulhof, K. Walus, and G. A. Jullien. "Simulation of Random Cell Displacements in QCA." In: *J. Emerg. Technol. Comput. Syst.* 3.1 (2007), 2–es. ISSN: 1550-4832. DOI: 10.1145/1229175.1229177.
- [46] R. Singh and M. K. Pandey. "Analysis and implementation of reversible dual edge triggered d flip flop using quantum dot cellular automata." In: *Int. J. Innov. Comput. Inf. Control* 14.1 (2018), pp. 147–159.
- [47] R. Singh and D. K. Sharma. "Design of efficient multilayer RAM cell in QCA framework." In: *Circuit World* (2020).
- [48] M. Taucer, F. Karim, K. Walus, and R. A. Wolkow. "Consequences of many-cell correlations in clocked quantum-dot cellular automata." In: *IEEE Transactions on Nanotechnology* 14.4 (2015), pp. 638–647.
- [49] T. Teodósio and L. Sousa. "QCA-LG: A tool for the automatic layout generation of QCA combinational circuits." In: *Norchip* 2007. IEEE. 2007, pp. 1–5.

- [50] H. Thapliyal and N. Ranganathan. "Reversible logic-based concurrently testable latches for molecular QCA." In: *IEEE transactions on nanotechnology* 9.1 (2009), pp. 62–69.
- [51] CAD Group at Politecnico di Torino. *PoliTo ITC99*. 1999. URL: https://github.com/squillero/itc99-poli (visited on 06/26/2018).
- [52] F. S. Torres, M. Walter, R. Wille, D. Große, and R. Drechsler. "Synchronization of clocked field-coupled circuits." In: *2018 IEEE 18th International Conference on Nanotechnology (IEEE-NANO)*. IEEE. 2018, pp. 1–5.
- [53] G. Toth and C. S. Lent. "Quasiadiabatic switching for metal-island quantum-dot cellular automata." In: *Journal of Applied Physics* 85.5 (1999), pp. 2977–2984.
- [54] A. Trindade, R. Ferreira, J. A. M. Nacif, D. Sales, and O. P. V. Neto. "A placement and routing algorithm for quantum-dot cellular automata." In: *2016 29th Symposium on Integrated Circuits and Systems Design (SBCCI)*. IEEE. 2016, pp. 1–6.
- [55] V. Vankamamidi, M. Ottavi, and F. Lombardi. "Two-dimensional schemes for clocking/timing of QCA circuits." In: *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems* 27.1 (2007), pp. 34–44.
- [56] M. Walter and R. Drechsler. "Design Automation for Field-Coupled Nanotechnologies." In: *2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*. July 2020, pp. 176–181. DOI: 10.1109/ISVLSI49217.2020.00040.
- [57] M. Walter, R. Wille, D. Große, F. S. Torres, and R. Drechsler. "Placement and Routing for Tile-Based Field-Coupled Nanocomputing Circuits Is NP-Complete (Research Note)." In: *J. Emerg. Technol. Comput. Syst.* 15.3 (2019). ISSN: 1550-4832. DOI: 10.1145/3312661.
- [58] M. Walter, R. Wille, F. Sill Torres, and R. Drechsler. "Exact Placement and Routing." In: *Design Automation for Field-coupled Nanotechnologies*. Springer, 2022, pp. 47–78.
- [59] M. Walter, R. Wille, F. S. Torres, D. Große, and R. Drechsler. "Scalable design for field-coupled nanocomputing circuits." In: *Proceedings of the 24th Asia and South Pacific Design Automation Conference*. 2019, pp. 197–202.
- [60] K. Walus, A. Vetteth, G. Jullien, and V. Dimitrov. "RAM design using quantum-dot cellular automata." In: *Nanotechnology conference*. Vol. 2. 2003, pp. 160–163.
- [61] M. Wilson, K. Kannangara, G. Smith, M. Simmons, and B. Raguse. "Nanotechnology: basic science and emerging technologies." In: (2002).
- [62] L. Zhang. "Solving QBF with combined conjunctive and disjunctive normal form." In: *Proceedings of the National Conference on Artificial Intelligence*. Vol. 21. 1. AAAI Press; MIT Press. 2006, p. 143.