# DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

TECHNISCHE UNIVERSITÄT MÜNCHEN

Master's Thesis in Electrical Engineering

# Signal Distribution Networks in Automatic QCA Standard Cell Placement and Routing

Benjamin Hien

# Signalverteilungs-Netzwerke in automatisierter QCA Standard Zellen Plazierung und Verdrahtung

Author: Benjamin Hien

Supervisor: Prof. Dr. Robert Wille

Advisor: Dr. Marcel Walter

Submission Date: 08.02.2023

I confirm that this master's thesis in electrical engineering is my own work and I have documented all sources and material used.

Munich, 08.02.2023, Benjamin Hien



## **Abstract**

New technologies to compete with CMOS, one of them QCA.

Placement and routing (P&R) as key to producibility.

Challenges of placement and routing in previous algorithms.

Goals of this work:

- I. minimizing area/tiles,
- II. making it possible to place majority gates (which is a promising aspect of QCA),
- III. making it possible to place and route sequential circuits.

This is done by introducing several signal distribution networks.

Results are compared with already existing algorithms...

# **Contents**

- Acknowledgments
- Abstract
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Objective
- 2 Preliminaries
  - 2.1 Representation of Logic Circuits
    - 2.1.1 Boolean Functions
    - 2.1.2 Logic Networks
  - 2.2 QCA Technology
    - 2.2.1 Cells
    - 2.2.2 Information Transfer
    - 2.2.3 Gates
  - 2.3 P&R problem
- 3 State of the Art
  - 3.1 Combinational P&R Algorithms
  - 3.2 Sequential P&R
- 4 Methodology
  - 4.1 Input Network
  - 4.2 Majority Gates Placement Network
  - 4.3 Sequential Circuits Placement
- 5 Experimental Evaluation
  - 5.1 Benchmarks
  - 5.2 Results
- List of Figures
- List of Tables
- Bibliography

## 1 Introduction

#### 1.1 Motivation

About the technology, why it's promising and important.

Lack of automated algorithms for P&R

P&R as sign of producibility

Why the distribution networks are able to make QCA better producible / cheaper

#### 1.2 Objective

The thesis is divided in ...

## 2 Preliminaries

#### 2.1 Representation of Logic Circuits

Independent of the underlying technology, digital circuits can be represented by logic functions. Boolean algebra, introduced by the mathematician George Boole in 1847, formalizes these logic functions and provides a foundation for reasoning about them.

#### 2.1.1 Boolean Functions

The definition as given here is based on an addition to Boolean calculus by Edward V. Huntington. CITE The basis of a Boolean algebra is given as follows:

**Definition 2.1.1** (Basis for Boolean algebra). Given a finite set S, two binary functions  $\cdot$ :  $S \times S \to S$  and  $+: S \times S \to S$ , and one unary function  $\neg: S \to S$ , the tuple  $(S, \cdot, +, \neg)$  is called a Boolean algebra iff the following constraints hold for all a, b,  $c \in S$ :

$$\begin{aligned}
(1)\quad & a \cdot b = b \cdot a & & a + b = b + a \\
(2)\quad & a \cdot (b + c) = (a \cdot b) + (a \cdot c) & & a + (b \cdot c) = (a + b) \cdot (a + c) \\
(3)\quad & \exists\, 1 \in S : a \cdot 1 = a & & \exists\, 0 \in S : a + 0 = a \\
(4)\quad & \exists\, 0 \in S : a \cdot \neg a = 0 & & \exists\, 1 \in S : a + \neg a = 1
\end{aligned}$$

The constraints describe (1) *commutativity*, (2) *distributivity*, (3) *neutrality*, and (4) *complementarity*.

In the original definition, a Boolean algebra is given by the 6-tuple $(\mathbb{B}, \vee, \wedge, \neg, 0, 1)$, where $\vee$ and $\wedge$ are alternative notations for the binary operators disjunction $+$ and conjunction $\cdot$ on $\mathbb{B}$, $\neg$ is the unary negation function introduced above, and 0 and 1 are two distinct elements. The negation $\neg a$ is also commonly written as $\bar{a}$.

Since this definition is restricted to the three Boolean functions $(\vee, \wedge, \neg)$, we extend it with the following definition:

**Definition 2.1.2.** A function $f : \mathbb{B}^n \to \mathbb{B}$, where $n \in \mathbb{N}$, is called a Boolean function. Analogously, a function $f : \mathbb{B}^n \to \mathbb{B}^m$, where $n, m \in \mathbb{N}$, is called a multi-output Boolean function and can be interpreted as $f_v = (f_{v1},...,f_{vm})$, where $f_{vi} : \mathbb{B}^n \to \mathbb{B}$ for all $1 \le i \le m$.

Common notations for Boolean functions are the *conjunctive normal form* (CNF) and the *disjunctive normal form* (DNF), both of which are built from literals.

**Definition 2.1.3.** A literal is an atom or the negation of an atom. In the former case the literal is positive, in the latter case it is negative.

**Definition 2.1.4.** A formula F is in conjunctive normal form (CNF) if it is a conjunction of disjunctions of literals:

$$\bigwedge_{i}\bigvee_{j}(\neg)v_{ij},$$

where $v_{ij} \in \mathbb{B}$.

A formula F is in disjunctive normal form (DNF) if it is a disjunction of conjunctions of literals:

$$\bigvee_{i} \bigwedge_{j} (\neg) v_{ij},$$

where $v_{ij} \in \mathbb{B}$.

Using the CNF or the DNF together with *De Morgan's laws*, which follow from Definition 2.1.1, any Boolean algebra can be reduced to only two operators, e.g. conjunction ($\wedge$) and negation ($\neg$). Any such set of two Boolean functions is called *universal*.
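As a quick plausibility check of this universality claim, the following Python sketch expresses disjunction using only conjunction and negation via De Morgan's law and compares the result with the built-in disjunction on all inputs. The function names are chosen here for illustration only.

```python
from itertools import product


def NOT(a: bool) -> bool:
    return not a


def AND(a: bool, b: bool) -> bool:
    return a and b


def or_from_and_not(a: bool, b: bool) -> bool:
    # De Morgan: a + b = NOT(NOT(a) AND NOT(b))
    return NOT(AND(NOT(a), NOT(b)))


def de_morgan_holds() -> bool:
    # Exhaustive comparison over all four input combinations.
    return all(
        or_from_and_not(a, b) == (a or b)
        for a, b in product((False, True), repeat=2)
    )
```

Since disjunction can be rebuilt in this way, every CNF or DNF formula can be rewritten with conjunction and negation alone.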

#### 2.1.2 Logic Networks

There are many ways of representing Boolean functions. Most of them, including truth tables and reduced sums of products, suffer from drawbacks: their size grows exponentially with the number of arguments, or certain functions only have exponentially large representations. The representation of combinational circuits as logic networks overcomes these restrictions and has proven very useful in the logic synthesis process. The following definition for directed acyclic graphs is given in CITE:

**Definition 2.1.5** (Function graph). A function graph is a rooted, directed graph with vertex set V containing two types of vertices. A *nonterminal* vertex v has as attributes an argument index  $index(v) \in \{1,...,n\}$ , and two children low(v),  $high(v) \in V$ . A *terminal* vertex v has as attribute a value  $value(v) \in \{0,1\}$ .



The corresponding recursive Boolean functions read:

$$\begin{aligned}
f_v &= f_v(v_8) & f_v(v_6) &= f_v(v_2) \lor f_v(v_3) \\
f_v(v_8) &= f_v(v_7) \land f_v(v_6) & f_v(v_5) &= f_v(v_3) \lor f_v(v_1) \\
f_v(v_7) &= f_v(v_4) \land f_v(v_5) & f_v(v_4) &= f_v(v_2) \lor f_v(v_1)
\end{aligned}$$

with primary inputs $f_v(v_1), f_v(v_2), f_v(v_3) \in \{0, 1\}$.

Figure 2.1: Binary Logic Network of Majority Function

Furthermore, for any nonterminal vertex v, if low(v) is also nonterminal, then we must have index(v) < index(low(v)). Similarly, if high(v) is nonterminal, then we must have index(v) < index(high(v)).

Since this definition does not yet include Boolean functions and limits the number of children of a vertex to two, thereby only allowing binary Boolean functions, a custom definition is given:

**Definition 2.1.6** (Logic Network). A logic network L(V, E) is a rooted, directed graph with vertex set V and edge set E. For any vertex $v \in V$, vertices connected by incoming edges $e_{inc} \in E$ are called children. A vertex connected by an outgoing edge $e_{out} \in E$ is called parent. V contains two types of vertices. A *nonterminal* vertex v has as attributes an argument index $index(v) \in \{1,...,n\}$ and l children $child_1(v),...,child_l(v) \in V$. A *terminal* vertex v has as attribute a value $value(v) \in \{0,1\}$.

Furthermore, for any nonterminal vertex v, if $child_i(v)$ with $1 \le i \le l$ is also nonterminal, then we must have $index(v) < index(child_i(v))$.

**Definition 2.1.7** (Logic Network Boolean Functions). A set of n-ary Boolean functions $\mathbb{B}$ is accessed via the argument index, assigning a Boolean function $f_v$ to every vertex:

- 1. If v is a terminal vertex:
  - a) If value(v) = 1, then  $f_v = 1$
  - b) If value(v) = 0, then  $f_v = 0$
- 2. If v is a nonterminal vertex with index(v) = i, then  $f_v$  is the function  $f_v(x_1,...,x_n) = f_v(f_v(child_1(v)),...,f_v(child_l(v)))$ .

The binary logic network of the ternary majority function is depicted in Figure 2.1.

**Definition 2.1.8** (Majority Function). The ternary Boolean majority function is defined as $\langle a, b, c \rangle = ab + ac + bc$, so that the function value equals the majority of its incoming values.

It follows that $\langle a, b, 0 \rangle = ab$ and $\langle a, b, 1 \rangle = a + b$.
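The network of Figure 2.1 and Definition 2.1.8 can be checked against each other by exhaustive evaluation. The following Python sketch encodes the recursive functions listed with Figure 2.1 as a dictionary and compares the root $v_8$ with $ab + ac + bc$ for all eight input combinations. The dictionary layout and all function names are assumptions made for this illustration and do not belong to any library.

```python
from itertools import product

# Node -> (operation, first child, second child); v1..v3 are primary inputs.
NETWORK = {
    "v4": ("or", "v2", "v1"),
    "v5": ("or", "v3", "v1"),
    "v6": ("or", "v2", "v3"),
    "v7": ("and", "v4", "v5"),
    "v8": ("and", "v7", "v6"),
}


def evaluate(vertex: str, inputs: dict) -> int:
    """Recursively evaluate a vertex of the network for given primary inputs."""
    if vertex in inputs:
        return inputs[vertex]
    operation, first, second = NETWORK[vertex]
    a, b = evaluate(first, inputs), evaluate(second, inputs)
    return (a | b) if operation == "or" else (a & b)


def majority(a: int, b: int, c: int) -> int:
    """Ternary majority <a, b, c> = ab + ac + bc."""
    return (a & b) | (a & c) | (b & c)


def network_matches_majority() -> bool:
    return all(
        evaluate("v8", {"v1": a, "v2": b, "v3": c}) == majority(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )
```

The same `majority` helper also reproduces the two special cases: fixing the third input to 0 gives the conjunction, fixing it to 1 gives the disjunction.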

Adopting the names used in the underlying libraries with which the algorithms proposed in chapter 4 are implemented, terminal vertices are referred to as *primary inputs* (PIs), with their set denoted as I. The nonterminal vertices are referred to as *nodes*, and their set is denoted as $\Lambda$. From the definition it follows that $I \cap \Lambda = \emptyset$. An edge connecting a child $v_i$ and a parent vertex $v_j$ is called a *signal*. With $v_i < v_j$, a signal is written as $(v_i, v_j)$. The set of all signals is denoted as $\Sigma$. An edge that does not point to another vertex is called a *primary output* (PO), and the set of primary outputs is denoted as O. Therefore, $\Sigma \cap O = \emptyset$ also holds. From the definition of a logic network, we can now describe it as an acyclic directed graph $L = (\Lambda, I, \Sigma, O)$.
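To make the tuple notation concrete, the following Python sketch stores the network of Figure 2.1 as $(\Lambda, I, \Sigma, O)$ and checks the properties stated above. The class name, the field names, and the choice to record a primary output by the vertex that drives it are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicNetwork:
    nodes: frozenset    # Lambda: nonterminal vertices
    inputs: frozenset   # I: primary inputs
    signals: frozenset  # Sigma: pairs (child, parent)
    outputs: frozenset  # O: recorded by the driving vertex (assumption)

    def is_well_formed(self) -> bool:
        vertices = self.nodes | self.inputs
        disjoint = not (self.nodes & self.inputs)
        # Every signal starts at a known vertex and ends at a node.
        known = all(c in vertices and p in self.nodes for c, p in self.signals)
        # v_i < v_j for every signal (v_i, v_j); this rules out cycles.
        ordered = all(c < p for c, p in self.signals)
        driven = self.outputs <= vertices
        return disjoint and known and ordered and driven


# Vertices v1..v8 of Figure 2.1, written as the integers 1..8.
FIGURE_2_1 = LogicNetwork(
    nodes=frozenset({4, 5, 6, 7, 8}),
    inputs=frozenset({1, 2, 3}),
    signals=frozenset({
        (2, 4), (1, 4), (3, 5), (1, 5), (2, 6), (3, 6),
        (4, 7), (5, 7), (7, 8), (6, 8),
    }),
    outputs=frozenset({8}),
)
```

A signal that points from a higher to a lower vertex number, such as (8, 4), violates the ordering and is rejected by `is_well_formed`.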

As already mentioned in subsection 2.1.1, a suitable set of two Boolean functions can express any Boolean algebra. As long as this universality is retained, the set of node functions can be extended. Using conjunction and negation as the only node functions of a logic network yields the so-called *AND-Inverter Graphs* (AIGs). Another widely used type of logic network is the *Majority-Inverter Graph* (MIG), which uses the ternary majority function and negation. There also exists a wide range of logic networks permitting more than two node functions, like *XOR-AND-Inverter Graphs* (XAGs).

Since the logic network represents the combinational circuit in the given technology, a suitable logic network representation has to be determined: even though these logic networks can implement any Boolean function given in a specification, not every logic network can be synthesized into any given technology. In the current standard technology, *complementary metal-oxide-semiconductor* (CMOS), the logic network is synthesized using building blocks consisting of *metal-oxide-semiconductor field-effect transistors* (MOSFETs), the elemental unit of this technology. The process of turning a circuit specification into a logic gate representation is called *logic synthesis*.

Given these characteristics, none of the logic network representations is *canonical*, which means that a given function can be represented by different logic networks. This property follows from the fact that nodes with the identity function are allowed. Even excluding such identity nodes has no impact, since simple node combinations, like two consecutive negation nodes, collapse to the identity function. Following this argument, there exists an infinite number of logic networks representing one Boolean function, resulting in the widely accepted assumption that determining an optimal logic network is an $\mathcal{NP}$-complete problem. Attempts to create canonical logic networks seem to evade this problem but themselves include $co\mathcal{NP}$-complete problems. Algorithms used for logic synthesis are therefore based on approximate solutions.

#### 2.2 QCA Technology

Following the well-known Moore's law, CMOS technology is facing a multitude of challenges, e.g. short-channel effects, impurity variations, and most importantly heat resulting from static and dynamic power losses. To tackle these challenges, the International Roadmap for Devices and Systems (IRDS), formerly ITRS, proposes solutions within the semiconductor domain, e.g. new materials and multi-core architectures. New technologies are being researched as well, including quantum computing and the domain of *Field-Coupled Nanocomputing* (FCN). This work focuses on one of the most promising FCN technologies, namely *Quantum-dot Cellular Automata* (QCA). The main difference of this technology compared to CMOS is the representation of logic states, which uses the location of electron pairs in QCA-cells instead of voltage levels. Data is transferred between cells based on Coulomb repulsion, utilizing electromagnetic fields. This enables the technology to achieve high performance in terms of device density, clock frequency and power consumption.

Point out why majority gates are a big part of this technology. Point out why wires and especially wire crossings are so expensive to produce.



Figure 2.2: QCA-Cell states



Figure 2.3: Adjacent QCA-cells forming a wire segment

#### 2.2.1 Cells

As already mentioned, the elemental unit of this technology is the QCA-cell. Since there is no uniform way of building the quantum dots and connecting them to cells, we look at the rather low-level abstraction depicted in Figure 2.2. The four circles in the corners of the QCA-cells show quantum dots, which can be implemented by any charge container with discrete electrical energy states. Furthermore, a cell contains two excess electrons, which can be localized by the quantum dots. The energy barriers of the quantum dots are able to trap the charge of an electron. A quantum dot that has trapped an electron is filled black. Due to Coulomb repulsion, the electrons occupy diagonally opposed quantum dots, resulting in two possible stable cell configurations and one unstable cell configuration.

A stable state is one that is well distinguishable from the usual energy band and therefore has an energy difference to another stable state of at least the thermal noise energy ($k_BT$). Only such states are suited for information transfer. The stable states can be derived from the cell polarization, which can be +1 or -1, or null in the unexcited state. The two stable states contain the same electrostatic energy and are used to encode the binary values 0 and 1.

In order to transfer information, cells are placed side by side, whereby the polarization of the driver cell, which is the leftmost cell inputting the information, changes the polarization of the adjacent cell. Once the adjacent cell is polarized, it can pass its state on to the next cell, and so on. This simple structure represents a wire in QCA technology, and its function is depicted in Figure 2.3.

#### 2.2.2 Information Transfer

As already mentioned, data transfer in the QCA paradigm is accomplished by cell-to-cell interaction. The leftmost cell has a fixed polarization and is called the input, while the rightmost cell represents the output of the simplified QCA circuit. Looking at the wire segment in Figure 2.3, we can see that in the ground state all neighboring cells have the same polarization. This means that, for a given input, there exists exactly one configuration of cell polarizations resulting in the minimum energy of the wire. When the input of the system is changed, the system's energy rises, as depicted in Figure 2.5. This rise results from the so-called kink energy, which describes the energy difference between two QCA-cells with opposing polarization. In this state, the kink moves along the cells. While the kink energy stays the same, an increasing number of cells results in a rising degeneracy of the excited state. This in turn means that the system is in an excited state at non-zero temperature. After a time $\tau$, the system relaxes into the ground state again, with newly computed outputs.

The described process is called abrupt switching with dissipative coupling to the environment and can be summarized in the following three steps:

- 1. Write the input bit by fixing the polarization state of cells along the input edge.
- 2. Allow the array to relax to its ground state while the new inputs are kept fixed.
- 3. Non-invasively read the results of the computation by sensing the polarization state of cells.
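The three steps above can be mimicked with a deliberately simplified toy model. The Python sketch below assumes that the energy of a wire is proportional to the number of kinks (adjacent cells with opposing polarization) and finds the ground state by brute force; the actual relaxation involves dissipation mechanisms that this model ignores. All function names are hypothetical.

```python
from itertools import product


def kink_count(wire: tuple) -> int:
    """Number of adjacent cell pairs with opposing polarization."""
    return sum(1 for left, right in zip(wire, wire[1:]) if left != right)


def relax(input_polarization: int, n_free_cells: int) -> tuple:
    """Steps 1 and 2: fix the input cell, then pick the minimum-kink state."""
    candidates = product((-1, +1), repeat=n_free_cells)
    best = min(
        candidates,
        key=lambda free: kink_count((input_polarization,) + free),
    )
    return (input_polarization,) + best


def read_output(wire: tuple) -> int:
    """Step 3: sense the polarization of the rightmost cell."""
    return wire[-1]
```

In this toy model the only kink-free configuration is the one in which every cell copies the input, which matches the observation that the ground state of a wire is unique for a given input.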

In the second step, highly complex dissipation mechanisms like phonon and plasmon emission take place, making it nearly impossible to obtain a complete theoretical description of the system. The dissipation time $\tau$ is therefore determined experimentally. The first and third steps require an environment that provides the features for fixing inputs and sensing the output polarization. Such a system has to be integrated with classical CMOS devices, as seen in Figure 2.4.

In order to implement a full QCA system without CMOS components, the QCA array has to be divided into smaller decoupled sub-regions. For this purpose, adiabatic switching, or rather clocking, is introduced. The clocked regions are referred to as clocking zones. The clocking utilizes an external signal, the clock, to activate and deactivate these clocking zones. The approach used first decreases the interdot barriers of all cells in a clocking zone when a new input is applied. When all cells in the region are stable, the barriers are raised again, while the barriers of the subsequent clocking zone are lowered simultaneously. This way the ground state gradually propagates



Figure 2.4: Schematic of a combined QCA and CMOS system.



Figure 2.5: Schematic representation of a metastable state. Instead of relaxing correctly to the new ground state, a system may be delayed in an excited state due to an inability to tunnel through a kinetic barrier.



Figure 2.6: QCA-Cell wire with corresponding clock zones

through the whole circuit. Current approaches create electric fields with an external clock generator and distribute them to the cells through the device substrate using embedded electrodes. Thereby the energy level of the *null* state can be controlled, resulting in an effect equivalent to that of the former approach.

A wire divided into such clocking zones is shown in Figure 2.6. The colors of the zones and cells as well as the zone numbers redundantly indicate the type of clocking zone. The zones differ in the externally applied electric field and therefore in the energy of the cells. In QCA, the clocking is divided into four consecutive phases, *switch*, *hold*, *release* and *relax*. They are arranged in a pipeline-like structure, where each of these phases is shifted by $\pi/2$, forming a $2\pi$ clock cycle. In the switch phase, cells start getting polarized depending on the polarization of the driving cell. Once the cells are polarized, they are fixed in the hold phase. Afterwards, in the release phase, the excitation gradually decreases, resulting in the unexcited relax state. The scheme of such pipeline-like clocking is depicted for a wire segment in Figure 2.7.
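The pipeline-like phase shift can be written down compactly. The following Python sketch assumes that time is counted in quarter cycles (one step corresponds to a phase shift of $\pi/2$), that zone 1 is in the switch phase at step 0, and that every subsequent zone lags its predecessor by one step; the names `PHASES` and `phase` are hypothetical.

```python
PHASES = ("switch", "hold", "release", "relax")


def phase(zone: int, step: int) -> str:
    """Phase of clock zone `zone` (1..4) at quarter-cycle `step` (0, 1, 2, ...)."""
    if zone not in (1, 2, 3, 4):
        raise ValueError("zone must be 1, 2, 3 or 4")
    # Each zone lags the previous one by a quarter cycle (pi/2).
    return PHASES[(step - (zone - 1)) % 4]


def predecessor(zone: int) -> int:
    """Zone that feeds `zone` in the pipeline (zone 4 feeds zone 1)."""
    return (zone - 2) % 4 + 1
```

Under these assumptions, whenever a zone is in the switch phase its predecessor is in the hold phase, so the driving cells are fixed while the next zone gets polarized.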

The described clocking is named Landauer clocking. The inventor Rolf Landauer himself pointed out the vast power dissipation of this clocking mechanism. This detrimental property is already well known from CMOS technology, where it results in several drawbacks, in extreme cases including switched-off chip areas referred to as dark silicon. One common countermeasure is lowering the circuit frequency, utilizing e.g. multi-chip architectures. To tackle this problem in the QCA domain, Landauer pointed out that the *erase* function has to be eliminated from the clocking. This function is logically irreversible and describes the erasing of information in a clock zone once it has been passed to the next. He argued that every erased bit dissipates at least $k_B T \ln(2)$ as heat. As an example, if a QCA-cell has a size of $1\,\text{nm} \times 1\,\text{nm}$ and an operating frequency of 100 GHz, the corresponding device density is $10^{14}\,\text{cm}^{-2}$. Assuming further a dissipation of 0.1 eV per electron in every clock cycle, the total power dissipation amounts to 160 kW $\text{cm}^{-2}$. It follows directly that a device operating with this clocking would be inoperable (it would evaporate due to the heat). Bennett clocking tackles exactly this problem by altering the timing of the clocking signals. Just as in Landauer clocking, the clocking wave moves from left to right, but leaving no



Figure 2.7: QCA clocking pipeline

trailing edge when information is passed. Instead, the cells are held in the excited state until the information has propagated through the whole QCA array. Once the output has been read, the excitation is released in reverse order, resulting in no erase functions. This means that this *quasi-adiabatic* clocking leads to minimal power dissipation, but with two constraints: the effective clock rate is at least halved due to the additional backwards propagation, and, since only one signal vector can be transmitted through the system at a time, the pipeline capabilities are reduced.
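The power estimate for Landauer clocking quoted above can be reproduced with a few lines of arithmetic. The following Python sketch uses the cell size, dissipation per clock cycle and frequency from the text and counts one dissipating electron per cell, as the quoted figure implies. The temperature of 300 K used for the Landauer limit is an assumption added here, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant in J/K
EV = 1.602176634e-19  # one electron volt in J


def landauer_limit_joule(temperature_k: float) -> float:
    """Minimum heat per erased bit, k_B * T * ln(2)."""
    return K_B * temperature_k * math.log(2)


def power_density_w_per_cm2(cell_edge_nm: float, energy_ev: float,
                            frequency_hz: float) -> float:
    """Dissipated power per square centimeter for densely packed square cells."""
    cell_edge_cm = cell_edge_nm * 1e-7
    devices_per_cm2 = 1.0 / cell_edge_cm ** 2
    return devices_per_cm2 * energy_ev * EV * frequency_hz


# 1 nm x 1 nm cells, 0.1 eV per clock cycle, 100 GHz
estimate = power_density_w_per_cm2(1.0, 0.1, 100e9)
limit_300k = landauer_limit_joule(300.0)  # 300 K is an assumed temperature
```

With these inputs the device density is $10^{14}\,\text{cm}^{-2}$ and the estimate comes out at roughly $1.6 \times 10^{5}$ W per square centimeter, in agreement with the 160 kW $\text{cm}^{-2}$ stated in the text.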

As already stated, the QCA array has to be divided into clock zones. Allowing an arbitrary number of cells in one clock zone gives a lot of freedom for the clocking and is called *cell-based* clocking. This freedom leads to clock zones with variable geometries, which in turn lead to variation in their fabrication. Assuming that uniform fabrication is necessary to fabricate circuits with millions of cells, this clocking becomes infeasible for large circuits. The scheme also supports clocking of single cells, which means that electrodes of the same size would have to be fabricated. Since this is not feasible either, this design is obsolete.

In order to attain uniform clocking zones that allow the distribution of clocking signals, *tile-based* clocking is introduced. The approach of this design is to provide uniform square tiles of size $3 \times 3$ or $5 \times 5$. For clocking tiles bigger than this, information propagation was suggested to be erroneous, which is a further argument



Figure 2.8: Different clocking Schemes in QCA

against cell-based clocking.

Tile-based clocking has led to several proposals of clocking schemes, each of which prescribes a certain distribution of clock zones. Since they follow a uniform pattern, they can easily be extended to any circuit size. Figure 2.8 shows three clocking schemes, each of them based on a different idea. Since information is only allowed to flow in ascending clock order (except from 4 to 1), the 2DDWave clocking scheme in Figure 2.8(a) only allows information to propagate in two directions, east and south. This simplicity allows no back propagation, prohibiting the placement of sequential circuits. It also restricts gates in the scheme to a maximum of two inputs. The USE scheme in Figure 2.8(b) tackles the first problem by introducing clocking loops into the scheme, making it possible to place sequential circuits. To tackle the second problem, the RES scheme in Figure 2.8(c) offers the opportunity to place gates with three inputs. Since one tile has at most four adjacent tiles, of which one has to carry the output of the tile, this is the maximum number of inputs allowed. This is especially important for the placement of majority gates. In QCA technology, they can be represented by only one tile, which is a huge advantage over CMOS technology. This is further discussed in the next subsection on gates.
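How a clocking scheme restricts the flow of information can be illustrated for 2DDWave. The Python sketch below assumes that the scheme assigns clock numbers along diagonals, $c(x, y) = ((x + y) \bmod 4) + 1$, with x growing eastward and y growing southward; this formula and orientation are assumptions made here and should be checked against Figure 2.8(a). Under them, the rule of ascending clock order leaves exactly two outgoing directions per tile.

```python
def twoddwave_clock(x: int, y: int) -> int:
    """Clock number (1..4) of tile (x, y) under the assumed diagonal pattern."""
    return (x + y) % 4 + 1


def permitted_directions(x: int, y: int) -> list:
    """Directions in which tile (x, y) may pass information to a neighbor."""
    steps = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}
    here = twoddwave_clock(x, y)
    result = []
    for name, (dx, dy) in steps.items():
        nx, ny = x + dx, y + dy
        if nx < 0 or ny < 0:
            continue  # neighbor lies outside the grid
        # Ascending clock order, with the wrap-around from 4 to 1.
        if (twoddwave_clock(nx, ny) - here) % 4 == 1:
            result.append(name)
    return result
```

For the tile (1, 1) the sketch yields the pair east and south under the assumed orientation; a mirrored drawing of the scheme would name a different pair, but never two opposite directions.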

#### 2.2.3 Gates

In this subsection, a library of gates is introduced, which is later used in the proposed placement and routing algorithms. In [4], a custom QCA library called QCA ONE is proposed. It contains gates formed by both single and multiple tiles. A major drawback of this library is that a clocking scheme is a prerequisite for forming multi-tile gates. This restricts the underlying placement and routing algorithm to this exact clocking



Figure 2.9: Cell representation of tile



Figure 2.10: Different QCA Inverter representations

scheme. Manual changes to the clock zones, size or positioning of the standard cells are not allowed either, imposing even more restrictions on the designer. The standard cell library used for this work should therefore only contain gates occupying one tile. Every other gate is composed of these standard cells. The tiles used have the dimension $5 \times 5$, which means that all standard gates are confined to this area.

The first gate in the library is the inverter or NOT gate. The simplest implementation is shown in Figure 2.10(a). It consists of two wire segments which are shifted by exactly one cell height, so that the polarization is transferred diagonally, resulting in an inversion of the input [5]. In order to obtain a gate that is more robust against disturbances, the C-shaped inverter shown in Figure 2.10(b) is introduced. This gate is used as standard in many libraries and works CITE, but it has to be mentioned that this implementation is quite prone to common displacement faults [6], which suggests adding an inverter leg, resulting in an E-shaped gate structure, and to more complex single-electron faults [3]. Nevertheless, the C-shaped inverter gate is used as standard in many works and is also selected as standard cell for this work.

The most important gate in QCA technology is the majority gate. To implement this function in CMOS technology according to Definition 2.1.8, multiple AND and OR gates have to be used. In QCA technology, the majority function can be represented by exactly one gate, which is one of the major advantages over CMOS technology. There are two main implementations of the majority gate: the rotated majority gate in Figure 2.11(a) and the +-majority gate in Figure 2.11(b). Both of these implementations have their advantages



Figure 2.11: The QCA Majority gate

and drawbacks. On the one hand, the rotated majority gate exhibits a sufficiently high degree of fault tolerance against cell displacement or misalignment but a very poor degree of fault tolerance against single cell omission or extra cell deposition [2]. The +-majority gate, on the other hand, is very prone to cell displacement but is also used as building block for AND and OR gates in most works. This means that the fabrication process for all these gates is very similar, and since this work aims to facilitate the production of QCA circuits, the +-majority gate is chosen as standard gate for this work.

Following Definition 2.1.8, the AND gate can be derived by fixing one input of the majority gate to logic 0, while the OR gate is obtained by fixing one input to logic 1. The resulting gates, which are also part of the standard gate library of this work, are shown in Figure 2.11(c).

Another major topic regarding gates is wires. Until now, only straight planar wires have been introduced. In order to distribute information on the 2D grid provided by the tile-based layout, bent wires have to be introduced as well. They are depicted in Figure !! and show a bend of 90 degrees. Given that all tiles can be rotated by 90, 180 and 270 degrees, an incoming tile can be connected to each outgoing tile of a bent wire. Also, since all gates introduced so far have a fan-out of one, we need a fan-out node to



Figure 2.12: Different QCA wire crossing implementations

multiply signals. This is done by adding a bent wire to a straight wire, resulting in the fan-out shown in Figure !!.

The last special case of wires is crossings. When the cells of one wire string are rotated by 45 degrees, the rotated cells have no crosstalk with non-rotated cells [6], as shown in Figure 2.12(a). This solution is convenient because it preserves the planar structure of the circuit and is therefore called *coplanar crossover*. Furthermore, the possibility of multi-layer QCA has been investigated and found especially useful in the case of wire crossings. Here, one wire string is raised to an additional higher layer, which is connected with a vertical interconnect as in Figure 2.12(b). The signal transmission in the vertically stacked cells works just as in the horizontal direction. To impede any crosstalk between the wire strings, two intermediate layers of cells are used in the vertical direction. Theoretically, the added layer cannot only be used for wires: since the signal distribution works just as in the ground layer, gates can also be placed in these additional layers. Simulations have shown that coplanar crossovers reduce the coupling between the horizontal string segments significantly. This makes the horizontal interconnect very sensitive to crosstalk and therefore highly prone to cell displacements. Multi-layer circuits, on the other hand, show a high robustness and are therefore used as standard in this library as well as in the state of the art researched for this work CITE. In the library, the raised wire string is denoted by an X, while the vertical layers are denoted by a circle. Figure 2.13 summarizes all gates used in this work.

#### Latches and Registers

In order to implement sequential circuits in QCA technology, we have to look at storage elements, whose properties can be derived from CMOS technology. The simplest element that can store one bit is an Eccles–Jordan flip-flop (FF). It is formed by connecting



Figure 2.13: QCA Standard Library

the output of one inverter to the input of another inverter and vice versa. Because this element is so simple, it also has some drawbacks. First of all, the latch output is directly connected to the input, causing noisy behavior, called transparency [1]. Furthermore, a high voltage shift is needed at the FF value, and the FF is transparent in the transition region, requiring a stable voltage level during the transition. To tackle these problems, the inverters are replaced by NOR gates, so that the inputs and outputs are coupled through these gates, reducing transparency. This change also introduces clearly defined states. In the set state the latch is set to Q = 1, in the reset state to Q = 0. It also has a hold state, in which the current value is held, and an undefined state. These states give the latch the name Set-Reset latch (SR-latch). In order to use latches in synchronous circuits, the set and reset inputs are clocked. This results in the gated SR-latch. Because this latch still has an undefined state, which is not allowed, the set and inverted reset inputs are connected together, excluding the undefined state. This yields the D-latch: the value is held when the clock clk = 0, and the latch is transparent when clk = 1. Since transparency is still present, a flip-flop is formed by connecting two D-latches to a master-slave D flip-flop, which produces only stable outputs.

The goal of a storage element in QCA is to have all the properties of a D-FF. In the QCA ONE library, an effort was made to translate the D-FF into QCA by simply replacing the CMOS gates and wires with the corresponding QCA gates. However, clocking plays an important role in the functionality of storage elements, and, as observed in subsection 2.2.2, clocking in QCA differs substantially from clocking in CMOS, so the usefulness of this direct translation is questionable.

Creating a QCA element that is able to store one bit is rather simple. Since every tile is clocked on its own, a simple latch can be formed by a wire segment that is kept in the hold phase of the clocking. In Figure !! such a wire segment is depicted. It is still adjacent to one tile with the ascending and one tile with the descending clock number (mod $N_{clk}$). The storage property is achieved by extending the hold phase by the required number of clock cycles [7]. Because this implementation needs a separate clock generator for every latch, it is unclear whether it can be realized. However, it can be shown that the same behavior is achieved by placing buffering wire segments instead of the wire-latch. For every clock cycle by which the hold phase of the wire-latch would be extended, four adjacent wire segments are added, delaying the arrival of the information by exactly the same time.
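The equivalence between an extended hold phase and additional buffering wire segments can be written down as a short calculation. The sketch below assumes four clock phases, as stated for QCA. The function names `buffer_segments` and `buffer_clocking` are hypothetical and only serve as illustration.

```python
from __future__ import annotations

N_CLK = 4  # number of clock phases in QCA, as stated in the text


def buffer_segments(hold_cycles: int, n_clk: int = N_CLK) -> int:
    """Number of wire segments that delay a signal by `hold_cycles` full clock cycles."""
    if hold_cycles < 0:
        raise ValueError("hold_cycles must be non-negative")
    return hold_cycles * n_clk


def buffer_clocking(start_clock: int, hold_cycles: int, n_clk: int = N_CLK) -> list[int]:
    """Clock numbers of the added segments; each one follows its predecessor by 1 (mod n_clk)."""
    count = buffer_segments(hold_cycles, n_clk)
    return [(start_clock + 1 + i) % n_clk for i in range(count)]


if __name__ == "__main__":
    print(buffer_segments(2))     # segments replacing a hold phase extended by two cycles
    print(buffer_clocking(0, 1))  # clock numbers of the buffer wire after a tile with clock 0
```

After the added segments the signal arrives at a tile with the same clock number it would have reached without the delay, so the surrounding clocking is unaffected.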

#### 2.3 P&R Problem

As seen in the previous sections of this chapter, placement and routing are strongly related to the clocking in the QCA domain. It therefore makes sense to define a joint problem, which arises from combining a grid that enables tile-based design with a logic network.

**Definition 2.3.1.** A *layout* is defined by a $w \times h$ grid $\Gamma_{w,h}$ and a graph $G(V,E)$, which is placed on the grid. Every *tile* of the layout can be accessed via its x and y coordinates. The set of tiles is denoted as $T$ with $t = (x,y) \in T$. Every vertex $v = (x,y)$ of the graph is restricted to the grid boundaries $0 \le x < w$ and $0 \le y < h$. For every edge $\{(x,y),(x^*,y^*)\}$ it holds that $|x-x^*| + |y-y^*| = 1$ with $0 \le x, x^* < w$ and $0 \le y, y^* < h$.
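The two conditions of this definition can be checked mechanically. The following minimal sketch assumes zero-based coordinates with $x < w$ and $y < h$, as stated for vertices. The names `Tile`, `in_bounds` and `is_valid_edge` are hypothetical helpers and not part of any existing library.

```python
from typing import Tuple

Tile = Tuple[int, int]  # (x, y) coordinate of a tile


def in_bounds(t: Tile, w: int, h: int) -> bool:
    """A tile lies on the w x h grid if 0 <= x < w and 0 <= y < h."""
    x, y = t
    return 0 <= x < w and 0 <= y < h


def is_valid_edge(a: Tile, b: Tile, w: int, h: int) -> bool:
    """An edge is valid if both endpoints are on the grid and their Manhattan distance is 1."""
    manhattan = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return manhattan == 1 and in_bounds(a, w, h) and in_bounds(b, w, h)


if __name__ == "__main__":
    print(is_valid_edge((0, 0), (1, 0), w=3, h=3))  # horizontal neighbors
    print(is_valid_edge((0, 0), (1, 1), w=3, h=3))  # diagonal tiles are not connected
    print(is_valid_edge((2, 2), (3, 2), w=3, h=3))  # second tile lies outside the grid
```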

**Definition 2.3.2.** A *gate-level layout* describes a layout grid in combination with a logic network $N=(\Lambda,I,\Sigma,O)$. Besides the already known mapping *placement p*, which assigns nodes to tiles, there are two additional mappings: the *routing r*, which assigns logic network signals to layout paths (connected tiles), and the *clocking c*, which assigns clock numbers to tiles. The gate-level layout is therefore described as $L=(\Gamma,N,p,r,c)$. Further, nodes placed on the gate-level layout are referred to as *gates*. Two tiles $t_i=(x_i,y_i)$ and $t_j=(x_j,y_j)$ with $|x_i-x_j|+|y_i-y_j|=1$ are called *adjacent*. A path that is routed through adjacent tiles is called a *wire*. In this context, one tile corresponds to a *wire segment*. If neither a gate nor a wire segment is placed on a tile, it is *empty*. It follows that a layout with only empty tiles is also empty. A layout is said to be *S-clocked* if it follows a clocking scheme S. Otherwise it is *irregularly clocked*. Moreover, a tile adjacent to a tile $t \in T$, where $T$ is the set of all tiles, is *incoming*, denoted $t^-$, if $c(t)-c(t^-) \equiv 1 \pmod{clk}$. This means that the incoming tile is able to forward information to the considered tile according to pipelined clocking. For *outgoing* tiles $t^+$ it holds accordingly that $c(t^+)-c(t) \equiv 1 \pmod{clk}$. For QCA, it was already stated that the number of clock phases is $clk=4$.
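The notions of adjacent, incoming and outgoing tiles translate directly into modular arithmetic. The sketch below is an illustration under assumptions: the clocking is given as a plain dictionary from tiles to clock numbers, and the example uses a diagonal assignment $c(x,y) = (x+y) \bmod 4$, chosen here only because it is easy to verify by hand. All function names are hypothetical.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]
CLK = 4  # number of clock phases for QCA


def are_adjacent(a: Tile, b: Tile) -> bool:
    """Two tiles are adjacent if their Manhattan distance is exactly 1."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def is_incoming(t: Tile, n: Tile, clocking: Dict[Tile, int], clk: int = CLK) -> bool:
    """n is an incoming tile of t if it is adjacent and c(t) - c(n) = 1 (mod clk)."""
    return are_adjacent(t, n) and (clocking[t] - clocking[n]) % clk == 1


def is_outgoing(t: Tile, n: Tile, clocking: Dict[Tile, int], clk: int = CLK) -> bool:
    """n is an outgoing tile of t if it is adjacent and c(n) - c(t) = 1 (mod clk)."""
    return are_adjacent(t, n) and (clocking[n] - clocking[t]) % clk == 1


if __name__ == "__main__":
    # Assumed example clocking: clock numbers increase along both axes.
    clocking = {(x, y): (x + y) % CLK for x in range(4) for y in range(4)}
    print(is_incoming((1, 0), (0, 0), clocking))  # clock 0 feeds clock 1
    print(is_outgoing((3, 0), (3, 1), clocking))  # clock 3 wraps around to clock 0
```

Python's `%` operator returns a non-negative result for a positive modulus, so the wrap-around from the last to the first clock phase needs no special case.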

From this definition we can outline the difficulty of placing and routing a logic network onto a grid that is two-dimensional except for wire crossings, which are costly and should therefore be minimized. One major challenge for P&R algorithms is signal synchronization, which results from the strong dependency between clocking and signal distribution. As already pointed out, for every signal path it has to hold that information can only propagate from a tile with clock number $i$ to an outgoing tile with clock number $(i+1) \bmod clk$. This property is called the *local synchronization constraint*. The existence of possible signal paths can be ensured by using predefined clocking schemes, although these schemes impose constraints of their own. Further, the *global synchronization constraint* states that any two signal paths leading to the same tile need to pass through the same number of tiles, starting at their respective primary inputs. Since this constraint has to hold for every gate, the complexity increases rapidly with growing network size. The combination of all these challenges forms a P&R problem that is commonly believed to be $\mathcal{NP}$-hard.
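Both synchronization constraints can be expressed as simple checks on routed paths. The following sketch assumes that a path is given as a list of tiles from a primary input to the target tile and that the clocking is a dictionary from tiles to clock numbers. The function names and the diagonal example clocking are assumptions made for illustration only.

```python
from typing import Dict, Sequence, Tuple

Tile = Tuple[int, int]


def satisfies_local_sync(path: Sequence[Tile], clocking: Dict[Tile, int], clk: int = 4) -> bool:
    """Every step must go to an adjacent tile whose clock number is one higher (mod clk)."""
    for a, b in zip(path, path[1:]):
        adjacent = abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
        if not adjacent or clocking[b] != (clocking[a] + 1) % clk:
            return False
    return True


def satisfies_global_sync(paths: Sequence[Sequence[Tile]]) -> bool:
    """All (non-empty) paths ending at the same tile must pass the same number of tiles."""
    if not paths:
        return True
    if len({p[-1] for p in paths}) != 1:
        raise ValueError("all paths must end at the same tile")
    return len({len(p) for p in paths}) == 1


if __name__ == "__main__":
    # Assumed example clocking: c(x, y) = (x + y) mod 4.
    clocking = {(x, y): (x + y) % 4 for x in range(4) for y in range(4)}
    long_path = [(0, 0), (1, 0), (1, 1)]
    short_path = [(0, 1), (1, 1)]
    print(satisfies_local_sync(long_path, clocking), satisfies_local_sync(short_path, clocking))
    print(satisfies_global_sync([long_path, short_path]))  # lengths differ at tile (1, 1)
```

In the example both paths are locally synchronized, yet they violate the global constraint because they reach the common tile after a different number of tiles.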

Once a gate-level layout has been obtained, a technology still has to be mapped onto it. In this work, the standard library proposed in subsection 2.2.3 is used for the mapping. The definitions in this work are nevertheless kept quite generic, because they are based on the book [7], which defines the P&R problem for the domain of field-coupled nanotechnologies (FCN). This means that, for example, changing the number of clock phases to clk = 3 also allows placement and routing for *Nanomagnet Logic* (NML). Even though the algorithm in this work is designed only for QCA, its ideas may also be transferred to other FCN technologies.

# 3 State of the Art

#### 3.1 Combinational P&R Algorithms

Quick recap of the algorithms from the book. Quick recap of ortho and why I selected ortho for my work. Maybe point out that balancing is not used.

#### 3.2 Sequential P&R

Present the ideas from papers on QCA standard cell placement and routing. Point out why they are not actionable: reasons such as the clocking, cells that are not producible, or missing automated algorithms.

# 4 Methodology

#### 4.1 Input Network

Reduces area and wire crossings.

Sorts the inputs.

The idea is simple: place fanout nodes at the beginning, since they produce the most crossings.

Needs a conditional east west coloring.

#### 4.2 Majority Gates Placement Network

Reference to the importance of this part in the QCA technology.

Point out that 2DD-Wave is still used but clocking scheme is adjusted to place majority gates.

This is why buffers have to be used.

Point out the overhead that is produced by buffers.

Maybe state an assumption as to why it doesn't make sense to use only RES (a lot of space is lost for the cells which are not majority gates).

#### 4.3 Sequential Circuits Placement

Importance of sequential circuits.

Point out how the registers are implemented.

Show how the distribution network is generated, where Ris and Ros are placed and how they are treated within the network.

Make clear that this implementation is slowing down the circuit significantly.

# 5 Experimental Evaluation

#### 5.1 Benchmarks

For combinational and sequential

#### 5.2 Results

Everything

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 2.1 | Binary Logic Network of Majority Function |
| 2.2 | QCA-Cell states |
| 2.3 | Adjacent QCA-cells forming a wire segment |
| 2.4 | Schematic of a combined QCA and CMOS system |
| 2.5 | Schematic representation of a metastable state. Instead of relaxing correctly to the new ground state, a system may be delayed in an excited state due to an inability to tunnel through a kinetic barrier |
| 2.6 | QCA-Cell wire with corresponding clock zones |
| 2.7 | QCA clocking pipeline |
| 2.8 | Different clocking Schemes in QCA |
| 2.9 | Cell representation of tile |
| 2.10 | Different QCA Inverter representations |
| 2.11 | The QCA Majority gate |
| 2.12 | Different QCA wire crossing implementations |
| 2.13 | QCA Standard Library |

# **List of Tables**

## **Bibliography**

- [1] C. Hawkins, J. Segura, and P. Zarkesh-Ha. *CMOS Digital Integrated Circuits: A First Course*. Materials, Circuits and Devices. Institution of Engineering and Technology, 2012. ISBN: 9781613530023.
- [2] D. Kumar and D. Mitra. "A systematic approach towards fault-tolerant design of QCA circuits." In: *Analog Integrated Circuits and Signal Processing* 98 (Mar. 2019), pp. 1–15. DOI: 10.1007/s10470-018-1270-x.
- [3] M. Mahdavi, M. A. Amiri, S. Mirzakuchaki, and M. N. Moghaddasi. "Single Electron Fault in QCA Inverter Gate." In: 2009 Fifth International Conference on MEMS NANO, and Smart Systems. 2009, pp. 63–66. DOI: 10.1109/ICMENS.2009.23.
- [4] D. A. Reis, C. A. T. Campos, T. R. B. S. Soares, O. P. V. Neto, and F. S. Torres. "A Methodology for Standard Cell Design for QCA." In: 2016 IEEE International Symposium on Circuits and Systems (ISCAS). 2016, pp. 2114–2117. DOI: 10.1109/ISCAS.2016.7538997.
- [5] T. N. Sasamal, A. K. Singh, and A. Mohan. "Quantum-Dot Cellular Automata Based Digital Logic Circuits: A Design Perspective." In: *Studies in Computational Intelligence*. 2020.
- [6] G. Schulhof, K. Walus, and G. A. Jullien. "Simulation of Random Cell Displacements in QCA." In: J. Emerg. Technol. Comput. Syst. 3.1 (2007), 2–es. ISSN: 1550-4832. DOI: 10.1145/1229175.1229177.
- [7] M. Walter and R. Drechsler. "Design Automation for Field-Coupled Nanotechnologies." In: July 2020, pp. 176–181. DOI: 10.1109/ISVLSI49217.2020.00040.