# Dynamic quantum circuit compilation

Kun Fang<sup>1</sup>, Munan Zhang<sup>1</sup>, Ruqi Shi<sup>1</sup>, and Yinan Li<sup>2</sup>

Quantum computing has shown tremendous promise in addressing complex computational problems, yet its practical realization is hindered by the limited availability of qubits for computation. Recent advancements in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and significantly reducing the qubit requirements for executing quantum algorithms. In this work, we present a systematic study of dynamic quantum circuit compilation, a process that transforms static quantum circuits into their dynamic equivalents with a reduced qubit count through qubit-reuse. We establish the first general framework for optimizing the dynamic circuit compilation via graph manipulation. In particular, we completely characterize the optimal quantum circuit compilation using binary integer programming, provide efficient algorithms for determining whether a given quantum circuit can be reduced to a smaller circuit and present heuristic algorithms for devising dynamic compilation schemes in general. Furthermore, we conduct a thorough analysis of quantum circuits with practical relevance, offering optimal compilations for well-known quantum algorithms in quantum computation, ansatz circuits utilized in quantum machine learning, and measurement-based quantum computation crucial for quantum networking. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. Our framework lays a rigorous foundation for comprehending dynamic quantum circuit compilation via qubit-reuse, bridging the gap between theoretical quantum algorithms and their physical implementation on quantum computers with limited resources.

### Contents

| 1 | Introduction                                                                                                                                                                                                                        | : |  |  |  |
|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|--|--|--|
| 2 | Preliminaries                                                                                                                                                                                                                       |   |  |  |  |
| 3 | Quantum circuit and its representations         3.1 Quantum computing basics          3.2 Quantum circuit instructions          3.3 Graph representation of quantum circuit          3.4 Quantum circuit composition and subcircuit |   |  |  |  |
| 4 | Quantum circuit compilation via graph manipulation                                                                                                                                                                                  | 8 |  |  |  |
| 5 | Determine the reducibility of quantum circuit 5.1 Approach 1: determine the reducibility from graph                                                                                                                                 | ć |  |  |  |

Kun Fang: fangkun02@baidu.com

 $<sup>^{</sup>m 1}$ Institute for Quantum Computing, Baidu Research, Beijing 100193, China

<sup>&</sup>lt;sup>2</sup>School of Mathematics and Statistics and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, China

| 6            | Optimal quantum circuit compilation                                                                                                                                                                                                                         | 11                                                       |
|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| 7            | Heuristic algorithms7.1 Algorithm 1: Minimum Remaining Values Heuristic7.2 Algorithm 2: Greedy Heuristic Algorithm7.3 Algorithm 3: Hybrid Algorithm                                                                                                         | 12<br>13<br>13<br>15                                     |
| 8            | Analytical and numerical evaluations  8.1 Quantum supremacy circuits                                                                                                                                                                                        | 16<br>16<br>17                                           |
| 9            | Related Work                                                                                                                                                                                                                                                | 20                                                       |
| <b>10</b>    | Conclusion and Future work                                                                                                                                                                                                                                  | 20                                                       |
| A            | Further Algorithms A.1 Converting static quantum circuit to DAG                                                                                                                                                                                             | 25<br>25<br>26<br>27                                     |
| В            | Proofs of results         B.1 Thm. 1          B.2 Prop. 2          B.3 Prop. 3          B.4 Prop. 5          B.5 Prop. 6          B.6 Prop. 7          B.7 Prop. 8          B.8 Prop. 13          B.9 Prop. 9          B.10 Prop. 10          B.11 Prop. 11 | 29<br>29<br>29<br>30<br>30<br>30<br>30<br>31<br>32<br>33 |
| $\mathbf{C}$ | Optimal compilations for important quantum circuits C.1 Frequently used quantum algorithms                                                                                                                                                                  | 3;<br>3;<br>40<br>44                                     |
| D            | Implementation of DCKF algorithm                                                                                                                                                                                                                            | 46                                                       |
| Е            | Experimental Results Continued  E.1 Experimental evaluation of hybrid algorithm  E.2 Experimental evaluation of random circuits  E.3 Noisy simulation                                                                                                       | 40<br>40<br>40<br>47                                     |
| $\mathbf{F}$ | Open Problems         F.1 Compiling dynamic quantum circuit                                                                                                                                                                                                 | 49<br>49<br>5                                            |

## 1 Introduction

Quantum computing has emerged as a promising avenue for solving intractable problems that are beyond the reach of classical computing, such as integer factoring [Sho94], large database search [Gro96], chemistry simulation [LWG+10] and machine learning [BWP+17]. Nevertheless, existing quantum computers are limited by the number of qubits available for computation. To conclusively demonstrate the computational advantage of these devices compared to their classical counterparts in a variety of practical applications, we need to make efficient use of qubits when designing and executing quantum algorithms.

A plethora of quantum algorithms have been traditionally formulated as *static quantum circuits*, where the computation is performed on an initially prepared quantum state, and all measurements are applied at the end of the circuit to extract the computational results. However, the recent advancements in quantum hardware have paved the way for a more flexible approach, allowing for measurements and qubit resets to be executed in the midst of a quantum circuit. Moreover, these dynamic circuits permit the real-time evolution of the quantum circuit based on prior measurement outcomes [CTI+21, PDF+21, AI23]. This new paradigm of quantum computation, characterized by the ability to dynamically adjust the circuit, is referred to as a *dynamic quantum circuit*, and plays a crucial role in the study of quantum error correction [FMMC12], quantum communication [BBC+93], and also measurement-based quantum computation [RB01].



- (a) A static quantum circuit with 3 qubits.
- (b) A dynamic quantum circuit with 2 qubits.

Figure 1: An example of dynamic quantum circuit compilation from (a) to (b).

This work investigates dynamic quantum circuit compilation, which rewires static quantum circuits into equivalent dynamic circuits using fewer qubits through qubit-reuse strategies. An explicit example is illustrated in Figure 1, where a 3-qubit static quantum circuit is compiled into a 2-qubit dynamic circuit by recycling the first qubit following its measurement instruction and subsequently reusing it for operations on the third qubit of the original static circuit. Dynamic quantum circuit compilation via qubit-reuse offers substantial advantages and addresses several critical facets of fault-tolerant quantum computation. Firstly, it efficiently reduces the number of qubits required for executing quantum algorithms, particularly valuable for implementing large-scale quantum circuits on near-term quantum computers and enhancing their classical simulation feasibility. Moreover, it effectively compacts the quantum circuit topology. This simplification minimizes the need for inserting swap gates and circumvents the allocation of faulty qubits when mapping logical circuits to specific quantum hardware devices, ultimately resulting in error reduction and fidelity improvement [HJC<sup>+</sup>23, BPK23]. Furthermore, determining the minimum qubit requirement for executing specific quantum algorithms provides profound insights into the algorithms' complexity, aiding in the design of practical algorithms with quantum advantages.

The idea of compiling a quantum circuit via qubit-reuse was first introduced in [PWD16], which employed the notion of wire recycling applied to predefined ancilla qubits. In [HJC<sup>+</sup>23], the authors developed a compiler-assisted tool and exploited the trade-off between qubit reuse, fidelity, gate count, and circuit duration. In [BPK23], the authors introduced an SAT-based model for qubit-reuse optimization on near-term quantum devices. Previous studies have predominantly concentrated on superconducting quantum computers, known for their limited qubit connectivity and relatively short coherence time. But the situation is completely different for other quantum architectures, such as trapped-ion quantum computers. In these systems, qubits feature all-to-all connectivity and extremely long coherence time (on the order of tens of minutes [WLQ<sup>+</sup>21]) but the scalability forms a primary bottleneck in their development. This motivates our emphasis on minimizing the number of qubits needed to execute a quantum circuit.

The work in [DCKFF22] studied the qubit-reuse compilation using the causal structure of the circuits and proposed a constraint programming optimization model capable of delivering exact solutions for small circuits, along with greedy algorithms designed to generate approximate solutions for large circuits. In this work, we conducted a thorough comparative analysis against their algorithms, demonstrating the superior performance of our methods in both structured and random quantum circuits. Notably, our framework successfully addresses an open challenge emphasized in their work, namely, the effective handling of quantum circuits with commutable structures and the ability to conduct compilation at the level of quantum algorithms, regardless of their specific quantum instruction sequences. This holds particular significance for practical applications in quantum machine learning [FGG14], measurement-based quantum computation [BBD+09], and the pursuit of quantum supremacy [SB09].

Quantum computers are expected to operate in a noisy environment with a limited number of qubits and without error correction in the near term. Therefore, circuit optimization is a crucial step to ensure the successful execution of quantum algorithms. This optimization can manifest in various forms, typically entailing modifications to the gate structure of a quantum circuit with the goal of enhancing result fidelity, such as reducing gate count by simplifying Clifford subcircuits [AG04, BSHM21], eliminating redundant gates [XLP+22] [con23, XMP+23], and applying the Cartan decomposition to two-qubit gates [KG01]. Furthermore, circuit optimization may encompass the mapping of circuits to a specific quantum device architecture, aiming to minimize the number of inserted swap gates [con23, ZPW18] or the depth of the compiled circuit [ZHQ+21] and schedule the braiding path between remote qubits [HCJ+21]. In contrast, dynamic circuit compilation is centered around the reordering of instructions and the reassignment of logical qubits, while preserving both the number and type of circuit instructions. This approach serves as a complementary strategy to other circuit optimization techniques and can be seamlessly integrated into the existing ones. For instance, it can be applied after the removal of redundant gates or before the mapping of the circuit to a specific quantum system.

Main contributions. In this work, we give a comprehensive investigation into dynamic quantum circuit compilation, a process that transforms a static quantum circuit into an equivalent dynamic circuit with a smaller size. In summary, we make the following contributions:

- The first general framework for optimizing dynamic circuit compilation through graph manipulation and a rigorous mathematical model for optimal compilation in terms of qubit reuse. Our framework primarily targets qubit savings but is also adaptable to other scenarios, such as optimizing tradeoffs among circuit width, depth, and related factors.
- Efficient approaches to determine whether a given static quantum circuit can be reduced to a smaller circuit through qubit-reuse and heuristic algorithms to design dynamic circuit compilation schemes in general. Detailed time complexity analyses are also provided.
- Compiling quantum circuits with commutable structures, which is a significant feature in recent quantum applications. This successfully resolves an open challenge for circuit compilation in the prior study.
- Optimal compilations of quantum circuits with practical relevance, including well-known
  quantum algorithms in quantum computation, ansatz circuits applied in quantum machine
  learning, and measurement-based quantum computation crucial for quantum networking.
  These optimal compilations establish tight lower bounds and serve as benchmarks for other
  variants of qubit-reuse compilation methods.
- Numerical evaluations of our heuristic algorithms on both structured and random quantum circuits, highlighting the superior performance of our methods over existing approaches. In particular, experiments show that our approach outperforms in approximately 98.5% of randomly generated quantum circuits and nearly 100% of randomly generated instantaneous quantum polynomial (IQP) circuits. Noisy simulations also demonstrate a substantial enhancement in qubit reduction and the algorithm's performance for compiled circuits.

The rest of this work is structured as follows: Section 2 and Section 3 briefly introduce the notations and fundamental concepts employed throughout our study. Section 4 establishes the core principles behind dynamic quantum circuit compilation through graph manipulation. Section 5 introduces various efficient approaches for determining the reducibility of a static quantum circuit. Section 6 presents a rigorous mathematical model for optimal quantum circuit compilation. Section 7 introduces several heuristic algorithms for circuit compilation in general. Section 8 presents the analytical and numerical evaluations of the proposed methods and compares them with existing approaches. Section 9 reviews some related works. Finally, Section 10 concludes our study and discusses related open problems. All proofs in the main text are delegated to the supplementary material. All algorithms in this work have been implemented in QNET, a quantum network toolkit developed by the Institute for Quantum Computing at Baidu. All the code referenced in this paper can be accessed online on GitHub <sup>1</sup>.

### 2 Preliminaries

**Notation** Throughout this work, we denote  $I_n$  as the identity matrix of size  $n \times n$ ,  $J_n$  as the all-one matrix of size  $n \times n$ , and  $O_n$  as the zero matrix of size  $n \times n$ . We also use  $E_{i,j}^k$  to represent a  $k \times k$  matrix where the (i,j)-th entry is one, and zero otherwise. We use n to represent the width of a quantum circuit, while m signifies the number of quantum instructions. We use list[i] to represent the i-th element of a list. Furthermore, we adopt the convention that all qubit registers in quantum circuits, along with the row and column indices of matrices, and the indices of lists in algorithms, all start from zero.

**Boolean matrix** A Boolean matrix is a matrix whose entries are either 0 or 1. Let A and B be two Boolean matrices of size  $k \times k$ . Then their Boolean product is defined as  $(A \odot B)_{ij} := \bigvee_{l=0}^{k-1} (A_{il} \wedge B_{lj})$  where  $\vee$  is the logical OR and  $\wedge$  is the logical AND.

Directed Graph A graph is an ordered pair G=(V,E) comprising vertices V and edges E. A graph is directed if its edges are ordered pairs of vertices. For any edge (u,v) in a directed graph, u is called the tail and v is called the head of the edge. For a vertex v, the number of head ends adjacent to a vertex is called the indegree of the vertex which is denoted as  $\delta^-(v)$  and the number of tail ends adjacent to a vertex is its outdegree, which is denoted as  $\delta^+(v)$ . A vertex with zero indegree is called a root and a vertex with zero outdegree is called a terminal. A vertex v is reachable from another vertex u if there exists a direct path from u to v in the graph. A directed acyclic graph (DAG) is a directed graph with no directed cycles, which has been widely used in formulating the classical and quantum circuit model. A topological ordering of a DAG is an ordering of its vertices into a sequence, such that for every edge the tail vertex of the edge occurs earlier in the sequence than the head vertex of the edge. This ordering can be efficiently obtained from its DAG but the result is not unique. A bipartite graph is a graph whose vertices can be divided into two disjoint and independent sets U and V such that every edge connects a vertex in set U to a vertex in set V. We write G=(U,V,E) to denote a bipartite graph with parts U, V and edges E. A bipartite graph is complete if every vertex in U is connected to every vertex in V.

Matrix Representation of Graph Let G = (V, E) be a directed graph where  $V = \{v_0, \dots, v_{k-1}\}$ . Its adjacency matrix is a Boolean matrix, denoted as A(G), of size  $k \times k$  whose (i, j)-th entry is one if the directed edge  $(v_i, v_j) \in E$  and zero otherwise. If a bipartite graph G = (U, V, E) with  $U = \{u_0, \dots, u_{k-1}\}$ ,  $V = \{v_0, \dots, v_{k-1}\}$  and all edges pointing from U to V, then its adjacency matrix can be written as  $A(G) = \begin{pmatrix} O_k & X \\ O_k & O_k \end{pmatrix}$  where X is a  $k \times k$  Boolean matrix in which  $X_{ij} = 1$  if  $(u_i, v_j) \in E$ . We call this submatrix the biadjacency matrix of the bipartite graph and denote it as B(G). The biadjacency matrix of a complete bipartite graph is an all-one matrix. A matrix A of size  $k \times k$  is nilpotent if there exists an integer l with  $1 \le l \le k$  such that  $A^l = O_k$ . A directed graph is acyclic if and only if its adjacency matrix is nilpotent (see [Deo16, Theorem 9-17] or  $[LQW^+22]$ ).

<sup>&</sup>lt;sup>1</sup> https://github.com/baidu/QCompute/tree/master/Extensions/QuantumNetwork

# 3 Quantum circuit and its representations

### 3.1 Quantum computing basics

**Quantum states** In quantum computing, a quantum bit or qubit is the fundamental unit of quantum information. A qubit can be in two *computational basis states*, which are represented by the 2-dimensional vectors  $|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$  and  $|1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ . Unlike classical bit, a qubit can be in a linear combination (*superposition*) of the basis states,  $\alpha|0\rangle + \beta|1\rangle = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$ , where  $\alpha, \beta$  are complex numbers such that  $|\alpha|^2 + |\beta|^2 = 1$ .

Quantum operations Quantum operations manipulate the state of qubits and encompass various actions such as quantum gates, quantum measurements and reset operations. Quantum gates are unitary operators that transform the state of the qubits. For example, the Hadamard gate H can be used to create equal superposition:  $H|0\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ , while the controlled-NOT (CNOT) gate acts on 2 qubits and maps the state  $|a,b\rangle$  to  $|a,a\oplus b\rangle$ , where  $\oplus$  denotes XOR operation. Quantum measurements extract classical information from quantum states, resulting in the collapse of the quantum state. For example, measuring a single qubit state  $\alpha|0\rangle + \beta|1\rangle$  yields  $|0\rangle$  with probability  $|\alpha|^2$  or  $|1\rangle$  with probability  $|\beta|^2$ . After a qubit is measured, it can be reset to a known state (typically the state  $|0\rangle$ ) and to be used for subsequent computation.

Quantum circuits A quantum circuit is a mathematical and visual model used in quantum computing to represent and perform quantum computations. It is analogous to classical digital circuits used in classical computing. A quantum circuit consists of a series of quantum gates, each of which operates on one or more qubits. In this work, we focus on single-qubit and two-qubit gates without loss of generality and most of the results apply to multi-qubit gates with slight modification. A quantum circuit is *static* if it does not contain mid-circuit reset operations and all measurements, are performed at the end of the circuit (e.g., Figure 1a). A quantum circuit is *dynamic* if it contains mid-circuit measurements and reset operations (e.g., Figure 1b). A quantum circuit is *reducible* if it can be written as an equivalent quantum circuit with fewer qubits. Otherwise, it is called *irreducible*. A quantum circuit and its reduced circuit are considered equivalent if their measurement outcomes yield identical distributions. In the following discussion, a 'quantum circuit' specifically refers to a circuit that incorporates a predetermined sequence of operations. However, for scenarios involving commutable gates or blocks, we utilize the term 'quantum circuit with commutable structure'.

#### 3.2 Quantum circuit instructions

Quantum circuits are conventionally depicted through circuit diagrams. Nevertheless, when considering circuit compilation, a more precise and streamlined approach involves employing quantum circuit instructions. This allows a quantum circuit to be presented as a list of instructions, structured according to the chronological order of their execution. Each entry within the instruction list corresponds to a distinct quantum operation within the circuit and is denoted by a quartet [ID, TYPE, QUBIT, PAR. This quartet serves to uniquely identify the operation, specify its operation type, indicate the qubit(s) it acts upon, and provide any relevant parameters such as the rotation angle or the group tag in case where gates are commutable (set to 'NONE' in cases where no parameters are applicable) [FZL<sup>+</sup>23]. For example, the instruction [5, RY, 1,  $\pi$ ] signifies the application of an  $R_{y}$  rotation gate to qubit  $q_{1}$ , with the rotation angle set to  $\pi$ , and the corresponding instruction ID is 5. Similarly, the instruction [4, CX, [0, 2], NONE] represents the implementation of a controlled-NOT gate, with  $q_0$  as the control qubit and  $q_2$  as the target qubit, with an instruction ID of 4. Likewise, the instruction [7, MEASURE, 2, NONE] denotes the measurement of qubit  $q_2$  in the computational basis, with an instruction ID of 7. Finally, the instruction [2, RESET, 3, NONE] indicates the reset of qubit  $q_3$  to the ground state, typically represented as the zero state  $|0\rangle$ , and carries an instruction ID of 2. An example of quantum circuit instructions is given in Figure 2(d).

### 3.3 Graph representation of quantum circuit

Given that a quantum circuit essentially constitutes a time-ordered sequence of quantum instructions, it is natural to employ a directed graph to represent the causal relationships among them. In this context, a quantum circuit finds its representation through a directed graph, effectively preserving the execution constraints of all quantum instructions. Specifically, each quantum instruction corresponds to a vertex within the graph. A directed edge starting from vertex u and terminating at vertex v signifies that the quantum instruction associated with vertex u must be executed before the quantum instruction associated with vertex v. This directed graph is inherently acyclic, as the presence of any cycle would imply a dependency of past instructions on future ones, contravening the causal relationship inherent in the quantum circuit. Given this characteristic, we refer to this directed graph as the DAG representation of the quantum circuit (see e.g. Figure 3).

In many instances of interest, the original circuit is not unique and can be represented with different ordering of gates. A typical example is given by the instantaneous quantum polynomial (IQP) circuits [SB09] in the form of  $H^{\otimes n}DH^{\otimes n}$ , where H denotes the hadamard gate and D constitutes a block of gates diagonal in the computational basis (e.g. constructed by randomly selecting gates from the set  $\{\sqrt{CZ}, T\}$ ) and consequently can be applied in any temporal order. Other examples can be given by the quantum approximate optimization algorithm (QAOA) used in quantum machine learning [FGG14] and the preparation of graph states in the measurement-based quantum computation [BBD+09]. These circuits do not have pre-determined structures, and imposing any dependencies among commuting gates may limit the opportunities for qubit-reuse. To handle circuits with commutable structure, we can avoid imposing dependencies between commutable gates and only establish edges between operations with pre-defined ordering when generating the DAG representation of the circuit. An illustrative example of circuits with commutable structure and the algorithm for generating the DAG representation is detailed in Appendix A of the supplemental material.

In this work, we will employ the DAG representation to investigate the dynamic quantum circuit compilation problem. To facilitate our analysis, all vertices within the DAG representation can be categorized into three distinct groups:

- 1. Root vertices: vertices with zero indegree, which correspond to the first layer of quantum operations on each qubit (typically reset operations);
- 2. Terminal vertices: vertices with zero outdegree, which correspond to the last layer of quantum operations on each qubit (typically quantum measurements);
- 3. Internal vertices: vertices with nonzero indegree and outdegree, which correspond to the intermediate quantum operations (typically quantum gates).

Remark 1 We assume that all static quantum circuits in this work start from reset operations and end with measurements. This implies that the circuit width is equal to the number of root vertices within the DAG representation.

Concerning our compilation problem, all essential information resides within the reachability from roots to terminals within the DAG representation, while the internal vertices facilitate and transmit this reachability. Consequently, we can further streamline the DAG representation, concentrating our focus on the *simplified DAG representation* of the circuit. This simplified representation takes the form of a bipartite graph (R, T, E), with R and T representing the sets of roots and terminals from the DAG representation. An edge (r,t) in E connects a root  $r \in R$  to a terminal  $t \in T$  if a directed path exists from r to t within the DAG representation. Figure 3 showcases an example of a quantum circuit alongside its corresponding DAG and simplified DAG representations.

### 3.4 Quantum circuit composition and subcircuit

Quantum circuit composition stands as a fundamental technique for integrating modular components into complex quantum algorithms. An illustrative instance arises in the domain of quantum

machine learning, exemplified by the Variational Quantum Eigensolver (VQE) [PMS+14] where specific quantum circuit patterns are joined sequentially. More specifically, given two quantum circuits of the same size, we define their *circuit composition* as the sequential integration of quantum gate operations from a second circuit onto those of the first, while preserving the initialization in the first circuit and the measurement in the second. An illustration of quantum circuit composition and the sequence of circuit instructions is presented in Figure 2. Conversely, we can also consider *subcircuits* which are obtained by removing part of circuit instructions from the original circuit. This approach offers an alternative perspective for gaining insight into the essential structure of the circuit.



Figure 2: (Color online) An illustration of quantum circuit composition. (a), (b) quantum circuits; (c) composite quantum circuit; (d) the instructions of the composite quantum circuit.

# 4 Quantum circuit compilation via graph manipulation

In this section, we present the mathematical formulation of the quantum circuit compilation problem using its graph representation, which serves as a pivotal foundation for subsequent in-depth investigations. Drawing inspiration from the example illustrated in Figure 1, dynamic quantum circuit compilation via qubit-reuse involves resetting a qubit after measurement, thereby deferring certain reset operations until after the measurement process. In the DAG representation, this corresponds to the addition of a directed edge from a terminal to a root, signifying that the corresponding reset operation occurs subsequent to the execution of the measurement.

For instance, Figure 3b provides a DAG representation of the quantum circuit depicted in Figure 1a or 3a. The qubit-reuse in Figure 1b is depicted by the addition of a new edge (marked as a dashed green line) from terminal 8 to root 2. With the inclusion of this new edge, the resulting DAG exactly mirrors the DAG representation of the dynamic quantum circuit depicted in Figure 1b. This new edge can also be integrated into the simplified DAG representation as shown in Figure 3c.



Figure 3: (Color online) An illustration of quantum circuit compilation via graph manipulation. Root vertices are marked in blue, and terminal vertices in red. The newly added edge is marked in green.

The above idea of dynamic quantum circuit compilation is formally formulated as follows.

**Theorem 1** Let G = (R, T, E) be the simplified DAG of a static quantum circuit. Then compiling the quantum circuit via qubit-reuse is equivalent to adding edges to G (Let E' be the set of added edges) such that

- $\forall (t,r) \in E'$ , it has  $t \in T$  and  $r \in R$ ;
- $\forall (t,r) \in E'$ , it has  $\delta^+(t) = 1$  and  $\delta^-(r) = 1$ ;
- $G' = (R \cup T, E \cup E')$  is an acyclic graph.

The new graph G' will be called the modified DAG in the subsequent discussion.

The first condition indicates the reuse of a qubit after another qubit has been measured. The second condition specifies that one qubit is reassigned to a single new qubit, and a reused qubit accommodates only one new qubit. The last condition ensures the equivalence of the compiled circuit. Conversely, any graph manipulation adhering to these conditions corresponds to a dynamic circuit compilation scheme. This is presented in Algorithm 7 in the supplementary material.

**Remark 2** Theorem 1 shows that dynamic quantum circuit compilation only depends on the simplified DAG representation, which is independent of the single-qubit gates in the static quantum circuit to compile. So we will omit all single-qubit gates in the subsequent discussion.

# 5 Determine the reducibility of quantum circuit

In this section, we present three distinct approaches for assessing the reducibility of a static quantum circuit. The first approach is rooted in the DAG representation of the quantum circuit, while the second approach involves a direct and comprehensive analysis of the circuit's structure, guided by a set of critical observations. Departing from these geometric perspectives, the third approach utilizes Boolean matrices to check reducibility from the subcircuits.

### 5.1 Approach 1: determine the reducibility from graph

It is clear from Theorem 1 that a quantum circuit is reducible if and only if we can add at least one more edge to its DAG while adhering to the three specified conditions. This idea can be elaborated upon as follows:

**Proposition 2** A static quantum circuit is irreducible if and only if its simplified DAG is a complete bipartite graph.

This result shows that the reducibility of a static quantum circuit can be completely determined by its simplified DAG. In particular, we only need to check if the biadjacency matrix of the simplified DAG is an all-one matrix. For this, we can exploit Depth-First Search (DFS) algorithm to identify all paths from roots to terminals for a given DAG. The detailed algorithm of this approach along with its time complexity analysis is given in Algorithm 1 and Proposition 3.

**Proposition 3** For a given static quantum circuit with n qubits and m operations, its reducibility can be efficiently determined by Algorithm 1 with a worst-case time complexity of  $O(mn^2)$ .

Remark 3 Note that this approach can be used to determine the reducibility of quantum circuits with commutable structures.

### 5.2 Approach 2: determine the reducibility from qubit reachability

It is worth noting that within the DAG representation of a quantum circuit, the connections between roots and terminals on distinct qubits are effectively established by vertices that correspond to two-qubit gates in the circuit. With this in mind, we introduce the concept of qubit reachability within a quantum circuit and present our second approach to determining its reducibility.

#### **Algorithm 1:** Determining reducibility from graph

```
Input:
     StaticCircuit
                         the instruction list of a static quantum circuit
   Output:
     True or False
                         whether the static quantum circuit is reducible
 1 Run Algorithm 6 in Appendix A to obtain the DAG of StaticCircuit and record it as Digraph;
   Let Roots and Terminals be the set of roots and terminals in Digraph, respectively;
 3 Let n be the length of Roots and Terminals;
 4 Initialize a n \times n zero matrix B;
   for i = 0 to n - 1 do
       for j = 0 to n - 1 do
          Apply DFS algorithm to search if there is a path from Roots[i] to Terminals[j];
          if such a path exists then
              Set the (i, j) entry of matrix B to 1;
10 if B is all-one matrix then
    return False:
12 otherwise do return True;
```

**Definition 4 (Reachability between qubits)** Given an instruction list of an n-qubit quantum circuit acting on the set of qubits  $Q = \{q_0, q_1, \cdots, q_{n-1}\}$ . Then any two-qubit instruction introduces two reachability relations  $q_i \xrightarrow{k} q_j$  and  $q_j \xrightarrow{k} q_i$  where  $q_i$  and  $q_j$  are the qubits upon which the instruction operates and k is the order index of the instruction within the instruction list. A qubit  $q_i$  reaches  $q_j$  (or, equivalently,  $q_j$  is reachable from  $q_i$ ), denoted as  $q_i \to q_j$ , if there exists a sequence of relations  $q_i \xrightarrow{k_1} q_{l_1}$ ,  $q_{l_1} \xrightarrow{k_2} q_{l_2}$ ,  $\cdots$ ,  $q_{l_s} \xrightarrow{k_{s+1}} q_j$  such that  $k_1 \le k_2 \le \cdots \le k_{s+1}$ . Moreover, qubits  $q_i$  and  $q_j$  are mutually reachable if  $q_i$  reaches  $q_j$  and vice versa.

To illustrate this definition, consider the quantum circuit in Figure 3(a). Here the order indices of each double-qubit instructions are the same as their IDs. For the first CNOT instruction acting on  $q_0$  and  $q_1$ , it introduces two qubit relations  $q_0 \xrightarrow{6} q_1$  and  $q_1 \xrightarrow{6} q_0$ . So we have  $q_0 \to q_1$  and  $q_1 \to q_0$ , that is, they are mutually reachable. We can also see that  $q_0 \to q_2$  because we have relation  $q_0 \xrightarrow{6} q_1$  and  $q_1 \xrightarrow{7} q_2$ . But the reverse direction  $q_2 \to q_0$  does not hold.

The following result demonstrates that qubit reachability constitutes a necessary and sufficient condition for determining circuit reducibility.

**Proposition 5** A static quantum circuit is irreducible if and only if any two qubits of this quantum circuit are mutually reachable.

Compared to the approach in Section 5.1, the qubit reachability approach does not require the explicit construction of the DAG or the use of the DFS algorithm to derive the simplified DAG. Instead, we can establish qubit reachability by traversing the circuit instructions only once, progressively building up the reachability through a transitive rule.

For convenience, define the reachable set of  $q_i$  by  $Q_i := \{q_j \in Q : q_j \to q_i\} \cup \{q_i\}$ . This set is updated with each double-qubit instruction. Taking Figure 3(a) as an example, before the second CNOT instruction on  $q_1$  and  $q_2$ , the reachable sets are  $Q_1 = \{q_0, q_1\}$  and  $Q_2 = \{q_2\}$ . Subsequently, after this instruction, any qubit in the set  $Q_1 \cup Q_2 = \{q_0, q_1, q_2\}$  can reach both  $q_1$  and  $q_2$ . So the reachable sets are updated to  $Q_1 = \{q_0, q_1, q_2\}$ , and  $Q_2 = \{q_0, q_1, q_2\}$ .

By this transitive rule, we can effectively determine circuit reducibility. The algorithm for this procedure is outlined in Algorithm 2, followed by its time complexity in Proposition 6.

**Proposition 6** For a given static quantum circuit with n qubits and m operations, its reducibility can be efficiently determined by Algorithm 2 with a worst-case time complexity of O(mn).

#### **Algorithm 2:** Determining reducibility from qubit reachability Input: Static Circuitthe instruction list of a static quantum circuit Output: True or False whether the static quantum circuit is reducible 1 Let n be the quantum circuit width; 2 Initialize a list *ReachableSets* of length n, with each element initialized as an empty set; **3 for** i = 0 **to** n - 1 **do** add $q_i$ to ReachableSets[i]; foreach Instruction in StaticCircuit do if Instruction is a double gubit gate then Record the QUBIT values of *Instruction* as i and j respectively; 6 Calculate $Union = ReachableSets[i] \cup ReachableSets[j];$ 7 Set ReachableSets[i] and ReachableSets[j] to Union; 9 foreach ReachableSets in ReachableSets do if the length of ReachableSet is less than n then 10 return Ture 12 otherwise do return False;

### 5.3 Approach 3: determine the reducibility from matrix

The preceding approach by qubit reachability can be further formulated via Boolean matrix manipulation, providing greater convenience for analytical studies, particularly for demonstrating the optimal compilations detailed in Section 8.

Let  $C_1 \circ C_2$  be a composition of quantum circuits  $C_1$  and  $C_2$  and let B(C) be the biadjacency matrix of the simplified DAG of the quantum circuit C. Then we have the following relation for quantum circuit composition.

**Proposition 7** Let  $C_1$  and  $C_2$  be two static quantum circuits. Then  $B(C_1 \circ C_2) = B(C_1) \odot B(C_2)$ .

Note that if an n-qubit quantum circuit only contains one two-qubit gate, e.g., a CNOT gate acting on the i-th and j-th qubit, its biadjacency matrix is given by  $B_{ij} = I_n + E_{ij}^n + E_{ji}^n$  where  $E_{ij}^n$  is the matrix whose (i,j) entry is one and zero otherwise. Since any quantum circuit can be seen as the composition of subcircuits containing only a single two-qubit gate, as per Proposition 7, the computation of a quantum circuit's biadjacency matrix entails the product of a sequence of matrices in the form of  $B_{ij}$ .

A more in-depth examination of the Boolean product reveals that there is no need to perform explicit matrix multiplication. That is, for any matrix A, we have  $A \odot B_{ij} = (a_1, \dots, a_i \lor a_j, \dots, a_j \lor a_i, \dots, a_n)$  where  $a_i$  is the i-th column of the matrix A and  $a_i \lor a_j$  represents entrywise OR of  $a_i$  and  $a_j$ . In other words, the impact of multiplying a matrix  $B_{ij}$  is equivalent to replacing the i-th and j-th columns of A with  $a_i \lor a_j$ . This gives the following algorithm for checking the reducibility.

**Proposition 8** For a given static quantum circuit with n qubits and m operations, its reducibility can be efficiently determined by Algorithm 3 with a worst-case time complexity of O(mn).

# 6 Optimal quantum circuit compilation

In this section, we introduce the mathematical model that characterizes optimal quantum circuit compilation. This model serves as the foundation for deriving heuristic algorithms and analyzing optimal compilation schemes for specific quantum circuits in the subsequent sections.

**Proposition 9** Finding the optimal dynamic circuit compilation scheme of a static quantum circuit with n qubits is equivalent to solving the following binary integer programming problem with

#### **Algorithm 3:** Determining reducibility from matrix

```
Input:
    StaticCircuit
                         the instruction list of a static quantum circuit
   Output:
     True or False
                         whether the static quantum circuit is reducible
 1 Let n be the quantum circuit width;
   Initialize B as an n \times n identity matrix.;
   foreach Instruction in StaticCircuit do
      if Instruction is a two-qubit gate then
          Record the QUBIT values of Instruction as i and j respectively;
 5
 6
           Calculate B' = B[i] \vee B[j], where B[k] is the k-th column of matrix B;
          Set B[i] = B' and B[j] = B';
s if B is all-one matrix then
   return False;
10 otherwise do return True;
```

 $n^2$  Boolean variables  $F_{ij}$ :

$$\alpha := \max \sum_{i,j=0}^{n-1} F_{ij} \tag{1a}$$

s.t. 
$$F_{ij} \le \bar{B}_{ij}, \forall i, j \in \{0, 1, ..., n-1\}$$
 (1b)

$$\sum_{j=0}^{n-1} F_{ij} \le 1, \, \forall i \in \{0, 1, ..., n-1\}$$
 (1c)

$$\sum_{i=0}^{n-1} F_{ij} \le 1, \, \forall j \in \{0, 1, ..., n-1\}$$
(1d)

$$\begin{pmatrix} O_n & B \\ F & O_n \end{pmatrix} \text{ nilpotent }, \tag{1e}$$

where B is the biadjacency matrix of the simplified DAG and  $\bar{B} = (B^{\top})$ , referred to as the candidate matrix, is the logical NOT of the transpose of matrix B. Furthermore, the width of the compiled quantum circuit is given by  $n - \alpha$ .

This result can be seen as a translation of Theorem 1 from graph manipulation to matrix optimization. Particularly, the variable F here represents a feasible solution in the graph manipulation. The block matrix in Eq. (1e) is the adjacency matrix of the modified DAG, which can be used to compile the quantum circuit.

Remark 4 It is worth noting that condition (1b) is indeed implied by condition (1e). However, we explicitly enforce this condition as it proves helpful in the analysis of the optimal solution for quantum circuits in Section 8. Furthermore, there are other equivalent variants of the optimization problem. For instance, the second condition can be omitted, allowing different terminals to connect to the same root. Nevertheless, this modification does not contribute to a reduction in the number of qubits. In such cases, the objective function should be adjusted accordingly.

**Remark 5** The difficulty in solving the binary integer programming arises from the presence of the nilpotent constraint. Due to the non-convex nature of the set of nilpotent matrices, the optimization problem in Proposition 9 is inherently non-convex.

# 7 Heuristic algorithms

The previous section demonstrated that the quantum circuit compilation problem is essentially a binary integer optimization problem with an exponentially increased solution space. While

checking the reducibility of a quantum circuit is a polynomial-time task, as analyzed in Section 5, finding the optimal compilation scheme could potentially require exponential time.

In this section, we introduce several efficient heuristic algorithms designed to address the optimization problem within polynomial time. All algorithms are presented to compile a circuit into the smallest size. However, users retain the flexibility to halt the main loop based on specific criteria, enabling the investigation of tradeoffs among circuit width, depth, and related factors.

### 7.1 Algorithm 1: Minimum Remaining Values Heuristic

The Minimum Remaining Values (MRV) heuristic is a commonly employed technique in constraint satisfaction problems [RN09], which designates the variable with the fewest valid values (i.e., the minimum remaining values) as the next one for value assignment. This approach finds extensive application in problems such as Sudoku solvers and map coloring.

$$\begin{pmatrix} 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{pmatrix}$$

$$(a) \qquad (b) \qquad (c) \qquad (d)$$

Figure 4: (Color online) An illustration of MRV heuristic algorithm. Suppose the row and column of the matrix are indexed by  $\{t_0,\cdots,t_4\}$  and  $\{r_0,\cdots,r_4\}$ . (a) the biadjacency matrix B of the simplified DAG of a 5-qubit Bernstein–Vazirani algorithm; (b) the candidate matrix  $\bar{B} = (B^\top)$  whereas the candidate edge is chosen as  $(t_3,r_1)$  marked in green; (c) set all entries in  $T_3 \times R_1$  to zero , where  $T_3 = \{t_1\}$  and  $R_1 = \{r_0,r_2,r_3,r_4\}$ ; (d) set all entries in the  $t_3$  row and all entries in the  $r_1$  column to zero.

In the context of our dynamic circuit compilation problem, we can implement the MRV heuristic algorithm as follows. Consider a 5-qubit Bernstein-Vazirani algorithm, with its biadjacency matrix B provided in Figure 4(a), and the candidate matrix  $\bar{B} = (B^{\top})$  given in Figure 4(b). Let's assume that the rows and columns are indexed by  $\{t_0, \dots, t_4\}$  and  $\{r_0, \dots, r_4\}$ . During each iteration, we first identify the terminal  $t_i$  with the fewest candidate roots according to the candidate matrix and connect it to a candidate root  $r_i$  with the least number of choices of terminals. In this example, the candidate edge is selected as  $(t_3, r_1)$ , as depicted in green. After adding this edge, we need to update the candidate matrix. Prior to the incorporation of this edge, the set  $R_i$ encompasses all roots capable of reaching terminal  $t_i$ , while the set  $T_j$  represents all terminals that are reachable from root  $r_j$ . In this example, we have  $R_1 = \{r_0, r_2, r_3, r_4\}$  and  $T_3 = \{t_1\}$ . Upon adding this edge, any root r in the set  $R_i$  will gain the ability to reach any terminal t in the set  $T_j$ , which indicates that all edges (t,r) where  $t \in T_j$  and  $r \in R_i$  are no longer candidate edges. Consequently, we need to update all these entries in the candidate matrix to zero. That is, set entries  $(t_1, r_0), (t_1, r_2), (t_1, r_3), (t_1, r_4)$  to zero, as depicted in Figure 4(c). Furthermore, to ensure that the added edges do not share common vertices, it is necessary to update all entries in the  $t_i$  row and all entries in the  $r_i$  column of the candidate matrix to zero. In this example, set all entries in the  $t_3$  row and all entries in the  $r_1$  column to zero, which is depicted in Figure 4(d). The complete MRV algorithm is given in Algorithm 4.

**Remark 6** Note that the role of root and terminal in Algorithm 4 can be exchanged. That is, we can identify the root with the fewest candidate terminals and connect it to a candidate terminal with least choice of roots. In practice, we can run both algorithms and choose a better result.

**Proposition 10** For a static quantum circuit with n qubits and m operations, Algorithm 4 has a worst-case time complexity of  $O(mn + n^3)$ .

### 7.2 Algorithm 2: Greedy Heuristic Algorithm

Greedy algorithms are time-efficient heuristic strategy that makes a locally optimal choice at each iterative step. This idea can be readily applied to address the dynamic circuit compilation as

#### Algorithm 4: MRV heuristic algorithm

#### Input:

Static Circuit the instruction list of a static quantum circuit to compile

#### **Output:**

Dynamic Circuit the instruction list of the compiled dynamic quantum circuit

- 1 Run Algorithm 6 in Appendix A to get the DAG representation Digraph of the circuit;
- 2 Let Roots and Terminals be the set of roots and terminals, respectively;
- 3 Run Algorithm 3 to get the biadjacency matrix B of the simplified DAG of the circuit;
- 4 Calculate the candidate matrix CandidateMatrix as  $\neg (B^{\top})$ ;
- 5 Initialize two empty lists CandidatesNum and AddedEdges;
- 6 while CandidateMatrix is not zero matrix do
- 7 Calculate the sum of each row of CandidateMatrix and record it in CandidatesNum;
- **s** Identify the smallest non-zero element in *CandidatesNum* and record its index as t;
- Identify the non-zero element in the t-th row of CandidateMatrix with the smallest column sum and record its column index as r;
- Append the edge (Terminals[t], Roots[r]) to AddedEdges;
- Let R be the column indices of zero elements in the t-th row of CandidateMatrix;
- Let T be the row indices of zero elements in the r-th column of CandidateMatrix;
- foreach pair (u, v) where  $u \in T$  and  $v \in R$  do
- Set the entry (u, v) of CandidateMatrix to zero;
- Set all entries in the t-th row and the r-th column of CandidateMatrix to zero;
- 16 Add AddedEdges to Digraph and get the ModifiedGraph;
- 17 Run Algorithm 7 in Appendix A to get the compiled circuit Dynamic Circuit;
- 18 return DynamicCircuit

follows: in each iteration, the algorithm evaluates the potential impact of adding each candidate edge and selects the one that maximizes the possibility to add more edges in subsequent steps. Specifically, for each candidate edge, the algorithm temporarily integrates this edge into the simplified DAG and updates the candidate matrix following the rules outlined in Algorithm 4. The summation of all elements within the updated candidate matrix serves as the score for the candidate edge, which is then stored in the corresponding entry of a matrix called the *score matrix*. After evaluating all candidate edges, the algorithm identifies the candidate edge with the highest score as the optimal choice for inclusion in the current iteration and updates the candidate matrix accordingly before proceeding to the next iteration. In cases where multiple entries share the highest score, one edge is *randomly* selected from among them. An illustrative example of the scoring procedure is depicted in Figure 5. The complete greedy algorithm is given in Algorithm 5.



Figure 5: (Color online) An illustration of the greedy heuristic algorithm. Suppose the row and column of the matrix are indexed by  $\{t_0,\cdots,t_4\}$  and  $\{r_0,\cdots,r_4\}$ . (a) the candidate matrix of the simplified DAG of a 5-qubit Bernstein–Vazirani algorithm  $B= (B^\top)$  whereas the candidate edge under evaluation is  $(t_3,r_1)$  marked in green; (b) set all entries in  $T_3\times R_1$  to zero , where  $T_3=\{t_1\}$  and  $R_1=\{r_0,r_2,r_3,r_4\}$ ; (c) set all entries in the  $t_3$  row and all entries in the  $t_3$  row and all entries in the  $t_3$  row and all entries in the candidate edge  $(t_3,r_1)$ ; (d) the score matrix after the first iteration, where the edge for inclusion in this round is randomly selected from the entries with the highest score (marked in blue).

Remark 7 The scoring rule in the greedy algorithm is flexible and can be replaced with alternative approaches based on specific objectives. One such approach involves evaluating the impact on the

#### Algorithm 5: Greedy heuristic algorithm

```
Input:
     StaticCircuit
                              the instruction list of a static quantum circuit to compile
   Output:
     DynamicCircuit
                              the instruction list of the compiled dynamic quantum circuit
 1 Run Algorithm 6 in Appendix A to get the DAG representation Digraph of the circuit;
   Let Roots and Terminals be the set of roots and terminals, respectively;
 3 Run Algorithm 3 to get the biadjacency matrix B of the simplified DAG of the circuit;
 4 Calculate the candidate matrix CandidateMatrix as \neg (B^{\top});
   Initialize an empty list AddedEdges;
   while CandidateMatrix is not zero matrix do
       Initialize a n \times n zero matrix ScoresMatrix;
       foreach non-zero entry (i, j) in CandidateMatrix do
           Initialize a matrix B_{i,j} as CandidateMatrix;
           Let R_i be the column indices of zero elements in the i-th row of \bar{B}_{i,j};
10
           Let T_j be the row indices of zero elements in the j-th column of \bar{B}_{i,j};
11
           foreach pair (u, v) where u \in T_j and v \in R_i do
12
            Set the entry (u, v) of \bar{B}_{i,j} to zero;
13
           Set all entries in the i-th row and all entries in the j-th column of \bar{B}_{i,j} to zero;
14
           Set the entry (i, j) in ScoresMatrix to the sum of all entries in \bar{B}_{i,j} plus one;
15
       Identify all entries with the largest score in ScoresMatrix as MaxScore;
16
       Randomly select one entry in MaxScore and record its index (t, r);
17
       Append the edge (Terminals[t], Roots[r]) to AddedEdges;
18
       Update CandidateMatrix to B_{t,r}
20 Add AddedEdges to Digraph and get the ModifiedGraph;
21 Run Algorithm 7 in Appendix A to get the compiled circuit Dynamic Circuit;
22 return DynamicCircuit
```

circuit depth for adding a candidate edge to the graph by calculating the length of the critical path in the graph. Then the candidate edge can be scored by a predefined cost function that deliberates the tradeoff between resultant circuit width and depth, alongside other pertinent factors.

Remark 8 In scenarios where multiple candidate edges attain the same highest score in an iterative step, a straightforward approach is to choose the one associated with a fixed rule (e.g. the smallest index). However, adopting such a deterministic procedure could inadvertently tie the compilation process to specific qubit labels, potentially restricting the algorithm's performance. To address this constraint, the integration of a random selection process becomes crucial. This stochastic method holds the potential to uncover enhanced solutions by running the algorithm multiple times, a strategy that has proven quite useful in our numerical experiments.

**Proposition 11** For a static quantum circuit with n qubits and m operations, Algorithm 5 has a worst-case time complexity of  $O(mn + n^5)$ .

### 7.3 Algorithm 3: Hybrid Algorithm

To further enhance the performance of the heuristic algorithms, we introduce a hybrid algorithm by combining the MRV heuristic (also works for other heuristics) and brute force search. Initially, the hybrid algorithm employs a brute force search on a designated subset of terminals, denoted as  $T_E \subseteq T$ , which exhaustively enumerates all feasible edge additions pertinent to terminals within this subset. Subsequently, the MRV heuristic algorithm is employed to identify edges that can be added to the remaining terminals in  $T \setminus T_E$ . Let L denote the cardinality of the subset  $L = |T_E|$ . Notably, at L = 0 the hybrid algorithm aligns precisely with the the MRV algorithm. As L increases, the hybrid algorithm progressively approximates the characteristics of the brute

force search. Upon reaching L=n, the hybrid algorithm becomes the brute force search. This hierarchical variation of the hybrid algorithm, characterized by different values of L, which we referred as the hierarchy level, provides the opportunity to tradeoff between the optimality of the solution and the computational time complexity. Detailed discussions and numerical experiments can be found in Appendix A.3 of the supplementary material.

# 8 Analytical and numerical evaluations

In this section, we conduct a thorough analysis of quantum circuits with practical relevance, offering optimal compilations for well-known quantum algorithms in quantum computation, ansatz circuits utilized in quantum machine learning, and measurement-based quantum computation crucial for quantum networking. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. A brief summary of the quantum circuits explored in this study is provided in Table 1. Optimal compilations for the frequently used quantum algorithms and their proofs can be found in Appendix C of the supplementary material.

Table 1: (Color online) Quantum circuits studied in this work. The case in blue indicates irreducible circuits, while the case in red means the compiled width is optimal. Numerical studies are provided for cases in green.

| Quantum circuits                             | Original | Compiled |
|----------------------------------------------|----------|----------|
| Deutsch-Jozsa algorithm                      | n        | 2(1)     |
| Bernstein-Vazirani algorithm                 | n        | 2 (1)    |
| Simon's algorithm                            | 2n       | 3        |
| Quantum Fourier transform                    | n        | n        |
| Quantum phase estimation                     | n        | n        |
| Shor's algorithm                             | n        | n        |
| Grover's algorithm                           | n        | n        |
| Quantum counting algorithm                   | n        | n        |
| Linearly entangled circuit with $l$ layers   | n        | l+1      |
| Circularly entangled circuit with $l$ layers | n        | 3        |
| Pairwisely entangled circuit with $l$ layers | n        | 2l + 1   |
| Fully entangled circuit                      | n        | n        |
| Diamond-structured quantum circuit           | 2n       | n+1      |
| MBQC with cluster state of size $(w, d)$     | wd       | w+1      |
| MBQC with brickwork state of size $(w, d)$   | wd       | w+1      |
| Quantum ripple carry adder circuit           | n(4)     | 4(3)     |
| Quantum supremacy circuits                   | -        | -        |
| GRCS circuits                                | -        | -        |
| QAOA circuits for max-cut                    | -        | -        |
| Random (IQP) circuits                        | -        | -        |

To compare the reducibility of quantum circuits as well as the performance of different compilation methods, we define the *reducibility factor* of a quantum circuit as

$$r = 1 - \frac{n'}{n} \in [0, 1), \tag{2}$$

where n is the width of the original circuit and n' is the width of the compiled circuit. This factor characterizes the extent to which the circuit width can be reduced by a certain algorithm, which is zero if the circuit is not reducible.

### 8.1 Quantum supremacy circuits

In this section, we analyze the reducibility factor of quantum circuits used to claim quantum supremacy, including those executed on Sycamore and Zuchongzhi quantum computers. Figure 6(a) displays the reducibility factor of these circuits with varying numbers of cycles, determined using the greedy heuristic algorithm (Algorithm 5).

An interesting observation is that quantum circuits with 70 qubits and 24 cycles running on Sycamore [MVM+23], 56 qubits and 20 cycles on Zuchongzhi [WBC+21], and 60 qubits and 24 cycles on Zuchongzhi [ZCC+22] are used to claim quantum supremacy, but they all belong to irreducible circuits. This observation reveals a fundamental tradeoff between the technical challenges associated with running deep circuits (a limitation of current quantum computers) and the structural complexity of these circuits (a limitation of classical computers). It highlights the delicate balance required when designing quantum circuits to showcase quantum supremacy in the near term.

Furthermore, another noteworthy point is the quantum circuit with 53 qubits and 20 cycles executed on Sycamore [AAB<sup>+</sup>19], which possesses a reducibility factor of 0.02. This result indicates that the circuit can be compiled into a quantum circuit with 52 qubits. This observation aligns with the historical fact that one qubit on the Sycamore chip is non-functional, thereby breaking the complexity of the circuit and leaving the room for its compilation.



Figure 6: (Color online) (a) The reducibility factor of different quantum supremacy circuits using the greedy heuristic algorithm 5; (b) The reducibility factor of GRCS circuits located in the GitHub directory 'inst/rectangular' with different circuit widths and numbers of cycles using the greedy heuristic algorithm 5.

In 2018, Google proposed a series of random quantum circuits (GRCS) [BIS<sup>+</sup>18]. Due to the hardness of simulation, GRCS is frequently used as benchmark to test the performance of classical simulators. Each instance in GRCS is a random quantum circuit designed for qubits configured in an  $n \times m$  lattice. These circuits are composed of multiple cycles of quantum gates. Figure 6(b) demonstrates the reducibility factor of GRCS circuits through a single running of the greedy heuristic algorithm 5. It further validates the earlier observation that the larger the number of cycles (depth), the more difficult for a quantum circuit to reduce.

### 8.2 Algorithm benchmarking

In this section, we conduct a numerical analysis to assess the performance of different algorithms on a range of benchmark circuits, including quantum adders, quantum approximate optimization algorithm (QAOA), random quantum circuits and random IQP circuits. Our primary focus centers around three distinct algorithms for dynamic circuit compilation: the MRV heuristic algorithm (Algorithm 4), the greedy heuristic algorithm (Algorithm 5) and the greedy algorithms proposed in [DCKFF22] (referred to as DCKF in the subsequent discussion). As the source code for the DCKF algorithms is not publicly available, we have implemented these algorithms based on our understanding of the paper [DCKFF22]. Further details regarding our implementation of the DCKF algorithms can be found in Appendix D of the supplementary material. For the MRV algorithm, we perform two separate runs, exchanging the roles of roots and terminals within the algorithm for each run. Then, we select the dynamic circuit with the smaller circuit width as the final output. We have also utilized the stochastic nature of our greedy heuristic algorithm by running it multiple times to improve its performance.

#### 8.2.1 Quantum Ripple Carry Adders

Quantum adder is a quantum circuit designed for performing addition operation between two bit strings. For example, if we compute '1+2=3', then we represent the input string as '01' and '10', and the expected output bit string is '11'. Here we focus on the Quantum Ripple Carry Adders initially proposed in [CSK08].

In Figure 7(a), we present the results of the numerical experiments conducted using three different algorithms. It is evident that both MRV and the greedy algorithm successfully find the optimal compilation. In contrast, the results obtained from the DCKF algorithms display linear scaling in the original circuit width, indicating a deficiency in its performance. This limitation is likely linked to a specific implementation aspect. That is, the process of establishing measurement orders within the DCKF algorithms sometimes generates multiple local optima within a single iteration. Consequently, deterministic selections in these scenarios might lead to unfavorable measurement orders, significantly affecting overall performance. This justifies the reasoning behind our inclusion of randomness within our greedy heuristic algorithm (Algorithm 5).



Figure 7: (Color online) (a) Compiled circuit width against the original circuit width of quantum ripple carry adders; (b) Compiled circuit width against the original circuit width of the max-cut QAOA circuits with p=1. The plotted error bars correspond to the maximum and minimum compiled width over 20 instances.

#### 8.2.2 QAOA circuits for max-cut problem

QAOA [FGG14] is a quantum algorithm designed to approximately solve classical combinatorial optimization problems and have the potential to run on near-term quantum devices. The QAOA unitary takes the form of the alternative application of a mixing unitary and a problem unitary for p layers. A max-cut problem is a combinatorial optimization problem in graph theory which involves to find a partition of vertices into two sets, such that the number of edges between the sets is maximized. In QAOA circuits designed for solving the max-cut problem, the number of qubits matches the number of vertices in the graph, and the connectivity of two-qubit gates corresponds to the edges in the graph.

Here, we assess the performance of different algorithms applied to QAOA circuits for solving the max-cut problem on random unweighted three-regular (U3R) graphs with p=1. For each experiment, we ran our greedy heuristic algorithm 10 times and recorded the best result. We evaluated four algorithms for each fixed qubit number on 20 random U3R graphs generated using the NetworkX package [HSS08]. The results are presented in Figure 7(b). It is evident that the average compiled width achieved by our greedy algorithm is consistently lower than that obtained using the DCKF algorithms for all qubit numbers. Moreover, as the number of qubits increases, this advantage becomes increasingly pronounced.

#### 8.2.3 Random circuits

In addition to the previously studied structured circuits, we conducted comprehensive numerical experiments involving random quantum circuits to assess algorithm performance across a broader

spectrum of scenarios. These experiments involved fixing the ratio r = m/n, where m represents the number of two-qubit gates, and n represents the width of the original circuits. We uniformly and randomly selected a qubit number from the range between 10 and 80 and sampled the desired number of two-qubit gates to construct the circuit.

We evaluated the reducibility factor using both our greedy heuristic and the DCKF algorithms on these random circuits. For each fixed ratio, we sampled 300 random circuits and ran our greedy algorithm 15 times for each instance. The results in Figure 8(a) demonstrate that our greedy heuristic (vertical) outperforms the DCKF algorithm (horizontal) in approximately 98.5% of instances.

To underscore the significance of handling circuits with commutable structures, we further evaluated the reducibility factor using our greedy heuristic and the improved DCKF algorithms on random IQP circuits. We sampled 300 random IQP circuits for each fixed ratio and ran our greedy algorithm 10 times for each instance. As depicted in Figure 8(b), our greedy algorithm outperforms in nearly 100% of instances with an absolute advantage in 98.4% of cases. More numerical analysis can be found in Appendix E of the supplementary material.



Figure 8: (Color online) (a) The reducibility factor of the randomly generated quantum circuits evaluated using our greedy heuristic algorithm 5 and the DCKF algorithm. (b) The reducibility factor of the randomly generated IQP circuits evaluated using our greedy heuristic algorithm 5 and the improved version of DCKF algorithm. The black diagonal line indicates the point at which the reducibility factors produced by both algorithms are equal. (c) The probability of obtaining a correct outcome (all-one string) against the compiled circuit width in the noisy simulation of an 11-qubit Bernstein-Vazirani algorithm.

### 8.2.4 Noisy simulation

Error variability poses a challenge in near-term quantum hardware, making certain qubits perform better than others. By maximizing qubit reuse, we can consistently select qubits with superior performance, thereby enhancing the algorithm's performance. To further demonstrate the practical efficacy of the proposed methods, we design a noisy simulation of an 11-qubit Bernstein-Vazirani (BV) algorithm, specifically targeting the real-world 11-qubit trapped-ion quantum computer reported in [WBD+19]. The secret bitstring of the BV algorithm is set to an all-one string. In our simulation, we gradually reduce the number of qubits from 11 to 2 and map the logical qubits onto the physical qubits of the hardware, systematically eliminating a physical qubit with a higher error rate at each step. The resulting probability of obtaining the correct outcome, plotted against the circuit width, is depicted in Figure 8(c). The utilization of dynamic quantum circuit compilation enables us to reduce qubit usage by up to 82% while also improving the probability of achieving accurate results by 8%. Note that this example is for illustrative purposes, and the advantages of dynamic circuit compilation are expected to be even more prominent when applied to larger-scale algorithms and quantum computers. Detailed information about the noisy simulation is available in Appendix E of the supplemental material.

### 9 Related Work

The work [PWD16] studied quantum circuit compilation by wire recycling. They constructed a causal graph to represent the temporal ordering of quantum circuit operations and analyzed the lifetimes of qubits to exploit the potential of recycling wires. They developed two heuristic algorithms based on graph search for wire recycling. However, the proposed method is limited to recycling wires between pre-defined ancilla qubits.

The work [DCKFF22] investigated quantum circuit compilation by leveraging the causal structure of the circuits. They formulated the task of minimizing the required number of qubits as a constraint programming and satisfiability (CP-SAT) model. This model incorporates a bunch of constraints and is primarily utilized to numerically benchmark their heuristic algorithms on small-scale problems. In contrast, our approach utilizes a graph manipulation framework that induces straightforward binary integer programming for optimal compilation. This framework has been effectively employed to establish the optimal compilation of numerous quantum circuits. Additionally, [DCKFF22] proposed greedy heuristic algorithms for approximate compilation. However, our comparative analysis highlights the superior performance of our methods across both structured and random quantum circuits. Notably, our framework successfully addresses an open challenge emphasized in their work, namely, the effective handling of quantum circuits with commutable structures and the ability to conduct compilation at the level of quantum algorithms, regardless of their specific quantum instruction sequences.

The work [HJC<sup>+</sup>23] explored the tradeoff between qubit reuse, fidelity, gate count, and circuit duration. They established two conditions for qubit-reuse and designed two versions of compiler-assisted tools with one prioritizing qubit-saving and the other emphasizing SWAP reduction and fidelity improvement. Their empirical demonstrations on quantum hardware showcased notable improvements in qubit usage and circuit fidelity for specific applications, primarily focused on superconducting quantum computers. Our approach, on the other hand, centers on minimizing the required number of qubits—a scenario well-motivated by trapped ion quantum systems. Additionally, we contribute a more adaptable framework capable of facile extensions to accommodate various optimization objectives. For instance, by fine-tuning the cost function within our greedy algorithm's scoring process, we can explore tradeoffs between circuit width, depth, and other related factors, encompassing their qubit-saving approach. Furthermore, our study introduces efficient techniques for assessing a quantum circuit's reducibility, serving as a preprocessing step to screen circuits before actual compilation. The optimal compilations identified in our work can serve as benchmarks for other variants of qubit-reuse compilation methods.

The work [BPK23] introduced a formal SAT-based model for qubit reuse optimization on near-term quantum devices. This model ensured provably optimal solutions concerning quantum circuit depth, number of qubits, or number of swap gates. However, their approach may encounter serious computational challenges and scalability issues as the number of qubits increases.

### 10 Conclusion and Future work

We have conducted a comprehensive investigation into the dynamic circuit compilation problem, introducing the first characterization of this task through graph manipulation and a precise mathematical model for optimal compilation. Our framework primarily targets qubit savings but is general enough to be adapted to other scenarios. The effectiveness of our approach was demonstrated through theoretical analyses of reducibility and optimal compilation for various renowned quantum circuits, as well as numerical evaluation of our heuristic algorithms on a wide range of benchmark circuits. It is worth noting that the dynamic circuit compilation explored in this work offers a complementary strategy to other circuit optimization techniques and can be seamlessly integrated with existing methods. As we approach the point of demonstrating quantum advantage in practical applications, our results shall serve as timely contributions for bridging the gap between theoretical quantum algorithms and their physical implementation on quantum computers with limited resources.

In addition to the practical utility, our work also establishes the connection between dynamic circuit compilation and graph theory. We believe there are many other techniques from graph theory that could be used to extend our framework and be applied in the area of quantum circuit

compilation and optimization. An in-depth discussion of related open problems can be found in Appendix F of the supplementary material.

# Acknowledgements

We would like to thank Jingtian Zhao for part of the code implementation in QNET. This work was done when M. Z. and R. S. were research interns at Baidu Research. Y. L. is supported by the National Nature Science Foundation of China (No. 62302346) and supported by "the Fundamental Research Funds for the Central Universities".

# References

- [AAB<sup>+</sup>19] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, et al. Quantum supremacy using a programmable superconducting processor. *Nature*, 574(7779):505–510, Oct 2019, 10.1038/s41586-019-1666-5.
- [AG04] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. Physical Review A, 70(5):052328, Nov 2004, 10.1103/PhysRevA.70.052328.
- [AI23] Google Quantum AI. Suppressing quantum errors by scaling a surface code logical qubit. *Nature*, 614(7949):676–681, Feb 2023, 10.1038/s41586-022-05434-1.
- [BBC<sup>+</sup>93] Charles H Bennett, Gilles Brassard, Claude Crépeau, Richard Jozsa, Asher Peres, and William K Wootters. Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. *Physical Review Letters*, 70(13):1895–1899, Mar 1993, 10.1103/PhysRevLett.70.1895.
- [BBD<sup>+</sup>09] Hans J. Briegel, Daniel E. Browne, Wolfgang Dür, Robert Raussendorf, and Maarten Van den Nest. Measurement-based quantum computation. *Nature Physics*, 5(1):19–26, Jan 2009, 10.1038/nphys1157.
- [BFK09] Anne Broadbent, Joseph Fitzsimons, and Elham Kashefi. Universal blind quantum computation. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pages 517–526. IEEE, Oct 2009. 10.1109/focs.2009.36.
- [BHT98] Gilles Brassard, Peter Høyer, and Alain Tapp. Quantum counting. In *Automata*, Languages and Programming: 25th International Colloquium, ICALP'98, pages 820–831, Berlin, Heidelberg, July 1998. Springer. 10.1007/BFb0055105.
- [BIS+18] Sergio Boixo, Sergei V Isakov, Vadim N Smelyanskiy, Ryan Babbush, Nan Ding, Zhang Jiang, Michael J Bremner, John M Martinis, and Hartmut Neven. Characterizing quantum supremacy in near-term devices. Nature Physics, 14(6):595-600, Apr 2018, 10.1038/s41567-018-0124-x.
- [BJG08] Jørgen Bang-Jensen and Gregory Z Gutin. Digraphs: theory, algorithms and applications. Springer Science & Business Media, 2008.
- [BPK23] Sebastian Brandhofer, Ilia Polian, and Kevin Krsulich. Optimal qubit reuse for nearterm quantum computers, 2023. arXiv:2308.00194.
- [BSHM21] Sergey Bravyi, Ruslan Shaydulin, Shaohan Hu, and Dmitri Maslov. Clifford circuit optimization with templates and symbolic pauli gates. *Quantum*, 5:580, Nov 2021, 10.22331/q-2021-11-16-580.
- [BV97] Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM Journal on Computing, 26(5):1411–1473, Oct 1997, 10.1137/S0097539796300921.
- [BWP+17] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. *Nature*, 549(7671):195–202, Sep 2017, 10.1038/nature23474.
- [con23] Qiskit contributors. Qiskit: An open-source framework for quantum computing, 2023, 10.5281/zenodo.2573505.

- [CP20] Aleksandar Cvetković and Vladimir Yu. Protasov. Maximal acyclic subgraphs and closest stable matrices. SIAM Journal on Matrix Analysis and Applications, 41(3):1167–1182, Aug 2020, 10.1137/19M1305422.
- [CSK08] Amlan Chakrabarti and Susmita Sur-Kolay. Designing quantum adder circuits and evaluating their error performance. In 2008 International Conference on Electronic Design, pages 1–6. IEEE, Dec 2008. 10.1109/ICED.2008.4786689.
- [CTI+21] A. D. Córcoles, Maika Takita, Ken Inoue, Scott Lekuch, Zlatko K. Minev, Jerry M. Chow, and Jay M. Gambetta. Exploiting dynamic quantum circuits in a quantum algorithm with superconducting qubits. *Physical Review Letters*, 127(10):100501, Aug 2021, 10.1103/PhysRevLett.127.100501.
- [DCKFF22] Matthew DeCross, Eli Chertkov, Megan Kohagen, and Michael Foss-Feig. Qubit-reuse compilation with mid-circuit measurement and reset, 2022. arXiv:2210.08039.
- [Deo16] Narsingh Deo. Graph theory with applications to engineering and computer science. Dover Publications, 2016.
- [DJ92] David Deutsch and Richard Jozsa. Rapid solution of problems by quantum computation. *Proceedings of the Royal Society of London. Series A: Mathematical and Physical Sciences*, 439(1907):553–558, Dec 1992, 10.1098/rspa.1992.0167.
- [FGG14] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014. arXiv:1411.4028.
- [Fit17] Joseph F. Fitzsimons. Private quantum computation: an introduction to blind quantum computing and related protocols. npj Quantum Information, 3(1):23, Jun 2017, 10.1038/s41534-017-0025-3.
- [FMMC12] Austin G. Fowler, Matteo Mariantoni, John M. Martinis, and Andrew N. Cleland. Surface codes: Towards practical large-scale quantum computation. *Physical Review A*, 86(3):032324, Sep 2012, 10.1103/PhysRevA.86.032324.
- [FZL<sup>+</sup>23] Kun Fang, Jingtian Zhao, Xiufan Li, Yifei Li, and Runyao Duan. Quantum NETwork: from theory to practice. *Science China Information Sciences*, 66(8):180509, Jul 2023, 10.1007/s11432-023-3773-4.
- [Goo20] Google AI Quantum and Collaborators. Hartree-Fock on a superconducting qubit quantum computer. *Science*, 369(6507):1084–1089, Aug 2020, 10.1126/science.abb9811.
- [Gro96] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of Computing, STOC '96, pages 212–219, New York, NY, USA, Jul 1996. Association for Computing Machinery. 10.1145/237814.237866.
- [HCJ<sup>+</sup>21] Fei Hua, Yanhao Chen, Yuwei Jin, Chi Zhang, Ari Hayes, Youtao Zhang, and Eddy Z. Zhang. AutoBraid: A Framework for Enabling Efficient Surface Code Communication in Quantum Computing. In *MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture*, MICRO '21, page 925–936, New York, NY, USA, 2021. Association for Computing Machinery. 10.1145/3466752.3480072.
- [HJC<sup>+</sup>23] Fei Hua, Yuwei Jin, Yanhao Chen, Suhas Vittal, Kevin Krsulich, Lev S. Bishop, John Lapeyre, Ali Javadi-Abhari, and Eddy Z. Zhang. CaQR: A Compiler-Assisted Approach for Qubit Reuse through Dynamic Circuit. In *Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3*, ASPLOS 2023, page 59–71, New York, NY, USA, 2023. Association for Computing Machinery. 10.1145/3582016.3582030.
- [HR94] Refael Hassin and Shlomi Rubinstein. Approximations for the maximum acyclic subgraph problem. Information Processing Letters, 51(3):133-140, Aug 1994, 10.1016/0020-0190(94)00086-7.
- [HSS08] Aric A. Hagberg, Daniel A. Schult, and Pieter J. Swart. Exploring network structure, dynamics, and function using NetworkX. In *Proceedings of the 7th Python in Science Conference (Scipy2008)*, pages 11 15, Pasadena, CA, USA, Aug 2008.

- [Kar72] Richard M. Karp. Reducibility among combinatorial problems. In Complexity of Computer Computations: Proceedings of a symposium on the Complexity of Computer Computations, pages 85–103, Boston, MA, Mar 1972. Springer US. 10.1007/978-1-4684-2001-2\_9.
- [KG01] Navin Khaneja and Steffen J. Glaser. Cartan decomposition of  $SU(2^n)$  and control of spin systems. Chemical Physics, 267(1):11–23, June 2001, https://doi.org/10.1016/S0301-0104(01)00318-4.
- [LQW+22] Yinan Li, Youming Qiao, Avi Wigderson, Yuval Wigderson, and Chuanqi Zhang. Connections between graphs and matrix spaces, 2022. arXiv:2206.04815.
- [LWG<sup>+</sup>10] B. P. Lanyon, J. D. Whitfield, G. G. Gillett, M. E. Goggin, M. P. Almeida, I. Kassal, J. D. Biamonte, M. Mohseni, B. J. Powell, M. Barbieri, A. Aspuru-Guzik, and A. G. White. Towards quantum chemistry on a quantum computer. *Nature Chemistry*, 2(2):106–111, Jan 2010, 10.1038/nchem.483.
- [MVM+23] A. Morvan, B. Villalonga, X. Mi, S. Mandrà, A. Bengtsson, P. V. Klimov, Z. Chen, S. Hong, C. Erickson, I. K. Drozdov, et al. Phase transition in random circuit sampling, 2023. arXiv:2304.11119.
- [PDF<sup>+</sup>21] J. M. Pino, J. M. Dreiling, C. Figgatt, J. P. Gaebler, S. A. Moses, M. S. Allman, C. H. Baldwin, M. Foss-Feig, D. Hayes, K. Mayer, C. Ryan-Anderson, and B. Neyenhuis. Demonstration of the trapped-ion quantum CCD computer architecture. *Nature*, 592(7853):209–213, Apr 2021, 10.1038/s41586-021-03318-4.
- [PMS<sup>+</sup>14] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O'brien. A variational eigenvalue solver on a photonic quantum processor. *Nature communications*, 5(1):4213, July 2014, 10.1038/ncomms5213.
- [PWD16] Alexandru Paler, Robert Wille, and Simon J. Devitt. Wire recycling for quantum circuit optimization. *Physical Review A*, 94(4):042337, Oct 2016, 10.1103/Phys-RevA.94.042337.
- [RB01] Robert Raussendorf and Hans J Briegel. A one-way quantum computer. *Physical Review Letters*, 86(22):5188, 2001, 10.1103/PhysRevLett.86.5188.
- [RN09] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.
- [SB09] Dan Shepherd and Michael J. Bremner. Temporally unstructured quantum computation. *Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences*, 465(2105):1413–1439, Feb 2009, 10.1098/rspa.2008.0443.
- [Sho94] Peter W. Shor. Algorithms for quantum computation: discrete logarithms and factoring. In *Proceedings 35th Annual Symposium on Foundations of Computer Science*, pages 124–134. IEEE, 1994. 10.1109/SFCS.1994.365700.
- [Sim97] Daniel R. Simon. On the power of quantum computation. SIAM Journal on Computing, 26(5):1474–1483, Oct 1997, 10.1137/S0097539796298637.
- [SJAG19] Sukin Sim, Peter D. Johnson, and Alán Aspuru-Guzik. Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms.

  Advanced Quantum Technologies, 2(12):1900070, Oct 2019, 10.1002/qute.201900070.
- [WBC<sup>+</sup>21] Yulin Wu, Wan-Su Bao, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Daojin Fan, et al. Strong quantum computational advantage using a superconducting quantum processor. *Physical Review Letters*, 127(18):180501, Oct 2021, 10.1103/PhysRevLett.127.180501.
- [WBD+19] K. Wright, K. M. Beck, S. Debnath, J. M. Amini, Y. Nam, N. Grzesiak, J.-S. Chen, N. C. Pisenti, M. Chmielewski, C. Collins, K. M. Hudek, J. Mizrahi, J. D. Wong-Campos, S. Allen, J. Apisdorf, P. Solomon, M. Williams, A. M. Ducore, A. Blinov, S. M. Kreikemeier, V. Chaplin, M. Keesan, C. Monroe, and J. Kim. Benchmarking an 11-qubit quantum computer. Nature Communications, 10(1):5464, Nov 2019, 10.1038/s41467-019-13534-2.

- [WLQ<sup>+</sup>21] Pengfei Wang, Chun-Yang Luan, Mu Qiao, Mark Um, Junhua Zhang, Ye Wang, Xiao Yuan, Mile Gu, Jingning Zhang, and Kihwan Kim. Single ion qubit with estimated coherence time exceeding one hour. *Nature Communications*, 12(1):233, Jan 2021, 10.1038/s41467-020-20330-w.
- [XLP+22] Mingkuan Xu, Zikun Li, Oded Padon, Sina Lin, Jessica Pointing, Auguste Hirth, Henry Ma, Jens Palsberg, Alex Aiken, Umut A. Acar, and Zhihao Jia. Quartz: Superoptimization of quantum circuits. In Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2022, page 625–640, New York, NY, USA, Jun 2022. Association for Computing Machinery. 10.1145/3519939.3523433.
- [XMP+23] Amanda Xu, Abtin Molavi, Lauren Pick, Swamit Tannu, and Aws Albarghouthi. Synthesizing quantum-circuit optimizers. In Proceedings of the ACM on Programming Languages, volume 7, page 835–859, New York, NY, USA, jun 2023. Association for Computing Machinery. 10.1145/3591254.
- [ZCC<sup>+</sup>22] Qingling Zhu, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Daojin Fan, Ming Gong, et al. Quantum computational advantage via 60-qubit 24-cycle random circuit sampling. Science Bulletin, 67(3):240–245, Feb 2022, 10.1016/j.scib.2021.10.017.
- [ZHQ<sup>+</sup>21] Chi Zhang, Ari B. Hayes, Longfei Qiu, Yuwei Jin, Yanhao Chen, and Eddy Z. Zhang. Time-optimal qubit mapping. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS '21, page 360–374, New York, NY, USA, 2021. Association for Computing Machinery. 10.1145/3445814.3446706.
- [ZPW18] Alwin Zulehner, Alexandru Paler, and Robert Wille. An efficient methodology for mapping quantum circuits to the IBM QX architectures. *IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD)*, 38(7):1226–1236, June 2018, 10.1109/TCAD.2018.2846658.

# Supplementary Material

This supplementary material provides a more detailed analysis and proof of the results in the main text. It also contains some further algorithms and their experimental results.

- Appendix A: Further algorithms
- Appendix B: Proofs of results
- Appendix C: Optimal compilations for important quantum circuits
- Appendix D: Implementation of DCKF algorithm
- Appendix E: Experimental results continued
- Appendix F: Open problems

# A Further Algorithms

In this section, we provide further algorithms to support our compilation framework.

### A.1 Converting static quantum circuit to DAG

Algorithm 6 below provides an efficient procedure for transforming a quantum circuit from its circuit instructions into the corresponding DAG representation. Particularly, to process circuits with commutable structure, each instruction is associated with a group tag, indicating to which group of gates it belongs. Operations within the same group are commutable, whereas a 'None' tag denotes a non-commutable gate. The algorithm traverses the static circuit instructions, generating vertices in the directed graph for each quantum instruction encountered. It then iterates over the qubits involved in the instruction, examining the preceding operation on each qubit. In cases where the preceding operation is non-commutable, a directed edge is established from the previous operation to the current one. However, if the previous operation is commutable, it needs to check whether the previous operation and the current operation belong to the same commutable group. If they do, it traverses the *CasualList* of this qubit in reversed order and identifies the previous commutable group. If this group is 'None', a directed edge is created from the first non-commutable operation to the current one. Conversely, if a previous commutable group is identified, the algorithm connects all operations on this qubit belonging to the previous commutable group to the current one.

The time complexity of this algorithm is analyzed as follows. Assume that the input static circuit has n qubits and m operations. The outermost 'foreach' loops over the static circuit instruction, which takes O(m) times. In cases where the previous operation on a qubit is not commutable, we only add an edge to the graph, which can be done in O(1) times. The primary complexity arises when the previous operation and the current operation belong to the same commutable group, where we need to traverse the CasualList of this qubit in reversed order, which takes O(m/n) times in the worst-case (m/n) indicates the average number of operations on a qubit). Then adding edges from operations in the previous commutable group to the current operation also takes at most O(m/n) times. Therefore, the overall time complexity of this algorithm can be calculated as  $O(m \times (m/n + m/n)) = O(m^2/n)$ .

It's important to recognize that imposing dependencies between commutable operations may limit the opportunities for qubit-reuse. As an example, consider the quantum circuit in Figure 9a, where all the CZ gates are commutable. As depicted by Figure 9b, ignoring the commutability will impose some unnecessary dependencies (edges marked in red), ultimately limiting the potential for qubit-reuse. The DAG with flexible dependencies is shown in Figure 9c, where two qubits can be reused and the circuit can be reduced to 2 qubits.

```
Algorithm 6: Converting static quantum circuit to DAG
   Input:
    StaticCircuit
                         a static quantum circuit instructions
   Output:
     Digraph
                         a DAG representation of the static quantum circuit
 1 Let n be the width of quantum circuit;
 2 Initialize an empty directed graph Digraph;
 3 Initialize a list CausalLists of length n, with each element initialized as an empty list;
   foreach Instruction in StaticCircuit do
       Add a vertex Vertex labeled as ID of Instruction to Digraph;
 5
       Identify the group tag of Instruction as Group;
 6
       foreach q in QUBIT value of instruction do
 7
           Find the last entry of CausalLists[q] and record it as PreVertex;
 8
          if PreVertex exits then
 9
              Identify the group tag of StaticCircuit[PreVertex] as PreGroup;
10
              if PreGroup is None then
11
                  Add a directed edge from PreVertex to Vertex to Digraph;
12
              else
13
                  if Group is equal to PreGroup then
14
                      foreach V in reversed(CausalLists[q]) do
15
                         Identify the the group tag of StaticCircuit[V] as CurrentGroup;
16
                         if CurrentGroup is not equal to Group then
17
                             Set PreGroup to CurrentGroup;
18
                             if PreGroup is None then
19
                                 Add a directed edge from V to Vertex to Digraph;
20
                                 End else;
21
                             Break foreach;
22
                  Identify all vertices in CausalLists[q] belonging to PreGroup;
23
                  Add directed edges from all identified vertices to Vertex to Digraph;
24
           Append ID of Instruction to the end of list CausalLists[q];
25
26 return Digraph
```



Figure 9: (Color online) An illustration of compiling quantum circuits with commutable structure.

### A.2 Converting modified DAG to dynamic quantum circuit

After getting a modified graph, Algorithm 7 can efficiently convert it to a dynamic quantum circuit.

The time complexity of Algorithm 7 is analyzed as follows. Assume that the input static circuit has n qubits and m operations. Initially, the modified DAG is topologically sorted using the Depth-First Search (DFS) algorithm. Denoting the number of vertices and edges in the modified DAG as |V| and |E|, it is clear that |V| = m. As each operation introduces at most two edges to the graph, |E| scales linearly with m, resulting in |E| = O(m). Consequently, the time complexity of topological sorting with DFS is O(|V| + |E|) = O(m) [BJG08]. Following this, a traversal of the

#### Algorithm 7: Converting modified DAG to dynamic quantum circuit

```
Input:
     StaticCircuit
                         the instruction list of a static quantum circuit
     ModifiedGraph
                         a modified DAG
                         a list of added edges
     AddedEdges
   Output:
                         the compiled dynamic quantum circuit instructions
     DynamicCircuit
 1 Initialize empty lists TopologicalOrder and DynamicCircuit;
 2 Apply topological sorting to ModifiedGraph, record the result in TopologicalOrder;
 3 Rearrange the order of instructions in Static Circuit according to the order of vertices in
     TopologicalOrder, record the result in DynamicCircuit;
 4 foreach Edge in AddedEdges do
       Let v_i and v_j be the head and tail vertices of Edge, respectively;
       Identify the circuit instructions corresponding to v_i and v_j, record the QUBIT values of these
        two instructions as q_i and q_j, respectively;
       foreach Instruction in DynamicCircuit do
           foreach Qubit in QUBIT value of Instruction do
              if Qubit is equal to q_i then update Qubit to q_j;
10 return DynamicCircuit
```

topological order with m vertices is performed to reorder the static circuit instructions. Note that the ID of an instruction and the label of the corresponding vertex are the same. Therefore, for each vertex label i, the corresponding instruction is accessed through StaticCircuit[i] and appended to DynamicCircuit, which allows the rearrangement to be completed in O(m) time. Subsequently, for each added edge, we need to traverse the DynamicCircuit list and update the qubit indices. For a static circuit with n qubits, at most n-1 edges can be added to the DAG, therefore the updating step exhibits a time complexity of O(mn). Consequently, the overall time complexity of Algorithm 7 is computed as O(m) + O(m) + O(mn) = O(mn).

### A.3 Hybrid Algorithm in details

Given that the search space is the Cartesian product of candidate roots sets of all terminals in  $T_E$ , the algorithm initiates by identifying L terminals with the least number of candidate roots, which significantly contracts the search space. Note that for each  $t \in T_E$ , we add a  $\bot$  to the set of candidate roots to represent the situation where t connects no root. Due to the constraint that each terminal can be connected to only one root, solutions with repeated items (except  $\bot$ ) in the calculated search space are rendered unfeasible and are consequently eliminated. Subsequently, for each feasible solution, we check whether the DAG with these additional edges is acyclic. Once acyclicity is verified, Algorithm 1 is employed on the modified DAG to update the candidate matrix for the remaining terminals and roots, followed by Algorithm 4 to identify the further added edges. Finally, the solution featuring the highest number of added edges is selected to compile the input static circuit. The complete algorithm is as follows.

```
Algorithm 8: Hybrid algorithm with hierarchy level L
   Input:
    L
                            the hierarchy level
    StaticCircuit
                            the instruction list of a static quantum circuit to compile
   Output:
    DynamicCircuit
                            the instruction list of the compiled dynamic quantum circuit
 1 Run Algorithm 6 to get the DAG representation Digraph of the circuit;
   Let Roots and Terminals be the set of roots and terminals, respectively;
   Run Algorithm 3 to get the biadjacency matrix B of the simplified DAG of the circuit;
   Calculate the candidate matrix CandidateMatrix as \neg (B^\top);
 5 Initialize empty lists MaxEdges, CandidatesNum, IdxTerminals, IdxRoots and SearchSpace;
 6 Let CandidatesNum be the sum of each row in CandidateMatrix;
 7 Let IdxTerminals be indices of the first L smallest non-zero values in CandidatesNum;
   foreach Index in IdxTerminals do
       Let R be indices of non-zero entries in the Index-th row of CandidateMatrix;
       Add an element \perp to R, and append R to IdxRoots;
10
11 Calculate the Cartesian product of sets in IdxRoots and record it in SearchSpace;
12 Delete any elements with repeated items (except \perp) from SearchSpace;
   foreach Solution in SearchSpace do
       Initialize an empty list AddedEdges;
14
       for i = 0 to L - 1 do
15
          if Solution[i] is not \bot then
16
             Add edge (Terminals[IdxTerminal[i]], Roots[Solution[i]]) to AddedEdges;
17
       if Digraph with AddedEdges is acyclic then
18
          Run Algorithm 1 with Digraph and AddedEdges to get a updated simplified DAG and
19
            then the updated candidate matrix CanSubMatrix;
          Run Algorithm 4 with CanSubMatrix to get the added edges Edges;
20
           Append Edges to AddedEdges;
21
          if the length of AddedEdges is larger than the length of MaxEdges then
22
              Set MaxEdges to AddedEdges;
23
24 Add MaxEdges to Digraph and get the ModifiedGraph;
```

**Proposition 12** For a static quantum circuit with n qubits and m operations, Algorithm 8 with hierarchy level L has a worst-case time complexity of  $O(n^L m^2 (n-L)^2)$ .

25 Run Algorithm 7 with the ModifiedGraph to get the compiled circuit DynamicCircuit;

26 return DynamciCircuit

**Proof** The primary complexity of this algorithm arises from the enumeration process (the second 'foreach' loop). In the worst-case, each terminal  $t \in T_E$  has (n-1) candidate roots, leading to a search space of size at most  $(n-1)^L$ . Within the search space, each solution undergoes a topological sorting to check whether DAG with additional edges is acyclic. This operation leverages the DFS algorithm, which carries a time complexity of O(m). Following the topological sorting step, Algorithm 1 is executed on the modified DAG to update both the simplified DAG and the candidate matrix of the remaining (n-L) terminals and roots, which demands  $O(m(n-L)^2)$  time as outlined in Proposition 3. Subsequently, the MRV heuristic algorithm is employed to identify edges that can be added between the remaining terminals and roots. Proposition 10 indicates that the MRV algorithm operating on a  $(n-L) \times (n-L)$  candidate matrix exhibits a worst-case time complexity of  $O(m(n-L)+(n-L)^3)$ . Consequently, the total time complexity of the hybrid algorithm is given by:  $O(n^L m^2(n-L)^2) + O(n^L m^2(n-L)) + O(n^L m(n-L)^3)$ , where the dominant factor is  $O(n^L m^2(n-L)^2)$  since m is typically larger than n.

### B Proofs of results

### B.1 Thm. 1

**Proof** Dynamic quantum circuit compilation through qubit-reuse seeks to delay certain reset operations until after the measurements, which is illustrated by the addition of a directed edge from a terminal to a root in the DAG representation. However, when incorporating new edges, it is imperative to adhere to the following constraints.

- 1. Resetting a qubit is possible only when all operations on it have been carried out. So the added edge should start from terminals. Moreover, the circuit width is determined by the number of roots in the DAG representation. To reduce the circuit width, the added edges should end at roots. Overall, the directed edges should be added from terminals to roots. This corresponds to the first condition in the asserted result.
- 2. A reused qubit can only accommodate one reset operation. So the added edges should have no common tails. Moreover, since the circuit width is determined by the number of roots in the DAG representation, the added edges having common heads will not help. So we can restrict our attention to the case that the added edges share no common heads. This corresponds to the second condition in the asserted result.
- 3. Since directed edges represent the execution order of operations, the presence of any cycle in the graph indicates a dependency of past operations on future ones, which violates the causal relation of the quantum circuit. Therefore, the addition of these directed edges must not introduce any cycles, indicating the compiled circuit is still well-defined. This corresponds to the third condition in the asserted result.

Conversely, for any graph manipulation that complies with the specified conditions, we can demonstrate that it corresponds to a dynamic circuit compilation approach. Given that the DAG representation preserves the execution order of quantum operations within the corresponding quantum circuit, employing topological sorting on the modified DAG enables us to establish a viable execution sequence. This sequence is then utilized to rearrange the circuit instruction list. Since our actions solely involve rewiring the original quantum circuit, the resulting compiled circuit maintains its equivalence to the static circuit. Ultimately, it's worth noting that all the conditions mentioned remain independent of whether we are working with a DAG or a simplified DAG representation.

### B.2 Prop. 2

**Proof** If the simplified DAG G = (R, T, E) is a complete bipartite graph, then any terminal is reachable from any root. Within such a graph, the inclusion of any additional edge from a terminal to a root necessarily introduces a directed cycle, thus violating the conditions. Therefore, the circuit width can not be reduced through qubit reuse. Conversely, if the simplified DAG is not complete, it indicates the absence of connections between certain roots and terminals. More specifically, there exists at least one terminal t that is not reachable from a root t. In this case, the directed edge (t, r) can be added to the graph, which does not introduce any cycles. Thereby, the corresponding circuit can be reduced to a circuit with a smaller width.

### B.3 Prop. 3

**Proof** Note that this approach combines three parts. First, we employ Algorithm 6 to derive the DAG representation of the given static circuit, which exhibits a time complexity of O(m). Then, we convert the DAG to a simplified DAG by using DFS. For each root-terminal pair (r,t) in the DAG, DFS algorithm is applied to explore the existence of a path from r to t. Let |R|, |T|, |V|, and |E| denote the number of roots, terminals, vertices, and edges in the DAG, respectively. It is clear that |R| = |T| = n, and |V| = m, |E| = O(m) (each instruction introduces at most two edges to DAG). Since running DFS once in such a graph takes O(|V| + |E|) = O(m) steps, and

29

the number of trials we need is  $|R| \times |T| = n^2$ . So this part takes at most  $O(mn^2)$  steps. Finally, after obtaining the reachability relation between all roots and terminals, we get the biadjacency matrix of the simplified DAG, which can be used to conclude the reducibility within  $n^2$  steps. Therefore, the overall time complexity required to determine the reducibility of a given static circuit is  $O(m) + O(mn^2) + O(n^2)$ , with the dominant term being  $O(mn^2)$ .

### B.4 Prop. 5

**Proof** Note that the sequence of qubit relations is equivalent to a subset of instructions involving interlacing qubits. This equivalence can further be represented as a directed path in the DAG representation, connecting the i-th root to the j-th terminal. Thus, any two qubits in the quantum circuit are mutually reachable if and only if a directed path exists in the DAG from any root to any terminal. This condition is equivalent to stating that the simplified DAG is complete. Finally, the proof is concluded by referring to Proposition 2.

### B.5 Prop. 6

**Proof** To determine the reducibility of a static circuit, it is necessary to analyze the reachability relation for each qubit. These reachability relations are stored in a list named the *ReachableSets*, which has a length of n, representing the circuit's width, with the i-th entry containing the reachable set  $Q_i$ . This initial process involves O(n) steps. Subsequently, a 'foreach' loop is executed over the 'StaticCircuit' list, which comprises m elements. Throughout this loop, all operations, except the union set calculation, exhibit a constant time complexity of O(1) as they are independent of the input size. However, the computational complexity for uniting two sets,  $S_1$  and  $S_2$ , amounts to  $O(|S_1| + |S_2|)$ . It is worth noting that the reachable set of a qubit contains a maximum of n elements, thus the union calculation is bounded by O(n). Consequently, the overall time complexity of the 'foreach' loop is O(mn). In the final step, which determines reducibility, another 'foreach' loop is executed, iterating over the *ReachableSets*, incurring an additional O(n) steps. In summary, the time complexity of Algorithm 2 is O(mn) + O(n) + O(n), with the dominant factor remaining O(mn).

### B.6 Prop. 7

**Proof** Let  $G_m$  be the simplified DAG of circuit  $\mathcal{C}_m$ . Then the (i,j) entry of  $B(\mathcal{C}_m)$  equals to one if and only if there is an edge from root  $r_i^m$  to terminal  $t_j^m$  in graph  $G_m$ . To study the simplified DAG of the composite circuit  $\mathcal{C}_1 \circ \mathcal{C}_2$ , we can connect all terminals  $t_k^1$  in graph  $G_1$  with their corresponding roots  $r_k^2$  in graph  $G_2$ . Therefore, a path from  $r_i^1$  to  $t_j^2$  exists if and only if there is an edge from  $r_i^1$  to  $t_k^1$  in graph  $G_1$ , followed by an edge from  $r_k^2$  to  $t_j^2$  in graph  $G_2$  for some intermediate k. This condition can be represented as  $\vee_{k=1}^n (B(\mathcal{C}_1)_{ik} \wedge B(\mathcal{C}_2)_{kj})$ , which is exactly the (i,j) entry of the Boolean matrix product  $B(\mathcal{C}_1) \odot B(\mathcal{C}_2)$ .

**Remark 9** Consider a quantum circuit  $\mathcal{C}$  comprising m two-qubit gates. Denote  $B_i$  as the biadjacency matrix of the subcircuits, each containing only a single two-qubit gate. Then the biadjacency matrix of  $\mathcal{C}$  is given by  $B = B_1 \cdots B_m$ . Since  $B_i^{\top} = B_i$ , we get

$$B = (B_m B_{m-1} \cdots B_1)^T. \tag{3}$$

This corresponds to the concept of a dual circuit as presented in [DCKFF22]. In this context, we reverse the sequence of gates and exchange the roles of state preparation and measurement. The equality mentioned above implies that the dual circuit shares the same biadjacency matrix as the original circuit. Therefore, the compilation strategies for both circuits can be the same.

### B.7 Prop. 8

**Proof** First, a 'foreach' loop is executed over the StaticCircuit list, encompassing m elements. Throughout this loop, all operations, except for the calculation of the entrywise OR, exhibit a

constant time complexity of O(1), as they are independent of input size. Since each column of B contains n elements, the entrywise OR takes O(n) steps. Consequently, the cumulative time complexity of the 'foreach' loop equates to O(mn). The last part that determines if a Boolean matrix is all-one matrix takes  $O(n^2)$  steps. In summary, the overall time complexity is  $O(mn) + O(n^2)$ , with the dominant factor remaining O(mn) (typically, m is larger than n).

### B.8 Prop. 13

The following result explores the connection between the reducibility of a quantum circuit and its subcircuits.

**Proposition 13** A static quantum circuit is irreducible if it contains an irreducible subcircuit. The reverse direction is not true. That is, there exists an irreducible static quantum circuit such that any of its subcircuit is reducible.



Figure 10: (Color online) An irreducible static quantum circuit such that any of its subcircuit is reducible.

**Proof** Note that any quantum circuit  $\mathcal{C}$  can be seen as a composition of a sequence of subcircuits with a single quantum gate  $\mathcal{C} = \mathcal{C}_1 \circ \cdots \circ \mathcal{C}_M$ . Let  $B_i$  be the biadjacency matrix of the simplified DAG of  $\mathcal{C}_i$ . The circuit  $\mathcal{C}$  has an irreducible subcircuit implies that there exists  $\mathcal{C}_{i_1} \circ \cdots \circ \mathcal{C}_{i_m}$  such that  $\bigcirc_{j=1}^m B_{i_j} = J$ . Then this implies  $\bigcirc_{i=1}^M B_i \geq \bigcirc_{j=1}^m B_{i_j} = J$ , where the inequality signifies that each entry of the left matrix is greater than or equal to the corresponding entry in the right matrix. This proves the first statement. The quantum circuit in Figure 10 is an irreducible circuit with 4 qubits and 5 CNOTs. However, the removal of any CNOT in this circuit will result in a reducible circuit.

### B.9 Prop. 9

**Proof** Let G = (R, T, E) be the simplified DAG of the circuit and  $R = \{r_0, ..., r_{n-1}\}$ ,  $T = \{t_0, ..., t_{n-1}\}$ . Then the adjacency matrix of the directed graph G, denoted as A(G), is a  $2n \times 2n$  block anti-diagonal matrix, where the first n rows/columns correspond to the n roots, and the last n rows/columns correspond to the n terminals. Since all edges in the original bipartite graph are pointing from roots to terminals, only the  $n \times n$  submatrix in the upper right corner has non-zero elements. Therefore, the adjacency matrix can be written in the form:

$$A(G) = \begin{pmatrix} O_n & B \\ O_n & O_n \end{pmatrix} \tag{4}$$

where B is the biadjacency matrix of the bipartite graph.

Note that the absence of directed edge from  $r_i$  to  $t_j$  in the original bipartite graph G indicates a candidate edge from  $t_j$  to  $r_i$  in the modified DAG. All these candidate edges are pointing from terminals to roots, therefore can be represented by the  $n \times n$  submatrix  $\bar{B}$  in the lower left corner of the adjacency matrix A(G). The submatrix  $\bar{B}$ , can be obtained by:

$$\bar{B} = \bar{\ } (B^{\top}) \tag{5}$$

which is the logical NOT of the transpose of matrix B.

We use a  $n \times n$  Boolean matrix F to represent all added edges, where the entry  $F_{ij} = 1$  if a directed edge from  $t_i$  to  $r_j$  is added to the graph, and 0 otherwise. Suppose that G' is the modified DAG, then the adjacency matrix of G' can be written as:

$$A(G') = \begin{pmatrix} O_n & B \\ F & O_n \end{pmatrix} \tag{6}$$

The constraints imposed in Theorem 1 when adding edges to G can be translated into the following constraints on matrix F:

1. The objective of maximizing the number of added edges is equivalent to maximizing the sum of all elements in matrix F:

$$\max \sum_{i,j=0}^{n-1} F_{ij} \tag{7}$$

2. All added edges should be selected from candidate edges, which means elements of matrix F should be selected from elements of matrix  $\bar{B}$  and can be expressed as:

$$F_{ij} \le \bar{B}_{ij}, \forall i, j \in \{0, 1, ..., n-1\}$$
 (8)

3. The added edges must not share common vertices, which implies the sum of each row/column of F can not exceed one:

$$\sum_{j=1}^{n} F_{ij} \le 1, \, \forall i \in \{0, 1, ..., n-1\}$$
(9)

$$\sum_{i=1}^{n} F_{ij} \le 1, \, \forall j \in \{0, 1, ..., n-1\}.$$

$$(10)$$

4. After incorporating these edges, the graph G' should remain acyclic. According to [Deo16, Theorem 9-17], this requirement is equivalent to the adjacency matrix of G' being nilpotent.

Finally, since the optimal value  $\alpha$  gives the maximum number of terminal-root pairs in the modified DAG, the remaining number of roots is  $n - \alpha$ , which is the width of the compiled circuit. This completes the proof.

### B.10 Prop. 10

The MRV heuristic algorithm comprises three key steps. Initially, Algorithm 6 and Algorithm 3 are applied to derive the DAG and the biadjacency matrix of the simplified DAG of the input static circuit and calculate the candidate matrix, which exhibits a time complexity of  $O(m) + O(mn) + O(n^2) = O(mn)$ . Subsequently, our MRV heuristic is applied on the candidate matrix of size  $n \times n$  to explore a feasible assignment. Since the maximum number of added edges is n-1, the 'while' loop iterates at most n-1 times. Within the loop, several operations are adapted to identify a candidate edge, where the row summation of the candidate matrix takes  $O(n^2)$  time, the candidate terminal identification takes O(n) times, while the candidate root identification involves traversing all candidate roots and summing of the corresponding columns, resulting in a worst-case time complexity of  $O(n^2)$ . Following this, the candidate matrix needs to be updated. Given that sets R and T contain a maximum of n-1 elements, this encompasses the update of at most  $(n-1)^2$  entries, which can be completed in  $O(n^2)$  time. Consequently, the time complexity of the entire 'while' loop is  $O(n^3) + O(n^2) + O(n^3) + O(n^3) = O(n^3)$ . Upon obtaining a list of added edges, Algorithm 7 is employed to compile the input static circuit, which requires O(mn)time. In summary, the total time complexity of the MRV heuristic algorithm is calculated as:  $O(mn) + O(n^3) + O(mn) = O(mn + n^3).$ 

### B.11 Prop. 11

**Proof** Similar to the MRV heuristic algorithm, both the initial step to get the candidate matrix and the final step to compile the input static circuit has a time complexity of O(mn), while the primary 'while' loop iterates at most n-1 times. However, the main difference between the greedy heuristic and the MRV heuristic lies in the candidate edge identification process. In the worst-case scenario, where the biadjacency matrix is an identity matrix and the candidate matrix has  $(n^2-n)$  non-zero entries. Each of these entries triggers the update of the candidate matrix and the summation of a matrix's elements, which both exhibit a time complexity of  $O(n^2)$ . As a result, the total time complexity of the scoring step becomes  $O(n^4)$ . Subsequently, both the identification of the entry with maximum scoring and the corresponding candidate matrix update can be completed within  $O(n^2)$  time. In summary, the total time complexity of the greedy heuristic algorithm 5 is evaluated as  $O(mn) + O(n^5) + O(n^3)$ , with the dominating term being  $O(mn) + O(n^5)$ .

# C Optimal compilations for important quantum circuits

### C.1 Frequently used quantum algorithms

### C.1.1 Deutsch-Jozsa algorithm

The Deutsch-Jozsa algorithm [DJ92] was the pioneering example of a quantum algorithm that outperforms classical counterparts. It exemplifies the advantages of using quantum computing for specific problems. The algorithm aims to determine whether a Boolean function  $f:\{0,1\}^n \to \{0,1\}$  is balanced or constant. This algorithm is represented by the following (n+1)-qubit quantum circuit. The initial state has the first n qubits set to  $|0\rangle$ , and the last qubit initialized as  $|1\rangle = X|0\rangle$ . A Hadamard gate is subsequently applied to each qubit. Following this, a quantum oracle  $U_f$  maps  $|x\rangle|y\rangle$  to  $|x\rangle|y\oplus f(x)\rangle$ . Finally, Hadamard gates are reapplied to the first n qubits, and all qubits are measured in the computational basis.



Figure 11: (Color online) The Deutsch-Jozsa algorithm for determing an n-bit Boolean function.

If the Boolean function f is constant, the quantum oracle  $U_f$  can be implemented using only single-qubit gates, making it trivially reducible to a quantum circuit with 1 qubit. In the case where the Boolean function f is balanced, multiple quantum circuit implementations are possible. For instance, the quantum oracle  $U_f$  can be realized using a quantum circuit depicted in Figure 12, that is, regardless of the single-qubit gates, applying a CNOT gate for each of the first n qubits, with the (n+1)-th qubit as the target.



Figure 12: A quantum circuit implementation of a balanced oracle for determining a 3-bit Boolean function. The empty box represents a single-qubit gate.

The following result gives an optimal compilation of the Deutsch-Jozsa algorithm.

**Proposition 14 (Deutsch-Jozsa)** The quantum circuit of the Deutsch-Jozsa algorithm for determining an n-bit Boolean function f contains n+1 qubits  $(n \geq 2)$ . This circuit is always reducible. If f is constant, the quantum circuit can be reduced to a dynamic quantum circuit with 1 qubit. Otherwise, it can be reduced to a dynamic quantum circuit with 2 qubits. In this case, the corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{n-2} E_{i,i+1}^{n+1}$ .

**Proof** If the function is constant, the quantum circuit only contains single-qubit gate. So it can be trivially reduced to a quantum circuit with 1. Now we prove the other case. For the quantum circuit implementation in Figure 12, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{i=0}^{n} \sum_{j=i}^{n} E_{i,j}^{n+1} + \sum_{i=0}^{n-1} E_{n,i}^{n+1}.$$
 (11)

So the candidate matrix is

$$C = \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} E_{i,j}^{n+1}.$$
 (12)

For the optimal compilation, we need to select the maximum number of 1 elements in such a matrix under the conditions stated in Proposition 9. Without the nilpotent condition, the maximum number of 1 elements that simultaneously satisfies conditions (1b), (1c) and (1d) is n-1. Let us choose

$$F = \sum_{i=0}^{n-2} E_{i,i+1}^{n+1},\tag{13}$$

with the total sum of n-1. Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it an optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has a circuit width 2.

#### C.1.2 Bernstein-Vazirani algorithm

The Bernstein-Vazirani algorithm [BV97] can be considered an extension of the Deutsch-Jozsa algorithm, demonstrating the advantages of using a quantum computer as a computational tool for more complex problems than the Deutsch-Jozsa problem. Instead of distinguishing between two different classes of functions, it is designed to recover a string encoded within a function. Specifically, when provided with an oracle implementing a function  $f: \{0,1\}^n \to \{0,1\}$  in which f(x) is promised to be the dot product between x and a secret string  $s \in \{0,1\}^n$  modulo 2, the objective is to determine the value of s. The Bernstein-Vazirani algorithm employs the same (n+1)-qubit quantum circuit as depicted in Figure 11. Furthermore, the quantum oracle can be realized by applying a CNOT gate to the corresponding qubit with the last qubit as the target for each bit in the string s that equals one. For example, if the bit string is s = 101, the quantum oracle is implemented as



Figure 13: A quantum circuit implementation of an oracle for  $f(x) = (x_0 + x_2) \mod 2$ .

The following result gives an optimal compilation of the Bernstein-Vazirani algorithm.

**Proposition 15 (Bernstein-Vazirani)** The quantum circuit of the Bernstein-Vazirani algorithm for determining an n-bit Boolean function f contains n+1 qubits  $(n \ge 2)$ . This circuit is always

reducible. If f is constant, the quantum circuit can be reduced to a dynamic quantum circuit with 1 qubit. Otherwise, it can be reduced to a dynamic quantum circuit with 2 qubits. In this case, the corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{n-2} E_{i,i+1}^{n+1}$ .

**Proof** If the secret string is all zero, then the function is constant and the quantum oracle contains no multi-qubit gates. This makes the circuit trivially reducible to one qubit. Otherwise, the secret string contains at least one non-zero element. So the quantum oracle has at least one CNOT gate. Without loss of generality, we can consider the case in which the secret string is all one. This is because the quantum circuit for all the other cases is a subcircuit of this one. In this case, the quantum circuit structure is the same as the Deutsch-Jozsa algorithm when omitting the single-qubit gates. So the rest of proof follows exactly the same as the proof of Proposition 14.

#### C.1.3 Simon's algorithm

Simon's algorithm [Sim97] marked a significant milestone as the first quantum algorithm to exhibit an exponential speed-up when compared to the best classical algorithm for a specific problem. This breakthrough served as a foundational inspiration for a family of quantum algorithms centered around the quantum Fourier transform, including the renowned Shor's factoring algorithm. In the problem Simon's algorithm addresses, we are provided with an oracle that is guaranteed to have either a one-to-one mapping (it maps a unique output to every input) or a two-to-one mapping (it maps two different inputs to a single unique output). The nature of this two-to-one mapping is determined by a secret bitstring s, where the function f(x) equals f(y) if and only if  $y = x \oplus s$ . The primary objectives are twofold: first, to decide whether the function f is one-to-one or two-to-one; and second, in the event it is determined to be two-to-one, to unveil the secret bitstring s. The quantum circuit of Simon's algorithm is given by



Figure 14: (Color online) The Simon's algorithm for determining an n-bit Boolean function.

There are many possible ways of implementing a desired two-to-one function. Here we can consider a specific choice that

$$f(x) = \begin{cases} x, & x_j = 0, \\ x \oplus s, & x_j = 1, \end{cases}$$
 (14)

where  $x_j$  is the j-th bit of x and j is the smallest index where the bit of s equals to one. For example, if s = 011, then j = 1. In this case, the quantum oracle  $U_f$  can be implemented by first performing a CNOT gate between i-th and n+i-th qubits for every  $i \in \{0, 1, \dots, n-1\}$  and then performing a CNOT gate on the j-th qubit and the n+k-th qubit whenever the k-th bit of s equals to one. For instance, given secret string s = 011, the quantum oracle  $U_f$  can be implemented by



Figure 15: A quantum circuit implementation of an oracle for secret string s=011.

**Proposition 16 (Simon)** The quantum circuit of the Simon's algorithm for determining an n-bit function f contains 2n qubits  $(n \geq 2)$ . This circuit is always reducible. It can be reduced to a dynamic quantum circuit with 3 qubits. In this case, the corresponding optimal solution for optimization (1) can be taken at  $E_{0,0}^2 \otimes \sum_{i=1}^{n-2} E_{i,i+1}^n + E_{1,1}^2 \otimes \sum_{i=0}^{n-2} E_{i,i+1}^n$ .

**Proof** Note that any quantum circuit is a subcircuit of the one given by a secret string with all-one values. So we can restrict our consideration in the later case, where a CNOT is applied between *i*-th and i+n-th qubits for every  $i \in \{0,1,\cdots,n-1\}$  and a CNOT is applied between 0-th and i+n-th qubits for every  $i \in \{0,1,\cdots,n-1\}$ . In this case, the biadjacency matrix of the simplified DAG is given by

$$B = (E_{0,0}^2 + E_{1,0}^2) \otimes \left(I_n + \sum_{i=1}^{n-1} E_{i,0}^n\right) + (E_{0,1}^2 + E_{1,1}^2) \otimes \sum_{i=0}^{n-1} \sum_{j=i}^{n-1} E_{i,j}^n.$$
 (15)

So the candidate matrix is

$$C = (E_{0,0}^2 + E_{0,1}^2) \otimes \left(\sum_{i=1}^{n-2} \sum_{j=i+1}^{n-1} E_{i,j}^n + \sum_{i=1}^{n-1} \sum_{j=0}^{i-1} E_{i,j}^n\right) + (E_{1,0}^2 + E_{1,1}^2) \otimes \sum_{i=0}^{n-2} \sum_{j=i+1}^{n-1} E_{i,j}^n.$$
(16)

Without the nilpotent condition, we can choose

$$F = E_{0,0}^2 \otimes \sum_{i=1}^{n-2} E_{i,i+1}^n + E_{1,1}^2 \otimes \sum_{i=0}^{n-2} E_{i,i+1}^n,$$
(17)

with the total sum of 2n-3. Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it a feasible solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width 3. Note that if we take n=2, the quantum circuit is a subcircuit for any quantum circuit with larger n. In the case of n=2, we can easily enumerate all possible compilation schemes and conclude that it requires at least 3 qubits in the compiled quantum circuit. This implies that any larger circuit will also require at least 3 qubits and concludes the optimality of our feasible solution in (17).

#### C.1.4 Quantum Fourier transform

The Fourier transform is a fundamental concept in classical computing with various applications, including signal processing, data compression, and complexity theory. In quantum computing, the quantum Fourier transform (QFT) serves as the quantum analog of the discrete Fourier transform and operates on the amplitudes of a quantum wavefunction. The QFT plays a pivotal role in numerous quantum algorithms, with its most notable appearances in Shor's factoring algorithm and quantum phase estimation. The quantum Fourier transform acts on a quantum state  $|x\rangle = \sum_{j=0}^{2^n-1} x_j |j\rangle$  and maps it to another quantum state  $|y\rangle = \sum_{k=0}^{2^n-1} y_k |k\rangle$  where  $y_k = \frac{1}{\sqrt{2^n}} \sum_{j=0}^{2^n-1} x_j e^{2\pi i j k/2^n}$ . Let  $R_k = |0\rangle\langle 0| + e^{2\pi i/2^k} |1\rangle\langle 1|$ . Then the quantum Fourier transform can be implemented by the following quantum circuit. For every  $j \in \{0,1,\cdots,n-1\}$ , apply a Hadamard gate on the j-th qubit and then apply a controlled- $R_k$  gate between the j-th qubit and the (j+k-1)-th qubit for every  $k \in \{2,\cdots,n-j\}$ . Finally, apply SWAP gates to reverse the order of the qubits. For instance, the quantum circuit for n=4 is illustrated below.

Since there is a two-qubit gate on any two qubits in the quantum circuit of the quantum Fourier transform, this quantum circuit is clearly irreducible by Proposition 5.

### C.1.5 Quantum phase estimation

The Fourier transform plays a crucial role in a general procedure known as phase estimation, which serves as a fundamental component in many quantum algorithms. Consider a unitary operator U that possesses an eigenvector  $|u\rangle$  with an associated eigenvalue of  $e^{2\pi i\theta}$ , where the precise value of  $\theta$  remains unknown. The objective of the phase estimation algorithm is to estimate the value of



Figure 16: A quantum circuit implementation of the quantum Fourier transform for n=4.

 $\theta$ . To conduct this estimation, we assume the availability of oracles capable of preparing the state  $|u\rangle$  and performing controlled- $U^{2^j}$  operations, with j being a non-negative integer. The quantum phase estimation procedure involves two registers. The first register comprises t qubits, initially set to  $|0\rangle$ . The choice of the value of t depends on two factors: the level of precision required for estimating  $\theta$ , and the desired probability of a successful phase estimation procedure. The second register begins in the state  $|u\rangle$  and accommodates a number of qubits sufficient to store  $|u\rangle$ . Phase estimation unfolds in two stages. First, the circuit commences by applying a Hadamard transform to the first register, followed by a series of controlled-U operations on the second register, each involving U raised to successive powers of two. The second stage of phase estimation involves the application of the inverse quantum Fourier transform. An example is provided in Figure 17.



Figure 17: (Color online) A quantum circuit implementation of the quantum phase estimation for t=4.

The original quantum circuit for quantum phase estimation necessitates the utilization of t+n qubits. It is important to observe that the reducibility of a quantum circuit is equivalent to the reducibility of its dual circuit, which is essentially the circuit with a reversed gate sequence. In the context of the phase estimation algorithm, the dual circuit is structured as follows: it begins with the quantum Fourier transform applied to the first t qubits, followed by a sequence of controlled-U operations conducted on each qubit within the initial t qubits and the subsequent n qubits. The quantum Fourier transform gives the biadjacency matrix as

$$B_{-1} = \begin{pmatrix} J_t & O \\ O & I_n \end{pmatrix} \tag{18}$$

For every controlled operation acting on the k-th qubit and the last n qubits, the biadjacency matrix is given by

$$B_k = I_{t+n} + \sum_{\substack{i \neq j \\ i,j \in \{k,t,t+1,\cdots,t+n-1\}}} (E_{i,j}^{t+n} + E_{j,i}^{t+n}).$$
(19)

According to Proposition 7, the biadjacency matrix for the dual circuit of quantum phase estimation is given by

$$B_{-1} \bigodot \left( \bigodot_{k=0}^{t-1} B_k \right), \tag{20}$$

which is an all-one matrix  $J_{t+n}$ . So the quantum circuit for quantum phase estimation is irreducible.

#### C.1.6 Shor's algorithm

Shor's algorithm [Sho94] is renowned for its ability to factor integers in polynomial time. This algorithm holds particular significance because the best-known classical algorithm for factoring requires more than polynomial time, and RSA (Rivest-Shamir-Adleman), the widely-used cryptographic protocol, relies on the assumption that factoring large integers is computationally infeasible. The factoring problem is equivalent to order-finding problem, that is, given positive integers x and N with x < N with no common factors, find the smallest positive integer r such that  $x^r = 1 \mod N$ . The quantum algorithm for order-finding is just the phase estimation algorithm applied to the unitary operator  $U|y\rangle = |xy \mod N\rangle$ . Since the quantum circuit for quantum phase estimation is irreducible in general, the quantum circuits for order-finding and consequently Shor's algorithm are also irreducible.

#### C.1.7 Grover's algorithm

Grover's algorithm [Gro96], also known as the quantum search algorithm, refers to a quantum algorithm for unstructured search that finds with high probability the unique input to a black box function that produces a particular output value, using just  $O(\sqrt{N})$  evaluations of the function, where N is the size of the function's domain. Grover's algorithm consists of three main algorithms steps as illustrated in Figure 18: state preparation, the oracle  $U_f$ , and the diffusion operator  $U_s$ . The first step, state preparation, involves creating the search space, encompassing all possible cases where the solution might exist. The oracle serves the crucial role of marking the correct answers we seek within the search space. It identifies the desired solutions, making them distinguishable for measurement in the final steps of the algorithm. The oracle and diffusion operator are iteratively applied in the quantum circuit, amplifying the presence of the marked solutions until they stand out for measurement.



Figure 18: (Color online) The Grover's algorithm.

In particular, the diffusion operator can be implemented by a multi-level controlled-CNOT gate wrapped by single-qubit gates on both sides. An illustration of the diffusion operator with n=4 qubits is given below. Since the quantum circuit involves an n-qubit gate, it is clearly irreducible by Proposition 7.



Figure 19: A quantum circuit implementation of the diffusion operator for 4-qubit Grover's algorithm. Each empty box represents a single-qubit gate.

### C.1.8 Quantum counting algorithm

Quantum counting algorithm [BHT98] is a quantum algorithm designed to efficiently determine the number of solutions to a given search problem. While Grover's algorithm focuses on finding a specific solution within the oracle, the Quantum counting algorithm provides us with the ability to count how many solutions are present. The quantum circuit underlying the quantum counting

algorithm essentially performs quantum phase estimation on the Grover iterator  $U_sU_f$ . As discussed in Section C.1.5, the quantum circuit for quantum counting is inherently irreducible due to the irreducibility of quantum phase estimation.

### C.1.9 Quantum Ripple Carry Adder

Here we focus on the Quantum Ripple Carry Adders initially proposed in [CSK08]. These circuits are composed of 3k + 1 qubits where the additional qubit is used to store the carry bit and rely on CNOT and Toffoli gates as foundational components. The implementation of a 2-bit quantum adder circuit is illustrated in Figure 20.



Figure 20: (Color online) The implementation of a 2-bit quantum ripple carry adder circuit to calculate  $a_2a_1 + b_2b_1$ .

The optimal compilation of quantum ripple carry adder with n=3k+1 qubits is given in the following proposition.

**Proposition 17 (Quantum ripple carry adder.)** A quantum ripple carry adder circuit with n qubits is always reducible. The corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{n-5} E_{i,i+4}^n$  for general cases and  $E_{1,0}$  for n=4. Consequently, the circuit can be compiled to an equivalent dynamic circuit with 4 qubits for  $n \geq 5$  and 3 qubits for n=4.

**Proof** For a quantum ripple carry adder circuit with 4 qubits, the candidate matrix only has one non-zero entry, therefore the optimal compilation gives 3 qubits. For a quantum adder circuit with n = 3k + 1 > 4 qubits, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{i=0}^{3} \sum_{j=0}^{n-1} E_{i,j}^{n} + \sum_{m=1}^{k-1} \sum_{i=3m+1}^{3m+3} \sum_{j=3m}^{n-1} E_{i,j}^{n} - \sum_{j=0}^{k-1} \sum_{i=0}^{3j} E_{i,3j+1}^{n}.$$
 (21)

The candidate matrix can be further computed as

$$C = \sum_{i=1}^{k} \sum_{j=0}^{n-3i-1} E_{n-3i,j}^{n} + \sum_{m=0}^{k-2} \sum_{i=3m}^{3m+2} \sum_{j=3m+4}^{n-1} E_{i,j}^{n}.$$
 (22)

We need to select as many as 1 elements within the conditions stated in Proposition 9. Without taking into account the nilpotent condition, we can choose

$$F = \sum_{i=0}^{n-5} E_{i,i+4}^n + E_{n-3,l}$$
 (23)

where  $l \in [0, n-4]$ . However, upon closer examination, it becomes evident that each of these candidate edges  $E_{n-3,l}$  conflicts with the edge  $E_{n-5,n-1}$  due to the nilpotent condition. Therefore, our selection is constrained to

$$F = \sum_{i=0}^{n-5} E_{i,i+4}^n \tag{24}$$

we can verify that the adjacency matrix with such a selection is indeed nilpotent, making it the optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width 4. This can be understood as the quantum ripple carry adder employs Toffoli gates, therefore requiring a minimum of three qubits, while an additional qubit serving as a carry-over between distinct adders.

# C.2 Ansatzs in quantum machine learning

#### C.2.1 Frequently used entanglement structures

Parameterized Quantum Circuits (PQCs) serve as foundational components in quantum machine learning. These circuits usually consist of alternating rotation and entanglement layers. In the rotation layers, single-qubit rotation gates are applied to all qubits. Subsequently, the entanglement layers employ a predefined strategy to create entanglement among the qubits using two-qubit gates. Each of these layers can be repeated multiple times to achieve the desired level of entanglement. Several commonly used entanglement layers include the following [SJAG19]:

- A linearly entangled layer of n qubits means that each qubit i is entangled with the subsequent qubit i+1 for  $i \in \{0,1,\dots,n-2\}$ . An example with 4 qubits is given in Figure 21a;
- A circularly entangled layer of n qubits is a linearly entangled layer with an additional entangling gate between qubit n-1 and qubit 0. An example with 4 qubits is given in Figure 21b;
- A pairwisely entangled layer of n qubits consists of two sub-layers. In the first sub-layer, qubit i is entangled with qubit i+1 for all even values of i, and in the second sub-layer, qubit i is entangled with qubit i+1 for all odd values of i. An example with 4 qubits is illustrated in Figure 21c;
- A fully entangled layer of n qubits means that each qubit i for  $i \in \{0, 1, \dots, n-1\}$  is entangled with qubit j for  $j \in \{i, i+1, \dots, n-1\}$ . An example with 4 qubits is illustrated in Figure 21d.

Utilizing Proposition 7 and Theorem 9, we present the analysis of the reducibility and the optimal compilation for quantum circuits employing the four entanglement structures.

**Proposition 18 (Linearly entangled quantum circuit)** A linearly entangled quantum circuit with  $n \geq 2$  qubits and  $l \geq 1$  linear layers is reducible if and only if  $l \leq n-2$ . The corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{n-l-2} E_{i,i+l+1}^n$ . Consequently, the circuit can be compiled to an equivalent dynamic circuit with l+1 qubits.

**Proof** For a linearly entangled quantum circuit with layer l = 1, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{i=0}^{n-1} \sum_{j=\max(i-1,0)}^{n-1} E_{i,j}^{n}.$$
 (25)

For any integer  $l \geq 1$ , we can easily check that

$$B^{\odot l} = \sum_{i=0}^{n-1} \sum_{j=\max(i-l,0)}^{n-1} E_{i,j}^{n}.$$
 (26)



Figure 21: (Color online) Examples of four entanglement structures widely used in quantum machine learning. All single-qubit gates are omitted. Each layer in the dashed box can repeat l times.

Therefore,  $B^{\odot l}$  is an all-one matrix if and only if  $l \geq n-1$ . By Proposition 7, the biadjacency matrix for the multi-layer quantum circuit is exactly given by  $B^{\odot l}$ . So the circuit is irreducible if and only if  $l \geq n-1$ , or equivalently, it is reducible if and only if  $l \leq n-2$ .

Moreover, we can compute the candidate matrix for the circuit with l layers as

$$C = \sum_{i=0}^{n-l-2} \sum_{j=i+l+1}^{n-1} E_{i,j}^{n}.$$
 (27)

For the optimal compilation, we need to select the maximum number of 1 elements in such a matrix under the conditions in Proposition 9. Without the nilpotent condition, the maximum number of 1 elements that simultaneously satisfies conditions (1b), (1c) and (1d) is n - l - 1. Let us choose

$$F = \sum_{i=0}^{n-l-2} E_{i,i+l+1}^n, \tag{28}$$

with the total sum of n-l-1. Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it an optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width l+1.

**Proposition 19 (Circularly entangled quantum circuit)** A circularly entangled quantum circuit with  $n \geq 2$  qubits and  $l \geq 1$  circular layers is reducible if and only if  $n \geq 4$  and l = 1. The corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=1}^{n-3} E_{i,i+2}^n$ . Consequently, the circuit can be compiled to an equivalent dynamic circuit with 3 qubits.

**Proof** The proof is similar to the proof of Proposition 18. For a circularly entangled quantum circuit with layer l = 1, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{i=0}^{n-1} \sum_{j=\max(i-2,0)}^{n-1} E_{i,(j+1 \pmod{n})}^{n}.$$
 (29)

which is an all-one matrix if  $n \leq 3$ . Moreover, for any  $n \geq 4$  we can check that  $B^{\odot 2}$  becomes an all-one matrix. So the circuit is reducible if and only if  $n \geq 4$  and l = 1.

Moreover, we can compute the candidate matrix for the circuit with l=1 layer as

$$C = \sum_{i=1}^{n-3} \sum_{j=i+2}^{n-1} E_{i,j}^{n}.$$
 (30)

For the optimal compilation, we need to select the maximum number of 1 elements in such a matrix under the conditions stated in Proposition 9. Without the nilpotent condition, the maximum number of 1 elements that simultaneously satisfies conditions (1b), (1c) and (1d) is n-l-1. Let us choose

$$F = \sum_{i=1}^{n-3} E_{i,i+2}^n. \tag{31}$$

with the total sum of n-3. Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it an optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width 3.

**Proposition 20 (Pairwisely entangled quantum circuit)** A pairwisely entangled quantum circuit with  $n \geq 2$  qubits and  $l \geq 1$  pairwise layers is reducible if and only if  $l \leq \lceil n/2 \rceil - 1$ . In this case, a feasible solution for optimization (1) can be taken at  $\sum_{i=0}^{n-(2l+2)} E_{i,i+2l+1}^n$  which is optimal when l > (n-2)/4. Consequently, the circuit can be compiled to an equivalent dynamic circuit with 2l+1 qubits.

**Proof** The proof is also similar to the proof of Proposition 18. For a pairwisely entangled quantum circuit with layer l = 1, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{\substack{i,j=0\\|i-j|<1}}^{n-1} E_{i,j}^n + \sum_{\substack{i=0\\i \text{ even}}}^{n-3} E_{i,i+2}^n + \sum_{\substack{i=0\\i \text{ odd}}}^{n-3} E_{i+2,i}^n$$
(32)

For any integer  $l \geq 1$ , we can check that

$$B^{\odot l} = \sum_{\substack{i,j=0\\|i-j| \le 2l-1}}^{n-1} E_{i,j}^n + \sum_{\substack{i=0\\i \text{ even}}}^{n-(2l+1)} E_{i,i+2l}^n + \sum_{\substack{i=0\\i \text{ odd}}}^{n-(2l+1)} E_{i+2l,i}^n.$$
(33)

Therefore,  $B^{\odot l}$  is an all-one matrix if and only if  $l \geq \lceil n/2 \rceil$ . By Proposition 7, the biadjacency matrix for the multi-layer quantum circuit is exactly given by  $B^{\odot l}$ . So the circuit is irreducible if and only if  $l \geq \lceil n/2 \rceil$ , or equivalently, it is reducible if and only if  $l \leq \lceil n/2 \rceil - 1$ .

Moreover, we can compute the candidate matrix for the circuit with l layers as  $C = C_L + C_R$  where

$$C_L = \sum_{\substack{i,j=0\\i-j\geq 2l+1}}^{n-1} E_{i,j}^n + \sum_{\substack{i=0\\i \text{ odd}}}^{n-(2l+1)} E_{i+2l,i}^n$$
(34)

$$C_R = \sum_{\substack{i,j=0\\j-i \ge 2l+1}}^{n-1} E_{i,j}^n + \sum_{\substack{i=0\\i \text{ even}}}^{n-(2l+1)} E_{i,i+2l}^n$$
(35)

Note that  $C_L$  and  $C_R$  have non-zero values only at the lower-left and upper-right corners of the matrix C. If l > (n-2)/4, the row and column indices of the non-zero entries of  $C_L$  and  $C_R$  have no intersection. So any candidate edge in  $C_R$  will contradict to any candidate edge in  $C_L$  because of the acyclic condition. If the optimal solution takes any value in  $C_R$  (or  $C_L$ ), then all candidate edges are taken from  $C_R$  (or  $C_L$ ).

For the optimal compilation, we need to select the maximum number of 1 elements in such a matrix under the conditions stated in Proposition 9. Without the nilpotent condition, the

maximum number of 1 elements that simultaneously satisfies conditions (1b), (1c) and (1d) is n - (2l + 1). Let us choose

$$F = \sum_{i=0}^{n-(2l+2)} E_{i,i+2l+1}^{n}.$$
(36)

with the total sum of n-(2l+1). Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it an optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width 2l+1. Note that when  $l \leq (n-2)/4$ , the above choice of F is also a feasible solution. This completes the proof.

Proposition 21 (Fully entangled quantum circuit) A fully entangled quantum circuit with  $n \ge 2$  qubits and  $l \ge 1$  full layers is always irreducible.

**Proof** It is clear that for fully entangled quantum circuit, its biadjacencdiamond-structuredy matrix of the simplified DAG is always an all-one matrix. Therefore, the circuit is irreducible. Alternatively, the irreducibility can be easily seen from Proposition 5.

## C.2.2 Diamond-structured quantum circuits

The Hartree-Fock method is a widely used approach in quantum chemistry for determining the electronic structure and energy of molecules and atoms. In [Goo20] the authors performed a VQE simulation of the binding energy of  $H_6$ ,  $H_8$ ,  $H_{10}$ ,  $H_{12}$  hydrogen chains where they used a variational ansatz based on basis rotations to prepare the Hartree-Fock state. Figure 22 illustrates an example of the diamond-structured basis rotation circuit used for the  $H_6$  chain.



Figure 22: (Color online) The diamond-structured quantum circuit for the  $H_6$  chain. Each box with a rotation angle  $\theta$  represents a double qubit rotation gate.

The subsequent proposition provides the optimal compilation of the diamond-structured circuit designed to prepare the Hartree-Fock state.

**Proposition 22 (Diamond-structured quantum circuit)** A diamond-structured circuit for linear chain of 2n hydrogen atoms with 2n qubits is always reducible. The corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{n-2} E_{i,i+n+1}^{2n}$ . Consequently, the circuit can be compiled to an equivalent dynamic circuit with n+1 qubits.

**Proof** For a diamond-structured circuit with 2n qubits, the biadjacency matrix of the simplified DAG is given by

$$B = \sum_{i=0}^{n-1} \sum_{j=0}^{i+n} E_{i,j}^{2n} + \sum_{i=n}^{2n-1} \sum_{j=i-n}^{2n-1} E_{i,j}^{2n}$$
(37)

We can further compute the candidate matrix for the circuit as  $C = C_L + C_R$  where

$$C_L = \sum_{i=n+1}^{2n-1} \sum_{j=0}^{i-n-1} E_{i,j}^{2n}$$
(38)

$$C_R = \sum_{i=0}^{n-2} \sum_{j=i+n+1}^{2n-1} E_{i,j}^{2n}$$
(39)

It is worth noting that the matrices  $C_L$  and  $C_R$  contain non-zero entries only at the lower-left and upper-right corners. Since there is no overlap in the row and column indices of the non-zero entries between  $C_L$  and  $C_R$ , any candidate edge in  $C_L$  would conflict with any candidate edge in  $C_R$  because of the acyclic condition. Consequently, if the optimal solution selects any edge from  $C_R$  (or  $C_L$ ), it necessitates that all candidate edges are drawn exclusively from  $C_R$  (or  $C_L$ ).

To achieve the optimal compilation, our objective is to maximize the number 1 elements in such a matrix under the conditions stated in Proposition 9. In the absence of the nilpotent condition, we have the flexibility to select

$$F = \sum_{i=0}^{n-2} E_{i,i+n+1}^{2n} \tag{40}$$

with the total sum of n-1. Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it an optimal solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width n+1, which completes the proof.

# C.3 Measurement-based quantum computation

Measurement-based quantum computation (MBQC) constitutes an alternative quantum computation model [BBD+09]. This model guides the computation process by measuring a portion of the qubits within an entangled state, while the remaining unmeasured qubits evolve correspondingly. The MBQC computation can be divided into three primary steps. The initial step involves preparing a resource state, which is a highly entangled many-body quantum state. This state can be pre-generated offline and is independent of specific computational tasks. Subsequently, the second step entails sequentially performing single-qubit measurements on each qubit of the prepared resource state. These measurements might be adaptive, meaning that later measurement choices can depend on the outcomes of prior measurements. In the third step, classical data processing is applied to the measurement outcomes in order to derive the necessary computational results. An illustration of the resource state is provided in Figure 23(a), where the grid signifies a widely used quantum resource state referred to as the cluster state. Within this grid, each vertex represents a qubit (typically initialized as a plus state  $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$ ), while edges connecting the vertices symbolize controlled-phase gates operating on the linked qubits.

Measurement-based quantum computation plays a pivotal role in blind quantum computation, a cloud-based quantum computing scheme that enables users to conduct calculations privately in a quantum network without disclosing any information to the server [Fit17]. This capability holds great promise for the future quantum internet. Nonetheless, the standard approach of quantum computation in cloud servers typically operates in the quantum circuit model. Integrating an MBQC algorithm into this model necessitates a considerable number of qubits. Specifically, MBQC applied to a cluster state of size (w,d)—where w signifies the number of rows and d signifies the number of columns—can be translated directly into a quantum circuit of size wd. We refer to this circuit as a cluster-state quantum circuit of size (w,d). As shown in Figure 23(b), an initial Hadamard gate is applied to each qubit to prepare a plus state. Subsequently, controlled-phase gates are employed based on the topology of the grid. Finally, a measurement is taken on each qubit. It is important to note that measurements are typically applied sequentially to the qubits column by column. Measurements on the same column are independent, while measurements on later columns may depend on the results of the previous columns.

The reducibility of cluster-state quantum circuits depends on the order of the controlled-phase gates. The following result is based on the specific order in which we implement the controlled-phase gates. First, we implement the gates in the first column, then those between the first and



Figure 23: (Color online) An example of cluster state and cluster-state quantum circuit.

second columns, and so on. Additionally, the qubits in the first column of a cluster state can be initialized in an entangled quantum state.

The following result demonstrates that this circuit can be effectively reduced to a dynamic quantum circuit with w+1 qubits, which is independent of the parameter d. This reduction significantly reduces the resource requirements for running MBQC algorithms on quantum computers that are primarily designed for circuit-based operations.

**Proposition 23 (Cluster-state quantum circuit.)** A cluster-state quantum circuit of size (w, d) is reducible for any  $d \geq 2$ . The corresponding optimal solution for optimization (1) can be taken at  $\sum_{i=0}^{wd-(w+2)} E_{i,i+w+1}^{wd}$  in general. Consequently, the circuit can be compiled to an equivalent dynamic circuit with w+1 qubits.

**Proof** For a cluster-state quantum circuit of size (w, d), its biadjacency matrix of the simplified DAG is given by a block matrix:

$$B = \sum_{i=1}^{d-1} E_{i,i-1}^d \otimes I_w + \sum_{i=0}^{d-1} \sum_{j=0}^{d-i-1} E_{i,i+j}^d \otimes D_j$$
(41)

where each block

$$D_k = \sum_{i=0}^{w-1} \sum_{j=\max(i-k-1,0)}^{w-1} E_{i,j}^w.$$
(42)

We can further compute the candidate matrix as

$$C = \sum_{i=0}^{d-1} \sum_{j=0}^{i} E_{i,i-j}^{d} \otimes G_j + \sum_{i=0}^{d-2} E_{i,i+1}^{d} \otimes (J_w - I_w) + \sum_{i=0}^{d-1} \sum_{j=i+2}^{d-1} E_{i,j}^{d} \otimes J_w,$$
(43)

where

$$G_k = \sum_{i=0}^{w-1} \sum_{j=i+k+2}^{w-1} E_{i,j}^w. \tag{44}$$

We need to select as many as 1 elements within the conditions stated in Proposition 9. Without the nilpotent condition, we can choose

$$F = \sum_{i=0}^{wd - (w+2)} E_{i,i+w+1}^{wd}.$$
 (45)

which simultaneously satisfies conditions (1b), (1c) and (1d). Then we can easily check that the adjacency matrix with such a selection is indeed nilpotent, making it a feasible solution in (1). Finally, by Algorithm 7, the compiled circuit has circuit width w + 1. Since the quantum state in the first column of qubits can be assigned to an arbitrary global quantum state in general, this requires at least w qubits. Moreover, no matter which qubit in the first column to measure first, we need at least one more qubit to apply for the controlled-phase gate between the qubit to measure and the adjacent qubit on its right. This makes the circuit has at least w + 1 qubits, indicating the optimality of our compilation.

**Remark 10** Similar compilations also work for measurement-based quantum computation with other graph states, such as the brickwork state used in [BFK09].

# D Implementation of DCKF algorithm

Since the DCKF algorithm described in [DCKFF22] is not publicly available, we implemented this algorithm based on our interpretation of their results. We observed that the causal cone of an output qubit  $q_i$  in their algorithm corresponds exactly to the set of roots that can reach the terminal of  $q_i$ . We effectively obtained this information by identifying all non-zero indices in the i-th column of the biadjacency matrix of the quantum circuit.

Our initial step involved executing Algorithm 3 to obtain the biadjacency matrix and compute causal cones for all outputs. Subsequently, we applied their greedy strategy to determine the measurement order and identify how qubits are reused. It is important to note that, in order to complete a measurement, all qubits within its causal cone that have not yet been measured should be activated, while registers of all previously measured qubits are available for reuse.

Therefore, when a qubit  $q_i$  needed to be activated, we first identified all unoccupied registers from the existing ones. If an available register existed,  $q_i$  was loaded onto the first available one. Conversely, if all registers were occupied, a new register was initialized to accommodate  $q_i$ . Additionally, after the measurement of a qubit, the register it occupied was recycled for subsequent use. Upon the completion of all measurements in the input static circuit, the sequence in which registers were occupied was translated into a list of edges, which were subsequently incorporated into the DAG representation of the circuit. Finally, the compilation process was completed by applying Algorithm 7. The full implementation of the DCKF algorithm is provided in Algorithm 9.

# E Experimental Results Continued

### E.1 Experimental evaluation of hybrid algorithm

To demonstrate the tradeoff between solution optimality and computational time complexity, we assessed the performance and runtime of the hybrid algorithm (Algorithm 8) at various hierarchy levels (0, 1, 2, 3) for max-cut QAOA circuits on U3R graphs with p = 1. The numerical experiments were conducted on a MacBook Pro (2020) with an Intel i5 processor and 16GB memory. Figure 24 depicts the average compiled circuit width and algorithm runtime over 20 random graphs for each qubit number. As expected, the results show that as the hierarchy level increases, we attain better compilations. However, the algorithm runtime also increases proportionally. Therefore, the hybrid algorithm shall be a suitable choice for scenarios where minimizing the compiled circuit width is of top priority, provided that the runtime remains acceptable.

### E.2 Experimental evaluation of random circuits

Figure 25(a) depicts the complete results of the reducibility factor of random circuits. Notably, for all evaluated cases, our greedy heuristic outperforms the DCKF algorithm in approximately 87.6% of cases, with a strict advantage over 49.6% of random circuits. Additionally, a comparative analysis of the performance between our greedy heuristic and the improved DCKF algorithm with the first qubit search technique on random circuits is depicted in Figure 25(b). In cases where the

### Algorithm 9: DCKF Algorithm

```
Input:
     StaticCircuit
                                the instruction list of a static quantum circuit
   Output:
     DynamicCircuit
                                the instruction list of the compiled dynamic quantum circuit
 1 Let n be the width of the static quantum circuit;
 2 Run Algorithm 6 to get the DAG representation Digraph of the circuit;
 3 Let Roots and Terminals be the set of roots and terminals, respectively;
 4 Run Algorithm 3 to get the biadjacency matrix B of the simplified DAG of the circuit;
 5 Initialize empty lists CausalCones, Register, Occupation, AddedEdges and an empty set C_q;
 6 Initialize a set UnmeasuredQubits as \{0, 1, ..., n-1\};
   for i = 0 to n - 1 do
       Let C be the set of row indices of non-zero entries in the i-th column of B;
       Append C to CausalCones;
10 for t = 0 to n - 1 do
       Initialize ConeSize as n + 1;
11
       {\bf foreach}\ {\it Qubit}\ {\bf in}\ {\it UnmeasuredQubits}\ {\bf do}
12
           Calculate Union = C_q \cup CausalCones[Qubit];
13
           {f if} the length of Union < ConeSize {f then}
14
               Set ConeSize to the length of Union;
15
               Set NextMeasure to Qubit;
16
       Calculate Active Qubits as Causal Cones[NextMeasure] - Measure Order;
17
       foreach Qubit in ActiveQubits do
18
           if Qubit not in Register then
19
               Record all indeices of None in Register in Available;
20
              if Available is not empty then
21
                  Record Available[0] as Address;
22
                  Record the last element in Occupation[Address] as t;
23
                  Append an edge (Terminals[t], Roots[Qubit]) to AddedEdges;
24
                  Append Qubit to Occupation[Address];
25
              else
26
                  Set Address to the length of Register;
27
                  Append a list [Qubit] to Occupation;
28
              Set Register[Address] to Qubit;
29
       Identify the index of NextMeasure in Register as m and set Register[m] to None;
30
       Append NextMeasure to MeasureOrder;
31
       Remove NextMeasure from UnmeasuredQubits;
32
       Calculate C_q = C_q \cup CausalCones[NextMeasure];
34 Add AddedEdges to Digraph and get the ModifiedGraph;
35 Run Algorithm 7 to get the compiled circuit DynamicCircuit;
36 return DynamicCircuit
```

reducibility factor is relatively large, our algorithm showcases superiority in approximately 91.2% of instances, maintaining a strict advantage in over 36% of random circuits.

### E.3 Noisy simulation

To further demonstrate the practical efficacy of the proposed methods, we design a noisy simulation of an 11-qubit Bernstein-Vazirani (BV) algorithm, specifically targeting the real-world 11-qubit trapped-ion quantum computer reported in [WBD $^+$ 19]. The logical-physical qubit mapping used and the probability of getting the correct outcome in each circuit are listed in Table 2 below. For dynamic circuit compilation, we always reuse the logical qubit  $q_0$  for other qubits. This example is for an illustrative purpose under certain assumptions. We assume that CNOT gates admit the same performance as the native X-X gates. In addition, we utilize depolarizing noise as the noise



Figure 24: (Color online) Compiled circuit width (left) and algorithm runtime in seconds (right) as a function of the original circuit width of max-cut QAOA with p=1, compiled by hybrid algorithm 8 with different hierarchy levels. The ploted error bars in the left figure correspond to the maximum and the minimum compiled width over 20 evaluated instances. The runtime in the right figure represents the average runtime.



Figure 25: (Color online) (a) The reducibility factor of the random circuits evaluated using our greedy heuristic algorithm 5 and the DCKF algorithm; (b) The reducibility factor of the random circuits evaluated using our greedy heuristic algorithm 5 and the improved DCKF algorithm (DCKF + first qubit search).

model to simulate the reported error rate, which may not perfectly replicate real-world conditions. Nonetheless, the simulation can be more fine-grained for interested readers and the experiments can be readily conducted on quantum hardware when available.

Table 2: The mapping of logical qubits (q) to physical qubits (Q) and the probability of obtaining the correct outcome in each circuit.

|         | $q_0$ | $q_1$ | $q_2$ | $q_3$ | $q_4$ | $q_5$ | $q_6$ | $q_7$    | $q_8$    | $q_9$    | $q_{10}$ | probability |
|---------|-------|-------|-------|-------|-------|-------|-------|----------|----------|----------|----------|-------------|
| BV_11   | $Q_3$ | $Q_0$ | $Q_2$ | $Q_4$ | $Q_5$ | $Q_6$ | $Q_7$ | $Q_8$    | $Q_9$    | $Q_{10}$ | $Q_1$    | 66.2%       |
| BV_10   | $Q_3$ | $Q_0$ | $Q_4$ | $Q_5$ | $Q_6$ | $Q_7$ | $Q_8$ | $Q_9$    | $Q_{10}$ | $Q_1$    | -        | 66.7%       |
| $BV\_9$ | $Q_3$ | $Q_0$ | $Q_4$ | $Q_6$ | $Q_7$ | $Q_8$ | $Q_9$ | $Q_{10}$ | $Q_1$    | -        | -        | 68.5%       |
| $BV\_8$ | $Q_3$ | $Q_0$ | $Q_4$ | $Q_6$ | $Q_7$ | $Q_8$ | $Q_9$ | $Q_1$    | -        | -        | -        | 69.1%       |
| $BV_7$  | $Q_3$ | $Q_0$ | $Q_4$ | $Q_7$ | $Q_8$ | $Q_9$ | $Q_1$ | -        | -        | -        | -        | 70.5%       |
| $BV\_6$ | $Q_3$ | $Q_0$ | $Q_4$ | $Q_7$ | $Q_9$ | $Q_1$ | -     | -        | -        | -        | -        | 71.5%       |
| $BV\_5$ | $Q_3$ | $Q_0$ | $Q_7$ | $Q_9$ | $Q_1$ | -     | -     | -        | -        | -        | -        | 73.1%       |
| $BV\_4$ | $Q_3$ | $Q_0$ | $Q_7$ | $Q_1$ | -     | -     | -     | -        | -        | -        | -        | 73.6%       |
| $BV\_3$ | $Q_3$ | $Q_0$ | $Q_1$ | -     | -     | -     | -     | -        | -        | -        | -        | 74.1%       |
| BV_2    | $Q_3$ | $Q_1$ | -     | -     | -     | -     | -     | -        | -        | -        | -        | 74.3%       |

# F Open Problems

In this section, we provide some detailed discussion of several open problems related to our work.

## F.1 Compiling dynamic quantum circuit

The most significant open problem pertains to extending the approach outlined in the main text to dynamic quantum circuits. Specifically, we can consider to begin with a dynamic quantum circuit and seek to compile it into a circuit with a smaller width. However, this extension introduces greater complexities, as it necessitates determining whether a given dynamic quantum circuit can be further reduced in terms of qubit count. This problem is intricately linked to the optimality of quantum circuit compilation. Specifically, consider the decision problem of whether a given static circuit can be compiled into a dynamic circuit of a given size k. Then the optimization version of the problem reduces to this decision problem through a binary search running logarithmically in the size of the original circuit. Moreover, the task of determining if a dynamic quantum circuit can be further reduced is equivalent to questioning if the corresponding static circuit can be reduced to a circuit of size smaller than the current compilation. Combining these arguments, we can conclude that determining the reducibility of a dynamic circuit is as hard as finding the optimal compilation of a static circuit.

Therefore, it is crucial to emphasize that the applicability of our reducibility checking approaches, as proposed in Propositions 3, 5 and 8, is currently limited to static circuits. We provide an illustrative example below to highlight the challenges and considerations involved in compiling a dynamic quantum circuit. Examining the dynamic circuit illustrated in Figure 26a and the corresponding DAG representation in Figure 26b it becomes apparent that any terminal vertex is reachable from any root vertex. Consequently, it shall be concluded that this dynamic circuit is irreducible.



Figure 26: (Color online) A dynamic quantum circuit and its DAG representation.

Nevertheless, further exploration reveals that the dynamic circuit can actually be compiled into the dynamic circuit shown in Figure 27, with the number of qubits reduced by one.



Figure 27: (Color online) The dynamic quantum circuit reduced from the dynamic circuit in Figure 26a.

Interestingly, the dynamic circuit depicted in Figure 27 can be regarded as a dynamic circuit compilation outcomes of the static circuit in Figure 28a. As shown in Figure 28b, these two compilation schemes can be implemented by adding dashed orange edges and dashed green edges to the DAG representation of the static circuit in Figure 28a, respectively. Furthermore, when dealing with the compilation of dynamic quantum circuits, special consideration is imperative for classically controlled gates as these may introduce additional dependencies within the DAG of quantum circuits. Therefore, a reasonable starting point in the compilation of a dynamic circuit is to initiate the conversion of the dynamic circuit into an equivalent static circuit. Following this conversion, the methodology in the main text can be employed to explore the existence of an enhanced compilation scheme in comparison to the original dynamic circuit.





(a) The original reducible static circuit.

(b) Different compilation schemes.

Figure 28: (Color online) A static quantum circuit that can be reduced to both the dynamic circuit in Figure 26a and 27. The compilation scheme indicated by the dashed orange edges results in the dynamic circuit shown in Figure 26a, while the compilation scheme by the dashed green edges leads to the circuit in Figure 27.

The complete algorithm for converting a dynamic circuit to a static one is provided in Algorithm 10. The central concept involves assigning a new quantum register for each reset operation in the dynamic circuit. To elaborate, we traverse the dynamic circuit's instructions, and whenever we encounter a reset operation, we allocate a new qubit based on the current circuit width. All operations performed on the reset qubit after this reset operation are subsequently transferred to the newly allocated qubit. Through this process, the dynamic circuit can be transformed into an equivalent static circuit, essentially representing the reverse of our dynamic circuit compilation procedure. It is worth noting that Algorithm 10 can be adapted to accommodate dynamic circuits that feature classically controlled quantum gates with slight modification.

```
Algorithm 10: Converting dynamic circuit to static circuit
   Input:
                               the instruction list of a dynamic quantum circuit
     DynamicCircuit
   Output:
    StaticCircuit
                               the instruction list of the compiled quantum circuit
 1 Let n be the dynamic quantum circuit width;
   Initialize two empty lists StaticCircuit and Measurements;
   foreach instruction in DynamicCircuit do
       if instruction is a measurement then
          Append instruction to Measurement;
       else if instruction is a reset then
 6
          Record the value in QUBIT of instruction as ResetQubit;
          Initialize an empty list PostResets;
 8
          Record all instructions subsequent to instruction in Dynamic Circuit to PostResets;
 9
          Remove all instructions subsequent to instruction from Dynamic Circuit;
10
          foreach PostInstruction in PostResets do
11
              foreach Qubit in QUBIT of PostInstruction do
12
                 if Qubit is equal to ResetQubit then Update Qubit to n;
13
          Update n to (n+1);
14
          Append PostReset to DynamicCircuit;
15
       else if instruction is other single-/two-qubit gate then
16
          Append instruction to StaticCircuit;
17
18 Append Measurement to StaticCircuit;
19 return StaticCircuit
```

## F.2 NP-hardness of the dynamic circuit compilation problem

We have noted a remarkable similarity between the graph optimization problem explored in this work and the well-established Maximum Acyclic Subgraph (MAS) problem in graph theory. The MAS problem for a graph (V, E) involves identifying a subgraph (V, E') with  $E' \subseteq E$ , such that it contains no cycles and has the maximum number of edges. Our problem can be reformulated as a constrained version of the MAS problem. Let (V, E) be the simplified DAG and  $\bar{E}$  be the set of candidate edges. We can first incorporate all candidate edges into the simplified DAG and try to identify the maximum acyclic subgraph in the resultant graph  $(V, E \cup \bar{E})$  under the additional constraint that only edges in set  $\bar{E}$  can be removed. Suppose the solution to this problem yields a subgraph  $(V, E \cup \bar{E}')$  where  $\bar{E}' \subseteq \bar{E}$ , then the maximum matching in the graph  $(V, \bar{E}')$  corresponds to a feasible solution to our problem. It is worth noting that the MAS problem has been well-established as an NP-hard problem [Kar72]. Therefore, we expect that the optimal compilation of a quantum circuit may similarly belong to the class of NP-hard. However, the formal proof of its NP-hardness remains open. Additionally, several algorithms have been proposed in the literature to find approximate solutions for the MAS problem [HR94, CP20], which may offer valuable insights for seeking approximate solutions to the dynamic circuit compilation.