# Quantum channels

## Table of Contents

- [Introduction](#introduction)
- [Basics of channels](#basics-of-channels)
    - [Channels are linear mappings](#channels-are-linear-mappings)
    - [Channels transform density matrices into density matrices](#channels-transform-density-matrices-into-density-matrices)
    - [Unitary operations as channels](#unitary-operations-as-channels)
    - [Convex combinations of channels](#convex-combinations-of-channels)
    - [Examples of qubit channels](#examples-of-qubit-channels)
- [Channel representations](#channel-representations)
    - [Stinespring representations](#stinespring-representations)
    - [Kraus representations](#kraus-representations)
    - [Choi representations](#choi-representations)
- [Equivalence of the representations](#equivalence-of-the-representations)
    - [Overview of the proof](#overview-of-the-proof)
    - [First implication: channels to Choi matrices](#first-implication-channels-to-choi-matrices)
    - [Second implication: Choi representation to Kraus representation](#second-implication-choi-representation-to-kraus-representation)
    - [Third implication: from Kraus to Stinespring representations](#third-implication-from-kraus-to-stinespring-representations)
    - [Fourth implication: Stinespring representation back to the definition](#fourth-implication-stinespring-representation-back-to-the-definition)


## [Introduction](#introduction)
In the general formulation of quantum information, operations on quantum states are represented by a special class of mappings called channels. This includes useful operations, such as ones corresponding to unitary gates and circuits, as well as operations we might prefer to avoid, such as noise. We can also describe measurements as channels, which we'll do in the next lesson. In short, any discrete-time change in states that is physically realizable (in an idealized sense) can be described by a channel.

The term channel comes to us from information theory, which (among other things) studies the information-carrying capacities of noisy communication channels. In this context, a quantum channel could specify the quantum state that's received when a given quantum state is sent, perhaps through a quantum network of some sort. It should be understood, however, that the terminology merely reflects this historical motivation and is used in a more general way. Indeed, we can describe a wide variety of things (such as complicated quantum computations) as channels, even though they have nothing to do with communication and would be unlikely to arise naturally in such a setting.

In mathematical terms, channels are linear mappings from density matrices to density matrices that satisfy certain requirements. Because channels are linear mappings from matrices to matrices — as opposed to linear mappings from vectors to vectors — we'll require some additional mathematical machinery to describe them in general. We'll see that channels can, in fact, be described mathematically in a few different ways, including representations named in honor of three individuals who played key roles in their development: Stinespring, Kraus, and Choi. Together, these different ways of describing channels offer different angles from which they can be viewed and analyzed.

We'll begin the lesson with a discussion of some basic aspects of channels along with a small selection of examples, and then we'll move on to Stinespring, Kraus, and Choi representations of channels later in the lesson. In the final section of the lesson we'll see that, although these representations are different, they all offer equivalent mathematical characterizations of channels.



## [Basics of channels](#basics-of-channels)
Throughout this lesson we'll use uppercase Greek letters, including $\Phi $ and $\Psi$ as well as some other letters in specific cases, to refer to channels. Every channel $\Phi$ has an input system and an output system, and we'll typically use the name $X$ to refer to the input system and $Y$ to refer to the output system. It's common that the output system of a channel is the same as the input system, and in this case we can use the same letter $X$ to refer to both.



### [Channels are linear mappings](#channels-are-linear-mappings)

Channels are described by *linear* mappings, just like probabilistic operations in the standard formulation of classical information and unitary operations in the simplified formulation of quantum information.

If a channel $\Phi$ is performed on an input system $X$ whose state is described by a density matrix $\rho$, then the output system of the channel is described by the density matrix $\Phi(\rho)$. When the output system of $\Phi$ is also $X$, we can simply view the channel as representing a change in the state of $X$, from $\rho$ to $\Phi(\rho)$. When the output system of $\Phi$ is a different system $Y$ rather than $X$, it should be understood that $Y$ is a new system created by the process of applying the channel, and that the input system $X$ is no longer available once the channel is applied. It is as if the channel itself transformed $X$ into $Y$, leaving it in the state $\Phi(\rho)$.

The assumption that channels are described by *linear* mappings can be viewed as being an axiom — or in other words, a basic postulate of the theory rather than something that is proved. We can, however, see the need for channels to act linearly on convex combinations of density matrix inputs in order for them to be consistent with probability theory and what we've already learned about density matrices.

To be more specific, suppose that we have a channel $\Phi$ and we apply it to a system when it’s in one of the two states represented by the density matrices $\rho$ and $\sigma$. If we apply the channel to $\rho$ we obtain the density matrix $\Phi(\rho)$ and if we apply it to $\sigma$ we obtain the density matrix $\Phi(\sigma)$. Thus, if we randomly choose the input state of $X$ to be $\rho$ with probability $p \in [0,1]$ and $\sigma$ with probability $1 - p$, we’ll obtain the output state $\Phi(\rho)$ with probability $p$ and $\Phi(\sigma)$ with probability $1 - p$, which we represent by a weighted average of density matrices as $p\Phi(\rho) + (1 - p)\Phi(\sigma)$. On the other hand, we could alternatively think about the input state of the channel as being represented by the weighted average $p\rho + (1 - p)\sigma$, in which case the output is $\Phi(p\rho + (1 - p)\sigma)$. It’s the same state regardless of how we choose to think about it, so we must have

$$
\Phi(p\rho + (1 - p)\sigma) = p\Phi(\rho) + (1 - p)\Phi(\sigma).
$$

Whenever we have a mapping that satisfies this condition for every choice of density matrices $\rho$ and $\sigma$ and scalars $p \in [0,1]$, there’s always a unique way to extend that mapping to every matrix input (i.e., not just density matrix inputs) so that it’s linear.



### [Channels transform density matrices into density matrices](#channels-transform-density-matrices-into-density-matrices)

Naturally, in addition to being linear mappings, channels must also transform density matrices into density matrices. If a channel $\Phi$ is applied to an input system while it’s in a state represented by a density matrix $\rho$, then we obtain a system whose state is represented by $\Phi(\rho)$, which must be a valid density matrix in order for us to interpret it as a state.

It is critically important, though, that we consider a more general situation, where a channel $\Phi$ transforms a system $X$ into a system $Y$ in the presence of an additional system $Z$ (to which nothing happens). That is, if we start with the pair of systems $(Z,X)$ in a state described by some density matrix and then apply $\Phi$ just to $X$, transforming it into $Y$, we must obtain a density matrix describing a state of the pair $(Z,Y)$.

We can describe in mathematical terms how a channel $\Phi$ having an input system $X$ and an output system $Y$ transforms a state of the pair $(Z,X)$ into a state of $(Z,Y)$ when nothing is done to $Z$. To keep things simple, we’ll assume that the classical state set of $Z$ is $\{0, \ldots, m - 1\}$. This allows us to write an arbitrary density matrix $\rho$, representing a state of $(Z,X)$, in the following form:

$$
\rho = \sum_{a,b=0}^{m-1} |a\rangle\langle b| \otimes \rho_{a,b} =
\rho = \begin{pmatrix}
\rho_{0,0} & \rho_{0,1} & \cdots & \rho_{0,m-1} \\
\rho_{1,0} & \rho_{1,1} & \cdots & \rho_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{m-1,0} & \rho_{m-1,1} & \cdots & \rho_{m-1,m-1}
\end{pmatrix}
$$

On the right-hand side of this equation we have a block matrix, which we can think of as a matrix of matrices except that the inner parentheses are removed. This leaves us with an ordinary matrix that can alternatively be described using Dirac notation as we have in the middle expression. Each matrix $\rho_{a,b}$ has rows and columns corresponding to the classical states of $X$, and these matrices can be determined by a simple formula:

$$
\rho_{a,b} = \left( \langle a| \otimes \mathbb{I}_X \right) \rho \left( |b\rangle \otimes \mathbb{I}_X \right)
$$

Note that these are not density matrices in general — it’s only when they’re arranged together to form $\rho$ that we obtain a density matrix. The following equation describes the state of $(Z, Y)$ that is obtained when $\Phi$ is applied to $X$.

$$
\sum_{a,b=0}^{m-1} |a\rangle\langle b| \otimes \Phi(\rho_{a,b}) =
\begin{pmatrix}
\Phi(\rho_{0,0}) & \Phi(\rho_{0,1}) & \cdots & \Phi(\rho_{0,m-1}) \\
\Phi(\rho_{1,0}) & \Phi(\rho_{1,1}) & \cdots & \Phi(\rho_{1,m-1}) \\
\vdots & \vdots & \ddots & \vdots \\
\Phi(\rho_{m-1,0}) & \Phi(\rho_{m-1,1}) & \cdots & \Phi(\rho_{m-1,m-1})
\end{pmatrix}
$$

Notice that in order to evaluate this expression for a given choice of $\Phi$ and $\rho$, we must understand how $\Phi$ works as a linear mapping on non-density matrix inputs, as each $\rho_{a,b}$ generally won’t be a density matrix on its own. The equation is consistent with the expression $(\text{Id}_Z \otimes \Phi)(\rho)$, in which $\text{Id}_Z$ denotes the identity channel on the system $Z$. This presumes that we’ve extended the notion of a tensor product to linear mappings from matrices to matrices, which is straightforward but isn’t essential to the lesson and won’t be explained further.

Reiterating a statement made above, in order for a linear mapping $\Phi$ to be a valid channel it must be the case that, for every choice of $Z$ and every density matrix $\rho$ of the pair $(Z,X)$, we always obtain a density matrix when $\Phi$ is applied to $X$. In mathematical terms, the properties a mapping must possess to be a channel are that it must be *trace-preserving* (so that the matrix we obtain by applying the channel has trace equal to one) and *completely positive* (so that the resulting matrix is positive semidefinite). These are both important properties that can be considered and studied separately, but it isn’t critical for the sake of this lesson to consider these properties in isolation. There are in fact linear mappings that always output a density matrix when given a density matrix as input, but fail to map density matrices to density matrices for compound systems, so we do eliminate some linear mappings from the class of channels in this way. (The linear mapping given by matrix transposition is the simplest example.)

We have a formula analogous to the one above in the case that the two systems $X$ and $Z$ are swapped, so that $\Phi$ is applied to the system on the left rather than the right.

$$
(\Phi \otimes \text{Id}_Z)(\rho) = \sum_{a,b=0}^{m-1} \Phi(\rho_{a,b}) \otimes |a\rangle\langle b|
$$

This assumes that $\rho$ is a state of $(X,Z)$ rather than $(Z,X)$. This time the block matrix description doesn’t work because the matrices $\rho_{a,b}$ don’t fall into consecutive rows and columns in $\rho$, but it’s the same underlying mathematical structure.

Any linear mapping that satisfies the requirement that it always transforms density matrices into density matrices, even when it's applied to just one part of a compound system, represents a valid channel. So, in an abstract sense, the notion of a channel is determined by the notion of a density matrix together with the assumption that channels act linearly. In this regard, channels are analogous to unitary operations in the simplified formulation of quantum information, which are precisely the linear mappings that always transform quantum state vectors to quantum state vectors for a given system; as well as to probabilistic operations (represented by stochastic matrices) in the standard formulation of classical information, which are precisely the linear mappings that always transform probability vectors into probability vectors.
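
As a rough numerical illustration of the block formula above, the following NumPy sketch applies a mapping to the right-hand system of a pair $(Z,X)$ one block at a time, and then uses it to check the remark about matrix transposition. The helper names `apply_to_right` and `transpose_map` are hypothetical and chosen only for this example.

```python
import numpy as np

def apply_to_right(phi, rho, m, out_dim):
    """Apply phi to X for a state rho of (Z, X), where Z has m classical states.

    Computes sum_{a,b} |a><b| (x) phi(rho_{a,b}) block by block.
    """
    n = rho.shape[0] // m  # dimension of X
    out = np.zeros((m * out_dim, m * out_dim), dtype=complex)
    for a in range(m):
        for b in range(m):
            block = rho[a * n:(a + 1) * n, b * n:(b + 1) * n]  # rho_{a,b}
            out[a * out_dim:(a + 1) * out_dim,
                b * out_dim:(b + 1) * out_dim] = phi(block)
    return out

def transpose_map(matrix):
    """Matrix transposition: linear and trace-preserving, but not a channel."""
    return matrix.T

# Bell state |phi+><phi+| of a pair of qubits (Z, X).
phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus.conj())

result = apply_to_right(transpose_map, bell, m=2, out_dim=2)
min_eigenvalue = np.linalg.eigvalsh(result).min()
print(np.trace(result).real, min_eigenvalue)
```

The trace is still 1, but the smallest eigenvalue comes out close to $-1/2$, so the output is not positive semidefinite even though transposition on its own maps every density matrix to a density matrix.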




### [Unitary operations as channels](#unitary-operations-as-channels)

Suppose $X$ is a system and $U$ is a unitary matrix representing an operation on $X$. The channel $\Phi$ that describes this operation on density matrices is defined as follows for every density matrix $\rho$ representing a quantum state of $X$.

$$
\Phi(\rho) = U\rho U^\dagger \tag{1}
$$

This action, where we multiply by $U$ on the left and $U^\dagger$ on the right, is commonly referred to as conjugation by $U$.



This description is consistent with the fact that the density matrix that represents a given quantum state vector $|\psi\rangle$ is $|\psi\rangle\langle\psi|$. In particular, if the unitary operation $U$ is performed on $|\psi\rangle$, then the output state is represented by the vector $U|\psi\rangle$, and so the density matrix describing this state is equal to

$$
(U|\psi\rangle)(U|\psi\rangle)^\dagger = U|\psi\rangle \langle\psi| U^\dagger.
$$

Once we know that, as a channel, the operation $U$ has the action $|\psi\rangle\langle\psi| \mapsto U|\psi\rangle \langle\psi| U^\dagger$ on pure states, we can conclude by linearity that it must work as specified by equation (1) above for any density matrix $\rho$.

The particular channel we obtain when we take $U = \mathbb{I}$ is the identity channel $\text{Id}$, to which we can also attach a subscript (such as $\text{Id}_X$) when we wish to indicate explicitly which system the channel acts on. Its output is always equal to its input: $\text{Id}(\rho) = \rho$. This might not seem like an interesting channel, but it's actually a very important one, and it's fitting that this is our first example. The identity channel is the perfect channel in some contexts, representing an ideal memory or a noiseless transmission of information from a sender to a receiver.

Every unitary channel is indeed a valid channel. Conjugation by a matrix $U$ gives us a linear map — and if $\rho$ is a density matrix of a system $(Z,X)$ and $U$ is unitary, then the result, which we can express as

$$
(\mathbb{I}_Z \otimes U)\rho(\mathbb{I}_Z \otimes U^\dagger),
$$

is also a density matrix. Specifically, this matrix must be positive semidefinite, for if $\rho = M^\dagger M$ then

$$
(\mathbb{I}_Z \otimes U)\rho(\mathbb{I}_Z \otimes U^\dagger) = K^\dagger K
$$

for $K = M(\mathbb{I}_Z \otimes U^\dagger)$, and it must have unit trace by the cyclic property of the trace:

$$
\text{Tr}((\mathbb{I}_Z \otimes U)\rho(\mathbb{I}_Z \otimes U^\dagger)) = \text{Tr}((\mathbb{I}_Z \otimes U^\dagger)(\mathbb{I}_Z \otimes U)\rho) = \text{Tr}((\mathbb{I}_Z \otimes \mathbb{I}_X)\rho) = \text{Tr}(\rho) = 1
$$
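
The two facts just derived can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the Hadamard matrix stands in for $U$, the input state of $(Z,X)$ is a randomly generated density matrix, and the helper `random_density_matrix` is a hypothetical name introduced here.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def random_density_matrix(dim):
    """Build M^dagger M from a random matrix M and normalize its trace to 1."""
    M = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = M.conj().T @ M
    return rho / np.trace(rho).real

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # sample unitary acting on X
big_U = np.kron(np.eye(2), H)                  # identity on Z, H on X

rho = random_density_matrix(4)                 # state of the pair (Z, X)
out = big_U @ rho @ big_U.conj().T             # conjugation by the unitary

trace_out = np.trace(out).real
min_eigenvalue = np.linalg.eigvalsh(out).min()
print(np.isclose(trace_out, 1.0), min_eigenvalue >= -1e-12)
```

Both checks should print `True`: conjugation leaves the trace at one and keeps every eigenvalue nonnegative, up to floating-point error.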

### [Convex combinations of channels](#convex-combinations-of-channels)

Suppose we have two channels $\Phi_0$ and $\Phi_1$ that share the same input system and the same output system. For any real number $p \in [0, 1]$, we could decide to apply $\Phi_0$ with probability $p$ and $\Phi_1$ with probability $1 - p$, which gives us a new channel that can be written as $p\Phi_0 + (1 - p)\Phi_1$. Explicitly, the way that this channel acts on a given density matrix is specified by the following simple equation.

$$
(p\Phi_0 + (1 - p)\Phi_1)(\rho) = p\Phi_0(\rho) + (1 - p)\Phi_1(\rho)
$$

More generally, if we have channels $\Phi_0, \ldots, \Phi_{m-1}$ and a probability vector $(p_0, \ldots, p_{m-1})$, then we can average these channels together to obtain a new channel.

$$
\sum_{k=0}^{m-1} p_k \Phi_k
$$

This is a convex combination of channels, and we always obtain a valid channel through this process. A simple way to say this in mathematical terms is that, for a given choice of an input and output system, the set of all channels is a convex set.

As an example, we could choose to apply one of a collection of unitary operations to a certain system, selecting $U_k$ with probability $p_k$. We obtain what’s known as a mixed unitary channel, which is a channel that can be expressed in the following form.

$$
\Phi(\rho) = \sum_{k=0}^{m-1} p_k U_k \rho U_k^\dagger
$$

Mixed unitary channels in which all of the unitary operations are Pauli matrices (or tensor products of Pauli matrices) are called Pauli channels, and are commonly encountered in quantum computing.
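
Here is a minimal sketch of a single-qubit Pauli channel, written directly from the mixed unitary formula above. The probability vector is an arbitrary choice made for illustration, and `mixed_unitary` is a hypothetical helper name.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def mixed_unitary(probs, unitaries, rho):
    """Return sum_k p_k U_k rho U_k^dagger."""
    return sum(p * (U @ rho @ U.conj().T) for p, U in zip(probs, unitaries))

probs = [0.7, 0.1, 0.1, 0.1]                      # illustrative probability vector
rho = np.array([[1, 0], [0, 0]], dtype=complex)   # the state |0><0|

out = mixed_unitary(probs, [I2, X, Y, Z], rho)
print(out.real)
```

For this input, the identity and $Z$ terms leave $|0\rangle\langle 0|$ unchanged while the $X$ and $Y$ terms map it to $|1\rangle\langle 1|$, so the output is diagonal with entries $0.8$ and $0.2$.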

### [Examples of qubit channels](#examples-of-qubit-channels)

Now we'll take a look at a few specific examples of channels that aren't unitary. For all of these examples, the input and output systems are both single qubits, which is to say that these are examples of qubit channels.

### The qubit reset channel

This channel does something very simple: it resets a qubit to the $|0\rangle$ state. As a linear mapping this channel can be expressed as follows for every qubit density matrix $\rho$:

$$
\Lambda(\rho) = \text{Tr}(\rho)|0\rangle\langle0|
$$

Although the trace of every density matrix $\rho$ is equal to 1, writing the channel in this way makes it clear that it's a linear mapping that could be applied to any $2 \times 2$ matrix, not just a density matrix. As we already observed, we need to understand how channels work as linear mappings on non-density matrix inputs to describe what happens when they're applied to just one part of a compound system.

For example, suppose that $A$ and $B$ are qubits and together the pair $(A, B)$ is in the Bell state $|\phi^+\rangle$. As a density matrix, this state is given by

$$
|\phi^+\rangle\langle\phi^+| = \begin{pmatrix}
\frac{1}{2} & 0 & 0 & \frac{1}{2} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\frac{1}{2} & 0 & 0 & \frac{1}{2}
\end{pmatrix}
$$

Using Dirac notation we can alternatively express this state as follows.

$$
|\phi^+\rangle\langle\phi^+| = \frac{1}{2}|0\rangle\langle0| \otimes |0\rangle\langle0| + \frac{1}{2}|0\rangle\langle1| \otimes |0\rangle\langle1| + \frac{1}{2}|1\rangle\langle0| \otimes |1\rangle\langle0| + \frac{1}{2}|1\rangle\langle1| \otimes |1\rangle\langle1|
$$

By applying the qubit reset channel to $A$ and doing nothing to $B$ we obtain the following state.

$$
\frac{1}{2}\Lambda(|0\rangle\langle0|) \otimes |0\rangle\langle0| + \frac{1}{2}\Lambda(|0\rangle\langle1|) \otimes |0\rangle\langle1| + \frac{1}{2}\Lambda(|1\rangle\langle0|) \otimes |1\rangle\langle0| + \frac{1}{2}\Lambda(|1\rangle\langle1|) \otimes |1\rangle\langle1|
$$

$$
= \frac{1}{2}|0\rangle\langle0| \otimes |0\rangle\langle0| + \frac{1}{2}|0\rangle\langle0| \otimes |1\rangle\langle1| = |0\rangle\langle0| \otimes \frac{\mathbb{I}}{2}
$$

It might be tempting to say that resetting $A$ has had an effect on $B$ — but in some sense it's actually the opposite. Prior to $A$ being reset, the reduced state of $B$ was the completely mixed state, and that doesn't change as a result of resetting $A$.
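
The calculation above can be reproduced numerically. This is a sketch under the same setup as the text (the reset channel acts on the left tensor factor $A$); the helper names `reset` and `basis_op` are hypothetical.

```python
import numpy as np

proj0 = np.array([[1, 0], [0, 0]], dtype=complex)  # |0><0|

def reset(matrix):
    """Lambda(M) = Tr(M) |0><0| for any 2x2 matrix M."""
    return np.trace(matrix) * proj0

def basis_op(a, b):
    """Return the 2x2 matrix |a><b|."""
    e = np.eye(2, dtype=complex)
    return np.outer(e[a], e[b])

# |phi+><phi+| = (1/2) sum_{a,b} |a><b| (x) |a><b|
bell = sum(0.5 * np.kron(basis_op(a, b), basis_op(a, b))
           for a in range(2) for b in range(2))

# Apply Lambda to A (left factor) and do nothing to B (right factor).
after = sum(0.5 * np.kron(reset(basis_op(a, b)), basis_op(a, b))
            for a in range(2) for b in range(2))

def reduced_state_of_B(state):
    """Trace out the left qubit of a two-qubit density matrix."""
    return np.einsum('abac->bc', state.reshape(2, 2, 2, 2))

print(np.allclose(after, np.kron(proj0, np.eye(2) / 2)))
print(np.allclose(reduced_state_of_B(bell), reduced_state_of_B(after)))
```

The second check confirms the point made in the text: the reduced state of $B$ is the completely mixed state both before and after $A$ is reset.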

### The completely dephasing channel

Here's an example of a qubit channel called $\Delta$, described by its action on $2 \times 2$ matrices:

$$
\Delta\begin{pmatrix}
\alpha_{00} & \alpha_{01} \\
\alpha_{10} & \alpha_{11}
\end{pmatrix} =
\begin{pmatrix}
\alpha_{00} & 0 \\
0 & \alpha_{11}
\end{pmatrix}
$$

In words, $\Delta$ zeros out the off-diagonal entries of a $2 \times 2$ matrix. This example can be generalized to arbitrary systems, as opposed to qubits: for whatever density matrix is input, the channel zeros out all of the off-diagonal entries and leaves the diagonal alone.

This channel is called the *completely dephasing channel*, and it can be thought of as representing an extreme form of the process known as *decoherence* — which essentially ruins quantum superpositions and turns them into classical probabilistic states.

Another way to think about this channel is that it describes a standard basis measurement on a qubit, where an input qubit is measured and then discarded, and where the output is a density matrix describing the measurement outcome. (Alternatively, but equivalently, we can imagine that the measurement outcome is discarded, leaving the qubit in its post-measurement state.)

Let us again consider an e-bit, and see what happens when $\Delta$ is applied to just one of the two qubits. Specifically, we have qubits $A$ and $B$ for which $(A, B)$ is in the state $|\phi^+\rangle$, and this time let's apply the channel to the second qubit. Here's the state we obtain.

$$
\frac{1}{2}|0\rangle\langle0| \otimes \Delta(|0\rangle\langle0|) + \frac{1}{2}|0\rangle\langle1| \otimes \Delta(|0\rangle\langle1|) + \frac{1}{2}|1\rangle\langle0| \otimes \Delta(|1\rangle\langle0|) + \frac{1}{2}|1\rangle\langle1| \otimes \Delta(|1\rangle\langle1|)
$$

$$
= \frac{1}{2}|0\rangle\langle0| \otimes |0\rangle\langle0| + \frac{1}{2}|1\rangle\langle1| \otimes |1\rangle\langle1|
$$

Alternatively we can express this equation using block matrices.

$$
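\begin{pmatrix}
\Delta\begin{pmatrix} \frac{1}{2} & 0 \\ 0 & 0 \end{pmatrix} & \Delta\begin{pmatrix} 0 & \frac{1}{2} \\ 0 & 0 \end{pmatrix} \\
\Delta\begin{pmatrix} 0 & 0 \\ \frac{1}{2} & 0 \end{pmatrix} & \Delta\begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}
\end{pmatrix} =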
\begin{pmatrix}
\frac{1}{2} & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & \frac{1}{2}
\end{pmatrix}
$$
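
The same computation can be sketched in NumPy by applying $\Delta$ to each $2 \times 2$ block of the Bell state's density matrix, which corresponds to acting on the right-hand qubit. The function name `dephase` is a hypothetical choice for this example.

```python
import numpy as np

def dephase(matrix):
    """Zero out the off-diagonal entries of a square matrix."""
    return np.diag(np.diag(matrix))

phi_plus = np.array([1, 0, 0, 1]) / np.sqrt(2)
bell = np.outer(phi_plus, phi_plus)          # |phi+><phi+|

# Apply Delta to the second (right-hand) qubit, block by block.
out = np.zeros_like(bell)
for a in range(2):
    for b in range(2):
        block = bell[2 * a:2 * a + 2, 2 * b:2 * b + 2]
        out[2 * a:2 * a + 2, 2 * b:2 * b + 2] = dephase(block)

print(out)
```

The printed matrix has $\frac{1}{2}$ in the top-left and bottom-right corners and zeros elsewhere, matching the result above.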

We can also consider a qubit channel that only slightly dephases a qubit, as opposed to completely dephasing it, which is a less extreme form of decoherence than what is represented by the completely dephasing channel. In particular, suppose that $\epsilon \in (0, 1)$ is a small but nonzero real number. We can define a channel

$$
\Delta_\epsilon = (1 - \epsilon) \text{Id} + \epsilon \Delta,
$$

which transforms a given qubit density matrix $\rho$ like this:

$$
\Delta_\epsilon(\rho) = (1 - \epsilon)\rho + \epsilon \Delta(\rho).
$$

That is, nothing happens with probability $1 - \epsilon$, and with probability $\epsilon$ the qubit dephases. In terms of matrices, this action can be expressed as follows, where the diagonal entries are left alone and the off-diagonal entries are multiplied by $1 - \epsilon$:

$$
\rho =
\begin{pmatrix}
\langle0|\rho|0\rangle & \langle0|\rho|1\rangle \\
\langle1|\rho|0\rangle & \langle1|\rho|1\rangle
\end{pmatrix}
\mapsto
\begin{pmatrix}
\langle0|\rho|0\rangle & (1 - \epsilon)\langle0|\rho|1\rangle \\
(1 - \epsilon)\langle1|\rho|0\rangle & \langle1|\rho|1\rangle
\end{pmatrix}
$$
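
A short sketch of this partially dephasing channel follows. The value of the mixing parameter and the input state $|+\rangle\langle +|$ are arbitrary choices for illustration, and `partial_dephase` is a hypothetical name.

```python
import numpy as np

def partial_dephase(rho, eps):
    """(1 - eps) * rho + eps * Delta(rho), with Delta the completely dephasing map."""
    return (1 - eps) * rho + eps * np.diag(np.diag(rho))

rho_plus = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|
eps = 0.1

out = partial_dephase(rho_plus, eps)
print(out)
```

The diagonal entries stay at $0.5$, while the off-diagonal entries shrink from $0.5$ to $0.45$, which is the original value multiplied by $1 - 0.1$.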

---

### The completely depolarizing channel

Here's another example of a qubit channel called $\Omega$:

$$
\Omega(\rho) = \text{Tr}(\rho)\frac{\mathbb{I}}{2}
$$

Here $\mathbb{I}$ denotes the $2 \times 2$ identity matrix. In words, for any density matrix input $\rho$, the channel $\Omega$ outputs the completely mixed state. It doesn’t get any noisier than this! This channel is called the *completely depolarizing channel*, and like the completely dephasing channel it can be generalized to arbitrary systems in place of qubits.

We can also consider a less extreme variant of this channel where depolarizing happens with probability $\epsilon$, similar to what we saw for the dephasing channel.

$$
\Omega_\epsilon(\rho) = (1 - \epsilon)\rho + \epsilon \Omega(\rho).
$$
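
As a small numerical sketch, the partially depolarizing channel defined above can be written in a couple of lines. The mixing probability `eps = 0.2` and the input $|0\rangle\langle 0|$ are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def depolarize(matrix):
    """Omega(M) = Tr(M) * I / 2 for any 2x2 matrix M."""
    return np.trace(matrix) * np.eye(2) / 2

def partial_depolarize(rho, eps):
    """Leave rho alone with probability 1 - eps, depolarize with probability eps."""
    return (1 - eps) * rho + eps * depolarize(rho)

rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
eps = 0.2

out = partial_depolarize(rho0, eps)
print(out)
```

With these values the output is diagonal with entries $0.9$ and $0.1$: the state is pulled part of the way toward the completely mixed state.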

## [Channel representations](#channel-representations)

Next we’ll discuss mathematical representations of channels, starting with a basic issue that was suggested at the start of the lesson. Linear mappings from vectors to vectors can be represented by matrices in a familiar way, where the action of the linear mapping is described by matrix-vector multiplication. But channels are linear mappings from matrices to matrices, not vectors to vectors. So, *in general*, how can we express channels in mathematical terms?

For some channels we may have a simple formula that describes them, like for the three examples of non-unitary qubit channels described in the previous section. But an arbitrary channel may not have such a nice formula, so it isn’t practical in general to express a channel in this way. As a point of comparison, in the simplified formulation of quantum information we use *unitary matrices* to represent operations on quantum state vectors: every unitary matrix represents a valid operation and *every* valid operation can be expressed as a unitary matrix. In essence, the question being asked is: How is this done for channels?

The answer to this question is that there are in fact multiple ways to represent channels in mathematical terms. We’ll discuss three specific ways of representing channels, named after three individuals whose work was important to their development: Stinespring, Kraus, and Choi.



### [Stinespring representations](#stinespring-representations)

Stinespring representations are based on the idea that every channel can be implemented in a standard way, where an input system is first combined with an initialized workspace system, forming a compound system; then a unitary operation is performed on the compound system; and finally the workspace system is discarded (or traced out), leaving the output of the channel.

The following figure depicts such an implementation in the form of a circuit diagram, for a channel whose input and output systems are both the system $\mathbf{X}$.

![Stinespring-X-to-X.png](attachment:Stinespring-X-to-X.png)

Note that in this diagram the wires represent arbitrary systems, as indicated by the labels above the wires, and not necessarily single qubits.

In words, the way the implementation works is as follows. The input system $\mathbf{X}$ begins in some state $\rho$, while a workspace system $\mathbf{W}$ is initialized to the standard basis state $|0\rangle$. (We’re presuming that $0$ is a classical state of $\mathbf{W}$ and we choose it to be the initialized state of this system, which will help to simplify the mathematics. One could, however, choose any fixed pure state to represent the initialized state of $\mathbf{W}$ without changing the basic properties of the representation.) A unitary operation $U$ is performed on the pair $(\mathbf{W}, \mathbf{X})$, and finally the workspace system $\mathbf{W}$ is *traced out*, leaving $\mathbf{X}$ as the output. In the diagram, the *ground* symbol commonly used in electrical engineering indicates explicitly that $\mathbf{W}$ is discarded.

A mathematical expression of the resulting channel $\Phi$ is as follows.

$$
\Phi(\rho) = \text{Tr}_\mathbf{W}(U(|0\rangle \langle 0|_\mathbf{W} \otimes \rho) U^\dagger)
$$

Notice that, as usual, we’re using Qiskit’s ordering convention: the system $\mathbf{X}$ is on top in the diagram, and therefore corresponds to the right-hand tensor factor in the formula.

In general, the input and output systems of a channel need not be the same. Here’s a figure depicting an implementation of a channel $\Phi$ whose input system is $\mathbf{X}$ and whose output system is $\mathbf{Y}$.


![Stinespring-X-to-Y.png](attachment:Stinespring-X-to-Y.png)

This time the unitary operation transforms $(\mathbf{W}, \mathbf{X})$ into a pair $(\mathbf{G}, \mathbf{Y})$, where $\mathbf{G}$ is a new “garbage” system that gets traced out, leaving $\mathbf{Y}$ as the output system. In order for $U$ to be unitary, it must be a square matrix. This requires that the pair $(\mathbf{G}, \mathbf{Y})$ has the same number of classical states as the pair $(\mathbf{W}, \mathbf{X})$, and so the systems $\mathbf{W}$ and $\mathbf{G}$ must be chosen in a way that allows this. We obtain a similar mathematical expression of the resulting channel $\Phi$ to what we had before.

$$
\Phi(\rho) = \text{Tr}_\mathbf{G}(U(|0\rangle \langle 0|_\mathbf{W} \otimes \rho) U^\dagger)
$$

When a channel is described in this way, as a unitary operation along with a specification of how the workspace system is initialized and how the output system is selected, we say that it is expressed in **Stinespring form** or that it's a **Stinespring representation** of the channel. It's not at all obvious, but every channel does in fact have a Stinespring representation, as we will see by the end of the lesson. We’ll also see that Stinespring representations aren’t unique; there will always be different ways to implement the same channel in the manner that’s been described.
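
To make the formula concrete, here is a minimal NumPy sketch that evaluates a Stinespring expression for a given unitary. The function `stinespring_apply` and its parameters `dim_w` and `dim_g` are hypothetical names. The choice of a SWAP gate as the unitary is an illustration that is not taken from the text; it happens to implement the qubit reset channel described earlier.

```python
import numpy as np

def stinespring_apply(U, rho, dim_w, dim_g):
    """Evaluate Tr_G( U (|0><0|_W (x) rho) U^dagger ).

    W and G are the left-hand tensor factors before and after U is applied.
    """
    ket0 = np.zeros(dim_w)
    ket0[0] = 1.0
    init = np.kron(np.outer(ket0, ket0), rho)      # state of (W, X)
    evolved = U @ init @ U.conj().T                 # state of (G, Y)
    dim_y = evolved.shape[0] // dim_g
    tensor = evolved.reshape(dim_g, dim_y, dim_g, dim_y)
    return np.einsum('gygz->yz', tensor)            # partial trace over G

# Illustrative unitary: SWAP on two qubits.
swap = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
out = stinespring_apply(swap, rho, dim_w=2, dim_g=2)
print(np.allclose(out, [[1, 0], [0, 0]]))
```

Swapping the input with the initialized workspace and discarding what used to be the input leaves $|0\rangle\langle 0|$ regardless of $\rho$, which is the action of the reset channel.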

> **Remark.** In the context of quantum information, the term *Stinespring representation* commonly refers to a slightly more general expression of a channel having the form
>
> $$
> \Phi(\rho) = \text{Tr}_\mathbf{G}(A \rho A^\dagger)
> $$
>
> for an **isometry** $A$, which is a matrix whose columns are orthonormal but that might not be a square matrix. For Stinespring representations having the form that we’ve adopted as a definition, we can obtain an expression of this other form by taking $A = U(|0\rangle_\mathbf{W} \otimes \mathbb{I}_\mathbf{X})$, so that 
>
> $$
> A \rho A^\dagger = U(|0\rangle \langle 0|_\mathbf{W} \otimes \rho) U^\dagger.
> $$
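
The relationship in the remark can be checked numerically. In this sketch the unitary is a random one obtained from a QR decomposition, and the input state is an arbitrary qubit density matrix; both are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
dim_w, dim_x = 2, 2

# A random unitary on (W, X), taken from the QR decomposition of a random matrix.
G = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
U, _ = np.linalg.qr(G)

ket0 = np.zeros((dim_w, 1))
ket0[0, 0] = 1.0
A = U @ np.kron(ket0, np.eye(dim_x))       # A = U (|0>_W (x) I_X), a 4x2 matrix

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])

lhs = A @ rho @ A.conj().T
rhs = U @ np.kron(ket0 @ ket0.T, rho) @ U.conj().T

print(A.shape)
print(np.allclose(A.conj().T @ A, np.eye(dim_x)))   # columns of A are orthonormal
print(np.allclose(lhs, rhs))
```

The matrix `A` is not square, but its columns are orthonormal, and conjugating $\rho$ by it gives the same matrix as the unitary expression.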

---

### Example: completely dephasing channel

Here’s a Stinespring representation of the completely dephasing channel $\Delta$ for a qubit. Both wires in this diagram represent single qubits, so this is an ordinary quantum circuit diagram.


![dephasing-circuit.png](attachment:dephasing-circuit.png)

To see that the effect that this circuit has on the input qubit is indeed described by the completely dephasing channel, we can go through the circuit one step at a time, using the explicit matrix representation of the partial trace discussed in the previous lesson. We’ll refer to the top qubit as $X$ — this is the input and output of the channel — and we’ll assume that $X$ starts out in some arbitrary state $\rho$.

The first step is the introduction of a workspace qubit $W$. Prior to the controlled-NOT gate being performed, the state of the pair $(W, X)$ is represented by the following density matrix.

$$
|0\rangle \langle 0|_W \otimes \rho = \begin{pmatrix}1 & 0\\ 0 & 0\end{pmatrix} \otimes \begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix} = 
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & 0 & 0 \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
$$

As per Qiskit’s ordering convention, the top qubit $X$ is on the right and the bottom qubit $W$ is on the left. We're using density matrices rather than quantum state vectors, but they're tensored together in a similar way to what’s done in the simplified formulation of quantum information.

The next step is to perform the controlled-NOT operation, where $X$ is the control and $W$ is the target. Still keeping in mind the Qiskit ordering convention, the matrix representation of this gate is as follows:

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
$$

This is a unitary operation, and to apply it to a density matrix we conjugate the density matrix by the unitary matrix. The conjugate-transpose doesn’t happen to change this particular matrix, so the result is as follows.

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & 0 & 0 \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0
\end{pmatrix}
=
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 & 0 & \langle 0|\rho|1\rangle \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\langle 1|\rho|0\rangle & 0 & 0 & \langle 1|\rho|1\rangle
\end{pmatrix}
$$

Finally, the partial trace is performed on $W$. Recalling the action of this operation on $4 \times 4$ matrices, which was described in the previous lesson, we obtain the following density matrix output.

$$
\text{Tr}_W\left(
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 & 0 & \langle 0|\rho|1\rangle \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\langle 1|\rho|0\rangle & 0 & 0 & \langle 1|\rho|1\rangle
\end{pmatrix}
\right)
=
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 \\
0 & 0
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 \\
0 & \langle 1|\rho|1\rangle
\end{pmatrix}
=
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 \\
0 & \langle 1|\rho|1\rangle
\end{pmatrix}
= \Delta(\rho)
$$

We can alternatively compute the partial trace by first converting to Dirac notation.

$$
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 & 0 & \langle 0|\rho|1\rangle \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
\langle 1|\rho|0\rangle & 0 & 0 & \langle 1|\rho|1\rangle
\end{pmatrix}
=
\langle 0|\rho|0\rangle |0\rangle \langle 0| \otimes |0\rangle \langle 0|
+ \langle 0|\rho|1\rangle |0\rangle \langle 1| \otimes |0\rangle \langle 1|
+ \langle 1|\rho|0\rangle |1\rangle \langle 0| \otimes |1\rangle \langle 0|
+ \langle 1|\rho|1\rangle |1\rangle \langle 1| \otimes |1\rangle \langle 1|
$$

Tracing out the qubit on the left-hand side yields the same answer as before.

$$
\langle 0|\rho|0\rangle |0\rangle \langle 0| + \langle 1|\rho|1\rangle |1\rangle \langle 1| = \Delta(\rho)
$$

An intuitive way to think about this circuit is that the controlled-NOT operation effectively copies the classical state of the input qubit, and when the copy is thrown in the trash the input qubit "collapses" probabilistically to one of the two possible classical states, which is equivalent to complete dephasing.
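The step-by-step calculation above can be checked numerically. The following is a minimal NumPy sketch rather than a definitive implementation: the helper name `partial_trace_left` and the particular entries of `rho` are assumptions made for illustration, and the tensor ordering follows the convention used above ($W$ on the left, $X$ on the right).

```python
import numpy as np

def partial_trace_left(M, d_left, d_right):
    """Trace out the left tensor factor of a square matrix of size d_left * d_right."""
    T = M.reshape(d_left, d_right, d_left, d_right)
    # Sum the diagonal blocks: entries with equal left-factor indices.
    return np.einsum("ixiy->xy", T)

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

ket0 = np.array([[1.0], [0.0]])
state_WX = np.kron(ket0 @ ket0.conj().T, rho)  # |0><0|_W tensor rho

# Controlled-NOT with X (right factor) as control and W (left factor) as target.
CX = np.array([[1, 0, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0],
               [0, 1, 0, 0]], dtype=complex)

after_gate = CX @ state_WX @ CX.conj().T
output = partial_trace_left(after_gate, 2, 2)

print(np.round(output, 3))  # off-diagonal entries vanish, diagonal entries of rho remain
```

The output keeps the diagonal entries of `rho` and zeroes out the off-diagonal ones, which is the action of $\Delta$.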


### Example: completely dephasing channel (alternative)

The circuit described above is not the only way to implement the completely dephasing channel. Here's a different way to do it.

![dephasing-alternative.png](attachment:dephasing-alternative.png)

Here's a quick analysis showing that this implementation works. After the Hadamard gate is performed we have this two-qubit state as a density matrix:

$$
|+\rangle \langle +| \otimes \rho = \frac{1}{2} \begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix} \otimes \begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
=
\frac{1}{2}
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle \\
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
$$

The controlled-$\sigma_z$ gate operates by conjugation as follows.

$$
\frac{1}{2}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle \\
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1
\end{pmatrix}
=
\frac{1}{2}
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & -\langle 1|\rho|1\rangle \\
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
-\langle 1|\rho|0\rangle & -\langle 1|\rho|1\rangle & -\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
$$

Finally, the workspace system $W$ is traced out.

$$
\frac{1}{2}
\text{Tr}_W\left(
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle & -\langle 1|\rho|1\rangle \\
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
-\langle 1|\rho|0\rangle & -\langle 1|\rho|1\rangle & -\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
\right)
=
\frac{1}{2}
\begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
+
\frac{1}{2}
\begin{pmatrix}
\langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
-\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
=
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 \\
0 & \langle 1|\rho|1\rangle
\end{pmatrix}
= \Delta(\rho)
$$

This implementation is based on a simple idea: dephasing is equivalent to either doing nothing (i.e., applying an identity operation) or applying a $\sigma_z$ gate, each with probability $1/2$.

$$
\frac{1}{2}\rho + \frac{1}{2} \sigma_z \rho \sigma_z = \frac{1}{2} \begin{pmatrix}
\langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\
\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
+
\frac{1}{2} \begin{pmatrix}
\langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\
-\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle
\end{pmatrix}
=
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 \\
0 & \langle 1|\rho|1\rangle
\end{pmatrix}
= \Delta(\rho)
$$

That is, the completely dephasing channel is an example of a mixed-unitary channel, and more specifically a Pauli channel.
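As a sanity check on this alternative implementation, here is a short NumPy sketch that compares the circuit (workspace qubit in the $|+\rangle$ state, controlled-$\sigma_z$, then tracing out $W$) with the mixed-unitary formula. The variable names and the entries of `rho` are illustrative assumptions.

```python
import numpy as np

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

# Workspace qubit W (left factor) after the Hadamard gate is in the state |+>.
plus = np.array([[1.0], [1.0]]) / np.sqrt(2)
state_WX = np.kron(plus @ plus.T, rho)

# Controlled-sigma_z is diagonal, so it is symmetric in control and target.
CZ = np.diag([1.0, 1.0, 1.0, -1.0])
after_gate = CZ @ state_WX @ CZ.conj().T

# Tracing out the left factor adds the top-left and bottom-right 2x2 blocks.
circuit_output = after_gate[:2, :2] + after_gate[2:, 2:]

# Mixed-unitary form: identity or sigma_z, each with probability 1/2.
Z = np.diag([1.0, -1.0])
mixed_unitary_output = 0.5 * rho + 0.5 * (Z @ rho @ Z)

print(np.allclose(circuit_output, mixed_unitary_output))
```

Both computations should agree with the diagonal part of `rho`.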

### Example: qubit reset channel

The qubit reset channel can be implemented as follows.


![reset-stinespring-1.png](attachment:reset-stinespring-1.png)

The swap gate simply shifts the $|0\rangle$ initialized state of the workspace qubit so that it gets output, while the input state $\rho$ gets moved to the bottom qubit and then traced out.

Alternatively, if we don't demand that the output of the channel is left on top, we can take this very simple circuit as our representation.

![reset-stinespring-2.png](attachment:reset-stinespring-2.png)

In words, resetting a qubit to the $|0\rangle$ state is equivalent to throwing the qubit in the trash and getting a new one.
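The swap-based implementation can be checked in the same way. This is a small sketch under the same ordering convention as before, with the bottom qubit as the left tensor factor; the variable names and the entries of `rho` are assumptions for illustration.

```python
import numpy as np

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

zero_projector = np.diag([1.0, 0.0])          # |0><0|
state = np.kron(zero_projector, rho)          # bottom qubit |0><0|, top qubit rho

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)

after_swap = SWAP @ state @ SWAP.conj().T     # now rho sits on the bottom qubit

# Trace out the bottom qubit (left factor): add the diagonal 2x2 blocks.
output = after_swap[:2, :2] + after_swap[2:, 2:]

print(np.round(output, 3))
```

Whatever `rho` is chosen, the top qubit ends up in the state $|0\rangle\langle 0|$.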



### [Kraus representations](#kraus-representations)

Now we’ll discuss **Kraus representations**, which offer a convenient formulaic way to express the action of a channel through matrix multiplication and addition. In particular, a Kraus representation is a specification of a channel $\Phi$ in the following form.

$$
\Phi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
$$

Here, $A_0, \ldots, A_{N-1}$ are matrices that all have the same dimensions: their columns correspond to the classical states of the input system $X$ and their rows correspond to the classical states of the output system, whether it’s $X$ or some other system $Y$. In order for $\Phi$ to be a valid channel these matrices must satisfy the following condition.

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$

(This condition is equivalent to the condition that $\Phi$ preserves trace. The other property required of a channel — which is complete positivity — follows from the general form of the equation for $\Phi$, as a sum of conjugations.)

Sometimes it's convenient to name the matrices $A_0, \ldots, A_{N-1}$ in a different way. For instance, we could number them starting from 1, or we could use states in some arbitrary classical state set $T$ instead of numbers as subscripts:

$$
\Phi(\rho) = \sum_{a \in T} A_a \rho A_a^\dagger \quad \text{where} \quad \sum_{a \in T} A_a^\dagger A_a = \mathbb{I}
$$

These different ways of naming these matrices, which are called **Kraus matrices**, are all common and can be convenient in different situations — but we’ll stick with the names $A_0, \ldots, A_{N-1}$ in this lesson for the sake of simplicity.

The number $N$ can be an arbitrary positive integer, but it never needs to be too large: if the input system $X$ has $n$ classical states and the output system $Y$ has $m$ classical states, then any given channel from $X$ to $Y$ will always have a Kraus representation for which $N$ is at most the product $nm$.

---

### Example: completely dephasing channel

We obtain a Kraus representation of the completely dephasing channel by taking $A_0 = |0\rangle \langle 0|$ and $A_1 = |1\rangle \langle 1|$.

$$
\sum_{k=0}^{1} A_k \rho A_k^\dagger = |0\rangle \langle 0| \rho |0\rangle \langle 0| + |1\rangle \langle 1| \rho |1\rangle \langle 1| 
= \langle 0|\rho|0\rangle |0\rangle \langle 0| + \langle 1|\rho|1\rangle |1\rangle \langle 1| = 
\begin{pmatrix}
\langle 0|\rho|0\rangle & 0 \\
0 & \langle 1|\rho|1\rangle
\end{pmatrix}
$$

These matrices satisfy the required condition.

$$
\sum_{k=0}^{1} A_k^\dagger A_k = |0\rangle \langle 0| + |1\rangle \langle 1| = \mathbb{I}
$$

Alternatively we can take $A_0 = \frac{1}{\sqrt{2}} \mathbb{I}$ and $A_1 = \frac{1}{\sqrt{2}} \sigma_z$, so that

$$
\sum_{k=0}^{1} A_k \rho A_k^\dagger = \frac{1}{2} \rho + \frac{1}{2} \sigma_z \rho \sigma_z = \Delta(\rho)
$$

as was computed previously. This time the required condition can be verified as follows:

$$
\sum_{k=0}^{1} A_k^\dagger A_k = \frac{1}{2} \mathbb{I}^2 + \frac{1}{2} \sigma_z^2 = \frac{1}{2} \mathbb{I} + \frac{1}{2} \mathbb{I} = \mathbb{I}
$$
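The two Kraus representations of $\Delta$ above can be compared numerically. In this sketch the helper names `apply_kraus` and `satisfies_kraus_condition` are hypothetical, chosen only for illustration, and `rho` is an arbitrary example state.

```python
import numpy as np

def apply_kraus(kraus_ops, rho):
    """Evaluate sum_k A_k rho A_k^dagger."""
    return sum(A @ rho @ A.conj().T for A in kraus_ops)

def satisfies_kraus_condition(kraus_ops):
    """Check that sum_k A_k^dagger A_k is the identity on the input system."""
    n = kraus_ops[0].shape[1]  # columns correspond to input classical states
    total = sum(A.conj().T @ A for A in kraus_ops)
    return np.allclose(total, np.eye(n))

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

projectors = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
pauli_based = [np.eye(2) / np.sqrt(2), np.diag([1.0, -1.0]) / np.sqrt(2)]

out_projectors = apply_kraus(projectors, rho)
out_pauli = apply_kraus(pauli_based, rho)

print(np.allclose(out_projectors, out_pauli))
```

Different lists of Kraus matrices can therefore describe the same channel, which is why a Kraus representation is not unique.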

### Example: qubit reset channel

We obtain a Kraus representation of the qubit reset channel by taking $A_0 = |0\rangle \langle 0|$ and $A_1 = |0\rangle \langle 1|$.

$$
\sum_{k=0}^1 A_k \rho A_k^\dagger = |0\rangle \langle 0| \rho |0\rangle \langle 0| + |0\rangle \langle 1| \rho |1\rangle \langle 0| 
= \langle 0|\rho|0\rangle |0\rangle \langle 0| + \langle 1|\rho|1\rangle |0\rangle \langle 0| = \text{Tr}(\rho) |0\rangle \langle 0|
$$

These matrices satisfy the required condition.

$$
\sum_{k=0}^1 A_k^\dagger A_k = |0\rangle \langle 0| + |1\rangle \langle 1| = \mathbb{I}
$$

---

### Example: completely depolarizing channel

One way to obtain a Kraus representation for the completely depolarizing channel is to choose Kraus matrices $A_0, \ldots, A_3$ as follows.

$$
A_0 = \frac{|0\rangle \langle 0|}{\sqrt{2}}, \quad
A_1 = \frac{|0\rangle \langle 1|}{\sqrt{2}}, \quad
A_2 = \frac{|1\rangle \langle 0|}{\sqrt{2}}, \quad
A_3 = \frac{|1\rangle \langle 1|}{\sqrt{2}}
$$

For any qubit density matrix $\rho$ we then have:

$$
\sum_{k=0}^3 A_k \rho A_k^\dagger 
= \frac{1}{2} \left( \langle 0|\rho|0\rangle |0\rangle \langle 0| + \langle 1|\rho|1\rangle |0\rangle \langle 0| + \langle 0|\rho|0\rangle |1\rangle \langle 1| + \langle 1|\rho|1\rangle |1\rangle \langle 1| \right)
= \text{Tr}(\rho) \frac{\mathbb{I}}{2} = \Omega(\rho)
$$

An alternative Kraus representation is obtained by choosing Kraus matrices like so:

$$
A_0 = \frac{\mathbb{I}}{2}, \quad
A_1 = \frac{\sigma_x}{2}, \quad
A_2 = \frac{\sigma_y}{2}, \quad
A_3 = \frac{\sigma_z}{2}
$$

To verify that these Kraus matrices do in fact represent the completely depolarizing channel, let's first observe that conjugating an arbitrary $2 \times 2$ matrix by a Pauli matrix works as follows.

$$
\sigma_x \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} \sigma_x 
= \begin{pmatrix} \alpha_{11} & \alpha_{10} \\ \alpha_{01} & \alpha_{00} \end{pmatrix}
$$

$$
\sigma_y \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} \sigma_y 
= \begin{pmatrix} \alpha_{11} & -\alpha_{10} \\ -\alpha_{01} & \alpha_{00} \end{pmatrix}
$$

$$
\sigma_z \begin{pmatrix} \alpha_{00} & \alpha_{01} \\ \alpha_{10} & \alpha_{11} \end{pmatrix} \sigma_z 
= \begin{pmatrix} \alpha_{00} & -\alpha_{01} \\ -\alpha_{10} & \alpha_{11} \end{pmatrix}
$$

This allows us to verify the correctness of our Kraus representation:

$$
\sum_{k=0}^3 A_k \rho A_k^\dagger 
= \frac{1}{4} \left( \rho + \sigma_x \rho \sigma_x + \sigma_y \rho \sigma_y + \sigma_z \rho \sigma_z \right)
$$

$$
= \frac{1}{4} \left( \begin{pmatrix} \langle 0|\rho|0\rangle & \langle 0|\rho|1\rangle \\ \langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle \end{pmatrix}
+ \begin{pmatrix} \langle 1|\rho|1\rangle & \langle 1|\rho|0\rangle \\ \langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle \end{pmatrix}
+ \begin{pmatrix} \langle 1|\rho|1\rangle & -\langle 1|\rho|0\rangle \\ -\langle 0|\rho|1\rangle & \langle 0|\rho|0\rangle \end{pmatrix}
+ \begin{pmatrix} \langle 0|\rho|0\rangle & -\langle 0|\rho|1\rangle \\ -\langle 1|\rho|0\rangle & \langle 1|\rho|1\rangle \end{pmatrix} \right)
$$

$$
= \text{Tr}(\rho) \frac{\mathbb{I}}{2}
$$

This Kraus representation expresses an important idea, which is that the state of a qubit can be completely randomized by applying to it one of the four Pauli matrices (including the identity matrix) chosen uniformly at random. Thus, the completely depolarizing channel is another example of a Pauli channel.

It is not possible to find a Kraus representation for $\Omega$ having three (or fewer) Kraus matrices — at least four are required.
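Both Kraus representations of $\Omega$ given above can be checked against the formula $\text{Tr}(\rho)\,\mathbb{I}/2$. The following NumPy sketch does this for one example state; the names `matrix_units`, `paulis`, and `apply_kraus` are illustrative assumptions.

```python
import numpy as np

def apply_kraus(kraus_ops, rho):
    """Evaluate sum_k A_k rho A_k^dagger."""
    return sum(A @ rho @ A.conj().T for A in kraus_ops)

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

# First representation: the four matrices |a><b| / sqrt(2).
matrix_units = []
for a in range(2):
    for b in range(2):
        E = np.zeros((2, 2), dtype=complex)
        E[a, b] = 1 / np.sqrt(2)
        matrix_units.append(E)

# Second representation: the four Pauli matrices divided by 2.
paulis = [P / 2 for P in (I2, X, Y, Z)]

expected = np.trace(rho) * I2 / 2
out_units = apply_kraus(matrix_units, rho)
out_paulis = apply_kraus(paulis, rho)

print(np.allclose(out_units, expected), np.allclose(out_paulis, expected))
```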

### Example: unitary channels

If we have a unitary matrix $U$ representing an operation on a system $X$, we can express the action of this unitary operation as a channel:

$$
\Phi(\rho) = U\rho U^\dagger.
$$

This expression is already a valid Kraus representation of the channel $\Phi$ where we happen to have just one Kraus matrix $A_0 = U$. In this case, the required condition

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$

takes the much simpler form $U^\dagger U = \mathbb{I}_X$, which we know is true because $U$ is unitary.



### [Choi representations](#choi-representations)

Now we'll discuss a third way that channels can be described, through the **Choi representation**. The way it works is that each channel is represented by a single matrix known as its **Choi matrix**. If the input system has $n$ classical states and the output system has $m$ classical states, then the Choi matrix of the channel will have $nm$ rows and $nm$ columns.

Choi matrices provide a **faithful** representation of channels, meaning that two channels are the same if and only if they have the same Choi matrix. One reason why this is important is that it provides us with a way of determining whether two different descriptions correspond to the same channel or to different channels — we simply compute the Choi matrices and compare them to see if they're equal. In contrast, Stinespring and Kraus representations are not unique in this way, as we have seen. Choi matrices are also useful in other regards for uncovering various mathematical properties of channels.

---

### Definition

Let $\Phi$ be a channel from a system $X$ to a system $Y$, and assume that the classical state set of the input system $X$ is $\Sigma$. The Choi representation of $\Phi$, which is denoted $J(\Phi)$, is defined by the following equation:

$$
J(\Phi) = \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes \Phi(|a\rangle \langle b|)
$$

If we assume that $\Sigma = \{0, \ldots, n-1\}$ for some positive integer $n$, then we can alternatively express $J(\Phi)$ as a block matrix:

$$
J(\Phi) =
\begin{pmatrix}
\Phi(|0\rangle \langle 0|) & \Phi(|0\rangle \langle 1|) & \cdots & \Phi(|0\rangle \langle n-1|) \\
\Phi(|1\rangle \langle 0|) & \Phi(|1\rangle \langle 1|) & \cdots & \Phi(|1\rangle \langle n-1|) \\
\vdots & \vdots & \ddots & \vdots \\
\Phi(|n-1\rangle \langle 0|) & \Phi(|n-1\rangle \langle 1|) & \cdots & \Phi(|n-1\rangle \langle n-1|)
\end{pmatrix}
$$

That is, as a block matrix the Choi matrix of a channel has one block $\Phi(|a\rangle \langle b|)$ for each pair $(a, b)$ of classical states of the input system, with the blocks arranged in a natural way. Notice that the set $\{|a\rangle \langle b| : 0 \leq a, b < n\}$ forms a basis for the space of all $n \times n$ matrices — and because $\Phi$ is linear, it follows that its action can be recovered from its Choi matrix by taking linear combinations of the blocks.
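To make the block structure concrete, here is a sketch that builds $J(\Phi)$ from a channel given as a Python function, and then recovers the channel's action from the blocks by linearity. The function names `choi_matrix` and `apply_from_choi` are hypothetical, and the completely dephasing channel is used as the example.

```python
import numpy as np

def choi_matrix(channel, n):
    """Build J(Phi) as an n x n grid of blocks Phi(|a><b|)."""
    blocks = []
    for a in range(n):
        row = []
        for b in range(n):
            E_ab = np.zeros((n, n), dtype=complex)
            E_ab[a, b] = 1.0
            row.append(channel(E_ab))
        blocks.append(row)
    return np.block(blocks)

def apply_from_choi(J, rho, n, m):
    """Recover Phi(rho) = sum_{a,b} <a|rho|b> Phi(|a><b|) from the blocks of J."""
    out = np.zeros((m, m), dtype=complex)
    for a in range(n):
        for b in range(n):
            out += rho[a, b] * J[a * m:(a + 1) * m, b * m:(b + 1) * m]
    return out

def dephase(M):
    """Completely dephasing channel: keep the diagonal, zero out the rest."""
    return np.diag(np.diag(M))

# An arbitrary qubit density matrix (illustrative values only).
rho = np.array([[0.7, 0.2 - 0.1j],
                [0.2 + 0.1j, 0.3]])

J = choi_matrix(dephase, 2)
recovered = apply_from_choi(J, rho, 2, 2)

print(np.allclose(recovered, dephase(rho)))
```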

---

### The Choi state of a channel

Another way to think about the Choi matrix of a channel is that it's a density matrix if we divide by $n = |\Sigma|$. Let's focus on the situation that $\Sigma = \{0, \ldots, n-1\}$ for simplicity, and imagine that we have two identical copies of $X$ that are together in the entangled state

$$
|\psi\rangle = \frac{1}{\sqrt{n}} \sum_{a=0}^{n-1} |a\rangle \otimes |a\rangle.
$$

As a density matrix this state is as follows.

$$
|\psi\rangle \langle \psi| = \frac{1}{n} \sum_{a,b=0}^{n-1} |a\rangle \langle b| \otimes |a\rangle \langle b|
$$

If we apply the channel $\Phi$ to the copy of $X$ on the right-hand side, we obtain the Choi matrix divided by $n$.

$$
(\text{Id} \otimes \Phi)(|\psi\rangle \langle \psi|) = \frac{1}{n} \sum_{a,b=0}^{n-1} |a\rangle \langle b| \otimes \Phi(|a\rangle \langle b|) = \frac{J(\Phi)}{n}
$$

In words, up to a normalization factor $1/n$, the Choi matrix of $\Phi$ is the density matrix we obtain by evaluating $\Phi$ on one-half of a *maximally entangled* pair of input systems, as the following figure depicts. Notice in particular that this implies that the Choi matrix of a channel must always be positive semidefinite.


![Choi-state.png](attachment:Choi-state.png)

We also see that because the channel $\Phi$ is applied to the second (or top) system alone, it cannot affect the reduced state of the first (or bottom) system. In the case at hand that state is the completely mixed state $\mathbb{I}_X/n$, and therefore

$$
\text{Tr}_Y\left(\frac{J(\Phi)}{n}\right) = \frac{\mathbb{I}_X}{n}
$$

Clearing the denominator $n$ from both sides yields $\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X$.

We can alternatively draw this same conclusion by using the fact that channels must always preserve trace, and therefore

$$
\text{Tr}_Y(J(\Phi)) = \sum_{a,b \in \Sigma} \text{Tr}(\Phi(|a\rangle \langle b|)) |a\rangle \langle b| = \sum_{a,b \in \Sigma} \text{Tr}(|a\rangle \langle b|) |a\rangle \langle b| = \sum_{a \in \Sigma} |a\rangle \langle a| = \mathbb{I}_X
$$

In summary, the Choi representation $J(\Phi)$ for any channel $\Phi$ must be positive semidefinite and must satisfy

$$
\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X
$$
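These two properties can be observed numerically. The sketch below applies the qubit reset channel, through its Kraus matrices $|0\rangle\langle 0|$ and $|0\rangle\langle 1|$, to the right-hand half of a maximally entangled pair, and then checks positivity and the partial trace condition. All variable names are assumptions for illustration.

```python
import numpy as np

n = m = 2  # numbers of classical states of X and Y

# Kraus matrices of the qubit reset channel.
kraus = [np.array([[1, 0], [0, 0]], dtype=complex),   # |0><0|
         np.array([[0, 1], [0, 0]], dtype=complex)]   # |0><1|

# |psi> = (1/sqrt(n)) sum_a |a>|a>, as a density matrix.
psi = np.zeros((n * n, 1), dtype=complex)
for a in range(n):
    psi[a * n + a, 0] = 1 / np.sqrt(n)
psi_dm = psi @ psi.conj().T

# Apply the channel to the right-hand copy of X only.
choi_state = sum(np.kron(np.eye(n), A) @ psi_dm @ np.kron(np.eye(n), A).conj().T
                 for A in kraus)
J = n * choi_state

eigenvalues = np.linalg.eigvalsh(J)                        # nonnegative if J >= 0
reduced = np.einsum("aibi->ab", J.reshape(n, m, n, m))     # Tr_Y(J)

print(np.round(eigenvalues, 6))
print(np.round(reduced, 6))
```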

---

### Example: the completely dephasing channel

The Choi representation of the completely dephasing channel $\Delta$ is

$$
J(\Delta) = \sum_{a,b=0}^{1} |a\rangle \langle b| \otimes \Delta(|a\rangle \langle b|) = \sum_{a=0}^{1} |a\rangle \langle a| \otimes |a\rangle \langle a| =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

---

### Example: the completely depolarizing channel

The Choi representation of the completely depolarizing channel is

$$
J(\Omega) = \sum_{a,b=0}^{1} |a\rangle \langle b| \otimes \Omega(|a\rangle \langle b|) = \sum_{a=0}^{1} |a\rangle \langle a| \otimes \frac{1}{2}\mathbb{I} = \frac{1}{2} \mathbb{I} \otimes \mathbb{I} =
\begin{pmatrix}
\frac{1}{2} & 0 & 0 & 0 \\
0 & \frac{1}{2} & 0 & 0 \\
0 & 0 & \frac{1}{2} & 0 \\
0 & 0 & 0 & \frac{1}{2}
\end{pmatrix}
$$

---

### Example: the qubit reset channel

The Choi representation of the qubit reset channel $\Lambda$ is

$$
J(\Lambda) = \sum_{a,b=0}^{1} |a\rangle \langle b| \otimes \Lambda(|a\rangle \langle b|) = \sum_{a=0}^{1} |a\rangle \langle a| \otimes |0\rangle \langle 0| = \mathbb{I} \otimes |0\rangle \langle 0| =
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
$$

---

### Example: the identity channel

The Choi representation of the qubit identity channel $\text{Id}$ is

$$
J(\text{Id}) = \sum_{a,b=0}^{1} |a\rangle \langle b| \otimes \text{Id}(|a\rangle \langle b|) = \sum_{a,b=0}^{1} |a\rangle \langle b| \otimes |a\rangle \langle b| =
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 1
\end{pmatrix}
$$


Notice in particular that $J(\text{Id})$ is not the identity matrix. The Choi representation does not directly describe a channel's action in the usual way that a matrix represents a linear mapping.
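Because the Choi representation is faithful, comparing Choi matrices gives a practical test for whether two descriptions specify the same channel. As a sketch, the hypothetical helper `choi_from_kraus` below computes $J(\Phi)$ from a list of Kraus matrices; it is applied to the two Kraus representations of $\Delta$ seen earlier and to the identity channel.

```python
import numpy as np

def choi_from_kraus(kraus_ops):
    """Compute J(Phi) = sum_{a,b} |a><b| tensor Phi(|a><b|) from Kraus matrices."""
    m, n = kraus_ops[0].shape  # rows: output states, columns: input states
    J = np.zeros((n * m, n * m), dtype=complex)
    for a in range(n):
        for b in range(n):
            E_ab = np.zeros((n, n), dtype=complex)
            E_ab[a, b] = 1.0
            image = sum(A @ E_ab @ A.conj().T for A in kraus_ops)
            J += np.kron(E_ab, image)
    return J

J_projectors = choi_from_kraus([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])])
J_pauli = choi_from_kraus([np.eye(2) / np.sqrt(2),
                           np.diag([1.0, -1.0]) / np.sqrt(2)])
J_identity = choi_from_kraus([np.eye(2)])

print(np.allclose(J_projectors, J_pauli))       # same channel, same Choi matrix
print(np.allclose(J_identity, np.eye(4)))       # J(Id) is not the identity matrix
```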



## [Equivalence of the representations](#equivalence-of-the-representations)
We've now discussed three different ways to represent channels in mathematical terms, namely Stinespring representations, Kraus representations, and Choi representations. We also have the definition of a channel, which states that a channel is a linear mapping that always transforms density matrices into density matrices, even when the channel is applied to just part of a compound system. The remainder of the lesson is devoted to a mathematical proof that the three representations are, in fact, equivalent and precisely capture the definition.

### [Overview of the proof](#overview-of-the-proof)
Our goal is to establish the equivalence of a collection of four statements, and we'll begin by writing them down precisely. All four statements follow the same conventions that have been used throughout the lesson, namely that $\Phi$ is a linear mapping from square matrices to square matrices, the rows and columns of the input matrices have been placed in correspondence with the classical states of a system $X$ (the input system), and the rows and columns of the output matrices have been placed in correspondence with the classical states of a system $Y$ (the output system).

1. $\Phi$ is a channel from $X$ to $Y$. That is, $\Phi$ always transforms density matrices to density matrices, even when it acts on one part of a larger compound system.

2. The Choi matrix $J(\Phi)$ is positive semidefinite and satisfies the condition $\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X$.

3. There is a Kraus representation for $\Phi$. That is, there exist matrices $A_0, \dots, A_{N-1}$ for which the equation 
   $$
   \Phi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
   $$
   is true for every input $\rho$, and that satisfy the condition 
   $$
   \sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X.
   $$

4. There is a Stinespring representation for $\Phi$. That is, there exist systems $W$ and $G$ for which the pairs $(W, X)$ and $(G, Y)$ have the same number of classical states, along with a unitary matrix $U$ representing a unitary operation from $(W, X)$ to $(G, Y)$, such that 
   $$
   \Phi(\rho) = \text{Tr}_G \left(U (|0\rangle \langle 0|_W \otimes \rho) U^\dagger \right).
   $$

The way the proof works is that a cycle of implications is proved: the first statement in our list implies the second, the second implies the third, the third implies the fourth, and the fourth statement implies the first. This establishes that all four statements are equivalent — which is to say that they’re either all true or all false for a given choice of $\Phi$ — because the implications can be followed transitively from any one statement to any other. This is a common strategy when proving that a collection of statements are equivalent, and a useful trick to use in such a context is to set the implications up in a way that makes them as easy to prove as possible. That is the case here, and in fact we’ve already encountered two of the four implications.


### [First implication: channels to Choi matrices](#first-implication-channels-to-choi-matrices)
Referring to the statements listed above by their numbers, the first implication to be proved is $1 \Rightarrow 2$. This implication was already discussed in the context of the Choi state of a channel. Here we’ll summarize the mathematical details.

Assume that the classical state set of the input system $X$ is $\Sigma$ and let $n = |\Sigma|$. Consider the situation in which $\Phi$ is applied to the second of two copies of $X$ together in the state

$$
|\psi\rangle = \frac{1}{\sqrt{n}} \sum_{a \in \Sigma} |a\rangle \otimes |a\rangle,
$$

which as a density matrix is given by

$$
|\psi\rangle \langle \psi| = \frac{1}{n} \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes |a\rangle \langle b|.
$$

The result can be written as

$$
(\text{Id} \otimes \Phi)(|\psi\rangle \langle \psi|) = \frac{1}{n} \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes \Phi(|a\rangle \langle b|) = \frac{J(\Phi)}{n},
$$

and by the assumption that $\Phi$ is a channel this must be a density matrix. Like all density matrices it must be positive semidefinite. Multiplying a positive semidefinite matrix by a positive real number yields another positive semidefinite matrix, and therefore $J(\Phi) \geq 0$.

Moreover, under the assumption that $\Phi$ is a channel, it must preserve trace, and therefore

$$
\text{Tr}_Y(J(\Phi)) = \sum_{a,b \in \Sigma} \text{Tr}(\Phi(|a\rangle \langle b|)) |a\rangle \langle b| = \sum_{a,b \in \Sigma} \text{Tr}(|a\rangle \langle b|) |a\rangle \langle b| = \sum_{a \in \Sigma} |a\rangle \langle a| = \mathbb{I}_X.
$$

### [Second implication: Choi representation to Kraus representation](#second-implication-choi-representation-to-kraus-representation)

The second implication, again referring to the statements in our list by their numbers, is $2 \Rightarrow 3$. To be clear, we’re ignoring the other statements — and in particular we cannot make the assumption that $\Phi$ is a channel. All we have to work with is that $\Phi$ is a linear mapping whose Choi representation satisfies $J(\Phi) \geq 0$ and $\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X$. This, however, is all we need to conclude that $\Phi$ has a Kraus representation

$$
\Phi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
$$

for which the condition

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$

is satisfied.

We begin with the critically important assumption that $J(\Phi)$ is positive semidefinite, which means that it is possible to express it in the form

$$
J(\Phi) = \sum_{k=0}^{N-1} |\psi_k\rangle \langle \psi_k| \tag{2}
$$

for some way of choosing the vectors $|\psi_0\rangle, \dots, |\psi_{N-1}\rangle$. In general there will be multiple ways to do this — and in fact this directly mirrors the freedom one has in choosing a Kraus representation for $\Phi$.

One way to obtain such an expression is to first use the spectral theorem to write

$$
J(\Phi) = \sum_{k=0}^{N-1} \lambda_k |\gamma_k\rangle \langle \gamma_k|,
$$

in which $\lambda_0, \dots, \lambda_{N-1}$ are the eigenvalues of $J(\Phi)$ (which are necessarily nonnegative real numbers because $J(\Phi)$ is positive semidefinite) and $|\gamma_0\rangle, \dots, |\gamma_{N-1}\rangle$ are unit eigenvectors corresponding to the eigenvalues $\lambda_0, \dots, \lambda_{N-1}$. Note that while there's no freedom in choosing the eigenvalues (except for how they're ordered), there is freedom in the choice of the eigenvectors, particularly when there are eigenvalues with multiplicity larger than one. So, this is not a unique expression of $J(\Phi)$; we're just assuming we have one such expression. Regardless, because the eigenvalues are nonnegative real numbers they have nonnegative square roots, and so we can select

$$
|\psi_k\rangle = \sqrt{\lambda_k} |\gamma_k\rangle
$$

for each $k = 0, \dots, N - 1$ to obtain an expression of the form (2).

It is, however, not essential that the expression (2) comes from a spectral decomposition in this way, and in particular the vectors $|\psi_0\rangle, \dots, |\psi_{N-1}\rangle$ need not be orthogonal in general. It is noteworthy, though, that we can choose these vectors to be orthogonal if we wish, and moreover we never need $N$ to be larger than $nm$ (recalling that $n$ and $m$ denote the numbers of classical states of $X$ and $Y$, respectively).

Next, each of the vectors $|\psi_0\rangle, \dots, |\psi_{N-1}\rangle$ can be further decomposed as

$$
|\psi_k\rangle = \sum_{a \in \Sigma} |a\rangle \otimes |\phi_{k,a}\rangle,
$$

where the vectors $\{|\phi_{k,a}\rangle\}$ have entries corresponding to the classical states of $Y$ and can be explicitly determined by the equation

$$
|\phi_{k,a}\rangle = \left( (\langle a| \otimes \mathbb{I}_Y) |\psi_k\rangle \right)
$$

for each $a \in \Sigma$ and $k = 0, \dots, N - 1$. Although $|\psi_0\rangle, \dots, |\psi_{N-1}\rangle$ are not necessarily unit vectors, this is the same process we would use to analyze what would happen if a standard basis measurement was performed on the system $X$ given a quantum state vector of the pair $(X, Y)$.

And now we come to the trick that makes the proof work. We define our Kraus matrices $A_0, \dots, A_{N-1}$ according to the following equation:

$$
A_k = \sum_{a \in \Sigma} |\phi_{k,a}\rangle \langle a|
$$

We can think about this formula purely symbolically: $|a\rangle$ effectively gets flipped around to form $\langle a|$ and moved to the right-hand side, forming a matrix. For the purposes of verifying the proof, the formula is all we need.

There is, however, a simple and intuitive relationship between the vector $|\psi_k\rangle$ and the matrix $A_k$, which is that by vectorizing $A_k$ we get $|\psi_k\rangle$. What it means to vectorize $A_k$ is that we stack the columns on top of one another (with the leftmost column on top proceeding to the rightmost on the bottom), in order to form a vector. For instance, if $X$ and $Y$ are both qubits, and for some choice of $k$ we have

$$
|\psi_k\rangle = \alpha_{00} |0\rangle \otimes |0\rangle + \alpha_{01} |0\rangle \otimes |1\rangle + \alpha_{10} |1\rangle \otimes |0\rangle + \alpha_{11} |1\rangle \otimes |1\rangle
= \begin{pmatrix} \alpha_{00} \\ \alpha_{01} \\ \alpha_{10} \\ \alpha_{11} \end{pmatrix},
$$

then

$$
A_k = \alpha_{00} |0\rangle \langle 0| + \alpha_{01} |1\rangle \langle 0| + \alpha_{10} |0\rangle \langle 1| + \alpha_{11} |1\rangle \langle 1|
= \begin{pmatrix} \alpha_{00} & \alpha_{10} \\ \alpha_{01} & \alpha_{11} \end{pmatrix}
$$

(Beware: sometimes the vectorization of a matrix is defined in a slightly different way, which is that the rows of the matrix are transposed and stacked on top of one another to form a column vector.)
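In NumPy terms, the correspondence between $|\psi_k\rangle$ and $A_k$ amounts to a reshape. The sketch below uses hypothetical helper names `kraus_from_vector` and `vectorize`, and takes the numbers 1 through 4 as stand-ins for $\alpha_{00}, \alpha_{01}, \alpha_{10}, \alpha_{11}$.

```python
import numpy as np

def kraus_from_vector(psi, n, m):
    """Turn psi = sum_a |a> tensor |phi_a> into A = sum_a |phi_a><a| (m rows, n columns)."""
    # Row a of the reshaped array holds |phi_a>; transposing makes it column a of A.
    return psi.reshape(n, m).T

def vectorize(A):
    """Stack the columns of A on top of one another, leftmost column first."""
    return A.reshape(-1, order="F")

# Stand-ins for (alpha_00, alpha_01, alpha_10, alpha_11).
psi = np.array([1.0, 2.0, 3.0, 4.0])

A = kraus_from_vector(psi, 2, 2)
print(A)              # rows: (alpha_00, alpha_10) and (alpha_01, alpha_11)
print(vectorize(A))   # back to psi
```

The `order="F"` argument selects column-major order, which matches the column-stacking convention used here rather than the row-based alternative mentioned in the warning above.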

First we’ll verify that this choice of Kraus matrices correctly describes the mapping $\Phi$, after which we’ll verify the other required condition. To keep things straight, let’s define a new mapping $\Psi$ as follows:

$$
\Psi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
$$

Thus, our goal is to verify that $\Psi = \Phi$.

The way we can do this is to compare the Choi representations of these mappings. Choi representations are faithful, so we have $\Psi = \Phi$ if and only if $J(\Phi) = J(\Psi)$. At this point we can simply compute $J(\Psi)$ using the expressions

$$
|\psi_k\rangle = \sum_{a \in \Sigma} |a\rangle \otimes |\phi_{k,a}\rangle \quad \text{and} \quad A_k = \sum_{a \in \Sigma} |\phi_{k,a}\rangle \langle a|
$$

together with the bilinearity of tensor products to simplify.

$$
\begin{aligned}
J(\Psi) &= \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes \sum_{k=0}^{N-1} A_k |a\rangle \langle b| A_k^\dagger \\
&= \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes \sum_{k=0}^{N-1} \left( \sum_{c \in \Sigma} |\phi_{k,c}\rangle \langle c| \right) |a\rangle \langle b| \left( \sum_{d \in \Sigma} |d\rangle \langle \phi_{k,d}| \right) \\
&= \sum_{k=0}^{N-1} \left( \sum_{a \in \Sigma} |a\rangle \otimes |\phi_{k,a}\rangle \right) \left( \sum_{b \in \Sigma} \langle b| \otimes \langle \phi_{k,b}| \right) \\
&= \sum_{k=0}^{N-1} |\psi_k\rangle \langle \psi_k| = J(\Phi)
\end{aligned}
$$

So, our Kraus matrices correctly describe $\Phi$.

It remains to check the required condition on $A_0, \dots, A_{N-1}$, which turns out to be equivalent to the assumption $\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X$ (which we haven’t used yet). What we’ll show is this relationship:

$$
\left( \sum_{k=0}^{N-1} A_k^\dagger A_k \right)^T = \text{Tr}_Y(J(\Phi)) \tag{3}
$$

(…in which we're referring to the matrix transpose on the left-hand side). Starting on the left, we can first observe that

$$
\left( \sum_{k=0}^{N-1} A_k^\dagger A_k \right)^T = \left( \sum_{k=0}^{N-1} \sum_{a,b \in \Sigma} |b\rangle \langle \phi_{k,b} | \phi_{k,a} \rangle \langle a| \right)^T = \sum_{k=0}^{N-1} \sum_{a,b \in \Sigma} \langle \phi_{k,b} | \phi_{k,a} \rangle |a\rangle \langle b|
$$

The last step follows from the fact that the transpose is linear and maps $|b\rangle \langle a|$ to $|a\rangle \langle b|$. Moving to the right-hand side of our equation, we have

$$
J(\Phi) = \sum_{k=0}^{N-1} |\psi_k \rangle \langle \psi_k| = \sum_{k=0}^{N-1} \sum_{a,b \in \Sigma} |a\rangle \langle b| \otimes |\phi_{k,a}\rangle \langle \phi_{k,b}|
$$

and therefore

$$
\text{Tr}_Y(J(\Phi)) = \sum_{k=0}^{N-1} \sum_{a,b \in \Sigma} \text{Tr} \left( |\phi_{k,a}\rangle \langle \phi_{k,b}| \right) |a\rangle \langle b| = \sum_{k=0}^{N-1} \sum_{a,b \in \Sigma} \langle \phi_{k,b} | \phi_{k,a} \rangle |a\rangle \langle b|
$$

We’ve obtained the same result on both sides, and therefore equation (3) has been verified. It follows, by the assumption $\text{Tr}_Y(J(\Phi)) = \mathbb{I}_X$, that

$$
\left( \sum_{k=0}^{N-1} A_k^\dagger A_k \right)^T = \mathbb{I}_X
$$

and therefore, because the identity matrix is its own transpose, the required condition holds:

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$
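Equation (3) does not depend on the trace condition, so it can be checked numerically for arbitrary matrices $A_0, \dots, A_{N-1}$. The sketch below does this with NumPy. The dimensions, the random seed, and the use of random complex matrices are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, N = 2, 3, 4  # assumed sizes: X has n states, Y has m, and there are N matrices

# Arbitrary complex m-by-n matrices; no normalization condition is imposed.
kraus = [rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n)) for _ in range(N)]

# J = sum_k |psi_k><psi_k| with |psi_k> = sum_a |a> (x) A_k|a>.
J = np.zeros((n * m, n * m), dtype=complex)
for A in kraus:
    psi = A.T.reshape(-1)  # entry a*m + y equals A[y, a]
    J += np.outer(psi, psi.conj())

# Partial trace over Y: view J with indices (a, y, b, y') and sum over y = y'.
tr_Y = np.einsum('ayby->ab', J.reshape(n, m, n, m))

# Left-hand side of equation (3): the transpose (not the adjoint) of sum_k A_k^dagger A_k.
lhs = sum(A.conj().T @ A for A in kraus).T

print("equation (3) holds:", np.allclose(lhs, tr_Y))
```

Because the matrices here are complex, dropping the transpose on the left-hand side would in general break the equality, which is why the transpose appears in equation (3).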

### [Third implication: from Kraus to Stinespring representations](#third-implication-from-kraus-to-stinespring-representations)

Now suppose that we have a Kraus representation of a mapping

$$
\Phi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
$$

for which

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$

Our goal is to find a Stinespring representation for $\Phi$.

What we’d like to do first is to choose the garbage system $G$ so that its classical state set is $\{0, \dots, N-1\}$. In order for $(W, X)$ and $(G, Y)$ to have the same size, however, $n$ must divide $mN$, which allows us to take $W$ to have classical states $\{0, \dots, d-1\}$ for $d = mN / n$. For an arbitrary choice of $n$, $m$, and $N$, the number $mN / n$ may fail to be an integer, so we’re not actually free to choose $G$ in this way. We can, however, always increase $N$ in the Kraus representation by choosing $A_k = 0$ for as many additional values of $k$ as we wish. And so, if we tacitly assume that $mN / n$ is an integer, which is equivalent to $N$ being a multiple of $n / \text{gcd}(n, m)$, then we’re free to take $G$ so that its classical state set is $\{0, \dots, N-1\}$. As an aside, notice that if $N = nm$, then we may take $W$ to have $m^2$ classical states.
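The padding argument can be made concrete with a small helper. The sketch below, with the hypothetical function name `padded_kraus_count`, returns the smallest number of Kraus matrices that is at least $N$ and makes $mN/n$ an integer, assuming $X$ has $n$ classical states and $Y$ has $m$.

```python
from math import gcd

def padded_kraus_count(n, m, N):
    """Smallest N' >= N such that n divides m * N'.

    n divides m * N' exactly when N' is a multiple of n // gcd(n, m).
    """
    step = n // gcd(n, m)
    return -(-N // step) * step  # round N up to the next multiple of step

# Example: n = 2, m = 3, N = 3 requires one extra zero Kraus matrix.
n, m, N = 2, 3, 3
N_padded = padded_kraus_count(n, m, N)
d = m * N_padded // n  # number of classical states of W
print(N_padded, d)
```

For these values, $N$ is padded from 3 to 4, and $W$ has $d = 6$ classical states, so that $(W, X)$ and $(G, Y)$ both have 12 classical states.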

It remains to choose $U$, and we’ll do this by matching the following pattern:

$$
U =
\begin{bmatrix}
A_0 & ? & \dots & ? \\
A_1 & ? & \dots & ? \\
\vdots & \vdots & & \vdots \\
A_{N-1} & ? & \dots & ?
\end{bmatrix}
$$

To be clear, this pattern is meant to suggest a block matrix, where each block (including $A_0, \dots, A_{N-1}$ as well as the blocks marked with a question mark) has $m$ rows and $n$ columns. There are $N$ rows of blocks, which means that there are $d = mN / n$ columns of blocks. Expressed in more formulaic terms, we will define $U$ as

$$
U = \sum_{k=0}^{N-1} \sum_{j=0}^{d-1} |k\rangle \langle j| \otimes M_{k,j} =
\begin{pmatrix}
M_{0,0} & M_{0,1} & \cdots & M_{0,d-1} \\
M_{1,0} & M_{1,1} & \cdots & M_{1,d-1} \\
\vdots & \vdots & & \vdots \\
M_{N-1,0} & M_{N-1,1} & \cdots & M_{N-1,d-1}
\end{pmatrix}
$$

where each matrix $M_{k,j}$ has $m$ rows and $n$ columns, and in particular we shall take $M_{k,0} = A_k$ for $k = 0, \dots, N - 1$. This must be a unitary matrix, and the blocks labeled with a question mark, or equivalently $M_{k,j}$ for $j > 0$, must be selected with this in mind — but aside from allowing $U$ to be unitary these blocks won’t have any relevance to the proof.

Let’s momentarily disregard the concern that $U$ is unitary and focus on the expression

$$
\text{Tr}_G \left( U (|0\rangle \langle 0|_W \otimes \rho) U^\dagger \right)
$$

that describes the output state of $Y$ given the input state $\rho$ of $X$ for our Stinespring representation. We can alternatively write

$$
U (|0\rangle \langle 0|_W \otimes \rho) U^\dagger = U (|0\rangle_W \otimes \mathbb{I}_X) \rho (\langle 0|_W \otimes \mathbb{I}_X) U^\dagger
$$

and we see from our choice of $U$ that

$$
U(|0\rangle_W \otimes \mathbb{I}_X) = \sum_{k=0}^{N-1} |k\rangle \otimes A_k
$$

We therefore find that

$$
U(|0\rangle \langle 0| \otimes \rho) U^\dagger = \sum_{j,k=0}^{N-1} |k\rangle \langle j| \otimes A_k \rho A_j^\dagger
$$

and so

$$
\text{Tr}_G \left( U(|0\rangle \langle 0| \otimes \rho) U^\dagger \right) = \sum_{j,k=0}^{N-1} \text{Tr}(|k\rangle \langle j|) A_k \rho A_j^\dagger = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger = \Phi(\rho)
$$

We therefore have a correct representation for the mapping $\Phi$, and it remains to verify that we can choose $U$ to be unitary.
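Only the first $n$ columns of $U$ enter this calculation, so it can be checked numerically before $U$ has been completed. The following sketch stacks the Kraus matrices into the block column $U(|0\rangle \otimes \mathbb{I}_X)$ and traces out $G$. The amplitude damping channel with `g = 0.3` and the input state are assumptions chosen for illustration.

```python
import numpy as np

# Assumed example channel: amplitude damping with g = 0.3.
g = 0.3
A0 = np.array([[1, 0], [0, np.sqrt(1 - g)]])
A1 = np.array([[0, np.sqrt(g)], [0, 0]])
kraus = [A0, A1]
N = len(kraus)
m, n = A0.shape

# V = U(|0> (x) I_X) = sum_k |k> (x) A_k, i.e. the Kraus matrices stacked vertically.
V = np.vstack(kraus)

# Assumed input state: the density matrix of |+>.
rho = np.array([[0.5, 0.5], [0.5, 0.5]])

# V rho V^dagger = sum_{j,k} |k><j| (x) A_k rho A_j^dagger, a state of (G, Y).
full = V @ rho @ V.conj().T

# Partial trace over G: view indices as (k, y, j, y') and sum over k = j.
out = np.einsum('kykz->yz', full.reshape(N, m, N, m))

expected = sum(A @ rho @ A.conj().T for A in kraus)
print("Tr_G matches the Kraus form:", np.allclose(out, expected))
```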

Consider the first $n$ columns of $U$ when it is selected according to the pattern above — which is to say that by taking these columns alone we have a block matrix

$$
\begin{pmatrix}
A_0 \\
A_1 \\
\vdots \\
A_{N-1}
\end{pmatrix}
$$

There are $n$ columns, one for each classical state of $X$, and as vectors let us name them as $|\gamma_a\rangle$ for each $a \in \Sigma$. Here’s a formula for these vectors that can be matched to the block matrix representation above.

$$
|\gamma_a\rangle = \sum_{k=0}^{N-1} |k\rangle \otimes A_k |a\rangle
$$

Now let’s compute the inner product between any two of these vectors, meaning the ones corresponding to any choice of $a, b \in \Sigma$.

$$
\langle \gamma_a | \gamma_b \rangle = \sum_{j,k=0}^{N-1} \langle k|j\rangle \langle a| A_k^\dagger A_j |b\rangle = \langle a| \left( \sum_{k=0}^{N-1} A_k^\dagger A_k \right) |b\rangle
$$


By the assumption

$$
\sum_{k=0}^{N-1} A_k^\dagger A_k = \mathbb{I}_X
$$

we conclude that the $n$ column vectors $\{ |\gamma_a\rangle : a \in \Sigma \}$ form an orthonormal set:

$$
\langle \gamma_a | \gamma_b \rangle =
\begin{cases}
1 & a = b \\
0 & a \neq b
\end{cases}
$$

for all $a, b \in \Sigma$. This implies that it is possible to fill out the remaining columns of $U$ so that it becomes a unitary matrix. (In particular, the Gram–Schmidt orthogonalization process can be used to select the remaining columns. Recall that something similar was done in Lesson 3 of the *Basics of quantum information* course, in the context of the state discrimination problem.)
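The completion step can also be sketched numerically. The code below keeps the stacked Kraus matrices as the first $n$ columns and fills in the remaining columns with an orthonormal basis of the orthogonal complement. As a swapped-in technique, it obtains that basis from NumPy's full QR decomposition rather than from an explicit Gram–Schmidt process. The example channel (amplitude damping with `g = 0.3`) is an assumption.

```python
import numpy as np

# Assumed example channel: amplitude damping with g = 0.3.
g = 0.3
kraus = [np.array([[1, 0], [0, np.sqrt(1 - g)]]),
         np.array([[0, np.sqrt(g)], [0, 0]])]
n = kraus[0].shape[1]

# First n columns of U: the vectors |gamma_a> = sum_k |k> (x) A_k|a>.
V = np.vstack(kraus).astype(complex)

# These columns are orthonormal because sum_k A_k^dagger A_k = I_X.
assert np.allclose(V.conj().T @ V, np.eye(n))

# Full QR gives a unitary Q whose first n columns span the column space of V,
# so the remaining columns of Q are an orthonormal basis of its complement.
Q, _ = np.linalg.qr(V, mode='complete')
U = np.hstack([V, Q[:, n:]])

print("U is unitary:", np.allclose(U.conj().T @ U, np.eye(U.shape[0])))
```

The columns taken from `Q` are one of many valid choices. Any orthonormal basis of the complement works, which matches the observation that the blocks marked with question marks play no role beyond making $U$ unitary.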

### [Fourth implication: Stinespring representation back to the definition](#fourth-implication-stinespring-representation-back-to-the-definition)


The final implication is $4 \Rightarrow 1$. That is, we assume that we have a unitary operation $U$ transforming a pair of systems $(W, X)$ into a pair $(G, Y)$, and our goal is to conclude that the mapping

$$
\Phi(\rho) = \text{Tr}_G \left( U(|0\rangle \langle 0|_W \otimes \rho) U^\dagger \right)
$$

is a valid channel. From its form it is evident that $\Phi$ is linear, and it remains to verify that it always transforms density matrices into density matrices. This is pretty straightforward and we’ve already discussed the key points.

In particular, if we start with a density matrix $\sigma$ of a compound system $(Z, X)$, and then add on an additional workspace system $W$, we will certainly be left with a density matrix. If we order the systems as $(W, Z, X)$ for convenience, we can write this state as $|0\rangle \langle 0|_W \otimes \sigma$. We then apply the unitary operation $U$, which, as we already discussed, is a valid channel and hence maps density matrices to density matrices. Finally, the partial trace of a density matrix is another density matrix.

Another way to say this is first to observe that each of these things is a valid channel:

1. Introducing an initialized workspace system.
2. Performing a unitary operation.
3. Tracing out a system.

And finally, any composition of channels is another channel — which is immediate from the definition but certainly a fact worth observing in its own right.
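The three steps can be followed numerically for a single input state. The sketch below composes them for a randomly chosen unitary and checks that the output is Hermitian, has unit trace, and has no negative eigenvalues. The dimensions, the random seed, and the way the unitary and the input state are sampled are all assumptions for illustration. The check covers one state of $X$ alone, so it illustrates the argument rather than proving it.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 2  # assumed sizes: W has d classical states, X has n
N, m = 2, 3  # G has N classical states, Y has m (so d * n == N * m)

# An arbitrary unitary U: the Q factor of a random complex square matrix.
M = rng.normal(size=(d * n, d * n)) + 1j * rng.normal(size=(d * n, d * n))
U, _ = np.linalg.qr(M)

# An arbitrary density matrix rho of X.
B = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = B @ B.conj().T
rho /= np.trace(rho)

# Step 1: introduce the initialized workspace system W.
ket0 = np.zeros((d, 1))
ket0[0, 0] = 1.0
state = np.kron(ket0 @ ket0.T, rho)

# Step 2: perform the unitary operation.
state = U @ state @ U.conj().T

# Step 3: trace out G, viewing indices as (k, y, j, y') and summing over k = j.
phi_rho = np.einsum('kykz->yz', state.reshape(N, m, N, m))

print("Hermitian:", np.allclose(phi_rho, phi_rho.conj().T))
print("trace:", np.trace(phi_rho).real)
print("smallest eigenvalue:", np.linalg.eigvalsh(phi_rho).min())
```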

This completes the proof of the final implication, and therefore we’ve established the equivalence of the four statements listed at the start of the section.
