# Purifications and fidelity

## Table of Contents

- [Introduction](#introduction)  
- [Purifications](#purifications)  
   - [Definition of Purifications](#definition-of-purifications)  
   - [Existence of Purifications](#existence-of-purifications)  
   - [Schmidt Decompositions](#schmidt-decompositions)  
   - [Unitary Equivalence of Purifications](#unitary-equivalence-of-purifications)  
- [Fidelity](#fidelity)  
   - [Definition of Fidelity](#definition-of-fidelity)  
   - [Basic Properties of Fidelity](#basic-properties-of-fidelity)  
- [Gentle Measurement Lemma](#gentle-measurement-lemma)  
- [Uhlmann's Theorem](#uhlmanns-theorem) 


# [Introduction](#introduction)  
This lesson is centered around a fundamentally important concept in the theory of quantum information, which is that of a purification of a state. A purification of a quantum state, represented by a density matrix $\rho$, is a pure state of a larger compound system that leaves us with $\rho$ when the rest of the compound system is traced out. As we'll see, every state $\rho$ has a purification, provided that the portion of the compound system that gets traced out is large enough.

It's both common and useful to consider purifications of states when reasoning about them. Intuitively speaking, quantum state vectors are simpler mathematical objects than density matrices, and we can often conclude interesting things about density matrices by thinking about them as representing parts of larger systems whose states are pure — and therefore simpler (at least in some regards). This is an example of a dilation in mathematics, where something relatively complicated is obtained by restricting or reducing something larger but simpler.

The lesson also discusses the fidelity between two quantum states, which is a value that quantifies the similarity between the states. We'll see how fidelity is defined by a mathematical formula and discuss how it connects to the notion of a purification through Uhlmann's theorem.

# [Purifications](#purifications)  


# [Definition of Purifications](#definition-of-purifications)  

Let us begin with a precise mathematical definition for purifications.

> **Definition.** Suppose $ X $ is a system in a state represented by a density matrix $ \rho $, and $ |\psi\rangle $ is a quantum state vector of a pair $ (X, Y) $ that leaves $ \rho $ when $ Y $ is traced out:  
> $
\rho = \mathrm{Tr}_Y \left( |\psi\rangle \langle \psi| \right).
$  
> The state vector $ |\psi\rangle $ is then said to be a *purification* of $ \rho $.

The pure state $ |\psi\rangle \langle\psi| $, expressed as a density matrix rather than a quantum state vector, is also commonly referred to as a purification of $ \rho $ when the equation in the definition is true, but we'll generally use the term to refer to a quantum state vector.

The term *purification* is also used more generally when the ordering of the systems is reversed, when the names of the systems and states are different (of course), and when there are more than two systems. For instance, if $ |\psi\rangle $ is a quantum state vector representing a pure state of a compound system $ (A, B, C) $, and the equation  
$
\rho = \mathrm{Tr}_B \left( |\psi\rangle \langle \psi| \right)
$  
is true for a density matrix $ \rho $ representing a state of the system $ (A, C) $, then $ |\psi\rangle $ is still referred to as a purification of $ \rho $. For the purposes of this lesson, however, we'll focus on the specific form described in the definition. Properties and facts concerning purifications, according to this definition, can typically be generalized to more than two systems by re-ordering and partitioning the systems into two compound systems, one playing the role of $ X $ and the other playing the role of $ Y $.



# [Existence of Purifications](#existence-of-purifications)  

Suppose that $ X $ and $ Y $ are any two systems and $ \rho $ is a given state of $ X $. We will prove that there exists a quantum state vector $ |\psi\rangle $ of $ (X, Y) $ that *purifies* $ \rho $ — which is another way of saying that $ |\psi\rangle $ is a purification of $ \rho $ — provided that the system $ Y $ is large enough. In particular, if $ Y $ has at least as many classical states as $ X $, then a purification of this form necessarily exists for every state $ \rho $. Fewer classical states of $ Y $ are required for some states $ \rho $; in general, $ \mathrm{rank}(\rho) $ classical states of $ Y $ are necessary and sufficient for the existence of a quantum state vector of $ (X, Y) $ that purifies $ \rho $.

Consider first any expression of $ \rho $ as a convex combination of $ n $ pure states, for any positive integer $ n $:
$
\rho = \sum_{a=0}^{n-1} p_a |\phi_a\rangle \langle \phi_a|.
$

In this expression, $ (p_0, \ldots, p_{n-1}) $ is a probability vector and $ |\phi_0\rangle, \ldots, |\phi_{n-1}\rangle $ are quantum state vectors of $ X $.

One way to obtain such an expression is through the spectral theorem, in which case $ n $ is the number of classical states of $ X $, $ p_0, \ldots, p_{n-1} $ are the eigenvalues of $ \rho $, and $ |\phi_0\rangle, \ldots, |\phi_{n-1}\rangle $ are orthonormal eigenvectors corresponding to these eigenvalues. There's actually no need to include the terms corresponding to the zero eigenvalues of $ \rho $ in the sum, which allows us to alternatively choose $ n = \mathrm{rank}(\rho) $ and $ p_0, \ldots, p_{n-1} $ to be the non-zero eigenvalues of $ \rho $. This is the minimum value of $ n $ for which an expression of $ \rho $ taking the form above exists.

To be clear, it is not *necessary* that the chosen expression of $ \rho $, as a convex combination of pure states, comes from the spectral theorem — this is just one way to obtain such an expression. In particular, $ n $ could be any positive integer, the unit vectors $ |\phi_0\rangle, \ldots, |\phi_{n-1}\rangle $ need not be orthogonal, and the probabilities $ p_0, \ldots, p_{n-1} $ need not be eigenvalues of $ \rho $.

We can now identify a purification of $ \rho $ as follows:
$
|\psi\rangle = \sum_{a=0}^{n-1} \sqrt{p_a} |\phi_a\rangle \otimes |a\rangle.
$

Here we’re making the assumption that the classical states of $ Y $ include $ 0, \ldots, n - 1 $. If they do not, an arbitrary choice for $ n $ distinct classical states of $ Y $ can be substituted for $ 0, \ldots, n - 1 $. Verifying that this is indeed a purification of $ \rho $ is a simple matter of computing the partial trace, which can be done in the following two equivalent ways.

$
\mathrm{Tr}_Y \left( |\psi\rangle \langle\psi| \right) = \sum_{a=0}^{n-1} (\mathbb{I}_X \otimes \langle a|) |\psi\rangle \langle\psi| (\mathbb{I}_X \otimes |a\rangle) = \sum_{a=0}^{n-1} p_a |\phi_a\rangle \langle\phi_a| = \rho
$

$
\mathrm{Tr}_Y \left( |\psi\rangle \langle\psi| \right) = \sum_{a,b=0}^{n-1} \sqrt{p_a} \sqrt{p_b} |\phi_a\rangle \langle\phi_b| \mathrm{Tr}(|a\rangle \langle b|) = \sum_{a=0}^{n-1} p_a |\phi_a\rangle \langle\phi_a| = \rho
$

More generally, for any orthonormal set of vectors $ \{|\gamma_0\rangle, \ldots, |\gamma_{n-1}\rangle\} $, the quantum state vector

$
|\psi\rangle = \sum_{a=0}^{n-1} \sqrt{p_a} |\phi_a\rangle \otimes |\gamma_a\rangle
$

is a purification of $ \rho $.

---

### Example

Suppose that $ X $ and $ Y $ are both qubits and

$
\rho = \begin{pmatrix}
\frac{3}{4} & \frac{1}{4} \\
\frac{1}{4} & \frac{1}{4}
\end{pmatrix}
$

is a density matrix representing a state of $ X $.

As was mentioned in the *Density matrices* lesson, we can use the spectral theorem to express $ \rho $ as

$
\rho = \cos^2(\pi/8) |\psi_{\pi/8}\rangle \langle \psi_{\pi/8}| + \sin^2(\pi/8) |\psi_{5\pi/8}\rangle \langle \psi_{5\pi/8}|,
$

where $ |\psi_\theta\rangle = \cos(\theta) |0\rangle + \sin(\theta) |1\rangle $. The quantum state vector

$
\cos(\pi/8) |\psi_{\pi/8}\rangle \otimes |0\rangle + \sin(\pi/8) |\psi_{5\pi/8}\rangle \otimes |1\rangle
$

which describes a pure state of the pair $ (X, Y) $, is therefore a purification of $ \rho $.

Alternatively, we can write

$
\rho = \frac{1}{2} |0\rangle \langle 0| + \frac{1}{2} |+\rangle \langle +|.
$

This is a convex combination of pure states but not a spectral decomposition because $ |0\rangle $ and $ |+\rangle $ are not orthogonal and $ 1/2 $ is not an eigenvalue of $ \rho $. Nevertheless, the quantum state vector

$
\frac{1}{\sqrt{2}} |0\rangle \otimes |0\rangle + \frac{1}{\sqrt{2}} |+\rangle \otimes |1\rangle,
$

is a purification of $ \rho $.
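The example can be checked numerically. The following is a minimal sketch using NumPy; the helper names `purification_from_ensemble`, `trace_out_Y`, and `psi_theta` are illustrative assumptions rather than part of the lesson. It builds the purification $\sum_a \sqrt{p_a}\,|\phi_a\rangle \otimes |a\rangle$ for both decompositions of $\rho$ given above and confirms that tracing out $Y$ returns $\rho$ in each case.

```python
import numpy as np

def purification_from_ensemble(probs, states):
    """Build sum_a sqrt(p_a) |phi_a> (x) |a> for an ensemble {(p_a, |phi_a>)}."""
    n = len(probs)
    dim_X = len(states[0])
    psi = np.zeros(dim_X * n, dtype=complex)
    for a, (p, phi) in enumerate(zip(probs, states)):
        # np.eye(n)[a] is the standard basis vector |a> of Y
        psi += np.sqrt(p) * np.kron(phi, np.eye(n)[a])
    return psi

def trace_out_Y(psi, dim_X, dim_Y):
    """Reduced state of X: with A[x, y] the amplitudes of psi, Tr_Y = A A^dagger."""
    A = psi.reshape(dim_X, dim_Y)
    return A @ A.conj().T

def psi_theta(theta):
    """The vector cos(theta)|0> + sin(theta)|1>."""
    return np.array([np.cos(theta), np.sin(theta)])

rho = np.array([[3 / 4, 1 / 4],
                [1 / 4, 1 / 4]])

# Purification obtained from the spectral decomposition
spectral = purification_from_ensemble(
    [np.cos(np.pi / 8) ** 2, np.sin(np.pi / 8) ** 2],
    [psi_theta(np.pi / 8), psi_theta(5 * np.pi / 8)],
)

# Purification obtained from the non-orthogonal ensemble {|0>, |+>}
ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
non_spectral = purification_from_ensemble([1 / 2, 1 / 2], [ket0, ket_plus])

print(np.allclose(trace_out_Y(spectral, 2, 2), rho))
print(np.allclose(trace_out_Y(non_spectral, 2, 2), rho))
```

Both checks compare against the same matrix, which illustrates that distinct ensembles for $\rho$ lead to distinct purifications of the same state.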



# [Schmidt Decompositions](#schmidt-decompositions)  
Next, we will discuss **Schmidt decompositions**, which are expressions of quantum state vectors of **pairs** of systems that take a certain form. Schmidt decompositions are closely connected with purifications, and they’re very useful in their own right. Indeed, when reasoning about a given quantum state vector $ |\psi\rangle $ of a pair of systems, the first step is often to identify or contemplate a Schmidt decomposition of this state.

---

### Definition

Let $ |\psi\rangle $ be a given quantum state vector of a pair of systems $ (X, Y) $. A **Schmidt decomposition** of $ |\psi\rangle $ is an expression of the form

$
|\psi\rangle = \sum_{a=0}^{n-1} \sqrt{p_a} \, |x_a\rangle \otimes |y_a\rangle,
$

where $ p_0, \ldots, p_{n-1} $ are positive real numbers summing to 1, and both of the sets $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ and $ \{ |y_0\rangle, \ldots, |y_{n-1}\rangle \} $ are orthonormal.

The values $ \sqrt{p_0}, \ldots, \sqrt{p_{n-1}} $ in a Schmidt decomposition of $ |\psi\rangle $ are known as its **Schmidt coefficients**, which are uniquely determined (up to their ordering) — they’re the only positive real numbers that can appear in such an expression of $ |\psi\rangle $. The sets $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ and $ \{ |y_0\rangle, \ldots, |y_{n-1}\rangle \} $, on the other hand, are not uniquely determined, and the freedom one has in choosing these sets of vectors will be clarified in the explanation that follows.

---

We’ll now verify that a given quantum state vector $ |\psi\rangle $ does indeed have a Schmidt decomposition, and in the process, we’ll learn how to find one.

Consider first an arbitrary (not necessarily orthogonal) basis $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ of the vector space corresponding to the system $ X $. Because this is a basis, there will always exist a uniquely determined selection of vectors $ |z_0\rangle, \ldots, |z_{n-1}\rangle $ for which the following equation is true:

$
|\psi\rangle = \sum_{a=0}^{n-1} |x_a\rangle \otimes |z_a\rangle \tag{1}
$

For example, suppose $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ is the standard basis associated with $ X $. Assuming the classical state set of $ X $ is $ \{0, \ldots, n - 1\} $, this means that $ |x_a\rangle = |a\rangle $ for each $ a \in \{0, \ldots, n - 1\} $, and we find that

$
|\psi\rangle = \sum_{a=0}^{n-1} |a\rangle \otimes |z_a\rangle
$

when $ |z_a\rangle = (\langle a| \otimes \mathbb{I}_Y) |\psi\rangle $ for each $ a \in \{0, \ldots, n - 1\} $. We frequently consider expressions like this when contemplating a standard basis measurement of $ X $.

---

It’s important to note that the formula $ |z_a\rangle = (\langle a| \otimes \mathbb{I}_Y) |\psi\rangle $ for the vectors $ |z_0\rangle, \ldots, |z_{n-1}\rangle $ in this example only works because $ \{ |0\rangle, \ldots, |n-1\rangle \} $ is an **orthonormal** basis. In general, if $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ is a basis that is not necessarily orthonormal, then the vectors $ |z_0\rangle, \ldots, |z_{n-1}\rangle $ are still uniquely determined by the equation (1), but a different formula is needed. One way to find them is first to identify vectors $ |w_0\rangle, \ldots, |w_{n-1}\rangle $ so that the equation

$
\langle w_a | x_b \rangle = 
\begin{cases}
1 & a = b \\
0 & a \ne b
\end{cases}
$

is satisfied for all $ a, b \in \{ 0, \ldots, n - 1 \} $, at which point we have

$
|z_a\rangle = (\langle w_a| \otimes \mathbb{I}_Y) |\psi\rangle.
$
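One concrete way to find the vectors $|w_a\rangle$ is by matrix inversion. The sketch below assumes the basis vectors $|x_a\rangle$ are stored as the columns of a matrix `B` (a hypothetical name, as are `W_bras` and `z`); the rows of `B`'s inverse then serve as the bras $\langle w_a|$, because the product of the inverse with `B` is the identity. The particular basis and the random state vector are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_X, dim_Y = 2, 3

# A random normalized state vector of the pair (X, Y)
psi = rng.normal(size=dim_X * dim_Y) + 1j * rng.normal(size=dim_X * dim_Y)
psi /= np.linalg.norm(psi)

# Columns of B are the basis vectors |x_0>, |x_1> (deliberately not orthogonal)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Row a of inv(B) is the bra <w_a|, since inv(B) @ B = identity
W_bras = np.linalg.inv(B)

# |z_a> = (<w_a| (x) I_Y) |psi>
z = [np.kron(W_bras[a:a + 1, :], np.eye(dim_Y)) @ psi for a in range(dim_X)]

# Equation (1): |psi> = sum_a |x_a> (x) |z_a>
reconstructed = sum(np.kron(B[:, a], z[a]) for a in range(dim_X))
print(np.allclose(reconstructed, psi))
```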

For a given basis $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ of the vector space corresponding to $ X $, the uniquely determined vectors $ |z_0\rangle, \ldots, |z_{n-1}\rangle $ for which the equation (1) is satisfied won’t necessarily satisfy any special properties, even if $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ happens to be an orthonormal basis. If, however, we choose $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ to be an orthonormal basis of **eigenvectors** of the reduced state

$
\rho = \mathrm{Tr}_Y \left( |\psi\rangle \langle\psi| \right),
$

then something special happens. Specifically, the collection $ \{ |z_0\rangle, \ldots, |z_{n-1}\rangle \} $ must be orthogonal, and the squared norm of each $ |z_a\rangle $ equals the eigenvalue of $ \rho $ corresponding to the eigenvector $ |x_a\rangle $.

In greater detail, consider a spectral decomposition of $ \rho $:

$
\rho = \sum_{a=0}^{n-1} p_a |x_a\rangle \langle x_a|
$

Here we're denoting the eigenvalues of $ \rho $ by $ p_0, \ldots, p_{n-1} $, in recognition of the fact that $ \rho $ is a density matrix — so the vector of eigenvalues $ (p_0, \ldots, p_{n-1}) $ forms a probability vector — while $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ is an orthonormal basis of eigenvectors corresponding to these eigenvalues.

To see that the unique collection $ \{ |z_0\rangle, \ldots, |z_{n-1}\rangle \} $ for which the equation (1) is true is necessarily orthogonal, we can begin by computing the partial trace:

$
\mathrm{Tr}_Y(|\psi\rangle \langle\psi|) = \sum_{a,b=0}^{n-1} |x_a\rangle \langle x_b| \, \mathrm{Tr}(|z_a\rangle \langle z_b|) 
= \sum_{a,b=0}^{n-1} \langle z_b | z_a \rangle |x_a\rangle \langle x_b|.
$

This expression must agree with the spectral decomposition of $ \rho $. Because $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ is a basis, we conclude that the set of matrices $ \{ |x_a\rangle \langle x_b| : a, b \in \{0, \ldots, n - 1\} \} $ is linearly independent, and so it follows that

$
\langle z_b | z_a \rangle = 
\begin{cases}
p_a & \text{if } a = b \\
0 & \text{if } a \ne b
\end{cases}
$

establishing that $ \{ |z_0\rangle, \ldots, |z_{n-1}\rangle \} $ is orthogonal.

---

We've nearly obtained a Schmidt decomposition of $ |\psi\rangle $ — it remains to discard those terms in (1) for which $ p_a = 0 $ and then write $ |z_a\rangle = \sqrt{p_a} |y_a\rangle $ for a unit vector $ |y_a\rangle $ for each of the remaining terms.

A convenient way to do this begins with the observation that we’re free to number the eigenvalue/eigenvector pairs in a spectral decomposition of the reduced state $ \rho $ however we wish — so we may assume that the eigenvalues are sorted in decreasing order: $ p_0 \ge p_1 \ge \cdots \ge p_{n-1} $.

Letting $ r = \text{rank}(\rho) $, we find that $ p_0, \ldots, p_{r-1} > 0 $ and $ p_r = \cdots = p_{n-1} = 0 $. So, we have

$
\rho = \sum_{a=0}^{r-1} p_a |x_a\rangle \langle x_a|,
$

and we can write the quantum state vector $ |\psi\rangle $ as

$
|\psi\rangle = \sum_{a=0}^{r-1} |x_a\rangle \otimes |z_a\rangle.
$

Given that $ \|z_a\|^2 = \langle z_a | z_a \rangle = p_a $ for $ a = 0, \ldots, r - 1 $, we can define unit vectors $ |y_0\rangle, \ldots, |y_{r-1}\rangle $ as

$
|y_a\rangle = \frac{|z_a\rangle}{\|z_a\|} = \frac{|z_a\rangle}{\sqrt{p_a}},
$

so that $ |z_a\rangle = \sqrt{p_a} |y_a\rangle $ for each $ a \in \{0, \ldots, r - 1\} $. Because the vectors $ \{ |z_0\rangle, \ldots, |z_{r-1}\rangle \} $ are orthogonal and nonzero, it follows that $ \{ |y_0\rangle, \ldots, |y_{r-1}\rangle \} $ is an **orthonormal** set, and so we have obtained a **Schmidt decomposition** of $ |\psi\rangle $:

$
|\psi\rangle = \sum_{a=0}^{r-1} \sqrt{p_a} |x_a\rangle \otimes |y_a\rangle.
$
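The construction just described translates fairly directly into code. The following is a sketch under stated assumptions: the function name `schmidt_decomposition` and the cutoff `tol` (used to decide which eigenvalues count as zero) are illustrative choices, and the state vector is assumed to be stored with the index of $X$ varying slowest, as produced by `np.kron`.

```python
import numpy as np

def schmidt_decomposition(psi, dim_X, dim_Y, tol=1e-12):
    """Return (p, x, y) with psi = sum_a sqrt(p[a]) x[:, a] (x) y[:, a]."""
    A = psi.reshape(dim_X, dim_Y)        # A[x, y] = amplitude of |x>|y>
    rho = A @ A.conj().T                 # reduced state Tr_Y(|psi><psi|)
    p, x = np.linalg.eigh(rho)           # eigenvalues in ascending order
    p, x = p[::-1], x[:, ::-1]           # sort in decreasing order instead
    r = int(np.sum(p > tol))             # numerical rank of rho
    p, x = p[:r], x[:, :r]               # discard terms with p_a = 0
    z = x.conj().T @ A                   # row a is |z_a> = (<x_a| (x) I_Y)|psi>
    y = (z / np.sqrt(p)[:, None]).T      # column a is |y_a> = |z_a> / sqrt(p_a)
    return p, x, y

rng = np.random.default_rng(1)
dim_X, dim_Y = 2, 3
psi = rng.normal(size=dim_X * dim_Y) + 1j * rng.normal(size=dim_X * dim_Y)
psi /= np.linalg.norm(psi)

p, x, y = schmidt_decomposition(psi, dim_X, dim_Y)
rebuilt = sum(np.sqrt(p[a]) * np.kron(x[:, a], y[:, a]) for a in range(len(p)))

print(np.allclose(rebuilt, psi))                       # decomposition reproduces psi
print(np.allclose(y.conj().T @ y, np.eye(len(p))))     # the |y_a> are orthonormal
```

In practice a singular value decomposition of the reshaped matrix `A` yields the same information in one call; the version above is written to follow the steps of the argument in the text.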

---

Concerning the choice of the vectors $ \{ |x_0\rangle, \ldots, |x_{r-1}\rangle \} $ and $ \{ |y_0\rangle, \ldots, |y_{r-1}\rangle \} $, we can select $ \{ |x_0\rangle, \ldots, |x_{r-1}\rangle \} $ to be any orthonormal set of eigenvectors corresponding to the nonzero eigenvalues of the reduced state $ \mathrm{Tr}_Y(|\psi\rangle \langle\psi|) $ (as we have done above), in which case the vectors $ \{ |y_0\rangle, \ldots, |y_{r-1}\rangle \} $ are uniquely determined.

The situation is symmetric between the two systems, so we can alternatively choose $ \{ |y_0\rangle, \ldots, |y_{r-1}\rangle \} $ to be any orthonormal set of eigenvectors corresponding to the nonzero eigenvalues of the reduced state $ \mathrm{Tr}_X(|\psi\rangle \langle\psi|) $, in which case the vectors $ \{ |x_0\rangle, \ldots, |x_{r-1}\rangle \} $ will be uniquely determined.

Note, however, that once one of these sets is selected as a set of eigenvectors of the corresponding reduced state, as just described, the other set is determined, so the two sets cannot be chosen independently.

Although it won't come up again in this series, it is noteworthy that the non-zero eigenvalues $ p_0, \ldots, p_{r-1} $ of the reduced state $ \mathrm{Tr}_X(|\psi\rangle \langle\psi|) $ must always agree with the nonzero eigenvalues of the reduced state $ \mathrm{Tr}_Y(|\psi\rangle \langle\psi|) $ for any pure state $ |\psi\rangle $ of a pair of systems $ (X, Y) $. This fact is revealed by the Schmidt decomposition: in both cases the eigenvalues must agree with the squares of the Schmidt coefficients of $ |\psi\rangle $.



# [Unitary Equivalence of Purifications](#unitary-equivalence-of-purifications)  

We can use Schmidt decompositions to establish a fundamentally important fact concerning purifications known as the **unitary equivalence of purifications**.

> **Theorem** (Unitary equivalence of purifications).  
> Suppose that $ X $ and $ Y $ are systems, and $ |\psi\rangle $ and $ |\phi\rangle $ are quantum state vectors of $ (X, Y) $ that both purify the same state of $ X $.  
> In symbols, $ \mathrm{Tr}_Y(|\psi\rangle \langle\psi|) = \rho = \mathrm{Tr}_Y(|\phi\rangle \langle\phi|) $ for some density matrix $ \rho $ representing a state of $ X $.  
> There must then exist a unitary operation $ U $ on $ Y $ alone that transforms the first purification into the second:  
> $
(\mathbb{I}_X \otimes U) |\psi\rangle = |\phi\rangle.
$

---

We'll discuss a few implications of this theorem as the lesson continues, but first let’s see how it follows from our previous discussion of Schmidt decompositions.

Our assumption is that $ |\psi\rangle $ and $ |\phi\rangle $ are quantum state vectors of a pair of systems $ (X, Y) $ that satisfy the equation:

$
\mathrm{Tr}_Y(|\psi\rangle \langle\psi|) = \rho = \mathrm{Tr}_Y(|\phi\rangle \langle\phi|)
$

for some density matrix $ \rho $ representing a state of $ X $. Consider a spectral decomposition of $ \rho $:

$
\rho = \sum_{a=0}^{n-1} p_a |x_a\rangle \langle x_a|
$

Here $ \{ |x_0\rangle, \ldots, |x_{n-1}\rangle \} $ is an orthonormal basis of eigenvectors of $ \rho $. By following the prescription described previously, we can obtain Schmidt decompositions for both $ |\psi\rangle $ and $ |\phi\rangle $ having the following form:

$
|\psi\rangle = \sum_{a=0}^{r-1} \sqrt{p_a} |x_a\rangle \otimes |u_a\rangle, \quad
|\phi\rangle = \sum_{a=0}^{r-1} \sqrt{p_a} |x_a\rangle \otimes |v_a\rangle
$

In these expressions $ r $ is the rank of $ \rho $ and $ \{ |u_0\rangle, \ldots, |u_{r-1}\rangle \} $ and $ \{ |v_0\rangle, \ldots, |v_{r-1}\rangle \} $ are orthonormal sets of vectors in the space corresponding to $ Y $.

For any two orthonormal sets in the same space that have the same number of elements, there's always a unitary matrix that transforms the first set into the second. So we can choose a unitary matrix $ U $ so that $ U |u_a\rangle = |v_a\rangle $ for $ a = 0, \ldots, r - 1 $.

In particular, to find such a matrix $ U $, we can first use the Gram-Schmidt orthogonalization process to extend our orthonormal sets to orthonormal bases $ \{ |u_0\rangle, \ldots, |u_{m-1}\rangle \} $ and $ \{ |v_0\rangle, \ldots, |v_{m-1}\rangle \} $, where $ m $ is the dimension of the space corresponding to $ Y $, and then take

$
U = \sum_{a=0}^{m-1} |v_a\rangle \langle u_a|.
$

We now find that:

$
(\mathbb{I}_X \otimes U)|\psi\rangle = \sum_{a=0}^{r-1} \sqrt{p_a} |x_a\rangle \otimes U |u_a\rangle 
= \sum_{a=0}^{r-1} \sqrt{p_a} |x_a\rangle \otimes |v_a\rangle = |\phi\rangle.
$

which completes the proof.
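The proof is constructive, and the construction can be sketched in code. The example below is illustrative only: the function names are hypothetical, and the completion of an orthonormal set to an orthonormal basis is done with a singular value decomposition rather than the Gram-Schmidt process mentioned above (either produces a valid completion). It assumes the two input vectors really do purify the same state of $X$.

```python
import numpy as np

def complete_to_basis(vectors):
    """Extend orthonormal columns (m x r) to an orthonormal basis (m x m)."""
    m, r = vectors.shape
    # The last m - r left singular vectors span the orthogonal complement.
    u_full, _, _ = np.linalg.svd(vectors, full_matrices=True)
    return np.hstack([vectors, u_full[:, r:]])

def unitary_between_purifications(psi, phi, dim_X, dim_Y, tol=1e-12):
    """Find U on Y with (I_X (x) U)|psi> = |phi>, given equal reduced states on X."""
    A = psi.reshape(dim_X, dim_Y)
    B = phi.reshape(dim_X, dim_Y)
    p, x = np.linalg.eigh(A @ A.conj().T)      # spectral decomposition of rho
    keep = p > tol
    p, x = p[keep], x[:, keep]
    u = (x.conj().T @ A).T / np.sqrt(p)        # columns are the |u_a>
    v = (x.conj().T @ B).T / np.sqrt(p)        # columns are the |v_a>
    # U = sum_a |v_a><u_a| over the completed bases
    return complete_to_basis(v) @ complete_to_basis(u).conj().T

rng = np.random.default_rng(7)
dim_X, dim_Y = 2, 3
psi = rng.normal(size=dim_X * dim_Y) + 1j * rng.normal(size=dim_X * dim_Y)
psi /= np.linalg.norm(psi)

# A second purification of the same state: apply a random unitary V to Y
V, _ = np.linalg.qr(rng.normal(size=(dim_Y, dim_Y)) + 1j * rng.normal(size=(dim_Y, dim_Y)))
phi = np.kron(np.eye(dim_X), V) @ psi

U = unitary_between_purifications(psi, phi, dim_X, dim_Y)
print(np.allclose(np.kron(np.eye(dim_X), U) @ psi, phi))
```

The recovered `U` need not equal `V`: here the reduced state has rank 2 while $Y$ has dimension 3, so the action of the unitary on the part of $Y$'s space outside the span of the $|u_a\rangle$ is not determined.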

Here are just a few of many interesting examples and implications connected with the unitary equivalence of purifications. (We'll see another critically important one later in the lesson, in the context of fidelity, known as Uhlmann's theorem.)

---

## Superdense coding

In the superdense coding protocol, Alice and Bob share an e-bit, meaning that Alice holds a qubit A, Bob holds a qubit B, and together the pair (A, B) is in the $ |\phi^+\rangle $ Bell state.  
The protocol describes how Alice can transform this shared state into any one of the four Bell states, $ |\phi^+\rangle, |\phi^-\rangle, |\psi^+\rangle, $ and $ |\psi^-\rangle $, by applying a unitary operation to her qubit A. Once she has done this, she sends A to Bob, and then Bob performs a measurement on the pair (A, B) to see which Bell state he holds.

For all four Bell states, the reduced state of Bob’s qubit B is the completely mixed state:

$
\mathrm{Tr}_A(|\phi^+\rangle \langle \phi^+|) = \mathrm{Tr}_A(|\phi^-\rangle \langle \phi^-|) 
= \mathrm{Tr}_A(|\psi^+\rangle \langle \psi^+|) = \mathrm{Tr}_A(|\psi^-\rangle \langle \psi^-|) 
= \frac{\mathbb{I}}{2}
$

By the **unitary equivalence of purifications**, we immediately conclude that for each Bell state there must exist a unitary operation on Alice’s qubit A alone that transforms $ |\phi^+\rangle $ into the chosen Bell state. Although this does not reveal the precise details of the protocol, the unitary equivalence of purifications does immediately imply that **superdense coding is possible**.

We can also conclude that generalizations of superdense coding to larger systems are always possible, provided that we replace the Bell states with any orthonormal basis of purifications of the completely mixed state.
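For the qubit case, the claims above are easy to confirm directly. This sketch assumes the usual amplitude ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with Alice's qubit A first; the dictionary names and the specific choice of unitaries (identity, $Z$, $X$, and $ZX$) are illustrative, taken from the standard superdense coding protocol rather than derived from the theorem.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
s = 1 / np.sqrt(2)

# Bell states of (A, B), amplitudes ordered as |00>, |01>, |10>, |11>
bell = {
    "phi+": s * np.array([1.0, 0.0, 0.0, 1.0]),
    "phi-": s * np.array([1.0, 0.0, 0.0, -1.0]),
    "psi+": s * np.array([0.0, 1.0, 1.0, 0.0]),
    "psi-": s * np.array([0.0, 1.0, -1.0, 0.0]),
}

# Unitaries on Alice's qubit A that map |phi+> to each Bell state
alice_ops = {"phi+": I2, "phi-": Z, "psi+": X, "psi-": Z @ X}

def reduced_state_B(psi):
    """Tr_A(|psi><psi|): with M[a, b] the amplitudes, this is M^T conj(M)."""
    M = psi.reshape(2, 2)
    return M.T @ M.conj()

for name, U in alice_ops.items():
    reached = np.kron(U, I2) @ bell["phi+"]
    print(name,
          np.allclose(reduced_state_B(bell[name]), I2 / 2),   # Bob's state is I/2
          np.allclose(reached, bell[name]))                   # Alice can reach it
```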

---

## Cryptographic implications

The **unitary equivalence of purifications** has implications concerning the implementation of cryptographic primitives using quantum information.  
For instance, the unitary equivalence of purifications reveals that it is impossible to implement an ideal form of **bit commitment** using quantum information.

The **bit commitment** primitive involves two participants, Alice and Bob (who don’t trust one another), and has two phases:

- **Commit phase**: Alice commits to a binary value $ b \in \{0, 1\} $.  
  This commitment must be:
  - **binding**, meaning Alice cannot change her mind.
  - **concealing**, meaning Bob can't tell which value Alice has committed to.

- **Reveal phase**: Alice reveals the bit, and Bob verifies that it matches her earlier commitment.

Operationally:
- In the commit phase, Alice "locks" her chosen value inside a quantum "safe" and sends it to Bob.
- In the reveal phase, she sends the "key" so that Bob can open the safe and verify the value.

However, the **unitary equivalence of purifications** implies that no protocol based on quantum information alone can be both perfectly concealing and binding. Here's a **summary** of the argument:

1. Assume Alice and Bob only perform unitary operations or introduce new initialized systems.
2. The protocol must allow Bob to hold a quantum system whose reduced state is:
   - $ \rho_0 $ if Alice committed to 0.
   - $ \rho_1 $ if Alice committed to 1.

3. To be **perfectly concealing**, we must have $ \rho_0 = \rho_1 $.

4. The commitment phase ends with Alice holding part of a **purification** of $ \rho_b $, depending on her chosen bit $ b $.

5. But then, by the **unitary equivalence of purifications**, there exists a unitary operation that Alice can apply to her system alone to **switch** between the purifications of $ \rho_0 $ and $ \rho_1 $, meaning she can **change her commitment**.

Thus, a **perfect quantum bit commitment protocol** cannot exist — the **binding** and **concealing** properties are **incompatible** due to the fundamental structure of quantum mechanics.

In greater detail, because Alice and Bob have only used unitary operations, all of the systems involved in the protocol must together be in a pure state after the commit phase. In particular, suppose that $|\psi_0\rangle$ is the pure state of all of the systems involved in the protocol when Alice commits to 0, and $|\psi_1\rangle$ is the pure state of all of the systems involved in the protocol when Alice commits to 1. If we write A and B to denote Alice and Bob's (possibly compound) systems, then the reduced states of Bob's system are

$$
\begin{aligned}
\rho_0 &= \mathrm{Tr}_A(|\psi_0\rangle \langle \psi_0|) \\
\rho_1 &= \mathrm{Tr}_A(|\psi_1\rangle \langle \psi_1|).
\end{aligned}
$$

Given the requirement that $\rho_0 = \rho_1$ for a perfectly concealing protocol, we find that $|\psi_0\rangle$ and $|\psi_1\rangle$ are purifications of the same state — and so, by the unitary equivalence of purifications, there must exist a unitary operation $U$ on A alone such that

$$
(U \otimes \mathbb{I}_B)|\psi_0\rangle = |\psi_1\rangle.
$$

Alice is therefore free to change her commitment from 0 to 1 by applying $U$ to A, or from 1 to 0 by applying $U^\dagger$, and so the hypothetical protocol being considered completely fails to be binding.

---

## Hughston-Jozsa-Wootters theorem

The last implication of the unitary equivalence of purifications that we'll discuss in this portion of the lesson is the following theorem known as the Hughston-Jozsa-Wootters theorem. (This is, in fact, a slightly simplified statement of the theorem known by this name.)

> **Theorem (Hughston-Jozsa-Wootters).**  
> Let X and Y be systems and let $|\phi\rangle$ be a quantum state vector of the pair $(X, Y)$. Also let $N$ be an arbitrary positive integer, let $(p_0, \ldots, p_{N-1})$ be a probability vector, and let $|\psi_0\rangle, \ldots, |\psi_{N-1}\rangle$ be quantum state vectors representing states of X such that  
> $$
> \mathrm{Tr}_Y(|\phi\rangle \langle \phi|) = \sum_{a=0}^{N-1} p_a |\psi_a\rangle \langle \psi_a|.
> $$  
> There exists a (general) measurement $\{P_0, \ldots, P_{N-1}\}$ on Y such that the following two statements are true when this measurement is performed on Y while $(X, Y)$ is in the state $|\phi\rangle$:
>
> 1. Each measurement outcome $a \in \{0, \ldots, N - 1\}$ appears with probability $p_a$.
> 2. Conditioned on obtaining the measurement outcome $a$, the state of X becomes $|\psi_a\rangle$.

Intuitively speaking, this theorem says that as long as we have a pure state of two systems, then for *any* way of thinking about the reduced state of the first system as a convex combination of pure states, there is a measurement of the second system that effectively makes this way of thinking about the first system a reality. Notice that the number $N$ is not necessarily bounded by the number of classical states of X or Y. For instance, it could be that $N = 1,\!000,\!000$ while X and Y are qubits.

We shall prove this theorem using the unitary equivalence of purifications, beginning with the introduction of a new system Z whose classical state set is $\{0, \ldots, N - 1\}$. Consider the following two quantum state vectors of the triple $(X, Y, Z)$.

$$
|\gamma_0\rangle = |\phi\rangle_{XY} \otimes |0\rangle_Z
$$

$$
|\gamma_1\rangle = \sum_{a=0}^{N-1} \sqrt{p_a} |\psi_a\rangle_X \otimes |0\rangle_Y \otimes |a\rangle_Z
$$

The first vector $|\gamma_0\rangle$ is simply the given quantum state vector $|\phi\rangle$ tensored with $|0\rangle$ for the new system Z. For the second vector $|\gamma_1\rangle$, we essentially have a quantum state vector that would make the theorem trivial — at least if Y were replaced by Z — because a standard basis measurement performed on Z clearly yields each outcome $a$ with probability $p_a$, and conditioned on obtaining this outcome the state of X becomes $|\psi_a\rangle$.

By thinking about the pair $(Y, Z)$ as a single, compound system that can be traced out to leave X, we find that we have identified two different purifications of the state

$$
\rho = \sum_{a=0}^{N-1} p_a |\psi_a\rangle \langle \psi_a|.
$$

Specifically, for the first one we have

$$
\mathrm{Tr}_{YZ}(|\gamma_0\rangle \langle \gamma_0|) = \mathrm{Tr}_Y(|\phi\rangle \langle \phi|) = \rho
$$

and for the second one we have

$$
\mathrm{Tr}_{YZ}(|\gamma_1\rangle \langle \gamma_1|) = \sum_{a,b=0}^{N-1} \sqrt{p_a} \sqrt{p_b} |\psi_a\rangle \langle \psi_b| \mathrm{Tr}(|0\rangle \langle 0| \otimes |a\rangle \langle b|) = \sum_{a=0}^{N-1} p_a |\psi_a\rangle \langle \psi_a| = \rho.
$$

There must therefore exist a unitary operation $U$ on $(Y, Z)$ satisfying $(\mathbb{I}_X \otimes U)|\gamma_0\rangle = |\gamma_1\rangle$ by the unitary equivalence of purifications.

Using this unitary operation $U$, we can implement a measurement that satisfies the requirements of the theorem as the following diagram illustrates. In words, we introduce the new system Z initialized to the $|0\rangle$ state, apply $U$ to $(Y, Z)$, which transforms the state of $(X, Y, Z)$ from $|\gamma_0\rangle$ into $|\gamma_1\rangle$, and then measure Z with a standard basis measurement, which we’ve already observed gives the desired behavior.


![HSW-measurement.png](attachment:HSW-measurement.png)

The dotted rectangle in the figure represents an implementation of this measurement, which can be described as a collection of positive semidefinite matrices $\{P_0, \ldots, P_{N-1}\}$ as follows.

$$
P_a = (\mathbb{I}_Y \otimes \langle 0|) U^\dagger (\mathbb{I}_Y \otimes |a\rangle \langle a|) U (\mathbb{I}_Y \otimes |0\rangle)
$$
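To make the construction concrete, here is a small numerical sketch. Everything specific in it is an assumption chosen for illustration: X and Y are qubits in the state $|\phi^+\rangle$, and the ensemble consists of $N = 3$ equally likely real state vectors at angles $0$, $2\pi/3$, and $4\pi/3$, which average to the completely mixed state. The unitary $U$ is obtained from a singular value decomposition (a polar-decomposition shortcut) instead of the Schmidt-decomposition construction used in the proof; the code checks that it does map $|\gamma_0\rangle$ to $|\gamma_1\rangle$.

```python
import numpy as np

dX, dY, N = 2, 2, 3
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)     # |phi+> on (X, Y)
probs = np.full(N, 1 / 3)
states = [np.array([np.cos(t), np.sin(t)]) for t in 2 * np.pi * np.arange(N) / 3]

ketZ = np.eye(N)            # ketZ[a] is |a> for the new system Z
ket0_Y = np.eye(dY)[0]

# The two purifications of rho on (X, Y, Z)
gamma0 = np.kron(phi, ketZ[0])
gamma1 = sum(np.sqrt(p) * np.kron(np.kron(psi_a, ket0_Y), ketZ[a])
             for a, (p, psi_a) in enumerate(zip(probs, states)))

# Find a unitary U on (Y, Z) with (I_X (x) U)|gamma0> = |gamma1>.
# With A, B the amplitude matrices (rows: X, columns: YZ), this means A U^T = B.
A = gamma0.reshape(dX, dY * N)
B = gamma1.reshape(dX, dY * N)
w, _, vh = np.linalg.svd(A.conj().T @ B)
U = (w @ vh).T
print(np.allclose(np.kron(np.eye(dX), U) @ gamma0, gamma1))

# P_a = (I_Y (x) <0|) U^dagger (I_Y (x) |a><a|) U (I_Y (x) |0>)
embed = np.kron(np.eye(dY), ketZ[0].reshape(N, 1))    # I_Y (x) |0>
P = [embed.conj().T @ U.conj().T
     @ np.kron(np.eye(dY), np.outer(ketZ[a], ketZ[a])) @ U @ embed
     for a in range(N)]

# Outcome probabilities and conditional states of X when Y is measured
Phi = phi.reshape(dX, dY)
for a in range(N):
    unnormalized = Phi @ P[a].T @ Phi.conj().T        # Tr_Y((I (x) P_a)|phi><phi|)
    prob = np.trace(unnormalized).real
    print(a, np.isclose(prob, probs[a]),
          np.allclose(unnormalized / prob, np.outer(states[a], states[a])))
```

Note that the measurement has three outcomes even though Y is a qubit, in line with the remark that $N$ is not bounded by the number of classical states of X or Y.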



# [Fidelity](#fidelity)  
In this part of the lesson, we'll discuss the *fidelity* between quantum states, which is a measure of their similarity — or how much they "overlap."

Given two quantum state vectors, the fidelity between the pure states associated with these quantum state vectors equals the absolute value of the inner product between the quantum state vectors. This provides a basic way to measure their similarity: the result is a value between 0 and 1, with larger values indicating greater similarity. In particular, the value is zero for orthogonal states (by definition), while the value is 1 for states equivalent up to a global phase.

Intuitively speaking, the fidelity can be seen as an extension of this basic measure of similarity, from quantum state vectors to density matrices.

# [Definition of Fidelity](#definition-of-fidelity)  
It's fitting to begin with a definition of fidelity. At first glance, the definition that follows might look unusual or mysterious, and perhaps not easy to work with. The function it defines, however, turns out to have many interesting properties and multiple alternative formulations, making it much nicer to work with than it may initially appear.

> **Definition.** Let $\rho$ and $\sigma$ be *density matrices* representing quantum states of the same system. The *fidelity* between $\rho$ and $\sigma$ is defined as $F(\rho, \sigma) = \mathrm{Tr} \sqrt{\sqrt{\rho} \sigma \sqrt{\rho}}$.

> **Remark.** Although this is a common definition, it is also common for the fidelity to be defined as the *square* of the quantity defined here, in which case the quantity defined here is referred to as the *root-fidelity*. Neither definition is right or wrong; it's essentially a matter of preference. Nevertheless, one must always be careful to understand or clarify which definition is being used.

To make sense of the formula in the definition, notice first that $\sqrt{\rho} \sigma \sqrt{\rho}$ is a positive semidefinite matrix: $\sqrt{\rho} \sigma \sqrt{\rho} = M^\dagger M$ for $M = \sqrt{\sigma} \sqrt{\rho}$. Like all positive semidefinite matrices, this positive semidefinite matrix has a unique positive semidefinite square root, the trace of which is the fidelity.

For every square matrix $M$, the eigenvalues of the two positive semidefinite matrices $M^\dagger M$ and $M M^\dagger$ are always the same, and hence the same is true for the square roots of these matrices. Choosing $M = \sqrt{\sigma} \sqrt{\rho}$ and using the fact that the trace of a square matrix is the sum of its eigenvalues, we find that

$$
F(\rho, \sigma) = \mathrm{Tr} \sqrt{\sqrt{\rho} \sigma \sqrt{\rho}} = \mathrm{Tr} \sqrt{M^\dagger M} = \mathrm{Tr} \sqrt{M M^\dagger} = \mathrm{Tr} \sqrt{\sqrt{\sigma} \rho \sqrt{\sigma}} = F(\sigma, \rho).
$$

So, although it is not immediate from the definition, the fidelity is symmetric in its two arguments.
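The definition and its symmetry can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the helper names `psd_sqrt`, `fidelity`, and `random_density_matrix` are assumptions introduced here for illustration, and the matrix square root is computed from an eigendecomposition.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # discard tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    r = psd_sqrt(rho)
    inner = r @ sigma @ r
    inner = (inner + inner.conj().T) / 2  # enforce Hermiticity against round-off
    return float(np.real(np.trace(psd_sqrt(inner))))

def random_density_matrix(n, rng):
    """Hypothetical helper: normalize G G^dagger for a random complex matrix G."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    a = g @ g.conj().T
    return a / np.trace(a).real

rng = np.random.default_rng(0)
rho = random_density_matrix(3, rng)
sigma = random_density_matrix(3, rng)

f_rs = fidelity(rho, sigma)
f_sr = fidelity(sigma, rho)
print(f_rs, f_sr)  # the two values should agree up to round-off
```

The symmetrization step inside `fidelity` is only a numerical safeguard; mathematically $\sqrt{\rho}\sigma\sqrt{\rho}$ is already Hermitian.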

---

### Fidelity in terms of the trace norm

An equivalent way to express the fidelity is by this formula:

$$
F(\rho, \sigma) = \| \sqrt{\sigma} \sqrt{\rho} \|_1.
$$

Here we see the *trace norm*, which we encountered in the previous lesson in the context of state discrimination. The trace norm of a (not necessarily square) matrix $M$ can be defined as

$$
\|M\|_1 = \mathrm{Tr} \sqrt{M^\dagger M},
$$

and by applying this definition to the matrix $\sqrt{\sigma} \sqrt{\rho}$ we obtain the formula in the definition.

An alternative way to express the trace norm of a (square) matrix $M$ is through this formula:

$$
\|M\|_1 = \max_{U \text{ unitary}} |\mathrm{Tr}(MU)|.
$$

Here the maximum is over all *unitary* matrices $U$ having the same number of rows and columns as $M$. Applying this formula in the situation at hand reveals another expression of the fidelity.

$$
F(\rho, \sigma) = \max_{U \text{ unitary}} |\mathrm{Tr}(\sqrt{\sigma} \sqrt{\rho} U)|
$$
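The three expressions for the fidelity can be compared numerically. The sketch below relies on the standard fact that the trace norm equals the sum of the singular values, and it builds a maximizing unitary from a singular value decomposition. The example matrices and the names `psd_sqrt`, `u_opt`, and so on are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

# Two example single-qubit density matrices (Hermitian, trace one, positive).
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
sigma = np.array([[0.4, -0.1j], [0.1j, 0.6]], dtype=complex)

# 1. The definition: Tr sqrt( sqrt(rho) sigma sqrt(rho) ).
r = psd_sqrt(rho)
inner = r @ sigma @ r
inner = (inner + inner.conj().T) / 2
f_definition = np.real(np.trace(psd_sqrt(inner)))

# 2. The trace norm of M = sqrt(sigma) sqrt(rho), as a sum of singular values.
m = psd_sqrt(sigma) @ psd_sqrt(rho)
u, s, vh = np.linalg.svd(m)
f_trace_norm = s.sum()

# 3. The maximum over unitaries: with M = u diag(s) vh, the unitary
#    U = vh^dagger u^dagger gives Tr(M U) = sum of singular values.
u_opt = vh.conj().T @ u.conj().T
f_unitary = abs(np.trace(m @ u_opt))

print(f_definition, f_trace_norm, f_unitary)
```

Any other unitary in place of `u_opt` gives a value of $|\mathrm{Tr}(MU)|$ that is no larger.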

---

### Fidelity for pure states

One last point on the definition of fidelity is that every pure state is (as a density matrix) equal to its own square root, which allows the formula for the fidelity to be simplified considerably when one or both of the states is pure. In particular, if one of the two states is pure we have the following formula.

$$
F(|\phi\rangle \langle \phi|, \sigma) = \sqrt{\langle \phi| \sigma | \phi \rangle}
$$

If both states are pure, the formula simplifies to the absolute value of the inner product of the corresponding quantum state vectors, as was mentioned at the start of the section.

$$
F(|\phi\rangle \langle \phi|, |\psi\rangle \langle \psi|) = |\langle \phi | \psi \rangle|
$$
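As a short worked derivation of these two formulas, using only the definition: because $|\phi\rangle\langle\phi|$ is its own square root, we have

$$
\sqrt{|\phi\rangle\langle\phi|}\, \sigma \sqrt{|\phi\rangle\langle\phi|} = |\phi\rangle\langle\phi| \sigma |\phi\rangle\langle\phi| = \langle\phi|\sigma|\phi\rangle \, |\phi\rangle\langle\phi|.
$$

This is a nonnegative multiple of a rank-one projection, so its square root is $\sqrt{\langle\phi|\sigma|\phi\rangle}\, |\phi\rangle\langle\phi|$, and taking the trace gives

$$
F(|\phi\rangle\langle\phi|, \sigma) = \sqrt{\langle\phi|\sigma|\phi\rangle}\, \mathrm{Tr}(|\phi\rangle\langle\phi|) = \sqrt{\langle\phi|\sigma|\phi\rangle}.
$$

Substituting $\sigma = |\psi\rangle\langle\psi|$ yields $\sqrt{|\langle\phi|\psi\rangle|^2} = |\langle\phi|\psi\rangle|$. For example, taking $|\phi\rangle = |0\rangle$ and the completely mixed qubit state $\sigma = \mathbb{I}/2$ gives a fidelity of $1/\sqrt{2}$.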


# [Basic Properties of Fidelity](#basic-properties-of-fidelity)  

The fidelity has many remarkable properties and several alternative formulations. Here are just a few basic properties listed without proofs.

1. For any two density matrices $\rho$ and $\sigma$ having the same size, the fidelity $F(\rho, \sigma)$ lies between zero and one: $0 \leq F(\rho, \sigma) \leq 1$.  
   It is the case that $F(\rho, \sigma) = 0$ if and only if $\rho$ and $\sigma$ have orthogonal images (so they can be discriminated without error), and  
   $F(\rho, \sigma) = 1$ if and only if $\rho = \sigma$.

2. The fidelity is *multiplicative*, meaning that the fidelity between two product states is equal to the product of the individual fidelities:

   $$
   F(\rho_1 \otimes \cdots \otimes \rho_m, \sigma_1 \otimes \cdots \otimes \sigma_m) = F(\rho_1, \sigma_1) \cdots F(\rho_m, \sigma_m).
   $$

3. The fidelity between states is nondecreasing under the action of any channel. That is, if $\rho$ and $\sigma$ are density matrices and $\Phi$ is a channel that can take these two states as input, then it is necessarily the case that

   $$
   F(\rho, \sigma) \leq F(\Phi(\rho), \Phi(\sigma)).
   $$

4. The *Fuchs–van de Graaf inequalities* establish a close (though not exact) relationship between fidelity and trace distance: for any two states $\rho$ and $\sigma$ we have

   $$
   1 - \frac{1}{2} \| \rho - \sigma \|_1 \leq F(\rho, \sigma) \leq \sqrt{1 - \frac{1}{4} \| \rho - \sigma \|_1^2}.
   $$

The final property can be expressed in the form of a figure:

![FvdG-plot.png](attachment:FvdG-plot.png)

Specifically, for any choice of states $\rho$ and $\sigma$ of the same system, the horizontal line that crosses the $y$-axis at $F(\rho, \sigma)$ and the vertical line that crosses the $x$-axis at $\frac{1}{2} \|\rho - \sigma\|_1$ must intersect within the gray region bordered below by the line $y = 1 - x$ and above by the unit circle. The most interesting region of this figure from a practical viewpoint is the upper left-hand corner of the gray region: if the fidelity between two states is close to one, then their trace distance is close to zero, and *vice versa*.
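Properties 2 and 4 can be spot-checked on random states. The following NumPy sketch is a sanity check under stated assumptions, not a proof: the helper names (`psd_sqrt`, `fidelity`, `trace_distance`, `random_density_matrix`) are hypothetical, and a small tolerance absorbs floating-point error.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    r = psd_sqrt(rho)
    inner = r @ sigma @ r
    inner = (inner + inner.conj().T) / 2
    return float(np.real(np.trace(psd_sqrt(inner))))

def trace_distance(rho, sigma):
    """(1/2) ||rho - sigma||_1, computed from the eigenvalues of rho - sigma."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())

def random_density_matrix(n, rng):
    """Hypothetical helper: normalize G G^dagger for a random complex matrix G."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    a = g @ g.conj().T
    return a / np.trace(a).real

rng = np.random.default_rng(1)
tol = 1e-7

# Property 4: 1 - t <= F <= sqrt(1 - t^2), where t is the trace distance.
checks = []
for _ in range(200):
    rho = random_density_matrix(3, rng)
    sigma = random_density_matrix(3, rng)
    f = fidelity(rho, sigma)
    t = trace_distance(rho, sigma)
    upper = np.sqrt(max(0.0, 1.0 - t**2))
    checks.append(1.0 - t - tol <= f <= upper + tol)
fvdg_ok = all(checks)

# Property 2: multiplicativity on a tensor product of two qubit states.
rho1, rho2 = random_density_matrix(2, rng), random_density_matrix(2, rng)
sigma1, sigma2 = random_density_matrix(2, rng), random_density_matrix(2, rng)
lhs = fidelity(np.kron(rho1, rho2), np.kron(sigma1, sigma2))
rhs = fidelity(rho1, sigma1) * fidelity(rho2, sigma2)

print(fvdg_ok, lhs, rhs)
```

Each sampled pair corresponds to one point $(t, F)$ in the gray region of the figure.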



# [Gentle Measurement Lemma](#gentle-measurement-lemma)  

Next we'll take a look at a simple but important fact, known as the *gentle measurement lemma*, which connects fidelity to non-destructive measurements. It's a very useful lemma that comes up from time to time, and it's also noteworthy because the seemingly clunky definition for the fidelity actually makes the lemma very easy to prove.

The set-up is as follows. Let $X$ be a system in a state $\rho$ and let $\{P_0, \ldots, P_{m-1} \}$ be a collection of positive semidefinite matrices representing a general measurement of $X$. Suppose further that if this measurement is performed on the system $X$ while it's in the state $\rho$, one of the outcomes is highly likely. To be concrete, let's assume that the likely measurement outcome is 0, and specifically let's assume that

$$
\mathrm{Tr}(P_0 \rho) > 1 - \varepsilon
$$

for a small positive real number $\varepsilon > 0$.

What the gentle measurement lemma states is that, under these assumptions, the non-destructive measurement obtained from $\{P_0, \ldots, P_{m-1} \}$ through Naimark's theorem causes only a small disturbance to $\rho$ in case the likely measurement outcome 0 is observed. More specifically, the lemma states that the fidelity-squared between $\rho$ and the state we obtain from the non-destructive measurement, conditioned on the outcome being 0, is greater than $1 - \varepsilon$:

$$
F\left(\rho, \frac{\sqrt{P_0} \rho \sqrt{P_0}}{\mathrm{Tr}(P_0 \rho)}\right)^2 > 1 - \varepsilon.
$$

We'll need a basic fact about positive semidefinite matrices to prove this. First, because the measurement matrices $P_0, \ldots, P_{m-1}$ are positive semidefinite and sum to the identity, we can conclude that all of the eigenvalues of $P_0$ are real numbers between 0 and 1. One way to see this is to observe that, for any unit vector $|\psi\rangle$, we have that $\langle \psi | P_a | \psi \rangle$ is a nonnegative real number for each $a \in \{0, \ldots, m - 1\}$ (because each $P_a$ is positive semidefinite), with these numbers summing to one:

$$
\sum_{a=0}^{m-1} \langle \psi | P_a | \psi \rangle = \left\langle \psi \left| \left(\sum_{a=0}^{m-1} P_a \right) \right| \psi \right\rangle = \langle \psi | \mathbb{I} | \psi \rangle = 1.
$$

Hence $\langle \psi | P_0 | \psi \rangle$ is always a real number between 0 and 1, and this implies that every eigenvalue of $P_0$ is a real number between 0 and 1 because we can choose $|\psi\rangle$ specifically to be a unit eigenvector corresponding to whichever eigenvalue is of interest.

From this observation we can conclude the following inequality for every density matrix $\rho$:

$$
\mathrm{Tr}(\sqrt{P_0} \rho) \geq \mathrm{Tr}(P_0 \rho)
$$

In greater detail, starting from a spectral decomposition

$$
P_0 = \sum_{k=0}^{n-1} \lambda_k |\psi_k\rangle \langle \psi_k|,
$$

we conclude that

$$
\mathrm{Tr}(\sqrt{P_0} \rho) = \sum_{k=0}^{n-1} \sqrt{\lambda_k} \langle \psi_k | \rho | \psi_k \rangle \geq \sum_{k=0}^{n-1} \lambda_k \langle \psi_k | \rho | \psi_k \rangle = \mathrm{Tr}(P_0 \rho)
$$

from the fact that $\langle \psi_k | \rho | \psi_k \rangle$ is a nonnegative real number and $\sqrt{\lambda_k} \geq \lambda_k$ for each $k = 0, \ldots, n - 1$. (Squaring numbers between 0 and 1 can never make them larger.)

Now we can prove the gentle measurement lemma by evaluating the fidelity and then using our inequality. First, let's simplify the expression we're interested in.

$$
F\left(\rho, \frac{\sqrt{P_0} \rho \sqrt{P_0}}{\mathrm{Tr}(P_0 \rho)}\right) = \mathrm{Tr} \sqrt{ \frac{\sqrt{\rho} \sqrt{P_0} \rho \sqrt{P_0} \sqrt{\rho} }{ \mathrm{Tr}(P_0 \rho) } }
$$

$$
= \mathrm{Tr} \sqrt{ \left( \frac{ \sqrt{\rho} \sqrt{P_0} \sqrt{\rho} }{ \sqrt{\mathrm{Tr}(P_0 \rho)} } \right)^2 }
$$

$$
= \mathrm{Tr} \left( \frac{ \sqrt{\rho} \sqrt{P_0} \sqrt{\rho} }{ \sqrt{ \mathrm{Tr}(P_0 \rho) } } \right)
$$

$$
= \frac{ \mathrm{Tr}( \sqrt{P_0} \rho ) }{ \sqrt{ \mathrm{Tr}(P_0 \rho) } }
$$

Notice that these are all equalities – we've not used our inequality (or any other inequality) at this point, so we have an exact expression for the fidelity. We can now use our inequality to conclude

$$
F\left(\rho, \frac{\sqrt{P_0} \rho \sqrt{P_0}}{ \mathrm{Tr}(P_0 \rho)}\right) = \frac{ \mathrm{Tr}(\sqrt{P_0} \rho) }{ \sqrt{ \mathrm{Tr}(P_0 \rho) } } \geq \frac{ \mathrm{Tr}(P_0 \rho) }{ \sqrt{ \mathrm{Tr}(P_0 \rho) } } = \sqrt{ \mathrm{Tr}(P_0 \rho) }
$$

and therefore, by squaring both sides,

$$
F\left(\rho, \frac{ \sqrt{P_0} \rho \sqrt{P_0} }{ \mathrm{Tr}(P_0 \rho) } \right)^2 \geq \mathrm{Tr}(P_0 \rho) > 1 - \varepsilon
$$
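The exact expression for the fidelity and the resulting bound can be verified numerically. The sketch below is a minimal check under assumptions chosen for illustration: a two-outcome measurement $\{P_0, \mathbb{I} - P_0\}$ on a three-dimensional system, with the eigenvalues of $P_0$ drawn from $[0.9, 1]$ so that outcome 0 is likely. The helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = Tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    r = psd_sqrt(rho)
    inner = r @ sigma @ r
    inner = (inner + inner.conj().T) / 2
    return float(np.real(np.trace(psd_sqrt(inner))))

rng = np.random.default_rng(2)
n = 3

# A random density matrix rho.
g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = g @ g.conj().T
rho = rho / np.trace(rho).real

# P0 = Q diag(eigs) Q^dagger with eigenvalues in [0.9, 1]; P1 = I - P0
# completes the measurement, so both matrices are positive semidefinite.
h = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
q, _ = np.linalg.qr(h)
eigs = rng.uniform(0.9, 1.0, size=n)
p0 = (q * eigs) @ q.conj().T
sqrt_p0 = psd_sqrt(p0)

prob0 = float(np.real(np.trace(p0 @ rho)))   # Tr(P0 rho)
post = sqrt_p0 @ rho @ sqrt_p0 / prob0       # state after observing outcome 0

f_direct = fidelity(rho, post)
f_exact = float(np.real(np.trace(sqrt_p0 @ rho))) / np.sqrt(prob0)

print(f_direct, f_exact, f_direct**2 >= prob0 - 1e-9)
```

The first two printed values should agree up to round-off, reflecting that the chain of equalities in the proof is exact; only the final step uses the inequality.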



# [Uhlmann's Theorem](#uhlmanns-theorem)  


To conclude the lesson, we'll take a look at *Uhlmann's theorem*, which is a fundamental fact about the fidelity that connects it with the notion of a purification. What the theorem says, in simple terms, is that the fidelity between any two quantum states is equal to the **maximum** inner product (in absolute value) between two purifications of those states.

> **Theorem** (Uhlmann's theorem). Let $\rho$ and $\sigma$ be density matrices representing states of a system $X$, and let $Y$ be a system having at least as many classical states as $X$. The fidelity between $\rho$ and $\sigma$ is given by  
> $$
> F(\rho, \sigma) = \max \{ | \langle \phi | \psi \rangle | : \mathrm{Tr}_Y(|\phi \rangle \langle \phi |) = \rho, \ \mathrm{Tr}_Y(|\psi \rangle \langle \psi |) = \sigma \}
> $$
> where the maximum is taken over all quantum state vectors $|\phi\rangle$ and $|\psi\rangle$ of the pair $(X, Y)$.

We can prove this theorem using the unitary equivalence of purifications – but it isn’t completely straightforward and we'll make use of a trick along the way.

To begin, consider spectral decompositions of the two density matrices $\rho$ and $\sigma$.

$$
\rho = \sum_{a=0}^{n-1} p_a |u_a\rangle \langle u_a|  
$$

$$
\sigma = \sum_{b=0}^{n-1} q_b |v_b\rangle \langle v_b|
$$

The two collections $\{ |u_0\rangle, \ldots, |u_{n-1}\rangle \}$ and $\{ |v_0\rangle, \ldots, |v_{n-1}\rangle \}$ are orthonormal bases of eigenvectors of $\rho$ and $\sigma$, respectively, and $p_0, \ldots, p_{n-1}$ and $q_0, \ldots, q_{n-1}$ are the corresponding eigenvalues.

We'll also define $\{ |\overline{u_0}\rangle, \ldots, |\overline{u_{n-1}}\rangle \}$ and $\{ |\overline{v_0}\rangle, \ldots, |\overline{v_{n-1}}\rangle \}$ to be the vectors obtained by taking the complex conjugate of each entry of $\{ |u_0\rangle, \ldots, |u_{n-1}\rangle \}$ and $\{ |v_0\rangle, \ldots, |v_{n-1}\rangle \}$. That is, for an arbitrary vector $|w\rangle$ we can define $|\overline{w}\rangle$ according to the following equation for each $c \in \{0, \ldots, n - 1\}$:

$$
\langle c | \overline{w} \rangle = \overline{ \langle c | w \rangle }
$$

Notice that for any two vectors $|u\rangle$ and $|v\rangle$ we have $\langle \overline{u} | \overline{v} \rangle = \langle v | u \rangle$. More generally, for any square matrix $M$ we have the following formula:

$$
\langle \overline{u} | M | \overline{v} \rangle = \langle v | M^T | u \rangle
$$

It follows that $|u\rangle$ and $|v\rangle$ are orthogonal if and only if $|\overline{u}\rangle$ and $|\overline{v}\rangle$ are orthogonal, and therefore $\{ |\overline{u_0}\rangle, \ldots, |\overline{u_{n-1}}\rangle \}$ and $\{ |\overline{v_0}\rangle, \ldots, |\overline{v_{n-1}}\rangle \}$ are both orthonormal bases.
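For readers who want to see why the transpose appears, the formula for $\langle \overline{u} | M | \overline{v} \rangle$ can be verified entry by entry. This is a short sketch using only the definition of $|\overline{w}\rangle$, which gives $\langle \overline{u} | c \rangle = \langle c | u \rangle$ and $\langle d | \overline{v} \rangle = \langle v | d \rangle$:

$$
\langle \overline{u} | M | \overline{v} \rangle
= \sum_{c,d=0}^{n-1} \langle \overline{u} | c \rangle \langle c | M | d \rangle \langle d | \overline{v} \rangle
= \sum_{c,d=0}^{n-1} \langle v | d \rangle \langle d | M^T | c \rangle \langle c | u \rangle
= \langle v | M^T | u \rangle.
$$

The middle step reorders scalar factors and uses $\langle c | M | d \rangle = \langle d | M^T | c \rangle$. Taking $M = \mathbb{I}$ recovers $\langle \overline{u} | \overline{v} \rangle = \langle v | u \rangle$.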

Now consider the following two vectors $|\phi\rangle$ and $|\psi\rangle$, which are purifications of $\rho$ and $\sigma$, respectively.

$$
|\phi\rangle = \sum_{a=0}^{n-1} \sqrt{p_a} |u_a\rangle \otimes |\overline{u_a}\rangle
$$

$$
|\psi\rangle = \sum_{b=0}^{n-1} \sqrt{q_b} |v_b\rangle \otimes |\overline{v_b}\rangle
$$

This is the trick referred to previously. Nothing indicates explicitly at this point that it's a good idea to make these particular choices for purifications of $\rho$ and $\sigma$, but they are valid purifications, and the complex conjugations will allow the algebra to work out the way we need.

By the unitary equivalence of purifications, we know that every purification of $\rho$ for the pair of systems $(X, Y)$ must take the form $(I_X \otimes U)|\phi\rangle$ for some unitary matrix $U$, and likewise every purification of $\sigma$ for the pair $(X, Y)$ must take the form $(I_X \otimes V)|\psi\rangle$ for some unitary matrix $V$. The inner product of two such purifications can be simplified as follows:

$$
\langle \phi | (I_X \otimes U^\dagger)(I_X \otimes V) | \psi \rangle 
= \sum_{a,b=0}^{n-1} \sqrt{p_a} \sqrt{q_b} \langle u_a | v_b \rangle \langle \overline{u_a} | U^\dagger V | \overline{v_b} \rangle
$$

$$
= \sum_{a,b=0}^{n-1} \sqrt{p_a} \sqrt{q_b} \langle u_a | v_b \rangle \langle v_b | (U^\dagger V)^T | u_a \rangle
$$

$$
= \mathrm{Tr} \left( \sum_{a,b=0}^{n-1} \sqrt{p_a} \sqrt{q_b} \, |u_a\rangle \langle u_a | v_b \rangle \langle v_b | (U^\dagger V)^T \right)
$$

$$
= \mathrm{Tr} \left( \sqrt{\rho} \sqrt{\sigma} (U^\dagger V)^T \right)
$$

As $U$ and $V$ range over all possible unitary matrices, the matrix $(U^\dagger V)^T$ also ranges over all possible unitary matrices. Thus, maximizing the absolute value of the inner product of two purifications of $\rho$ and $\sigma$ yields the following equation.

$$
\max_{U, V \text{ unitary}} \left| \mathrm{Tr} \left( \sqrt{\rho} \sqrt{\sigma} (U^\dagger V)^T \right) \right|
= \max_{W \text{ unitary}} \left| \mathrm{Tr} ( \sqrt{\rho} \sqrt{\sigma} W ) \right| 
= \| \sqrt{\rho} \sqrt{\sigma} \|_1 = F(\rho, \sigma)
$$
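The construction in the proof can be reproduced numerically. The sketch below assumes that $Y$ has the same dimension as $X$, fixes $U = \mathbb{I}$, and varies only $V$ (called `w` in the code). It builds the two purifications with complex-conjugated eigenvectors, obtains a maximizing unitary from a singular value decomposition of $\sqrt{\rho}\sqrt{\sigma}$, and compares the resulting overlap with the fidelity. All helper names are hypothetical.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def random_density_matrix(n, rng):
    """Hypothetical helper: normalize G G^dagger for a random complex matrix G."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    a = g @ g.conj().T
    return a / np.trace(a).real

def random_unitary(n, rng):
    """A unitary from the QR decomposition of a random matrix (not exactly Haar)."""
    g = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(g)
    return q

def standard_purification(rho):
    """sum_a sqrt(p_a) |u_a> (x) |conj(u_a)>, the purification used in the proof."""
    p, u = np.linalg.eigh(rho)
    p = np.clip(p, 0.0, None)
    return sum(np.sqrt(p[a]) * np.kron(u[:, a], u[:, a].conj())
               for a in range(len(p)))

def partial_trace_y(vec, n):
    """Trace out the second system of a pure state on two n-dimensional systems."""
    m = vec.reshape(n, n)
    return m @ m.conj().T

rng = np.random.default_rng(3)
n = 3
rho, sigma = random_density_matrix(n, rng), random_density_matrix(n, rng)
phi, psi = standard_purification(rho), standard_purification(sigma)

def overlap(w):
    """|<phi| (I (x) w) |psi>|, which equals |Tr(sqrt(rho) sqrt(sigma) w^T)|."""
    return abs(np.vdot(phi, np.kron(np.eye(n), w) @ psi))

# Fidelity as the trace norm (sum of singular values) of sqrt(rho) sqrt(sigma).
u, s, vh = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma))
f = s.sum()

# Choose w so that w^T = vh^dagger u^dagger, which attains the maximum.
w_best = (vh.conj().T @ u.conj().T).T
best = overlap(w_best)
others = [overlap(random_unitary(n, rng)) for _ in range(100)]

print(f, best, max(others))
```

The overlap for `w_best` should match the fidelity up to round-off, while the randomly chosen unitaries should never exceed it.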
