# General measurements

## Table of Contents

- [Introduction](#introduction)  
- [Mathematical Formulations of Measurements](#mathematical-formulations-of-measurements)  
    - [Measurements as Collections of Matrices](#measurements-as-collections-of-matrices)  
    - [Measurements as Channels](#measurements-as-channels)  
    - [Equivalence of the Formulations](#equivalence-of-the-formulations)  
    - [Partial Measurements](#partial-measurements)  
- [Naimark's Theorem](#naimarks-theorem)  
    - [Theorem Statement and Proof](#theorem-statement-and-proof)  
    - [Non-destructive Measurements](#non-destructive-measurements)  
- [Quantum State Discrimination and Tomography](#quantum-state-discrimination-and-tomography)  
    - [Discriminating Between Two States](#discriminating-between-two-states)  
    - [Discriminating Three or More States](#discriminating-three-or-more-states)  
    - [Quantum State Tomography](#quantum-state-tomography)


# [Introduction](#introduction)

Measurements provide an interface between quantum and classical information. When a measurement is performed on a system in a quantum state, classical information is extracted, revealing something about that quantum state — and generally changing or destroying it in the process. In the simplified formulation of quantum information (as presented in the Basics of quantum information course), we typically limit our attention to projective measurements, including the simplest type of measurement: standard basis measurements. The concept of a measurement can, however, be generalized beyond projective measurements.

In this lesson we'll consider measurements in greater generality. We'll discuss a few different ways that general measurements can be described in mathematical terms, and we'll connect them to concepts discussed previously in the course.

We'll also take a look at a couple of notions connected with measurements, namely quantum state discrimination and quantum state tomography. Quantum state discrimination refers to a situation that arises commonly in quantum computing and cryptography, where a system is prepared in one of a known collection of states, and the goal is to determine, by means of a measurement, which state was prepared. For quantum state tomography, on the other hand, many independent copies of a single, unknown quantum state are made available, and the goal is to reconstruct a density matrix description of that state by performing measurements on the copies.


# [Mathematical Formulations of Measurements](#mathematical-formulations-of-measurements)  

The lesson begins with two equivalent mathematical descriptions of measurements:

1. General measurements can be described by collections of matrices, one for each measurement outcome, in a way that generalizes the description of projective measurements.
2. General measurements can be described as channels whose outputs are always classical states (represented by diagonal density matrices).

We'll restrict our attention to measurements having finitely many possible outcomes. Although it is possible to define measurements with infinitely many possible outcomes, they're much less typically encountered in the context of computation and information processing, and they also require some additional mathematics (namely measure theory) to be properly formalized.

Our initial focus will be on so-called destructive measurements, where the output of the measurement is a classical measurement outcome alone — with no specification of the post-measurement quantum state of whatever system was measured. Intuitively speaking, we can imagine that such a measurement destroys the quantum system itself, or that the system is immediately discarded once the measurement is made. Later in the lesson we'll broaden our view and consider non-destructive measurements, where there's both a classical measurement outcome and a post-measurement quantum state of the measured system.


# [Measurements as Collections of Matrices](#measurements-as-collections-of-matrices)  

Suppose $X$ is a system that is to be measured, and assume for simplicity that the classical state set of $X$ is $\{0, \ldots, n - 1\}$ for some positive integer $n$, so that density matrices representing quantum states of $X$ are $n \times n$ matrices. We won’t actually have much need to refer to the classical states of $X$, but it will be convenient to refer to $n$, the number of classical states of $X$. We’ll also assume that the possible outcomes of the measurement are the integers $0, \ldots, m - 1$ for some positive integer $m$. Note that we’re just using these names to keep things simple; it’s straightforward to generalize everything that follows to other finite sets of classical states and measurement outcomes, renaming them as desired.

Recall that a **projective measurement** is described by a collection of **projection matrices** that sum to the identity matrix. In symbols, $\{\Pi_0, \ldots, \Pi_{m-1}\}$ describes a projective measurement of $X$ if each $\Pi_a$ is an $n \times n$ projection matrix and the following condition is met:

$$
\Pi_0 + \cdots + \Pi_{m-1} = \mathbb{I}_X
$$

When such a measurement is performed on a system $X$ while it’s in a state described by some quantum state vector $|\psi\rangle$, each outcome $a \in \{0, \ldots, m - 1\}$ is obtained with probability equal to $\| \Pi_a |\psi\rangle \|^2$. (We also have that the post-measurement state of $X$ is obtained by normalizing the vector $\Pi_a |\psi\rangle$, but we’re ignoring the post-measurement state for now.)

If the state of $X$ is described by a density matrix $\rho$ rather than a quantum state vector $|\psi\rangle$, then we can alternatively express the probability to obtain the outcome $a$ as $\mathrm{Tr}(\Pi_a \rho)$. If $\rho = |\psi\rangle \langle \psi|$ is a pure state, then the two expressions are equal:

$$
\mathrm{Tr}(\Pi_a \rho) = \mathrm{Tr}(\Pi_a |\psi\rangle \langle \psi|) = \langle \psi| \Pi_a |\psi\rangle = \langle \psi| \Pi_a^2 |\psi\rangle = \| \Pi_a |\psi\rangle \|^2
$$

Here we're using the cyclic property of the trace for the second equality, and for the third equality we're using the fact that each $\Pi_a$ is a projection matrix, and therefore satisfies $\Pi_a^2 = \Pi_a$. In general, if $\rho$ is a convex combination

$$
\rho = \sum_{k=0}^{N-1} p_k |\psi_k\rangle \langle \psi_k|
$$

of pure states, then the expression $\mathrm{Tr}(\Pi_a \rho)$ coincides with the average probability for the outcome $a$, owing to the fact that this expression is linear in $\rho$:

$$
\mathrm{Tr}(\Pi_a \rho) = \sum_{k=0}^{N-1} p_k \, \mathrm{Tr}(\Pi_a |\psi_k\rangle \langle \psi_k|) = \sum_{k=0}^{N-1} p_k \, \langle \psi_k| \Pi_a |\psi_k\rangle = \sum_{k=0}^{N-1} p_k \| \Pi_a |\psi_k\rangle \|^2
$$

A mathematical description for **general measurements** is obtained by relaxing the definition of projective measurements. Specifically, we allow the matrices in the collection describing the measurement to be arbitrary **positive semidefinite matrices** rather than projections. (Projections are always positive semidefinite; they can alternatively be defined as positive semidefinite matrices whose eigenvalues are all either 0 or 1.) In particular, a general measurement of a system $X$ having outcomes $0, \ldots, m - 1$ is specified by a collection of positive semidefinite matrices $\{P_0, \ldots, P_{m-1}\}$ whose rows and columns correspond to the classical states of $X$ and that meet the condition

$$
P_0 + \cdots + P_{m-1} = \mathbb{I}_X
$$

If the system $X$ is measured while it is in a state described by the density matrix $\rho$, then each outcome $a \in \{0, \ldots, m - 1\}$ appears with probability $\mathrm{Tr}(P_a \rho)$.

As we must naturally demand, the vector of outcome probabilities

$$
(\mathrm{Tr}(P_0 \rho), \ldots, \mathrm{Tr}(P_{m-1} \rho))
$$

of a general measurement always forms a probability vector, for any choice of a density matrix $\rho$. The following two observations establish this fact:

1. Each value $\mathrm{Tr}(P_a \rho)$ is nonnegative, because both $P_a$ and $\rho$ are positive semidefinite and the trace of the product of any two positive semidefinite matrices is always nonnegative:

$$
Q, R \geq 0 \Rightarrow \mathrm{Tr}(QR) \geq 0
$$

One way to argue this fact is to use spectral decompositions of $Q$ and $R$ together with the cyclic property of the trace to express the trace of the product $QR$ as a sum of nonnegative real numbers, which must therefore be nonnegative.

2. The condition $P_0 + \cdots + P_{m-1} = \mathbb{I}_X$ together with the linearity of the trace ensures that the probabilities sum to 1:

$$
\sum_{a=0}^{m-1} \mathrm{Tr}(P_a \rho) = \mathrm{Tr} \left( \sum_{a=0}^{m-1} P_a \rho \right) = \mathrm{Tr}(\mathbb{I} \rho) = \mathrm{Tr}(\rho) = 1
$$
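The two conditions above are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `is_measurement` and `outcome_probabilities` and the example state `rho` are our own choices for illustration, not part of the lesson.

```python
import numpy as np

def is_measurement(ops, tol=1e-9):
    """Return True if `ops` are positive semidefinite and sum to the identity."""
    n = ops[0].shape[0]
    for P in ops:
        if not np.allclose(P, P.conj().T, atol=tol):
            return False  # not Hermitian
        if np.linalg.eigvalsh(P).min() < -tol:
            return False  # has a negative eigenvalue
    return bool(np.allclose(sum(ops), np.eye(n), atol=tol))

def outcome_probabilities(ops, rho):
    """Return the vector (Tr(P_0 rho), ..., Tr(P_{m-1} rho))."""
    return np.array([np.trace(P @ rho).real for P in ops])

# Standard basis measurement of a qubit.
standard_basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# An arbitrary qubit density matrix, chosen only for illustration.
rho = np.array([[0.75, 0.25j], [-0.25j, 0.25]])

probs = outcome_probabilities(standard_basis, rho)
print(is_measurement(standard_basis), probs, probs.sum())
```

The tolerance `tol` is there only to absorb floating-point rounding; it plays no role in the mathematical definition.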

---

### Example 1: any projective measurement

Projections are always positive semidefinite, so every projective measurement is an example of a general measurement.

For example, a standard basis measurement of a qubit can be represented by $ \{P_0, P_1\} $ where:

$$
P_0 = |0\rangle\langle 0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad
P_1 = |1\rangle\langle 1| = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
$$

Measuring a qubit in the state $ \rho $ results in outcome probabilities as follows:

- $ \mathrm{Prob}(\text{outcome} = 0) = \mathrm{Tr}(P_0 \rho) = \mathrm{Tr}(|0\rangle\langle 0| \rho) = \langle 0| \rho |0 \rangle $
- $ \mathrm{Prob}(\text{outcome} = 1) = \mathrm{Tr}(P_1 \rho) = \mathrm{Tr}(|1\rangle\langle 1| \rho) = \langle 1| \rho |1 \rangle $

---

### Example 2: a non-projective qubit measurement

Suppose $ X $ is a qubit, and define two matrices as follows:

$$
P_0 = \begin{pmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} \end{pmatrix}, \quad
P_1 = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} \end{pmatrix}
$$

These are both positive semidefinite matrices: they’re Hermitian, and in both cases the eigenvalues happen to be $ \frac{1}{2} \pm \frac{\sqrt{5}}{6} $, which are both positive. We also have that $ P_0 + P_1 = \mathbb{I} $, and therefore $ \{P_0, P_1\} $ describes a measurement.

If the state of $ X $ is described by a density matrix $ \rho $ and we perform this measurement, then the probability of obtaining the outcome 0 is $ \mathrm{Tr}(P_0 \rho) $ and the probability of obtaining the outcome 1 is $ \mathrm{Tr}(P_1 \rho) $. For instance, if $ \rho = |+\rangle\langle +| $ then the probabilities for the two outcomes 0 and 1 are as follows:

$$
\mathrm{Tr}(P_0 \rho) = \mathrm{Tr}\left( \begin{pmatrix} \frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & \frac{1}{3} \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \right)
= \left( \frac{2}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} \right) + \left( \frac{1}{3} \cdot \frac{1}{2} + \frac{1}{3} \cdot \frac{1}{2} \right)
= \frac{1}{2} + \frac{1}{3} = \frac{5}{6}
$$

$$
\mathrm{Tr}(P_1 \rho) = \mathrm{Tr}\left( \begin{pmatrix} \frac{1}{3} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \right)
= \left( \frac{1}{3} \cdot \frac{1}{2} - \frac{1}{3} \cdot \frac{1}{2} \right) + \left( -\frac{1}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} \right)
= 0 + \frac{1}{6} = \frac{1}{6}
$$
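As a quick numerical cross-check of Example 2, here is a small sketch (assuming NumPy) that recomputes the eigenvalues of $P_0$ and $P_1$ and the two outcome probabilities for $\rho = |+\rangle\langle +|$. The variable names are ours.

```python
import numpy as np

P0 = np.array([[2, 1], [1, 1]]) / 3
P1 = np.array([[1, -1], [-1, 2]]) / 3

# Eigenvalues in ascending order; both matrices share the same pair.
eigs_P0 = np.linalg.eigvalsh(P0)
eigs_P1 = np.linalg.eigvalsh(P1)

# The state |+><+| as a density matrix.
plus = np.array([1, 1]) / np.sqrt(2)
rho = np.outer(plus, plus.conj())

p0 = np.trace(P0 @ rho).real  # probability of outcome 0
p1 = np.trace(P1 @ rho).real  # probability of outcome 1
print(eigs_P0, eigs_P1, p0, p1)
```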

---

### Example 3: tetrahedral measurement

Define four single-qubit quantum state vectors as follows:

$$
|\phi_0\rangle = |0\rangle
$$

$$
|\phi_1\rangle = \frac{1}{\sqrt{3}} |0\rangle + \sqrt{\frac{2}{3}} |1\rangle
$$

$$
|\phi_2\rangle = \frac{1}{\sqrt{3}} |0\rangle + \sqrt{\frac{2}{3}} e^{2\pi i/3} |1\rangle
$$

$$
|\phi_3\rangle = \frac{1}{\sqrt{3}} |0\rangle + \sqrt{\frac{2}{3}} e^{-2\pi i/3} |1\rangle
$$

These four states are sometimes known as *tetrahedral* states because they're vertices of a *regular tetrahedron* inscribed within the Bloch sphere.

![tetrahedral-states.png](attachment:tetrahedral-states.png)

The Cartesian coordinates of these four states on the Bloch sphere are:

$$
(0, 0, 1), \quad \left( \frac{2\sqrt{2}}{3}, 0, -\frac{1}{3} \right), \quad \left( -\frac{\sqrt{2}}{3}, \sqrt{\frac{2}{3}}, -\frac{1}{3} \right), \quad \left( -\frac{\sqrt{2}}{3}, -\sqrt{\frac{2}{3}}, -\frac{1}{3} \right),
$$

which can be verified by expressing the density matrix representations of these states as linear combinations of Pauli matrices:

$$
|\phi_0\rangle\langle\phi_0| = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \frac{\mathbb{I} + \sigma_z}{2}
$$

$$
|\phi_1\rangle\langle\phi_1| = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3} \\ \frac{\sqrt{2}}{3} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} + \frac{2\sqrt{2}}{3} \sigma_x - \frac{1}{3} \sigma_z}{2}
$$

$$
|\phi_2\rangle\langle\phi_2| = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3\sqrt{2}} - \frac{i}{\sqrt{6}} \\ -\frac{1}{3\sqrt{2}} + \frac{i}{\sqrt{6}} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} - \frac{\sqrt{2}}{3} \sigma_x + \sqrt{\frac{2}{3}} \sigma_y - \frac{1}{3} \sigma_z}{2}
$$

$$
|\phi_3\rangle\langle\phi_3| = \begin{pmatrix} \frac{1}{3} & -\frac{1}{3\sqrt{2}} + \frac{i}{\sqrt{6}} \\ -\frac{1}{3\sqrt{2}} - \frac{i}{\sqrt{6}} & \frac{2}{3} \end{pmatrix} = \frac{\mathbb{I} - \frac{\sqrt{2}}{3} \sigma_x - \sqrt{\frac{2}{3}} \sigma_y - \frac{1}{3} \sigma_z}{2}
$$

These four states are perfectly spread out on the Bloch sphere, each one equidistant from the other three and with the angles between any two of them always being the same.

Now let us define a measurement $\{P_0, P_1, P_2, P_3\}$ of a qubit by setting:

$$
P_a = \frac{|\phi_a\rangle \langle\phi_a|}{2}
$$

for each $a = 0, \dots, 3$.

We can verify that this is a valid measurement as follows:

1. Each $P_a$ is evidently positive semidefinite, being a pure state multiplied by one-half. That is, each one is a Hermitian matrix with one eigenvalue equal to $1/2$ and all other eigenvalues equal to zero.
2. The sum of these matrices is the identity matrix: $P_0 + P_1 + P_2 + P_3 = \mathbb{I}$. The expressions of these matrices as linear combinations of Pauli matrices make this straightforward to verify.
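The claims in Example 3 can also be checked numerically. The sketch below (assuming NumPy; `bloch_vector` is a helper name of our own) builds the four tetrahedral states, computes their Bloch vectors as $(\mathrm{Tr}(\rho\sigma_x), \mathrm{Tr}(\rho\sigma_y), \mathrm{Tr}(\rho\sigma_z))$, and sums the matrices $P_a$.

```python
import numpy as np

w = np.exp(2j * np.pi / 3)
phis = [
    np.array([1, 0], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3)], dtype=complex),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w]),
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * np.conj(w)]),
]

paulis = [
    np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_x
    np.array([[0, -1j], [1j, 0]]),               # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),  # sigma_z
]

def bloch_vector(psi):
    """Cartesian Bloch-sphere coordinates of the pure state |psi>."""
    rho = np.outer(psi, psi.conj())
    return np.array([np.trace(rho @ s).real for s in paulis])

bloch = np.array([bloch_vector(psi) for psi in phis])

# Measurement matrices P_a = |phi_a><phi_a| / 2 and their sum.
P = [np.outer(psi, psi.conj()) / 2 for psi in phis]
total = sum(P)

# Inner products between Bloch vectors: 1 on the diagonal, -1/3 elsewhere.
gram = bloch @ bloch.T
print(np.round(bloch, 4))
print(np.round(total, 12))
```

The constant off-diagonal value $-1/3$ in `gram` is what it means for the four points to form a regular tetrahedron.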

# [Measurements as Channels](#measurements-as-channels)  

A second way to describe measurements in mathematical terms is as channels.

Classical information can be viewed as a special case of quantum information, insofar as we can identify probabilistic states with diagonal density matrices. So, in operational terms, we can think about measurements as being channels whose inputs are density matrices describing states of whatever system is being measured and whose outputs are **diagonal** density matrices describing the resulting distribution of measurement outcomes.

We'll see shortly that any channel having this property can always be written in a simple, canonical form that ties directly to the description of measurements as collections of positive semidefinite matrices. Conversely, given an arbitrary measurement as a collection of matrices, there's always a valid channel having the diagonal output property that describes the given measurement, as suggested in the previous paragraph. Putting these observations together, we find that the two descriptions of general measurements are equivalent.

Before proceeding further, let's be more precise about the measurement, how we're viewing it as a channel, and what assumptions we're making about it. As before, we'll suppose that **X** is the system to be measured, and that the possible outcomes of the measurement are the integers $0, \dots, m - 1$ for some positive integer $m$. We let **Y** be the system that stores measurement outcomes, so its classical state set is $\{0, \dots, m - 1\}$, and we represent the measurement as a channel named $\Phi$ from **X** to **Y**. Our assumption is that **Y is classical** — which is to say that no matter what state we start with for X, the state of Y we obtain is represented by a diagonal density matrix.

We can express in mathematical terms that the output of $\Phi$ is always diagonal in the following way. First define the completely dephasing channel $\Delta_m$ on Y:

$$
\Delta_m(\sigma) = \sum_{a=0}^{m-1} \langle a | \sigma | a \rangle \, |a\rangle \langle a|
$$

This channel is analogous to the completely dephasing qubit channel $\Delta$ from the previous lesson. As a linear mapping, it zeros out all of the off-diagonal entries of an input matrix and leaves the diagonal alone. And now, a simple way to express that a given density matrix $\sigma$ is diagonal is by the equation $\sigma = \Delta_m(\sigma)$. In words, zeroing out all of the off-diagonal entries of a density matrix has no effect if and only if the off-diagonal entries were all zero to begin with. The channel $\Phi$ therefore satisfies our assumption — that Y is classical — if and only if:

$$
\Phi(\rho) = \Delta_m(\Phi(\rho))
$$

for every density matrix $\rho$ representing a state of X.
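In code, the completely dephasing channel simply keeps the diagonal of its input. Here is a minimal sketch, assuming NumPy, with `dephase` and `is_classical` as illustrative helper names of our own.

```python
import numpy as np

def dephase(sigma):
    """Completely dephasing channel: keep the diagonal, zero everything else."""
    return np.diag(np.diag(sigma))

def is_classical(sigma, tol=1e-12):
    """A density matrix is classical (diagonal) exactly when dephasing leaves it unchanged."""
    return bool(np.allclose(sigma, dephase(sigma), atol=tol))

plus_state = np.full((2, 2), 0.5)   # |+><+|, which has off-diagonal entries
mixed_state = np.diag([0.5, 0.5])   # completely mixed state, already diagonal

print(is_classical(plus_state), is_classical(mixed_state))
```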

# [Equivalence of the Formulations](#equivalence-of-the-formulations)  

### Channels to matrices

Suppose that we have a channel from X to Y with the property that:

$$
\Phi(\rho) = \Delta_m(\Phi(\rho))
$$

for every density matrix $\rho$. This may alternatively be expressed as:

$$
\Phi(\rho) = \sum_{a=0}^{m-1} \langle a | \Phi(\rho) | a \rangle \, |a\rangle \langle a| \tag{1}
$$

As with any channel, we can express $\Phi$ in Kraus form for some choice of Kraus matrices $A_0, \dots, A_{N-1}$:

$$
\Phi(\rho) = \sum_{k=0}^{N-1} A_k \rho A_k^\dagger
$$

This provides us with an alternative expression for the diagonal entries of $\Phi(\rho)$:

$$
\langle a | \Phi(\rho) | a \rangle = \sum_{k=0}^{N-1} \langle a | A_k \rho A_k^\dagger | a \rangle = \sum_{k=0}^{N-1} \text{Tr}(A_k^\dagger |a\rangle \langle a| A_k \rho) = \text{Tr}(P_a \rho)
$$

for

$$
P_a = \sum_{k=0}^{N-1} A_k^\dagger |a\rangle \langle a| A_k.
$$

Thus, for these same matrices $P_0, \dots, P_{m-1}$, we can express the channel $\Phi$ as follows:

$$
\Phi(\rho) = \sum_{a=0}^{m-1} \text{Tr}(P_a \rho) \, |a\rangle \langle a|
$$

This expression is consistent with our description of general measurements in terms of matrices, as we see each measurement outcome appearing with probability $\text{Tr}(P_a \rho)$.

Now let's observe that the two properties required of the collection of matrices $\{P_0, \dots, P_{m-1}\}$ to describe a general measurement are indeed satisfied. The first property is that they're all **positive semidefinite matrices**. One way to see this is to observe that, for every vector $|\psi\rangle$ having entries in correspondence with the classical states of X, we have:

$$
\langle \psi | P_a | \psi \rangle = \sum_{k=0}^{N-1} \langle \psi | A_k^\dagger | a \rangle \langle a | A_k | \psi \rangle = \sum_{k=0}^{N-1} |\langle a | A_k | \psi \rangle|^2 \geq 0.
$$

The second property is that if we sum these matrices we get the identity matrix:

$$
\sum_{a=0}^{m-1} P_a = \sum_{a=0}^{m-1} \sum_{k=0}^{N-1} A_k^\dagger |a\rangle \langle a| A_k 
= \sum_{k=0}^{N-1} A_k^\dagger \left( \sum_{a=0}^{m-1} |a\rangle \langle a| \right) A_k 
= \sum_{k=0}^{N-1} A_k^\dagger A_k 
= \mathbb{I}_X
$$

The last equality follows from the fact that $\Phi$ is a channel, so its Kraus matrices must satisfy this condition.
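To make the passage from Kraus matrices to measurement matrices concrete, here is a hedged sketch assuming NumPy. The example channel is our own and is not taken from the lesson: it models a standard basis measurement of a qubit whose recorded outcome is flipped with a hypothetical probability `flip`, using Kraus matrices $\sqrt{p(a|b)}\,|a\rangle\langle b|$.

```python
import numpy as np

flip = 0.1  # hypothetical probability that the recorded bit is flipped
ket = np.eye(2)

# Kraus matrices sqrt(p(a|b)) |a><b| of the example channel.
kraus = []
for a in range(2):
    for b in range(2):
        prob = 1 - flip if a == b else flip
        kraus.append(np.sqrt(prob) * np.outer(ket[a], ket[b]))

# Channel condition: the sum of A_k^dagger A_k is the identity.
completeness = sum(A.conj().T @ A for A in kraus)

def measurement_from_kraus(kraus, m):
    """Compute P_a = sum_k A_k^dagger |a><a| A_k for a = 0, ..., m-1."""
    ops = []
    for a in range(m):
        proj = np.zeros((m, m))
        proj[a, a] = 1
        ops.append(sum(A.conj().T @ proj @ A for A in kraus))
    return ops

P = measurement_from_kraus(kraus, 2)

# Compare the channel output with sum_a Tr(P_a rho) |a><a| on an example state.
rho = np.array([[0.75, 0.25], [0.25, 0.25]])
phi_rho = sum(A @ rho @ A.conj().T for A in kraus)
from_matrices = np.diag([np.trace(Pa @ rho) for Pa in P])
print(P[0], P[1], phi_rho, sep="\n")
```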

---

### Matrices to channels

Now let's verify that for any collection $\{P_0, \dots, P_{m-1}\}$ of positive semidefinite matrices satisfying:

$$
P_0 + \dots + P_{m-1} = \mathbb{I}_X
$$

the mapping defined by:

$$
\Phi(\rho) = \sum_{a=0}^{m-1} \text{Tr}(P_a \rho) \, |a\rangle \langle a|
$$

is indeed a valid channel from X to Y.

One way to do this is to compute the **Choi representation** of this mapping:

$$
\begin{aligned}
J(\Phi) & = \sum_{b,c=0}^{n-1} |b\rangle \langle c| \otimes \Phi(|b\rangle \langle c|) \\
& = \sum_{b,c=0}^{n-1} \sum_{a=0}^{m-1} |b\rangle \langle c| \otimes \text{Tr}(P_a |b\rangle \langle c|) \, |a\rangle \langle a| \\
& = \sum_{b,c=0}^{n-1} \sum_{a=0}^{m-1} |b\rangle \langle b| P_a^T |c\rangle \langle c| \otimes |a\rangle \langle a| \\
& = \sum_{a=0}^{m-1} P_a^T \otimes |a\rangle \langle a|
\end{aligned}
$$

The transpose of each $P_a$ is introduced for the third equality because $\text{Tr}(P_a |b\rangle \langle c|) = \langle c | P_a | b \rangle = \langle b | P_a^T | c \rangle$. This allows the expressions $|b\rangle \langle b|$ and $|c\rangle \langle c|$ to appear, which simplify to the identity matrix upon summing over $b$ and $c$, respectively.

By the assumption that $P_0, \dots, P_{m-1}$ are positive semidefinite, so too are the $P_0^T, \dots, P_{m-1}^T$. (Transposing a Hermitian matrix results in another Hermitian matrix, and the eigenvalues of a matrix and its transpose always agree.) It follows that $J(\Phi)$ is positive semidefinite. Tracing out the output system Y (i.e., the system on the right-hand side) gives:

$$
\text{Tr}_Y(J(\Phi)) = \sum_{a=0}^{m-1} P_a^T \cdot \text{Tr}(|a\rangle \langle a|) = \sum_{a=0}^{m-1} P_a^T = \left(\sum_{a=0}^{m-1} P_a\right)^T = \mathbb{I}_X
$$

So $\Phi$ is a valid channel.
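The Choi-representation argument can be reproduced numerically. The sketch below assumes NumPy, and the complex-valued example measurement is our own choice, picked so that the transpose actually matters. It builds $J(\Phi)$ from the definition, compares it with $\sum_a P_a^T \otimes |a\rangle\langle a|$, and checks the two channel conditions.

```python
import numpy as np

def measurement_channel(ops, X):
    """Phi(X) = sum_a Tr(P_a X) |a><a|, extended linearly to any matrix X."""
    return np.diag([np.trace(P @ X) for P in ops])

def choi(ops):
    """Choi representation sum_{b,c} |b><c| (x) Phi(|b><c|)."""
    n, m = ops[0].shape[0], len(ops)
    J = np.zeros((n * m, n * m), dtype=complex)
    for b in range(n):
        for c in range(n):
            E = np.zeros((n, n))
            E[b, c] = 1  # the matrix |b><c|
            J += np.kron(E, measurement_channel(ops, E))
    return J

# Example measurement with complex entries: P0 = (I + sigma_y / 2) / 2, P1 = I - P0.
P0 = np.array([[0.5, -0.25j], [0.25j, 0.5]])
P1 = np.eye(2) - P0
ops = [P0, P1]
n, m = 2, 2

J = choi(ops)
closed_form = sum(np.kron(P.T, np.diag(np.eye(m)[a])) for a, P in enumerate(ops))

# Trace out the output system Y (the right-hand tensor factor).
trace_Y = np.trace(J.reshape(n, m, n, m), axis1=1, axis2=3)
min_eigenvalue = np.linalg.eigvalsh(J).min()
print(np.allclose(J, closed_form), min_eigenvalue)
```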

# [Partial Measurements](#partial-measurements)  

Suppose that we have multiple systems that are collectively in a quantum state, and a general measurement is performed on one of the systems. This results in one of the measurement outcomes, selected at random according to probabilities determined by the measurement and the state of the system prior to the measurement. The resulting state of the remaining systems will then, in general, depend on which measurement outcome was obtained.

Let’s examine how this works for a pair of systems $(X, Z)$ when the system $X$ is measured. (We're naming the system on the right $Z$ because we’ll take $Y$ to be a system representing the classical output of the measurement when we view it as a channel.) We can then easily generalize to the situation in which the systems are swapped as well as to three or more systems.

Suppose the state of $(X, Z)$ prior to the measurement is described by a density matrix $\rho$, which we can write as follows:

$$
\rho = \sum_{b,c=0}^{n-1} |b\rangle \langle c| \otimes \rho_{b,c}
$$

In this expression we're assuming the classical states of $X$ are $0, \dots, n-1$.

We’ll assume that the measurement itself is described by the collection of matrices $\{P_0, \dots, P_{m-1}\}$. This measurement may alternatively be described as a channel $\Phi$ from $X$ to $Y$, where $Y$ is a new system having classical state set $\{0, \dots, m-1\}$. Specifically, the action of this channel can be expressed as follows:

$$
\Phi(\xi) = \sum_{a=0}^{m-1} \text{Tr}(P_a \xi) |a\rangle \langle a|
$$

---

### Outcome probabilities

We're considering a measurement of the system $X$, so the probabilities with which different measurement outcomes are obtained can depend only on $\rho_X$, the reduced state of $X$. In particular, the probability for each outcome $a \in \{0, \dots, m-1\}$ can be expressed in three equivalent ways:

$$
\text{Tr}(P_a \rho_X) = \text{Tr}(P_a \, \text{Tr}_Z(\rho)) = \text{Tr}((P_a \otimes \mathbb{I}_Z) \rho)
$$

The first expression naturally represents the probability to obtain the outcome $a$ based on what we already know about measurements of a single system. To get the second expression we’re simply using the definition $\rho_X = \text{Tr}_Z(\rho)$. To get the third one requires more thought — and learners are encouraged to convince themselves that it is true. (Hint: the equivalence between the second and third expressions does not depend on $\rho$ being a density matrix or on each $P_a$ being positive semidefinite. Try showing it first for tensor products of the form $\rho = M \otimes N$ and then conclude that it must be true in general by linearity.)

While the equivalence of the first and third expressions in the previous equation may not be immediate, it does make sense. Starting from a measurement on $X$, we're effectively defining a measurement on $(X, Z)$, where we simply throw away $Z$ and measure $X$. Like all measurements, this new measurement can be described by a collection of matrices, and it’s not surprising that this measurement is described by the collection $\{P_0 \otimes \mathbb{I}_Z, \dots, P_{m-1} \otimes \mathbb{I}_Z\}$.
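The equality between the first and third expressions can be spot-checked on a random state. This is only a numerical sanity check under our own assumptions (NumPy, a qubit $X$, a three-dimensional $Z$, and a fixed random seed), not a proof; `partial_trace_Z` is a helper name we introduce here.

```python
import numpy as np

def partial_trace_Z(rho, n, d):
    """Trace out Z (dimension d) from a state of (X, Z), where X has dimension n."""
    return np.trace(rho.reshape(n, d, n, d), axis1=1, axis2=3)

# The measurement from Example 2, applied to X.
P0 = np.array([[2, 1], [1, 1]]) / 3
P1 = np.array([[1, -1], [-1, 2]]) / 3
ops = [P0, P1]

# A random density matrix of (X, Z), generated only for illustration.
n, d = 2, 3
rng = np.random.default_rng(seed=0)
A = rng.normal(size=(n * d, n * d)) + 1j * rng.normal(size=(n * d, n * d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

rho_X = partial_trace_Z(rho, n, d)
reduced = [np.trace(P @ rho_X).real for P in ops]                  # Tr(P_a rho_X)
joint = [np.trace(np.kron(P, np.eye(d)) @ rho).real for P in ops]  # Tr((P_a (x) I_Z) rho)
print(reduced, joint)
```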

---

### States conditional on measurement outcomes

If we want to determine not only the probabilities for the different outcomes but also the resulting state of $ Z $ conditioned on each measurement outcome, we can look to the channel description of the measurement. In particular, let’s examine the state we get when we apply $ \Phi $ to $ X $ and do nothing to $ Z $.

$$
\begin{aligned}
(\Phi \otimes \text{Id}_Z)(\rho) & = \sum_{b,c=0}^{n-1} \Phi(|b\rangle \langle c|) \otimes \rho_{b,c} \\
& = \sum_{a=0}^{m-1} \sum_{b,c=0}^{n-1} \text{Tr}(P_a |b\rangle \langle c|) \, |a\rangle \langle a| \otimes \rho_{b,c} \\
& = \sum_{a=0}^{m-1} |a\rangle \langle a| \otimes \sum_{b,c=0}^{n-1} \text{Tr}(P_a |b\rangle \langle c|) \, \rho_{b,c} \\
& = \sum_{a=0}^{m-1} |a\rangle \langle a| \otimes \text{Tr}_X((P_a \otimes \mathbb{I}_Z)\rho)
\end{aligned}
$$

---

Note that this is a **density matrix** by virtue of the fact that $ \Phi $ is a channel, so each matrix $ \text{Tr}_X((P_a \otimes \mathbb{I}_Z)\rho) $ is necessarily positive semidefinite.

One final step transforms this expression into one that reveals what we’re looking for:

$$
\sum_{a=0}^{m-1} \text{Tr}((P_a \otimes \mathbb{I}_Z)\rho) \, |a\rangle \langle a| \otimes \frac{\text{Tr}_X((P_a \otimes \mathbb{I}_Z)\rho)}{\text{Tr}((P_a \otimes \mathbb{I}_Z)\rho)}
$$

This is an example of a **classical-quantum state**,

$$
\sum_{a=0}^{m-1} p(a) \, |a\rangle \langle a| \otimes \sigma_a,
$$

like we saw in the *Density matrices* lesson. For each measurement outcome $ a \in \{0, \dots, m - 1\} $, we have:

- Probability $ p(a) = \text{Tr}((P_a \otimes \mathbb{I}_Z)\rho) $
- $ Y $ is in the classical state $ |a\rangle \langle a| $
- $ Z $ is in the state:

$$
\sigma_a = \frac{\text{Tr}_X((P_a \otimes \mathbb{I}_Z)\rho)}{\text{Tr}((P_a \otimes \mathbb{I}_Z)\rho)}
\tag{2}
$$

This is the density matrix obtained by **normalizing** $ \text{Tr}_X((P_a \otimes \mathbb{I}_Z)\rho) $ by dividing it by its trace.

> Formally, $ \sigma_a $ is only defined when $ p(a) \ne 0 $; when $ p(a) = 0 $, this state is irrelevant (refers to an event with zero probability).

Naturally, the outcome probabilities are consistent with our previous observations.
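As an illustration of equation (2), the following sketch (assuming NumPy) measures the first qubit of the Bell state $(|00\rangle + |11\rangle)/\sqrt{2}$ with the measurement from Example 2. The function name `measure_first_system` and the choice of state are ours. For this particular state a short calculation gives $p(a) = \mathrm{Tr}(P_a)/2$ and $\sigma_a = P_a^T / \mathrm{Tr}(P_a)$, which the code reproduces.

```python
import numpy as np

def measure_first_system(ops, rho, n, d):
    """Return a list of pairs (p(a), sigma_a) for a measurement of X on a state of (X, Z)."""
    results = []
    for P in ops:
        weighted = np.kron(P, np.eye(d)) @ rho
        # Partial trace over X leaves an unnormalized d x d matrix on Z.
        unnormalized = np.trace(weighted.reshape(n, d, n, d), axis1=0, axis2=2)
        p = np.trace(unnormalized).real
        sigma = unnormalized / p if p > 1e-12 else None  # undefined when p(a) = 0
        results.append((p, sigma))
    return results

P0 = np.array([[2, 1], [1, 1]]) / 3
P1 = np.array([[1, -1], [-1, 2]]) / 3

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
rho = np.outer(bell, bell.conj())

results = measure_first_system([P0, P1], rho, n=2, d=2)
for a, (p, sigma) in enumerate(results):
    print(a, p)
    print(sigma)
```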

---

### Generalization

We can adapt this description to other situations, such as when the ordering of the systems is reversed or when there are three or more systems. Conceptually it is straightforward, although it can become cumbersome to write down the formulas.

In general, if we have $ r $ systems $ X_1, \dots, X_r $, the state of the compound system $ (X_1, \dots, X_r) $ is $ \rho $, and the measurement $ \{ P_0, \dots, P_{m-1} \} $ is performed on $ X_k $, the following happens:

1. Each outcome $ a $ appears with probability

$$
p(a) = \text{Tr}((\mathbb{I}_{X_1} \otimes \cdots \otimes \mathbb{I}_{X_{k-1}} \otimes P_a \otimes \mathbb{I}_{X_{k+1}} \otimes \cdots \otimes \mathbb{I}_{X_r}) \rho).
$$

2. Conditioned on obtaining the outcome $ a $, the state of $ (X_1, \dots, X_{k-1}, X_{k+1}, \dots, X_r) $ is then represented by the following density matrix:

$$
\frac{\text{Tr}_{X_k}((\mathbb{I}_{X_1} \otimes \cdots \otimes \mathbb{I}_{X_{k-1}} \otimes P_a \otimes \mathbb{I}_{X_{k+1}} \otimes \cdots \otimes \mathbb{I}_{X_r}) \rho)}
{\text{Tr}((\mathbb{I}_{X_1} \otimes \cdots \otimes \mathbb{I}_{X_{k-1}} \otimes P_a \otimes \mathbb{I}_{X_{k+1}} \otimes \cdots \otimes \mathbb{I}_{X_r}) \rho)}
$$


# [Naimark's Theorem](#naimarks-theorem)  

**Naimark's theorem** is a fundamental fact concerning measurements. It states that every general measurement can be implemented in a simple way that's reminiscent of Stinespring representations of channels:

- The system to be measured is first combined with an initialized workspace system, forming a compound system.
- A unitary operation is performed on the compound system.
- Finally, the workspace system is *measured* with respect to a **standard basis measurement**, yielding the outcome of the original general measurement.


# [Theorem Statement and Proof](#theorem-statement-and-proof)  

Let $ X $ be a system and let $ \{ P_0, \dots, P_{m-1} \} $ be a collection of positive semidefinite matrices satisfying $ P_0 + \cdots + P_{m-1} = \mathbb{I}_X $, so that the collection describes a measurement of $ X $. Also let $ Y $ be a system whose classical state set is $ \{ 0, \dots, m - 1 \} $, which is the set of possible outcomes of the given measurement.

**Naimark’s theorem** states that there exists a unitary operation $ U $ on the compound system $ (Y, X) $ so that the implementation suggested by the following figure yields measurement outcomes that agree with the given measurement $ \{ P_0, \dots, P_{m-1} \} $, meaning that the probabilities for the different possible measurement outcomes are precisely as in the original.

![Naimark.png](attachment:Naimark.png)

To be clear, the system $ X $ starts out in some arbitrary state $ \rho $ while $ Y $ is initialized to the $ |0\rangle $ state. The unitary operation $ U $ is applied to $ (Y, X) $ and then the system $ Y $ is measured with a standard basis measurement, yielding some outcome $ a \in \{0, \dots, m - 1\} $. Note that, in the figure, the system $ X $ is pictured as part of the output of the circuit — but for now we won’t concern ourselves with the state of $ X $ after $ U $ is performed, and we can alternatively imagine that it is traced out. An implementation of a measurement in this way is clearly reminiscent of a Stinespring representation of a channel, and the mathematical underpinnings are similar as well. The difference here is that the workspace system is measured rather than being traced out like in the case of a Stinespring representation.

The fact that every measurement can be implemented in this way is pretty simple to prove, but we're going to need a fact concerning positive semidefinite matrices first.

---

### Theorem
> Suppose $ P $ is an $ n \times n $ positive semidefinite matrix. There exists a unique $ n \times n $ positive semidefinite matrix $ Q $ for which $ Q^2 = P $. This unique positive semidefinite matrix is called the *square root* of $ P $ and is denoted $ \sqrt{P} $.

---

One way to find the square root of a positive semidefinite matrix is to first compute a spectral decomposition:

$$
P = \sum_{k=0}^{n-1} \lambda_k |\psi_k\rangle \langle \psi_k|
$$

Because $ P $ is positive semidefinite, its eigenvalues must be nonnegative real numbers, and by replacing them with their square roots we obtain an expression for the square root of $ P $:

$$
\sqrt{P} = \sum_{k=0}^{n-1} \sqrt{\lambda_k} |\psi_k\rangle \langle \psi_k|
$$
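The spectral-decomposition recipe translates directly into code. Here is a minimal sketch assuming NumPy, where `psd_sqrt` is a name of our own; `np.linalg.eigh` returns the eigenvalues and an orthonormal set of eigenvectors of a Hermitian matrix.

```python
import numpy as np

def psd_sqrt(P):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    eigenvalues, eigenvectors = np.linalg.eigh(P)
    # Clip tiny negative values that can appear from floating-point rounding.
    eigenvalues = np.clip(eigenvalues, 0, None)
    # Scale each eigenvector column by sqrt(lambda_k), then recombine.
    return (eigenvectors * np.sqrt(eigenvalues)) @ eigenvectors.conj().T

P0 = np.array([[2, 1], [1, 1]]) / 3  # the matrix P_0 from Example 2
Q = psd_sqrt(P0)
print(Q)
print(Q @ Q)
```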

---

With this concept in hand, we're ready to prove Naimark's theorem. Under the assumption that $ X $ has $ n $ classical states, a unitary operation $ U $ on the pair $ (Y, X) $ can be represented by an $ mn \times mn $ matrix, which we can view as an $ m \times m $ block matrix whose blocks are $ n \times n $. The key to the proof is to take $ U $ to be any unitary matrix that matches the following pattern:

$$
U =
\begin{pmatrix}
\sqrt{P_0} & ? & \cdots & ? \\
\sqrt{P_1} & ? & \cdots & ? \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{P_{m-1}} & ? & \cdots & ?
\end{pmatrix}
$$

For it to be possible to fill in the blocks marked with a question mark so that $ U $ is unitary, it's both necessary and sufficient that the first $ n $ columns, which are formed by the blocks $ \sqrt{P_0}, \dots, \sqrt{P_{m-1}} $, are orthonormal. We can then use the Gram–Schmidt orthogonalization process to fill in the remaining columns, as we’ve already seen a couple of times in this series.

The first $ n $ columns of $ U $ can be expressed as vectors in the following way, where $ c = 0, \dots, n - 1 $ refers to the column number starting from 0:

$$
|\gamma_c\rangle = \sum_{a=0}^{m-1} |a\rangle \otimes \sqrt{P_a} |c\rangle
$$

We can compute the inner product between any two of them as follows:

$
\langle \gamma_c | \gamma_d \rangle = \sum_{a,b=0}^{m-1} \langle a | b \rangle \cdot \langle c | \sqrt{P_a}^\dagger \sqrt{P_b} | d \rangle = \langle c | \left( \sum_{a=0}^{m-1} P_a \right) | d \rangle = \langle c | d \rangle
$

This shows that these columns are in fact orthonormal, so we can fill in the remaining columns of $ U $ in a way that guarantees the entire matrix is unitary.

It remains to check that the measurement outcome probabilities for the simulation are consistent with the original measurement. For a given initial state $ \rho $ of $ X $, the measurement described by the collection $ \{ P_0, \ldots, P_{m-1} \} $ results in each outcome $ a \in \{ 0, \ldots, m - 1 \} $ with probability

$
\text{Tr}(P_a \rho).
$

To obtain the outcome probabilities for the simulation, let's first give the name $ \sigma $ to the state of $ (Y, X) $ after $ U $ has been performed. This state can be expressed as follows:

$
\sigma = U (|0\rangle\langle 0| \otimes \rho) U^\dagger = \sum_{a,b=0}^{m-1} |a\rangle\langle b| \otimes \sqrt{P_a} \rho \sqrt{P_b}
$

Equivalently, in a block matrix form, we have the following equation:

$$
\begin{bmatrix}
\sqrt{P_0} & \ast & \cdots & \ast \\
\sqrt{P_1} & \ast & \cdots & \ast \\
\vdots & \vdots & \ddots & \vdots \\
\sqrt{P_{m-1}} & \ast & \cdots & \ast \\
\end{bmatrix}
\begin{bmatrix}
\rho & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix}
\sqrt{P_0} & \sqrt{P_1} & \cdots & \sqrt{P_{m-1}} \\
\ast & \ast & \cdots & \ast \\
\vdots & \vdots & \ddots & \vdots \\
\ast & \ast & \cdots & \ast
\end{bmatrix}
=
\begin{bmatrix}
\sqrt{P_0} \rho \sqrt{P_0} & \cdots & \sqrt{P_0} \rho \sqrt{P_{m-1}} \\
\vdots & & \vdots \\
\sqrt{P_{m-1}} \rho \sqrt{P_0} & \cdots & \sqrt{P_{m-1}} \rho \sqrt{P_{m-1}}
\end{bmatrix}
$$

Note: the blocks marked $ \ast $ in the matrices above have no influence on the final result, because they only ever multiply zero blocks of $ |0\rangle\langle 0| \otimes \rho $ when computing $ U (|0\rangle\langle 0| \otimes \rho) U^\dagger $.

---

Now we can analyze what happens when a standard basis measurement is performed on $ Y $. The probabilities of the possible outcomes are given by the diagonal entries of the reduced state $ \sigma_Y $ of $ Y $:

$
\sigma_Y = \sum_{a,b=0}^{m-1} \text{Tr} \left( \sqrt{P_a} \rho \sqrt{P_b} \right) |a\rangle \langle b|
$

In particular, using the cyclic property of the trace, we see that the probability to obtain a given outcome $ a \in \{ 0, \ldots, m-1 \} $ is as follows:

$
\langle a | \sigma_Y | a \rangle = \text{Tr}(\sqrt{P_a} \rho \sqrt{P_a}) = \text{Tr}(P_a \rho)
$

This matches with the original measurement, establishing the correctness of the simulation.
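The proof can be checked numerically. The following sketch, assuming NumPy, builds one possible $ U $ for the tetrahedral measurement and compares the simulated outcome probabilities with $ \text{Tr}(P_a \rho) $. It completes the first $ n $ columns using a full singular value decomposition rather than Gram–Schmidt (any orthonormal completion works); the helper names and the test state `rho` are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(P):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, v = np.linalg.eigh(P)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def naimark_unitary(povm):
    """Unitary on (Y, X) whose first block column is sqrt(P_0), ..., sqrt(P_{m-1})."""
    n = povm[0].shape[0]
    # Stack the square roots vertically: an (m*n) x n matrix with orthonormal columns.
    first_columns = np.vstack([psd_sqrt(P) for P in povm])
    # The last m*n - n left singular vectors span the orthogonal complement
    # of those columns, so they can serve as the remaining columns of U.
    left, _, _ = np.linalg.svd(first_columns, full_matrices=True)
    return np.hstack([first_columns, left[:, n:]])

# Tetrahedral measurement P_a = |phi_a><phi_a| / 2.
phases = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]
phi = [np.array([1, 0], dtype=complex)] + [
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w]) for w in phases
]
povm = [np.outer(v, v.conj()) / 2 for v in phi]

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # arbitrary example state
m, n = len(povm), 2
U = naimark_unitary(povm)

# sigma = U (|0><0| tensor rho) U^dagger, with Y as the first tensor factor.
ket0 = np.zeros((m, m))
ket0[0, 0] = 1
sigma = U @ np.kron(ket0, rho) @ U.conj().T

# Probability of outcome a is the trace of the (a, a) block of sigma.
simulated = [np.trace(sigma[a * n:(a + 1) * n, a * n:(a + 1) * n]).real for a in range(m)]
direct = [np.trace(P @ rho).real for P in povm]
```

The lists `simulated` and `direct` agree up to floating-point error, as the proof predicts.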


# [Non-destructive Measurements](#non-destructive-measurements)  

So far in the lesson, we’ve concerned ourselves with *destructive* measurements, where the output consists of the classical measurement result alone and there is no specification of the post-measurement quantum state of the system that was measured. *Non-destructive* measurements, on the other hand, do precisely this. Specifically, non-destructive measurements describe not only the classical measurement outcome probabilities, but also the state of the system that was measured conditioned on each possible measurement outcome. Note that the term *non-destructive* refers to the *system* being measured but not necessarily its *state*, which could change significantly as a result of the measurement.

In general, for a given destructive measurement, there will be multiple (in fact infinitely many) non-destructive measurements that are *compatible* with the given destructive measurement, meaning that the classical measurement outcome probabilities match precisely with the destructive measurement. So, there isn’t a unique way to define the post-measurement quantum state of a system for a given measurement. It is, in fact, possible to generalize non-destructive measurements even further, so that they produce a classical measurement outcome along with a quantum state output of a system that isn’t necessarily the same as the input system.

The notion of a non-destructive measurement is an interesting and useful abstraction. It should, however, be recognized that non-destructive measurements can always be described as compositions of channels and destructive measurements — so there is a sense in which the notion of a destructive measurement is the more fundamental one.

---

## From Naimark's theorem

Consider the simulation of a general measurement like we have in Naimark’s theorem. A simple way to obtain a non-destructive measurement from this simulation is revealed by the figure from before, where the system $ X $ is not traced out, but is part of the output. This yields both a classical measurement outcome $ a \in \{0, \ldots, m - 1\} $ as well as a post-measurement quantum state of $ X $.

Let’s describe these states in mathematical terms. We're assuming that the initial state of $ X $ is $ \rho $, so that after the initialized system $ Y $ is introduced and $ U $ is performed, we have that $ (Y, X) $ is in the state:

$
\sigma = U (|0\rangle\langle 0| \otimes \rho) U^\dagger = \sum_{a,b=0}^{m-1} |a\rangle \langle b| \otimes \sqrt{P_a} \rho \sqrt{P_b}.
$

The probabilities for the different classical outcomes to appear are the same as before — they can't change as a result of us deciding to ignore or not ignore $ X $. That is, we obtain each $ a \in \{0, \ldots, m - 1\} $ with probability $ \text{Tr}(P_a \rho) $.

Conditioned upon having obtained a particular measurement outcome $ a $, the resulting state of $ X $ is given by this expression:

$
\frac{\sqrt{P_a} \rho \sqrt{P_a}}{\text{Tr}(P_a \rho)}.
$

One way to see this is to represent a standard basis measurement of $ Y $ by the completely dephasing channel $ \Delta_m $, where the channel output describes classical measurement outcomes as (diagonal) density matrices. An expression of the state we obtain follows:

$
\sum_{a,b=0}^{m-1} \Delta_m(|a\rangle \langle b|) \otimes \sqrt{P_a} \rho \sqrt{P_b}
= \sum_{a=0}^{m-1} |a\rangle \langle a| \otimes \sqrt{P_a} \rho \sqrt{P_a}.
$

We can then write this state as a convex combination of product states,

$
\sum_{a=0}^{m-1} \text{Tr}(P_a \rho) |a\rangle \langle a| \otimes \frac{\sqrt{P_a} \rho \sqrt{P_a}}{\text{Tr}(P_a \rho)},
$

which is consistent with the expression we've obtained for the state of $ X $ conditioned on each possible measurement outcome.
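As a small sketch of these formulas, assuming NumPy, the function below returns the probability $ \text{Tr}(P_a \rho) $ and the conditional state $ \sqrt{P_a} \rho \sqrt{P_a} / \text{Tr}(P_a \rho) $ for each outcome. The function name and the noisy two-outcome measurement used as input are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(P):
    """Square root of a positive semidefinite matrix via its spectral decomposition."""
    w, v = np.linalg.eigh(P)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def nondestructive_measurement(povm, rho):
    """List of (probability, post-measurement state) pairs, one per outcome."""
    results = []
    for P in povm:
        root = psd_sqrt(P)
        unnormalized = root @ rho @ root
        prob = np.trace(unnormalized).real  # equals Tr(P rho) by cyclicity of the trace
        # The conditional state is undefined for an outcome that never occurs.
        state = unnormalized / prob if prob > 1e-12 else None
        results.append((prob, state))
    return results

# Hypothetical noisy standard-basis measurement on a qubit.
P0 = np.diag([0.9, 0.2])
P1 = np.eye(2) - P0
rho = np.full((2, 2), 0.5)  # the state |+><+|

results = nondestructive_measurement([P0, P1], rho)
```

For this input the outcomes occur with probabilities 0.55 and 0.45, and each conditional state has unit trace.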

---

## From a Kraus representation

There are alternative selections for $ U $ in the context of Naimark's theorem that produce the same measurement outcome probabilities but give entirely different output states of $ X $.

For instance, one option is to substitute $ (\mathbb{I}_Y \otimes V) U $ for $ U $, where $ V $ is any unitary operation on $ X $. The application of $ V $ to $ X $ commutes with the measurement of $ Y $, so the classical outcome probabilities do not change, but now the state of $ X $ conditioned on the outcome $ a $ becomes

$
\frac{V \sqrt{P_a} \rho \sqrt{P_a} V^\dagger}{\text{Tr}(P_a \rho)}.
$

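More generally, we could replace $ U $ by the unitary matrix

$
\left( \sum_{a=0}^{m-1} |a\rangle\langle a| \otimes V_a \right) U
$

for any choice of unitary operations $ V_0, \ldots, V_{m-1} $ on $ X $, so that the operation applied to $ X $ can depend on the outcome $ a $.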


# [Quantum State Discrimination and Tomography](#quantum-state-discrimination-and-tomography)  

In the last part of the lesson, we'll briefly consider two tasks associated with measurements: quantum state discrimination and quantum state tomography.

1. *Quantum state discrimination*

For quantum state discrimination, we have a known collection of quantum states $ \rho_0, \ldots, \rho_{m-1} $, along with probabilities $ p_0, \ldots, p_{m-1} $ associated with these states. (A succinct way of expressing this is to say that we have an *ensemble* $ \{(p_0, \rho_0), \ldots, (p_{m-1}, \rho_{m-1})\} $ of quantum states.) A number $ a \in \{0, \ldots, m - 1\} $ is chosen randomly according to the probabilities $ (p_0, \ldots, p_{m-1}) $ and the system $ X $ is prepared in the state $ \rho_a $. The goal is to determine, by means of a measurement of $ X $ alone, which value of $ a $ was chosen.

Thus, we have a finite number of alternatives, along with a *prior* — which is our knowledge of the probability for each $ a $ to be selected — and the goal is to determine which alternative actually happened. This may be easy for some choices of states and probabilities, and for others, it may not be possible without some chance of making an error.

2. *Quantum state tomography*

For quantum state tomography, we have an *unknown* quantum state of a system — so unlike in quantum state discrimination there's typically no prior or any information about possible alternatives. This time, however, it's not a single copy of the state that's made available, but rather many *independent* copies are made available. That is, $ N $ identical systems $ X_1, \ldots, X_N $ are each independently prepared in the state $ \rho $ for some (possibly large) number $ N $. The goal is to find an approximation of the unknown state, as a density matrix, by measuring the systems.


# [Discriminating Between Two States](#discriminating-between-two-states)  

The simplest case for quantum state discrimination is that there are two states to be discriminated: $ \rho_0 $ and $ \rho_1 $.

Imagine a situation in which a bit $ a $ is chosen randomly: $ a = 0 $ with probability $ p \in [0, 1] $ and $ a = 1 $ with probability $ 1 - p $. A system $ X $ is prepared in the state $ \rho_a $, meaning $ \rho_0 $ or $ \rho_1 $ depending on the value of $ a $, and given to us. It is our goal to correctly guess the value of $ a $ by means of a measurement on $ X $. To be precise, we shall aim to maximize the probability that our guess is correct.

---

### An optimal measurement

An optimal way to solve this problem begins with a spectral decomposition of a weighted difference between $ \rho_0 $ and $ \rho_1 $, where the weights are the corresponding probabilities.

$
p \rho_0 - (1 - p)\rho_1 = \sum_{k=0}^{n-1} \lambda_k |\psi_k\rangle \langle\psi_k|
$

Notice that we have a minus sign rather than a plus sign in this expression: this is a weighted *difference* not a weighted sum.

We can maximize the probability of a correct guess by selecting a projective measurement $ \{\Pi_0, \Pi_1\} $ as follows. First let's partition the elements of $ \{0, \ldots, n - 1\} $ into two disjoint sets $ S_0 $ and $ S_1 $ depending upon whether the corresponding eigenvalue of the weighted difference is nonnegative or negative.

$
S_0 = \{k \in \{0, \ldots, n - 1\} : \lambda_k \ge 0\}
$
$
S_1 = \{k \in \{0, \ldots, n - 1\} : \lambda_k < 0\}
$

We can then choose a *projective* measurement as follows.

$
\Pi_0 = \sum_{k \in S_0} |\psi_k\rangle \langle\psi_k| \quad \text{and} \quad \Pi_1 = \sum_{k \in S_1} |\psi_k\rangle \langle\psi_k|
$

(It doesn’t actually matter in which set $ S_0 $ or $ S_1 $ we include the values of $ k $ for which $ \lambda_k = 0 $. Here we're choosing arbitrarily to include these values in $ S_0 $.)

This is an optimal measurement in the situation at hand that minimizes the probability of an incorrect determination of the selected state.

---

### Correctness probability

Now we will determine the probability of correctness for the measurement $ \{\Pi_0, \Pi_1\} $.

To begin we don't really need to be concerned with the specific choice we've made for $ \Pi_0 $ and $ \Pi_1 $, though it may be helpful to keep it in mind. For any measurement $ \{P_0, P_1\} $ (not necessarily projective) we can write the correctness probability as follows.

$
p\, \mathrm{Tr}(P_0 \rho_0) + (1 - p)\, \mathrm{Tr}(P_1 \rho_1)
$

Using the fact that $ \{P_0, P_1\} $ is a measurement, so $ P_1 = \mathbb{I} - P_0 $, we can rewrite this expression as follows.

$
\begin{aligned}
p\, \mathrm{Tr}(P_0 \rho_0) + (1 - p)\, \mathrm{Tr}((\mathbb{I} - P_0)\rho_1) &= p\, \mathrm{Tr}(P_0 \rho_0) - (1 - p)\, \mathrm{Tr}(P_0 \rho_1) + (1 - p)\, \mathrm{Tr}(\rho_1) \\
&= \mathrm{Tr}(P_0 (p \rho_0 - (1 - p)\rho_1)) + 1 - p
\end{aligned}
$

On the other hand, we could have made the substitution $ P_0 = \mathbb{I} - P_1 $ instead. That wouldn't change the value but it does give us an alternative expression.

$
\begin{aligned}
p\, \mathrm{Tr}((\mathbb{I} - P_1)\rho_0) + (1 - p)\, \mathrm{Tr}(P_1 \rho_1) &= p\, \mathrm{Tr}(\rho_0) - p\, \mathrm{Tr}(P_1 \rho_0) + (1 - p)\, \mathrm{Tr}(P_1 \rho_1) \\
&= p - \mathrm{Tr}(P_1 (p \rho_0 - (1 - p)\rho_1))
\end{aligned}
$

The two expressions have the same value, so we can average them to give yet another expression for this value. (Averaging the two expressions is just a trick to simplify the resulting expression.)

$
\begin{aligned}
&\frac{1}{2} \left( \mathrm{Tr}(P_0(p \rho_0 - (1 - p)\rho_1)) + 1 - p \right) + \frac{1}{2} \left( p - \mathrm{Tr}(P_1(p \rho_0 - (1 - p)\rho_1)) \right) \\
&= \frac{1}{2} \mathrm{Tr}((P_0 - P_1)(p \rho_0 - (1 - p)\rho_1)) + \frac{1}{2}
\end{aligned}
$

Now we can see why it makes sense to choose the projections $ \Pi_0 $ and $ \Pi_1 $ (as specified above) for $ P_0 $ and $ P_1 $, respectively — because that's how we can make the trace in the final expression as large as possible. In particular,

$
(\Pi_0 - \Pi_1)(p \rho_0 - (1 - p)\rho_1) = \sum_{k=0}^{n-1} |\lambda_k| \cdot |\psi_k\rangle \langle\psi_k|
$

So, when we take the trace, we obtain the sum of the **absolute values** of the eigenvalues — which is equal to what’s known as the **trace norm** of the weighted difference.

$
\mathrm{Tr}((\Pi_0 - \Pi_1)(p \rho_0 - (1 - p)\rho_1)) = \sum_{k=0}^{n-1} |\lambda_k| = \|p \rho_0 - (1 - p)\rho_1\|_1
$

Thus, the probability that the measurement $ \{\Pi_0, \Pi_1\} $ leads to a correct discrimination of $ \rho_0 $ and $ \rho_1 $, given with probabilities $ p $ and $ 1 - p $, respectively, is as follows.

$
\frac{1}{2} + \frac{1}{2} \|p \rho_0 - (1 - p)\rho_1\|_1
$

---

The fact that this is the optimal probability for a correct discrimination of $ \rho_0 $ and $ \rho_1 $, given with probabilities $ p $ and $ 1 - p $, is commonly referred to as the **Helstrom-Holevo theorem**, or sometimes just **Helstrom's theorem**.
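The construction above can be sketched numerically, assuming NumPy. The function name `helstrom_measurement` and the example pair of states ($ |0\rangle $ and $ |+\rangle $ with $ p = 1/2 $) are illustrative choices.

```python
import numpy as np

def helstrom_measurement(rho0, rho1, p):
    """Projections {Pi_0, Pi_1} built from the spectral decomposition of p*rho0 - (1-p)*rho1."""
    eigenvalues, eigenvectors = np.linalg.eigh(p * rho0 - (1 - p) * rho1)
    in_s0 = eigenvectors[:, eigenvalues >= 0]  # eigenvectors indexed by S_0
    in_s1 = eigenvectors[:, eigenvalues < 0]   # eigenvectors indexed by S_1
    Pi0 = in_s0 @ in_s0.conj().T
    Pi1 = in_s1 @ in_s1.conj().T
    # 1/2 + 1/2 * (trace norm of the weighted difference)
    optimal = 0.5 + 0.5 * np.sum(np.abs(eigenvalues))
    return Pi0, Pi1, optimal

rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])  # |0><0|
rho1 = np.full((2, 2), 0.5)                # |+><+|
p = 0.5

Pi0, Pi1, optimal = helstrom_measurement(rho0, rho1, p)

# Correctness probability computed directly from the measurement.
achieved = p * np.trace(Pi0 @ rho0).real + (1 - p) * np.trace(Pi1 @ rho1).real
```

For this example both `achieved` and `optimal` equal $ \cos^2(\pi/8) \approx 0.854 $.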


# [Discriminating Three or More States](#discriminating-three-or-more-states)  
For quantum state discrimination when there are three or more states, there is no known closed-form solution for an optimal measurement, although it is possible to formulate the problem as a semidefinite program — which allows for efficient numerical approximations of optimal measurements with the help of a computer.

It is also possible to **verify** (or **falsify**) optimality of a given measurement in a state discrimination task through a condition known as the **Holevo-Yuen-Kennedy-Lax** condition. In particular, for the state discrimination task defined by the ensemble  
$
\{(p_0, \rho_0), \ldots, (p_{m-1}, \rho_{m-1})\},
$
the measurement $\{P_0, \ldots, P_{m-1}\}$ is optimal if and only if the matrix
$
Q_a = \sum_{b=0}^{m-1} p_b \rho_b P_b - p_a \rho_a
$
is positive semidefinite for every $a \in \{0, \ldots, m-1\}$.

For example, consider the quantum state discrimination task in which one of the four tetrahedral states $\{|\phi_0\rangle, \ldots, |\phi_3\rangle\}$ is selected uniformly at random. The tetrahedral measurement $\{P_0, P_1, P_2, P_3\}$ succeeds with probability

$
\frac{1}{4} \mathrm{Tr}(P_0 |\phi_0\rangle \langle \phi_0|) + \frac{1}{4} \mathrm{Tr}(P_1 |\phi_1\rangle \langle \phi_1|) + \frac{1}{4} \mathrm{Tr}(P_2 |\phi_2\rangle \langle \phi_2|) + \frac{1}{4} \mathrm{Tr}(P_3 |\phi_3\rangle \langle \phi_3|) = \frac{1}{2}.
$

This is optimal by the Holevo-Yuen-Kennedy-Lax condition, as a calculation reveals that  
$
Q_a = \frac{1}{4}(\mathbb{I} - |\phi_a\rangle \langle \phi_a|) \geq 0
$
for $a = 0, 1, 2, 3$.
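The condition is easy to test numerically. Below is a minimal sketch, assuming NumPy, that evaluates the matrices $Q_a$ for the tetrahedral example and, for contrast, for a measurement that always guesses $a = 0$. The function names and the tolerance are illustrative assumptions.

```python
import numpy as np

def hykl_matrices(probs, states, povm):
    """Q_a = sum_b p_b rho_b P_b - p_a rho_a for each a."""
    total = sum(p * rho @ P for p, rho, P in zip(probs, states, povm))
    return [total - p * rho for p, rho in zip(probs, states)]

def is_optimal(probs, states, povm, tol=1e-9):
    """True if every Q_a is Hermitian with no eigenvalue below -tol."""
    for Q in hykl_matrices(probs, states, povm):
        if not np.allclose(Q, Q.conj().T, atol=tol):
            return False
        if np.linalg.eigvalsh(Q).min() < -tol:
            return False
    return True

# Tetrahedral states, chosen uniformly at random.
phases = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]
phi = [np.array([1, 0], dtype=complex)] + [
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w]) for w in phases
]
states = [np.outer(v, v.conj()) for v in phi]
probs = [0.25] * 4

tetrahedral = [rho / 2 for rho in states]
always_guess_zero = [np.eye(2), np.zeros((2, 2)), np.zeros((2, 2)), np.zeros((2, 2))]

success = sum(p * np.trace(P @ rho).real for p, rho, P in zip(probs, states, tetrahedral))
tetrahedral_is_optimal = is_optimal(probs, states, tetrahedral)
trivial_is_optimal = is_optimal(probs, states, always_guess_zero)
```

The tetrahedral measurement passes the check with success probability $1/2$, while the trivial measurement fails it.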



# [Quantum State Tomography](#quantum-state-tomography)

Finally, we'll briefly discuss the problem of **quantum state tomography**. For this problem, we're given a large number $N$ of independent copies of an unknown quantum state $\rho$, and the goal is to reconstruct an approximation $\hat{\rho}$ of $\rho$. To be clear, this means that we wish to find a classical description of a density matrix $\hat{\rho}$ that is as close as possible to $\rho$.

We can alternatively describe the set-up in the following way. An unknown density matrix $\rho$ is selected, and we're given access to $N$ quantum systems $X_1, \ldots, X_N$, each of which has been **independently** prepared in the state $\rho$. Thus, the state of the compound system $(X_1, \ldots, X_N)$ is
$
\rho^{\otimes N} = \rho \otimes \rho \otimes \cdots \otimes \rho \quad \text{(N times)}.
$

The goal is to perform measurements on the systems $X_1, \ldots, X_N$ and, based on the outcomes of those measurements, to compute a density matrix $\hat{\rho}$ that closely approximates $\rho$. This turns out to be a fascinating problem and there is ongoing research on it.

Different types of strategies for approaching the problem may be considered. For example, we can imagine a strategy where each of the systems $X_1, \ldots, X_N$ is measured separately, in turn, producing a sequence of measurement outcomes. Different specific choices for which measurements are performed can be made, including **adaptive** and **non-adaptive** selections. In other words, the choice of what measurement is performed on a particular system might or might not depend on the outcomes of prior measurements. Based on the sequence of measurement outcomes, a guess $\hat{\rho}$ for the state $\rho$ is derived — and again there are different methodologies for doing this.

An alternative approach is to perform a single **joint measurement** on the entire collection, where we think about $(X_1, \ldots, X_N)$ as a single system and select a single measurement whose output is a guess $\hat{\rho}$ for the state $\rho$. This can lead to an improved estimate over what is possible for separate measurements of the systems, although a joint measurement on a large number of systems together is likely to be much more difficult to implement.

## Qubit tomography using Pauli measurements

We'll now consider quantum state tomography in the simple case where $\rho$ is a qubit density matrix. We assume that we're given qubits $X_1, \ldots, X_N$ that are each independently in the state $\rho$, and our goal is to compute an approximation $\hat{\rho}$ that is close to $\rho$.

Our strategy will be to divide the $N$ qubits $X_1, \ldots, X_N$ into three roughly equal-size collections, one for each of the three Pauli matrices $\sigma_x, \sigma_y,$ and $\sigma_z$. Each qubit is then measured independently as follows:

1. For each of the qubits in the collection associated with $\sigma_x$ we perform a $\sigma_x$ measurement. This means that the qubit is measured with respect to the basis $\{|+\rangle, |-\rangle\}$, which is an orthonormal basis of eigenvectors of $\sigma_x$, and the corresponding measurement outcomes are the eigenvalues associated with the two eigenvectors: $+1$ for the state $|+\rangle$ and $-1$ for the state $|-\rangle$. By averaging together the outcomes over all of the states in the collection associated with $\sigma_x$, we obtain an approximation of the expectation value
   $
   \langle +|\rho|+\rangle - \langle -|\rho|-\rangle = \mathrm{Tr}(\sigma_x \rho).
   $

2. For each of the qubits in the collection associated with $\sigma_y$ we perform a $\sigma_y$ measurement. Such a measurement is similar to a $\sigma_x$ measurement, except that the measurement basis is $\{|+i\rangle, |-i\rangle\}$, the eigenvectors of $\sigma_y$. Averaging the outcomes over all of the states in the collection associated with $\sigma_y$, we obtain an approximation of the expectation value
   $
   \langle +i|\rho|+i\rangle - \langle -i|\rho|-i\rangle = \mathrm{Tr}(\sigma_y \rho).
   $

3. For each of the qubits in the collection associated with $\sigma_z$ we perform a $\sigma_z$ measurement. This time the measurement basis is the standard basis $\{|0\rangle, |1\rangle\}$, the eigenvectors of $\sigma_z$. Averaging the outcomes over all of the states in the collection associated with $\sigma_z$, we obtain an approximation of the expectation value
   $
   \langle 0|\rho|0\rangle - \langle 1|\rho|1\rangle = \mathrm{Tr}(\sigma_z \rho).
   $

Once we have obtained approximations $\alpha_x \approx \mathrm{Tr}(\sigma_x \rho), \alpha_y \approx \mathrm{Tr}(\sigma_y \rho),$ and $\alpha_z \approx \mathrm{Tr}(\sigma_z \rho)$ by averaging the measurement outcomes for each collection, we can approximate $\rho$ as
$
\hat{\rho} = \frac{\mathbb{I} + \alpha_x \sigma_x + \alpha_y \sigma_y + \alpha_z \sigma_z}{2} \approx \frac{\mathbb{I} + \mathrm{Tr}(\sigma_x \rho)\sigma_x + \mathrm{Tr}(\sigma_y \rho)\sigma_y + \mathrm{Tr}(\sigma_z \rho)\sigma_z}{2} = \rho.
$

In the limit as $N$ approaches infinity, this approximation converges in probability to the true density matrix $\rho$ by the Law of Large Numbers, and well-known statistical bounds (such as Hoeffding's inequality) can be used to bound the probability that the approximation $\hat{\rho}$ deviates from $\rho$ by varying amounts.

An important thing to recognize, however, is that the matrix $\hat{\rho}$ obtained in this way may fail to be a density matrix. In particular, although it will always have trace equal to 1, it may fail to be positive semidefinite. There are different known strategies for “rounding” such an approximation $\hat{\rho}$ to a density matrix, one of them being to compute a spectral decomposition, replace any negative eigenvalues with 0, and then renormalize (by dividing the matrix we obtain by its trace).
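The whole procedure, including the rounding step just described, can be simulated classically. The following is a sketch assuming NumPy, in which measurement outcomes are sampled from the known probabilities $(1 \pm \mathrm{Tr}(\sigma \rho))/2$ instead of coming from a quantum device. The function names, the example state, the seed, and the shot count are all illustrative assumptions.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_x
    np.array([[0, -1j], [1j, 0]]),               # sigma_y
    np.array([[1, 0], [0, -1]], dtype=complex),  # sigma_z
]

def pauli_tomography(rho, shots_per_pauli, rng):
    """Estimate rho from simulated +1/-1 outcomes of Pauli measurements."""
    estimate = np.eye(2, dtype=complex)
    for sigma in PAULIS:
        # Probability of the +1 outcome; clipped to guard against round-off.
        prob_plus = np.clip((1 + np.trace(sigma @ rho).real) / 2, 0.0, 1.0)
        plus_count = rng.binomial(shots_per_pauli, prob_plus)
        # Average of the outcomes: (+1)*plus_count + (-1)*(shots - plus_count), over shots.
        alpha = (2 * plus_count - shots_per_pauli) / shots_per_pauli
        estimate = estimate + alpha * sigma
    return estimate / 2

def round_to_density_matrix(M):
    """Replace negative eigenvalues with 0, then divide by the trace."""
    w, v = np.linalg.eigh(M)
    rounded = (v * np.clip(w, 0.0, None)) @ v.conj().T
    return rounded / np.trace(rounded).real

rng = np.random.default_rng(0)
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
rho_hat = pauli_tomography(rho, shots_per_pauli=30_000, rng=rng)

# A Hermitian, trace-one matrix that is not positive semidefinite.
unphysical = (np.eye(2) + 0.9 * PAULIS[0] + 0.9 * PAULIS[2]) / 2
rounded = round_to_density_matrix(unphysical)
```

The matrix `unphysical` has a negative eigenvalue because its Bloch vector is longer than 1; rounding returns a valid density matrix.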

---

## Qubit tomography using the tetrahedral measurement

Another option for performing qubit tomography is to measure every qubit $X_1, \ldots, X_N$ using the tetrahedral measurement $\{P_0, P_1, P_2, P_3\}$ described earlier. That is,
$
P_0 = \frac{|\phi_0\rangle\langle \phi_0|}{2}, \quad
P_1 = \frac{|\phi_1\rangle\langle \phi_1|}{2}, \quad
P_2 = \frac{|\phi_2\rangle\langle \phi_2|}{2}, \quad
P_3 = \frac{|\phi_3\rangle\langle \phi_3|}{2}.
$

for

$
|\phi_0\rangle = |0\rangle
$
$
|\phi_1\rangle = \frac{1}{\sqrt{3}}|0\rangle + \sqrt{\frac{2}{3}}|1\rangle
$
$
|\phi_2\rangle = \frac{1}{\sqrt{3}}|0\rangle + \sqrt{\frac{2}{3}} e^{2\pi i / 3}|1\rangle
$
$
|\phi_3\rangle = \frac{1}{\sqrt{3}}|0\rangle + \sqrt{\frac{2}{3}} e^{-2\pi i / 3}|1\rangle
$

Each outcome is obtained some number of times, which we will denote as $n_a$ for each $a \in \{0, 1, 2, 3\}$, so that $n_0 + n_1 + n_2 + n_3 = N$. The ratio of these numbers with $N$ provides an estimate of the probability associated with each possible outcome:

$
\frac{n_a}{N} \approx \mathrm{Tr}(P_a \rho)
$

Finally, we shall make use of the following remarkable formula:

$
\rho = \sum_{a=0}^{3} \left(3 \, \mathrm{Tr}(P_a \rho) - \frac{1}{2} \right) |\phi_a\rangle \langle \phi_a|
$

To establish this formula, we can use the following equation for the absolute values squared of inner products of tetrahedral states, which can be checked through direct calculations:

$
|\langle \phi_a | \phi_b \rangle|^2 =
\begin{cases}
1 & \text{if } a = b \\
\frac{1}{3} & \text{if } a \neq b
\end{cases}
$

Now, the four matrices

$
|\phi_0\rangle \langle \phi_0| = \begin{pmatrix}1 & 0 \\ 0 & 0\end{pmatrix}
$

$
|\phi_1\rangle \langle \phi_1| = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3} \\ \frac{\sqrt{2}}{3} & \frac{2}{3} \end{pmatrix}
$

$
|\phi_2\rangle \langle \phi_2| = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3} e^{-2\pi i/3} \\ \frac{\sqrt{2}}{3} e^{2\pi i/3} & \frac{2}{3} \end{pmatrix}
$

$
|\phi_3\rangle \langle \phi_3| = \begin{pmatrix} \frac{1}{3} & \frac{\sqrt{2}}{3} e^{2\pi i/3} \\ \frac{\sqrt{2}}{3} e^{-2\pi i/3} & \frac{2}{3} \end{pmatrix}
$

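are linearly independent, so they span the space of all $2 \times 2$ matrices. Any density matrix $\rho$ is therefore a linear combination of them with coefficients that sum to $\mathrm{Tr}(\rho) = 1$, so it suffices to prove that the formula is true when $\rho = |\phi_b\rangle \langle \phi_b|$ for $b = 0, 1, 2, 3$. In particular,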

$
3 \, \mathrm{Tr}(P_a |\phi_b\rangle \langle \phi_b|) - \frac{1}{2} = \frac{3}{2} |\langle \phi_a | \phi_b \rangle|^2 - \frac{1}{2} =
\begin{cases}
1 & \text{if } a = b \\
0 & \text{if } a \neq b
\end{cases}
$

and therefore

$
\sum_{a=0}^{3} \left(3 \, \mathrm{Tr}(P_a |\phi_b\rangle \langle \phi_b|) - \frac{1}{2} \right) |\phi_a\rangle \langle \phi_a| = |\phi_b\rangle \langle \phi_b|
$

We arrive at an approximation of $\rho$:

$
\tilde{\rho} = \sum_{a=0}^{3} \left( \frac{3 n_a}{N} - \frac{1}{2} \right) |\phi_a\rangle \langle \phi_a|.
$

This approximation will always be a Hermitian matrix having trace equal to one, but it may fail to be positive semidefinite. When that happens, the approximation must be "rounded" to a density matrix, just as in the strategy involving Pauli measurements.
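The reconstruction formula and the resulting estimate can be checked with a short simulation. This is a sketch assuming NumPy, with outcome counts drawn from a multinomial distribution in place of real measurements. The function name `reconstruct`, the example state, the seed, and the value of $N$ are illustrative assumptions.

```python
import numpy as np

phases = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]
phi = [np.array([1, 0], dtype=complex)] + [
    np.array([1 / np.sqrt(3), np.sqrt(2 / 3) * w]) for w in phases
]
projectors = [np.outer(v, v.conj()) for v in phi]  # |phi_a><phi_a|
povm = [Pi / 2 for Pi in projectors]               # P_a

def reconstruct(frequencies):
    """Apply sum_a (3 f_a - 1/2) |phi_a><phi_a| to outcome frequencies f_a."""
    return sum((3 * f - 0.5) * Pi for f, Pi in zip(frequencies, projectors))

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])

# With the exact probabilities Tr(P_a rho), the formula returns rho itself.
exact_probs = np.array([np.trace(P @ rho).real for P in povm])
exact = reconstruct(exact_probs)

# With sampled counts n_a, it returns the approximation of rho.
rng = np.random.default_rng(1)
N = 100_000
counts = rng.multinomial(N, exact_probs / exact_probs.sum())
estimate = reconstruct(counts / N)
```

Here `exact` reproduces `rho` up to floating-point error, and `estimate` has trace 1 and lies close to `rho` for large $N$.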
