$$\newcommand{\ket}[1]{\left|{#1}\right\rangle}$$$$\newcommand{\bra}[1]{\left\langle{#1}\right|}$$$$\newcommand{\braket}[2]{\left\langle{#1}\middle|{#2}\right\rangle}$$
# Single systems

This notebook introduces the basic framework of *quantum information*, including the description of quantum states as vectors with complex number entries, measurements that allow classical information to be extracted from quantum states, and operations on quantum states that are described by unitary matrices.


## 1. Classical information

To describe quantum information, we begin with an overview of *classical* information.


### 1.1 Classical states and probability vectors

Suppose that we have a *system* that stores information, and assume that this system can be in one of a finite number of *classical states* at each instant. The simplest example, which lies at the foundation of information theory, is that of a *bit*: a system whose classical states are 0 and 1.

Let us give the name $X$ to the system being considered, and let $\Sigma$ denote the set of classical states of $X$. In addition to assuming that $\Sigma$ is finite, we also assume that $\Sigma$ is nonempty: it makes no sense for a system to have no states at all.

For example, if $X$ is a bit, then $\Sigma = \{0, 1\}$, or if $X$ is a regular die, then $\Sigma = \{1, 2, 3, 4, 5, 6\}$.

Often in information processing, our knowledge of the classical state of $X$ is uncertain. We represent this knowledge by assigning a probability to each classical state, resulting in a *probabilistic state*.

For example, again assume $X$ is a bit. Based on what has happened previously, we might believe that $X$ is in the classical state 0 with probability $3/4$ and in the classical state 1 with probability $1/4$. We can represent this belief by writing:
$$\Pr(X = 0) = \frac{3}{4} \text{   and   } \Pr(X = 1) = \frac{1}{4}.$$

A more succinct way of representing this probabilistic state is by a column vector:
$$\begin{pmatrix}\frac{3}{4}\\\frac{1}{4}\end{pmatrix}$$

We can represent any probabilistic state through a column vector satisfying two properties:

1. All entries of the vector are *nonnegative real numbers*.
2. The sum of the entries is equal to 1.
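The two properties above can be checked numerically. The following is a minimal sketch using NumPy; the helper name `is_probability_vector` and the tolerance `tol` are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def is_probability_vector(v, tol=1e-9):
    """Return True if v has nonnegative real entries that sum to 1 (within tol)."""
    v = np.asarray(v)
    # An empty vector has no classical states; complex entries are not probabilities.
    if v.size == 0 or np.iscomplexobj(v):
        return False
    v = v.astype(float)
    nonnegative = np.all(v >= -tol)          # property 1
    sums_to_one = abs(v.sum() - 1.0) <= tol  # property 2
    return bool(nonnegative and sums_to_one)

print(is_probability_vector([3/4, 1/4]))   # True
print(is_probability_vector([1/2, 3/4]))   # False: entries sum to 5/4
print(is_probability_vector([5/4, -1/4]))  # False: one entry is negative
```

The tolerance is there only because floating-point sums are rarely exactly 1.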


### 1.2 Measuring probabilistic states

Now let us consider what happens when we *measure* a system that is in a probabilistic state. By measuring a system, we mean that we look at the system and unambiguously recognize the classical state it is in. Intuitively, we can never "see" a system in a probabilistic state; a measurement yields exactly one of the allowed classical states.

Measurement changes our knowledge of a system, and therefore changes the probabilistic state that we associate with that system: if we recognize that $X$ is in the classical state $\alpha \in \Sigma$, then the new probability vector representing our knowledge of $X$ becomes a vector having a $1$ in the entry corresponding to $\alpha$ and $0$ for all other entries. This vector indicates that $X$ is in the classical state $\alpha$ with certainty, which we know because we have just observed it.

We denote the vector having a $1$ in the entry corresponding to $\alpha$ and $0$ for all other entries by $\ket{\alpha}$, which is read as "ket $\alpha$". Vectors of this form are also called *standard basis vectors*.

For example, if the system in question is a bit, the standard basis vectors are given by
$$\ket{0} = \begin{pmatrix}1\\0\end{pmatrix} \text{  and  } \ket{1} = \begin{pmatrix}0\\1\end{pmatrix}$$

Any two-dimensional column vector can be expressed as a linear combination of these two vectors. For example,
$$\begin{pmatrix}\frac{3}{4}\\\frac{1}{4}\end{pmatrix} = \frac{3}{4}\ket{0} + \frac{1}{4}\ket{1}.$$
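As a rough illustration of this decomposition, the sketch below builds standard basis vectors with NumPy and recombines them. The helper `ket(state, states)` and the ordered lists `bit` and `die` are hypothetical names chosen for this example; the ordering of each list fixes which entry of the vector belongs to which classical state.

```python
import numpy as np

def ket(state, states):
    """Standard basis vector for `state`, given an ordered list of classical states."""
    v = np.zeros(len(states))
    v[states.index(state)] = 1.0
    return v

bit = [0, 1]
p = 3/4 * ket(0, bit) + 1/4 * ket(1, bit)
print(p)  # [0.75 0.25]

# The same idea for a six-sided die: a vector equals the sum of its entries
# times the matching standard basis vectors.
die = [1, 2, 3, 4, 5, 6]
u = np.array([0.1, 0.2, 0.3, 0.1, 0.2, 0.1])
rebuilt = sum(u[i] * ket(s, die) for i, s in enumerate(die))
print(np.allclose(u, rebuilt))  # True
```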

This fact naturally generalizes to any classical state set: any column vector indexed by $\Sigma$ is a linear combination of the standard basis vectors $\ket{\alpha}$ for $\alpha \in \Sigma$.

Returning to how a probabilistic state changes upon being measured, suppose that we flip a fair coin, but cover up the coin before looking at it. We could then say that its probabilistic state is
$$\begin{pmatrix}\frac{1}{2}\\\frac{1}{2}\end{pmatrix} = \frac{1}{2}\ket{\text{ heads}} + \frac{1}{2}\ket{\text{ tails}}.$$

Here, the classical state set $\Sigma$ of our coin $X$ is $\{\text{ heads}, \text{ tails}\}$. We'll choose to order these states as heads first, tails second, so that
$$\ket{\text{ heads}} = \begin{pmatrix}1\\0\end{pmatrix} \text{   and   } \ket{\text{ tails}} = \begin{pmatrix}0\\1\end{pmatrix}.$$

If we were to uncover the coin and look at it, we would see either one of the two classical states: heads or tails. Supposing that the result were tails, we would update our description of the probabilistic state of the coin so that it becomes $\ket{\text{ tails}}$.
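The coin example can be simulated as follows. This is only a sketch of the update rule described above, assuming NumPy's random generator as the source of randomness; the function name `measure` is hypothetical.

```python
import numpy as np

def measure(p, rng):
    """Sample one classical state index from p and return it with the updated vector."""
    index = rng.choice(len(p), p=p)  # exactly one classical state is observed
    updated = np.zeros(len(p))
    updated[index] = 1.0             # our knowledge becomes a standard basis vector
    return index, updated

states = ["heads", "tails"]          # ordering: heads first, tails second
p = np.array([1/2, 1/2])             # the covered fair coin

rng = np.random.default_rng(seed=7)  # seeded only so the sketch is repeatable
index, updated = measure(p, rng)
print(states[index], updated)
```

Whichever outcome is sampled, `updated` has a single 1 in the position of that outcome and 0 elsewhere.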

One final remark concerning measurements of probabilistic states: a probabilistic state may describe knowledge or belief, not necessarily something actual. The state of our coin after we flip it, but before we look, is either heads or tails, and we simply do not know which until we look. Looking does not actually change the state, but only our knowledge of it. Upon seeing that the classical state is tails, we naturally update our knowledge by assigning the vector $\ket{\text{ tails}}$ to the coin, but to someone else who did not see the coin when it was uncovered, the probabilistic state remains unchanged.

### 1.3 Classical operations

In this last part of the summary of classical information, we will consider the sorts of operations one might perform on a classical system.

#### Deterministic operations
First, there are *deterministic* operations, where each classical state $\alpha \in \Sigma$ is transformed into $f(\alpha)$ for some function $f$ of the form $f: \Sigma \to \Sigma$.

For example, if $\Sigma = \{0, 1\}$, there are four functions of this form, $f_1$, $f_2$, $f_3$, and $f_4$, which can be represented by tables of values as follows:

| $\alpha$ | $f_1(\alpha)$ | | $\alpha$ | $f_2(\alpha)$ | | $\alpha$ | $f_3(\alpha)$ | | $\alpha$ | $f_4(\alpha)$ |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 0 | 0 | | 0 | 0 | | 0 | 1 | | 0 | 1 |
| 1 | 0 | | 1 | 1 | | 1 | 0 | | 1 | 1 |

The first and last of these functions are *constant*: $f_1(\alpha) = 0$ and $f_4(\alpha) = 1$ for all $\alpha \in \Sigma$. The middle two are not constant; they are *balanced*, meaning that the two possible output values occur the same number of times as we range over the possible inputs. The function $f_2$ is the *identity function*: $f_2(\alpha) = \alpha$ for all $\alpha \in \Sigma$. And $f_3$ is the *NOT function*: $f_3(0) = 1$ and $f_3(1) = 0$.

The actions of deterministic operations on probabilistic states can be represented by matrix-vector multiplication. More specifically, the matrix $M$ that represents a given function $f:\Sigma \to \Sigma$ is the one that satisfies
$$M\ket{\alpha} = \ket{f(\alpha)}$$
for every $\alpha \in \Sigma$. Such a matrix always exists and is unique: the column of $M$ corresponding to $\alpha$ must be exactly the vector $\ket{f(\alpha)}$.

For example, the matrices $M_1, \ldots, M_4$ corresponding to the functions $f_1, \ldots, f_4$ above are as follows:
$$M_1 = \begin{pmatrix}1 & 1\\0 & 0\end{pmatrix}, M_2 = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}, M_3 = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix}, M_4 = \begin{pmatrix}0 & 0\\1 & 1\end{pmatrix}.$$
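To see these matrices act on a probabilistic state, the sketch below applies each of them to the vector $(3/4, 1/4)$ used earlier. It assumes NumPy, with `@` denoting matrix-vector multiplication.

```python
import numpy as np

M1 = np.array([[1, 1], [0, 0]])  # constant function with value 0
M2 = np.array([[1, 0], [0, 1]])  # identity function
M3 = np.array([[0, 1], [1, 0]])  # NOT function
M4 = np.array([[0, 0], [1, 1]])  # constant function with value 1

p = np.array([3/4, 1/4])         # state 0 with probability 3/4, state 1 with probability 1/4

print(M1 @ p)  # [1. 0.]
print(M2 @ p)  # [0.75 0.25]
print(M3 @ p)  # [0.25 0.75]
print(M4 @ p)  # [0. 1.]
```

Each result is again a probability vector: the constant functions put all of the probability on a single state, the identity leaves the vector unchanged, and NOT swaps the two entries.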

Matrices that represent deterministic operations always have exactly one $1$ in each column, and $0$ for all other entries. A convenient way to represent matrices of these and other forms makes use of a notation for row vectors analogous to the one for column vectors discussed previously. For each $\alpha \in \Sigma$, we denote by $\bra{\alpha}$ the *row* vector having a $1$ in the entry corresponding to $\alpha$ and $0$ for all other entries. This vector is read as "bra $\alpha$".

For example, if $\Sigma = \{0, 1\}$, then
$$\bra{0} = \begin{pmatrix}1 & 0\end{pmatrix} \text{   and   } \bra{1} = \begin{pmatrix}0 & 1\end{pmatrix}.$$

For an arbitrary classical state set $\Sigma$, we can view row vectors and column vectors as matrices and perform the matrix multiplication $\ket{\beta}\bra{\alpha}$. The result is a square matrix having a $1$ in the entry corresponding to the pair $(\beta, \alpha)$, meaning that the row of the entry corresponds to $\beta$ and the column corresponds to $\alpha$, and $0$ for all other entries. For example,
$$\ket{0}\bra{1} = \begin{pmatrix}0 & 1\\0 & 0\end{pmatrix}.$$

Using this notation, for any function $f:\Sigma \to \Sigma$, we may express the matrix $M$ corresponding to the function $f$ as 
$$M = \sum_{\alpha \in \Sigma}\ket{f(\alpha)}\bra{\alpha}.$$
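The sum above translates fairly directly into code. The following sketch, assuming NumPy, builds $M$ one term $\ket{f(\alpha)}\bra{\alpha}$ at a time; the function name `matrix_of` and the convention of passing the classical states as an ordered list are choices made for this illustration.

```python
import numpy as np

def matrix_of(f, states):
    """Sum of |f(a)><a| over all classical states a; `states` fixes the index order."""
    n = len(states)
    basis = np.eye(n)                 # row i is the standard basis vector of states[i]
    M = np.zeros((n, n))
    for a in states:
        ket_fa = basis[states.index(f(a))]
        bra_a = basis[states.index(a)]
        M += np.outer(ket_fa, bra_a)  # a single 1 at (row f(a), column a)
    return M

bit = [0, 1]
print(matrix_of(lambda a: 0, bit))      # matches M_1
print(matrix_of(lambda a: 1 - a, bit))  # matches M_3
```

Because each classical state `a` contributes exactly one term, every column of the result contains exactly one 1.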

Now, if we again think about vectors as matrices, but this time consider the multiplication $\bra{\alpha}\ket{\beta}$, then we obtain a $1\times 1$ matrix, which we can think of as a scalar. For the sake of tidiness, we write this product as $\braket{\alpha}{\beta}$. It satisfies the following simple formula:
$$\braket{\alpha}{\beta} = \begin{cases}1 & \alpha=\beta\\0 & \alpha\neq\beta\end{cases}.$$
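The two orders of multiplication can be compared side by side. In this sketch, which assumes NumPy, kets are stored as explicit $2\times 1$ arrays and bras as their transposes, so that `@` performs genuine matrix multiplication.

```python
import numpy as np

ket0 = np.array([[1], [0]])  # 2x1 column vectors
ket1 = np.array([[0], [1]])
bra0 = ket0.T                # 1x2 row vectors
bra1 = ket1.T

print(bra0 @ ket0)           # [[1]], a 1x1 matrix
print((bra0 @ ket0).item())  # 1, the same value read as a scalar
print((bra0 @ ket1).item())  # 0, since the two states differ

print(ket0 @ bra1)           # the 2x2 matrix with a single 1 in row 0, column 1
```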

Using this observation, together with the fact that matrix multiplication is associative and linear, we obtain
$$M\ket{\beta} = \biggl( \sum_{\alpha \in \Sigma}\ket{f(\alpha)}\bra{\alpha}\biggr)\ket{\beta} = \sum_{\alpha \in \Sigma}\ket{f(\alpha)}\braket{\alpha}{\beta} = \ket{f(\beta)}$$
for every $\beta \in \Sigma$, because only the term with $\alpha = \beta$ survives in the sum. This is exactly what we require of $M$.
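As a concrete instance, take the NOT function $f_3$ on $\Sigma = \{0, 1\}$ and $\beta = 0$. The sum has two terms, $\ket{f_3(0)}\bra{0} = \ket{1}\bra{0}$ and $\ket{f_3(1)}\bra{1} = \ket{0}\bra{1}$, and the calculation reads
$$M_3\ket{0} = \bigl(\ket{1}\bra{0} + \ket{0}\bra{1}\bigr)\ket{0} = \ket{1}\braket{0}{0} + \ket{0}\braket{1}{0} = \ket{1} = \ket{f_3(0)}.$$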

As we will discuss later, $\braket{\alpha}{\beta}$ may be seen as the *inner product* between the vectors $\ket{\alpha}$ and $\ket{\beta}$. The bra-ket ("bracket") notation and terminology are due to Paul Dirac, and for this reason the notation is also known as *Dirac notation*.
