# Probabilistic Computing
---

The field of probabilistic computing, where the concept of randomness is harnessed to enhance certain computations, is currently experiencing a resurgence driven mostly by the explosion of machine learning and AI algorithms that exploit these effects. However, in the context of classical circuits, randomness is almost always associated with the unwanted effects from noise. Bits stored in magnetic drives can randomly flip when exposed to certain magnetic fields, a **NOT** gate could mistakenly invert its output if there's a glitch in its power supply, and every so often, a cosmic ray particle can cause enough charge accumulation to modify values in memory. 

Therefore, a natural way to introduce the concept of randomness into digital logic is by "upgrading" the well-defined (deterministic) binary vector $\vec{b}$ we introduced in the previous chapter into a probability vector $\vec{p}$, whose components represent the probability of measuring a $0$ or a $1$. This vector then serves as a representation of a probabilistic bit, or p-bit for short.

## 1. Probabilistic Systems

### 1.1 Probabilistic Single-Bit States (p-bits)

Let us start by considering the case of a noisy **NOT** gate for which, whenever the input is a $0$, the output is either a $0$ with probability one quarter or a $1$ with probability three quarters. Physically, this gate could model a poorly designed connection in a circuit where a clock signal occasionally couples in and changes the behavior of the gate. We can represent the uncertain output state of this gate with a column vector whose top element is the probability of measuring a $0$ at the output, and whose bottom element is the probability of measuring a $1$:

$$ \vec{p} = \begin{bmatrix} \frac{1}{4} \\ \frac{3}{4} \end{bmatrix} .$$ 

More generally, we could define this vector to take arbitrary probability values $\varrho_{0}$ and $\varrho_{1}$ for the occurrence of a $0$ or a $1$, respectively:

$$ \vec{p} = \begin{bmatrix} \varrho_{0} \\ \varrho_{1} \end{bmatrix} .$$

This vector then fully represents the state of a probabilistic bit (p-bit). Now, recall from the previous chapter that the magnitude of a vector is given by its Euclidean norm:

$$ |\vec{p}| = \sqrt{\varrho_{0}^2 + \varrho_{1}^2} .$$

Because the probabilities themselves (rather than their squares) must always add up to $1$, the length of our vector actually changes depending on the values of $\varrho_{0}$ and $\varrho_{1}$. For example, for $\vec{p} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, we get $|\vec{p}| = 1$. On the other hand, if $\vec{p} = \begin{bmatrix} \frac{1}{4} \\ \frac{3}{4} \end{bmatrix}$, then $|\vec{p}| = \sqrt{\frac{1}{16} + \frac{9}{16}} \approx 0.79$. The dotted line in the figure below shows how the magnitude of the probability vector changes for different probability values:

<img src="images\01_04_01_prob_vec_length.png" align = "center" width="200"/>
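The two lengths quoted above can be checked numerically. The snippet below is a minimal sketch that assumes NumPy is available; the variable names `p_certain` and `p_noisy` are illustrative and not part of the text's notation.

```python
import numpy as np

# Probability vectors whose entries are the probabilities themselves.
p_certain = np.array([1.0, 0.0])   # a bit that is always measured as 0
p_noisy = np.array([0.25, 0.75])   # output of the noisy NOT gate for input 0

for p in (p_certain, p_noisy):
    total = p.sum()                # the probabilities add up to 1 in both cases
    length = np.linalg.norm(p)     # Euclidean norm: sqrt(p0**2 + p1**2)
    print(f"sum = {total:.2f}, length = {length:.2f}")
```

Both vectors have entries that sum to $1$, yet only the first one has unit length, which is the mismatch the figure illustrates.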

Although this is the traditional way of defining a [probability vector](https://en.wikipedia.org/wiki/Probability_vector), it would be nice if we could tie the fact that probabilities must add to $1$ to having the length of our vector always be equal to $1$. We can easily accomplish this if we define our vector elements $\varrho_i$ such that their values **squared** correspond to the probabilities of interest. In other words, we want to satisfy the condition $ \varrho_{0}^2 + \varrho_{1}^2 = 1$ rather than simply $ \varrho_{0} + \varrho_{1} = 1$. This way, it is guaranteed that we will always have $|\vec{p}| = \sqrt{\varrho_{0}^2 + \varrho_{1}^2} = 1$. Graphically, we now have a vector whose tip lies on a circular arc of radius $1$:

<img src="images\01_04_02_prob_amps_vec_length.png" align = "center" width="200"/>

With this new condition, the probability vector that describes our example above will instead be given by:

$$ \vec{p} = \begin{bmatrix} \sqrt{\frac{1}{4}} \\ \sqrt{\frac{3}{4}} \end{bmatrix} .$$ 

At first glance, this looks like a rather impractical change. Now we need to square our vector elements to get probabilities, which, in the end, are the numbers we really care about. What is comforting about taking this step is that vectors of this type form a subset of those that describe quantum states! The main difference is that, in the case of p-bits (described by probability vectors), the components can only take values between $0$ and $1$ (i.e., $\varrho_i \in [0,1]$); for qubits (described by quantum vectors), on the other hand, we will allow the components to be complex-valued. These vector components are usually referred to as probability amplitudes (or just amplitudes), so we will use this terminology moving forward. The reason why qubit amplitudes must be complex-valued will be described in the next chapter.
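To make the change concrete, the sketch below builds the amplitude vector for the noisy **NOT** gate example and confirms that it has unit length and that squaring its entries recovers the original probabilities. It assumes NumPy; the names `probabilities`, `amplitudes`, `length`, and `recovered` are illustrative only.

```python
import numpy as np

# Probabilities of measuring 0 and 1 for the noisy NOT gate (input 0).
probabilities = np.array([0.25, 0.75])

# Amplitudes are defined so that their squares give the probabilities.
amplitudes = np.sqrt(probabilities)

length = np.linalg.norm(amplitudes)   # Euclidean norm of the amplitude vector
recovered = amplitudes ** 2           # squaring returns the probabilities

print("amplitudes:", amplitudes)
print("length:", length)
print("recovered probabilities:", recovered)
```

Up to floating-point rounding, the length comes out as $1$ for any valid pair of probabilities, not only for this example.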

Another way to express our p-bit vector is as a linear combination of the unit vectors $\hat{0}$ and $\hat{1}$ scaled by the respective probability amplitudes:

$$
\begin{aligned}
\vec{p} &= \varrho_0 \hat{0} + \varrho_1 \hat{1}
\\
\\
\vec{p} &= \varrho_0 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \varrho_1 \begin{bmatrix} 0 \\ 1 \end{bmatrix}
\end{aligned}
$$

This representation shows that, if we want a mathematical expression to extract the amplitude $\varrho_0$ (or $\varrho_1$) from the probability vector, we can take the [dot product](https://en.wikipedia.org/wiki/Dot_product) between $\vec{p}$ and $\hat{0}$ (or $\hat{1}$). The dot product is a measure of the overlap between two vectors, and since $\hat{0}$ and $\hat{1}$ are orthogonal, the overlap between the two of them is always $0$. Let's see this explicitly by first recalling the definition of the dot product between two general vectors $\vec{x} = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_0 \\ y_1 \end{bmatrix}$:

$$ \langle \vec{x}, \vec{y} \rangle =  \vec{x}^\top \vec{y} $$

Here, the symbol $^\top$ denotes the transpose of $\vec{x}$, which is obtained by turning a column vector into a row vector:

$$ 
\begin{aligned}
\langle \vec{x}, \vec{y} \rangle &= \begin{bmatrix} x_0 & x_1 \end{bmatrix} \begin{bmatrix} y_0 \\ y_1 \end{bmatrix}
\\
\\
\langle \vec{x}, \vec{y} \rangle &= x_0 y_0 + x_1 y_1
\end{aligned}
$$
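The component-wise formula can be compared against a library routine. The following is a small sketch assuming NumPy; the helper `dot_2d` and the sample vectors `x` and `y` are hypothetical and chosen only for illustration.

```python
import numpy as np

def dot_2d(x, y):
    """Dot product of two 2-element vectors, written out term by term."""
    return x[0] * y[0] + x[1] * y[1]

# Unit vectors representing the definite states 0 and 1.
zero_hat = np.array([1.0, 0.0])
one_hat = np.array([0.0, 1.0])

# Two arbitrary example vectors.
x = np.array([2.0, 3.0])
y = np.array([4.0, 5.0])

manual = dot_2d(x, y)              # 2*4 + 3*5
library = np.dot(x, y)             # same result from NumPy
overlap = np.dot(zero_hat, one_hat)  # orthogonal vectors have zero overlap

print(manual, library, overlap)
```

The last value illustrates the orthogonality of $\hat{0}$ and $\hat{1}$ mentioned above: their overlap is $0$.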

Taking the dot product (also known as the inner product) between $\hat{0}$ and $\vec{p}$:

$$ 
\begin{aligned}
\langle \hat{0}, \vec{p} \rangle &= \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} \varrho_0 \\ \varrho_1 \end{bmatrix}
\\
\\
\langle \hat{0}, \vec{p} \rangle &= 1 \times \varrho_0 + 0 \times \varrho_1
\\
\\
\langle \hat{0}, \vec{p} \rangle &= \varrho_0
\end{aligned}
$$

The same can be shown for $\hat{1}$, which yields $\varrho_1$. Therefore, we define the probability of a p-bit being in state $0$ as:

$$ \text{P}_0 = \langle \hat{0}, \vec{p} \rangle^2 = \varrho_0^2 $$

And similarly, for a p-bit being in state $1$:

$$ \text{P}_1 = \langle \hat{1}, \vec{p} \rangle^2 = \varrho_1^2 $$
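As a quick numerical check of these two definitions, the sketch below computes $\text{P}_0$ and $\text{P}_1$ for the noisy **NOT** gate example and then draws simulated measurements with those probabilities. It assumes NumPy; the sampling step, the seed, and the sample count are illustrative choices and not part of the text's formalism.

```python
import numpy as np

# Unit vectors for the definite states 0 and 1.
zero_hat = np.array([1.0, 0.0])
one_hat = np.array([0.0, 1.0])

# Amplitude vector of the p-bit from the noisy NOT gate example.
p = np.sqrt(np.array([0.25, 0.75]))

# Probabilities are the squared overlaps with the unit vectors.
P0 = np.dot(zero_hat, p) ** 2
P1 = np.dot(one_hat, p) ** 2
print("P0 =", P0, "P1 =", P1)

# Simulate repeated measurements of the p-bit (illustrative only).
rng = np.random.default_rng(seed=0)
samples = rng.choice([0, 1], size=10_000, p=[P0, P1])
print("fraction of 1s:", samples.mean())
```

With enough samples, the observed fraction of $1$s should approach $\text{P}_1 = 3/4$.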

### 1.2 Probabilistic Single-Bit Gates

Let's go back to our noisy **NOT** gate which, for an input of $0$ results in an output of $0$ with probability $1/4$, and an output of $1$ with probability $3/4$. Now, it turns out the gate is not noisy at all if its input is $1$! This means that its output is $0$ with probability of $1$, and $1$ with probability $0$. We can then construct a matrix to represent this gate

circuits by, for example, adding probabilistic elements (gates) that model some of these effects. 