# What is an atom?

I don't mean an atom like on the periodic table, which is really a complicated enough thing of its own "made of" electrons, protons, neutrons, and so forth. I mean, an actual atom. The word in Ancient Greek means something indivisible. Sometimes "atomism" is characterized as the belief that the world is made of tiny little hard impenetrable pebbles whizzing around in empty space, perhaps, according to Lucretius, swerving from time to time in a pique of will. But in a literal sense, atomism is the idea that the world is in some sense composed of indivisibles. We can divide something up, but at some point we won't be able to divide any longer. And what's left are the atoms. 

One could interpret this "materialistically" that what's left is tiny little bits of "matter", perhaps differently shaped, perhaps of different substances, that can fit into each other in different ways. But "idealistically", we could interpret this as a reflection on the fact that when we analyze an idea, what we do is we reduce it down to a bunch of basic concepts fitted together into a jigsaw; and the most basic concepts have to be taken for granted, atomically. Of course, an idea can be interpreted in many ways, expressed in different ways as a complex of perhaps different basic concepts. So that from one point of view, a concept can be resolved into these atoms; whereas from another point of view, the concept can be resolved into some other notion of atoms. We prove our understanding to each other by demonstrating how we can reduce some shared concept into some agreed upon atoms. And arguably, what we mean by "understanding" is the ability to apprehend what something "is made of" immediately upon recognizing it. 

It's worth observing that when the materialist wants to have their atoms be tiny little "physical" cubes or octahedra, knocking around, naturally selecting, in actual practice, what they really need to do is plausibly reduce *all their concepts* down into a single set of basic atomic concepts, which correspond exactly to the ideas of "the tiny hard impenetrable sphere," "the tiny little pyramid," out in the real world. When put this way, it actually seems rather extravagant. While it is "obvious" in some sense that ideas can be analyzed into atomic ideas, it would be really quite something if *all* concepts could be reduced to jigsaws made of exactly the same ideas. It would mean all those inequivalent ways of analyzing concepts were really worthless: after all, why not just use the universal ideas, the universal atoms? 

The materialist could respond: The real world is made up of one set of atoms; it's only in the mind that we can analyze concepts into seemingly different inequivalent atomic schemes, leading to the disagreement and confusion. The response at this juncture can only be that if you're willing to consider that, why not consider the case where, it's only in the mind that all concepts can be reduced down to a single set of ur-concepts (your own!), whereas in the real world, real things can be reduced down to ur-concepts in different inequivalent ways.

The only way to proceed is to actually try to do it: we have to try to divide the world and our concepts into universal atoms and see what happens.

<hr>


So we begin with an atom. We symbolize it with a pebble: $\Large\bullet$.  Tiny, hard, impenetrable. Point-like. You can turn over a real pebble in your hand. Of course, you can split an actual pebble, but maybe you imagine splitting it again and again, and eventually maybe you can't split it any more: and we're talking about that, that ur-pebble at the bottom in the dust, separated from its fellow dust. Or from another point of view, we're tracing the roots of our human concept of a "pebble" and what we can do with it.

So we have a pebble: $\Large\bullet$. What can we do with it? Well, if we had it, we could lose it, get rid of it, take it away. We'll denote the absence of the pebble with $\Large\circ$.

For later reasons, I want to call what we just did: homogenization. For now, the terminology isn't important. The point is in our way of representing things, we have a way of symbolizing both a thing and its absence. And indeed, this is how we perceive the situation: the empty space there in your hand is just inviting me to place a pebble in it. To understand the concept of a pebble is to see the world as one giant set of opportunities to place a pebble somewhere where it's absent or to take a pebble from where it sits. 

Furthermore, homogenization is necessary for reliable communication. If we can in fact recognize them, we can use the presence or absence of a pebble to communicate something perhaps totally unrelated to the pebble. A little rock on a ledge by a window, a sign that the riots begin tomorrow. But there's a catch! Just the same, we could have prepared a little surprise, which is that it would be precisely the *absence* of the pebble that would signify the riots begin tomorrow, whereas its presence would have signified regrouping. So it's clear: in order to communicate something with a pebble, we have to both understand pebbles, to be able to pick them out of the background of perception, promoting their presence or absence to the foreground, but also: we have to make an ultimately arbitrary choice about whether presence or absence will be significant, which one is $0$ and which one is $1$. The choice is arbitrary, but a choice must be made. And this fact means that if we swap pebbles for absences, we can on the flip side correct our rule for signification by swapping them in the same way, riots for regrouping.

<hr>

So far we've just been really considering a single pebble and what we can do with it. 

But, if $1$ pebble is sitting there, why not put another one beside it? So now we have $2$ pebbles: $ \Large \{ \Large\bullet, \Large\bullet \}$. We could add another pebble. Then we'd have $3$ pebbles: $ \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \}$. And, all things being equal, there's no limit to our being able to add more pebbles. And so, we've discovered counting up. And naturally, if we can count up, we can count down, removing pebbles until we're back to our original pebble. 

One way to look at this is that we're providing more context for our pebble $\Large\bullet$. Picture it like our pebble is actually floating in an infinite sea of pebble-absences: $ \Large \{\dots, \Large\circ, \Large\bullet, \Large\circ, \dots \}$. Each absence represents an opportunity to add another pebble. You can take the bait: $ \Large \{\dots, \Large\circ, \Large\bullet, \Large\bullet, \dots \}$, and again: $ \Large \{\dots, \Large\bullet, \Large\bullet, \Large\bullet, \dots \}$, without limit: which implies there's always one more $\Large\circ$ left no matter how many $\Large\bullet$'s you add, just as you started with a single $\Large\bullet$ in a sea of $\Large\circ$'s. Of course the order doesn't matter. If before we had atoms, we now have composites made of these atoms, in which the atoms all coexist simultaneously, atop each other, however you want to look at it.

We've discovered the counting numbers $1, 2, 3, \dots$. You could picture them like: a bunch of pebbles in a pile. But notice we start counting at 1. This is necessary because we'd like to preserve the original symmetry, the fact that we could swap a pebble for its absence, as well as our interpretations, and still get a successful signification. In this case, we just flip the colors on all the pebbles. You could imagine it as "a universe which consists of a single pile of pebbles (or absences)."
<hr>

Well, naturally we'd be interested in having multiple piles of pebbles to play with.

What does it mean to group pebbles into piles? We could imagine bringing the pebbles closer and closer together until actually they're in the "same place," but since that's tricky, we signify grouping instead by drawing a circle around the pebbles, or just kind of making a little stack of them in a pile. The order/placement of the pebbles doesn't matter, just the fact that the pebbles are all in there, and there is some number of them. And just as we can have multiple pebbles in a pile, we can have multiple piles lying around. (Living analogously in a single "pile of piles" which provides the background context, just as a single pile provided the background context for a single pebble. But to that we will come in time. For now, we're just thinking about having multiple piles.)

What's the most basic thing we could do with piles? Well, we could combine piles. This is called "addition."

$ \Large \{ \Large\bullet, \Large\bullet \} + \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} = \Large \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \} $

In other words, $2 + 3 = 5$. We could think of this like iterated counting up: we count up by as many pebbles as are in the pile, "all at once," as opposed to one at a time, and if we think in terms of combining piles, obviously: $2 + 3 = 3 + 2 = 5$. But if we want an inverse operation, iterated counting down, where we count down by as many pebbles as are in a pile, we suddenly have a problem: what if we subtract past 1 to 0 or, what's worse, to "negative numbers"? For example, $2 - 3 = -1$:

$ \Large \{ \Large\bullet, \Large\bullet \} - \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} = \Large \{ \Large{\color{red} \bullet} \} $

Clearly, we have to introduce a second kind of pebble (with its own kind of absence) to keep track of our "debts." The rule is that if a $\Large\bullet$ and a $\Large{\color{red} \bullet}$ are in a pile together, we can remove them both: $\Large \{ \Large\bullet, \Large{\color{red} \bullet} \} \rightarrow \Large\{ \}$. So now we can represent an empty pile, $0$, just a circle with nothing in it. Or with colored pebbles themselves as any pile with an equal number of $\Large\bullet$'s and $\Large{\color{red} \bullet}$'s. Black and red pebbles can't coexist in a pile together, and so eventually after all the credits and debts cancel out, the pebbles in a pile are all red, all black, or 0. And so we have a new kind of freedom in our representation: we can add arbitrary pairs of black and red pebbles to any pile and this won't change the number that it represents: $\Large \{ \Large\bullet, \Large\bullet,  \Large{\color{red} \bullet}, \Large{\color{red} \bullet}, \Large{\color{red} \bullet}, \Large{\color{red} \bullet}\} \rightarrow \Large\{ \Large{\color{red} \bullet}, \Large{\color{red} \bullet} \}$.
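To make the cancellation rule concrete, here is a minimal Python sketch. The representation (a pile as a list of the strings `"black"` and `"red"`) and the function names are assumptions made for illustration; nothing about them is forced by the argument above.

```python
from collections import Counter

def reduce_pile(pile):
    """Cancel black/red pairs; return the reduced pile and the integer it stands for."""
    counts = Counter(pile)
    value = counts["black"] - counts["red"]   # credits minus debts
    color = "black" if value > 0 else "red"
    return [color] * abs(value), value        # an empty list represents 0

def negate(pile):
    """Swap the color of every pebble, turning credits into debts and vice versa."""
    return ["red" if p == "black" else "black" for p in pile]

def add_piles(a, b):
    """Combine two piles, then let the pairs cancel."""
    return reduce_pile(list(a) + list(b))

def subtract_piles(a, b):
    """Subtracting a pile is the same as adding its negation."""
    return add_piles(a, negate(b))

print(subtract_piles(["black"] * 2, ["black"] * 3))      # 2 - 3
print(reduce_pile(["black"] * 2 + ["red"] * 4))          # two black, four red
```

Subtraction never needs its own rule here: it is just addition of recolored pebbles followed by cancellation.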

We started with an atomic pebble. We then joined pebbles into a complex we called a pile, in which multiple pebbles coexist. We then decided to work at an even higher level where we can have multiple piles, just as before we had multiple pebbles. We've invented the "integers": $ \ldots -3, -2, -1, 0, 1, 2, 3, \ldots$. 

<hr>

As if inevitably, the dialectic carries on. If we can iterate counting up to get addition, we should be able to iterate addition to get: multiplication, whose "all-at-once" inverse is division. And just as before, we graduate to a new notion of number, a new interpretation of our atoms: the rational numbers, which can be regarded as "piles of piles." Here's how it works.

If we have two piles of pebbles $A$ and $B$, we can multiply them by iterating the addition of $A$ by the number of pebbles in $B$, or vice versa.

$ \Large \{ \Large\bullet, \Large\bullet \} \times \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} = \Large \{ \Large\bullet, \Large\bullet \} + \Large \{ \Large\bullet, \Large\bullet \}  + \Large \{ \Large\bullet, \Large\bullet \} = \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} + \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} $

$ \Large \{ \Large\bullet, \Large\bullet \} \times \Large \{ \Large\bullet, \Large\bullet, \Large\bullet \} = \Large \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \} $

$ 2 \times 3 = 3 \times 2 = 6 $

If we wanted we could display the last identity like this:

$ 2 \times 3 = 3 \times 2 = \begin{matrix} \Large\bullet & \Large\bullet \\\ \Large\bullet & \Large\bullet \\\ \Large\bullet & \Large\bullet \end{matrix} = \begin{matrix} \Large\bullet & \Large\bullet & \Large\bullet \\\ \Large\bullet & \Large\bullet & \Large\bullet \end{matrix} $

And correspondingly: $\frac{6}{3} = 2 $ and $ \frac{6}{2} = 3$. And we have it that $\frac{n}{1} = n$ for any $n$.

Now an important structure arises out of the interplay between addition and multiplication: the prime numbers. A prime number is a number greater than $1$ that is only divisible by itself and $1$. Whereas $12$ could be expressed $2 \times 2 \times 3$, $7$ is just $7 \times 1$. The primes go like: $2, 3, 5, 7, 11, 13, 17, 19, \dots$. And in fact, there are an infinite number of primes. The proof goes back at least to Euclid.

First we observe, that every 2nd number is divisible by 2, every 3rd number is divisible by 3, every 4th number is divisible by 4, every 5th number is divisible by 5, and so on. It follows that if we add $1$ to any number, it won't be divisible by any of its old primes. For example, $14 = 2 \times 7$, but $15 = 3 \times 5$. If we think about it "musically," $14$ falls on the beat of $2$ and $7$: as I say, every 2nd number is divisible by $2$, every 7th number is divisible by $7$, and both beats fall on $14$. But if we add $1$ to $14$, it can't fall on the $2$-beat nor the $7$-beat, and this is a general rule.

$ 1 \mid \color{green}2 \mid \color{blue}3 \mid \color{green}2 \times \color{green}2 \mid \color{orange}5 \mid \color{green}2 \times \color{blue}3 \mid \color{purple}7\mid \color{green}2 \times \color{green}2 \times \color{green}2 \mid \color{blue}3 \times \color{blue}3 \mid \color{green}2 \times \color{orange}5 \mid \color{pink}{11}\mid \color{green}2 \times \color{green}2 \times \color{blue}3 \dots $

So suppose there were a largest prime number. We could then take that prime and all the prime numbers lower than it, multiply them all together and add $1$: this number couldn't be divisible by any of our known primes! And so, either it is itself a new prime, or it must contain a yet larger prime within it. Hence, our initial assumption has led to a contradiction: therefore, its negation is true: there *are* an infinite number of prime numbers.
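The construction in the proof can be tried out on any finite list of primes. The following is only a sketch: `euclid_witness` and `smallest_prime_factor` are hypothetical helper names, and the factoring is plain trial division.

```python
from math import prod

def smallest_prime_factor(n):
    """Return the smallest prime dividing n (n >= 2), by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor found, so n itself is prime

def euclid_witness(primes):
    """Multiply the given primes, add 1, and pull a prime out of the result."""
    candidate = prod(primes) + 1
    return candidate, smallest_prime_factor(candidate)

known = [2, 3, 5, 7, 11, 13]
candidate, new_prime = euclid_witness(known)
print(candidate, new_prime)                      # 30031 is 59 * 509
print([candidate % p for p in known])            # remainder 1 for every known prime
```

Note that the candidate need not be prime itself; it only has to contain a prime that was missing from the list.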

Now it is a fact that every counting number can be broken down uniquely into a product of primes. Which suggests the following idea. What if we introduce now an infinite number of colored pebbles, one for each prime? Under the hood, each will be just a pile of pebbles.

$ 1 \rightarrow \Large{\color{black} \bullet} \rightarrow \{ \Large\bullet \}$

$ 2 \rightarrow \Large{\color{green} \bullet} \rightarrow \{ \Large\bullet, \Large\bullet \}$

$ 3 \rightarrow \Large{\color{blue} \bullet} \rightarrow \{ \Large\bullet, \Large\bullet, \Large\bullet \}$

$ 5 \rightarrow \Large{\color{orange} \bullet} \rightarrow \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \}$

$ 7 \rightarrow \Large{\color{purple} \bullet} \rightarrow \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \}$

$ 11 \rightarrow \Large{\color{pink} \bullet} \rightarrow \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \}$

We could now write $12$ like this:

$ 12 = 2 \times 2 \times 3 = \Large\{ \Large{\color{green} \bullet}, \Large{\color{green} \bullet}, \Large{\color{blue} \bullet} \} = \Large\{ \{ \Large\bullet, \Large\bullet \}, \{ \Large\bullet, \Large\bullet \}, \{ \Large\bullet, \Large\bullet, \Large\bullet \} \} = \{ \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet, \Large\bullet \}$

So we have a new kind of pile: a pile of prime numbers. And the rule is: we *multiply* elements in a pile of primes together into a composite. Just as pebbles coexist "unordered" in an "additive pile," prime pebbles coexist "unordered" in a "multiplicative pile." At this level, the primes are our new atoms and we interpret "piles of piles" multiplicatively. We can have, of course, multiple "piles of piles" and we can combine them just like piles:

$ \Large\{ \Large{\color{green} \bullet}, \Large{\color{green} \bullet} \Large\} \times \Large\{ \Large{\color{orange} \bullet} \} \rightarrow \Large\{ \Large{\color{green} \bullet}, \Large{\color{green} \bullet}, \Large{\color{orange} \bullet} \Large\}$ 

$ 4 \times 5 = 20$
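One way to model a "multiplicative pile" in Python is as a multiset of primes, for which `collections.Counter` is a natural fit. This is a sketch under that assumption; `prime_pile` and `pile_value` are illustrative names.

```python
from collections import Counter

def prime_pile(n):
    """Break a counting number n >= 1 into its multiset of prime pebbles."""
    pile = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            pile[d] += 1
            n //= d
        d += 1
    if n > 1:
        pile[n] += 1   # whatever is left over is itself prime
    return pile

def pile_value(pile):
    """Multiply every pebble in the pile back together."""
    value = 1
    for prime, count in pile.items():
        value *= prime ** count
    return value

twelve = prime_pile(12)                       # two 2-pebbles and one 3-pebble
combined = prime_pile(4) + prime_pile(5)      # pouring two piles together
print(twelve, pile_value(combined))
```

Adding two `Counter` objects merges the piles, which is exactly multiplication of the numbers they represent.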

But what happens if we do something crazy like $\frac{3}{2}$? There's no counting number $n$ such that $n \times 2 = 3$. But if we can multiply, we must be able to divide. So again we have to introduce a new kind of number: a rational number. We just interpret $\frac{3}{2}$ as a fraction. We could think of this like introducing a little inverse pebble for each prime.

$ \frac{1}{1} \rightarrow \bar{\Large{\color{black} \bullet}}$

$ \frac{1}{2} \rightarrow \bar{\Large{\color{green} \bullet}}$

$ \frac{1}{3} \rightarrow \bar{\Large{\color{blue} \bullet}}$

$ \frac{1}{5} \rightarrow \bar{\Large{\color{orange} \bullet}}$

$ \frac{1}{7} \rightarrow \bar{\Large{\color{purple} \bullet}}$

$ \frac{1}{11} \rightarrow \bar{\Large{\color{pink} \bullet}}$

And the rule is that:

$\Large\{ \Large{\color{green} \bullet}, \bar{\Large{\color{green} \bullet}} \Large\} \rightarrow \Large\{ \Large{\color{black} \bullet} \Large\}$

$ 2 \times \frac{1}{2} = 1$.

If a prime pebble and its inverse appear in a pile together, we can remove them both: which is the same as replacing them with a $1$ pebble. And actually we can add as many $1$ pebbles as we please without changing the number represented by the pile. Moreover, we can add any number of pairs of primes and inverses, and our pile will still represent the same number. Once all the cancellations have taken place, we say we've reduced the rational number to "lowest terms." 

We can take advantage of this freedom to add rational numbers. First, we have to find a common denominator, and then we can add the numerators without trouble.

$ \frac{1}{5} + \frac{2}{3} = \frac{3}{15} + \frac{10}{15} = \frac{13}{15} $
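The common-denominator step can be spelled out in a few lines. This sketch uses the standard library's `fractions.Fraction` only to reduce the final answer to lowest terms; `add_fractions` is a hypothetical helper name.

```python
from fractions import Fraction
from math import gcd

def add_fractions(a_num, a_den, b_num, b_den):
    """Add a_num/a_den and b_num/b_den by first finding a common denominator."""
    common = a_den * b_den // gcd(a_den, b_den)     # least common denominator
    total = a_num * (common // a_den) + b_num * (common // b_den)
    return Fraction(total, common)                  # Fraction reduces to lowest terms

print(add_fractions(1, 5, 2, 3))   # 3/15 + 10/15
```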

Finally, we also have our $-1$ pebble: $\Large{\color{red} \bullet}$, which can make piles negative. We can toss pairs of these pebbles into any multiplicative pile without changing the number.

$ \Large{\color{black} \bullet} \times \Large{\color{red} \bullet} = \Large{\color{red} \bullet}$ 

$ \Large{\color{red} \bullet} \times \Large{\color{red} \bullet} = \Large{\color{black} \bullet}$ 

If integers can keep track of credits and debts, which is related to the ability to arbitrarily decide what to regard as "0", rational numbers can keep track of different denominations, which is related to the ability to arbitrarily decide what to regard as "1".

For instance, maybe I have a ruler $A$ and I measure something to be $3$ units. You have a ruler $B$, and each unit of your ruler is worth two of mine: $\frac{yours}{mine}$. We could describe the relationship between our rulers with a rational number: $\frac{1}{2}$. So you would measure the same object to be $3 \times \frac{1}{2}$ or $\frac{3}{2}$ in your units. So rational numbers allow us to switch between different "currencies": they provide the exchange rate from gold to silver, dollars to cents, hours to minutes. Indeed, we recognize that if we want to communicate a number, in general we need to provide more context since we could be using different units to measure our numbers: we need to agree on how much of your "1" there is per my "1".

Finally, there remains a question about $0$. What happens if we take $\frac{1}{0}$? If we want to be able to add, subtract, multiply, and divide all our numbers, we need to find an interpretation of this. (This assumes we've added a new kind of pebble for $0$, which must have an inverse.) So we actually need one last number.

We say: $\frac{1}{0} \rightarrow \infty$ and $\frac{1}{\infty} = 0$. But if that's the case, then $-\frac{1}{\infty}$ must also be $0$ since $-0 = 0$. So we have the idea of identifying positive and negative $\infty$.

Here's one way to think of this:

![](img/2d_stereographic_projection.png)

The idea is to imagine all the rational numbers except $\infty$ arranged on a line. This can be done because every rational number divides the rationals into those less than and those greater than it. We then imagine a point at infinity beyond all the points on the left, and a point at infinity beyond all the points on the right, and we say that it's the same infinity. What we've done is wrap up the number line into a circle. Every rational number can be uniquely mapped to the circle via a stereographic projection. This we know: we can think about a rational number as divisions of a pie.

<hr>

The saga continues, and here's where things may begin to come as a surprise. The story so far is ultimately an old one: almost all the aforementioned was known even to the world of Antiquity. But even then, there was recognized to be an anomaly: the irrational numbers.

If we continue on our path, we ought now to iterate multiplication to get exponentiation, whose inverse is root-taking. If addition is repeated counting all-at-once, and multiplication is repeated addition all at once, then exponentiation is repeated multiplication all-at-once. 

$2^{3} = 2 \times 2 \times 2 = 8$

Notice that the order matters:

$3^{2} = 3 \times 3 = 9$.
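The whole tower (counting, addition, multiplication, exponentiation) can be written so that each level is literally a loop over the level below. This is a deliberately naive sketch for non-negative whole numbers; the function names are illustrative.

```python
def successor(n):
    """Counting up by one pebble."""
    return n + 1

def add(a, b):
    """Addition: count up from a, b times."""
    for _ in range(b):
        a = successor(a)
    return a

def multiply(a, b):
    """Multiplication: add a to an empty pile, b times."""
    total = 0
    for _ in range(b):
        total = add(total, a)
    return total

def power(a, b):
    """Exponentiation: multiply by a, b times, starting from 1."""
    result = 1
    for _ in range(b):
        result = multiply(result, a)
    return result

print(add(2, 3), multiply(2, 3), power(2, 3), power(3, 2))
```

Swapping the arguments leaves `add` and `multiply` unchanged but not `power`, which is the asymmetry noted above.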

Root-taking is just the reverse:

$ \sqrt[3] 8 = 2 $

$ \sqrt 9 = 3$

But what about something like $\sqrt 2$?

Indeed, Pythagoras tells us that if we have a right triangle, then the relationship between the sides $a$ and $b$ and the hypotenuse $c$ is:

$a^{2} + b^{2} = c^{2}$

![](img/pythagorean_theorem.png)

Suppose $a = 1$ and $b = 1$, then:

$ 1^{2} + 1^{2} = c^{2} \rightarrow c = \sqrt 2$

There is a famous proof that the square root of 2 can't be any rational number. It is again a proof by contradiction.

Suppose $\sqrt 2 = \frac{p}{q}$, where $\frac{p}{q}$ is a rational number in lowest terms, so that $p$ and $q$ share no prime factors in common. Then:

$\sqrt 2 = \frac{p}{q}$

$ 2 = \frac{p^{2}}{q^{2}}$

$ 2q^{2} = p^{2} $

This says that $p^{2}$ has a factor of $2$, which actually means that $p$ has a factor of $2$ and $p^{2}$ has a factor of $4$. Let's factor out that $4$ by introducing a new symbol $r$.

$ 2q^{2} = 4r^{2}$, where $r^{2} = \frac{p^{2}}{4}$

We can then divide out by 2.

$ q^{2} = 2r^{2}$

This says that $q^{2}$ has a factor of $2$, really a factor of $4$, just as before. So we've shown that both $p$ and $q$ are even, but we assumed at the beginning that $p$ and $q$ had no factors in common! Therefore our initial assumption was wrong:

$ \sqrt 2 \neq \frac{p}{q} $

In other words, $\sqrt 2$ is not any of our rational numbers. It must be a new kind of number, an irrational number. 

This has something to do with 2D. We can use a rational number to convert between a horizontal ruler and a vertical ruler; but we can't use a rational number to convert between those rulers and a ruler at $45^{\circ}$. There is no way to come to a common "1" between them. Think about it like: the triangle is actually being displayed by tiny square pixels on the computer screen. This is no problem if we have horizontal and vertical lines, but a diagonal line would have to zig-zag.

<img src="img/pixel_triangle.png" width=200>

The idea is that if we kept adding finer and finer zig-zags to the diagonal, we'd trace the hypotenuse ever more closely (even though the total length of the zig-zag itself stays fixed at $a + b$), approaching a line whose "actual" length is $\sqrt 2$. If we assume that we can always refine our zig-zags, we're making an assumption about continuity, that between any two points, there always lies another point between them.

Indeed, let's observe that whether the length of something is irrational is somewhat in the eye of the beholder. For example, suppose we set $c = 1$ in $a^{2} + b^{2} = c^{2}$, and assume $a=b$.

$2a^{2} = 1$

$ a = b = \frac{1}{\sqrt 2} $

Imagine instead we'd started with diagonal grid lines. Then the hypotenuse would be "measurable," but the base and height wouldn't be, unless we imagine shrinking the grid lines literally to an infinitesimally small size.

How can we work with the $\sqrt 2$ then? Well, we could treat it as a symbol to which we can do algebra, obviously. But also, we could approximate it.

The following idea works for any $\sqrt n $. We make a starting guess at the $\sqrt n$. Call it $g_{0}$. Now if $g_{0}$ is an over estimate, then $\frac{n}{g_{0}}$ will be an under estimate. Therefore, the average of the two should provide a better approximation.

$g_{1} = \frac{1}{2}(g_{0} + \frac{n}{g_{0}})$

We then make the same argument about $g_{1}$, that if it's an overestimate then $\frac{n}{g_{1}}$ will be an underestimate, and the average is our next guess $g_{2}$. In general, for the $k$-th guess:

$g_{k} = \frac{1}{2}(g_{k-1} + \frac{n}{g_{k-1}})$

Clearly, the longer we do this, the closer our guess will be to the $\sqrt n$, and we say that the $\sqrt n$ is the unique limit of this procedure. And so we can approximate $\sqrt n$ as closely as we like.

In the case of $\sqrt 2$:

$g_{0} = 1$

$g_{1} = \frac{1}{2}(1 + 2) = \frac{3}{2} = \textbf{1}.5$

$g_{2} = \frac{1}{2}(\frac{3}{2} + \frac{2}{\frac{3}{2}}) = \frac{17}{12} = \textbf{1.41}6\dots$

$g_{3} = \frac{1}{2}(\frac{17}{12} + \frac{2}{\frac{17}{12}}) = \frac{577}{408} = \textbf{1.41421}5\dots$

$g_{4} = \frac{665857}{470832} = \textbf{1.41421356237}46\dots$

And so, we get a series of fractions that gets ever closer to the $\sqrt 2$.
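The averaging procedure is easy to run with exact fractions. This sketch uses `fractions.Fraction` from the standard library so that every guess stays a rational number; `babylonian_sqrt` is an illustrative name, and the starting guess of $1$ is just the one used above.

```python
from fractions import Fraction

def babylonian_sqrt(n, steps, guess=1):
    """Return the guesses g_0, g_1, ..., g_steps for the square root of n."""
    g = Fraction(guess)
    history = [g]
    for _ in range(steps):
        g = (g + Fraction(n) / g) / 2   # average the guess with n / guess
        history.append(g)
    return history

for g in babylonian_sqrt(2, 4):
    print(g, float(g))
```

Every guess in the list is an honest fraction; only the limit of the sequence fails to be one.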

Now, one thing that's nice about having exponents around is we can use a base-numerical system of representation for our numbers, as we just did.

For example, we write $123 = 1 \times 10^{2} + 2 \times 10^{1} + 3 \times 10^{0} = 100 + 20 + 3$.

This is called "base 10." We fix 10 numerals: $0, 1, 2, 3, 4, 5, 6, 7, 8, 9$, and we can then represent any number as an ordered sequence of these numerals, with the understanding that they weight the powers of $10$ in the above sum. Every whole number will have a unique representation.

$ \dots d_{2}d_{1}d_{0} = \dots d_{2} \times b^{2} + d_{1} \times b^{1} + d_{0} \times b^{0}$, for some base $b$.

We can use decimals to represent fractions:

$ 1.23 = 1 \times 10^0 + 2 \times 10^{-1} + 3 \times 10^{-2}$

In other words, we continue with "negative powers," which are defined like:

$ a^{-b} = \frac{1}{a^b}$

So we have:

$ \dots d_{2}d_{1}d_{0}.d_{-1}d_{-2} \dots = \dots d_{2} \times b^{2} + d_{1} \times b^{1} + d_{0} \times b^{0} + d_{-1} \times b^{-1} + d_{-2} \times b^{-2} \dots$, for some base $b$.

Some rational numbers have infinitely long decimal expansions which repeat. For example:

$ \frac{1}{3} = 0.\overline{3} $

Irrational numbers have infinitely long decimal expansions which don't repeat, e.g., $\sqrt{2}$.
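Positional notation works the same way in any base, and the digits can be peeled off mechanically. The following sketch handles non-negative whole numbers and fractions between $0$ and $1$; the function names are assumptions made for illustration.

```python
from fractions import Fraction

def integer_digits(n, base):
    """Digits of a non-negative whole number, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(remainder)
    return digits[::-1]

def fractional_digits(x, base, count):
    """First `count` digits after the point for a Fraction with 0 <= x < 1."""
    digits = []
    for _ in range(count):
        x *= base
        digit = int(x)      # whole part becomes the next digit
        digits.append(digit)
        x -= digit          # carry on with what is left over
    return digits

print(integer_digits(123, 10), integer_digits(7, 2))
print(fractional_digits(Fraction(1, 3), 10, 6))
print(fractional_digits(Fraction(1, 3), 3, 6))
```

The last two lines show that whether an expansion terminates depends on the base: one third repeats forever in base 10 but is a single digit in base 3.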

Clearly, infinity is starting to play an important role at this stage of the game, but it's a different kind of infinity than we've met before. Before we dealt with the infinity of the counting numbers. But now we're dealing with a continuous infinity of rationals and irrationals, and this infinity is actually larger.

The famous proof is thanks to Cantor and very nicely it is known as the "diagonal argument."

Now it depends on the fact that you can prove that there are no more rational numbers than counting numbers: in other words, they are the same size of infinity, and can be placed in a 1-to-1 correspondence via an enumeration. I won't give the full proof, but the intuition is that there is the following more or less obvious enumeration of the rationals:

![](img/rational_enumeration.png)

So in what follows, we prove the theorem for counting numbers, but it also applies to the rationals as a whole.

Suppose we're working in base-2, and we try to make a list of all the numbers.

$\begin{matrix} 
0 & 1 & 0 & 0 & 1 & 1 & \dots \\
1 & 1 & 1 & 1 & 0 & 0 & \dots \\
0 & 0 & 1 & 0 & 0 & 1 & \dots \\
1 & 0 & 0 & 0 & 0 & 1 & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\end{matrix}$

We think we can do this since it seems that by enumeration we can obtain all possible sequences of 0's and 1's.

$ 0 \rightarrow 0 $

$ 1 \rightarrow 1 $

$ 2 \rightarrow 10 $

$ 3 \rightarrow 11 $

$ 4 \rightarrow 100 $

$ 5 \rightarrow 101 $

$ 6 \rightarrow 110 $

$ 7 \rightarrow 111 $

But now go down the diagonal of our infinite list and flip every $0 \rightarrow 1$ and every $1 \rightarrow 0$.

$\begin{matrix} 
\textbf{1} & 1 & 0 & 0 & 1 & 1 & \dots \\
1 & \textbf{0} & 1 & 1 & 0 & 0 & \dots \\
0 & 0 & \textbf{0} & 0 & 0 & 1 & \dots \\
1 & 0 & 0 & \textbf{1} & 0 & 1 & \dots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \\
\end{matrix}$

The sequence of 0's and 1's along the diagonal is different by construction from the first sequence in the list in the first place, the second sequence in the list in the second place, the third sequence in the list in the third place. Therefore this sequence of 0's and 1's is well defined, but it can't be found anywhere on our list which apparently *enumerates* all possible sequences of 0's and 1's. Such a sequence represents an irrational number, and you get a different one for each possible ordering of the counting numbers. This is a proof that the infinity of the "real numbers" is greater than the infinity of the counting numbers/rationals, in the sense that the real numbers can't be *enumerated*. (The argument can be iterated to show there is actually an infinite hierarchy of ever more infinite infinities.)
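The diagonal construction can be checked on any finite square corner of such a list. This sketch uses the first four rows written out above; `flipped_diagonal` is an illustrative name, and of course a finite check only illustrates the infinite argument rather than proving it.

```python
def flipped_diagonal(rows):
    """Flip the i-th digit of the i-th row, for every row."""
    return [1 - rows[i][i] for i in range(len(rows))]

rows = [
    [0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
]

new_sequence = flipped_diagonal(rows)
print(new_sequence)

# The new sequence disagrees with row i in position i, so it matches no row.
for i, row in enumerate(rows):
    print(i, row[i], new_sequence[i])
```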

One nice definition of real numbers goes like this. It's based on an idea called the Dedekind cut. We imagine "cutting" the rational number line into two infinite sets $A$ and $B$ such that every element of $A$ is less than every element of $B$, and $A$ has no greatest element. We consider the least element of $B$. If this number is rational, then the cut defines that rational number. But there might be an irrational number which is greater than every element in $A$. Since $A$ has no greatest element, then the first number in $B$ ought to be that irrational number, but instead: there's a gap, since we're working with the rational number line. We fill in that gap: that is the irrational number defined by the "cut". $A$ then contains every rational number less than the cut, and $B$ contains every rational number greater than or equal to the cut.
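As a rough illustration, the cut that defines $\sqrt 2$ can be described purely in terms of rationals: $A$ holds every rational that is negative or whose square is less than $2$. The sketch below assumes that description; `in_lower_set` and `squeeze` are hypothetical names, and the bisection is just one way to watch the gap between $A$ and $B$ narrow.

```python
from fractions import Fraction

def in_lower_set(q):
    """Is the rational q in the set A of the cut for the square root of 2?"""
    return q < 0 or q * q < 2

def squeeze(low, high, steps):
    """Halve the interval repeatedly, keeping low in A and high in B."""
    for _ in range(steps):
        middle = (low + high) / 2
        if in_lower_set(middle):
            low = middle
        else:
            high = middle
    return low, high

low, high = squeeze(Fraction(1), Fraction(2), 10)
print(low, high, high - low)
```

No rational ever lands exactly on the boundary, which is the gap the irrational number is introduced to fill.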

Clearly, at this stage, infinity is taking center stage and also a kind of self-referential logic: approximations where we feed outputs reflexively back into inputs, lists we make and then negate to infinity...

The culmination of these ideas can be expressed in terms of computability: the general theory of following definite rules. After all, there are many ways to calculate things, feeding outputs back into inputs... and not all of them correspond to rationals, or even real numbers!

For example, let's build a computer out of fractions. This is an idea due to John Conway, and goes by the name "fractran."

A computer program in fractran is an ordered list of fractions. For example:

![](img/fractran_primes.png)

The input to the program is some integer $n_{0}$. The idea is: you go through the list and try multiplying the current integer $n$ by the first fraction in the list. If the denominator cancels out and you get an integer back, then that's the output $n_{1}$. You stop, go back to the beginning of the list, and start over with $n_{1}$ as the new input. If you don't get an integer back, however, then you move on to the second fraction, and so on. If you get to the end of the list without getting an integer back, then the program halts. And that's fractran!

For example, the above program, if the input is $2$, generates a sequence of integers which contains the following powers of 2:

$ 2^{2} \dots 2^{3} \dots 2^{5} \dots 2^{7} \dots 2^{11} \dots $

In other words, it calculates the prime numbers!

A program which adds two integers is given simply by $ \frac{3}{2}$. Given an input $2^{a}3^{b}$, it eventually outputs $3^{a+b}$: it adds the integers! More examples can be found on the Wikipedia page.
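The rules above fit in a few lines of Python. This is only a sketch: the function name `fractran` and the `max_steps` cap are choices made here, and the list `PRIMEGAME` assumes that the program pictured above is Conway's well-known fourteen-fraction prime generator (the image itself isn't reproduced in the text).

```python
from fractions import Fraction
from itertools import islice

def fractran(program, n, max_steps=10_000):
    """Yield the successive integers produced by a FRACTRAN program started on n."""
    for _ in range(max_steps):
        for f in program:
            product = n * f
            if product.denominator == 1:   # the denominator cancelled out
                n = product.numerator
                yield n
                break                      # go back to the start of the list
        else:
            return                         # no fraction worked: the program halts

# The adder: 2^a 3^b -> 3^(a+b)
adder = [Fraction(3, 2)]
print(list(fractran(adder, 2**2 * 3**1)))  # ends at 27 = 3^(2+1)

# Assumed to be the program in the figure: Conway's PRIMEGAME.
PRIMEGAME = [Fraction(a, b) for a, b in [
    (17, 91), (78, 85), (19, 51), (23, 38), (29, 33), (77, 29), (95, 23),
    (77, 19), (1, 17), (11, 13), (13, 11), (15, 14), (15, 2), (55, 1)]]

# Keep only the outputs that are powers of 2 and report their exponents.
exponents = (n.bit_length() - 1 for n in fractran(PRIMEGAME, 2)
             if n & (n - 1) == 0)
print(list(islice(exponents, 4)))
```

Using `Fraction` keeps everything exact, so "did we get an integer back?" is just a check that the denominator is 1.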

The idea is that just as we can write out an integer in base-n representation as an ordered list of numerals, we can also write out an integer as a product of primes.

$60 = 2^{2}3^{1}5^{1}$

In fact, we could really just denote it $211$, where it is understood that the first digit is the exponent of the first prime ($2$), the second digit is the exponent of the second prime ($3$), and so on.

A computer needs a memory, a series of registers. The idea is that each prime acts as a register for our computer, and the value of the register is the exponent of that prime. Multiplying by fractions allows us to shift data between the registers, and these rules are complete for universal classical computation. In other words, this computer can compute anything any other computer can compute. For example, we could have a list of fractions which represents: a computer program that approximates the $\sqrt 2$. 
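Here is a small sketch of the "registers" picture: a hypothetical helper `prime_exponents` that reads off the exponents of $2, 3, 5, \dots$ from an integer. It uses naive trial division, which is fine for illustration but not for large numbers.

```python
def prime_exponents(n):
    """Return the exponents of 2, 3, 5, ... in the prime factorization of n >= 1."""
    exponents = []
    p = 2
    while n > 1:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exponents.append(e)
        # advance p to the next prime by trial division
        p += 1
        while any(p % d == 0 for d in range(2, int(p**0.5) + 1)):
            p += 1
    return exponents

print(prime_exponents(60))   # the "211" of the text, as a list

# Multiplying by 3/2 moves one unit from the 2-register to the 3-register.
for n in (12, 18, 27):
    print(n, prime_exponents(n))
```

Seen this way, the one-fraction adder is just a loop that decrements one register and increments another until the first is empty.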

So some computations can be shown to represent irrational numbers: the longer you run the computation, the closer the output gets to a single real number. But of course, not all algorithms "converge" on a number. And in fact, not everything can be computed!

The famous proof of this is due to Godel/Turing. Godel actually used something quite akin to fractran in his proof, but Turing's argument is easier to understand. The whole thing rests on self-reference: the fact that the instructions in a computer program which computes numbers are also numbers themselves, so that one can write computer programs *whose input is other computer programs*. It's yet another example of a "reductio ad absurdum" argument, and also a diagonal argument all in one!

The question is this. It would be great to have a computer program $H(A, I)$ that takes as input another computer program $A$ and an input $I$, and tells you whether $A$ will halt or run forever when run on $I$. In some cases, it's clear, like if $A$ has an obvious infinite loop, or just returns a constant. But is there a program $H$ that can handle all cases?

Suppose there were. We have some $H(A, I)$ which returns true if $A$ halts, and false if $A$ doesn't, when run on $I$.

Now consider the Godel function $G(A)$, which takes a program $A$ and asks $H$ what happens when $A$ is run on its own code: if $H(A, A)$ returns true, then $G$ loops forever; but if $H(A, A)$ returns false, then $G$ halts. This is akin to going along the diagonal in Cantor's proof and flipping the bits.

Now we run $G$ on itself, and consider $H(G, G)$. This is supposed to return true if $G$ halts on $G$, but false if $G$ runs forever on $G$. But according to the definition of $G$, if $H$ says $G$ halts, then $G$ runs forever, and if $H$ says $G$ runs forever, then $G$ halts! This is a contradiction, and so: there is no program $H$ that takes a computer program and an input and can determine whether the program definitely halts or not on that input.
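The diagonal construction can be sketched directly in Python. Nothing here is a real halting checker: `optimist` and `pessimist` are two deliberately naive stand-ins for $H$, and `make_diagonal` is a made-up name for the recipe that builds the troublesome program from whichever candidate you hand it.

```python
def make_diagonal(halts):
    """Build the diagonal program G from a claimed oracle halts(program, data) -> bool."""
    def g(program):
        if halts(program, program):
            while True:       # the oracle predicts halting, so G loops forever
                pass
        return "halted"       # the oracle predicts looping, so G halts at once
    return g

def optimist(program, data):
    return True               # claims that every program halts

def pessimist(program, data):
    return False              # claims that every program runs forever

g = make_diagonal(pessimist)
print(pessimist(g, g))        # the prediction: g never halts on itself
print(g(g))                   # and yet it does

# make_diagonal(optimist) fails the other way around: running it on itself
# would loop forever, contradicting the optimist, so we don't call it here.
```

The same recipe defeats any candidate you plug in, however clever, which is the content of the proof.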

The moral is that if you want to know if a computer program halts or not, in general you have to run it potentially "forever" and just wait and see if it does actually halt. There may be no shortcut.

And so we see that there is an intrinsic limit to computation: there are questions which are in principle *not computable*.

Now Godel originally proved this theorem in terms of mathematical logic. He assumed the axioms which give you the elementary operations of addition, multiplication, subtraction, and division, and the quantifiers like: There exists... and For all... And so he was able to turn statements of mathematical logic into numbers (using an encoding based on primes! Indeed, the primes in their interplay between addition and multiplication are necessary!), so that then statements of mathematical logic could talk about other statements. He then constructed a self-referential statement that in effect asserts its own unprovability, so that proving it would lead to a contradiction.

The full statement of Godel's Incompleteness Theorems is that:

A system of logic powerful enough to contain basic arithmetic is necessarily incomplete in the sense that there are statements that can neither be proved nor disproved by those rules of logic. One such statement is the consistency of that formal system, i.e., whether the rules of logic lead to a contradiction. So that a system of logic powerful enough to contain arithmetic can't prove its own consistency. It may, in fact, be consistent; but that can only be proven in a different system of logic. Like: you could try to repair your incomplete system of axioms and rules of inference by adding more axioms and rules of inference, and then you might be able to resolve some previously unanswerable questions; but this will lead to yet more unanswerable questions, which can only be resolved by adding more axioms...

In other words, mathematics cannot be reduced down to a single set of axioms, from which all true statements can be derived via the mechanical procedure of following out all the rules of inference. This goes a long way towards answering our original question whether in some sense all composites can be broken down into the same kinds of atoms! But the unpacking of our concept of atom is hardly complete, and many unanticipatable surprises are in store.

To return to our main subject, the real line, which has a third point between any two points, contains rationals, irrationals, even uncomputable numbers!

But sometimes we can have recursively defined algorithms that converge on a certain real number, for example, the $\sqrt 2$. And it is these numbers that we are now taking as our atoms, which each contain within themselves a whole completed infinity "all at once."

Recall that we're now working with exponentiation and root-taking. And actually, there's something we've completely overlooked. What happens if we take $\sqrt{-1}$?

There is no rational (or even real) number such that when you square it, you get $-1$. So as usual, we have to add this number to our system. It's usually called $i = \sqrt{-1}$, because such numbers were originally thought of as "imaginary."

Once we have $i$, we can think of $\sqrt{-7}$ as $i\sqrt 7$. 

Note that we have:

$ 1 \times i = i $

$ i \times i = -1 $

$ -1 \times i = -i $

$ -i \times i = 1 $

There's a four-fold repeating pattern. This suggests the interpretation of multiplying by $i$ as a $90^{\circ}$ rotation.

![](img/complex_plane.png)

So actually, we have more than just the real axis! There's a second axis, the imaginary axis, and actually our new kinds of numbers can live anywhere on the resulting plane, called the "complex plane."

So our "complex numbers" can have a real part and an imaginary part, and they can be written $z = a+bi$, which picks out a point on the plane.

![](img/complex_cartesian.gif)

Alternatively, we could use polar coordinates: $z = r(\cos(\theta) + i \sin(\theta))$, where $r$ is the radius and $\theta$ the angle.

<img src="img/eulers_formula.png"  width=300>

Later we'll come to understand why we can also write this $z = re^{i\theta}$, but for now take it as a useful notation.

Complex numbers follow the rules of algebra. You can add them:

$ (a + bi) + (c + di) = (a + c) + i(b + d)$

This just corresponds to laying the arrows represented by the two numbers end to end.

![](img/complex_addition.gif)

Multiplying complex numbers means to stretch the one by the other's length, and rotate by the other's angle.

$ re^{i\theta} \times se^{i\phi} = rs e^{i(\theta + \phi)}$

Finally, we have to mention complex conjugation. Given a complex number $z = a + bi = re^{i\theta}$, its conjugate is $z^{*} = a - bi = re^{-i\theta}$.

<img src="img/complex_conjugate.png" width=200>

Look at what happens when we multiply a complex number by its conjugate.

$zz^{*} = (a + bi)(a - bi) = a^{2} - abi + abi - b^{2}i^{2} = a^{2} + b^{2}$

So if we consider a complex number $z$ to represent a right triangle with sides $a$ and $b$, the length of its hypotenuse is given by $\sqrt{zz^{*}}$.
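Python's built-in complex numbers make these rules easy to check for yourself. The particular values of `z` and `w` below are arbitrary choices for illustration.

```python
import cmath
import math

# Multiplying by i four times brings us back to where we started.
x = 1
for _ in range(4):
    x = x * 1j
    print(x)

z = 1 + 1j                        # length sqrt(2), angle 45 degrees
w = cmath.rect(2, math.pi / 2)    # length 2, angle 90 degrees

r, theta = cmath.polar(z)
product = z * w

# Lengths multiply and angles add.
print(abs(product), r * 2)
print(cmath.phase(product), theta + math.pi / 2)

# A number times its conjugate is the squared length: a^2 + b^2.
print(z * z.conjugate())
```

Here `cmath.rect` and `cmath.polar` convert between the polar form $(r, \theta)$ and the Cartesian form $a + bi$.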

If it wasn't clear before, it is now undeniable: there is something actually two-dimensional going on at this stage of the game. In fact, our numbers now have two parts and live on the plane, where they can represent points/arrows, and rotations/stretches thereon.

And this is exactly what we need.

Think about it like, all this time, we've been providing more and more context to the interpretation of a single pebble. First, we have to decide whether the presence of the pebble or its absence is significant. Then, once we have multiple pebbles, we have to agree where we start counting from. Then, once we have integers, we have to agree on where $0$ is, and so if we should consider the number to be positive or negative. Then, once we have the rational numbers, we have to agree on what is $1$, we need a rational number to translate between our different units. But now at this stage, we need to agree *on the angle between our axes*. And that's just what a complex number can do. It can take a diagonal line, which has to be represented with an "unwritable" $\sqrt 2$ for example, and translate it into "our reference frame" by multiplying by $e^{-i \pi/4}$, which by rotation aligns the diagonal line with the real axis. So each stage in our journey represents another kind of context, another kind of number that we have to give to align our reference frames, so that we can agree with certainty about the meaning of a pebble.

Before we transcend this plane, however, we have one final point to make. What about division by 0? Before, this wrapped up the line into a circle. Now that we have complex numbers, how can we interpret division by 0?

The answer is the Riemann sphere.

![](img/riemann_sphere_brr.jpg)

In the case of the line, there were two infinities, positive and negative. In the plane, the horizon is an infinite circle, and we imagine a point beyond the horizon, such that you reach the same point no matter which direction you approach it from: that's the point at infinity. And if it's there, what it means is that our plane is really a sphere: the point $\infty$ is just the point directly opposite you on the sphere.

![](img/stereographic_projection.jpg)

We imagine standing at the North Pole of the sphere, and drawing a straight line from that point to a chosen point on the complex plane. That line will intersect the sphere at one location. If the point on the plane is inside the unit circle, it gets mapped to the Southern Hemisphere, and if the point on the plane is outside the unit circle, it gets mapped to the Northern Hemisphere. All points on the plane are mapped uniquely to a point on the sphere, but we have an extra point left over, and that's the one at the North Pole, the point of projection itself.

In what follows, however, we'll find it most convenient to take our projection from the South Pole. In that case:

If we have a complex number $c = a+bi$ or $\infty$, then:

$ c \rightarrow (x, y, z) = (\frac{2a}{1 + a^{2} + b^{2}}, \frac{2b}{1+a^{2}+b^{2}}, \frac{1 - a^{2} - b^{2}}{1 + a^{2} + b^{2}})$ or $(0, 0, -1)$ if $c = \infty$

And inversely, $(x, y, z) \rightarrow \frac{x}{1+z} + i\frac{y}{1+z}$ or $\infty$ if $(x, y, z) = (0, 0, -1)$.
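As a quick sanity check on the two maps, here is a minimal sketch in plain Python. The names `c_to_xyz` and `xyz_to_c` are chosen here for illustration, and `math.inf` is used as a stand-in for the point $\infty$.

```python
import math

def c_to_xyz(c):
    """Stereographic projection from the South Pole: plane -> unit sphere."""
    if c == math.inf:
        return (0.0, 0.0, -1.0)
    a, b = c.real, c.imag
    d = 1 + a * a + b * b
    return (2 * a / d, 2 * b / d, (1 - a * a - b * b) / d)

def xyz_to_c(x, y, z):
    """Inverse map: unit sphere -> plane, with the South Pole going to infinity."""
    if z == -1:
        return math.inf
    return complex(x / (1 + z), y / (1 + z))

print(c_to_xyz(0j))          # the origin lands on the North Pole
print(c_to_xyz(1 + 0j))      # the unit circle lands on the equator
print(c_to_xyz(math.inf))    # infinity lands on the South Pole

c = 0.3 - 1.7j
x, y, z = c_to_xyz(c)
print(x * x + y * y + z * z)  # on the unit sphere, up to rounding
print(xyz_to_c(x, y, z))      # back where we started, up to rounding
```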

And so, we discover that our numbers at this stage are actually: points on a sphere, picking out a direction in 3D space. At this level, such a number is a composite, a completed infinity defined by rational numbers. And we can count, add, subtract, multiply, divide, exponentiate, and take roots with wild abandon.

<hr>

At this stage, we should expect that the complex numbers will be our atoms, and we'll form composites out of them. What are these composites? You've probably heard of them before: they are *polynomials*. Indeed, we will now take for our composites "equations" themselves.

Recall that a polynomial is defined quite like a "base-n" number, but what before was the "base" is now a *variable*. The $c_{n}$ are generally complex coefficients.

$ f(z) = \dots + c_{4}z^{4} +  c_{3}z^{3} + c_{2}z^{2} + c_{1}z + c_{0} $

A classic problem is to solve for the "roots" of $f(z)$. These are the values of $z$ that make $f(z) = 0$.

The degree of a polynomial is the highest power of the variable that appears within it. It turns out that a degree $n$ polynomial has exactly $n$ complex roots, counting repeated roots separately. A polynomial can therefore be factored into roots, just as a whole number can be factored into primes.

$ f(z) =  \dots + c_{4}z^{4} +  c_{3}z^{3} + c_{2}z^{2} + c_{1}z + c_{0} = (z - \alpha_{0})(z - \alpha_{1})(z - \alpha_{2})\dots $

The only catch is that the roots of the polynomial $f(z)$ are left invariant if you multiply the whole polynomial by any complex number. So the roots define the polynomial up to multiplication by a complex "scalar." 

This fact is called the fundamental theorem of algebra, just as the fact that whole numbers can be uniquely decomposed into primes is called the fundamental theorem of arithmetic. Interestingly, the proof doesn't require much more than some geometry. Here's a brief sketch:

So suppose we have some polynomial $ f(z) = c_{n}z^n + \dots + c_{4}z^{4} +  c_{3}z^{3} + c_{2}z^{2} + c_{1}z + c_{0}$. Now suppose that we take $z$ to be very large. For a very large $z$ the difference between $z^{n-1}$ and $z^n$ is considerable, and we can say that the polynomial is dominated by the $c_{n}z^n$ term. Now imagine tracing out a big circle in the complex plane corresponding to different values of $z$. If we look at $f(z)$, it'll similarly wind around a circle $n$ times as fast (with a little wiggling as it does so corresponding to the other tiny terms). Now imagine shrinking the $z$ circle down smaller and smaller until $z=0$. But $f(0) = c_{0}$, which is just the constant term. So as the "input" circle shrinks down to 0, the "output" circle shrinks down to $c_{0}$. To get there, however, the shrinking circle in the output plane must have passed the origin at some point, and therefore the polynomial has at least one root. You factor that root out of the polynomial leading to a polynomial of one less degree, and repeat the argument, until there are no more roots left. Therefore, a degree $n$ polynomial has exactly $n$ roots in the complex numbers.

![](img/fundamental_theorem_of_algebra.png)
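The winding picture in the proof sketch can be tested numerically. The sketch below walks $z$ around a circle of a given radius and counts how many times $f(z)$ circles the origin. The function names, the sample count, and the example polynomial $z^{3} - 2z + 1$ (whose roots are $1$ and $\frac{-1 \pm \sqrt 5}{2}$) are all choices made here for illustration.

```python
import cmath
import math

def poly_eval(coeffs, z):
    """Evaluate a polynomial given highest-degree-first coefficients (Horner's rule)."""
    result = 0
    for c in coeffs:
        result = result * z + c
    return result

def winding_number(coeffs, radius, samples=4000):
    """Count how often f(z) winds around 0 as z goes once around |z| = radius."""
    total = 0.0
    prev = poly_eval(coeffs, radius)
    for k in range(1, samples + 1):
        z = cmath.rect(radius, 2 * math.pi * k / samples)
        cur = poly_eval(coeffs, z)
        total += cmath.phase(cur / prev)   # small change in angle for this step
        prev = cur
    return round(total / (2 * math.pi))

f = [1, 0, -2, 1]   # z^3 - 2z + 1
for radius in (10, 1.2, 0.8, 0.1):
    print(radius, winding_number(f, radius))
```

For a big circle the count equals the degree, for a tiny circle it is zero, and it drops by one each time the shrinking circle sweeps across a root. This assumes the circle never passes exactly through a root, where the angle is undefined.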


This is related to the fact that the complex numbers represent the algebraic closure of the elementary operations of arithmetic. One could consider other number fields: the real numbers, for instance.

It's worth noting "Vieta's formulas" which relate the roots to the coefficients:

Given a polynomial $f(z) = c_{n}z^n + \dots + c_{4}z^{4} +  c_{3}z^{3} + c_{2}z^{2} + c_{1}z + c_{0} = c_{n}(z - \alpha_{0})(z - \alpha_{1})(z - \alpha_{2})\dots(z - \alpha_{n-1}) $, we find:

$ 1 = \frac{c_{n}}{c_{n}}$ 

$ \alpha_{0} + \alpha_{1} + \alpha_{2} + \dots + \alpha_{n-1} = -\frac{c_{n-1}}{c_{n}}$

$ \alpha_{0}\alpha_{1} + \alpha_{0}\alpha_{2} + \dots + \alpha_{1}\alpha_{2} + \dots + \alpha_{2}\alpha_{3} \dots = \frac{c_{n-2}}{c_{n}}$

$ \alpha_{0}\alpha_{1}\alpha_{2} + \alpha_{0}\alpha_{1}\alpha_{3} + \dots + \alpha_{1}\alpha_{2}\alpha_{3} + \dots + \alpha_{2}\alpha_{3}\alpha_{4} \dots = - \frac{c_{n-3}}{c_{n}}$

$ \vdots $

$ \alpha_{0}\alpha_{1}\alpha_{2}\dots\alpha_{n-1} = (-1)^{n}\frac{c_{0}}{c_{n}} $ 

In other words, the $c_{n-1}$ coefficient is the sum of the roots taken one at a time times $(-1)^{1}$, the $c_{n-2}$ coefficient is the sum of the roots taken two at a time times $(-1)^{2}$, and so on, until the constant term is given by the product of the roots times $(-1)^{n}$, in other words the $n$ roots taken all at a time. We divide out by $c_{n}$ as the roots only define the polynomial up to multiplication by a complex scalar.
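Vieta's formulas are easy to check by brute force: expand the product of the factors $(z - \alpha_{k})$, and compare each coefficient with the sum of the roots taken $k$ at a time, which should equal $(-1)^{k}\frac{c_{n-k}}{c_{n}}$. The helper names and the example roots below are made up for illustration.

```python
from itertools import combinations
from math import prod

def expand_from_roots(roots, leading=1):
    """Coefficients (highest degree first) of leading * (z - r0)(z - r1)..."""
    coeffs = [leading]
    for r in roots:
        # multiply the current polynomial by (z - r)
        coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def roots_taken(roots, k):
    """Sum of all products of the roots taken k at a time."""
    return sum(prod(group) for group in combinations(roots, k))

roots = [1, 2, 3]
coeffs = expand_from_roots(roots, leading=2)   # 2z^3 - 12z^2 + 22z - 12
print(coeffs)

for k in range(len(roots) + 1):
    # coeffs[k] is c_{n-k}, and coeffs[0] is c_n
    print(k, roots_taken(roots, k), (-1) ** k * coeffs[k] / coeffs[0])
```

Note that `combinations` never cares about the order of the roots, which is exactly the sense in which the coefficients only "know" the roots as an unordered set.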

In other words, the relationship between coefficients and roots is "holistic"--just like the relationship between a composite whole number and its prime factors, the latter of which are contained "unordered" within it.

Now an important point is that polynomials form a "vector space." A vector space consists of multidimensional arrows called "vectors" and "scalars" which are just normal numbers. You can multiply vectors by scalars, and add them up, and you'll always get a vector in the vector space. Implicitly, we've worked with real vector spaces when we used $(x, y)$ and $(x, y, z)$ coordinates. Here, we're working with complex vector spaces. (Note we could use other "division algebras" for our scalars: reals, complex numbers, quaternions, or octonions, but complex numbers give us all we need.) But just like real vector spaces, the point is that we can expand any vector in some "basis." For real 3D vectors, we often use $(1,0,0)$, $(0,1,0)$, and $(0,0,1)$. The idea is that any 3D point can be written as a linear combination of these three vectors, which are orthonormal: they are at right angles, and of unit length. 

E.g. $(x, y, z) = x(1,0,0) + y(0,1,0) + z(0,0,1)$. Any set of linearly independent vectors can form a basis, although we'll stick to using orthonormal vectors.

In real vector spaces, the length of a vector is given by $\sqrt{v \cdot v}$, where $v \cdot v$ is the inner product: you multiply the entries of the two vectors pairwise and sum. If you take $u \cdot v$, this quantity will be 0 if the vectors are at right angles. For complex vector spaces, we use the bracket notation $\langle v \mid v \rangle$, where recall that the $\langle v \mid$ is a bra, which is the complex conjugated row vector corresponding to the column vector $\mid v \rangle$. 
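To see why the bra has to be complex conjugated, compare the two candidate inner products on a small example. The function names `inner` and `naive_dot` and the vectors `u`, `v` are made up for illustration.

```python
import math

def inner(u, v):
    """<u|v>: conjugate the entries of u, multiply pairwise with v, and sum."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def naive_dot(u, v):
    """The real-vector recipe applied blindly: no conjugation."""
    return sum(a * b for a, b in zip(u, v))

u = [1, 1j]
v = [1, -1j]

print(inner(u, u))                    # squared length of u: real and positive
print(math.sqrt(inner(u, u).real))    # so the length is sqrt(2)
print(naive_dot(u, u))                # without conjugation the "length" collapses to 0
print(inner(u, v))                    # 0: u and v are orthogonal
```

With the conjugation, $\langle v \mid v \rangle$ is the sum of the $aa^{*}$ of each entry, so it is never negative and only vanishes for the zero vector.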

Already this is looking like quantum mechanics. Indeed, in what follows, we'll be using polynomials to represent quantum spin states. You should note that while the exposition has been more or less self-contained so far, in what follows we'll be assuming knowledge from the previous essays.

For example, we've already alluded to the fact that instead of working with $\mathbb{C} + \infty$, we can work in the two dimensional complex projective space. Given some $\alpha$ which can be a complex number or infinity:

$\alpha \rightarrow \begin{pmatrix} 1 \\ \alpha \end{pmatrix}$ or $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ if $\alpha = \infty$.

In reverse:

$ \begin{pmatrix} a \\ b \end{pmatrix} \rightarrow \frac{b}{a}$ or $\infty$ if $a=0$.

We're free to multiply this complex vector by any complex number and this won't change the root, so we can always normalize the vector so its length is 1. Then it represents the state of a qubit, a spin-$\frac{1}{2}$ particle, consisting of an average spin-axis (a point on the sphere), and a complex phase. Supposing we've quantized along the Z-axis, then $aa^{*}$ is the probability of measuring this qubit to be $\uparrow$ along the Z direction, and $bb^{*}$ is the probability of measuring it to be $\downarrow$. ("Quantized along the Z-axis" just means that we take eigenstates of the Pauli Z operator to be our basis states, so that each component of our complex vector weights one of those eigenstates. All Hermitian matrices have eigenvectors which form an orthogonal basis.)
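Here is the round trip between a root and a normalized two-component vector as a minimal sketch. The function names are invented here, and `math.inf` again stands in for the point $\infty$.

```python
import cmath
import math

def root_to_vector(alpha):
    """Map a root (a complex number or math.inf) to a unit-length vector (a, b)."""
    if alpha == math.inf:
        return (0j, 1 + 0j)
    norm = math.sqrt(1 + abs(alpha) ** 2)
    return (1 / norm + 0j, alpha / norm)

def vector_to_root(a, b):
    """Map a vector (a, b) back to the root b/a, or to infinity if a = 0."""
    return math.inf if a == 0 else b / a

a, b = root_to_vector(1j)
print(abs(a) ** 2, abs(b) ** 2)   # probabilities of "up" and "down": they sum to 1

# Multiplying the whole vector by a complex number leaves the root alone.
phase = cmath.exp(0.7j)
print(vector_to_root(a, b), vector_to_root(phase * a, phase * b))
```

The root $0$ gives the vector $(1, 0)$, which is certain to be measured $\uparrow$, and the root $\infty$ gives $(0, 1)$, which is certain to be measured $\downarrow$.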

Now we can see where this comes from. It's the difference between considering a "root" vs a "monomial".

Suppose we have a polynomial with a single root, a monomial: $f(z) = c_{1}z + c_{0}$. Let's solve it:

$ 0 = c_{1}z + c_{0}$

$ z = -\frac{c_{0}}{c_{1}} $

Indeed, we could have written $f(z) = (z + c_{0}/c_{1})$. The point is that the monomial can be identified with its root up to multiplication by any complex number--just like our complex projective vector!

So if we had some complex number $\alpha$, which we upgraded to a complex projective vector, we could turn it into a monomial by remembering about that negative sign:

$\alpha \rightarrow \begin{pmatrix} 1 \\ \alpha  \end{pmatrix}  \rightarrow f(z) = z -\alpha  $

$\begin{pmatrix} c_{1} \\ c_{0} \end{pmatrix} \rightarrow f(z) = c_{1}z - c_{0} \rightarrow \frac{c_{0}}{c_{1}} $


But what if $\alpha = \infty$? Above, we suggested that this should get mapped to the vector $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Interpreted as a monomial this says that $f(z) = 0z - 1 = -1$. Which has no roots! Indeed, we've reduced the polynomial by a degree: from degree 1 to degree 0. However, it makes a lot of sense to interpret this polynomial as having a root: $\infty$.

So we could just tack on the rule that if we lose a degree, we add a root at infinity.

But there's a more systematic way to deal with this. We *homogenize* our polynomial. In other words, we add a second variable so that each term in the resulting two-variable polynomial has the same degree. In this case, we want to do something like:

$f(z) = c_{1}z + c_{0} \rightarrow f(w, z) = c_{1}z + c_{0}w$

Suppose we want $f(w, z)$ to have a root $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. In other words, we want $f(1, 0) = 0$. Then we should want $f(w, z) = 1z + 0w = z$, which has a root when $z=0$ and $w$ is anything. If we want $f(w, z)$ to have a root $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, aka $f(0, 1) = 0$, then $f(w, z) = 0z + 1w = w$, which has a root when $w=0$ and $z$ is anything. So we have it now that a $z$-root lives at the North Pole, and a $w$-root lives at the South Pole.

We also have to consider the sign. If we want $f(w, z)$ to have a root $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$, aka $f(2, 3) = 0$, then we want $f(w, z) = 2z - 3w$, so that $f(2, 3) = 2(3) - 3(2) = 0$. 

So the correct homogeneous polynomial with root $\begin{pmatrix} c_{1} \\ c_{0} \end{pmatrix}$ is $f(w, z) = c_{1}z - c_{0}w$. Or the other way around, if we have $f(w, z) = c_{1}z + c_{0}w$, then its root will be $\begin{pmatrix} c_{1} \\ -c_{0} \end{pmatrix}$, up to overall sign.

Now it turns out that this construction generalizes to any spin-$j$, not just a spin-$\frac{1}{2}$. If a degree 1 polynomial represents a spin-$\frac{1}{2}$ state as a 2d complex vector, a degree 2 polynomial represents a spin-$1$ state as a 3d complex vector, a degree 3 polynomial represents a spin-$\frac{3}{2}$ state as a 4d complex vector, and so on. Considering the roots, instead of the coefficients, this is saying that up to a complex phase, a spin-$\frac{1}{2}$ state can be identified with a point on the sphere, a spin-$1$ state with two points on the sphere, a spin-$\frac{3}{2}$ state with three points on the sphere.

This construction is due to Ettore Majorana, and is often known as the "stellar representation" of spin, and the points on the sphere are often referred to as "stars." The theory of quantum spin, therefore, becomes in many ways the theory of "constellations on the sphere."

So for example, we could have a degree 2 polynomial. For simplicity, let's consider one that has two roots at $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$. 

If before we had $f(w,z) = 2z - 3w$, we now want $f(w,z) = (2z - 3w)^2 = (2z - 3w)(2z - 3w) = 4z^2 - 12zw + 9w^2$.

If we want a polynomial with two roots at $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$, we want $f(w, z) = z^2$. If we want a polynomial with two roots at $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we want $f(w, z) = w^2$. If we want a polynomial with one root at $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and one root at $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we want $f(w, z) = zw$.
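These examples can be reproduced by multiplying out the linear factors mechanically. In this sketch a homogeneous polynomial of degree $n$ is stored as the list of coefficients of $z^{n}, z^{n-1}w, \dots, w^{n}$, and each star is a pair `(a, b)` standing for the root vector $\begin{pmatrix} a \\ b \end{pmatrix}$ with factor $az - bw$. The function names are invented for illustration.

```python
def poly_from_stars(stars):
    """Coefficients of z^(n-i) w^i, i = 0..n, of the product of (a*z - b*w)."""
    coeffs = [1]
    for a, b in stars:
        # multiplying by a*z keeps each index i; multiplying by -b*w shifts it up by one
        coeffs = [a * p - b * q for p, q in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

def evaluate(coeffs, w, z):
    """Evaluate the homogeneous polynomial at the point (w, z)."""
    n = len(coeffs) - 1
    return sum(c * z ** (n - i) * w ** i for i, c in enumerate(coeffs))

double_star = poly_from_stars([(2, 3), (2, 3)])
print(double_star)                      # 4z^2 - 12zw + 9w^2
print(evaluate(double_star, 2, 3))      # vanishes at (w, z) = (2, 3)

print(poly_from_stars([(1, 0), (1, 0)]))   # z^2
print(poly_from_stars([(1, 0), (0, 1)]))   # zw, up to an overall sign
print(poly_from_stars([(0, 1), (0, 1)]))   # w^2
```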

Indeed, we can use these last three as basis states. Let's take a look:

$
\begin{array}{ |c|c|c|c|c|c| } 
 \hline
 z^{2} & z & 1 & & & \\ 
 \hline
 z^{2} & zw & w^{2} & \text{polynomial} & \text{roots as vectors} & \text{roots} \\
 \hline
 \hline
 1 & 0 & 0 & \rightarrow f(w, z) = z^{2} = 0 & \{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \} & \{ 0, 0 \}\\ 
 0 & 1 & 0 & \rightarrow f(w, z) = zw = 0 & \{\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \} & \{ 0, \infty \}\\ 
 0 & 0 & 1 & \rightarrow f(w, z) = w^{2} = 0 & \{ \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\} & \{ \infty, \infty\}\\
 \hline
\end{array}
$


So we have three basis states $z^{2}$, $zw$, and $w^{2}$, and they correspond to three constellations, albeit simple ones. The first constellation has two stars at the North Pole. The second has one star at the North Pole and one star at the South Pole. And the third has two stars at the South Pole.

Now the interesting fact is that *any 2-star constellation can be written as a complex superposition of these three constellations*. The stars are just given by the roots of the homogeneous polynomial (or the single variable polynomial with the rule about roots at infinity).

There's one more subtlety, however, we need to take into account to be consistent with how the X, Y, Z operators are defined. The full story is this:

If we have a spin state in the usual $\mid j, m \rangle$ representation, quantized along the Z-axis, we can express it as an $n$-dimensional ket, where $n = 2j + 1$. E.g., if $j = \frac{1}{2}$, the dimension of the representation is 2; for $j = 1$, the dimension is 3, and so on.

$\begin{pmatrix} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \end{pmatrix}$

Then the polynomial whose 2j roots correspond to the correct stars, taking into account all the secret negative signs, and the various conventions for the spin matrices, is given by:

$p(z) = \sum_{m=-j}^{m=j} (-1)^{j+m} \sqrt{\frac{(2j)!}{(j-m)!(j+m)!}} a_{j+m} z^{j-m}$.

Equivalently, setting $i = j + m$: $p(z) = \sum_{i=0}^{i=2j} (-1)^{i} \sqrt{\begin{pmatrix} 2j \\ i \end{pmatrix}} a_{i} z^{2j-i}$, where $\begin{pmatrix} n \\ k \end{pmatrix}$ is the binomial coefficient aka $\frac{n!}{k!(n-k)!}$.

Or homogeneously, $p(w, z) = \sum_{i=0}^{i=2j} (-1)^{i} \sqrt{\begin{pmatrix} 2j \\ i \end{pmatrix}} a_{i} z^{2j-i} w^{i}$.

This is known as the Majorana polynomial. Why does the binomial coefficient come into play? I'll just note that $\begin{pmatrix} 2j \\ i \end{pmatrix}$ is the number of groupings of 2j roots taken 0 at a time, 1 at a time, 2 at a time, 3 at a time, eventually 2j at a time. That's just the number of terms in each of Vieta's formulas, which relate the roots to the coefficients! So when we go from a polynomial $\rightarrow$ the $\mid j, m \rangle$ state, we're normalizing each coefficient by the number of terms that contribute to that coefficient.
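Before bringing in any libraries, the formula can be tried out by hand in plain Python. This is only a sketch of the coefficient rule above, with a made-up function name. The spin-1 amplitudes $(1, \sqrt 2, 1)$ are chosen so that both stars should land on the same point, $z = 1$.

```python
from math import comb, sqrt

def majorana_coeffs(amplitudes):
    """Coefficients of z^(2j), ..., z^0 for amplitudes a_0, ..., a_2j in the Z basis."""
    n = len(amplitudes) - 1   # n = 2j, the number of stars
    return [(-1) ** i * sqrt(comb(n, i)) * a for i, a in enumerate(amplitudes)]

# Spin-1/2: (a_0, a_1) gives a_0 z - a_1, whose root is a_1 / a_0.
print(majorana_coeffs([3, 4]))

# Spin-1 basis states give z^2, z (up to a scalar), and 1.
print(majorana_coeffs([1, 0, 0]))
print(majorana_coeffs([0, 1, 0]))
print(majorana_coeffs([0, 0, 1]))

# Amplitudes (1, sqrt(2), 1) give z^2 - 2z + 1 = (z - 1)^2, up to rounding.
print(majorana_coeffs([1, sqrt(2), 1]))
```

The leading zeros in the second and third basis states are the lost degrees that we count as roots at infinity.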

In [None]:
# Displays the eigenstates of the X, Y, Z operators 
# for a given value of j and the associated constellations.
# X, Y, Z are arranged left to right, and their height is given by their m value.
import math

import numpy as np
import qutip as qt
import vpython as vp
np.set_printoptions(precision=3)

scene = vp.canvas(background=vp.color.white)

##########################################################################################

# Stereographic projection from the south pole:
# a complex number (or infinity) becomes a point on the unit sphere.
def c_xyz(c):
    if c == float("Inf"):
        return np.array([0, 0, -1])
    x, y = c.real, c.imag
    d = 1 + x**2 + y**2
    return np.array([2*x/d, 2*y/d, (1 - x**2 - y**2)/d])

# np.roots takes: p[0] * x**n + p[1] * x**(n-1) + ... + p[n-1]*x + p[n]
# Each leading zero coefficient lowers the degree by one; we count it as a root at infinity.
def poly_roots(poly):
    head_zeros = 0
    for c in poly:
        if c == 0:
            head_zeros += 1
        else:
            break
    return [float("Inf")]*head_zeros + [complex(root) for root in np.roots(poly)]

# Coefficients of the Majorana polynomial, highest power of z first,
# for a spin state given as a 1D array of amplitudes in the Z basis.
def spin_poly(spin):
    n = spin.shape[0] - 1  # n = 2j, the number of stars
    # math.comb(n, i) is (2j)!/((j-m)!(j+m)!) with i = j+m. It works on plain ints,
    # so we avoid calling factorial on float values of j and m.
    return [spin[i] * (-1)**i * np.sqrt(math.comb(n, i)) for i in range(n + 1)]

def spin_XYZ(spin):
    return [c_xyz(root) for root in poly_roots(spin_poly(spin))]

##########################################################################################

def display(spin, where):
    j = (spin.shape[0]-1)/2
    vsphere = vp.sphere(color=vp.color.blue,\
                        opacity=0.5,\
                        pos=where)
    vstars = [vp.sphere(emissive=True,\
                        radius=0.3,\
                        pos=vsphere.pos+vp.vector(*xyz))\
                            for i, xyz in enumerate(spin_XYZ(spin.full().T[0]))]
    varrow = vp.arrow(pos=vsphere.pos,\
                      axis=vp.vector(qt.expect(qt.jmat(j, 'x'), spin),\
                                     qt.expect(qt.jmat(j, 'y'), spin),\
                                     qt.expect(qt.jmat(j, 'z'), spin)))
    return vsphere, vstars, varrow

##########################################################################################

j = 1/2
XYZ = {"X": qt.jmat(j, 'x'),\
       "Y": qt.jmat(j, 'y'),\
       "Z": qt.jmat(j, 'z')}
for i, o in enumerate(["X", "Y", "Z"]):
    L, V = XYZ[o].eigenstates()
    # k indexes the eigenstates, so that we don't overwrite the spin value j set above
    for k, v in enumerate(V):
        display(v, vp.vector(3*i, 3*L[k], 0))
        spin = v.full().T[0]
        print("%s(%.2f):" % (o, L[k]))
        print("\t|j, m> = %s" % spin)
        poly_str = " + ".join("(%.1f+%.1fi)z^%d" % (c.real, c.imag, len(spin)-p-1) for p, c in enumerate(spin_poly(spin)))
        print("\tpoly = %s" % poly_str)
        print("\troots = ")
        for root in poly_roots(spin_poly(spin)):
            print("\t  %s" % root)
        print()

So we can see that the eigenstates of the X, Y, Z operators for a given spin-$j$ representation correspond to constellations with $2j$ stars. There are $2j+1$ such constellations, one for each eigenstate, and they are the following simple ones:

All the stars at, say, Y+, and none at Y-;
then all but one star at Y+, and one star at Y-;
then all but two stars at Y+, and two stars at Y-;
then all but three stars at Y+, and three stars at Y-;
until you get to all the stars at Y-. 

We could choose X, Y, Z or some combination thereof to be the axis we quantize along, but generally we'll choose the Z axis. 

And the remarkable fact is that any constellation of 2j stars can be written as a superposition of these basic constellations. We simply find the roots of the corresponding polynomial. And you can check, for instance, that the whole constellation rotates rigidly around the X axis when you evolve the spin state with $e^{iXt}$, for example. (And $e^{Xt}$ corresponds to a boost!) It is fun to watch the constellation evolve under some arbitrary Hamiltonian: the stars swirl around, permuting among themselves, seeming to repel each other like little charged particles.

Below, you can display a random spin-$j$ state and evolve it under some random Hamiltonian and watch its constellation evolve. Meanwhile, you can see to the side the Z-basis constellations, and the amplitudes corresponding to them: in other words, the yellow arrows are the components of the spin vector in the Z basis.

It's worth noting: if the state is an eigenstate of the Hamiltonian, then the constellation doesn't change: there's only a phase evolution. Perturbing the state slightly from that eigenstate will cause the stars to precess around their former locations. Further perturbations will eventually cause the stars to begin to swap places, until eventually it becomes visually unclear which point they had been precessing around.

In [None]:
import math

import numpy as np
import qutip as qt
import vpython as vp
scene = vp.canvas(background=vp.color.white)

##########################################################################################

# Stereographic projection from the south pole:
# a complex number (or infinity) becomes a point on the unit sphere.
def c_xyz(c):
    if c == float("Inf"):
        return np.array([0, 0, -1])
    x, y = c.real, c.imag
    d = 1 + x**2 + y**2
    return np.array([2*x/d, 2*y/d, (1 - x**2 - y**2)/d])

# np.roots takes: p[0] * x**n + p[1] * x**(n-1) + ... + p[n-1]*x + p[n]
# Each leading zero coefficient lowers the degree by one; we count it as a root at infinity.
def poly_roots(poly):
    head_zeros = 0
    for c in poly:
        if c == 0:
            head_zeros += 1
        else:
            break
    return [float("Inf")]*head_zeros + [complex(root) for root in np.roots(poly)]

# Coefficients of the Majorana polynomial, highest power of z first,
# for a spin state given as a 1D array of amplitudes in the Z basis.
def spin_poly(spin):
    n = spin.shape[0] - 1  # n = 2j, the number of stars
    # math.comb(n, i) is (2j)!/((j-m)!(j+m)!) with i = j+m. It works on plain ints,
    # so we avoid calling factorial on float values of j and m.
    return [spin[i] * (-1)**i * np.sqrt(math.comb(n, i)) for i in range(n + 1)]

def spin_XYZ(spin):
    return [c_xyz(root) for root in poly_roots(spin_poly(spin))]

##########################################################################################

def display(spin, where, radius=1):
    j = (spin.shape[0]-1)/2
    vsphere = vp.sphere(color=vp.color.blue,\
                        opacity=0.5,\
                        radius=radius,
                        pos=where)
    vstars = [vp.sphere(emissive=True,\
                        radius=radius*0.3,\
                        pos=vsphere.pos+vsphere.radius*vp.vector(*xyz))\
                            for i, xyz in enumerate(spin_XYZ(spin.full().T[0]))]
    varrow = vp.arrow(pos=vsphere.pos,\
                      axis=vsphere.radius*vp.vector(qt.expect(qt.jmat(j, 'x'), spin),\
                                                    qt.expect(qt.jmat(j, 'y'), spin),\
                                                    qt.expect(qt.jmat(j, 'z'), spin)))
    return vsphere, vstars, varrow

def update(spin, vsphere, vstars, varrow):
    j = (spin.shape[0]-1)/2
    for i, xyz in enumerate(spin_XYZ(spin.full().T[0])):
        vstars[i].pos = vsphere.pos+vsphere.radius*vp.vector(*xyz)
    varrow.axis = vsphere.radius*vp.vector(qt.expect(qt.jmat(j, 'x'), spin),\
                                           qt.expect(qt.jmat(j, 'y'), spin),\
                                           qt.expect(qt.jmat(j, 'z'), spin))
    return vsphere, vstars, varrow

##########################################################################################

j = 3/2
n = int(2*j+1)
dt = 0.001
XYZ = {"X": qt.jmat(j, 'x'),\
       "Y": qt.jmat(j, 'y'),\
       "Z": qt.jmat(j, 'z')}
state = qt.rand_ket(n)#qt.basis(n, 0)#
H = qt.rand_herm(n)#qt.jmat(j, 'x')#qt.rand_herm(n)#
U = (-1j*H*dt).expm()

vsphere, vstars, varrow = display(state, vp.vector(0,0,0), radius=2)

ZL, ZV = qt.jmat(j, 'z').eigenstates()
vamps = []
for i, v in enumerate(ZV):
    display(v, vp.vector(4, 2*ZL[i], 0), radius=0.5)
    amp = v.overlap(state) # <v|state>: the component of the state along this Z eigenstate
    vamps.append(vp.arrow(color=vp.color.yellow, pos=vp.vector(3, 2*ZL[i], 0),\
                            axis=vp.vector(amp.real, amp.imag, 0)))

T = 100000
for t in range(T):
    state = U*state
    update(state, vsphere, vstars, varrow)
    for i, vamp in enumerate(vamps):
        amp = ZV[i].overlap(state) # <v|state>, matching the initial arrows above
        vamp.axis = vp.vector(amp.real, amp.imag, 0)
        vp.rate(2000)

Finally, let's check out our "group equivariance."

In other words, we could treat our function $f(w, z)$ as a function that takes a spinor/qubit/2d complex projective vector/spin-$\frac{1}{2}$ state as input. So we'll write $f(\psi_{little})$, and it's understood that we plug the first component of $\psi_{little}$ in for $w$ and the second component in for $z$. And we'll say $f(\psi_{little}) \rightarrow \psi_{big}$, where $\psi_{big}$ is the corresponding spin-$j$ state, written in the $\mid j, m \rangle$ basis. If $\psi_{little}$ is a root of $\psi_{big}$, meaning $f(\psi_{little}) = 0$, and we rotate $\psi_{little}$ around some axis while also rotating $\psi_{big}$ by the same amount around the same axis, then $U_{little}\psi_{little}$ should be a root of $U_{big}\psi_{big}$.


In [None]:
import numpy as np
import qutip as qt
import vpython as vp
scene = vp.canvas(background=vp.color.white)

##########################################################################################

# from the south pole
def c_xyz(c):
        if c == float("Inf"):
            return np.array([0,0,-1])
        else:
            x, y = c.real, c.imag
            return np.array([2*x/(1 + x**2 + y**2),\
                             2*y/(1 + x**2 + y**2),\
                   (1-x**2-y**2)/(1 + x**2 + y**2)])

# np.roots takes: p[0] * x**n + p[1] * x**(n-1) + ... + p[n-1]*x + p[n]
def poly_roots(poly):
    head_zeros = 0
    for c in poly:
        if c == 0:
            head_zeros += 1 
        else:
            break
    return [float("Inf")]*head_zeros + [complex(root) for root in np.roots(poly)]

def spin_poly(spin):
    # math.comb replaces np.math.factorial: np.math is gone from NumPy 2.0, and
    # math.factorial rejects floats such as 2*j in recent Python versions.
    from math import comb
    v = spin if type(spin) != qt.Qobj else spin.full().T[0]
    n = v.shape[0] # n = 2j+1
    # With i = j+m, (2j)!/((j-m)!(j+m)!) is the binomial coefficient C(2j, i).
    return [v[i]*((-1)**i)*np.sqrt(comb(n-1, i)) for i in range(n)]

def spin_XYZ(spin):
    return [c_xyz(root) for root in poly_roots(spin_poly(spin))]

def spin_homog(spin):
    n = spin.shape[0]
    print("".join(["(%.2f + %.2fi)z^%dw^%d + " % (c.real, c.imag, n-i-1, i) for i, c in enumerate(spin_poly(spin))])[:-2])
    def hom(spinor):
        w, z = spinor.full().T[0]
        return sum([c*(z**(n-i-1))*(w**(i)) for i, c in enumerate(spin_poly(spin))])
    return hom

def c_spinor(c):
    if c == float('Inf'):
        return qt.Qobj(np.array([0,1]))
    else:
        return qt.Qobj(np.array([1,c])).unit()

def spin_spinors(spin):
    return [c_spinor(root) for root in poly_roots(spin_poly(spin))]

##########################################################################################

j = 3/2
n = int(2*j+1)
spin = qt.rand_ket(n)
print(spin)
h = spin_homog(spin)
spinors = spin_spinors(spin)

for spinor in spinors:
    print(h(spinor))
print()

dt = 0.5
littleX = (-1j*qt.jmat(0.5, 'x')*dt).expm()
bigX = (-1j*qt.jmat(j, 'x')*dt).expm()

spinors2 = [littleX*spinor for spinor in spinors]
spin2 = bigX*spin
print(spin2)
h2 = spin_homog(spin2)

for spinor in spinors2:
    print(h2(spinor))

Now you might ask: what is all this good for?

After all, if we really can represent a spin-$j$ state as $2j$ points on the sphere, why not just keep track of those $(x, y, z)$ points and not worry about all these complex vectors and polynomials and so forth? Even for the spin-$\frac{1}{2}$ case, what's the use of working with a two dimensional complex vector representation?

And so, now we need to talk about the Stern-Gerlach experiment.

Suppose we have a bar magnet. It has a North Pole and a South Pole, and it's oriented along some axis. Let's say you shoot it through a magnetic field that's stronger in one direction, so that it gets weaker the more you move up, stronger the more you move down. To the extent that the magnet is aligned with or against the magnetic field, it'll be deflected up or deflected down and tilted a little bit. For a big bar magnet, it seems like it can be deflected any continuous amount, depending on the original orientation of the magnet.

But what makes a bar magnet magnetic? We know that moving charges produce a magnetic field, so we might imagine that there is something "circulating" within the bar magnet.

What happens if we start dividing the bar magnet into pieces? We divide it in two: now we have two smaller bar magnets, each with its own pair of poles. We keep dividing. Eventually, we reach the "atoms" themselves, whose electrons are spin-$\frac{1}{2}$ particles, which means they're spinning, and this generates a little magnetic field: to wit, a spin-$\frac{1}{2}$ particle is like the tiniest possible bar magnet. It turns out that what makes a big bar magnet magnetic is that a great many of its spins are lined up with one another, all pointing in the same direction, leading to a big magnetic field.

Okay, so what happens if we shoot a spin-$\frac{1}{2}$ particle through the magnetic field?

Here's the surprising thing. It is deflected up or down by some fixed amount, with a certain probability, and if it is deflected up, then it ends up spinning perfectly aligned with the magnetic field; and if it is deflected down, then it ends up spinning perfectly anti-aligned with the magnetic field. There are only two outcomes.

![](img/stern_gerlach1.jpg)

The experiment was originally done with silver atoms, which are big neutral atoms with a single unpaired electron in their outer shells. In fact, here's the original photographic plate.

![](img/stern_gerlach2.jpg)

This was one of the crucial experiments that established quantum mechanics. It was conceived in 1921 and performed in 1922. The message is that spin angular momentum isn't "continuous" when you go to measure it; it comes in discrete quantized amounts.

To wit, suppose your magnetic field is oriented along the Z-axis. If you have a spin-$\frac{1}{2}$ state $\begin{pmatrix} a \\ b \end{pmatrix}$ represented in the Z-basis, then the probability that the spin will end up in the $\mid \uparrow \rangle$ state is $aa^{*}$ and the probability that the spin will end up in the $\mid \downarrow \rangle$ state is given by $bb^{*}$. This is of course if we've normalized our state so that $aa^{*} + bb^{*} = 1$, which of course we always can.

What if we measured the spin along the Y-axis? Or the X-axis? Or any axis? We'd express the vector $\begin{pmatrix} a \\ b \end{pmatrix}$ in terms of the eigenstates of the Y operator or the X operator, etc. We'd get another vector $\begin{pmatrix} c \\ d \end{pmatrix}$, and (for the Y-axis, say) $cc^{*}$ would be the probability of getting $\uparrow$ in the Y direction and $dd^{*}$ the probability of getting $\downarrow$ in the Y direction.
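
To make the change of basis concrete, here is a minimal sketch in plain Python. The helper name `probabilities` and the example state are my own illustrative choices, and the Y eigenvectors are written out by hand assuming the usual Pauli-matrix convention; treat it as a sanity check rather than anything definitive.

```python
import math

def probabilities(state, basis):
    """Born rule: |<e|psi>|^2 for each basis vector e (inputs assumed normalized)."""
    probs = []
    for e in basis:
        # The inner product <e|psi> conjugates the components of e
        amp = sum(ei.conjugate() * si for ei, si in zip(e, state))
        probs.append(abs(amp) ** 2)
    return probs

s = 1 / math.sqrt(2)
Z_BASIS = [(1, 0), (0, 1)]               # Z up, Z down
Y_BASIS = [(s, 1j * s), (s, -1j * s)]    # Y up, Y down (Pauli convention, assumed)

# Example state (a, b): chosen arbitrarily, then normalized
a, b = 1 + 0j, 1j
norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
psi = (a / norm, b / norm)

p_z = probabilities(psi, Z_BASIS)   # squared magnitudes of the Z-basis components
p_y = probabilities(psi, Y_BASIS)   # squared magnitudes of the Y-basis components
print("Z:", p_z)
print("Y:", p_y)
```

With this particular choice the state is an even split along Z, yet it is a Y eigenstate, so the same vector gives a coin flip for one question and a certain answer for the other.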

For spin-$\frac{1}{2}$ there is an easy way to think about where these probabilities come from geometrically.

![](img/spin_probability.jpg)

The eigenstates of a 2x2 Hermitian matrix correspond to orthogonal complex vectors, but antipodal points on the sphere: so they define an axis: this is the axis you're measuring along, the axis of the magnetic field in the Stern-Gerlach set-up. Now given a point on the sphere representing a spin-$\frac{1}{2}$ particle's spin axis, you project that point perpendicularly onto the measurement axis. This divides the line segment between the two antipodal points into two pieces. If you imagine that the line is like an elastic band that snaps randomly in some location, and then the two ends are dragged to the two antipodal points, carrying the projected point with it, the projected point will end up at one or other of the two antipodal locations with a certain probability, which is indeed the correct probability for that experiment. Note that this simple geometric picture only works for spin-$\frac{1}{2}$; the generalization to higher spin isn't so easily visualizable, but is basically analogous.
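
Here is a small sketch comparing the elastic-band picture with the amplitude calculation. I'm assuming unit vectors for both the spin axis and the measurement axis, and the function names (`up_state`, `prob_geometric`, `prob_born`) are just illustrative. In the band picture, the chance of ending up at the "up" end works out to $(1 + \text{projection})/2$, where the projection is the dot product of the two unit vectors.

```python
import cmath
import math

def up_state(axis):
    """Spin-1/2 state (in the Z basis) pointing along the unit vector `axis`."""
    nx, ny, nz = axis
    theta = math.acos(nz)       # polar angle, measured from the North Pole
    phi = math.atan2(ny, nx)    # azimuthal angle
    return (complex(math.cos(theta / 2)), cmath.exp(1j * phi) * math.sin(theta / 2))

def prob_geometric(point, axis):
    """Elastic-band picture: project `point` onto `axis`; P(up) = (1 + projection) / 2."""
    projection = sum(p * n for p, n in zip(point, axis))
    return (1 + projection) / 2

def prob_born(point, axis):
    """The same probability from amplitudes: |<axis up | state>|^2."""
    state, e = up_state(point), up_state(axis)
    amp = e[0].conjugate() * state[0] + e[1].conjugate() * state[1]
    return abs(amp) ** 2

spin_axis = (0.6, 0.0, 0.8)   # where the spin points (an arbitrary unit vector)
for measure_axis in [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]:
    print(measure_axis,
          prob_geometric(spin_axis, measure_axis),
          prob_born(spin_axis, measure_axis))
```

The two columns should agree up to floating-point rounding for any pair of unit vectors you try.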

What happens if we send a spin-$1$ particle through the machine?

![](img/stern_gerlach3.jpg)

The particle ends up in one of three locations, and its spin will be in one of the eigenstates of the operator in question, each with some probability: recall they correspond to two points at the North Pole; one point at the North Pole and one point at the South Pole; two points at the South Pole. So if we're in the Z-basis we have some $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ and the probability of the first outcome is $aa^{*}$, the second outcome is $bb^{*}$, and the third outcome is $cc^{*}$.

And in general, a spin-$j$ particle sent through a Stern-Gerlach set-up will end up in one of $2j+1$ locations, each correlated with one of the $2j+1$ eigenstates of the spin-operator in that direction.

The idea is that a big bar magnet could be treated as practically a spin-$\infty$ particle, with an infinite number of stars at the same point. And so, it "splits into so many beams" that it seems like it could end up in a whole continuum of possible locations, which is why it took thousands of years before we realized that (spin) angular momentum is actually quantized.

Below is a simple version of a Stern-Gerlach set-up. We make some simplifications. We work in 2D. The particle starts to the left and zips along horizontally to the right. The magnetic field/Z direction is up/down. The particle's spin Z operator is coupled to its motion along the Z axis (in the code, through the term `P["Z"]*S["Z"]`). And the magnetic field acts on the spin. Finally, we make a finite approximation which is effectively like putting the particle in a box, so after a while it starts reflecting off the edges. In any case, the probability of the particle to be at a given location is visualized as the radius of the sphere shown at that point. Try changing the particle's $j$-value, its initial state, and even the number of lattice spacings!

In [None]:
import numpy as np
import qutip as qt
import vpython as vp

vp.scene.background = vp.color.white

dt = 0.001
n = 5
spinj = 0.5
spinn = int(2*spinj+1)

S = {"X": qt.tensor(qt.identity(n), qt.identity(n), qt.jmat(spinj, 'x')/spinj),\
     "Y": qt.tensor(qt.identity(n), qt.identity(n), qt.jmat(spinj, 'y')/spinj),\
     "Z": qt.tensor(qt.identity(n), qt.identity(n), qt.jmat(spinj, 'z')/spinj)}

P = {"X": qt.tensor(qt.momentum(n), qt.identity(n), qt.identity(spinn)),\
     "Z": qt.tensor(qt.identity(n), qt.momentum(n), qt.identity(spinn))}

B = np.array([0, 0, -1])
H = (P["X"]*P["X"] + P["Z"]*S["Z"]) + \
        sum([B[i]*S[o] for i, o in enumerate(S)])
U = (-1j*dt*H).expm()

Q = qt.position(n)
Ql, Qv = Q.eigenstates()
zero_index = -1
for i, l in enumerate(Ql):
    if np.isclose(l, 0):
        zero_index = i

initial = qt.tensor(Qv[0], Qv[zero_index], qt.basis(spinn, 0)) # spin dimension follows spinj
state = initial.copy()

vspheres = [[vp.sphere(color=vp.color.blue,\
                        radius=0.5, opacity=0.3,\
                        pos=vp.vector(Ql[i], Ql[j], 0))\
                for j in range(n)] for i in range(n)]
varrows = [[vp.arrow(pos=vspheres[i][j].pos,\
                      axis=vp.vector(0,0,0))\
                 for j in range(n)] for i in range(n)]

Qproj = [[qt.tensor(qt.tensor(Qv[i], Qv[j])*qt.tensor(Qv[i], Qv[j]).dag(), qt.identity(spinn))\
              for j in range(n)] for i in range(n)]

evolving = True
def keyboard(event):
    global evolving
    key = event.key
    if key == "q":
        evolving = False if evolving else True
#vp.scene.bind('keydown', keyboard)

while True:
    if evolving:
        for i in range(n):
            for j in range(n):
                vspheres[i][j].radius = qt.expect(Qproj[i][j], state)
                proj_state = Qproj[i][j]*state
                axis = [qt.expect(S[o], proj_state) for i, o in enumerate(["X", "Z"])]
                varrows[i][j].axis = vp.vector(axis[0], axis[1], 0)
                vp.rate(2000)
        state = U*state

Okay, so let's take a moment to stop and think.

If we wanted to be fancy, we could describe a point on a plane $(x, y)$ as being in a "superposition" of $(1,0)$ and $(0,1)$. For example, the point $(1,1)$ which makes a diagonal line from the origin would be an equal superposition of being "horizontal" and "vertical." In other words, we can add locations up and the sum of two locations is also a location. 

It's clear that we can describe a point on the plane in many different ways. We could rotate our coordinate system, use any set of two linearly independent vectors in the plane, and write out coordinates for the same point using many different frames of reference. This doesn't have anything to do with the point. The point is where it is. But we could describe it in many, many different ways as different superpositions of different basis states. Nothing weird about that. They just correspond to "different perspectives" on that point.
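
A quick sketch of that idea: the same point, written out in three different frames. The function name `coordinates` and the particular bases are just illustrative choices of mine; the only real requirement is that the two basis vectors be linearly independent.

```python
import math

def coordinates(point, e1, e2):
    """Coefficients (c1, c2) with point = c1*e1 + c2*e2, via Cramer's rule."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("basis vectors must be linearly independent")
    c1 = (point[0] * e2[1] - point[1] * e2[0]) / det
    c2 = (e1[0] * point[1] - e1[1] * point[0]) / det
    return (c1, c2)

point = (1.0, 1.0)
s = 1 / math.sqrt(2)

standard = coordinates(point, (1.0, 0.0), (0.0, 1.0))   # horizontal / vertical
rotated = coordinates(point, (s, s), (-s, s))           # frame rotated by 45 degrees
skewed = coordinates(point, (1.0, 0.0), (1.0, 1.0))     # non-orthogonal frame

print("standard:", standard)
print("rotated: ", rotated)
print("skewed:  ", skewed)
```

In the standard frame the point is an equal mix of "horizontal" and "vertical"; in the rotated frame it lies entirely along the first axis. The point itself never moved.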

Another example of superposition, in fact the ur-example, if you will, is waves. You drop a rock in the water and the water starts rippling outwards. Okay. You wait until the water is calm again, and then you drop another rock in the water at a nearby location and the water ripples outwards, etc. Now suppose you dropped both rocks in at the same time. How will the water ripple? It's just a simple sum of the first two cases. The resulting ripples will be a sum of the waves from the one rock and the waves from the other rock. In other words, you can superpose waves, and that just makes another wave. This works for waves in water, sound waves, even light waves.

Waves are described by differential equations. And solutions to linear (homogeneous) differential equations have the property that a sum of two solutions is also a solution.

For example, suppose we have a simple ordinary differential equation $y'' + 2y' + y = 0$. The idea is to find some function $y(x)$ such that the sum of its second derivative plus two times its first derivative plus the function equals 0. Suppose we had such a function. And what if we had another function $z(x)$ such that $z'' + 2z' + z = 0$, in other words, $z(x)$ is also a solution to the differential equation? If we took any linear combination of $y$ and $z$ then that will also be a solution.

E.g:

$y'' + 2y' + y = 0$

$z'' + 2z' + z = 0$

$(y+z)'' + 2(y+z)' + (y+z) = y'' + z'' + 2y' + 2z' + y + z = (y'' + 2y' + y) + (z'' + 2z' + z) = 0 + 0 = 0 $

This works because taking the derivative is a linear operator! So that $(y + z)' = y' + z'$.

Indeed, we can find a general solution to the differential equation using an auxiliary polynomial.

$y'' + 2y' + y = 0 \rightarrow k^{2} + 2k + 1 = 0 \rightarrow (k + 1)(k + 1) = 0 \rightarrow k = \{-1,-1\}$

So that the general solution is $y = c_{0}e^{-x} + c_{1}xe^{-x}$. The whole theory of how I solved this equation doesn't matter here. You can confirm that it works:

$y' = -c_{0}e^{-x} + c_{1}(-xe^{-x} + e^{-x})$

$y'' = c_{0}e^{-x} + c_{1}(xe^{-x} - e^{-x} - e^{-x})$

$y'' + 2y' + y = (c_{0}e^{-x} + c_{1}xe^{-x} - c_{1}e^{-x} - c_{1}e^{-x}) + 2(-c_{0}e^{-x} - c_{1}xe^{-x} + c_{1}e^{-x}) + (c_{0}e^{-x} + c_{1}xe^{-x}) = 2c_{0}e^{-x} - 2c_{0}e^{-x} + 2c_{1}xe^{-x} - 2c_{1}xe^{-x} - 2c_{1}e^{-x} + 2c_{1}e^{-x} = 0$

The point is that the solutions form a 2D vector space and any solution can be written as a linear combination of $e^{-x}$ and $xe^{-x}$. In other words, I can plug any scalars in for $c_{0}$ and $c_{1}$ and I'll still get a solution.
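
If you'd rather check this numerically than by hand, here is a rough sketch. It estimates $y'' + 2y' + y$ with central finite differences (my choice of method, with a step size `h` picked by eye), so the residual comes out tiny rather than exactly zero.

```python
import math

def y(x, c0, c1):
    """The proposed general solution c0*e^(-x) + c1*x*e^(-x)."""
    return c0 * math.exp(-x) + c1 * x * math.exp(-x)

def residual(f, x, h=1e-4):
    """Central-difference estimate of f'' + 2f' + f at the point x."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return d2 + 2 * d1 + f(x)

# Any scalars c0, c1 should work: that's the 2D vector space of solutions
for c0, c1 in [(1.0, 0.0), (0.0, 1.0), (2.5, -3.0)]:
    worst = max(abs(residual(lambda x: y(x, c0, c1), x)) for x in [0.0, 0.5, 1.0, 2.0])
    print("c0 = %s, c1 = %s, worst residual: %.2e" % (c0, c1, worst))

# For contrast, e^x is not a solution: its residual is 4*e^x
print("e^x residual at 0:", residual(math.exp, 0.0))
```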

Furthermore, we've seen this happen in terms of our constellations. We can take any two constellations (with the same number of stars) and add them together, and we always get a third constellation. Moreover, any given constellation can be expressed in a multitude of ways, as a complex linear combination of (linearly independent) constellations.

We could take one and the same constellation and express it as a complex linear superposition of X eigenstates, Y eigenstates, or the eigenstates of any Hermitian operator. The nice thing about Hermitian operators is that their eigenvectors can always be chosen to be orthogonal, and so we can use them as a basis.

In other words, the constellation is how it is, but it can be described in different reference frames in different ways, as a superposition of different basis constellations.

Now the daring thing about quantum mechanics is that it says that, basically, everything is just like waves. If you have one state of a physical system and another state of a physical system, then if you add them up, you get another state of the physical system. In theory, everything obeys the superposition principle!

There is something mysterious about this, but not in the way it's usually framed. Considering a spin-$\frac{1}{2}$ particle, it seems bizarre to say that its quantum state is a linear superposition of being $\uparrow$ and $\downarrow$ along the Z axis. But we've seen that it's not bizarre at all! Any point on the sphere can be written as a linear superposition of $\uparrow$ and $\downarrow$ along any given axis! So that you really can imagine that the spin is definitely just spinning around some definite axis, given by a point on the sphere. But when you go to measure it, you pick out a certain special axis: the axis you're measuring along. You can express the point on the sphere as a linear combination of $\uparrow$ and $\downarrow$ along that axis, and then $aa^{*}$ and $bb^{*}$ give you the probabilities that you get either the one or the other outcome.

In other words, the particle may be spinning totally concretely around some axis, but when you go to measure it, you just get one outcome or the other with a certain probability. You can only reconstruct what that axis must have been before you measured it by preparing lots of such particles in the same state, and measuring them all in order to empirically determine the probabilities. In fact, to nail down the state you have to calculate (given some choice of X, Y and Z) $(\langle \psi \mid X \mid \psi \rangle, \langle \psi \mid Y \mid \psi \rangle, \langle \psi \mid Z \mid \psi \rangle)$, in other words, you have to do the experiment a bunch of times measuring the (identically prepared) particles along three orthogonal axes, and get the proportions of outcomes in each of those cases, weight those probabilities by the eigenvalues of X, Y, Z, and then you'll get the $(x, y, z)$ coordinate of the spin's axis.
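
Here is a sketch of that reconstruction for spin-$\frac{1}{2}$. To keep it short I use the exact outcome probabilities where a real experiment would use measured proportions, and I assume the Pauli normalization, so the eigenvalues are $+1$ and $-1$ and the result lands on the unit sphere. The names `EIGENVECTORS`, `outcome_probabilities`, and `reconstruct_axis` are my own.

```python
import math

s = 1 / math.sqrt(2)
# "Up" and "down" eigenvectors of the Pauli X, Y, Z matrices, written in the Z basis
EIGENVECTORS = {
    "X": ((s, s), (s, -s)),
    "Y": ((s, 1j * s), (s, -1j * s)),
    "Z": ((1, 0), (0, 1)),
}

def outcome_probabilities(state, axis):
    """(P(up), P(down)) for a normalized state (a, b) measured along `axis`."""
    probs = []
    for e in EIGENVECTORS[axis]:
        amp = e[0].conjugate() * state[0] + e[1].conjugate() * state[1]
        probs.append(abs(amp) ** 2)
    return tuple(probs)

def reconstruct_axis(state):
    """Weight the outcome probabilities by the eigenvalues +1 and -1 for each axis."""
    xyz = []
    for axis in ("X", "Y", "Z"):
        p_up, p_down = outcome_probabilities(state, axis)
        xyz.append((+1) * p_up + (-1) * p_down)
    return tuple(xyz)

state = (math.sqrt(0.9), math.sqrt(0.1))   # an arbitrary normalized example
print(reconstruct_axis(state))             # the (x, y, z) point for this state
```

For this example state the reconstructed point sits in the XZ plane, tilted toward the North Pole, which is what you'd expect from a state that is mostly $\uparrow$ with a real, positive admixture of $\downarrow$.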

So yeah, we could just represent our spin-$j$ state as a set of $(x, y, z)$ points on the sphere. But it's better to use the *unitary* representation, as a complex vector, because then we can describe the constellation as a superposition of *outcomes to an experimental situation* where the components of the vector have the interpretation of *probability amplitudes* whose "squares" give the probability for that outcome.

In other words, quantum mechanics effects a radical generalization of the idea of a perspective shift or reference frame. We normally think of a reference frame as provided by three orthogonal axes in 3D, say, by my thumb, pointer finger, and middle finger splayed out. In quantum mechanics, the idea is that *an experimental situation* provides a reference frame in the form of a Hermitian operator--an *observable*--whose eigenstates provide "axes" along which we can decompose the state of the system, even as they correspond to *possible outcomes to the experiment*. The projection of the state on each of these axes gives the probability (amplitude) of that outcome.

It's an amazing twist on the positivist idea of describing everything as a black box: some things go in, some things come out, and you look at the probabilities of the outcomes, and anything else one must pass over in silence. And that's all one can hope for. In the spin case, we can see that this works, but with a twist: the spin state is a superposition of "possible outcomes to the experiment," which at first seems metaphysically bizarre, but this superposition can also be interpreted geometrically as: a perfectly concrete constellation on the sphere. (Indeed, you can interpret the "stars" in the constellations as little vortices, little tornados, churning the sphere around--more on this momentarily.)

Indeed, Dirac writes in his Principles of Quantum Mechanics, "[QM] requires us to assume that ... whenever the system is definitely in one state we can consider it as being partly in each of two or more other states. The original state must be regarded as the result of a kind of superposition of the two or more new states, in a way that cannot be conceived on classical ideas."

Similarly, Schrödinger proposed his cat as a kind of reductio ad absurdum to the universal applicability of quantum ideas. He asks us to consider whether a cat can be in a superposition of $\mid alive \rangle \ + \mid dead \rangle$. Such a superposition seems quite magical if we're considering a cat. How can a cat be both alive and dead? And yet, the analogous question for a spin isn't mysterious at all. How can a spin be in a superposition of $\mid \uparrow \rangle \ + \mid \downarrow \rangle$ (in the Z basis)? Easily! It's just pointed in the X+ direction! The analogous thing for a cat would be something like $\mid alive \rangle \ + \mid dead \rangle = \mid sick? \rangle$--I don't propose that seriously, other than to emphasize that what the superposition principle is saying is that superpositions of states *have to be perfectly good comprehensible states of the system as well*.

In the case of a quantum spin, we can see that we can regard the spin's constellation as being perfectly definite, but describable from many different reference frames, different linearly independent sets of constellations. Which reference frame you use is quite irrelevant to the constellation, of course, until you go to measure it, and then the spin ends up in one of the reference states with a certain probability. So the mystery isn't superposition per se; it has something to do with measurement. (You might ask how this plays out in non-spin cases, for example, superpositions of position: I'll return to this issue.)

One nice analogy is the idea of a "forced choice question." I show you a picture of a diagonal line and I ask you is it a vertical line or a horizontal line? And you're like it's both! It's equal parts vertical and horizontal! How can I choose? I have as many reasons to choose vertical as horizontal! If the line were inclined a little to the horizontal, then maybe I'd be inclined to say it was horizontal more than vertical, although not by much! But I force you to choose. You have to pick one! What do you do? You have to make a choice, but there is literally no reason for you to choose one or the other, in the sense that you have as many reasons to choose horizontal as to choose vertical. And so, you have to choose randomly. 

It's the same with the spin. Suppose it's spinning in the X+ direction. And I ask it, Are you up or down along the Z axis? Well, both! In terms of the question, it's equal parts up and down in the Z direction. But the experimental situation forces the particle to give some answer, and it has as many reasons to choose the one or the other, and so it just... picks one at random! In other words, this is the ultimate origin for quantum indeterminism: you've forced a physical system to make a choice, but it has no reason to choose: and so, it just decides for itself. It's like: you ask a stupid question, you get a stupid answer.

There's an old folktale from the Middle Ages called Buridan's Ass or the Philosopher's Donkey. The old donkey has been ridden hard by the philosopher all day, and by evening, it's both hungry and thirsty. The philosopher (being, who knows, perhaps in an experimental mindset) lays out a bucket of water and a bucket of oats before the donkey. The donkey is really hungry, so it wants to go for the oats, but it's also really thirsty, so it wants to go for the water. In fact, it's just as hungry as it is thirsty. It has to pick oats or water to dig into first, but it has as many reasons to pick the oats first as the water first, and so it looks back and forth and back and forth, and can't decide, and eventually passes out, steam coming out of its ears.

Leibniz articulated something called the Principle of Sufficient Reason, which was once very popular, and it basically said that: everything happens for a reason. The donkey exposes the consequence of this idea. If everything has to happen for a reason, and the donkey has as many reasons to choose the water as the oats, then the donkey can't do anything, since there would be no reason for choosing one over the other. And so, the donkey keels over--for what reason? It had as many reasons to choose the water as the oats.

Quantum mechanics strongly suggests that the Principle of Sufficient Reason is false. Nature, as it were, cuts the Gordian knot of the philosopher. An X+ spin presented with a bucket of $\uparrow$ and a bucket of $\downarrow$ doesn't glitch out, even though it has as many reasons to choose the one as the other. Instead: it just fucking picks one, for no reason at all. It decides all on its own without any help from "reasons." 

If the spin had been in the Z+ state, of course, and we asked $\uparrow$ or $\downarrow$ along the Z axis, then it would answer up with certainty, and vice versa for Z-.
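
You can play the forced-choice game in a few lines of Python. This is only a toy: the "choice" here comes from a seeded pseudo-random generator, which is of course exactly the kind of hidden reason the real spin doesn't have. The function name `measure_z` and the trial count are arbitrary choices of mine.

```python
import math
import random

def measure_z(state, rng):
    """Sample one Z measurement ('up' or 'down') from a normalized state (a, b)."""
    p_up = abs(state[0]) ** 2
    return "up" if rng.random() < p_up else "down"

rng = random.Random(0)   # fixed seed, only so the run is reproducible
s = 1 / math.sqrt(2)
x_plus = (s, s)          # spin pointing along X+
z_plus = (1.0, 0.0)      # spin pointing along Z+

trials = 10000
ups_x = sum(measure_z(x_plus, rng) == "up" for _ in range(trials))
ups_z = sum(measure_z(z_plus, rng) == "up" for _ in range(trials))

print("X+ asked about Z: fraction up =", ups_x / trials)
print("Z+ asked about Z: fraction up =", ups_z / trials)
```

The X+ spin answers "up" about half the time, while the Z+ spin answers "up" every single time.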

A further point. A spin in itself can be described in terms of its constellation. This constellation can be described from many points of view, each corresponding to some "observable," some experimental situation, which as it were provides a filter, separating out this outcome from that outcome. We can also think about this in terms of entanglement. In the Stern-Gerlach apparatus, we actually measure the position of the particle, which has become entangled with the spin, so that if the particle is deflected up, its spin is $\uparrow$ and if it's deflected down, its spin is $\downarrow$. So that even though the particle's constellation is defined intrinsically, regardless of reference frame, the entanglement between the particle's spin and its position provides a reference frame, in that some (arbitrary) basis states of the particle's spin are paired with some (arbitrary) basis states of the particle's position. So that now there is a dependency between the outcomes. If the particle is measured to be here, then its spin will be $\uparrow$. If the particle is measured to be there, then its spin will be $\downarrow$--and vice versa. (You might wonder how we know the spin's state at all. Well, if we measure a spin to be $\uparrow$ along the Z-axis, and then send it through another Stern-Gerlach set-up oriented along the Z-axis, then it'll always deflect up, but if we send it through an X-oriented apparatus, it'll give $\uparrow$ or $\downarrow$ with equal probability, etc--and the spin state explains why this is the case, which can't be derived from the position state alone.)
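
The chained Stern-Gerlach argument in that parenthetical can be sketched directly. I'm tracking only the spin, not the position it gets entangled with, and I'm assuming the standard rule that a particle found in the "up" beam leaves in the corresponding eigenstate. The names `EIGENSTATES`, `branch_probabilities`, and `after_outcome` are illustrative.

```python
import math

s = 1 / math.sqrt(2)
# Eigenstates of the Z and X spin operators, written in the Z basis
EIGENSTATES = {
    "Z": {"up": (1, 0), "down": (0, 1)},
    "X": {"up": (s, s), "down": (s, -s)},
}

def branch_probabilities(state, axis):
    """Probability of each beam for a Stern-Gerlach stage oriented along `axis`."""
    probs = {}
    for outcome, e in EIGENSTATES[axis].items():
        amp = e[0].conjugate() * state[0] + e[1].conjugate() * state[1]
        probs[outcome] = abs(amp) ** 2
    return probs

def after_outcome(axis, outcome):
    """Spin state of a particle found in the given beam (assumed projection rule)."""
    return EIGENSTATES[axis][outcome]

incoming = (0.6, 0.8)                            # arbitrary normalized starting state
first = branch_probabilities(incoming, "Z")      # first stage, along Z
kept = after_outcome("Z", "up")                  # keep only the "up" beam

second_z = branch_probabilities(kept, "Z")       # second stage along Z again
second_x = branch_probabilities(kept, "X")       # or a second stage along X

print("first Z stage: ", first)
print("then Z again:  ", second_z)
print("then X instead:", second_x)
```

The second Z stage never deflects down, while the X stage splits the beam evenly, and both facts come from the spin state carried by the "up" beam.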

So although the reference frame by which we describe something is arbitrary, things can get entangled, which pairs basis states of the one with basis states of the other, so that each pairing of basis states now has its own probability, and the pairs are all in superposition, leading to correlations between the outcomes of experiments done on the two things.

So that, while in its own terms the spin has its constellation, it interfaces with the world via some decomposition of that constellation into a superposition over "experimental outcomes" (which are constellations themselves), which can get entangled with other outcomes of other systems. So that the in itself arbitrary decomposition of the constellation into basis constellations provides the hooks by which the spin becomes correlated with the rest of the world.

(Returning to the pebble. If at first, to communicate with a pebble, we had to agree on whether an absence or a presence was significant, then on where to start counting, and then where 0 was, and then what rational number translated between our different notions of "1", and then a complex number to rotate from my axis to your axis, now we need to specify the basis vectors in terms of which we've written our "polynomial." This could mean: the 3D coordinate frame for some $(x, y, z)$ point, but in the general case, the basis vectors are bundled into a Hermitian matrix representing a quantum observable, so that to specify the basis vectors is to: specify which experiment you did. To wit, I do the Stern-Gerlach experiment, and I put a pebble in pile A if I got up, and a pebble in pile B if I got down. In order to successfully communicate to you what I mean by those pebbles, I have to specify: what experiment provided the reference frame for those outcomes, and which eigenvalue/eigenvector pair the pebble corresponds to. Then we can both compute the proportions between the outcomes, and reconstruct the state of the physical system, and we three will all be in agreement about what we're talking about.)

A further point.

Do we ever observe the "stars" themselves? Well, not exactly: we can reconstruct the quantum state of the spin via a bunch of measurements on identically prepared systems, and then: express it in the Z-basis, and find the roots of the polynomial, etc. But they really are there, as the following argument shows.

A "spin coherent state" is just a state where all the stars are in the same location.
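
As a sketch of what that means in components: if all $2j$ stars sit at the complex root $c$, the polynomial is $(x - c)^{2j}$, and undoing the normalization used in `spin_poly` above gives components proportional to $\sqrt{\binom{2j}{i}}\,c^{i}$. The helper names below are mine, and I only handle finite $c$ (a star at infinity, the South Pole, would need separate treatment).

```python
import math

def coherent_state(c, two_j):
    """Normalized spin-j vector whose 2j stars all sit at the (finite) complex root c."""
    amps = [math.sqrt(math.comb(two_j, i)) * c ** i for i in range(two_j + 1)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in amps))
    return [a / norm for a in amps]

def majorana_poly(spin):
    """Polynomial coefficients, highest power first, in the spin_poly convention."""
    n = len(spin)
    return [((-1) ** i) * math.sqrt(math.comb(n - 1, i)) * spin[i] for i in range(n)]

def horner(coeffs, x):
    """Evaluate a polynomial given its coefficients, highest power first."""
    value = 0
    for coef in coeffs:
        value = value * x + coef
    return value

def derivative(coeffs):
    """Coefficients of the derivative polynomial, highest power first."""
    degree = len(coeffs) - 1
    return [coef * (degree - k) for k, coef in enumerate(coeffs[:-1])]

c, two_j = 0.5 + 0.25j, 3          # arbitrary star location; spin-3/2
coeffs = majorana_poly(coherent_state(c, two_j))

# A root of multiplicity 2j: the polynomial and its first 2j-1 derivatives vanish at c
for order in range(two_j):
    print("derivative order", order, "at c:", abs(horner(coeffs, c)))
    coeffs = derivative(coeffs)
```

Checking derivatives instead of calling a root finder is deliberate: numerical root finders tend to smear a repeated root into a little cluster, whereas the vanishing derivatives show directly that all three stars coincide.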


Operational meaning of stars: coherent states. Husimi wave function on sphere. Antipodal to constellation. Coherent basis, etc.

-> Symmetrized spinors. Geometric quantization. Whole > Part => Whole = Product of Parts. Clebsch...
-> Two oscillators. QFT. Bosons. Schwinger. "Number operators."

Gödel resolves the question of whether we can reduce our concepts to a single set of atoms: if we were computers, we couldn't--but the world still could--and quantum mechanics shows...

Full magic.

Position meditation. Atiyah.