# General Relativity

**General relativity** is a theory of how gravity works. In General Relativity, gravity is not a force, but rather an effect caused by curved spacetime. This conclusion is based on two fundamental principles:

- The **Equivalence Principle**, which says that gravity is indistinguishable from the effect of an accelerating reference frame
- The **Principle of Covariance**, which says that all laws of physics should be in the same form in all reference frames

The culminating breakthrough of General Relativity is summarized succinctly by the Einstein Field Equations:

$$
G_{\alpha \beta} = \frac{8\pi G}{c^4} T_{\alpha \beta}
$$

This deceptively simple tensor equation relates the curvature of spacetime to the distribution of matter and energy within it, giving rise to John Archibald Wheeler's famous summary of General Relativity:

> "Spacetime tells matter how to move; matter tells spacetime how to curve"

## The Equivalence Principle

Consider an observer inside a closed room. This room is accelerating upwards at a constant rate of $9.81\mathrm{\ m/s^2}$. The observer holds a 1-kilogram ball. What happens if the observer drops the ball?

Well, we know that the room is under constant upwards acceleration, so when the observer releases the ball, the floor of the room will travel upwards towards it at $9.81\mathrm{\ m/s^2}$. However, to the observer, who is moving upwards *along with* the floor, it would look like everything is stationary, and the *ball* is the object that is falling down.

If we use Newton's second law of motion, we find that the force experienced by the ball would be given by:

$$
\vec F_b = m \vec a
$$

Thus, the force would be:

$$
\vec F_b = -9.81\ \mathrm{N}
$$

Now, consider another observer, inside another closed room. This room is placed on the surface of the Earth. The observer inside this second room drops another 1-kilogram ball. What would happen next?

Well, the ball will experience the force of Earth's gravity, causing it to fall downwards as well. If we use Newton's law of universal gravitation, we find that the force experienced by the ball would be given by:

$$
\vec F_b = -G\frac{M_1 M_2}{r^2}
$$

We can rearrange this equation to the form:

$$
\vec F_b = \left(\frac{-GM_\oplus}{(r_\oplus)^2}\right) M_2
$$

Using our best measurements of the mass of the Earth and its radius, we arrive at the result:

$$
\vec F_b = -9.81\ \mathrm{N}
$$
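As a quick numerical check of the two results, here is a minimal sketch. The values of $G$, the Earth's mass, and the Earth's mean radius are assumed textbook figures (they are not given above), and the function names are purely illustrative.

```python
# Assumed SI values (not taken from the text above)
G = 6.67430e-11      # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of the Earth, kg
R_EARTH = 6.371e6    # mean radius of the Earth, m


def gravitational_force(m_ball, m_planet=M_EARTH, radius=R_EARTH):
    """Newton's law of universal gravitation; negative means downwards."""
    return -G * m_planet * m_ball / radius**2


def accelerated_room_force(m_ball, acceleration=9.81):
    """Apparent force on the ball in a room accelerating upwards (F = ma)."""
    return -m_ball * acceleration


f_gravity = gravitational_force(1.0)
f_room = accelerated_room_force(1.0)
print(f"room on Earth:     {f_gravity:.2f} N")
print(f"accelerating room: {f_room:.2f} N")
```

With these assumed inputs the two forces agree to within about a hundredth of a newton; the small difference comes from rounding in the constants and from using the mean radius.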

Notice that this is the **same result** as our closed room moving upwards through space at $9.81\mathrm{\ m/s^2}$. The effect of gravity and of an accelerated reference frame is the same. But this is just a coincidence, right? Or is it...?

Imagine you were in either the closed room in space or the closed room on Earth, but you weren't told which one. Is there any way you could tell which room it was? No, as long as the room is small enough that gravity is effectively uniform inside it, it would be impossible to perform an experiment to tell the closed room in space from the closed room on Earth.

So, gravity is **indistinguishable** from accelerated reference frames. This is the **equivalence principle**.

## Reviewing the spacetime metric

To understand General Relativity, we must first be familiar with the idea of **events**.

An event is _anything that happens_. This could be, “a spaceship flew through my window”. Yes, that is an event.

We can describe the event by finding the position and time it occurred, *relative* to a chosen reference frame:

- E.g. “a spaceship flew through my window at 5 meters left and 6 meters in front of my head, at 2 meters above sea level, at 2:30 pm, on January 15th, 2021”
- We have an x-coordinate (5 meters left of my head), a y-coordinate (6 meters in front of my head), a z-coordinate (2 meters above sea level), and a time coordinate (2:30 pm 1/15/21)

To describe the distance between two events, we use a spacetime metric. This could be the Euclidean metric $\delta_{\alpha \beta}$, the Minkowski metric $\eta_{\alpha \beta}$, or the general metric $g_{\alpha \beta}$. As we've seen before, the Minkowski metric in particular is given by $\eta_{\alpha \beta}$, where $\alpha$ and $\beta$ represent the $(\alpha, \beta)$-th entry of the matrix:

$$
\eta_{\alpha\beta} = \begin{pmatrix}-1 & 0 & 0 &0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix} = \begin{pmatrix}\eta_{{00}}& \eta_{{01}}& \eta_{{02}}& \eta_{{03}}\\\eta_{{10}}& \eta_{{11}}& \eta_{{12}}& \eta_{{13}}\\\eta_{{20}}& \eta_{{21}}& \eta_{{22}}& \eta_{{23}}\\\eta_{{30}}& \eta_{{31}}& \eta_{{32}}& \eta_{{33}}\\\end{pmatrix}
$$

```{admonition} A note about really confusing metric notation
In General Relativity, it is customary to count time as the 0th dimension rather than the 4th dimension. This is why $\alpha$ and $\beta$ range from 0 to 3 instead of 1 to 4 (as you ordinarily might expect).
```

```{admonition} A note about signatures
The Minkowski metric is similar to the Euclidean metric $\delta_{\alpha \beta}$ with one major difference: $\eta_{00} = -1 \neq \delta_{00}$ (see second appendix for why). So, the metric tensor has a _signature_ - that of $(- + + +)$ along its diagonal. Note that sometimes, physicists will confusingly use a metric signature of $(+---)$, which flips the sign of every spacetime interval. These two metric signatures are _functionally equivalent_ because the overall sign is purely a convention, and all physical predictions are the same as long as one signature is used consistently; the only difference is your personal preference. I will stick with $(- + + +)$ for consistency here.
```

Recall that the line element can be written as the metric sandwiched between two copies of the infinitesimal displacement vector, one written as a row and one as a column:

$$
ds^2 = \begin{bmatrix}cdt & dx & dy & dz\end{bmatrix} \begin{pmatrix}-1 & 0 & 0 &0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix} \begin{bmatrix}cdt \\ dx \\ dy \\ dz\end{bmatrix}
$$

We can denote one of these infinitesimal displacement vectors $dx^\alpha$ and the other $dx^\beta$, so we have:

$$
ds^2 = \eta_{\alpha \beta} dx^\alpha dx^\beta
$$
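As a worked check of this compact form, we can write out the double sum explicitly. Because $\eta_{\alpha \beta}$ is diagonal, only the four terms with $\alpha = \beta$ survive, and using $dx^0 = c\,dt$ we recover the familiar flat-spacetime interval:

$$
ds^2 = \eta_{00} (dx^0)^2 + \eta_{11} (dx^1)^2 + \eta_{22} (dx^2)^2 + \eta_{33} (dx^3)^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2
$$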

Additionally, since the components of metric tensors can vary as spacetime is curved and distances change, we shouldn't expect that the metric of spacetime will always be $\eta_{\alpha \beta}$; in fact, that would only be true for flat (uncurved) spacetime. So, we replace $\eta_{\alpha \beta}$ with the more general form of the metric tensor $g_{\alpha \beta}$, which applies to all spacetime metrics. We finally arrive at the general line element:

$$
ds^2 = g_{\alpha \beta} dx^\alpha dx^\beta
$$

This is the most common form of the spacetime metric you will see, and it is the form we will use going forward.

### Spacelike, Timelike, and Lightlike Intervals

When finding the spacetime interval between two events, we can describe the interval using one of three terms:

* If the spacetime interval is **positive**, it is spacelike: the two events are separated by space
* If the spacetime interval is **negative**, it is timelike: the two events are separated by time
* If the spacetime interval is **zero**, it is lightlike: a beam of light could travel directly from one event to the other
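The three cases above can be sketched in a few lines of code. This is only an illustration under stated assumptions: it uses the flat Minkowski metric with the $(- + + +)$ signature, takes the displacement as $(c\,dt, dx, dy, dz)$, and the names `interval` and `classify` are hypothetical helpers rather than anything defined earlier.

```python
# Minkowski metric eta_{alpha beta} with signature (- + + +)
ETA = [[-1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]]


def interval(dx, metric=ETA):
    """Spacetime interval ds^2 = g_{ab} dx^a dx^b, with dx = (c*dt, dx, dy, dz)."""
    return sum(metric[a][b] * dx[a] * dx[b]
               for a in range(4) for b in range(4))


def classify(dx, metric=ETA):
    """Label a displacement as spacelike, timelike, or lightlike."""
    s2 = interval(dx, metric)
    if s2 > 0:
        return "spacelike"
    if s2 < 0:
        return "timelike"
    # Exact comparison with zero is only reliable for exact inputs;
    # with floating-point data a small tolerance would be needed.
    return "lightlike"


print(classify((1, 0, 0, 0)))  # separated only in time
print(classify((0, 1, 0, 0)))  # separated only in space
print(classify((1, 1, 0, 0)))  # connected by a light ray
```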

## The Einstein Summation Convention

Previously, we have already been introduced to index notation - for example, we saw that position could be represented by $x^\mu = x, y, z$, and that an equation such as $v^\mu = \frac{dx^\mu}{dt}$ is actually a system of three equations, one each for $x$, $y$, and $z$. We've also seen that we can generally let the letters we use for tensor indices be whatever we want, and the equations will still be consistent. For example, $g_{\alpha \beta} = g_{ij} = g_{\alpha \gamma} = g_{\mu \gamma}$. There is, however, a difference between two kinds of indices: those that appear once in a term and those that are repeated.

The first type, where we have a single index, is called a **free index**. Free indices result in a different equation for each coordinate - for instance, given $F^\mu = ma^\mu$, then we have the system of equations:

$$
\begin{cases}
F^x = ma^x \\
F^y = ma^y \\
F^z = ma^z
\end{cases}
$$

The second type, where we have an index that repeats once as a lower index and once as an upper index in a term, is used to stand for summation. For example, recall the multivariable chain rule:

$$
\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial t} + \frac{\partial f}{\partial z} \frac{\partial z}{\partial t}
$$

We can rewrite this with summation:

$$
\frac{\partial f}{\partial t} = \sum_{i = 1}^3 \frac{\partial f}{\partial x^i} \frac{\partial x^i}{\partial t}
$$

Note that the index $i$ appears once as an upper index (in $\partial x^i$ in a numerator) and once as a lower index (an upper index in the denominator of a derivative, as in $\frac{\partial f}{\partial x^i}$, counts as a lower index), so by the Einstein summation convention, we can get rid of the summation sign:

$$
\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x^i} \frac{\partial x^i}{\partial t}
$$

And because the index is used for summation, it makes no difference if we change $i$ to $j$ or $k$ or $u$; the equation will mean the same thing:

$$
\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x^j} \frac{\partial x^j}{\partial t}
$$

This works the same way if we are working with tensors. Consider the following tensor equation:

$$
K_{i} = a_{ij} b^{j} 
$$

Notice that $j$ appears once as a top index, and once as a bottom index. Thus, $j$ must be a summation index - in GR we call these **dummy indices**. In contrast, $i$ appears only once in each term, so it is a free index, representing a system of equations. Therefore, if we were to expand the dummy summation index, where $j$ goes from 1 to 3, we'll get:

$$
K_i = a_{i1} b^1 + a_{i2} b^2 + a_{i3} b^3
$$

And if we let $x = 1, y = 2, z = 3$, then:

$$
K_i = a_{ix} b^x + a_{iy} b^y + a_{iz} b^z
$$

In comparison, remember that $i$ is a free index that expands into a system of equations. So, using $i = (x, y, z)$, we have:

$$
\begin{cases}
K_x = a_{xx} b^x + a_{xy} b^y + a_{xz} b^z \\
K_y = a_{yx} b^x + a_{yy} b^y + a_{yz} b^z \\
K_z = a_{zx} b^x + a_{zy} b^y + a_{zz} b^z
\end{cases}
$$
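The roles of the two kinds of indices can be sketched in code: the dummy index $j$ becomes an inner sum, while the free index $i$ becomes an outer loop that produces one equation (one output component) per value. The function name `contract` and the sample numbers are made up for illustration.

```python
def contract(a, b):
    """Compute K_i = a_{ij} b^j.

    The dummy index j is summed over (inner sum); the free index i
    labels the separate equations (one list entry per value of i).
    """
    n = len(b)
    return [sum(a[i][j] * b[j] for j in range(n)) for i in range(len(a))]


# Illustrative components, indexed as (x, y, z) = (0, 1, 2)
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, -1]

K = contract(a, b)
print(K)  # components (K_x, K_y, K_z)
```

Renaming `j` inside the sum changes nothing about the result, which is the code-level version of saying that dummy indices can be relabelled freely.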

In general, dummy indices can be changed at will, but free indices cannot. This is because changing a dummy index is just changing the index you use for the summation, which is totally arbitrary, but changing a free index would change the system of equations into a completely different system of equations.

The use of the Einstein summation convention allows the equations of General Relativity to be written very compactly. For instance, take the definition of the Ricci tensor, given by:

$$
R_{ij} = \frac{\partial \Gamma^k_{ij}}{\partial x^k} - \frac{\partial \Gamma^k_{ik}}{\partial x^j} + \Gamma^k_{ij} \Gamma^m_{km} - \Gamma^k_{im} \Gamma^m_{jk}
$$

Observe that only $i$ and $j$ are free indices - the other two indices $m$ and $k$ appear both on upper and on lower indices, making them summation indices. If we were to fully write out just the summation of the Ricci tensor, where $m$ and $k$ both sum from 0 to 3, and we assume $0 = t$, $1 = x$, $2 = y$, $3 = z$, we'd get:

$$
\begin{aligned}
R_{ij} ={} & \frac{\partial \Gamma^t_{ij}}{\partial x^t} - \frac{\partial \Gamma^t_{it}}{\partial x^j} + \frac{\partial \Gamma^x_{ij}}{\partial x^x} - \frac{\partial \Gamma^x_{ix}}{\partial x^j} + \frac{\partial \Gamma^y_{ij}}{\partial x^y} - \frac{\partial \Gamma^y_{iy}}{\partial x^j} + \frac{\partial \Gamma^z_{ij}}{\partial x^z} - \frac{\partial \Gamma^z_{iz}}{\partial x^j} \\
& + \Gamma^t_{ij} \Gamma^t_{tt} - \Gamma^t_{it} \Gamma^t_{jt} + \Gamma^x_{ij} \Gamma^t_{xt} - \Gamma^x_{it} \Gamma^t_{jx} + \Gamma^y_{ij} \Gamma^t_{yt} - \Gamma^y_{it} \Gamma^t_{jy} + \Gamma^z_{ij} \Gamma^t_{zt} - \Gamma^z_{it} \Gamma^t_{jz} \\
& + \Gamma^t_{ij} \Gamma^x_{tx} - \Gamma^t_{ix} \Gamma^x_{jt} + \Gamma^x_{ij} \Gamma^x_{xx} - \Gamma^x_{ix} \Gamma^x_{jx} + \Gamma^y_{ij} \Gamma^x_{yx} - \Gamma^y_{ix} \Gamma^x_{jy} + \Gamma^z_{ij} \Gamma^x_{zx} - \Gamma^z_{ix} \Gamma^x_{jz} \\
& + \Gamma^t_{ij} \Gamma^y_{ty} - \Gamma^t_{iy} \Gamma^y_{jt} + \Gamma^x_{ij} \Gamma^y_{xy} - \Gamma^x_{iy} \Gamma^y_{jx} + \Gamma^y_{ij} \Gamma^y_{yy} - \Gamma^y_{iy} \Gamma^y_{jy} + \Gamma^z_{ij} \Gamma^y_{zy} - \Gamma^z_{iy} \Gamma^y_{jz} \\
& + \Gamma^t_{ij} \Gamma^z_{tz} - \Gamma^t_{iz} \Gamma^z_{jt} + \Gamma^x_{ij} \Gamma^z_{xz} - \Gamma^x_{iz} \Gamma^z_{jx} + \Gamma^y_{ij} \Gamma^z_{yz} - \Gamma^y_{iz} \Gamma^z_{jy} + \Gamma^z_{ij} \Gamma^z_{zz} - \Gamma^z_{iz} \Gamma^z_{jz}
\end{aligned}
$$

And remember, this is just expanding the summations! This isn't even writing out the system of equations for each free index! You can go and see the full Einstein field equations with each system of equations written out at <https://github.com/bnschussler/Fully-Expanded-Einstein-Field-Equations>, and it is 22 pages long!

## The Geodesic Equation

We know from Newton's first law of motion that an object in motion, with no net force acting on it, stays in motion at constant velocity - that is, it undergoes no acceleration. In other terms:

$$
\frac{d^2x^\alpha}{d\tau^2} = 0
$$

Where:

$$
x^\alpha = (x^1, x^2, x^3, \dots, x^n) = (x, y, z, \dots)
$$

This is why, for instance, a ball rolling along an infinitely long hallway will keep going in a path in the same direction - its velocity vector, and thus its direction, stays constant. In nice Euclidean space, we call this path a "straight line" - the effect of going ahead in the same direction forever.

As we know, in Euclidean space, a straight line is the shortest path between two points, which we call a _geodesic_. We might be tempted to phrase Newton's first law to say that "a particle in motion will travel along a straight line". However, in non-Euclidean geometries, a geodesic is not necessarily a straight line. So, we must generalize Newton's first law with modifications: **a particle in motion will move along a geodesic**.

To formulate this law mathematically, we can say that the action along the path $x^k(\lambda)$ between 2 points $A = x^k(0)$ and $B = x^k(1)$ must be minimized (more precisely, made stationary). We include the factor of $mc$ to get the units for action right, so the full action along the path $x^k (\lambda)$ is:

$$
S = -mc \int \sqrt{-ds^2}
$$

```{admonition} About the line element
Here, the line element appears with a minus sign because, with the $(- + + +)$ signature, $ds^2$ is negative along the path of any massive particle (the $-c^2 dt^2$ term dominates), so its square root would be imaginary. Flipping the sign gives $\sqrt{-ds^2} = c \, d\tau$, a real quantity proportional to the proper time $\tau$ elapsed along the path.
```

We can expand this out by writing $ds^2 = g_{ij} dx^i dx^j$, so:

$$
S = -mc \int \sqrt{-g_{ij} dx^i dx^j}
$$

Now, to actually be able to solve, we need to write the integrand in terms of our path parameter $\lambda$. To do this, we can divide $dx^i$ and $dx^j$ both by $d\lambda$, then, to keep the integrand the same, multiply by $d\lambda \cdot d\lambda = d\lambda^2$:

$$
S = -mc \int \sqrt{-g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} d\lambda^2}
$$

We can then take out the $d\lambda$ from the square root to have:

$$
S = -mc \int \sqrt{-g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}} d\lambda
$$
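To get a feel for what this integral measures, here is a rough numerical sketch in flat 1+1-dimensional spacetime with $c = 1$. It approximates $\int \sqrt{-g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}} \, d\lambda$ by summing over small segments, and compares a straight worldline with one that takes a detour between the same two events. The two paths and the helper name `proper_length` are invented for this illustration.

```python
import math


def proper_length(path, steps=1000):
    """Approximate the integral of sqrt(-ds^2) along path(lam), lam in [0, 1].

    path(lam) returns (t, x); the flat metric ds^2 = -dt^2 + dx^2 is
    assumed (c = 1), and every segment is assumed to be timelike.
    """
    total = 0.0
    for n in range(steps):
        t0, x0 = path(n / steps)
        t1, x1 = path((n + 1) / steps)
        ds2 = -(t1 - t0) ** 2 + (x1 - x0) ** 2
        total += math.sqrt(-ds2)
    return total


def straight(lam):
    """Stay at x = 0 while t runs from 0 to 2."""
    return (2.0 * lam, 0.0)


def detour(lam):
    """Travel out to x = 0.5 and back while t runs from 0 to 2."""
    x = lam if lam <= 0.5 else 1.0 - lam
    return (2.0 * lam, x)


tau_straight = proper_length(straight)
tau_detour = proper_length(detour)
print(tau_straight, tau_detour)
```

The straight path gives the larger value of the integral, so after multiplying by $-mc$ it has the smaller action, which is consistent with the straight worldline being the geodesic of flat spacetime.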

Knowing that the integrand of the action is the Lagrangian, and dropping the constant factor $-mc$ (which does not change the equations of motion), we can write:

$$
\mathcal{L} = \sqrt{-g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}}
$$

We can write more specifically that the metric $g_{ij}$ is a function of $x^k$, so:

$$
\mathcal{L} = \sqrt{-g_{ij}(x^k) \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}}
$$

We can apply the familiar Euler-Lagrange equations to our Lagrangian, as we've done before, to find the equations of motion for the particle traveling along the path:

$$
\frac{d}{d\lambda} \frac{\partial \mathcal{L}}{\partial \dot x^k}
= \frac{\partial \mathcal{L}}{\partial x^k}
$$

Let's first take the derivative with respect to $x^k$. We use the chain rule and the fact that $\frac{\partial}{\partial x} \sqrt{u} = \frac{1}{2} u^{-\frac{1}{2}} \frac{\partial u}{\partial x}$. In our Lagrangian, the only part that actually depends on $x^k$ is $g_{ij}$, so the rest of the Lagrangian can be thought of as a constant. So we have:

$$
\frac{\partial \mathcal{L}}{\partial x^k} = -\frac{1}{2} 
\left(-g_{ij}(x^k) \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}\right)^{-1/2}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}
$$

We can simplify this by noticing that the Lagrangian itself appears in its derivative, so we can write:

$$
\frac{\partial \mathcal{L}}{\partial x^k} = -\frac{1}{2} 
\mathcal{L}^{-1}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}
$$

Which simplifies to:

$$
\frac{\partial \mathcal{L}}{\partial x^k} = -\frac{1}{2 \mathcal{L}}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}
$$

Now, let's take the derivative with respect to $\dot x^k$. Here, using the same Lagrangian substitution technique and square root derivative, we find that:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
\frac{1}{2\mathcal{L}}
\frac{\partial}{\partial \dot x^k} \left(-g_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}\right)
$$

Remember that $g_{ij}$ isn't dependent on $\dot x^k$, so we can simply factor it out, along with the minus sign, for now:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}} g_{ij}
\frac{\partial}{\partial \dot x^k} \left(\frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda}\right)
$$

To differentiate the part inside brackets of the previous expression, we use the product rule, namely $\frac{\partial}{\partial x} fg = \frac{\partial}{\partial x}f \cdot g + \frac{\partial}{\partial x} g \cdot f$, to get:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}} g_{ij}
\left(\frac{\partial \dot x^i}{\partial \dot x^k} \frac{dx^j}{d\lambda} + \frac{\partial \dot x^j}{\partial \dot x^k} \frac{dx^i}{d\lambda}\right)
$$

Now, we will use the Kronecker delta-partial derivative rule:

$$
\frac{\partial \dot x^i}{\partial \dot x^k} = {\delta^i}_k
$$

This simplifies the expression to:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}} g_{ij}
\left({\delta^i}_k \frac{dx^j}{d\lambda} + {\delta^j}_k \frac{dx^i}{d\lambda}\right)
$$

Finally, we distribute the expression with the metric $g_{ij}$:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}}
\left(g_{ij} {\delta^i}_k \frac{dx^j}{d\lambda} + g_{ij} {\delta^j}_k \frac{dx^i}{d\lambda}\right)
$$

Remembering that the Kronecker delta can be used to relabel indices:

$$
g_{\alpha \beta} {\delta^\alpha}_\mu = g_{\beta \mu}
$$

We rewrite the expression as:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}}
\left(g_{jk} \frac{dx^j}{d\lambda} + g_{ik} \frac{dx^i}{d\lambda}\right)
$$

Finally, let's remember the Einstein summation convention, which tells us that dummy indices can be changed to whatever indices we want, because they are just summation indices. Here, since $i$ and $j$ appear both as lower indices and as upper indices, they are dummy indices. That means:

$$
g_{jk} \frac{dx^j}{d\lambda} = g_{ik} \frac{dx^i}{d\lambda}
$$

So to simplify the expression, we can replace all the $j$'s with $i$'s (as we can do with dummy indices), to obtain:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}}
\left(g_{ik} \frac{dx^i}{d\lambda} + g_{ik} \frac{dx^i}{d\lambda}\right)
$$

Which simplifies to:

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{2\mathcal{L}}
\left(2 g_{ik} \frac{dx^i}{d\lambda}\right)
$$

$$
\frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{\mathcal{L}}
g_{ik} \frac{dx^i}{d\lambda}
$$

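Now we simply need to differentiate our previous result with respect to $\lambda$. We choose $\lambda$ to be an affine parameter (such as the proper time), so that $\mathcal{L}$ is constant along the path and can be pulled out of the derivative:

$$
\frac{d}{d\lambda} \frac{\partial \mathcal{L}}{\partial \dot x^k} =
\frac{d}{d\lambda} \left(-\frac{1}{\mathcal{L}}
g_{ik} \frac{dx^i}{d\lambda}\right) = 
-\frac{1}{\mathcal{L}} \frac{d}{d\lambda} \left(
g_{ik} \frac{dx^i}{d\lambda}\right)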
$$

Here, we use the product rule again with the terms in the brackets, which gives us:

$$
\frac{d}{d\lambda} \frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{\mathcal{L}} \left(
\frac{d g_{ik}}{d \lambda} \frac{dx^i}{d\lambda} + \frac{d^2 x^i}{d\lambda^2} g_{ik}
\right)
$$

Now, we know that technically $g_{ij} = g_{ij} (x^k (\lambda))$, so we can use the chain rule to write:

$$
\frac{d g_{ik}}{d \lambda} = \frac{\partial g_{ik}}{\partial x^a}
\frac{d x^a}{d \lambda}
$$

Here, $a$ is a new dummy index, which can be any letter that isn't already in use (here, $i$ and $k$ are taken). We'll choose $a$, but really it can be anything.

So we have:

$$
\frac{d}{d\lambda} \frac{\partial \mathcal{L}}{\partial \dot x^k} =
-\frac{1}{\mathcal{L}} \left(
\frac{\partial g_{ik}}{\partial x^a}
\frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda} + \frac{d^2 x^i}{d\lambda^2} g_{ik}
\right)
$$

Equating the two sides of the Euler-Lagrange equation, we have:

$$
-\frac{1}{2 \mathcal{L}}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 
-\frac{1}{\mathcal{L}} \left(
\frac{\partial g_{ik}}{\partial x^a}
\frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda} + \frac{d^2 x^i}{d\lambda^2} g_{ik}
\right)
$$

Collecting both terms on the same side of the equation, we get:

$$
\frac{1}{\mathcal{L}} \left(
\frac{\partial g_{ik}}{\partial x^a}
\frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda} + \frac{d^2 x^i}{d\lambda^2} g_{ik}
\right) -\frac{1}{2 \mathcal{L}}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

And we can factor out the common factor of $\frac{1}{\mathcal{L}}$ and get rid of it by multiplying both sides of the equation by $\mathcal{L}$ (remember zero multiplied by anything is still zero):

$$
\left(
\frac{\partial g_{ik}}{\partial x^a}
\frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda} + \frac{d^2 x^i}{d\lambda^2} g_{ik}
\right) -\frac{1}{2}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

As well as rearranging the terms to put the second derivative in front:

$$
\left(
\frac{d^2 x^i}{d\lambda^2} g_{ik} + \frac{\partial g_{ik}}{\partial x^a} \frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda}
\right) -\frac{1}{2}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

Now, we can remove the brackets:

$$
\frac{d^2 x^i}{d\lambda^2} g_{ik} + \frac{\partial g_{ik}}{\partial x^a} \frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda}
 -\frac{1}{2}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

Notice that this equation has **three** dummy indices - $i$ (which appears both in $d^2 x^i$ as upper index and $g_{ik}$ as lower index), $a$ (which appears both in the lower part of a partial derivative and upper part of another derivative $dx^a$), and $j$ (which appears as a lower index in $g_{ij}$ and an upper index in $dx^j$). Remember this! It'll be very important later! 

Let's take a close look at the middle term:

$$
\frac{\partial g_{ik}}{\partial x^a} \frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda}
$$

It can be seen that it can be alternatively written as:

$$
\frac{1}{2} \left(\frac{\partial g_{ik}}{\partial x^a} + \frac{\partial g_{ik}}{\partial x^a} \right)
\frac{d x^a}{d \lambda} \frac{dx^i}{d\lambda}
$$

Let's distribute this:

$$
\frac{1}{2} \frac{\partial g_{ik}}{\partial x^a} \frac{dx^a}{d \lambda} \frac{dx^i}{d\lambda} + \frac{1}{2} \frac{\partial g_{ik}}{\partial x^a} \frac{dx^a}{d \lambda} \frac{dx^i}{d\lambda}
$$

Note here that again, $a$ and $i$ are dummy indices - $a$ appears on the lower partial derivative and upper partial derivative terms, and $i$ appears in the lower $g_{ik}$ and upper $dx^i$ term. So let's do two index substitutions which will make the equation so much easier to solve. First, note that the equation we extracted this expression from had three dummy indices - let's reduce it to just two by swapping $a$ with $j$:

$$
\frac{1}{2} \frac{\partial g_{ik}}{\partial x^j} \frac{dx^j}{d \lambda} \frac{dx^i}{d\lambda} + \frac{1}{2} \frac{\partial g_{ik}}{\partial x^j} \frac{dx^j}{d \lambda} \frac{dx^i}{d\lambda}
$$

Second, in a somewhat bizarre move, we will do a twin substitution $i \rightarrow j$ and $j \rightarrow i$ in the second term, while leaving the first term alone:

$$
\frac{1}{2} \frac{\partial g_{ik}}{\partial x^j} \frac{dx^j}{d \lambda} \frac{dx^i}{d\lambda} + \frac{1}{2} \frac{\partial g_{jk}}{\partial x^i} \frac{dx^i}{d \lambda} \frac{dx^j}{d\lambda}
$$

Third, we will switch the order of the ordinary derivative terms in the first term, which we can do, as they are commutative products:

$$
\frac{1}{2} \frac{\partial g_{ik}}{\partial x^j} \frac{dx^i}{d\lambda} \frac{dx^j}{d \lambda} + \frac{1}{2} \frac{\partial g_{jk}}{\partial x^i} \frac{dx^i}{d \lambda} \frac{dx^j}{d\lambda} 
$$

Plugging our modified but technically identical version of the middle term back into the equation, we have:

$$
\frac{d^2 x^i}{d\lambda^2} g_{ik} + \frac{1}{2} \frac{\partial g_{ik}}{\partial x^j} \frac{dx^i}{d\lambda} \frac{dx^j}{d \lambda} + \frac{1}{2} \frac{\partial g_{jk}}{\partial x^i} \frac{dx^i}{d \lambda} \frac{dx^j}{d\lambda} 
 -\frac{1}{2}
\frac{\partial g_{ij}}{\partial x^k} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

We now have common factors, which we can combine to form:

$$
\frac{d^2 x^i}{d\lambda^2} g_{ik} + \frac{1}{2} \left(
\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{kj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right) \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

Now we want to get rid of the $g_{ik}$ term. To do this, we will multiply both sides of the equation with the inverse metric $g^{\mu k}$. This gives:

$$
g^{\mu k} \frac{d^2 x^i}{d\lambda^2} g_{ik} + \frac{1}{2} g^{\mu k} \left(
\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{kj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right) \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

Let's focus on the first term:

$$
g^{\mu k} \frac{d^2 x^i}{d\lambda^2} g_{ik}
$$

We'll make the tensor contractions easier to see by moving the terms around:

$$
g^{\mu k} g_{ik} \frac{d^2 x^i}{d\lambda^2} 
$$

Recall that $g^{\mu k} g_{ik} = {\delta^\mu}_i$ by the rules of tensor contraction, so we have:

$$
{\delta^\mu}_i \frac{d^2 x^i}{d\lambda^2}
$$

Now, the upper index on the derivative $x^i$ and the lower index $i$ of the Kronecker delta cancel to relabel $i$ to $\mu$, as we saw before in tensor calculus:

$$
{\delta^\mu}_i \frac{d^2 x^i}{d\lambda^2} = \frac{d^2 x^\mu}{d\lambda^2}
$$

So our full equation is:

$$
\frac{d^2 x^\mu}{d\lambda^2} + 
\frac{1}{2} g^{\mu k} \left(
\frac{\partial g_{ik}}{\partial x^j} + 
\frac{\partial g_{kj}}{\partial x^i} - 
\frac{\partial g_{ij}}{\partial x^k}\right) 
\frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

We can further simplify this equation by extracting out the partial derivatives terms in the middle as the **Christoffel symbols**:

$$
\Gamma^\mu_{ij} = \frac{1}{2} g^{\mu k} \left(
\frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{kj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right)
$$

Where (this is clearer if we expand out the partial derivatives term by term, but it can be seen by just glancing at the equation too) the free indices are $\mu$, $j$, and $i$ (which is why they appear on the Christoffel symbol itself), while the dummy index $k$ is used for summation (which is why it's only present in the partial derivatives).
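To make the definition concrete, here is a minimal numerical sketch that evaluates the Christoffel symbols directly from a metric, using finite differences for the partial derivatives. As a test case it assumes the flat plane in polar coordinates, $ds^2 = dr^2 + r^2 d\theta^2$, which is not discussed above but has well-known Christoffel symbols ($\Gamma^r_{\theta \theta} = -r$ and $\Gamma^\theta_{r \theta} = 1/r$). All function names are illustrative, and the matrix inverse is hard-coded for the 2x2 case.

```python
def polar_metric(x):
    """Metric g_{ij} of the flat plane in polar coordinates x = (r, theta)."""
    r, _theta = x
    return [[1.0, 0.0], [0.0, r * r]]


def inverse_2x2(g):
    """Inverse metric g^{ij} for a 2x2 matrix."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(metric, x, k, h=1e-6):
    """Central-difference estimate of d g_{ij} / d x^k at the point x."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
            for i in range(n)]


def christoffel(metric, x):
    """gamma[mu][i][j] = Gamma^mu_{ij}, following the definition above."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, k) for k in range(n)]  # dg[k][i][j]
    return [[[0.5 * sum(g_inv[mu][k] * (dg[j][i][k] + dg[i][k][j] - dg[k][i][j])
                        for k in range(n))
              for j in range(n)] for i in range(n)] for mu in range(n)]


gamma = christoffel(polar_metric, (2.0, 0.3))
print(gamma[0][1][1])  # Gamma^r_{theta theta}, expected close to -r
print(gamma[1][0][1])  # Gamma^theta_{r theta}, expected close to 1/r
```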

So we finally arrive at the **geodesic equation**:

$$
\frac{d^2 x^\mu}{d\lambda^2} + 
\Gamma^\mu_{ij} \frac{dx^i}{d\lambda} \frac{dx^j}{d\lambda} = 0
$$

Any path that obeys the geodesic equation in spacetime is a geodesic. Since spacetime can be curved, these geodesics are generally not straight lines. The curvature of spacetime - what we experience as gravity - causes distances to change. This causes the metric to change, which in turn affects the paths of particles.
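As a sanity check on the geodesic equation, the sketch below integrates it numerically in a case where the answer is known in advance: the flat plane in polar coordinates, where geodesics must be ordinary straight lines even though the Christoffel symbols are nonzero. The polar-coordinate example, the fourth-order Runge-Kutta integrator, and all function names are choices made for this illustration, not part of the derivation above.

```python
import math


def geodesic_rhs(state):
    """Geodesic equation for ds^2 = dr^2 + r^2 dtheta^2 as a first-order system.

    state = (r, theta, dr, dtheta), with derivatives taken with respect
    to the path parameter lambda. The nonzero Christoffel symbols are
    Gamma^r_{theta theta} = -r and Gamma^theta_{r theta} = 1/r.
    """
    r, _theta, dr, dtheta = state
    d2r = r * dtheta * dtheta          # -Gamma^r_{theta theta} (dtheta)^2
    d2theta = -2.0 * dr * dtheta / r   # -2 Gamma^theta_{r theta} dr dtheta
    return (dr, dtheta, d2r, d2theta)


def rk4_step(state, h):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))

    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, h / 2))
    k3 = geodesic_rhs(shifted(state, k2, h / 2))
    k4 = geodesic_rhs(shifted(state, k3, h))
    return tuple(s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate_geodesic(state, lam_end, steps=1000):
    """Integrate from lambda = 0 to lambda = lam_end."""
    h = lam_end / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


# Start at (x, y) = (1, 0), moving in the +y direction with unit speed.
r, theta, _, _ = integrate_geodesic((1.0, 0.0, 0.0, 1.0), 1.0)
x, y = r * math.cos(theta), r * math.sin(theta)
print(x, y)  # should lie on the straight line x = 1, at y = 1
```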

## The Covariant Derivative

As we've seen by this point, the laws of physics are primarily written in partial differential equations, and so it would be natural to think that General Relativity can be characterized by partial differential equations too. The issue is, consider a vector $\vec V = V^a e_a$. If we were to take its partial derivative, we'd have (using the product rule):

$$
\frac{\partial \vec V}{\partial x^b} = \frac{\partial V^a}{\partial x^b} e_a + \frac{\partial e_a}{\partial x^b} V^a
$$

But remember that tensors must transform like tensors: under a change of coordinates, each component picks up only factors of the partial derivatives relating the two coordinate systems, multiplied by the original components. The additional $\frac{\partial e_a}{\partial x^b} V^a$ term means that the regular partial derivative doesn't transform like a tensor. So instead of partial derivatives, we need to define a new type of derivative, the **covariant derivative**, which compensates for the additional term in the partial derivative. The covariant derivative takes the form:

$$
\nabla_b V^a = \frac{\partial V^a}{\partial x^b} + V^k \Gamma^a_{kb}
$$

for vectors (those with an upper index), and the form:

$$
\nabla_b V_a = \frac{\partial V_a}{\partial x^b} - V_k \Gamma^k_{ab}
$$

for covectors (those with a lower index). To take the covariant derivative of tensors formed from both vectors and covectors, such as the metric tensor $g_{\mu \nu}$, we add a term for each upper index the tensor has and subtract a term for each lower index the tensor has (you'll see how this works in just a moment). For example, for the metric tensor, we first write out the covariant derivative as a partial derivative, plus an unknown term:

$$
\nabla_b g_{\mu \nu} = \frac{\partial g_{\mu \nu}}{\partial x^b} + \dots
$$

Then, we notice that the metric tensor has two lower indices, so we need two correction terms. The first correction term is for the index $\mu$, and the second correction term is for the index $\nu$. To emphasize which index each correction term is for, there is a little hat on that index:

$$
\nabla_b g_{\mu \nu} = \frac{\partial g_{\mu \nu}}{\partial x^b} - g_{\hat \mu \nu} A - g_{\mu \hat \nu} B
$$

Now comes the slightly bizarre part. We're going to replace whichever index we're interested in (the one with a hat on) with a dummy index $\alpha$. This is so that the rules of tensor algebra work out such that the covariant derivative transforms like a tensor. So:

$$
\frac{\partial g_{\mu \nu}}{\partial x^b} - g_{\alpha \nu} A - g_{\mu \alpha} B
$$

To figure out the correct index convention for $A$ and $B$, we use the rule that we multiply $\Gamma^\alpha_{\gamma b}$ for each lower index correction term, and multiply $\Gamma^\gamma_{\alpha b}$ for each upper index correction term. Here:

- $\alpha$ is the dummy index we're using
- $\gamma$ is the index of the term we're interested in
- $b$ is the index we take the covariant derivative with respect to. 

So $A = \Gamma^\alpha_{\mu b}$ and $B = \Gamma^\alpha_{\nu b}$. Thus we have:

$$
\nabla_b g_{\mu \nu} = \frac{\partial g_{\mu \nu}}{\partial x^b} - g_{\alpha \nu} \Gamma^\alpha_{\mu b} - g_{\mu \alpha} \Gamma^\alpha_{\nu b}
$$
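One way to check this formula is to evaluate it for a concrete metric. For Christoffel symbols built from the metric itself, the covariant derivative of the metric is known to vanish, so every component should come out as zero. The sketch below assumes the flat plane in polar coordinates ($g_{rr} = 1$, $g_{\theta \theta} = r^2$) with its standard Christoffel symbols entered by hand; the example metric and all function names are assumptions made for illustration.

```python
def metric(r):
    """g_{mu nu} for ds^2 = dr^2 + r^2 dtheta^2, coordinates (r, theta)."""
    return [[1.0, 0.0], [0.0, r * r]]


def metric_partial(r):
    """dg[b][mu][nu] = d g_{mu nu} / d x^b; only d g_{theta theta} / dr is nonzero."""
    d_r = [[0.0, 0.0], [0.0, 2.0 * r]]
    d_theta = [[0.0, 0.0], [0.0, 0.0]]
    return [d_r, d_theta]


def christoffel(r):
    """gamma[a][m][b] = Gamma^a_{m b} for the polar metric."""
    gamma = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    gamma[0][1][1] = -r        # Gamma^r_{theta theta}
    gamma[1][0][1] = 1.0 / r   # Gamma^theta_{r theta}
    gamma[1][1][0] = 1.0 / r   # Gamma^theta_{theta r}
    return gamma


def covariant_derivative_of_metric(r):
    """result[b][mu][nu] = d_b g_{mu nu} - g_{a nu} Gamma^a_{mu b} - g_{mu a} Gamma^a_{nu b}."""
    g, dg, gamma = metric(r), metric_partial(r), christoffel(r)
    n = 2
    return [[[dg[b][mu][nu]
              - sum(g[al][nu] * gamma[al][mu][b] for al in range(n))
              - sum(g[mu][al] * gamma[al][nu][b] for al in range(n))
              for nu in range(n)] for mu in range(n)] for b in range(n)]


nabla_g = covariant_derivative_of_metric(2.0)
print(nabla_g)
```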

Also, it should be noted that "covariant derivative" is a bit of a misnomer - here, the definition of the word "covariant" pre-dates the idea of contra- and covariant tensors (tensors with upper/lower indices), and refers to the earlier definition of "invariant". Thus, the covariant derivative is really just a fancy way of saying a derivative of a tensor that is independent of the coordinates used.

## The Riemann tensor

What can we use to measure the curvature of spacetime? We already know that with the covariant derivative, we can take fully-invariant derivatives in spacetime. But if spacetime is to be curved, then if we take a derivative of a vector along direction $\mu$, then another along direction $\nu$, we'd expect to get a different result than if we were to take a derivative along direction $\nu$, then along direction $\mu$. We can qualitatively describe this as:

$$
\nabla_\mu \nabla_\nu V^\alpha \neq \nabla_\nu \nabla_\mu V^\alpha
$$

The difference between the two orderings of the derivatives is going to tell us how strongly spacetime is curved at that point. So we simply need to compute:

$$
\nabla_\mu \nabla_\nu V^\alpha - \nabla_\nu \nabla_\mu V^\alpha
$$

First, we expand out the covariant derivatives:

$$
\nabla_\mu (\nabla_\nu V^\alpha) - \nabla_\nu (\nabla_\mu V^\alpha)
$$

$$
\nabla_\mu (\partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu}) - \nabla_\nu (\partial_\mu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \mu})
$$

Let's take this step by step, and we'll start with the first covariant derivative:

$$
\nabla_\mu (\partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu})
$$

Notice that because $\sigma$ is a dummy index and contracts, we can rewrite the inside of the parentheses as another tensor:

$$
C^\alpha_\nu = \partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu}
$$

If we take the covariant derivative of $C^\alpha_\nu$, we know that we have a partial derivative, plus several other correction terms:

$$
\nabla_\mu C^\alpha_\nu = \partial_\mu C^\alpha_\nu + \dots
$$

We can write out the remaining correction terms as the tensor multiplied by several coefficients (add correction term if upper index, subtract correction term if lower index), with hats indicating which terms we're interested in:

$$
\nabla_\mu C^\alpha_\nu = \partial_\mu C^\alpha_\nu + C^{\hat \alpha}_\nu A - C^\alpha_{\hat \nu} B
$$

We replace the hatted indices with our dummy index $\lambda$ (we chose a new variable so as not to cause confusion with the existing variables):

$$
\nabla_\mu C^\alpha_\nu = \partial_\mu C^\alpha_\nu + C^\lambda_\nu A - C^\alpha_\lambda B
$$

Using the correction term coefficient rule described earlier for the Christoffel symbols, we recall that:

- For an upper index coefficient term we have the dummy index on the bottom and the interested index on the top
- For a lower index coefficient term we have the dummy index on top and the interested index on the bottom
- The rightmost lower term on the coefficient is always the index we're taking the covariant derivative with respect to (in our case $\mu$)

So we can figure out that $A = \Gamma^\alpha_{\lambda \mu}$, and $B = \Gamma^\lambda_{\nu \mu}$. So:

$$
\nabla_\mu C^\alpha_\nu = \partial_\mu C^\alpha_\nu + C^\lambda_\nu \Gamma^\alpha_{\lambda \mu} - C^\alpha_\lambda \Gamma^\lambda_{\nu \mu}
$$

But recall that:

$$
C^\alpha_\nu = \partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu}
$$

From which we can perform index substitutions on every index to get:

$$
C^\lambda_\nu = \partial_\nu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \nu}
$$

$$
C^\alpha_\lambda = \partial_\lambda V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \lambda}
$$

```{note}
The last two were obtained by changing the indices $\alpha \to \lambda$ and $\nu \to \lambda$. Shouldn't this be illegal in tensor algebra, where we're not supposed to swap free indices in the same equation? There is a nuance here - we can swap free indices **only** if we substitute each and every index with a corresponding different index. That is, if you have an equation $x^i = g^{ij} b_j$, you can't simply say "I want to swap $j \to i$ and make the equation $x^i = g^{ii} b_i$". Here, you're selectively substituting $j \to i$ without making a substitution for $i$, so the equation is wrong! But you can say that "I'll rewrite the equation using different indices, substituting $i \to a$ and $j \to b$, so I have $x^a = g^{ab} b_b$". Since we swapped _every_ index with a _unique_ different index, this is acceptable.
```

We can therefore rewrite the covariant derivative of $C^\alpha_\nu$ as:

$$
\nabla_\mu C^\alpha_\nu = \partial_\mu (\partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu}) + (\partial_\nu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \nu}) \Gamma^\alpha_{\lambda \mu} - (\partial_\lambda V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \lambda}) \Gamma^\lambda_{\nu \mu}
$$

Be careful! The second and third terms are just products, but the first term is a derivative of a sum, so we have to differentiate each piece separately - $\partial_\mu (\partial_\nu V^\alpha + V^\sigma \Gamma^\alpha_{\sigma \nu}) = \partial_\mu \partial_\nu V^\alpha + \partial_\mu (V^\sigma \Gamma^\alpha_{\sigma \nu})$ (we will apply the product rule to the second piece shortly). Using that, and expanding the rest of the terms out, we get:

$$
\nabla_\mu \nabla_\nu V^\alpha = \nabla_\mu C^\alpha_\nu = \partial_\mu \partial_\nu V^\alpha + \partial_\mu (V^\sigma \Gamma^\alpha_{\sigma \nu}) + \Gamma^\alpha_{\lambda \mu} \partial_\nu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \nu} \Gamma^\alpha_{\lambda \mu} - \partial_\lambda V^\alpha \Gamma^\lambda_{\nu \mu} - V^\sigma \Gamma^\alpha_{\sigma \lambda} \Gamma^\lambda_{\nu \mu}
$$

Phew! We're almost there, just hang in there for the remaining derivation. Good news! Things are going to look simpler from this point on. We've already solved the left double covariant derivative, $\nabla_\mu \nabla_\nu V^\alpha$. The right double covariant derivative is just the left double covariant derivative with an index swap $\mu \leftrightarrow \nu$ (that means every time we see a $\mu$, we replace it with a $\nu$, and every time we see a $\nu$, we replace it with a $\mu$). So it is:

$$
\nabla_\nu \nabla_\mu V^\alpha = 
\partial_\nu \partial_\mu V^\alpha + \partial_\nu (V^\sigma \Gamma^\alpha_{\sigma \mu}) + \Gamma^\alpha_{\lambda \nu} \partial_\mu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \mu} \Gamma^\alpha_{\lambda \nu} - \partial_\lambda V^\alpha \Gamma^\lambda_{\mu \nu} - V^\sigma \Gamma^\alpha_{\sigma \lambda} \Gamma^\lambda_{\mu \nu}
$$

Now is the glorious part - when we subtract one from the other, many of the terms cancel each other out. Because second partial derivatives are equal no matter what order you take them, $\partial_\mu \partial_\nu = \partial_\nu \partial_\mu$, so those cancel. The last two terms are identical for both (given the symmetry of the Christoffel symbols in their lower indices), so they cancel as well. We're left with:

$$
\nabla_\mu \nabla_\nu V^\alpha - \nabla_\nu \nabla_\mu V^\alpha = (\partial_\mu (V^\sigma \Gamma^\alpha_{\sigma \nu}) + \Gamma^\alpha_{\lambda \mu} \partial_\nu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \nu} \Gamma^\alpha_{\lambda \mu}) - (\partial_\nu (V^\sigma \Gamma^\alpha_{\sigma \mu}) + \Gamma^\alpha_{\lambda \nu} \partial_\mu V^\lambda + V^\sigma \Gamma^\lambda_{\sigma \mu} \Gamma^\alpha_{\lambda \nu})
$$

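Notice a third, less obvious cancellation. By the product rule, $\partial_\mu (V^\sigma \Gamma^\alpha_{\sigma \nu}) = \partial_\mu V^\sigma \Gamma^\alpha_{\sigma \nu} + \partial_\mu \Gamma^\alpha_{\sigma \nu} V^\sigma$, and the first piece cancels out $\Gamma^\alpha_{\lambda \nu} \partial_\mu V^\lambda$ in the second set of parentheses (the two differ only in the name of the dummy index, and dummy indices don't matter). In the same way, the $\partial_\nu V^\sigma \Gamma^\alpha_{\sigma \mu}$ piece of $\partial_\nu (V^\sigma \Gamma^\alpha_{\sigma \mu})$ cancels out $\Gamma^\alpha_{\lambda \mu} \partial_\nu V^\lambda$. This simplifies the expression to: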

$$
\nabla_\mu \nabla_\nu V^\alpha - \nabla_\nu \nabla_\mu V^\alpha = \partial_\mu \Gamma^\alpha_{\sigma \nu} V^\sigma - \partial_\nu \Gamma^\alpha_{\sigma \mu} V^\sigma + V^\sigma \Gamma^\lambda_{\sigma \nu} \Gamma^\alpha_{\lambda \mu} - V^\sigma \Gamma^\lambda_{\sigma \mu} \Gamma^\alpha_{\lambda \nu}
$$

We can now finally (!!!) factor out the $V^\sigma$ from the expression, to get:

$$
\nabla_\mu \nabla_\nu V^\alpha - \nabla_\nu \nabla_\mu V^\alpha = V^\sigma [\partial_\mu \Gamma^\alpha_{\sigma \nu} - \partial_\nu \Gamma^\alpha_{\sigma \mu} + \Gamma^\lambda_{\sigma \nu} \Gamma^\alpha_{\lambda \mu} -  \Gamma^\lambda_{\sigma \mu} \Gamma^\alpha_{\lambda \nu}]
$$
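
The result of this long derivation can be checked numerically. The sketch below works on the unit 2-sphere (its Christoffel symbols are assumed standard values, not derived here), picks an arbitrary test vector field, computes both double covariant derivatives directly with finite differences, and compares their difference with $V^\sigma$ times the bracket. All function names and the choice of vector field are illustrative assumptions.

```python
import math

N = 2  # coordinates x = (theta, phi) on the unit 2-sphere

def gamma(x):
    """Christoffel symbols G[a][s][n] = Gamma^a_{s n} (assumed standard values)."""
    th = x[0]
    G = [[[0.0] * N for _ in range(N)] for _ in range(N)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

def vec(x):
    """An arbitrary smooth test vector field V^a(x), chosen only for illustration."""
    th, ph = x
    return [math.sin(ph) * th, math.cos(th) + ph * ph]

def d(f, x, mu, h=1e-4):
    """Central finite difference of a scalar function f along coordinate mu."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    return (f(xp) - f(xm)) / (2 * h)

def first(x, a, n):
    """C^a_n = nabla_n V^a = d_n V^a + Gamma^a_{s n} V^s."""
    G, V = gamma(x), vec(x)
    return d(lambda y: vec(y)[a], x, n) + sum(G[a][s][n] * V[s] for s in range(N))

def second(x, a, m, n):
    """nabla_m nabla_n V^a = d_m C^a_n + Gamma^a_{l m} C^l_n - Gamma^l_{n m} C^a_l."""
    G = gamma(x)
    val = d(lambda y: first(y, a, n), x, m)
    for l in range(N):
        val += G[a][l][m] * first(x, l, n) - G[l][n][m] * first(x, a, l)
    return val

def commutator(x, a, m, n):
    """Left-hand side: nabla_m nabla_n V^a - nabla_n nabla_m V^a."""
    return second(x, a, m, n) - second(x, a, n, m)

def bracket_times_v(x, a, m, n):
    """Right-hand side: V^s times the bracketed combination of Christoffel symbols."""
    G, V = gamma(x), vec(x)
    total = 0.0
    for s in range(N):
        r = d(lambda y: gamma(y)[a][s][n], x, m) - d(lambda y: gamma(y)[a][s][m], x, n)
        for l in range(N):
            r += G[l][s][n] * G[a][l][m] - G[l][s][m] * G[a][l][n]
        total += V[s] * r
    return total

point = [0.8, 0.5]
for a in range(N):
    print(a, commutator(point, a, 0, 1), bracket_times_v(point, a, 0, 1))
```

Because everything is computed with finite differences, the two columns should agree only up to a small numerical error, not exactly.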

The term in the brackets can be written as a new tensor:

$$
R^\alpha_{\sigma \mu \nu} = \partial_\mu \Gamma^\alpha_{\sigma \nu} - \partial_\nu \Gamma^\alpha_{\sigma \mu} + \Gamma^\lambda_{\sigma \nu} \Gamma^\alpha_{\lambda \mu} - \Gamma^\lambda_{\sigma \mu} \Gamma^\alpha_{\lambda \nu}
$$

This is the **Riemann curvature tensor**, and it measures how vectors diverge due to the curvature of spacetime. It is a monster tensor - it has $4^4 = 256$ components in 4D spacetime, making it a $4 \times 4 \times 4 \times 4$ array (although its symmetries leave only 20 of those components independent).

To make this tensor easier to work with, we often contract it by making the 1st and 3rd indices identical, creating the **Ricci tensor**, which is defined by:

$$
R_{\sigma \nu} = R^\mu_{\sigma \mu \nu} = \partial_\mu \Gamma^\mu_{\sigma \nu} - \partial_\nu \Gamma^\mu_{\sigma \mu} + \Gamma^\lambda_{\sigma \nu} \Gamma^\mu_{\lambda \mu} - \Gamma^\lambda_{\sigma \mu} \Gamma^\mu_{\lambda \nu}
$$

We can further contract the Ricci tensor by multiplying it by the inverse metric, giving the **Ricci scalar**:

$$
R = g^{\mu \nu} R_{\mu \nu}
$$
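
To see these three objects in action, here is a minimal sketch that builds the Riemann tensor from a set of Christoffel symbols using the formula above, then contracts it into the Ricci tensor and Ricci scalar. It is applied to two 2D examples whose Christoffel symbols are assumed (standard textbook values, not derived in this section): the unit sphere, which is curved, and the flat plane in polar coordinates, which only looks complicated because of the coordinates. The helper names are hypothetical.

```python
import math

def sphere_gamma(x):
    """Gamma^a_{s n} for the unit 2-sphere, coordinates x = (theta, phi)."""
    th = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(th) * math.cos(th)
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)
    return G

def polar_gamma(x):
    """Gamma^a_{s n} for the flat plane in polar coordinates x = (r, phi)."""
    r = x[0]
    G = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def riemann(gamma, x, h=1e-5):
    """R[a][s][m][k] = R^a_{s m k}, with derivatives taken by central differences."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[mu][a][s][k] approximates d_mu Gamma^a_{s k}
    for mu in range(n):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[a][s][k] - Gm[a][s][k]) / (2 * h) for k in range(n)]
                    for s in range(n)] for a in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for a in range(n):
        for s in range(n):
            for m in range(n):
                for k in range(n):
                    val = dG[m][a][s][k] - dG[k][a][s][m]
                    for l in range(n):
                        val += G[l][s][k] * G[a][l][m] - G[l][s][m] * G[a][l][k]
                    R[a][s][m][k] = val
    return R

def ricci(gamma, x):
    """Ricci tensor: contract the 1st and 3rd indices of the Riemann tensor."""
    R = riemann(gamma, x)
    n = len(x)
    return [[sum(R[m][s][m][k] for m in range(n)) for k in range(n)] for s in range(n)]

def ricci_scalar(gamma, g_inv, x):
    """Ricci scalar: contract the Ricci tensor with the inverse metric."""
    Ric = ricci(gamma, x)
    n = len(x)
    return sum(g_inv[m][k] * Ric[m][k] for m in range(n) for k in range(n))

theta, r = 0.9, 2.0
sphere_R = ricci_scalar(sphere_gamma, [[1.0, 0.0], [0.0, 1.0 / math.sin(theta) ** 2]], [theta, 0.3])
plane_R = ricci_scalar(polar_gamma, [[1.0, 0.0], [0.0, 1.0 / r ** 2]], [r, 0.3])
print(sphere_R, plane_R)
```

Up to finite-difference error, the sphere gives a Ricci scalar of 2 and the plane gives 0, even though neither set of Christoffel symbols is zero. This is the sense in which the Riemann tensor detects curvature rather than a choice of coordinates.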

We now have all the tensors we need to derive the ultimate equation - the **Einstein Field Equations**.

## Deriving the Einstein Field Equations

As before, we can use the Euler-Lagrange equations and the principle of least action to obtain the Einstein Field Equations.

The action for General Relativity in empty spacetime can be written as:

$$
S =  \kappa \int R \sqrt{-g} ~d^4 x
$$

Here, $g = \det(g_{\mu \nu})$, $d^4x = dt\, dx\, dy\, dz$, and $\kappa$ is simply a proportionality constant. Note that although this action describes a vacuum, the spacetime can still be curved. For example, you could say that the spacetime _outside_ of a black hole is a vacuum (because there is no matter), but the spacetime would still be curved (because the black hole warps its surrounding spacetime, even if we only include the spacetime around a black hole and not the black hole itself).

The action can be justified in one of two ways. It can be shown to be correct through dimensional analysis - the units on the left and right side of the equation match up. However, there is also a more intuitive way to illustrate this.

The action must be built from scalars (or scalar-valued functions), because it is an integral over all of spacetime whose value should not depend on the coordinates we choose. But it must also include information about the curvature of spacetime and spacetime itself. As we know, all the information about the curvature of spacetime is captured in the Riemann tensor. But the Riemann tensor is not a scalar-valued function - it is instead a (rank-4) tensor-valued function. So we have to find a way to get a scalar from the Riemann tensor. We already know of a scalar that can be formed from the Riemann tensor - the Ricci scalar.

We want to add an additional proportionality constant in front, which is also a scalar, because we'd expect to see constants in our final field equations as well. We can always set the constant $\kappa = 1$ if we find it's not necessary later. When matter and energy are present, both they and the curvature of spacetime should act on the metric, so their contributions are added together inside the integral (we will do this once the vacuum case is solved). Finally, since spacetime is often curved, we need a factor of $\sqrt{-g}$ so that the volume element $\sqrt{-g}\, d^4 x$ measures the same physical volume no matter what coordinates or what spacetime we use. So from there, we obtain the action.

From our action, we know that the Lagrangian is:

$$
\mathscr{L} = \kappa R \sqrt{-g}
$$

We will use the Euler-Lagrange field equations, a slight variation of the original Euler-Lagrange equations we derived:

$$
\frac{\partial \mathscr{L}}{\partial \varphi} - \frac{\partial}{\partial x^\beta} \left( \frac{\partial \mathscr{L}}{\partial (\partial_\beta \varphi)}\right) = 0
$$

Here, $\varphi$ is the field, and in our case, the field is the metric tensor field $g_{\mu \nu} (x^\beta)$, thus $\varphi = g_{\mu \nu}$, so if we substitute, we have:

$$
\frac{\partial \mathscr{L}}{\partial g_{\mu \nu}} - \frac{\partial}{\partial x^\beta} \left( \frac{\partial \mathscr{L}}{\partial (\partial_\beta g_{\mu \nu})}\right) = 0
$$

Note that we use the curly L for the Lagrangian because it is not technically the Lagrangian per se, but the field equivalent of the Lagrangian, known as the **Lagrangian density**. But we'll just call it the Lagrangian here. The distinction between the Lagrangian density and the Lagrangian isn't important here; the practical difference here is that the Lagrangian uses the typical Euler-Lagrange equation, while the Lagrangian density uses the Euler-Lagrange _field_ equation.

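We notice in the Euler-Lagrange field equations that the second term contains the partial derivative with respect to the derivatives of the metric. Strictly speaking, the Ricci tensor does depend on derivatives of the metric through the Christoffel symbols. However, a more careful variational treatment shows that these contributions combine into a total derivative, which only adds a boundary term to the action and does not change the field equations. For this derivation we therefore treat $R_{\mu \nu}$ as independent of the metric and its derivatives, so that our Lagrangian has no terms that take the derivative of the metric as input. The second term then vanishes, and we are left with a comparatively easier equation: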

$$
\frac{\partial \mathscr{L}}{\partial g_{\mu \nu}} = 0
$$

Before we take this derivative, let us first rewrite our Lagrangian using $R = g^{\alpha \beta} R_{\alpha \beta}$. We write the contraction with the dummy indices $\alpha$ and $\beta$, because $\mu$ and $\nu$ are about to be used as the free indices of the derivative:

$$
\mathscr{L} = \kappa g^{\alpha \beta} R_{\alpha \beta} \sqrt{-g}
$$

Now, we can finally take the derivative with respect to the metric:

$$
\frac{\partial \mathscr{L}}{\partial g_{\mu \nu}} = \kappa \frac{\partial}{\partial g_{\mu \nu}} \left(g^{\alpha \beta} R_{\alpha \beta} \sqrt{-g}\right) = 0
$$

We immediately run into a hurdle! The Lagrangian has three multiplied functions: the inverse metric, the Ricci tensor, and the square root of the determinant of the metric. How do we differentiate a triple product? We can use the triple product rule:

$$
(f \cdot g \cdot h)' = f'gh + fg'h + fgh'
$$

Another problem! How do we differentiate the inverse metric with respect to the metric? The answer comes from a matrix calculus identity, which, translated to tensor notation, is this:

$$
\frac{\partial g^{\alpha \beta}}{\partial g_{\mu \nu}} = -g^{\alpha \mu} g^{\beta \nu}
$$

Final problem! How do we differentiate the determinant of the metric with respect to the metric? This answer also comes from a matrix calculus identity, which is this:

$$
\frac{\partial \det(g_{\mu \nu})}{\partial g_{\mu \nu}} = \frac{\partial g}{\partial g_{\mu \nu}} = g g^{\mu \nu}
$$

With all this in mind, we can finally compute the derivatives. The first term of the derivative is just the derivative of the inverse metric, multiplied by the other two terms in the triple product. The derivative of the Ricci tensor with respect to the metric is taken to be zero (in this simplified treatment it doesn't depend on the metric), so the second term of the derivative of the triple product is zero. In the third term, we need to use the chain rule to differentiate the square root. The final result is this:

$$
-\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} \sqrt{-g} - \kappa \frac{1}{2\sqrt{-g}}g g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} = 0
$$

We can clean this up a bit. First, we can multiply both sides by $-1$, to get:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} \sqrt{-g} + \kappa \frac{1}{2\sqrt{-g}}g g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} = 0
$$

Then, we can multiply both sides of the equation by $\frac{1}{\sqrt{-g}}$. Since $\frac{g}{(\sqrt{-g})^2} = -1$, the sign of the second term flips, which results in:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} - \kappa \frac{1}{2} g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} = 0
$$

We remember that $R = g^{\alpha \beta} R_{\alpha \beta}$, so we can substitute it in:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} - \kappa \frac{1}{2} g^{\mu \nu} R = 0
$$

We want to get rid of the inverse metrics, so we can multiply both sides of the equation by $g_{\mu \rho} g_{\nu \sigma}$, to get:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g_{\mu \rho} g^{\beta \nu} g_{\nu \sigma} - \kappa \frac{1}{2} g^{\mu \nu} g_{\mu \rho} g_{\nu \sigma} R = 0
$$

The inverse metric contracts with the metric to give the Kronecker delta:

$$
g^{\alpha \mu} g_{\mu \rho} = \delta^\alpha_\rho, \quad g^{\beta \nu} g_{\nu \sigma} = \delta^\beta_\sigma, \quad g^{\mu \nu} g_{\mu \rho} g_{\nu \sigma} = \delta^\nu_\rho g_{\nu \sigma} = g_{\rho \sigma}
$$

So this entire expression becomes:

$$
\kappa R_{\rho \sigma} - \kappa \frac{1}{2} R g_{\rho \sigma} = 0
$$

Since $\rho$ and $\sigma$ are the only indices left, we are free to rename them $\rho \to \mu$ and $\sigma \to \nu$, to yield:

$$
\kappa R_{\mu \nu} - \kappa \frac{1}{2} R g_{\mu \nu} = 0
$$

We can factor out the constant:

$$
\kappa \left(R_{\mu \nu} - \frac{1}{2} R g_{\mu \nu}\right) = 0
$$

The term inside the parentheses is called the **Einstein tensor** and describes the curvature and characteristics of spacetime:

$$
G_{\mu \nu} = R_{\mu \nu} - \frac{1}{2} R g_{\mu \nu}
$$

In vacuum, the equation we just derived is the Einstein Field Equation:

$$
G_{\mu \nu} = 0
$$
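
The two matrix calculus identities used in this derivation can be checked numerically. The sketch below does so for a small symmetric $2 \times 2$ "metric" (an arbitrary example, not a physical spacetime), using the index forms $\frac{\partial g}{\partial g_{\mu \nu}} = g g^{\mu \nu}$ and $\frac{\partial g^{\alpha \beta}}{\partial g_{\mu \nu}} = -g^{\alpha \mu} g^{\beta \nu}$. It assumes that every entry of the matrix is varied independently, which is the convention under which these identities take this simple form; the function names are illustrative.

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def perturbed(m, mu, nu, eps):
    """Copy of m with eps added to the single entry (mu, nu)."""
    out = [row[:] for row in m]
    out[mu][nu] += eps
    return out

def max_identity_errors(g, h=1e-6):
    """Largest deviation from each identity over all index combinations."""
    g_inv, g_det = inv2(g), det2(g)
    det_err = inv_err = 0.0
    for mu in range(2):
        for nu in range(2):
            plus, minus = perturbed(g, mu, nu, h), perturbed(g, mu, nu, -h)
            # d(det g)/d g_{mu nu} should equal det(g) * g^{mu nu}
            d_det = (det2(plus) - det2(minus)) / (2 * h)
            det_err = max(det_err, abs(d_det - g_det * g_inv[mu][nu]))
            # d g^{ab}/d g_{mu nu} should equal -g^{a mu} g^{b nu}
            ip, im = inv2(plus), inv2(minus)
            for a in range(2):
                for b in range(2):
                    d_inv = (ip[a][b] - im[a][b]) / (2 * h)
                    expected = -g_inv[a][mu] * g_inv[b][nu]
                    inv_err = max(inv_err, abs(d_inv - expected))
    return det_err, inv_err

g = [[2.0, 0.5], [0.5, 3.0]]  # arbitrary symmetric example matrix
det_err, inv_err = max_identity_errors(g)
print(det_err, inv_err)
```

Both errors should be tiny compared with the size of the matrix entries, limited only by the finite-difference step.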

Now, if there is matter and energy within spacetime, we use a modified action, where $\mathcal{M}$ is the contribution to the action of the gravitating matter and energy:

$$
S = \int (\kappa R -\mathcal{M} )\sqrt{-g}~d^4x
$$

So the Lagrangian is:

$$
\mathscr{L} =  (\kappa R -\mathcal{M} )\sqrt{-g}
$$

Using the Euler-Lagrange field equations, writing $R = g^{\alpha \beta} R_{\alpha \beta}$ with dummy indices $\alpha$ and $\beta$, and using $\frac{\partial g^{\alpha \beta}}{\partial g_{\mu \nu}} = -g^{\alpha \mu} g^{\beta \nu}$, this becomes:

$$
-\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} \sqrt{-g} - \kappa \frac{1}{2\sqrt{-g}}g g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} - \frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} \sqrt{-g} + \frac{1}{2 \sqrt{-g}} gg^{\mu \nu} \mathcal{M} = 0
$$

First, we multiply by $-1$:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} \sqrt{-g} + \kappa \frac{1}{2\sqrt{-g}}g g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} + \frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} \sqrt{-g} - \frac{1}{2 \sqrt{-g}} gg^{\mu \nu} \mathcal{M} = 0
$$

Then we multiply by $\frac{1}{\sqrt{-g}}$, using $\frac{g}{(\sqrt{-g})^2} = -1$:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} - \kappa \frac{1}{2} g^{\mu \nu} g^{\alpha \beta} R_{\alpha \beta} + \frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} + \frac{1}{2} g^{\mu \nu} \mathcal{M} = 0
$$

We use the definition $R = g^{\alpha \beta} R_{\alpha \beta}$:

$$
\kappa R_{\alpha \beta} g^{\alpha \mu} g^{\beta \nu} - \kappa \frac{1}{2} g^{\mu \nu} R + \frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} + \frac{1}{2} g^{\mu \nu} \mathcal{M} = 0
$$

And by contraction with $g_{\mu \rho} g_{\nu \sigma}$ we have:

$$
\kappa R_{\rho \sigma} - \kappa \frac{1}{2} g_{\rho \sigma} R + \frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} g_{\mu \rho} g_{\nu \sigma} + \frac{1}{2} g_{\rho \sigma} \mathcal{M} = 0
$$

We can move the third and fourth terms, which depend on $\mathcal{M}$, to the right of the equation:

$$
\kappa R_{\rho \sigma} - \kappa \frac{1}{2} g_{\rho \sigma} R = -\frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} g_{\mu \rho} g_{\nu \sigma} - \frac{1}{2} g_{\rho \sigma} \mathcal{M}
$$

And factor the left-hand side of the equation:

$$
\kappa \left(R_{\rho \sigma} - \frac{1}{2} g_{\rho \sigma} R\right) = -\frac{\partial \mathcal{M}}{\partial g_{\mu \nu}} g_{\mu \rho} g_{\nu \sigma} - \frac{1}{2} g_{\rho \sigma} \mathcal{M}
$$

We recognize our familiar friend, the Einstein tensor, on the left. Renaming the free indices $\rho \to \mu$ and $\sigma \to \nu$ (and the dummy indices on the right to $\alpha$ and $\beta$, so that nothing clashes), we define a tensor $T_{\mu \nu}$ to equal the right-hand side:

$$
T_{\mu \nu} = -\frac{\partial \mathcal{M}}{\partial g_{\alpha \beta}} g_{\alpha \mu} g_{\beta \nu} - \frac{1}{2} g_{\mu \nu} \mathcal{M}
$$

Then we have the complete field equations:

$$
G_{\mu \nu} = \frac{1}{\kappa} T_{\mu \nu}
$$

The tensor $T_{\mu \nu}$ on the right is called the **stress-energy tensor**. There is no one "general formula" for the stress-energy tensor; we can define different expressions for the stress-energy tensor depending on what matter, energy, momentum, and stresses are present within the region of spacetime being analyzed, with the only real rule being that the resulting expression follow tensor algebra conventions (e.g. same number of free indices on both sides of the equation). One of the simplest expressions for a stress-energy tensor is:

$$
T_{\mu \nu} = \rho U_\mu U_\nu
$$

Here, $U_\mu$ and $U_\nu$ are four-velocities, as shown before in special relativity, and $\rho$ is the density of the gravitating matter-energy.
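
As a small illustration of this expression, the sketch below builds $T_{\mu \nu} = \rho U_\mu U_\nu$ for matter moving at an everyday speed, taking the four-velocity components to be $(\gamma c, \gamma \vec v)$. The density and velocity are arbitrary example values and the function names are hypothetical. The signs of the mixed time-space components depend on the metric sign convention, which is not fixed here, so only their magnitudes should be read off.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def four_velocity(v):
    """Components (gamma*c, gamma*v_x, gamma*v_y, gamma*v_z) for a 3-velocity v in m/s."""
    speed_sq = sum(vi * vi for vi in v)
    gamma = 1.0 / math.sqrt(1.0 - speed_sq / C ** 2)
    return [gamma * C] + [gamma * vi for vi in v]

def dust_stress_energy(rho, v):
    """T_{mu nu} = rho * U_mu * U_nu as a 4x4 nested list."""
    U = four_velocity(v)
    return [[rho * U[m] * U[n] for n in range(4)] for m in range(4)]

rho = 1000.0                                    # kg/m^3, roughly water (illustrative)
T = dust_stress_energy(rho, [30.0, 0.0, 0.0])   # matter moving at 30 m/s along x
ratio = T[0][1] / T[0][0]                       # equals v/c for this expression
print(T[0][0] / (rho * C ** 2), ratio)
```

For slow matter the $T_{00}$ component is essentially $\rho c^2$, and every other component is smaller by at least a factor of $v/c$.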

But back to the equation:

$$
G_{\mu \nu} = \frac{1}{\kappa} T_{\mu \nu}
$$

What is the constant $\kappa$? We will need to use the Newtonian limit of relativity to answer that question. When gravity is weak, and objects are moving much slower than the speed of light, we expect that we can recover Poisson's equation from the field equation. We will cover that in the following derivation.

Given that four-velocity is defined as $U_\mu = (\gamma c, \gamma \vec v)$, and that we are considering objects moving much slower than the speed of light (so slow that their speeds are effectively zero), we can say that $\gamma \approx 1$ and $U_\mu \approx (c, 0, 0, 0)$. Therefore, the component $T_{00}$ of the stress-energy tensor is just $\rho c^2$, and all other components of the stress-energy tensor are approximately zero. It can also be shown that, in this limit, the component $G_{00}$ of the Einstein tensor is equal to $\frac{2}{c^2} \nabla^2 \phi$, where $\phi$ is the Newtonian gravitational potential. Therefore:

$$
\frac{2}{c^2} \nabla^2 \phi = \frac{1}{\kappa} \rho c^2
$$
$$
\nabla^2 \phi = \frac{1}{2\kappa} \rho c^4
$$

Compare this with Poisson's equation:

$$
\nabla^2 \phi = 4\pi G\rho
$$

This means that:

$$
\frac{1}{2\kappa} \rho c^4 = 4\pi G\rho
$$

$$
\kappa = \frac{c^4}{8\pi G}
$$
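
To get a feel for the size of this constant, the short sketch below evaluates $\kappa$ in SI units and confirms that it turns the relativistic equation back into Poisson's equation. The numerical value of $G$ is the CODATA 2018 recommended value and the variable names are illustrative.

```python
import math

G_NEWTON = 6.67430e-11   # m^3 kg^-1 s^-2 (CODATA 2018 recommended value)
C = 299_792_458.0        # m/s, exact by definition

kappa = C ** 4 / (8 * math.pi * G_NEWTON)
coupling = 1 / kappa     # the factor 8*pi*G/c^4 that multiplies T_{mu nu}

# Newtonian-limit consistency: c^4 / (2*kappa) must equal 4*pi*G
lhs = C ** 4 / (2 * kappa)
rhs = 4 * math.pi * G_NEWTON
print(kappa, coupling, lhs, rhs)
```

The coupling $1/\kappa$ is of order $10^{-43}$ in SI units, which is one way of seeing why it takes an enormous amount of matter and energy to curve spacetime noticeably.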

Remember the field equations:

$$
G_{\mu \nu} = \frac{1}{\kappa} T_{\mu \nu}
$$

Now knowing the value of $\kappa$, we need only substitute to get:

$$
G_{\mu \nu} = \frac{8\pi G}{c^4} T_{\mu \nu}
$$

This elegant equation is the apotheosis of general relativity, and it rightfully deserves its place as one of the most famous equations in all of physics.

Note that there is an alternate form of the Einstein Field Equations that is sometimes easier to solve. To obtain it, we first write out the full equations:

$$
R_{\mu \nu} - \frac{1}{2} Rg_{\mu \nu} = \frac{8\pi G}{c^4} T_{\mu \nu}
$$

We now multiply both sides by $g^{\mu \nu}$:

$$
g^{\mu \nu} R_{\mu \nu} - \frac{1}{2} R g_{\mu \nu} g^{\mu \nu} = \frac{8\pi G}{c^4} T_{\mu \nu} g^{\mu \nu}
$$

Using the fact that $g_{\mu \nu} g^{\mu \nu} = 4$, along with $g^{\mu \nu} R_{\mu \nu} = R$ and $T_{\mu \nu} g^{\mu \nu} = T$, this becomes:

$$
R - \frac{1}{2} (4R) = \frac{8\pi G}{c^4} T \Rightarrow -R = \frac{8\pi G}{c^4} T
$$

So, substituting back into the original EFEs:

$$
R_{\mu \nu} +\frac{1}{2} \frac{8\pi G}{c^4} T g_{\mu \nu} = \frac{8\pi G}{c^4} T_{\mu \nu}
$$

$$
R_{\mu \nu} = \frac{8\pi G}{c^4} \left(T_{\mu \nu} - \frac{1}{2} g_{\mu \nu} T \right)
$$
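
This rearrangement can be sanity-checked with plain numbers. The sketch below uses the flat Minkowski metric $\mathrm{diag}(-1, 1, 1, 1)$ as a stand-in for $g_{\mu \nu}$ and an arbitrary symmetric matrix as a stand-in for $R_{\mu \nu}$ (neither is a real solution of the field equations; they are only assumptions for testing the algebra). It builds $T_{\mu \nu}$ from the original form of the equations and checks that the alternate form gives back the same $R_{\mu \nu}$.

```python
K = 2.5  # stand-in for 8*pi*G/c^4; any nonzero value works for this check

# Minkowski metric; it is its own inverse, so it is used for both g and g^{-1}
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def trace(t, g_inv):
    """Contract a rank-2 tensor with the inverse metric: g^{mu nu} t_{mu nu}."""
    return sum(g_inv[m][n] * t[m][n] for m in range(4) for n in range(4))

def einstein_tensor(ric, g, g_inv):
    """G_{mu nu} = R_{mu nu} - (1/2) R g_{mu nu}."""
    r = trace(ric, g_inv)
    return [[ric[m][n] - 0.5 * r * g[m][n] for n in range(4)] for m in range(4)]

def trace_reversed_rhs(T, g, g_inv):
    """K * (T_{mu nu} - (1/2) g_{mu nu} T), the right side of the alternate form."""
    t = trace(T, g_inv)
    return [[K * (T[m][n] - 0.5 * g[m][n] * t) for n in range(4)] for m in range(4)]

# Arbitrary symmetric stand-in for the Ricci tensor
ric = [[1.0, 0.2, 0.0, 0.1],
       [0.2, 3.0, 0.5, 0.0],
       [0.0, 0.5, -2.0, 0.4],
       [0.1, 0.0, 0.4, 0.7]]

G = einstein_tensor(ric, ETA, ETA)
T = [[G[m][n] / K for n in range(4)] for m in range(4)]   # from G = K * T
recovered = trace_reversed_rhs(T, ETA, ETA)
max_diff = max(abs(recovered[m][n] - ric[m][n]) for m in range(4) for n in range(4))
print(max_diff)
```

The difference should be zero up to floating-point rounding, confirming that the two forms of the equations carry the same information.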

This makes the field equations simpler for vacuum solutions, where $T_{\mu \nu} = T = 0$. Thus, the equations just become:

$$
R_{\mu \nu} = 0
$$

which is still incredibly hard to solve, but more manageable than the typical case.

Finally, there is one more important fact about the field equations: the covariant divergence of both sides is equal to zero. This means that:

$$
\nabla^\mu T_{\mu \nu} = 0
$$

This expression may look familiar if we recall that a covariant derivative contracted over a repeated index is just the divergence of a field. What this is saying is that matter-energy is neither created nor destroyed at any point in spacetime - essentially, the local conservation of energy and momentum.