---

# <u> **The Naïve, Barnes-Hut, and Fast Multipole Methods**  </u>
### *Implementation, Analysis, and Evaluation of different Algorithms for $N$-Body Simulations*  

**Author:** Hugo Robijns
**Date:** April 2025  

---

### **Introduction:**

This report analyses and implements two hierarchical algorithms for $N$-body simulation, the Barnes-Hut algorithm and the fast multipole method (FMM), alongside the brute-force $\mathcal{O}(N^2)$ approach, in order to compare their performance and complexity. For the purpose of comparison, we will base the analysis on the simple case of a 2D simulation of point masses under the influence of gravity.  


In [2]:
from IPython.display import HTML, Image, Video

_Figure 1: a simulation of X_

---
## **1. Analysis of Algorithms**  


### *1.1 Naïve pairwise calculation, $\mathcal{O}(N^2)$*

By Newton's Law of Gravitation, the gravitational force experienced by particle $i$ (mass $m_i$, position $\vec{r}_i$) due to a particle $j$ (mass $m_j$, position $\vec{r}_j$) is given by:
$$ \vec{F}_{ij} = \frac{Gm_im_j}{|\vec{r}_j-\vec{r}_i|^2}\cdot \frac{\vec{r}_j-\vec{r}_i}{|\vec{r}_j-\vec{r}_i|} = \frac{Gm_im_j (\vec{r}_j-\vec{r}_i)}{|\vec{r}_j-\vec{r}_i|^3} $$
Therefore, for an $N$-body system, the equations of motion become:
$$m_i \frac{\text{d}^2\vec{r}_i}{\text{d}t^2}  = \sum_{j=1, j\neq i}^{N} \frac{Gm_im_j (\vec{r}_j-\vec{r}_i)}{|\vec{r}_j-\vec{r}_i|^3}$$ 
To simulate such a system directly, we must evaluate $N-1$ interactions for each of the $N$ bodies (or $N(N-1)/2$ distinct pairs, if exploiting Newton's $3^\text{rd}$ law), leading to a theoretical complexity of $\mathcal{O}(N^2)$. Note also that the force between two bodies diverges as they approach each other: this requires the inclusion of a softening parameter $\epsilon$, which will be discussed further in section X. Once the net force, and therefore the acceleration, on each body has been calculated, its position and velocity can be updated through integration.
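The direct sum above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this report: the function name `pairwise_accelerations`, the default values of `G` and `eps`, and the Plummer-style softening $|\vec{r}|^2 \rightarrow |\vec{r}|^2 + \epsilon^2$ are all assumptions made for the example.

```python
import numpy as np

def pairwise_accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-sum accelerations for N bodies in 2D (illustrative sketch).

    pos  : (N, 2) array of positions
    mass : (N,) array of masses
    eps  : softening length; the Plummer form used here is an assumption
    """
    # diff[i, j] = r_j - r_i, shape (N, N, 2)
    diff = pos[None, :, :] - pos[:, None, :]
    dist2 = np.sum(diff**2, axis=-1) + eps**2
    with np.errstate(divide="ignore"):
        inv_d3 = dist2 ** -1.5
    # Exclude the self-interaction j = i (also removes the inf that appears when eps = 0).
    np.fill_diagonal(inv_d3, 0.0)
    # a_i = G * sum_j m_j (r_j - r_i) / (|r_j - r_i|^2 + eps^2)^(3/2)
    return G * np.einsum("ij,j,ijk->ik", inv_d3, mass, diff)
```

Building the full $N \times N$ array of separations makes the $\mathcal{O}(N^2)$ cost explicit in both time and memory. Because the pairwise forces are equal and opposite, the mass-weighted sum of the returned accelerations should vanish up to rounding, which gives a quick sanity check.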

### *1.2 Barnes-Hut Algorithm, $\mathcal{O}(N\log N)$*  
The Barnes-Hut algorithm, introduced by Josh Barnes and Piet Hut in 1986 [CITE], attempts to reduce complexity by treating distant clusters of bodies as a single mass. It does this through the use of tree data structures, specifically a quadtree in 2D.

##### 1.2.1 Tree construction
Beginning with a topmost node that represents the whole simulation space, a quadtree is formed by recursively subdividing the space into 4 quadrants until each body lies in its own leaf node, i.e. a node with no children (see Figure 2). Every node, whether internal or leaf, stores the total mass and centre of mass position of the bodies it contains. Formally, the tree is constructed by inserting bodies into the data structure one after the other:

1. if the node into which the body is being placed is an empty leaf node, the body is simply placed there and the total mass and centre of mass position of the node are trivially updated.

2. if the node is an internal node (i.e. a node with children), the total mass and centre of mass position of the node are updated, and the body moves down a level in the tree to the appropriate quadrant.

3. if the node is a leaf node which already contains a body, the total mass and centre of mass position of the node are updated, and the node is subdivided into 4 quadrants. Then, both bodies are placed into their appropriate quadrants as above, which may require further subdivision if they end up in the same quadrant again.
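The three insertion cases can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this report: the class name `Node`, its attribute names and the quadrant ordering are hypothetical, masses are assumed to be positive, and no two bodies may share exactly the same position (otherwise case 3 would subdivide without end).

```python
class Node:
    """Square cell of a 2D quadtree (illustrative names throughout)."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half  # cell centre and half side length
        self.mass = 0.0
        self.com = (0.0, 0.0)   # centre of mass of all bodies inside the cell
        self.body = None        # (x, y, m) when this is a leaf holding one body
        self.children = None    # list of 4 child nodes when this is an internal node

    def _subdivide(self):
        h = self.half / 2
        # Order: SW, SE, NW, NE, matching the index computed in _child_for.
        self.children = [Node(self.cx + dx * h, self.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]

    def _child_for(self, x, y):
        # +1 if the point lies east of the centre, +2 if it lies north of it.
        return self.children[(x >= self.cx) + 2 * (y >= self.cy)]

    def insert(self, x, y, m):
        was_empty_leaf = self.children is None and self.body is None
        # All three cases update the total mass and centre of mass of the node.
        total = self.mass + m
        self.com = ((self.com[0] * self.mass + x * m) / total,
                    (self.com[1] * self.mass + y * m) / total)
        self.mass = total
        if was_empty_leaf:                  # case 1: empty leaf
            self.body = (x, y, m)
        elif self.children is not None:     # case 2: internal node
            self._child_for(x, y).insert(x, y, m)
        else:                               # case 3: occupied leaf
            old, self.body = self.body, None
            self._subdivide()
            self._child_for(old[0], old[1]).insert(*old)
            self._child_for(x, y).insert(x, y, m)
```

A tree over the unit square would then be built with `root = Node(0.5, 0.5, 0.5)` followed by one `root.insert(x, y, m)` call per body.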


In [2]:
HTML('<div style="text-align: center;"><img src="figures/quadtree_plot.png" width="800"/></div>')

_Figure 2: a visualisation of the quadtree constructed for 10 bodies in a 2D simulation. The square grid-lines seen in the left plot are the spatial borders of the leaf nodes, the smallest subdivisions of the tree which have no children of themselves, and therefore contain either 0 or 1 bodies. The figure on the right shows a schematic of the quadtree: hollow markers indicate empty nodes, and solid markers indicate nodes containing bodies._

##### 1.2.2 Force calculation
To calculate forces, the tree is recursively traversed from the root for each body:

1. If the node is a leaf node and contains a body that is not itself, then the force is calculated in the usual way.

2. If the node is an internal node, the ratio: $$\frac{\text{size of node in space (i.e. side length of the square)}}{\text{distance from body to centre of mass of node}} \equiv \frac{s}{d} $$ is calculated. If this is less than a threshold value $\theta$ (usually taken to be $\sim$ 1), the bodies in the node are considered sufficiently closely packed and far away to be treated as one single body, ignoring the node's children. The force on the body is calculated in the usual way, using the centre of mass position and total mass of the node.

3. If $s/d \geq \theta$, the process continues recursively for each of the children of the node, until reaching a leaf node or a node which satisfies the condition.

These forces are then summed to give an overall acceleration for the body, allowing positions and velocities to be updated through integration as in the naïve approach. The parameter $\theta$ determines the accuracy of the simulation: a larger $\theta$ leads to fewer calculations but a more approximate solution, whilst a smaller $\theta$ takes longer but provides a more reliable solution. In the limit $\theta\rightarrow0$ the algorithm reduces to the brute-force approach, since no internal nodes are treated as single bodies.
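The traversal can be sketched as below. To keep the example self-contained, the tree is built here by a compact recursive partition into dictionaries rather than by the one-by-one insertion of section 1.2.1; that choice, the function names `build` and `bh_acceleration`, the default `theta` and the Plummer-style softening `eps` are all assumptions made for illustration. Bodies are assumed to have distinct positions.

```python
import numpy as np

def build(idx, pos, mass, centre, half):
    """Recursively build a quadtree cell for the bodies listed in idx."""
    node = {"half": half, "mass": mass[idx].sum(), "children": [], "index": None}
    if len(idx) == 1:                       # leaf: remember which body it holds
        node["index"] = idx[0]
        node["com"] = pos[idx[0]]
        return node
    node["com"] = (mass[idx][:, None] * pos[idx]).sum(axis=0) / node["mass"]
    east = pos[idx, 0] >= centre[0]
    north = pos[idx, 1] >= centre[1]
    for e in (False, True):
        for n in (False, True):
            sub = idx[(east == e) & (north == n)]
            if len(sub):                    # empty quadrants are simply not stored
                shift = 0.5 * half * np.array([1.0 if e else -1.0, 1.0 if n else -1.0])
                node["children"].append(build(sub, pos, mass, centre + shift, 0.5 * half))
    return node

def bh_acceleration(node, i, pos, theta=0.5, G=1.0, eps=0.0):
    """Acceleration on body i from the bodies inside node."""
    if node["index"] == i:
        return np.zeros(2)                  # a body exerts no force on itself
    d_vec = node["com"] - pos[i]
    d = np.hypot(d_vec[0], d_vec[1])
    is_leaf = node["index"] is not None
    # Rule 1 (leaf) or rule 2 (s/d < theta): treat the node as a single body.
    if is_leaf or 2.0 * node["half"] < theta * d:
        return G * node["mass"] * d_vec / (d * d + eps * eps) ** 1.5
    # Rule 3: otherwise open the node and recurse into its children.
    return sum(bh_acceleration(c, i, pos, theta, G, eps) for c in node["children"])
```

For bodies in the unit square, the root would be created with `build(np.arange(len(pos)), pos, mass, np.array([0.5, 0.5]), 0.5)`. Setting `theta=0` never accepts an internal node, so the result should match the direct sum up to rounding, which is a convenient correctness check.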

##### 1.2.3 Complexity
For a _balanced_ quadtree with $h$ levels, the bottom level has $4^h$ nodes. Therefore:
$$ 4^h \geq N \Rightarrow h \sim \log_4 N = \frac{\log_2 N}{\log_2 4} = \frac{1}{2} \log_2 N \text{ or } \mathcal{O}(\log N)$$
Insertion of each of the $N$ bodies requires travelling the $h$ levels of the tree from root to leaf, therefore the overall complexity of constructing a quadtree is given by $\mathcal{O}(N\log N)$. 
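As a rough worked instance of this estimate (the value $N = 10^6$ is chosen purely for illustration, and a uniform distribution is assumed):

$$ h \sim \log_4 10^6 = \frac{6}{\log_{10} 4} \approx 9.97 \;\Rightarrow\; h = 10, \qquad 4^{10} = 1\,048\,576 \geq 10^6 $$

so each insertion descends about 10 levels, and building the tree takes of order $N h \approx 10^7$ node visits.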

The complexity of the force calculation step is more involved. A typical line of reasoning (as laid out in the original paper) is as follows: again assuming a balanced quadtree (i.e. homogeneously distributed mass), consider quadrupling the number of bodies. This is equivalent to adjoining 3 root nodes to the existing one. To calculate the force on a body situated in the 'original tree', these additional bodies will contribute a fixed number of extra calculations, which is a function of $\theta$ but not $N$. To picture this, consider $\theta \gg 1$: this will simply add 3 force calculations, since the centres of mass of all three of the 'additional root nodes' are far enough away that each can be treated as a single body. Reducing $\theta$ to more reasonable values will increase the number of children of these 'additional nodes' that need to be explored, but assuming $N$ is large enough (so that this further exploration does not reach leaf nodes), the number of additional calculations does not depend on $N$. Since the number of force calculations increases by an additive constant (a function of $\theta$) whilst $N$ increases by a multiplicative factor, the complexity goes as $\mathcal{O}(\log N)$ for a single body, or $\mathcal{O}(N\log N)$ for $N$ bodies.

Summing the tree construction and force calculation complexities leads to an overall complexity of $\mathcal{O}(N\log N)$, a major improvement on the naïve $\mathcal{O}(N^2)$.


### *1.3 Fast Multipole Method (FMM), $\mathcal{O}(N)$* 
The FMM algorithm was introduced by Leslie Greengard and Vladimir Rokhlin Jr. in the 1980s [CITE]. It uses a hierarchical decomposition similar to that of the Barnes-Hut algorithm, but works with more accurate local and multipole expansions of the potential rather than just the total mass and centre of mass of a node. Re-use of computation by traversing the tree in both directions allows the complexity to be reduced to $\mathcal{O}(N)$ for any prescribed precision.

A high-level overview of the algorithm is as follows:

1. A quadtree is constructed as in the Barnes-Hut method, however the approach need not be as rigorous, since the algorithm simply requires leaf nodes to house a small ($\mathcal{O}(1)$) number of bodies.

2. The tree is traversed from bottom to top, calculating the multipole expansion for the potential produced by each node at a point far away.

3. The tree is traversed from top to bottom, calculating the local expansion, or the potential felt at each node due to other nodes far away.

4. At the bottom 'leaf' level, direct pairwise interactions are calculated with nearest neighbours. Since step 1 ensured that each leaf node holds only a small number of bodies, this is inexpensive.

##### 1.3.1 Multipole expansions and translations


##### 1.3.1 Quadtree construction
The quadtree used in the Barnes-Hut implementation was adaptive, i.e. could respond to non-uniform distribution of mass and therefore had leaf cells of different sizes. Since this approach simply requires leaf nodes to have a small number of particles, we can make this step very simple and just make a balanced quadtree where the depth is $h=\text{round}(\log_4 N)$.

<center> **IMAGE** </center>

Since we simply need to iterate through all $N$ bodies and place each in its respective leaf node, irrespective of whether this node already houses a body, the complexity of this step is $\mathcal{O}(N)$.
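A minimal sketch of this single pass is given below, assuming a square simulation box described by a lower-left corner `box_min` and a side length `box_size`. These names, the function name `bin_into_leaves` and the row-major numbering of the leaves are hypothetical choices made for the example.

```python
import numpy as np

def bin_into_leaves(pos, box_min, box_size):
    """Assign each body to a leaf of a balanced quadtree of depth h = round(log_4 N)."""
    n = len(pos)
    h = max(int(round(np.log(n) / np.log(4))), 0)
    cells = 2 ** h                      # leaf cells per side, 4**h leaves in total
    # Integer grid coordinates of each body; clipping keeps bodies that sit
    # exactly on the upper edge of the box inside the last cell.
    ij = np.clip(((pos - box_min) / box_size * cells).astype(int), 0, cells - 1)
    leaf_index = ij[:, 1] * cells + ij[:, 0]    # row-major leaf number
    return h, leaf_index
```

Each body is touched exactly once and no tree traversal is needed, which is where the linear cost comes from. The occupancy of every leaf can then be obtained with `np.bincount(leaf_index, minlength=4**h)`.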

##### 1.3.2 Bottom-up pass
Starting from the leaf nodes, the multipole expansion for each node is calculated using analytical expressions.
$$...$$

Going up the tree, the expansion of each node can be formed by aggregating the expansions of its children, using a 'multipole-to-multipole translation':
$$ ...$$

Each node therefore stores its multipole expansion coefficients for later use. This step is $\mathcal{O}(N)$ since it is essentially the (weighted) summation of $N$ numbers. Further aggregation to calculate the expansion for higher nodes is also linear, leading to overall linear complexity.
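To make the two operations of this pass concrete, the sketch below uses the complex-variable form of the original 2D FMM, in which positions are complex numbers $z = x + iy$ and the potential of a source of strength $q$ at $z_i$ is $q\log|z - z_i|$. This is an assumption made for illustration: the logarithmic kernel is not the same as the $1/r^2$ force law of section 1.1, and the expressions are not necessarily those intended for the elided equations above. The function names `p2m`, `m2m` and `evaluate`, and the truncation order `p`, are likewise hypothetical.

```python
import numpy as np
from math import comb

def p2m(z, q, centre, p):
    """Multipole coefficients a_0..a_p about centre for strengths q at complex positions z."""
    a = np.zeros(p + 1, dtype=complex)
    a[0] = q.sum()
    dz = z - centre
    for k in range(1, p + 1):
        a[k] = -(q * dz**k).sum() / k
    return a

def m2m(a, shift):
    """Re-centre an expansion from a child centre to its parent; shift = child - parent."""
    p = len(a) - 1
    b = np.zeros_like(a)
    b[0] = a[0]
    for l in range(1, p + 1):
        b[l] = -a[0] * shift**l / l + sum(
            a[k] * shift**(l - k) * comb(l - 1, k - 1) for k in range(1, l + 1))
    return b

def evaluate(a, centre, z):
    """Potential of the truncated expansion at a point z far from centre."""
    dz = z - centre
    series = sum(a[k] / dz**k for k in range(1, len(a)))
    return (a[0] * np.log(dz) + series).real
```

In this form, shifting a child's coefficients with `m2m` gives the same coefficients as expanding the child's bodies about the parent centre directly, so a parent's expansion is the sum of the shifted expansions of its children and never needs to revisit individual bodies.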

##### 1.3.3 Top-down pass
Working downwards from the top of the tree, we calculate the local (Taylor) expansion of the potential felt at each node. Firstly, the local expansion of the parent node (if it exists) is shifted to the centre of the new node using a 'local-to-local translation':

$$...$$

Then, the multipole expansion of non-adjacent (far away) nodes is translated into a local expansion at the location of the node, in a 'multipole-to-local translation':

$$...$$


<center> **IMAGE** </center>


Finally, if we have reached the bottom leaf level, we calculate the direct contributions from the few remaining bodies.

##### 1.3.4 Complexity

---

## **2. Implementation and Performance**

This section discusses how each method was implemented, including:  
- **Data structures used (e.g., trees, arrays, linked lists)**  
- **Optimization strategies (e.g., parallelization, caching, cutoff thresholds)**  
- **Performance metrics (e.g., execution time, memory usage, error analysis)**  

A performance comparison will be provided using **timing benchmarks** for different N.  

---

## **3. Results and Discussion**


### _3.1 Complexities as a function of N_
- showing number of calculations as a function of N
- showing time as a function of N
discussion

### _3.2 Accuracy (conservation of energy, linear momentum and angular momentum)_

### _3.3 Extensions_
##### 3.3.1 3D
##### 3.3.2 Parallelisation
---

## Bibliography  

1. J. Barnes and P. Hut, *A Hierarchical O(N log N) Force-Calculation Algorithm*, Nature, 1986.  
2. L. Greengard and V. Rokhlin, *A Fast Algorithm for Particle Simulations*, Journal of Computational Physics, 1987.  
3. D. J. Griffiths, *Introduction to Electrodynamics*, Cambridge University Press, 2017.  