Skip to content
Fast N-body gravitational force and potential in Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


pykdgrav is a package that implements the Barnes-Hut method for computing the combined gravitational field and/or potential of N particles. We implement a kd-tree as a numba jitclass to achieve much higher peformance than the equivalent pure Python implementation, without writing a single line of C or Cython.

Despite the similar name, this project has no affiliation with the N-body code pkdgrav, however it is where I got the idea to use a kd-tree instead of an octree.


First let's import the stuff we want and generate some particle positions and masses

import numpy as np
from pykdgrav import Accel, Potential
from pykdgrav.bruteforce import *
x = np.random.rand(10**5,3) # positions randomly sampled in the unit cube
m = np.random.rand(10**5) # masses

Now let's compare the runtimes of the tree methods and brute force methods for computing the potential and acceleration. Note that all functions are jit-compiled by Numba, so the brute force in particular will run at C-like speeds.

%time phi_tree = Potential(x,m)
%time a_tree = Accel(x,m)
%time phi_brute = BruteForcePotential(x,m)
%time a_brute = BruteForceAccel(x,m)
CPU times: user 2.26 s, sys: 62 ms, total: 2.32 s
Wall time: 2.32 s
CPU times: user 3.94 s, sys: 66 ms, total: 4.01 s
Wall time: 4.01 s
CPU times: user 29.5 s, sys: 203 ms, total: 29.7 s
Wall time: 29.6 s
CPU times: user 1min 2s, sys: 274 ms, total: 1min 2s
Wall time: 1min 2s

pykdgrav also supports OpenMP multithreading, but no support for higher parallelism is implemented nor planned. We can make it even faster by running in parallel (here on a dual-core laptop):

%time a_tree = Accel(x,m,parallel=True)
CPU times: user 5.38 s, sys: 83.8 ms, total: 5.46 s
Wall time: 2.18 s

Nice, basically perfect scaling.

The treecode will almost always be faster than brute force for particle counts greater than ~10000. Below is a tougher benchmark for more realistic problem, run on a single core on my laptop. The particles were arranged in a Plummer distribution and an opening angle of 0.7 was used instead of the default 1: Benchmark

The method is approximate, using a Barnes-Hut opening angle of 1 by default; we can check the RMS force error here:

delta_a = np.sum((a_brute-a_tree)**2,axis=1)
amag = np.sum(a_brute**2,axis=1)
print("RMS force error: %g"%np.sqrt(np.average(delta_a/amag)))
RMS force error: 0.0345507

We can improve the accuracy by choosing a smaller theta:

a_tree = Accel(x,m,parallel=True, theta=0.7)
delta_a = np.sum((a_brute-a_tree)**2,axis=1)
amag = np.sum(a_brute**2,axis=1)
print("RMS force error: %g"%np.sqrt(np.average(delta_a/amag)))
RMS force error: 0.0123373

Planned Features

  • Greater OpenMP parallelism, e.g. in the tree-build algorithm.
  • Support for computing approximate correlation and structure functions.
  • Gravitational softening with abitrary kernels.

Stay tuned!

You can’t perform that action at this time.