REal-space VOid Locations from surVEy Reconstruction
Repository containing code to:
- (optional) reconstruct real-space positions from redshift-space tracer data, by removing redshift-space distortions (RSD) through FFT-based reconstruction
- apply a void-finding algorithm to create a catalogue of voids in these tracers
The tracers used will normally be galaxies from a redshift survey, but could also be halos or dark matter particles from a simulation box.
Two different void-finding routines are provided. Both operate by identifying minima of the tracer density field, but differ in the method of reconstructing the tracer density.
The ZOBOV method is based on Neyrinck 2008 (arXiv:0712.3049) and uses Voronoi tessellation to estimate the local tracer density from the discrete tracer input.
The voxel method estimates the density field using a particle-mesh interpolation of tracer positions on a grid. For survey data this is normalized by the values for the random catalogue characterizing the survey window function and selection effects. This method is intended to make more efficient use of fragmented survey data, but has not yet been tested to publication standard. If you are interested in helping with this, please get in touch!
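As a rough illustration of the voxel idea, the sketch below bins tracers and randoms on a common grid with nearest-grid-point assignment (the simplest particle-mesh scheme) and normalizes the tracer counts by the rescaled random counts. It is not the code used in this repository: the function name `voxel_density_contrast`, the cubic grid bounds and the choice of nearest-grid-point assignment are all assumptions made for the example.

```python
import numpy as np

def voxel_density_contrast(tracers, randoms, box_min, box_max, nbins):
    """Illustrative nearest-grid-point density contrast, normalized by randoms."""
    # Same bin edges along each of the three axes (hypothetical cubic grid)
    edges = [np.linspace(box_min, box_max, nbins + 1)] * 3
    n_t, _ = np.histogramdd(tracers, bins=edges)
    n_r, _ = np.histogramdd(randoms, bins=edges)
    # Rescale the random counts so that they sum to the total tracer count
    alpha = n_t.sum() / n_r.sum()
    expected = alpha * n_r
    # Voxels containing no randoms lie outside the survey; mark them NaN
    delta = np.full(n_t.shape, np.nan)
    inside = expected > 0
    delta[inside] = n_t[inside] / expected[inside] - 1.0
    return delta

# Toy data: uniform tracers and 50x denser randoms in a 100-unit box
rng = np.random.default_rng(0)
tracers = rng.uniform(0.0, 100.0, size=(2000, 3))
randoms = rng.uniform(0.0, 100.0, size=(100000, 3))
delta = voxel_density_contrast(tracers, randoms, 0.0, 100.0, nbins=8)
```

Minima of a field like `delta` are the kind of starting points from which voids are then grown.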
Both methods then use a modified version of the original
ZOBOV watershed algorithm to grow voids around these
minima, and employ additional post-processing and quality control steps to generate the final output catalogues.
- python (default Python3, but for now should still work with 2.7 as well)
Some earlier versions of
scipy will fail due to changes in functionality of some methods
(scipy.spatial.cKDTree). The code has only been tested with the stated versions of the
other packages: other versions may or may not work!
Installation and running:
To install and run:
- if you don't have MPI compilers/headers, in the Makefile change the line
make -C src all
to
make -C src all_nompi
- in the top-level directory, do
make clean, then
- edit input parameters as required in parameters/params.py (for full parameter list and instructions, see parameters/default_params.py)
python revolver.py --par parameters/params.py
Input tracer data can be provided in FITS files, native
numpy .npy files, or plain ASCII files. At a minimum
these files should contain tracer positions, either as (RA, Dec, z) or Cartesian (X, Y, Z). Additional information on
systematics or FKP weights for survey data can also be provided. For FITS files these are assumed to be in data fields
as for BOSS/eBOSS.
For numpy arrays and ASCII files they can be given as additional columns;
see the comments in parameters/default_params.py for more information about the file formatting.
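A minimal sketch of writing a tracer catalogue as a native numpy file might look like the following. The column order (RA, Dec, z, then one extra weight column) and the file name are assumptions made for illustration only; the layout the code actually expects is described in the comments in parameters/default_params.py.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(1)
n = 5
ra = rng.uniform(150.0, 160.0, n)   # right ascension in degrees
dec = rng.uniform(0.0, 10.0, n)     # declination in degrees
z = rng.uniform(0.4, 0.6, n)        # redshift
weight = np.ones(n)                 # hypothetical additional weight column

# One row per tracer, one column per quantity (assumed column order)
catalogue = np.column_stack([ra, dec, z, weight])

# Hypothetical output location; a temporary directory keeps the example tidy
path = os.path.join(tempfile.mkdtemp(), "tracers.npy")
np.save(path, catalogue)

# Reload to confirm the array round-trips unchanged
loaded = np.load(path)
```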
For uniform simulation data in a cubic box, only the tracer data is required.
For survey-like data on the sky, an additional input file containing the randoms characterising the survey visibility mask (as used for galaxy clustering studies) is required for reconstruction and voxel-based void-finding. If you are using data from a survey, this file should be easy to obtain. If not (e.g. analysing mock survey data from a simulation), you can generate your own randoms file. Make sure it has a much higher number density (>=50x) than the tracers.
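For mock data, one simple way to build such a randoms file is sketched below. It assumes a rectangular RA/Dec footprint with no holes and uniform completeness, which is rarely true for real surveys, and it draws random redshifts from the tracer redshifts so that the randoms follow the same redshift distribution. The function name `make_randoms` and its arguments are hypothetical and not part of this code.

```python
import numpy as np

def make_randoms(ra, dec, z, factor=50, seed=0):
    """Illustrative randoms for a rectangular RA/Dec footprint (degrees)."""
    rng = np.random.default_rng(seed)
    n_rand = factor * len(ra)
    # Uniform in RA across the footprint
    ra_r = rng.uniform(ra.min(), ra.max(), n_rand)
    # Uniform in sin(Dec) gives a uniform density per unit area on the sphere
    s_lo = np.sin(np.radians(dec.min()))
    s_hi = np.sin(np.radians(dec.max()))
    dec_r = np.degrees(np.arcsin(rng.uniform(s_lo, s_hi, n_rand)))
    # Resample tracer redshifts so the randoms share the tracer n(z)
    z_r = rng.choice(z, size=n_rand, replace=True)
    return np.column_stack([ra_r, dec_r, z_r])

# Toy mock tracers
rng = np.random.default_rng(2)
ra = rng.uniform(150.0, 160.0, 200)
dec = rng.uniform(0.0, 10.0, 200)
z = rng.uniform(0.4, 0.6, 200)
randoms = make_randoms(ra, dec, z, factor=50)
```

With `factor=50` the randoms have 50 times as many points as the tracers, matching the minimum density ratio recommended above.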
For operation on survey-like data on the sky, the
ZOBOV method needs a binary survey mask file (in
FITS format) combining the survey geometry, holes, missing pixels etc. Example masks for the BOSS DR12 public data
releases are provided with this code. If no mask file is provided, an approximate one will be generated on the fly from
the survey data, but this may give less accurate results. If using
ZOBOV void-finding standalone (i.e. with
no reconstruction and no
voxel void-finding), then no randoms catalogue is required.
MPI and parallel processing:
For ZOBOV-based void-finding, there is an option to perform the slow tessellation step in
separate chunks run in parallel, achieved using MPI (set
use_mpi = True in
the params.py file). If you have many (i.e. >~10) CPUs available, this can
be faster than doing it in one shot. If not, single-shot tessellation is usually faster.
If your data are in a simulation box with periodic boundary conditions, the code will always break the tessellation
into chunks (single-shot is not available in this case). If
use_mpi is False, these chunks will be run serially.
It may then be more efficient to reduce the value of input parameter
zobov_box_div (integer, minimum 2,
Separately, the FFTs used in reconstruction can be performed over multiple CPUs (without using MPI). This is always
beneficial: the more threads, the faster. Set
nthreads in the parameter file to the number of cores available.
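The three parallelism settings named in this section could appear in params.py roughly as follows. The values are purely illustrative and should be adapted to your machine; the parameter file may contain other related settings not shown here.

```python
# Illustrative values only; adjust to the hardware available
use_mpi = True       # run the tessellation chunks in parallel using MPI
zobov_box_div = 2    # integer, minimum 2
nthreads = 8         # number of cores available for the reconstruction FFTs
```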
Log files for various steps are generated and stored in the
log/ subfolder in the output directory. Check
these logs (even if the code appears to have produced successful output):
- if you see warnings about the number of guard particles, try increasing the value of
- if you see warnings about cells with zero volume, check your input data for bad/duplicate tracer positions (these will cause the tessellation to fail)
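A quick pre-check for bad or duplicate positions could look like the sketch below. The helper `check_tracer_positions` is hypothetical and not part of this code; it only counts rows containing non-finite values and exact duplicate rows, so near-duplicates would need a separate tolerance-based test.

```python
import numpy as np

def check_tracer_positions(pos):
    """Count rows with non-finite values and exact duplicate rows."""
    pos = np.asarray(pos, dtype=float)
    # Rows containing NaN or inf in any coordinate
    bad = ~np.isfinite(pos).all(axis=1)
    # Among the finite rows, count repeats beyond the first occurrence
    _, counts = np.unique(pos[~bad], axis=0, return_counts=True)
    n_dup = int((counts - 1).sum())
    return int(bad.sum()), n_dup

pos = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 3.0],     # exact duplicate of the first row
    [4.0, 5.0, 6.0],
    [np.nan, 0.0, 0.0],  # bad position
])
n_bad, n_dup = check_tracer_positions(pos)
print(n_bad, n_dup)  # prints "1 1"
```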
The following people contributed to the concept, development and testing of this code in various ways:
- Hans Winther
- Slađana Radinovic
- Julian Bautista
- Paul Carter
- Will Percival
- Shaun Hotchkiss