The scripts described below are wrappers of the Corrfunc code (Sinha & Garrison 2017), used to create full-survey and jackknife correlation functions. The former are used in the computation of the Gaussian covariance matrices, and the latter allow for determination of the shot-noise rescaling parameter, if the JACKKNIFE mode is used. If the correlation function is required to be computed in a different manner, user-input correlation functions can simply replace the output of these codes, with the file-types described in file-inputs
.
To compute the covariance matrix estimates Ĉab we require some estimate of the correlation function. Here, we compute the full-survey correlation function with a specified binning using Corrfunc. Specialized routines are provided for periodic and aperiodic data-set, and each can compute either a single correlation function or the three non-trivial correlation functions ({ξ11(r, μ), ξ12(r, μ), ξ22(r, μ)}) from a pair of input fields. Both scripts require a correlation function binning file, such as created by write-binning-file
.
In the periodic case, we use the standard estimator main-code
.
For the aperiodic case, the estimations are computed via the Landy-Szalay (1993) estimator, using
The scripts output a (set of) correlation function(s) in a supplied directory. These give estimates of the correlation function averaged over some r, μ bin. Note that the binned correlation function estimates ξ̂aXY cannot simply placed at the bin-centers and interpolated to give a smooth ξXY(r, μ) estimate; within the main RascalC code we use an iterative approach to convert these estimates into interpolation points for the smooth ξXY(r, μ).
Usage
For analysis of a periodic box (e.g. from an N-body simulation output):
python python/xi_estimator_periodic.py {GALAXY_FILE} {RADIAL_BIN_FILE} {BOXSIZE} {MU_MAX} {N_MU_BINS} {NTHREADS} {OUTPUT_DIR} [{GALAXY_FILE_2}]
For an analysis of an aperiodic data-set (e.g. mock galaxy catalogs or observational data):
python python/xi_estimator_aperiodic.py {GALAXY_FILE} {RANDOM_FILE_DR} {RANDOM_FILE_RR} {RADIAL_BIN_FILE} {MU_MAX} {N_MU_BINS} {NTHREADS} {OUTPUT_DIR} [{GALAXY_FILE_2} {RANDOM_FILE_2_DR} {RANDOM_FILE_2_RR}] [{RR_counts_11}] [{RR_counts_12} {RR_counts_22}]
NB: If a second galaxy (and random) file is specified, the script will compute all three non-trivial (cross-)correlations between the two fields, giving a runtime of ∼ 3 times that of the single-field case. The two fields should be distinct to avoid issues with double counting. If this is not specified, the script will simply compute the auto-correlation function of the single set of galaxies.
Input Parameters
- {GALAXY_FILE}, {GALAXY_FILE_2}: Input ASCII file(s) containing galaxy positions and weights in {x,y,z,weight,jackknife_ID} format such as that created with the
pre-processing
scripts. (Jackknives are not used in this script and may be omitted). This should be in.csv
,.txt
or.dat
format with space-separated columns. For periodic computation, weights are set to unity if not included. - (Aperiodic Only): {RANDOM_FILE_DR}, {RANDOM_FILE_2_DR}: Input ASCII file containing random particle positions and weights to be used for DR pair counting (with filetype as for the galaxy files).
- (Aperiodic Only): {RANDOM_FILE_RR}, {RANDOM_FILE_2_RR}: Input ASCII file containing random particle positions and weights to be used for RR pair counting (with filetype as for the galaxy files). NB: If pre-computed RR pair counts are specified only the length of the RR random file is used by the code (for normalization).
- {RADIAL_BIN_FILE}: ASCII file specifying the radial bins for ξ(r, μ), as described in
file-inputs
. This can be user-defined or created by thewrite-binning-file
scripts. NB: This bin-file specifies the bins for the correlation function, which may be distinct from the covariance-matrix bins. In particular, the lowest bin should extend to r = 0. - {MU_MAX}: Maximum μ = cos θ used in the angular binning.
- {N_MU_BINS}: Number of angular bins used in the range [0, μmax].
- {NTHREADS}: Number of CPU threads to use for pair counting parallelization.
- {OUTPUT_DIR}: Directory in which to house the correlation functions. This will be created if not in existence.
(Optional) {RR_counts_XY}: Pre-computed RR pair counts between fields X and Y. These should be in the format described in
file-inputs
, and must use the same number of radial and angular bins as specified above. If not specified, these are recomputed by the code. Since the full correlation function typically uses a different binning to the output covariance matrix, we typically cannot use the pair counts computed injackknife-weights
and must recompute them. In addition, these should be normalized by the squared sum of weights (∑iwi)2 where i runs across all random particles in the dataset; this can be achieved with NORMED set to 1 whileRR_counts
. This is now also supported for single-field analyses, the command line options then shall be:python python/xi_estimator_aperiodic.py {GALAXY_FILE} {RANDOM_FILE_DR} {RANDOM_FILE_RR} {RADIAL_BIN_FILE} {MU_MAX} {N_MU_BINS} {NTHREADS} {OUTPUT_DIR} {RR_counts_11}
Output Files
ASCII files are created specifying the correlation function in the file-format given in file-inputs
. The filename has the format xi_n{N}_m{M}_[periodic]_{INDEX}.dat
, where N and M specify the number of radial and angular bins respectively. INDEX specifies the correlation function type, where 11 = field 1 auto-correlation, 22 = field 2 auto-correlation, 12 = cross-correlation of fields 1 and 2, and the string 'periodic' is included if the data were created assuming a periodic simulation. The first and second lines of the .dat
file list the radial and angular bin centers, then each subsequent line lists the ξ(r, μ) estimate, with the column specifying the μ bin and the row specifying the r bin. This is read automatically by the main C++ code.
NB: The code also prints the number of galaxies in each dataset to the terminal, Ngal. This quantity is important for later normalization of the C++ code.
Having computed the correlation function estimates ξ(r, μ), we can simply compute the corresponding (even) Legendre multipoles ξℓ(r), via the standard definition. Note that these are not used directly by the RascalC code, but they may be useful for the user. These are simply computed via:
python python/convert_xi_to_multipoles.py {INFILE} {MAX_L} {OUTFILE}
Input Parameters
- {INFILE}: Filename of the correlation function file computed by the above script.
- {MAX_L}: Maximum Legendre multipole required (must be even). The script will compute all even Legendre multipoles of the correlation function up to this value.
- {OUTFILE}: Name of the output file containing the correlation function moments.
Output Files
This creates an ASCII file containing the radial bins and the correlation function multipoles. For an input correlation function with N bins, the file has N rows with each radial bin in a separate row (plus an additional header row). The first element in each row is the radial bin center, then the correlation function multipoles are listed, such that column two contains ξ0(r), column three contains ξ2(r) etc.
For later comparison of the jackknife covariance matrix estimate with the data, we require the jackknife covariance matrix, which is derived from the correlation function estimates in each unrestricted jackknife. The scripts below are provided to compute these using Corrfunc. For jackknife J and fields {X, Y}, we compute the pair counts FGaXY in bin a (where F, G ∈ [D, R] for data and random fields D and R), from a cross-pair counts between particles in jackknife A of FX and the entire of field GY. These are added to the pair counts from the cross of particles in jackknife A of field GY with the entire of field FX if the fields are distinct. This allows us to compute all njack correlation functions ξAXY(r, μ) via the Landy-Szalay estimator
NB: The binning file used here should be the same as that used for the covariance matrix not the full correlation function, to allow comparison with the CabJ estimate.
For both periodic and aperiodic data, the RR (and DR) pair counts are computed numerically (from a set of random particles), rather than analytically, unlike the full-field correlation functions. This is to allow for arbitrary choice of jackknife assignment scheme, though we expect the computation time to be larger in this instance. For this reason, we provide separate scripts for single- and multi-field analyses rather than periodic and aperiodic analyses, with periodicity specified by an input parameter.
Usage
For a single field analysis:
python python/xi_estimator_jack.py {GALAXY_FILE} {RANDOM_FILE_DR} {RANDOM_FILE_RR} {RADIAL_BIN_FILE} {MU_MAX} {N_MU_BINS} {NTHREADS} {PERIODIC} {OUTPUT_DIR} [{RR_jackknife_counts}]
For an analysis using two distinct fields:
python python/xi_estimator_jack_cross.py {GALAXY_FILE_1} {GALAXY_FILE_2} {RANDOM_FILE_1_DR} {RANDOM_FILE_1_RR} {RANDOM_FILE_2_DR} {RANDOM_FILE_2_RR} {RADIAL_BIN_FILE} {MU_MAX} {N_MU_BINS} {NTHREADS} {PERIODIC} {OUTPUT_DIR} [{RR_jackknife_counts_11} {RR_jackknife_counts_12} {RR_jackknife_counts_22}]
This computes estimates of the auto- and cross-correlations for all unrestricted jackknife regions. Since there are three distinct correlations for each, the run-time is increased by a factor of 3.
Following computation of ξaAJ we can estimate the single-survey jackknife covariance matrix via Cab, dataJ = ∑AwaAwbA(ξaAJ − ξ̄aJ)(ξbAJ − ξ̄bJ)/(1 − ∑BwaBwbB). This is done internally in the post-processing-jackknife
code.
Input Parameters
See the input parameters for the full-correlations
script. In addition, the {RR_jackknife_counts_XY} quantities are the RRaAXY pair counts which can be specified to avoid recomputation. These have been previously output by the jackknife-weights
code as jackknife_pair_counts_n{N}_m{M}_j{J}_{INDEX}.dat
(using the correct covariance-matrix binning) hence can be used here for a significant speed boost. The RRaAXY pair counts must be normalized by the squared full-survey summed weights (∑iwi)2 - this is done automatically in the preceding script. The {PERIODIC} parameter is unity if the data is computed from a periodic box simulation and zero else.
Output Files
This script creates ASCII files for each output correlation function, of the form xi_jack_n{N}_m{M}_{INDEX}.dat
for N radial bins, M angular bins and INDEX specifying the correlation function type (11 = autocorrelation of field 1 (default), 12 = cross-correlation of fields 1 and 2, 22 = autocorrelation of field 2). NB: These have a different file format to the non-jackknife correlation functions. The first and second lines of the .dat
file list the radial and angular bin centers, but each succeeding line gives the entire correlation function estimate for a given jackknife. The rows indicate the jackknife and the columns specify the collapsed bin, using the indexing bincollapsed = binradial × nμ + binangular for a total of nμ angular bins.
These files are read automatically by the post-processing-jackknife
code.