## Hands-on with real clustering data: eBOSS LRG (plus BOSS CMASS) sample
In this session we will estimate (and interpret) the correlation function and power spectrum of galaxy catalogs.
This is the first (compression) step of a standard clustering analysis. The second step consists of fitting these compressed measurements with a theory model to derive constraints on cosmological parameters, which will be explored in the Y2 TD (bao_inverse_distance_ladder.ipynb).

### Installation

#### Packages
**If packages are already installed, skip this part.**

In [None]:
!python -m pip install matplotlib cython mpi4py fitsio
!python -m pip install git+https://github.com/cosmodesi/cosmoprimo#egg=cosmoprimo[class,astropy]
!USE_GPU=0 python -m pip install git+https://github.com/cosmodesi/pycorr#egg=pycorr[corrfunc]
# When running on Google Colab, you can use GPU (setting "execution type" to GPU):
#!CUDA_HOME=/usr/local/cuda python -m pip install git+https://github.com/cosmodesi/pycorr#egg=pycorr[corrfunc]
!python -m pip install git+https://github.com/cosmodesi/pypower#egg=pypower[extras]

#### Note
Within the DESI collaboration we have put some effort into setting up relatively easy-to-use Python packages for standard clustering analyses. Check them out here: https://github.com/cosmodesi.
You may, e.g., be interested in:
- [cosmoprimo](https://github.com/cosmodesi/cosmoprimo): primordial cosmology (class, camb, isitgr, fftlog, interpolator, BAO filtering)
- [pycorr](https://github.com/cosmodesi/pycorr): correlation function estimation
- [pypower](https://github.com/cosmodesi/pypower): power spectrum (and window function) estimation
- [pyrecon](https://github.com/cosmodesi/pyrecon): standard BAO reconstruction
- [mockfactory](https://github.com/cosmodesi/mockfactory): tools to build fast mocks
- [desilike](https://github.com/cosmodesi/desilike): DESI likelihoods, fits of 2-pt statistics, Fisher, bindings with cosmological samplers (Cobaya, CosmoSIS, MontePython)

#### Catalogs
Let's download the eBOSS LRGpCMASS catalogs from here: https://drive.google.com/drive/folders/1bs0YSEwCOghx2YzCb2IbdIkn-7uttQz-?usp=sharing.
These are a "light version" of the official catalogs provided at https://data.sdss.org/sas/dr16/eboss/lss/catalogs/DR16/.

In [None]:
!python -m pip install gdown
!gdown --no-check-certificate --folder https://drive.google.com/drive/folders/1lN0xu7mWuu46POSaJ1t5tKCmtxOETfXh?usp=sharing
#!gdown --no-check-certificate --folder https://drive.google.com/drive/folders/1egpLxKnteOQgYIetNDk7TFmk-vDx_O11?usp=sharing

## Inspecting catalogs

In [None]:
path_data = './catalogs/lite/eBOSS_LRGpCMASS_clustering_data-NGC-vDR16_lite.fits'
path_randoms = './catalogs/lite/eBOSS_LRGpCMASS_clustering_random-NGC-vDR16_lite.fits'

In [None]:
import numpy as np

from astropy.table import Table

data = Table.read(path_data)
randoms = Table.read(path_randoms)
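# In a notebook cell only the last expression is displayed, so print both column lists explicitly
print(data.colnames)
print(randoms.colnames)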

Galaxies in the data and randoms catalogs receive weights, to correct for observational systematic effects, such that the ensemble average of galaxy density (= "survey selection function") and that of randoms match:
- WEIGHT_SYSTOT: weights to correct for photometric systematics: what are they?
- WEIGHT_CP: weights to correct for fiber collisions: what are they?
- WEIGHT_NOZ: weights to correct for redshift failures: what are they?

The total (completeness) weight is: WEIGHT_COMP = WEIGHT_SYSTOT * WEIGHT_CP * WEIGHT_NOZ.
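A minimal sketch of the check asked for below, assuming only the three weight columns and `Z`. Toy random numbers in a plain dict stand in for the catalogs (an astropy `Table` should work the same way, since columns are accessed by name), and `completeness_weight` / `weighted_nz` are hypothetical helper names:

```python
import numpy as np

def completeness_weight(catalog):
    """WEIGHT_COMP = WEIGHT_SYSTOT * WEIGHT_CP * WEIGHT_NOZ (catalog: dict of arrays or astropy Table)."""
    return catalog['WEIGHT_SYSTOT'] * catalog['WEIGHT_CP'] * catalog['WEIGHT_NOZ']

def weighted_nz(z, weights, zedges):
    """Weighted redshift histogram normalized to unit integral, so data and randoms can be compared."""
    hist, _ = np.histogram(z, bins=zedges, weights=weights, density=True)
    return hist

# Toy catalogs (random numbers, NOT the eBOSS catalogs)
rng = np.random.default_rng(42)
toy_data = {name: rng.uniform(0.9, 1.1, 1000) for name in ['WEIGHT_SYSTOT', 'WEIGHT_CP', 'WEIGHT_NOZ']}
toy_data['Z'] = rng.normal(0.7, 0.1, 1000)
toy_randoms = {name: np.ones(20000) for name in ['WEIGHT_SYSTOT', 'WEIGHT_CP', 'WEIGHT_NOZ']}
toy_randoms['Z'] = rng.normal(0.7, 0.1, 20000)

zedges = np.linspace(0.4, 1.0, 13)
nz_data = weighted_nz(toy_data['Z'], completeness_weight(toy_data), zedges)
nz_randoms = weighted_nz(toy_randoms['Z'], completeness_weight(toy_randoms), zedges)
# If the weighted randoms trace the weighted data, this ratio is ~ 1 (within noise) in every bin
ratio = nz_data / nz_randoms
```

With the real catalogs, the same two histograms can simply be overplotted with `plt.stairs` or `plt.hist(..., histtype='step')`.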

In [None]:
# Make a scatter plot of data RA/Dec and a histogram of data redshifts
# Check that weighted randoms have the same angular (RA/Dec) and redshift (Z) distribution as the data
from matplotlib import pyplot as plt
# Tip: for the RA/Dec plot, downsample the data and randoms for faster plots:
rng = np.random.RandomState(seed=42)
mask_data = rng.uniform(0., 1., len(data)) < 0.1
mask_randoms = rng.uniform(0., 1., len(randoms)) < mask_data.sum() / len(randoms)
# Then, e.g. plt.scatter(data['RA'][mask_data], data['DEC'][mask_data], s=1, label='data')
#plt.legend()
#plt.show()

In [None]:
#for catalog in [data, randoms]:
#    catalog['WEIGHT_COMP'] = define weights

#Then, e.g.: plt.hist(data['Z'], weights=data['WEIGHT_COMP'], histtype='step', density=True, label='data')

Let's first transform redshifts Z into distances, assuming a fiducial cosmology. Take the BOSS (and eBOSS) fiducial cosmology:
$\Omega_{m} = 0.31$, $\omega_{b} = 0.022$, $h = 0.676$, $\sigma_{8} = 0.8$, $n_{s} = 0.97$, $\sum m_{\nu} = 0.06 \; \mathrm{eV}$.
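For option 1 in the cell below ("code this yourself"), here is a sketch assuming a flat ΛCDM Universe with negligible radiation and ignoring the neutrino contribution, so results may differ slightly from a full Boltzmann-code background. The function names are hypothetical. Working in Mpc/h means $H_0 = 100$ km/s/(Mpc/h), so $h$ drops out:

```python
import numpy as np
from scipy import integrate

C_KMS = 299792.458  # speed of light [km/s]

def efunc(z, Omega_m=0.31):
    """E(z) = H(z) / H0 for a flat LCDM Universe with negligible radiation."""
    return np.sqrt(Omega_m * (1. + z)**3 + 1. - Omega_m)

def comoving_radial_distance(z, Omega_m=0.31):
    """d(z) = int_0^z c dz' / H(z'), in Mpc/h, one adaptive quadrature per redshift (slow for many objects)."""
    z = np.atleast_1d(np.asarray(z, dtype='f8'))
    integral = [integrate.quad(lambda zp: 1. / efunc(zp, Omega_m), 0., zz)[0] for zz in z]
    return C_KMS / 100. * np.array(integral)

def comoving_radial_distance_interp(z, Omega_m=0.31, zmax=5., nz=2001):
    """Faster for ~1e6 objects: tabulate d(z) once (trapezoidal rule) on a grid, then interpolate."""
    zgrid = np.linspace(0., zmax, nz)
    dgrid = integrate.cumulative_trapezoid(1. / efunc(zgrid, Omega_m), zgrid, initial=0.)
    return C_KMS / 100. * np.interp(z, zgrid, dgrid)
```

At low redshift both reduce to Hubble's law, $d \simeq c z / H_0$, which is a quick sanity check.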

In [None]:
# Fiducial cosmological parameters are:
# dict(Omega_m=0.31, omega_b=0.022, h=0.676, sigma8=0.8, n_s=0.97, m_ncdm=0.06)
# To compute comoving_radial_distance(z), various options:
# 1) code this yourself if you have never done so! You just need \Omega_{m} (flat Universe, negligible radiation: \Omega_{\Lambda} = 1 - \Omega_{m})
# H(z) = H_{0} \sqrt{\Omega_{m} (1 + z)^3 + \Omega_{\Lambda}}
# d(z) = \int cdz / H(z)
# WARNING: let's work in Mpc/h units (H0 = 100 h km/s/Mpc)
# 2) use classy, camb, astropy...
# 3) use cosmoprimo (engine='class' or 'camb' or 'astropy'), see cell 9 of https://github.com/cosmodesi/cosmoprimo/blob/main/nb/examples.ipynb
from cosmoprimo import Cosmology
cosmo_fid = Cosmology(Omega_m=0.31, omega_b=0.022, h=0.676, sigma8=0.8, n_s=0.97, m_ncdm=[0.06], engine='class')
# Or simply:
#from cosmoprimo.fiducial import BOSS
#cosmo_fid = BOSS(engine='class')

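def get_xyz(ra, dec, z):
    # Compute comoving distance d (in Mpc/h with cosmoprimo)
    d = cosmo_fid.comoving_radial_distance(np.asarray(z))
    # Turn distance d, RA (\phi), Dec (\pi/2-\theta) (mind degree -> radians!) into x, y, z Cartesian positions
    ra, dec = np.radians(np.asarray(ra)), np.radians(np.asarray(dec))
    cos_dec = np.cos(dec)
    return d * cos_dec * np.cos(ra), d * cos_dec * np.sin(ra), d * np.sin(dec)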

#data_positions = get_xyz(data['RA'], data['DEC'], data['Z'])
# Same for randoms
#randoms_positions = get_xyz(randoms['RA'], randoms['DEC'], randoms['Z'])

For fun, make a 'wedge plot' of the data: a Cartesian 2D (x, y) slice between 0 and 1 deg in Dec. Do you see structures, filaments, voids?

In [None]:
lim_dec = (0., 1.)
mask_data = (data['DEC'] > lim_dec[0]) & (data['DEC'] < lim_dec[1])
mask_randoms = (randoms['DEC'] > lim_dec[0]) & (randoms['DEC'] < lim_dec[1])

plt.gcf().set_size_inches((15,) * 2)
ax = plt.gca()
ax.set_aspect('equal')
# Plot (-x, y) (minus sign just to orient the figure)
# e.g. ax.scatter(- data_positions[0][mask_data], data_positions[1][mask_data], marker='.', s=4., alpha=0.8)
#ax.tick_params(bottom=False, labelbottom=False, left=False, labelleft=False) 
#plt.show()

In addition to the completeness weights above, when computing 2-pt statistics (correlation function or power spectrum), one can apply weights that minimize their variance: WEIGHT_FKP = 1/(1 + NZ * P0), with NZ the expected comoving number density at the object's redshift, in $(\mathrm{Mpc}/h)^{-3}$, and P0 the typical value of the power spectrum at the scales of interest, e.g. $10000 \; (\mathrm{Mpc}/h)^{3}$.
See e.g. https://arxiv.org/pdf/astro-ph/9304022.pdf, eq. 2.3, for the variational demonstration (another, broader point-of-view is that of the optimal quadratic estimator, of which the FKP estimator we will use below is a simplification under some assumptions).
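As a sketch of the weight formula above, assuming the catalogs provide the `NZ` column (the helper names are hypothetical, and a dict of arrays stands in for a catalog):

```python
import numpy as np

def fkp_weight(nz, P0=1e4):
    """FKP weight 1 / (1 + nbar P0), with nbar (NZ) in (Mpc/h)^-3 and P0 in (Mpc/h)^3."""
    return 1. / (1. + np.asarray(nz) * P0)

def total_weight(catalog, P0=1e4):
    """Completeness weight times FKP weight, from catalog columns (dict of arrays or astropy Table)."""
    comp = catalog['WEIGHT_SYSTOT'] * catalog['WEIGHT_CP'] * catalog['WEIGHT_NOZ']
    return comp * fkp_weight(catalog['NZ'], P0=P0)

toy = {'WEIGHT_SYSTOT': np.array([1., 1.1]), 'WEIGHT_CP': np.array([1., 2.]),
       'WEIGHT_NOZ': np.array([1., 1.]), 'NZ': np.array([1e-4, 3e-4])}
weights = total_weight(toy)  # [1 * 1/(1+1), 2.2 * 1/(1+3)] = [0.5, 0.55]
```

The limits explain the design: where $\bar{n} P_0 \ll 1$ (shot-noise dominated) the weight tends to 1, i.e. every galaxy counts equally, while where $\bar{n} P_0 \gg 1$ (sample-variance dominated) it tends to $1/(\bar{n} P_0)$, i.e. every volume element counts equally.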

In [None]:
# Compute data weights, WEIGHT_COMP * WEIGHT_FKP
data_weights = ...  # to be filled in
# Same for randoms
randoms_weights = ...  # to be filled in

## Correlation function

### Pair counts as a function of $(s, \mu)$

Correlation functions are usually estimated by binning the (weighted) number of pairs of particles (galaxies, randoms) as a function of the separation $s$ between particles (and optionally the cosine angle to the line-of-sight $\mu$).

This is basically a double loop, schematically:
```
for i1 in n1:  
    for i2 in n2:  
        counts[index(s(i1, i2)), index(mu(i1, i2))] += weight(i1, i2)  
```
For n1 ~ n2 ~ O(1e6) objects, this will be slow (especially in Python). How can this be sped up?
Auto pair counts (particles i1 and i2 come from the same catalog) can be computed about twice as fast as cross pair counts (particles i1 and i2 come from different catalogs). Why?
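A brute-force NumPy version of the double loop above, for toy catalog sizes only (it builds all pairs in memory). `pair_counts_smu` is a hypothetical helper, not pycorr's API, and measuring $\mu$ with respect to the pair midpoint is an assumption of this sketch. The auto branch only builds the pairs with i < j and histograms each one at $+\mu$ and $-\mu$, which reproduces the full ordered double loop (i ≠ j) with half the separation computations. Production codes such as Corrfunc additionally grid the volume so that only pairs of nearby cells, within the maximum separation, are ever visited.

```python
import numpy as np

def pair_counts_smu(pos1, w1, sedges, muedges, pos2=None, w2=None):
    """Brute-force weighted pair counts in (s, mu) bins; pos: (N, 3) Cartesian, observer at the origin.

    If pos2 is None (auto counts), only pairs i < j are built and each is histogrammed at +mu and -mu,
    which equals the ordered double loop over i != j.
    """
    auto = pos2 is None
    if auto:
        i1, i2 = np.triu_indices(len(pos1), k=1)
        p1, p2, w = pos1[i1], pos1[i2], w1[i1] * w1[i2]
    else:
        i1, i2 = np.meshgrid(np.arange(len(pos1)), np.arange(len(pos2)), indexing='ij')
        i1, i2 = i1.ravel(), i2.ravel()
        p1, p2, w = pos1[i1], pos2[i2], w1[i1] * w2[i2]
    sep = p2 - p1
    s = np.sqrt(np.sum(sep**2, axis=-1))
    los = 0.5 * (p1 + p2)  # midpoint line of sight
    los /= np.sqrt(np.sum(los**2, axis=-1))[:, None]
    # cosine angle between separation and line of sight (0 for zero-separation pairs)
    mu = np.clip(np.sum(sep * los, axis=-1) / np.where(s > 0., s, 1.), -1., 1.)
    counts = np.histogram2d(s, mu, bins=(sedges, muedges), weights=w)[0]
    if auto:
        # pair (j, i) has the same s and opposite mu
        counts += np.histogram2d(s, -mu, bins=(sedges, muedges), weights=w)[0]
    return counts

rng = np.random.default_rng(0)
pos = rng.uniform(0., 100., (300, 3)) + 1000.  # toy catalog, away from the observer
w = rng.uniform(0.5, 1.5, 300)
sedges, muedges = np.linspace(1., 50., 8), np.linspace(-1., 1., 11)
dd_auto = pair_counts_smu(pos, w, sedges, muedges)
dd_cross = pair_counts_smu(pos, w, sedges, muedges, pos2=pos, w2=w)  # same catalog, full ordered loop
```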

In [None]:
# Compute data - data pair counts DD
# You can use pycorr (wrapping a modified version of Corrfunc)
# If so, take a quick look at https://github.com/cosmodesi/pycorr/blob/main/nb/basic_examples.ipynb
#edges = (np.linspace(0., 200, 51), np.linspace(-1., 1., 100))  # s, mu binning
#nthreads, gpu = 4, False
# When running on Google Colab, you can use GPU:
#nthreads, gpu = 1, True
#from pycorr import TwoPointCounter
# Positions here are Cartesian x, y, z, hence position_type='xyz' (for RA, Dec, distance, pass position_type='rdd' instead)
#D1D2 = TwoPointCounter(mode='smu', edges=edges, positions1=data_positions, weights1=data_weights,
#                       position_type='xyz', dtype='f8', engine='corrfunc', nthreads=nthreads, gpu=gpu)
# D1D2.sep are (average) separations
# D1D2.wcounts are pair counts (weighted by the product of particle weights)
# D1D2.wnorm is the normalization (see below)
# Same for randoms - randoms pair counts RR
# WARNING: This will take a few minutes, be patient!
#R1R2 = ...

Can you guess how RR counts typically evolve as a function of $s$ at small separations? This scaling does not hold exactly at small scales, due to fine-grained veto masks, but why does it break down completely at large scales?

You see now why randoms are important!
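To build intuition for the question above in an idealized setting (randoms uniform in a cube, not the eBOSS footprint), here is a sketch that counts unordered random pairs with `scipy.spatial.distance.pdist` and compares them to the expectation for a volume without boundaries:

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy randoms: uniform in a cube of side boxsize (idealized geometry, NOT the survey)
rng = np.random.default_rng(3)
nrandoms, boxsize = 2000, 100.
randoms_toy = rng.uniform(0., boxsize, (nrandoms, 3))

sedges = np.linspace(0., boxsize, 11)
rr, _ = np.histogram(pdist(randoms_toy), bins=sedges)  # unordered pairs i < j
# Expectation without boundaries: (number of pairs) x (shell volume) / (total volume)
shell_volume = 4. / 3. * np.pi * (sedges[1:]**3 - sedges[:-1]**3)
rr_no_boundary = nrandoms * (nrandoms - 1) / 2. * shell_volume / boxsize**3
ratio = rr / rr_no_boundary
# ratio is of order 1 for s << boxsize (slightly below, as some shells already cross the edges),
# then drops towards 0 as s approaches the box size
```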

In [None]:
# Plot RR(s) / f(s) as a function of s (log-scale), with f(s) your guess for the s-dependence of RR at small s
# (let's sum wcounts over mu to keep only the s-dependence)
# s, wcounts, sedges = np.mean(R1R2.sep, axis=-1), np.sum(R1R2.wcounts, axis=-1), R1R2.edges[0]
#plt.plot(s, ...)
#plt.xscale('log')
#plt.show()

### Natural estimator
The natural estimator (the simplest one can think of) for the correlation function compares the (weighted) number of galaxy pairs to that of random pairs (what we would have in the absence of clustering), i.e. DD / RR - 1, with:
- DD the (normalized) galaxy - galaxy pair counts
- RR the (normalized) randoms - randoms pair counts

Why 'normalized'? What is the normalization factor?
Clue: what is the total (weighted) number of pairs (until infinite separation)?
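A sketch of one common convention, assuming pair counts run over ordered pairs with self-pairs excluded (if counts run over unordered pairs i < j, both the counts and this norm are halved, so DD / RR is unchanged). The function names are hypothetical:

```python
import numpy as np

def wnorm_auto(weights):
    """Weighted number of ordered pairs (i, j), i != j, within one catalog: (sum w)^2 - sum w^2."""
    weights = np.asarray(weights, dtype='f8')
    return np.sum(weights)**2 - np.sum(weights**2)

def wnorm_cross(weights1, weights2):
    """Weighted number of pairs between two different catalogs: sum w1 * sum w2."""
    return np.sum(np.asarray(weights1, dtype='f8')) * np.sum(np.asarray(weights2, dtype='f8'))
```

Summing the weighted counts over all separations (up to infinity) gives exactly this total, so DD / wnorm and RR / wnorm are pair fractions that can be compared even though the randoms outnumber the data.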

In [None]:
# Compute DD / RR - 1
# WARNING: you need to normalize DD and RR first before computing DD / RR - 1!
#D1D2_wnorm = ...
#R1R2_wnorm = ...
#xi = (D1D2.wcounts / D1D2_wnorm) / (R1R2.wcounts / R1R2_wnorm) - 1
# Compute sep, mean s
#sep = np.sum(R1R2.sep * R1R2.wcounts, axis=-1) / np.sum(R1R2.wcounts, axis=-1)

### Multipoles of the correlation function
Compute and plot the correlation function multipoles $\ell = 0$ (monopole), $\ell = 2$ (quadrupole) and $\ell = 4$ (hexadecapole), given by $\xi_{\ell}(s) = \frac{2 \ell + 1}{2} \int_{-1}^{1} d\mu \xi(s,\mu) \mathcal{L}_{\ell}(\mu)$,
with $\mathcal{L}_{\ell}(\mu)$ Legendre polynomials (see https://en.wikipedia.org/wiki/Legendre_polynomials, also https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.legendre.html).
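A self-contained sketch of the projection, assuming $\xi$ is constant within each $\mu$ bin and integrating $\mathcal{L}_{\ell}$ exactly over each bin; dividing by the $\mu$ range (2 for $\mu \in [-1, 1]$) supplies the $(2\ell + 1)/2$ prefactor. `xi_multipoles` is a hypothetical name:

```python
import numpy as np
from scipy import special

def xi_multipoles(xi_smu, muedges, ells=(0, 2, 4)):
    """Legendre multipoles of xi binned in (s, mu), xi_smu of shape (ns, nmu); returns (len(ells), ns)."""
    muedges = np.asarray(muedges)
    xiell = []
    for ell in ells:
        primitive = special.legendre(ell).integ()(muedges)  # primitive of L_ell at the mu edges
        legendre = (2 * ell + 1) * np.diff(primitive)  # (2 ell + 1) * int_bin L_ell(mu) dmu
        xiell.append(np.sum(xi_smu * legendre, axis=-1) / (muedges[-1] - muedges[0]))
    return np.array(xiell)

muedges = np.linspace(-1., 1., 201)
mumid = 0.5 * (muedges[1:] + muedges[:-1])
# A constant xi(mu) gives exactly zero higher multipoles, thanks to the exact bin integrals
poles_const = xi_multipoles(np.full((3, 200), 0.7), muedges)
# A toy xi = 1.5 + 0.3 L_2(mu) is recovered up to O(bin width^2)
poles_toy = xi_multipoles(np.tile(1.5 + 0.3 * special.legendre(2)(mumid), (3, 1)), muedges)
```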

In [None]:
# Trick: integrate \mathcal{L}_{\ell}(\mu) exactly over each mu bin, as the difference of its primitive at the bin edges
# This ensures that \xi_{\ell > 0} = 0 if \xi(s, \mu) is constant in \mu
ells = (0, 2, 4)
# Compute xiell, list of multipoles
#from scipy import special
#xiell = []
#for ell in ells:
#    muedges = R1R2.edges[1]
#    poly = special.legendre(ell).integ()(muedges)
#    legendre = (2 * ell + 1) * (poly[1:] - poly[:-1])
#    # mind the (2 ell + 1) / 2 normalization: divide the sum over mu bins by muedges[-1] - muedges[0] = 2
#    xiell.append(...)

# Plot the correlation function multipoles
#for ill, ell in enumerate(ells):
#    plt.plot(sep, sep**2 * xiell[ill], label=r'$\ell = {:d}$'.format(ell))
#plt.grid(True)
#plt.xlabel(r'$s$ [$\mathrm{Mpc}/h$]')
#plt.ylabel(r'$s^{2}\xi_{\ell}(s)$ [$(\mathrm{Mpc}/h)^{2}$]')
#plt.legend()
#plt.show()

In [None]:
# Notes about pycorr
# You can obtain this directly with:
#from pycorr import NaturalTwoPointEstimator
#estimator = NaturalTwoPointEstimator(D1D2=D1D2, R1R2=R1R2)
#s, xiell = estimator(ells=ells, return_sep=True)
#estimator.plot(ells=ells)
# Or, starting from the beginning:
#from pycorr import TwoPointCorrelationFunction, setup_logging
#setup_logging()  # to activate logging
#estimator = TwoPointCorrelationFunction(mode='smu', edges=edges, data_positions1=data_positions, data_weights1=data_weights,
#                                        randoms_positions1=randoms_positions, randoms_weights1=randoms_weights,
#                                        estimator='natural', position_type='xyz', dtype='f8', engine='corrfunc', nthreads=nthreads, gpu=gpu)
#estimator.plot(ells=ells)
#plt.show()

The monopole looks odd (does not go to ~ zero) on large scales!

No worries: this is not a bias (on average, it would be fine), but the natural estimator has a larger variance than the Landy-Szalay estimator (see e.g. https://articles.adsabs.harvard.edu/pdf/1993ApJ...412...64L), (DD - 2DR + RR) / RR, with DR the (normalized) data - randoms pair counts.

Estimate the correlation function with the Landy-Szalay estimator.
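Both estimators are simple combinations of normalized pair counts. A sketch, assuming each count is divided by its own total weighted number of pairs (DR by the cross-catalog total); the function names are hypothetical, and the toy counts are made up so that the answer is known:

```python
import numpy as np

def natural_estimator(dd, rr, wnorm_dd, wnorm_rr):
    """xi = DD / RR - 1, with pair counts divided by their total weighted number of pairs."""
    return (dd / wnorm_dd) / (rr / wnorm_rr) - 1.

def landy_szalay_estimator(dd, dr, rr, wnorm_dd, wnorm_dr, wnorm_rr):
    """xi = (DD - 2 DR + RR) / RR, with each pair count normalized in the same way."""
    dd, dr, rr = dd / wnorm_dd, dr / wnorm_dr, rr / wnorm_rr
    return (dd - 2. * dr + rr) / rr

# Toy counts (NOT measured): RR ~ s^2, DR unclustered, DD with 10% excess pairs
rr = 1e6 * np.array([1., 4., 9.])
dr = 0.1 * rr
dd = 1.1 * 0.01 * rr
xi_ls = landy_szalay_estimator(dd, dr, rr, wnorm_dd=0.01, wnorm_dr=0.1, wnorm_rr=1.)
xi_nat = natural_estimator(dd, rr, wnorm_dd=0.01, wnorm_rr=1.)  # both give 0.1 in every bin
```

On such idealized counts the two agree; on real data they differ because DR fluctuates around its expectation, and the DR term roughly cancels, to first order, the fluctuations of the data's mean density over the survey volume that inflate the natural estimator's variance.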

In [None]:
# You can compute all pair counts DD, DR and RR as above
# Or use pycorr's general interface (see https://github.com/cosmodesi/pycorr/blob/main/nb/basic_examples.ipynb)
#from pycorr import TwoPointCorrelationFunction
#estimator = TwoPointCorrelationFunction(mode='smu', edges=edges, ..., estimator='landyszalay')

Plot the correlation function multipoles. Looks better!  
You should clearly see the BAO peak (where?).
What does the non-zero quadrupole mean? Where does it come from?

You can also compare data to a theory correlation function (just scale the linear correlation function by an arbitrary factor).

In [None]:
# Plot
#estimator.plot(ells=ells)
#zeff = np.average(data['Z'], weights=data_weights)  # (quite arbitrary) effective redshift

# Linear (real-space) correlation function:
# xi = cosmo_fid.get_fourier().pk_interpolator().to_1d(z=zeff).to_xi()(sep)
#kaiser_factor = 6.5  # scaling by hand, just to match the amplitude
#xi_model = kaiser_factor * cosmo_fid.get_fourier().pk_interpolator().to_1d(z=zeff).to_xi()(sep)
#plt.plot(sep, sep**2 * xi_model, color='k')
#plt.show()

## Power spectrum

Computing pair counts for correlation function estimation remains somewhat slow (though still tractable for current surveys such as DESI and Euclid). Also, theorists tend to prefer the power spectrum, as different $k$-modes are initially (almost) uncorrelated and evolve independently in the linear regime (see Julien's course).
Let's compute the power spectrum monopole step-by-step, to show how this works.

First, "paint" data and randoms onto a 3D mesh, to get an estimate of the 3D (weighted) density fluctuation field $F(\mathbf{r}) = n_{d}(\mathbf{r}) - \alpha n_{r}(\mathbf{r})$, with:
- $n_{d}(\mathbf{r})$ the data density
- $n_{r}(\mathbf{r})$ the randoms density
- $\alpha = \sum_{i \in \mathrm{data}} w_{i} / \sum_{i \in \mathrm{randoms}} w_{i}$: scale the randoms density to the data density, i.e. such that $\int d^3 r F(\mathbf{r}) = 0$ (integral constraint!)
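A NumPy sketch of the painting step on toy catalogs. It swaps in nearest-grid-point (NGP) assignment via `np.histogramdd`, the simplest and most aliasing-prone scheme, instead of the TSC kernel used by pypower below, and applies no compensation. `paint_ngp` is a hypothetical helper:

```python
import numpy as np

def paint_ngp(positions, weights, boxsize, nmesh, boxcenter):
    """Nearest-grid-point assignment of weighted particles (N, 3) to a cubic mesh, divided by cell volume."""
    edges = [boxcenter[i] - boxsize / 2. + np.linspace(0., boxsize, nmesh + 1) for i in range(3)]
    counts, _ = np.histogramdd(positions, bins=edges, weights=weights)
    return counts / (boxsize / nmesh)**3

# Toy catalogs: both uniform in a cube (NOT a survey geometry), randoms 10x denser
rng = np.random.default_rng(7)
boxsize, nmesh, boxcenter = 1000., 32, np.zeros(3)
data_pos, randoms_pos = rng.uniform(-400., 400., (5000, 3)), rng.uniform(-400., 400., (50000, 3))
data_w, randoms_w = rng.uniform(0.5, 1.5, 5000), np.ones(50000)
alpha = np.sum(data_w) / np.sum(randoms_w)
F = (paint_ngp(data_pos, data_w, boxsize, nmesh, boxcenter)
     - alpha * paint_ngp(randoms_pos, randoms_w, boxsize, nmesh, boxcenter))
dv = (boxsize / nmesh)**3
integral = np.sum(F) * dv  # ~ 0 up to rounding: the integral constraint
```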

In [None]:
from pypower import CatalogMesh
# This step involves a kernel (resampler='tsc', triangular-shaped cloud) that effectively smoothes the density field,
# which must be compensated for by a kernel in Fourier space (compensate=True),
# see e.g. https://arxiv.org/abs/astro-ph/0409240
# The wider the kernel, the better aliasing effects (power above the Nyquist frequency contaminating lower frequencies) are mitigated
# Another technique to mitigate aliasing is 'interlacing': shifting the mesh by a fraction of the mesh cell size
# see e.g. https://arxiv.org/abs/1512.07295
# WARNING: If you have < 8 GB RAM, use a smaller nmesh (and / or dtype='f4', i.e. single precision)
#mesh = CatalogMesh(data_positions=data_positions, data_weights=data_weights,
#                   randoms_positions=randoms_positions, randoms_weights=randoms_weights,
#                   boxsize=5000., nmesh=256, resampler='tsc', interlacing=3, position_type='xyz',
#                   dtype='f8', mpiroot=0)
#rfield = mesh.to_mesh(compensate=True)
# To estimate the density, divide by the cell volume:
#cellsize = mesh.boxsize / mesh.nmesh
#dv = cellsize.prod()
#rfield /= dv

Next, the power spectrum is, up to a normalization $A$ and the shot noise $S$, the squared modulus of the Fourier-space field:
$\hat{P}(\mathbf{k}) = |F(\mathbf{k})|^2 / A - S$.
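A NumPy sketch of this step, assuming NumPy's FFT conventions rather than pypower's: `np.fft.rfftn` computes $\sum_{\mathbf{r}} e^{-i\mathbf{k}\cdot\mathbf{r}} F(\mathbf{r})$ without any $1/N^3$ factor, so multiplying by the cell volume approximates $\int d^3 r \, e^{-i\mathbf{k}\cdot\mathbf{r}} F(\mathbf{r})$. The name `raw_power_3d` is hypothetical, and $A$ and $S$ are simply passed in:

```python
import numpy as np

def raw_power_3d(field, boxsize, norm, shotnoise=0.):
    """|F(k)|^2 / A - S on the rfftn grid, for a density field sampled on an (nmesh,)*3 mesh.

    NumPy's forward FFT is unnormalized, so F(k) ~ dv * rfftn(F) approximates the continuous transform.
    """
    nmesh = field.shape[0]
    dv = (boxsize / nmesh)**3
    fk = np.fft.rfftn(field) * dv
    return (fk * fk.conj()).real / norm - shotnoise

# Sanity check: a constant field only has a k = 0 mode, equal to (field x volume)^2 / A
p = raw_power_3d(np.ones((8, 8, 8)), boxsize=10., norm=1.)
```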

In [None]:
# Compute F(k)
#cfield = rfield.r2c()  # Fast Fourier Transform, F(r) -> F(k)
# FFT convention is F(k) = 1/N^3 \sum_{r} e^{-ikr} F(r), so compensate by N^3 * dv
#cfield *= mesh.nmesh.prod() * dv
#cfield[...] = cfield[...] * cfield[...].conj()  # |F(k)|^2
#del rfield  # save memory
#print(cfield.value[0, 0, 0])  # ~ 0 by construction: integral constraint!

We want the 1D power spectrum as a function of the norm $k = |\mathbf{k}|$, so let's bin $|F(\mathbf{k})|^2$ in $k$.
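A NumPy-only sketch of the shell averaging (an illustration of the idea, not pypower's `project_to_basis`). Since `rfftn` stores only half of the modes of a real field, modes with $0 < k_z < k_{\mathrm{Nyquist}}$ also stand for their mirror $-\mathbf{k}$ and are counted twice, which makes the result match a plain average over the full `fftn` grid. `bin_power_rfft` is a hypothetical name:

```python
import numpy as np

def bin_power_rfft(power3d, boxsize, kedges):
    """Average a power spectrum given on the rfftn half-grid in |k| shells; returns (k, pk, nmodes)."""
    nmesh = power3d.shape[0]
    kfun = 2. * np.pi / boxsize  # fundamental frequency
    kxy = np.fft.fftfreq(nmesh, d=1. / nmesh) * kfun
    kz = np.fft.rfftfreq(nmesh, d=1. / nmesh) * kfun
    knorm = np.sqrt(kxy[:, None, None]**2 + kxy[None, :, None]**2 + kz[None, None, :]**2)
    hermitian = np.full(kz.size, 2.)  # each mode with 0 < kz < k_Nyquist also stands for -k
    hermitian[0] = 1.
    if nmesh % 2 == 0:
        hermitian[-1] = 1.  # the Nyquist plane is its own mirror
    hermitian = np.broadcast_to(hermitian, power3d.shape)
    nmodes = np.histogram(knorm, bins=kedges, weights=hermitian)[0]
    psum = np.histogram(knorm, bins=kedges, weights=hermitian * power3d)[0]
    ksum = np.histogram(knorm, bins=kedges, weights=hermitian * knorm)[0]
    with np.errstate(invalid='ignore', divide='ignore'):  # empty shells give NaN
        return ksum / nmodes, psum / nmodes, nmodes
```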

In [None]:
# edges = (k-edges, mu-edges = [-1, 1])
#knyq = np.pi / np.max(cellsize)  # Nyquist frequency
#edges = (np.arange(0., knyq, 0.005), np.linspace(-1., 1., 2))
#from pypower.fft_power import project_to_basis
#k, _, pk, nk, _ = project_to_basis(cfield, edges)[0]
#k, pk = k.ravel(), pk.ravel().real
#del cfield  # save memory

Let's compute the normalization $A = \int d^3 r \, \bar{F}^2(\mathbf{r})$, with $\bar{F}$ the (weighted) selection function, i.e. the expected galaxy density, sampled by the randoms.
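A commonly used alternative to the mesh-based estimate in the cell below is the "classic" FKP-style sum over randoms. This sketch assumes each random carries the expected density $\bar{n}$ at its position (e.g. the `NZ` column) and that the weights are those entering $\bar{F}$; since the randoms sample $\bar{n}/\alpha$ per unit volume, each one stands for a volume $\alpha / \bar{n}(\mathbf{r}_i)$. `normalization_from_randoms` is a hypothetical name:

```python
import numpy as np

def normalization_from_randoms(randoms_nz, randoms_weights, alpha):
    """A = int d^3r w^2(r) nbar^2(r) ~ alpha * sum_{randoms} w_i^2 nbar(r_i)."""
    return alpha * np.sum(np.asarray(randoms_weights)**2 * np.asarray(randoms_nz))

# Check on a uniform, unweighted toy survey: A should be nbar^2 V
nbar, volume, nrandoms = 1e-4, 1e9, 10**6
alpha = nbar * volume / nrandoms  # (number of data) / (number of randoms)
A = normalization_from_randoms(np.full(nrandoms, nbar), np.ones(nrandoms), alpha)
```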

In [None]:
# Trick: estimate \bar{F}^2 with data x randoms instead of randoms^2 to avoid shot noise
#wnorm = (mesh.to_mesh(field='data', compensate=False) / dv * mesh.to_mesh(field='data-normalized_randoms', compensate=False) / dv).csum() * dv
#pk /= wnorm

Plot the power spectrum (typically $k P(k)$). What is its unit?
At small scales (high $k$), the monopole looks fishy (too high: it should roughly decrease). Did we forget anything?

In [1]:
# Plot
#plt.plot(k, k * pk)
#plt.xlabel(...)
#plt.ylabel(...)
#plt.show()

Indeed, we forgot to remove the Poisson shot noise $S$! What is it?

$S$ is to be computed as $(\sum_{i \in \mathrm{data}} w_{i}^{2} + \alpha^2 \sum_{i \in \mathrm{randoms}} w_{i}^{2}) / A$.
Subtract it from the estimated power spectrum, and plot the result.
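The formula above translates directly into a one-liner. A sketch with a hypothetical helper name, checked on a uniform, unweighted toy survey, where it should reduce to $(1 + \alpha)/\bar{n}$:

```python
import numpy as np

def shot_noise(data_weights, randoms_weights, alpha, norm):
    """S = (sum_data w^2 + alpha^2 sum_randoms w^2) / A."""
    return (np.sum(np.asarray(data_weights)**2)
            + alpha**2 * np.sum(np.asarray(randoms_weights)**2)) / norm

nbar, volume, ndata, nrandoms = 1e-4, 1e9, 10**5, 10**6
alpha = ndata / nrandoms
norm = nbar**2 * volume  # A for a uniform survey
S = shot_noise(np.ones(ndata), np.ones(nrandoms), alpha, norm)  # (1 + alpha) / nbar = 11000 (Mpc/h)^3
```

The first term is the Poisson shot noise of the galaxies, $1/\bar{n}$ in the uniform case; the $\alpha^2$ term is the randoms' own shot noise, which becomes negligible as the number of randoms grows ($\alpha \to 0$).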

In [None]:
# Plot

Looks much better! Do you recognize the BAO wiggles? To see them better, you can compare data to a theory power spectrum.

In [None]:
# Plot
#pk_model = kaiser_factor * cosmo_fid.get_fourier().pk_interpolator().to_1d(z=zeff)(k)

### Multipoles of the power spectrum
Estimating higher multipoles (quadrupole, hexadecapole) is slightly more difficult, as we have to account for the varying line of sight. We typically use the first-point (or end-point) line of sight, here $\hat{\mathbf{\eta}} = \hat{\mathbf{r}}_{1}$, to split:
$\hat{P}_{\ell}(\mathbf{k}) = (2\ell + 1) / A \sum_{\mathbf{r}_{1}, \mathbf{r}_{2}} e^{i\mathbf{k} \cdot (\mathbf{r}_{2} - \mathbf{r}_{1})} F(\mathbf{r}_{1}) F(\mathbf{r}_{2}) \mathcal{L}_{\ell}(\hat{\mathbf{k}} \cdot \hat{\mathbf{\eta}}) - \delta_{\ell 0} S$
into:
$\hat{P}_{\ell}(\mathbf{k}) = (2\ell + 1) / A \, F_{\ell}^{\star}(\mathbf{k}) F_{0}(\mathbf{k}) - \delta_{\ell 0} S$
with:
$F_{\ell}(\mathbf{k}) = \sum_{\mathbf{r}} e^{i\mathbf{k} \cdot \mathbf{r}} F(\mathbf{r}) \mathcal{L}_{\ell}(\hat{\mathbf{k}} \cdot \hat{\mathbf{r}})$
(the multipoles $\hat{P}_{\ell}(k)$ are then obtained by averaging over $\mathbf{k}$ in shells of $k = |\mathbf{k}|$). Using e.g. the decomposition of Legendre polynomials into spherical harmonics, this last term can be split into a sum of products of purely $\mathbf{k}$-dependent terms and standard Fourier transforms, which can be computed with FFTs. See e.g. https://arxiv.org/abs/1704.02357.
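To make the "$\mathbf{k}$-dependent term times a standard Fourier transform" idea concrete, here is a NumPy sketch for $\ell = 2$ on a toy mesh. Instead of spherical harmonics it uses the equivalent Cartesian expansion $\mathcal{L}_{2}(\mu) = (3\mu^2 - 1)/2$ with $\mu^2 = \sum_{ij} \hat{k}_i \hat{k}_j \hat{r}_i \hat{r}_j$, so that 6 FFTs (of $\hat{r}_i \hat{r}_j F$) replace one transform per $\mathbf{k}$. `quadrupole_fourier_field` is a hypothetical name. It uses NumPy's $e^{-i\mathbf{k}\cdot\mathbf{x}}$ convention with grid coordinates $\mathbf{x}$ in the exponent; for a real field this conjugates $F_{\ell}$ and multiplies it by a phase common to all $\ell$, so the combination to use with these NumPy-convention fields is $F_{2} F_{0}^{\star}$.

```python
import numpy as np

def quadrupole_fourier_field(field, boxsize, offset):
    """F_2(k) = sum_x e^{-ik.x} F(x) L_2(khat . rhat) via 6 FFTs; observer at the origin,
    mesh corner at offset * (1, 1, 1), rhat the direction to each cell centre."""
    nmesh = field.shape[0]
    cellsize = boxsize / nmesh
    x = (np.arange(nmesh) + 0.5) * cellsize + offset
    pos = np.stack(np.meshgrid(x, x, x, indexing='ij'))  # (3, n, n, n)
    rhat = pos / np.sqrt(np.sum(pos**2, axis=0))
    k1d = 2. * np.pi * np.fft.fftfreq(nmesh, d=cellsize)
    kvec = np.stack(np.meshgrid(k1d, k1d, k1d, indexing='ij'))
    knorm = np.sqrt(np.sum(kvec**2, axis=0))
    khat = np.divide(kvec, knorm, out=np.zeros_like(kvec), where=knorm > 0.)  # khat = 0 at k = 0
    mu2 = np.zeros(field.shape, dtype='c16')
    for i in range(3):
        for j in range(i, 3):
            factor = 1. if i == j else 2.  # off-diagonal terms appear twice (ij and ji)
            mu2 += factor * khat[i] * khat[j] * np.fft.fftn(rhat[i] * rhat[j] * field)
    return 1.5 * mu2 - 0.5 * np.fft.fftn(field)

rng = np.random.default_rng(2)
field = rng.normal(size=(8, 8, 8))
f2 = quadrupole_fourier_field(field, boxsize=80., offset=500.)
f0 = np.fft.fftn(field)
cross = (f2 * f0.conj()).real  # to be multiplied by 5 / A and averaged in k shells
```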

In [None]:
# For this, use pypower.
# (You can take a quick look at https://github.com/cosmodesi/pypower/blob/main/nb/basic_examples.ipynb)
#from pypower import CatalogFFTPower
#power = CatalogFFTPower(data_positions1=data_positions, data_weights1=data_weights,
#                        randoms_positions1=randoms_positions, randoms_weights1=randoms_weights,
#                        boxsize=5000., nmesh=256, resampler='tsc', interlacing=3,
#                        edges={'step': 0.005}, ells=(0, 2, 4), position_type='xyz', dtype='f8', mpiroot=0).poles

Plot the power spectrum multipoles.
Again, BAO wiggles and non-zero quadrupole!

In [None]:
# Plot
#power.plot()
#plt.show()

Note that the above power spectrum measurements (using the 'FKP estimator') cannot (should not) be directly compared to theory power spectrum models... Indeed, in these measurements the power spectrum is convolved with the survey selection function (that we essentially already removed in the correlation function estimation with the division by RR). Therefore, usually people multiply the theory power spectrum by the window matrix, which can be computed from the random catalogs; see https://github.com/cosmodesi/pypower/blob/main/nb/window_examples.ipynb if you are interested.

### Kaiser formula
Still, let's just look at how the power spectrum multipoles compare to the Kaiser formula, which gives the expected power spectrum at linear order (i.e. accurate on large scales = low $k$):
$P(k, \mu) = (b + f \mu^2)^{2} P^{\mathrm{lin}}(k) = (\beta^{-1} + \mu^2)^{2} P^{\mathrm{lin}}_{\theta\theta}(k)$ ($\beta = f / b$)
where $b$ is the galaxy bias, $f = \frac{d\ln{D}}{d\ln{a}} \simeq \Omega_m(z)^{0.55}$ is the logarithmic growth rate of structure, $P^{\mathrm{lin}}(k)$ is the linear power spectrum.

The model depends on $f$ (in practice $f\sigma_{8}$, as $P^{\mathrm{lin}} \propto \sigma_{8}^{2}$), or, better formulated, on the amplitude of the velocity divergence power spectrum $P_{\theta\theta}(k) = f^{2} P^{\mathrm{lin}}(k)$, which we can fit to the data.

Integrate $P(k, \mu)$ over Legendre polynomials, either symbolically (easy!) or by numerical integration, and plot model and data.
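For the numerical route, a sketch using Gauss-Legendre quadrature (`numpy.polynomial.legendre.leggauss`). The integrand is a polynomial of degree at most 8 in $\mu$, so 8 nodes make the quadrature exact. `kaiser_multipoles` is a hypothetical name, and $b$, $f$ and the toy $P^{\mathrm{lin}}$ values below are placeholders, not fitted values:

```python
import numpy as np
from numpy.polynomial import legendre as npleg
from scipy import special

def kaiser_multipoles(pklin, b, f, ells=(0, 2, 4), nmu=8):
    """P_ell(k) = (2 ell + 1) / 2 int_{-1}^{1} dmu (b + f mu^2)^2 P_lin(k) L_ell(mu), by Gauss-Legendre quadrature."""
    mu, wmu = npleg.leggauss(nmu)
    kaiser = (b + f * mu**2)**2
    return np.array([(2 * ell + 1) / 2. * np.sum(wmu * kaiser * special.eval_legendre(ell, mu)) * np.asarray(pklin)
                     for ell in ells])

pklin_toy = np.array([1e4, 5e3, 2e3])  # toy P_lin values, NOT a cosmology
poles = kaiser_multipoles(pklin_toy, b=2., f=0.8)
```

Symbolically, expanding $\mu^2$ and $\mu^4$ on Legendre polynomials gives $P_0 = (b^2 + 2bf/3 + f^2/5) P^{\mathrm{lin}}$, $P_2 = (4bf/3 + 4f^2/7) P^{\mathrm{lin}}$ and $P_4 = 8f^2/35 \, P^{\mathrm{lin}}$, which the quadrature reproduces up to rounding.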

In [None]:
#pklin = cosmo_fid.get_fourier().pk_interpolator()(k, z=zeff)
#f = cosmo_fid.growth_rate(zeff)
#b =  # tune by hand to match data amplitude
pk_model = []
# Append the multipoles
#power.plot()
#for ill, ell in enumerate(ells):
#    plt.plot(k, k * pk_model[ill], color='C{:d}'.format(ill), linestyle='--')
#plt.show()

At small scales (high $k$), the quadrupole in particular looks somewhat more damped in the data than in the model. Why?

## Take-home messages
- in standard analyses, the observed (RA, Dec, z) are compressed into power spectrum or correlation function multipoles
- BAO: peak at ~100 $\mathrm{Mpc}/h$ in the correlation function, wiggles in the power spectrum
- measuring the position of the BAO peak = measuring a fixed comoving distance ('standard ruler') at a given redshift = constraining the Universe's expansion
- non-zero quadrupole = anisotropy (w.r.t. the line of sight): redshift-space distortions (linear model: Kaiser formula)
- measuring redshift-space distortions = constraining the amplitude of the velocity power spectrum: growth of structure

## Bonus
- pen-and-paper exercise: derive the Kaiser formula, knowing that:
$\mathbf{s} = \mathbf{r} + v_{z} \hat{\mathbf{z}}$ with $v_{z}(\mathbf{k}) = i f k_{z} / k^{2} \, \delta_{r}(\mathbf{k})$ (which can be obtained from linear theory)
and mass conservation (mass in redshift space = mass in real space): $\left[1 + \delta_{s}(\mathbf{s})\right]d^{3}s = \left[1 + \delta_{r}(\mathbf{r})\right]d^{3}r$. Solution: Section II of https://arxiv.org/abs/1006.0699.
- we have worked so far with 'pre-reconstruction' catalogs. A step called 'reconstruction' can be used to sharpen the BAO peak. Compute the correlation function and / or power spectrum of the reconstructed catalogs (with *rec* in their name).
Note that, in this case, you should provide the reconstructed data for data_positions and data_weights, the reconstructed randoms for shifted_positions and shifted_weights, and the pre-reconstruction randoms for randoms_positions and randoms_weights. What is the difference w.r.t. pre-reconstruction measurements? (BAO peak, quadrupole?)
- you can try applying reconstruction yourself to pre-reconstruction catalogs with https://github.com/cosmodesi/pyrecon
- here pair weights are the simple product of individual galaxy weights... but other schemes can be imagined, e.g. to correctly compensate for fiber collisions: see https://github.com/cosmodesi/pycorr/blob/main/nb/pip_examples.ipynb (and the similar correction for the power spectrum, https://github.com/cosmodesi/pypower/blob/main/nb/pip_examples.ipynb)
- start Y2 TD, cosmological constraints with BAO: bao_inverse_distance_ladder.ipynb
- why are you still here? go to the beach...