## Hands on real clustering data - eBOSS LRG (plus BOSS CMASS) sample
In this session we will compute the correlation function / power spectrum of the input galaxy catalogs.

In [1]:
import numpy as np

from astropy.table import Table

import environment
from environment import Measurement

Download eBOSS LRGpCMASS catalogs here: https://drive.google.com/drive/folders/182f-FSa0uWgovIxVNdpp0pFYWOVc3jc7?usp=sharing  
(These are a "light version" of the official catalogs provided at https://data.sdss.org/sas/dr16/eboss/lss/catalogs/DR16/)

In [2]:
tracer = 'LRGpCMASS'
cap = 'NGC'
recon = False
path_data, path_randoms = environment.path_catalogs(tracer=tracer,cap=cap,recon=recon)

In [3]:
data = Table.read(path_data)
randoms = Table.read(path_randoms)
data.columns
randoms.columns

<TableColumns names=('RA','DEC','Z','WEIGHT_SYSTOT','WEIGHT_CP','WEIGHT_NOZ','NZ')>

In [4]:
# do a scatter plot of RA/Dec, histogram of NZ

We will transform the input angular coordinates RA/Dec and redshifts Z into Cartesian positions, assuming a fiducial cosmology, e.g. the BOSS and eBOSS fiducial cosmology.
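
As a minimal sketch of the coordinate transform itself: once a comoving distance has been obtained from the redshift with the fiducial cosmology, the Cartesian position follows from the usual spherical-to-Cartesian relations. The helper name `sky_to_cartesian` and the input values below are hypothetical, chosen for illustration only; RA/Dec are assumed to be in degrees.

```python
import numpy as np

def sky_to_cartesian(distance, ra, dec):
    """Convert comoving distance and RA/Dec (degrees) to Cartesian positions of shape (N, 3)."""
    ra_rad = np.radians(ra)
    dec_rad = np.radians(dec)
    x = distance * np.cos(dec_rad) * np.cos(ra_rad)
    y = distance * np.cos(dec_rad) * np.sin(ra_rad)
    z = distance * np.sin(dec_rad)
    return np.column_stack([x, y, z])

# Synthetic values for illustration (not taken from the catalogs)
distance = np.array([1000., 1500.])  # Mpc/h
ra = np.array([0., 90.])
dec = np.array([0., 45.])
positions = sky_to_cartesian(distance, ra, dec)
```

The norm of each row of `positions` recovers the input distance, which is a quick sanity check when applying the same transform to data and randoms.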

In [5]:
from nbodykit.lab import cosmology

def get_cosmo_BOSS():
    # BOSS and eBOSS fiducial cosmologies
    cosmo_kwargs = dict(Omega_m=0.31,omega_b=0.022,h=0.676,sigma8=0.8,n_s=0.97,N_ur=2.0328,m_ncdm=[0.06])
    cosmo_kwargs['Omega0_b'] = cosmo_kwargs.pop('omega_b')/cosmo_kwargs['h']**2
    Omega0_m = cosmo_kwargs.pop('Omega_m')
    sigma8 = cosmo_kwargs.pop('sigma8')
    cosmo = cosmology.Cosmology(**cosmo_kwargs).match(Omega0_m=Omega0_m).match(sigma8=sigma8)
    return cosmo

cosmo_fid = get_cosmo_BOSS()
data['DISTANCE'] = cosmo_fid.comoving_distance(data['Z'])
# same for random catalog

Galaxies (and randoms) in the catalog receive weights, to correct for systematic effects
(such that the ensemble average of galaxy density matches that of randoms):
- WEIGHT_SYSTOT: weights to correct for photometric systematics: what are they?  
- WEIGHT_CP: weights to correct for fiber collisions: what are they?  
- WEIGHT_NOZ: weights to correct for redshift failures: what are they?

The total (completeness) weight is: WEIGHT_COMP = WEIGHT_SYSTOT * WEIGHT_CP * WEIGHT_NOZ.  
In addition, you can apply FKP weights to minimize variance: WEIGHT_FKP = 1/(1 + NZ * P0), with P0 the typical value of the power spectrum at the scales of interest, e.g. $10000 \; (\mathrm{Mpc}/h)^{3}$ (NZ is a number density, in $(h/\mathrm{Mpc})^{3}$).
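
The two weight definitions above can be sketched with plain NumPy arrays. The function names `completeness_weight` and `fkp_weight` are hypothetical helpers, and the input values are synthetic; with the catalogs, the arguments would be the corresponding table columns.

```python
import numpy as np

def completeness_weight(weight_systot, weight_cp, weight_noz):
    """Total completeness weight: product of the three systematic weights."""
    return weight_systot * weight_cp * weight_noz

def fkp_weight(nz, P0=1e4):
    """FKP weight 1 / (1 + n(z) * P0), with nz in (h/Mpc)^3 and P0 in (Mpc/h)^3."""
    return 1. / (1. + nz * P0)

# Synthetic values for illustration (not taken from the catalogs)
weight_systot = np.array([1.0, 1.1])
weight_cp = np.array([1.0, 2.0])
weight_noz = np.array([1.0, 1.0])
nz = np.array([1e-4, 3e-4])

weight_comp = completeness_weight(weight_systot, weight_cp, weight_noz)
weight_fkp = fkp_weight(nz)
weight = weight_comp * weight_fkp
```

Note that the product `nz * P0` is dimensionless, which is a useful check on the units of NZ and P0.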

In [6]:
# data['WEIGHT'] = data['WEIGHT_COMP'] * data['WEIGHT_FKP']
# same for randoms

### Now compute pair counts as a function of (s,mu)
If you are running out of time, you can directly use https://nbodykit.readthedocs.io/en/latest/api/_autosummary/nbodykit.algorithms.paircount_tpcf.tpcf.html#nbodykit.algorithms.paircount_tpcf.tpcf.SurveyData2PCF  
Otherwise, it is more instructive to start from Corrfunc https://corrfunc.readthedocs.io/en/master/api/Corrfunc.mocks.html#Corrfunc.mocks.DDsmu_mocks   
which you will use to compute data - data pairs (DD), data - random pairs (DR), random - random pairs (RR).  
Then, use the Landy-Szalay estimator to compute the correlation function: $\xi(s,\mu) = \frac{DD(s,\mu) - 2DR(s,\mu) + RR(s,\mu)}{RR(s,\mu)}$.  
Note that you should normalize each pair count by the total weighted number of pairs in the survey. What is it?  
(An example of implementation of the Landy-Szalay estimator can be found in correlation_function.py and a use case in estimators.py)
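
For orientation, here is a minimal sketch of the Landy-Szalay combination with normalized pair counts. It is not the implementation in correlation_function.py; the function names are hypothetical. It assumes that the inputs are weighted pair counts (for Corrfunc, this would plausibly be `npairs * weightavg`), that auto-pair counts include each pair twice (i-j and j-i, the Corrfunc convention for autocorrelations), and that pair weights are products of individual weights.

```python
import numpy as np

def total_weighted_pairs(weights1, weights2=None):
    """Total weighted number of pairs, used to normalize raw pair counts."""
    w1 = np.asarray(weights1, dtype=float)
    if weights2 is None:
        # Auto-pairs: sum over i != j of w_i * w_j (each pair counted twice)
        return w1.sum()**2 - (w1**2).sum()
    # Cross-pairs: sum over i, j of w1_i * w2_j
    return w1.sum() * np.asarray(weights2, dtype=float).sum()

def landy_szalay(DD, DR, RR, weights_data, weights_randoms):
    """Landy-Szalay estimator from weighted pair counts DD, DR, RR of identical shape."""
    DD = np.asarray(DD, dtype=float) / total_weighted_pairs(weights_data)
    DR = np.asarray(DR, dtype=float) / total_weighted_pairs(weights_data, weights_randoms)
    RR = np.asarray(RR, dtype=float) / total_weighted_pairs(weights_randoms)
    # Bins without RR pairs give nan/inf rather than raising a warning
    with np.errstate(divide='ignore', invalid='ignore'):
        return (DD - 2. * DR + RR) / RR

# Synthetic check: if all normalized counts agree, there is no clustering
wd = np.array([1., 2., 3.])
wr = np.ones(4)
shape = np.array([[1., 2.], [3., 4.]])  # arbitrary (s, mu) dependence
xi_null = landy_szalay(22. * shape, 24. * shape, 12. * shape, wd, wr)
```

In the synthetic check, 22, 24 and 12 are the total weighted pair numbers for DD, DR and RR, so `xi_null` vanishes in every bin.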

In [7]:
# e.g. to cross-correlate catalogs cat1 and cat2
# import Corrfunc
# Corrfunc.mocks.DDsmu_mocks(False,cosmology=1,nthreads=4,mu_max=1.,nmu_bins=100,binfile=sedges,
#                            RA1=cat1['RA'],DEC1=cat1['DEC'],CZ1=cat1['DISTANCE'],weights1=cat1['WEIGHT'],
#                            RA2=cat2['RA'],DEC2=cat2['DEC'],CZ2=cat2['DISTANCE'],weights2=cat2['WEIGHT'],
#                            is_comoving_dist=True,verbose=True,output_savg=False,weight_type='pair_product')

### Compute multipoles of the correlation function
Multipoles are given by $\xi_{\ell}(s) = \frac{2 \ell + 1}{2} \int_{-1}^{1} d\mu \, \xi(s,\mu) \mathcal{L}_{\ell}(\mu)$,
with $\mathcal{L}_{\ell}(\mu)$ the Legendre polynomial of order $\ell$ (see https://en.wikipedia.org/wiki/Legendre_polynomials, also https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.legendre.html).
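
One possible discretization of this integral is sketched below. It assumes that $\xi(s,\mu)$ is binned in $\mu$ over $[0, 1]$ and symmetric in $\mu$ (so that, for even $\ell$, the integral over $[-1, 1]$ is twice the integral over $[0, 1]$), and it treats $\xi$ as constant within each $\mu$ bin while integrating the Legendre polynomial exactly over the bin. The function name `multipoles` is a hypothetical helper.

```python
import numpy as np
from scipy import special

def multipoles(xi_smu, mu_edges, ells=(0, 2, 4)):
    """Project xi(s, mu) of shape (ns, nmu) onto even multipoles; returns shape (len(ells), ns)."""
    xi_smu = np.asarray(xi_smu, dtype=float)
    mu_edges = np.asarray(mu_edges, dtype=float)
    poles = []
    for ell in ells:
        # Antiderivative of the Legendre polynomial, evaluated at the bin edges
        primitive = special.legendre(ell).integ()
        legendre_integral = primitive(mu_edges[1:]) - primitive(mu_edges[:-1])
        # (2 ell + 1) / 2 * 2 * int_0^1: the factor 2 comes from the assumed symmetry in mu
        poles.append((2 * ell + 1) * np.sum(xi_smu * legendre_integral, axis=-1))
    return np.array(poles)

# Synthetic check: xi(s, mu) = A(s) + B(s) * L_2(mu)
mu_edges = np.linspace(0., 1., 101)
mu_mid = 0.5 * (mu_edges[1:] + mu_edges[:-1])
A = np.array([1.0, 0.5, 0.1])
B = np.array([-0.5, -0.2, 0.05])
xi_smu = A[:, None] + B[:, None] * special.legendre(2)(mu_mid)[None, :]
xi0, xi2, xi4 = multipoles(xi_smu, mu_edges)
```

In the synthetic check, the monopole and quadrupole recover `A` and `B` up to a small binning error, and the hexadecapole is close to zero.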

In [8]:
# Plot the correlation function multipoles!

### Estimate power spectrum multipoles
https://nbodykit.readthedocs.io/en/latest/api/_autosummary/nbodykit.algorithms.convpower.fkp.html
https://nbodykit.readthedocs.io/en/0.1.11/algorithms/survey-power.html   
Weighted data and randoms are interpolated onto a mesh.   
The Fourier-space field is computed using Fast Fourier Transforms (FFTs).   
Note that such sampling of the galaxy density field leads to artefacts that must be corrected:
smoothing (due to interpolation on the mesh) and aliasing (high frequencies entering into low frequency modes).
How are they corrected for? (already included in nbodykit's ConvolvedFFTPower).  
A Poisson shot noise term is usually subtracted from the monopole. Why?  
(An example of running power spectrum estimation can be found in estimators.py)
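
As an illustration of the smoothing correction, the sketch below evaluates the Fourier-space window of a mass assignment scheme of order $p$ (NGP: 1, CIC: 2, TSC: 3), $W(\mathbf{k}) = \prod_{i} \left[\mathrm{sinc}(k_{i} H / 2)\right]^{p}$ with $H$ the cell size. A common approach, assumed here and not checked against nbodykit's internals, is to divide the Fourier-space field by this window, with interlacing used to suppress the leading aliasing contributions. The function name `assignment_window` is a hypothetical helper.

```python
import numpy as np

def assignment_window(k, BoxSize, Nmesh, p=3):
    """Window W(k) of a mass assignment scheme of order p, for wavevectors k of shape (..., 3)."""
    cell = BoxSize / Nmesh
    k = np.asarray(k, dtype=float)
    # np.sinc(x) = sin(pi x) / (pi x), so this evaluates sin(k H / 2) / (k H / 2) per axis
    return np.prod(np.sinc(k * cell / (2. * np.pi))**p, axis=-1)

BoxSize, Nmesh = 3000., 100
k_nyquist = np.pi * Nmesh / BoxSize
k = np.array([[0., 0., 0.], [k_nyquist, 0., 0.]])
window = assignment_window(k, BoxSize, Nmesh, p=3)
# A compensated field would be obtained as delta_k / window
```

The window equals 1 at $k = 0$ and drops to $(2/\pi)^{3}$ for TSC at the Nyquist frequency along one axis, which shows why the uncorrected power is suppressed at high $k$.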

In [9]:
# from nbodykit.lab import FITSCatalog, FKPCatalog, ConvolvedFFTPower
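# data = FITSCatalog(path_data) # FITSCatalog reads from a file path, not from an astropy Table
# randoms = FITSCatalog(path_randoms)
# fill in POSITION (e.g. with nbodykit.transform.SkyToCartesian), WEIGHT_COMP, WEIGHT_FKP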
# fkp = FKPCatalog(data,randoms)
# BoxSize = 3000.
# Nmesh = 100 # increase if you have more memory on your computer
# mesh = fkp.to_mesh(position='POSITION',fkp_weight='WEIGHT_FKP',comp_weight='WEIGHT_COMP',nbar='NZ',BoxSize=BoxSize,Nmesh=Nmesh,resampler='tsc',interlaced=True,compensated=True)
# power = ConvolvedFFTPower(mesh,poles=(0,2,4),kmin=0.,dk=0.01)
# print(power.poles)

### Bonus, cosmological constraints with BAO!
Follow instructions in bao_inverse_distance_ladder.ipynb