# Generate Mock Data

In this example we generate mock data with a variety of systematic effects including photometric redshifts, source galaxy distributions, and shape noise. We then populate a galaxy cluster object. This notebook is organised as follows:
- Imports and configuration setup
- Generate mock data with different source galaxy options
- Generate mock data with different field-of-view options
- Generate mock data with different galaxy cluster options (only available with the Numcosmo and/or CCL backends). Use the `os.environ['CLMM_MODELING_BACKEND']` line below to select your backend.

In [None]:
import os
## Uncomment the following line if you want to use a specific modeling backend among 'ct' (cluster-toolkit), 'ccl' (CCL) or 'nc' (Numcosmo). Default is 'ct'
#os.environ['CLMM_MODELING_BACKEND'] = 'nc'

In [None]:
try:
    import clmm
except ImportError:
    import notebook_install
    notebook_install.install_clmm_pipeline(upgrade=False)
    import clmm
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Make sure we know which version we're using

In [None]:
clmm.__version__

## Import mock data module and set up the configuration

In [None]:
from clmm.support import mock_data as mock
from clmm import Cosmology

Mock data generation requires a defined cosmology

In [None]:
mock_cosmo = Cosmology(H0 = 70.0, Omega_dm0 = 0.27 - 0.045, Omega_b0 = 0.045, Omega_k0 = 0.0)

Mock data generation requires some cluster information. The default is to work with the NFW profile, using the "200,mean" mass definition. The Numcosmo and CCL backends allow for more flexibility (see the last section of this notebook).

In [None]:
cosmo = mock_cosmo
cluster_id = "Awesome_cluster"
cluster_m = 1.e15 # M200,m
cluster_z = 0.3
src_z = 0.8
concentration = 4 
ngals = 1000 # number of source galaxies

# Cluster centre coordinates
cluster_ra = 50.
cluster_dec = 87.

## Generate the mock catalog with different source galaxy options

- Clean data: no noise, all galaxies at the same redshift

In [None]:
zsrc_min = cluster_z + 0.1 

In [None]:
ideal_data = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, src_z, ngals=ngals,
                                          cluster_ra=cluster_ra, cluster_dec=cluster_dec)

- Noisy data: shape noise, all galaxies at the same redshift

In [None]:
noisy_data_src_z = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                                src_z, shapenoise=0.05, ngals=ngals, 
                                                cluster_ra=cluster_ra, cluster_dec=cluster_dec)
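As a rough illustration of what `shapenoise` controls, the sketch below adds independent Gaussian scatter of width `sigma` to each ellipticity component. The Gaussian model and the helper `add_shape_noise` are assumptions made for illustration only; they are not part of CLMM, whose actual implementation may differ (for instance in how it truncates outliers).

```python
import random
import statistics


def add_shape_noise(e1, e2, sigma, rng):
    """Return noisy copies of two ellipticity components (hypothetical helper).

    Assumes independent Gaussian scatter of width `sigma` on each component.
    """
    noisy_e1 = [e + rng.gauss(0.0, sigma) for e in e1]
    noisy_e2 = [e + rng.gauss(0.0, sigma) for e in e2]
    return noisy_e1, noisy_e2


rng = random.Random(42)          # fixed seed so the sketch is reproducible
e1_clean = [0.01] * 5000         # toy noiseless ellipticities
e2_clean = [-0.02] * 5000
e1_noisy, e2_noisy = add_shape_noise(e1_clean, e2_clean, 0.05, rng)

print("mean e1:", statistics.mean(e1_noisy))
print("scatter of e1:", statistics.pstdev(e1_noisy))
```

The mean of the noisy component stays close to the clean value while its scatter approaches the requested `sigma`, which is why averaging over many galaxies recovers the underlying signal.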

- Noisy data: shape noise plus measurement error, all galaxies at the same redshift

In [None]:
noisy_data_src_z_e_err = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                                      src_z, shapenoise=0.05, mean_e_err=0.05, ngals=ngals, 
                                                      cluster_ra=cluster_ra, cluster_dec=cluster_dec)

<div class="alert alert-warning">

**WARNING:** Experimental feature. Uncertainties are created by simply drawing random numbers near the value specified by `mean_e_err`. Use at your own risk. This will be improved in future releases.
    
</div>

- Noisy data: photo-z errors (and pdfs!), all galaxies at the same redshift

In [None]:
noisy_data_photoz = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                                 src_z, shapenoise=0.05, photoz_sigma_unscaled=0.05, ngals=ngals, 
                                                 cluster_ra=cluster_ra, cluster_dec=cluster_dec)
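The name `photoz_sigma_unscaled` suggests that the photo-z scatter is scaled by redshift. The sketch below assumes a Gaussian error of width `photoz_sigma_unscaled * (1 + ztrue)`; both this scaling and the helper `scatter_photoz` are assumptions for illustration, to be checked against the CLMM documentation.

```python
import random
import statistics


def scatter_photoz(z_true, sigma_unscaled, rng):
    """Draw one 'observed' redshift around z_true (hypothetical helper).

    Assumes a Gaussian error of width sigma_unscaled * (1 + z_true).
    """
    sigma_z = sigma_unscaled * (1.0 + z_true)
    return rng.gauss(z_true, sigma_z), sigma_z


rng = random.Random(0)           # fixed seed for reproducibility
z_true = 0.8                     # same source redshift as src_z above
draws = [scatter_photoz(z_true, 0.05, rng) for _ in range(5000)]
z_obs = [d[0] for d in draws]
sigma_z = draws[0][1]

print("assumed sigma_z:", sigma_z)
print("empirical scatter of observed z:", statistics.pstdev(z_obs))
```

Under this assumption, a source at higher redshift receives a proportionally larger photo-z error than one at low redshift for the same `photoz_sigma_unscaled`.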

- Clean data: source galaxy redshifts drawn from a redshift distribution instead of fixed `src_z` value. Options are `chang13` for Chang et al. 2013 or `desc_srd` for the distribution given in the DESC Science Requirement Document. No shape noise or photoz errors.

In [None]:
ideal_with_src_dist = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                                   'chang13', zsrc_min=zsrc_min, zsrc_max=7.0, ngals=ngals, 
                                                   cluster_ra=cluster_ra, cluster_dec=cluster_dec)
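To give a feel for drawing redshifts from a distribution between `zsrc_min` and `zsrc_max`, here is a minimal rejection-sampling sketch. It assumes the common parameterisation n(z) ∝ z^alpha exp(-(z/z0)^beta); the parameter values used below (alpha=1.24, beta=1.01, z0=0.51) are assumed for the `chang13` option and should be checked against the CLMM source. The helpers are hypothetical and CLMM may sample differently.

```python
import math
import random


def nz_unnormalised(z, alpha, beta, z0):
    """Unnormalised n(z) = z^alpha * exp(-(z/z0)^beta)."""
    return z ** alpha * math.exp(-((z / z0) ** beta))


def draw_redshifts(n, zmin, zmax, alpha, beta, z0, rng):
    """Rejection-sample n redshifts in [zmin, zmax] (hypothetical helper)."""
    # n(z) is unimodal with its maximum at z_peak, so its maximum on the
    # interval is reached at z_peak clamped to [zmin, zmax].
    z_peak = z0 * (alpha / beta) ** (1.0 / beta)
    bound = nz_unnormalised(min(max(z_peak, zmin), zmax), alpha, beta, z0)
    samples = []
    while len(samples) < n:
        z = rng.uniform(zmin, zmax)
        if rng.uniform(0.0, bound) <= nz_unnormalised(z, alpha, beta, z0):
            samples.append(z)
    return samples


rng = random.Random(1)           # fixed seed for reproducibility
z_draws = draw_redshifts(1000, 0.4, 7.0, alpha=1.24, beta=1.01, z0=0.51, rng=rng)
print("min, max:", min(z_draws), max(z_draws))
print("mean:", sum(z_draws) / len(z_draws))
```

Because candidates are only proposed inside the requested interval, no true redshift can fall below the lower cut, which is the property checked in the sanity test further down.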


- Noisy data: galaxies following redshift distribution, redshift error, shape noise

In [None]:
allsystematics = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo,
                                              'chang13', zsrc_min=zsrc_min, zsrc_max=7.0, shapenoise=0.05,
                                              photoz_sigma_unscaled=0.05, ngals=ngals,
                                              cluster_ra=cluster_ra, cluster_dec=cluster_dec)

In [None]:
allsystematics2 = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                               'desc_srd', zsrc_min=zsrc_min, zsrc_max=7.0, shapenoise=0.05, 
                                               photoz_sigma_unscaled=0.05, ngals=ngals, 
                                               cluster_ra=cluster_ra, cluster_dec=cluster_dec)

Sanity check: verify that no galaxies were originally drawn below `zsrc_min`, before photo-z errors are applied (when relevant).

In [None]:
print('Number of galaxies below zsrc_min:')
print('ideal_data:',np.sum(ideal_data['ztrue']<zsrc_min))
print('noisy_data_src_z:',np.sum(noisy_data_src_z['ztrue']<zsrc_min))
print('noisy_data_photoz:',np.sum(noisy_data_photoz['ztrue']<zsrc_min))
print('ideal_with_src_dist:',np.sum(ideal_with_src_dist['ztrue']<zsrc_min))
print('allsystematics:',np.sum(allsystematics['ztrue']<zsrc_min))

### Inspect the catalog data

- Ideal catalog first entries: no noise on the shape measurement, all galaxies at z=0.8, no redshift errors (z = ztrue)

In [None]:
for n in ideal_data.colnames: 
    if n!='id':
        ideal_data[n].format = "%6.3e" 
ideal_data[0:3].pprint(max_width=-1)

- With photo-z errors

In [None]:
for n in noisy_data_photoz.colnames: 
    if n!='id':
        noisy_data_photoz[n].format = "%6.3e"
noisy_data_photoz[0:3].pprint(max_width=-1)

- Histogram of the redshift distribution of background galaxies, for the true (originally drawn) redshift and the redshift once photo-z errors have been added. By construction no true redshift occurs below `zsrc_min`, but some 'observed' redshifts (i.e. including photo-z errors) may fall below it.

In [None]:
plt.hist(allsystematics['z'], bins=50, alpha=0.3, label='measured z (i.e. including photoz error)');
plt.hist(allsystematics['ztrue'], bins=50, alpha=0.3, label='true z');
plt.axvline(zsrc_min, color='red', label='requested zmin')
plt.xlabel('Source Redshift')
plt.legend()

In [None]:
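# Compare the true redshifts drawn from the two source distributions
plt.hist(allsystematics['ztrue'], bins=50, alpha=0.3, label='true z (chang13)');
plt.hist(allsystematics2['ztrue'], bins=50, alpha=0.3, label='true z (desc_srd)');
plt.legend()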


In [None]:
# Photo-z pdf for one of the galaxies in the catalog
galid = 0
plt.plot(allsystematics['pzbins'][galid], allsystematics['pzpdf'][galid])
plt.axvline(allsystematics['z'][galid], label='Observed z', color='red')
plt.axvline(allsystematics['ztrue'][galid], label='True z', color='g')
plt.xlabel('Redshift')
plt.ylabel('Photo-z Probability Distribution')
plt.legend(loc=1)

Populate a galaxy cluster object

In [None]:
gc_object = clmm.GalaxyCluster(cluster_id, cluster_ra, cluster_dec, 
                               cluster_z, allsystematics)

Plot source galaxy ellipticities

In [None]:
plt.scatter(gc_object.galcat['e1'],gc_object.galcat['e2'])

plt.xlim(-0.2, 0.2)
plt.ylim(-0.2, 0.2)
plt.xlabel('Ellipticity 1',fontsize='x-large')
plt.ylabel('Ellipticity 2',fontsize='x-large')

## Generate the mock data catalog with different field-of-view options

In the examples above, `ngals=1000` galaxies were simulated in a field corresponding to an 8 Mpc x 8 Mpc (proper distance) square box at the cluster redshift (this is the default). The user may however vary the field size and/or provide a galaxy density (instead of a number of galaxies). This is exemplified below, using the `allsystematics` example.

- `ngals = 1000` in a 4 x 4 Mpc box. Asking for the same number of galaxies in a smaller field of view yields a higher galaxy density.

In [None]:
allsystematics2 = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                               'chang13', zsrc_min=zsrc_min, zsrc_max=7.0, 
                                               shapenoise=0.05, photoz_sigma_unscaled=0.05,
                                               field_size=4, ngals=ngals, 
                                               cluster_ra=cluster_ra, cluster_dec=cluster_dec)

In [None]:
plt.scatter(allsystematics['ra'],allsystematics['dec'], marker='.', label = 'default 8 x 8 Mpc FoV')
plt.scatter(allsystematics2['ra'],allsystematics2['dec'],marker='.', label = 'user-defined FoV')
plt.legend()

- Alternatively, the user may provide a galaxy density (here 1.3 gal/arcmin2, which roughly matches 1000 galaxies given the configuration) and the number of galaxies to draw will automatically be adjusted to the box size.

In [None]:
allsystematics3 = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                               'chang13', zsrc_min=zsrc_min, zsrc_max=7.0, 
                                               shapenoise=0.05, photoz_sigma_unscaled=0.05,
                                               ngal_density=1.3,
                                               cluster_ra=cluster_ra, cluster_dec=cluster_dec)
print(f'Number of drawn galaxies = {len(allsystematics3)}')
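The link between box size, galaxy density and number of galaxies can be checked with a back-of-the-envelope calculation. The sketch below is not CLMM code: it assumes a flat ΛCDM cosmology with the parameters used above (H0 = 70, Omega_m = 0.27, radiation neglected), a small-angle conversion of the 8 Mpc proper box side into arcminutes, and a hypothetical helper `angular_diameter_distance`.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def angular_diameter_distance(z, h0=70.0, omega_m=0.27, n_steps=10000):
    """Angular diameter distance in Mpc for flat LCDM (hypothetical helper)."""
    dz = z / n_steps
    integral = 0.0
    for i in range(n_steps):
        zi = (i + 0.5) * dz  # midpoint rule for the integral of dz / E(z)
        integral += dz / math.sqrt(omega_m * (1.0 + zi) ** 3 + (1.0 - omega_m))
    comoving_distance = C_KM_S / h0 * integral
    return comoving_distance / (1.0 + z)


d_a = angular_diameter_distance(0.3)               # cluster redshift
side_arcmin = math.degrees(8.0 / d_a) * 60.0       # 8 Mpc box side, small-angle
area_arcmin2 = side_arcmin ** 2
n_expected = 1.3 * area_arcmin2                    # ngal_density = 1.3 gal/arcmin2

print(f"D_A = {d_a:.0f} Mpc, box side = {side_arcmin:.1f} arcmin")
print(f"expected number of galaxies = {n_expected:.0f}")
```

The estimate lands in the neighbourhood of the 1000 galaxies requested earlier; the exact number drawn by CLMM can differ because this sketch ignores any details of how the field is actually laid out on the sky.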

In [None]:
allsystematics4 = mock.generate_galaxy_catalog(cluster_m, cluster_z, concentration, cosmo, 
                                               'desc_srd', zsrc_min=zsrc_min, zsrc_max=7.0, 
                                               shapenoise=0.05, photoz_sigma_unscaled=0.05,
                                               ngal_density=1.3,
                                               cluster_ra=cluster_ra, cluster_dec=cluster_dec)
print(f'Number of drawn galaxies = {len(allsystematics4)}')

In [None]:
plt.scatter(allsystematics['ra'],allsystematics['dec'], marker='.', label = 'ngals = 1000')
plt.scatter(allsystematics3['ra'],allsystematics3['dec'],marker='.', label = 'ngal_density = 1.3 gal / arcmin2')
plt.legend()

## Generate mock data with different galaxy cluster options
WARNING: Available options depend on the modeling backend:
- Cluster-toolkit allows for other values of the overdensity parameter, but is restricted to working with the mean mass definition
- Both CCL and Numcosmo allow for different values of the overdensity parameter, and work with both the mean and critical mass definitions
- Numcosmo further allows for the Einasto or Burkert density profiles to be used instead of the NFW profile



### Changing the overdensity parameter (all backends) - `delta_so` keyword (default = 200)

In [None]:
allsystematics_500mean = mock.generate_galaxy_catalog(
    cluster_m, cluster_z, concentration, cosmo, 'chang13', delta_so=500,
    zsrc_min=zsrc_min,
    zsrc_max=7.0, shapenoise=0.05, photoz_sigma_unscaled=0.05, ngals=ngals,
    cluster_ra=cluster_ra, cluster_dec=cluster_dec)

### Using the critical mass definition (Numcosmo and CCL only) - `massdef` keyword (default = 'mean')
WARNING: an error will be raised if using the cluster-toolkit backend

In [None]:
allsystematics_200critical = mock.generate_galaxy_catalog(
    cluster_m, cluster_z, concentration, cosmo, 'chang13', massdef='critical', zsrc_min=zsrc_min,
    zsrc_max=7.0, shapenoise=0.05, photoz_sigma_unscaled=0.05, ngals=ngals,
    cluster_ra=cluster_ra, cluster_dec=cluster_dec)
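To see why `delta_so` and `massdef` matter, recall that a spherical-overdensity mass M is the mass enclosed in the radius within which the mean density is `delta_so` times a reference density (mean matter or critical). The sketch below is not CLMM code: it assumes flat ΛCDM with the parameters used above, a critical density today of 2.775e11 h² M_sun/Mpc³, and a hypothetical helper `so_radius` that evaluates r = (3M / (4π Δ ρ_ref))^(1/3) in proper Mpc.

```python
import math

RHO_CRIT0_H2 = 2.775e11  # critical density today in h^2 M_sun / Mpc^3


def so_radius(mass, z, delta_so=200, massdef="mean", h=0.7, omega_m=0.27):
    """Spherical-overdensity radius in proper Mpc (hypothetical helper)."""
    rho_crit0 = RHO_CRIT0_H2 * h ** 2
    if massdef == "mean":
        # Mean matter density at redshift z
        rho_ref = omega_m * (1.0 + z) ** 3 * rho_crit0
    elif massdef == "critical":
        # Critical density at redshift z for flat LCDM (radiation neglected)
        rho_ref = rho_crit0 * (omega_m * (1.0 + z) ** 3 + 1.0 - omega_m)
    else:
        raise ValueError("massdef must be 'mean' or 'critical'")
    return (3.0 * mass / (4.0 * math.pi * delta_so * rho_ref)) ** (1.0 / 3.0)


r_200m = so_radius(1.0e15, 0.3, delta_so=200, massdef="mean")
r_500m = so_radius(1.0e15, 0.3, delta_so=500, massdef="mean")
r_200c = so_radius(1.0e15, 0.3, delta_so=200, massdef="critical")

print(f"r_200m = {r_200m:.2f} Mpc")
print(f"r_500m = {r_500m:.2f} Mpc")
print(f"r_200c = {r_200c:.2f} Mpc")
```

The same numerical value of `cluster_m` therefore describes a more compact halo when it is interpreted with a higher overdensity or with the critical definition, so the resulting lensing signal changes accordingly.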

### Changing the halo density profile (Numcosmo only) - `halo_profile_model` keyword (default = 'nfw')
WARNING: an error will be raised if using the cluster-toolkit or CCL backends

In [None]:
allsystematics_200m_einasto = mock.generate_galaxy_catalog(
    cluster_m, cluster_z, concentration, cosmo, 'chang13', halo_profile_model='einasto', zsrc_min=zsrc_min,
    zsrc_max=7.0, shapenoise=0.05, photoz_sigma_unscaled=0.05, ngals=ngals,
    cluster_ra=cluster_ra, cluster_dec=cluster_dec)