# Introduction to the DESC Data Challenge 2

**Author**: Javier Sánchez

**Date**: 02-07-2019

In this notebook we give a general overview of what the DESC Data Challenge 2 (DC2) is, which products are available, and how to access them. We also explain the differences between the runs and give some examples.

## The basics

LSST is planned to map ~20,000 sq-deg of the sky with a median 5-sigma depth of $r \sim 27.5$. The object density and data volume that LSST will handle will therefore greatly surpass those of any current astronomy experiment (HSC and Hubble have deeper fields, but they are not as wide). The increased statistics improve the cosmological sensitivity but, at the same time, require a more careful treatment of certain systematic effects.

During the Stony Brook collaboration meeting (Summer 2017), the science working groups (SWGs) provided input to the computing working groups (CWGs) about the kinds of projects they were interested in. The CWGs then came up with specifications for cosmological simulations and image simulations that the SWGs could analyze as a controlled dataset in which to study the effects they were interested in. These simulations are the products of Data Challenge 2.

The DESC data challenges follow a staged approach in which each data challenge builds upon the previous one. The first DESC data challenge was a simplistic 40 sq-deg image simulation of galaxies from CatSim in a single band and with limited realism.

Some of the studies proposed by the SWGs required significant improvements in both the realism of the image simulations and the underlying galaxy catalogs. For DC2 we started to make these improvements to fulfill the SWGs' needs.

We are going to present the different products available for DC2. More information can be found at:

https://confluence.slac.stanford.edu/display/LSSTDESC/DC2+Data+Product+Overview

### Galaxy catalog

The underlying galaxy catalog is a brand-new, state-of-the-art simulation. A 5000 sq-deg *extragalactic catalog* based on the Outer Rim N-body simulation ([Habib et al. 2014](http://arxiv.org/abs/1410.2805)) is planned. The catalog is populated using the Galacticus semi-analytic model ([Benson et al. 2010](https://arxiv.org/abs/1008.1786)), which, in order to scale up to LSST densities, is accelerated using an emulation approach called GalSampler (Hearin et al., in prep.). This catalog is called `cosmoDC2`. About 700 sq-deg are currently available as `cosmoDC2_v1.1.4`.

This catalog has a higher resolution and a more complicated biasing model than the previous model based on CatSim. Using the galaxies from this extragalactic catalog, we started to generate the image simulations.

### Image simulations

Two different software packages were used to generate the DC2 images: PhoSim and imSim. PhoSim ([Peterson et al. 2015](https://arxiv.org/abs/1504.06570)) is a Monte Carlo simulator that generates photons from different astronomical sources and simulates their interaction with the atmosphere, telescope, and camera. The galaxies are modeled as Sérsic (bulge+disk) models and stars are point-like sources. imSim (Walter et al., in prep.), on the other hand, is a software package based on GalSim ([Rowe et al. 2015](https://arxiv.org/abs/1407.7676)) that follows a modular approach to generate LSST-like synthetic images. We use information from `lsst_sims` to build the sky-background model (from Yoachim et al.) and LSST project optical simulations from Bo Xin to create the optical model (which is added as an extra layer to the atmosphere). The atmospheric PSF model is based on PhoSim's model and was developed by Josh Meyers. Effects such as cosmic rays, saturation, the brighter-fatter effect, tree rings, and others were included in imSim for DC2. The galaxies are also represented as Sérsic (bulge+disk) models, and François Lanusse added the functionality to simulate complex galaxy morphologies (also known as *knots*).
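
For reference, the Sérsic profile mentioned above has the standard form shown here, with half-light radius $r_e$, intensity $I_e$ at that radius, and Sérsic index $n$. The bulge+disk model is the sum of two such components. The index values quoted ($n=1$ for the disk, $n=4$ for the bulge) are the conventional choices and are an assumption here, not something stated in this notebook. The expression for $b_n$ is a common approximation.

$$
I(r) = I_e \exp\left\{-b_n\left[\left(\frac{r}{r_e}\right)^{1/n} - 1\right]\right\},
\qquad b_n \approx 2n - \frac{1}{3},
\qquad I_{\rm gal}(r) = I_{\rm bulge}(r; n=4) + I_{\rm disk}(r; n=1)
$$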

### Runs

*Note: The suffix i/p after the run number marks the software that was used to generate the images (i for imSim, p for PhoSim).*
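
As a small illustration of this naming convention, the sketch below splits a run name into its run number and simulation software. The helper `parse_run_name` and the lookup table `SOFTWARE_BY_SUFFIX` are hypothetical names introduced for this example; they are not part of any DESC package.

```python
import re

# Hypothetical lookup table: suffix letter -> image simulation software.
SOFTWARE_BY_SUFFIX = {"i": "imSim", "p": "PhoSim"}


def parse_run_name(name):
    """Split a name such as 'Run 1.2p' into (run number, software)."""
    match = re.fullmatch(r"Run (\d+\.\d+)([ip])", name.strip())
    if match is None:
        raise ValueError(f"unrecognized run name: {name!r}")
    run_number, suffix = match.groups()
    return run_number, SOFTWARE_BY_SUFFIX[suffix]


print(parse_run_name("Run 1.2p"))
print(parse_run_name("Run 2.1i"))
```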


| Name | Extragalactic catalog used | Run | Software | Target area WFD | Target area DDF | Depth WFD | Depth DDF | Realized area | Realized depth |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Run 1.1p | protoDC2 v2.1.2 | 1.1 | PhoSim | 5 deg x 5 deg | 1.1 deg x 1.1 deg | 10 year (r ~ 27.5, Ivezic et al.) | 10 year DDF (r ~ 28.4, Brandt et al.) | ~25.6 sq-deg | 25.1 (5-sigma r-band) |
| Run 1.2p | protoDC2 v3.0 | 1.2 | PhoSim | 5 deg x 5 deg | 1.1 deg x 1.1 deg | 10 year (r ~ 27.5, Ivezic et al.) | 10 year DDF (r ~ 28.4, Brandt et al.) | ~31.3 sq-deg | 25.9 (5-sigma r-band) |
| Run 1.2i | protoDC2 v3.0 | 1.2 | imSim | 5 deg x 5 deg | 1.1 deg x 1.1 deg | 10 year (r ~ 27.5, Ivezic et al.) | 10 year DDF (r ~ 28.4, Brandt et al.) | ~29.4 sq-deg | 25.7 (5-sigma r-band) |
| Run 2.0i | cosmoDC2 v1.0 | 2.0 | imSim | 300 sq-deg | 1.25 sq-deg | 10 year (r ~ 27.5, Ivezic et al.) | 10 year DDF (r ~ 28.4, Brandt et al.) | 127 focal plane visits | ~24.5 (5-sigma r-band, single visit) |
| Run 2.1i | cosmoDC2 v1.1.4 | 2.1 | imSim | TBA | TBA | TBA | TBA | ~1000 single visits | ~24.5 (5-sigma r-band, single visit) |
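
To relate the single-visit depths in the table to the 10-year target depths, a rough rule of thumb is that stacking $N$ visits of equal depth deepens the 5-sigma limit by $1.25 \log_{10} N$ magnitudes. This assumes background-limited noise that averages down as $\sqrt{N}$, which is a simplification and not the procedure used to derive the numbers above. The function names below are hypothetical.

```python
import math


def coadd_depth(single_visit_depth, n_visits):
    """Approximate 5-sigma depth after stacking n equal-depth visits.

    Assumes background-limited noise that averages down as sqrt(n).
    """
    if n_visits < 1:
        raise ValueError("n_visits must be at least 1")
    return single_visit_depth + 1.25 * math.log10(n_visits)


def visits_needed(single_visit_depth, target_depth):
    """Smallest number of equal-depth visits that reaches target_depth."""
    gain = target_depth - single_visit_depth
    return max(1, math.ceil(10 ** (gain / 1.25)))


# Single-visit r-band depth ~24.5 and target depth 27.5, as in the table.
print(coadd_depth(24.5, 100))
print(visits_needed(24.5, 27.5))
```

Under these assumptions, going from a single-visit depth of 24.5 to 27.5 takes a few hundred visits, which is why the single-visit runs are much shallower than the 10-year targets.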

 

### Products

See [here](https://confluence.slac.stanford.edu/display/LSSTDESC/DC2+Data+Product+Overview) for a full description. As a brief summary, we have:

Boldface catalogs are the most commonly used.

* **Extragalactic catalog**: The "clean" catalog from the cosmological simulation. This is used as input for the image simulations. Access through `GCR` (the catalogs are either HDF5 files or FITS files).

* Instance catalogs: These catalogs translate the information from the extragalactic catalog into the format that imSim and PhoSim read to generate the images. Access using `GCR` or by reading the text files.

* **Truth catalog**: This is the version of the extragalactic catalog with the conversions made at the instance catalog stage applied. It should be used as "ground truth". Access through `GCR`.

* `calexp`: Calibrated exposure, single-visit images. They can be accessed using the DM `Butler` (they are `fits` files).

* `calexp_src`: The source catalog resulting from processing (detection+deblending+measurement) the `calexp`. Access using the `Butler` or `GCR` (they are also `fits` files).

* `deepCoadd`: Coadd image. They can be accessed using the `Butler` (also `fits` files).

* `deepCoadd_meas`: Source catalog resulting from processing the `deepCoadd`. Access using the `Butler`, or with `GCR` if the required configuration files are provided (they are also `fits` files).

* **Object catalog**: Pre-ingested catalog from the `merged` forced photometry catalog. These are the catalogs that most people will use. They can be accessed through `GCR`.
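
A minimal sketch of the `GCR` access pattern used for the catalogs above is given here. It assumes that the `GCRCatalogs` package is installed and configured (as in a DESC environment), which is why it is imported inside the function. The helpers `range_filters` and `load_columns` are hypothetical wrappers written for this example. The catalog name and the column names in the usage note are also assumptions: they need to match what is registered in your installation.

```python
def range_filters(column, low=None, high=None):
    """Build GCR-style string filters selecting low <= column < high."""
    filters = []
    if low is not None:
        filters.append(f"{column} >= {low}")
    if high is not None:
        filters.append(f"{column} < {high}")
    return filters


def load_columns(catalog_name, columns, filters=()):
    """Load the requested columns from a GCR-readable catalog.

    Returns a dict mapping each column name to an array of values.
    """
    # Imported lazily: GCRCatalogs is only available where it is installed.
    import GCRCatalogs

    catalog = GCRCatalogs.load_catalog(catalog_name)
    missing = [name for name in columns if not catalog.has_quantity(name)]
    if missing:
        raise KeyError(f"{catalog_name} does not provide {missing}")
    return catalog.get_quantities(list(columns), filters=list(filters))


# The filter strings can be built and inspected without any data access.
print(range_filters("redshift", low=0.0, high=0.5))
```

With a working installation, a call such as `load_columns("cosmoDC2_v1.1.4", ["ra", "dec", "redshift"], range_filters("redshift", high=0.5))` would return only the low-redshift galaxies, provided that this catalog name and these columns exist in your setup.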

## Hands-on with DC2

The goal of this notebook is to give a wide overview of how to access different data products and show some cool 

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib
