<img align="left" src = "images/linea.png" width=120 style="padding: 20px"> 
<img align="left" src = "images/rubin.png" width=140 style="padding: 30px"> 

# PZ Compute - E2E Notebook 
## Photo-zs for LSST DP0.2 Object catalog



Notebook contributors: Julia Gschwend, Luigi Silva, Heloisa Mengisztki <br>
Contact: [julia@linea.org.br](mailto:julia@linea.org.br) <br>
Last verified run: **2024-Nov-08** <br>

### README - Disclaimer
This notebook is an alternative front end for the Photo-z Compute pipeline, originally developed for command-line execution on LIneA's HPC environment. It is meant to be used by the "photo-z experts" in charge of the production tasks related to the Brazilian in-kind contribution to LSST. It should not be considered a source of documentation or a user guide. 

After each complete execution, this notebook must be exported and saved as an HTML file to serve as an execution report for future provenance tracking. Additional process metadata and provenance information are available in the attached `provenance_info.yaml` file. 

### Table of contents 

1. Pre-processing: data preparation, photo-z training and validation 
2. Photo-z Compute 
3. Post-processing: analyze results and performance  


Each of these steps was carefully explored in separate notebooks. This notebook contains only the final decisions regarding sample selection and configuration choices.   

--- 

Setup:

In [None]:
import os 

# 1. Pre-processing 

## 1.1 Create Skinny tables 

**Input data**

The first input of this sequence is the original LSST Object catalog for DP0.2, stored on the Lustre file system at: 

`/lustre/t1/cl/lsst/dp02/primary/catalogs/object/` 

Filename pattern: `objectTable_tract_xxxx_DC2_2_2i_runs_DP0_2_v23_0_1_PREOPS-905_step3_x_2022xxxxTxxxxxxZ.parq`
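
As an illustration of the filename pattern above, the sketch below extracts the tract number from a file name with a regular expression. The digit counts assumed for the `xxxx`, `x`, and timestamp placeholders, the helper name `parse_object_table_name`, and the example file name are all hypothetical; adjust them to the actual files on disk.

```python
import re

# Hypothetical regex built from the filename pattern; the assumption is that
# every "x" placeholder stands for a decimal digit.
OBJECT_TABLE_RE = re.compile(
    r"objectTable_tract_(?P<tract>\d+)_DC2_2_2i_runs_DP0_2_v23_0_1_"
    r"PREOPS-905_step3_(?P<step>\d+)_(?P<timestamp>2022\d{4}T\d{6}Z)\.parq$"
)


def parse_object_table_name(filename):
    """Return tract, step, and timestamp parsed from a file name, or None."""
    match = OBJECT_TABLE_RE.search(filename)
    if match is None:
        return None
    return {
        "tract": int(match.group("tract")),
        "step": int(match.group("step")),
        "timestamp": match.group("timestamp"),
    }


# Made-up file name, used only to exercise the parser.
example = (
    "objectTable_tract_1234_DC2_2_2i_runs_DP0_2_v23_0_1_"
    "PREOPS-905_step3_1_20220101T000000Z.parq"
)
print(parse_object_table_name(example))
```

Returning `None` for non-matching names makes it easy to skip unrelated files when listing the catalog directory.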

**Column Selection** 

Skinny tables are a subset of the [LSST Object catalog](https://sdm-schemas.lsst.io/dp02.html#Object) that includes only the columns of interest for photo-z algorithms, with ready-to-use data, i.e., fluxes converted into dereddened magnitudes.  

Columns included in the skinny tables: 

| column name | data type |  description |
| ---         | ---       |  ---         |
| objectId    | int       | Unique identifier | 
| coord_ra    | float64   | Fiducial ICRS Right Ascension of centroid (degrees) |
| coord_dec   | float64   | Fiducial ICRS Declination of centroid (degrees) | 
| detect_isPrimary | boolean | True if source has no children and is in the inner region of a coadd patch and is in the inner region of a coadd tract and is not a sky source | 
| mag_{u, g, r, i, z, y} | float64 | {u, g, r, i, z, y}-band magnitude converted from final cmodel fit flux measurements | 
| magerr_{u, g, r, i, z, y} | float64 | {u, g, r, i, z, y}-band magnitude errors converted from final cmodel fit flux error measurements | 
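
The conversion behind the `mag_*` and `magerr_*` columns can be sketched as follows. This is a minimal illustration, not the pipeline's own code: it assumes fluxes in nanojansky (AB zero point of 31.4), a per-object E(B-V) value supplied by the caller, and illustrative extinction coefficients in `A_EBV` that should be checked against the values actually used in production. The function names are hypothetical.

```python
import math

# Assumed A_band / E(B-V) coefficients for the LSST filters; verify them
# against the values used by the pipeline before relying on them.
A_EBV = {"u": 4.81, "g": 3.64, "r": 2.70, "i": 2.06, "z": 1.58, "y": 1.31}

AB_ZEROPOINT_NJY = 31.4  # AB zero point for fluxes expressed in nanojansky


def flux_to_dered_mag(flux_njy, band, ebv):
    """Convert a flux in nJy to a dereddened AB magnitude."""
    if not flux_njy > 0:
        return float("nan")  # magnitude is undefined for non-positive flux
    mag = -2.5 * math.log10(flux_njy) + AB_ZEROPOINT_NJY
    return mag - A_EBV[band] * ebv


def flux_err_to_mag_err(flux_njy, flux_err_njy):
    """First-order propagation of the flux error into a magnitude error."""
    if not flux_njy > 0:
        return float("nan")
    return 2.5 / math.log(10) * flux_err_njy / flux_njy


print(flux_to_dered_mag(1000.0, "r", 0.1))
print(flux_err_to_mag_err(100.0, 1.0))
```

Non-positive or NaN fluxes map to NaN, so such objects stay in the table and no rows are dropped at this stage.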


Data cleaning to reduce the number of rows in the catalog is recommended when using the pipeline to create value-added catalogs for science cases. It must not be applied to production runs that create official data products to be delivered as part of the in-kind contribution.    

In [None]:
os.system(

### Basic QA of skinny tables 

## 1.2 Create Training and Test Sets 


### Representative spectroscopic sample 

A true-z sample randomly selected from the DC2 simulation to mimic a spectroscopic sample that is representative in color-magnitude-redshift space. 

Split the sample into two random subsets, with 70% of the galaxies designated for training and 30% for testing, by adding an extra column `test`: 
* `test=0`: galaxies included in the **training** procedure
* `test=1`: galaxies included in the **test** procedure, which must be excluded from the training procedure 
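
A minimal sketch of the 70/30 split described above, using only the Python standard library. The function name `assign_test_flags` and the fixed seed are assumptions made for illustration; the returned list would be stored as the `test` column of the true-z sample.

```python
import random


def assign_test_flags(n_galaxies, test_fraction=0.3, seed=42):
    """Return a shuffled list of 0/1 flags; 1 marks a test-set galaxy."""
    n_test = round(n_galaxies * test_fraction)
    flags = [1] * n_test + [0] * (n_galaxies - n_test)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(flags)
    return flags


flags = assign_test_flags(1000)
n_test = sum(flags)
print(f"training: {len(flags) - n_test}, test: {n_test}")
```

Shuffling a list with a fixed number of ones gives the requested fraction exactly, whereas drawing each flag independently would only match it on average.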



#### Basic QA of the representative training set 

### Realistic spectroscopic sample (TBD)

A true-z sample arbitrarily selected from the DC2 simulation to mimic a realistic spectroscopic sample regarding the color-magnitude-redshift space, based on current spectroscopic data available from the literature. 




#### Basic QA of the realistic training set 

## 1.3 Train the photo-z algorithm  

Train the photo-z algorithm with RAIL (`rail_inform`). Available options: BPZ, FlexZBoost, GPz, LePHARE, and TPZ.  


## 1.4 Photo-z Validation    

### PZ estimates for the Test Set

Run the `rail_estimate` module to produce the photo-z estimates (PDFs) for the Test Set. 

### PZ validation results

#### Metrics and plots 

Run the `rail_evaluate` module to compute PDF metrics. 
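
To make "PDF metrics" concrete, the sketch below computes the probability integral transform (PIT), one commonly used PDF metric: the CDF of a galaxy's photo-z PDF evaluated at its true redshift. It is a standalone illustration that does not call RAIL; the gridded PDF representation and the function name `pit_value` are assumptions.

```python
def pit_value(z_grid, pdf, z_true):
    """Return the CDF of a gridded PDF at z_true, using the trapezoid rule."""
    total = 0.0  # integral of the PDF over the whole grid
    below = 0.0  # integral of the PDF up to z_true
    for i in range(len(z_grid) - 1):
        z0, z1 = z_grid[i], z_grid[i + 1]
        p0, p1 = pdf[i], pdf[i + 1]
        area = 0.5 * (p0 + p1) * (z1 - z0)
        total += area
        if z_true >= z1:
            below += area
        elif z_true > z0:
            # Partial interval: interpolate the PDF linearly at z_true.
            p_true = p0 + (p1 - p0) * (z_true - z0) / (z1 - z0)
            below += 0.5 * (p0 + p_true) * (z_true - z0)
    return below / total if total > 0 else float("nan")


# Toy example: a flat PDF between z = 0 and z = 3.
z_grid = [0.1 * i for i in range(31)]
pdf = [1.0] * len(z_grid)
print(pit_value(z_grid, pdf, 1.5))
```

For well-calibrated PDFs, the PIT values collected over the whole Test Set are expected to follow a uniform distribution between 0 and 1.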

#### PZ Validation conclusions 

Quality assessment, comparison with science requirements. 

# 2. Photo-z Compute 

## Submit pipeline to Apollo cluster 

## Real-time monitoring  

# 3. Post-processing

## Performance evaluation 

## PZ Estimates - QA of final results 