<img align="left" src = https://linea.org.br/wp-content/themes/LIneA/imagens/logo-header.jpg width=100 style="padding: 20px"> 


<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=160 style="padding: 20px">  

# Photo-z Server - Tutorial Notebook

**Contact author**: Julia Gschwend ([julia@linea.org.br](mailto:julia@linea.org.br)) 

**Last verified run**: 2022-10-06 <br>



## 0. Introduction
The Photo-z (PZ) Server is an online service available to the LSST Community for hosting and sharing lightweight photo-z related data products. Data and metadata can be uploaded and downloaded at the website https://pz-server.linea.org.br/. There, you will find two separate pages, each containing a list of data products: one for LSST Data Management's official data products, and another for user-generated data products. **The registered data products can also be accessed directly from Python code using the PZ Server's data access API, as demonstrated below.**

The PZ Server is developed and delivered as part of the in-kind contribution program BRA-LIN, from LIneA to the Rubin Observatory's LSST. The service is hosted at the Brazilian IDAC and is not directly connected to the [Rubin Science Platform (RSP)](https://data.lsst.cloud/). However, it requires RSP credentials for user authentication. 

For comprehensive documentation about the PZ Server, please visit the [PZ Server's documentation page](https://linea-it.github.io/pz-lsst-inkind-doc/). There, you will also find an overview of all LIneA's contributions related to photo-zs.


### Installation

The PZ Server API is available on **pip** as `pz-server-lib`. To install the API and its dependencies, type:  

<font style="background-color:black; color:white;" face="Courier New"> $ pip install pz-server-lib </font>  

on your Terminal. 


### Imports and Setup

In [None]:
from pz_server import PzServer, pz_plots
import pandas as pd
%reload_ext autoreload 
%autoreload 2

The connection to the PZ Server from Python code is made through an object of the class `PzServer`. To be authorized to create an instance of `PzServer`, users must provide an **API Token**, generated from the top-right menu on the [PZ Server website](https://pz-server-dev.linea.org.br/). 

<img src="./images/ScreenShotTokenMenu.png" width=150pt align="top"/> <img src="./images/ScreenShotTokenGenerator.png" width=300pt/>

In [None]:
pz_server = PzServer(token="<paste your API Token here>") 

In [None]:
# Never hard-code a real token in a shared notebook; paste your own token here.
pz_server = PzServer(token="<paste your API Token here>", host="pz-dev") 

### Get basic info from PZ Server

The object `pz_server` provides useful functions for users to navigate through the available contents. All the functions return tables stored in _[Pandas DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)_ for the sake of having an easy-to-read display mode. For instance:

Display the list of product types supported with a short description;

In [None]:
pz_server.list_product_types()      

Display the list of users who uploaded data products to the server;

In [None]:
pz_server.list_users()

Display the list of data releases available at the time; 

In [None]:
pz_server.list_releases()

---
Display all data products available (WARNING: this list can grow rapidly during the survey's operation). 

In [None]:
pz_server.list_products() 

The information about product types, users, and releases shown above can be used to filter the data products of interest. To do so, the function `list_products` receives as its argument a dictionary mapping product attributes to their values. 

In [None]:
pz_server.list_products(filters={"release": "LSST DP0", 
                                 "product_type": "Spec-z Catalog",
                                 "uploaded_by": "Gschwend"})

It also works if we type a string pattern that is part of the value. For instance, just "DP0" instead of "LSST DP0": 

In [None]:
pz_server.list_products(filters={"release": "DP0"})

It also allows searching for multiple strings by adding the suffix `__or` (two underscores + "or") to the search key. For instance, to get training and validation sets in the same search:

In [None]:
pz_server.list_products(filters={"product_type__or": ["train", "valid"]})
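
To make the filter semantics described above concrete, here is a minimal local sketch that mimics them on a plain list of dictionaries. It is not the PZ Server's implementation: the helper names `matches` and `filter_products`, the sample entries, and the assumption that matching is a case-insensitive substring test are all illustrative.

```python
def matches(product, filters):
    """Return True if `product` satisfies every entry in `filters`."""
    for key, wanted in filters.items():
        if key.endswith("__or"):
            # "<field>__or" takes a list; any one pattern may match.
            field = key[: -len("__or")]
            patterns = wanted
        else:
            field, patterns = key, [wanted]
        value = str(product.get(field, "")).lower()
        # Assumption: matching is a case-insensitive substring test.
        if not any(str(p).lower() in value for p in patterns):
            return False
    return True


def filter_products(products, filters):
    """Keep only the products that satisfy all filters."""
    return [p for p in products if matches(p, filters)]


# Hypothetical entries standing in for rows of the product list.
products = [
    {"id": 1, "release": "LSST DP0", "product_type": "Spec-z Catalog"},
    {"id": 2, "release": "LSST DP0", "product_type": "Training Set"},
    {"id": 3, "release": "LSST DP0", "product_type": "Validation Set"},
    {"id": 4, "release": "Example Release", "product_type": "Validation Results"},
]

selected = filter_products(products, {"product_type__or": ["train", "valid"]})
print([p["id"] for p in selected])
```

Under these assumptions, the pattern "valid" selects both Validation Sets and Validation Results, so a more specific pattern is needed to separate them.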

The functions whose names start with the prefix `list_`, shown above, are meant to be used in Jupyter Notebooks (or similar environments). They only display lists and do not return any value. The functions whose names start with the prefix `get_` return data (or metadata) retrieved from the PZ Server, so their results can be assigned to a variable.    

---

In the cells below, let's see examples of usage organized by product types. 

## 1. Spec-z Catalog 

In the context of the PZ Server, Spec-z Catalogs are defined as any catalog containing spherical equatorial coordinates and spectroscopic redshift measurements (or, analogously, true redshifts from simulations). A Spec-z Catalog can include data from a single spectroscopic survey or a combination of data from several sources, and should be provided as a single file to the PZ Server's upload tool. 

Mandatory columns: 
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Spectroscopic or true redshift - `float`

Recommended columns: 
* Spectroscopic redshift error - `float`
* Quality flag - `integer`, `float`, or `string`
* Survey name (recommended for compilations of data from different surveys)
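
Before uploading a catalog, it can be useful to check the mandatory columns locally. The sketch below does this for a list of row dictionaries. The column names `ra`, `dec`, and `z`, the helper `check_specz_rows`, and the accepted coordinate ranges (RA in [0, 360), Dec in [-90, 90]) are assumptions made for illustration; they are not requirements stated by the PZ Server.

```python
REQUIRED_COLUMNS = ("ra", "dec", "z")  # hypothetical column names


def check_specz_rows(rows):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        if not all(isinstance(row[c], float) for c in REQUIRED_COLUMNS):
            problems.append(f"row {i}: mandatory columns must be floats")
            continue
        # Assumed conventions: RA in [0, 360) and Dec in [-90, 90] degrees.
        if not 0.0 <= row["ra"] < 360.0:
            problems.append(f"row {i}: ra out of range")
        if not -90.0 <= row["dec"] <= 90.0:
            problems.append(f"row {i}: dec out of range")
    return problems


# Toy rows: the second one has an invalid declination.
rows = [
    {"ra": 61.2, "dec": -35.1, "z": 0.42},
    {"ra": 62.0, "dec": 120.0, "z": 1.10},
]
print(check_specz_rows(rows))
```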




#### List Spec-z Catalogs available on PZ Server

In [None]:
pz_server.list_products(filters={"product_type": "Spec-z Catalog"})

#### Get metadata of a given Spec-z Catalog 

The metadata of a given data product is the information provided by the user on the upload form. This information is attached to the data product contents and can be consulted on the website or through the Python API, as shown here. 

All data products stored on PZ Server are identified by a unique number, the product **id**. This number is the only information required to access the data or the corresponding metadata. 

The function `get_product_metadata(<id>)` displays (optionally) and returns (as a dictionary) the attributes stored in the PZ Server for the data product identified by its **id** number.

In [None]:
#metadata_specz_catalog = pz_server.get_product_metadata(<product id number>) 
metadata_specz_catalog = pz_server.get_product_metadata(6) 

In [None]:
metadata_specz_catalog

#### Retrieve a given Spec-z Catalog 

The function `get_product()` returns the Spec-z Catalog of interest as a _Pandas DataFrame_.

In [None]:
#specz_catalog = pz_server.get_product(<product id number>)
specz_catalog = pz_server.get_product(6)
specz_catalog

Display basic statistics

In [None]:
specz_catalog.describe()

Quick visualization of spec-z catalog properties

In [None]:
pz_plots.specz_plots(specz_catalog)

By default, the function `get_product` just returns the data to be used in memory. To store the results in a file, provide the name of the file to be saved in the **save_file_as** argument. The supported file formats are CSV (the default, used when the name has no suffix), plus those supported by the Python library [**tables_io**](https://github.com/LSSTDESC/tables_io), i.e., those with suffixes: 'fits', 'hf5', 'hdf5', 'fit', 'h5', and 'pq'.   

In [None]:
specz_catalog = pz_server.get_product(6, save_file_as="example_specz_cat.csv")
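
The suffix rule described above can be sketched as a small helper that maps a file name to the format it would be saved in. The function `infer_file_format` is hypothetical and is not part of `pz-server-lib`; it only restates the rule (CSV when there is no suffix, otherwise one of the listed suffixes) and assumes that any other suffix is rejected.

```python
import os

# Suffixes listed above as supported through tables_io.
TABLES_IO_SUFFIXES = {"fits", "hf5", "hdf5", "fit", "h5", "pq"}


def infer_file_format(file_name):
    """Return the output format implied by the suffix of `file_name`."""
    suffix = os.path.splitext(file_name)[1].lstrip(".").lower()
    if suffix in ("", "csv"):
        return "csv"  # CSV is the default when no suffix is given
    if suffix in TABLES_IO_SUFFIXES:
        return suffix
    # Assumption: unknown suffixes are treated as an error.
    raise ValueError(f"unsupported file suffix: {suffix!r}")


for name in ("example_specz_cat", "example_specz_cat.csv", "example_specz_cat.hdf5"):
    print(name, "->", infer_file_format(name))
```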

## 2. Training and Validation Sets 

Training and Validation (or Test) Sets are the product of matching (spatially) a given Spec-z Catalog (single survey or compilation) to the photometric data, in this case, the LSST Objects Catalog. In fact, in most cases, the Training and Validation Sets are just the result of splitting the product of matching into two parts. Hence, the Training and Validation Sets are usually found together. In the case of simulations, the Training and Validation Sets can be just a selection from the simulated catalog that contains the true redshifts and the photometric data required to train and validate the photo-z algorithms.
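
As an illustration of the splitting step mentioned above, the sketch below randomly divides a matched catalog into training and validation subsets. It uses toy rows and only the standard library; the function `split_train_valid`, the 70/30 proportion, and the column names are assumptions, not a procedure prescribed by the PZ Server.

```python
import random


def split_train_valid(rows, valid_fraction=0.3, seed=42):
    """Shuffle `rows` reproducibly and split them into (train, valid)."""
    rng = random.Random(seed)  # fixed seed so the split can be repeated
    shuffled = list(rows)      # copy, so the input order is left untouched
    rng.shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


# Toy stand-in for the product of matching spec-z and photometric data.
matched = [{"object_id": i, "z": 0.01 * i} for i in range(100)]
train, valid = split_train_valid(matched)
print(len(train), len(valid))
```

Each subset would then be uploaded as a separate data product.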

In the Photo-z Server, there is no dependency between these two products. Users can upload Training and Validation Sets separately, even though they are very similar in format and contents. For each pair of Training and Validation Sets, the user will perform two uploads and, consequently, two new entries will be added to the database. 

_Note 1: There is an ambiguity between the so-called Validation and Test sets found in the literature. In some cases, it is just a matter of terminology, and both play the same role: to be used for computing the photo-z metrics as a sample independent of that used for training. In other cases, when the training procedure has a recursive optimization method, the three sets of Training/Validation/Test are distinct, and each one plays a different role. In the context of the PZ Server, there is no distinction between Validation and Test sets. Users are responsible for explaining how to interpret the subsets in the description field._


  
_Note 2: The Training and Validation Sets supported by the PZ Server are only those used by algorithms that work at the catalog level. Training and Validation Sets for image-based methods, such as image-based deep-learning algorithms, are not supported._


Mandatory column: 
* Spectroscopic (or true) redshift - `float`

Other expected columns: 
* Object ID from LSST Objects Catalog - `integer`
* Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - `float`
* Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - `float`
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Quality Flag - `integer`, `float`, or `string`




#### List Training and Validation Sets available on PZ Server

In [None]:
pz_server.list_products(filters={"product_type": "Training Set"})

In [None]:
pz_server.list_products(filters={"product_type": "Validation Set"})

#### Get metadata of a given Training Set

In [None]:
metadata_train_set = pz_server.get_product_metadata(9)
metadata_train_set 

#### Retrieve a given Training Set 

In [None]:
train_set = pz_server.get_product(9)
train_set

Display basic statistics

In [None]:
train_set.describe()

Quick visualization of the properties of training/validation sets. The function `train_valid_plots` can receive a single input: 

In [None]:
pz_plots.train_valid_plots(train=train_set) 

Or separate training and validation samples for comparison: 

In [None]:
valid_set = pz_server.get_product(10)

In [None]:
pz_plots.train_valid_plots(train=train_set, valid=valid_set) 

## 3. Photo-z Validation Results

Validation Results are the outputs of any photo-z algorithm applied to a Validation Set. The format and number of files of this data product depend strongly on the algorithm used to create it, so there are no constraints on these two parameters. If the product consists of multiple files, for instance, if the user includes the results of training procedures (e.g., neural net weights, decision tree files, or any machine learning by-product) or additional files (SED templates, filter transmission curves, theoretical magnitude grids, Bayesian priors, etc.), all files must be put together in a single compressed file (.zip or .tar) before uploading it to the Photo-z Server.   
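
One possible way to bundle several files into a single archive before the upload is Python's standard `zipfile` module. This is only a sketch: the helper `bundle_files` and the file names are hypothetical, and the placeholder files are created in a temporary directory just so the example runs on its own.

```python
import os
import tempfile
import zipfile


def bundle_files(file_paths, archive_path):
    """Write all `file_paths` into one compressed .zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in file_paths:
            # Store each file under its base name, without local folders.
            zf.write(path, arcname=os.path.basename(path))
    return archive_path


# Placeholder files standing in for real photo-z outputs (hypothetical names).
work_dir = tempfile.mkdtemp()
file_paths = []
for name in ("photoz_estimates.csv", "model_weights.txt"):
    path = os.path.join(work_dir, name)
    with open(path, "w") as f:
        f.write("placeholder\n")
    file_paths.append(path)

archive = bundle_files(file_paths, os.path.join(work_dir, "validation_results.zip"))
with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())
```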

#### List Validation Results available on PZ Server

In [None]:
pz_server.list_products(filters={"product_type": "Validation Results"})

#### Get metadata of a given data product of Photo-z Validation Results

In [None]:
metadata_valid_results = pz_server.get_product_metadata(99)
metadata_valid_results

#### Retrieve a given Photo-z Validation Result: download .tar file

This product type is not necessarily (only) tabular data and can be a list of files, so the function `get_product` will not return a Pandas DataFrame. Instead, it will return the name of the tar file downloaded to the local directory. 

In [None]:
pz_result = pz_server.get_product(product_id="0006", save_file="True")
pz_result # string tar file name
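
Once the tar file is on disk, its contents can be unpacked with Python's standard `tarfile` module. The sketch below is self-contained: it first builds a small stand-in archive (with a hypothetical file name) in a temporary directory, and then extracts it. With a real download, the file name returned by `get_product` would take the place of `tar_name`. The helper `extract_tar` is illustrative and is not part of the PZ Server API.

```python
import os
import tarfile
import tempfile


def extract_tar(tar_path, dest_dir):
    """Extract `tar_path` into `dest_dir` and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        names = tar.getnames()
        # Only extract archives that come from a source you trust.
        tar.extractall(dest_dir)
    return names


# Build a small stand-in archive so the example runs without a download.
work_dir = tempfile.mkdtemp()
sample = os.path.join(work_dir, "pdfs_example.txt")  # hypothetical file
with open(sample, "w") as f:
    f.write("placeholder\n")
tar_name = os.path.join(work_dir, "validation_results.tar")
with tarfile.open(tar_name, "w") as tar:
    tar.add(sample, arcname="pdfs_example.txt")

out_dir = os.path.join(work_dir, "extracted")
members = extract_tar(tar_name, out_dir)
print(members)
```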

#### Basic Photo-z Validation Plots

If the photo-z results originated from a run of the [RAIL](https://github.com/LSSTDESC/RAIL) Estimation module, we can directly use the plot functions from the RAIL Evaluation module.


In [None]:
#  open .tar file 

In [None]:
import os  # needed for os.path.join in the next cell

from RAIL.examples.evaluation.utils import *
from RAIL.rail.evaluation.metrics.pit import *
from RAIL.rail.evaluation.metrics.cdeloss import *

In [None]:
my_path = 'xxx/xxx/xx' 
pdfs_file =  os.path.join(my_path, "pdfs_FZBoost.hdf5")
ztrue_file =  os.path.join(my_path, "ztrue_validation_set.hdf5")
pdfs, zgrid, ztrue, photoz_mode = read_pz_output(pdfs_file, ztrue_file) # all numpy arrays

Plot PIT-QQ

In [None]:
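# NOTE: `pit_out_rate` must be defined before running this cell; this notebook does not compute it.
plot_pit_qq(pdfs, zgrid, ztrue, title="PIT-QQ - toy data", code="FZBoost",
            pit_out_rate=pit_out_rate, savefig=False)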

PZ Validation metrics table

In [None]:
summary = Summary(pdfs, zgrid, ztrue)
# `pitobj` is an optional input that speeds up the metrics evaluation.
# NOTE: it must be created beforehand; it is not defined in this notebook.
summary.markdown_metrics_table(pitobj=pitobj)