<img align="left" src = https://www.linea.org.br/wp-content/themes/LIneA/imagens/logo-header.png width=100 style="padding: 30px"> 
<img align="left" src = https://cdn2.webdamdb.com/1280_c3PXjCZbPM23.png width=180> <!-- style="padding: 20px"--> 

# Photo-z Server - Tutorial Notebook

**Contact author**: Julia Gschwend ([julia@linea.org.br](mailto:julia@linea.org.br)) 

**Last verified run**: 2022-11-07 <br>



## 0. Introduction
The Photo-z (PZ) Server is an online service available for the LSST Community to host and share lightweight photo-z related data products. The upload and download of data and metadata can be done at the website https://pz-server.linea.org.br/ (during the test phase, the test environment is available at https://pz-server-dev.linea.org.br/). There, you will find two separate pages, each containing a list of data products: one for LSST Data Management's official data products, and another for user-generated data products. **The registered data products can also be accessed directly from Python code using the PZ Server's data access API, as demonstrated below.**

The PZ Server is developed and delivered as part of the in-kind contribution program BRA-LIN, from LIneA to the Rubin Observatory's LSST. The service is hosted in the Brazilian IDAC and is not directly connected to the [Rubin Science Platform (RSP)](https://data.lsst.cloud/). However, it requires RSP credentials for user authentication.

For comprehensive documentation about the PZ Server, please visit the [PZ Server's documentation page](https://linea-it.github.io/pz-lsst-inkind-doc/). There, you will also find an overview of all LIneA's contributions related to photo-zs. The internal documentation of the API functions is available on the [API's documentation page](https://linea-it.github.io/pz-server-lib/html/index.html).


### Installation

#### Via pip

The PZ Server API is available on **pip** as `pz-server-lib`. To install the API and its dependencies, type in the terminal:

```shell
$ pip install pz-server-lib 
``` 

#### Via setup.py 

Alternatively, clone the repository:

```shell
$ git clone https://github.com/linea-it/pz-server-lib.git
```

Then, from inside the cloned directory, install the API and its dependencies:

```shell
$ python setup.py install
```


Note: You might need to restart the notebook kernel for the newly installed library to be picked up.
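To confirm that the package is visible to the interpreter running the notebook, a quick check with the standard library can help. This is a minimal sketch: the helper name `is_importable` is hypothetical, and the module name `pz_server` is the one imported later in this tutorial.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# False here suggests the library was installed into a different environment
# than the one used by the notebook kernel.
print("pz_server importable:", is_importable("pz_server"))
```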


### Imports and Setup

In [None]:
from pz_server import PzServer, pz_plots
%reload_ext autoreload 
%autoreload 2

The connection to the PZ Server from Python code is made through an object of the class `PzServer`. To create an instance of `PzServer`, users must provide an **API Token**, generated from the menu at the top right of the [PZ Server website](https://pz-server.linea.org.br/).

<img src="../images/ScreenShotTokenMenu.png" width=150pt align="top"/> <img src="../images/ScreenShotTokenGenerator.png" width=300pt/>

In [None]:
pz_server = PzServer(token="<paste your API Token here>", host="pz-dev") # temporary host for test phase  
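To avoid saving the token inside a shared notebook, it can be read from the environment instead. The sketch below is one possible approach, not part of the PZ Server API: the helper `load_token` and the variable name `PZ_SERVER_TOKEN` are hypothetical.

```python
import os


def load_token(env_var: str = "PZ_SERVER_TOKEN") -> str:
    """Read the API Token from an environment variable (hypothetical name)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API Token.")
    return token


# Usage with the class shown above (left commented so the sketch runs standalone):
# pz_server = PzServer(token=load_token(), host="pz-dev")
```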

### Get basic info from PZ Server

Besides providing access to data and metadata, the object `pz_server` also offers useful functions for users to navigate through the available contents. The functions with the prefix "get_" return the result of a query on the PZ Server database as a Python dictionary, and are most useful when used programmatically outside the Jupyter environment (see the [API documentation](https://linea-it.github.io/pz-server-lib/html/index.html)). Those with the prefix "display_" just show the results as styled [_Pandas DataFrames_](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html). For instance:

Display the list of supported product types with a short description:

In [None]:
pz_server.display_product_types()

Display the list of users who uploaded data products to the server:

In [None]:
pz_server.display_users()

Display the list of data releases currently available:

In [None]:
pz_server.display_releases()

---
Display all data products available (WARNING: this list can grow rapidly during the survey's operation).

In [None]:
pz_server.display_products_list() 

The information about product types, users, and releases shown above can be used to filter the data products of interest for your search. For that, the function `display_products_list` accepts the argument `filters`, a dictionary mapping product attributes to their values.

In [None]:
pz_server.display_products_list(filters={"release": "LSST DP0", 
                                 "product_type": "Spec-z Catalog",
                                 "uploaded_by": "Gschwend"})

It also works if we type a string pattern that is part of the value. For instance, just "DP0" instead of "LSST DP0": 

In [None]:
pz_server.display_products_list(filters={"release": "DP0"})

It also allows the search for multiple strings by adding the suffix `__or` (two underscores + "or") to the search key. For instance, to get spec-z catalogs and training sets in the same search:

In [None]:
pz_server.display_products_list(filters={"product_type__or": ["Spec-z Catalog", "training set"]}) # notice that filtering is not case sensitive 
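The filtering rules described above (partial string match, case insensitivity, and the `__or` suffix) can be illustrated locally. This is only a sketch of the semantics as described in this tutorial, not the server's implementation; the function `matches` and the sample records are hypothetical, and combining several keys with a logical AND is an assumption.

```python
def matches(product: dict, filters: dict) -> bool:
    """Local illustration of the filter rules described above (not the server's code)."""
    for key, wanted in filters.items():
        if key.endswith("__or"):
            field, patterns = key[: -len("__or")], wanted
        else:
            field, patterns = key, [wanted]
        value = str(product.get(field, "")).lower()
        # A field passes if any pattern is a case-insensitive substring of its value.
        if not any(str(pattern).lower() in value for pattern in patterns):
            return False
    # Assumption: all keys must pass (logical AND between different keys).
    return True


# Hypothetical records, only for demonstration.
products = [
    {"release": "LSST DP0", "product_type": "Spec-z Catalog"},
    {"release": "LSST DP0", "product_type": "Training Set"},
    {"release": "LSST DP0", "product_type": "Validation Results"},
]
selected = [
    p for p in products
    if matches(p, {"product_type__or": ["Spec-z Catalog", "training set"]})
]
print([p["product_type"] for p in selected])
```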

To fetch the results of a search and assign them to a variable, just replace the prefix "display_" with "get_", like this:

In [None]:
search_results = pz_server.get_products_list(filters={"product_type": "results"}) # PZ Validation results
search_results
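Once the results are stored in a variable, they can be processed with ordinary Python. The sketch below assumes the search returns a list of dictionaries, one per data product, and that the keys `id` and `internal_name` are present; both are assumptions, so inspect the actual output before relying on them. The helper `summarize` is hypothetical.

```python
def summarize(records, keys=("id", "internal_name")):
    """Keep only the requested keys from each record; missing keys become None."""
    return [{key: record.get(key) for key in keys} for record in records]


# Stand-in for `search_results`; real records may use different keys.
example_results = [
    {"id": 21, "internal_name": "21_pz_validation_goldenspike_fzboost", "extra": "..."},
]
print(summarize(example_results))
```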



---

Next, let's see examples of the functions that start with the prefixes "get_" and "download_" to retrieve data (or metadata), organized by product type.



## 1. Spec-z Catalog 

In the context of the PZ Server, Spec-z Catalogs are defined as any catalog containing spherical equatorial coordinates and spectroscopic redshift measurements (or, analogously, true redshifts from simulations). A Spec-z Catalog can include data from a single spectroscopic survey or a combination of data from several sources and should be provided as a single file with tabular data to the PZ Server's upload tool.

Mandatory columns: 
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Spectroscopic or true redshift - `float`

Recommended columns: 
* Spectroscopic redshift error - `float`
* Quality flag - `integer`, `float`, or `string`
* Survey name (recommended for compilations of data from different surveys)
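Before uploading, a catalog can be checked against the mandatory columns listed above. The following is a minimal sketch using only the standard library; the column names `ra`, `dec`, and `z`, the helper `check_specz_rows`, and the accepted right ascension range of [0, 360] degrees are assumptions made for illustration.

```python
import csv
import io


def check_specz_rows(rows, ra="ra", dec="dec", z="z"):
    """Return (row_number, problem) pairs for rows that fail basic sanity checks."""
    problems = []
    for number, row in enumerate(rows, start=1):
        try:
            values = {col: float(row[col]) for col in (ra, dec, z)}
        except (KeyError, TypeError, ValueError):
            problems.append((number, "missing or non-numeric mandatory column"))
            continue
        if not 0.0 <= values[ra] <= 360.0:
            problems.append((number, "right ascension outside [0, 360] degrees"))
        if not -90.0 <= values[dec] <= 90.0:
            problems.append((number, "declination outside [-90, 90] degrees"))
    return problems


# Hypothetical catalog contents, only for demonstration.
sample_csv = """ra,dec,z
150.1,2.2,0.45
370.0,2.2,0.10
10.0,-95.0,1.20
12.5,abc,0.30
"""
problems = check_specz_rows(csv.DictReader(io.StringIO(sample_csv)))
for number, problem in problems:
    print(f"row {number}: {problem}")
```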




#### List Spec-z Catalogs available on PZ Server

In [None]:
pz_server.display_products_list(filters={"product_type": "Spec-z Catalog"})

#### Get metadata of a given Spec-z Catalog 

The metadata of a given data product is the information provided by the user on the upload form. This information is attached to the data product contents and can be consulted on the PZ Server page or, as shown here, through the Python API.

All data products stored on PZ Server are identified by a unique number, the product **id** number, or by a _string_ called **internal_name**, which is created automatically by concatenating the product **id** to the product name given by its owner and removing blank spaces, uppercase letters, and special characters.

The function `get_product_metadata()` returns (as a dictionary) the attributes stored in the PZ Server about a given data product identified by its **id** number or **internal_name**.

In [None]:
# metadata_specz_catalog = pz_server.get_product_metadata(<id number or internal name, str or int>) 
# metadata_specz_catalog = pz_server.get_product_metadata(6) 
# metadata_specz_catalog = pz_server.get_product_metadata("6")  
metadata_specz_catalog = pz_server.get_product_metadata("6_true_redshifts") 
metadata_specz_catalog
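Because the **internal_name** starts with the product **id**, the numeric id can be recovered from any of the three identifier forms shown above. The helper below is a hypothetical sketch, not part of the API; it assumes the id and the name are separated by an underscore, as in the internal names used in this notebook.

```python
import re


def product_id_from(identifier) -> int:
    """Extract the numeric product id from an id or an internal_name-like string."""
    if isinstance(identifier, int):
        return identifier
    # Leading digits followed by an underscore or the end of the string.
    match = re.match(r"(\d+)(?:_|$)", str(identifier).strip())
    if match is None:
        raise ValueError(f"cannot find a leading product id in {identifier!r}")
    return int(match.group(1))


for identifier in (6, "6", "6_true_redshifts"):
    print(repr(identifier), "->", product_id_from(identifier))
```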

As with the previous "get_" functions, there is a corresponding "display_" function for the product metadata:

In [None]:
pz_server.display_product_metadata("6_true_redshifts") 

#### Retrieve a given Spec-z Catalog 

The function `get_product()` returns the Spec-z Catalog of interest as a _Pandas DataFrame_.

In [None]:
#specz_catalog = pz_server.get_product(<product id number>)
specz_catalog = pz_server.get_product(6)
specz_catalog

Display basic statistics

In [None]:
specz_catalog.describe()

The `pz_server` library includes a couple of very basic built-in plot functions for a quick visualization of catalog properties.

In [None]:
pz_plots.specz_plots(specz_catalog)

## 2. Training Sets 

In the context of the PZ Server, Training Sets are defined as the product of matching (spatially) a given Spec-z Catalog (single survey or compilation) to the photometric data, in this case, the LSST Objects Catalog. The PZ Server API offers a tool called _Training Set Maker_ for users to build customized Training Sets based on the Spec-z Catalogs available. Please see the companion Jupyter Notebook `pz_tsm_tutorial.ipynb` for details.   


_Note 1: Commonly, the training set is split into two or more subsets for photo-z validation purposes. If the Training Set owner has previously defined which objects should belong to each subset (training and validation/test sets), this information must be available as an extra column in the table or as clear instructions for reproducing the subset separation in the data product description._

  
_Note 2: The PZ Server only supports catalog-level Training Sets. Image-based Training Sets, e.g., for deep-learning algorithms, are not supported yet._


Mandatory column: 
* Spectroscopic (or true) redshift - `float`

Other expected columns:
* Object ID from LSST Objects Catalog - `integer`
* Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - `float`
* Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - `float`
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Quality Flag - `integer`, `float`, or `string`
* Subset Flag - `integer`, `float`, or `string`
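When a Subset Flag column is present, the subsets can be separated with a simple grouping step. The sketch below works on plain dictionaries and uses a hypothetical column name `subset` with hypothetical values `train` and `test`; the actual name and values depend on how the owner prepared the Training Set.

```python
from collections import defaultdict


def split_by_flag(rows, flag_column="subset"):
    """Group rows (dictionaries) by the value found in `flag_column`."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[flag_column]].append(row)
    return dict(subsets)


# Hypothetical rows; real Training Sets carry many more columns.
rows = [
    {"z_true": 0.31, "mag_i": 22.4, "subset": "train"},
    {"z_true": 0.87, "mag_i": 23.9, "subset": "test"},
    {"z_true": 1.12, "mag_i": 24.3, "subset": "train"},
]
subsets = split_by_flag(rows)
print({name: len(members) for name, members in subsets.items()})
```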



#### List Training Sets available on PZ Server

In [None]:
pz_server.display_products_list(filters={"product_type": "Training Set"})

#### Display metadata of a given Training Set

In [None]:
metadata_train_set = pz_server.get_product_metadata(9)
pz_server.display_product_metadata("14_goldenspike_train") 

#### Retrieve a given Training Set 

In [None]:
train_set = pz_server.get_product(9)
train_set

Display basic statistics

In [None]:
train_set.describe()

Use the function `train_set_plots` for a quick visualization of training set properties:

In [None]:
pz_plots.train_set_plots(train_set, mag_name="mag_i", redshift_name="z_true")

## 3. Photo-z Validation Results

Validation Results are the outputs of any photo-z algorithm applied on a Validation Set. The format and number of files of this data product depend strongly on the algorithm used to create it, so there are no constraints on these two parameters. When the product consists of multiple files, for instance, if the user includes the results of training procedures (e.g., neural net weights, decision tree files, or any machine learning by-product) or additional files (SED templates, filter transmission curves, theoretical magnitude grids, Bayesian priors, etc.), all files must be put together in a single compressed file (.zip, .tar, or .tar.gz) before uploading to the Photo-z Server.
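Bundling several files into one archive can be done with Python's standard library. The following is a minimal sketch that produces a `.tar.gz` file; the helper `bundle` and all file names are hypothetical and only serve the demonstration.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle(files, archive_path):
    """Pack `files` into one gzip-compressed tar archive, stored by base name."""
    archive_path = Path(archive_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for file_path in files:
            tar.add(str(file_path), arcname=Path(file_path).name)
    return archive_path


# Demonstration with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    estimates = Path(workdir, "estimates.csv")
    weights = Path(workdir, "model_weights.txt")
    estimates.write_text("id,z_phot\n1,0.42\n")
    weights.write_text("placeholder\n")
    archive = bundle([estimates, weights], Path(workdir, "validation_results.tar.gz"))
    with tarfile.open(str(archive)) as tar:
        print(sorted(tar.getnames()))
```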

#### List Validation Results available on PZ Server

In [None]:
pz_server.display_products_list(filters={"product_type": "Validation Results"})

#### Display metadata of a given data product of Photo-z Validation Results

In [None]:
pz_server.display_product_metadata("21_pz_validation_goldenspike_fzboost")

#### Retrieve a given Photo-z Validation Results: download file

This product type is not necessarily (only) tabular data and can be a list of files. The function `get_product` shown above only returns the data to be used in memory and only supports single tabular files. To retrieve Photo-z Validation Results, you must download the data and open it locally.

In [None]:
pz_server.download_product(21, save_in=".")
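If the downloaded product is an archive, it can be unpacked locally with the standard library. The sketch below is self-contained and builds its own small `.zip` file for the demonstration; the helper `unpack` and the file name `downloaded_product.zip` are hypothetical, since the actual name of the downloaded file depends on the product.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def unpack(archive_path, target_dir):
    """Extract a .zip, .tar, or .tar.gz archive and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # The archive format is inferred from the file extension.
    shutil.unpack_archive(str(archive_path), str(target))
    return sorted(path for path in target.rglob("*") if path.is_file())


# Demonstration with a throwaway archive standing in for a downloaded product.
with tempfile.TemporaryDirectory() as workdir:
    archive = Path(workdir, "downloaded_product.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("estimates.csv", "id,z_phot\n1,0.42\n")
    extracted = unpack(archive, Path(workdir, "unpacked"))
    print([path.name for path in extracted])
```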

## 4. Photo-z Tables 

The Photo-z Tables are the results of photo-z estimation on photometric samples. The size limit for uploading files on the PZ Server is 200MB; therefore, it does not support large Photo-z Tables such as the photo-zs of the LSST Objects catalog. Instead, the PZ Server can host only Photo-z Tables of small data sets.

In [None]:
#pz_server.download_product(<id number or internal name>)
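The 200MB upload limit mentioned above can be checked locally before attempting an upload. This is a minimal sketch: the helper `fits_upload_limit` is hypothetical, and interpreting "200MB" as 200 × 10^6 bytes is an assumption.

```python
import os
import tempfile

UPLOAD_LIMIT_BYTES = 200 * 1000**2  # 200MB, assuming decimal megabytes


def fits_upload_limit(path, limit=UPLOAD_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is not larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Demonstration with a small throwaway file.
with tempfile.TemporaryDirectory() as workdir:
    table_path = os.path.join(workdir, "photoz_table.csv")
    with open(table_path, "w") as handle:
        handle.write("id,z_phot\n1,0.42\n")
    print("fits the limit:", fits_upload_limit(table_path))
```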