<img align="left" src = "images/linea.png" width=140 style="padding: 20px"> 
<img align="left" src = "images/rubin.png" width=180 style="padding: 30px"> 

# **Photo-z Server** <br> Tutorial Notebook


Contact author: [Julia Gschwend](mailto:julia@linea.org.br) <br>
Last verified run: **2024-Dec-23**

## Notebook contents 

* [The PZ Server website](#the-pz-server-website)
* [The pzserver Python library](#the-pzserver-python-library) 
* [Data Products](#data-products)
    * [Spec-z Catalog](#spec-z-catalog)
    * [Training Sets](#training-sets)
    * [Training Results](#training-results)
    * [Validation Results](#validation-results)
    * [Photo-z Tables](#photo-z-tables) 
* [PZ Server Pipelines](#pz-server-pipelines)
    * [Combine Spec-z Catalogs](#combine-spec-z-catalogs)
    * [Training Set Maker](#training-set-maker)  

## The PZ Server website

### About the PZ Server 

The Photo-z (PZ) Server is an online service for the LSST Community to host and share lightweight PZ-related data products. Data and metadata can be uploaded and downloaded at the website [pz-server.linea.org.br](https://pz-server.linea.org.br/)$^{\dagger}$. There, you will find two pages, each listing data products: one for official data products distributed by Rubin Observatory's Data Management department and the other for user-generated data products.

The PZ Server is developed and delivered by LIneA as part of the in-kind contribution program BRA-LIN to the Rubin Observatory. The service is hosted in the Brazilian IDAC, with access authorized to the LSST Community through [Rubin Science Platform (RSP)](https://data.lsst.cloud/) login credentials. For more information about other contributions from BRA-LIN, please visit the [PZ Server's documentation page](https://linea-it.github.io/pz-lsst-inkind-doc/). 

$^{\dagger}$ During the development phase, a test environment is available at [pz-server-dev.linea.org.br](https://pz-server-dev.linea.org.br/).

### How to upload a data product on the PZ Server website

To upload a data product, click the button **NEW PRODUCT** on the top left of the [User-generated Data Products page](https://pz-server-dev.linea.org.br/user_products) 


<center>
    <img src="images/ScreenshotNewProductButton.png"> 
</center>

and fill in the Upload Form with relevant metadata. 

<center>
    <img src="images/ScreenshotUploadForm.png" > 
</center>



### How to download a data product from the PZ Server website

To download a data product available on the Photo-z Server, go to one of the two data products' pages. The **download** button is on the right side of each data product. Also, there are buttons to **share**, **remove**, and **edit** data products. 

<center>
    <img src="images/ScreenshotProductListButtons.png" width=150pt/> 
</center>


## The pzserver Python library 

The `pzserver` Python library is a convenient tool for accessing the PZ Server's data products and metadata programmatically from anywhere, including the Notebook Aspect of RSP. 

### Installation

The PZ Server Python library is available via **pip** as `pzserver`.

```
$ pip install pzserver 
```
 

In [None]:
! pip install pzserver

--- 
Note: Depending on your JupyterLab version, you might need to restart the kernel before the newly installed library can be imported. 

### Imports and Setup

In [None]:
from pzserver import PzServer 
import matplotlib.pyplot as plt
%reload_ext autoreload 
%autoreload 2

The connection with the PZ Server is made by an object of the class `PzServer`. To create an instance of `PzServer`, users must provide an **API Token**, which can be generated from the menu at the top right of the [PZ Server website](https://pz-server.linea.org.br/). 

<img src="images/ScreenShotTokenMenu.png" width=150pt align="top"/> <img src="images/ScreenShotTokenGenerator.png" width=350pt />

In [None]:
# pz_server = PzServer(token="<your token>", host="pz-dev") 

For convenience, the token can be saved in a text file, e.g., **token.txt** (which is already listed in the .gitignore file in this repository). 

In [None]:
with open('token.txt', 'r') as file:
    token = file.read().strip()  # drop any trailing newline or spaces
# pz_server = PzServer(token=token, host="pz")
pz_server = PzServer(token=token, host="pz-dev")  # "pz-dev" is the temporary host during the test phase
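
Text editors often append a newline to a saved file, and an empty token file is easy to miss. The sketch below wraps the token reading in a small helper that strips whitespace and fails early if nothing is found. The function name `read_token` is hypothetical and not part of `pzserver`.

```python
from pathlib import Path


def read_token(path="token.txt"):
    """Return the API token stored in `path`, without surrounding whitespace."""
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"No token found in {path}")
    return token


# Possible usage, assuming token.txt exists in the working directory:
# pz_server = PzServer(token=read_token(), host="pz-dev")
```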

### How to get general info from PZ Server

The object `pz_server` created above provides access to data and metadata stored in the PZ Server. It also offers additional methods for navigating the available content. The methods with the prefix `get_` return the result of a query on the PZ Server database as a Python dictionary and are best suited to programmatic use (see details on the [API documentation page](https://linea-it.github.io/pzserver/html/index.html)). Alternatively, those with the prefix `display_` show the results as a styled [_Pandas DataFrame_](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), optimized for Jupyter Notebook (note: column names might change in the display version). 

For instance,

display the list of product types supported with a short description, 

In [None]:
pz_server.display_product_types()

display the list of users who uploaded data products to the server, 

In [None]:
pz_server.display_users()

display the list of data releases available at the time, 

In [None]:
pz_server.display_releases()

and display all available data products. 

<font color='red'>WARNING: This list can rapidly grow during the survey's operation (cell output scrolling recommended).</font>

In [None]:
pz_server.display_products_list() 

The information about product types, users, and releases shown above can be used to filter the data products of interest. For that, the method `display_products_list` accepts the argument `filters`, a dictionary mapping the product's attributes to their values. 

In [None]:
pz_server.display_products_list(filters={"release": "LSST DP0.2", 
                                 "product_type": "Training Set"})

It also works if we type a string pattern that is part of the value. For instance, just "DP0" instead of "LSST DP0.2": 

In [None]:
pz_server.display_products_list(filters={"release": "DP0"})

It also allows the search for multiple strings by adding the suffix `__or` (two underscores + "or") to the search key. For instance, to get spec-z catalogs and training sets in the same search (notice that filtering is not case-sensitive):

In [None]:
pz_server.display_products_list(filters={"product_type__or": ["Spec-z Catalog", "training set"]})
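
The filter conventions described above (a plain key for a single string, the `__or` suffix for several strings) can be captured in a small helper that builds the dictionary. This is only a sketch: `build_filters` is a hypothetical name and not part of `pzserver`. Whether the server accepts plain and `__or` keys together in one query is an assumption that is not confirmed here.

```python
def build_filters(**criteria):
    """Map keyword criteria to a filters dictionary.

    A single string is stored under its own key; a list or tuple of
    strings is stored under the key with the `__or` suffix.
    """
    filters = {}
    for key, value in criteria.items():
        if isinstance(value, (list, tuple)):
            filters[f"{key}__or"] = list(value)
        else:
            filters[key] = value
    return filters


filters = build_filters(release="DP0", product_type=["Spec-z Catalog", "Training Set"])
print(filters)
# {'release': 'DP0', 'product_type__or': ['Spec-z Catalog', 'Training Set']}

# Possible usage:
# pz_server.display_products_list(filters=filters)
```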

To fetch the results of a search and assign them to a variable, replace the prefix `display_` with `get_`, like this:  

In [None]:
search_results = pz_server.get_products_list(filters={"product_type": "results"}) 
search_results

### How to upload a data product via the Python API (alternative method)  

As shown above, the default method to upload a data product to the PZ Server is the upload form on the PZ Server website. Alternatively, the `pzserver` Python library can send data products to the host service. 

First, prepare a dictionary with the relevant information about your data product: 

In [None]:
data_to_upload = {
    "name":"example upload via lib",
    "product_type": "specz_catalog",  # Product type 
    "release": None, # LSST release, use None if not LSST data 
    "main_file": "upload_example.csv", # full path 
    "auxiliary_files": ["upload_example.html", "upload_example.ipynb"] # full path
}
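
Since the dictionary refers to files by path, it can be worth confirming that they exist before sending anything to the server. The sketch below uses only the keys `main_file` and `auxiliary_files` shown above. The helper name `missing_files` is hypothetical and not part of `pzserver`.

```python
from pathlib import Path


def missing_files(product):
    """Return the paths listed in an upload dictionary that are not existing files."""
    paths = [product["main_file"], *product.get("auxiliary_files", [])]
    return [p for p in paths if not Path(p).is_file()]


# Possible usage before calling pz_server.upload(**data_to_upload):
# missing = missing_files(data_to_upload)
# if missing:
#     raise FileNotFoundError(f"Files not found: {missing}")
```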

In [None]:
upload = pz_server.upload(**data_to_upload)  

In [None]:
product_id = upload.product_id
product_id

### How to display the metadata of a data product    

The metadata of a given data product is the information the user provides on the upload form. This information is attached to the data product contents and is available for consulting on the PZ Server page or using this Python API (`pzserver`). 

All data products stored on the PZ Server are identified by a unique **id** number or by a unique name, a _string_ called **internal_name**, which is created automatically at upload time by concatenating the product **id** with the name given by its owner (replacing blank spaces with "_", converting to lowercase, and removing special characters). 
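
To make the naming rule concrete, the sketch below approximates it in plain Python. It is an illustration of the rule as described in the text, not the server's actual implementation. The function name `make_internal_name`, the id `123`, and the choice to keep only lowercase letters, digits, and underscores are assumptions.

```python
import re


def make_internal_name(product_id, name):
    """Approximate the internal_name rule: id + "_" + simplified product name."""
    slug = name.strip().lower().replace(" ", "_")
    slug = re.sub(r"[^a-z0-9_]", "", slug)  # assumed definition of "special characters"
    return f"{product_id}_{slug}"


print(make_internal_name(123, "example upload via lib"))
# 123_example_upload_via_lib
```

The value stored by the server remains the authoritative one; it can be read from the product's metadata.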

The `PzServer` method `get_product_metadata()` returns a dictionary with the attributes stored in the PZ Server about a given data product identified by its **id** or **internal_name**. For use in a Jupyter notebook, the equivalent `display_product_metadata()` shows the results in a formatted table.

In [None]:
pz_server.display_product_metadata(product_id)

### How to download data products as .zip files   

To download any data product stored in the PZ Server, use the `PzServer` method `download_product`, passing the product's **id** or **internal_name** and the path where it will be saved (the default is the current folder). This method downloads a compressed .zip file, which contains all the files uploaded by the user, including data, ancillary files, and description files. Let's try it with a small data product. 

In [None]:
pz_server.download_product(product_id, save_in=".")
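
Once the .zip file is on disk, its contents can be unpacked with the standard library. The sketch below is independent of `pzserver`. The helper name `extract_product` is hypothetical, and the name of the downloaded archive is left as a placeholder because it depends on the product.

```python
import zipfile
from pathlib import Path


def extract_product(zip_path, dest_dir="."):
    """Extract a downloaded product archive and return the names of its members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Possible usage (replace the placeholder with the actual file name):
# files = extract_product("<downloaded file>.zip", dest_dir="my_product")
```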

### How to share data products with other RSP users

All data products uploaded to the PZ Server are immediately available and visible to all PZ Server users (people with RSP credentials) through the PZ Server website or Python library. One way to share a data product is by providing the product's URL, which leads to the product's download page. The URL is composed of the PZ Server website address + **/product/** + **internal_name**:

https://pz-server.linea.org.br/product/ + **internal_name** 

or, if still in the development phase, 

https://pz-server-dev.linea.org.br/product/ + **internal_name**

<font color=red> WARNING:</font> The URL works only with the **complete internal name**, not with just the **id** number. 


For example:

In [None]:
url = f'https://pz-server-dev.linea.org.br/product/{product_id}_example_upload_via_lib'
url 
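
The URL pattern above can be wrapped in a small function that also guards against passing only the **id** number. This is a sketch: `product_url` is a hypothetical helper, not part of `pzserver`, and the two host names are the ones quoted in this notebook.

```python
def product_url(internal_name, dev=False):
    """Build the download-page URL for a product from its complete internal_name."""
    if str(internal_name).isdigit():
        raise ValueError("The URL needs the complete internal_name, not just the id")
    host = "pz-server-dev.linea.org.br" if dev else "pz-server.linea.org.br"
    return f"https://{host}/product/{internal_name}"


print(product_url("123_example_upload_via_lib", dev=True))
# https://pz-server-dev.linea.org.br/product/123_example_upload_via_lib
```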

--- 
### How to retrieve contents of data products (work in memory)

Instead of downloading the files, the `pzserver` library also allows users to retrieve the contents of a given data product and work with them in memory using the method `get_product()`. This feature is available only for tabular data (product types: **Spec-z Catalog**, **Training Set**, and **Photo-z Table**). 

By default, the method `get_product` returns an object from a particular class, depending on the product's type. The classes `SpeczCatalog` and `TrainingSet` are simple extensions of `pandas.DataFrame` (via class composition) with a couple of additional attributes and methods, such as the attribute `metadata`, and the method `display_metadata()`. Let's see an example: 

In [None]:
catalog = pz_server.get_product(product_id)
catalog

In [None]:
catalog.display_metadata()

The tabular data is stored in the attribute `data`, a `pandas.DataFrame`. 

In [None]:
type(catalog.data)

In [None]:
catalog.data

It preserves the useful methods from `pandas.DataFrame`, such as:  

In [None]:
catalog.data.info()

In [None]:
catalog.data.describe()

For those who prefer working with `astropy.table.Table` or pure `pandas.DataFrame`, the method `get_product()` gives the flexibility to choose the output format (`fmt="pandas"` or `fmt="astropy"`).     

In [None]:
dataframe = pz_server.get_product(product_id, fmt="pandas")
print(type(dataframe))
dataframe

In [None]:
table = pz_server.get_product(product_id, fmt="astropy")
print(type(table))
table

---

Next, let's explore specific features for each product type...  

## Data Products

### Spec-z Catalog 

In the context of the PZ Server, Spec-z Catalogs are defined as any catalog containing spherical equatorial coordinates and spectroscopic redshift measurements (or, analogously, true redshifts from simulations). A Spec-z Catalog can include data from a single spectroscopic survey or a combination of data from several sources. To be considered a single Spec-z Catalog, the data should be provided as a single file to PZ Server's upload tool. Adding the survey name or identification as an extra column is recommended for multi-survey catalogs. 


Mandatory columns: 
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Spectroscopic or true redshift - `float`

Recommended columns: 
* Spectroscopic redshift error - `float`
* Quality flag - `integer`, `float`, or `string`
* Survey name (recommended for compilations of data from different surveys)
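
Before uploading a catalog, the mandatory columns listed above can be checked with a few lines of standard-library Python. The sketch below is an illustration only: the column names `RA`, `DEC`, and `Z` are assumed, the helper `check_specz_rows` is hypothetical, and the sample rows are made up for the demonstration. It also assumes right ascension in the 0 to 360 degree convention; some catalogs use -180 to 180 instead.

```python
import csv
import io

REQUIRED = ("RA", "DEC", "Z")  # assumed column names


def check_specz_rows(rows):
    """Return a list of (row_number, message) for rows that fail basic checks."""
    problems = []
    for i, row in enumerate(rows, start=1):
        missing = [c for c in REQUIRED if row.get(c) in (None, "")]
        if missing:
            problems.append((i, f"missing {', '.join(missing)}"))
            continue
        try:
            ra, dec, _z = (float(row[c]) for c in REQUIRED)
        except ValueError:
            problems.append((i, "non-numeric value"))
            continue
        if not 0.0 <= ra <= 360.0:
            problems.append((i, "RA outside [0, 360] degrees"))
        if not -90.0 <= dec <= 90.0:
            problems.append((i, "DEC outside [-90, 90] degrees"))
    return problems


# Made-up rows: one valid, one with a bad RA, one without redshift
sample = io.StringIO("RA,DEC,Z\n150.1,2.2,0.35\n400.0,2.2,0.10\n10.0,-95.0,\n")
problems = check_specz_rows(csv.DictReader(sample))
print(problems)
# [(2, 'RA outside [0, 360] degrees'), (3, 'missing Z')]
```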

Let's see an example of Spec-z Catalog: 

In [None]:
gama = pz_server.get_product(14)

In [None]:
gama.display_metadata()

Display basic statistics

In [None]:
gama.data.describe()

The attribute `data`, which is a `DataFrame`, preserves the `plot` method from Pandas.   

In [None]:
gama.data.plot(x="RA", y="DEC", kind="scatter")  

In [None]:
gama.data.hist('Z')

### Training Sets 
    
In the context of the PZ Server, Training Sets are defined as the product of the spatial cross-matching between a given Spec-z Catalog (single survey or compilation) and the photometric data, in this case, the LSST Objects Catalog. The PZ Server's *Training Set Maker* pipeline allows users to build customized Training Sets based on the available Spec-z Catalogs (details below).    

_Note 1: Training sets are commonly split into two or more subsets for photo-z validation purposes. If the Training Set owner has previously defined which objects should belong to each subset (training and validation/test sets), this information must be available as an extra column in the table or as clear instructions for reproducing the subset separation in the data product description._

  
_Note 2: The PZ Server only supports catalog-level Training Sets. Image-based Training Sets, e.g., for deep-learning algorithms, are not supported._


Mandatory column: 
* Spectroscopic (or true) redshift - `float`

Other expected columns
* Object ID from LSST Objects Catalog - `integer`
* Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - `float`
* Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - `float`
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Quality Flag - `integer`, `float`, or `string`
* Subset Flag - `integer`, `float`, or `string`
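
One way to produce a Subset Flag that others can reproduce is a seeded random split keyed on the object ID. The sketch below is an illustration, not a PZ Server requirement. The helper name `assign_subsets`, the labels `"train"` and `"test"`, the 20% test fraction, the seed, and the example IDs are all assumptions.

```python
import random


def assign_subsets(object_ids, test_fraction=0.2, seed=42):
    """Return {object_id: "train" or "test"} from a seeded, reproducible shuffle."""
    ids = sorted(object_ids)  # sort first so the input order does not matter
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    test_ids = set(ids[:n_test])
    return {obj_id: ("test" if obj_id in test_ids else "train") for obj_id in ids}


flags = assign_subsets(range(1000, 1010))  # hypothetical object IDs
n_test = sum(flag == "test" for flag in flags.values())
print(f"{n_test} of {len(flags)} objects flagged as test")
# 2 of 10 objects flagged as test
```

Recording the seed and the fraction in the data product description lets other users regenerate the same split.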


For example, the training set created in [RAIL's Goldenspike example notebook](https://github.com/LSSTDESC/rail/blob/main/examples/goldenspike_examples/goldenspike.ipynb): 

In [None]:
train_goldenspike = pz_server.get_product(9)

In [None]:
train_goldenspike.display_metadata()

Display basic statistics

In [None]:
train_goldenspike.data.describe()

In [None]:
train_goldenspike.data.hist('redshift', bins=20)

In [None]:
train_goldenspike.data.hist('mag_i_lsst', bins=20)

### Training Results

The training results of machine learning-based PZ algorithms can also be hosted in the PZ Server to be shared and reused. This product type allows files in free format. When the training results are generated with RAIL, they are stored as *pickle* files and can be downloaded to the local work directory. 

Note: The method `download_product` always brings the data as a compressed (.zip) file, regardless of the number of auxiliary files attached to the data. 

In [None]:
pz_server.download_product("197_goldenspike_flexzboost", save_in=".") 

### Validation Results

The PZ Server is also a good place to safely store the results of a photo-z validation procedure. Users can upload a list of files in free format, such as tabular files with photo-z estimates (single estimates and/or PDFs) of a validation set, auxiliary files with photo-z validation metrics, validation plots, etc. 

In [None]:
pz_server.download_product("11_goldenspike_flexzboost", save_in=".") 

### Photo-z Tables 

Photo-z tables are the results of a photo-z estimation procedure. If the data is larger than the file upload limit of 200 MB (for instance, the PZ tables for the LSST Object catalogs delivered as part of annual data releases), the product entry stores only the metadata, and instructions on accessing the data should be provided in the description field.
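
A quick size check can tell in advance whether a table fits under the upload limit. The sketch below assumes that 1 MB means 10**6 bytes, which may differ from how the server counts. The helper name `exceeds_upload_limit` is hypothetical.

```python
from pathlib import Path

UPLOAD_LIMIT_MB = 200  # limit quoted in the text above


def exceeds_upload_limit(path, limit_mb=UPLOAD_LIMIT_MB):
    """Return True if the file at `path` is larger than `limit_mb` megabytes."""
    size_mb = Path(path).stat().st_size / 1e6  # assumes 1 MB = 10**6 bytes
    return size_mb > limit_mb


# Possible usage:
# if exceeds_upload_limit("my_pz_table.parquet"):
#     print("Upload metadata only and describe how to access the data.")
```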

## PZ Server Pipelines (under development) 

Spec-z Catalogs and Training Sets can be created using the cross-matching pipelines available on the PZ Server. Any catalog built by a pipeline is automatically registered as a regular user-generated data product and is treated the same as an uploaded one. 


--- 

## User feedback 

Is something important missing? [Click here to open an issue in the PZ Server library repository on GitHub](https://github.com/linea-it/pzserver/issues/new). 