<img align="left" src = "images/linea.png" width=140 style="padding: 20px"> 
<img align="left" src = "images/rubin.png" width=180 style="padding: 30px"> 

<font size=5> **Photo-z Server** Tutorial Notebook
 </font>

Contact author: [Julia Gschwend](mailto:julia@linea.org.br) <br>
Contributors: Luigi Silva, Cristiano Singulani <br> 
Last verified run: **2025-Jul-11**

# Introduction

The Photo-z (PZ) Server is an online service for the LSST Community to create, host and share lightweight PZ-related data products. The PZ Server is developed and maintained by LIneA as part of the in-kind contribution program (BRA-LIN) to the Rubin Observatory. The service is hosted in the Brazilian IDAC, with access restricted to the LSST Community. The access authorization is granted through [Rubin Science Platform (RSP)](https://data.lsst.cloud/) login credentials. For more information about the PZ Server and other contributions related to photometric redshifts, please visit the [BRA-LIN's description page](https://linea-it.github.io/pz-lsst-inkind-doc/). 

The PZ Server has two main user interfaces: the website and the API, accessed via the `pzserver` Python library. 

This notebook contains instructions for new users on how to use the `pzserver` Python library, with examples for all functions and methods available. The documentation on how to use the website is available on [LIneA's Documentation for Users webpage](https://docs.linea.org.br/en/sci-platforms/pz_server.html).     

<font size=5>Notebook contents </font>

- [Introduction](#introduction)
- [Getting Started](#getting-started) 
   - [Installation](#installation)
   - [The PzServer class](#the-pzserver-class)
   - [Basic methods: query general info](#basic-methods-query-general-info)
   - [Data products: access data and display metadata](#data-products-access-data-and-display-metadata)
   - [Sharing data products](#sharing-data-products) 
- [Data product types](#data-products)
    - [Reference Redshift Catalog](#reference-redshift-catalog)
    - [Training Sets](#training-sets)
    - [Training Results](#training-results)
    - [Validation Results](#validation-results)
    - [Photo-z Tables](#photo-z-tables)
- [Advanced methods](#advanced-methods)
    - [PZ Server Pipelines](#pz-server-pipelines)
        -  [Combine Spec-z Catalogs](#combine-spec-z-catalogs)
        -  [Training Set Maker](#training-set-maker)
    - [Upload](#upload)
    - [Update](#update)
- [User feedback](#user-feedback)

# Getting Started

## Installation

The PZ Server's Python library is available on **pip** as `pzserver`.

```
$ pip install pzserver 
```
OBS 1: Depending on your Jupyter Notebook/Lab version, you might need to restart the kernel to incorporate the new library.

OBS 2: If you are installing it on the RSP Notebook Aspect on top of the LSST kernel, you might get some warnings regarding dependency versions. They should not affect the library usage. If you have any issues, please contact the [PZ Server team](mailto:julia@linea.org.br).   

In [None]:
! pip install pzserver 

Imports and Setup

In [None]:
from pzserver import PzServer 
import matplotlib.pyplot as plt  # used by the plotting cells below
#%reload_ext autoreload 
#%autoreload 2

## The PzServer class 

The `PzServer` class object opens the connection with the PZ Server database and allows access to data and metadata. To create a `PzServer` object, users must be authorized by using an API Token which is generated in the menu at the top right corner of the [PZ Server website](https://pzserver.linea.org.br/).  

<img src="images/ScreenShotTokenMenu.png" width=150pt align="top"/> <img src="images/ScreenShotTokenGenerator.png" width=350pt />

Uncomment the next cell and paste the API Token, replacing the placeholder below: 

In [None]:
# pz_server = PzServer(token="<your token here>") 

API tokens can be reused indefinitely. However, an old token automatically expires whenever you create a new one. 

For convenience, the API token can be saved in a text file, e.g., **token.txt** (already listed in the .gitignore file in this repository). 

<font color=red> API tokens MUST NOT BE SHARED! Users are responsible for keeping their tokens private. </font> 

In [None]:
# with open('token.txt', 'r') as file:
#    token = file.read()
# pz_server = PzServer(token=token)
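
A token read straight from a text file often carries a trailing newline. The sketch below is one possible way to load it more defensively: it first looks at an environment variable and then falls back to the file, stripping whitespace in both cases. The function name `load_token` and the variable name `PZSERVER_TOKEN` are hypothetical choices made for this example, not part of the `pzserver` library.

```python
import os
from pathlib import Path


def load_token(path="token.txt", env_var="PZSERVER_TOKEN"):
    """Return the API token from an environment variable or a text file.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Strip the trailing newline that most editors add to text files.
        token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError("The API token is empty.")
    return token


# Usage (requires a valid token):
# pz_server = PzServer(token=load_token())
```

Keeping the token outside the notebook, either in the environment or in an ignored file, helps avoid committing it by accident.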

## Basic methods: query general info

The object `pz_server` created above provides access to data and metadata stored in the PZ Server. It also brings additional methods for users to navigate through the available content. The methods with the prefix `get_` return the result of a query on the PZ Server database as a Python dictionary and are most useful for programmatic use (see details on the [API documentation page](https://linea-it.github.io/pzserver/html/index.html)). Alternatively, those with the prefix `display_` show the results as styled [_Pandas DataFrames_](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), optimized for Jupyter Notebook (note: column names might change in the display version). 

For instance:

display the list of product types supported with a short description, 

In [None]:
pz_server.display_product_types()

display the list of data releases available at the time, 

In [None]:
pz_server.display_releases()

and display all available data products. 

<font color='green'>WARNING: This list can rapidly grow during the survey's operation (cell output scrolling recommended)</font>

In [None]:
pz_server.display_products_list() 

The information about product type, users, and releases shown above can be used to filter the data products of interest for your search. For that, the method `display_products_list` receives as an argument a dictionary mapping the product's attributes to their values. 

In [None]:
pz_server.display_products_list(filters={"release": "DP0.2", 
                                         "product_type": "Training Set"})

It also works if we type a string pattern that is part of the value.

In [None]:
pz_server.display_products_list(filters={"product_type": "estimates"})

To fetch the results of a search and assign them to a variable, replace the prefix `display_` with `get_`:  

In [None]:
search_results = pz_server.get_products_list(filters={"product_type": "training results"}) 
search_results

## Data products: access data and display metadata

<font size=4>**product_id** and **internal_name**</font>

Every data product stored on the PZ Server is identified by its unique **product_id** number or by its **internal_name**, which is created automatically at upload time by concatenating the **product_id** with the name given by its owner (replacing blank spaces with "_", converting to lowercase, and removing special characters), e.g., `30_simple_training_set`. 
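
To make the naming rule concrete, here is a minimal sketch that reproduces it locally. It only follows the description above (lowercase, blanks to underscores, special characters removed); the exact rules applied by the server may differ, and `make_internal_name` is a hypothetical helper, not a `pzserver` function.

```python
import re


def make_internal_name(product_id, name):
    """Approximate the internal_name rule described in the text."""
    slug = name.strip().lower()
    slug = re.sub(r"\s+", "_", slug)        # blank spaces become "_"
    slug = re.sub(r"[^a-z0-9_]", "", slug)  # drop special characters
    return f"{product_id}_{slug}"


print(make_internal_name(30, "Simple Training Set"))  # 30_simple_training_set
```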

<font size=4>Display the metadata of a data product</font>

The metadata of a given data product is all the information available about it, including what the user provided on the upload form. 

The `PzServer` method `get_product_metadata()` returns a dictionary with the attributes stored in the PZ Server about a given data product identified by its **id** or **internal_name**. For use in a Jupyter notebook, the equivalent `display_product_metadata()` shows the results in a formatted table.

In [None]:
product_id = 30
pz_server.display_product_metadata(product_id)

<font size=4>Download data products as .zip files</font>

To download any data product stored in the PZ Server, use the method `download_product`, passing the **product_id** or **internal_name** and the path where the file will be saved (the default is the current folder). This method downloads a compressed .zip file, which contains all the files uploaded by the user, including data, auxiliary files, and description files. Let's try it with a small data product. 

In [None]:
pz_server.download_product(product_id, save_in=".")
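
Once the .zip file is on disk, the standard library is enough to inspect or unpack it. The sketch below assumes only that the download is a regular zip archive; the helper names are hypothetical, and the archive path must be supplied by you, since the name of the downloaded file is not specified here.

```python
import zipfile


def list_zip_contents(zip_path):
    """Return (file name, size in bytes) pairs for the files in a .zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


def extract_zip(zip_path, target_dir):
    """Extract the whole archive into `target_dir` and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

Listing the contents first is a cheap way to confirm which data, auxiliary, and description files came with the product before extracting them.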

<font size=4>Retrieve contents of data products (work in memory)</font> 





Instead of downloading the files, the `pzserver` library also allows users to retrieve the contents of a given data product to work in memory using the method `get_product()`. This feature is available only for tabular data, such as redshift catalogs and training sets.

By default, the method `get_product` returns an object from a particular class, depending on the product's type. The classes `SpeczCatalog` and `TrainingSet` are simple extensions of `pandas.DataFrame` (via class composition) with a couple of additional attributes and methods, such as the attribute `metadata`, and the method `display_metadata()`. Let's see an example: 

In [None]:
catalog = pz_server.get_product(2)
catalog

In [None]:
catalog.display_metadata()

The tabular data is allocated in the attribute `data`, a `pandas.DataFrame`. 

In [None]:
type(catalog.data)

In [None]:
catalog.data

It preserves the useful methods from `pandas.DataFrame`, such as:  

In [None]:
catalog.data.info()

In [None]:
catalog.data.describe()

For those who prefer working with `astropy.table.Table` or a plain `pandas.DataFrame`, the method `get_product()` gives the flexibility to choose the output format (`fmt="pandas"` or `fmt="astropy"`).     

In [None]:
dataframe = pz_server.get_product(product_id, fmt="pandas")
print(type(dataframe))
dataframe

In [None]:
table = pz_server.get_product(product_id, fmt="astropy")
print(type(table))
table

## Sharing data products

All data products uploaded to the PZ Server are immediately available and visible to all PZ Server users (people with RSP credentials) through the PZ Server website or Python library. One way to share a data product is by providing the product's URL, which leads to the product's download page. The URL is composed of the PZ Server website address + **/product/** + the product's **id** or **internal_name**:

https://pzserver.linea.org.br/product/ + **id**

or 

https://pzserver.linea.org.br/product/ + **internal_name** 

<font color=green> WARNING: if still in the development phase, the URL works only with the **complete internal name**: </font> 

https://pzserver<font color=red>-dev</font>.linea.org.br/product/ + **internal_name**


For example, for the data product used above:

In [None]:
internal_name = pz_server.get_product_metadata(product_id)['internal_name']
url = f'https://pzserver-dev.linea.org.br/product/{internal_name}'
url
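
The URL rule can be wrapped in a small helper. This is only a sketch built from the two addresses quoted above; `product_url` is a hypothetical function, not part of the `pzserver` library.

```python
PRODUCTION_URL = "https://pzserver.linea.org.br"
DEVELOPMENT_URL = "https://pzserver-dev.linea.org.br"


def product_url(identifier, dev=False):
    """Build the download-page URL from a product id or an internal_name."""
    if dev and not isinstance(identifier, str):
        # The warning above says the development server needs the full internal name.
        raise ValueError("The development server requires the complete internal_name.")
    base = DEVELOPMENT_URL if dev else PRODUCTION_URL
    return f"{base}/product/{identifier}"


print(product_url(30))
print(product_url("30_simple_training_set", dev=True))
```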

---

Next, let's explore specific features for each product type...  

# Data Products

## Reference Redshift Catalog 

In the context of the PZ Server, Spec-z Catalogs are defined as any catalog containing spherical equatorial coordinates and spectroscopic redshift measurements (or, analogously, true redshifts from simulations). A Spec-z Catalog can include data from a single spectroscopic survey or a combination of data from several sources. To be considered a single Spec-z Catalog, the data should be provided as a single file to PZ Server's upload tool. Adding the survey name or identification as an extra column is recommended for multi-survey catalogs. 


Mandatory columns: 
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Spectroscopic or true redshift - `float`

Recommended columns: 
* Spectroscopic redshift error - `float`
* Quality flag - `integer`, `float`, or `string`
* Survey name (recommended for compilations of data from different surveys)
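
Before uploading a catalog, a quick local check of the header can catch a missing mandatory column early. The sketch below reads only the first line of a CSV file. The column names `RA`, `DEC`, and `Z` are assumptions borrowed from the example catalog used in this notebook; your own files may use different names, which are mapped during upload.

```python
import csv
import io

# Assumed column names; adapt them to your own catalog.
MANDATORY_COLUMNS = ("RA", "DEC", "Z")


def missing_columns(csv_file, mandatory=MANDATORY_COLUMNS):
    """Return the mandatory columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in mandatory if name not in present]


sample = io.StringIO("RA,DEC,Z,Z_ERR\n150.1,2.2,0.35,0.001\n")
print(missing_columns(sample))  # []
```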

Let's see an example of Spec-z Catalog: 

In [None]:
gama = pz_server.get_product(14)

In [None]:
gama.display_metadata()

Display basic statistics

In [None]:
gama.data.describe()

The attribute `data`, which is a `DataFrame`, preserves the `plot` method from Pandas.   

In [None]:
gama.data.plot(x="RA", y="DEC", kind="scatter")  
plt.xlabel("R.A. (degrees)")
plt.ylabel("Dec. (degrees)")

In [None]:
gama.data.hist('Z')
plt.xlabel("spec-z")
plt.ylabel("counts")
plt.title(None)

## Training Sets 
    
In the context of the PZ Server, Training Sets are defined as the product of the spatial cross-matching between a given Spec-z Catalog (single survey or compilation) and the photometric data, in this case, the LSST Objects Catalog. The PZ Server's *Training Set Maker* pipeline allows users to build customized Training Sets based on the available Spec-z Catalogs (details below).    

_Note 1: Training sets are commonly split into two or more subsets for photo-z validation purposes. If the Training Set owner has previously defined which objects should belong to each subset (training and validation/test sets), this information must be available as an extra column in the table or as clear instructions for reproducing the subset separation in the data product description._

  
_Note 2: The PZ Server only supports catalog-level Training Sets. Image-based Training Sets, e.g., for deep-learning algorithms, are not supported._


Mandatory column: 
* Spectroscopic (or true) redshift - `float`

Other expected columns:
* Object ID from LSST Objects Catalog - `integer`
* Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - `float`
* Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - `float`
* Right ascension [degrees] - `float`
* Declination [degrees] - `float`
* Quality Flag - `integer`, `float`, or `string`
* Subset Flag - `integer`, `float`, or `string`
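
As an illustration of the Subset Flag mentioned in Note 1, the sketch below assigns each object to a training or validation subset in a reproducible way. The labels `"train"` and `"valid"`, the 20% validation fraction, and the seed are arbitrary assumptions for this example; the PZ Server does not prescribe them.

```python
import random


def assign_subset_flags(n_objects, valid_fraction=0.2, seed=42):
    """Return one 'train'/'valid' label per object, reproducible via `seed`."""
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_valid = round(n_objects * valid_fraction)
    flags = ["valid"] * n_valid + ["train"] * (n_objects - n_valid)
    rng.shuffle(flags)  # random assignment, fixed by the seed
    return flags


flags = assign_subset_flags(10)
print(flags.count("train"), flags.count("valid"))  # 8 2
```

The resulting list can be stored as an extra column of the table, and recording the seed in the product description makes the split reproducible.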


For example, the training set created in [RAIL's Goldenspike example notebook](https://github.com/LSSTDESC/rail/blob/main/examples/goldenspike_examples/goldenspike.ipynb): 

In [None]:
train_goldenspike = pz_server.get_product(9)

In [None]:
train_goldenspike.display_metadata()

Display basic statistics

In [None]:
train_goldenspike.data.describe()

In [None]:
train_goldenspike.data.hist('redshift', bins=20)

In [None]:
train_goldenspike.data.hist('mag_i_lsst', bins=20)

## Training Results

The training results of machine learning-based PZ algorithms can also be hosted in the PZ Server to be shared and reused. This product type allows files in free format. When the training results are generated with RAIL, they are stored as *pickle* files and can be downloaded to the local work directory. 

OBS: The method `download_product` always brings the data as a compressed (.zip) file, regardless of the number of auxiliary files attached to the data. 

In [None]:
pz_server.download_product('197_goldenspike_flexzboost', save_in='.') 

## Validation Results

The PZ Server is also a good place to safely store the results of a photo-z validation procedure. Users can upload a list of files in free format, such as tabular files with photo-z estimates (single estimates and/or PDFs) of a validation set, auxiliary files with photo-z validation metrics, validation plots, etc. 

In [None]:
pz_server.download_product("11_goldenspike_flexzboost", save_in=".") 

## Photo-z Tables 

Photo-z tables are the results of a photo-z estimation procedure. If the data is larger than the file upload limit of 200MB (for instance, the PZ tables for the LSST Object catalogs delivered as part of annual data releases), the product entry stores only the metadata (and instructions on accessing the data should be provided in the description field).

---
# Advanced methods

## PZ Server Pipelines 

In addition to PZ-related data hosting and curation services, the PZ Server also provides tools to help users prepare training data for PZ algorithms. The pipeline *Training Set Maker* uses the data partitioning method [HATS](https://hats.readthedocs.io/en/stable/) and the Python framework [LSDB](https://docs.lsdb.io/en/stable/) (both developed by [LINCC](https://lsstdiscoveryalliance.org/programs/lincc/)) as the cross-matching back-end engine, coupled with a user interface on the PZ Server website connected to the IDAC-Brazil's high-performance computing infrastructure. With *Training Set Maker*, users can create training sets by matching objects from one spec-z catalog available in the server with objects from an LSST Object catalog. In a previous step, the spec-z catalog might have been prepared as a combination of spectroscopic redshift measurements from different sources, grouped into a single catalog with the pipeline *Combine Spec-z Catalogs*. 

<img src="./images/tsm.png" width="600" style="display: block; margin: auto;" />

Both pipelines are executed as asynchronous processes triggered from the PZ Server website or directly from Python scripts using the `pzserver` library, and the outputs are automatically registered as new data products. See below for an example of how to use them.     

### Combine Spec-z Catalogs 

The pipeline Combine Spec-z Catalogs (CSC) simply concatenates multiple Spec-z catalogs into a single table and registers it as a new data product on the PZ Server. It was designed to help aggregate multiple samples from individual surveys into a single catalog before they are associated with LSST data through spatial cross-matching. 

On the PZ Server website, go to **PZ Server Pipelines** > **Combine Spec-z Catalogs**, fill in the submission form with relevant metadata, such as the name for the new spec-z catalog to be created and a short description, select the catalogs to include by marking at least two checkboxes, and press the **Run** button. 


<img src="./images/ScreenshotCSC.png"  width=600 /> 


Alternatively, the pipeline can be submitted using the method `pz_server.combine_specz_catalogs` from the `pzserver` library. 

Start by creating a "csc" process object, providing a name (string) for the new spec-z catalog to the method.

In [None]:
#csc = pz_server.combine_specz_catalogs(<new product's name>)
csc = pz_server.combine_specz_catalogs("csc example")

In [None]:
type(csc)

Check status of the `csc` process 

In [None]:
csc.check_status()

In [None]:
csc.summary()

Then, add at least two individual spec-z catalogs to be included in the sample using the `append_catalog` method. These catalogs must already exist in the PZ Server and are identified by their **id** or **internal_name**. Let's browse the spec-z catalogs available and choose from the list: 

In [None]:
pz_server.display_products_list(filters={"product_type": "Spec-z Catalog", 'uploaded_by':'gschwend'})

Let's add six small samples extracted from arbitrarily selected tracts in the central region of DP0.2. 

<img src='./images/dpdd_dc2_zoom.png'/>

Figure from: https://dp0-2.lsst.io/data-products-dp0-2/index.html 

In [None]:
# csc.append_catalog(specz_id=None, internal_name=None)
csc.append_catalog(213) # tract 4029
csc.append_catalog(211) # tract 3831
csc.append_catalog(210) # tract 4031
csc.append_catalog(209) # tract 3448
csc.append_catalog(208) # tract 3450
csc.append_catalog(207) # tract 3833

When data observed with the LSST Camera become available, compilations of real data will be useful, for instance: 

In [None]:
# csc.append_catalog('13_vvds_specz_subsample') 
# csc.append_catalog('41_deimos_10k_public_specz') 
# csc.append_catalog('42_3dhst_public_specz') 
# csc.append_catalog('45_gama_public_specz') 
# csc.append_catalog('51_zcosmos_public_specz')
# csc.append_catalog('52_2dflens_public_specz')   

But for now, let's stick with the mock data from DP0.2.

Let's check the summary of `csc` attributes, now updated with the input catalogs added above:  

In [None]:
csc.summary()

Now, use the method `run` to submit the process as an asynchronous job to the PZ Server's back-end.   

In [None]:
csc.run()

While the process is still running, we can already check the `id` and the `internal_name` of the output data product with the `summary` method. 

In [None]:
csc.summary()

Or get it from the object `csc`

In [None]:
catalog_id = csc.output.get('id') 
catalog_id

In [None]:
catalog_name = csc.output.get('internal_name') 
catalog_name

Let's check if the process is done. If the status is 'Successful', we can move on to the next cell. 

In [None]:
csc.check_status()
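
Instead of re-running the status cell by hand, the check can be repeated in a loop. The sketch below is a generic polling helper under two loud assumptions: that `check_status()` returns the status as a string, and that `"Failed"` is the name of the failure state. Only `'Successful'` is mentioned in this notebook, so please verify both against the library documentation before relying on it.

```python
import time


def wait_for_process(process, poll_seconds=30, timeout_seconds=1800,
                     final_states=("Successful", "Failed")):
    """Poll `process.check_status()` until a final state or the timeout is reached.

    Assumes check_status() returns a string; "Failed" is a guessed state name.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = process.check_status()
        if status in final_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Process still in state {status!r} after {timeout_seconds} s"
            )
        time.sleep(poll_seconds)


# Usage (requires a submitted process):
# final_status = wait_for_process(csc, poll_seconds=10)
```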

Now, the new spec-z catalog named "csc example" is available to be downloaded or retrieved to memory: 

In [None]:
my_new_specz_catalog = pz_server.get_product(catalog_name)

In [None]:
my_new_specz_catalog.display_metadata()

In [None]:
my_new_specz_catalog.data 

In [None]:
my_new_specz_catalog.data.plot(x="ra", y="dec", kind="scatter")  
plt.xlabel("R.A. (degrees)")
plt.ylabel("Dec. (degrees)")
plt.tight_layout()

### Training Set Maker 

Let's add photometric data to our spectroscopic catalog to make a training set. On the PZ Server website, go to PZ Server Pipelines > Training Set Maker, fill in the submission form with relevant metadata, such as the name for the new training set to be created and a short description, select the input data and configuration parameters, and press the Run button. 

The configuration parameters are inherited from LSDB. For more information, please check the [LSDB documentation website](https://docs.lsdb.io/en/stable/).  


For this pipeline, the number of inputs is fixed to two: one spec-z catalog and one LSST Object catalog (identified by the LSST data release tag). 


<img src="./images/ScreenshotTSM.png" width=600 /> 

Alternatively, the pipeline can be submitted using the `pz_server.training_set_maker` method from the `pzserver` library.

While waiting for the first LSST Object catalog with observed data to become available, let's see how the *Training Set Maker* works with simulated data from DP0.2. Again, let's instantiate an object for the process, a "tsm" object, giving a name (string) for the new training set to be created. 

In [None]:
tsm = pz_server.training_set_maker("tsm example 2")                          

Let's set our spec-z catalog created above as input data: 

In [None]:
# tsm.set_specz(specz_id=None, internal_name=None)
tsm.set_specz(catalog_id)                                    

In [None]:
tsm.summary()

In [None]:
pz_server.display_releases()

Then set the data release of the object catalog, choosing one of the releases listed above. 

In [None]:
tsm.set_release(name='dp02')        

In [None]:
tsm.summary()

Set the configuration parameters as a `dict`: 

* The key `crossmatch` holds the cross-matching configuration parameters to be used by LSDB, e.g., `{'n_neighbors': 1, 'radius_arcsec': 1.0, 'suffixes': ['','']}`; 
* The key `duplicate_criteria` is an extra parameter that defines what to do when there are multiple matches for the same spectroscopic object: keep all matches or keep only the closest one (default). 

```python
tsm.set_config(                                             
{'crossmatch': {'n_neighbors': 1, 'radius_arcsec': 1.0, 'suffixes': ['_specz','']}, 'duplicate_criteria': 'closest'}    
)    
```
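
To illustrate what the matching radius and the "closest" criterion mean, here is a brute-force sketch in pure Python. It is not how LSDB performs the cross-match (LSDB works on partitioned catalogs and scales to far larger data); it only shows the idea on a handful of points. In this sketch any value of `duplicate_criteria` other than `"closest"` keeps every match, since the option name for keeping all matches is not stated above.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600


def crossmatch(specz, photo, radius_arcsec=1.0, duplicate_criteria="closest"):
    """Match (id, ra, dec) tuples; return (specz_id, photo_id, separation) rows."""
    matches = []
    for s_id, s_ra, s_dec in specz:
        candidates = []
        for p_id, p_ra, p_dec in photo:
            sep = separation_arcsec(s_ra, s_dec, p_ra, p_dec)
            if sep <= radius_arcsec:
                candidates.append((s_id, p_id, sep))
        candidates.sort(key=lambda row: row[2])  # nearest first
        if duplicate_criteria == "closest":
            candidates = candidates[:1]  # keep only the nearest object
        matches.extend(candidates)
    return matches


# Made-up positions: two photometric objects within 1 arcsec, one far away.
specz = [("s1", 10.0, -30.0)]
photo = [
    ("p1", 10.0, -30.0 + 0.5 / 3600),
    ("p2", 10.0, -30.0 + 0.9 / 3600),
    ("p3", 11.0, -30.0),
]
print(crossmatch(specz, photo))
```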

In [None]:
#tsm.set_config({'crossmatch': {'n_neighbors': 1, 'radius_arcsec': 1.0, 'suffixes': ['_specz','']}, 'duplicate_criteria': 'closest'}) 

In [None]:
tsm.summary()

This pipeline might take longer than the previous one, so it is convenient to let the notebook cell run until the process finishes instead of checking the status from time to time. For that, use the method `run_and_wait()`. 

OBS: the method `run_and_wait()` works if the process is shorter than 30 minutes. If it takes longer, the notebook cell is released and the process switches to the asynchronous mode.     

In [None]:
pz_server.run_and_wait(tsm)

In [None]:
tsm.summary()

In [None]:
training_set_id = tsm.output.get('id') 
training_set_id

Now, the new training set named "tsm example 2" is available to be downloaded or retrieved to memory: 

In [None]:
my_new_training_set = pz_server.get_product(training_set_id)

In [None]:
my_new_training_set.display_metadata()

In [None]:
my_new_training_set.data 

In [None]:
my_new_training_set.data.plot(x="coord_ra", y="coord_dec", kind="scatter")  
plt.xlabel("R.A. (degrees)")
plt.ylabel("Dec. (degrees)")
plt.tight_layout()

In [None]:
my_new_training_set.data.hist('z_specz')
plt.xlabel("spec-z")
plt.ylabel("counts")
plt.title(None)
plt.tight_layout()

In [None]:
my_new_training_set.data.hist('mag_i')
plt.xlabel("i-band magnitude")
plt.ylabel("counts")
plt.title(None)
plt.tight_layout()

## Upload 
### How to upload a data product to PZ Server via Python API (alternative method)

The default method to upload a data product to the PZ Server is the upload form on the PZ Server website. Alternatively, the `pzserver` Python library can send data products to the host service. 

First, prepare a dictionary with the relevant information about your data product: 

In [None]:
data_to_upload = {
    "name":"example upload via lib",
    "product_type": "specz_catalog",  # Product type 
    "release": None, # LSST release, use None if not LSST data 
    "main_file": "upload_example.csv", # full path 
    "auxiliary_files": ["upload_example.html", "upload_example.ipynb"] # full path
    #"auxiliary_files": [] # you must give an empty list if you don't have any auxiliary_files
}
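
Checking the files locally before calling the upload can save a failed round trip. The sketch below verifies that each path exists and is not larger than the 200MB upload limit mentioned earlier in this notebook. Reading the limit as 200 MiB and applying it per file are assumptions of this example, and `check_upload_files` is a hypothetical helper, not a `pzserver` function.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumption: "200MB" read as 200 MiB per file


def check_upload_files(main_file, auxiliary_files=(), max_bytes=MAX_UPLOAD_BYTES):
    """Return a list of problems found in the files to upload (empty if none)."""
    problems = []
    for path in map(Path, [main_file, *auxiliary_files]):
        if not path.is_file():
            problems.append(f"file not found: {path}")
        elif path.stat().st_size > max_bytes:
            problems.append(f"file larger than the upload limit: {path}")
    return problems


# Usage with the dictionary defined above:
# check_upload_files(data_to_upload["main_file"], data_to_upload["auxiliary_files"])
```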

In [None]:
upload = pz_server.upload(**data_to_upload)  

In [None]:
product_id = upload.product_id
product_id

After an upload object is created, you can also add auxiliary files before saving.

In [None]:
upload.add_auxiliary_file("upload_example.txt")

To save your product in the PZ Server, you must provide the column names of your data. For a spec-z catalog, for example:

In [None]:
columns = {
    "<your-RA-column-name>": "RA",
    "<your-Dec-column-name>": "Dec",
    "<your-z-column-name>": "z"
}

upload.make_columns_association(columns)

Now, you can finally save it.

In [None]:
upload.save()

## Update
### How to edit an existing product via Python API

To modify an existing product, first get its product object.

In [None]:
po = pz_server.get_product_object(product_id)

You can see the attributes of this product.

In [None]:
po.attributes

#### Adding an auxiliary file

You can add an auxiliary file and/or description file, given their paths.

In [None]:
# Placeholders: set these two variables to the paths of your own files first.
po.attach_auxiliary_file(path_to_auxiliary_file)
po.attach_description_file(path_to_description_file)

Now, you can check if the uploads were done correctly.

In [None]:
po.get_auxiliary_files()

In [None]:
po.get_description_files()

#### Updating the description

You can also update the product description shown on the PZ Server.

In [None]:
po.update_description("test update description")

#### Deleting a single file of the product

To delete a single file of the product, you must give the file id to the `remove_file` method. Be careful: it is the file id, not the product id.

In [None]:
po.remove_file(file_id)

#### Deleting a full product

To delete the product with all its files (main and auxiliary), you can use the method `delete_product`. **BE CAREFUL! THIS CAN'T BE UNDONE!**

In [None]:
pz_server.delete_product(product_id)

--- 

# User feedback 

Is something important missing? Send your feedback to us or [open an issue in the PZ Server library repository on GitHub](https://github.com/linea-it/pzserver/issues/new).  