<img align="left" src = "images/linea.png" width=140 style="padding: 20px"> 
<img align="left" src = "images/rubin.png" width=180 style="padding: 30px"> 

<font size=5> **Photo-z Server** Tutorial Notebook
 </font>

Contact author: [Julia Gschwend](mailto:julia@linea.org.br) <br>
Contributors: Luigi Silva, Cristiano Singulani <br> 
Last verified run: **2025-Jul-14**

# Introduction

The Photo-z (PZ) Server is an online service for the LSST Community to create, host, and share lightweight PZ-related data products. The PZ Server is developed and maintained by LIneA as part of the in-kind contribution program (BRA-LIN) to the Rubin Observatory. The service is hosted in the Brazilian IDAC, with access restricted to the LSST Community. Access is granted through [Rubin Science Platform (RSP)](https://data.lsst.cloud/) login credentials. For more information about the PZ Server and other contributions related to photometric redshifts, please visit the [BRA-LIN's description page](https://linea-it.github.io/pz-lsst-inkind-doc/). 

The PZ Server has two main user interfaces: the website and the API, accessed via the `pzserver` Python library. 

This notebook contains instructions for new users on how to use the `pzserver` Python library, with examples for all available functions and methods. The documentation on how to use the website is available on [LIneA's Documentation for Users webpage](https://docs.linea.org.br/en/sci-platforms/pz_server.html).

# Getting Started

## Installation

The PZ Server's Python library is available through **pip** as `pzserver`.

```
$ pip install pzserver 
```
OBS 1: Depending on your Jupyter Notebook/Lab version, you might need to restart the kernel to incorporate the new library.

OBS 2: If you are installing it on the RSP Notebook Aspect on top of the LSST kernel, you might get some warnings regarding dependency versions. They should not affect the library's usage. If you have any issues, please contact the [PZ Server team](mailto:pzserver-admin@linea.org.br).

In [None]:
! pip install pzserver 

Imports and Setup

In [None]:
from pzserver import PzServer 

## The PzServer class 

The `PzServer` class object opens the connection with the PZ Server database and allows access to data and metadata. To create a `PzServer` object, users must authenticate with an API token, which can be generated from the menu at the top right corner of the [PZ Server website](https://pzserver.linea.org.br/).

<img src="images/ScreenShotTokenMenu.png" width=150pt align="top"/> <img src="images/ScreenShotTokenGenerator.png" width=350pt />

Paste the API Token replacing the placeholder below: 

In [None]:
pz_server = PzServer(token="<your token here>") 

API tokens can be reused indefinitely. However, an old token automatically expires whenever you create a new one. 

For convenience, the API token can be saved in a text file, e.g., **token.txt** (already listed in the .gitignore file in this repository). 

<font color=red> API tokens MUST NOT BE SHARED! Users are responsible for keeping their tokens private. </font> 

In [None]:
# with open('token.txt', 'r') as file:
#    token = file.read()
# pz_server = PzServer(token=token)
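
A text file often ends with a newline that would become part of the token if read as-is. The sketch below is one possible way to load the token defensively. The helper `load_token` and the environment variable name `PZSERVER_TOKEN` are hypothetical names chosen for this illustration, not part of the `pzserver` library.

```python
import os
import tempfile
from pathlib import Path


def load_token(path="token.txt", env_var="PZSERVER_TOKEN"):
    """Return the API token from an environment variable or a text file.

    `PZSERVER_TOKEN` is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if token is None:
        token = Path(path).read_text()
    token = token.strip()  # drop the trailing newline most editors add
    if not token:
        raise ValueError("The API token is empty.")
    return token


# Demonstration with a throwaway file holding a fake token.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_file = Path(tmp_dir) / "token.txt"
    demo_file.write_text("fake-token-123\n")
    demo_token = load_token(demo_file, env_var="PZSERVER_TOKEN_DEMO_UNSET")

print(repr(demo_token))
```

With a real token file in place, the result could then be passed on as `PzServer(token=load_token())`.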

--- 
# Basic methods

## Query general info

The object `pz_server` created above provides access to data and metadata stored in the PZ Server. It also offers additional methods for users to navigate through the available content. The methods with the prefix `get_` return the result of a query on the PZ Server database as a Python dictionary and are best suited for programmatic use (see details on the [API documentation page](https://linea-it.github.io/pzserver/html/index.html)). Alternatively, those with the prefix `display_` show the results as styled [_Pandas DataFrames_](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), optimized for Jupyter Notebook (note: column names might change in the display version). 

For instance:

display the list of supported product types with a short description (for a complete explanation of the product types and their upload requirements, please see the [product types section](https://docs.linea.org.br/en/sci-platforms/pz_server.html#data-product-types) on the PZ Server's documentation page), 

In [None]:
pz_server.display_product_types()

display the list of data releases available at the time, 

In [None]:
pz_server.display_releases()

and display all available data products. 

<font color='green'>WARNING: This list can rapidly grow during the survey's operation (cell output scrolling recommended)</font>

In [None]:
pz_server.display_products_list() 

The information about product type, users, and releases shown above can be used to filter the data products of interest for your search. For that, the method `display_products_list` accepts a `filters` argument: a dictionary mapping the product's attributes to their values. 

In [None]:
pz_server.display_products_list(filters={"release": "DP0.2", 
                                         "product_type": "Training Set"})

Filtering also works with a string pattern that matches only part of the value.

In [None]:
pz_server.display_products_list(filters={"product_type": "estimates"})
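
To make the partial-match behavior concrete, the sketch below mimics it locally on a few made-up records. It is only an illustration of the idea, not the server's implementation: the helper `matches`, the sample records, and the case-insensitive comparison are all assumptions of this sketch.

```python
# Made-up records; the real product list comes from the PZ Server.
sample_products = [
    {"id": 1, "product_type": "Training Set", "release": "DP0.2"},
    {"id": 2, "product_type": "Redshift Estimates", "release": "DP0.2"},
    {"id": 3, "product_type": "Training Results", "release": None},
]


def matches(record, filters):
    """True if every filter value is a substring of the record's value.

    Case-insensitive matching is an assumption of this sketch.
    """
    for key, pattern in filters.items():
        value = record.get(key)
        if value is None or pattern.lower() not in str(value).lower():
            return False
    return True


selected = [p["id"] for p in sample_products if matches(p, {"product_type": "estimates"})]
print(selected)
```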

To fetch the results of a search and assign them to a variable, just replace the prefix `display_` with `get_`:  

In [None]:
search_results = pz_server.get_products_list(filters={"product_type": "training results"}) 
search_results

## Display metadata

<font size=4>**product_id** and **internal_name**</font>

Every data product stored on the PZ Server is identified by its unique **product_id** number or its **internal_name**. The latter is created automatically at upload time by concatenating the **product_id** with the name given by the owner, after replacing blank spaces with "_", converting to lowercase, and removing special characters (e.g., `30_simple_training_set`). 
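
The naming rule can be approximated in a few lines of Python. This is a rough sketch based only on the description above: the helper `make_internal_name` is hypothetical, and the exact set of characters the server removes is an assumption.

```python
import re


def make_internal_name(product_id, name):
    """Approximate the internal_name rule described above (hypothetical helper)."""
    cleaned = name.strip().lower().replace(" ", "_")
    # Keep only letters, digits and underscores; the server's exact
    # character rules are not documented here, so this is an assumption.
    cleaned = re.sub(r"[^a-z0-9_]", "", cleaned)
    return f"{product_id}_{cleaned}"


print(make_internal_name(30, "Simple Training Set"))
```

When in doubt, the value stored on the server is the authoritative one and can be read from the product's metadata.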

<font size=4>Display the metadata of a data product</font>

The metadata of a given data product is all the information available about it, including what the user provided on the upload form. 

The `PzServer` method `get_product_metadata()` returns a dictionary with the attributes stored in the PZ Server about a given data product identified by its **id** or **internal_name**. For use in a Jupyter notebook, the equivalent `display_product_metadata()` shows the results in a formatted table.

In [None]:
product_id = 30
pz_server.display_product_metadata(product_id)

## Download 

To download any data product stored in the PZ Server, use the method `download_product`, passing the **product_id** or **internal_name** and the path where the file will be saved (the default is the current folder). This method downloads a compressed .zip file, which contains all the files uploaded by the user, including data, auxiliary files, and description files. Let's try it with a small data product. 

In [None]:
pz_server.download_product(product_id, save_in=".")
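
Once the .zip file is on disk, its contents can be unpacked with Python's standard `zipfile` module. The sketch below uses a stand-in archive created on the fly, because the actual file name and contents depend on the product. The helper `extract_product` and the demo file names are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_product(zip_path, target_dir):
    """Extract a downloaded product archive and return the extracted file names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return sorted(archive.namelist())


# Demonstration with a stand-in archive, since the real file name and
# contents depend on the product that was downloaded.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = Path(tmp_dir) / "demo_product.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("catalog.csv", "RA,DEC,Z\n10.0,-45.0,0.3\n")
        archive.writestr("description.html", "<p>demo</p>")
    extracted = extract_product(demo_zip, Path(tmp_dir) / "demo_product")

print(extracted)
```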

## Load 

Instead of downloading the files, the `pzserver` library also allows users to retrieve the contents of a given data product and work with them in memory using the method `get_product()`. This feature is available only for tabular data, such as redshift catalogs and training sets.

By default, the method `get_product` returns an `astropy.table.Table`, which can be easily converted into a `pandas.DataFrame`. Let's see an example with the data product mentioned above: 

In [None]:
data = pz_server.get_product(30).to_pandas()
data

In [None]:
data.describe()

## Share 

All data products uploaded to the PZ Server are immediately available and visible to all PZ Server users (people with RSP credentials) through the PZ Server website or Python library. One way to share a data product is by providing the product's URL, which leads to the product's download page. The URL is composed of the PZ Server website address + **/product/** + **internal_name**.

For example, for the data downloaded above:

In [None]:
internal_name = pz_server.get_product_metadata(product_id)['internal_name']
url = f'https://pzserver.linea.org.br/product/{internal_name}'
url

---
# Advanced methods

## Upload 


The default method to upload a data product to the PZ Server is the upload form on the website. Alternatively, the `pzserver` Python library can send data products to the host service. 

The first step is to prepare a dictionary with the relevant information about your data product. For example:  

In [None]:
data_to_upload = {
    "name":"example upload via lib",
    "product_type": "redshift_catalog",  # Product type 
    "release": None, # LSST release, use None if not LSST data 
    "main_file": "./examples/upload_example.csv", # full path 
    "auxiliary_files": ["./examples/upload_example.html", "./examples/upload_example.ipynb"] # full path
    #"auxiliary_files": [] # you must give an empty list if you don't have any auxiliary_files
}
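
Before sending anything to the server, it can be worth checking that every path in the dictionary points to an existing file. The following is a minimal sketch of such a check. The helper `missing_files` is hypothetical, and the demonstration uses temporary files rather than the example files of this repository.

```python
import tempfile
from pathlib import Path


def missing_files(upload_info):
    """Return the paths listed in the upload dictionary that do not exist."""
    paths = [upload_info["main_file"], *upload_info.get("auxiliary_files", [])]
    return [p for p in paths if not Path(p).is_file()]


# Demonstration with temporary files standing in for a real product.
with tempfile.TemporaryDirectory() as tmp_dir:
    main_file = Path(tmp_dir) / "catalog.csv"
    main_file.write_text("RA,DEC,Z\n")
    demo_upload = {
        "name": "demo",
        "product_type": "redshift_catalog",
        "release": None,
        "main_file": str(main_file),
        "auxiliary_files": [str(Path(tmp_dir) / "notes.html")],  # never created
    }
    missing = missing_files(demo_upload)

print(len(missing))
```

An empty list returned by `missing_files(data_to_upload)` would indicate that all listed files were found.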

Then, call the `upload` method of the `pz_server` object, passing the product details by unpacking the dictionary as keyword arguments. 

In [None]:
upload = pz_server.upload(**data_to_upload)  
upload.product_id

The upload is not done yet! The step above just starts the process. 
For **Reference Redshift Catalogs** and **Training Sets**, users must provide the column name association. 

```python 
columns = {
    "<your-RA-column-name>": "RA",
    "<your-Dec-column-name>": "Dec",
    "<your-z-column-name>": "z"
}
```


For instance: 

In [None]:
columns = {
    "RA": "RA",
    "DEC": "Dec",
    "Z": "z"
}

upload.make_columns_association(columns)
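
A mistyped column name is easy to miss, so it may help to check the chosen names against the file header when building the dictionary. The sketch below does this with the standard `csv` module on an in-memory example. The helper `build_columns_association` is hypothetical and assumes a comma-separated file with a header row.

```python
import csv
import io


def build_columns_association(csv_file, ra, dec, z):
    """Map user column names to "RA", "Dec" and "z" after checking the header."""
    header = next(csv.reader(csv_file))
    wanted = {ra: "RA", dec: "Dec", z: "z"}
    absent = [name for name in wanted if name not in header]
    if absent:
        raise KeyError(f"Columns not found in the file header: {absent}")
    return wanted


# In-memory CSV standing in for the main file of a product.
demo_csv = io.StringIO("ID,RA,DEC,Z,Z_ERR\n1,10.0,-45.0,0.3,0.01\n")
demo_columns = build_columns_association(demo_csv, ra="RA", dec="DEC", z="Z")
print(demo_columns)
```

For a file on disk, the same function could be called inside `with open(path, newline="") as csv_file:`.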

Now, you can finally save it.

In [None]:
upload.save()

Check out the results of your upload: 

In [None]:
pz_server.display_product_metadata(upload.product_id) 

## Update

To modify an existing product, you first need to create the corresponding product object.

In [None]:
#product_object = pz_server.get_product_object(<product_id>)
product_object = pz_server.get_product_object(upload.product_id)

You can see the attributes of this product.

In [None]:
product_object.attributes

<font size=4>Adding an auxiliary file</font>

You can add an auxiliary file and/or a description file, given their paths. The difference between the two is that the description file is the HTML file displayed on the product details page of the PZ Server website (e.g., an exported notebook).

In [None]:
# product_object.attach_auxiliary_file("<path_to_auxiliary_file>")
# product_object.attach_description_file("<path_to_description_file>")

Now, you can check if the uploads were done correctly.

In [None]:
# product_object.get_auxiliary_files()

In [None]:
# product_object.get_description_files()

<font size=4>Update the description</font>

You can also update the product description displayed on the PZ Server.

In [None]:
product_object.update_description("test update description")

In [None]:
product_object.attributes['description']

In [None]:
pz_server.display_product_metadata(upload.product_id) 

## Delete 

To delete a data product with all its files (main and auxiliary), you can use the method `delete_product`. 

<font color=red> **BE CAREFUL! THIS CAN'T BE UNDONE!** </font>

In [None]:
#pz_server.delete_product(<product_id>)
pz_server.delete_product(upload.product_id)
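
Since deletion is irreversible, scripts may benefit from an explicit confirmation step. The sketch below shows one possible guard. The helper `delete_if_confirmed` is hypothetical, and a plain list stands in for `pz_server.delete_product` so that the example runs offline.

```python
def delete_if_confirmed(delete_fn, product_id, typed_id):
    """Call delete_fn(product_id) only when typed_id repeats the product id."""
    if str(typed_id).strip() != str(product_id):
        return False
    delete_fn(product_id)
    return True


# Stand-in for pz_server.delete_product, so the sketch runs offline.
deleted = []
first_try = delete_if_confirmed(deleted.append, 123, typed_id="132")
second_try = delete_if_confirmed(deleted.append, 123, typed_id="123")
print(first_try, second_try, deleted)
```

In a notebook, `typed_id` could come from `input()` and `delete_fn` could be `pz_server.delete_product`.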

# PZ Server Pipelines 


In addition to PZ-related data hosting and curation services, the PZ Server also provides tools to help users prepare training data for PZ algorithms. The pipeline *Training Set Maker* uses the data partitioning method [HATS](https://hats.readthedocs.io/en/stable/) and the Python framework [LSDB](https://docs.lsdb.io/en/stable/) (both developed by [LINCC](https://lsstdiscoveryalliance.org/programs/lincc/)) as the cross-matching back-end engine, coupled with a user interface on the PZ Server website connected to IDAC-Brazil's high-performance computing infrastructure. With *Training Set Maker*, users can create training sets by matching objects from a given reference redshift catalog available on the server with objects from an LSST Object catalog. The reference catalog can be previously uploaded by the user or created as a combination of multiple redshift catalogs by the other pipeline, the so-called *Combine Redshift Catalogs*. 

<img src="./images/tsm.png" width="600" style="display: block; margin: auto;" />

Both pipelines are executed as asynchronous processes triggered from the PZ Server website or directly from Python scripts using the `pzserver` library, and the outputs are automatically registered as new data products. 


### Combine Redshift Catalogs 

<font color=red> Pipeline under development. </font>


The release will be announced on the Community Forum soon!  

### Training Set Maker 

<font color=red> Pipeline under development. </font>

The release will be announced on the Community Forum soon!  

--- 

# User feedback 

Did you find a bug? Is something important missing? 

Send your feedback to us [via email](mailto:pzserver-admin@linea.org.br) or feel free to [open an issue](https://github.com/linea-it/pzserver/issues/new) in the PZ Server library repository on GitHub.  