
<img align="left" src = https://www.linea.org.br/wp-content/themes/LIneA/imagens/logo-header.png width=100 style="padding: 20px"> 
<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=160 style="padding: 20px"> 

## Training Set Maker - Demo Notebook
**Contact**: [Julia Gschwend](mailto:julia@linea.gov.br) <br> 
**Last verified run**: yyyy-mm-dd <br>



Training Set Maker (TSM) is a Python package that supports the creation of Training (and Validation/Test) Sets for catalog-level photo-z algorithms by combining spec-z catalogs (or true-z catalogs, for simulations) provided by the [PZ Server](https://github.com/linea-it/pz-server) with photometric data from the LSST Object catalogs.

#### Setup

In [None]:
from pz_server import PzServer
from specz_sample import SpeczSample
from train_valid import TrainValid  # needed in Section 2

In [None]:
pz_server = PzServer()

#### Get basic info from PZ Server

In [None]:
pz_server.list_product_types

In [None]:
pz_server.list_users

In [None]:
pz_server.list_releases

## 1. Spec-z Catalog 
#### 1.1 List Spec-z Catalogs available on PZ Server

In [None]:
pz_server.list_products_available(user="PZ Coord. Group", product_type="specz_catalog", release="all")

#### 1.2 Get metadata of a list of Spec-z Catalogs 

In [None]:
metadata_specz_1 = pz_server.get_product_metadata(product_id="0001")
metadata_specz_1  # markdown table

In [None]:
metadata_specz_2 = pz_server.get_product_metadata(product_id="0002")
metadata_specz_2  # markdown table

In [None]:
metadata_specz_3 = pz_server.get_product_metadata(product_id="0003")
metadata_specz_3  # markdown table

#### 1.3 Create a single Spec-z Sample from a single Spec-z Catalog

In [None]:
specz_sample_one = SpeczSample(catalogs=["0001"])

In [None]:
specz_sample_one.metadata # metadata

In [None]:
specz_sample_one.data # dataframe

Apply filters to the data

In [None]:
# to do: pandas query

In [None]:
# to do: record the filters applied in the readme file
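The filtering cells above are still placeholders. Below is a minimal sketch of one way to fill them with `pandas.DataFrame.query`. It assumes the sample's dataframe has columns named `z`, `z_flag` and `z_err` (hypothetical names; the actual columns of `specz_sample_one.data` may differ) and that a higher `z_flag` means a more secure redshift. Keeping the expression in a string makes it easy to record the applied filters in the readme, as the second to-do suggests.

```python
import pandas as pd

# Toy stand-in for specz_sample_one.data; column names are hypothetical.
data = pd.DataFrame({
    "z": [0.12, 0.55, 1.80, 3.20, 0.03],
    "z_flag": [4, 3, 2, 4, 4],
    "z_err": [0.001, 0.002, 0.050, 0.004, 0.001],
})

# Secure redshifts in 0.01 < z < 3 with small errors (illustrative cuts only).
filter_expr = "z_flag >= 3 and 0.01 < z < 3.0 and z_err < 0.01"
filtered = data.query(filter_expr)

print(f"{len(filtered)} of {len(data)} rows kept; filters: {filter_expr}")
```

With the real sample the call would be `specz_sample_one.data.query(filter_expr)`. Whether `SpeczSample` accepts a filtered dataframe back before `save_file()` is not documented here.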

Save the filtered spec-z sample to file

In [None]:
specz_sample_one.save_file()  # saves 2 files (data and readme), ready to upload to the PZ Server

#### 1.4 Create a single Spec-z Sample from a list of Spec-z Catalogs

In [None]:
specz_sample_all = SpeczSample(catalogs=["0001", "0002", "0003"], resolve_multiple="keep all")


In [None]:
specz_sample_best = SpeczSample(catalogs=["0001", "0002", "0003"], resolve_multiple="best")  # prefer the best quality flag, then the smallest redshift error


In [None]:
specz_sample_newest = SpeczSample(catalogs=["0001", "0002", "0003"], resolve_multiple="newest")  # prefer the most recent survey
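How `SpeczSample` resolves objects measured by more than one survey is not described here. The sketch below only illustrates the three strategies named in the comments above, under stated assumptions. It assumes the catalogs were already cross-matched so that duplicate measurements of one object share a hypothetical `group_id`, that a higher `z_flag` means a more reliable redshift, and that each row carries a hypothetical `survey_year`. The `resolve_multiple` function is hypothetical and is not the TSM implementation.

```python
import pandas as pd


def resolve_multiple(df: pd.DataFrame, how: str = "keep all") -> pd.DataFrame:
    """Hypothetical helper: keep one row per group_id according to `how`."""
    if how == "keep all":
        return df.copy()
    if how == "best":
        # Highest quality flag first; ties broken by the smallest redshift error.
        order = df.sort_values(["z_flag", "z_err"], ascending=[False, True])
    elif how == "newest":
        # Most recent survey first.
        order = df.sort_values("survey_year", ascending=False)
    else:
        raise ValueError(f"unknown strategy: {how!r}")
    # Keep the preferred (first) row of each group, then restore the row order.
    return order.drop_duplicates(subset="group_id", keep="first").sort_index()


# Toy combined catalog: objects 1 and 3 were observed by two surveys each.
combined = pd.DataFrame({
    "group_id":    [1, 1, 2, 3, 3],
    "survey":      ["A", "B", "A", "B", "C"],
    "survey_year": [2005, 2015, 2005, 2020, 2015],
    "z":           [0.501, 0.503, 1.10, 0.25, 0.26],
    "z_flag":      [4, 4, 3, 3, 4],
    "z_err":       [0.001, 0.002, 0.010, 0.003, 0.004],
})

best = resolve_multiple(combined, "best")
newest = resolve_multiple(combined, "newest")
print(best[["group_id", "survey"]])
print(newest[["group_id", "survey"]])
```

The two strategies can disagree. For object 3, "best" keeps survey C (higher flag) while "newest" keeps survey B (more recent).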

In [None]:
specz_sample_best.metadata # combined metadata (markdown table)

In [None]:
specz_sample_best.data # dataframe

Apply filters to the data

In [None]:
# to do: pandas query

Save the filtered spec-z sample to file

In [None]:
specz_sample_best.save_file(file_format="parquet")  # saves 2 files (data and JSON readme), ready to upload to the PZ Server
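`save_file` is described as writing a data file plus a readme, in JSON for the Parquet case. Below is a minimal sketch of that pattern using pandas and the standard `json` module. It assumes hypothetical file names and readme keys (`source_catalogs`, `resolve_multiple`, `filters`), and the hypothetical helper `save_with_readme` is not the TSM implementation. Writing Parquet with pandas requires `pyarrow` or `fastparquet` to be installed.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def save_with_readme(df, basename, readme, file_format="csv", outdir="."):
    """Hypothetical helper: write `df` plus a JSON readme; return both paths."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    if file_format == "parquet":
        data_path = outdir / f"{basename}.parquet"
        df.to_parquet(data_path, index=False)  # requires pyarrow or fastparquet
    elif file_format == "csv":
        data_path = outdir / f"{basename}.csv"
        df.to_csv(data_path, index=False)
    else:
        raise ValueError(f"unsupported file_format: {file_format!r}")
    # Record provenance (source catalogs, strategy, filters) next to the data.
    content = {**readme, "data_file": data_path.name, "n_rows": len(df)}
    readme_path = outdir / f"{basename}_readme.json"
    readme_path.write_text(json.dumps(content, indent=2))
    return data_path, readme_path


sample = pd.DataFrame({"z": [0.12, 0.55], "z_flag": [4, 3]})
data_path, readme_path = save_with_readme(
    sample,
    "specz_sample_best",
    {"source_catalogs": ["0001", "0002", "0003"],
     "resolve_multiple": "best",
     "filters": "z_flag >= 3"},
    outdir=tempfile.mkdtemp(),
)
print(readme_path.read_text())
```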

Mini QA of combined Spec-z Sample
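This QA step has no cell yet. Below is a possible starting point, sketched under assumptions. It assumes the combined dataframe has a redshift column `z` and a `survey` column (hypothetical names). The hypothetical `mini_qa` helper reports invalid (NaN or infinite) and negative redshifts, the median redshift, a coarse histogram on illustrative bin edges, and the number of objects per survey.

```python
import numpy as np
import pandas as pd


def mini_qa(df, z_col="z", bins=(0.0, 0.5, 1.0, 2.0, 4.0)):
    """Hypothetical helper: quick sanity checks on a spec-z dataframe."""
    z = df[z_col].to_numpy(dtype=float)
    z_valid = z[np.isfinite(z)]  # drop NaN/inf before computing statistics
    counts, edges = np.histogram(z_valid, bins=bins)
    return {
        "n_rows": len(df),
        "n_invalid_z": int((~np.isfinite(z)).sum()),
        "n_negative_z": int((z_valid < 0).sum()),
        "z_median": float(np.median(z_valid)),
        "z_bin_edges": edges.tolist(),
        "z_hist_counts": counts.tolist(),  # values outside the edges are not counted
        # Per-survey counts only if a (hypothetical) "survey" column exists.
        "n_per_survey": df["survey"].value_counts().to_dict() if "survey" in df else {},
    }


combined = pd.DataFrame({
    "z": [0.12, 0.55, 1.10, 0.25, np.nan, -0.01],
    "survey": ["A", "B", "A", "B", "C", "C"],
})
qa = mini_qa(combined)
for key, value in qa.items():
    print(f"{key}: {value}")
```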

## 2. Training and Validation Sets

The Training and Validation Sets are built by combining a Spec-z Sample (either a Spec-z Sample object as defined above, or a Spec-z Catalog data product retrieved from the PZ Server) with photometric data retrieved through the API Aspect of the Rubin Science Platform (RSP).
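How TSM queries the RSP and associates spec-z sources with LSST objects is not specified here. A common approach is a nearest-neighbour positional cross-match within a fixed radius. The sketch below is a brute-force NumPy version. It assumes right ascension and declination in degrees and a 1 arcsec radius, and the helpers `angular_sep_deg` and `crossmatch` are hypothetical. For large catalogs a tree-based matcher, such as astropy's `SkyCoord.match_to_catalog_sky`, would be the usual choice.

```python
import numpy as np


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))


def crossmatch(ra_s, dec_s, ra_p, dec_p, radius_arcsec=1.0):
    """Nearest photometric neighbour of each spec-z source (brute force).

    Returns (index, separation_arcsec); index is -1 when the nearest object
    lies farther than radius_arcsec. Memory grows as N_specz * N_phot.
    """
    ra_s, dec_s = np.asarray(ra_s, float), np.asarray(dec_s, float)
    ra_p, dec_p = np.asarray(ra_p, float), np.asarray(dec_p, float)
    sep = angular_sep_deg(ra_s[:, None], dec_s[:, None], ra_p[None, :], dec_p[None, :])
    nearest = sep.argmin(axis=1)
    sep_arcsec = sep[np.arange(len(ra_s)), nearest] * 3600.0
    return np.where(sep_arcsec <= radius_arcsec, nearest, -1), sep_arcsec


# Toy positions (degrees): two close pairs and one isolated spec-z source.
ra_s, dec_s = [150.0, 150.1, 10.0], [2.0, 2.1, -30.0]
ra_p = [150.1, 150.0 + 0.5 / 3600, 200.0]
dec_p = [2.1 + 0.3 / 3600, 2.0, 0.0]
idx, sep_arcsec = crossmatch(ra_s, dec_s, ra_p, dec_p)
print(idx, sep_arcsec[:2])
```

The matched indices would then be used to attach the requested `observables` columns to each spec-z row. Unmatched sources (index -1) would be dropped or flagged.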

(to do: include the case of 3 subsets (train/valid/test))
#### 2.1 Create Training and Validation Sets from a Spec-z Sample object

In [None]:
split = "random"
train_fraction = 0.7  # use 1.0 or 0.0 to create only a training set or only a validation set
observables = ["mag_u", "mag_g", "mag_r", "mag_i", "mag_z", "mag_y",
               "magerr_u", "magerr_g", "magerr_r", "magerr_i", "magerr_z", "magerr_y"]  # any list of columns from the LSST Object catalog

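`split="random"` with `train_fraction=0.7` suggests a random partition of the matched sample. Below is a minimal sketch of such a split with NumPy, extended with an optional test fraction to cover the three-subset to-do noted at the start of this section. The `random_split` function and its `test_fraction` and `seed` parameters are hypothetical and are not part of TSM.

```python
import numpy as np
import pandas as pd


def random_split(df, train_fraction=0.7, test_fraction=0.0, seed=42):
    """Hypothetical helper: shuffle rows once, then cut into train/valid/test.

    train_fraction=1.0 (or 0.0) puts every row in the training (or validation) set.
    """
    if not 0.0 <= train_fraction + test_fraction <= 1.0:
        raise ValueError("train_fraction + test_fraction must be in [0, 1]")
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    idx = rng.permutation(len(df))
    n_train = int(round(train_fraction * len(df)))
    n_test = int(round(test_fraction * len(df)))
    train = df.iloc[idx[:n_train]]
    test = df.iloc[idx[n_train:n_train + n_test]]
    valid = df.iloc[idx[n_train + n_test:]]  # validation gets the remainder
    return train, valid, test


sample = pd.DataFrame({"z": np.linspace(0.0, 2.0, 100)})
train, valid, test = random_split(sample, train_fraction=0.7, test_fraction=0.15)
print(len(train), len(valid), len(test))
```

Because the three subsets are contiguous slices of one permutation, they are disjoint and together cover every row.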
In [None]:
train_valid_local_specz = TrainValid(specz_sample_best, split=split, train_fraction=train_fraction, observables=observables)      

In [None]:
train_valid_local_specz.train_metadata # markdown table

In [None]:
train_valid_local_specz.train_data # dataframe

In [None]:
train_valid_local_specz.valid_metadata # markdown table

In [None]:
train_valid_local_specz.valid_data  # dataframe

Mini QA of Training and Validation Sets
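One quick check is whether the random split preserved the redshift distribution. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, the maximum distance between the two empirical CDFs, using NumPy only. If SciPy is available, `scipy.stats.ks_2samp` returns the same statistic together with a p-value. The redshift arrays are synthetic stand-ins for the redshift column of `train_data` and `valid_data`, whose actual column name is not documented here.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a - ECDF_b| over the pooled values."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    pooled = np.concatenate([a, b])
    # Fraction of each sample <= every pooled value (right-continuous ECDFs).
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Synthetic stand-ins for the training and validation redshifts.
rng = np.random.default_rng(0)
z_train = rng.normal(0.8, 0.3, 700)
z_valid = rng.normal(0.8, 0.3, 300)
print(ks_statistic(z_train, z_valid))  # small when both sets share a distribution
```

The same comparison can be repeated per observable (for example each `mag_*` column) to check that the split did not bias the photometry.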

Save files

In [None]:
train_valid_local_specz.save_file(file_format="parquet")  # saves 4 files (2 data files and 2 JSON readmes), ready to upload to the PZ Server

#### 2.2 Create Training and Validation Sets from a Spec-z Catalog data product in PZ Server

In [None]:
train_valid_remote_specz = TrainValid("0001", split=split, train_fraction=train_fraction, observables=observables)      

In [None]:
train_valid_remote_specz.train_metadata

In [None]:
train_valid_remote_specz.train_data # dataframe

In [None]:
train_valid_remote_specz.valid_metadata

In [None]:
train_valid_remote_specz.valid_data  # dataframe

Mini QA of Training and Validation Sets