
<img align="left" src = https://linea.org.br/wp-content/themes/LIneA/imagens/logo-header.jpg width=100 style="padding: 20px"> 

<img align="left" src = https://project.lsst.org/sites/default/files/Rubin-O-Logo_0.png width=160 style="padding: 20px">  

# First Attempt at Running RAIL on DP0.2

**Contact author**: Heloisa da Silva Mengisztki ([heloisasmengisztki@gmail.com](mailto:heloisasmengisztki@gmail.com)) 

**Last verified run**: 2022-12-01 (YYYY-MM-DD) <br><br><br>

This notebook is a first attempt at running RAIL's BPZ estimator (`rail_bpz`) on DP0.2 data.

### IMPORTS

In [None]:
import time
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import rail
import qp
import tables_io

from rail.core.utils import RAILDIR
from lsst.rsp import get_tap_service

from rail.core.data import TableHandle
from rail.core.stage import RailStage
from rail.core.utilStages import ColumnMapper, TableConverter
from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite
from rail.evaluation.evaluator import Evaluator

%matplotlib inline 

In [None]:
service = get_tap_service()

assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

## General Configs

In [None]:
pd.set_option('display.max_rows', 20)

In [None]:
CURR_DIR = os.getcwd()
CURR_DIR, RAILDIR

## Reading DP0.2 data

In this step we read 1,000 galaxies (the `max_rec` limit), which we then use to try running BPZ through RAIL. The center coordinates and search radius are taken from the TAP tutorial notebook on the Rubin Science Platform.

In [None]:
max_rec = 1000
use_center_coords = "62, -37"
use_radius = "1.0"

In [None]:
bands = ['g', 'i', 'r', 'u', 'y', 'z']

mags = ""
for band in bands:
    mags+= f"scisql_nanojanskyToAbMag({band}_cModelFlux) AS mag_{band}_cModel, {band}_cModelFluxErr, "

columns_query = f"objectId, {mags}coord_ra, coord_dec "

In [None]:
query = "SELECT " + columns_query + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 "
print(query)
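
The string concatenation above can also be wrapped in a small function, which makes it easier to change the sky region or the magnitude cuts later. This is only a sketch: `build_object_query` and its parameters are hypothetical names introduced here, and the table, columns, and cuts are copied from the query above.

```python
def build_object_query(bands, center="62, -37", radius="1.0",
                       mag_min=17.0, mag_max=23.0):
    """Assemble the ADQL cone-search query used in this notebook.

    Hypothetical helper: it only rebuilds the same query string as above.
    """
    # One AB magnitude column and one flux-error column per band.
    mag_cols = ", ".join(
        f"scisql_nanojanskyToAbMag({b}_cModelFlux) AS mag_{b}_cModel, "
        f"{b}_cModelFluxErr"
        for b in bands
    )
    return (
        f"SELECT objectId, {mag_cols}, coord_ra, coord_dec "
        "FROM dp02_dc2_catalogs.Object "
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
        f"CIRCLE('ICRS', {center}, {radius})) = 1 "
        "AND detect_isPrimary = 1 "
        "AND r_extendedness = 1 "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) > {mag_min} "
        f"AND scisql_nanojanskyToAbMag(r_cModelFlux) < {mag_max}"
    )


example_query = build_object_query(['u', 'g', 'r', 'i', 'z', 'y'])
print(example_query)
```

The returned string can be passed to `service.search(...)` in the same way as `query`.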

In [None]:
%%time
results = service.search(query, maxrec=max_rec)
print(type(results))
results = results.to_table()
print(type(results))
results_pd = results.to_pandas()
results_pd.info(memory_usage="deep")

In [None]:
results_pd.head()

---

##  RAIL BPZ

In [None]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

In [None]:
columns_remmap = {
"objectId": "id",
"coord_ra": "coord_ra",
"coord_dec": "coord_dec",
"mag_g_cModel": "mag_g_lsst",
"g_cModelFluxErr": "mag_err_g_lsst",
"mag_i_cModel": "mag_i_lsst",
"i_cModelFluxErr": "mag_err_i_lsst",
"mag_r_cModel": "mag_r_lsst",
"r_cModelFluxErr": "mag_err_r_lsst",
"mag_u_cModel": "mag_u_lsst",
"u_cModelFluxErr": "mag_err_u_lsst",
"mag_y_cModel": "mag_y_lsst",
"y_cModelFluxErr": "mag_err_y_lsst",
"mag_z_cModel": "mag_z_lsst",
"z_cModelFluxErr": "mag_err_z_lsst",
"detect_isPrimary": "detect_isPrimary"
}

col_remapper_train = ColumnMapper.make_stage(name='col_remapper_train', columns=columns_remmap)
table_conv_train = TableConverter.make_stage(name='table_conv_train', output_format='numpyDict')

results_remmaped = col_remapper_train(results_pd)
# A redshift column is required; real values will have to come from other surveys.
# Here it is filled with a placeholder value of 1.
results_remmaped.data["redshift"] = 1

train_data = table_conv_train(results_remmaped)
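
One caveat about the mapping above: the `*_cModelFluxErr` columns are renamed to `mag_err_*_lsst` without being converted, so they appear to still be flux uncertainties (in nJy, assuming the DP0.2 Object schema) rather than magnitude uncertainties. A minimal sketch of the standard first-order conversion, $\sigma_m = \frac{2.5}{\ln 10}\,\frac{\sigma_F}{F}$, is shown below. The function name `flux_err_to_mag_err` is hypothetical, and the flux is recovered from the AB magnitude using the 31.4 zero point for nJy.

```python
import numpy as np

AB_ZEROPOINT_NJY = 31.4  # AB magnitude of a 1 nJy source


def flux_err_to_mag_err(mag, flux_err_njy):
    """Convert a flux uncertainty in nJy to an AB magnitude uncertainty.

    Hypothetical helper using first-order error propagation:
    sigma_m = (2.5 / ln 10) * sigma_F / F
    """
    mag = np.asarray(mag, dtype=float)
    flux_err_njy = np.asarray(flux_err_njy, dtype=float)
    # Recover the flux in nJy from the AB magnitude.
    flux_njy = 10.0 ** ((AB_ZEROPOINT_NJY - mag) / 2.5)
    return (2.5 / np.log(10.0)) * flux_err_njy / flux_njy


# A 21.4 mag source has a flux of 1e4 nJy; an error of 100 nJy is S/N = 100.
example_mag_err = flux_err_to_mag_err([21.4], [100.0])
print(example_mag_err)
```

Applied to the query result, this would be called once per band, for example on `mag_g_cModel` and `g_cModelFluxErr`, before the columns are renamed.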

Note that we set the redshift column to 1 for every object and did not split the data into a training set and a validation/test set. The main objective here is to explore BPZ with the DP0.2 sample. This sample is meant to be as close as possible to the data that will be collected once the telescope starts operating. (DP0.2 belongs to Data Preview 0, an early testing phase; data from the operating survey are expected to be published as Data Releases.)
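
For reference, once a sample with real redshifts is available, a random split into training and test sets could look like the sketch below. The function name `split_indices`, the 30% test fraction, and the seed are assumptions chosen for illustration and are not part of this notebook's workflow.

```python
import numpy as np


def split_indices(n_objects, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx) as disjoint random index arrays.

    Hypothetical helper; the fraction and seed are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_objects)
    n_test = int(round(n_objects * test_fraction))
    # The first n_test shuffled indices form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
```

With a pandas table such as `results_pd`, the two subsets would then be selected with `results_pd.iloc[train_idx]` and `results_pd.iloc[test_idx]`.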

In [None]:
type(results_remmaped), type(train_data)

In [None]:
DS

In [None]:
test_table = tables_io.convertObj(train_data.data, tables_io.types.PD_DATAFRAME)
test_table.head()

### PRIORS - Inform

In [None]:
columns_file = os.path.join(CURR_DIR, '../configs/bpz.columns')
inform_bpz = Inform_BPZ_lite.make_stage(
    name='inform_bpzlite', 
    input="inprogress_output_table_conv_train.hdf5", 
    model='trained_BPZ_output.pkl',  # writes the trained model file for later use
    hdf5_groupname='', 
    columns_file=columns_file
)

In [None]:
%%time
returned = inform_bpz.inform(train_data)

___

## Posterior -> Estimate


In [None]:
estimate_bpz = BPZ_lite.make_stage(
    name='estimate_bpz', 
    hdf5_groupname='', 
    columns_file=columns_file, 
    model=inform_bpz.get_handle('model'))

In [None]:
bpz_estimated = estimate_bpz.estimate(train_data)

In [None]:
#help(bpz_estimated())
bpz_estimated().build_tables()

results_tables = tables_io.convertObj(bpz_estimated().build_tables()['ancil'], tables_io.types.PD_DATAFRAME)
results_tables

In [None]:
test_data_orig = results_remmaped.data

evaluator = Evaluator.make_stage(name='bpz_eval', truth=test_data_orig)
result_dict = evaluator.evaluate(bpz_estimated, test_data_orig)

In [None]:
results_tables = tables_io.convertObj(result_dict.data, tables_io.types.PD_DATAFRAME)
results_tables.head()

___
## Result: photo-z vs. spec-z

In [None]:
zmode = bpz_estimated().ancil['zmode']

In [None]:
plt.figure(figsize=(8,8))
plt.scatter(train_data()['redshift'],zmode,s=1,c='k',label='simple bpz mode')
plt.plot([0,3],[0,3],'r--');
plt.xlabel("true redshift")
plt.ylabel("bpz photo-z")
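
Besides the scatter plot, a photo-z point estimate is often summarized with a few numbers computed from the normalized residual $\Delta z = (z_{phot} - z_{true}) / (1 + z_{true})$. The sketch below uses commonly adopted definitions (median bias, normalized median absolute deviation, and outlier fraction). The function name `point_metrics` and the 0.15 outlier cut are assumptions, and these are not necessarily the quantities computed by the RAIL `Evaluator`. The values only become meaningful once real redshifts replace the placeholder.

```python
import numpy as np


def point_metrics(z_true, z_phot, outlier_cut=0.15):
    """Summarize photo-z point estimates (hypothetical helper).

    Uses dz = (z_phot - z_true) / (1 + z_true).
    """
    z_true = np.asarray(z_true, dtype=float)
    z_phot = np.asarray(z_phot, dtype=float)
    dz = (z_phot - z_true) / (1.0 + z_true)
    bias = np.median(dz)
    # Normalized median absolute deviation, a robust scatter estimate.
    sigma_nmad = 1.4826 * np.median(np.abs(dz - bias))
    # Fraction of objects whose |dz| exceeds the chosen cut.
    outlier_frac = np.mean(np.abs(dz) > outlier_cut)
    return {"bias": bias, "sigma_nmad": sigma_nmad, "outlier_frac": outlier_frac}


# Toy values for illustration only, not results from this notebook.
example_metrics = point_metrics([0.5, 1.0, 1.5, 2.0], [0.5, 1.1, 1.5, 2.9])
print(example_metrics)
```

In this notebook the call would take `train_data()['redshift']` and `zmode` as inputs.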

## Conclusion

The results make sense. BPZ needs either a training set with redshift values or an already trained model file to calibrate the algorithm, and the DP0.2 sample queried here provides neither a training set nor a redshift column. Because we set the redshift to 1 for every object, the plot shows exactly that value on the x axis, while the photo-z values on the y axis are not close to 1. In addition, the evaluation stage could not compute all of its values. These results are consistent with the configuration and the input.

We therefore conclude this step having run a small DP0.2 sample through RAIL, and having learned that we will need a training set with redshift values and a separate test set for the algorithm to estimate.