# RAIL - Fundamentals

RAIL is a LSST-DESC software created to process different algorithms used to calculate photometric redshift. Its main goal is to minimize impact that different infrastructures can cause on different algorithms, for that it unifyes in a modular code supporting different inputs that different algorithms needs and padronizing the output so that it can be a more fair comparison between their results.

Rail uses 4 principal libraries in its core: <br>
_tables_io_: for data manipulation as hdf5 files, fits, etc. <br>
_qp_: used to paremitrize data PDFs for metrics calculation. <br>
_ceci_: construct pipelines, produces a .yaml within the steps and configurations as threads. <br>
_pzflow_: creates a flow for data creation. <br>

#### Core.
Where the main functions are going to manage the data and files that the program creates. It works based in the behavioral chain of resposability pattern (https://refactoring.guru/pt-br/design-patterns/chain-of-responsibility), where you create a flux in the code, where there is a request related/processed by a class handler that decides to pass it foward or not according to what is defined. So for that, what bpz does is create a class request (eg: Inform_BPZ_lite) that has all the inputs/configurations and is handled by its class handler (BPZ_lite).

#### Creation.
Contain all the support for data creation, as degradors, data flow creation, Column remapping, etc. It creates .hdf5 files with the data that is being manipulated.

#### Estimation.
This is where the codes are defined and executed.  <br>
inform: this is where the PRIORS for template fitting are informed and the machine learning codes are trained. <br>
estimate: where the algorith is executed though the .evaluate() function.
The code is wrapped as a RAIL stage so that it can be run in a controlled way. Estimation code can be stored in a yaml file to be run as a ceci module.


#### Evaluation.
This step contais the metrics for performance of the estimated codes.
<br>
------
For installation instructions check the official documentation: https://lsstdescrail.readthedocs.io/en/latest/source/installation.html <br>

Its important to point out that as Rail is still being developed it may be necessary to do a update (onde in a while) to you rail package once its installed. <br> 
First you must update the cloned rail repository: _git pull origin_ <br>
Then, run: `pip install pz-rail-bpz --upgrade`
`pip install pz-rail --upgrade`

For Rail versions check: https://github.com/LSSTDESC/RAIL/releases

 ipykernel with conda install ipykernel
 python -m ipykernel install –user –name [nametocallnewkernel] 

## Imports, setup and some sample data

In [1]:
import os
import numpy as np
import pandas as pd
import qp
import tables_io
import matplotlib.pyplot as plt

import rail
from rail.core.utils import RAILDIR
from rail.core.data import TableHandle, PqHandle, ModelHandle
from rail.core.stage import RailStage
from rail.core.utilStages import ColumnMapper, TableConverter

from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite
from rail.evaluation.evaluator import Evaluator

from rail.estimation.algos.knnpz import Inform_KNearNeighPDF

In [2]:
CURR_DIR = os.getcwd()
CURR_DIR, RAILDIR

('/home/heloisamengisztki/WORK/ic-photoz/Fase2-RAIL',
 '/home/heloisamengisztki/.local/lib/python3.10/site-packages')

### Reading some sample

In [3]:
data_columns = ["coadd_objects_id","ra","dec","mag_g","magerr_g","mag_i","magerr_i","mag_r","magerr_r","mag_u","magerr_u","mag_y","magerr_y","mag_z","magerr_z","z_true"]

file_path = '/home/heloisamengisztki/DATA/dp0_train_random.csv'
full_data = pd.read_csv(file_path, usecols=data_columns)
full_data.head()

Unnamed: 0,coadd_objects_id,z_true,ra,dec,mag_u,mag_g,mag_r,mag_i,mag_z,mag_y,magerr_u,magerr_g,magerr_r,magerr_i,magerr_z,magerr_y
0,18599476134425521,2.84238,60.4467,-34.056,26.1816,25.7714,25.629,25.9107,25.6477,,0.3817,0.0941,0.1011,0.2273,0.54,-0.7377
1,13542134963533657,2.888735,59.2224,-43.1165,26.4664,27.0861,27.1896,28.7258,24.9601,26.0376,0.5607,0.3413,0.3672,3.1551,0.2299,1.3945
2,18617081205359130,1.29035,67.6464,-33.5759,26.7561,27.2174,26.8622,25.6075,25.2444,24.469,0.5764,0.3407,0.2659,0.1358,0.2957,0.3227
3,17724148914627425,2.44262,65.1607,-34.4085,26.7917,26.0648,25.7113,26.3745,,26.209,0.7288,0.1233,0.1153,0.3547,,1.7678
4,14373666401847353,1.463598,73.0255,-40.2059,23.7023,23.5788,23.6343,23.4418,23.3789,22.8774,0.0423,0.0139,0.0144,0.0246,0.0525,0.0651


#### Spliting into train and test data

In [4]:
size = len(full_data)//2

train_sample = full_data.sample(n=size,ignore_index=True)
test_sample = full_data.drop(train_sample.index)

In [5]:
train_sample.head()

Unnamed: 0,coadd_objects_id,z_true,ra,dec,mag_u,mag_g,mag_r,mag_i,mag_z,mag_y,magerr_u,magerr_g,magerr_r,magerr_i,magerr_z,magerr_y
0,17716027131470567,0.551877,60.2358,-35.1148,25.0702,25.5462,25.0705,24.6702,24.1798,23.8673,0.1392,0.0699,0.0584,0.0673,0.1492,0.1731
1,14368710009579209,1.050643,72.2014,-40.7827,,26.5723,26.638,26.0706,25.8855,,,0.147,0.1887,0.1978,0.4264,-19.2522
2,21335477676159992,2.597062,56.8487,-29.4104,27.0969,26.0461,25.9731,26.8355,26.1577,24.9263,0.9958,0.1216,0.1219,0.5306,0.7492,0.492
3,18626027622267088,0.99629,70.7486,-32.8375,26.3635,25.9679,26.0404,25.5409,26.6184,24.8711,0.3529,0.0824,0.1061,0.1206,0.86,0.3501
4,15151704022495029,1.685075,57.0515,-38.895,26.7042,25.9116,26.2063,25.9087,26.1263,27.9162,0.6256,0.0985,0.1619,0.2252,0.7806,6.9637


In [6]:
test_sample.head()

Unnamed: 0,coadd_objects_id,z_true,ra,dec,mag_u,mag_g,mag_r,mag_i,mag_z,mag_y,magerr_u,magerr_g,magerr_r,magerr_i,magerr_z,magerr_y
40406,14320460346985530,1.083167,50.4691,-41.1147,26.5377,26.7127,26.079,26.0657,25.597,25.3729,0.529,0.2024,0.151,0.2448,0.46,0.6812
40407,19505632629520281,1.228961,62.7543,-31.6018,26.9483,26.6153,26.6607,25.9465,24.8356,24.4789,0.5684,0.1813,0.1866,0.1587,0.1882,0.2849
40408,14355786452989168,1.00551,65.7566,-40.9686,25.4074,26.4859,25.8216,25.7213,25.3755,24.4304,0.232,0.181,0.1192,0.2347,0.49,0.3558
40409,13568411573424664,1.727726,71.6283,-41.7451,26.5434,25.717,25.7152,25.4834,25.6776,25.0296,0.3836,0.0772,0.0948,0.1318,0.4083,0.4969
40410,16840279004839546,0.995374,59.1503,-36.0538,27.0809,26.0884,25.5183,25.5132,24.7519,24.8049,0.9508,0.1255,0.0814,0.1516,0.2752,0.4863


---

##  RAIL 

Rail has a lot of classes and it uses Object Oriented Programming - POO, therefore things can get complicated very fast, but for now we are going to focus on understanging a little bit of the three bases ones: **RailStage, DataStore and DataHandler**

**Image:** This diagram represents some classes and its hierarchy.

![title](RAILclasses.png)


## DataStore

The data store class is the class that is going to store all the data that is being processed associated with a key value. For example for a file containing the sample that we are going to use to test an algorithm named 'test_sampe.hdf5' we add this to the data store naming the key 'test_sample' and a what class (DaataHandler) is it going to use to read it, in this case TableHandler -> HandlerHdf5. 
<br>

Another important thing to know is that the DataStore class acts as a [singleton class](https://refactoring.guru/design-patterns/singleton) wich basically is a class that has only one instance in the aplication. That is important due to the fact that rail keeps all the data and handlers as it runs so that the previous stage can access and read it. Based on that when if we try to create another instace, what its going to do is serve as a DataStore factory, but not the DataStore class itself.  
<br>

We can see access the data storage trough the attribute data_store. By default it does not allow to overwrite the data tha its being stored so if we want to change the value of a key we have to manually set the property allow_overwrite to true.

In [7]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True
DS

DataStore
{}

To help us undestand better we are constantly going to monitor how DataStore stores the data and how the memory goes with that.

In [8]:
import sys

sys.getsizeof(DS), 'bytes'

(80, 'bytes')

We can manually add data to the data store with the add_data or pass a file and store it in the DS with the read_file.

In [9]:
DS.add_data(key="input", data=train_sample, handle_class=PqHandle)
## DS.read_file(key="name", path=file_path, handle_class=Handler)
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
}

Here is how we can access the data, what it is going to do is use the handler that we passed

In [10]:
sys.getsizeof(DS), 'bytes'

(248, 'bytes')

In [11]:
DS.read("input").head()

Unnamed: 0,coadd_objects_id,z_true,ra,dec,mag_u,mag_g,mag_r,mag_i,mag_z,mag_y,magerr_u,magerr_g,magerr_r,magerr_i,magerr_z,magerr_y
0,17716027131470567,0.551877,60.2358,-35.1148,25.0702,25.5462,25.0705,24.6702,24.1798,23.8673,0.1392,0.0699,0.0584,0.0673,0.1492,0.1731
1,14368710009579209,1.050643,72.2014,-40.7827,,26.5723,26.638,26.0706,25.8855,,,0.147,0.1887,0.1978,0.4264,-19.2522
2,21335477676159992,2.597062,56.8487,-29.4104,27.0969,26.0461,25.9731,26.8355,26.1577,24.9263,0.9958,0.1216,0.1219,0.5306,0.7492,0.492
3,18626027622267088,0.99629,70.7486,-32.8375,26.3635,25.9679,26.0404,25.5409,26.6184,24.8711,0.3529,0.0824,0.1061,0.1206,0.86,0.3501
4,15151704022495029,1.685075,57.0515,-38.895,26.7042,25.9116,26.2063,25.9087,26.1263,27.9162,0.6256,0.0985,0.1619,0.2252,0.7806,6.9637


#### Memory x Files

As we can see, as soon that we added the data to the DS the memory increased in 200 bytes. In Rail we can store data as a tableLike object as pandas dataframe, orderDic, etc. but we can also work with the flow in memory. For that there are a bunch of steps/configs that are going differ from bringing or not the data to memory.

## DataHandler

As all the stages herd from RailStages, [Delegate Pattern](https://en.wikipedia.org/wiki/Delegation_pattern), that can be seen in figure of the classe maped above, and RailStages can be seen as a 
[CeciStage](https://github.com/LSSTDESC/ceci/blob/d1d5686aefab18bc53e3d4d8a05af42d19e28a91/ceci/stage.py#L24]), when we declare a stage we use the method _make_stage(**args)_, what id does is to return the object itself as a stage configured with the given parameters. 

To undestand how the returned object works we can use an explanation present in the c# language for [delegate](https://docs.microsoft.com/pt-br/dotnet/csharp/programming-guide/delegates/using-delegates). Basically we can think of delegates as a method that points to an abstract class and a method of that class that is going to execute. Therefore a class that behaves as a method and can be executed. In python this method can be declared as `__call__` and the retuned class can be executed as class(), then this is going to execute the defined methos class. For RailStages it is going to run the algorithm.

In [12]:
##help(ColumnMapper)

In [13]:
columns_remmap = {
"coadd_objects_id": "id",
"ra": "coord_ra",
"dec": "coord_dec",
"mag_g": "mag_g_lsst",
"magerr_g": "mag_err_g_lsst",
"mag_i": "mag_r_lsst",
"magerr_i": "mag_err_r_lsst",
"mag_r": "mag_i_lsst",
"magerr_r": "mag_err_i_lsst",
"mag_u": "mag_u_lsst",
"magerr_u": "mag_err_u_lsst",
"mag_y": "mag_y_lsst",
"magerr_y": "mag_err_y_lsst",
"mag_z": "mag_z_lsst",
"magerr_z": "mag_err_z_lsst",
"z_true": "redshift"
}

col_remapper_train = ColumnMapper.make_stage(name='col_remapper_train', columns=columns_remmap)
print(f"Returned class: {type(col_remapper_train)}")

Returned class: <class 'rail.core.utilStages.ColumnMapper'>


In [14]:
sys.getsizeof(DS), 'bytes'

(248, 'bytes')

we can see the configurations of the returned class with `returned_obj.config.to_dict()`

In [15]:
##col_remapper_train.config.to_dict()

Basically we can call execute it in two ways.
1. When the data is added manually to the DS with `col_remapper_train.run()`
2. Passing the data trough parameter and invoking the method as `col_remapper_train(dataAsTableLike)`

in this case we are going to call the method run

In [16]:
col_remapper_train.run()
print(f"\nRodando em paralelo -> {col_remapper_train.is_parallel()}")
DS

Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train

Rodando em paralelo -> False


DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
}

We can see that it is storing the outputs in the DS before the stage. Lets check the outupt

In [17]:
col_remapper_train.get_data("output").head() 
## or trough DS as 
##DS.read("output_col_remapper_train")
##DS["output_col_remapper_train"].data

#tables_io.convertObj(DS.read("output_estimate_bpz").build_tables()['ancil'], tables_io.types.PD_DATAFRAME)

Unnamed: 0,id,redshift,coord_ra,coord_dec,mag_u_lsst,mag_g_lsst,mag_i_lsst,mag_r_lsst,mag_z_lsst,mag_y_lsst,mag_err_u_lsst,mag_err_g_lsst,mag_err_i_lsst,mag_err_r_lsst,mag_err_z_lsst,mag_err_y_lsst
0,17716027131470567,0.551877,60.2358,-35.1148,25.0702,25.5462,25.0705,24.6702,24.1798,23.8673,0.1392,0.0699,0.0584,0.0673,0.1492,0.1731
1,14368710009579209,1.050643,72.2014,-40.7827,,26.5723,26.638,26.0706,25.8855,,,0.147,0.1887,0.1978,0.4264,-19.2522
2,21335477676159992,2.597062,56.8487,-29.4104,27.0969,26.0461,25.9731,26.8355,26.1577,24.9263,0.9958,0.1216,0.1219,0.5306,0.7492,0.492
3,18626027622267088,0.99629,70.7486,-32.8375,26.3635,25.9679,26.0404,25.5409,26.6184,24.8711,0.3529,0.0824,0.1061,0.1206,0.86,0.3501
4,15151704022495029,1.685075,57.0515,-38.895,26.7042,25.9116,26.2063,25.9087,26.1263,27.9162,0.6256,0.0985,0.1619,0.2252,0.7806,6.9637


In [18]:
sys.getsizeof(DS), 'bytes'

(248, 'bytes')

____
**OBSERVATION**

Passing the input in make_stage does not work
`ColumnMapper.make_stage(name='col_remapper_train_2', columns=columns_remmap, input='test')`

While a did the test of putting the input as name of DS PqHandler does not work, what it does when we call run what it does primary is to call get_data <br>
`data = self.get_data('input', allow_missing=True)` <br>
this search in the DS to a key named input.

To change that would be necessary to call the method <br>
`self.set_data(self.config.input, data)` <br> before and if not set then serach by the key 'input'
___

For the The algorithms, basically they all expect an input as TableHandler.<br>
`inputs = [('input', <class 'rail.core.data.TableHandle'>)]`<br>
as the output of remmapColumns is already a TableHandler we dont need to specify, but if the data is already in the correct form, it may be helpful to use the TableConverter class. 

Eg:

     table_conv_train = TableConverter.make_stage(name='table_conv_train', output_format='numpyDict')
     table_conv_train.run()


and the output is a ModelHandler<br>
`outputs = [('model', <class 'rail.core.data.ModelHandle'>)]`

**Image:** basic flux of inputs and outputs. 

![title](SimpleRailBPZflow.png)


In [19]:
##help(Inform_BPZ_lite)

In [20]:
DS.add_data(key="input", data=col_remapper_train.get_data("output"), handle_class=PqHandle)
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
}

In [21]:
bpz_columns_file = os.path.join(CURR_DIR, 'configs/bpz.columns')

inform_bpz = Inform_BPZ_lite.make_stage(
    name='inform_bpzlite', 
    #input="test_nome",
    model='trained_BPZ_output.pkl', 
    hdf5_groupname='', 
    columns_file=bpz_columns_file,
    prior_band="mag_i_lsst"
)
inform_bpz.config.to_dict()

{'output_mode': 'default',
 'hdf5_groupname': '',
 'save_train': True,
 'zmin': 0.0,
 'zmax': 3.0,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/WORK/ic-photoz/Fase2-RAIL/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'm0': 20.0,
 'nt_array': [1, 2, 3],
 'mmin': 18.0,
 'mmax': 29.0,
 'init_kt': 0.3,
 'init_zo': 0.4,
 'init_alpha': 1.8,
 'init_km': 0.1,
 'prior_band': 'mag_i_lsst',
 'redshift_col': 'redshift',
 'type_file': '',
 'name': 'inform_bpzlite',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'input': 'None',
 'aliases': {'model': 'model_inform_bpzlite'}}

compute the best fit prior parameters

In [22]:
%%time
inform_bpz.run()
## or inform_bpz.inform(data)

using 39635 galaxies in calculation
best values for fo and kt:
[1.]
[0.3]
minimizing for type 0
[0.4 1.8 0.1] 34775.87541654277
[0.42 1.8  0.1 ] 34573.5353956101
[0.4  1.89 0.1 ] 35739.380743853675
[0.4   1.8   0.105] 34490.06720332562
[0.41333333 1.71       0.10333333] 34118.534534460014
[0.42  1.62  0.105] 34241.09578643209
[0.42222222 1.74       0.10555556] 34220.1233000785
[0.4037037  1.7        0.10925926] 34171.33959664292
[0.42617284 1.63333333 0.10709877] 34384.091618178965
[0.41962963 1.675      0.10657407] 34203.85072700894
[0.40222222 1.65       0.10722222] 34120.44550029753
[0.39320988 1.69833333 0.1066358 ] 34075.48941035652
[0.38       1.71       0.10666667] 34098.435030059336
[0.40213992 1.67222222 0.10220165] 34028.59116650099
[0.40135802 1.65833333 0.09867284] 34024.03574875067
[0.40304527 1.72777778 0.09853909] 34312.93669433039
[0.40242798 1.66944444 0.10505144] 34054.32083312278
[0.38466392 1.64074074 0.10357339] 33965.55305664434
[0.37032922 1.60611111 0.10369342] 

In [23]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  model_inform_bpzlite:<class 'rail.core.data.ModelHandle'> inprogress_trained_BPZ_output.pkl, (d)
}

In [24]:
inform_bpz.config.to_dict()

{'output_mode': 'default',
 'hdf5_groupname': '',
 'save_train': True,
 'zmin': 0.0,
 'zmax': 3.0,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/WORK/ic-photoz/Fase2-RAIL/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'm0': 20.0,
 'nt_array': [1, 2, 3],
 'mmin': 18.0,
 'mmax': 29.0,
 'init_kt': 0.3,
 'init_zo': 0.4,
 'init_alpha': 1.8,
 'init_km': 0.1,
 'prior_band': 'mag_i_lsst',
 'redshift_col': 'redshift',
 'type_file': '',
 'name': 'inform_bpzlite',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'input': 'None',
 'aliases': {'model': 'model_inform_bpzlite'}}

___

## STAGES


For posteriors

     inputs = [('model', <class 'rail.core.data.ModelHandle'>)]
     outputs = [('output', <class 'rail.core.data.QPHandle'>)]

adding to memory input variable

In [25]:
DS.add_data(key="input", data=test_sample, handle_class=ModelHandle)
col_remapper_train.run()
DS, DS.read("input")

Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train


(DataStore
 {  input:<class 'rail.core.data.ModelHandle'> None, (d)
   output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
   model_inform_bpzlite:<class 'rail.core.data.ModelHandle'> inprogress_trained_BPZ_output.pkl, (d)
 },
         coadd_objects_id    z_true       ra      dec    mag_u    mag_g  \
 40406  14320460346985530  1.083167  50.4691 -41.1147  26.5377  26.7127   
 40407  19505632629520281  1.228961  62.7543 -31.6018  26.9483  26.6153   
 40408  14355786452989168  1.005510  65.7566 -40.9686  25.4074  26.4859   
 40409  13568411573424664  1.727726  71.6283 -41.7451  26.5434  25.7170   
 40410  16840279004839546  0.995374  59.1503 -36.0538  27.0809  26.0884   
 ...                  ...       ...      ...      ...      ...      ...   
 80807  18608160558314795  1.541024  64.3444 -32.8562  24.7260  25.0259   
 80808  16862801813338393  0.346085  67.1182 -36.9476  25.2311  23.7843   
 80809  15178221150575213  2.434222  68.4211 

In [None]:
DS.add_data(key="input", data=col_remapper_train.get_data("output"), handle_class=PqHandle)
DS, DS.read("input")

In [26]:
aaaaa

NameError: name 'aaaaa' is not defined

In [None]:
table_conv = TableConverter.make_stage(name='table_conv', output_format='numpyDict');

In [None]:
DS.add_data(key="input", data=col_remapper_train.get_data("output"), handle_class=PqHandle)
table_conv.run()
DS, DS.read("input")

In [None]:
DS.add_data(key="input", data=table_conv.get_data("output"), handle_class=PqHandle)
DS, DS.read("input")

In [None]:
DS.add_data(key="input", data=DS["model_inform_bpzlite"].data, handle_class=ModelHandle)
DS, DS.read("input")

In [None]:
estimate_bpz = BPZ_lite.make_stage(
    name='estimate_bpz', 
    hdf5_groupname='', 
    columns_file=bpz_columns_file, 
    #input="inprogress_output_table_conv_train.hdf5", 
    model=inform_bpz.get_handle('model')
)
#estimate_bpz.set_data()
estimate_bpz.config.to_dict()

In [None]:
estimate_bpz.run() ## -> input -> DS
#estimate(data) ## 'input' -> dados

In [None]:
DS

### Evaluator

In [None]:
help(tables_io.types)

In [None]:
DS.read("input")

DS.add_data(key="truth", data=DS["input"].data, handle_class=PqHandle)

In [None]:
DS

In [None]:
table_conv.get_data("output")

In [None]:
##test_data_orig = tables_io.convertObj(table_conv.get_data("output_estimate_bpz"), tables_io.types.NUMPY_DICT)
    
##print(type(table_conv.get_data("output_estimate_bpz")['redshift']))

evaluator = Evaluator8val', truth=DS.read("truth"))

##evaluator.run()
result_dict = evaluator.evaluate(estimate_bpz, test_sample)

In [None]:
help(Evaluator.evaluate)

____

In [None]:
results_tables = tables_io.convertObj(DS.read("output_estimate_bpz").build_tables()['ancil'], tables_io.types.PD_DATAFRAME)
zmode = results_tables['zmode']

In [None]:
plt.figure(figsize=(8,8))
plt.scatter(train_sample['z_true'],zmode,s=1,c='k',label='simple bpz mode')
plt.plot([0,3],[0,3],'r--');
plt.xlabel("true redshift")
plt.ylabel("bpz photo-z")

## CECI pipeline -> undestand this pipeline yaml

In [None]:
import ceci
pipe = ceci.Pipeline.interactive()
stages = [
    # create the test catalog
    #flow_creator_test, lsst_error_model_test, col_remapper_test, table_conv_test,
    col_remapper_train, 
    #table_conv_test,
    # inform the estimators
    inform_bpz,
    # estimate posteriors
    estimate_bpz,
    # evaluator
    #evaluator
]
for stage in stages:
    pipe.add_stage(stage)

In [None]:
help(pipe.initialize)

In [None]:
pipe.initialize(dict(input='inprogress_output_col_remapper_train.pq'), dict(output_dir='.', log_dir='.', resume=False), None)

In [None]:
pipe.save('pipe.yml')

In [None]:
pr = ceci.Pipeline.read('pipe.yml')
pr.run()