# BPZ RAIL - DP0.1

no bringing to memory yet

## Imports

### common libs

In [1]:
import time
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline 

### RAIL

RAIL is a LSST-DESC software created to process different algorithms used to calculate photometric redshift. Its main goal is to minimize impact that different infrastructures can cause on different algorithms, for that it unifyes in a modular code supporting different inputs that different algorithms needs and padronizing the output so that it can be a more fair comparison between their results.

Rail uses 4 principal libraries in its core: <br>
_tables_io_: for data manipulation as hdf5 files, fits, etc. <br>
_qp_: used to paremitrize data PDFs for metrics calculation. <br>
_ceci_: construct pipelines, produces a .yaml within the steps and configurations as threads. <br>
_pzflow_: creates a flow for data creation. <br>

#### Core.
Where the main functions are going to manage the data and files that the program creates. It works based in the behavioral chain of resposability pattern (https://refactoring.guru/pt-br/design-patterns/chain-of-responsibility), where you create a flux in the code, where there is a request related/processed by a class handler that decides to pass it foward or not according to what is defined. So for that, what bpz does is create a class request (eg: Inform_BPZ_lite) that has all the inputs/configurations and is handled by its class handler (BPZ_lite).

#### Creation.
Contain all the support for data creation, as degradors, data flow creation, Column remapping, etc. It creates .hdf5 files with the data that is being manipulated.

#### Estimation.
This is where the codes are defined and executed.  <br>
inform: this is where the PRIORS for template fitting are informed and the machine learning codes are trained. <br>
estimate: where the algorith is executed though the .evaluate() function.
The code is wrapped as a RAIL stage so that it can be run in a controlled way. Estimation code can be stored in a yaml file to be run as a ceci module.


#### Evaluation.
This step contais the metrics for performance of the estimated codes.
<br>
------
For installation instructions check the official documentation: https://lsstdescrail.readthedocs.io/en/latest/source/installation.html <br>
For Rail versions check: https://github.com/LSSTDESC/RAIL/releases

In [2]:
import rail
import qp
import tables_io

from rail.core.data import TableHandle
from rail.core.stage import RailStage
from rail.core.utilStages import ColumnMapper, TableConverter

##from rail.creation.engines.flowEngine import FlowEngine, FlowPosterior

from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite

from rail.evaluation.evaluator import Evaluator

#for rail versions
help(rail)

Help on package rail:

NAME
    rail - RAIL, the Redshift Assesement Infrastructre Layers

PACKAGE CONTENTS
    __main__
    _version
    core (package)
    creation (package)
    estimation (package)
    evaluation (package)
    main
    version

VERSION
    0.96.dev326+ge3e6ed6

FILE
    /home/heloisamengisztki/.local/lib/python3.10/site-packages/rail/__init__.py




### LSST - TAP 

For accessing the data avaliable vis rubin science plataform we are going to use TAP.

TAP is a protocol created to access general table data. 
It uses html and xml to configure and acess the data, wich can be tabular, with key values that are stored in tabbles, one column per keyword, and non tabular such as images, an n-dimensional data. 
Also, it passes as parameters atributes that are configurable, for example, the language and the query that we want trough:

LANG=ADQL<br>
QUERY=< ADQL query string >

    <capability standardID="ivo://ivoa.net/std/TAP"> 
        <!-- BasicAA authentication bundle -->
        <interface xsi:type="urx:Async" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-async</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
        <interface xsi:type="urx:Sync" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-sync</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
     </capability>

By default it returns a TapResult, witch is a wrapper for the Astropy Table that constains some metadata of the schema that is being stored, that can be accessed by some methods as getColumn(), getRecords(), etc.

Its important to remember that TAP is a protocol to access the database where data is being stored, not the database itself.

TAPResults documentation: https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.TAPResults.html <br>
Oficial documentation: https://www.ivoa.net/documents/TAP/ <br>
video 1: https://www.youtube.com/watch?v=hFmhypXg7JA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ <br>
video 2:https://www.youtube.com/watch?v=BX10AI0WgMA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=2 <br>
video 4:https://www.youtube.com/watch?v=szDdL7sqD68&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=3 <br>

In [3]:
from lsst.rsp import get_tap_service

In [4]:
service = get_tap_service()

assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

##### Example of a query

In [5]:
query = "SELECT * FROM tap_schema.schemas"
results = service.search(query)
print(type(results))
results.to_table()

<class 'pyvo.dal.tap.TAPResults'>


description,schema_index,schema_name,utype
str512,int32,str64,str512
Data Preview 0.1 includes five tables based on the DESC's Data Challenge 2 simulation of 300 square degrees of the wide-fast-deep LSST survey region after 5 years. All tables contain objects detected in coadded images.,2,dp01_dc2_catalogs,
"Data Preview 0.2 contains the image and catalog products of the Rubin Science Pipelines v23 processing of the DESC Data Challenge 2 simulation, which covered 300 square degrees of the wide-fast-deep LSST survey region over 5 years.",0,dp02_dc2_catalogs,
ObsCore v1.1 attributes in ObsTAP realization,1,ivoa,
A TAP-standard-mandated schema to describe tablesets in a TAP 1.1 service,100000,tap_schema,
UWS Metadata,120000,uws,


## General Configs

Setting some default number of rows for pandas. So that it doesnt display all of them. 

In [6]:
pd.set_option('display.max_rows', 20)

Defining some variables that will help us with directories. 

In [7]:
CURR_DIR = os.getcwd()
RAIL_DIR = os.path.join(os.path.dirname(rail.__file__), '..')
CURR_DIR, RAIL_DIR

('/home/heloisamengisztki/ic-photoz/Fase 2 - RAIL/bpz_test_rail',
 '/home/heloisamengisztki/.local/lib/python3.10/site-packages/rail/..')

Defining some dictionaries for band names

## Reading DP0.1 csv

In [8]:
interesting_headers = [
"coadd_objects_id",
"ra",
"dec",
"mag_g",
"magerr_g",
"mag_i",
"magerr_i",
"mag_r",
"magerr_r",
"mag_u",
"magerr_u",
"mag_y",
"magerr_y",
"mag_z",
"magerr_z",
"z_true"
]

trainFile = os.path.join(CURR_DIR, 'dp0_train_random.csv')
full_data = pd.read_csv(trainFile)[interesting_headers]

In [9]:
full_data

Unnamed: 0,coadd_objects_id,ra,dec,mag_g,magerr_g,mag_i,magerr_i,mag_r,magerr_r,mag_u,magerr_u,mag_y,magerr_y,mag_z,magerr_z,z_true
0,18599476134425521,60.4467,-34.0560,25.7714,0.0941,25.9107,0.2273,25.6290,0.1011,26.1816,0.3817,,-0.7377,25.6477,0.5400,2.842380
1,13542134963533657,59.2224,-43.1165,27.0861,0.3413,28.7258,3.1551,27.1896,0.3672,26.4664,0.5607,26.0376,1.3945,24.9601,0.2299,2.888735
2,18617081205359130,67.6464,-33.5759,27.2174,0.3407,25.6075,0.1358,26.8622,0.2659,26.7561,0.5764,24.4690,0.3227,25.2444,0.2957,1.290350
3,17724148914627425,65.1607,-34.4085,26.0648,0.1233,26.3745,0.3547,25.7113,0.1153,26.7917,0.7288,26.2090,1.7678,,,2.442620
4,14373666401847353,73.0255,-40.2059,23.5788,0.0139,23.4418,0.0246,23.6343,0.0144,23.7023,0.0423,22.8774,0.0651,23.3789,0.0525,1.463598
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97550,16862393791441100,67.9314,-36.7758,28.7720,1.2271,26.3591,0.2543,27.0489,0.2929,27.7367,1.2957,25.3359,0.6981,26.2891,0.6279,0.710113
97551,13568390098622777,71.5996,-42.7991,25.1746,0.0438,23.9880,0.0309,24.4274,0.0262,25.6405,0.1765,23.8054,0.1595,23.9054,0.0724,0.591477
97552,17737888514977919,69.6233,-34.6987,26.1694,0.1524,25.7747,0.2401,25.7030,0.1107,26.0738,0.4401,24.6318,0.3878,24.7979,0.2691,1.343738
97553,20398831208241442,54.1159,-30.9220,25.4019,0.0625,25.1217,0.1161,25.3886,0.0728,26.1865,0.3406,24.1526,0.2369,25.0237,0.2213,1.418071


In [10]:
size = len(full_data)//2

train = full_data.sample(n=size,random_state=1, ignore_index=True)
test = full_data.drop(train.index)

In [11]:
train

Unnamed: 0,coadd_objects_id,ra,dec,mag_g,magerr_g,mag_i,magerr_i,mag_r,magerr_r,mag_u,magerr_u,mag_y,magerr_y,mag_z,magerr_z,z_true
0,15146876479245061,55.9912,-39.8580,25.3569,0.1272,24.4244,0.1182,24.7027,0.0822,26.3878,0.9465,23.7964,0.3250,23.5548,0.1567,1.110183
1,20402700973780462,56.6973,-29.8747,24.3402,0.1798,23.2175,0.1217,23.5304,0.0819,,-4.6963,22.3643,0.2561,22.6468,0.1895,2.960215
2,15146880774191099,56.1257,-39.6573,26.7927,0.1962,26.6570,0.3607,26.7903,0.2178,,-0.5771,25.1050,0.5014,28.3610,4.9414,2.783982
3,14329252145042598,54.4176,-41.3355,27.0715,0.2502,25.3793,0.0970,26.0541,0.0984,27.2076,0.6585,24.3831,0.2446,25.9385,0.5353,0.879006
4,15155956040115379,59.3510,-39.3628,26.5342,0.1394,25.3955,0.1106,26.1097,0.1083,26.9129,0.6230,24.1330,0.1817,24.7884,0.1876,0.969237
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48772,21352941013181346,63.9166,-28.9264,28.6511,1.0344,26.1371,0.1969,26.8875,0.2119,29.3414,4.9959,24.6868,0.3228,25.5203,0.3708,0.990902
48773,20389493949351383,51.7870,-30.5999,26.5151,0.2009,26.7896,0.5088,26.5411,0.1950,26.6083,0.5653,27.6912,6.5362,25.7321,0.4426,1.228950
48774,13559181688735488,68.5191,-42.8364,24.9395,0.0384,23.5482,0.0247,23.9914,0.0190,26.0086,0.3154,23.2388,0.0939,23.1635,0.0457,0.295909
48775,13542156438370531,59.3062,-42.0096,24.0051,0.0213,22.3903,0.0101,23.2414,0.0113,24.4748,0.0711,21.7622,0.0261,21.8209,0.0159,0.992519


In [12]:
test

Unnamed: 0,coadd_objects_id,ra,dec,mag_g,magerr_g,mag_i,magerr_i,mag_r,magerr_r,mag_u,magerr_u,mag_y,magerr_y,mag_z,magerr_z,z_true
48777,19514982773326478,65.3453,-31.2907,23.9389,0.0214,21.2609,0.0038,22.4342,0.0057,,,20.6881,0.0117,20.8660,0.0075,0.670809
48778,13532948028486378,56.3211,-42.0123,25.6934,0.1192,24.0256,0.0401,24.8676,0.0485,25.9866,0.2969,23.6445,0.1476,23.8769,0.1054,0.761477
48779,17702704142920334,55.1709,-34.6615,25.1734,0.0453,23.5145,0.0232,24.1467,0.0213,26.0631,0.3361,23.2643,0.0934,23.3686,0.0586,0.650587
48780,13559598300559474,67.6053,-42.6779,24.5163,0.0279,21.8810,0.0057,23.0119,0.0077,28.5363,3.0016,21.1603,0.0146,21.5076,0.0100,0.601929
48781,22316242048156479,71.0513,-27.9783,26.5334,0.1679,26.8955,0.6553,27.1343,0.4135,27.0412,0.9234,,-2.3513,31.1752,84.0580,1.262048
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
97550,16862393791441100,67.9314,-36.7758,28.7720,1.2271,26.3591,0.2543,27.0489,0.2929,27.7367,1.2957,25.3359,0.6981,26.2891,0.6279,0.710113
97551,13568390098622777,71.5996,-42.7991,25.1746,0.0438,23.9880,0.0309,24.4274,0.0262,25.6405,0.1765,23.8054,0.1595,23.9054,0.0724,0.591477
97552,17737888514977919,69.6233,-34.6987,26.1694,0.1524,25.7747,0.2401,25.7030,0.1107,26.0738,0.4401,24.6318,0.3878,24.7979,0.2691,1.343738
97553,20398831208241442,54.1159,-30.9220,25.4019,0.0625,25.1217,0.1161,25.3886,0.0728,26.1865,0.3406,24.1526,0.2369,25.0237,0.2213,1.418071


---

##  RAIL BPZ

### Core - Data Storage 

In [13]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

Basically Rail store data in a transient class DataStore, this class associate keys and products in a dictionary, so that when program need some step it has the functions that read, writes, and a data handlers.

A DataHandler basically is a class that act like a handler for some data. What it does is that it associates the data with a file and the tool to read the file. The DataStore stores those handlers and their files associated with a key. So that when the algorithms process they are can propperly read the file content.

In [14]:
columns_remmap = {
"coadd_objects_id": "id",
"ra": "coord_ra",
"dec": "coord_dec",
"mag_g": "mag_g_lsst",
"magerr_g": "mag_err_g_lsst",
"mag_i": "mag_r_lsst",
"magerr_i": "mag_err_r_lsst",
"mag_r": "mag_i_lsst",
"magerr_r": "mag_err_i_lsst",
"mag_u": "mag_u_lsst",
"magerr_u": "mag_err_u_lsst",
"mag_y": "mag_y_lsst",
"magerr_y": "mag_err_y_lsst",
"mag_z": "mag_z_lsst",
"magerr_z": "mag_err_z_lsst",
"z_true": "redshift"
}

col_remapper_train = ColumnMapper.make_stage(name='col_remapper_train', columns=columns_remmap)
table_conv_train = TableConverter.make_stage(name='table_conv_train', output_format='numpyDict')

results_remmaped = col_remapper_train(train)
train_data = table_conv_train(results_remmaped)

Inserting handle into data store.  input: None, col_remapper_train
Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train
Inserting handle into data store.  output_table_conv_train: inprogress_output_table_conv_train.hdf5, table_conv_train


As we can see, ceci stages basically configures the name and some configuration, so that when the stage runs, it return a TableHander, such as a PqHandler, Hdf5Handle or FitsHandle. 

obs: For machine leaning algorithms if may be necessary to configure a flowHandler too.

In [15]:
results_remmaped.data

Unnamed: 0,id,coord_ra,coord_dec,mag_g_lsst,mag_err_g_lsst,mag_r_lsst,mag_err_r_lsst,mag_i_lsst,mag_err_i_lsst,mag_u_lsst,mag_err_u_lsst,mag_y_lsst,mag_err_y_lsst,mag_z_lsst,mag_err_z_lsst,redshift
0,15146876479245061,55.9912,-39.8580,25.3569,0.1272,24.4244,0.1182,24.7027,0.0822,26.3878,0.9465,23.7964,0.3250,23.5548,0.1567,1.110183
1,20402700973780462,56.6973,-29.8747,24.3402,0.1798,23.2175,0.1217,23.5304,0.0819,,-4.6963,22.3643,0.2561,22.6468,0.1895,2.960215
2,15146880774191099,56.1257,-39.6573,26.7927,0.1962,26.6570,0.3607,26.7903,0.2178,,-0.5771,25.1050,0.5014,28.3610,4.9414,2.783982
3,14329252145042598,54.4176,-41.3355,27.0715,0.2502,25.3793,0.0970,26.0541,0.0984,27.2076,0.6585,24.3831,0.2446,25.9385,0.5353,0.879006
4,15155956040115379,59.3510,-39.3628,26.5342,0.1394,25.3955,0.1106,26.1097,0.1083,26.9129,0.6230,24.1330,0.1817,24.7884,0.1876,0.969237
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48772,21352941013181346,63.9166,-28.9264,28.6511,1.0344,26.1371,0.1969,26.8875,0.2119,29.3414,4.9959,24.6868,0.3228,25.5203,0.3708,0.990902
48773,20389493949351383,51.7870,-30.5999,26.5151,0.2009,26.7896,0.5088,26.5411,0.1950,26.6083,0.5653,27.6912,6.5362,25.7321,0.4426,1.228950
48774,13559181688735488,68.5191,-42.8364,24.9395,0.0384,23.5482,0.0247,23.9914,0.0190,26.0086,0.3154,23.2388,0.0939,23.1635,0.0457,0.295909
48775,13542156438370531,59.3062,-42.0096,24.0051,0.0213,22.3903,0.0101,23.2414,0.0113,24.4748,0.0711,21.7622,0.0261,21.8209,0.0159,0.992519


In [16]:
full_sample = tables_io.convertObj(train_data.data, tables_io.types.PD_DATAFRAME)
full_sample

Unnamed: 0,id,coord_ra,coord_dec,mag_g_lsst,mag_err_g_lsst,mag_r_lsst,mag_err_r_lsst,mag_i_lsst,mag_err_i_lsst,mag_u_lsst,mag_err_u_lsst,mag_y_lsst,mag_err_y_lsst,mag_z_lsst,mag_err_z_lsst,redshift
0,15146876479245061,55.9912,-39.8580,25.3569,0.1272,24.4244,0.1182,24.7027,0.0822,26.3878,0.9465,23.7964,0.3250,23.5548,0.1567,1.110183
1,20402700973780462,56.6973,-29.8747,24.3402,0.1798,23.2175,0.1217,23.5304,0.0819,,-4.6963,22.3643,0.2561,22.6468,0.1895,2.960215
2,15146880774191099,56.1257,-39.6573,26.7927,0.1962,26.6570,0.3607,26.7903,0.2178,,-0.5771,25.1050,0.5014,28.3610,4.9414,2.783982
3,14329252145042598,54.4176,-41.3355,27.0715,0.2502,25.3793,0.0970,26.0541,0.0984,27.2076,0.6585,24.3831,0.2446,25.9385,0.5353,0.879006
4,15155956040115379,59.3510,-39.3628,26.5342,0.1394,25.3955,0.1106,26.1097,0.1083,26.9129,0.6230,24.1330,0.1817,24.7884,0.1876,0.969237
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48772,21352941013181346,63.9166,-28.9264,28.6511,1.0344,26.1371,0.1969,26.8875,0.2119,29.3414,4.9959,24.6868,0.3228,25.5203,0.3708,0.990902
48773,20389493949351383,51.7870,-30.5999,26.5151,0.2009,26.7896,0.5088,26.5411,0.1950,26.6083,0.5653,27.6912,6.5362,25.7321,0.4426,1.228950
48774,13559181688735488,68.5191,-42.8364,24.9395,0.0384,23.5482,0.0247,23.9914,0.0190,26.0086,0.3154,23.2388,0.0939,23.1635,0.0457,0.295909
48775,13542156438370531,59.3062,-42.0096,24.0051,0.0213,22.3903,0.0101,23.2414,0.0113,24.4748,0.0711,21.7622,0.0261,21.8209,0.0159,0.992519


Here we should have somewhere a redshift result from other surveys.

### PRIORS - Inform

In [17]:
columns_file = os.path.join(CURR_DIR, 'configs/bpz.columns')
inform_bpz = Inform_BPZ_lite.make_stage(
    name='inform_bpzlite', 
    input="inprogress_output_table_conv_train.hdf5", 
    model='trained_BPZ_output.pkl', ##não precisaria isso pro bpz
    hdf5_groupname='', 
    columns_file=columns_file
)
inform_bpz.config.to_dict()

{'output_mode': 'default',
 'hdf5_groupname': '',
 'save_train': True,
 'zmin': 0.0,
 'zmax': 3.0,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/ic-photoz/Fase 2 - RAIL/bpz_test_rail/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'm0': 20.0,
 'nt_array': [1, 2, 3],
 'mmin': 18.0,
 'mmax': 29.0,
 'init_kt': 0.3,
 'init_zo': 0.4,
 'init_alpha': 1.8,
 'init_km': 0.1,
 'prior_band': 'mag_i_lsst',
 'redshift_col': 'redshift',
 'type_file': '',
 'name': 'inform_bpzlite',
 'input': 'inprogress_output_table_conv_train.hdf5',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'aliases': {'model': 'model_inform_bpzlite'}}

In [18]:
%%time
returned = inform_bpz.inform(train_data)

using 47908 galaxies in calculation
best values for fo and kt:
[1.]
[0.3]
minimizing for type 0
[0.4 1.8 0.1] 41953.05029263551
[0.42 1.8  0.1 ] 41701.94438691718
[0.4  1.89 0.1 ] 43119.474032263
[0.4   1.8   0.105] 41598.858064932094
[0.41333333 1.71       0.10333333] 41148.67092393496
[0.42  1.62  0.105] 41294.13274892067
[0.42222222 1.74       0.10555556] 41265.494427614685
[0.4037037  1.7        0.10925926] 41206.678145131445
[0.42617284 1.63333333 0.10709877] 41462.63851720793
[0.41962963 1.675      0.10657407] 41246.057699269186
[0.40222222 1.65       0.10722222] 41149.12391113318
[0.39320988 1.69833333 0.1066358 ] 41097.48116228586
[0.38       1.71       0.10666667] 41129.0938153824
[0.40213992 1.67222222 0.10220165] 41044.939511296776
[0.40135802 1.65833333 0.09867284] 41044.93449508369
[0.40304527 1.72777778 0.09853909] 41394.17775894709
[0.40242798 1.66944444 0.10505144] 41071.87589253207
[0.38466392 1.64074074 0.10357339] 40971.81566319697
[0.37032922 1.60611111 0.10369342] 

___

## Posterior -> Estimate


In [19]:
results_remmaped = col_remapper_train(test)
test_data = table_conv_train(results_remmaped)

Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train
Inserting handle into data store.  output_table_conv_train: inprogress_output_table_conv_train.hdf5, table_conv_train


In [20]:
estimate_bpz = BPZ_lite.make_stage(
    name='estimate_bpz', 
    hdf5_groupname='', 
    columns_file=columns_file, 
    model=inform_bpz.get_handle('model'))
estimate_bpz.is_parallel()

False

In [None]:
bpz_estimated = estimate_bpz.estimate(test_data)

Process 0 running estimator on chunk 0 - 48777


In [None]:
bpz_estimated().build_tables()

results_tables = tables_io.convertObj(bpz_estimated().build_tables()['ancil'], tables_io.types.PD_DATAFRAME)
results_tables

In [None]:
test_data_orig = test_data.data

evaluator = Evaluator.make_stage(name=f'bpz_eval', truth=test_data_orig)
result_dict = evaluator.evaluate(bpz_estimated, test_data_orig)

In [None]:
results_tables = tables_io.convertObj(result_dict.data, tables_io.types.PD_DATAFRAME)
results_tables.head()

___
## VOU MEXER AINDA - Resultado pz x spec-z

In [None]:
zmode = bpz_estimated().ancil['zmode']

In [None]:
plt.figure(figsize=(8,8))
plt.scatter(test_data()['redshift'],zmode,s=1,c='k',label='simple bpz mode')
plt.plot([0,3],[0,3],'r--');
plt.xlabel("true redshift")
plt.ylabel("bpz photo-z")

### PIPELINES CECI

In [None]:
import ceci
pipe = ceci.Pipeline.interactive()
stages = [flow_engine_train, lsst_error_model_train, inv_redshift,
          line_confusion, quantity_cut, col_remapper_train, table_conv_train,
          flow_engine_test, lsst_error_model_test, col_remapper_test, table_conv_test,  
          inform_knn, inform_fzboost, inform_bpz, estimate_knn, 
          estimate_fzboost, estimate_bpz, point_estimate_test,
          naive_stack_test]
for stage in stages:
    pipe.add_stage(stage)

In [None]:
pipe.initialize(dict(flow=flow_file), dict(output_dir='.', log_dir='.', resume=False), None)
pipe.save('bpz_pipeline.yml')