# BPZ RAIL - DP0.2

no bringing to memory yet

## Imports

### common libs

In [1]:
import time
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline 

### RAIL

RAIL is a LSST-DESC software created to process different algorithms used to calculate photometric redshift. Its main goal is to minimize impact that different infrastructures can cause on different algorithms, for that it unifyes in a modular code supporting different inputs that different algorithms needs and padronizing the output so that it can be a more fair comparison between their results.

Rail uses 4 principal libraries in its core: <br>
_tables_io_: for data manipulation as hdf5 files, fits, etc. <br>
_qp_: used to paremitrize data PDFs for metrics calculation. <br>
_ceci_: construct pipelines, produces a .yaml within the steps and configurations as threads. <br>
_pzflow_: creates a flow for data creation. <br>

#### Core.
Where the main functions are going to manage the data and files that the program creates. It works based in the behavioral chain of resposability pattern (https://refactoring.guru/pt-br/design-patterns/chain-of-responsibility), where you create a flux in the code, where there is a request related/processed by a class handler that decides to pass it foward or not according to what is defined. So for that, what bpz does is create a class request (eg: Inform_BPZ_lite) that has all the inputs/configurations and is handled by its class handler (BPZ_lite).

#### Creation.
Contain all the support for data creation, as degradors, data flow creation, Column remapping, etc. It creates .hdf5 files with the data that is being manipulated.

#### Estimation.
This is where the codes are defined and executed.  <br>
inform: this is where the PRIORS for template fitting are informed and the machine learning codes are trained. <br>
estimate: where the algorith is executed though the .evaluate() function.
The code is wrapped as a RAIL stage so that it can be run in a controlled way. Estimation code can be stored in a yaml file to be run as a ceci module.


#### Evaluation.
This step contais the metrics for performance of the estimated codes.
<br>
------
For installation instructions check the official documentation: https://lsstdescrail.readthedocs.io/en/latest/source/installation.html <br>
For Rail versions check: https://github.com/LSSTDESC/RAIL/releases

In [2]:
import rail
import qp
import tables_io

from rail.core.data import TableHandle
from rail.core.stage import RailStage
from rail.core.utilStages import ColumnMapper, TableConverter

##from rail.creation.engines.flowEngine import FlowEngine, FlowPosterior

from rail.estimation.algos.bpz_lite import Inform_BPZ_lite, BPZ_lite

from rail.evaluation.evaluator import Evaluator

#for rail versions
help(rail)

Help on package rail:

NAME
    rail - RAIL, the Redshift Assesement Infrastructre Layers

PACKAGE CONTENTS
    __main__
    _version
    core (package)
    creation (package)
    estimation (package)
    evaluation (package)
    main
    version

VERSION
    0.96.dev326+ge3e6ed6

FILE
    /home/heloisamengisztki/.local/lib/python3.10/site-packages/rail/__init__.py




### LSST - TAP 

For accessing the data avaliable vis rubin science plataform we are going to use TAP.

TAP is a protocol created to access general table data. 
It uses html and xml to configure and acess the data, wich can be tabular, with key values that are stored in tabbles, one column per keyword, and non tabular such as images, an n-dimensional data. 
Also, it passes as parameters atributes that are configurable, for example, the language and the query that we want trough:

LANG=ADQL<br>
QUERY=< ADQL query string >

    <capability standardID="ivo://ivoa.net/std/TAP"> 
        <!-- BasicAA authentication bundle -->
        <interface xsi:type="urx:Async" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-async</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
        <interface xsi:type="urx:Sync" role="std" version="1.1">
          <accessURL use="base">https://example.net/myTAP/auth-sync</accessURL>
          <securityMethod standardID="ivo://ivoa.net/sso#BasicAA"/>
        </interface>
     </capability>

By default it returns a TapResult, witch is a wrapper for the Astropy Table that constains some metadata of the schema that is being stored, that can be accessed by some methods as getColumn(), getRecords(), etc.

Its important to remember that TAP is a protocol to access the database where data is being stored, not the database itself.

TAPResults documentation: https://pyvo.readthedocs.io/en/latest/api/pyvo.dal.TAPResults.html <br>
Oficial documentation: https://www.ivoa.net/documents/TAP/ <br>
video 1: https://www.youtube.com/watch?v=hFmhypXg7JA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ <br>
video 2:https://www.youtube.com/watch?v=BX10AI0WgMA&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=2 <br>
video 4:https://www.youtube.com/watch?v=szDdL7sqD68&list=PL7kL5D8ITGyXDJYyms0rjzt9o-wDg-rKQ&index=3 <br>

In [3]:
from lsst.rsp import get_tap_service

In [4]:
service = get_tap_service()

assert service is not None
assert service.baseurl == "https://data.lsst.cloud/api/tap"

##### Example of a query

In [5]:
query = "SELECT * FROM tap_schema.schemas"
results = service.search(query)
print(type(results))
results.to_table()

<class 'pyvo.dal.tap.TAPResults'>


description,schema_index,schema_name,utype
str512,int32,str64,str512
Data Preview 0.1 includes five tables based on the DESC's Data Challenge 2 simulation of 300 square degrees of the wide-fast-deep LSST survey region after 5 years. All tables contain objects detected in coadded images.,2,dp01_dc2_catalogs,
"Data Preview 0.2 contains the image and catalog products of the Rubin Science Pipelines v23 processing of the DESC Data Challenge 2 simulation, which covered 300 square degrees of the wide-fast-deep LSST survey region over 5 years.",0,dp02_dc2_catalogs,
ObsCore v1.1 attributes in ObsTAP realization,1,ivoa,
A TAP-standard-mandated schema to describe tablesets in a TAP 1.1 service,100000,tap_schema,
UWS Metadata,120000,uws,


## General Configs

Setting some default number of rows for pandas. So that it doesnt display all of them. 

In [6]:
pd.set_option('display.max_rows', 20)

Defining some variables that will help us with directories. 

In [7]:
CURR_DIR = os.getcwd()
RAIL_DIR = os.path.join(os.path.dirname(rail.__file__), '..')
CURR_DIR, RAIL_DIR

('/home/heloisamengisztki/ic-photoz/RAIL/bpz_test_rail',
 '/home/heloisamengisztki/.local/lib/python3.10/site-packages/rail/..')

## Reading DP0.2 data

the catalog with columns for dp 0.2 data can https://dm.lsst.org/sdm_schemas/browser/dp02.html

In [8]:
max_rec = 1000
use_center_coords = "62, -37"
use_radius = "1.0"

In [9]:
bands = ['g', 'i', 'r', 'u', 'y', 'z']

mags = ""
for band in bands:
    mags+= f"scisql_nanojanskyToAbMag({band}_cModelFlux) AS mag_{band}_cModel, {band}_cModelFluxErr, "

columns_query = f"objectId, {mags}coord_ra, coord_dec "

for this quey there is *detect_isPrimary* wich means that the source has no children, so that is already the final object. (this explanation is not very clear, but ok) and *r_extendedness* that defines if the object is a star or a galaxy, being 1 for galaxies and 0 for point objects such as starts.

In [10]:
query = "SELECT " + columns_query + \
        "FROM dp02_dc2_catalogs.Object " + \
        "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', " + use_center_coords + ", " + use_radius + ")) = 1 " + \
        "AND detect_isPrimary = 1 " + \
        "AND r_extendedness = 1 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 " + \
        "AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 "
print(query)

SELECT objectId, scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, g_cModelFluxErr, scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel, i_cModelFluxErr, scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, r_cModelFluxErr, scisql_nanojanskyToAbMag(u_cModelFlux) AS mag_u_cModel, u_cModelFluxErr, scisql_nanojanskyToAbMag(y_cModelFlux) AS mag_y_cModel, y_cModelFluxErr, scisql_nanojanskyToAbMag(z_cModelFlux) AS mag_z_cModel, z_cModelFluxErr, coord_ra, coord_dec FROM dp02_dc2_catalogs.Object WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', 62, -37, 1.0)) = 1 AND detect_isPrimary = 1 AND r_extendedness = 1 AND scisql_nanojanskyToAbMag(r_cModelFlux) > 17.0 AND scisql_nanojanskyToAbMag(r_cModelFlux) < 23.0 


In [11]:
%%time
results = service.search(query, maxrec=max_rec)
print(type(results))
results = results.to_table()
print(type(results))
results_pd = results.to_pandas()
results_pd.info(memory_usage="deep")

<class 'pyvo.dal.tap.TAPResults'>
<class 'astropy.table.table.Table'>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   objectId         1000 non-null   int64  
 1   mag_g_cModel     1000 non-null   float64
 2   g_cModelFluxErr  1000 non-null   float64
 3   mag_i_cModel     1000 non-null   float64
 4   i_cModelFluxErr  999 non-null    float64
 5   mag_r_cModel     1000 non-null   float64
 6   r_cModelFluxErr  1000 non-null   float64
 7   mag_u_cModel     988 non-null    float64
 8   u_cModelFluxErr  1000 non-null   float64
 9   mag_y_cModel     1000 non-null   float64
 10  y_cModelFluxErr  1000 non-null   float64
 11  mag_z_cModel     999 non-null    float64
 12  z_cModelFluxErr  1000 non-null   float64
 13  coord_ra         1000 non-null   float64
 14  coord_dec        1000 non-null   float64
dtypes: float64(14), int64(1)
memory usage

In [12]:
results_pd.head()

Unnamed: 0,objectId,mag_g_cModel,g_cModelFluxErr,mag_i_cModel,i_cModelFluxErr,mag_r_cModel,r_cModelFluxErr,mag_u_cModel,u_cModelFluxErr,mag_y_cModel,y_cModelFluxErr,mag_z_cModel,z_cModelFluxErr,coord_ra,coord_dec
0,1567798028092359809,21.925642,28.080038,21.595578,59.048393,21.900913,30.657338,21.846707,89.020718,21.006309,306.253558,21.203053,164.601396,61.290253,-37.823108
1,1567929969487672063,21.646285,16.188883,21.353752,27.827194,21.416171,15.749252,22.337348,47.119185,21.40412,159.674967,21.346525,78.693758,61.113473,-37.588913
2,1567929969487672056,23.709952,21.024943,22.374693,47.538013,22.767477,24.407099,25.043426,64.425174,21.931886,235.726572,22.158206,127.810631,61.067928,-37.588971
3,1567929969487672098,22.429075,17.268169,21.79028,35.558223,22.332013,18.185516,22.315114,58.418107,21.368171,188.847028,21.495673,102.538001,61.124363,-37.589575
4,1567929969487672099,23.759419,25.970056,21.459094,59.634345,22.567717,29.900368,24.229074,79.491818,20.71586,299.483143,20.925544,164.407268,61.123258,-37.589019


---

##  RAIL BPZ

### Core - Data Storage 

In [13]:
DS = RailStage.data_store
DS.__class__.allow_overwrite = True

Basically Rail store data in a transient class DataStore, this class associate keys and products in a dictionary, so that when program need some step it has the functions that read, writes, and a data handlers.

A DataHandler basically is a class that act like a handler for some data. What it does is that it associates the data with a file and the tool to read the file. The DataStore stores those handlers and their files associated with a key. So that when the algorithms process they are can propperly read the file content.

In [14]:
DS

DataStore
{}

In [15]:
columns_remmap = {
"objectId": "id",
"coord_ra": "coord_ra",
"coord_dec": "coord_dec",
"mag_g_cModel": "mag_g_lsst",
"g_cModelFluxErr": "mag_err_g_lsst",
"mag_i_cModel": "mag_r_lsst",
"i_cModelFluxErr": "mag_err_r_lsst",
"mag_r_cModel": "mag_i_lsst",
"r_cModelFluxErr": "mag_err_i_lsst",
"mag_u_cModel": "mag_u_lsst",
"u_cModelFluxErr": "mag_err_u_lsst",
"mag_y_cModel": "mag_y_lsst",
"y_cModelFluxErr": "mag_err_y_lsst",
"mag_z_cModel": "mag_z_lsst",
"z_cModelFluxErr": "mag_err_z_lsst",
"detect_isPrimary": "detect_isPrimary"
}

col_remapper_train = ColumnMapper.make_stage(name='col_remapper_train', columns=columns_remmap)
table_conv_train = TableConverter.make_stage(name='table_conv_train', output_format='numpyDict')

results_remmaped = col_remapper_train(results_pd)
## the redshift value is required and it is going to come from other surveys 
results_remmaped.data["redshift"] = 1

train_data = table_conv_train(results_remmaped)

Inserting handle into data store.  input: None, col_remapper_train
Inserting handle into data store.  output_col_remapper_train: inprogress_output_col_remapper_train.pq, col_remapper_train
Inserting handle into data store.  output_table_conv_train: inprogress_output_table_conv_train.hdf5, table_conv_train


As we can see, ceci stages basically configures the name and some configuration, so that when the stage runs, it return a TableHander, such as a PqHandler, Hdf5Handle or FitsHandle. 

obs: For machine leaning algorithms if may be necessary to configure a flowHandler too.

In [16]:
type(results_remmaped), type(train_data)

(rail.core.data.PqHandle, rail.core.data.Hdf5Handle)

In [17]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  output_table_conv_train:<class 'rail.core.data.Hdf5Handle'> inprogress_output_table_conv_train.hdf5, (d)
}

In [18]:
test_table = tables_io.convertObj(train_data.data, tables_io.types.PD_DATAFRAME)
test_table.head()

Unnamed: 0,id,mag_g_lsst,mag_err_g_lsst,mag_r_lsst,mag_err_r_lsst,mag_i_lsst,mag_err_i_lsst,mag_u_lsst,mag_err_u_lsst,mag_y_lsst,mag_err_y_lsst,mag_z_lsst,mag_err_z_lsst,coord_ra,coord_dec,redshift
0,1567798028092359809,21.925642,28.080038,21.595578,59.048393,21.900913,30.657338,21.846707,89.020718,21.006309,306.253558,21.203053,164.601396,61.290253,-37.823108,1
1,1567929969487672063,21.646285,16.188883,21.353752,27.827194,21.416171,15.749252,22.337348,47.119185,21.40412,159.674967,21.346525,78.693758,61.113473,-37.588913,1
2,1567929969487672056,23.709952,21.024943,22.374693,47.538013,22.767477,24.407099,25.043426,64.425174,21.931886,235.726572,22.158206,127.810631,61.067928,-37.588971,1
3,1567929969487672098,22.429075,17.268169,21.79028,35.558223,22.332013,18.185516,22.315114,58.418107,21.368171,188.847028,21.495673,102.538001,61.124363,-37.589575,1
4,1567929969487672099,23.759419,25.970056,21.459094,59.634345,22.567717,29.900368,24.229074,79.491818,20.71586,299.483143,20.925544,164.407268,61.123258,-37.589019,1


Here we should have somewhere a redshift result from other surveys.

### PRIORS - Inform

In [19]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  output_table_conv_train:<class 'rail.core.data.Hdf5Handle'> inprogress_output_table_conv_train.hdf5, (d)
}

observe what is happening with the aliases as we go

In [20]:
columns_file = os.path.join(CURR_DIR, 'configs/bpz.columns')
inform_bpz = Inform_BPZ_lite.make_stage(
    name='inform_bpzlite', 
    input="inprogress_output_table_conv_train.hdf5", 
    model='trained_BPZ_output.pkl', ##não precisaria isso pro bpz
    hdf5_groupname='', 
    columns_file=columns_file
)
inform_bpz.config.to_dict()

{'output_mode': 'default',
 'hdf5_groupname': '',
 'save_train': True,
 'zmin': 0.0,
 'zmax': 3.0,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/ic-photoz/RAIL/bpz_test_rail/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'm0': 20.0,
 'nt_array': [1, 2, 3],
 'mmin': 18.0,
 'mmax': 29.0,
 'init_kt': 0.3,
 'init_zo': 0.4,
 'init_alpha': 1.8,
 'init_km': 0.1,
 'prior_band': 'mag_i_lsst',
 'redshift_col': 'redshift',
 'type_file': '',
 'name': 'inform_bpzlite',
 'input': 'inprogress_output_table_conv_train.hdf5',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'aliases': {'model': 'model_inform_bpzlite'}}

In [21]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  output_table_conv_train:<class 'rail.core.data.Hdf5Handle'> inprogress_output_table_conv_train.hdf5, (d)
}

In [22]:
type(train_data)

rail.core.data.Hdf5Handle

In [23]:
help(inform_bpz.inform)

Help on method inform in module rail.estimation.estimator:

inform(training_data) method of rail.estimation.algos.bpz_lite.Inform_BPZ_lite instance
    The main interface method for Informers
    
    This will attach the input_data to this `Informer`
    (for introspection and provenance tracking).
    
    Then it will call the run() and finalize() methods, which need to
    be implemented by the sub-classes.
    
    The run() method will need to register the model that it creates to this Estimator
    by using `self.add_data('model', model)`.
    
    Finally, this will return a ModelHandle providing access to the trained model.
    
    Parameters
    ----------
    input_data : `dict` or `TableHandle`
        dictionary of all input data, or a `TableHandle` providing access to it
    
    Returns
    -------
    model : ModelHandle
        Handle providing access to trained model



In [24]:
%%time
returned = inform_bpz.inform(train_data)

using 992 galaxies in calculation
best values for fo and kt:
[1.]
[0.3]
minimizing for type 0
[0.4 1.8 0.1] 637.8301424645861
[0.42 1.8  0.1 ] 543.2850005461503
[0.4  1.89 0.1 ] 708.1390126137383
[0.4   1.8   0.105] 626.7556480840133
[0.41333333 1.71       0.10333333] 523.7168783196857
[0.42  1.62  0.105] 474.3048204960172
[0.42666667 1.68       0.10666667] 467.35390508030304
[0.44 1.62 0.11] 424.9761634042651
[0.45333333 1.56       0.105     ] 403.01636486704643
[0.48  1.44  0.105] 403.2777074065231
[0.45555556 1.4        0.11333333] 438.52156436393
[0.47925926 1.43333333 0.11388889] 413.24949892032276
[0.45950617 1.67555556 0.10592593] 387.9379056603095
[0.46148148 1.81333333 0.10222222] 398.93345222406566
[0.48806584 1.49259259 0.10654321] 381.43067022891
[0.51209877 1.42888889 0.10481481] 395.47308645913955
[0.45467764 1.71876543 0.0977572 ] 407.29982429039126
[0.46082305 1.64740741 0.10179012] 386.8015147687354
[0.48559671 1.65037037 0.10450617] 346.7846296880057
[0.5017284  1.695

In [25]:
type(returned)

rail.core.data.ModelHandle

In [26]:
inform_bpz.config.to_dict()

{'output_mode': 'default',
 'hdf5_groupname': '',
 'save_train': True,
 'zmin': 0.0,
 'zmax': 3.0,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/ic-photoz/RAIL/bpz_test_rail/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'm0': 20.0,
 'nt_array': [1, 2, 3],
 'mmin': 18.0,
 'mmax': 29.0,
 'init_kt': 0.3,
 'init_zo': 0.4,
 'init_alpha': 1.8,
 'init_km': 0.1,
 'prior_band': 'mag_i_lsst',
 'redshift_col': 'redshift',
 'type_file': '',
 'name': 'inform_bpzlite',
 'input': 'inprogress_output_table_conv_train.hdf5',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'aliases': {'model': 'model_inform_bpzlite',
  'input': 'output_table_conv_train'}}

In [27]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  output_table_conv_train:<class 'rail.core.data.Hdf5Handle'> inprogress_output_table_conv_train.hdf5, (d)
  model_inform_bpzlite:<class 'rail.core.data.ModelHandle'> trained_BPZ_output.pkl, (wd)
}

___

## Posterior -> Estimate


In [28]:
estimate_bpz = BPZ_lite.make_stage(
    name='estimate_bpz', 
    hdf5_groupname='', 
    columns_file=columns_file, 
    model=inform_bpz.get_handle('model'))
estimate_bpz.is_parallel()

False

In [29]:
help(estimate_bpz.estimate)

Help on method estimate in module rail.estimation.estimator:

estimate(input_data) method of rail.estimation.algos.bpz_lite.BPZ_lite instance
    The main interface method for the photo-z estimation
    
    This will attach the input_data to this `Estimator`
    (for introspection and provenance tracking).
    
    Then it will call the run() and finalize() methods, which need to
    be implemented by the sub-classes.
    
    The run() method will need to register the data that it creates to this Estimator
    by using `self.add_data('output', output_data)`.
    
    Finally, this will return a QPHandle providing access to that output data.
    
    Parameters
    ----------
    input_data : `dict` or `ModelHandle`
        Either a dictionary of all input data or a `ModelHandle` providing access to the same
    
    Returns
    -------
    output: `QPHandle`
        Handle providing access to QP ensemble with output data



In [30]:
estimate_bpz.config.to_dict()

{'output_mode': 'default',
 'chunk_size': 10000,
 'hdf5_groupname': '',
 'zmin': 0.0,
 'zmax': 3.0,
 'dz': 0.01,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/ic-photoz/RAIL/bpz_test_rail/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'madau_flag': 'no',
 'mag_limits': {'mag_u_lsst': 27.79,
  'mag_g_lsst': 29.04,
  'mag_r_lsst': 29.06,
  'mag_i_lsst': 28.62,
  'mag_z_lsst': 27.98,
  'mag_y_lsst': 27.05},
 'no_prior': True,
 'prior_band': 'mag_i_lsst',
 'p_min': 0.005,
 'gauss_kernel': 0.0,
 'zp_errors': [0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
 'mag_err_min': 0.005,
 'name': 'estimate_bpz',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'input': 'None',
 'aliases': {'

In [31]:
bpz_estimated = estimate_bpz.estimate(train_data)

Process 0 running estimator on chunk 0 - 1000
Inserting handle into data store.  output_estimate_bpz: inprogress_output_estimate_bpz.hdf5, estimate_bpz


In [32]:
estimate_bpz.config.to_dict()

{'output_mode': 'default',
 'chunk_size': 10000,
 'hdf5_groupname': '',
 'zmin': 0.0,
 'zmax': 3.0,
 'dz': 0.01,
 'nzbins': 301,
 'band_names': ['mag_u_lsst',
  'mag_g_lsst',
  'mag_r_lsst',
  'mag_i_lsst',
  'mag_z_lsst',
  'mag_y_lsst'],
 'band_err_names': ['mag_err_u_lsst',
  'mag_err_g_lsst',
  'mag_err_r_lsst',
  'mag_err_i_lsst',
  'mag_err_z_lsst',
  'mag_err_y_lsst'],
 'nondetect_val': 99.0,
 'data_path': 'None',
 'columns_file': '/home/heloisamengisztki/ic-photoz/RAIL/bpz_test_rail/configs/bpz.columns',
 'spectra_file': 'SED/CWWSB4.list',
 'madau_flag': 'no',
 'mag_limits': {'mag_u_lsst': 27.79,
  'mag_g_lsst': 29.04,
  'mag_r_lsst': 29.06,
  'mag_i_lsst': 28.62,
  'mag_z_lsst': 27.98,
  'mag_y_lsst': 27.05},
 'no_prior': True,
 'prior_band': 'mag_i_lsst',
 'p_min': 0.005,
 'gauss_kernel': 0.0,
 'zp_errors': [0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
 'mag_err_min': 0.005,
 'name': 'estimate_bpz',
 'model': 'trained_BPZ_output.pkl',
 'config': None,
 'input': 'None',
 'aliases': {'

In [33]:
DS

DataStore
{  input:<class 'rail.core.data.PqHandle'> None, (d)
  output_col_remapper_train:<class 'rail.core.data.PqHandle'> inprogress_output_col_remapper_train.pq, (d)
  output_table_conv_train:<class 'rail.core.data.Hdf5Handle'> inprogress_output_table_conv_train.hdf5, (d)
  model_inform_bpzlite:<class 'rail.core.data.ModelHandle'> trained_BPZ_output.pkl, (wd)
  output_estimate_bpz:<class 'rail.core.data.QPHandle'> output_estimate_bpz.hdf5, (wd)
}

In [34]:
type(bpz_estimated)

rail.core.data.QPHandle

In [35]:
#help(bpz_estimated())
bpz_estimated().build_tables()

results_tables = tables_io.convertObj(bpz_estimated().build_tables()['ancil'], tables_io.types.PD_DATAFRAME)
results_tables

Unnamed: 0,zmode
0,0.00
1,0.00
2,2.65
3,0.00
4,0.00
...,...
995,0.00
996,2.90
997,0.00
998,0.00


In [36]:
test_data_orig = results_remmaped.data

evaluator = Evaluator.make_stage(name=f'bpz_eval', truth=test_data_orig)
result_dict = evaluator.evaluate(bpz_estimated, test_data_orig)

Inserting handle into data store.  truth:                       id  mag_g_lsst  mag_err_g_lsst  mag_r_lsst  \
0    1567798028092359809   21.925642       28.080038   21.595578   
1    1567929969487672063   21.646285       16.188883   21.353752   
2    1567929969487672056   23.709952       21.024943   22.374693   
3    1567929969487672098   22.429075       17.268169   21.790280   
4    1567929969487672099   23.759419       25.970056   21.459094   
..                   ...         ...             ...         ...   
995  1567798028092356925   23.516105       14.758413   22.021177   
996  1567798028092356460   23.679292       14.636202   21.655181   
997  1567798028092356926   22.133826       21.644493   21.639290   
998  1567798028092356927   23.404638       25.453194   22.125147   
999  1567798028092356695   22.611775       17.051374   22.460034   

     mag_err_r_lsst  mag_i_lsst  mag_err_i_lsst  mag_u_lsst  mag_err_u_lsst  \
0         59.048393   21.900913       30.657338   21.846707   

  ad_results = stats.anderson_ksamp([pits_clean, uniform_yvals])


In [37]:
type(result_dict)

rail.core.data.Hdf5Handle

In [39]:
help(evaluator.evaluate)

Help on method evaluate in module rail.evaluation.evaluator:

evaluate(data, truth) method of rail.evaluation.evaluator.Evaluator instance
    Evaluate the performance of an estimator
    
    This will attach the input data and truth to this `Evaluator`
    (for introspection and provenance tracking).
    
    Then it will call the run() and finalize() methods, which need to
    be implemented by the sub-classes.
    
    The run() method will need to register the data that it creates to this Estimator
    by using `self.add_data('output', output_data)`.
    
    Parameters
    ----------
    data : qp.Ensemble
        The sample to evaluate
    truth : Table-like
        Table with the truth information
    
    Returns
    -------
    output : Table-like
        The evaluation metrics



In [40]:
results_tables = tables_io.convertObj(result_dict.data, tables_io.types.PD_DATAFRAME)
results_tables.head()

Unnamed: 0,PIT_KS_stat,PIT_KS_pval,PIT_CvM_stat,PIT_CvM_pval,PIT_OutRate,POINT_SimgaIQR,POINT_Bias,POINT_OutlierRate,POINT_SigmaMAD,CDE_stat,CDE_pval
0,,,,1.0,,0.0,-0.5,1.0,0.0,,


___
## VOU MEXER AINDA - Resultado pz x spec-z

In [41]:
zmode = bpz_estimated().ancil['zmode']

In [42]:
plt.figure(figsize=(8,8))
plt.scatter(train_data()['redshift'],zmode,s=1,c='k',label='simple bpz mode')
plt.plot([0,3],[0,3],'r--');
plt.xlabel("true redshift")
plt.ylabel("bpz photo-z")

NameError: name 'test_data' is not defined

<Figure size 576x576 with 0 Axes>

### PIPELINES CECI

In [None]:
import ceci
pipe = ceci.Pipeline.interactive()
stages = [flow_engine_train, lsst_error_model_train, inv_redshift,
          line_confusion, quantity_cut, col_remapper_train, table_conv_train,
          flow_engine_test, lsst_error_model_test, col_remapper_test, table_conv_test,  
          inform_knn, inform_fzboost, inform_bpz, estimate_knn, 
          estimate_fzboost, estimate_bpz, point_estimate_test,
          naive_stack_test]
for stage in stages:
    pipe.add_stage(stage)

In [None]:
pipe.initialize(dict(flow=flow_file), dict(output_dir='.', log_dir='.', resume=False), None)
pipe.save('bpz_pipeline.yml')