<img align="left" src = ../images/linea.png width=130 style="padding: 20px"> 
    
# Map the configuration parameters for PZ-Compute algorithms

   **Contact**: Heloisa da S. Mengisztki, Julia Gschwend<br>
   **Last verified run**: 2024-aug <br><br><br>

This notebook maps the configuration parameters used to run RAIL and PZ-Compute with the FlexZBoost, BPZ, TPZ and GPZ algorithms.

# RAIL Core - SHARED_PARAMS

Each algorithm has its own configuration parameters, but some parameters are common to all of them. RAIL defines these common parameters in the `rail.core` package and passes them to each stage along with the algorithm-specific ones.

[Shared Params code](https://github.com/LSSTDESC/rail_base/blob/main/src/rail/core/common_params.py)
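
The idea of combining shared and algorithm-specific parameters can be illustrated with plain dictionaries. This is only a minimal sketch: `SHARED_DEFAULTS` and `build_config` are hypothetical names invented here, the values are copied from the SHARED_PARAMS table in this section, and the real RAIL mechanism is more elaborate.

```python
# Hypothetical stand-in for a few shared defaults (values from the SHARED_PARAMS table).
SHARED_DEFAULTS = {
    "zmin": 0.0,
    "zmax": 3.0,
    "nzbins": 301,
    "nondetect_val": 99.0,
    "ref_band": "mag_i_lsst",
    "redshift_col": "redshift",
}


def build_config(shared_keys, specific_defaults, overrides=None):
    """Merge selected shared defaults, algorithm defaults and user overrides.

    Later sources win: overrides > specific_defaults > shared defaults.
    """
    config = {key: SHARED_DEFAULTS[key] for key in shared_keys}
    config.update(specific_defaults)
    config.update(overrides or {})
    return config


# A toy stage that reuses three shared parameters and adds one of its own.
config = build_config(
    shared_keys=["zmin", "zmax", "nzbins"],
    specific_defaults={"qp_representation": "interp"},
    overrides={"zmax": 2.0},
)
print(config)
```

Only the shared keys a stage asks for end up in its configuration, which is why the tables in this notebook list a different subset of SHARED_PARAMS for each algorithm.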

In [None]:
from rail.estimation.estimator import CatEstimator, CatInformer
from rail.core.common_params import SHARED_PARAMS

In [None]:
#help(CatInformer)
#help(CatEstimator)

# estimate(SHARED_PARAMS + [algorithm-specific params])

In [None]:
SHARED_PARAMS

| Params                 | Type  | Default      | Description                                                                   |
|---------------------------|-------|--------------|-----------------------------------------------------|
| hdf5_groupname            | str   | ""           | name of hdf5 group for data, if None, then set to ''                         |
| zmin                      | float | 0.0          | The minimum redshift of the z grid                                           |
| zmax                      | float | 3.0          | The maximum redshift of the z grid                                           |
| nzbins                    | int   | 301          | The number of gridpoints in the z grid                                   |
| dz                        | float | 0.01         | delta z in grid                                                          |
| nondetect_val             | float | 99.0         | value to be replaced with magnitude limit for non detects      |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | Names of columns for magnitude by filter band |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| Names of columns for magnitude errors by filter band |
| mag_limits                | dict  | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| Limiting magnitudes by filter |
| ref_band                  | str   | "mag_i_lsst" | band to use in addition to colors                         |
| redshift_col              | str   | "redshift"   | name of redshift column                                           |
| calculated_point_estimates| array | []           | List of strings defining which point estimates to automatically calculate using `qp.Ensemble`. Options include 'mean', 'mode', 'median'. |
| recompute_point_estimates | bool  | False        | Force recomputation of point estimates |
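
The three grid parameters `zmin`, `zmax` and `nzbins` are related to `dz`: an evenly spaced grid with the default values has a spacing of 0.01. The sketch below builds such a grid in pure Python. `make_zgrid` is a hypothetical helper, and whether a given stage builds its grid from `nzbins` or from `dz` is something to check in that stage's code.

```python
def make_zgrid(zmin=0.0, zmax=3.0, nzbins=301):
    """Return nzbins evenly spaced redshift values from zmin to zmax inclusive."""
    step = (zmax - zmin) / (nzbins - 1)
    return [zmin + i * step for i in range(nzbins)]


zgrid = make_zgrid()
spacing = zgrid[1] - zgrid[0]

# With the defaults, 301 points over [0, 3] give 300 intervals of width 0.01.
print(len(zgrid), round(spacing, 6))
```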

# FlexZBoost

- [fzboost repository](https://github.com/LSSTDESC/rail_flexzboost/blob/main/src/rail/estimation/algos/flexzboost.py)
- [FlexCode](https://github.com/lee-group-cmu/FlexCode)

## Inform Params

| Params                 | Type  | Default      | Description                                |
|------------------|-------|--------|-----------------------------------------------------|
| retrain_full     | bool  | True   | if True, re-run the fit with the full training set, including data set aside for bump/sharpen validation.  If False, only use the subset defined via trainfrac fraction |
| trainfrac        | float | 0.75   | fraction of training data to use for training (rest used for bump thresh and sharpening determination)|
| seed             | int   | 1138   | Random number seed                                  |
| bumpmin          | float | 0.02   | minimum value in grid of thresholds checked to optimize removal of spurious small bumps |
| bumpmax          | float | 0.35   | max value in grid checked for removal of small bumps |
| nbump            | int   | 20     | number of grid points in bumpthresh grid search     |
| sharpmin         | float | 0.7    | min value in grid checked in optimal sharpening parameter fit |
| sharpmax         | float | 2.1    | max value in grid checked in optimal sharpening parameter fit |
| nsharp           | int   | 15     | number of search points in sharpening fit |
| max_basis        | int   | 35     | maximum number of basis functions to use in density estimate |
| basis_system     | str   | 'cosine' | type of basis system to use with flexcode |
| regression_params| dict  | {'max_depth': 8, 'objective': 'reg:squarederror'}| dictionary of options passed to flexcode, includes max_depth (int), and objective, which should be set to reg:squarederror |
| zmin                      | float | 0.0          | SHARED_PARAMS |
| zmax                      | float | 3.0          | SHARED_PARAMS |
| nzbins                    | int   | 301          | SHARED_PARAMS |
| nondetect_val             | float | 99.0         | SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits                | dict  | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| ref_band                  | str   | "mag_i_lsst" | SHARED_PARAMS |
| redshift_col              | str   | "redshift"   | SHARED_PARAMS |
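
The bump and sharpening parameters each describe a one-dimensional search grid: a minimum, a maximum and a number of points. The sketch below shows the candidate values that the defaults would produce, assuming an evenly spaced inclusive grid. The `linspace` helper is written here for illustration and the even spacing is an assumption, not something stated in the table.

```python
def linspace(start, stop, num):
    """Pure-Python inclusive, evenly spaced grid with num points."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Candidate thresholds for removing spurious small bumps (bumpmin, bumpmax, nbump).
bump_grid = linspace(0.02, 0.35, 20)

# Candidate sharpening exponents (sharpmin, sharpmax, nsharp).
sharpen_grid = linspace(0.7, 2.1, 15)

print([round(b, 4) for b in bump_grid[:3]], "...", round(bump_grid[-1], 4))
print([round(s, 2) for s in sharpen_grid])
```

Widening a range or raising `nbump` / `nsharp` makes the search finer at the cost of more evaluations on the validation subset set aside by `trainfrac`.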

## Estimate Params

For FlexZBoost, the inform parameters have a larger effect on the results than the estimate parameters.

| Params                 | Type  | Default      | Description                                |
|---------------------------|-------|--------------|---------------------------------------------|
| qp_representation         | str   | "interp"     | qp generator to use: `interp` or `flexzboost` |
| nzbins                    | int   | 301          | SHARED_PARAMS |
| nondetect_val             | float | 99.0         | SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits                | dict  | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| ref_band                  | str   | "mag_i_lsst" | SHARED_PARAMS |

# BPZ

- [bpz-lite repository](https://github.com/LSSTDESC/rail_bpz/blob/main/src/rail/estimation/algos/bpz_lite.py)
- [Benitez (2000)](https://ui.adsabs.harvard.edu/abs/2000ApJ...536..571B/abstract)
- [Coe et al. (2006)](https://ui.adsabs.harvard.edu/abs/2006AJ....132..926C/abstract)


## Inform Params

| Params                 | Type  | Default      | Description                                |
|------------------|-------|-------------------|-------------------------------------------|
| data_path        | str   | "None"            | file path to the SED, FILTER, and AB directories. If left to default `None` it will use the install directory for rail + rail/examples_data/estimation_data/data |
| columns_file     | str   | "test_bpz.columns"| name of the file specifying the columns |
| spectra_file     | str   | "CWWSB4.list"     | name of the file specifying the list of SEDs to use |
| m0               | float | 20.0              | reference apparent mag used in prior param |
| nt_array         | list  | [1, 2, 3]         | list of integer number of templates per 'broad type'; must be in the same order as the template set and must sum to the number of templates in the spectra file |
| mmin             | float | 18.0              | lowest apparent mag in ref band; lower values are ignored |
| mmax             | float | 29.0              | highest apparent mag in ref band; higher values are ignored |
| init_kt          | float | 0.3               | initial guess for kt in training |
| init_zo          | float | 0.4               | initial guess for z0 in training |
| init_alpha       | float | 1.8               | initial guess for alpha in training |
| init_km          | float | 0.1               | initial guess for km in training |
| type_file        | str   | ""                | name of file with the broad type fits for the training data |
| zmin                      | float | 0.0          | SHARED_PARAMS |
| zmax                      | float | 3.0          | SHARED_PARAMS |
| nzbins                    | int   | 301          | SHARED_PARAMS |
| nondetect_val             | float | 99.0         | SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits                | dict | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| ref_band                  | str   | "mag_i_lsst" | SHARED_PARAMS |
| redshift_col              | str   | "redshift"   | SHARED_PARAMS | 
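
The constraint on `nt_array` is easy to get wrong when changing `spectra_file`, so a small check can help. The sketch below validates that the counts sum to the number of templates and splits the template list into broad types in order. `group_templates` and the template names are hypothetical and only illustrate the rule stated in the table.

```python
def group_templates(nt_array, template_names):
    """Split template_names into consecutive broad types of sizes nt_array."""
    if any(not isinstance(n, int) or n < 1 for n in nt_array):
        raise ValueError("nt_array must contain positive integers")
    if sum(nt_array) != len(template_names):
        raise ValueError(
            f"nt_array sums to {sum(nt_array)} but the spectra file lists "
            f"{len(template_names)} templates"
        )
    groups, start = [], 0
    for n in nt_array:
        groups.append(template_names[start:start + n])
        start += n
    return groups


# Hypothetical template list with six entries, matching the default [1, 2, 3].
templates = ["tmpl_a", "tmpl_b", "tmpl_c", "tmpl_d", "tmpl_e", "tmpl_f"]
groups = group_templates([1, 2, 3], templates)
print(groups)
```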

## Estimate Params

| Params                 | Type  | Default      | Description                                |
|-------------------|-------|--------------|-----------------------------------------------------|                                 
| dz                | float | 0.01         | delta z in grid |
| unobserved_val    | float | -99.0        | value to be replaced with zero flux and given large errors for non-observed filters |
| data_path         | str   | "None"       | file path to the SED, FILTER, and AB directories. If left to default `None` it will use the install directory for rail + ../examples_data/estimation_data/data |
| columns_file      | str   | "test_bpz.columns" | name of the file specifying the columns |
| spectra_file      | str   | "CWWSB4.list"| name of the file specifying the list of SEDs to use |
| madau_flag        | str   | "no"         | set to 'yes' or 'no' to set whether to include intergalactic Madau reddening when constructing model fluxes |
| no_prior          | bool  | False        | set to True if you want to run with no prior |
| p_min             | float | 0.005        | BPZ sets all values of the PDF that are below p_min*peak_value to 0.0, p_min controls that fractional cutoff |
| gauss_kernel      | float | 0.0          | gauss_kernel (float): BPZ convolves the PDF with a kernel if this is set to a non-zero number |
| zp_errors         | list  | [0.01, 0.01, 0.01, 0.01, 0.01, 0.01] | BPZ adds these values in quadrature to the photometric errors |
| mag_err_min       | float | 0.005        | a minimum floor for the magnitude errors to prevent a large chi^2 for very very bright objects |
| zmin              | float | 0.0          | SHARED_PARAMS |
| zmax              | float | 3.0          | SHARED_PARAMS |
| nzbins            | int   | 301          | SHARED_PARAMS |
| nondetect_val     | float | 99.0         | SHARED_PARAMS |
| bands             | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands         | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits        | dict | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| ref_band          | str   | "mag_i_lsst" | SHARED_PARAMS |
| redshift_col      | str   | "redshift"   | SHARED_PARAMS |
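
Three of the estimate parameters describe simple numerical operations: `mag_err_min` floors the errors, `zp_errors` are added in quadrature, and `p_min` zeroes the low tail of the PDF. The sketch below applies them to plain lists. The function names are hypothetical, and the order of the floor and quadrature steps, as well as applying them directly to magnitude errors, are assumptions made for illustration.

```python
import math


def floor_and_inflate_errors(mag_errs, zp_errors, mag_err_min=0.005):
    """Apply a minimum error floor, then add zero-point errors in quadrature."""
    result = []
    for err, zp in zip(mag_errs, zp_errors):
        floored = max(err, mag_err_min)
        result.append(math.sqrt(floored ** 2 + zp ** 2))
    return result


def apply_p_min(pdf, p_min=0.005):
    """Set every PDF value below p_min * peak_value to 0.0."""
    cutoff = p_min * max(pdf)
    return [p if p >= cutoff else 0.0 for p in pdf]


errors = floor_and_inflate_errors([0.001, 0.03], [0.01, 0.01])
clipped = apply_p_min([0.001, 0.5, 1.0, 0.004])
print([round(e, 5) for e in errors])
print(clipped)
```

In the first list, the very small input error is raised to the floor before the zero-point term is added, which keeps extremely bright objects from dominating the chi^2.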


# TPZ

- [tpz repository](https://github.com/LSSTDESC/rail_tpz)
- [Carrasco Kind, M., & Brunner, R. J. (2013)](https://ui.adsabs.harvard.edu/abs/2013MNRAS.432.1483C/abstract)

## Inform Params

| Params                 | Type  | Default      | Description                                |
|---------------------------|-------|---------------|-----------------------------------------------------|
| seed                      | int   | 8758          | random seed |
| use_atts                  | list  | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"] | attributes to use in training trees |
| err_dict                  | dict  | {mag_u_lsst:"mag_err_u_lsst", mag_g_lsst:"mag_err_g_lsst", mag_r_lsst:"mag_err_r_lsst", mag_i_lsst:"mag_err_i_lsst", mag_z_lsst:"mag_err_z_lsst", mag_y_lsst:"mag_err_y_lsst", redshift:None} | dictionary that contains the columns that will be used to predict as the keys and the errors associated with that column as the values. If a column does not have an associated error its value should be `None` |
| nrandom                   | int   | 8             | number of random bootstrap samples of training data to create |
| ntrees                    | int   | 5             | number of trees to create |
| minleaf                   | int   | 5             | minimum number in terminal leaf |
| natt                      | int   | 3             | number of attributes to split for TPZ |
| sigmafactor               | float | 3.0           | Gaussian smoothing with kernel Sigma1*Resolution |
| rmsfactor                 | float | 0.02          | RMS for zconf calculation |
| tree_strategy             | str   | "native"      | which decision tree function to use when constructing the forest; valid choices are 'native' or 'sklearn'. If 'native', use the trees written for TPZ; if 'sklearn', use sklearn's DecisionTreeRegressor |
| zmin                      | float | 0.0           | SHARED_PARAMS |
| zmax                      | float | 3.0           | SHARED_PARAMS |
| nzbins                    | int   | 301           | SHARED_PARAMS |
| nondetect_val             | float | 99.0          | SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits                | dict | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| redshift_col              | str   | "redshift"    | SHARED_PARAMS | 
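
When the band names change, `use_atts` and `err_dict` have to change together. The default `err_dict` shown in the table can be rebuilt from the two band lists, which is a convenient pattern for custom column names. This is a minimal sketch using only the column names from the tables above.

```python
# Column names as listed in the bands / err_bands defaults.
bands = [f"mag_{b}_lsst" for b in "ugrizy"]
err_bands = [f"mag_err_{b}_lsst" for b in "ugrizy"]

# Map each magnitude column to its error column.
err_dict = dict(zip(bands, err_bands))

# The redshift column has no associated error, so its value is None.
err_dict["redshift"] = None

for column, error_column in err_dict.items():
    print(column, "->", error_column)
```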

## Estimate Params

| Params                 | Type  | Default      | Description                                |
|---------------------------|-------|--------------|-----------------------------------------------------|
| test_err_dict             | dict  | def_err_dict | dictionary that contains the columns that will be used to predict as the keys and the errors associated with that column as the values. If a column does not have an associated error its value should be `None` |
| nondetect_val             | float | 99.0         | SHARED_PARAMS |
| mag_limits                | dict  | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05} | SHARED_PARAMS |

# GPZ

- [gpz repository](https://github.com/LSSTDESC/rail_gpz_v1/blob/main/src/rail/estimation/algos/gpz.py)

## Inform Params

| Params                 | Type  | Default      | Description                                |
|---------------------------|-------|---------------|-----------------------------------------------------|  
| trainfrac | float | 0.75 | fraction of training data used to make tree, rest used to set best sigma |
| seed | int | 87 | random seed |
| gpz_method | str | "VC" | method to be used in GPz, options are 'GL', 'VL', 'GD', 'VD', 'GC', and 'VC' |
| n_basis | int | 50 | number of basis functions used |
| learn_jointly | bool | True | if True, jointly learns prior linear mean function |
| hetero_noise | bool | True | if True, learns heteroscedastic noise process, set False for point est. |
| csl_method | str | "normal" | cost sensitive learning type, 'balanced', 'normalized', or 'normal' |
| csl_binwidth | float | 0.1 | width of bin for 'balanced' cost sensitive learning |
| pca_decorrelate | bool | True | if True, decorrelate data using PCA as preprocessing stage |
| max_iter | int | 200 | max number of iterations |
| max_attempt | int | 100 | max iterations if no progress on validation |
| log_errors | bool | True | if true, take log of magnitude errors |
| replace_error_vals | list | [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] | list of values to replace negative and nan mag err values |
| nondetect_val             | float | 99.0          | SHARED_PARAMS |
| mag_limits                | dict | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| redshift_col              | str   | "redshift"    | SHARED_PARAMS |
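
`trainfrac` and `seed` together determine a reproducible split of the training set into a part used for fitting and a held-out part. The sketch below shows one way such a split can be made with the standard library. `split_indices` is hypothetical, and the actual stage may use a different random number generator, so the selected indices will not match.

```python
import random


def split_indices(n_objects, trainfrac=0.75, seed=87):
    """Shuffle object indices reproducibly and split them by trainfrac."""
    rng = random.Random(seed)
    indices = list(range(n_objects))
    rng.shuffle(indices)
    n_train = int(n_objects * trainfrac)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, valid_idx = split_indices(100)
print(len(train_idx), len(valid_idx))
```

Running the function twice with the same `seed` returns the same split, which is what makes results comparable between runs.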


## Estimate Params

| Params                 | Type  | Default      | Description                                |
|---------------------------|-------|--------------|-----------------------------------------------------|                             
| log_errors | bool | True | if true, take log of magnitude errors|
| replace_error_vals | list |  [0.1, 0.1, 0.1, 0.1, 0.1, 0.1] | list of values to replace negative and nan mag err values|
| zmin                      | float | 0.0           | SHARED_PARAMS |
| zmax                      | float | 3.0           | SHARED_PARAMS |
| nzbins                    | int   | 301           | SHARED_PARAMS |
| nondetect_val             | float | 99.0          | SHARED_PARAMS |
| bands                     | array | ["mag_u_lsst", "mag_g_lsst", "mag_r_lsst", "mag_i_lsst", "mag_z_lsst", "mag_y_lsst"]      | SHARED_PARAMS |
| err_bands                 | array | ["mag_err_u_lsst", "mag_err_g_lsst", "mag_err_r_lsst", "mag_err_i_lsst", "mag_err_z_lsst", "mag_err_y_lsst"]| SHARED_PARAMS |
| mag_limits                | dict | {mag_u_lsst:27.79, mag_g_lsst:29.04, mag_r_lsst:29.06, mag_i_lsst:28.62, mag_z_lsst:27.98, mag_y_lsst:27.05}| SHARED_PARAMS |
| ref_band                  | str   | "mag_i_lsst" | SHARED_PARAMS |
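
`replace_error_vals` and `log_errors` describe a two-step cleaning of the magnitude errors: invalid values are replaced band by band, then the logarithm is taken. The sketch below applies both steps to one object. `clean_mag_errors` is a hypothetical helper; the use of the natural logarithm and the replacement of zero-valued errors are assumptions made here so that the log is always defined.

```python
import math


def clean_mag_errors(mag_errs, replace_error_vals, log_errors=True):
    """Replace negative or NaN errors per band, then optionally take the log."""
    cleaned = []
    for err, replacement in zip(mag_errs, replace_error_vals):
        # Zero is also replaced here (assumption) because log(0) is undefined.
        if math.isnan(err) or err <= 0:
            err = replacement
        cleaned.append(math.log(err) if log_errors else err)
    return cleaned


raw_errors = [0.05, -1.0, float("nan")]
defaults = [0.1, 0.1, 0.1]
print(clean_mag_errors(raw_errors, defaults, log_errors=False))
print([round(v, 4) for v in clean_mag_errors(raw_errors, defaults)])
```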