# Tutorial: Data Reduction

## Initial imports

In [15]:
from datetime import datetime
import config, input_tables
from straklip.steps import buildhdf,mktiles,mkphotometry,fow2cells
from straklip.stralog import getLogger
import os
import pkg_resources as pkg


First, we need to initialize the logger here.

In [16]:
if 'SHARED_LOG_FILE' not in os.environ:
    os.environ['SHARED_LOG_FILE'] = f'straklip_{datetime.now().strftime("%Y-%m-%d_%H%M")}.log'

getLogger('straklip', setup=True, logfile=os.environ['SHARED_LOG_FILE'],debu=False,
          configfile=pkg.resource_filename('straklip', './config/logging.yaml'))

<Logger straklip (DEBUG)>
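
The environment-variable pattern used in the cell above lets every process started from the same session write to one log file. As a minimal sketch of that idea using only the standard `logging` module (the helper `make_file_logger` and the logger name `straklip_demo` are hypothetical and are not part of StraKLIP):

```python
import logging
import os
from datetime import datetime

# Reuse the log file name if a parent process already set it; otherwise
# create a timestamped one (same pattern as in the tutorial cell).
os.environ.setdefault(
    "SHARED_LOG_FILE",
    f'straklip_{datetime.now().strftime("%Y-%m-%d_%H%M")}.log',
)


def make_file_logger(name, logfile, debug=False):
    """Hypothetical helper: return a logger that writes to `logfile`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        # delay=True postpones opening the file until the first message
        handler = logging.FileHandler(logfile, delay=True)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s:%(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


demo_logger = make_file_logger("straklip_demo", os.environ["SHARED_LOG_FILE"])
```

Calling the helper a second time with the same name returns the same logger without adding a second handler, so re-running a notebook cell does not duplicate log lines.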

We import the two YAML files that hold all the options the pipeline needs to run.

The data.yaml file specifies the properties of the input catalogs that are needed to assemble the different dataframes for the pipeline.
The pipe.yaml file instead holds all the options that each step needs to perform its tasks.

Both YAML files need to be adjusted to reflect the specific project.

Since in this notebook we explicitly call each task we intend to run, we do not technically need the "flow" variable in pipe.yaml; it is only necessary when running the pipeline with the provided script "skpipe.yaml".

NOTE: the FITS files needed to run this tutorial are not included at this moment, but they can be provided by private communication at any time.

## Loading the pipeline configuration file for the project

Two pipeline configuration files, `pipe.yaml` and `data.yaml`, are stored in the `tutorials/pipeline_logs` directory. A more in-depth explanation of these files is presented here: https://straklip.readthedocs.io/latest/quick_start.html. We start by loading them into the pipeline as follows. Remember that, to manipulate the options for a specific step, we can either change the corresponding entries in `pipe.yaml`, or change them from a line of code like `pipe_cfg.analysis['steps']['extract_candidate']=True` before running the step.
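
As an illustration of overriding an option from code, here is a minimal sketch. It assumes the configured object exposes each section of `pipe.yaml` as an attribute holding nested dictionaries, which is what the one-liner above suggests; the stand-in object `pipe_cfg_demo` and the helper `set_option` are hypothetical and are not part of StraKLIP.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the configured pipe_cfg object: each section of
# pipe.yaml is assumed to be an attribute holding nested dictionaries.
pipe_cfg_demo = SimpleNamespace(
    analysis={"steps": {"extract_candidate": False}},
)


def set_option(cfg, section, keys, value):
    """Set cfg.<section>[k0][k1]... = value, refusing keys that do not exist."""
    node = getattr(cfg, section)
    for key in keys[:-1]:
        node = node[key]
    if keys[-1] not in node:
        raise KeyError(f"unknown option {keys[-1]!r} in section {section!r}")
    node[keys[-1]] = value


# Equivalent to: pipe_cfg_demo.analysis['steps']['extract_candidate'] = True
set_option(pipe_cfg_demo, "analysis", ["steps", "extract_candidate"], True)
```

Going through a helper that rejects unknown keys is one way to catch a misspelled option name, which a plain dictionary assignment would silently add as a new entry.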

In [17]:
pipe_cfg='/Users/gstrampelli/StraKLIP/docs/source/tutorials/work/pipeline_logs/pipe.yaml' #or where these files are
data_cfg='/Users/gstrampelli/StraKLIP/docs/source/tutorials/work/pipeline_logs/data.yaml'
# calls needed to correctly configure pipe_cfg and data_cfg, the configurations the pipeline needs to work
pipe_cfg = config.configure_pipeline(pipe_cfg,pipe_cfg=pipe_cfg,data_cfg=data_cfg,dt_string=datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
data_cfg = config.configure_data(data_cfg,pipe_cfg)
config.make_paths(config=pipe_cfg)

2025-06-19 10:11:18 config                      :INFO     (configure_pipeline:70[pid=55653]) 
StraKLIP pipeline started at date and time: 19/06/2025 10:11:18
Pipe_cfg: /Users/gstrampelli/StraKLIP/docs/source/tutorials/work/pipeline_logs/pipe.yaml
Data_cfg: /Users/gstrampelli/StraKLIP/docs/source/tutorials/work/pipeline_logs/data.yaml

2025-06-19 10:11:18 config                      :INFO     (configure_data:151[pid=55653]) Validation of default labels and data successful!
2025-06-19 10:11:18 config                      :INFO     (make_paths:115[pid=55653]) "/Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/data/fits" exists, and will not be created.
2025-06-19 10:11:18 config                      :INFO     (make_paths:115[pid=55653]) "/Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/database" exists, and will not be created.
2025-06-19 10:11:18 config                      :INFO     (make_paths:112[pid=55653]) Creating "/Users/gstrampelli/PycharmProjects/StraKLIP_tutori

Once pipe_cfg and data_cfg are ready, we can use them to configure the dataset, a class that holds the basic generic information the pipeline uses to build its specific datasets. Since this is the first time we run the pipeline, we set `skip_originals=False` to tell the pipeline to look for the input photometry tables provided by the user instead of existing dataframes generated by the pipeline itself.

In [18]:
dataset = input_tables.Tables(data_cfg, pipe_cfg, skip_originals=False)
DF = config.configure_dataframe(dataset)

2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:424[pid=55653]) before dust, V =  0.0 mag(VEGA)
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:425[pid=55653]) after dust, V = 1.0146 mag(VEGA)
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:444[pid=55653]) Av = 1.0146 mag
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:488[pid=55653]) AV=0 wfc3,uvis2,f814w 0.0 mag(VEGA)
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:489[pid=55653]) AV=1 wfc3,uvis2,f814w 0.6094 mag
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:488[pid=55653]) AV=0 wfc3,uvis2,f850lp 0.0 mag(VEGA)
2025-06-19 10:11:20 straklip.utils.ancillary    :INFO     (get_Av_dict:489[pid=55653]) AV=1 wfc3,uvis2,f850lp 0.4694 mag
2025-06-19 10:11:20 config                      :INFO     (configure_dataframe:270[pid=55653]) Fetching dataframes from /Users/gstrampelli/PycharmProjects/StraKLIP_tutori

## Building the dataframes

Now we run the very first step of the pipeline, "buildhdf", where the pipeline draws the information it needs from the "original" input catalogs provided by the user to assemble its specific dataframes and store them in the DF object. Running this step also automatically saves the generated dataframes to disk in the "out" path specified, using the dataframe extension (either a csv file or an h5) provided in pipe.yaml.

In [19]:
buildhdf.run({'DF': DF, 'dataset': dataset})

2025-06-19 10:11:24 straklip.steps.buildhdf     :INFO     (run:29[pid=55653]) Initializing new dataframes.
2025-06-19 10:11:24 straklip.utils.utils_dataframe:INFO     (mk_crossmatch_ids_df:266[pid=55653]) Creating the cross match ids dataframe
2025-06-19 10:11:24 straklip.steps.buildhdf     :INFO     (mk_targets_df:9[pid=55653]) Creating the targets dataframe
2025-06-19 10:11:24 straklip.utils.utils_dataframe:INFO     (mk_mvs_targets_df:352[pid=55653]) Creating the average targets dataframe
2025-06-19 10:11:24 straklip.utils.utils_dataframe:INFO     (mk_unq_targets_df:302[pid=55653]) Creating the multi-visit targets dataframe
2025-06-19 10:11:24 straklip.utils.ancillary    :INFO     (distances_cube:229[pid=55653]) Making the distance cube
2025-06-19 10:11:24 straklip.steps.buildhdf     :INFO     (make_candidates_dataframes:22[pid=55653]) Creating the candidates dataframe
2025-06-19 10:11:24 straklip.dataframe          :INFO     (save_dataframes:106[pid=55653]) Saving the the following 
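
The "one file per dataframe key" saving behaviour can be pictured with the sketch below. It only uses the standard `csv` module and plain lists of dictionaries in place of pandas dataframes; the function `save_tables` and the sample rows are hypothetical, and the pipeline's own writer may differ in layout and options.

```python
import csv
import os
import tempfile


def save_tables(tables, out_dir, ext=".csv"):
    """Hypothetical helper: write each table to <out_dir>/<key><ext>."""
    if ext != ".csv":
        raise ValueError("this sketch only handles the .csv extension")
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for key, rows in tables.items():
        path = os.path.join(out_dir, key + ext)
        with open(path, "w", newline="") as fh:
            fieldnames = list(rows[0]) if rows else []
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        paths.append(path)
    return paths


# Made-up content: one populated table and one still-empty table
tables = {
    "mvs_targets_df": [{"mvs_ids": 0, "vis": 13}],
    "mvs_candidates_df": [],
}

with tempfile.TemporaryDirectory() as out_dir:
    paths = save_tables(tables, out_dir)
    saved = [os.path.basename(p) for p in paths]
    with open(paths[0], newline="") as fh:
        reloaded = list(csv.DictReader(fh))
```

Note that values read back from a csv file are strings, so a real reader has to restore the column types.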

We can now look at the content of the DF object and find different pandas dataframes in it:
  - crossmatch_ids_df: since the pipeline can work with multi-visit surveys, this dataframe stores which ids in the "unq" dataframes correspond to the ids in the "mvs" dataframes
  - mvs_targets_df: a type of "mvs" dataframe. "mvs" stands for multi-visits; these dataframes store the basic information of your targets coming from multiple visits (so the same unique source can be stored multiple times if observed across multiple recurring visits)
  - unq_targets_df: a type of "unq" dataframe. "unq" stands for unique; these dataframes store the information of your targets gathered across multiple visits and averaged together. In this kind of dataframe there can be only one source for each id. If your survey consists of only one visit, then the "mvs" and "unq" dataframes will be very similar, even though the type of information stored may vary according to the pipeline's needs (as is the case in this tutorial).
  - mvs_candidates_df: a type of "mvs" dataframe. It will store the properties of your candidates gathered from your mvs_targets_df when detected. Empty for now.
  - unq_candidates_df: a type of "unq" dataframe. It will store the properties of your candidates gathered from your unq_targets_df when detected. Empty for now.
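
The relation between the "mvs" and "unq" tables can be sketched with plain Python structures. Everything below is illustrative: the rows, the crossmatch mapping and the use of a simple mean are assumptions made for the example, not the pipeline's actual averaging.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical multi-visit rows: source A seen in two visits, source B in one
mvs_rows = [
    {"mvs_ids": 0, "vis": 1, "counts": 100.0},
    {"mvs_ids": 1, "vis": 2, "counts": 200.0},
    {"mvs_ids": 2, "vis": 1, "counts": 50.0},
]

# Crossmatch table: which unique id each multi-visit id belongs to
crossmatch = {0: 0, 1: 0, 2: 1}

# Group the per-visit measurements by unique source
grouped = defaultdict(list)
for row in mvs_rows:
    grouped[crossmatch[row["mvs_ids"]]].append(row["counts"])

# One row per unique id, with the visits averaged together
unq_rows = [
    {"unq_ids": unq_id, "counts": mean(values)}
    for unq_id, values in sorted(grouped.items())
]
```

With a single visit per source, as in this tutorial, each group holds one row and the "unq" table has as many entries as the "mvs" one.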


In [20]:
DF.keys

['crossmatch_ids_df',
 'mvs_targets_df',
 'unq_targets_df',
 'unq_candidates_df',
 'mvs_candidates_df']

In [21]:
DF.mvs_targets_df

Unnamed: 0,mvs_ids,x_f814w,x_f850lp,y_f814w,y_f850lp,vis,ext,counts_f814w,counts_f850lp,ecounts_f814w,...,exptime_f814w,exptime_f850lp,cell_f814w,cell_f850lp,rota_f814w,rota_f850lp,pav3_f814w,pav3_f850lp,fits_f814w,fits_f850lp
0,0,766.682062,766.297865,870.519962,870.86323,13,1,,,,...,712.0,712.0,,,169.438303,169.438303,124.436996,124.436996,iexn13010,iexn13020
1,1,769.100793,768.870891,866.146099,866.488792,1,1,,,,...,712.0,712.0,,,142.364009,142.364009,97.360497,97.360497,iexn01010,iexn01020
2,2,762.415167,762.213142,869.798303,870.180384,2,1,,,,...,716.0,716.0,,,143.685147,143.685147,98.681168,98.681168,iexn02010,iexn02020
3,3,765.630228,765.408206,867.979551,868.467341,3,1,,,,...,712.0,712.0,,,146.062551,146.062551,101.0597,101.0597,iexn03010,iexn03020
4,4,767.251059,767.0096,867.473929,867.939866,4,1,,,,...,712.0,712.0,,,144.993916,144.993916,99.990402,99.990402,iexn04010,iexn04020
5,5,766.601507,766.33671,868.267308,868.601284,5,1,,,,...,716.0,716.0,,,145.42267,145.42267,100.4188,100.4188,iexn05010,iexn05020
6,6,767.081804,766.867054,866.901849,867.166727,6,1,,,,...,712.0,712.0,,,145.295285,145.295285,100.292198,100.292198,iexn06010,iexn06020
7,7,765.825833,765.608929,868.625604,869.098238,7,1,,,,...,712.0,712.0,,,145.278236,145.278236,100.274902,100.274902,iexn07010,iexn07020
8,8,765.740929,765.619928,868.295724,868.761725,8,1,,,,...,712.0,712.0,,,145.102367,145.102367,100.098801,100.098801,iexn08010,iexn08020
9,9,762.743761,762.620949,866.727766,866.796443,9,1,,,,...,712.0,712.0,,,144.710595,144.710595,99.707359,99.707359,iexn09010,iexn09020


In [22]:
DF.unq_targets_df

Unnamed: 0,unq_ids,ra,dec,m_f814w,m_f850lp,e_f814w,e_f850lp,type,FirstDist,SecondDist,ThirdDist,FirstID,SecondID,ThirdID,m_f814w_o,m_f850lp_o,e_f814w_o,e_f850lp_o
0,0,237.959171,-21.582686,,,,,1,7236.026079,11440.035617,13911.020984,7.0,8.0,3.0,22.277353,21.078353,0.021258,0.022984
1,1,251.650498,-23.227155,,,,,1,20713.866132,29090.719964,33666.625958,5.0,4.0,9.0,20.840983,19.749264,0.005901,0.006932
2,2,238.569423,-26.505131,,,,,1,7932.959139,11463.957292,17856.463125,8.0,7.0,0.0,21.448087,20.330281,0.010977,0.012019
3,3,241.686092,-20.561958,,,,,1,4447.833912,7315.417653,13019.44318,6.0,9.0,10.0,21.493397,20.318072,0.011936,0.01207
4,4,243.644804,-24.326008,,,,,1,8134.218055,11143.242455,14868.253894,9.0,6.0,5.0,21.553108,20.299904,0.011095,0.011329
5,5,247.042901,-26.673437,,,,,1,14868.253894,20713.866132,22692.380268,4.0,1.0,9.0,21.419805,20.159706,0.010367,0.01019
6,6,241.881676,-21.781889,,,,,1,3013.003477,4447.833912,11143.242455,9.0,3.0,4.0,22.182586,20.929665,0.0221,0.021179
7,7,238.932269,-23.341438,,,,,1,7236.026079,7593.796513,11463.957292,0.0,8.0,2.0,21.854202,20.71008,0.015996,0.016869
8,8,237.313809,-24.694252,,,,,1,7593.796513,7932.959139,11440.035617,7.0,2.0,0.0,22.007757,20.915487,0.017982,0.020411
9,9,242.327752,-22.490052,,,,,1,3013.003477,7315.417653,8134.218055,6.0,3.0,4.0,22.001171,20.88359,0.018724,0.020491


## Tiles

Now that the default dataframes are ready, we can run the next step in the pipeline: "mktiles".

With this step we cut out a small tile around each target in our mvs_targets_df to define a search area for the pipeline.
This step creates an "mvs_tiles" folder in the out directory, containing one folder per filter with all the corresponding tiles. Within these folders, the pipeline generates a FITS cube for each source, containing a SCI image with the target at the center of the tile, ERR and DQ cutouts from the same coordinates and, if requested, cosmic-ray-cleaned SCI images to use for the upcoming PSF subtraction.

As always, the dataframes are saved at the end of the step in the out directory.

In [23]:
mktiles.run({'DF': DF, 'dataset': dataset})

2025-06-19 10:12:06 straklip.steps.mktiles      :INFO     (make_mvs_tiles:317[pid=55653]) Working on the tiles
2025-06-19 10:12:06 straklip.steps.mktiles      :INFO     (mk_mvs_tiles:243[pid=55653]) Working on multi-visits tiles on filter f814w
2025-06-19 10:12:06 straklip.config             :INFO     (make_paths:112[pid=55653]) Creating "/Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/out/mvs_tiles/f814w"
2025-06-19 10:12:06 straklip.steps.mktiles      :INFO     (task_mvs_tiles:132[pid=55653]) Making mvs tile /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/out/mvs_tiles/f814w/tile_ID0.fits
2025-06-19 10:12:07 straklip.steps.mktiles      :INFO     (task_mvs_tiles:132[pid=55653]) Making mvs tile /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/out/mvs_tiles/f814w/tile_ID1.fits
2025-06-19 10:12:07 straklip.steps.mktiles      :INFO     (task_mvs_tiles:132[pid=55653]) Making mvs tile /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/out/mvs_tiles/f814w/
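
The geometry of a tile cutout can be sketched as follows. This is only an illustration with nested lists standing in for an image array; the function `cut_tile`, the `half_size` parameter and the choice to pad out-of-bounds pixels with `None` are assumptions, and the pipeline itself works on FITS data.

```python
def cut_tile(image, x, y, half_size):
    """Return a square cutout of side 2*half_size+1 centred on the pixel
    nearest to (x, y). `image` is indexed as image[row][col]; pixels that
    fall outside the image are filled with None."""
    cx, cy = int(round(x)), int(round(y))
    n_rows, n_cols = len(image), len(image[0])
    tile = []
    for row in range(cy - half_size, cy + half_size + 1):
        tile_row = []
        for col in range(cx - half_size, cx + half_size + 1):
            inside = 0 <= row < n_rows and 0 <= col < n_cols
            tile_row.append(image[row][col] if inside else None)
        tile.append(tile_row)
    return tile


# Toy 7x7 image where each pixel value encodes its position (10*row + col)
image = [[10 * r + c for c in range(7)] for r in range(7)]
tile = cut_tile(image, x=3.2, y=2.9, half_size=1)
# The target pixel (row 3, col 3) ends up at the centre of the 3x3 tile
```

Applying the same cutout coordinates to the SCI, ERR and DQ images keeps the three planes of a cube aligned on the target.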

## Photometry

Now we can run the next step in the pipeline: "mkphotometry".

This step performs basic aperture photometry around each source in the mvs_targets_df, providing vital information for the next steps in the pipeline. A "targets_photometry_tiles" folder is created in the "database" directory to store a quick visual summary of the photometry for each "mvs" target.

As always, the dataframes are saved at the end of the step in the out directory.

In [24]:
mkphotometry.run({'DF': DF, 'dataset': dataset})

2025-06-19 10:12:18 straklip.steps.mkphotometry :INFO     (get_ee_df:26[pid=55653]) Fetching encircled energy dataframe for filters ['f814w', 'f850lp']
2025-06-19 10:12:18 straklip.steps.mkphotometry :INFO     (make_mvs_photometry:76[pid=55653]) Make photometry for multi-visits targets on filter f814w
2025-06-19 10:12:18 straklip.steps.mkphotometry :INFO     (make_mvs_photometry:82[pid=55653]) Making /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/database/targets_photometry_tiles/f814w directory
2025-06-19 10:12:18 straklip.utils.ancillary    :INFO     (parallelization_package:968[pid=55653]) Max allowable workers 8, # of elements 12 , # of chunk 12 approx # of elemtent per chunks 1 (chunksize)
2025-06-19 10:12:23 straklip.steps.mkphotometry :INFO     (make_median_photometry:148[pid=55653]) Make photometry for average targets on filter f814w
2025-06-19 10:12:23 straklip.utils.ancillary    :INFO     (parallelization_package:968[pid=55653]) Max allowable workers 8, # of elemen
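
For readers new to aperture photometry, the basic idea is to sum the pixels inside a circular aperture and subtract a sky level estimated in a surrounding annulus. The sketch below shows that idea on a toy tile; the function `aperture_counts`, its radii and the median sky estimate are assumptions for illustration and do not reproduce the pipeline's implementation (which, for instance, also fetches encircled-energy information, as the log shows).

```python
import math
from statistics import median


def aperture_counts(tile, xc, yc, r_ap, r_in, r_out):
    """Sky-subtracted sum of the pixels within r_ap of (xc, yc).
    The sky level is the median of the pixels between r_in and r_out."""
    aperture_sum = 0.0
    n_aperture = 0
    sky_pixels = []
    for row, line in enumerate(tile):
        for col, value in enumerate(line):
            distance = math.hypot(col - xc, row - yc)
            if distance <= r_ap:
                aperture_sum += value
                n_aperture += 1
            elif r_in <= distance <= r_out:
                sky_pixels.append(value)
    sky_level = median(sky_pixels) if sky_pixels else 0.0
    return aperture_sum - n_aperture * sky_level


# Toy 9x9 tile: flat sky of 5 counts per pixel plus a 100-count source
tile = [[5.0] * 9 for _ in range(9)]
tile[4][4] += 100.0
counts = aperture_counts(tile, xc=4, yc=4, r_ap=2, r_in=3, r_out=4)
```

On this toy tile the sky contribution cancels exactly, so `counts` recovers the 100 counts injected at the centre.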

Now we can check our dataframes again and see that the columns related to the photometry have been populated.

In [25]:
DF.mvs_targets_df

Unnamed: 0,mvs_ids,x_f814w,x_f850lp,y_f814w,y_f850lp,vis,ext,counts_f814w,counts_f850lp,ecounts_f814w,...,exptime_f814w,exptime_f850lp,cell_f814w,cell_f850lp,rota_f814w,rota_f850lp,pav3_f814w,pav3_f850lp,fits_f814w,fits_f850lp
0,0,766.682062,766.297865,869.519962,870.86323,13,1,10050.176567,8147.884608,103.716767,...,712.0,712.0,,,169.438303,169.438303,124.436996,124.436996,iexn13010,iexn13020
1,1,769.100793,768.870891,866.146099,866.488792,1,1,37148.970247,29523.416694,200.603377,...,712.0,712.0,,,142.364009,142.364009,97.360497,97.360497,iexn01010,iexn01020
2,2,762.415167,762.213142,869.798303,870.180384,2,1,20888.773322,16756.185268,150.306225,...,716.0,716.0,,,143.685147,143.685147,98.681168,98.681168,iexn02010,iexn02020
3,3,765.630228,765.408206,867.979551,868.467341,3,1,19851.347253,17782.29885,146.991839,...,712.0,712.0,,,146.062551,146.062551,101.0597,101.0597,iexn03010,iexn03020
4,4,767.251059,767.0096,867.473929,867.939866,4,1,18535.997643,17143.439224,141.164681,...,712.0,712.0,,,144.993916,144.993916,99.990402,99.990402,iexn04010,iexn04020
5,5,766.601507,766.33671,868.267308,868.601284,5,1,21899.906087,19840.838076,152.199777,...,716.0,716.0,,,145.42267,145.42267,100.4188,100.4188,iexn05010,iexn05020
6,6,767.081804,766.867054,866.901849,867.166727,6,1,10821.808262,9736.723197,111.510075,...,712.0,712.0,,,145.295285,145.295285,100.292198,100.292198,iexn06010,iexn06020
7,7,765.825833,765.608929,868.625604,869.098238,7,1,14345.541776,12125.508627,123.726601,...,712.0,712.0,,,145.278236,145.278236,100.274902,100.274902,iexn07010,iexn07020
8,8,765.740929,765.619928,868.295724,868.761725,8,1,12134.206337,9881.013676,116.470224,...,712.0,712.0,,,145.102367,145.102367,100.098801,100.098801,iexn08010,iexn08020
9,9,762.743761,762.620949,866.727766,866.796443,9,1,12163.7741,9674.484288,118.946485,...,712.0,712.0,,,144.710595,144.710595,99.707359,99.707359,iexn09010,iexn09020


## FOW2CELLS

The last step needed for the data reduction is "fow2cells".

To limit the effect of PSF distortion across the field of view (FOW), we generate a grid and group nearby stars together to use as references for the upcoming PSF subtraction. For this tutorial, since all our sources are at the center of the FOW, a very basic grid is enough, but ideally we would generate a finer grid, as long as there are enough good reference stars in each cell to run the PSF subtraction.
As always, the dataframes are saved at the end of the step in the out directory.


In [26]:
fow2cells.run({'DF': DF, 'dataset': dataset})

2025-06-19 10:12:41 straklip.steps.fow2cells    :INFO     (break_FOW_in_cells:17[pid=55653]) Braking f814w FOW in 1 cells.
2025-06-19 10:12:41 straklip.utils.utils_plot   :INFO     (fow_stamp:306[pid=55653]) Saving cell_f814w.png in /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/database/
2025-06-19 10:12:41 straklip.steps.fow2cells    :INFO     (break_FOW_in_cells:17[pid=55653]) Braking f850lp FOW in 1 cells.
2025-06-19 10:12:41 straklip.utils.utils_plot   :INFO     (fow_stamp:306[pid=55653]) Saving cell_f850lp.png in /Users/gstrampelli/PycharmProjects/StraKLIP_tutorial_test/database/
2025-06-19 10:12:41 straklip.steps.fow2cells    :INFO     (run:79[pid=55653]) Updating type for unique detections.
2025-06-19 10:12:41 straklip.dataframe          :INFO     (save_dataframes:106[pid=55653]) Saving the the following keys in ['crossmatch_ids_df', 'mvs_targets_df', 'unq_targets_df', 'unq_candidates_df', 'mvs_candidates_df'] to .csv files in /Users/gstrampelli/PycharmProjects/StraK
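
The grouping of sources into cells can be pictured with the following sketch, which assigns a cell index to a pixel position on a regular grid. The function `assign_cell`, the row-major cell numbering and the 1024x1024 detector size are assumptions made for the example; the pipeline's own gridding may differ.

```python
def assign_cell(x, y, width, height, n_x, n_y):
    """Return the index of the grid cell containing (x, y) on a detector of
    width x height pixels split into n_x by n_y cells (row-major order).
    Positions on the far edge are clamped into the last cell."""
    col = min(int(x / width * n_x), n_x - 1)
    row = min(int(y / height * n_y), n_y - 1)
    return row * n_x + col


# Position of the first target in mvs_targets_df (f814w), on an assumed
# 1024x1024 pixel field
x, y = 766.682062, 869.519962

single_cell = assign_cell(x, y, 1024, 1024, n_x=1, n_y=1)  # 1x1 grid
finer_cell = assign_cell(x, y, 1024, 1024, n_x=2, n_y=2)   # 2x2 grid
```

With a single cell every source gets index 0, matching the `cell_f814w` and `cell_f850lp` values of 0.0 in the table that follows; a finer grid only makes sense when each cell still contains enough reference stars.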

In [27]:
DF.mvs_targets_df


Unnamed: 0,mvs_ids,x_f814w,x_f850lp,y_f814w,y_f850lp,vis,ext,counts_f814w,counts_f850lp,ecounts_f814w,...,exptime_f814w,exptime_f850lp,cell_f814w,cell_f850lp,rota_f814w,rota_f850lp,pav3_f814w,pav3_f850lp,fits_f814w,fits_f850lp
0,0,766.682062,766.297865,869.519962,870.86323,13,1,10050.176567,8147.884608,103.716767,...,712.0,712.0,0.0,0.0,169.438303,169.438303,124.436996,124.436996,iexn13010,iexn13020
1,1,769.100793,768.870891,866.146099,866.488792,1,1,37148.970247,29523.416694,200.603377,...,712.0,712.0,0.0,0.0,142.364009,142.364009,97.360497,97.360497,iexn01010,iexn01020
2,2,762.415167,762.213142,869.798303,870.180384,2,1,20888.773322,16756.185268,150.306225,...,716.0,716.0,0.0,0.0,143.685147,143.685147,98.681168,98.681168,iexn02010,iexn02020
3,3,765.630228,765.408206,867.979551,868.467341,3,1,19851.347253,17782.29885,146.991839,...,712.0,712.0,0.0,0.0,146.062551,146.062551,101.0597,101.0597,iexn03010,iexn03020
4,4,767.251059,767.0096,867.473929,867.939866,4,1,18535.997643,17143.439224,141.164681,...,712.0,712.0,0.0,0.0,144.993916,144.993916,99.990402,99.990402,iexn04010,iexn04020
5,5,766.601507,766.33671,868.267308,868.601284,5,1,21899.906087,19840.838076,152.199777,...,716.0,716.0,0.0,0.0,145.42267,145.42267,100.4188,100.4188,iexn05010,iexn05020
6,6,767.081804,766.867054,866.901849,867.166727,6,1,10821.808262,9736.723197,111.510075,...,712.0,712.0,0.0,0.0,145.295285,145.295285,100.292198,100.292198,iexn06010,iexn06020
7,7,765.825833,765.608929,868.625604,869.098238,7,1,14345.541776,12125.508627,123.726601,...,712.0,712.0,0.0,0.0,145.278236,145.278236,100.274902,100.274902,iexn07010,iexn07020
8,8,765.740929,765.619928,868.295724,868.761725,8,1,12134.206337,9881.013676,116.470224,...,712.0,712.0,0.0,0.0,145.102367,145.102367,100.098801,100.098801,iexn08010,iexn08020
9,9,762.743761,762.620949,866.727766,866.796443,9,1,12163.7741,9674.484288,118.946485,...,712.0,712.0,0.0,0.0,144.710595,144.710595,99.707359,99.707359,iexn09010,iexn09020


With this step, the "cell_{filter}" columns of the "mvs_targets_df" are populated. The data reduction is now complete and the primary dataframes are assembled.

We are now ready to move on to the PSF subtraction section of the pipeline.