# Tutorial: Data Reduction

In [None]:
from datetime import datetime
import config, input_tables
from straklip.steps import buildhdf,mktiles,mkphotometry,fow2cells
from straklip.stralog import getLogger
import os
import pkg_resources as pkg


First, we need to initialize the logger here.

In [None]:
if 'SHARED_LOG_FILE' not in os.environ:
    os.environ['SHARED_LOG_FILE'] = f'straklip_{datetime.now().strftime("%Y-%m-%d_%H%M")}.log'

getLogger('straklip', setup=True, logfile=os.environ['SHARED_LOG_FILE'], debug=False,
          configfile=pkg.resource_filename('straklip', './config/logging.yaml'))
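The `SHARED_LOG_FILE` guard in the cell above only builds a timestamped log name if none exists yet, presumably so that every part of the pipeline writes to the same file. Below is a minimal standard-library sketch of that pattern. The helper `shared_log_file` is hypothetical and not part of StraKLIP.

```python
import os
from datetime import datetime


def shared_log_file(env_var="SHARED_LOG_FILE", prefix="straklip"):
    """Return the shared log file name, creating a timestamped one on first use.

    os.environ.setdefault only writes the variable when it is missing, so later
    calls (and child processes inheriting the environment) reuse the same name.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M")
    return os.environ.setdefault(env_var, f"{prefix}_{stamp}.log")


first = shared_log_file()
second = shared_log_file()
print(first == second)  # True: the name is fixed after the first call
```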

We load the two YAML files that hold all the options the pipeline needs to run.

The data.yaml specifies the properties of the input catalogs needed to assemble the different dataframes for the pipeline.
The pipe.yaml, instead, holds all the options that each step needs to perform its tasks.

Both yaml files need to be adjusted to reflect the specific project.

Since in this notebook we explicitly call each task we intend to run, we do not technically need the "flow" variable in the pipe.yaml, while it is necessary when running the pipeline with the provided script "skpipe.yaml".
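To illustrate why a script needs "flow" while the notebook does not, here is a minimal sketch of a step runner. It assumes "flow" is simply an ordered list of step names; `run_flow` and the stand-in steps are hypothetical and do not reflect how skpipe is actually implemented.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], None]


def run_flow(flow: List[str], steps: Dict[str, Step], context: dict) -> List[str]:
    """Run the steps named in `flow` in order, passing each the same context."""
    unknown = [name for name in flow if name not in steps]
    if unknown:
        # Validate the whole flow first so a typo fails before any step runs
        raise KeyError(f"unknown steps in flow: {unknown}")
    for name in flow:
        steps[name](context)
    return list(flow)


# Hypothetical stand-ins for buildhdf.run, mktiles.run, ...: each one only
# records that it was called.
calls: List[str] = []


def make_stub(name: str) -> Step:
    def stub(context: dict) -> None:
        calls.append(name)
    return stub


flow = ["buildhdf", "mktiles", "mkphotometry", "fow2cells"]
steps = {name: make_stub(name) for name in flow}
run_flow(flow, steps, {"DF": None, "dataset": None})
print(calls)
```

In the notebook, the explicit `buildhdf.run(...)`, `mktiles.run(...)` calls play the role of this loop, which is why "flow" can be omitted there.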

NOTE: the FITS files needed to run this tutorial are not included at the moment, but they can be provided by private communication at any time.

In [None]:
pipe_cfg = '/Users/gstrampelli/StraKLIP/Tutorial/pipeline_logs/pipe.yaml'  # or wherever these files are stored
data_cfg = '/Users/gstrampelli/StraKLIP/Tutorial/pipeline_logs/data.yaml'
# configure pipe_cfg and data_cfg, the configurations the pipeline needs in order to work
pipe_cfg = config.configure_pipeline(pipe_cfg, pipe_cfg=pipe_cfg, data_cfg=data_cfg, dt_string=datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
data_cfg = config.configure_data(data_cfg, pipe_cfg)
config.make_paths(config=pipe_cfg)
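`config.make_paths` presumably prepares the directory tree the steps write to; its exact behavior is defined by StraKLIP. As a rough standard-library sketch, creating such a tree so that re-running the cell is harmless could look like this. The `make_paths` function below is a hypothetical stand-in, and the "out" and "database" names are only the directories mentioned in this tutorial.

```python
import tempfile
from pathlib import Path


def make_paths(base, subdirs=("out", "database")):
    """Create base/<subdir> for each subdir; existing folders are left untouched."""
    created = []
    for sub in subdirs:
        path = Path(base) / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    first_run = make_paths(tmp)
    second_run = make_paths(tmp)  # safe to call twice thanks to exist_ok=True
    print([p.name for p in first_run], first_run == second_run)
```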

Once pipe_cfg and data_cfg are ready, we can use them to configure the dataset, a class that holds the basic, generic information the pipeline uses to build its own specific dataframes. Since this is the first time we run the pipeline, we set skip_originals=False, which tells the pipeline to read the input photometry tables provided by the user instead of looking for dataframes previously generated by the pipeline itself.

In [None]:
dataset = input_tables.Tables(data_cfg, pipe_cfg, skip_originals=False)
DF = config.configure_dataframe(dataset)
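The meaning of `skip_originals` can be summarized with a small hypothetical helper: with `False` (first run) the inputs are the user's original photometry tables, with `True` they are the dataframes an earlier run already saved. This is a sketch of the logic described above, not StraKLIP's code; `select_inputs` and the file names are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def select_inputs(skip_originals: bool,
                  original_tables: Iterable[PathLike],
                  pipeline_dataframes: Iterable[PathLike]) -> List[Path]:
    """Return the files a run should start from (hypothetical helper).

    skip_originals=False: read the user's input photometry tables.
    skip_originals=True:  reuse the dataframes saved by an earlier run.
    """
    source = pipeline_dataframes if skip_originals else original_tables
    chosen = [Path(p) for p in source]
    missing = [str(p) for p in chosen if not p.exists()]
    if missing:
        # Fail early instead of letting a later step crash on a missing file
        raise FileNotFoundError(f"missing input files: {missing}")
    return chosen


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp) / "input_catalog.csv"  # hypothetical user table
    catalog.touch()
    print(select_inputs(False, [catalog], [Path(tmp) / "mvs_targets_df.csv"]))
```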

Now we run the very first step of the pipeline, "buildhdf", where the pipeline draws the information it needs from the "original" input catalogs provided by the user, assembles its own dataframes, and stores them in the DF object. Running this step also automatically saves the generated dataframes to disk in the "out" path specified, using the dataframe extension (either csv or h5) set in the pipe.yaml.

In [None]:
buildhdf.run({'DF': DF, 'dataset': dataset})
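Choosing the on-disk format from the configured extension amounts to a small dispatch on the file suffix. The sketch below assumes pandas DataFrames, whose `DataFrame.to_csv` and `DataFrame.to_hdf` methods exist (`to_hdf` requires the PyTables package and a `key`); the `save_dataframe` helper and its default key are hypothetical, not buildhdf's actual code.

```python
from pathlib import Path


def save_dataframe(df, path, key="data"):
    """Write `df` to disk, choosing the format from the file extension.

    With a pandas DataFrame, ".csv" calls DataFrame.to_csv and ".h5"/".hdf5"
    calls DataFrame.to_hdf (default append mode, so several tables can share
    one HDF5 file under different keys). Returns the suffix that was used.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df.to_csv(path)
    elif suffix in (".h5", ".hdf5"):
        df.to_hdf(path, key=key)
    else:
        raise ValueError(f"unsupported dataframe extension: {suffix!r}")
    return suffix
```

For example, `save_dataframe(DF.mvs_targets_df, 'out/mvs_targets_df.h5', key='mvs_targets_df')` would store one table (path and key are illustrative); the helper only inspects the suffix, so it works with any object exposing `to_csv`/`to_hdf`.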

We can now look at the content of the DF object, where we find several pandas dataframes:
  - crossmatch_ids_df: since the pipeline can work with multi-visit surveys, this dataframe records which ids in the "unq" dataframes correspond to which ids in the "mvs" dataframes.
  - mvs_targets_df: a "mvs" dataframe. "mvs" stands for multi-visit: these dataframes store the basic information on your targets coming from each visit (so the same unique source can be stored multiple times if it was observed in multiple visits).
  - unq_targets_df: a "unq" dataframe. "unq" stands for unique: these dataframes store the information on your targets gathered across all visits and averaged together, so there can be only one entry for each id. If your survey consists of a single visit, the "mvs" and "unq" dataframes will be very similar, even though the type of information stored may differ according to the pipeline's needs (as is the case in this tutorial).
  - mvs_candidates_df: a "mvs" dataframe. It will store the properties of your candidates, gathered from your mvs_targets_df once they are detected. Empty for now.
  - unq_candidates_df: a "unq" dataframe. It will store the properties of your candidates, gathered from your unq_targets_df once they are detected. Empty for now.
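The relation between the "mvs" and "unq" dataframes through the crossmatch table can be sketched with plain Python: each unique id collects its multi-visit detections, and a quantity is averaged over them. The column names (`mvs_ids`, `unq_ids`, `mag`) and the data are hypothetical, and the simple mean is an assumption; the pipeline may combine visits differently.

```python
from collections import defaultdict
from statistics import mean


def build_unq(mvs_rows, crossmatch_rows, column):
    """Average `column` over all mvs detections linked to each unq id."""
    value_by_mvs = {row["mvs_ids"]: row[column] for row in mvs_rows}
    grouped = defaultdict(list)
    for link in crossmatch_rows:
        grouped[link["unq_ids"]].append(value_by_mvs[link["mvs_ids"]])
    return {unq_id: mean(vals) for unq_id, vals in sorted(grouped.items())}


# Hypothetical data: source 0 was observed in two visits, source 1 in one.
mvs_targets = [
    {"mvs_ids": 0, "mag": 18.2},
    {"mvs_ids": 1, "mag": 18.3},
    {"mvs_ids": 2, "mag": 20.1},
]
crossmatch_ids = [
    {"unq_ids": 0, "mvs_ids": 0},
    {"unq_ids": 0, "mvs_ids": 1},
    {"unq_ids": 1, "mvs_ids": 2},
]
print(build_unq(mvs_targets, crossmatch_ids, "mag"))
```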


In [None]:
DF.keys

In [None]:
DF.mvs_targets_df

In [None]:
DF.unq_targets_df

Now that the default dataframes are ready, we can run the next step in the pipeline: "mktiles".

With this step we generate a small tile around each target in mvs_targets_df to define a search area for the pipeline.
This step creates the "mvs_tiles" and "median_tiles" folders in the out directory, each containing one folder per filter with all the corresponding tiles. The folder for the unq targets is called "median_tiles" because these tiles are generated by taking the median of all the tiles of the same target in the "mvs_tiles" folder. For each source, this step generates a FITS cube containing a SCI image with the target at the center of the tile, ERR and DQ cutouts from the same coordinates and, if requested, cosmic-ray-cleaned SCI images to use in the upcoming PSF subtraction.

As always, the dataframes are saved at the end of the step in the out directory.

In [None]:
mktiles.run({'DF': DF, 'dataset': dataset})
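The two operations behind the tiles described above, a fixed-size cutout centered on a target and a pixel-wise median over visits, can be sketched with NumPy. This is a simplified illustration, not mktiles itself: it assumes integer pixel centers (no sub-pixel recentering), pads out-of-image pixels with NaN, and ignores the ERR/DQ layers and the FITS output. `cut_tile` and `median_tile` are hypothetical names.

```python
import numpy as np


def cut_tile(image, x, y, half_size):
    """Return a (2*half_size+1) x (2*half_size+1) cutout centered on pixel (x, y).

    Pixels outside the image are NaN, so every tile has the same shape and the
    target always sits at the central pixel.
    """
    size = 2 * half_size + 1
    tile = np.full((size, size), np.nan)
    ny, nx = image.shape
    # Overlap between the requested box and the image, in image coordinates
    y0, y1 = max(y - half_size, 0), min(y + half_size + 1, ny)
    x0, x1 = max(x - half_size, 0), min(x + half_size + 1, nx)
    # Same overlap expressed in tile coordinates
    ty0, tx0 = y0 - (y - half_size), x0 - (x - half_size)
    tile[ty0:ty0 + (y1 - y0), tx0:tx0 + (x1 - x0)] = image[y0:y1, x0:x1]
    return tile


def median_tile(tiles):
    """Pixel-wise median of tiles of the same target from different visits."""
    return np.nanmedian(np.stack(tiles), axis=0)


# Three synthetic "visits" of the same field
rng = np.random.default_rng(0)
visits = [rng.normal(100.0, 1.0, size=(50, 50)) for _ in range(3)]
tiles = [cut_tile(img, x=25, y=25, half_size=5) for img in visits]
med = median_tile(tiles)
print(med.shape)  # (11, 11)
```

Taking the median rather than the mean makes the combined tile robust against a single visit affected by, e.g., a cosmic ray.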

Now we can run the next step in the pipeline: "mkphotometry".

This step performs basic aperture photometry on each source in mvs_targets_df, providing vital information for the next steps in the pipeline. A "targets_photometry_tiles" folder is created in the "database" directory to store a quick visual summary of the photometry for each "mvs" target.

As always, the dataframes are saved at the end of the step in the out directory.

In [None]:
mkphotometry.run({'DF': DF, 'dataset': dataset})
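As a reminder of what basic aperture photometry computes, here is a minimal NumPy sketch: sum the pixels inside a circular aperture and subtract a sky level estimated as the median of a surrounding annulus. It uses whole pixels (no fractional-pixel weighting) and returns an instrumental magnitude; it is an illustration of the technique, not the mkphotometry implementation, and `aperture_photometry` and its parameters are hypothetical.

```python
import numpy as np


def aperture_photometry(tile, x, y, r_ap, r_in, r_out):
    """Sky-subtracted flux of a source at (x, y) in a 2-D tile.

    Pixels whose centers lie within r_ap of (x, y) form the aperture; the sky
    level is the median of the pixels with r_in <= r < r_out.
    """
    yy, xx = np.indices(tile.shape)
    r = np.hypot(xx - x, yy - y)
    aperture = r < r_ap
    annulus = (r >= r_in) & (r < r_out)
    sky = np.median(tile[annulus])
    flux = float(np.sum(tile[aperture] - sky))
    # Instrumental magnitude, undefined for non-positive flux
    mag = -2.5 * np.log10(flux) if flux > 0 else np.nan
    return {"flux": flux, "sky": float(sky), "mag": mag,
            "npix": int(aperture.sum())}


# Synthetic tile: flat sky of 10 counts plus 500 counts in the central pixel
tile = np.full((21, 21), 10.0)
tile[10, 10] += 500.0
result = aperture_photometry(tile, x=10, y=10, r_ap=3, r_in=6, r_out=9)
print(result["flux"], result["sky"])  # 500.0 10.0
```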

Now we can check our dataframes again and see that the columns related to the photometry have been populated.

In [None]:
DF.mvs_targets_df

The last step needed for the data reduction is "fow2cells".

To limit the effect of PSF variations (distortion) across the field of view, we generate a grid and group nearby stars together, so that they can be used as references for the upcoming PSF subtraction. For this tutorial, since all our sources are at the center of the field of view, a very basic grid is enough; in general a finer grid can be used, as long as each cell contains enough good reference stars to run the PSF subtraction.
As always, the dataframes are saved at the end of the step in the out directory.
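The grid idea above can be sketched in a few lines of NumPy: split the field of view into equal rectangular cells, label each star with the cell it falls in, and flag cells with too few stars to serve as PSF references. The function names, the row-major cell numbering and the `min_refs` threshold are assumptions for illustration, not fow2cells' actual logic.

```python
import numpy as np


def assign_cells(x, y, width, height, nx_cells, ny_cells):
    """Return the grid-cell index (row-major) of each star.

    The field of view [0, width) x [0, height) is split into
    nx_cells x ny_cells equal rectangles; positions on or beyond the edges
    are clipped into the nearest border cell.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    col = np.clip((x / width * nx_cells).astype(int), 0, nx_cells - 1)
    row = np.clip((y / height * ny_cells).astype(int), 0, ny_cells - 1)
    return row * nx_cells + col


def sparse_cells(cells, n_cells, min_refs):
    """Indices of cells holding fewer than min_refs candidate reference stars."""
    counts = np.bincount(cells, minlength=n_cells)
    return np.flatnonzero(counts < min_refs)


# Hypothetical star positions (pixels) in a 100 x 100 field of view
x = [10, 20, 30, 80, 85]
y = [10, 60, 70, 15, 90]
cells = assign_cells(x, y, width=100, height=100, nx_cells=2, ny_cells=2)
print(cells, sparse_cells(cells, n_cells=4, min_refs=2))
```

With `nx_cells=ny_cells=1` every star lands in cell 0, which corresponds to the very basic grid used in this tutorial; a finer grid trades better local PSF matching for fewer references per cell.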


In [None]:
fow2cells.run({'DF': DF, 'dataset': dataset})

In [None]:
DF.mvs_targets_df


With this step the "cell_{filter}" columns of mvs_targets_df are populated: the data reduction is complete and the primary dataframes are assembled.

We are now ready to move on to the PSF subtraction section of the pipeline.