# Bootcamp exercise: Timeseries analysis with analysis tools

**Description:** Introduction to timeseries analysis on DRP difference imaging products

**Contact authors:** Eric Bellm

**Last verified to run:** 

**LSST Science Pipelines version:** 



Check which version of the LSST Science Pipelines stack you are using.

In [1]:
!eups list -s | grep lsst_distrib

lsst_distrib          gdf42428520+aa7779d39a 	current d_2023_05_03 setup


## Preliminaries

In [38]:
import os  # used only by the optional per-user repo paths below
import numpy as np
import lsst.daf.butler as dafButler

In [3]:
# Point to existing sandbox repo if you prefer to skip processing steps
#collections = ['u/bechtol']
#repo = '/sdf/group/rubin/user/bechtol/bootcamp_2023/rc2_subset/SMALL_HSC/'

collections = ['HSC/runs/RC2/w_2023_07/DM-38042/20230308T213613Z']
repo = '/repo/main/'


# User instance of the repo if you have processed rc2_subset yourself
#collections = ['u/%s'%os.environ['USER']]
#repo = '/sdf/group/rubin/user/%s/bootcamp_2023/rc2_subset/SMALL_HSC/'%(os.environ['USER'])

In [4]:
butler = dafButler.Butler(repo, collections=collections)
registry = butler.registry

Check what (tabular) dataset types are present in the collection.  We are going to work with pre-associated DIASources, which are not always present.

In [5]:
required_dataset_type = 'diaSourceTable_tract'
has_required_dataset_type = False

for datasetType in registry.queryDatasetTypes():
    if registry.queryDatasets(datasetType, collections=collections).any(execute=False, exact=False):
        if datasetType.storageClass_name == 'DataFrame':
            print(datasetType)
        if datasetType.name == required_dataset_type:
            has_required_dataset_type = True

DatasetType('goodSeeingDiff_assocDiaSrcTable', {skymap, tract, patch}, DataFrame)
DatasetType('goodSeeingDiff_diaObjTable', {skymap, tract, patch}, DataFrame)
DatasetType('goodSeeingDiff_fullDiaObjTable', {skymap, tract, patch}, DataFrame)
DatasetType('diaSourceTable_tract', {skymap, tract}, DataFrame)
DatasetType('diaObjectTable_tract', {skymap, tract}, DataFrame)
DatasetType('forcedSourceOnDiaObjectTable', {skymap, tract, patch}, DataFrame)
DatasetType('forcedSourceOnDiaObjectTable_tract', {skymap, tract}, DataFrame)
DatasetType('forcedSourceTable_tract', {skymap, tract}, DataFrame)
DatasetType('forcedSourceTable', {skymap, tract, patch}, DataFrame)
DatasetType('mergedForcedSourceOnDiaObject', {band, instrument, skymap, detector, physical_filter, tract, visit}, DataFrame)


In [6]:
if not has_required_dataset_type:
    raise ValueError(f'Required dataset type {required_dataset_type} not present in collections {collections} and repo {repo}!')

If the cell above raises an error, you will need a different dataset!

## Object tables

In [7]:
refs = sorted(registry.queryDatasets("diaObjectTable_tract"))
print(len(refs))

3


In [8]:
refs[0].dataId

{skymap: 'hsc_rings_v1', tract: 9615}

In [9]:
objTable = butler.get(refs[0])
objTable

diaObjectId,ra,decl,nDiaSources,radecTai,gPSFluxLinearSlope,gPSFluxLinearIntercept,gPSFluxMAD,gPSFluxMaxSlope,gPSFluxErrMean,gPSFluxMean,...,yPSFluxPercentile05,yPSFluxPercentile25,yPSFluxPercentile50,yPSFluxPercentile75,yPSFluxPercentile95,yPSFluxSigma,yTOTFluxSigma,yPSFluxSkew,yPSFluxChi2,yPSFluxStetsonJ
3425264593545461761,217.080213,-0.064201,1,56741.506686,,,,,,,...,-2628.276984,-2628.276984,-2628.276984,-2628.276984,-2628.276984,,,,1.301390e-30,
3425264593545461762,217.080244,-0.069648,1,56741.506686,,,,,,,...,-2140.304091,-2140.304091,-2140.304091,-2140.304091,-2140.304091,,,,0.000000e+00,
3425264593545461763,217.079951,-0.068807,1,56741.506686,,,,,,,...,-2636.178480,-2636.178480,-2636.178480,-2636.178480,-2636.178480,,,,0.000000e+00,
3425264593545461764,217.079667,-0.058519,1,56741.506686,,,,,,,...,-2449.343813,-2449.343813,-2449.343813,-2449.343813,-2449.343813,,,,0.000000e+00,
3425264593545461765,217.079072,-0.049209,1,56741.506686,,,,,,,...,-2099.895280,-2099.895280,-2099.895280,-2099.895280,-2099.895280,,,,0.000000e+00,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3425616437266356022,215.642458,1.494351,1,57110.598862,,,,,,,...,,,,,,,,,,
3425616437266356023,215.642434,1.511678,1,57110.598862,,,,,,,...,,,,,,,,,,
3425616437266356024,215.641859,1.513705,1,57110.598862,,,,,,,...,,,,,,,,,,
3425616437266356025,215.641919,1.515050,1,57110.598862,,,,,,,...,,,,,,,,,,


In [11]:
objTable.columns.values

array(['ra', 'decl', 'nDiaSources', 'radecTai', 'gPSFluxLinearSlope',
       'gPSFluxLinearIntercept', 'gPSFluxMAD', 'gPSFluxMaxSlope',
       'gPSFluxErrMean', 'gPSFluxMean', 'gPSFluxMeanErr', 'gPSFluxNdata',
       'gTOTFluxMean', 'gTOTFluxMeanErr', 'gPSFluxMin', 'gPSFluxMax',
       'gPSFluxPercentile05', 'gPSFluxPercentile25',
       'gPSFluxPercentile50', 'gPSFluxPercentile75',
       'gPSFluxPercentile95', 'gPSFluxSigma', 'gTOTFluxSigma',
       'gPSFluxSkew', 'gPSFluxChi2', 'gPSFluxStetsonJ',
       'rPSFluxLinearSlope', 'rPSFluxLinearIntercept', 'rPSFluxMAD',
       'rPSFluxMaxSlope', 'rPSFluxErrMean', 'rPSFluxMean',
       'rPSFluxMeanErr', 'rPSFluxNdata', 'rTOTFluxMean',
       'rTOTFluxMeanErr', 'rPSFluxMin', 'rPSFluxMax',
       'rPSFluxPercentile05', 'rPSFluxPercentile25',
       'rPSFluxPercentile50', 'rPSFluxPercentile75',
       'rPSFluxPercentile95', 'rPSFluxSigma', 'rTOTFluxSigma',
       'rPSFluxSkew', 'rPSFluxChi2', 'rPSFluxStetsonJ',
       'iPSFluxLinearSlope', 'i

In [42]:
# Identify DIAObjects with many epochs by summing the per-band epoch counts
filters = ['g', 'r', 'i', 'z', 'y']
objTable[[f'{filt}PSFluxNdata' for filt in filters]].sum(axis=1).sort_values()

diaObjectId
3425299777917552444     0.0
3425273389638496005     0.0
3425392136894297560     0.0
3425282185731513377     0.0
3425502088057069082     0.0
                       ... 
3425370146661728262    38.0
3425352554475684013    39.0
3425352554475684103    39.0
3425282185731506409    39.0
3425348156429172907    41.0
Length: 949028, dtype: float64
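
The epoch count above can be tried without butler access. The following is a minimal sketch on a toy table: the object IDs and counts are made up, and it assumes that bands without measurements are stored as NaN, as the table display above suggests. It also uses `nlargest` to pull out the best-sampled objects directly instead of sorting the whole table.

```python
import numpy as np
import pandas as pd

# Toy stand-in for objTable: per-band epoch counts indexed by diaObjectId.
# IDs and values are invented for illustration only.
filters = ['g', 'r', 'i', 'z', 'y']
toy = pd.DataFrame(
    {
        'gPSFluxNdata': [np.nan, 3.0, 1.0],
        'rPSFluxNdata': [np.nan, 5.0, np.nan],
        'iPSFluxNdata': [np.nan, 8.0, 2.0],
        'zPSFluxNdata': [np.nan, 4.0, np.nan],
        'yPSFluxNdata': [np.nan, 6.0, 1.0],
    },
    index=pd.Index([101, 102, 103], name='diaObjectId'),
)

ndata_cols = [f'{filt}PSFluxNdata' for filt in filters]

# DataFrame.sum skips NaN by default, so an all-NaN row totals 0.0
n_epochs = toy[ndata_cols].sum(axis=1)

# Keep the two objects with the most epochs, highest first
top = n_epochs.nlargest(2)
print(top)
```

On the real table, the same two lines (`sum(axis=1)` followed by `nlargest`) should apply unchanged to `objTable`.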

## Source tables

In [12]:
refs = sorted(registry.queryDatasets("diaSourceTable_tract"))

In [13]:
for ref in refs:
    print(ref.dataId.full)

{skymap: 'hsc_rings_v1', tract: 9615}
{skymap: 'hsc_rings_v1', tract: 9697}
{skymap: 'hsc_rings_v1', tract: 9813}


In [43]:
sourceTable = butler.get(refs[0])
sourceTable

diaSourceId,ccdVisitId,filterName,diaObjectId,ssObjectId,parentDiaSourceId,midPointTai,bboxSize,ra,decl,x,...,psfFlux_flag_edge,forced_PsfFlux_flag,forced_PsfFlux_flag_noGoodPixels,forced_PsfFlux_flag_edge,shape_flag,shape_flag_no_pixels,shape_flag_not_contained,shape_flag_parent_source,coord_ra,coord_dec
201880642781390,94008,y,3425264593545462643,0,0,56741.629032,13,216.960597,-0.073707,615.800232,...,True,True,False,True,False,False,False,False,216.960623,-0.073736
569529843319232,265208,i,3425264593545462643,0,0,56744.616684,21,216.960631,-0.073730,619.563921,...,False,False,False,False,True,False,False,False,216.960623,-0.073736
10270155862966957,4782414,r,3425264593545462643,0,0,57099.633578,15,216.960624,-0.073768,341.150391,...,False,False,False,False,False,False,False,False,216.960623,-0.073736
549826680848770,256033,i,3425264593545462643,0,0,56744.545180,15,216.960553,-0.073701,411.122986,...,False,False,False,False,True,False,False,False,216.960623,-0.073736
10251343906210458,4773654,r,3425264593545462643,0,0,57099.589358,13,216.960628,-0.073753,896.990479,...,False,False,False,False,True,False,False,False,216.960623,-0.073736
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11610333835690094,5406483,y,3425616437266356005,0,0,57110.428346,19,215.616441,1.564203,17.002071,...,True,True,False,True,True,False,False,False,215.616441,1.564203
11610333835690095,5406483,y,3425616437266356006,0,0,57110.428346,146,215.620109,1.579734,359.298730,...,False,False,False,False,True,False,False,False,215.620109,1.579734
11610333835690096,5406483,y,3425616437266356007,0,0,57110.428346,15,215.618651,1.564250,16.924311,...,True,True,False,True,True,False,False,False,215.618651,1.564250
11610333835690097,5406483,y,3425616437266356008,0,0,57110.428346,23,215.623972,1.575879,271.808655,...,False,False,False,False,False,False,False,False,215.623972,1.575879


Let's find a DIAObject with many detections. To keep this quick, we count detections only within the first 10,000 rows of the source table.

In [45]:
count = sourceTable.iloc[:10000].groupby('diaObjectId').agg(len)

In [47]:
count.iloc[count['filterName'].argmax()]

ccdVisitId                  33
filterName                  33
ssObjectId                  33
parentDiaSourceId           33
midPointTai                 33
                            ..
shape_flag_no_pixels        33
shape_flag_not_contained    33
shape_flag_parent_source    33
coord_ra                    33
coord_dec                   33
Name: 3425264593545461819, Length: 69, dtype: int64
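
Because `agg(len)` repeats the same count in every column, a single count per object is often easier to read. A minimal sketch of that alternative using `groupby(...).size()` and `idxmax()` follows; the toy table and its IDs are invented for illustration and stand in for `sourceTable`.

```python
import pandas as pd

# Toy stand-in for sourceTable, indexed by diaSourceId (values are made up)
toy_sources = pd.DataFrame(
    {
        'diaObjectId': [11, 12, 11, 13, 11, 12],
        'filterName': ['g', 'r', 'i', 'g', 'r', 'i'],
    },
    index=pd.Index([1, 2, 3, 4, 5, 6], name='diaSourceId'),
)

# size() returns one count per group rather than one count per column
n_det = toy_sources.groupby('diaObjectId').size()

# idxmax() returns the index label, i.e. the diaObjectId itself
best_id = n_det.idxmax()
print(best_id, n_det.loc[best_id])
```

Here `best_id` is the `diaObjectId` with the most DIASources, so it can be used directly in a selection such as `sourceTable.diaObjectId == best_id`.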

In [49]:
test_DiaObjectId = 3425264593545461819
wt = sourceTable.diaObjectId == test_DiaObjectId
sourceTable[wt]

diaSourceId,ccdVisitId,filterName,diaObjectId,ssObjectId,parentDiaSourceId,midPointTai,bboxSize,ra,decl,x,...,psfFlux_flag_edge,forced_PsfFlux_flag,forced_PsfFlux_flag_noGoodPixels,forced_PsfFlux_flag_edge,shape_flag,shape_flag_no_pixels,shape_flag_not_contained,shape_flag_parent_source,coord_ra,coord_dec
201880642781451,94008,y,3425264593545461819,0,0,56741.629032,17,217.029675,-0.06162,934.859375,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
569529843319375,265208,i,3425264593545461819,0,0,56744.616684,25,217.029689,-0.061624,939.045532,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
11663430368887032,5431208,z,3425264593545461819,0,0,57110.626265,40,217.029698,-0.061606,945.013977,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
199310104854790,92811,y,3425264593545461819,0,0,56741.620087,19,217.029715,-0.061674,1294.126587,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
565241318474295,263211,i,3425264593545461819,0,0,56744.599638,29,217.029712,-0.06163,1300.924461,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
11659141844041959,5429211,z,3425264593545461819,0,0,57110.612563,38,217.029708,-0.061622,1299.05481,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
10270155862967070,4782414,r,3425264593545461819,0,0,57099.633578,21,217.029729,-0.061607,638.052612,...,False,False,False,False,True,False,False,False,217.029709,-0.061615
11197868798902916,5214414,g,3425264593545461819,0,0,57106.611946,24,217.029727,-0.061637,630.177799,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
203613662085447,94815,y,3425264593545461819,0,0,56741.63525,17,217.029705,-0.061637,1185.18396,...,False,False,False,False,True,False,False,False,217.029709,-0.061615
554082993439253,258015,i,3425264593545461819,0,0,56744.560834,24,217.029729,-0.061601,1186.386085,...,False,False,False,False,False,False,False,False,217.029709,-0.061615
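
The rows selected for one DIAObject are the raw material for a light curve. The sketch below shows one way to arrange them into time-ordered, per-band light curves. It runs on a toy table with made-up values: `midPointTai` and `filterName` appear in the table above, but the flux column names `psFlux` and `psFluxErr` are assumptions and should be checked against `sourceTable.columns`.

```python
import pandas as pd

# Toy DIASources for a single DIAObject. psFlux and psFluxErr are ASSUMED
# column names; all values are invented for illustration.
toy_lc = pd.DataFrame(
    {
        'midPointTai': [56744.6, 56741.6, 57110.6, 56744.5, 56741.7],
        'filterName': ['i', 'y', 'z', 'i', 'y'],
        'psFlux': [1200.0, 900.0, 1500.0, 1100.0, 950.0],
        'psFluxErr': [100.0, 150.0, 120.0, 110.0, 190.0],
    }
)

# Order the epochs in time and compute a per-epoch signal-to-noise ratio
lc = toy_lc.sort_values('midPointTai').reset_index(drop=True)
lc['snr'] = lc['psFlux'] / lc['psFluxErr']

# Split into one light curve per band; rows stay time-ordered within a group
light_curves = {band: grp for band, grp in lc.groupby('filterName')}
for band, grp in light_curves.items():
    print(band, grp[['midPointTai', 'psFlux', 'psFluxErr']].to_numpy().tolist())
```

Each entry of `light_curves` can then be passed to a plotting call of your choice, for example an error-bar plot of flux against `midPointTai`.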


## Run analysis_tools interactively

This section demonstrates running `analysis_tools` interactively in a notebook by passing in-memory data inputs to create metrics and diagnostic plots.

In [None]:
from lsst.analysis.tools.atools import ShapeSizeFractionalDiff
from lsst.analysis.tools.interfaces._task import _StandinPlotInfo
from lsst.analysis.tools.interfaces._actions import NoPlot

In [None]:
atool = ShapeSizeFractionalDiff()
atool.produce.plot.addSummaryPlot = False

# Do not produce plot; only metric values
#atool.produce.plot = NoPlot() 

# This helps simplify some of the configuration
# by ensuring that appropriate keys are set to 
# load columns that are needed in later steps. 
# This happens automatically when an AnalysisTool 
# is used as a single unit.
atool.populatePrepFromProcess() # Needed to run 

Notice that the returned metric values match the summary statistics displayed on the plot.

In [None]:
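# NOTE: `objectTable` is not defined earlier in this notebook. It is presumably
# a coadd object table (a DataFrame) that must be loaded via the butler first.
results = atool(objectTable, band='i', skymap=None, plotInfo=_StandinPlotInfo())
results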