Author: Neven Caplar and the LINCC Frameworks team

Last updated: July 07, 2025

# Bringing it together for Rubin data Collections

In this tutorial, you will learn:

- What collections are
- How to access Rubin data

## Visualize periodic lightcurves in Rubin data

In [1]:
import astropy.units as u
import lsdb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from dask.distributed import Client
from io import StringIO
from nested_pandas import NestedDtype
from pathlib import Path

import warnings

# Silence all warnings (including the plotting warnings)
warnings.filterwarnings("ignore")

Rubin data are organized in so-called ``collections``. A collection is a logical container for a group of catalogs that belong together; it often corresponds to a data release, a processing run, or a particular science product family. In practice, this means that you don't have to specify margin catalogs explicitly (as we had to do in Notebook 1); the margin catalog is loaded for you together with the main catalog.

In [2]:
obj_catalog = lsdb.open_catalog("/rubin/lincc_lsb_data/object_collection")
dia_catalog = lsdb.open_catalog("/rubin/lincc_lsb_data/dia_object_collection")

When you open a catalog with `lsdb.open_catalog`, LSDB reads only the catalog's metadata and schema, not the actual data, unless you explicitly request it. In this case, only 42 of the catalog's 1304 columns have been loaded, and even those are loaded lazily: LSDB has registered their existence and structure, but no data has been read from disk yet. Lazy loading improves performance and memory efficiency, especially for large catalogs, because I/O is deferred until a computation or filtering step needs the actual column values.
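As a loose analogy only (this is plain pandas, not LSDB's loading mechanism), the sketch below reads a subset of columns from a small made-up table. The table contents are invented for illustration; the point is that columns which are not requested never reach memory.

```python
from io import StringIO

import pandas as pd

# Toy table standing in for a wide catalog (values are made up for illustration).
csv_text = """objectId,coord_ra,coord_dec,g_psfMag,r_psfMag
1,94.95,-24.73,19.2,18.9
2,95.30,-25.27,20.1,19.8
"""

# Read only the columns that are needed; the other columns are not loaded.
subset = pd.read_csv(StringIO(csv_text), usecols=["objectId", "coord_ra", "coord_dec"])
print(list(subset.columns))
```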

In [3]:
obj_catalog

npartitions=389,coord_dec,coord_decErr,coord_ra,coord_raErr,g_psfFlux,g_psfFluxErr,g_psfMag,g_psfMagErr,i_psfFlux,i_psfFluxErr,i_psfMag,i_psfMagErr,objectId,patch,r_psfFlux,r_psfFluxErr,r_psfMag,r_psfMagErr,refBand,refFwhm,shape_flag,shape_xx,shape_xy,shape_yy,tract,u_psfFlux,u_psfFluxErr,u_psfMag,u_psfMagErr,x,xErr,y,y_psfFlux,y_psfFluxErr,y_psfMag,y_psfMagErr,yErr,z_psfFlux,z_psfFluxErr,z_psfMag,z_psfMagErr,objectForcedSource
"Order: 6, Pixel: 130",double[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],int64[pyarrow],int64[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],string[pyarrow],float[pyarrow],bool[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],int64[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],"nested<coord_ra: [double], coord_dec: [double]..."
"Order: 8, Pixel: 2176",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Order: 9, Pixel: 2302101",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Order: 7, Pixel: 143884",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...


### Choose the variable objects

We have a selection of 10 variable objects that were identified in a previous analysis of forced photometry on science images. At this point we can treat them as objects from a separate catalog that we have loaded.

In [4]:
variables_csv = \
"""ra,dec,period
94.95546,-24.73952,0.12095
95.30235,-25.27431,0.12248
94.91626,-24.69648,0.12038
95.12418,-25.04329,0.23554
58.83506,-48.79122,0.56335
94.92264,-25.23185,0.07672
94.72086,-25.05767,0.17559
94.97073,-25.13643,0.12048
59.12997,-48.78522,0.11628
94.72086,-25.05767,0.17554
"""
variables_df = pd.read_csv(StringIO(variables_csv)).reset_index()

# Transform the DataFrame into an LSDB Catalog. Not strictly necessary for the crossmatch, but it showcases the ability to do so.
variables_catalog = lsdb.from_dataframe(variables_df)
variables_catalog

npartitions=2,index,ra,dec,period
"Order: 2, Pixel: 80",int64[pyarrow],double[pyarrow],double[pyarrow],double[pyarrow]
"Order: 5, Pixel: 8582",...,...,...,...


In [5]:
XMATCH_RADIUS_ARCSEC = 0.2
variable_object = variables_catalog.crossmatch(
    obj_catalog, radius_arcsec=XMATCH_RADIUS_ARCSEC, suffixes=["_var", "_obj"]
)
variable_dia = variables_catalog.crossmatch(
    dia_catalog, radius_arcsec=XMATCH_RADIUS_ARCSEC, suffixes=["_var", "_dia"]
)
# After the join, every column carries two suffixes: the columns from
# obj_catalog end in `_obj_obj`, the columns from dia_catalog end in `_dia_dia`,
# and the columns from `variables_df` appear twice, as `_var_obj` and `_var_dia`.
result = variable_object.join(
    variable_dia, left_on="index_var", right_on="index_var", suffixes=["_obj", "_dia"]
)
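To make the 0.2 arcsec radius concrete, here is a minimal NumPy sketch of an angular-separation test. The haversine formula and the helper name `separation_arcsec` are assumptions chosen for illustration (the notebook does not show how LSDB computes distances), and the two catalog positions are hypothetical offsets from the first row of the input table.

```python
import numpy as np


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = np.radians([ra1, dec1, ra2, dec2])
    hav = (
        np.sin((dec2 - dec1) / 2) ** 2
        + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2
    )
    return np.degrees(2 * np.arcsin(np.sqrt(hav))) * 3600


XMATCH_RADIUS_ARCSEC = 0.2
ra, dec = 94.95546, -24.73952  # first row of the input table

# Hypothetical catalog positions, offset in declination by 0.1 and 0.3 arcsec.
near = separation_arcsec(ra, dec, ra, dec + 0.1 / 3600)
far = separation_arcsec(ra, dec, ra, dec + 0.3 / 3600)

print(near <= XMATCH_RADIUS_ARCSEC, far <= XMATCH_RADIUS_ARCSEC)
```

With this radius, the first hypothetical position counts as a match and the second does not.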

And apply filtering according to the quality flags:

In [6]:
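# Use the column names as they appear in the joined schema: the join appends its
# suffix to every column, so `objectForcedSource_obj` is not a valid name here.
for column in ["objectForcedSource_obj_obj", "diaSource_dia_dia", "diaObjectForcedSource_dia_dia"]: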
    result = result.query(
        f"~{column}.psfFlux_flag"
        f" and ~{column}.pixelFlags_saturated"
        f" and ~{column}.pixelFlags_cr"
        f" and ~{column}.pixelFlags_bad"
    )
result

npartitions=8,index_var_obj,ra_var_obj,dec_var_obj,period_var_obj,coord_dec_obj_obj,coord_decErr_obj_obj,coord_ra_obj_obj,coord_raErr_obj_obj,g_psfFlux_obj_obj,g_psfFluxErr_obj_obj,g_psfMag_obj_obj,g_psfMagErr_obj_obj,i_psfFlux_obj_obj,i_psfFluxErr_obj_obj,i_psfMag_obj_obj,i_psfMagErr_obj_obj,objectId_obj_obj,patch_obj_obj,r_psfFlux_obj_obj,r_psfFluxErr_obj_obj,r_psfMag_obj_obj,r_psfMagErr_obj_obj,refBand_obj_obj,refFwhm_obj_obj,shape_flag_obj_obj,shape_xx_obj_obj,shape_xy_obj_obj,shape_yy_obj_obj,tract_obj_obj,u_psfFlux_obj_obj,u_psfFluxErr_obj_obj,u_psfMag_obj_obj,u_psfMagErr_obj_obj,x_obj_obj,xErr_obj_obj,y_obj_obj,y_psfFlux_obj_obj,y_psfFluxErr_obj_obj,y_psfMag_obj_obj,y_psfMagErr_obj_obj,yErr_obj_obj,z_psfFlux_obj_obj,z_psfFluxErr_obj_obj,z_psfMag_obj_obj,z_psfMagErr_obj_obj,objectForcedSource_obj_obj,_dist_arcsec_obj,index_var_dia,ra_var_dia,dec_var_dia,period_var_dia,dec_dia_dia,diaObjectId_dia_dia,nDiaSources_dia_dia,ra_dia_dia,radecMjdTai_dia_dia,tract_dia_dia,diaObjectForcedSource_dia_dia,diaSource_dia_dia,_dist_arcsec_dia
"Order: 8, Pixel: 329721",int64[pyarrow],double[pyarrow],double[pyarrow],double[pyarrow],double[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],int64[pyarrow],int64[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],string[pyarrow],float[pyarrow],bool[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],int64[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],double[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],float[pyarrow],"nested<coord_ra: [double], coord_dec: [double]...",double[pyarrow],int64[pyarrow],double[pyarrow],double[pyarrow],double[pyarrow],double[pyarrow],int64[pyarrow],int64[pyarrow],double[pyarrow],double[pyarrow],int64[pyarrow],"nested<band: [string], coord_dec: [double], co...","nested<band: [string], centroid_flag: [bool], ...",double[pyarrow]
"Order: 9, Pixel: 1324352",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Order: 9, Pixel: 2197038",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Order: 9, Pixel: 2197120",...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...

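The quality cut above keeps only detections for which none of the listed flags is set. The sketch below applies the same kind of expression to a flat, made-up pandas table so that the logic can be checked by eye; in the notebook the dotted form (`column.flag`) addresses the sub-columns of a nested column instead, which is understood to filter the rows inside it.

```python
import pandas as pd

# Made-up per-detection table; the flag names mirror those used in the query.
sources = pd.DataFrame(
    {
        "psfFlux": [1200.0, 1350.0, 900.0, 1100.0],
        "psfFlux_flag": [False, True, False, False],
        "pixelFlags_saturated": [False, False, False, False],
        "pixelFlags_cr": [False, False, True, False],
        "pixelFlags_bad": [False, False, False, False],
    }
)

flags = ["psfFlux_flag", "pixelFlags_saturated", "pixelFlags_cr", "pixelFlags_bad"]
# Build "~flag_a and ~flag_b and ...": keep rows where no flag is set.
expr = " and ".join(f"~{flag}" for flag in flags)
clean = sources.query(expr)

print(expr)
print(clean.index.tolist())
```

Rows 1 and 2 are dropped because one of their flags is set; rows 0 and 3 survive.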

In [7]:
with Client(n_workers=4, threads_per_worker=1) as client:
    # Sort light curves by variable object index for plotting purposes
    result_df = result.compute().sort_values("index_var_obj")

2025-07-07 20:58:26,230 - distributed.worker - ERROR - Compute Failed
Key:       ('lambda-68aad3370eed01fea4a384a64aa6c172', 7)
State:     executing
Task:  <Task ('lambda-68aad3370eed01fea4a384a64aa6c172', 7) apply_and_enforce(..., ...)>
Exception: 'UndefinedVariableError("name \'objectForcedSource_obj\' is not defined")'
Traceback: '  File "/opt/lsst/software/stack/conda/envs/lsst-scipipe-10.0.0/lib/python3.12/site-packages/dask/dataframe/core.py", line 98, in apply_and_enforce\n    df = func(*args, **kwargs)\n         ^^^^^^^^^^^^^^^^^^^^^\n  File "/home/nevencaplar/.local/lib/python3.12/site-packages/lsdb/nested/core.py", line 550, in <lambda>\n    lambda x: npd.NestedFrame(x).query(expr), meta=self._meta\n              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/home/nevencaplar/.local/lib/python3.12/site-packages/nested_pandas/nestedframe/core.py", line 826, in query\n    nest_names = self.extract_nest_names(expr, **kwargs)\n                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n 







UndefinedVariableError: name "name 'objectForcedSource_obj' is not defined" is not defined
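The schema displayed for `result` lists names such as `index_var_obj` and `objectForcedSource_obj_obj`, which suggests that both the crossmatch and the join append their suffix to every column. That would explain the error above: no column is called `objectForcedSource_obj`. The pandas sketch below only mimics this renaming with `add_suffix` on stand-in tables; it is not how LSDB implements either operation.

```python
import pandas as pd

# Stand-ins for the two inputs; only the column names matter here.
variables = pd.DataFrame({"index": [0], "ra": [94.95546]})
objects = pd.DataFrame({"coord_ra": [94.95546], "objectForcedSource": [None]})

# Step 1 (crossmatch): every column receives the suffix of its side.
xmatched = pd.concat(
    [variables.add_suffix("_var"), objects.add_suffix("_obj")], axis=1
)
# Step 2 (join): the left-hand table receives "_obj" on every column again.
joined_left = xmatched.add_suffix("_obj")

print(list(joined_left.columns))
```

Listing `result.columns` before writing a query is a quick way to confirm the exact names.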

In [None]:
result_df


In [None]:
COLORS = {
    "u": "#56b4e9",
    "g": "#009e73",
    "r": "#f0e442",
    "i": "#cc79a7",
    "z": "#d55e00",
    "y": "#0072b2",
}

In [None]:
def plot_mag_lightcurves(ax, row):
    """Plot magnitude light curves from DIA source, DIA forced source and forcedSource"""
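    # Flux column, source table name and nested light curve for each dataset
    # (the nested column names follow the joined schema of `result`)
    datasets = [
        ("scienceFlux", "diaSourceTable_tract", row.diaSource_dia_dia),
        ("psfDiffFlux", "forcedSourceOnDiaObjectTable", row.diaObjectForcedSource_dia_dia),
        ("psfFlux", "forcedSourceTable", row.objectForcedSource_obj_obj),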
    ]
    all_mags = [[], []]  # To store magnitudes for each row
    for i, (flux_col, table_name, lc) in enumerate(datasets):
        title = f"{flux_col} from {table_name}"
        flux_err_col = f"{flux_col}Err"
        ax[0, i].set_title(title)
        # Compute phase
        lc = lc.assign(
            phase=(lc.midpointMjdTai - lc.midpointMjdTai.loc[lc.psfFlux.idxmax()])
            % row.period_var_obj
            / row.period_var_obj
        )
        # First row: original light curve
        all_mags[0].extend(
            plot_mag_scale(
                ax[0, i],
                lc,
                flux_col,
                flux_err_col,
                x_name="midpointMjdTai",
                x_label="MJD",
                show_legend=(i == 0),
            )
        )
        # Second row: folded light curve
        all_mags[1].extend(
            plot_mag_scale(
                ax[1, i], lc, flux_col, flux_err_col, x_name="phase", x_label="Phase"
            )
        )
    return all_mags


def plot_mag_scale(ax, lc, flux_col, flux_err_col, x_name, x_label, show_legend=False):
    """Plot light curves in magnitude scale"""
    mag_values = []  # Store magnitudes for setting axis limits
    for band, color in COLORS.items():
        band_lc = lc.query(f"band == '{band}'")
        # Compute magnitudes and errors
        mag, magErr = create_mag_errors(band_lc[flux_col], band_lc[flux_err_col])
        ax.errorbar(
            band_lc[x_name],
            mag,
            magErr,
            fmt="o",
            label=band,
            color=color,
            alpha=1,
            markersize=5,
            capsize=3,
            elinewidth=1,
        )
        mag_values.extend(mag.dropna().values)  # Collect magnitude values
    ax.set_xlabel(x_label)
    ax.set_ylabel("Magnitude (AB)")
    ax.invert_yaxis()  # Brighter objects (smaller magnitudes) are plotted higher
    if show_legend:
        ax.legend(loc="lower right", fontsize=12)  # Show legend in top-left panel only
    return mag_values  # Return magnitudes for axis scaling


def create_mag_errors(sciFlux, sciFluxErr):
    """Convert fluxes in nJy to AB magnitudes and estimate a symmetric magnitude error"""
    mag = u.nJy.to(u.ABmag, sciFlux)
    upper_mag = u.nJy.to(u.ABmag, sciFlux + sciFluxErr)
    lower_mag = u.nJy.to(u.ABmag, sciFlux - sciFluxErr)
    magErr = -(upper_mag - lower_mag) / 2
    return mag, magErr


def scale_mag_y_axis(ax, all_mags):
    """Set uniform y-axis scaling for each plot row"""
    for row_idx in range(2):
        if all_mags[row_idx]:  # Ensure we have data
            ymin, ymax = np.nanmin(all_mags[row_idx]), np.nanmax(all_mags[row_idx])
            for i in range(3):  # Apply limits to all columns in the row
                ax[row_idx, i].set_ylim(
                    ymax + 0.1, ymin - 0.1
                )  # Keep magnitude inverted
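The conversion in `create_mag_errors` can be cross-checked without astropy. The sketch below is a plain NumPy re-derivation, assuming fluxes in nanojansky and the AB zero point of 31.4 mag that follows from a 3631 Jy reference flux; the helper names and the test values are chosen here for illustration. The error is half the difference between the magnitudes at flux minus and plus one sigma, as in the notebook's function.

```python
import numpy as np

AB_ZEROPOINT_NJY = 31.4  # m_AB = -2.5 * log10(flux / nJy) + 31.4


def njy_to_abmag(flux_njy):
    """Convert a flux in nanojansky to an AB magnitude."""
    return -2.5 * np.log10(flux_njy) + AB_ZEROPOINT_NJY


def mag_with_error(flux_njy, flux_err_njy):
    """Return the magnitude and half the spread between the +/- 1 sigma magnitudes."""
    mag = njy_to_abmag(flux_njy)
    bright = njy_to_abmag(flux_njy + flux_err_njy)  # smaller magnitude
    faint = njy_to_abmag(flux_njy - flux_err_njy)  # larger magnitude
    return mag, (faint - bright) / 2


# A 3631 nJy source with a 1% flux error (illustrative values).
mag, mag_err = mag_with_error(np.array([3631.0]), np.array([36.31]))
print(mag, mag_err)
```

For small relative errors the result is close to the first-order estimate 1.0857 × (flux error / flux). Note that a flux error larger than the flux itself makes the faint-side magnitude undefined (NaN).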

In [None]:
for _, row in result_df.iterrows():
    fig, ax = plt.subplots(2, 3, figsize=(16, 8), sharex="row")  # 2 rows, 3 columns
    # Title each figure with the input coordinates of the variable object
    fig.suptitle(
        f"RA={row.ra_var_obj:.5f}, Dec={row.dec_var_obj:.5f}",
        fontsize=16,
    )
    all_mags = plot_mag_lightcurves(ax, row)
    scale_mag_y_axis(ax, all_mags)
    plt.tight_layout()
    plt.show()
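The second row of each figure shows the folded light curve. As a minimal sketch of that folding step, the function below maps observation times to phases in [0, 1). The reference epoch and the observation times are hypothetical; the notebook instead uses the time of maximum `psfFlux` as the reference. The period is taken to be in days, since it is applied to MJD values.

```python
import numpy as np


def fold(times_mjd, period_days, t0_mjd):
    """Return phases in [0, 1) for the given times, period and reference epoch."""
    return ((times_mjd - t0_mjd) % period_days) / period_days


period = 0.12095  # days, first row of the input table
t0 = 60000.0  # hypothetical reference epoch (MJD)
# Hypothetical observation times: 0.25, 1.5 and 2.75 periods after t0.
times = t0 + period * np.array([0.25, 1.5, 2.75])

phases = fold(times, period, t0)
print(np.round(phases, 3))
```

Observations separated by whole periods land on the same phase, which is what makes a periodic signal line up in the folded panels.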

- Find the same objects in ZTF
- Do any of them have photometric redshifts (photo-z)?

Exercises:
- Repeat the same analysis for AGN