# User's guide to joining DP1 Photo-Z outputs with the DP1 butler data

**Author**: Bryce Kalmbach

**Last Updated**: 2025/07/11

## Purpose

Public photometric redshift (PZ) estimates are available to DP1 users (see the [DP1 PZ technote](https://sitcomtn-154.lsst.io) for more information).

This notebook gives a brief example of retrieving the redshifts served by LINCC Frameworks' LSDB ([Caplar et al., 2025](https://ui.adsabs.harvard.edu/abs/2025arXiv250102103C/abstract)), which are available via the portal at https://data.lsdb.io/. We then join the redshifts with the object table in the DP1 butler and calculate magnitudes and colors for all the objects.

In [None]:
## Variables for Times Square
# Specify the names of two algorithms to compare
algo_1 = 'knn'
algo_2 = 'bpz'
# Which type of flux model do you want to use?
flux_model = 'gaap1p0'
# For which bands do you want to require photometry in the object table? (Specify all in a single string)
required_bands = 'griz'

## Initial Setup

In [None]:
# Imports
import numpy as np
import pandas as pd
from lsst.daf.butler import Butler
from astropy.table import Table, join
from astropy import units as u
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
# Get a DP1 butler instance
butler = Butler('dp1')

In [None]:
# Download the PZ data from LSDB portal and put into an astropy table.
pz_table = Table.read('https://data.lsdb.io/hats/dp1/object_photoz.parquet')

## Available algorithms

The cell below lists the algorithms that appear in the DP1 PZ data set. Any of them can be specified at the top of the notebook.

In [None]:
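# Each algorithm has a '<algo>_z_mode' column; take the prefix as the algorithm name.
# Building the list first avoids nesting single quotes inside the f-string,
# which is a SyntaxError before Python 3.12.
available_algos = [col_name.split('_')[0] for col_name in pz_table.columns if col_name.endswith('z_mode')]
print(f'Available PZ algorithms in DP1: {available_algos}')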
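
Before running the rest of the notebook, it can help to confirm that `algo_1` and `algo_2` are among the available algorithms. The following is a minimal sketch that assumes each algorithm contributes a column named `<algo>_z_mode`, as in the cell above. The column names in `example_pz_columns` and the helper names `available_algorithms` and `check_algorithms` are hypothetical stand-ins, not part of the DP1 data or any library.

```python
def available_algorithms(column_names):
    """Return algorithm prefixes for columns ending in 'z_mode'."""
    return [name.split('_')[0] for name in column_names if name.endswith('z_mode')]


def check_algorithms(requested, column_names):
    """Raise ValueError if any requested algorithm lacks a z_mode column."""
    available = available_algorithms(column_names)
    missing = [algo for algo in requested if algo not in available]
    if missing:
        raise ValueError(f"Unknown algorithm(s) {missing}; available: {available}")
    return available


# Hypothetical column names, for illustration only
example_pz_columns = ['objectId', 'knn_z_mode', 'knn_z_median', 'bpz_z_mode']
algos = check_algorithms(['knn', 'bpz'], example_pz_columns)
print(algos)
```

In the notebook itself, the same check could be run as `check_algorithms([algo_1, algo_2], pz_table.columns)`.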

## Pull data from butler

Now we pull a subset of the columns from the object table for one of the tracts covered in DP1. Here we include the `objectId` to join this table with our PZ table. We also pull some extendedness parameters and `psfFlux` measurements to do some basic quality cuts like the ones included in the DP1 PZ technote. We then pull the fluxes and flux errors for the desired `flux_model`.

In [None]:
obj_columns = ['objectId', 'coord_ra', 'coord_dec', 
               'g_extendedness', 'r_extendedness',
               f'u_{flux_model}Flux', f'g_{flux_model}Flux', 
               f'r_{flux_model}Flux', f'i_{flux_model}Flux', 
               f'z_{flux_model}Flux', f'y_{flux_model}Flux',
               f'u_{flux_model}FluxErr', f'g_{flux_model}FluxErr', 
               f'r_{flux_model}FluxErr', f'i_{flux_model}FluxErr', 
               f'z_{flux_model}FluxErr', f'y_{flux_model}FluxErr']
if flux_model != 'psf':
    obj_columns.append('i_psfFlux')
    obj_columns.append('i_psfFluxErr')
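
The same column list can be built with a loop, which avoids repeating each band by hand. This is a sketch only: `build_obj_columns` is a hypothetical helper, and it assumes the `<band>_<flux_model>Flux` and `<band>_<flux_model>FluxErr` naming pattern used in the cell above.

```python
def build_obj_columns(flux_model, bands='ugrizy'):
    """Build the object-table column list for one flux model."""
    columns = ['objectId', 'coord_ra', 'coord_dec',
               'g_extendedness', 'r_extendedness']
    columns += [f'{band}_{flux_model}Flux' for band in bands]
    columns += [f'{band}_{flux_model}FluxErr' for band in bands]
    # The i-band PSF flux is always needed for the signal-to-noise cut,
    # so add it only if the chosen flux model did not already include it
    for extra in ('i_psfFlux', 'i_psfFluxErr'):
        if extra not in columns:
            columns.append(extra)
    return columns


example_obj_columns = build_obj_columns('gaap1p0')
print(len(example_obj_columns))
```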

In [None]:
obj_table = butler.get('object', 
                       instrument='LSSTComCam', 
                       skymap='lsst_cells_v1', 
                       tract=5063, 
                       collections='LSSTComCam/DP1', 
                       parameters={'columns': obj_columns})

### Object Table Quality Cuts

In [None]:
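# Keep objects classified as extended (extendedness > 0.5) in g or r.
# np.ma.filled treats masked (missing) extendedness values as not extended.
is_extended = (np.ma.filled(obj_table['g_extendedness'] > 0.5, False)
               | np.ma.filled(obj_table['r_extendedness'] > 0.5, False))
obj_table = obj_table[is_extended]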

In [None]:
obj_table = obj_table[(obj_table['i_psfFlux'] / obj_table['i_psfFluxErr']) > 5.]
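
To make the two cuts concrete, here is a small sketch on synthetic NumPy arrays rather than the real object table. All values are made up for illustration, and the sketch assumes the intent of the cuts is to keep extended sources (extendedness above 0.5 in g or r) with an i-band PSF signal-to-noise ratio above 5.

```python
import numpy as np

# Synthetic values for four objects (illustration only)
g_extendedness = np.array([1.0, 0.0, np.nan, 0.0])
r_extendedness = np.array([1.0, 0.0, 1.0, 0.0])
i_psf_flux = np.array([500.0, 800.0, 40.0, 900.0])    # nJy
i_psf_flux_err = np.array([50.0, 20.0, 10.0, 30.0])   # nJy

# Comparisons against NaN evaluate to False, so a missing value fails that band's test
is_extended = (g_extendedness > 0.5) | (r_extendedness > 0.5)
snr = i_psf_flux / i_psf_flux_err
keep = is_extended & (snr > 5.0)
print(keep)
```

Only the first object survives: the second and fourth are point-like, and the third is extended in r but has a signal-to-noise ratio of 4.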

## Join Tables

We join the tables on the `objectId` columns.

In [None]:
# Inner join (the astropy default): keep only objects present in both tables
combined_table = join(pz_table, obj_table, keys='objectId')
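
An inner join keeps only rows whose `objectId` appears in both tables, so the combined table can be much smaller than either input. A quick way to anticipate this is to count the shared IDs first. The sketch below uses made-up ID arrays in place of `pz_table['objectId']` and `obj_table['objectId']`.

```python
import numpy as np

# Made-up IDs standing in for the two objectId columns
pz_ids = np.array([101, 102, 103, 104, 105])
obj_ids = np.array([103, 105, 106])

# Sorted unique IDs present in both arrays; with unique IDs in each table,
# an inner join returns one row per shared ID
shared_ids = np.intersect1d(pz_ids, obj_ids)
print(f'{len(shared_ids)} of {len(obj_ids)} object-table rows have a photo-z match')
```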

## Calculate magnitudes and subselect data with required photometry

Using the `required_bands` variable at the top of the notebook, we can require that objects have measurements in the object table for the given set of bands.

In [None]:
# Start with every object passing, then AND in the result for each required band
required_bands_present = np.ones(len(combined_table), dtype=bool)
band_list = ['u', 'g', 'r', 'i', 'z', 'y']
required_band_list = list(required_bands)
for band_label in band_list:
    # Convert fluxes (nJy) to AB magnitudes; negative or missing fluxes become NaN
    combined_table[f'{band_label}_{flux_model}Mag'] = \
        (combined_table[f'{band_label}_{flux_model}Flux']*u.nJy).to(u.ABmag)
    if band_label in required_band_list:
        required_bands_present &= ~np.isnan(combined_table[f'{band_label}_{flux_model}Mag'])

In [None]:
# Index directly with the boolean selection; np.where and '== True' are unnecessary
combined_table = combined_table[np.asarray(required_bands_present, dtype=bool)]
print(f'Total number of objects with observations in all required bands ({required_band_list}): {len(combined_table)}')
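
For reference, converting a flux in nJy to an AB magnitude is equivalent to `m_AB = -2.5 * log10(f_nJy) + 31.4`, since the AB zero point is 3631 Jy. The sketch below applies that formula directly with NumPy on made-up fluxes, showing why negative or missing fluxes end up as NaN and are then removed by the required-band selection. The helper name `njy_to_abmag` is hypothetical.

```python
import numpy as np


def njy_to_abmag(flux_njy):
    """Convert fluxes in nJy to AB magnitudes (zero point 3631 Jy)."""
    flux_njy = np.asarray(flux_njy, dtype=float)
    # Silence the warnings NumPy raises for log10 of negative or zero values
    with np.errstate(invalid='ignore', divide='ignore'):
        return -2.5 * np.log10(flux_njy) + 31.4


# Made-up fluxes: one valid, one negative, one missing
example_mags = njy_to_abmag([3631.0, -10.0, np.nan])
has_mag = ~np.isnan(example_mags)
print(example_mags, has_mag)
```

A flux of 3631 nJy corresponds to roughly magnitude 22.5, while the negative and missing fluxes yield NaN and fail the `has_mag` test.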

## Make some plots!

Time to play with the data!

### Color-Redshift Plots

Here we make some color-redshift plots for each of the two algorithms specified at the top of the notebook and compare them side-by-side.

In [None]:
fig = plt.figure(figsize=(12, 20))
plot_idx = 1
# One row per adjacent-band color (u-g, g-r, ...), one column per algorithm
for band_1, band_2 in zip(band_list[:-1], band_list[1:]):
    color_on = combined_table[f'{band_1}_{flux_model}Mag'] - combined_table[f'{band_2}_{flux_model}Mag']

    for algo in (algo_1, algo_2):
        fig.add_subplot(5, 2, plot_idx)
        plt.scatter(combined_table[f'{algo}_z_mode'], color_on, s=2, alpha=0.1)
        plt.title(f'{band_1} - {band_2} color vs. photo-z for {algo}')
        plt.ylim(-3, 5)
        plt.xlabel('Photometric Redshift')
        plt.ylabel(f'{band_1} - {band_2} color with {flux_model}')
        plot_idx += 1
plt.tight_layout()

### n(z) plots

Here we compare the overall estimated n(z) distributions from the two algorithms.

In [None]:
fig = plt.figure(figsize=(14, 6))

bins = np.linspace(0, 3, 11) # 10 bins

fig.add_subplot(1,2,1)
plt.hist(combined_table[f'{algo_1}_z_mode'], bins=bins)
plt.title(f'n(z) for Photo-Z Algorithm: {algo_1}')
plt.xlabel('Photo-Z')
plt.ylabel('Count')

fig.add_subplot(1,2,2)
plt.hist(combined_table[f'{algo_2}_z_mode'], bins=bins)
plt.title(f'n(z) for Photo-Z Algorithm: {algo_2}')
plt.xlabel('Photo-Z')
plt.ylabel('Count')

In [None]:
plt.hist(combined_table[f'{algo_1}_z_mode'], alpha=0.2, label=algo_1, bins=bins)
plt.hist(combined_table[f'{algo_2}_z_mode'], alpha=0.2, label=algo_2, bins=bins)
plt.legend()
plt.xlabel('Photo-Z')
plt.ylabel('Count')
plt.title('Comparing n(z) for two algorithms')
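
To put a number on how different the two n(z) estimates are, one option is to histogram both sets of redshifts on the same bins and compare the normalized counts. This sketch uses synthetic redshifts drawn from a seeded random generator in place of the `z_mode` columns. The choice of summary statistic (half the summed absolute difference of the normalized histograms) is an illustrative assumption, not something taken from the DP1 PZ technote.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-ins for the two z_mode columns
z_algo_1 = rng.gamma(shape=2.0, scale=0.4, size=1000)
z_algo_2 = rng.gamma(shape=2.0, scale=0.5, size=1000)

bins = np.linspace(0, 3, 11)  # same 10 bins as the plots
counts_1, _ = np.histogram(z_algo_1, bins=bins)
counts_2, _ = np.histogram(z_algo_2, bins=bins)

# Normalize to fractions so the comparison does not depend on sample size
frac_1 = counts_1 / counts_1.sum()
frac_2 = counts_2 / counts_2.sum()

# 0 means identical binned distributions, 1 means no overlap
difference = 0.5 * np.abs(frac_1 - frac_2).sum()
print(f'Binned n(z) difference: {difference:.3f}')
```

With the real data, `z_algo_1` and `z_algo_2` would be replaced by `combined_table[f'{algo_1}_z_mode']` and `combined_table[f'{algo_2}_z_mode']`.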