# Standalone crossmatch method: let's crossmatch frames, too

New crossmatch method just dropped: `lsdb.crossmatch(left, right, ...)`.

This is a wrapper around `Catalog.crossmatch` that accepts `pd.DataFrame`s and `npd.NestedFrame`s as either or both sides of the crossmatch.

This enables the following crossmatch combinations:
- Catalog x Catalog
- Catalog x Frame
- Frame x Catalog
- Frame x Frame
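
As a rough mental model of the four combinations above, the wrapper only has to normalize each side to a catalog before handing off to the catalog-level crossmatch. The sketch below uses stand-in classes (`ToyCatalog`, `ToyFrame`) and hypothetical helper names (`as_catalog`, `toy_crossmatch`); it is not the lsdb implementation, only an illustration of the dispatch idea.

```python
from dataclasses import dataclass


@dataclass
class ToyCatalog:
    """Stand-in for a catalog object; not the lsdb class."""
    name: str


@dataclass
class ToyFrame:
    """Stand-in for a pandas / nested-pandas frame."""
    n_rows: int


def as_catalog(obj, default_name, **kwargs):
    """Return catalogs unchanged; promote frames to a catalog first."""
    if isinstance(obj, ToyCatalog):
        return obj
    return ToyCatalog(name=kwargs.get("catalog_name", default_name))


def toy_crossmatch(left, right, left_args=None, right_args=None):
    """Normalize both sides, then report which catalogs would be matched."""
    left_cat = as_catalog(left, "left", **(left_args or {}))
    right_cat = as_catalog(right, "right", **(right_args or {}))
    # A real wrapper would call the catalog-level crossmatch here.
    return (left_cat.name, right_cat.name)


des = ToyCatalog("des_dr2")
frame = ToyFrame(n_rows=2000)
print(toy_crossmatch(des, frame))    # Catalog x Frame
print(toy_crossmatch(frame, des))    # Frame x Catalog
print(toy_crossmatch(frame, frame, right_args={"catalog_name": "ztf"}))  # Frame x Frame
```

The `left_args` / `right_args` dictionaries mirror the keyword names used later in this notebook; everything else in the sketch is made up for illustration.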

In [2]:
import lsdb
import pandas as pd
import tempfile
from dask.distributed import Client

In [None]:
# Dask client

tmp_dir = tempfile.TemporaryDirectory()
tmp_path = str(tmp_dir.name)

client = Client(n_workers=1, local_directory=tmp_path)
client

Perhaps you already have a cluster running?
Hosting the HTTP server on port 40143 instead

Client
- Connection method: Cluster object
- Cluster type: distributed.LocalCluster
- Dashboard: http://127.0.0.1:40143/status

LocalCluster
- Workers: 1
- Total threads: 96
- Total memory: 0.98 TiB
- Status: running
- Using processes: True

Scheduler
- Comm: tcp://127.0.0.1:37931
- Started: Just now

Worker
- Comm: tcp://127.0.0.1:40730
- Dashboard: http://127.0.0.1:34849/status
- Nanny: tcp://127.0.0.1:38630
- Local directory: /tmp/tmpsigdm71s/dask-scratch-space/worker-brkn6g03


## Get data

In [None]:
# Get catalogs

des_cat = lsdb.read_hats("https://data.lsdb.io/hats/des/des_dr2", columns=["RA", "DEC"])
ztf_cat = lsdb.read_hats(
    "https://data.lsdb.io/hats/ztf_dr14/ztf_object",
    margin_cache="https://data.lsdb.io/hats/ztf_dr14/ztf_object_10arcs",
    columns=["ra", "dec", "mean_mag_g", "mean_mag_r", "mean_mag_i"],
)

In [None]:
# Get frames

des_frame = des_cat.head(2000)
ztf_frame = ztf_cat.head(2000)

## Catalog x Catalog

With two catalogs, this is essentially a thin wrapper around `Catalog.crossmatch`:

In [33]:
cat_x_cat = lsdb.crossmatch(des_cat, ztf_cat)

cat_x_cat

| npartitions=653 | RA_des_dr2 | DEC_des_dr2 | ra_ztf_dr14 | dec_ztf_dr14 | mean_mag_g_ztf_dr14 | mean_mag_r_ztf_dr14 | mean_mag_i_ztf_dr14 | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 4, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |
| Order: 5, Pixel: 8 | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3069 | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 4, Pixel: 3071 | ... | ... | ... | ... | ... | ... | ... | ... |


## Catalog x Frame

In [34]:
cat_x_frame = lsdb.crossmatch(des_cat, ztf_frame)

cat_x_frame

| npartitions=3 | RA_des_dr2 | DEC_des_dr2 | ra_right | dec_right | mean_mag_g_right | mean_mag_r_right | mean_mag_i_right | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 4, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |
| Order: 5, Pixel: 4437 | ... | ... | ... | ... | ... | ... | ... | ... |
| Order: 1, Pixel: 22 | ... | ... | ... | ... | ... | ... | ... | ... |


### Catalog name, suffixes

If no `catalog_name` argument is given, the new `Catalog`s we make from `Frame`s are named "left" or "right" by default.

In [35]:
# See this in the default suffixes for the columns:

cat_x_frame.columns

Index(['RA_des_dr2', 'DEC_des_dr2', 'ra_right', 'dec_right',
       'mean_mag_g_right', 'mean_mag_r_right', 'mean_mag_i_right',
       '_dist_arcsec'],
      dtype='object')

But if we do specify a `catalog_name`, this becomes our default suffix instead.

In [36]:
cat_x_frame = lsdb.crossmatch(des_cat, ztf_frame, right_args={"catalog_name": "ztf"})

cat_x_frame.columns

Index(['RA_des_dr2', 'DEC_des_dr2', 'ra_ztf', 'dec_ztf', 'mean_mag_g_ztf',
       'mean_mag_r_ztf', 'mean_mag_i_ztf', '_dist_arcsec'],
      dtype='object')

And naturally, we can still specify our own suffixes if we prefer.

In [37]:
cat_x_frame = lsdb.crossmatch(
    des_cat,
    ztf_frame,
    right_args={"catalog_name": "ztf"},
    suffixes=("_des", "_ztf_from_frame"),
)

cat_x_frame.columns

Index(['RA_des', 'DEC_des', 'ra_ztf_from_frame', 'dec_ztf_from_frame',
       'mean_mag_g_ztf_from_frame', 'mean_mag_r_ztf_from_frame',
       'mean_mag_i_ztf_from_frame', '_dist_arcsec'],
      dtype='object')
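
The three cases in this section (default name, `catalog_name`, explicit `suffixes`) follow a simple precedence. Here is a minimal sketch of that precedence, assuming suffixes are just `_` plus the catalog name unless given explicitly; `resolve_suffixes` and `suffixed` are hypothetical helpers written for this note, not lsdb functions.

```python
def resolve_suffixes(left_name, right_name, suffixes=None):
    """Explicit suffixes win; otherwise derive them from the catalog names."""
    if suffixes is not None:
        return tuple(suffixes)
    return (f"_{left_name}", f"_{right_name}")


def suffixed(columns, suffix):
    """Append a suffix to every column name."""
    return [f"{col}{suffix}" for col in columns]


# Frame with no catalog_name: falls back to "right".
print(resolve_suffixes("des_dr2", "right"))
# Frame with catalog_name="ztf".
print(resolve_suffixes("des_dr2", "ztf"))
# Explicit suffixes override both names.
print(resolve_suffixes("des_dr2", "ztf", suffixes=("_des", "_ztf_from_frame")))

_, right_suffix = resolve_suffixes("des_dr2", "ztf")
print(suffixed(["ra", "dec", "mean_mag_g"], right_suffix))
```

The derived names line up with the column indexes printed in the cells above (`ra_right`, `ra_ztf`, `ra_ztf_from_frame`).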

## Frame x Catalog

### Ra, dec columns

For convenience, we check for "RA" and "DEC" columns if the default "ra" and "dec" are not present.

This is nice in cases like our DES frame, which we just obtained via `.head()` with no further modifications such as renaming the columns:

In [38]:
des_frame.columns

Index(['RA', 'DEC'], dtype='object')

In [39]:
frame_x_cat = lsdb.crossmatch(des_frame, ztf_cat)

frame_x_cat

| npartitions=1 | RA_left | DEC_left | ra_ztf_dr14 | dec_ztf_dr14 | mean_mag_g_ztf_dr14 | mean_mag_r_ztf_dr14 | mean_mag_i_ztf_dr14 | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 3, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |


We can still specify the column names explicitly if desired:

In [40]:
frame_x_cat = lsdb.crossmatch(des_frame, ztf_cat, left_args={"ra_column": "RA", "dec_column": "DEC"})

frame_x_cat

| npartitions=1 | RA_left | DEC_left | ra_ztf_dr14 | dec_ztf_dr14 | mean_mag_g_ztf_dr14 | mean_mag_r_ztf_dr14 | mean_mag_i_ztf_dr14 | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 3, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |

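The column lookup described in this section (try "ra"/"dec", fall back to "RA"/"DEC", let explicit arguments override) can be sketched in a few lines. `find_coord_columns` is a hypothetical helper, and raising `ValueError` when nothing matches is an assumption of this sketch rather than documented lsdb behavior.

```python
def find_coord_columns(columns, ra_column=None, dec_column=None):
    """Pick the RA/Dec column names, preferring explicit arguments."""

    def pick(explicit, candidates):
        if explicit is not None:
            if explicit not in columns:
                raise ValueError(f"column {explicit!r} not found")
            return explicit
        # No explicit name: try the lowercase default first, then uppercase.
        for name in candidates:
            if name in columns:
                return name
        raise ValueError(f"none of {candidates} found in {list(columns)}")

    return pick(ra_column, ("ra", "RA")), pick(dec_column, ("dec", "DEC"))


print(find_coord_columns(["RA", "DEC"]))                 # DES-style frame
print(find_coord_columns(["ra", "dec", "mean_mag_g"]))   # ZTF-style frame
print(find_coord_columns(["alpha", "delta"], ra_column="alpha", dec_column="delta"))
```
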

## Frame x Frame

In [62]:
frame_x_frame = lsdb.crossmatch(des_frame, ztf_frame)

frame_x_frame

| npartitions=1 | RA_left | DEC_left | ra_right | dec_right | mean_mag_g_right | mean_mag_r_right | mean_mag_i_right | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 0, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |


### Margins

The default `margin_threshold` used when converting a `Frame` to a `Catalog` is `5.0` arcseconds.

If we want, we can set it to `None` so that no margin is generated for our right catalog.

In [63]:
frame_x_frame_no_margins = lsdb.crossmatch(des_frame, ztf_frame, right_args={"margin_threshold": None})

frame_x_frame_no_margins



| npartitions=1 | RA_left | DEC_left | ra_right | dec_right | mean_mag_g_right | mean_mag_r_right | mean_mag_i_right | _dist_arcsec |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Order: 0, Pixel: 0 | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] | double[pyarrow] |


In [64]:
with_margins = frame_x_frame.compute()
no_margins = frame_x_frame_no_margins.compute()

print(len(with_margins), len(no_margins))

242 242


Well...you'll have to take my word for it 🙃

*(We do see the warning in the no-margin cell above, but I'd love to hear if anyone has any suggestions off the top of their head for catalogs or settings that would give different results for margin/no margin.)*
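
On the question of when margin vs. no margin would differ: margins only matter for pairs that straddle a partition boundary. The toy below is a 1-D model with made-up numbers and hypothetical helper names, not lsdb code. It matches each partition separately and optionally pads the right side with a margin, which is enough to show a pair being lost without one.

```python
def match_in_partition(left_pts, right_pts, radius):
    """Nearest right point within `radius` for each left point."""
    matches = []
    for x in left_pts:
        best = min(right_pts, key=lambda r: abs(r - x), default=None)
        if best is not None and abs(best - x) <= radius:
            matches.append((x, best))
    return matches


def partitioned_match(left, right, boundary, radius, margin=None):
    """Match partition by partition; optionally pad the right side."""
    pad = margin or 0.0
    results = []
    for lo, hi in ((float("-inf"), boundary), (boundary, float("inf"))):
        left_part = [x for x in left if lo <= x < hi]
        # With a margin, right-side points just outside the partition are visible too.
        right_part = [x for x in right if lo - pad <= x < hi + pad]
        results.extend(match_in_partition(left_part, right_part, radius))
    return results


left = [-0.5, 9.0]   # one point sits just below the boundary at 0
right = [0.5, 9.2]   # its counterpart sits just above the boundary

print(len(partitioned_match(left, right, boundary=0.0, radius=2.0, margin=None)))
print(len(partitioned_match(left, right, boundary=0.0, radius=2.0, margin=5.0)))
```

In this toy, the pair at -0.5 / 0.5 is only found when the margin is on. In the notebook run above, both results live in a single partition ("Order: 0, Pixel: 0"), which may be why the two counts agree; inputs whose sources sit close to a partition edge would be the place to look for a difference.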

## Clean up

In [69]:
tmp_dir.cleanup()