# Mismatch Margin Demonstration

**Author**: Melissa DeLucchi

This notebook demonstrates what happens when you attempt a cross-match with and without a margin catalog for the right catalog.

For this demonstration, the actual cross-match results aren't that meaningful, so I'm using the unit test data for speed.

In [18]:
import lsdb
from pathlib import Path

data_dir = "/home/delucchi/git/lsdb/tests/data"

small_sky_catalog = lsdb.read_hipscat(Path(data_dir) / "small_sky")
small_sky_xmatch_catalog = lsdb.read_hipscat(Path(data_dir) / "small_sky_xmatch")
small_sky_xmatch_margin = lsdb.read_hipscat(Path(data_dir) / "small_sky_xmatch_margin")


## Missing right margin

Here, we try to perform the cross-match without the margin for the *right* catalog. This results in an error.

In [19]:

small_sky_catalog.crossmatch(small_sky_xmatch_catalog)

ValueError: Right margin is required for cross-match

## Allow missing margin

Here, we allow a missing margin catalog. This option isn't highly publicized, because we'd really rather folks use margins for their cross-matches!

Even with the flag to allow missing margin, we STILL spit out a warning.

In [20]:
results = small_sky_catalog.crossmatch(small_sky_xmatch_catalog, require_right_margin=False)
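
The two behaviors seen so far (an error by default, a warning when the flag is turned off) can be pictured with a toy stand-in. This is only a sketch of the pattern, not lsdb's actual code: `check_right_margin`, the `has_margin` argument, and the warning text are all made up for illustration. Only the `require_right_margin` flag name and the error message come from the cells above.

```python
import warnings


def check_right_margin(has_margin: bool, require_right_margin: bool = True) -> None:
    """Toy stand-in for the margin check (hypothetical, not lsdb's code)."""
    if has_margin:
        return
    if require_right_margin:
        raise ValueError("Right margin is required for cross-match")
    # Hypothetical warning text; the real message may differ.
    warnings.warn("Right catalog has no margin.", RuntimeWarning)


# Default: a missing margin is an error.
try:
    check_right_margin(has_margin=False)
    outcome = "ok"
except ValueError as err:
    outcome = f"error: {err}"

# Opting out: the call goes through, but a warning is still recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_right_margin(has_margin=False, require_right_margin=False)

print(outcome)
print(len(caught), caught[0].category.__name__)
```

Wrapping the call in `warnings.catch_warnings(record=True)` is also a handy way to confirm in a test that a warning really was issued.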



## Bad match width

The KDTree cross-match algorithm is the default, and it takes a cross-match radius parameter. The cross-match radius should be LESS than the margin radius; otherwise, counterparts that lie beyond the margin would be missed.

Let's check what the right margin distance is (should be 7_200 arcseconds). We can then attempt a cross-match using a larger radius (say, 10_000 arcseconds).

In [21]:
small_sky_xmatch_margin.hc_structure.catalog_info.margin_threshold

7200

In [22]:
small_sky_xmatch_catalog.margin = small_sky_xmatch_margin
small_sky_catalog.crossmatch(small_sky_xmatch_catalog, radius_arcsec=10_000)

ValueError: Cross match radius is greater than margin threshold
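
As a rough picture of the comparison behind this error, here is a minimal sketch that checks a requested radius against the margin threshold. The helper `check_radius` is hypothetical and is not lsdb's implementation; the 7200 arcsecond threshold and the two radii are the values used in this notebook.

```python
def check_radius(radius_arcsec: float, margin_threshold_arcsec: float) -> None:
    """Toy version of the radius check (hypothetical, not lsdb's code)."""
    if radius_arcsec > margin_threshold_arcsec:
        raise ValueError("Cross match radius is greater than margin threshold")


margin_threshold_arcsec = 7200  # value reported by the margin catalog above

outcomes = {}
for radius_arcsec in (1_000, 10_000):
    try:
        check_radius(radius_arcsec, margin_threshold_arcsec)
        outcomes[radius_arcsec] = "ok"
    except ValueError as err:
        outcomes[radius_arcsec] = str(err)

print(outcomes)
```

A radius of 1_000 arcseconds fits inside the margin, while 10_000 arcseconds does not.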

## Everything is ok

Bringing it all together, this cross-match uses the toy "small sky" catalog and the "small sky xmatch" catalog. The second catalog has had its RA/Dec positions perturbed, with the perturbation distance recorded as a column.

The cross match is successful AND doesn't issue warnings because we have provided a valid margin catalog, and we have provided a radius that is within the margin threshold.

Further, we use the `_dist_arcsec` column of the result to check that the match distance matches our expectations from the initial perturbation.

In [23]:
small_sky_xmatch_catalog.margin = small_sky_xmatch_margin
results = small_sky_catalog.crossmatch(small_sky_xmatch_catalog, radius_arcsec=1_000).compute()
# Copy the selection so that adding a column below doesn't write to a view of `results`.
trimmed_results = results[["id_small_sky", "id_small_sky_xmatch", "calculated_dist_small_sky_xmatch", "_dist_arcsec"]].copy()
trimmed_results["_dist_degrees"] = trimmed_results["_dist_arcsec"] / 3600
trimmed_results

| `_hipscat_index` | `id_small_sky` | `id_small_sky_xmatch` | `calculated_dist_small_sky_xmatch` | `_dist_arcsec` | `_dist_degrees` |
|---|---|---|---|---|---|
| 12749688880727326720 | 707 | 707 | 0.006712 | 24.162094 | 0.006712 |
| 12751184493818150912 | 792 | 792 | 0.010034 | 36.121268 | 0.010034 |
| 12753202806647685120 | 723 | 723 | 0.004231 | 15.231399 | 0.004231 |
| 12753202806647685121 | 811 | 723 | 0.004231 | 15.231399 | 0.004231 |
| 12770681119980912640 | 826 | 826 | 0.003309 | 11.912542 | 0.003309 |
| ... | ... | ... | ... | ... | ... |
| 13467391906581315584 | 715 | 715 | 0.009322 | 33.558031 | 0.009322 |
| 13477206946360590336 | 782 | 782 | 0.004931 | 17.750196 | 0.004931 |
| 13488986123334057984 | 752 | 752 | 0.010022 | 36.080967 | 0.010022 |
| 13520476867982786560 | 746 | 746 | 0.008176 | 29.434745 | 0.008176 |
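
The comparison can also be done programmatically rather than by eye. The sketch below copies a few (`_dist_arcsec`, `calculated_dist_small_sky_xmatch`) pairs from the output above and checks that they agree after converting arcseconds to degrees. It assumes the recorded perturbation is in degrees, which the table suggests, and it allows for the 6-decimal rounding of the displayed values.

```python
import math

# (_dist_arcsec, calculated_dist_small_sky_xmatch) pairs copied from the output above.
rows = [
    (24.162094, 0.006712),
    (36.121268, 0.010034),
    (15.231399, 0.004231),
    (11.912542, 0.003309),
]

ARCSEC_PER_DEGREE = 3600

# The displayed values are rounded to 6 decimal places,
# so allow 1e-6 degrees of slack in the comparison.
all_consistent = all(
    math.isclose(dist_arcsec / ARCSEC_PER_DEGREE, recorded_deg, abs_tol=1e-6)
    for dist_arcsec, recorded_deg in rows
)
print(all_consistent)
```

On the full result, the same check could be applied to the `_dist_degrees` and `calculated_dist_small_sky_xmatch` columns directly.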
