# Playing around with different margins

In building a margin catalog for ZTF, we wanted to generate it with a 10 arcsecond margin distance. But I also wanted to check it against the default margin distance of 5 arcseconds. This notebook compares the two resulting margin data sets.

Maybe Sean can use either or both margins to see whether the cross-match changes much with a different margin distance.

In [1]:
import pandas as pd
from hipscat.catalog.healpix_dataset.healpix_dataset import HealpixDataset
from hipscat.io import file_io, paths

margin_5arcs = "/data3/epyc/data3/hipscat/test_catalogs/ztf_dr14_5arcs"
margin_10arcs = "/data3/epyc/data3/hipscat/catalogs/ztf_dr14_10arcs"

small_margin = HealpixDataset.read_from_hipscat(margin_5arcs)
big_margin = HealpixDataset.read_from_hipscat(margin_10arcs)

In [2]:
small_margin.catalog_info.total_rows

2553574

In [3]:
big_margin.catalog_info.total_rows

5220930

The "small" margin catalog was generated using a 5 arcsecond margin distance, and the "big" margin catalog was generated using a 10 arcsecond distance.

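Naively, I would expect the "big" one to have a little more than twice as many points as the small one. The margin is a thin strip just outside each pixel's boundary. For evenly spread sources, its row count should grow roughly linearly with the margin distance. The corners add a small extra contribution that grows with the square of the distance.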

And what do you know:

In [4]:
big_margin.catalog_info.total_rows / small_margin.catalog_info.total_rows

2.044557941144451
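As a rough sanity check on that expectation, here is a back-of-the-envelope sketch. It rests on my own simplifications, not on how hipscat actually computes margins:

* a pixel is a flat square whose side is the square root of a HEALPix pixel's area;
* sources are spread uniformly;
* the margin is a square strip of the given width around the pixel.

`band_area` and `healpix_pixel_side_arcsec` are names made up for this sketch.

```python
import math


def band_area(side: float, width: float) -> float:
    """Area of a strip of the given width around a flat square of side `side`.

    (side + 2*width)**2 - side**2 == 4*side*width + 4*width**2: the linear
    term comes from the four edges, the quadratic term from the four corners.
    """
    return (side + 2 * width) ** 2 - side ** 2


def healpix_pixel_side_arcsec(order: int) -> float:
    """Approximate HEALPix pixel side, taken as sqrt(pixel area), in arcseconds."""
    n_pixels = 12 * 4 ** order
    side_rad = math.sqrt(4 * math.pi / n_pixels)
    return math.degrees(side_rad) * 3600


# Simplification: flat square pixel, uniform density, square (not rounded) corners
side = healpix_pixel_side_arcsec(3)  # an order 3 pixel, as an example
ratio = band_area(side, 10) / band_area(side, 5)
print(f"pixel side ~ {side:.0f} arcsec; 10/5 arcsec margin area ratio ~ {ratio:.4f}")
```

For an order 3 pixel this lands only slightly above 2. The idealized picture captures the "a little more than twice" intuition, but not the exact 2.04 measured above.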

I'd also expect the two catalogs to have the same pixel list.

In [5]:
assert big_margin.partition_info.get_healpix_pixels() == small_margin.partition_info.get_healpix_pixels()
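If that assertion ever fails, a bare `assert` doesn't say which pixels differ. The sketch below compares the lists as sets of `(order, pixel)` tuples and reports what's missing on each side. `pixel_list_diff` is a hypothetical helper, not a hipscat function, and the pixels in the toy call are made up.

```python
def pixel_list_diff(pixels_a, pixels_b):
    """Return (only_in_a, only_in_b) as sorted lists of (order, pixel) tuples."""
    set_a = set(pixels_a)
    set_b = set(pixels_b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Toy example with made-up pixels; in this notebook the inputs could be built as
# [(p.order, p.pixel) for p in big_margin.partition_info.get_healpix_pixels()]
only_big, only_small = pixel_list_diff([(3, 178), (3, 179), (4, 720)], [(3, 178), (3, 179)])
print(only_big, only_small)  # [(4, 720)] []
```

Note that the set comparison ignores ordering and duplicates. The list `==` in the assertion is stricter and also requires both lists to be in the same order.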

And I'd expect each pixel to have more rows in the "big" margin, but otherwise look about the same.

So let's pick one:

In [10]:
sample_pixel = small_margin.partition_info.get_healpix_pixels()[502]
sample_pixel

Order: 3, Pixel: 178

In [11]:
sample_pixel_small = paths.pixel_catalog_file(margin_5arcs, sample_pixel.order, sample_pixel.pixel)
small_margin_data = pd.read_parquet(sample_pixel_small)

sample_pixel_big = paths.pixel_catalog_file(margin_10arcs, sample_pixel.order, sample_pixel.pixel)
big_margin_data = pd.read_parquet(sample_pixel_big)

len(big_margin_data) / len(small_margin_data)

2.8181818181818183

In [12]:
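# Count margin rows by the partition they were copied from (margin_Norder, margin_Npix), plus each one's share in percent
stats = small_margin_data.groupby(["margin_Norder", "margin_Npix"]).size().to_frame('size').reset_index()
stats["proportion"] = stats["size"]/len(small_margin_data)*100
stats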

|   | margin_Norder | margin_Npix | size | proportion (%) |
|---|---|---|---|---|
| 0 | 3 | 167 | 89 | 23.116883 |
| 1 | 3 | 176 | 120 | 31.168831 |
| 2 | 3 | 179 | 66 | 17.142857 |
| 3 | 3 | 184 | 110 | 28.571429 |
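The same groupby gets repeated for the big margin, so it could be wrapped in a small helper. The sketch below does that. `margin_source_stats` is a hypothetical name, and the toy DataFrame stands in for a real margin partition.

```python
import pandas as pd


def margin_source_stats(margin_df: pd.DataFrame) -> pd.DataFrame:
    """Count margin rows per originating partition and each one's share in percent."""
    stats = (
        margin_df.groupby(["margin_Norder", "margin_Npix"])
        .size()
        .to_frame("size")
        .reset_index()
    )
    stats["proportion"] = stats["size"] / len(margin_df) * 100
    return stats


# Toy data: 3 rows from pixel 176 and 1 row from pixel 179, both at order 3
toy = pd.DataFrame({"margin_Norder": [3, 3, 3, 3], "margin_Npix": [176, 176, 176, 179]})
print(margin_source_stats(toy))
```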


In [13]:
# Same breakdown for the 10 arcsecond margin of the same pixel
stats = big_margin_data.groupby(["margin_Norder", "margin_Npix"]).size().to_frame('size').reset_index()
stats["proportion"] = stats["size"]/len(big_margin_data)*100
stats

|   | margin_Norder | margin_Npix | size | proportion (%) |
|---|---|---|---|---|
| 0 | 3 | 167 | 255 | 23.502304 |
| 1 | 3 | 176 | 312 | 28.75576 |
| 2 | 3 | 179 | 278 | 25.62212 |
| 3 | 3 | 184 | 240 | 22.119816 |
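Putting the two tables side by side makes the per-neighbor growth easier to read. This sketch hard-codes the counts from the In [12] and In [13] outputs rather than re-reading the parquet files. It then merges them on the originating pixel.

```python
import pandas as pd

# Row counts copied from the In [12] and In [13] outputs for margin pixel (3, 178)
small = pd.DataFrame({"margin_Norder": [3, 3, 3, 3],
                      "margin_Npix": [167, 176, 179, 184],
                      "size": [89, 120, 66, 110]})
big = pd.DataFrame({"margin_Norder": [3, 3, 3, 3],
                    "margin_Npix": [167, 176, 179, 184],
                    "size": [255, 312, 278, 240]})

# Inner join on the originating partition; suffixes label which margin each count is from
comparison = small.merge(big, on=["margin_Norder", "margin_Npix"], suffixes=("_5arcs", "_10arcs"))
comparison["ratio"] = comparison["size_10arcs"] / comparison["size_5arcs"]
print(comparison)

# The totals reproduce the 2.818... per-pixel ratio from In [11]
print(comparison["size_10arcs"].sum() / comparison["size_5arcs"].sum())
```

Going from 5 to 10 arcseconds, the growth is far from uniform across neighbors. Rows from pixel 179 grow by about 4.2x (66 to 278), while rows from pixel 184 grow by about 2.2x (110 to 240).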
