## Reimport associations

This notebook demonstrates how to repartition association catalogs with hats-import.

In [1]:
import lsdb
import pyarrow.parquet as pq
import tempfile
import hats.io.paths as paths
from hats_import.pipeline import pipeline
from hats_import.association.arguments import AssociationArguments

In [2]:
tmp_path = tempfile.TemporaryDirectory()
original_path = f"{tmp_path.name}/original"
reimported_path = f"{tmp_path.name}/reimported"

Association catalogs are currently generated exclusively by LSDB:

In [3]:
small_sky = lsdb.open_catalog("small_sky")
small_sky_source = lsdb.open_catalog("small_sky_order1_source_collection")

In [4]:
assoc = small_sky.crossmatch(small_sky_source, suffixes=("_obj", "_src"), radius_arcsec=3600)

In [5]:
lsdb.io.to_association(
    assoc[["id_obj", "object_id_src", "_dist_arcsec"]],
    base_catalog_path=original_path,
    catalog_name="small_sky_object_source_association",
    # Association properties
    primary_catalog_dir="small_sky",
    primary_column_association="id_obj",
    primary_id_column="id",
    join_catalog_dir="small_sky_order1_source_collection",
    join_column_association="object_id_src",
    join_id_column="object_id",
    separation_column="_dist_arcsec",
)

The original association catalog was written at order 1:

In [6]:
!cat $original_path/partition_info.csv

Norder,Npix
1,44
1,45
1,46
1,47


We can re-run the import with a set of `AssociationArguments`:

In [7]:
args = AssociationArguments.reimport_from_hats(
    original_path,
    reimported_path,
    output_artifact_name="small_sky_assoc",
    constant_healpix_order=2,
    simple_progress_bar=True,
    resume=False,
)
pipeline(args)

Validating catalog at path /var/folders/x4/rmzh8l_s0zxc74nwr72z12340000gn/T/tmpi49h17we/original ... 
Found 4 partitions.
Approximate coverage is 8.33 % of the sky.


Planning  : 100%|██████████| 4/4 [00:00<00:00, 2460.36it/s]
Mapping   : 100%|██████████| 4/4 [00:01<00:00,  3.64it/s]
Binning   : 100%|██████████| 2/2 [00:00<00:00, 532.64it/s]
Splitting : 100%|██████████| 4/4 [00:00<00:00, 215.71it/s]
Reducing  : 100%|██████████| 14/14 [00:00<00:00, 275.42it/s]
Finishing : 100%|██████████| 5/5 [00:00<00:00, 474.52it/s]
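
The coverage figure printed during validation can be reproduced by hand. The sketch below is not part of hats-import; `sky_fraction` is a hypothetical helper, and it assumes coverage is simply the number of partition pixels divided by the total number of HEALPix pixels at that order (12 * 4^order).

```python
def sky_fraction(order: int, n_pixels: int) -> float:
    """Fraction of the sky covered by n_pixels HEALPix pixels at one order."""
    # HEALPix divides the sphere into 12 * 4**order equal-area pixels.
    total_pixels = 12 * 4**order
    return n_pixels / total_pixels


# Four order-1 partitions, as listed in the original partition_info.csv.
original_fraction = sky_fraction(order=1, n_pixels=4)
print(f"{original_fraction:.2%}")
```

With four of the 48 order-1 pixels populated, this gives 8.33 %, consistent with the validation message above.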


#### Some checks

The new catalog contains only order-2 pixels:

In [8]:
reimported_dir = f"{reimported_path}/small_sky_assoc"
!cat $reimported_dir/partition_info.csv

Norder,Npix
2,176
2,177
2,178
2,179
2,180
2,181
2,182
2,183
2,184
2,185
2,186
2,187
2,188
2,190
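
As a sanity check on this listing, every order-2 pixel should be a child of one of the original order-1 pixels. The sketch below assumes the nested HEALPix numbering, in which a pixel `p` at a given order contains pixels `4*p` through `4*p + 3` at the next order. The pixel lists are copied from the two `partition_info.csv` outputs in this notebook, and `children_at_order` is a hypothetical helper, not a hats function.

```python
def children_at_order(pixel: int, order: int, target_order: int) -> range:
    """Nested-scheme pixels at target_order contained in `pixel` at `order`."""
    if target_order < order:
        raise ValueError("target_order must be >= order")
    factor = 4 ** (target_order - order)
    return range(pixel * factor, (pixel + 1) * factor)


original_pixels = [44, 45, 46, 47]  # order 1
reimported_pixels = [176, 177, 178, 179, 180, 181, 182,
                     183, 184, 185, 186, 187, 188, 190]  # order 2

# All order-2 pixels that could result from splitting the order-1 partitions.
candidates = {c for p in original_pixels for c in children_at_order(p, 1, 2)}

# Every reimported pixel must descend from an original partition.
assert set(reimported_pixels) <= candidates

# Children that do not appear in the reimported catalog.
missing = sorted(candidates - set(reimported_pixels))
print(missing)
```

Of the 16 possible children, 14 appear in the reimported catalog, which matches the 14 tasks in the `Reducing` stage. The two absent pixels (189 and 191) presumably contain no association rows.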


The association properties are in place:

In [9]:
!cat $reimported_dir/hats.properties

#HATS catalog
obs_collection=small_sky_assoc
dataproduct_type=association
hats_nrows=131
hats_primary_table_url=small_sky
hats_col_assn_primary=id
hats_col_assn_primary_assn=id_obj
hats_assn_join_table_url=small_sky_order1_source_collection
hats_col_assn_join=object_id
hats_col_assn_join_assn=object_id_src
hats_assn_max_separation=436.93257
hats_assn_leaf_files=True
hats_npix_suffix=.parquet
hats_builder=hats-import v0.6.5.dev7+g691a60bac.d20250918, hats v0.6.5.dev4+ga69bbd9de
hats_creation_date=2025-09-25T15:02UTC
hats_estsize=39
hats_release_date=2024-09-18
hats_version=v0.1
hats_order=2
moc_sky_fraction=0.07292
hats_max_rows=1000000
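
To check these values in code rather than by eye, the file can be read as plain `key=value` lines. The parser below is a minimal standard-library sketch, not the reader that hats itself uses; `parse_properties` is a hypothetical helper, and the sample text is a subset of the output above.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# A subset of the hats.properties content shown above.
sample = """#HATS catalog
obs_collection=small_sky_assoc
dataproduct_type=association
hats_nrows=131
hats_col_assn_primary=id
hats_col_assn_join=object_id
hats_order=2
"""

props = parse_properties(sample)
assert props["dataproduct_type"] == "association"
assert int(props["hats_order"]) == 2
```

In the notebook itself, the same function could be applied to the contents of `f"{reimported_dir}/hats.properties"`.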


The skymaps were not written:

In [10]:
!ls $reimported_dir

[1m[36mdataset[m[m            hats.properties    partition_info.csv properties


The schema is the same:

In [11]:
original_common_md = paths.get_common_metadata_pointer(original_path)
expected_parquet_schema = pq.read_metadata(original_common_md).schema.to_arrow_schema()
expected_parquet_schema

id_obj: int64
object_id_src: int64
_dist_arcsec: double
_healpix_29: int64

In [12]:
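# Read the reimported catalog's common metadata and compare it with the original schema.
reimported_common_md = paths.get_common_metadata_pointer(reimported_dir)
reimported_schema = pq.read_metadata(reimported_common_md).schema.to_arrow_schema()
assert reimported_schema.equals(expected_parquet_schema)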