## HATS Data Preview 1 on RSP

This notebook tests access to Data Preview 1 (DP1) data stored in the HATS (Hierarchical Adaptive Tiling Scheme) format on the Rubin Science Platform (RSP).

**Goal:** Load a random sample of the data for scale testing within the RSP.

In [None]:
# Uncomment to install lsdb if it is not already available in this environment
# %pip install lsdb --quiet

In [1]:
import lsdb
import numpy as np
from upath import UPath

In [6]:
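# Root directory of the HATS catalogs on the RSP file system
base_path = UPath("/rubin/lincc_lsb_data")
# Open the object catalog; data is only read when it is needed
object_collection = lsdb.read_hats(base_path / "object_collection_lite")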

In [7]:
# Per-partition (HEALPix pixel) statistics; the objectId row count gives the partition size
pixel_statistics = object_collection.per_pixel_statistics()
pixel_counts = pixel_statistics["objectId: row_count"].astype(np.int64)

In [8]:
# Pick one small, one median, and one large partition by row count
partition_indices = []
for percentile in [10, 50, 90]:
    # Row count at this percentile (linearly interpolated, so it may not match any partition exactly)
    q = np.percentile(pixel_counts, percentile)
    print(f"Percentile: {percentile}, Value: {q}")
    # Positional index of the partition whose row count is closest to that value
    index = int(np.argmin(np.abs(pixel_counts - q)))
    closest_value = pixel_counts.iloc[index]
    print(f"Closest value: {closest_value}, partition index: {index}")
    partition_indices.append(index)

Percentile: 10, Value: 833.0000000000001
Closest value: 926, partition index: 15
Percentile: 50, Value: 24624.0
Closest value: 24624, partition index: 35
Percentile: 90, Value: 138144.0000000001
Closest value: 132427, partition index: 3
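
The cell above picks, for each target percentile, the partition whose row count is closest to that percentile's value. A minimal standard-library sketch of the same idea on made-up counts may make the logic easier to check by hand. The helper name `closest_partition` and the `toy_counts` values are illustrative assumptions, not DP1 data or part of the lsdb API.

```python
import statistics


def closest_partition(counts, percentile):
    """Return (target, index): the percentile value and the position in
    `counts` of the entry closest to it."""
    # method="inclusive" interpolates linearly between the sorted data points,
    # which corresponds to the default behaviour of np.percentile.
    cut_points = statistics.quantiles(counts, n=100, method="inclusive")
    target = cut_points[percentile - 1]  # cut_points[0] is the 1st percentile
    index = min(range(len(counts)), key=lambda i: abs(counts[i] - target))
    return target, index


# Hypothetical per-partition row counts (not taken from the catalog)
toy_counts = [120, 950, 40_000, 7_500, 310_000, 22_000, 1_800]

for percentile in (10, 50, 90):
    target, index = closest_partition(toy_counts, percentile)
    print(f"Percentile: {percentile}, value: {target}, "
          f"closest count: {toy_counts[index]}, index: {index}")
```

Because the percentile value is interpolated, the closest partition can sit noticeably below or above it, as with the 10th and 90th percentiles in the output above.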


In [11]:
# Time drawing 100 rows from each selected partition (fixed seed for repeatability)
for index in partition_indices:
    print(f"Sampling partition {index} of size {pixel_counts.iloc[index]}")
    %timeit object_collection.sample(index, n=100, seed=10)

Sampling partition 15 of size 926
82.4 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Sampling partition 35 of size 24624
13.2 s ± 71.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Sampling partition 3 of size 132427
3.97 s ± 86.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
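
`%timeit` is an IPython magic, so the timing loop above only runs inside a notebook or IPython session. For a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper `time_call` and the `stand_in_workload` function are hypothetical placeholders; in practice the callable would wrap something like `object_collection.sample(index, n=100, seed=10)`.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=1):
    """Run `func` `number` times per run, for `repeat` runs, and return the
    (mean, standard deviation) of the time per call in seconds."""
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.fmean(per_call), statistics.pstdev(per_call)


def stand_in_workload():
    # Hypothetical placeholder for the real sampling call
    return sum(range(10_000))


mean, std = time_call(stand_in_workload, repeat=7, number=10)
print(f"{mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per loop (7 runs, 10 loops each)")
```

Unlike `%timeit`, this sketch does not choose the number of loops automatically, so `number` has to be set by hand for fast calls.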
