## Periods in Rubin vs Gaia

In this notebook we compare the periods measured for Rubin objects with the periods available for their Gaia counterparts.

In [None]:
import lsdb
import pandas as pd

from nested_pandas import NestedDtype
from pathlib import Path

In [None]:
def cast_nested(df, columns):
    return df.assign(
        **{
            col: df[col].astype(NestedDtype.from_pandas_arrow_dtype(df.dtypes[col]))
            for col in columns
        },
    )

In [None]:
drp_release = "w_2025_10"
base_dir = Path("/sdf/data/rubin/shared/lsdb_commissioning/hats")
hats_dir = base_dir / drp_release

### Load Rubin

In [None]:
rubin_variables = lsdb.read_hats("rubin_variables")
# Use the `cast_nested` helper to cast the `forcedSource` column to `NestedDtype`
rubin_variables = rubin_variables.map_partitions(cast_nested, columns=["forcedSource"])

Let's also get the `i_psfMag` value for each Rubin object. It will help us compare the brightness of each Rubin object with that of its Gaia match.

Comparing Gaia `RP` with Rubin's `i` band seems to make the most sense (see how the filters overlap [here](http://svo2.cab.inta-csic.es/svo/theory/fps3/index.php?mode=browse&gname=LSST&asttype=:)).

In [None]:
object_lc = lsdb.read_hats(
    hats_dir / "object_lc",
    margin_cache=hats_dir / "object_lc_5arcs",
    columns=["objectId", "i_psfMag"],
)

In [None]:
rubin_variables = rubin_variables.join(
    object_lc, left_on="objectId", right_on="objectId", suffixes=("", "")
)
rubin_variables

### Load Gaia

In [None]:
gaia_dr3 = lsdb.read_hats(
    "https://data.lsdb.io/hats/gaia_dr3/gaia",
    margin_cache="https://data.lsdb.io/hats/gaia_dr3/gaia_10arcs",
)

The main Gaia catalog does not contain measured periods, and we do not have the epoch photometry catalog needed to run a Lomb-Scargle periodogram ourselves. I'll follow Doug's approach here: download two variability catalogs from VizieR that have well-defined periods for RR Lyrae and Cepheid variables.

In [None]:
# Use the same set of columns for both catalogs
shared_cols = ["Source", "SolID", "RA_ICRS", "DE_ICRS", "PF", "P1O"]

# Gaia RRLyrae Catalog - 271779 rows
# https://tapvizier.cds.unistra.fr/adql/?%20I/358/vrrlyr
rrlyr_df = pd.read_csv("/sdf/home/b/brantd/gaia_dr3_rrlyrae_period.csv")[shared_cols]
rrlyr_df["provenance"] = "vari_rrlyr"

# Gaia Variable Cepheid Catalog - 15021 rows
# https://tapvizier.cds.unistra.fr/adql/?%20I/358/scalerts%20I/358/alertsms%20I/358/varisum%20I/358/vclassre%20I/358/vcclassd%20I/358/vagn%20I/358/vcep%20I/358/vceph%20I/358/vcc%20I/358/veb%20I/358/veprv%20I/358/vrvstat%20I/358/vlpv%20I/358/vmicro%20I/358/vmsosc%20I/358/vrm%20I/358/vrmo%20I/358/vrms%20I/358/vpltrans%20I/358/vrrlyr%20I/358/vrrlyrh%20I/358/vst
vcep_df = pd.read_csv("/sdf/home/b/brantd/gaia_dr3_vcep.csv")[shared_cols]
vcep_df["provenance"] = "vari_vcep"

# Set the Gaia source ID as the index
vari_df = pd.concat([rrlyr_df, vcep_df]).set_index("Source")
vari_df

### Find Rubin objects in Gaia+VSX

Crossmatch the Rubin variables with the Gaia catalog to find their counterparts:

In [None]:
rubin_x_gaia = rubin_variables.crossmatch(gaia_dr3, suffixes=["_rubin", "_gaia"])

Let's add the periods found in Gaia to this data:

In [None]:
results = rubin_x_gaia.compute().merge(
    vari_df, left_on="source_id_gaia", right_index=True
)
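
Note that `merge` defaults to an inner join, so any crossmatched object without a row in `vari_df` is silently dropped. A minimal sketch of how one might list those dropped objects, using toy data with hypothetical values (only the column names are taken from this notebook):

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for the crossmatch output and the period table
matches = pd.DataFrame({"source_id_gaia": [101, 102, 103]})
periods = pd.DataFrame(
    {"Source": [102], "PF": [0.55], "provenance": ["vari_rrlyr"]}
).set_index("Source")

# A left merge keeps every match; `_merge` records whether a period row was found
merged = matches.merge(
    periods, left_on="source_id_gaia", right_index=True, how="left", indicator=True
)

# Rows flagged "left_only" are the ones an inner merge would have dropped
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["source_id_gaia"].tolist())
```

Running the same left merge on the real crossmatch would show how many Gaia matches have no period in the two VizieR tables.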

The matches look plausible: the Rubin and Gaia brightnesses differ by less than 0.4 mag. For the second object, the periods given by Gaia and Rubin are very close to each other.

In [None]:
results[
    [
        "index_rubin",
        "source_id_gaia",
        "i_psfMag_rubin",
        "phot_rp_mean_mag_gaia",
        "PF",
        "period_rubin",
        "true_period_rubin",
        "provenance",
    ]
]
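
The agreement described above can be put into numbers. The helper below is a small sketch, not part of the original analysis: `match_agreement` is a hypothetical name and the input values are made up for illustration. It returns the absolute magnitude difference and the fractional period difference for a single match.

```python
def match_agreement(mag_rubin, mag_gaia, period_rubin, period_gaia):
    """Return (|delta mag|, fractional period difference relative to Gaia)."""
    mag_diff = abs(mag_rubin - mag_gaia)
    frac_period_diff = abs(period_rubin - period_gaia) / period_gaia
    return mag_diff, frac_period_diff


# Hypothetical values for one matched object
mag_diff, frac_period_diff = match_agreement(
    mag_rubin=17.2, mag_gaia=17.5, period_rubin=0.51, period_gaia=0.50
)
print(f"delta mag = {mag_diff:.2f}, fractional period diff = {frac_period_diff:.2%}")
```

On the `results` frame the same quantities follow from column arithmetic, for example `(results["i_psfMag_rubin"] - results["phot_rp_mean_mag_gaia"]).abs()`.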

The result only includes 2 objects, both of which are RR Lyrae stars (provenance `vari_rrlyr`). The remaining matches are eclipsing binaries, which we can verify by querying their source identifiers on VizieR (`I/358/varisum`). Since our two period tables have no entries for them, they are dropped by the merge. Let's try getting the periods for these eclipsing binaries from the International Variable Star Index (VSX) instead.

In [None]:
vsx = lsdb.read_hats(
    "https://data.lsdb.io/hats/vsx_2025-03-21/vsx",
    margin_cache="https://data.lsdb.io/hats/vsx_2025-03-21/vsx_10arcs",
    columns=["OID", "Name", "Period"],
)

In [None]:
# TODO: investigate why "designation_gaia" needs to be of type object
# for the next join to work
rubin_x_gaia._ddf["designation_gaia"] = rubin_x_gaia._ddf["designation_gaia"].astype(
    object
)
results_eclip_bin = rubin_x_gaia.join(
    vsx, left_on="designation_gaia", right_on="Name", suffixes=("", "_vsx")
).compute()
# Concatenate these periods with the ones we got previously
results = pd.concat([results, results_eclip_bin])

With the results concatenated, let's have a look at them:

In [None]:
results[
    [
        "index_rubin",
        "objectId_rubin",
        "source_id_gaia",
        "Name_vsx",
        "Period_vsx",
        "period_rubin",
        "true_period_rubin",
    ]
]

The periods obtained from VSX differ from the Rubin periods by about 50% (an error margin of 0.12 to 0.17). The cause is not yet understood.
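
One thing worth checking, offered as a possibility rather than a diagnosis: for eclipsing binaries, periodogram methods such as Lomb-Scargle often lock onto half the orbital period, because the primary and secondary eclipses look alike. A discrepancy near 50% would be consistent with that. The sketch below tests whether two periods are related by a simple ratio; `closest_harmonic`, its default ratios, and the tolerance are all assumptions chosen for illustration.

```python
def closest_harmonic(period_a, period_b, ratios=(0.5, 1.0, 2.0), tol=0.05):
    """Return the ratio in `ratios` that period_a / period_b matches within
    a fractional tolerance `tol`, or None if no ratio is close enough."""
    observed = period_a / period_b
    # Pick the candidate ratio with the smallest fractional deviation
    best = min(ratios, key=lambda r: abs(observed / r - 1.0))
    if abs(observed / best - 1.0) <= tol:
        return best
    return None


# Hypothetical periods: the first pair differs by a factor of two
print(closest_harmonic(0.25, 0.50))
print(closest_harmonic(0.70, 0.50))
```

Applied to each row as `closest_harmonic(period_rubin, Period_vsx)`, a result of `0.5` would point to a half-period alias on the Rubin side, while `None` would suggest looking for a different cause.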