# The state of the Almanac

**Author**: Melissa DeLucchi

This brief notebook demonstrates some of the current functionality (and current gaps) of the hipscat almanac.

The almanac is NOT intended as a replacement for any kind of IVOA registry. It's just a quick way to find out more about a catalog and its linked catalogs, quick both in run time and in development time.

First, let's set the environment variable for the default almanac directory. This is something we'd need to be able to set at the root level.

In [10]:
%env HIPSCAT_ALMANAC_DIR=/data3/epyc/data3/hipscat/almanac

env: HIPSCAT_ALMANAC_DIR=/data3/epyc/data3/hipscat/almanac
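`%env` only exists inside IPython. In a plain Python script, a rough equivalent is to write to `os.environ` directly. The sketch below uses `setdefault` so that a value already set at the root or system level wins over the notebook's path. When exactly hipscat reads the variable is not shown in this notebook, so it is safest to set it before importing hipscat and constructing `Almanac()`.

```python
import os

# Keep any value already set at the root/system level; otherwise fall back
# to the shared almanac directory used in this notebook.
os.environ.setdefault("HIPSCAT_ALMANAC_DIR", "/data3/epyc/data3/hipscat/almanac")
print(os.environ["HIPSCAT_ALMANAC_DIR"])
```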


In [11]:
%%time
import os
import shutil
import tempfile

from hipscat.catalog import AssociationCatalog, Catalog
from hipscat.inspection.almanac import Almanac
from hipscat.inspection.almanac_info import AlmanacInfo
from hipscat.loaders import read_from_hipscat

CPU times: user 0 ns, sys: 67 µs, total: 67 µs
Wall time: 86.1 µs


Below is a list of the "active" object catalogs in the almanac (restricted with `types=['object']`). By default, any "deprecated" catalogs are suppressed.

In [21]:
%%time
almanac = Almanac()
almanac.catalogs(types=['object'])

CPU times: user 82.3 ms, sys: 1.86 ms, total: 84.2 ms
Wall time: 81.9 ms


['allwise',
 'catwise2020',
 'dr16q_constant',
 'gaia',
 'ps1_otmo',
 'tic_1',
 'ztf_dr14',
 'ztf_zource']
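To make the two filters concrete, here is a minimal sketch of how such a listing could behave. The `Entry` class and `list_catalogs` function are hypothetical stand-ins, not hipscat's implementation. The assumptions are that a non-empty `deprecated` message marks a catalog as deprecated and that names come back sorted, as they appear to in the outputs here. The sample entries reuse a few names and types from this notebook's outputs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one almanac entry (not hipscat's AlmanacInfo)."""

    catalog_name: str
    catalog_type: str
    deprecated: str = ""  # assumption: a non-empty message means "deprecated"


def list_catalogs(
    entries: Iterable[Entry],
    include_deprecated: bool = False,
    types: Optional[List[str]] = None,
) -> List[str]:
    """Return sorted catalog names, mimicking the two filters used in this notebook."""
    names = []
    for entry in entries:
        if entry.deprecated and not include_deprecated:
            continue  # hide deprecated catalogs unless explicitly requested
        if types is not None and entry.catalog_type not in types:
            continue  # keep only the requested catalog types
        names.append(entry.catalog_name)
    return sorted(names)


entries = [
    Entry("ztf_dr14", "object"),
    Entry("ztf_dr14_10arcs", "margin"),
    Entry("ztf_source", "source", "Use `ztf_zource` for significantly better performance"),
    Entry("ztf_zource", "object"),
]
print(list_catalogs(entries, types=["object"]))  # ['ztf_dr14', 'ztf_zource']
print(list_catalogs(entries, include_deprecated=True))
```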

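By including the deprecated catalogs, we now see the "ztf_source" catalog. Note that this call also drops the `types=['object']` filter, so margin, index, and association catalogs show up as well.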

In [13]:
%%time
almanac.catalogs(include_deprecated=True)

CPU times: user 13 µs, sys: 26 µs, total: 39 µs
Wall time: 59.6 µs


['allwise',
 'allwise_10arcs',
 'catwise2020',
 'dr16q_constant',
 'gaia',
 'gaia_10arcs',
 'gaia_source_id_index',
 'macauff_association',
 'neowise_yr8',
 'ps1_10arcs',
 'ps1_detection',
 'ps1_otmo',
 'tic_1',
 'tic_10arcs',
 'ztf_dr14',
 'ztf_dr14_10arcs',
 'ztf_source',
 'ztf_zource',
 'zubercal']
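Because the two calls differ in both filters, a set difference is a quick way to see exactly what the second one adds. In a live session you could diff the two `almanac.catalogs(...)` results directly. Here the names are simply copied from the two outputs above.

```python
# Names copied from the two outputs above.
active_objects = [
    "allwise", "catwise2020", "dr16q_constant", "gaia",
    "ps1_otmo", "tic_1", "ztf_dr14", "ztf_zource",
]
everything = [
    "allwise", "allwise_10arcs", "catwise2020", "dr16q_constant", "gaia",
    "gaia_10arcs", "gaia_source_id_index", "macauff_association", "neowise_yr8",
    "ps1_10arcs", "ps1_detection", "ps1_otmo", "tic_1", "tic_10arcs",
    "ztf_dr14", "ztf_dr14_10arcs", "ztf_source", "ztf_zource", "zubercal",
]

# Entries that only appear once deprecated catalogs and all types are included.
extra = sorted(set(everything) - set(active_objects))
print(len(extra), extra)
```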

The almanac is a graph of catalogs and their linked catalogs (e.g. a margin catalog links to its primary catalog).
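A minimal sketch of how such a graph could be assembled, assuming each entry records its primary catalog by path (as the `primary=` field in the `AlmanacInfo` output does). The `Node` class and `link` function are hypothetical and only echo a few field names from that repr. This is not hipscat's implementation. Each `primary` path is resolved into object references in both directions: `primary_link` on the child, and the child appended to the parent's `margins` or `sources`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical, simplified almanac entry; field names echo the AlmanacInfo repr."""

    catalog_path: str
    catalog_name: str
    catalog_type: str
    primary: Optional[str] = None  # path of the primary catalog, if any
    primary_link: Optional["Node"] = None
    margins: List["Node"] = field(default_factory=list)
    sources: List["Node"] = field(default_factory=list)


def link(nodes: List[Node]) -> Dict[str, Node]:
    """Resolve each `primary` path into object links in both directions."""
    by_path = {node.catalog_path: node for node in nodes}
    for node in nodes:
        if node.primary is None:
            continue
        parent = by_path.get(node.primary)
        if parent is None:
            continue  # primary catalog is not registered in this almanac
        node.primary_link = parent
        if node.catalog_type == "margin":
            parent.margins.append(node)
        elif node.catalog_type == "source":
            parent.sources.append(node)
    return {node.catalog_name: node for node in nodes}


root = "/data3/epyc/data3/hipscat/catalogs"
graph = link([
    Node(f"{root}/ztf_axs/ztf_dr14", "ztf_dr14", "object"),
    Node(f"{root}/ztf_dr14_10arcs", "ztf_dr14_10arcs", "margin", primary=f"{root}/ztf_axs/ztf_dr14"),
    Node(f"{root}/ztf_axs/ztf_source", "ztf_source", "source", primary=f"{root}/ztf_axs/ztf_dr14"),
])
print([m.catalog_name for m in graph["ztf_dr14"].margins])  # ['ztf_dr14_10arcs']
print(graph["ztf_source"].primary_link.catalog_name)  # ztf_dr14
```

The paths mirror the ones in the `ztf_source` output. The links are cyclic (parent to child and back), which is why the default dataclass-style repr ends up printing `...` placeholders.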

We still need to improve the `__repr__` of the almanac info so that it displays only the relevant parts, and perhaps points to the linked catalogs' info instead of nesting all of it.

In [14]:
%%time
almanac.get_almanac_info("ztf_source")

CPU times: user 8 µs, sys: 17 µs, total: 25 µs
Wall time: 43.9 µs


AlmanacInfo(file_path='', storage_options={}, namespace='', catalog_path='/data3/epyc/data3/hipscat/catalogs/ztf_axs/ztf_source', catalog_name='ztf_source', catalog_type='source', primary='/data3/epyc/data3/hipscat/catalogs/ztf_axs/ztf_dr14', join=None, primary_link=AlmanacInfo(file_path='', storage_options={}, namespace='', catalog_path='/data3/epyc/data3/hipscat/catalogs/ztf_axs/ztf_dr14', catalog_name='ztf_dr14', catalog_type='object', primary=None, join=None, primary_link=None, join_link=None, sources=[...], objects=[], margins=[AlmanacInfo(file_path='', storage_options={}, namespace='', catalog_path='/data3/epyc/data3/hipscat/catalogs/ztf_dr14_10arcs', catalog_name='ztf_dr14_10arcs', catalog_type='margin', primary='/data3/epyc/data3/hipscat/catalogs/ztf_axs/ztf_dr14', join=None, primary_link=..., join_link=None, sources=[], objects=[], margins=[], associations=[], associations_right=[], indexes=[], creators=['Melissa DeLucchi'], description='10 arcsecond margin catalog to ZTX AXS 
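One possible direction for that `__repr__` improvement is a one-line summary with only the name, the type, and the *names* of linked catalogs. `summarize` is a hypothetical helper, not part of hipscat. It reads the link fields visible in the repr above, plus `deprecated` (used in the next cell), through `getattr`, so any missing field is simply skipped. With the real object, something like `summarize(almanac.get_almanac_info("ztf_source"))` should work. The example uses a stand-in so it runs on its own.

```python
from types import SimpleNamespace

# Link fields as they appear in the AlmanacInfo repr above.
LINK_FIELDS = ("sources", "objects", "margins", "associations", "associations_right", "indexes")


def summarize(info) -> str:
    """One-line summary of an almanac entry: name, type, and linked catalog names only."""
    parts = [f"{info.catalog_name} ({info.catalog_type})"]
    primary = getattr(info, "primary_link", None)
    if primary is not None:
        parts.append(f"primary={primary.catalog_name}")
    for name in LINK_FIELDS:
        linked = getattr(info, name, None) or []
        if linked:
            parts.append(f"{name}=[{', '.join(item.catalog_name for item in linked)}]")
    if getattr(info, "deprecated", None):
        parts.append("DEPRECATED")
    return " ".join(parts)


# Stand-in objects mirroring the ztf_source output above.
dr14 = SimpleNamespace(catalog_name="ztf_dr14", catalog_type="object")
source = SimpleNamespace(
    catalog_name="ztf_source",
    catalog_type="source",
    primary_link=dr14,
    deprecated="Use `ztf_zource` for significantly better performance",
)
print(summarize(source))  # ztf_source (source) primary=ztf_dr14 DEPRECATED
```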

If you know where to look, you can find out why this catalog has been deprecated, and maybe even a hint about what to use instead:

In [15]:
%%time
almanac.get_almanac_info("ztf_source").deprecated

CPU times: user 26 µs, sys: 0 ns, total: 26 µs
Wall time: 45.8 µs


'Use `ztf_zource` for significantly better performance'
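To avoid having to "know where to look", the two calls used so far can be combined into a report of every deprecated catalog and its message. `deprecation_report` is a hypothetical helper. It assumes that `deprecated` is empty or `None` for active catalogs, which this notebook does not show. The stub class exists only so the example runs without the shared almanac directory. Against the real almanac you would call `deprecation_report(almanac)`.

```python
from types import SimpleNamespace


def deprecation_report(almanac) -> dict:
    """Map each deprecated catalog name to its deprecation message."""
    report = {}
    for name in almanac.catalogs(include_deprecated=True):
        message = almanac.get_almanac_info(name).deprecated
        if message:  # assumption: active catalogs have an empty/None message
            report[name] = message
    return report


class StubAlmanac:
    """Hypothetical stand-in exposing only the two Almanac calls used in this notebook."""

    def __init__(self, messages):
        self._messages = messages  # catalog name -> deprecation message ("" if active)

    def catalogs(self, include_deprecated=False):
        return sorted(n for n, m in self._messages.items() if include_deprecated or not m)

    def get_almanac_info(self, name):
        return SimpleNamespace(catalog_name=name, deprecated=self._messages[name])


stub = StubAlmanac({
    "ztf_zource": "",
    "ztf_source": "Use `ztf_zource` for significantly better performance",
})
print(deprecation_report(stub))
```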

## Linked catalogs

Gaia has a nice number of supplemental tables. Let's take a closer look:

In [16]:
%%time
gaia_info = almanac.get_almanac_info("gaia")
gaia_info.indexes[0].catalog_name

CPU times: user 22 µs, sys: 43 µs, total: 65 µs
Wall time: 85.6 µs


'gaia_source_id_index'

In [17]:
%%time
gaia_info.margins[0].catalog_name

CPU times: user 8 µs, sys: 16 µs, total: 24 µs
Wall time: 43.9 µs


'gaia_10arcs'
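Rather than indexing into each list (`indexes[0]`, `margins[0]`) one at a time, you could collect every linked catalog in a single pass. This is a sketch. It assumes the link lists are the fields visible in the `AlmanacInfo` repr earlier (`sources`, `objects`, `margins`, `associations`, `associations_right`, `indexes`). `linked_catalogs` is a hypothetical helper, and the stub only mirrors the two names returned above. With the real object, `linked_catalogs(gaia_info)` should behave the same way.

```python
from types import SimpleNamespace

# Link fields as they appear in the AlmanacInfo repr earlier in this notebook.
LINK_FIELDS = ("sources", "objects", "margins", "associations", "associations_right", "indexes")


def linked_catalogs(info) -> dict:
    """Group the names of every catalog linked to `info` by link kind (empty kinds omitted)."""
    linked = {}
    for field_name in LINK_FIELDS:
        items = getattr(info, field_name, None) or []
        if items:
            linked[field_name] = [item.catalog_name for item in items]
    return linked


# Stand-in mirroring the two gaia results above.
gaia_stub = SimpleNamespace(
    catalog_name="gaia",
    indexes=[SimpleNamespace(catalog_name="gaia_source_id_index")],
    margins=[SimpleNamespace(catalog_name="gaia_10arcs")],
)
print(linked_catalogs(gaia_stub))
# {'margins': ['gaia_10arcs'], 'indexes': ['gaia_source_id_index']}
```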

In [18]:
%%time
gaia_margin = read_from_hipscat(gaia_info.margins[0].catalog_path)
gaia_margin.catalog_info.margin_threshold

CPU times: user 104 ms, sys: 6.78 ms, total: 111 ms
Wall time: 109 ms


10
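The value `10` matches the "10 arcsecond margin catalog" description in the repr above and the `_10arcs` suffix, so the threshold appears to be in arcseconds. To give a sense of the scale involved, the sketch below converts it to degrees and checks whether two positions lie within that distance of each other, using the haversine form of the great-circle distance. The helper is hypothetical and not part of hipscat. It is not meant to reproduce how hipscat builds its margin catalogs.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions (degrees in, arcseconds out).

    Uses the haversine formula, which stays accurate at small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    return math.degrees(2 * math.asin(math.sqrt(hav))) * 3600


margin_threshold = 10  # value read above; assumed to be in arcseconds
print(margin_threshold / 3600)  # the same width in degrees
sep = angular_separation_arcsec(45.0, 0.0, 45.0, 5 / 3600)  # two points 5" apart in Dec
print(sep < margin_threshold)  # True
```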

See [this notebook](https://github.com/lincc-frameworks/notebooks_lf/blob/main/sprints/2024/02_22/almanac_margins.ipynb) for a more in-depth discussion of the options for fetching margin data for a primary catalog.