# Debug Datacube Setup

This notebook is for debugging datacube metadata and the connection to the local database.

## Python Environment

Follow the instructions in [README.rst](../README.rst) to bootstrap an `emit` Python environment and register it with the Jupyter server. To pull in fresh changes, run:

```bash
./scripts/update-libs.sh
```

## Local Database

Follow [db-setup](../db-setup.md) to bootstrap a local datacube database instance with sample data. If you have done this previously, just make sure the database server is running by using [scripts/db.sh](../scripts/db.sh):

```bash
./scripts/db.sh start
```

## Check Environment

In [None]:
import os
import sys

from odc.emit import dump_py_env

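if "ODC_EMIT_DB_URL" not in os.environ:
    print("Updating ENV")
    # Iterate over a snapshot of the keys, since entries are removed in the loop
    for e in list(os.environ):
        if e.startswith("DB_"):
            os.environ.pop(e, None)

    # Database "datacube", reached through the Unix socket directory /tmp
    os.environ["ODC_EMIT_DB_URL"] = "postgresql:///datacube?host=/tmp"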

dump_py_env()
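
For reference, the environment fix-up in the cell above can also be written as a pure function that is easy to test outside the notebook. This is only a sketch: `with_default_db_url` and `describe_db_url` are hypothetical helpers that are not part of `odc.emit`, and the `DB_HOSTNAME` entry is a made-up example. The URL breakdown uses only the standard library and shows that the default URL names the `datacube` database and passes `host=/tmp`; a `host` value that is a directory path is how libpq-style URLs select a Unix-domain socket.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_DB_URL = "postgresql:///datacube?host=/tmp"  # same value as the cell above


def with_default_db_url(env, url=DEFAULT_DB_URL):
    """Return a copy of env with ODC_EMIT_DB_URL set (hypothetical helper)."""
    if "ODC_EMIT_DB_URL" in env:
        return dict(env)  # already configured, leave untouched
    # Drop DB_* variables, mirroring what the notebook cell does
    cleaned = {k: v for k, v in env.items() if not k.startswith("DB_")}
    cleaned["ODC_EMIT_DB_URL"] = url
    return cleaned


def describe_db_url(url):
    """Split a database URL into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        "database": parts.path.lstrip("/"),
        # Fall back to the network host when no ?host= parameter is given
        "host": query.get("host", [parts.hostname])[0],
    }


# Made-up input environment for illustration
env = with_default_db_url({"DB_HOSTNAME": "example", "HOME": "/home/user"})
info = describe_db_url(env["ODC_EMIT_DB_URL"])
print(sorted(env), info)
```

Working on a copy rather than on `os.environ` keeps the logic free of side effects, so it can be checked without touching the real process environment.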

## Connect to local db

In [None]:
import datacube
import odc.emit
import odc.geo
import odc.stac
import sqlalchemy
from datacube import Datacube
from datacube.cfg.api import ODCConfig, ODCEnvironment
from datacube.storage import BandInfo, measurement_paths
from datacube.testutils.io import eo3_geobox
from datacube.utils.serialise import jsonify_document
from IPython.display import JSON, display, Markdown
from odc.geo.geom import unary_union


def band_to_json(bb):
    return jsonify_document({k: getattr(bb, k) for k in bb.__slots__})


dc = Datacube(env="emit")

In [None]:
dc.list_products(with_pandas=False)

In [None]:
list(dc.index.datasets.count_by_product())

## Review EMIT product

In [None]:
p = dc.index.products.get_by_name("emit_l2a")

display(JSON(p.definition))
print("-" * 40)
display(JSON(p.extra_dimensions.dims["wavelength"]))
print("-" * 40)
display(p.load_hints())

In [None]:
dss = dc.find_datasets(
    product="emit_l2a",
    lon=(146, 149),
    lat=(-37, -36),
)
display(len(dss), unary_union([ds.extent.exterior for ds in dss]))

In [None]:
dss = dc.find_datasets(
    product="emit_l2a",
    lon=(146, 149),
    lat=(-37, -36),
    time="2023-10",
)
display(len(dss), unary_union([ds.extent.exterior for ds in dss]))

In [None]:
ds = dss[0]
gbox = eo3_geobox(ds, "elev")
display(
    ds.time,
    ds.crs,
    ds.extent,
    gbox,
    JSON(ds.product.extra_dimensions.dims["wavelength"]),
)

In [None]:
bands = {band: BandInfo(ds, band) for band in ds.product.measurements}
display(JSON({name: band_to_json(bb) for name, bb in bands.items()}))

In [None]:
JSON(ds.metadata_doc)

## Review AVIRIS-3 product

In [None]:
dss = dc.find_datasets(
    product="av3_l2a",
)
display(len(dss), unary_union([ds.extent.exterior for ds in dss]))

In [None]:
ds = dss[0]
gbox = eo3_geobox(ds, "rfl")
display(
    ds.time,
    ds.crs,
    ds.extent,
    gbox,
    JSON(ds.product.extra_dimensions.dims["wavelength"]),
)

In [None]:
display(
    JSON(ds.metadata_doc),
    Markdown("---------------"),
    ds.product.load_hints(),
    Markdown("---------------"),
    measurement_paths(ds),
)

### Check with dask (no compute)

These lazy loads won't actually read any data and don't handle extra coordinates, but they are good for sanity-checking the time and spatial metadata present in the db.

#### Ungrouped

Expect to see 8 temporal slices with timestamps on 2023-09-15 UTC.

In [None]:
xx = dc.load(product="av3_l2a", dask_chunks={})
display(xx, xx.odc.geobox)
assert xx.time.shape[0] == 8

#### Group by solar day

Again, but grouped by solar day; expect to see a single time slice.

In [None]:
xx = dc.load(product="av3_l2a", dask_chunks={}, group_by="solar_day")
display(xx, xx.odc.geobox)
assert xx.time.shape[0] == 1
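
To build intuition for why several UTC timestamps collapse into one slice, here is a minimal sketch of the idea behind solar-day grouping: shift each UTC time by roughly one hour per 15 degrees of longitude and group by the resulting calendar date. This is an approximation for illustration, not datacube's actual implementation; `approx_solar_day` and `group_by_solar_day` are hypothetical helpers, and the timestamps and longitude below are made up rather than read from the database.

```python
from datetime import datetime, timedelta, timezone


def approx_solar_day(utc_time, longitude_deg):
    """Approximate local solar date: shift UTC by 1 hour per 15 degrees of longitude."""
    offset = timedelta(hours=round(longitude_deg / 15))
    return (utc_time + offset).date()


def group_by_solar_day(observations):
    """Group (utc_time, longitude) pairs by their approximate solar date."""
    groups = {}
    for utc_time, lon in observations:
        groups.setdefault(approx_solar_day(utc_time, lon), []).append(utc_time)
    return groups


# Made-up acquisitions: same flight day, slightly different times
obs = [
    (datetime(2023, 9, 15, 18, 0, tzinfo=timezone.utc), -120.0),
    (datetime(2023, 9, 15, 18, 20, tzinfo=timezone.utc), -120.0),
    (datetime(2023, 9, 15, 19, 5, tzinfo=timezone.utc), -120.0),
]
groups = group_by_solar_day(obs)
print(len(obs), "observations ->", len(groups), "solar day(s)")
```

The longitude shift matters near the date line or late in the UTC day, where observations from a single local day can straddle two UTC dates.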

## Misc

In [None]:
#!~/.envs/emit/bin/datacube -E emit system check

#!~/.envs/emit/bin/datacube -E emit product update --allow-unsafe av3.yaml

--------------