# Debug Datacube Setup

This notebook helps debug datacube metadata and the connection to the local database.

## Python Environment

Follow the instructions in [README.rst](../README.rst) to bootstrap an `emit` Python environment and register it with the Jupyter server. To pull in fresh changes, run:

```bash
./scripts/update-libs.sh
```

## Local Database

Follow [db-setup](../db-setup.md) to bootstrap a local datacube db instance with sample data. If you have done this previously, just make sure that the database server is running with [scripts/db.sh](../scripts/db.sh).

```bash
./scripts/db.sh start
```

## Check Environment

In [None]:
import os
from odc.emit import dump_py_env

if "ODC_EMIT_DB_URL" not in os.environ:
    print("Updating ENV")
    # Iterate over a copy of the keys: popping while iterating os.environ raises RuntimeError
    for e in list(os.environ):
        if e.startswith("DB_"):
            os.environ.pop(e, None)

    os.environ["ODC_EMIT_DB_URL"] = "postgresql:///datacube?host=/tmp"

dump_py_env()
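
The connection URL set above has an empty network location and passes `host=/tmp` as a query parameter, which by PostgreSQL convention points at a Unix socket directory rather than a TCP host. The sketch below uses only the standard library to break such a URL into its parts; `describe_db_url` is a hypothetical helper written for illustration and is not part of `odc.emit` or `datacube`.

```python
from urllib.parse import parse_qs, urlsplit


def describe_db_url(url: str) -> dict:
    """Split a PostgreSQL-style URL into the pieces relevant for debugging.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Prefer a host in the network location, fall back to the ?host= parameter.
    host = parts.hostname or query.get("host", [None])[0]
    return {
        "scheme": parts.scheme,
        "database": parts.path.lstrip("/"),
        "host": host,
        # A host starting with "/" is a socket directory, not a network host.
        "unix_socket": bool(host) and host.startswith("/"),
    }


info = describe_db_url("postgresql:///datacube?host=/tmp")
print(info)
```

If `unix_socket` comes out false for your own URL, the connection goes over TCP and the local socket started by `scripts/db.sh` may not be the server being reached.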

# Inspect DB

In [None]:
from datacube import Datacube
from datacube.storage import BandInfo, measurement_paths
from datacube.testutils.io import eo3_geobox
from datacube.utils.serialise import jsonify_document
from IPython.display import JSON, Markdown, display
from odc.geo.geom import unary_union


def band_to_json(bb):
    return jsonify_document({k: getattr(bb, k) for k in bb.__slots__})


dc = Datacube(env="emit")

In [None]:
dc.list_products(with_pandas=False)

In [None]:
list(dc.index.datasets.count_by_product())

# Review EMIT product

## Query Datasets

Check that limiting the query by spatial extent works.

In [None]:
p = dc.index.products.get_by_name("emit_l2a")

dss = dc.find_datasets(
    product="emit_l2a",
    lon=(146, 149),
    lat=(-37, -36),
)
display(len(dss), unary_union([ds.extent.exterior for ds in dss]))

Check that limiting the query by both time and spatial extent also works. Expect exactly 2 datasets in the result.

In [None]:
dss = dc.find_datasets(
    product="emit_l2a",
    lon=(146, 149),
    lat=(-37, -36),
    time="2023-10",
)
display(len(dss), unary_union([ds.extent.exterior for ds in dss]))
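
The query above passes `time="2023-10"`, a partial date. The sketch below assumes such a value is treated as covering the whole calendar month and shows the bounds that assumption implies, using only the standard library. `month_bounds` is a hypothetical helper, not a datacube function, and the exact end-of-range handling inside datacube may differ.

```python
import calendar
from datetime import datetime


def month_bounds(month: str) -> tuple[datetime, datetime]:
    """Return the first and last instant of a "YYYY-MM" month.

    Hypothetical helper; assumes a partial date means the full month.
    """
    year, mon = (int(part) for part in month.split("-"))
    last_day = calendar.monthrange(year, mon)[1]  # number of days in the month
    start = datetime(year, mon, 1)
    end = datetime(year, mon, last_day, 23, 59, 59, 999999)
    return start, end


start, end = month_bounds("2023-10")
print(start.isoformat(), end.isoformat())
```

Comparing these bounds against `ds.time.begin` and `ds.time.end` of the returned datasets is a quick way to confirm that the temporal filter selected what was intended.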

## Inspect Dataset

In [None]:
ds = dss[0]
gbox = eo3_geobox(ds, "elev")

bands = {band: BandInfo(ds, band) for band in ds.product.measurements}

display(
    Markdown("### Product"),
    Markdown("#### Extra Dims"),
    JSON(ds.product.extra_dimensions.dims["wavelength"]),
    Markdown("#### Load hints"),
    ds.product.load_hints(),
    
    Markdown("### Dataset"),
    Markdown(f"""
| property| value|
|:-------|:-----------------------------|
|product | {ds.product.name} |
|uuid    | {ds.id} |
|t.start | {ds.time.begin.isoformat()} |
|t.end   | {ds.time.end.isoformat()} |
|epsg    | {ds.crs.epsg} |
|resolution| {gbox.resolution.x} |
"""),

    gbox,

    Markdown("#### BandInfo objects"),
    JSON({name: band_to_json(bb) for name, bb in bands.items()}),

    Markdown("#### Metadata Document (full)"),
    JSON(ds.metadata_doc),
    Markdown("#### Measurement Paths"),
    JSON(measurement_paths(ds)),
)

# Review AVIRIS product

## Query Datasets

In [None]:
dss_av3 = dc.find_datasets(
    product="av3_l2a",
)
display(len(dss_av3), unary_union([ds.extent.exterior for ds in dss_av3]))

## Inspect Dataset

In [None]:
ds = dss_av3[0]
gbox = eo3_geobox(ds, "rfl")
bands = {band: BandInfo(ds, band) for band in ds.product.measurements}

display(
    Markdown("### Product"),
    Markdown("#### Extra Dims"),
    JSON(ds.product.extra_dimensions.dims["wavelength"]),
    Markdown("#### Load hints"),
    ds.product.load_hints(),
    
    Markdown("### Dataset"),

    Markdown(f"""
| property| value|
|:-------|:-----------------------------|
|granule | {ds.metadata_doc['properties']['av3:granule']} |
|uuid    | {ds.id} |
|t.start | {ds.time.begin.isoformat()} |
|t.end   | {ds.time.end.isoformat()} |
|epsg    | {ds.crs.epsg} |
|resolution| {gbox.resolution.x} |
"""),

    gbox,

    Markdown("#### BandInfo objects"),
    JSON({name: band_to_json(bb) for name, bb in bands.items()}),
    Markdown("#### Metadata Document"),
    JSON(ds.metadata_doc),
    Markdown("#### Measurement Paths"),
    JSON(measurement_paths(ds)),
)

## Check with dask (no compute)

These won't load and don't handle extra coordinates, but they are useful for sanity-checking the time and spatial metadata present in the db.

### Ungrouped

Expect to see 8 temporal slices with timestamps on 2023-09-15 UTC.

In [None]:
xx = dc.load(product="av3_l2a", dask_chunks={})
display(xx, xx.odc.geobox)
assert xx.time.shape[0] == 8

### Group by solar day

Load again, this time grouping by solar day; expect a single time slice.

In [None]:
xx = dc.load(product="av3_l2a", dask_chunks={}, group_by="solar_day")
display(xx)
assert xx.time.shape[0] == 1
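
To see why the ungrouped timestamps collapse into one slice, the sketch below approximates solar-day grouping in plain Python: each UTC timestamp is shifted by the local solar offset implied by longitude (15 degrees per hour) and the resulting calendar date is used as the group key. This is an approximation for illustration and not datacube's actual implementation; the timestamps and the longitude are made-up values.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def solar_day(utc_time: datetime, lon: float):
    """Approximate local solar date for a UTC timestamp at a given longitude."""
    # 360 degrees of longitude correspond to 24 hours of solar time.
    offset = timedelta(seconds=int(lon * 24 * 3600 / 360))
    return (utc_time + offset).date()


def group_by_solar_day(observations):
    """Group (utc_time, lon) pairs by their approximate solar date."""
    groups = defaultdict(list)
    for utc_time, lon in observations:
        groups[solar_day(utc_time, lon)].append(utc_time)
    return dict(groups)


# Made-up example: two acquisitions 40 minutes apart at longitude -120
obs = [
    (datetime(2023, 9, 15, 18, 0, tzinfo=timezone.utc), -120.0),
    (datetime(2023, 9, 15, 18, 40, tzinfo=timezone.utc), -120.0),
]
groups = group_by_solar_day(obs)
print({day.isoformat(): len(times) for day, times in groups.items()})
```

Acquisitions close to local midnight can fall on different solar days even when they share a UTC date, which is worth keeping in mind if the assertion above ever fails.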

# Misc

In [None]:
#!~/.envs/emit/bin/datacube -E emit system check

#!~/.envs/emit/bin/datacube -E emit product update --allow-unsafe av3.yaml

In [None]:
import pickle

if False:  # flip to True to dump the AVIRIS datasets for offline debugging
    with open("dss-av3.pkl", "wb") as f:
        pickle.dump(dss_av3, f)

--------------