# AVIRIS: ENVI -> ZARR

Convert local ENVI files to local zarr files.

In [None]:
import numpy as np
import xarray as xr
import zarr
from odc.av3 import av3_xr_load

In [None]:
# base = "/tmp/av3/AV320230915t213013_L2A_OE_main_98b13fff"
# base = "/tmp/av3/AV320230915t214314_L2A_OE_main_98b13fff"
base = "/tmp/av3/AV320230915t214955_L2A_OE_main_98b13fff"

s3_base = base.replace("/tmp/av3/", "s3://adias-prod-dc-data-projects/odc-hs/av3/")
base, s3_base

## Construct xarray.Dataset from local ENVI files

The code in `av3_xr_load` uses `numpy.memmap` to create unchunked views of the original data without loading all of it into RAM. Natively, the data is stored in `bil` (band interleaved by line) mode, which is equivalent to `Y,B,X` order. On output we use `Y,X,B` order for the dimensions. Geospatial information is extracted using `rasterio` and added to the xarray in a format understood by the `odc` libraries and by `rioxarray`.
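
To make the axis reordering concrete, here is a minimal sketch using plain `numpy` on a tiny synthetic file. It is not the actual `av3_xr_load` implementation; the file name, shape and dtype are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical tiny scene: 4 lines (Y), 3 bands (B), 5 samples (X).
ny, nb, nx = 4, 3, 5

# Write a raw BIL file: for each line, all bands are stored one after another.
src = np.arange(ny * nb * nx, dtype="float32").reshape(ny, nb, nx)
path = os.path.join(tempfile.mkdtemp(), "demo_bil.raw")
src.tofile(path)

# Map the file without reading it into RAM; on-disk order is (Y, B, X).
bil = np.memmap(path, dtype="float32", mode="r", shape=(ny, nb, nx))

# Reorder axes to (Y, X, B). This only changes strides, no data is copied.
yxb = bil.transpose(0, 2, 1)

print(bil.shape, "->", yxb.shape)
```

Pixel `(y, x)` of band `b` is then `yxb[y, x, b]`, which reads the same bytes as `bil[y, b, x]`.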

In [None]:
ds = av3_xr_load(base)
display(ds.odc.geobox, ds.odc.transform, ds)

## Save to local zarr store

Writing zarr directly to S3 is also possible, but I was hitting RAM issues with it (24GB RAM container), probably due to the in-place rechunking that was happening.

The cell below can take about 5-10 minutes to run.

In [None]:
%%time
# configure chunking and compression

chunks = (400, -1, 20)  # y,x,b
compressor = zarr.Blosc(cname="zstd", clevel=6, shuffle=1)

for dv in ds.data_vars.values():
    dv.encoding["compressor"] = compressor
    dv.encoding["chunks"] = chunks[: dv.ndim]

ds.to_zarr(
    f"{base}.zarr",
    mode="w",
    consolidated=True,
)
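
As a rough sanity check on the chunking above, the sketch below resolves the `-1` entry against an array shape and estimates the uncompressed size of one chunk. The scene shape and the 4-byte item size are assumptions for illustration, not values read from the data, and `resolve_chunks` is a hypothetical helper.

```python
import math


def resolve_chunks(chunks, shape):
    """Replace -1 with the full dimension length and clip to the array shape."""
    return tuple(n if c == -1 else min(c, n) for c, n in zip(chunks, shape))


shape = (2000, 1234, 284)  # hypothetical (y, x, band) scene shape
itemsize = 4               # assuming float32 samples

chunks = resolve_chunks((400, -1, 20), shape)
chunk_bytes = math.prod(chunks) * itemsize
n_chunks = math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))

print(chunks, f"{chunk_bytes / 2**20:.1f} MiB per chunk,", n_chunks, "chunks")
```

Slicing with `chunks[: dv.ndim]` means that 2D variables only get the `(400, -1)` part of the tuple.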

## Upload zarr from /tmp to S3

Remove `--dryrun` to actually upload the data.

In [None]:
!(cd {base}.zarr && aws s3 sync . {s3_base}.zarr/ --dryrun)

## Inspect metadata yaml

We should probably copy some metadata from the yaml into xarray attributes; `mean_solar_{az|zn}` looks useful.
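
One possible way to do that is sketched below. Where exactly `mean_solar_az` and `mean_solar_zn` live inside the yaml is not shown here, so the hypothetical helper `find_keys` simply searches the nested document for them; the sample document and its values are made up.

```python
def find_keys(doc, wanted):
    """Search nested dicts/lists and return one value per wanted key."""
    found = {}
    stack = [doc]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in wanted and key not in found:
                    found[key] = value
                if isinstance(value, (dict, list)):
                    stack.append(value)
        elif isinstance(node, list):
            stack.extend(node)
    return found


# Made-up stand-in for the parsed yaml document.
sample_doc = {
    "lineage": {"RDN": {"mean_solar_az": 231.5, "mean_solar_zn": 40.2, "other": 1}}
}

attrs = find_keys(sample_doc, {"mean_solar_az", "mean_solar_zn"})
print(attrs)
```

With the real document this would be `ds.attrs.update(find_keys(md_doc, {...}))`, done before calling `ds.to_zarr` so that the attributes end up in the store.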

In [None]:
import yaml
from IPython.display import JSON, Markdown

# CSafeLoader needs PyYAML built with libyaml; yaml.SafeLoader is the pure-Python equivalent
with open(f"{base}.yaml", "r") as f:
    md_doc = yaml.load(f, yaml.CSafeLoader)

display(
    Markdown("## Lineage ORT"),
    JSON(md_doc["lineage"]["ORT"]),
    Markdown("## Lineage RDN"),
    JSON(md_doc["lineage"]["RDN"]),
    Markdown("## Full Document"),
    JSON(md_doc),
)
