- We can access the raster data directly with `rioxarray`
- We need to apply the calibration factor from the metadata

We can consider which structure fits best: `xarray.Dataset` or `xarray.DataArray`.
- Arrays are more compact but elements must be accessed as `arr.loc[..., "HH"]`. They are not optimal in size, since the diagonal elements of the covariance matrix are also stored as complex.
- Datasets store several arrays, which may be accessed as `ds["HH"]`. We can optimize their size by using real numbers for the diagonal elements of the covariance matrix. This deviates from numpy compatibility and requires more rewriting, so the long-term implications need to be assessed.
- Applying multilooking with `coarsen` should work in both cases.
- Boxcar will need to be adapted to run on each variable of a dataset.
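
The trade-off between the two containers can be sketched on synthetic data. This is only an illustration under stated assumptions: the image size, the random values and the variable names (`arr`, `ds`, `values`) are hypothetical, and the "diagonal" variables simply reuse the real part of random numbers rather than a true covariance.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the six upper-triangle covariance elements.
ny, nx = 16, 8  # hypothetical image size
names = ["11", "12", "13", "22", "23", "33"]
diagonal = {"11", "22", "33"}
rng = np.random.default_rng(0)
values = (
    rng.standard_normal((ny, nx, len(names)))
    + 1j * rng.standard_normal((ny, nx, len(names)))
).astype("complex64")

# Option 1: one complex DataArray, elements selected by label along "elt".
arr = xr.DataArray(
    values,
    dims=("y", "x", "elt"),
    coords={"y": np.arange(ny), "x": np.arange(nx), "elt": names},
)
c12_from_array = arr.loc[..., "12"]

# Option 2: one variable per element, diagonal terms kept as float32.
ds = xr.Dataset(
    {
        name: (
            ("y", "x"),
            values[..., i].real.copy() if name in diagonal else values[..., i],
        )
        for i, name in enumerate(names)
    }
)
c12_from_dataset = ds["12"]

# Size of the data alone (coordinates excluded).
array_bytes = values.nbytes
dataset_bytes = sum(var.nbytes for var in ds.data_vars.values())
print(array_bytes, dataset_bytes)

# Multilooking with coarsen uses the same call for both containers.
arr_ml = arr.coarsen(y=8).mean()
ds_ml = ds.coarsen(y=8).mean()
```

With three of the six elements stored as `float32` instead of `complex64`, the dataset holds 36 bytes per pixel instead of 48, a 25% saving, at the cost of no longer being a single numpy-like array.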

In [None]:
import rioxarray as riox
import xarray as xr
import dask.array as da
from pathlib import Path
import warnings
from rasterio.errors import NotGeoreferencedWarning
data_path = Path("/data/psp/SAN_FRANCISCO_ALOS1")

In [None]:
# read metadata to extract the radiometric calibration factor
def extract_calibration_factor(data_path):
    from math import sqrt, pow

    file_path = data_path / "ceos_leader.txt"
    with open(file_path, "r") as file:
        for line in file:
            if "Calibration Factor:" in line:
                parts = line.strip().split()
                # Assume the last element is the calibration value
                try:
                    value = float(parts[-1])
                    # dB to linear
                    value = sqrt(pow(10.0, (value - 32.0) / 10.0))
                    return value
                except ValueError:
                    continue
    return None


calfac = extract_calibration_factor(data_path=data_path)
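
The dB-to-linear step above can be checked in isolation. The following is a minimal sketch, not the notebook's own function: the helper names are hypothetical, the sample line and its value of -83.0 dB are made up for illustration, and the layout of the leader file line is assumed to match what the parser above expects.

```python
from math import isclose


def parse_calibration_line(line):
    """Return the calibration factor in dB from a metadata line, or None."""
    if "Calibration Factor:" not in line:
        return None
    try:
        # Assume the last whitespace-separated token is the value
        return float(line.strip().split()[-1])
    except (ValueError, IndexError):
        return None


def db_to_amplitude_scale(cf_db, offset_db=32.0):
    """Convert a calibration factor in dB to a linear amplitude scale.

    Equivalent to sqrt(10 ** ((cf_db - offset_db) / 10)).
    """
    return 10.0 ** ((cf_db - offset_db) / 20.0)


# Hypothetical metadata line, for illustration only
sample_line = "Calibration Factor: -83.0"
cf_db = parse_calibration_line(sample_line)
scale = db_to_amplitude_scale(cf_db)
print(cf_db, scale)
```

Because the factor is applied to complex amplitudes rather than intensities, the exponent is divided by 20 instead of 10, which is what the square root in the function above achieves.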

In [None]:
# silence warnings when dataset is in the SAR geometry
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=NotGeoreferencedWarning)
    # open with chunks to use dask arrays
    S = (
        riox.open_rasterio(
            data_path / "VOL-ALPSRP202350750-P1.1__A",
            chunks=(-1, "auto", "auto"),
        )
        # TODO: benchmark with and without transposition
        .transpose("x", "y", "band").rename({"band": "elt"})
    )
# convert digital number to RCS
S *= calfac
# label axes
S.coords["elt"] = ["HH", "HV", "VH", "VV"]
# replace original metadata with descriptive tags
S.attrs = {"poltype": "S", "description": "Scattering matrix"}

In [None]:
# inspect S
S

In [None]:
# convert S to C3

# coordinates of the Hermitian covariance matrix
# we only store the upper part
C3_coords = (S.coords["y"], S.coords["x"], ["11", "12", "13", "22", "23", "33"])

# scattering vector
k_hh = S[:, :, 0]
k_vv = S[:, :, 3]
# cross-pol term under reciprocity; the sqrt(2) factor preserves the span
k_hv = (S[:, :, 1] + S[:, :, 2]) / 2**0.5

# compute the cross products
C3_elts = []
for i, e1 in enumerate([k_hh, k_hv, k_vv]):
    for j, e2 in enumerate([k_hh, k_hv, k_vv]):
        if j >= i:
            C3_elts.append(e1 * e2.conj())

# stack elements
C3_data = da.stack(C3_elts).T

# make a lazy DataArray
C3 = xr.DataArray(
    data=C3_data.rechunk(("auto", "auto", 6)),
    dims=("y", "x", "elt"),
    coords=C3_coords,
    name="C3",
    attrs={"poltype": "C3", "description": "Covariance matrix"},
)
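
The upper-triangle construction can be checked on a small synthetic example. This is a numpy-only sketch under stated assumptions: the three scattering-vector components are random complex arrays, and the names `k`, `labels` and `C3_demo` are hypothetical. It only illustrates the cross-product pattern and the Hermitian properties that justify storing six elements.

```python
from itertools import combinations_with_replacement

import numpy as np

rng = np.random.default_rng(0)
shape = (4, 5)  # hypothetical image size

# Three synthetic scattering-vector components (stand-ins for k_hh, k_hv, k_vv)
k = [
    rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    for _ in range(3)
]

# Upper triangle of k k^H: index pairs (i, j) with j >= i
labels, elts = [], []
for i, j in combinations_with_replacement(range(3), 2):
    labels.append(f"{i + 1}{j + 1}")
    elts.append(k[i] * k[j].conj())

# Stack along a trailing "elt" axis: shape (y, x, 6)
C3_demo = np.stack(elts, axis=-1)

# A lower-triangle element is the conjugate of its upper counterpart
C21 = k[1] * k[0].conj()
print(labels, C3_demo.shape)
```

The diagonal elements come out with a zero imaginary part, which is why they could be stored as real numbers.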

In [None]:
# inspect C3
C3

When attempting to write to zarr, several warnings appear  
because the string dtype `<U2` of the coordinates, the `vlen-utf8` codec and consolidated metadata are not yet part of the Zarr format 3 specification.

```
UserWarning: The codec `vlen-utf8` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  return cls(**configuration_parsed)
UserWarning: The dtype `<U2` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  meta = AsyncArray._create_metadata_v3(
UserWarning: The codec `vlen-utf8` is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  return cls(**configuration_parsed)
UserWarning: Consolidated metadata is currently not part in the Zarr format 3 specification. It may not be supported by other zarr implementations and may change in the future.
  warnings.warn(
```

**Update:** setting `zarr_format=2` avoids warnings.

In [None]:
# this returns many warnings, how serious are these?
# C3.to_zarr(data_path / "C3.zarr", mode="w")

# using the previous version works fine, but will it be deprecated?
# C3.to_zarr(data_path / "C3.zarr", mode="w", zarr_format=2)

# try geotiff
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=NotGeoreferencedWarning)
    C3.transpose("elt", "y", "x").rio.to_raster(data_path / "C3.tif")

## Note:
- Data files are slightly heavier than those of the C version because elements 11, 22 and 33 are stored as complex values.
- While using datasets could solve this problem, we could lose the ability to easily process the data as an array.

In [None]:
# multilooking in azimuth
C3_mlt = C3.coarsen(y=8).mean()
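
One detail worth keeping in mind with `coarsen`: its default `boundary="exact"` raises an error when the dimension size is not a multiple of the window. The sketch below uses a small synthetic image (hypothetical sizes and names) to show the difference with `boundary="trim"`, which may or may not be the desired behaviour for a given scene.

```python
import numpy as np
import xarray as xr

# 20 azimuth lines: not a multiple of the 8-line window (hypothetical sizes)
img = xr.DataArray(np.ones((20, 4), dtype="float32"), dims=("y", "x"))

# The default boundary="exact" rejects sizes that are not a multiple of the window
try:
    img.coarsen(y=8).mean()
    exact_ok = True
except ValueError:
    exact_ok = False

# boundary="trim" drops the 4 leftover lines and keeps 2 full windows
img_ml = img.coarsen(y=8, boundary="trim").mean()
print(exact_ok, img_ml.shape)
```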

In [None]:
# plot the data
abs(C3_mlt.loc[..., "33"]).plot.imshow(interpolation="none", vmax=1)