### Alternative approach: use datasets instead of data arrays

- We can access the raster data directly with `rioxarray`
- We need to apply the calibration factor from the metadata

Which structure is better suited here, `xarray.Dataset` or `xarray.DataArray`?
- A DataArray is more compact, but elements must be accessed as `arr.loc[..., "HH"]`. It is not optimal in size, since the diagonal elements of the covariance matrix are also stored as complex numbers.
- A Dataset stores several arrays, which are accessed as `ds["HH"]`. Its size can be reduced by storing the diagonal elements of the covariance matrix as real numbers. This departs from numpy compatibility and requires more rewriting, so the long-term implications remain to be assessed.
- Multilooking with `coarsen` should work in both cases.
- The boxcar filter will need to be adapted to run on each variable of a dataset.
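To make the size argument concrete, here is a minimal sketch on a tiny synthetic grid. The 4 × 5 shape, the `elt` dimension and the element names `m11` … `m23` are illustrative assumptions. Both layouts store only the six upper-triangle elements; the difference is that the Dataset keeps the three diagonal terms as `float32`.

```python
import numpy as np
import xarray as xr

# Tiny synthetic grid (illustrative size only).
ny, nx = 4, 5
names = ["m11", "m22", "m33", "m12", "m13", "m23"]
diag = ("m11", "m22", "m33")

# Option A: a single DataArray with a hypothetical "elt" dimension holding the
# six upper-triangle elements; every element, diagonal included, is complex64.
arr = xr.DataArray(
    np.zeros((ny, nx, len(names)), dtype=np.complex64),
    dims=("y", "x", "elt"),
    coords={"elt": names},
)

# Option B: a Dataset with float32 diagonal terms and complex64 off-diagonal terms.
ds = xr.Dataset(
    {
        name: (("y", "x"), np.zeros((ny, nx), dtype=np.float32 if name in diag else np.complex64))
        for name in names
    }
)

# Element access: label-based selection along "elt" versus variable lookup.
m12_arr = arr.sel(elt="m12")
m12_ds = ds["m12"]

# Bytes per pixel (data only: DataArray.nbytes does not count the "elt" labels).
print(arr.nbytes // (ny * nx))  # 6 * 8 = 48
print(ds.nbytes // (ny * nx))   # 3 * 4 + 3 * 8 = 36
```

With real diagonals the Dataset needs 36 bytes per pixel instead of 48, i.e. 25% less memory for the same information.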

In [2]:
import rioxarray as riox
import xarray as xr
import dask.array as da
from pathlib import Path
import warnings
from rasterio.errors import NotGeoreferencedWarning
data_path = Path("/data/psp/SAN_FRANCISCO_ALOS1")

In [3]:
# read the metadata to extract the radiometric calibration factor
def extract_calibration_factor(data_path):
    from math import sqrt, pow

    file_path = data_path / "ceos_leader.txt"
    with open(file_path, "r") as file:
        for line in file:
            if "Calibration Factor:" in line:
                parts = line.strip().split()
                # assume the last element is the calibration value (in dB)
                try:
                    value = float(parts[-1])
                    # dB to linear amplitude factor
                    value = sqrt(pow(10.0, (value - 32.0) / 10.0))
                    return value
                except ValueError:
                    # invalid metadata: keep looking in the following lines
                    continue
    return None


calfac = extract_calibration_factor(data_path=data_path)
if calfac is None:
    raise ValueError(f"No valid calibration factor found in {data_path / 'ceos_leader.txt'}")
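The factor returned by `extract_calibration_factor` is a linear amplitude factor f = √(10^((CF − 32)/10)), so multiplying the complex digital numbers by f is the same as adding CF − 32 dB to 10·log10(|DN|²). The short check below verifies that identity numerically. The value CF = −83.0 dB and the digital number are illustrative assumptions, not values read from this dataset, and `db_to_amplitude_factor` is a hypothetical helper that repeats the notebook's conversion.

```python
import math


def db_to_amplitude_factor(cf_db: float) -> float:
    """Same conversion as in extract_calibration_factor: dB offset -> linear amplitude factor."""
    return math.sqrt(10.0 ** ((cf_db - 32.0) / 10.0))


cf_db = -83.0                   # illustrative calibration factor (dB), not read from the file
f = db_to_amplitude_factor(cf_db)

dn = complex(1200.0, -350.0)    # arbitrary complex digital number
sigma0_db_direct = 10 * math.log10(abs(dn) ** 2) + cf_db - 32.0
sigma0_db_scaled = 10 * math.log10(abs(f * dn) ** 2)

print(f)                                                 # 10 ** (-115 / 20), about 1.78e-06
print(math.isclose(sigma0_db_direct, sigma0_db_scaled))  # True
```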

In [4]:
# silence warnings when dataset is in the SAR geometry
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=NotGeoreferencedWarning)
    # open with chunks to use dask arrays
    S = (
        riox.open_rasterio(
            data_path / "VOL-ALPSRP202350750-P1.1__A",
            chunks=(-1, "auto", "auto"),
            band_as_variable=True,
        ).rename(band_1="hh", band_2="hv", band_3="vh", band_4="vv")
    )
# convert digital number to RCS
S *= calfac
# set polarimetric data type
S.attrs = {"poltype": "S", "description": "Scattering matrix"}

In [5]:
# inspect S
S

Output (each of the four data variables `hh`, `hv`, `vh`, `vv` has the same layout):

| | Array | Chunk |
|---|---|---|
| Bytes | 175.50 MiB | 128.00 MiB |
| Shape | (18432, 1248) | (13443, 1248) |
| Dask graph | 2 chunks in 4 graph layers | 2 chunks in 4 graph layers |
| Data type | complex64 numpy.ndarray | complex64 numpy.ndarray |

In [None]:
S.dims

In [6]:
# convert S to C3

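# scattering vector (hh, vv, symmetrized cross-pol)
# note: the usual lexicographic vector is k = [S_hh, sqrt(2) * S_hv, S_vv];
# here the elements are ordered (hh, vv, hv) and the sqrt(2) factor is not applied
k1 = S.hh
k2 = S.vv
k3 = 0.5 * (S.hv + S.vh)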

# compute the Hermitian matrix elements
C3_dict = {}

# diagonal terms are real: keep only the real part to halve their size
C3_dict["m11"] = (k1*k1.conj()).real
C3_dict["m22"] = (k2*k2.conj()).real
C3_dict["m33"] = (k3*k3.conj()).real

# upper off-diagonal terms (the lower ones are their complex conjugates)
C3_dict["m12"] = k1*k2.conj()
C3_dict["m13"] = k1*k3.conj()
C3_dict["m23"] = k2*k3.conj()

C3 = xr.Dataset(C3_dict)
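Storing only six elements works because C3 is Hermitian: each lower-triangle element is the complex conjugate of the matching upper one. A minimal sketch of rebuilding the full 3 × 3 matrix per pixel is shown below; `c3_full_matrix` is a hypothetical helper, and the usage builds a small synthetic C3 exactly as in the cell above (same k1, k2, k3) to check the result against the outer product k·kᴴ.

```python
import numpy as np
import xarray as xr


def c3_full_matrix(c3: xr.Dataset) -> np.ndarray:
    """Hypothetical helper: rebuild the full 3x3 Hermitian matrix per pixel
    from the six stored elements. Returns an array of shape (..., 3, 3)."""
    m12, m13, m23 = (np.asarray(c3[name]) for name in ("m12", "m13", "m23"))
    # cast the real diagonal back to the complex dtype of the off-diagonal terms
    m11, m22, m33 = (np.asarray(c3[name]).astype(m12.dtype) for name in ("m11", "m22", "m33"))
    rows = [
        [m11, m12, m13],
        [m12.conj(), m22, m23],          # lower triangle = conjugate of upper
        [m13.conj(), m23.conj(), m33],
    ]
    # inner stack: columns on the last axis; outer stack: rows on the second-to-last axis
    return np.stack([np.stack(r, axis=-1) for r in rows], axis=-2)


# Usage on synthetic single-look data, built the same way as C3 above.
rng = np.random.default_rng(0)
shape = (2, 3)
hh, hv, vh, vv = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape) for _ in range(4))
k1, k2, k3 = hh, vv, 0.5 * (hv + vh)
dims = ("y", "x")
c3 = xr.Dataset(
    {
        "m11": (dims, (k1 * k1.conj()).real),
        "m22": (dims, (k2 * k2.conj()).real),
        "m33": (dims, (k3 * k3.conj()).real),
        "m12": (dims, k1 * k2.conj()),
        "m13": (dims, k1 * k3.conj()),
        "m23": (dims, k2 * k3.conj()),
    }
)

full = c3_full_matrix(c3)
k = np.stack([k1, k2, k3], axis=-1)                  # one 3-element vector per pixel
expected = k[..., :, None] * k[..., None, :].conj()  # outer product k k^H
print(full.shape, np.allclose(full, expected))       # (2, 3, 3, 3) True
```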

In [7]:
C3

Output (all variables have shape (18432, 1248), split into 2 chunks of at most (13443, 1248)):

| Variable | Data type | Bytes (array) | Bytes (chunk) | Dask graph |
|---|---|---|---|---|
| `m11` | float32 | 87.75 MiB | 64.00 MiB | 2 chunks in 7 graph layers |
| `m22` | float32 | 87.75 MiB | 64.00 MiB | 2 chunks in 7 graph layers |
| `m33` | float32 | 87.75 MiB | 64.00 MiB | 2 chunks in 13 graph layers |
| `m12` | complex64 | 175.50 MiB | 128.00 MiB | 2 chunks in 10 graph layers |
| `m13` | complex64 | 175.50 MiB | 128.00 MiB | 2 chunks in 16 graph layers |
| `m23` | complex64 | 175.50 MiB | 128.00 MiB | 2 chunks in 16 graph layers |

In [None]:
# with the default Zarr format this returns many warnings; how serious are they?
# C3.to_zarr(data_path / "C3.zarr", mode="w")

# writing in the Zarr v2 format works fine, but will it be deprecated?
C3.to_zarr(data_path / "C3.zarr", mode="w", zarr_format=2)

# try GeoTIFF
# note: "elt" is a dimension of the DataArray version; the C3 Dataset only has "y" and "x"
# with warnings.catch_warnings():
#     warnings.filterwarnings("ignore", category=NotGeoreferencedWarning)
#     C3.transpose("elt", "y", "x").rio.to_raster(data_path / "C3.tif")

In [None]:
# multilooking in azimuth
C3_mlt = C3.coarsen(y=8).mean()
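`coarsen(y=8).mean()` averages non-overlapping blocks of 8 lines and applies the same reduction to every data variable, whether real or complex. The sketch below shows this on a small synthetic stand-in for C3 (the 16 × 3 size and the two variables are illustrative assumptions). It also shows `boundary="trim"`, which is needed when the number of lines is not a multiple of the window; here 18432 = 8 × 2304, so the default `boundary="exact"` is fine.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for C3: 16 azimuth lines (y) x 3 range samples (x).
m11 = xr.DataArray(np.arange(16 * 3, dtype=np.float32).reshape(16, 3), dims=("y", "x"))
m12 = (m11 * (1 + 1j)).astype(np.complex64)
c3 = xr.Dataset({"m11": m11, "m12": m12})

# Multilooking: average non-overlapping blocks of 8 lines along y.
c3_mlt = c3.coarsen(y=8).mean()

# With the default boundary="exact", a y size that is not a multiple of 8
# raises an error; boundary="trim" drops the incomplete last block instead.
c3_trim = c3.isel(y=slice(0, 13)).coarsen(y=8, boundary="trim").mean()

print(c3_mlt.sizes["y"], c3_trim.sizes["y"])  # 2 1
print(float(c3_mlt.m11[0, 0]))                # mean of 0, 3, ..., 21 = 10.5
```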

In [None]:
# plot the data
abs(C3_mlt.m33).plot.imshow(interpolation="none", vmax=0.5)
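The list at the top of this section notes that the boxcar filter must be adapted to run on each variable of a dataset. One possible approach, sketched here under our own assumptions (the `boxcar` helper, its `win` parameter and the NaN borders are hypothetical choices, not the notebook's existing filter), is to write the filter for a single DataArray and apply it to all variables with `Dataset.map`.

```python
import numpy as np
import xarray as xr


def boxcar(da: xr.DataArray, win: int = 3) -> xr.DataArray:
    """Hypothetical boxcar filter: moving average over a win x win window in (y, x).
    Pixels whose window is not complete (image borders) become NaN."""
    window = {"y": win, "x": win}
    if da.dtype.kind == "c":
        # filter real and imaginary parts separately; the mean is linear,
        # so this equals averaging the complex values directly
        re = da.real.rolling(window, center=True).mean()
        im = da.imag.rolling(window, center=True).mean()
        return (re + 1j * im).astype(da.dtype)
    return da.rolling(window, center=True).mean()


# Usage on a small constant dataset with one real and one complex variable.
dims = ("y", "x")
c3 = xr.Dataset(
    {
        "m11": (dims, np.full((5, 5), 2.0, dtype=np.float32)),
        "m12": (dims, np.full((5, 5), 1 + 2j, dtype=np.complex64)),
    }
)
# Dataset.map calls boxcar on every data variable; extra keywords are forwarded.
c3_box = c3.map(boxcar, win=3)
print(c3_box.m12.dtype, c3_box.m12[2, 2].item())
```

The same call should also work lazily on the dask-backed `C3`, since `rolling` operates on dask arrays.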