# Creating a Multi-Layer Data Cube with Metadata in Python

## 1. Recommended Structure: Xarray Dataset
Instead of a simple 3D NumPy array, use an xarray Dataset. This format treats different layers (like slope, elevation, or geophysics) as named "data variables" that share the same spatial dimensions (e.g., x and y).
- Vertical Stacking: Variables are stored together in one object but kept as distinct layers.
- Metadata: You can attach a dictionary of attributes (`attrs`) to each individual layer and to the cube as a whole.
- Named Access: You can access layers by name (e.g., `cube.slope`) rather than by band number.
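
To make the contrast with a plain NumPy stack concrete, here is a minimal sketch. The 4x5 rasters, their constant values, and the `title` attribute are made up for illustration only.

```python
import numpy as np
import xarray as xr

# Hypothetical 4x5 rasters standing in for real layers.
slope = np.full((4, 5), 12.0)
elevation = np.full((4, 5), 850.0)

# Plain NumPy: layers are identified only by position along axis 0.
stack = np.stack([slope, elevation])  # shape (2, 4, 5)
elevation_np = stack[1]  # the caller must remember that index 1 is elevation

# xarray Dataset: layers are identified by name and carry their own metadata.
# Each entry is a (dims, data, attrs) tuple.
cube = xr.Dataset(
    {
        "slope": (("y", "x"), slope, {"units": "degrees"}),
        "elevation": (("y", "x"), elevation, {"units": "m"}),
    },
    attrs={"title": "Demo cube"},
)
elevation_xr = cube["elevation"]

print(list(cube.data_vars))
print(elevation_xr.attrs["units"])
```

Both objects hold the same numbers; the Dataset additionally records which layer is which and what its units are.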

## 2. Implementation Guide
Follow these steps to build your cube. They use xarray only; rioxarray is an optional extension for geospatial data, such as reading rasters and handling coordinate reference systems.

### Step A: Prepare Individual Layers
Create each layer as an xarray.DataArray. This allows you to assign specific metadata to that layer before stacking.

```python
import xarray as xr
import numpy as np

# Example: Creating a 'slope' layer
slope_data = np.random.rand(100, 100)
slope_layer = xr.DataArray(
    slope_data,
    dims=("y", "x"),
    name="slope",
    attrs={"units": "degrees", "description": "Terrain slope from DEM"}
)

# Example: Creating a 'geophysics' layer
geo_layer = xr.DataArray(
    np.random.rand(100, 100),
    dims=("y", "x"),
    name="magnetic_field",
    attrs={"units": "nT", "source": "2025 Survey"}
)
```
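
The layers above have named dimensions but no coordinate values. If the grid positions are known, they can be attached when the layer is created, which allows selection by location instead of by array index. The sketch below assumes a hypothetical grid (30 m spacing, arbitrary origin); none of these numbers come from a real survey.

```python
import numpy as np
import xarray as xr

# Hypothetical grid: 100 x 100 cells, 30 m spacing, arbitrary origin.
x = 500000.0 + 30.0 * np.arange(100)
y = 4100000.0 - 30.0 * np.arange(100)

slope_layer = xr.DataArray(
    np.random.rand(100, 100),
    dims=("y", "x"),
    coords={"y": y, "x": x},
    name="slope",
    attrs={"units": "degrees", "description": "Terrain slope from DEM"},
)

# Select by coordinate value instead of array index.
value = slope_layer.sel(x=500090.0, y=4099910.0)

# Nearest-neighbour lookup for a point that is not exactly on the grid.
nearest = slope_layer.sel(x=500100.0, y=4099900.0, method="nearest")

print(float(value), float(nearest))
```

Layers that share the same `x` and `y` coordinates line up cell by cell when they are later combined into one cube.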


### Step B: Stack into a Dataset
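Combine these layers into a single `xarray.Dataset`. This effectively "stacks" them vertically. The dictionary keys become the variable names in the cube, so `geo_layer` is stored as `geophysics` even though the DataArray itself was named `magnetic_field`.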

```python
# Create the data cube
cube = xr.Dataset({
    "slope": slope_layer,
    "geophysics": geo_layer
})

# Add global metadata for the entire cube
cube.attrs["creation_date"] = "2025-12-19"
```
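
A cube does not have to be built in one step. As a minimal sketch, the example below adds a further layer to an existing Dataset by item assignment and shows that a layer with a mismatched shape is rejected. The `elevation` and `bad` layers are hypothetical and exist only for this illustration.

```python
import numpy as np
import xarray as xr

cube = xr.Dataset(
    {"slope": (("y", "x"), np.random.rand(100, 100), {"units": "degrees"})}
)

# Hypothetical extra layer; the name and units are illustrative.
elevation = xr.DataArray(
    np.random.rand(100, 100) * 1000.0,
    dims=("y", "x"),
    attrs={"units": "m"},
)

# Item assignment adds the layer under the given key.
cube["elevation"] = elevation

# A layer whose shape does not match the existing dimensions is rejected.
bad_layer = xr.DataArray(np.zeros((50, 50)), dims=("y", "x"))
rejected = False
try:
    cube["bad"] = bad_layer
except ValueError as err:
    rejected = True
    print("Rejected:", err)

print(list(cube.data_vars))
```

The size check is what keeps every layer in the cube on the same `y` and `x` grid.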


### Step C: Accessing Data
You can now access layers by their assigned names and read the metadata attached to each one:
- By Name: `cube["slope"]` or `cube.slope`
- View Metadata: `cube.slope.attrs` shows the units and description specific to that layer
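
Beyond single-layer access, a few operations work on the whole cube at once. The sketch below rebuilds a small stand-in cube so that it runs on its own; the values are random and the layer names simply mirror the ones used in this guide.

```python
import numpy as np
import xarray as xr

cube = xr.Dataset(
    {
        "slope": (("y", "x"), np.random.rand(100, 100), {"units": "degrees"}),
        "geophysics": (("y", "x"), np.random.rand(100, 100), {"units": "nT"}),
    }
)

# Loop over every layer and collect its units.
units = {name: layer.attrs.get("units") for name, layer in cube.data_vars.items()}

# Positional subsetting applies to all layers at once.
window = cube.isel(y=slice(0, 10), x=slice(0, 20))

# Reductions on a Dataset are computed separately for each layer.
means = cube.mean()

print(units)
print(dict(window.sizes))
print(float(means["slope"]), float(means["geophysics"]))
```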

## 3. Alternative: Multi-Band DataArray
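If you prefer a single 3D array with shape `(band, y, x)`, concatenate the layers into one DataArray and label the new `band` dimension with a coordinate that stores the layer names. The trade-off is that a DataArray carries a single `attrs` dictionary, so per-layer metadata such as units is not kept separately.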

```python
# Stack existing DataArrays along a new 'band' dimension.
# With the default combine_attrs="override", the result keeps only the
# attrs of the first array (here, those of slope_layer).
cube_array = xr.concat([slope_layer, geo_layer], dim="band")

# Assign names to the bands for easy selection
cube_array = cube_array.assign_coords(band=["slope", "geophysics"])

# Access by name
slope_only = cube_array.sel(band="slope")
```
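
The two layouts can also be converted into each other, which may be convenient when one tool expects a Dataset and another expects a single array. This is a sketch on a small stand-in cube. It assumes that `to_dataarray` is available in newer xarray releases and falls back to the older `to_array` otherwise.

```python
import numpy as np
import xarray as xr

cube = xr.Dataset(
    {
        "slope": (("y", "x"), np.random.rand(100, 100)),
        "geophysics": (("y", "x"), np.random.rand(100, 100)),
    }
)

# Newer xarray releases name this method to_dataarray; older ones have to_array.
convert = cube.to_dataarray if hasattr(cube, "to_dataarray") else cube.to_array

# Dataset -> one 3D DataArray; variable names become the 'band' coordinate.
cube_array = convert(dim="band")

# DataArray -> Dataset; each label along 'band' becomes a data variable again.
cube_again = cube_array.to_dataset(dim="band")

print(cube_array.dims, cube_array.shape)
print(list(cube_again.data_vars))
```

Per-layer attributes are best re-checked after such a conversion, since the 3D array has only one `attrs` dictionary.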


# Saving and Loading

## 1. Saving to NetCDF
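The `to_netcdf` method saves the entire dataset, including variable names and all attributes (metadata), into a single binary file. Attribute values must be types that NetCDF can store, such as strings, numbers, or arrays of them.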

```python
# Save the entire data cube
cube.to_netcdf("my_datacube_2025.nc")
```


## 2. Alternative: Saving to Zarr (Cloud-Optimized)
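If your data cube is very large or you intend to keep it in cloud object storage (such as AWS S3), use Zarr. A Zarr store splits each variable into chunks that are saved as separate objects, so chunks can be read and written in parallel. Writing requires the `zarr` package to be installed.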

```python
# Save as a Zarr store
cube.to_zarr("my_datacube_2025.zarr")
```


## 3. Loading

```python
# Reload the cube from the NetCDF file
loaded_cube = xr.open_dataset("my_datacube_2025.nc")
print(loaded_cube.slope.attrs)  # Access your metadata

# A Zarr store is reopened with xr.open_zarr("my_datacube_2025.zarr")
```
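
To check that metadata survives a save and reload, a round trip through a temporary file can be sketched as follows. The file name `roundtrip_demo.nc` and the small 10x10 cube are hypothetical, and the example assumes a NetCDF backend (for instance `netCDF4`, `h5netcdf`, or `scipy`) is installed.

```python
import os
import tempfile

import numpy as np
import xarray as xr

cube = xr.Dataset(
    {"slope": (("y", "x"), np.random.rand(10, 10), {"units": "degrees"})},
    attrs={"creation_date": "2025-12-19"},
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "roundtrip_demo.nc")  # hypothetical file name
    cube.to_netcdf(path)
    # open_dataset reads lazily; load() pulls the data into memory so the
    # file can be closed before the temporary directory is removed.
    with xr.open_dataset(path) as ds:
        loaded = ds.load()

print(loaded["slope"].attrs)
print(loaded.attrs)
```

Opening the file in a `with` block closes it as soon as the data has been read, which avoids leaving file handles open in long-running sessions.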
