# Fetch SurfaceGrid bulk data from DSIS

This notebook demonstrates how to fetch and decode **SurfaceGrid** binary (protobuf) data from DSIS using the `dsis-client` library.

The following steps are covered:

1. Authenticate to DSIS using an `.env` file with the required configuration and credentials.
2. Construct and execute a query requesting surface grid metadata.
3. Fetch binary bulk data for each surface grid returned by the query, using `get_bulk_data()` and `get_bulk_data_stream()`.
4. Decode the protobuf response (LGCStructure) and visualise the result.

For more information about the required content of the `.env` file, please contact the SDD-SID team, or the DSIS team in Equinor.

In [None]:
from dotenv import load_dotenv
import os
import json

from dsis_client import DSISClient, DSISConfig, QueryBuilder, Environment


### Authenticate and connect to DSIS

We need to specify the name of the model we plan to use, as the `dsis-client` library requires this when building the `DSISConfig` object.

In [56]:
MODEL_NAME = "OpenWorksCommonModel"

Next, we provide all the other required configuration and credentials via the `.env` file. Make sure to modify the `config` logic below if you are, e.g., fetching secrets from a key vault through this notebook.

In [57]:
load_dotenv(".env_dsis")

True

In [58]:
config = DSISConfig(
    environment=Environment.DEV,
    tenant_id=os.getenv("tenant_id"),
    client_id=os.getenv("client_id"),
    client_secret=os.getenv("client_secret"),
    access_app_id=os.getenv("resource_id"),
    dsis_username=os.getenv("dsis_function_key"),
    dsis_password=os.getenv("dsis_password"),
    subscription_key_dsauth=os.getenv("subscription_key_dsauth"),
    subscription_key_dsdata=os.getenv("subscription_key_dsdata"),
    dsis_site=os.getenv("dsis_site"),
)
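`os.getenv()` returns `None` for a variable that is not set, so a typo in the `.env` file would only surface later as an authentication error. The sketch below is one way to check up front; the variable names are copied from the `os.getenv()` calls above, and `missing_env_vars` is a hypothetical helper, not part of `dsis-client`.

```python
import os
from collections.abc import Mapping

# Names taken from the os.getenv() calls in the DSISConfig cell.
REQUIRED_ENV_VARS = (
    "tenant_id",
    "client_id",
    "client_secret",
    "resource_id",
    "dsis_function_key",
    "dsis_password",
    "subscription_key_dsauth",
    "subscription_key_dsdata",
    "dsis_site",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing in environment:", ", ".join(missing))
```

Running this right after `load_dotenv()` makes it clear whether a later failure comes from the configuration or from the service.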

In [59]:
dsis_client = DSISClient(config)
if dsis_client.test_connection():
    print("✓ Connected to DSIS API")
else:
    print("✗ Could not connect to DSIS API")

✓ Connected to DSIS API


### Build and execute a query

Specify the OW database (district), project, and schema.

In [60]:
SCHEMA = "SurfaceGrid"
DISTRICT = "BG4FROST"
PROJECT = "VOLVE_PUBLIC"

In [61]:
# Helper function which might be incorporated in the dsis-client library in the future


def build_district_id(database: str, *, model_name: str) -> str:
    """Build DSIS district_id from database name.

    DSIS uses different district-id conventions for different models.

    Examples:
    - OpenWorksCommonModel: OpenWorksCommonModel_OW_<DB>-OW_<DB>
    - OpenWorks native models (e.g., OW5000): OpenWorks_OW_<DB>_SingleSource-OW_<DB>
    """
    if model_name == "OpenWorksCommonModel":
        return f"OpenWorksCommonModel_OW_{database}-OW_{database}"
    return f"OpenWorks_OW_{database}_SingleSource-OW_{database}"
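As a quick check of the two conventions, the snippet below repeats the helper so that it runs on its own and prints the district ID for each case. `"OW5000"` is used only as a stand-in for a native model name, following the docstring.

```python
def build_district_id(database: str, *, model_name: str) -> str:
    """Same logic as the helper above, repeated so this snippet is self-contained."""
    if model_name == "OpenWorksCommonModel":
        return f"OpenWorksCommonModel_OW_{database}-OW_{database}"
    return f"OpenWorks_OW_{database}_SingleSource-OW_{database}"


common_id = build_district_id("BG4FROST", model_name="OpenWorksCommonModel")
native_id = build_district_id("BG4FROST", model_name="OW5000")

print(common_id)  # OpenWorksCommonModel_OW_BG4FROST-OW_BG4FROST
print(native_id)  # OpenWorks_OW_BG4FROST_SingleSource-OW_BG4FROST
```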

Build a query specifying the district (OW database), project, and schema (DSIS endpoint exposed in the model selected earlier).

In [62]:
query = QueryBuilder(
    model_name=MODEL_NAME,
    district_id=build_district_id(DISTRICT, model_name=MODEL_NAME),
    project=PROJECT,
).schema(SCHEMA)

Execute the query and inspect the first surface grid returned.

In [63]:
query_results = list(dsis_client.execute_query(query))

In [65]:
print(json.dumps(query_results[0], indent=4))

{
    "odata.mediaReadLink": "SurfaceGrid('2636')/$value",
    "odata.mediaEditLink": "SurfaceGrid('2636')",
    "odata.mediaContentType": "application/octet-stream",
    "data_values@odata.mediaEditLink": "SurfaceGrid('2636')/data_values",
    "data_values@odata.mediaReadLink": "SurfaceGrid('2636')/data_values",
    "data_values@odata.mediaContentType": "application/octet-stream",
    "map_data_set_name": "ihdTHugin13flt3",
    "z_unit_type": "depth measure",
    "fault_set_type": null,
    "rotation_unit": "rad",
    "z_unit": "meters",
    "remark": null,
    "y_max": "6482747.0",
    "x_min": "429588.0",
    "update_user_id": null,
    "xy_unit": "meters",
    "z_domain_qualifier": null,
    "data_domain": "TVDSS",
    "num_rows": 629,
    "y_min": "6475211.0",
    "data_min": -3502.0696,
    "x_max": "440124.0",
    "attribute": "DEPTH",
    "alternate_uid": "{\"geo_name\":\"UNKNOWN\",\"geo_type\":\"SURFACE\",\"map_data_set_name\":\"ihdTHugin13flt3\",\"attribute\":\"DEPTH\",\"data

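Note that the bounds (`x_min`, `x_max`, `y_min`, `y_max`) arrive as strings in the record printed above, while `num_rows` and `data_min` are numeric. The sketch below converts the bounds to floats and derives the grid extent. `grid_extent` is a hypothetical helper, and `sample` copies a few fields from the output above.

```python
def grid_extent(record: dict) -> dict:
    """Convert the string-valued bounds of one record to floats and derive the extent."""
    x_min, x_max = float(record["x_min"]), float(record["x_max"])
    y_min, y_max = float(record["y_min"]), float(record["y_max"])
    return {
        "name": record["map_data_set_name"],
        "width": x_max - x_min,
        "height": y_max - y_min,
        "xy_unit": record["xy_unit"],
    }


# A subset of the fields from the first record printed above.
sample = {
    "map_data_set_name": "ihdTHugin13flt3",
    "x_min": "429588.0",
    "x_max": "440124.0",
    "y_min": "6475211.0",
    "y_max": "6482747.0",
    "xy_unit": "meters",
}

extent = grid_extent(sample)
print(f"{extent['name']}: {extent['width']} x {extent['height']} {extent['xy_unit']}")
# ihdTHugin13flt3: 10536.0 x 7536.0 meters
```

In the notebook itself, `grid_extent(query_results[0])` should work the same way, assuming every record carries these fields.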
### Fetch bulk (binary) data

The `dsis-client` library provides two methods for downloading binary protobuf data:

- **`get_bulk_data()`** – loads everything at once (best for < 100 MB)
- **`get_bulk_data_stream()`** – streams in chunks (best for > 100 MB)

Use `query.entity(native_uid, data_field=...)` to target a specific entity's binary data, then pass the query to the bulk-data method.

> **Note:** SurfaceGrid uses the `/$value` endpoint, so we pass `data_field="$value"`.
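The value to pass as `data_field` can be read off the media links in the metadata: the record's `odata.mediaReadLink` is `SurfaceGrid('2636')/$value`. The sketch below splits such a link into its parts using plain string parsing. `split_media_link` is a hypothetical helper, and treating the quoted key as the same value as `native_uid` is an assumption based on the output in this notebook.

```python
import re

# Matches links of the form  Schema('key')/field
_MEDIA_LINK = re.compile(r"^(?P<schema>\w+)\('(?P<key>[^']+)'\)/(?P<field>.+)$")


def split_media_link(link: str) -> tuple[str, str, str]:
    """Split an OData media link into (schema, entity key, data field)."""
    match = _MEDIA_LINK.match(link)
    if match is None:
        raise ValueError(f"Unrecognised media link: {link!r}")
    return match["schema"], match["key"], match["field"]


print(split_media_link("SurfaceGrid('2636')/$value"))
# ('SurfaceGrid', '2636', '$value')
print(split_media_link("SurfaceGrid('2636')/data_values"))
# ('SurfaceGrid', '2636', 'data_values')
```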

Decoding of the LGCStructure protobuf is handled by `decode_lgc_structure` from `dsis_model_sdk.protobuf`.

#### Method 1: `get_bulk_data()` — load everything at once

Best for small to medium datasets (< 100 MB). For each surface grid, we build `bulk_query` with `query.entity(uid, data_field="$value")` and pass it to `get_bulk_data()`.

In [70]:
from dsis_model_sdk.protobuf import decode_lgc_structure

# ── Fetch bulk data for all surface grids ────────────────────────────────
surface_data: dict[str, bytes] = {}

for result in query_results:
    uid = result["native_uid"]
    bulk_query = query.entity(uid, data_field="$value")
    binary_data = dsis_client.get_bulk_data(bulk_query, accept="application/octet-stream")

    if binary_data:
        print(f"✓ {uid}: {len(binary_data):,} bytes")
        surface_data[uid] = binary_data
    else:
        print(f"✗ {uid}: no data")

print(f"\nFetched bulk data for {len(surface_data)} / {len(query_results)} surface grids")


# ── Decode all downloaded surface grids ──────────────────────────────────
decoded_surfaces: dict[str, object] = {}

for uid, binary_data in surface_data.items():
    lgc = decode_lgc_structure(binary_data, skip_length_prefix=True)
    decoded_surfaces[uid] = lgc
    print(f"{uid}: {len(lgc.elements)} element(s)")

✓ 2636: 2,773,141 bytes
✓ 2625: 2,557,216 bytes
✓ 2626: 2,557,216 bytes
✓ 2623: 644,970 bytes
✓ 2622: 287,535 bytes
✓ 2637: 287,535 bytes
✓ 2638: 287,535 bytes
✓ 2639: 138,345 bytes

Fetched bulk data for 8 / 8 surface grids
2636: 879 element(s)
2625: 844 element(s)
2626: 844 element(s)
2623: 423 element(s)
2622: 282 element(s)
2637: 282 element(s)
2638: 282 element(s)
2639: 195 element(s)


#### Method 2: `get_bulk_data_stream()` — stream in chunks

Best for large datasets (> 100 MB). Data is yielded in chunks so the full response never has to fit in memory at once.

In [71]:
def stream_surface(uid: str, chunk_size: int = 10 * 1024 * 1024) -> bytes | None:
    """Stream bulk data for a single surface grid and return the reassembled bytes."""
    bulk_query = query.entity(uid, data_field="$value")
    chunks: list[bytes] = []

    for chunk in dsis_client.get_bulk_data_stream(
        bulk_query,
        chunk_size=chunk_size,
        accept="application/octet-stream",
    ):
        chunks.append(chunk)

    if not chunks:
        print(f"✗ {uid}: no data")
        return None

    binary = b"".join(chunks)
    print(f"✓ {uid}: {len(binary):,} bytes ({len(chunks)} chunk(s))")
    return binary


# ── Stream all surface grids ──────────────────────────────────────────────
streamed_surfaces: dict[str, object] = {}

for result in query_results:
    uid = result["native_uid"]
    binary_data = stream_surface(uid)
    if binary_data is not None:
        streamed_surfaces[uid] = decode_lgc_structure(binary_data, skip_length_prefix=True)


# ── Cross-check against non-streamed results ─────────────────────────────
print("\n── Verification ──")
for uid in decoded_surfaces:
    if uid not in streamed_surfaces:
        print(f"  ✗ {uid}: missing from streamed results")
        continue

    orig_elems = len(decoded_surfaces[uid].elements)
    stream_elems = len(streamed_surfaces[uid].elements)
    match = orig_elems == stream_elems
    print(f"  {uid}: element count match: {match} ({orig_elems})")

✓ 2636: 2,773,141 bytes (339 chunk(s))
✓ 2625: 2,557,216 bytes (313 chunk(s))
✓ 2626: 2,557,216 bytes (313 chunk(s))
✓ 2623: 644,970 bytes (79 chunk(s))
✓ 2622: 287,535 bytes (35 chunk(s))
✓ 2637: 287,535 bytes (35 chunk(s))
✓ 2638: 287,535 bytes (35 chunk(s))
✓ 2639: 138,345 bytes (17 chunk(s))

── Verification ──
  2636: element count match: True (879)
  2625: element count match: True (844)
  2626: element count match: True (844)
  2623: element count match: True (423)
  2622: element count match: True (282)
  2637: element count match: True (282)
  2638: element count match: True (282)
  2639: element count match: True (195)
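The `stream_surface` helper joins all chunks back into a single `bytes` object, which is convenient for decoding but means the full payload ends up in memory after all. For genuinely large grids, one option is to write each chunk to disk as it arrives. The sketch below assumes only that the stream is an iterable of `bytes`; `write_chunks` is a hypothetical helper, and a generator stands in for `dsis_client.get_bulk_data_stream(...)` so that the snippet runs without a DSIS connection.

```python
import hashlib
import tempfile
from collections.abc import Iterable
from pathlib import Path


def write_chunks(chunks: Iterable[bytes], path: Path) -> tuple[int, str]:
    """Write chunks to `path` as they arrive; return (total bytes, SHA-256 hex digest)."""
    digest = hashlib.sha256()
    total = 0
    with path.open("wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            digest.update(chunk)
            total += len(chunk)
    return total, digest.hexdigest()


# Stand-in for dsis_client.get_bulk_data_stream(...): any iterable of bytes works.
fake_stream = (bytes([i]) * 8192 for i in range(4))

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "surface.bin"
    size, sha = write_chunks(fake_stream, target)
    on_disk = target.stat().st_size
    print(f"{size:,} bytes written ({on_disk:,} on disk)")
```

The returned digest also allows a byte-level comparison against `hashlib.sha256(binary_data).hexdigest()` from `get_bulk_data()`, which is a stricter check than comparing element counts.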
