# Derive Gauge Configuration — ausvic (FloodHubMaribyrnong)

This notebook reproduces every value in the `GAUGES` list from primary sources.
Run it end-to-end to verify or update the gauge configuration.

| Field | Source |
|-------|--------|
| `gauge_id` | Caravan convention: `ausvic_` + station number (no letters) |
| `name` | Victorian Water: Hydstra `get_ts_traces` → `site_details.name`; Melbourne Water: `/locations` API |
| `lat` / `lon` | Victorian Water: Hydstra `get_ts_traces` → `site_details`; Melbourne Water: `/locations` + `/summary` API |
| `area_km2` | MERIT Hydro `upa` (upstream drainage area) via GEE (Keilor: official VW figure) |
| Exclusions | CAMELS AUS v2 overlap check (Zenodo 13350616) — CSV required |

**Steps**
1. Candidate discovery + CAMELS AUS v2 overlap
2. Victorian Water metadata — name/lat/lon from Hydstra `get_ts_traces` site_details
3. Melbourne Water metadata — name/lat/lon from `/locations` + `/summary` API
4. Catchment areas — `upa` from MERIT Hydro via GEE (Keilor: official VW figure)
5. Compile final `GAUGES` list
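
The `gauge_id` convention in the table above can be sketched as a small helper. The function name `to_gauge_id` is an illustrative assumption, not part of the project code; the notebook itself applies the same `rstrip` idea inline.

```python
import string


def to_gauge_id(station_id: str, prefix: str = "ausvic_") -> str:
    """Strip any trailing uppercase letters and prepend the dataset prefix.

    `to_gauge_id` is a hypothetical helper name used only for illustration.
    """
    return prefix + station_id.strip().rstrip(string.ascii_uppercase)


print(to_gauge_id("230106A"))  # Melbourne Water style ID with a letter suffix
print(to_gauge_id("230200"))   # Hydstra style ID, already numeric
```

Both agencies' IDs map to the same `ausvic_` + six-digit form, which is why agency duplicates have to be removed before this step.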

In [63]:
import json
import time
import urllib.parse
import urllib.request
from pathlib import Path

print('Ready.')

Ready.


In [64]:
from google.colab import drive
drive.mount("/content/drive")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Step 1 — Candidate Station Discovery and CAMELS AUS v2 Overlap Check

**1a — Victorian Water candidates**  
Known 230* stations confirmed to carry discharge (Hydstra variable 141.00, in ML/day).  
Names and coordinates are fetched from the Hydstra API in Step 2.

**1b — Melbourne Water candidates via `/locations` API**  
Call `api.melbournewater.com.au/rainfall-river-level/locations` to get all MW sites.  
Filter to those with a `230` prefix (Maribyrnong basin). Use `/summary` to confirm  
each site has flow data (`flowLevels.minYear` present) and extract any available  
lat/lon from both responses.
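
The 1b filter can be sketched offline on already-parsed responses. The field names `siteId`, `flowLevels` and `minYear` come from the description above; the function name and the sample payloads (including site `229999A`) are hypothetical and exist only to exercise the logic.

```python
def select_flow_sites(locations, summaries, prefix="230"):
    """Return {site_id: min_year} for sites with the prefix and a flow record.

    `locations` mimics the /locations list and `summaries` maps a site ID to
    its parsed /summary response. Both are illustrative stand-ins.
    """
    selected = {}
    for loc in locations:
        sid = str(loc.get("siteId", "")).strip()
        if not sid.startswith(prefix):
            continue  # outside the Maribyrnong basin
        flow = summaries.get(sid, {}).get("flowLevels") or {}
        min_year = flow.get("minYear")
        if min_year is not None:
            selected[sid] = int(min_year)
    return selected


# Hypothetical payloads: one site with flow, one without, one in another basin.
locations = [{"siteId": "230100A"}, {"siteId": "230103A"}, {"siteId": "229999A"}]
summaries = {
    "230100A": {"flowLevels": {"minYear": 1996}},
    "230103A": {},
    "229999A": {"flowLevels": {"minYear": 2001}},
}
print(select_flow_sites(locations, summaries))
```

This sketch leaves out the reservoir, duplicate and direct flow-request checks that the notebook cell applies.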

**1c — CAMELS AUS v2 overlap**  
Any candidate already in Caravan via CAMELS AUS v2 (Zenodo 13350616) is excluded  
to avoid duplicate gauge IDs across the global dataset.

> **CAMELS CSV required.**  
> Download `CAMELS_AUS_Attributes&Indices_MasterTable.csv` from  
> [Zenodo record 13350616](https://doi.org/10.5281/zenodo.13350616) and place it in  
> `MyDrive/Colab Notebooks/`. The cell raises a `FileNotFoundError` if it is missing.
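
The overlap check reduces to a set-membership test once the letter suffix is removed. A minimal sketch follows, using made-up station IDs so that nothing is implied about which gauges CAMELS AUS v2 actually contains; `split_by_overlap` is a hypothetical name.

```python
import string


def split_by_overlap(candidates, reference_ids):
    """Partition candidate IDs into (included, excluded).

    A candidate is excluded when its ID, with any trailing uppercase letters
    removed, appears in `reference_ids`.
    """
    reference = {str(r).strip() for r in reference_ids}
    included, excluded = [], []
    for sid in sorted(candidates):
        bare = sid.rstrip(string.ascii_uppercase)
        (excluded if bare in reference else included).append(sid)
    return included, excluded


# Made-up IDs for illustration only.
candidates = ["111111A", "222222", "333333"]
reference_ids = ["222222 ", "999999"]
included, excluded = split_by_overlap(candidates, reference_ids)
print("included:", included)
print("excluded:", excluded)
```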

In [65]:
# ── Step 1a / Step 2 combined: Victorian Water metadata (Hydstra API) ─────────
# Station IDs are the known 230* gauges confirmed to carry discharge (141.00 ML/day).
# get_site_list is not exposed on this public endpoint — name/lat/lon come from
# the site_details object embedded in every get_ts_traces response.

import json as _json, urllib.parse as _up, urllib.request as _ur, time as _time

HYDSTRA_BASE = 'https://data.water.vic.gov.au/cgi/webservice.exe'
VW_STATIONS  = ['230200', '230206', '230202', '230213', '230227']

vw_meta = {}

print(f'  {"Station":<10} {"Name":<42} {"Lat":>12} {"Lon":>13}')
print('  ' + '-' * 82)

for sid in VW_STATIONS:
    params = {
        'function':   'get_ts_traces',
        'version':    '2',
        'site_list':  sid,
        'datasource': 'PUBLISH',
        'varfrom':    '100.00',
        'varto':      '141.00',
        'start_time': '20240601000000',
        'end_time':   '20240601235959',
        'interval':   'day',
        'multiplier': '1',
        'data_type':  'mean',
    }
    url = HYDSTRA_BASE + '?' + _up.urlencode(params)
    with _ur.urlopen(url, timeout=30) as resp:
        data = _json.loads(resp.read().decode())

    traces = data.get('return', {}).get('traces', [])
    if not traces:
        raise RuntimeError(f'No trace returned for Hydstra station {sid}')

    sd   = traces[0].get('site_details', {})
    name = sd.get('name', '').strip()
    lat  = float(sd.get('latitude', 0))
    lon  = float(sd.get('longitude', 0))

    if not name or lat == 0 or lon == 0:
        raise ValueError(f'Incomplete site_details for {sid}: {sd}')

    vw_meta[sid] = {'name': name, 'lat': lat, 'lon': lon}
    print(f'  {sid:<10} {name:<42} {lat:>12.6f} {lon:>13.6f}')
    _time.sleep(0.3)

print(f'\n{len(vw_meta)} Victorian Water gauges fetched from Hydstra API')

  Station    Name                                                Lat           Lon
  ----------------------------------------------------------------------------------
  230200     MARIBYRNONG RIVER @ KEILOR                   -37.727706    144.836476
  230206     JACKSON CREEK @ GISBORNE                     -37.475370    144.572443
  230202     JACKSON CREEK @ SUNBURY                      -37.583217    144.742036
  230213     TURRITABLE CREEK @ MOUNT MACEDON             -37.418905    144.584810
  230227     MAIN CREEK @ KERRIE                          -37.396121    144.660395

5 Victorian Water gauges fetched from Hydstra API


In [66]:
# ── Step 1b: Discover Melbourne Water candidates via /locations API ────────────
MELBWATER_BASE = 'https://api.melbournewater.com.au/rainfall-river-level'

MW_HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'Accept':     'application/json',
    'Origin':     'https://www.melbournewater.com.au',
    'Referer':    'https://www.melbournewater.com.au/',
}

# Known agency duplicates — same physical gauge as a Hydstra station, shorter record.
MW_AGENCY_DUPLICATES = {
    '230104A': 'agency duplicate of Hydstra 230202',
    '230105A': 'agency duplicate of Hydstra 230200',
}

# Gauges with flow data that are explicitly excluded for other reasons.
# These pass the automated filters but are unsuitable for Caravan.
MW_KNOWN_EXCLUSIONS = {
    '230236A': ('minimal flow only — 2.3 ML/day during Oct 2022 major flood peak; '
                'catchment area unknown; insufficient for hydrological characterisation'),
}

def check_flow_direct(sid):
    """
    Fall back to a direct flow request when /summary doesn't expose flowLevels.
    Uses the Oct 2022 Maribyrnong flood window (10–20 Oct 2022).
    MW API returns flow data under the key 'dailyRiverFlowsData'.
    Returns (has_flow: bool, min_year_label: str | None).
    """
    test_url = (f'{MELBWATER_BASE}/{sid}/river-flow/daily/range'
                f'?fromDate=2022-10-10&toDate=2022-10-20')
    req = urllib.request.Request(test_url, headers=MW_HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            records = json.loads(resp.read().decode())
        items = records.get('dailyRiverFlowsData', [])
        if items:
            return True, '(Oct 2022 flood)'
    except Exception as e:
        print(f'    [debug {sid}] error: {e}')
    return False, None

# 1. Get the full locations list
req = urllib.request.Request(f'{MELBWATER_BASE}/locations', headers=MW_HEADERS)
with urllib.request.urlopen(req, timeout=30) as resp:
    all_locations = json.loads(resp.read().decode()).get("siteLocations", [])

print(f'Melbourne Water /locations returned {len(all_locations)} total sites')

maribyrnong_sites = [
    loc for loc in all_locations
    if str(loc.get('siteId', '')).startswith('230')
]
print(f'  {len(maribyrnong_sites)} sites with prefix 230')
print(f'\nSample location object fields: {sorted(maribyrnong_sites[0].keys()) if maribyrnong_sites else "none"}')
print()

# 2. Process each site — filter out non-streamflow gauges, check for flow data
mw_candidates = {}

print(f'  {"Site ID":<12} {"Name":<40} {"Status":<30} {"Min year":>10} {"Lat":>10} {"Lon":>11}')
print('  ' + '-' * 118)

for loc in maribyrnong_sites:
    sid  = str(loc.get('siteId', '')).strip()
    name = (loc.get('siteName') or loc.get('name') or loc.get('description') or sid).strip()

    # Filter 1: reservoirs — not a streamflow gauge
    if loc.get('reservoirRecording'):
        print(f'  {sid:<12} {name:<40} SKIP — reservoir')
        continue

    # Filter 2: agency duplicates — same physical gauge as a Hydstra station
    if sid in MW_AGENCY_DUPLICATES:
        print(f'  {sid:<12} {name:<40} SKIP — {MW_AGENCY_DUPLICATES[sid]}')
        continue

    # Filter 3: explicitly excluded gauges (flow present but unsuitable for Caravan)
    if sid in MW_KNOWN_EXCLUSIONS:
        print(f'  {sid:<12} {name:<40} SKIP — {MW_KNOWN_EXCLUSIONS[sid]}')
        continue

    # Call /summary for flow info and coordinates
    summary_url = f'{MELBWATER_BASE}/{sid}/summary'
    req = urllib.request.Request(summary_url, headers=MW_HEADERS)
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            summary = json.loads(resp.read().decode())
        flow     = summary.get('flowLevels', {})
        min_yr   = flow.get('minYear')
        has_flow = min_yr is not None
        lat = (summary.get('latitude') or summary.get('lat') or
               loc.get('latitude') or loc.get('lat'))
        lon = (summary.get('longitude') or summary.get('lon') or
               loc.get('longitude') or loc.get('lng') or loc.get('lon'))
    except Exception:
        has_flow = False
        min_yr   = None
        lat = lon = None

    # Filter 4: very short records (started 2025 or later — insufficient for Caravan)
    if min_yr is not None and int(min_yr) >= 2025:
        print(f'  {sid:<12} {name:<40} SKIP — record from {min_yr} only')
        continue

    # Stage b: direct flood-date check for gauges /summary misses
    # MW API flow data is under key 'dailyRiverFlowsData' (confirmed from debug output)
    if not has_flow:
        has_flow, min_yr = check_flow_direct(sid)

    mw_candidates[sid] = {
        'name': name, 'has_flow': has_flow, 'min_year': min_yr,
        'lat': float(lat) if lat is not None else None,
        'lon': float(lon) if lon is not None else None,
    }
    status = 'YES' if has_flow else 'no flow'
    lat_s  = f'{float(lat):.4f}' if lat is not None else 'N/A'
    lon_s  = f'{float(lon):.4f}' if lon is not None else 'N/A'
    print(f'  {sid:<12} {name:<40} {status:<30} {str(min_yr or ""):>10} {lat_s:>10} {lon_s:>11}')
    time.sleep(0.3)

mw_with_flow = {sid: v for sid, v in mw_candidates.items() if v['has_flow']}
print(f'\n{len(mw_with_flow)} Melbourne Water sites with flow records')

Melbourne Water /locations returned 236 total sites
  16 sites with prefix 230


  Site ID      Name                                     Status                           Min year        Lat         Lon
  ----------------------------------------------------------------------------------------------------------------------
  230100A      Darraweit                                YES                                  1996   -37.4103    144.9023
  230102A      Bulla North                              YES                            (Oct 2022 flood)   -37.6314    144.8010
  230103A      Rosslynne Reservoir                      SKIP — reservoir
  230104A      Sunbury                                  SKIP — agency duplicate of Hydstra 230202
  230105A      Keilor                                   SKIP — agency duplicate of Hydstra 230200
  230106A      Maribyrnong                              YES                                  1996   -37.7659    144.8950
  230107A      Konagaderra             

In [67]:
# ── Step 1c: Combine all candidates then CAMELS AUS v2 overlap check ──────────
import pandas as pd

ALL_CANDIDATES = {
    **{sid: v['name'] for sid, v in vw_meta.items()},
    **{sid: v['name'] for sid, v in mw_with_flow.items()},
}
print(f'Total candidates (VW + MW, with discharge): {len(ALL_CANDIDATES)}')
for sid, name in sorted(ALL_CANDIDATES.items()):
    print(f'  {sid:<12} {name}')

# CAMELS AUS v2 overlap check — CSV required (download from Zenodo record 13350616)
CAMELS_CSV = Path('/content/drive/MyDrive/Colab Notebooks/CAMELS_AUS_Attributes&Indices_MasterTable.csv')

if not CAMELS_CSV.exists():
    raise FileNotFoundError(
        f'CAMELS CSV not found at {CAMELS_CSV}\n'
        'Download from https://doi.org/10.5281/zenodo.13350616 and place in '
        'MyDrive/Colab Notebooks/'
    )

camels     = pd.read_csv(CAMELS_CSV, dtype=str)
camels_ids = set(camels['station_id'].str.strip())
print(f'\nCAMELS AUS v2 loaded — {len(camels_ids)} stations\n')

EXCLUDED = set()
print(f'  {"Station":<12} {"Name":<45} Status')
print('  ' + '-' * 70)
for sid, name in sorted(ALL_CANDIDATES.items()):
    camels_sid = sid.rstrip('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    if camels_sid in camels_ids:
        EXCLUDED.add(sid)
        status = 'DUPLICATE -> EXCLUDED'
    else:
        status = 'OK'
    print(f'  {sid:<12} {name:<45} {status}')

INCLUDED = {sid: name for sid, name in ALL_CANDIDATES.items() if sid not in EXCLUDED}
print(f'\nResult: {len(INCLUDED)} included, {len(EXCLUDED)} excluded: {sorted(EXCLUDED)}')

Total candidates (VW + MW, with discharge): 12
  230100A      Darraweit
  230102A      Bulla North
  230106A      Maribyrnong
  230107A      Konagaderra
  230119A      Lancefield
  230200       MARIBYRNONG RIVER @ KEILOR
  230202       JACKSON CREEK @ SUNBURY
  230206       JACKSON CREEK @ GISBORNE
  230211A      Clarkefield
  230213       TURRITABLE CREEK @ MOUNT MACEDON
  230227       MAIN CREEK @ KERRIE
  230237A      Keilor North

CAMELS AUS v2 loaded — 561 stations

  Station      Name                                          Status
  ----------------------------------------------------------------------
  230100A      Darraweit                                     OK
  230102A      Bulla North                                   OK
  230106A      Maribyrnong                                   OK
  230107A      Konagaderra                                   OK
  230119A      Lancefield                                    OK
  230200       MARIBYRNONG RIVER @ KEILOR                    OK

## Step 3 — Melbourne Water Gauge Metadata

Coordinates and names are extracted from the `/locations` and `/summary` API  
responses fetched in Step 1b. The "Sample location object fields" output in Step 1b  
shows every field the API exposes — if coordinates are not found, Step 3 will raise  
a `ValueError` identifying the affected gauges.

In [68]:
# ── Step 3: Melbourne Water metadata from /locations + /summary ───────────────
# Coordinates are extracted in Step 1b from the /summary and /locations API
# responses. No hardcoded fallback — if the API doesn't return coords for a
# gauge in our included set the cell will raise an error.

loc_lookup = {str(loc.get('siteId', '')).strip(): loc for loc in all_locations}

print(f'  {"Site ID":<12} {"Name":<40} {"Min year":>10} {"Lat":>12} {"Lon":>13} {"Coord source"}')
print('  ' + '-' * 102)

mw_meta = {}
missing_coords = []

for sid, v in sorted(mw_with_flow.items()):
    if sid in EXCLUDED:
        continue

    name = v['name']
    lat  = v.get('lat')
    lon  = v.get('lon')
    coord_src = 'API' if lat is not None and lon is not None else 'MISSING'

    if lat is None or lon is None:
        missing_coords.append(sid)

    mw_meta[sid] = {'name': name, 'lat': lat, 'lon': lon, 'min_year': v['min_year']}
    lat_s = f'{lat:.6f}' if lat is not None else 'MISSING'
    lon_s = f'{lon:.6f}' if lon is not None else 'MISSING'
    print(f'  {sid:<12} {name:<40} {str(v["min_year"]):>10} {lat_s:>12} {lon_s:>13}  {coord_src}')

print(f'\n{len(mw_meta)} Melbourne Water gauges ready')

if missing_coords:
    raise ValueError(
        f'Coordinates missing for {missing_coords}.\n'
        'The Melbourne Water API did not return lat/lon for these gauges.\n'
        'Check the "Sample location object fields" output in Step 1b to find\n'
        'the correct field names, then update the lat/lon extraction in that cell.'
    )

  Site ID      Name                                       Min year          Lat           Lon Coord source
  ------------------------------------------------------------------------------------------------------
  230100A      Darraweit                                      1996   -37.410313    144.902285  API
  230102A      Bulla North                              (Oct 2022 flood)   -37.631400    144.801000  API
  230106A      Maribyrnong                                    1996   -37.765900    144.895000  API
  230107A      Konagaderra                                    1996   -37.528500    144.856000  API
  230119A      Lancefield                               (Oct 2022 flood)   -37.286090    144.777750  API
  230211A      Clarkefield                                    2008   -37.466200    144.744000  API
  230237A      Keilor North                             (Oct 2022 flood)   -37.677800    144.805000  API

7 Melbourne Water gauges ready


## Step 4 — Catchment Areas from MERIT Hydro (GEE)

For each gauge, **MERIT Hydro** (`MERIT/Hydro/v1_0_1`, `upa` band) provides the
upstream drainage area in km² at **90 m resolution** — far finer than HydroBASINS
Level-12 (~100 km² average sub-basin size).

**Why not HydroBASINS Level-12 UP_AREA?**

Three lower-mainstem gauges (Keilor North 230237A, Keilor 230200, Chifley Drive
230106A) all fall within the same HydroBASINS Level-12 cell (HYBAS_ID 5120612070,
UP_AREA = 1413.6 km²). That assigns every one of them the same area — wrong for
all but the outlet. MERIT Hydro resolves this at 90 m.

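**Sampling approach:** Taking the *maximum* `upa` within a 1 km buffer captures the
channel pixel even when gauge coordinates sit slightly off the MERIT river network.
The trade-off is that a larger neighbouring channel inside the buffer (for example
just below a confluence) would be picked up instead, so results near junctions need
a manual check. For Keilor (230200) the official Victorian Water figure (1305.4 km²,
based on 586 gaugings 1908–2025) is used in preference to any GEE-derived value.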

**Note for small headwater catchments:** For gauges with catchments < ~200 km²
(Mt Macedon, Kerrie, etc.) MERIT sometimes returns low values because the 90 m
river network may not have a well-defined channel pixel within 1 km of the gauge.
The cell below flags any area under 50 km² (`MERIT_LOW_WARNING_KM2`); treat flagged
values with caution and cross-check them against HydroBASINS if they look
unreasonably small.
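
The effect of the buffer can be illustrated without Earth Engine. The sketch below uses a toy upstream-area grid and a square window in place of the circular 1 km buffer; the grid values and the function name are invented for illustration and are not MERIT data.

```python
def max_in_window(grid, row, col, radius):
    """Return the largest value within `radius` cells of (row, col).

    The window is clipped at the grid edges. A square window stands in for
    the circular buffer used in the real query.
    """
    rows = range(max(row - radius, 0), min(row + radius + 1, len(grid)))
    cols = range(max(col - radius, 0), min(col + radius + 1, len(grid[0])))
    return max(grid[r][c] for r in rows for c in cols)


# Toy upstream-area grid (km2): the channel runs down column 2.
upa = [
    [0.1, 0.2, 310.0, 0.2, 0.1],
    [0.1, 0.3, 320.0, 0.3, 0.1],
    [0.1, 0.2, 330.0, 0.2, 0.1],
]

gauge_row, gauge_col = 1, 3  # gauge plotted one cell east of the channel
print("point sample:", upa[gauge_row][gauge_col])
print("window max:  ", max_in_window(upa, gauge_row, gauge_col, radius=1))
```

The point sample lands on a hillslope cell, while the window maximum finds the channel. It also returns the most downstream channel cell in the window (330 rather than 320), so the buffered value tends to sit slightly above the area at the gauge itself.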

In [69]:
import ee
ee.Authenticate()
ee.Initialize(project='floodhubmaribyrnong')

In [None]:
# ── Step 4: MERIT Hydro upstream area for all included gauges ─────────────────
# MERIT Hydro (Yamazaki et al. 2019, Water Resources Research)
# GEE collection: MERIT/Hydro/v1_0_1  |  upa band: upstream drainage area in km2
#
# Keilor (230200) uses the official Victorian Water area (1305.4 km2) in
# preference to any GEE-derived value — it is based on 586 gaugings 1908–2025.

KEILOR_OFFICIAL_AREA_KM2 = 1305.4
MERIT_BUFFER_M           = 1000      # 1 km — catches channel pixel even if coords slightly off
MERIT_LOW_WARNING_KM2    = 50        # flag anything below this for manual review

def get_merit_upa(lat, lon, buffer_m=MERIT_BUFFER_M):
    """
    Return the maximum MERIT Hydro upstream drainage area (km2) within
    buffer_m metres of the gauge point.
    """
    merit  = ee.Image('MERIT/Hydro/v1_0_1').select('upa')
    point  = ee.Geometry.Point([lon, lat])
    result = merit.reduceRegion(
        reducer   = ee.Reducer.max(),
        geometry  = point.buffer(buffer_m),
        scale     = 90,
        maxPixels = 1e6,
    )
    return float(result.get('upa').getInfo())

# Build gauge metadata lists from API-derived results
# (vw_meta from Step 2, mw_meta from Step 3). Stations flagged in EXCLUDED by the
# CAMELS AUS v2 overlap check (Step 1c) are dropped; mw_meta already omits them.
vw_included = [sid for sid in vw_meta if sid not in EXCLUDED]
GAUGE_META = (
    [(sid, 'ausvic_' + sid, 'hydstra')
     for sid in vw_included] +
    [(sid, 'ausvic_' + sid.rstrip('ABCDEFGHIJKLMNOPQRSTUVWXYZ'), 'melbwater')
     for sid in mw_meta]
)

print(f'{len(GAUGE_META)} gauges in GAUGE_META ({len(vw_included)} VW + {len(mw_meta)} MW)')
for sid, gid, src in GAUGE_META:
    print(f'  {sid:<12} -> {gid}  ({src})')

ALL_GAUGE_COORDS = (
    [(sid, gid, vw_meta[sid]['lat'], vw_meta[sid]['lon'])
     for sid, gid, src in GAUGE_META if src == 'hydstra' and sid in vw_meta] +
    [(sid, gid, mw_meta[sid]['lat'], mw_meta[sid]['lon'])
     for sid, gid, src in GAUGE_META if src == 'melbwater' and sid in mw_meta]
)

up_areas = {}
warnings = []

print(f'\n  {"Station":<10} {"Gauge ID":<20} {"MERIT upa km2":>14} {"area_km2 used":>14}  note')
print('  ' + '-' * 72)

for sid, gid, lat, lon in ALL_GAUGE_COORDS:
    merit_area = get_merit_upa(lat, lon)

    if sid == '230200':
        area_used = KEILOR_OFFICIAL_AREA_KM2
        note      = f'official VW (MERIT: {merit_area:.1f})'
    else:
        area_used = round(merit_area, 1)
        if merit_area < MERIT_LOW_WARNING_KM2:
            note = '*** LOW — review (headwater or off-network?)'
            warnings.append(f'{gid}: MERIT {merit_area:.1f} km2 — may be unreliable')
        else:
            note = ''

    up_areas[gid] = area_used
    print(f'  {sid:<10} {gid:<20} {merit_area:>14.1f} {area_used:>14.1f}  {note}')

print(f'\nAreas fetched for {len(up_areas)} gauges.')
if warnings:
    print(f'\n{len(warnings)} area(s) flagged for review:')
    for w in warnings:
        print(f'  {w}')

In [None]:
# ── Sanity check: lower-mainstem area ordering ───────────────────────────────
# Keilor North (230237A) is upstream of Keilor (230200) which is upstream of
# Chifley Drive (230106A).  Areas must strictly increase downstream.

kn = up_areas.get('ausvic_230237')
ke = up_areas.get('ausvic_230200')
cd = up_areas.get('ausvic_230106')

print('Lower-mainstem area ordering check')
print(f'  Keilor North  (230237A): {kn} km2')
print(f'  Keilor        (230200) : {ke} km2  [official VW]')
print(f'  Chifley Drive (230106A): {cd} km2')
print()

errors = []
if kn and ke and kn >= ke:
    errors.append(f'Keilor North ({kn}) >= Keilor ({ke}) — should be smaller (upstream)')
if cd and ke and cd <= ke:
    errors.append(f'Chifley Drive ({cd}) <= Keilor ({ke}) — should be larger (downstream)')

if errors:
    print('ORDERING CHECK FAILED:')
    for e in errors:
        print(f'  ERROR: {e}')
    print()
    print('Check MERIT Hydro results above and verify gauge coordinates.')
else:
    print(f'  {kn} < {ke} < {cd}  [correct order]')
    print('  Ordering check passed.')

### Area Correction Summary — Lower-Mainstem Gauges

Before this fix all three lower-mainstem gauges shared HydroBASINS Level-12 cell
HYBAS_ID 5120612070 (UP_AREA = 1413.6 km²). MERIT Hydro resolves them individually
at 90 m resolution.

Streamflow conversion is `ML/day ÷ area_km²`, so a smaller area → larger mm/day.
Keilor North (230237A) was being **underestimated by ~10%** before the correction.
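
The unit conversion works because 1 ML is 1,000 m³ and 1 km² is 1,000,000 m², so 1 ML spread over 1 km² is a depth of 1 mm. A worked sketch follows using the two areas quoted in this notebook for Keilor (230200): the shared HydroBASINS value and the official Victorian Water figure. The flow of 10,000 ML/day is a hypothetical round number, and the function name is illustrative.

```python
def ml_per_day_to_mm_per_day(flow_ml_day: float, area_km2: float) -> float:
    """Convert discharge in ML/day to runoff depth in mm/day.

    1 ML = 1,000 m3 and 1 km2 = 1,000,000 m2, so ML/day divided by km2
    gives mm/day directly.
    """
    return flow_ml_day / area_km2


flow = 10_000.0      # hypothetical discharge, ML/day
old_area = 1413.6    # shared HydroBASINS Level-12 UP_AREA, km2
new_area = 1305.4    # official Victorian Water area for Keilor, km2

old_depth = ml_per_day_to_mm_per_day(flow, old_area)
new_depth = ml_per_day_to_mm_per_day(flow, new_area)
ratio = new_depth / old_depth  # equals old_area / new_area

print(f"old: {old_depth:.2f} mm/day, new: {new_depth:.2f} mm/day")
print(f"ratio: {ratio:.3f} (old value low by {(ratio - 1) * 100:.1f}%)")
```

The ratio depends only on the two areas, not on the flow, so the same percentage applies to every day in the record.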

In [None]:
# ── Area correction summary: HydroBASINS (before) vs MERIT Hydro (after) ──────
# Before this fix all three gauges used the same HydroBASINS Level-12 UP_AREA.
# After: MERIT Hydro (90 m) for 230237A and 230106A; official VW kept for 230200.

HYDROBASINS_OLD = {
    'ausvic_230237': 1413.6,   # old shared HydroBASINS cell value
    'ausvic_230200': 1413.6,   # old shared HydroBASINS cell value
    'ausvic_230106': 1413.6,   # old shared HydroBASINS cell value
}

AFFECTED = [
    ('ausvic_230237', 'Maribyrnong River at Keilor North', 'MERIT Hydro 90 m'),
    ('ausvic_230200', 'Maribyrnong River at Keilor',       'Official VW 1305.4 km²'),
    ('ausvic_230106', 'Maribyrnong River at Chifley Drive','MERIT Hydro 90 m'),
]

print('Area correction — lower-mainstem gauges')
print(f'  {"gauge_id":<20} {"name":<42} {"old km²":>9} {"new km²":>9} {"Δ%":>7}  source')
print('  ' + '-' * 100)
for gid, name, source in AFFECTED:
    old = HYDROBASINS_OLD[gid]
    new = up_areas[gid]
    pct = (new - old) / old * 100
    flag = '  ← CORRECTED' if abs(pct) > 1 else '  (unchanged)'
    print(f'  {gid:<20} {name:<42} {old:>9.1f} {new:>9.1f} {pct:>+7.1f}%  {source}{flag}')

print()
print('Impact on streamflow (mm/day = ML/day ÷ area_km²):')
for gid, name, _ in AFFECTED:
    old = HYDROBASINS_OLD[gid]
    new = up_areas[gid]
    if abs(old - new) > 0.05:
        ratio = old / new   # old_area/new_area == new_mmday/old_mmday
        print(f'  {gid}: values now {ratio:.3f}× higher — was underestimated by {(ratio-1)*100:.1f}%')
    else:
        print(f'  {gid}: no change')


## Step 5 — Compile Final GAUGES List

Combines all sources into the final `GAUGES` configuration.
The output matches `gauges_config.py` in the project repository.
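
How one record is assembled from the three sources can be sketched as below. The Keilor values are the ones shown earlier in this notebook; `compile_gauge` is a hypothetical helper, and unlike the notebook cell it indexes the dictionaries directly so that a missing source raises `KeyError` instead of producing a `None` field.

```python
def compile_gauge(station_id, gauge_id, meta, areas, name_overrides):
    """Merge API metadata, catchment area and a name override into one record."""
    record = meta[station_id]  # KeyError here means the metadata step was skipped
    return {
        "gauge_id": gauge_id,
        "name": name_overrides.get(station_id, record["name"]),
        "lat": record["lat"],
        "lon": record["lon"],
        "area_km2": areas[gauge_id],
    }


meta = {"230200": {"name": "MARIBYRNONG RIVER @ KEILOR",
                   "lat": -37.727706, "lon": 144.836476}}
areas = {"ausvic_230200": 1305.4}
overrides = {"230200": "Maribyrnong River at Keilor"}

record = compile_gauge("230200", "ausvic_230200", meta, areas, overrides)
print(record)
```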

In [71]:
# ── Compile GAUGES from all derived sources ────────────────────────────────────
# GAUGE_META is defined in Step 4. Name/lat/lon come from vw_meta (Step 2) and
# mw_meta (Step 3). area_km2 comes from up_areas (Step 4).

# Canonical name overrides.
# MW API returns short place names (not full gauge names) — all 7 MW stations need overrides.
# VW (Hydstra) API returns UPPERCASE names with "@" — override to proper-case "at" format.
NAME_OVERRIDES = {
    # Melbourne Water API overrides (place names → full gauge names)
    '230100A': 'Deep Creek at Darraweit Guim',       # API: "Darraweit"
    '230102A': 'Deep Creek at Bulla',                # API: "Bulla North"
    '230106A': 'Maribyrnong River at Chifley Drive', # API: "Maribyrnong"
    '230107A': 'Konagaderra Creek at Konagaderra',   # API: "Konagaderra"
    '230119A': 'Maribyrnong River at Lancefield',    # API: "Lancefield"
    '230211A': 'Bolinda Creek at Clarkefield',       # API: "Clarkefield"
    '230237A': 'Maribyrnong River at Keilor North',  # API: "Keilor North"
    # Victorian Water / Hydstra API overrides (uppercase @ → proper case at)
    '230200':  'Maribyrnong River at Keilor',        # API: "MARIBYRNONG RIVER @ KEILOR"
    '230206':  'Jackson Creek at Gisborne',          # API: "JACKSON CREEK @ GISBORNE"
    '230202':  'Jackson Creek at Sunbury',           # API: "JACKSON CREEK @ SUNBURY"
    '230213':  'Turritable Creek at Mount Macedon',  # API: "TURRITABLE CREEK @ MOUNT MACEDON"
    '230227':  'Main Creek at Kerrie',               # API: "MAIN CREEK @ KERRIE"
}

GAUGES = []
for sid, gid, source in GAUGE_META:
    meta = vw_meta.get(sid, {}) if source == 'hydstra' else mw_meta.get(sid, {})

    api_name = meta.get('name', f'Station {sid}')
    name     = NAME_OVERRIDES.get(sid, api_name)
    if name != api_name:
        print(f'  Name override for {sid}: {api_name!r} -> {name!r}')

    GAUGES.append({
        'gauge_id':  gid,
        'name':      name,
        'lat':       meta.get('lat'),
        'lon':       meta.get('lon'),
        'area_km2':  up_areas.get(gid),
    })

ORDER = [
    'ausvic_230119',                                        # upper Maribyrnong mainstem
    'ausvic_230100', 'ausvic_230102', 'ausvic_230211', 'ausvic_230107',
    'ausvic_230237', 'ausvic_230106',                       # mid/lower mainstem (MW)
    'ausvic_230200',                                        # Keilor (VW long record)
    'ausvic_230206', 'ausvic_230202', 'ausvic_230213', 'ausvic_230227',  # tributaries (VW)
]
GAUGES.sort(key=lambda g: ORDER.index(g['gauge_id']) if g['gauge_id'] in ORDER else 99)

print(f'{len(GAUGES)} gauges compiled\n')
print(f'  {"gauge_id":<20} {"name":<42} {"lat":>12} {"lon":>13} {"area_km2":>10}')
print('  ' + '-' * 103)
for g in GAUGES:
    print(f"  {g['gauge_id']:<20} {g['name']:<42} {g['lat']:>12.6f} {g['lon']:>13.6f} {g['area_km2']:>10.1f}")

  Name override for 230200: 'MARIBYRNONG RIVER @ KEILOR' -> 'Maribyrnong River at Keilor'
  Name override for 230206: 'JACKSON CREEK @ GISBORNE' -> 'Jackson Creek at Gisborne'
  Name override for 230202: 'JACKSON CREEK @ SUNBURY' -> 'Jackson Creek at Sunbury'
  Name override for 230213: 'TURRITABLE CREEK @ MOUNT MACEDON' -> 'Turritable Creek at Mount Macedon'
  Name override for 230227: 'MAIN CREEK @ KERRIE' -> 'Main Creek at Kerrie'
  Name override for 230100A: 'Darraweit' -> 'Deep Creek at Darraweit Guim'
  Name override for 230102A: 'Bulla North' -> 'Deep Creek at Bulla'
  Name override for 230106A: 'Maribyrnong' -> 'Maribyrnong River at Chifley Drive'
  Name override for 230107A: 'Konagaderra' -> 'Konagaderra Creek at Konagaderra'
  Name override for 230119A: 'Lancefield' -> 'Maribyrnong River at Lancefield'
  Name override for 230211A: 'Clarkefield' -> 'Bolinda Creek at Clarkefield'
  Name override for 230237A: 'Keilor North' -> 'Maribyrnong River at Keilor North'
12 gauges compil

In [72]:
# ── Print as Python dict literal (copy into gauges_config.py if updated) ──────
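print('GAUGES = [')
for g in GAUGES:
    # Attach each comma before padding so the printed literal is valid Python
    # and the columns stay aligned.
    gid, name = repr(g['gauge_id']) + ',', repr(g['name']) + ','
    lat, lon  = f"{g['lat']},", f"{g['lon']},"
    print(f"    {{'gauge_id': {gid:<22} 'name': {name:<46} "
          f"'lat': {lat:<16} 'lon': {lon:<16} 'area_km2': {g['area_km2']}}},")
print(']')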

GAUGES = [
    {'gauge_id': 'ausvic_230119'        'name': 'Maribyrnong River at Lancefield'              'lat': -37.28609024     'lon': 144.7777498      'area_km2': 349.2},
    {'gauge_id': 'ausvic_230100'        'name': 'Deep Creek at Darraweit Guim'                 'lat': -37.41031306     'lon': 144.9022845      'area_km2': 682.7},
    {'gauge_id': 'ausvic_230102'        'name': 'Deep Creek at Bulla'                          'lat': -37.6314         'lon': 144.801          'area_km2': 876.1},
    {'gauge_id': 'ausvic_230211'        'name': 'Bolinda Creek at Clarkefield'                 'lat': -37.4662         'lon': 144.744          'area_km2': 177.9},
    {'gauge_id': 'ausvic_230107'        'name': 'Konagaderra Creek at Konagaderra'             'lat': -37.5285         'lon': 144.856          'area_km2': 682.7},
    {'gauge_id': 'ausvic_230237'        'name': 'Maribyrnong River at Keilor North'            'lat': -37.6778         'lon': 144.805          'area_km2': 1413.6},
    {'gaug

In [73]:
# ── Validation ────────────────────────────────────────────────────────────────
errors = []

for g in GAUGES:
    gid = g['gauge_id']
    if len(gid.split('_')) != 2:
        errors.append(f"{gid}: gauge_id must have exactly 2 parts")
    if not gid.startswith('ausvic_'):
        errors.append(f"{gid}: must start with 'ausvic_'")
    if g['lat'] is None or g['lon'] is None:
        errors.append(f"{gid}: missing lat/lon")
    elif not (-39.2 <= g['lat'] <= -33.9 and 140.9 <= g['lon'] <= 150.0):
        # Approximate bounding box of Victoria; skipped when lat/lon is missing.
        errors.append(f"{gid}: coordinates outside Victoria bounds")
    if g['area_km2'] is None or g['area_km2'] <= 0:
        errors.append(f"{gid}: invalid area_km2")

if errors:
    print('ERRORS:')
    for e in errors:
        print(f'  {e}')
else:
    print(f'All {len(GAUGES)} gauges passed validation.')
    print('  - gauge_id format: OK (ausvic_XXXXXX, 2 parts)')
    print('  - lat/lon present: OK')
    print('  - area_km2 > 0:    OK')
    print('  - coords in VIC:   OK')

All 12 gauges passed validation.
  - gauge_id format: OK (ausvic_XXXXXX, 2 parts)
  - lat/lon present: OK
  - area_km2 > 0:    OK
  - coords in VIC:   OK


In [74]:
# ── Save GAUGES to Google Drive as JSON ──────────────────────────────────────
# Downstream notebooks (0b-fetch_catchments, etc.) load from this file
# instead of hardcoding the gauge list — single source of truth.

GAUGES_JSON = Path('/content/drive/MyDrive/caravan_maribyrnong_gee/gauges_ausvic.json')
GAUGES_JSON.parent.mkdir(parents=True, exist_ok=True)

with open(GAUGES_JSON, 'w') as f:
    json.dump(GAUGES, f, indent=2)

print(f'GAUGES saved: {GAUGES_JSON}')
print(f'  {len(GAUGES)} gauges, fields: {list(GAUGES[0].keys())}')
print()

# Print the actual file content to verify what was saved
with open(GAUGES_JSON) as f:
    print(f.read())

GAUGES saved: /content/drive/MyDrive/caravan_maribyrnong_gee/gauges_ausvic.json
  12 gauges, fields: ['gauge_id', 'name', 'lat', 'lon', 'area_km2']

[
  {
    "gauge_id": "ausvic_230119",
    "name": "Maribyrnong River at Lancefield",
    "lat": -37.28609024,
    "lon": 144.7777498,
    "area_km2": 349.2
  },
  {
    "gauge_id": "ausvic_230100",
    "name": "Deep Creek at Darraweit Guim",
    "lat": -37.41031306,
    "lon": 144.9022845,
    "area_km2": 682.7
  },
  {
    "gauge_id": "ausvic_230102",
    "name": "Deep Creek at Bulla",
    "lat": -37.6314,
    "lon": 144.801,
    "area_km2": 876.1
  },
  {
    "gauge_id": "ausvic_230211",
    "name": "Bolinda Creek at Clarkefield",
    "lat": -37.4662,
    "lon": 144.744,
    "area_km2": 177.9
  },
  {
    "gauge_id": "ausvic_230107",
    "name": "Konagaderra Creek at Konagaderra",
    "lat": -37.5285,
    "lon": 144.856,
    "area_km2": 682.7
  },
  {
    "gauge_id": "ausvic_230237",
    "name": "Maribyrnong River at Keilor North",
    