# 02: Find and open a GBR4 NetCDF file


**Goal:**  
Locate a real eReefs GBR4 NetCDF file on `/g/data` and open it with `xarray`.

**Why this matters:**  
Everything we do later (subsetting, daily means, OceanParcels FieldSet) depends on:
- the dataset’s **dimensions** (time, depth, lat, lon)
- the **variable names** (U/V, temperature, etc.)
- coordinate conventions and units

**Expected outcome:**  
- We can open one NetCDF and print its structure (`print(ds)`)
- We can identify the main variables and dimensions
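Once a dataset is open, "identify the main variables and dimensions" can be turned into an explicit check. This is a minimal sketch assuming `ds` is an `xarray.Dataset`. The helper name `report_structure`, the example names (`temp`, `u`, `v`), and the tiny synthetic dataset are all hypothetical illustrations. They are not a list of what the GBR4 files are guaranteed to contain.

```python
import numpy as np
import xarray as xr

def report_structure(ds: xr.Dataset, expected_vars, expected_dims):
    """Report dimension sizes plus any expected variables/dimensions missing from `ds`."""
    return {
        "sizes": dict(ds.sizes),  # dimension name -> length
        "missing_vars": sorted(set(expected_vars) - set(ds.data_vars)),
        "missing_dims": sorted(set(expected_dims) - set(ds.sizes)),
    }

# Usage on a real file (hypothetical expectations):
# print(report_structure(ds, ["temp", "salt", "u", "v"], ["time", "k", "j", "i"]))

# Tiny in-memory stand-in with GBR4-like dimension names (illustration only).
demo = xr.Dataset(
    {"temp": (("time", "k", "j", "i"), np.zeros((2, 3, 4, 5), dtype="float32"))}
)
print(report_structure(demo, ["temp", "u", "v"], ["time", "k", "j", "i"]))
```

Listing what is *missing* matters more than what is present. A file that has `temp` but no velocity fields cannot drive a Parcels FieldSet on its own.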

## 1) Define candidate data locations

We’ll start by checking two candidate project directories:
- `/g/data/fx3`
- `/g/data/ih54`

These are read-only, so we only *inspect* and *read* from them.


In [1]:
import os
from pathlib import Path

roots = [Path("/g/data/fx3"), Path("/g/data/ih54")]
for r in roots:
    print(f"{r} exists? {r.exists()}")


/g/data/fx3 exists? True
/g/data/ih54 exists? True
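`Path.exists()` only tells you that a path is there. Listing a directory also requires read and execute permission, and on shared `/g/data` storage this typically depends on project membership. The following is a minimal sketch using `os.access` to confirm that the roots are listable and, as expected for read-only data, not writable. `check_access` is a hypothetical helper name.

```python
import os
import tempfile
from pathlib import Path

def check_access(path: Path) -> dict:
    """Summarise what the current user can do with `path`."""
    return {
        "exists": path.exists(),
        # Listing a directory needs both read and execute permission.
        "listable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK),
    }

# Usage on the project roots (read-only data should report writable=False):
# for r in [Path("/g/data/fx3"), Path("/g/data/ih54")]:
#     print(r, check_access(r))

with tempfile.TemporaryDirectory() as tmp:
    print(check_access(Path(tmp)))          # a directory we created ourselves
print(check_access(Path("/no/such/path")))  # a missing path reports False everywhere
```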


## 2) Find one NetCDF file (lightweight search)

We’ll search a few directory levels deep for `*.nc` files.
We intentionally limit depth so we don’t accidentally crawl huge directory trees.


In [2]:
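import os
from pathlib import Path

def find_nc_files(root: Path, max_depth: int = 4, limit: int = 30):
    """Find up to `limit` NetCDF files under `root`, at most `max_depth` levels deep.

    A file directly inside `root` is at depth 1. Unlike `root.rglob("*.nc")`, which
    walks the whole tree and only filters afterwards, this prunes the walk so that
    directories too deep to hold a qualifying file are never listed.
    """
    found = []
    root_parts = len(root.parts)
    for dirpath, dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        depth = len(Path(dirpath).parts) - root_parts  # 0 for `root` itself
        for name in filenames:
            if name.endswith(".nc"):
                found.append(Path(dirpath) / name)
                if len(found) >= limit:
                    return found
        # Files one level further down would sit at depth + 2; prune if that exceeds max_depth.
        if depth + 2 > max_depth:
            dirnames[:] = []
    return found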

candidates = []
for root in roots:
    if root.exists():
        files = find_nc_files(root, max_depth=4, limit=30)
        print(f"\n{root} -> found {len(files)} candidate .nc files (showing up to 10):")
        for f in files[:10]:
            print("  ", f)
        candidates.extend(files)

print(f"\nTotal candidates collected: {len(candidates)}")



/g/data/fx3 -> found 30 candidate .nc files (showing up to 10):
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2012-02.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2017-06.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2016-03.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2016-07.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2015-11.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2012-05.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2014-08.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2013-02.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2018-02.nc
   /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2018-05.nc

/g/data/ih54 -> found 30 candidate .nc files (showing up to 10):
   /g/data/ih54/tra198/BRAN2023/MACQ/BRAN2023_uv_2023_11.nc
   /g/dat
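Because the search stops after `limit` files per root, the listing is a sample rather than an inventory. Grouping the collected paths by parent directory gives a quick view of which products they come from, such as the GBR4 BGC directory under `fx3` versus the BRAN2023 files under `ih54`. Below is a minimal sketch with `collections.Counter`. `summarise_by_directory` is a hypothetical helper, and the example paths are made up.

```python
from collections import Counter
from pathlib import Path

def summarise_by_directory(paths):
    """Count files per parent directory, most common first."""
    counts = Counter(Path(p).parent for p in paths)
    return counts.most_common()

# Usage with the list built above:
# for parent, n in summarise_by_directory(candidates):
#     print(f"{n:4d}  {parent}")

example = [
    "/data/run_a/file_2012-01.nc",
    "/data/run_a/file_2012-02.nc",
    "/data/run_b/other_2023_11.nc",
]
print(summarise_by_directory(example))
```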

## 3) Choose a file and open it with xarray

We’ll pick the first candidate file (or you can paste a specific path).
Then we’ll inspect:
- dimensions (time, lat, lon, depth)
- coordinate variables
- data variables (U, V, temperature, etc.)
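`candidates[0]` depends on the order in which the filesystem lists directory entries, which is not guaranteed. Selecting a file by date is more reproducible. The GBR4 names listed above end in `_YYYY-MM.nc` and the BRAN2023 name ends in `_YYYY_MM.nc`. This is a minimal sketch that assumes those naming conventions hold for the files you care about. `pick_month` and `MONTH_RE` are hypothetical names.

```python
import re
from pathlib import Path

# Matches a trailing "_YYYY-MM.nc" or "_YYYY_MM.nc" (assumed naming convention).
MONTH_RE = re.compile(r"_(\d{4})[-_](\d{2})\.nc$")

def pick_month(paths, year: int, month: int):
    """Return, sorted, the paths whose file name ends with the requested year and month."""
    matches = []
    for p in paths:
        m = MONTH_RE.search(Path(p).name)
        if m and int(m.group(1)) == year and int(m.group(2)) == month:
            matches.append(Path(p))
    return sorted(matches)

# Usage: fn = pick_month(candidates, 2012, 2)[0]
example = [
    "/g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2012-02.nc",
    "/g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2017-06.nc",
    "/g/data/ih54/tra198/BRAN2023/MACQ/BRAN2023_uv_2023_11.nc",
]
print(pick_month(example, 2012, 2))
```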


In [3]:
import xarray as xr

if len(candidates) == 0:
    raise RuntimeError("No candidate NetCDF files found yet. We'll use a Terminal search next.")

fn = candidates[0]
print("Opening:", fn)

ds = xr.open_dataset(fn)  # keep simple for first inspection
print(ds)


Opening: /g/data/fx3/gbr4_bgc_GBR4_H2p0_B3p1_Cq3P_Dhnd/gbr4_bgc_all_simple_2012-02.nc




<xarray.Dataset> Size: 77GB
Dimensions:               (j: 180, i: 600, time: 29, k_sed: 12, k: 47)
Coordinates:
  * time                  (time) datetime64[ns] 232B 2012-02-01T02:00:00 ... ...
    zc                    (k) float64 376B ...
    latitude              (j, i) float64 864kB ...
    longitude             (j, i) float64 864kB ...
    zcsed                 (k_sed) float64 96B ...
Dimensions without coordinates: j, i, k_sed, k
Data variables: (12/316)
    botz                  (j, i) float64 864kB ...
    zc_sedim              (time, k_sed, j, i) float64 301MB ...
    eta                   (time, j, i) float32 13MB ...
    salt                  (time, k, j, i) float32 589MB ...
    temp                  (time, k, j, i) float32 589MB ...
    Gravel-mineral        (time, k, j, i) float32 589MB ...
    ...                    ...
    DIP_fluxsedi_inst     (time, j, i) float32 13MB ...
    NH4_sedflux           (time, j, i) float32 13MB ...
    NH4_fluxsedi_inst     (time, j, i) flo

To continue decoding into a timedelta64 dtype, either set `decode_timedelta=True` when opening this dataset, or add the attribute `dtype='timedelta64[ns]'` to this variable on disk.
To opt-in to future behavior, set `decode_timedelta=False`.
  ds = xr.open_dataset(fn)  # keep simple for first inspection
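The warning above concerns xarray's `decode_timedelta` default. Some variables carry a bare time unit such as `seconds` in their `units` attribute without being timestamps. xarray currently decodes these to `timedelta64`, and the warning says this default will change. Passing `decode_timedelta` explicitly to `xr.open_dataset` silences it. `True` keeps the current decoding, and `False` opts in to the future behaviour and leaves such variables numeric.

To see which variables are affected, open with `decode_timedelta=False` so their `units` stay in `.attrs`, then list them. This is a minimal sketch. `timedelta_like_variables` is a hypothetical helper, and the unit set is an assumption covering the usual CF-style names.

```python
import numpy as np
import xarray as xr

# Bare time units that may be decoded as durations (assumed list, CF-style names).
TIME_UNITS = {"days", "hours", "minutes", "seconds",
              "milliseconds", "microseconds", "nanoseconds"}

def timedelta_like_variables(ds: xr.Dataset) -> list:
    """Names of variables whose `units` attribute is a bare time unit (no 'since ...')."""
    return [
        name
        for name, var in ds.variables.items()
        if str(var.attrs.get("units", "")).strip().lower() in TIME_UNITS
    ]

# Usage on the real file (undecoded units remain in .attrs):
# ds = xr.open_dataset(fn, decode_timedelta=False)
# print(timedelta_like_variables(ds))

# In-memory stand-in for illustration (no CF decoding happens here):
demo = xr.Dataset({
    "age": ("x", np.arange(3.0), {"units": "seconds"}),
    "temp": ("x", np.zeros(3), {"units": "degrees C"}),
})
print(timedelta_like_variables(demo))
```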


## 4) Quick inventory of the dataset

This prints the dataset dimensions, coordinate names, and a short list of variables.
These names tell us how to subset and (later) how to build a Parcels FieldSet.
