# Convert data from the M2HATS field campaign to Zarr

This notebook uses GDEX to read, standardize, and convert files from the M2HATS field campaign to Zarr for more efficient data processing. The two datasets used in this example are ERA5 (stored on GDEX) and 30-minute 449 MHz wind profiler data (from the EOL FDA and temporarily stored on GDEX), but the same process could be applied to any ISS instrument in any field campaign.

### Imports

In [1]:
# For analysis code
import glob
import numpy as np
import xarray as xr
import pandas as pd
import metpy.calc as mpcalc

# For Dask + cluster
from dask_jobqueue import PBSCluster
from distributed import Client
from dask import delayed

### Scratch directory
Designated scratch directory to hold the Zarr stores created from the field campaign data and the ERA5 model data, both of which are originally stored as NetCDF files.

In [2]:
lustre_scratch  = "/lustre/desc1/scratch/myasears"

### Spin up a cluster

In [3]:
cluster = PBSCluster(
        job_name = 'dask-eol-25',
        cores = 1,
        memory = '4GiB',
        processes = 1,
        local_directory = lustre_scratch + '/dask/spill',
        log_directory = lustre_scratch + '/dask/logs/',
        resource_spec = 'select=1:ncpus=1:mem=4GB',
        queue = 'casper',
        walltime = '3:00:00',
        interface = 'ext')

Perhaps you already have a cluster running?
Hosting the HTTP server on port 40195 instead


In [4]:
client = Client(cluster)

In [5]:
n_workers = 5
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)

### Load ERA5 data
ERA5 reanalysis data are stored on GDEX, so we can read them directly from there. However, this analysis needs atmospheric profiles from a single lat/lon point, whereas the data are stored on pressure levels over a latitude/longitude grid spanning the entire globe. To work with this information, we lazily load all files from the relevant months for a single variable, then subset the Xarray Dataset to the lat/lon of the profiler and to the times spanning the target field campaign. This process is repeated for each desired variable, and the resulting datasets are merged into a single dataset for the field campaign.

In [None]:
era5_path = '/gdex/data/d633000/e5.oper.an.pl'

In [None]:
target_lat = 38.0
target_lon = 243.0

start_date = pd.Timestamp("2023-07-11T00:00:00")
end_date = pd.Timestamp("2023-09-27T23:59:59")
yyyymm = ["202307", "202308", "202309"]

var_map = {"Z": "e5.oper.an.pl.128_129_z",
           "U": "e5.oper.an.pl.128_131_u",
           "V": "e5.oper.an.pl.128_132_v",
           "W": "e5.oper.an.pl.128_135_w"
           }
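
The `target_lon` value of 243.0 appears to follow a 0 to 360 degrees-east longitude convention (equivalent to 117°W). As a minimal sketch, assuming the ERA5 files index longitude that way, a small helper can convert a signed longitude before it is passed to `.sel()`. The name `to_degrees_east` is hypothetical and not part of the notebook.

```python
def to_degrees_east(lon):
    """Map a longitude in [-180, 180] onto the [0, 360) degrees-east convention."""
    return lon % 360.0

print(to_degrees_east(-117.0))  # 243.0
```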

In [None]:
def open_variable(file_prefix, yyyymm):
    files = []
    for month in yyyymm:
        files.extend(sorted(glob.glob(f'{era5_path}/{month}/{file_prefix}*')))

    ds = xr.open_mfdataset(files, combine="by_coords", parallel=True)
    ds_point = ds.sel(latitude=target_lat, longitude=target_lon, time=slice(start_date, end_date))
    
    return ds_point

In [None]:
# Open and subset all variables
datasets = [open_variable(file_prefix, yyyymm) for file_prefix in var_map.values()]

# Merge them together
combined_era5 = xr.merge(datasets, compat="override", combine_attrs="override")
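
The subset-then-merge pattern above can be tried without access to GDEX. The sketch below builds two small synthetic datasets (a coarse 1-degree grid filled with zeros, purely illustrative), selects one grid point and one day from each, and merges the results. The helper names `fake_variable` and `subset_point` are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for ERA5 pressure-level files: 72 hourly steps on a tiny grid
times = pd.date_range("2023-07-10", periods=72, freq=pd.Timedelta(hours=1))
levels = np.array([500.0, 850.0, 1000.0])
lats = np.arange(40.0, 35.0, -1.0)   # 40, 39, 38, 37, 36
lons = np.arange(240.0, 246.0, 1.0)  # 240 ... 245

def fake_variable(name):
    """Build a zero-filled dataset with ERA5-like dimension names."""
    shape = (times.size, levels.size, lats.size, lons.size)
    return xr.Dataset(
        {name: (("time", "level", "latitude", "longitude"), np.zeros(shape, dtype="float32"))},
        coords={"time": times, "level": levels, "latitude": lats, "longitude": lons},
    )

def subset_point(ds, lat, lon, start, end):
    """Select a single grid point (exact label match) and a time window."""
    return ds.sel(latitude=lat, longitude=lon, time=slice(start, end))

start = pd.Timestamp("2023-07-11T00:00:00")
end = pd.Timestamp("2023-07-11T23:59:59")
points = [subset_point(fake_variable(n), 38.0, 243.0, start, end) for n in ("U", "V")]
merged = xr.merge(points)
print(dict(merged.sizes))
```

Because `.sel()` is called without `method=`, the requested latitude and longitude must exist exactly on the grid; otherwise xarray raises a `KeyError`.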

In [None]:
# Convert geopotential to geometric height (m above MSL)
height = (combined_era5["Z"] * 6371008.7714) / (9.80665 * 6371008.7714 - combined_era5["Z"])
combined_era5["height_msl"] = height
combined_era5.height_msl.attrs.update({"long_name": "Height above mean sea level", "units": "meters"})

# Drop utc_date variable
combined_era5 = combined_era5.drop_vars("utc_date")

# Change variable names to standardize with other datasets
name_mapping = {"level": "pressure", "Z": "geopotential", "U": "u_wind", "V": "v_wind", "W": "w_wind"}
combined_era5 = combined_era5.rename(name_mapping)
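
The height conversion above treats ERA5 `Z` as geopotential in m² s⁻² and applies h = R·Z / (g·R − Z), where R is the Earth radius and g is standard gravity (the two constants used in the cell). A minimal standalone sketch of the same formula with plain floats follows; the function and constant names are hypothetical.

```python
EARTH_RADIUS_M = 6371008.7714  # same value as in the cell above
STANDARD_GRAVITY = 9.80665     # m s^-2

def geopotential_to_height(z):
    """Convert geopotential (m^2 s^-2) to geometric height (m)."""
    return (z * EARTH_RADIUS_M) / (STANDARD_GRAVITY * EARTH_RADIUS_M - z)

# A geopotential height of 1000 m maps to a slightly larger geometric height
z = STANDARD_GRAVITY * 1000.0
print(round(geopotential_to_height(z), 2))  # 1000.16
```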

In [None]:
combined_era5.to_zarr(f"{lustre_scratch}/2023_M2HATS/era5_M2HATS_ISS2.zarr")

### Load 449 MHz wind profiler data

In [26]:
prof449_path = "/gdex/data/special_projects/pythia_2025/eol-cookbook/m2hats_iss2_data/prof449Mhz_30min_winds"

In [27]:
files = []
files.extend(sorted(glob.glob(f'{prof449_path}/*.nc')))

In [28]:
# Get min and max altitude from campaign 

def get_minmax_alt(f):
    with xr.open_dataset(f, decode_cf=False) as tmp:
        return float(tmp['height'].min()), float(tmp['height'].max())

min_heights, max_heights = zip(*[get_minmax_alt(f) for f in files])
min_height, max_height = min(min_heights), max(max_heights)

In [29]:
# Retrieve common height grid (with a step of 100m) using max and min values
step = 100
# Create common height grid
common_agl = np.arange(min_height, max_height + step, step)

# Retrieve the site altitude from the file at index 5 (after setup; checked manually)
altitude = xr.open_dataset(files[5]).alt.values

# Use alt to create common MSL grid
common_msl = common_agl + altitude
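
To make the grid construction concrete, here is a small sketch with made-up numbers: the minimum height, maximum height, and site altitude below are hypothetical placeholders, not values from the campaign files. Adding `step` to the stop value keeps the top level in the grid, because `np.arange` excludes its endpoint.

```python
import numpy as np

# Hypothetical values for illustration only
min_height, max_height = 150.0, 9750.0  # m above ground level
step = 100                              # m
site_altitude = 1200.0                  # m above mean sea level

# Common AGL grid that includes both the lowest and the highest level
common_agl = np.arange(min_height, max_height + step, step)

# Shift by the site altitude to get the matching MSL grid
common_msl = common_agl + site_altitude

print(common_agl[0], common_agl[-1], common_agl.size)
print(common_msl[0], common_msl[-1])
```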

In [30]:
def open_and_regrid(f, common_agl, common_msl):
    ds = xr.open_dataset(f, chunks="auto")

    # Calculate MSL height from AGL
    msl_height = ds['height'].isel(time=0) + altitude

    # Make height coordinate 1-dimensional (same at every time step)
    height_1d = ds['height'].isel(time=0).values
    ds = ds.assign_coords(height=("height", height_1d))

    # Reindex height coords to span min + max from entire campaign
    ds = ds.reindex(height=common_agl)
    
    # Update coords to the reindexed grid
    ds = ds.assign_coords(
        height_agl=("height", common_agl),
        height_msl=("height", common_msl)
    )
    
    ds.height_msl.attrs.update({"long_name": "Height above mean sea level", "units": "meters"})
    
    # Swap dimensions to make height_msl the vertical coordinate
    ds = ds.swap_dims({"height": "height_msl"}).drop_vars("height")

    return ds
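
`reindex` matches height labels exactly, so any level that a file does not report becomes NaN on the common grid. A minimal sketch on a synthetic dataset (the variable name and values are illustrative only):

```python
import numpy as np
import xarray as xr

# One time step with three reported heights
ds = xr.Dataset(
    {"u": (("time", "height"), np.array([[1.0, 2.0, 3.0]]))},
    coords={"time": [0], "height": [200.0, 300.0, 400.0]},
)

# Common grid that extends one level below and one level above the file's range
common_agl = np.arange(100.0, 600.0, 100.0)  # 100, 200, 300, 400, 500

# Levels missing from the file are filled with NaN
regridded = ds.reindex(height=common_agl)
print(regridded["u"].values)
```

If a file's heights are offset from the grid by small amounts, `reindex` also accepts `method="nearest"` together with a `tolerance`; whether that is appropriate for these data is a judgment call.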

In [31]:
datasets = [delayed(open_and_regrid)(f, common_agl, common_msl) for f in files[2:]]
datasets = [d.compute() for d in datasets]
combined_profiler = xr.concat(datasets, dim="time", combine_attrs="override")

In [32]:
combined_profiler = combined_profiler.assign_coords(
    latitude=combined_profiler["lat"].isel(time=0).item(),
    longitude=combined_profiler["lon"].isel(time=0).item(),
    altitude=combined_profiler["alt"].isel(time=0).item()
).drop_vars(["lat", "lon", "alt"])

name_mapping = {
    "u": "u_wind",
    "v": "v_wind",
    "wvert": "w_wind"
}

vars_to_keep = [var for var in name_mapping if var in combined_profiler.data_vars]
combined_profiler = combined_profiler.rename(name_mapping)

In [34]:
combined_profiler = combined_profiler.chunk({"time": 48, "height_msl": -1})
combined_profiler.to_zarr(f"{lustre_scratch}/2023_M2HATS/prof449_M2HATS_ISS1_winds30.zarr")

<xarray.backends.zarr.ZarrStore at 0x148f27e60680>
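
With 30-minute profiles, a time chunk of 48 holds one day of data, and `-1` keeps all height levels in a single chunk. A back-of-the-envelope sketch of what a given chunking implies for chunk count and size is shown below; the helper `chunk_layout` is hypothetical, and the shape is an example of a 3696 × 97 float32 array.

```python
import math

def chunk_layout(shape, chunks, itemsize):
    """Return (number of chunks, bytes per full chunk) for a regular chunk grid."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes

# 3696 half-hourly profiles x 97 levels, float32 (4 bytes), one day per chunk
print(chunk_layout((3696, 97), (48, 97), 4))  # (77, 18624)
```

Here 18624 bytes is about 18.19 kiB per chunk.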

### Open the Zarr files for confirmation

In [36]:
era5_test_zarr = xr.open_zarr(f"{lustre_scratch}/2023_M2HATS/era5_M2HATS_ISS2.zarr")
era5_test_zarr

Output (Dask array summary; the same layout is reported for each of the five arrays in the store):

| | Array | Chunk |
|---|---|---|
| Bytes | 274.03 kiB | 148 B |
| Shape | (1896, 37) | (1, 37) |
| Dask graph | 1896 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |

In [35]:
prof449Mhz_test_zarr = xr.open_zarr(f"{lustre_scratch}/2023_M2HATS/prof449_M2HATS_ISS1_winds30.zarr")
prof449Mhz_test_zarr

Output (Dask array summaries for the 16 arrays in the store, grouped by layout; every array has 2 graph layers):

| Arrays | Shape | Chunk shape | Data type | Bytes | Chunk bytes | Chunks |
|---|---|---|---|---|---|---|
| 1 | (97,) | (97,) | float64 | 776 B | 776 B | 1 |
| 10 | (3696, 97) | (48, 97) | float32 | 1.37 MiB | 18.19 kiB | 77 |
| 1 | (3696, 97) | (48, 97) | int16 | 700.22 kiB | 9.09 kiB | 77 |
| 2 | (3696,) | (48,) | datetime64[ns] | 28.88 kiB | 384 B | 77 |
| 2 | (3696,) | (48,) | float32 | 14.44 kiB | 192 B | 77 |
