# M2HATS Field Campaign Data Pipeline

### Purpose
Use GDEX to read, standardize, and convert files relevant to the M2HATS field campaign to Zarr for more efficient data processing.

### Data
The two datasets used in this example are ERA5 reanalysis on pressure levels (stored permanently on GDEX) and 30-minute 449 MHz wind profiler data (from EOL's Field Data Archive (FDA); stored on GDEX for this case study).

### Motivation
The previous data comparison process involved numerous manual steps: programmatically downloading ERA5 reanalysis data from the Copernicus Climate Data Store (CDS) to local storage, which took 90+ hours per field campaign; manually downloading wind profiler data from EOL's FDA, then untarring and unzipping the dataset; and finally programmatically aligning both datasets on common variables using EOL's internal server. The new process aims to reduce the number of steps needed before analysis and to use data formats that are more compatible with Python's processing tools.

### Audience
Any researcher or PI who wants to analyze EOL's in-situ data and is looking to modernize their workflow.

---

## Import required packages

In [1]:
# For analysis code
import glob
import numpy as np
import xarray as xr
import pandas as pd
from scipy.interpolate import interp1d
import metpy.calc as mpcalc
from metpy.units import units

# For Dask + cluster
from dask_jobqueue import PBSCluster
from distributed import Client
from dask import delayed

---

## Designate a scratch directory
Define the designated scratch directory to hold Zarr stores created from field campaign data and ERA5 model data.

In [2]:
lustre_scratch  = "/lustre/desc1/scratch/myasears"

---

## Spin up a cluster
Create a cluster and scale it to 5 workers to assist with the processing in this notebook.

In [3]:
cluster = PBSCluster(
        job_name = 'dask-eol-25',
        cores = 1,
        memory = '4GiB',
        processes = 1,
        local_directory = lustre_scratch + '/dask/spill',
        log_directory = lustre_scratch + '/dask/logs/',
        resource_spec = 'select=1:ncpus=1:mem=4GB',
        queue = 'casper',
        walltime = '3:00:00',
        interface = 'ext')

client = Client(cluster)

In [4]:
n_workers = 5
cluster.scale(n_workers)
client.wait_for_workers(n_workers = n_workers)

---

## Load 449 data
This dataset was initially downloaded from EOL's FDA, then placed into a directory on GDEX that was designated for this pilot study. The 449 MHz profiler dataset is stored in daily netCDF files, wherein each data variable depends on time (every 30 minutes) and height (every 100 meters).

We would typically open this group of netCDF files using xarray's `open_mfdataset`, but each day of profiler data has a slightly different maximum height, so the height dimension must be aligned before the datasets can be concatenated. Once the heights are aligned and the variable names and coordinates are standardized, the concatenated dataset is written to a Zarr store for ease of future use.

In [5]:
prof449_path = "/gdex/data/special_projects/pythia_2025/eol-cookbook/m2hats_iss2_data/prof449Mhz_30min_winds"
prof449_files = sorted(glob.glob(f"{prof449_path}/*.nc"))

#### Align 449 heights
1. Find the min and max height values across all 449 datasets.       
2. Establish a common height grid that extends from the min to max height value, with a step of 100m.
3. Open each 449 dataset and reindex its height coordinates to the common height grid.
4. Concatenate all 449 datasets into a single xarray dataset. 

In [6]:
def get_minmax_height(f):
    # Use a context manager so each file handle is closed after reading
    with xr.open_dataset(f) as ds:
        return float(ds['height'].min()), float(ds['height'].max())

In [7]:
min_heights, max_heights = zip(*[get_minmax_height(f) for f in prof449_files])
min_height, max_height = min(min_heights), max(max_heights)

common_agl = np.arange(min_height, max_height + 100, 100)

In [8]:
def open_and_regrid(f, common_agl):
    ds = xr.open_dataset(f, chunks="auto")
    ds = ds.assign_coords(height=ds.height.isel(time=0).values)
    ds = ds.reindex(height=common_agl)
    ds = ds.assign_coords(
            height_agl=("height", common_agl),
            height_msl=("height", common_agl + ds.alt.values))
    ds = ds.swap_dims({"height": "height_msl"}).drop_vars("height")
    ds.height_msl.attrs.update(long_name="Height above mean sea level", units="m")
    ds.height_agl.attrs.update(long_name="Height above ground level", units="m")
    return ds

In [9]:
prof449_datasets = [delayed(open_and_regrid)(f, common_agl) for f in prof449_files[2:]]
prof449_datasets = [d.compute() for d in prof449_datasets]
combined_profiler = xr.concat(prof449_datasets, dim="time", combine_attrs="override")
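
The alignment steps above can be sketched on toy data without any files. This is a minimal standard-library illustration rather than the notebook's xarray code: `common_grid` and `reindex_profile` are hypothetical helper names, and the heights and wind values are made up. It mirrors what `np.arange` and `Dataset.reindex` do here: build one grid spanning every day's heights, then fill the levels a given day did not report with NaN.

```python
import math

def common_grid(height_lists, step=100.0):
    """Return evenly spaced heights spanning the min and max of all inputs."""
    lowest = min(min(h) for h in height_lists)
    highest = max(max(h) for h in height_lists)
    n_levels = int(round((highest - lowest) / step)) + 1
    return [lowest + i * step for i in range(n_levels)]

def reindex_profile(heights, values, grid):
    """Place values on the grid; levels missing from this profile become NaN."""
    lookup = dict(zip(heights, values))
    return [lookup.get(h, math.nan) for h in grid]

# Two made-up daily profiles with different maximum heights
day1_heights, day1_u = [200.0, 300.0, 400.0], [1.0, 2.0, 3.0]
day2_heights, day2_u = [200.0, 300.0, 400.0, 500.0], [4.0, 5.0, 6.0, 7.0]

grid = common_grid([day1_heights, day2_heights])
aligned = [
    reindex_profile(day1_heights, day1_u, grid),
    reindex_profile(day2_heights, day2_u, grid),
]
print(grid)
print(aligned)
```

Because both rows now share one height axis, they can be stacked along time, which is what `xr.concat` does with the reindexed datasets.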

#### Standardize and save the 449 dataset
Align the variable names and defined coordinates to match a standard format.

In [10]:
combined_profiler = (
    combined_profiler
    .assign_coords({
        "latitude": combined_profiler["lat"].isel(time=0).item(),
        "longitude": combined_profiler["lon"].isel(time=0).item(),
        "altitude": combined_profiler["alt"].isel(time=0).item()
    })
    .drop_vars(["lat", "lon", "alt"])
    .rename({"u": "u_wind", "v": "v_wind", "wvert": "w_wind"})
)

#### Save 449 as a zarr store

In [11]:
combined_profiler = combined_profiler.chunk({"time": 48, "height_msl": -1})
combined_profiler.to_zarr(f"{lustre_scratch}/2023_M2HATS/prof449_M2HATS_winds30.zarr")


<xarray.backends.zarr.ZarrStore at 0x14b383698720>
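
As a rough check on the chunking choice, the arithmetic below is a sketch using only the standard library. It assumes the (3696, 97) array shape and float32 dtype that the reopened store reports later in this notebook; `chunk_summary` is a hypothetical helper. With 30-minute data, 48 time steps correspond to one day per chunk, and `-1` in the `chunk` call means the full height dimension (97 levels).

```python
import math

def chunk_summary(shape, chunks, itemsize):
    """Return (number of chunks, bytes per chunk) for a regularly chunked array."""
    n_chunks = math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))
    chunk_bytes = math.prod(chunks) * itemsize
    return n_chunks, chunk_bytes

# Profiler winds: 3696 half-hourly times x 97 heights, float32 (4 bytes each)
n_chunks, chunk_bytes = chunk_summary((3696, 97), (48, 97), itemsize=4)
print(n_chunks, chunk_bytes)  # 77 18624
```

The 18624 bytes per chunk are the 18.19 kiB shown in the store's repr. Applying the same helper to the ERA5 store's (1896, 97) arrays with (1, 97) chunks gives 1896 chunks of 388 B each; many very small chunks tend to add overhead, so rechunking along time before writing may be worth considering.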

---

## Load ERA5 data
The ERA5 reanalysis on pressure levels is stored on GDEX, so there is no need to download files from the Copernicus Climate Data Store (CDS). The data are stored in netCDF files separated by day and variable, wherein each data variable depends on time (hourly) and pressure (a standardized pressure grid). Normally this data could be read quite simply using `intake`, but this case study is unusual: we want atmospheric profiles from a single lat/lon point, whereas the ERA5 data are stored on pressure levels over a latitude/longitude grid spanning the entire globe.

To work with this layout, we lazily load all relevant monthly datasets for a single variable, then subset the xarray Dataset to the lat/lon of the profiler and the times spanning the target field campaign. This process is repeated for each desired variable, and the resulting datasets are merged to produce a single dataset for the field campaign.

After creating the merged dataset, we also interpolate the data variables onto a common MSL height grid for direct comparison with the other ISS instruments.

In [12]:
era5_path = '/gdex/data/d633000/e5.oper.an.pl'

#### Define project parameters
1. Retrieve latitude, longitude, start date, and end date from the 449 information. This can be done programmatically or manually, keeping in mind ERA5's grid spacing of 0.25°.
2. List the variables to access from GDEX and their file prefixes (referenced from the available files in `era5_path`).

In [13]:
prof449_lat = 38.0
prof449_lon = 243.0

start_date = pd.Timestamp("2023-07-11T00:00:00")
end_date = pd.Timestamp("2023-09-27T23:59:59")
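
If the coordinates are derived programmatically, snapping them to the 0.25° grid can be sketched as below. This is an illustration, not the notebook's code: `snap_to_grid` and `to_0_360` are hypothetical helpers, the input coordinates are made up, and the 0 to 360° longitude convention is an assumption inferred from the value 243.0 used above.

```python
def snap_to_grid(value, spacing=0.25):
    """Round a coordinate to the nearest multiple of the grid spacing."""
    return round(value / spacing) * spacing

def to_0_360(lon):
    """Convert a longitude in [-180, 180] to the [0, 360) convention."""
    return lon % 360.0

# Hypothetical site coordinates in degrees north / degrees east
site_lat, site_lon = 38.06, -117.09

grid_lat = snap_to_grid(site_lat)
grid_lon = snap_to_grid(to_0_360(site_lon))
print(grid_lat, grid_lon)
```

Exact grid values matter because the `ds.sel(latitude=..., longitude=...)` call in `retrieve_era5` is made without a `method` argument, so the requested coordinates must exist on the grid.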

In [14]:
era5_vars = {"Z": "e5.oper.an.pl.128_129_z",
             "U": "e5.oper.an.pl.128_131_u",
             "V": "e5.oper.an.pl.128_132_v",
             "W": "e5.oper.an.pl.128_135_w"
             }

#### Retrieve ERA5 files
1. Create a dataset for each variable (Z, U, V, W), for each month in the date range.
2. Merge these datasets into a single xarray dataset.

In [15]:
def retrieve_era5(file_prefix, lat, lon, start, end):
    
    files = []
    yyyymm = pd.date_range(start.normalize().replace(day=1), end, freq="MS").strftime("%Y%m").tolist()
    
    for month in yyyymm:
        files.extend(sorted(glob.glob(f'{era5_path}/{month}/{file_prefix}*')))

    ds = xr.open_mfdataset(files, combine="by_coords", parallel=True)
    ds_point = ds.sel(latitude=lat, longitude=lon, time=slice(start, end))
    
    return ds_point
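
The month-directory logic in `retrieve_era5` can be illustrated without pandas. The sketch below is a standard-library equivalent of the `pd.date_range(..., freq="MS")` line, with `month_keys` as a hypothetical name; it lists the `YYYYMM` subdirectories the function would glob.

```python
from datetime import date

def month_keys(start, end):
    """Return 'YYYYMM' strings for every month touched by [start, end]."""
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}{month:02d}")
        # Advance one month, rolling over into the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return keys

print(month_keys(date(2023, 7, 11), date(2023, 9, 27)))
# ['202307', '202308', '202309']
```

Whole months are opened lazily, and the `time=slice(start, end)` selection then trims them to the campaign dates.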

In [16]:
datasets = [retrieve_era5(prefix, prof449_lat, prof449_lon, start_date, end_date) for prefix in era5_vars.values()]
combined_era5 = xr.merge(datasets)

#### Add variables and attributes
1. Calculate MSL height from geopotential.
2. Calculate wind speed and wind direction from u and v and add attributes.

In [17]:
combined_era5["height_msl"] = (combined_era5["Z"] * 6371008.7714) / (9.80665 * 6371008.7714 - combined_era5["Z"])
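
The expression above converts geopotential `Z` (in m^2 s^-2) to geometric height using `h = Z * Re / (g0 * Re - Z)`, with `g0 = 9.80665` m s^-2 and `Re = 6371008.7714` m as in the code. A small standalone sketch, with `geopotential_to_height` as a hypothetical name, shows the size of the correction relative to simply dividing by `g0`:

```python
G0 = 9.80665       # standard gravity, m s^-2 (value used in the notebook)
RE = 6371008.7714  # Earth radius, m (value used in the notebook)

def geopotential_to_height(z):
    """Convert geopotential (m^2 s^-2) to geometric height (m)."""
    return (z * RE) / (G0 * RE - z)

# A geopotential height of 5000 m corresponds to Z = g0 * 5000
z = G0 * 5000.0
print(round(geopotential_to_height(z), 1))  # about 5003.9 m
```

The difference is only a few meters at 5 km but grows with altitude.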

In [18]:
u = combined_era5["U"].data
v = combined_era5["V"].data

wspd = np.sqrt(u**2 + v**2)
wdir = (np.degrees(np.arctan2(-u, -v)) + 360) % 360

combined_era5["wspd"] = (("time", "level"), wspd)
combined_era5["wdir"] = (("time", "level"), wdir)

combined_era5["wspd"].attrs = {
    "long_name": "Wind Speed",
    "short_name": "wspd",
    "units": "meters/second",
    "source": "ERA5 atmospheric pressure level analysis [netCDF4] u and v wind components",
    "calculation_method": "NumPy 1.26.4 -- sqrt(u_wind**2 + v_wind**2)"
}

combined_era5["wdir"].attrs = {
    "long_name": "Wind Direction (from direction)",
    "short_name": "wdir",
    "units": "degrees (east of north)",
    "source": "ERA5 atmospheric pressure level analysis [netCDF4] u and v wind components",
    "calculation_method": "NumPy 1.26.4 -- (degrees(arctan2(-u_wind, -v_wind)) + 360) % 360"
}
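
The `arctan2(-u, -v)` form gives the meteorological direction the wind blows from, measured clockwise from north. The sketch below checks the four cardinal cases with the standard library; `wind_speed_dir` is a hypothetical helper, not part of the notebook.

```python
import math

def wind_speed_dir(u, v):
    """Return (speed, direction the wind blows FROM, degrees clockwise from north)."""
    speed = math.hypot(u, v)
    direction = (math.degrees(math.atan2(-u, -v)) + 360.0) % 360.0
    return speed, direction

# u > 0 is eastward flow, v > 0 is northward flow
for u, v, label in [(0, -10, "northerly"), (-10, 0, "easterly"),
                    (0, 10, "southerly"), (10, 0, "westerly")]:
    speed, direction = wind_speed_dir(u, v)
    print(f"{label}: {speed:.1f} m/s from {direction:.0f} degrees")
```

For calm winds (u = v = 0) the direction is undefined, so the value returned in that case should not be interpreted.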

In [19]:
combined_era5 = combined_era5.drop_vars("utc_date")

#### Make ERA5 height dependent
1. Interpolate each data variable onto the common MSL grid used by the 449.
2. Structure the new dataset to have the same coordinates and structure as the 449 dataset. 

In [20]:
def make_height_dependent(era5_data, common_msl):
    """
    Interpolate ERA5 pressure-level data to a common height grid.
    Adapted from: [Hamid Ali Syed](https://github.com/syedhamidali) (@syedhamidali)
    Referenced at: https://discourse.pangeo.io/t/how-to-convert-era5-pressure-coordinates-to-altitude/4071
    """

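    # Order the dimensions once, before looping over the variables
    era5_data = era5_data.transpose("time", "level")

    interpolated_vars = {}
    for var in era5_data.data_vars:
        # Height itself and the geopotential it was derived from are not interpolated
        if var in ["height_msl", "Z"]:
            continue
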
        # Interpolate along altitude with apply_ufunc
        interp_data = xr.apply_ufunc(
            lambda x, y: interp1d(y, x, bounds_error=False, fill_value="extrapolate")(
                common_msl
            ),
            era5_data[var],
            era5_data["height_msl"],
            input_core_dims=[["level"], ["level"]],
            output_core_dims=[["height_msl"]],
            dask_gufunc_kwargs={"output_sizes": {"height_msl": len(common_msl)}},
            vectorize=True,
            dask="parallelized",
            output_dtypes=[era5_data[var].dtype],
        )
    
        interp_data.attrs = era5_data[var].attrs
        interpolated_vars[var] = interp_data

    coords = {"time": era5_data.time, "height_msl": common_msl}
    ds_interpolated = xr.Dataset(interpolated_vars, coords=coords).transpose("time", "height_msl")

    return ds_interpolated
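
For a single profile, the lambda passed to `xr.apply_ufunc` performs a 1-D linear interpolation from the ERA5 level heights onto the common grid, extrapolating linearly beyond the lowest and highest levels. A standard-library sketch of that per-profile step follows; `interp_profile` is a hypothetical stand-in for `interp1d(..., fill_value="extrapolate")`, and the numbers are made up. Level heights may arrive in descending order, so the sketch sorts first, as `interp1d` does by default.

```python
from bisect import bisect_right

def interp_profile(heights, values, targets):
    """Linearly interpolate values(heights) onto targets, extrapolating at both ends.

    Requires at least two levels with distinct heights.
    """
    pairs = sorted(zip(heights, values))
    hs = [h for h, _ in pairs]
    vs = [v for _, v in pairs]
    out = []
    for t in targets:
        # Right-hand node of the bracketing segment, clamped so that the
        # first and last segments are extended for out-of-range targets
        i = min(max(bisect_right(hs, t), 1), len(hs) - 1)
        h0, h1 = hs[i - 1], hs[i]
        v0, v1 = vs[i - 1], vs[i]
        out.append(v0 + (v1 - v0) * (t - h0) / (h1 - h0))
    return out

# Made-up level heights (m, unsorted) and a variable that grows with height
level_heights = [3000.0, 1000.0, 2000.0]
level_values = [30.0, 10.0, 20.0]
print(interp_profile(level_heights, level_values, [1500.0, 2500.0, 3500.0, 500.0]))
# [15.0, 25.0, 35.0, 5.0]
```

The last two targets lie outside the level range, so their values are extrapolated and are not constrained by data; in the real workflow such points deserve caution.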

In [21]:
alt_levels = combined_profiler["height_msl"].values
era5_height_levels = make_height_dependent(combined_era5, alt_levels)

#### Standardize the dataset
1. Align the variable names and defined coordinates of the ERA5 dataset to match a standard format.
2. Add height attributes. 

In [23]:
era5_height_levels = (
    era5_height_levels
    .rename({
        "U": "u_wind",
        "V": "v_wind",
        "W": "w_wind"})
    )

In [24]:
era5_height_levels["height_msl"].attrs = {
    "long_name": "Height above mean sea level",
    "short_name": "height_msl",
    "units": "meters"
}

era5_height_levels = era5_height_levels.assign_attrs(combined_era5.attrs)

#### Save ERA5 as a zarr store

In [25]:
era5_height_levels.to_zarr(f"{lustre_scratch}/2023_M2HATS/era5_M2HATS_heights.zarr")



<xarray.backends.zarr.ZarrStore at 0x14b3598f62a0>

## Open the Zarr files
A sanity check to make sure the Zarr files have the information we'd expect, in the correct format. They look good and ready to be used in analysis!

In [26]:
era5_test_zarr = xr.open_zarr(f"{lustre_scratch}/2023_M2HATS/era5_M2HATS_heights.zarr")
era5_test_zarr

Output (summary of the xarray repr): each data variable in the ERA5 store has shape (1896, 97), chunk shape (1, 97), and dtype float32, for 718.41 kiB per variable across 1896 chunks of 388 B each.


In [27]:
prof449Mhz_test_zarr = xr.open_zarr(f"{lustre_scratch}/2023_M2HATS/prof449_M2HATS_winds30.zarr")
prof449Mhz_test_zarr

Output (summary of the xarray repr): a coordinate along the height dimension has shape (97,) and dtype float64 (776 B, one chunk). Variables on (time, height) have shape (3696, 97) with chunk shape (48, 97), giving 77 chunks each; most are float32 (1.37 MiB, 18.19 kiB per chunk) and one is int16 (700.22 kiB, 9.09 kiB per chunk). Time-only variables have shape (3696,) with chunk shape (48,) and are stored as float32 (14.44 kiB) or datetime64[ns] (28.88 kiB).

## Notes on this workflow
The GDEX-assisted ERA5 retrieval process highlighted in this notebook took significantly less time and storage space than the previous workflow. I look forward to using GDEX to retrieve other large-scale models and reanalysis products. Additionally, the Zarr store of ERA5 data subset for the M2HATS field campaign will be immensely useful during the next stages of analysis.

The 449 MHz profiler ingest process was programmatically very similar to my current workflow, but reading netCDF files directly from GDEX is much more approachable than the download/untar/unzip process that is currently required. In addition, analyzing the profiler data from a Zarr store rather than numerous individual (and height-unaligned) netCDF files is expected to significantly improve my workflow. Now I can begin analysis after writing only one or two lines to read in both the ERA5 and 449 profiler datasets.