# Datasets

Load the datasets used in the analysis for J. Davis et al. "Ocean surface wave slopes and wind-wave alignment observed in Hurricane Idalia".

All data are available at the Dryad repository for this publication (https://doi.org/10.5061/dryad.zw3r228h7) and at the Dryad repository for Davis et al. (2023) (https://doi.org/10.5061/dryad.g4f4qrfvb). See `input_data/README.md`.

Paths are saved in `config.toml`.

In [1]:
import pickle

import numpy as np
import pandas as pd
import xarray as xr

from configure import get_config
from src import best_track, buoy_accessor, met

# Setup

Load the configuration file, `config.toml`, which contains the data directories.

In [2]:
config = get_config()

Variables are shared across notebooks using the IPython "magic" commands `%store` to save variables and `%store -r` to read them.
The following cell clears all stored variables.

In [3]:
%store -z

## Analysis start and end time

Define and store the time periods used in the analysis. The longer period, `time_slice_full`, trims the datasets to a window that covers the day leading up to the storm and part of the day after it has passed. The shorter period, `time_slice`, is a 15-hour window centered on Idalia's point of closest approach to the buoy array.

In [4]:
start_date = pd.Timestamp('2023-08-29T00:00', tz='utc')
end_date = pd.Timestamp('2023-08-31T00:00', tz='utc')
time_slice_full = slice(start_date, end_date)
time_slice_full_no_tz = slice(start_date.tz_localize(None), end_date.tz_localize(None))

%store time_slice_full
%store time_slice_full_no_tz

Stored 'time_slice_full' (slice)
Stored 'time_slice_full_no_tz' (slice)
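Both a timezone-aware and a timezone-naive version of each slice are stored because some datasets carry a UTC-aware time index while others (for example, times decoded from netCDF files) are timezone-naive. The following is a minimal sketch on a toy hourly series, not one of the analysis datasets; the names `aware`, `naive`, and the toy date range are assumptions made for illustration.

```python
import pandas as pd

# Toy hourly series with a timezone-aware (UTC) index (illustrative only).
index_utc = pd.date_range(
    '2023-08-28T00:00', '2023-09-01T00:00', freq=pd.Timedelta(hours=1), tz='UTC'
)
aware = pd.Series(range(index_utc.size), index=index_utc)

# The same values on a timezone-naive index.
naive = aware.copy()
naive.index = aware.index.tz_localize(None)

start = pd.Timestamp('2023-08-29T00:00', tz='UTC')
end = pd.Timestamp('2023-08-31T00:00', tz='UTC')

# Label-based slicing is inclusive of both endpoints.  Each index is
# sliced with timestamps of matching timezone-awareness.
aware_subset = aware.loc[slice(start, end)]
naive_subset = naive.loc[slice(start.tz_localize(None), end.tz_localize(None))]
```

Both subsets select the same records; only the timezone information attached to the index differs.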


Define the shorter, 15-hour `time_slice` period.

In [5]:
start_date = pd.Timestamp('2023-08-30T00:00', tz='utc')
end_date = pd.Timestamp('2023-08-30T15:00', tz='utc')
time_slice = slice(start_date, end_date)
time_slice_no_tz = slice(start_date.tz_localize(None), end_date.tz_localize(None))

%store time_slice
%store time_slice_no_tz

Stored 'time_slice' (slice)
Stored 'time_slice_no_tz' (slice)


## Drifter dataset

Load the microSWIFT and Spotter drifter datasets and concatenate them into a single DataFrame, `drifter_df`.

In [6]:
def concatenate_drifters(drifter_dict: dict) -> pd.DataFrame:
    """
    Concatenate a dictionary of individual drifter DataFrames into a single,
    multi-index DataFrame.  Drop the observations that do not contain waves
    (remove off-hour pressure and temperature observations).

    Args:
        drifter_dict (dict): individual drifter DataFrames keyed by id.

    Returns:
        DataFrame: concatenated drifters
    """
    drifter_df = (
        pd.concat(drifter_dict, names=['id', 'time'])
        .dropna(subset='energy_density')
    )
    return drifter_df
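
As a rough illustration of what `concatenate_drifters` does, the sketch below applies the same `pd.concat` and `dropna` steps to two hypothetical drifters. The ids, times, and values are invented, and `energy_density` is a scalar here for brevity; in the real data it is assumed to hold a spectrum per record.

```python
import numpy as np
import pandas as pd

times = pd.to_datetime(['2023-08-30T00:00', '2023-08-30T00:30', '2023-08-30T01:00'])

# Two hypothetical drifters; NaN marks a record without wave observations.
toy_drifters = {
    'A': pd.DataFrame(
        {'energy_density': [1.0, np.nan, 2.0],
         'sea_surface_temperature': [29.1, 29.0, 29.2]},
        index=times,
    ),
    'B': pd.DataFrame(
        {'energy_density': [np.nan, 3.0],
         'sea_surface_temperature': [28.8, 28.9]},
        index=times[:2],
    ),
}

# The dictionary keys become the outer 'id' level of the MultiIndex and
# each DataFrame's own index becomes the inner 'time' level.
toy_df = (
    pd.concat(toy_drifters, names=['id', 'time'])
    .dropna(subset=['energy_density'])
)
```

Of the five toy records, the two without wave observations are dropped, leaving three rows indexed by `('id', 'time')`.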

In [7]:
DRIFTER_DATA_PATH = config['dir']['idalia_drifter_data']

with open(DRIFTER_DATA_PATH, 'rb') as handle:
    drifter_data = pickle.load(handle)

# Concatenate the individual drifter DataFrames by type
microswift_df = concatenate_drifters(drifter_data['microswift'])
spotter_df = concatenate_drifters(drifter_data['spotter'])

# Create a drifter type column
microswift_df['drifter_type'] = 'microswift'
spotter_df['drifter_type'] = 'spotter'

# Combine all drifters into a single DataFrame.
drifter_df = (pd.concat([microswift_df, spotter_df])
              .sort_index(level=['id', 'time'], ascending=True)
              .loc[(slice(None), time_slice_full), :])

%store drifter_df

Stored 'drifter_df' (DataFrame)


In [8]:
drifter_df.index.get_level_values('id').unique().size

14

## COAMPS-TC

Load the COAMPS-TC wind fields into a Dataset, `coamps_ds`.

In [9]:
COAMPS_PATH = config['dir']['coamps']
coamps_ds = xr.open_dataset(COAMPS_PATH)
coamps_ds = coamps_ds.rename(
    {'lon': 'longitude',
     'lat': 'latitude',
     'wind_u': 'u',
     'wind_v': 'v'})
coamps_ds = coamps_ds.sel(time=time_slice_full_no_tz)
coamps_ws = np.sqrt(coamps_ds['u'].values**2 + coamps_ds['v'].values**2)
coamps_ds['ws'] = (('time', 'latitude', 'longitude'), coamps_ws)

%store coamps_ds

Stored 'coamps_ds' (Dataset)
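
The wind speed `ws` is the magnitude of the horizontal wind vector, `sqrt(u**2 + v**2)`. The sketch below repeats the calculation on a small toy Dataset (the grid and values are invented) and operates on the DataArrays directly, so xarray carries the dimensions through and the dimension tuple does not have to be written out. This is an alternative formulation for illustration, not the cell's exact code.

```python
import numpy as np
import xarray as xr

dims = ('time', 'latitude', 'longitude')

# Toy 2 x 2 x 2 wind field with u = 3 m/s and v = 4 m/s everywhere.
toy_ds = xr.Dataset(
    {
        'u': (dims, np.full((2, 2, 2), 3.0)),
        'v': (dims, np.full((2, 2, 2), 4.0)),
    },
    coords={
        'time': np.array(['2023-08-30T00:00', '2023-08-30T01:00'],
                         dtype='datetime64[ns]'),
        'latitude': [28.0, 28.5],
        'longitude': [-84.5, -84.0],
    },
)

# Arithmetic on DataArrays preserves the dimensions and coordinates.
toy_ds['ws'] = np.sqrt(toy_ds['u']**2 + toy_ds['v']**2)
```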


## NHC

Load National Hurricane Center shape files (for mapping).

In [10]:
BEST_TRACK_DIRECTORY = config['dir']['best_track']
idalia_pts = best_track.read_shp_file(BEST_TRACK_DIRECTORY + 'AL102023_pts.shp', index_by_datetime=True)
idalia_lin = best_track.read_shp_file(BEST_TRACK_DIRECTORY + 'AL102023_lin.shp')
idalia_radii = best_track.read_shp_file(BEST_TRACK_DIRECTORY + 'AL102023_radii.shp')
idalia_windswath = best_track.read_shp_file(BEST_TRACK_DIRECTORY + 'AL102023_windswath.shp')

idalia_pts = best_track.best_track_pts_to_intensity(idalia_pts)
idalia_nhc_geometry = (idalia_pts, idalia_lin, idalia_windswath)

%store idalia_nhc_geometry

Stored 'idalia_nhc_geometry' (tuple)


## SFMR

Load Stepped Frequency Microwave Radiometer (SFMR) data for COAMPS-TC surface wind validation.  The SFMR is flown by both NOAA and the United States Air Force Reserve (USAFR) Weather Reconnaissance Squadron.

Flight-level meteorological datasets are available at: https://www.aoml.noaa.gov/2023-hurricane-field-program-data/#idalia

In [11]:
#TODO: add SFMR to input data on dryad
#TODO: update readmes

In [12]:
NOAA_SFMR_DIRECTORY = config['dir']['noaa_met_data']
noaa_met_data_vars = [
    'SfmrWS.1', 'SfmrWErr.1', 'SfmrRainRate.1', 'SfmrDV.1', 'LonGPS.1', 'LatGPS.1',
]
#TODO: pitch and roll? Altitude?
noaa_met_rename_dict = {
    'Time': 'datetime',
    'SfmrWS.1': 'sfmr_10m_wind_speed',
    'SfmrWErr.1': 'sfmr_10m_wind_speed_error',
    'SfmrRainRate.1': 'sfmr_rain_rate',
    'SfmrDV.1': 'sfmr_data_validity',
    'LonGPS.1': 'longitude',
    'LatGPS.1': 'latitude',
}

noaa_sfmr_ds = met.read_noaa_met_directory(NOAA_SFMR_DIRECTORY,
                                           data_vars=noaa_met_data_vars)
noaa_sfmr_ds = noaa_sfmr_ds.rename(noaa_met_rename_dict)

%store noaa_sfmr_ds

FileNotFoundError: No files of type "nc" found in "input_data/noaa_sfmr/". Please double check the directory and file_type.

In [26]:
USAFR_SFMR_DIRECTORY = config['dir']['usafr_met_data']
usafr_met_data_vars = [
    # 'SWS', 'WSPD', 'WDIR', 'RR', 'LON', 'LAT',
    'SWS', 'RR', 'LON', 'LAT',
]
#TODO: pitch and roll? Altitude?
usafr_met_rename_dict = {
    'GMT_Time': 'datetime',
    'SWS': 'sfmr_10m_wind_speed',
    # 'WSPD': 'flight_level_wind_speed',
    # 'WDIR': 'flight_level_wind_direction',
    'RR': 'sfmr_rain_rate',
    'LON': 'longitude',
    'LAT': 'latitude',
}

usafr_sfmr_ds = met.read_usafr_met_directory(USAFR_SFMR_DIRECTORY,
                                            data_vars=usafr_met_data_vars,
                                            data_type='xarray')
usafr_sfmr_ds = usafr_sfmr_ds.rename(usafr_met_rename_dict)

%store usafr_sfmr_ds

## IBTrACS

Load the International Best Track Archive for Climate Stewardship (IBTrACS) dataset, which is used for storm positions and meteorological metrics (Knapp et al., 2010; Gahtan et al., 2024).  The dataset is read into Pandas directly from the server.  The dataset is also available at: https://www.ncei.noaa.gov/products/international-best-track-archive.


In [11]:
IBTRACS_PATH = config['dir']['ibtracs']
ibtracs_df = pd.read_csv(IBTRACS_PATH, low_memory=False)
ibtracs_df = (ibtracs_df
    .query('NAME == "IDALIA"')
    .query('SEASON == "2023"')
    .assign(ISO_TIME = lambda df: pd.to_datetime(df['ISO_TIME'], utc=True))
    .set_index('ISO_TIME', drop=True)
    .assign(LAT = lambda df: df['LAT'].astype(np.float64))
    .assign(LON = lambda df: df['LON'].astype(np.float64))
)

%store ibtracs_df

Stored 'ibtracs_df' (DataFrame)
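
The filter chain compares `SEASON` against the string `"2023"` and casts `LAT` and `LON` to floats explicitly. This is consistent with the IBTrACS CSV layout, which is understood to include a units row beneath the header, so those columns are read as text. The sketch below reproduces the chain on a small in-memory CSV; the storm id, positions, and times are invented placeholders, not best-track values.

```python
import io

import numpy as np
import pandas as pd

# Toy CSV with a units row below the header (assumed layout; fake values).
toy_csv = io.StringIO(
    "SID,SEASON,NAME,ISO_TIME,LAT,LON\n"
    ",Year,,,degrees_north,degrees_east\n"
    "TOY0001,2023,IDALIA,2023-08-30 00:00:00,27.0,-84.0\n"
    "TOY0001,2023,IDALIA,2023-08-30 06:00:00,28.0,-84.5\n"
    "TOY0002,2017,IDALIA,2017-09-01 00:00:00,15.0,-40.0\n"
    "TOY0003,2023,OTHER,2023-09-05 00:00:00,20.0,-60.0\n"
)

toy_ibtracs = (
    pd.read_csv(toy_csv, low_memory=False)
    .query('NAME == "IDALIA"')
    .query('SEASON == "2023"')  # SEASON is text because of the units row
    .assign(ISO_TIME=lambda df: pd.to_datetime(df['ISO_TIME'], utc=True))
    .set_index('ISO_TIME', drop=True)
    .assign(LAT=lambda df: df['LAT'].astype(np.float64))
    .assign(LON=lambda df: df['LON'].astype(np.float64))
)
```

Filtering on both the name and the season matters because storm names can be reused across seasons.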


## GEBCO Bathymetry

Load bathymetry data for the region containing the buoys (GEBCO Bathymetric Compilation Group, 2023).  The dataset is available at: https://www.gebco.net/data_and_products/gridded_bathymetry_data/.



In [12]:
GEBCO_PATH = config['dir']['gebco']
bathymetry_ds = xr.load_dataset(GEBCO_PATH)

%store bathymetry_ds

Stored 'bathymetry_ds' (Dataset)


## Ian and Fiona from Davis et al. (2023)

Load data from Davis et al. (2023) "Saturation of Ocean Surface Wave Slopes Observed During Hurricanes".

The datasets can be downloaded at: https://doi.org/10.5061/dryad.g4f4qrfvb

In [13]:
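def rename_davis_data(drifter_df: pd.DataFrame) -> pd.DataFrame:
    """
    Rename the Davis et al. (2023) columns to match the names used in this
    analysis and index the DataFrame by drifter id and time.

    Args:
        drifter_df (DataFrame): Davis et al. (2023) drifter data.

    Returns:
        DataFrame: renamed data with an ('id', 'time') multi-index.
    """
    drifter_df = (drifter_df
        .rename({
            'spotter_id': 'id',
            'mean_square_slope_unadjusted': 'mean_square_slope_observed',
            'COAMPS_10m_wind_speed': 'wind_speed',
            'COAMPS_10m_wind_speed_u': 'wind_speed_u',
            'COAMPS_10m_wind_speed_v': 'wind_speed_v',
        }, axis=1)
        .set_index(['id', 'time'])
    )
    return drifter_df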


In [14]:
IAN_PATH = config['dir']['ian_drifter_data']
FIONA_PATH = config['dir']['fiona_drifter_data']

# Read the data into a pandas.DataFrame and convert the entries in the
# 'time' column to datetimes.
ian_spotter_coamps_df = pd.read_json(IAN_PATH, convert_dates=['time'])
fiona_spotter_coamps_df = pd.read_json(FIONA_PATH, convert_dates=['time'])

# Rename variables for consistency
ian_spotter_coamps_df = rename_davis_data(ian_spotter_coamps_df)
fiona_spotter_coamps_df = rename_davis_data(fiona_spotter_coamps_df)

# Map spectral variables to arrays
spectral_cols = ian_spotter_coamps_df.buoy.spectral_variables
ian_spotter_coamps_df.loc[:, spectral_cols] = ian_spotter_coamps_df[spectral_cols].map(np.array)
spectral_cols = fiona_spotter_coamps_df.buoy.spectral_variables
fiona_spotter_coamps_df.loc[:, spectral_cols] = fiona_spotter_coamps_df[spectral_cols].map(np.array)

%store ian_spotter_coamps_df
%store fiona_spotter_coamps_df

Stored 'ian_spotter_coamps_df' (DataFrame)
Stored 'fiona_spotter_coamps_df' (DataFrame)
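
The rename and array-mapping steps can be sketched on a toy DataFrame. Spectral columns read from JSON arrive as Python lists, which is why they are converted to NumPy arrays. The drifter id and values below are invented, the column list is written out by hand instead of using the project's `buoy` accessor, and `Series.apply` is used in place of `DataFrame.map` so the sketch also runs on older pandas versions.

```python
import numpy as np
import pandas as pd

# Toy records standing in for the JSON contents (fake id and values).
toy_records = pd.DataFrame({
    'spotter_id': ['SPOT-0001', 'SPOT-0001'],
    'time': pd.to_datetime(['2022-09-28T00:00', '2022-09-28T01:00'], utc=True),
    'COAMPS_10m_wind_speed': [20.0, 25.0],
    'energy_density': [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
})

# Rename for consistency and index by drifter id and time.
toy_df = (toy_records
    .rename({'spotter_id': 'id', 'COAMPS_10m_wind_speed': 'wind_speed'}, axis=1)
    .set_index(['id', 'time'])
)

# Convert each list-valued entry to a NumPy array.
toy_spectral_cols = ['energy_density']  # hypothetical stand-in for the accessor
for col in toy_spectral_cols:
    toy_df[col] = toy_df[col].apply(np.array)
```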


## References

Davis, J. R., Thomson, J., Houghton, I. A., Doyle, J. D., Komaromi, W. A., Fairall, C. W., Thompson, E. J., & Moskaitis, J. R. (2023). Saturation of Ocean Surface Wave Slopes Observed During Hurricanes. Geophysical Research Letters, 50(16), e2023GL104139. https://doi.org/10.1029/2023GL104139

Gahtan, J., Knapp, K. R., Schreck, C. J. I., Diamond, H. J., Kossin, J. P., & Kruk, M. C. (2024). International Best Track Archive for Climate Stewardship (IBTrACS) Project (Version 4.01 (Last 3 years)) [Dataset]. NOAA National Centers for Environmental Information. https://doi.org/10.25921/82ty-9e16

GEBCO Bathymetric Compilation Group 2023. (2023). The GEBCO_2023 Grid—A continuous terrain model of the global oceans and land. [Dataset]. NERC EDS British Oceanographic Data Centre NOC. https://doi.org/10.5285/f98b053b-0cbc-6c23-e053-6c86abc0af7b

Knapp, K. R., Kruk, M. C., Levinson, D. H., Diamond, H. J., & Neumann, C. J. (2010). The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying Tropical Cyclone Data. Bulletin of the American Meteorological Society, 91(3), 363–376. https://doi.org/10.1175/2009BAMS2755.1
