# NDBC Station 41047 Spectral Wave Density Data

This notebook explores the differences between spectral wave density data products for [NDBC Station 41047](http://www.ndbc.noaa.gov/station_page.php?station=41047). It compares three sources for the data: the [Planet OS](http://data.planetos.com/datasets/noaa_ndbc_swden_stations), [NDBC realtime](http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w9999.nc.html), and [NDBC 2014 historical](http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w2014.nc.html) products.

Planet OS currently acquires spectral wave density data from each individual station's realtime product, which is denoted by the `w9999` nomenclature immediately before the file extension. We would expect the Planet OS product and the NDBC realtime product to be identical.
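The `w9999`/year naming convention can be captured in a small URL builder. This is a sketch: `ndbc_swden_url` is a hypothetical helper, the base path is copied from the NDBC links used in this notebook, and it assumes archival files follow the same `<station>w<year>.nc` pattern as the 2014 file.

```python
from typing import Optional

# Base path of the NDBC THREDDS OPeNDAP endpoint for spectral wave density (swden) files,
# copied from the URLs used in this notebook.
NDBC_SWDEN_BASE = "http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden"


def ndbc_swden_url(station: str, year: Optional[int] = None) -> str:
    """Build an NDBC swden OPeNDAP URL (hypothetical helper).

    year=None selects the realtime file, which uses the 9999 suffix;
    otherwise the archival file for that year is assumed to be <station>w<year>.nc.
    """
    suffix = 9999 if year is None else year
    return f"{NDBC_SWDEN_BASE}/{station}/{station}w{suffix}.nc"


print(ndbc_swden_url("41047"))        # realtime product
print(ndbc_swden_url("41047", 2014))  # 2014 historical product
```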

Let's begin with the required imports...

In [1]:
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
from package_api import download_data_station
import datetime

Next, we'll set up our data sources: the two NDBC products are read via OPeNDAP using `xarray`, and the Planet OS product is downloaded through the Planet OS package API.

In [2]:
# Read the API key from a local file named APIKEY, or set API_key = '<YOUR API KEY HERE>'
with open('APIKEY') as f:
    API_key = f.readline().strip()
dataset_key = 'noaa_ndbc_swden_stations'
variables = 'spectral_wave_density,mean_wave_dir,principal_wave_dir,wave_spectrum_r1,wave_spectrum_r2'

In [3]:
# OPeNDAP URLs for each NDBC product
ndbc_rt_url = 'http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w9999.nc'
ndbc_2014_url = 'http://dods.ndbc.noaa.gov/thredds/dodsC/data/swden/41047/41047w2014.nc'

# Download the Planet OS package for station 41047 from 2018-01-01 up to now.
# NOTE: this window must contain the hour selected further below (`time`) for the comparison to work.
now = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S')
planetos_filename = download_data_station(dataset_key, API_key, '41047',
                                          '2018-01-01T00:00:00', now, variables, '41047')
print(planetos_filename)

# acquire OPeNDAP datasets
ds_ndbc_rt = xr.open_dataset(ndbc_rt_url)
ds_ndbc_2014 = xr.open_dataset(ndbc_2014_url)
# Assumption: download_data_station returns the path of a NetCDF file that xarray can open
ds_planetos = xr.open_dataset(planetos_filename)

In [None]:
# Let's focus on a specific hour of interest...
time = '2014-08-09 00:00:00'

# Select the specific hour for each dataset
ds_ndbc_rt_hour = ds_ndbc_rt.sel(time=time).isel(latitude=0, longitude=0)
ds_ndbc_2014_hour = ds_ndbc_2014.sel(time=time).isel(latitude=0, longitude=0)
ds_planetos_hour = ds_planetos.sel(time=time).isel(latitude=0, longitude=0)

## Product Inspection: Planet OS / NDBC Realtime / NDBC 2014 Historical
For each of our three data products, we'll create an associated Dataframe for analysis.

In [None]:
# First, the Planet OS data which is acquired from the NDBC realtime station file.
df_planetos = ds_planetos_hour.to_dataframe().drop(['context_time_latitude_longitude_frequency','mx_dataset','mx_creator_institution'], axis=1)
df_planetos.head(8)

In [None]:
# Second, the NDBC realtime station data.
df_ndbc_rt = ds_ndbc_rt_hour.to_dataframe()
df_ndbc_rt.head(8)

In [None]:
# Finally, the 2014 archival data.
df_ndbc_2014 = ds_ndbc_2014_hour.to_dataframe()
df_ndbc_2014.head(8)

Based on the sample outputs above, it appears that **the Planet OS data matches the NDBC realtime file that it is acquired from.** We will further verify this below by performing an equality test against the two Dataframes.

We can also see that **the historical data is indeed different, with frequency bins that are neatly rounded and values for wave direction and wave spectrum even when spectral wave density is 0.**
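Whether a product's frequency bins are "neatly rounded" can also be checked numerically instead of by eye. A minimal sketch: `bins_are_rounded` is a hypothetical helper, and the default of 4 decimal places and the tolerance are illustrative choices, not values taken from NDBC documentation.

```python
import numpy as np


def bins_are_rounded(freq, decimals=4, atol=1e-9):
    """Return True if every frequency matches itself rounded to `decimals` places.

    `decimals` and `atol` are illustrative defaults, not NDBC specifications.
    """
    freq = np.asarray(freq, dtype=float)
    return bool(np.allclose(freq, np.round(freq, decimals), rtol=0.0, atol=atol))


# Illustrative frequency bins in Hz (not read from the station files)
print(bins_are_rounded([0.02, 0.0325, 0.0375]))   # True
print(bins_are_rounded([0.02, 0.03249, 0.0375]))  # False
```

Assuming `frequency` becomes the index of each hourly DataFrame (it is the only dimension left after the time, latitude and longitude selection), a call such as `bins_are_rounded(df_ndbc_2014.index)` would apply the check to the historical product.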

Using the `describe()` method, we can explore the statistical characteristics of each product in more detail below. Note that the `NaN` values present in the Planet OS and NDBC realtime datasets may raise warnings for the percentile calculations, depending on the pandas and NumPy versions in use.
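In recent pandas versions, `describe()` excludes `NaN` values from each column's statistics, so the `count` row shows how many non-missing values each column actually has. A small sketch on an illustrative DataFrame (column names borrowed from the station data, values made up):

```python
import numpy as np
import pandas as pd

# Illustrative frame (not station data): one NaN in the direction column
df = pd.DataFrame({
    "spectral_wave_density": [0.0, 0.5, 1.5],
    "mean_wave_dir": [np.nan, 90.0, 110.0],
})

summary = df.describe()
# NaN is skipped: count and mean only use the non-missing values
print(summary.loc["count", "mean_wave_dir"])  # 2.0
print(summary.loc["mean", "mean_wave_dir"])   # 100.0
# Number of missing values per column
print(df.isna().sum())
```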

In [None]:
df_planetos.describe()

In [None]:
df_ndbc_rt.describe()

In [None]:
df_ndbc_2014.describe()

## Confirm Planet OS Equality to NDBC Realtime

To confirm that the Planet OS and NDBC realtime Dataframes are indeed equal, we'll perform a diff. Note that `NaN != NaN` evaluates as True, so cells that are `NaN` in both Dataframes will be flagged as inconsistent. This could be resolved by first applying `fillna()` with an arbitrary fill value such as `-9999.99` to both Dataframes.
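The `fillna()` workaround can be sketched as follows. `changed_cells` is a hypothetical helper, and the sentinel `-9999.99` is the arbitrary value suggested above; the approach assumes that value never occurs in the real data.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999.99  # arbitrary fill value; assumed never to occur in the real data


def changed_cells(df1, df2, fill=SENTINEL):
    """Boolean mask of differing cells, treating NaN in the same cell of both frames as equal."""
    return df1.fillna(fill) != df2.fillna(fill)


# Illustrative frames (not station data)
left = pd.DataFrame({"spectral_wave_density": [1.0, np.nan], "mean_wave_dir": [np.nan, 2.0]})
right = pd.DataFrame({"spectral_wave_density": [1.0, np.nan], "mean_wave_dir": [3.0, 2.0]})

print(int((left != right).to_numpy().sum()))             # 2: both NaN cells are flagged
print(int(changed_cells(left, right).to_numpy().sum()))  # 1: only NaN vs 3.0 remains
```

Applied to the notebook, `df_diff(df_ndbc_rt.fillna(-9999.99), df_planetos.fillna(-9999.99))` should then report only genuine value differences.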

In [None]:
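# df_diff requires identically labeled DataFrames (same index and columns);
# pandas raises a ValueError when comparing frames with different labels.
def df_diff(df1, df2):
    """List every cell where df1 and df2 differ, with the value from each frame."""
    ne = df1 != df2                      # element-wise inequality mask
    ne_stacked = ne.stack()              # (row label, column) -> bool
    changed = ne_stacked[ne_stacked]     # keep only the differing cells
    difference_locations = np.where(ne)  # positional (row, column) indices of the same cells
    changed_from = df1.values[difference_locations]
    changed_to = df2.values[difference_locations]
    return pd.DataFrame({'df1': changed_from, 'df2': changed_to}, index=changed.index)

# Compare the NDBC realtime data to the Planet OS data.
# NaN != NaN evaluates as True, so cells that are NaN in both frames are reported as differences;
# fillna() with a sentinel value would suppress these, but is not applied here.
df_diff(df_ndbc_rt, df_planetos)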

The `df_diff` results are as expected: the only reported differences are `NaN` cells, which is an artifact of `NaN != NaN` rather than a real discrepancy between the two datasets.
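pandas also offers comparisons that treat `NaN` values in matching positions as equal: `DataFrame.equals` returns a single boolean, and `pd.testing.assert_frame_equal` raises an `AssertionError` describing the first mismatch. Both also compare dtypes (and `assert_frame_equal` compares index and column labels), so they are stricter than a value-only diff. A sketch on illustrative data:

```python
import numpy as np
import pandas as pd

# Illustrative frame with NaNs (not station data) and an identical copy
a = pd.DataFrame({"spectral_wave_density": [0.0, np.nan], "mean_wave_dir": [np.nan, 95.0]})
b = a.copy()

print(a.equals(b))                 # True: NaNs in the same positions are treated as equal
print(bool((a != b).any().any()))  # True: element-wise != still flags every NaN cell
pd.testing.assert_frame_equal(a, b)  # passes silently; raises AssertionError on a real mismatch
```

With the notebook's frames, `df_ndbc_rt.equals(df_planetos)` would be a one-line version of the check above, provided the column sets and dtypes line up.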

## Spectral Wave Density Plot

Let's plot the spectral wave density for all three datasets across the frequency coverage to see how they differ.

In [None]:
plt.figure(figsize=(20,10))
ds_ndbc_rt_hour.spectral_wave_density.plot(label='NDBC Realtime')
ds_ndbc_2014_hour.spectral_wave_density.plot(label='NDBC 2014')
ds_planetos_hour.spectral_wave_density.plot(label='Planet OS')
plt.legend()
plt.show()

There is a very slight discrepancy between the 2014 NDBC product and the Planet OS product, but no difference between the realtime NDBC product and Planet OS product.
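The size of that discrepancy can be quantified rather than judged from the plot. Because the historical product uses different frequency bins, one product has to be interpolated onto the other's frequencies first. A minimal sketch using linear interpolation (`np.interp`); `max_abs_spectral_diff` is a hypothetical helper and linear interpolation is an assumption, not NDBC's method.

```python
import numpy as np


def max_abs_spectral_diff(freq_a, dens_a, freq_b, dens_b):
    """Largest |A - B| after linearly interpolating spectrum B onto A's frequencies.

    Only A's frequencies inside B's frequency range are compared (no extrapolation),
    and NaN differences are ignored by nanmax.
    """
    freq_a = np.asarray(freq_a, dtype=float)
    dens_a = np.asarray(dens_a, dtype=float)
    freq_b = np.asarray(freq_b, dtype=float)
    dens_b = np.asarray(dens_b, dtype=float)

    order = np.argsort(freq_b)  # np.interp needs increasing x values
    fb, db = freq_b[order], dens_b[order]
    inside = (freq_a >= fb[0]) & (freq_a <= fb[-1])
    dens_b_on_a = np.interp(freq_a[inside], fb, db)
    return float(np.nanmax(np.abs(dens_a[inside] - dens_b_on_a)))


# Illustrative spectra (not station data)
freq_a = [0.125, 0.25, 0.375]
freq_b = [0.125, 0.375]
print(max_abs_spectral_diff(freq_a, [1.0, 2.0, 3.0], freq_b, [1.0, 3.0]))  # 0.0
print(max_abs_spectral_diff(freq_a, [1.0, 2.5, 3.0], freq_b, [1.0, 3.0]))  # 0.5
```

Assuming the frequency coordinate is named `frequency`, the realtime and 2014 products could be compared with `max_abs_spectral_diff(ds_ndbc_rt_hour.frequency.values, ds_ndbc_rt_hour.spectral_wave_density.values, ds_ndbc_2014_hour.frequency.values, ds_ndbc_2014_hour.spectral_wave_density.values)`.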

## Wave Spectrum Plots

In [None]:
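# Plot r1 and r2 for each product; the title identifies which product each figure shows
spectrum_vars = ['wave_spectrum_r1', 'wave_spectrum_r2']
df_planetos.loc[:, spectrum_vars].plot(title="Planet OS", figsize=(18, 6))
df_ndbc_rt.loc[:, spectrum_vars].plot(title="NDBC Realtime", figsize=(18, 6))
df_ndbc_2014.loc[:, spectrum_vars].plot(title="NDBC 2014", figsize=(18, 6))
plt.show()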

## Wave Direction Plots

In [None]:
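# Plot principal and mean wave direction for each product; the title identifies the product
direction_vars = ['principal_wave_dir', 'mean_wave_dir']
df_planetos.loc[:, direction_vars].plot(title="Planet OS", figsize=(18, 6))
df_ndbc_rt.loc[:, direction_vars].plot(title="NDBC Realtime", figsize=(18, 6))
df_ndbc_2014.loc[:, direction_vars].plot(title="NDBC 2014", figsize=(18, 6))
plt.show()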
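When comparing the direction variables numerically rather than visually, note that they are angles in degrees: a plain subtraction overstates differences near north (350° and 10° are 20° apart, not 340°). A minimal sketch of a wrapped angular difference; `angular_difference` is a hypothetical helper, not part of the notebook.

```python
import numpy as np


def angular_difference(deg_a, deg_b):
    """Signed difference a - b in degrees, wrapped into the range [-180, 180)."""
    a = np.asarray(deg_a, dtype=float)
    b = np.asarray(deg_b, dtype=float)
    return (a - b + 180.0) % 360.0 - 180.0


print(angular_difference(350.0, 10.0))  # -20.0
print(angular_difference(10.0, 350.0))  # 20.0
```

For two products sampled on the same frequency bins, such as Planet OS and NDBC realtime, `angular_difference(df_ndbc_rt['mean_wave_dir'], df_planetos['mean_wave_dir'])` would give per-bin direction differences.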

## Conclusion

The Planet OS [NDBC spectral wave density data product](http://data.planetos.com/datasets/noaa_ndbc_swden_stations) matches the original NDBC realtime source.

It appears that NDBC performs additional QA/QC processing on the archival data, which differs slightly from the realtime data; however, attempts to locate documentation on how the historical data product is processed have not been successful.

Planet OS does not currently overwrite historical data with the NDBC archival products; however, we may consider doing so if the archival product quality proves superior for end users.