# **Notebook Title: National Data Buoy Center (NDBC) Realtime Data Download**
## This notebook performs the following task(s):
> - #### Reads in realtime, tabular data for a single NDBC buoy using pandas DataFrames.
> - #### Creates a single pandas DataFrame with all the timeseries data for an individual buoy.
> - #### Converts the pandas DataFrame into an xarray Dataset and exports the Dataset as a NetCDF file.

## Import packages
#### **Links to documentation for packages:**
> - #### [requests](https://requests.readthedocs.io/en/latest/) | [numpy](https://numpy.org/doc/1.21/) | [xarray](https://docs.xarray.dev/en/stable/) | [pandas](https://pandas.pydata.org/pandas-docs/version/1.3.5/)
> - #### Note #1: The documentation linked above may not correspond to the exact package versions used when creating this notebook.
> - #### Note #2: Comments are also included in the code cells. Where a piece of code was copied from elsewhere, a commented link above it points to the source. Some snippets taken from the internet (for example, from StackOverflow) may still lack attribution.

In [None]:
#-------------------------------------------------
#Import packages
import requests
import numpy as np
import pandas as pd
import xarray as xr
#-------------------------------------------------
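
Because Note #1 above warns that the linked documentation may not match the versions used here, it can help to print the installed versions next to the notebook output. The cell below is a minimal sketch that uses only the standard library. The helper name `report_versions` is hypothetical and is not part of the original notebook.

```python
#Record the installed versions of the packages this notebook imports
from importlib.metadata import PackageNotFoundError, version

def report_versions(package_names):
    """Return a dict mapping each package name to its installed version string."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            #Keep going so that one missing package does not hide the others
            versions[name] = 'not installed'
    return versions

package_versions = report_versions(['requests', 'numpy', 'pandas', 'xarray'])
for name, installed in package_versions.items():
    print(f'{name}: {installed}')
```

Running this once at the top of the notebook makes it easier to tell later whether a warning or error comes from a version mismatch.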

## Read realtime NDBC data for a single buoy into a pandas DataFrame

#### **Notes**
> - #### It is pretty neat that pandas allows us to specify a URL with tabular data, which can be read directly into a DataFrame.
> - #### For more information regarding NDBC realtime data see this [link](https://www.ndbc.noaa.gov/station_realtime.php?station=46053).
> - #### Current buoy stations that we have downloaded data for are:
>> - #### East Santa Barbara Buoy (Station 46053; [link to live data from NDBC](https://www.ndbc.noaa.gov/station_page.php?station=46053))
>> - #### West Santa Barbara Buoy (Station 46054; [link to live data from NDBC](https://www.ndbc.noaa.gov/station_page.php?station=46054))
>> - #### Harvest Buoy (Station 46218; [link to live data from NDBC](https://www.ndbc.noaa.gov/station_page.php?station=46218))
>> - #### **Note:** The Harvest buoy is maintained by the Coastal Data Information Program [CDIP](https://cdip.ucsd.edu/). However, you can access a good chunk of its data from NDBC, which is the data repository that was used in this work.
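
The three stations listed above can be kept in one lookup table so that switching buoys only means changing a key. This is a small sketch, not part of the original workflow. The names `SANTA_BARBARA_BUOYS` and `realtime_url` are hypothetical, and the URL pattern is the one used for `buoy_url` in this notebook's code.

```python
#Station IDs and names taken from the list above
SANTA_BARBARA_BUOYS = {
    46053: 'East Santa Barbara',
    46054: 'West Santa Barbara',
    46218: 'Harvest',
}

def realtime_url(buoy_id):
    """Build the realtime2 text-file URL for one station ID."""
    return f'https://www.ndbc.noaa.gov/data/realtime2/{buoy_id}.txt'

#Print one URL per station; nothing is downloaded here
for buoy_id, name in SANTA_BARBARA_BUOYS.items():
    print(f'{name} ({buoy_id}): {realtime_url(buoy_id)}')
```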

In [12]:
#-------------------------------------------------
#Define buoy ID number
#A few ID numbers around the Santa Barbara area are:
#46218: Harvest buoy | 46054: West Santa Barbara buoy | 46053: East Santa Barbara buoy
buoy_id        = 46054

#Define URL to buoy data
buoy_url = f'https://www.ndbc.noaa.gov/data/realtime2/{buoy_id}.txt'

#Define parameters to make reading in buoy data easier

#Column names for text file
df_column_names = ['year', 'month', 'day', 'hour', 'minute', 'wind_dir_deg', 'wind_spd_ms', 'wind_gst_ms', 'wave_height_m', 'dom_wave_period_sec', 'average_wave_period_sec', 'mean_wave_dir_deg', 'pressure_hpa', 'air_temp_c', 'water_temp_c', 'dewpoint_c']

#Date format string
df_date_format  = '%Y %m %d %H %M'

#Which columns should we use to convert into our datetime index
df_date_indexes = [0, 1, 2, 3, 4]

#Which columns do we want to read in from the text file?
df_use_cols = np.arange(0,15+1,1)

#Which rows do we want to skip when reading the file?
skip_rows = 1

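#Define values that should be considered as NaNs
#Note: a list of values applies to every column, so a genuine reading equal to one of these numbers (e.g. a wind direction of 99 degrees) is also set to NaN
nan_values = ['MM', 99.00, 999.0, 999, 9999.0]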
#-------------------------------------------------
#Use a "try-except" workflow to deal with an HTTPError that may occur if realtime data is not available for this buoy
#Thanks ChatGPT
try:
    #Use the requests package to examine the current buoy URL
    response = requests.get(buoy_url)

    #Check for HTTP errors
    response.raise_for_status()  

    #Read the data into a pandas DataFrame
    #You will notice that there are a lot of custom options we specify when reading the buoy data using the "read_csv" function
    #These were chosen based on ease of use for my personal python programming workflow
    #sep=r'\s+' splits on runs of whitespace (it replaces delim_whitespace=True, which is deprecated in recent pandas releases)
    df = pd.read_csv(buoy_url, sep=r'\s+', skiprows=skip_rows, header=0, na_values=nan_values, usecols=df_use_cols,
                     names=df_column_names, parse_dates={'date':df_date_indexes}, date_format=df_date_format).set_index('date').sort_index(ascending=True)
    
#If there is an HTTPError, print it instead of stopping with a traceback.
#Note that "df" is not created in that case, so the cells below will fail until the download succeeds.
except requests.exceptions.HTTPError as errh:
    print(f"HTTP Error: {errh}")
#-------------------------------------------------
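
To see what the read step does without downloading anything, the same kind of parsing can be tried on a few lines of text held in memory. The sketch below is an illustration under stated assumptions: the sample rows are made up, only the first nine columns are included, and the newer row is placed first so that the effect of `sort_index` is visible. It also differs from the cell above in two ways: it skips both header rows with `skiprows=2` and `header=None`, and it builds the datetime index with `pd.to_datetime` rather than `parse_dates`.

```python
import io
import pandas as pd

#Made-up sample: two '#' header rows, then whitespace-separated values
sample_text = """#YY  MM DD hh mm WDIR WSPD GST  WVHT
#yr  mo dy hr mn degT m/s  m/s     m
2024 01 01 00 10  270  5.0  7.0    MM
2024 01 01 00 00  260  4.0  6.0   1.5
"""

sample_names = ['year', 'month', 'day', 'hour', 'minute',
                'wind_dir_deg', 'wind_spd_ms', 'wind_gst_ms', 'wave_height_m']

#Skip both header rows and supply our own column names; 'MM' marks missing data
sample_df = pd.read_csv(io.StringIO(sample_text), sep=r'\s+', skiprows=2,
                        header=None, names=sample_names, na_values=['MM'])

#pandas can assemble datetimes from columns named year/month/day/hour/minute
date_cols = ['year', 'month', 'day', 'hour', 'minute']
sample_df['date'] = pd.to_datetime(sample_df[date_cols])

#Drop the separate date parts, index by time, and put the oldest row first
sample_df = sample_df.drop(columns=date_cols).set_index('date').sort_index()

print(sample_df)
```

Testing column names and missing-value markers on a small sample like this is a quick way to check the options before pointing `read_csv` at the live URL.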

## Convert pandas DataFrame to an xarray Dataset and export the final result as a NetCDF-4 file.

#### **Notes**

> - #### None

In [13]:
#Convert this DataFrame into an xarray Dataset
ds = xr.Dataset.from_dataframe(df)

#Export xarray Dataset to a NetCDF file
ds.to_netcdf(path=f'ndbc_realtime_{buoy_id}.nc', mode='w', format='NETCDF4')
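
The conversion above turns the DataFrame index into a dimension of the Dataset and each column into a data variable. The sketch below shows this on a made-up two-row DataFrame and also attaches a `units` attribute to each variable, which is one possible way to carry the units encoded in the column names into the file metadata. The `demo_*` names and the unit strings are assumptions for illustration, and nothing is written to disk.

```python
import numpy as np
import pandas as pd
import xarray as xr

#Made-up two-row DataFrame standing in for the buoy DataFrame
demo_index = pd.date_range('2024-01-01', periods=2, freq='10min', name='date')
demo_df = pd.DataFrame({'wave_height_m': [1.5, np.nan],
                        'water_temp_c': [14.2, 14.1]}, index=demo_index)

#The 'date' index becomes a dimension; each column becomes a data variable
demo_ds = xr.Dataset.from_dataframe(demo_df)

#Hypothetical unit labels stored as variable attributes
demo_units = {'wave_height_m': 'm', 'water_temp_c': 'degC'}
for var_name, units in demo_units.items():
    demo_ds[var_name].attrs['units'] = units

print(demo_ds)
```

Attributes set this way are stored with the variables when the Dataset is later exported with `to_netcdf`.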