# **Notebook Title: National Data Buoy Center (NDBC) Data Download**
## This notebook performs the following task(s):
> - #### Reads in historical, tabular data for a single NDBC buoy using pandas DataFrames.
> - #### Creates a single pandas DataFrame with all the timeseries data for an individual buoy.
> - #### Converts the pandas DataFrame into an xarray Dataset and exports the Dataset as a NetCDF file.

## Import packages
#### **Links to documentation for packages:**
> - #### [requests](https://requests.readthedocs.io/en/latest/) | [numpy](https://numpy.org/doc/1.21/) | [xarray](https://docs.xarray.dev/en/stable/) | [pandas](https://pandas.pydata.org/pandas-docs/version/1.3.5/) | [matplotlib](https://matplotlib.org/3.5.3/index.html)
> - #### Note #1: Package documentation versions linked above may not correspond to the exact package version used for this analysis.
> - #### Note #2: Comments are also included in the code cells. Where code was copied directly or adapted from another source, a commented link above it points to that source. I may have missed some links, so a few snippets may lack attribution.

In [1]:
#-------------------------------------------------
#Import packages
import requests
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
#-------------------------------------------------

## Read historical NDBC data for a single buoy into pandas DataFrames and append each yearly DataFrame into a list

#### **Notes**
> - #### pandas can read tabular data directly from a URL into a DataFrame, so no separate download step is needed.
> - #### For information on the standard meteorological data from NDBC used in this notebook, please see the following [link](https://www.ndbc.noaa.gov/faq/measdes.shtml#stdmet).
> - #### Please read in-line comments for additional details about this code cell.
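
As a small offline illustration of the parsing step, the sketch below reads a few whitespace-delimited rows from an in-memory string instead of a URL. The sample rows are made up for illustration (they are not real NDBC observations), the column subset is hypothetical, and `MM` and `99.00` are assumed here to be the missing-value markers. It is a simplified stand-in for the full `read_csv` call used later, not a replacement for it.

```python
import io
import pandas as pd

# Made-up sample in a whitespace-delimited layout (NOT real NDBC data)
sample_text = """YYYY MM DD hh mm WSPD WVHT
2010 01 01 00 00 5.1 1.20
2010 01 01 01 00 MM 1.35
2010 01 01 02 00 4.8 99.00
"""

# Hypothetical column names for this reduced example
column_names = ["year", "month", "day", "hour", "minute", "wind_spd_ms", "wave_height_m"]

df = pd.read_csv(
    io.StringIO(sample_text),     # any file-like object or URL can be passed here
    sep=r"\s+",                   # split on runs of whitespace
    header=0,                     # first row is a header; it is replaced by column_names
    names=column_names,
    na_values=["MM", "99.00"],    # treat these tokens as missing values
)

# Build a datetime index from the separate date/time columns
date_columns = ["year", "month", "day", "hour", "minute"]
df["date"] = pd.to_datetime(df[date_columns])
df = df.drop(columns=date_columns).set_index("date").sort_index()

print(df)
```

Building the index with `pd.to_datetime` after reading is one possible design choice; the notebook's own cell instead parses the dates inside `read_csv`.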

In [None]:
#-------------------------------------------------
#Define buoy ID number
#A few ID numbers around the Santa Barbara area are:
#46218: Harvest buoy | 46054: West Santa Barbara buoy | 46053: East Santa Barbara buoy
buoy_id        = 46053
buoy_years     = np.arange(1994, 2023+1, 1)
buoy_data_list = []
#-------------------------------------------------
#For every year we have, do the following:
for buoy_year_index, buoy_year in enumerate(buoy_years):
    
    #Define URL to buoy data
    #buoy_url = f'https://www.ndbc.noaa.gov/data/realtime2/{buoy_id}.txt'
    buoy_url = f'https://www.ndbc.noaa.gov/view_text_file.php?filename={buoy_id}h{buoy_year}.txt.gz&dir=data/historical/stdmet/'
    
    #Station data before 1999 was identified to have two digits in the "year" column instead of four digits
    #We add this "if-else" statement to check for this and make a correction to the date format character that we will use when reading the data into a pandas DataFrame
    if buoy_year < 1999:
        data_format_year = '%y'
    else:
        data_format_year = '%Y'
    
    #It appears that the historical buoy data from NDBC added a "minute" column to historical data beginning in 2005
    #Historical data prior to 2005 is only available on an hourly basis, which makes reading the data into pandas slightly tricky.
    #All good though. We just implement an "if-else" statement to take care of this assuming that all historical buoy data from NDBC is like this
    #If the year is prior to 2005, read the data into pandas without specifying a "minute" column, else include a "minute" column
    if buoy_year < 2005:
        df_column_names = ['year', 'month', 'day', 'hour', 'wind_dir_deg', 'wind_spd_ms', 'wind_gst_ms', 'wave_height_m', 'dom_wave_period_sec', 'average_wave_period_sec', 'mean_wave_dir_deg', 'pressure_hpa', 'air_temp_c', 'water_temp_c', 'dewpoint_c']
        df_date_format  = f'{data_format_year} %m %d %H'
        df_date_indexes = [0, 1, 2, 3]
        df_use_cols     = np.arange(0,14+1,1)
    else: 
        df_column_names = ['year', 'month', 'day', 'hour', 'minute', 'wind_dir_deg', 'wind_spd_ms', 'wind_gst_ms', 'wave_height_m', 'dom_wave_period_sec', 'average_wave_period_sec', 'mean_wave_dir_deg', 'pressure_hpa', 'air_temp_c', 'water_temp_c', 'dewpoint_c']
        df_date_format  = f'{data_format_year} %m %d %H %M'
        df_date_indexes = [0, 1, 2, 3, 4]
        df_use_cols     = np.arange(0,15+1,1)
    
    #It appears that a units row was added to the NDBC historical data files starting in 2007
    #Because of this, we add another "if-else" check to set how many rows to skip depending on the year
    if buoy_year < 2007:
        skip_rows = 0
    else:
        skip_rows = 1
    
    #Use a "try-except" workflow to deal with an HTTPError that may occur if data is not available for a specific buoy during a specific year
    #Thanks ChatGPT
    try:
        #Use the requests package to examine the current buoy URL
        response = requests.get(buoy_url)
        
        #Check for HTTP errors
        response.raise_for_status()  

        #Read the data into a pandas DataFrame
        #You will notice that there are a lot of custom options we specify when reading the buoy data using the "read_csv" function
        #These were chosen based on ease of use for my personal python programming workflow
        df = pd.read_csv(buoy_url, delim_whitespace=True, skiprows=skip_rows, header=0, na_values=['MM', 99.00, 999.0, 999, 9999.0], usecols=df_use_cols,
                         names=df_column_names, parse_dates={'date':df_date_indexes}, date_format=df_date_format).set_index('date').sort_index(ascending=True)
        
        #Add DataFrame to list. We will concatenate all DataFrames into a single DataFrame afterward.
        buoy_data_list.append(df)

    #If there is an HTTPError, print it. 
    #This will not stop the code from continuing to run for other years, which is what we want.
    except requests.exceptions.HTTPError as errh:
        print(f"HTTP Error: {errh}")
#-------------------------------------------------

HTTP Error: 404 Client Error: Not Found for url: https://www.ndbc.noaa.gov/view_text_file.php?filename=46053h1997.txt.gz&dir=data/historical/stdmet/
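
The year-dependent checks in the cell above can also be collected into one small function, which makes the thresholds easier to inspect without downloading anything. This is only a sketch: `ndbc_stdmet_layout` is a hypothetical helper name, and the cutoff years (1999, 2005, 2007) are taken from the comments in the cell above rather than from NDBC documentation.

```python
def ndbc_stdmet_layout(buoy_year):
    """Return (date_format, n_date_columns, skip_rows) for a given year.

    Hypothetical helper; thresholds mirror the if-else checks in the cell above.
    """
    # Two-digit years before 1999, four-digit years afterward
    year_code = "%y" if buoy_year < 1999 else "%Y"

    # A "minute" column is present from 2005 onward
    if buoy_year < 2005:
        date_format, n_date_columns = f"{year_code} %m %d %H", 4
    else:
        date_format, n_date_columns = f"{year_code} %m %d %H %M", 5

    # A units row is present from 2007 onward and must be skipped
    skip_rows = 0 if buoy_year < 2007 else 1

    return date_format, n_date_columns, skip_rows


# Print the layout for one year on each side of every threshold
for year in (1998, 1999, 2004, 2005, 2006, 2007):
    print(year, ndbc_stdmet_layout(year))
```

Checking one year on either side of each cutoff is a quick way to confirm that the three conditions combine as intended.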


## Concatenate pandas DataFrames for a single buoy into a single DataFrame, then convert to an xarray Dataset and export the final result as a NetCDF-4 file.

#### **Notes**

> - #### None

In [None]:
#-------------------------------------------------
#Concatenate all DataFrames in our list to form one large DataFrame
concatenate_df = pd.concat(buoy_data_list)

#Convert this DataFrame into an xarray Dataset
ds = xr.Dataset.from_dataframe(concatenate_df)

#Export xarray Dataset to a NetCDF file
ds.to_netcdf(path=f'ndbc_historical_{buoy_years[0]}_to_{buoy_years[-1]}_{buoy_id}', mode='w', format='NETCDF4')
#-------------------------------------------------
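
To see what the concatenate-and-convert step produces without downloading any data, the sketch below applies the same two calls to toy DataFrames. The values and timestamps are made up (they are not real buoy observations), and writing to NetCDF is left out so the example does not depend on a NetCDF backend being installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two toy "yearly" DataFrames with made-up values (NOT real buoy data)
df_2010 = pd.DataFrame(
    {"wave_height_m": [1.0, 1.2]},
    index=pd.DatetimeIndex(["2010-12-31 22:00", "2010-12-31 23:00"], name="date"),
)
df_2011 = pd.DataFrame(
    {"wave_height_m": [1.1, np.nan]},
    index=pd.DatetimeIndex(["2011-01-01 00:00", "2011-01-01 01:00"], name="date"),
)

# Stack the yearly tables into one time series
combined = pd.concat([df_2010, df_2011]).sort_index()

# The DataFrame index becomes the Dataset dimension/coordinate,
# and each DataFrame column becomes a data variable
ds = xr.Dataset.from_dataframe(combined)

print(ds)
```

Because the index is named `date`, the resulting Dataset has a single `date` dimension, and missing values remain `NaN` in the data variables.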