NOAA publishes the latest observation from each of the stations (buoys) it maintains in a single file at https://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt. The file is updated every 10 minutes. It is a plain text file with a header and one row per station. The header gives the column names (followed by a row of units), and each remaining row holds the data for one station. The columns are separated by spaces.

In [7]:
# Fetch the latest observations from https://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt
# (a whitespace-delimited text file, not an RSS feed) and return them as a pandas dataframe.
import pandas as pd
import requests
import io

def get_latest_data():
    url = "https://www.ndbc.noaa.gov/data/latest_obs/latest_obs.txt"
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail on an HTTP error instead of parsing an error page

    # The table has two header rows: column names, then units. Only the first is used
    # as the header, so the units row ('#text', 'deg', ...) comes back as data row 0.
    df = pd.read_csv(io.StringIO(response.content.decode('utf-8')), sep=r'\s+')
    df.columns = df.columns.str.strip()

    return df
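
The two header rows and the `MM` placeholder that appears in the data can also be handled at parse time. The following is a minimal sketch, not the notebook's own method: `parse_latest_obs` is a hypothetical helper, the sample text is made up for illustration, and treating `MM` as the missing-value marker is an assumption inferred from the output shown in this notebook.

```python
import io
import pandas as pd

def parse_latest_obs(text):
    """Parse latest_obs-style text: column names on line 1, units on line 2."""
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        skiprows=[1],          # drop the units row
        na_values=["MM"],      # assumption: 'MM' marks a missing value
        dtype={"#STN": str},   # station IDs are identifiers, not numbers
    )

# Illustrative sample only, shaped like the first columns of the real file.
sample = (
    "#STN     LAT     LON  YYYY MM DD hh mm WDIR WSPD\n"
    "#text    deg     deg    yr mo day hr mn degT  m/s\n"
    "22101  37.24  126.02  2022 11 06 17 00  230  2.0\n"
    "22102  34.79  125.78  2022 11 06 17 00   MM  4.0\n"
)
obs = parse_latest_obs(sample)
print(obs)
```

With the units row gone, the numeric columns parse as numbers rather than strings, which makes later arithmetic and filtering simpler.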



How do I use RSS to get buoy observations?

Programs called feed 'readers' or 'aggregators' collect RSS XML formatted content and present it in a user-friendly format. Newer versions of web browsers and email programs offer built-in support for RSS feeds. The user 'subscribes' to a feed by entering the link of the RSS feed into their RSS feed reader; the reader then checks the subscribed feeds for content added since its last check and, if there is any, retrieves it and presents it to the user.

To set it up, you'll need the URL to provide to the service when you subscribe. Go to your favorite station page, such as 42007, and look for the RSS icons. There are two pre-formatted RSS links: 1. the single station or 2. a search for stations within 100 nautical miles (our preset radius) of that station.

- Single Station: Look for and click on the RSS icon to the right of the station name/location. If your browser supports RSS, you will get another view of the data. If your browser includes an RSS subscription service, you will probably have a choice to subscribe to the link. Otherwise, copy the URL.
- Stations within 100 nautical miles: Look for the RSS icon to the right of the "Meteorological Observations from Nearby Stations" item. Again, copy the URL.
- Manually construct the URL: You can copy the following URL and change it to use your location (latitude and longitude) and search radius (nautical miles): https://www.ndbc.noaa.gov/rss/ndbc_obs_search.php?lat=40N&lon=73W&radius=100 Warning! Using a large search radius may return a very large dataset - more than 500 KB.

Paste the copied link into your favorite RSS reader or map service! Some services will give you an option to remember or subscribe to the URL.

All stations shown on the NDBC web site can be obtained, except for drifting buoys.

source: https://www.ndbc.noaa.gov/rss_access.shtml
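
The manually constructed search URL described above can be built in code. This is a rough sketch using only the standard library: `build_obs_search_url` and `item_titles` are hypothetical helper names, the sample XML is invented for illustration, and whether the endpoint accepts fractional degrees is an assumption (the documented example uses whole degrees).

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SEARCH_URL = "https://www.ndbc.noaa.gov/rss/ndbc_obs_search.php"

def build_obs_search_url(lat, lon, radius_nm=100):
    """Build the search URL from signed degrees (north and east positive)."""
    lat_part = f"{abs(lat):g}{'N' if lat >= 0 else 'S'}"
    lon_part = f"{abs(lon):g}{'E' if lon >= 0 else 'W'}"
    query = urlencode({"lat": lat_part, "lon": lon_part, "radius": radius_nm})
    return f"{SEARCH_URL}?{query}"

def item_titles(rss_xml):
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]

print(build_obs_search_url(40, -73, 100))

# Invented sample feed, only to show the parsing step.
sample_feed = (
    "<rss version='2.0'><channel><title>demo</title>"
    "<item><title>Station A</title></item>"
    "<item><title>Station B</title></item>"
    "</channel></rss>"
)
print(item_titles(sample_feed))
```

When parsing a real response, passing the raw bytes (for example `requests.get(url).content`) to `item_titles` lets the XML parser honor the encoding declared in the feed.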

In [8]:
latest_data = get_latest_data() # get the latest data

In [9]:
latest_data.head()

Unnamed: 0,#STN,LAT,LON,YYYY,MM,DD,hh,mm,WDIR,WSPD,...,DPD,APD,MWD,PRES,PTDY,ATMP,WTMP,DEWP,VIS,TIDE
0,#text,deg,deg,yr,mo,day,hr,mn,degT,m/s,...,sec,sec,degT,hPa,hPa,degC,degC,degC,nmi,ft
1,22101,37.24,126.02,2022,11,06,17,00,230,2.0,...,4,MM,MM,MM,MM,13.2,15.2,MM,MM,MM
2,22102,34.79,125.78,2022,11,06,17,00,20,4.0,...,6,MM,MM,MM,MM,13.2,15.3,MM,MM,MM
3,22103,34.00,127.50,2022,11,06,17,00,320,5.0,...,4,MM,MM,MM,MM,15.4,16.6,MM,MM,MM
4,22104,34.77,128.90,2022,11,06,17,00,340,6.0,...,4,MM,MM,MM,MM,15.3,21.6,MM,MM,MM


In [37]:
# give the list of columns in the dataframe
column_names = list(latest_data.columns.values)
print(column_names)

['#STN', 'LAT', 'LON', 'YYYY', 'MM', 'DD', 'hh', 'mm', 'WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 'PTDY', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']


In [11]:
# How many stations are there?
# Note: the units row ('#text') is still in the dataframe, so it is included in this count.
station_count = latest_data['#STN'].unique().size
print("There are {} stations/buoys.".format(station_count))

There are 846 stations/buoys.
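
Because the units row is parsed as data, its `#text` entry is counted as a station. A small sketch of a count that leaves it out follows; `count_stations` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd

def count_stations(df, units_marker="#text"):
    """Count distinct station IDs, ignoring the units row if it is present."""
    station_ids = df.loc[df["#STN"] != units_marker, "#STN"]
    return station_ids.nunique()

# Made-up frame: one units row, two stations, one duplicated ID.
demo = pd.DataFrame({"#STN": ["#text", "22101", "22102", "22102"]})
print(count_stations(demo))  # 2
```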


In [30]:
# build 'buoy_list' from the entries in the images/buoys folder (one folder per station).
# os.listdir also returns stray files such as .DS_Store; those simply fail the lookup in the next cell.
import os
import csv
import time
buoy_list = os.listdir('../images/buoys')


In [31]:

# create a dictionary to hold the buoy IDs and their lat/lon
buoy_id_to_lat_lon = {}
# loop through the buoy IDs
for buoy_id in buoy_list:
    # pull the row from the latest data that matches the buoy ID
    buoy_data = latest_data[latest_data['#STN'] == buoy_id]
    try:
        # get the lat/lon from the buoy data row
        buoy_lat = buoy_data['LAT'].values[0]
        buoy_lon = buoy_data['LON'].values[0]
        # add the buoy ID and lat/lon to the dictionary
        buoy_id_to_lat_lon[buoy_id] = [buoy_lat, buoy_lon]
    except IndexError:  # no row matched this ID
        print("No data for buoy {}".format(buoy_id))

# save the dictionary to a csv file, one row per buoy: id, lat, lon
with open('../data/buoy_id_to_lat_lon.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for buoy_id, (lat, lon) in buoy_id_to_lat_lon.items():
        writer.writerow([buoy_id, lat, lon])


No data for buoy .DS_Store
No data for buoy 46085
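
The same lookup can be done without a per-buoy scan of the dataframe. The sketch below is one possible alternative, assuming the `#STN`, `LAT` and `LON` columns shown earlier; `lookup_lat_lon` is a hypothetical helper and the demo data is illustrative.

```python
import pandas as pd

def lookup_lat_lon(latest, buoy_ids):
    """Return ({id: (lat, lon)}, [ids with no matching row])."""
    matched = latest[latest["#STN"].isin(buoy_ids)].drop_duplicates("#STN")
    found = {
        row["#STN"]: (float(row["LAT"]), float(row["LON"]))  # values may be strings
        for _, row in matched.iterrows()
    }
    missing = [b for b in buoy_ids if b not in found]
    return found, missing

# Illustrative frame that still contains the units row, as in this notebook.
demo = pd.DataFrame({
    "#STN": ["#text", "22101", "22102"],
    "LAT": ["deg", "37.24", "34.79"],
    "LON": ["deg", "126.02", "125.78"],
})
found, missing = lookup_lat_lon(demo, ["22101", ".DS_Store", "46085"])
print(found)
print(missing)
```

Returning the missing IDs as a list, rather than printing inside the loop, makes it easier to decide later whether an entry is a stray file or a station that has stopped reporting.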


We want to create our master dataframe with the columns shown here:
- station_id
- date of observation
- time of observation
- latitude
- longitude
- wind direction
- wind speed
- wave height
- wave period
- wave direction
- air temperature
- water temperature
- air pressure
- wind gust
- wave height of highest 1/3
- wave height of highest 1/10
- dominant wave period
- average wave period
- pressure tendency
- air pressure at sea level
- dew point
- visibility
- water level
- water level anomaly
- significant wave height
- average zero-crossing wave period
- direction of wind waves
- direction of swell waves
- swell wave height
- swell wave period
- primary wave direction
- primary wave mean period
- secondary wave direction
- secondary wave mean period
- ice concentration
- ice thickness
- ice type
- ice growth rate
- ice drift speed
- ice drift direction
- photo taken (Y/N)
- photo filename (if photo taken)
- photo relative path (if photo taken)


If any of the columns are not available for a station, we will fill in the value with NaN.
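
One way to get NaN for every unavailable column is to rename the columns that do exist and then reindex against the full master list. The sketch below uses only a few of the columns for brevity; `RENAME`, `MASTER_COLUMNS` and `to_master` are hypothetical names, and the snake_case column names are an assumption about how the master list would be spelled.

```python
import pandas as pd

# Assumed mapping from file columns to master column names (partial, for illustration).
RENAME = {
    "#STN": "station_id",
    "LAT": "latitude",
    "LON": "longitude",
    "WDIR": "wind_direction",
    "WSPD": "wind_speed",
}
# Shortened master list; the last two entries have no source column in the file.
MASTER_COLUMNS = [
    "station_id", "latitude", "longitude",
    "wind_direction", "wind_speed",
    "ice_thickness", "photo_taken",
]

def to_master(df):
    """Rename known columns, then add any missing master columns filled with NaN."""
    return df.rename(columns=RENAME).reindex(columns=MASTER_COLUMNS)

demo = pd.DataFrame({
    "#STN": ["22101"], "LAT": [37.24], "LON": [126.02],
    "WDIR": [230], "WSPD": [2.0],
})
master = to_master(demo)
print(master)
```

`reindex(columns=...)` also drops any column that is not in the master list, so the result always has exactly the master layout.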

In [32]:
# The latest_obs file provides these columns:
# ['#STN', 'LAT', 'LON', 'YYYY', 'MM', 'DD', 'hh', 'mm', 'WDIR', 'WSPD', 'GST', 'WVHT', 'DPD', 'APD', 'MWD', 'PRES', 'PTDY', 'ATMP', 'WTMP', 'DEWP', 'VIS', 'TIDE']
# Use only these for now.

# Map each column name to a human-readable description
column_names_dict = {
    '#STN': 'Station ID',
    'LAT': 'Latitude',
    'LON': 'Longitude',
    'YYYY': 'Year',
    'MM': 'Month',
    'DD': 'Day',
    'hh': 'Hour',
    'mm': 'Minute',
    'WDIR': 'Wind Direction',
    'WSPD': 'Wind Speed',
    'GST': 'Gust',
    'WVHT': 'Wave Height',
    'DPD': 'Dominant Wave Period',
    'APD': 'Average Wave Period',
    'MWD': 'Mean Wave Direction',
    'PRES': 'Pressure',
    'PTDY': 'Pressure Tendency',
    'ATMP': 'Air Temperature',
    'WTMP': 'Water Temperature',
    'DEWP': 'Dew Point',
    'VIS': 'Visibility',
    'TIDE': 'Tide'
}
# Use the dictionary keys as the column names
column_names = list(column_names_dict.keys())
# Create an empty master dataframe with those columns
master_dataframe = pd.DataFrame(columns=column_names)
master_dataframe.head()

Unnamed: 0,station_id,date,time,latitude,longitude,wind_direction,wind_speed,wave_height,wave_period,wave_direction,...,secondary_wave_mean_period,ice_concentration,ice_thickness,ice_type,ice_growth_rate,ice_drift_speed,ice_drift_direction,photo_taken,photo_filename,photo_relative_path
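
The master list keeps a date and time of observation, while the file spreads them over `YYYY`, `MM`, `DD`, `hh` and `mm`. A minimal sketch of combining them into one timestamp follows. It assumes the units row has already been dropped so the five columns are numeric, and it assumes the times are UTC; `observation_time` is a hypothetical helper.

```python
import pandas as pd

def observation_time(df):
    """Combine the YYYY/MM/DD/hh/mm columns into one timezone-aware timestamp."""
    parts = df[["YYYY", "MM", "DD", "hh", "mm"]].rename(columns={
        "YYYY": "year", "MM": "month", "DD": "day",
        "hh": "hour", "mm": "minute",
    })
    # pd.to_datetime assembles a timestamp from columns named year, month, day, ...
    return pd.to_datetime(parts).dt.tz_localize("UTC")  # assumption: times are UTC

demo = pd.DataFrame({
    "YYYY": [2022], "MM": [11], "DD": [6], "hh": [17], "mm": [0],
})
stamps = observation_time(demo)
print(stamps.iloc[0])
```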


In [38]:
last_time_fetched = time.time() # time of the most recent fetch
first_run = True # fetch immediately on the first pass through the loop
while True: # loop until the kernel is interrupted

    # fetch again once ten minutes have passed since the last fetch
    if first_run or time.time() - last_time_fetched >= 600:
        latest_data = get_latest_data() # get the latest data (the file updates every 10 minutes)
        # save this snapshot to a timestamped csv file
        run_date = time.strftime("%Y%m%d_%H%M%S")
        latest_data.to_csv(f'rss_buoy_data_{run_date}.csv', index=False)
        print('Done with this run')
        last_time_fetched = time.time() # record the time of this fetch
        first_run = False

    # check again in a few seconds rather than spinning
    time.sleep(5)


Done with this run
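
A polling loop like the one above is easier to test and to stop if the fetch, save and sleep steps are passed in as functions. The following is a sketch under that assumption; `poll` and its parameters are hypothetical and not part of the notebook.

```python
import time

def poll(fetch, save, interval_s=600, max_runs=None, sleep=time.sleep):
    """Call fetch() and pass its result to save() every interval_s seconds.

    Stops after max_runs cycles (None means run until interrupted).
    A failed cycle is reported and the next one proceeds as usual.
    Returns the number of cycles attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            save(fetch())
        except Exception as exc:  # keep the loop alive across transient errors
            print(f"Fetch failed: {exc}")
        runs += 1
        # sleep between cycles, but not after the final one
        if max_runs is None or runs < max_runs:
            sleep(interval_s)
    return runs

# Dry run with stand-ins for the network call and the sleep.
saved, naps = [], []
completed = poll(lambda: "snapshot", saved.append, interval_s=600,
                 max_runs=3, sleep=naps.append)
print(completed, saved, naps)
```

In the notebook this could be called as `poll(get_latest_data, lambda df: df.to_csv(f'rss_buoy_data_{time.strftime("%Y%m%d_%H%M%S")}.csv', index=False))`, which would keep running until the kernel is interrupted.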
