# Metadata of NDBC Buoys

The most important item is the GPS location of each buoy. Other metadata, such as sensor heights, could be added here later.

The goal for now is to build a DataFrame that uses the buoy ID as the key and holds each buoy's latitude and longitude.

## Metadata in XML
The metadata in XML is available at:
http://www.ndbc.noaa.gov/metadata/stationmetadata.xml
The supporting XML schema can be found at:
http://www.ndbc.noaa.gov/metadata/stationmetadata.xsd

This file contains the historical metadata back to 2000 for all stations on the NDBC
website. Limited metadata is available for non-NDBC stations. The file is generated once
daily at midnight U.S. Central Time (05:00 UTC during daylight saving time or 06:00 UTC
during standard time).

Note: this file is fairly new and there were some inconsistencies in our older metadata, so this
file is not 100% accurate, however, it is the best representation of the station history from our
perspective.

Source: https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf (chapter 6)

## Useful Tutorial:
https://www.studytonight.com/python-howtos/how-to-read-xml-file-in-python#:~:text=To%20read%20an%20XML%20file,XML%20file%20using%20getroot()%20.
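
As a standard-library alternative to the BeautifulSoup approach used in the cells below, the same extraction can be sketched with `xml.etree.ElementTree`. This is a minimal sketch under assumptions: `SAMPLE_XML` and the station IDs `AAAA1` and `BBBB2` are made up for illustration, and only the element and attribute names that this notebook reads (`station`, `history`, `id`, `lat`, `lng`) are taken from the real file. The helper name `mean_positions` is hypothetical, and the real file may need extra handling (for example XML namespaces) that is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the tag and attribute names mirror the notebook.
SAMPLE_XML = """<stations>
  <station id="AAAA1">
    <history lat="45.0" lng="-87.5"/>
    <history lat="45.2" lng="-87.3"/>
  </station>
  <station id="BBBB2">
    <history lat="41.5" lng="-87.0"/>
  </station>
</stations>
"""


def mean_positions(xml_text):
    """Return {station id: (mean lat, mean lon)} over all history entries."""
    root = ET.fromstring(xml_text)
    positions = {}
    # iter() finds <station> elements at any depth below the root
    for station in root.iter("station"):
        histories = station.findall("history")
        if not histories:
            # Skip stations without position history instead of dividing by zero
            continue
        lats = [float(h.get("lat")) for h in histories]
        lons = [float(h.get("lng")) for h in histories]
        positions[station.get("id")] = (
            sum(lats) / len(lats),
            sum(lons) / len(lons),
        )
    return positions


positions = mean_positions(SAMPLE_XML)
for station_id, (lat, lon) in positions.items():
    print(station_id, round(lat, 3), round(lon, 3))
```

To run this against the downloaded file, `ET.parse(path).getroot()` could replace `ET.fromstring(xml_text)`.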

In [2]:
import pandas as pd
from bs4 import BeautifulSoup

In [3]:
with open("../data/metadata_01_14_2023.xml", 'r') as f:
    data = f.read()

In [4]:
# Parse the XML text with BeautifulSoup (the 'xml' parser requires lxml to be installed)
bs_data = BeautifulSoup(data, 'xml')

In [5]:
# Take the first <stations> element, which contains the individual <station> entries
stations = bs_data.find_all('stations')[0]
# print(stations)

In [6]:
station_list = stations.find_all("station")
# station_list[0]

In [7]:
ids = []
lats = []
lons = []

for station in station_list:
    ids.append(station['id'])

    # Each <history> entry holds one recorded position of the station
    histories = station.find_all("history")

    current_lat = []
    current_lon = []
    for history in histories:
        current_lat.append(float(history['lat']))
        current_lon.append(float(history['lng']))

    # Average over all recorded positions to get a single location per station
    average_lat = sum(current_lat) / len(current_lat)
    average_lon = sum(current_lon) / len(current_lon)

    # Round to a value that matches the ERA5 grid:
    #   Reanalysis: 0.25° x 0.25° (atmosphere),
    #               0.5° x  0.5°  (ocean waves)
    #   Mean, spread and members: 0.5° x 0.5° (atmosphere),
    #                             1°   x 1°   (ocean waves)
    # For now, round to 0.5°; this may need to be adapted.
    average_lat = round(average_lat * 2) / 2
    average_lon = round(average_lon * 2) / 2

    lats.append(average_lat)
    lons.append(average_lon)

metadata = pd.DataFrame({
    "StationID": ids,
    "lat": lats,
    "lon": lons,
})

metadata

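|      | StationID | lat  | lon   |
|------|-----------|------|-------|
| 0    | 0Y2W3     | 45.0 | -87.5 |
| 1    | 18CI3     | 41.5 | -87.0 |
| 2    | 20CM4     | 42.0 | -86.5 |
| 3    | 21346     | 40.5 | 146.0 |
| 4    | 21347     | 39.5 | 146.0 |
| ...  | ...       | ...  | ...   |
| 1406 | YGNN6     | 43.5 | -79.0 |
| 1407 | YKRV2     | 37.0 | -76.5 |
| 1408 | YKTV2     | 37.0 | -76.5 |
| 1409 | YRSV2     | 37.5 | -76.5 |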
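
Because the rounding step above hard-codes the 0.5° grid, a small helper with the grid resolution as a parameter may make it easier to switch to 0.25° or 1° later. This is a sketch only: the name `snap_to_grid` is hypothetical and not part of the notebook. It uses Python's built-in `round`, as the cell above does, so values that fall exactly halfway between two grid points are rounded to the even multiple rather than always upward.

```python
def snap_to_grid(value, resolution=0.5):
    """Round a coordinate in degrees to the nearest multiple of `resolution`.

    Uses Python's built-in round(), which rounds exact ties to the even
    multiple (for example 0.25 snaps to 0.0 on a 0.5 degree grid).
    """
    if resolution <= 0:
        raise ValueError("resolution must be positive")
    return round(value / resolution) * resolution


# Example coordinates chosen for illustration
print(snap_to_grid(45.1))         # 0.5 degree grid (ocean waves, reanalysis)
print(snap_to_grid(-87.4))        # negative longitudes work the same way
print(snap_to_grid(41.6, 0.25))   # 0.25 degree grid (atmosphere, reanalysis)
print(snap_to_grid(41.6, 1.0))    # 1 degree grid (ocean waves, ensemble)
```

With `resolution=0.5` the helper reproduces `round(x * 2) / 2` from the loop above.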


In [8]:
metadata.to_csv("../data/my_metadata.csv", index=False)
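
To use the buoy ID as a key when the saved file is read back, one option is to load `StationID` as a string and set it as the index. The sketch below is an assumption about later usage, not part of the notebook: it reads from an in-memory string (`csv_text`, containing two rows copied from the output above) instead of `../data/my_metadata.csv`, and the variable name `lookup` is hypothetical. Forcing `dtype=str` keeps purely numeric IDs such as `21346` from being parsed as integers.

```python
import io

import pandas as pd

# Two rows from the output above, in the same layout as the saved CSV
csv_text = "StationID,lat,lon\n0Y2W3,45.0,-87.5\n21346,40.5,146.0\n"

# Read IDs as strings so numeric-looking IDs are not converted to integers
lookup = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"StationID": str},
).set_index("StationID")

# Look up one station's grid position by its ID
lat, lon = lookup.loc["21346", ["lat", "lon"]]
print(lat, lon)
```

For the real file, `io.StringIO(csv_text)` would be replaced by the path passed to `to_csv` above.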