# Convert EmodNet tide gauge data to validator format

Here we convert EmodNet tide gauge data to a "validator" dataset for model validation.

We first have to define some paths.

In [3]:
# set working directory
setwd("/silos/notebook_collection/obsdata_gaugedata/convert_emodnet_validator/")

# set path to EmodNet data
datadir = "/data/sealevel/emodnet_hourly/"

# read list of stations
station_list_file = "/silos/notebook_collection/obsdata_gaugedata/find_stations/gauge_stations.csv"

# set output directory
outputdir = "./data"

Create the output directory if it does not exist yet.

In [2]:
if (!dir.exists(outputdir)) { dir.create(outputdir) }

Read in the station list and check the highest rank that occurs in it.

In [6]:
station_list = read.csv(station_list_file, sep=";", stringsAsFactors = FALSE)
head(station_list)
max(station_list$rank)

name,lat,lon,rank,min_year,max_year,timesteps
Kungsholmsfort,56.1053,15.5894,1,1901,2021,1173509
Ratan,63.9861,20.895,1,1901,2021,1135665
Stockholm,59.3242,18.0817,1,1901,2021,1160374
Smogen,58.3536,11.2178,1,1910,2021,969861
Furuogrund,64.9158,21.2306,1,1916,2021,920860
Klagshamn,55.5222,12.8936,1,1929,2021,779952


The first task is to write the station list. We need to define a color for each rank, given as a "red;green;blue" string. A low rank means a long record, so low ranks get a more reddish color.

In [9]:
color_for_rank = c("0.7;0.0;0.0", "1.0;0.3;0.0", "0.7;0.7;0.0", "0.0;0.7;0.0", "0.0;0.7;0.7", "0.0;0.0;1.0")
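
As a small illustration of how these strings are used, indexing the color vector with a rank picks one "red;green;blue" string per station. The sketch below uses a made-up `toy_ranks` vector (not part of the dataset) and checks that no rank exceeds the number of defined colors, since indexing past the end of the vector would yield `NA`.

```r
color_for_rank = c("0.7;0.0;0.0", "1.0;0.3;0.0", "0.7;0.7;0.0", "0.0;0.7;0.0", "0.0;0.7;0.7", "0.0;0.0;1.0")

# Hypothetical ranks for illustration only; the real values come from station_list$rank
toy_ranks = c(1, 3, 6)

# Guard: every rank must have a color, otherwise the lookup returns NA
stopifnot(all(toy_ranks >= 1), max(toy_ranks) <= length(color_for_rank))

# One color string per station, in the order of the ranks
toy_colors = color_for_rank[toy_ranks]
print(toy_colors)
```

This is why the maximum rank was printed when reading the station list: it should not be larger than the number of entries in `color_for_rank`.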

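We define a header line and then paste the matching columns together, separated by semicolons, so that each data row fits the header. The depth is set to a constant 1.0, and the color string supplies the red, green and blue columns. Then we write the output to "stations.csv".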

In [15]:
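# column names expected in stations.csv
headerline = "latitude;longitude;stationname;depth;red;green;blue"
# one semicolon-separated row per station
data_vector = paste0(station_list$lat,";",
                     station_list$lon,";",
                     station_list$name,";",
                     "1.0;",                              # constant depth
                     color_for_rank[station_list$rank])   # "red;green;blue" for this rank
# preview the first rows
data_vector[1:5]
# write the header first, then append the data rows
write.table(headerline,"data/stations.csv",col.names=FALSE,row.names=FALSE,quote=FALSE)
write.table(data_vector,"data/stations.csv",append = TRUE,col.names=FALSE,row.names=FALSE,quote=FALSE)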
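
To check that the rows really match the header, one option is to read the file back in. The following is a minimal sketch under stated assumptions: it writes two rows (taken from the station list shown above) to a temporary file instead of "data/stations.csv", and `tmpfile` and `check` are names introduced here for illustration.

```r
headerline = "latitude;longitude;stationname;depth;red;green;blue"
# Two example rows in the same layout as data_vector
data_vector = c("56.1053;15.5894;Kungsholmsfort;1.0;0.7;0.0;0.0",
                "63.9861;20.895;Ratan;1.0;0.7;0.0;0.0")

# Write header and rows to a temporary file (stand-in for stations.csv)
tmpfile = tempfile(fileext = ".csv")
writeLines(c(headerline, data_vector), tmpfile)

# Read the file back; the color string expands into the red, green and blue columns
check = read.csv(tmpfile, sep = ";", stringsAsFactors = FALSE)
stopifnot(ncol(check) == 7)
print(check)
```

The paste in the cell above has only five pieces, but the color string itself contains two semicolons, so each row ends up with the seven fields named in the header.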

We need another file "timerange.txt" which states the start and end date of the dataset.

In [19]:
write(paste0(min(station_list$min_year),"-01-01"),file="data/timerange.txt")
write(paste0(max(station_list$max_year),"-01-01"),file="data/timerange.txt",append=TRUE)


The "variables.csv" file only needs a header line and one entry for the sea level variable:

In [20]:
write("varname;unit;longname",file="data/variables.csv")
write("sl;m;sealevel",file="data/variables.csv",append=TRUE)

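Now we write the actual data to ASCII files. We loop over the stations: for each one we read time and sea level from its NetCDF file, convert the time to days since 1899-12-30, drop non-finite values, and write the result to a semicolon-separated file.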

In [24]:
library("RNetCDF")
if (!dir.exists("./data/stationdata")) {dir.create("./data/stationdata")}
for (i in seq_len(nrow(station_list))) {
    # read time and sea level for this station
    nc_filename = paste0(datadir,"EP_ERD_INT_SLEV_AL_TS_NRT_",station_list$name[i],".nc")
    nc = open.nc(nc_filename)
    time = var.get.nc(nc,"time")
    sealevel = var.get.nc(nc,"SLEV")
    close.nc(nc)
    
    # time is treated as seconds since 1970-01-01; UTC avoids offsets from the local time zone
    date = as.POSIXct("1970-01-01",tz="UTC")+time
    # convert to days since 1899-12-30
    datetime = as.numeric(difftime(date,as.POSIXct("1899-12-30",tz="UTC"),units = "days"))
    
    # keep only rows with a valid time and a valid sea level
    mydf = data.frame(depth=0.0, datetime=datetime, value=sealevel)
    mydf = mydf[is.finite(mydf$datetime) & is.finite(mydf$value),]
    # note: write.table also writes row names by default
    write.table(mydf,paste0("./data/stationdata/sl_",tolower(station_list$name[i]),".csv"),quote=FALSE,sep=";")
}
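
The time conversion inside the loop can be tried out on its own. The sketch below assumes, as the loop does, that the time values are seconds since 1970-01-01, and it pins both dates to UTC so the result does not depend on the local time zone. `toy_time` is a made-up vector for illustration, not data from the NetCDF files.

```r
# Hypothetical time values in seconds since 1970-01-01 00:00 UTC:
# midnight, noon of the same day, and the following midnight
toy_time = c(0, 43200, 86400)

# Seconds since 1970-01-01 -> POSIXct in UTC
date = as.POSIXct("1970-01-01", tz = "UTC") + toy_time

# POSIXct -> fractional days since 1899-12-30
datetime = as.numeric(difftime(date, as.POSIXct("1899-12-30", tz = "UTC"), units = "days"))
print(datetime)
```

Since 1970-01-01 lies 25569 days after 1899-12-30, the three values come out as 25569, 25569.5 and 25570.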