# Surface temperature validation using OSTIA sea surface temperature data

Sea surface temperature (SST) was validated against the OSTIA sea surface temperature dataset by comparing modelled temperature with OSTIA data at the same times and locations. The OSTIA data were downloaded from the Copernicus Marine Environment Monitoring Service (CMEMS) catalogue. A description of the dataset is available [here](https://data.marine.copernicus.eu/product/SST_GLO_SST_L4_REP_OBSERVATIONS_010_011/description).
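Comparing model and observations at the same times and locations means working with co-located fields. The sketch below shows a minimal version of such a comparison. It assumes the model and OSTIA fields have already been matched onto the same grid and time steps and loaded as NumPy arrays. The function `matched_stats` is hypothetical, and bias and RMSE are only examples of summary statistics, not necessarily the ones this report computes. Cells that are missing in either dataset are excluded, in the same spirit as the common mask applied in the notebook.

```python
import numpy as np

def matched_stats(model, obs):
    """Bias and RMSE of model against observations (hypothetical helper).

    Both arrays are assumed to be already matched to the same time steps and
    grid cells (shape: time x lat x lon). Only cells where both values are
    finite contribute to the statistics.
    """
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    if model.shape != obs.shape:
        raise ValueError("model and obs must be matched to the same shape")
    # Common mask: keep cells with data in both datasets
    valid = np.isfinite(model) & np.isfinite(obs)
    diff = model[valid] - obs[valid]
    bias = diff.mean()
    rmse = np.sqrt(np.mean(diff ** 2))
    return bias, rmse, int(valid.sum())

# Synthetic example: 2 time steps on a 2 x 2 grid, one missing observation
model = np.array([[[10.0, 11.0], [12.0, 13.0]],
                  [[14.0, 15.0], [16.0, 17.0]]])
obs = np.array([[[9.0, 11.0], [12.0, np.nan]],
                [[13.0, 15.0], [15.0, 17.0]]])
bias, rmse, n = matched_stats(model, obs)
# n == 7 matched values, bias == 3/7, rmse == sqrt(3/7)
```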

In [None]:
chunk_start
variable = "sst"
Variable = "SST"

In [None]:
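# Open the matched model/OSTIA file and work in single precision
ds_model = nc.open_data("../../matched/gridded/ostia/ostia_model.nc")
ds_model.set_precision("F32")
mask_all(ds_model)
# Record the years covered; year_range is used in figure captions
years = set(ds_model.years)
year_min = min(years)
year_max = max(years)
year_range = f"{year_min}-{year_max}"
# Keep the model field and compute its monthly climatology
ds_model.subset(variable = "model")
ds_model.tmean("month")
# Treat zeros as missing values
ds_model.as_missing(0)
ds_model.run()
# Annual mean of the monthly climatology
ds_annual = ds_model.copy()
ds_annual.tmean()
ds_annual.set_longnames({"model": "Sea surface temperature"})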

In [None]:
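# The observations are stored in the same matched file
ds_obs = nc.open_data("../../matched/gridded/ostia/ostia_model.nc")
ds_obs.set_precision("F32")
ds_obs.subset(variable = "observation")
# Monthly climatology of the observations, with zeros treated as missing
ds_obs.tmean("month")
ds_obs.as_missing(0)
ds_obs.run()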

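# Common mask: 1 where both model and observations have data, missing elsewhere
obs_mask = ds_obs.copy()
obs_mask > -1e20
mod_mask = ds_model.copy()
mod_mask > -1e20
mod_mask * obs_mask
mod_mask.run()
# Apply the common mask so that both datasets cover the same grid cells
ds_model * mod_mask
ds_obs * mod_mask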

In [None]:
chunk_clim

In [None]:
chunk_bias

## Can the model reproduce seasonality of temperature?

The ability of the model to reproduce the seasonality of SST is assessed by comparing the modelled and observed seasonal cycles. For each grid cell, the seasonal cycle is calculated by averaging the SST for each calendar month over all available years, separately for the model and for OSTIA. The two seasonal cycles are then compared at each grid cell using the correlation coefficient, which ranges from -1 to 1. A value of 1 indicates that the modelled and observed seasonal cycles are perfectly correlated, a value of -1 indicates that they are perfectly anti-correlated (one peaks when the other is at its minimum), and a value of 0 indicates no correlation. Because the correlation coefficient is insensitive to differences in mean and amplitude, it measures whether the timing of the seasonal cycle is reproduced, not its magnitude.
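The calculation described above can be sketched in NumPy as follows. This illustrates the idea under stated assumptions and is not the notebook's implementation, which uses nctoolkit: the fields are assumed to be `(time, lat, lon)` arrays with a known calendar month for each time step, the helper names `monthly_climatology` and `seasonal_correlation` are hypothetical, and the Pearson correlation coefficient is assumed.

```python
import numpy as np

def monthly_climatology(values, months):
    """Average a (time, lat, lon) array into a (12, lat, lon) climatology.

    `months` gives the calendar month (1-12) of each time step.
    NaNs (e.g. land cells) are ignored when averaging.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    return np.stack([np.nanmean(values[months == m], axis=0) for m in range(1, 13)])

def seasonal_correlation(clim_model, clim_obs):
    """Pearson correlation of two (12, lat, lon) climatologies, per grid cell."""
    # Anomalies relative to each cell's mean over the 12 months
    am = clim_model - clim_model.mean(axis=0)
    ao = clim_obs - clim_obs.mean(axis=0)
    num = (am * ao).sum(axis=0)
    den = np.sqrt((am ** 2).sum(axis=0) * (ao ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den

# Two years of synthetic monthly data on a 1 x 2 grid
months = np.tile(np.arange(1, 13), 2)
cycle = 10 + 5 * np.sin(2 * np.pi * (months - 1) / 12)
obs = np.stack([cycle, cycle], axis=-1)[:, np.newaxis, :]            # (24, 1, 2)
model = np.stack([cycle + 1.0, 20 - cycle], axis=-1)[:, np.newaxis, :]
r = seasonal_correlation(monthly_climatology(model, months),
                         monthly_climatology(obs, months))
```

In the example, the first cell's model is the observed cycle shifted by 1 degree and still gets a correlation of 1, while the second cell's model is an inverted cycle and gets -1, which shows that a constant offset does not lower the correlation.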


In [None]:
chunk_seasonal

## Regional assessment of model performance for sea surface temperature

We assessed the regional performance of the model by comparing it with observations in each of the following regions: Southern North Sea, Central North Sea, Northern North Sea, English Channel, Skagerrak, Norwegian Trench, Shetland, Irish Shelf, Irish Sea, Celtic Sea, Armorican, Northern North East Atlantic, Southern North East Atlantic, Shelf, Ocean, and Deep Ocean.



In [None]:
if regional:
    # Long-format table of the grid cells belonging to each region
    df_mapped = (
        ds_regions
        .to_dataframe()
        .reset_index()
        .melt(id_vars = ["lon", "lat"])
        .dropna()
        .merge(regions_contents.loc[:, ["variable", "long_name"]])
        .drop(columns = ["value"])
    )
    # Exclude regions that are not used in the validation
    bad = ["Rosa", "Locate Shelf"]
    df_mapped = df_mapped.query("long_name not in @bad").copy()
    xlim = np.array([df_mapped.lon.min(), df_mapped.lon.max()])
    ylim = np.array([df_mapped.lat.min(), df_mapped.lat.max()])
    shape = gpd.read_file(f"{data_dir}/mapping/TM_WORLD_BORDERS-0.3.shp")

    # Shorten long region names for the facet labels
    def fix_name(x):
        x = x.replace("North East", "NE")
        x = x.replace("North ", "N ")
        if x == "Channel":
            x = "English Channel"
        return x

    df_mapped["long_name"] = [fix_name(x) for x in df_mapped["long_name"]]

    # One panel per region, with land drawn in grey
    gg = (
        ggplot(df_mapped)+
        geom_tile(aes(x = "lon", y = "lat"))+
        geom_map(shape, aes("LON", "LAT"), fill = "grey", colour = "grey")+
        coord_cartesian(xlim = xlim, ylim = ylim)+
        scale_x_continuous(breaks = [-20, -10, 0, 10], labels = ["20°W", "10°W", "0°", "10°E"])+
        scale_y_continuous(breaks = [40, 50, 60], labels = ["40°N", "50°N", "60°N"])+
        theme_bw(base_size = 10)+
        facet_wrap("~long_name")+
        theme(axis_title_x = element_blank(), axis_title_y = element_blank())
    )
    gg = gg.draw()
    gg
    

In [None]:
if regional:
    md(f"**Figure {i_figure}**: Regions used for validation.")
    i_figure += 1

For each region, we constructed time series of the monthly mean of the spatially averaged SST, for both the model and the observations. The spatial average is the area-weighted mean of all grid cells within the region, calculated for each month. The modelled and observed time series were then compared for each region using the correlation coefficient, which ranges from -1 to 1: a value of 1 indicates that they are perfectly correlated, a value of -1 indicates that they are perfectly anti-correlated, and a value of 0 indicates no correlation.
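A minimal sketch of the regional averaging step follows. It assumes a regular latitude-longitude grid on which cell area is approximated as proportional to the cosine of latitude, a boolean region mask, and NaN for missing values. The helper `regional_mean` is hypothetical; in the notebook the equivalent step is the nctoolkit `spatial_mean` call applied after masking the data with each region.

```python
import numpy as np

def regional_mean(values, region_mask, lats):
    """Area-weighted mean over a region for each time step (hypothetical helper).

    values: (time, lat, lon) array; region_mask: (lat, lon) booleans;
    lats: (lat,) latitudes in degrees. Cell area is approximated as
    proportional to cos(latitude), which assumes a regular lat-lon grid.
    """
    values = np.asarray(values, dtype=float)
    region_mask = np.asarray(region_mask, dtype=bool)
    # cos(latitude) weights, one per row, broadcast across longitudes
    weights = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, np.newaxis]
    weights = np.broadcast_to(weights, region_mask.shape) * region_mask
    # Use only cells inside the region that have data at each time step
    valid = np.isfinite(values) & (weights > 0)
    w = np.where(valid, weights, 0.0)
    total = (np.where(valid, values, 0.0) * w).sum(axis=(1, 2))
    return total / w.sum(axis=(1, 2))

lats = np.array([0.0, 60.0])
region = np.array([[True, True],
                   [True, False]])   # the last cell lies outside the region
values = np.array([
    [[10.0, 10.0], [4.0, 100.0]],
    [[6.0, 6.0], [6.0, np.nan]],
])
series = regional_mean(values, region, lats)   # approximately [8.8, 6.0]
```

The value outside the region (100) is ignored, and the cell at 60°N counts half as much as those at the equator, so the first time step averages to 8.8 rather than the unweighted 8. The regional series for model and observations can then be correlated, for example with `np.corrcoef(model_series, obs_series)[0, 1]`, where both names are hypothetical.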


In [None]:
if regional:
    # ds_ts (model and observation fields with monthly time steps) must be defined before this cell
    df_all = []
    for vv in ds_regions.variables:
        # Mask for region vv
        ds_rr = ds_regions.copy()
        ds_rr.subset(variable = vv)
        ds_rr.run()
        # Restrict the data to the region and take the spatial mean
        ds_vv = ds_ts.copy()
        ds_vv * ds_rr
        ds_vv.spatial_mean()
        time_name = [x for x in list(ds_vv.to_xarray().coords) if "time" in x][0]
        df_vv = (
            ds_vv
            .to_dataframe()
            .reset_index()
            .rename(columns = {time_name: "time"})
            .melt("time")
            .assign(month = lambda x: x.time.dt.month)
            .assign(region = vv)
        )
        df_all.append(df_vv)
        del ds_rr, ds_vv
    df_all = pd.concat(df_all).dropna()

    # Attach the region names used in the map (excluded regions drop out here)
    df_all = df_all.merge(
        df_mapped.loc[:, ["long_name", "variable"]]
        .drop_duplicates()
        .rename(columns = {"variable": "region"})
    )
    df_all = df_all.query("variable in ['model', 'observation']").copy()
    df_all["value"] = df_all["value"].astype(float)

In [None]:
if regional:
    ylab = "Spatial average " + Variable + " (" + nc.static_plot.fix_label(ds_ts.contents.unit[0]) + ")"
    gg = (
        ggplot(df_all)+
        geom_line(aes("month", "value", colour = "variable"))+
        facet_wrap("long_name")+
        labs(x = "Month", y = ylab, colour = "")+
        scale_color_manual(values = ["red", "blue"])+
        scale_x_continuous(breaks = [1, 4, 7, 10], labels = ["Jan", "Apr", "Jul", "Oct"])+
        theme_bw(base_size = 10)+
        # theme() must come after theme_bw(), which replaces the whole theme
        theme(legend_position = "top")
    )

    gg = gg.draw()
    gg


In [None]:
if regional:
    md(f"**Figure {i_figure}**: Seasonal cycle of {Variable} for model and observations in each region. The spatial average is taken over each region.")
    i_figure += 1

In [None]:
if regional:
    # Annual means of model and observations for the regional time series
    ds_ts = nc.open_data("../../matched/gridded/ostia/ostia_model.nc")
    mask_all(ds_ts)
    ds_ts.tmean("year")
    ds_ts.run()

In [None]:
if regional:
    df_all = []
    for vv in ds_regions.variables:
        # Mask for region vv
        ds_rr = ds_regions.copy()
        ds_rr.subset(variable = vv)
        ds_rr.run()
        # Restrict the annual means to the region and take the spatial mean
        ds_vv = ds_ts.copy()
        ds_vv * ds_rr
        ds_vv.spatial_mean()
        time_name = [x for x in list(ds_vv.to_xarray().coords) if "time" in x][0]
        df_vv = (
            ds_vv
            .to_dataframe()
            .reset_index()
            .rename(columns = {time_name: "time"})
            .loc[:, ["time", "model", "observation"]]
            .melt("time")
            .assign(year = lambda x: x.time.dt.year)
            .assign(region = vv)
        )
        df_all.append(df_vv)
        del ds_rr, ds_vv
    df_all = pd.concat(df_all).dropna()

    # Attach the region names used in the map (excluded regions drop out here)
    df_all = df_all.merge(
        df_mapped.loc[:, ["long_name", "variable"]]
        .drop_duplicates()
        .rename(columns = {"variable": "region"})
    )

In [None]:
if regional:
    ylab = "Spatial average " + Variable + " (" + nc.static_plot.fix_label(ds_ts.contents.unit[0]) + ")"

    gg = (
        ggplot(df_all)+
        geom_line(aes("year", "value", colour = "variable"))+
        facet_wrap("long_name")+
        labs(x = "Year", y = ylab, colour = "")+
        scale_color_manual(values = ["red", "blue"])+
        theme_bw(base_size = 10)+
        # theme() must come after theme_bw(), which replaces the whole theme
        theme(legend_position = "top")
    )

    gg = gg.draw()
    gg


In [None]:
if regional:
    md(f"**Figure {i_figure}**: Annual mean {Variable} for model and observations in each region, {year_range}. The spatial average is taken over each region.")
    i_figure += 1

In [None]:
chunk_results

In [None]:
chunk_end