<code>calculate_S2S_model_bias_mean_and_STD.ipynb</code>.  This notebook calculates the bias (model - obs) in sea ice extent (SIE) for each S2S model as a function of forecast month and region. It considers bias in both the mean and the standard deviation of SIE.

In [1]:
import xarray as xr
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from S2S_sea_ice_preprocess import load_model, create_aggregate_regions, create_model_climatology
from S2S_sea_ice_preprocess import create_obs_climatology 

## Overview

<li>1) Load model netCDF files, combine with CTRL, and use the common reforecast period. <br>
For NCEP, use the entire period. </li>
<li> 2) Add aggregate regions. </li>
<li> 3) Create the model climatology: calculate date of year for the valid date and lead time in weeks. </li>
<li> 4) Create the observed climatology based on the desired observational data set (static, using only the common reforecast period). </li>
<li> 5) Calculate bias at the desired lead period (0 - <code>max_lead</code>) for each region, in each model, as a function of forecast month:
    $$SIE_{bias} = \overline{SIE_{model}(m,date)} - SIE_{obs}(m,date),$$
    where the overline indicates averaging over lead days 0 - <code>max_lead</code>. </li>
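
As a rough illustration of step 5, the sketch below averages a toy model SIE over lead days 0 to `max_lead` for each valid date and then subtracts the observed SIE. All values are made up, and the variable names (`model`, `obs`, `sie_bias`) are placeholders for this example rather than names used in the notebook.

```python
import pandas as pd

# Toy model reforecasts: SIE (10^6 km^2) by valid date and lead day (made-up values)
model = pd.DataFrame({
    'valid date': pd.to_datetime(['2000-01-01'] * 3 + ['2000-01-02'] * 3),
    'lead time (days)': [0, 1, 2, 0, 1, 2],
    'SIE': [10.0, 10.2, 10.7, 10.1, 10.3, 10.9],
})

# Toy observations for the same valid dates
obs = pd.Series([10.3, 10.1],
                index=pd.to_datetime(['2000-01-01', '2000-01-02']),
                name='SIE')

max_lead = 1  # keep lead days 0..max_lead

# Average the model SIE over the lead window for each valid date
in_window = model['lead time (days)'] <= max_lead
model_mean = model[in_window].groupby('valid date')['SIE'].mean()

# Bias = lead-averaged model SIE minus observed SIE (model - obs)
sie_bias = model_mean - obs
print(sie_bias)
```

A negative value means the model has too little ice relative to the observations for that valid date.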

### Inputs

In [2]:
model_names_ALL = ['ecmwf','ncep','ukmo','metreofr']
obs_name = 'NSIDC_0051'
COMMON_RF = True # we want to compare the reforecasts to obs over the same 15 year period
MAX_LEAD = 1 #max lead in days

Load data for all models at once

In [3]:
SIE_df_ALL = pd.DataFrame()
SIE_df_weekly_ALL = pd.DataFrame()
for model_name in model_names_ALL:
    print('loading ',model_name)
    # Load
    SIE = load_model(model_name)
    print('loaded ',model_name)
    # Create aggregate regions
    SIE = create_aggregate_regions(SIE)
    print('combined regions')
    # Take ensemble mean and get lead time in days
    SIE_ens_mean = SIE.mean(dim='ensemble')
    regions = SIE.region_names
    lead_days = SIE.fore_time.dt.days
    # Convert to dataframe, rename some columns, and get the date of the forecast by adding the fore_time to init_date
    SIE_df = SIE_ens_mean.to_dataframe().reset_index()
    SIE_df['valid date'] = SIE_df['init_time'] + SIE_df['fore_time']
    SIE_df = SIE_df.rename(columns={'region_names':'region',
                               'fore_time':'lead time (days)',
                               'init_time':'init date',
                               'Extent':'SIE'})
    # Create climatology
    SIE_df = create_model_climatology(SIE_df,7)
    SIE_df['model name'] = model_name
    # Append this model's results (DataFrame.append was removed in pandas 2.0)
    SIE_df_ALL = pd.concat([SIE_df_ALL, SIE_df])
    #SIE_df_weekly_ALL = pd.concat([SIE_df_weekly_ALL, SIE_df_weekly])

loading  ecmwf
loading files from  /home/disk/sipn/nicway/data/model/ecmwf/reforecast/sipn_nc_agg_commonland/
<xarray.Dataset>
Dimensions:       (ensemble: 10, fore_time: 46, init_time: 2080, nregions: 15)
Coordinates:
    region_names  (nregions) object dask.array<chunksize=(15,), meta=np.ndarray>
  * nregions      (nregions) int64 99 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  * fore_time     (fore_time) timedelta64[ns] 0 days 1 days ... 44 days 45 days
  * ensemble      (ensemble) int32 0 1 2 3 4 5 6 7 8 9
  * init_time     (init_time) datetime64[ns] 1998-08-06 ... 2018-08-01
Data variables:
    Extent        (ensemble, init_time, fore_time, nregions) float64 dask.array<chunksize=(10, 1, 46, 15), meta=np.ndarray>
loaded  ecmwf
combined regions




loading  ncep
loading files from  /home/disk/sipn/nicway/data/model/ncep/reforecast/sipn_nc_agg_commonland/
<xarray.Dataset>
Dimensions:       (ensemble: 3, fore_time: 43, init_time: 4523, nregions: 15)
Coordinates:
    region_names  (nregions) object dask.array<chunksize=(15,), meta=np.ndarray>
  * nregions      (nregions) int64 99 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  * fore_time     (fore_time) timedelta64[ns] 1 days 2 days ... 42 days 43 days
  * ensemble      (ensemble) int32 0 1 2
  * init_time     (init_time) datetime64[ns] 1999-01-01 ... 2010-12-31
Data variables:
    Extent        (ensemble, init_time, fore_time, nregions) float64 dask.array<chunksize=(3, 16, 43, 15), meta=np.ndarray>
loaded  ncep
combined regions




loading  ukmo
loading files from  /home/disk/sipn/nicway/data/model/ukmo/reforecast/sipn_nc_agg_commonland/
<xarray.Dataset>
Dimensions:       (ensemble: 6, fore_time: 60, init_time: 1008, nregions: 15)
Coordinates:
    region_names  (nregions) object dask.array<chunksize=(15,), meta=np.ndarray>
  * nregions      (nregions) int64 99 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  * fore_time     (fore_time) timedelta64[ns] 0 days 1 days ... 58 days 59 days
  * ensemble      (ensemble) int32 0 1 2 3 4 5
  * init_time     (init_time) datetime64[ns] 1995-01-01 ... 2015-12-25
Data variables:
    Extent        (ensemble, init_time, fore_time, nregions) float64 dask.array<chunksize=(6, 1, 60, 15), meta=np.ndarray>
loaded  ukmo
combined regions
loading  metreofr
loading files from  /home/disk/sipn/nicway/data/model/metreofr/reforecast/sipn_nc_agg_commonland/
<xarray.Dataset>
Dimensions:       (ensemble: 9, fore_time: 47, init_time: 834, nregions: 15)
Coordinates:
    region_names  (nregions) object dask.
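
The loading loop above derives each forecast's valid date by adding the lead time (`fore_time`) to the initialization date (`init_time`). A minimal sketch of that arithmetic on a toy dataframe, with made-up dates:

```python
import pandas as pd

# Toy forecasts: initialization dates and lead times (made-up values)
df = pd.DataFrame({
    'init_time': pd.to_datetime(['1999-01-01', '1999-01-01', '1999-12-30']),
    'fore_time': pd.to_timedelta([0, 7, 3], unit='D'),
})

# Valid date = initialization date + lead time
df['valid date'] = df['init_time'] + df['fore_time']

# Lead time expressed as an integer number of days
df['lead time (days)'] = df['fore_time'].dt.days
print(df)
```

Because `fore_time` is a timedelta, the addition handles month and year boundaries automatically, as in the third row.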

Load obs

In [4]:
if obs_name == 'NSIDC_0079':
    obs_type = 'sipn_nc_yearly_agg_commonland'
else:
    obs_type = 'sipn_nc_yearly_agg'
filepath = '/home/disk/sipn/nicway/data/obs/{model_name}/{model_type}/'.format(model_name=obs_name,
                                                                              model_type=obs_type)
obs_filenames = xr.open_mfdataset(filepath+'/*.nc',combine='by_coords')
print('opening ',obs_filenames)
obs_SIE = obs_filenames.Extent
obs_regions = obs_filenames.nregions
obs_region_names = obs_filenames['region_names'].values
# Drop region_names and re-add it as a plain NumPy (non-dask) array
obs_SIE = obs_SIE.drop_vars('region_names')
obs_SIE["region_names"] = ("nregions",obs_region_names)
print('obs loaded')

opening  <xarray.Dataset>
Dimensions:       (nregions: 15, time: 11261)
Coordinates:
    region_names  (nregions) object dask.array<chunksize=(15,), meta=np.ndarray>
  * nregions      (nregions) int64 99 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  * time          (time) datetime64[ns] 1989-01-01 1989-01-02 ... 2019-10-31
Data variables:
    Extent        (time, nregions) float64 dask.array<chunksize=(365, 15), meta=np.ndarray>
obs loaded


Add aggregate regions to obs and convert obs to Pandas dataframe

In [5]:
obs_SIE = create_aggregate_regions(obs_SIE)
obs_SIE = obs_SIE.to_dataframe().reset_index()
obs_SIE = obs_SIE.rename(columns={'Extent':'SIE','region_names':'region','time':'valid date'})

Calculate our observed climatology using either the full period or the common reforecast period only

In [6]:
if COMMON_RF:
    # Keep only the common reforecast years (np.arange excludes the stop value: 1999-2014)
    obs_SIE = obs_SIE[pd.to_datetime(obs_SIE['valid date']).dt.year.isin(np.arange(1999,2015))]
    time_str = 'COMMON_RF'
    print('common reforecast')
else:
    time_str = 'FULL_PERIOD'
    print('full period')
obs_SIE = create_obs_climatology(obs_SIE)
print('observed climatology created')

common reforecast
observed climatology created
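
The internals of `create_obs_climatology` are not shown in this notebook. The sketch below is therefore only a hypothetical stand-in: it restricts toy observations to the same year range used above and defines the climatology as the mean SIE for each day of year, with the anomaly as the departure from that mean. The real function may differ.

```python
import numpy as np
import pandas as pd

# Toy daily observations with a made-up linear trend
dates = pd.date_range('1998-01-01', '2015-12-31', freq='D')
obs = pd.DataFrame({'valid date': dates,
                    'SIE': np.linspace(12.0, 10.0, len(dates))})

# np.arange(1999, 2015) excludes the stop value, so the years kept are 1999..2014
common_years = np.arange(1999, 2015)
obs_common = obs[obs['valid date'].dt.year.isin(common_years)].copy()

# Hypothetical climatology: mean SIE for each day of year
doy = obs_common['valid date'].dt.dayofyear
obs_common['SIE clim'] = obs_common.groupby(doy)['SIE'].transform('mean')

# Anomaly = observed SIE minus climatology
obs_common['SIE anom'] = obs_common['SIE'] - obs_common['SIE clim']
print(obs_common.head())
```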


In [7]:
obs_SIE['model name'] = obs_name

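Group the model output by region, forecast valid date, model name, and lead time, and group the observations by region and valid date. Then subtract the observed SIE from the modeled SIE to get the raw bias, and divide by the observed SIE to get the fractional bias (converted to percent further below).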
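
To make the pairing of model and observed values concrete, here is a minimal sketch on toy data. Note that it uses an explicit `merge` on region and valid date rather than the index alignment used in the notebook cell; the two approaches are intended to pair the same rows, but the merge makes the matching keys visible. All numbers are made up.

```python
import pandas as pd

# Toy model output: two models, two valid dates, one region (made-up values)
model = pd.DataFrame({
    'region': ['panArctic'] * 4,
    'valid date': pd.to_datetime(['2000-01-01', '2000-01-01',
                                  '2000-01-02', '2000-01-02']),
    'model name': ['ecmwf', 'ukmo', 'ecmwf', 'ukmo'],
    'lead time (days)': [0, 0, 1, 1],
    'SIE': [9.0, 11.0, 9.5, 10.5],
})

# Toy observations: one value per region and valid date
obs = pd.DataFrame({
    'region': ['panArctic'] * 2,
    'valid date': pd.to_datetime(['2000-01-01', '2000-01-02']),
    'SIE': [10.0, 10.0],
})

# Pair every model row with the observation for the same region and valid date
merged = model.merge(obs, on=['region', 'valid date'],
                     suffixes=(' model', ' obs'))

# Raw bias (model - obs) and bias as a percent of the observed SIE
merged['SIE err'] = merged['SIE model'] - merged['SIE obs']
merged['SIE err pct'] = 100 * merged['SIE err'] / merged['SIE obs']
print(merged[['model name', 'valid date', 'SIE err', 'SIE err pct']])
```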

In [8]:
# Select several columns from a groupby with a list (double brackets), not a tuple
SIE_model_gb = SIE_df_ALL.groupby(['region','valid date','model name','lead time (days)'])[['SIE','SIE clim','SIE anom']].mean()
SIE_obs_gb = obs_SIE.groupby(['region','valid date'])[['SIE','SIE clim','SIE anom']].mean()
SIE_err = SIE_model_gb[['SIE','SIE clim','SIE anom']] - SIE_obs_gb[['SIE','SIE clim','SIE anom']]
SIE_err_pct = SIE_err[['SIE','SIE clim','SIE anom']].divide(SIE_obs_gb[['SIE','SIE clim','SIE anom']])


Same, but get modeled and observed standard deviation ($\sigma$) of SIE.
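
A minimal sketch of the standard-deviation comparison on toy data: derive the month of each valid date, take the standard deviation of SIE within each month for model and observations separately, and difference the two. The values are made up, and pandas' default sample standard deviation (`ddof=1`) is used.

```python
import pandas as pd

dates = pd.to_datetime(['2000-01-05', '2000-01-15', '2000-02-05', '2000-02-15'])

# Toy model and observed SIE on the same valid dates (made-up values)
model = pd.DataFrame({'valid date': dates, 'SIE': [10.0, 12.0, 11.0, 11.0]})
obs = pd.DataFrame({'valid date': dates, 'SIE': [10.0, 11.0, 10.0, 12.0]})

# The month column must exist before it can be used as a groupby key
for df in (model, obs):
    df['valid date month'] = df['valid date'].dt.month

# Standard deviation of SIE within each month
model_sd = model.groupby('valid date month')['SIE'].std()
obs_sd = obs.groupby('valid date month')['SIE'].std()

# Bias in variability: positive means the model varies more than the observations
sd_bias = model_sd - obs_sd
print(sd_bias)
```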

In [9]:
# Derive the month of each valid date; the groupbys below need this column
SIE_df_ALL['valid date month'] = pd.to_datetime(SIE_df_ALL['valid date']).dt.month
obs_SIE['valid date month'] = pd.to_datetime(obs_SIE['valid date']).dt.month
SIE_model_gb_sd = SIE_df_ALL.groupby(['region','valid date month','model name','lead time (days)'])[['SIE','SIE clim','SIE anom']].std()
SIE_obs_gb_sd = obs_SIE.groupby(['region','valid date month'])[['SIE','SIE clim','SIE anom']].std()
SIE_err_sd = SIE_model_gb_sd[['SIE','SIE clim','SIE anom']] - SIE_obs_gb_sd[['SIE','SIE clim','SIE anom']]
SIE_err_pct_sd = SIE_err_sd[['SIE','SIE clim','SIE anom']].divide(SIE_obs_gb_sd[['SIE','SIE clim','SIE anom']])

Convert the fractional errors to percent by multiplying by 100

In [None]:
SIE_err_pct = SIE_err_pct*100
SIE_err[['SIE pct','SIE clim pct','SIE anom pct']] = SIE_err_pct
#
SIE_err_pct_sd = SIE_err_pct_sd*100
SIE_err_sd[['SIE pct','SIE clim pct','SIE anom pct']] = SIE_err_pct_sd

In [None]:
SIE_err_rs = SIE_err.reset_index()
SIE_err_rs['valid month'] = pd.to_datetime(SIE_err_rs['valid date']).dt.month
#
SIE_err_rs_sd = SIE_err_sd.reset_index()
# The standard-deviation errors are grouped by month already, so there is no 'valid date' column here
SIE_err_rs_sd['valid month'] = SIE_err_rs_sd['valid date month']

Save errors

In [None]:
fname_save = '../../data/RAW_ERRORS_all_S2S_models_OBS_{obs_name}_{time_str}.csv'.format(obs_name=obs_name,time_str=time_str)
SIE_err_rs.to_csv(fname_save)
print(fname_save)
#
fname_save_STD = '../../data/RAW_ERRORS_STD_all_S2S_models_OBS_{obs_name}_{time_str}.csv'.format(obs_name=obs_name,time_str=time_str)
SIE_err_rs_sd.to_csv(fname_save_STD)

In [None]:
import seaborn as sns
reg_sel_all = ['panArctic','Central Arctic','East Siberian-Beaufort-Chukchi Sea','Kara-Laptev Sea','Barents Sea',
               'East Greenland Sea','Bering']
SIE_err_lead_reg = SIE_err_trim[SIE_err_trim['region'].isin(reg_sel_all)]
SIE_err_lead_reg = SIE_err_lead_reg.set_index(['region'])
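
`SIE_err_trim` is not defined in the cells shown here. One plausible reading, and it is only an assumption, is that it holds the errors restricted to a window of lead days (`min_lead` to `max_lead`) before the region selection. The sketch below illustrates that idea on a toy dataframe; the window bounds, the toy region `'Hudson Bay'`, and all values are hypothetical.

```python
import pandas as pd

# Toy error table with timedelta lead times (made-up values)
SIE_err_rs = pd.DataFrame({
    'region': ['panArctic', 'panArctic', 'Bering', 'Hudson Bay'],
    'model name': ['ecmwf', 'ecmwf', 'ukmo', 'ncep'],
    'lead time (days)': pd.to_timedelta([0, 5, 1, 0], unit='D'),
    'SIE': [-0.4, -0.9, 0.1, 0.2],
})

# Assumed lead window; these names are not defined in the original cells
min_lead, max_lead = 0, 1

# Keep only rows whose lead time falls inside the window (inclusive on both ends)
lead_days = SIE_err_rs['lead time (days)'].dt.days
SIE_err_trim = SIE_err_rs[lead_days.between(min_lead, max_lead)]

# Select the regions of interest and index by region, as in the cell above
reg_sel_all = ['panArctic', 'Bering']
SIE_err_lead_reg = SIE_err_trim[SIE_err_trim['region'].isin(reg_sel_all)]
SIE_err_lead_reg = SIE_err_lead_reg.set_index('region')
print(SIE_err_lead_reg)
```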

Plots

In [None]:
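no_rows = 4
no_cols = 2
TO_PLOT = 'SIE sd'
TO_PLOT_str = 'SIE_sd'
# Lead window used in the figure titles; assumed here to be 0 - MAX_LEAD, per the Overview
min_lead, max_lead = 0, MAX_LEAD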
mon_labels = ['J','F','M','A','M','J','J','A','S','O','N','D']
letters = ['a)','b)','c)','d)','e)','f)','g)']
fig1,ax = plt.subplots(no_rows,no_cols,figsize=(12,10))#,sharex=True,sharey=True)
for imod, region_sel in enumerate(reg_sel_all):
    plt_test = SIE_err_lead_reg.loc[region_sel]
    piv_plt = pd.pivot_table(data=plt_test,index='model name',columns='valid date month',values=TO_PLOT,aggfunc='mean')
    #
    ax_sel = ax.flat[imod]
    #cbar_ax = fig.add_axes([.965,.3,.03,.4])
    if (TO_PLOT == 'SIE pct') | (TO_PLOT == 'SIE clim pct'):
        if region_sel == 'panArctic':
            [vmin,vmax] = [-15,5]
        elif region_sel == 'Barents Sea':
            [vmin,vmax] = [-40,40]
        else:
            [vmin,vmax] = [-25,25]
    elif (TO_PLOT == 'SIE') | (TO_PLOT == 'SIE clim'):
        if region_sel == 'panArctic':
            [vmin,vmax] = [-1.5,0.5]
        #elif region_sel == 'Barents Sea':
         #   [vmin,vmax] = [-0.6,0.60]
        else:
            [vmin,vmax] = [-0.5,0.5]
    
    elif (TO_PLOT == 'SIE sd'):
        if region_sel == 'panArctic':
            [vmin,vmax] = [-0.25,0.25]
        else:
            [vmin,vmax] = [-0.1,0.1]
    else:
        [vmin,vmax] = [-0.25,0.25]
    sns.heatmap(piv_plt,cmap = 'PuOr',ax=ax_sel,linewidth=0.2,linecolor='xkcd:slate grey',
                vmin=vmin,vmax=vmax,center=0)
    ax_sel.set_yticklabels(piv_plt.index,rotation=0,fontsize=14)
    ax_sel.set_xticklabels(mon_labels,fontsize=14,rotation=0)
    ax_sel.set_ylabel(None)
    ax_sel.set_xlabel(None)
    ax_sel.set_title('{lett} {region}'.format(lett=letters[imod],region=region_sel),fontsize=15)
    if (TO_PLOT == 'SIE') | (TO_PLOT == 'SIE clim'):
        ax_sel.collections[0].colorbar.set_label('10$^6$ km$^2$',rotation=0,fontsize=13,y=-0.04,labelpad=-20)
    elif (TO_PLOT == 'SIE pct') | (TO_PLOT == 'SIE clim pct'):
        ax_sel.collections[0].colorbar.set_label('%',rotation=0,fontsize=13,y=-0.04,labelpad=-20)
    #
    fig1.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9, wspace=0.3, hspace=0.4)
    if (TO_PLOT == 'SIE') | (TO_PLOT == 'SIE pct'):
        fig1.suptitle('Bias in SIE, Lead Days {min_lead}-{max_lead}'.format(min_lead=min_lead,max_lead=max_lead),fontsize=18)
    elif (TO_PLOT == 'SIE clim') | (TO_PLOT == 'SIE clim pct'):
        fig1.suptitle('Bias in Climatological Sea Ice Extent, Lead Days {min_lead}-{max_lead}'.format(min_lead=min_lead,max_lead=max_lead),fontsize=18)
    elif (TO_PLOT == 'SIE sd'):
        #
        fig1.suptitle(r'Bias in $\sigma_{{SIE}}$, Lead Days {min_lead}-{max_lead}'.format(min_lead=min_lead,max_lead=max_lead),fontsize=18)
    elif (TO_PLOT == 'SIE anom') | (TO_PLOT == 'SIE anom pct'):
        fig1.suptitle('Bias in Anomalous Sea Ice Extent, Lead Days {min_lead}-{max_lead}'.format(min_lead=min_lead,max_lead=max_lead),fontsize=18)
    
fig1.delaxes(ax=ax.flat[7])
#fname_save = '../FIGURES/Bias_v_month_{TO_PLOT_str}_{MIN_LEAD}-{MAX_LEAD}_LEAD_DAYS_OBS_{obs_name}.pdf'.format(TO_PLOT_str=TO_PLOT_str,
#                                                                            MIN_LEAD=min_lead,MAX_LEAD=max_lead,obs_name=obs_name)
#fig1.savefig(fname_save,format='pdf',bbox_inches='tight')
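
Each heatmap panel above is drawn from a pivot table with one row per model and one column per month. A minimal sketch of that reshaping on toy data (made-up values, plotting omitted):

```python
import pandas as pd

# Toy errors for one region: several entries per model and month (made-up values)
errs = pd.DataFrame({
    'model name': ['ecmwf', 'ecmwf', 'ecmwf', 'ukmo', 'ukmo'],
    'valid date month': [1, 1, 2, 1, 2],
    'SIE': [-0.2, -0.4, 0.1, 0.3, 0.5],
})

# Rows = models, columns = months, cells = mean error over all matching entries
piv = pd.pivot_table(errs, index='model name', columns='valid date month',
                     values='SIE', aggfunc='mean')
print(piv)
```

Passing `piv` to `sns.heatmap` would then give one colored cell per model and month, as in the figure.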