## Reading ASCII files created for the CAM diagnostics tool

In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2
import pandas as pd
from glob import glob
import os
import helper_funcs as helpers
import ipywidgets as ipw

### 1. Paths and global settings (GLOB)  

Please change accordingly if you execute this notebook on your local machine.

#### 1.1. Paths (PATHS)

Here you can specify your paths.

In [2]:
#folder with ascii files
data_dir = "./data/michael_ascii_read/"
file_type = "webarchive"

# file containing additional information about variables (long names, can be interactively updated below)
varinfo_csv = "./data/var_info.csv"

# Config file for different groups
vargroups_cfg = "./data/varconfig.ini"

#directory to store results
output_dir = "./output/"

#### 1.2. Global settings (SETUP)

In the following cells you can specify global default settings.

##### Define group of variables that you are interested in

Default group of variables. Variable groups can be defined in [varconfig.ini](https://github.com/jgliss/my_notebooks/blob/master/data/varconfig.ini). Use ``[group_name]`` to define a new group and add below all variables that should belong to the group in the desired display order (should be self-explanatory when looking at the file, I hope).

In [3]:
var_group = None #group_name from varconfig.ini (use None, if you want to use all)
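
To make the group definition more concrete, here is a minimal sketch of how such a file could be parsed with the standard library. The layout (one bare variable name per line below each ``[group_name]``) and the group names `radiation` and `clouds` are assumptions for illustration; the actual parsing happens in the widget code and may differ.

```python
from configparser import ConfigParser

# Hypothetical content in the style described above (not the real varconfig.ini)
example_cfg = """
[radiation]
RESTOM
FLUT_CERES-EBAF
FSNTOA_CERES

[clouds]
CLDTOT_ISCCP
"""

def read_var_groups(text):
    """Return {group_name: [variables in file order]} from INI-style text."""
    parser = ConfigParser(allow_no_value=True)  # keys without "= value" are allowed
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(text)
    return {group: list(parser[group]) for group in parser.sections()}

groups = read_var_groups(example_cfg)
print(groups["radiation"])
```

Since the parser keeps the order in which the names appear, the display order can be controlled directly in the file.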

##### Add data columns to index

Use the following list to specify table columns that should be added to the multiindex (Ada, here is where you can add "Obs").

In [4]:
add_to_index = None #["Obs"]

##### Define which parts of index should be unstacked

The following list can be used to specify how the final tables are displayed. The items in the list need to be names of sub-indices in the MultiIndex of the originally loaded file (i.e. "Run", "Years", "Variable", "Description") or data columns that were added to the index (previous option).

All values specified here will be unstacked, i.e. put from the original row into a column index representation (makes table view wider).

In [5]:
unstack_indices = ["Run", "Years"]
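
As a small illustration of what unstacking does, the following sketch builds a toy long table (made-up numbers, only three index levels) and moves `Run` and `Years` from the row index into the column index.

```python
import pandas as pd

# Toy long table: one row per (Run, Years, Variable); values are made up
index = pd.MultiIndex.from_tuples(
    [("Run1", "71-100", "RESTOM"), ("Run1", "71-100", "CLDTOT_ISCCP"),
     ("Run2", "1-20", "RESTOM"), ("Run2", "1-20", "CLDTOT_ISCCP")],
    names=["Run", "Years", "Variable"])
toy = pd.DataFrame({"Bias": [-0.5, -3.2, 0.3, 1.8]}, index=index)

# Move "Run" and "Years" into the columns: one row per variable remains
wide = toy.unstack(["Run", "Years"])
print(wide)
```

The resulting table has one column per data column and run / year-range combination, which is why it gets wider and shorter.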

##### Shortcuts for Run IDs

Define a list of short names for the model runs or define a prefix. If undefined (i.e. empty list or ``None``), the original names are used.

In [6]:
#either
run_ids = list("ABC")
#or
run_id_prefix = "Run" #None

### 2. Importing and editing supplementary information

Let's begin by reading the variable information from the CSV file ``varinfo_csv``. Note that this is not strictly required, but it helps us to display the results in a more intuitive manner when analysing the data below.

Note that the following method makes sure the CSV file exists, i.e. if it has not been created before, the information is loaded from Michael's Excel table and then saved at ``varinfo_csv``.

In [7]:
var_info_dict = helpers.load_varinfo(varinfo_csv)

The following cell opens an interactive widget that can be used to edit the information available for each variable (stored in file ``varinfo_csv``, see previous cell).

In [8]:
from my_widgets import EditDictCSV

edit_config = EditDictCSV(varinfo_csv)
#show
edit_config()

HBox(children=(VBox(children=(HBox(children=(Label(value='RESTOM'), Text(value='TOmodel net flux', placeholder…

Now update to the current selection (run everything below if you change the previous cell).

In [9]:
var_info_dict = edit_config.var_dict

### 3. Search and load ASCII files, either using .asc or .webarchive file type (LOAD_FILE)

The following cell finds all files in folder ``data_dir``.

In [10]:
files = sorted(glob(data_dir + "*.{}".format(file_type)))
for file in files:
    print(file)    

./data/michael_ascii_read/N1850C53CLM45L32_f09_tn11_191017 (yrs 71-100).webarchive
./data/michael_ascii_read/N1850_f09_tn14_230218 (yrs 1-20).webarchive
./data/michael_ascii_read/N1850_f19_tn14_r227_ctrl (yrs 185-215).webarchive
./data/michael_ascii_read/N1850_f19_tn14_r227_ctrl (yrs 310-340).webarchive
./data/michael_ascii_read/N1850_f19_tn14_r227_ctrl (yrs 80-110).webarchive
./data/michael_ascii_read/N1850_f19_tn14_r265_ctrl_20180411 (yrs 90-120).webarchive
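
The file names listed above carry both the run name and the year range. How the helper functions extract them is not shown here; the following is only a sketch, assuming the pattern `<run> (yrs <start>-<end>).<extension>`, and `split_run_and_years` is a hypothetical name.

```python
import os
import re

def split_run_and_years(path):
    """Split e.g. 'N1850_f09_tn14_230218 (yrs 1-20).webarchive' into run name and years."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = re.match(r"^(?P<run>.+?)\s*\(yrs (?P<years>\d+-\d+)\)$", stem)
    if match is None:
        raise ValueError("Unexpected file name pattern: {}".format(path))
    return match.group("run"), match.group("years")

example = "./data/michael_ascii_read/N1850_f09_tn14_230218 (yrs 1-20).webarchive"
run, years = split_run_and_years(example)
print(run, years)
```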


### 4. Importing multiple result files and concatenating them into one Dataframe (LOAD_FILES)

In the following, we load all files into one `Dataframe`. 

To do this, a custom method `read_and_merge_all` was defined in [helper_funcs.py](https://github.com/jgliss/my_py3_scripts/blob/master/notebooks/helper_funcs.py). The method basically loops over all files and calls the method ``read_file_custom``, which you can also find in [helper_funcs.py](https://github.com/jgliss/my_py3_scripts/blob/master/notebooks/helper_funcs.py). 

In [11]:
merged = helpers.read_and_merge_all(file_list=files, var_info_dict=var_info_dict, replace_runid_prefix=run_id_prefix)
if add_to_index:
    # append the requested data columns as additional index levels
    merged = merged.set_index(add_to_index, append=True)
merged



Run,Years,Variable,Description,Flag,Model,Obs,Bias,RMSE
Run1,71-100,RESTOM,TOmodel net flux,True,-0.489,0.000,-0.489,
Run1,71-100,RESSURF,SRF net flux,True,-0.489,0.000,-0.489,
Run1,71-100,RESTOA_CERES-EBAF,TOA net flux,True,1.529,0.992,0.537,8.842
Run1,71-100,RESTOA_ERBE,,False,1.529,0.059,1.470,8.992
Run1,71-100,SOLIN_CERES-EBAF,,False,340.206,340.054,0.152,0.167
Run1,71-100,SOLIN_CERES,,False,340.206,341.479,-1.273,1.226
Run1,71-100,CLDTOT_ISCCP,Total cloud cover,True,63.621,66.800,-3.179,11.323
Run1,71-100,CLDTOT_CLOUDSAT,,False,63.621,66.824,-3.203,9.731
Run1,71-100,FLDS_ISCCP,LW down SRF,True,338.280,343.347,-5.066,14.450
Run1,71-100,FLNS_ISCCP,LW net SRF,True,55.819,49.425,6.394,11.967
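
The loop-and-concatenate idea behind `read_and_merge_all` can be sketched with plain pandas. This is not the implementation from helper_funcs.py: `merge_runs` is a hypothetical name, the per-file tables are assumed to be indexed by `Variable`, and the numbers are made up.

```python
import pandas as pd

def merge_runs(frames_by_run):
    """Concatenate per-file tables; the (run, years) keys become outer index levels."""
    return pd.concat(frames_by_run, names=["Run", "Years"])

variables = pd.Index(["RESTOM", "CLDTOT_ISCCP"], name="Variable")
run1 = pd.DataFrame({"Model": [1.0, 2.0], "Obs": [0.5, 2.5]}, index=variables)
run2 = pd.DataFrame({"Model": [3.0, 4.0], "Obs": [0.5, 2.5]}, index=variables)

merged_toy = merge_runs({("Run1", "71-100"): run1, ("Run2", "1-20"): run2})
print(merged_toy)
```

Passing tuples as keys is what produces the separate `Run` and `Years` levels in front of the existing `Variable` level.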


### 5. Rearranging and restructuring of the imported data (REARRANGE)

In the following cell, you can interactively select which variables you wish to keep for further analysis. The flagged variables are preselected.

#### 5.1 Interactive selection of variables (IA_VAR)

In [12]:
from my_widgets import SelectVariable    
selector = SelectVariable(df=merged, level="Variable", preconfig_file=vargroups_cfg,
                         default_group=var_group)
#show
selector()

VBox(children=(HBox(children=(VBox(children=(Label(value='Predefined'), Dropdown(options=OrderedDict([('flagge…

Now access the current selection and continue.

In [13]:
selection = selector.df_edit
selection

Run,Years,Variable,Description,Flag,Model,Obs,Bias,RMSE
Run1,71-100,RESTOM,TOmodel net flux,True,-0.489,0.000,-0.489,
Run1,71-100,RESSURF,SRF net flux,True,-0.489,0.000,-0.489,
Run1,71-100,RESTOA_CERES-EBAF,TOA net flux,True,1.529,0.992,0.537,8.842
Run1,71-100,CLDTOT_ISCCP,Total cloud cover,True,63.621,66.800,-3.179,11.323
Run1,71-100,FLDS_ISCCP,LW down SRF,True,338.280,343.347,-5.066,14.450
Run1,71-100,FLNS_ISCCP,LW net SRF,True,55.819,49.425,6.394,11.967
Run1,71-100,FLUT_CERES-EBAF,LW up Top,True,238.148,239.574,-1.426,6.855
Run1,71-100,FLUTC_CERES-EBAF,LW up Top Clearsky,True,261.783,266.051,-4.268,6.042
Run1,71-100,FSDS_ISCCP,SW down SRF,True,187.801,189.390,-1.589,13.380
Run1,71-100,FSNS_ISCCP,SW net SRF,True,163.679,165.893,-2.214,12.711
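
Without the widget, a comparable selection can be made directly in pandas. The sketch below uses a toy table with made-up values and shows two options: keeping the flagged rows, or keeping rows by variable name.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("Run1", "RESTOM"), ("Run1", "RESTOA_ERBE"), ("Run1", "CLDTOT_ISCCP")],
    names=["Run", "Variable"])
toy = pd.DataFrame({"Flag": [True, False, True],
                    "Bias": [-0.5, 1.5, -3.2]}, index=index)

# Option 1: keep only rows where the boolean "Flag" column is True
flagged = toy[toy["Flag"]]

# Option 2: keep rows whose "Variable" index level is in a list of names
wanted = ["CLDTOT_ISCCP"]
by_name = toy[toy.index.get_level_values("Variable").isin(wanted)]
print(flagged)
print(by_name)
```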


#### 5.2 Interactive index renaming tool (IA_RENAME)

The following interactive widget allows the runs to be renamed.

In [14]:
from my_widgets import IndexRenamer

renamer = IndexRenamer(selection, suggestions=run_ids)
renamer()

VBox(children=(HBox(children=(VBox(children=(HBox(children=(Label(value='Run1', layout=Layout(width='300px')),…

Now, update the current dataframe for further usage.

In [15]:
selection = renamer.df_edit
selection

Run,Years,Variable,Description,Flag,Model,Obs,Bias,RMSE
A,71-100,RESTOM,TOmodel net flux,True,-0.489,0.000,-0.489,
A,71-100,RESSURF,SRF net flux,True,-0.489,0.000,-0.489,
A,71-100,RESTOA_CERES-EBAF,TOA net flux,True,1.529,0.992,0.537,8.842
A,71-100,CLDTOT_ISCCP,Total cloud cover,True,63.621,66.800,-3.179,11.323
A,71-100,FLDS_ISCCP,LW down SRF,True,338.280,343.347,-5.066,14.450
A,71-100,FLNS_ISCCP,LW net SRF,True,55.819,49.425,6.394,11.967
A,71-100,FLUT_CERES-EBAF,LW up Top,True,238.148,239.574,-1.426,6.855
A,71-100,FLUTC_CERES-EBAF,LW up Top Clearsky,True,261.783,266.051,-4.268,6.042
A,71-100,FSDS_ISCCP,SW down SRF,True,187.801,189.390,-1.589,13.380
A,71-100,FSNS_ISCCP,SW net SRF,True,163.679,165.893,-2.214,12.711
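
The renaming itself can also be done without the widget. The following sketch assumes that suggestions are assigned to the runs in order of appearance and that runs without a suggestion keep their name (which would explain a leftover name such as `Run4` when only three short names are given); the widget's actual logic may differ.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Run1", "Run2", "Run3"], ["RESTOM"]], names=["Run", "Variable"])
toy = pd.DataFrame({"Bias": [0.1, 0.2, 0.3]}, index=index)

suggestions = list("AB")  # deliberately one name fewer than there are runs
runs = toy.index.get_level_values("Run").unique()

# zip stops at the shorter sequence, so "Run3" gets no entry and stays unchanged
mapping = dict(zip(runs, suggestions))
renamed = toy.rename(index=mapping, level="Run")
print(renamed)
```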


#### 5.3 Reshaping of table (make it wider for readability) (RESHAPE)
 
For visualisation this display requires a lot of scrolling. We can make the table `wider` by unstacking certain indices, e.g. the two outermost indices `Run` and `Years`.

In [16]:
selection_unstacked = selection.unstack(unstack_indices)
selection_unstacked

,,Flag,Flag,Flag,Flag,Flag,Flag,Model,Model,Model,Model,...,Bias,Bias,Bias,Bias,RMSE,RMSE,RMSE,RMSE,RMSE,RMSE
,Run,A,B,C,C,C,Run4,A,B,C,C,...,C,C,C,Run4,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120,71-100,1-20,185-215,310-340,...,185-215,310-340,80-110,90-120,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,,,,,,,,,,,,,,,,
CLDTOT_ISCCP,Total cloud cover,True,True,True,True,True,True,63.621,68.586,68.543,68.234,...,1.744,1.435,2.157,3.947,11.323,11.881,12.992,13.078,12.869,12.485
FLDS_ISCCP,LW down SRF,True,True,True,True,True,True,338.28,341.547,353.861,354.846,...,10.514,11.499,5.162,4.507,14.45,15.351,16.891,17.664,16.72,15.278
FLNS_ISCCP,LW net SRF,True,True,True,True,True,True,55.819,59.124,56.249,56.272,...,6.824,6.847,7.167,6.925,11.967,14.953,14.098,14.174,14.516,13.988
FLUTC_CERES-EBAF,LW up Top Clearsky,True,True,True,True,True,True,261.783,265.065,267.09,267.477,...,1.039,1.426,-0.786,-1.182,6.042,4.778,4.662,4.738,5.67,4.778
FLUT_CERES-EBAF,LW up Top,True,True,True,True,True,True,238.148,241.278,241.972,242.502,...,2.398,2.928,0.66,-0.832,6.855,6.169,7.188,7.467,7.499,6.598
FSDS_ISCCP,SW down SRF,True,True,True,True,True,True,187.801,192.59,190.458,190.618,...,1.068,1.228,1.216,-1.721,13.38,15.089,15.915,16.082,16.048,15.421
FSNS_ISCCP,SW net SRF,True,True,True,True,True,True,163.679,168.522,166.962,167.258,...,1.07,1.365,-0.017,-2.145,12.711,12.632,13.587,13.727,13.705,13.068
FSNTOAC_CERES,SW net TOA clearsky,True,True,True,True,True,True,287.999,289.651,290.33,290.519,...,-4.373,-4.183,-5.747,-4.988,18.458,17.432,15.977,15.71,18.506,17.609
FSNTOA_CERES,SW net TOA,True,True,True,True,True,True,239.677,244.353,244.525,244.914,...,-0.167,0.223,-1.724,-3.937,12.307,10.795,12.096,12.125,12.314,12.711
LHFLX_JRA25,Lat Heat Flux,True,True,True,True,True,True,87.904,85.432,87.926,88.369,...,-0.009,0.434,-1.695,-2.692,17.176,14.578,14.947,15.116,15.587,15.153


Well, this is better but also not extremely illustrative / intuitive. It becomes more intuitive if we just look at one parameter that we are interested in (e.g. RMSE). 

#### 5.4 Extracting the Bias of each model run relative to the observations (GET_BIAS)

Retrieving a table that illustrates the Bias of each run for each flagged variable is straightforward. We just extract the `Bias` column from the unstacked table:

In [17]:
bias = selection_unstacked["Bias"]
bias.head()

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-3.179,1.786,1.744,1.435,2.157,3.947
FLDS_ISCCP,LW down SRF,-5.066,-1.799,10.514,11.499,5.162,4.507
FLNS_ISCCP,LW net SRF,6.394,9.699,6.824,6.847,7.167,6.925
FLUTC_CERES-EBAF,LW up Top Clearsky,-4.268,-0.986,1.039,1.426,-0.786,-1.182
FLUT_CERES-EBAF,LW up Top,-1.426,1.704,2.398,2.928,0.66,-0.832


#### 5.5 Computing RMSE relative error (GET_RMSE_REL)

In the following we extract the subset containing the *RMSE* information of the flagged variables for all runs in order to compute the relative error for each run based on the average *RMSE* of all runs:

$$\frac{RMSE_{Run}\,-\,\overline{RMSE_{All\,Runs}}}{\overline{RMSE_{All\,Runs}}}$$
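
As a worked example with the `CLDTOT_ISCCP` values from the RMSE table of this notebook (output of cell 18), the mean over the six run / year-range columns and the resulting relative error of run A are:

$$\overline{RMSE_{All\,Runs}} = \frac{11.323 + 11.881 + 12.992 + 13.078 + 12.869 + 12.485}{6} = 12.438$$

$$\frac{11.323\,-\,12.438}{12.438} \approx -0.0896$$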


In [18]:
rmse = selection_unstacked["RMSE"]
rmse

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,11.323,11.881,12.992,13.078,12.869,12.485
FLDS_ISCCP,LW down SRF,14.45,15.351,16.891,17.664,16.72,15.278
FLNS_ISCCP,LW net SRF,11.967,14.953,14.098,14.174,14.516,13.988
FLUTC_CERES-EBAF,LW up Top Clearsky,6.042,4.778,4.662,4.738,5.67,4.778
FLUT_CERES-EBAF,LW up Top,6.855,6.169,7.188,7.467,7.499,6.598
FSDS_ISCCP,SW down SRF,13.38,15.089,15.915,16.082,16.048,15.421
FSNS_ISCCP,SW net SRF,12.711,12.632,13.587,13.727,13.705,13.068
FSNTOAC_CERES,SW net TOA clearsky,18.458,17.432,15.977,15.71,18.506,17.609
FSNTOA_CERES,SW net TOA,12.307,10.795,12.096,12.125,12.314,12.711
LHFLX_JRA25,Lat Heat Flux,17.176,14.578,14.947,15.116,15.587,15.153


##### Side comment: Series vs. unstacked Multiindex Dataframes

As you can see in the previous output, we have extracted ***ONE*** variable from the ***UNSTACKED*** dataframe. Now, this is still a pandas ``Dataframe`` since it is *tabular* data. 

In [19]:
print("Extracted table is Dataframe since it is a wide table: {}".format(isinstance(rmse, pd.DataFrame)))

Extracted table is Dataframe since it is a wide table: True


In [20]:
rmse_mean = rmse.mean(axis=1, skipna=True)
#Note that the created object is a Series and not a Dataframe
rmse_mean.head()

Variable          Description       
CLDTOT_ISCCP      Total cloud cover     12.438000
FLDS_ISCCP        LW down SRF           16.059000
FLNS_ISCCP        LW net SRF            13.949333
FLUTC_CERES-EBAF  LW up Top Clearsky     5.111333
FLUT_CERES-EBAF   LW up Top              6.962667
dtype: float64

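The next step is (semi) straightforward: we have to use the `div` and `subtract` methods of the Dataframe rather than the `/` and `-` operators, because the methods accept `axis=0`, which aligns `rmse_mean` with the row index so that each row is normalised by its own mean. The plain operators would instead try to match the index of `rmse_mean` against the column labels.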

In [21]:
rmse_err_rel = rmse.subtract(rmse_mean, axis=0).div(rmse_mean, axis=0)
rmse_err_rel

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-0.089645,-0.044782,0.044541,0.051455,0.034652,0.003779
FLDS_ISCCP,LW down SRF,-0.100193,-0.044087,0.051809,0.099944,0.041161,-0.048633
FLNS_ISCCP,LW net SRF,-0.14211,0.071951,0.010658,0.016106,0.040623,0.002772
FLUTC_CERES-EBAF,LW up Top Clearsky,0.182079,-0.065215,-0.087909,-0.07304,0.1093,-0.065215
FLUT_CERES-EBAF,LW up Top,-0.015463,-0.113989,0.032363,0.072434,0.07703,-0.052375
FSDS_ISCCP,SW down SRF,-0.126774,-0.015239,0.038669,0.049568,0.047349,0.006428
FSNS_ISCCP,SW net SRF,-0.039834,-0.045801,0.026338,0.036913,0.035251,-0.012867
FSNTOAC_CERES,SW net TOA clearsky,0.068048,0.00868,-0.075512,-0.090962,0.070825,0.018921
FSNTOA_CERES,SW net TOA,0.02065,-0.104744,0.003151,0.005556,0.021231,0.054155
LHFLX_JRA25,Lat Heat Flux,0.113433,-0.054982,-0.031062,-0.020107,0.010426,-0.017708


#### 5.6 Inserting column of RMSE relative error into original table (INSERT_RMSE_REL_ORIG)

If we want, we can now add the relative RMSE error to our original dataframe (containing only the flagged data, since it was computed from this).

**Note: this is just illustrative and not used in the following section**

First we have to stack it:

In [22]:
stacked = rmse_err_rel.stack(level=(0,1)).reorder_levels(order=(2,3,0,1))
stacked.head()

Run  Years    Variable      Description      
A    71-100   CLDTOT_ISCCP  Total cloud cover   -0.089645
B    1-20     CLDTOT_ISCCP  Total cloud cover   -0.044782
C    185-215  CLDTOT_ISCCP  Total cloud cover    0.044541
     310-340  CLDTOT_ISCCP  Total cloud cover    0.051455
     80-110   CLDTOT_ISCCP  Total cloud cover    0.034652
dtype: float64

In [23]:
selection["RMSE_ERR"] = stacked
selection

Run,Years,Variable,Description,Flag,Model,Obs,Bias,RMSE,RMSE_ERR
A,71-100,RESTOM,TOmodel net flux,True,-0.489,0.000,-0.489,,
A,71-100,RESSURF,SRF net flux,True,-0.489,0.000,-0.489,,
A,71-100,RESTOA_CERES-EBAF,TOA net flux,True,1.529,0.992,0.537,8.842,-0.014764
A,71-100,CLDTOT_ISCCP,Total cloud cover,True,63.621,66.800,-3.179,11.323,-0.089645
A,71-100,FLDS_ISCCP,LW down SRF,True,338.280,343.347,-5.066,14.450,-0.100193
A,71-100,FLNS_ISCCP,LW net SRF,True,55.819,49.425,6.394,11.967,-0.142110
A,71-100,FLUT_CERES-EBAF,LW up Top,True,238.148,239.574,-1.426,6.855,-0.015463
A,71-100,FLUTC_CERES-EBAF,LW up Top Clearsky,True,261.783,266.051,-4.268,6.042,0.182079
A,71-100,FSDS_ISCCP,SW down SRF,True,187.801,189.390,-1.589,13.380,-0.126774
A,71-100,FSNS_ISCCP,SW net SRF,True,163.679,165.893,-2.214,12.711,-0.039834
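
The stack / reorder / assign steps rely on index alignment: the new column is matched to the rows by index labels, not by position. A toy round trip (made-up values, three index levels instead of four) shows this:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", "71-100", "VAR1"), ("A", "71-100", "VAR2"),
     ("B", "1-20", "VAR1"), ("B", "1-20", "VAR2")],
    names=["Run", "Years", "Variable"])
long_toy = pd.DataFrame({"RMSE": [11.0, 6.0, 12.0, 5.0]}, index=index)

# Long -> wide: "Run" and "Years" become the two column levels
wide = long_toy["RMSE"].unstack(["Run", "Years"])

# Wide -> long: stack both column levels and restore the original level order
back = wide.stack(level=[0, 1]).reorder_levels(["Run", "Years", "Variable"])

# Assignment aligns on the index, so the row order of `back` does not matter
long_toy["RMSE_copy"] = back
print(long_toy)
```

Depending on the pandas version, `stack` may emit a deprecation warning about its implementation; the result of this sketch should be the same either way.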


### 6. Conditional formatting of tables (Dataframes) (VISUALISE)

This section illustrates how we can perform conditional formatting of the tables. As discussed above, we can apply background colour gradients to the data. In the example above, we had a multiindex data type specifying model run, year range and variable in stacked format (long table) and the four data columns specifying results from model and observation as well as bias and RMSE.

Now, in the following we illustrate how we can apply this colour highlighting for the two unstacked tables that we just created and that contain Bias and relative error.

Starting with the Bias data, we show an example that does not work for our purposes (since it only allows for conditional formatting of either rows or columns).

#### 6.1 NOT how we want it (using the style method `background_gradient`) (VIS_WRONG)

The most straightforward example for conditional formatting of a Dataframe is shown in the following. In the example we use the `Bias` table and, similar to the example above, apply a value-based colormap. Here, we use a *diverging colormap (bwr)* which has white as centre colour. Like in the example above, we use the style method `background_gradient`, which can perform the formatting either in a **rowwise** or **columnwise** manner (using input argument `axis=1` or `axis=0`, respectively).

Note, however, that this is not what we are aiming for in this example. Rather, we want the colour formatting to be applied based on the values available in the **whole table** and not individually for **columns** or **rows** (which is done in the next section). Nonetheless, in the cell below we show what we get if we use the method `background_gradient`.

Again, we use the `low` and `high` parameters to specify the colour range that we use to map the values (see above).

In [24]:
bias.style.background_gradient(cmap="bwr", low=0.5, high=0.5, axis=1).highlight_null("white")

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-3.179,1.786,1.744,1.435,2.157,3.947
FLDS_ISCCP,LW down SRF,-5.066,-1.799,10.514,11.499,5.162,4.507
FLNS_ISCCP,LW net SRF,6.394,9.699,6.824,6.847,7.167,6.925
FLUTC_CERES-EBAF,LW up Top Clearsky,-4.268,-0.986,1.039,1.426,-0.786,-1.182
FLUT_CERES-EBAF,LW up Top,-1.426,1.704,2.398,2.928,0.66,-0.832
FSDS_ISCCP,SW down SRF,-1.589,3.2,1.068,1.228,1.216,-1.721
FSNS_ISCCP,SW net SRF,-2.214,2.63,1.07,1.365,-0.017,-2.145
FSNTOAC_CERES,SW net TOA clearsky,-6.703,-5.051,-4.373,-4.183,-5.747,-4.988
FSNTOA_CERES,SW net TOA,-5.015,-0.338,-0.167,0.223,-1.724,-3.937
LHFLX_JRA25,Lat Heat Flux,-0.031,-2.503,-0.009,0.434,-1.695,-2.692


In [25]:
bias

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-3.179,1.786,1.744,1.435,2.157,3.947
FLDS_ISCCP,LW down SRF,-5.066,-1.799,10.514,11.499,5.162,4.507
FLNS_ISCCP,LW net SRF,6.394,9.699,6.824,6.847,7.167,6.925
FLUTC_CERES-EBAF,LW up Top Clearsky,-4.268,-0.986,1.039,1.426,-0.786,-1.182
FLUT_CERES-EBAF,LW up Top,-1.426,1.704,2.398,2.928,0.66,-0.832
FSDS_ISCCP,SW down SRF,-1.589,3.2,1.068,1.228,1.216,-1.721
FSNS_ISCCP,SW net SRF,-2.214,2.63,1.07,1.365,-0.017,-2.145
FSNTOAC_CERES,SW net TOA clearsky,-6.703,-5.051,-4.373,-4.183,-5.747,-4.988
FSNTOA_CERES,SW net TOA,-5.015,-0.338,-0.167,0.223,-1.724,-3.937
LHFLX_JRA25,Lat Heat Flux,-0.031,-2.503,-0.009,0.434,-1.695,-2.692


Now, this worked nicely but there are three problems with this representation:

1. As mentioned above, one problem here is that the colour coding can only be performed row or column wise using the input parameter `axis` (and not based on the values of the whole table, see [here](https://pandas.pydata.org/pandas-docs/stable/style.html#Building-Styles-Summary) for details)
2. If we use the symmetric colormap as is (i.e. centre colour is white), then white is mapped to the midpoint of the considered value range (e.g. min=-2, max=4 => midpoint = (-2 + 4)/2 = 1, so the value 1 is shown in white). However, what we want is a *shifted diverging colormap* that ensures that the value 0 is mapped to white, even if min != -max.
3. Further, we might wish to have control over the number of significant digits that are displayed in the table

All these problems will be solved in the following.

#### 6.2 How we want it (VIS_RIGHT)

In the following, we use a custom display method `my_table_display` (defined in [helper_funcs.py](https://github.com/jgliss/my_py3_scripts/blob/master/notebooks/helper_funcs.py)) in order to perform the colour formatting considering all rows and columns at the same time. Furthermore, it uses a diverging colour map that is dynamically shifted such that the value 0 corresponds to the colour white (method `shifted_color_map`), also if `-vmin != vmax` (as is usually the case).

In [26]:
from helper_funcs import my_table_display
my_table_display(bias)

,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-3.18,1.79,1.74,1.44,2.16,3.95
FLDS_ISCCP,LW down SRF,-5.07,-1.8,10.51,11.5,5.16,4.51
FLNS_ISCCP,LW net SRF,6.39,9.7,6.82,6.85,7.17,6.92
FLUTC_CERES-EBAF,LW up Top Clearsky,-4.27,-0.99,1.04,1.43,-0.79,-1.18
FLUT_CERES-EBAF,LW up Top,-1.43,1.7,2.4,2.93,0.66,-0.83
FSDS_ISCCP,SW down SRF,-1.59,3.2,1.07,1.23,1.22,-1.72
FSNS_ISCCP,SW net SRF,-2.21,2.63,1.07,1.36,-0.02,-2.15
FSNTOAC_CERES,SW net TOA clearsky,-6.7,-5.05,-4.37,-4.18,-5.75,-4.99
FSNTOA_CERES,SW net TOA,-5.01,-0.34,-0.17,0.22,-1.72,-3.94
LHFLX_JRA25,Lat Heat Flux,-0.03,-2.5,-0.01,0.43,-1.7,-2.69
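
To illustrate the idea of table-wide, zero-centred colouring, here is a minimal sketch that does not use `my_table_display` or `shifted_color_map` (their implementation is not shown in this notebook). The helper names `centered_fraction`, `bwr_hex` and `table_wide_css` are hypothetical, the blue-white-red ramp is a simple hand-written stand-in for the *bwr* colormap, and the sketch assumes that the table contains both negative and positive values.

```python
import numpy as np
import pandas as pd

def centered_fraction(values, vmin, vmax, center=0.0):
    """Map values to [0, 1] such that `center` lands on 0.5 (needs vmin < center < vmax)."""
    values = np.asarray(values, dtype=float)
    below = 0.5 * (values - vmin) / (center - vmin)
    above = 0.5 + 0.5 * (values - center) / (vmax - center)
    return np.clip(np.where(values < center, below, above), 0.0, 1.0)

def bwr_hex(fraction):
    """Simple blue-white-red ramp: 0 -> blue, 0.5 -> white, 1 -> red."""
    if fraction <= 0.5:
        r = g = 2.0 * fraction
        b = 1.0
    else:
        r = 1.0
        g = b = 2.0 * (1.0 - fraction)
    return "#{:02x}{:02x}{:02x}".format(*(int(round(255 * c)) for c in (r, g, b)))

def table_wide_css(df):
    """CSS strings for all cells, scaled by the min / max of the WHOLE table."""
    data = df.to_numpy(dtype=float)
    fractions = centered_fraction(data, np.nanmin(data), np.nanmax(data))
    css = [["" if np.isnan(f) else "background-color: " + bwr_hex(f) for f in row]
           for row in fractions]
    return pd.DataFrame(css, index=df.index, columns=df.columns)

# Range from the example above: min=-2, max=4, yet 0 is mapped to white
toy = pd.DataFrame({"A": [-2.0, 0.0], "B": [1.0, 4.0]})
css = table_wide_css(toy)
print(css)
```

In a notebook, `toy.style.apply(table_wide_css, axis=None).format("{:.2f}")` would render the coloured table with two decimals. Newer pandas releases also accept `axis=None` in `background_gradient`, which covers the table-wide scaling but not the shift of the colour map centre.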


Now for the relative RMSE error:

In [27]:
my_table_display(rmse_err_rel)



,Run,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,
CLDTOT_ISCCP,Total cloud cover,-0.09,-0.04,0.04,0.05,0.03,0.0
FLDS_ISCCP,LW down SRF,-0.1,-0.04,0.05,0.1,0.04,-0.05
FLNS_ISCCP,LW net SRF,-0.14,0.07,0.01,0.02,0.04,0.0
FLUTC_CERES-EBAF,LW up Top Clearsky,0.18,-0.07,-0.09,-0.07,0.11,-0.07
FLUT_CERES-EBAF,LW up Top,-0.02,-0.11,0.03,0.07,0.08,-0.05
FSDS_ISCCP,SW down SRF,-0.13,-0.02,0.04,0.05,0.05,0.01
FSNS_ISCCP,SW net SRF,-0.04,-0.05,0.03,0.04,0.04,-0.01
FSNTOAC_CERES,SW net TOA clearsky,0.07,0.01,-0.08,-0.09,0.07,0.02
FSNTOA_CERES,SW net TOA,0.02,-0.1,0.0,0.01,0.02,0.05
LHFLX_JRA25,Lat Heat Flux,0.11,-0.05,-0.03,-0.02,0.01,-0.02


### 7. Concatenate and save results (Bias and relative RMSE error) as table (EXPORT)

In the following, the two result tables ``bias`` and ``rmse_err_rel`` are merged into one result table and then saved both as an Excel table and as a CSV file.

In [28]:
result = pd.concat([bias, rmse_err_rel],axis=1, keys=["Bias", "RMSE relative Error"])
result

,,Bias,Bias,Bias,Bias,Bias,Bias,RMSE relative Error,RMSE relative Error,RMSE relative Error,RMSE relative Error,RMSE relative Error,RMSE relative Error
,Run,A,B,C,C,C,Run4,A,B,C,C,C,Run4
,Years,71-100,1-20,185-215,310-340,80-110,90-120,71-100,1-20,185-215,310-340,80-110,90-120
Variable,Description,,,,,,,,,,,,
CLDTOT_ISCCP,Total cloud cover,-3.179,1.786,1.744,1.435,2.157,3.947,-0.089645,-0.044782,0.044541,0.051455,0.034652,0.003779
FLDS_ISCCP,LW down SRF,-5.066,-1.799,10.514,11.499,5.162,4.507,-0.100193,-0.044087,0.051809,0.099944,0.041161,-0.048633
FLNS_ISCCP,LW net SRF,6.394,9.699,6.824,6.847,7.167,6.925,-0.14211,0.071951,0.010658,0.016106,0.040623,0.002772
FLUTC_CERES-EBAF,LW up Top Clearsky,-4.268,-0.986,1.039,1.426,-0.786,-1.182,0.182079,-0.065215,-0.087909,-0.07304,0.1093,-0.065215
FLUT_CERES-EBAF,LW up Top,-1.426,1.704,2.398,2.928,0.66,-0.832,-0.015463,-0.113989,0.032363,0.072434,0.07703,-0.052375
FSDS_ISCCP,SW down SRF,-1.589,3.2,1.068,1.228,1.216,-1.721,-0.126774,-0.015239,0.038669,0.049568,0.047349,0.006428
FSNS_ISCCP,SW net SRF,-2.214,2.63,1.07,1.365,-0.017,-2.145,-0.039834,-0.045801,0.026338,0.036913,0.035251,-0.012867
FSNTOAC_CERES,SW net TOA clearsky,-6.703,-5.051,-4.373,-4.183,-5.747,-4.988,0.068048,0.00868,-0.075512,-0.090962,0.070825,0.018921
FSNTOA_CERES,SW net TOA,-5.015,-0.338,-0.167,0.223,-1.724,-3.937,0.02065,-0.104744,0.003151,0.005556,0.021231,0.054155
LHFLX_JRA25,Lat Heat Flux,-0.031,-2.503,-0.009,0.434,-1.695,-2.692,0.113433,-0.054982,-0.031062,-0.020107,0.010426,-0.017708
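
The effect of the `keys` argument can be seen on two toy tables (made-up values): each key becomes an additional outermost column level, so the individual tables can be recovered from the merged one.

```python
import pandas as pd

variables = pd.Index(["VAR1", "VAR2"], name="Variable")
bias_toy = pd.DataFrame({"A": [0.1, -0.2], "B": [0.3, 0.4]}, index=variables)
err_toy = pd.DataFrame({"A": [-0.05, 0.02], "B": [0.05, -0.02]}, index=variables)

# Side-by-side concatenation; the keys label the two blocks of columns
combined = pd.concat([bias_toy, err_toy], axis=1,
                     keys=["Bias", "RMSE relative Error"])

# Selecting a key returns the corresponding block again
print(combined["Bias"])
```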


Now save the merged table as an Excel file.

In [29]:
# the context manager saves and closes the file (``writer.save()`` was removed in pandas 2.0)
with pd.ExcelWriter(os.path.join(output_dir, "result.xlsx")) as writer:
    result.to_excel(writer)