# Introduction to the web evaluation setup

## Note

It is highly recommended to check out [setup_and_intro.ipynb](https://github.com/metno/pyaerocom-meetings/blob/master/Feb2021_Workshop/setup_and_intro.ipynb) and make sure everything is in place to use pyaerocom with access to PPI.

## Setting up the configuration for the analysis

In [28]:
import os
import pyaerocom as pya
pya.__version__

'0.10.0'

In the following cells, a complete configuration setup is specified; see the comments for details.

In [31]:
# ID of project
PROJ_ID = 'workshop2021'

# ID of experiment 
EXP_ID = 'tutorial'

# Output directory (where json files are stored)
OUT_BASEDIR = os.path.abspath('.')

# Directory where colocated NetCDF files are stored
COLDATA_BASEDIR = os.path.abspath('./coldata')

In [32]:
stp = pya.web.AerocomEvaluation(proj_id=PROJ_ID, exp_id=EXP_ID, 
                                exp_name='Example configuration for pyaerocom workshop',
                                out_basedir=OUT_BASEDIR,
                                basedir_coldata=COLDATA_BASEDIR)
print(stp)


Pyaerocom AerocomEvaluation
---------------------------
Project ID: workshop2021
Eperiment ID: tutorial
Experiment name: Example configuration for pyaerocom workshop
colocation_settings: (will be updated for each run from model_config and obs_config entry)
  save_coldata: True
  _obs_cache_only: False
  obs_vars: None
  obs_vert_type: None
  model_vert_type_alt: None
  read_opts_ungridded: None
  obs_ts_type_read: None
  model_use_vars: None
  model_add_vars: None
  model_keep_outliers: True
  model_to_stp: False
  model_id: None
  model_name: None
  model_data_dir: None
  obs_id: None
  obs_name: None
  obs_data_dir: None
  obs_keep_outliers: False
  obs_use_climatology: False
  obs_add_meta: []
  gridded_reader_id: {'model': 'ReadGridded', 'obs': 'ReadGridded'}
  start: None
  stop: None
  ts_type: None
  filter_name: None
  remove_outliers: True
  apply_time_resampling_constraints: None
  min_num_obs: None
  resample_how: None
  var_outlier_ranges: None
  var_ref_outlier_ranges: No

The most important settings to define for the analysis are:

- `obs_config`: dictionary of dictionaries containing the observations to be used
- `model_config`: dictionary of dictionaries containing the models to be used
- `colocation_settings`: (see above) most of these can be left untouched; the essential ones are set below
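Both `obs_config` and `model_config` map a display name (the key, as shown in the web interface) to a dictionary of settings. As a minimal sketch, a hypothetical pre-flight check like the one below can catch missing keys before a long run. The helper `missing_keys` and the required-key tuples are assumptions based on the entries used in this notebook, not a list taken from pyaerocom.

```python
# Hypothetical required keys, inferred from the entries used in this notebook
# (not an official pyaerocom list).
REQUIRED_OBS_KEYS = ("obs_id", "obs_vars", "obs_vert_type")
REQUIRED_MODEL_KEYS = ("model_id",)


def missing_keys(config, required):
    """Return {entry name: [missing keys]} for every entry lacking a required key.

    `config` is a dictionary of dictionaries: display name -> settings.
    """
    problems = {}
    for name, entry in config.items():
        if not isinstance(entry, dict):
            # A non-dict entry cannot hold any settings at all.
            problems[name] = list(required)
            continue
        missing = [key for key in required if key not in entry]
        if missing:
            problems[name] = missing
    return problems


obs_example = {
    "Aeronet": {"obs_id": "AeronetSunV3Lev2.daily",
                "obs_vars": ["ang4487aer", "od550aer"]},  # obs_vert_type forgotten
}
model_example = {"EC-Earth": {"model_id": "EC-Earth3-AerChem-met2010_AP3-CTRL2019"}}

print(missing_keys(obs_example, REQUIRED_OBS_KEYS))      # {'Aeronet': ['obs_vert_type']}
print(missing_keys(model_example, REQUIRED_MODEL_KEYS))  # {}
```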

### Observation setup

The `obs_config` entry defines the observations to be used. Below, we define one set of observations: Aeronet (AOD and Angstrom exponent). At the end, this setup is assigned to the evaluation class that we just created.

In [33]:
obs_cfg = {
    # key is name as it appears in web interface, value contains setup 
    'Aeronet' : {
        'obs_id'        : 'AeronetSunV3Lev2.daily', # ID of observation network
        'obs_vars'      : ['ang4487aer', 'od550aer'], # list of variables (Angstrom Exponent, 440-870nm, and AOD at 550 nm)
        'obs_vert_type' : 'Column', # this is needed, choose from Column or Surface
        'obs_filters'   : {'altitude' : [0, 1000]}, # only use stations with altitudes between 0 and 1000 m
        'ignore_station_names' : 'DRAGON*' # skip stations whose names match this wildcard pattern
    }
}

stp['obs_config'] = obs_cfg
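To make the effect of `obs_filters` and `ignore_station_names` concrete, here is a stand-alone sketch on made-up station metadata. It assumes the altitude range is in metres with inclusive bounds and that the station-name pattern uses shell-style wildcards (`*`). pyaerocom's own filtering works on its data objects and may differ in detail, so `select_stations` and the station records are purely illustrative.

```python
from fnmatch import fnmatchcase


def select_stations(stations, altitude_range=(0, 1000), ignore_pattern="DRAGON*"):
    """Return names of stations inside the altitude range whose names do not
    match the ignore pattern (shell-style wildcards, case-sensitive)."""
    low, high = altitude_range
    return [
        station["name"]
        for station in stations
        if low <= station["altitude"] <= high
        and not fnmatchcase(station["name"], ignore_pattern)
    ]


# Made-up station metadata (altitude in metres).
stations = [
    {"name": "Lowland_Site", "altitude": 120.0},
    {"name": "Mountain_Site", "altitude": 2500.0},
    {"name": "DRAGON_Site_01", "altitude": 50.0},
]

print(select_stations(stations))  # ['Lowland_Site']
```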

### Defining models to be used for evaluation

In [34]:
model_cfg = {
    'Aerocom-Median' : {'model_id' : 'AEROCOM-MEDIAN-2x3-GLISSETAL2020-1_AP3-CTRL'},
    'EC-Earth'    : {'model_id' : 'EC-Earth3-AerChem-met2010_AP3-CTRL2019'}
}

stp['model_config'] = model_cfg
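With both dictionaries assigned, the run log further down shows each model being colocated against each observation entry, for every variable listed in its `obs_vars`. Assuming that pairing rule holds in general (this sketch does not model superobs or per-entry variable restrictions), the set of runs can be enumerated as follows; `colocation_runs` is a hypothetical helper, not part of pyaerocom.

```python
from itertools import product


def colocation_runs(model_cfg, obs_cfg):
    """List (model name, obs name, variable) for every model/obs pairing,
    one entry per variable in the obs entry's 'obs_vars'."""
    runs = []
    for model_name, obs_name in product(model_cfg, obs_cfg):
        for var in obs_cfg[obs_name]["obs_vars"]:
            runs.append((model_name, obs_name, var))
    return runs


model_cfg = {
    'Aerocom-Median': {'model_id': 'AEROCOM-MEDIAN-2x3-GLISSETAL2020-1_AP3-CTRL'},
    'EC-Earth': {'model_id': 'EC-Earth3-AerChem-met2010_AP3-CTRL2019'},
}
obs_cfg = {'Aeronet': {'obs_id': 'AeronetSunV3Lev2.daily',
                       'obs_vars': ['ang4487aer', 'od550aer']}}

# 2 models x 1 obs network x 2 variables = 4 runs
for run in colocation_runs(model_cfg, obs_cfg):
    print(run)
```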

### Colocation setup

In the following cell, we define the essential settings for colocating each model / obs / variable combination. Note: these can be overridden in individual model or obs config entries where needed.

In [36]:
DEFAULT_COLOCATION_SETTINGS = dict(
    start = 2010, 
    stop = 2011,
    ts_type = 'daily', # desired output frequency of colocated data objects
    colocate_time = False,
    weighted_stats = True, # only relevant if models are evaluated against gridded satellite data
    apply_time_resampling_constraints = True,
    min_num_obs = pya.const.OBS_MIN_NUM_RESAMPLE,
    reanalyse_existing = False, # relevant for re-runs. If True, pre-existing colocated data files are re-used for computation of json files 
    remove_outliers=True, # remove outliers during colocation
    harmonise_units=True,
    model_keep_outliers=True,   
)

stp.update(**DEFAULT_COLOCATION_SETTINGS)
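The note above says these defaults can be overridden in individual model or obs entries. A minimal sketch of that precedence, assuming an entry's value simply replaces the default for the same key (how pyaerocom resolves a key set in both a model and an obs entry is not covered here). `effective_settings` is a hypothetical helper and the dictionaries are trimmed-down examples.

```python
def effective_settings(defaults, entry):
    """Merge defaults with one config entry (entry values win) and report
    which default values were overridden."""
    merged = {**defaults, **entry}
    overridden = sorted(
        key for key in entry if key in defaults and entry[key] != defaults[key]
    )
    return merged, overridden


defaults = dict(ts_type='daily', remove_outliers=True, harmonise_units=True)

# An obs entry that switches off unit harmonisation and sets its own
# resampling constraint.
obs_entry = dict(obs_id='AeronetSunV3Lev1.5.daily',
                 harmonise_units=False,
                 min_num_obs={'monthly': {'daily': 3}})

merged, overridden = effective_settings(defaults, obs_entry)
print(overridden)          # ['harmonise_units']
print(merged['ts_type'])   # daily
```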

In [37]:
stp.var_mapping = pya.web.web_naming_conventions.VAR_MAPPING

In [None]:
stp.run_evaluation()


Running analysis:
Obs. names: ['Aeronet']
Model names: ['Aerocom-Median', 'EC-Earth']
Remove outliers: True
Harmonise units: True
Delete existing json files before reanalysis: True
Reanalyse existing colocated NetCDF files: False
Run only colocation (no json files computed): False
Raise exceptions if they occur: False

Running colocation of Aerocom-Median against Aeronet
PREPARING colocation of AEROCOM-MEDIAN-2x3-GLISSETAL2020-1_AP3-CTRL vs. AeronetSunV3Lev2.daily
The following variable combinations will be colocated
MODEL-VAR	OBS-VAR
ang4487aer	ang4487aer
od550aer	od550aer
Running AEROCOM-MEDIAN-2x3-GLISSETAL2020-1_AP3-CTRL / AeronetSunV3Lev2.daily (ang4487aer, ang4487aer)
Updating ts_type from daily to monthly (highest available in model AEROCOM-MEDIAN-2x3-GLISSETAL2020-1_AP3-CTRL)
pyaerocom_version is outdated (value: 0.11.0.dev1). Current value: 0.10.0
Deleting outdated cache file: /home/jonasg/MyPyaerocom/_cache/jonasg/AeronetSunV3Lev2.daily_ang4487aer.pkl
Reading AERONET data
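The log above reports that `ts_type` was changed from daily to monthly because monthly is the highest frequency available in the model. Below is a minimal sketch of that decision, together with a `min_num_obs`-style constraint, assuming that `{'monthly': {'daily': N}}` means "require at least N daily values before a monthly mean is formed". The frequency list, the helper names and the use of `None` for rejected months are assumptions for illustration, not pyaerocom's implementation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Illustrative subset of frequencies, ordered from finest to coarsest.
TS_ORDER = ["hourly", "daily", "monthly", "yearly"]


def effective_ts_type(requested, model_finest):
    """Fall back to the model's finest frequency if it is coarser than requested."""
    if TS_ORDER.index(model_finest) > TS_ORDER.index(requested):
        return model_finest
    return requested


def monthly_means(daily_values, min_num):
    """Average (date, value) pairs per (year, month); months with fewer than
    `min_num` values are set to None instead of a mean."""
    groups = defaultdict(list)
    for day, value in daily_values:
        groups[(day.year, day.month)].append(value)
    return {
        month: (mean(values) if len(values) >= min_num else None)
        for month, values in sorted(groups.items())
    }


print(effective_ts_type("daily", "monthly"))  # monthly

daily = [(date(2010, 1, 1), 1.0), (date(2010, 1, 2), 2.0), (date(2010, 1, 3), 3.0),
         (date(2010, 2, 1), 5.0)]
print(monthly_means(daily, min_num=3))  # {(2010, 1): 2.0, (2010, 2): None}
```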

In [None]:
COL_BASE = os.path.abspath('../../coldata')
CUSTOM_MODEL_READ_METHODS = os.path.abspath('../eval_py/cube_read_methods.py')
JSON_OUTPUT_BASEDIR = os.path.abspath('../json/')

# Basic configuration for analysis
BASE_CFG = dict(
    proj_id = PROJ_ID,

    out_basedir = JSON_OUTPUT_BASEDIR,
    coldata_basedir = COL_BASE,
    add_methods_file = CUSTOM_MODEL_READ_METHODS,

    # if True, already existing json files will be deleted before the json
    # files are recomputed
    clear_existing_json = False,

    # if True, the analysis will stop whenever an error occurs (else, errors that
    # occurred will be written into the logfiles)
    raise_exceptions = False,

    var_order_menu = ['od550aer', 'ang4487aer', 'concpm10',
                      'concpm25', 'vmrno2', 'vmro3']
)

DEFAULT_COLOCATION_SETTINGS = dict(
    colocate_time = False,
    weighted_stats=True,
    apply_time_resampling_constraints = True,
    min_num_obs = pya.const.OBS_MIN_NUM_RESAMPLE,
    reanalyse_existing = True,
    remove_outliers=True,
    harmonise_units=True,
    model_keep_outliers=True,
    filter_name = 'WORLD-wMOUNTAINS',
    ts_type = 'daily',
    var_mapping=pya.web.web_naming_conventions.VAR_MAPPING,
)

### General stuff
# resolution when colocating gridded satellite observations with models
REGRID_SAT = dict(lat_res_deg=5,
                  lon_res_deg=5)

### Some info on available classification codes in EEA-NRT (theoretically these should be the same as in GHOST)

# EEA-NRT area codes (key: area_classification): 'rural', 'rural-nearcity',
# 'rural-regional', 'rural-remote', 'suburban', 'urban'
# EEA-NRT station codes (key: station_classification): 'background', 'industrial', 'traffic'

### Define filters for the obs subsets

# BASE FILTERS
ALTITUDE_FILTER = {
    'altitude' : [0, 1000]
    }

GHOST_RURAL_FILTER = {
    'station_classification'  :   ['background'],
    'area_classification'     :   ['rural','rural-near_city',
                                   'rural-regional', 'rural-remote']
    }

EEA_NRT_RURAL_FILTER = {
    'station_classification'  :   ['background'],
    'area_classification'     :   ['rural', 'rural-nearcity',
                                   'rural-regional', 'rural-remote']
    }

# options station_classification: ['rural', 'urban', 'urban_bound']
# options area_classification: ['Agricultural', 'Commercial', 'Forested',
#               'Industrial', 'Residential', 'Undeveloped Rural', 'Unknown']
AIRNOW_RURAL_FILTER = {
    'station_classification'  :   ['rural']
    #'area_classification'     :   ['Agricultural', 'Forested', 'Undeveloped Rural']
    }

# OBS SPECIFIC FILTERS (combination of the above and more)
GHOST_BASE_FILTER = {
    'set_flags_nan' : True,
    }

# How to read auxiliary model variables (that cannot be read but need to be
# computed). Entry under "fun" needs to be defined in ../eval_py/cube_read_methods.py
MODEL_AUX_VARS = {
    'vmro3' : dict(
        vars_required=['mmro3'],
        fun='mmr_to_vmr'),
    'vmrno2' : dict(
        vars_required=['mmrno2'],
        fun='mmr_to_vmr')
    }

# Setup for models used in analysis
MODELS = {
    'IFS-OSUITE'        :   dict(model_id='ECMWF_OSUITE',
                             model_read_aux=MODEL_AUX_VARS),
    'IFS-OSUITE-96h'    :   dict(model_id='ECMWF_OSUITE_96H',
                             model_read_aux=MODEL_AUX_VARS),
    'IFS-CTRL'    :   dict(model_id='ECMWF_CNTRL',
                             model_read_aux=MODEL_AUX_VARS)
}

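# Setup for available ground-based observations (ungridded)
# NOTE: OBS_ACCESS (a dict mapping keys such as 'EEA_NRT_LOCAL' to local data
# directories) is used below but not defined in this cell; it must exist beforehand.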

VAR_OUTLIER_RANGES = {
    'concpm10' : [-1, 5000], # ug m-3
    'concpm25' : [-1, 5000], # ug m-3
    'vmrno2'   : [-1, 5000], # ppb
    'vmro3'    : [-1, 5000] # ppb
    }

OBS_GROUNDBASED = {

    'AeronetL1.5-d'     :   dict(obs_id='AeronetSunV3Lev1.5.daily',
                                 obs_vars=['ang4487aer', 'od550aer'],
                                 obs_vert_type='Column',
                                 ignore_station_names='DRAGON*',
                                 obs_filters=ALTITUDE_FILTER,
                                 min_num_obs = {'monthly': {'daily': 3}}),

    'AN-EEA-MP'         : dict(is_superobs = True,
                               obs_id = ('AirNow', 'EEA-NRT-rural', 'MarcoPolo'),
                               obs_vars = ['concpm10', 'concpm25',
                                           'vmro3', 'vmrno2'],
                               obs_vert_type = 'Surface',
                               var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                               ),

    'EEA-NRT-rural'     :   dict(obs_id='EEAAQeRep.NRT',
                                 obs_data_dir=OBS_ACCESS['EEA_NRT_LOCAL'],
                                 obs_vars = ['concpm10', 'concpm25',
                                             'vmro3', 'vmrno2'],
                                 obs_vert_type='Surface',
                                 var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                                 obs_filters={**ALTITUDE_FILTER,
                                              **EEA_NRT_RURAL_FILTER}
                                 ),

    'G-EEA-rural'        :   dict(obs_id='GHOST.EEA.daily',
                                obs_data_dir=OBS_ACCESS['GHOST_LOCAL_EEA_DAILY'],
                                obs_vars=['concpm10','concpm25', 'vmro3',
                                          'vmrno2'],
                                obs_vert_type='Surface',
                                var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                                obs_filters= {**ALTITUDE_FILTER,
                                              **GHOST_BASE_FILTER,
                                              **GHOST_RURAL_FILTER}),
    'AirNow-rural'            :   dict(obs_id = 'AirNow',
                                 obs_data_dir=OBS_ACCESS['AIRNOW_LOCAL'],
                                 obs_vars = ['concpm10', 'concpm25',
                                             'vmro3', 'vmrno2'],
                                 var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                                 obs_filters={**ALTITUDE_FILTER,
                                              **AIRNOW_RURAL_FILTER},
                                 obs_vert_type='Surface'
                                 ),

    'AirNow'            :   dict(obs_id = 'AirNow',
                                 obs_data_dir=OBS_ACCESS['AIRNOW_LOCAL'],
                                 obs_vars = ['concpm10', 'concpm25',
                                             'vmro3', 'vmrno2'],
                                 var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                                 obs_filters={**ALTITUDE_FILTER},
                                 #only_superobs=True,
                                 obs_vert_type='Surface'
                                 ),

    'MarcoPolo'         :   dict(obs_id = 'MarcoPolo',
                                 obs_data_dir=OBS_ACCESS['MARCOPOLO_LOCAL'],
                                 obs_vars = ['concpm10', 'concpm25',
                                             'vmro3', 'vmrno2'],
                                 var_ref_outlier_ranges=VAR_OUTLIER_RANGES,
                                 obs_vert_type='Surface'
                                 # NO ALTITUDE INFO AVAILABLE -> NO ALTITUDE FILTERING POSSIBLE
                                 )
}


# Setup for supported satellite evaluations
OBS_SAT = {
# =============================================================================
#     'MODIS6.1-Tr'      :   dict(obs_id='MODIS6.1terra.DT.DP.mean',
#                                obs_vars=['od550aer'],
#                                obs_vert_type='Column',
#                                regrid_res_deg=REGRID_SAT,
#                                harmonise_units=False,
#                                remove_outliers=True,
#                                var_ref_outlier_ranges={'od550aer' : [0.01, 10]})
# =============================================================================
}

OBS_CFG = {
    **OBS_GROUNDBASED,
    **OBS_SAT
    }
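`MODEL_AUX_VARS` computes `vmro3` and `vmrno2` from mass mixing ratios via a function `mmr_to_vmr` that has to be defined in `../eval_py/cube_read_methods.py`, which is not shown here. The underlying conversion is `vmr = mmr * M_air / M_species`. The sketch below applies it to plain floats and returns ppb, the unit noted in `VAR_OUTLIER_RANGES`. The function name `mmr_to_vmr_ppb`, the molar-mass table and the assumption that model mass mixing ratios are in kg/kg are hypothetical; the real function in `cube_read_methods.py` operates on whatever data objects pyaerocom passes to it.

```python
# Molar masses in g/mol; the ratio M_air / M_species is dimensionless.
M_AIR = 28.9644  # dry air
MOLAR_MASS = {
    "o3": 47.997,
    "no2": 46.0055,
}


def mmr_to_vmr_ppb(mmr, species):
    """Convert a mass mixing ratio (kg/kg) to a volume mixing ratio in ppb."""
    vmr = mmr * M_AIR / MOLAR_MASS[species]  # mol/mol
    return vmr * 1e9  # mol/mol -> ppb (nmol/mol)


# Round trip: the mass mixing ratio that corresponds to 40 ppb of ozone.
mmr_o3 = 40e-9 * MOLAR_MASS["o3"] / M_AIR
print(round(mmr_to_vmr_ppb(mmr_o3, "o3"), 6))  # 40.0
```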