# Trend Evaluation
In this tutorial we show how to run trend evaluations and describe the two types of trends AeroTools can evaluate.

It is assumed that you have run evaluations before, so we do not go through the configuration in detail and only discuss the parts that matter for trends. We start by defining a typical configuration. We then look at the new options needed for trend calculations: what they are, and some details about how they work under the hood. Before running the evaluation, we discuss the two types of trends that are calculated. We also discuss some of the issues you may encounter while calculating trends.

## The config

Here we define a fairly standard config for running an evaluation with EBAS and EMEP.

In [1]:
output_dir = "<output_dir>"  # placeholder: replace with the path to your output directory
exp_pi = "<your name>"  # placeholder: replace with your name

data_dir = output_dir + "/data"
coldata_dir = output_dir + "/coldata"
proj_id = "tutorial"
exp_id = "emep-trends"

In [2]:
DEFAULT_RESAMPLE_CONSTRAINTS = dict(
    yearly=dict(monthly=9),
    monthly=dict(
        daily=1,
        weekly=1,
    ),
    daily=dict(hourly=1),
)

In [None]:
CFG = dict(
    # Output directories
    json_basedir=data_dir,
    coldata_basedir=coldata_dir,

    # Run options
    reanalyse_existing=True,        # if True, existing colocated data files will be deleted
    raise_exceptions=True,          # if True, the analysis will stop whenever an error occurs 
    clear_existing_json=False,      # if True, deletes previous output before running

    # Map Options
    add_model_maps=False,           # Adds a plot of the whole map. Very slow!!!
    only_model_maps=False,          # Adds only plot above, without any other evaluation
    filter_name="ALL-wMOUNTAINS",   # Regional filter for analysis
    map_zoom="Europe",              # Zoom level. For EMEP, Europe is typically used
    regions_how="country",          # Calculates statistics for different regions. Typically "country" is used, but that does not work for satellite data

    # Time and Frequency Options
    ts_type="monthly",              # Colocation frequency (no statistics in higher resolution can be computed)
    freqs=["monthly", "yearly"],    # Frequencies that are evaluated
    main_freq="monthly",            # Frequency that is displayed when opening webpage
    periods=["2000-2010"],          # List of years or periods of years that are evaluated. E.g. "2005" or "2001-2020"

    # Statistical Options
    min_num_obs=DEFAULT_RESAMPLE_CONSTRAINTS,
    obs_remove_outliers=False,
    model_remove_outliers=False,
    colocate_time=True,
    zeros_to_nan=False,
    weighted_stats=True,
    annual_stats_constrained=True,
    harmonise_units=True,
    resample_how={"vmro3max": {"daily": {"hourly": "max"}}}, # How to handle Ozone. Used all the time in EMEP

    # Experiment Metadata
    exp_pi=exp_pi,
    proj_id=proj_id,
    exp_id=exp_id,
    exp_name="Evaluation of EMEP data, with trends",
    exp_descr=("Evaluation of EMEP data, with trends"),
    public=True,
)


Notice that this config covers more than one year, using the period *2000-2010*. A trend evaluation normally needs multiple years.

We then define the EBAS observations.

In [None]:
BASE_FILTER = {
    "latitude": [30, 82],
    "longitude": [-30, 90],
    "altitude": [-200, 5000],
}

EBAS_FILTER = {
    **BASE_FILTER,
    "data_level": [None, 2],
    "set_flags_nan": True,
}

CFG["obs_cfg"] = {
    "EBAS-tc": dict(
        obs_id="EBASMC",
        web_interface_name="EBAS",
        obs_vars=[
            "concpm10",
            "concpm25",
        ],
        obs_vert_type="Surface",
        colocate_time=True,
        ts_type="monthly",
        obs_filters=EBAS_FILTER,
    )
}

We then add an EMEP model with data for our defined period.

In [None]:
# Directory where the EMEP data is found
folder_EMEP = "/lustre/storeB/project/fou/kl/emep/ModelRuns/2021_REPORTING/TRENDS/"

# Setup for models used in analysis
CFG["model_cfg"] = {
    "EMEP": dict(
        model_id="EMEP",
        model_data_dir=folder_EMEP,
        gridded_reader_id={"model": "ReadMscwCtm"},
        model_use_vars={"vmro3": "vmro3max"},
        model_rename_vars={"vmro3max": "vmro3"},
        model_ts_type_read="daily",
    ),
}

## Adding the Trend Options
We now have a standard configuration with enough years to calculate a trend. To enable the trend calculation, we need a few more options:

In [None]:
CFG.update(
    dict(
        add_trends=True,
        avg_over_trends=True,
        obs_min_yrs=7,
        stats_min_yrs=7,
        sequential_yrs=False,
    )
)

We can then go over them one by one:

- `add_trends`: This simply enables trend calculations.
- `avg_over_trends`: This enables a second way of calculating trends. We will look closer at the two different methods below. It defaults to False, but should be enabled.
- `obs_min_yrs`: Trends should normally only be calculated for stations that have enough *valid* years. A valid year is a year where each season has at least one data point. This option tells AeroTools to remove stations with fewer valid years than the given value, here 7 years. It defaults to 0, but should be set to 70-75% of your period.
- `stats_min_yrs`: Similar to the option above, but tells AeroTools how many valid years are needed for the trends to be calculated. It should be the same as `obs_min_yrs`.
- `sequential_yrs`: Whether or not the valid years have to be sequential. The default is False.
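
The valid-year rule and the minimum-years check described above can be sketched in plain Python. This is only an illustration, not the AeroTools implementation: the helpers `valid_years` and `has_enough_years` are hypothetical, seasons are assumed to be DJF/MAM/JJA/SON assigned by calendar month within the same calendar year, and "sequential" is assumed to mean one unbroken run of valid years.

```python
from collections import defaultdict

# Assumption: meteorological seasons, assigned by calendar month within the
# same calendar year (December is not moved to the following year).
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def valid_years(months_with_data):
    """Return the sorted years in which all four seasons have data.

    months_with_data: iterable of (year, month) tuples that hold a value.
    """
    seasons_seen = defaultdict(set)
    for year, month in months_with_data:
        seasons_seen[year].add(SEASON_OF_MONTH[month])
    return sorted(y for y, seasons in seasons_seen.items() if len(seasons) == 4)


def has_enough_years(years, min_yrs, sequential=False):
    """Check a station against a minimum number of valid years.

    Assumption: with sequential=True, the valid years must contain one
    unbroken run of at least min_yrs years.
    """
    if not sequential:
        return len(years) >= min_yrs
    longest = run = 0
    previous = None
    for year in sorted(years):
        run = run + 1 if previous is not None and year == previous + 1 else 1
        longest = max(longest, run)
        previous = year
    return longest >= min_yrs


# Hypothetical station: full monthly data except 2004, which only has summer data
full_years = (2000, 2001, 2002, 2003, 2005, 2006, 2007, 2008)
station = [(y, m) for y in full_years for m in range(1, 13)]
station += [(2004, 6), (2004, 7)]

years = valid_years(station)
print(years)
print(has_enough_years(years, 7))
print(has_enough_years(years, 7, sequential=True))
```

In this made-up example the station has eight valid years, so it passes a minimum of 7, but the gap in 2004 splits them into two runs of four, so it would fail under the assumed sequential rule.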

## The Two Ways of Calculating Trends

When calculating trends over more than one station, such as the trend for a country or Europe, there are two ways of doing this.

The first way is to calculate the average time series over a region of multiple stations, and then calculate the trend of that series. This is the old way of doing things, which has been in pyaerocom since the start.

The second way is to calculate the trends for all the stations in the region, and then take the average of the slopes and intercepts of those trends, which then defines the regional trend. This is the new way of calculating trends, and is used when `avg_over_trends` is enabled. Note that the first method is still calculated when this option is enabled.
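
The difference between the two methods can be sketched with a small example. This is only an illustration under stated assumptions: the station data is made up, the helpers `fit_line` and `mean_of_available` are hypothetical, and an ordinary least-squares line is used as a stand-in for whatever trend estimator AeroTools actually applies.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_of_available(values):
    """Average the values that are not missing (None)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


years = list(range(2000, 2011))
t = [y - 2000 for y in years]  # years since 2000, so the intercept is the value in 2000

# Hypothetical yearly means for three stations; station C only reports from 2005
stations = {
    "A": [20.0 - 0.5 * ti for ti in t],
    "B": [10.0 - 0.2 * ti for ti in t],
    "C": [None if ti < 5 else 15.0 + 0.1 * ti for ti in t],
}

# Method 1: average the time series over the region, then fit one trend
regional_series = [mean_of_available(vals) for vals in zip(*stations.values())]
slope_1, intercept_1 = fit_line(t, regional_series)

# Method 2: fit a trend per station, then average the slopes and intercepts
fits = []
for series in stations.values():
    points = [(ti, v) for ti, v in zip(t, series) if v is not None]
    fits.append(fit_line([p[0] for p in points], [p[1] for p in points]))
slope_2 = sum(s for s, _ in fits) / len(fits)
intercept_2 = sum(i for _, i in fits) / len(fits)

print(f"method 1: slope={slope_1:.4f}, intercept={intercept_1:.4f}")
print(f"method 2: slope={slope_2:.4f}, intercept={intercept_2:.4f}")
```

For a least-squares fit on complete data the two methods give the same line. They diverge here because station C joins halfway through the period, which changes the composition of the regional average in the first method.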

Some notes on the second way: both the mean and the median of the station trends are calculated, so you can choose which statistic to use. One downside of taking the mean/median of the trends is that we also need a way to calculate a p-value for the new regional trend. Rather than taking the mean/median of the station p-values, we use the [harmonic mean](https://en.wikipedia.org/wiki/Harmonic_mean) of the station p-values as the regional p-value.
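
As a minimal sketch of this aggregation step, the snippet below combines made-up per-station slopes and p-values using Python's standard `statistics` module. The numbers and variable names are hypothetical, and the snippet only illustrates the arithmetic, not how AeroTools stores or computes these values.

```python
from statistics import harmonic_mean, mean, median

# Hypothetical per-station trend results: (slope, p-value)
station_trends = [(-0.50, 0.01), (-0.20, 0.04), (0.10, 0.30)]

slopes = [slope for slope, _ in station_trends]
p_values = [p for _, p in station_trends]

# Both statistics of the station slopes are computed, so either can be shown
regional_slope_mean = mean(slopes)
regional_slope_median = median(slopes)

# The regional p-value is the harmonic mean of the station p-values
regional_p = harmonic_mean(p_values)

print(regional_slope_mean, regional_slope_median, regional_p)
```

The harmonic mean is never larger than the arithmetic mean and is pulled towards the smallest values, so a few stations with very low p-values weigh heavily in the regional p-value.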

## Running the Evaluation

We can now run this evaluation. A pure Python script for this evaluation can be found [here](./scripts/trends-evaluation.py). You will notice that it takes some time to run; we will talk about that below. [Here](https://aeroval-test.met.no/danielh/pages/evaluation/?project=tutorial&experiment=emep-trends&statistic=obs%2Fmod_trend&station=ALL#) you can find a ready-made version of the evaluation.

### The Interface

<img src="./trend-interface.png" alt="Trend interface" style="width:800px;"/>

Here we can see how the interface looks with the new trends. On the right (in the circle) is the option to switch between the different methods for calculating the trends.

## Things to think about

As we saw above, the evaluation was quite slow. The reason is that we need multiple years of EMEP model data to see a trend. A lot of data has to be read, so the evaluation is slow and uses a lot of memory.