# Running PyAeroval part 2

This lecture builds on the previous one. We first look at more advanced options and features, then see how to run PyAeroval from a standard config, and finally show how to make your own local PyAerocom environment.

## Advanced Options

Here we will look at some options you can add. You don't need to use all the features listed below, and can instead use what you need for your project.

Note that you will not be able to run the Python code in this part. If you want to test it, copy the code snippets into your own config and try them there.



### Adding other observations

In the last tutorial we saw how to add AERONET observations. For air pollutants such as O3, NOx, SOx and PM, EEA and EBAS are more commonly used. They are added simply by defining observations with their respective `obs_id`:

In [None]:
EEA = dict(
    obs_id="EEAAQeRep.v2",          # ID of EEA in AeroTools
    obs_vars=[
        "concpm10",                 # Variables to be used
    ],
    web_interface_name="EEA-rural", # The name which is shown on the web interface
    obs_vert_type="Surface",
    ts_type="monthly",              # Frequency of read observations. Evaluation can not be finer than this, for this network
    obs_filters=EEA_FILTER,         # Filters we make below
)

EBAS = dict(
    obs_id="EBASMC",                # ID of EBAS in AeroTools
    web_interface_name="EBAS-m",    # The name which is shown on the web interface
    obs_vars=[
        "concpm10",                 # Variables to be used
    ],
    obs_vert_type="Surface",        # Observation level
    ts_type="monthly",              # Frequency of read observations. Evaluation can not be finer than this, for this network
    obs_filters=EBAS_FILTER,        # Filters we make below
)


# Add the obs to the config

OBS_CFG = {
    "EBAS-m": EBAS,
    "EEA-m": EEA
}

CFG["obs_cfg"] = OBS_CFG

### Adding filters

For most observations we want to filter out some stations based on certain requirements. The observation definitions above already reference `obs_filters`, so we need to create those filters. Normally this is done before defining the observations, so that the filter dictionaries exist when they are referenced.

The most general, and most used filters, are longitude, latitude and altitude. Below we can see the typical values we use for Europe. This is our base filter.

EBAS needs some extra filters, such as the data level and whether or not to set flagged data to NaN. The EBAS filter consists of these options together with the base filter.

EEA expects other filters, which describe where the stations are located, e.g. whether they are rural or near cities. These filters are added to the base filter to make the EEA filter dictionary.

In [None]:
EEA_RURAL_FILTER = {
    "station_classification": ["background"],
    "area_classification": [
        "rural",
        "rural-nearcity",
        "rural-regional",
        "rural-remote",
    ],
}

BASE_FILTER = {
    "latitude": [30, 82],
    "longitude": [-30, 90],
    "altitude": [-200, 5000],
}

EBAS_FILTER = {
    **BASE_FILTER,
    "data_level": [None, 2],
    "set_flags_nan": True,
}

EEA_FILTER = {
    **BASE_FILTER,
    **EEA_RURAL_FILTER,
}
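The filters above are combined with Python's `**` dictionary unpacking. The following is a minimal sketch of how that merging behaves, using plain dictionaries only. The names `base_filter`, `ebas_like_filter` and `narrow_filter` are hypothetical and only serve as illustration; they are not part of PyAerocom.

```python
# Hypothetical stand-ins for the filter dictionaries used in this section.
base_filter = {
    "latitude": [30, 82],
    "longitude": [-30, 90],
}

# Unpacking copies every key of base_filter into the new dictionary,
# then the extra keys are added on top.
ebas_like_filter = {
    **base_filter,
    "set_flags_nan": True,
}

# A key given after the unpacking overrides the value from base_filter.
narrow_filter = {
    **base_filter,
    "latitude": [40, 70],
}

print(ebas_like_filter)
print(narrow_filter)
print(base_filter)  # the base dictionary itself is left unchanged
```

Because each merged dictionary is a new object, changing one network's filter afterwards does not affect the base filter or the other networks.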

### Adding EMEP

In the last tutorial we looked at how to add Aerocom3 compliant model data. Those working on air pollution will in most cases use model data from our own model EMEP. We will now see how to add EMEP data as well. 

Due to design decisions made when the EMEP reader was written years ago, your EMEP data must follow a specific naming and folder scheme. This scheme lets the EMEP reader work out both the time period and the frequency of the data.

The requirements are:

1. Each file holds at most one year of data and is named `Base_<frequency>.nc`.
2. These files are placed in a folder whose name is the year.
3. For multiple years, there is one folder per year, named after that year.
4. All the year folders can be placed in a folder with a name of your choice.

```
customfolder/
├── 2015/
│   ├── Base_day.nc
│   ├── Base_month.nc
│   └── Base_hour.nc
└── 2016/
    ├── Base_day.nc
    ├── Base_month.nc
    └── Base_hour.nc
```
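Before pointing PyAeroval at a folder, it can help to check that it follows the layout above. The following is a minimal sketch using only the standard library. The helper `find_emep_files` is hypothetical and not part of PyAerocom; it only inspects folder and file names and does not open the NetCDF files.

```python
import re
import tempfile
from pathlib import Path


def find_emep_files(base_dir):
    """Return {year: [frequencies]} for year folders containing Base_<frequency>.nc files."""
    layout = {}
    for year_dir in sorted(Path(base_dir).iterdir()):
        # Only folders whose name is a four-digit year are considered
        if not (year_dir.is_dir() and re.fullmatch(r"\d{4}", year_dir.name)):
            continue
        # Strip the "Base_" prefix from the file stem to get the frequency
        freqs = sorted(p.stem[len("Base_"):] for p in year_dir.glob("Base_*.nc"))
        layout[int(year_dir.name)] = freqs
    return layout


# Demonstration on a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for year, freq in [("2015", "day"), ("2015", "month"), ("2016", "hour")]:
        folder = Path(tmp) / year
        folder.mkdir(exist_ok=True)
        (folder / f"Base_{freq}.nc").touch()
    demo_layout = find_emep_files(tmp)

print(demo_layout)
```

A year that is missing from the result, or a year with an empty frequency list, suggests that the folder or file names do not match the scheme.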

We can now use the base folder to read the data:

In [None]:
folder_EMEP = "/lustre/storeB/project/fou/kl/emep/People/danielh/projects/pyaerocom/workshop/emep/mod/2020"
# folder_EMEP = "/path/to/your/emep/data"  # <-- CHANGE THIS TO YOUR LOCAL PATH

EMEP = dict(
        model_id="EMEP",
        model_data_dir=folder_EMEP,
        gridded_reader_id={"model": "ReadMscwCtm"}, # Tells pyaerocom to use the EMEP reader instead of the default aerocom reader
    )


MODELS = {
    # Other models can go here  
    "EMEP": EMEP,
}

CFG["model_cfg"] = MODELS

We can see that our model entry differs a bit from when we read the Aerocom compliant model data: We use `gridded_reader_id={"model": "ReadMscwCtm"}` to tell the program that we are now reading EMEP data.

### Using Pyaro

For observational networks like EBAS and EEA, PyAerocom has built-in readers, but we are gradually moving away from these to [Pyaro](https://github.com/metno/pyaro), an interface for writing readers compatible with PyAerocom. This means that anyone can write their own reader, even if they are not part of the AeroTools dev team. Pyaro requires more options to run, but in return the readers are more flexible and will in many cases read data more efficiently.

We start by making a `PyaroConfig`. This is in essence just a dictionary with options for the Pyaro reader, but wrapped in Pydantic.

In [None]:
from pyaerocom.io.pyaro.pyaro_config import PyaroConfig

reader_id = "eeareader"
url = "/lustre/storeB/project/aerocom/aerocom1/AEROCOM_OBSDATA/EEA-AQDS/download"

config = PyaroConfig(
    name="pmf",
    reader_id=reader_id,
    filename_or_obj_or_url=url,
    filters={
        "time_bounds": {
            "startend_include":[("2010-01-01 00:00:00", "2011-01-01 00:00:00")], # Include data between these time bounds
        }
    },
    
)

This may seem like adding another level of complexity on top of the already complex PyAeroval configuration, but it makes developing new observational readers much easier. The options in this configuration are:

- reader_id: name of the reader needed to read the data
- name: unique name chosen by the user. Readers with the same *reader_id* might have to read from different sources, so a unique name is needed
- filename_or_obj_or_url: the aptly named path to where the reader can find the data
- filters: there are multiple filters in Pyaro. The most important are *variable/include*, *variable/exclude*, and *time_bounds*. See the [docs](https://pyaro.readthedocs.io/latest/) for more on filters

We can now add this to an observation config:

In [None]:
PYARO = dict(
        obs_id=config.name,                                     # Must be set to the name found in the config
        pyaro_config=config,                                    # The pyaro config
        web_interface_name="Pyaro-m",                           # Name that is displayed on the webpage
        obs_vars=["concpm10"],                                  # List of variables that is to be evaluated
        obs_vert_type="Surface",                                # Observation level
        ts_type="monthly",                                      # Frequency of read observations. Evaluation can not be finer than this, for this network
    )


OBS_CFG = {
    "Pyaro-m": PYARO,
    # Other observation networks can go here
}

CFG["obs_cfg"] = OBS_CFG

We see that this is quite similar to our other observation configs, the only differences being `obs_id` and `pyaro_config`. Instead of defining the network with the `obs_id`, as we do for the other configs, we use `pyaro_config` to define it. Note that for PyAeroval to keep track of this observation, `obs_id` must be the same as the **name** defined in the Pyaro config. Reading the name from the config, as done above with `obs_id=config.name`, is the easiest and safest way.

## Running a Standardized Config

We often want to run a standardized config to make sure that everyone is comparing the same things. PyAeroval makes it simple to import the config we use for the yearly report. The recommended practice is to import this configuration and then make any necessary changes, such as modifying models, observations, or global options, to suit your needs.

In [None]:
from pyaerocom import const
from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
from pyaerocom.aeroval.config.emep.reporting_base import get_CFG

CFG = get_CFG(
    reportyear=2025, # Not that important
    year=2020, # Year you are evaluating
    model_dir=folder_EMEP, # Path to your EMEP data, e.g. folder_EMEP from the EMEP section above
)

This will create a rather large config. You are free to edit it as you see fit before running it. Some examples are listed below. The first example, changing the project and experiment IDs, is good practice and may be necessary.

In [None]:

CFG.update(                                                     # Updates the global options
    dict(
        proj_id="Workshop",
        exp_id="Workshop",
        exp_pi="Great Scientist",
    )
)


for obs in list(CFG["obs_cfg"].keys()):                         # Removes all EEA observations
    if obs.startswith("EEA"):
        del CFG["obs_cfg"][obs]
        print(f"removed {obs}")

CFG["obs_cfg"]["EBAS-d-tc"]["obs_vars"].append("vmrisop")       # Adds vmrisop to EBAS observations
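The loop above iterates over `list(CFG["obs_cfg"].keys())` rather than over the dictionary itself, because deleting entries from a dictionary while iterating over it directly raises a `RuntimeError`. The following is a minimal sketch of the same editing pattern on a small stand-in dictionary. The dictionary `demo_cfg` and the EEA entry names in it are made up for illustration and are not the real output of `get_CFG`.

```python
# Hypothetical stand-in for the config; the real one is much larger.
demo_cfg = {
    "obs_cfg": {
        "EEA-d-rural": {"obs_vars": ["concpm10"]},
        "EEA-h-rural": {"obs_vars": ["conco3"]},
        "EBAS-d-tc": {"obs_vars": ["concpm10"]},
    }
}

# list(...) takes a snapshot of the keys, so deleting inside the loop is safe
for name in list(demo_cfg["obs_cfg"]):
    if name.startswith("EEA"):
        del demo_cfg["obs_cfg"][name]

# Guard against adding the same variable twice if the cell is run again
obs_vars = demo_cfg["obs_cfg"]["EBAS-d-tc"]["obs_vars"]
if "vmrisop" not in obs_vars:
    obs_vars.append("vmrisop")

print(demo_cfg)
```

The membership check before `append` is a design choice for notebooks, where a cell is easily run more than once.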

## Making Your Own AeroTools Environment

When doing development or research it is often wise to pin the version of PyAerocom that is used. One way of doing this is to create your own environment with your own installation of PyAerocom. This way the version of PyAerocom is frozen, and you can even choose which branch to use (for example, if one of the developers has made a new feature that you want to test).

To do this:

1. Go to where you keep your projects, locally or on PPI, and make a folder for the new project
2. In that folder, make an `env` folder for the PyAerocom environment and an `evaluations` folder for your configs
3. In the `env` folder, clone PyAerocom from GitHub, then make a new environment using Python from the module
4. Activate the new environment, update pip, then install the cloned PyAerocom with pip
5. In the `evaluations` folder you can now run configs with your own version of PyAerocom

In [None]:
%%bash

# 1.
cd <your project directory> 
mkdir my_project
cd my_project

# 2.
mkdir env evaluations

# 3.
cd env
git clone git@github.com:metno/pyaerocom.git
module load /modules/MET/rhel8/user-modules/fou-kl/aerotools/aerotools
python -m venv pya_dev

# 4.
module purge # optional, but recommended to avoid conflicts
source pya_dev/bin/activate
pip install --upgrade pip
cd pyaerocom
pip install -e .

# 5.
cd ../../evaluations
python <your_config>.py


The next time you want to use this environment, go to the `evaluations` folder, activate your environment, and run your config:

In [None]:
%%bash

cd <your project directory>/my_project/evaluations
source ../env/pya_dev/bin/activate
python <your_config>.py
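To confirm that a config really runs against your own installation, you can check which Python interpreter is active and where a package would be imported from. The following is a minimal sketch using only the standard library. The helper `package_location` is hypothetical; with an editable install (`pip install -e .`) the reported path is expected to point into the cloned `pyaerocom` folder.

```python
import importlib.util
import sys


def package_location(name):
    """Return the file a top-level package would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# The interpreter path should lie inside the environment folder (pya_dev)
print("Python executable:", sys.executable)
print("pyaerocom location:", package_location("pyaerocom"))
```

If the location is `None`, or points somewhere other than your clone, the environment is most likely not activated.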

## Features not covered here

There are more features in PyAeroval that are not covered here:

- Model maps
- Adding a yaml file to omit certain stations
- Trend calculations
- Fairmode statistics