# Checking pyaerocom installation and access to data

## Import pyaerocom

It all begins with an import:

In [None]:
import pyaerocom as pya
import logging
logging.getLogger().setLevel(logging.ERROR)  # Set the level of information pyaerocom outputs to the console
pya.__version__

On import, pyaerocom automatically checks access to several default data locations (e.g. the mount to PPI at MET Norway, or ~/MyPyaerocom/data). If a data source location is detected, the associated data directories for accessing these data are set up (details below).
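
To make the idea of "checking access to default data locations" concrete, here is a minimal sketch using only the standard library. It is not pyaerocom's actual implementation: the helper name `find_accessible_dirs` is hypothetical, and the demo uses a temporary directory rather than real data locations.

```python
import os
import tempfile


def find_accessible_dirs(candidates):
    """Return the candidate paths that exist as directories (hypothetical helper)."""
    found = []
    for path in candidates:
        expanded = os.path.expanduser(path)  # resolve a leading "~"
        if os.path.isdir(expanded):
            found.append(expanded)
    return found


# Demo with one directory that exists and one that does not.
with tempfile.TemporaryDirectory() as existing:
    missing = os.path.join(existing, "does-not-exist")
    accessible = find_accessible_dirs([existing, missing])

print(accessible)
```

Only the existing directory ends up in the result; a location that cannot be reached is simply skipped.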

## Check available datasets and directories

Accessible data and default paths for certain datasets are available (and can be updated) via the `const` module.

In [None]:
pya.const

### Data search directories

In [None]:
pya.const.DATA_SEARCH_DIRS

This list contains all directories in which pyaerocom searches for model and observation data. Searching can be done using `browse_database`; if nothing can be found for a given query, an exception is raised. Let's try to find some data from the [TM5](http://tm5.sourceforge.net/) chemistry transport model:

In [None]:
from pyaerocom.io.utils import browse_database

try:
    browse_database('*TM5*')
except pya.exceptions.DataSearchError as e:
    print(e)

When I run this, I see multiple data sources, but you might not see any. That is because I already have access to a lot of data, while you, being new to this, do not have any yet. This means that we need some data we can all use!
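
For intuition, a wildcard search over a list of search directories can be sketched with the standard library as shown here. This is only an illustration of the idea and not how pyaerocom implements the search: the helper `find_datasets` and the directory names in the demo are made up.

```python
import fnmatch
import os
import tempfile


def find_datasets(search_dirs, pattern):
    """Return names of sub-directories matching a shell-style wildcard (hypothetical helper)."""
    matches = []
    for base in search_dirs:
        if not os.path.isdir(base):
            continue  # skip search directories that are not accessible
        for name in sorted(os.listdir(base)):
            is_dir = os.path.isdir(os.path.join(base, name))
            if is_dir and fnmatch.fnmatch(name, pattern):
                matches.append(name)
    return matches


with tempfile.TemporaryDirectory() as base:
    # Made-up dataset directories for the demo.
    for name in ("TM5-example", "OTHER-example"):
        os.mkdir(os.path.join(base, name))
    hits = find_datasets([base], "*TM5*")
    no_hits = find_datasets([base], "*NOPE*")

print(hits)
print(no_hits)
```

The second query returns an empty list; this corresponds to the situation in which the pyaerocom search reports that nothing could be found.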

## Downloading the pyaerocom testdata-minimal dataset

The testdata-minimal dataset was developed for automatic testing of pyaerocom and is well suited to illustrate its main features without requiring heavy computing resources or much data storage. It is very easy to get these data:

In [None]:
%%bash

pya getsampledata

Now we have a path where there is supposed to be some data. Awesomeness!

In [None]:
dataloc = f'./data/testdata-minimal/'
dataloc

Side comment: If this way of formatting Python strings looks weird to you, don't worry: it is a [rather new feature](https://www.geeksforgeeks.org/formatted-string-literals-f-strings-python/) (as of Nov 2020).
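
As a small illustration of what f-strings are for, the sketch below builds the same path from a variable. The variable names `dataset` and `example_path` are chosen for this example only.

```python
dataset = "testdata-minimal"  # dataset name as used in the text above
example_path = f"./data/{dataset}/"  # {dataset} is replaced by the variable's value
print(example_path)
```

An f-string without any `{...}` placeholder behaves like an ordinary string, so the prefix only matters once values are inserted.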

In [None]:
import os
os.listdir(dataloc)

Let's look into the modeldata directory (obsdata follows later).

In [None]:
os.listdir(dataloc + 'modeldata')

### Adding data search directories

Great, found something. Let's tell pyaerocom about it.

In [None]:
pya.const.add_data_search_dir(dataloc + 'modeldata')

Now, let's repeat what we did before.

In [None]:
pya.const.DATA_SEARCH_DIRS

In [None]:
browse_database('*TM5*')

You should now see the downloaded data! Nice! This worked, and there is even a lot of additional information that comes in handy. The latter is because the underlying NetCDF files in the data directory are stored using AeroCom file naming conventions. Each dataset has its own ID (usually the directory name) and can be accessed via this ID. For this example TM5 dataset, the ID is *TM5-met2010_CTRL-TEST*, as can be seen in the output from the browsing method.

pyaerocom makes extensive use of these conventions, which makes it easy to streamline analyses of many different models and observation records. However, as we shall see below, the latter are often formatted in many different ways, as observations from many different databases are used.

In [None]:
reader = pya.io.ReadGridded('TM5-met2010_CTRL-TEST')
reader

### Tiny detour: AeroCom file naming conventions

Let's have a brief look at such a filename (taking the first file in the data directory):

In [None]:
first_file = reader.files[0]
os.path.basename(first_file)

The template is:

`aerocom3_<ModelName>-<MeteoConfigSpecifier>_<ExperimentName>-<PerturbationName>_<VariableName>_<VerticalCoordinateType>_<Period>_<Frequency>.nc`

So the above filename uses the **TM5** model with 2010 meteorology (**met2010**), and this version is for the AeroCom Phase III (**AP3**) experiment, specifically the 2019 control (**CTRL2019**) perturbation. The variable is **abs550aer** (the aerosol absorption optical depth, or AAOD), which is representative of the whole atmospheric **Column**. The simulated year is **2010** (here it is the same as the meteorology, but this need not always be the case) and the temporal resolution is **daily**.

If you want to learn more about AeroCom conventions and ongoing experiments, [see here](https://docs.google.com/spreadsheets/d/1NiHLVTDsBo0JEBSnnDECNI2ojUnCVlxuy2PFrsRJW38/edit#gid=1475397852).
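
To see how such a name maps onto the template, here is a simplified sketch that splits a file name into its components. It is not pyaerocom's own parser: the function `parse_aerocom3_filename` is hypothetical, the example file name is reconstructed from the description above (an assumption), and the sketch assumes that none of the components contains an underscore and that the first hyphen separates the paired fields.

```python
def parse_aerocom3_filename(filename):
    """Split an aerocom3-style file name into its components (simplified sketch)."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 7 or parts[0] != "aerocom3":
        raise ValueError(f"Not an aerocom3-style file name: {filename}")
    _, model_meteo, exp_pert, variable, vert_type, period, frequency = parts
    # Each of these two fields holds a pair joined by a hyphen.
    model, _, meteo = model_meteo.partition("-")
    experiment, _, perturbation = exp_pert.partition("-")
    return {
        "model": model,
        "meteo": meteo,
        "experiment": experiment,
        "perturbation": perturbation,
        "variable": variable,
        "vert_type": vert_type,
        "period": period,
        "frequency": frequency,
    }


# File name reconstructed from the description in the text (assumption).
example = "aerocom3_TM5-met2010_AP3-CTRL2019_abs550aer_Column_2010_daily.nc"
info = parse_aerocom3_filename(example)
for key, value in info.items():
    print(f"{key}: {value}")
```

Real file names can be less regular than this, so the sketch is best read as an illustration of the template rather than a robust parser.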

The metadata that is extracted from the filenames is accessible via:

In [None]:
reader.file_info

### How do I know what the variable names mean?

You can check all variables via `pyaerocom.const.VARS`, which is a dictionary-like object that allows access to variables and, in most cases, provides relevant additional information such as the [CF standard_name](https://cfconventions.org/standard-names.html). For instance, for the above `abs550aer`:

In [None]:
var = pya.const.VARS['abs550aer']
var

In [None]:
var.long_name

Or the extinction (scattering + absorption) aerosol optical depth (AOD), called `od550aer`:

In [None]:
var = pya.const.VARS['od550aer']
var

### Reading of model data using `ReadGridded` class

The `ReadGridded` interface instantiated above relies on and makes use of these conventions. This class is also the standard interface for reading model data into instances of the `pyaerocom.GriddedData` class:

In [None]:
aaod_tm5 = reader.read_var('abs550aer', start=2010, ts_type='monthly')
aaod_tm5

Under the hood, the `GriddedData` object is an [iris.Cube](https://scitools.org.uk/iris/docs/latest/userguide/iris_cubes.html), and it is **single variable**; that is, it does not support reading of multiple variable fields (e.g. AOD and AAOD sharing the same lat, lon and time dimensions).

The `GriddedData` object is introduced in more detail in other tutorials, but what is a tutorial without a nice, self-explanatory plot anyways?

In [None]:
aaod_tm5.sel(latitude=(-30, 30), longitude=(-150, 150)).quickplot_map('06/2010');

## Registering and reading of *ungridded* observational data

### ... COMING VERY SOON!!

Until then, check out the section on ungridded observations in the following tutorial: [getting_started_analysis](https://github.com/metno/pyaerocom-tutorials/blob/master/getting_started_analysis.ipynb).