# Some more info on reading of ungridded obsdata

This notebook shows how to:

- Understand the architecture of ungridded reading
- Find the correct data ID for an observation network
- Identify the low-level reading classes used by `ReadUngridded`
- Register a new observation dataset (e.g. a [local copy](https://github.com/metno/pyaerocom-meetings/tree/master/Feb2021_Workshop#speedup---create-a-local-copy-of-relevant-obsdata))

## Basic architecture of ungridded data import

![](suppl/pyaerocom_ungridded_io_flowchart.png)

The `ReadUngridded` class is the user-facing interface, but the actual reading of a dataset happens in one of the low-level reading classes written for the different obs datasets (which come in all sorts of formats). So each dataset needs

- a **data_id**
- a path where the data is located
- and a low-level reader class, so that `ReadUngridded` knows which reader to invoke for that **data_id**
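
The relationship between these three pieces can be illustrated with a small, library-independent sketch. It is not pyaerocom's implementation: `BaseReader`, `ReaderA`, `ReaderB`, `DATA_DIRS`, `READERS` and this `get_reader` function are hypothetical stand-ins, and the data IDs and paths are made up. It only shows the idea of looking up the reader class that declares support for a given data ID.

```python
# Hypothetical stand-ins that mimic the dispatch idea; NOT pyaerocom classes.
class BaseReader:
    SUPPORTED_DATASETS = []

    def __init__(self, data_id, data_dir):
        self.data_id = data_id
        self.data_dir = data_dir

    def read(self):
        # A real reader would parse files here; the sketch only reports what it would do.
        return f"{type(self).__name__} reading {self.data_id} from {self.data_dir}"


class ReaderA(BaseReader):
    SUPPORTED_DATASETS = ["NetA.daily", "NetA.hourly"]


class ReaderB(BaseReader):
    SUPPORTED_DATASETS = ["NetB"]


# Made-up registry: data ID -> directory where the data is located
DATA_DIRS = {
    "NetA.daily": "/data/neta/daily",
    "NetA.hourly": "/data/neta/hourly",
    "NetB": "/data/netb",
}
READERS = [ReaderA, ReaderB]


def get_reader(data_id):
    """Return an instance of the reader class that supports data_id."""
    for reader_cls in READERS:
        if data_id in reader_cls.SUPPORTED_DATASETS:
            return reader_cls(data_id, DATA_DIRS[data_id])
    raise ValueError(f"No reader registered for data ID {data_id!r}")


print(get_reader("NetB").read())
```

One reader class can serve several data IDs, which is why the lookup scans the supported IDs of each class rather than assuming a one-to-one mapping.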

## Hands-on part

In [1]:
import pyaerocom as pya
pya.__version__

'0.10.0'

In [2]:
pya.const.has_access_lustre

True

## Get a list of registered default ungridded obs IDs and corresponding paths

**NOTE**: these are the default paths for the obs IDs; the printed paths do not necessarily exist and may not be accessible.
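
Since the default paths are not guaranteed to exist, it can be useful to filter them before relying on one. The following is a minimal sketch using only the standard library: `accessible_locations` is a hypothetical helper (not part of pyaerocom), and `example_locs` is a made-up dictionary standing in for the mapping of data IDs to paths stored in `pya.const.OBSLOCS_UNGRIDDED`.

```python
import os


def accessible_locations(obs_locs):
    """Return the entries of obs_locs whose path is an existing, readable directory."""
    return {
        data_id: path
        for data_id, path in obs_locs.items()
        if os.path.isdir(path) and os.access(path, os.R_OK)
    }


# Made-up stand-in for the registered locations (data ID -> directory)
example_locs = {
    "EXISTING": os.getcwd(),
    "MISSING": os.path.join(os.getcwd(), "no_such_obsdata_dir"),
}

print(list(accessible_locations(example_locs)))
```

Assuming the real mapping behaves like a dictionary of data IDs to path strings (as the printed `OrderedDict` suggests), the same helper could be applied to it directly.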

In [3]:
pya.const.OBSLOCS_UNGRIDDED

OrderedDict([('AeronetSunV2Lev1.5.daily', '/lustre/storeA/project/aerocom/'),
             ('AeronetSun_2.0_NRT',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSunNRT'),
             ('AeronetSunV2Lev2.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetRaw2.0/renamed'),
             ('AeronetSunV2Lev2.AP',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0AllPoints/renamed'),
             ('AeronetSDAV2Lev2.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0.SDA.daily/renamed'),
             ('AeronetSDAV2Lev2.AP',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0.SDA.AP/renamed'),
             ('AeronetInvV2Lev1.5.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/Aeronet.Inv.V2L1.5.daily/renamed'),
             ('AeronetInvV2Lev1.5.AP',
              '/lustre/storeA/project/aerocom/

In addition to a path, each ungridded observation dataset needs a registered reading engine. The registered low-level reading classes can be accessed via the `ReadUngridded` class.

In [4]:
reader = pya.io.ReadUngridded()
reader.SUPPORTED_READERS

[pyaerocom.io.read_aeronet_invv3.ReadAeronetInvV3,
 pyaerocom.io.read_aeronet_invv2.ReadAeronetInvV2,
 pyaerocom.io.read_aeronet_sdav2.ReadAeronetSdaV2,
 pyaerocom.io.read_aeronet_sdav3.ReadAeronetSdaV3,
 pyaerocom.io.read_aeronet_sunv2.ReadAeronetSunV2,
 pyaerocom.io.read_aeronet_sunv3.ReadAeronetSunV3,
 pyaerocom.io.read_earlinet.ReadEarlinet,
 pyaerocom.io.read_ebas.ReadEbas,
 pyaerocom.io.read_gaw.ReadGAW,
 pyaerocom.io.read_aasetal.ReadAasEtal,
 pyaerocom.io.read_ghost.ReadGhost]

These are all registered data IDs:

In [5]:
reader.SUPPORTED_DATASETS

['AeronetInvV3Lev2.daily',
 'AeronetInvV3Lev1.5.daily',
 'AeronetInvV2Lev2.daily',
 'AeronetInvV2Lev1.5.daily',
 'AeronetSDAV2Lev2.daily',
 'AeronetSDAV3Lev1.5.daily',
 'AeronetSDAV3Lev2.daily',
 'AeronetSunV2Lev2.daily',
 'AeronetSunV2Lev2.AP',
 'AeronetSunV3Lev1.5.daily',
 'AeronetSunV3Lev1.5.AP',
 'AeronetSunV3Lev2.daily',
 'AeronetSunV3Lev2.AP',
 'EARLINET',
 'EBASMC',
 'DMS_AMS_CVO',
 'GAWTADsubsetAasEtAl',
 'GHOST.EEA.monthly',
 'GHOST.EEA.hourly',
 'GHOST.EEA.daily',
 'GHOST.EBAS.monthly',
 'GHOST.EBAS.hourly',
 'GHOST.EBAS.daily']

To get the reader for an ungridded data ID, you can do:

In [6]:
ebas_reader = reader.get_reader('EBASMC')
ebas_reader

ReadEbas

For a given reader, you can check all data IDs that are supported via:

In [7]:
ebas_reader.SUPPORTED_DATASETS

['EBASMC']

In [8]:
reader.get_reader('AeronetSunV3Lev2.daily').SUPPORTED_DATASETS

['AeronetSunV3Lev1.5.daily',
 'AeronetSunV3Lev1.5.AP',
 'AeronetSunV3Lev2.daily',
 'AeronetSunV3Lev2.AP']

So the low-level reader for Aeronet version 3, level 2, daily averages also supports level 1.5 data and all-points (AP) data.

The following line will register a local copy of the EBAS dataset under a new name:

In [9]:
pya.const.add_ungridded_obs(obs_id='EBAS-LOCAL', 
                            data_dir='/home/jonasg/MyPyaerocom/ws21/obslocal/EBASMultiColumn/data',
                            reader=pya.io.ReadEbas)

The new ID is registered in the `ReadEbas` class used above, so it now appears in that reader's supported datasets:

In [10]:
ebas_reader.SUPPORTED_DATASETS

['EBASMC', 'EBAS-LOCAL']

This dataset can now be read via the new obs ID, which is also registered in the obs locations:

In [11]:
pya.const.OBSLOCS_UNGRIDDED

OrderedDict([('AeronetSunV2Lev1.5.daily', '/lustre/storeA/project/aerocom/'),
             ('AeronetSun_2.0_NRT',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSunNRT'),
             ('AeronetSunV2Lev2.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetRaw2.0/renamed'),
             ('AeronetSunV2Lev2.AP',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0AllPoints/renamed'),
             ('AeronetSDAV2Lev2.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0.SDA.daily/renamed'),
             ('AeronetSDAV2Lev2.AP',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/AeronetSun2.0.SDA.AP/renamed'),
             ('AeronetInvV2Lev1.5.daily',
              '/lustre/storeA/project/aerocom/aerocom1/AEROCOM_OBSDATA/Aeronet.Inv.V2L1.5.daily/renamed'),
             ('AeronetInvV2Lev1.5.AP',
              '/lustre/storeA/project/aerocom/

### Get list of supported variables for a certain reader

In [12]:
ebas_reader.PROVIDES_VARIABLES

['DEFAULT',
 'sc550aer',
 'sc440aer',
 'sc700aer',
 'sc550dryaer',
 'sc440dryaer',
 'sc700dryaer',
 'ang4470dryaer',
 'sc550lt1aer',
 'bsc550aer',
 'ac550aer',
 'ac550dryaer',
 'ac550lt1aer',
 'bsc550dryaer',
 'scrh',
 'acrh',
 'concso4',
 'concso2',
 'vmrso2',
 'concpm10',
 'concpm25',
 'concpm1',
 'concso4t',
 'concso4c',
 'concbc',
 'conceqbc',
 'conctc',
 'concoa',
 'concoc',
 'concss',
 'concnh3',
 'concno3',
 'concnh4',
 'conchno3',
 'conctno3',
 'concno2',
 'conco3',
 'concco',
 'vmro3',
 'vmrco',
 'vmrno2',
 'vmrno',
 'concprcpso4',
 'concprcpso4t',
 'concprcpso4c',
 'concprcpno3',
 'concprcpso4scavenging',
 'concprcpnh4',
 'wetso4',
 'wetconcso4',
 'wetso4t',
 'wetso4c',
 'wetoxn',
 'wetrdn',
 'wetnh4',
 'precip',
 'wetconcph',
 'wetno3',
 'scavratioso4',
 'test']
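
Before starting a long read, it can help to check which of the requested variables a reader provides. The sketch below is a plain-Python illustration: `split_supported` is a hypothetical helper, and `provided` is a short excerpt of the list printed above rather than the full `PROVIDES_VARIABLES` attribute.

```python
def split_supported(requested, provided):
    """Split requested variable names into (supported, unsupported), keeping order."""
    provided_set = set(provided)
    supported = [var for var in requested if var in provided_set]
    unsupported = [var for var in requested if var not in provided_set]
    return supported, unsupported


# Short excerpt of the variable list printed above
provided = ["sc550aer", "concpm10", "concpm25", "vmro3"]

supported, unsupported = split_supported(["concpm10", "od550aer", "vmro3"], provided)
print(supported)    # ['concpm10', 'vmro3']
print(unsupported)  # ['od550aer']
```

With an actual reader, the second argument would be `ebas_reader.PROVIDES_VARIABLES`.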

### Practical tip: specify obsdata path directly in the reader instead of registering a new data ID

That is, this code:

In [13]:
obsdata = pya.io.ReadUngridded('EBASMC', data_dir='/home/jonasg/MyPyaerocom/ws21/obslocal/EBASMultiColumn/data').read(vars_to_retrieve=['concpm10'])

Reading EBASMC from specified data loaction: /home/jonasg/MyPyaerocom/ws21/obslocal/EBASMultiColumn/data


reads exactly the same data as reading via the newly registered `EBAS-LOCAL`:

In [14]:
obsdata = pya.io.ReadUngridded('EBAS-LOCAL').read(vars_to_retrieve=['concpm10'])

The only difference is that the first command caches the data under the data ID `EBASMC`, whereas the second stores the cached object under `EBAS-LOCAL`, so there are now 2 cached objects of the same data:

In [15]:
from glob import glob
glob(f'{pya.const.CACHEDIR}/*EBAS*concpm10*')

['/home/jonasg/MyPyaerocom/_cache/jonasg/EBASMC_concpm10.pkl',
 '/home/jonasg/MyPyaerocom/_cache/jonasg/EBAS-LOCAL_concpm10.pkl',
 '/home/jonasg/MyPyaerocom/_cache/jonasg/GHOST.EBAS.daily_concpm10.pkl']
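
To spot such duplicates, the cache listing can be grouped by variable. The sketch below assumes the naming pattern `<data_id>_<variable>.pkl`, which is inferred from the output above and not taken from the pyaerocom documentation. It also assumes that variable names contain no underscore. `group_cache_by_variable` is a hypothetical helper.

```python
import os
from collections import defaultdict


def group_cache_by_variable(paths):
    """Map variable name -> list of data IDs, parsed from cache file names.

    Assumes file names of the form '<data_id>_<variable>.pkl' (inferred pattern).
    """
    groups = defaultdict(list)
    for path in paths:
        stem, ext = os.path.splitext(os.path.basename(path))
        if ext != ".pkl" or "_" not in stem:
            continue  # skip files that do not match the assumed pattern
        # Split at the last underscore, since data IDs may contain dots or dashes
        data_id, var_name = stem.rsplit("_", 1)
        groups[var_name].append(data_id)
    return dict(groups)


# The three cache files listed above
cache_files = [
    "/home/jonasg/MyPyaerocom/_cache/jonasg/EBASMC_concpm10.pkl",
    "/home/jonasg/MyPyaerocom/_cache/jonasg/EBAS-LOCAL_concpm10.pkl",
    "/home/jonasg/MyPyaerocom/_cache/jonasg/GHOST.EBAS.daily_concpm10.pkl",
]

print(group_cache_by_variable(cache_files))
# {'concpm10': ['EBASMC', 'EBAS-LOCAL', 'GHOST.EBAS.daily']}
```

Here `EBASMC` and `EBAS-LOCAL` hold the same data, while `GHOST.EBAS.daily` is a different dataset that happens to provide the same variable.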