# `lumia.obsdb` module

## Basic structure
The observations are stored in an instance of the `lumia.obsdb.obsdb` class (*lumia/obsdb/\__init__.py* file). The class essentially defines two tables (`pandas.DataFrame` objects):
- an `observations` table, which contains information on the individual observations (one row per observation);
- a `sites` table, which contains information common to groups of observations (in our case, information related to a given observation site, such as its name and coordinates).

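```python
from lumia.obsdb import obsdb
from IPython.display import display, Markdown
import pandas

# Display settings, so that long site names and truncated tables stay readable
pandas.set_option('display.max_colwidth', 300)
pandas.set_option('display.max_rows', 12)

# Load an existing database from a tar archive and show both tables
db = obsdb(filename='observations/truth.tar.gz')
display(db.observations)
display(db.sites)
```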

| | time | site | lat | lon | alt | height | obs | err | background | err_mod | err_obs | err_profile_bg | truth | file |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-11-21 12:00:00 | CIB005 | 41.8100 | -4.9300 | 845.0 | 5.0 | 392.045068 | 3.294349 | 391.814668 | 3.278676 | 0.300000 | 0.114085 | 391.712176 | 147083 |
| 1 | 2011-11-28 12:00:00 | CIB005 | 41.8100 | -4.9300 | 845.0 | 5.0 | 402.666576 | 1.000000 | 392.479005 | 0.200040 | 0.300000 | 0.301554 | 401.022511 | 147083 |
| 2 | 2011-12-05 13:00:00 | CIB005 | 41.8100 | -4.9300 | 845.0 | 5.0 | 395.623640 | 1.000000 | 393.548186 | 0.734622 | 0.300000 | 0.487210 | 394.847078 | 147083 |
| 3 | 2011-05-23 12:00:00 | HPB054 | 47.8011 | 11.0245 | 936.0 | 54.0 | 387.629958 | 3.448235 | 395.281689 | 3.390662 | 0.318198 | 0.540822 | 393.222466 | 350620 |
| 4 | 2011-08-17 12:00:00 | HPB054 | 47.8011 | 11.0245 | 936.0 | 54.0 | 387.118939 | 1.000000 | 385.242735 | 0.201201 | 0.300000 | 0.649459 | 386.375785 | 350620 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9915 | 2011-12-18 15:29:00 | WAO | 52.9502 | 1.1219 | 20.0 | 10.0 | 395.334543 | 1.000000 | 395.429512 | 0.741418 | 0.300000 | 0.572706 | 396.206549 | 373775 |
| 9916 | 2011-12-25 15:29:00 | WAO | 52.9502 | 1.1219 | 20.0 | 10.0 | 397.296107 | 1.332804 | 395.597714 | 1.274913 | 0.300000 | 0.246908 | 397.808394 | 373775 |
| 9917 | 2011-12-28 14:29:00 | WAO | 52.9502 | 1.1219 | 20.0 | 10.0 | 398.974723 | 3.031384 | 395.323650 | 3.009688 | 0.300000 | 0.202645 | 397.723241 | 373775 |
| 9918 | 2011-12-29 14:29:00 | WAO | 52.9502 | 1.1219 | 20.0 | 10.0 | 397.616021 | 1.000000 | 395.795518 | 0.651527 | 0.300000 | 0.318376 | 398.303971 | 373775 |


| | lat | lon | alt | height | code | name |
|---|---|---|---|---|---|---|
| BAL | 55.3500 | 17.2200 | 3.0 | 25.0 | BAL | Baltic Sea |
| BSC | 44.1776 | 28.6647 | 0.0 | 5.0 | BSC | Black Sea, Constanta |
| CES200 | 51.9710 | 4.9270 | -1.0 | 200.0 | CES200 | Cesar, Cabauw |
| CIB005 | 41.8100 | -4.9300 | 845.0 | 5.0 | CIB005 | Centro de Investigacion de la Baja Atmosfera (CIBA) |
| CMN | 44.1800 | 10.7000 | 2165.0 | 12.0 | CMN | Mt. Cimone Station |
| ... | ... | ... | ... | ... | ... | ... |
| PUY015 | 45.7719 | 2.9658 | 1465.0 | 15.0 | PUY015 | Puy de Dome |
| SSL | 47.9200 | 7.9200 | 1205.0 | 12.0 | SSL | Schauinsland, Baden-Wuerttemberg |
| TRN180 | 47.9647 | 2.1125 | 131.0 | 180.0 | TRN180 | Trainou |
| TTA | 56.5551 | -2.9858 | 400.0 | 222.0 | TTA | Tall Tower Angus |


The number and position of the columns in both tables are specific to the application, and they can change during the inversion (which typically adds columns to the `observations` table to store different model results). The only mandatory column is the `site` column of the `observations` table, whose values should point to the index of the `sites` table.
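To make the link between the two tables concrete, the sketch below rebuilds a tiny version of both directly with pandas, using a few values copied from the outputs above. It does not call `lumia` at all: the variable names, the membership check and the `join` are illustrative assumptions about how such tables can be used, not part of the `obsdb` API.

```python
import pandas as pd

# `sites` table: one row per site, indexed by the site code
# (values copied from the `sites` output above).
sites = pd.DataFrame(
    {
        "lat": [41.8100, 45.7719],
        "lon": [-4.9300, 2.9658],
        "alt": [845.0, 1465.0],
        "height": [5.0, 15.0],
        "name": ["Centro de Investigacion de la Baja Atmosfera (CIBA)", "Puy de Dome"],
    },
    index=["CIB005", "PUY015"],
)

# `observations` table: one row per observation. The `site` column holds
# keys into the index of `sites` (first three rows of the output above).
observations = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2011-11-21 12:00:00", "2011-11-28 12:00:00", "2011-12-05 13:00:00"]
        ),
        "site": ["CIB005", "CIB005", "CIB005"],
        "obs": [392.045068, 402.666576, 395.623640],
    }
)

# The one structural constraint: every `site` value must exist in `sites.index`.
assert observations["site"].isin(sites.index).all()

# Site metadata can then be attached to each observation by joining
# the `site` column against the index of `sites`.
with_names = observations.join(sites[["name", "lat", "lon"]], on="site")

# Number of observations per site (sites without observations do not appear).
counts = observations.groupby("site").size()

print(with_names[["time", "site", "name"]])
print(counts)
```

Keeping the per-site metadata in a separate table avoids repeating the coordinates and names on every observation row, at the cost of a join whenever they are needed together.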

## Creating, reading and saving observation databases
There are two ways of constructing the observations database:
1. Initialize an empty database and fill in the DataFrames directly in Python, possibly using a method from one of the classes derived from `lumia.obsdb.obsdb` (see below);
2. Create a database file by other means, and then import it. The default format is a tar archive containing two CSV files, one for the observations DataFrame (observations.csv) and one for the sites DataFrame (sites.csv):
    - writing an existing database to a tar file is done using the `save_tar` method;
    - reading a tar archive relies on the `load_tar` method, but can also be done by initializing a new database with the optional `filename` argument pointing to the tar archive (as in the example above).
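As an illustration of the archive layout described in option 2, the sketch below writes two DataFrames as `observations.csv` and `sites.csv` members of a gzip-compressed tar file and reads them back, using only `tarfile` and pandas. The helper names `write_obs_tar` and `read_obs_tar` are hypothetical; the real `save_tar` and `load_tar` methods may differ in details such as index handling, date parsing or compression, so this is a sketch of the format, not of their implementation.

```python
import io
import os
import tarfile
import tempfile

import pandas as pd


def write_obs_tar(observations, sites, path):
    """Hypothetical writer: store both tables as CSV members of a .tar.gz archive."""
    with tarfile.open(path, "w:gz") as tar:
        for member, df in (("observations.csv", observations), ("sites.csv", sites)):
            payload = df.to_csv().encode("utf-8")  # the index is written as the first column
            info = tarfile.TarInfo(name=member)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


def _read_member(tar, member, **kwargs):
    """Decode one CSV member of an open archive into a DataFrame."""
    text = tar.extractfile(member).read().decode("utf-8")
    return pd.read_csv(io.StringIO(text), index_col=0, **kwargs)


def read_obs_tar(path):
    """Hypothetical reader: load both CSV members back into DataFrames."""
    with tarfile.open(path, "r:*") as tar:  # "r:*" detects the compression automatically
        observations = _read_member(tar, "observations.csv", parse_dates=["time"])
        sites = _read_member(tar, "sites.csv")
    return observations, sites


# Round trip on a tiny example (values from the first rows of the tables above).
sites = pd.DataFrame(
    {"lat": [41.81], "lon": [-4.93], "name": ["Centro de Investigacion de la Baja Atmosfera (CIBA)"]},
    index=["CIB005"],
)
observations = pd.DataFrame(
    {
        "time": pd.to_datetime(["2011-11-21 12:00:00", "2011-11-28 12:00:00"]),
        "site": ["CIB005", "CIB005"],
        "obs": [392.045068, 402.666576],
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.tar.gz")
    write_obs_tar(observations, sites, path)
    with tarfile.open(path) as tar:
        members = tar.getnames()
    obs_back, sites_back = read_obs_tar(path)

print(members)
print(obs_back)
```

Because the archive only holds plain CSV files, it can also be produced or inspected outside Python (for instance with `tar` and a spreadsheet), which is what makes option 2 convenient for databases built by other tools.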

## Derived classes

The choice of columns, and the operations that can be applied to the data, are very application-specific. For instance, applications using observations in the obspack format (https://www.esrl.noaa.gov/gmd/ccgg/obspack/) require different import methods than applications using observations in the format of the ICOS Carbon Portal (https://www.icos-cp.eu/). Technically, it would be possible to add specific methods to the `lumia.obsdb.obsdb` class for each use, but this would be difficult to maintain. Instead, we rely on classes derived from `lumia.obsdb.obsdb` to extend it. The basic structure of a derived class is the following:

```python
from lumia.obsdb import obsdb

class NewDB(obsdb):
    def SaySomething(self, text='Something'):
        print(text)
```

The `NewDB` class above adds a `SaySomething` method to the original `lumia.obsdb.obsdb` class. It can either be initialized like the main `obsdb` class:

```python
db2 = NewDB(filename='observations/truth.tar.gz')
db2.SaySomething()
```

```
Something
```


Or it can be initialized from an existing `obsdb` object:

```python
db = obsdb(filename='observations/truth.tar.gz')
try:
    # The base obsdb class has no SaySomething method, so this call fails
    db.SaySomething()
except Exception:
    print("I won't speak without my lawyer!")
# Build a NewDB from the existing database: the method is now available
db2 = NewDB(db=db)
db2.SaySomething(text="ok, you got me, I'll say everything!")
```

```
lumia.obsdb | ERROR    (line 49) | Unknown method or attribute for obsdb: SaySomething
I won't speak without my lawyer!
ok, you got me, I'll say everything!
```


Several derived classes are available within the `lumia.obsdb` module:
- `lumia.obsdb.obspackDb.obsdb` extends the base `obsdb` class with an `importFromPath` method, used to import obspack netCDF4 files.
- `lumia.obsdb.footprintdb.obsdb` extends the base `obsdb` class with several methods to attribute footprint files to individual observations.
- `lumia.obsdb.backgrounDb.backgrounDb` adds a `read_backgrounds` method to the base `obsdb` class, to import the background concentrations used in our inversions.
- `lumia.obsdb.invdb.invdb` adds a `setupUncertainties` method, to generate observation uncertainties for the inversions.

The list is not exhaustive. Derived classes can also be combined:

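```python
from lumia.obsdb import obsdb
from lumia.obsdb.footprintdb import obsdb as FootprintDB
from lumia.obsdb.invdb import invdb

db = obsdb(filename='observations/truth.tar.gz')
db = FootprintDB(db=db)  # adds the footprint-related methods
db = invdb(db=db)        # adds setupUncertainties on top
```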

We can verify that `db` now has the methods defined by both `invdb` and `FootprintDB`:

```python
for method in ['setupFootprints', 'setupUncertainties']:
    if hasattr(db, method):
        display(Markdown(f"db has the `{method}` method"))
```

db has the `setupFootprints` method

db has the `setupUncertainties` method
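One way such chaining can work is delegation: each derived object keeps a reference to the database it was built from, and a `__getattr__` hook forwards any attribute it does not define itself to that inner database. The logged "Unknown method or attribute" error in the `SaySomething` example suggests an attribute-lookup hook of this kind, but the toy classes below (`BaseDB`, `FootprintDB`, `InvDB`, and the `_parent` attribute) are hypothetical stand-ins written for illustration, not the actual `lumia.obsdb` code.

```python
import logging

logger = logging.getLogger("obsdb_sketch")


class BaseDB:
    """Hypothetical stand-in for lumia.obsdb.obsdb, reduced to the wrapping logic."""

    def __init__(self, observations=None, db=None):
        self._parent = db  # the database this object was built from, if any
        if db is None:
            self.observations = observations  # only the innermost object holds data

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails on this object.
        parent = self.__dict__.get("_parent")
        if parent is not None:
            return getattr(parent, name)  # forward to the wrapped database
        logger.error("Unknown method or attribute for obsdb: %s", name)
        raise AttributeError(name)


class FootprintDB(BaseDB):
    def setupFootprints(self):
        return "footprints ready"


class InvDB(BaseDB):
    def setupUncertainties(self):
        return "uncertainties ready"


# Same pattern as above: base database, wrapped twice.
root = BaseDB(observations=["obs 0", "obs 1"])
db = InvDB(db=FootprintDB(db=root))

for method in ["setupFootprints", "setupUncertainties", "SaySomething"]:
    print(method, hasattr(db, method))
```

With this design the inner database is shared rather than copied, so methods from every layer see the same tables. One consequence worth keeping in mind is that assigning an attribute on the outer object (for example `db.observations = ...`) creates it on that object and shadows, rather than replaces, the one held by the inner database.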