# Inversion howto

This is an interactive and slightly simplified version of the `var4d.py` script, used for computing the inversions of the GMDD manuscript. It can be used as a tutorial, but is by no means a comprehensive user manual.

In [2]:
from datetime import datetime
import lumia
from lumia.obsdb.footprintdb import obsdb
from lumia.formatters import lagrange
from lumia.interfaces import Interface
from lumia.control import monthlyFlux
from lumia.Uncertainties import PercentMonthlyPrior

## Run parameters (rc-file)

The settings are stored in an "rc-file" (see the dedicated [documentation](rcfiles.html)). Here we use the "SRefG.rc" file as an example:

In [3]:
rcf = lumia.rc("../GMDD/rc/SRefG.rc")

## Read the observations database

The observation database is pre-processed and stored in a specific format, described [here](obsdb.html).

In [4]:
obsfile = rcf.get('observations.filename')
start = datetime(*rcf.get('time.start'))
end = datetime(*rcf.get('time.end'))

db = obsdb(filename=obsfile, start=start, end=end)

### Setup the footprint files:

The observations dataframe (`db.observations`) should contain a `footprint` column giving, for each observation, the name of the file that contains the corresponding footprint. If it does not, we use the `obsdb.setupFootprints` method to find the files. The `path` argument points to the location of the files, and the optional `cache` argument points to a temporary cache where the files might also be stored. The method looks in `cache` first, then in `path`; a file found in `path` but not in `cache` is copied to `cache`:

In [33]:
db.setupFootprints(path=rcf.get('footprints.path'), cache=rcf.get('footprints.cache'))

Checking footprints: 100%|██████████| 272/272 [00:05<00:00, 45.85it/s]
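
The cache-then-path lookup described above can be sketched with the standard library alone. This is not the LUMIA implementation: `locate_footprint` is a hypothetical helper, and the directory and file names are made up for the example.

```python
import os
import shutil
import tempfile


def locate_footprint(filename, path, cache=None):
    """Return the location of `filename`, or None if it cannot be found.

    Hypothetical helper (not part of lumia): look in `cache` first, then in
    `path`; a file found only in `path` is copied to `cache`.
    """
    if cache is not None:
        cached = os.path.join(cache, filename)
        if os.path.exists(cached):
            return cached
    source = os.path.join(path, filename)
    if not os.path.exists(source):
        return None
    if cache is None:
        return source
    os.makedirs(cache, exist_ok=True)
    shutil.copy(source, cached)
    return cached


# Demonstration with temporary directories and a made-up file name
with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "footprints")
    cache = os.path.join(root, "cache")
    os.makedirs(path)
    with open(os.path.join(path, "site_a.h5"), "w") as fid:
        fid.write("dummy footprint")

    first = locate_footprint("site_a.h5", path, cache)    # copied to the cache
    second = locate_footprint("site_a.h5", path, cache)   # found in the cache
    missing = locate_footprint("site_b.h5", path, cache)  # not found anywhere
    first_in_cache = first == os.path.join(cache, "site_a.h5")
```

Returning `None` for a missing file, rather than raising an error, lets the caller decide what to do with observations that have no footprint.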


### Refinement of the obs selection:

The obs database can be reduced at this stage, for instance by excluding specific sites:

In [34]:
# If an "observations.use_sites" key is defined in the rc-file, then use only these sites (see "RA.rc" for an example)
if rcf.get("observations.use_sites", default=False):
    db.SelectSites(rcf.get("observations.use_sites"))

## Load the fluxes

We use the `lagrange` observation operator, therefore we use the `lagrange` formatter, from the `lumia.formatters` module, to handle the fluxes in the model space. With this formatter, the fluxes are to be provided in a pre-processed netCDF file, with the file names following the pattern `path/prefix.source.YYYYMM.nc` (see [here](fluxes.html) for full format specifications):

In [42]:
# Read the "emissions.categories" and "emissions.categories.extras" rc-keys (which should be two lists), and build an empty dictionary with them:
categories = dict.fromkeys(rcf.get('emissions.categories') + rcf.get('emissions.categories.extras', default=[]))
print(rcf.get('emissions.categories'))
print(rcf.get('emissions.categories.extras', default=[]))
print(categories)
# Origin of the "fossil" fluxes, and common prefix of the pre-processed files:
print(rcf.get("emissions.fossil.origin"))
print(rcf.get("emissions.prefix"))

['fossil', 'ocean', 'fires', 'biosphere']
[]
{'fossil': None, 'ocean': None, 'fires': None, 'biosphere': None}
EDGAR_eurocom
/media/guillaume/EXT4TB/LUMIA/fluxes/nc/eurocom05x05/3h/flux_co2.


The pre-processed files are then imported using the `lagrange.ReadArchive` method:

In [43]:
for cat in categories :
    categories[cat] = rcf.get(f'emissions.{cat}.origin')
emis = lagrange.ReadArchive(rcf.get('emissions.prefix'), start, end, categories=categories)

Emissions from category fossil will be read from file /media/guillaume/EXT4TB/LUMIA/fluxes/nc/eurocom05x05/3h/flux_co2.EDGAR_eurocom.2011.nc
Emissions from category ocean will be read from file /media/guillaume/EXT4TB/LUMIA/fluxes/nc/eurocom05x05/3h/flux_co2.CARBOSCOPEv1_5.2011.nc
[...]

## Initialize the observation operator:

The observation operator (`lumia.obsoperator.transport` class) essentially controls the subprocess that runs the actual forward and adjoint transport model (i.e. it launches it and waits for the results). It also reads and writes the transport model files, but the code for doing so is included in the "formatter", which therefore needs to be passed to the `model` object when instantiating it:

In [44]:
model = lumia.transport(rcf, obs=db, formatter=lagrange)

## Initialize the control vector 

An instance of a class from the `lumia.control` module stores the various inversion control vectors (prior, posterior and intermediate (pre-conditioned)), the control vector metadata (coordinates, flux category, land mask, etc.), and the prior uncertainties ($\mathbf{B}$ matrix, decomposed into variances, temporal covariances and spatial covariances). Here we use the `monthlyFlux.Control` class, which defines a monthly flux optimization:

In [48]:
ctrl = monthlyFlux.Control(rcf)
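
One common way to write such a decomposition of the prior error covariance matrix is given below. It is meant as an illustration under stated assumptions, not as the exact formulation used by `monthlyFlux.Control`, and the symbols $\mathbf{D}$, $\mathbf{C}_t$ and $\mathbf{C}_h$ are notation introduced here:

$$
\mathbf{B} = \mathbf{D}^{1/2} \left( \mathbf{C}_t \otimes \mathbf{C}_h \right) \mathbf{D}^{1/2}
$$

where $\mathbf{D}$ is the diagonal matrix of the prior variances, $\mathbf{C}_t$ and $\mathbf{C}_h$ are the temporal and spatial correlation matrices, and $\otimes$ is the Kronecker product. The order of the two factors in the Kronecker product depends on how the control vector is ordered. Storing the three components separately avoids building the full $\mathbf{B}$ matrix.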

## Creation of the `Interface`

The `Interface`, formally part of the observation operator (and of its adjoint), handles the conversion of data between the control vector and the model structure. The control vector is what the inversion uses: it typically contains only the optimized fluxes, at the resolution of the inversion (here monthly, 0.5°). The model structure also contains the non-optimized fluxes (such as fossil fuel here), at the resolution used by the transport model (here 0.5°, 3-hourly).

The `Interface` is therefore specific to a given pair of control vector and transport model. The correct interface is selected automatically, based on the `name` attributes of the control and model objects:

In [51]:
interface = Interface(ctrl.name, model.name, rcf, ancilliary=emis)
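
To make the two directions of this conversion concrete, here is a toy sketch for a single grid cell. It is not the LUMIA `Interface`: the functions `struct_to_vec` and `vec_to_struct`, the flux values, and the choice to spread each monthly increment evenly over the time steps of that month are all assumptions made for the illustration.

```python
# Toy "model structure": one list of flux values per category, for a single
# grid cell, with the month of each (e.g. 3-hourly) time step alongside.
months = [1, 1, 1, 2, 2]
struct = {
    "fossil": [1.0, 1.0, 1.0, 1.0, 1.0],       # prescribed, not optimized
    "biosphere": [-2.0, 0.5, -1.5, 3.0, 1.0],  # optimized
}
optimized = ["biosphere"]


def struct_to_vec(struct, months, optimized):
    """Sum the optimized fluxes over the time steps of each month."""
    vec = []
    for cat in optimized:
        for month in sorted(set(months)):
            vec.append(sum(v for v, m in zip(struct[cat], months) if m == month))
    return vec


def vec_to_struct(vec, prior, months, optimized):
    """Add each monthly increment (vec - prior), spread evenly over the time
    steps of that month, to the prior fluxes. Other categories are copied."""
    prior_vec = struct_to_vec(prior, months, optimized)
    out = {cat: list(values) for cat, values in prior.items()}
    index = 0
    for cat in optimized:
        for month in sorted(set(months)):
            steps = [i for i, m in enumerate(months) if m == month]
            increment = (vec[index] - prior_vec[index]) / len(steps)
            for i in steps:
                out[cat][i] += increment
            index += 1
    return out


apri = struct_to_vec(struct, months, optimized)
posterior = [-6.0, 4.0]  # made-up optimized control vector
new_struct = vec_to_struct(posterior, struct, months, optimized)
roundtrip = struct_to_vec(new_struct, months, optimized)
```

In this sketch the fossil fluxes pass through unchanged, and converting the updated structure back to a control vector returns the values that were put in.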

## Setup prior control vector and uncertainties

At this stage, we have the fluxes (in `emis`), so we can construct the prior control vector (i.e. the sum of the monthly biosphere flux). The conversion from fluxes to control vector is handled by the interface:

In [52]:
apri = interface.StructToVec(emis)
ctrl.setupPrior(apri)

The prior uncertainty vector can also be set at this stage, using the `ctrl.setupUncertainties` method. However, we need to define that vector first. For this example, we do it using the `lumia.Uncertainties.PercentMonthlyPrior` class, which defines the uncertainty of each control variable as a fraction of the absolute value of that control variable:

In [60]:
# Define the function that is going to compute the uncertainties, as a function of the fluxes:
errfunc = PercentMonthlyPrior(rcf, interface)

# Call that function, with the fluxes in use, to generate an uncertainty vector
err = errfunc(emis)

# Set that uncertainty vector as the diagonal of B
ctrl.setupUncertainties(err)
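
The rule "uncertainty as a fraction of the absolute value of the control variable" can be illustrated with plain Python. This is only a sketch: `percent_uncertainty` is a hypothetical function, and the control vector values and the fraction of 0.5 are made up (in the tutorial, the settings come from the rc-file).

```python
def percent_uncertainty(vector, fraction):
    """Uncertainty of each element: `fraction` times its absolute value."""
    return [fraction * abs(value) for value in vector]


apri_example = [-3.0, 4.0, 0.0]                       # made-up prior control vector
err_example = percent_uncertainty(apri_example, 0.5)  # assumed fraction of 50 %
```

Note that with this rule a control variable whose prior value is zero gets a zero uncertainty, and therefore cannot be adjusted by the inversion.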

## Initialize the optimizer and run the inversion

We use the conjugate gradient optimizer, defined by the `lumia.optimizer.Optimizer` class. It takes as input an rc object (`rcf`), the control vector (`ctrl`), an observation operator (`model`) and the interface between the latter two (`interface`).

The inversion is then run by calling the `Var4D` method of the optimizer.

In [61]:
opt = lumia.optimizer.Optimizer(rcf, ctrl, model, interface)
opt.Var4D()
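
For readers unfamiliar with the method, the sketch below applies a plain conjugate gradient to a toy variational problem. It is not the LUMIA optimizer (no preconditioning, no external transport model). It assumes a linear observation operator `H` and diagonal prior and observation error variances (`b_var`, `r_var`), all with made-up values. The minimum of the cost function J(x) = ½ (x − xb)ᵀ B⁻¹ (x − xb) + ½ (Hx − y)ᵀ R⁻¹ (Hx − y) is found by solving (B⁻¹ + Hᵀ R⁻¹ H) x = B⁻¹ xb + Hᵀ R⁻¹ y.

```python
def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, rhs, tol=1e-12, maxiter=50):
    """Solve A x = rhs for a symmetric positive definite operator A."""
    x = [0.0] * len(rhs)
    residual = list(rhs)  # rhs - A x, with x = 0
    direction = list(residual)
    rs_old = dot(residual, residual)
    for _ in range(maxiter):
        if rs_old < tol:  # squared norm of the residual is small enough
            break
        a_dir = apply_a(direction)
        step = rs_old / dot(direction, a_dir)
        x = [xi + step * di for xi, di in zip(x, direction)]
        residual = [ri - step * ai for ri, ai in zip(residual, a_dir)]
        rs_new = dot(residual, residual)
        direction = [ri + (rs_new / rs_old) * di
                     for ri, di in zip(residual, direction)]
        rs_old = rs_new
    return x


# Toy problem: 2 control variables, 3 observations (all values made up)
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H_T = [list(col) for col in zip(*H)]
xb = [1.0, 1.0]             # prior control vector
b_var = [1.0, 1.0]          # prior error variances (diagonal of B)
y = [2.0, 0.0, 2.5]         # observations
r_var = [0.25, 0.25, 0.25]  # observation error variances (diagonal of R)


def hessian(v):
    """Apply B^-1 + H^T R^-1 H to the vector v."""
    prior_term = [vi / bi for vi, bi in zip(v, b_var)]
    weighted = [di / ri for di, ri in zip(matvec(H, v), r_var)]
    return [p + o for p, o in zip(prior_term, matvec(H_T, weighted))]


obs_rhs = matvec(H_T, [yi / ri for yi, ri in zip(y, r_var)])
rhs = [xi / bi + oi for xi, bi, oi in zip(xb, b_var, obs_rhs)]
x_post = conjugate_gradient(hessian, rhs)
```

The operator is passed as a function (`hessian`) rather than as a matrix because, in a real inversion, applying it involves a forward and an adjoint run of the transport model, and the matrix itself is never built.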