# OpenGHG for users: accessing and interrogating data

In [None]:
from openghg.objectstore import visualise_store
from openghg.localclient import get_obs_surface, RankSources
from openghg.processing import search, get_footprint, footprints_data_merge, get_flux
from openghg.plotting import plot_footprint

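import sys
import pprint

import matplotlib.pyplot as plt

# Pretty-printer for inspecting nested search results
pp = pprint.PrettyPrinter(indent=4)

# plot_flux is imported from ukghg_maps in a local checkout of openghg-plotting
sys.path.insert(0, "../../../../openghg-plotting/")
from ukghg_maps import plot_flux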

%load_ext autoreload
%autoreload 2

## Check measurement data available

First, we can check which measurements are currently available in the object store.

*Note: not all data will be available to all users, depending on the permissions set.*

In [None]:
visualise_store()

## Accessing the measurement data

You can access the available measurement data by passing a set of keywords to `get_obs_surface`:

In [None]:
site = "bsd"  # Site code - BSD is Bilsdale
species = "co"  # Species name - carbon monoxide
inlet = "248m"  # Specific inlet height for this site

observations = get_obs_surface(site=site, species=species, inlet=inlet)
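Conceptually, each keyword narrows down the stored data sources by matching it against their metadata. The sketch below illustrates that idea with plain dictionaries; the records and the `match_sources` helper are hypothetical and are not how OpenGHG stores or queries its object store.

```python
# Hypothetical metadata records standing in for stored data sources
sources = [
    {"site": "bsd", "species": "co", "inlet": "248m"},
    {"site": "bsd", "species": "co", "inlet": "108m"},
    {"site": "hfd", "species": "co2", "inlet": "50m"},
]


def match_sources(records, **keywords):
    """Return the records whose metadata matches every keyword (case-insensitive)."""
    return [
        record
        for record in records
        if all(str(record.get(key, "")).lower() == str(value).lower() for key, value in keywords.items())
    ]


matches = match_sources(sources, site="BSD", species="co", inlet="248m")
print(matches)  # [{'site': 'bsd', 'species': 'co', 'inlet': '248m'}]
```

Dropping a keyword (for example `inlet`) widens the match, which is why a less specific request can return more than one data source.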

The `observations` variable contains both the measurement data and the associated metadata for the data source. The metadata can be viewed:

In [None]:
observations.metadata

### Tell `matplotlib` we're plotting within a notebook

In [None]:
%matplotlib inline

The data can be extracted (as an xarray Dataset) and quickly plotted:

In [None]:
data = observations.data
mol_frac = data.mf
mol_frac.plot()

## Within a date range

Pass `start_date` and `end_date` to restrict the data returned to a given period.

In [None]:
site = "hfd"
species = "co2"
inlet = "50m"
start_date = "2017-03-01"
end_date = "2017-03-30"

observations = get_obs_surface(site=site, species=species, inlet=inlet, start_date=start_date, end_date=end_date)
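A minimal sketch of what selecting by date range means, using only the standard library. The timestamps and values are made up, and whether OpenGHG treats `end_date` as inclusive or exclusive is not stated here, so the half-open interval used in the hypothetical `select_dates` helper is an assumption.

```python
from datetime import datetime

# Hypothetical timestamps and mole fraction values
timestamps = [
    datetime(2017, 2, 28, 23),
    datetime(2017, 3, 1, 0),
    datetime(2017, 3, 15, 12),
    datetime(2017, 3, 30, 0),
]
values = [410.1, 410.3, 409.8, 411.2]


def select_dates(times, data, start_date, end_date):
    """Keep (time, value) pairs with start <= time < end (half-open, an assumption)."""
    start = datetime.fromisoformat(start_date)
    end = datetime.fromisoformat(end_date)
    return [(t, v) for t, v in zip(times, data) if start <= t < end]


selected = select_dates(timestamps, values, "2017-03-01", "2017-03-30")
```

Under this half-open interpretation, data from 30 March itself would be excluded, so it is worth checking the library's convention before choosing an `end_date`.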

In [None]:
observations.data

In [None]:
mol_frac = observations.data.mf
mol_frac.plot()

## Less exact searches

Search for all carbon dioxide measurements in the object store:

In [None]:
results = search(species="co2", skip_ranking=True)

In [None]:
results

Examine the raw search results underneath:

In [None]:
pp.pprint(results.raw())

In [None]:
tac_co2 = results.retrieve(species="co2", site="tac", inlet="185m")
data = tac_co2.data
data

In [None]:
tac_co2.metadata

In [None]:
mol_frac = data.co2
mol_frac.plot()

### Heathfield instead of Tacolneston

In [None]:
hfd_co2 = results.retrieve(species="co2", site="hfd", inlet="50m")
mol_frac = hfd_co2.data.co2
mol_frac.plot()

### All data from a site

In [None]:
results = search(site="BSD", skip_ranking=True)

In [None]:
results
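A search on the site alone can return several species measured at several inlets. As an illustration only, the sketch below summarises hypothetical search metadata by species; the records and the `inlets_by_species` helper are made up and do not reflect the structure of OpenGHG's search results.

```python
from collections import defaultdict

# Hypothetical metadata for everything found at one site
found = [
    {"species": "co", "inlet": "248m"},
    {"species": "ch4", "inlet": "248m"},
    {"species": "ch4", "inlet": "108m"},
    {"species": "sf6", "inlet": "248m"},
]


def inlets_by_species(records):
    """Map each species to the sorted list of inlets it was measured at."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record["species"]].add(record["inlet"])
    return {species: sorted(inlets) for species, inlets in grouped.items()}


summary = inlets_by_species(found)
```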

In [None]:
sf6_data = results.retrieve(species="sf6", site="bsd", inlet="248m")

In [None]:
mol_frac = sf6_data.data.sf6
mol_frac.plot()

## Ranking inlets

If we want to prefer data from a specific inlet at a site, we can give that inlet a rank. Higher-ranked data will be preferred over lower-ranked data.

### Set some ranks

First, we find the sources we want to rank:

In [None]:
r = RankSources()
r.get_sources(site="tac", species="co")

In [None]:
r.set_rank(key="co_54m_lgr", rank=1, start_date="2016-09-01", end_date="2017-06-01")
r.set_rank(key="co_100m_lgr", rank=1, start_date="2017-06-02", end_date="2019-03-03")
r.set_rank(key="co_185m_lgr", rank=1, start_date="2019-03-03", end_date="2021-06-01")
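The effect of these ranks can be sketched as choosing, for each day, the source whose ranked period covers it. This is a hypothetical illustration, not OpenGHG's implementation: it assumes rank 1 is the strongest preference (a lower number wins) and that both ends of each period are inclusive; check the OpenGHG documentation for the actual conventions.

```python
from datetime import date

# Ranks mirroring the set_rank calls above: key -> list of (rank, start, end) periods.
# Assumption: rank 1 is the most preferred and both period ends are inclusive.
ranks = {
    "co_54m_lgr": [(1, date(2016, 9, 1), date(2017, 6, 1))],
    "co_100m_lgr": [(1, date(2017, 6, 2), date(2019, 3, 3))],
    "co_185m_lgr": [(1, date(2019, 3, 3), date(2021, 6, 1))],
}


def preferred_source(rank_table, day):
    """Return the key with the best (lowest-numbered) rank covering `day`, or None."""
    best_key, best_rank = None, None
    for key, periods in rank_table.items():
        for rank, start, end in periods:
            covers_day = start <= day <= end
            # On a tie the first key found is kept, so overlapping periods are ambiguous
            if covers_day and (best_rank is None or rank < best_rank):
                best_key, best_rank = key, rank
    return best_key


print(preferred_source(ranks, date(2018, 1, 1)))  # co_100m_lgr
```

Note that 2019-03-03 falls within both the `co_100m_lgr` and `co_185m_lgr` periods with equal rank. The sketch breaks that tie arbitrarily, which is a good reason to check that ranked date ranges do not overlap.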

In [None]:
r.get_sources(site="tac", species="co")

In [None]:
tac_data = search(site="tac", species="co").retrieve(site="tac", species="co")

In [None]:
tac_data

In [None]:
tac_data.metadata["rank_metadata"]

In [None]:
mol_frac = tac_data.data.co
mol_frac.plot()

## Comparing to predictions

OpenGHG provides tools to calculate predicted measurements from emissions sources, where the appropriate data are available.

This is done by multiplying the sensitivity map (footprint) for a given site by an emissions map covering the same region and summing over the grid.
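For a single time step, this amounts to multiplying the footprint sensitivity in each grid cell by the flux in that cell and summing over all cells. A minimal pure-Python sketch on a toy 2x2 grid is below; the numbers are made up, the `modelled_mole_fraction` helper is hypothetical, and it assumes the footprint and flux share the same grid and compatible units (for example, footprint in mol/mol per mol m^-2 s^-1 and flux in mol m^-2 s^-1).

```python
def modelled_mole_fraction(footprint, flux):
    """Sum footprint * flux over a 2D grid given as lists of rows on the same grid."""
    same_shape = len(footprint) == len(flux) and all(len(a) == len(b) for a, b in zip(footprint, flux))
    if not same_shape:
        raise ValueError("footprint and flux must share the same grid")
    return sum(s * e for fp_row, fx_row in zip(footprint, flux) for s, e in zip(fp_row, fx_row))


# Toy 2x2 grid; values are illustrative only
footprint = [[0.5, 1.0], [2.0, 0.0]]  # sensitivity per cell
flux = [[1e-8, 2e-8], [3e-8, 4e-8]]  # emissions per cell

mf_above_baseline = modelled_mole_fraction(footprint, flux)
```

The result is an enhancement above the background, which is why a baseline has to be added before comparing with measurements.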

### Retrieve some footprints

Here we retrieve a footprint by itself:

In [None]:
footprint = get_footprint(site="TAC", domain="europe", height="185m", start_date="2021-02-01")

In [None]:
fp_data = footprint.data
fp_data

### Plot the footprint

We can take a quick look at the footprint using an OpenGHG helper function. We plan to move the plotting functionality into a separate `openghg-plotting` package, as packages like cartopy can introduce complicated dependencies of their own.

In [None]:
plot_footprint(data=fp_data, label="Footprint")

### Merge a footprint and some data together into a single Dataset

For this we use the `footprints_data_merge` function from the `processing` module.

In [None]:
# Period, site and species to compare
start_date = "2012-01-01"
end_date = "2013-01-01"

site = "tac"
height = "100m"
species = "ch4"
domain = "EUROPE"
network = "decc"
source = "anthro"  # emissions (flux) source to load

# Merge observations with footprints, load the flux and calculate the modelled timeseries
combined_data = footprints_data_merge(site=site, height=height, network=network, domain=domain,
                                      start_date=start_date, end_date=end_date, species=species,
                                      flux_sources=source, load_flux=True, calc_timeseries=True)

data = combined_data.data
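Before a footprint and observations can sit in one Dataset, their timestamps have to be brought onto a common axis. The sketch below shows one simple way to do that, averaging observations into each footprint period; it is an illustration only, uses made-up data and a hypothetical `average_into_periods` helper, and is not necessarily how `footprints_data_merge` aligns the data.

```python
from datetime import datetime, timedelta


def average_into_periods(obs_times, obs_values, period_starts, period):
    """Average observations in [start, start + period) for each period start (None if empty)."""
    averaged = []
    for start in period_starts:
        in_period = [v for t, v in zip(obs_times, obs_values) if start <= t < start + period]
        averaged.append(sum(in_period) / len(in_period) if in_period else None)
    return averaged


# Hypothetical: observations every 30 minutes, footprints every 2 hours
obs_times = [datetime(2012, 1, 1) + timedelta(minutes=30 * i) for i in range(8)]
obs_values = [1900.0 + i for i in range(8)]
footprint_times = [datetime(2012, 1, 1), datetime(2012, 1, 1, 2)]

merged = average_into_periods(obs_times, obs_values, footprint_times, timedelta(hours=2))
```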

Have a quick look at the Dataset:

In [None]:
data

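The merged Dataset already contains the modelled mole fraction (`mf_mod`), calculated from the emissions estimate. Add a sensible baseline to it, estimated from the 1st percentile of the measurements: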

In [None]:
mf = data.mf
mf_mod = data.mf_mod
# Use the 1st percentile of the measurements as a simple baseline estimate
baseline = mf.quantile(0.01).drop_vars("quantile")
mf_mod = mf_mod + baseline

print(f"Estimating baseline based on 1st percentile of data: {baseline.values}")
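For reference, the baseline step can be sketched without xarray: take a linearly interpolated quantile of the measurements (as far as we know, the default method in NumPy's `quantile`, which xarray's `quantile` builds on) and add it to the modelled enhancement. The values and the `linear_quantile` helper are hypothetical, and NaN handling is omitted.

```python
def linear_quantile(values, q):
    """q-th quantile using linear interpolation between sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


measured = [1950.0, 1900.0, 1925.0, 2010.0, 1905.0]  # illustrative values only
modelled_enhancement = [40.0, 5.0, 20.0, 90.0, 10.0]

baseline = linear_quantile(measured, 0.01)
modelled_with_baseline = [m + baseline for m in modelled_enhancement]
```

Using a low percentile assumes the cleanest measurements approximate background air; it is a quick estimate rather than a rigorous baseline model.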

In [None]:
fig, ax = plt.subplots()

mf.plot(marker="x", ax=ax, label="Measured mole fraction")
mf_mod.plot(marker="o", ax=ax, label="Modelled mole fraction")

ax.legend()

## Retrieving emissions / flux data

Emissions (flux) data can also be retrieved and viewed through OpenGHG. Some simple plotting functions are provided and will be made available as part of the `openghg-plotting` package.

In [None]:
em_data = get_flux(species="ch4", sources="anthro", domain="europe", start_date="2012", end_date="2012")


In [None]:
em_data

In [None]:
flux_data = em_data.data.flux

Now we can make a quick plot to look at the data:

In [None]:
plot_flux(flux=flux_data)