In [None]:
from matplotlib import pyplot as plt

# Key functions used to process ALE/GAGE/AGAGE data

**IMPORTANT:** This notebook is designed to demonstrate and test the main functions in this repository. It is **NOT** intended to be used to create the AGAGE data archive. To do that, please refer to the [workflow documentation](../docs/workflow.md).

Before you run this tutorial, make sure you've run the setup script to create your config file.

In [None]:
from agage_archive.io import read_nc, read_ale_gage, output_dataset, combine_datasets, read_baseline, combine_baseline
from agage_archive.convert import monthly_baseline

For this tutorial, we will use the AGAGE test files in this repo by setting the network to ```agage_test```. To work with real data, set this to the appropriate directory within the ```data``` folder.

In [None]:
network = "agage_test"

To read AGAGE netCDF files, use the ```read_nc``` function. For example:

In [None]:
ds_agage = read_nc(network, "CH3CCl3", "CGO", "GCMS-Medusa")
ds_agage.mf.plot()

In [None]:
ds_agage

Similarly, ALE or GAGE data can be read using the ```read_ale_gage``` function:

In [None]:
ds_gage = read_ale_gage(network, "CH3CCl3", "CGO", "GAGE")

In [None]:
ds_gage

In [None]:
ds_gage.mf.plot()

The ```combine_datasets``` function calls these functions, based on the order in which they are specified in ```data/<network>/data_selector.xlsx```.

Here, we will create a CH3CCl3 timeseries from CGO ALE, GAGE, GCMD and Medusa data:

In [None]:
ds = combine_datasets(network, "CH3CCl3", "CGO", scale="SIO-05", verbose=False)

In [None]:
ds

In [None]:
ds.mf.plot()

To output the file to the output directory, use the ```output_dataset``` function:

In [None]:
output_dataset(ds, network, instrument="combined")

Now try a species that's only measured on the Medusa:

In [None]:
ds = combine_datasets(network, "nf3", "MHD", scale=None)

In [None]:
ds

In [None]:
output_dataset(ds, network, instrument="GCMS-Medusa")

## Extract baselines

First, for the GCMD data:

In [None]:
ds_baseline = read_baseline(network, "CH3CCl3", "CGO", "GCMD", verbose=False)
ds_agage_md = read_nc(network, "CH3CCl3", "CGO", "GCMD", verbose=False)

baseline_points = ds_baseline.baseline == 1
plt.plot(ds_agage_md.time, ds_agage_md.mf, ".", label = "All data")
plt.plot(ds_agage_md.time[baseline_points], ds_agage_md.mf[baseline_points], "o", label = "Baseline")
plt.ylabel("CH3CCl3 mole fraction (ppt)")
plt.legend()
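
The boolean indexing in the cell above only makes sense if the baseline flags and the mole fractions share the same time axis. The sketch below illustrates that idea in plain Python with made-up values, so it runs without the test data. The function name `select_baseline` is hypothetical and is not part of `agage_archive`; it is not how the package itself applies the flags.

```python
from datetime import datetime


def select_baseline(times, mf, flag_times, flags):
    """Return (time, mf) pairs whose baseline flag equals 1.

    Hypothetical helper for illustration only. Raises ValueError if the
    two time axes differ, because a positional mask is only meaningful
    when both records are aligned point for point.
    """
    if list(times) != list(flag_times):
        raise ValueError("baseline flags and data are on different time axes")
    return [(t, v) for t, v, f in zip(times, mf, flags) if f == 1]


# Made-up values, not real measurements
times = [datetime(2000, 1, 1, hour) for hour in range(4)]
mf = [45.1, 47.9, 45.3, 45.2]
flags = [1, 0, 1, 1]

baseline = select_baseline(times, mf, times, flags)
print(len(baseline), "of", len(mf), "points flagged as baseline")
```

If the time axes do not match, it is safer to investigate why than to apply the mask positionally.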

Now for the combined file:

In [None]:
ds_baseline_combined = combine_baseline(network, "CH3CCl3", "CGO", verbose=False)
ds_combined = combine_datasets(network, "CH3CCl3", "CGO", verbose=False)

baseline_points = ds_baseline_combined.baseline == 1
plt.plot(ds_combined.time, ds_combined.mf, ".", label = "All data")
plt.plot(ds_combined.time[baseline_points], ds_combined.mf[baseline_points], "o", label = "Baseline")
plt.ylabel("CH3CCl3 mole fraction (ppt)")
plt.legend()

## Monthly mean baselines

In [None]:
ds_monthly = monthly_baseline(ds_combined, ds_baseline_combined)

In [None]:
ds_monthly

In [None]:
ds_combined.mf.plot()
ds_monthly.mf.plot()
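
As a rough illustration of what a monthly mean baseline is, the sketch below keeps only the points flagged as baseline and averages them by calendar month. It uses plain Python and made-up values. The function `monthly_baseline_mean` is hypothetical: the real ```monthly_baseline``` function may label months differently, carry extra variables, or treat gaps in another way.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_baseline_mean(times, mf, flags):
    """Average baseline-flagged values by calendar month.

    Hypothetical helper for illustration only. Returns a dict mapping the
    first day of each month to the mean of the flagged values in that
    month. Months with no baseline points are omitted.
    """
    groups = defaultdict(list)
    for t, value, flag in zip(times, mf, flags):
        if flag == 1:
            groups[(t.year, t.month)].append(value)
    return {
        datetime(year, month, 1): mean(values)
        for (year, month), values in sorted(groups.items())
    }


# Made-up values, not real measurements
times = [
    datetime(2000, 1, 5),
    datetime(2000, 1, 20),
    datetime(2000, 1, 25),
    datetime(2000, 2, 3),
    datetime(2000, 2, 10),
]
mf = [40.0, 50.0, 42.0, 44.0, 60.0]
flags = [1, 0, 1, 1, 0]

monthly = monthly_baseline_mean(times, mf, flags)
for month_start, value in monthly.items():
    print(month_start.strftime("%Y-%m"), value)
```

The non-baseline values (50.0 and 60.0 here) do not contribute to the monthly means, which is why the monthly series in the plot above sits along the lower envelope of a species with polluted excursions.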