# OpenGHG for data providers: uploading and classifying data

The OpenGHG platform can interpret and standardise data from multiple sources. For measurement networks, this currently includes data from the following projects:

- AGAGE
- DECC
- LondonGHG

More networks can be added as appropriate. At present, once data has been uploaded it is available to access directly on the platform.

The standardised format aims to be CF and CEDA compliant (as long as the necessary metadata is provided).

## Manual upload

The current interface allows new measurement data to be uploaded directly to the platform by passing the data files along with a set of keywords so the data can be appropriately identified and categorised.

For instance, a data file or set of files from the Bilsdale site (site code "BSD") within the DECC network could be uploaded and stored within the OpenGHG cloud store using the keywords:

- data_type of "CRDS"
- site code of "BSD"
- network of "DECC"

The data_type here indicates the expected format of the data files themselves. This can be specific to the type of instrument being used, a site or a particular network (more details below).

In [None]:
from openghg.modules import ObsSurface
from openghg.localclient import find_files, process_files, RankSources
from openghg.processing import search
from pathlib import Path

folderpath = Path("/home/gar/Documents/Devel/RSE/openghg/data/demo/timeseries")

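# Map each data type to the folder holding its files.
# GCWERKS data is further split by instrument.
data_folders = {
    "CRDS": folderpath.joinpath("CRDS"),
    "GCWERKS": {
        "GCMD": folderpath.joinpath("gc_gcmd"),
        "GCMS": folderpath.joinpath("gc_gcms"),
        "medusa": folderpath.joinpath("gc_gcmd"),
    },
}

# Find the data files in each of these folders
find_files(data_folders=data_folders)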

To find and process all the files in a single step, we can use the `process_files` function:

In [None]:
# process_files(data_folders=data_folders)
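
The nested `data_folders` mapping used above pairs each data type (and, for GCWERKS, each instrument) with a folder. As a rough illustration of how such a mapping could be walked, the standalone sketch below flattens it into `(labels, file)` pairs. It is not OpenGHG's implementation of `find_files`: the helper `iter_data_files` and the demo file names are hypothetical, and only the standard library is used.

```python
import tempfile
from collections.abc import Mapping
from pathlib import Path


def iter_data_files(folders, prefix=()):
    """Yield (labels, path) for every file under a nested folder mapping.

    Hypothetical helper for illustration only; not part of OpenGHG.
    """
    for label, target in folders.items():
        labels = prefix + (label,)
        if isinstance(target, Mapping):
            # Nested level, e.g. GCWERKS -> GCMD / GCMS / medusa
            yield from iter_data_files(target, labels)
            continue
        folder = Path(target)
        if not folder.is_dir():
            # Skip folders that do not exist rather than failing
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file():
                yield labels, path


# Build a throwaway folder tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CRDS").mkdir()
    (root / "CRDS" / "example_crds.dat").touch()
    (root / "gc_gcmd").mkdir()
    (root / "gc_gcmd" / "example_gcmd.C").touch()

    demo_folders = {
        "CRDS": root / "CRDS",
        "GCWERKS": {"GCMD": root / "gc_gcmd"},
    }
    found = [(labels, path.name) for labels, path in iter_data_files(demo_folders)]

print(found)
```

Keeping the labels alongside each path means the data type and instrument are still known when the file is later handed to a processing step.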

#### Aside:

Accepted data types at the moment include:

- CRDS (data from CRDS instruments, typically within the DECC and AGAGE networks)
- GCWERKS (data from GC instruments, typically within the AGAGE network)
- NOAA
- THAMESBARRIER
- BEACO2N
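
Before uploading, a data provider may want to check that the chosen keywords are consistent with the list above. The following is a minimal sketch of such a check, assuming that data types are matched case-insensitively. The function `check_upload_keywords` and the set `ACCEPTED_DATA_TYPES` are hypothetical names introduced here for illustration; they are not part of the OpenGHG API.

```python
# Data types listed in this document as currently accepted.
ACCEPTED_DATA_TYPES = {"CRDS", "GCWERKS", "NOAA", "THAMESBARRIER", "BEACO2N"}


def check_upload_keywords(data_type, site, network):
    """Return the upload keywords as a dict, or raise if data_type is unknown.

    Hypothetical helper for illustration only; not part of OpenGHG.
    """
    # Assumption: data types are compared case-insensitively.
    normalised = data_type.strip().upper()
    if normalised not in ACCEPTED_DATA_TYPES:
        accepted = ", ".join(sorted(ACCEPTED_DATA_TYPES))
        raise ValueError(f"Unknown data_type {data_type!r}; expected one of: {accepted}")
    return {"data_type": normalised, "site": site.strip(), "network": network.strip()}


keywords = check_upload_keywords(data_type="crds", site="BSD", network="DECC")
print(keywords)
```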

## Automated upload

For each site, data providers will be given an API key, which ties the uploaded data to a pre-defined set of metadata. There will be several ways to upload data: through the OpenGHG Python interface, or more directly using `curl`.

```
$ openghg upload <API_key> my_data.dat
```
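
For the Python route, an upload would in essence be an authenticated HTTP request carrying the file contents. The sketch below only builds such a request with the standard library and does not send it. Everything specific in it is an assumption: the endpoint `UPLOAD_URL` is a placeholder, and passing the key as a bearer token in an `Authorization` header is one common convention, not a documented OpenGHG interface.

```python
import urllib.request

# Placeholder endpoint: the real upload URL is not given in this document.
UPLOAD_URL = "https://example.invalid/upload"


def build_upload_request(api_key, payload):
    """Build (but do not send) a POST request carrying the file contents.

    Hypothetical helper; the header scheme is an assumption.
    """
    return urllib.request.Request(
        UPLOAD_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


# In practice the payload would be the bytes read from my_data.dat.
request = build_upload_request(api_key="<API_key>", payload=b"example file contents")
print(request.get_method(), request.full_url)
```

Because the API key is tied to a site's metadata, the request itself would not need to repeat the site, network, or data type.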

## Ranking data

When multiple sets of data are available for the same site and species, it is possible to set up a *ranking* to provide an order of preference for the data returned over a given time period. Once created, this ranking will then persist and will be used whenever this data is accessed.

For example, the Tacolneston site has inlets at different heights. For different time periods, depending on the status of the instruments and the data availability, data from different inlet heights may be preferred. This preference can be recorded and stored so that it determines which data source is returned for each species at Tacolneston.
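
To make the idea concrete, the standalone sketch below shows one way date-ranged ranks could resolve to a single preferred source for a given day. It is an illustration only and does not use or describe the `RankSources` implementation. The class `RankedPeriod` and function `preferred_source` are hypothetical, and the sketch assumes that date ranges are inclusive and that a lower rank number means a higher preference.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class RankedPeriod:
    """A source key preferred with a given rank over a date range."""

    key: str
    rank: int
    start: date
    end: date

    def covers(self, day: date) -> bool:
        # Assumption: both ends of the range are inclusive.
        return self.start <= day <= self.end


def preferred_source(periods: Sequence[RankedPeriod], day: date) -> Optional[str]:
    """Return the key of the best-ranked source covering `day`, or None."""
    candidates = [p for p in periods if p.covers(day)]
    if not candidates:
        return None
    # Assumption: rank 1 is the most preferred. min() keeps the first
    # minimum, so equal ranks fall back to the order of `periods`.
    return min(candidates, key=lambda p: p.rank).key


periods = [
    RankedPeriod("co_54m_lgr", 1, date(2016, 9, 1), date(2017, 6, 1)),
    RankedPeriod("co_100m_lgr", 1, date(2017, 6, 2), date(2019, 3, 3)),
    RankedPeriod("co_185m_lgr", 1, date(2019, 3, 3), date(2021, 6, 1)),
]

print(preferred_source(periods, date(2018, 1, 15)))  # co_100m_lgr
print(preferred_source(periods, date(2015, 1, 1)))   # None
```

Days outside every ranked period return `None` here; how unranked periods are handled on the platform itself is not specified in this document.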

In [None]:
r = RankSources()
r.get_sources(site="tac", species="co")

In [None]:
r.set_rank(key="co_54m_lgr", rank=1, start_date="2016-09-01", end_date="2017-06-01")
r.set_rank(key="co_100m_lgr", rank=1, start_date="2017-06-02", end_date="2019-03-03")
r.set_rank(key="co_185m_lgr", rank=1, start_date="2019-03-03", end_date="2021-06-01")

In [None]:
r.get_sources(site="tac", species="co")

In [None]:
tac_data = search(site="tac", species="co").retrieve(site="tac", species="co")

In [None]:
tac_data

In [None]:
tac_data.metadata["rank_metadata"]