# CEDA DataPoint - Demonstration

`ceda-datapoint` is a Python package that provides search and access tools for data held primarily in the CEDA Archive, which includes many CCI Knowledge Exchange datasets. DataPoint wraps the `pystac_client` library and adds functionality such as reading Cloud Optimised formats via STAC records without manual intervention, and searching for items within nested collection structures.

Both of these are especially important to the CCI STAC index, which contains many examples of Cloud Optimised formats, as well as a more complex nested collection structure than other similar catalogues.

Additional documentation is available for [ceda-datapoint](https://cedadev.github.io/datapoint/), which can be installed with `pip install ceda-datapoint`.

Note that `pystac_client` itself has [good documentation](https://pystac-client.readthedocs.io/en/latest/usage.html) if any of this is unclear or something is not covered here.

First, we import `DataPointClient` from the `ceda_datapoint` module:

In [1]:
from ceda_datapoint import DataPointClient

Now we can create our client using the url of the STAC catalog that we want to work with.

In [None]:
# In the future this will be registered as `org='cci'` instead of needing the URL to be explicitly provided.
client = DataPointClient(url="https://api.stac.164.30.69.113.nip.io/")

The client has many different functions for examining the available collections. For the CCI index, it is easiest to view each level of collections below the main `cci` collection in the STAC browser: https://radiantearth.github.io/stac-browser/#/external/api.stac.164.30.69.113.nip.io/collections/cci
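
The browser link above follows a regular pattern, so a link for another collection can be built the same way. The sketch below assumes that every collection is reachable by swapping the final path segment for its ID; `browser_url` is a hypothetical helper, not part of `ceda-datapoint`.

```python
# Assumption: browser links for other collections follow the same pattern
# as the `cci` link quoted in the text.
BROWSER_ROOT = "https://radiantearth.github.io/stac-browser/#/external"
API_HOST = "api.stac.164.30.69.113.nip.io"


def browser_url(collection_id: str) -> str:
    """Return the STAC browser URL for a collection ID (hypothetical helper)."""
    return f"{BROWSER_ROOT}/{API_HOST}/collections/{collection_id}"


print(browser_url("cci"))
```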

From here you can navigate to any given subcollection and find the `ID` in either the URL or the record metadata. This can then be searched using the client. Alternatively if the UUID of the CEDA catalogue record is known, this relates directly to the STAC collection for that record, so it can be used to search for items as we will now demonstrate.

Note: For datapoint version 0.6.0 it will be possible to use `display_collections` under a given subcollection, such as `cci`, so that this navigation step is not necessary.

In [None]:
search = client.search(
    collections=['fire'], 
    query=[
        'aggregation=true', # Required for the CCI Collection
    ],
    max_items=10)

In the above search we have selected the `fire` collection to locate items for retrieval. We specify the `aggregation=true` queryable because, in the CCI index, items that are cloud optimised are listed as aggregations. Omitting this queryable returns a large number of ordinary items that cannot be processed directly with DataPoint, as they are not cloud optimised.
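
If the same kind of search is repeated for several collections, the arguments can be assembled in one place. This is a minimal sketch: `cci_search_kwargs` is a hypothetical helper, and it only reuses the `collections`, `query` and `max_items` arguments shown in the search above.

```python
def cci_search_kwargs(collection_id, max_items=10):
    """Build keyword arguments for a search limited to aggregations.

    Hypothetical helper; it mirrors the arguments passed to
    `client.search` in the cell above.
    """
    return {
        "collections": [collection_id],
        "query": ["aggregation=true"],  # keep only cloud optimised aggregations
        "max_items": max_items,
    }


kwargs = cci_search_kwargs("fire")
print(kwargs)

# With a client created as shown earlier, the search would then be:
# search = client.search(**kwargs)
```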

We can now display the assets from the items received by this query, where each item typically contains one file asset in the CCI index.

In [8]:
search.display_cloud_assets()

<DataPointItem: ESACCI-L4_FIRE-BA-MODIS-20010101-20200120-fv5.1-kr1.2 (Collection: esacci.fire.mon.l4.ba.modis.terra.modis_terra.v5-1.grid)>
 - kerchunk
<DataPointItem: 198201-201812-ESACCI-L4_FIRE-BA-AVHRR-LTDR-fv1.1_kr1.0 (Collection: 62866635ab074e07b93f17fbf87a2c1a-main)>
 - kerchunk


We can inspect some of the collection's metadata, including its spatial and temporal extents, as well as any summary information it has on its items.

In [18]:
fire = client._client.get_collection('fire')
print(fire.keywords)
print(fire.summaries)
fire.license

['ESACCI', 'Fire', 'fire']
<pystac.summaries.Summaries object at 0x10560d270>


'other'

Finally, we can open the kerchunk dataset for one of the items found by the above search by specifying the full title of the asset, as shown below.

In [27]:
ds = search.open_dataset('198201-201812-ESACCI-L4_FIRE-BA-AVHRR-LTDR-fv1.1_kr1.0-reference_file')
ds
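
In the cell above, the asset title is the item ID followed by `-reference_file`. The sketch below builds the title on the assumption that this pattern holds for other kerchunk assets too, which is inferred from this single example and not confirmed; `reference_asset_title` is a hypothetical helper.

```python
def reference_asset_title(item_id, suffix="reference_file"):
    """Join an item ID and an asset suffix into an asset title.

    Hypothetical helper; the `<item id>-reference_file` pattern is an
    assumption based on the one example in this notebook.
    """
    return f"{item_id}-{suffix}"


title = reference_asset_title("198201-201812-ESACCI-L4_FIRE-BA-AVHRR-LTDR-fv1.1_kr1.0")
print(title)

# With the search from earlier, the dataset would then be opened with:
# ds = search.open_dataset(title)
```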