# PID resolution example

- **Dependencies**:
    - xarray (optional, used only to inspect the data attributes)
    - PYHANDLE client
        - code and installation: https://github.com/EUDAT-B2SAFE/PYHANDLE
        - documentation: https://eudat-b2safe.github.io/PYHANDLE/
    
- **Content**:
    - read the tracking ID from a file
    - resolve the metadata associated with the tracking ID
    - retrieve specific metadata items (e.g. replica, version and dataset relationships)

In [1]:
import os

import xarray as xr

### Open a CMIP6 test file with a tracking_id global attribute

In [2]:
# local directory holding the test file; adjust to your environment
prefix = '/run/media/stephan/Volume/data/tas'
access_tas_file = os.path.join(prefix, 'tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc')
#other_tas_file = os.path.join(prefix, "tas_Amon_CESM2_piControl_r1i1p1f1_gn_000101-009912.nc")
dset = xr.open_dataset(access_tas_file, decode_times=True)
#dset2 = xr.open_dataset(other_tas_file, decode_times=True)

In [3]:
#dset.attrs
dset.attrs['tracking_id']

'hdl:21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000'

In [4]:
# drop the 'hdl:' scheme prefix to get the bare handle
tracking_id = dset.attrs['tracking_id'].split(':', 1)[1]
print(tracking_id)

21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000
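Splitting on `:` assumes that the attribute always starts with `hdl:`. A slightly more defensive variant is sketched below. The helper name `strip_hdl_prefix` is hypothetical and not part of xarray or PyHandle; it removes the prefix only when it is present.

```python
def strip_hdl_prefix(tracking_id: str) -> str:
    """Return the bare handle ('<prefix>/<suffix>') without a leading 'hdl:'."""
    tracking_id = tracking_id.strip()
    if tracking_id.lower().startswith("hdl:"):
        return tracking_id[len("hdl:"):]
    return tracking_id


# Works with and without the scheme prefix.
print(strip_hdl_prefix("hdl:21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000"))
print(strip_hdl_prefix("21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000"))
```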


### Use the PyHandle client to retrieve the metadata associated with the tracking_id

In [5]:
from pyhandle.handleclient import PyHandleClient
client = PyHandleClient('rest')

INFO:pyhandle.client.resthandleclient:Instantiating RESTHandleClient at 2019-06-28_15:16
INFO:pyhandle.handlesystemconnector:Instantiating HandleSystemConnector
INFO:pyhandle.handlesystemconnector: - handle_server_url set to default: https://hdl.handle.net
INFO:pyhandle.handlesystemconnector: - url_extension_REST_API set to default: /api/handles/
INFO:pyhandle.handlesystemconnector: - https_verify set to default: True
INFO:pyhandle.searcher:Instantiating Searcher
INFO:pyhandle.searcher: - https_verify set to default: True
INFO:pyhandle.searcher: - allowed_search_keys set to default: ['URL', 'CHECKSUM']
INFO:pyhandle.searcher: - solrbaseurl: No default.
INFO:pyhandle.searcher: - reverselookup_url_extension set to default: /hrls/handles/
INFO:pyhandle.searcher: - reverselookup_username: Not specified. No default.
INFO:pyhandle.searcher: - reverselookup_password: Not specified. No default.
INFO:pyhandle.searcher:Reverse lookup not possible. Neither username nor password were provided.
INF
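The log reports the client defaults: handle server `https://hdl.handle.net` and REST path `/api/handles/`. Assuming these defaults, the same record should be reachable with a plain HTTP GET. The sketch below uses only the standard library. The helper names `handle_record_url` and `fetch_handle_record` are hypothetical, and the fetch function is defined but not called here because it needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Defaults taken from the PyHandle log output; adjust for another handle server.
HANDLE_SERVER_URL = "https://hdl.handle.net"
REST_API_EXTENSION = "/api/handles/"


def handle_record_url(handle: str) -> str:
    """Build the REST URL for a bare handle such as '21.14100/<uuid>'."""
    return HANDLE_SERVER_URL + REST_API_EXTENSION + quote(handle)


def fetch_handle_record(handle: str, timeout: float = 10.0) -> dict:
    """Fetch the handle record as a dict (requires network access)."""
    with urlopen(handle_record_url(handle), timeout=timeout) as response:
        return json.load(response)


print(handle_record_url("21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000"))
```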

### Inspect the JSON metadata record

In [6]:
from pprint import pprint

result = client.retrieve_handle_record_json(tracking_id)
info_types = ['FILE_SIZE', 'CHECKSUM', 'URL_ORIGINAL_DATA', 'URL_REPLICA', 'IS_PART_OF']

# print every entry type in the record and collect the values of the selected types
print("Metadata entry types supported:")
res = {}
for entry in result['values']:
    print(entry['type'])
    if entry['type'] in info_types:
        res[entry['type']] = entry['data']['value']

Metadata entry types supported:
URL
AGGREGATION_LEVEL
FIXED_CONTENT
FILE_NAME
FILE_SIZE
IS_PART_OF
FILE_VERSION
CHECKSUM
CHECKSUM_METHOD
URL_ORIGINAL_DATA
URL_REPLICA
HS_ADMIN
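The selection loop can be wrapped in a small reusable function. The sketch below uses a hypothetical helper `select_entries` and a hand-written, heavily trimmed sample record. The sample mirrors only the `values` / `type` / `data` / `value` layout seen in this notebook and is not a complete handle record.

```python
def select_entries(record: dict, wanted: list) -> dict:
    """Map entry type -> value for the requested types of a handle record."""
    return {
        entry["type"]: entry["data"]["value"]
        for entry in record.get("values", [])
        if entry["type"] in wanted
    }


# Trimmed, illustrative record (the FILE_NAME value is an assumption).
sample_record = {
    "handle": "21.14100/7ba3c844-8001-404b-b3df-7e5b680b4000",
    "values": [
        {"type": "FILE_NAME",
         "data": {"format": "string",
                  "value": "tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc"}},
        {"type": "FILE_SIZE",
         "data": {"format": "string", "value": "139975118"}},
        {"type": "IS_PART_OF",
         "data": {"format": "string",
                  "value": "hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b"}},
    ],
}

selected = select_entries(sample_record, ["FILE_SIZE", "IS_PART_OF"])
print(selected)
```

With the live client, the same call would take the dictionary returned by `client.retrieve_handle_record_json(tracking_id)` as its first argument.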


In [7]:
# print retrieved metadata
pprint(res) 

{'CHECKSUM': '81b93009445bb962c0df852e9ae3a637719fb4a070790def2dd388e2ec9af427',
 'FILE_SIZE': '139975118',
 'IS_PART_OF': 'hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b',
 'URL_ORIGINAL_DATA': '<locations><location '
                      'href="http://esgdata.gfdl.noaa.gov/thredds/fileServer/gfdl_dataroot3/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc" '
                      'publishedOn="2018-10-04T18:12:27.152+00:00" '
                      'host="esgdata.gfdl.noaa.gov" '
                      'dataset="hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b" '
                      '/></locations>',
 'URL_REPLICA': '<locations><location '
                'href="http://esgf-data3.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc" '
                'publishedOn="2019-03-13T16:56:48.187+00:
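The `CHECKSUM` and `FILE_SIZE` entries can be used to check a local copy of the file. The sketch below assumes that the checksum is a SHA-256 hex digest. The 64-character value is consistent with that, but the record's `CHECKSUM_METHOD` entry should be consulted to confirm it. `verify_file` is a hypothetical helper, and the demo runs on a small temporary file rather than on the CMIP6 file.

```python
import hashlib
import os
import tempfile


def verify_file(path, expected_checksum, expected_size,
                algorithm="sha256", chunk_size=1024 * 1024):
    """Return True if the file size and hex digest match the expected values."""
    # Cheap check first: the record stores the size as a string.
    if os.path.getsize(path) != int(expected_size):
        return False
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so that large NetCDF files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_checksum.lower()


# Demo on a 4-byte temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")
    with open(demo_path, "wb") as fh:
        fh.write(b"demo")
    demo_digest = hashlib.sha256(b"demo").hexdigest()
    matches = verify_file(demo_path, demo_digest, "4")
    mismatch = verify_file(demo_path, demo_digest, "5")

print(matches, mismatch)
```

For the file opened earlier, the call would be `verify_file(access_tas_file, res['CHECKSUM'], res['FILE_SIZE'])`.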

### Using the get_value_from_handle method to retrieve specific handle metadata entries

In [8]:
for info_type in info_types:
    result = client.get_value_from_handle(tracking_id, info_type)
    print(info_type, " :", result)

FILE_SIZE  : 139975118
CHECKSUM  : 81b93009445bb962c0df852e9ae3a637719fb4a070790def2dd388e2ec9af427
URL_ORIGINAL_DATA  : <locations><location href="http://esgdata.gfdl.noaa.gov/thredds/fileServer/gfdl_dataroot3/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc" publishedOn="2018-10-04T18:12:27.152+00:00" host="esgdata.gfdl.noaa.gov" dataset="hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b" /></locations>
URL_REPLICA  : <locations><location href="http://esgf-data3.ceda.ac.uk/thredds/fileServer/esg_cmip6/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc" publishedOn="2019-03-13T16:56:48.187+00:00" host="esgf-data3.ceda.ac.uk" dataset="hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b" /><location href="http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/tas_Amon_G
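The `URL_ORIGINAL_DATA` and `URL_REPLICA` values are small XML fragments with one `<location>` element per copy of the file. A minimal sketch for turning such a value into a list of attribute dictionaries follows, using the standard-library `xml.etree.ElementTree`. The helper name `parse_locations` is hypothetical, and the sample string is the `URL_ORIGINAL_DATA` value shown above.

```python
import xml.etree.ElementTree as ET


def parse_locations(xml_text: str) -> list:
    """Return one dict of attributes (href, host, ...) per <location> element."""
    root = ET.fromstring(xml_text)
    return [dict(location.attrib) for location in root.iter("location")]


sample = (
    '<locations><location '
    'href="http://esgdata.gfdl.noaa.gov/thredds/fileServer/gfdl_dataroot3/CMIP/'
    'NOAA-GFDL/GFDL-CM4/piControl/r1i1p1f1/Amon/tas/gr1/v20180701/'
    'tas_Amon_GFDL-CM4_piControl_r1i1p1f1_gr1_015101-025012.nc" '
    'publishedOn="2018-10-04T18:12:27.152+00:00" '
    'host="esgdata.gfdl.noaa.gov" '
    'dataset="hdl:21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b" '
    '/></locations>'
)

locations = parse_locations(sample)
for location in locations:
    print(location["host"], location["href"])
```

Applied to `res['URL_REPLICA']`, the same function would return one dictionary per replica host.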

In [9]:
# get metadata of the containing dataset
resultm = client.retrieve_handle_record_json(res['IS_PART_OF'].split(':')[1])
pprint(resultm)

{'handle': '21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b',
 'responseCode': 1,
 'values': [{'data': {'format': 'string',
                      'value': 'https://handle-esgf.dkrz.de/lp/21.14100/c6dd2530-655c-33be-b03c-7c981bcf704b'},
             'index': 1,
             'timestamp': '2018-10-04T18:12:30Z',
             'ttl': 86400,
             'type': 'URL'},
            {'data': {'format': 'string', 'value': 'DATASET'},
             'index': 2,
             'timestamp': '2018-10-04T18:12:30Z',
             'ttl': 86400,
             'type': 'AGGREGATION_LEVEL'},
            {'data': {'format': 'string', 'value': 'TRUE'},
             'index': 3,
             'timestamp': '2018-10-04T18:12:30Z',
             'ttl': 86400,
             'type': 'FIXED_CONTENT'},
            {'data': {'format': 'string',
                      'value': 'CMIP6.CMIP.NOAA-GFDL.GFDL-CM4.piControl.r1i1p1f1.Amon.tas.gr1'},
             'index': 4,
             'timestamp': '2018-10-04T18:12:30Z',
            