# Provena Dataset Registration with Metadata
This notebook demonstrates how a Provena user can register a new dataset in the Provena Data Store.
It covers the prerequisites for registering a dataset, the registration process itself, and post-registration activities such as updating metadata with new, missed, or changed fields, releasing the dataset, creating new versions, and changing access permissions.

## Configuration Setup

In [None]:
# Small helper module which provides a config object for validation and
# a loader function
import example_workflow_config
# Helpers for interacting with the registry and the data store
import registry
import datastore
# Helper for managing authentication with Provena
import mdsisclienttools.auth.TokenManager as ProvenaAuth

import json
import time
import requests
from utils import pprint_json

In [None]:
# Provena config - replace with your Provena instance endpoints

# Replace the domain with the domain of your Provena instance
PROVENA_DOMAIN = "dev.rrap-is.com"

# Edit this to point to the Keycloak instance for your Provena instance
kc_endpoint = "https://auth.dev.rrap-is.com/auth/realms/rrap"

stage = "DEV"
registry_endpoint = "https://registry-api.{}".format(PROVENA_DOMAIN)
provenance_endpoint = "https://prov-api.{}".format(PROVENA_DOMAIN)
data_store_endpoint = "https://data-api.{}".format(PROVENA_DOMAIN)
job_endpoint = "https://job-api.{}".format(PROVENA_DOMAIN)



In [None]:
# Set up the auth connection. This may open a browser window if you have not
# signed in recently. Tokens are cached in .tokens.json, so make sure that file
# is listed in .gitignore.
provena_auth = ProvenaAuth.DeviceFlowManager(
    stage=stage,
    keycloak_endpoint=kc_endpoint
)

# Expose the get_auth function, which is passed to the Provena helper methods
get_auth = provena_auth.get_auth

## Prerequisites to Dataset registration
Dataset metadata references organisations, owners, and optionally other users (e.g. the Data Custodian). These entities must be registered in the Provena Data Store before they can be referenced in a dataset registration. This is generally a one-off activity and is therefore best performed using the web user interface. [A guide for registering entities is available](http://docs.provena.io/registry/registering_and_updating.html). Furthermore, you must be registered as a Person Entity yourself; details are given below.

At a minimum, the following entities must be registered before dataset registration (unless marked optional):
* **Person Entity** for yourself (Provena automatically assigns your Person Entity as the owner of the dataset entity)
* (Optional) **Person Entity** of the Dataset's Data Custodian
* **Organisation Entity** of the Dataset's Record Creator Organisation
* **Organisation Entity** of the Dataset's Publisher


After registering a Person Entity for yourself, you must also [link your account to this Person Entity](http://docs.provena.io/getting-started-is/linking-identity.html).


### Prerequisite entities

I have pre-registered the following entities through the web user interface. This generated the handles below, which are used in the dataset metadata fields later:

| Entity Type and Purpose | Entity Handle + Link |
|-------------------------|----------------------|
| Person Entity of myself           | [10378.1/1764273](https://hdl.handle.net/10378.1/1764273)   |
| Person Entity of the Dataset's Data Custodian (Peter Baker)  | [10378.1/1758949](https://hdl.handle.net/10378.1/1758949)   |
| Organisation Entity of the Dataset's Record Creator Organisation (CSIRO)  | [10378.1/1764284](https://hdl.handle.net/10378.1/1764284)   |
| Organisation Entity of the Dataset's Publisher (CSIRO)  | [10378.1/1764284](https://hdl.handle.net/10378.1/1764284)   |
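
Each handle in the table resolves through the `hdl.handle.net` proxy, as the links show. As a small sketch, the hypothetical helpers below (`is_handle` and `handle_url` are illustrative names, not part of the Provena tooling) check that a string looks like a handle and build its resolver link. The pattern is an assumption inferred only from the handles listed in this table.

```python
import re

# ASSUMED shape, inferred from the handles in the table above:
# a dotted numeric prefix, a slash, then a non-empty suffix.
HANDLE_PATTERN = re.compile(r"\d+(\.\d+)*/\S+")


def is_handle(value: str) -> bool:
    """Return True if the whole string matches the assumed handle shape."""
    return HANDLE_PATTERN.fullmatch(value) is not None


def handle_url(handle: str) -> str:
    """Build the hdl.handle.net resolver link for a handle."""
    if not is_handle(handle):
        raise ValueError(f"Not a valid handle: {handle!r}")
    return f"https://hdl.handle.net/{handle}"


print(handle_url("10378.1/1764273"))
```

A check like this catches a common copy and paste mistake, namely pasting the full resolver URL where only the bare handle is expected.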



In [None]:
# Handles of the pre-registered entities (replace with your own during the demonstration)
record_creator = "10378.1/1764284"
publisher = "10378.1/1764284"
data_custodian = "10378.1/1758949"

## Dataset Registration
With the prerequisites in place, the following sections demonstrate how to register a dataset in the Provena Data Store. You can check the expected JSON payload for each endpoint in the API documentation at https://data-api.dev.rrap-is.com/redoc. For registering a new dataset, see https://data-api.dev.rrap-is.com/redoc#tag/Register-dataset.
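
The documentation links above are specific to the `dev.rrap-is.com` deployment. Assuming other Provena instances serve the data store API docs under the same `data-api.<domain>/redoc` pattern (an assumption, not confirmed here), a minimal sketch for deriving the links from your own domain could look like this. `redoc_url` is a hypothetical helper.

```python
from typing import Optional


def redoc_url(domain: str, tag: Optional[str] = None) -> str:
    """Build the data store ReDoc link for a Provena domain.

    ASSUMPTION: the docs live at https://data-api.<domain>/redoc,
    mirroring the dev.rrap-is.com links used in this notebook.
    """
    base = f"https://data-api.{domain}/redoc"
    # ReDoc addresses sections with a "#tag/<name>" fragment
    return f"{base}#tag/{tag}" if tag else base


print(redoc_url("dev.rrap-is.com"))
print(redoc_url("dev.rrap-is.com", tag="Register-dataset"))
```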

#### Load in Dataset Metadata

In [None]:
# Path to the file containing the dataset metadata.
# The entity handles defined above are injected into this metadata after loading.

dataset_metadata_path = "configs/example_dataset_registration.json"

# load into dict
with open(dataset_metadata_path) as f:
    dataset_metadata = json.load(f)

# Inject the entity handles into the metadata
dataset_metadata = datastore.inject_references(
    dataset_metadata, record_creator, publisher, data_custodian
)

# Pretty display (uncomment to inspect the final payload)
# pprint_json(dataset_metadata)
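
The `datastore.inject_references` helper is local to this notebook's repository and its body is not shown here. As a rough sketch of what such a step might involve, the hypothetical function below writes the three handles into a metadata dict. The field names (`associations.organisation_id`, `associations.data_custodian_id`, `dataset_info.publisher_id`) are assumptions, so check them against the payload schema in your instance's API documentation before relying on them.

```python
import copy
from typing import Any, Dict, Optional


def inject_references_sketch(
    metadata: Dict[str, Any],
    record_creator: str,
    publisher: str,
    data_custodian: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of the metadata with entity handles filled in."""
    # Work on a deep copy so the caller's dict is not mutated
    updated = copy.deepcopy(metadata)

    # ASSUMED field names; verify against the real payload schema
    associations = updated.setdefault("associations", {})
    associations["organisation_id"] = record_creator
    if data_custodian is not None:
        # The Data Custodian is optional, so only set it when provided
        associations["data_custodian_id"] = data_custodian

    updated.setdefault("dataset_info", {})["publisher_id"] = publisher
    return updated


example = {"dataset_info": {"name": "Example dataset"}}
result = inject_references_sketch(
    example, "10378.1/1764284", "10378.1/1764284", "10378.1/1758949"
)
print(result)
```

Returning a copy rather than editing in place keeps the loaded file contents available for comparison if the registration is rejected.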

#### Post Dataset Metadata to Provena for Creation of Dataset Entity

In [None]:
# Register the dataset; the response includes the handle of the new dataset
register_response = datastore.register_dataset(
    datastore_endpoint=data_store_endpoint,
    dataset_metadata=dataset_metadata,
    auth=get_auth(),
)
print(f"Registered dataset with id {register_response['handle']}")
pprint_json(register_response)

In [None]:
print(register_response['handle'])
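
Reading `register_response['handle']` directly fails with a `KeyError` if the response dict has no `handle` key. A slightly more defensive sketch is shown below. It assumes the response is a dict that may carry a `status` object with `success` and `details` fields; that shape is an assumption, so confirm it against the API documentation. `extract_handle` is a hypothetical helper and the handle in the example is made up.

```python
from typing import Any, Dict


def extract_handle(response: Dict[str, Any]) -> str:
    """Return the dataset handle, or raise with a readable message."""
    # ASSUMED response shape: {"status": {"success": bool, "details": str}, "handle": str}
    status = response.get("status") or {}
    if status.get("success") is False:
        details = status.get("details", "no details provided")
        raise RuntimeError(f"Registration failed: {details}")

    handle = response.get("handle")
    if not handle:
        raise RuntimeError("Registration response did not include a handle")
    return handle


# Made-up response for illustration only
ok_response = {"status": {"success": True, "details": "ok"}, "handle": "10378.1/0000000"}
print(extract_handle(ok_response))
```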