# Examples
This notebook serves as an illustrative example of how to use the prototype. It shows how the different APIs can be accessed and how the prototype can provide several types of evaluation measurements to assist reviewers in the assessment of DMPs.

## Setup Dependencies
Before running any requests against the prototype, the DMP Evaluator Application has to be running, and the requests package has to be installed in this workbench in order to communicate with the prototype endpoints.

In [1]:
import sys
!{sys.executable} -m pip install requests






In [2]:
import requests

## Information on Implemented Components
In the examples provided in this Section we query the endpoints of the prototype which provide meta-information on the implemented components. 

### Get information on available DMP loaders
The prototype provides the ability to integrate different DMP loaders in order to fetch DMPs from different sources, such as local files or remote sources, and convert the gathered DMPs into a uniform DCS DMP in Turtle and JSON notation. The following call lists the DMP loader components available in the prototype implementation. In this case there is one component, with the identifier "JSON-FILE", which loads JSON DMPs from the filesystem.

In [3]:
dmp_loader_info = requests.get('http://localhost:8080/info/dmp-providers')
dmp_loader_info.json()

['JSON-FILE']

### Get information on available Context providers
The prototype provides the ability to integrate different Context providers in order to fetch contextual information regarding the given DMP and provide it in a uniform format for use downstream in the evaluation of the DMP. The following call lists the Context provider components available in the prototype implementation.

In this case there are 2 context loader components available. The "OPEN_AIRE" context loader retrieves more information on datasets mentioned in a DMP, and the "RE3DATA" context loader provides contextual information for hosts mentioned in a DMP.

In [4]:
context_loader_info = requests.get('http://localhost:8080/info/context-providers')
context_loader_info.json()

['OPEN_AIRE', 'RE3DATA']

### Get information on available evaluators
The prototype implementation provides the ability to integrate "Evaluator" components, which independently provide DMPQV measurements that contribute to the overall evaluation result. Each evaluator component contributes measurements for exactly 1 evaluation dimension. Evaluator components communicate the scope of their measurements to other actors by providing information on the evaluation dimension, category and metric that their measurements target. The call below lists all implemented dimensions grouped by their evaluation category.

In [5]:
eval_info = requests.get('http://localhost:8080/api/evaluation/info/evaluators')
eval_info.json()

[{'category': 'COMPLETENESS',
  'dimensions': ['DCS_COMPLETENESS', 'SCIENCE_EUROPE_EXTENSION_COMPLETENESS']},
 {'category': 'COMPLIANCE',
  'dimensions': ['DCS_COMPLIANCE', 'SCIENCE_EUROPE_GUIDELINE_COMPLIANCE']},
 {'category': 'FEASIBILITY', 'dimensions': ['ACCURACY', 'AVAILABILITY']},
 {'category': 'QUALITY_OF_ACTIONS', 'dimensions': ['FAIR']}]
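When working with measurements later on, it can be handy to look up the category of a given dimension. The following is a minimal sketch that inverts the listing above; `eval_info_sample` is a copy of the output shown above, and `category_by_dimension` is a hypothetical helper name, not part of the prototype.

```python
# Sample copied from the output of the evaluators endpoint shown above.
eval_info_sample = [
    {"category": "COMPLETENESS",
     "dimensions": ["DCS_COMPLETENESS", "SCIENCE_EUROPE_EXTENSION_COMPLETENESS"]},
    {"category": "COMPLIANCE",
     "dimensions": ["DCS_COMPLIANCE", "SCIENCE_EUROPE_GUIDELINE_COMPLIANCE"]},
    {"category": "FEASIBILITY", "dimensions": ["ACCURACY", "AVAILABILITY"]},
    {"category": "QUALITY_OF_ACTIONS", "dimensions": ["FAIR"]},
]


def category_by_dimension(info):
    """Invert the category -> dimensions listing into dimension -> category."""
    # Hypothetical helper for illustration; not part of the prototype API.
    return {
        dimension: entry["category"]
        for entry in info
        for dimension in entry["dimensions"]
    }


dimension_categories = category_by_dimension(eval_info_sample)
print(dimension_categories["AVAILABILITY"])  # FEASIBILITY
```

In the notebook, `eval_info.json()` could be passed instead of the copied sample.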

### UC1 Call Evaluation Endpoint to produce all available measurements for the minimal DMP
The call below shows the basic case: the evaluation of the DMP stored in 'evaluation/minimal.json' for the data lifecycle 'published'. Because no restrictions on evaluation categories and dimensions are given, the evaluation considers all possible metrics.

In [7]:
json_data = {
    'dmpLoaderParameters': {
        'dmpLoader': 'JSON-FILE',
        'dmpIdentifier': 'evaluation/minimal.json',
    },
    'dataLifecycle': 'PUBLISHED',
    'dimensions': [],
    'categories': []
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post('http://localhost:8080/api/evaluation/evaluate', headers=headers, json=json_data)

result = response.json()

In [7]:
result

{'dmpStoreId': 'ec48f00a-5d6c-4939-b470-5fcf6c97001d',
 'evaluationId': '99ec93ad-fb21-4d0b-b8c8-230ff7a6231b',
 'measurements': [{'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'required_entity_or_property_existent',
    'description': 'Existence of a required entity or property according to the specification',
    'title': 'DCS Completeness',
    'inDimension': {'inCategory': {'title': 'COMPLETENESS'},
     'title': 'DCS_COMPLETENESS'},
    'applicableDMPLifeCycles': [{'title': 'PLANNING'}],
    'expectedDataType': 'http://www.w3.org/2001/XMLSchema#boolean',
    'metricTests': []},
   'computedOn': {'entity': 'dmp'},
   'value': True,
   'softwareAgent': {'title': 'Apache Jena SHACL Validator'},
   'testResults': []},
  {'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'dcs_multiplicity_metric',
    'description': 'Mutltiplicity of value in the DMP in accordance with the DCS application profile',
    'title': 'DCS Multipli

The code below shows which evaluation dimensions are included in the evaluation. The expected values are "DCS Completeness", "Accuracy", "Availability", "Guideline Compliance" and "DCS Compliance". The result should also include measurements from the FAIR dimensions "Findable", "Accessible", "Interoperable" and "Reusable". If the measurements from the FAIR evaluation are missing, the external F-UJI service, which is included as a Docker container, may not be running properly.

In [8]:
measurements = result["measurements"]
evalDimensions = {x["isMeasurementOf"]["inDimension"]["title"] for x in measurements}
evalDimensions

{'AVAILABILITY', 'DCS_COMPLETENESS', 'DCS_COMPLIANCE'}
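To see at a glance which dimensions produced no measurements, the observed set can be compared with the dimensions reported by the evaluators endpoint. The sketch below uses copies of the two outputs shown above; `missing_dimensions` is a hypothetical helper. A missing dimension is only a hint: it may point to an unavailable service such as F-UJI, or the evaluator may simply have had nothing to measure for this DMP.

```python
# All dimensions reported by the evaluators endpoint (copied from the output above).
all_dimensions = {
    "DCS_COMPLETENESS", "SCIENCE_EUROPE_EXTENSION_COMPLETENESS",
    "DCS_COMPLIANCE", "SCIENCE_EUROPE_GUIDELINE_COMPLIANCE",
    "ACCURACY", "AVAILABILITY", "FAIR",
}

# Dimensions observed in the evaluation of the minimal DMP (copied from the output above).
observed_dimensions = {"AVAILABILITY", "DCS_COMPLETENESS", "DCS_COMPLIANCE"}


def missing_dimensions(available, observed):
    """Return the dimensions without any measurement, sorted by name."""
    # Hypothetical helper for illustration; not part of the prototype API.
    return sorted(set(available) - set(observed))


missing = missing_dimensions(all_dimensions, observed_dimensions)
print(missing)
# ['ACCURACY', 'FAIR', 'SCIENCE_EUROPE_EXTENSION_COMPLETENESS',
#  'SCIENCE_EUROPE_GUIDELINE_COMPLIANCE']
```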

### UC3 Generate evaluation report: Aggregate and Average Measurements
While the DMPQV quality measurements contain all the information resulting from the DMP evaluation, they might not provide immediate benefit to the reviewer. For example, if a reviewer wants to know how many points a data set scores in a FAIR evaluation or what the mean FAIR score is, then this information needs to be extracted from the available measurements. As a proof of concept, we implemented the calculation of sums and mean values over DMPQV dimensions to show how the information can be processed.

For this test case, the DMP "ex7-dataset-many.json" must be evaluated first. This DMP contains 3 references that can be resolved and 3 references that are broken. Initiating the evaluation returns a reference to the result, which is then used to request an evaluation report containing the sum and the average of the measurement values for the dimension availability.

In [9]:
json_data = {
    'dmpLoaderParameters': {
        'dmpLoader': 'JSON-FILE',
        'dmpIdentifier': 'dcs-repo-examples/ex7-dataset-many.json',
    },
    'dataLifecycle': 'PUBLISHED'
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post('http://localhost:8080/api/evaluation/evaluate', headers=headers, json=json_data)

exManyResult = response.json()
exManyEvaluationID = exManyResult["evaluationId"]

In [10]:
exManyResult

{'dmpStoreId': 'dbe435e3-91db-4bf3-8989-ef7393bd00fc',
 'evaluationId': 'b41a9b25-f1ea-4956-ad44-db5a921dd30e',
 'measurements': [{'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'required_entity_or_property_existent',
    'description': 'Existence of a required entity or property according to the specification',
    'title': 'DCS Completeness',
    'inDimension': {'inCategory': {'title': 'COMPLETENESS'},
     'title': 'DCS_COMPLETENESS'},
    'applicableDMPLifeCycles': [{'title': 'PLANNING'}],
    'expectedDataType': 'http://www.w3.org/2001/XMLSchema#boolean',
    'metricTests': []},
   'computedOn': {'entity': 'dmp'},
   'value': True,
   'softwareAgent': {'title': 'Apache Jena SHACL Validator'},
   'testResults': []},
  {'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'dcs_multiplicity_metric',
    'description': 'Mutltiplicity of value in the DMP in accordance with the DCS application profile',
    'title': 'DCS Multipli

This is the evaluation id of the evaluation mentioned above, which the prototype uses to reference the evaluation result and all corresponding artifacts.

In [11]:
exManyEvaluationID

'b41a9b25-f1ea-4956-ad44-db5a921dd30e'

In [12]:
json_data = {
    "evaluationId": exManyEvaluationID,
    "aggregateDimensions": [
        "availability"
    ],
    "averageDimensions": [
        "availability"
    ]
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post('http://localhost:8080/api/evaluation/createReport', headers=headers, json=json_data)

report = response.json()
report["dmp"] = ""

In [13]:
report

{'dmp': '',
 'dmpFormat': 'RDF/JSON',
 'measurements': [{'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'required_entity_or_property_existent',
    'description': 'Existence of a required entity or property according to the specification',
    'title': 'DCS Completeness',
    'inDimension': {'inCategory': {'title': 'COMPLETENESS'},
     'title': 'DCS_COMPLETENESS'},
    'applicableDMPLifeCycles': [{'title': 'PLANNING'}],
    'expectedDataType': 'http://www.w3.org/2001/XMLSchema#boolean',
    'metricTests': []},
   'computedOn': {'entity': 'dmp'},
   'value': True,
   'softwareAgent': {'title': 'Apache Jena SHACL Validator'},
   'testResults': []},
  {'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'dcs_multiplicity_metric',
    'description': 'Mutltiplicity of value in the DMP in accordance with the DCS application profile',
    'title': 'DCS Multiplicity Compliance',
    'inDimension': {'inCategory': {'title': 'COMPLIANCE'

In [14]:
report["sums"]

{'availability': 3.0}

In [15]:
report["averages"]

{'availability': 0.5}
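The reported values are consistent with the DMP described above: 3 resolvable and 3 broken references. The following sketch reproduces the arithmetic under the assumption that each reference yields one boolean availability measurement and that `True` counts as 1.0 and `False` as 0.0. How the prototype aggregates internally is not shown here, so `sum_and_mean` is only an illustration.

```python
def sum_and_mean(values):
    """Return (sum, mean) of measurement values, counting True as 1.0 and False as 0.0."""
    # Assumption: boolean measurement values map to 1.0 / 0.0.
    numeric = [float(value) for value in values]
    total = sum(numeric)
    mean = total / len(numeric) if numeric else 0.0
    return total, mean


# Assumed availability values: 3 resolvable and 3 broken references.
availability_values = [True, True, True, False, False, False]

total, mean = sum_and_mean(availability_values)
print(total, mean)  # 3.0 0.5
```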

### DMP Evaluation Examples
To show the correctness of the resulting quality measurements we provide examples and manually evaluate the minimal DMP according to a subset of the goals and dimensions implemented in the prototype and compare the resulting measurements with the result of a manual evaluation. This minimal DMP is available in data/case-study/maDMPs/evaluation/minimal.json.

For each dimension considered, we provide positive cases where the DMP meets the requirements and negative cases where we alter the given DMP to introduce an issue that should be detected and included in the evaluation measurements.

#### G1 Completeness
For the examples regarding goal G1 we consider the dimension DCS Completeness but not Extension Completeness, because measurements for the metrics of both dimensions are generated with equivalent methods and only the definition of the underlying guideline differs. We present two examples of DCS completeness evaluation: one positive case where the DMP fulfills all completeness requirements and one negative case where the DMP is missing a required item and is therefore incomplete with regard to the DCS application profile.

In [12]:
json_data = {
    'dmpLoaderParameters': {
        'dmpLoader': 'JSON-FILE',
        'dmpIdentifier': 'evaluation/not-dcs-complete.json',
    },
    'dataLifecycle': 'PUBLISHED',
    'dimensions': ["dcs_completeness"],
    'categories': []
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post('http://localhost:8080/api/evaluation/evaluate', headers=headers, json=json_data)

result = response.json()
result

{'dmpStoreId': 'b785ac16-ee03-490d-b246-4bd1e994ab3f',
 'evaluationId': 'fb093a83-0316-4f9e-b28d-d44b42aab008',
 'measurements': [{'lifeCycleStage': {'title': 'PUBLISHED'},
   'isMeasurementOf': {'identifier': 'required_entity_or_property_existent',
    'description': 'Existence of a required entity or property according to the specification',
    'title': 'DCS Completeness',
    'inDimension': {'inCategory': {'title': 'COMPLETENESS'},
     'title': 'DCS_COMPLETENESS'},
    'applicableDMPLifeCycles': [{'title': 'PLANNING'}],
    'expectedDataType': 'http://www.w3.org/2001/XMLSchema#boolean',
    'metricTests': []},
   'guidance': {'title': 'SHACL Report',
    'description': '<https://w3id.org/dcso/ns/core#hasContact>: minCount[1]: Invalid cardinality: expected min 1: Got count = 0'},
   'computedOn': {'entity': 'https://w3id.org/dcso/ns/core#dmp_0'},
   'value': False,
   'softwareAgent': {'title': 'Apache Jena SHACL Validator'}}]}
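In the negative case the measurement has the value `False` and carries a `guidance` entry with the SHACL report. A reviewer will usually only want these failing entries. The sketch below filters them from a result of the shape shown above; `sample_result` is a trimmed copy of that output with one passing measurement added for contrast, and `failed_measurements` is a hypothetical helper. It assumes boolean measurement values, which holds for the metric shown here but not necessarily for every metric.

```python
# Trimmed sample in the shape of the evaluation result shown above.
sample_result = {
    "measurements": [
        {   # Passing measurement, added here for contrast.
            "isMeasurementOf": {"identifier": "required_entity_or_property_existent"},
            "computedOn": {"entity": "dmp"},
            "value": True,
        },
        {   # Failing measurement, copied from the output above.
            "isMeasurementOf": {"identifier": "required_entity_or_property_existent"},
            "guidance": {
                "title": "SHACL Report",
                "description": "<https://w3id.org/dcso/ns/core#hasContact>: minCount[1]: "
                               "Invalid cardinality: expected min 1: Got count = 0",
            },
            "computedOn": {"entity": "https://w3id.org/dcso/ns/core#dmp_0"},
            "value": False,
        },
    ]
}


def failed_measurements(result):
    """Collect metric, entity and guidance text for measurements whose value is False."""
    # Hypothetical helper for illustration; not part of the prototype API.
    failures = []
    for measurement in result.get("measurements", []):
        if measurement.get("value") is False:
            guidance = measurement.get("guidance", {})
            failures.append({
                "metric": measurement["isMeasurementOf"]["identifier"],
                "entity": measurement["computedOn"]["entity"],
                "hint": guidance.get("description", ""),
            })
    return failures


failures = failed_measurements(sample_result)
for failure in failures:
    print(failure["entity"], "->", failure["hint"])
```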

#### G2 Feasibility

##### Accuracy
To evaluate accuracy we modified the minimal DMP and manually added information which the solution should verify by comparing it with the information retrieved from trusted sources.


In [25]:
json_data = {
    'dmpLoaderParameters': {
        'dmpLoader': 'JSON-FILE',
        'dmpIdentifier': 'evaluation/minimal-with-host.json',
    },
    'dataLifecycle': 'PUBLISHED',
    'dimensions': [],
    'categories': []
}

headers = {
    'Content-Type': 'application/json'
}

response = requests.post('http://localhost:8080/api/evaluation/evaluate', headers=headers, json=json_data)

result = response.json()
result

ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
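The request above failed because the server closed the connection; the cause is not visible from the notebook. If the failure is transient, repeating the call may help. The following is a minimal retry sketch, not part of the prototype: `call_with_retry` is a hypothetical helper that takes any zero-argument callable, and it relies on the fact that the exceptions raised by requests derive from `OSError`. It will not help if the application itself crashes while processing this DMP.

```python
import time


def call_with_retry(send, attempts=3, delay_seconds=1.0):
    """Call send() until it succeeds or the given number of attempts is used up."""
    # Hypothetical helper for illustration; not part of the prototype API.
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return send()
        except OSError as error:  # covers connection errors, including those from requests
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


# Stand-in for a request that fails twice before it succeeds.
calls = {"count": 0}


def flaky_send():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection aborted.")
    return {"measurements": []}


outcome = call_with_retry(flaky_send, attempts=3, delay_seconds=0)
print(calls["count"], outcome)  # 3 {'measurements': []}
```

In the notebook, the request could be wrapped as `call_with_retry(lambda: requests.post('http://localhost:8080/api/evaluation/evaluate', json=json_data, timeout=30))`, where the timeout of 30 seconds is an arbitrary choice.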

##### Availability

#### Quality of Actions
We do not consider the evaluation of the correctness of the measurements of this dimension as they are provided by an external service, namely the F-UJI evaluator.

#### G4 Compliance
We proposed 3 dimensions: Guideline Compliance, DCS Compliance and Extension Compliance.