# Validation Demo

This demo shows how to validate the global and local catalogs for use with the File Registry Tool.

## Setup

In [1]:
import validation

In [2]:
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

## Validate

Below we create a validator for the global heliodata registry.

In [3]:
global_catalog = 'http://heliocloud.org/catalog/HelioDataRegistry.json'
validator = validation.Validator(global_catalog)


ERROR:root:Failed Local Catalog Fetches (1/2): 
[
    GSFC HelioCloud Set 1 (us-east-1)
]


Below we validate that the global catalog schema agrees with the CloudMe Spec and what the File Registry Tool assumes.

Note: The global catalog *should* always be valid since it is validated when created and updated.

In [4]:
validator.validate_global_catalog_schema()

INFO:root:Global catalog schema validation passed.


True

Below we validate all the local catalogs that the global catalog points to. We check that each conforms to the CloudMe Spec and to what the File Registry Tool assumes.

Note: Each local catalog *should* always be valid since it is validated when created and updated. However, depending on who owns the underlying buckets, later changes could break the schema.

In [5]:
validator.validate_all_local_catalog_schemas()

INFO:root:Validating local catalog 0 GSFC HelioCloud:
ERROR:root:Local catalog schema validation failed: 'region' is a required property


False
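The failure above reports a missing required property. As a rough illustration of that kind of check, and not the `validation` module's actual implementation, the sketch below tests a catalog dictionary against a list of required keys. The helper `missing_required` and the key list `REQUIRED_LOCAL_KEYS` are assumptions made for this example; the authoritative list is whatever the CloudMe Spec defines.

```python
# Hypothetical required keys for a local catalog (assumption for illustration);
# the real list is defined by the CloudMe Spec.
REQUIRED_LOCAL_KEYS = ["endpoint", "name", "region", "catalog"]


def missing_required(catalog, required=REQUIRED_LOCAL_KEYS):
    """Return the required keys that are absent from the catalog dictionary."""
    return [key for key in required if key not in catalog]


# A trimmed local catalog with no 'region' entry (illustrative data).
local_catalog = {
    "endpoint": "s3://helio-public/",
    "name": "GSFC HelioCloud",
    "catalog": [],
}

missing = missing_required(local_catalog)
is_valid = not missing  # False here, because 'region' is missing
```

A check like this only covers the presence of top-level keys; a full schema validation would also cover value types and nested entries.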

Below we validate that all the global catalog entries have a unique name+region.

Note: The global catalog *may* enforce that name+region is unique. This requirement is not explicit in the CloudMe Spec, but it makes the catalogs easier for users to search.

In [6]:
validator.validate_global_uniqueness()

INFO:root:Catalog name + region uniqueness passed.


True
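A name+region uniqueness check can be sketched in a few lines. This is a minimal example under the assumption that the global catalog has the shape returned by `get_global_catalog()` later in this demo; the helper `duplicate_name_regions` is hypothetical and is not part of the `validation` module.

```python
from collections import Counter


def duplicate_name_regions(global_catalog):
    """Return the (name, region) pairs that appear more than once in the registry."""
    counts = Counter(
        (entry["name"], entry["region"]) for entry in global_catalog["registry"]
    )
    return [pair for pair, count in counts.items() if count > 1]


# Registry entries copied from the global catalog output shown in this demo.
global_catalog_example = {
    "registry": [
        {
            "endpoint": "s3://helio-public/",
            "name": "GSFC HelioCloud Public Temp",
            "region": "us-east-1",
        },
        {
            "endpoint": "s3://gov-nasa-hdrl-data1/",
            "name": "GSFC HelioCloud Set 1",
            "region": "us-east-1",
        },
    ]
}

duplicates = duplicate_name_regions(global_catalog_example)
names_are_unique = not duplicates
```

Both entries share a region but differ in name, so no pair repeats and the check passes.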

Below we validate that all the local catalogs have unique IDs across all catalogs.

Note: The local catalogs *may* have unique IDs across all catalogs. The CloudMe Spec does not enforce this, but IDs *should* always be unique within a local catalog and bucket.

In [7]:
validator.validate_local_uniqueness()

INFO:root:Local catalog IDs are unique across all catalogs. Passed.


True
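To make the idea concrete, the sketch below counts dataset IDs across several local catalogs and reports any that repeat. It assumes each local catalog has a `catalog` list whose entries carry an `id`, as in the `get_local_catalogs()` output in this demo. The helper `duplicate_ids` and the catalog names "Catalog A" and "Catalog B" are hypothetical.

```python
from collections import Counter


def duplicate_ids(local_catalogs):
    """Return the sorted dataset IDs that occur more than once across all catalogs."""
    counts = Counter(
        entry["id"] for local in local_catalogs for entry in local["catalog"]
    )
    return sorted(dataset_id for dataset_id, count in counts.items() if count > 1)


# Two made-up local catalogs that share one dataset ID.
catalogs = [
    {
        "name": "Catalog A",
        "catalog": [{"id": "mms1_feeps_brst_electron"}, {"id": "mms1_feeps_brst_ion"}],
    },
    {"name": "Catalog B", "catalog": [{"id": "mms1_feeps_brst_ion"}]},
]

repeated = duplicate_ids(catalogs)
ids_are_unique = not repeated
```

Because the count runs over every entry, this version also flags an ID repeated inside a single catalog.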

Below we validate that all the local catalogs have valid pointers to file registry files.

Note: The local catalogs *should* always be valid since they are validated when created and updated.

In [8]:
validator.validate_all_local_catalog_file_registries()

INFO:root:Validating local catalog file registries 0 GSFC HelioCloud:

ERROR:root:Loading Local Catalog File Registries Failed. Failures: 25


False

Below we do all of the above validations in one call.

In [9]:
validator.validate()

INFO:root:Global catalog schema validation passed.
INFO:root:Validating local catalog 0 GSFC HelioCloud:
ERROR:root:Local catalog schema validation failed: 'region' is a required property
INFO:root:Catalog name + region uniqueness passed.
INFO:root:Local catalog IDs are unique across all catalogs. Passed.
INFO:root:Validating local catalog file registries 0 GSFC HelioCloud:

ERROR:root:Loading Local Catalog File Registries Failed. Failures: 25


False
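The log above suggests that later checks still run after an earlier one fails. How `validate()` combines its checks internally is not shown here, so the following is only a sketch of one way to get that behavior: collect every result first, then combine them. The names `run_all`, `passing_check`, and `failing_check` are hypothetical.

```python
def run_all(checks):
    """Run every check and return True only if all of them pass.

    Results are collected in a list first so that a failing check does not
    stop the later ones from running (unlike a short-circuiting `and` chain).
    """
    results = [check() for check in checks]
    return all(results)


calls = []  # Records which checks actually ran, in order.


def passing_check():
    calls.append("passing")
    return True


def failing_check():
    calls.append("failing")
    return False


overall = run_all([passing_check, failing_check, passing_check])
```

Running every check regardless of earlier failures means a single call reports all problems at once, at the cost of doing work that cannot change the overall result.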

Below we get the global catalog used in all of the above validations.

In [10]:
validator.get_global_catalog()

{'CloudMe': '0.1',
 'modificationDate': '2022-01-01T00:00Z',
 'registry': [{'endpoint': 's3://helio-public/',
   'name': 'GSFC HelioCloud Public Temp',
   'region': 'us-east-1'},
  {'endpoint': 's3://gov-nasa-hdrl-data1/',
   'name': 'GSFC HelioCloud Set 1',
   'region': 'us-east-1'}]}

Below we get all the local catalogs from the global catalog that were successfully accessed.

In [11]:
validator.get_local_catalogs()

[{'Cloudy': '0.2',
  'endpoint': 's3://helio-public/',
  'name': 'GSFC HelioCloud',
  'contact': 'Dr. Contact, dr_contact@example.com',
  'description': 'Optional description of this collection',
  'citation': 'Optional how to cite, preferably a DOI for the server',
  'catalog': [{'id': 'mms1_feeps_brst_electron',
    'loc': 's3://helio-public/MMS/mms1/feeps/brst/l2/electron/',
    'title': 'mms1/feeps/brst/l2/electron/',
    'startDate': '2015-06-01T00:00:00Z',
    'stopDate': '2021-12-31T23:59:00Z',
    'modificationDate': '2023-03-08T00:00:00Z',
    'indexformat': 'csv',
    'fileformat': 'cdf',
    'description': 'Optional description for dataset',
    'resourceURL': 'optional identifier e.g. SPASE ID',
    'creationDate': '2023-04-05T00:00:00Z',
    'citation': 'optional how to cite this dataset, DOI or similar',
    'contact': 'optional contact info, SPASE ID, email, or ORCID',
    'aboutURL': 'optional website URL for info, team, etc'},
   {'id': 'mms1_feeps_brst_ion',
    'loc'