way to generically expose any existing complex metadata associated with a dataset #117

Closed
jvandegriff opened this issue Mar 17, 2021 · 8 comments
Labels
NovHackathon to be resolved during Nov 2021 session priority-high


@jvandegriff
Collaborator

Could we define another block in the header to capture all "foreign" (i.e., non-HAPI) metadata elements?

This would be useful where data processing systems that already use ISTP metadata can't switch to HAPI because they rely on ISTP keywords. Two examples are SPEDAS and a CCMC model-data comparison tool (Komodo).

@jvandegriff
Collaborator Author

Maybe add an externalMetadata block in the info response.

{
    "HAPI": "2.0",
    "status": {
        "code": 1200,
        "message": "OK"
    },
    "externalMetadata" : {
          "istp": { "keyword": "value", "keyword2": "value" },
          "spase": { "keyword": "value", "keyword2": "value" }
    },
    "parameters": [
        {
            "name": "epoch",
            "type": "isotime",
            "length": 24,
            "units": "UTC",
            "fill": null,
            "description": "time as ISO 8601 UTC string with day of year and accuracy to milliseconds"
        },
        {
            "name": "Dist_Rs",
            "type": "double",
            "units": "Saturn_radii",
            "fill": null,
            "description": "Cassini to Saturn distance"
        },
        {
            "name": "Bx_SSO",
            "type": "double",
            "units": "nT",
            "fill": "-1.0e+38",
            "description": "x component of magnetic field in the SSO frame"
        },
        {
            "name": "By_SSO",
            "type": "double",
            "units": "nT",
            "fill": "-1.0e+38",
            "description": "y component of magnetic field in the SSO frame"
        },
        {
            "name": "Bz_SSO",
            "type": "double",
            "units": "nT",
            "fill": "-1.0e+38",
            "description": "z component of magnetic field in the SSO frame"
        }
    ],
    "startDate": "2004-001T00:00:04.734Z",
    "stopDate": "2017-258T10:31:10.425Z",
    "sampleStartDate": "2004-183T00:00:00.000Z",
    "sampleStopDate": "2004-184T00:00:00.000Z",
    "description": "Cassini magnetometer data as used by the MIMI team",
    "resourceURL": "none yet",
    "creationDate": "2021-076T18:26:38.000Z",
    "cadence": "PT5S"
}
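For a sense of how a server might emit the proposed block, here is a minimal Python sketch that assembles the info response as a plain dict, with externalMetadata as a sibling of parameters, and serializes it with the standard json module. The helper name build_info and the placeholder keyword values are hypothetical.

```python
import json

def build_info(parameters, external_metadata):
    """Assemble a HAPI-style info response (hypothetical helper).

    external_metadata maps a convention name (e.g. "istp", "spase")
    to a dict of that convention's keyword/value pairs.
    """
    return {
        "HAPI": "2.0",
        "status": {"code": 1200, "message": "OK"},
        # Proposed block: foreign metadata kept apart from HAPI keywords
        "externalMetadata": external_metadata,
        "parameters": parameters,
    }

params = [
    {"name": "epoch", "type": "isotime", "length": 24, "units": "UTC", "fill": None},
    {"name": "Bx_SSO", "type": "double", "units": "nT", "fill": "-1.0e+38"},
]
info = build_info(params, {"istp": {"keyword": "value"}, "spase": {"keyword": "value"}})
text = json.dumps(info, indent=4)  # Python None becomes JSON null
print(text)
```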

@jvandegriff jvandegriff added this to the Version 3.1 milestone May 26, 2021
@jvandegriff
Collaborator Author

If all foreign metadata were in one block, then (as with units, and maybe with coordinate systems) you could specify a schema for that foreign metadata, e.g.:
ISTP, SPASE, Das2

@jvandegriff jvandegriff self-assigned this Jun 14, 2021
@jvandegriff jvandegriff added the NovHackathon to be resolved during Nov 2021 session label Nov 1, 2021
@jvandegriff
Collaborator Author

jvandegriff commented Nov 12, 2021

"externalMetadata" : [
     {  "name" : "spase",
        "content":  "string of XML SPASE record",
            # content types allowed are: number, string, JSON Object, XML
        "schema":  "http://site.org/url/to/XMLSchema.xsd",
        "about":  "http://site.org/SPASE 3.1.1_docs"
     },
     {  "name" : "istp",
        "content":  { "json object representing the tree of ISTP keyword-value pairs" },
        "about":  "https://spdf.gsfc.nasa.gov/istp_guide/variables.html"
     },
     {  "name" : "FITS",
        "content":  { "keyword" : "value1", "keyword2": "value2" },
        "schema":  "http://example.org/FITS8.0",
        "about": "http://fits.gsfc.nasa.gov"
     }
]

name and content are required.

schema and about are optional.

content can be a number, a string, a JSON value (primitive, list, object/structure, etc.), or XML (which is really just a string).

If your content is XML, it should be a string that has all the content, and the schema should be a URL to the actual XML Schema document. The URL could also point to a JSON Schema if one exists.
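A server or validator may want to tell these content kinds apart. Below is a minimal Python sketch, assuming XML is recognized as a string that begins with "<" and parses as well-formed XML; the function name content_kind and that detection rule are illustrative assumptions, not part of the proposal.

```python
import numbers
import xml.etree.ElementTree as ET

def content_kind(content):
    """Classify a "content" value as number, string, json, or xml (hypothetical helper)."""
    if isinstance(content, bool):
        return "json"  # JSON true/false: a primitive, not counted as a number here
    if isinstance(content, numbers.Number):
        return "number"
    if isinstance(content, str):
        stripped = content.lstrip()
        if stripped.startswith("<"):
            try:
                ET.fromstring(stripped)  # well-formed XML carried as a string
                return "xml"
            except ET.ParseError:
                pass  # looks like markup but is not well-formed; treat as plain text
        return "string"
    if isinstance(content, (dict, list)) or content is None:
        return "json"
    raise TypeError(f"unsupported content type: {type(content).__name__}")
```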

Make this clear in the spec: clients are not expected to support this content beyond being able to carry it along. It is a way for servers to pass along their extra metadata in a natural way, rather than just drop it on the floor.
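To illustrate the "carry it along" intent: a client that does not understand the foreign block can still use the HAPI keywords it knows and copy the block through untouched (for example into an output file). A minimal Python sketch; the function name passthrough and the block key default are assumptions for illustration.

```python
import copy

def passthrough(info, block_key="externalMetadata"):
    """Return (parameter names, foreign metadata) from an info response.

    The foreign block is deep-copied verbatim: the client does not
    interpret it, it only preserves it. Returns None if the block is absent.
    """
    names = [p["name"] for p in info.get("parameters", [])]
    foreign = copy.deepcopy(info.get(block_key))
    return names, foreign

info = {
    "parameters": [{"name": "epoch"}, {"name": "Bx_SSO"}],
    "externalMetadata": {"istp": {"keyword": "value"}},
}
names, foreign = passthrough(info)
```

Deep-copying keeps the client's preserved copy independent of the parsed response, so later edits to one cannot silently alter the other.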

@jvandegriff
Collaborator Author

Bernie suggested calling it additionalMetadata rather than externalMetadata.

Also, XML and JSON already have a way to specify a schema within the document, so schema is unneeded.

(Side note: HAPI responses should include a reference to the JSON schema. Schemas should be published with DOIs.)

"additionalMetadata" : [
     {  "name" : "spase",
        "content":  "string of XML SPASE record",
            # content types allowed are: number, string, JSON Object, XML
        "contentURL": "doi or other very persistent info source",
            # must have one type of content
        "aboutURL":  "http://spase-group.org"
     },
     {  "name" : "istp",
        "content":  { "json object representing the tree of ISTP keyword-value pairs" },
        "aboutURL":  "https://spdf.gsfc.nasa.gov/istp_guide/variables.html"
     },
     {  "name" : "FITS",
        "content":  { "keyword" : "value1", "keyword2": "value2" },
        "aboutURL": "http://fits.gsfc.nasa.gov"
     }
]

name is required

one of content OR contentURL is required; having both is not allowed

aboutURL is optional
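These rules are simple enough to check mechanically. A minimal Python sketch of a validator for a single entry; the function name is hypothetical, and treating unrecognized keys as errors is an assumption the thread does not state. Note that the spase entry in the sample lists content and contentURL side by side only to show the alternatives; as written it would be rejected.

```python
def validate_entry(entry):
    """Check one additionalMetadata entry; return a list of problems (empty if valid)."""
    problems = []
    if "name" not in entry:
        problems.append("missing required 'name'")
    has_content = "content" in entry
    has_url = "contentURL" in entry
    if has_content and has_url:
        problems.append("'content' and 'contentURL' are mutually exclusive")
    elif not (has_content or has_url):
        problems.append("one of 'content' or 'contentURL' is required")
    # aboutURL is optional; flagging any other key is an assumption
    allowed = {"name", "content", "contentURL", "aboutURL"}
    for key in sorted(set(entry) - allowed):
        problems.append(f"unknown key '{key}'")
    return problems
```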

@jvandegriff
Collaborator Author

jvandegriff commented Nov 17, 2021

conventionName is better than just name

Eventually we could be more specific about how to interpret the additional info by labeling it with schemaName instead. This would have to be a URL reference to the computer-readable schema. For formats like XML, this would correspond to the schema referenced in the XML file (but note that XML docs can list more than one schema!). We are postponing this for now.

Note too that the word "additional" in additionalMetadata might be confusing, since this block is not meant for just any additional things; it is really meant for capturing a set of existing info that you don't want to lose by switching to HAPI.

The value of conventionName can be anything, but we will provide a table of known conventions; please use those names for existing known schemas. Other fields will have their own sets of common (maybe even standard!) schemas.

Need to emphasize that any content here is for the whole dataset. ISTP info is usually per file, as are FITS headers, so even though a FITS header may contain some dataset-wide info, most of it is file-specific and not appropriate for additionalMetadata. A server would have to do extra work to find just the dataset-wide elements (this might not be too hard for ISTP, for example).
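One way a server might do that extra work, sketched in Python under an assumption the thread does not make: treat a keyword as dataset-wide only if it has the same value in every file's header. The function name and the sample ISTP global attribute values are illustrative.

```python
def dataset_wide(headers):
    """Keep only keyword/value pairs identical in every per-file header.

    headers: list of dicts, one per file (e.g. ISTP global attributes or
    FITS header cards). Keys missing from, or differing in, any file are dropped.
    """
    if not headers:
        return {}
    common = dict(headers[0])
    for header in headers[1:]:
        for key in list(common):  # iterate over a snapshot while deleting
            if key not in header or header[key] != common[key]:
                del common[key]
    return common

files = [
    {"Mission_group": "Cassini", "Logical_file_id": "f1", "Data_version": "1"},
    {"Mission_group": "Cassini", "Logical_file_id": "f2", "Data_version": "1"},
]
shared = dataset_wide(files)
```

This heuristic can over-include: a value that happens to be identical across the files scanned (trivially so for a single-file dataset) is not necessarily dataset-wide.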

"additionalMetadata" : [
     {  "conventionName" : "spase",
        "content":  "string of XML SPASE record",
            # content types allowed are: number, string, JSON Object, XML
        "contentURL": "doi or other very persistent info source",
            # must have one type of content
        "aboutURL":  "http://spase-group.org"
     },
     {  "conventionName" : "cf",
        "content":  { "keyword" : "value1", "keyword2": "value2" },
        "aboutURL": "https://cfconventions.org/"
     }
]

We are just starting to look into the CF Conventions; they seem very SPASE-like (i.e., they describe the whole dataset).

@jvandegriff
Collaborator Author

Just an FYI - more info about NcML:
https://www.unidata.ucar.edu/software/tds/current/tutorial/NcML.htm

I would need to study this more to see whether it is like SPASE in terms of describing a dataset. Note that Earth Science folks use the word "dataset" but often mean something different: sometimes it means just one file, which could contain, for example, all the salinity measurements from one ocean campaign.

@berniegsfc
Contributor

I have a prototype of this. Here's an example:
$ curl "https://cdaweb.gsfc.nasa.gov/registry/hdp/hapi/info?dataset=spase://NASA/NumericalData/ACE/MAG/L2/PT16S" | jq
Remember, if you use a browser (which specifies a preference for HTML) to request info from this HAPI implementation, you will get an HTML representation, so use curl to get JSON.
Also, if you specify a preference for JSON when requesting the SPASE metadata, you'll get JSON instead of XML. For example,
$ curl -H "Accept: application/json" "https://cdaweb.gsfc.nasa.gov/registry/hdp/Spase.xql?id=spase://NASA/NumericalData/ACE/MAG/L2/PT16S" |jq
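The same content negotiation can be done from a script by setting the Accept header. A minimal Python sketch using only the standard library; the helper names are hypothetical, and the URLs are the prototype's from the comment above, which may no longer be live.

```python
import json
import urllib.request

INFO_URL = ("https://cdaweb.gsfc.nasa.gov/registry/hdp/hapi/info"
            "?dataset=spase://NASA/NumericalData/ACE/MAG/L2/PT16S")
SPASE_URL = ("https://cdaweb.gsfc.nasa.gov/registry/hdp/Spase.xql"
             "?id=spase://NASA/NumericalData/ACE/MAG/L2/PT16S")

def make_request(url, accept="application/json"):
    """Build a GET request that states a content-type preference.

    Browsers prefer text/html, which is why the prototype returns HTML
    to them; sending application/json asks for JSON instead.
    """
    return urllib.request.Request(url, headers={"Accept": accept})

def fetch_json(url):
    """Perform the request and parse the JSON body (needs network access)."""
    with urllib.request.urlopen(make_request(url), timeout=30) as resp:
        return json.load(resp)
```

Calling fetch_json(SPASE_URL) would then correspond to the second curl command above.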

@jvandegriff
Collaborator Author

jvandegriff commented May 17, 2022

Comments from May 17 telecon:

Note: use FITSheader rather than FITS as the name (we don't want a full FITS file in the metadata).

Use cases:
For example, if someone uses SPASE, the validator would look in the XML for the schema reference and then validate against it. The same goes for JSON.
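The first step of that use case, finding the schema a document declares, can be sketched with the Python standard library. This assumes the XML declares its schema through the standard xsi:schemaLocation or xsi:noNamespaceSchemaLocation attributes on the root element, and that JSON uses the JSON Schema "$schema" keyword; not every SPASE or JSON record will do so. The function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

def xml_schema_refs(xml_text):
    """Return schema URLs declared on an XML document's root element.

    xsi:schemaLocation holds whitespace-separated (namespace, URL) pairs;
    xsi:noNamespaceSchemaLocation holds a single URL.
    """
    root = ET.fromstring(xml_text)
    refs = root.get(XSI + "schemaLocation", "").split()[1::2]  # every 2nd token is a URL
    no_ns = root.get(XSI + "noNamespaceSchemaLocation")
    if no_ns:
        refs.append(no_ns)
    return refs

def json_schema_ref(json_text):
    """Return the "$schema" URL of a JSON object document, or None."""
    doc = json.loads(json_text)
    return doc.get("$schema") if isinstance(doc, dict) else None
```

Actually validating against the located schema needs a schema-aware library, for example lxml's XMLSchema class for XSD or the jsonschema package for JSON Schema; the standard library does neither.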

"additionalMetadata" : [
     {
      (optional)
       "name" : "SPASE",  # in the documentation, explain that this name can refer to the schema (computer readable definition and structure) or convention (human readable specification)
        (must have one of `content` or `contentURL`)
       "content":  "string of XML SPASE record",
            # content types allowed are: number, string, JSON Object, XML
       "contentURL": "doi or other very persistent info source",

      (optional)
       "schemaURL": "URL_to_XSD (for XML) or JSON_file_that_is_the_schema",

      (optional)
       "aboutURL":  "http://spase-group.org"
     },

    (can have a list of multiple sets of other metadata)
     {  "name" : "cf",
        "content":  { "keyword" : "value1", "keyword2": "value2" },
        "aboutURL": "https://cfconventions.org/"
     }
]

name is used if the additional metadata (MD) follows some kind of standard that people know about.
If there is a schema embedded in the MD, clients just need to figure that out. (XML and JSON documents can declare their schema internally.)
If there is a separate, external schema, you can refer to it using the optional schemaURL.
The aboutURL is for human readable resources about the MD flavor / type.
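Taken together, these notes suggest an order in which a client might look for a way to interpret an entry. The Python sketch below encodes one such order; the precedence (embedded schema, then schemaURL, then the human-readable aboutURL) is an assumption, since the notes do not say what happens when more than one is present. The function name and the embedded_ref argument (a reference the client found inside the content itself) are hypothetical.

```python
def schema_source(entry, embedded_ref=None):
    """Pick where to look to interpret one additionalMetadata entry.

    Assumed order:
      1. a schema referenced inside the content (embedded_ref, found by the client)
      2. the optional external schemaURL
      3. otherwise the human-readable aboutURL, if any
    Returns a (source, url) tuple.
    """
    if embedded_ref:
        return ("embedded", embedded_ref)
    if entry.get("schemaURL"):
        return ("schemaURL", entry["schemaURL"])
    if entry.get("aboutURL"):
        return ("aboutURL", entry["aboutURL"])
    return ("none", None)
```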

Please use these names if appropriate: SPASE, ISTP.
Look at these locations to find schema URLs:
Mike is making JSON for ISTP metadata, but there is no schema yet.

Development

No branches or pull requests

3 participants