How should we specify the metadata endpoint? #3
Comments
@jb-adams you have had some experience with service-info, I think? Also, one possible alternative to the metadata endpoint is to have the JSON response declare the specific schema it uses:
{
"$id": "http://yourdomain.com/schemas/myschema.json",
"$schema": "http://json-schema.org/schema#"
}
Both bits here shamelessly stolen from JSON Schema's basics page. |
@andrewyatz @nsheff yes I'm quite familiar with
|
It's more about how to offer the same endpoint but with extensions. In service-info we just allowed individual specifications in OpenAPI to inherit our base schema and extend it. But that means you have to do it in OpenAPI, and there is no way to access the schema other than going into OpenAPI. Maybe we should consider what our best practice is here. |
Oh, so how to extend the base ServiceInfo schema in OpenAPI? We did this in
You'll see 3 objects under the
Is this what you're referring to? |
Discussions at the seqcol meeting just now concluded that we should go the same route as refget, which specified the schema only in OpenAPI format. Also, this issue will get split into two: one on whether to have this endpoint (and whether it is mandatory) and, if so, one on the format of that response (assuming I understood the resolution correctly). |
Hi, and thanks for including me in the seqcol meeting! I am a senior engineer employed by ELIXIR Norway (at the University of Oslo). The reason I was invited is that I am one of the main developers of the FAIRtracks draft standard (and related tool infrastructure) for metadata of genomic track files, which is the result of an ELIXIR implementation study: http://fairtracks.github.io. FAIRtracks is available in the form of a set of JSON schemas: https://github.com/fairtracks/fairtracks_standard/. It is for now a suggestion and is meant to evolve. So obviously the metadata aspect of seqcol is of interest to me, and adding seqcol support would be a natural extension. A manuscript is written and will be submitted soon. This seems to be a bit late in the process, so I hope I am not being too presumptuous here. I just wanted to present some initial thoughts:
|
From today's discussion:
|
Solved with "no metadata endpoint" decision in #54. |
The primary functions of seqcol are to 1) define unique identifiers for sequence collections; 2) provide a protocol to serve sequence collection data given the identifiers; and 3) provide a function for comparing compatibility among sequence collections.
An important ancillary function is to provide metadata associated with a particular sequence collection, like provider, version, or organism. How should we provide this data? The current proposal is:
/metadata/:seqcol_digest endpoint, which returns an annotated JSON with all metadata for a given sequence collection.
/metadata-schema endpoint, which provides a JSON-schema defining the allowed and required fields for
Does that seem reasonable? If so, what is the base information that should be included in the base schema? In other words, let's define the core JSON-schema.
Here's a proposal for a JSON-schema that could define a base set of metadata fields:
This means the /metadata/:seqcol_digest endpoint would return an array of what we might call "metadata packages", where each package must contain "source", "organism", and "aliases", and may contain "version". The rationale behind making this provide an array of "packages" instead of just one package is that multiple providers may provide the same collection and annotate it in different ways, and this approach keeps their metadata separate.
Perhaps it makes sense to use a simple ontology (or at least a controlled vocabulary) for providers, and use those terms in the source field. If we did that, then the metadata endpoint could be qualified by a provider identifier, so you could retrieve only the metadata package specified by a particular provider. I'm not sure going to this level of complexity is really warranted, though.
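To make the "array of metadata packages" idea concrete, here is a minimal Python sketch. It is not the schema proposed in this issue: the schema dict is reconstructed only from the prose description (required "source", "organism", "aliases"; optional "version"), and the field types, the names BASE_METADATA_SCHEMA, find_problems, and packages_from, and the provider values are all assumptions made for illustration. The check is hand-rolled and covers only required keys and simple types; it is not a full JSON Schema validator.

```python
# Hypothetical base schema, reconstructed from the prose description only.
# The field types (string / array of strings) are assumptions.
BASE_METADATA_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "required": ["source", "organism", "aliases"],
        "properties": {
            "source": {"type": "string"},
            "organism": {"type": "string"},
            "aliases": {"type": "array", "items": {"type": "string"}},
            "version": {"type": "string"},
        },
    },
}


def _type_ok(value, rule):
    """Check the two simple types used in the sketch schema above."""
    if rule["type"] == "string":
        return isinstance(value, str)
    if rule["type"] == "array":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    return True  # types outside this sketch are not checked


def find_problems(packages, schema=BASE_METADATA_SCHEMA):
    """Return a list of problems; an empty list means the response passed."""
    if not isinstance(packages, list):
        return ["top-level value must be an array of metadata packages"]
    item_schema = schema["items"]
    problems = []
    for i, package in enumerate(packages):
        if not isinstance(package, dict):
            problems.append(f"package {i}: must be an object")
            continue
        for key in item_schema["required"]:
            if key not in package:
                problems.append(f"package {i}: missing required field '{key}'")
        for key, rule in item_schema["properties"].items():
            if key in package and not _type_ok(package[key], rule):
                problems.append(f"package {i}: field '{key}' has the wrong type")
    return problems


def packages_from(packages, source):
    """Keep only the packages annotated by one provider (the 'source' field)."""
    return [p for p in packages if p.get("source") == source]


# Made-up response for one digest: two providers annotate the same collection.
EXAMPLE_RESPONSE = [
    {"source": "provider-a", "organism": "Homo sapiens",
     "aliases": ["example-alias-1"], "version": "1"},
    {"source": "provider-b", "organism": "Homo sapiens",
     "aliases": ["example-alias-2"]},
]
```

Calling find_problems(EXAMPLE_RESPONSE) returns an empty list, and packages_from(EXAMPLE_RESPONSE, "provider-a") returns only the first package, which is roughly what qualifying the metadata endpoint by a provider identifier would do on the server side.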