
Use JSON schema to define metadata. (#10)

* init schema

* readme

* change readme

* update readme

* README

* update execution

* add Markdown doc for schema
zhenghuiwang authored and k8s-ci-robot committed Apr 19, 2019
1 parent 33a9cf7 commit 47e5378c5b62bfbdb3d2a7e1db8abdb01c64d8d2
12 go.mod
@@ -1,13 +1,17 @@
module github.com/kubeflow/metadata
module github.com/zhenghuiwang/metadata

go 1.12

require (
github.com/bmatcuk/doublestar v1.1.1
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/golang/protobuf v1.3.1
github.com/google/go-cmp v0.2.0
github.com/grpc-ecosystem/grpc-gateway v1.8.5
google.golang.org/genproto v0.0.0-20190404172233-64821d5d2107
google.golang.org/grpc v1.19.1
gopkg.in/yaml.v2 v2.2.2 // indirect
github.com/kubeflow/metadata v0.0.0-20190416214508-33a9cf771562
github.com/xeipuuv/gojsonpointer v0.0.0-20180127040702-4e3ac2762d5f // indirect
github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 // indirect
github.com/xeipuuv/gojsonschema v1.1.0
google.golang.org/genproto v0.0.0-20190415143225-d1146b9035b9
google.golang.org/grpc v1.20.0
)
152 go.sum

Large diffs are not rendered by default.

@@ -0,0 +1,31 @@
# Metadata Schema

This directory contains all predefined metadata types in the form of [JSON schema](https://json-schema.org). We expect every piece of metadata to have the following fields:

- `id` of type _string_. Unique identifier assigned by this metadata service.
- `name` of type _string_. Name of the metadata, assigned by external users.
- _type information_. We need type information to make the metadata self-explanatory, and we require an explicit version of the type, following the Kubernetes convention. The following three fields together uniquely identify a version of a type.
  - `kind` of type _string_. Name of the type.
  - `namespace` of type _string_. The namespace of the type, used to avoid naming collisions.
  - `apiversion` of type _string_. The version of the type.
- `category` of type _string_. We categorize metadata based on its role in Kubeflow systems:
  - _"artifact"_ represents input data and derived data in a workflow. E.g. _data set_, _model_.
  - _"execution"_ represents a run of an executable, which can have artifacts as input and/or output.
  - _"container"_ represents a group of artifacts, executions, and other containers. E.g. _workspace_ for solving an ML problem and _Katib experiment_ for creating multiple models.

Extending the `alpha/entity.json` schema is not required, but it is the easiest way to comply with these requirements.
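
As a rough illustration of the field requirements listed above, the following Go sketch checks a metadata document by hand using only the standard library. It is not the validation performed by the metadata service, which this commit wires up through JSON schema and `github.com/xeipuuv/gojsonschema`. The names `checkEntity`, `requiredFields`, and `validCategories` are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredFields lists the fields this README expects on every piece of metadata.
var requiredFields = []string{"id", "name", "kind", "namespace", "apiversion", "category"}

// validCategories holds the three roles named in this README.
var validCategories = map[string]bool{"artifact": true, "execution": true, "container": true}

// checkEntity is a hypothetical helper. It returns an error for the first
// required field that is missing or not a string, or for an unknown category.
func checkEntity(doc []byte) error {
	var fields map[string]interface{}
	if err := json.Unmarshal(doc, &fields); err != nil {
		return fmt.Errorf("invalid JSON: %v", err)
	}
	for _, name := range requiredFields {
		value, ok := fields[name]
		if !ok {
			return fmt.Errorf("missing required field %q", name)
		}
		if _, isString := value.(string); !isString {
			return fmt.Errorf("field %q must be a string", name)
		}
	}
	// Safe type assertion: the loop above already verified that category is a string.
	if category := fields["category"].(string); !validCategories[category] {
		return fmt.Errorf("unknown category %q", category)
	}
	return nil
}

func main() {
	doc := []byte(`{"id": "123", "name": "mytable-dump", "kind": "data_set",
		"namespace": "kubeflow.org", "apiversion": "alpha", "category": "artifact"}`)
	fmt.Println(checkEntity(doc))
}
```

A hand-written check like this only covers the common fields. Schema-based validation is preferable because type-specific constraints, such as the `uri` field required for artifacts, live in the schema files themselves.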

# Predefined Metadata
This directory contains versions of predefined metadata schemas, which are loaded by the metadata service before it starts. Therefore metadata of these types can be directly logged to the metadata store.

## Folder Structure

- Different versions of metadata schemas should be organized as `<version>/<relative path>`.
- Markdown documentation of schemas is at `<version>/docs`.
- The `/examples` folder contains example metadata as JSON files. In each file,
  - field `$id` points to its schema,
  - field `example` is a JSON metadata example.
- `schema_test.go` validates all schemas in sub-directories and examples in `/examples`.
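
To make the example-file layout concrete, here is a minimal Go sketch that reads one such file into a struct with the two fields described above. The type `exampleFile` and the function `parseExampleFile` are hypothetical names for this illustration and are not taken from `schema_test.go`. A real test would then validate `Example` against the schema identified by `SchemaID`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleFile mirrors the layout described above: `$id` names the schema and
// `example` holds one metadata document, kept as raw JSON for later validation.
type exampleFile struct {
	SchemaID string          `json:"$id"`
	Example  json.RawMessage `json:"example"`
}

// parseExampleFile decodes an example file and rejects files that lack
// either of the two expected fields.
func parseExampleFile(data []byte) (exampleFile, error) {
	var f exampleFile
	if err := json.Unmarshal(data, &f); err != nil {
		return f, err
	}
	if f.SchemaID == "" {
		return f, fmt.Errorf("example file has no $id")
	}
	if len(f.Example) == 0 {
		return f, fmt.Errorf("example file has no example")
	}
	return f, nil
}

func main() {
	data := []byte(`{
		"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/data_set.json",
		"example": {"id": "123", "kind": "data_set"}
	}`)
	f, err := parseExampleFile(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("schema:", f.SchemaID)
}
```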

# Customized Metadata
Customized metadata is defined in the same schema format as predefined metadata. The only difference between them is that customized metadata schemas are loaded by sending requests to the schema registration endpoint. (TODO: add link)
@@ -0,0 +1,44 @@
{
"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/artifact.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"allOf": [
{
"$ref": "http://github.com/kubeflow/metadata/schema/alpha/entity.json"
},
{
"properties": {
"category": {
"constant": "artifact"
},
"uri": {
"description": "unique resource identifier to the artifact",
"examples": [
"file://path/to/a/local/file",
"gcs://path/to/a/gcs/file",
"http://github.com/my-project/path/to/a/file"
],
"type": "string"
},
"version": {
"description": "entity version assigned by an external system",
"examples": [
"v1.3.2",
"e5a89c1eb6a836ecff76437ed955144b04227ad0"
],
"type": "string"
}
}
}
],
"description": "schema for an artifact, an extension of entity",
"required": [
"id",
"kind",
"namespace",
"apiversion",
"category",
"name",
"uri"
],
"type": "object"
}
@@ -0,0 +1,55 @@
{
"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/data_set.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"allOf": [
{
"$ref": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/artifact.json"
},
{
"properties": {
"kind": {
"constant": "data_set"
},
"namespace": {
"constant": "kubeflow.org"
},
"query": {
"description": "query to get the data",
"type": "string"
},
"apiversion": {
"constant": "alpha"
}
}
}
],
"description": "alpha schema for a data set in Kubeflow",
"required": [
"id",
"kind",
"namespace",
"apiversion",
"name",
"category",
"uri"
],
"examples": [{
"annotations": {
"mylabel": "l1",
"tag": "data-set"
},
"apiversion": "v1",
"category": "artifact",
"create_time": "2018-11-13T20:20:39+00:00",
"description": "a example data",
"id": "123",
"kind": "data_set",
"name": "mytable-dump",
"namespace": "kubeflow.org",
"owner": "owner@my-company.org",
"uri": "file://path/to/dataset",
"version": "v1.0.0",
"query": "SELECT * FROM mytable"
}],
"type": "object"
}
@@ -0,0 +1,38 @@
{
"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/executable.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"allOf": [
{
"$ref": "http://github.com/kubeflow/metadata/schema/alpha/entity.json"
},
{
"properties": {
"category": {
"constant": "artifact"
},
"input_type": {
"items": {
"type": "object"
},
"type": "array"
},
"output_type": {
"items": {
"type": "object"
},
"type": "array"
}
}
}
],
"description": "schema for an executable, extension of an artifact",
"required": [
"id",
"kind",
"namespace",
"apiversion",
"category",
"name"
],
"type": "object"
}
@@ -0,0 +1,6 @@

{
"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/metrics.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "TODO: add metrics schema"
}
@@ -0,0 +1,7 @@

{

"$id": "http://github.com/kubeflow/metadata/schema/alpha/artifacts/model.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "TODO: Define the metadata about a model."
}
@@ -0,0 +1,5 @@
{

"$id": "http://github.com/kubeflow/metadata/schema/alpha/containers/workspace.json",
"$schema": "http://json-schema.org/draft-07/schema#"
}
