New json ld context generator #36

Merged
merged 5 commits into from Sep 5, 2022

@timothee-haudebourg (Collaborator) commented Sep 2, 2022

The current LD context generator only generates type-scoped contexts with the assumption that incoming LD documents will always advertise the type of every node using a @type property. For instance, take the following TreeLDR schema:

base <https://example.com>;

type Foo {
  bar: Bar
}

type Bar {
  foo: Foo
}

Running the command tldrc -i example.tldr json-ld-context https://example.com/Foo https://example.com/Bar generates the following LD context:

{
  "Foo": {
    "@id": "https://example.com/Foo",
    "bar": "https://example.com/Bar"
  },
  "Bar": {
    "@id": "https://example.com/Bar",
    "foo": "https://example.com/Foo"
  }
}

This is correct for an input LD document such as this:

{
  "@type": "Foo",
  "bar": {
    "@type": "Bar",
    "foo": {}
  }
}

Note how the nodes carry a @type entry, which is not specified in the original TreeLDR schema. We also want to be able to handle documents where no type is specified, such as:

{
  "bar": {
    "foo": {}
  }
}

However, it is not possible to specify the type of all the nodes using the JSON-LD context (a @type entry inside the context only applies to value objects).

The purpose of the PR is to create a new LD context generator algorithm without this limitation. Type scoped contexts are useful to avoid conflicts and ambiguities between term definitions, so they should be generated whenever possible. Otherwise, terms should be defined globally, or inside property scoped contexts to avoid clashes whenever possible (if there is no cycle).

For the above example, the following context should be generated:

{
  "bar": "https://example.com/Foo/bar",
  "foo": "https://example.com/Bar/foo"
}
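As a rough illustration of the "simple implementation without caring for ambiguities" step, the sketch below defines every field as a global term. It is not the generator's actual code: the LAYOUTS dictionary and the flat_context function are hypothetical names, and the property IRIs are simply copied from the expected context above.

```python
# Hypothetical in-memory model of the schema:
# layout name -> {field name -> property IRI}.
LAYOUTS = {
    "Foo": {"bar": "https://example.com/Foo/bar"},
    "Bar": {"foo": "https://example.com/Bar/foo"},
}


def flat_context(layouts):
    """Define every field as a global term, ignoring possible clashes."""
    context = {}
    for fields in layouts.values():
        for term, iri in fields.items():
            # A later layout silently overwrites an earlier one here;
            # detecting that is left to a separate ambiguity check.
            context[term] = iri
    return context


context = flat_context(LAYOUTS)
```

With the two layouts of the example, context holds exactly the two global term definitions shown above.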

Ambiguous terms

Without type scoped contexts, ambiguities can arise when two layouts define fields with the same name.

type Foo {
  bar: Bar,
  prop: A
}

type Bar {
  foo: Foo,
  prop: B
}

Here there is an ambiguity on the prop term definition. These ambiguities should be detected by the LD context generator.
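One possible way to detect such ambiguities is to collect every definition proposed for a term and report the terms that end up with more than one. This is only a sketch under assumptions: the data model and the find_ambiguous_terms function are hypothetical, and the prop IRIs are assumed to follow the same layout-IRI/field-name pattern as bar and foo.

```python
from collections import defaultdict

# Hypothetical model: layout name -> {field name -> property IRI}.
# The two "prop" IRIs are assumed, following the pattern used for bar/foo.
LAYOUTS = {
    "Foo": {
        "bar": "https://example.com/Foo/bar",
        "prop": "https://example.com/Foo/prop",
    },
    "Bar": {
        "foo": "https://example.com/Bar/foo",
        "prop": "https://example.com/Bar/prop",
    },
}


def find_ambiguous_terms(layouts):
    """Return the terms that would receive more than one definition."""
    definitions = defaultdict(set)
    for fields in layouts.values():
        for term, iri in fields.items():
            definitions[term].add(iri)
    return {
        term: sorted(iris)
        for term, iris in definitions.items()
        if len(iris) > 1
    }


ambiguous = find_ambiguous_terms(LAYOUTS)
```

Here only prop is reported, since bar and foo each have a single definition.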

To solve this ambiguity, we could define some "main" layout (the expected layout of the input documents) and some included secondary layouts. Then the prop term of the main layout is defined globally while the prop term of the secondary layout is defined in a property-scoped context.
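The main/secondary idea could look roughly like the sketch below. It is a simplification under stated assumptions, not the PR's algorithm: the data model and context_with_main_layout are hypothetical, the prop IRIs are assumed, only fields directly on the main layout receive a property-scoped context, and cycles are ignored (property-scoped contexts propagate by default in JSON-LD 1.1, so the Foo node nested under bar.foo would still see the secondary prop definition).

```python
# Hypothetical model: layout name -> {field name -> (property IRI, target layout)}.
LAYOUTS = {
    "Foo": {
        "bar": ("https://example.com/Foo/bar", "Bar"),
        "prop": ("https://example.com/Foo/prop", None),
    },
    "Bar": {
        "foo": ("https://example.com/Bar/foo", "Foo"),
        "prop": ("https://example.com/Bar/prop", None),
    },
}


def context_with_main_layout(layouts, main):
    """Define the main layout's terms globally and move clashing
    secondary terms into property-scoped contexts."""
    context = {term: iri for term, (iri, _) in layouts[main].items()}
    scoped = {}  # secondary layout name -> clashing term definitions
    for name, fields in layouts.items():
        if name == main:
            continue
        for term, (iri, _) in fields.items():
            if term not in context:
                context[term] = iri  # no clash: keep it global
            elif context[term] != iri:
                scoped.setdefault(name, {})[term] = iri
    # Attach each secondary layout's clashing terms to the main-layout
    # fields that lead to it.
    for term, (iri, target) in layouts[main].items():
        if target in scoped:
            context[term] = {"@id": iri, "@context": scoped[target]}
    return context


context = context_with_main_layout(LAYOUTS, "Foo")
```

With Foo as the main layout, prop maps globally to the Foo property, while the bar term carries a scoped context that redefines prop for the nested Bar node.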

Type scoped contexts

Type scoped contexts can still be generated, provided that the TreeLDR layout explicitly contains a required field holding its type.

type Foo {
  required rdf:type as myType,
  bar: Bar
}

Then the following context can be generated:

{
  "myType": "@type",
  "Foo": {
    "bar": "https://example.com/Foo/bar"
  }
}

Note: we should allow @type to be a valid field name so we can write:

type Foo {
  required rdf:type as @type,
  bar: Bar
}

which would generate the following context without a @type alias:

{
  "Foo": {
    "bar": "https://example.com/Foo/bar"
  }
}
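The rule described in this section can be sketched as follows. The layout dictionaries, the type_field key and the type_scoped_context function are all hypothetical names chosen for illustration. Unlike the abbreviated examples above, the sketch nests the scoped terms under "@context" and adds an "@id", which is how JSON-LD 1.1 expanded term definitions express a type-scoped context.

```python
def type_scoped_context(layouts):
    """Emit a type-scoped context for each layout that declares a
    required rdf:type field; fall back to global terms otherwise."""
    context = {}
    for name, layout in layouts.items():
        fields = dict(layout["fields"])
        alias = layout.get("type_field")
        if alias is None:
            # No required type field: terms can only be defined globally.
            context.update(fields)
            continue
        if alias != "@type":
            # The field name becomes an alias of the @type keyword.
            context[alias] = "@type"
        context[name] = {"@id": layout["id"], "@context": fields}
    return context


FOO_FIELDS = {"bar": "https://example.com/Foo/bar"}

with_alias = type_scoped_context({
    "Foo": {
        "id": "https://example.com/Foo",
        "type_field": "myType",
        "fields": FOO_FIELDS,
    }
})
without_alias = type_scoped_context({
    "Foo": {
        "id": "https://example.com/Foo",
        "type_field": "@type",
        "fields": FOO_FIELDS,
    }
})
```

The first call yields a myType alias next to the Foo term; the second yields the Foo term alone, since the field is already named @type.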

External contexts

Sometimes contexts are loaded alongside other contexts. For now, the context generator assumes the generated context will be the only one loaded and should include all the term definitions. Instead, it would be nice to specify a list of contexts that will be loaded before the generated one. The generator can then omit the definitions already present in the input contexts.
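A minimal sketch of that filtering step, assuming the external contexts are already available as plain dictionaries (no remote loading, no prefix or @vocab expansion, definitions compared literally). The function name omit_external_terms is hypothetical.

```python
def omit_external_terms(generated, external_contexts):
    """Drop the generated definitions that the previously loaded
    contexts already provide with the exact same value."""
    known = {}
    for ctx in external_contexts:
        # Contexts are applied in order, so later ones override earlier ones.
        known.update(ctx)
    return {
        term: definition
        for term, definition in generated.items()
        if known.get(term) != definition
    }


generated = {
    "bar": "https://example.com/Foo/bar",
    "foo": "https://example.com/Bar/foo",
}
external = [{"bar": "https://example.com/Foo/bar"}]
remaining = omit_external_terms(generated, external)
```

A term that an external context defines differently is kept, so the generated context still overrides it when loaded last.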

Implementation status

  • json-ld refactor
  • Simple implementation without caring for ambiguities
  • Ambiguities detection
  • Ambiguities resolution using primary/secondary layouts and property scoped contexts
  • Type scoped contexts
  • External contexts

@timothee-haudebourg (Collaborator, Author)

I discuss here some current limitations for the generation of type scoped contexts. Consider the following TreeLDR document:

base <https://example.com>;
use <http://www.w3.org/1999/02/22-rdf-syntax-ns#> as rdf;
use <http://www.w3.org/2000/01/rdf-schema#> as rdfs;

type Foo {
	bar: Bar,
	rdf:type: required &rdfs:Class
}

type Bar {
	foo: Foo,
	rdf:type: required &rdfs:Class
}

We want to generate the following JSON-LD context, with two type scoped contexts:

{
  "type": "@type",
  "Foo": {
    "@id": "https://example.com/Foo",
    "bar": "https://example.com/Foo/bar"
  },
  "Bar": {
    "@id": "https://example.com/Bar",
    "foo": "https://example.com/Bar/foo"
  }
}

This should be doable with the command:

tldrc -i example.tldr json-ld-context https://example.com/Foo https://example.com/Bar

Anonymous layouts cause ambiguous term definitions

In this example, each type defines a type field for the rdf:type property with the same layout required &rdfs:Class. However, because this layout is defined inline, it is anonymous and is given a blank node identifier. A new blank node identifier is generated for each occurrence, which means that the two fields in fact refer to two different layouts with different blank node identifiers. This causes the LD context generator to detect an ambiguity for the type term.

This can be solved by generating a unique blank node identifier for structurally equivalent anonymous layouts. This is already the case for references.
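One way to picture this fix is an interning table keyed by the structure of the anonymous layout. The sketch below is an assumption-laden illustration, not TreeLDR's implementation: BlankNodeInterner, the tuple encoding of layouts and the generated labels are all hypothetical.

```python
import itertools


class BlankNodeInterner:
    """Hand out one blank node identifier per distinct layout structure."""

    def __init__(self):
        self._ids = {}
        self._counter = itertools.count()

    def id_for(self, description):
        # `description` must be hashable and compare equal for
        # structurally equivalent layouts (nested tuples here).
        if description not in self._ids:
            self._ids[description] = f"_:l{next(self._counter)}"
        return self._ids[description]


RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

interner = BlankNodeInterner()
# The same inline layout `required &rdfs:Class`, met once in Foo and once in Bar.
foo_type_layout = interner.id_for(("required", ("reference", RDFS_CLASS)))
bar_type_layout = interner.id_for(("required", ("reference", RDFS_CLASS)))
# A structurally different layout gets its own identifier.
optional_layout = interner.id_for(("reference", RDFS_CLASS))
```

Both type fields now point to the same layout identifier, so the type term has a single definition and no ambiguity is reported.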

Type scoped context term name

In this example, we expect the type scoped contexts to be defined with the terms Foo and Bar, extracted from the layout names. However, this is inconsistent with the semantics of TreeLDR. The layout of the type field is required &rdfs:Class. The current semantics of TreeLDR dictates that the expected value for this field is hence an IRI, not Foo nor Bar.

One solution to this problem would be to allow the definition of custom reference layouts. One could specify what values the reference can take, and how it maps to actual IRIs. For instance with an enumeration layout (not yet implemented):

layout MyReference [ "Foo" = Foo, "Bar" = Bar ];

This states that a value of the layout MyReference is either the string Foo referring to https://example.com/Foo or Bar referring to https://example.com/Bar. Custom references are a powerful tool that can be useful outside the scope of simply generating type scoped contexts.
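Since enumeration layouts are not implemented yet, the following is only a sketch of the intended mapping, expressed as a lookup table. MY_REFERENCE, reference_to_iri and iri_to_reference are hypothetical names.

```python
# Hypothetical encoding of: layout MyReference [ "Foo" = Foo, "Bar" = Bar ];
MY_REFERENCE = {
    "Foo": "https://example.com/Foo",
    "Bar": "https://example.com/Bar",
}


def reference_to_iri(layout, value):
    """Map a serialized reference value to the IRI it refers to."""
    if value not in layout:
        raise ValueError(f"{value!r} is not a value of this reference layout")
    return layout[value]


def iri_to_reference(layout, iri):
    """Map an IRI back to its serialized reference value."""
    for value, target in layout.items():
        if target == iri:
            return value
    raise ValueError(f"{iri!r} has no representation in this reference layout")


foo_iri = reference_to_iri(MY_REFERENCE, "Foo")
bar_name = iri_to_reference(MY_REFERENCE, "https://example.com/Bar")
```

Under this reading, the strings Foo and Bar accepted by the type field coincide with the terms that introduce the type scoped contexts.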

@timothee-haudebourg (Collaborator, Author)

Until custom references are implemented, the generation of type scoped contexts is enabled only with the --rdf-type-to-layout-name option that explicitly asks for the value of the rdf:type property to be interpreted as the layout name. Once they are implemented, the option will be deprecated in favor of custom layouts.

tldrc -i example.tldr json-ld-context --rdf-type-to-layout-name https://example.com/Foo https://example.com/Bar

@timothee-haudebourg (Collaborator, Author)

External contexts implementation is deferred (I'll probably need to harmonize the way vocabulary is handled before I can do that).

Successfully merging this pull request may close these issues.

Rewrite the JSON-LD context generator with new json-ld library