Turtle serializer shouldn't write blank nodes as <...> #555

Closed
fennibay opened this issue Apr 13, 2022 · 8 comments · Fixed by #589

@fennibay
Contributor

I'm converting JSON-LD to Turtle using rdflib.js.

Example input:

{
    "@context": {
        "ex": "http://example.com#"
    },
    "@id": "ex:myid",
    "ex:prop1": {
        "ex:prop2": {
            "ex:prop3": "value"
        }
    }
}

Example current output out of rdflib.js:

@prefix ex: <http://example.com#>.

<_:b0> ex:prop2 <_:b1>.
<_:b1> ex:prop3 "value".
ex:myid ex:prop1 <_:b0>.

The Turtle spec states the following:

RDF blank nodes in Turtle are expressed as _: followed by a blank node label which is a series of name characters.

So, I think blank nodes should be expressed without <...>, because this makes them absolute or relative IRIs and not blank nodes.
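
To make the expected behaviour concrete, here is a minimal sketch of how a serializer could choose between the two syntaxes. The `TurtleTerm` type and the `termToTurtle` function are hypothetical names for illustration, not part of rdflib.js, and the sketch assumes blank node values are stored without the `_:` prefix.

```typescript
// Hypothetical minimal term shape, loosely modeled on the RDF/JS data model.
// Assumption: a BlankNode's value holds the label only, without the "_:" prefix.
type TurtleTerm =
  | { termType: "NamedNode"; value: string }
  | { termType: "BlankNode"; value: string };

// Render a subject or object term in Turtle syntax.
function termToTurtle(term: TurtleTerm): string {
  if (term.termType === "BlankNode") {
    // Blank nodes are written as "_:" plus the label, never inside <...>.
    return `_:${term.value}`;
  }
  // Only IRIs are wrapped in angle brackets.
  return `<${term.value}>`;
}

const myid: TurtleTerm = { termType: "NamedNode", value: "http://example.com#myid" };
const b0: TurtleTerm = { termType: "BlankNode", value: "b0" };

// prints: <http://example.com#myid> <http://example.com#prop1> _:b0 .
console.log(`${termToTurtle(myid)} <http://example.com#prop1> ${termToTurtle(b0)} .`);
```

With this distinction, the triple from the example above keeps `_:b0` as a blank node label instead of turning it into an IRI.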

As an additional feature, it would be nice to be able to control the blank node output to have them nested or not nested.

Questions:

  1. Is this a known issue? I saw some non-conformances in Failing RDF Turtle 1.1 conformance cases #329, but couldn't find this exact case there.
  2. Could this be affected by the arguments, in case I'm calling the functions wrong? My code snippet is included below.
import * as rdflib from "rdflib"

/**
 * Convert JSON-LD to Turtle
 * @param input JSON-LD document as a string
 * @param base Base IRI for the content
 * @param namespaces The namespace map for use in ttl
 * @returns TTL string
 */
async function convertJsonLdToTtl(
    input: string,
    base: string,
    namespaces: Record<string, string> = {},
): Promise<string> {
    return new Promise<string>((res, rej) => {
        const store = rdflib.graph()
        // Parse the JSON-LD into the store; the callback receives the populated store as kb.
        rdflib.parse(input, store, base, "application/ld+json", (err, kb) => {
            if (err) {
                rej(err)
                return
            }
            if (!kb) {
                rej(new Error("KB empty: " + kb))
                return
            }
            console.log("KB # statements: " + kb.statements.length)
            // Serialize the whole store (target = null) as Turtle.
            rdflib.serialize(
                null,
                kb,
                undefined,
                "text/turtle",
                (err, output) => {
                    if (err) {
                        rej(err)
                    } else if (!output) {
                        rej(new Error("Empty output: " + output))
                    } else {
                        res(output)
                    }
                },
                { namespaces },
            )
        })
    })
}

Many thanks.

@jeff-zucker
Contributor

I can confirm that <_:b0> is a NamedNode, not a BlankNode in Turtle. So this looks like a bug.

@bourgeoa
Contributor

bourgeoa commented May 3, 2022

Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

@fennibay
Contributor Author

> Agreed. The issue may be in the JSON-LD parser and not in the Turtle serializer.

Thanks for the hint. I tried first converting from JSON-LD to N-Quads (with another library, jsonld) and then converting to Turtle, which helped by embedding the blank nodes. The blank node labels may still be wrong (I couldn't test this), but my problem is solved for now.

@RinkeHoekstra
Contributor

This is rather problematic for any system that uses rdflib.js to parse JSON-LD. Any chance this can get prioritized?

@RinkeHoekstra
Contributor

RinkeHoekstra commented Jul 18, 2022

I can confirm that e.g. the following JSON-LD is not parsed correctly:

{
    "@context": {
        "@vocab": "https://example.com/"
    },
    "hasExampleProperty": "some literal value"
}

This results in the following statement (I'm using an example IRI for the graph here):

{
    "subject": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "_:b0"
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

But clearly _:b0 should be a BlankNode.

Whereas the corresponding Turtle is parsed correctly:

@prefix ex: <https://example.com/> .

[] ex:hasExampleProperty "some literal value" .

Becomes:

{
    "subject": {
        "termType": "BlankNode",
        "classOrder": 6,
        "value": "_g_L2C39",
        "isBlank": 1,
        "isVar": 1
    },
    "predicate": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/hasExampleProperty"
    },
    "object": {
        "termType": "Literal",
        "classOrder": 1,
        "value": "some literal value",
        "datatype": {
            "termType": "NamedNode",
            "classOrder": 5,
            "value": "http://www.w3.org/2001/XMLSchema#string"
        },
        "isVar": 0,
        "language": ""
    },
    "graph": {
        "termType": "NamedNode",
        "classOrder": 5,
        "value": "https://example.com/test/"
    }
}

(Interestingly, the blank node gets a completely different internal identifier in this case).
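
For anyone wanting to check whether their own store is affected, a rough sketch of a detector follows. It works on plain objects shaped like the dumps above; `DumpedTerm`, `DumpedStatement` and `findMistypedBlankNodes` are hypothetical names, not rdflib.js API.

```typescript
// Hypothetical plain-object shapes matching the statement dumps in this thread.
interface DumpedTerm {
  termType: string;
  value: string;
}
interface DumpedStatement {
  subject: DumpedTerm;
  predicate: DumpedTerm;
  object: DumpedTerm;
  graph?: DumpedTerm;
}

// Collect subject/object terms typed as NamedNode whose value looks like a
// blank node label (starts with "_:"). Literals are ignored by the termType check.
function findMistypedBlankNodes(statements: DumpedStatement[]): DumpedTerm[] {
  const suspects: DumpedTerm[] = [];
  for (const st of statements) {
    for (const term of [st.subject, st.object]) {
      if (term.termType === "NamedNode" && /^_:/.test(term.value)) {
        suspects.push(term);
      }
    }
  }
  return suspects;
}

// The statement produced from the JSON-LD example, reduced to the relevant fields.
const dumped: DumpedStatement[] = [
  {
    subject: { termType: "NamedNode", value: "_:b0" },
    predicate: { termType: "NamedNode", value: "https://example.com/hasExampleProperty" },
    object: { termType: "Literal", value: "some literal value" },
    graph: { termType: "NamedNode", value: "https://example.com/test/" },
  },
];

// One suspect is reported: the subject with value "_:b0".
console.log(findMistypedBlankNodes(dumped).map((t) => t.value));
```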

@RinkeHoekstra
Contributor

When the JSON-LD contains a list, the blank nodes corresponding to that collection are generated correctly:

{
    "@context": {
        "@vocab": "https://example.com/",
        "hasExampleProperty": {
            "@container": "@list"
        }
    },
    "hasExampleProperty": ["some literal value", "some other literal value"]
}

As N-Quads:

_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some other literal value".
_:n4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nill>.
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> "some literal value".
_:n5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:n4.
<_:b0> <https://example.com/hasExampleProperty> _:n5 <https://example.com/test/> .

@RinkeHoekstra
Contributor

The function jsonldObjectToTerm does not appear to ever return a BlankNode:

export function jsonldObjectToTerm (kb, obj) {

@RinkeHoekstra
Contributor

RinkeHoekstra commented Jul 18, 2022

Diagnosis

It looks like the flatten function from jsonld.js is the culprit.

The JSON-LD parser takes the flattened output, and checks for @id attributes to determine whether the JSON object represents a blank node or not.

export default function jsonldParser (str, kb, base, callback) {
  const baseString = base && Object.prototype.hasOwnProperty.call(base, 'termType')
    ? base.value
    : base
  return jsonld
    .flatten(JSON.parse(str), null, { base: baseString })
    .then((flattened) => flattened.reduce((store, flatResource) => {
      kb = processResource(kb, base, flatResource)
      return kb
    }, kb))
    .then(callback)
    .catch(callback)
}

and:

if (Object.prototype.hasOwnProperty.call(obj, '@id')) {
  return kb.rdfFactory.namedNode(obj['@id'])
}

However, the jsonld.js flattened output inserts @id attributes, e.g. the above JSON-LD (without the list) results in:

[
  {
    "@id": "_:b0",
    "https://example.com/hasExampleProperty": [
      {
        "@value": "some literal value"
      }
    ]
  }
]

This turns the node into a NamedNode because it has an @id attribute.

Labelling blank nodes with an @id value that starts with _: is described in a non-normative section of the JSON-LD specification: https://www.w3.org/TR/json-ld11/#identifying-blank-nodes.

The flattened output (also non-normative) uses such identifiers in its examples: https://www.w3.org/TR/json-ld11/#flattened-document-form (and it needs to, as it cannot use nesting to group the properties of the node together).

Proposed Solution

  • Do not rely on the presence of an @id attribute, as the flattened output always includes one for both named and blank nodes.
  • Use the standard syntax for blank nodes in JSON-LD to identify whether a JSON object is a blank node: any value of @id that starts with _: is a blank node.
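
A minimal sketch of the proposed check, applied to the flattened output shown above. `ParsedTerm` and `idToTerm` are hypothetical stand-ins for the factory calls in the parser, and stripping the `_:` prefix from the blank node value is an assumption about how the store wants its labels.

```typescript
// Hypothetical stand-in for the terms a data factory would create.
type ParsedTerm =
  | { termType: "NamedNode"; value: string }
  | { termType: "BlankNode"; value: string };

// Decide the term type from the @id value rather than from its mere presence.
function idToTerm(id: string): ParsedTerm {
  if (/^_:/.test(id)) {
    // Assumption: the blank node label is stored without the "_:" prefix.
    return { termType: "BlankNode", value: id.slice(2) };
  }
  return { termType: "NamedNode", value: id };
}

// The flattened jsonld.js output from the diagnosis above.
const flattened = [
  {
    "@id": "_:b0",
    "https://example.com/hasExampleProperty": [{ "@value": "some literal value" }],
  },
];

const subject = idToTerm(flattened[0]["@id"]);
console.log(subject.termType, subject.value); // BlankNode b0
```

IRIs such as `http://example.com#myid` still map to a NamedNode, so only identifiers with the `_:` prefix change behaviour.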
