Simple literals are masked #127

Closed
wschella opened this issue Apr 26, 2018 · 17 comments

Comments

@wschella

Due to https://github.com/rdfjs/representation-task-force/blob/master/interface-spec.html#L231-L233
you can't create simple literals, as they would automatically become typed literals with an xsd:string datatype.

I can see why this makes sense from a usability perspective (or because of some spec detail I don't know about), but wouldn't it also make sense not to do this? Since the RDFJS data model is arguably an interop and representation format, information lost this way might not be recoverable (e.g. if you don't control the parsing of the query/expression/triples/...).

Relevant problem: some functions in the SPARQL spec, specifically those on strings, explicitly take only simple literals as arguments (e.g. regex, langMatches).

What is the argumentation behind this choice?

@acoburn

acoburn commented Apr 26, 2018

From the RDF 1.1 specification:

Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.

Which means that a simple literal with no datatype is semantically equivalent to a literal (with the same lexical form) typed with xsd:string. So while a concrete syntax may or may not include the datatype for a string literal, under RDF 1.1, they would be considered equivalent.
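
A minimal sketch of what this equivalence could look like in a data model factory. The `Literal` shape and the `literal` and `equals` helpers are illustrative assumptions, not the RDFJS interfaces themselves: when no datatype is given, the factory fills in xsd:string, so both spellings produce equal terms.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";

// Hypothetical, simplified literal shape (not the actual RDFJS interface).
interface Literal {
  value: string;    // lexical form
  datatype: string; // datatype IRI, always present as in RDF 1.1
}

// Hypothetical factory: a missing datatype defaults to xsd:string.
function literal(value: string, datatype?: string): Literal {
  return { value, datatype: datatype === undefined ? XSD_STRING : datatype };
}

// Term equality compares the lexical form and the datatype IRI.
function equals(a: Literal, b: Literal): boolean {
  return a.value === b.value && a.datatype === b.datatype;
}

const simple = literal("a");            // written as "a"
const typed = literal("a", XSD_STRING); // written as "a"^^xsd:string

console.log(equals(simple, typed));                    // true
console.log(equals(simple, literal("a", XSD_INTEGER))); // false
```

After construction there is nothing left on the term that says which spelling was used, which is the information loss the original question is about.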

@wschella
Author

Well, then the SPARQL spec is inconsistent, or at least confusing in respect to this matter. Thanks for the prompt reply.

@blake-regalia
Contributor

Btw, graphy will let you know whether or not you parsed the literal from a simple literal, see here: https://github.com/blake-regalia/graphy.js/blob/master/README.md#literal-extends-term-implements-rdfjs-literal

@wschella
Author

@blake-regalia that's an interesting approach; I believe it would work in rdf-data-model as well. It's not quite robust, but it's something!

@blake-regalia
Contributor

What is not robust about it? Can you give an example? I'd be curious to hear more.

@wschella
Author

As soon as the data is serialized and deserialized, or any library in the data pipeline simply copies the datatype in any way, shape, or form, this cool trick no longer works.
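
To illustrate the concern, here is a small sketch. The `FlaggedLiteral` shape, the `isSimple` flag, and the `copyTerm` step are all hypothetical and do not correspond to any particular library's API: a pipeline step that only knows the data model copies the value and datatype, and the syntactic flag is gone.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// Hypothetical literal shape carrying a library-specific syntactic flag.
interface FlaggedLiteral {
  value: string;
  datatype: string;
  isSimple?: boolean; // assumption: set by a parser, not part of the data model
}

// What a syntax-tracking parser might produce for the token "a".
const parsed: FlaggedLiteral = { value: "a", datatype: XSD_STRING, isSimple: true };

// A pipeline step that only relies on the data model: it copies the
// lexical form and the datatype, and knows nothing about extra properties.
function copyTerm(term: FlaggedLiteral): FlaggedLiteral {
  return { value: term.value, datatype: term.datatype };
}

const copied = copyTerm(parsed);

console.log(parsed.isSimple); // true
console.log(copied.isSimple); // undefined
```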

@blake-regalia
Contributor

Sure, but at that point it's sort of out of the hands of the parser anyways. Would you find it more appropriate if there was a meta-data flag on the Literal object as well, such as .isSimple ?

@RubenVerborgh
Member

It's a syntactical difference; this should not end up in the model. For all purposes, "a" === "a"^^xsd:string.

@blake-regalia
Contributor

Yes, we're not discussing incorporating this in the model -- just a conversation about a library feature that tracks syntactic metadata.

@RubenVerborgh
Member

Yeah, so I would advise against .isSimple.

@elf-pavlik
Member

Well, then the SPARQL spec is inconsistent, or at least confusing in respect to this matter. Thanks for the prompt reply.

Looking at the relevant sections of the SPARQL 1.1 spec, I agree that it seems to lack clarity here.

https://www.w3.org/TR/sparql11-query/#func-strings

17.4.3.1.1 String arguments
Certain functions (e.g. REGEX, STRLEN, CONTAINS) take a string literal as an argument and accept a simple literal, a plain literal with language tag, or a literal with datatype xsd:string. They then act on the lexcial form of the literal.
The term string literal is used in the function descriptions for this. Use of any other RDF term will cause a call to the function to raise an error.

The section above seems to distinguish between a simple literal and a literal with datatype xsd:string.
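
As a rough sketch of how an implementation built on an RDF 1.1 style data model might read the "string literal" rule quoted above: because simple literals arrive as xsd:string there, the check reduces to accepting xsd:string and language-tagged (rdf:langString) literals. The `Literal` shape and the `isStringLiteral` and `stringArgument` helpers are assumptions made for illustration, not taken from the spec or from any library.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";
const RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

// Hypothetical, simplified literal shape.
interface Literal {
  value: string;    // lexical form
  language: string; // empty string when there is no language tag
  datatype: string; // datatype IRI
}

// "String literal" in the sense of the quoted section, under the assumption
// that simple literals have already been turned into xsd:string literals.
function isStringLiteral(term: Literal): boolean {
  return term.datatype === XSD_STRING || term.datatype === RDF_LANG_STRING;
}

// Returns the lexical form a string function would act on,
// and raises an error for any other kind of literal.
function stringArgument(term: Literal): string {
  if (!isStringLiteral(term)) {
    throw new TypeError("expected a string literal, got datatype " + term.datatype);
  }
  return term.value;
}

console.log(stringArgument({ value: "abc", language: "", datatype: XSD_STRING }));         // abc
console.log(stringArgument({ value: "chat", language: "fr", datatype: RDF_LANG_STRING })); // chat
```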

https://www.w3.org/TR/sparql11-query/#func-regex

xsd:boolean  REGEX (string literal text, simple literal pattern)
xsd:boolean  REGEX (string literal text, simple literal pattern, simple literal flags)

https://www.w3.org/TR/sparql11-query/#func-langMatches

 xsd:boolean  langMatches (simple literal language-tag, simple literal language-range)

Seems to expect only a simple literal.

On the other hand, the pattern for REGEX or the language-range for langMatches looks like something that would come from application-specific logic rather than from a parsed dataset.

If needed, we could ask @afs and @swh to help clarify the SPARQL 1.1 spec.

@blake-regalia in what scenarios do you need to know whether the serialization used a simple literal or one with an explicit xsd:string datatype? I assume you have some use case for it since you included it in your documentation.

@acoburn

acoburn commented Apr 26, 2018

It is possibly worth noting that the SPARQL 1.1 specification predates the RDF 1.1 specification by about a year. And so a SPARQL 1.1 compliant endpoint is not necessarily compliant with RDF 1.1, which would affect how simple vs. typed string literals are matched.

@blake-regalia
Contributor

@elf-pavlik
If you take a look at the numeric literals, they are parsed from syntactically valid number strings (e.g., 42, 2.5, or 10.3e5) which strictly conform to their respective datatypes. Since one could also write in Turtle "three"^^xsd:integer, having .isNumeric and .number properties on the Literal object facilitates the consumption of numeric data.
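
A minimal sketch of the kind of check this implies, limited to xsd:integer. The `integerValue` helper and the `Literal` shape are hypothetical and are not graphy's actual API: the literal only yields a number when its lexical form really is a valid integer, so "three"^^xsd:integer is rejected.

```typescript
const XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer";
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// Hypothetical, simplified literal shape.
interface Literal {
  value: string;    // lexical form
  datatype: string; // datatype IRI
}

// Hypothetical helper: returns the numeric value of an xsd:integer literal,
// or undefined when the datatype differs or the lexical form is not a valid
// integer. Very large integers would lose precision as a JavaScript number;
// this sketch ignores that.
function integerValue(term: Literal): number | undefined {
  if (term.datatype !== XSD_INTEGER) return undefined;
  return /^[+-]?[0-9]+$/.test(term.value) ? Number(term.value) : undefined;
}

console.log(integerValue({ value: "42", datatype: XSD_INTEGER }));    // 42
console.log(integerValue({ value: "three", datatype: XSD_INTEGER })); // undefined
```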

Moreover, graphy simply tries to preserve syntactic information from the document in case the user can find it handy. This may seem silly at first glance, but when you consider the incredible performance boost from using the graph_open and graph_close events, it suddenly does not seem so silly anymore.

@RubenVerborgh why does it matter if a library extends the interface spec with its own properties?

@RubenVerborgh
Member

Moreover, graphy simply tries to preserve syntactic information from the document in case the user can find it handy.

There's a difference between preserving syntactic information in the in-memory representation versus emitting events as they are parsed (which indeed is an interesting idea). The graph event is only meaningful because of the temporal context, which is absent with the original case under discussion here.

why does it matter if a library extends the interface spec with its own properties?

It doesn't, but you could equally keep track of whether the string was single-, double-, or triple-quoted: each of these differences has as much meaning as whether or not xsd:string was explicitly mentioned.

But actually, given how V8 is implemented, I suspect you'll see performance differences in functions accepting a literal as argument, because there will be two different underlying internal classes: one where datatype is inherited, and one where it is not (not to mention the numeric subtypes).

@wschella
Author

@blake-regalia you are quite right that at some point it will be 'out of the hands of the parser'. That's exactly why I wouldn't base a small library (my use case) on this neat trick, as I make no assumptions about the incoming data, except that it conforms to the rdfjs model.

I feel the issue at hand (for me at least) is the SPARQL spec being inconsistent with the RDF spec, so I'll just work around that (and tbh, the issue is actually quite minor).

@ericprud

As of 2004, the RDF model had an abstract syntax:

literal ::= plainLiteral | typedLiteral
plainLiteral ::= string languageTag?
typedLiteral ::= string datatype

or more succinctly:

literal ::= string (languageTag | datatype)?

In 2008, SPARQL's treatment of [simple literal] meant that you could no longer use the datatype function to distinguish between a simple literal and an xsd:string-typed literal:

returns xsd:string if the parameter is a simple literal.

but you could still see the difference with the sameTerm function (which was used in BGP match).
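
The situation described above can be sketched roughly as follows. `Literal2004`, `datatypeOf`, and `sameTerm` are simplified, hypothetical stand-ins for the 2004 abstract syntax and the SPARQL functions, not a faithful implementation of either: the datatype function reports xsd:string for both literals, while a term comparison still tells them apart.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// 2004-style literal: the language tag and the datatype are both optional.
interface Literal2004 {
  lexical: string;
  languageTag?: string;
  datatype?: string;
}

// Sketch of the datatype function for simple and typed literals.
// Language-tagged literals are outside the scope of this sketch.
function datatypeOf(term: Literal2004): string {
  if (term.languageTag !== undefined) {
    throw new TypeError("language-tagged literals are not handled in this sketch");
  }
  // A simple literal (no datatype) is reported as xsd:string.
  return term.datatype === undefined ? XSD_STRING : term.datatype;
}

// Sketch of a term comparison: every component must match, so an absent
// datatype differs from an explicit xsd:string. Language tags are compared
// as exact strings here, which is a simplification.
function sameTerm(a: Literal2004, b: Literal2004): boolean {
  return a.lexical === b.lexical
    && a.languageTag === b.languageTag
    && a.datatype === b.datatype;
}

const simpleLiteral: Literal2004 = { lexical: "a" };
const typedLiteral: Literal2004 = { lexical: "a", datatype: XSD_STRING };

console.log(datatypeOf(simpleLiteral) === datatypeOf(typedLiteral)); // true
console.log(sameTerm(simpleLiteral, typedLiteral));                  // false
```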

In 2014, the RDF 1.1 WG decided that the pain of maintaining a distinction between simple literals and xsd:string-typed literals exceeded the pain of migrating to a model where everything has a datatype. In the latter model, parsers that encounter what appears to be a simple literal are expected to produce a literal with datatype xsd:string. You are experiencing some of that migration pain. Apologies for not getting this right in 1999.

@wschella
Author

I could not have wished for a better explanation. Thanks :)


6 participants