Simple literals are masked #127

wschella · 2018-04-26T14:25:47Z

Due to https://github.com/rdfjs/representation-task-force/blob/master/interface-spec.html#L231-L233
you can't create simple literals, as they automatically become typed literals with an xsd:string datatype.

I can see why this makes sense from a usability perspective (or because of some spec details I don't know about), but wouldn't it also make sense not to do this? The RDFJS data model is arguably an interop and representation format, so information lost this way might not be recoverable (e.g. if you don't control the parsing of the query/expression/triples/...).

Relevant problem: some functions in the SPARQL spec, specifically some of those on strings, explicitly take only simple literals as arguments (e.g. regex, langMatches).

What is the argumentation behind this choice?

acoburn · 2018-04-26T14:36:08Z

From the RDF 1.1 specification:

> Simple literals are syntactic sugar for abstract syntax literals with the datatype IRI http://www.w3.org/2001/XMLSchema#string.

Which means that a simple literal with no datatype is semantically equivalent to a literal (with the same lexical form) typed with xsd:string. So while a concrete syntax may or may not include the datatype for a string literal, under RDF 1.1, they would be considered equivalent.
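
To make the equivalence concrete, here is a minimal TypeScript sketch of a literal factory that falls back to xsd:string when no datatype is given. `LiteralLike`, `literal`, and `sameLiteral` are hypothetical names used only for illustration; they mirror the `value`, `language`, and `datatype` fields of an RDF/JS literal and are not the spec's own factory.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

// Hypothetical subset of the fields an RDF/JS literal carries.
interface LiteralLike {
  value: string;
  language: string;
  datatype: { value: string };
}

// Assumption: a language tag implies rdf:langString, and a missing
// datatype implies xsd:string, as RDF 1.1 describes.
function literal(value: string, datatype?: string, language: string = ""): LiteralLike {
  if (language !== "") {
    return { value, language, datatype: { value: RDF_LANG_STRING } };
  }
  const iri = datatype === undefined ? XSD_STRING : datatype;
  return { value, language: "", datatype: { value: iri } };
}

// Two literals are the same term when all three components match.
function sameLiteral(a: LiteralLike, b: LiteralLike): boolean {
  return a.value === b.value
    && a.language === b.language
    && a.datatype.value === b.datatype.value;
}

const simple = literal("a");            // written as "a"
const typed = literal("a", XSD_STRING); // written as "a"^^xsd:string
console.log(sameLiteral(simple, typed)); // true
```

Once both forms go through such a factory, nothing in the resulting term records which spelling the source document used.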

wschella · 2018-04-26T14:55:59Z

Well, then the SPARQL spec is inconsistent, or at least confusing with respect to this matter. Thanks for the prompt reply.

blake-regalia · 2018-04-26T16:16:47Z

Btw, graphy will let you know whether or not you parsed the literal from a simple literal, see here: https://github.com/blake-regalia/graphy.js/blob/master/README.md#literal-extends-term-implements-rdfjs-literal

wschella · 2018-04-26T17:16:51Z

@blake-regalia that's an interesting approach; I believe it would work in rdf-data-model as well. It's not quite robust, but it's something!

blake-regalia · 2018-04-26T17:41:39Z

What is not robust about it? Can you give an example? I'd be curious to hear more.

wschella · 2018-04-26T18:02:58Z

The moment serializing and deserializing is done, or any library in a data pipeline would in any way, shape, or form just copy the datatype, this cool trick would no longer work.
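
A minimal sketch of that failure mode, assuming a parser that attaches a hypothetical `isSimple` flag and a pipeline step that rebuilds the term from only the fields every RDF/JS literal has. `FlaggedLiteral` and `copyLiteral` are made-up names for illustration, not part of any library mentioned here.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// Hypothetical subset of the fields an RDF/JS literal carries.
interface LiteralLike {
  value: string;
  language: string;
  datatype: { value: string };
}

// Hypothetical parser output that also records a syntactic detail.
interface FlaggedLiteral extends LiteralLike {
  isSimple: boolean;
}

const parsed: FlaggedLiteral = {
  value: "a",
  language: "",
  datatype: { value: XSD_STRING },
  isSimple: true,
};

// A generic pipeline step: it only knows about the model-level fields,
// so any extra property on the input is silently dropped.
function copyLiteral(term: LiteralLike): LiteralLike {
  return {
    value: term.value,
    language: term.language,
    datatype: { value: term.datatype.value },
  };
}

const copied = copyLiteral(parsed);
console.log("isSimple" in copied); // false
```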

blake-regalia · 2018-04-26T18:15:47Z

Sure, but at that point it's sort of out of the hands of the parser anyways. Would you find it more appropriate if there was a meta-data flag on the Literal object as well, such as .isSimple ?

RubenVerborgh · 2018-04-26T20:41:58Z

It's a syntactical difference; this should not end up in the model. For all purposes, "a" === "a"^^xsd:string.

blake-regalia · 2018-04-26T21:11:00Z

Yes, we're not discussing incorporating this in the model -- just a conversation about a library feature that tracks syntactic metadata.

RubenVerborgh · 2018-04-26T21:12:18Z

Yeah, so I would advise against .isSimple.

elf-pavlik · 2018-04-26T21:13:21Z

> Well, then the SPARQL spec is inconsistent, or at least confusing in respect to this matter. Thanks for the prompt reply.

Looking at the references to the SPARQL 1.1 spec, I agree it seems to have a certain lack of clarity.

https://www.w3.org/TR/sparql11-query/#func-strings

> 17.4.3.1.1 String arguments
>
> Certain functions (e.g. REGEX, STRLEN, CONTAINS) take a string literal as an argument and accept a simple literal, a plain literal with language tag, or a literal with datatype xsd:string. They then act on the lexical form of the literal.
>
> The term string literal is used in the function descriptions for this. Use of any other RDF term will cause a call to the function to raise an error.

The section above seems to distinguish between a simple literal and a literal with datatype xsd:string.

https://www.w3.org/TR/sparql11-query/#func-regex

xsd:boolean  REGEX (string literal text, simple literal pattern)
xsd:boolean  REGEX (string literal text, simple literal pattern, simple literal flags)

https://www.w3.org/TR/sparql11-query/#func-langMatches

 xsd:boolean  langMatches (simple literal language-tag, simple literal language-range)

Seems to expect only a simple literal.

On the other hand, the pattern for REGEX or the language-range for langMatches looks like something that would come from application-specific logic rather than from a parsed dataset.

If needed, we could ask @afs and @swh to help clarify the SPARQL 1.1 spec.

@blake-regalia in what scenarios do you need to know whether the serialization used a simple literal or one with an explicit xsd:string datatype? I assume you have some use case for it since you included it in your documentation.

acoburn · 2018-04-26T21:25:56Z

It is possibly worth noting that the SPARQL 1.1 specification predates the RDF 1.1 specification by about a year. And so a SPARQL 1.1 compliant endpoint is not necessarily compliant with RDF 1.1, which would affect how simple vs. typed string literals are matched.

blake-regalia · 2018-04-26T21:32:49Z

@elf-pavlik
If you take a look at the numeric literals, they are parsed from syntactically valid number strings (e.g., 42, 2.5, or 10.3e5) which strictly conform to their respective datatypes. Since one could also write in Turtle "three"^^xsd:integer, having .isNumeric and .number properties on the Literal object facilitates the consumption of numeric data.
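
As a rough illustration of why such a check is useful, the sketch below validates the lexical form against the declared datatype before converting it. `numericValue` and the patterns are assumptions made for this example (they ignore INF and NaN for xsd:double) and are not graphy's actual implementation.

```typescript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical subset of the fields an RDF/JS literal carries.
interface LiteralLike {
  value: string;
  language: string;
  datatype: { value: string };
}

// Simplified lexical patterns; assumption: special values such as
// INF and NaN are out of scope for this sketch.
const NUMERIC_LEXICAL: Record<string, RegExp> = {
  [XSD + "integer"]: /^[+-]?[0-9]+$/,
  [XSD + "decimal"]: /^[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)$/,
  [XSD + "double"]: /^[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?$/,
};

// Returns a number only when the datatype is numeric AND the lexical
// form matches it; otherwise returns undefined.
function numericValue(term: LiteralLike): number | undefined {
  const pattern = NUMERIC_LEXICAL[term.datatype.value];
  if (pattern === undefined || !pattern.test(term.value)) {
    return undefined;
  }
  return Number(term.value);
}

const typedLiteral = (value: string, type: string): LiteralLike =>
  ({ value, language: "", datatype: { value: XSD + type } });

console.log(numericValue(typedLiteral("42", "integer")));    // 42
console.log(numericValue(typedLiteral("three", "integer"))); // undefined
```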

Moreover, graphy simply tries to preserve syntactic information from the document in case the user can find it handy. This may seem silly at first glance, but when you consider the incredible performance boost from using the graph_open and graph_close events, it suddenly does not seem so silly anymore.

@RubenVerborgh why does it matter if a library extends the interface spec with its own properties?

RubenVerborgh · 2018-04-26T22:18:14Z

> Moreover, graphy simply tries to preserve syntactic information from the document in case the user can find it handy.

There's a difference between preserving syntactic information in the in-memory representation versus emitting events as they are parsed (which indeed is an interesting idea). The graph event is only meaningful because of the temporal context, which is absent with the original case under discussion here.

> why does it matter if a library extends the interface spec with its own properties?

It doesn't, but you could equally maintain whether the string was single, double, or triple-quoted: each of these differences has as much meaning as whether or not xsd:string was explicitly mentioned.

But actually, given how V8 is implemented, I suspect you'll see performance differences in functions accepting a literal as an argument, because there will be two different underlying internal classes: one where datatype is inherited, and one where it is not (not to mention the numeric subtypes).

wschella · 2018-04-29T11:42:32Z

@blake-regalia you are quite right that at some point it will be 'out of the hands of the parser'. That's exactly why I wouldn't base a small library (my use case) on this neat trick, as I make no assumptions about the incoming data, except that it conforms to the rdfjs model.

I feel the issue at hand (for me at least) is the SPARQL spec being inconsistent with the RDF spec, so I'll just work around that (and tbh, the issue is quite minimal actually).
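
One possible shape for such a workaround, sketched under the assumption that a literal with datatype xsd:string and no language tag is accepted wherever the SPARQL spec asks for a simple literal. The names `isSimpleLiteral`, `isStringLiteral`, `str`, and `regex` are hypothetical, and JavaScript's `RegExp` only approximates the XPath regular expression syntax and flags that SPARQL specifies.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

// Hypothetical subset of the fields an RDF/JS literal carries.
interface LiteralLike {
  value: string;
  language: string;
  datatype: { value: string };
}

// Assumption: under RDF 1.1, "simple literal" means xsd:string without a language tag.
function isSimpleLiteral(term: LiteralLike): boolean {
  return term.language === "" && term.datatype.value === XSD_STRING;
}

// "String literal" in the sense of SPARQL 1.1 section 17.4.3.1.1.
function isStringLiteral(term: LiteralLike): boolean {
  return isSimpleLiteral(term) || term.language !== "";
}

// Approximation of REGEX(text, pattern, flags); flags that JavaScript
// does not know (such as "x") make RegExp throw a SyntaxError.
function regex(text: LiteralLike, pattern: LiteralLike, flags?: LiteralLike): boolean {
  if (!isStringLiteral(text) || !isSimpleLiteral(pattern)
      || (flags !== undefined && !isSimpleLiteral(flags))) {
    throw new TypeError("REGEX received an argument of the wrong literal kind");
  }
  const jsFlags = flags === undefined ? undefined : flags.value;
  return new RegExp(pattern.value, jsFlags).test(text.value);
}

// Helper for the usage lines below.
const str = (value: string, language: string = ""): LiteralLike => ({
  value,
  language,
  datatype: { value: language === "" ? XSD_STRING : RDF_LANG_STRING },
});

console.log(regex(str("Alice"), str("^ali"), str("i"))); // true
console.log(regex(str("Alice", "en"), str("^bob")));     // false
```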

ericprud · 2018-04-30T06:22:44Z

As of 2004, the RDF model had an abstract syntax:

literal ::= plainLiteral | typedLiteral
plainLiteral ::= string languageTag?
typedLiteral ::= string datatype

or more succinctly:

literal ::= string (languageTag | datatype)?

In 2008, SPARQL's definition of the datatype function said that you could no longer use it to distinguish between a simple literal and a typed literal:

> returns xsd:string if the parameter is a simple literal.

but you could still see the difference with the sameTerm function (which was used in BGP match).

In 2014, the RDF 1.1 WG decided that the pain of maintaining a distinction between simple literals and xsd:string-typed literals exceeded the pain of migrating to a model where everything has a datatype. In the latter model, parsers that encounter what appears to be a simple literal form are expected to produce literals with type xsd:string. You are experiencing some of that migration pain. Apologies for not getting this right in 1999.
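
The three stages can be sketched in TypeScript as follows. All type and function names (`Literal2004`, `Literal11`, `sameTerm2004`, `datatype2008`, `migrate`) are invented for illustration, and the handling of language-tagged literals in `datatype2008` is an assumption, since the sentence quoted above only covers simple literals.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
const RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

// 2004 abstract syntax: literal ::= plainLiteral | typedLiteral
type Literal2004 =
  | { kind: "plain"; lexical: string; languageTag?: string }
  | { kind: "typed"; lexical: string; datatype: string };

// RDF 1.1: every literal has a datatype.
interface Literal11 {
  lexical: string;
  datatype: string;
  languageTag?: string;
}

// Term identity in the 2004 model still separates plain from typed.
function sameTerm2004(a: Literal2004, b: Literal2004): boolean {
  if (a.lexical !== b.lexical) return false;
  if (a.kind === "plain" && b.kind === "plain") return a.languageTag === b.languageTag;
  if (a.kind === "typed" && b.kind === "typed") return a.datatype === b.datatype;
  return false;
}

// The 2008 datatype function: xsd:string for a simple literal.
// Assumption: language-tagged literals are modeled as undefined here.
function datatype2008(l: Literal2004): string | undefined {
  if (l.kind === "typed") return l.datatype;
  return l.languageTag === undefined ? XSD_STRING : undefined;
}

// The 2014 migration: simple literals become xsd:string-typed literals.
function migrate(l: Literal2004): Literal11 {
  if (l.kind === "typed") return { lexical: l.lexical, datatype: l.datatype };
  if (l.languageTag !== undefined) {
    return { lexical: l.lexical, datatype: RDF_LANG_STRING, languageTag: l.languageTag };
  }
  return { lexical: l.lexical, datatype: XSD_STRING };
}

const simple: Literal2004 = { kind: "plain", lexical: "a" };
const typed: Literal2004 = { kind: "typed", lexical: "a", datatype: XSD_STRING };

console.log(sameTerm2004(simple, typed));                       // false
console.log(datatype2008(simple) === datatype2008(typed));      // true
console.log(migrate(simple).datatype === migrate(typed).datatype); // true
```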

wschella · 2018-04-30T09:59:00Z

I could not have wished for a better explanation. Thanks :)

wschella mentioned this issue Apr 26, 2018

RDF.js masks plain literals wschella/Sparqlee#3

Open

wschella closed this as completed Apr 26, 2018

wschella mentioned this issue Sep 27, 2018

Plain literal behaviour comunica/sparqlee#2

Closed

danielbeeke mentioned this issue Feb 27, 2023

Adding the val to the statement of the literal equality check comunica/sparqlee#168

Merged
