allow use of id-maps in :values pattern #809

Merged
merged 6 commits into from
Jul 17, 2024

Conversation

dpetran
Contributor

@dpetran dpetran commented Jun 19, 2024

The json-ld standard does not actually support iri expansion in a value map. Also, iris are denoted with id maps in every other bit of FQL syntax. This commit both allows json-ld-compliant iri declaration and makes our syntax more consistent.

Here's an example of our current value-map syntax not expanding iris

I believe we should deprecate the {"@value" <iri> "@type" "xsd:anyURI"} syntax for iri values and not document this usage for public use.

@dpetran dpetran requested a review from a team June 19, 2024 15:43
@bplatz
Contributor

bplatz commented Jun 20, 2024

I love this feature and struggled with this myself, super glad you tackled it.

The one thing I think we should support slightly differently from how you've put it is that the values variable should work like a pure substitution.

e.g. this makes tons of sense:

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?friend"]  [{"@id" "ex:brian"}]]]

as you can think of it resolving to this:

"where" {"@id": "?s", "ex:friend": {"@id" "ex:brian"}}

But this doesn't make sense to me:

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?s"]  [{"@id" "ex:brian"}]]]

As you'd imagine it would resolve to this, which isn't how you'd query:

"where" {"@id": {"@id" "ex:brian"}, "ex:friend": "?friend"}

Instead, I think if you were trying to do that same query you'd want to define it like this and sub in the variable:

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?s"]  ["ex:brian"]]]

Likewise in the original example above, if you used @id for ex:friend you'd think this would logically work:

"where" {"@id": "?s", "ex:friend": {"@id": "?friend"}}
"values" ["values" [["?friend"]  ["ex:brian"]]]

Is this possible without making it substantially more complex a problem?
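The "pure substitution" model described above can be sketched in a few lines. This is a minimal illustration and not Fluree's implementation: the `substitute` helper is hypothetical, patterns are modelled as plain Python dicts, and variables in key (predicate) position are ignored for brevity.

```python
def substitute(pattern, bindings):
    """Replace every variable string in `pattern` with its bound value, verbatim.

    Hypothetical helper: only values are visited, so variables used as
    dictionary keys (predicate position) are left untouched.
    """
    if isinstance(pattern, dict):
        return {key: substitute(value, bindings) for key, value in pattern.items()}
    if isinstance(pattern, list):
        return [substitute(item, bindings) for item in pattern]
    if isinstance(pattern, str) and pattern in bindings:
        return bindings[pattern]
    return pattern


where = {"@id": "?s", "ex:friend": "?friend"}

# Binding ?friend to an id-map produces an ordinary node reference.
friend_bound = substitute(where, {"?friend": {"@id": "ex:brian"}})

# Binding ?s to an id-map nests an id-map under "@id", which is the
# shape described above as not making sense.
subject_bound = substitute(where, {"?s": {"@id": "ex:brian"}})

print(friend_bound)
print(subject_bound)
```

Under this model the position of the variable decides whether an id-map or a bare IRI string is the right thing to bind, which is the asymmetry the question is about.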

@dpetran
Contributor Author

dpetran commented Jun 20, 2024

Is this possible without making it substantially more complex a problem?

Unfortunately it wouldn't be possible to achieve this without adding some complex analysis of where the variable is used. And I think it may even be inconsistent, because there is nothing stopping you from using the same :values-bound variable in all three of the subject, predicate, and object positions.

However, I think you can frame it another way, where the semantic isn't "literal substitution" but rather "expansion, then substitution".

If we imagine the query/txn as a json-ld document (it's not, but we try to pretend it is as far as we can), then we can think of the id-maps as annotation for the f:values key, something like this:

{"@context": {"ex": "http://example.com/", "?": "http:flur.ee/var#"},
 "f:values": [{"?:var": [{"@id": "ex:bar"}, "not-an-iri", {"@id": "ex:foo"}]}]}

Which expands to:

[
  {
    "f:values": [
      {
        "http:flur.ee/var#var": [
          {
            "@id": "http://example.com/bar"
          },
          {
            "@value": "not-an-iri"
          },
          {
            "@id": "http://example.com/foo"
          }
        ]
      }
    ]
  }
]

Now, this isn't how we actually do expansion, but users don't need to know that. They can correctly mentally model the values of the :values key as things that will be expanded before substitution, and then the whole id-map vs iri distinction goes away.
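The "expansion, then substitution" mental model can be sketched as follows. This is only an illustration of the model, not how Fluree performs expansion (the comment above says as much); `expand_iri`, `expand_value`, and the prefix-only `CONTEXT` are assumptions made for the example.

```python
# Assumed, simplified context: prefix -> base IRI only.
CONTEXT = {"ex": "http://example.com/"}


def expand_iri(value, context):
    """Expand a compact IRI such as "ex:bar" using a prefix-only context."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


def expand_value(value, context):
    """Expand one entry of a values clause before it is substituted.

    Id-maps become node references with an expanded IRI; anything else is
    treated as scalar data and wrapped in an "@value" map.
    """
    if isinstance(value, dict) and "@id" in value:
        return {"@id": expand_iri(value["@id"], context)}
    return {"@value": value}


values = [{"@id": "ex:bar"}, "not-an-iri", {"@id": "ex:foo"}]
expanded = [expand_value(v, CONTEXT) for v in values]
print(expanded)
```

The resulting list has the same shape as the expanded document above: two node references with full IRIs and one plain value.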

@bplatz
Contributor

bplatz commented Jun 20, 2024

Ok, either way this is helpful over what we had. We can think about addressing the next step if it becomes an issue for our users.

I want to confirm one thing, even though these are different queries, I assume they will both behave as the same query?

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?friend"]  [{"@id" "ex:brian"}]]]

"where" {"@id": "?s", "ex:friend": {"@id": "?friend"}}
"values" ["values" [["?friend"]  [{"@id" "ex:brian"}]]]

@dpetran
Contributor Author

dpetran commented Jun 20, 2024

I want to confirm one thing, even though these are different queries, I assume they will both behave as the same query?

Yes, and I've added a test to ensure that semantic persists.
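One way to picture why the two queries could resolve to the same pattern is sketched below. The thread does not show how Fluree achieves this, so `resolve_object` is a hypothetical helper, and unwrapping an id-map bound inside `{"@id": "?var"}` is an assumption made for illustration.

```python
def resolve_object(obj_pattern, bindings):
    """Resolve an object-position pattern against variable bindings.

    Hypothetical: a bare variable is replaced by its bound value, and a
    variable inside {"@id": ...} has a bound id-map unwrapped to its IRI so
    that the result is never a nested id-map.
    """
    if isinstance(obj_pattern, str) and obj_pattern in bindings:
        return bindings[obj_pattern]
    if isinstance(obj_pattern, dict):
        ident = obj_pattern.get("@id")
        if isinstance(ident, str) and ident in bindings:
            bound = bindings[ident]
            iri = bound["@id"] if isinstance(bound, dict) else bound
            return {**obj_pattern, "@id": iri}
    return obj_pattern


bindings = {"?friend": {"@id": "ex:brian"}}

bare = resolve_object("?friend", bindings)
wrapped = resolve_object({"@id": "?friend"}, bindings)
print(bare, wrapped)
```

Both calls produce the same node reference in this sketch, which mirrors the equivalence being confirmed here.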

@zonotope
Contributor

FQL is not JSON-LD. There are a lot of things that we do in FQL that don't agree with the JSON-LD spec. The whole notion of binding variables to particular values is foreign to JSON-LD, so this scenario would never arise and I'm not surprised that JSON-LD doesn't address it.

Also, iris are denoted with id maps in every other bit of FQL syntax

Do you have any examples of this? Every usage of iris in FQL that I can think of uses id maps to represent the node. For example, in a where clause

{"@id": "ex:foo"}

represents the node with "@id" "ex:foo", just like

{"ex:bar": "ex:baz"}

represents a node with the value of the "ex:bar" property being "ex:baz". Also

{"ex:bestFriend": {"@id": "ex:charlie"}}

represents a node with the value of the "ex:bestFriend" property being the node with "@id" "ex:charlie".

Binding an id map to a variable and then using that variable as the value of an id later is inconsistent in my mind. You are saying that a node's id is the node itself, not the iri that represents that node.

@dpetran
Contributor Author

dpetran commented Jun 20, 2024

In practice I think our users will mainly be depending on the heuristic that identifiers that need expansion need to be wrapped in an id-map. I know I tried to use id-maps at first and when they didn't work I was confused.

And FWIW, we do pretend in our official documentation that FQL is JSON-LD. And that's not strictly wrong, we just utilized @type @json to allow our own syntax within it, which is a distinction that I doubt users will understand until they've grokked JSON-LD.

I still think this is useful, do you think we shouldn't allow id-maps in :values?

@zonotope
Contributor

The semantic is not "wrap this in an id map if you want to expand it"; the semantic is "an id map represents a node, everything else is scalar data". We automatically expand any value of the "@id" attribute because we know that must unambiguously be an iri, just like we also expand "ex:foo" in {"@id": "?s", "ex:foo": "bar"} because we know a property identifier must also be an iri. That's it.

FQL is a pattern matching system and we substitute the value bound to a variable directly in the places that variable appears. This patch breaks that, but only sometimes, in certain specific situations. That inconsistency will lead to much more confusion. I don't think we should allow id maps in values because of all of that inconsistency. @bplatz listed some of these inconsistencies involving weirdly recursive "@id" values:

But this doesn't make sense to me:

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?s"]  [{"@id" "ex:brian"}]]]

As you'd imagine it would resolve to this, which isn't how you'd query:

"where" {"@id": {"@id" "ex:brian"}, "ex:friend": "?friend"}

I agree that this doesn't make sense. This is my point.

Instead, I think if you were trying to do that same query you'd want to define it like this and sub in the variable:

"where" {"@id": "?s", "ex:friend": "?friend"}
"values" ["values" [["?s"]  ["ex:brian"]]]

This is the most consistent way to express this query given the rest of the syntax of FQL, and it's what we currently do without this patch.

The only missing piece is that if we rely on automatic inference, the parser will think that "ex:brian" is a string, so we now need some way to tell the parser that "ex:brian" is supposed to be an iri and not a string.

In every other scenario where automatic inference fails and the user has to specify how to interpret scalar data, we use an "@value" map to provide that extra information, so the most consistent thing to do here is to use an "@value" map as well. "@id" maps represent nodes; "@value" maps represent scalar data along with extra information for how to interpret that data.
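The rule argued for here, where an "@value" map carries the extra information needed to read a string as an IRI, might look like the sketch below. It is an illustration under stated assumptions, not Fluree's parser: `interpret`, `expand_iri`, and the prefix-only `CONTEXT` are hypothetical.

```python
# Assumed, simplified context: prefix -> base IRI only.
CONTEXT = {
    "ex": "http://example.com/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


def expand_iri(value, context):
    """Expand a compact IRI such as "ex:brian" using a prefix-only context."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


def interpret(entry, context):
    """Classify one values entry as an IRI or as plain scalar data."""
    if isinstance(entry, dict) and "@value" in entry:
        datatype = entry.get("@type")
        if datatype is not None and expand_iri(datatype, context) == XSD_ANY_URI:
            return {"kind": "iri", "value": expand_iri(entry["@value"], context)}
        return {"kind": "scalar", "value": entry["@value"], "datatype": datatype}
    # No annotation: inference only sees a string, number, boolean, and so on.
    return {"kind": "scalar", "value": entry, "datatype": None}


inferred = interpret("ex:brian", CONTEXT)
annotated = interpret({"@value": "ex:brian", "@type": "xsd:anyURI"}, CONTEXT)
print(inferred)
print(annotated)
```

Without the annotation, "ex:brian" stays a string; with it, the same characters are expanded and treated as an IRI.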

@zonotope
Contributor

I also want to make clear that I'm not wedded to using value maps for this per se if folks find that construct confusing in this situation. I think value maps are fine, but I could get on board with a new syntax (depending on what the syntax is, of course).

For example, maybe we could try to do something with the (iri "ex:foo") function. I don't know how easy or hard that would be to add, since I haven't looked into it and haven't been in that part of the code in a while. I would also want to make sure that we could do it in a consistent way that's in line with our other planned usage of that function, but it's a thought.

I just think it would be a bad idea to reuse an existing syntax that already means something else.

@bplatz
Contributor

bplatz commented Jun 21, 2024

"@id" maps represent nodes

I understand your point but don't fully agree that there is a distinction between an IRI and a 'node'. I think an IRI is always a node. It may not have properties assigned to it in the local db (yet), but you must assume it has properties in a different db somewhere.

The fact that we don't require specifying the IRI data type in a few circumstances is probably the main culprit that makes this confusing. Those circumstances are:

  1. when used as the value of @id
  2. when used as the value of @type
  3. when used as a property

The JSON-LD spec allows you to specify the datatype of an IRI like this:

{"@value" "ex:brian",
 "@type": "@id"}

I think this should be the base case of what we support. Here @id is a shortcut for xsd:anyURI (I presume, although I'm not sure that is ever made explicit in the spec). This is similar to how @type is a shortcut for rdf:type.

I consider this a further shortcut for the above, which I am supportive of using but there is arguably some debate about that:

{"@id": "ex:brian"}

Then the question becomes if we can do anything to handle the 3 circumstances above such that the behavior of using :values is identical to the behavior without it.

I understand the complexity of this, so I'd recommend punting that decision as we have some larger issues that don't have workarounds currently which deserve our focus.

I do think we should at minimum support the "@type": "@id" as that is what the spec uses - so I'll suggest we update this PR to limit it to that.

@zonotope
Contributor

I understand your point but don't fully agree that there is a distinction between an IRI and a 'node'. I think an IRI is always a node. It may not have properties assigned to it in the local db (yet), but you must assume it has properties in a different db somewhere.

This has the potential to get real philosophical real fast, but I just want to assert that there is a concrete distinction between IRI and subject node. We need to keep that distinction straight in order to build a consistent system.

It's the same as the distinction between "Benjamin Lamothe" and myself. I am not the sequence of characters "Benjamin Lamothe", but people use that character sequence as a symbol to refer to me in certain contexts.

The JSON-LD Spec describes a node identifier as an IRI used to refer to a subject node. It isn't the node itself. The node itself is an entity that has a set of characteristics, or properties. In order to talk about that specific entity in the context of an RDF graph, and to differentiate it from other entities in the graph, we give it a specific name. That name is an IRI. We indicate that a specific IRI is the identifier of a subject by using the "@id" key of the map we use to describe that subject.

I have a height, weight, age, and favorite food, but my name does not. My name is a thing that people use to refer to me.

The goal of this exercise is to allow users to include subject node identifiers in values clauses, and we later substitute what they provide us explicitly as a subject node identifier inside of a subject node map.

Everywhere in FQL, we use {"@id": "ex:foo"} to represent the subject node whose identifier is "ex:foo". That is consistent with the usage in the JSON-LD spec. There is never a situation in FQL when we use an IRI string alone to represent a subject node.

For example, when a subject node is the object of an RDF triple, we don't use the raw iri string, we use a map as in {"@id": "ex:john", "ex:bestFriend": {"@id": "ex:steve"}}. This is also consistent with the JSON-LD spec.

This proposal here is to change that semantic, and instead use a subject node map to serve as a node identifier, but only in a certain specific situation. That would already introduce an internal inconsistency, but it would also have weird consequences that would require that we introduce cascading inconsistencies to resolve.

You mentioned this about the JSON-LD spec:

The JSON-LD spec allows you to specify the datatype of an IRI like this:

{"@value" "ex:brian",
"@type": "@id"}

This is not the case. Note the paragraph describing the value of the "@type" key within an "@value" object:

The value associated with the @type key MUST be a term, an IRI, a compact IRI, a string which can be turned into an IRI using the vocabulary mapping, @json, or null.

The spec excludes the "@id" keyword from that list. The spec does allow you to define "@type": "@id" for an alias within a context, but it's only to tell the processor to expand the string iri value associated with that alias into a subject node map whose "@id" is that string value. Doing that here would lead to the same problems involved with trying to use a subject node as a subject node identifier.
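The restriction quoted from the spec can be approximated with a small check. This is a deliberately rough sketch: `valid_value_object_type` is a hypothetical helper that only rejects keyword-looking strings other than "@json", and it does not verify that a string is actually a defined term or a well-formed IRI.

```python
def valid_value_object_type(type_value):
    """Rough check of the allowed "@type" values inside an "@value" object.

    Simplification: any string starting with "@" is treated as a keyword,
    and "@json" is the only keyword accepted. Non-keyword strings are
    assumed to be terms or (compact) IRIs without further validation.
    """
    if type_value is None:
        return True
    if not isinstance(type_value, str):
        return False
    if type_value.startswith("@"):
        return type_value == "@json"
    return True


for candidate in ["xsd:anyURI", "@json", None, "@id"]:
    print(candidate, valid_value_object_type(candidate))
```

Under this reading, {"@value": "ex:brian", "@type": "@id"} falls outside what the quoted paragraph allows, while the xsd:anyURI form falls inside it.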

You went on to say

I think this should be the base case of what we support. Here @id is a shortcut for xsd:anyURI (I presume, although I'm not sure that is ever made explicit in the spec). This is similar to how @type is a shortcut for rdf:type.

This is what we support today, without this patch. The only caveat is that it uses xsd:anyURI instead of @id because of that explicit omission in the spec for "@value" maps.

I consider this a further shortcut for the above, which I am supportive of using but there is arguably some debate about that:

{"@id": "ex:brian"}

As I've mentioned, introducing this as a shortcut for {"@value": "ex:brian", "@type": "xsd:anyURI"} conflicts with the rest of FQL in the usage of maps with an "@id" key. It might make putting iris in values clauses easier (though that's arguable), but it will lead to much more confusion in the long run as a result of all of the inconsistencies it introduces with how the rest of FQL treats that syntax.

I thought about this a lot when I developed this syntax for specifying IRIs in values. The first option I considered was using {"@id": "ex:foo"} but it soon became clear to me why this was a bad choice.

What we have now using an "@value" map with "xsd:anyURI" as the data type is the most clear and consistent alternative that I could think of but, like I said, I'm not wedded to it and am open to the possibility that there is a better syntax I haven't considered.

I just don't think introducing a new, contradicting meaning to a syntax we already use is the right call.

@dpetran
Contributor Author

dpetran commented Jul 1, 2024

That would already introduce an internal inconsistency, but it would also have weird consequences that would require that we introduce cascading inconsistencies to resolve.

Do you have an example in mind of this? I was able to get this working with ~2 code lines changed and I don't think it affects anything downstream - we'd just have to document the syntax.

We've long since departed from the path of True JSON-LD Semantics, so I don't believe that anybody is looking at the spec and complaining about inconsistencies between value node and node identifier representations in our syntax. It's also not clear to me what practical consequences would result from this inconsistency.

The downside of using the value-map syntax is that you need to have the xsd:anyURI prefix in your context, which requires the user to go find that xsd IRI somewhere and paste it, making IRI values much more inconvenient to use. Also, if we're modeling the value as a scalar, {"@type" "xsd:anyURI" "@value" "ex:some-iri"} doesn't actually work as a scalar value in Fluree: it blows up because the value never gets expanded and encoded as a SID.
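The context dependency mentioned here can be illustrated with a short sketch. It assumes a prefix-only context and a hypothetical `is_iri_value_map` check that compares the expanded datatype against the full xsd:anyURI IRI; it says nothing about how Fluree reports the failure.

```python
XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"


def expand_iri(value, context):
    """Expand a compact IRI using a prefix-only context; unknown prefixes pass through."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


def is_iri_value_map(value_map, context):
    """Hypothetical check: does this value map declare xsd:anyURI as its datatype?"""
    return expand_iri(value_map.get("@type", ""), context) == XSD_ANY_URI


value_map = {"@type": "xsd:anyURI", "@value": "ex:some-iri"}

without_xsd = {"ex": "http://example.com/"}
with_xsd = {**without_xsd, "xsd": "http://www.w3.org/2001/XMLSchema#"}

print(is_iri_value_map(value_map, without_xsd))
print(is_iri_value_map(value_map, with_xsd))
```

The same value map is only recognised as an IRI once the user has added the xsd prefix to the context, which is the inconvenience being described.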

I do feel that id-maps are a lot easier to explain in this vein:

If you're in a position where you need to distinguish between a string and an IRI, wrap the IRI in an id-map.

I don't see any case in our syntax where this heuristic would cause a problem, plus it's easier to type and easier to remember.

@dpetran dpetran force-pushed the feature/id-maps-in-values-pattern branch 2 times, most recently from 8df445c to 1bf3abd Compare July 3, 2024 22:41
@dpetran dpetran marked this pull request as draft July 8, 2024 21:30
The json-ld standard does not actually support iri expansion in a value map. Also, iris
are denoted with id maps in every other bit of FQL syntax. This commit allows both
json-ld-compliant iri declaration and makes our syntax more consistent.
Also, fixed a problem in the test data where id refs were being inserted as strings.
This section is meant for testing values clauses.
@dpetran dpetran force-pushed the feature/id-maps-in-values-pattern branch from 1bf3abd to 07bc37d Compare July 16, 2024 21:35
@dpetran
Contributor Author

dpetran commented Jul 16, 2024

Rebased on main. Per the discussion in #818, we've decided to use {"@type" "@id" "@value" <iri>} value maps for iri literals in the :values clause.

@dpetran dpetran marked this pull request as ready for review July 16, 2024 21:47
Contributor

@zonotope zonotope left a comment

🧥

 (if-let [dt-iri (get-expanded-datatype attrs context)]
-  (if (= const/iri-anyURI dt-iri)
+  (if (or (= const/iri-anyURI dt-iri)
Contributor

We should eventually remove this, but I'm ok with keeping it in for now in case someone is using it.
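The behavior this PR settled on, accepting {"@type" "@id" "@value" <iri>} while still tolerating the older xsd:anyURI datatype, can be sketched as below. This is not the merged Clojure code: the diff fragment in the review is truncated, so treating "@id" as the second accepted datatype is an assumption drawn from the July 16 comment, and `iri_from_value_map` and `expand_iri` are hypothetical names.

```python
XSD_ANY_URI = "http://www.w3.org/2001/XMLSchema#anyURI"

# Assumed, simplified context: prefix -> base IRI only.
CONTEXT = {
    "ex": "http://example.com/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_iri(value, context):
    """Expand a compact IRI using a prefix-only context; unknown prefixes pass through."""
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


def iri_from_value_map(value_map, context):
    """Return the expanded IRI if the value map declares an IRI datatype, else None.

    Assumption: both "@id" and xsd:anyURI mark the value as an IRI.
    """
    datatype = value_map.get("@type")
    if datatype is None:
        return None
    if datatype == "@id" or expand_iri(datatype, context) == XSD_ANY_URI:
        return expand_iri(value_map["@value"], context)
    return None


new_style = iri_from_value_map({"@type": "@id", "@value": "ex:brian"}, CONTEXT)
old_style = iri_from_value_map({"@type": "xsd:anyURI", "@value": "ex:brian"}, CONTEXT)
plain = iri_from_value_map({"@type": "xsd:string", "@value": "ex:brian"}, CONTEXT)
print(new_style, old_style, plain)
```

Note that the "@id" form needs no xsd prefix in the context, while the older form still does.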

@bplatz
Copy link
Contributor

bplatz commented Jul 17, 2024

Awesome thanks @dpetran

@dpetran dpetran merged commit 2b020dd into main Jul 17, 2024
7 checks passed
@dpetran dpetran deleted the feature/id-maps-in-values-pattern branch July 17, 2024 14:26