Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARQL VALUES patterns #769

Merged
merged 87 commits into from
Jun 17, 2024
Merged

SPARQL VALUES patterns #769

merged 87 commits into from
Jun 17, 2024

Conversation

dpetran
Copy link
Contributor

@dpetran dpetran commented May 10, 2024

In SPARQL, a VALUES pattern can appear inside a WHERE clause alongside other patterns. In FQL :values is its own clause. This mismatch makes translating directly awkward, since we need to insert our translated pattern at a different point in the parse tree than where we are at translation time.

We got around it by translating it into a :bind pattern. However, a :bind pattern can only bind a single value to a variable, so we limited the scope of a VALUES pattern to only a single value.

This gets around it by "tagging" the translated value pattern and then after the whole parsing is done, moving it to the correct place in the tree: see format-values. It's awkward, but it gives much more flexibility to the user.

@bplatz
Copy link
Contributor

bplatz commented May 10, 2024

Should we move :values into the where clause? I'd imagine it is there because there could be multiple where clauses, or sub-queries which might have its own set of values.

@dpetran
Copy link
Contributor Author

dpetran commented May 10, 2024

It's not a bad idea, and it clearly works for SPARQL. I don't think it would be too complex to support, just a new (defmethod match-pattern :values) that directly injects into the solution-ch, but there may be some finicky bits about pattern ordering. And it would save me from having to rearrange the translated query : )

@dpetran dpetran force-pushed the feature/sparql-values branch 2 times, most recently from 7ab8cb7 to 64942a8 Compare May 17, 2024 18:57
@dpetran dpetran marked this pull request as ready for review May 17, 2024 19:01
@dpetran dpetran requested a review from a team May 17, 2024 19:01
@dpetran
Copy link
Contributor Author

dpetran commented May 31, 2024

Rebased on #777

@dpetran dpetran mentioned this pull request May 31, 2024
@zonotope
Copy link
Contributor

We got around it by translating it into a :bind pattern. However, a :bind pattern can only bind a single value to a variable, so we limited the scope of a VALUES pattern to only a single value.

:bind patterns can take any number of mappings of values to variables.

@dpetran
Copy link
Contributor Author

dpetran commented Jun 12, 2024

We got around it by translating it into a :bind pattern. However, a :bind pattern can only bind a single value to a variable, so we limited the scope of a VALUES pattern to only a single value.

:bind patterns can take any number of mappings of values to variables.

Right, maybe it is better stated that a :bind pattern can only bind a single value per variable, whereas :values can bind multiple values per variable.

zonotope
zonotope previously approved these changes Jun 12, 2024
(let [dt (datatype/infer-iri val)]
(where/match-value var-match val dt)))))
(if-let [lang (get attrs const/iri-language)]
(where/match-value var-match val const/iri-string {:lang lang})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Language tagged strings cannot have data type xsd:string; they must have data type rdf:langString. You should pass lang to the datatype/infer-iri function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally use const/iri-lang-string here but that broke other tests, so I changed it what you see here. After investigating after your comment it looks like we weren't actually assigning the correct datatype, so I've fixed that and this instance.

(reduce (fn [fql rule] (into fql (parse-rule rule)))
{}
parsed))
(->> parsed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the threading macro here because there's just one step.

{}
parsed))
(->> parsed
(reduce (fn [fql rule] (into fql (parse-rule rule))) {})))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be much more readable if you moved the function body after the argument vector to the next line

Copy link
Contributor

@zonotope zonotope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to approve because of the data type issue.

@zonotope zonotope dismissed their stale review June 12, 2024 15:03

The data type is not up to spec for language tagged strings

This is a basic implementation of the SPARQL VALUES pattern type.
Three approaches considered:

1) just document that :values needs to appear first
2) discard solutions that don't match the given solution
3) during the parse step, sort the patterns so :values is first

I don't like #1, though that's the most expedient. #2 is a bit wasteful as it forces us
to generate useless solutions. #3 is doable, and may be a minimal requirement in basic
query planning.

This implements solution #2.
Change 1: handle multiple inline values correctly.

Instead of checking that _every_ inline solution equal the solution matches, we needed
to _filter_ the inline solutions by whether they match the solution. If any matches, the
values can contribute their solutions to the solution chan.

This case allows multiple values to be handled correctly, as seen in the "pattern"
"single var" test.

Change 2: do not consider ::sids while checking match equality

When dealing with federated queries the ::sids map may be different for the same
value. We now dissoc the ::sids key from the solution matches before checking equality.

This is verified by the "federated" test case

Change 3: only consider :lang ::meta while checking equality

Two identical strings with different language tags are different values. However the :i
order or whether it comes from a reasoned flake do not matter for equality checks. We
now select only the :lang ::meta key, if ::meta exists in the solution match.

Language tags were not parsed, so I've added parsing support for that.
In the future the contents of a match may change, so instead of removing specific
unwanted elements we now extract only those elements of a match that establish its
identity for equality checking.
We were not parsing language tagged strings to langString datatypes correctly in where
patterns or in values patterns, nor inferring a langString datatype in the presence of a
language tag.

I had to update the test expectation because now the language tagged string no longer
conforms to the `sh:maxCount 1` _and_ the `sh:datatype xsd:string` constraints, instead of
just the `sh:maxCount` constraint.
In SPARQL, a VALUES pattern can appear inside a WHERE clause alongside other
patterns. In FQL :values is its own clause. This mismatch makes translating directly
awkward, since we need to insert our translated pattern at a different point in the
parse tree than where we are at translation time.

We got around it by translating it into a :bind pattern. However, a :bind pattern can
only bind a single value to a variable, so we limited the scope of a VALUES pattern to
only a single value.

This gets around it by "tagging" the translated value pattern and then after the whole
parsing is done, moving it to the correct place in the tree: see `format-values`. It's
awkward, but it gives much more flexibility to the user.
FQL now supports :values patterns so we can use that directly instead of translating to
a :values clause.
@dpetran
Copy link
Contributor Author

dpetran commented Jun 12, 2024

Rebased on #777

The tricky part is deciding when to `literal-quote` one or more of the args. Since that
information does not exist in the parse tree at the point where :Func needs parsing, we
need to figure out whether to quote based on the signature of the function, according to
the SPARQL spec: https://www.w3.org/TR/sparql11-query/#SparqlOps and the SPARQL grammar.

This way is more repetitive but more robust. Once we've got all the functions covered we
can search for abstractions to reduce verbosity.
dpetran and others added 27 commits June 12, 2024 16:23
The grammar wasn't expecting whitespace between the 'a' and the prev/next term.
Semantics and syntax from SPARQL IN expression.
It's not sparql-specific, so no need to qualify it.
The :Expression parser was turning everything into a string, which works in some cases
but introduces unwanted quoting in others. By deferring stringification until values
need to be incorporated into an actual string expression we avoid the unwanted quoting.

Do note that all the numeric functions now correctly expect numeric values instead of
strings.
abs expects only a single expression as an argument, and that argument does not need to
be quoted.
translate SPARQL IN/NOT IN expresssions into FQL
translate SPARQL EXISTS/NOT EXISTS filters into FQL
@dpetran dpetran merged commit a74defb into fix/shacl->fql Jun 17, 2024
1 of 2 checks passed
@dpetran dpetran deleted the feature/sparql-values branch June 17, 2024 17:49
@dpetran dpetran restored the feature/sparql-values branch June 17, 2024 17:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants