Optional details for matching properties in reconciliation queries #131

fsteeg · 2023-07-13T12:08:15Z

This adds three optional fields to properties in reconciliation queries. It's based on our discussions in recent meetings, in particular the suggestion in #105 (comment), adding an additional field to address #114, based on #128 (comment).

Changes from the linked discussions:

Replaced mode=must|should with required=true|false (we discussed filter as an option in our last meeting, but I felt required was clearer since filter can be used both meaning "accept" and "reject")
Renamed match=any|all|none to match_quantifier=any|all|none
Added match_qualifier=<string> to address use cases like Date comparison needs an absolute settings to compute relative diff score #114, which came up in otherwise unrelated Add an additional property match context to the features #128 (comment)

This is related to multiple issues, which I'd hope this could resolve:

Might resolve Allow more flexible filter constraint handling of property arrays #88 via match_quantifier
Might resolve Add ability to mark reconciliation property as required (used for candidate retrieval) #101 via required
Might resolve Supplying multiple properties with the same pid in a reconciliation query #105 by supporting multiple property statements with different configurations, as well as configuration of multiple values in one property statement via "any"/"all"/"none" (match_quantifier)
Might resolve Date comparison needs an absolute settings to compute relative diff score #114 via match_qualifier and the mention of the EDTF specification
Might resolve parts of Reformulate the query parameter as one of the properties in reconciliation queries #106, which maybe could have been resolved before, since query is optional already, but with this we have precise control over the behavior of the property replacing the query

tfmorris

It feels like unneeded complexity to me. One of the things that I like about the IETF process is that they require prior implementation and testing. This demonstrates that a) it's implementable and b) there's at least one real world use case.

It feels like this spec is accumulating more and more words/features without a commensurate increase in real world testing experience (unless there's implementation testing going on behind the scene that I'm missing).

But it's all just intuition / gut feeling...

draft/examples/reconciliation-query-batch/valid/example-full.json

tfmorris · 2023-07-13T19:33:59Z

draft/index.html

+            This can be used for general matching relations like "skos:exactMatch", "skos:closeMatch", etc. or for specific features like spatial matching with geo data
+            (e.g. containment search with "schema:containsPlace" etc.) or custom matching on date fields (e.g. services supporting the [[EDTF]] specification could use "EDTF:Level-0" etc.
+            To allow discovery of supported qualifiers by clients, services that support <code>match_qualifier</code> SHOULD return a <a href='#suggest-responses'>suggest response</a> with supported qualifiers
+            for a property at <code>/suggest/property/match_qualifier/&lt;pid&gt;</code> (relative to the service root).</p></dd>


Seems like this list should be small enough that it could just be returned with the response to the original property suggest query, although it probably needs descriptions too to help the user figure out what the heck they mean.

Yes, that would be an option too. My line of thought was that since this is a separate selection step in the client (first select the property, then select the qualifier), clients could reuse existing suggest endpoint logic (do the same lookup as for the property itself, but against a different endpoint), instead of building custom UI. But that may be a faulty assumption, since I'm looking at it more from the service side.

I'd be happy either way, but we'd probably need a more precise description of how the URL is constructed: property ids can be arbitrary strings, so we'd need to know if/how they are escaped to form the URL you specify. Probably just URL-encoding the property id?

Given the feedback by @tfmorris here and by @wetneb on the UI differences in our last meeting ("the set is small, and we'd like to see all options, without typing"), I've moved the match_qualifiers discovery into the property suggest response: b486232.

draft/index.html

fsteeg · 2023-07-14T11:29:47Z

Thanks a lot for your review @tfmorris! I replied inline to your specific comments.

It feels like this spec is accumulating more and more words/features without a commensurate increase in real world testing experience (unless there's implementation testing going on behind the scene that I'm missing). But it's all just intuition / gut feeling...

I got to say I do share that gut feeling! What I actually tried to do here was to condense all these different suggestions from the 5 linked issues into a fairly compact, non-breaking proposal (optional fields and retaining the current behavior if omitted) that addresses actual use cases.

There is one use case I forgot to mention: OpenRefine/OpenRefine#5615 could be implemented using required properties with match_quantifiers instead of the type filter.

One of the things that I like about the IETF process is that they require prior implementation and testing. This demonstrates that a) it's implementable and b) there's at least one real world use case.

Yes, I agree, and it's what I like so much about working on this spec, that it is based on years of proven functionality, and not "made up". I very much want to keep it that way too! After we addressed any obvious issues with the approach proposed in this PR, we should implement it in the testbench and our service to see if it actually works.

paulgirard

This proposal is a nice solution to the recent issues about the difficulty of selecting candidates.
We already discussed the complexity issue and to whether put to put the candidate selection decision in server or client side. The conclusion of this proposal feels balanced to me in this regards. One one hand we unfortunately can't move everything on client side leaving weird fuzzy matching to client as candidates pre-selection requires that kind of complexity. On the other those new parameters are optional leaving recon services developers/maintainers decide what to do.
Thus I feel like this would help recon services to grow in diversity and quality.

wetneb

Looks good to me.

I think I disagree that this solves #106 though. After this PR, the query field still has a sort of special status, and there is no standard way to replace it by a property (since clients cannot know which pid to use). But I think this PR makes it all the more interesting to solve #106, since the new settings you introduce cannot be applied to the query itself.

I agree with @tfmorris that we've been introducing quite a few things in the specs and the implementations are lagging behind. I'm keen to try things out in OpenRefine and some services but I have more urgent things on my plate in OpenRefine at the moment. Also it's a requirement for the standardization process at W3C that there are at least 2 independent implementations so there is no risk that we adopt something completely made up and untested.

wetneb · 2023-09-02T19:58:28Z

draft/index.html

+            This can be used for general matching relations like "skos:exactMatch", "skos:closeMatch", etc. or for specific features like spatial matching with geo data
+            (e.g. containment search with "schema:containsPlace" etc.) or custom matching on date fields (e.g. services supporting the [[EDTF]] specification could use "EDTF:Level-0" etc.
+            To allow discovery of supported qualifiers by clients, services that support <code>match_qualifier</code> SHOULD return a <a href='#suggest-responses'>suggest response</a> with supported qualifiers
+            for a property at <code>/suggest/property/match_qualifier/&lt;pid&gt;</code> (relative to the service root).</p></dd>


I'd be happy either way, but we'd probably need a more precise description of how the URL is constructed: property ids can be arbitrary strings, so we'd need to know if/how they are escaped to form the URL you specify. Probably just URL-encoding the property id?

thadguidry · 2023-09-02T22:56:32Z

No matter what syntax, we need to keep the ability for users to always optionally provide constraints (filters and property matching criteria) at query time, and not depend on services to have some fancy logic to figure things out the clients wants or wishes to return an adequate response. So I always prefer syntax that keeps more control for users.

wetneb · 2023-10-10T12:57:36Z

draft/examples/suggest-properties-response/valid/example.json

    },
    {
      "name": "place of birth",
      "description": "most specific known (e.g. city instead of country, or hospital instead of city) birth location of a person, animal or fictional character",
-      "id": "P19"
+      "id": "P19",
+      "match_qualifiers": ["schema:containsPlace", "schema:containedInPlace"]


I think we should settle on using only one format for the match_qualifiers field, otherwise it's painful to parse for clients. I'd go for the format with objects (which makes it possible to display human-readable info and is more extensible).

Also, the corresponding JSON schema should be updated accordingly (now that we are close to merging this, I think)

Implemented in c3c220a & fb60668.

wetneb

Looks good to me!

thadguidry · 2023-12-14T13:46:46Z

draft/index.html

+          <dd>A string to indicate how to match the values in <code>v</code>.
+            This can be used for general matching relations like "skos:exactMatch", "skos:closeMatch", etc. or for specific features like spatial matching with geo data
+            (e.g. containment search with "schema:containsPlace" etc.) or custom matching on date fields (e.g. services supporting the [[EDTF]] specification could use "EDTF:Level-0" etc.
+            To allow discovery of supported qualifiers by clients, services that support <code>match_qualifier</code> SHOULD return the supported <code>match_qualifiers</code> for each property


@fsteeg I think we're going to regret this where "match_qualifiers SHOULD return" instead of "MUST return". But we'll see how things pan out over the next year.

I don't think this has been discussed before. I guess my motivation was to keep the barrier low for implementers, but using MUST sounds good to me too.

My thought was that discovery of supported qualifiers will be impossible if we do not say MUST.
@wetneb your thoughts on this also?

Yes, why not MUST, your reasoning makes sense to me.

Updated in 1812f84.

Supported match qualifiers MUST be included in property suggestions, see #131 (comment)

Optional details for matching properties in reconciliation queries

1a17473

wetneb requested review from paulgirard, wetneb, thadguidry and tfmorris July 13, 2023 13:32

tfmorris reviewed Jul 13, 2023

View reviewed changes

fsteeg mentioned this pull request Jul 14, 2023

Make scores of reconciliation candidates optional #127

Closed

paulgirard approved these changes Jul 19, 2023

View reviewed changes

wetneb approved these changes Sep 2, 2023

View reviewed changes

wetneb mentioned this pull request Sep 10, 2023

Reconciliation against external identifiers, without names OpenRefine/OpenRefine#6044

Open

Add available match_qualifiers to property suggest response (#131)

b486232

fsteeg mentioned this pull request Oct 5, 2023

Submitting multiple entity names in the same reconciliation query #134

Closed

wetneb reviewed Oct 10, 2023

View reviewed changes

This was referenced Nov 20, 2023

Simplify type syntax in reconciliation queries #109

Closed

Reformulate the 'type' parameter as another property #145

Open

fsteeg added 2 commits December 11, 2023 13:26

Allow only objects in suggest response's match_qualifiers (#131)

c3c220a

Update schemas for new optional matching properties (#131)

fb60668

wetneb approved these changes Dec 14, 2023

View reviewed changes

fsteeg merged commit 1c25297 into master Dec 14, 2023
1 check passed

fsteeg deleted the 105-properties branch December 14, 2023 13:33

thadguidry reviewed Dec 14, 2023

View reviewed changes

fsteeg added a commit that referenced this pull request Dec 18, 2023

Update match_qualifier requirements as discussed in #131

1812f84

Supported match qualifiers MUST be included in property suggestions, see #131 (comment)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optional details for matching properties in reconciliation queries #131

Optional details for matching properties in reconciliation queries #131

fsteeg commented Jul 13, 2023 •

edited

tfmorris left a comment

tfmorris Jul 13, 2023

fsteeg Jul 14, 2023

wetneb Sep 2, 2023

fsteeg Sep 11, 2023

fsteeg commented Jul 14, 2023

paulgirard left a comment

wetneb left a comment

wetneb Sep 2, 2023

thadguidry commented Sep 2, 2023

wetneb Oct 10, 2023

fsteeg Dec 11, 2023

wetneb left a comment

thadguidry Dec 14, 2023

fsteeg Dec 15, 2023

thadguidry Dec 16, 2023 •

edited

wetneb Dec 16, 2023

fsteeg Dec 18, 2023

Optional details for matching properties in reconciliation queries #131

Optional details for matching properties in reconciliation queries #131

Conversation

fsteeg commented Jul 13, 2023 • edited

tfmorris left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fsteeg commented Jul 14, 2023

paulgirard left a comment

Choose a reason for hiding this comment

wetneb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thadguidry commented Sep 2, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wetneb left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thadguidry Dec 16, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fsteeg commented Jul 13, 2023 •

edited

thadguidry Dec 16, 2023 •

edited