Feature: how to join values across data sources, with join conditions? #16

bjdmeest · 2021-03-25T13:17:10Z

It is currently possible to join values across data sources, but without join conditions (see e.g. https://github.com/RMLio/rml-fno-test-cases/blob/master/RMLFNOTC0009-CSV/mapping.ttl, see also https://kg-construct.slack.com/archives/C01QFSW77QF/p1615717859003600).

andimou · 2021-03-26T17:13:56Z

I think this might be affected in the end by how we handle joins in RML

samiscoding · 2021-03-29T14:22:06Z

Came from kg-construct/rml-core#1

> It is currently possible to join values across data sources, but without join conditions (see e.g. https://github.com/RMLio/rml-fno-test-cases/blob/master/RMLFNOTC0009-CSV/mapping.ttl, see also https://kg-construct.slack.com/archives/C01QFSW77QF/p1615717859003600).

I think it goes back again to the basic definition of fnml:functionMap; is it correct to be able to define a rml:logicalSource for a rr:termMap different than the rml:logicalSource of the rr:triplesMap to which it belongs?

pmaria · 2021-10-08T06:29:11Z

Does a Function Triples Map need a Logical Source?

The current use case for a LogicalSource definition on a FunctionTriplesMap seems to be:

The ability to generate values from a different source and use these values as the result of a Function Term Map.

An example of this is included in one of the proposed FnO test cases: RMLFNOTC009

However, since a FunctionTriplesMap doesn't generate values directly, but generates intermediate function execution triples expressed in FnO, the question of how to handle joins between a TriplesMap and a FunctionTriplesMap with a different LogicalSource arises.

As this is not the same type of join as a join on a RefObjectMap, this join would have to be defined. Subsequently, this would require another specific type of join to be implemented by engines.

At the same time, we have a very similar mapping challenge for generating literal values by joining different logical sources: the join-on-literal challenge.

I believe it would be advantageous to come up with a solution that covers both generating literals from different LogicalSources using joins and generating function values from different LogicalSources.
As this solution would not be specific to functions, I think we should look for a solution in the definition of LogicalSources. (pinging @thomas-delva)

samiscoding · 2021-10-08T08:54:12Z

> Does a Function Triples Map need a Logical Source?
>
> The current use case for a LogicalSource definition on a FunctionTriplesMap seems to be:
>
> The ability to generate values from a different source and use these values as the result of a Function Term Map.
>
> An example of this is included in one of the proposed FnO test cases: RMLFNOTC009
>
> However, since a FunctionTriplesMap doesn't generate values directly, but generates intermediate function execution triples expressed in FnO, the question of how to handle joins between a TriplesMap and a FunctionTriplesMap with a different LogicalSource arises.
>
> As this is not the same type of join as a join on a RefObjectMap, this join would have to be defined. Subsequently, this would require another specific type of join to be implemented by engines.
>
> At the same time, we have a very similar mapping challenge for generating literal values by joining different logical sources: the join-on-literal challenge.
>
> I believe it would be advantageous to come up with a solution that covers both generating literals from different LogicalSources using joins and generating function values from different LogicalSources. As this solution would not be specific to functions, I think we should look for a solution in the definition of LogicalSources. (pinging @thomas-delva)

That's the reason I insist that we should consider the big picture while defining the fundamental concepts i.e. function triples map and function term map! Check the alternative definitions with an example in overview.md at the branch "function-alternative".

andimou · 2021-10-10T14:27:07Z

So your suggestion is to follow a similar approach as for rml:parentTriplesMap and have a functionTriplesMap that can optionally be combined with a join? Then a FunctionTriplesMap SHOULD have exactly 1 LogicalSource, and we define whether or not it is the same as the Logical Source of the Triples Map in the same way we do for the Referencing Object Map?

samiscoding · 2021-10-11T08:51:14Z

@andimou yes, exactly!

thomas-delva · 2021-10-11T09:36:27Z

> I believe it would be advantageous to come up with a solution that covers both generating literals from different LogicalSources using joins and generating function values from different LogicalSources.
> As this solution would not be specific to functions, I think we should look for a solution in the definition of LogicalSources.

When working on RML fields I had in mind you could do something like :sourceC rml:joinOf :sourceA, :sourceB . and then source C would be a "virtual" logical source that has all the fields defined in sources A and B, and the data in source C would be a join of the data in sources A and B. Then source C could be used in a triples map to generate RDF from two joined sources in a very general way: generating IRIs or literals in a homogeneous way, mixing fields of both A and B to generate one RDF term, generating function values, etc. Looking back, this rml:joinOf idea seems a bit too general and too far from current RML, so perhaps it can be simplified. Just throwing it out there. :)
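
To make the rml:joinOf idea a bit more concrete, here is a minimal Python sketch (not RML). It treats a logical source as a list of records and builds a "virtual" source C whose records carry the fields of both A and B. The function name `join_of`, the `a.`/`b.` field prefixes, and the equality join on one key per source are all assumptions made for illustration; the idea above does not say how the join condition itself would be declared.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def join_of(source_a: Iterable[Record], source_b: Iterable[Record],
            key_a: str, key_b: str) -> List[Record]:
    """Build a hypothetical 'virtual' source C from sources A and B.

    Every record of C exposes all fields of the matching A and B records,
    prefixed with 'a.' and 'b.' so that field names cannot collide.
    """
    # Index B by its join key so each A record finds its matches directly.
    index_b: Dict[str, List[Record]] = {}
    for rec_b in source_b:
        index_b.setdefault(rec_b[key_b], []).append(rec_b)

    joined: List[Record] = []
    for rec_a in source_a:
        for rec_b in index_b.get(rec_a[key_a], []):
            rec_c = {f"a.{k}": v for k, v in rec_a.items()}
            rec_c.update({f"b.{k}": v for k, v in rec_b.items()})
            joined.append(rec_c)
    return joined


# Illustrative data only.
source_a = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]
source_b = [{"person": "1", "city": "ghent"}]
source_c = join_of(source_a, source_b, key_a="id", key_b="person")

# A term can now mix fields of both sources, e.g. in a template or a function.
labels = [f"{r['a.name'].upper()} ({r['b.city']})" for r in source_c]
```

Because source C behaves like any other source in this sketch, templates, references and functions would all read from it in the same way, which is the "homogeneous" property described above.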

In general I tend to agree FNML shouldn't need its own way to define joins. For the example in RMLFNOTC009 I wonder why one cannot just call grel:toUpperCase in the subject map of a new triples map and then join as usual with rr:parentTriplesMap. (This is a slight abuse as that subject map would generate literals, but as long as no invalid RDF triples are generated that should be fine imho.) (Disclaimer: I admit I'm not too up to date with the how and why of all FNML aspects.)

bjdmeest · 2022-03-02T13:40:57Z

I have the feeling the discussion is revolving around 'functions should or should not specify their own logical source', to solve exactly this issue. I think we first need to solve that before we can solve functions properly. That's why I made the following proposal.

Proposal

I'm purposely not specifying the relation with existing RML and R2RML constructs, nor specifying exactly how to describe a function. Instead, I'm making a proposal where we can have functions without defining their own logical sources, and still join values across data sources.

TL;DR: functions are a special kind of term map / no logical source for functions / you specify input values for functions using term maps (so you can do nesting) / join conditions specify childterm and parentterm instead of child and parent (so you can put functions there) / referencingObjectMaps have a join result term to specify a new term based on values of the parent logical source instead of relying solely on the subject of the parent triples map

Definitions

1. A Triples Map is something that generates RDF constructs (Triples, Quads, RDF*, ... 🤷‍♂️) from a Logical Source, using Term Maps. RDF constructs consist of RDF Terms.
2. A Term Map is something that generates an RDF Term. It takes its values from the Logical Source of its Triples Map.
3. A Function Term Map is something that generates an RDF Term after executing a function. It cannot specify its own Logical Source, i.e., takes its values from the Logical Source of its Triples Map.
4. A ReferencingMap is something that generates an RDF Term from a different Triples Map, called the Parent Triples Map, i.e., takes values from the Parent Triples Map's Logical Source, called the Parent Logical Source. It can use a JoinCondition and generates the Join Result Term.
5. A Join Condition specifies how to join these logical sources. It consists of a Child Term (generating a Term, taking values from the original Logical Source) and a Parent Term (generating a Term, taking values from the Parent Logical Source). By default (i.e., when no Join Condition is specified), a full join is performed.
6. A Join Result Term is a Term Map that generates the term from the values of the Parent Logical Source. By default (i.e., when no Join Result Term is specified), the Join Result Term is the Subject generated by the Parent Triples Map.
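
A rough Python sketch of how the ReferencingMap, Join Condition and Join Result Term definitions could fit together, including their defaults. Everything here is hypothetical: term maps are reduced to plain functions from a record to a value, `referencing_map` is an invented name, and "full join" is read as "every child record is paired with every parent record", which is only one possible interpretation of that definition.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
TermMap = Callable[[Record], str]  # a term map reduced to "record in, value out"


def referencing_map(
    child_rows: List[Record],
    parent_rows: List[Record],
    parent_subject: TermMap,
    child_term: Optional[TermMap] = None,
    parent_term: Optional[TermMap] = None,
    join_result_term: Optional[TermMap] = None,
) -> List[str]:
    """Generate one term per (child, parent) pair that satisfies the join."""
    # Default for the Join Result Term: the parent triples map's subject.
    result_term = join_result_term or parent_subject
    has_condition = child_term is not None and parent_term is not None

    terms: List[str] = []
    for child, parent in product(child_rows, parent_rows):
        # Default for the Join Condition (as read here): all pairs match.
        if not has_condition or child_term(child) == parent_term(parent):
            terms.append(result_term(parent))
    return terms


# Illustrative data only.
children = [{"id": "1", "city": "Ghent"}]
parents = [{"name": "ghent", "country": "BE"}, {"name": "paris", "country": "FR"}]


def city_iri(parent: Record) -> str:
    return "http://example.org/city/" + parent["name"]


# No join condition, no join result term: every parent subject is returned.
all_subjects = referencing_map(children, parents, parent_subject=city_iri)

# Join condition on lowercased names, result taken from another parent field.
countries = referencing_map(
    children, parents, parent_subject=city_iri,
    child_term=lambda c: c["city"].lower(),
    parent_term=lambda p: p["name"],
    join_result_term=lambda p: p["country"],
)
```

Note that the child term in the second call is already a small function (lowercasing), which is the point of making both sides of the join condition term maps.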

Results

Using these definitions, we can:

* specify that a Function Term Map never has its own logical source, so we nicely separate concerns.
* use functions anywhere in a join, e.g., lowercase both (1) child and (2) parent values, specify a (3) special comparison function that does fuzzy matching, and (4) transform other values from the parent logical source for the result
* if we nest functions, we can do something like join values across sources
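
The four places where a function can appear in a join, as listed above, could look like this in a small Python sketch. It is not RML syntax: `join_terms` and `fuzzy_equal` are invented helpers, the 0.8 threshold is arbitrary, and the fuzzy match uses Python's `difflib.SequenceMatcher` as a stand-in for whatever comparison function a mapping would actually reference.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List

Record = Dict[str, str]


def fuzzy_equal(left: str, right: str, threshold: float = 0.8) -> bool:
    """(3) A special comparison function: similarity ratio above a threshold."""
    return SequenceMatcher(None, left, right).ratio() >= threshold


def join_terms(
    children: List[Record],
    parents: List[Record],
    child_term: Callable[[Record], str],
    parent_term: Callable[[Record], str],
    compare: Callable[[str, str], bool],
    result_term: Callable[[Record], str],
) -> List[str]:
    """Return the result term of every parent that matches some child."""
    return [
        result_term(parent)
        for child in children
        for parent in parents
        if compare(child_term(child), parent_term(parent))
    ]


# Illustrative data only.
children = [{"city": "Ghent"}]
parents = [{"name": "GENT", "code": "be-gnt"}, {"name": "PARIS", "code": "fr-par"}]

codes = join_terms(
    children,
    parents,
    child_term=lambda c: c["city"].lower(),    # (1) function on the child value
    parent_term=lambda p: p["name"].lower(),   # (2) function on the parent value
    compare=fuzzy_equal,                       # (3) fuzzy comparison
    result_term=lambda p: p["code"].upper(),   # (4) function on the result
)
```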

Diagrams

A function description (red = FnO stuff, green = FNML stuff, feel free to ignore those colors for now):

graph LR
    TM([TermMap])
    FM([FunctionTermMap]):::fnml
    TM -->|is-a| FM
    FM -->|execution| Ex([Execution]):::fnml
    FM -->|output| J(IRI):::fnml
    Ex -->|function| ExOM([fno:Function TermMap]):::fno
    Ex -->|parameterMap| ParamPOM([ParameterMap])
    ParamPOM -->|parameter| ParamPM(parameter):::fno
    ParamPOM -->|parameter value| ParamOM([parameter value TermMap])
    classDef fnml fill:#8F9
    classDef fno fill:#F89
    classDef rml fill:#89F
    classDef ls2 fill:#09F

A join description (dark blue === Parent Logical Source):

graph LR
    T3M([TriplesMap])
    T3M -->|predicateObjectMap| POM([PredicateObjectMap])
    POM -->|predicateMap| PM([PredicateMap])
    POM -->|objectMap| ROM([ReferencingObjectMap])
    ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
    ROM -->|joinCondition| JC([JoinCondition])
    ROM -->|joinResultTerm| JTM([TermMap]):::ls2
    JC -->|childTerm| ChTM([TermMap])
    JC -->|parentTerm| PaTM([TermMap]):::ls2
    classDef fnml fill:#8F9
    classDef fno fill:#F89
    classDef rml fill:#89F
    classDef ls2 fill:#09F

A join across sources example (result is "{childsource_value}{parentsource_value}"):

graph LR
    T3M([TriplesMap])
    T3M -->|predicateObjectMap| POM([rr:PredicateObjectMap])
    POM -->|objectMap| FM
    FM([FunctionTermMap])
    FM -->|execution| Ex([Execution])
    FM -->|output| J(grel:stringOut):::fno
    Ex -->|function| ExFn(grel:array_join):::fno
    Ex -->|parameterMap| ParamPOM([ParameterMap])
    ParamPOM -->|parameter| P1(grel:array_value):::fno
    ParamPOM -->|parameter value| O1("{childsource_value}"):::fno
    ParamPOM -->|parameter| P2(grel:array_value):::fno
    ParamPOM -->|parameter value| ROM([ReferencingObjectMap])
    ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
    ROM -->|joinCondition| JC([JoinCondition])
    ROM -->|joinResultTerm| JTM("{parentsource_value}"):::ls2
    JC -->|childTerm| ChTM([TermMap])
    JC -->|parentTerm| PaTM([TermMap]):::ls2
    classDef fnml fill:#8F9
    classDef fno fill:#F89
    classDef rml fill:#89F
    classDef ls2 fill:#09F
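
The last diagram can be read as "call a function whose second argument comes from a join". A minimal Python sketch of that reading follows. The names are assumptions: `array_join` is only a stand-in for the function the diagram labels grel:array_join (here it simply concatenates, matching the stated result "{childsource_value}{parentsource_value}"), and the `id`/`ref` join keys are invented, since the diagram leaves the child and parent terms unspecified.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def array_join(*values: str) -> str:
    """Stand-in for the diagram's grel:array_join: concatenate the inputs."""
    return "".join(values)


def function_term_map(
    child: Record,
    parents: List[Record],
    child_term: Callable[[Record], str],
    parent_term: Callable[[Record], str],
    join_result_term: Callable[[Record], str],
) -> List[str]:
    """Evaluate the function once per parent record that joins with the child."""
    results: List[str] = []
    for parent in parents:
        if child_term(child) == parent_term(parent):
            # First parameter: a value from the child source.
            # Second parameter: the join result term from the parent source.
            results.append(
                array_join(child["childsource_value"], join_result_term(parent))
            )
    return results


# Illustrative data only.
child = {"id": "1", "childsource_value": "foo"}
parents = [
    {"ref": "1", "parentsource_value": "bar"},
    {"ref": "2", "parentsource_value": "baz"},
]

joined_values = function_term_map(
    child,
    parents,
    child_term=lambda c: c["id"],
    parent_term=lambda p: p["ref"],
    join_result_term=lambda p: p["parentsource_value"],
)
```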

bjdmeest · 2022-03-02T13:51:27Z

@samiscoding and @pmaria could you have a look at my proposal here? I have the feeling we need to fix this first before we can fix FNML :) (@dachafra putting you in the loop since you were gonna check FNML in any case ;) )

pmaria · 2022-03-03T09:06:13Z

> TL;DR: functions are a special kind of term map / no logical source for functions / you specify input values for functions using term maps (so you can do nesting) / join conditions specify childterm and parentterm instead of child and parent (so you can put functions there) / referencingObjectMaps have a join result term to specify a new term based on values of the parent logical source instead of relying solely on the subject of the parent triples map

Generally agree, although I don't see child and parent values as terms, rather as just "values".

> Definitions
>
> [...]
> 4. A ReferencingMap is something that generates an RDF Term from a different Triples Map, called the Parent Triples Map, i.e., takes values from the Parent Triples Map's Logical Source, called the Parent Logical Source. It can use a JoinCondition and generates the Join Result Term.

Do I understand it correctly that this is a new construct that is the generalization of a referencing object map?

> A Join Condition specifies how to join these logical sources. It consists of a Child Term (generating a Term, taking values from the original Logical Source) and a Parent Term (generating a Term, taking values from the Parent Logical Source). By default (i.e., when no Join Condition is specified), a full join is performed.

A full join is not the current behavior when no join condition is specified for a referencing object map. What would be the use case for a full join?

> Results
>
> [...]
>
> if we nest functions, we can do something like join values across source

Could you give an example of what this would look like?

Generally speaking I would steer clear of joining within a function term map, because:

* joins are complex as it is,
* now we would conceptually have to join during the "evaluation of an expression", which is different from the usual referencing object map joins.

Pros of this approach:

* separation of concerns w.r.t. the logical source: a function term map does not have a logical source
* ability to generate values and terms using other sources
* possible to combine expressions on both (or more) logical sources in a single result, e.g. a template based on LS1 and LS2.

Downsides of this approach:

* implementation, and I would say reasoning about the mapping, becomes complex because of conceptually different places to join.
* must use functions to generate terms based on multiple sources

In general my preference would still be to have a more general way to join sources, such that:

* it is possible to generate terms based on multiple sources from templates or any other possible future expression type
* the join logic can be implemented in a single general way

bjdmeest · 2022-03-04T10:11:24Z

> > if we nest functions, we can do something like join values across source
>
> Could you give an example of what this would look like?

Is my final diagram a clarification? I can cook up some Turtle if you want :)

> Downsides of this approach:
>
> * implementation, and I would say reasoning about the mapping, becomes complex because of conceptually different places to join.
> * must use functions to generate terms based on multiple sources

I agree with your preference that joins can/should probably be solved more generally; my point was more that this structure allows complex joins across functions and sources and whatever. If we can solve the joins somewhere else, we can always limit the spec so that function terms cannot be referencing object maps. But I prefer having a generic structure that is limited later over a specific structure that is hard to expand later on.

> In general my preference would still be to have a more general way to join sources, such that:
>
> * it is possible to generate terms based on multiple sources from templates or any other possible future expression type
> * the join logic can be implemented in a single general way

👍

pmaria · 2022-03-04T12:40:20Z

> > > if we nest functions, we can do something like join values across source
> >
> > Could you give an example of what this would look like?
>
> Is my final diagram a clarification? I can cook up some Turtle if you want :)

yeah I think so. So basically you do something like

someFunction(value_TM1, value_TM2_via_join, ... , value_TMX_via_join)

bjdmeest · 2022-03-04T12:44:26Z

> yeah I think so. So basically you do something like
> someFunction(value_TM1, value_TM2_via_join, ... , value_TMX_via_join)

exactly, way simpler represented than what I was trying 😅

samiscoding · 2022-03-09T19:28:09Z

It is an interesting perspective to look at the problem, however,

1. Trying an example, I see that it leads to longer and more complex mapping rules compared to previous proposals. I'm a big fan of precision at the expense of complexity but if we can find a simpler solution that covers the definition of the same concepts we should consider it!
2. If I understand it correctly in this case one doesn't need to use "Fields" as discussed before instead of logicalSources, right?
3. Based on this definition, there wouldn't be any concept of FunctionTriplesMap, right?
4. I'm a bit confused by the concepts and syntaxes that you use from RML and R2RML. If we still want to reuse them then I see no reason to throw away previous proposals as we did during the Ghent meeting! Correct me if I'm wrong, wasn't the objection against our previous proposals in the meeting about not proposing it from scratch and reusing syntaxes? 😅

bjdmeest · 2022-03-14T09:48:55Z

> It is an interesting perspective to look at the problem, however,
>
> 1. Trying an example, I see that it leads to longer and more complex mapping rules compared to previous proposals. I'm a big fan of precision at the expense of complexity but if we can find a simpler solution that covers the definition of the same concepts we should consider it!

Fully agree that it becomes more (too?) complex, the argument I mostly wanted to make was "We can keep source definition out of the function construct to allow joining values across data sources". It's very complex without additional constructs, but (i) it is currently possible and (ii) we can think of a better construct separate from functions :)

> 2. If I understand it correctly in this case one doesn't need to use "Fields" as discussed before instead of logicalSources, right?

True

> 3. Based on this definition, there wouldn't be any concept of FunctionTriplesMap, right?

We can steer away from linking function definitions with the triplesmap definition, but that's not completely cleared out yet, see kg-construct/rml-core#45 (comment)

> 4. I'm a bit confused by the concepts and syntaxes that you use from RML and R2RML. If we still want to reuse them then I see no reason to throw away previous proposals as we did during the Ghent meeting! Correct me if I'm wrong, wasn't the objection against our previous proposals in the meeting about not proposing it from scratch and reusing syntaxes? 😅

Huh, I had it completely the other way around, that it's confusing to reuse syntax and it would be better to make a clear distinction. Maybe we should clear that up with the community first.

bjdmeest · 2022-10-12T13:26:57Z

I removed the FnO label, as we decided that joins and functions are 2 complementary things that shouldn't be conflated.

dachafra · 2023-08-29T20:54:44Z

As it's a join issue, I'm going to move it to its corresponding repo

elsdvlee · 2024-01-26T07:39:19Z

Agreed with Ben to make a test case and verify if this issue can be solved using logical views.

bjdmeest · 2024-03-20T09:21:35Z

Fixed in https://github.com/kg-construct/rml-lv/blob/main/test-cases/RMLLVTC0003

bjdmeest self-assigned this Mar 25, 2021

bjdmeest mentioned this issue Mar 25, 2021

Transformation function over joined sources? kg-construct/rml-core#1

Closed

bjdmeest changed the title ~~describe the current "it is possible to join values across data sources, but without join conditions"~~ Feature: how to join values across data sources, with join conditions? Mar 26, 2021

bjdmeest referenced this issue in kg-construct/rml-core Mar 26, 2021

updated based on #2, #3, #4, #5, #7, #8, #9, #10, fixes #6

0bf4ec0

samiscoding mentioned this issue Mar 30, 2021

function map definition kg-construct/rml-core#11

Closed

bjdmeest mentioned this issue Oct 5, 2021

FnO output specification? kg-construct/rml-core#29

Closed

pmaria mentioned this issue Oct 8, 2021

FnO - Does a Function Triples Map need a Logical Source? kg-construct/rml-core#37

Closed

bjdmeest mentioned this issue Mar 4, 2022

Nested triples maps kg-construct/mapping-challenges#6

Open

bjdmeest mentioned this issue Mar 14, 2022

LanguageMap (or general literalelementmap): how to join? kg-construct/mapping-challenges#23

Closed

pmaria mentioned this issue Mar 1, 2023

Logical source cardinality kg-construct/rml-core#57

Closed

dachafra transferred this issue from kg-construct/rml-core Aug 29, 2023

elsdvlee transferred this issue from kg-construct/rml-jc Jan 26, 2024

elsdvlee closed this as completed Mar 20, 2024
