Feature: how to join values across data sources, with join conditions? #16
Comments
I think this might be affected in the end by how we handle joins in RML |
I think it goes back again to the basic definition of |
Does a Function Triples Map need a Logical Source?

The current use case for a
An example of this is included in one of the proposed FnO test cases: RMLFNOTC009 However, since a As this is not the same type of join as a join on a At the same time we have a very similar mapping challenge for generating literal values by joining different logical sources: join-on-literal challenge. I believe it would be advantageous to come up with a solution that covers both generating literals from different |
That's the reason I insist that we should consider the big picture while defining the fundamental concepts i.e. function triples map and function term map! Check the alternative definitions with an example in |
so your suggestion is to follow a similar approach as in the case of the |
@andimou yes, exactly! |
When working on RML fields I had in mind you could do something like In general I tend to agree FNML shouldn't need its own way to define joins. For the example in RMLFNOTC009 I wonder why one cannot just call |
I have the feeling the discussion is revolving around 'functions should or should not specify their own logical source', to solve exactly this issue. I think we first need to solve that before we can solve functions properly. That's why I made the following proposal.

Proposal

I'm purposely not specifying the relation with existing RML and R2RML constructs, nor specifying exactly how to describe a function. Instead, I'm making a proposal where we can have functions without defining their own logical sources, and still join values across data sources.

TL;DR: functions are a special kind of term map / no logical source for functions / you specify input values for functions using term maps (so you can do nesting) / join conditions specify childTerm and parentTerm instead of child and parent (so you can put functions there) / referencingObjectMaps have a join result term to specify a new term based on values of the parent logical source instead of relying solely on the subject of the parent triples map

Definitions
Results

Using these definitions, we can:
Diagrams

A function description (red = FnO stuff, green = FNML stuff, feel free to ignore those colors for now):

graph LR
TM([TermMap])
FM([FunctionTermMap]):::fnml
TM -->|is-a| FM
FM -->|execution| Ex([Execution]):::fnml
FM -->|output| J(IRI):::fnml
Ex -->|function| ExOM([fno:Function TermMap]):::fno
Ex -->|parameterMap| ParamPOM([ParameterMap])
ParamPOM -->|parameter| ParamPM(parameter):::fno
ParamPOM -->|parameter value| ParamOM([parameter value TermMap])
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F

A join description (dark blue === Parent Logical Source):

graph LR
T3M([TriplesMap])
T3M-->|predicateObjectMap| POM([PredicateObjectMap])
POM -->|predicateMap| PM([PredicateMap])
POM -->|objectMap| ROM([ReferencingObjectMap])
ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
ROM -->|joinCondition| JC([JoinCondition])
ROM -->|joinResultTerm| JTM([TermMap]):::ls2
JC -->|childTerm| ChTM([TermMap])
JC -->|parentTerm| PaTM([TermMap]):::ls2
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F
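
A minimal Python sketch of the join description above, assuming term maps can be modelled as plain functions over a single record. The names `reference`, `function_term`, and `join`, as well as the sample records, are hypothetical and not part of RML or FNML. The sketch only illustrates that childTerm and parentTerm can be arbitrary term maps (including functions), and that joinResultTerm is evaluated against the matching parent record.

```python
from typing import Callable, Iterable, Iterator

Record = dict[str, str]
TermMap = Callable[[Record], str]  # assumption: a term map is a function of one record


def reference(name: str) -> TermMap:
    """Term map that reads a single field from a record."""
    return lambda record: record[name]


def function_term(fn: Callable[..., str], *params: TermMap) -> TermMap:
    """Term map that applies fn to the values produced by nested term maps."""
    return lambda record: fn(*(p(record) for p in params))


def join(
    child_rows: Iterable[Record],
    parent_rows: Iterable[Record],
    conditions: list[tuple[TermMap, TermMap]],  # (childTerm, parentTerm) pairs
    join_result_term: TermMap,
) -> Iterator[tuple[Record, str]]:
    """Yield (child record, joined value) for every matching parent record."""
    parents = list(parent_rows)
    for child in child_rows:
        for parent in parents:
            if all(c(child) == p(parent) for c, p in conditions):
                yield child, join_result_term(parent)


children = [{"id": "A1", "name": "alice"}, {"id": "B2", "name": "bob"}]
parents = [{"ID": "a1", "city": "Ghent"}, {"ID": "c3", "city": "Delft"}]

# childTerm is a function term (lower-casing), parentTerm is a plain reference.
matches = list(
    join(
        children,
        parents,
        conditions=[(function_term(str.lower, reference("id")), reference("ID"))],
        join_result_term=reference("city"),
    )
)
print(matches)  # [({'id': 'A1', 'name': 'alice'}, 'Ghent')]
```

In this sketch an empty list of conditions pairs every child record with every parent record; whether that should be the specified behavior is a separate question.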

A join across sources example (result is "{childsource_value}{parentsource_value}"):

graph LR
T3M([TriplesMap])
T3M-->|predicateObjectMap| POM([rr:PredicateObjectMap])
POM -->|objectMap| FM
FM([FunctionTermMap])
FM -->|execution| Ex([Execution])
FM -->|output| J(grel:stringOut):::fno
Ex -->|function| ExFn(grel:array_join):::fno
Ex -->|parameterMap| ParamPOM([ParameterMap])
ParamPOM -->|parameter| P1(grel:array_value):::fno
ParamPOM -->|parameter value| O1("{childsource_value}"):::fno
ParamPOM -->|parameter| P2(grel:array_value):::fno
ParamPOM -->|parameter value| ROM([ReferencingObjectMap])
ROM -->|parentTriplesMap| PT3M([TriplesMap]):::ls2
ROM -->|joinCondition| JC([JoinCondition])
ROM -->|joinResultTerm| JTM("{parentsource_value}"):::ls2
JC -->|childTerm| ChTM([TermMap]):::ls2
JC -->|parentTerm| PaTM([TermMap]):::ls2
classDef fnml fill:#8F9
classDef fno fill:#F89
classDef rml fill:#89F
classDef ls2 fill:#09F
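
The example diagram can be read roughly as the following Python sketch. It is an illustration under assumptions, not a mapping engine: `array_join` is a stand-in for grel:array_join, the two in-memory lists stand in for the child and parent logical sources, and the join on a shared `key` field is hypothetical, since the diagram leaves childTerm and parentTerm unspecified.

```python
def array_join(*values: str, separator: str = "") -> str:
    """Stand-in for grel:array_join: concatenate the given values."""
    return separator.join(values)


# Hypothetical records for the child and parent logical sources.
child_source = [
    {"key": "1", "childsource_value": "foo"},
    {"key": "2", "childsource_value": "baz"},
]
parent_source = [
    {"key": "1", "parentsource_value": "bar"},
    {"key": "3", "parentsource_value": "qux"},
]


def joined_parent_values(child: dict, parents: list[dict]) -> list[str]:
    """ReferencingObjectMap stand-in: join on equal "key" and return
    the joinResultTerm ("{parentsource_value}") of each matching parent."""
    return [p["parentsource_value"] for p in parents if p["key"] == child["key"]]


# The function term map gets one parameter straight from the child record
# and one parameter through the join with the parent source.
results = [
    array_join(child["childsource_value"], parent_value)
    for child in child_source
    for parent_value in joined_parent_values(child, parent_source)
]
print(results)  # ['foobar']
```

The child record with key "2" has no matching parent, so it produces no value in this sketch.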
|
@samiscoding and @pmaria could you have a look at my proposal here? I have the feeling we need to fix this first before we can fix FNML :) (@dachafra putting you in the loop since you were gonna check FNML in any case ;) ) |
Generally agree, although I don't see child and parent values as terms, rather as just "values".
Do I understand it correctly that this is a new construct that is the generalization of a referencing object map?
A full join is not the current behavior when no join condition is specified for a referencing object map. What would be the use case for a full join?
Could you give an example of what this would look like? Generally speaking I would steer clear of joining within a function term map, because:
Pros of this approach:
Downsides of this approach:
In general my preference would still be to have a more general way to join sources, such that:
|
Is my final diagram a clarification? I can cook up some Turtle if you want :
I agree with your preference that joins can/should probably be solved more generally; my point was more that this structure allows complex joins across functions and sources and whatever. If we can solve the joins somewhere else, we can always limit the spec so that function terms cannot be referencing object maps. But I prefer a generic structure that is limited later over a specific structure that is hard to expand later on.
👍 |
yeah I think so. So basically you do something like someFunction(value_TM1, value_TM2_via_join, ... , value_TMX_via_join) |
exactly, way simpler represented than what I was trying 😅 |
It is an interesting perspective to look at the problem, however,
|
Fully agree that it becomes more (too?) complex, the argument I mostly wanted to make was "We can keep source definition out of the function construct to allow joining values across data sources". It's very complex without additional constructs, but (i) it is currently possible and (ii) we can think of a better construct separate from functions :)
True
We can steer away from linking function definitions with the triplesmap definition, but that's not completely cleared out yet, see kg-construct/rml-core#45 (comment)
Huh, I had it completely the other way around, that it's confusing to reuse syntax and it would be better to make a clear distinction. Maybe we should clear that up with the community first. |
I removed the FnO label, as we decided that joins and functions are two complementary things that shouldn't be conflated |
As it's a join issue, I'm going to move it to its corresponding repo |
Agreed with Ben to make a test case and verify if this issue can be solved using logical views. |
Came from kg-construct/rml-core#1
It is currently possible to join values across data sources, but without join conditions (see e.g. https://github.com/RMLio/rml-fno-test-cases/blob/master/RMLFNOTC0009-CSV/mapping.ttl, see also https://kg-construct.slack.com/archives/C01QFSW77QF/p1615717859003600)