Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transformations: Implement proper outer join #26316

Closed
F4ncyMooN opened this issue Jul 14, 2020 · 18 comments
Closed

Transformations: Implement proper outer join #26316

F4ncyMooN opened this issue Jul 14, 2020 · 18 comments

Comments

@F4ncyMooN
Copy link

What happened:
When I use outer join transformer to calculate for the many to one case, I found the result is not expected.
For example, we have two data frames to join.

First is,

Label1 Label2 outer
lab1 lab2 100

Second is,

Label1 inner
lab1 1
lab1 2

Joined result only show 1 row, but I think the following result is correct.

What you expected to happen:

Label1 Label2 outer inner
lab1 lab2 100 1
lab1 lab2 100 2

Now the join function only show the first row.

How to reproduce it (as minimally and precisely as possible):
As above
Anything else we need to know?:
I found many other transformation functions rely on this join function in seriesToColumns.ts.
Is it expected or just a bug?

Environment:

  • Grafana version: 7.0+
  • Data source type & version: Any
  • OS Grafana is installed on: Any
  • User OS & Browser: Any
  • Grafana plugins: Outer join Transformer
  • Others:
@ivanahuckova
Copy link
Member

@dprokop can you look at this? 🙂

@ivanahuckova ivanahuckova changed the title join transformation result is wrong Transformations: Join transformation result is wrong Jul 15, 2020
@dprokop
Copy link
Member

dprokop commented Jul 17, 2020

I'm not 100% sure if this is by design. But looking at the provided example I think it looks correct, meaning that Grafana's result is probably off. @mckn do you have any thoughts on this?

@dprokop dprokop added this to Needs triage in Platform Backlog via automation Jul 17, 2020
@dprokop dprokop moved this from Needs triage to Needs investigation or followup in Platform Backlog Jul 17, 2020
@dprokop dprokop added the needs investigation for unconfirmed bugs. use type/bug for confirmed bugs, even if they "need" more investigating label Jul 17, 2020
@F4ncyMooN
Copy link
Author

F4ncyMooN commented Jul 27, 2020

But refer to this page. In an outer join, unmatched rows in one or both tables can be returned.
https://mode.com/sql-tutorial/sql-outer-joins/#outer-joins

And the current result only returned joined rows

@mckn
Copy link
Contributor

mckn commented Aug 3, 2020

@F4ncyMooN so the current implementation of join does that based on a field (If I recall it correctly). Have you tried the new merge transform added in Grafana 7.1?

@F4ncyMooN
Copy link
Author

I have used merge on time function and it will concat the second timeseries behind the first timeseries, like the following output.
What I expected is outer join.

time Label1 Label2 outer inner
xx lab1 lab2 100
xx lab1 1
xx lab1 2

@F4ncyMooN
Copy link
Author

By the way, I have implemented full outer join with multiple fields. Would you mind to review the code and merge into master? @mckn @dprokop

@mckn
Copy link
Contributor

mckn commented Aug 5, 2020

@F4ncyMooN that sounds awesome! Can you reference that PR here?

@torkelo torkelo removed this from Needs investigation or followup in Platform Backlog Aug 5, 2020
@torkelo torkelo removed this from Bugs in Frontend Platform Backlog Aug 5, 2020
@torkelo
Copy link
Member

torkelo commented Aug 5, 2020

This does not explain why the join is wrong, does not show what join results in.

Your example looks wrong, the expected output looks incorrect.

@torkelo
Copy link
Member

torkelo commented Aug 5, 2020

Sorry I missed this line

Joined result only show 1 row, but I think the following result is correct.

This transform was mainly built to transform time series to a combined table. So does not handle duplicate join keys (ie rows with same join id in a source table)

@svet-b
Copy link

svet-b commented Aug 6, 2020

Got hit by the same issue.

Trying to do an outer join on something like the following two tables on the id field:

Time id value
2020-08-06 18:17:42 1 12.3
2020-08-06 18:23:42 1 45.6
2020-08-06 18:29:42 1 78.9
2020-08-06 18:35:42 1 10.1
2020-08-06 18:41:42 1 12.3
id label
1 ABC

I believe the expected result of an actual outer join is:

Time id label value
2020-08-06 18:17:42 1 ABC 12.3
2020-08-06 18:23:42 1 ABC 45.6
2020-08-06 18:29:42 1 ABC 78.9
2020-08-06 18:35:42 1 ABC 10.1
2020-08-06 18:41:42 1 ABC 12.3

The actual result is:

Time id label value
2020-08-06 18:17:42 1 ABC 12.3

Rather than an outer join, the result appears to be either a left or right join (not sure how the sequence of tables is considered), and/or a reduce on the field which is used for the join.

@cpmoore
Copy link

cpmoore commented Aug 27, 2020

I’m experiencing the same behavior.
For my use case I was wanting to store the descriptions of reason codes in a Postgres database and use them to decorate my Prometheus results, however instead of joining the description to every value for the series, only a single record is returned.

@torkelo
Copy link
Member

torkelo commented Aug 27, 2020

The problem is the current join does not support duplicate rows (based on key filed) in each source table

@cpmoore
Copy link

cpmoore commented Aug 28, 2020

In my opinion, I would leave the current transformation (maybe rename it to something like single join) as it may be useful in some cases, but add a new left join transformation that supports multiple rows.

@dprokop dprokop added this to Needs triage in Platform Backlog via automation Sep 2, 2020
@dprokop dprokop moved this from Needs triage to Feature requests in Platform Backlog Sep 2, 2020
@dprokop dprokop changed the title Transformations: Join transformation result is wrong Transformations: Implement proper outer join Sep 2, 2020
@dprokop dprokop added type/feature-request and removed needs investigation for unconfirmed bugs. use type/bug for confirmed bugs, even if they "need" more investigating labels Sep 2, 2020
@MartinRoenneburg
Copy link

I ran into the same issue.

Two different ElasticSearch Datasources ...
Datasource 1: short living workflow data
[workflow1, stuff, metadata ID-1]
[workflow2, stuff, metadata ID-2]
[worklfow3, stuff, metadata ID-1]

Datasource 2: metadata
[metadata ID-1, details on ID-1]
[metadata ID-2, details on ID-2]
[metadata ID-3, details on ID-3]

my intention: show workflow data with additional metadata

Transformations:

Rename by regex to create a unique column name ...
Outer Join
Filter Data by values: to remove metadata that has no corresponding workflow

Expected: one to many (1:n) matching
[workflow1, stuff, metadata ID-1, details on ID-1]
[workflow2, stuff, metadata ID-2, details on ID-2]
[worklfow3, stuff, metadata ID-1, details on ID-1]

Found: 1:1 matching and resulting missing rows
[workflow1, stuff, metadata ID-1, details on ID-1]
[workflow2, stuff, metadata ID-2, details on ID-2]

I'm very interested in this feature !

@noleto
Copy link

noleto commented Aug 4, 2021

I ran into the same "tricky" behavior of the "outer join" transformation, it does not work properly when the table to join has several rows on the "field" used to join.

I found a workaround while waiting for the fix, here goes:

  1. create a query variable that pulls out the "ID field" and all the values you want to show into a single column: For instance, in my case, I'm using PosgreSQL and wanted to show an address mail for each user ID, so I created a variable named mapping_userid_email that is fed by the query SELECT concat('(''', user_id, ':', email, ''')') FROM user_account_info
  2. Now, go to the query you want to perform the outer join:
WITH fake_table_join AS (
SELECT split_part(user_email, ':', 1) as user_id, split_part(user_email, ':', 2) as email
    FROM (VALUES ${mapping_userid_email:raw}) AS fake_table(user_email)
)
SELECT your_table.*, fake_table_join.email
FROM your_table
LEFT JOIN fake_table_join ON your_table.user_id = fake_table_join.user_id

Of course, workaround works pretty well if you don't have a huge volume of data in your query variable (as Grafana loads all the values in memory for this variable)

@svet-b
Copy link

svet-b commented Sep 1, 2021

@noleto thanks for sharing the workaround. Unfortunately for my use case I'm trying to do the join on some Influx data, and since Influx isn't a relational database I don't think it's applicable. I thought I'd share that in case you (or anyone) might have relevant suggestions for a context like that.

@rakdman
Copy link

rakdman commented May 30, 2023

Is the issue highlighted by @F4ncyMooN on 14 July 202 is fixed ?

@gelicia
Copy link
Contributor

gelicia commented Feb 28, 2024

I believe this was fixed by #72176

If your use case is not supported, feel free to comment on this issue with what is missing and we can figure out if we're needing a separate, more specific ticket or not.

@gelicia gelicia closed this as completed Feb 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests