Failure to Handle Multiple Lineage Path for Same Column #228

Closed
Nuclassmore opened this issue Mar 3, 2022 · 1 comment · Fixed by #234
Labels: bug, good first issue

Comments

Nuclassmore commented Mar 3, 2022

Why does holders.get_column_lineage only keep a result when simple_paths contains exactly one path? Is there a reason for this?
In my case, I have a target column (alias X) that is the result of a CASE expression. Its input columns (A, B, C) all come from the same column of the same table (BASE_TABLE.Z), so nx.all_simple_paths returns the following paths:

  1. BASE_TABLE.Z->A->X
  2. BASE_TABLE.Z->B->X
  3. BASE_TABLE.Z->C->X

But get_column_lineage does not return any of them, because of the following lines:
if len(simple_paths) == 1:
    columns.add(tuple(simple_paths[0]))

Simplified example:
CASE
    WHEN JOIN_TABLE_1.A IS NOT NULL THEN JOIN_TABLE_1.A
    WHEN JOIN_TABLE_2.B IS NOT NULL THEN JOIN_TABLE_2.B
    ELSE JOIN_TABLE_3.C
END AS X
JOIN (SELECT Z AS A FROM BASE_TABLE) AS JOIN_TABLE_1
JOIN (SELECT Z AS B FROM BASE_TABLE) AS JOIN_TABLE_2
JOIN (SELECT Z AS C FROM BASE_TABLE) AS JOIN_TABLE_3

Thank you in advance!

reata (Owner) commented Mar 6, 2022

Thanks for reporting this. This is a bug. If you look at the comment in the code, you'll see that I didn't know at the time whether there could ever be more than one simple path. Your example shows that there can, so we should get this fixed.


Note for the future implementation: the web UI shows the correct result, but the command line interface gives an incorrect one. The related code is at https://github.com/reata/sqllineage/blob/master/sqllineage/core/holders.py#L33

SQL for reproducing this bug:

INSERT OVERWRITE TABLE foo
SELECT
    CASE
        WHEN JOIN_TABLE_1.A IS NOT NULL THEN JOIN_TABLE_1.A
        WHEN JOIN_TABLE_2.B IS NOT NULL THEN JOIN_TABLE_2.B
        ELSE JOIN_TABLE_3.C
    END AS X
FROM JOIN_TABLE_0
JOIN (SELECT Z AS A FROM BASE_TABLE) AS JOIN_TABLE_1
JOIN (SELECT Z AS B FROM BASE_TABLE) AS JOIN_TABLE_2
JOIN (SELECT Z AS C FROM BASE_TABLE) AS JOIN_TABLE_3

Command line returns nothing:

sqllineage -f foo.sql -l column
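
For anyone picking this up, the effect of the `len(simple_paths) == 1` check can be sketched in isolation. The snippet below is only an illustration under assumptions: the graph is a hand-built stand-in for the column graph of the SQL above (node names such as `foo.X` are made up for the example), `all_simple_paths` is a small stdlib-only substitute for `nx.all_simple_paths`, and `lineage_all_paths` is one possible way to keep every path, not necessarily the fix that ends up in the library.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical column graph for the CASE example: edges go from a source
# column to the column derived from it. Node names are assumptions.
GRAPH: Dict[str, List[str]] = {
    "BASE_TABLE.Z": ["JOIN_TABLE_1.A", "JOIN_TABLE_2.B", "JOIN_TABLE_3.C"],
    "JOIN_TABLE_1.A": ["foo.X"],
    "JOIN_TABLE_2.B": ["foo.X"],
    "JOIN_TABLE_3.C": ["foo.X"],
    "foo.X": [],
}


def all_simple_paths(
    graph: Dict[str, List[str]], source: str, target: str
) -> List[List[str]]:
    """Enumerate every cycle-free path from source to target via DFS.

    Stand-in for nx.all_simple_paths so the sketch has no dependencies.
    """
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the current path
                walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


def lineage_only_single_path(
    graph: Dict[str, List[str]], source: str, target: str
) -> Set[Tuple[str, ...]]:
    """Mimics the check quoted in this issue: keep a path only if it is unique."""
    columns: Set[Tuple[str, ...]] = set()
    simple_paths = all_simple_paths(graph, source, target)
    if len(simple_paths) == 1:
        columns.add(tuple(simple_paths[0]))
    return columns


def lineage_all_paths(
    graph: Dict[str, List[str]], source: str, target: str
) -> Set[Tuple[str, ...]]:
    """Possible alternative (an assumption): keep every simple path."""
    return {tuple(p) for p in all_simple_paths(graph, source, target)}


paths = all_simple_paths(GRAPH, "BASE_TABLE.Z", "foo.X")
dropped = lineage_only_single_path(GRAPH, "BASE_TABLE.Z", "foo.X")
kept = lineage_all_paths(GRAPH, "BASE_TABLE.Z", "foo.X")
print(len(paths), len(dropped), len(kept))  # prints: 3 0 3
```

With three paths between `BASE_TABLE.Z` and the target, the single-path check yields an empty set, which matches the empty command line output. Whether all three paths should be reported, or collapsed to one source/target pair, is a design decision left to the actual fix.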

reata added the bug and good first issue labels on Mar 6, 2022
reata self-assigned this on Mar 13, 2022
reata changed the title from "Why holders.get_column_lineage return simple_paths which equal to one only" to "Failure to Handle Multiple Lineage Path for Same Column" on Mar 13, 2022