Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Greenplum LATERAL Subquery doesn't work #613

Open
kkozhakin opened this issue May 14, 2024 · 2 comments
Open

Greenplum LATERAL Subquery doesn't work #613

kkozhakin opened this issue May 14, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@kkozhakin
Copy link
Contributor

Describe the bug
LATERAL with VALUES doesn't work

SQL
Paste the SQL text here. For example:

CREATE TEMPORARY TABLE result_table
    ON COMMIT DROP AS
SELECT
    a.f1,
    a.f2,
    b.f1 as f3,
    b.f2 as f4,
    b.f3 as f5
FROM tmp_base AS a
CROSS JOIN LATERAL (
VALUES
    ('s1', a.f3, a.f9),
    ('s2', a.f4, a.f10),
    ('s3', a.f5, a.f11),
    ('s4', a.f6, a.f12),
    ('s5', a.f7, a.f13),
    ('s6', a.f8, a.f14)
) AS b (f1, f2, f3);

INSERT INTO "schema_1"."table_1"
SELECT * FROM "result_table";

To Reproduce
Note here we refer to SQL provided in prior step as stored in a file named test.sql

from sqllineage.runner import LineageRunner
lr_sqlfluff = LineageRunner(
    sql, dialect='greenplum', silent_mode=True
)

for path in lr_sqlfluff.get_column_lineage(False):
    print(" <- ".join(str(col) for col in reversed(path)))
2024-03-21 09:36:17.221 pid=590 MainThread run_id=998b863b-7a77-44c4-b74f-72e621b1fe19   ERROR __main__: Lineage for task delivery_etl.delivery_gp_rep_supply_rep_supply_finance_pivoted: FAIL
Traceback (most recent call last):
  File "/place/sandbox-data/tasks/1/4/2232947541/mounted_arcadia/taxi/dmp/dwh/services/meta_etl/meta_etl/layer/yt/import_meta/dmp/collect_column_lineage_sql/impl.py", line 58, in get_lineage_by_sql
    paths = get_lineage_with_timeout(prepare_sql(row['combined_query']))
  File "/place/sandbox-data/tasks/1/4/2232947541/mounted_arcadia/taxi/dmp/dwh/libraries/column_lineage/column_lineage/collect_column_lineage_sql/impl.py", line 37, in new_f
    result = f(*args, **kwargs)
  File "/place/sandbox-data/tasks/1/4/2232947541/mounted_arcadia/taxi/dmp/dwh/libraries/column_lineage/column_lineage/collect_column_lineage_sql/impl.py", line 121, in get_lineage_with_timeout
    return _get_lineage(sql)
  File "/place/sandbox-data/tasks/1/4/2232947541/mounted_arcadia/taxi/dmp/dwh/libraries/column_lineage/column_lineage/collect_column_lineage_sql/impl.py", line 129, in _get_lineage
    for path in lr.get_column_lineage(False, False):
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/runner.py", line 26, in wrapper
    self._eval()
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/runner.py", line 199, in _eval
    stmt_holder = analyzer.analyze(stmt, session.metadata_provider)
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/analyzer.py", line 58, in analyze
    lineage_holder = extractor.extract(
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/create_insert.py", line 45, in extract
    holder |= self.delegate_to_select(segment, holder)
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/create_insert.py", line 138, in delegate_to_select
    return self.delegate_to(
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/base.py", line 206, in delegate_to
    return extractor_cls(self.dialect, self.metadata_provider).extract(
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/select.py", line 46, in extract
    for sq in self.list_subquery(segment):
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/extractors/base.py", line 84, in list_subquery
    result = cls._parse_subquery(list_subqueries(segment))
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/utils.py", line 183, in list_subqueries
    subquery += list_subqueries(from_expression_element)
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/utils.py", line 154, in list_subqueries
    as_segment, target = extract_as_and_target_segment(segment)
  File "/opt/dmp/toxenvs/py3/lib/python3.9/site-packages/sqllineage/core/parser/sqlfluff/utils.py", line 229, in extract_as_and_target_segment
    target = sublist[0] if is_subquery(sublist[0]) else sublist[0].segments[0]
IndexError: tuple index out of range

Expected behavior
A clear and concise description of what you expected to happen, and the output in accordance with the To Reproduce section.

schema_1.table_1.f1 <- tmp_base.f1
schema_1.table_1.f2 <- tmp_base.f2
schema_1.table_1.f4 <- tmp_base.f3
schema_1.table_1.f4 <- tmp_base.f4
schema_1.table_1.f4 <- tmp_base.f5
schema_1.table_1.f4 <- tmp_base.f6
schema_1.table_1.f4 <- tmp_base.f7
schema_1.table_1.f4 <- tmp_base.f8
schema_1.table_1.f5 <- tmp_base.f9
schema_1.table_1.f5 <- tmp_base.f10
schema_1.table_1.f5 <- tmp_base.f11
schema_1.table_1.f5 <- tmp_base.f12
schema_1.table_1.f5 <- tmp_base.f13
schema_1.table_1.f5 <- tmp_base.f14

Python version (available via python --version)

  • 3.10

SQLLineage version (available via sqllineage --version):

  • 1.5.3
@kkozhakin kkozhakin added the bug Something isn't working label May 14, 2024
@reata reata changed the title LATERAL with VALUES doesn't work Greenplum LATERAL Subquery doesn't work May 20, 2024
@reata
Copy link
Owner

reata commented May 20, 2024

We have issue handling lateral subqueries. It also throws the same exception for simple case like:

SELECT * FROM foo, LATERAL (SELECT * FROM bar WHERE bar.id = foo.bar_id) ss;

@reata
Copy link
Owner

reata commented May 20, 2024

With #615 merged, the exception is gone. The current output from master branch is:

<default>.result_table.f1 <- <default>.tmp_base.f1
<default>.result_table.f2 <- <default>.tmp_base.f2
<default>.result_table.f3 <- <default>.b.f1
<default>.result_table.f4 <- <default>.b.f2
<default>.result_table.f5 <- <default>.b.f3
schema_1.table_1.* <- <default>.result_table.*

Parsing column lineage from lateral subquery is another story then. <default>.b.f1 <default>.b.f2 and <default>.b.f3 here are less desirable but requires further effort.

Among the three, <default>.b.f1 comes from constant values, we have #480 to track that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants