Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing tables returned by sql_metadata.get_query_tables #112

Closed
tdebroc opened this issue Mar 19, 2021 · 8 comments · Fixed by #121
Closed

missing tables returned by sql_metadata.get_query_tables #112

tdebroc opened this issue Mar 19, 2021 · 8 comments · Fixed by #121
Labels
Milestone

Comments

@tdebroc
Copy link

tdebroc commented Mar 19, 2021

Hello,

I have 80 long queries, and for 27 of them, results are not correct for "sql_metadata.get_query_tables(ddl)"

Here is an example:

ddl_for_sql_metadata.sql

SELECT
  "attr1"
FROM
  (
   SELECT
     "attr2"
   FROM
     (database1.table1 "aliasTable"
   LEFT JOIN (
      SELECT
        "attr3"
      FROM
        ((
         SELECT
           "attr4"
         FROM
           database2.table2
         GROUP BY 1, 2, 3, 4
      )  "aliasTable2"
      LEFT JOIN (
         SELECT
           "attr5"
         FROM
           database3.table3
         GROUP BY 1, 2
      )  "X" ON ("table"."attr" = "table"."attr"))
   )  "Y" ON ("table"."attr" = "table"."attr"))
)  "Z"
WHERE (myCondition)
GROUP BY 1, 2, 3, 4, 5

Quick way to test:

import sql_metadata

def read_from_file(file_path):
    return open(file_path, "r").read()

ddl = read_from_file("ddl_for_sql_metadata.sql")
tables = sql_metadata.get_query_tables(ddl)
print(tables)

It returns only ['database2.table2', 'database3.table3']
and it should return database1.table1

@macbre macbre added the bug label Mar 19, 2021
@macbre macbre added this to the v1.12 milestone Mar 19, 2021
@macbre
Copy link
Owner

macbre commented Mar 19, 2021

@tdebroc , thanks for reporting a bug!

Can you please provide the full list of queries for the sake of better test coverage with more queries? GitHub's gist link will do 🙂 Thanks!

@tdebroc
Copy link
Author

tdebroc commented Mar 19, 2021

Hello macbre, yes I'll try to add some, it just very long queries and I need to anonymize them, so it takes time, but yes I'll do it because your project is nice :-)

@tdebroc
Copy link
Author

tdebroc commented Mar 19, 2021

Here is another one:

SELECT
  "xxxxx"
FROM
  (database1.table1 alias
LEFT JOIN database2.table2 ON ("tt"."ttt"."fff" = "xx"."xxx"))

which does't work

There is a problem with "(", because if I remove it, it works.

For info, I ended up writing my own parsing script and I remove all "(" ")" to do the check

@tdebroc
Copy link
Author

tdebroc commented Mar 19, 2021

I checked on the 27 diffs I have and

  • Most the diff seems linked to the management of "(".
  • Some of them are because sql_metadata is returning tables built WITH and these are not real tables so I don't want them

Example:

WITH
    database1.tableFromWith AS SELECT * FROM table3
SELECT
  "xxxxx"
FROM
  database1.tableFromWith alias
LEFT JOIN database2.table2 ON ("tt"."ttt"."fff" = "xx"."xxx")

should only return table3 and database2.table2 and not database1.tableFromWith

@macbre
Copy link
Owner

macbre commented Mar 23, 2021

Brackets handling improved in #113.

WITH handling will be improved when a new tokens iterator is introduced - see #98.

@tdebroc
Copy link
Author

tdebroc commented Mar 23, 2021

Wow you are fast !
When will it be available via pip install ? And which version ? So I can give a test.

I saw the tests case you have added, for the WITH, one interesting to add would be if there is several tables generated in the WITH:

Example:

        WITH
            database1.tableFromWith AS SELECT * FROM table3,
            database1.tableFromWith2 AS SELECT * FROM table4,
            database1.tableFromWith3 AS SELECT * FROM table5,
            database1.tableFromWith4 AS SELECT * FROM table6
        SELECT
          "xxxxx"
        FROM
          database1.tableFromWith alias
        LEFT JOIN database2.table2 ON ("tt"."ttt"."fff" = "xx"."xxx")

which should only return database2.table2

Best regards

@macbre
Copy link
Owner

macbre commented Mar 23, 2021

🙂

As said above, WHEN tests are still marked as skipped - sql-metadata internals need to mark the table as being used with WITH keyword and ignore them when returning the list of "real" tables.

I'll add the test that you suggested 👌🏻. In the meantime, stay tuned 🙂

@macbre
Copy link
Owner

macbre commented Mar 23, 2021

And a test case from the MySQL docs:

WITH
  cte1 AS (SELECT a, b FROM table1),
  cte2 AS (SELECT c, d FROM table2)
SELECT b, d FROM cte1 JOIN cte2
WHERE cte1.a = cte2.c;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants