Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For each must filter on schema #1167

Closed
jasonicarter opened this issue Mar 24, 2022 · 3 comments
Closed

For each must filter on schema #1167

jasonicarter opened this issue Mar 24, 2022 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@jasonicarter
Copy link

Using the only wildcard, %, with the for each check does not include the datasource schema resulting in a query which scans all tables within the information_schema.tables table.

QUERY PRODUCE...

SELECT table_name
FROM information_schema.tables
WHERE (lower(table_name) like '%');

SHOULD BE...

SELECT table_name
FROM information_schema.tables
WHERE (lower(table_name) like '%')
AND lower(table_schema) = '[SCHEMA_NAME]';

Soda Scan

soda scan -d raw_stripe stitch_stripe.yml

Configuration

data_source raw_stripe:
  type: snowflake
  connection:
    username: ${SNOWFLAKE_USER}
    password: ${SNOWFLAKE_PASSWORD}
    account: ${SNOWFLAKE_ACCOUNT}
    database: RAW
    schema: STRIPE
    warehouse: my_warehouse
    role: my_role

Check

for each table T:
  tables:
    - raw_stripe.charges # works
    - raw_stripe.% # works but queries all tables
    - '%' # works but queries all tables 
  checks:
    - row_count > 0

Result
Queries [RANDOM] tables not in the schema which results in execution errors

Soda Core 3.0.0b4
Reading configuration file "configuration.yml"
Adding SodaCL dir checks/raw/
Reading SodaCL file "checks/raw/stitch_stripe.yml"
Scan execution starts
Query raw_stripe.for_each_table_T[0]:
SELECT table_name 
FROM information_schema.tables
WHERE (lower(table_name) like '%')

Query raw_stripe.[RANDOM_TABLE].aggregation[0]:
SELECT 
  COUNT(*) 
FROM [RANDOM_TABLE].
Query execution error in raw_stripe.[RANDOM_TABLE]..aggregation[0]: 002003 (42S02): SQL compilation error:
Object '[RANDOM_TABLE].' does not exist or not authorized.

[...] -- ALL other tables in `information_schema.tables`
@tombaeyens tombaeyens added the bug Something isn't working label Mar 28, 2022
@tombaeyens tombaeyens changed the title Check for each with wildcard only leaves out schema querying entire information_schema.tables For each must filter on schema Mar 28, 2022
@tombaeyens
Copy link
Contributor

For each must filter only tables for the schema defined in the default data source.
And the docs for the for-each must be reviewed explaining that schema definition goes into the configuration and not in the for each check.

@tombaeyens
Copy link
Contributor

Just tested and could not reproduce. I got

Query postgres.for_each_table_T[0]:
SELECT table_name 
FROM information_schema.tables
WHERE (lower(table_name) like 'sodatest_customers_b7580920' OR lower(table_name) like 'sodatest_rawcustomers_54820c8f%' OR lower(table_name) like 'sodatest_rawcustomers_54820c8f%')
      AND lower(table_name) not like 'non_existing_table'
      AND lower(table_schema) = 'public'
Query postgres.sodatest_customers_b7580920.aggregation[0]:

So looks to be fixed. @vijaykiran can you let @jasonicarter know when we do another release?

@vijaykiran
Copy link
Contributor

Fixed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants