
Unable to create temporary table to load batch data #1012

Closed
mparayil opened this issue Jan 27, 2020 · 13 comments

@mparayil

Current Behavior

When generating a batch of data against a Snowflake database and validating the datasource with a query, Great Expectations throws a Snowflake error containing CANNOT CREATE TABLE (see screenshot below).

See below:

batch_query = """ select * from network_outreach where dispatch_request_time_utc >= to_date('2020-01-01') and dispatch_request_time_utc < to_date('2020-01-07'); """

batch_kwargs = {'table': "network_outreach", 'schema': 'dbo', 'query': batch_query}

batch = context.get_batch(data_asset_name=normalized_data_asset_name,
                          expectation_suite_name=expectation_suite_name,
                          batch_kwargs=batch_kwargs)
print(batch.get_row_count())

Ideal behavior

batch_kwargs should accept a raw query and use it to create a temporary table in the Snowflake database from which the batch is built.

[Screenshot: Snowflake "CANNOT CREATE TABLE" error, 2020-01-27]

@zach

zach commented Jan 29, 2020

Is this the same error as #569 in a different context?

@jcampbell
Member

I think it's very closely related, yes; unfortunately, the underlying problem appears to be that SQLAlchemy doesn't provide a high-level API for creating a temporary table, nor does it require drivers to provide a DDL translation (probably because support and behavior vary so widely across backends).

I'd love ideas for how to improve this and make it more streamlined.
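
A minimal, illustrative sketch of the gap (not Great Expectations' actual code; the temp table name is hypothetical): because SQLAlchemy has no portable "create temporary table from a SELECT" construct, the DDL has to be issued as raw, dialect-specific SQL, for example:

from sqlalchemy import create_engine, text

engine = create_engine(conn_url)  # conn_url: a SQLAlchemy connection string for the backend
with engine.connect() as conn:
    # This statement works on Snowflake and several other backends,
    # but the DDL itself is dialect-specific: Oracle, for instance,
    # requires CREATE GLOBAL TEMPORARY TABLE instead.
    conn.execute(text(
        "CREATE TEMPORARY TABLE ge_tmp_batch AS "
        "SELECT * FROM network_outreach"
    ))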

@bravefoot

It looks like SQLAlchemy is unsure which schema to execute the statement in.
As a workaround, have you tried adding a schema to the SQLAlchemy connection string as described here: https://docs.snowflake.net/manuals/user-guide/sqlalchemy.html#additional-connection-parameters

Or giving the user you are connecting as a default schema in Snowflake?

@mparayil
Author

I've added the schema to the connection string as well (see below), but that yields the same error when trying to pass the query into batch_kwargs:

conn_url = f"snowflake://{SNOWFLAKE_SERVICE_USERNAME}:{SNOWFLAKE_SERVICE_PASSWORD}@{SNOWFLAKE_ACCOUNT}/{SNOWFLAKE_DSA_DATABASE}/{SNOWFLAKE_DSA_SCHEMA}?warehouse={SNOWFLAKE_DSA_WAREHOUSE}&role={SNOWFLAKE_DSA_ROLE}"

I'll look into using a default schema for the user. As a workaround, I'm following @jcampbell's suggestion of using a PandasDatasource instead of a SqlAlchemyDatasource:

import pandas as pd
from sqlalchemy import create_engine


def sfquery_to_df(query: str, conn_url: str, parse_date_cols=None, chunk_size=1000):
    """Read a Snowflake query into a pandas DataFrame in chunks to limit memory use."""
    engine = create_engine(conn_url)
    conn = engine.connect()
    try:
        chunk_list = []
        for chunk in pd.read_sql_query(query, conn, parse_dates=parse_date_cols,
                                       chunksize=chunk_size):
            chunk_list.append(chunk)
        df = pd.concat(chunk_list)
    finally:
        conn.close()
        engine.dispose()
    return df


rule_df = sfquery_to_df(rule_query, conn_url, chunk_size=10000)
batch_kwargs = {'dataset': rule_df}
batch = context.get_batch('pandas_datasource/in_memory_generator/table_name', 
                          expectation_suite_name=expectation_suite_name,
                          batch_kwargs=batch_kwargs)

@NelsonTorres
Contributor

Should we use the snowflake-sqlalchemy connector for Snowflake connections?
https://docs.snowflake.net/manuals/user-guide/sqlalchemy.html

@jcampbell
Member

@NelsonTorres : yes, that's the right driver!

An update here: this is definitely something we'll have to add a workaround for, since we currently don't have a good way to create the temporary tables using the high-level SQLAlchemy API, so we end up with lots of dialect-specific issues. I'd be happy to merge a PR that adds Snowflake-specific syntax to the create-temporary-table logic of SqlAlchemyDataset, but while we're focusing on 0.9.0 feature updates we haven't been able to do the requisite testing to make one ourselves.

@jcampbell changed the title from "snowflake query unable to create temporary table to load batch data" to "Unable to create temporary table to load batch data" on Feb 4, 2020
@jcampbell
Member

I updated the title because the issue is common outside of Snowflake; we essentially need a better way to detect the backend and adjust the temporary table creation logic (see the sketch after this list):

  • BigQuery already has a specific approach (no temporary table)
  • Oracle needs CREATE GLOBAL TEMPORARY TABLE (per Slack user Ken Geis)
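
A minimal sketch of what that backend dispatch could look like (illustrative only; the helper name, temp table name, and exact DDL strings are assumptions, not the project's actual implementation):

def temp_table_sql(dialect_name: str, table_name: str, query: str) -> str:
    # Hypothetical helper: return the SQL to run for a given backend when
    # materializing a batch from a raw query.
    if dialect_name == "bigquery":
        # BigQuery approach: no temporary table, use the SELECT directly.
        return query
    if dialect_name == "oracle":
        return f"CREATE GLOBAL TEMPORARY TABLE {table_name} AS {query}"
    # Snowflake and many other backends accept this form.
    return f"CREATE TEMPORARY TABLE {table_name} AS {query}"


# Usage: engine.dialect.name identifies the backend at runtime, e.g.
# ddl = temp_table_sql(engine.dialect.name, "ge_tmp_batch", batch_query)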

jcampbell added a commit that referenced this issue Mar 5, 2020
Signed-off-by: James Campbell <james.p.campbell@gmail.com>
jcampbell added a commit that referenced this issue Mar 5, 2020
Signed-off-by: James Campbell <james.p.campbell@gmail.com>
@Aylr
Contributor

Aylr commented Mar 6, 2020

Fixed in 0.9.3

@Aylr Aylr closed this as completed Mar 6, 2020
@armaandhull
Contributor

@Aylr I am using 0.9.3 and the fix in #1123 hasn't worked so far for me using the CLI. Can you provide examples of how you'd pass the table_name and query when using great_expectations suite new?

@jcampbell
Member

@armaandhull : unfortunately, the fix doesn't work with the CLI yet; there is an open PR (#1129) that will update the CLI to take advantage of the change (it also renames the relevant batch_kwarg for declaring a Snowflake table to snowflake_transient_table).
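
For reference, a rough example of what the updated batch_kwargs might look like once that PR lands (assuming the snowflake_transient_table key mentioned above; the table name here is purely illustrative):

batch_kwargs = {
    "query": batch_query,
    # Name of the transient table used to hold the batch data in Snowflake
    # (key name per the change described above; exact semantics may differ).
    "snowflake_transient_table": "ge_tmp_network_outreach",
}
batch = context.get_batch(data_asset_name=normalized_data_asset_name,
                          expectation_suite_name=expectation_suite_name,
                          batch_kwargs=batch_kwargs)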

@jcampbell
Member

@armaandhull : the new 0.9.4 release supports snowflake through the suite new command now, though it is still not fully tested as we're building out our snowflake test setup.

@armaandhull
Contributor

@jcampbell thanks for the update 👍 will try it out with 0.9.4

@sidhreddy

  • Oracle needs CREATE GLOBAL TEMPORARY TABLE (per Slack user Ken Geis)

@jcampbell Is there a plan to fix this for Oracle as well?
