
fix: Enforce fully qualified table names in Snowflake #3423

Conversation

sfc-gh-madkins
Collaborator


What this PR does / why we need it:

Enforces fully qualified Snowflake source table names.

Which issue(s) this PR fixes:

Fixes #3286

Signed-off-by: miles.adkins <miles.adkins@snowflake.com>
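
For context, "fully qualified" means the three-part Snowflake identifier DATABASE.SCHEMA.TABLE. A minimal sketch of the difference, assuming feast's SnowflakeSource constructor; the database/schema/table names are illustrative:

from feast import SnowflakeSource

# Fully qualified: database, schema, and table are all explicit in the name.
qualified = SnowflakeSource(table="FEAST_DB.PUBLIC.DRIVER_STATS")

# Unqualified: a bare table name, which this PR rejects for Snowflake;
# database and schema would otherwise have to come from the offline
# store config or the session context.
unqualified = SnowflakeSource(table="DRIVER_STATS")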
@feast-ci-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sfc-gh-madkins

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sfc-gh-madkins
Collaborator Author

/ok-to-test

@sfc-gh-madkins
Collaborator Author

@cburroughs let me know if this fixes your data lineage issue.

@sfc-gh-madkins
Collaborator Author

sfc-gh-madkins commented Dec 30, 2022

@cburroughs if you look at the failed unit test, you will see that there are other offline store options that only require a table name as well, so my guess is this issue isn't Snowflake-specific.

The reason you don't need fully qualified paths is that some people set their offline store config to the same qualified path as where the table lives, so unqualified names just work.
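
To make that "just works" case concrete, here is a hypothetical sketch of the name resolution; resolve_table is not feast or Snowflake API, only an illustration of how an unqualified name falls back to the configured database and schema:

def resolve_table(table: str, database: str, schema: str) -> str:
    # Snowflake resolves partial identifiers against the session's
    # current database and schema, in effect:
    parts = table.split(".")
    if len(parts) == 3:  # DB.SCHEMA.TABLE: already fully qualified
        return table
    if len(parts) == 2:  # SCHEMA.TABLE: prepend the session database
        return f"{database}.{table}"
    return f"{database}.{schema}.{table}"  # bare table name

# If the offline store config points at the same database/schema the
# table lives in, the unqualified name resolves to the same object:
assert resolve_table("DRIVER_STATS", "FEAST_DB", "PUBLIC") == "FEAST_DB.PUBLIC.DRIVER_STATS"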

Here is the failed test:

# Imports assumed from the feast SDK layout at the time of this PR:
from feast import BigQuerySource, RedshiftSource, SnowflakeSource
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import (
    SparkSource,
)


def test_infer_datasource_names_dwh():
    table = "project.table"
    dwh_classes = [BigQuerySource, RedshiftSource, SnowflakeSource, SparkSource]

    for dwh_class in dwh_classes:
        data_source = dwh_class(table=table)
This PR throws an error for Snowflake because no database name is provided. Looking at the different offline store sources in general: most ask for a query, but some ask for separate database, schema, and table arguments, and some ask for a fully qualified path. Therefore there is not going to be good consistency for the DataHub integration.
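
Concretely, the Snowflake case under this PR looks roughly like this (a sketch; the exact exception type and message are assumed, not quoted from the PR):

import pytest
from feast import SnowflakeSource

# "project.table" has only two parts, so no database is provided and
# the new validation rejects it; this is what fails the shared test:
with pytest.raises(Exception):
    SnowflakeSource(table="project.table")

# A three-part identifier carries the database and passes:
SnowflakeSource(table="FEAST_DB.PUBLIC.DRIVER_STATS")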

Let me know if this makes sense.

@sfc-gh-madkins
Collaborator Author

@cburroughs maybe one last comment here: it looks like the DataHub integration relies on a .yaml file, which is often not used in prod environments.

@cburroughs
Contributor

Thanks! I took a look at the PR. I think it's fine that the Snowflake source accepts input that isn't fully qualified (and it would be a big hassle to work around requiring it to be fully qualified). My original trip-up was that when I ask a SnowflakeSource "hey, what database are you using?" I expect to get the database it would use for a query, whether that is explicit or inferred.
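
In code, that expectation reads roughly as follows (a sketch; the keyword arguments and the database attribute are assumed from feast's SnowflakeSource, and the names are illustrative):

from feast import SnowflakeSource

explicit = SnowflakeSource(database="FEAST_DB", schema="PUBLIC", table="DRIVER_STATS")
inferred = SnowflakeSource(table="FEAST_DB.PUBLIC.DRIVER_STATS")

# Whether the database was passed explicitly or inferred from the
# fully qualified table name, asking the source should give the same
# answer: the database it would actually use for a query.
assert explicit.database == "FEAST_DB"
assert inferred.database == "FEAST_DB"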


Successfully merging this pull request may close these issues.

reconstitute (Snowflake) DataSources without "duplicate" database declarations