
Snowflake import backend #584

Merged · 52 commits · Jan 20, 2023

Conversation

@naterush (Contributor) commented Dec 21, 2022

Description

  • Adds the validate_snowflake_credentials and get_available_snowflake_options_and_defaults API calls, and adds the snowflake_import step performer.
  • Adds the snowflake-connector-python package to our test dependencies. It is only available on Python 3.7 and later, so we take special care not to break our Python 3.6 tests (see the sketch after this list).
  • Creates new GitHub secrets for connecting to our Snowflake testing instance, and updates our GitHub workflows to use those secrets in testing.
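
A minimal sketch of how such a version/dependency guard commonly looks in pytest (the actual tests may gate this differently):

import pytest

try:
    import snowflake.connector
    SNOWFLAKE_AVAILABLE = True
except ImportError:
    # Not installed, e.g. on Python 3.6, where the package isn't supported.
    SNOWFLAKE_AVAILABLE = False

@pytest.mark.skipif(not SNOWFLAKE_AVAILABLE, reason='snowflake-connector-python requires Python 3.7+')
def test_snowflake_connector_importable():
    assert snowflake.connector is not None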

Testing

This is intended to be a backend-only PR, so see the tests.

Last week I invited you to join our testing Snowflake environment. You should be able to add those credentials to a .env file and use them for testing. Otherwise, give me a call and I can give you the pytest credentials.
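
For example, a test helper might pull those credentials out of the environment like this (the variable names below are hypothetical -- use whatever the .env file actually defines; the dict keys match the credentials dict used by the step performer):

import os

def get_test_snowflake_credentials():
    # Hypothetical environment variable names, loaded from .env or CI secrets.
    return {
        'username': os.environ['SNOWFLAKE_USERNAME'],
        'password': os.environ['SNOWFLAKE_PASSWORD'],
        'account': os.environ['SNOWFLAKE_ACCOUNT'],
    }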

Documentation

Not yet.


@aarondr77 (Member):

Note: For the purposes of this PR I set up a testing Snowflake instance; it contains only the default getting-started data. Because I accidentally committed my credentials, I will delete that instance after this PR, and we can set up a new instance for anything else we want to do.

@aarondr77 (Member):

WE NEED TO REMOVE [ActionEnum.SNOWFLAKEIMPORT]: before merging. Will do as the last step after you review.

@aarondr77 (Member):

Two callouts for you to pay special attention to:

  1. One of the Snowflake tables we're using in testing returns a dataframe with int8 columns, rather than the default int64 columns you get when creating a new dataframe. I'm not sure how this interacts with the rest of our type system; I think we should test that further in a second PR and fix it if required, but there's no sense in complicating this PR. See line 255 of test_snowflake_import.py.

  2. See the comment at the top of types.py about conditionally creating the SnowflakeConnection type for mypy. I'm not sure how else to create this type, but feel free to give it a shot if you want to (one common pattern is sketched after this list).
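
For reference, the usual way to give mypy a type for an optional dependency is a TYPE_CHECKING guard. This is only a sketch of that pattern, assuming snowflake.connector exports SnowflakeConnection; it isn't necessarily what types.py ends up doing:

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Only evaluated by mypy, never at runtime, so the package can stay a
    # Python 3.7+ / test-only dependency.
    from snowflake.connector import SnowflakeConnection
else:
    # At runtime, fall back to Any so annotations that reference the name
    # still resolve.
    SnowflakeConnection = Any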

@naterush (Contributor, Author):

I think it should interact with our type system mostly fine!

@naterush (Contributor, Author) left a comment:

A few more things to clean up; I'll hop on them.

(Resolved review thread on .github/workflows/test-mitosheet-python.yml)
Comment on lines 105 to 110
table_loc_and_warehouse: SnowflakeTableLocationAndWarehouse = get_param(params, 'table_loc_and_warehouse')
connection_params_dict = get_connection_param_dict(get_param(params, 'credentials'), table_loc_and_warehouse)

query_params: SnowflakeQueryParams = get_param(params, 'query_params')
table = table_loc_and_warehouse['table']
sql_query = create_query(table, query_params)
@naterush (Contributor, Author):

There's one final refactor here. You should:

  1. Return these variables (namely, connection_params_dict and sql_query) in the execution data above.
  2. Get them from the execution data in this function.

This means none of the work has to be redone -- which again reduces the chance that we introduce silly bugs by forgetting to change things.
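
Generically, the suggested pattern looks like this (the names, signatures, and execution-data keys below are illustrative only, not the actual step performer API):

from typing import Any, Dict, Optional

def execute(params: Dict[str, Any]) -> Dict[str, Any]:
    # Compute the derived values exactly once...
    sql_query = 'SELECT * FROM ' + params['table']
    connection_params_dict = {'user': params['username'], 'account': params['account']}
    # ...use them to run the import here, then hand them forward as execution data.
    return {'sql_query': sql_query, 'connection_params_dict': connection_params_dict}

def transpile(params: Dict[str, Any], execution_data: Optional[Dict[str, Any]]) -> str:
    # The code-generation step reads the stored values instead of rebuilding
    # them, so the two code paths can never drift apart.
    assert execution_data is not None
    return 'pd.read_sql({!r}, con=...)'.format(execution_data['sql_query'])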

Comment on lines 127 to 140
username = credentials['username']
password = credentials['password']
account = credentials['account']
warehouse = table_loc_and_warehouse['warehouse']
database = table_loc_and_warehouse['database']
schema = table_loc_and_warehouse['schema']

all_params: Dict[str, str] = {}
all_params['user'] = username
all_params['password'] = password
all_params['account'] = account
all_params['warehouse'] = warehouse
all_params['database'] = database
all_params['schema'] = schema
@naterush (Contributor, Author):

This can be cut in half.
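
For example (a sketch, assuming the same credentials and table_loc_and_warehouse dicts and `from typing import Dict` as above):

# Build the connection params in a single literal instead of unpacking each
# field into a local variable first.
all_params: Dict[str, str] = {
    'user': credentials['username'],
    'password': credentials['password'],
    'account': credentials['account'],
    'warehouse': table_loc_and_warehouse['warehouse'],
    'database': table_loc_and_warehouse['database'],
    'schema': table_loc_and_warehouse['schema'],
}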


# Remove the test file
os.remove(TEST_FILE_PATHS[0])
os.remove(TEST_FILE_PATHS[1])
@naterush (Contributor, Author):

We should add an optimization test as well!

const [availableSnowflakeOptionsAndDefaults, setAvailableSnowflakeOptionsAndDefaults] = useState<AvailableSnowflakeOptionsAndDefaults | undefined>(undefined);
const [refreshDefaultsNumber, setRefreshDefaultsNumber] = useState(0)

const refreshAvailableOptionsAndDefaults = (newParams: SnowflakeImportParams): void => {
@naterush (Contributor, Author):

This, I think, should just be a setParams call. And we should change the names of setParams above. I'll clean this code up a bit.

return await props.mitoAPI.validateSnowflakeCredentials(params.credentials)
}

const getNewParams = (database?: string | null, schema?: string | null, table?: string | null) => {
@naterush (Contributor, Author):

Need not be defined inline

setParams((prevParams) => {
return {
...prevParams,
credentials: {
@naterush (Contributor, Author):

Again, we should use the partial updater function. I'll update this.

@naterush (Contributor, Author):

Interestingly, I found a bug in the setting of the new warehouse while updating to this -- a wrong key was being used. The partial updater function is actually more type-safe than standard React!

@naterush (Contributor, Author):

Also -- just a reminder that const paramsCopy: SnowflakeImportParams = {...params} is not a deep copy if you have nested objects (which we certainly do here).
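
A quick standalone illustration of the difference (plain TypeScript, not the component code itself):

// The spread copies only the top level, so nested objects are shared.
const params = { credentials: { username: 'a', password: 'b', account: 'c' } };
const shallowCopy = { ...params };
shallowCopy.credentials.username = 'changed';
console.log(params.credentials.username); // 'changed' -- the original was mutated too

// structuredClone (window.structuredClone in the browser, global in Node 17+)
// is a deep copy, so nested objects are independent.
const deepCopy = structuredClone(params);
deepCopy.credentials.username = 'changed again';
console.log(params.credentials.username); // still 'changed'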

@naterush (Contributor, Author):

I'm doing some refactoring of the frontend code. I'd love to talk through it at some point. There's a bunch of opportunity for learning here, I think, namely around how to factor functions -- what should be included and what shouldn't.

E.g., the setCredentialsSectionIsOpen function should be called within the validateSnowflakeCredentials section, etc.

@naterush (Contributor, Author):

@aarondr77 ok, a few things to call out:

  1. I moved to using window.structuredClone to create copies. Check that this works in a couple of places -- notably, it is able to handle undefined values! It requires running the setup scripts again, as it upgrades packages.
  2. Check out the frontend refactors. I know this isn't a frontend PR, but there was some very low-hanging fruit that a) cut a bunch of code, b) caught some bugs, and c) generally leads to a much easier-to-follow frontend.

In practice, I was able to remove the effect triggered by the update number entirely and move to just making a function call. This makes things much simpler and easier to follow, and better function names help a lot as well.

Let me know if you want to chat about it at all. If this looks good, let's a) remove the action, and b) merge this guy!
