Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Handle passing of arrays to in statements more efficiently in SQLAlchemy 1.4 and higher #253

Merged
merged 24 commits into from Aug 25, 2021

Conversation

@jimfulton
Copy link
Contributor

@jimfulton jimfulton commented Aug 17, 2021

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #194 馃

@jimfulton jimfulton changed the title feat: Handle passing of arrays to in statements more efficient. feat: Handle passing of arrays to in statements more efficient in SQLAlchemy 1.4 and higher Aug 17, 2021
@jimfulton jimfulton changed the title feat: Handle passing of arrays to in statements more efficient in SQLAlchemy 1.4 and higher feat: Handle passing of arrays to in statements more efficiently in SQLAlchemy 1.4 and higher Aug 17, 2021
@jimfulton jimfulton marked this pull request as ready for review Aug 17, 2021
@jimfulton jimfulton requested a review from as a code owner Aug 17, 2021
@jimfulton
Copy link
Contributor Author

@jimfulton jimfulton commented Aug 17, 2021

For reviewers:

When dealing with an in expression, like sqlalchemy.literal(-1).in_(list(range(99999))) (see test_huge_in), SQLAlchemy wants to turn each element of the passed list into a separate parameter, which is inefficient and breaks for large lists.

BigQuery lets you pass array parameters, which is much better for this case.

See the comment here: https://github.com/googleapis/python-bigquery-sqlalchemy/pull/253/files#diff-efe1987a73ae5b54aa817dc8c70541025df89d9d08601c8ab33fa84ad8c09de7R421-R441

For SQLAlchemy 1.4, it's fairly easy to disable this behavior. (I beat my head against earlier versions and gave up. :)) The only trick is to make sure we call UNNEST on the parameter in SQL. We have to do that in the non-array case too, although it's more complicated.

Finally, in-related tests:

  • Have to expect different SQL for SQLAlchemy 1.4 and earlier versions.
  • For unit tests, because we're using sqlite, we don't get correct results, because sqlite doesn't support arrays, so we only check the SQL generated.

@jimfulton jimfulton requested a review from plamut Aug 17, 2021
def test_select_in_lit(faux_conn, last_query):
faux_conn.execute(sqlalchemy.select([sqlalchemy.literal(1).in_([1, 2, 3])]))
last_query(
"SELECT %(param_1:INT64)s IN UNNEST(%(param_2:INT64)s) AS `anon_1`",
Copy link
Collaborator

@tswast tswast Aug 24, 2021

Should the type of param_2 be ARRAY<INT64>?

Copy link
Contributor Author

@jimfulton jimfulton Aug 24, 2021

No, because the BQ DB API does special handling of arrays.

It sees that we have a scalar type of INT64 and that we have a sequence, and then creates a ArrayQueryParameter.

It happens that since I added struct support, passing ARRAY<INT64> would probably work (because I have to handle structs of arrays). But just usng INT64 works too.

tswast
tswast approved these changes Aug 25, 2021
@jimfulton jimfulton merged commit 7692704 into googleapis:master Aug 25, 2021
9 checks passed
@jimfulton jimfulton deleted the no-expand branch Aug 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

2 participants