Performance issues for queries returning larger arrays #310

Closed
stijnsoetaert opened this issue Apr 16, 2021 · 2 comments · Fixed by #325
Labels
api: spanner Issues related to the googleapis/python-spanner API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@stijnsoetaert

We recently upgraded the google-cloud-spanner library from 1.17.1 to 3.3.0 because the newer version includes a fix for a bug we were experiencing. Unfortunately, queries that return rows with large arrays are now much slower: one specific query went from 4 to 17 seconds, which lengthens calculation times in our pipelines.

We investigated and found that the slowdown comes from _merged_values, which takes longer in the latest version of the library. This function parses the protobuf values using _parse_value_pb. Timing that parsing for the query mentioned above on both versions of the library gives the following result:

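| proto type | # calls | avg execution time v1.17.1 (s) | avg execution time v3.3.0 (s) |
|------------|---------|--------------------------------|-------------------------------|
| BOOL       | 672     | 4.00 x 10^-6                   | 3.87 x 10^-6                  |
| INT64      | 384     | 7.28 x 10^-6                   | 7.14 x 10^-6                  |
| FLOAT64    | 960     | 5.11 x 10^-6                   | 4.40 x 10^-6                  |
| TIMESTAMP  | 480     | 8.12 x 10^-5                   | 8.70 x 10^-5                  |
| DATE       | 672     | 4.57 x 10^-5                   | 4.21 x 10^-5                  |
| STRING     | 2112    | 4.46 x 10^-6                   | 3.93 x 10^-6                  |
| ARRAY      | 1920    | 4.43 x 10^-3                   | 3.96 x 10^-2                  |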

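As the table shows, parsing an array takes roughly nine times longer in v3.3.0 (3.96 x 10^-2 s versus 4.43 x 10^-3 s per call), while the other types are essentially unchanged. That is a problem when the function is called often. This seems similar to another issue in this repo, but the performance fixes mentioned there are already merged.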
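For anyone who wants to collect comparable per-type numbers, here is a minimal sketch of a timing wrapper. It is not the harness used for the measurements above: `TypeTimer` and `parse_value` are hypothetical names, and `parse_value` is only a stand-in for the library's parser so that the snippet runs on its own.

```python
import time
from collections import defaultdict
from functools import wraps


class TypeTimer:
    """Accumulates call counts and total wall-clock time per type label."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.total = defaultdict(float)

    def wrap(self, func, key_of):
        """Return a timed version of func; key_of maps the call args to a label."""
        @wraps(func)
        def timed(*args, **kwargs):
            key = key_of(*args, **kwargs)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.calls[key] += 1
                self.total[key] += time.perf_counter() - start
        return timed

    def report(self):
        """Map each label to (number of calls, average seconds per call)."""
        return {k: (self.calls[k], self.total[k] / self.calls[k]) for k in self.calls}


# Hypothetical stand-in for the real parser (assumption, not library code).
def parse_value(value, type_name):
    if type_name == "ARRAY":
        return [int(v) for v in value]
    return int(value)


timer = TypeTimer()
timed_parse = timer.wrap(parse_value, key_of=lambda value, type_name: type_name)

timed_parse("7", "INT64")
timed_parse([str(i) for i in range(999)], "ARRAY")
timed_parse([str(i) for i in range(999)], "ARRAY")

for type_name, (calls, avg) in timer.report().items():
    print(f"{type_name:8s} calls={calls} avg={avg:.2e}s")
```

If the wrapped parser calls itself recursively for array elements, the ARRAY figure includes the time spent on those elements, so it should be read as inclusive time.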

Environment details

  • OS type and version: macOS Big Sur (version 11.2.3)
  • Python version: Python 3.7.5
  • pip version: pip 21.0.1
  • google-cloud-spanner version: 3.3.0

Steps to reproduce

Execute a query that returns one or more array columns whose arrays contain a large number of elements (for example, 999).
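The step above could be sketched roughly as follows. This is an assumption-laden example, not the query from our pipeline: `build_array_query` and `time_query` are hypothetical helpers, the instance and database IDs are placeholders, and the SQL assumes `GENERATE_ARRAY` is available to synthesize INT64 arrays instead of reading them from a real table.

```python
def build_array_query(rows=100, elements=999):
    """Return SQL yielding `rows` rows, each holding an `elements`-long INT64 array."""
    return (
        f"SELECT GENERATE_ARRAY(1, {elements}) AS big_array "
        f"FROM UNNEST(GENERATE_ARRAY(1, {rows})) AS row_id"
    )


def time_query(instance_id, database_id, sql):
    """Run `sql` in a read-only snapshot and return (row_count, elapsed_seconds)."""
    import time
    # Imported lazily so the SQL builder works without the client library installed.
    from google.cloud import spanner

    database = spanner.Client().instance(instance_id).database(database_id)
    start = time.perf_counter()
    with database.snapshot() as snapshot:
        # Consuming the streamed result set is what triggers value parsing.
        rows = list(snapshot.execute_sql(sql))
    return len(rows), time.perf_counter() - start


print(build_array_query())
```

Calling `time_query("my-instance", "my-database", build_array_query())` under each library version (with credentials configured and placeholder IDs replaced) should give a rough end-to-end comparison.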

@product-auto-label product-auto-label bot added the api: spanner Issues related to the googleapis/python-spanner API. label Apr 16, 2021
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Apr 19, 2021
@larkee larkee added priority: p2 Moderately-important priority. Fix may not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Apr 22, 2021
@larkee
Contributor

larkee commented Apr 26, 2021

Thank you for filing this issue!

I believe that the increased time is coming from the move to proto-plus, which involves additional type conversions. I thought that my previous fix had gotten around this by reverting to the quicker logic using the underlying protobuf, but I guess something is still slowing it down for arrays. I'll need some time to look into this further. I'll try to have an update by the end of this week 👍

@larkee
Contributor

larkee commented Apr 28, 2021

Great news! I believe I have found the cause for the array performance issues and have written a PR. I will try to get it into this week's release 👍
