perf: Unbuffered cursors for large result sets #24365
Conversation
Force-pushed from 41bce62 to ddde583
Force-pushed from ddde583 to 4c10f81
Codecov Report
Attention:
Additional details and impacted files

@@ Coverage Diff @@
## develop #24365 +/- ##
===========================================
- Coverage 62.12% 62.02% -0.11%
===========================================
Files 786 786
Lines 74999 75139 +140
Branches 6422 6422
===========================================
+ Hits 46596 46607 +11
- Misses 24743 24872 +129
Partials 3660 3660
Flags with carried forward coverage won't be shown.
Force-pushed from ab96b60 to f9902c4
Force-pushed from f9902c4 to ff88fa0
* feat: `frappe.db.sql` results as iterator
  Also avoids `self.last_result` holding on to a reference to the large result set.
  (cherry picked from commit 588157d)
  # Conflicts: frappe/database/database.py
* perf: avoid duplicate copies of result set
  When an as_list or as_dict conversion is done, we hold on to the original result set until the next query is performed. This can be HUGE for large queries.
  (cherry picked from commit d5b2706)
* test: add perf test for references (cherry picked from commit 03b6d8a)
* chore: resolve conflict
* perf: Unbuffered cursors for large result sets (#24365)
  (commit body repeats the PR description; see below)

Co-authored-by: Ankush Menat <ankush@frappe.io>
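For illustration, a minimal sketch of the iterator-style API named in the commits above, assuming an `as_iterator=True` keyword argument as the feature title suggests; the table, column, and helper are placeholders, not code from this PR:

```python
import frappe

def sum_outstanding() -> float:
    """Placeholder report helper that streams rows instead of buffering them."""
    total = 0.0
    # With as_iterator=True (the feature added by the commit series above),
    # rows are yielded lazily from an unbuffered server-side cursor instead
    # of being materialised as one large list, so memory use stays flat.
    for (outstanding,) in frappe.db.sql(
        "select outstanding_amount from `tabSales Invoice`",
        as_iterator=True,
    ):
        total += outstanding or 0.0
    return total
```

Because the cursor is unbuffered, the iterator has to be consumed before another query is issued on the same connection.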
# [15.10.0](v15.9.0...v15.10.0) (2024-01-16)

### Bug Fixes

* add a check for `gpg` existing ([f0d65f1](f0d65f1))
* add empty space for notification mark read ([#24276](#24276)) ([e566f51](e566f51))
* check if autoname is prompt before setting __newname ([9f08ab2](9f08ab2))
* collapse sidebar on picking workspace ([#24312](#24312)) ([#24314](#24314)) ([b3ef407](b3ef407))
* convert status field data to String before guessing the style ([#24226](#24226)) ([#24289](#24289)) ([1f5fb04](1f5fb04))
* don't add fallback for child table ([#24105](#24105)) ([1de3db8](1de3db8))
* Error when displaying dashboard with number card using average and sum functions ([#23883](#23883)) ([#24287](#24287)) ([5cc2281](5cc2281))
* Handle edge case while searching in current context ([460e1c2](460e1c2))
* include workspaces without domain restriction ([2f21a76](2f21a76))
* Make as_iterator work when there are no child queries ([55a26bf](55a26bf))
* **minor:** add optional chaining for this.$input ([#24340](#24340)) ([1302f08](1302f08))
* **minor:** check if markdown_preview exists ([#24336](#24336)) ([b512ad9](b512ad9))
* **minor:** increase rate limit for web form ([#24295](#24295)) ([#24297](#24297)) ([f1c139d](f1c139d))
* **minor:** return if no steps are defined ([#24338](#24338)) ([373b0d4](373b0d4))
* misc ([#24303](#24303)) ([#24305](#24305)) ([3d515f2](3d515f2))
* mobile sidebar disappearing ([#24316](#24316)) ([#24342](#24342)) ([b21671b](b21671b))
* **mobile-ui:** tabs should scroll instead of stack ([#24309](#24309)) ([#24311](#24311)) ([fccf204](fccf204))
* **MultiCheck:** Use df.sort_options to enable/disable sort ([#24202](#24202)) ([#24291](#24291)) ([2a87904](2a87904))
* pass parent doctype on dashboard chart ([#24236](#24236)) ([#24238](#24238)) ([5a506dd](5a506dd))
* print perm check logs from DB query (backport [#24263](#24263)) ([#24268](#24268)) ([74eaaa5](74eaaa5))
* **response:** fixup non-ASCII character filenames ([9c6a58e](9c6a58e))
* sanitize html instead of escaping when creating/updating workspace ([#24284](#24284)) ([0be6579](0be6579))
* select field should not have debounce ([dc076e1](dc076e1))
* **sentry:** set scope for background jobs ([ed21f11](ed21f11))
* set correct recipient when replying to own email ([#24256](#24256)) ([#24260](#24260)) ([0b5923f](0b5923f))
* translate show all activity label ([#24363](#24363)) ([#24364](#24364)) ([4d2c3e5](4d2c3e5))
* **UX:** show status indicator in mobile view ([#24306](#24306)) ([#24308](#24308)) ([5940ce5](5940ce5))

### Features

* `frappe.db.sql` results `as_iterator` (backport [#19810](#19810)) ([#24346](#24346)) ([99a3a35](99a3a35)), closes [#24365](#24365)
* Skip locked rows while selecting ([#24298](#24298)) ([#24302](#24302)) ([09ef3d6](09ef3d6))
Backport of the commit series above (cherry picked from commit 99a3a35)
# Conflicts:
# frappe/database/database.py
# frappe/database/mariadb/database.py
# pyproject.toml
feat: `frappe.db.sql` results `as_iterator` (backport #19810) (#24346) (#24562)
* Backport of the commit series above (cherry picked from commit 99a3a35), with conflicts in frappe/database/database.py, frappe/database/mariadb/database.py and pyproject.toml
* chore: resolve conflicts
* chore: remove test for dead functionality

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Ankush Menat <ankush@frappe.io>
# [14.64.0](v14.63.0...v14.64.0) (2024-01-30)

### Bug Fixes

* **Custom Field:** default fieldname in rename fieldname prompt ([#24492](#24492)) ([#24580](#24580)) ([7cdda1e](7cdda1e))
* **grid_row:** sort options based on selected data first, so as to maintain order ([b0e4b19](b0e4b19))
* ignore dead columns in user_settings ([#24572](#24572)) ([#24573](#24573)) ([5d2441d](5d2441d))
* improve translatability of search results ([#24498](#24498)) ([a74ba6c](a74ba6c))
* missing translation in the query popup ([051d622](051d622))
* **mobile:** scroll issue after workspace change ([#24555](#24555)) ([#24585](#24585)) ([7245292](7245292))
* Return empty result if no perm level access (backport [#24591](#24591)) ([#24592](#24592)) ([adcbeee](adcbeee))
* **search:** Fix URL encoding for search result ([#24558](#24558)) ([44ec1e3](44ec1e3))
* sentry minor fix ([#24588](#24588)) ([23f77ef](23f77ef))
* translatability ([#24553](#24553)) ([41d2fe2](41d2fe2))

### Features

* `frappe.db.sql` results `as_iterator` (backport [#19810](#19810)) (backport [#24346](#24346)) ([#24562](#24562)) ([7f3a12b](7f3a12b)), closes [#24365](#24365)

### Reverts

* Revert "fix(data_import): respect the value of show_failed_logs checkbox" ([3c7f494](3c7f494))
If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.
One common use case for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row is not used again
after processing, yet it still consumes memory until the entire function exits.
SSCursor (Server Side Cursor) allows fetching one row at a time.
Note: this is slower than fetching everything at once AND carries a risk of
connection loss, so don't use it as a crutch. If possible, rewrite the
code so the processing is done in SQL.
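For context, this is roughly what an unbuffered cursor looks like at the driver level. A minimal PyMySQL sketch, where the connection parameters and query are placeholders and `pymysql.cursors.SSCursor` is the server-side cursor class the title refers to:

```python
import pymysql
import pymysql.cursors

# Placeholder connection parameters for illustration.
conn = pymysql.connect(
    host="127.0.0.1",
    user="frappe",
    password="secret",
    database="site_db",
    cursorclass=pymysql.cursors.SSCursor,  # unbuffered, server-side cursor
)

try:
    with conn.cursor() as cursor:
        cursor.execute("SELECT name FROM `tabToDo`")
        # Iteration pulls rows from the server one at a time; the client
        # never holds the full result set in memory.
        for row in cursor:
            print(row)
finally:
    conn.close()
```

With SSCursor the server streams rows on demand, and the connection stays busy until the result set is fully consumed or the cursor is closed, which is where the connection-loss caveat above comes from.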
Extends #19810
Closes #18826