perf: Unbuffered cursors for large result sets #24365

Merged
merged 1 commit into from
Jan 16, 2024

Conversation

ankush
Member

@ankush ankush commented Jan 15, 2024

If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.
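The buffered-versus-streaming difference can be sketched with stdlib `sqlite3` (illustrative only; this PR targets MySQL/MariaDB, where the streaming behaviour comes from pymysql's `SSCursor`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(5)])

# Buffered: fetchall() materializes the whole result set in memory at once.
buffered = conn.execute("select id from t order by id").fetchall()

# Unbuffered-style: iterating the cursor pulls one row at a time,
# so only the current row needs to be resident while processing.
streamed = [row[0] for row in conn.execute("select id from t order by id")]
```

Both produce the same rows; only the peak memory profile differs, which is exactly what the measurements below show.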

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
  1008    104.5 MiB    104.5 MiB           1    @profile
  1009                                          def test_run_memory_profile(self):
  1010    104.5 MiB      0.0 MiB           1            frappe.db.sql("select * from `tabGL Entry` limit 1")  # warmup
  1011    195.5 MiB     91.0 MiB       50001            for gl in frappe.db.sql("select * from `tabGL Entry` order by modified limit 50000", as_dict=True, as_iterator=True):
  1012    195.5 MiB      0.0 MiB       50000                    continue  # consume iterator
  1013
  1014    109.0 MiB    -86.5 MiB           1            pass # notice drop due to gc trigger

After:

Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
  1013    104.5 MiB    104.5 MiB           1    @profile
  1014                                          def test_reads(self):
  1015    105.0 MiB      0.0 MiB           2            with frappe.db.unbuffered_cursor():
  1016    104.5 MiB      0.0 MiB           1                    frappe.db.sql("select * from `tabGL Entry` limit 1")  # warmup
  1017    105.0 MiB      0.5 MiB       50002                    for gl in frappe.db.sql(
  1018    104.5 MiB      0.0 MiB           1                            "select * from `tabGL Entry` order by modified limit 50000", as_dict=True, as_iterator=True
  1019                                                          ):
  1020    105.0 MiB      0.0 MiB       50000                            continue  # just consume the iterator
  1021    105.0 MiB      0.0 MiB           1                    pass

Extends #19810
Closes #18826
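For reference, a minimal sketch of what `as_iterator=True` semantics look like (a hypothetical helper, not the actual frappe implementation): a generator converts and yields one row at a time, so no full Python list of results is ever materialized:

```python
def sql_as_iterator(cursor, columns, as_dict=False):
    """Yield rows one at a time, optionally converted to dicts.

    `cursor` can be any iterable of row tuples (e.g. a server-side
    cursor); nothing is kept alive beyond the current row.
    """
    for row in cursor:
        yield dict(zip(columns, row)) if as_dict else row

# Consumption looks the same as iterating a list, but memory stays flat:
rows = iter([(1, "INV-001"), (2, "INV-002")])
first = next(sql_as_iterator(rows, ("idx", "name"), as_dict=True))
```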

@ankush ankush marked this pull request as ready for review January 15, 2024 14:58
@ankush ankush requested review from a team and surajshetty3416 and removed request for a team January 15, 2024 14:58

codecov bot commented Jan 15, 2024

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (03b6d8a) 62.12% compared to head (ab96b60) 62.02%.
Report is 10 commits behind head on develop.

❗ Current head ab96b60 differs from pull request most recent head ff88fa0. Consider uploading reports for the commit ff88fa0 to get more accurate results

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #24365      +/-   ##
===========================================
- Coverage    62.12%   62.02%   -0.11%     
===========================================
  Files          786      786              
  Lines        74999    75139     +140     
  Branches      6422     6422              
===========================================
+ Hits         46596    46607      +11     
- Misses       24743    24872     +129     
  Partials      3660     3660              
Flag Coverage Δ
server 70.93% <87.50%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.

If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.
@ankush ankush merged commit a2525e5 into frappe:develop Jan 16, 2024
20 checks passed
@ankush ankush deleted the unbuffered_queries branch January 16, 2024 05:30
ankush added a commit that referenced this pull request Jan 16, 2024
If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.
ankush added a commit that referenced this pull request Jan 16, 2024
* feat: `frappe.db.sql` results as iterator

- Also avoid self.last_result that holds on to large result set reference.

(cherry picked from commit 588157d)

# Conflicts:
#	frappe/database/database.py

* perf: avoid duplicate copies of result set

When as_list, as_dict is done we hold on to original result set until
next query is performed. This can be HUGE for large queries.

(cherry picked from commit d5b2706)

* test: add perf test for references

(cherry picked from commit 03b6d8a)

* chore: conflict

* perf: Unbuffered cursors for large result sets (#24365)

If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.

---------

Co-authored-by: Ankush Menat <ankush@frappe.io>
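The "avoid duplicate copies of result set" commit above can be illustrated with a sketch (a hypothetical helper, not the frappe code itself): convert rows while releasing the raw tuples, instead of holding the raw list and the converted list alive at the same time:

```python
def rows_as_dicts(raw_rows, columns):
    # Pop from the raw list while building the converted list, so each
    # raw tuple becomes garbage-collectable as soon as it is converted,
    # rather than both full lists coexisting until the next query.
    result = []
    while raw_rows:
        result.append(dict(zip(columns, raw_rows.pop())))
    result.reverse()  # pop() consumed the list back-to-front
    return result
```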
frappe-pr-bot pushed a commit that referenced this pull request Jan 16, 2024
# [15.10.0](v15.9.0...v15.10.0) (2024-01-16)

### Bug Fixes

* add a check for `gpg` existing ([f0d65f1](f0d65f1))
* add empty space for notification mark read ([#24276](#24276)) ([e566f51](e566f51))
* check if autoname is promt before setting __newname ([9f08ab2](9f08ab2))
* collapse sidebar on picking workspace ([#24312](#24312)) ([#24314](#24314)) ([b3ef407](b3ef407))
* convert status field data to String before guessing the style ([#24226](#24226)) ([#24289](#24289)) ([1f5fb04](1f5fb04))
* don't add fallback for child table ([#24105](#24105)) ([1de3db8](1de3db8))
* Error when displaying dashboard with number card using average and sum functions ([#23883](#23883)) ([#24287](#24287)) ([5cc2281](5cc2281))
* Handle edge case while searching in current context ([460e1c2](460e1c2))
* include workspaces without domain restriction ([2f21a76](2f21a76))
* Make as_iterator work when there are no child queries ([55a26bf](55a26bf))
* **minor:** add optional chaining for this.$input ([#24340](#24340)) ([1302f08](1302f08))
* **minor:** check if markdown_preview exists ([#24336](#24336)) ([b512ad9](b512ad9))
* **minor:** increase rate limit for web form ([#24295](#24295)) ([#24297](#24297)) ([f1c139d](f1c139d))
* **minor:** return if no steps are defined. ([#24338](#24338)) ([373b0d4](373b0d4))
* misc ([#24303](#24303)) ([#24305](#24305)) ([3d515f2](3d515f2))
* mobile sidebar disappearing ([#24316](#24316)) ([#24342](#24342)) ([b21671b](b21671b))
* **mobile-ui:** tabs should scroll instead of stack ([#24309](#24309)) ([#24311](#24311)) ([fccf204](fccf204))
* **MultiCheck:** Use df.sort_options to enable/disable sort ([#24202](#24202)) ([#24291](#24291)) ([2a87904](2a87904))
* pass parent doctype on dashboard chart ([#24236](#24236)) ([#24238](#24238)) ([5a506dd](5a506dd))
* print perm check logs from DB query (backport [#24263](#24263)) ([#24268](#24268)) ([74eaaa5](74eaaa5))
* **response:** fixup non-ASCII character filenames ([9c6a58e](9c6a58e))
* sanitize html instead of escaping when creating/updating workspace ([#24284](#24284)) ([0be6579](0be6579))
* select field should not have debounce ([dc076e1](dc076e1))
* **sentry:** set scope for background jobs ([ed21f11](ed21f11))
* set correct recipient when reply to own email ([#24256](#24256)) ([#24260](#24260)) ([0b5923f](0b5923f))
* translate show all activity label ([#24363](#24363)) ([#24364](#24364)) ([4d2c3e5](4d2c3e5))
* **UX:** show status indicator in moblie view ([#24306](#24306)) ([#24308](#24308)) ([5940ce5](5940ce5))

### Features

* `frappe.db.sql` results `as_iterator` (backport [#19810](#19810)) ([#24346](#24346)) ([99a3a35](99a3a35)), closes [#24365](#24365)
* Skip locked rows while selecting ([#24298](#24298)) ([#24302](#24302)) ([09ef3d6](09ef3d6))
mergify bot added a commit that referenced this pull request Jan 28, 2024
* feat: `frappe.db.sql` results as iterator

- Also avoid self.last_result that holds on to large result set reference.

(cherry picked from commit 588157d)

# Conflicts:
#	frappe/database/database.py

* perf: avoid duplicate copies of result set

When as_list, as_dict is done we hold on to original result set until
next query is performed. This can be HUGE for large queries.

(cherry picked from commit d5b2706)

* test: add perf test for references

(cherry picked from commit 03b6d8a)

* chore: conflict

* perf: Unbuffered cursors for large result sets (#24365)

If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.

---------

Co-authored-by: Ankush Menat <ankush@frappe.io>
(cherry picked from commit 99a3a35)

# Conflicts:
#	frappe/database/database.py
#	frappe/database/mariadb/database.py
#	pyproject.toml
ankush added a commit that referenced this pull request Jan 29, 2024
#24346) (#24562)

* feat: `frappe.db.sql` results `as_iterator` (backport #19810) (#24346)

* feat: `frappe.db.sql` results as iterator

- Also avoid self.last_result that holds on to large result set reference.

(cherry picked from commit 588157d)

# Conflicts:
#	frappe/database/database.py

* perf: avoid duplicate copies of result set

When as_list, as_dict is done we hold on to original result set until
next query is performed. This can be HUGE for large queries.

(cherry picked from commit d5b2706)

* test: add perf test for references

(cherry picked from commit 03b6d8a)

* chore: conflict

* perf: Unbuffered cursors for large result sets (#24365)

If you're reading thousands of rows from MySQL, the default behaviour is to
read all of them into memory at once.

One of the use cases for reading large result sets is reporting, where a lot
of data is read and then processed in Python. Each row, however, is not used
again but still consumes memory until the entire function exits.

SSCursor (Server-Side Cursor) allows fetching one row at a time.

Note: This is slower than fetching everything at once AND risks connection
loss. So, don't use it as a crutch. If possible, rewrite the code so
processing is done in SQL.

---------

Co-authored-by: Ankush Menat <ankush@frappe.io>
(cherry picked from commit 99a3a35)

# Conflicts:
#	frappe/database/database.py
#	frappe/database/mariadb/database.py
#	pyproject.toml

* chore: conflicts

* chore: remove test for dead functionality

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Ankush Menat <ankush@frappe.io>
frappe-pr-bot pushed a commit that referenced this pull request Jan 30, 2024
# [14.64.0](v14.63.0...v14.64.0) (2024-01-30)

### Bug Fixes

* **Custom Field:** default fieldname in rename fieldname prompt ([#24492](#24492)) ([#24580](#24580)) ([7cdda1e](7cdda1e))
* **grid_row:** sort options based on selected data first, so as to maintain order ([b0e4b19](b0e4b19))
* ignore dead columns in user_settings ([#24572](#24572)) ([#24573](#24573)) ([5d2441d](5d2441d))
* improve translatability of search results ([#24498](#24498)) ([a74ba6c](a74ba6c))
* Missing traduction in the query popup ([051d622](051d622))
* **mobile:** scroll issue after workspace change ([#24555](#24555)) ([#24585](#24585)) ([7245292](7245292))
* Return empty result if no perm level access (backport [#24591](#24591)) ([#24592](#24592)) ([adcbeee](adcbeee))
* **search:** Fix URL encoding for search result ([#24558](#24558)) ([44ec1e3](44ec1e3))
* sentry minor fix ([#24588](#24588)) ([23f77ef](23f77ef))
* translatability ([#24553](#24553)) ([41d2fe2](41d2fe2))

### Features

* `frappe.db.sql` results `as_iterator` (backport [#19810](#19810)) (backport [#24346](#24346)) ([#24562](#24562)) ([7f3a12b](7f3a12b)), closes [#24365](#24365)

### Reverts

* Revert "fix(data_import): respect the value of show_failed_logs checkbox" ([3c7f494](3c7f494))
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 1, 2024
Development

Successfully merging this pull request may close these issues.

Iterator support for db.sql