fix: paginate reading log query when no shelf is selected #12499
AhmedxSaid wants to merge 1 commit into internetarchive:master
When bookshelf_id is falsy (all-books view), the query fetched every book for the user with no LIMIT or OFFSET, causing unbounded memory and slow Solr get_many calls for users with large reading logs. Also fixes total_results returning 0 in this case — it now sums across all shelves instead of looking up a None key. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thank you for submitting this PR, @AhmedxSaid! 🤖 Copilot has been assigned for an initial review. A reviewer must first be assigned. There are currently 74 open PRs of equal or higher priority ahead of yours.

Note: This comment was automatically generated by Pam, Open Library's Project AI Manager, on behalf of @mekarpeles. Pam is designed to provide status visibility, perform basic project management functions and relevant codebase research, and provide actionable feedback so contributors aren't left waiting.
Pull request overview
Fixes performance and pagination correctness for the “All” (no-shelf-selected) reading log view by ensuring the DB query remains paginated and totals reflect all shelves.
Changes:
- Add `ORDER BY created … LIMIT … OFFSET …` to the all-shelves `bookshelves_books` query to prevent unbounded reads.
- Update `total_results` logic to compute totals across shelves when no specific shelf is selected.
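
The effect of the first change can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: it uses the standard-library `sqlite3` module with `:name` placeholders instead of Open Library's database layer and its `$name` placeholders, a simplified `bookshelves_books` schema, and a hypothetical `fetch_page` helper. The `DESC` direction and the `work_id` tiebreaker are also assumptions, since the overview elides the full `ORDER BY` clause.

```python
import sqlite3


def fetch_page(conn, username, page, limit):
    """Return one page of a user's logged books across all shelves.

    Hypothetical helper for illustration only.
    """
    offset = (page - 1) * limit
    return conn.execute(
        "SELECT work_id, created, edition_id FROM bookshelves_books "
        "WHERE username = :username "
        # Assumed ordering; a tiebreaker keeps pages stable when
        # several rows share the same `created` value.
        "ORDER BY created DESC, work_id DESC "
        "LIMIT :limit OFFSET :offset",
        {"username": username, "limit": limit, "offset": offset},
    ).fetchall()


# Simplified stand-in for the real table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookshelves_books ("
    "username TEXT, work_id INTEGER, bookshelf_id INTEGER, "
    "edition_id INTEGER, created TEXT)"
)
rows = [("reader", i, 1 + i % 3, None, f"2024-01-{i:02d}") for i in range(1, 11)]
conn.executemany("INSERT INTO bookshelves_books VALUES (?, ?, ?, ?, ?)", rows)

first_page = fetch_page(conn, "reader", page=1, limit=4)
last_page = fetch_page(conn, "reader", page=3, limit=4)
```

With ten rows and a limit of four, each call reads at most four rows regardless of how large the user's log grows, so only one page of keys would need to be passed on to the Solr lookup.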
      cls.add_storage_items_for_deletes(reading_log_keys, solr_docs)

    - total_results = shelf_totals.get(bookshelf_id, 0)
    + total_results = shelf_totals.get(bookshelf_id) or sum(shelf_totals.values())
`total_results = shelf_totals.get(bookshelf_id) or sum(...)` will return the sum of all shelves whenever the selected shelf count is 0 or missing from `shelf_totals` (e.g., a user has no books on that shelf). That makes pagination totals incorrect for empty shelves. Consider branching on `bookshelf_id` (all-shelves vs specific shelf) and using a default of 0 for the specific-shelf case instead of relying on truthiness.
Suggested change:

    - total_results = shelf_totals.get(bookshelf_id) or sum(shelf_totals.values())
    + if not bookshelf_id:
    +     total_results = sum(shelf_totals.values())
    + else:
    +     total_results = shelf_totals.get(bookshelf_id, 0)
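
The difference between the two patterns can be checked in isolation. The sketch below wraps each one in a hypothetical function (the names `total_with_or` and `total_with_branch` and the sample `shelf_totals` values are made up for illustration) and compares them on an empty shelf, a missing shelf, and the all-shelves case.

```python
def total_with_or(shelf_totals, bookshelf_id):
    # Pattern from the diff: falls back to the sum whenever the
    # lookup result is falsy, which includes a legitimate count of 0.
    return shelf_totals.get(bookshelf_id) or sum(shelf_totals.values())


def total_with_branch(shelf_totals, bookshelf_id):
    # Pattern from the suggestion: decide on bookshelf_id itself,
    # not on the truthiness of the looked-up count.
    if not bookshelf_id:
        return sum(shelf_totals.values())
    return shelf_totals.get(bookshelf_id, 0)


# Hypothetical per-shelf counts: shelf 2 exists but is empty,
# and shelf 4 is absent from the mapping.
shelf_totals = {1: 12, 2: 0, 3: 30}

empty_shelf = (total_with_or(shelf_totals, 2), total_with_branch(shelf_totals, 2))
missing_shelf = (total_with_or(shelf_totals, 4), total_with_branch(shelf_totals, 4))
all_shelves = (total_with_or(shelf_totals, None), total_with_branch(shelf_totals, None))
```

Both versions agree when no shelf is selected, but only the branching version reports 0 for an empty or missing shelf; the `or` version reports the total across every shelf.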
      # unrelated / not fixing in this PR.
      query_params = {"username": username}
      query = (
          "SELECT * from bookshelves_books WHERE username=$username "
In the all-shelves branch this query uses `SELECT *`, but downstream only relies on `work_id`, `created`, and `edition_id`. Selecting only the needed columns would reduce DB I/O and memory use, and keeps this query consistent with the shelf-specific query above.
Suggested change:

    - "SELECT * from bookshelves_books WHERE username=$username "
    + "SELECT work_id, created, edition_id from bookshelves_books WHERE username=$username "
Problem

When a user views their reading log without selecting a specific shelf (the "All" view), `get_sorted_reading_log_books()` in `bookshelves.py` overwrites the query with one that has no `LIMIT` or `OFFSET`.

This fetches every book the user has ever logged into memory at once, then passes all those keys to a Solr `get_many` call. For power users with thousands of books this causes significant memory pressure and slow or failed page loads, and it gets worse as their library grows.

The `total_results` calculation also had a related bug: `shelf_totals.get(bookshelf_id, 0)` returns `0` when `bookshelf_id` is falsy, so pagination controls showed incorrect totals.

Fix

- Add `LIMIT $limit OFFSET $offset` to the unbounded query, preserving the existing `limit`/`offset` from `query_params`
- Fix `total_results` to sum across all shelves when no specific shelf is selected

Test plan
🤖 Generated with Claude Code