Skip to content

fix: paginate reading log query when no shelf is selected#12499

Open
AhmedxSaid wants to merge 1 commit intointernetarchive:masterfrom
AhmedxSaid:fix/reading-log-unbounded-query
Open

fix: paginate reading log query when no shelf is selected#12499
AhmedxSaid wants to merge 1 commit intointernetarchive:masterfrom
AhmedxSaid:fix/reading-log-unbounded-query

Conversation

@AhmedxSaid
Copy link
Copy Markdown
Contributor

Problem

When a user views their reading log without selecting a specific shelf (the "All" view), get_sorted_reading_log_books() in bookshelves.py overwrites the query with one that has no LIMIT or OFFSET:

if not bookshelf_id:
    query = "SELECT * from bookshelves_books WHERE username=$username"
    # XXX Removing limit, offset, etc from data looks like a bug
    # unrelated / not fixing in this PR.
    query_params = {"username": username}

This fetches every book the user has ever logged into memory at once, then passes all those keys to a Solr get_many call. For power users with thousands of books this causes significant memory pressure and slow/failed page loads — and gets worse as their library grows.

The total_results calculation also had a related bug: shelf_totals.get(bookshelf_id, 0) returns 0 when bookshelf_id is falsy, so pagination controls showed incorrect totals.

Fix

  • Add LIMIT $limit OFFSET $offset to the unbounded query, preserving the existing limit/offset from query_params
  • Fix total_results to sum across all shelves when no specific shelf is selected

Test plan

  • View reading log "All" tab as a user with many books — verify only one page loads
  • Paginate through the reading log — verify correct pages are returned
  • View a specific shelf — verify existing behavior unchanged
  • Verify total book count shown in pagination is correct for the "All" view

🤖 Generated with Claude Code

When bookshelf_id is falsy (all-books view), the query fetched every
book for the user with no LIMIT or OFFSET, causing unbounded memory
and slow Solr get_many calls for users with large reading logs.

Also fixes total_results returning 0 in this case — it now sums
across all shelves instead of looking up a None key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@mekarpeles mekarpeles requested a review from Copilot April 30, 2026 17:11
@mekarpeles mekarpeles added the Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed] label Apr 30, 2026
@mekarpeles
Copy link
Copy Markdown
Member

Thank you for submitting this PR, @AhmedxSaid!

🤖 Copilot has been assigned for an initial review.

A reviewer must first be assigned. There are currently 74 open PRs of equal or higher priority ahead of yours.

Possible improvements for this PR

  • No issue reference — for non-trivial logic changes, please link the related GitHub issue (e.g. Closes #1234) so reviewers have the full context on why the change is needed.
  • Test cases — the change touches substantive logic in bookshelves.py but no test files are included. Consider adding or updating automated tests to cover the pagination and total count behavior.
  • Proof of testing — the test plan checkboxes are all unchecked. Please fill them in with your actual results, or add a brief note describing what you ran and what you observed.
PR triage checklist (maintainers / Pam)
  • PR description — not empty; explains what the change does and how to verify it
  • References an issue — PR body contains a #NNN reference
    • Linked issue is triaged — has a Priority: * label (not just Needs: Triage)
    • Linked issue is assigned — has at least one assignee
  • Commit history clean — no WIP/fixup/conflict noise; commit messages are meaningful
  • CI passing — no failing check-runs
  • Test cases present — if the change touches substantive logic, test coverage exists or is explained
  • Proof of testing — PR body includes a description of what was tested, a screenshot, or a video

Note

This comment was automatically generated by Pam, Open Library's Project AI Manager, on behalf of @mekarpeles. Pam is designed to provide status visibility, perform basic project management functions and relevant codebase research, and provide actionable feedback so contributors aren't left waiting.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes performance and pagination correctness for the “All” (no-shelf-selected) reading log view by ensuring the DB query remains paginated and totals reflect all shelves.

Changes:

  • Add ORDER BY created … LIMIT … OFFSET … to the all-shelves bookshelves_books query to prevent unbounded reads.
  • Update total_results logic to compute totals across shelves when no specific shelf is selected.

cls.add_storage_items_for_deletes(reading_log_keys, solr_docs)

total_results = shelf_totals.get(bookshelf_id, 0)
total_results = shelf_totals.get(bookshelf_id) or sum(shelf_totals.values())
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total_results = shelf_totals.get(bookshelf_id) or sum(...) will return the sum of all shelves whenever the selected shelf count is 0 or missing from shelf_totals (e.g., a user has no books on that shelf). That makes pagination totals incorrect for empty shelves. Consider branching on bookshelf_id (all-shelves vs specific shelf) and using a default of 0 for the specific-shelf case instead of relying on truthiness.

Suggested change
total_results = shelf_totals.get(bookshelf_id) or sum(shelf_totals.values())
if not bookshelf_id:
total_results = sum(shelf_totals.values())
else:
total_results = shelf_totals.get(bookshelf_id, 0)

Copilot uses AI. Check for mistakes.
# unrelated / not fixing in this PR.
query_params = {"username": username}
query = (
"SELECT * from bookshelves_books WHERE username=$username "
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the all-shelves branch this query uses SELECT *, but downstream only relies on work_id, created, and edition_id. Selecting only the needed columns would reduce DB I/O and memory use, and keeps this query consistent with the shelf-specific query above.

Suggested change
"SELECT * from bookshelves_books WHERE username=$username "
"SELECT work_id, created, edition_id from bookshelves_books WHERE username=$username "

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Needs: Submitter Input Waiting on input from the creator of the issue/pr [managed]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants