Skip to content

Adds ability to search/filter one's reading log #7052

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

scottbarnes
Copy link
Collaborator

@scottbarnes scottbarnes commented Oct 4, 2022

Closes #5080. This may need significantly more work, however.

Feature. Adds basic capacity to filter/search one's reading log.

Technical

What this does, in theory

This draft PR likely needs some work, but before getting the problems, first comes an explanation of how this theoretically works:

  • From the Reading Log one can enter a 3+ character search term and search (1) up to 1000 of their logged books to see (2) up to 100 matching results within that 1000.
  • Currently, searching is not limited to title or author, but rather the defaults from do_search. This would be easy enough to change I think by switching to run_solr_query (unless it's possible to specify fields in do_search.

Issues

  • Because I could not figure out a good way to filter the results, there is the possibility for a fairly hefty pair of queries to occur. When a user enters a query of 3+ characters, this:
    1. requests up to 1000 of the user's logged books from the current shelf (e.g. "Want to Read") via get_users_logged_books in core/bookshelves.py;
    2. searches Solr for those work IDs and any matches for the search term (currently not limited to author/title); and
    3. returns only logged books that match the work IDs in the Solr search results -- meaning that only logged works that match the search term are returned and passed on to process_logged_books in upstream/mybooks.py, etc.
  • This requires new page loads for each search.
  • I have not tested this with more than 29 books in one shelf.
  • To avoid a circular imports issue in bookshelves.py, do_search is conditionally imported. This likely will need a better solution, though I didn't put any time into it on the theory that maybe this implementation has no future.
  • There is no pagination. The number of results on a page is limited to the number of filtered results, which is up to 100. Owing to the need to do two queries and 'merge' the results, I couldn't figure out a way to use query offsets effectively (which isn't to say there isn't an easy and obvious way). Additionally, I was thinking that if this is limited to a certain number of characters and/or title/author, not many searches would go near 100 results. I am unsure of the performance implications of this, or the reality of my supposition.

Testing

  • Visit the Reading Log and try filtering via the search box for "Currently Reading", "Want to Read", and "Already Read".
  • Verify that the results appear correct and that sorting by date added works.
  • Try searching for <= 2 characters and see if that does anything, both through the search box and via a GET request in the URL bar. With the direct GET, the search should just be ignored.

Screenshot

Already Read, without any filtering:
image

Searching for 'twain':
image

Same search, but with the sort order showing oldest-added first:
image

Searching a shelf of 29 books for the phrase "web":
image

A not very good screenshot of trying to enter < 3 characters into the search:
IMG_1516

Doing a GET request for a search of <= 2 characters just returns everything on the shelf, with the usual pagination and no filtering. (Though there is no indication that the search didn't work, aside from not mentioning results):
image

Thanks to

@cdrini for walking me through a lot of it, and @mekarpeles for the detailed explanation of the steps.

Stakeholders

@mekarpeles, @cdrini

@cdrini cdrini self-assigned this Oct 4, 2022
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach looks solid to me :P We'll need to provide all the data from the db to the endpoints anyways. The only potential downside is that there are now two solr requests, but we can speed that up by limiting the info we get back here. And honestly if that means having to update half the codebase so that it expects solr docs, then your approach is definitely the better solution! This looks good to me! Just need to do some testing :)

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from 5bcc6ec to e0d85e4 Compare October 5, 2022 18:18
@scottbarnes
Copy link
Collaborator Author

scottbarnes commented Oct 5, 2022

Thank you for all the suggested changes, @cdrini. I have updated the PR to include them. Additionally I've addressed, I believe, the bug where filtering the reading log was not working when a user was not logged in. (I had simply failed to pass q to that code path. Go figure!)

I will address pagination in a separate PR.

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from e0d85e4 to 1395bc3 Compare October 5, 2022 18:28
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looking good! One little change. Pushing to testing afresh for QA!

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from 1395bc3 to 8227d86 Compare October 5, 2022 20:01
@mekarpeles mekarpeles mentioned this pull request Oct 8, 2022
12 tasks
@scottbarnes
Copy link
Collaborator Author

scottbarnes commented Oct 10, 2022

@cdrini

I've updated the draft PR and if possible I would like to try it on testing.

Some places that deserve special attention:

  • I'm a bit worried about my change to line 6 of SearchResultsWork.html. I think I tested all references, but I want to draw attention to this specifically as it could change how documents are interpreted.
  • The output of readinglog_stats is suspect. I know for sure I am not getting the right data out of Solr for "Works by People", and I think my entire parsing of doc.get("seed") in the SearchResponse.docs is probably wrongheaded.
  • I think the strategy to get the ratings through to the template is obtuse.
  • The JSON representation of the reading log should work now, but this also deserves extra attention.

Other things of note:

  • There is much less querying of the databases now. Rather than 3-4 queries for the same data in different forms, it's usually just one query (in addition to the query to the reading log db) to Solr in get_users_logged_books in bookshelves.py, with that query response data passed through to the templates.
  • Pagination should work for search results (and not be broken for non-searches).
  • The limit for search queries is just over 2000 books; there is no limit on the number of matches (I think...).

Things that remain:

  • There is not a yet a notification for people who have more books than the limit. I simply forgot about it until looking at my notes just now, and I figured I should commit this now in case people have a chance to look at it before I get to the too-many-books notification. I will update this PR when I add that.

Sadly, I don't think the code is any more clear now (yet?).

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from 95f535f to e1688c2 Compare October 11, 2022 17:13
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add q to the json endpoint for consistency!

Otherwise is looking good! Investigating a weird possible perm issue :P Possible unrelated to this PR.

Thank you for taking the time to try to clean this up a little 😊

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch 2 times, most recently from 73784f2 to 32809f2 Compare October 12, 2022 22:12
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Let me push up that solr param change, and put up on testing!

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch 2 times, most recently from cb3b2aa to 2e73424 Compare October 12, 2022 23:16
@cdrini cdrini force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from 2e73424 to cb3b2aa Compare October 13, 2022 00:15
@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from cb3b2aa to 562216d Compare October 13, 2022 00:22
@cdrini cdrini added the Needs: Special Deploy This PR will need a non-standard deploy to production label Oct 13, 2022
@cdrini cdrini force-pushed the 5080/feature/adding-search-capability-to-reading-log branch 2 times, most recently from c9efbe6 to ad0ffdd Compare October 13, 2022 01:34
@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from ad0ffdd to 91324d4 Compare October 13, 2022 02:08
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 failing test left! :)

Test:

  • ✅ Pagination w no search

  • ✅ No q= more than 2k

  • ✅ Sorting no search

  • ✅ Set sort, then search

  • ✅ Set page, then search, page removed

  • ✅ JSON endpoint w/o q

  • ❌ JSON endpoint w/ q

    • edition IDs mis-aligned
  • ✅ JSON endpoint w/ q and sort

  • ✅ JSON endpoint w/ q and sort and page

  • ✅ Moving things btwn shelves

  • ✅ Add rating when querying

  • ✅ Move btwn shelves when querying

  • ✅ Can't see other users reading log

  • ✅ Can't see other users reading log with q

  • ✅ Can't see other users reading log with .json

  • ✅ Searching user reading log with 20k books works!

  • ✅ Reading log stats

@scottbarnes
Copy link
Collaborator Author

scottbarnes commented Oct 14, 2022

@cdrini

This adds querying to the JSON endpoints. I am hopeful it's fixed the edition IDs being mis-aligned, but for some reason in my local development environment I never have the edition IDs.

That said, injecting the edition IDs into the Solr results now uses a dictionary with a key of the work ID, and a value of a datclass with properties for the logged_date and the edition_id, so it no longer relies on ordering. I am hopeful this works.

I included two commits in this latest change: one addressing the JSON endpoint and the logged_date/edition ID stuff, and another refactoring get_works() and some related functions to hopefully make it more clear.

Another thing to draw attention to, in the latter commit, is that I made it so if someone goes to /WANT-TO-READ/, it will work. From the inclusion of key = key.lower() in get_works() I think this was intended, but it didn't appear to work.

Lastly, I did not do a great job reducing redundancy in get_users_logged_books(), though I think it is perhaps slightly more clear.

Edit:
I removed ReadingLog.KEYS and two of the functions it called (and a third that never got called) because I don't think any of them are used and they create the illusion that loans and waitlists are controlled from ReadingLog.

We need to test loans/waitlists. I think I can't fully test that locally.

Why I suspect we can remove the waitlists and loans things that I've removed in ReadingLog:

  • account/books.html renders account/loans.html if key == 'loans';
  • account/books.html is itself rendered by MyBooksTemplate, and not ReadingLog;
  • ReadingLog.get_loans(), ReadingLog.get_waitlist_summary(), and ReadingLog.get_waitlisted_editions() are not called when viewing the reading log, or any link in the left-hand menu that displays when viewing the reading log, including "Loans";
  • ReadingLog.get_waitlist_summary is not called by anything;
  • ReadingLog.get_loans() and ReadingLog.get_waitlisted_editions() are only called by ReadingLog.KEYS when the key is loans or waitlists, respectively;
  • Only ReadingLog.get_works() calls ReadingLog.KEYS;
  • ReadingLog.get_works() is only called in the following situations:
    • public_my_books_json with calls get_works() with a key of want-to-read;
    • readinglog_stats calls get_works() with a default key of loans, for some reason, but the key parameter is always supplied a value of either "want-to-read", "already-read", or "currently-reading";
    • MyBooksTemplate.render() calls get_works(), but only if the key is one of "currently-reading", "want-to-read", or "already-read";
  • There is no mention of loans or waitlists in accounts/reading_log.html

I just hope I am right about all of that.

@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch 2 times, most recently from 632e36f to 37a13d3 Compare October 14, 2022 04:49
…hanged self.KEYS to a Literal in ReadingLog() to get hints working end to end. Removed unused functions related to loans and waitlists in ReadingLog.
@scottbarnes scottbarnes force-pushed the 5080/feature/adding-search-capability-to-reading-log branch from 37a13d3 to b957256 Compare October 14, 2022 20:40
@scottbarnes scottbarnes requested a review from cdrini October 25, 2022 20:10
Comment on lines +1158 to +1161
for doc in self.docs:
work_id = extract_numeric_id_from_olid(doc.key)
rating = Ratings.get_users_rating_for_work(self.username, work_id)
self.ratings.append(rating or 0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a future PR it would be great to make this one bulk request! Now that it's in one place :)

Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok!! Final stretch!

@cdrini cdrini changed the title DRAFT: Closes #5080. Adds basic ability to search/filter one's reading log. Adds ability to search/filter one's reading log. Oct 28, 2022
@cdrini cdrini changed the title Adds ability to search/filter one's reading log. Adds ability to search/filter one's reading log Oct 28, 2022
Copy link
Collaborator

@cdrini cdrini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran the tests one more time; works like a charm!

@cdrini cdrini merged commit 92db345 into internetarchive:master Oct 28, 2022
@cdrini cdrini removed the Needs: Special Deploy This PR will need a non-standard deploy to production label Oct 29, 2022
@cdrini
Copy link
Collaborator

cdrini commented Oct 29, 2022

Solr-side of things deployed! This should be able to just go out normally on Monday :)

@scottbarnes scottbarnes deleted the 5080/feature/adding-search-capability-to-reading-log branch November 21, 2022 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

My Books (Reading Log) add Search & Filter capability
3 participants