
Add GetLibraryItemAsyncEnumerable for unlimited library books #27

Merged May 26, 2022 (15 commits)

Conversation

@Mbucari (Contributor) commented May 25, 2022

I implemented @mkb79's approach, outlined here.

I wrapped getAllLibraryItemsAsync inside an IAsyncEnumerable. Libation now calls Api.GetLibraryItemAsyncEnumerable() and iterates through all library items in an await foreach.

@rmcrackan (Owner)

The reason I haven't spent much time on tokens is that I have one specific use case that I'm almost positive they won't address. My understanding is that fetch-by-token will show you added and removed books. However, I personally make heavy use of the user ratings -- how the community rates it, but especially how I rate it. I can't make use of any get-library implementation unless it includes changes to these. That said, I haven't actually tested it yet, so maybe I'm wrong and it does solve this problem.

@Mbucari (Contributor) commented May 25, 2022

I tested it and it imports all fields, including personal rating. It appears to be identical to the call with page numbers like @mkb79 said, just presumably without an upper limit. I say presumably, because I don't have nearly enough books to test it and I'm not interested in adding the whole AudiblePlus catalogue in an attempt to find an upper limit!

@mkb79 commented May 25, 2022

@rmcrackan @Mbucari

I implemented @mkb79's approach, outlined here.

You can fetch the entire library with the continuation token, but that has a disadvantage if you want to fetch the whole library: a new request can only be made once the previous one has finished. If you use the total-count header, only the first response must be awaited, and all the other page requests can run asynchronously! I did this in my package here.
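mkb79's total-count idea can be sketched as follows. This is an illustrative Python analogue, not code from either project: `fetch_page`, `PAGE_SIZE`, and the fake response shape are all invented for the sketch. The point is simply that only the first request is awaited before the rest are issued concurrently.

```python
import asyncio
import math

PAGE_SIZE = 50     # assumed num_results per request
TOTAL_ITEMS = 120  # pretend library size, normally read from the Total-Count header

async def fetch_page(page: int) -> dict:
    # Hypothetical stand-in for the real HTTP call; a real response
    # would carry the Total-Count header alongside the items.
    await asyncio.sleep(0)
    start = (page - 1) * PAGE_SIZE
    items = list(range(start, min(start + PAGE_SIZE, TOTAL_ITEMS)))
    return {"items": items, "total_count": TOTAL_ITEMS}

async def fetch_all() -> list:
    # Await only the first page, read Total-Count, then issue the
    # remaining page requests concurrently instead of one at a time.
    first = await fetch_page(1)
    pages = math.ceil(first["total_count"] / PAGE_SIZE)
    rest = await asyncio.gather(*(fetch_page(p) for p in range(2, pages + 1)))
    items = first["items"]
    for resp in rest:  # gather preserves argument order, so pages stay in order
        items.extend(resp["items"])
    return items

items = asyncio.run(fetch_all())
```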

@Mbucari (Contributor) commented May 25, 2022

If you use the total-count header, only the first response must be awaited and all other page requests can be run asynchronous!

That is a good idea. Unfortunately, @rmcrackan noted that if the requested num_results is larger than 300, the query will sometimes not return the provided_review and reviews response groups. That limitation caps num_results at around 250, which caps the total library size at 10,000 when using pages. To exceed 10,000 books and still receive the reviews groups, the continuation token with sequential calls appears to be the only option.

@mkb79 commented May 25, 2022

@Mbucari

Why not use them together? Make your first library request. If total-count is larger than 10,000, use the continuation token; otherwise request the outstanding pages with async requests. My implementation gives me a huge speed boost on large libraries with low num_results values.

@Mbucari (Contributor) commented May 25, 2022

How would that work? Does the response from page 40 return a continuation token?

@mkb79 commented May 25, 2022

Oh, I think you misunderstood me. You request the first library page as before, then check the total-count header. If there are more than 10,000 items, request the rest with your continuation-token implementation. Otherwise, calculate the outstanding pages and request them all at once.

FYI:
Don't forget to save the state-token. You can use it on your next library update.

Edit:
Using the page param doesn't give you a continuation token!
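The hybrid strategy reduces to a small decision rule. A hedged sketch (illustrative Python; `choose_strategy` is a hypothetical helper, and the 40-page / 250-item figures are the caps discussed in this thread):

```python
PAGE_LIMIT = 40  # max pages, per the discussion in this thread
PAGE_SIZE = 250  # num_results cap that still returns the review groups

def choose_strategy(total_count: int) -> str:
    # Hypothetical decision helper: if the library exceeds what paged
    # requests can reach (40 * 250 = 10,000 items), walk it sequentially
    # with the continuation token; otherwise fetch the remaining pages
    # in parallel after the first response.
    if total_count > PAGE_LIMIT * PAGE_SIZE:
        return "continuation-token"
    return "parallel-pages"
```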

@Mbucari (Contributor) commented May 25, 2022

Thanks!

@Mbucari Mbucari marked this pull request as draft May 25, 2022 13:45
@Mbucari Mbucari marked this pull request as ready for review May 25, 2022 16:09
@Mbucari (Contributor) commented May 25, 2022

OK, this is ready for review and testing. I implemented @mkb79's suggestion, and the flow is as follows:

When GetEnumerator() is called on ItemAsyncEnumerable, a producer thread is created to get all library Items. The producer:

  1. Tries to download the first 250 items without passing a page number. If that fails with a TimeoutException or TaskCanceledException, it tries again requesting only 50 books. This first call:
    1.1 Parses the items
    1.2 Retrieves the Total-Count value
    1.3 Retrieves and stores the State-Token
    1.4 Retrieves and stores the Continuation-Token
  2. After this first call, if Total-Count > 10000 then continue looping with the continuation token until no continuation token is returned.
  3. If Total-Count <= 10000, get remaining books by page number in a Parallel.For loop

No matter how the Items are retrieved (either by page number or continuation token), the Task<Item[]> is added to the GetItemsTasks BlockingCollection.

When the iterator moves next, it detects if it needs a new batch of Item[]. If it does, it tries to take a Task<Item[]> from the GetItemsTasks BlockingCollection. Then it awaits that task and collects the Item[].
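A rough analogue of this producer/consumer flow, sketched in Python (the PR itself is C#, using a BlockingCollection of Task&lt;Item[]&gt;; here an asyncio.Queue with a None sentinel stands in, and the batches are faked):

```python
import asyncio

async def fetch_batch(page: int) -> list:
    # Hypothetical stand-in for one paged (or continuation-token) request.
    await asyncio.sleep(0)
    return [f"item-{page}-{i}" for i in range(2)]

async def produce(queue: asyncio.Queue) -> None:
    # Producer: fetch each batch and hand it to the queue;
    # None signals that there is nothing left to fetch.
    for page in range(3):
        await queue.put(await fetch_batch(page))
    await queue.put(None)

async def consume(queue: asyncio.Queue) -> list:
    # Consumer (the iterator side): take batches until the sentinel,
    # collecting items as they arrive.
    items = []
    while (batch := await queue.get()) is not None:
        items.extend(batch)
    return items

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(produce(queue))
    items = await consume(queue)
    await producer
    return items

items = asyncio.run(main())
```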

@Mbucari (Contributor) commented May 25, 2022

@mkb79 I'm still not certain about something. When requesting by page=pageNumber, is the limitation 10,000 Items, or is the limitation 40 pages?

@rmcrackan (Owner)

My memory is rusty but I think it was 40 pages. I'd like to hear what he says though

@Mbucari (Contributor) commented May 25, 2022

@rmcrackan I just tested it and I could exceed 60 pages at 2 items per page.

@mkb79 commented May 25, 2022

@Mbucari
The limit is 40 pages (I can verify this only for the catalog endpoint). The limit for num_results is 1,000, so theoretically it is possible to fetch 40,000 items. But @rmcrackan limited num_results to 250, so 250 × 40 = 10,000 items are possible.

@mkb79 commented May 25, 2022

@Mbucari

The problem with the catalog endpoint is that you can fetch more than 40 pages, but page 41 and above give you the result for page 40.

@Mbucari (Contributor) commented May 25, 2022

@mkb79 @rmcrackan

For my test that exceeded 40 pages on the library endpoint, all returned items were unique. I checked the entire json structure, not just the title. It seems the catalog limitation doesn't exist for the library endpoint.

@rmcrackan (Owner)

Is this ready for testing? Or does the 40-pages discussion mean that another round of revisions is on its way?

Did you benchmark the old vs new way? If yes, how does it compare?

@Mbucari (Contributor) commented May 25, 2022

I'm making changes under the assumption that the library endpoint can support an unlimited number of pages.

As for benchmarks, I haven't measured, but what I have committed is definitely faster.

@mkb79 commented May 25, 2022

@Mbucari

For my test that exceeded 40 pages on the library endpoint, all returned items were unique. I checked the entire json structure, not just the title.

I can verify that there is no page limit (any more?)! I'll try this on the catalog/products endpoint next. Maybe the limit was removed?!

@Mbucari (Contributor) commented May 25, 2022

OK, this is super cool. Using an IAsyncEnumerable yield return pattern, I trimmed all the new concurrent library functions down to 4 new methods.

I did 4 benchmark runs on ApiExtended.getItemsAsync using the existing library scan and another 4 using the new library scan. My account has 107 book Items, and a total of 281 Items including episodes. Here are the results, in ms:

Existing library scan:

6491
5397
6670
6667

Parallel async library scan:

2870
2132
2514
2435

@Mbucari (Contributor) commented May 25, 2022

Let me know if you have any questions, but this is ready for review and testing. I think you'll be pleasantly surprised.

@rmcrackan (Owner)

Wow, those results are really promising. I look forward to trying on my library which is 1,200+. I'll either test tonight or tomorrow

@rmcrackan (Owner)

Umm, wow. I'll spare you the stats porn. The bottom line is that, on average, for my library of 1,209 titles, your new way takes 17.1% as long.

@rmcrackan (Owner)

Between your reworking of the API scan, background scanning, and my recent refactor of the Entity Framework inserts/updates, even casual users should perceive a major performance boost. Oh yeah, and the UI no longer blocks during scanning, and auto scan, and ... The past month or two has been friggin' massive.

@@ -118,7 +118,7 @@ public partial class Api
{
const string CATALOG_PATH = "/1.0/catalog";

#region GetCatalogProductAsyncAsync
@rmcrackan (Owner) commented on the diff:

What kinda weird find + replace must I have done to cause this??

@rmcrackan rmcrackan merged commit 04ecabd into rmcrackan:master May 26, 2022
@Mbucari (Contributor) commented May 26, 2022

Dayum! Did you run it at the default 50 items per page, or did you up that number? I'm still a little uncertain about this new revelation that apparently there's no longer a page limit. My instinct says to up the default to maybe 100 per page. A lower number will show a greater speed-up for users with fewer books because there will be more concurrent gets.

Another point of concern right now: inside Libation there is no limit on getChildEpisodesAsync. A new task is spun up for each episode, and they all run concurrently. Presumably Audible has some per-IP concurrent-connection limit, and I'm afraid that without throttling we might find it.
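A common way to add the throttling raised here is a semaphore capping how many requests are in flight at once. This is a hedged sketch, not the PR's actual code: `MAX_CONCURRENT`, `fetch_episode`, and the fake asins are all invented, and the safe concurrency value is unknown.

```python
import asyncio

MAX_CONCURRENT = 10  # assumed cap; Audible's real per-IP limit is unknown

async def fetch_episode(asin: str, sem: asyncio.Semaphore) -> str:
    # The semaphore keeps at most MAX_CONCURRENT requests in flight;
    # the rest of the tasks wait here instead of hitting the server.
    async with sem:
        await asyncio.sleep(0)  # stand-in for the real HTTP call
        return f"episode-{asin}"

async def fetch_all_episodes(asins: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # All tasks are still created up front, but the semaphore throttles them.
    return await asyncio.gather(*(fetch_episode(a, sem) for a in asins))

results = asyncio.run(fetch_all_episodes([str(i) for i in range(25)]))
```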

@rmcrackan (Owner)

Did you run it at the default 50 items per page

Yep, I tested your code as-is. I'm curious about performance with bigger pages but I haven't yet tested this. I wonder how these perform 50, 100, 200, 250.

Another point of concern I have right now is ... they all run concurrently

Yes, please throttle this. The implications could range from 'meh' to 'lifetime IP block'. Fortunately (fortunately?!?) something's going weird in either GitHub or NuGet and it's not letting me publish the updated package.

@Mbucari (Contributor) commented May 26, 2022

Yes, please throttle this.

Ok, I'll add episode throttling. That code is inside Libation though, so no API changes are necessary. Unfortunately. Lol
