
Add GetLibraryItemAsyncEnumerable for unlimited library books #27

Merged May 26, 2022 (15 commits)

Conversation

@Mbucari (Contributor) commented May 25, 2022

I implemented @mkb79's approach, outlined here.

I wrapped getAllLibraryItemsAsync inside an IAsyncEnumerable. Libation now calls Api.GetLibraryItemAsyncEnumerable() and iterates through all library items in an await foreach.

@rmcrackan (Owner)

The reason I haven't spent much time on tokens is that I have one specific use case that I'm almost positive they won't address. My understanding is that fetch-by-token will show you added and removed books. However, I personally make heavy use of the user ratings -- how the community rates it, but especially how I rate it. I can't make use of any get-library implementation unless it includes changes to these. That said, I haven't actually tested it yet, so maybe I'm wrong and it does solve this problem.

@Mbucari (Contributor) commented May 25, 2022

I tested it and it imports all fields, including personal rating. It appears to be identical to the call with page numbers like @mkb79 said, just presumably without an upper limit. I say presumably, because I don't have nearly enough books to test it and I'm not interested in adding the whole AudiblePlus catalogue in an attempt to find an upper limit!

@mkb79 commented May 25, 2022

@rmcrackan @Mbucari

I implemented @mkb79's approach, outlined here.

You can fetch the entire library with the continuation token, but that has a disadvantage if you want to fetch the whole library: a new request can only be made once the previous one has finished. If you use the total-count header, only the first response must be awaited, and all the other page requests can run asynchronously! I did this in my package here.
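mkb79's total-count idea can be sketched as follows. This is an illustrative Python analogue, not code from either project: `fetch_page`, `PAGE_SIZE`, and the fake response shape are all invented for the sketch. The point is simply that only the first request is awaited before the rest are issued concurrently.

```python
import asyncio
import math

PAGE_SIZE = 50     # assumed num_results per request
TOTAL_ITEMS = 120  # pretend library size, normally read from the Total-Count header

async def fetch_page(page: int) -> dict:
    # Hypothetical stand-in for the real HTTP call; a real response
    # would carry the Total-Count header alongside the items.
    await asyncio.sleep(0)
    start = (page - 1) * PAGE_SIZE
    items = list(range(start, min(start + PAGE_SIZE, TOTAL_ITEMS)))
    return {"items": items, "total_count": TOTAL_ITEMS}

async def fetch_all() -> list:
    # Await only the first page, read Total-Count, then issue the
    # remaining page requests concurrently instead of one at a time.
    first = await fetch_page(1)
    pages = math.ceil(first["total_count"] / PAGE_SIZE)
    rest = await asyncio.gather(*(fetch_page(p) for p in range(2, pages + 1)))
    items = first["items"]
    for resp in rest:  # gather preserves argument order, so pages stay in order
        items.extend(resp["items"])
    return items

items = asyncio.run(fetch_all())
```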

@Mbucari (Contributor) commented May 25, 2022

If you use the total-count header, only the first response must be awaited and all other page requests can be run asynchronous!

That is a good idea. Unfortunately, @rmcrackan noted that if the requested num_results is larger than 300, the query will sometimes not return the provided_review and reviews response groups. That limitation caps num_results at around 250, which caps the total library size at 10,000 when using pages. To exceed 10,000 books and still receive the reviews groups, the continuation token with sequential calls appears to be the only option.

@mkb79 commented May 25, 2022

@Mbucari

Why not use them together? Make your first library request. If total-count is larger than 10,000, use the continuation token; otherwise request the outstanding pages with async requests. My implementation gives me a huge speed boost on large libraries with low num_results values.

@Mbucari (Contributor) commented May 25, 2022

How would that work? Does the response from page 40 return a continuation token?

@mkb79 commented May 25, 2022

Oh, I think you misunderstood me. You request the first library page as before, then check the total-count header. If there are more than 10,000 items, request the rest with your continuation-token implementation. Otherwise, calculate the outstanding pages and request them all at once.

FYI:
Don't forget to save the state-token. You can use it on your next library update.

Edit:
Using the page param doesn't give you a continuation token!
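The hybrid strategy reduces to a small decision rule. A hedged sketch (illustrative Python; `choose_strategy` is a hypothetical helper, and the 40-page / 250-item figures are the caps discussed in this thread):

```python
PAGE_LIMIT = 40  # max pages, per the discussion in this thread
PAGE_SIZE = 250  # num_results cap that still returns the review groups

def choose_strategy(total_count: int) -> str:
    # Hypothetical decision helper: if the library exceeds what paged
    # requests can reach (40 * 250 = 10,000 items), walk it sequentially
    # with the continuation token; otherwise fetch the remaining pages
    # in parallel after the first response.
    if total_count > PAGE_LIMIT * PAGE_SIZE:
        return "continuation-token"
    return "parallel-pages"
```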

@Mbucari (Contributor) commented May 25, 2022

Thanks!

@Mbucari Mbucari marked this pull request as draft May 25, 2022 13:45
@Mbucari Mbucari marked this pull request as ready for review May 25, 2022 16:09
@Mbucari (Contributor) commented May 25, 2022

OK, this is ready for review and testing. I implemented @mkb79's suggestion, and the flow is as follows:

When GetEnumerator() is called on ItemAsyncEnumerable, a producer thread is created to get all library Items. The producer:

  1. Tries to download the first 250 items without passing a page number. If that fails with a TimeoutException or TaskCanceledException, it tries again requesting only 50 books. This first call:
    1.1 Parses the items
    1.2 Retrieves the Total-Count value
    1.3 Retrieves and stores the State-Token
    1.4 Retrieves and stores the Continuation-Token
  2. After this first call, if Total-Count > 10000 then continue looping with the continuation token until no continuation token is returned.
  3. If Total-Count <= 10000, get remaining books by page number in a Parallel.For loop

No matter how the Items are retrieved (either by page number or continuation token), the Task<Item[]> is added to the GetItemsTasks BlockingCollection.

When the iterator moves next, it detects if it needs a new batch of Item[]. If it does, it tries to take a Task<Item[]> from the GetItemsTasks BlockingCollection. Then it awaits that task and collects the Item[].
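A rough analogue of this producer/consumer flow, sketched in Python (the PR itself is C#, using a BlockingCollection of Task&lt;Item[]&gt;; here an asyncio.Queue with a None sentinel stands in, and the batches are faked):

```python
import asyncio

async def fetch_batch(page: int) -> list:
    # Hypothetical stand-in for one paged (or continuation-token) request.
    await asyncio.sleep(0)
    return [f"item-{page}-{i}" for i in range(2)]

async def produce(queue: asyncio.Queue) -> None:
    # Producer: fetch each batch and hand it to the queue;
    # None signals that there is nothing left to fetch.
    for page in range(3):
        await queue.put(await fetch_batch(page))
    await queue.put(None)

async def consume(queue: asyncio.Queue) -> list:
    # Consumer (the iterator side): take batches until the sentinel,
    # collecting items as they arrive.
    items = []
    while (batch := await queue.get()) is not None:
        items.extend(batch)
    return items

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    producer = asyncio.create_task(produce(queue))
    items = await consume(queue)
    await producer
    return items

items = asyncio.run(main())
```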

@Mbucari (Contributor) commented May 25, 2022

@mkb79 I'm still not certain about something. When requesting by page=pageNumber, is the limitation 10,000 Items, or is the limitation 40 pages?

@rmcrackan (Owner)

My memory is rusty but I think it was 40 pages. I'd like to hear what he says though

@Mbucari (Contributor) commented May 25, 2022

@rmcrackan I just tested it and I could exceed 60 pages at 2 items per page.

@mkb79 commented May 25, 2022

@Mbucari
The limit is 40 pages (I can verify this only for the catalog endpoint). The limit for num_results is 1,000, so theoretically it is possible to fetch 40,000 items. But @rmcrackan limited num_results to 250, so 250 × 40 = 10,000 items are possible.

@mkb79 commented May 25, 2022

@Mbucari

The problem with the catalog endpoint is that you can fetch more than 40 pages, but page 41 and above give you the result for page 40.

@Mbucari (Contributor) commented May 25, 2022

@mkb79 @rmcrackan

For my test that exceeded 40 pages on the library endpoint, all returned items were unique. I checked the entire json structure, not just the title. It seems the catalog limitation doesn't exist for the library endpoint.

@rmcrackan (Owner)

Is this ready for testing? Or does the 40-pages discussion mean that another round of revisions is on its way?

Did you benchmark the old vs new way? If yes, how does it compare?

@Mbucari (Contributor) commented May 25, 2022

I'm making changes under the assumption that the library endpoint can support an unlimited number of pages.

As for benchmarks, I haven't measured, but what I have committed is definitely faster.

@mkb79 commented May 25, 2022

@Mbucari

For my test that exceeded 40 pages on the library endpoint, all returned items were unique. I checked the entire json structure, not just the title.

I can verify that there is no page limit (any more?)! I'll try this on the catalog/products endpoint next. Maybe the limit was removed?!

@Mbucari (Contributor) commented May 25, 2022

OK, this is super cool. Using an IAsyncEnumerable yield return pattern, I trimmed all the new concurrent library functions down to 4 new methods.

I did 4 benchmark runs on ApiExtended.getItemsAsync using the existing library scan and another 4 using the new library scan. My account has 107 book Items, and a total of 281 Items including episodes. Here are the results, in ms:

Existing library scan:

6491
5397
6670
6667

Parallel async library scan:

2870
2132
2514
2435

@Mbucari (Contributor) commented May 25, 2022

Let me know if you have any questions, but this is ready for review and testing. I think you'll be pleasantly surprised.

@rmcrackan (Owner)

Wow, those results are really promising. I look forward to trying on my library which is 1,200+. I'll either test tonight or tomorrow

@rmcrackan (Owner)

Umm, wow. I'll spare you the stats porn. The bottom line is that, on average, for my library of 1,209 titles, your new way takes 17.1% as long.

@rmcrackan (Owner)

Between your reworking of the API scan, background scanning, and my recent refactor of the Entity Framework inserts/updates, even casual users should perceive a major performance boost. Oh yeah, and the UI no longer blocks during scanning, and auto scan, and ... The past month or two has been friggin' massive.

@@ -118,7 +118,7 @@ public partial class Api
{
const string CATALOG_PATH = "/1.0/catalog";

#region GetCatalogProductAsyncAsync
@rmcrackan (Owner) commented on the diff:

What kinda weird find + replace must I have done to cause this??

@rmcrackan rmcrackan merged commit 04ecabd into rmcrackan:master May 26, 2022
@Mbucari (Contributor) commented May 26, 2022

Dayum! Did you run it at the default 50 items per page, or did you up that number? I'm still a little uncertain about this new revelation that apparently there's no longer a page limit. My instinct says to up the default to maybe 100 per page. A lower number will show a greater speed-up for users with fewer books because there will be more concurrent gets.

Another point of concern right now: inside Libation there is no limit on getChildEpisodesAsync. A new task is spun up for each episode, and they all run concurrently. Presumably Audible has some per-IP concurrent-connection limit, and I'm afraid that without throttling we might find it.
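A common way to add the throttling raised here is a semaphore capping how many requests are in flight at once. This is a hedged sketch, not the PR's actual code: `MAX_CONCURRENT`, `fetch_episode`, and the fake asins are all invented, and the safe concurrency value is unknown.

```python
import asyncio

MAX_CONCURRENT = 10  # assumed cap; Audible's real per-IP limit is unknown

async def fetch_episode(asin: str, sem: asyncio.Semaphore) -> str:
    # The semaphore keeps at most MAX_CONCURRENT requests in flight;
    # the rest of the tasks wait here instead of hitting the server.
    async with sem:
        await asyncio.sleep(0)  # stand-in for the real HTTP call
        return f"episode-{asin}"

async def fetch_all_episodes(asins: list) -> list:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    # All tasks are still created up front, but the semaphore throttles them.
    return await asyncio.gather(*(fetch_episode(a, sem) for a in asins))

results = asyncio.run(fetch_all_episodes([str(i) for i in range(25)]))
```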

@rmcrackan (Owner)

Did you run it at the default 50 items per page

Yep, I tested your code as-is. I'm curious about performance with bigger pages but I haven't yet tested this. I wonder how these perform 50, 100, 200, 250.

Another point of concern I have right now is ... they all run concurrently

Yes, please throttle this. The implications could range from 'meh' to 'lifetime IP block'. Fortunately (fortunately?!?) something's going weird in either GitHub or NuGet and it's not letting me publish the updated package.

@Mbucari (Contributor) commented May 26, 2022

Yes, please throttle this.

Ok, I'll add episode throttling. That code is inside Libation though, so no API changes are necessary. Unfortunately. Lol
