
fix timeouts for larger libraries #84

Merged: 1 commit into se1exin:master on Apr 3, 2023

Conversation

@peter-mcconnell (Collaborator) commented Mar 21, 2023

What does this PR do?

tl;dr: moves the while not end loop to the UI and performs sections of this loop asynchronously. You can test this change with docker pull pemcconnell/cleanarr

Updates get_dupe_content so that it accepts a page parameter instead of running inside a while loop - this moves the paging iteration outside of the function, allowing us to create external paging mechanisms. It then updates the get_dupes method so that it takes an optional page argument, which gets passed to get_dupe_content - this moves the paging iteration to the URI level. Lastly, it updates the loadDupeContent method in the frontend code so that it is essentially responsible for the original while not end logic. This puts an HTTP request between each page of results. It shouldn't be too noticeable performance-wise for existing use cases, but I'm keen for validation there. I expect a speed-up for all use cases due to the async batching this PR introduces, but I don't have an easy way to test for that. Importantly, this means that Cleanarr now works on larger datasets.
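
For illustration, here's a rough sketch of the shape of that frontend-driven loop; the endpoint path and the response field names (items, end) are placeholders I'm using for this sketch, not the exact code in this PR:

```typescript
// Conceptual sketch only: endpoint path and response shape are placeholders.
interface DupesPage {
  items: unknown[]; // duplicate entries returned for this page
  end: boolean;     // true when the backend has no further pages
}

async function loadDupeContent(): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 0;
  let end = false;
  while (!end) {
    // Each page is a separate short HTTP request, so no single call
    // has to hold the connection open long enough to trip a proxy 504.
    const res = await fetch(`/api/dupes?page=${page}`);
    const data: DupesPage = await res.json();
    all.push(...data.items);
    end = data.end;
    page += 1;
  }
  return all;
}
```

The point is simply that each individual request stays short, so the overall load is no longer bounded by a single reverse-proxy timeout.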

An optimisation was added that executes dupes requests in batches of PAGE_SIZE asynchronously. This defaults to 10 but can be overridden with a new env var, REACT_APP_PAGE_SIZE. Cost: some needless calls to the dupes API. Reward: total time to load in the UI is reduced. Due to this asynchronous logic, this PR makes larger libraries not only work, but load much faster (for me, load time went from ~90 seconds to ~15 after the async patch). 15 seconds is still way too long and I suspect 504s are still technically possible, but this is a reasonable start.
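
Conceptually the batching looks something like the sketch below (again with placeholder names and the same assumed response shape): fire a window of PAGE_SIZE page requests concurrently and stop once any of them reports the end:

```typescript
// Conceptual sketch of the batching; PAGE_SIZE mirrors REACT_APP_PAGE_SIZE.
const PAGE_SIZE = Number(process.env.REACT_APP_PAGE_SIZE ?? 10);

interface DupesPage {
  items: unknown[]; // same placeholder shape as in the previous sketch
  end: boolean;
}

async function loadDupeContentBatched(): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 0;
  let done = false;
  while (!done) {
    // Some requests past the final page are wasted: that is the
    // "needless calls to the dupes API" cost mentioned above.
    const batch = Array.from({ length: PAGE_SIZE }, (_, i) =>
      fetch(`/api/dupes?page=${page + i}`).then((r) => r.json() as Promise<DupesPage>)
    );
    const results = await Promise.all(batch);
    for (const r of results) {
      all.push(...r.items);
      if (r.end) done = true;
    }
    page += PAGE_SIZE;
  }
  return all;
}
```

Trading a handful of wasted requests for concurrency is what takes the load from roughly sequential (~90s for me) to overlapping (~15s).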

What does this not do?

  • I made no attempt to optimise any of the existing logic or external calls, aside from the asynchronous batching of ?page=X to ?page=Y. I suspect many opportunities for perf improvements remain in breaking the dupes API call up further, caching, and/or reducing the number of properties requested.
  • I made no attempt at adding visual cues for the paging in the UI.
  • I didn't write tests. Sorry. I have limited time, wanted to hack something together in 30 minutes, and am not familiar with TypeScript, so this wasn't fluent for me.
  • I haven't yet tested other parts of the software with my library - if I find other issues I'll try to patch and PR as I go.

Caveats

I have never written TypeScript before. No clue if what I wrote is clean / idiomatic, but it seems to work.

How was it tested / how should a reviewer test it?

  • I ran this against my local instance, which reliably 504s. It now works every time. You can see the paging in the console:
    [screenshot: browser console showing the paged requests]
  • I'd appreciate some testing on scenarios that worked previously, to ensure I'm not introducing noticeable degradations.
  • General code review; especially for the TypeScript.

Scanning over the open issues I believe this will solve the following:

Closes #55
Closes #57
Closes #60
Closes #62
Closes #66
Closes #77
Closes #70

@peter-mcconnell peter-mcconnell changed the title basic paging implementation for dupes content fix timeouts for larger libraries Mar 22, 2023
@C2BB commented Mar 23, 2023

Found your docker image and gave it a whirl, works like a charm, thank you so much! Been waiting for a solution to large libraries for some time :)

@austempest

I didn't even think to see if peter-mcconnell had a branch. I pulled his image and can confirm it at least loads on one of my libraries. I'll try it on the rest later.

@smason16 commented Mar 28, 2023

Is this available for unraid? If so, how?

Edit: I discovered how to have community apps search for all docker images.

@peter-mcconnell (Collaborator, Author)

Glad to hear it's working for you folks. Enjoy the spare GBs.

@vanayr commented Apr 2, 2023

Peter you are the man, working like a charm. Over 6k in my Movies library, so the timeouts were brutal.

@se1exin se1exin merged commit 3704db8 into se1exin:master Apr 3, 2023
@se1exin (Owner) commented Apr 3, 2023

Wow this is amazing. Thanks for the awesome contribution @peter-mcconnell !

@roggstudelpest commented Apr 7, 2023

Not sure if this works on Synology Docker? I tried and cannot mount the /config folder. Are these changes in the latest Selexin release, or is it possible to add them there?

*EDIT: I pulled your image from the Docker registry and that is the one that fails to load the /config mount path. I tried the same path used for Selexin's image and tried creating a new one. Both fail.

@se1exin (Owner) commented Apr 7, 2023

@roggstudelpest this PR does not change how the config folder works, and in my testing (albeit not on Synology) the new image works fine with mounting the config folder.

Can you please raise a new issue and include the Synology errors?

@snickers2k

Strange that it now works for you guys... with the latest release my series are finally working, but my 8k movie library is not...

@bigsteve76 commented May 10, 2023

This worked great for my TV Shows (230k episodes), but didn't work on Movies (45k). If I add my Movies library, it gives the "HTTPConnectionPool(host='192.168.1.200', port=32400): Read timed out. (read timeout=30)" error every time after 30 seconds, no matter what I have PLEX_TIMEOUT set to. Have there been any solutions to this?

@peter-mcconnell (Collaborator, Author) commented May 17, 2023

Yeah, there are undoubtedly more scenarios where timeouts are still likely. There are two high-level routes to solving this problem in my head:

  1. Optimise the backend code further to get the calls down to some reasonable SLO for the average use case, ideally < 3s. The last time I looked at the code, most of this time was spent fetching additional attributes for the movies / shows it finds, so this route may also require simplifying the UI.

  2. Avoid the HTTP timeout issue entirely by building a little CLI that uses the same backend calls. This could be a "break glass" tool for people to run if their library is simply too heavy for the UI at present.

Option 1 is of course the neatest product solution, as there is a single, reliable path for all users. However, it is also the most involved from an engineering perspective and could bleed into a more simplified UI view. Option 2 is the worst user experience, but it would be easy to build and would likely be an "ok" option for people whilst waiting on Option 1.

I could try to knock up Option 2 tonight / tomorrow evening if I get some free time (roughly along the lines of the sketch below) and will leave Option 1 to the maintainer(s).
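
For illustration only, here's a rough sketch of what such a break-glass CLI could look like if written as a small Node/TypeScript script; the base URL, env var, and response shape are placeholders and not necessarily what the eventual tool will use:

```typescript
#!/usr/bin/env node
// Hypothetical "break glass" CLI sketch: walk the same paged dupes endpoint
// from the command line instead of the browser, so no proxy timeout applies.
// CLEANARR_URL, the endpoint path, and the response shape are placeholders.

const BASE_URL = process.env.CLEANARR_URL ?? "http://localhost:5000";

interface DupesPage {
  items: unknown[];
  end: boolean;
}

async function main(): Promise<void> {
  let page = 0;
  let end = false;
  while (!end) {
    const res = await fetch(`${BASE_URL}/api/dupes?page=${page}`);
    if (!res.ok) throw new Error(`page ${page} failed: HTTP ${res.status}`);
    const data = (await res.json()) as DupesPage;
    console.log(`page ${page}: ${data.items.length} duplicate entries`);
    end = data.end;
    page += 1;
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Since it talks to the backend directly and just streams progress to stdout, there is no browser or reverse-proxy timeout anywhere in the path.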

@bigsteve76

Thank you for your response. That would be excellent. As I said in my previous comment, the TV version works great and does exactly what I need it to do. I ran two different Docker containers so that I could use the TV version while trying to hash out the Movie version's issues.

peter-mcconnell added a commit to peter-mcconnell/Cleanarr that referenced this pull request May 17, 2023
basic paging implementation for dupes content (se1exin#84)
@peter-mcconnell (Collaborator, Author)

@bigsteve76 I've put something rough together for you to play with: #94

This is VERY simple and rough around the edges, but it should provide a starting block for a CLI-based approach to this product.

@deepzone1

Would you mind adding it to Community Applications? I have no clue how to add dockers from dockerhub in unraid :/
I have a library of 49,000 movies with around 2,000 dupes. Would love to get rid of them :P
