
fix timeouts for larger libraries #84

Merged: 1 commit into se1exin:master on Apr 3, 2023

Conversation

@peter-mcconnell (Collaborator) commented Mar 21, 2023

What does this PR do?

tl;dr: moves the while not end loop to the UI and performs sections of this loop asynchronously. You can test this change with docker pull pemcconnell/cleanarr

Updates get_dupe_content so that it accepts a page parameter instead of running inside a while loop - this moves the paging iteration outside of the function, allowing us to create external paging mechanisms. It then updates the get_dupes method so that it takes an optional page argument, which gets passed to get_dupe_content - this moves the paging iteration to the URI level. Lastly, it updates the loadDupeContent method in the frontend code so that it is essentially responsible for the original while not end logic. This puts an HTTP request between each page of results. It shouldn't be too noticeable performance-wise for existing use cases, but I'm keen for validation there. I expect a speed-up for all use cases due to the async batching this PR introduces, but I don't have an easy way to test for that. Importantly, this means that Cleanarr now works on larger datasets.
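
For illustration, here's a rough sketch of the shape of that frontend-driven loop; the endpoint path and the response field names (items, end) are placeholders I'm using for this sketch, not the exact code in this PR:

```typescript
// Conceptual sketch only: endpoint path and response shape are placeholders.
interface DupesPage {
  items: unknown[]; // duplicate entries returned for this page
  end: boolean;     // true when the backend has no further pages
}

async function loadDupeContent(): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 0;
  let end = false;
  while (!end) {
    // Each page is a separate short HTTP request, so no single call
    // has to hold the connection open long enough to trip a proxy 504.
    const res = await fetch(`/api/dupes?page=${page}`);
    const data: DupesPage = await res.json();
    all.push(...data.items);
    end = data.end;
    page += 1;
  }
  return all;
}
```

The point is simply that each individual request stays short, so the overall load is no longer bounded by a single reverse-proxy timeout.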

An optimisation was added that executes dupes requests in batches of PAGE_SIZE asynchronously. This defaults to 10 but can be overridden with a new env var, REACT_APP_PAGE_SIZE. Cost: some needless calls to the dupes API. Reward: total time to load in the UI is reduced. Due to this asynchronous logic, this PR makes larger libraries not only work, but load much faster (for me, load time went from ~90 seconds to ~15 after the async patch). 15 seconds is still way too long and I suspect 504s are still technically possible, but this is a reasonable start.
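
Conceptually the batching looks something like the sketch below (again with placeholder names and the same assumed response shape): fire a window of PAGE_SIZE page requests concurrently and stop once any of them reports the end:

```typescript
// Conceptual sketch of the batching; PAGE_SIZE mirrors REACT_APP_PAGE_SIZE.
const PAGE_SIZE = Number(process.env.REACT_APP_PAGE_SIZE ?? 10);

interface DupesPage {
  items: unknown[]; // same placeholder shape as in the previous sketch
  end: boolean;
}

async function loadDupeContentBatched(): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 0;
  let done = false;
  while (!done) {
    // Some requests past the final page are wasted: that is the
    // "needless calls to the dupes API" cost mentioned above.
    const batch = Array.from({ length: PAGE_SIZE }, (_, i) =>
      fetch(`/api/dupes?page=${page + i}`).then((r) => r.json() as Promise<DupesPage>)
    );
    const results = await Promise.all(batch);
    for (const r of results) {
      all.push(...r.items);
      if (r.end) done = true;
    }
    page += PAGE_SIZE;
  }
  return all;
}
```

Trading a handful of wasted requests for concurrency is what takes the load from roughly sequential (~90s for me) to overlapping (~15s).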

What does this not do?

  • I made no attempt to optimise any of the existing logic or external calls, aside from the asynchronous batching of ?page=X to ?page=Y. I suspect many opportunities for perf improvements remain in breaking the dupes API call up further, caching, and/or reducing the number of properties requested.
  • I made no attempt at adding visual cues for the paging in the UI.
  • I didn't write tests. Sorry. I have limited time, wanted to hack something together in 30 minutes, and am not familiar with TypeScript, so this wasn't fluent for me.
  • I haven't yet tested other parts of the software with my library - if I find other issues I'll try to patch and PR as I go.

Caveats

I have never written TypeScript before. No clue if what I wrote is clean / idiomatic, but it seems to work.

How was it tested / how should a reviewer test it?

  • I ran this against my local instance, which reliably 504s. It now works every time. You can see the paging in the console:
    [screenshot: browser console showing the paged requests]
  • I'd appreciate some testing on scenarios that worked previously, to ensure I'm not introducing noticeable degradations.
  • General code review; especially for the TypeScript.

Scanning over the open issues I believe this will solve the following:

Closes #55
Closes #57
Closes #60
Closes #62
Closes #66
Closes #77
Closes #70

@peter-mcconnell peter-mcconnell changed the title basic paging implementation for dupes content fix timeouts for larger libraries Mar 22, 2023
@C2BB commented Mar 23, 2023

Found your docker image and gave it a whirl, works like a charm, thank you so much! Been waiting for a solution to large libraries for some time :)

@austempest

I didn't even think to see if peter-mcconnell had a branch. I pulled his image and can confirm it at least loads on one of my libraries. I'll try it on the rest later.

@smason16 commented Mar 28, 2023

Is this available for unraid? If so, how?

Edit: I discovered how to have community apps search for all docker images.

@peter-mcconnell (Collaborator, Author)

Glad to hear it's working for you folks. Enjoy the spare GBs.

@vanayr commented Apr 2, 2023

Peter you are the man, working like a charm. Over 6k in my Movies library, so the timeouts were brutal.

@se1exin se1exin merged commit 3704db8 into se1exin:master Apr 3, 2023
@se1exin (Owner) commented Apr 3, 2023

Wow this is amazing. Thanks for the awesome contribution @peter-mcconnell !

@roggstudelpest commented Apr 7, 2023

Not sure if this works on Synology Docker? I tried and cannot mount the /config folder. Are these changes in the latest Selexin release, or is it possible to add them there?

*EDIT: I pulled your image from the Docker registry and that is the one that fails to load the /config mount path. I tried the same path used for Selexin's image and tried creating a new one. Both fail.

@se1exin (Owner) commented Apr 7, 2023

@roggstudelpest this PR does not change how the config folder works, and in my testing (albeit not on Synology) the new image works fine with mounting the config folder.

Can you please raise a new issue and include the Synology errors?

@snickers2k

Strange that it now works for you guys... with the latest release my series are finally working, but my 8k movie library is not...

@bigsteve76 commented May 10, 2023

This worked great for my TV Shows (230k episodes), but didn't work on Movies (45k). If I add my Movies library, it gives the "HTTPConnectionPool(host='192.168.1.200', port=32400): Read timed out. (read timeout=30)" error every time after 30 seconds, no matter what I have PLEX_TIMEOUT set to. Have there been any solutions to this?

@peter-mcconnell (Collaborator, Author) commented May 17, 2023

Yeah, there are undoubtedly more scenarios where timeouts are still likely. There are two high-level routes to solving this problem in my head:

  1. Optimise the backend code further to get the calls down to some reasonable SLO for the average use case, ideally < 3s. The last time I looked at the code, most of this time was spent fetching additional attributes for the movies / shows it finds, so this route may also require simplifying the UI.

  2. Avoid the HTTP timeout issue entirely by building a little CLI that uses the same backend calls. This could be a "break glass" tool for people to run if their library is simply too heavy for the UI at present.

Option 1 is of course the neatest product solution, as there is a single, reliable path for all users. However, it is also the most involved from an engineering perspective and could bleed into a more simplified UI view. Option 2 is the worst user experience, but it would be easy to build and would likely be an "ok" option for people whilst waiting on Option 1.

I could try to knock up Option 2 tonight / tomorrow evening if I get some free time (roughly along the lines of the sketch below) and will leave Option 1 to the maintainer(s).
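
For illustration only, here's a rough sketch of what such a break-glass CLI could look like if written as a small Node/TypeScript script; the base URL, env var, and response shape are placeholders and not necessarily what the eventual tool will use:

```typescript
#!/usr/bin/env node
// Hypothetical "break glass" CLI sketch: walk the same paged dupes endpoint
// from the command line instead of the browser, so no proxy timeout applies.
// CLEANARR_URL, the endpoint path, and the response shape are placeholders.

const BASE_URL = process.env.CLEANARR_URL ?? "http://localhost:5000";

interface DupesPage {
  items: unknown[];
  end: boolean;
}

async function main(): Promise<void> {
  let page = 0;
  let end = false;
  while (!end) {
    const res = await fetch(`${BASE_URL}/api/dupes?page=${page}`);
    if (!res.ok) throw new Error(`page ${page} failed: HTTP ${res.status}`);
    const data = (await res.json()) as DupesPage;
    console.log(`page ${page}: ${data.items.length} duplicate entries`);
    end = data.end;
    page += 1;
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Since it talks to the backend directly and just streams progress to stdout, there is no browser or reverse-proxy timeout anywhere in the path.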

@bigsteve76

Thank you for your response. That would be excellent. As I said in my previous comment, the TV version works great and does exactly what I need it to do. I ran two different Docker containers so that I could use the TV version while trying to hash out the Movie version's issues.

peter-mcconnell added a commit to peter-mcconnell/Cleanarr that referenced this pull request May 17, 2023
basic paging implementation for dupes content (se1exin#84)
@peter-mcconnell (Collaborator, Author)

@bigsteve76 I've put something rough together for you to play with: #94

This is VERY simple and rough around the edges, but it should provide a starting block for a CLI-based approach to this product.

@deepzone1

Would you mind adding it to Community Applications? I have no clue how to add dockers from dockerhub in unraid :/
I have a library of 49,000 movies with around 2,000 dupes. Would love to get rid of them :P
