Handling of AniDB bans and 1000+ Anime Series #43

Open
winterbird-code opened this issue Jul 23, 2022 · 2 comments

@winterbird-code

Background

The AniDB API is heavily rate-limited and will ban any IP that requests more than maybe a few hundred series. Some of us data hoarders have far more series than that, meaning we can never do a full metadata refresh without getting banned.

Problems

I can think of three different scenarios where this causes problems:

  • During initial creation of a large library (it was horrible)
  • During full metadata refresh of a large library
  • When adding a large number of series, or episodes in different series, in a short time (a day?)

A bit of brainstorming follows 🙂

Possible solutions?

When I added my library I manually watched the XML cache directory to see when I started getting "banned" messages. At that point I:

  1. Stopped the library scanning
  2. Noted the IDs of the bad XML files and removed them
  3. Waited at least 24 hours
  4. Manually refreshed the metadata for the previously noted IDs
  5. Restarted the library scan; rinse and repeat
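
Step 2 above could be partly automated. Here is a minimal Python sketch, not anything the plugin ships. It assumes that a ban response ends up in the cache as an XML document whose root element is `<error>`, and that anything that fails to parse is also a bad file. The function name and the cache layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def find_bad_cache_files(cache_dir):
    """Return cached XML files that look like error responses or are unparsable.

    Assumption: a ban is stored as a document with an <error> root element.
    """
    bad = []
    for path in sorted(Path(cache_dir).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            bad.append(path)  # truncated, empty, or non-XML content
            continue
        if root.tag == "error":
            bad.append(path)  # assumed shape of a ban/error response
    return bad
```

The returned paths can be noted and deleted by hand, or fed into whatever re-fetches those series once the ban has expired.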

Not the smoothest sailing. I guess PR #42 might have helped me detect the bans, which would be an improvement, but unless it also prevents the plugin from making more requests it might also have made the bans longer (as I understand it, the AniDB API adds ban time for each "unsolicited" request).

A possible high-level solution (I don't know whether it's feasible) would be to limit the plugin to maybe 200 requests during a sliding window of 24 hours. When asked for more, the plugin should just respond that it currently has no metadata and maybe schedule a new refresh for that series in 24, 48, or possibly more hours. The same goes if a ban is detected: refuse to send any more requests for at least 24 hours. It would still take a very long time to refresh a large library, but it would be populated eventually and without any cumbersome user interaction.
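
To make the idea concrete, here is a rough Python sketch of such a limiter. It is only an illustration: the class and method names are made up, the 200-request and 24-hour figures are the guesses from above rather than documented AniDB limits, and the real plugin would need to persist this state across restarts.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests per window; refuse everything after a ban."""

    def __init__(self, max_requests=200, window_seconds=24 * 3600,
                 ban_cooldown_seconds=24 * 3600, clock=time.time):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_cooldown_seconds = ban_cooldown_seconds
        self.clock = clock            # injectable so the limiter can be tested
        self._sent = deque()          # timestamps of requests inside the window
        self._banned_until = 0.0      # no requests are allowed before this time

    def _drop_expired(self, now):
        # Forget requests that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if one may be sent right now."""
        now = self.clock()
        if now < self._banned_until:
            return False
        self._drop_expired(now)
        if len(self._sent) >= self.max_requests:
            return False
        self._sent.append(now)
        return True

    def report_ban(self):
        """Call when a ban response is detected; blocks all further requests."""
        self._banned_until = self.clock() + self.ban_cooldown_seconds

    def seconds_until_next_slot(self):
        """How long to wait before try_acquire() can succeed again."""
        now = self.clock()
        self._drop_expired(now)
        wait = max(0.0, self._banned_until - now)
        if len(self._sent) >= self.max_requests:
            wait = max(wait, self._sent[0] + self.window_seconds - now)
        return wait
```

A caller would check `try_acquire()` before each request; on `False` it would report "no metadata for now" and reschedule the series using `seconds_until_next_slot()` instead of hitting the API.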

As for library metadata refresh, I think there can be some improvements to the XML file caching. At the moment it looks like the XML files are cached for 7 days. One idea would be to add a daily scheduled task that updates the 50 (or 100 or so) oldest XML files for series that are present in the library, and to raise the cache time to 30, 60 or 90 days. This would keep the XML files fairly fresh even for larger collections, and you wouldn't run into trouble if you configure the libraries to refresh periodically.
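
The selection step of such a daily task might look roughly like this in Python. It is a sketch under assumptions: it supposes one cached file per series named `<series id>.xml` somewhere under the cache directory (I have not checked the plugin's real layout), and the function name is hypothetical.

```python
from pathlib import Path


def oldest_cached_series(cache_dir, library_ids, batch_size=50):
    """Return up to batch_size (series_id, path) pairs, oldest file first.

    Assumption: each series is cached as '<series id>.xml' under cache_dir.
    Only series that are present in the library are considered.
    """
    wanted = {str(series_id) for series_id in library_ids}
    candidates = [p for p in Path(cache_dir).rglob("*.xml") if p.stem in wanted]
    # Oldest modification time first, so the stalest files get refreshed.
    candidates.sort(key=lambda p: p.stat().st_mtime)
    return [(p.stem, p) for p in candidates[:batch_size]]
```

With 50 files per day and a 30-day cache time this would cover about 1500 series per cycle, which is the kind of headroom a 1000+ series library needs.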

I realize that this is quite a big change, and maybe the problem is rare enough that few others find it troublesome. Unfortunately I cannot help with the code myself, but I'd like to put it up here as an idea, and maybe someone else will find it a fun challenge. Anyway, thank you for an awesome plugin 😃

@nalsai
Contributor

nalsai commented Jul 23, 2022

PR #42 throws an exception when an API error is detected, which stops the task and doesn't save the bad XML. Unless you start another library scan or make more requests, your ban time shouldn't increase. You just have to manually start another library scan after you're unbanned, and eventually you should have all metadata.

Your proposed solution seems good 👍
When I have time, I might implement the "refuse to send any more requests for at least 24 hours if a ban is detected" part.

I think the XML files are cached for 7 days because most anime release on a weekly schedule, so one episode (or rather the metadata for it) gets added to AniDB per week.

@winterbird-code
Author

Thanks, it would be a great improvement 👍

A 7-day (or maybe even 6-day) cache is absolutely the most reasonable choice for the common use case, but it makes it difficult to keep the cache fresh enough to run library refreshes. Before I noticed the 7-day limit I assumed the plugin always used the cache files until they were removed by the Jellyfin scheduled task "clear cache", so I made a small Python script to refresh the cache a little each day. Unfortunately 7 days is too short a time to refresh the entire cache.

The scheduled task idea would help me (it's what I tried to do with the Python script), but I'm not sure how much of a corner case it is. Just as you say, it would also require some changes to the cache logic, such as refreshing "early" if a requested episode is missing from the cache. For now I'll just have to accept that an anime library cannot be metadata-refreshed. It's not optimal, but it's not critical either.
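
For what it's worth, the cache decision with an "early" refresh could be sketched like this in Python. All names and thresholds are my own assumptions. In particular the `min_age_days` guard is something I added so that an episode AniDB simply doesn't list yet can't trigger a new request on every lookup.

```python
import time

SECONDS_PER_DAY = 86400


def should_refresh(cached_at, cached_episodes, requested_episode=None,
                   max_age_days=30, min_age_days=1, now=None):
    """Decide whether a cached series file should be downloaded again.

    cached_at:        Unix timestamp of when the file was fetched
    cached_episodes:  set of episode numbers present in the cached file
    Thresholds are illustrative guesses, not plugin settings.
    """
    now = time.time() if now is None else now
    age_days = (now - cached_at) / SECONDS_PER_DAY
    if age_days >= max_age_days:
        return True  # normal expiry
    missing = (requested_episode is not None
               and requested_episode not in cached_episodes)
    # Refresh early for a missing episode, but not more than once per min_age_days.
    return missing and age_days >= min_age_days
```

This keeps the long cache time for finished series while still picking up weekly episodes for series that are airing.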
