Refactor BeatmapStore to use MemoryCache #262

Open
wants to merge 6 commits into master
Conversation

tsunyoku
Sponsor Member

@tsunyoku tsunyoku commented Apr 17, 2024

Resolves #143.

As originally mentioned by @smoogipoo, Microsoft's caching extensions work really well for this purpose. I've chosen MemoryCache to maintain existing behaviour; however, if you are running multiple processor pods in production I'd suggest using a distributed cache such as Redis, so a consistent cache can be shared across them all.

This should allow for more fine-tuning regarding memory usage, and avoid unnecessary caching.

Remarks

Cache sizing

MemoryCache has a concept of an entry having a "size", which can be used to stop the cache expanding beyond a set limit. The MEMORY_CACHE_SIZE_LIMIT environment variable dictates the size limit for both the beatmap and difficulty attribute caches, with each beatmap and difficulty attribute entry contributing 1 to this size (the "size" concept doesn't have any defined units; you decide what each entry contributes).
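
As a rough sketch of how this fits together with Microsoft.Extensions.Caching.Memory (names and values here are illustrative rather than the PR's exact code):

using System;
using Microsoft.Extensions.Caching.Memory;

// Illustrative default of 1000, matching the description above.
long sizeLimit = long.Parse(Environment.GetEnvironmentVariable("MEMORY_CACHE_SIZE_LIMIT") ?? "1000");

var beatmapCache = new MemoryCache(new MemoryCacheOptions
{
    // Once the sum of entry sizes reaches this limit, new entries are
    // rejected until existing ones are evicted or compacted.
    SizeLimit = sizeLimit
});

// When SizeLimit is set, every entry must declare a size; here each
// beatmap counts as 1 (the "size" has no inherent unit).
beatmapCache.Set(12345, new object() /* placeholder for a Beatmap */, new MemoryCacheEntryOptions { Size = 1 });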

Sliding expiration

There are a few ways you can handle expiration with MemoryCache, but in this case I think it makes sense to use a sliding expiration, as you want frequently requested beatmaps or attributes to remain in the cache. Sliding expiration evicts entries which haven't been accessed within the sliding window - this is set in seconds by the MEMORY_CACHE_SLIDING_EXPIRATION_SECONDS environment variable.
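
A minimal sketch of the per-entry options (again illustrative; the actual PR code may differ):

using System;
using Microsoft.Extensions.Caching.Memory;

int slidingExpirationSeconds = int.Parse(Environment.GetEnvironmentVariable("MEMORY_CACHE_SLIDING_EXPIRATION_SECONDS") ?? "3600");

var entryOptions = new MemoryCacheEntryOptions()
    // Counts towards the cache's SizeLimit as described above.
    .SetSize(1)
    // Evicted if not read within this window; each read resets the timer.
    .SetSlidingExpiration(TimeSpan.FromSeconds(slidingExpirationSeconds));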

Separate caches

You could easily use one cache for both beatmaps and difficulty attributes, as keys can be any object. However, with the size limiting in place it made sense to me to keep them separate - this also preserves the previous behaviour.

Cache defaults

I've defaulted the cache size to 1000 and the sliding expiration to 3600 seconds (1 hour). I think these are sane defaults, but I don't have insight on production numbers so these should definitely be adjusted.

@smoogipoo
Contributor

smoogipoo commented Apr 18, 2024

This looks pretty much as I'd expect it to.

The size is probably a little bit too small. If my rough estimation is correct (going by the fields in the classes), then Beatmap should be 72 bytes and BeatmapDifficultyAttributes 24 bytes; 1000 of each would total around 100KB.
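
(Back-of-the-envelope, under those assumed sizes: 1000 × 72 B + 1000 × 24 B = 96,000 B, i.e. roughly 100KB.)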

This is probably at least 2 orders of magnitude less memory than a reasonable amount.

@ThePooN What's the memory usage statistics on production like? Is there a target you think is reasonable for this service to use in a long term (without accounting for spikes/etc, assuming caches are filled)?

@tsunyoku
Sponsor Member Author

Yeah, numbers definitely need looking at since I have no real insight. I'll adjust the defaults once you get a better idea from @ThePooN, but I'll keep the configuration options there anyway so you can adjust on prod.

@ThePooN
Member

ThePooN commented Apr 18, 2024

We're currently cruising at around 180MiB per instance. It seems like load only has an impact on CPU, not RAM usage.
Looking back at historical data, it seems #143 was resolved last December.

If it's of any use as a comparison, osu-performance is run at one pod per gamemode, cruising at 450MiB for osu!, 110MiB for osu!taiko, 100MiB for osu!catch, and 1080MiB for osu!mania.
This deployment architecture could be leveraged here as well for more optimal cache usage (unless we consider moving the cache to Redis).

I assume the relevant dataset is beatmaps with leaderboards and their difficulty attributes, correct?
As of writing, there are 169,242 ranked/approved/qualified/loved beatmaps, which represent 121,686,749 attributes.
If it's including graveyard/pending, I suggest not caching these at all, as that's a way bigger dataset of rarely-accessed rows (2,753,464 beatmaps, 190,383,808 attribs), and we'd need a strongly-consistent invalidation mechanism, definitely not worthwhile 😅

Either way, we indeed need to cache way more than 1000 rows. If I trust smoogipoo that a beatmap row is only 72 bytes, then caching the full leaderboard-enabled set would only represent 12MiB. We might as well use a plain HashMap or similar if that's true; I don't see why we'd ever need to evict those entries.
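
(Rough check, again assuming 72 bytes per beatmap row: 169,242 × 72 B ≈ 12.2 MB, or about 11.6 MiB, so the 12MiB figure holds.)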

For attributes, I'd start with 100MiB. Realistically we could use way higher, but we will have to experiment and see what yields the most efficient results, so I appreciate the use of environment variables.

@tsunyoku
Sponsor Member Author

tsunyoku commented Apr 18, 2024

The beatmap cache was a pre-existing thing as a hash map, as you mentioned - I just moved it to a MemoryCache alongside the attributes because I wasn't aware it wasn't an issue. I can revert to a ConcurrentDictionary if it's a non-issue or that's just generally preferred, but I'd rather not make that call myself.

As for attributes, I'll change the default to meet around the 100MiB suggested.

@peppy peppy self-requested a review April 19, 2024 05:42
@ThePooN
Member

ThePooN commented Apr 19, 2024

For attributes, I'd start with 100MiB. Realistically we could use way higher, but we will have to experiment and see what yields the most efficient results, so I appreciate the use of environment variables.

For that to happen though, exposing a cache hit/miss ratio over datadog metrics is needed.

@bdach
Collaborator

bdach commented Apr 19, 2024

This seems to be built in; we just need to decide how often to send it over so as not to overdo it bill-wise, I suppose.
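
For reference, a sketch of what the built-in statistics look like, assuming a recent enough Microsoft.Extensions.Caching.Memory (statistics tracking needs the 7.0+ package); how the values would actually be pushed to Datadog is a separate question:

using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions
{
    // Hit/miss counters are only collected when this is enabled.
    TrackStatistics = true
});

// Read periodically and report as gauges (e.g. hit/miss ratio).
MemoryCacheStatistics? stats = cache.GetCurrentStatistics();
long hits = stats?.TotalHits ?? 0;
long misses = stats?.TotalMisses ?? 0;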

@ThePooN
Member

ThePooN commented Apr 19, 2024

This seems to be built in; we just need to decide how often to send it over so as not to overdo it bill-wise, I suppose.

Metrics are billed per index (a combination of metric name + tags); volume/frequency doesn't matter.

@peppy
Sponsor Member

peppy commented Apr 20, 2024

Is there a point to any of this if we're already having 100% of content in memory without issue?

ie. is this not wasted time/complexity?

@@ -29,14 +29,27 @@ public class BeatmapStore
{
private static readonly bool use_realtime_difficulty_calculation = Environment.GetEnvironmentVariable("REALTIME_DIFFICULTY") != "0";
private static readonly string beatmap_download_path = Environment.GetEnvironmentVariable("BEATMAP_DOWNLOAD_PATH") ?? "https://osu.ppy.sh/osu/{0}";
private static readonly uint memory_cache_size_limit = uint.Parse(Environment.GetEnvironmentVariable("MEMORY_CACHE_SIZE_LIMIT") ?? "1000");
Sponsor Member

this unit is super counter intuitive. if it can't be in megabytes then i'm against this change.

Sponsor Member Author

yeah this can be MB or really any unit that you want, since we can just do some math with the size of the classes.

Sponsor Member Author

@tsunyoku tsunyoku Apr 24, 2024

Circling back to this one, I realise there isn't a way I can reliably get the memory size of these since they are managed objects, so things like sizeof(T) and Marshal.SizeOf<T>() are out of the question. @smoogipoo any ideas? 😅

Hardcoding a number sounds horrible, but maybe we can do a sizeof(T) summation for each field in the class somewhere and live with the few-byte inaccuracy...

Collaborator

but maybe we can do a sizeof(T) summation for each field in the class somewhere

At this point I'd probably rather hardcode a number with an inline comment or something.

Do mind that in the case of difficulty attributes the size will depend on the ruleset, so that'll have to be looked up from a dictionary or something.

Sponsor Member Author

@tsunyoku tsunyoku Apr 24, 2024

Do mind that in the case of difficulty attributes the size will depend on the ruleset

I don't think this applies here; regardless of ruleset, the same BeatmapDifficultyAttribute is fetched - we just need to handle the number of attributes in the array.

Collaborator

Oh, I confused that with the client-side classes. Sure.

Contributor

What I was going to post initially was something along these lines:

With bytes you optimise for size, but this is an estimation at best that doesn't account for fragmentation, heap sizes, and possibly other factors. It's also realistically only a lower bound because it doesn't factor in burst usage (e.g. while a diffcalc is going on, perhaps). There's no real good way to measure this afaik - I did it visually based on what I know about C#.

On the other hand, "1" allows you to think of it as the number of most recently used objects. You'd tune this directly based on hit/miss ratios to reach a target, and tune k8s to give the container enough resources after that.
In a way, the counterintuitiveness of it makes it more intuitive.

Sponsor Member

Hmm, that might make sense from one end, but when deploying we'd have to do manual calculations to figure out memory requirements for the deploy. Generally we don't decide cache limits based on hit ratio but by available resources on the server.

Feels weird having a MemoryCache component which doesn't guarantee an upper limit, though...

@tsunyoku
Sponsor Member Author

tsunyoku commented Apr 20, 2024

Is there a point to any of this if we're already having 100% of content in memory without issue?

ie. is this not wasted time/complexity?

Currently - probably not. If/when osu-performance is turned off then I think that might change, but @ThePooN is far more qualified to comment on that. I didn't know this was no longer an issue, or I likely would have held off until then.

@bdach
Collaborator

bdach commented Apr 20, 2024

I believe we did have the processor fall over at least once due to running out of memory, didn't we? That was fixed at the time by just recycling the cache more aggressively or something?

Introducing this could increase the cache hit rate without running the risk of OOMing again, and thus improve performance / throughput.

@smoogipoo
Contributor

Is there a point to any of this if we're already having 100% of content in memory without issue?

As @bdach points out, we've had it fall over in the past. Since then, we've been recycling the beatmap store every 60 sec:

if (buildStore == null || beatmapStore == null || currentTimestamp - lastStoreRefresh > 60_000)
{
    buildStore = await BuildStore.CreateAsync(connection, transaction);
    beatmapStore = await BeatmapStore.CreateAsync(connection, transaction);
    lastStoreRefresh = currentTimestamp;
}

I can't tell if the cache is doing anything with the current metrics we have.

By the way, that recycling is still present in this PR. It would have to be removed for this to do anything, but that also presents an invalidation issue that needs to be addressed. I'm not sure how.

ie. is this not wasted time/complexity?

Maybe.

@ThePooN
Member

ThePooN commented Apr 20, 2024

By the way, that recycling is still present in this PR. It would have to be removed for this to do anything, but that also presents an invalidation issue that needs to be addressed. I'm not sure how.

Do we actually need an invalidation path, given a small enough expiration time? I don't think a map can get disqualified and re-qualified within an hour?
Otherwise, the beatmap's md5sum can be added to the cache key if that's in the score input data.

@peppy
Sponsor Member

peppy commented Apr 21, 2024

Does the memory cache API support invalidation? I'd say we could setup a listener for bss_process_queue or similar to handle invalidation in that case.

@tsunyoku
Sponsor Member Author

tsunyoku commented Apr 21, 2024

The API supports invalidation via Remove(object key), but we'll have to subclass MemoryCache for the difficulty attribute keys since the beatmap ID will be part of the key. For some reason MemoryCache doesn't let you access all keys without reflection, so I tend to just opt for subclassing and tracking keys manually.
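
A rough sketch of what that subclassing and manual key tracking could look like (hypothetical names, not the actual implementation):

using System.Collections.Concurrent;
using Microsoft.Extensions.Caching.Memory;

// Hypothetical sketch: wraps MemoryCache, remembering which attribute keys
// belong to which beatmap so they can all be invalidated in one go.
public class BeatmapAttributeCache : MemoryCache
{
    private readonly ConcurrentDictionary<uint, ConcurrentBag<object>> keysByBeatmap = new();

    public BeatmapAttributeCache(MemoryCacheOptions options)
        : base(options)
    {
    }

    public void SetAttribute(uint beatmapId, object key, object value, MemoryCacheEntryOptions entryOptions)
    {
        this.Set(key, value, entryOptions);
        keysByBeatmap.GetOrAdd(beatmapId, _ => new ConcurrentBag<object>()).Add(key);
    }

    public void InvalidateBeatmap(uint beatmapId)
    {
        if (!keysByBeatmap.TryRemove(beatmapId, out var keys))
            return;

        foreach (object key in keys)
            Remove(key);
    }
}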

I didn't realise the cache was being recycled in the ScorePerformanceProcessor, so I'll need to change that for BeatmapStore only, since I haven't refactored BuildStore to use MemoryCache (I don't see a reason to).

@peppy
Sponsor Member

peppy commented Apr 22, 2024

As above though, the recycling should not be happening if we are switching to this. Else it's not improving much at all (and just adding complexity).

Probably something similar to https://github.com/ppy/osu-server-spectator/blob/86a0ff06bda261a67a918d8ee07d1aa2f2b12006/osu.Server.Spectator/Hubs/Metadata/MetadataBroadcaster.cs#L48-L60 is required.

@tsunyoku
Sponsor Member Author

tsunyoku commented Apr 24, 2024

Not sure I follow. If we're going to be using a listener to invalidate caches then I don't think we need to poll the database for changes - removing the recycling should be enough. I've intentionally kept the recycling for BuildStore only, since that's still just using a dictionary underneath - unless you'd like this changed too, but I don't think there's any gain in that for builds?

@tsunyoku
Sponsor Member Author

In d168720, I've refactored the memory cache size to be specified in MB and adjusted the default to 100. Per the earlier conversation, there's no good way to calculate the memory taken up by an attribute, beatmap, etc., so I've placed a constant for each with an inline explanation. Note that it's not perfect - the array itself means it's a couple more bytes than sizeof(T) * length, and Nullable<T> adds a couple of bytes too. It's a bit tedious, but if limiting in MB is the goal then I don't think it can get much better.

Comment on lines 35 to 43
/// <summary>
/// The size of a <see cref="BeatmapDifficultyAttribute"/> in megabytes. Used for tracking memory usage.
/// </summary>
private const int beatmap_difficulty_attribute_size = 24 / 1024 / 1024;

/// <summary>
/// The size of a <see cref="Beatmap"/> in megabytes. Used for tracking memory usage.
/// </summary>
private const int beatmap_size = 72 / 1024 / 1024;
Collaborator

Um... these are going to be zero, are they not? It's integer division. It's gonna floor towards zero.

You're gonna have to switch to bytes I guess.

Sponsor Member Author

right...

@tsunyoku
Sponsor Member Author

Moved to bytes because of integer division problem, shouldn't make any real difference.

Development

Successfully merging this pull request may close these issues.

BeatmapStore grows indefinitely (probably needs some kind of expiry logic)
5 participants