
Update classic scoring formula to closer match stable score V1 #24924

Merged
merged 2 commits into from Sep 28, 2023

Conversation

@bdach (Collaborator) commented Sep 25, 2023

The goal of this PR is to adjust the classic scoring algorithm to match score V1 closer in several (but not all) aspects. I will attempt to summarise somewhat briefly what this means below, but this will be a long opening post nevertheless, for the code changes here are 1% of the work that was put in to arrive at these changes.

First, let me explain that this PR's intention is not to port stable score V1 across literally. That is infeasible due to several constraints placed upon the scoring algorithms; some are negotiable, some are not. This PR only improves some aspects of classic score. I will attempt to explain this exhaustively but briefly in the sections below.

What is supposed to match closer?

osu!mania

mania is a bit of a black sheep in this diff, so let's get it out of the way first.

stable score V1 for mania is basically linear, and capped at 1 million anyway. Thus, the concept of "classic" score doesn't really make much sense; VSRG players are generally used to the 1M cap, and having score blow up like it did doesn't seem to benefit anybody. So this diff makes the "classic" setting do absolutely nothing for mania - as in, mania classic and standardised score are the same value. Period.

Now, standardised score is not actually the same as stable score V1, but bringing that across is generally not feasible due to other constraints. But it will do well enough - the rate of change and magnitude both match, and that's all I care about for the purposes of this diff.

If you only care about mania, you don't need to read the rest of this. Everything else is for the other rulesets.

Bonus is (slightly) less susceptible to being worthless at the start of a map

I'll be brief; the root issue at hand is mostly described already in #17011 (comment). What this PR does about it is ensure that if standardised score rises by 10 points, classic score will rise by at least 1 point.

How was this achieved?

When fitting the new formulae to total score values on existing beatmaps, I forced a constant 100k score component on the formulae; multiplying the normalised standardised score ($\frac{\texttt{standardised}}{1000000}$) by 100k makes it so that every 10 standardised points correspond to at least 1 classic point.

This could possibly be considered a little ridiculous on short maps, as I'm sure you'll be able to tell by the test case changes, but if it's any consolation, it's not really that much worse than what standardised score does on short maps.
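The effect of the forced 100k constant can be sketched in a few lines (hypothetical Python, not the actual implementation; `map_factor` stands in for the fitted map-dependent part of the formula):

```python
MAX_SCORE = 1_000_000

def classic_score(standardised: int, map_factor: float) -> int:
    # `map_factor` is a placeholder for the fitted, map-dependent part of
    # the formula; the forced 100_000 constant guarantees a minimum slope
    # of 100_000 / 1_000_000 = 0.1 with respect to standardised score.
    return round((map_factor + 100_000) * standardised / MAX_SCORE)

# Even with a zero map-dependent factor, +10 standardised => +1 classic.
assert classic_score(10, 0) - classic_score(0, 0) >= 1
```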

Magnitude of classic total score on perfect and near-perfect plays

The general ballpark of how high classic score goes for perfect plays should now be closer to stable. This will vary wildly per map for reasons I have yet to explain, but the hope is that the fit will generally improve.

How was this achieved?

Using lazer's score V1 simulators, intended to be used for conversion to standardised score, I simulated the maximum base score (without bonus) for every beatmap from the data.ppy.sh data dumps for osu!, taiko, and catch (including converts). Using this data, I then fit an at-most-quadratic factor dependent on total basic object count for each ruleset, with the 100k constant factor described previously forced; the optimisation criterion was the minimisation of relative error when compared to stable.
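A simplified sketch of this fitting step, under the assumption of a fixed quadratic-plus-constant model and synthetic data (the real process and data are in the linked gist and notebook):

```python
# Fit max_base_score ≈ a*n^2 + b*n + 100_000, minimising *relative* error,
# with the 100k constant forced (n = total basic object count).
def fit_quadratic_relative(ns, scores):
    # Minimise sum(((a*n^2 + b*n + 100_000) / y - 1)^2) over (a, b).
    # The problem is linear in (a, b): solve the 2x2 normal equations.
    suu = suv = svv = su = sv = 0.0
    for n, y in zip(ns, scores):
        u, v = n * n / y, n / y      # model terms, scaled by 1/y
        t = 1 - 100_000 / y          # target after fixing the constant
        suu += u * u; suv += u * v; svv += v * v
        su += u * t; sv += v * t
    det = suu * svv - suv * suv
    a = (su * svv - suv * sv) / det
    b = (suu * sv - suv * su) / det
    return a, b

# Sanity check on noiseless synthetic data (coefficients here are made up,
# not the fitted osu! values): recovery should be essentially exact.
ns = [100, 500, 1000, 2000]
scores = [20.0 * n * n + 1000.0 * n + 100_000 for n in ns]
a, b = fit_quadratic_relative(ns, scores)
```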

The code used to generate the data to fit against, the generated data, and the fitting process have been described in this gist.

Note that the data extraction process already included changes from #24779 - although since I was fitting to maximum base score without bonus, I don't think the changes therein should have any meaningful impact.

For reproduction, if required:

  1. Apply this diff onto Split legacy scoring attribs into its own table #24779
  2. Run this program against beatmap dumps from data.ppy.sh
  3. You should get a CSV like so
  4. This jupyter notebook describes the rest.

The general rate of ascent of classic score

The rates of ascent of score V1 and V2 on a perfect score on stable are - as far as I can tell - as follows:

| ruleset | score V1 | score V2 |
| --- | --- | --- |
| osu! | $O(n^2)$ | $O(n^2)$ |
| osu!taiko | $O(n)$ | $O(n \log n)$ (until 400 combo) / $O(n)$ (over 400 combo) |
| osu!catch | $O(n^2)$ | $O(n \log n)$ (until 200 combo) / $O(n)$ (over 200 combo) |

The table above uses the big O notation for representing asymptotic boundedness of the scoring functions. $n$ is the number of objects in each case.

The previously used rescale still assumed old standardised score (which was linear, $O(n)$). This no longer holds, which means that squaring osu! score, which previously made sense, no longer does.

How was this achieved?

For both osu! and osu!taiko, since the asymptotic bound for both scoring algorithms is the same, classic score is now essentially a linear rescale of standardised score.

The scoring test scenes that I recently added also aim to illustrate that this matches better. See screenshots below:

| ruleset | before | after |
| --- | --- | --- |
| osu! | before-osu | after-osu |
| osu!taiko | before-taiko | after-taiko |
| osu!catch | before-catch | after-catch |
| osu!mania | before-mania | after-mania |

When reading the diff, note that catch is the odd one out here. Going fully linear with catch actually results in a worse estimate of the rate of ascent; therefore, the normalised standardised score for catch still enters as a quadratic factor.
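The two conversion shapes can be put side by side, using the constants that appear in the diff later in this thread (mapping case 1 to taiko and case 2 to catch is my reading of the code; treat the exact mapping as an assumption):

```python
MAX_SCORE = 1_000_000

def classic_taiko(standardised: int, object_count: int) -> int:
    # linear in standardised: same asymptotic rate of ascent as standardised
    return round((object_count * 1109 + 100_000) * standardised / MAX_SCORE)

def classic_catch(standardised: int, object_count: int) -> int:
    # quadratic in standardised, to better match stable's O(n^2) ascent;
    # standardised / 10 is the forced 100k bonus floor at full score
    return round((standardised / MAX_SCORE * object_count) ** 2 * 21.62
                 + standardised / 10)
```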

What will not match closer? Why?

Scores cannot be reordered, and the corollaries

Since we've already established that we're not permitting scores to be reordered between classic and standardised scoring, this limits the possibilities for implementation of classic quite considerably. In general, for reorders to be impossible, classic must be a strictly monotonic function of standardised (in math terms, this means that if standardised score increases, classic score must also increase).
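To illustrate the monotonicity requirement with a toy example (the function `f` below is arbitrary; any strictly increasing function has this property):

```python
# If classic = f(standardised) for a strictly increasing f, any set of
# scores ranks identically on both scales, so leaderboards never reorder.
def f(standardised: int) -> int:
    return standardised * 3 + 100  # arbitrary strictly increasing function

scores = [12_345, 999_999, 500_000, 12_346]
assert sorted(scores, reverse=True) == sorted(scores, key=f, reverse=True)
```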

This has the following side-effects:

Classic score is still not monotonic with respect to misses (aside from catch)

If you miss with Score V1 on stable, you do not lose score; if you miss with Score V2, you do lose score, since the accuracy portion goes down. (The edge case is catch, wherein Score V2 is actually monotonic. I'm discarding it from further deliberations in this subsection.)

Since classic score has to be a monotonic function of standardised, and on standardised you lose score on missing, you still lose score on missing in classic, too.

Accounting for bonus score is best-effort

We generally cannot subtract bonus from the user's score, apply rescaling, and then add the bonus back again, as this action may reorder scores. This is why - despite actually ruining fits - the 100k factor is added like it is, to attempt to compensate for bonus.

osu!catch mod multipliers are essentially squared in classic scoring mode

Since classic scoring mode receives the standardised score post-application of mod multipliers to rescale (as it must, in order to avoid reordering scores), this means that in catch, if the multiplier is not 1, then its effect is actually squared.

This can be fixed, but at the cost of sacrificing catch's rate of ascent (it would have to be linear rather than quadratic).
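A quick arithmetic check of the squaring effect, using the quadratic catch term from this branch (object count and score values here are arbitrary):

```python
# The dominant classic term for catch is quadratic in standardised, so a
# mod that multiplies standardised by m multiplies that term by m^2.
def catch_quadratic_term(standardised: float) -> float:
    return (standardised / 1_000_000 * 100) ** 2 * 21.62  # 100 objects

base = catch_quadratic_term(500_000)
with_mod = catch_quadratic_term(500_000 * 1.12)  # e.g. a 1.12x multiplier
assert abs(with_mod / base - 1.12 ** 2) < 1e-9   # effective ~1.2544x
```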

Classic score still has no concept of a "score multiplier"

For osu!, taiko, and catch, score V1 is also a byproduct of a map-dependent "score multiplier", which depends on: drain rate, overall difficulty, circle size, object count, and drain length. This does not exist for classic scoring yet. It potentially could, with generally minor difficulty - because it's a map-dependent value, it is actually a separate concern from all of the score reordering issues, and we could bring it back if we think the complexity is worth it. I didn't want to do that yet, though.
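For reference, a rough, community-documented approximation of stable's score V1 difficulty multiplier looks like the following (an illustration under assumptions only - this is not what the PR implements, and the exact clamping and rounding may differ per ruleset):

```python
def score_v1_difficulty_multiplier(hp: float, od: float, cs: float,
                                   object_count: int,
                                   drain_seconds: float) -> int:
    # object density contributes up to 16 points on top of the settings
    density = min(max(object_count / drain_seconds * 8, 0), 16)
    return round((hp + od + cs + density) / 38 * 5)
```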

Custom rulesets

As a footnote, when touching this code and noticing how score implementations diverge, I decided that making classic score do anything for custom rulesets doesn't really make much sense, so I did what I did for mania and just made classic score do nothing for other rulesets. This is up for further discussion, although it has to be said that, with how divergent the score conversion is for particular rulesets, setting forth any algorithm at this point would be as arbitrary as just having classic score not work. Maybe that's something for the ruleset API in the future.


@WitherFlower this is one for you to look over - interested to see if you've got any feedback on this since I considerably altered your original proposals.

@bdach bdach self-assigned this Sep 25, 2023
@WitherFlower

I don't have much to say regarding the changes to the scaling values; it's what the optimization spat out, and the method you used is essentially the same as mine, but on the largest set of maps possible.

The main criticism I have with this is not using a linear scaling for catch. I think it's really unnatural to have mods give you more score than they say they should, and I don't think rate of growth matters enough to lose the linearity.

Aside from that, really great work, I wasn't prepared for that amount of detail on the notebook 😅

@bdach (Collaborator, Author) commented Sep 25, 2023

The main criticism I have with this is not using a linear scaling for catch. I think it's really unnatural to have mods give you more score than they say they should, and I don't think rate of growth matters enough to lose the linearity.

I'm still willing to budge on this, for what it's worth. I expect the review process to be mostly concerned with discussing the tradeoffs here and which ones should be taken over others.

Just needed to PR this for the discussion to commence properly is all.

@peppy (Member) commented Sep 26, 2023

Are we able to see a graphed example of mod multipliers applied in osu!catch?

@bdach (Collaborator, Author) commented Sep 26, 2023

What sort of graph would you be expecting to see, exactly? Do you mean the same one as the OP has but with mods, or something else entirely?

@peppy (Member) commented Sep 26, 2023

What sort of graph would you be expecting to see, exactly? Do you mean the same one as the OP has but with mods, or something else entirely?

yep, basically. can just be a once-off, to help explain how much of a change this looks like visually.

@bdach (Collaborator, Author) commented Sep 26, 2023

yep, basically. can just be a once-off, to help explain how much of a change this looks like visually.

Ok, so to answer your request...

The plots below show this branch as-is with various mod configurations. There are two variants for each plot: one plots the score as relative to the max (so the Y axis is relative), and the other plots the score as an absolute value. Therefore, the first plot shows the trend of score, while the second shows the actual magnitude of score.

Note
Everything below was plotted assuming a map-dependent "score multiplier" of 4. The multiplier value was chosen to get score V1 as close to classic as possible in nomod. This begins to change wildly when the score multiplier is varied.

| mods | relative | absolute |
| --- | --- | --- |
| none | before_baseline_relative | before_baseline_absolute |
| DT | before_dt_relative | before_dt_absolute |
| HT | before_ht_relative | before_ht_absolute |
| NF | before_nf_relative | before_nf_absolute |

The notable conclusions here are:

  • The shape of the curve, or the rate of ascent, is generally preserved; this is because in both stable score V1 and in classic score, the multiplier is more or less (but not exactly) a flat multiplier onto score. So you could multiply either by anything and the general shape of the curve would remain the same.
  • The thing that varies wildly here is the magnitude. This is caused by the aforementioned multiplier squaring effect, combined with the fact that stable's mod multipliers do not always match lazer's. Refer to this table:
| mod | stable score V1 multiplier | lazer standardised multiplier | lazer standardised multiplier, squared |
| --- | --- | --- | --- |
| No Fail | 0.5x | 0.5x | 0.25x |
| Easy | 0.5x | 0.5x | 0.25x |
| Half Time / Daycore | 0.3x | 0.7x¹ | 0.49x¹ |
| Hidden | 1.06x | 1.06x | 1.12x |
| Hard Rock | 1.12x | 1.12x | 1.25x |
| Double Time / Nightcore | 1.06x | 1.10x² | 1.21x² |
| Flashlight | 1.12x | 1.12x | 1.25x |

This branch, as is, currently uses the last column as the effective multiplier in classic scoring mode. Generally this is worse for most mods, with the sole exception of Half Time / Daycore, which gets a measly 0.3x on stable, so squaring 0.7x to 0.49x actually provides a closer match.

What @WitherFlower is proposing is to apply something like the following instead:

```diff
diff --git a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
index ba56e32268..af6f7a80f4 100644
--- a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
+++ b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
@@ -52,7 +52,7 @@ private static long convertStandardisedToClassic(int rulesetId, long standardise
                     return (long)Math.Round((objectCount * 1109 + 100000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);

                 case 2:
-                    return (long)Math.Round(Math.Pow(standardisedTotalScore / ScoreProcessor.MAX_SCORE * objectCount, 2) * 21.62 + standardisedTotalScore / 10d);
+                    return (long)Math.Round((objectCount * objectCount * 21.62 + 1000000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);

                 case 3:
                 default:
```

which looks like so:

| mods | relative | absolute |
| --- | --- | --- |
| none | after_baseline_relative | after_baseline_absolute |
| DT | after_dt_relative | after_dt_absolute |
| HT | after_ht_relative | after_ht_absolute |
| NF | after_nf_relative | after_nf_absolute |

Notable takeaways from that second set of graphs:

  • Shape of curve changes to be more linear - rate of score growth no longer matches as closely.
  • However, otherwise absolute values are closer to stable, wherever the mod multipliers do not diverge between stable's and lazer's.
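For a concrete comparison, both catch conversions from the diff above can be evaluated directly (the formulas are copied from the diff; the numbers are illustrative):

```python
MAX_SCORE = 1_000_000

def catch_current(standardised: float, n: int) -> float:
    # this branch: quadratic in standardised (mods effectively square)
    return (standardised / MAX_SCORE * n) ** 2 * 21.62 + standardised / 10

def catch_proposed(standardised: float, n: int) -> float:
    # proposed: linear in standardised (mods apply as-is)
    return (n * n * 21.62 + 1_000_000) * standardised / MAX_SCORE

n = 1000
# the proposed form is exactly linear: half the standardised, half the classic
assert abs(catch_proposed(MAX_SCORE / 2, n) / catch_proposed(MAX_SCORE, n) - 0.5) < 1e-9
# the current form grows faster than linearly towards the top
assert catch_current(MAX_SCORE / 2, n) / catch_current(MAX_SCORE, n) < 0.5
```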

Footnotes

  1. With default rate of 0.75x.

  2. With default rate of 1.5x.

@peppy (Member) commented Sep 27, 2023

I think I prefer what you have, to be honest.

@peppy peppy self-requested a review September 27, 2023 08:13
@peppy (Member) commented Sep 27, 2023

@smoogipoo did you want to go through this one at all?

@WitherFlower commented Sep 27, 2023

Fwiw, I also want to mention that the non-linear transformation will break the order in team versus. See #17824

@bdach (Collaborator, Author) commented Sep 27, 2023

Good point, also in playlists I guess...... forgot about this one.

Probably gonna have to go back to the linear version, then?

@WitherFlower

Playlists are different, as the source of breakage there is in the different max scores. I think the solution on that front would be a scoring mode switch at playlist level.

@peppy (Member) commented Sep 27, 2023

I feel like playlists with classic score toggle are broken in... more advanced ways:

osu.2023-09-27.at.08.58.25.mp4

I think I've discussed this before with @smoogipoo (in addition to multiplayer team vs totals, another case of aggregating scores) and we're not sure we want to allow the toggle to work here - i.e. always using standardised scoring in certain places.

@bdach (Collaborator, Author) commented Sep 27, 2023

Yes, the point was more so that squaring scores doesn't work anywhere you sum them afterwards. Playlists are a case of that if you have multiple maps.

To spell it out (in a very simplified manner), the reason for this is that generally the following equation:

$$ \sum \texttt{score}_i^2 \neq \left( \sum \texttt{score}_i \right)^2 $$

does not hold. The function on the right side of the equation is monotonic and will not reorder sums, but the function on the left is not monotonic and can reorder sums.
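A concrete counterexample with made-up team scores shows the reordering:

```python
# Squaring per-map (or per-player) scores before summing can reorder
# aggregate totals. Hypothetical numbers:
team_a = [500_000, 500_000]   # consistent team
team_b = [900_000, 50_000]    # one big score, one small

# On the standardised (summed-as-is) scale, team A wins:
assert sum(team_a) > sum(team_b)          # 1,000,000 > 950,000

# But summing the squared scores flips the order:
assert sum(s * s for s in team_a) < sum(s * s for s in team_b)
```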

Whether playlists should be forced to standardised is an orthogonal matter. It's a valid concern anyhow.

@WitherFlower commented Sep 27, 2023

Right, maybe a classic score "win condition" could be added to playlists later down the line then.

@peppy (Member) commented Sep 27, 2023

I'd say we go ahead with this, as it's a visual only change. We can adjust in the future without much consequence.

@smoogipoo smoogipoo merged commit db9113b into ppy:master Sep 28, 2023
15 of 17 checks passed
@bdach bdach deleted the update-classic-scoring branch September 28, 2023 12:00