Update classic scoring formula to more closely match stable score V1 #24924
Conversation
I don't have much to say regarding the changes to the scaling values, as they're what the optimization spit out, and the method you used is essentially the same as mine but on the largest set of maps possible. The main criticism I have is the decision not to use a linear scaling for catch. I think it's really unnatural to have mods give you more score than they say they should, and I don't think the rate of growth matters enough to lose the linearity. Aside from that, really great work; I wasn't prepared for that amount of detail in the notebook 😅 |
I'm still willing to budge on this, for what it's worth. I expect the review process to be mostly concerned with discussing the tradeoffs here and which ones should be taken over others. Just needed to PR this for the discussion to commence properly is all. |
Are we able to see a graphed example of mod multipliers applied in osu!catch? |
What sort of graph would you be expecting to see, exactly? Do you mean the same one as the OP has but with mods, or something else entirely? |
yep, basically. can just be a once-off, to help explain how much of a change this looks like visually. |
Ok, so to answer your request... The plots below show this branch as-is with various mod configurations. There are two variants for each plot: one plots score relative to the max (so the Y axis is relative), and the other plots score as an absolute value. The first variant therefore shows the trend of score, while the second shows its actual magnitude.
The notable conclusions here are:
This branch as-is currently uses the last column as the effective multiplier in classic scoring mode. Generally this is worse for most mods, with the sole exception of Half Time / Daycore, which gets a measly 0.3x on stable, so squaring 0.7x to 0.49x actually provides a closer match. What @WitherFlower is proposing is to apply something like the following instead:

```diff
diff --git a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
index ba56e32268..af6f7a80f4 100644
--- a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
+++ b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
@@ -52,7 +52,7 @@ private static long convertStandardisedToClassic(int rulesetId, long standardise
                     return (long)Math.Round((objectCount * 1109 + 100000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);
 
                 case 2:
-                    return (long)Math.Round(Math.Pow(standardisedTotalScore / ScoreProcessor.MAX_SCORE * objectCount, 2) * 21.62 + standardisedTotalScore / 10d);
+                    return (long)Math.Round((objectCount * objectCount * 21.62 + 1000000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);
 
                 case 3:
                 default:
```
which looks like so:
Notable takeaways from that second set of graphs:
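To make the shape difference concrete, here is a small Python sketch (not part of the PR; constants copied from the diff above, with 1,000,000 standing in for `ScoreProcessor.MAX_SCORE`). For a fixed map, the current branch's catch conversion is quadratic in standardised score, while the proposed one is linear in it:

```python
MAX_SCORE = 1_000_000  # stand-in for ScoreProcessor.MAX_SCORE

def current_branch(standardised, object_count):
    # Current branch: quadratic in `standardised` for a fixed map,
    # plus a small linear term.
    return (standardised / MAX_SCORE * object_count) ** 2 * 21.62 + standardised / 10

def proposed(standardised, object_count):
    # Proposal: linear in `standardised` for a fixed map.
    return (object_count ** 2 * 21.62 + MAX_SCORE) * standardised / MAX_SCORE

n = 1000  # hypothetical catch beatmap with 1000 objects

# At half the standardised score, the current branch yields ~25% of the
# full-score value (quadratic), while the proposal yields exactly 50% (linear).
ratio_current = current_branch(MAX_SCORE // 2, n) / current_branch(MAX_SCORE, n)
ratio_proposed = proposed(MAX_SCORE // 2, n) / proposed(MAX_SCORE, n)
assert abs(ratio_current - 0.25) < 0.01
assert abs(ratio_proposed - 0.5) < 1e-9
```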
I think I prefer what you have, to be honest. |
@smoogipoo did you want to go through this one at all? |
Fwiw, I also want to mention that the non-linear transformation will break the order in team versus. See #17824 |
Good point, also in playlists I guess...... forgot about this one. Probably gonna have to go back to the linear version, then? |
Playlists are different, as the source of breakage there is in the different max scores. I think the solution on that front would be a scoring mode switch at playlist level. |
I feel like playlists with the classic score toggle are broken in... more advanced ways (see video: osu.2023-09-27.at.08.58.25.mp4). I think I've discussed this before with @smoogipoo (in addition to multiplayer team vs totals, another case of aggregating scores) and we're not sure we want to allow the toggle to work here, i.e. always using standardised scoring in certain places. |
Yes, the point was moreso that squaring scores doesn't work anywhere where you sum them afterwards. Playlists is a case of that if you have multiple maps. To spell it out (in a very simplified manner), the reason for this is that for a squaring-based conversion $f$, the following equation generally does not hold:

$$\sum_i f(s_i) = f\left(\sum_i s_i\right)$$

The function on the right side of the equation is a monotonic function of the summed score and will not reorder sums, but the function on the left is not and can reorder sums. Whether playlists should be forced to standardised is an orthogonal matter. It's a valid concern anyhow. |
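To illustrate with hypothetical numbers (not from the PR): squaring each score before summing can flip which team's total is higher.

```python
def classic_like(s):
    # Simplified stand-in for a squaring-based conversion (illustrative only).
    return s * s

team_a = [500, 500]  # total 1000 -- wins on summed standardised score
team_b = [900, 50]   # total 950

# Team A wins on the raw (standardised) sums...
assert sum(team_a) > sum(team_b)

# ...but after squaring each score individually, the ordering flips:
# 500^2 + 500^2 = 500000  vs  900^2 + 50^2 = 812500.
assert sum(classic_like(s) for s in team_a) < sum(classic_like(s) for s in team_b)
```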
Right, maybe a classic score "win condition" could be added to playlists later down the line then. |
I'd say we go ahead with this, as it's a visual only change. We can adjust in the future without much consequence. |
The goal of this PR is to adjust the classic scoring algorithm to match score V1 closer in several (but not all) aspects. I will attempt to summarise somewhat briefly what this means below, but this will be a long opening post nevertheless, for the code changes here are 1% of the work that was put in to arrive at these changes.
First, let me explain that this PR's intention is not to port stable score V1 across literally. That is infeasible due to several constraints placed upon the scoring algorithms; some are negotiable, some are not. This PR only improves some aspects of classic score. I will attempt to explain this exhaustively but briefly in the sections below.
What is supposed to match closer?
osu!mania
mania is a bit of a black sheep in this diff, so let's get it out of the way first.
stable score V1 for mania is basically linear, and capped at 1 million anyway. Thus, the concept of "classic" score doesn't really make much sense; VSRG players are generally used to the 1M cap, and having score blow up like it did doesn't seem to benefit anybody. So this diff makes the "classic" setting do absolutely nothing for mania - as in, mania classic and standardised score are the same value. Period.
Now, standardised score is not actually the same as stable score V1, but bringing that across is generally not feasible due to other constraints. But it will do well enough - the rate of change and magnitude both match, and that's all I care about for the purposes of this diff.
If you only care about mania, you don't need to read the rest of this. Everything else is for the other rulesets.
Bonus is (slightly) less susceptible to being worthless at the start of a map
I'll be brief; the root issue at hand is mostly described already in #17011 (comment). What this PR does about it is ensure that if standardised score rises by 10 points, classic score will rise by at least 1 point.
How was this achieved?
When fitting the new formulae to total score values on existing beatmaps, I forced a constant 100k score component on the formulae; multiplying the normalised standardised score ($\frac{\texttt{standardised}}{1000000}$ ) by 100k makes it so that every 10 standardised points correspond to at least 1 classic point.
This could possibly be considered a little ridiculous on short maps, as I'm sure you'll be able to tell by the test case changes, but if it's any consolation, it's not really that much worse than what standardised score does on short maps.
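As a quick sanity check of that property, here is a sketch using a linear fit of the shape described above (the per-object constant is illustrative; the real values come from the fitting process). The forced 100k component alone contributes `standardised / 10` to the result, which is where the "10 standardised points → at least 1 classic point" guarantee comes from:

```python
MAX_SCORE = 1_000_000

def classic(standardised, object_count):
    # Hypothetical linear fit: a fitted per-object factor plus the
    # forced constant 100k component described in the text.
    fitted_per_object = 1109  # illustrative constant, not necessarily the fitted value
    return (object_count * fitted_per_object + 100_000) * standardised / MAX_SCORE

# A 10-point rise in standardised score yields at least 1 classic point,
# since the 100k component contributes exactly 10 * 100_000 / 1_000_000 = 1.
delta = classic(500_010, 300) - classic(500_000, 300)
assert delta >= 1.0
```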
Magnitude of classic total score on perfect and near-perfect plays
The general ballpark of how high classic score gets on perfect plays should now be closer to stable's. This will vary wildly per map for reasons I have yet to explain here, but the hope is that the fit will generally improve.
How was this achieved?
Using lazer's score V1 simulators, intended to be used for conversion to standardised score, I simulated the maximum base score (without bonus) for every beatmap from the data.ppy.sh data dumps for osu!, taiko, and catch (including converts). Using this data, I then fit an at-most-quadratic factor dependent on total basic object count for each ruleset, with the 100k constant factor described previously forced; the optimisation criterion was the minimisation of relative error when compared to stable.
The code used to generate the data to fit against, the generated data, and the fitting process have been described in this gist.
Note that the data extraction process already included changes from #24779 - although since I was fitting to maximum base score without bonus, changes therein should not generally have any meaningful impact, I don't think.
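A heavily simplified sketch of that kind of fit (synthetic stand-in data and numpy only; the real process, data, and constants are in the linked gist). The idea is to fit an at-most-quadratic factor in object count, with the 100k constant forced, while minimising *relative* error - which can be done by dividing each equation by its target value before running least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for "max base score V1" per beatmap (NOT real data):
n = rng.integers(100, 3000, size=500).astype(float)                   # object counts
target = 700.0 * n**2 + 50_000.0 * n + rng.normal(0, 1e6, size=500)   # fake V1 maxima

# Fit classic_max(n) = a*n^2 + b*n + 100_000.
# Dividing each row and target by `target` makes ordinary least squares
# minimise the sum of squared RELATIVE errors instead of absolute ones.
A = np.column_stack([n**2 / target, n / target])
y = (target - 100_000.0) / target
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# The fit recovers the coefficients used to generate the synthetic data.
assert abs(a - 700.0) < 35.0
assert abs(b - 50_000.0) < 25_000.0
```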
For reproduction, if required:
The general rate of ascent of classic score
The rates of ascent of score V1 and V2 on a perfect score on stable are - as far as I can tell - as follows:
The table above uses big-O notation to represent the asymptotic boundedness of the scoring functions. $n$ is the number of objects in each case.
The previous rescale still assumed the old standardised scoring (which was linear, $O(n)$). That no longer holds, which means that squaring osu! score, which previously made sense, no longer does.
How was this achieved?
For both osu! and osu!taiko, since the asymptotic bound for both scoring algorithms is the same, classic score is now essentially a linear rescale of standardised score.
The scoring test scenes that I recently added also aim to illustrate that this matches better. See screenshots below:
When reading the diff, note that catch is the odd one out here. Going fully linear with catch actually results in a worse estimate of the rate of ascent; therefore, the normalised standardised score for catch is still a quadratic factor.
What will not match closer? Why?
Scores cannot be reordered, and the corollaries
Since we've already established that we're not permitting scores to be reordered between classic and standardised scoring, this limits the possibilities for implementation of classic quite considerably. In general, for reorders to be impossible, classic must be a strictly monotonic function of standardised (in math terms, this means that if standardised score increases, classic score must also increase).
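That invariant is straightforward to property-check numerically for any candidate conversion. A sketch, using a hypothetical quadratic-in-standardised conversion of the general shape discussed in this thread:

```python
def convert(standardised, object_count=1000):
    # Hypothetical conversion (illustrative constants, not the final formula).
    return round((standardised / 1_000_000 * object_count) ** 2 * 21.62 + standardised / 10)

# Strict monotonicity: a higher standardised score must always yield a
# higher classic score, so scores can never be reordered by the conversion.
scores = range(0, 1_000_001, 1000)
classic = [convert(s) for s in scores]
assert all(a < b for a, b in zip(classic, classic[1:]))
```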
This has the following side-effects:
Classic score is still not monotonic with respect to misses (aside from catch)
If you miss with Score V1 on stable, you do not lose score; if you miss with Score V2, you do lose score, since the accuracy portion goes down. (The edge case is catch, wherein Score V2 is actually monotonic. I'm discarding it from further deliberations in this subsection.)
Since classic score has to be a monotonic function of standardised, and on standardised you lose score on missing, you still lose score on missing in classic, too.
Accounting for bonus score is best-effort
We generally cannot subtract bonus from the user's score, apply rescaling, and then add the bonus back again, as this action may reorder scores. This is why - despite actually ruining fits - the 100k factor is added like it is, to attempt to compensate for bonus.
osu!catch mod multipliers are essentially squared in classic scoring mode
Since classic scoring mode receives the standardised score post-application of mod multipliers to rescale (as it must, in order to avoid reordering scores), this means that in catch, if the multiplier is not 1, then its effect is actually squared.
This can be fixed, but at the cost of sacrificing catch's rate of ascent (it would have to be linear rather than quadratic).
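To see the squaring effect in isolation, assume a conversion dominated by a quadratic term (a simplified model, not the exact formula): applying a mod multiplier $m$ to standardised score *before* conversion scales the quadratic term by $m^2$, while a linear conversion preserves $m$ as-is.

```python
def quadratic_conversion(standardised):
    # Dominant quadratic term only (simplified model of the catch conversion).
    return standardised ** 2

def linear_conversion(standardised):
    return standardised * 21.62

base = 1_000_000
multiplier = 0.7  # e.g. Half Time

# Quadratic: the 0.7x multiplier effectively becomes 0.49x...
quad_ratio = quadratic_conversion(base * multiplier) / quadratic_conversion(base)
assert abs(quad_ratio - 0.49) < 1e-9

# ...whereas a linear conversion preserves the multiplier as-is.
lin_ratio = linear_conversion(base * multiplier) / linear_conversion(base)
assert abs(lin_ratio - 0.7) < 1e-9
```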
Classic score still has no concept of a "score multiplier"
For osu!, taiko, and catch, score V1 is also a byproduct of a map-dependent "score multiplier", which depends on: drain rate, overall difficulty, circle size, object count and drain length. This does not exist for classic scoring yet. It potentially could, with generally minor difficulty - because it's a map-dependent value, it is actually a separate concern from all of the score reordering issues, and we could bring it back if we think the complexity is worth it. I generally didn't want to do that yet, though.
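For reference, stable's score V1 difficulty multiplier is publicly documented (e.g. on the osu! wiki) as roughly the following; treat the exact constants here as an assumption from that public description, not as something taken from this PR:

```python
def stable_difficulty_multiplier(hp, od, cs, object_count, drain_seconds):
    # Approximation of stable's score V1 difficulty multiplier as publicly
    # documented: (HP + OD + CS + object density term) / 38 * 5, rounded,
    # with the density term clamped to [0, 16].
    density = min(max(object_count / drain_seconds * 8, 0), 16)
    return round((hp + od + cs + density) / 38 * 5)

# Hypothetical map: HP 5, OD 5, CS 4, 1000 objects over 200s of drain time.
# Density saturates at 16, giving round(30 / 38 * 5) = 4.
print(stable_difficulty_multiplier(5, 5, 4, 1000, 200))
```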
Custom rulesets
As a footnote, when touching this code and noticing how score implementations diverge, I decided that making classic score do anything for custom rulesets doesn't really make much sense, so I did the same thing as for mania and just made classic score do nothing for other rulesets. This is up for further discussion, although it has to be said that with how divergent the score conversion is for particular rulesets, setting forth any algorithm at this point would be as arbitrary as just having classic score not work. Maybe that's something for the ruleset API in the future.
@WitherFlower this is one for you to look over - interested to see if you've got any feedback on this since I considerably altered your original proposals.