
Update classic scoring formula to closer match stable score V1 #24924

Merged
merged 2 commits into from Sep 28, 2023

Conversation

@bdach (Collaborator) commented Sep 25, 2023

The goal of this PR is to adjust the classic scoring algorithm to match score V1 closer in several (but not all) aspects. I will attempt to summarise somewhat briefly what this means below, but this will be a long opening post nevertheless, for the code changes here are 1% of the work that was put in to arrive at these changes.

First, let me explain that this PR's intention is not to port stable score V1 across literally. That is infeasible due to several constraints placed upon the scoring algorithms; some are negotiable, some are not. This PR only improves some aspects of classic score. I will attempt to explain this exhaustively but briefly in the sections below.

What is supposed to match closer?

osu!mania

mania is a bit of a black sheep in this diff, so let's get it out of the way first.

stable score V1 for mania is basically linear, and capped at 1 million anyway. Thus, the concept of "classic" score doesn't really make much sense; VSRG players are generally used to the 1M cap, and having score blow up like it did doesn't seem to benefit anybody. So this diff makes the "classic" setting do absolutely nothing for mania - as in, mania classic and standardised score are the same value. Period.

Now, standardised score is not actually the same as stable score V1, but bringing that across is generally not feasible due to other constraints. But it will do well enough - the rate of change and magnitude both match, and that's all I care about for the purposes of this diff.

If you only care about mania, you don't need to read the rest of this. Everything else is for the other rulesets.

Bonus is (slightly) less susceptible to being worthless at the start of a map

I'll be brief; the root issue at hand is mostly described already in #17011 (comment). What this PR does about it is ensure that if standardised score rises by 10 points, classic score will rise by at least 1 point.

How was this achieved?

When fitting the new formulae to total score values on existing beatmaps, I forced a constant 100k score component on the formulae; multiplying the normalised standardised score ($\frac{\texttt{standardised}}{1000000}$) by 100k makes it so that every 10 standardised points correspond to at least 1 classic point.

This could possibly be considered a little ridiculous on short maps, as I'm sure you'll be able to tell by the test case changes, but if it's any consolation, it's not really that much worse than what standardised score does on short maps.
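The effect of the forced 100k constant can be sketched in a few lines (hypothetical Python, not the actual implementation; `map_factor` stands in for the fitted map-dependent part of the formula):

```python
MAX_SCORE = 1_000_000

def classic_score(standardised: int, map_factor: float) -> int:
    # `map_factor` is a placeholder for the fitted, map-dependent part of
    # the formula; the forced 100_000 constant guarantees a minimum slope
    # of 100_000 / 1_000_000 = 0.1 with respect to standardised score.
    return round((map_factor + 100_000) * standardised / MAX_SCORE)

# Even with a zero map-dependent factor, +10 standardised => +1 classic.
assert classic_score(10, 0) - classic_score(0, 0) >= 1
```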

Magnitude of classic total score on perfect and near-perfect plays

The general ballpark of how high classic score goes for perfect plays should now be closer to stable. This will vary wildly per map for reasons I have yet to explain, but the hope is that the fit will generally improve.

How was this achieved?

Using lazer's score V1 simulators, intended to be used for conversion to standardised score, I simulated the maximum base score (without bonus) for every beatmap from the data.ppy.sh data dumps for osu!, taiko, and catch (including converts). Using this data, I then fit an at-most-quadratic factor dependent on total basic object count for each ruleset, with the 100k constant factor described previously forced; the optimisation criterion was the minimisation of relative error when compared to stable.
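A simplified sketch of this fitting step, under the assumption of a fixed quadratic-plus-constant model and synthetic data (the real process and data are in the linked gist and notebook):

```python
# Fit max_base_score ≈ a*n^2 + b*n + 100_000, minimising *relative* error,
# with the 100k constant forced (n = total basic object count).
def fit_quadratic_relative(ns, scores):
    # Minimise sum(((a*n^2 + b*n + 100_000) / y - 1)^2) over (a, b).
    # The problem is linear in (a, b): solve the 2x2 normal equations.
    suu = suv = svv = su = sv = 0.0
    for n, y in zip(ns, scores):
        u, v = n * n / y, n / y      # model terms, scaled by 1/y
        t = 1 - 100_000 / y          # target after fixing the constant
        suu += u * u; suv += u * v; svv += v * v
        su += u * t; sv += v * t
    det = suu * svv - suv * suv
    a = (su * svv - suv * sv) / det
    b = (suu * sv - suv * su) / det
    return a, b

# Sanity check on noiseless synthetic data (coefficients here are made up,
# not the fitted osu! values): recovery should be essentially exact.
ns = [100, 500, 1000, 2000]
scores = [20.0 * n * n + 1000.0 * n + 100_000 for n in ns]
a, b = fit_quadratic_relative(ns, scores)
```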

The code used to generate the data to fit against, the generated data, and the fitting process have been described in this gist.

Note that the data extraction process already included changes from #24779 - although since I was fitting to maximum base score without bonus, I don't think the changes therein should have any meaningful impact.

For reproduction, if required:

  1. Apply this diff onto Split legacy scoring attribs into its own table #24779
  2. Run this program against beatmap dumps from data.ppy.sh
  3. You should get a CSV like so
  4. This jupyter notebook describes the rest.

The general rate of ascent of classic score

The rates of ascent of score V1 and V2 on a perfect score on stable are - as far as I can tell - as follows:

| ruleset | score V1 | score V2 |
| --- | --- | --- |
| osu! | $O(n^2)$ | $O(n^2)$ |
| osu!taiko | $O(n)$ | $O(n \log n)$ (until 400 combo) / $O(n)$ (over 400 combo) |
| osu!catch | $O(n^2)$ | $O(n \log n)$ (until 200 combo) / $O(n)$ (over 200 combo) |

The table above uses the big O notation for representing asymptotic boundedness of the scoring functions. $n$ is the number of objects in each case.

The previously used rescale still assumed old standardised score (which was linear, $O(n)$). This no longer holds, which means that squaring osu! score, which previously made sense, no longer does.

How was this achieved?

For both osu! and osu!taiko, since the asymptotic bound for both scoring algorithms is the same, classic score is now essentially a linear rescale of standardised score.

The scoring test scenes that I recently added also aim to illustrate that this matches better. See screenshots below:

| ruleset | before | after |
| --- | --- | --- |
| osu! | before-osu | after-osu |
| osu!taiko | before-taiko | after-taiko |
| osu!catch | before-catch | after-catch |
| osu!mania | before-mania | after-mania |

When reading the diff, note that catch is the odd one out here. Going fully linear with catch actually results in a worse estimate of the rate of ascent; therefore, the normalised standardised score for catch still enters as a quadratic factor.
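The two conversion shapes can be put side by side, using the constants that appear in the diff later in this thread (mapping case 1 to taiko and case 2 to catch is my reading of the code; treat the exact mapping as an assumption):

```python
MAX_SCORE = 1_000_000

def classic_taiko(standardised: int, object_count: int) -> int:
    # linear in standardised: same asymptotic rate of ascent as standardised
    return round((object_count * 1109 + 100_000) * standardised / MAX_SCORE)

def classic_catch(standardised: int, object_count: int) -> int:
    # quadratic in standardised, to better match stable's O(n^2) ascent;
    # standardised / 10 is the forced 100k bonus floor at full score
    return round((standardised / MAX_SCORE * object_count) ** 2 * 21.62
                 + standardised / 10)
```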

What will not match closer? Why?

Scores cannot be reordered, and the corollaries

Since we've already established that we're not permitting scores to be reordered between classic and standardised scoring, this limits the possibilities for implementation of classic quite considerably. In general, for reorders to be impossible, classic must be a strictly monotonic function of standardised (in math terms, this means that if standardised score increases, classic score must also increase).
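To illustrate the monotonicity requirement with a toy example (the function `f` below is arbitrary; any strictly increasing function has this property):

```python
# If classic = f(standardised) for a strictly increasing f, any set of
# scores ranks identically on both scales, so leaderboards never reorder.
def f(standardised: int) -> int:
    return standardised * 3 + 100  # arbitrary strictly increasing function

scores = [12_345, 999_999, 500_000, 12_346]
assert sorted(scores, reverse=True) == sorted(scores, key=f, reverse=True)
```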

This has the following side-effects:

Classic score is still not monotonic with respect to misses (aside from catch)

If you miss with Score V1 on stable, you do not lose score; if you miss with Score V2, you do lose score, since the accuracy portion goes down. (The edge case is catch, wherein Score V2 is actually monotonic. I'm discarding it from further deliberations in this subsection.)

Since classic score has to be a monotonic function of standardised, and on standardised you lose score on missing, you still lose score on missing in classic, too.

Accounting for bonus score is best-effort

We generally cannot subtract bonus from the user's score, apply rescaling, and then add the bonus back again, as this action may reorder scores. This is why - despite actually ruining fits - the 100k factor is added like it is, to attempt to compensate for bonus.

osu!catch mod multipliers are essentially squared in classic scoring mode

Since classic scoring mode receives the standardised score post-application of mod multipliers to rescale (as it must, in order to avoid reordering scores), this means that in catch, if the multiplier is not 1, then its effect is actually squared.

This can be fixed, but at the cost of sacrificing catch's rate of ascent (it would have to be linear rather than quadratic).
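A quick arithmetic check of the squaring effect, using the quadratic catch term from this branch (object count and score values here are arbitrary):

```python
# The dominant classic term for catch is quadratic in standardised, so a
# mod that multiplies standardised by m multiplies that term by m^2.
def catch_quadratic_term(standardised: float) -> float:
    return (standardised / 1_000_000 * 100) ** 2 * 21.62  # 100 objects

base = catch_quadratic_term(500_000)
with_mod = catch_quadratic_term(500_000 * 1.12)  # e.g. a 1.12x multiplier
assert abs(with_mod / base - 1.12 ** 2) < 1e-9   # effective ~1.2544x
```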

Classic score still has no concept of a "score multiplier"

For osu!, taiko, and catch, score V1 is also a byproduct of a map-dependent "score multiplier", which depends on: drain rate, overall difficulty, circle size, object count, and drain length. This does not exist for classic scoring yet. It potentially could, with generally minor difficulty - because it's a map-dependent value, it is actually a separate concern from all of the score reordering issues, and we could bring it back if we think the complexity is worth it. I didn't want to do that yet, though.
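For reference, a rough, community-documented approximation of stable's score V1 difficulty multiplier looks like the following (an illustration under assumptions only - this is not what the PR implements, and the exact clamping and rounding may differ per ruleset):

```python
def score_v1_difficulty_multiplier(hp: float, od: float, cs: float,
                                   object_count: int,
                                   drain_seconds: float) -> int:
    # object density contributes up to 16 points on top of the settings
    density = min(max(object_count / drain_seconds * 8, 0), 16)
    return round((hp + od + cs + density) / 38 * 5)
```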

Custom rulesets

As a footnote, when touching this code and noticing how score implementations diverge, I decided that making classic score do anything for custom rulesets doesn't really make much sense, so I did what I did for mania and just made classic score do nothing for other rulesets. This is up for further discussion, although it has to be said that, with how divergent the score conversion is for particular rulesets, setting forth any algorithm at this point would be as arbitrary as just having classic score not work. Maybe that's something for the ruleset API in the future.


@WitherFlower this is one for you to look over - interested to see if you've got any feedback on this since I considerably altered your original proposals.

@bdach bdach self-assigned this Sep 25, 2023
@WitherFlower

I don't have much to say regarding the changes to the scaling values; it's what the optimization spat out, and the method you used is essentially the same as mine, but on the largest set of maps possible.

The main criticism I have with this is not using a linear scaling for catch. I think it's really unnatural to have mods give you more score than they say they should, and I don't think rate of growth matters enough to lose the linearity.

Aside from that, really great work, I wasn't prepared for that amount of detail on the notebook 😅

@bdach (Collaborator, Author) commented Sep 25, 2023

The main criticism I have with this is not using a linear scaling for catch. I think it's really unnatural to have mods give you more score than they say they should, and I don't think rate of growth matters enough to lose the linearity.

I'm still willing to budge on this, for what it's worth. I expect the review process to be mostly concerned with discussing the tradeoffs here and which ones should be taken over others.

Just needed to PR this for the discussion to commence properly is all.

@peppy (Member) commented Sep 26, 2023

Are we able to see a graphed example of mod multipliers applied in osu!catch?

@bdach (Collaborator, Author) commented Sep 26, 2023

What sort of graph would you be expecting to see, exactly? Do you mean the same one as the OP has but with mods, or something else entirely?

@peppy (Member) commented Sep 26, 2023

What sort of graph would you be expecting to see, exactly? Do you mean the same one as the OP has but with mods, or something else entirely?

yep, basically. can just be a once-off, to help explain how much of a change this looks like visually.

@bdach (Collaborator, Author) commented Sep 26, 2023

yep, basically. can just be a once-off, to help explain how much of a change this looks like visually.

Ok, so to answer your request...

The plots below show this branch as-is with various mod configurations. There are two variants for each plot: one plots the score as relative to the max (so the Y axis is relative), and the other plots the score as an absolute value. Therefore, the first plot shows the trend of score, while the second shows the actual magnitude of score.

Note
Everything below was plotted assuming a map-dependent "score multiplier" of 4. The multiplier value was chosen to get score V1 as close to classic as possible in nomod. This begins to change wildly when the score multiplier is varied.

| mods | relative | absolute |
| --- | --- | --- |
| none | before_baseline_relative | before_baseline_absolute |
| DT | before_dt_relative | before_dt_absolute |
| HT | before_ht_relative | before_ht_absolute |
| NF | before_nf_relative | before_nf_absolute |

The notable conclusions here are:

  • The shape of the curve, or the rate of ascent, is generally preserved; this is because in both stable score V1 and in classic score, the multiplier is more or less (but not exactly) a flat multiplier onto score. So you could multiply either by anything and the general shape of the curve would remain the same.
  • The thing that varies wildly here is the magnitude. This is caused by the aforementioned multiplier squaring effect, combined with the fact that stable's mod multipliers do not always match lazer's. Refer to this table:
| mod | stable score V1 multiplier | lazer standardised multiplier | lazer standardised multiplier, squared |
| --- | --- | --- | --- |
| No Fail | 0.5x | 0.5x | 0.25x |
| Easy | 0.5x | 0.5x | 0.25x |
| Half Time / Daycore | 0.3x | 0.7x¹ | 0.49x¹ |
| Hidden | 1.06x | 1.06x | 1.12x |
| Hard Rock | 1.12x | 1.12x | 1.25x |
| Double Time / Nightcore | 1.06x | 1.10x² | 1.21x² |
| Flashlight | 1.12x | 1.12x | 1.25x |

This branch, as is, currently uses the last column as the effective multiplier in classic scoring mode. Generally this is worse for most mods, with the sole exception of Half Time / Daycore, which gets a measly 0.3x on stable, so squaring 0.7x to 0.49x actually provides a closer match.

What @WitherFlower is proposing is to apply something like the following instead:

```diff
diff --git a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
index ba56e32268..af6f7a80f4 100644
--- a/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
+++ b/osu.Game/Scoring/Legacy/ScoreInfoExtensions.cs
@@ -52,7 +52,7 @@ private static long convertStandardisedToClassic(int rulesetId, long standardise
                     return (long)Math.Round((objectCount * 1109 + 100000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);

                 case 2:
-                    return (long)Math.Round(Math.Pow(standardisedTotalScore / ScoreProcessor.MAX_SCORE * objectCount, 2) * 21.62 + standardisedTotalScore / 10d);
+                    return (long)Math.Round((objectCount * objectCount * 21.62 + 1000000) * standardisedTotalScore / ScoreProcessor.MAX_SCORE);

                 case 3:
                 default:
```

which looks like so:

| mods | relative | absolute |
| --- | --- | --- |
| none | after_baseline_relative | after_baseline_absolute |
| DT | after_dt_relative | after_dt_absolute |
| HT | after_ht_relative | after_ht_absolute |
| NF | after_nf_relative | after_nf_absolute |

Notable takeaways from that second set of graphs:

  • Shape of curve changes to be more linear - rate of score growth no longer matches as closely.
  • However, otherwise absolute values are closer to stable, wherever the mod multipliers do not diverge between stable's and lazer's.
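For a concrete comparison, both catch conversions from the diff above can be evaluated directly (the formulas are copied from the diff; the numbers are illustrative):

```python
MAX_SCORE = 1_000_000

def catch_current(standardised: float, n: int) -> float:
    # this branch: quadratic in standardised (mods effectively square)
    return (standardised / MAX_SCORE * n) ** 2 * 21.62 + standardised / 10

def catch_proposed(standardised: float, n: int) -> float:
    # proposed: linear in standardised (mods apply as-is)
    return (n * n * 21.62 + 1_000_000) * standardised / MAX_SCORE

n = 1000
# the proposed form is exactly linear: half the standardised, half the classic
assert abs(catch_proposed(MAX_SCORE / 2, n) / catch_proposed(MAX_SCORE, n) - 0.5) < 1e-9
# the current form grows faster than linearly towards the top
assert catch_current(MAX_SCORE / 2, n) / catch_current(MAX_SCORE, n) < 0.5
```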

Footnotes

  1. With default rate of 0.75x.

  2. With default rate of 1.5x.

@peppy (Member) commented Sep 27, 2023

I think I prefer what you have, to be honest.

@peppy peppy self-requested a review September 27, 2023 08:13
@peppy (Member) commented Sep 27, 2023

@smoogipoo did you want to go through this one at all?

@WitherFlower commented Sep 27, 2023

Fwiw, I also want to mention that the non-linear transformation will break the order in team versus. See #17824

@bdach (Collaborator, Author) commented Sep 27, 2023

Good point, also in playlists I guess...... forgot about this one.

Probably gonna have to go back to the linear version, then?

@WitherFlower

Playlists are different, as the source of breakage there is in the different max scores. I think the solution on that front would be a scoring mode switch at playlist level.

@peppy (Member) commented Sep 27, 2023

I feel like playlists with classic score toggle are broken in... more advanced ways:

osu.2023-09-27.at.08.58.25.mp4

I think I've discussed this before with @smoogipoo (in addition to multiplayer team vs totals, another case of aggregating scores) and we're not sure we want to allow the toggle to work here - i.e. always using standardised scoring in certain places.

@bdach (Collaborator, Author) commented Sep 27, 2023

Yes, the point was more so that squaring scores doesn't work anywhere you sum them afterwards. Playlists are a case of that if you have multiple maps.

To spell it out (in a very simplified manner), the reason for this is that generally the following equation:

$$ \sum \texttt{score}_i^2 \neq \left( \sum \texttt{score}_i \right)^2 $$

does not hold. The function on the right side of the equation is monotonic and will not reorder sums, but the function on the left is not monotonic and can reorder sums.
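A concrete counterexample with made-up team scores shows the reordering:

```python
# Squaring per-map (or per-player) scores before summing can reorder
# aggregate totals. Hypothetical numbers:
team_a = [500_000, 500_000]   # consistent team
team_b = [900_000, 50_000]    # one big score, one small

# On the standardised (summed-as-is) scale, team A wins:
assert sum(team_a) > sum(team_b)          # 1,000,000 > 950,000

# But summing the squared scores flips the order:
assert sum(s * s for s in team_a) < sum(s * s for s in team_b)
```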

Whether playlists should be forced to standardised is an orthogonal matter. It's a valid concern anyhow.

@WitherFlower commented Sep 27, 2023

Right, maybe a classic score "win condition" could be added to playlists later down the line then.

@peppy (Member) commented Sep 27, 2023

I'd say we go ahead with this, as it's a visual only change. We can adjust in the future without much consequence.

@smoogipoo smoogipoo merged commit db9113b into ppy:master Sep 28, 2023
15 of 17 checks passed
@bdach bdach deleted the update-classic-scoring branch September 28, 2023 12:00