Statistical accuracy PP and difficulty scaling for the osu!taiko ruleset #20963

Merged
merged 42 commits into from
Oct 7, 2024

Contributor

@Natelytle Natelytle commented Oct 27, 2022

Huge thanks to Frost for the majority of the math behind this rework, and to LTCA for helping me balance it.

Estimates the UR (unstable rate) of a play, and scales accuracy PP with it.

Changes

  • Adds deviation estimation to the osu!taiko ruleset
  • Replaces the old accuracy formulas with new ones fit for the estimated UR values
  • Also includes some balancing changes LTCA decided would be best

Reasoning

  • The old accuracy PP formula did not scale well with higher overall difficulties, punishing lower accuracy more than it rewarded higher OD.
  • SR scaling was not affected by OD at all.
  • Unstable rate is an easier metric to work with, as there is a true "perfect" value.

Estimation Theory

To estimate UR, we assume that hit errors are normally distributed, with a mean of 0 and a standard deviation σ. For any given σ, this gives us the probability that a hit receives each hit result (300, 100, miss). We can compare these probabilities to the actual proportion of each judgement in a play, and return whichever σ value is the closest match.
Further documentation can be found in a google doc here.
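
As a rough illustration of the idea above, here is a minimal Python sketch. It is not the code from this PR: the hit window values, the function names, and the choice of a maximum-likelihood grid search as the "closest match" criterion are all assumptions made for the example. By the usual osu! convention, UR is ten times the standard deviation of hit errors in milliseconds.

```python
import math


def judgement_probabilities(sigma, great_window, ok_window):
    """Probabilities of (300, 100, miss) for hit errors ~ Normal(0, sigma).

    great_window and ok_window are half-widths in ms (hypothetical values
    are used in the example; they are not taken from the PR).
    """
    p_within_great = math.erf(great_window / (sigma * math.sqrt(2)))
    p_within_ok = math.erf(ok_window / (sigma * math.sqrt(2)))
    return p_within_great, p_within_ok - p_within_great, 1.0 - p_within_ok


def log_likelihood(sigma, counts, great_window, ok_window):
    """Log-likelihood of the observed (300, 100, miss) counts for a given sigma."""
    probs = judgement_probabilities(sigma, great_window, ok_window)
    # Clamp probabilities so that log(0) never occurs for extreme sigmas.
    return sum(n * math.log(max(p, 1e-300)) for n, p in zip(counts, probs))


def estimate_sigma(counts, great_window, ok_window, lo=0.5, hi=200.0, steps=4000):
    """Grid search for the sigma (in ms) that best explains the counts."""
    candidates = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(
        candidates,
        key=lambda s: log_likelihood(s, counts, great_window, ok_window),
    )


def unstable_rate(sigma):
    """UR is conventionally 10x the standard deviation of hit errors."""
    return 10.0 * sigma
```

For example, `estimate_sigma((8664, 1331, 5), 30.0, 70.0)` searches for the deviation that best explains 8664 300s, 1331 100s and 5 misses under assumed 30 ms and 70 ms windows.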

This estimation requires MathNet.Numerics, a package providing advanced mathematical functions that are not available in the C# standard library.


SR/PP sheets:

Converts:
No converts:
Converts, ranked-only:
No converts, ranked-only: https://docs.google.com/spreadsheets/d/1M4FppnFUvf5YRsPRmNwC81XnQ4qKOPVmAwkWeBCq-F8/edit

As of 3a609c9

@Lawtrohux
Member

@smoogipoo can we get two smoogisheet/s for this and #20558 combined with converts enabled/disabled?

@bdach bdach requested a review from a team October 27, 2022 19:07
Member

@Lawtrohux Lawtrohux left a comment

Overall, really like the direction of the PR and the initial results I saw while testing and balancing.

Contributor

@vunyunt vunyunt left a comment

Aside from the question I had about buffing hard rock specifically, everything else looks good here.

One thing to note regarding the WIP rhythm rework: it may take hit windows into consideration in a way that's not just a multiplier, so we might be able to borrow some ideas from here. But we also want to be careful not to consider the effect of hit windows twice, to avoid making high-OD (high-rhythm) maps overrated.

@@ -84,35 +86,52 @@ private double computeDifficultyValue(ScoreInfo score, TaikoDifficultyAttributes
                 difficultyValue *= 1.025;

             if (score.Mods.Any(m => m is ModHardRock))
-                difficultyValue *= 1.050;
+                difficultyValue *= 1.10;
Contributor

Since the changes here already account for hit windows properly, and do not concern SV, why does hard rock specifically need to be buffed here?

Member

@Lawtrohux Lawtrohux Oct 30, 2022

This was done in balancing, as mid-range accuracy with HR was pretty underweighted (a 4x100 on "the limit does not exist" with HR was worth 20pp less than an HD SS). Though @Natelytle, could the model be fit for greater accuracy leniency on super high ODs?

Contributor Author

@Natelytle Natelytle Oct 30, 2022

I don't think you can without increasing complexity and decreasing estimation accuracy. I think an HR multiplier buff is a better direction if HR in particular is underweighted.

@Natelytle
Contributor Author

Natelytle commented Oct 30, 2022

The low end was gaining too much (doubling in some cases), and LTCA said it would be good if I could buff the high end a bit, so I made the SR multiplier harsher at the low end and gave the high end a bit more of a buff. Feedback on that would be appreciated once the smoogisheet is out.

@vunyunt
Copy link
Contributor

vunyunt commented Nov 14, 2022

@smoogipoo Requesting a smoogisheet for review, thanks in advance!

@smoogipoo
Contributor

Will need conflict resolution on this PR, since the HDFL multiplier changed in both this and the previous PR.

@smoogipoo
Contributor

Will generate the sheet with a 1.1x HDFL multiplier until a decision has been made or this branch's merge conflict is resolved.

@Lawtrohux
Member

Lawtrohux commented Nov 14, 2022

Either will be fine; however, if possible, this PR's value would be ideal, to offset OD now being better represented. I also believe that 1.05x is better either way.

Member

@Lawtrohux Lawtrohux left a comment

This looks good to me now. While there are some values that are higher than ideal, this is intrinsically due to other parts of both the SR and PP calculations, not to statistical accuracy. Having this as the 'clean branch' will be ideal.

@Natelytle
Contributor Author

This should be close to merge-ready now; maybe just one more sheet. No external libraries are required anymore, and switching to a confidence-interval-based system solved the issue of some values being too high for the devs' liking.
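
The conversation does not show how the confidence interval is computed, so the following Python sketch is only one plausible shape for such a system, not the PR's implementation. It assumes a Wilson score lower bound on the proportion of 300s, which is then converted into an upper bound on σ. The function names, the default `z` value (roughly a 99% one-sided bound), and the hit window are all hypothetical.

```python
import math
from statistics import NormalDist


def wilson_lower_bound(successes, total, z=2.326):
    """Lower Wilson score bound for a binomial proportion."""
    if total == 0:
        return 0.0
    p = successes / total
    denom = total + z * z
    centre = (total * p + z * z / 2) / denom
    margin = z / denom * math.sqrt(total * p * (1 - p) + z * z / 4)
    return max(0.0, centre - margin)


def sigma_upper_bound(greats, total, great_window, z=2.326):
    """Upper bound on sigma (ms) implied by a pessimistic 300 proportion.

    For errors ~ Normal(0, sigma), P(|error| <= w) = p means
    w / sigma = inv_cdf((1 + p) / 2), so a lower p gives a higher sigma.
    """
    p_low = wilson_lower_bound(greats, total, z)
    quantile = NormalDist().inv_cdf((1 + p_low) / 2)
    # A non-positive quantile means the data give no usable bound.
    return great_window / quantile if quantile > 0 else math.inf
```

With a bound like this, a perfect play on a short map yields a larger (less favourable) deviation estimate than a perfect play on a long map, because fewer hits provide less evidence of accuracy.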

@smoogipoo
Contributor

!diffcalc
RULESET=taiko

github-actions bot commented Mar 11, 2024

Member

@Lawtrohux Lawtrohux left a comment

After reviewing profile based values, and the smoogisheet, this LGTM. While there are still problem maps, that is a reflection of the colour system rather than statistical accuracy.

I'm unsure about utilising utils split off from MathNet. However, that code is just being localised, so I don't see an issue.

@stanriders
Member

@smoogipoo this PR is considered approved by taiko pp committee and is ready for merge

@bdach
Collaborator

bdach commented May 29, 2024

Deployment considerations:

  • 1 added difficulty attribute, which comes out to
    • 13 bytes per row
    • taiko has 4 DifficultyAdjustmentMods which is $2^4 = 16$ possible combinations
    • over ~26000 taiko-specific beatmaps + ~111700 converts
    • totalling ~28 MB of extra storage (not significant)
    • plus the need to do a full run of taiko diffcalc server-side
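
The storage estimate above can be reproduced with a few lines of Python. This sketch assumes one stored row per beatmap per mod combination, which is how the figures in the list appear to be combined; the variable names are illustrative only.

```python
# Figures taken from the deployment considerations above.
BYTES_PER_ROW = 13
DIFFICULTY_ADJUSTMENT_MODS = 4
TAIKO_BEATMAPS = 26_000
CONVERTS = 111_700

# Each mod is either on or off, giving 2^4 combinations.
mod_combinations = 2 ** DIFFICULTY_ADJUSTMENT_MODS

# Assumption: one row per beatmap per mod combination.
rows = (TAIKO_BEATMAPS + CONVERTS) * mod_combinations
extra_bytes = rows * BYTES_PER_ROW
extra_megabytes = extra_bytes / 1_000_000

print(f"{extra_bytes} bytes is about {extra_megabytes:.1f} MB")
```

This comes to 28,641,600 bytes, in line with the ~28 MB quoted.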

Other than that, I see no further roadblocks. Would like to see the review above addressed though (especially regarding the possible crash).

@bdach bdach added the next release label Oct 7, 2024
@bdach
Collaborator

bdach commented Oct 7, 2024

@Natelytle please check merge conflict resolution when able (especially the difficulty attribute numbering - no idea why that attribute was assigned 29 in the first place).

@bdach
Collaborator

bdach commented Oct 7, 2024

!diffcalc
RULESET=taiko

@bdach bdach merged commit 6608d05 into ppy:master Oct 7, 2024
10 of 13 checks passed

github-actions bot commented Oct 8, 2024

Labels
area:difficulty, next release, ruleset/osu!taiko, size/XL
Projects
Status: Deployed

8 participants