
Slight PP Nerf/Buff based on saturation of specific play #91

Open
CCleanerShot opened this issue May 10, 2019 · 7 comments

Comments

@CCleanerShot
Idea:
A dynamic add-on system that attempts to polish the current pp system and can be used to level out any overweighted/underweighted maps during any meta. The system should (hopefully) assume nothing about the current pp system. It will not single out currently overweighted jump maps just because they are jump maps; it should have no bias toward any type of map. If an 800pp play has been replicated multiple times, no matter the meta/pp system, it will be deemed overweighted and slightly nerfed. I personally call this Performance Points Polish (PP+), but you can call it whatever.

Goals
What It Will Do:
Nerf overweighted maps slightly
Buff underweighted maps very slightly

What It Will Not Do:
Be a dictating force in the pp meta

Prerequisites
I've added the following prerequisites to prevent abuse:
NF and SO mods do not count when detecting whether to nerf/buff. NF/SO mod combinations are counted as nomod.
All mods will be scaled relative to the nomod nerf, based on how similar their accuracy/combo profile is to nomod. The scaling can go as low as 25% of the initial nerf.
Example: NFSO SS will still be counted as a nomod SS when scanning leaderboards.
Example: If Honesty nomod would be nerfed 40pp, then Honesty HD may be nerfed 32pp (80% scaling). If Save Me would receive a 20pp nerf, DT Save Me would currently scale down to a 5pp nerf (the 25% floor). The same goes for EZHDDT scores, IF the current EZHDDT scores aren't themselves overweighted. (These are just theoretical base numbers, which will be scaled depending on the map, as discussed later.)
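A minimal sketch of the prerequisite rules, assuming a similarity input and hypothetical function names; only the NF/SO rule, the 25% floor, and the Honesty numbers come from the examples:

```python
# Sketch of the prerequisite rules. The similarity factor (how close
# the modded scores' accuracy/combo profile is to nomod) is an assumed
# input; the 25% floor and the example numbers are from the text.

IGNORED_MODS = {"NF", "SO"}  # NF/SO combinations count as nomod

def effective_mods(mods):
    """Strip NF and SO, so an NFSO SS is detected as a nomod SS."""
    return frozenset(mods) - IGNORED_MODS

def scaled_nerf(nomod_nerf_pp, similarity):
    """Scale the nomod nerf by the modded scores' similarity to
    nomod, clamped at the 25% floor."""
    return nomod_nerf_pp * max(0.25, min(1.0, similarity))

# Honesty HD at 80% similarity: 40pp nerf -> 32pp
# DT Save Me at low similarity hits the 25% floor: 20pp -> 5pp
```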

Nerfs

  1. If a difficulty becomes oversaturated in plays, then a small pp deduction will be placed on the entire difficulty.
    Example: If the map is played a lot, it will receive a nerf. Note: Whether the nerf will be 0pp or 10-50pp is determined by the rest of the points.

  2. The deduction will be relative to how extreme the oversaturation is, and will scale a bit with the map's raw pp value (so 2* maps don't get hit disproportionately, and the nerf lands where it's actually meant to go).
    Example: 500k plays may result in a 2pp flat nerf on a 200pp-cap map. The same number of plays on a 1000pp map will scale to a 10pp nerf.
    Example: 10k plays may result in a 0.1pp flat nerf on a 200pp-cap map. The same number of plays on a 1000pp map will scale to a 1pp nerf.

  3. The deduction will experience a minor amplifier based on the quality of the players. This amplifier can be computed by first checking the higher-quality players, then checking whether lower-quality players have achieved the same play.
    Example: Big Black, without this point, might experience a tiny nerf of around 0.01pp. But because the majority of players (quality being some combination of tournament placements, rank, and previously acquired PP+) cannot even reach the full pp count of the map, the amplifier will be a negative number, low enough to warrant a buff (which I will discuss later).

  4. The deduction will then be rescaled and recalculated relative to the potential pp output of the difficulty versus its actual pp output, a.k.a. its overweightedness.
    This is where real nerfs can take place.
    Example: The 10pp nerf may scale to 20pp if the actual output is near a full SS. For the majority of maps, which have overweightedness 0, the hypothetical 10pp nerf drops to 0pp.

  5. Rules 2, 3, and 4 will be recalculated on a weekly, monthly, or similar cycle.

  6. There can be a hidden secondary cycle placed on maps that functions similarly to point 5, but at a slower interval.
    Example: Point 5 may work on a biweekly cycle (every 2 weeks). If a map was once played a lot but has recently seen fewer plays, an admin can put it on a 2nd queue that works on a bimonthly cycle instead. <---- This is to prevent computational overload.
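The nerf pipeline in points 1-6 could be sketched end to end like this. Everything here is an illustrative assumption fitted to the worked examples: the base-nerf table, the amplifier formula, the ratio notion of overweightedness, the 2x cap, and the queue thresholds are placeholders, not proposed values.

```python
# Illustrative sketch of the nerf pipeline (points 1-6). All names,
# formulas, and thresholds are assumptions, not proposed values.
from datetime import timedelta

# Point 2: base nerf grows with play count and scales with pp cap.
# The table holds sample points at a 200pp cap; a real implementation
# would use a continuous saturation curve.
SAMPLE_BASE_NERF = {500_000: 2.0, 10_000: 0.1}

def saturation_nerf(play_count, pp_cap):
    base = SAMPLE_BASE_NERF.get(play_count, 0.0)  # placeholder curve
    return base * (pp_cap / 200.0)

# Point 3: amplifier from player quality. Positive when low-quality
# players replicate top plays (farm map); negative when even top
# players fall short of full pp (e.g. Big Black).
def quality_amplifier(top_full_pp_rate, low_full_pp_rate):
    return low_full_pp_rate - (1.0 - top_full_pp_rate)

# Point 4: rescale by overweightedness (actual vs potential pp
# output). ow == 0 nullifies the nerf; ow -> 1 can double it,
# per the 10pp -> 20pp example.
def rescaled_nerf(base_nerf, ow, max_amplification=2.0):
    return base_nerf * ow * max_amplification

# Points 5-6: two recalculation queues; cooled-off maps get demoted
# to the slower one to limit recalculation load.
FAST_CYCLE = timedelta(weeks=2)   # biweekly
SLOW_CYCLE = timedelta(weeks=8)   # roughly bimonthly

def pick_cycle(recent_monthly_plays, hot_threshold=50_000):
    return FAST_CYCLE if recent_monthly_plays >= hot_threshold else SLOW_CYCLE

def nerf_for(play_count, pp_cap, amplifier, ow):
    """Combine points 2-4 into one signed nerf value (a negative
    result is a candidate for the buff rule below)."""
    base = saturation_nerf(play_count, pp_cap)
    return rescaled_nerf(base * (1.0 + amplifier), ow)
```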

Buffs

  1. If a map has been flagged as oversaturated, but has received a negative number past a certain threshold, it will instead receive a buff.
  2. The value will be halved, and the negative number will become a positive one.
    Example: If Big Black were to be a (-) 20pp nerf, it will instead be a (+) 10pp buff.
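The buff rule then amounts to a small sign flip. In this sketch the -10pp threshold is an assumed example value, and the under-1pp nullification comes from the opening post:

```python
# Buff rule sketch: a "nerf" that comes out negative past a threshold
# becomes a buff at half magnitude. The -10pp threshold is an assumed
# example; the under-1pp nullification is from the opening post.

def final_pp_change(nerf_pp, buff_threshold=-10.0):
    """nerf_pp > 0 is a nerf; returns the signed pp change applied
    to the difficulty."""
    if abs(nerf_pp) < 1.0:
        return 0.0                  # nullify insignificant values
    if nerf_pp <= buff_threshold:
        return abs(nerf_pp) / 2.0   # e.g. -20pp "nerf" -> +10pp buff
    if nerf_pp > 0:
        return -nerf_pp             # ordinary nerf
    return 0.0                      # small negative, no action
```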

???
I wrote this idea a long time ago, randomly at midnight, so I don't know if it is really legible or not. Even if there are some inconsistencies in what I actually said, I will be a firm believer in this idea having potential until it's heavily disproven. I've also made a picture of my idea with simple clarification.
DONT CRASH ON ME PLZ
I know this idea needs some heavy modification/improvement, but I would like some feedback on its potential. It could theoretically bypass the need for a pp system, but I'm 100% against that anyway. Like I said earlier, I intend this to be a sort of polish, and the values should be much smaller. An overweighted 800pp map shouldn't ever get hit by 80pp; my idea is that the system is small enough to only hit it by 30 max. (I made the numbers bigger in the examples for better visualization.) Also, obviously the math shouldn't be this simple: linearly scaling saturation and player quality is obviously bad (look at Big Black) and will over-nerf/buff many things. Not every map will ideally be affected; numbers under a small threshold (e.g. ±1pp) will be nullified for simplicity's sake.

@abraker95 commented May 10, 2019

ppv1 much?

ppv1 was based on how many people got scores on maps. It didn't pan out well. It's affected by map popularity a lot.

Also I don't think the idea of recalculating entire leaderboards every certain time period is realistic.

Finally, pp has to be a measurement in terms of the player's result on the map. That means it is independent of how many players set scores on a map. The number of players that set scores on a map do not make a map any easier or harder than it is.

@Francesco149

ppv1 was fun in a way, but it showed that using map popularity doesn't work well. your idea is to only use it for a slight rebalance, and while it could help with holes in the pp system, having constantly changing pp values on any given play seems too ugly to me. and that's a lot of work, processing all maps periodically for a small rebalance. seems too much like a band-aid fix.

i think grumd's methodology is best used to spot trends in pp mapping and see how we can address them with changes to the difficulty calculation

@tr3sleches

I dunno, it sounds good in theory, but it seems like it would require tons of hypothesis testing and such to actually be workable. Can you link grumd's calculation for overweightedness? Because depending on how that is calculated or found, this will be fairly difficult to compute adequately.
Also, wouldn't that create unnecessary data strain, because every time a score is posted it would require the relative overweightedness to change and thus the pp to change?

@CCleanerShot (Author)

Also, wouldn’t that create unnecessary data strain

I was thinking of dropping any end value below ±1pp, but would doing the calculations themselves be too much strain? My idea was for it to only look at medium to high-end cases to prevent unneeded extra load, which would cover the majority of popular maps. There shouldn't need to be a repeat on the same maps if their trend trajectory doesn't lead them toward overweightedness, which I think is somewhat detectable if you take their average playcount per month and see whether it will ever shoot up in popularity. If the trend doesn't show the map being played that much anytime soon, you can just stop the recalculations there, leave the current pp change in place, and put it on a 2nd cycle (or even a 3rd cycle, which I guess would be once a year) for when it is recalculated.

because every time a score is posted because it would require the relative overweightedness to change and thus the pp to change?

Oh, I didn't see this part. No, it won't recalculate the change then and there; it'll do it in cycles (for example, every 2 weeks).

That means it is independent of how many players set scores on a map. The number of players that set scores on a map do not make a map any easier or harder than it is.

Yeah, this definitely would need a change. How about, instead of just raw popularity, we count the people who can somewhat reach the pp cap of a map? Even this is a bit iffy, but it does need to be present in some form.

having constantly changing pp values on any given play seems too ugly to me.

It can be hidden/very minimal if you want. And it would only seem to change constantly if it's a hot map; the majority of maps would see negligible change. I actually really don't want it to be as noticeable as you might think. Like I said earlier, I'm thinking it should only work for medium-to-high-end cases, with insignificant values just being dropped.

Outside of that
I actually noticed some flaws in my system, even if my arbitrary formulas/numbers were ideal. I've somehow made the same mistake as my old idea...

One is a really stupid one that was really poor of me not to catch.
If the Top 10 players, who already have PP+, submit bad plays on map (A), and #11 does super well on map (A), he will get a significant PP+ boost, right? So far, as intended. But if that #11 player has bad plays on maps the other Top 10 players have played, those players will get a significant boost because of #11's newly acquired PP+. There's an infinite loop here. Of course, like I said earlier, insignificant boosts are dropped, but dropped or not, this is potentially an infinite loop, no? My 1st idea is that a play has to be significant enough to be accounted for (e.g. a top player's drunk-party NFDTFL play on My Love doesn't shoot it up just because they submitted a play), but even then, this is still a bit flawed. Like I said earlier, the formula won't be as simple as multiplying 3 numbers, but even then, if it's a positive number, it's guaranteed to be an infinite loop. Or am I overthinking this?

A literal single exception will break the purpose of it
My main purpose of this is to help evaluate what's overweighted, right? If a map has 0 overweightedness, that's all done and no one bats an eye. But if a single heavily underranked person does well above what's expected, it'll be listed as AT LEAST 1 overweightedness. This really bothers me a lot, because you can scale it up higher. What if it wasn't just a measly random 4*? What if 2 god players just emerged and legitimately did well beyond what most people can do on the most insane maps? My current idea is that, theoretically, there's a realistic limit, so if I downplay outliers to maybe 80% of their original contribution, it should be fine, right? This specifically really bothers me, and I would like some ideas.
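The 80% outlier downplay could be sketched as follows. Treating anything above a contribution threshold as a potential outlier is my assumption; only the 80% figure comes from the post:

```python
# Outlier damping sketch: downweight large individual contributions
# to a map's overweightedness so a single freak play can't flag the
# map alone. The 0.8 factor is the "80% of the original" from the
# post; the threshold of 1.0 is an assumed placeholder.

def damped_total(contributions, outlier_threshold=1.0, damp=0.8):
    """Sum per-score overweightedness contributions, counting
    anything above the threshold at only 80% of its raw value."""
    return sum(c * damp if c > outlier_threshold else c
               for c in contributions)
```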

Even though it's supposed to detect overweightedness, it doesn't actually reward a chunk of good plays
I'll probably change the buff section when I get the chance, but I think I left some important stuff out, lol. With the new maps going around, the saturation/overweightedness standards will rise a bit, even if the population is proportioned. And if someone sets a sick play on an old map no one plays anymore, I don't think it'll even see a buff, because compared to a modern, constantly played map, its value will be negligible if top players see it and avoid it: "if we just don't play it, it won't be significant enough to show saturation." If I correct anything here when I get the chance, I'll probably address this first, because this will inherently create a small pp meta, but still a pp meta of avoiding older maps, which is counter to my idea in the first place.

Anyways, I'm still really into this idea; I just think it needs MAJOR MAJOR tweaking, which means I should be fine if a lot of people shoot down my idea, because the execution is basically shit right now.

@abraker95

This only works under the assumption that all players are trying to set scores on the map and all players are trying their best. This is not the case, which is why this will never consistently yield an accurate representation of difficulty.

Your pp changing in response to others setting scores on a certain map is ludicrous. If I set a score, that score has one definite value that should compare consistently across the map's leaderboard and other maps' leaderboards. If pp on one map changes due to other players setting scores on said map, then it is impossible to compare the pp you get from one map to the pp you get on any other map. It's like having a bunch of pp currencies that need to be converted between, always fluctuating map to map.

@CCleanerShot (Author)

This only works under the assumption all players are trying to set scores on the map and all players are trying their best. This is not the case and why this will never consistently yield an accurate representation of difficulty.

I'm also aware of this, but statistically, people will often try their best. I just think that the pp system should use at least some statistical evidence. I know we're using grumd's overweightedness as some basis, and my idea is basically a scuffed forerunner of it that does way too many calculations, but I still think the idea can work. I just need to present something less bare-bones than what I had in mind.

If pp on one map changes due to other players setting scores on the said map, then it is impossible to compare pp you get from one map to the pp you get on any other map.

I think of it sort of like this: if the game only consisted of 1,000 people, and those 1,000 people happened to be bad tech-map players, they would think all tech maps are really hard, the pinnacle goals of the game, and extremely underweighted. They would all value the maps way above what they should be. Now, years later, the game has grown to 10,000 people, then 50,000. There has been a large influx of people, and a good number of them happen to be good at tech maps, while the previous 1,000 were bad at that map style. Realistically, the maps weren't all that hard, and if the PP+ system were implemented correctly, it would devalue some of the tech maps slightly and automatically give us an impression of what's overweighted or not, without us having to run in circles after the new discovery that they weren't all that hard.

There are many variables in mind that I'm willing to acknowledge and account for, but statistical evidence would be really strong to use to some degree. Note that there would need to be a lot of tuning to this.

Like, for example, in times of bounties and such, there will be an unnatural number of "quality players" trying to set scores on a map, way more than usual. And as your first statement says, the once-negligible "players aren't trying" variable will probably no longer be negligible, and will inflate the "quality" factor too much. There should be a decrease in the "quality" factor during this high period, in proportion to how high it is, of course.
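That bounty-period correction might be sketched as follows (the spike-ratio formula and the names are assumptions; only the "decrease in proportion to how high it is" behaviour comes from the post):

```python
# Sketch of the bounty-period correction: when far more "quality"
# players than usual are setting scores on a map, shrink the quality
# factor in proportion to the spike so a bounty week doesn't inflate
# the amplifier. The ratio formula is an illustrative assumption.

def corrected_quality(quality_factor, current_activity, baseline_activity):
    """Divide the quality factor by any unusual spike in
    quality-player activity (clamped so quiet periods don't
    inflate it instead)."""
    spike = max(1.0, current_activity / baseline_activity)
    return quality_factor / spike
```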

But I'm willing to spend a couple more months on this, trying to account for everything. Even if it isn't near perfect, I at the very least want it to measure the very obvious cases, which was my original idea (buffing/nerfing very high-end cases). <-------- My compromise.

@abraker95

This might work, but it's not a system I like. It makes pp more like a currency whose value is based on the overall values in circulation during a period of time. It's kind of like that right now, but at least the values change in response to mappers figuring out ways to make maps that inflate pp, rather than pp always changing in response to leaderboards.

I don't like the idea of a system where a player's performance on a single play can change, because that is not a system in which performance is a reliable unit of measurement, something I really wish pp would be. When I think of 500pp, I expect the player to have streaming, jumping, and reading abilities consistent with 500pp. For example, if 500pp corresponds to the player having a 50% chance of FCing a 2-minute 210 BPM stream in 1 try, then I expect that correspondence to stay consistent, as befits a metric that can be measured.

If you aim to just make a working system based on layers of corrections to get close to what we think the pp values should be, then that is merely sufficient. pp will remain an abstract value with only a vague relation to player skill, like money has in relation to the worth of something.
