Proposal: Breakdown Ratings: Use opponents overall rating when calculating breakdown ratings #9

Open
flovo opened this issue Oct 25, 2020 · 5 comments

Comments

@flovo
Contributor

flovo commented Oct 25, 2020

image
Right now we use a separate rating pool for each rating in the breakdown chart. The big disadvantage is that the resulting ratings are not comparable to each other. We can make the breakdown ratings comparable by using the opponent's overall rating when calculating them.

When updating a player's ratings, we keep using the player's category rating as the base rating, as we do at the moment. For the opponent, we always use her overall rating. The update algorithm stays the same for all breakdowns.

By using the opponent's overall rating, we keep the breakdown ratings on the same scale as the overall rating.

For a player who plays only one board size + speed combination, all 4 breakdown ratings will be the same, whereas they can be quite different at the moment.
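
A minimal sketch of the proposed update, to make the idea concrete. It is not the OGS implementation: the update rule is a plain Elo step used as a stand-in for whatever algorithm the site runs, and the pool names, `START`, `K`, `pools_for` and `record_game` are hypothetical names chosen for the illustration. The only point it shows is that every pool a game feeds is updated from the player's own rating in that pool and the opponent's overall rating.

```python
from dataclasses import dataclass, field

START = 1500.0  # assumed starting rating for every pool
K = 32.0        # assumed step size of the stand-in Elo update


def expected_score(rating, opponent):
    """Elo win expectancy; a stand-in, not the site's real update rule."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))


def elo_update(rating, opponent, score):
    """Return the new rating after one game (score: 1 win, 0 loss)."""
    return rating + K * (score - expected_score(rating, opponent))


def pools_for(size, speed):
    """The four pools a single game feeds (hypothetical key names)."""
    return ["overall", size, speed, f"{size}/{speed}"]


@dataclass
class Player:
    ratings: dict = field(default_factory=dict)  # pool name -> rating

    def rating(self, pool):
        return self.ratings.get(pool, START)


def record_game(player, opponent_overall, size, speed, score, update=elo_update):
    """Update each affected pool against the opponent's OVERALL rating."""
    for pool in pools_for(size, speed):
        # Base rating: the player's own rating in this pool.
        # Opponent rating: always the opponent's overall rating.
        player.ratings[pool] = update(player.rating(pool), opponent_overall, score)


# A player who only ever plays live 9x9 games.
alice = Player()
for opp_overall, score in [(1600.0, 1.0), (1450.0, 0.0), (1550.0, 1.0)]:
    record_game(alice, opp_overall, "9x9", "live", score)
print(alice.ratings)
```

Because all four pools start from the same value and see the same opponent ratings and results, they end up identical for this player. Once she plays a different size or speed, only the pools that game belongs to move.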

anoek added a commit that referenced this issue Jan 12, 2021
@anoek
Member

anoek commented Jan 14, 2021

I've played with this a bit. It certainly helps pull those values closer, and the results intuitively seem more reasonable. I am still unsure, though, whether it crosses the threshold of being particularly useful, or whether the result is left in limbo somewhere between "pure but not particularly useful for a rank" and "we can use this rating to provide a meaningful rank". I'm a little wary of being stuck in between, since then no one would be happy, but I would sure love it if those ratings were useful for providing a meaningful rank.

@flovo
Contributor Author

flovo commented Jan 19, 2021

Subratings will always be less accurate than the overall rating because they are based on a smaller sample of games. But each subrating would have the same value as the player's overall rating if he had only played games of the respective board size + speed. So they are as useful for ranking as the rating of a player with few games can be.


Also it prevents this:
image
Right now the subratings are close to "not useful at all". Besides the rating graph, they can only be used in comparison with another player.


While these subratings took a long time to converge, I could use them to understand why my overall rating jumped between 11k and 16k for a while: my 9x9 performance was close to 10k, while my 19x19 performance was more like 20k.
Figure_1

@anoek
Member

anoek commented Jan 20, 2021

That's a nice graph. Ok I'm sold, we'll give it a try this iteration.

@flovo
Contributor Author

flovo commented Jan 25, 2021

From the forum:

It’s great to see how predictive the OGS ratings are!
I really appreciate that side of it and the rigour applied is great.

What is still on my wishlist though is a system such that the rating system is also easy to understand.
I find that the EGD rating system is fairly simple to understand for example. Less predictive but much better ergonomics & usability.

Here are the questions I'd like to understand about the OGS rating system:
What does it tell me if I have 12 different numbers that are somewhat dependent, but not very much? Does it mean I am stronger in some aspects (slow games) than in others? But then the aggregate numbers don't necessarily bear that out, since they are calculated differently. ===> It looks like transitivity is broken in the rating system (transitivity: if chocolate is better than vanilla and vanilla is better than hazelnut, then chocolate should also be better than hazelnut).

How do those 12 numbers inform my rank? Which number (e.g. 2100?) constitutes a 1d rating, and how much do I stand to gain against an opponent of equal strength (equal in all 12, or just in rank)?

If my blitz rating is a few hundred points lower than my rating for slow games, what does that mean? Does it mean the scale is different or my skill level is different?

I am really happy with how OGS is developing, and don’t mean to criticize too much. My only feedback after reading through the detailed and rigorous analysis is that it shows that the human-to-rating-system interaction has not been given much weight in favor of maximal predictive power and I think a balance would be best.

@anoek
Member

anoek commented Jan 25, 2021

Yep, we're going with it :) I think the current plan is to adopt this proposal of using the opponent's overall rating for all bands, go with straight vanilla Glicko-2 with no windowing, change the rating-to-rank parameters so that ranks align better with EGF/AGA ranks and there is no giant 30k lump, and see how that goes for this iteration.
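
As a rough illustration of what "rating to rank parameters" could mean, here is a sketch of a two-parameter mapping from rating to rank. Everything in it is an assumption made for the example: the logarithmic form, the clamp at the lowest rank, the function names, and the values of `a` and `c` are placeholders, not values taken from this thread. It only shows how such parameters decide which ratings collapse onto the lowest rank.

```python
import math


def rating_to_rank(rating, a, c, min_rank=0.0):
    """Assumed mapping: rank = c * ln(rating / a), clamped at min_rank.

    `a` and `c` are the tunable parameters; rank 0 stands for 30k here.
    """
    return max(min_rank, c * math.log(rating / a))


def rank_label(rank, one_dan=30):
    """Turn a numeric rank into a kyu/dan label (rank 30 is taken as 1d)."""
    n = math.floor(rank)
    if n >= one_dan:
        return f"{n - one_dan + 1}d"
    return f"{one_dan - n}k"


ratings = [600.0, 900.0, 1500.0]

# Placeholder parameter sets, chosen only to show the effect of the clamp.
for a, c in [(800.0, 30.0), (500.0, 23.0)]:
    labels = [rank_label(rating_to_rank(r, a, c)) for r in ratings]
    print(f"a={a}, c={c}: {labels}")
```

With the first placeholder set, every rating at or below `a` is clamped to the lowest rank, which is one way a lump at 30k could arise. Lowering `a` spreads those players over several ranks instead.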
