
Investigate high rate of replay upload timeouts #220

Closed
peppy opened this issue Feb 22, 2024 · 5 comments

peppy commented Feb 22, 2024

[Image attachment]

https://sentry.ppy.sh/organizations/ppy/issues/34904/


peppy commented Feb 22, 2024

@bdach to follow up on this: we figured that the number being reported on Datadog was much higher than the actual count, but I forget the reason why, or how I confirmed that this was the case. Or whether we actually decided this was still an issue at all.

root@db-master.osu.io:osu> select has_replay, count(*) from scores where preserve = 1 and passed = 1 and legacy_score_id is null and id between 2386002880 and 2387002880 group by has_replay;
+------------+----------+
| has_replay | count(*) |
+------------+----------+
| 1          | 22345    |
| 0          | 3669     |
+------------+----------+

The count without replays still seems quite high, but I might be missing something in the filtering conditions.


bdach commented Feb 22, 2024

That query is missing an approved in (1, 2) condition.

The results of the corrected query did look significantly better, but we never got around to explaining why the numbers were being overestimated like that. I believe we did rule out aborted submissions.

The next step would be to take a sample of those timeout messages from the logs or from Sentry and look for possible patterns in the database. Or just blindly try increasing the timeout, I guess.


peppy commented Feb 22, 2024

> That query is missing an approved in (1, 2).

Aha, that's the one.

root@db-master.osu.io:osu> select has_replay, count(*) from scores join osu_beatmaps using (beatmap_id) where preserve = 1 and passed = 1 and legacy_score_id is null and id between 2386002880 and 2387002880 and approved in (1,2) group by has_replay;
+------------+----------+
| has_replay | count(*) |
+------------+----------+
| 1          | 21148    |
| 0          | 36       |
+------------+----------+
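For context, the corrected counts above work out to a very small failure rate. A quick back-of-the-envelope check (plain Python; the numbers are taken directly from the result table above):

```python
# Counts from the corrected query: scores in the sampled id range
# (approved in (1, 2), preserve = 1, passed = 1, non-legacy).
with_replay = 21148
without_replay = 36

total = with_replay + without_replay
failure_rate = without_replay / total

print(f"{failure_rate:.4%}")  # roughly 0.17% of scores missing a replay
```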


peppy commented Mar 14, 2024

Let's close this one for the time being. As we discovered, the failure rate is actually very low.

@peppy closed this as not planned on Mar 14, 2024

bdach commented Mar 14, 2024

I'd still want to investigate the metrics themselves and why they're overreporting the actual count. Maybe we can save on some unnecessary work in the server.

bdach added a commit to bdach/osu-server-spectator that referenced this issue Mar 14, 2024
The client will not submit a play if the hit statistics indicate that it
never hit anything:

	https://github.com/ppy/osu/blob/a47ccb8edd2392258b6b7e176b222a9ecd511fc0/osu.Game/Screens/Play/SubmittingPlayer.cs#L281

This check was not exercised on the spectator server side, which meant that
scores with nothing hit would get completed and enqueued for an upload that
would never succeed (because the score would never be submitted anyway).
To save on some processing, just port the same check server-side to skip
the upload if nothing was hit.

I'm hoping the effect of this is going to bring the score timeout numbers
monitored on Sentry & Datadog down closer to real levels (see
discussion in ppy#220).
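The gist of the check being ported can be sketched as follows. This is a hedged Python illustration, not the actual C# from SubmittingPlayer.cs; the statistics dictionary shape, the hit-result names, and the helper name are assumptions made for illustration:

```python
def has_any_hits(statistics: dict[str, int]) -> bool:
    """Return True if the hit statistics contain at least one successful hit.

    Mirrors the idea of the client-side check referenced above: if every
    hit-result count other than the miss-type results is zero, the client
    will never submit the play, so the server can skip enqueueing the
    replay upload entirely.
    """
    # Hypothetical set of non-hit result names, for illustration only.
    NON_HITS = {"miss", "large_tick_miss", "small_tick_miss"}
    return any(count > 0
               for result, count in statistics.items()
               if result not in NON_HITS)

# A score where nothing was hit: the upload would be skipped.
assert not has_any_hits({"miss": 120, "great": 0})
# A normal passing score: processed as usual.
assert has_any_hits({"great": 300, "ok": 12, "miss": 3})
```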