Investigate high rate of replay upload timeouts #220
Comments
@bdach to follow up on this, we figured that the number being reported on Datadog was much higher than actual, but I forget the reason why, or how I confirmed that this was the case, or whether we actually decided this was still an issue.
root@db-master.osu.io:osu> select has_replay, count(*) from scores where preserve = 1 and passed = 1 and legacy_score_id is null and id between 2386002880 and 2387002880 group by has_replay;
+------------+----------+
| has_replay | count(*) |
+------------+----------+
|          1 |    22345 |
|          0 |     3669 |
+------------+----------+
The count without replays still seems quite high, but I might be missing something in the filtering conditions.
That query is missing an [...] Running the query did look significantly better, but we never got to explain why the numbers were being overestimated like that. I believe we did rule out aborted submission. The next step would be to take a sample of those timeout messages from logs or from Sentry and look for possible patterns in the database. Or just blindly try increasing the timeout, I guess.
aha, that's the one.
root@db-master.osu.io:osu> select has_replay, count(*) from scores join osu_beatmaps using (beatmap_id) where preserve = 1 and passed = 1 and legacy_score_id is null and id between 2386002880 and 2387002880 and approved in (1,2) group by has_replay;
+------------+----------+
| has_replay | count(*) |
+------------+----------+
|          1 |    21148 |
|          0 |       36 |
+------------+----------+
Let's close this one for the time being. As we discovered, the failure rate is actually very low.
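For reference, the share of scores without a replay can be worked out from the counts returned by the two queries in this thread. This is only a quick arithmetic sketch; the helper name `missing_replay_rate` is made up for illustration and is not part of any osu! tooling.

```python
def missing_replay_rate(with_replay: int, without_replay: int) -> float:
    """Fraction of scores in the sample that have no replay attached."""
    total = with_replay + without_replay
    # Guard against an empty sample so the helper never divides by zero.
    return without_replay / total if total else 0.0


# Counts copied from the first query (no filter on `approved`).
unfiltered = missing_replay_rate(with_replay=22345, without_replay=3669)

# Counts copied from the second query (joined on osu_beatmaps, approved in (1,2)).
filtered = missing_replay_rate(with_replay=21148, without_replay=36)

print(f"without approved filter: {unfiltered:.2%}")
print(f"with approved in (1,2):  {filtered:.2%}")
```

The first figure comes out at roughly 14%, the second well under 1%, which is the gap the thread is referring to.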
I'd still want to investigate the metrics themselves and why they're overreporting the actual count. Maybe we can save on some unnecessary work in the server.
The client will not submit a play if hit statistics indicate that they never hit anything: https://github.com/ppy/osu/blob/a47ccb8edd2392258b6b7e176b222a9ecd511fc0/osu.Game/Screens/Play/SubmittingPlayer.cs#L281
This check was not being performed on the spectator server side, which meant that scores with nothing hit would get completed and enqueued for an upload that would never succeed (because the score would never be submitted anyway). To save on some processing, just port the same check server-side to skip the upload if nothing was hit.
I'm hoping the effect of this will be a decrease in the score timeouts monitored on Sentry and Datadog, bringing them closer to real levels (see discussion in ppy#220).
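The "skip upload if nothing was hit" idea can be sketched roughly as follows. This is not the actual client or spectator server code (which is C#); the result names in `NON_HIT_RESULTS`, the shape of the statistics mapping, and both function names are assumptions made purely for illustration.

```python
# Hypothetical set of result names that do not count as "hitting something".
# The real definition of a hit lives in the osu! codebase.
NON_HIT_RESULTS = {"miss", "none"}


def has_any_hits(statistics: dict[str, int]) -> bool:
    """Return True if at least one hit-type result has a positive count."""
    return any(
        count > 0 and result not in NON_HIT_RESULTS
        for result, count in statistics.items()
    )


def should_enqueue_replay_upload(statistics: dict[str, int]) -> bool:
    """Mirror the client-side rule: a play with nothing hit is never submitted,
    so enqueueing its replay for upload would only end in a timeout."""
    return has_any_hits(statistics)
```

With a check like this in place before enqueueing, a score whose statistics contain only misses (or nothing at all) would be skipped instead of waiting for a submission that never arrives.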
https://sentry.ppy.sh/organizations/ppy/issues/34904/