Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AckExplicitPolicy instead of AckAllPolicy #3288

Merged
merged 8 commits into from
Dec 19, 2023
Merged

Conversation

S7evinK
Copy link
Contributor

@S7evinK S7evinK commented Dec 16, 2023

Fixes #3240 and potentially a root cause for state resets.

While testing, I've had added some more debug logging:

time="2023-12-16T18:13:11.319458084Z" level=warning msg="already processed event" event_id="$qFYMl_F2vb1N0yxmvlFAMhqhGhLKq4kA-o_YCQKH7tQ" kind=KindNew times=2
time="2023-12-16T18:13:14.537389126Z" level=warning msg="already processed event" event_id="$EU-LTsKErT6Mt1k12-p_3xOHfiLaK6gtwVDlZ35lSuo" kind=KindNew times=5
time="2023-12-16T18:13:16.789551206Z" level=warning msg="already processed event" event_id="$dIPuAfTL5x0VyG873LKPslQeljCSxFT1WKxUtjIMUGE" kind=KindNew times=5
time="2023-12-16T18:13:17.383838767Z" level=warning msg="already processed event" event_id="$7noSZiCkzerpkz_UBO3iatpRnaOiPx-3IXc0GPDQVGE" kind=KindNew times=2
time="2023-12-16T18:13:22.091946597Z" level=warning msg="already processed event" event_id="$3Lvo3Wbi2ol9-nNbQ93N-E2MuGQCJZo5397KkFH-W6E" kind=KindNew times=1
time="2023-12-16T18:13:23.026417446Z" level=warning msg="already processed event" event_id="$lj1xS46zsLBCChhKOLJEG-bu7z-_pq9i_Y2DUIjzGy4" kind=KindNew times=4

So we did receive the same event over and over again. Given they are KindNew, we don't short circuit if we already processed them, which potentially caused the state to be calculated with a now wrong state snapshot.

Also fixes the back pressure metric. We now correctly increment the counter once we sent the message to NATS and decrement it once we actually processed an event.

@S7evinK S7evinK added C-Roomserver C-NATS T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Major Major functionality / product severely impaired, no satisfactory workaround. labels Dec 16, 2023
@S7evinK S7evinK requested a review from a team as a code owner December 16, 2023 19:15
Copy link

codecov bot commented Dec 16, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (d65449c) 65.71% compared to head (d395a91) 65.55%.

Files Patch % Lines
roomserver/internal/input/input.go 77.04% 11 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3288      +/-   ##
==========================================
- Coverage   65.71%   65.55%   -0.17%     
==========================================
  Files         509      509              
  Lines       57420    57466      +46     
==========================================
- Hits        37734    37670      -64     
- Misses      15851    15928      +77     
- Partials     3835     3868      +33     
Flag Coverage Δ
unittests 49.82% <77.04%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -49,7 +49,7 @@ import (
)

// TODO: Does this value make sense?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this comment anymore?

// The consumer already exists, try to update if necessary.
if info != nil {
switch {
case info.Config.AckWait.Nanoseconds() != consumerConfig.AckWait.Nanoseconds():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could use a comment.
Something like "The existing consumer's AckWait timeout does not match the current expected timeout value"

@S7evinK
Copy link
Contributor Author

S7evinK commented Dec 19, 2023

As a note: we probably should revisit processRoomEvent and possibly pass in the NATS message itself, so we can msg.InProgress() to reset the AckWait timer if we need to ask federation for events (which may take a while).

@S7evinK S7evinK merged commit f93d1c4 into main Dec 19, 2023
20 checks passed
@S7evinK S7evinK deleted the s7evink/roomserver-input branch December 19, 2023 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-NATS C-Roomserver S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AckAllPolicy may be a bad choice for RoomConsumer?
2 participants