Skip to content

Moving ElectedClientNotSummarized from sendErrorEvent to sendTelemetryEvent#12482

Merged
NicholasCouri merged 1 commit into
microsoft:mainfrom
NicholasCouri:sendTelemetryEvent
Oct 14, 2022
Merged

Moving ElectedClientNotSummarized from sendErrorEvent to sendTelemetryEvent#12482
NicholasCouri merged 1 commit into
microsoft:mainfrom
NicholasCouri:sendTelemetryEvent

Conversation

@NicholasCouri
Copy link
Copy Markdown
Contributor

@NicholasCouri NicholasCouri commented Oct 14, 2022

Today, the ElectedClientNotSummarized event is treated as an error and as the number of hits is dependent to the number of running clients, a running stress test with 100+ clients will end up generating 100+ errors. This is not necessarily a bug and ends up creating unnecessary incidents (Ex. Bug 2241).
After talking to the OCEs (Navin, Si) and Jatin, we've decided to move it from an error to a generic telemetry event to reduce the noise from the system.

@NicholasCouri NicholasCouri requested a review from a team as a code owner October 14, 2022 00:13
@github-actions github-actions Bot added area: runtime Runtime related issues base: main PRs targeted against main branch labels Oct 14, 2022
@NicholasCouri NicholasCouri changed the title Moving ElectedClientNotSummarized to sendTelemetryEvent Moving ElectedClientNotSummarized from sendError to sendTelemetryEvent Oct 14, 2022
@NicholasCouri NicholasCouri changed the title Moving ElectedClientNotSummarized from sendError to sendTelemetryEvent Moving ElectedClientNotSummarized from sendErrorEvent to sendTelemetryEvent Oct 14, 2022
@msfluid-bot
Copy link
Copy Markdown
Collaborator

@fluid-example/bundle-size-tests: +8 Bytes
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 400.62 KB 400.62 KB +4 Bytes
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 199.7 KB 199.71 KB +4 Bytes
loader.js 154.13 KB 154.13 KB No change
map.js 47.37 KB 47.37 KB No change
matrix.js 138.04 KB 138.04 KB No change
odspDriver.js 153.03 KB 153.03 KB No change
odspPrefetchSnapshot.js 40.52 KB 40.52 KB No change
sharedString.js 157.95 KB 157.95 KB No change
Total Size 1.29 MB 1.29 MB +8 Bytes

Baseline commit: 4e10602

Generated by 🚫 dangerJS against c712244

@vladsud
Copy link
Copy Markdown
Contributor

vladsud commented Oct 14, 2022

We also have "Behind" event and it has very similar semantics (also error, also issues by all clients).
Is this duplicate? Should we remove it?
Please note that nobody pays attention to non-error events, and usually it's not that hard (through other events) to figure out rough current sequenceNumber, so we can always figure out (with some work) how far behind is current summarizer.
So maybe deleted it all together? Not sure.

@agarwal-navin
Copy link
Copy Markdown
Contributor

Let's change this to an event in this PR so that we reduce the noise. And create an issue to merge "SummaryStatus::Behind" and "ElectedClientNotSummarizing".
We were seeing both of them frequently in the tests (mostly together I believe) so yes, they do seem to be similar.

@anthony-murphy
Copy link
Copy Markdown
Contributor

i think we do want some error logged somewhere when summaries are not happening, so there is a clear error signal for partners that something is wrong

@NicholasCouri
Copy link
Copy Markdown
Contributor Author

We also have "Behind" event and it has very similar semantics (also error, also issues by all clients). Is this duplicate? Should we remove it? Please note that nobody pays attention to non-error events, and usually it's not that hard (through other events) to figure out rough current sequenceNumber, so we can always figure out (with some work) how far behind is current summarizer. So maybe deleted it all together? Not sure.

We might eventually remove it - I personally think it might help when diagnosing issues on individual interactive containers - I filed 2241 just in case.

@NicholasCouri NicholasCouri merged commit f350755 into microsoft:main Oct 14, 2022
sharptrip pushed a commit to sharptrip/FluidFramework that referenced this pull request Oct 17, 2022
…yEvent (microsoft#12482)

Today, the ElectedClientNotSummarized event is treated as an error and
as the number of hits is dependent to the number of running clients, a
running stress test with 100+ clients will end up generating 100+
errors. This is not necessarily a bug and ends up creating unnecessary
incidents (Ex. [Bug
2241](https://dev.azure.com/fluidframework/internal/_workitems/edit/2241)).
After talking to the OCEs (Navin, Si) and Jatin, we've decided to move
it from an error to a generic telemetry event to reduce the noise from
the system.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: runtime Runtime related issues base: main PRs targeted against main branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants