MongoDB replication tracing#616
Merged
🦋 Changeset detected (latest commit: e44ba9f). The changes in this PR will be included in the next version bump. This PR includes changesets to release 12 packages.
stevensJourney (Collaborator) approved these changes on Apr 30, 2026, commenting:
This looks good to me. Using Disposables for spans looks really neat.
This is a re-implementation of #615, using a new lightweight tracing API, adding the timings to the "Processed batch" logs.
The goal here is to help diagnose slow replication issues at a high level. When replication is slower than expected, we want to know whether the bottleneck is (1) the source db or network, (2) the storage db, or (3) replication process CPU. The stats here help identify that.
Timings:
- `duration`: Duration for processing the batch, excluding `changestream`.
- `changestream`: Time spent waiting for the next batch on the change stream, including the source db waiting for more changes, scanning the oplog, processing the change stream pipeline, and network transfer time.
- `processing`: Time spent converting raw change stream buffers into input for sync config, as well as other batch processing overhead.
- `evaluate`: Time spent evaluating sync queries.
- `storage`: Time spent writing changes to the storage database.

This also removes the previous behavior of logging `Updating resume LSN to ${lsn} after 20000 changes`. We don't need that anymore: the `flush` after each change stream batch takes care of that now.

Implementation
#615 used manual tracking of start and end timestamps for various replication sections, and reported those in the APIs.
This instead keeps the APIs clean, only passing a `PerformanceTracer` instance through where needed. By using the Disposable interface, we can keep the code that generates the spans fairly clean.

While the code keeps track of the entire trace structure with nested spans, the main output currently is just the duration spent in each span itself, excluding nested spans; this is what we use for the logs.
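To illustrate the Disposable pattern described above, here is a minimal sketch. The class and method names (`Span`, `startSpan`, `record`) are hypothetical, not the PR's actual `PerformanceTracer` API; the key idea is that a span records its own duration when its scope exits:

```typescript
// Sketch only: names are illustrative, not the PR's API.
// Symbol.dispose ships natively in Node >= 18.18; polyfill just in case.
(Symbol as any).dispose ??= Symbol('Symbol.dispose');

class Span {
  private readonly start = performance.now();
  constructor(private tracer: PerformanceTracer, private name: string) {}
  // Called automatically when a `using` declaration goes out of scope.
  [Symbol.dispose]() {
    this.tracer.record(this.name, performance.now() - this.start);
  }
}

class PerformanceTracer {
  // Aggregated time per span name, in milliseconds.
  readonly totals = new Map<string, number>();
  record(name: string, ms: number) {
    this.totals.set(name, (this.totals.get(name) ?? 0) + ms);
  }
  startSpan(name: string): Span {
    return new Span(this, name);
  }
}

function processBatch(tracer: PerformanceTracer) {
  // The span ends when this block exits, even if an exception is thrown.
  using _span = tracer.startSpan('processing');
  // ... convert change stream buffers, evaluate sync queries, etc. ...
}
```

The `using` declaration (TypeScript 5.2+) is what keeps call sites clean: no explicit try/finally is needed to guarantee the span is closed.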
Perfetto Traces
This does have the ability to produce a trace file during development:
```sh
export POWERSYNC_TRACE_FILE=trace.json
node lib/entry.js start -r sync -c powersync.yaml
```

This produces a trace file in the Chromium Trace Event format, which can be viewed on https://ui.perfetto.dev/.
Right now, this is mostly useful to visualize where the aggregated timings come from. In the future we can expose traces in other formats in addition to / instead of Perfetto.
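For reference, the Chromium Trace Event format is plain JSON: an array of event objects, where `ph: 'X'` marks a complete event with a start timestamp and duration in microseconds. A minimal sketch of writing such a file (the span names and values here are illustrative, not output from the actual service):

```typescript
import * as fs from 'node:fs';

// One entry in Chromium's Trace Event "JSON Array Format".
interface TraceEvent {
  name: string; // span name, e.g. 'storage'
  ph: 'X';      // 'X' = complete event: has both a start (ts) and a duration
  ts: number;   // start timestamp, in microseconds
  dur: number;  // duration, in microseconds
  pid: number;  // process id (any consistent number works for a viewer)
  tid: number;  // thread id
}

function writeTrace(path: string, events: TraceEvent[]): void {
  // A bare JSON array of events is a valid trace file for Perfetto.
  fs.writeFileSync(path, JSON.stringify(events));
}

// Illustrative spans only.
writeTrace('trace.json', [
  { name: 'changestream', ph: 'X', ts: 0, dur: 150_000, pid: 1, tid: 1 },
  { name: 'storage', ph: 'X', ts: 150_000, dur: 40_000, pid: 1, tid: 1 },
]);
```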
Alternatives considered
performance.measure:
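The section above appears truncated. For context only, here is a hedged sketch of Node's built-in User Timing API (`performance.mark`/`performance.measure` from `node:perf_hooks`), which is presumably the alternative being weighed:

```typescript
import { performance } from 'node:perf_hooks';

// Bracket a section of work with named marks.
performance.mark('batch-start');
// ... replication work would happen here ...
performance.mark('batch-end');

// measure() records and returns a PerformanceMeasure entry.
const measure = performance.measure('processing', 'batch-start', 'batch-end');
console.log(`${measure.name}: ${measure.duration.toFixed(2)}ms`);

// Entries accumulate globally and must be cleared manually; one reason
// a scoped, Disposable-based tracer can be more convenient.
performance.clearMarks();
performance.clearMeasures();
```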