[DS-2888] LLO datasource: observation pacing, staleRefreshSkipThreshold, wake after observation and metrics#22005

Merged
pavel-raykov merged 6 commits into develop from bm/DS-2888-datasource-frequency
May 4, 2026


@brunotm brunotm commented Apr 14, 2026

Improves how often the LLO observation background loop can refresh pipeline-backed streams without blocking the plugin Observe path or OCR rounds.

staleRefreshSkipThreshold: This is the value S = (N/D)·T derived from data_source.go, where T is the plugin observation deadline (observationTimeout). After a successful cache write, each stream’s entry has a TTL; buildStreamsRefreshPlan treats a cached stream as still fresh (not a refresh driver) while time.Until(expiresAt) > S. Once remaining TTL drops to S or below, that stream becomes a refresh driver for that loop iteration (subject to pacing and registry grouping). Raising S makes that transition happen earlier after each write (more TTL still left when refresh is allowed), which increases freshness and pipeline load; lowering S delays drivers (staler reads, less load). It must satisfy S + observationLoopPacing(T) < cacheEntryTTL(T) so refresh can run before entries expire.

Changes:

  • Wake channel: After an Observe call, coalesce a non-blocking hint so the loop can exit inter-iteration pacing early instead of always waiting observationLoopPacing(T); expose llo_datasource_observation_loop_wait_outcome_count (wake / timer / shutdown).
  • Cache staleness metric: Record writtenAt on cache writes and emit llo_datasource_cache_hit_entry_age_ms on plugin reads (per streamID).
  • Tuning / docs: Align comments and constants with 8/5 for staleRefreshSkipThreshold and 2·T cache TTL; document the invariant above.
  • Tests: Wake-before-pacing, concurrent Observe with separate StreamValues maps (-race), cache hit-age metrics; metric resets in teardown.

@brunotm brunotm added the build-publish Build and Publish image to SDLC label Apr 14, 2026
@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in its text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.


github-actions Bot commented Apr 14, 2026

✅ No conflicts with other open PRs targeting develop


@brunotm brunotm force-pushed the bm/DS-2888-datasource-frequency branch from 939f176 to 9538747 Compare April 14, 2026 15:52
@brunotm brunotm changed the title llo/datasource: increase data source frequency llo/datasource: observation loop pacing Apr 15, 2026
@brunotm brunotm force-pushed the bm/DS-2888-datasource-frequency branch 12 times, most recently from 4e941e5 to b775d47 Compare April 22, 2026 14:00
@brunotm brunotm changed the title llo/datasource: observation loop pacing [DS-2888] LLO datasource: observation loop pacing, 2T cache TTL, 5/4 stale skip, per-pipeline cache flush Apr 22, 2026
@brunotm brunotm force-pushed the bm/DS-2888-datasource-frequency branch 7 times, most recently from c838673 to 1150ce5 Compare April 27, 2026 10:29
@brunotm brunotm changed the title [DS-2888] LLO datasource: observation loop pacing, 2T cache TTL, 5/4 stale skip, per-pipeline cache flush [DS-2888] LLO datasource: observation pacing, staleRefreshSkipThreshold, wake after observation and metrics Apr 27, 2026
@brunotm brunotm marked this pull request as ready for review April 27, 2026 10:34
@brunotm brunotm requested review from a team as code owners April 27, 2026 10:34
// must be used together to form a quote. So if any stream in the group failed, we drop
// the entire group to avoid writing a mix of fresh and stale values to the cache.
// Mutates observedValues in place.
func (d *dataSource) removeIncompleteGroups(lggr logger.Logger, observedValues map[streams.StreamID]llo.StreamValue, streamValues llo.StreamValues) []streams.StreamID {
Contributor

@akuzni2 akuzni2 Apr 28, 2026


Trying to understand why we'd delete this. How do the existing changes guarantee an atomic write from a pipeline with multiple outputs? Specifically, if a pipeline has 3 output values, are we still ensuring an "all-or-nothing" result?

Collaborator Author


The observation starts grouped by pipeline for all stream IDs in scope for the plugin (those included in the passed streamValues; see buildStreamsRefreshPlan). A pipeline's streams only get added if all of them are successfully observed, which preserves atomicity and integrity.

It maintains the previous implementation's guarantees but builds the plan upfront.

Contributor


Makes sense. We do have the test "cache writes are atomic per pipeline group across observation cycles", which makes sure that behavior stays put.
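
The all-or-nothing behavior discussed in this thread can be sketched as follows. This is a simplified illustration, not the PR's implementation: writeCompleteGroups, the streamID type and the float64 values are hypothetical stand-ins, and the groups are assumed to have been planned upfront, one per pipeline.

```go
package main

// streamID is a stand-in for the real stream identifier type.
type streamID uint32

// writeCompleteGroups copies observed values into cache one group at a
// time. A group is written only if every stream in it was observed;
// otherwise the group's existing cache entries are left untouched.
// It returns the number of groups that were skipped.
func writeCompleteGroups(cache map[streamID]float64, groups [][]streamID, observed map[streamID]float64) int {
	skipped := 0
	for _, group := range groups {
		complete := true
		for _, id := range group {
			if _, ok := observed[id]; !ok {
				complete = false
				break
			}
		}
		if !complete {
			skipped++
			continue
		}
		for _, id := range group {
			cache[id] = observed[id]
		}
	}
	return skipped
}
```

For example, with groups {1, 2} and {3} and observations for streams 1 and 3 only, the first group is skipped as a whole, so stream 1 keeps its previous cached value and the cache never holds a mix of fresh and stale values from one pipeline.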

@brunotm brunotm requested a review from akuzni2 April 28, 2026 07:54
@brunotm brunotm requested review from jmank88 April 30, 2026 08:02
@brunotm brunotm force-pushed the bm/DS-2888-datasource-frequency branch from 1150ce5 to d1a0ad7 Compare April 30, 2026 16:08
@brunotm brunotm force-pushed the bm/DS-2888-datasource-frequency branch from d1a0ad7 to bb124ae Compare May 4, 2026 10:56

@brunotm brunotm added this pull request to the merge queue May 4, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026
@pavel-raykov pavel-raykov added this pull request to the merge queue May 4, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026
@pavel-raykov pavel-raykov added this pull request to the merge queue May 4, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 4, 2026
@pavel-raykov pavel-raykov added this pull request to the merge queue May 4, 2026
Merged via the queue into develop with commit 3213653 May 4, 2026
230 checks passed
@pavel-raykov pavel-raykov deleted the bm/DS-2888-datasource-frequency branch May 4, 2026 15:01