fix(credit): snapshotting concurrency#3997

Merged
GAlexIHU merged 1 commit into main from fix/balance-snapshot on Mar 23, 2026

Conversation

@GAlexIHU

@GAlexIHU GAlexIHU commented Mar 23, 2026

Overview

A race condition can occur between persisting new cached values and invalidating them. This change tries to acquire the lock optimistically, without waiting; if the lock cannot be acquired, the cache entry is simply dropped.
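
The pattern can be illustrated in-process with a minimal sketch. This is not the PR's code: the PR relies on database row locks, while the example below assumes a `sync.Mutex` as a stand-in for the lock, and `snapshotCache`, `trySave`, and `invalidate` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// snapshotCache is a hypothetical in-memory stand-in for persisted snapshots.
type snapshotCache struct {
	mu      sync.Mutex // stands in for the database lock used in the PR
	entries map[string]float64
}

// trySave stores the value only if the lock is free right now.
// It reports whether the entry was saved; on contention the entry is dropped.
func (c *snapshotCache) trySave(owner string, balance float64) bool {
	if !c.mu.TryLock() {
		return false // someone else holds the lock: skip instead of waiting
	}
	defer c.mu.Unlock()
	c.entries[owner] = balance
	return true
}

// invalidate always waits for the lock, so it can never be skipped.
func (c *snapshotCache) invalidate(owner string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, owner)
}

func main() {
	c := &snapshotCache{entries: map[string]float64{}}
	fmt.Println("saved:", c.trySave("owner-1", 42)) // lock is free

	c.mu.Lock() // simulate a concurrent invalidation holding the lock
	fmt.Println("saved:", c.trySave("owner-1", 7))
	c.mu.Unlock()

	c.invalidate("owner-1")
	fmt.Println("entries:", len(c.entries))
}
```

Dropping the write is safe here because a cached value can always be recomputed later, whereas a stale value written after an invalidation could not be detected.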

Notes for reviewer

Summary by CodeRabbit

Release Notes

  • Refactor
    • Added a wait option to the transaction locks used by credit and entitlement operations, so callers can either block until the lock is free or fail immediately.
    • Updated test utilities to match the new lock method signature.

@GAlexIHU GAlexIHU requested a review from a team as a code owner March 23, 2026 17:43
@GAlexIHU GAlexIHU added the release-note/bug-fix Release note: Bug Fixes label Mar 23, 2026
@coderabbitai

coderabbitai Bot commented Mar 23, 2026

📝 Walkthrough

This PR adds a wait bool parameter to the LockOwnerForTx and LockEntitlementForTx locking methods across the codebase, controlling whether lock acquisitions should block or fail immediately. Call sites are updated to pass true (wait for lock) in most contexts, with false (NoWait) used specifically in snapshot persistence logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Credit Owner Lock Interface & Call Sites: openmeter/credit/grant/owner_connector.go, openmeter/credit/balance.go, openmeter/credit/grant.go, openmeter/credit/balance/service_test.go | Added wait bool parameter to OwnerConnector.LockOwnerForTx interface method; updated all call sites to pass true; mock test double updated with new parameter. |
| Entitlement Lock Interface & Implementation: openmeter/entitlement/repository.go, openmeter/entitlement/adapter/entitlement.go, openmeter/entitlement/metered/grant_owner_adapter.go | Added wait bool parameter to EntitlementRepo.LockEntitlementForTx method; forwarded through adapter layers; implementation now conditionally applies sql.WithLockAction(sql.NoWait) when wait == false. |
| Snapshot Persistence Logic: openmeter/credit/helper.go | Wrapped owner lock acquisition in transaction scope using LockOwnerForTx(ctx, owner, false), returning early on lock failure instead of continuing with snapshot creation logic. |
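
The effect of the wait flag on the row lock can be sketched in terms of the SQL it would roughly correspond to on PostgreSQL. This is an assumption-laden illustration only: the real code goes through the repository's query builder with sql.WithLockAction(sql.NoWait), and the function name, table name, and column names below are hypothetical.

```go
package main

import "fmt"

// lockEntitlementQuery returns a row-lock query for a hypothetical
// "entitlements" table. With wait == true the statement blocks until the row
// lock is free; with wait == false the NOWAIT suffix makes the database
// return an error immediately if the row is already locked.
func lockEntitlementQuery(wait bool) string {
	q := "SELECT id FROM entitlements WHERE namespace = $1 AND id = $2 FOR UPDATE"
	if !wait {
		q += " NOWAIT"
	}
	return q
}

func main() {
	fmt.Println(lockEntitlementQuery(true))
	fmt.Println(lockEntitlementQuery(false))
}
```

Keeping the blocking form as the behavior for wait == true means existing call sites that pass true behave as before, and only the snapshot path opts into failing fast.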

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • turip
  • chrisgacsal
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(credit): snapshotting concurrency' directly and clearly summarizes the main change: addressing a race condition in the snapshotting process by implementing optimistic locking behavior. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
openmeter/credit/helper.go (1)

180-185: Add observability when lock acquisition fails.

When the optimistic lock fails, the function silently returns nil and skips snapshot persistence. This is fine from a correctness standpoint (matches the PR objective), but there's no logging or metric emission to help operators understand when/how often this happens.

Frequent lock contention causing snapshot misses could indicate a performance issue worth investigating. Consider adding a debug/info log here.

♻️ Suggested improvement
 	if err := transaction.RunWithNoValue(ctx, m.GrantRepo, func(ctx context.Context) error {
 		return m.OwnerConnector.LockOwnerForTx(ctx, snapParams.owner, false)
 	}); err != nil {
 		// If we failed to acquire the lock we simply don't save the snapshot
+		m.Logger.DebugContext(ctx, "skipping snapshot persistence due to lock contention", "owner", snapParams.owner)
 		return nil
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openmeter/credit/helper.go` around lines 180 - 185, The block calling
transaction.RunWithNoValue with m.GrantRepo and m.OwnerConnector.LockOwnerForTx
currently swallows errors and returns nil; update this error branch to emit
observability (e.g., process or package logger and/or metrics) so operators know
when lock acquisition for snapParams.owner fails—log a descriptive message
including the owner identifier (snapParams.owner) and the error returned from
RunWithNoValue/LockOwnerForTx, and optionally increment a contention/failure
metric; keep the current behavior of not persisting the snapshot after
logging/metric emission.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d8d3f1d1-a337-4293-9b66-726d3edf9a94

📥 Commits

Reviewing files that changed from the base of the PR and between 91cdcc7 and fb74161.

📒 Files selected for processing (8)
  • openmeter/credit/balance.go
  • openmeter/credit/balance/service_test.go
  • openmeter/credit/grant.go
  • openmeter/credit/grant/owner_connector.go
  • openmeter/credit/helper.go
  • openmeter/entitlement/adapter/entitlement.go
  • openmeter/entitlement/metered/grant_owner_adapter.go
  • openmeter/entitlement/repository.go

Comment on lines +97 to 99
func (m *MockOwnerConnector) LockOwnerForTx(ctx context.Context, id models.NamespacedID, wait bool) error {
	return nil
}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for tests that might cover wait=false or NoWait behavior
rg -n "wait.*false|NoWait|LockOwnerForTx.*false" --type go -g '*_test.go'

Repository: openmeterio/openmeter

Length of output: 47


🏁 Script executed:

# First, let's find where LockOwnerForTx with wait=false is actually called
rg -n "LockOwnerForTx.*false|wait.*false" --type go -g '*.go' | head -20

Repository: openmeterio/openmeter

Length of output: 166


🏁 Script executed:

# Look at the helper.go file mentioned in the comment to see how wait=false is used
fd -n "helper.go" . | grep -E "balance|credit"

Repository: openmeterio/openmeter

Length of output: 295


🏁 Script executed:

# Check the service_test.go file to understand what tests exist
wc -l openmeter/credit/balance/service_test.go

Repository: openmeterio/openmeter

Length of output: 109


🏁 Script executed:

# Look at helper.go context around line 181 where wait=false is called
sed -n '170,190p' openmeter/credit/helper.go

Repository: openmeterio/openmeter

Length of output: 1231


🏁 Script executed:

# Read the entire service_test.go to understand test coverage
cat -n openmeter/credit/balance/service_test.go | head -100

Repository: openmeterio/openmeter

Length of output: 3812


🏁 Script executed:

# Also check what functions call the helper function with wait=false
rg -B 5 "LockOwnerForTx.*false" --type go openmeter/credit/helper.go

Repository: openmeterio/openmeter

Length of output: 467


🏁 Script executed:

# Check if there are any other tests that might verify lock acquisition failures
rg -A 10 "func Test" openmeter/credit/balance/service_test.go | head -50

Repository: openmeterio/openmeter

Length of output: 352


🏁 Script executed:

# Check if there are integration tests that might cover lock failure scenarios
fd service_test.go --type f | xargs wc -l

Repository: openmeterio/openmeter

Length of output: 565


🏁 Script executed:

# Look at the rest of service_test.go to understand the test structure
tail -100 openmeter/credit/balance/service_test.go

Repository: openmeterio/openmeter

Length of output: 3673


Add tests for lock acquisition failures with wait=false.

The mock's signature update is correct, but there's no test coverage for when lock acquisition fails. Looking at openmeter/credit/helper.go:181, the code gracefully skips snapshot persistence when LockOwnerForTx(ctx, snapParams.owner, false) fails—but no tests verify this behavior. Since the mock always returns nil, we can't exercise this error path.

Consider adding a test that verifies the snapshot is skipped gracefully when the lock can't be acquired with wait=false. This would catch regressions if the lock-contention handling changes.
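
One possible shape for such a test is sketched below, under stated assumptions. None of these types are the repository's: `mockOwnerConnector` is a trimmed, hypothetical mock with an injectable error, `fakeSnapshotStore` merely counts writes, and `persistSnapshot` is a stand-in for the code path in helper.go rather than the real function.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// NamespacedID is a simplified stand-in for models.NamespacedID.
type NamespacedID struct {
	Namespace string
	ID        string
}

// errLockContended is a hypothetical error used to simulate lock contention.
var errLockContended = errors.New("could not obtain lock")

// mockOwnerConnector returns noWaitErr whenever it is called with wait == false.
type mockOwnerConnector struct {
	noWaitErr error
}

func (m *mockOwnerConnector) LockOwnerForTx(ctx context.Context, id NamespacedID, wait bool) error {
	if !wait {
		return m.noWaitErr
	}
	return nil
}

// fakeSnapshotStore counts how many snapshots were written.
type fakeSnapshotStore struct {
	writes int
}

// persistSnapshot mimics the behavior under test: take the lock without
// waiting and silently skip the write if the lock is unavailable.
func persistSnapshot(ctx context.Context, locker *mockOwnerConnector, store *fakeSnapshotStore, owner NamespacedID) error {
	if err := locker.LockOwnerForTx(ctx, owner, false); err != nil {
		return nil // lock error is intentionally not propagated
	}
	store.writes++
	return nil
}

func main() {
	ctx := context.Background()
	owner := NamespacedID{Namespace: "default", ID: "owner-1"}

	store := &fakeSnapshotStore{}
	contended := &mockOwnerConnector{noWaitErr: errLockContended}
	err := persistSnapshot(ctx, contended, store, owner)
	fmt.Println("contended: err =", err, "writes =", store.writes)

	free := &mockOwnerConnector{}
	err = persistSnapshot(ctx, free, store, owner)
	fmt.Println("free: err =", err, "writes =", store.writes)
}
```

In a real test the two cases would be assertions in a `testing.T` test: zero writes and a nil error under contention, exactly one write when the lock is free.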

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openmeter/credit/balance/service_test.go` around lines 97 - 99, Add a unit
test that simulates lock acquisition failure by making
MockOwnerConnector.LockOwnerForTx return a non-nil error when called with
wait=false and then invoke the helper function in openmeter/credit/helper.go
(the code path that calls LockOwnerForTx with snapParams.owner, false); assert
that snapshot persistence is skipped (e.g., no calls to the persistence/mock
store or no snapshot written) and that the function returns gracefully without
propagating the lock error. Locate MockOwnerConnector.LockOwnerForTx and the
helper function in helper.go to implement the mock behavior and the assertions.

@GAlexIHU GAlexIHU enabled auto-merge (squash) March 23, 2026 17:53
@GAlexIHU GAlexIHU merged commit b36c866 into main Mar 23, 2026
37 of 39 checks passed
@GAlexIHU GAlexIHU deleted the fix/balance-snapshot branch March 23, 2026 18:00

Labels

release-note/bug-fix Release note: Bug Fixes


2 participants