fix(billing): plan-change forced reset + atomic usage+extras outbox#3582
Conversation
🟠 B: BillingResetService.forceRotateForPlanChange() updates current week
snapshot mid-period when plan changes (was no-op when week doc existed,
keeping stale quota until next reset). Webhook plan.changed handler now
calls forceRotate with preserveUsage:true default. Consumers needing
clean-break downgrade pass {preserveUsage:false}.
🟠 C: Outbox pattern guarantees extras debit completes even if first attempt
fails. incrementMeterWithOutbox() inserts BillingMeterOutbox row alongside
usage increment; out-of-band cron retry-pending-extras-debit reconciles
within 5min. After 5 failed attempts, row marked 'failed' + billing.extras_debit.exhausted
event emitted for downstream alerting. unique index on idempotencyKey prevents
double-create races.
Documents both behaviors in modules/billing/README.md.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (8)
WalkthroughThis PR implements an outbox-based extras debit retry system for billing meter overages and integrates plan-change snapshot rotation. When meter usage exceeds quota limits, ChangesBilling Meter Outbox Reliability & Plan Rotation
Sequence DiagramsequenceDiagram
participant Client
participant MeterService as BillingMeterService
participant UsageService as BillingUsageService
participant OutboxRepo as OutboxRepository
participant ExtraService as BillingExtraService
participant Cron as RetryPendingExtrasDebitCron
participant OutboxService as BillingMeterOutboxService
participant Events as BillingEvents
rect rgba(100, 150, 200, 0.5)
Note over Client,ExtraService: Initial Optimistic Attribution
Client->>MeterService: attribute(orgId, units, ...)
MeterService->>UsageService: incrementMeterWithOutbox(...)
UsageService->>UsageService: incrementMeter()
alt Usage Applied & Extras > 0
UsageService->>OutboxRepo: create(orgId, key, extrasUnits)<br/>status='pending'
OutboxRepo-->>UsageService: outboxDoc
UsageService-->>MeterService: {applied, extrasConsumed, outbox}
else Usage Not Applied
UsageService-->>MeterService: {applied:false, extrasConsumed:0}
end
MeterService->>ExtraService: debit(orgId, extrasUnits, key)
alt Debit Succeeds
ExtraService-->>MeterService: {applied:true}
MeterService->>OutboxRepo: markCommitted(outboxDoc._id)
OutboxRepo-->>MeterService: {status:'committed'}
else Debit Fails
ExtraService-->>MeterService: {applied:false}
Note over MeterService: Log warning, defer to cron
end
MeterService-->>Client: {applied:true, ...}
end
rect rgba(150, 100, 200, 0.5)
Note over Cron,Events: Background Retry Loop (~5 min)
Cron->>OutboxService: retryPendingExtrasDebits(5min, limit=100)
OutboxService->>OutboxRepo: findPendingDue(5min, 100)
OutboxRepo-->>OutboxService: [pending rows]
loop For Each Pending Row
OutboxService->>ExtraService: debit(orgId, extrasUnits, key)
alt Debit Succeeds
OutboxService->>OutboxRepo: markCommitted(id)
else Debit Fails
OutboxService->>OutboxRepo: markFailedAttempt(id, error)
alt Attempts >= 5
OutboxRepo-->>OutboxService: {status:'failed'}
OutboxService->>Events: emit('billing.extras_debit.exhausted', {...})
Events-->>OutboxService: event emitted
end
end
end
OutboxService-->>Cron: {scanned, committed, failedAttempts, exhausted}
Cron->>Cron: exit with code (exhausted > 0 ? 1 : 0)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 44 minutes and 37 seconds.Comment |
Not up to standards ⛔🔴 Issues
|
| Category | Results |
|---|---|
| ErrorProne | 4 high |
🟢 Metrics 82 complexity · 23 duplication
Metric Results Complexity 82 Duplication 23
AI Reviewer: first review requested successfully. AI can make mistakes. Always validate suggestions.
TIP This summary will be updated as you push new changes.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #3582 +/- ##
==========================================
+ Coverage 86.56% 87.03% +0.47%
==========================================
Files 130 133 +3
Lines 3788 3873 +85
Branches 1133 1158 +25
==========================================
+ Hits 3279 3371 +92
+ Misses 394 388 -6
+ Partials 115 114 -1
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull Request Overview
The pull request successfully addresses the logic for mid-week plan changes and introduces an outbox pattern to harden extras billing. However, the current state is not up to standards according to Codacy, and there are critical reliability concerns that must be addressed before merging.
The most significant issue is that the 'atomic' outbox implementation in billing.usage.service.js is not actually atomic. Because it performs two sequential database writes without a transaction, a failure between these writes—combined with the idempotency protection on the first write—will lead to permanent loss of billing records for extras. This directly contradicts the goal of preventing revenue leaks. Additionally, several files have seen a significant spike in complexity (e.g., billing.reset.service.js and billing.meter.outbox.repository.js), which should be monitored for maintainability.
You must refactor the usage increment logic to use MongoDB transactions or implement a recovery check during idempotency replays to ensure the outbox record is always created.
About this PR
- The implementation of 'atomic' usage and outbox creation in the service layer lacks a database transaction. If the usage increment succeeds but the outbox row creation fails, the record for the extras debit is lost. Since the system uses idempotency keys, a retry by the caller will not fix this; it will simply skip the logic. This creates a systemic risk of revenue leakage.
Test suggestions
- Found recommended test scenario: forceRotateForPlanChange updates existing current-week usage document without an upsert
- Found recommended test scenario: forceRotateForPlanChange preserves existing meterUsed and breakdown by default
- Found recommended test scenario: forceRotateForPlanChange resets meterUsed to zero when preserveUsage is false
- Found recommended test scenario: incrementMeterWithOutbox creates a pending outbox row when usage overflows into extras
- Found recommended test scenario: attribute() marks the outbox row as committed immediately upon successful synchronous debit
- Found recommended test scenario: attribute() leaves the outbox row as pending and returns applied:true if synchronous debit fails
- Found recommended test scenario: retryPendingExtrasDebits cron reconciles pending rows and handles successful retries
- Found recommended test scenario: Outbox rows reach 'failed' status and emit billing.extras_debit.exhausted after 5 failures
TIP Improve review quality by adding custom instructions
TIP How was this review? Give us feedback
There was a problem hiding this comment.
Pull request overview
Hardens billing lifecycle behavior by (1) refreshing the active weekly usage snapshot on mid-cycle Stripe plan changes, and (2) introducing an outbox flow to make “usage increment + extras debit” resilient to partial failures.
Changes:
- Add
forceRotateForPlanChange()to refresh current-week quota/version snapshot on plan change (without weekly rollover), and wire it into the subscription webhook. - Introduce
BillingMeterOutbox+ retry service/cron to reconcile extras debits when synchronous debit fails after usage is already attributed. - Add/adjust unit + integration tests and update billing docs/cron README to reflect the new semantics.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| modules/billing/services/billing.webhook.service.js | Uses forceRotateForPlanChange() on plan.changed instead of resetWeek(). |
| modules/billing/services/billing.reset.service.js | Adds forceRotateForPlanChange() and emits billing.plan_change.rotated. |
| modules/billing/repositories/billing.usage.repository.js | Adds rotateWeekSnapshotForPlanChange() and session support to incrementMeter(). |
| modules/billing/services/billing.usage.service.js | Adds incrementMeterWithOutbox() wrapper that creates an outbox row on overflow. |
| modules/billing/services/billing.meter.service.js | Switches attribute() to optimistic “usage + outbox then best-effort debit”. |
| modules/billing/services/billing.meter.outbox.service.js | Implements retry loop + exhausted event emission for pending outbox rows. |
| modules/billing/repositories/billing.meter.outbox.repository.js | New repository for outbox CRUD + retry accounting. |
| modules/billing/models/billing.meter.outbox.model.mongoose.js | New Mongoose model for the outbox collection + indexes. |
| modules/billing/models/billing.meter.outbox.schema.js | New Zod schema mirroring the outbox model shape. |
| modules/billing/crons/retry-pending-extras-debit.cron.js | New cron entrypoint to reconcile pending outbox debits periodically. |
| modules/billing/lib/events.js | Documents new billing events emitted by the lifecycle hardening. |
| modules/billing/crons/README.md | Adds the new retry cron to the cron list and usage examples. |
| modules/billing/README.md | Documents plan-change semantics and extras-debit reliability/outbox behavior. |
| modules/billing/tests/billing.webhook.subscription.unit.tests.js | Updates webhook unit tests to expect forceRotateForPlanChange(). |
| modules/billing/tests/billing.reset.service.unit.tests.js | Adds unit tests for forceRotateForPlanChange() options/behavior. |
| modules/billing/tests/billing.usage.repository.unit.tests.js | Adds tests for snapshot rotation update behavior. |
| modules/billing/tests/billing.usage.service.unit.tests.js | Adds tests ensuring outbox creation happens only when needed. |
| modules/billing/tests/billing.meter.service.unit.tests.js | Updates meter attribution tests to use incrementMeterWithOutbox() and cover outbox commit/defer. |
| modules/billing/tests/billing.meter.outbox.unit.tests.js | New unit tests for outbox schema validation and repository helpers. |
| modules/billing/tests/billing.cron.retryPendingExtrasDebit.unit.tests.js | New unit tests for retry behavior and exhausted event emission. |
| modules/billing/tests/billing.lifecycle.integration.tests.js | New integration tests covering plan-change rotation and outbox reconciliation. |
…ron race Add status:'pending' filter to both findOneAndUpdate calls in markFailedAttempt so concurrent K8s CronJob instances cannot double-emit billing.extras_debit.exhausted. The second cron that loses the status-flip race receives null back and skips emit. Add regression test for the concurrent-nil path.
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modules/billing/crons/retry-pending-extras-debit.cron.js`:
- Around line 1-43: The cron entrypoint filename does not follow the modules
cron naming convention; rename the script to the billing-prefixed form (e.g.,
billing.retry-pending-extras-debit.cron.js) so it matches
modules/{name}/crons/{name}.{action}.js, and update any references (CronJob
manifests, deployment scripts, CI/docs, imports) that point to
retry-pending-extras-debit.cron.js to the new
billing.retry-pending-extras-debit.cron.js name to keep tooling and docs
consistent.
- Around line 33-39: The cron currently sets process.exitCode = 1 when
result.exhausted > 0 which treats a normal "exhausted" business outcome as an
infrastructure failure; update the success path in the
billing.retryPendingExtrasDebit flow (the block that logs the
scanned/committed/failedAttempts/exhausted message) to NOT set a non‑zero exit
code for result.exhausted — only set process.exitCode to non‑zero on real
runtime errors (the catch block) or if there are actual failedAttempts you want
to signal; keep the existing logging but remove or change the line that sets
process.exitCode based solely on result.exhausted so exhausted rows are
considered handled.
In `@modules/billing/repositories/billing.meter.outbox.repository.js`:
- Around line 71-75: Update the atomic updates to guard against race conditions
by including the current status in the query filter: change
BillingMeterOutbox().updateOne({ _id: id }, ...) in markCommitted to use
BillingMeterOutbox().updateOne({ _id: id, status: 'pending' }, ...) so only
pending rows can be moved to committed, and apply the same pattern in
markFailedAttempt (and the related update methods around lines 86-106) so their
updateOne filters include status: 'pending' (or the appropriate allowed
prior-status set) before setting lastError/lastAttemptedAt/incrementing
attempts; keep the rest of the update payload the same so updates remain atomic
and race-safe.
In `@modules/billing/repositories/billing.usage.repository.js`:
- Around line 251-253: When preserveUsage is false the reset only sets
update.$set.meterUsed and update.$set.meterBreakdown but leaves threshold flags
(alertedAt80/alertedAt100) intact, suppressing future alerts; modify the branch
where preserveUsage is checked to also clear those flags by setting
update.$set.alertedAt80 = false and update.$set.alertedAt100 = false (or
delete/reset them in meterBreakdown if they live there) so that the new quota
window can trigger thresholds again; locate the logic around preserveUsage,
update.$set.meterUsed and update.$set.meterBreakdown in
billing.usage.repository.js to apply the change.
In `@modules/billing/services/billing.meter.outbox.service.js`:
- Around line 49-67: The code treats any debitResult.applied === false as a
retryable failure, but BillingExtraService.debit() can return false for
idempotent replays; change the logic to detect duplicate/idempotency replays and
mark the outbox committed instead of incrementing attempts. Either (A) make
BillingExtraService.debit() include a distinguishing field (e.g.
debitResult.duplicate === true or debitResult.reason === 'idempotency_replay')
and update this handler to call
BillingMeterOutboxRepository.markCommitted(row._id) and increment committed when
that flag is present, or (B) if debit() cannot be changed, perform a lookup
using the outbox row’s idempotency key (from row.payload) to detect an existing
successful debit and then call markCommitted/emit as above; continue to use
markFailedAttempt/emitExhausted only for real failures.
- Around line 15-23: The emitExhausted helper currently calls
billingEvents.emit('billing.extras_debit.exhausted', ...) directly which can
throw and abort the sweep; wrap that emit call in a try/catch inside
emitExhausted (the const emitExhausted function) so listener errors are caught,
logged (non-fatal) and not rethrown—ensure you still pass the same payload
(organizationId, idempotencyKey, extrasUnits, attempts, lastError) but swallow
listener exceptions to avoid failing the entire cron/sweep.
In `@modules/billing/services/billing.usage.service.js`:
- Around line 233-243: incrementMeterWithOutbox can short-circuit and lose the
outbox row if incrementMeter applied the usage but the process died before
BillingMeterOutboxRepository.create completed; change the flow so when
result.applied === true and result.extrasConsumed > 0 you do a repository-level
recover-or-create instead of naively calling create and returning on failure:
call a new or existing repo method (e.g.,
BillingMeterOutboxRepository.findByIdempotencyKeyOrCreate /
claimOrRecoverOutbox) that atomically finds an existing outbox row by
organizationId + idempotencyKey and, if missing, inserts one with extrasUnits =
result.extrasConsumed using an upsert or unique constraint to avoid races;
return the found-or-created outbox with the result so the “usage applied but
outbox missing” case is repaired rather than short-circuited.
In `@modules/billing/services/billing.webhook.service.js`:
- Around line 256-258: The code sets planChangeResetTriggered = true
unconditionally after calling
BillingResetService.forceRotateForPlanChange(organizationId, { preserveUsage:
true }); but forceRotateForPlanChange can return null when no rotation occurred;
update the code to inspect the return value (e.g., const rotationResult = await
BillingResetService.forceRotateForPlanChange(...)) and only set
planChangeResetTriggered = true when rotationResult indicates a successful
rotation (non-null/truthy), so the subsequent fallback call to resetWeek(...)
still runs when no rotation happened.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cb12caee-2e8f-4fcc-b424-51740e48b8ad
📒 Files selected for processing (21)
modules/billing/README.mdmodules/billing/crons/README.mdmodules/billing/crons/retry-pending-extras-debit.cron.jsmodules/billing/lib/events.jsmodules/billing/models/billing.meter.outbox.model.mongoose.jsmodules/billing/models/billing.meter.outbox.schema.jsmodules/billing/repositories/billing.meter.outbox.repository.jsmodules/billing/repositories/billing.usage.repository.jsmodules/billing/services/billing.meter.outbox.service.jsmodules/billing/services/billing.meter.service.jsmodules/billing/services/billing.reset.service.jsmodules/billing/services/billing.usage.service.jsmodules/billing/services/billing.webhook.service.jsmodules/billing/tests/billing.cron.retryPendingExtrasDebit.unit.tests.jsmodules/billing/tests/billing.lifecycle.integration.tests.jsmodules/billing/tests/billing.meter.outbox.unit.tests.jsmodules/billing/tests/billing.meter.service.unit.tests.jsmodules/billing/tests/billing.reset.service.unit.tests.jsmodules/billing/tests/billing.usage.repository.unit.tests.jsmodules/billing/tests/billing.usage.service.unit.tests.jsmodules/billing/tests/billing.webhook.subscription.unit.tests.js
- crypto.randomInt for cron jitter (resolves Codacy critical) - markCommitted: add status:pending guard so committed/failed rows are immutable - emitExhausted: wrap billingEvents.emit in try/catch so listener throws cannot cause markFailedAttempt to double-count (Copilot thread 5) - webhook: forceRotateForPlanChange no longer suppresses resetWeek when period also changed (combined plan+period change now calls both, fixes Copilot thread 8) - tests: align to new markCommitted filter + combined plan+period webhook contract
- cron: exhausted rows are a handled business outcome — exit 0, warn instead of exit 1 so K8s CronJob does not treat normal reconciliation as an infrastructure failure - repository: rotateWeekSnapshotForPlanChange with preserveUsage=false now also clears alertedAt80/alertedAt100 so the new quota window can trigger threshold alerts again
|
@coderabbitai — responding to all 5 threads from the latest review: Thread 1 (rename cron): The existing billing crons use Thread 2 (exit code 1 for exhausted): Fixed in 69be3fc — exhausted rows now warn and exit 0. Infrastructure failures only set exit 1. Thread 3 (alertedAt80/100 on clean-break): Fixed in 69be3fc — rotateWeekSnapshotForPlanChange with preserveUsage=false now also clears alertedAt80 and alertedAt100. Thread 4 (idempotency replays vs real failures): Acknowledged — requires debit() API extension to surface a reason field. Deferred as follow-up issue. Thread 5 (lost-outbox gap, critical): Acknowledged design trade-off already documented in code comments and README. The session plumbing added in this PR supports a future transaction-based fix. A findOrCreate/upsert approach is also viable — deferred to a follow-up PR. |
|
Tip For best results, initiate chat on the files or code changes.
Thread 1 (rename cron): Understood — the Thread 2 (exit code 1 for exhausted): ✅ Confirmed — exhausted rows now exit 0 (warn only); exit 1 is reserved for infrastructure failures. Fix looks correct. Thread 3 (alertedAt80/100 on clean-break): ✅ Confirmed — clearing Thread 4 (idempotency replays vs real failures): Acknowledged. The core issue is that without a Thread 5 (lost-outbox gap, critical): Acknowledged. The trade-off (usage incremented but outbox insert not yet persisted) is a known atomicity gap; the session plumbing laid here and a future |
Summary
Bundles 2 lifecycle hardening findings from the post-recalibration billing audit:
forceRotateForPlanChange()—resetWeek()was no-op when the current ISO week doc existed, leaving stale quota/version snapshot on plan change (mid-week upgrades/downgrades broken until next weekly reset)Changes
B — Plan-change forced rotation
BillingResetService.forceRotateForPlanChange(orgId, { preserveUsage })updates current week snapshot quota/version mid-periodpreserveUsage: true(default) keepsmeterUsed(no double-charge / no refund)preserveUsage: falseresetsmeterUsed = 0(clean-break downgrade option)billing.webhook.service.jsplan.changed handler now callsforceRotateForPlanChangewith defaultpreserveUsage: truebilling.plan_change.rotatedevent (try/catch guarded) for audit trailrotateWeekSnapshotForPlanChange()added to usage repository (targeted update, no upsert)C — Outbox for atomic usage+extras
BillingMeterOutboxcollection withuniqueindex onidempotencyKey(prevents double-create race)BillingUsageService.incrementMeterWithOutbox()creates outbox row alongside usage incrementattribute()callsincrementMeterWithOutboxthen attempts synchronous debit; debit failure leaves outboxpendingfor retryretry-pending-extras-debit.cron.jsreconciles every 5min (with jitter)failed+billing.extras_debit.exhaustedevent emittedREADME
Test plan
forceRotateForPlanChangeunit tests: preserveUsage=true/false, no-op on missing weekSummary by CodeRabbit
New Features
Documentation