[494] Redis 잔량 차감 과정 장애 대응 추가#237
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughAdds a DB-backed fallback path and replay pipeline for Redis-backed traffic deduction: new metrics, per-trace execution context, delta record entity/mapper, DB deduct service, scheduled idempotent replay, Redis retry/backoff in adapter, in-flight state changes, DB schema, configuration, and tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Orchestrator as Deduct<br/>Orchestrator
participant Adapter as Hydrate/Refill<br/>Adapter
participant Redis
participant DB as DB Fallback<br/>Service
participant Delta as Delta<br/>Record Service
participant Scheduler as Replay<br/>Scheduler
Client->>Orchestrator: deduct(payload, traceId)
Orchestrator->>Adapter: executeWithRecovery(payload, amount, context)
loop Redis Retry 1..N
Adapter->>Redis: Lua deduct script
alt Redis success
Redis-->>Adapter: result
Adapter-->>Orchestrator: return result
else Redis failure/timeout
Adapter->>Adapter: markRedisRetry(trace, attempt)
Adapter->>Adapter: backoff
end
end
alt Redis exhausted -> DB fallback
Adapter->>Adapter: activateRedisFallback(context)
Adapter->>DB: deduct(poolType, payload, amount, context)
DB->>DB: validate policies & apply limits
DB->>Delta: recordDelta(traceId,...)
DB-->>Adapter: fallback result
Adapter-->>Orchestrator: return result
end
par Async replay
Scheduler->>Delta: lockReplayCandidates()
Delta-->>Scheduler: return records
loop for each record
Scheduler->>Redis: Lua replay script (idempotent)
alt success
Redis-->>Scheduler: ok
Scheduler->>Delta: markSuccess(id)
else failure
Scheduler->>Delta: markFailWithRetryIncrement(id)
end
Scheduler->>Adapter: updateReplayBacklog(...)
end
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (8)
src/main/resources/application-local.yml (1)
84-87: Consider exposingprocessing-stuck-secondshere for local tuning parity.
TrafficUsageDeltaReplayScheduleralso readsapp.traffic.redis-usage-replay.processing-stuck-seconds(with default 180). Adding it to local config would make replay behavior fully tunable in this profile as well.Suggested YAML addition
traffic: deduct: redis-retry: max-attempts: ${TRAFFIC_DEDUCT_REDIS_RETRY_MAX_ATTEMPTS:3} backoff-ms: ${TRAFFIC_DEDUCT_REDIS_RETRY_BACKOFF_MS:50} redis-usage-replay: fixed-delay-ms: ${TRAFFIC_REDIS_USAGE_REPLAY_FIXED_DELAY_MS:5000} batch-size: ${TRAFFIC_REDIS_USAGE_REPLAY_BATCH_SIZE:200} max-retry-count: ${TRAFFIC_REDIS_USAGE_REPLAY_MAX_RETRY_COUNT:20} + processing-stuck-seconds: ${TRAFFIC_REDIS_USAGE_REPLAY_PROCESSING_STUCK_SECONDS:180}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/resources/application-local.yml` around lines 84 - 87, Add the missing processing-stuck-seconds property to the local profile so TrafficUsageDeltaReplayScheduler can be tuned locally; specifically, in the redis-usage-replay block add app.traffic.redis-usage-replay.processing-stuck-seconds (or simply processing-stuck-seconds if following the local file's nested path style) with the default value 180 (or reference ${TRAFFIC_REDIS_USAGE_REPLAY_PROCESSING_STUCK_SECONDS:180}) so the scheduler (TrafficUsageDeltaReplayScheduler) reads the same configurable value in the local profile.src/main/java/com/pooli/traffic/domain/TrafficDeductExecutionContext.java (1)
19-21: Consider validating traceId for consistency.Other factory methods in this codebase (e.g.,
TrafficRedisKeyFactory.dedupeRunKey) validate that identifiers are non-null and non-blank. Consider adding similar validation here to fail fast on invalid input.🔧 Optional validation
public static TrafficDeductExecutionContext of(String traceId) { + if (traceId == null || traceId.isBlank()) { + throw new IllegalArgumentException("traceId must not be null or blank"); + } return new TrafficDeductExecutionContext(traceId); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/domain/TrafficDeductExecutionContext.java` around lines 19 - 21, The factory method TrafficDeductExecutionContext.of should validate its traceId parameter before constructing the object: ensure traceId is non-null and not blank (trimmed) and throw an IllegalArgumentException (or use Objects.requireNonNull with a clear message) if invalid, then call the constructor TrafficDeductExecutionContext(String) only when validation passes so callers fail fast on bad input.src/main/java/com/pooli/monitoring/metrics/TrafficDeductFallbackMetrics.java (1)
68-77: Minor race condition in lazy gauge registration.The check-then-register pattern has a race window where concurrent calls could both pass the null check before either registers. While Micrometer handles duplicate registrations gracefully (returning the existing gauge), this is slightly wasteful.
🔧 Thread-safe registration using AtomicBoolean
+import java.util.concurrent.atomic.AtomicBoolean; + `@Component` `@Profile`({"local", "traffic"}) `@RequiredArgsConstructor` public class TrafficDeductFallbackMetrics { private final MeterRegistry meterRegistry; private final AtomicLong replayBacklogGauge = new AtomicLong(0L); + private final AtomicBoolean gaugeRegistered = new AtomicBoolean(false); // ... private void ensureReplayBacklogGaugeRegistered() { - // 중복 등록을 피하기 위해 현재 등록 여부를 먼저 확인한다. - if (meterRegistry.find("traffic_redis_usage_replay_backlog").gauge() != null) { + if (gaugeRegistered.get()) { return; } - - Gauge.builder("traffic_redis_usage_replay_backlog", replayBacklogGauge, AtomicLong::get) - .description("Pending redis usage delta replay backlog") - .register(meterRegistry); + if (gaugeRegistered.compareAndSet(false, true)) { + Gauge.builder("traffic_redis_usage_replay_backlog", replayBacklogGauge, AtomicLong::get) + .description("Pending redis usage delta replay backlog") + .register(meterRegistry); + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/monitoring/metrics/TrafficDeductFallbackMetrics.java` around lines 68 - 77, ensureReplayBacklogGaugeRegistered currently uses a check-then-register pattern that can race; add an AtomicBoolean guard (e.g., replayBacklogGaugeRegistered) as a thread-safe one-time registration flag and change the method to perform compareAndSet(false, true) before creating the Gauge; if compareAndSet fails, return early. Keep the existing meterRegistry.find(...) check as a fallback to avoid duplicate registration if the gauge already exists.src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaReplaySchedulerTest.java (1)
58-119: Consider adding test for max retry exceeded scenario.The tests cover success and transient failure paths, but the scheduler also has a permanent failure path when
retryCount >= maxRetryCount(perTrafficUsageDeltaReplaySchedulerlines 68-77). This branch callsmarkFailWithRetryCount()instead ofmarkFailWithRetryIncrement().💡 Suggested test case
`@Test` `@DisplayName`("max retry 초과 시 영구 실패 상태로 전이한다") void marksFailWithRetryCountWhenMaxRetryExceeded() { // given TrafficRedisUsageDeltaRecord record = record(3L, 20); // retryCount >= maxRetryCount when(trafficUsageDeltaRecordService.lockReplayCandidatesAndMarkProcessing(10, 180)) .thenReturn(List.of(record)); when(trafficUsageDeltaRecordService.countBacklog()).thenReturn(0L); // when trafficUsageDeltaReplayScheduler.runReplayCycle(); // then verify(trafficUsageDeltaRecordService).markFailWithRetryCount(3L, 20, "max retry exceeded"); verify(trafficDeductFallbackMetrics).incrementReplayResult("max_retry_exceeded"); verify(cacheStringRedisTemplate, never()).execute(any(), anyList(), any(), any(), any(), any(), any(), any()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaReplaySchedulerTest.java` around lines 58 - 119, Add a unit test for the "max retry exceeded" branch in TrafficUsageDeltaReplayScheduler: create a test method (e.g., marksFailWithRetryCountWhenMaxRetryExceeded) that returns a TrafficRedisUsageDeltaRecord with retryCount >= maxRetryCount from trafficUsageDeltaRecordService.lockReplayCandidatesAndMarkProcessing(...), stub countBacklog as appropriate, run trafficUsageDeltaReplayScheduler.runReplayCycle(), then verify trafficUsageDeltaRecordService.markFailWithRetryCount(recordId, retryCount, "max retry exceeded") is called, trafficDeductFallbackMetrics.incrementReplayResult("max_retry_exceeded") is called, and cacheStringRedisTemplate.execute(...) is never invoked; use the existing helper record(...) and the same mocked dependencies (trafficUsageDeltaRecordService, trafficDeductFallbackMetrics, cacheStringRedisTemplate, etc.) to locate where to add this test.src/main/java/com/pooli/traffic/domain/enums/TrafficInFlightState.java (1)
17-23: Consider adding validation for invalid retry attempt values.The
defaultcase silently maps values like0, negative numbers, or values>3toREDIS_RETRY_3. While this is defensive, it could mask bugs in calling code.♻️ Optional: Add validation to catch programming errors
public static TrafficInFlightState fromRetryAttempt(int attempt) { + if (attempt < 1 || attempt > 3) { + throw new IllegalArgumentException("Invalid retry attempt: " + attempt); + } return switch (attempt) { case 1 -> REDIS_RETRY_1; case 2 -> REDIS_RETRY_2; default -> REDIS_RETRY_3; }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/domain/enums/TrafficInFlightState.java` around lines 17 - 23, The fromRetryAttempt method currently maps any non-1/2 value to REDIS_RETRY_3 which can hide bugs; update TrafficInFlightState.fromRetryAttempt to explicitly handle valid attempts 1, 2, 3 and validate input (e.g., if attempt < 1 || attempt > 3) throw an IllegalArgumentException with a clear message including the invalid attempt value instead of using the default; reference the method name TrafficInFlightState.fromRetryAttempt and enum constants REDIS_RETRY_1, REDIS_RETRY_2, REDIS_RETRY_3 when making the change.src/main/resources/mapper/traffic/TrafficDbSpeedBucketMapper.xml (1)
36-38: Replace deprecatedVALUES()with row alias syntax for MySQL 8.0.20+ compatibility.The
VALUES(used_bytes)syntax is deprecated as of MySQL 8.0.20 and will generate deprecation warnings. Use a row alias instead:Suggested fix
ON DUPLICATE KEY UPDATE used_bytes = used_bytes + new.used_bytes, updated_at = NOW(6)Update the INSERT statement to include an alias:
VALUES (...) AS new, then reference the alias in the UPDATE clause.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/resources/mapper/traffic/TrafficDbSpeedBucketMapper.xml` around lines 36 - 38, The ON DUPLICATE KEY UPDATE clause uses deprecated VALUES(used_bytes); change the INSERT so it assigns a row alias (e.g. AS new) and update the clause to reference that alias (use new.used_bytes in the expression updating used_bytes and keep updated_at = NOW(6)); locate the INSERT that leads into the ON DUPLICATE KEY UPDATE and add the alias and replace VALUES(...) occurrences with the alias.field references.src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaRecordServiceTest.java (1)
35-50: Consider verifying the record content with ArgumentCaptor.The test confirms
insertIgnoreDuplicateis called but doesn't verify that theTrafficRedisUsageDeltaRecordpassed contains the expected field values (traceId, poolType, lineId, etc.). Using anArgumentCaptorwould strengthen this test.💡 Optional improvement to verify record content
+import org.mockito.ArgumentCaptor; + `@Test` `@DisplayName`("유효한 fallback 결과를 usage delta로 저장한다") void recordsUsageDeltaWhenInputIsValid() { // when trafficUsageDeltaRecordService.record( "trace-001", TrafficPoolType.INDIVIDUAL, 11L, 22L, 33, 100L, LocalDate.of(2026, 3, 16), YearMonth.of(2026, 3) ); // then - verify(trafficRedisUsageDeltaMapper).insertIgnoreDuplicate(any(TrafficRedisUsageDeltaRecord.class)); + ArgumentCaptor<TrafficRedisUsageDeltaRecord> captor = ArgumentCaptor.forClass(TrafficRedisUsageDeltaRecord.class); + verify(trafficRedisUsageDeltaMapper).insertIgnoreDuplicate(captor.capture()); + TrafficRedisUsageDeltaRecord captured = captor.getValue(); + assertEquals("trace-001", captured.getTraceId()); + assertEquals(TrafficPoolType.INDIVIDUAL, captured.getPoolType()); + assertEquals(100L, captured.getUsedBytes()); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaRecordServiceTest.java` around lines 35 - 50, Add an ArgumentCaptor for TrafficRedisUsageDeltaRecord in TrafficUsageDeltaRecordServiceTest to capture the argument passed to trafficRedisUsageDeltaMapper.insertIgnoreDuplicate when calling trafficUsageDeltaRecordService.record(...); after invoking record, verify the mapper was called, capture the record, and assert its fields (traceId, poolType, lineId, planId, usageDelta, version/usageDate/YearMonth equivalents) match the inputs provided in the test to ensure the correct payload is constructed.src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java (1)
545-550: Consider distinguishing null from negative innormalizeNonNegative.The method treats both
nulland negative values as0. If negative values from the DB indicate a data integrity issue, they would be silently converted to0without logging. This is likely acceptable for this fallback path, but worth noting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java` around lines 545 - 550, The normalizeNonNegative method currently collapses null and negative values to 0; change it so null still yields 0L but negative values are treated distinctly: add a warning log that includes the offending value (and any identifying context available) and then return 0L for the fallback path; ensure a Logger (e.g., private static final Logger logger) is available in the class and use it in normalizeNonNegative(Long value) to emit the warning when value < 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java`:
- Around line 159-176: The retry branch after deductRemainingAmount currently
computes reloadedAnswer from selectRemainingAmount without re-applying policy
limits, which can allow a bypass if balance changed; update the retry logic in
TrafficDbDeductFallbackService so that after computing reloadedAmount you
recompute and clamp reloadedAnswer against the same daily/monthly/app limits
used earlier (the same logic that produced the original answer from
normalizedRequestedBytes), then set reloadedStatus based on the clamped value
and call deductRemainingAmount with that clamped reloadedAnswer (or
alternatively add a clear comment explaining why re-applying limits is
unnecessary if that's the intended, safe behavior); reference
deductRemainingAmount, selectRemainingAmount, normalizedRequestedBytes,
buildResult, and TrafficLuaStatus when updating the code.
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java`:
- Around line 613-627: The code currently skips context.activateRedisFallback()
when context is null but proceeds to DB fallback, which loses request-level
fallback state; modify the block handling null context (in
TrafficHydrateRefillAdapterService around activateRedisFallback/markDbFallback)
to explicitly handle the null case: if context is null, log a warning (including
resolveTraceId(context, payload) and poolType) and either instantiate a minimal
context from payload.getTraceId() and call activateRedisFallback() so subsequent
calls see the fallback, or call a safe fallback-marking helper that records DB
fallback by traceId before calling
trafficInFlightDedupeService.markDbFallback(...) and
trafficDbDeductFallbackService.deduct(...); ensure resolveTraceId,
activateRedisFallback, trafficInFlightDedupeService.markDbFallback, and
trafficDbDeductFallbackService.deduct are used consistently.
---
Nitpick comments:
In
`@src/main/java/com/pooli/monitoring/metrics/TrafficDeductFallbackMetrics.java`:
- Around line 68-77: ensureReplayBacklogGaugeRegistered currently uses a
check-then-register pattern that can race; add an AtomicBoolean guard (e.g.,
replayBacklogGaugeRegistered) as a thread-safe one-time registration flag and
change the method to perform compareAndSet(false, true) before creating the
Gauge; if compareAndSet fails, return early. Keep the existing
meterRegistry.find(...) check as a fallback to avoid duplicate registration if
the gauge already exists.
In `@src/main/java/com/pooli/traffic/domain/enums/TrafficInFlightState.java`:
- Around line 17-23: The fromRetryAttempt method currently maps any non-1/2
value to REDIS_RETRY_3 which can hide bugs; update
TrafficInFlightState.fromRetryAttempt to explicitly handle valid attempts 1, 2,
3 and validate input (e.g., if attempt < 1 || attempt > 3) throw an
IllegalArgumentException with a clear message including the invalid attempt
value instead of using the default; reference the method name
TrafficInFlightState.fromRetryAttempt and enum constants REDIS_RETRY_1,
REDIS_RETRY_2, REDIS_RETRY_3 when making the change.
In `@src/main/java/com/pooli/traffic/domain/TrafficDeductExecutionContext.java`:
- Around line 19-21: The factory method TrafficDeductExecutionContext.of should
validate its traceId parameter before constructing the object: ensure traceId is
non-null and not blank (trimmed) and throw an IllegalArgumentException (or use
Objects.requireNonNull with a clear message) if invalid, then call the
constructor TrafficDeductExecutionContext(String) only when validation passes so
callers fail fast on bad input.
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java`:
- Around line 545-550: The normalizeNonNegative method currently collapses null
and negative values to 0; change it so null still yields 0L but negative values
are treated distinctly: add a warning log that includes the offending value (and
any identifying context available) and then return 0L for the fallback path;
ensure a Logger (e.g., private static final Logger logger) is available in the
class and use it in normalizeNonNegative(Long value) to emit the warning when
value < 0.
In `@src/main/resources/application-local.yml`:
- Around line 84-87: Add the missing processing-stuck-seconds property to the
local profile so TrafficUsageDeltaReplayScheduler can be tuned locally;
specifically, in the redis-usage-replay block add
app.traffic.redis-usage-replay.processing-stuck-seconds (or simply
processing-stuck-seconds if following the local file's nested path style) with
the default value 180 (or reference
${TRAFFIC_REDIS_USAGE_REPLAY_PROCESSING_STUCK_SECONDS:180}) so the scheduler
(TrafficUsageDeltaReplayScheduler) reads the same configurable value in the
local profile.
In `@src/main/resources/mapper/traffic/TrafficDbSpeedBucketMapper.xml`:
- Around line 36-38: The ON DUPLICATE KEY UPDATE clause uses deprecated
VALUES(used_bytes); change the INSERT so it assigns a row alias (e.g. AS new)
and update the clause to reference that alias (use new.used_bytes in the
expression updating used_bytes and keep updated_at = NOW(6)); locate the INSERT
that leads into the ON DUPLICATE KEY UPDATE and add the alias and replace
VALUES(...) occurrences with the alias.field references.
In
`@src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaRecordServiceTest.java`:
- Around line 35-50: Add an ArgumentCaptor for TrafficRedisUsageDeltaRecord in
TrafficUsageDeltaRecordServiceTest to capture the argument passed to
trafficRedisUsageDeltaMapper.insertIgnoreDuplicate when calling
trafficUsageDeltaRecordService.record(...); after invoking record, verify the
mapper was called, capture the record, and assert its fields (traceId, poolType,
lineId, planId, usageDelta, version/usageDate/YearMonth equivalents) match the
inputs provided in the test to ensure the correct payload is constructed.
In
`@src/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaReplaySchedulerTest.java`:
- Around line 58-119: Add a unit test for the "max retry exceeded" branch in
TrafficUsageDeltaReplayScheduler: create a test method (e.g.,
marksFailWithRetryCountWhenMaxRetryExceeded) that returns a
TrafficRedisUsageDeltaRecord with retryCount >= maxRetryCount from
trafficUsageDeltaRecordService.lockReplayCandidatesAndMarkProcessing(...), stub
countBacklog as appropriate, run
trafficUsageDeltaReplayScheduler.runReplayCycle(), then verify
trafficUsageDeltaRecordService.markFailWithRetryCount(recordId, retryCount, "max
retry exceeded") is called,
trafficDeductFallbackMetrics.incrementReplayResult("max_retry_exceeded") is
called, and cacheStringRedisTemplate.execute(...) is never invoked; use the
existing helper record(...) and the same mocked dependencies
(trafficUsageDeltaRecordService, trafficDeductFallbackMetrics,
cacheStringRedisTemplate, etc.) to locate where to add this test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a3900fdd-5ed1-40a1-b674-d162fedc9f40
📒 Files selected for processing (30)
src/main/java/com/pooli/monitoring/metrics/TrafficDeductFallbackMetrics.javasrc/main/java/com/pooli/traffic/domain/TrafficDeductExecutionContext.javasrc/main/java/com/pooli/traffic/domain/entity/TrafficRedisUsageDeltaRecord.javasrc/main/java/com/pooli/traffic/domain/enums/TrafficInFlightState.javasrc/main/java/com/pooli/traffic/domain/enums/TrafficRedisUsageDeltaStatus.javasrc/main/java/com/pooli/traffic/mapper/TrafficDbSpeedBucketMapper.javasrc/main/java/com/pooli/traffic/mapper/TrafficDbUsageMapper.javasrc/main/java/com/pooli/traffic/mapper/TrafficRedisUsageDeltaMapper.javasrc/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.javasrc/main/java/com/pooli/traffic/service/decision/TrafficDeductOrchestratorService.javasrc/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.javasrc/main/java/com/pooli/traffic/service/decision/TrafficUsageDeltaRecordService.javasrc/main/java/com/pooli/traffic/service/decision/TrafficUsageDeltaReplayScheduler.javasrc/main/java/com/pooli/traffic/service/runtime/TrafficInFlightDedupeService.javasrc/main/java/com/pooli/traffic/service/runtime/TrafficLuaScriptInfraService.javasrc/main/java/com/pooli/traffic/service/runtime/TrafficRedisKeyFactory.javasrc/main/java/com/pooli/traffic/service/runtime/TrafficRedisRuntimePolicy.javasrc/main/resources/application-local.ymlsrc/main/resources/application-traffic.ymlsrc/main/resources/db/migration/V2603161100__create_traffic_deduct_fallback_tables.sqlsrc/main/resources/mapper/traffic/TrafficDbSpeedBucketMapper.xmlsrc/main/resources/mapper/traffic/TrafficDbUsageMapper.xmlsrc/main/resources/mapper/traffic/TrafficRedisUsageDeltaMapper.xmlsrc/test/java/com/pooli/traffic/acceptance/TrafficFlowLocalAcceptanceTest.javasrc/test/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackServiceTest.javasrc/test/java/com/pooli/traffic/service/decision/TrafficDeductOrchestratorServiceTest.javasrc/test/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterServiceTest.javasrc/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaRecordServiceTest.javasrc/test/java/com/pooli/traffic/service/decision/TrafficUsageDeltaReplaySchedulerTest.javasrc/test/java/com/pooli/traffic/service/runtime/TrafficInFlightDedupeServiceTest.java
…는 상황을 'Redis'로 알리는 모순적인 로직 제거. 해당 사실은 로그로만 기록.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java (1)
475-481:⚠️ Potential issue | 🔴 CriticalPrevent DB fallback re-deduct after DB refill claim (double-decrement risk).
When Line 475 and Line 519 re-run deduct after a successful DB refill claim/apply,
executeDeductcan still hit Line 626 and call DB fallback deduct. That can decrement DB remaining a second time for the same request.🛠️ Suggested guard (disable DB fallback in post-refill deduct calls)
- private TrafficLuaExecutionResult executeDeduct( + private TrafficLuaExecutionResult executeDeduct( TrafficPoolType poolType, TrafficPayloadReqDto payload, String balanceKey, long requestedDataBytes, - TrafficDeductExecutionContext context + TrafficDeductExecutionContext context, + boolean allowDbFallback ) { @@ - return trafficDbDeductFallbackService.deduct(poolType, payload, requestedDataBytes, fallbackContext); + if (!allowDbFallback) { + log.warn( + "traffic_deduct_redis_retry_exhausted_post_refill traceId={} poolType={} requestedData={} fallback=skipped", + resolveTraceId(fallbackContext, payload), + poolType, + requestedDataBytes + ); + return TrafficLuaExecutionResult.builder() + .answer(0L) + .status(TrafficLuaStatus.NO_BALANCE) + .build(); + } + return trafficDbDeductFallbackService.deduct(poolType, payload, requestedDataBytes, fallbackContext); }- TrafficLuaExecutionResult initialResult = executeDeduct(poolType, payload, balanceKey, requestedDataBytes, context); + TrafficLuaExecutionResult initialResult = executeDeduct(poolType, payload, balanceKey, requestedDataBytes, context, true); @@ - TrafficLuaExecutionResult refillRetryResult = executeDeduct( + TrafficLuaExecutionResult refillRetryResult = executeDeduct( poolType, payload, balanceKey, retryTargetData, - context + context, + false ); @@ - TrafficLuaExecutionResult refillRetryResult = executeDeduct( + TrafficLuaExecutionResult refillRetryResult = executeDeduct( poolType, payload, balanceKey, retryTargetData, - context + context, + false );Also applies to: 519-525, 573-627
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java` around lines 475 - 481, The retry deduct after a successful DB refill is re-invoking executeDeduct (e.g., the call that assigns refillRetryResult using retryTargetData) which still allows the DB fallback deduct path to run and double-decrement DB state; fix by adding a boolean flag/enum parameter to executeDeduct (e.g., allowDbFallback or deductMode) to disable DB-fallback logic during post-refill retries, update callers that perform post-refill re-deducts (the calls that create refillRetryResult and the other retry sites) to pass the “disable DB fallback” value, and ensure the DB-fallback conditional inside executeDeduct respects that flag so the fallback deduct branch is skipped for those retry calls.
🧹 Nitpick comments (1)
src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java (1)
120-157: Extract the repeated policy-clamping pipeline into one helper.The same limit sequence is implemented twice (primary and reloaded paths). This is easy to drift later and can reintroduce policy mismatch bugs.
♻️ Refactor sketch
- answer = applyDailyLimitIfEnabled(policyActivation, lineLimit, payload.getLineId(), usageDate, answer); - ... - answer = applyAppSpeedLimitIfEnabled(policyActivation, poolType, appPolicy, payload, nowEpochSecond, answer); + LimitApplyResult limited = applyPolicyLimits( + policyActivation, + whitelistBypass, + poolType, + payload, + lineLimit, + appPolicy, + usageDate, + targetMonth, + nowEpochSecond, + answer, + finalStatus + ); + answer = limited.answer(); + finalStatus = limited.status(); + if (answer <= 0) { + return buildResult(0L, finalStatus); + } @@ - reloadedAnswer = applyDailyLimitIfEnabled(...); - ... - reloadedAnswer = applyAppSpeedLimitIfEnabled(...); + LimitApplyResult limitedReloaded = applyPolicyLimits( + policyActivation, + whitelistBypass, + poolType, + payload, + lineLimit, + appPolicy, + usageDate, + targetMonth, + nowEpochSecond, + reloadedAnswer, + reloadedStatus + ); + reloadedAnswer = limitedReloaded.answer(); + reloadedStatus = limitedReloaded.status(); + if (reloadedAnswer <= 0) { + return buildResult(0L, reloadedStatus); + }Also applies to: 170-236
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java` around lines 120 - 157, Extract the repeated clamping pipeline into a single helper method (e.g., clampLimitsSequentially or applyPolicyClamps) that accepts the inputs used here—policyActivation, lineLimit, appPolicy, payload (lineId, appId), whitelistBypass, poolType, usageDate, targetMonth, nowEpochSecond, currentAmount, normalizedRequestedBytes—and returns the resulting allowedBytes and final TrafficLuaStatus; inside it call applyDailyLimitIfEnabled, applySharedMonthlyLimitIfEnabled, applyAppDailyLimitIfEnabled and applyAppSpeedLimitIfEnabled in the same order, check for <=0 to return buildResult early, and update finalStatus when a clamp reduces the value; replace both duplicated blocks (this block and the block at lines 170-236) with calls to the new helper to keep logic centralized and avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java`:
- Around line 475-481: The retry deduct after a successful DB refill is
re-invoking executeDeduct (e.g., the call that assigns refillRetryResult using
retryTargetData) which still allows the DB fallback deduct path to run and
double-decrement DB state; fix by adding a boolean flag/enum parameter to
executeDeduct (e.g., allowDbFallback or deductMode) to disable DB-fallback logic
during post-refill retries, update callers that perform post-refill re-deducts
(the calls that create refillRetryResult and the other retry sites) to pass the
“disable DB fallback” value, and ensure the DB-fallback conditional inside
executeDeduct respects that flag so the fallback deduct branch is skipped for
those retry calls.
---
Nitpick comments:
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.java`:
- Around line 120-157: Extract the repeated clamping pipeline into a single
helper method (e.g., clampLimitsSequentially or applyPolicyClamps) that accepts
the inputs used here—policyActivation, lineLimit, appPolicy, payload (lineId,
appId), whitelistBypass, poolType, usageDate, targetMonth, nowEpochSecond,
currentAmount, normalizedRequestedBytes—and returns the resulting allowedBytes
and final TrafficLuaStatus; inside it call applyDailyLimitIfEnabled,
applySharedMonthlyLimitIfEnabled, applyAppDailyLimitIfEnabled and
applyAppSpeedLimitIfEnabled in the same order, check for <=0 to return
buildResult early, and update finalStatus when a clamp reduces the value;
replace both duplicated blocks (this block and the block at lines 170-236) with
calls to the new helper to keep logic centralized and avoid drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0729239e-3491-4ebf-b477-e2b6ad35e1e2
📒 Files selected for processing (4)
src/main/java/com/pooli/traffic/service/decision/TrafficDbDeductFallbackService.javasrc/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.javasrc/main/java/com/pooli/traffic/service/runtime/TrafficInFlightDedupeService.javasrc/test/java/com/pooli/traffic/service/runtime/TrafficInFlightDedupeServiceTest.java
Misu0616
left a comment
There was a problem hiding this comment.
https://www.notion.so/yerin1412/325389b3e03980f1b104c1640a77230e?source=copy_link
다시 한 번 돌려봤는데 나온 거 있어서 일단 공유만 해봅니다,,
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/test/java/com/pooli/traffic/service/outbox/RedisOutboxRetrySchedulerTest.java (1)
54-84: LGTM!The test thoroughly validates the key behavior: when
clearIdempotencythrows after a successful REFILL, the outbox record remains in success state without triggering retry increments or compensation. The verifications are comprehensive and target the exact contract ofclearIdempotencyBestEffort.Consider adding edge case tests for completeness (e.g., null payload, blank UUID) to ensure the defensive guards in
clearIdempotencyBestEffortare exercised. This is optional since the current test covers the primary failure scenario.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/java/com/pooli/traffic/service/outbox/RedisOutboxRetrySchedulerTest.java` around lines 54 - 84, Add optional edge-case tests to exercise clearIdempotencyBestEffort: create tests in RedisOutboxRetrySchedulerTest that simulate (1) a null payload returned from redisOutboxRecordService.readPayload(record, RefillOutboxPayload.class) and (2) a payload with a blank or null uuid, then call redisOutboxRetryScheduler.runRetryCycle() and assert that clearIdempotency is not invoked and the record is marked success (verify redisOutboxRecordService.markSuccess(...) and verify trafficRefillOutboxSupportService.clearIdempotency never()/notCalled). Use the same setup pattern (lockRetryCandidatesAndMarkProcessing, outboxRetryStrategyRegistry.get, outboxEventRetryStrategy.execute returning SUCCESS) and reference RedisOutboxRetryScheduler.runRetryCycle, RedisOutboxRecord, RefillOutboxPayload, redisOutboxRecordService.readPayload, and trafficRefillOutboxSupportService.clearIdempotency to locate the code to change.src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java (1)
682-760: Consider handling DB fallback exceptions explicitly.If
trafficDbDeductFallbackService.deduct()throws an exception (e.g.,DataAccessException), it propagates uncaught. Based on the relevant code snippet, the DB fallback service'sdeduct()method has no internal exception handling. This could leave the caller unable to distinguish Redis failures from DB failures during fallback.Consider wrapping the DB fallback call in a try-catch to:
- Log a distinct metric/error for DB fallback failures
- Return a graceful degradation result (e.g.,
NO_BALANCEwith answer=0)♻️ Proposed fix
// 최종적으로 DB 서비스를 사용하여 차감 결과를 반환합니다. - return trafficDbDeductFallbackService.deduct(poolType, payload, requestedDataBytes, fallbackContext); + try { + return trafficDbDeductFallbackService.deduct(poolType, payload, requestedDataBytes, fallbackContext); + } catch (RuntimeException dbException) { + log.error( + "traffic_deduct_db_fallback_failed traceId={} poolType={} requestedData={}", + resolveTraceId(fallbackContext, payload), + poolType, + requestedDataBytes, + dbException + ); + trafficDeductFallbackMetrics.incrementDbFallback(poolType.name(), "db_error"); + return TrafficLuaExecutionResult.builder() + .answer(0L) + .status(TrafficLuaStatus.ERROR) + .build(); + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java` around lines 682 - 760, The DB fallback call in executeDeduct (trafficDbDeductFallbackService.deduct(...)) needs explicit exception handling: wrap the final call that returns the DB fallback result in a try-catch for RuntimeException/DataAccessException, increment/log a distinct metric via trafficDeductFallbackMetrics (e.g., "db_fallback_failure"), record/log the traceId and exception, and return a safe TrafficLuaExecutionResult.builder().answer(0L).status(TrafficLuaStatus.NO_BALANCE).build() on failure so callers get a graceful degradation instead of an uncaught exception.src/test/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterServiceTest.java (1)
754-789: Consider adding metric verification for completeness.The test correctly verifies:
- 3 Redis retry marks via
markRedisRetry- DB fallback activation via
markDbFallback- Context state
isRedisFallbackActivated()- DB fallback service invocation
Consider also verifying the
trafficDeductFallbackMetricsinteractions to ensure metrics are recorded for observability:🔧 Optional metric verification
verify(trafficInFlightDedupeService).markRedisRetry(payload.getTraceId(), 3); verify(trafficInFlightDedupeService).markDbFallback(payload.getTraceId()); verify(trafficDbDeductFallbackService).deduct(TrafficPoolType.INDIVIDUAL, payload, 100L, context); + verify(trafficDeductFallbackMetrics).incrementRedisRetry(eq("INDIVIDUAL"), eq(1), anyString()); + verify(trafficDeductFallbackMetrics).incrementRedisRetry(eq("INDIVIDUAL"), eq(2), anyString()); + verify(trafficDeductFallbackMetrics).incrementRedisRetry(eq("INDIVIDUAL"), eq(3), anyString()); + verify(trafficDeductFallbackMetrics).incrementDbFallback("INDIVIDUAL", "redis_retry_exhausted"); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/test/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterServiceTest.java` around lines 754 - 789, Add verification of the fallback metrics in the switchesToDbFallbackAfterRedisRetryExhausted test: after the existing assertions and verify calls, add verify(...) checks against the trafficDeductFallbackMetrics mock to assert that the Redis retry metric was incremented three times (e.g., verify(trafficDeductFallbackMetrics, times(3)).onRedisRetry(payload.getTraceId()) or the actual retry method name) and that the DB fallback metric was recorded once (e.g., verify(trafficDeductFallbackMetrics).onDbFallback(payload.getTraceId()) or the actual method). Use the exact metric method names on trafficDeductFallbackMetrics to match the production API and keep the existing verify order.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.java`:
- Around line 682-760: The DB fallback call in executeDeduct
(trafficDbDeductFallbackService.deduct(...)) needs explicit exception handling:
wrap the final call that returns the DB fallback result in a try-catch for
RuntimeException/DataAccessException, increment/log a distinct metric via
trafficDeductFallbackMetrics (e.g., "db_fallback_failure"), record/log the
traceId and exception, and return a safe
TrafficLuaExecutionResult.builder().answer(0L).status(TrafficLuaStatus.NO_BALANCE).build()
on failure so callers get a graceful degradation instead of an uncaught
exception.
In
`@src/test/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterServiceTest.java`:
- Around line 754-789: Add verification of the fallback metrics in the
switchesToDbFallbackAfterRedisRetryExhausted test: after the existing assertions
and verify calls, add verify(...) checks against the
trafficDeductFallbackMetrics mock to assert that the Redis retry metric was
incremented three times (e.g., verify(trafficDeductFallbackMetrics,
times(3)).onRedisRetry(payload.getTraceId()) or the actual retry method name)
and that the DB fallback metric was recorded once (e.g.,
verify(trafficDeductFallbackMetrics).onDbFallback(payload.getTraceId()) or the
actual method). Use the exact metric method names on
trafficDeductFallbackMetrics to match the production API and keep the existing
verify order.
In
`@src/test/java/com/pooli/traffic/service/outbox/RedisOutboxRetrySchedulerTest.java`:
- Around line 54-84: Add optional edge-case tests to exercise
clearIdempotencyBestEffort: create tests in RedisOutboxRetrySchedulerTest that
simulate (1) a null payload returned from
redisOutboxRecordService.readPayload(record, RefillOutboxPayload.class) and (2)
a payload with a blank or null uuid, then call
redisOutboxRetryScheduler.runRetryCycle() and assert that clearIdempotency is
not invoked and the record is marked success (verify
redisOutboxRecordService.markSuccess(...) and verify
trafficRefillOutboxSupportService.clearIdempotency never()/notCalled). Use the
same setup pattern (lockRetryCandidatesAndMarkProcessing,
outboxRetryStrategyRegistry.get, outboxEventRetryStrategy.execute returning
SUCCESS) and reference RedisOutboxRetryScheduler.runRetryCycle,
RedisOutboxRecord, RefillOutboxPayload, redisOutboxRecordService.readPayload,
and trafficRefillOutboxSupportService.clearIdempotency to locate the code to
change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7d9d2448-e01c-4123-8940-8ce66e4401b3
📒 Files selected for processing (4)
src/main/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterService.javasrc/main/java/com/pooli/traffic/service/outbox/RedisOutboxRetryScheduler.javasrc/test/java/com/pooli/traffic/service/decision/TrafficHydrateRefillAdapterServiceTest.javasrc/test/java/com/pooli/traffic/service/outbox/RedisOutboxRetrySchedulerTest.java
개요
데이터 차감(deduct) 경로에 Redis 장애 대응을 추가했습니다.
핵심은 Redis Lua 차감 실패 시
3회 인라인 재시도(50/100/200ms)후, 같은 traceId 요청 범위에서만 DB fallback으로 전환하는 것입니다.fallback 시 정책 판정 순서는 Lua와 동일하게 유지했고, DB 차감 성공분은 기존 사용량 집계 테이블에 반영한 뒤 Redis usage delta를 저장하여 복구 스케줄러가 멱등 replay 하도록 구성했습니다.
리필 claim/outbox/revert 및 refill outbox 스케줄러 전략 동작은 변경하지 않았습니다.
주요 변경:
CLAIMED,REDIS_RETRY_1~3,DB_FALLBACK,DONE) 기록DAILY_TOTAL_DATA,DAILY_APP_TOTAL_DATA,FAMILY_SHARED_USAGE_DAILY)TRAFFIC_DB_SPEED_BUCKET추가 및 AppSpeed(3초 윈도우) DB 판정TRAFFIC_REDIS_USAGE_DELTA추가 및 replay 스케줄러(PENDING -> SUCCESS/FAIL)관련 BackLog
Resolves: (Backlog Number, ...)
PR 유형
PR Checklist
시퀀스 다이어그램 (현재 로직 기준)
1) 차감 요청 처리 (정상/장애 공통 흐름)
sequenceDiagram participant C as ConsumerRunner participant D as InFlightDedupe participant O as DeductOrchestrator participant H as HydrateRefillAdapter participant R as Redis(Lua) participant F as DbDeductFallbackService participant U as UsageDeltaRecordService participant DB as MySQL C->>D: tryClaim(traceId) = CLAIMED (TTL 180s) C->>O: orchestrate(payload) O->>H: executeIndividualWithRecovery(..., context) loop Redis deduct retry (max 3) H->>R: executeDeductLua alt success R-->>H: TrafficLuaExecutionResult else connection/timeout H->>D: mark REDIS_RETRY_n end end alt Redis success H-->>O: Lua result else retry exhausted H->>D: mark DB_FALLBACK H->>F: deduct(poolType, payload, requestedBytes, context) F->>DB: policy evaluate + remaining deduct + usage upsert F->>U: record usage delta (PENDING) F-->>H: TrafficLuaExecutionResult H-->>O: fallback result end O-->>C: final deduct result C->>D: release(traceId) = DONE (TTL 180s)2) Redis usage delta 복구 흐름
sequenceDiagram participant S as TrafficUsageDeltaReplayScheduler participant U as UsageDeltaRecordService participant DB as MySQL(TRAFFIC_REDIS_USAGE_DELTA) participant R as Redis participant M as Metrics S->>U: lockReplayCandidatesAndMarkProcessing(batch) U->>DB: PENDING/FAIL/PROCESSING(stuck) 조회 + PROCESSING 선점 U-->>S: candidates loop each candidate S->>R: Lua replay (idempotency key + daily/app/monthly usage 반영) alt success S->>U: markSuccess(id) U->>DB: status=SUCCESS S->>M: replay_result=success else fail S->>U: markFailWithRetryIncrement(id, error) U->>DB: status=FAIL, retry_count+1 S->>M: replay_result=fail end end S->>U: countBacklog() S->>M: replay_backlog gauge update테스트
통과:
참고:
Summary by CodeRabbit
New Features
Improvements
Infrastructure
Tests