Cap dynamic timeout at 120s, replace unit tests with integration tests#7945
Vagoasdf merged 1 commit into `fix-3459-rate-limit`
Conversation
The uncapped dynamic timeout (max period factor + 5s) would produce an 86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures both minute and day limits, so a breached 500/day limit would leave a Celery worker sleeping for up to 24 hours. Capping at 120s preserves the MINUTE fix (65s < 120s) while failing fast for HOUR/DAY breaches.

Replaced the unit tests that mocked all internals (`get_cache`, `increment_usage`, `decrement_usage`, `time.time`, `time.sleep`) with integration tests using real Redis and freezegun for time control. The previous tests verified mock return values rather than actual behavior, and relied on a fragile hand-crafted `time.time` side_effect list that would break on any loop change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vercel for GitHub: 2 Skipped Deployments
Code Review
Summary: This is a focused, correct fix. The one-line change — wrapping the dynamic timeout calculation in min(..., 120) — prevents Celery workers from sleeping for hours (or a full day) when a DAY- or HOUR-period rate limit is breached. The updated docstring and new integration tests explain the motivation clearly.
What's good
- The fix is minimal and does exactly what it says. For any connector that mixes short and long-period limits (e.g. SurveyMonkey's minute + day config), the worker will now fail fast with `RateLimiterTimeoutException` after 120s rather than parking for up to 24h.
- Replacing the heavily-mocked `TestLimitDynamicTimeout`/`TestLimitSleepsToBucketBoundary` unit tests with `freezegun`-based integration tests that hit real Redis is an improvement: the tests exercise the actual Redis pipeline path and are harder to accidentally render vacuous.
- The interaction between the cap and the existing `min(sleep_seconds + 0.05, max(remaining, 0))` sleep logic is correct: an 86,400s sleep request gets clamped to the remaining ~120s, after which the while-loop condition fires and the exception is raised.
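To make that cap-and-clamp interaction concrete, here is a minimal standalone simulation of the loop described above (illustrative names only, not the fides source):

```python
# Sketch: how the 120s timeout cap interacts with the per-iteration
# sleep clamp. `simulate_breach` is a hypothetical stand-in for the
# rate limiter's wait loop, not the actual implementation.
def simulate_breach(timeout_seconds: float, requested_sleep: float) -> float:
    """Return total simulated sleep before the timeout fires."""
    elapsed = 0.0
    total_slept = 0.0
    while elapsed < timeout_seconds:
        remaining = timeout_seconds - elapsed
        # Existing clamp logic: never sleep past the deadline.
        sleep_for = min(requested_sleep + 0.05, max(remaining, 0))
        total_slept += sleep_for
        elapsed += sleep_for
    return total_slept


# A DAY-period breach asks for an 86,400s sleep, but with the capped
# timeout the single sleep is clamped to the remaining ~120s budget.
print(simulate_breach(min(86400 + 5, 120), 86400))  # 120.0
```

After the clamped sleep exhausts the budget, the while-loop condition fails and the real code raises `RateLimiterTimeoutException`.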
Findings
Three minor issues flagged inline:

- Magic number (`rate_limiter.py:145-148`): `120` is used directly; extracting it to a class constant (`MAX_DEFAULT_TIMEOUT_SECONDS`) would aid discoverability and consistency with the existing `EXPIRE_AFTER_PERIOD_SECONDS` constant.
- Weak assertion in `test_dynamic_timeout_capped_for_day_limits` (`test_rate_limiter.py:300`): `assert sleep_total[0] < 130` has no lower bound, so the test would pass silently if the second `limit()` call never breached the bucket. Adding a lower bound (e.g. `assert 110 <= sleep_total[0] < 130`) would make the assertion self-verifying.
- Misleading regression comment (`test_rate_limiter.py:227-228`): The docstring attributes the historical bug to a "30s default", but for a MINUTE request the pre-PR code already computed 65s dynamically. The comment refers to an earlier version and will confuse readers who look at the git history for context.

All three are nits; none affect correctness.
```python
timeout_seconds = min(
    max(r.period.factor for r in requests) + 5 if requests else 30,
    120,
)
```
src/fides/api/service/connectors/limiter/rate_limiter.py:145-148
The 120 magic number is used directly here. Consider extracting it as a class constant (e.g. MAX_DEFAULT_TIMEOUT_SECONDS = 120) alongside the existing EXPIRE_AFTER_PERIOD_SECONDS. This makes it easier to discover and adjust the cap without hunting through the method body, and ties the value to a self-documenting name.
```python
MAX_DEFAULT_TIMEOUT_SECONDS: int = 120
...
timeout_seconds = min(
    max(r.period.factor for r in requests) + 5 if requests else 30,
    self.MAX_DEFAULT_TIMEOUT_SECONDS,
)
```
```python
# Total mocked sleep must reflect the 120s cap, not the 86405s
# uncapped value.
assert sleep_total[0] < 130
```
tests/ops/integration_tests/limiter/test_rate_limiter.py:300
The assertion only checks an upper bound. Since the intended behaviour is that the worker sleeps for the full 120 s timeout before raising, adding a lower bound would confirm the cap path was actually exercised rather than the call somehow succeeding early:
```python
# Slept up to the 120s cap, not the uncapped 86,405s
assert 110 <= sleep_total[0] < 130
```

Without a lower bound, this assertion would pass even if the second `limit()` call returned immediately (0s sleep), masking a regression where the breach was never triggered.
```
Regression: the old default timeout_seconds=30 was shorter than the MINUTE
bucket period (60s). When a breach occurred more than 30s before the next
```
tests/ops/integration_tests/limiter/test_rate_limiter.py:227-228
The docstring says "the old default timeout_seconds=30 was shorter than the MINUTE bucket period". Looking at the code immediately before this PR, the default for a MINUTE request was already 60 + 5 = 65 s (dynamic), not 30 s — the 30 fallback only applied to the empty-requests case. A reader arriving at this test without the full git history will be confused about where the 30 s came from.
Consider clarifying that this is a guard against a regression from an earlier (pre-dynamic-timeout) version, or just drop the historical backstory and describe what the test actually verifies today:
> Verify that a MINUTE-period breach waits for the bucket to roll over rather than timing out, even when the next boundary is more than half the timeout away.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@              Coverage Diff               @@
##           fix-3459-rate-limit    #7945      +/-   ##
=======================================================
- Coverage    85.08%    85.05%    -0.04%
=======================================================
  Files          629       629
  Lines        40868     40867        -1
  Branches      4749      4749
=======================================================
- Hits         34774     34760       -14
- Misses        5023      5032        +9
- Partials      1071      1075        +4
=======================================================
```

View full report in Codecov by Sentry.
Summary
Review fixes for the rate limiter changes in #7938:

**Cap dynamic timeout at 120s.** The uncapped `max(period.factor) + 5` produces an 86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures both `minute` and `day` limits (`rate: 120/min`, `rate: 500/day`), so a breached 500/day limit would leave a Celery worker sleeping for up to 24 hours with no error surfaced. The 120s cap keeps the MINUTE fix intact (65s < 120s) while ensuring HOUR/DAY breaches fail fast.

**Replace unit tests with integration tests.** The unit tests (`TestLimitDynamicTimeout`, `TestLimitSleepsToBucketBoundary`) mocked `get_cache`, `increment_usage`, `decrement_usage`, `time.time`, AND `time.sleep`. With everything mocked, they tested that mocks returned what they were told to return rather than actual rate limiter behavior. `test_sleep_duration_targets_next_minute_bucket` was particularly fragile, relying on a hand-crafted 6-entry `time.time` side_effect list that would break if anyone added a single `time.time()` call anywhere in the loop. Replaced with two integration tests that use real Redis for bucket state and `freezegun` for time control (consistent with the rest of the test suite):

- `test_minute_period_breach_waits_for_rollover` - Core regression test. Proves a MINUTE-period breach waits for the bucket to roll instead of timing out. Fails on `main`, passes with the fix from Fix 3459 rate limit #7938.
- `test_dynamic_timeout_capped_for_day_limits` - Proves the 120s cap. Mixed MINUTE+DAY limits time out within ~120s, not 86,405s.

Kept `TestSecondsUntilNextBucket` since those are pure arithmetic tests with no mocking needed.

Test plan

- `test_minute_period_breach_waits_for_rollover` passes (verifies the MINUTE-period fix from Fix 3459 rate limit #7938)
- `test_dynamic_timeout_capped_for_day_limits` passes (verifies the 120s cap)
- `TestSecondsUntilNextBucket` unit tests still pass

🤖 Generated with Claude Code