Skip to content

Cap dynamic timeout at 120s, replace unit tests with integration tests#7945

Merged
Vagoasdf merged 1 commit intofix-3459-rate-limitfrom
fix-3459-rate-limit-review
Apr 16, 2026
Merged

Cap dynamic timeout at 120s, replace unit tests with integration tests#7945
Vagoasdf merged 1 commit intofix-3459-rate-limitfrom
fix-3459-rate-limit-review

Conversation

@galvana
Copy link
Copy Markdown
Contributor

@galvana galvana commented Apr 16, 2026

Summary

Review fixes for the rate limiter changes in #7938:

  • Cap dynamic timeout at 120s. The uncapped max(period.factor) + 5 produces an 86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures both minute and day limits (rate: 120/min, rate: 500/day), so a breached 500/day limit would leave a Celery worker sleeping for up to 24 hours with no error surfaced. The 120s cap keeps the MINUTE fix intact (65s < 120s) while ensuring HOUR/DAY breaches fail fast.

  • Replace unit tests with integration tests. The unit tests (TestLimitDynamicTimeout, TestLimitSleepsToBucketBoundary) mocked get_cache, increment_usage, decrement_usage, time.time, AND time.sleep. With everything mocked, they tested that mocks returned what they were told to return rather than actual rate limiter behavior. test_sleep_duration_targets_next_minute_bucket was particularly fragile, relying on a hand-crafted 6-entry time.time side_effect list that would break if anyone added a single time.time() call anywhere in the loop. Replaced with two integration tests that use real Redis for bucket state and freezegun for time control (consistent with the rest of the test suite):

    • test_minute_period_breach_waits_for_rollover - Core regression test. Proves a MINUTE-period breach waits for the bucket to roll instead of timing out. Fails on main, passes with the fix from Fix 3459 rate limit #7938.
    • test_dynamic_timeout_capped_for_day_limits - Proves the 120s cap. Mixed MINUTE+DAY limits timeout within ~120s, not 86,405s.

    Kept TestSecondsUntilNextBucket since those are pure arithmetic tests with no mocking needed.

Test plan

  • test_minute_period_breach_waits_for_rollover passes (verifies the MINUTE-period fix from Fix 3459 rate limit #7938)
  • test_dynamic_timeout_capped_for_day_limits passes (verifies the 120s cap)
  • Existing rate limiter integration tests still pass
  • TestSecondsUntilNextBucket unit tests still pass

🤖 Generated with Claude Code

…ests

The uncapped dynamic timeout (max period factor + 5s) would produce a
86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures
both minute and day limits, so a breached 500/day limit would leave a
Celery worker sleeping for up to 24 hours. Capping at 120s preserves
the MINUTE fix (65s < 120s) while failing fast for HOUR/DAY breaches.

Replaced the unit tests that mocked all internals (get_cache,
increment_usage, decrement_usage, time.time, time.sleep) with
integration tests using real Redis and freezegun for time control.
The previous tests verified mock return values rather than actual
behavior, and relied on a fragile hand-crafted time.time side_effect
list that would break on any loop change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@galvana galvana requested a review from a team as a code owner April 16, 2026 16:55
@galvana galvana requested review from thabofletcher and removed request for a team April 16, 2026 16:55
@vercel
Copy link
Copy Markdown
Contributor

vercel bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Apr 16, 2026 4:55pm
fides-privacy-center Ignored Ignored Apr 16, 2026 4:55pm

Request Review

@galvana galvana requested review from Vagoasdf and removed request for thabofletcher April 16, 2026 16:55
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Summary: This is a focused, correct fix. The one-line change — wrapping the dynamic timeout calculation in min(..., 120) — prevents Celery workers from sleeping for hours (or a full day) when a DAY- or HOUR-period rate limit is breached. The updated docstring and new integration tests explain the motivation clearly.

What's good

  • The fix is minimal and does exactly what it says. For any connector that mixes short and long-period limits (e.g. SurveyMonkey's minute + day config), the worker will now fail fast with RateLimiterTimeoutException after 120 s rather than parking for up to 24 h.
  • Replacing the heavily-mocked TestLimitDynamicTimeout / TestLimitSleepsToBucketBoundary unit tests with freezegun-based integration tests that hit real Redis is an improvement: the tests exercise the actual Redis pipeline path and are harder to accidentally render vacuous.
  • The interaction between the cap and the existing min(sleep_seconds + 0.05, max(remaining, 0)) sleep logic is correct: a 86 400 s sleep request gets clamped to the remaining ~120 s, after which the while-loop condition fires and the exception is raised.

Findings

Three minor issues flagged inline:

  1. Magic number (rate_limiter.py:145–148): 120 is used directly; extracting it to a class constant (MAX_DEFAULT_TIMEOUT_SECONDS) would aid discoverability and consistency with the existing EXPIRE_AFTER_PERIOD_SECONDS constant.

  2. Weak assertion in test_dynamic_timeout_capped_for_day_limits (test_rate_limiter.py:300): assert sleep_total[0] < 130 has no lower bound, so the test would pass silently if the second limit() call never breached the bucket. Adding a lower bound (e.g. assert 110 <= sleep_total[0] < 130) would make the assertion self-verifying.

  3. Misleading regression comment (test_rate_limiter.py:227–228): The docstring attributes the historical bug to a "30 s default", but for a MINUTE request the pre-PR code already computed 65 s dynamically. The comment refers to an earlier version and will confuse readers who look at the git history for context.

All three are nits; none affect correctness.

🔬 Codegraph: connected (46795 nodes)


💡 Write /code-review in a comment to re-run this review.

Comment on lines +145 to 148
timeout_seconds = min(
max(r.period.factor for r in requests) + 5 if requests else 30,
120,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/fides/api/service/connectors/limiter/rate_limiter.py:145-148

The 120 magic number is used directly here. Consider extracting it as a class constant (e.g. MAX_DEFAULT_TIMEOUT_SECONDS = 120) alongside the existing EXPIRE_AFTER_PERIOD_SECONDS. This makes it easier to discover and adjust the cap without hunting through the method body, and ties the value to a self-documenting name.

MAX_DEFAULT_TIMEOUT_SECONDS: int = 120

...

timeout_seconds = min(
    max(r.period.factor for r in requests) + 5 if requests else 30,
    self.MAX_DEFAULT_TIMEOUT_SECONDS,
)


# Total mocked sleep must reflect the 120s cap, not the 86405s
# uncapped value.
assert sleep_total[0] < 130
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/ops/integration_tests/limiter/test_rate_limiter.py:300

The assertion only checks an upper bound. Since the intended behaviour is that the worker sleeps for the full 120 s timeout before raising, adding a lower bound would confirm the cap path was actually exercised rather than the call somehow succeeding early:

# Slept up to the 120 s cap, not the uncapped 86 405 s
assert 110 <= sleep_total[0] < 130

Without a lower bound, this assertion would pass even if the second limit() call returned immediately (0 s sleep), masking a regression where the breach was never triggered.

Comment on lines +227 to +228
Regression: the old default timeout_seconds=30 was shorter than the MINUTE
bucket period (60s). When a breach occurred more than 30s before the next
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests/ops/integration_tests/limiter/test_rate_limiter.py:227-228

The docstring says "the old default timeout_seconds=30 was shorter than the MINUTE bucket period". Looking at the code immediately before this PR, the default for a MINUTE request was already 60 + 5 = 65 s (dynamic), not 30 s — the 30 fallback only applied to the empty-requests case. A reader arriving at this test without the full git history will be confused about where the 30 s came from.

Consider clarifying that this is a guard against a regression from an earlier (pre-dynamic-timeout) version, or just drop the historical backstory and describe what the test actually verifies today:

Verify that a MINUTE-period breach waits for the bucket to roll over rather than timing out, even when the next boundary is more than half the timeout away.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.05%. Comparing base (5ec49b2) to head (c3a194f).
⚠️ Report is 1 commits behind head on fix-3459-rate-limit.

Additional details and impacted files
@@                   Coverage Diff                   @@
##           fix-3459-rate-limit    #7945      +/-   ##
=======================================================
- Coverage                85.08%   85.05%   -0.04%     
=======================================================
  Files                      629      629              
  Lines                    40868    40867       -1     
  Branches                  4749     4749              
=======================================================
- Hits                     34774    34760      -14     
- Misses                    5023     5032       +9     
- Partials                  1071     1075       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Vagoasdf Vagoasdf merged commit ff0af8f into fix-3459-rate-limit Apr 16, 2026
64 of 66 checks passed
@Vagoasdf Vagoasdf deleted the fix-3459-rate-limit-review branch April 16, 2026 18:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants