Cap dynamic timeout at 120s, replace unit tests with integration tests#7945
Vagoasdf merged 1 commit into `fix-3459-rate-limit`
Conversation
The uncapped dynamic timeout (max period factor + 5s) would produce an 86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures both minute and day limits, so a breached 500/day limit would leave a Celery worker sleeping for up to 24 hours. Capping at 120s preserves the MINUTE fix (65s < 120s) while failing fast for HOUR/DAY breaches.

Replaced the unit tests that mocked all internals (`get_cache`, `increment_usage`, `decrement_usage`, `time.time`, `time.sleep`) with integration tests using real Redis and freezegun for time control. The previous tests verified mock return values rather than actual behavior, and relied on a fragile hand-crafted `time.time` side_effect list that would break on any loop change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Vercel for GitHub: 2 Skipped Deployments
Code Review
Summary: This is a focused, correct fix. The one-line change — wrapping the dynamic timeout calculation in min(..., 120) — prevents Celery workers from sleeping for hours (or a full day) when a DAY- or HOUR-period rate limit is breached. The updated docstring and new integration tests explain the motivation clearly.
What's good
- The fix is minimal and does exactly what it says. For any connector that mixes short and long-period limits (e.g. SurveyMonkey's minute + day config), the worker will now fail fast with `RateLimiterTimeoutException` after 120s rather than parking for up to 24h.
- Replacing the heavily-mocked `TestLimitDynamicTimeout`/`TestLimitSleepsToBucketBoundary` unit tests with `freezegun`-based integration tests that hit real Redis is an improvement: the tests exercise the actual Redis pipeline path and are harder to accidentally render vacuous.
- The interaction between the cap and the existing `min(sleep_seconds + 0.05, max(remaining, 0))` sleep logic is correct: an 86,400s sleep request gets clamped to the remaining ~120s, after which the while-loop condition fires and the exception is raised.
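To make that cap-and-clamp interaction concrete, here is a minimal standalone simulation of the loop described above (illustrative names only, not the fides source):

```python
# Sketch: how the 120s timeout cap interacts with the per-iteration
# sleep clamp. `simulate_breach` is a hypothetical stand-in for the
# rate limiter's wait loop, not the actual implementation.
def simulate_breach(timeout_seconds: float, requested_sleep: float) -> float:
    """Return total simulated sleep before the timeout fires."""
    elapsed = 0.0
    total_slept = 0.0
    while elapsed < timeout_seconds:
        remaining = timeout_seconds - elapsed
        # Existing clamp logic: never sleep past the deadline.
        sleep_for = min(requested_sleep + 0.05, max(remaining, 0))
        total_slept += sleep_for
        elapsed += sleep_for
    return total_slept


# A DAY-period breach asks for an 86,400s sleep, but with the capped
# timeout the single sleep is clamped to the remaining ~120s budget.
print(simulate_breach(min(86400 + 5, 120), 86400))  # 120.0
```

After the clamped sleep exhausts the budget, the while-loop condition fails and the real code raises `RateLimiterTimeoutException`.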
Findings
Three minor issues flagged inline:

- Magic number (`rate_limiter.py:145-148`): `120` is used directly; extracting it to a class constant (`MAX_DEFAULT_TIMEOUT_SECONDS`) would aid discoverability and consistency with the existing `EXPIRE_AFTER_PERIOD_SECONDS` constant.
- Weak assertion in `test_dynamic_timeout_capped_for_day_limits` (`test_rate_limiter.py:300`): `assert sleep_total[0] < 130` has no lower bound, so the test would pass silently if the second `limit()` call never breached the bucket. Adding a lower bound (e.g. `assert 110 <= sleep_total[0] < 130`) would make the assertion self-verifying.
- Misleading regression comment (`test_rate_limiter.py:227-228`): The docstring attributes the historical bug to a "30s default", but for a MINUTE request the pre-PR code already computed 65s dynamically. The comment refers to an earlier version and will confuse readers who look at the git history for context.

All three are nits; none affect correctness.
```python
timeout_seconds = min(
    max(r.period.factor for r in requests) + 5 if requests else 30,
    120,
)
```
src/fides/api/service/connectors/limiter/rate_limiter.py:145-148
The 120 magic number is used directly here. Consider extracting it as a class constant (e.g. MAX_DEFAULT_TIMEOUT_SECONDS = 120) alongside the existing EXPIRE_AFTER_PERIOD_SECONDS. This makes it easier to discover and adjust the cap without hunting through the method body, and ties the value to a self-documenting name.
```python
MAX_DEFAULT_TIMEOUT_SECONDS: int = 120
...
timeout_seconds = min(
    max(r.period.factor for r in requests) + 5 if requests else 30,
    self.MAX_DEFAULT_TIMEOUT_SECONDS,
)
```
```python
# Total mocked sleep must reflect the 120s cap, not the 86405s
# uncapped value.
assert sleep_total[0] < 130
```
tests/ops/integration_tests/limiter/test_rate_limiter.py:300
The assertion only checks an upper bound. Since the intended behaviour is that the worker sleeps for the full 120 s timeout before raising, adding a lower bound would confirm the cap path was actually exercised rather than the call somehow succeeding early:
```python
# Slept up to the 120s cap, not the uncapped 86,405s
assert 110 <= sleep_total[0] < 130
```

Without a lower bound, this assertion would pass even if the second `limit()` call returned immediately (0s sleep), masking a regression where the breach was never triggered.
```
Regression: the old default timeout_seconds=30 was shorter than the MINUTE
bucket period (60s). When a breach occurred more than 30s before the next
```
tests/ops/integration_tests/limiter/test_rate_limiter.py:227-228
The docstring says "the old default timeout_seconds=30 was shorter than the MINUTE bucket period". Looking at the code immediately before this PR, the default for a MINUTE request was already 60 + 5 = 65 s (dynamic), not 30 s — the 30 fallback only applied to the empty-requests case. A reader arriving at this test without the full git history will be confused about where the 30 s came from.
Consider clarifying that this is a guard against a regression from an earlier (pre-dynamic-timeout) version, or just drop the historical backstory and describe what the test actually verifies today:
> Verify that a MINUTE-period breach waits for the bucket to roll over rather than timing out, even when the next boundary is more than half the timeout away.
Codecov Report: ✅ All modified and coverable lines are covered by tests.

```
@@              Coverage Diff               @@
##           fix-3459-rate-limit    #7945      +/-   ##
=======================================================
- Coverage    85.08%    85.05%    -0.04%
=======================================================
  Files          629       629
  Lines        40868     40867        -1
  Branches      4749      4749
=======================================================
- Hits         34774     34760       -14
- Misses        5023      5032        +9
- Partials      1071      1075        +4
=======================================================
```

View full report in Codecov by Sentry.
Summary
Review fixes for the rate limiter changes in #7938:

**Cap dynamic timeout at 120s.** The uncapped `max(period.factor) + 5` produces an 86,405s (~24h) timeout for DAY-period limits. SurveyMonkey configures both `minute` and `day` limits (`rate: 120/min`, `rate: 500/day`), so a breached 500/day limit would leave a Celery worker sleeping for up to 24 hours with no error surfaced. The 120s cap keeps the MINUTE fix intact (65s < 120s) while ensuring HOUR/DAY breaches fail fast.

**Replace unit tests with integration tests.** The unit tests (`TestLimitDynamicTimeout`, `TestLimitSleepsToBucketBoundary`) mocked `get_cache`, `increment_usage`, `decrement_usage`, `time.time`, AND `time.sleep`. With everything mocked, they tested that mocks returned what they were told to return rather than actual rate limiter behavior. `test_sleep_duration_targets_next_minute_bucket` was particularly fragile, relying on a hand-crafted 6-entry `time.time` side_effect list that would break if anyone added a single `time.time()` call anywhere in the loop. Replaced with two integration tests that use real Redis for bucket state and `freezegun` for time control (consistent with the rest of the test suite):

- `test_minute_period_breach_waits_for_rollover` - Core regression test. Proves a MINUTE-period breach waits for the bucket to roll instead of timing out. Fails on `main`, passes with the fix from Fix 3459 rate limit #7938.
- `test_dynamic_timeout_capped_for_day_limits` - Proves the 120s cap. Mixed MINUTE+DAY limits time out within ~120s, not 86,405s.

Kept `TestSecondsUntilNextBucket` since those are pure arithmetic tests with no mocking needed.

Test plan

- `test_minute_period_breach_waits_for_rollover` passes (verifies the MINUTE-period fix from Fix 3459 rate limit #7938)
- `test_dynamic_timeout_capped_for_day_limits` passes (verifies the 120s cap)
- `TestSecondsUntilNextBucket` unit tests still pass

🤖 Generated with Claude Code