
Conversation

@drborges commented Nov 28, 2025

Summary

Add support for a Stale-While-Revalidate (SWR) prompt cache strategy, based on the design document: docs/future-enhancements/STALE_WHILE_REVALIDATE_DESIGN.md

The GIF below shows the new feature in action against a self-hosted production instance of Langfuse using the following config:

Langfuse.configure do |config|
  config.public_key = ENV["LANGFUSE_PUB_KEY"]
  config.secret_key = ENV["LANGFUSE_SECRET_KEY"]
  config.base_url = ENV["LANGFUSE_BASE_URL"]

  # Required: Use Rails cache backend
  config.cache_backend = :rails
  config.cache_ttl = 2

  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 2
  config.cache_refresh_threads = 5
end

(GIF demo omitted; recorded 2025-11-28 14:23:40)

📊 Impact

  • Performance: Enables instant responses for 99% of requests by serving stale data while refreshing in the background
  • Backward Compatibility: Fully backward compatible - SWR is opt-in via configuration (requires :rails backend)
  • Dependencies: Requires concurrent-ruby gem for thread pool management

Usage

Configure the client:

Langfuse.configure do |config|
  config.public_key = ENV['LANGFUSE_PUBLIC_KEY']
  config.secret_key = ENV['LANGFUSE_SECRET_KEY']
  
  # Required: Use Rails cache backend
  config.cache_backend = :rails
  config.cache_ttl = 300 # Fresh for 5 minutes
  
  # Enable SWR
  config.cache_stale_while_revalidate = true
  config.cache_stale_ttl = 300 # Grace period: 5 more minutes
  config.cache_refresh_threads = 5 # Background thread pool size
end

Once configured, SWR works transparently:

client = Langfuse.client

# First request - populates cache
prompt = client.get_prompt("greeting") # ~100ms (API call)

# Subsequent requests while fresh
prompt = client.get_prompt("greeting") # ~1ms (cache hit)

# After cache_ttl expires but within grace period
prompt = client.get_prompt("greeting") # ~1ms (stale data + background refresh)

# Background refresh completes, next request gets fresh data
prompt = client.get_prompt("greeting") # ~1ms (fresh cache)

More details can be found in the proposed documentation.

@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 979705c to ae8746b Compare November 28, 2025 14:55
@drborges drborges marked this pull request as draft November 28, 2025 17:13
@drborges (Author)

@NoahFisher I still have to address some RuboCop offenses and do one final review of the test coverage, but I could use some feedback on this draft. I also have one open question to double-check in the PR description.

@drborges drborges force-pushed the feature/stale-while-revalidate branch from da9e215 to d798182 Compare November 28, 2025 18:00
@drborges drborges force-pushed the feature/stale-while-revalidate branch 3 times, most recently from 43fcf9b to 7ff496b Compare December 3, 2025 20:03
Add three new configuration options to support SWR caching:
- cache_stale_while_revalidate: Enable/disable SWR (default: false)
- cache_stale_ttl: Grace period for serving stale data (default: 300s)
- cache_refresh_threads: Background thread pool size (default: 5)

SWR caching requires Rails cache backend and will be validated
during configuration. This enables serving stale data instantly
while refreshing in the background for better performance.
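The backend requirement described above could be enforced at configure time. A minimal sketch of that validation, assuming a simple `Config` struct and an `ArgumentError` (the gem's actual config class and error type may differ):

```ruby
# Hypothetical stand-in for the gem's configuration object.
Config = Struct.new(
  :cache_backend,
  :cache_stale_while_revalidate,
  :cache_stale_ttl,
  :cache_refresh_threads,
  keyword_init: true
)

# Validate that SWR is only enabled together with the :rails backend.
# Does nothing when SWR is disabled, so existing configs keep working.
def validate_swr!(config)
  return unless config.cache_stale_while_revalidate

  unless config.cache_backend == :rails
    raise ArgumentError,
          "SWR caching requires cache_backend = :rails " \
          "(got #{config.cache_backend.inspect})"
  end
end
```

Validating eagerly at configure time surfaces the misconfiguration at boot rather than on the first `get_prompt` call.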
Restructure CacheEntry to support stale-while-revalidate pattern:
- Replace single expires_at with fresh_until and stale_until
- Add fresh? method: entry is fresh and can be served immediately
- Add stale? method: entry is stale but usable (revalidate in background)
- Update expired? method: entry must be revalidated synchronously

This three-state model enables SWR caching where stale entries
can be served instantly while a background refresh occurs.
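The three-state model above can be sketched as follows. The attribute and method names mirror the commit message (`fresh_until`, `stale_until`, `fresh?`, `stale?`, `expired?`), but the class body is illustrative, not the PR's actual implementation:

```ruby
# Sketch of a three-state cache entry for SWR caching.
class CacheEntry
  attr_reader :value, :fresh_until, :stale_until

  def initialize(value, ttl:, stale_ttl: 0)
    now = Time.now
    @value = value
    @fresh_until = now + ttl                 # end of the fresh window
    @stale_until = @fresh_until + stale_ttl  # end of the grace period
  end

  # FRESH: serve immediately, no refresh needed.
  def fresh?(now = Time.now)
    now < fresh_until
  end

  # STALE: serve immediately, but trigger a background refresh.
  def stale?(now = Time.now)
    !fresh?(now) && now < stale_until
  end

  # EXPIRED: past the grace period; must be revalidated synchronously.
  def expired?(now = Time.now)
    now >= stale_until
  end
end
```

With `stale_ttl: 0` the entry goes straight from fresh to expired, which recovers the old two-state `expires_at` behavior.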
Add full SWR caching support with concurrent background refresh:
- fetch_with_stale_while_revalidate: Main SWR fetch method
- Three-state handling: FRESH (instant), STALE (instant + refresh), MISS (sync fetch)
- Concurrent thread pool for background refresh operations
- Stampede protection during background refresh
- Logger integration for debugging cache states
- Graceful shutdown of thread pool

The adapter now accepts stale_ttl and refresh_threads parameters.
When stale_ttl is set, SWR is enabled. When nil, falls back to
standard fetch_with_lock behavior.

This provides instant responses for 99% of requests by serving
stale data while refreshing in the background.
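The FRESH / STALE / MISS handling described above can be condensed into a small sketch. This is illustrative only: the PR uses a concurrent-ruby thread pool against the Rails cache, while this dependency-free sketch uses an in-memory store, a `Mutex`, and one plain `Thread` per refresh, with a `@refreshing` map providing the stampede protection:

```ruby
# Simplified SWR cache: serve fresh hits instantly, serve stale hits
# instantly while refreshing in the background, fetch synchronously on miss.
class SwrCache
  Entry = Struct.new(:value, :fresh_until, :stale_until) do
    def fresh?
      Time.now < fresh_until
    end

    def stale?
      !fresh? && Time.now < stale_until
    end
  end

  def initialize(ttl:, stale_ttl:)
    @ttl = ttl
    @stale_ttl = stale_ttl
    @store = {}
    @refreshing = {}   # keys with an in-flight background refresh
    @mutex = Mutex.new # guards @store and @refreshing
  end

  def fetch(key, &block)
    entry = @mutex.synchronize { @store[key] }
    return entry.value if entry&.fresh?      # FRESH: instant hit

    if entry&.stale?                         # STALE: instant + refresh
      refresh_in_background(key, &block)
      return entry.value
    end

    store(key, block.call)                   # MISS/EXPIRED: sync fetch
  end

  private

  # Only one background refresh per key runs at a time.
  def refresh_in_background(key, &block)
    @mutex.synchronize do
      return if @refreshing[key]

      @refreshing[key] = Thread.new do
        begin
          store(key, block.call)
        ensure
          @mutex.synchronize { @refreshing.delete(key) }
        end
      end
    end
  end

  def store(key, value)
    now = Time.now
    entry = Entry.new(value, now + @ttl, now + @ttl + @stale_ttl)
    @mutex.synchronize { @store[key] = entry }
    value
  end
end
```

The real adapter additionally bounds concurrency with a fixed-size pool (`cache_refresh_threads`) and supports graceful shutdown, which a bare `Thread.new` does not provide.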
Update Client to pass SWR configuration options to RailsCacheAdapter:
- Pass stale_ttl (only when cache_stale_while_revalidate is enabled)
- Pass refresh_threads for background thread pool size
- Pass logger for debugging and error reporting

Extract create_rails_cache_adapter method for clarity.
When SWR is disabled, stale_ttl is nil, keeping standard behavior.

Tests verify correct configuration propagation and thread pool
creation based on SWR settings.
Refactor get_prompt to choose optimal caching strategy based on
available cache capabilities:
1. SWR cache (fetch_with_stale_while_revalidate) - best performance
2. Distributed cache (fetch_with_lock) - stampede protection
3. Simple cache (get/set) - basic in-memory caching
4. No cache - direct API fetch

Extract separate methods for each strategy for better maintainability
and testability. The strategy is selected at runtime based on which
methods the cache adapter responds to.

This enables automatic use of SWR when available without breaking
existing cache implementations.
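The runtime selection described above amounts to duck-typing on the adapter. A sketch, with an illustrative method name (the PR's Client extracts similar per-strategy private methods):

```ruby
# Pick the best caching strategy the adapter supports, in priority order.
def fetch_prompt_with_best_strategy(cache, key, &fetch)
  if cache.respond_to?(:fetch_with_stale_while_revalidate)
    cache.fetch_with_stale_while_revalidate(key, &fetch) # 1. SWR
  elsif cache.respond_to?(:fetch_with_lock)
    cache.fetch_with_lock(key, &fetch)                   # 2. stampede protection
  elsif cache.respond_to?(:get) && cache.respond_to?(:set)
    cached = cache.get(key)                              # 3. simple cache
    return cached if cached

    fetch.call.tap { |value| cache.set(key, value) }
  else
    fetch.call                                           # 4. no cache
  end
end
```

Because the check is `respond_to?`-based, a user-supplied adapter that only implements `get`/`set` keeps working unchanged, and simply upgrading the adapter unlocks the faster strategies.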
This was the pattern already in place; this commit simply expands its scope to some of the newly added spec examples. However, it would be worth discussing whether we need to allow a larger number of memoized helpers or whether these specs need some refactoring.
@drborges drborges force-pushed the feature/stale-while-revalidate branch from 7ff496b to ab7628d Compare December 4, 2025 14:02
Add spec examples to verify that SWR (Stale-While-Revalidate) is correctly
enabled only when stale_ttl is positive, and disabled for zero or negative
values. Adds thread pool initialization coverage for both cache adapters.

All 785 tests pass with 97.36% coverage maintained.
@drborges drborges marked this pull request as ready for review December 4, 2025 18:08
@drborges (Author) commented Dec 4, 2025

@NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.

@drborges (Author) commented Dec 4, 2025

My last commit disables Rubocop's Metrics/ClassLength for the ApiClient class, but it may be worth moving parts of the class into mixin modules to keep the file under the desired length. Thoughts @NoahFisher?

@kxzk (Collaborator) commented Dec 4, 2025

> @NoahFisher, @kxzk I think this is in a good spot for some review. Let me know if there are any concerns or change suggestions you'd like me to address. Thanks.

Will try and review today.


# NOTE: expires_in is accepted for interface compatibility with StaleWhileRevalidate
# but not used here since CacheEntry objects manage their own expiration times
_ = expires_in
@drborges (Author):
I still want to get rid of this. It is a result of a refactoring I did, but I think I can improve it a little further.

# api_client.get_prompt("greeting")
# end
def fetch_with_stale_while_revalidate(key, &)
return fetch_with_lock(key, &) unless swr_enabled?
@drborges (Author):

I ended up adding this fallback, but now I realize it might have been over-engineered; it may be better to raise an error if SWR is disabled and this method is called.

Then I could move fetch_with_lock back to the Rails adapter, since this functionality was not originally available in PromptCache.

3 participants