feat: add cache max-age of 5 seconds to `?live` requests so request collapsing works by KyleAMathews · Pull Request #1656 · electric-sql/electric

KyleAMathews · 2024-09-09T20:24:28Z

We need a short max-age cache on ?live responses so http proxies will collapse long-polling requests.

The time is a bit arbitrary but two considerations:

- it's shorter than 20 seconds (our long-polling timeout), which is necessary as otherwise clients would just poll the cached response over and over until the cache expired.
- it's long enough to ensure the vast majority of clients all request within the same five second window. Live clients all get responses at the same time so all request again at the same time. So even accounting for world-wide spread of clients, five seconds should collect pretty much everyone.

There's a very slight chance that a new message could be returned within the 5 second timeout and then someone with ?live gets a cached response. But that's not a big deal as then their next response gets collapsed again with other clients.
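As a rough illustration of the two considerations above, here is a minimal sketch of how the header value for a `?live` response could be built. The function and constant names are hypothetical and this is not Electric's actual server code; only the 5 second max-age and the 20 second long-poll timeout come from this PR.

```typescript
// Hypothetical helper, not Electric's implementation.
const LONG_POLL_TIMEOUT_SECONDS = 20; // long-polling timeout stated in the PR
const LIVE_MAX_AGE_SECONDS = 5; // max-age chosen in the PR

function liveCacheControl(
  maxAgeSeconds: number = LIVE_MAX_AGE_SECONDS,
  longPollTimeoutSeconds: number = LONG_POLL_TIMEOUT_SECONDS
): string {
  // max-age is expressed in whole seconds.
  if (Math.floor(maxAgeSeconds) !== maxAgeSeconds || maxAgeSeconds <= 0) {
    throw new RangeError("max-age must be a positive whole number of seconds");
  }
  // First consideration: the cache must expire before the long-poll timeout,
  // otherwise clients would keep re-reading the same cached response.
  if (maxAgeSeconds >= longPollTimeoutSeconds) {
    throw new RangeError("max-age must be shorter than the long-poll timeout");
  }
  return `max-age=${maxAgeSeconds}`;
}
```

With the defaults, `liveCacheControl()` yields `max-age=5`, while passing `20` is rejected because it would not be shorter than the long-poll timeout.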

netlify · 2024-09-09T20:31:26Z

✅ Deploy Preview for electric-next ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `9f5bc19` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/electric-next/deploys/66df5acf4889160008e2dc67 |
| 😎 Deploy Preview | https://deploy-preview-1656--electric-next.netlify.app |

thruflo · 2024-09-10T05:45:48Z

I get the live clients connecting at the same time. Because they block on data and then reconnect after it arrives.

> There's a very slight chance that a new message could be returned within the 5 second timeout and then someone with ?live gets a cached response.

Trying to wrap my head around the scenario here. New data arrives within 5 seconds, the live request for one (or many) clients returns. A subsequent request gets that response, even if between the initial response and their request additional data has also arrived?

So if you miss being in the collapse gang and data is arriving in sequence you could end up with 5 seconds latency (for weird / hard to debug reasons from a front end POV)?

samwillis · 2024-09-10T06:13:52Z

If I understand correctly the 5s TTL starts when the response is returned (either due to a message or the timeout), and so if it was possible to have a 1ms TTL that would also allow all connections within the 20s long poll to be collapsed.

what's the shortest possible TTL that triggers request collapsing?

this is very cool!

KyleAMathews · 2024-09-10T13:35:07Z

> So if you miss being in the collapse gang and data is arriving in sequence you could end up with 5 seconds latency

No because you'd get a cache hit for the initial response and then the client would immediately join the new gang at the new offset and get collapsed into that request. It's possible that they missed several responses within a five second TTL window but that'd just mean they get individual messages for a series of cached http responses so very fast so not the worst thing in the world (in a pretty unlikely edge case).

In general, this is following our normal scheme where we trade off the possibility of needing to do a few http requests in sequence so that we can have longer caches & increase cache hit ratio.
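To make the catch-up behaviour described above concrete, here is a small simulation, assuming a proxy cache keyed by the requested offset where every cached live response carries the next offset. The types, names and offset strings are hypothetical and only illustrate the reasoning; they are not the real protocol shapes.

```typescript
// Hypothetical shapes for illustration only.
interface LiveResponse {
  messages: string[];
  nextOffset: string;
}

type ProxyCache = Record<string, LiveResponse | undefined>;

// Follow cached live responses until the client reaches an offset that is
// not cached, i.e. the point where it would join the current collapsed request.
function catchUp(cache: ProxyCache, startOffset: string) {
  let offset = startOffset;
  const messages: string[] = [];
  let cacheHits = 0;
  let cached = cache[offset];
  while (cached !== undefined) {
    for (const message of cached.messages) messages.push(message);
    cacheHits += 1;
    // Guard: a cached response that does not advance the offset would make
    // the client loop on the same entry until it expired.
    if (cached.nextOffset === offset) break;
    offset = cached.nextOffset;
    cached = cache[offset];
  }
  return { offset, messages, cacheHits };
}
```

A client that missed two responses would take two fast cache hits and then wait at the newest offset together with everyone else.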

KyleAMathews · 2024-09-10T13:38:48Z

> If I understand correctly the 5s TTL starts when the response is returned (either due to a message or the timeout), and so if it was possible to have a 1ms TTL that would also allow all connections within the 20s long poll to be collapsed.

I'm a bit fuzzy on this but I think Fastly, et al. have heuristics around request collapsing so if they've seen similar URLs that have very short caches, they might stop pretty quickly collapsing requests. So e.g. a 1s cache might lead to later requests not being collapsed. Maybe. We could probably do more testing to figure out exactly how this works but it doesn't seem to matter too much as long as it satisfies the two considerations I outlined.

thruflo · 2024-09-10T13:42:54Z

> No because you'd get a cache hit for the initial response and then the client would immediately join the new gang at the new offset

I see -- so the idea is that the live response is triggered when there is data, which means it always contains a new offset, so the client then makes a new request with the new offset.

Just to sanity check in case it's a bug that breaks this assumption, do we only set a cache header on a non-empty live request? What happens in the event of a 20s timeout response?

KyleAMathews · 2024-09-10T13:44:22Z

> Just to sanity check in case it's a bug that breaks this assumption, do we only set a cache header on a non-empty live request? What happens in the event of a 20s timeout response?

We are but... it's already expired by then 😆 so it doesn't have any effect (other than to tell the CDN that they should keep collapsing requests to the origin)

thruflo · 2024-09-10T13:59:36Z

> We are but... it's already expired by then 😆

Trying to twist my brain around that statement. Also reading https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#max-age and https://www.fastly.com/documentation/reference/http/http-headers/Age

Is it that the response has an age added because the CDN monitored the time it took to generate the response? My naive mental model is that the 20s timeout response is our business, generates a response with a max-age and the cdn/client gets that with an age of 0. Thus caches it for 5 seconds.

(On a separate note, I trust you that it's not but if an empty live response were to be cached it would also trigger zero delay polling between the client and the CDN.)

KyleAMathews · 2024-09-10T14:08:04Z

yeah my mental model is that the max-age starts at the time the request is received at the CDN not when the response finishes. Which makes sense as the origin might take 500ms to transmit the data — but the staleness ticking clock starts as soon as the origin starts transmitting the data not when it happens to finish.

Also nothing broke dramatically (e.g. clients furiously looping on the same cached response). I'll be doing more testing today so we'll see.
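The two mental models in this exchange can be compared with a small sketch. It assumes a proxy either starts the max-age clock when the request begins or when the response ends; the type and function names are hypothetical and no specific CDN's behaviour is implied.

```typescript
// Hypothetical model names for illustration only.
type AgeModel = "from-request-start" | "from-response-end";

// Returns true if a re-poll at `nowSec` would still be served from cache.
function isStillCached(
  model: AgeModel,
  requestStartSec: number,
  responseEndSec: number,
  maxAgeSec: number,
  nowSec: number
): boolean {
  const clockStart =
    model === "from-request-start" ? requestStartSec : responseEndSec;
  return nowSec - clockStart < maxAgeSec;
}

// A long-poll that starts at t=0 and times out at t=20, with max-age=5,
// followed by an immediate re-poll at t=20.
const expiredUnderFirstModel = !isStillCached("from-request-start", 0, 20, 5, 20);
const loopsUnderSecondModel = isStillCached("from-response-end", 0, 20, 5, 20);
```

Under the first model the timeout response has already expired when the client re-polls, so the request reaches the origin. Under the second model it is still fresh for five seconds, which is the zero-delay polling scenario raised above.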

… to use for cache-busting (#1826)

This PR is a fix for inconsistencies in caching in HTTP proxying while clients are long-polling. It also adds `public` to our `cache-control` header as that's required by some HTTP proxies in order to cache.

HTTP proxies don't treat the max-age in cache-control exactly the same way. Some start counting the age of the cache from the *beginning* of the request while others count from the *end* of the request. This inconsistency makes it difficult to reliably control caching and request collapsing behavior for long-polling requests.

My previous PR in this area #1656 made request collapsing work nicely with proxies with the first behavior as they'd collapse all requests within the time from the start of a long-poll and the end of the max-age. And when the client went to request again after the long-poll had ended, the previous request cache had expired already so a new request would get sent to the origin.

However, this approach caused issues with proxies with the second behavior as request collapsing would work but when the client re-polled, the cache hadn't yet expired so the client would go into an infinite loop requesting the same cached response over and over.

So this PR adds a `cursor` generated by the server that clients use as part of `live` requests. This skips by any caches from the previous live request (which on proxies with the first behavior, would have expired already).

The cursor is generated by finding the next alignment boundary. I.e. if the timeout is 20 seconds (which it is now but this could change) then we calculate the alignment boundary by taking the current unix timestamp and subtracting the Electric Epoch of October 9th, 2024, then dividing by 20 and rounding up and then multiplying by 20 again. In practice this partitions caches for live requests for a given offset into 20 second windows.

---------

Co-authored-by: Stefanos Mousafeiris <msfstef@gmail.com>
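The alignment-boundary arithmetic described in this commit message can be sketched as follows. This is an illustration rather than the server's actual code: the names are hypothetical, timestamps are assumed to be in seconds, and the Electric Epoch is assumed to mean midnight UTC on October 9th, 2024 since the message gives only the date.

```typescript
// Assumption: the Electric Epoch is 2024-10-09T00:00:00Z, in unix seconds.
const ELECTRIC_EPOCH_SECONDS = Date.UTC(2024, 9, 9) / 1000; // month is 0-based
const LONG_POLL_TIMEOUT_SECONDS = 20; // current timeout per the commit message

// Hypothetical helper: next alignment boundary, in seconds since the epoch.
function nextCursor(
  nowUnixSeconds: number,
  timeoutSeconds: number = LONG_POLL_TIMEOUT_SECONDS
): number {
  const sinceEpoch = nowUnixSeconds - ELECTRIC_EPOCH_SECONDS;
  // Divide by the timeout, round up, then multiply by the timeout again.
  return Math.ceil(sinceEpoch / timeoutSeconds) * timeoutSeconds;
}
```

Every request that arrives within the same 20 second window for a given offset gets the same cursor, so those requests share one cache key, while the next window produces a new key and skips past the previous cached response.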

KyleAMathews added 2 commits September 9, 2024 14:17

feat: add cache max-age of 5 seconds to ?live requests so request col…

58edad1

…lapsing works

Fix tests

9f5bc19

Second response does now get a cached response

aa349c9

KyleAMathews changed the title ~~feat: add cache max-age of 5 seconds to ?live requests so request collapsing works~~ feat: add cache max-age of 5 seconds to ?live requests so request collapsing works Sep 9, 2024

icehaunter approved these changes Sep 10, 2024

KyleAMathews merged commit f18fa57 into main Sep 10, 2024

KyleAMathews deleted the live-cache branch September 10, 2024 13:35

KyleAMathews added a commit that referenced this pull request Sep 10, 2024

chore: add changeset for #1656

a1cbeee

KyleAMathews added a commit that referenced this pull request Sep 10, 2024

chore: add changeset for #1656 (#1665)

6703657

KyleAMathews mentioned this pull request Oct 9, 2024

fix: during live requests, the server returns a cursor for the client to use for cache-busting #1826

Merged

KyleAMathews added a commit that referenced this pull request Nov 1, 2024

chore: add changeset for #1656 (#1665)

c0e1c6c

KyleAMathews merged 3 commits into main from live-cache
