fix: during live requests, the server returns a cursor for the client to use for cache-busting #1826
balegas
left a comment
Okay, this will work. Now we can increase the max-age if we want, right? More collapsing, more data at the edge.
It also helps with nextjs caching behaviour, which I was sidestepping by adding a random to the URL... sounds familiar.
I've approved, but might be useful if team has a look at the Elixir before merging.
Yup, we could make it configurable. In practice we'll probably also need to let clients set the long-polling seconds, as some proxies have pretty short timeouts. But the default could be much higher.
msfstef
left a comment
What an annoying issue from lack of standardisation! I like having a custom cache cursor though for the live requests, relatively cheap to implement and maintain and gives us better control.
I've left some comments and questions for clarification
    >()

    #lastOffset: Offset
    #nextLiveCursor: string // Seconds since our Electric Epoch 😎
nit: slight inconsistency in naming between "live next cursor" and "next live cursor" - maybe we can even name it "live cache buster" or "live cache cursor" to make its purpose explicit in its name?
true — I'll fix
    validateOptions(options)
    this.options = { subscribe: true, ...options }
    this.#lastOffset = this.options.offset ?? `-1`
    this.#nextLiveCursor = ``
what does this mean for the request collapsing behaviour on the first live request, before a shared cursor is retrieved? as I'm thinking about it I don't think it's anything serious but worth clarifying
yeah the initial request is different so collapsing/caching will be different. I couldn't think of any way around this but it's also not that big of a deal as it's just one more request basically getting to Electric.
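To make the point about the first live request concrete, here is a minimal TypeScript sketch of how a client might build live request URLs. The `LiveRequestState` shape, the `buildLiveUrl` helper, the example URL, and the choice to omit the `cursor` parameter while it is still empty are all assumptions for illustration, not the client's actual code.

```typescript
// Hypothetical client-side state; field names are illustrative assumptions.
interface LiveRequestState {
  offset: string
  liveCacheBuster: string // empty string until the server has returned a cursor
}

// Builds the URL for a live (long-polling) request. The cursor is only added
// once the server has supplied one, so the first live request for an offset
// has a different cache key from the live requests that follow it.
function buildLiveUrl(baseUrl: string, state: LiveRequestState): string {
  const url = new URL(baseUrl)
  url.searchParams.set("offset", state.offset)
  url.searchParams.set("live", "true")
  if (state.liveCacheBuster !== "") {
    url.searchParams.set("cursor", state.liveCacheBuster)
  }
  return url.toString()
}

// First live request: no cursor yet. Later live request: cursor from the server.
const firstLive = buildLiveUrl("https://example.com/shape", { offset: "0_0", liveCacheBuster: "" })
const laterLive = buildLiveUrl("https://example.com/shape", { offset: "0_0", liveCacheBuster: "20" })
console.log(firstLive)
console.log(laterLive)
```

Because the two URLs differ, a proxy keys them separately, which is why the first live request cannot be collapsed with later ones.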
    now = DateTime.utc_now()

    diff_in_seconds = DateTime.diff(now, oct9th2024, :second)
    next_interval = ceil(diff_in_seconds / 20) * 20
this is an arbitrary 20 here even though it's supposed to be related to the long poll timeout - perhaps make this function take an "interval size" as a parameter and use conn.assigns.config[:long_poll_timeout] when it gets called to ensure it stays consistent?
Oh sweet! I didn't know there was a config already for this — I'll switch to it.
    defmodule TimeUtils do
      def seconds_since_oct9th_2024_next_interval do
        oct9th2024 = DateTime.from_naive!(~N[2024-10-09 00:00:00], "Etc/UTC")
you can store this value as a module attribute in this module, like
@oct9th2024 DateTime.from_naive!(~N[2024-10-09 00:00:00], "Etc/UTC")
def seconds_since_oct9th_2024_next_interval do
...
so it gets calculated once and reused rather than parsing the date on every call
ok good idea as this will be called a ton
Co-authored-by: Stefanos Mousafeiris <msfstef@gmail.com>
… to use for cache-busting (#1826)
Fixes #2589

~We introduced the concept of a cursor in #1826 in order to avoid infinite loops of clients running into cached live responses by artificially "moving the cache forward" via the time based, coordinated cache buster.~

~However we should not need it, as our cache buster is already the `offset` parameter. We were running into this issue because we were caching live responses that do not move the cache forward, i.e. empty live responses, so clients would continuously hit the same cache over and over again.~

~Even with the "cursor" fix we still run into this issue but in a different way - empty live responses create a "chain" rather than a loop of cache hits, that can be arbitrarily long as we allow these cached live responses to be revalidated as well. This means someone who is at the tip of the log might end up following a huge chain of responses with no changes in them, and each of those requests made might trigger a separate revalidation request to the origin.~

~Request collapsing on cache misses works regardless of the cache policy you set on a response (since the CDN does not yet know what the caching policy on the response will be). This is [a canonical example](https://developers.cloudflare.com/cache/concepts/revalidation/#example-2) from Cloudflare.~

~Request collapsing on stale cache hits, as discussed [in the other Cloudflare example](https://developers.cloudflare.com/cache/concepts/revalidation/#example-1) will send back only a single revalidation out of the collapsed requests.~

~Therefore to avoid these infinite loops, we can simply _not cache_ live responses with no changes in them, which retains the request collapsing behaviour without creating any infinite loops.~

~For live responses that _do_ contain changes, we set the usual 5 second lifetime + 5 second stale lifetime, so that clients that are slightly behind can catch up using the cache, but the cursor is not needed as these cached live responses move the `offset` forward.~

~If we go ahead with this change we do need to keep the cursor header present as we require it in our official client, although I've ripped out any logic for it since it won't actually be used and it is better to not change it between requests to ensure cache consistency - we can discuss a path to deprecation~

### UPDATE

We just use an etag that is always different for live responses that contain no changes to ensure they never get revalidated.
This PR is a fix for inconsistencies in caching in http proxying while clients are long-polling. It also adds `public` to our `cache-control` header as that's required by some http proxies in order to cache.

HTTP Proxies don't treat the max-age in cache-control exactly the same way. Some start counting the age of the cache from the *beginning* of the request while others count from the *end* of the request.
This inconsistency makes it difficult to reliably control caching and request collapsing behavior for long-polling requests.
My previous PR in this area #1656 made request collapsing work nicely with proxies with the first behavior as they'd collapse all requests within the time from the start of a long-poll and the end of the max-age. And when the client went to request again after the long-poll had ended, the previous request cache had expired already so a new request would get sent to the origin.
However, this approach caused issues with proxies with the second behavior as request collapsing would work but when the client re-polled, the cache hadn't yet expired so the client would go into an infinite loop requesting the same cached response over and over.
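The difference between the two proxy behaviors can be sketched with a small freshness check. This is an illustrative model only: the `CachedLongPoll` type, the `isFresh` helper, and the numbers used (a 20 second long-poll and a 5 second max-age) are assumptions chosen for the example, not measurements of any particular proxy.

```typescript
// Which moment a proxy treats as "age zero" for a cached response.
type AgeCountsFrom = "request-start" | "response-end"

// Hypothetical record of one cached long-poll response (times in seconds).
interface CachedLongPoll {
  requestStartSec: number
  responseEndSec: number
  maxAgeSec: number
}

// Returns true if a proxy using the given convention would still serve the
// cached response at time `nowSec` (fresh means age < max-age).
function isFresh(entry: CachedLongPoll, ageCountsFrom: AgeCountsFrom, nowSec: number): boolean {
  const ageZero =
    ageCountsFrom === "request-start" ? entry.requestStartSec : entry.responseEndSec
  return nowSec - ageZero < entry.maxAgeSec
}

// A long-poll that starts at t=0 and times out at t=20, cached with max-age=5.
const entry: CachedLongPoll = { requestStartSec: 0, responseEndSec: 20, maxAgeSec: 5 }

// The client re-polls the same URL immediately, at t=20.
console.log(isFresh(entry, "request-start", 20)) // first behavior: already expired
console.log(isFresh(entry, "response-end", 20)) // second behavior: still cached
```

Under the second behavior the immediate re-poll is answered from cache, and since the cached response does not change the request URL, the client keeps requesting it until the entry expires.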
So this PR adds a `cursor` generated by the server that clients use as part of `live` requests. This skips past any caches from the previous live request (which, on proxies with the first behavior, would have expired already).

The cursor is generated by finding the next alignment boundary. For example, if the timeout is 20 seconds (which it is now, but this could change), we calculate the alignment boundary by taking the current unix timestamp, subtracting the Electric Epoch of October 9th, 2024, dividing by 20, rounding up, and then multiplying by 20 again.
In practice this partitions caches for live requests for a given offset into 20 second windows.
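The alignment calculation can be sketched in TypeScript as follows. The server-side code in this PR is Elixir. The names `ELECTRIC_EPOCH_MS` and `nextLiveCursor` are hypothetical, and the interval is taken as a parameter (as suggested in review) rather than hard-coded to 20.

```typescript
// Electric Epoch: midnight UTC on October 9th, 2024. Computed once at module
// load rather than on every call. (Months are zero-based in Date.UTC.)
const ELECTRIC_EPOCH_MS: number = Date.UTC(2024, 9, 9, 0, 0, 0)

// Returns the next alignment boundary, in seconds since the Electric Epoch.
// A timestamp that falls exactly on a boundary maps to that same boundary.
function nextLiveCursor(nowMs: number, intervalSeconds: number): number {
  const diffInSeconds = Math.trunc((nowMs - ELECTRIC_EPOCH_MS) / 1000)
  return Math.ceil(diffInSeconds / intervalSeconds) * intervalSeconds
}

// Two requests 1s and 19s after the epoch share a cursor; one at 21s does not.
console.log(nextLiveCursor(ELECTRIC_EPOCH_MS + 1_000, 20))
console.log(nextLiveCursor(ELECTRIC_EPOCH_MS + 19_000, 20))
console.log(nextLiveCursor(ELECTRIC_EPOCH_MS + 21_000, 20))
```

Every client that receives a cursor within the same window gets the same value, so their next live requests share a cache key and can still be collapsed by the proxy.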