Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify usage of _expected querystring parameter #600

Merged
merged 3 commits into from
Apr 24, 2024
Merged

Conversation

leplatrem
Copy link
Contributor

@bendk do you find it clearer like this? Don't hesitate to ask more questions :) Thanks!

@leplatrem leplatrem requested a review from a team as a code owner April 18, 2024 14:48
Copy link

@bendk bendk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is seems much clearer to me, but maybe it could be improved more by making it a bit more explicit (assuming that I'm understanding it correctly). WDYT?

Approving in any case, because it's an improvement either way and I'm not sure that my suggestion is an improvement.

@bendk
Copy link

bendk commented Apr 18, 2024

I just read your comments in bugzilla and realize my understanding is totally wrong, since I wasn't really thinking about the push notification scenario. Let me update my comments.

@@ -87,7 +87,7 @@ Returns the following response for the collection ``{cid}`` in the bucket ``{bid

.. note::

The ``_expected={}`` querystring parameter is mandatory but can be set to ``0`` if unknown. See section below about cache busting.
The ``_expected={}`` querystring parameter is mandatory. Either you pass the current collection timestamp value obtained when polling for changes in order to bust the CDN cache, or you use a hard-coded value (eg. ``0``) and rely on the cache TTL. See section below about cache busting.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what "current collection timestamp" means. Is it the timestamp of the last fetch? The timestamp returned in the data for the last fetch? The current time? Whichever one it is, please spell out how it would work to bust the cache.

Can you lay out a scenario when you would want to bust the cache while polling? In general, it seems like I wouldn't want to do that.

Also, how can we use the _since param in a cache-friendly manner? If all clients did something naive like use the timestamp they last polled, then it seems like each request would be a cache-miss.

@@ -108,7 +108,7 @@ Returns the list of collections and their current timestamp.

.. note::

The ``_expected={}`` querystring parameter is mandatory but can be set to ``0`` if unknown. See next section about cache busting.
The ``_expected={}`` querystring parameter is mandatory. Either you receive a Push notification from the server, and pass the timestamp value in order to bust the CDN cache, or you use a hard-coded value (eg. ``0``) and rely on the cache TTL. See section below about cache busting.
Copy link

@bendk bendk Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one makes total sense to me. Is _expected intended more for responding to push notifications than for polling?

@leplatrem
Copy link
Contributor Author

Thanks for your feedback. I will rework and improve it.

Some clarifications (will put them in the doc nicely later):

  • The push notification payload contains the highest timestamp of all collections
  • We pass this timestamp to the ?_expected={} queystring param when polling for changes (monitor/changes).
  • The polling endpoint (see prod) will return the list of collections with their respective timestamps (last_modified field):
{
  "metadata": {},
  "timestamp": 1713532462683,
  "changes": [
    {
      "id": "19e79f22-62cf-92e1-c12c-a3b4b9cf51be",
      "last_modified": 1603126502200,
      "bucket": "blocklists",
      "collection": "plugins",
      "host": "firefox.settings.services.mozilla.com"
    },
    {
      "id": "b7f595f9-5fc5-d863-b5dd-e5425dcf427a",
      "last_modified": 1604940558744,
      "bucket": "blocklists",
      "collection": "addons",
      "host": "firefox.settings.services.mozilla.com"
    },
   ...
  • We can now fetch each collection using the timestamp that is returned in the polling endpoint. For example, using the above values: /buckets/blocklists/plugins/changeset?_expected=1603126502200
  • Back then we called the queryparam _expected because that's the value you expect to retrieve in the timestamp field in the changeset response. Obviously, it's not true when the cache has expired or when using =0.

  • The changeset returns the whole list by default. But if we only want to fetch the changes since our last polling (because we store a local state, and only want the diff), then we add &_since=
  • The value we pass in since is the timestamp field we got from a previous changeset response

Can you lay out a scenario when you would want to bust the cache while polling? In general, it seems like I wouldn't want to do that.

We would have to agree on a definition of busting the cache. We have to make the distinction between always bust the cache (ie. use a random value in queryparam), and the first requests of each CDN node busting the cache.

When polling with push notification, we want the first requests to bust the polling endpoint with the latest value.

When polling without push notification, what I would recommend is to rely on the cache TTL for the polling endpoint (?_changeset=0), and then use the last_modified values for each collection ?_changeset={collection.last_modified}.
It means that you do an additional request (polling) but at least you don't rely on the cache TTL for the collection. As the service owners, we don't guarantee that we will keep the collection TTL under X hours.

Also, how can we use the _since param in a cache-friendly manner? If all clients did something naive like use the timestamp they last polled, then it seems like each request would be a cache-miss.

For each collection, the amount of possible values for the timestamps is finite. Indeed, each timestamp value corresponds to datetime of the data publication (reviewer approving changes on the server).
Some collections change several times a day, but that still represents a lot of caching.
The push notifications are also debounced to 5min (eg. several users. approving changes in a short amount of time)

Hope it's clearer with these explanations 😅 but don't hesitate to insist :)

I will try to apply these clarifications to the doc.

Copy link
Contributor

@alexcottner alexcottner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow lint is unhappy, but these changes seem to add good clarity

@bendk
Copy link

bendk commented Apr 22, 2024

That makes a lot of sense to me. I'd love to see you list out some of those example use cases in the docs, but that doesn't need to happen in this PR.

@leplatrem leplatrem merged commit 4880793 into main Apr 24, 2024
6 checks passed
@leplatrem leplatrem deleted the clarify-cache-bust branch April 24, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants