
Comments API: fix caching issues #4744

Merged 14 commits into master from rhymes/comments-api-fix-caching on Jan 15, 2020

Conversation

@rhymes (Contributor) commented Nov 7, 2019

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Documentation Update

Description

What's the problem - caching is hard

The main problem this PR sets out to fix is that caching is done incorrectly (and perhaps too eagerly) on the two /api/comments actions: index and show.

Currently there are two caching layers that don't play well with each other: server side action caching and edge caching:

  • action caching was removed from Rails starting with version 4 (in 2013!) because invalidating such an opaque cache is too complicated; the preferred method is key-based expiration. With this refactoring I don't think we need an additional caching layer, but we can revisit that afterwards. I will benchmark a few endpoints with a lot of comments and see if we indeed need one (if we do, we can use the regular cache object(s) or cache the view; see the sketch after this list)

  • edge caching is misconfigured. According to fastly-rails we need to set a unique cache control key; since this is an endpoint whose content uniqueness is determined by a tree (or multiple trees) of comments, we need the comments' cache keys to construct a truly unique key.
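
For reference, a minimal sketch of what a key-based server side layer could look like if benchmarks say we need one (the helper name and key layout here are hypothetical, not part of this PR):

def cached_comments_tree(article)
  # The key embeds the newest comment timestamp: any comment change yields
  # a fresh key, and stale entries simply age out of the store instead of
  # requiring explicit invalidation.
  key = [article.id, "api_comments", article.comments.maximum(:updated_at)]
  Rails.cache.fetch(key, expires_in: 10.minutes) do
    article.comments.select(%i[id processed_html user_id ancestry]).arrange
  end
end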

How did I go about the solution - what does this PR contain?

  1. index finally returns all the comments and their descendants
  2. show finally returns the requested comment and its descendants
  3. there are more tests
  4. there are fewer queries than before, even though we load more data by going through all descendants
  5. the API response format didn't change; the only addition is a children array, present even for comments that have no children, which is best TBH because it makes displaying a tree (or recursing) on the client side easier
  6. all the N+1 queries that existed before are now removed

How did we do it? By using recursion to walk single or multiple trees, with the help of ancestry's arrange method.

Recursion

We use recursion in two places: building the edge cache keys (collecting the unique cache key of each comment in the tree) and building the JSON inside the Jbuilder files.
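
A minimal sketch of the first (simplified, and assuming fastly-rails' record_key helper; the method in the actual diff may differ in the details):

def edge_cache_keys(comments_trees)
  # comments_trees is the nested Hash built by ancestry's arrange:
  # each key is a comment, each value is the Hash of its children.
  comments_trees.flat_map do |comment, subtree|
    [comment.record_key] + edge_cache_keys(subtree)
  end
end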

Ancestry "arrange"

The ancestry gem uses a very slick trick in which it loads all the comments of a subtree in one single SQL query and then arranges them in a hash that can be manipulated at runtime. This allows us to walk the tree without having to go back to the DB for each subtree level.
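
Schematically (illustrative objects, not real data):

article.comments.select(%i[id processed_html user_id ancestry]).arrange
# => { #<Comment id: 1> => { #<Comment id: 4> => {} },
#      #<Comment id: 2> => { #<Comment id: 7> => { #<Comment id: 9> => {} } },
#      #<Comment id: 3> => {} }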

Queries before and after

# INDEX BEFORE

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"25"}
  Article Load (14.9ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 25], ["LIMIT", 1]]
  Comment Load (7.9ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2 AND "comments"."ancestry" IS NULL  [["commentable_id", 25], ["commentable_type", "Article"]]
  User Load (15.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2)  [["id", 5], ["id", 8]]
  Comment Load (0.7ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9' ORDER BY "comments"."score" DESC
  Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '22' ORDER BY "comments"."score" DESC

# INDEX AFTER

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"25"}
  Article Load (1.0ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 25], ["LIMIT", 1]]
  Comment Load (0.5ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 25], ["commentable_type", "Article"]]
  User Load (3.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2, $3)  [["id", 5], ["id", 8], ["id", 11]]

# SHOW BEFORE

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"9"}
  Comment Load (11.0ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (16.8ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 5], ["LIMIT", 1]]
  Comment Load (1.0ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9' ORDER BY score DESC
  User Load (0.6ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 11], ["LIMIT", 1]]
  Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9/31' ORDER BY score DESC
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 11], ["LIMIT", 1]]

# SHOW AFTER

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"9"}
  Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (0.6ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 5]]
  Comment Load (0.4ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE ("comments"."ancestry" LIKE '9/%' OR "comments"."ancestry" = '9')
  User Load (0.8ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 11]]

Related Tickets & Documents

Closes #2250

@rhymes (Contributor, Author) left a comment:

Added a few notes to explain the PR further

Comment on lines -6 to -12
caches_action :index,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes

caches_action :show,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes
@rhymes (Contributor, Author):

After simplifying things down to effectively one query for all comments, we might not need these.
Since premature optimization is the root of all evil ™ and since we effectively have edge caching as well, I decided to go ahead and remove server side caching.

If it turns out to be a big deal, we can add it back, perhaps using the regular cache object. See the issue description.

caches_action :show,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes
before_action :set_cache_control_headers, only: %i[index show]
@rhymes (Contributor, Author):

This needs to be called explicitly to configure edge caching, see https://github.com/fastly/fastly-rails#headers


def index
article = Article.find(params[:a_id])
@comments = article.comments.roots.includes(:user).select(%i[id processed_html user_id ancestry])

@comments = article.comments.includes(:user).select(%i[id processed_html user_id ancestry]).arrange
@rhymes (Contributor, Author):

Here we tell ancestry to arrange the comments into a hash that contains all the trees:

- comment 1
  - reply to comment 1
- comment 2
  - reply to comment 2
    - reply to reply to comment 2
- comment 3

It will then be walked by the edge_cache_keys method and the JSON serializer.
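
A condensed sketch of the recursive Jbuilder partial that renders such a tree (attribute names are illustrative and simplified from the actual view):

# app/views/api/v0/comments/_comment.json.jbuilder
json.id_code comment.id_code_generated
json.body_html comment.processed_html
# subtree is the nested Hash of children built by arrange; leaves simply
# render an empty children array, which keeps client-side recursion easy
json.children subtree do |child, child_subtree|
  json.partial! "comment", comment: child, subtree: child_subtree
end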

Comment on lines +23 to +24
@comment = tree_with_root_comment.keys.first
@comments = tree_with_root_comment[@comment]
@rhymes (Contributor, Author):

Here we use ancestry to split the requested comment and its subtree into separate objects, so we can use the same serializer we use for the index:

- comment 1
  - reply to comment 1
    - reply to reply to comment 1

The whole tree will then be walked by the edge_cache_keys method and the JSON serializer.
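
Putting the pieces together, the show action looks roughly like this (a simplified sketch assembled from the diff fragments above, not the verbatim code):

def show
  tree_with_root_comment = Comment.subtree_of(params[:id].to_i(26)). # id_code is base-26
    includes(:user).
    select(%i[id processed_html user_id ancestry]).
    arrange
  @comment = tree_with_root_comment.keys.first
  @comments = tree_with_root_comment[@comment]
end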

expect(json_great_grandchild_id_code).to eq(great_grandchild_comment.id_code_generated)
end

it "sets the correct edge caching surrogate key for all the comments" do
@rhymes (Contributor, Author):

Added a test for the surrogate keys.

@rhymes (Contributor, Author) commented Nov 7, 2019

NOTE: we also need to clear the edge cache after merging this, the same way we've done with other API routes that had caching issues

@rhymes changed the title from "[WIP] Comments API: fix caching issues" to "Comments API: fix caching issues" on Nov 7, 2019
@mstruve (Contributor) left a comment:

Couple optimization thoughts, let me know what you think!

app/controllers/api/v0/comments_controller.rb (outdated, resolved)
end

def show
@comment = Comment.find(params[:id].to_i(26))
tree_with_root_comment = Comment.subtree_of(params[:id].to_i(26)).
@mstruve (Contributor):

Given that users are very large objects and we might be loading a lot of them here, I tried to look for ways we could limit the user data we pull as well, and came up with using a joins to limit the data and eliminate a DB hit. But I might be missing something here, since I have never worked with ancestry, so let me know if I am!

[36] pry(main)> Comment.subtree_of(30).joins(:user).select("users.name, users.twitter_username").select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (1.0ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (5.8ms)  SELECT users.name, users.twitter_username, "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
=> {#<Comment:0x00007f895ced4da0
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[37] pry(main)> Comment.subtree_of(30).includes(:user).select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 6]]
=> {#<Comment:0x00007f896ab54578
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[38] pry(main)>

@lightalloy (Contributor):

If we use joins, there will be an additional SQL query each time we call comment.user (so, N+1).

@rhymes (Contributor, Author) commented Nov 13, 2019:

We can't use joins or left_outer_joins: as @lightalloy said, you get an N+1 each time you access comment.user. The following is taken from me calling curl http://localhost:3000/api/comments\?a_id\=13 on an article where I had just left a bunch of threaded comments. The same happens for the show action, because we load all children comments.

Notice how Rails issues a separate SQL query for each user:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.5ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  Comment Load (0.7ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 3], ["LIMIT", 1]]
  User Load (0.5ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]

The bullet gem even flags the N+1 in the logs:

GET /api/comments?a_id=13
USE eager loading detected
  Comment => [:user]
  Add to your finder: :includes => [:user]

Unfortunately I can't find a way, directly from Rails, to use select on the preloaded query (in our case, the SELECT on users).

Rails AFAIK doesn't support select on preloaded relationships, see this comment by a member of the Rails team: rails/rails#15185 (comment). The reason is that includes can switch from preloading to eager loading depending on the query.

I was somehow able, thanks to a StackOverflow answer, to come up with the following:

article = Article.find 13
ActiveRecord::Associations::Preloader.new.preload(article.comments.select(%i[id processed_html user_id ancestry]), :user, select: %i[id name username]).first.send(:owners)

but: the API is not public, it forces us to use preload in all cases, and it returns an Array instead of an ActiveRecord::Relation, which breaks the .arrange call that rebuilds the tree. There's also a gem, rails_select_on_includes (discouraged by the aforementioned comment), but other than being a hack, it doesn't work with nested relations, which is what we need in this case.

The only other way I can think of is to cheat, with something like:

@users = User.where(id: article.comments.pluck(:user_id)).select(%i[id name username twitter_username github_username website_url])
@comments = article.comments.select(%i[id processed_html user_id ancestry]).arrange

and then in the view code we pass @users along, and when it comes time to render the user object we do this:

user = @users.find { |u| u.id == comment.user_id }
json.partial! "comment", comment: comment, user: user

I made it work locally. The queries are:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.6ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.7ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)  [["id", 6], ["id", 9], ["id", 6], ["id", 3], ["id", 13], ["id", 13], ["id", 13], ["id", 13], ["id", 13]]

We have one more query (to get all the user ids from the comments), but we do indeed have smaller user objects in memory. Let me know what you think.

It would work in a similar way for the show action:

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"1a"}
  Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 36], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  Comment Load (0.6ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  User Load (0.8ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4)  [["id", 13], ["id", 13], ["id", 13], ["id", 13]]

We would have one more query, but we can build a custom user object.

@mstruve (Contributor):

Hmmm, I think the hack might be worth it just knowing how big our User objects are, and that if we load a lot of them we are going to load a lot of data. What do you both think? I'm game for either way; I do worry a bit about memory with the initial approach, though.

@rhymes (Contributor, Author) commented Nov 13, 2019:

It's hard to make a decision between those two without testing them in production.

On one side we have "3 queries and more memory", on the other "4 queries and less memory than option 1". Instinctively I would lean towards the second, BUT:

  • the API is currently broken which allows us to have a little bit of wiggle room in testing
  • we have edge caching
  • we can test the current one and benchmark it with tools like wrk or hey by requesting an article with lots of comments; we can literally find in the DB the article that has the most
  • while the benchmark goes we can check what happens to the memory
  • if we are not satisfied we test the second one redoing the benchmark
  • if both are too slow or too resource intensive we add an additional layer of caching, hopefully we don't need it

Either way, I would very much like to benchmark how this endpoint performs

@mstruve (Contributor):

Let's leave it how it is and we can address it later if performance becomes an issue. No need to optimize prematurely.

mstruve previously approved these changes Nov 13, 2019
@mstruve (Contributor) left a comment:

LGTM 🚀

@benhalpern (Contributor):

I might be missing something, but how are we invalidating the caches here? I'm definitely 100% for this approach, but I think we're still missing the concept of purging as used in Fastly-Rails.

Our alternate approach, based on URL endpoints, is a hack that would be nice to eventually work out of the codebase, but I don't see how this is currently purging the appropriate caches here.

@benhalpern (Contributor):

> NOTE: we also need to clear the edge cache after merging this, the same way we've done with other API routes that had caching issues

This had not previously been cached on the edge, only in Memcache, so by moving from one to the other, we should be good, right?

@rhymes (Contributor, Author) commented Nov 13, 2019

@benhalpern I thought about it and assumed that the CacheBuster stuff would actually do that. Otherwise, why would we have set_cache_control_headers around the code without the proper fastly-rails .purge/.purge_all calls? Wouldn't the other resources not get purged if that didn't work?

Right now the comments cache is purged by https://github.com/thepracticaldev/dev.to/blob/master/app/services/edge_cache/commentable/bust.rb, which is called by https://github.com/thepracticaldev/dev.to/blob/master/app/jobs/comments/bust_cache_job.rb, which in turn is called by the comment's after_save and before_destroy callbacks. There's also https://github.com/thepracticaldev/dev.to/blob/880f60beffa8b246f65d6b77ad0594a36242d8bd/app/models/comment.rb#L279, which is called after save.

Let me know if I'm missing something here

@benhalpern (Contributor):

synchronous_bust is essentially the portion of the cache busting we wouldn't want to wait on, in case the user immediately refreshes. Other stuff can be async.

set_cache_control_headers sets headers so that Fastly knows to cache something at all. But once it's there, there are different ways, technically, to purge an endpoint or group of endpoints. We do it by telling Fastly which URLs to bust, which works, but isn't easily scalable to arbitrary different endpoints. It kind of locks us in to a simple set of URLs and such. That's kind of an interesting constraint that's not all bad, but it really is a smelly part of the code in general.

@rhymes (Contributor, Author) commented Nov 14, 2019

@benhalpern what route do you propose for making sure we're purging these endpoints?

I can hook up an HTTP logger, see what happens with .purge and .purge_all, and check which URLs they call; maybe we can then add those URLs to the bust_comment method. How does that sound?

@rhymes (Contributor, Author) commented Nov 14, 2019

I went ahead and hooked httplog into the cache buster that is automatically called when an article's comment is updated. What happens is the following:

[5] pry(main)> CacheBuster.new.bust_comment(article)
   (1.4ms)  SELECT  "articles"."id" FROM "articles" WHERE "articles"."published" = $1 ORDER BY hotness_score DESC LIMIT $2  [["published", true], ["LIMIT", 3]]
   (0.3ms)  BEGIN
  Article Update (0.6ms)  UPDATE "articles" SET "updated_at" = $1, "last_comment_at" = $2 WHERE "articles"."id" = $3  [["updated_at", "2019-11-14 11:44:23.565757"], ["last_comment_at", "2019-11-14 11:44:23.565757"], ["id", 13]]
   (47.7ms)  COMMIT
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/*
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/*?i=i

So we send a bunch of POSTs (2 or 3 per comment, plus a generic one) using the "legacy" purging mechanism:

> Previously purging made a POST call to the /purge endpoint of the Fastly API.
> The new method of purging is done by making an HTTP request against the URL using the PURGE HTTP method.

This is confirmed by my tests of using fastly-ruby's purge manually:

[13] pry(main)> url = "https://dev.to/rhymes/comment/1c?i=i?i=i"
=> "https://dev.to/rhymes/comment/1c?i=i?i=i"
[14] pry(main)> fastly.purge(url)
D, [2019-11-14T12:51:38.107435 #76762] DEBUG -- : [httplog] Connecting: dev.to:443
D, [2019-11-14T12:51:38.427354 #76762] DEBUG -- : [httplog] Sending: PURGE http://dev.to:443/rhymes/comment/1c?i=i?i=i

The fastly-rails methods instead (correctly, in line with the point of using set_cache_control_headers and set_surrogate_key_header) use the new Fastly Purge API:

[15] pry(main)> comment.purge
D, [2019-11-14T13:01:02.677740 #76762] DEBUG -- : [httplog] Sending: POST http://api.fastly.com:443/service/Optional/purge/comments/1
[16] pry(main)> comment.purge_all
D, [2019-11-14T13:04:31.374067 #76762] DEBUG -- : [httplog] Sending: POST http://api.fastly.com:443/service/Optional/purge/comments

where Optional should be the configured FASTLY_SERVICE_ID.

Given all of this I'm not sure how we should proceed; the set_surrogate_key_header calls I have added are probably slightly incorrect:

set_surrogate_key_header "api_comments_index", edge_cache_keys(@comments)

should be

set_surrogate_key_header "comments", edge_cache_keys(@comments)

(the documentation uses the name of the collection/table as the prefix which, as you can see in the output of comment.purge, is the correct one)
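
Illustratively (assuming fastly-rails' default record keys of the form table_name/id), the index response would then carry a header like:

Surrogate-Key: comments comments/9 comments/31 comments/42

A purge of comments/9 would invalidate every cached response tagged with that comment's key, while a purge of the bare comments key would flush them all.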

My proposal is the following:

  • I fix the surrogate key header names
  • I add .purge and .purge_all where relevant, in the bust cache job (sketched below)
  • we test it

Otherwise I can see if I can set up a Fastly account for development purposes and use that.
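
For concreteness, the second step could look roughly like this (a hypothetical sketch on top of the existing job, whose internals are simplified here; not a final diff):

class Comments::BustCacheJob < ApplicationJob
  def perform(comment_id)
    comment = Comment.find_by(id: comment_id)
    return unless comment

    # existing URL-based busting stays for the HTML endpoints...
    CacheBuster.new.bust_comment(comment.commentable)
    # ...while fastly-rails purges everything tagged with this comment's
    # surrogate key, covering the API responses
    comment.purge
  end
end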

Let me know @benhalpern @mstruve

@benhalpern (Contributor):

@rhymes thanks for looking into this. Your logic passes muster. I think the place I went wrong a long time ago was understanding the mapping of the surrogate key prefix to the purge method.

If set_surrogate_key_header "comments", edge_cache_keys(@comments) is appropriate, then any comment.purge, where comment is part of the @comments collection, will appropriately purge all the right areas.

I doubt we ever want to use straight up comment.purge_all because that would expire every single page with the comments prefix, which would be unwise.

We could, however, maybe do some custom surrogate keys that apply to types of collections and then manually send the equivalent of purge_all in certain contexts.

For existing stuff I think we can start trying to gradually move to this approach, but there is a lot to keep in mind in terms of how we migrate from old to new. If I'm understanding correctly, both the API and the relevant URLs should share set_surrogate_key_header "comments", but the HTML endpoints would be different, in that they rely on multiple resources and purge for different reasons. E.g. an article show page should have a surrogate key that can expire on any update to the article, the user, or any of the comments, similar to the existing logic, but without relying strictly on the URLs. I'm just not sure how that jives with the above logic. But I think we're getting close to a bigger picture understanding of how this works and what the older problems were.

@rhymes (Contributor, Author) commented Nov 19, 2019

Agreed @benhalpern.

I went ahead and fixed the collection name for the surrogate keys and added .purge. The reason the fastly-rails docs on purging suggest adding .purge_all to the create and destroy operations is to flush the collection, so that a new comment appears (or a deleted one disappears).

> I doubt we ever want to use straight up comment.purge_all because that would expire every single page with the comments prefix, which would be unwise.

> For existing stuff I think we can start trying to gradually move to this approach, but there is a lot to keep in mind in terms of how we migrate from old to new.

> But I think we're getting close to a bigger picture understanding of how this works and what the older problems were.

I'm still not 100% sure how to accomplish all of that on top of the new fastly-rails API. I think it can only be done with fastly-ruby; fastly-rails seems limited to a convention-over-configuration situation where you have straightforward caching of single objects and collections of those objects. For the rest, I think we should use fastly_client.purge(url).

BUT

in this very case I think we have the luxury of operating on an endpoint that is currently broken, so we can literally test in production what happens to its caching behavior as it is gradually fixed.

@rhymes rhymes requested a review from mstruve January 8, 2020 11:31
@mstruve (Contributor) left a comment:

I'm game to try this! Will defer to @benhalpern for the final word and merge since he still understands Fastly the best.

@benhalpern (Contributor) left a comment:

Okay, this looks good to me. I think this should all work and we can figure out how we want to work with the conventions, establish our own, or whatever.

I think we have a clearer picture than ever about how to go about using surrogate keys.

If this seems to be working as expected, we could attempt to add surrogate keys to the various UI comment endpoints.

Outside of Rails and our use cases, I think the explanation of Surrogate keys as a series of strings to partially match against is pretty straightforward. The abstractions on top of that seem to be where things get complicated.

https://docs.fastly.com/en/guides/getting-started-with-surrogate-keys#creating-relationships-between-keys-and-objects

@benhalpern benhalpern merged commit bcd8545 into forem:master Jan 15, 2020
@rhymes rhymes deleted the rhymes/comments-api-fix-caching branch January 15, 2020 15:47
@rhymes (Contributor, Author) commented Jan 15, 2020

I wonder if it would help to collect a list of all the surrogate keys we set in the code, together with what each is used for, to understand what we're dealing with in terms of the "edge caching tree".

@rhymes (Contributor, Author) commented Jan 15, 2020

BTW thanks for merging this, I'll test it and benchmark it right after it's deployed!

VegaFromLyra added a commit to VegaFromLyra/dev.to that referenced this pull request Jan 16, 2020
Moving over the change from forem#4744 into the migrated worker.
benhalpern pushed a commit that referenced this pull request Jan 17, 2020
…5543)

* Add article's record key to Fastly Surrogate-Key to separate caches

After #4744 we successfully setup an auto purging system based on Fastly's surrogate key based cache and purging API.
Unfortunately this worked only in the `:show` action as comments are their own tree-based resource.
The `:index` though, is related to a single article, so without the article's record key in the `Surrogate-Key` header the first time we hit the API the result would become the value stored in Fastly's API, for all articles.

* Fix commentable mock for job spec

* Add comment.commentable.purge to the worker as well

* Add more thorough tests for commentable
atsmith813 pushed a commit to atsmith813/dev.to that referenced this pull request Jan 17, 2020