
Comments API: fix caching issues #4744

Merged 14 commits into master from rhymes/comments-api-fix-caching on Jan 15, 2020

Conversation

@rhymes (Contributor) commented Nov 7, 2019

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Documentation Update

Description

What's the problem - caching is hard

The main problem this PR sets out to fix is that caching is done incorrectly (and perhaps too eagerly) on the two /api/comments actions: index and show.

Currently there are two caching layers that don't play well with each other: server side action caching and edge caching:

  • action caching was removed from Rails starting with version 4 (in 2013!) because invalidating such an opaque cache is too complicated; the preferred method is key-based expiration. With this refactoring I don't think we need an additional caching layer, but we can revisit that afterwards. I will benchmark a few endpoints with a lot of comments and see if we indeed need one (if we do, we can use the regular cache object(s) or cache the view; see the sketch after this list)

  • edge caching is misconfigured. According to fastly-rails we need to set a unique cache control key; since this is an endpoint whose content uniqueness is determined by a tree (or multiple trees) of comments, we need the comments' cache keys to construct a truly unique key.
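
For reference, a minimal sketch of what a key-based server side layer could look like if benchmarks say we need one (the helper name and key layout here are hypothetical, not part of this PR):

def cached_comments_tree(article)
  # The key embeds the newest comment timestamp: any comment change yields
  # a fresh key, and stale entries simply age out of the store instead of
  # requiring explicit invalidation.
  key = [article.id, "api_comments", article.comments.maximum(:updated_at)]
  Rails.cache.fetch(key, expires_in: 10.minutes) do
    article.comments.select(%i[id processed_html user_id ancestry]).arrange
  end
end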

How did I go about the solution - what does this PR contain?

  1. index finally returns all the comments and their descendants
  2. show finally returns the requested comment and its descendants
  3. there are more tests
  4. there are fewer queries than before, even though we load more data by going through all descendants
  5. the API response format didn't change; the only addition is a children array, present even for comments that have no children, which is best TBH because it makes displaying a tree (or recursing) on the client side easier
  6. all the N+1 queries that existed before are now removed

How did we do it? By using recursion to walk single or multiple trees, with the help of ancestry's arrange method.

Recursion

We use recursion in two places: building the edge cache keys (collecting the unique cache key of each comment in the tree) and building the JSON inside the Jbuilder files.
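
A minimal sketch of the first (simplified, and assuming fastly-rails' record_key helper; the method in the actual diff may differ in the details):

def edge_cache_keys(comments_trees)
  # comments_trees is the nested Hash built by ancestry's arrange:
  # each key is a comment, each value is the Hash of its children.
  comments_trees.flat_map do |comment, subtree|
    [comment.record_key] + edge_cache_keys(subtree)
  end
end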

Ancestry "arrange"

The ancestry gem uses a very slick trick in which it loads all the comments of a subtree in one single SQL query and then arranges them in a hash that can be manipulated at runtime. This allows us to walk the tree without having to go back to the DB for each subtree level.
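
Schematically (illustrative objects, not real data):

article.comments.select(%i[id processed_html user_id ancestry]).arrange
# => { #<Comment id: 1> => { #<Comment id: 4> => {} },
#      #<Comment id: 2> => { #<Comment id: 7> => { #<Comment id: 9> => {} } },
#      #<Comment id: 3> => {} }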

Queries before and after

# INDEX BEFORE

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"25"}
  Article Load (14.9ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 25], ["LIMIT", 1]]
  Comment Load (7.9ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2 AND "comments"."ancestry" IS NULL  [["commentable_id", 25], ["commentable_type", "Article"]]
  User Load (15.3ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2)  [["id", 5], ["id", 8]]
  Comment Load (0.7ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9' ORDER BY "comments"."score" DESC
  Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '22' ORDER BY "comments"."score" DESC

# INDEX AFTER

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"25"}
  Article Load (1.0ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 25], ["LIMIT", 1]]
  Comment Load (0.5ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 25], ["commentable_type", "Article"]]
  User Load (3.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN ($1, $2, $3)  [["id", 5], ["id", 8], ["id", 11]]

# SHOW BEFORE

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"9"}
  Comment Load (11.0ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (16.8ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 5], ["LIMIT", 1]]
  Comment Load (1.0ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9' ORDER BY score DESC
  User Load (0.6ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 11], ["LIMIT", 1]]
  Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."ancestry" = '9/31' ORDER BY score DESC
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 11], ["LIMIT", 1]]

# SHOW AFTER

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"9"}
  Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (0.6ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 5]]
  Comment Load (0.4ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE ("comments"."ancestry" LIKE '9/%' OR "comments"."ancestry" = '9')
  User Load (0.8ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 11]]

Related Tickets & Documents

Closes #2250

@rhymes (Contributor, Author) left a comment:

Added a few notes to explain the PR further

Comment on lines -6 to -12
caches_action :index,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes

caches_action :show,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes
@rhymes (Contributor, Author):

After simplifying things down to effectively one query for all comments, we might not need these.
Since premature optimization is the root of all evil ™ and since we effectively have edge caching as well, I decided to go ahead and remove server side caching.

If it turns out to be a big deal, we can add it back, perhaps using the regular cache object. See the issue description.

caches_action :show,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes
before_action :set_cache_control_headers, only: %i[index show]
@rhymes (Contributor, Author):

This needs to be called explicitly to configure edge caching, see https://github.com/fastly/fastly-rails#headers


def index
article = Article.find(params[:a_id])
@comments = article.comments.roots.includes(:user).select(%i[id processed_html user_id ancestry])

@comments = article.comments.includes(:user).select(%i[id processed_html user_id ancestry]).arrange
@rhymes (Contributor, Author):

Here we tell ancestry to arrange the comments into a hash that contains all the trees:

- comment 1
  - reply to comment 1
- comment 2
  - reply to comment 2
    - reply to reply to comment 2
- comment 3

It will then be walked by the edge_cache_keys method and the JSON serializer.
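
A condensed sketch of the recursive Jbuilder partial that renders such a tree (attribute names are illustrative and simplified from the actual view):

# app/views/api/v0/comments/_comment.json.jbuilder
json.id_code comment.id_code_generated
json.body_html comment.processed_html
# subtree is the nested Hash of children built by arrange; leaves simply
# render an empty children array, which keeps client-side recursion easy
json.children subtree do |child, child_subtree|
  json.partial! "comment", comment: child, subtree: child_subtree
end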

Comment on lines +23 to +24
@comment = tree_with_root_comment.keys.first
@comments = tree_with_root_comment[@comment]
@rhymes (Contributor, Author):

Here we use ancestry to split the requested comment and its subtree into separate objects, so we can use the same serializer we use for the index:

- comment 1
  - reply to comment 1
    - reply to reply to comment 1

The whole tree will then be walked by the edge_cache_keys method and the JSON serializer.
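
Putting the pieces together, the show action looks roughly like this (a simplified sketch assembled from the diff fragments above, not the verbatim code):

def show
  tree_with_root_comment = Comment.subtree_of(params[:id].to_i(26)). # id_code is base-26
    includes(:user).
    select(%i[id processed_html user_id ancestry]).
    arrange
  @comment = tree_with_root_comment.keys.first
  @comments = tree_with_root_comment[@comment]
end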

expect(json_great_grandchild_id_code).to eq(great_grandchild_comment.id_code_generated)
end

it "sets the correct edge caching surrogate key for all the comments" do
@rhymes (Contributor, Author):

Added a test for the surrogate keys.

@rhymes (Contributor, Author) commented Nov 7, 2019

NOTE: we also need to clear the edge cache after merging this, the same way we've done with other API routes that had caching issues

@rhymes changed the title from "[WIP] Comments API: fix caching issues" to "Comments API: fix caching issues" on Nov 7, 2019
@mstruve (Contributor) left a comment:

Couple optimization thoughts, let me know what you think!

app/controllers/api/v0/comments_controller.rb (outdated, resolved)
end

def show
@comment = Comment.find(params[:id].to_i(26))
tree_with_root_comment = Comment.subtree_of(params[:id].to_i(26)).
@mstruve (Contributor):

Given that users are very large objects and we might be loading a lot of them here, I tried to look for ways we could limit the user data we pull as well, and came up with using a joins to limit the data and eliminate a DB hit. But I might be missing something here, since I have never worked with ancestry, so let me know if I am!

[36] pry(main)> Comment.subtree_of(30).joins(:user).select("users.name, users.twitter_username").select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (1.0ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (5.8ms)  SELECT users.name, users.twitter_username, "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
=> {#<Comment:0x00007f895ced4da0
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[37] pry(main)> Comment.subtree_of(30).includes(:user).select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 6]]
=> {#<Comment:0x00007f896ab54578
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[38] pry(main)>

@lightalloy (Contributor):

If we use joins, there will be an additional SQL query each time we call comment.user (so, N+1).

@rhymes (Contributor, Author) commented Nov 13, 2019:

We can't use joins or left_outer_joins: as @lightalloy said, you get an N+1 each time you access comment.user. The following is taken from me calling curl http://localhost:3000/api/comments\?a_id\=13 on an article where I had just left a bunch of threaded comments. The same happens for the show action, because we load all children comments.

Notice how Rails issues a separate SQL query for each user:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.5ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  Comment Load (0.7ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 3], ["LIMIT", 1]]
  User Load (0.5ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]

The bullet gem even flags the N+1 in the logs:

GET /api/comments?a_id=13
USE eager loading detected
  Comment => [:user]
  Add to your finder: :includes => [:user]

Unfortunately I can't find a way, directly from Rails, to use select on the preloaded query (in our case, the SELECT on users).

Rails AFAIK doesn't support select on preloaded relationships, see this comment by a member of the Rails team: rails/rails#15185 (comment). The reason is that includes can switch from preloading to eager loading depending on the query.

I was somehow able, thanks to a StackOverflow answer, to come up with the following:

article = Article.find 13
ActiveRecord::Associations::Preloader.new.preload(article.comments.select(%i[id processed_html user_id ancestry]), :user, select: %i[id name username]).first.send(:owners)

but: the API is not public, it forces us to use preload in all cases, and it returns an Array instead of an ActiveRecord::Relation, which breaks the .arrange call that rebuilds the tree. There's also a gem, rails_select_on_includes (discouraged by the aforementioned comment), but other than being a hack, it doesn't work with nested relations, which is what we need in this case.

The only other way I can think of is to cheat, with something like:

@users = User.where(id: article.comments.pluck(:user_id)).select(%i[id name username twitter_username github_username website_url])
@comments = article.comments.select(%i[id processed_html user_id ancestry]).arrange

and then in the view code we pass @users along, and when it comes time to render the user object we do this:

user = @users.find { |u| u.id == comment.user_id }
json.partial! "comment", comment: comment, user: user

I made it work locally. The queries are:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.6ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.7ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)  [["id", 6], ["id", 9], ["id", 6], ["id", 3], ["id", 13], ["id", 13], ["id", 13], ["id", 13], ["id", 13]]

We have one more query (to get all the user ids from the comments), but we do indeed have smaller user objects in memory. Let me know what you think.

It would work in a similar way for the show action:

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"1a"}
  Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 36], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  Comment Load (0.6ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  User Load (0.8ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4)  [["id", 13], ["id", 13], ["id", 13], ["id", 13]]

We would have one more query, but we can build a custom user object.

@mstruve (Contributor):

Hmmm, I think the hack might be worth it just knowing how big our User objects are, and that if we load a lot of them we are going to load a lot of data. What do you both think? I'm game for either way; I do worry a bit about memory with the initial approach, though.

@rhymes (Contributor, Author) commented Nov 13, 2019:

It's hard to make a decision between those two without testing them in production.

On one side we have "3 queries and more memory", on the other "4 queries and less memory than option 1". Instinctively I would lean towards the second, BUT:

  • the API is currently broken which allows us to have a little bit of wiggle room in testing
  • we have edge caching
  • we can test the current one and benchmark it with tools like wrk or hey by requesting an article with lots of comments; we can literally find in the DB the article that has the most
  • while the benchmark goes we can check what happens to the memory
  • if we are not satisfied we test the second one redoing the benchmark
  • if both are too slow or too resource intensive we add an additional layer of caching, hopefully we don't need it

Either way, I would very much like to benchmark how this endpoint performs

@mstruve (Contributor):

Let's leave it how it is and we can address it later if performance becomes an issue. No need to optimize prematurely.

mstruve previously approved these changes Nov 13, 2019
@mstruve (Contributor) left a comment:

LGTM 🚀

@benhalpern (Contributor):

I might be missing something, but how are we invalidating the caches here? I'm definitely 100% for this approach, but I think we're still missing the concept of purging as used in Fastly-Rails.

Our alternate approach, based on URL endpoints, is a hack that would be nice to eventually work out of the codebase, but I don't see how this is currently purging the appropriate caches here.

@benhalpern (Contributor):

> NOTE: we also need to clear the edge cache after merging this, the same way we've done with other API routes that had caching issues

This had not previously been cached on the edge, only in Memcache, so by moving from one to the other, we should be good, right?

@rhymes (Contributor, Author) commented Nov 13, 2019

@benhalpern I thought about it and assumed that the CacheBuster stuff would actually do that. Otherwise, why would we have set_cache_control_headers around the code without the proper fastly-rails .purge/.purge_all calls? Wouldn't the other resources not get purged if that didn't work?

Right now the comments cache is purged by https://github.com/thepracticaldev/dev.to/blob/master/app/services/edge_cache/commentable/bust.rb, which is called by https://github.com/thepracticaldev/dev.to/blob/master/app/jobs/comments/bust_cache_job.rb, which in turn is called by the comment's after_save and before_destroy callbacks. There's also https://github.com/thepracticaldev/dev.to/blob/880f60beffa8b246f65d6b77ad0594a36242d8bd/app/models/comment.rb#L279, which is called after save.

Let me know if I'm missing something here

@benhalpern (Contributor):

synchronous_bust is essentially the portion of the cache busting we wouldn't want to wait on, in case the user immediately refreshes. Other stuff can be async.

set_cache_control_headers sets headers so that Fastly knows to cache something at all. But once it's there, there are different ways, technically, to purge an endpoint or group of endpoints. We do it by telling Fastly which URLs to bust, which works, but isn't easily scalable to arbitrary different endpoints. It kind of locks us in to a simple set of URLs and such. That's kind of an interesting constraint that's not all bad, but it really is a smelly part of the code in general.

@rhymes (Contributor, Author) commented Nov 14, 2019

@benhalpern what route do you propose for making sure we're purging these endpoints?

I can hook up an HTTP logger, see what happens with .purge and .purge_all, and check which URLs they call; maybe we can then add those URLs to the bust_comment method. How does that sound?

@rhymes (Contributor, Author) commented Nov 14, 2019

I went ahead and hooked httplog into the cache buster that is automatically called when an article's comment is updated. What happens is the following:

[5] pry(main)> CacheBuster.new.bust_comment(article)
   (1.4ms)  SELECT  "articles"."id" FROM "articles" WHERE "articles"."published" = $1 ORDER BY hotness_score DESC LIMIT $2  [["published", true], ["LIMIT", 3]]
   (0.3ms)  BEGIN
  Article Update (0.6ms)  UPDATE "articles" SET "updated_at" = $1, "last_comment_at" = $2 WHERE "articles"."id" = $3  [["updated_at", "2019-11-14 11:44:23.565757"], ["last_comment_at", "2019-11-14 11:44:23.565757"], ["id", 13]]
   (47.7ms)  COMMIT
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/6?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/auerspencer/comment/f?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/trantow_faviola/comment/k?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/mitchtreutel/comment/11?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/19?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1a?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/rhymes/comment/1b?i=i?i=i
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/*
[httplog] Sending: POST http://api.fastly.com:443/purge/https://dev.to/coraleeabshire/the-sun-also-rises-3i1h/comments/*?i=i

So we send a bunch of POSTs (2 or 3 per comment, plus a generic one) using the "legacy" purging mechanism:

> Previously purging made a POST call to the /purge endpoint of the Fastly API.
> The new method of purging is done by making an HTTP request against the URL using the PURGE HTTP method.

This is confirmed by my tests of using fastly-ruby's purge manually:

[13] pry(main)> url = "https://dev.to/rhymes/comment/1c?i=i?i=i"
=> "https://dev.to/rhymes/comment/1c?i=i?i=i"
[14] pry(main)> fastly.purge(url)
D, [2019-11-14T12:51:38.107435 #76762] DEBUG -- : [httplog] Connecting: dev.to:443
D, [2019-11-14T12:51:38.427354 #76762] DEBUG -- : [httplog] Sending: PURGE http://dev.to:443/rhymes/comment/1c?i=i?i=i

The fastly-rails methods instead (correctly, in line with the point of using set_cache_control_headers and set_surrogate_key_header) use the new Fastly Purge API:

[15] pry(main)> comment.purge
D, [2019-11-14T13:01:02.677740 #76762] DEBUG -- : [httplog] Sending: POST http://api.fastly.com:443/service/Optional/purge/comments/1
[16] pry(main)> comment.purge_all
D, [2019-11-14T13:04:31.374067 #76762] DEBUG -- : [httplog] Sending: POST http://api.fastly.com:443/service/Optional/purge/comments

where Optional should be the configured FASTLY_SERVICE_ID.

Given all of this I'm not sure how we should proceed; the set_surrogate_key_header calls I have added are probably slightly incorrect:

set_surrogate_key_header "api_comments_index", edge_cache_keys(@comments)

should be

set_surrogate_key_header "comments", edge_cache_keys(@comments)

(the documentation uses the name of the collection/table as the prefix which, as you can see in the output of comment.purge, is the correct one)
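
Illustratively (assuming fastly-rails' default record keys of the form table_name/id), the index response would then carry a header like:

Surrogate-Key: comments comments/9 comments/31 comments/42

A purge of comments/9 would invalidate every cached response tagged with that comment's key, while a purge of the bare comments key would flush them all.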

My proposal is the following:

  • I fix the surrogate key header names
  • I add .purge and .purge_all where relevant, in the bust cache job (sketched below)
  • we test it

Otherwise I can see if I can set up a Fastly account for development purposes and use that.
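
For concreteness, the second step could look roughly like this (a hypothetical sketch on top of the existing job, whose internals are simplified here; not a final diff):

class Comments::BustCacheJob < ApplicationJob
  def perform(comment_id)
    comment = Comment.find_by(id: comment_id)
    return unless comment

    # existing URL-based busting stays for the HTML endpoints...
    CacheBuster.new.bust_comment(comment.commentable)
    # ...while fastly-rails purges everything tagged with this comment's
    # surrogate key, covering the API responses
    comment.purge
  end
end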

Let me know @benhalpern @mstruve

@benhalpern (Contributor):

@rhymes thanks for looking into this. Your logic passes muster. I think the place I went wrong a long time ago was understanding the mapping of the surrogate key prefix to the purge method.

If set_surrogate_key_header "comments", edge_cache_keys(@comments) is appropriate, then any comment.purge, where comment is part of the @comments collection, will appropriately purge all the right areas.

I doubt we ever want to use straight up comment.purge_all because that would expire every single page with the comments prefix, which would be unwise.

We could, however, maybe do some custom surrogate keys that apply to types of collections and then manually send the equivalent of purge_all in certain contexts.

For existing stuff I think we can start trying to gradually move to this approach, but there is a lot to keep in mind in terms of how we migrate from old to new. If I'm understanding correctly, both the API and the relevant URLs should share set_surrogate_key_header "comments", but the HTML endpoints would be different, in that they rely on multiple resources and purge for different reasons. E.g. an article show page should have a surrogate key that can expire on any update to the article, the user, or any of the comments, similar to the existing logic, but without relying strictly on the URLs. I'm just not sure how that jives with the above logic. But I think we're getting close to a bigger picture understanding of how this works and what the older problems were.

@rhymes (Contributor, Author) commented Nov 19, 2019

Agreed @benhalpern.

I went ahead and fixed the collection name for the surrogate keys and added .purge. The reason the fastly-rails docs on purging suggest adding .purge_all to the create and destroy operations is to flush the collection, so that a new comment appears (or a deleted one disappears).

> I doubt we ever want to use straight up comment.purge_all because that would expire every single page with the comments prefix, which would be unwise.

> For existing stuff I think we can start trying to gradually move to this approach, but there is a lot to keep in mind in terms of how we migrate from old to new.

> But I think we're getting close to a bigger picture understanding of how this works and what the older problems were.

I'm still not 100% sure how to accomplish all of that on top of the new fastly-rails API. I think it can only be done with fastly-ruby; fastly-rails seems limited to a convention-over-configuration situation where you have straightforward caching of single objects and collections of those objects. For the rest, I think we should use fastly_client.purge(url).

BUT

in this very case I think we have the luxury of operating on an endpoint that is currently broken, so we can literally test in production what happens to its caching behavior as it is gradually fixed.

@rhymes rhymes requested a review from mstruve January 8, 2020 11:31
@mstruve (Contributor) left a comment:

I'm game to try this! Will defer to @benhalpern for the final word and merge since he still understands Fastly the best.

@benhalpern (Contributor) left a comment:

Okay, this looks good to me. I think this should all work and we can figure out how we want to work with the conventions, establish our own, or whatever.

I think we have a clearer picture than ever about how to go about using surrogate keys.

If this seems to be working as expected, we could attempt to add surrogate keys to the various UI comment endpoints.

Outside of Rails and our use cases, I think the explanation of Surrogate keys as a series of strings to partially match against is pretty straightforward. The abstractions on top of that seem to be where things get complicated.

https://docs.fastly.com/en/guides/getting-started-with-surrogate-keys#creating-relationships-between-keys-and-objects

@benhalpern benhalpern merged commit bcd8545 into forem:master Jan 15, 2020
@rhymes rhymes deleted the rhymes/comments-api-fix-caching branch January 15, 2020 15:47
@rhymes (Contributor, Author) commented Jan 15, 2020

I wonder if it would help to collect a list of all the surrogate keys we set in the code, together with what each is used for, to understand what we're dealing with in terms of the "edge caching tree".

@rhymes (Contributor, Author) commented Jan 15, 2020

BTW thanks for merging this, I'll test it and benchmark it right after it's deployed!

VegaFromLyra added a commit to VegaFromLyra/dev.to that referenced this pull request Jan 16, 2020
Moving over the change from forem#4744 into the migrated worker.
benhalpern pushed a commit that referenced this pull request Jan 17, 2020
…5543)

* Add article's record key to Fastly Surrogate-Key to separate caches

After #4744 we successfully setup an auto purging system based on Fastly's surrogate key based cache and purging API.
Unfortunately this worked only in the `:show` action as comments are their own tree-based resource.
The `:index` though, is related to a single article, so without the article's record key in the `Surrogate-Key` header the first time we hit the API the result would become the value stored in Fastly's API, for all articles.

* Fix commentable mock for job spec

* Add comment.commentable.purge to the worker as well

* Add more thorough tests for commentable
atsmith813 pushed a commit to atsmith813/dev.to that referenced this pull request Jan 17, 2020