Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments API: fix caching issues #4744

Merged
merged 14 commits into from Jan 15, 2020
Merged
40 changes: 31 additions & 9 deletions app/controllers/api/v0/comments_controller.rb
Expand Up @@ -3,21 +3,43 @@ module V0
class CommentsController < ApiController
respond_to :json

caches_action :index,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes

caches_action :show,
cache_path: proc { |c| c.params.permit! },
expires_in: 10.minutes
Comment on lines -6 to -12
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After having simplified the queries by effectively making only 1 query for all comments, we might not need these.
Since optimization is the root of all evil ™ and since we effectively have edge caching as well, I decided to go ahead and remove server side caching.

If we see it's a huge deal, we can add it back, perhaps using the regular cache object. See the issue description

before_action :set_cache_control_headers, only: %i[index show]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be called explicitly to configure edge caching, see https://github.com/fastly/fastly-rails#headers


def index
article = Article.find(params[:a_id])
@comments = article.comments.roots.includes(:user).select(%i[id processed_html user_id ancestry])

@comments = article.comments.includes(:user).select(%i[id processed_html user_id ancestry]).arrange
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we tell ancestry to build the comments in a hash that contains all the tress:

- comment 1
  - reply to comment 1
- comment 2
  - reply to comment 2
    - reply to reply to comment 2
- comment 3

It will then be navigated by the edge_cache_keys method and the JSON serializer


set_surrogate_key_header "api_comments_index", edge_cache_keys(@comments)
end

def show
@comment = Comment.find(params[:id].to_i(26))
tree_with_root_comment = Comment.subtree_of(params[:id].to_i(26)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that users are very large objects and we might be loading a lot of them here I tried to look for ways we could limit the user data we were pulling as well and came up with using a joins to help limit the data and eliminate a db hit. But I might be missing something here since I have never worked with ancestry so let me know if I am!

[36] pry(main)> Comment.subtree_of(30).joins(:user).select("users.name, users.twitter_username").select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (1.0ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (5.8ms)  SELECT users.name, users.twitter_username, "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
=> {#<Comment:0x00007f895ced4da0
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[37] pry(main)> Comment.subtree_of(30).includes(:user).select(%i[id processed_html user_id ancestry]).arrange
  Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 30], ["LIMIT", 1]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '30/%' OR "comments"."ancestry" = '30') OR "comments"."id" = 30)
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = $1  [["id", 6]]
=> {#<Comment:0x00007f896ab54578
  id: 30,
  ancestry: nil,
  processed_html: "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body>\n<p>Cleanse lomo trust fund lumbersexual swag kombucha poutine.</p>\n\n</body></html>\n",
  user_id: 6>=>{}}
[38] pry(main)>

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we user joins there will be additional sql query each time we call comment.user (so, N+1)

Copy link
Contributor Author

@rhymes rhymes Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use joins or left_outer_joins, as @lightalloy said, you get a N+1 each time you access comment.user. This is taken from me calling curl http://localhost:3000/api/comments\?a_id\=13 where I just left a bunch of threaded comments in an article. The same happens for the show action because we load all children comments.

Notice how Rails issues a separate SQL query for each user:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.5ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  Comment Load (0.7ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" INNER JOIN "users" ON "users"."id" = "comments"."user_id" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 3], ["LIMIT", 1]]
  User Load (0.5ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 9], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  User Load (0.4ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
  CACHE User Load (0.0ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]

it even signals the N+1 in the logs, thanks to the bullet gem:

GET /api/comments?a_id=13
USE eager loading detected
  Comment => [:user]
  Add to your finder: :includes => [:user]

Unfortunately I can't find a way to use, directly from Rails, select on the preloaded query, in our case the SELECT users.

Rails AFAIK doesn't support select on preloaded relationships, see this comment by a member of the Rails team rails/rails#15185 (comment) - The reason being how includes can switch from preloading to eager loading depending on the algorithm.

I was somehow able, thanks to a StackOverflow answer to come up with the following:

article = Article.find 13
ActiveRecord::Associations::Preloader.new.preload(article.comments.select(%i[id processed_html user_id ancestry]), :user, select: %i[id name username]).first.send(:owners)

but: the API is not public, it forces us to use preload in all cases and it returns an array instead of a ActiveRecord::Relation, which breaks .arrange to create the tree back. There's also a gem, rails_select_on_includes (discouraged by the aforamentioned comment), but other than being a hack, it doesn't work with nested relations, which is what we need in thi case.

The only other way I can think of is to cheat, with something like:

@users = User.where(id: article.comments.pluck(:user_id)).select(%i[id name username twitter_username github_username website_url])
@comments = article.comments.select(%i[id processed_html user_id ancestry]).arrange

and then in the view code we pass along the @users and when it comes to render the user object we do this:

user = @users.find { |u| u.id == comment.user_id }
json.partial! "comment", comment: comment, user: user

I made it work locally. The queries are:

Processing by Api::V0::CommentsController#index as JSON
  Parameters: {"a_id"=>"13"}
  Article Load (0.6ms)  SELECT  "articles".* FROM "articles" WHERE "articles"."id" = $1 LIMIT $2  [["id", 13], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  Comment Load (0.3ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE "comments"."commentable_id" = $1 AND "comments"."commentable_type" = $2  [["commentable_id", 13], ["commentable_type", "Article"]]
  User Load (0.7ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4, $5, $6, $7, $8, $9)  [["id", 6], ["id", 9], ["id", 6], ["id", 3], ["id", 13], ["id", 13], ["id", 13], ["id", 13], ["id", 13]]

We have one more query (to get all the user ids to from comments), but we do indeed have smaller user objects in memory. Let me know what you think.

It would work in a similar way for the show action:

Processing by Api::V0::CommentsController#show as JSON
  Parameters: {"id"=>"1a"}
  Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."id" = $1 LIMIT $2  [["id", 36], ["LIMIT", 1]]
   (0.4ms)  SELECT "comments"."user_id" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  Comment Load (0.6ms)  SELECT "comments"."id", "comments"."processed_html", "comments"."user_id", "comments"."ancestry" FROM "comments" WHERE (("comments"."ancestry" LIKE '6/35/36/%' OR "comments"."ancestry" = '6/35/36') OR "comments"."id" = 36)
  User Load (0.8ms)  SELECT "users"."id", "users"."name", "users"."username", "users"."twitter_username", "users"."github_username", "users"."website_url" FROM "users" WHERE "users"."id" IN ($1, $2, $3, $4)  [["id", 13], ["id", 13], ["id", 13], ["id", 13]]

we would have one more query but can build a custom user object

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I think the hack might be worth it just knowing how big our User objects are and that if we load a lot of them we are going to load a lot of data. What do you both think? I'm game for either way I do worry about memory though a bit with the initial approach.

Copy link
Contributor Author

@rhymes rhymes Nov 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to make a decision about those two without testing them in production

On one side we have "3 queries and more memory" and on the other we have "4 queries and less memory than option 1". Instictively I would lean on the second BUT:

  • the API is currently broken which allows us to have a little bit of wiggle room in testing
  • we have edge caching
  • we can test the current one, benchmark it with tools like wrk or hey by requesting an article with lots of comment, we can literally find in the DB the article that has the most
  • while the benchmark goes we can check what happens to the memory
  • if we are not satisfied we test the second one redoing the benchmark
  • if both are too slow or too resource intensive we add an additional layer of caching, hopefully we don't need it

Either way, I would very much like to benchmark how this endpoint performs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets leave it how it is and we can address it later if performance becomes an issue. No need to optimize prematurely

includes(:user).
select(%i[id processed_html user_id ancestry]).
arrange

# being only one tree we know that the root comment is the first (and only) key
@comment = tree_with_root_comment.keys.first
@comments = tree_with_root_comment[@comment]
Comment on lines +23 to +24
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we use ancestry to have the requested comment and its subtree in separate objects, so we can use the same serializer we use for the index:

- comment 1
  - reply to comment 1
    - reply to reply to comment 1

The whole tree will then be navigated by the edge_cache_keys method and the JSON serializer


set_surrogate_key_header "api_comments_show", edge_cache_keys(tree_with_root_comment)
end

private

# ancestry wraps a single or multiple trees of comments into a single hash,
# in the case of an article comments, the hash has multiple keys (the root comments),
# in the case of a comment and its descendants, the hash has only one key.
# Either way, we need to use recursion to extract all the comment cache keys
# collecting both the keys of each level of root comments and their descendants
# NOTE: the objects are already loaded in memory by "ancestry",
# so no additional SQL query is performed during this extraction, avoiding N+1s
def edge_cache_keys(comments_trees)
keys = comments_trees.keys.map do |comment|
rhymes marked this conversation as resolved.
Show resolved Hide resolved
Array.wrap(comment.record_key) + edge_cache_keys(comments_trees[comment])
end
keys.flatten
end
end
end
Expand Down
6 changes: 6 additions & 0 deletions app/views/api/v0/comments/_comment.json.jbuilder
@@ -0,0 +1,6 @@
json.type_of "comment"

json.id_code comment.id_code_generated
json.body_html comment.processed_html

json.partial! "api/v0/shared/user", user: comment.user
@@ -0,0 +1,6 @@
json.partial! "comment", comment: comment

# recursively render the comment subtree
json.children do
json.partial! "comments_with_children", comments: children
end
@@ -0,0 +1,4 @@
json.array! comments.keys do |root_comment|
# ancestry organizes root comments and their descendants in a hash structure
json.partial! "comment_with_children", comment: root_comment, children: comments[root_comment]
end
44 changes: 1 addition & 43 deletions app/views/api/v0/comments/index.json.jbuilder
@@ -1,43 +1 @@
json.array! @comments do |comment|
json.type_of "comment"

json.id_code comment.id_code_generated
json.body_html comment.processed_html
json.user do
json.name comment.user.name
json.username comment.user.username
json.twitter_username comment.user.twitter_username
json.github_username comment.user.github_username
json.website_url comment.user.processed_website_url
json.profile_image ProfileImage.new(comment.user).get(640)
json.profile_image_90 ProfileImage.new(comment.user).get(90)
end

json.children comment.children.order(score: :desc).each do |children_comment|
json.id_code children_comment.id_code_generated
json.body_html children_comment.processed_html
json.user do
json.name children_comment.user.name
json.username children_comment.user.username
json.twitter_username children_comment.user.twitter_username
json.github_username children_comment.user.github_username
json.website_url children_comment.user.processed_website_url
json.profile_image ProfileImage.new(children_comment.user).get(640)
json.profile_image_90 ProfileImage.new(children_comment.user).get(90)
end

json.children children_comment.children.order(score: :desc).each do |grandchild_comment|
json.id_code grandchild_comment.id_code_generated
json.body_html grandchild_comment.processed_html
json.user do
json.name grandchild_comment.user.name
json.username grandchild_comment.user.username
json.twitter_username grandchild_comment.user.twitter_username
json.github_username grandchild_comment.user.github_username
json.website_url grandchild_comment.user.processed_website_url
json.profile_image ProfileImage.new(grandchild_comment.user).get(640)
json.profile_image_90 ProfileImage.new(grandchild_comment.user).get(90)
end
end
end
end
json.partial! "comments_with_children", comments: @comments
39 changes: 1 addition & 38 deletions app/views/api/v0/comments/show.json.jbuilder
@@ -1,38 +1 @@
json.type_of "comment"
json.id_code @comment.id_code_generated
json.body_html @comment.processed_html
json.user do
json.name @comment.user.name
json.username @comment.user.username
json.twitter_username @comment.user.twitter_username
json.github_username @comment.user.github_username
json.website_url @comment.user.processed_website_url
json.profile_image ProfileImage.new(@comment.user).get(640)
json.profile_image_90 ProfileImage.new(@comment.user).get(90)
end
json.children @comment.children.order("score DESC").each do |children_comment|
json.id_code children_comment.id_code_generated
json.body_html children_comment.processed_html
json.user do
json.name children_comment.user.name
json.username children_comment.user.username
json.twitter_username children_comment.user.twitter_username
json.github_username children_comment.user.github_username
json.website_url children_comment.user.processed_website_url
json.profile_image ProfileImage.new(children_comment.user).get(640)
json.profile_image_90 ProfileImage.new(children_comment.user).get(90)
end
json.children children_comment.children.order("score DESC").each do |grandchild_comment|
json.id_code grandchild_comment.id_code_generated
json.body_html grandchild_comment.processed_html
json.user do
json.name grandchild_comment.user.name
json.username grandchild_comment.user.username
json.twitter_username grandchild_comment.user.twitter_username
json.github_username grandchild_comment.user.github_username
json.website_url grandchild_comment.user.processed_website_url
json.profile_image ProfileImage.new(grandchild_comment.user).get(640)
json.profile_image_90 ProfileImage.new(grandchild_comment.user).get(90)
end
end
end
json.partial! "comment_with_children", comment: @comment, children: @comments
82 changes: 50 additions & 32 deletions spec/requests/api/v0/comments_spec.rb
Expand Up @@ -6,12 +6,12 @@ def json_response
end

let_it_be(:article) { create(:article) }
let_it_be(:root_comment) { create(:comment, commentable: article) }
let_it_be(:child_comment) { create(:comment, commentable: article, parent: root_comment) }
let_it_be(:grandchild_comment) { create(:comment, commentable: article, parent: child_comment) }
let_it_be(:great_grandchild_comment) { create(:comment, commentable: article, parent: grandchild_comment) }

describe "GET /api/comments" do
before do
create(:comment, commentable: article)
end

it "returns not found if wrong article id" do
get "/api/comments?a_id=gobbledygook"
expect(response).to have_http_status(:not_found)
Expand All @@ -23,69 +23,87 @@ def json_response
end

it "does not include children comments in the root list" do
# create child comment
create(:comment, commentable: article, parent: article.comments.first)

get "/api/comments?a_id=#{article.id}"
expected_ids = article.comments.roots.map(&:id_code_generated)
expect(json_response.map { |cm| cm["id_code"] }).to match_array(expected_ids)
end

it "includes children comments in the children list" do
# create child comment
parent_comment = article.comments.first
child_comment = create(:comment, commentable: article, parent: parent_comment)

get "/api/comments?a_id=#{article.id}"
comment_with_children = json_response.detect { |cm| cm["id_code"] == parent_comment.id_code_generated }
comment_with_children = json_response.detect { |cm| cm["id_code"] == root_comment.id_code_generated }
expect(comment_with_children["children"][0]["id_code"]).to eq(child_comment.id_code_generated)
end

it "includes granchildren comments in the children-children list" do
# create granchild comment
root_comment = article.comments.first
child_comment = create(:comment, commentable: article, parent: root_comment)
granchild_comment = create(:comment, commentable: article, parent: child_comment)
it "includes grandchildren comments in the children-children list" do
get "/api/comments?a_id=#{article.id}"

comment_with_descendants = json_response.detect { |cm| cm["id_code"] == root_comment.id_code_generated }
expect(comment_with_descendants["children"][0]["children"][0]["id_code"]).to eq(grandchild_comment.id_code_generated)
end

it "includes great-grandchildren comments in the children-children-children list" do
get "/api/comments?a_id=#{article.id}"

comment_with_descendants = json_response.detect { |cm| cm["id_code"] == root_comment.id_code_generated }
expect(comment_with_descendants["children"][0]["children"][0]["id_code"]).to eq(granchild_comment.id_code_generated)
json_great_grandchild_id_code = comment_with_descendants["children"][0]["children"][0]["children"][0]["id_code"]
expect(json_great_grandchild_id_code).to eq(great_grandchild_comment.id_code_generated)
end

it "sets the correct edge caching surrogate key for all the comments" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test for the surrogate keys

sibling_root_comment = create(:comment, commentable: article)

get "/api/comments?a_id=#{article.id}"

expected_key = [
"api_comments_index", sibling_root_comment.record_key,
root_comment.record_key, child_comment.record_key,
grandchild_comment.record_key, great_grandchild_comment.record_key
].join(" ")
expect(response.headers["surrogate-key"]).to eq(expected_key)
end
end

describe "GET /api/comments/:id" do
let_it_be(:comment) { create(:comment, commentable: article) }

it "returns not found if wrong comment id" do
get "/api/comments/foobar"
expect(response).to have_http_status(:not_found)
end

it "returns the comment" do
get "/api/comments/#{comment.id_code_generated}"
expect(json_response["id_code"]).to eq(comment.id_code_generated)
get "/api/comments/#{root_comment.id_code_generated}"
expect(json_response["id_code"]).to eq(root_comment.id_code_generated)
end

it "includes children comments in the children list" do
# create child comment
child_comment = create(:comment, commentable: article, parent: comment)
get "/api/comments/#{root_comment.id_code_generated}"

get "/api/comments/#{comment.id_code_generated}"
comment_with_children = json_response
expect(comment_with_children["children"][0]["id_code"]).to eq(child_comment.id_code_generated)
end

it "includes granchildren comments in the children-children list" do
# create granchild comment
root_comment = comment
child_comment = create(:comment, commentable: article, parent: root_comment)
granchild_comment = create(:comment, commentable: article, parent: child_comment)
it "includes grandchildren comments in the children-children list" do
get "/api/comments/#{root_comment.id_code_generated}"

comment_with_descendants = json_response
expect(comment_with_descendants["children"][0]["children"][0]["id_code"]).to eq(grandchild_comment.id_code_generated)
end

get "/api/comments/#{comment.id_code_generated}"
it "includes great-grandchildren comments in the children-children-children list" do
get "/api/comments/#{root_comment.id_code_generated}"

comment_with_descendants = json_response
expect(comment_with_descendants["children"][0]["children"][0]["id_code"]).to eq(granchild_comment.id_code_generated)
json_great_grandchild_id_code = comment_with_descendants["children"][0]["children"][0]["children"][0]["id_code"]
expect(json_great_grandchild_id_code).to eq(great_grandchild_comment.id_code_generated)
end

it "sets the correct edge caching surrogate key for all the comments" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added test for the surrogate keys

get "/api/comments/#{root_comment.id_code_generated}"

expected_key = [
"api_comments_show", root_comment.record_key, child_comment.record_key,
grandchild_comment.record_key, great_grandchild_comment.record_key
].join(" ")
expect(response.headers["surrogate-key"]).to eq(expected_key)
end
end
end