Use collection ids and timestamps to generate truly unique collection cache keys #21503


Closed
wants to merge 4 commits into from

Conversation

christos
Contributor

@christos christos commented Sep 4, 2015

Pull request #20884 introduced collection cache keys to help cache partials for entire collections, provided the collection members update their timestamp columns when they change.

However, as highlighted here and here, the solution does not create a unique cache key identifying the collection contents in the following two scenarios.

1 - Collection count SQL is not compatible with limit()

The collection size SQL generated by the original PR's code for Developer.where(name: "David").limit(1) is incorrect:

SELECT COUNT(*) AS size, MAX("developers"."updated_at") AS timestamp 
  FROM "developers"
  WHERE "developers"."name" = 'David' 
  LIMIT 1

The size returned by the above query is not what the LIMIT specifies, but the count of all developers whose name is David: the LIMIT applies to the rows returned by the aggregate query (a single row), not to the rows being counted. This is expected SQL behaviour, by the way.
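Not part of this PR, but for illustration: one conventional way to make the aggregates respect the LIMIT is to aggregate over a subquery, so the LIMIT is applied before COUNT(*) and MAX(...) run. A sketch only (the `limited` alias is hypothetical):

```sql
-- Sketch only: apply the LIMIT inside a subquery, then aggregate over it,
-- so size reflects the limited result set rather than all matching rows.
SELECT COUNT(*) AS size, MAX(limited."updated_at") AS timestamp
  FROM (
    SELECT "developers"."updated_at"
      FROM "developers"
     WHERE "developers"."name" = 'David'
     LIMIT 1
  ) AS limited
```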

2 - Replacing an old record in the collection does not invalidate the cache key

This has already been explained by @swalkinshaw here and the following test fails against master.

test "collection_cache_key changes when old collection members are replaced" do
  project = Project.create
  project.developers.create(updated_at: 2.hours.ago, name: "anonymous")
  project.developers.create(updated_at: 4.hours.ago, name: "eponymous")

  key1 = project.developers.collection_cache_key

  project.developers.where(name: "eponymous").destroy_all
  project.developers.create(updated_at: 5.hours.ago, name: "anonymous")

  key2 = project.developers.collection_cache_key

  assert_not_equal key2, key1
end

The solution proposed by @swalkinshaw here is what I have been using successfully in production, but in the form of an application helper called cache_collection. The helper mirrors the cache view helper, accepting a collection instead of a record.

This PR implements the same solution as my cache_collection helper, but using the collection_cache_key approach from #20884.

I've made a few modifications.

I've removed the SQL query signature digest, because the query contents (i.e. ids and timestamps) can uniquely identify a collection with records whose timestamp columns are properly updated when the records change.

Instead of plucking just the collection ids and the maximum timestamp, I chose to pluck all timestamps, taking advantage of the fact that the pluck method fires just one query for both ids and timestamps.

I've also switched to a SHA256 digest to minimise the probability of a collision. My understanding is that, given the large size of the digested content, a collision is very improbable.
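For illustration, a minimal plain-Ruby sketch of how such a digested key could be built (no Rails here: the `developers` prefix and the sample rows stand in for the model name and the result of `pluck(primary_key, timestamp_column)`):

```ruby
require "digest"

# Stand-in for collection.pluck(primary_key, timestamp_column):
# each entry is [id, updated_at]; the sample rows are made up.
plucked = [
  [1, "2015-09-04 10:00:00"],
  [2, "2015-09-04 11:30:00"],
]

# Flatten ids and timestamps into one signature string and digest it.
# Any change to membership, order, or a timestamp changes the digest.
unique_signature = plucked.flatten.join("-")
cache_key = "developers/collection-digest-#{Digest::SHA256.hexdigest(unique_signature)}"

puts cache_key
```

The digest keeps the key a fixed length no matter how large the collection grows, while remaining sensitive to every id and timestamp in it.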

Personally, I prefer the view helper approach, as it makes it easier to see that asking for a collection cache key will incur a database request.

Having said that, and seeing as #20884 also incurs a database query, I think this more complete solution will save people some headaches when they encounter the above scenarios where the original PR would fail to provide a correct cache key.

If the Rails core team is happy with my implementation in this PR, then let me know and I will update the method documentation as well.

@rails-bot

r? @carlosantoniodasilva

(@rails-bot has picked a reviewer for you, use r? to override)

@swalkinshaw
Contributor

@christos this looks like a good solution. Much simpler as well 👍


size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).flatten.join("-")
Contributor


Just a thought: Perhaps freeze the two "-" strings while you're at it, since master targets Ruby 2.2+? For what it's worth, the same string is frozen throughout ActiveSupport::Inflector.

Contributor Author


@jonatack Done.

@christos
Contributor Author

Is there any interest in this being merged?

r? @sgrif this is related to a feature you merged back in July.

@manuelmeurer
Contributor

👍 Let's get this merged!

@sgrif
Contributor

sgrif commented Oct 23, 2015

I'm at a conference but I have this pinned and will look at it soon.

On Fri, Oct 23, 2015, 1:55 PM Manuel Meurer notifications@github.com wrote:

👍 Let's get this merged!


Reply to this email directly or view it on GitHub
#21503 (comment).

@christos
Contributor Author

@sgrif Ping?


size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).flatten.join("-".freeze)
Contributor


What do you mean by PostgreSQL order failures? If the scope has an order then the cache key query has to preserve it, otherwise it won't match the same set of records.

[3] pry(main)> Post
=> Post(id: integer, published_at: datetime, created_at: datetime, updated_at: datetime)
[4] pry(main)> 20.times { |i| Post.create!(published_at: Time.current - 1.hour * i) }
[5] pry(main)> Post.count
   (0.1ms)  SELECT COUNT(*) FROM "posts"
=> 20
[6] pry(main)> Post.order(published_at: :desc).limit(10).pluck(:id, :updated_at)
   (0.3ms)  SELECT  "posts"."id", "posts"."updated_at" FROM "posts" ORDER BY "posts"."published_at" DESC LIMIT 10
=> [[1, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [2, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [3, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [4, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [5, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [6, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [7, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [8, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [9, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [10, Wed, 25 Nov 2015 21:40:23 UTC +00:00]]
[7] pry(main)> Post.order(published_at: :asc).limit(10).pluck(:id, :updated_at)
   (0.2ms)  SELECT  "posts"."id", "posts"."updated_at" FROM "posts" ORDER BY "posts"."published_at" ASC LIMIT 10
=> [[20, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [19, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [18, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [17, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [16, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [15, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [14, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [13, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [12, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
 [11, Wed, 25 Nov 2015 21:40:23 UTC +00:00]]
[8] pry(main)> Post.order(published_at: :desc).limit(10).cache_key
   (0.2ms)  SELECT  "posts"."id", "posts"."updated_at" FROM "posts" LIMIT 10
=> "posts/collection-digest-7f2daae36b230c64b72b28cafd70ddbd07d0f6369fb46e3155cc90ebcdad5058"
[9] pry(main)> Post.order(published_at: :asc).limit(10).cache_key
   (0.1ms)  SELECT  "posts"."id", "posts"."updated_at" FROM "posts" LIMIT 10
=> "posts/collection-digest-7f2daae36b230c64b72b28cafd70ddbd07d0f6369fb46e3155cc90ebcdad5058"

Contributor Author


@hdabrows Well spotted.

You can see the Postgres failure in this build

The real culprit is pluck which, for an association that goes through a join table, generates a SELECT DISTINCT query whose ORDER BY expressions do not all appear in the select list, as Postgres requires.

ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ... "developers_projects"."project_id" = $1 ORDER BY developers...
                                                             ^
: SELECT DISTINCT "developers"."id", "developers"."updated_at" FROM "developers" INNER JOIN "developers_projects" ON "developers"."id" = "developers_projects"."developer_id" WHERE "developers_projects"."project_id" = $1 ORDER BY developers.name desc, developers.id desc

To be honest, I have no idea how to pluck what I need here without falling back to executing the entire query without pluck.

Any ideas?



@christos, I opened a pull request to try to fix this: #28075
I am not sure if this is a good idea.

@kipcole9

Looks like this change would also fix the bug that results in an exception from trying to generate a cache key on an empty unloaded collection:

        query = collection
          .select("COUNT(*) AS size", "MAX(#{column}) AS timestamp")
          .unscope(:order)
        result = connection.select_one(query)
        size = result["size"]    # exception when result is nil
        timestamp = column_type.deserialize(result["timestamp"])  # exception when result is nil
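A hedged plain-Ruby sketch of such a guard (no Rails here; `result` stands in for what `connection.select_one` returns, and `cache_key_parts` is a hypothetical helper, not anything in the PR):

```ruby
# Hypothetical guard: select_one can return nil for an empty, unloaded
# collection, so treat nil as "no rows" instead of raising NoMethodError.
def cache_key_parts(result)
  return [0, nil] if result.nil?
  [result["size"].to_i, result["timestamp"]]
end

p cache_key_parts(nil)                                         # => [0, nil]
p cache_key_parts("size" => "2", "timestamp" => "2015-09-04")  # => [2, "2015-09-04"]
```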


size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.pluck(primary_key, timestamp_column).flatten.join("-".freeze)
Contributor


@christos now that you've removed the need for unscope(:order), these statements are identical.

@cantonic

This is what I have implemented in a project inspired by this PR:

module CollectionCacheKey
  def collection_cache_key(collection = all, timestamp_column = :updated_at)
    model_signature = collection.model_name.cache_key

    if collection.loaded?
      unique_signature = collection.pluck(primary_key, timestamp_column).to_s
    else
      unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).to_s
    end

    "#{model_signature}/collection-digest-#{unique_signature}"
  end
end

Note that I have used to_s instead of flatten.join("-".freeze) for stringifying the unique_signature.

I found out through a quick benchmark that this is the fastest way to generate a unique cache key.

Even if you end up with less elegant cache keys like foo/collection-digest-[1, 2, 3] instead of foo/collection-digest-1-2-3, I think performance should trump style in this case, since caching is about maximising speed.
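A quick sketch of that comparison in plain Ruby (the sample data is made up, and the exact timings will vary by machine and Ruby version):

```ruby
require "benchmark"

# Made-up stand-in for collection.pluck(primary_key, timestamp_column).
pairs = Array.new(1_000) { |i| [i, "2015-09-04 10:00:#{format('%02d', i % 60)}"] }

# Compare the two stringification strategies discussed above.
Benchmark.bm(14) do |x|
  x.report("to_s:")         { 1_000.times { pairs.to_s } }
  x.report("flatten.join:") { 1_000.times { pairs.flatten.join("-") } }
end
```

Both forms are deterministic for the same pluck result, so either yields a stable cache key; the trade-off is purely speed versus key readability.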

@prathamesh-sonpatki prathamesh-sonpatki added this to the 5.0.0 milestone Jan 16, 2016
@sgrif
Contributor

sgrif commented Feb 11, 2016

@christos Are you still interested in working on this? This PR has failing tests which haven't been addressed, and you have a conditional where both branches are identical that needs to be removed.

@sgrif sgrif removed this from the 5.0.0 milestone Feb 23, 2016
@christos
Contributor Author

christos commented Mar 6, 2016

@sgrif I am stuck as per this comment

I see you came across the problem in the initial PR and you fixed it by removing the order scope, using unscope(:order).

You later realised that it could mess up the IDs/timestamps when combining a pluck with unscope, limit, and offset.

I'll have another go at creating failing tests for when unscope(:order) is called, but I am not sure how to resolve the issue without loading the entire collection instead of using pluck.

@bquorning
Contributor

bquorning commented Jan 23, 2017

I am stuck as per this comment

That link leads nowhere now, but I assume you got stuck on the failing Postgres tests mentioned in the “outdated diff” comment #21503 (comment):

ActiveRecord::StatementInvalid: PG::InvalidColumnReference:
  ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list

So, calling pluck(primary_key, timestamp_column) changes the selected columns from * to id, updated_at. But since the query needs ordering by name and id, and the select is distinct, the query must select name as well. It doesn’t, and Postgres complains.

Isn’t Arel to blame here? Why does Arel generate a query that can’t be performed against Postgres?
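Not what this PR does, but for illustration, one way to make that query valid for Postgres is to also select the ordered columns and discard them afterwards; since id is already in the select list, adding name does not change which rows are distinct. A hedged SQL sketch of the failing query above:

```sql
-- Sketch only: include every ORDER BY expression in the select list so the
-- SELECT DISTINCT is valid; the extra "name" column is fetched and ignored.
SELECT DISTINCT "developers"."id", "developers"."updated_at", "developers"."name"
  FROM "developers"
 INNER JOIN "developers_projects"
    ON "developers"."id" = "developers_projects"."developer_id"
 WHERE "developers_projects"."project_id" = $1
 ORDER BY "developers"."name" DESC, "developers"."id" DESC
```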

@bquorning
Contributor

@sgrif Perhaps you know the answer to that question? (from my comment above)

Isn’t Arel to blame here? Why does Arel generate a query that can’t be performed against Postgres?

@feliperaul
Contributor

Do we have any update on this?

@talpava

talpava commented Sep 16, 2019 via email


@rails-bot

rails-bot bot commented Dec 17, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 17, 2019
@SleeplessByte
Contributor

I don't think it has been resolved?

@rails-bot rails-bot bot removed the stale label Dec 18, 2019
@rails-bot

rails-bot bot commented Mar 17, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Mar 17, 2020
@rails-bot rails-bot bot closed this Mar 24, 2020