Use collection ids and timestamps to generate truly unique collection cache keys #21503
Conversation
(@rails-bot has picked a reviewer for you, use r? to override)
@christos this looks like a good solution. Much simpler as well 👍
size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).flatten.join("-")
Just a thought: perhaps freeze the two "-" strings while you're at it, since master targets Ruby 2.2+? For what it's worth, the same string is frozen throughout ActiveSupport::Inflector.
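A minimal plain-Ruby illustration of why the suggestion matters (the variable names are just for demonstration):

```ruby
# On Ruby 2.2+, a string literal followed by .freeze is deduplicated at
# compile time, so each evaluation reuses one frozen object instead of
# allocating a fresh separator string per call.
first  = "-".freeze
second = "-".freeze

first.frozen?        # => true
first.equal?(second) # => true: the very same object, no extra allocation
```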
@jonatack Done.
Is there any interest in this being merged? r? @sgrif, this is related to a feature you merged back in July.
👍 Let's get this merged!
I'm at a conference but I have this pinned and will look at it soon.
@sgrif Ping?
size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).flatten.join("-".freeze)
What do you mean by PostgreSQL order failures? If the scope has an order, then the cache key query has to preserve it; otherwise it won't match the same set of records.
[3] pry(main)> Post
=> Post(id: integer, published_at: datetime, created_at: datetime, updated_at: datetime)
[4] pry(main)> 20.times { |i| Post.create!(published_at: Time.current - 1.hour * i) }
[5] pry(main)> Post.count
(0.1ms) SELECT COUNT(*) FROM "posts"
=> 20
[6] pry(main)> Post.order(published_at: :desc).limit(10).pluck(:id, :updated_at)
(0.3ms) SELECT "posts"."id", "posts"."updated_at" FROM "posts" ORDER BY "posts"."published_at" DESC LIMIT 10
=> [[1, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[2, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[3, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[4, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[5, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[6, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[7, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[8, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[9, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[10, Wed, 25 Nov 2015 21:40:23 UTC +00:00]]
[7] pry(main)> Post.order(published_at: :asc).limit(10).pluck(:id, :updated_at)
(0.2ms) SELECT "posts"."id", "posts"."updated_at" FROM "posts" ORDER BY "posts"."published_at" ASC LIMIT 10
=> [[20, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[19, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[18, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[17, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[16, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[15, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[14, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[13, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[12, Wed, 25 Nov 2015 21:40:23 UTC +00:00],
[11, Wed, 25 Nov 2015 21:40:23 UTC +00:00]]
[8] pry(main)> Post.order(published_at: :desc).limit(10).cache_key
(0.2ms) SELECT "posts"."id", "posts"."updated_at" FROM "posts" LIMIT 10
=> "posts/collection-digest-7f2daae36b230c64b72b28cafd70ddbd07d0f6369fb46e3155cc90ebcdad5058"
[9] pry(main)> Post.order(published_at: :asc).limit(10).cache_key
(0.1ms) SELECT "posts"."id", "posts"."updated_at" FROM "posts" LIMIT 10
=> "posts/collection-digest-7f2daae36b230c64b72b28cafd70ddbd07d0f6369fb46e3155cc90ebcdad5058"
@hdabrows Well spotted.
You can see the Postgres failure in this build
The real culprit is pluck, which, for an association that goes through a join table, creates a query that tries to pluck attributes that are not in both the SELECT DISTINCT and ORDER BY clauses, as required by Postgres.
ActiveRecord::StatementInvalid: PG::InvalidColumnReference: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
LINE 1: ... "developers_projects"."project_id" = $1 ORDER BY developers...
^
: SELECT DISTINCT "developers"."id", "developers"."updated_at" FROM "developers" INNER JOIN "developers_projects" ON "developers"."id" = "developers_projects"."developer_id" WHERE "developers_projects"."project_id" = $1 ORDER BY developers.name desc, developers.id desc
To be honest, I have no idea how to pluck what I need without always executing the entire query without pluck.
Any ideas?
Looks like this change would also fix the bug that results in an exception when trying to generate a cache key on an empty unloaded collection:

query = collection
  .select("COUNT(*) AS size", "MAX(#{column}) AS timestamp")
  .unscope(:order)
result = connection.select_one(query)
size = result["size"] # exception when result is nil
timestamp = column_type.deserialize(result["timestamp"]) # exception when result is nil
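A guard could look like the following plain-Ruby sketch; the result hash shape mirrors the query above, but the helper name and the nil fallback are assumptions for illustration, not the PR's actual fix:

```ruby
# Hypothetical guard against select_one returning nil for an empty,
# unloaded collection; "size"/"timestamp" keys mirror the query above.
def cache_key_parts(result)
  return [0, nil] if result.nil? # empty collection: no row at all
  [result["size"].to_i, result["timestamp"]]
end

cache_key_parts(nil)
# => [0, nil] instead of raising NoMethodError
cache_key_parts("size" => "3", "timestamp" => "2015-11-25 21:40:23")
# => [3, "2015-11-25 21:40:23"]
```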
size = result["size"]
timestamp = column_type.deserialize(result["timestamp"])
unique_signature = collection.pluck(primary_key, timestamp_column).flatten.join("-".freeze)
@christos now that you removed the need for the unscoped, these statements are identical.
This is what I have implemented in a project inspired by this PR:

module CollectionCacheKey
  def collection_cache_key(collection = all, timestamp_column = :updated_at)
    model_signature = collection.model_name.cache_key
    if collection.loaded?
      unique_signature = collection.pluck(primary_key, timestamp_column).to_s
    else
      unique_signature = collection.unscope(:order).pluck(primary_key, timestamp_column).to_s
    end
    "#{model_signature}/collection-digest-#{unique_signature}"
  end
end

Note that I found out through a quick benchmark that this is the fastest way to generate a unique cache key, even if you end up with more inelegant cache keys.
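The two signature shapes discussed in this thread (to_s here versus the PR's flatten.join) can be compared in plain Ruby, with assumed rows standing in for pluck's result:

```ruby
# Assumed [id, updated_at] rows standing in for what pluck would return.
rows = [[1, "2015-11-25 21:40:23"], [2, "2015-11-25 21:41:07"]]

# Array#to_s: inelegant output but a single cheap call.
to_s_signature = rows.to_s

# flatten.join: a flat, human-readable id-timestamp string.
join_signature = rows.flatten.join("-")

to_s_signature # => "[[1, \"2015-11-25 21:40:23\"], [2, \"2015-11-25 21:41:07\"]]"
join_signature # => "1-2015-11-25 21:40:23-2-2015-11-25 21:41:07"
```

Either string changes whenever any id or timestamp in the collection changes, which is all the cache key needs.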
@christos Are you still interested in working on this? This PR has failing tests which haven't been addressed, and you have a conditional where both branches are identical that needs to be removed.
@sgrif I am stuck as per this comment. I see you came across the problem in the initial PR and you fixed it by removing the order scope. You later realised that it could mess up the IDs/timestamps. I'll have a go again at trying to create failing tests.
So: isn't Arel to blame here? Why does Arel generate a query that can't be performed against Postgres?
@sgrif Perhaps you know the answer to that question? (from my comment above)
Do we have any update on this?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
I don't think it has been resolved?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Pull request #20884 introduced collection cache keys to help cache partials for entire collections, provided the collection members update their timestamp columns when they change.
However, as highlighted here and here, the solution does not create a unique cache key identifying the collection contents in the following two scenarios.
1 - collection count SQL is not compatible with limit()
The collection size SQL generated by the original PR's code for

Developer.where(name: "David").limit(1)

is incorrect: the size returned by the above query is not what the LIMIT specifies, but the size of the result of all developers whose name is David. This is expected SQL behaviour, by the way.

2 - Replacing an old record in the collection does not invalidate the cache key
This has already been explained by @swalkinshaw here and the following test fails against master.
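The failure mode can be sketched in plain Ruby (assumed data; this is an illustration, not the failing test referenced above): under a count-plus-max-timestamp key, swapping one record for another with the same timestamp goes unnoticed, while a per-record id/timestamp signature changes.

```ruby
# Assumed [id, updated_at] rows; record 1 is replaced by record 3 that
# shares the same timestamp, so size and MAX(updated_at) are unchanged.
old_rows = [[1, "2015-11-25 21:40:23"], [2, "2015-11-25 21:40:23"]]
new_rows = [[3, "2015-11-25 21:40:23"], [2, "2015-11-25 21:40:23"]]

# Old scheme: key material is COUNT plus MAX(updated_at).
old_key = [old_rows.size, old_rows.map(&:last).max]
new_key = [new_rows.size, new_rows.map(&:last).max]
old_key == new_key # => true: the cache key stays stale

# This PR: key material is every id and timestamp.
old_sig = old_rows.flatten.join("-")
new_sig = new_rows.flatten.join("-")
old_sig == new_sig # => false: the replacement invalidates the key
```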
The solution proposed by @swalkinshaw here is what I have been using successfully in production, but in the form of an application helper called cache_collection. The helper mirrors the cache view helper, accepting a collection instead of a record.

This PR implements the same solution as my cache_collection helper, but using the collection_cache_key approach from #20884.

I've made a few modifications.
I've removed the SQL query signature digest, because the query contents (i.e. ids and timestamps) can uniquely identify a collection with records whose timestamp columns are properly updated when the records change.
Instead of plucking just the collection ids and the maximum timestamp, I chose to pluck all timestamps, taking advantage of the fact that the pluck method fires just one query for both ids and timestamps.

I've also switched to a SHA256 digest to minimise the probability of a collision. My understanding is that, given the large size of the digested content, a collision is very improbable.
Personally, I prefer the view helper approach, as it makes it easier to see that asking for a collection cache key will incur a database request.
Having said that, and seeing as #20884 also incurs a database query, I think this more complete solution will save people some headaches when they encounter the above scenarios where the original PR would fail to provide a correct cache key.
If the Rails core team is happy with my implementation in this PR, then let me know and I will update the method documentation as well.