Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ActiveSupport::Cache.expand_cache_key sometimes returns falsely same key in parallel tests #51399

Open
thedarkside opened this issue Mar 23, 2024 · 11 comments

Comments

@thedarkside
Copy link

Steps to reproduce

Thats the hard part here! I still didnt figure out how to reproduce it safely.
What i can tell is that this does not happen when i execute a single testcase. It also doesnt happen when i set parallel workers to 1.

When parallel workers is set to processors and all tests are executed via rails test, the following test is breaking repeatedly:

      it "returns updated records omitting cache and returning new etag" do
        get "/apis/v1/me/tickets"
        assert_valid_json_response

        assert_changes ->{ last_response.headers["etag"] } do
          travel 1.second
          tickets(:multi).touch

          get "/apis/v1/me/tickets"
          assert_valid_json_response
        end
      end

The etag header is calculated basically via this line of code which i also found out to return the same wrong key. The updated_at on the record tickets(:multi) changes, but the underlying query fetching the max(tickets.updated_at) seem to return an older timestamp? At least the timestamp in the key remains the same.

ActiveSupport::Cache.expand_cache_key current_user.tickets.visible

The visible scope:

scope :visible, -> { joins(:admittances).where(tickets: {hidden: false}).merge(Admittance.visible).group("tickets.id")}

Expected behavior

The etag header should change in the same way whether i execute a single test or all together without flakiness.

Actual behavior

The etag header remains equal which is wrong. But this happens only when executing all tests parallelized which is kinda flaky.

Notes

I found out that changing the scope implementation from group to distinct fixes it:

scope :visible, -> { joins(:admittances).where(tickets: {hidden: false}).merge(Admittance.visible).select("distinct on (#{table_name}.id) #{table_name}.*")}

System configuration

Rails version: 7.1.2

Ruby version: 3.2.2

@skipkayhil
Copy link
Member

Can you use one of the bug report templates to provide something reproducible? Otherwise its not really possible to figure out what's happening

@skipkayhil skipkayhil added the more-information-needed When reporter needs to provide more information label Mar 23, 2024
@JoeDupuis
Copy link
Contributor

Can you post the output of current_user.tickets.visible.to_sql ?
Unsure, but I suspect you might have an ambiguous column updated_at clashing from the join.

@thedarkside
Copy link
Author

thedarkside commented Mar 24, 2024

Meanwhile broke it down to a testcase where it is getting even stranger.

When i run all tests in parallel, the cache key assertion on tickets(:multi).touch fails but when i run only this particular test alone, the cache key assertion on the tickets(:single).touch fails but the assertion on tickets(:multi).touch passes!

      it "changes the key when touched" do
        user = users(:one)
        assert_same_elements tickets(:single, :multi), user.tickets.visible

        assert_changes ->{ ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible } do
          sleep 1; travel 1.second

          tickets(:single).tap do |ticket|
            assert_changes -> { ticket.reload.updated_at } do
              ticket.touch
            end
          end
        end

        assert_changes ->{ ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible } do
          sleep 1; travel 1.second

          tickets(:multi).tap do |ticket|
            assert_changes -> { ticket.reload.updated_at } do
              ticket.touch
            end
          end
        end
      end

The output of user.tickets.visible.to_sql is this one (@JoeDupuis):

SELECT "tickets".* FROM "tickets" INNER JOIN "admittances" ON "admittances"."ticket_id" = "tickets"."id" WHERE "tickets"."user_id" = 980190962 AND "tickets"."hidden" = FALSE AND "admittances"."hidden" = FALSE AND "admittances"."match_id" IS NOT NULL GROUP BY "tickets"."id"

And for the sake of completeness here the complete implementation of the scopes in use:

class Ticket < ApplicationRecord
  scope :visible, -> { joins(:admittances).where(tickets: {hidden: false}).merge(Admittance.visible).group("tickets.id")}
end 

class Admittance < ApplicationRecord
  scope :visible, -> { where(hidden: false).where.not(match_id: nil) }
end

The sql query generated by the expand_cache_key call:

   (1.6ms)  SELECT COUNT(*) AS "size", MAX("tickets"."updated_at") AS timestamp FROM "tickets" INNER JOIN "admittances" ON "admittances"."ticket_id" = "tickets"."id" WHERE "tickets"."user_id" = $1 AND "tickets"."hidden" = $2 AND "admittances"."hidden" = $3 AND "admittances"."match_id" IS NOT NULL GROUP BY "tickets"."id"  [["user_id", 980190962], ["hidden", false], ["hidden", false]]

Btw, the database in use is postgres:16.1

I still suspect the group by clause in the query but this kind of flakiness iam observing is still surprising and i want either pinpoint a bug which can be fixed if there is one or at least understand it to be able to avoid this issue in the future.

@rails-bot rails-bot bot removed the more-information-needed When reporter needs to provide more information label Mar 24, 2024
@JoeDupuis
Copy link
Contributor

The user.tickets.visible.to_sql query doesn't have ambiguous updated_at like I thought.
Can you send the output of ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible or the failure message from the assertion?

@thedarkside
Copy link
Author

ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible

Sure, here it is

Minitest::Assertion: #<Proc:0x00007fa4b0d03138 <path snip> (lambda)> didn't change.
Expected "tickets/query-78446d320691b6aed46375b2d3dd774d-2-20240324113656000319" to not be equal to "tickets/query-78446d320691b6aed46375b2d3dd774d-2-20240324113656000319".

@skipkayhil skipkayhil added the more-information-needed When reporter needs to provide more information label Mar 25, 2024
@JoeDupuis
Copy link
Contributor

JoeDupuis commented Mar 25, 2024

You are not reloading the association inside of the assert_changes lambda.
Change these line from:
assert_changes ->{ ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible } do
to
assert_changes ->{ ActiveSupport::Cache.expand_cache_key(User.find(user.id).reload.tickets.visible) } do

You're hitting the cache the second time around. Still can't figure out why you're getting flakes though 🤔

@thedarkside
Copy link
Author

thedarkside commented Mar 26, 2024

@JoeDupuis ive already checked it beforehand. That would be visible in the log next to the db queries. Its not! The queries are executed twice without any "Cache" prefix.

If this would be the cause, it also would produce the same issue by changing the query from "group by" to "distinct" imho.

But to make sure ive changed it as you proposed. The result is still the same:

Minitest::Assertion: #<Proc:0x00007fe78cce3c38 <snip> (lambda)> didn't change.
Expected "tickets/query-78446d320691b6aed46375b2d3dd774d-2-20240326115329679131" to not be equal to "tickets/query-78446d320691b6aed46375b2d3dd774d-2-20240326115329679131".
      it "changes the key when touched" do
        user = users(:one)
        assert_same_elements tickets(:single, :multi), user.tickets.visible

        assert_changes ->{ ActiveSupport::Cache.expand_cache_key User.find(user.id).reload.tickets.visible } do
          sleep 1; travel 1.second

          tickets(:single).tap do |ticket|
            assert_changes -> { ticket.reload.updated_at } do
              ticket.touch
            end
          end
        end

        assert_changes ->{ ActiveSupport::Cache.expand_cache_key User.find(user.id).reload.tickets.visible } do
          sleep 1; travel 1.second

          tickets(:multi).tap do |ticket|
            assert_changes -> { ticket.reload.updated_at } do
              ticket.touch
            end
          end
        end
      end

@rails-bot rails-bot bot removed the more-information-needed When reporter needs to provide more information label Mar 26, 2024
@thedarkside
Copy link
Author

thedarkside commented Mar 26, 2024

oh @JoeDupuis, isnt the group by causing the database to return multiple rows instead one being expected by expand_cache_key?

Looking at the sql query:

SELECT COUNT(*) AS "size", MAX("tickets"."updated_at") ... FROM "tickets" ... GROUP BY "tickets"."id" 

Wouldnt that mean that ActiveSupport::Cache.expand_cache_key is broken for all relations fed in which include a group by clause?

@thedarkside
Copy link
Author

thedarkside commented Mar 26, 2024

Now that i have spent some more time onto this i believe its even worse. If i use the "distinct on" query, the result size shall be 2 but the cache key has encoded a 3 because the "distinct on" part gets discarded for the count&max query!
Thats due joining a 1-* relation and one ticket having two admittances in the result.

scope :visible, -> { joins(:admittances).where(tickets: {hidden: false}).merge(Admittance.visible).select("distinct on (#{table_name}.id) #{table_name}.*")  }

Calling it produced this query

SELECT distinct on (tickets.id) tickets.* FROM "tickets" INNER JOIN "admittances" ON "admittances"."ticket_id" = "tickets"."id" WHERE "tickets"."user_id" = $1 AND "tickets"."hidden" = $2 AND "admittances"."hidden" = $3 AND "admittances"."match_id" IS NOT NULL /* loading for inspect */ LIMIT $4

with 2 tickets being returned as expected. One ticket has 2 admittances attached but we want it being returned once.

But executing ActiveSupport::Cache.expand_cache_key User.find(user.id).tickets.visible produced this query (notice the dropped "distinct on" selection):

SELECT COUNT(*) AS "size", MAX("tickets"."updated_at") AS timestamp FROM "tickets" INNER JOIN "admittances" ON "admittances"."ticket_id" = "tickets"."id" WHERE "tickets"."user_id" = $1 AND "tickets"."hidden" = $2 AND "admittances"."hidden" = $3 AND "admittances"."match_id" IS NOT NULL

resulting in the following cache key:

"tickets/query-3eec5f867a195797fbd94c739b34c0e9-3-20240326135851096507"

notice the -3- indicating a result set of size 3.

That means the cache key can change even when the result doesnt! Probably not that big of a deal because its an edge case but surely its still a flaw to me.

The safest way handling this would be that the cache_version implementation wraps the query into a subquery without altering it. This would prevent conflicts with a "group by" clause and also with the "distinct on".

SELECT COUNT(*) AS ..., MAX("tickets"."updated_at") AS ... FROM (SELECT ... FROM ...)

What do you think?

@JoeDupuis
Copy link
Contributor

Oh good catch!
I was only able to reproduce by hitting the cache, but what was missing from my repro was a third "unused" ticket.

Now I have a repro!
I'll take a look later tonight.

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails"
  # If you want to test against edge Rails replace the previous line with this:
  # gem "rails", github: "rails/rails", branch: "main"

  gem "sqlite3"
  gem "debug"
end

require "active_record"
require "minitest/autorun"
require "logger"
require "active_support/testing/assertions"
require "active_support/testing/time_helpers"


# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :users, force: true do |t|
    t.timestamps
  end

  create_table :tickets, force: true do |t|
    t.integer :user_id
    t.string :name
    t.timestamps
  end

end

class User < ActiveRecord::Base
  has_many :tickets
end

class Ticket < ActiveRecord::Base
  belongs_to :user
  has_many :admittances
end

class BugTest < Minitest::Test
  include ActiveSupport::Testing::Assertions
  include ActiveSupport::Testing::TimeHelpers

  def test_association_stuff
    user = User.create!
    other_ticket = Ticket.create!(user:, name: :other)
    single = Ticket.create!(user:, name: :single)
    multi = Ticket.create!(user:, name: :multi)

    assert_changes -> { pp ActiveSupport::Cache.expand_cache_key(user.tickets.group("tickets.id")) } do
      travel 1.second
      single.tap do |ticket|
        assert_changes -> { ticket.reload.updated_at } do
          ticket.touch
        end
      end
    end
  end
end

@thedarkside
Copy link
Author

@JoeDupuis i wanted to add the "distinct on" issue as a separate test but then realized that its not possible with sqlite.

Its definitely a second issue there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants