New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordering by an associated table column breaks limit/offset queries #30531

Closed
Mr0grog opened this Issue Sep 6, 2017 · 3 comments

Comments

Projects
None yet
2 participants
@Mr0grog

Mr0grog commented Sep 6, 2017

This is similar to one of the problems described in #28364, but I’m not sure it’s exactly the same, so I’m filing separately. Feel free to mark as a duplicate if it is.

ActiveRecord smartly breaks up a query using limit and offset into two queries if it includes an associated model/table (so you get one query for the primary table’s IDs based on the specified limit and offset, then a second query for the full, joined data based on those IDs).

However, if you add ordering by a column in the associated table, that ordering is included in the first query, which means you might get the wrong IDs. The returned rows are not distinct records from the primary table anymore; each row is now a distinct join on the primary and associated table. Example below.

This actually seems largely correct, but if you are also ordering by some aspect of the primary table first, the ordering from the associated table could be left off the first query, which would be really helpful.

Steps to reproduce

I’ve got an app that tracks changes to web pages over time, so we have a Page and a Version model:

class Page < ApplicationRecord
  has_many :versions, inverse_of: :page
end

class Version < ApplicationRecord
  belongs_to :page, required: true, inverse_of: :versions, touch: true

  # There is also a `captured_at` attribute, which is a datetime
end

To illustrate the problem, query for a set of pages with a limit and offset, including the associated versions:

base_query = Page
  .joins(:versions)
  .includes(:versions)
  .order(updated_at: :desc)
  .order('versions.capture_time')

limited = base_query.limit(10).offset(300)

This results in two SQL queries (reformatted for readability):

SELECT DISTINCT
  "pages"."id",
  "pages"."updated_at" AS alias_0,
  -- the field below is the problem
  versions.capture_time AS alias_1
FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
ORDER BY "pages"."updated_at" DESC, versions.capture_time
LIMIT 10
OFFSET 300;

SELECT
  "pages"."id" AS t0_r0,
  -- ...lots more columns from `pages` here...
  "versions"."id" AS t1_r0,
  -- ...lots more columns from `versions` here...
FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
WHERE
  "pages"."id" IN ('3f23c537-15b8-4357-8623-329f5ff4330b', '3f23c537-15b8-4357-8623-329f5ff4330b', ...etc)
ORDER BY "pages"."updated_at" DESC, versions.capture_time;

You can see above that the first query includes a column from the associated table, which means the limit and offset don’t really function usefully as offsets into the list of Pages.

Expected behavior

It would be nicer to see the following two queries instead:

SELECT DISTINCT
  "pages"."id",
  "pages"."updated_at" AS alias_0,
  -- no versions.capture_time field
FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
ORDER BY "pages"."updated_at" DESC
LIMIT 10
OFFSET 300;

-- Same as second query above
SELECT
  "pages"."id" AS t0_r0,
  ...
  "versions"."id" AS t1_r0,
  ...
FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
WHERE
  "pages"."id" IN ('3f23c537-15b8-4357-8623-329f5ff4330b', '3f23c537-15b8-4357-8623-329f5ff4330b', ...etc)
ORDER BY "pages"."updated_at" DESC, versions.capture_time;

I get that this would only work because I am also ordering based on a field from the primary table (so the associated table’s ordering doesn’t affect the result of the first query), but it would be nice if this could be accommodated.

If it’s not possible to specially handle the above case where there is ordering on a column of the primary table, maybe this would be more generic and work:

SELECT DISTINCT
  "id"
FROM (
  SELECT
    "pages"."id" AS id,
    "pages"."updated_at" AS alias_0,
    versions.capture_time AS alias_1
  FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
  ORDER BY "pages"."updated_at" DESC, versions.capture_time
)
LIMIT 10
OFFSET 300;

-- Same as second query above
SELECT
  "pages"."id" AS t0_r0,
  ...
  "versions"."id" AS t1_r0,
  ...
FROM "pages" INNER JOIN "versions" ON "versions"."page_id" = "pages"."id"
WHERE
  "pages"."id" IN ('3f23c537-15b8-4357-8623-329f5ff4330b', '3f23c537-15b8-4357-8623-329f5ff4330b', ...etc)
ORDER BY "pages"."updated_at" DESC, versions.capture_time;

At the moment, I’m working around the issue by doing:

# Note: in real-world scenarios, there are other conditions on `versions`,
# which is why I still need the `joins` here.
base_query = Page
  .joins(:versions)
  .order(updated_at: :desc)

limited = base_query.limit(10).offset(300)

limited_ids = limited.pluck(:id, :updated_at).collect { |columns| columns[0] }
actual_results = query
  .where(id: limited_ids)
  .includes(:versions)
  .order('versions.capture_time')

I’d also love to know if there’s a better way to handle this in the mean time.

System configuration

Rails version: 5.1.3

Ruby version: 2.4.1

@rails-bot

This comment has been minimized.

Show comment
Hide comment
@rails-bot

rails-bot bot Dec 5, 2017

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 5-1-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

rails-bot bot commented Dec 5, 2017

This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 5-1-stable branch or on master, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.

@rails-bot rails-bot bot closed this Dec 12, 2017

@Mr0grog

This comment has been minimized.

Show comment
Hide comment
@Mr0grog

Mr0grog Dec 13, 2017

This is still an issue as far as I'm aware.

Mr0grog commented Dec 13, 2017

This is still an issue as far as I'm aware.

@prostko

This comment has been minimized.

Show comment
Hide comment
@prostko

prostko Sep 21, 2018

Still an issue in 2018

prostko commented Sep 21, 2018

Still an issue in 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment