Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Activerecord DISTINCT ON for json data type in postgresql #17706

Open
3zcurdia opened this issue Nov 21, 2014 · 22 comments · May be fixed by #41622
Open

Activerecord DISTINCT ON for json data type in postgresql #17706

3zcurdia opened this issue Nov 21, 2014 · 22 comments · May be fixed by #41622

Comments

@3zcurdia
Copy link

When a field on postgres is json type, and you make a call to uniq or distinct, the query generates SELECT DISTINCT but in order to work with json type on postgres it should be SELECT DISTINCT ON (), Also you can check more details here

@namxam
Copy link

namxam commented Dec 3, 2014

Can confirm that. I am having the exact same issue right now. It works fine if one uses DISTINCT ON (TABLE_NAME.id). Any ideas how to fix it?

I guess it would be nice if one could specify columns like: Events.distinct(:id) instead of just setting true and false. Probably distinct(true, :id) for backwards compatibility.

@mess110
Copy link
Contributor

mess110 commented Dec 15, 2014

isolated gist

@jgrannas
Copy link

jgrannas commented Feb 1, 2015

Same issue here with rails 4.2 & postgres 9.4.0

I however cannot seem to replace my uniq with a select distinct on, maybe my syntax is wrong?

Before adding a JSON datatype column to my Product model this worked:

Product.includes(:categories, :product_images).where(categories:{id:[2,1,6]}).uniq

This is what i did

Product.includes(:categories).select("DISTINCT ON (products.id) products.*").where(categories:{id:[2,1,6]})

@indaso
Copy link

indaso commented Feb 1, 2015

I've also been able to reproduce the issue. I'm currently testing a fix using the idea by naxnam to have a distinct method that takes in the name of the column as well as true and false values. Events.distinct(:id) should work as well as Events.distinct(true).

@sgrif
Copy link
Contributor

sgrif commented Feb 3, 2015

FWIW, I think that needing to write .select("DISTINCT ON (products.id) products.*") in this case is completely reasonable.

@mikeatlas
Copy link

Unfortunately with DISTINCT ON, all order by expressions must also be included in the list. Here's a scenario I'm tackling right now:

I have a JSON field in a PostGIS-enabled 9.3 database table:

Model.where(:type => 1).distinct

Error:
could not identify an equality operator for type json

Fine, let's do this manually:
Model.where(:type=>1).select("DISTINCT ON (model.id) model.*")

Success. Sort of. Now, in a contrived example, let's do a fancy ORDER BY expression. In PostGIS, I can find "nearby" objects in the real world based on spatial data:

Model.where(:type=>1).select("DISTINCT ON (model.id) model.*").limit(1)
   .order("multi_bounds::geometry <#> st_setsrid(st_makepoint(#{query["longitude"]}, #{query["latitude"]}), 4326)")

With error:
PG::Error: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

(excuse me for omitting in the sample, but I already cast lat/lon data to float, so SQL-i is safe here)

Okay, so, the documentation for DISTINCT ON says we can just append ORDER BY expressions in clauses in the DISTINCT ON list, which seems simple if you are doing an ordering based on a column name, but now you'll need to support more than just :attribute_symbols as parameters. You really also need to support raw string expressions because they'll need to be duplicated as well in the correct order supplied by the order scope.

@Fire-Dragon-DoL
Copy link
Contributor

@sgrif there is a problem though, what if you are performing an eager_load (and using the fields in where clause), in that case, how you deal with: .select("DISTINCT ON (products.id) products.*") ?

Edit: Nevermind, looks like it won't collide

@parubets
Copy link

parubets commented Feb 3, 2016

Fix:
"I can confirm that upgrading to Postgres 9.4 and using JSONB fixes this." by @seanlinsley

Source: activerecord-hackery/ransack#453

@seanlinsley
Copy link
Contributor

Yeah, unfortunately Postgres json doesn't implement equality, but jsonb does. So migrating json columns to jsonb should fix this issue.

@alecdotninja
Copy link

In case anyone else runs across this issue and, for whatever reason, would prefer to use DISTINCT ON, I've created a gem that adds this functionality to ActiveRecord.

@dijonkitchen
Copy link
Contributor

Should this issue be closed now if using jsonb in postgres solves the problem?

@GregT-home
Copy link

FWIW, our shop has hit this and we are tied to Postgres 9.3 for a while yet. It is an annoying problem because the addition of a JSON field to, in this case, a User record has caused otherwise unrelated code elsewhere in the system to fail. I will be changing our code to avoid using JSON, but it would be helpful if this could somehow be fixed.

@seanlinsley
Copy link
Contributor

seanlinsley commented Mar 30, 2017

@gltarsa this isn't something that can be fixed by Rails. If anything, you should report it as a bug in the JSON data type in Postgres.

@matthewd this ticket should be closed.

@calin-ciobanu
Copy link

👍 for adding distinct(:field_name) . I am in the situation that I have a:
Model.select(:field1, :field2).distinct if I am to use the suggested select('DISTINCT ON ... I will have to manually create the rest of the query "models"."field1", "models"."field2" . The list of fields is dynamic and quite long and manually building the select statement is kind of a hack.

I am using PG > 9.3 but cannot migrate to jsonb because some legacy business logic relies on the order of the keys (which json guarantees but jsonb does not). Please help

@seanlinsley
Copy link
Contributor

seanlinsley commented May 8, 2018

@calin-ciobanu you could coerce the JSON into JSONB during the query:

# distinct on the entire JSON object
Model.select 'distinct on (data::jsonb) *'
# distinct on a key in the JSON
Model.select "distinct on (data::jsonb->'id') *"

Though that wouldn't work if you need JSON objects with different key orders to be considered distinct.

@aforty
Copy link

aforty commented Oct 27, 2020

Any news on this, or workarounds besides the awkward select('distinct on...')?

@aliismayilov aliismayilov linked a pull request Mar 5, 2021 that will close this issue
@aliismayilov
Copy link
Contributor

The same problem exists if there's point type column in the table.

I've made #41622 to introduce .distinct_on on ActiveRecord relations. That should help building valid PostgreSQL queries with clear syntax.

@hrdwdmrbl
Copy link

Why does Rails not use DISTINCT ON(users.id) users.* when user calls User.distinct(:id) ?

@machty
Copy link
Contributor

machty commented Mar 17, 2023

@hrdwdmrbl DISTINCT ON is specific to Postgres, whereas DISTINCT is SQL standard and MySQL and others don't implement Postgres's DISTINCT ON.

@hrdwdmrbl
Copy link

@machty Sure, that's fine, but then shouldn't the Postgres adapter be doing that? You know, the quirks of different implementations is the reason for adapters. In the specific case of Postgres, the ActiveRecord API is just not working at all the way we would expect it to.

@machty
Copy link
Contributor

machty commented Mar 17, 2023

@hrdwdmrbl tbh I don't think I have enough expertise to think through every conceivable use case / corner case to be certain that DISTINCT ON(users.PRIMARY_KEY) users.* is a good default. Also there are issues with count queries that would need to be addressed even if this behavior was tucked away within the Postgres adapter.

@hrdwdmrbl
Copy link

@machty Just to clarify a minor point, I think it should be PRIMARY_KEY by default and COLUMN when passing a column as an argument. .distinct(:foo) -> DISTINCT ON (users.foo) users.*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.