Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The select method in [ActiveRecord] is producing incorrect results with PostgreSQL. #52029

Closed
maniSHarma7575 opened this issue Jun 6, 2024 · 7 comments

Comments

@maniSHarma7575
Copy link
Contributor

maniSHarma7575 commented Jun 6, 2024

.select calls with multiple SQL fragment arguments, using the same function (such as min or max) incorrectly returned values when using PostgreSQL.

Steps to reproduce

# frozen_string_literal: true

require "bundler/inline"
require 'debug'

gemfile(true) do
  source "https://rubygems.org"

  #gem "rails", github: "rails/rails", branch: "6-1-stable"
  gem "rails", github: "rails/rails", branch: "main"
  #gem "rails", path: "/Users/manishsharma/Development/personal/opensource/experiment/51991/rails"
  gem 'pg', '~> 1.1'
  gem "sqlite3", "~> 1.4"
  gem "mysql2"
end

require "active_record"
require "minitest/autorun"
require "logger"

# Run this in psql to setup the test database
# ```
# CREATE USER rails_repro_test WITH PASSWORD 'rails_repro_test';
# ALTER USER rails_repro_test WITH SUPERUSER;
# CREATE DATABASE rails_repro_test;
# GRANT ALL PRIVILEGES ON DATABASE rails_repro_test to rails_repro_test;
# ````

# ActiveRecord::Base.establish_connection(
#   adapter: 'mysql2',
#   host: 'localhost',
#   username: 'root',
#   password: '',
#   database: 'rails_repro_test'
# )

ActiveRecord::Base.establish_connection(adapter: "postgresql", database: "rails_repro_test")

#ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :file_parts, force: true do |t|
    t.string :part_name, null: false
    t.datetime :part_start_at, null: false
  end
end

class FilePart < ActiveRecord::Base
end

class BugTest < Minitest::Test
  def test_pluck_with_multiple_min
    FilePart.delete_all

    FilePart.create!(part_name: '2024/06/01/14_16.a1b2c3d4.txt', part_start_at: '2024-06-01 14:16:00')
    FilePart.create!(part_name: '2024/05/31/11_12.a2b3c4d5.txt', part_start_at: '2024-05-31 11:12:00')


    # this case doesn't works
    assert_equal 3, FilePart.select('min(part_name), min(part_start_at)').as_json.first.size

    # this is the output of the select query, we have two fields with minimum value but we are getting only one
    # => [{"min"=>"2024-05-31T11:12:00.000Z", "id"=>nil}]

    # this case doesn't work
    assert_equal 3, FilePart.select('min(part_name)', 'min(part_start_at)').as_json.first.size
    # => [{"min"=>"2024-05-31T11:12:00.000Z", "id"=>nil}]
  end
end

Expected behavior

    assert_equal 3, FilePart.select('min(part_name), min(part_start_at)').as_json.first.size

Actual behavior

    assert_equal 2, FilePart.select('min(part_name), min(part_start_at)').as_json.first.size

Research and Findings

internal_exec_query is responsible for:

  1. Fetching the result using the pg gem.
  2. Building an ActiveRecord::Result object for the fetched result.

def internal_exec_query(sql, name = "SQL", binds = [], prepare: false, async: false, allow_retry: false, materialize_transactions: true) # :nodoc:
execute_and_clear(sql, name, binds, prepare: prepare, async: async, allow_retry: allow_retry, materialize_transactions: materialize_transactions) do |result|
types = {}
fields = result.fields
fields.each_with_index do |fname, i|
ftype = result.ftype i
fmod = result.fmod i
types[fname] = types[i] = get_oid_type(ftype, fmod, fname)
end
build_result(columns: fields, rows: result.values, column_types: types.freeze)
end
end

Fetching the result from the pg gem.

For example:

FilePart.select('min(part_name)', 'min(part_start_at)')

In this case, the pg gem returns the values for fields as:

pp fields = result.fields
["min", "min"]

Build an ActiveRecord::Result object for the fetched result.

When constructing the ActiveRecord::Result object, we utilize the fields as the column names:

build_result(columns: fields, rows: result.values, column_types: types.freeze)

# This is an internal hook to make possible connection adapters to build
# custom result objects with connection-specific data.
def build_result(columns:, rows:, column_types: nil)
ActiveRecord::Result.new(columns, rows, column_types)
end

Initialize a model object from ActiveRecord::Result:

def _load_from_sql(result_set, &block) # :nodoc:

When creating an object, if two columns have the same key name, the object min gets overridden, resulting in an incorrect outcome.

Why does the pg gem return the same column name for two different fields?

pg gem internally uses PQfname method from the libpq.
res: https://github.com/ged/ruby-pg/blob/d072b21852865ecb84e6345df11d68eed50702bb/ext/pg_result.c#L502

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h> // Include the PostgreSQL C library

int main() {
    // Establish a connection to the database
    PGconn *conn = PQconnectdb("host=postgres dbname=rails_repro_test user=postgres password=postgres");

    // Check if the connection was successful
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "Connection to database failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // Execute a query
    PGresult *res = PQexec(conn, "SELECT min(username), min(email) from users;");

    // Check if the query was successful
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "Query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }

    // Retrieve column names
    int num_fields = PQnfields(res);
    for (int i = 0; i < num_fields; i++) {
        printf("Column %d: %s\n", i, PQfname(res, i));
    }

    // Clear result and close connection
    PQclear(res);
    PQfinish(conn);

    return 0;
}
# ./test
Column 0: min
Column 1: min

OR

postgres=# SELECT min(username), min(email) from users;
 min | min
-----+-----
     |
(1 row)


### System configuration
**Rails version**: "8.0.0.alpha", "6-1-stable", "7-1-stable"

**Ruby version**: `ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin23]`
@maniSHarma7575 maniSHarma7575 changed the title [Active Record] Select method giving wrong result The select method in [ActiveRecord] is producing incorrect results. Jun 6, 2024
@maniSHarma7575 maniSHarma7575 changed the title The select method in [ActiveRecord] is producing incorrect results. The select method in [ActiveRecord] is producing incorrect results with PostgreSQL. Jun 6, 2024
@justinko
Copy link
Contributor

justinko commented Jun 7, 2024

1.) Rails has no knowledge of the columns being used in your select examples.
2.) This is how PQfname behaves (returns min) when you don't use aliases.

I would close this issue as it has nothing to do with Rails.

@maniSHarma7575
Copy link
Contributor Author

1.) Rails has no knowledge of the columns being used in your select examples. 2.) This is how PQfname behaves (returns min) when you don't use aliases.

I would close this issue as it has nothing to do with Rails.

@justinko, the output from the select method is incorrect. It is missing some fields that should be present in the output. PostgreSQL is returning two columns with the field names min and min.

However, in Rails, we are ignoring one of these fields. We should ensure that the number of columns fetched in the select query matches the number of columns we handle.

FilePart.select('min(part_name), min(part_start_at)').as_json
D, [2024-06-07T13:01:19.214175 #80053] DEBUG -- : FilePart Load (0.9ms) SELECT min(part_name), min(part_start_at) FROM "file_parts"
[{"min"=>"2024-05-31T11:12:00.000Z", "id"=>nil}]

@justinko
Copy link
Contributor

justinko commented Jun 7, 2024

What would you like to see?

[{"min_part_name"=>"2024-05-31T11:12:00.000Z", "min_part_start_at"=>"2024-05-31T11:12:00.000Z", "id"=>nil}]

That?

@maniSHarma7575
Copy link
Contributor Author

maniSHarma7575 commented Jun 7, 2024

What would you like to see?

[{"min_part_name"=>"2024-05-31T11:12:00.000Z", "min_part_start_at"=>"2024-05-31T11:12:00.000Z", "id"=>nil}]

That?

@justinko IMO It should be consistent with mysql and sqlite.

MySQL

(ruby) FilePart.select('min(part_name), min(part_start_at)').as_json.first
D, [2024-06-07T17:08:28.850168 #86936] DEBUG -- :   FilePart Load (1.7ms)  SELECT min(part_name), min(part_start_at) FROM `file_parts`
{"min(part_name)"=>"2024/05/31/11_12.a2b3c4d5.txt", "min(part_start_at)"=>"2024-05-31T11:12:00.000Z", "id"=>nil}

Sqlite

(ruby) FilePart.select('min(part_name), min(part_start_at)').as_json.first
D, [2024-06-07T17:12:16.720248 #88035] DEBUG -- :   FilePart Load (0.1ms)  SELECT min(part_name), min(part_start_at) FROM "file_parts"
{"min(part_name)"=>"2024/05/31/11_12.a2b3c4d5.txt", "min(part_start_at)"=>"2024-05-31 11:12:00", "id"=>nil}

PostgreSQL

(ruby) FilePart.select('min(part_name), min(part_start_at)').as_json.first
D, [2024-06-07T17:13:38.814009 #88447] DEBUG -- :   FilePart Load (0.9ms)  SELECT min(part_name), min(part_start_at) FROM "file_parts"
{"min"=>"2024-05-31T11:12:00.000Z", "id"=>nil}

@justinko
Copy link
Contributor

justinko commented Jun 7, 2024

IMO It should be consistent with mysql and sqlite.

Yes, it would be great if postgres also returned min(part_name) ... but it doesn't, and Rails shouldn't be responsible for that.

@matthewd
Copy link
Member

matthewd commented Jun 8, 2024

This is functioning as intended.

mysql> select min(a) from (select 5 as a) x;
+--------+
| min(a) |
+--------+
|      5 |
+--------+
sqlite> select min(a) from (select 5 as a) x;
+--------+
| min(a) |
+--------+
| 5      |
+--------+
[local]/postgres=# select min(a) from (select 5 as a) x;
 min
-----
   5

We use the column names as returned by the database.

@matthewd matthewd closed this as completed Jun 8, 2024
@maniSHarma7575
Copy link
Contributor Author

maniSHarma7575 commented Jun 17, 2024

This is functioning as intended.

mysql> select min(a) from (select 5 as a) x;
+--------+
| min(a) |
+--------+
|      5 |
+--------+
sqlite> select min(a) from (select 5 as a) x;
+--------+
| min(a) |
+--------+
| 5      |
+--------+
[local]/postgres=# select min(a) from (select 5 as a) x;
 min
-----
   5

We use the column names as returned by the database.

@matthewd, we're encountering an issue when parsing a column returned by PostgreSQL in Rails.

In the example below, one of the fields is missing. PostgreSQL returns two fields, but when Rails parses it, only one value is taken for fields with the same key, and the other is ignored.

This results in a mismatch between the output from PostgreSQL and what Rails select results.

SELECT min(part_name), min(part_start_at) FROM "file_parts";
 min             |          min
-----------------|------------------------
 2024/05/31/11_12.a2b3c4d5.txt    | 2024-05-31T11:12:00.000Z
FilePart.select('min(part_name), min(part_start_at)').as_json.first
D, [2024-06-07T17:13:38.814009 #88447] DEBUG -- :   FilePart Load (0.9ms)  SELECT min(part_name), min(part_start_at) FROM "file_parts"
{"min"=>"2024-05-31T11:12:00.000Z", "id"=>nil}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants