Support IN queries on array columns #51137

Open
wants to merge 1 commit into
base: main
gregnavis
Contributor

@gregnavis gregnavis commented Feb 20, 2024

Motivation / Background

A non-array column can be queried via where(name: [value1, value2]), which is automatically converted to name IN (value1, value2) in SQL. An array column treats that array as the value to search for, so it would be natural to expect an array of arrays to produce an IN query. Unfortunately, that wasn't the case: where(name: [[value1], [value2]]) would generate name = {{value1}, {value2}} in SQL instead of name IN ({value1}, {value2}).

Detail

That behavior stemmed from the fact that Active Record would force an equality comparison for array columns whenever the value was an array. To allow IN queries, this check was changed to test whether the value is a one-dimensional array, by checking that its first item is not an array.

After these changes the following logic applies for querying array columns:

  1. Using a one-dimensional array results in an ordinary equality query.
  2. Using a two-dimensional array produces an IN query.
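The rule above can be sketched in plain Ruby. This is only an illustration, not the actual Active Record change: `predicate_kind_for` is a hypothetical helper name, and the sketch assumes the check inspects only the first element, as described.

```ruby
# Hypothetical helper, not part of Active Record: classifies a value passed
# to `where` for an array column under the rule proposed in this PR.
def predicate_kind_for(value)
  # Non-array values are compared with plain equality.
  return :equality unless value.is_a?(Array)

  # One-dimensional (first item is not an array): the array itself is the
  # value to compare against. An empty array also lands here, because
  # `first` returns nil.
  return :equality unless value.first.is_a?(Array)

  # Array of arrays: each inner array is treated as a candidate for IN.
  :in
end

predicate_kind_for(["a", "b"])     # => :equality
predicate_kind_for([["a"], ["b"]]) # => :in
```

Because only the first element is inspected, a mixed value such as `[["a"], "b"]` would also be classified as an IN query in this sketch.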

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one change. Changes that are unrelated should be opened in separate PRs.
  • Commit message has a detailed description of what changed and why. If this PR fixes a related issue include it in the commit message. Ex: [Fix #issue-number]
  • Tests are added or updated if you fix a bug or add a feature.
  • CHANGELOG files are updated for the changed libraries if there is a behavior change or additional feature. Minor bug fixes and documentation changes should not be included.

This commit fixes a bug (or at least an inconsistency) that made it
impossible to submit IN queries on array columns.

A non-array column can be queried via `where(name: [value1, value2])`
which is automatically converted to `name IN (value1, value2)` in SQL.
An array column would treat that array as the value to search for, so
it'd be natural to expect an array of arrays would result in an `IN`
query.
Unfortunately, that wasn't the case as `where(name: [[value1], [value2]])`
would return `name = {{value1}, {value2}}` in SQL, instead of `name IN ({value1}, {value2})`.

That behavior stemmed from the fact that Active Record would force
equality comparison for array columns if the value was an array.
To allow `IN` queries, this check was changed to test whether the value
is a **one-dimensional** array, by checking that its first item is not an array.

After these changes the following logic applies for querying array
columns:

1. Using a one-dimensional array results in an ordinary equality query.
2. Using a two-dimensional array produces an `IN` query.
@matthewd
Member

> check whether it's a one-dimensional array

Do we not properly store multi-dimensional array values? 😕

@gregnavis
Contributor Author

@matthewd, wow, seems I was confused about Postgres arrays. I was assuming that putting type[] (which is what array: true does) on a column makes it a one-dimensional array, but seems Postgres is happy to accept higher-dimensional arrays, even though type[][] is also a thing. 😕

Relevant excerpt from PostgreSQL docs:

> The current implementation does not enforce the declared number of dimensions either. Arrays of a particular element type are all considered to be of the same type, regardless of size or number of dimensions. So, declaring the array size or number of dimensions in CREATE TABLE is simply documentation; it does not affect run-time behavior.

Given that an array column can contain a value of an arbitrary dimension it seems it's impossible to differentiate between arrays-as-single-values vs arrays-as-multiple-values. I'm afraid we can close this PR 😞
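To make the ambiguity concrete, here is a rough sketch in plain Ruby. `pg_array_literal` is a hypothetical, deliberately minimal encoder written for this illustration (integers and nested arrays only, no quoting or NULL handling); it is not how Active Record serializes arrays.

```ruby
# Hypothetical minimal encoder for PostgreSQL array literals. Handles only
# integers and nested arrays, which is enough to show the two readings.
def pg_array_literal(value)
  return value.to_s unless value.is_a?(Array)

  "{" + value.map { |item| pg_array_literal(item) }.join(",") + "}"
end

nested = [[1], [2]]

# Reading 1: the whole argument is a single two-dimensional value,
# i.e. an equality comparison against one array.
as_single_value = pg_array_literal(nested)
# => "{{1},{2}}"

# Reading 2: the argument is a list of two one-dimensional values,
# i.e. the operands of an IN query.
as_multiple_values = nested.map { |value| pg_array_literal(value) }
# => ["{1}", "{2}"]
```

Since the column can store values of any dimension, both readings describe data the column may legitimately hold, so the shape of the argument alone cannot tell them apart.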

@gregnavis
Contributor Author

... unless, we're open to adding some special syntax for those cases. Not sure if it'd be useful for other column types, though.
