Improved IN handling #250
defp expr({:in, _, [_, {:^, _, [_ix, 0]}]}, _sources, _params, _query), do: "0"
defp expr({:in, _, [left, {:^, _, [ix, len]}]}, sources, params, query) when len > 0 do
Notably, this only applies to pinned (`^`) values. Literals keep their current behaviour, which keeps this compatible with complex uses of IN, such as subqueries.
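To make the dispatch concrete, here is a minimal standalone sketch of the two pinned cases. It is not the adapter's code: `PinnedInSketch`, its `render/2` function, and the already-rendered `left` string are hypothetical, and the non-empty output shape is taken from the `IN {$0:Array(String)}` SQL quoted later in this thread.

```elixir
defmodule PinnedInSketch do
  # Empty pinned list: nothing can match, so the clause is the constant 0,
  # mirroring the first clause in the diff.
  def render({:in, _, [_left, {:^, _, [_ix, 0]}]}, _type), do: "0"

  # Non-empty pinned list: assumed to become one Array(...) placeholder.
  # `left` is a pre-rendered SQL string here, which is a simplification.
  def render({:in, _, [left, {:^, _, [ix, len]}]}, type) when len > 0 do
    "#{left} IN {$#{ix}:Array(#{type})}"
  end

  # Anything that is not a pinned list (literals, subqueries) is untouched.
  def render(_other, _type), do: :unchanged
end

IO.puts(PinnedInSketch.render({:in, [], ["x", {:^, [], [0, 0]}]}, "String"))
IO.puts(PinnedInSketch.render({:in, [], ["x", {:^, [], [0, 2]}]}, "String"))
```

The fallback clause is only there to show that literal lists take a different path in this sketch.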
def dumpers(:uuid, Ecto.UUID), do: [&__MODULE__.hex_uuid/1]
def dumpers(:uuid, type), do: [type, &__MODULE__.hex_uuid/1]
def dumpers(:binary_id, type), do: [type, &__MODULE__.hex_uuid/1]
def dumpers({:in, sub}, {:in, sub}), do: [{:array, sub}]
This line is what tells Ecto itself that `in ^val` should result in a single parameter for `val`.
Thank you! I'd like to convince myself that it's indeed equivalent to the previous approach (lists with nulls, different types, etc.) before merging so I'm taking some time :) Btw, just in case, https://clickhouse.com/docs/sql-reference/operators/in suggests
But I guess this change is not about millions of values, but rather hundreds to thousands.

The CI is failing for unrelated issues, "fixed" in #252.
Nice! We also started recently using IN queries with fairly large payloads for the new consolidated views feature. We currently limit it to 14k sites so we have at maximum 14k IDs in the

We also tested creating dictionaries and using these for lookups so the list of IDs does not need to be part of the query payload. It was more efficient, as expected, but for now we didn't go for that solution to reduce maintenance burden.

Tagging @aerosol. I don't think we have problems with the limit at the moment, but this is interesting in case we need to bump the limit in the future.
Just on a quick glance, it looks like this is equivalent to the workaround we're applying to avoid hitting ClickHouse limits due to too many bindings: https://github.com/plausible/analytics/blob/fa09b73ff1acff2248a664d652f818c999c7dcca/lib/plausible/stats/sql/where_builder.ex#L54
85cbee2 to 81f5a7b
Hey folks! Sorry I didn't follow up. This PR should be up to date with master now. I understand that this isn't the right answer for millions, but there are definitely "medium" number of
Thank you! I still wonder if lists with

Sorry for using Codex instead of checking myself, but here's what it "says":

test "nil after typed value changes semantics" do
  TestRepo.insert!(%Post{title: "hello"})
  TestRepo.insert!(%Post{title: ""})

  query =
    from(p in Post,
      where: p.title in ^["hello", nil],
      order_by: p.title,
      select: p.title
    )

  # master:
  #   error: FunctionClauseError in param_type(nil)
  # PR:
  #   sql: SELECT p0."title" FROM "posts" AS p0 WHERE (p0."title" IN {$0:Array(String)}) ORDER BY p0."title"
  #   params: [["hello", nil]]
  #   result: ["", "hello"]
  #
  # This is the bad case: nil is encoded inside Array(String) as the ClickHouse default string.
end

CI probably failed due to something similar to this test, which is probably fine and is the expected new behaviour.

test "nested empty array has same result but different SQL shape" do
  query =
    from(n in fragment("numbers(1)"),
      select: fragment("array(?)", n.number) in ^[[]]
    )

  # master:
  #   sql: SELECT array(f0."number") IN ({$0:Array(Nothing)}) FROM numbers(1) AS f0
  #   params: [[]]
  #   result: [0]
  # PR:
  #   sql: SELECT array(f0."number") IN {$0:Array(Array(Nothing))} FROM numbers(1) AS f0
  #   params: [[[]]]
  #   result: [0]
end
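If the nil case reported above holds, one caller-side way to sidestep it might be to strip nils before pinning the list. The sketch below is plain Elixir with no Ecto dependency; `NilSafeIn` and `split_nils/1` are hypothetical names, and it assumes (as far as I understand ClickHouse's default settings) that a NULL in an IN list would not match any row anyway, so dropping it does not change the WHERE result.

```elixir
defmodule NilSafeIn do
  # Hypothetical helper: returns the non-nil values plus a flag saying
  # whether any nil was present, so the caller can decide what to do.
  def split_nils(values) do
    {nils, rest} = Enum.split_with(values, &is_nil/1)
    {rest, nils != []}
  end
end

{titles, had_nil?} = NilSafeIn.split_nils(["hello", nil])

# `titles` could then be pinned, e.g. `where: p.title in ^titles`,
# so no nil ends up encoded inside Array(String).
IO.inspect({titles, had_nil?})
```

Returning the flag rather than silently dropping nils leaves room for a caller that does want NULL rows to add an explicit `is_nil/1` condition.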
IN improvements for Ecto.CH

Hey folks, thanks for the library! We ran into a problem using the `in` operator over at CargoSense and we're hoping you're open to a change in the SQL those queries produce to make more efficient use of ClickHouse params.

Problem

Currently when you have an Ecto query like:
This results in ClickHouse SQL:
This becomes an issue when `ids` or any other array parameter becomes very large, which is common when using Ecto CH to perform bulk operations in ClickHouse.

Solution

Now in this PR the same Ecto query produces this ClickHouse SQL:
We have just one parameter no matter the size of the array.
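The original query and SQL snippets did not survive on this page, so here is a rough sketch of the two SQL shapes being contrasted. Everything in it is illustrative: `InShapes` and its functions are hypothetical, the `p0."id"` column and `Int64` type are made up, and the per-element form (including the comma separator) is inferred from the `IN ({$0:...})` shape shown for master elsewhere in this thread.

```elixir
defmodule InShapes do
  # Assumed "before" shape: one typed placeholder per list element,
  # so the parameter count grows with the list.
  def per_element(left, type, count) when count > 0 do
    placeholders =
      Enum.map_join(0..(count - 1), ",", fn ix -> "{$#{ix}:#{type}}" end)

    "#{left} IN (#{placeholders})"
  end

  # "After" shape: a single Array(...) placeholder for the whole list.
  def single_array(left, type, ix) do
    "#{left} IN {$#{ix}:Array(#{type})}"
  end
end

IO.puts(InShapes.per_element(~s(p0."id"), "Int64", 3))
# p0."id" IN ({$0:Int64},{$1:Int64},{$2:Int64})

IO.puts(InShapes.single_array(~s(p0."id"), "Int64", 0))
# p0."id" IN {$0:Array(Int64)}
```

With the second shape the number of bound parameters stays at one whether the list has three elements or fourteen thousand.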