Handling Nulls in Database Join Conditions #5241
Labels
-libs
Libraries: New libraries to be implemented
l-join
p-high
Should be completed in the next sprint
x-new-feature
Type: new feature request
This task is automatically imported from the old Task Issue Board and it was originally created by Radosław Waśko.
Original issue is here.
We planned to implement operations in the Table library so that `Nothing == Nothing` evaluates to True (unlike the approach preferred by SQL, which yields NULL/Undefined for NULL-NULL comparisons). We also want the Database.Table to be consistent with that. We can achieve that by replacing the `a = b` operator with one of:
- `a IS NOT DISTINCT FROM b` (works in Postgres),
- `a IS b` (only works in SQLite),
- `a <=> b` (only works in MySQL),
- `COALESCE(a = b, a IS NULL AND b IS NULL)` (should work everywhere).

This however has some performance implications: all approaches I tried in Postgres made it unable to exploit indexes set on the column on which the join was performed, which can be very bad for performance.
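As a rough illustration of how these conditions differ in a join, here is a minimal sketch using Python's standard-library `sqlite3` module. The tables `t1` and `t2`, the column `k`, and the helper `count_matches` are made up for this example. Only the plain `=`, the SQLite-specific `IS`, and the portable `COALESCE` form are exercised here, since the other operators depend on the backend.

```python
import sqlite3

# In-memory SQLite database with two tiny tables that both contain a NULL key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER);
    CREATE TABLE t2 (k INTEGER);
    INSERT INTO t1 VALUES (1), (2), (NULL);
    INSERT INTO t2 VALUES (1), (3), (NULL);
""")

def count_matches(condition):
    # Hypothetical helper: `condition` is trusted SQL text written in this
    # example, so plain string interpolation is acceptable here.
    query = f"SELECT COUNT(*) FROM t1 JOIN t2 ON {condition}"
    return conn.execute(query).fetchone()[0]

# Standard SQL equality: NULL = NULL is not true, so the NULL rows never match.
plain = count_matches("t1.k = t2.k")

# Null-safe variants: the NULL rows match each other.
null_safe_is = count_matches("t1.k IS t2.k")
null_safe_coalesce = count_matches(
    "COALESCE(t1.k = t2.k, t1.k IS NULL AND t2.k IS NULL)"
)

print(plain, null_safe_is, null_safe_coalesce)
```

With this data, the plain join only pairs the two rows with key 1, while both null-safe conditions additionally pair the two NULL rows.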
It seems that using the preferred operator (`<=>` for MySQL, `IS` for SQLite) did allow the database to use the existing index. But for Postgres I could find no way to make it use the index and handle nulls the way we want.

We need to discuss how we want to approach this null-handling, as there is a potential performance price for NULL=NULL, at least in some databases (Postgres). We also need to make sure that In-Memory table operations (both as `Join_Condition.Equals` and as Column Operation `==`) are consistent with what we do for the database (or, if we are not consistent, we need to ensure it is very clearly described to the user so that there are no surprises).

Comments:
References: the db fiddle I used for most of the testing
https://www.db-fiddle.com/f/sehxaqWyQroFmghUE4A4Pm/1
(Radosław Waśko - Dec 27, 2022)
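For anyone who wants to repeat the index check locally rather than in the fiddle, the following is a minimal sketch for SQLite only, using Python's standard-library `sqlite3` module. The table, column and index names (`t1`, `t2`, `k`, `t2_k_idx`) and the helper `query_plan` are assumptions made for this example, not taken from the fiddle. The wording of the plan steps varies between SQLite versions, so the script prints the plans for inspection and does not assume a particular result. For Postgres, the analogous check would use `EXPLAIN` on the join query.

```python
import sqlite3

# Two empty tables, with an index on the join column of t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k INTEGER);
    CREATE TABLE t2 (k INTEGER);
    CREATE INDEX t2_k_idx ON t2 (k);
""")

def query_plan(condition):
    # Hypothetical helper: asks SQLite how it would execute the join.
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM t1 JOIN t2 ON {condition}"
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return [row[-1] for row in rows]

conditions = [
    "t1.k = t2.k",
    "t1.k IS t2.k",
    "COALESCE(t1.k = t2.k, t1.k IS NULL AND t2.k IS NULL)",
]
plans = {condition: query_plan(condition) for condition in conditions}

for condition, plan in plans.items():
    print(condition)
    for step in plan:
        print("    " + step)
```

A plan step that mentions `t2_k_idx` indicates that the index is used for that join condition. A full scan of both tables indicates that it is not.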