bug: many Operations can't handle NULL optional arguments #8833

NickCrews · 2024-03-30T01:08:27Z

What happened?

Consider the implementation of Substring:

@public
class Substring(Value):
    arg: Value[dt.String]
    start: Value[dt.Integer]
    length: Optional[Value[dt.Integer]] = None

This allows you to leave out the length argument, meaning "until the end of the string". But this is only represented on the Python side. If the length evaluates to NULL at runtime, then this errors on some backends, like Postgres, or gives the wrong result on DuckDB.

For example, ibis.literal("abcde").substr(2, ibis.literal(1).nullif(1)) results in NULL in duckdb, but I would expect it to be "cde".

During compilation, we are naive and do the NULL checking only on the python side:

def visit_Substring(self, op, *, arg, start, length):
    start += 1
    arg_length = self.f.length(arg)

    if length is None:
        return self.if_(
            start >= 1,
            self.f.substring(arg, start),
            self.f.substring(arg, start + arg_length),
        )
    return self.if_(
        start >= 1,
        self.f.substring(arg, start, length),
        self.f.substring(arg, start + arg_length, length),
    )

I discovered this when adding more tests to #8832.

I think what we should do is make Substring more like

@public
class Substring(Value):
    arg: Value[dt.String]
    start: Value[dt.Integer]
    length: Value[dt.Integer] = ibis.null()

and then use SQL CASE statements if we can't determine the nullness statically.

But I wasn't able to get this to work. Does this seem like the right direction? Any tips on what to do here?
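
To make the CASE idea concrete, here is a rough sketch of the SQL shape I have in mind, run against SQLite from the Python standard library rather than through ibis. The function name and the query are hypothetical and only illustrate the runtime NULL check; this is not what the ibis compiler emits.

```python
import sqlite3

# Hypothetical query shape: pick the 2-argument or 3-argument form at runtime.
# SQLite's substr() stands in for the backend function; start is 1-based, as in SQL.
QUERY = """
SELECT CASE
    WHEN :length IS NULL THEN substr(:arg, :start)
    ELSE substr(:arg, :start, :length)
END
"""


def substring_with_runtime_check(arg, start, length):
    """Treat a runtime NULL length as 'until the end of the string'."""
    con = sqlite3.connect(":memory:")
    try:
        params = {"arg": arg, "start": start, "length": length}
        return con.execute(QUERY, params).fetchone()[0]
    finally:
        con.close()
```

With this shape, a None bound as the length parameter arrives as SQL NULL and selects the "until the end" branch, at the cost of evaluating a CASE on every row.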

What version of ibis are you using?

main

cpcloud · 2024-03-30T11:20:29Z

None indicates that the argument wasn't provided, NULL is something else. We should avoid conflating these two uses.

cpcloud · 2024-03-30T11:25:06Z

The failures notwithstanding, the duckdb behavior looks correct to me in this case: null inputs produce null outputs, NULL doesn't mean "no argument".

The representation of "argument not provided" is unrelated to whether an input is NULL.
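
As a small illustration of the distinction, the sketch below uses SQLite via Python's standard library as a stand-in backend (an assumption made for convenience; it is not DuckDB and does not go through ibis). Omitting the length and passing NULL as the length are two different calls.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two-argument form: the length argument is simply not provided.
omitted = con.execute("SELECT substr('abcde', 3)").fetchone()[0]

# Three-argument form with a NULL length: the NULL input propagates to the output.
null_length = con.execute("SELECT substr('abcde', 3, NULL)").fetchone()[0]

con.close()

print(repr(omitted))      # 'cde'
print(repr(null_length))  # None, i.e. SQL NULL
```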

NickCrews · 2024-03-30T22:33:39Z

hmmm, I think you are right. We are victims of the SQL standard here. I don't really like this footgun:

ibis.literal("abcde").substr(2, None) gives "cde", but
ibis.literal("abcde").substr(2, ibis.null()) gives NULL
The only way to get around this would be to do a length = length.fillnull(arg.length()) during compilation, but this would be unexpected for more experienced SQL users, and less performant. Maybe just better documentation? The same thing exists for many other ops (e.g. StringFind with its optional start and end), but these are probably used less.
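
For reference, the fill-in idea would compile to something like the following. This is a sketch against SQLite using COALESCE, with a hypothetical helper name; it is not actual ibis compiler output.

```python
import sqlite3


def substr_fill_null_length(arg, start, length):
    """Replace a NULL length with the full string length before calling substr()."""
    con = sqlite3.connect(":memory:")
    try:
        row = con.execute(
            # start is 1-based here, as in SQL.
            "SELECT substr(:arg, :start, coalesce(:length, length(:arg)))",
            {"arg": arg, "start": start, "length": length},
        ).fetchone()
    finally:
        con.close()
    return row[0]
```

The extra coalesce and length calls run for every row, which is the performance cost mentioned above, and a NULL length no longer yields NULL, which is the part SQL users might not expect.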

cpcloud · 2024-04-01T13:17:29Z

I think there are other approaches to the problem. I'll state the problem so that we have it written somewhere.

Ibis uses None for two things: 1) the default value for optional arguments and 2) a value that users can use for SQL's NULL.

Users explicitly passing None expect it to behave like they explicitly passed ibis.null() or an equivalent such as ibis.NA.

Since None is being co-opted for use as the "no-argument" sentinel value, Ibis interprets it that way instead of as NULL, leading to a divergence in behavior between the user-facing meaning of None versus how we use it internally.

cpcloud · 2024-04-01T13:27:32Z

One approach that will break user code, but is probably the right approach IMO, is to use some other sentinel value to mean "argument wasn't provided".

A bit of a hack, but a fairly common one, is to create a dummy object like NO_ARGUMENT = object(), set that as the default value wherever None is used, and then use arg is NO_ARGUMENT to check whether an argument was provided.
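
A minimal sketch of the sentinel pattern in plain Python follows. The describe_substr function is hypothetical and only returns a string describing the call it would compile to; it is not ibis code.

```python
# Unique sentinel: no user-supplied value can be identical to it.
NO_ARGUMENT = object()


def describe_substr(start, length=NO_ARGUMENT):
    """Show how 'not provided' and an explicit None can be told apart."""
    if length is NO_ARGUMENT:
        # The caller left the argument out: take everything to the end.
        return f"substring(arg, {start})"
    if length is None:
        # The caller explicitly passed None: treat it as SQL NULL.
        return f"substring(arg, {start}, NULL)"
    return f"substring(arg, {start}, {length})"


print(describe_substr(2))        # substring(arg, 2)
print(describe_substr(2, None))  # substring(arg, 2, NULL)
print(describe_substr(2, 3))     # substring(arg, 2, 3)
```

The identity check with is matters here: comparing with == could be overridden by expression objects, while is cannot.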

NickCrews · 2024-04-01T17:28:26Z

I find the NO_ARGUMENT solution fairly good, I think it might be worth going through a deprecation cycle to get this right.

NickCrews · 2024-04-01T17:31:40Z

wait, would this then mean that for ibis.literal("abcde").substr(2, None), the None gets cast to ibis.null() and the result is therefore NULL? I think this actually might be an even worse footgun than what we have now.

NickCrews added the bug Incorrect behavior inside of ibis label Mar 30, 2024

gforsyth mentioned this issue Apr 25, 2024

[EPIC] Ibis expression API stability #8996

Open

10 tasks

ncclementi added the breaking change Changes that introduce an API break at any level label May 6, 2024
