Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDF do not allow passing constant values as params but only columns of a row #12607

Open
piavka opened this issue Jan 22, 2023 · 12 comments
Open
Assignees
Milestone

Comments

@piavka
Copy link

piavka commented Jan 22, 2023

Installation details
Docker scylladb/scylla latest version

#docker exec -it some-scylla scylla --version
5.1.3-0.20230112.addc4666d502

It does not seem to be possible to pass constant values to UDF for example given

CREATE FUNCTION num_between(lower int, upper int, ts int)
       RETURNS NULL ON NULL INPUT
       RETURNS int
       LANGUAGE lua as 'if(ts >= lower and ts <=upper) then return 1 else return 0 end';
the
SELECT num_between(1 , 2, some_int_column) FROM some table;

give error like line 1:18 no viable alternative at input '('
instead it only works with column params
i wonder how can i pass constant params into UDFs where these constants change between different invocations of the same query

@nyh nyh added enhancement user request area/udf User-Defined Functions and Aggregates area/cql labels Jan 22, 2023
@avikivity
Copy link
Member

avikivity commented Jan 23, 2023

functionArgs returns [std::vector<expression> a]
    : '(' ')'
    | '(' t1=term { a.push_back(std::move(t1)); }
          ( ',' tn=term { a.push_back(std::move(tn)); } )*
       ')'
    ;

term returns [expression term1]
    : v=value                          { $term1 = std::move(v); }
    | f=functionName args=functionArgs { $term1 = function_call{std::move(f), std::move(args)}; }
    | '(' c=comparatorType ')' t=term  { $term1 = cast{std::move(t), c}; }
    ;

It seems that adding constant to the term production would be sufficient:

std::optional<expression>
prepare_function_call(const expr::function_call& fc, data_dictionary::database db, const sstring& keyspace, const schema* schema_opt, lw_shared_ptr<column_specification> receiver) {
    if (!receiver) {
        // TODO: It is possible to infer the type of a function call if there is only one overload, or if all overloads return the same type
        return std::nullopt;
    }
    auto&& fun = std::visit(overloaded_functor{
        [] (const shared_ptr<functions::function>& func) {
            return func;
        },
        [&] (const functions::function_name& name) {
            auto args = boost::copy_range<std::vector<::shared_ptr<assignment_testable>>>(fc.args | boost::adaptors::transformed(expr::as_assignment_testable));
            auto fun = functions::functions::get(db, keyspace, name, args, receiver->ks_name, receiver->cf_name, receiver.get());
            if (!fun) {
                throw exceptions::invalid_request_exception(format("Unknown function {} called", name));
            }
            return fun;
        },
    }, fc.func);

However, test_assignment() doesn't work for untyped_constant.

@piavka
Copy link
Author

piavka commented Jan 23, 2023

BTW the UDF above is just simplified example , in my real UDF usage i pass timestamp strings like '2023-01-21 15:30:02' and not ints .. in case it matters
also passing toTimeStamp(now()) as timestamp works

@DoronArazii DoronArazii added this to the 5.3 milestone Jan 24, 2023
@DoronArazii
Copy link

@cvybhu it's in your queue?

@cvybhu
Copy link
Contributor

cvybhu commented May 10, 2023

@cvybhu it's in your queue?

Not really, it's somewhere in the backlog. Although I guess I could move it up in time if needed.

@DoronArazii DoronArazii modified the milestones: 5.3, 5.x May 10, 2023
@ptrsmrn ptrsmrn added the P2 High Priority label May 29, 2023
@cvybhu
Copy link
Contributor

cvybhu commented May 29, 2023

I took a short look to see if this could be fixed quickly, and I think there are two major issues here.

Here's how the grammar for calling a function inside SELECT looks like:

unaliasedSelector returns [expression s]
    @init { expression tmp; }
    :  ( c=cident                                  { tmp = unresolved_identifier{std::move(c)}; }
       | i=intValue { tmp = i; }
       | K_COUNT '(' countArgument ')'             { tmp = make_count_rows_function_expression(); }
       | K_WRITETIME '(' c=cident ')'              { tmp = column_mutation_attribute{column_mutation_attribute::attribute_kind::writetime,
                                                                                              unresolved_identifier{std::move(c)}}; }
       | K_TTL       '(' c=cident ')'              { tmp = column_mutation_attribute{column_mutation_attribute::attribute_kind::ttl,
                                                                                              unresolved_identifier{std::move(c)}}; }
       | f=functionName args=selectionFunctionArgs { tmp = function_call{std::move(f), std::move(args)}; }
       | K_CAST      '(' arg=unaliasedSelector K_AS t=native_type ')'  { tmp = cast{std::move(arg), std::move(t)}; }
       )
       ( '.' fi=cident { tmp = field_selection{std::move(tmp), std::move(fi)}; } )*
    { $s = tmp; }
    ;

selectionFunctionArgs returns [std::vector<expression> a]
    : '(' ')'
    | '(' s1=unaliasedSelector { a.push_back(std::move(s1)); }
          ( ',' sn=unaliasedSelector { a.push_back(std::move(sn)); } )*
      ')'
    ;
  1. There are special cases for some builtin functions (K_COUNT, K_WRITETIME, ...). As soon as we allow passing integers as function arguments an ambiguity occurs - should count(1) take the path that matches K_COUNT or generic function_call?

    This could be fixed by removing the special cases for these functions. Each call to a function should be a generic function_call. It's already the case for expressions inside the WHERE clause.

  2. There is a conflict between string literals and quoted identifiers

    A quoted string literal, like "col1", or "some string" can be either a string literal, or a case-sensitive column/table name, depending on the context. When the parser encounters such a string, it's unable to decide which one is it.

    Let's say there's a function call: add3strings("some_col", "some_other_col", "foobar"). Which ones of these arguments are string literals, and which ones are column names?

    Until now we avoided this issue by either allowing only columns as arguments (inside SELECT), or only constant literals (inside WHERE).
    This is a big problem that we will need to deal with somehow.

@avikivity
Copy link
Member

It's better to move to expressions and then fix it. I started the work, but don't have time to make meaningful progress.

@avikivity
Copy link
Member

wrt. quoted strings, CQL uses single quotes for string literals and double quotes for case-sensitive identifiers. If CQL uses double quotes for literals, we should deprecate it (unfortunately with a flag).

@cvybhu
Copy link
Contributor

cvybhu commented May 29, 2023

wrt. quoted strings, CQL uses single quotes for string literals and double quotes for case-sensitive identifiers. If CQL uses double quotes for literals, we should deprecate it (unfortunately with a flag).

Ah right, maybe that's not the issue. I tried to add term to unaliasedSelector, but ANTLR3 started throwing errors about the grammar not being LL(*). I think that means that there's two branches with the same begninning (?). Sadly the error message isn't very helpful :/
The idea with column names and string literals came to my head and seemed plausible. Especially because the grammar compiles with intLiteral instead of term, albeit with ambiguity warnings.

@cvybhu
Copy link
Contributor

cvybhu commented May 29, 2023

In that case it's probably a conflict between function_call and cident. When ANTLR3 sees foobar it doesn't know whether it's a call to foobar function or foobar column. This could be fixed by doing this left-recursion, as ANTLR says.

Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

@nyh
Copy link
Contributor

nyh commented May 30, 2023

Ah right, maybe that's not the issue. I tried to add term to unaliasedSelector, but ANTLR3 started throwing errors about the grammar not being LL(*).

Yes, ANTLR3 and its "LL(*)" is very sad, especially since good LR parsers have been around since the 1970s (e.g., Yacc. See https://en.wikipedia.org/wiki/LALR_parser). You can see an example in alternator/expressions.g, where I had to "work around" the need to support a common prefix and also operator precedence - something which is trivial to do in Yacc - in ANTLR3 by splitting one rule into a series of rules. But it least it is (or should be) doable.

I think that means that there's two branches with the same begninning (?).

Yes, that seems to be exactly what LL(*) doesn't support and you need to work around.

The idea with column names and string literals came to my head and seemed plausible. Especially because the grammar compiles with intLiteral instead of term, albeit with ambiguity warnings.

I think Cql.g already distinguishes between STRING_LITERAL (single quote) and QUOTED_NAME (double quote) and in most contexts, only one is allowed.

@avikivity
Copy link
Member

avikivity commented May 30, 2023

Ah right, maybe that's not the issue. I tried to add term to unaliasedSelector, but ANTLR3 started throwing errors about the grammar not being LL(*).

Yes, ANTLR3 and its "LL(*)" is very sad, especially since good LR parsers have been around since the 1970s (e.g., Yacc. See https://en.wikipedia.org/wiki/LALR_parser). You can see an example in alternator/expressions.g, where I had to "work around" the need to support a common prefix and also operator precedence - something which is trivial to do in Yacc - in ANTLR3 by splitting one rule into a series of rules. But it least it is (or should be) doable.

antlr4 is supposed to be better here (not sure exactly how)

I think that means that there's two branches with the same begninning (?).

Yes, that seems to be exactly what LL(*) doesn't support and you need to work around.

It's easy to work around.

 function_call :=
    ident '(' ... ')' { ... }
  | ident

@nyh
Copy link
Contributor

nyh commented Jun 20, 2023

In issue #14310 I re-discovered this issue. I noted that it's a general SELECT parser bug and not only about constant values passed to UDFs as the title of this issue suggests: Native functions are also affected and can't get constant parameters, and even more basically, selecting just a constant (I know it's silly, but it should be allowed) is a syntax error. Cassandra 4 allows these things, and Scylla doesn't.

I mentioned that I tried the patch:

+++ b/cql3/Cql.g
@@ -419,9 +419,16 @@ selector returns [shared_ptr<raw_selector> s]
     : us=unaliasedSelector (K_AS c=ident { alias = c; })? { $s = ::make_shared<
raw_selector>(us, alias); }
     ;
 
+selectionLiteral returns [expression value]
+    : c=constant           { $value = std::move(c); }
+    | K_NULL               { $value = make_untyped_null(); }
+    | e=marker             { $value = std::move(e); }
+    ;
+
 unaliasedSelector returns [expression s]
     @init { expression tmp; }
     :  ( c=cident                                  { tmp = unresolved_identifier{std::move(c)}; }
+       | z=selectionLiteral                        { tmp = std::move(z); }
        | K_COUNT '(' countArgument ')'             { tmp = make_count_rows_function_expression(); }
        | K_WRITETIME '(' c=cident ')'              { tmp = column_mutation_attribute{column_mutation_attribute::attribute_kind::writetime,
                                                                                               unresolved_identifier{std::move(c)}}; }

(I used the name "selectionLiteral" that Cassandra's parser uses)

This patch almost works - it does print an ANTLR warning which I don't understand and will have to be fixed, but the build does finish. The problem is that when actually used with a constant selector, e.g., SELECT tinyintAsBlob(4) FROM tbl WHERE p=1 we get a runtime error:

no way to express SELECT constant in the grammar yet

So I guess this needs support also in the evaluate() code.

A test case (passes on Cassandra, fails on Scylla):

import pytest
from util import new_test_table, unique_key_int

@pytest.fixture(scope="module")
def table1(cql, test_keyspace):
    with new_test_table(cql, test_keyspace, "p int, b blob, PRIMARY KEY (p)") as table:
        yield table

def test_constant_function_parameter(cql, table1):
    p = unique_key_int()
    cql.execute(f"INSERT INTO {table1} (p, b) VALUES ({p}, 0x03)")
    # This passes on Scylla today (constant function arguments in restriction)
    assert [(p,)] == list(cql.execute(f"SELECT p FROM {table1} WHERE p={p} AND b=tinyintAsBlob(3) ALLOW FILTERING"))
     # But this doesn't (constant function arguments in selector)
    assert [(b'\x04',)] == list(cql.execute(f"SELECT tinyintAsBlob(4) FROM {table1} WHERE p={p}"))

nyh added a commit to nyh/scylla that referenced this issue Jul 18, 2023
Code in functions.cc creates the different TYPEasblob() and blobasTYPE()
functions for all type names TYPE. The functions for the "counter" type
were skipped, supposedly because "counters are not supported yet". But
counters are supported, so let's add the missing functions.

The code fix is trivial, the tests that verify that the result behaves
like Cassandra took more work.

After this patch, unimplemented::cause::COUNTERS is no longer used
anywhere in the code. I wanted to remove it, but noticed that
unimplemented::cause is a graveyard of unused causes, so decided not
to remove this one either. We should clean it up in a separate patch.

Fixes scylladb#14742

Also includes tests for tangently-related issues:
Refs scylladb#12607
Refs scylladb#14319

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
nyh added a commit to nyh/scylla that referenced this issue Jul 30, 2023
Code in functions.cc creates the different TYPEasblob() and blobasTYPE()
functions for all type names TYPE. The functions for the "counter" type
were skipped, supposedly because "counters are not supported yet". But
counters are supported, so let's add the missing functions.

The code fix is trivial, the tests that verify that the result behaves
like Cassandra took more work.

After this patch, unimplemented::cause::COUNTERS is no longer used
anywhere in the code. I wanted to remove it, but noticed that
unimplemented::cause is a graveyard of unused causes, so decided not
to remove this one either. We should clean it up in a separate patch.

Fixes scylladb#14742

Also includes tests for tangently-related issues:
Refs scylladb#12607
Refs scylladb#14319

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@cvybhu cvybhu removed their assignment Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants