Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: enhance SHOW FUNCTIONS command #99736

Merged

Conversation

luigidellaquila
Copy link
Contributor

@luigidellaquila luigidellaquila commented Sep 21, 2023

Fixes #99507

Enhance SHOW FUNCTIONS command to return as structured information as possible about the function signature, ie.

  • function name
  • return type
  • param names
  • param types
  • param descriptions

For now, as an example, the annotations are used only on sin() and date_parse() functions; if we agree on this approach, I'll proceed to

  • enhance all the currently implemented functions with the needed information
  • improve the function tests to verify that every new implemented function provides meaningful information

This feature can be useful for the end user, but the main goal is to give Kibana an easy way to produce in-line documentation (contextual messages, autocomplete) for functions

Similar to current implementation, that has a @Named("paramName") annotation for function parameters, this PR introduces two more annotations @Param(name, type, description, optional) and @FunctionInfo() to provide information about single parameters and functions.

The result of SHOW FUNCTIONS query will have the following columns:

  • name (keyword): the function name
  • synopsis (keyword): the full signature of the funciton, eg. double sin(n:integer|long|double)
  • argNames (keyword MV): the function argument names
  • argTypes (keyword MV): the function argument types
  • argDescriptions (keyword MD): a textual description of each function argument
  • returnType (keyword): the return type of the function
  • description (keyword): a textual description of the function

Open questions:

  • how structured shoud types be? Eg. should we have a strict @Typed("keyword")/@Typed({"keyword", "text"}) or should we have a more generic type description, eg. @Typed("numeric"), @Typed("any")? The first one is more useful for API consumption but it's hard with our complex type system (type classes, custom types, unsupported and so on); the second one is less structured, but probably more useful for documentation, that is the most immediate use case of this feature. All the types are listed explicitly

  • we have alternatives for the synopsis, eg.

    • ~~functionName(<paramName>:<paramType>, ...): <returnType>~
    • <returnType> functionName(<paramName>:<paramType>, ...)
    • <returnType> functionName(<paramType> <paramName>, ...)
      Using <returnType> functionName(<paramName>:<paramType>, ...) for now. If multiple types are supported, then they will be separated by pipes, eg. double sin(n:integer|long|double).

@github-actions
Copy link

Documentation preview:

@elasticsearchmachine elasticsearchmachine added Team:QL (Deprecated) Meta label for query languages team v8.11.0 labels Sep 21, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)

@elasticsearchmachine
Copy link
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

@dej611
Copy link
Contributor

dej611 commented Sep 21, 2023

Cool PR @luigidellaquila ! That would be great to use it in Kibana.

Few comments:

  • I would prefer the functionName(<paramName>[?]:<paramType>, ...): <returnType> annotation type for the synopsis (just personal reference)
  • For our use case the argDescriptions and description fields would probably be discarded as we need to translate them anyway. So not sure if it's worth keeping them rather than publish them on the doc (which would be great)
  • I think what is missing here is the variadic function definition, which some functions have (i.e. concat): I think a flag like variadic here would work for us.
  • as for optional args, rather than the (Optional) text in the description I would add a ? either at the end of the arg name (dateString?) or at the beginning of the type (?keyword) to indicate such thing. In terms of synopsis using a notation as dateString ?: keyword would work

Taking as example the date_parse definition, I would propose something like:

name:keyword synopsis:keyword argNames:keyword argTypes:keyword variadic: boolean returnType:keyword
date_parse date_parse(datePattern:keyword, dateString?:keyword): date [datePattern, dateString?] [keyword, keyword] false date

@nik9000
Copy link
Member

nik9000 commented Sep 21, 2023

  • So not sure if it's worth keeping them rather than publish them on the doc (which would be great)

The docs actually use the show functions infrastructure to generate bits of themselves. Something like that should be possible.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is very cool! If others are happy with this approach as well, this LGTM.

@@ -40,7 +43,13 @@ public class DateParse extends ScalarFunction implements OptionalArgument, Evalu
private final Expression field;
private final Expression format;

public DateParse(Source source, Expression first, Expression second) {
@Described("Parses a string into a date value")
@Typed("date")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought (maybe for the future): I think it's really good that type information is showing up in the function class; currently, we generate docs by inferring supported types from the tested types - which is great in terms of automation, but comes with a bit of indirection and is not so explicit.

Maybe, we could evolve this by having a proper method on each function that returns the supported input and output types; this would be more explicit and the automatically generated tests could pick that up as well.

With this PR, input/output types are kinda defined in two places now: one is here, via annotations, and the other is in the abstract function test case and is inferred from the tested types. There's also a chance of this drifting apart and being inconsistent.

List<String> args = new ArrayList<>(params.length);
Constructor<?> constructor = constructors[0];
Described functionDescAnn = constructor.getAnnotation(Described.class);
String funcitonDescription = functionDescAnn == null ? "" : functionDescAnn.value();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String funcitonDescription = functionDescAnn == null ? "" : functionDescAnn.value();
String functionDescription = functionDescAnn == null ? "" : functionDescAnn.value();

@astefan
Copy link
Contributor

astefan commented Sep 21, 2023

how structured shoud types be? Eg. should we have a strict @typed("keyword")/@typed({"keyword", "text"}) or should we have a more generic type description, eg. @typed("numeric"), @typed("any")? The first one is more useful for API consumption but it's hard with our complex type system (type classes, custom types, unsupported and so on); the second one is less structured, but probably more useful for documentation, that is the most immediate use case of this feature.

I think it would be great if we can synchronize the parameter types with the error messages we generate whenever a wrong parameter type is used. For example, in ToVersion when resolving the types we generate an error of the form argument of [] must be [version, keyword or text], found value [..... Imo, whatever form we use in these errors the same must be used in show functions.

@luigidellaquila
Copy link
Contributor Author

I'm not pushing the railroad diagrams because the generation does not seem to work very well on my machine. In case, it's something we'll have to check before merging

@dej611
Copy link
Contributor

dej611 commented Sep 21, 2023

Reporting feedback as one who had to go thru all the ESQL documentation to workout arg, types and returning types of every function, I would rather prefer that synopsis snippet above in the documentation rather than the railroad diagram.

@luigidellaquila
Copy link
Contributor Author

Thanks everybody for the feedback

Maybe, we could evolve this by having a proper method on each function that returns the supported input and output types; this would be more explicit and the automatically generated tests could pick that up as well.

I thought a bit about this, the real problem is that we don't have an instance of the function (all this is working by reflection) so the best we can do is have a static method.

In general, having the tests decide the expected output of a function is not an optimal solution IMHO: we should be aware of what the function accepts/returns and the tests should verify it, so having a more strict way of defining types is preferrable.

For example, in ToVersion when resolving the types we generate an error of the form argument of [] must be [version, keyword or text], found value [..... Imo, whatever form we use in these errors the same must be used in show functions.

The number of possible options scares me a bit here, especially for the documentation, but I agree that it would be good to be consistent between docs and error messages

@luigidellaquila
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@luigidellaquila
Copy link
Contributor Author

@elasticmachine update branch

@luigidellaquila
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@nik9000
Copy link
Member

nik9000 commented Sep 21, 2023

Reporting feedback as one who had to go thru all the ESQL documentation to workout arg, types and returning types of every function, I would rather prefer that synopsis snippet above in the documentation rather than the railroad diagram.

We can absolutely create these and stuff them somewhere so kibana could pick them up. I'd have no problem just making markdown files and landing them somewhere. The railroad diagram was a request from someone else and it's ok with me. There's also some type information at the bottom of some of the functions now. But neither the type information nor the railroad diagrams is complete at the moment.

Comment on lines 49 to 53
public DateParse(
Source source,
@Named("datePattern") @Typed("keyword") @Described("A valid date pattern") @Optional Expression first,
@Named("dateString") @Typed("keyword") @Described("A string representing a date") Expression second
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having multiple annotations (which can have different order, might be missing, etc..) use one with multiple fields:
@Param(name="datePattern", type="keyword", description="A valid date pattern", optional=true/false)

Separately it's worth checking if the field name is not already available in the debug info (method.getParameterNames()) - this would avoid the name attribute and use the field name instead.
This will force us to actually give them better names such as datePattern instead of first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Debug info could not be available in all the cases, but I'll give it a try

Comment on lines 27 to 28
@Described("Returns the trigonometric sine of an angle")
@Typed("double")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type information is already available in getDataType() - better to get it from there to avoid getting out of sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataType() method is not an option here unfortunately, it's an instance method and here we don't have function instances, but only classes and reflection.
Even worse, some functions return different types depending on the input, so calling that method makes sense only in the context of a query.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good Luigi! Left some comments.
It would be something nice to have for 8.11 however please time box it - I expect in time a lot of minor tweaks here and there but for now, time is of the essence.

import static org.elasticsearch.xpack.ql.type.DataTypes.KEYWORD;

public class ShowFunctions extends LeafPlan {

private final List<Attribute> attributes;

public record ArgSignature(String name, String type, String description, boolean optional) {}

public record FunctionSignature(String name, List<ArgSignature> args, String returnType, String description, boolean variadic) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to 'promote' this to FunctionDescription and move that as part of the EsqlFunctionRegistry so that it gets created each time a function definition gets created.
Move the reflection/annotation logic there along with some fallbacks in case the annotations are missing (which we should catch through some basic tests that check the functions registered and that they all have the proper description/annotation).

ShowFunction itself should just call the functionregistry and iterate over the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this logic belongs to EsqlFunctionRegistry, I'll move it there for now.
Moving it into FunctionDescription and generating it only once is a bit more complicated, because that class is part of QL. It's probably a good candidate for a follow-up PR, to keep this one small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however please time box it

Definitely!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant FunctionDefinition, FunctionDescription would be a new class specific to ESQL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, FunctionDefinition

@astefan
Copy link
Contributor

astefan commented Sep 22, 2023

For example, in ToVersion when resolving the types we generate an error of the form argument of [] must be [version, keyword or text], found value [..... Imo, whatever form we use in these errors the same must be used in show functions.

The number of possible options scares me a bit here, especially for the documentation, but I agree that it would be good to be consistent between docs and error messages

I'm not following, what "possible options"?
Please, note that the error message for some of these functions is automatically built. ToVersion for example, has an EVALUATORS constant that defines a list of data types and their correspondent evaluator/converter. The error message for improper data type parameters comes from the key of this EVALUATORS Map.

@luigidellaquila
Copy link
Contributor Author

I'm not following, what "possible options"?

I mean the number of supported data types for a parameter. A funciton like to_string() will accept all the available data types, should we return a list of tens of types? Or should we just return "any" or something similar? Maybe the first is better for API development, but the second one will be easier to understand for docs.

@luigidellaquila
Copy link
Contributor Author

@astefan I went with your suggestion to have the full list of supported types, and IMHO the result is pretty good.

I also enhanced the tests to check that types defined in the annotations are the same as those actually tested (and that generate the type tables for documentation), so that we are sure that the three (annotations, actual unit test coverage and docs) are aligned.

For now the test is lenient (ie. if a function is not annotated, the checks are skipped) but as soon as all the functions are correctly annotated we can easily enable a more strict test logic.

continue; // TODO remove this eventually, so that all the functions will have to provide singature info
}
Set<String> signatureTypes = typesFromSignature.get(i);
assertEquals(annotationTypes, signatureTypes);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assertions in @AfterClass are just terrible, but it's the only place where I have all the signatures.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking good I believe. Left some small comments.

Another comment, that I mentioned previously and where I didn't get any input for:

Please, note that the error message for some of these functions is automatically built. ToVersion for example, has an EVALUATORS constant that defines a list of data types and their correspondent evaluator/converter. The error message for improper data type parameters comes from the key of this EVALUATORS Map.

The fact that we manually add the parameter types in annotations for each function makes it a fragile mechanism. I haven't looked at all our functions to see if all of them use these evaluators I mentioned in my previous comment, but if they are (or we adapt them all to use these evaluators inside a Map) then the show functions command would use the same mechanism that's already used to list the allowed data types in our functions' error messages. And this is a much more robust mechanism, covered also by testing. I would, at least, look into this as a possibility to have something that's:

  • centralized
  • automatic
  • less error prone

name:keyword | synopsis:keyword | argNames:keyword | argTypes:keyword | argDescriptions:keyword | returnType:keyword | description:keyword | optionalArgs:boolean | variadic:boolean
is_finite |? is_finite(arg1:?) |arg1 |? | "" |? | "" | false | false
is_infinite |? is_infinite(arg1:?) |arg1 |? | "" |? | "" | false | false
is_nan |? is_nan(arg1:?) |arg1 |? | "" |? | "" | false | false
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal preference: have a query show functions | keep synopsis to make it easier in reviews to see the actual changes in synopsis.

Comment on lines 256 to 258
if (List.class.isAssignableFrom(params[i].getType())) {
variadic = true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (List.class.isAssignableFrom(params[i].getType())) {
variadic = true;
}
variadic |= List.class.isAssignableFrom(params[i].getType());

Constructor<?> constructor = constructors[0];
FunctionInfo functionInfo = constructor.getAnnotation(FunctionInfo.class);
String functionDescription = functionInfo == null ? "" : functionInfo.description();
String[] returnType = functionInfo == null ? new String[] { "?" } : functionInfo.returnType();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the meaning of ? as a return parameter type? If it's a function that's yet to have its proper definition added to it, I suggest an empty string for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's for functions without a proper definition.
It's a transient solution though, my intention to annotate all the functions correctly and then enable a more strict test strategy to verify that ? is never returned.

@luigidellaquila
Copy link
Contributor Author

parameter types in annotations for each function makes it a fragile mechanism. [...] see if all of them use these evaluators I mentioned in my previous comment, but if they are [...] then the show functions command would use the same mechanism that's already used to list the allowed data types

That would be really convenient, but unfortunately it's not the case. to_*() functions are the only functions that use this mechanism, most of (probably all) the other functions use custom logic to validate the inputs and to generate the evaluators, and this logic in some cases depends on the query context, so it's not something we can use in SHOW FUNCTIONS.
I tried to find a better way to infer the types, but I didn't find a solution that works for all the functions and that is robust enough with current function implementation.

@luigidellaquila
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/docs

@luigidellaquila
Copy link
Contributor Author

@elasticmachine update branch

@luigidellaquila
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/docs

@luigidellaquila
Copy link
Contributor Author

@costin @astefan any additional comments on this PR?
It would be cool to merge it today so that I have time to annotate all the functions.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one (late) question: the initial attempt at obtaining this functionality (in ESQL-835) was turned down for similar reasons noted already, but also internationalisation. If this is no longer a concern and Kibana could generate better autocomplete with this, it LGTM.

@luigidellaquila
Copy link
Contributor Author

About internationalization, Kibana won't be able to use our descriptions anyway (for that exact reason), so we could decide to drop them, unless we decide to use them for docs later.
Param names are a different story, they were there already and they were not internationalized (and they should not be IMHO)

@dej611
Copy link
Contributor

dej611 commented Oct 2, 2023

Param names are a different story, they were there already and they were not internationalized (and they should not be IMHO)

Agree here. The only thing which would make sense to localize are type string, but we can take of them on the kibana side as well.

As for the documentation, it would be nice to have a shared documentation place for both Elasticsearch and Kibana ESQL area, internationalisation included where both can pull it and always stay in sync.

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Luigi - this is good improvement however I'm afraid there's too much information for this to be useful for human users; I've raised an issue for discussion at #100146
Let's get this in for 8.11 and we'll sort things out later.

@luigidellaquila luigidellaquila added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Oct 2, 2023
@elasticsearchmachine elasticsearchmachine merged commit 6e79013 into elastic:main Oct 2, 2023
12 checks passed
@luigidellaquila luigidellaquila deleted the esql/function_signatures branch October 2, 2023 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >enhancement ES|QL-ui Impacts ES|QL UI Team:QL (Deprecated) Meta label for query languages team v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ES|QL] Improve show functions output
9 participants