[FEATURE] PPL query re-writer in Observability plugin #123

Open
anirudha opened this issue May 11, 2022 · 2 comments

@anirudha
Collaborator

Is your feature request related to a problem?
We need the UI widgets to rewrite and append to or extend the PPL stats command and other grammar, so that they can issue new queries to render visualization changes.

What solution would you like?
We need a library that can do the rewrite operations, preferably with a syntax tree.

eg.
https://blog.dangl.me/archive/creating-antlr-applications-in-typescript/

@anirudha anirudha added the enhancement New feature or request label May 11, 2022
@mengweieric mengweieric self-assigned this May 12, 2022
@anirudha anirudha changed the title [FEATURE] PPL query re-writer in Dashboards plugin [FEATURE] PPL query re-writer in Observability plugin May 16, 2022
@anirudha
Collaborator Author

anirudha commented May 16, 2022

What interface functions do we need for this library, so that visualizations can work with PPL from the config panel or via drag and drop?

eg.
add_field_to_x-axis( field)

@mengweieric
Collaborator

mengweieric commented May 20, 2022

[WIP] PPL Query Parsing and Building with Antlr for Dashboard Observability

Overview

The Observability dashboard currently uses regular expressions to match, insert, append, replace, and delete query segments. This is how it parses and rewrites queries for use cases such as inserting the time range from the time picker or extracting the index pattern from a query. The approach works for now, but its downsides are clear. Regular expressions are costly to maintain and scale as more complex use cases are added. Each regular expression is also usually coupled to one or only a few use cases, so given the complexity of PPL, a large number of them would have to be created and maintained to cover the majority of use cases and corner cases. It is therefore unrealistic to keep the regex solution for features that require query parsing.

Observability visualization also currently gives users only limited means to visualize their data: they have to write the exact query by hand every time they render a visualization. That requires a solid understanding of both the language itself and which aggregation queries drive each type of visualization. Users without enough PPL or visualization background often feel lost, because there is a gap between the visualization they want and writing the correct query for that type of visualization.

Proposal

To address the problems stated above and make visualizing data effortless, the existing regular-expression-based solution is replaced with a more robust Antlr-based solution for query rewrites.

Query manager

Along with this change, a query manager is introduced. It acts as a wrapper on top of the Antlr solution, manages the internal modules, and exposes interfaces to consumers for query parsing and building use cases.

Why Antlr

ANTLR4 is a very popular parser generator for language parsing and recognition, widely used by individuals and organizations to build languages, tooling, and frameworks. Compared with some alternatives, ANTLR is full-featured, works out of the box, and integrates well with IDEs. In addition, the search solution for Observability is built on PPL, which itself uses Antlr as its basic building block. Adopting Antlr4 for Dashboard Observability therefore minimizes the effort needed to support query-related features and unifies our approaches and methodologies for building search-related user interfaces.

Requirements

  • It should support parsing a PPL query into units
  • It should support building a PPL query from query units

Architecture

Overall, the query manager exposes interfaces to the outside world to support query parsing and building. At its core is the Antlr4ts engine, composed of a lexer and a parser, which processes the original query and generates a concrete syntax tree (CST).

Internally, the query manager consists of two modules: the query parser and the query builder. The query parser is essentially a visitor that traverses the CST and transforms it into an abstract syntax tree (AST). The query builder works in the opposite direction: it takes a set of parsed units and recursively builds a new AST.
[Image: Screen Shot 2022-08-10 at 11 51 52 AM]
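
The CST-to-AST step described above can be illustrated with a minimal sketch. It uses a hand-rolled tree type rather than the classes Antlr4ts generates, and the names CstNode, AstNode, PUNCTUATION, and toAst are hypothetical; the plugin's actual visitor may be structured quite differently. The idea shown is only that the conversion walks the tree recursively and drops tokens that carry no meaning once the structure is known.

```typescript
// Hypothetical, simplified tree shapes (not the Antlr4ts-generated types).
interface CstNode {
  rule: string; // grammar rule or token name
  text: string; // matched text for leaf tokens, "" for inner rules
  children: CstNode[];
}

interface AstNode {
  kind: string;
  value: string;
  children: AstNode[];
}

// Tokens assumed to be purely syntactic for this sketch.
const PUNCTUATION = new Set<string>(["LPAREN", "RPAREN", "COMMA", "PIPE"]);

// Recursively converts a CST node into an AST node, skipping punctuation.
function toAst(node: CstNode): AstNode | null {
  if (PUNCTUATION.has(node.rule)) {
    return null;
  }
  const children: AstNode[] = [];
  for (const child of node.children) {
    const converted = toAst(child);
    if (converted !== null) {
      children.push(converted);
    }
  }
  return { kind: node.rule, value: node.text, children };
}

// A hand-written CST for the expression avg(bytes).
const sampleCst: CstNode = {
  rule: "statsFunction",
  text: "",
  children: [
    { rule: "functionName", text: "avg", children: [] },
    { rule: "LPAREN", text: "(", children: [] },
    { rule: "field", text: "bytes", children: [] },
    { rule: "RPAREN", text: ")", children: [] },
  ],
};

const sampleAst = toAst(sampleCst);
```

Here sampleAst keeps only the function name and the field as children; the parentheses are gone.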

Query parser

The query parser is one of the core modules; it parses a query into query parts. It can be further divided into a syntax recognizer, a grammar parser, and an AST builder. The AST builder generates an AST containing a list of connected PPL nodes, each of which corresponds to a part of the original query.
[Image: ppl parser(1)]

Currently, the query parser only supports parsing the stats command of a query; all other parts of the query are left as they are. The result of parsing is therefore essentially a stats AST consisting of the nodes listed below.
[Image: Screen Shot 2022-08-10 at 6 53 09 PM]

[Image: Screen Shot 2022-08-11 at 7 56 10 AM]

Once it has this tree structure, the query parser invokes the getTokens method of the root node, which in turn recursively invokes getTokens on each of its children to produce the final parsed object.
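
As a rough illustration of the recursive getTokens call, here is a minimal sketch. Only the method name getTokens comes from the description above; the node classes (FieldNode, GroupByNode, AggregationNode, StatsNode) and their constructors are assumptions made for the example and cover just a subset of the parsed fields.

```typescript
// Hypothetical node classes; the real node types in the plugin may differ.
interface PPLNode {
  getTokens(): unknown;
}

class FieldNode implements PPLNode {
  private readonly fieldName: string;
  constructor(fieldName: string) {
    this.fieldName = fieldName;
  }
  getTokens(): { name: string } {
    return { name: this.fieldName };
  }
}

class GroupByNode implements PPLNode {
  private readonly fields: FieldNode[];
  constructor(fields: FieldNode[]) {
    this.fields = fields;
  }
  // Delegates to each child field node.
  getTokens(): { group_fields: Array<{ name: string }> } {
    return { group_fields: this.fields.map((f) => f.getTokens()) };
  }
}

class AggregationNode implements PPLNode {
  private readonly functionName: string;
  private readonly valueExpression: string;
  private readonly alias: string;
  constructor(functionName: string, valueExpression: string, alias: string) {
    this.functionName = functionName;
    this.valueExpression = valueExpression;
    this.alias = alias;
  }
  getTokens() {
    return {
      function_alias: this.alias,
      function: { name: this.functionName, value_expression: this.valueExpression },
    };
  }
}

class StatsNode implements PPLNode {
  private readonly aggregation: AggregationNode;
  private readonly groupBy: GroupByNode;
  constructor(aggregation: AggregationNode, groupBy: GroupByNode) {
    this.aggregation = aggregation;
    this.groupBy = groupBy;
  }
  // The root collects the tokens of its children into one object.
  getTokens() {
    return {
      aggregations: this.aggregation.getTokens(),
      groupby: this.groupBy.getTokens(),
    };
  }
}

// Hand-built tree for: stats avg(bytes) as avg_bytes by host
const rootNode = new StatsNode(
  new AggregationNode("avg", "bytes", "avg_bytes"),
  new GroupByNode([new FieldNode("host")])
);
const parsedTokens = rootNode.getTokens();
```

Calling getTokens on the root is enough; each node only knows how to describe itself and asks its children for the rest.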

interface PPLQueryParsedStats {
  aggregations: {
    function_alias: string;
    function: {
        name: string;
        value_expression: string;
        percentile_agg_function: string;
    };
  };
  groupby: {
    group_fields: Array<GroupField>;
    span: Span;
  };
  partitions: AggFlag;
  all_num: AggFlag;
  delim: AggFlag;
  dedup_split_value: AggFlag;
}

interface GroupField {
  name: string;
}

interface AggFlag {
  keyword: string;
  sign: string;
  value: string;
}

Query builder

As stated above, the query builder works in the opposite direction from the query parser: it takes a recipe object that contains PPL query parts and recursively builds an aggregation AST subtree.

interface PPLQueryRecipe {
  aggregations: {
    function_alias: string;
    function: {
        name: string;
        value_expression: string;
        percentile_agg_function: string;
    };
  };
  groupby: {
    group_fields: Array<GroupField>;
    span: Span;
  };
  partitions: AggFlag;
  all_num: AggFlag;
  delim: AggFlag;
  dedup_split_value: AggFlag;
}

Once it has recursively built an AST from the recipe, the query builder invokes the toString() method of each node, recursively composes a new stats substring, and returns it. The query builder is usually paired with the query parser, so when it starts to build a new stats subquery it already knows, from the parsed results, the start and end indices of the stats command in the original query. From these indices it determines where to insert or append the new stats subquery.
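
The index-based splice described above might look roughly like the following sketch. SimpleRecipe, statsToString, and replaceStatsSegment are hypothetical names, the recipe is a flattened simplification of PPLQueryRecipe, and the indices are computed with indexOf only to keep the example self-contained; in the plugin they would come from the parsed results.

```typescript
// Hypothetical, flattened stand-in for the recipe object.
interface SimpleRecipe {
  functionName: string;
  valueExpression: string;
  alias: string;
  groupFields: string[];
}

// Composes a stats substring from the simplified recipe.
function statsToString(recipe: SimpleRecipe): string {
  const agg = `${recipe.functionName}(${recipe.valueExpression})`;
  const alias = recipe.alias ? ` as ${recipe.alias}` : "";
  const by = recipe.groupFields.length > 0 ? ` by ${recipe.groupFields.join(", ")}` : "";
  return `stats ${agg}${alias}${by}`;
}

// Replaces the characters from statsStart to statsEnd (inclusive) with newStats.
function replaceStatsSegment(
  query: string,
  statsStart: number,
  statsEnd: number,
  newStats: string
): string {
  return query.slice(0, statsStart) + newStats + query.slice(statsEnd + 1);
}

const originalQuery = "source = logs | stats count() by host | head 10";
// Assumption: in practice these indices are taken from the parser output.
const statsStartIndex = originalQuery.indexOf("stats");
const statsEndIndex = originalQuery.indexOf(" | head") - 1;

const newStatsText = statsToString({
  functionName: "avg",
  valueExpression: "bytes",
  alias: "avg_bytes",
  groupFields: ["host"],
});
const rewrittenQuery = replaceStatsSegment(originalQuery, statsStartIndex, statsEndIndex, newStatsText);
// rewrittenQuery: "source = logs | stats avg(bytes) as avg_bytes by host | head 10"
```

Because only the stats segment is replaced, the commands before and after it are carried over unchanged.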

Interfaces

Once initialized, the Query Manager is a singleton instance throughout the observability plugin lifecycle. It exposes two interfaces that, when invoked, create query parser and query builder instances.

// return new PPLQueryBuilder instance
queryBuilder: () => PPLQueryBuilder;

// return new PPLQueryParser instance
queryParser: () => PPLQueryParser;

// example usage
const qm = new QueryManager();
const qp = qm.queryParser();
const qb = qm.queryBuilder();

/** query parser **/
// parse query to get CST
parse: (pplQuery: string) => PPLQueryParser; 

// get AST
getStats: () => PPLStatsTokens;

// example usage
const tokens = qp.parse(query).getStats();

/** query builder **/
build: (query: string, pplStatsRecipe: PPLQueryRecipe) => string;

// example usage
const newQuery = qb.build(query, pplStatsRecipe);
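
To show how the calling pattern above fits together (a factory per module, and parse returning the parser so that getStats can be chained), here is a minimal runnable sketch with stub classes. StubQueryManager, StubQueryParser, and StubQueryBuilder are hypothetical stand-ins: they split the query on the pipe character instead of using Antlr, return the raw stats text instead of PPLStatsTokens, and take a plain string instead of a PPLQueryRecipe.

```typescript
class StubQueryParser {
  private segments: string[] = [];

  // Returns `this` so that parse(...).getStats() can be chained.
  parse(pplQuery: string): StubQueryParser {
    // Naive split; a real parser would not break on pipes inside string literals.
    this.segments = pplQuery.split("|").map((s) => s.trim());
    return this;
  }

  // Placeholder: returns the raw stats segment rather than structured tokens.
  getStats(): string | undefined {
    return this.segments.find((s) => s.startsWith("stats"));
  }
}

class StubQueryBuilder {
  // Placeholder: swaps the stats segment for the provided text.
  build(query: string, newStats: string): string {
    return query
      .split("|")
      .map((s) => s.trim())
      .map((s) => (s.startsWith("stats") ? newStats : s))
      .join(" | ");
  }
}

class StubQueryManager {
  // Each call hands out a fresh parser or builder instance.
  queryParser(): StubQueryParser {
    return new StubQueryParser();
  }
  queryBuilder(): StubQueryBuilder {
    return new StubQueryBuilder();
  }
}

const stubManager = new StubQueryManager();
const inputQuery = "source = logs | stats count() by host";

const parsedStats = stubManager.queryParser().parse(inputQuery).getStats();
const rebuiltQuery = stubManager.queryBuilder().build(inputQuery, "stats avg(bytes) by host");
```

With this input, parsedStats is "stats count() by host" and rebuiltQuery is "source = logs | stats avg(bytes) by host".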

Features with query manager integrated

Bi-directional query sync for visualizations

Because an aggregation query can be composed through the configuration UI, a new scenario arises: a user may edit the stats expression of a query directly, which leaves the UI and the query in inconsistent states. To avoid confusing users, observability visualization supports a two-way query sync on the update/re-visualize action, in which two states

  1. configurations in data configuration UI
  2. aggregation query in query bar

are connected bi-directionally. A change to either one updates the other (this currently works only for aggregations and group by), so that they are consistent when an update or change action is issued.
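
The two-way sync can be sketched as a single function that is told which side changed and regenerates the other. Everything named here (StatsConfig, VizState, syncOnUpdate, naiveParse, naiveBuild) is hypothetical. The parse and build functions are passed in as parameters; the naive regex-based versions below are placeholders standing in for the Antlr-based query parser and builder, used only so that the example runs on its own.

```typescript
// Hypothetical, simplified view of the data configuration UI state.
interface StatsConfig {
  aggregation: string; // e.g. "avg(bytes)"
  groupBy: string[]; // e.g. ["host"]
}

interface VizState {
  query: string;
  config: StatsConfig;
}

type ParseFn = (query: string) => StatsConfig;
type BuildFn = (query: string, config: StatsConfig) => string;

// Returns a new state in which the unchanged side is derived from the changed one.
function syncOnUpdate(
  state: VizState,
  changed: "query" | "config",
  parse: ParseFn,
  build: BuildFn
): VizState {
  if (changed === "query") {
    return { query: state.query, config: parse(state.query) };
  }
  return { query: build(state.query, state.config), config: state.config };
}

// Placeholder parser: handles only "stats <agg> by <fields>" at the end of a query.
const naiveParse: ParseFn = (query) => {
  const match = /stats\s+(.+?)\s+by\s+(.+)$/.exec(query);
  if (match === null) {
    return { aggregation: "", groupBy: [] };
  }
  return {
    aggregation: match[1] ?? "",
    groupBy: (match[2] ?? "").split(",").map((f) => f.trim()),
  };
};

// Placeholder builder: rewrites everything from "stats" to the end of the query.
const naiveBuild: BuildFn = (query, config) =>
  query.replace(/stats\s+.*$/, () => `stats ${config.aggregation} by ${config.groupBy.join(", ")}`);

const staleState: VizState = {
  query: "source = logs | stats count() by host",
  config: { aggregation: "avg(bytes)", groupBy: ["host", "region"] },
};

// The user edited the configuration UI: the query follows the configuration.
const afterConfigEdit = syncOnUpdate(staleState, "config", naiveParse, naiveBuild);
// The user edited the query bar: the configuration follows the query.
const afterQueryEdit = syncOnUpdate(staleState, "query", naiveParse, naiveBuild);
```

Keeping the direction explicit avoids guessing which side is stale: the caller states what the user just edited, and the other side is rebuilt from it.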

Filters

[to-do]

Autocomplete

[to-do]

Drag & Drop

[to-do]
