Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EQL: Incorporation of Stage 0 transformation into fluent API implementation #2173

Open
homedirectory opened this issue Feb 5, 2024 · 3 comments
Assignees

Comments

@homedirectory
Copy link
Contributor

homedirectory commented Feb 5, 2024

Description

Compilation of EQL expressions involves several phases of analysis and transformation. The focus of
this issue is on the first 2 phases: tokenisation and Stage 0 transformation.

  1. The result of method calls on EQL's fluent API is represented as a sequence of tokens
    corresponding to the called methods, which may also be seen as a sequence of transitions between
    states of an automaton manifested in the form of the fluent API. This phase corresponds to
    the operation of a lexer -- lexical analysis.
  2. The tokens are then subject to an analysis (namely, Stage 0 transformation) that groups them
    according to the rules of EQL's grammar, thereby building an AST for the EQL expression. This
    phase corresponds to syntactic analysis, to some extent, since it makes use of semantic analysis as well.

The following drawbacks have been identified in the course of work with this approach:

  1. The tokenisation phase is not strictly necessary, since method calls on the fluent API (state
    transitions) perform the role of a lexer implicitly: the called method directly corresponds to
    the token. In other words, we get tokens for free.
  2. As a consequence of the first issue, existence of a separate phase dedicated to syntactic
    analysis (Phase 2) leads to duplication of the EQL's grammar rules which are already encoded in
    the automaton that is the fluent API. This duplication occcurs since Phase 2 needs to know how
    to build an AST out of incoming tokens.
  3. Separate existence of two phases, coupled with duplication of grammar rules, requires strict
    synchronisation between them which has its own disadvantages. One example is the current defect
    involving caseWhen with multiple conditions: it is accepted by the fluent API but rejected
    during Stage 0 transformation (Phase 2).

It is proposed to incorporate Phase 2 into Phase 1, effectively moving syntactic analysis into the
implementation of the fluent API. The result of method calls on the fluent API would then be an AST
for the EQL expression. Such an implementation would effectively act as an LR parser (with superpowers, due to the ability of semantic analysis).

Expected outcome

  • Simplified implementation of Stage 0 transformation due to elimination of grammar rules duplication and the need for synchronisation.
  • Elimination of potential errors that could arise because of lack of synchronisation between phases.
@homedirectory
Copy link
Contributor Author

After further discussion and an experimental spike it was concluded that the duplication of grammar rules is inherent to any parsing process. At the moment the only general approach seems to be the usage of a generator for the fluent API and the parser so that grammar rules have to be defined only once.

It was decided that it is better to focus on the construction of a formal grammar for EQL and improvements to the existing parser, until the possibility of fluent API and parser generation is better understood.

@01es
Copy link
Member

01es commented Feb 7, 2024

@homedirectory, can the problems in the existing parser implementation that require improvements be outline?

@homedirectory
Copy link
Contributor Author

@01es, yes, and they should be captured in a separate issue. Among them:

  1. The defect with caseWhen used in conjunction with a combination of conditions.
  2. Insufficient separation of representations of token categories (the abstract syntax tree) and encountered values (the concrete syntax tree), which complicates informal reasoning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants