[Needs help] Replace in-house lexer by HoaCompiler #712

Open: wants to merge 3 commits into base master

@theofidry (Member) commented Apr 5, 2017

Note: this message has been edited so that readers don't have to go through the long discussion here and in the original issue (#601).

Alice ships with an Expression Language that interprets values such as @user* or <current()>. It is currently implemented in-house: a Lexer tokenizes the input string, then a Parser goes through those tokens and transforms them into an understandable expression. For example:

input: '@user*'

tokens returned by the Lexer:

[
   new Token('@user*', new TokenType(TokenType::WILDCARD_REFERENCE_TYPE)),
]

value returned by the Parser:

FixtureMatchReferenceValue::createWildcardReference('user')
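
The sketch below makes the current token-based flow concrete. It shows a regex-driven lexer followed by a parser step for the @user* example. It is an illustration under assumptions, not Alice's code. Token, the 'WILDCARD_REFERENCE_TYPE' string and WildcardReference are simplified hypothetical stand-ins for Token, TokenType and FixtureMatchReferenceValue, and PHP 8.0+ is assumed.

```php
<?php

declare(strict_types=1);

// Hypothetical, simplified stand-in for Alice's Token + TokenType pair.
final class Token
{
    public function __construct(public string $value, public string $type)
    {
    }
}

// Hypothetical stand-in for FixtureMatchReferenceValue::createWildcardReference().
final class WildcardReference
{
    public function __construct(public string $prefix)
    {
    }
}

// Lexer step: classify the input with a regex, as a regex-based lexer would.
function lex(string $input): array
{
    if (preg_match('/^@[A-Za-z_][A-Za-z0-9_]*\*$/', $input) === 1) {
        return [new Token($input, 'WILDCARD_REFERENCE_TYPE')];
    }

    return [new Token($input, 'STRING_TYPE')];
}

// Parser step: turn the token into a value object, or keep it as a plain string.
function parse(string $input): WildcardReference|string
{
    $token = lex($input)[0];

    if ($token->type === 'WILDCARD_REFERENCE_TYPE') {
        // Drop the leading '@' and the trailing '*' to keep the reference prefix.
        return new WildcardReference(substr($token->value, 1, -1));
    }

    return $token->value;
}
```

With this sketch, parse('@user*') returns a WildcardReference whose prefix is 'user', while parse('foo') returns 'foo' unchanged. Every new construct needs another regex branch, and that is where this approach starts to struggle.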

As explained in greater detail here, the fixtures are first built into an understandable structure of objects, which are then evaluated to generate the objects we want.

The plan now is to replace the in-house lexer with HoaCompiler. The current lexer relies on regexes and several passes to tokenize the input, after which the parser tries to create objects from the tokens. HoaCompiler works with a better structure: you write a "grammar", a set of rules describing how strings are parsed, and the compiler returns a tree of nodes from it. That approach is far more robust and avoids many edge cases where regexes are simply not the right tool for the job.

Implementation-wise it's not too complex. Most of the Expression Language is currently tagged as internal, so we can afford BC breaks in that part of the library. That gives us enough freedom to do this work without needing another major release.

This PR provides the start of an implementation. The main work left is writing a grammar that matches our needs, which @Hywan has already started. Once that is done, we can update the Parser to process those Node objects instead of the tokens from the in-house lexer (that part should be relatively easy).

Besides the tests, most of the rules that the Expression Language implements are documented here.

@theofidry theofidry added this to the 3.0.0 milestone Apr 5, 2017

@Hywan commented Apr 11, 2017

Hello @theofidry,

Sorry for the late reply. I have a new job and am not really free for open source right now. I will try to find time. I have not forgotten you at all :-). Please feel free to ping me anytime if I stay silent for too long.

Thanks!

@theofidry theofidry changed the title Replace in-house lexer by HoaCompiler [Needs help] Replace in-house lexer by HoaCompiler Apr 15, 2017

@theofidry (Member, Author) commented Apr 15, 2017

No worries @Hywan, I'm not that free either lately :P, congrats on your new job 😄

@theofidry theofidry modified the milestones: 3.x, 3.0.0 May 7, 2017

@Hywan commented Jun 13, 2017

Hello @theofidry :-),

I have a little bit more free time right now. And as promised, I will help on this PR.

So here is a bit of vocabulary so that we speak the same language:

  • The language we are defining exists in 2 forms: textual, and in-memory PHP objects,
  • The goal of the “parser” is to transform the text into in-memory PHP objects,
  • The textual form is called the input,
  • There are 2 analysers that form the “parser”:
    1. Lexical analyser, also called the lexer, that splits the input into a sequence of lexemes (aka tokens),
    2. Syntactic analyser, also called the parser, that checks whether the tokens are correctly ordered by deriving the sequence of tokens based on rules (defined in a grammar),
  • If both analysers succeed, then an Abstract Syntax Tree (AST) can be produced. Such a tree can easily be visited to check more constraints,
  • An AST can be transformed into an object model: an enhanced API on top of your language, like the DOM for HTML for instance. The object model is your in-memory PHP objects, for instance.

This is a very classical front-end and middle-end compilation workflow. There are other approaches of course, but let's stick with this one.
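
The sketch below illustrates that lexer → parser → AST workflow. It is hand-written and does not use hoa/compiler. It covers a toy subset of the language: plain text interleaved with <{...}> parameters, which may nest. The token names (OPEN, CLOSE, TEXT) and the array-based AST shape are hypothetical choices for the example, and PHP 8.0+ is assumed.

```php
<?php

declare(strict_types=1);

// Lexical analysis: split the input into OPEN ('<{'), CLOSE ('}>') and TEXT lexemes.
function tokenize(string $input): array
{
    $parts = preg_split('/(<\{|\}>)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    return array_map(
        static fn (string $part): array => match ($part) {
            '<{' => ['OPEN', $part],
            '}>' => ['CLOSE', $part],
            default => ['TEXT', $part],
        },
        $parts,
    );
}

// Syntactic analysis by recursive descent:
//   sequence  := (TEXT | parameter)*
//   parameter := OPEN sequence CLOSE
function parseSequence(array $tokens, int &$pos, bool $insideParameter): array
{
    $nodes = [];

    while ($pos < count($tokens)) {
        [$type, $value] = $tokens[$pos];

        if ($type === 'TEXT') {
            $nodes[] = ['text', $value];
            $pos++;
        } elseif ($type === 'OPEN') {
            $pos++;
            // The recursive call consumes the matching CLOSE before returning.
            $nodes[] = ['parameter', parseSequence($tokens, $pos, true)];
        } else {
            // A CLOSE is only valid when it ends a parameter.
            if (!$insideParameter) {
                throw new InvalidArgumentException("Unexpected '}>' at token $pos.");
            }
            $pos++;

            return $nodes;
        }
    }

    if ($insideParameter) {
        throw new InvalidArgumentException("Missing '}>'.");
    }

    return $nodes;
}

// Front-end: lexer, then parser, producing the AST.
function buildAst(string $input): array
{
    $pos = 0;

    return parseSequence(tokenize($input), $pos, false);
}
```

For instance, buildAst('foo <{bar}> baz') yields a text node, then a parameter node wrapping the text 'bar', then another text node. An unbalanced '}>' is rejected by the syntactic analyser instead of being silently mis-tokenized.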


Based on that, what do you want?

So my plan is to just replace the Lexer with HoaCompiler

You want to replace the lexer of Symfony Expression Language with hoa/compiler's. This is possible, but there is a challenge: the token format used by hoa/compiler might not be understood by the parser of Symfony Expression Language.

the Parser which transforms tokens into VOs

What is VO?


I have started to implement your DSL grammar. However, before going further, I need some clarification about your exact needs. If you already have an object model defined, then we can “map”/transform an AST to this object model, and everything else will fall into place.

@theofidry (Member, Author) commented Jun 13, 2017

Thanks for looking into it @Hywan :)

So basically right now we have a very simple ParserInterface:

interface ParserInterface
{
    /**
     * Parses a value, e.g. 'foo' or '$username', to determine whether it is a regular value (like 'foo') or a value that
     * must be processed (like '$username'). If the value must be processed, it will be parsed to generate a value (a
     * ValueInterface instance) ready for processing.
     *
     * @param string $value
     *
     * @throws ExpressionLanguageParseThrowable
     *
     * @return ValueInterface|string|array
     */
    public function parse(string $value);
}

which is what I called the "Expression Language", although it's not exactly one, and it's not Symfony's either. VO stands for Value Object; here that mainly means ValueInterface objects.

So right now that parser is broken down into two parts:

  • The Lexer, whose current behaviour is defined here. It's an in-house lexer that parses the string with regexes and generates a list of tokens from it
  • The Parser itself, which decorates the lexer: it tokenizes the input and then processes the tokens to generate the ValueInterface objects

The part I'm trying to get rid of by using HoaCompiler is the Lexer, i.e. this custom regex-based Lexer and its tokens. Once that is done, the Parser needs to be adapted to consume the generated AST instead of tokens, but I expect that to be rather trivial.
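
Once the lexer produces an AST, adapting the Parser mostly means walking a tree instead of iterating over tokens. The sketch below shows that mapping. It relies on a hypothetical Node class (an id such as '#reference', an optional token value, and children) and on hypothetical ReferenceValue/ListValue classes. None of these are hoa/compiler's TreeNode API or Alice's actual ValueInterface implementations. PHP 8.0+ is assumed.

```php
<?php

declare(strict_types=1);

// Hypothetical AST node: an id such as '#reference', an optional token value, children.
final class Node
{
    /** @param Node[] $children */
    public function __construct(
        public string $id,
        public ?string $value = null,
        public array $children = [],
    ) {}
}

// Hypothetical value objects the parser would produce.
final class ReferenceValue
{
    public function __construct(public string $name) {}
}

final class ListValue
{
    /** @param array<ReferenceValue|ListValue|string> $values */
    public function __construct(public array $values) {}
}

// Maps each node to a value object by recursing over the tree.
function toValue(Node $node): ReferenceValue|ListValue|string
{
    return match ($node->id) {
        // Leaf holding raw text.
        'token' => (string) $node->value,
        // A reference node whose first child holds the referenced name.
        '#reference' => new ReferenceValue((string) $node->children[0]->value),
        // A sequence of text and constructs, e.g. 'foo @bar'.
        '#sequence' => new ListValue(array_map('toValue', $node->children)),
        default => throw new UnexpectedValueException("Unknown node '{$node->id}'."),
    };
}
```

The design point is that each node id maps to one kind of value object. Supporting a new construct then means adding one grammar rule and one match arm, rather than another token-juggling pass.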

The main issue, I would say, is the grammar itself. The Alice DSL is described here, but it was defined over time and grew organically, so I wouldn't be surprised if there are still a few edge cases where things are ambiguous. (But that's also a reason to move to HoaCompiler: it would expose such cases once and for all.)

Basically what I did in this PR is the starting point: replacing the lexer itself, i.e. starting to define the grammar to transform the input string into an AST to be consumed later.

Let me know if any point is still unclear.

@Hywan commented Jun 13, 2017

OK, let's give it a try :-).

@Hywan commented Jun 16, 2017

I have a few questions so far (I will have more later 😉):

  • Can a reference be inside a parameter?
  • Can a reference be surrounded by “anything” just like parameters? I mean: Parameters are defined as: anything()? (parameter() anything()?)* mostly, like foo <{bar}> baz; is it the same for references, like foo @bar baz?
  • In the given example @user<numberBetween(1,20)>, could we have recursion here, something like @foo<bar(<baz>)>? I hope not, because it's a little bit harder (because of the escaping system and the “anything” in parameters).
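
The second question describes the shape anything()? (reference() anything()?)*. For flat (non-nested) references, a single regex split can approximate that shape, as this sketch shows. The reference syntax assumed here ('@' followed by an identifier) and the segment labels are simplifications for illustration, and splitReferences is a hypothetical name.

```php
<?php

declare(strict_types=1);

// Splits an input following  anything()? (reference() anything()?)*
// where a reference is assumed to be '@' followed by an identifier.
function splitReferences(string $input): array
{
    // Capture references so they are kept in the output alongside the text around them.
    $parts = preg_split('/(@[A-Za-z_][A-Za-z0-9_]*)/', $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    return array_map(
        static fn (string $part): array => preg_match('/^@[A-Za-z_][A-Za-z0-9_]*$/', $part) === 1
            ? ['reference', substr($part, 1)]
            : ['anything', $part],
        $parts,
    );
}
```

Note that 'foo@bar' is also split into 'foo' and a reference to 'bar'. That is exactly the kind of ambiguity a grammar has to decide on explicitly.
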

@theofidry (Member, Author) commented Jun 16, 2017

Can a reference be inside a parameter?

From the doc:

'<{X}>' where X is evaluated and can be anything.

and I can see a '<{param<{param2}>}>' example, so I would say yes. However, this looks like a good way to shoot yourself in the foot. So: if it's not too hard, then yes, and hope people don't abuse it. If it is too hard, I think restricting it to nested parameters is good enough (and I would prefer that tbh).
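
Nested parameters such as '<{param<{param2}>}>' can be resolved by evaluating the innermost parameter first and then re-reading the rebuilt outer one. The sketch below assumes that parameter names never contain '<', '>', '{' or '}' and that parameters come from a plain array. resolveParameters and its $maxPasses guard are hypothetical.

```php
<?php

declare(strict_types=1);

// Resolves '<{name}>' parameters innermost first: '<{param<{param2}>}>' first
// resolves '<{param2}>', then resolves the rebuilt outer parameter name.
function resolveParameters(string $input, array $parameters, int $maxPasses = 10): string
{
    // Innermost parameter: '<{', then characters that cannot open or close another one, then '}>'.
    $innermost = '/<\{([^<>{}]*)\}>/';

    for ($pass = 0; preg_match($innermost, $input) === 1; $pass++) {
        if ($pass >= $maxPasses) {
            throw new RuntimeException('Too many nested or self-referencing parameters.');
        }

        $input = preg_replace_callback(
            $innermost,
            static function (array $match) use ($parameters): string {
                if (!array_key_exists($match[1], $parameters)) {
                    throw new OutOfBoundsException("Unknown parameter '{$match[1]}'.");
                }

                return (string) $parameters[$match[1]];
            },
            $input,
        );
    }

    return $input;
}
```

The pass limit guards against a parameter whose value keeps reintroducing a placeholder. A grammar-based parser would instead produce the nested structure directly.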

Can a reference be surrounded by “anything” just like parameters? I mean: Parameters are defined as: anything()? (parameter() anything()?)* mostly, like foo <{bar}> baz; is it the same for references, like foo @bar baz?

Yes. Also, as you can guess, foo @bar baz will fail unless the object referenced by @bar is stringable. There are some annoying cases where it's ambiguous, and I was thinking of using {} as delimiters, e.g. foo @{bar->foo}baz, but couldn't find an easy way to implement it with regexes.
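
For the {} delimiter idea, a small character scanner (rather than a regex) makes the end of a braced reference explicit. This sketch only handles the hypothetical braced form @{...}. Bare references like @bar are left as plain text here, and splitBracedReferences is an illustrative name, not an existing API.

```php
<?php

declare(strict_types=1);

// Scans the input character by character and extracts '@{...}' references.
function splitBracedReferences(string $input): array
{
    $segments = [];
    $text = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        if ($input[$i] === '@' && $i + 1 < $length && $input[$i + 1] === '{') {
            // The reference ends at the first closing brace after '@{'.
            $end = strpos($input, '}', $i + 2);
            if ($end === false) {
                throw new InvalidArgumentException("Unclosed '@{' at offset $i.");
            }
            if ($text !== '') {
                $segments[] = ['anything', $text];
                $text = '';
            }
            $segments[] = ['reference', substr($input, $i + 2, $end - $i - 2)];
            // Resume right after the closing brace (the loop increment adds 1).
            $i = $end;
            continue;
        }

        $text .= $input[$i];
    }

    if ($text !== '') {
        $segments[] = ['anything', $text];
    }

    return $segments;
}
```

With it, 'foo @{bar->foo}baz' splits into 'foo ', a reference 'bar->foo', and 'baz'. The trailing 'baz' is therefore no longer mistaken for part of the reference.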

In the given example @user<numberBetween(1,20)>, could we have recursion here, something like @foo<bar(<baz>)>, I hope no because it's a little bit harder (because of the escaping system and the “anything” in parameters).

That's a risk that comes with moving to HoaCompiler, I guess. Before, such an expression would likely have been evaluated incorrectly and considered invalid, simply because of implementation limitations and not because of invalid syntax. If we identify such cases, I think it depends on what we find, as several actions can be taken. We can handle it in the grammar so that it cannot happen, or ensure the system bails out gracefully so that the user can easily identify the issue and fix it.

Ask as many questions as needed :P

@Hywan commented Jun 19, 2017

OK, so basically we can have any string with the Alice language inside, is that correct? So … alice …, where … is anything and alice is an Alice construction, like a parameter, a reference, etc. Is that correct? And can all Alice constructions be recursive?

@theofidry (Member, Author) commented Jun 19, 2017

Yeah. Although I don't really see a recursion case, I guess in theory it would be possible.

There are a few exceptions though:

  • Identity, where the content should remain a string and is later evaluated with eval()
  • Optional

@Hywan commented Aug 7, 2017

Back from vacation with more free time. Working on this issue again :-).

theofidry added some commits Apr 5, 2017

WIP

@theofidry theofidry force-pushed the theofidry:compiler branch from a36d01c to 02ffbf8 Dec 17, 2017

@theofidry (Member, Author) commented Dec 17, 2017

@Hywan I've updated the PR based on the latest state of the work you pushed to the GitLab repository.

You should see the LexerIntegrationTest, in which we can now replace the arrays of tokens from the previous system with the expected TreeNode. It's not the only thing that needs to be updated, but it's the very first step; the next one is updating the Parser accordingly. At least now anyone who wishes to pursue it just needs to run

bin/phpunit tests/FixtureBuilder/ExpressionLanguage/Lexer/LexerIntegrationTest.php

without any more faff.

@Hywan commented Dec 18, 2017

Thanks! I appreciate your help :-).

@cussack commented Feb 18, 2019

@theofidry @Hywan is this still the "hot" PR or was this topic dropped or superseded by something else?

@theofidry (Member, Author) commented Feb 18, 2019

@Hywan commented Feb 19, 2019

I've been absent for many months, but now I'm back and catching up on everything. This project is down on my todo list, but the list gets smaller every day ;-).
