Skip to content
This repository has been archived by the owner on Sep 20, 2021. It is now read-only.

Hoa Compiler and (E)BNF #17

Open
flip111 opened this issue Feb 23, 2014 · 7 comments
Open

Hoa Compiler and (E)BNF #17

flip111 opened this issue Feb 23, 2014 · 7 comments
Assignees
Labels

Comments

@flip111
Copy link

flip111 commented Feb 23, 2014

Hi, I've some questions about the compiler. Since I'm considering implementing an existing Domain Specific Language (DSL) in PHP. So I would need a compiler for this. But then I noticed that most already defined languages are defined in Backus–Naur Form (BNF) or Extended Backus–Naur Form (EBNF). From here certain questions come to mind:

  1. Is PP language mainly intended for greating new DSL rather then port existing ones?
  2. Why was PP language created and not a compiler for (E)BNF?
  3. Would it be worth porting EBNF to PP or would I be better of using another compiler?
    If it's possible to convert please give a small example. http://doctrine-orm.readthedocs.org/en/latest/reference/dql-doctrine-query-language.html#query-language

For completeness here is a list of resources I found on PHP, Compilers and (E)BNF

EBNF & some other stuff
http://karmin.ch/ebnf/index
http://sourceforge.net/projects/lime-php/
https://github.com/hafriedlander/php-peg
http://php.comsci.us/syntax/statement/ebnf.php
http://php.comsci.us/syntax/statement/bnf.php
http://marc.info/?l=php-internals&m=129387252319019
https://github.com/ferno/loco/blob/master/ebnf.php

BNF
http://www.garshol.priv.no/download/text/bnf.html
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_parser.y
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_scanner.l
http://www.icosaedro.it/articoli/php-syntax-ebnf.txt
http://www.icosaedro.it/articoli/php-syntax-yacc.txt
http://www.phpclasses.org/package/7142-PHP-Parse-language-source-with-a-BNF-grammar-syntax.html
https://github.com/ferno/loco/blob/master/bnf.php
http://code.google.com/p/pragmatic-parser/source/browse/trunk/parser.class.php?r=2

@Hywan Hywan added the question label Feb 24, 2014
@Hywan Hywan self-assigned this Feb 24, 2014
@Hywan
Copy link
Member

Hywan commented Feb 24, 2014

Hello :-),

Is PP language mainly intended for greating new DSL rather then port existing ones?

PP aims at describing all LL(*) grammars. The language is mainly inspired from JavaCC or YACC, but it has additional unique constructions, such as the unification (please, see the README.md or the documentation). Consequently, PP is more expressive than the (E)BNF formalism.

Why was PP language created and not a compiler for (E)BNF?

Please, see the previous answer. We needed new constructions to go futher. Moreover, the (E)BNF formalism has several limitations.

The PP language, along with the Hoa\Compiler\Llk library, is the result of my recent researches for my PhD thesis. These works have been published in A-MOST 2012 (please, see the details, the full research paper or the presentation). One of the goals was to generate data based on a grammar. We have proposed three different algorithms, see the article for more informations.

Would it be worth porting EBNF to PP or would I be better of using another compiler?

Not it's very easy and natural. Taking your example:

QueryLanguage ::= SelectStatement | UpdateStatement | DeleteStatement

becomes

QueryLanguage:
    SelectStatement() | UpdateStatement() | DeleteStatement()

And

SelectStatement ::= SelectClause FromClause [WhereClause] [GroupByClause] [HavingClause] [OrderByClause]
UpdateStatement ::= UpdateClause [WhereClause]
DeleteStatement ::= DeleteClause [WhereClause]

becomes

SelectStatement:
    SelectClause() FromClause() WhereClause()? GroupByClause()? HavingClause()? OrderByClause()?

UpdateStatement:
    UpdateClause() WhereClause()?

DeleteStatement:
    DeleteClause() WhereClause()?

Also:

IdentificationVariable ::= identifier

becomes

IdentificationVariable:
    <identifier>

The documentation is still in french but we are working hard on translating it. The README.md and the full paper should help you.

Is it also possible to build a compiler to transform (E)BNF grammar to PP. It would be really easy. However, the Hoa\Compiler\Llk library provides the construction of an AST at the end of the analyzes if they have succeeded. The AST balance can be controlled by using sharps (again, see the #node construction in the README.md & co.). This information does not appear in a (E)BNF grammar.

Finally, you're welcome on #hoaproject to get help!

@flip111
Copy link
Author

flip111 commented Feb 25, 2014

Thanks for the very extensive reply!

That's great news that PP is more extensive than (E)BNF, as you show porting it won't be a problem then. It's a very interesting suggestion to use the compiler to parse (E)BNF itself and then convert it to PP.

Closing issue: Answer covered original question ...

Off Topic:
By the way i already looked at the full research paper another time, but i was just scratching my head about it o_O The presentation looks a bit more easy to follow.

Generating unit tests is also pretty cool, the phpunit-skeleton-generator is pretty limited with what it does, so it can use some help.

When looking at the presentation i don't understand most of it, but what i think is super valuable is having an @invariant annotation (as i know this from a presentation on Domain Driven Design). So i guess the other ones are also useful :) I'm not sure why you implemented a @throwable annotation as phpdoc already describes a @throws annotation, so might be double?? http://www.phpdoc.org/docs/latest/for-users/phpdoc/tags/throws.html

I've been wondering for a long time how to test all possible inputs for a unit test (without writing all yourself) .. or at least a set that gives you enough probability to cover things. I guess this can now be done with the software in the presentation?

@flip111 flip111 closed this as completed Feb 25, 2014
@Hywan
Copy link
Member

Hywan commented Feb 26, 2014

It's totally off-topic but Praspel is a contract language. It's not just writing an API documentation, it's writing a specification. Read the presentation and maybe take a look at https://github.com/hoaproject/Praspel and https://github.com/hoaproject/Hoathis-Atoum. I'm working hard on it since I reach the end of my PhD thesis and I have to finish this work before June ;-).

@flip111
Copy link
Author

flip111 commented Feb 27, 2014

So Praspel integrated with Atoum by the Hoathis-Atoum library? So it can not be used directly with phpunit?

@Hywan
Copy link
Member

Hywan commented Feb 27, 2014

PHPUnit is a unit testing tool, just like atoum. But atoum is more powerful, faster and more modular. Please, watch https://github.com/atoum/atoum, https://github.com/Hywan/atoum-instrumentation/, https://github.com/jubianchi/atoum, https://github.com/Hywan/atoum and https://github.com/hoaproject/Hoathis-Atoum to get more informations about the activity. We will publish a big article, slides & co. to explain our current work. Stay tuned :-).

@flip111
Copy link
Author

flip111 commented Feb 27, 2014

👍

@flip111
Copy link
Author

flip111 commented Oct 20, 2018

I tried to port an EBNF file to PP, but this is not easy. Manually the process is too error-prone and extremely tedious. Automatic conversion is not easy either. I managed to do it with a bunch of regular expressions and other tricks but encountered many cases were special things needed to be done.

  • remove comments from ebnf
  • replace equal sign to semi-colon for starting rule
  • Replace things like '[' with a placeholder to prevent difficulties with bracket replacement later on.
  • Find groups in brackets, single items should not get parenthesis, several items need parenthesis. Group types: normal, one-or-more, zero-or-more
  • remove command and semi-colon (they have no purpose in PP)
  • replace quoted literals by pp notation ::literal::
  • replace placeholders to pp literal
  • find all literals, make a unique list so that a token alias can be assigned
  • escape characters in literals because pp uses regex to denote them (and also i will use regex to find and replace them)
  • replace literals with token aliases (::literal:: -> ::token1::)
  • further replacement of literals and escape all characters that fall under \h horizontal whitespace, because the pp parser will eat them otherwise. See also Space in token #19
  • make rules a node in grammar (by adding #)
  • add list of tokens to the output file (%token alias regex)
  • add parenthesis to all rule use
  • manually (too much effort to automate this) replace ANY - (literals) with regex [^token_literals]
  • manually (too much effort to automate this) replace quantifier 4 * literal with regex literal{4,4}

Now the output is valid pp syntax however it's not best practice on pp. For example in EBNF to be case insensitive a rule has to be made with a literal for lower case and upper case. To to parse hello the following rules are now present:

#H:
    ::token43:: | ::token73:: 

#E:
    ::token40:: | ::token70:: 

#L:
    ::token47:: | ::token77:: 

#O:
    ::token50:: | ::token80:: 

#myHello:
    H() E() L() L() O()

Now a further optimization step is needed that will replace #myHello with ::hello_token:: where ::hello_token is [hH][eE][lL][lL][oO] for performance and clarity and more idiomatic pp code style.

Also now i have valid pp syntax, but the test input doesn't parse so more has to be done to the grammar.

A full fledged EBNF parser to PP translator with optimizations would be nice to have.

@flip111 flip111 reopened this Oct 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

No branches or pull requests

2 participants