Mean

An LL(N) Grammar Parser for C++11. Fun to traverse. Python-like bytecode generator and controller. No dependencies.

Motivation

"Mean" is the result of a personal journey to parse and make custom changes to the Python 3 grammar. The library uses an EBNF-like syntax, automaticaly generates the node tree and has a funny way to travel it. Support a Python-like bytecode generator and a mechanism for traveling it for Virtual Machines, Generators, etc... More details at "examples" folder.

Quickstart

Please check the full example simple.cpp in the "examples" folder.

Create a grammar file: expr_grammar.mng

expr: term ( ('+' | '-') term )*
term: factor ( ('*' | '/') factor )*
factor: base [ '^' term ]
base: '(' expr ')' | ['-'] NUMBER

Create a source code file: expr_grammar.mn

2 ^ 6 -- 1

We need an MN_Lexer to read the source code file. We can use the built-in lexer MN_Simple, which generates tokens such as NAME, NUMBER, HEX, +, -, and ^:

#include <mn_root.h>
#include <lexers/mn_simple.h>

...

int main(int /*argc*/, char **/*argv[]*/)
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    
    //Built-in Lexer
    auto parse_lexer = new mn::lexer::MN_Simple("path/to/expr_grammar.mn", err);  
    //Output the node tree
    root->test_parser("expr_grammar.mng", "expr", parse_lexer);
}

The above code will output the following tree:

|expr -- 100
  |factor -- 100
    |base -- 100
      |(?) -- 17
      |2 -- 26
    |^ -- 2
    |base -- 100
      |(?) -- 17
      |6 -- 26
  |[.] -- 30
    |- -- 2
    |base -- 100
      |- -- 2
      |1 -- 26

The nodes with the number 100 are non-terminals.

The nodes (?) -- 17 are optional nodes.

The nodes [.] -- 30 are lists.

The rest of the nodes are terminals.
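
If you ever need to inspect the tree by hand, these codes let you tell the node kinds apart. The sketch below is illustrative only; it assumes the MNNode accessors used later in this README (at, get_count, get_meta, get_lexeme) and the meta values listed above:

#include <iostream>
#include <mn_root.h>

//Illustrative sketch: classify the direct children of a node using the
//meta values listed above (100 non-terminal, 17 optional, 30 list).
//Assumes the MNNode accessors shown in the examples below.
void print_children(mn::MNNode* node)
{
    for (unsigned int i = 0; i < node->get_count(); ++i)
    {
        auto child = node->at(i);
        switch (child.get_meta())
        {
        case 100: std::cout << "non-terminal" << std::endl; break;
        case 17:  std::cout << "optional" << std::endl; break;
        case 30:  std::cout << "list" << std::endl; break;
        default:  std::cout << "terminal: " << child.get_lexeme() << std::endl; break;
        }
    }
}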

Now let's traverse the tree. It is possible to write the traversal by hand, but luckily the template class MNTraveler does most of the work for us.

Let's declare a class that inherits from MNTraveler:

#include <mn_root.h>
#include <lexers/mn_simple.h>

//Class to travel the tree
class Expr: public mn::MNTraveler<Expr, float>
{
public:
    Expr(){
        //Bind the "expr" method to the grammar's "expr" non-terminal
        bind("expr", &Expr::expr, this);    
    }
    float expr(mn::MNNode* node)
    {
        return 1.1; 
    }
    //Entry point to start the traversal
    float walk(mn::MNNode* node)
    {
        return call(node);
    }    
};    

int main(int /*argc*/, char **/*argv[]*/) 
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();
    //The run_parser method generates the tree and uses the traveler class to traverse it.
    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
                                                   parse_lexer, traveler) << std::endl;
    delete traveler;
    delete err;
    delete root;
}

The second template parameter defines float as the return type of every method bound to a grammar non-terminal, so custom types can be used as well, as sketched below.
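
For example, nothing forces the return type to be float. The following is an illustrative sketch (class and method names are hypothetical) that reuses only the bind/call API shown above, with std::string as the handlers' return type:

#include <string>
#include <mn_root.h>

//Illustrative sketch: the second template parameter is std::string,
//so every bound handler returns a string instead of a float.
class ExprPrinter: public mn::MNTraveler<ExprPrinter, std::string>
{
public:
    ExprPrinter(){
        bind("expr", &ExprPrinter::expr, this);
    }
    std::string expr(mn::MNNode* /*node*/)
    {
        return "visited expr";
    }
    std::string walk(mn::MNNode* node)
    {
        return call(node);
    }
};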

bind("expr", &Expr::expr, this); bind the "expr" method with the "expr" grammar's non-terminal.

The float expr(mn::MNNode* node) ... method handles the node generated by the non-terminal "expr".

The float walk(mn::MNNode* node) ... method is the first one invoked. The call(node) in return call(node) invokes the method bound to the node's non-terminal (the "expr" non-terminal, according to the tree generated in the previous example). Use the call method whenever you need to traverse a non-terminal.

Running this code produces the following output:

Result is: 1.1

This is the value returned by the "expr" method.
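
Note that call() is only needed for non-terminal children; terminal children are read directly from the node, for example via get_lexeme(). As a preview, here is a simplified version of the base handler from the full evaluator below (the optional '-' handling is omitted for brevity):

//Simplified sketch of the "base" handler from the full example below.
//base: '(' expr ')' | ['-'] NUMBER
float base(mn::MNNode* node)
{
    auto n = node->at(0);
    if (n.get_meta() == 2 && n.get_lexeme() == "(")  //terminal '(' : read it directly
    {
        auto expr = node->at(1);                     //non-terminal "expr": use call()
        return call(&expr);
    }
    auto number = node->at(1);                       //terminal NUMBER: convert its lexeme
    return atof(number.get_lexeme().c_str());
}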

Adding the proper functionality, we get a simple expression evaluator with the following code:

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <mn_root.h>
#include <lexers/mn_simple.h>


class Expr: public mn::MNTraveler<Expr, float>
{
public:
    Expr(){
        bind("expr", &Expr::expr, this);
        bind("term", &Expr::term, this);
        bind("factor", &Expr::factor, this);
        bind("base", &Expr::base, this);
    }
    float expr(mn::MNNode* node)
    {
        //expr: term ( ('+' | '-') term )*
        return arith_op(node);
    }

    float term(mn::MNNode* node)
    {
        //term: factor ( ('*' | '/') factor )*    
        return arith_op(node);
    }

    float factor(mn::MNNode* node)
    {
        //factor: base [ '^' term ]
        auto base = node->at(0);
        auto exp = node->at(2);
        float base_value = call(&base);
        float exp_value = call(&exp);
        return pow(base_value, exp_value);
    }

    float base(mn::MNNode* node)
    {
        //base: '(' expr ')' |  ['-'] NUMBER        
        auto n = node->at(0);
        if (n.get_meta() == 2 && n.get_lexeme() == "(")
        {
            auto expr = node->at(1);
            return call(&expr);
        }
        else
        {
            auto number = node->at(1);
            float res = atof(number.get_lexeme().c_str());
            if (n.is_found())
                res *= -1;
            return res;
        }
    }

    float arith_op(mn::MNNode* node)
    {
        auto term = node->at(0);
        float res = call(&term);
        auto term_list = node->at(1);
        for (unsigned int i=0; i < term_list.get_count(); ++i)
        {
            if ((i+1) % 2 == 0)
            {
                auto sign_node = term_list.at(i-1);
                char op = sign_node.get_lexeme()[0];
                auto res_node = term_list.at(i);
                switch (op) {
                case '+':
                    res += call(&res_node);
                    break;
                case '-':
                    res -= call(&res_node);
                    break;
                case '*':
                    res *= call(&res_node);
                    break;
                case '/':
                    res /= call(&res_node);
                    break;
                default:
                    res -= call(&res_node);
                    break;
                }
            }
        }
        return res;
    }

    float walk(mn::MNNode* node)
    {
        return call(node);
    }
};
int main(int /*argc*/, char **/*argv[]*/)
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();
    //The run_parser method generates the tree and uses the traveler class to traverse it.
    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
                                                   parse_lexer, traveler) << std::endl;
    delete traveler;
    delete err;
    delete root;
}

Designs

In order to reduce the number of nodes in generated trees, rules of the form A: B [C] are simplified: if [C] is not encountered, then A is replaced by B. Following the last example, if you change the content of the file expr_grammar.mn to 1 and run it, you will get the following output:

|base -- 100
  |(?) -- 17
  |1 -- 26

And the tree is still traversed correctly, because the walk method invokes the bound "base" method via return call(node);.
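
With the evaluator above, this collapsed tree is handled by the second branch of the "base" handler. The sketch below shows the relevant branch of that handler, annotated (for illustration only) with what happens for the input 1:

float base(mn::MNNode* node)
{
    //For the input "1" the root node is the collapsed "base" rule itself.
    auto n = node->at(0);                            //the (?) optional node (meta 17), so the '(' branch is skipped
    auto number = node->at(1);                       //the NUMBER terminal "1"
    float res = atof(number.get_lexeme().c_str());   //res == 1
    if (n.is_found())                                //false here: no '-' was matched
        res *= -1;
    return res;                                      //1
}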

Extending Lexers

Extending a lexer with new tokens is straightforward, e.g.:

enum MN_PY_TOKENS
{
    //Please use token indexes from MN_CUSTOM_TOKEN_INDEX onwards; values below it are reserved by the library.
    HEX = MN_CUSTOM_TOKEN_INDEX,
    STAR
};

namespace mn
{
    namespace lexer
    {
        MN_Simple::MN_Simple(const std::string& file_name, MNErrorMsg* error_msg)
            :MNLexer(file_name, error_msg)
        {
            //Register the HEX token. It can be referenced in grammar definitions with the alias "HEX"
            register_custom_token("HEX", HEX);
            //Register the STAR token. It can be referenced in grammar definitions with the alias "STAR"
            register_custom_token("STAR", STAR);
        }

        MNToken* MN_Simple::next_token()
        {
            ...
            //Somewhere in your "next_token" function
            if (current_char == '*')
                //return the STAR token
                return ret_token(new MNToken(STAR, "*", row, col));
            ...    
        }    
    }
}    

You can then use the new tokens in your grammar like this:

SKY: (STAR)*
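
A traveler can then bind a handler to the new rule. The following is an illustrative sketch under the assumptions above (an MN_Simple lexer emitting STAR tokens and a grammar containing SKY: (STAR)*); the class and method names are made up for the example, and it assumes the repeated group shows up as a single list child, as in the expression example:

//Illustrative sketch: count the stars matched by SKY: (STAR)*.
class Sky: public mn::MNTraveler<Sky, int>
{
public:
    Sky(){
        bind("SKY", &Sky::sky, this);
    }
    int sky(mn::MNNode* node)
    {
        auto stars = node->at(0);                 //the list node produced by (STAR)*
        return static_cast<int>(stars.get_count());
    }
    int walk(mn::MNNode* node)
    {
        return call(node);
    }
};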

Limitations

Although it is possible to declare terminal tokens such as identifiers or numbers directly in the EBNF, the "mean" library is designed to generate tokens through MN_Lexer classes, so declaring tokens with regular expressions is not supported for now. The goal is to allow token sources other than those generated from regular expressions.

The * and + operators can only be used immediately after ). Therefore an expression like NAME* is not supported; use (NAME)* instead.

If you have a rule like A: B | D | [C], consider changing it to the form A: [B | D | [C]] to avoid unexpected behaviour.
