Mean

An LL(N) Grammar Parser for C++11. Fun to traverse. Python-like bytecode generator and controller. No dependencies.

Motivation

"Mean" is the result of a personal journey to parse and make custom changes to the Python 3 grammar. The library uses an EBNF-like syntax, automaticaly generates the node tree and has a funny way to travel it. Support a Python-like bytecode generator and a mechanism for traveling it for Virtual Machines, Generators, etc... More details at "examples" folder.

Quickstart

Please check the full example simple.cpp in the "examples" folder.

Create a grammar file: expr_grammar.mng

expr: term ( ('+' | '-') term )*
term: factor ( ('*' | '/') factor )*
factor: base [ '^' term ]
base: '(' expr ')' | ['-'] NUMBER

Create a source code file: expr_grammar.mn

2 ^ 6 -- 1

We need an MN_Lexer to read the source code file. We can use the built-in lexer MN_Simple, which generates tokens such as NAME, NUMBER, HEX, +, -, and ^:

#include <mn_root.h>
#include <lexers/mn_simple.h>

...

int main(int /*argc*/, char **/*argv[]*/)
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    
    //Built-in Lexer
    auto parse_lexer = new mn::lexer::MN_Simple("path/to/expr_grammar.mn", err);  
    //Output the node tree
    root->test_parser("expr_grammar.mng", "expr", parse_lexer);
}

The above code will output the following tree:

|expr -- 100
  |factor -- 100
    |base -- 100
      |(?) -- 17
      |2 -- 26
    |^ -- 2
    |base -- 100
      |(?) -- 17
      |6 -- 26
  |[.] -- 30
    |- -- 2
    |base -- 100
      |- -- 2
      |1 -- 26

The nodes with the number 100 are non-terminals.

The nodes (?) -- 17 are optional nodes.

The nodes [.] -- 30 are lists.

The rest of the nodes are terminals.
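
If you ever need to inspect the tree by hand, these codes let you tell the node kinds apart. The sketch below is illustrative only; it assumes the MNNode accessors used later in this README (at, get_count, get_meta, get_lexeme) and the meta values listed above:

#include <iostream>
#include <mn_root.h>

//Illustrative sketch: classify the direct children of a node using the
//meta values listed above (100 non-terminal, 17 optional, 30 list).
//Assumes the MNNode accessors shown in the examples below.
void print_children(mn::MNNode* node)
{
    for (unsigned int i = 0; i < node->get_count(); ++i)
    {
        auto child = node->at(i);
        switch (child.get_meta())
        {
        case 100: std::cout << "non-terminal" << std::endl; break;
        case 17:  std::cout << "optional" << std::endl; break;
        case 30:  std::cout << "list" << std::endl; break;
        default:  std::cout << "terminal: " << child.get_lexeme() << std::endl; break;
        }
    }
}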

Now let's traverse the tree. It is possible to write the traversal by hand, but luckily the template class MNTraveler does most of the work for us.

Let's declare a class that inherits from MNTraveler:

#include <mn_root.h>
#include <lexers/mn_simple.h>

//Class to travel the tree
class Expr: public mn::MNTraveler<Expr, float>
{
public:
    Expr(){
        //Bind the "expr" method to the grammar's "expr" non-terminal
        bind("expr", &Expr::expr, this);    
    }
    float expr(mn::MNNode* node)
    {
        return 1.1; 
    }
    //Entry point to start the traversal
    float walk(mn::MNNode* node)
    {
        return call(node);
    }    
};    

int main(int /*argc*/, char **/*argv[]*/) 
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();
    //The run_parser method generates the tree and uses the traveler class to traverse it.
    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
                                                   parse_lexer, traveler) << std::endl;
    delete traveler;
    delete err;
    delete root;
}

The second template parameter defines float as the return type of every method bound to a grammar non-terminal, so custom types can be used as well, as sketched below.
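
For example, nothing forces the return type to be float. The following is an illustrative sketch (class and method names are hypothetical) that reuses only the bind/call API shown above, with std::string as the handlers' return type:

#include <string>
#include <mn_root.h>

//Illustrative sketch: the second template parameter is std::string,
//so every bound handler returns a string instead of a float.
class ExprPrinter: public mn::MNTraveler<ExprPrinter, std::string>
{
public:
    ExprPrinter(){
        bind("expr", &ExprPrinter::expr, this);
    }
    std::string expr(mn::MNNode* /*node*/)
    {
        return "visited expr";
    }
    std::string walk(mn::MNNode* node)
    {
        return call(node);
    }
};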

bind("expr", &Expr::expr, this); bind the "expr" method with the "expr" grammar's non-terminal.

The float expr(mn::MNNode* node) ... method handles the node generated by the non-terminal "expr".

The float walk(mn::MNNode* node) ... method is the first one invoked. The call(node) in return call(node) invokes the method bound to the node's non-terminal (the "expr" non-terminal, according to the tree generated in the previous example). Use the call method whenever you need to traverse a non-terminal.

Running this code produces the following output:

Result is: 1.1

This is the value returned by the "expr" method.
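
Note that call() is only needed for non-terminal children; terminal children are read directly from the node, for example via get_lexeme(). As a preview, here is a simplified version of the base handler from the full evaluator below (the optional '-' handling is omitted for brevity):

//Simplified sketch of the "base" handler from the full example below.
//base: '(' expr ')' | ['-'] NUMBER
float base(mn::MNNode* node)
{
    auto n = node->at(0);
    if (n.get_meta() == 2 && n.get_lexeme() == "(")  //terminal '(' : read it directly
    {
        auto expr = node->at(1);                     //non-terminal "expr": use call()
        return call(&expr);
    }
    auto number = node->at(1);                       //terminal NUMBER: convert its lexeme
    return atof(number.get_lexeme().c_str());
}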

Adding the proper functionality, we get a simple expression evaluator with the following code:

#include <iostream>
#include <math.h>
#include <stdlib.h>
#include <mn_root.h>
#include <lexers/mn_simple.h>


class Expr: public mn::MNTraveler<Expr, float>
{
public:
    Expr(){
        bind("expr", &Expr::expr, this);
        bind("term", &Expr::term, this);
        bind("factor", &Expr::factor, this);
        bind("base", &Expr::base, this);
    }
    float expr(mn::MNNode* node)
    {
        //expr: term ( ('+' | '-') term )*
        return arith_op(node);
    }

    float term(mn::MNNode* node)
    {
        //term: factor ( ('*' | '/') factor )*    
        return arith_op(node);
    }

    float factor(mn::MNNode* node)
    {
        //factor: base [ '^' term ]
        auto base = node->at(0);
        auto exp = node->at(2);
        float base_value = call(&base);
        float exp_value = call(&exp);
        return pow(base_value, exp_value);
    }

    float base(mn::MNNode* node)
    {
        //base: '(' expr ')' |  ['-'] NUMBER        
        auto n = node->at(0);
        if (n.get_meta() == 2 && n.get_lexeme() == "(")
        {
            auto expr = node->at(1);
            return call(&expr);
        }
        else
        {
            auto number = node->at(1);
            float res = atof(number.get_lexeme().c_str());
            if (n.is_found())
                res *= -1;
            return res;
        }
    }

    float arith_op(mn::MNNode* node)
    {
        auto term = node->at(0);
        float res = call(&term);
        auto term_list = node->at(1);
        for (unsigned int i=0; i < term_list.get_count(); ++i)
        {
            if ((i+1) % 2 == 0)
            {
                auto sign_node = term_list.at(i-1);
                char op = sign_node.get_lexeme()[0];
                auto res_node = term_list.at(i);
                switch (op) {
                case '+':
                    res += call(&res_node);
                    break;
                case '-':
                    res -= call(&res_node);
                    break;
                case '*':
                    res *= call(&res_node);
                    break;
                case '/':
                    res /= call(&res_node);
                    break;
                default:
                    res -= call(&res_node);
                    break;
                }
            }
        }
        return res;
    }

    float walk(mn::MNNode* node)
    {
        return call(node);
    }
};
int main(int /*argc*/, char **/*argv[]*/)
{
    //Class with shortcut methods
    auto root = new mn::MNRoot();
    //Class to store the encountered errors
    mn::MNErrorMsg* err = new mn::MNErrorMsg();
    auto parse_lexer = new mn::lexer::MN_Simple("expr_grammar.mn", err);

    auto traveler = new Expr();
    //The run_parser method generates the tree and uses the traveler class to traverse it.
    std::cout << "Result is: " << root->run_parser("expr_grammar.mng", "expr",
                                                   parse_lexer, traveler) << std::endl;
    delete traveler;
    delete err;
    delete root;
}

Designs

In order to reduce the number of nodes in generated trees, rules of the form A: B [C] are simplified: if [C] is not encountered, then A is replaced by B. Following the last example, if you change the content of the file expr_grammar.mn to 1 and run it, you will get the following output:

|base -- 100
  |(?) -- 17
  |1 -- 26

And the tree is still traversed correctly, because the walk method invokes the bound "base" method via return call(node);.
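
With the evaluator above, this collapsed tree is handled by the second branch of the "base" handler. The sketch below shows the relevant branch of that handler, annotated (for illustration only) with what happens for the input 1:

float base(mn::MNNode* node)
{
    //For the input "1" the root node is the collapsed "base" rule itself.
    auto n = node->at(0);                            //the (?) optional node (meta 17), so the '(' branch is skipped
    auto number = node->at(1);                       //the NUMBER terminal "1"
    float res = atof(number.get_lexeme().c_str());   //res == 1
    if (n.is_found())                                //false here: no '-' was matched
        res *= -1;
    return res;                                      //1
}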

Extending Lexers

Extending a lexer with new tokens is straightforward, e.g.:

enum MN_PY_TOKENS
{
    //Please use token indexes from MN_CUSTOM_TOKEN_INDEX onwards; values below it are reserved by the library.
    HEX = MN_CUSTOM_TOKEN_INDEX,
    STAR
};

namespace mn
{
    namespace lexer
    {
        MN_Simple::MN_Simple(const std::string& file_name, MNErrorMsg* error_msg)
            :MNLexer(file_name, error_msg)
        {
            //Register the HEX token. It can be referenced in grammar definitions with the alias "HEX"
            register_custom_token("HEX", HEX);
            //Register the STAR token. It can be referenced in grammar definitions with the alias "STAR"
            register_custom_token("STAR", STAR);
        }

        MNToken* MN_Simple::next_token()
        {
            ...
            //Somewhere in your "next_token" function
            if (current_char == '*')
                //return the STAR token
                return ret_token(new MNToken(STAR, "*", row, col));
            ...    
        }    
    }
}    

You can then use the new tokens in your grammar like this:

SKY: (STAR)*
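
A traveler can then bind a handler to the new rule. The following is an illustrative sketch under the assumptions above (an MN_Simple lexer emitting STAR tokens and a grammar containing SKY: (STAR)*); the class and method names are made up for the example, and it assumes the repeated group shows up as a single list child, as in the expression example:

//Illustrative sketch: count the stars matched by SKY: (STAR)*.
class Sky: public mn::MNTraveler<Sky, int>
{
public:
    Sky(){
        bind("SKY", &Sky::sky, this);
    }
    int sky(mn::MNNode* node)
    {
        auto stars = node->at(0);                 //the list node produced by (STAR)*
        return static_cast<int>(stars.get_count());
    }
    int walk(mn::MNNode* node)
    {
        return call(node);
    }
};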

Limitations

Although it is possible to declare terminal tokens such as identifiers or numbers directly in the EBNF, the "mean" library is designed to generate tokens through MN_Lexer classes, so declaring tokens with regular expressions is not supported for now. The goal is to allow token sources other than those generated from regular expressions.

The * and + operators can only be used immediately after ). Therefore an expression like NAME* is not supported; use (NAME)* instead.

If you have a rule like A: B | D | [C], consider changing it to the form A: [B | D | [C]] to avoid unexpected behaviour.
