Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer based tags to avoid so many string comparions #44

Open
petermlm opened this issue Apr 4, 2016 · 10 comments
Open

Integer based tags to avoid so many string comparions #44

petermlm opened this issue Apr 4, 2016 · 10 comments

Comments

@petermlm
Copy link
Contributor

petermlm commented Apr 4, 2016

Hello.

I've been exploring MPC because I needed it for a project. MPC is quite cool because It's easy to build a parser and work with it, but one thing has been bothering me.

I need to build a parser for a programming language and after that I need to do several operations over it, such as Type Analysis, its actual evaluation, or generation of some other intermediate code from the resulting AST. The thing is, to do this I need to traverse the resulting AST. This means that at each node I need to do string comparisons with the tag field of the AST.

This makes traversing the tree a slow process.

Is there any use case of MPC that allows me to traverse an AST in a faster way in which I can avoid so many string comparisons? Alternatively, do you have any plans to implement any other identification method for nodes other then string tags?

A way to make things faster would have an integer value associated with every tag. The question is, how to integrate that in MPC in a nice way? How about something like this:

#include <mpc.h>

typedef enum {
    tag_int,
    tag_expr_mul,
    tag_expr_add,
    tag_input
} MyTags;

int main() {
    mpc_parser_t *mpc_int = mpc_new("int", tag_int);
    mpc_parser_t *mpc_expr_mul = mpc_new("expr_mul", tag_expr_mult);
    mpc_parser_t *mpc_expr_add = mpc_new("expr_add", tag_expr_add);
    mpc_parser_t *mpc_input = mpc_new("input", tag_input);

    /* ... */

    return 0;
}

This doesn't look as nice as the current implementation, but while traversing the tree we can have comparisons like:

int traverse(mpc_ast_t *ast) {
    if(ast->tag_i == tag_expr_add) {
        return traverse(ast->children[L_OPR]) + traverse(ast->children[R_OPR]);
    }

    /* ... */
}

What are your thoughts on this?

Regards!

(Notice that the example I've provided is just a quick and dirty example I made using MPC to build a simple integer calculator)

@petermlm
Copy link
Contributor Author

petermlm commented Apr 4, 2016

I've only now notice that #31 mentions something like this. What are the status of any implementation of such a feature, if any?

@orangeduck
Copy link
Owner

Hi petermlm,

At the moment there is still no easy solution to this because different applications need to tag nodes in different ways and strings are the only good generic interface to this sort of thing.

Saying that there are a couple of simple things you can do.

You can declare your own ast type which uses integer tags and do a single pass to convert the mpc ast into your own ast which can be processed faster, or you can hack the mpc ast type and insert your own integer field - and again do a single pass which adds this integer tag to all the nodes by looking at the string tags.

An alternative option is to use the parser combinator approach instead of the grammar approach to specify your language. Here you can specify your own ast type and have complete control over how things get tagged.

Ideally mpc would let you use your own ast type along with the grammar specification by just having the user specify various operations such as how to create, tag, delete, join the custom ast type. This is what I proposed in #31 but never got around to doing it yet.

Your proposal sounds interesting though - I will have a think about it as it may be a practical way to solve this issue.

  • Dan

@petermlm
Copy link
Contributor Author

petermlm commented Apr 4, 2016

The solutions you gave seem very good, thanks!

As to the more ideal solution, since I'll be working with MPC over the next few days I may end up thinking about something. If I do I'll propose it in here and we'll discuss it further!

@orangeduck
Copy link
Owner

orangeduck commented Jun 11, 2016

Hi,

I've pushed up an experimental branch with integer flags (which can be used as tags): https://github.com/orangeduck/mpc/tree/flags

The flags are very easy to use - you just specify them when you create the parser with mpc_flag and then when the language gets parsed they get added into the flags field of mpc_ast_t. You should then be able to iterate over the tree as usual but instead just look at the flags field instead of the tag field. I also added a new option for mpca_lang to disable the string tags completely (for efficiency).

Here you can see I added some tests which basically shows how to set up the grammar to use the flags instead of the tags - I think it is pertty straight forward - ignore the complicated code used for testing the AST output:

https://github.com/orangeduck/mpc/blob/flags/tests/grammar.c#L270

Perhaps you can try it in your project and see if you encounter any issues? Tell me if you are unsure how to use them or if you have any questions.

  • Dan

@petermlm
Copy link
Contributor Author

@orangeduck , just wanted to let you know that I've been extremely busy and haven't had time to look into this. But I'll be reviewing this over the next couple of days and will, hopefully, provide feedback.

@petermlm
Copy link
Contributor Author

@orangeduck , this seems very cool! Exactly what I had in mind. Might be cool to do a simple benchmark with and without the flags. I can work on one. Thinking about the example with the maths expressions, we could generate a random expression with 1000 or more values like:

432 + 754 - 3432 + 7654 / 423 * 6542 / 4312 + 432 ...

And run the parser several times to see how long it takes to do it with and without flags. I can work on that over the next following days.

@orangeduck
Copy link
Owner

Yeah that would be good. Perhaps something in the tests directory that can be run with make bench Actually I was intending to add some proper benchmarking code for #35 as I've just been doing some ad-hoc comparisons.

@petermlm
Copy link
Contributor Author

A benchmark directory separated from tests would be nice. I fell tests and benchmarks are different. Tests just specify if something is correct or not, while benchmarks are use to give insight into some part of some software.

I'll work on the benchmark like I spoke about. I'll try to have this before the weekend. I'll present it outside of the repository. We can come up with more benchmarks and eventually place them in the repository!

@orangeduck
Copy link
Owner

Yeah that works too. Sounds good - no hurry. Look forward to seeing if the integer tags improve performance.

@vkazanov
Copy link

@orangeduck I know it's been a while but... How did those experiments go?

I am looking at including mpc into my project, and I really like the declarative grammar approach - it just reads much better. But having all those string tags is both slow and inconvenient at times.

Integer tags seem nice but there's an obvious limit to how many tags one can have depending on the value that contains the flags.

Being able to specify custom ast nodes would be a killer feature for me but that as a more fundamental change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants