Ignoring contents of lines that aren't recognized #53
Perhaps you can just OR it with a catch-all rule and then check, after the parse, which one got captured?
That works, I think; thanks. Generating the combinators from a grammar is a fantastic way to build an AST; I like it a lot. Two things that I haven't figured out:
(Perhaps I can define special rules myself and reference them in the grammar?)
Hi @cmaughan, actually that is a great idea I hadn't thought of: referencing, from the grammar, rules that were built with the normal parser combinator approach. That is probably the easiest way to do something like discarding contents. For whitespace, you may be able to specify it in a regex. Your previous example could be given as:
Hi :) I'd wondered about using a regex like that; it seems to work, but it doesn't capture both comments in my unit test: `"\ comment 1\n\ comment 2"`. Maybe I'm missing something (or the grammar generation is still being a bit aggressive about discarding newlines behind my back). It also has the side effect of capturing the newline into the AST node. But I'll try making custom rules to catch these special cases; that's a useful thing to be able to do.
Oh, the two-line problem was just the final rule not matching *many* times ;)
I tried defining a parser manually and adding it into the grammar, but I get a crash; I think this is probably because my `mpc_new`/`mpc_define` parser somehow doesn't have the AST info. The value of `a` is an invalid pointer.
Hmm, perhaps you can post your code so I can see exactly what you tried?
Does this help?
Sorry for the late reply. What is up with the `R` and the `auto` in your code? Just wondering what your particular use case is, as this code and grammar look a little strange.
Yes, I'm using C++11; the `R` prefix makes it a raw string literal (so you don't need to escape anything or split the grammar across multiple `""` lines). `auto` just lets the compiler deduce the type of `line`.
True - let me investigate this in a bit more detail at the weekend. |
I might get there before you, since I'm getting to the point where I need it to work; will let you know if I have time to figure it out! |
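Hi, looks like you were right: the error happens because `line` returns a plain string, while parsers referenced from a grammar are expected to return `mpc_ast_t` nodes. The fix is to make `line` convert its result into a tagged AST node:

```c
mpc_define(line, mpca_tag(mpc_apply(mpc_sym("#line"), mpcf_str_ast), "string"));
```

Here `mpcf_str_ast` turns the matched string into an AST node, and `mpca_tag` gives that node the tag `string`. It isn't ideal, but in this case I think it was fine. I've pushed an update to the repo with a new test.

Hope this helps,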
Thanks for investigating this; I think it makes sense! Is the `string` tag the same thing I'd see if I had a grammar statement like `"string : /"[a-z]/" "`, i.e. is it just the tag assigned to the tree node?
Suppose I don't care about `int` and `=`, but they're part of the language spec. The `%` would be like saying "require this, but don't put it in the AST". The only other comment I have is that the error reporting is a bit vague and hard to follow. I often get something like `expected ', or ', or '.....''`, which can be tricky! Anyway, thanks for figuring it out!
Hi @cmaughan,

Using the combinators, discarding some part of the input is typically done in the fold function. For example, this parser parses the expression you mentioned:

```c
#include <stdlib.h>
#include "mpc.h"

/* Fold for: "int" <ident> "=" <digits>
   Keeps only the identifier (xs[1]) and the number (xs[3]). */
static mpc_val_t *custom_fold(int n, mpc_val_t **xs) {
  mpc_ast_t *r = mpc_ast_new("parser|>", "");
  mpc_ast_add_child(r, mpc_ast_new("ident", xs[1]));
  mpc_ast_add_child(r, mpc_ast_new("num", xs[3]));
  /* mpc_ast_new copies its contents, so every matched string can be
     freed, including the discarded "int" and "=". */
  free(xs[0]); free(xs[1]); free(xs[2]); free(xs[3]);
  return r;
}

/* Inside a function: */
mpc_parser_t *p = mpc_and(4, custom_fold,
  mpc_sym("int"),
  mpc_tok(mpc_ident()),
  mpc_sym("="),
  mpc_digits(),
  free, free, free); /* n - 1 destructors, used if a later parser fails */
```

So this parser produces an AST with just an `ident` child and a `num` child; `int` and `=` must be present in the input but are dropped. Probably the normal/natural way to do this is to prune the tree afterwards, but I can see the advantage of pruning at parse time, so let me think about what might be a reasonable syntax for it. Do you know if YACC/Bison supports this at all?

Regarding the error messages: this is actually already supported. You just need to write a human-readable name as a string between the rule name and the colon, e.g. `number "number" : /[0-9]+/ ;`.
Thanks for the tip on error strings; that works well and makes things much clearer. It might be worth updating the samples so people know about it.
How would you ignore content that isn't recognized by a rule?
Supposing I have something like this:
```
string : /"(\\.|[^"])*"/ ;
lang : /^/ /$/
```
This should recognize quoted strings, but it would fail on anything else. So how do I define the language so that it collects or discards anything that isn't a string?