parsing erlang terms #51

Open
progman1 opened this issue Jan 29, 2021 · 8 comments

@progman1

progman1 commented Jan 29, 2021

I run Erlang.Parse.from_file on
https://github.com/erlang/otp/blob/master/lib/wx/api_gen/wxapi.conf

and get the error

failed: In wxapi.conf.copy, at offset 820: syntax error.
  Erlang__Erl_parser.MenhirBasics.Error

This is probably because the file defines terms to be read by file:consult/1,
so it isn't suitable input for your parser's main entry point.
But with a different entry point, could it parse terms?

@leostera
Owner

Could you show me the file you're trying to parse?

Or an equivalent file that also breaks like this?

That'd help me see if there's anything that I know is currently unsupported by the Menhir parser or if we need to spend some time digging.

Thanks for opening the issue! 🙌🏼

@progman1
Author

The link to it is above, but here's an excerpt:

%% %CopyrightEnd%

{const_skip, [wxGenericFindReplaceDialog, wxInvalidDateTime, wxLANGUAGE_KHMER]}.
{not_const,
 [wxRETAINED,
  %% New enums needed for gl contexts not static numbers
  {'wx_GL_COMPAT_PROFILE',   {test_if, "wxCHECK_VERSION(3,1,0)"}},
]}.

@leostera
Owner

Oh, sorry, I missed the link.

I think the parser will have trouble with that, since it's built to parse an entire Erlang module. I started the tree-sitter-erlang project to address some of these limitations, but I haven't yet integrated it into the erlang library.

You could try using that tree-sitter parser with something like ocaml-tree-sitter to get up and running. Otherwise, I'd be happy to either help you integrate tree-sitter-erlang into the erlang library or rework the Menhir parser, as we just landed a new AST here that is waiting to be used.

@progman1
Author

progman1 commented Jan 30, 2021

I don't fully understand!
Terms are part of the erlang language aren't they?
What does the newest erl-parsetree.ml have over the old one?
I saw that the parser as-is had just the one entry point (very reasonably :).
And I imagined that another entry point into the grammar could be added,
one directly to a 'Terms' rule.
Which may not be true if 'Term' syntax is not part of the erlang language itself....

You have the incremental Menhir parser definition, so how come you're going
after tree-sitter?

FYI, after staring at the format of wxapi.conf for a while, I got the impression it
may not have a very regular syntax: a sort of lists-of-lists-of-lists affair that's OK for
Erlang's dynamic typing approach. That suggested to me that I maybe shouldn't start hacking a yacc grammar for it! It also suggests to me that it isn't part of the Erlang language as such, since you already have a Menhir grammar for Erlang. Unfortunately, I can't remember the limitations of LALR/LR grammars.

What's your understanding?
thanks.

@leostera
Owner

@progman1 let me try to answer your questions :)

> Terms are part of the erlang language aren't they?

Yes, they are.

> And I imagined that another entry point into the grammar could be added,
> one directly to a 'Terms' rule.

Yes, we could make a new parser that reuses the expression language from the main parser. It would have to be a separate parser because Menhir allows only one %start entry point.

> how come you're going after tree-sitter?

The Menhir parser is only directly usable from OCaml code, whereas the tree-sitter parser can be used anywhere that has tree-sitter bindings. That includes Rust libraries, Neovim, and GitHub Semantic. The Erlang community benefits more widely from this.


The lowest-hanging fruit here would be to refactor erl_parser.mly into two parsers: erl_expr_parser.mly and erl_mod_parser.mly. Caramel then continues to rely on Erlang.Parser.module_from_file/1, and you get a new Erlang.Parser.terms_from_file/1 that you can use to lift your config file into an Erlang.Ast.literal list.

The strong path forward is to do some work and integrate tree-sitter-erlang back into this repository, to use that as the term parser first. If that works, it'll be easier to start migrating the main parser to it.
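
A minimal sketch of what a `terms_from_file`-style entry point could produce for a consult-style file. Everything here is an assumption for illustration: the `term` type, `Syntax_error`, and `terms_of_string` are hypothetical names, not this library's API, and the hand-written reader merely stands in for a second grammar entry point. It covers only atoms, strings, tuples, and lists.

```ocaml
(* Hypothetical term type; the real Erlang.Ast types may differ. *)
type term =
  | Atom of string
  | Str of string
  | Tuple of term list
  | List of term list

(* Carries the byte offset at which reading failed. *)
exception Syntax_error of int

(* Read every `Term.` form in [src], in the style of file:consult/1.
   No escapes, numbers, or trailing commas are supported. *)
let terms_of_string (src : string) : term list =
  let n = String.length src in
  let pos = ref 0 in
  let fail () = raise (Syntax_error !pos) in
  (* Skip whitespace and `%` comments. *)
  let rec skip () =
    if !pos < n then
      match src.[!pos] with
      | ' ' | '\t' | '\n' | '\r' -> incr pos; skip ()
      | '%' ->
          while !pos < n && src.[!pos] <> '\n' do incr pos done;
          skip ()
      | _ -> ()
  in
  let expect c =
    skip ();
    if !pos < n && src.[!pos] = c then incr pos else fail ()
  in
  (* Read up to the closing [quote] character, without escape handling. *)
  let quoted quote =
    incr pos;
    let start = !pos in
    while !pos < n && src.[!pos] <> quote do incr pos done;
    if !pos >= n then fail ();
    let s = String.sub src start (!pos - start) in
    incr pos;
    s
  in
  let is_atom_char = function
    | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '@' -> true
    | _ -> false
  in
  let rec term () =
    skip ();
    if !pos >= n then fail ();
    match src.[!pos] with
    | '{' -> incr pos; Tuple (items '}')
    | '[' -> incr pos; List (items ']')
    | '"' -> Str (quoted '"')
    | '\'' -> Atom (quoted '\'')
    | 'a' .. 'z' ->
        let start = !pos in
        while !pos < n && is_atom_char src.[!pos] do incr pos done;
        Atom (String.sub src start (!pos - start))
    | _ -> fail ()
  (* Comma-separated terms up to [close]; an empty sequence is allowed. *)
  and items close =
    skip ();
    if !pos < n && src.[!pos] = close then (incr pos; [])
    else
      let rec more acc =
        let t = term () in
        skip ();
        if !pos < n && src.[!pos] = ',' then (incr pos; more (t :: acc))
        else (expect close; List.rev (t :: acc))
      in
      more []
  in
  (* Each top-level form must be terminated by a full stop. *)
  let rec forms acc =
    skip ();
    if !pos >= n then List.rev acc
    else
      let t = term () in
      expect '.';
      forms (t :: acc)
  in
  forms []
```

Given a form like `{const_skip, [wxInvalidDateTime]}.`, this returns `[Tuple [Atom "const_skip"; List [Atom "wxInvalidDateTime"]]]`. A trailing comma such as `[a,]` raises `Syntax_error`, which matches how standard Erlang treats it.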

@progman1
Author

Thanks for clarifying.
I will tackle the low-hanging fruit! I have done some messing with Menhir, and something
might be doable about entry points by converting to an ocamlyacc grammar first, for even lower-hanging fruit!

@progman1
Author

progman1 commented Feb 1, 2021

I have a parsed file :)
Happily, Menhir does actually accept more than one start symbol.
I had to allow dangling commas in tuples and lists. Maybe that isn't valid expression language after all? (I don't know if the 'term' language is any different from expressions.)
The file also had multi-line strings, which I took to mean they should be stuck back together
(macro stringification?), so I made a change there too.

If these are actually valid Erlang, then I'm happy to send up the patch.
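
On the multi-line strings: Erlang treats adjacent string literals as one string, so joining them is consistent with the language. A minimal sketch of that merge, assuming a hypothetical `token` type and function name (these are not the tokens defined in erl_parser.mly, and the actual patch may do this in the grammar instead):

```ocaml
(* Hypothetical token type for illustration only. *)
type token =
  | STRING of string
  | ATOM of string
  | COMMA

(* Merge each run of adjacent STRING tokens into a single STRING token,
   leaving all other tokens untouched and in order. *)
let merge_adjacent_strings (tokens : token list) : token list =
  let rec go acc = function
    | STRING a :: STRING b :: rest -> go acc (STRING (a ^ b) :: rest)
    | tok :: rest -> go (tok :: acc) rest
    | [] -> List.rev acc
  in
  go [] tokens
```

Running this as a pass between the lexer and the parser keeps the grammar rules for strings unchanged, at the cost of one extra walk over the token list.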

@leostera
Owner

leostera commented Feb 1, 2021

Well I stand corrected! 🙌🏼 I didn't know that, thanks for showing me. Please send a patch 🎉 we can discuss the changes on the PR.
