Generic input for easier long-term support of languages #1734
Labels
enhancement
Issue/PR that involves features, improvements and other changes
language
PR / Issue deals (partly) with new and/or existing languages for JPlag
major
Major issue/feature/contribution/change
I just stumbled across JPlag.
It is a bit of a pity that some languages are in a legacy state. I would like to suggest that a common input format for token streams (or ASTs) could make it easier to support various languages. For example, writing a parser for Python is hard, but Python ships with a reusable parser that works and produces ASTs. It is easy to get the token stream from it, store it as JSON, s-expressions, etc., and load this list of tokens into JPlag.
If you had a generic input model in which tokens (or an AST) can be declared, one could reuse the existing parser and write only a small adapter for the translation.
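To illustrate the adapter idea for Python, here is a minimal sketch using only the standard library `ast` and `json` modules. The JSON layout (`type`, `line`, `column`) and the function names `extract_tokens` and `tokens_to_json` are my own assumptions for illustration; JPlag does not define such an input format today.

```python
import ast
import json


def extract_tokens(source: str) -> list[dict]:
    """Turn Python source into a flat token list, one token per positioned AST node.

    The token fields used here are hypothetical, not an existing JPlag schema.
    """
    tree = ast.parse(source)
    tokens = []
    for node in ast.walk(tree):
        # Nodes such as Module, Load or Store carry no source position; skip them.
        if hasattr(node, "lineno"):
            tokens.append({
                "type": type(node).__name__,
                "line": node.lineno,
                "column": node.col_offset + 1,  # convert to 1-based columns
            })
    # ast.walk is breadth-first, so restore source order (stable for equal positions).
    tokens.sort(key=lambda t: (t["line"], t["column"]))
    return tokens


def tokens_to_json(source: str) -> str:
    """Serialize the token list so another tool could load it."""
    return json.dumps(extract_tokens(source), indent=2)


if __name__ == "__main__":
    print(tokens_to_json("x = 1\n"))
```

A real adapter would probably filter or rename node types to whatever token vocabulary the consuming tool expects; this sketch simply forwards the `ast` class names.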
This gets interesting if you look at tree-sitter, a parser framework with grammars for several hundred languages that is widely used for syntax highlighting, etc. Tree-sitter provides a uniform AST representation (s-expressions). Supporting this or a similar format could boost the reach tremendously.
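As a rough sketch of how such an s-expression could be flattened into a token stream, the following pure-Python function reads node names in pre-order and records their nesting depth. It assumes input shaped like tree-sitter's s-expression dump, where field labels such as `left:` sit outside the parentheses; the function name `sexpr_to_tokens` and the sample string are hand-written for illustration and not taken from tree-sitter itself.

```python
import re


def sexpr_to_tokens(sexpr: str) -> list[tuple[str, int]]:
    """Flatten an s-expression into (node_name, depth) pairs in pre-order.

    Assumption: the first word after "(" is the node name; any other word
    (for example a field label like "left:") is ignored.
    """
    tokens = []
    depth = 0
    expect_name = False
    for match in re.finditer(r"\(|\)|[^\s()]+", sexpr):
        text = match.group()
        if text == "(":
            depth += 1
            expect_name = True
        elif text == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')' in s-expression")
            expect_name = False
        elif expect_name:
            tokens.append((text, depth))
            expect_name = False
        # Anything else is a field label or other annotation and is skipped.
    if depth != 0:
        raise ValueError("unbalanced '(' in s-expression")
    return tokens


if __name__ == "__main__":
    sample = ("(module (expression_statement "
              "(assignment left: (identifier) right: (integer))))")
    for name, level in sexpr_to_tokens(sample):
        print("  " * (level - 1) + name)
```

Because the input is language-agnostic, the same adapter would work for any grammar that emits this shape; only the mapping from node names to plagiarism-relevant token types would need to be chosen per language.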
Greetings from down the floor, Alexander