Description
Hi,
I am trying to use TransCoder to translate between JavaScript and Python, so I am trying to build a JavaScript processor for the processing.py pipeline (as mentioned in #42 (comment)).
To build the processor I need a tokenizer (mentioned in #48 (comment)). I want to use ANTLR4, which is a parser generator that also produces a lexer.
However, I don't understand how to make use of the TreeSitterLangProcessor class. I looked at the java processor and the cpp processor and found that, when subclassing TreeSitterLangProcessor, I need to provide three language-specific init params:
- JAVA_TOKEN2CHAR
- JAVA_CHAR2TOKEN (built by JAVA_TOKEN2CHAR)
- ast_nodes_type_string
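To make the second item concrete, here is a minimal sketch of how I understand the CHAR2TOKEN mapping to be derived from TOKEN2CHAR, and how the pair could be used to escape and restore special character sequences. Only the "STOKEN00" to "//" pair is taken from the java processor; the `JS_` names, the "STOKEN01" entry, and the `escape`/`unescape` helpers are hypothetical and are not the repository's actual code.

```python
# Hypothetical mapping for a JavaScript processor.
# Only "STOKEN00" -> "//" is copied from JAVA_TOKEN2CHAR; the rest is assumed.
JS_TOKEN2CHAR = {
    "STOKEN00": "//",
    "STOKEN01": "/*",  # assumed entry, for illustration only
}

# Inverse mapping: special character sequence -> placeholder token.
JS_CHAR2TOKEN = {char: token for token, char in JS_TOKEN2CHAR.items()}


def escape(text, char2token=JS_CHAR2TOKEN):
    """Replace each special character sequence with its placeholder token."""
    for char, token in char2token.items():
        text = text.replace(char, token)
    return text


def unescape(text, token2char=JS_TOKEN2CHAR):
    """Replace each placeholder token with the original character sequence."""
    for token, char in token2char.items():
        text = text.replace(token, char)
    return text


escaped = escape("x = 1 // set x")
restored = unescape(escaped)
```

If this reading is right, the placeholders only exist so that sequences such as comment markers inside strings and comments survive tokenization and can be restored when detokenizing.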
Question 1 - what do those init params represent?
I don't understand how to find out which values should be stored in TOKEN2CHAR and ast_nodes_type_string for a new language (JavaScript). For example:
- Why does "STOKEN00" refer to "//" in JAVA_TOKEN2CHAR? Where does the mapping come from?
- Why does ast_nodes_type_string in the java processor contain 'character_literal', while the cpp processor seems to call the same thing 'char_literal'? How can I find out what to put in ast_nodes_type_string for a JavaScript processor?
Question 2 - when should I use TreeSitterLangProcessor and when not?
Why doesn't the python processor make use of the TreeSitterLangProcessor class? In which cases is it better to use TreeSitterLangProcessor, and in which cases is it better not to?
Question 3 - why do I need a tokenizer given TreeSitterLangProcessor?
As mentioned in question 1, it seems that if my JavaScript processor inherits from TreeSitterLangProcessor, those three init params are the only things I need to provide myself, and the rest (tokenize and detokenize) is handled by TreeSitterLangProcessor.
Then why would I need a separate JavaScript tokenizer (mentioned in #48 (comment)) such as ANTLR4?
I hope I have described my questions clearly, and sorry that I am still confused about this after two issues on adding a new language.
Thanks for the awesome paper and the well-structured repository, and thanks in advance for any help!